Computer Vision for Everyone

Original article was published on Deep Learning on Medium

Computer Vision for Everyone

1) The innovation

Empower Data Analyst/Software Engineer/Data Scientists to build state of the art computer vision solutions

2) Why do it? (The real pain)

Current deep learning methods require a huge amount of data and compute. Most Companies can benefit from having some kind of in-house CV related algorithms. Yet the expertise and money required to build a production-grade solution is too high for most companies. Even for an experienced Data Scientist, a quick prototype (considering data is available) can take 3–4 days. Most people also do not care about the algorithm used, they only care about the end results. APIs available are also too generic and/or are not optimized for the use-case.

3) Who will use it? How do they benefit after using your product?

Anyone with a computer will be able to build a computer vision solution. Only at Data collection process, they will be required to interact with the software. All the technical parts of Data Augmentation, Training, API building will be taken care of. It will help create a new breed of Data Analysts/software engineers which can also build cv solutions.

4) How to do it?

An Electron app/ Web-based tool. The software will benefit from being able to utilise more resources from the computer.

The kind of CV applications is broadly divided into these categories — Classification, segmentation, Saliency, Object Detection.

if data is present –

Use gsutil multithreaded uploading to our data server after converting into tfrecord( A serialized file for the dataset)- most people who will do it for the first time will likely make the mistake of using rsync or ftp without correct serialisation- too slow.

if data is already in the cloud — sync with it

Choose one of the categories.

choose the best algorithm(handled by the backend — mostly pretrained to reduce training time, preprocessing of images, data augmentation will be hidden from the user)

Use Kubeflow to reduce training time — distributed training using Kubernetes.

Present an API endpoint

if No data present –

Provide an interface to search for data regarding specific problems

integration with data labeling APIs like scale.ai etc to get data labeled

proceed in the normal flow

5) Who will kick your ass?

Tensorflow API

Google cloud platform autoML framework + Vision API

Clarifai

AWS ML

https://labelbox.com/

6) What is so great about this?

Tensorflow API, Google cloud platform autoML framework + Vision API, AWS all have a huge learning curve. This works for experienced software engineers/Data Scientists. But not for everyone. They are amazing if you want to create something totally new — a new algorithm, massive datasets(50 TB +). Most companies don’t need this.

Labelbox looks really close to what I am thinking. Though it only focuses on the data part.

The platform I am thinking is an intuitive, easy to use, get to production in record time with almost zero knowledge or experience in computer vision.

If aren’t aware of how hard it is at a large scale, watch this talk by Andrej Karpathy on building Computer Vision algorithms at Tesla —