Israel-based Startups Eliminating Bottlenecks in the AI Workflow
My research on this topic began in 2019 during my summer internship at Deloitte’s Innovation Tech Terminal (now Deloitte Catalyst) in Tel Aviv through Birthright Israel Excel. Since then, I’ve returned to Tel Aviv to continue learning Hebrew, working in tech, and exploring the incredible startup ecosystem here.
AI Infrastructure: background, trends, and insights through the lens of Israel’s startup ecosystem
Over the past few years, artificial intelligence has played a major role in defining startup trends. Across all industries, the general evolution has shifted from computing based on human instruction to computing based on self-learning. Research and advisory firm Tractica even predicted that annual worldwide AI revenue will grow from $643.7M in 2016 to $38.8B by 2025. However, as new technologies are implemented across all domains, we need to consider the following: during a gold rush, sell shovels.
Thus, we begin to see an opportunity for artificial intelligence infrastructure. Essentially, along with a new class of software — here, artificial intelligence and its subset, machine learning — comes new infrastructure to support it.
Why the push for AI infrastructure? Traditionally, companies use a software-defined infrastructure (SDI) to support their dynamic environments. A typical SDI, like a cloud-based infrastructure, is built on scripts or program code. It works independently of any specific hardware environment and is designed to control an infrastructure largely without human interaction. However, SDI has its limitations, especially as the technologies companies use continue to transform and evolve. Software-defined infrastructure is constrained by static source code, which means its behavior depends largely on the skill of the developer who wrote it. Additionally, an SDI cannot understand or learn about its own environment. SDI, in essence, is unintelligent; it lacks flexibility. In contrast, AI infrastructure is an intelligent upgrade: it is built on AI and ML algorithms that “learn” from the information they gain over time, building frameworks that can keep up with new data. AI infrastructure can:
- Analyze the dynamic behavior of the existing infrastructure and learn, on its own, how that infrastructure works.
- Eliminate errors in the environment by constantly monitoring the functioning of the infrastructure, fixing issues when they arise.
- Allocate resources when required by the workload and de-allocate them when they are no longer required.
We’ve already begun to see the shift towards AI infrastructure: from June 2018 to June 2019, 22 new Israeli startups were founded in the AI/ML infrastructure sector. However, the major cloud providers all already have some involvement, the most prominent example being Google with TensorFlow, an open-source machine learning library for research and production. So, with major multinationals already invested and active in the AI infrastructure industry, startups should consider for themselves: is there a true startup opportunity here? More specifically, is there a true opportunity for an Israeli startup?
The current, nearly unanimous, answer is yes — startups do have an opportunity to become active in this space. Since 2012, there has been a 300,000x increase in the amount of compute used in the largest AI training runs, suggesting a sizeable opportunity for startups to aid in the efficiency of the artificial intelligence workflow. Specifically, Israeli startups seem effectively poised to be at the forefront of this disruption: Israel is a key infrastructure innovator (think: USB flash drive, the Intel 8088, VoIP, etc.), so we can reasonably expect Israel to continue to innovate in this next generation of computing infrastructure. However, as the industry is only just beginning to develop, startups need to address the difficulties of an artificial intelligence practice to truly understand where problems arise and, thus, where opportunities lie to innovate.
In this report, I’ll address the key loci where I have found bottlenecks in the artificial intelligence workflow. From here, I’ll introduce opportunities for startups to solve the related infrastructure issues, and point out Israeli startups already active in the domain. Where applicable, I’ll share my insights on where I expect to see a rise in startup activity within a particular domain or service.
Additionally, before continuing, it will be helpful to lay out some terminology that will be used throughout this report, and is commonplace in other industry discussions of AI/ML infrastructure:
- AI refers to the larger topic that includes artificial intelligence (AI), machine learning (ML), and deep learning (DL).
- AI frameworks provide data scientists and developers with the building blocks to train and validate AI models without having to go into the low-level programming of the underlying algorithms. Popular frameworks include TensorFlow (mentioned above) and Caffe.
- GPU refers to graphical processing units, which serve as a dense parallel computing engine.
- PoC refers to proof of concept, which demonstrates a system’s ability to perform an activity. In this case, a PoC would be used to demonstrate that a solution based on this architecture delivers the necessary benefits.
- HDFS refers to the Hadoop Distributed File System, a common scale-out file system using storage-rich servers for analytics and machine learning.
The Artificial Intelligence Workflow
The AI workflow is a detailed process cycle, and there are issues at multiple different points that prevent artificial intelligence technology from reaching its full efficiency and potential. To identify these issues, I should first lay out the general cycle and organization of the AI workflow:
1. Data collection involves identifying and gathering the data to be used. The data can be collected over a number of years, and may come from a variety of sources:
- Traditional business data
- Sensor data
- Data from collaboration partners
- Data from mobile apps and/or social media
- Legacy data
2. Data preparation can take weeks or months to complete. The quality of an artificial intelligence model is directly related to the quality of the data used during training: as is often said in the artificial intelligence space, Bad data leads to bad inferences. Within the context of AI, data can be separated into a few buckets:
- Data used to train and test the models
- Data that is analyzed by the models
- Historical or archival data that may be reused (this data can come from a variety of places: databases, data lakes, public data, social media, and more)
3. Model training and optimization typically takes days to weeks. To train an AI model, the training data must be in a specific format, and each model has its own format. As a result, data preparation is often one of the largest challenges — both in complexity and time — for data scientists. In fact, many data scientists report that over 80% of their time is spent in this data preparation phase, and only 20% on the actual art of data science.
4. Deployment and inference typically takes only seconds to return results.
5. Accuracy preservation and improvement reveals how the AI workflow is an iterative cycle: the output of the deployment phase is used as a new input to the data collection phase, so the model constantly improves in accuracy. AI infrastructure is important because the success of moving the data through this 5-step pipeline depends largely on the quality of the infrastructure.
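The five stages above form a loop, which can be sketched in a few lines of Python. Every function here is an illustrative placeholder for a far more complex real-world stage, not any particular library's API:

```python
# Minimal sketch of the iterative AI workflow described above.
# All functions are toy stand-ins, not a real library's API.

def collect(feedback=None):
    # Stage 1: gather raw data; feedback from deployment re-enters here.
    data = [1.0, 2.0, 3.0, 4.0]
    if feedback:
        data += feedback
    return data

def prepare(raw):
    # Stage 2: clean and normalize into the format the model expects.
    top = max(raw)
    return [x / top for x in raw]

def train(examples):
    # Stage 3: "train" a trivial model (here, just the mean of the inputs).
    return sum(examples) / len(examples)

def infer(model, x):
    # Stage 4: deployment/inference returns a result in seconds.
    return x > model

def run_cycle(iterations=3):
    feedback = None
    for _ in range(iterations):
        raw = collect(feedback)
        examples = prepare(raw)
        model = train(examples)
        # Stage 5: outputs become new inputs, closing the loop.
        feedback = [x for x in raw if infer(model, x)]
    return model
```

The point of the sketch is the shape, not the math: stage 5 feeds stage 1, which is why infrastructure quality matters at every pass through the loop.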
Bottlenecks in the Workflow
Now that I have laid out the lifecycle of the artificial intelligence workflow, I can address the challenges preventing artificial intelligence technology from reaching maximum efficiency. Based on my own research and the research of industry professionals, there seem to be four main issues: I will outline these issues, highlighting where AI infrastructure can be used to streamline the process, and introducing areas where, from my personal insights, work from startups still needs to be done to speed up the workflow:
1. The artificial intelligence workflow is compute intensive.
2. Training and developing AI models requires an exorbitant amount of trial and error with hundreds, often thousands, of experiments.
3. Data annotation is often so time-intensive that it creates a bottleneck.
4. Machine learning as a service is in high demand, since there aren’t enough trained data scientists to do the work manually.
I will address these issues in isolation, beginning with the first: the artificial intelligence workflow is compute intensive, meaning no current infrastructure is robust enough to handle machine learning — specifically deep learning — operations at scale. As a result, startups and established companies alike have attempted to introduce their own solutions. For example, established companies like Google, Microsoft, Alibaba, and Intel have created their own hardware through AI-dedicated chipsets. Startups like Habana and Hailo have followed suit in this hardware-driven thought process and attempted to bring their customized hardware to market. However, a parallel solution exists, which I find to be more cost-effective, scalable, and innovative: instead of creating new hardware, simply develop software that optimizes the existing hardware for machine learning tasks. We see this already in Uber’s open sourcing of Horovod, a distributed training framework for TensorFlow and other frameworks with the goal of making distributed deep learning fast and easy to use. Additionally, this hardware-optimizing software is seen in Google’s AutoML, a suite of machine learning products that enables developers to train models specific to their business needs.
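The core pattern behind a framework like Horovod is data-parallel training: each worker computes a gradient on its own shard of the data, the workers average their gradients (an "allreduce"), and everyone applies the same update. The sketch below illustrates that pattern in plain Python with simulated workers; it is not Horovod's actual API, and the model (a one-parameter linear fit) is deliberately trivial:

```python
# Sketch of data-parallel training with gradient averaging -- the core
# idea behind frameworks like Horovod. NOT Horovod's API; the "workers"
# here are simulated inside one process.

def gradient(w, batch):
    # Gradient of mean squared error for the 1-D model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def allreduce_mean(values):
    # In a real cluster this is a ring-allreduce across machines;
    # here we just average in-process.
    return sum(values) / len(values)

def distributed_step(w, shards, lr=0.05):
    # Each worker computes a gradient on its own shard of the data...
    grads = [gradient(w, shard) for shard in shards]
    # ...then all workers apply the same averaged update.
    return w - lr * allreduce_mean(grads)

# Data follows y = 3x, split across two simulated workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = distributed_step(w, shards)
```

Because every worker ends each step with the identical weights, adding workers scales the data throughput without changing the learned model — which is exactly the property that makes this approach attractive for optimizing existing hardware.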
The second issue in the artificial intelligence workflow pertains to data science. One part of this work involves running hundreds, often thousands, of experiments with a myriad of different parameters in order to reach the optimal result. This requires an exorbitant amount of trial and error, which isn’t necessarily scalable for robust models or large amounts of data. Israeli startups have already begun attempting to solve this issue — some major players in the startup space include:
- Missinglink: Utilizes computer vision, enabling users to visually track, document, and manage all their experiments in one dashboard and quickly spot vanishing gradients, anomalies, overfitting, and more.
- Allegro: Provides a complete product lifecycle management solution for AI development and production, beginning with computer vision.
- Cnvrg.io: Organizes every stage of a data science project, from research to collection to model optimization.
- Comet: Allows data scientists to automatically track datasets, code changes, and production models to improve efficiency, transparency, and reproducibility.
- Other similar startups in the space, and their descriptions, can be found here.
Additionally, we saw in April 2019 a $13M investment in Israeli startup Run.AI, which provides a virtualization and acceleration solution for deep learning (the software virtualizes many separate compute resources into a single giant virtual computer with nodes that can work in parallel).
A third issue in the artificial intelligence workflow is the annotation, or tagging, of data. Companies use hundreds of thousands — sometimes millions — of data points to train their models, meaning data annotation can often be a bottleneck in the workflow. Thus, startups have a unique opportunity to automate this data preparation process instead of just relying on cheap labor (think Amazon’s crowdsourcing marketplace Mechanical Turk): two examples in the Israeli startup ecosystem are Dataloop, which generates AI datasets from raw visual data, and DataGen. I find DataGen, a pioneer in the synthetic data creation space, to be particularly interesting — DataGen creates synthetic data, realistic enough to effectively train a model, instead of sourcing existing datasets. This is beneficial because companies are often unable or reluctant to use client data because of privacy issues, and synthetic data allows them to use artificial data with the same characteristics as their real data. Another notable benefit here is that this type of synthetic data comes pre-annotated: the process of annotating data is incredibly time-consuming and expensive. It would only make sense to see a significant rise in the adoption of synthetic data in the coming years, and with it, a rise in startups doing work similar to that of DataGen.
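The reason synthetic data arrives "pre-annotated" is simple: when you generate each example from a known class, the label comes for free. A real generator like DataGen produces photorealistic imagery; the toy sketch below just draws 2-D points around class-specific centers, but the labeling principle is the same (all names here are illustrative):

```python
# Toy synthetic-data generator: because each point is created FROM a
# known class, its label is known by construction -- no annotation step.
import random

def make_synthetic_dataset(n_per_class, centers, noise=0.1, seed=0):
    rng = random.Random(seed)
    dataset = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (cx + rng.gauss(0, noise), cy + rng.gauss(0, noise))
            dataset.append((point, label))  # label attached at creation time
    return dataset

# 100 examples per class, two well-separated classes.
data = make_synthetic_dataset(100, centers=[(0, 0), (5, 5)])
```

Scaling the dataset is just a parameter change, which is why synthetic data sidesteps both the annotation bottleneck and the client-privacy problem described above.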
Another workaround for the issue of data annotation is the concept of unsupervised learning, which does not require labeled data. Instead, unsupervised learning takes in the input set and finds patterns in the data, both organizing the data into groups (clustering) and finding outliers (anomaly detection). Within unsupervised learning is a particularly fascinating development in the AI infrastructure space, also utilized in DataGen’s technology: Generative Adversarial Networks (GANs). Here, two networks battle each other where one network — the generator — is tasked with creating data to trick the other network — the discriminator. From my research, I have found unsupervised learning to be an innovative development in the artificial intelligence space because it can sort data into groups that humans may not consider due to preexisting biases.
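The clustering and anomaly-detection behaviors described above can be shown with a minimal example: group unlabeled 1-D readings with a two-centroid k-means, then flag points far from both centroids. This is illustrative only (a real pipeline would reach for a library such as scikit-learn, and the threshold here is hand-picked):

```python
# Minimal unsupervised learning: no labels are supplied; the algorithm
# discovers the two groups (clustering) and the stray point (anomaly
# detection) from the data alone.

def kmeans_two(points, iters=20):
    srt = sorted(points)
    # Initialize the two centroids at the 25th and 75th percentiles.
    c0, c1 = srt[len(srt) // 4], srt[3 * len(srt) // 4]
    for _ in range(iters):
        low = [p for p in points if abs(p - c0) <= abs(p - c1)]
        high = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(low) / len(low), sum(high) / len(high)
    return c0, c1

def flag_anomalies(points, centroids, threshold=5.0):
    # A point far from every centroid fits no learned group.
    return [p for p in points if min(abs(p - c) for c in centroids) > threshold]

readings = [0.8, 1.0, 1.2, 8.7, 9.0, 9.3, 25.0]  # two groups + one outlier
centroids = kmeans_two(readings)
outliers = flag_anomalies(readings, centroids)
```

No human ever told the algorithm which readings belong together — which is precisely the property that lets unsupervised methods sidestep both annotation cost and annotator bias.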
A fourth issue in the artificial intelligence workflow is simply the lack of data scientists. It’s no surprise that the fields of AI and machine learning have grown tremendously in recent years: as a result, we’re faced with a lack of trained data scientists who are able to keep up with the huge influx of data. Additionally, hiring a team of data scientists is simply too expensive for young startups who want to develop new artificial intelligence technologies, and too impractical for companies where artificial intelligence isn’t a core focus. A solution here is providing machine learning as a service: Palantir is the biggest example, along with Amazon Web Services offering their own product. We can also see Israeli companies in the space with SparkBeyond, Razor Labs, and Pita. All of these companies provide high-end, expensive services; thus, there is a unique opportunity for startups to develop affordable machine learning services that can be marketed toward a broader audience.
Insights for Startups
So, where do I see AI infrastructure as a unique opportunity for startups to engage? As a recap, here are the areas of the AI workflow I have identified where startups can introduce disruptive technology to improve the speed and efficiency of the process cycle:
- The AI workflow is compute intensive. Although this problem can be solved through additional hardware in the form of AI-dedicated chipsets, there is a unique opportunity for startups to develop software that optimizes the existing hardware for AI/ML tasks.
- Data annotation is incredibly time-intensive, and it can often lead to biases when training an AI model. Synthetic data is a solution to these problems: because it is pre-annotated, it saves companies time and money. Additionally, since synthetic data is artificially generated, it allows companies to avoid privacy concerns from using real client data, and ensures that companies won’t be training their AI models to have unconscious biases.
- Supervised learning requires labeled data, which may include unconscious biases on the part of the data scientist. As a solution, look to unsupervised learning, which does not require labeled data but instead sorts data into groups according to patterns. Unconscious bias will no longer play a role here, since the AI model is doing the data sorting according to objective patterns.
- Machine learning as a service is becoming increasingly popular for companies across all verticals. From healthcare to retail to automotive and everything in between, introducing artificial intelligence is becoming imperative to keep up with the ever-evolving industries. However, hiring a team of data scientists is often too costly for companies that don’t have AI as a core focus. Here, startups have an opportunity to develop a cost-effective service that allows companies across industries to utilize their own AI/ML frameworks.
Artificial intelligence will specifically impact infrastructure management and introduce significant business benefits across various stages of the AI workflow.
One specific benefit of utilizing artificial intelligence for infrastructure management is detection of cybersecurity threats. From incidents like WannaCry to the Cambridge Analytica scandal, the need for companies to have robust cybersecurity is at an all-time high. AI systems have the ability to quickly spot unusual patterns and predict possible security breaches by studying the organization’s networks. With the development of an AI infrastructure (as opposed to software-defined infrastructure), companies across all verticals can have stronger immunity against cybersecurity threats, even defeating cybersecurity issues preemptively, both reducing downtime and saving money.
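The "spot unusual patterns" capability described above boils down to learning a baseline from normal behavior and alerting on large deviations. A production system would learn a far richer model; the sketch below uses a simple z-score rule on request rates as a stand-in (the traffic numbers and threshold are invented for illustration):

```python
# Toy network monitor: learn a baseline from normal traffic, then flag
# observations that deviate sharply from it. A z-score rule stands in
# for the learned models a real AI-driven security product would use.
import statistics

def build_baseline(history):
    # "Training": summarize normal behavior as mean and spread.
    return statistics.mean(history), statistics.stdev(history)

def is_suspicious(requests_per_min, baseline, z_threshold=3.0):
    # "Inference": how many standard deviations from normal is this?
    mean, stdev = baseline
    return abs(requests_per_min - mean) / stdev > z_threshold

normal_traffic = [100, 104, 98, 101, 97, 103, 99, 102]  # requests/minute
baseline = build_baseline(normal_traffic)
```

A check like `is_suspicious(500, baseline)` fires while ordinary fluctuation does not, which is the preemptive-detection property the paragraph above describes.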
Additionally, the use of AI in infrastructure management reduces companies’ dependency on human resources. AI provides complete visibility into the relationships among infrastructure processes, reduces the complexity of business processes, and cuts costs, ensuring better decision making and a reduced risk of unconscious bias in company practice.
AI infrastructure will revolutionize storage management. Because AI is capable of learning patterns and data lifecycles, AI infrastructure may have the potential to preemptively warn a user about a storage system failure, thus giving the user ample time to back up important data and replace hardware before the failure takes place.
In sum, artificial intelligence and machine learning are transforming businesses — and entire industries — faster than ever before. Success for startups will be based on how they can help companies understand the role of data in their respective industry and make the right choice regarding what infrastructure they implement.