Why AI startups have different economics from classic SaaS startups

Can we expect AI startups to deliver economic returns similar to those of traditional SaaS businesses? Or do AI startups have unique qualities that warrant a different set of expectations from investors and entrepreneurs?

But First, A Bit of History

Let’s rewind the clock a bit. Back in the day, software vendors would write code, package it, and often distribute it physically (on those nifty things called CDs). In that old world, buyers shouldered most of the operational costs, running the applications they bought in their own data centers or on their own desktops and laptops.

Then came faster Internet speeds and cloud computing, which opened software development and deployment up to a whole new world. With that, we started to see a dramatic shift of infrastructure costs back to the software vendor. That is, in the SaaS world, vendors host and manage web apps in their own data centers or cloud environments, allowing buyers to gradually decrease the investment and expenses associated with managing infrastructure.

Even though infrastructure costs moved upstream to vendors, the system is better net-net. Vendors still build software once, but can distribute it over and over again to as many customers as they like. They can deploy it faster and cheaper too, over the Internet instead of through physical distribution channels. Vendors’ customer reach expanded as well: any user with an Internet connection can buy and use software on the fly. Moreover, software updates can be pushed over-the-air, and customers can pay via subscription while always having access to the latest features, fixes, and versions, without painful installation cycles on local machines. Commensurately, SaaS revenue became not only scalable but also recurring. Usage-based pricing brings flexibility to buyers, replacing installations and licenses that didn’t always map well to actual utilization.

So… are AI Startups really just like traditional SaaS businesses?

While AI has technically been around for quite a while (albeit in different flavors and levels of utility), AI startups are still relatively new on the commercial scene. With that in mind, I think there are some qualities of AI startups that set them apart from traditional SaaS startups, both in technology and in operations. And these differences ultimately produce different cost structures.

Since I personally work in automatic speech recognition (ASR), I’ll be using that as an example throughout, but I believe the points below would generally apply to other AI arenas, including image recognition, OCR, translation, text analytics, etc.

Technological differences drive different cost structures: Training & Inference.

1) Data and Model Training Drive Up Costs
In ASR, building speech recognition models requires a vast amount of training data. Not only would you need the audio data, but you’d also need corresponding ground-truth annotations to build an ASR engine. The thing about training data is that it’s not a one-time cost. It’s actually an ongoing expense. Even if you were to employ continuous user-data ingestion as part of your overall training pipeline, you’d still have to spend on data selection, structuring, re-training, etc. to make it usable.

There’s another phenomenon called data drift: over time, the real-world data your models see shifts away from the data they were trained on, so the data required to keep improving (or even maintaining) your models changes. That means you’ve got to buy or acquire more data.

And if you’re going to support multiple accents, languages, and domains (technical jargon, etc.), then you’ll have to multiply your training data sourcing efforts and expenditures. For simplicity, we can lump in the same cost effect for testing data too, although that can be somewhat mitigated.

And then there’s the actual cost of model training. If AI were a one-and-done type of deal, there wouldn’t be much “learning” going on. To get better and better results, such as higher speech-recognition accuracy, you need to re-train models. And that’s not necessarily cheap. In fact, it can consume a lot of compute resources.

To give an example, let’s say you’re running a classic database SaaS business. While there’s definitely maintenance and upkeep, traditional software development doesn’t carry costs at the level of continuous model training, which can be very compute-intensive. Whether an AI startup buys cloud infrastructure or spends to manage its own, it has to foot that variable-cost bill somewhere.
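To make that concrete, here’s a back-of-envelope sketch in Python. Every number in it (GPU pricing, retraining cadence, annotation rates, data volumes) is an illustrative assumption I’ve made up to show the shape of the math, not a real benchmark:

```python
# Back-of-envelope model of ongoing training costs for a hypothetical ASR startup.
# Every number below is an illustrative assumption, not a real benchmark.

GPU_HOURLY_RATE = 3.00             # assumed cloud price per GPU-hour (USD)
GPUS_PER_RUN = 8                   # assumed GPUs per training run
HOURS_PER_RUN = 72                 # assumed wall-clock hours per retraining run
RUNS_PER_YEAR = 12                 # assumed monthly retraining cadence

ANNOTATION_RATE = 60.0             # assumed cost to annotate one hour of audio (USD)
NEW_AUDIO_HOURS_PER_YEAR = 2_000   # assumed fresh training audio needed per year

compute_cost = GPU_HOURLY_RATE * GPUS_PER_RUN * HOURS_PER_RUN * RUNS_PER_YEAR
data_cost = ANNOTATION_RATE * NEW_AUDIO_HOURS_PER_YEAR

print(f"Annual retraining compute: ${compute_cost:,.0f}")  # $20,736
print(f"Annual data annotation:    ${data_cost:,.0f}")     # $120,000
```

Note that these costs recur every year, and they multiply with every additional language, accent, or domain you support. A traditional database SaaS business simply has no line items shaped like this.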

2) Inference
Inference is the process of using trained models to produce results (e.g. make predictions). In ASR, for instance, any time a user passes audio through to your service, there are variable compute costs associated with running the models to transcribe speech into text.

In traditional software, you build it and distribute it. Usage of the software typically doesn’t drive up massive additional variable costs on your end as a software startup. Going back to the earlier example, say you’re running a traditional SaaS database business. A database read or write is far less complex than running massive numbers of matrix calculations to listen to an audio file and transcribe it into text. That higher complexity of inference directly drives up variable compute costs per user (or per unit of consumption, such as minutes of audio transcribed or number of images processed). Images, video, and audio are all rich media, which require more compute power to process than reading and writing values to and from a database.
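Here’s a minimal sketch of that per-unit cost, again with made-up numbers (the instance price and throughput are assumptions, not measurements):

```python
# Rough per-unit inference cost for a hypothetical ASR service.
# Instance price and throughput are illustrative assumptions.

INSTANCE_HOURLY_RATE = 1.50   # assumed price of one inference instance per hour (USD)
REALTIME_FACTOR = 20          # assumed: the instance transcribes 20 minutes of audio
                              # for every minute of wall-clock time

minutes_transcribed_per_hour = 60 * REALTIME_FACTOR   # 1,200 audio minutes per instance-hour
cost_per_audio_minute = INSTANCE_HOURLY_RATE / minutes_transcribed_per_hour

print(f"Compute cost per audio minute: ${cost_per_audio_minute:.5f}")  # $0.00125
```

A fraction of a cent per minute sounds small, but it scales linearly with every minute every customer sends you, whereas the marginal cost of a database query is orders of magnitude lower.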

To summarize, collecting training data, training the backend models, and running inference all drive up variable costs that traditional SaaS businesses typically don’t incur. What does this mean? All else being equal, this seems to imply that an AI business’s gross margins would be lower than those of traditional SaaS businesses.
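A toy comparison makes the margin point visible. The revenue and cost figures below are invented purely to show the shape of the argument:

```python
# Illustrative gross-margin comparison; all figures are made up.

def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue

# Traditional SaaS: cost of goods sold is mostly hosting and support.
saas_revenue, saas_cogs = 1_000_000, 150_000

# AI startup: same revenue, but COGS also carries inference compute,
# continuous retraining, and data acquisition (assumed figures).
ai_revenue, ai_cogs = 1_000_000, 150_000 + 200_000 + 100_000

print(f"SaaS gross margin: {gross_margin(saas_revenue, saas_cogs):.0%}")  # 85%
print(f"AI gross margin:   {gross_margin(ai_revenue, ai_cogs):.0%}")      # 55%
```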

Operational differences drive different cost structures, too.

As much as we hope, AI isn’t quite at the level that Hollywood likes to portray. Speech recognition, image recognition, text analytics, etc. are not perfect (although it’s all improving very quickly).

Now, in some applications, perfection isn’t required. But in a lot of use cases, it is. What does this mean?

It means that the output of AI services may require additional human post-processing to get results to the desired state. This is often referred to as human-in-the-loop. For instance, if you ran a startup offering AI-powered transcription, you might also employ a team of humans to make corrections to the machine-output transcripts. Even if transcription accuracy is as high as 90%, depending on the use case, the output may still require the human touch to be usable by the customer (e.g. medical transcription). In other words, the additional human labor also drives up costs (fixed, if you have a bunch of folks on standby; variable, if you pay them by the hour).
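To see how quickly the human line item dominates, here’s a sketch with assumed rates (the compute cost, reviewer wage, and review speed are all hypothetical):

```python
# Unit-cost impact of human-in-the-loop review for a hypothetical
# AI transcription service. All rates are illustrative assumptions.

MACHINE_COST_PER_AUDIO_HOUR = 0.50  # assumed compute cost to transcribe 1 audio hour (USD)
REVIEWER_HOURLY_WAGE = 25.00        # assumed wage for a human reviewer (USD/hour)
REVIEW_HOURS_PER_AUDIO_HOUR = 2.0   # assumed: correcting 1 hour of audio takes 2 work hours

machine_only = MACHINE_COST_PER_AUDIO_HOUR
with_human = MACHINE_COST_PER_AUDIO_HOUR + REVIEWER_HOURLY_WAGE * REVIEW_HOURS_PER_AUDIO_HOUR

print(f"Machine-only cost per audio hour: ${machine_only:.2f}")  # $0.50
print(f"With human review:                ${with_human:.2f}")    # $50.50
```

Under these assumptions, human review costs two orders of magnitude more than the compute, which is exactly why human-in-the-loop reshapes the cost structure.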

Now, you could pass this human post-processing effort to your customer and argue that, net-net, producing machine transcripts is already highly valuable, since correcting machine-output transcripts is a lot faster than old-school manual transcription. Your customer may only have to spend a fraction of the time correcting automatic transcripts, rather than paying the full cost of a manual service.

And that, to some extent, might be a fair point, except that by transferring cost down to your customer, you’re decreasing the value of your AI product proposition. The result is that you’ll probably have to charge a lower price to capture value. Whether the sacrifice comes from the top line or the ops line, it’s going to drive down your bottom line.

AI Startups face a different type of growing pain, translating to different expectations for the top-line (e.g. revenue and scale).

As a startup of any kind, one of the key metrics you’re measured by is your growth. To grow fast, or have fast adoption, your product needs to embody many characteristics. One of them needs to be quality.

For AI startups, the concept of quality is difficult to define. For instance, how do you judge whether speech-recognition accuracy is good enough for an application?

To think about this, I like to use an analogy to wine.

Unless you’re a sommelier (a trained wine expert), you probably fall into the average or novice segment when it comes to discerning whether a wine is “good” or “bad.”

Aside: In fact, studies have demonstrated that nobody really knows, and that people largely index on price (a higher price induces people to think they’re having better wine).

Now, the average wine drinker possesses enough basic instinct to know when something is on either tail end of the quality curve. For example, you don’t have to be an expert to know that wine with residual grit in it is pretty shabby. You can also probably tell when a wine is heavenly delicious. But it’s the space between those extremes that really is mystifying and open to interpretation, or at least to question.

It turns out, at least by my argument, that this phenomenon is pretty similar to judging AI quality as well. In ASR, for instance, if your product is recognizing, say… only 10% of the speech accurately, then you know it’s a shoddy speech-to-text service. If, on the other hand, the ASR engine is producing 98%+ accuracy, you know that’s exceedingly good (keep in mind human speech-recognition accuracy is at best between 92–95%). But is an ASR service that’s 80–90% accurate bad? Good? Great? Good enough?
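For what it’s worth, ASR quality is conventionally measured as word error rate (WER): the word-level edit distance between the machine transcript and a human reference, divided by the number of reference words, with “accuracy” roughly 1 − WER. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance (substitutions +
    insertions + deletions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {wer(reference, hypothesis):.0%}")           # 22%: 2 substitutions / 9 words
print(f"Accuracy: {1 - wer(reference, hypothesis):.0%}")  # 78%
```

The metric is objective, but as the next paragraphs argue, whether any given number is “good enough” depends entirely on the use case.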

In some domains, such as medicine or self-driving vehicles, the hazards of being wrong are so severe that we set pretty clear standards, for, say, medical transcription of clinical notes or computer vision recognizing pedestrians. But that’s the very tail end of the quality-expectation curve. The rules are strict and the risk is high, so every percentage point of inference accuracy counts.

But what if you’re running a contact center? And your goal is to use ASR and text analytics (NLP) to determine why a customer has called in to complain? Is buying an ASR service with 75% accuracy good enough? After all, it’s not a matter of life and death.

The point I’m making is this: you can’t sell AI products the way you sell traditional software, because the distribution of use cases is not only diverse, but also subject to a lot of customer hand-holding, interpretation, and custom POC building. The sales cycle has more friction, because you’ll have to educate your customer about “practical quality” versus “academic benchmarks.” (And unlike wine, brand and price may only play an initial role in signaling quality, one that quickly evaporates through direct service testing.)

Furthermore, users don’t always have a great grasp of how to use your service. AI products are often open-ended, meaning they deal in very complex inputs, like speech or images or videos. That leaves a lot of room for edge cases compared with a traditional software startup. If I’m running an ASR service and you attempt to transcribe an audio file with a lot of background music interfering with the speech, the product is probably going to produce a really poor machine transcription. Or what if you simply input a silent audio file with no speech? Or what if you try to transcribe a recording of a crowd cheering at a concert? Put another way, there’s a vast space of inputs subject to misuse and ambiguous performance, simply because the user can (and likely will) misuse the product.
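One practical mitigation is to reject obviously bad inputs before they reach the (expensive) model. Here’s a minimal energy-based silence check, assuming mono 16-bit PCM WAV input; the threshold and the file name are hypothetical and would need tuning on real data:

```python
import array
import wave

def looks_silent(path: str, rms_threshold: float = 500.0) -> bool:
    """Crude guard: flag files whose RMS amplitude falls below a threshold,
    so we can skip sending near-silent audio to the ASR model.
    Assumes mono 16-bit PCM WAV; the threshold is an illustrative guess."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)  # 16-bit signed samples
    if not samples:
        return True
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms < rms_threshold

# Hypothetical usage:
if looks_silent("caller.wav"):
    print("Rejecting input: no detectable audio energy.")
```

A check like this catches the silent-file case, but background music or crowd noise would still sail through, which is why the edge-case space stays large.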

The commercial implications are clear here. Sales velocity is likely to be slower. POCs may be drawn out. In fact, use case requirements may be a lot fuzzier when it comes to AI product performance. All of this translates to a limited ability to scale and onboard customers. So right out of the gate, all else being equal, an AI startup is unlikely to enjoy the same non-linear adoption curve that a traditional SaaS startup does.

What’s the future for AI startups?

Frankly, I think the jury is still out. But if you’re an investor, I think it’s important to temper your expectations for the financial bottom lines of AI startups compared to your traditional SaaS investments. I’m not saying that AI startups won’t be good investments, or even great investments. I’m just saying you’ll have to balance your expectations.

For entrepreneurs venturing into AI startups, I think you’ll need to think deeply about your cost structure, especially how your variable costs shape up. You’ll also want to be hyper-specific about your target user, and maybe constrain your problem space with more discipline, so that the problem-solution match is tighter.