Singles or Home Runs? How We Train Our AI Early-Stage Investing VC Bot

Original article was published by Connetic Ventures on Artificial Intelligence on Medium


“What data do you use to train your AI?”

This is a question we are frequently asked by both potential investors and startups when they learn that we use an AI bot to help us make Venture Capital investments.

For a bit of background before we dive in, nearly 2 years ago we launched Wendal, our AI / ML platform that automates due diligence on early-stage startups. We came up with the idea for Wendal because we are located in Covington, Kentucky (not a hotbed for unicorns), and wanted to build software to source and evaluate startups from all over North America.

To learn more about Wendal and how we use it at Connetic Ventures please check out these previous posts:

Applying Moneyball and AI to Venture Capital

Meet Connetic Ventures and Wendal

Now, back to training data. Wendal automates initial due diligence for us, and for Wendal to get smarter we need to feed it data to learn from. There are several ways to train models, ranging from very simple binary outcomes (1 or 0) to very complex data sets that can have nearly infinite outcome variables.

At Connetic, we take the simple approach: we train Wendal on the binary outcome of whether a startup became a good deal or a bad deal:

1. Good Deal: Did this company make its investors money?

2. Bad Deal: Did this company lose its investors money?
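As a rough illustration of that binary label, here is a minimal sketch. The `return_multiple` input (value returned divided by capital invested) and the 1.0x cutoff are assumptions made for this example, not a description of how Wendal actually assigns outcomes.

```python
from typing import Optional

GOOD_DEAL = 1
BAD_DEAL = 0


def label_deal(return_multiple: Optional[float]) -> Optional[int]:
    """Map an investor return multiple to a binary training label.

    return_multiple is a hypothetical field: value returned / capital invested.
    None means there is not yet enough outcome data, so no label is assigned.
    """
    if return_multiple is None:
        return None
    # Assumption: strictly more than 1.0x counts as making investors money.
    return GOOD_DEAL if return_multiple > 1.0 else BAD_DEAL


# Made-up multiples for five companies, one of which has no outcome yet.
labels = [label_deal(m) for m in (3.2, 0.0, 0.4, None, 1.5)]
print(labels)  # [1, 0, 0, None, 1]
```

Companies without a label are simply left out of training until an outcome is known.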

For us, the goal of using data in early-stage investing is to increase the likelihood of making money with each investment. This is very similar to the approach Billy Beane and the Oakland Athletics took when they applied the concept of Moneyball to fielding a baseball team. After years of extensive research, Billy Beane started drafting players with the goal of recruiting a team that had the highest likelihood of getting on base. They also drafted for value, which we do as well, but we will save that for a separate post.

Getting on base had the highest correlation to scoring runs and ultimately winning games. They did not focus on home runs, slugging, steals, or even batting average. In Venture Capital, the equivalent of those flashier stats might be a focus on disruptive technology, extensive IP, team backgrounds, or financial models. The Oakland A’s took the same binary approach we are taking in Venture Capital: Good Deal (getting on base) and Bad Deal (not getting on base). We believe that creating a highly diversified portfolio of companies that have the highest likelihood of success will translate into the same magic that the Oakland A’s achieved with one of the lowest payrolls in all of baseball. If you don’t know the story, then you should block off 2 hours tonight and watch Moneyball.

On-base percentage has a higher correlation to runs scored than batting average

Our goal at Connetic Ventures is for each of our investments to have the highest odds of returning our capital, and we believe that the outliers (or home runs) will occur naturally. If the bases are empty when home runs occur, they will not be nearly as valuable to your overall portfolio returns. We want to have the bases loaded so that every time a home run occurs we get the maximum return.

Since the launch of Wendal in early 2019, we have had nearly 1,800 companies apply for funding. Once we feel we have sufficient data on a company’s performance, Wendal labels it as a good deal or a bad deal. These binary good-deal/bad-deal outcomes are our training data: the machine learning algorithms comb through all of the ingestion data against those labels and start feeding us predictions.
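To make the idea of training on a binary outcome concrete, here is a minimal sketch of a logistic regression fitted by gradient descent. It is not Wendal's actual model: the choice of logistic regression, the two feature names (a team score and a traction score, both scaled 0 to 1), and every number in the toy data are assumptions for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_good_deal(weights, bias, x) -> float:
    """Predicted probability that a company turns out to be a good deal."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data: [team_score, traction_score] per company (hypothetical features).
rows = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.3], [0.3, 0.1], [0.1, 0.4]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = good deal, 0 = bad deal

weights, bias = train_logistic(rows, labels)
p_strong = predict_good_deal(weights, bias, [0.85, 0.75])
p_weak = predict_good_deal(weights, bias, [0.15, 0.2])
print(f"strong applicant: {p_strong:.2f}, weak applicant: {p_weak:.2f}")
```

The point of the sketch is the shape of the problem: each labeled company is one row, the label is 1 or 0, and the output is a probability of being a good deal rather than a predicted return multiple.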

Screenshot from our Outcomes Dashboard

Given the relatively short time (2 years) we have been using data-driven investing, our current positive outcome data is only directional, since the average exit takes roughly 10–14 years. The negative outcome data (Bad Deals) is much more reliable, given that the average time for a startup to fail is roughly a year.

So, this approach naturally leads to the question…

If you can model winners, can you also model unicorns?

This is a great question and something we discuss internally. While we do think modeling unicorns and other return scenarios may be possible, there are a few reasons why we are not pursuing this now.

1. The timeline for unicorns is very long

  • This is probably the biggest hurdle for us at Connetic. As I mentioned, the average time for a startup to reach an exit is roughly 10–14 years.
  • As we continue to build out our database, we will know what creates startup failures in roughly one-tenth of the time it will take to know what creates a unicorn. So, given that we are a few years into our data-driven strategy, we have a much richer and larger sample on what constitutes a bad deal.

2. Early-Stage companies have very limited data to model

  • Due to our focus on early-stage investing, we are limited in the data we can analyze on companies. Many companies are pre-revenue and do not even have financials to model. Even post-revenue companies only have a year or two of financials to build a model from.
  • Later-stage VCs can model users, CAC, downloads, reviews, web traffic, etc. With this more robust data, there may be a way to move the needle in identifying a unicorn. But by this point, others will have the same data, so I’m not entirely sure there will be a competitive advantage as data-driven investing moves down the startup timeline.
  • We also believe the team is the most important part of the investing equation. The team is something we can evaluate meaningfully from the creation of the company, giving us a competitive advantage over other funds.

3. Unicorns can be reliant on certain macro trends

  • Unicorns are often created by a perfect storm of macro trends; many are impossible to predict.
  • Covid-19 is a perfect example of this. Covid-19 created several unicorns in the digital health space, but to capitalize on this you would have needed to invest 5–6 years ago, and that timing would have been impossible to incorporate into a model.

Now, you either think we are crazy and stopped reading… Or you want to know what the early, directional data suggests.

Early data suggests that Wendal is smarter than we thought he would be at this stage. After companies apply for funding, they are given an overall composite score and a star rating (1 to 5 stars). Companies scoring 4 or 5 stars are then recommended for investment to us humans. Of the 4- and 5-star companies in Wendal over the last 1.5 years, 63% have ended up being classified as a good deal.

Wendal Score is correlated to the probability of a startup being a good deal
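A hit rate like the 63% figure can be computed from a list of (star rating, outcome) pairs. The sketch below assumes that record shape and uses made-up toy records; only the 4-star recommendation cutoff comes from the text above.

```python
from collections import defaultdict


def good_deal_rate_by_stars(records):
    """records: iterable of (stars, label), where label 1 = good deal, 0 = bad deal."""
    totals = defaultdict(int)
    goods = defaultdict(int)
    for stars, label in records:
        totals[stars] += 1
        goods[stars] += label
    return {s: goods[s] / totals[s] for s in sorted(totals)}


def recommended_hit_rate(records, min_stars=4):
    """Share of good deals among companies at or above the recommendation cutoff."""
    picked = [label for stars, label in records if stars >= min_stars]
    return sum(picked) / len(picked) if picked else None


# Toy records, invented for illustration only.
records = [(5, 1), (5, 1), (4, 1), (4, 0), (3, 0), (3, 1), (2, 0), (1, 0)]
print(good_deal_rate_by_stars(records))  # {1: 0.0, 2: 0.0, 3: 0.5, 4: 0.5, 5: 1.0}
print(recommended_hit_rate(records))     # 0.75
```

If the score is informative, the per-star rates should rise with the star rating, which is what the chart caption above describes.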

Compare this to data published by Correlation Ventures, which suggests that 35% of startups end up returning investor dollars, and Wendal might be more than 1.5x smarter than humans. I’ve seen additional data that differs slightly, but nearly everyone agrees that early-stage companies are successful ~35–40% of the time. This is the same distribution we are seeing in our early outcome data in Wendal, where 39% of the companies we have assigned an outcome are a good deal (295/756).

Returns data from Correlation Ventures

Now imagine if our current numbers hold true… 0–1X returns are lowered to 37% (from 64.8%), and the difference is distributed across the other realized multiples. Even if they all move to 1–5X returns, we are fundamentally changing the returns equation.
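The arithmetic behind this comparison can be checked in a few lines. The sketch uses only figures quoted in this post (295 of 756, 63%, and 64.8%); the variable names are ours, and the result suggests the 1.5x figure above is a conservative one.

```python
good, assigned = 295, 756
base_rate = good / assigned          # share of labeled companies that are good deals

wendal_hit_rate = 0.63               # 4- and 5-star companies classified as good deals
benchmark_loss_share = 0.648         # 0-1X share quoted from the Correlation Ventures data
benchmark_success = 1 - benchmark_loss_share

# How much more often a 4- or 5-star pick is a good deal than the benchmark.
lift = wendal_hit_rate / benchmark_success

# Percentage points that would leave the 0-1X bucket if the 63% rate held.
shifted_points = (benchmark_loss_share - (1 - wendal_hit_rate)) * 100

print(f"base rate: {base_rate:.1%}")              # base rate: 39.0%
print(f"lift: {lift:.2f}x")                       # lift: 1.79x
print(f"shifted: {shifted_points:.1f} points")    # shifted: 27.8 points
```

In other words, roughly 28 percentage points of outcomes would move out of the money-losing bucket and into the buckets that return capital or better.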

Again, these results are directional, but you must start somewhere! When we first started, people said it was going to take us 8 years to get some sort of statistically significant data.

Well now we tell them it is going to take 6 years 😊

What do you think? Does on-base percentage matter? Should we be targeting unicorns? And if so, how would you identify unicorns at the earliest stages?

Also, if any of this is interesting to you, please reach out. We are trying to build disruptive technology just like the companies we invest in. We appreciate all feedback from entrepreneurs, investors, or anyone else in the ecosystem. And if you think you have built something that may enhance Wendal, we would be interested in exploring that as well.

Written by: Chris Hjelm, Partner @ Connetic Ventures