Original article was published on Artificial Intelligence on Medium
The Cycle of AI in Product Development
Before a company incorporates AI into its product offerings, it’s important to consider a few critical steps that all lead back to one core concept: a well-rounded and extensible data strategy. Data strategies come in different forms, but for the most part they are structured to support both the business and future iterations of technology. With that said, let’s take a look at what these steps consist of in the context of solving problems with AI.
Commitment and strategy for data acquisition
Data is a critical piece of supporting performant machine learning. (Note: performance is defined by each organization and can be unique case by case; it also doesn’t just mean accuracy, since accuracy can be treated as a subset of performance.) No surprise to anyone, hopefully. With every iteration of a product, or delivery of an MVP, there should be early conversations with Science and Data teams to answer questions around data collection.
A team set out to see whether their unique data could predict when a person was thinking about hurting themselves or others. This data consisted of internet browsing instances, including Google searches, YouTube views, and visits to various web pages. The team had millions of timestamped, discrete page-visit events that could be threaded across a selected interval. What the team was missing was the classification of those pages — in other words, whether or not the pages were related to self-harm. The team wanted to employ a supervised learning model (one trained on labeled data events or instances) but did not have enough labeled data. So the team released a low-complexity model, accepting a tradeoff of high bias. As a data collection strategy, they shipped this model with a simple UI that let users reviewing the output validate or categorize the data. This strategy fed more data into the model over time and enabled the development of more complex, lower-bias models.
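The loop described above can be sketched in a few lines. This is a hypothetical illustration, not the team’s actual system: the keyword list, function names, and review UI hook are all invented for the example, and a keyword match stands in for the deliberately simple, high-bias model.

```python
# Hypothetical sketch of a human-in-the-loop labeling loop: a deliberately
# simple (high-bias) keyword model flags page visits, and reviewer feedback
# grows the labeled dataset that future, lower-bias models can train on.

RISK_KEYWORDS = {"self-harm", "hurt myself", "suicide"}  # illustrative only

def predict(page_text: str) -> bool:
    """Low-complexity model: flag a page if any risk keyword appears."""
    text = page_text.lower()
    return any(kw in text for kw in RISK_KEYWORDS)

# Grows over time as reviewers validate or correct the model's output.
labeled_data: list[tuple[str, bool]] = []

def review(page_text: str, reviewer_label: bool) -> None:
    """Record a human reviewer's validation of a model prediction."""
    labeled_data.append((page_text, reviewer_label))

# The model flags a page; reviewers confirm or correct the labels, and
# the confirmed examples become training data for the next iteration.
flagged = predict("searches about how to hurt myself")
review("searches about how to hurt myself", reviewer_label=True)
review("video about baking bread", reviewer_label=False)
```

The key design choice is that the model’s weakness (high bias) is acceptable because every prediction it surfaces is an opportunity to collect a human-verified label.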
Unification of data and accessibility
Oftentimes this is facilitated by a data warehouse and data pipeline. Being able to easily access a diverse set of data helps Science and Data teams move and iterate quickly!
A team was supporting an eNPS (Employee Net Promoter Score) effort that used open-text inputs from various sources (i.e. surveys, Slack, notes documents). The objective was to find topics and sentiment in how employees were feeling about the company. By combining data from different sources, the team was able to feed a machine learning model different types of text. This study ran fairly frequently, so having all of the data in one accessible place made it easy for the team to continue to iterate and generate outputs for the company.
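The unification step can be sketched as follows. This is a minimal, hypothetical example — the source names, sample text, and word-list scoring are all invented; the team’s real pipeline and model would be far richer — but it shows why one unified table makes repeated analysis cheap.

```python
# Hypothetical sketch of unifying open-text inputs from several sources
# into one structure before analysis. Source names and the trivial
# word-list sentiment scorer are illustrative only.

records = [
    {"source": "survey", "text": "I love the flexible hours"},
    {"source": "slack",  "text": "Meetings are exhausting lately"},
    {"source": "notes",  "text": "Great support from my manager"},
]

POSITIVE = {"love", "great", "support"}   # toy lexicon
NEGATIVE = {"exhausting", "frustrating"}  # toy lexicon

def sentiment(text: str) -> int:
    """Score text as (# positive words) - (# negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Because every source lands in the same shape, one analysis pass
# covers surveys, Slack, and notes alike — and can be rerun any time.
scored = [{**r, "sentiment": sentiment(r["text"])} for r in records]
```

The point is not the scoring model but the shape of the data: once every source is normalized into the same records, iterating on the analysis is a one-line rerun rather than a fresh integration project.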
Confidence and a path towards determining the value of the data
At the intersection of business and technology you find the opportunity to establish components of your product’s core value. In an age where more data generally leads to thoughts of higher valuations, it’s important to take an iterative approach to identifying value in the data being collected. That focus can quickly redirect initiatives toward collecting the specific types of data that bring the most value to the product. At early stages of the cycle, diverse data equates to more value, since it leads to more trial and experimentation with users of the product.
A team deployed a single model that predicted when students were viewing explicit content on their school-issued devices. This became a valuable tool for schools because it helped them identify bad content to block at a system level and flag when students might be distracted. Because schools found this valuable, the team was able to tie the unique data they were generating (i.e. an alert or label tied to a specific piece of content) to a numeric value (revenue, usage). The team quickly realized that the data they were collecting could be labeled or categorized in different ways to generate similar alerts for customers (i.e. an alert on viewing math content or content about video games). The team was quickly able to roll out new models at a small scale and test with some customers to see whether this type of alerting was valuable to them.
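The relabeling idea can be sketched like this. The categories and keywords below are purely illustrative placeholders, and a keyword lookup stands in for whatever models the team actually trained — the point is that one event stream supports many alert categories by swapping the label set.

```python
# Hypothetical sketch: the same browsing events can be relabeled against
# different category definitions to power new kinds of alerts. Category
# names and keywords are invented for illustration, not a real taxonomy.

CATEGORIES = {
    "explicit": {"explicit", "adult"},
    "math": {"algebra", "geometry"},
    "video_games": {"minecraft", "fortnite"},
}

def alerts_for(page_text: str) -> list[str]:
    """Return every category whose keywords appear in the page text."""
    words = set(page_text.lower().split())
    return [cat for cat, kws in CATEGORIES.items() if words & kws]

# One event stream, many alert types: adding a category is a data
# change, not a new pipeline.
math_alerts = alerts_for("intro to algebra homework")   # ["math"]
game_alerts = alerts_for("playing fortnite after class")  # ["video_games"]
```

Framed this way, testing a new alert type with a small set of customers only requires defining a new category over data already being collected.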
At Remesh, we have been able to successfully develop a product powered by AI, which has played a huge part in achieving our business milestones. One of the unique aspects of Remesh is that it was founded on the idea of collecting, prioritizing, and understanding the spread of opinions across an audience in real time, within back-and-forth conversations. So, how did we get here!?
We can try to put things into a contextual framework. Take a look below.
The best software products (platforms) in the world have the most users. In commanding large user bases they effectively generate the most unique data. In generating high volumes of unique data they are able to iterate more on their products. From a business perspective, these products command the highest market values.
However, Remesh is in a stage that looks more like this:
The above illustrates a product development framework that optimizes for more users and more data. What this also suggests is iterating on a current experience of the product to improve and diversify acquisition of users and data before making large changes to core systems.
The main reason has to do with the learning factor of any AI-based product: over time it gets better and better. Without much testing, you never advance past the data phase to the next critical milestone: the evolution of the product. Put another way, you can always generate different ways to collect data, and you end up with a data-rich environment because you have tried different methods to collect different data. What you miss by not evolving your AI strategy is data maturity (iterating on and growing your understanding of the data). You effectively lose one of the most valuable factors of incorporating an AI strategy: the learning factor.
To continue building on the concept of data maturity: at times it is hard to identify at a business level. You might have a lot of data but simply not know what to do with it. It is important that product teams take a comprehensive look at the data by being more specific about the business problems that need to be addressed.
Business problems extend beyond the technical implementation or “performance” of a model. They dive into the realm of understanding customers’ perceptions and behaviors. As a supplement to iterating on an organization’s AI strategy, thinking about how users engage with an AI product is paramount to helping Science and Data teams understand the holistic view of their work.
Remesh has had a data strategy from day one. Efforts have led to the collection of target audiences’ responses, along with additional information on how the language of those responses relates to the audience’s opinions. Two to three years down the road, that strategy enabled the construction of a far superior model for capturing opinions that aren’t collected in real time. Remesh has only realized a small amount of that potential through a small set of customer-facing data analysis tools, but the strategy of collecting small units of data that inform how an audience’s opinions map to language is still going strong.
In summary, AI strategies can create massive experience gains for users of the products that incorporate them. It is important that organizations understand how such a strategy evolves over time and threads through the fabric of a product development cycle. Understanding data maturity by creating opportunities for users to inform the product directly can dramatically help a product organization improve its efforts and better align with customer expectations. It all starts with the data being collected, and the conscious effort to learn from that data, which helps an organization avoid the trap of over-consuming data that might not lead to much value for the business or its customers.