Source: Deep Learning on Medium
Improving Search with Natural Language Processing and Deep Learning
Whenever one of his models goes live in production, Markus Ludwig, a senior data scientist at Scout24, gets a little nervous. “It feels a bit like sending your kid off to its first day of school,” he says with a smile. This time the stakes are even higher because the new feature changes the way people can search on AutoScout24. Markus has been heavily involved in all phases of this project — from inception to release. The new search feature is aimed at enabling users to find cars using natural language and marks the third consumer-facing product powered by a machine learning model that he built. The other two are Price Estimation and Recommendations.
Markus’ love for machine learning started fifteen years ago when he took his first courses on neural networks and artificial intelligence as a student at the University of Zurich. Over time, he grew more and more fascinated by the possibilities of systems that learn from data. After using neural networks to model financial options in his graduate thesis, he decided to pursue a PhD to dive deeper into machine learning.
Now, as a data scientist at Scout24, he builds models and APIs that power search and discovery. Following his work at Scout24 means learning about the importance of AI within the company. As with any other product at Scout24, it’s the consumers’ needs that drive innovations, and it’s the consumers who push data scientists like him to think outside the box.
The story of the new search feature, internally dubbed `AI Search`, started back in fall 2018 with a discovery workshop of the Consumer Experience Engagement Squad. The idea was to offer a more intuitive and seamless experience by enabling users to search using natural language. “Ideally, they can just type in what they’re looking for,” Markus says, “and we automatically select the right filters and keywords.”
A few weeks later, a first prototype built for a company-internal hackathon showcased the potential of an alternative search entry. Markus credits this initial demo with getting buy-in for an exploration of how machine learning could be used to understand text.
Machine learning is quite different from traditional programming. Instead of having a software engineer who understands the problem tell the computer step by step exactly what to do, a data scientist shows the computer lots of examples of inputs along with desired outputs so it can learn the mapping. To some extent that also means trading in human domain expertise for data that, hopefully, contains useful patterns.
The Price Estimation model, for example, is trained on data from millions of offers on AutoScout24, and learns how properties such as make, model, mileage and equipment influence prices. The same holds true for the Recommender system, which extracts patterns about listing similarity from millions of click events. The beauty of such systems is that they capture what users consider to be important, and that they can automatically adapt to changes.
The AI Search project, however, was special. In the past, Markus had worked with images, events and structured data, but never with text. It was also a completely new product, and he was excited to be involved from the very start, with the chance to shape how the final feature looks and behaves. Following the hackathon, Markus started to think about how the recent breakthroughs in natural language processing could be used to power the new search feature.
After considering several approaches to going from text to listings, he realized that the task could be framed as machine translation. The idea was to directly translate user input into AutoScout24 URLs that can be processed with the existing search stack. By automatically mapping relevant parts of the input to either filters or keywords, this approach essentially combines structured search and keyword search. Users can now type in ‘RS6 Performance Panorama 2018’ and the system will select ‘Audi’ as make, ‘RS6’ as model, ‘Performance’ as keyword, ‘panorama roof’ as equipment and ‘2018’ as the earliest registration date.
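To make the target side of that translation concrete, here is a minimal Python sketch of how structured filters and free keywords could be combined into one search URL. The base URL, the parameter names (`make`, `model`, `equipment`, `first_registration_from`, `keywords`) and the function `build_search_url` are illustrative assumptions, not AutoScout24's actual URL scheme, and the sketch covers only the URL format, not the model that produces it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL; the real search endpoint is not given in the article.
BASE_URL = "https://example.com/search"


def build_search_url(filters, keywords):
    """Combine structured filters and free-text keywords into one URL.

    `filters` maps hypothetical parameter names to values; `keywords`
    is a list of words that did not match any structured filter.
    """
    params = dict(filters)
    if keywords:
        # Anything that is not a filter falls back to keyword search.
        params["keywords"] = " ".join(keywords)
    return f"{BASE_URL}?{urlencode(params)}"


# The example query from the text, expressed as the desired output.
url = build_search_url(
    {
        "make": "Audi",
        "model": "RS6",
        "equipment": "panorama roof",
        "first_registration_from": "2018",
    },
    ["Performance"],
)

# Round trip: parsing the query string recovers filters and keywords.
print(parse_qs(urlsplit(url).query))
```

Because the output is an ordinary URL, whatever produces it (hand-written rules or a learned model) can sit in front of an existing search backend without changes to that backend, which is the appeal of the framing described above.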
“This, in turn, raised the question of where to obtain data that would allow a model to learn the desired responses,” Markus recalls. “We wanted a system that can gracefully handle synonyms and typos, understand numeric comparisons and ignore irrelevant inputs.” Then one evening, while commuting home from work, he had another idea: “Would it be possible to generate training data based on simple rules to bootstrap an initial model and later refine it using real user queries?”
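The bootstrapping idea can be illustrated with a small Python sketch that generates (query, target) pairs from simple rules and injects typos on the input side only. The toy catalogue, the phrase templates, the typo rule and the target format are all assumptions made for illustration; the article does not describe the actual rules that were used.

```python
import random

# Toy catalogue and phrasings; purely illustrative, not real training rules.
CATALOGUE = {"Audi": ["A4", "RS6"], "BMW": ["320", "M3"]}
PRICE_TEMPLATES = ["under {price}", "max {price} euro", "cheaper than {price}"]


def add_typo(word, rng):
    """Swap two adjacent characters to imitate a typing error."""
    if len(word) < 4:
        return word  # leave very short tokens untouched
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def make_example(rng, typo_rate=0.2):
    """Return one synthetic (query, target) training pair."""
    make = rng.choice(sorted(CATALOGUE))
    model = rng.choice(CATALOGUE[make])
    price = rng.choice([5000, 10000, 20000])
    phrase = rng.choice(PRICE_TEMPLATES).format(price=price)
    words = f"{make} {model} {phrase}".split()
    # Corrupt only alphabetic words so numbers and model codes stay valid.
    words = [
        add_typo(w, rng) if w.isalpha() and rng.random() < typo_rate else w
        for w in words
    ]
    # The target always uses the clean, canonical values.
    target = f"make={make}&model={model}&price_to={price}"
    return " ".join(words), target


rng = random.Random(42)  # fixed seed for reproducibility
for query, target in (make_example(rng) for _ in range(5)):
    print(f"{query!r} -> {target!r}")
```

Keeping the target clean while corrupting the query is what would let a model trained on such pairs learn to tolerate typos; real user queries can later replace or supplement the synthetic ones.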
To validate his ideas, he went on to train a first model. He also built an API that could handle queries for price, mileage and power in addition to filters such as make, model and location. This system worked well enough to convince Maximilian Stöckl to greenlight a first user-facing test. As product manager for the Engagement Web Team, Max is responsible for the core search funnel and aims to continuously improve the user experience on AutoScout24. With this new way to search for listings, Max and the team hope to help more users find a car that fits them.
A few months later the first version went live. That initial test, however, was not designed to gauge interest in the feature but to gather more training data for the machine learning model. “We needed to expose the system to the variety in natural language including synonyms, abbreviations, and typos,” Markus explains.
As they watched the first searches stream in, Max and Markus realized that there would be no shortage of interesting queries for the model to learn from. Free from the constraints of dropdown menus and filters, users started to look for old police cars, fire trucks and rare BMWs from the 80s. On the other hand, the queries contained a lot of typos. This clearly indicated that providing some guidance for common queries would be beneficial, and they decided to add autocomplete to the list of user interface improvements.
Since re-training the model with new data, several A/B tests have shown that AI Search helps users find what they are looking for faster. The API has also been refined and can now efficiently serve production traffic. Overall, user engagement with the feature indicates that it adds value to the existing search.
Markus, who is the father of a five-month-old baby girl, is proud of what the system can already do but also excited for what’s still to come. The model builds on state-of-the-art research from Google Brain, and he hopes that it will eventually learn to map inputs that don’t have a direct correspondence to filters. Maybe it will start translating ‘sports car’ to ‘coupé with two seats over 150 horsepower’ or learn how the meaning of ‘cheap’ differs depending on the car brand. “It’s going to be interesting to see how it changes and evolves over time,” he says.
For now, Max and Markus will keep a close eye on the system and how users interact with it. Together with the Engagement Web Team they just released the feature in Germany. The next steps involve further tweaks to the user interface and the model. After that, they will tackle the rollout in other countries and on mobile apps.
Markus considers himself lucky that he can currently work on the entire machine learning stack, from conception and modelling to deploying and monitoring in production, but he knows that the future will bring more specialized roles. These changes are also visible at Scout24 with the recent introduction of an AI Platform team. He also thinks that the company will inevitably make use of more off-the-shelf solutions offered by cloud providers. Nevertheless, he believes that data science has a bright future, as there is no substitute for understanding both the business needs and the potential and limitations of the available data and modelling approaches.