Source: Deep Learning on Medium
The Data Science Behind Netflix
“Netflix is not only a successful service; it is a completely data-driven service.”
Netflix in numbers
Last year Netflix announced that it had signed up 135 million paying customers worldwide.
The demographics of Netflix’s US users closely mirror the overall US population in terms of factors such as wealth, age, and education.
Netflix’s business model
With no ads, Netflix’s business model relies on customers who stay subscribed to the service for the long run. The happier customers are, the longer they stay subscribed.
This is why it is central to Netflix’s business to identify and analyze factors that impact the viewer’s enjoyment.
Factors impacting customers’ enjoyment
In its early days, Netflix captured viewers’ enjoyment through the ratings they gave to shows and movies.
As streaming video became the primary focus, many more data points became available, giving deeper insight into customers.
The data points include…
Time of day something was watched.
User age and gender (based on individual logins)
Time spent selecting movies
How often a movie or program was paused or resumed
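To make the list above concrete, here is a minimal sketch of how one viewing session carrying these data points could be represented. The class name, field names, and sample values are all hypothetical and are not taken from Netflix.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewingEvent:
    """One viewing session. Every field name here is a hypothetical choice."""
    profile_id: str
    title: str
    started_at: datetime      # time of day something was watched
    age_band: str             # user age, taken from the individual login
    gender: str               # user gender, taken from the individual login
    browse_seconds: float     # time spent selecting before pressing play
    pause_resume_count: int   # how often playback was paused or resumed


def hour_of_day_histogram(events):
    """Count sessions per starting hour (index 0-23)."""
    counts = [0] * 24
    for event in events:
        counts[event.started_at.hour] += 1
    return counts


# Invented sample data, for illustration only.
events = [
    ViewingEvent("p1", "Show A", datetime(2019, 1, 5, 21, 15), "25-34", "F", 95.0, 2),
    ViewingEvent("p1", "Movie B", datetime(2019, 1, 6, 21, 40), "25-34", "F", 310.0, 0),
    ViewingEvent("p2", "Show A", datetime(2019, 1, 6, 8, 5), "35-44", "M", 20.0, 5),
]
hist = hour_of_day_histogram(events)
```

A simple aggregate such as the hour-of-day histogram is the kind of feature a downstream model could consume.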
Netflix predicts the “perfect situation”
Using all of the above data points, Netflix’s data scientists and engineers build models to predict the “perfect situation”, in which customers continuously receive programs they enjoy.
To do so, Netflix assigns each user to 3–5 clusters out of more than 1,300, based on their viewing preferences.
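The article does not say how users are matched to clusters. One plausible sketch, assuming each user and each cluster is summarized as a vector of viewing-time shares per genre, is to rank clusters by cosine similarity and keep the top few. The cluster names, feature layout, and numbers below are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_clusters(user_vec, centroids, k=3):
    """Return the names of the k centroids most similar to user_vec."""
    ranked = sorted(
        centroids,
        key=lambda name: cosine(user_vec, centroids[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical features: share of viewing time in
# [drama, sci-fi, comedy, documentary].
centroids = {
    "dark-drama": [0.9, 0.1, 0.0, 0.0],
    "space-opera": [0.1, 0.9, 0.0, 0.0],
    "sitcom": [0.0, 0.0, 1.0, 0.0],
    "true-crime-docs": [0.3, 0.0, 0.0, 0.7],
    "sci-fi-comedy": [0.0, 0.5, 0.5, 0.0],
}
user = [0.5, 0.4, 0.1, 0.0]
assigned = top_clusters(user, centroids, k=3)
print(assigned)
```

Assigning a user to several clusters rather than one lets a single profile reflect mixed tastes, which matches the 3–5 cluster figure quoted above.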
Data-driven categorization of movies
Using data science techniques, Netflix created 76,897 unique ways to describe types of movies.
These are called “alt-genres”, and they are what lead to Netflix’s scarily specific movie and show suggestions (e.g. “Movie like: The Heart of Christmas”).
Clearly, they go beyond classical categories like drama, sci-fi, and comedy.
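The article does not describe how these descriptions are constructed. As a rough illustration only, the sketch below shows how combining a few small tag vocabularies quickly multiplies into many specific labels. The vocabularies and the combination rule are assumptions made for this example, not Netflix’s actual scheme.

```python
from itertools import product

# Hypothetical tag vocabularies, invented for this sketch.
moods = ["Feel-good", "Dark", "Quirky"]
genres = ["Dramas", "Sci-Fi Movies", "Comedies"]
settings = ["", "Set in Space", "Based on Real Life"]  # "" means no qualifier


def alt_genres(moods, genres, settings):
    """Build one label per (mood, genre, setting) combination."""
    labels = []
    for mood, genre, setting in product(moods, genres, settings):
        # Skip empty parts so labels without a setting have no trailing space.
        labels.append(" ".join(part for part in (mood, genre, setting) if part))
    return labels


labels = alt_genres(moods, genres, settings)
print(len(labels))  # 3 * 3 * 3 combinations
print(labels[:3])
```

Three vocabularies of three entries already give 27 labels, so a few dozen tags per slot would be enough to reach tens of thousands of descriptions.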
Cover Image Personalization
As you may have observed, different users see different cover images based on their movie preferences, and these images may change over time.
This is the most important thing Netflix does to bring in new viewers.
Netflix models a show’s cover image on the colors and styles of successful, similarly tagged programs.
It also tries different versions of a cover image to find out which one is most effective for the user.
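Comparing cover image versions can be sketched as a simple tally: log which variant was shown and whether the title was then played, and compare play rates. The variant names, the log format, and the use of play rate as the success metric are assumptions for this example.

```python
from collections import defaultdict


def best_variant(impressions):
    """Pick the variant with the highest play rate.

    impressions: iterable of (variant_id, was_played) pairs.
    Returns (winning_variant_id, {variant_id: play_rate}).
    """
    shown = defaultdict(int)
    played = defaultdict(int)
    for variant, was_played in impressions:
        shown[variant] += 1
        played[variant] += int(was_played)
    rates = {variant: played[variant] / shown[variant] for variant in shown}
    return max(rates, key=rates.get), rates


# Invented impression log, for illustration only.
log = [
    ("close-up", True), ("close-up", False), ("close-up", True), ("close-up", False),
    ("group-shot", False), ("group-shot", True), ("group-shot", False), ("group-shot", False),
]
winner, rates = best_variant(log)
print(winner, rates)
```

With only a handful of impressions the difference could easily be noise, so a real comparison would need far more data and a significance check before declaring a winner.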
How Netflix achieves this
Netflix’s recommendation engine is powered by machine learning algorithms. Traditionally, we collect a batch of data on how our members use the service. Then we run a new machine learning algorithm on this batch of data. Next, we test this new algorithm against the current production system through an A/B test. An A/B test helps us see if the new algorithm is better than our current production system by trying it out on a random subset of members. Members in group A get the current product experience while members in group B get the new algorithm. If members in group B have higher engagement with Netflix, then we roll out the new algorithm to the entire member population. Unfortunately, this batch approach incurs regret: many members over a long period of time did not benefit from the better experience. This is illustrated in the figure below.
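The A/B procedure and the notion of regret described above can be sketched in a few lines. This is a simplified illustration, not Netflix’s implementation: the function names are hypothetical, engagement is reduced to a yes/no outcome per member, the comparison uses a standard two-proportion z-test, and regret is approximated as the engagement lost by members who stayed on the weaker experience.

```python
import math
import random


def assign_groups(member_ids, seed=0):
    """Randomly split members into group A (current system) and B (new algorithm)."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for member in member_ids:
        groups[rng.choice("AB")].append(member)
    return groups


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the null hypothesis that A and B share one engagement rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def batch_regret(rate_a, rate_b, members_on_a):
    """Expected engagements lost while members_on_a stayed on the weaker arm A."""
    return max(rate_b - rate_a, 0.0) * members_on_a


# Invented numbers: 400 of 1000 engaged in A, 450 of 1000 engaged in B.
groups = assign_groups(range(2000), seed=42)
z = two_proportion_z(400, 1000, 450, 1000)
roll_out = z > 1.96  # roughly a 5% two-sided significance threshold
regret = batch_regret(0.40, 0.45, members_on_a=10_000)
print(round(z, 2), roll_out, round(regret))
```

The regret term grows with both the size of the improvement and the number of members kept on the old experience, which is why a long batch-then-test cycle is costly.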
Netflix disrupted the TV industry by using data science to provide viewers with exactly the content they want.