Original article was published by Samz Carki on Artificial Intelligence on Medium
When you open the TikTok app, you land on the For You page, where most users, myself included, spend their time. According to TikTok, the For You page is powered by a recommendation system that delivers content likely to interest each particular user, so every person’s feed is unique and tailored to that individual.
The key to the success of TikTok’s recommendation algorithm is that it keeps users in the app by suggesting highly personalized videos.
A recommendation engine, or recommendation system, filters data to extract the relevant information (pre-processing) and applies algorithms that recommend the most relevant items to users based on their past behavior. Companies such as YouTube, Netflix, Facebook, and Amazon all use some kind of recommendation engine in their applications.
Some of the common filtering methods are:
- Content-based Filtering: This algorithm recommends content or videos similar to the ones a user has liked in the past. For example, you get recommendations based on the videos you interacted with, such as those you shared, commented on, or rewatched.
- Collaborative Filtering: This algorithm recommends videos based not only on a user’s own content preferences but also on the behavior of other users. For example, you get recommendations based on users similar to you, where similarity could be based on location, country, language, etc.
- Knowledge-based Filtering
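To make the first two methods concrete, here is a minimal sketch in plain Python. The videos, tags, users, and the function names `content_based` and `collaborative` are all made up for illustration, and cosine similarity is just one common choice of similarity measure; none of this is claimed to be what TikTok runs.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy data: tags per video, and likes per user (1 = liked).
video_tags = {
    "v1": {"pets": 1, "funny": 1},
    "v2": {"pets": 1, "training": 1},
    "v3": {"travel": 1, "food": 1},
    "v4": {"travel": 1, "hiking": 1},
}
user_likes = {
    "alice": {"v1": 1},
    "bob": {"v1": 1, "v3": 1},
    "carol": {"v4": 1},
}

def content_based(user, k=1):
    """Rank unseen videos by tag similarity to the videos the user liked."""
    liked = user_likes[user]
    scores = {
        vid: max(cosine(tags, video_tags[l]) for l in liked)
        for vid, tags in video_tags.items()
        if vid not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def collaborative(user, k=1):
    """Rank unseen videos by the likes of users with similar like histories."""
    liked = user_likes[user]
    scores = {}
    for other, other_likes in user_likes.items():
        if other == user:
            continue
        sim = cosine(liked, other_likes)  # how alike the two users are
        for vid in other_likes.keys() - liked.keys():
            scores[vid] = scores.get(vid, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(content_based("alice"))   # ['v2']: shares the "pets" tag with v1
print(collaborative("alice"))   # ['v3']: liked by bob, who also liked v1
```

Notice that the two methods give different answers for the same user: the content-based one stays close to what Alice already liked, while the collaborative one surfaces a travel video she has shown no interest in yet.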
You can refer here for a working example of a recommendation engine in Python.
In the next article, I will walk you through, step by step, how a recommendation system works in a Jupyter Notebook. Until then, let’s dive into an overview of the TikTok recommendation system.
Note*: All the information presented below comes from different articles as well as the TikTok website and does not represent how TikTok’s core system actually functions.
The main problem with most recommendation algorithms is that they are based on the social graph, which means they recommend content based on what we liked previously and what our connected circle interacts with, and they show videos from the same creator multiple times.
TikTok has been able to recommend more diverse content by relying on users’ interest graph and avoiding a homogeneous stream of videos. For example, the system:
- Recommends a diversity of videos, which gives you additional opportunities to stumble upon new content categories, discover new creators, and experience new perspectives and ideas.
- Intersperses diverse types of content with those you already know you love. For example, the For You feed generally won’t show two videos in a row made with the same sound or by the same creator.
- Avoids recommending duplicated content, content you’ve already seen, or any content that’s considered spam.
- Recommends videos that have been well received by other users who share similar interests.
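The "no two in a row" and "nothing you have already seen" rules can be sketched as a re-ranking step applied to an already scored list. The greedy strategy, the `diversify` function, and the dictionary fields below are assumptions made for illustration; the source does not say how TikTok enforces these rules.

```python
def diversify(ranked, seen_ids):
    """Greedy re-rank: skip seen videos, avoid back-to-back creator or sound."""
    pool = [v for v in ranked if v["id"] not in seen_ids]
    feed = []
    while pool:
        last = feed[-1] if feed else None
        # Best-ranked video that repeats neither the previous creator nor sound.
        pick = next(
            (
                v for v in pool
                if last is None
                or (v["creator"] != last["creator"] and v["sound"] != last["sound"])
            ),
            pool[0],  # nothing qualifies: fall back to the top remaining video
        )
        pool.remove(pick)
        feed.append(pick)
    return feed

# Hypothetical candidates, already sorted from most to least relevant.
ranked = [
    {"id": "a", "creator": "c1", "sound": "s1"},
    {"id": "b", "creator": "c1", "sound": "s2"},
    {"id": "c", "creator": "c2", "sound": "s1"},
    {"id": "d", "creator": "c3", "sound": "s3"},
    {"id": "e", "creator": "c2", "sound": "s4"},
]

feed = diversify(ranked, seen_ids={"e"})
print([v["id"] for v in feed])  # ['a', 'd', 'b', 'c']
```

The fallback to `pool[0]` mirrors the word "generally" in TikTok’s description: when every remaining video would repeat a creator or sound, the feed still shows something.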
What kinds of features are used?
These features are based solely on my research; TikTok uses more than just the features below.
User interactions, such as:
- videos you like or share
- accounts you follow
- comments you post
- content you create
- replays of a video
- and many more
Video information, like captions, sounds, and hashtags:
- video format
- music and sounds
- creative effects, voice effects
Device and account settings:
- language preference
- country setting
- device type
Completion ratio: if users watch a longer video from beginning to end, that view receives greater weight.
Watch time: the recommendation system takes watch time as a signal that users are enjoying your content.
Note*: There are a lot more features that are used to develop the recommendation system.
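As a rough sketch of how the viewing signals above (completion ratio, replays, watch time) could be turned into numbers, consider the snippet below. The `View` schema, the function names, and especially the linear length scaling with a 60-second reference are my own assumptions, not something TikTok has published.

```python
from dataclasses import dataclass

@dataclass
class View:
    """One viewing session of one video (hypothetical schema)."""
    watched_seconds: float  # total watch time, including replays
    video_seconds: float    # length of the video

def completion_ratio(view):
    """Fraction of the video watched in a session, capped at 1.0."""
    if view.video_seconds <= 0:
        return 0.0
    return min(view.watched_seconds / view.video_seconds, 1.0)

def replays(view):
    """Full extra passes through the video beyond the first one."""
    if view.video_seconds <= 0:
        return 0
    return max(int(view.watched_seconds // view.video_seconds) - 1, 0)

def length_weighted_completion(view, reference_seconds=60.0):
    """Completion ratio scaled so finishing a longer video counts more.

    The linear scaling and the 60-second reference are assumptions.
    """
    scale = min(view.video_seconds / reference_seconds, 1.0)
    return completion_ratio(view) * scale

short_loop = View(watched_seconds=30, video_seconds=15)  # watched twice
long_full = View(watched_seconds=60, video_seconds=60)   # watched once, fully

print(replays(short_loop))                     # 1
print(length_weighted_completion(short_loop))  # 0.25
print(length_weighted_completion(long_full))   # 1.0
```

Both sessions have a completion ratio of 1.0, but the length-weighted version rewards the 60-second video more, which matches the idea that finishing a longer video is a stronger signal.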
Let’s dive into how the data is collected and processed, and how the model is developed.
As you can see in the picture above, computer vision, NLP, and metadata are used to extract the key information about each video. Along with this, all the features mentioned above are extracted and stored in a database. Data scientists, data engineers, and ML engineers then pre-process the data and deploy and evaluate the model.
Multiple models, such as classification models, are used to detect violations and duplicates. The recommendation system is trained on the pre-processed data. Based on the user’s preferences, the model is re-trained, and more personalized videos are displayed in the user’s TikTok app.
As we know, we can inspect the feature importance scores of XGBoost or other ensemble algorithms for all the given features, which means we can point out individually which features have more impact on the decision.
According to this article, the TikTok algorithm assigns the following weights to its input features:
- Rewatch rate = 10 Points
- Completion rate = 8 Points
- Shares = 6 Points
- Comments = 4 Points
- Likes = 2 Points
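One simple way to read these points is as the weights of a weighted sum. The sketch below does exactly that; the point values come from the list above, but combining them linearly and normalizing each signal to the range 0 to 1 are my assumptions, and `engagement_score` is a hypothetical name.

```python
# Point values from the list above.
WEIGHTS = {
    "rewatch_rate": 10,
    "completion_rate": 8,
    "shares": 6,
    "comments": 4,
    "likes": 2,
}

def engagement_score(signals):
    """Weighted sum of signals, each expected to be normalized to [0, 1].

    Signals missing from the dict count as 0.
    """
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

# Hypothetical videos: one gets rewatched and finished, one only gets reactions.
video_a = {"rewatch_rate": 0.5, "completion_rate": 0.75}
video_b = {"likes": 1.0, "comments": 0.5}

print(engagement_score(video_a))  # 11.0
print(engagement_score(video_b))  # 4.0
```

Under these weights, a video that half of its viewers rewatch outscores one that every viewer likes, which is the practical meaning of the ranking above.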
How does the algorithm behave when you are a new user?
For users who select categories: New users can pick categories, like pets or travel, to help tailor recommendations to their preferences. This allows the app to develop an initial feed, which it then starts to polish based on your interactions with an early set of videos.
For users who don’t select categories: The app offers a generalized feed of popular videos to get you started. Your first set of likes, comments, and replays will initiate an early round of recommendations as the system begins to learn more about your content tastes.
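The two new-user paths can be sketched as a single function with a fallback. The `initial_feed` name, the video fields, and the use of like counts as the measure of popularity are assumptions for illustration only.

```python
def initial_feed(videos, selected_categories=None, size=3):
    """Build a first feed for a user with no interaction history."""
    if selected_categories:
        # The user picked categories: restrict candidates to those.
        candidates = [v for v in videos if v["category"] in selected_categories]
    else:
        # No categories picked: every video is a candidate.
        candidates = list(videos)
    # Popularity (here: total likes) stands in for whatever TikTok really uses.
    return sorted(candidates, key=lambda v: v["likes"], reverse=True)[:size]

# Hypothetical catalog.
videos = [
    {"id": "v1", "category": "pets", "likes": 900},
    {"id": "v2", "category": "travel", "likes": 1500},
    {"id": "v3", "category": "comedy", "likes": 3000},
    {"id": "v4", "category": "pets", "likes": 1200},
]

print([v["id"] for v in initial_feed(videos, {"pets"})])  # ['v4', 'v1']
print([v["id"] for v in initial_feed(videos, size=2)])    # ['v3', 'v2']
```

Once the first likes, comments, and replays arrive, this starting feed would be replaced by the personalized ranking described earlier.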
This recommendation system is so successful at hooking users for a long time because you are exploring new content in real time. Still, it needs to solve the problems of offensive, spam, and biased recommendations. The longer users stay on the TikTok app and the more new users join the platform, the more user data piles up, which could enable more personalized video recommendations but at the cost of our privacy.
If you have some comments or feedback, please feel free to comment. Thank you for reading!