How to Build Machine Learning Models for #SwipeToSuccess

Original article was published by Valeriya Kushchuk on Artificial Intelligence on Medium


The best way to learn data science is by practicing with real-life problem statements and data. Exploring the data and testing some simple models by yourself can be a challenge — that is why we recently hosted an online walkthrough of our latest competition, #SwipeToSuccess, to help our community members learn:

⚡️ How to design and train a Machine Learning model

⚡️ How to improve your score and the algorithm’s performance by using more data

⚡️ How to join a Data Science competition and submit your solution

Taking the pain out of networking

Job networking can be frustrating — it can be a huge challenge to find people whose goals and needs align with yours.

That’s why Atrae, Inc., a startup from Japan, created an AI-powered app called yenta, where you swipe right on users whose goals and interests match yours. If the other user swipes right on you too, you can connect with them over messages or a meeting. bitgrit and Atrae teamed up to launch the #SwipeToSuccess competition, which challenges data scientists to optimize the app’s profile-matching algorithm by predicting the compatibility of its users.

The data

For this competition, we give you access to a real dataset of a kind that is rarely made public. The dataset includes the following kinds of values (anonymized to protect user privacy):

  • Users’ educational backgrounds
  • Work experience
  • Self-introductions
  • Professional skills
  • Reasons why they downloaded the app
  • Past user-to-user interactions, including swipes and reviews

Working with the data

Data Scientist Jorge Quinteros led this online workshop last Monday and explained how the user data listed above is replaced with numerical IDs to anonymize it.

To solve classification problems like the one we’re dealing with here, Jorge suggested using an algorithm called Random Forest. This is an easy-to-use yet powerful algorithm that allows you to run quick tests with the data and train your model. A Random Forest is an ensemble of many Decision Trees with varying structures: each tree makes its own prediction, and the forest combines these predictions, typically by voting, into a final prediction.
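As a rough illustration of the ensemble idea, here is a minimal sketch using scikit-learn's `RandomForestClassifier`. It is not the workshop's code: the data is synthetic (generated with `make_classification`), and the sample counts and `n_estimators=100` are assumptions chosen only for the example. Note that scikit-learn combines trees by averaging their predicted class probabilities rather than by a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the competition dataset is not used here.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train an ensemble of 100 decision trees.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The fitted trees are exposed in estimators_; their depths usually differ
# because each tree sees a different bootstrap sample and feature subsets.
n_trees = len(model.estimators_)
depths = sorted({tree.get_depth() for tree in model.estimators_})

# scikit-learn averages the per-tree class probabilities to get the
# forest's probabilities, then predicts the most probable class.
per_tree = np.mean([t.predict_proba(X_test) for t in model.estimators_], axis=0)
forest = model.predict_proba(X_test)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(f"trees: {n_trees}, distinct depths: {depths}")
print(f"matches per-tree average: {np.allclose(per_tree, forest)}")
print(f"held-out accuracy: {accuracy:.3f}")
```

Because a forest like this trains quickly with default settings, it is a convenient baseline to run before trying anything more elaborate.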

Build your input data as follows:

  • Take a pair of user IDs from the training set
  • For each user ID, group that user’s attributes together
  • Concatenate the first user’s attributes with the second user’s equivalent attributes to form the input vector (X)
  • Use the score between these two users, in that order, as the target value (y)

Train your model with thousands of these examples so that you end up with a model that is able to make predictions on the test data.
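The steps above can be sketched as follows. This is only an illustration under stated assumptions: the `user_features` dictionary, the `train_pairs` rows, the helper `build_pair_vector`, and all values are hypothetical and do not reflect the actual competition files or column names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-user attribute vectors, keyed by anonymized user ID.
user_features = {
    101: [3, 0, 12],
    102: [1, 1, 7],
    103: [3, 1, 4],
    104: [2, 0, 9],
}

# Hypothetical training rows: (from_user, to_user, score).
train_pairs = [
    (101, 102, 1),
    (102, 101, 0),
    (101, 103, 1),
    (103, 104, 0),
    (104, 101, 1),
    (102, 103, 0),
]


def build_pair_vector(user_a, user_b, features):
    """Concatenate user_a's attributes with user_b's, in that order."""
    return np.concatenate([features[user_a], features[user_b]])


# One input row per pair; the target is the score for that ordered pair.
X = np.array([build_pair_vector(a, b, user_features) for a, b, _ in train_pairs])
y = np.array([score for _, _, score in train_pairs])

# Six rows is far too few for a real model; it only shows the mechanics.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Predict the score for an unseen ordered pair.
test_pair = build_pair_vector(103, 101, user_features)
predicted = model.predict(test_pair.reshape(1, -1))
print(X.shape, predicted)
```

Order matters here: the pair (101, 102) and the pair (102, 101) produce different input vectors, which lets the model learn that a score from one user to another need not be symmetric.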

📌 Useful resources

✔️ An easy three-step guide to get you started on your first AI project

✔️ An article covering the basics of Python

✔️ A great resource on basic data visualization in Python

✔️ A read on how the Random Forest algorithm works

✔️ A chapter on text data vectorization

✔️ Our blog on the random forest algorithm, one of the most heavily used machine learning techniques in the industry

✔️ The recording of the workshop, including the coding walkthrough

How to join the competition and submit your solution

  • Go to http://www.bitgrit.net/competition/4
  • Register & click on Participate
  • Accept the Non-disclosure Agreement (NDA)
  • Refresh the page and go to Resources
  • Download the data and then check the Rules

The #SwipeToSuccess competition, with a grand prize of $5,000, will run until Oct. 31, 2020, 6:30 p.m. UTC. Sign up for the competition here! 🏆

Good luck to all contestants, and many thanks to our wonderful co-host Le Wagon for helping us put on this workshop!