Applying Machine Learning to Find Love

Source: Artificial Intelligence on Medium

Photo by Yogas Design on Unsplash

Profile Data Needed

A dating profile has many different aspects, but because of privacy and security concerns, publicly available data on real users’ dating profiles is extremely limited. In place of real profiles, we will have to create a large number of fake ones to simulate the thousands of dating profiles out there. To do so, we must first lay out our own format for a dating profile.

Dating Profile Format

Our pseudo dating profiles will start with a short bio that each person writes about themselves. The bio is possibly the most important part of each profile. Next come the person’s interests and beliefs. Interests will be covered by their favorite movies, TV shows, music, etc. Beliefs will be covered by politics and religion. This covers the most basic parts of a dating profile, though more could of course be included.

We are keeping the dating profile simple for the sake of this project. Once the basic outline of the entire project has been constructed, we can consider adding more data.

Filling in the Dating Profiles

Next, we have to actually fill in the dating profiles. We are aiming to create a little over 5,000 fake profiles to test on. To fill out each one, we’ll simplify interests and beliefs so that numbers can stand in for actual songs, movies, shows, etc.

For example, a user will be shown 10 very different movies and must pick exactly one as their preferred movie. This process is then repeated for each category. That gives us a value from 0 through 9 under each category, which can easily be generated with a random number generator in Python.
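Here is a minimal sketch of that step, using only Python’s standard library. The category names, the profile count of exactly 5,000, and the fixed seed are assumptions for illustration; the only thing specified above is that each category takes a value from 0 through 9.

```python
import random

# Hypothetical category names, based on the interests and beliefs listed earlier.
CATEGORIES = ["Movies", "TV", "Music", "Politics", "Religion"]
NUM_CHOICES = 10  # each category offers 10 options, encoded as 0 through 9


def generate_choices(num_profiles, seed=None):
    """Return one dict per profile mapping each category to a random choice."""
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    return [
        {category: rng.randrange(NUM_CHOICES) for category in CATEGORIES}
        for _ in range(num_profiles)
    ]


# Assumed count; the target is "a little over 5,000" profiles.
profiles = generate_choices(5000, seed=42)
print(profiles[0])
```

Each profile ends up as a small dictionary with one integer per category, which is easy to turn into table columns later.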

That takes care of a user’s interests and beliefs, but what about the bios? A bio is supposed to represent a user’s personality and attitude, and we can’t simply write over 5,000 of them by hand. Instead, we must turn to a fake user biography generator to produce the data we need. There are numerous generators available online, so we can pick whichever one works best for us.

The only downside to using these generators is that the bios they produce typically all follow the same format. This detracts from the realism of our profiles, but the purpose of this project is to create something simple as a proof of concept.
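To see why generated bios tend to share a format, consider a toy template-based generator. This is a hypothetical stand-in, not any particular online generator; the phrase pools and the template are made up for illustration.

```python
import random

# Made-up phrase pools; a real generator would have its own vocabulary.
ADJECTIVES = ["Passionate", "Lifelong", "Amateur", "Devoted", "Friendly"]
ROLES = ["coffee enthusiast", "travel fanatic", "music lover", "foodie", "gamer"]
EXTRAS = ["Proud introvert", "Weekend hiker", "Bookworm", "Dog person", "Night owl"]


def generate_bio(rng):
    """Fill a fixed two-sentence template with randomly chosen phrases."""
    return f"{rng.choice(ADJECTIVES)} {rng.choice(ROLES)}. {rng.choice(EXTRAS)}."


rng = random.Random(0)  # seeded so the sample is reproducible
bios = [generate_bio(rng) for _ in range(5)]
for bio in bios:
    print(bio)
```

Every bio follows the same two-sentence pattern and only the words change, which is the loss of realism we accept for a proof of concept.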

Once we have generated all the necessary data, the collected data should look like the DataFrame below:

The DataFrame for the fake user profile data
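One way the pieces might be assembled into such a DataFrame, assuming pandas is installed. The column names, the profile count, and the placeholder bios are all assumptions; in practice the bios would come from whichever generator was chosen.

```python
import random

import pandas as pd

CATEGORIES = ["Movies", "TV", "Music", "Politics", "Religion"]  # assumed names
NUM_PROFILES = 5000  # assumed count; the target is a little over 5,000

rng = random.Random(42)  # seeded so the table is reproducible

# Placeholder bios; replace with the output of the chosen bio generator.
bios = [f"Placeholder bio {i}" for i in range(NUM_PROFILES)]

# One column of bios plus one column of 0-9 choices per category.
data = {"Bios": bios}
for category in CATEGORIES:
    data[category] = [rng.randrange(10) for _ in range(NUM_PROFILES)]

df = pd.DataFrame(data)
print(df.head())
```

Each row is one fake profile: a text bio followed by an integer from 0 through 9 for every category.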