Building a Job Recommender for Non-Technical Business Roles via NLP and Machine Learning




I built a job recommender for people with non-technical backgrounds. This article covers the app's features and the steps I took to build it.


Here is the link to the app.

I was inspired to build a job recommender because of recent layoffs at my old company. Many of my former colleagues who were let go were less than 3 years into their careers, and they wondered which roles their experience suited them for and where they should start their job search. I remembered a similar struggle coming out of college, when I didn't really know where to focus my search either. I believe this issue is quite common for people with non-technical backgrounds.

I thus decided to build a job recommender for people in our position — those with non-technical backgrounds, like a Psychology major — that would use information from our past experiences to help steer us down the right path.

The app combines NLP techniques such as topic modeling with classification-style machine learning to determine your best fit. You copy and paste your resume or LinkedIn profile into a text box, and the app parses the text and presents an ML-driven analysis of which jobs you fit and why.

The app has 3 features:

Feature 1: Return percent match by job type.

Feature 2: Return a chart of where your resume falls among the different job positions based on topic matches. This chart helps explain your results from Feature 1.

Feature 3: Select a job archetype from a dropdown and see which keywords your resume matches and which it's missing.

In the rest of this article, I will go over the steps I took to build the job recommender. The link to my code is here.

Step 1: Scoping the Project

The most important part of a data science project is scoping — planning your project so that it fits your time and effort constraints yet can still answer a valuable question. Any given domain offers so much data and so many avenues for exploration that the sheer volume can be overwhelming. You thus need to be clear about what issue you are trying to tackle, what specific data you'll look for, and what your minimum threshold for success is.

The key question for my scoping was figuring out which job types to analyze. So many different jobs exist in the non-technical business space, and I felt that including too many job types would undermine the project's intended effect: the modeling might be less accurate, and the app might end up too cluttered and unfocused to help the end user. I thus thought back to my original intent and settled on 2 criteria for the jobs to analyze:

  1. The jobs must be business roles that don't require technical skills. This excludes roles such as software engineer, physician, or actor.
  2. The jobs must fall within the $40–120k salary range. This captures what people with 0–3 years of experience typically earn.

After that, I drafted some broad job archetypes that made sense, but I also sent out a survey to see what others thought of my ideas. I then had my colleagues rank the job types they were interested in so that I could limit the number of job types to analyze. Ideally, I wanted fewer than 10.

With this info, I chose the 5 most popular jobs as the basis for my analysis but ended up adding more later on when I was confident my model could correctly distinguish between the job types. I ended up choosing 8 different job types in the non-technical business space and also added “Data Scientist” out of personal interest.

Step 2: Scraping Data

The data I planned to use was job postings in the respective job types that fit the criteria outlined above. I first looked at LinkedIn and Indeed, but their sites proved too difficult to scrape because of dynamic loading.

I settled on scraping job postings from Glassdoor, which also used dynamic loading. But this time, I borrowed the code of a fellow data scientist and modified it to my needs. I’m quite grateful for this code because building my own scraper would have taken me far longer!

After modifying the scraper to fit my needs, I scraped 40 posts per job type and joined the files. Many of the scraped jobs clearly didn't fit the correct job archetype — for instance, I got many construction jobs when scraping "Project Manager" roles. I also got many postings titled "Senior [Job Type] Analyst" or "Director of [Job Type]", which were clearly unattainable for people with 0–3 years of experience. So I set up text filters to weed these out and double-checked that the remaining data fit what I was looking for. Ultimately, I ended up with the text descriptions of 149 job listings across 9 different job types as my data set.
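For illustration, here's a minimal sketch of that filtering step, assuming the scraped postings live in a CSV with a job_title column (the file name and column name are hypothetical):

```python
import re
import pandas as pd

# Hypothetical file and column names; the real scraped schema may differ.
df = pd.read_csv("glassdoor_postings.csv")

# Filter out senior- and director-level titles that are out of reach
# for someone with 0-3 years of experience.
senior_terms = ["senior", "sr.", "director", "vp", "head of", "principal"]
pattern = "|".join(re.escape(t) for t in senior_terms)
mask = df["job_title"].str.lower().str.contains(pattern, na=False)
df = df[~mask].reset_index(drop=True)
```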

Step 3: Data Cleaning and Topic Modeling

The next step was cleaning the text descriptions. I used standard techniques such as removing punctuation and lowercasing the text, then tokenized and stemmed the words to standardize them semantically. Lastly, I put the data into array format using a vectorizer.
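Here's a rough sketch of that kind of cleaning pipeline using NLTK — the article doesn't name the library, so treat the specifics as assumptions; job_descriptions stands in for the raw posting texts:

```python
import re
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")

stemmer = PorterStemmer()

def clean_text(doc: str) -> str:
    # Lowercase and strip anything that isn't a letter or whitespace.
    doc = re.sub(r"[^a-z\s]", " ", doc.lower())
    # Tokenize, then stem each token so word variants collapse together.
    return " ".join(stemmer.stem(tok) for tok in word_tokenize(doc))

# job_descriptions is a hypothetical list of the 149 raw posting texts.
cleaned_docs = [clean_text(d) for d in job_descriptions]
```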

Once the data was in an analyzable format, I performed topic modeling, trying several techniques but ultimately landing on TruncatedSVD. My selection criterion was how accurately the classification model I built in Step 4 could predict job types. I settled on 20 topics in total.
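A minimal scikit-learn sketch of the vectorizing and topic-modeling steps might look like this — TF-IDF is an assumed vectorizer choice, while TruncatedSVD and the 20 topics come from above:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF is one common choice of vectorizer; the article doesn't name the exact one.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
doc_term = vectorizer.fit_transform(cleaned_docs)

# Reduce the document-term matrix to 20 latent topics.
svd = TruncatedSVD(n_components=20, random_state=42)
doc_topic = svd.fit_transform(doc_term)  # shape: (n_docs, 20)
```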

Step 4: Build Classification Algorithm

After topic modeling, I fed the topic-document matrix into a classification algorithm. Optimizing for accuracy, I ultimately settled on a random forest classifier. The model returned ~90% accuracy on validation sets, showing it could reliably predict the correct job type for each job description.
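A sketch of that training step, reusing doc_topic from above and assuming labels holds each posting's job type (the hyperparameters here are illustrative, not necessarily the ones actually used):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# labels is a hypothetical list/array of the job type for each posting.
clf = RandomForestClassifier(n_estimators=200, random_state=42)

# 5-fold cross-validated accuracy on the 20 topic features.
scores = cross_val_score(clf, doc_topic, labels, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.2%}")

clf.fit(doc_topic, labels)
```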

Next, it was time to give the model a functional purpose. I used the topic model above to transform a user's resume, then used the classification model to predict which jobs the resume fit best. I extracted the percent match by job type, giving a more nuanced view of one's best job match — for instance, 60% project manager, 40% product manager — and giving the end user multiple career paths to investigate.
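Putting those pieces together, the percent-match feature might look something like this, reusing the fitted clean_text, vectorizer, svd, and clf from the earlier sketches:

```python
def percent_match(resume_text: str) -> dict:
    """Return {job type: percent match} for a pasted resume."""
    cleaned = clean_text(resume_text)
    # Push the resume through the same vectorizer and topic model as the postings.
    topic_vec = svd.transform(vectorizer.transform([cleaned]))
    # predict_proba gives the percent-match-style breakdown across job types.
    probs = clf.predict_proba(topic_vec)[0]
    return {job: round(100 * p, 1) for job, p in zip(clf.classes_, probs)}
```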

Step 5: Build PCA Chart Function

When I showed people their job match percentages, they asked why they fit into those groups. I had a pretty tough time explaining the models' underlying mechanisms to a non-technical crowd, so I decided to show them a simplified chart to explain these job match results.

First, I reduced the topic-document matrix to two dimensions using PCA. Then I plotted each job type according to the reduced dimensions. I also applied the same dimensionality reduction to the user's resume so I could plot where it stood relative to the other jobs on the chart. In interpreting the new PCA features, I saw that they were heavily geared towards two topic types: marketing-related keywords and project-management-related keywords.
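A rough sketch of that chart, again reusing the objects from the earlier sketches (resume_text stands in for the user's pasted resume):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Collapse the 20 topic dimensions down to 2 for plotting.
pca = PCA(n_components=2)
coords = pca.fit_transform(doc_topic)

fig, ax = plt.subplots()
labels_arr = np.asarray(labels)
for job in np.unique(labels_arr):
    pts = coords[labels_arr == job]
    ax.scatter(pts[:, 0], pts[:, 1], alpha=0.6, label=job)

# Project the resume's topic vector into the same 2-D space.
resume_xy = pca.transform(svd.transform(vectorizer.transform([clean_text(resume_text)])))
ax.scatter(resume_xy[:, 0], resume_xy[:, 1], marker="*", s=300, c="black", label="Your resume")
ax.legend(fontsize="small")
plt.show()
```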

Step 6: Build Keyword Matching Feature

Users are interested in how they can improve their resumes for a better chance at whichever job they are targeting, so I decided to build a keyword-matching feature to make the app more useful.

In this feature, the user selects the job they are interested in from a dropdown, and the app returns which keywords are matching and which are missing from their resume. People can see both where their resumes currently stand and what words and experiences they could add for a more targeted application.

This feature was probably the easiest one to build. I used the topic model above to come up with the most significant words in each job type — say, the top 20 — and used list comprehensions to see which words matched and which were missing.
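As a simple stand-in for the topic-model-derived keywords described above, the sketch below ranks terms by their average TF-IDF weight within each job type — a swapped-in shortcut for illustration, not necessarily how the app actually computes them. Because both the postings and the resume pass through the same clean_text stemming, the token comparison stays consistent:

```python
import numpy as np

feature_names = np.array(vectorizer.get_feature_names_out())

def top_keywords_for_job(job: str, n: int = 20) -> list:
    # Average TF-IDF weight across this job type's postings, then take the top n terms.
    rows = np.asarray(labels) == job
    mean_weights = np.asarray(doc_term[rows].mean(axis=0)).ravel()
    return list(feature_names[mean_weights.argsort()[::-1][:n]])

def keyword_match(resume_text: str, job: str):
    # List comprehensions split the job's keywords into matched vs. missing.
    resume_tokens = set(clean_text(resume_text).split())
    keywords = top_keywords_for_job(job)
    matched = [w for w in keywords if w in resume_tokens]
    missing = [w for w in keywords if w not in resume_tokens]
    return matched, missing
```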

Step 7: Write and Deploy App

Now that I had created the functions and models, I needed to deploy the app online so that people could use it.

For writing the app, I used Streamlit, one of the easiest and most effective packages for the job. I then used Heroku and git to upload it to the web, although in retrospect it would have been far easier to use Streamlit's newly released app deployment features.
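For a sense of how little Streamlit code this takes, here's a minimal sketch wiring Features 1 and 3 to the functions from the earlier sketches (the layout and labels are illustrative, not the app's actual UI):

```python
import pandas as pd
import streamlit as st

st.title("Job Recommender for Non-Technical Roles")

resume = st.text_area("Paste your resume or LinkedIn summary:")

if resume:
    # Feature 1: percent match by job type, shown as a bar chart.
    st.bar_chart(pd.Series(percent_match(resume)))

    # Feature 3: keyword match against a chosen job archetype.
    job = st.selectbox("Pick a job archetype:", sorted(clf.classes_))
    matched, missing = keyword_match(resume, job)
    st.write("Matching keywords:", ", ".join(matched))
    st.write("Missing keywords:", ", ".join(missing))
```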

And there we have it! A job recommender. I was quite proud of this project and I hope people find it useful. If you think the app can help someone, send it to them! If you have criticisms, feel free to comment below or message me. Thanks for reading.