Extend multi-label classification to map unknown words to embedding space

I. Generate word embeddings for unknown job titles to predict the relevant skills and similar job titles

In the last two parts, I provided (I hope) a convincing justification of why multi-label classification helps us generate word embeddings. But we are not just trying to create new word embeddings; we have to put the concept to use in real-world applications.

In this part, I will provide some ideas on how we can extend it to other domains, with examples from the HR domain. I won't be providing any dataset. The dataset I have is a collection of job titles and skills, where each job title is mapped to a set of skills. A subset looks as follows:

Java Developer    ['java', 'hibernate', 'struts']
Mongo Developer   ['mongodb', 'node.js', 'couchdb', 'bash']

It is a clear multi-label classification problem. Here, we can use the concepts defined in the last articles to produce meaningful word embeddings. But the main challenge is how to generate embeddings for a new job title in real time. For example, if “Tensorflow Developer” is a new job title, how will we map that unknown title into our embedding space?
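To make the multi-label setup concrete, here is a minimal sketch (plain Python; all names and the tiny dataset are illustrative, not the article's actual data) of turning the title-to-skills mapping into binary target vectors over one shared skill vocabulary:

```python
# Toy version of the dataset: each job title maps to a set of skills.
data = {
    "java developer": ["java", "hibernate", "struts"],
    "mongo developer": ["mongodb", "node.js", "couchdb", "bash"],
}

# Build a fixed skill vocabulary so every title shares one label space.
skills = sorted({s for skill_list in data.values() for s in skill_list})
skill_index = {s: i for i, s in enumerate(skills)}

def to_multilabel(skill_list):
    """Binary target vector: 1 where the skill applies, 0 elsewhere."""
    vec = [0] * len(skills)
    for s in skill_list:
        vec[skill_index[s]] = 1
    return vec

targets = {title: to_multilabel(sk) for title, sk in data.items()}
```

Each title now has one target vector, and the model is trained to predict all of its skills at once, which is exactly the multi-label setting.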

The architecture looks as follows.

Architecture to map Unknown words to Embedding Space

The ARCHITECTURE block can be a CNN, an RNN, or word averaging. The crux is that the output of the ARCHITECTURE block can be passed directly to the final layer to calculate the logits, on which we apply a sigmoid loss followed by an optimization algorithm of our choice. In the experimental setup, I have used the Adam optimizer with the three architectures mentioned above.

The only thing to note here is that the input to the CNN, RNN, or word averaging should come from a pre-trained word embedding model such as GloVe, Word2Vec, LexVec, or fastText, which is not hard to set up.
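The lookup itself can be sketched as follows (the tiny vectors and the `<unk>` fallback token below are made up for illustration; in practice you would load real pre-trained vectors from disk):

```python
# Stand-in for a pre-trained embedding table (GloVe / Word2Vec style).
pretrained = {
    "tensorflow": [0.9, 0.1, 0.0],
    "developer":  [0.2, 0.8, 0.3],
    "<unk>":      [0.0, 0.0, 0.0],   # fallback for out-of-vocabulary words
}

def embed_title(title):
    """Map each word of the title to its pre-trained vector."""
    return [pretrained.get(w, pretrained["<unk>"])
            for w in title.lower().split()]

vectors = embed_title("Tensorflow Developer")   # 2 words -> 2 vectors
```

These per-word vectors are what the CNN, RNN, or averaging block consumes, which is why an unseen title like "Tensorflow Developer" can still be embedded as long as its individual words are known.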

Here we will compare the results of the different architectures: the top similar job titles and the top 10 predicted skills for each new title.
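The "similar job titles" lists below can be produced by ranking every known title's embedding by cosine similarity against the new title's embedding. A small sketch with made-up 2-dimensional embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings for known titles (real ones come from the trained model).
known_titles = {
    "citrix engineer":  [0.9, 0.3],
    "juniper engineer": [0.8, 0.4],
    "nlp scientist":    [0.1, 0.9],
}
query = [0.85, 0.35]   # embedding of the new, unseen title

# Rank all known titles by similarity to the query, highest first.
ranked = sorted(((t, cosine(query, v)) for t, v in known_titles.items()),
                key=lambda p: p[1], reverse=True)
```

Taking the first k entries of `ranked` gives the top-k lists shown in the results.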


The CNN model uses filters of widths [1, 2, 3], with 128 filters in total.
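As a toy illustration of what one such filter does (the real model uses widths [1, 2, 3] with 128 filters; this shows a single width-2 filter with made-up values), the filter slides across the word vectors and the result is max-pooled:

```python
def conv_max_pool(vectors, filt):
    """1D convolution over word vectors followed by max-pooling."""
    width = len(filt)
    feats = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        # Dot product of the filter with this window of word vectors.
        feats.append(sum(f_i * v_i
                         for f_row, v_row in zip(filt, window)
                         for f_i, v_i in zip(f_row, v_row)))
    return max(feats)   # max-pool keeps the strongest activation

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 words, dim 2
filt = [[1.0, 0.0], [0.0, 1.0]]                  # one width-2 filter
feature = conv_max_pool(vectors, filt)
```

Concatenating the pooled outputs of all 128 filters yields the fixed-size vector that feeds the final layer.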

The RNN model is a simple LSTM with dropout, with a hidden state dimension of 128.

Word averaging is just the average of the word embeddings in the job title. The only thing to take care of is padding: while taking the average, the padding tokens must be excluded. In TensorFlow, we can make use of dynamic shape inference to do that.
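The masked average can be sketched in plain Python (mirroring what a mask plus `tf.reduce_sum` would do; the vectors and mask below are toy values, with padded positions carrying a 0 mask entry):

```python
def masked_average(vectors, mask):
    """Average only the real (non-padding) word vectors."""
    dim = len(vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(vectors, mask):
        if m:   # skip padding positions entirely
            count += 1
            for i in range(dim):
                total[i] += vec[i]
    return [t / count for t in total]

# A 2-word title padded to length 4: two real words, two pads.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]
avg = masked_average(vectors, mask)   # -> [0.5, 0.5]
```

Dividing by the padded length (4) instead of the real word count (2) would wrongly shrink the embedding, which is exactly the mistake the mask prevents.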

Evaluation Results

I am providing results for just two job titles, both of which were totally new to the model. The CNN and RNN models work well in producing the embeddings; the RNN is not that great at predicting the skills, while the CNN is okay. But it is word averaging that performs best at predicting the skills, as is evident from the skills of “nlp engineer”. Since averaging gives equal importance to all words in the job title, its similar job titles are not that great, which is expected.

CNN

redhat engineer
New title
('citrix engineer', 0.9091693)
('juniper engineer', 0.902331)
('senior rhel engineer', 0.8956084)
('sccm engineer', 0.89465857)
('f5 engineer', 0.8934774)
('linux debian engineer', 0.88996077)
('openshift engineer', 0.8855233)
('senior citrix engineer', 0.87746584)
('senior f5 engineer', 0.8675698)
('vertica engineer', 0.8486083)
('as400 iseries engineer', 0.84797513)
('firewall engineer', 0.8363435)
('pki engineer', 0.8345181)
('siem engineer', 0.83242226)
('system_administrator engineer', 0.8313742)
('cloud_architect engineer', 0.82992494)
('aruba engineer', 0.82932425)
('senior vdi engineer', 0.82441384)
('senior ucce engineer', 0.82213825)
('storage emc engineer', 0.8214098)
linux 0.31129378
citrix 0.12751894
unix 0.12187717
cisco 0.10758454
vmware 0.102100745
python 0.08825246
aws 0.07664791
database 0.06212573
firewall 0.060437996
sql 0.059200715
RNN

redhat engineer
New title
('citrix engineer', 0.99995524)
('f5 engineer', 0.99995244)
('juniper engineer', 0.99994314)
('firewall engineer', 0.99993414)
('splunk engineer', 0.99992675)
('system_administrator engineer', 0.9998954)
('pki engineer', 0.99988663)
('fiber engineer', 0.99988174)
('iam engineer', 0.999876)
('ia engineer', 0.9998493)
('sccm engineer', 0.99984854)
('marine engineer', 0.99984396)
('scada engineer', 0.9998402)
('java_developer multithreading', 0.9998237)
('telecom engineer', 0.9998221)
('software_developer engineer', 0.9997939)
('hadoop engineer', 0.9997765)
('epsilon .net_developer c asp linq', 0.9997703)
('.net_developer asp c linq', 0.9997699)
('si engineer', 0.9997697)
amazon 0.22474013
java 0.15613505
javascript 0.1413673
aws 0.14029405
key 0.12660679
operations 0.10168926
python 0.09711019
sql 0.08906316
amazon_web_services 0.081526875
c 0.07997654
Word Averaging
redhat engineer
New title
('sccm engineer', 0.93726146)
('juniper engineer', 0.9334493)
('openshift engineer', 0.93171334)
('system_administrator engineer', 0.92273456)
('f5 engineer', 0.9123632)
('sap_basis engineer', 0.9107574)
('linux_system_administrator engineer', 0.9095712)
('pki engineer', 0.90922016)
('vertica engineer', 0.9082236)
('arcsight engineer', 0.90517604)
('aruba engineer', 0.9040191)
('siem engineer', 0.903926)
('devsecops engineer', 0.90251654)
('rightfax engineer', 0.9014003)
('cloud_architect engineer', 0.9007888)
('chaos engineer sre wbs', 0.9007375)
('cloud_developer engineer', 0.89978963)
('network_architect engineer', 0.8996875)
('citrix engineer', 0.8993301)
('mechatronics engineer', 0.8988679)
linux 0.52305925
perforce 0.34721273
vmware 0.28413337
wlan 0.24213676
rhel 0.21910773
python 0.184438
bash 0.17091948
web_proxy 0.16821696
scripting 0.16309687
unix 0.15708101
CNN

nlp engineer
New title
('algorithm engineer', 0.80966806)
('performance profiling engineer', 0.7681951)
('nlp software_programmer', 0.7621985)
('ai engineer', 0.7605918)
('natural language processing nlp engineer', 0.75027907)
('optical mathematics engineer', 0.7500508)
('nlp scientist', 0.7459492)
('senior algorithm engineer', 0.7391043)
('lab scientist engineer', 0.7381841)
('senior biomedical engineer', 0.72555476)
('purification engineer senior_engineer', 0.7173574)
('nlp developer', 0.71618736)
('golang engineer', 0.7105443)
('senior nlp scientist', 0.71006656)
('environmental scientist engineer', 0.7066361)
('data_scientist physicist engineer statistician', 0.705125)
('anti phishing engineer', 0.7002297)
('physiological modeling engineer', 0.6889098)
('dsp engineer', 0.68684816)
('technology dsp optimization engineer', 0.6866118)
python 0.42276484
amazon 0.29434747
machine_learning 0.27243003
aws 0.17649546
java 0.16860259
sql 0.13828619
operations 0.111370005
hadoop 0.10678105
amazon_web_services 0.105712906
key 0.10216582
RNN

nlp engineer
New title
('dsp engineer', 0.99983895)
('blockchain engineer', 0.99969935)
('sensor engineer', 0.9996836)
('senior_software_developer engineer', 0.9996829)
('laser engineer', 0.9996761)
('fullstack engineer', 0.99964976)
('senior_java_developer hibernate webservices', 0.999647)
('openshift engineer', 0.99964255)
('senior_java_developer spring webservices', 0.99964166)
('thermal engineer', 0.9996393)
('senior_java_developer servlets webservices', 0.9996364)
('si engineer', 0.99963224)
('unity engineer', 0.99962735)
('.net_architect angular 2.0 4.0', 0.99962723)
('senior_php_developer codeigniter cakephp', 0.99961436)
('it_support_specialist mac windows sharepoint', 0.99961185)
('senior_ui_developer angular ionic', 0.99961084)
('android_developer java webservices', 0.9996071)
('c++ developer qt', 0.9995856)
('sccm engineer', 0.99958503)
amazon 0.21178655
java 0.16665275
javascript 0.15650135
aws 0.1361445
key 0.11995288
python 0.10087062
operations 0.098762214
sql 0.09581756
c 0.08394842
csharp 0.08391896
Word Averaging
nlp engineer
New title
('algorithm engineer', 0.89799416)
('dsp engineer', 0.891079)
('data_scientist physicist engineer statistician', 0.88514084)
('asic engineer', 0.87885153)
('ai engineer', 0.86995685)
('golang engineer blockchain cryptocurrency', 0.8675078)
('blockchain engineer', 0.8673588)
('ai researcher engineer', 0.8672452)
('mechatronics engineer', 0.8669444)
('golang engineer', 0.8665817)
('devsecops engineer', 0.8646184)
('arcsight engineer', 0.86386263)
('geotechnical engineer', 0.86344403)
('bim engineer', 0.86315227)
('vertica engineer', 0.8629966)
('siem engineer', 0.8618113)
('cloud_developer engineer', 0.8607273)
('antenna engineer', 0.859938)
('rightfax engineer', 0.85877115)
('network_architect engineer', 0.8576076)
machine_learning 0.75619835
python 0.63704836
nlp 0.5823915
matlab 0.3790635
linguistics 0.33970052
java 0.32944402
hadoop 0.3122111
tensorflow 0.27999988
amazon 0.25931346
deep_learning 0.25578338


There is a lot more we can do with this. We can change the architecture to a residual LSTM, a deep CNN, attention-based averaging, and so on. These can help us learn more meaningful embeddings.

Source: Deep Learning on Medium