Building a knowledge graph in Python from scratch

Source: Deep Learning on Medium

Knowledge graphs are a widely used application of machine learning: tech giants like Google and Microsoft use them in their search engines to return results quickly and efficiently. For example, if we search for “Barack Obama” on Google, a panel on the right side of the results page shows key facts about him without our clicking any of the links.

This is possible because Google has crawled information about famous people like him and organized their details into a graph. We are going to use the spaCy NLP library to build a simple knowledge graph from scratch. spaCy is one of my favourite NLP libraries for tasks such as entity extraction, classification, dependency parsing, and more.

We are going to use dependency parsing to extract information from unstructured text.

The datasets used in this tutorial were obtained from Wikipedia, but you are free to use any dataset of your own choice as long as it is text data.

Note that information extraction and representation is a vast topic, and there are many ways of extracting information besides the one used in this tutorial.

Now let’s begin with the tutorial.

Idea:

  • Extract the subject and object of each sentence as entities,
  • Extract the root word of the sentence and use it as the relation,
  • Plot a network diagram with the entities as nodes and the root word as the edge (relation) between them.

Example:

I love you. [ entities => (I, you), root => (love) ]

I am going to New York. [ entities => (I, New York), root => (going to) ]

The weather is so chilly. [ entities => ( The weather, so chilly ), root => ( is ) ]

Note: You may be wondering how we can extract entities and the root from a complex sentence like this one, which carries a lot of information.

Answer: We can always use sentence segmentation to break complex text into simpler sentences, and then apply the above technique to each of them to extract its entities and root.

In the above examples, we have also extracted the preposition that follows the root word (as in “going to”). Other combinations of parts of speech are possible too. For example, in the following code snippet we also extract the agent dependency and the adjective part of speech (POS).

Note: I have used pandas for loading and manipulating data in this tutorial. If you are fuzzy on it or want a quick refresher, check out my pandas for busy data scientists tutorial.

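import spacy
from spacy.matcher import Matcher

# Small English pipeline; install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

def extract_root(sentence):
    """Return the root word plus an optional trailing preposition, agent, or adjective."""
    doc = nlp(sentence)
    matcher = Matcher(nlp.vocab)
    root_pattern = [{'DEP': 'ROOT'},
                    {'DEP': 'prep', 'OP': '?'},
                    {'DEP': 'agent', 'OP': '?'},
                    {'POS': 'ADJ', 'OP': '?'}]
    # spaCy v3 signature; on spaCy v2 this was matcher.add("root_matcher", None, root_pattern)
    matcher.add("root_matcher", [root_pattern])
    matches = matcher(doc)
    if not matches:
        return ""  # nothing matched, e.g. for an empty string
    # Use the last match returned, which typically covers the most optional tokens
    _, start, end = matches[-1]
    return doc[start:end].text

# sentences_dataframe is assumed to be a pandas DataFrame with a "sentence" column
root_words = [extract_root(item) for item in sentences_dataframe["sentence"]]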

Let’s try this on a toy dataset and see how it works.

[Figure: toy dataset of sentences]
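
A minimal sketch of what such a toy dataset could look like, assuming a pandas DataFrame named sentences_dataframe with a single "sentence" column (the name the loop above expects). The rows simply reuse the three example sentences from earlier in this tutorial, not the Wikipedia data:

```python
import pandas as pd

# Three example sentences from the "Example" section, one per row.
sentences_dataframe = pd.DataFrame(
    {
        "sentence": [
            "I love you.",
            "I am going to New York.",
            "The weather is so chilly.",
        ]
    }
)

print(sentences_dataframe)
```

Any other text source works the same way, as long as each row holds one sentence.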

Extracting the root words should give the following result.

[Figure: dataset with the root (i.e. the relation) extracted]

Now it’s time to extract entities. Using spaCy’s dependency parsing feature, we can easily extract the subject and object of each of the above sentences.

Extracting entities from the above sentences gives the following result:

[Figure: relation and entities extracted using dependency parsing]
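
As a rough sketch of how the entities could be pulled out, the function below walks over the tokens of a sentence and collects the subject and object phrases from their dependency labels. It only relies on each token exposing .text and .dep_, so a spaCy Doc can be passed in directly. The Token namedtuple, the three label sets, and the hand-written labels in the example are assumptions for illustration (spaCy's parser may assign different labels), and this is not necessarily the code that produced the result above:

```python
from collections import namedtuple

# Stand-in for a spaCy token: only the two attributes used below (hypothetical helper).
Token = namedtuple("Token", ["text", "dep_"])

# Assumed label sets; extend them to suit your own data.
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj"}
OBJECT_DEPS = {"dobj", "pobj", "attr", "acomp"}
MODIFIER_DEPS = {"compound", "amod", "det", "advmod"}


def extract_entities(tokens):
    """Return (subject, object) phrases from tokens carrying .text and .dep_."""
    subject, obj = "", ""
    pending = []  # modifiers waiting for the head word they describe
    for token in tokens:
        if token.dep_ in MODIFIER_DEPS:
            pending.append(token.text)
        elif token.dep_ in SUBJECT_DEPS:
            subject = " ".join(pending + [token.text])
            pending = []
        elif token.dep_ in OBJECT_DEPS:
            obj = " ".join(pending + [token.text])
            pending = []
        else:
            pending = []  # any other token breaks the modifier run
    return subject, obj


# "The weather is so chilly." with hand-written dependency labels.
weather = [
    Token("The", "det"), Token("weather", "nsubj"), Token("is", "ROOT"),
    Token("so", "advmod"), Token("chilly", "acomp"), Token(".", "punct"),
]
print(extract_entities(weather))  # ('The weather', 'so chilly')
```

With the model loaded, the same function can be called as extract_entities(nlp(sentence)). If a sentence has several subjects or objects, this sketch keeps only the last one of each, which is good enough for short sentences but loses information on longer ones.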

Now it’s time to plot these. I am going to use Python’s networkx library to plot the diagram as a network with entities as nodes and relations as edges.

Before that, I want to add two columns (source, target) to our dataframe so that it will be easier to plot.

[Figure: add source and target columns]
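
One way this step might look, assuming the extracted entities are stored as (subject, object) pairs in a column called "entities" and the root words in a column called "relation". Both column names and the dataframe name df are hypothetical; the values are the examples from earlier in this tutorial:

```python
import pandas as pd

# Hypothetical layout: one row per sentence with its relation and entity pair.
df = pd.DataFrame(
    {
        "sentence": [
            "I love you.",
            "I am going to New York.",
            "The weather is so chilly.",
        ],
        "relation": ["love", "going to", "is"],
        "entities": [
            ("I", "you"),
            ("I", "New York"),
            ("The weather", "so chilly"),
        ],
    }
)

# Split each (subject, object) pair into its own column.
df["source"] = df["entities"].apply(lambda pair: pair[0])
df["target"] = df["entities"].apply(lambda pair: pair[1])

print(df[["source", "relation", "target"]])
```

Each row now reads as a (source, relation, target) triple, which is the shape a graph library needs for its edge list.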

Now, let’s plot the knowledge graph using the networkx library.

[Figure: drawing knowledge graph in the network diagram]
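
A small sketch of the plotting step, assuming the triples are available as (source, relation, target) tuples. The helper names build_graph and draw_graph are made up for this example, and the layout and styling choices are arbitrary rather than the ones behind the graph shown in this tutorial:

```python
import networkx as nx

# (source, relation, target) triples taken from the example sentences.
triples = [
    ("I", "love", "you"),
    ("I", "going to", "New York"),
    ("The weather", "is", "so chilly"),
]


def build_graph(triples):
    """Build a directed graph with entities as nodes and relations as edge labels."""
    graph = nx.DiGraph()
    for source, relation, target in triples:
        graph.add_edge(source, target, relation=relation)
    return graph


def draw_graph(graph):
    """Draw the graph with matplotlib, labelling each edge with its relation."""
    import matplotlib.pyplot as plt  # imported here so building the graph works without it

    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    nx.draw(graph, pos, with_labels=True, node_color="skyblue", node_size=1500)
    edge_labels = {(u, v): data["relation"] for u, v, data in graph.edges(data=True)}
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)
    plt.show()


knowledge_graph = build_graph(triples)
print(knowledge_graph.number_of_nodes(), knowledge_graph.number_of_edges())
```

Calling draw_graph(knowledge_graph) opens the plot. If the triples already live in a dataframe with source and target columns, nx.from_pandas_edgelist can build the graph from it directly. Note that a DiGraph keeps only one edge per ordered pair of nodes, so a second relation between the same two entities would overwrite the first.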

This should generate the following graph:

[Figure: knowledge graph]
