Interview Prep: 6 Questions for Natural Language Processing


I am starting a mini-series detailing the interview questions I have encountered for Data Science/Machine Learning roles. The installments will be divided by discipline; this one focuses on Natural Language Processing.

These articles exclude general programming or Leetcode-esque questions as there are far better resources to prepare for those.

Question 1: How do computer systems ingest textual data?

Language is represented as text (or strings, as computers understand it). Meanwhile, Machine Learning models operate in the space of real numbers. Based on how we want to ingest our text, we can keep each observation as a document or break it into smaller tokens. The granularity of the tokens is at our discretion — tokens can be created at the word, phrase or character level.

Afterwards, we can leverage embedding techniques (for instance, tf-idf for embedding documents, or GloVe/BERT for embedding tokens) to convert unstructured text into vectors (or vectors of vectors) of real numbers.

One additional caveat to modelling language data is that the input size needs to be the same across all previous and future observations. If we break our text into tokens, longer texts will contain more tokens than shorter ones. The solution is to either truncate or pad the input to the designated input size.
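The tokenize-then-truncate-or-pad idea can be sketched in a few lines. This is a minimal illustration; the function names, the `<PAD>` token and the fixed size of 4 are all my own choices, not anything from a specific library.

```python
# Minimal sketch: word-level tokenization plus truncation/padding
# so every observation has the same fixed input size.

def tokenize(text):
    """Break a document into word-level tokens."""
    return text.lower().split()

def pad_or_truncate(tokens, size, pad_token="<PAD>"):
    """Force every token sequence to exactly `size` tokens."""
    if len(tokens) >= size:
        return tokens[:size]                            # truncate longer inputs
    return tokens + [pad_token] * (size - len(tokens))  # pad shorter inputs

docs = ["the cat sat on the mat", "hello world"]
fixed = [pad_or_truncate(tokenize(d), 4) for d in docs]
# fixed[0] -> ["the", "cat", "sat", "on"]
# fixed[1] -> ["hello", "world", "<PAD>", "<PAD>"]
```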

Question 2: What are some ways we can preprocess text input?

Here are several preprocessing steps that are commonly used for NLP tasks:

  • case normalization: we can convert all input to the same case (lowercase or uppercase) as a way of reducing our text to a more canonical form
  • punctuation/stop word/white space/special characters removal: if we don’t think these words or characters are relevant, we can remove them to reduce the feature space
  • lemmatizing/stemming: we can also reduce inflected words to their base forms (e.g. walks → walk) to further trim our vocabulary
  • generalizing irrelevant information: we can replace all numbers with a <NUMBER> token or all names with a <NAME> token
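Several of the steps above can be chained into one small function. This is an illustrative sketch using the standard library only: the tiny stop-word set is my own, and real pipelines would use a library such as NLTK or spaCy for stop words and lemmatization.

```python
import re

def preprocess(text):
    """Apply a few of the common preprocessing steps described above."""
    text = text.lower()                        # case normalization
    text = re.sub(r"\d+", "<NUMBER>", text)    # generalize numbers to one token
    text = re.sub(r"[^\w<>\s]", " ", text)     # drop punctuation/special chars
    stop_words = {"the", "a", "an", "is"}      # tiny illustrative stop list
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)

preprocess("The bill is $120, due May 3!")
# -> "bill <NUMBER> due may <NUMBER>"
```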

Question 3: How does the encoder-decoder structure work for language modelling?

The encoder-decoder structure is a deep learning architecture behind several state-of-the-art solutions, including Machine Translation.

The input sequence is passed to the encoder, where a neural network transforms it into a fixed-dimensional vector representation. Another neural network then decodes this representation. The decoder's outputs undergo a final transformation and a softmax layer, producing a vector of probabilities over the vocabulary. Meaningful output is extracted based on these probabilities.
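A toy NumPy sketch makes the data flow concrete: a simple RNN encoder compresses the sequence into one fixed-size vector, and a decoder step maps that vector through a transformation and a softmax into vocabulary probabilities. All parameters here are random placeholders (a real model learns them), and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, hid_dim = 10, 8, 16

# Illustrative random parameters; a trained model would learn these.
E = rng.normal(size=(vocab_size, emb_dim))          # embedding matrix
W_xh = rng.normal(size=(emb_dim, hid_dim)) * 0.1    # input-to-hidden weights
W_hh = rng.normal(size=(hid_dim, hid_dim)) * 0.1    # hidden-to-hidden weights
W_out = rng.normal(size=(hid_dim, vocab_size)) * 0.1  # output projection

def encode(token_ids):
    """Run a simple RNN over the input; the final hidden state is the
    fixed-dimensional representation of the whole sequence."""
    h = np.zeros(hid_dim)
    for t in token_ids:
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h

def decode_step(h):
    """One decoder step: transform the hidden state and apply a softmax
    to get a probability distribution over the vocabulary."""
    logits = h @ W_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

context = encode([1, 4, 7])    # fixed-size vector regardless of input length
probs = decode_step(context)   # one probability per vocabulary entry
```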

Question 4: What are attention mechanisms and why do we use them?

This was a follow-up to the encoder-decoder question. In the basic encoder-decoder architecture, only the output from the last time step is passed to the decoder, resulting in a loss of information learned at previous time steps. This information loss is compounded for longer text sequences with more time steps.

Attention mechanisms are a function of the hidden states at each time step. When we use attention in encoder-decoder networks, the fixed-dimensional vector passed to the decoder becomes a function of all the hidden states produced at the intermediary time steps.

Two commonly used attention mechanisms are additive attention and multiplicative attention. They differ in how each hidden state is scored: additive attention scores it with a small feed-forward layer, while multiplicative attention scores it with a (scaled) dot product; in both cases the context vector is a weighted sum of the hidden states. During training, the model also learns the attention parameters, so it can recognize the relative importance of each time step.
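Additive (Bahdanau-style) attention can be sketched in NumPy: score each encoder hidden state, softmax the scores into weights, and take the weighted sum as the context vector. The matrices here are random stand-ins for learned parameters, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
hid_dim = 4
H = rng.normal(size=(5, hid_dim))   # encoder hidden states, one per time step

# Learned attention parameters (random placeholders here).
W_a = rng.normal(size=(hid_dim, hid_dim))
v = rng.normal(size=hid_dim)

scores = np.tanh(H @ W_a) @ v       # one additive-attention score per step
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # softmax: attention weights sum to 1
context = weights @ H               # context vector passed to the decoder
```

The decoder then consumes `context` instead of only the last hidden state, so information from every time step survives.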

Question 5: How would you implement an NLP system as a service, and what are some pitfalls you might face in production?

This is less of an NLP question than a question about productionizing machine learning models. There are, however, certain intricacies specific to NLP models.

Without diving too much into the productionization aspect, an ideal Machine Learning service will have:

  • endpoint(s) that other business systems can call to request inferences
  • a feedback mechanism for validating model predictions
  • a database to store predictions and ground truths from the feedback
  • a workflow orchestrator which will (upon some signal) re-train and load the new model for serving based on the records from the database + any prior training data
  • some form of model version control to facilitate rollbacks in case of bad deployments
  • post-production accuracy and error monitoring

The feedback mechanism for a sentiment analysis model might be a modal surfaced to the end user which asks for their feedback towards the model’s predictions. This feedback might be processed by some validation mechanism, but will eventually make its way into the database, effectively becoming training data for the next model. One pitfall of feedback loops is bias on how, and for which observations, ground truths are generated.
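The prediction-plus-feedback store described above can be sketched with an in-memory SQLite database. The table layout and function names are my own illustration; a production system would use a managed database and proper validation.

```python
import sqlite3

# Sketch of the prediction + feedback store: predictions are logged at
# inference time, ground truths arrive later via the feedback mechanism.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    input_text TEXT,
    predicted TEXT,
    ground_truth TEXT)""")

def log_prediction(text, label):
    """Store a model prediction; returns its row id."""
    cur = db.execute(
        "INSERT INTO predictions (input_text, predicted) VALUES (?, ?)",
        (text, label))
    return cur.lastrowid

def record_feedback(pred_id, true_label):
    """Called when the end user confirms or corrects a prediction."""
    db.execute("UPDATE predictions SET ground_truth = ? WHERE id = ?",
               (true_label, pred_id))

def training_records():
    """Rows with validated ground truths become the next training set."""
    return db.execute(
        "SELECT input_text, ground_truth FROM predictions "
        "WHERE ground_truth IS NOT NULL").fetchall()

pid = log_prediction("great product!", "positive")
record_feedback(pid, "positive")
```

Note the bias pitfall mentioned above: only observations users bother to give feedback on end up in `training_records()`.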

NLP services are unique in that they need to embed the raw input. This means there will be an additional auxiliary model file used for inference. Here are some pitfalls that are unique to NLP services:

  • exposing preprocessing and embedding steps to the client application rather than accepting raw text as the input
  • error handling around unrecognized vocabulary
  • unexpected user input, such as poor grammar or spelling
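A common defense against the unrecognized-vocabulary pitfall is an `<UNK>` fallback in the embedding lookup, so out-of-vocabulary tokens degrade gracefully instead of raising an error. The tiny vocabulary and random embeddings below are purely illustrative.

```python
import numpy as np

# Illustrative embedding lookup with an <UNK> fallback so unrecognized
# vocabulary doesn't crash the inference service.
vocab = {"<UNK>": 0, "good": 1, "bad": 2}
embeddings = np.random.default_rng(2).normal(size=(len(vocab), 4))

def embed(tokens):
    """Look up an embedding row per token, falling back to <UNK>."""
    ids = [vocab.get(t, vocab["<UNK>"]) for t in tokens]
    return embeddings[ids]

vecs = embed(["good", "terrrible"])   # unknown word maps to the <UNK> row
```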

Question 6: How can we handle misspellings for text input?

By using word embeddings trained over a large corpus (for instance, an extensive web scrape of billions of words), the model vocabulary would include common misspellings by design. The model can then learn the relationship between misspelled and correctly spelled words to recognize their semantic similarity.

We can also preprocess the input to prevent misspellings. Terms not found in the model vocabulary can be mapped to the “closest” vocabulary term using:

  • edit distance between strings
  • phonetic distance between word pronunciations
  • keyboard distance to catch typos from adjacent keys
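The edit-distance approach can be sketched with the standard library's `difflib`, whose similarity ratio serves as a stand-in for a true edit distance. The sample vocabulary and cutoff are illustrative choices.

```python
import difflib

# Map an out-of-vocabulary term to the closest vocabulary term.
vocab = ["language", "processing", "attention", "encoder"]

def closest_term(word, cutoff=0.6):
    """Return the most similar vocabulary term, or <UNK> if nothing
    is close enough."""
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else "<UNK>"

closest_term("procesing")   # misspelling maps to "processing"
```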

Some other open ended NLP questions …

  • Describe the tech stack you used for a previous NLP project — specifically, the frameworks and libraries
  • Over the past few years, NLP has evolved a lot. How do you keep up to date with new developments?