spaCy Training using GPU

Source: Deep Learning on Medium

In this article we will use a GPU to train a spaCy model in a Windows environment. spaCy is an open-source library for advanced natural language processing, written in Python and Cython. It is designed to help you do real work: build real products or gather real insights. spaCy excels at large-scale information extraction tasks, is well suited to preparing text for deep learning, and interoperates with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's AI ecosystem.

Let’s first install the GPU dependencies for spaCy.

  1. Download and install Visual Studio from Microsoft's Visual Studio website.
  2. Download and install the NVIDIA CUDA Toolkit from NVIDIA's CUDA download page.
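Before going further it helps to confirm which CUDA version actually got installed, because the CuPy wheel you pick later has to match it. The sketch below assumes the CUDA installer put the `nvcc` compiler on your `PATH`; the function names `parse_nvcc_version` and `installed_cuda_version` are hypothetical helpers, not part of spaCy or CUDA.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_version(text: str) -> Optional[str]:
    """Extract "X.Y" from `nvcc --version` output, e.g. "... release 10.2, V10.2.89"."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_version() -> Optional[str]:
    """Return the CUDA toolkit version found via nvcc on PATH, or None if nvcc is absent."""
    nvcc = shutil.which("nvcc")  # assumption: the installer added CUDA's bin folder to PATH
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_version(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_version())  # e.g. "10.2", or None if nvcc was not found
```

If this prints `None`, either the toolkit is not installed or its `bin` folder is not on `PATH`.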

Once the GPU dependencies are installed, we will install spaCy and its Python dependencies.

To install Python libraries on Windows, I prefer PyPI or the Unofficial Windows Binaries for Python Extension Packages. spaCy and all of its dependencies can be downloaded from either of these sites.

spaCy needs two additional libraries to train models on a GPU: CuPy and Thinc GPU Ops. Download the build that matches both your Python version and your CUDA version. For example, cupy-6.5.0+cuda102-cp36-cp36m-win_amd64.whl is CuPy 6.5.0 built for CUDA 10.2 and Python 3.6.
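Reading the wheel filename correctly is the step that most often goes wrong. The sketch below splits a filename following the standard wheel naming scheme (`name-version[-build]-pythontag-abitag-platformtag.whl`) and assumes, as in the example above, that the CUDA version is carried in a `+cudaXYZ` local version label; other distributions may encode it differently, and this parser does not handle those. `parse_wheel_name` and `matches_running_python` are hypothetical helpers written for illustration.

```python
import re
import sys
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    package: str
    version: str           # public version, e.g. "6.5.0"
    cuda: Optional[str]    # e.g. "10.2", or None if there is no +cudaXYZ label
    python: str            # e.g. "3.6"
    abi: str               # e.g. "cp36m"
    platform: str          # e.g. "win_amd64"


def parse_wheel_name(filename: str) -> WheelInfo:
    """Split name-version[-build]-pythontag-abitag-platformtag.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:        # drop the optional build tag
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    name, full_version, py_tag, abi_tag, plat_tag = parts

    # Assumption: the local version label after "+" names the CUDA build, "cuda102" -> "10.2".
    version, _, local = full_version.partition("+")
    cuda_match = re.fullmatch(r"cuda(\d+)(\d)", local)
    cuda = f"{cuda_match.group(1)}.{cuda_match.group(2)}" if cuda_match else None

    # CPython tag: "cp36" -> "3.6", "cp310" -> "3.10".
    py_match = re.fullmatch(r"cp(\d)(\d+)", py_tag)
    if py_match is None:
        raise ValueError(f"unsupported Python tag: {py_tag}")
    python = f"{py_match.group(1)}.{py_match.group(2)}"

    return WheelInfo(name, version, cuda, python, abi_tag, plat_tag)


def matches_running_python(info: WheelInfo) -> bool:
    """True if the wheel was built for the interpreter running this code."""
    return info.python == f"{sys.version_info.major}.{sys.version_info.minor}"


info = parse_wheel_name("cupy-6.5.0+cuda102-cp36-cp36m-win_amd64.whl")
print(info.version, info.cuda, info.python)  # 6.5.0 10.2 3.6
print(matches_running_python(info))
```

Comparing `info.cuda` with the toolkit version you installed, and checking `matches_running_python`, catches the two most common mismatches before you run `pip install` on the file.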

To train on the GPU, Thinc GPU Ops must report that it is available:

>>> import thinc_gpu_ops
>>> thinc_gpu_ops.AVAILABLE
True

If thinc_gpu_ops.AVAILABLE is False, check the installation, in particular that the CuPy and Thinc GPU Ops builds match your CUDA and Python versions.
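When the check fails, it is useful to see which piece is missing. The following is a minimal diagnostic sketch: it tries to import CuPy, asks the CUDA runtime how many devices it can see, and reads `thinc_gpu_ops.AVAILABLE`, catching errors so it never crashes. The helper name `gpu_diagnostics` and the report keys are hypothetical.

```python
import importlib


def gpu_diagnostics() -> dict:
    """Report which GPU components are importable; never raises."""
    report = {"cupy": None, "gpu_count": 0, "thinc_gpu_ops": False}

    cupy = None
    try:
        cupy = importlib.import_module("cupy")
        report["cupy"] = cupy.__version__
    except Exception:
        pass  # CuPy missing, or built for a different CUDA version

    if cupy is not None:
        try:
            # Raises if the CUDA driver or a GPU device is not available.
            report["gpu_count"] = cupy.cuda.runtime.getDeviceCount()
        except Exception:
            report["gpu_count"] = 0

    try:
        ops = importlib.import_module("thinc_gpu_ops")
        report["thinc_gpu_ops"] = bool(ops.AVAILABLE)
    except Exception:
        pass  # Thinc GPU Ops not installed or failed to load

    return report


if __name__ == "__main__":
    print(gpu_diagnostics())
```

A report with a CuPy version but `gpu_count` of 0 points at the driver or CUDA installation; a missing CuPy version points at the wheel.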

To make spaCy use the GPU for training, call `spacy.prefer_gpu()` or `spacy.require_gpu()` before loading or creating a pipeline. `prefer_gpu` uses the GPU if one is available and otherwise falls back to the CPU, whereas `require_gpu` uses only the GPU and raises an error if no GPU is available.

import spacy

spacy.prefer_gpu()  # or spacy.require_gpu(); call this before loading the pipeline
nlp = spacy.load("en_core_web_sm")
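When training from scratch you usually start from a blank pipeline rather than a pretrained one. The sketch below wraps the choice between `prefer_gpu` and `require_gpu` in one place and then creates the pipeline with `spacy.blank`, so the device is always selected first. The helper name `make_pipeline` and its `gpu` modes are hypothetical; it relies on `spacy.prefer_gpu()` returning whether the GPU was activated.

```python
import spacy


def make_pipeline(lang: str = "en", gpu: str = "prefer"):
    """Select the device, then create a blank pipeline for training.

    gpu="prefer"  -> use the GPU if available, else fall back to the CPU
    gpu="require" -> use the GPU or raise an error
    gpu="none"    -> stay on the CPU
    """
    if gpu == "require":
        spacy.require_gpu()  # raises an error if no GPU can be used
        on_gpu = True
    elif gpu == "prefer":
        on_gpu = bool(spacy.prefer_gpu())  # True if the GPU was activated
    elif gpu == "none":
        on_gpu = False
    else:
        raise ValueError(f"unknown gpu mode: {gpu!r}")
    # The pipeline is created only after the device has been chosen.
    nlp = spacy.blank(lang)
    return nlp, on_gpu


if __name__ == "__main__":
    nlp, on_gpu = make_pipeline("en", gpu="prefer")
    print("training on", "GPU" if on_gpu else "CPU")
```

Logging which device was chosen makes it obvious when `prefer_gpu` has silently fallen back to the CPU.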

In my tests, training on the GPU was 2–3 times faster than on the CPU; the exact speedup depends on the model, the batch size and the hardware.
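To check the 2–3x figure on your own machine, time the same training step on each device and divide. The sketch below takes the best of several runs, because the first GPU run also pays one-off warm-up costs. `best_time` and `speedup` are hypothetical helpers, and `train_epoch_cpu` / `train_epoch_gpu` in the note after the code stand for your own training functions.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Best wall-clock time in seconds over `repeats` calls of `fn`."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run was (2.5 means 2.5x)."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    print(speedup(6.0, 2.0))  # 3.0
```

For example, `speedup(best_time(train_epoch_cpu), best_time(train_epoch_gpu))`. Make each timed function run the epoch to completion (for instance by reading the losses at the end) so that asynchronous GPU work is included in the measurement.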

Happy Coding!!!