You should try TensorFlow’s new TextVectorization layer.

Source: Deep Learning on Medium

<< Cool! Should I use this in production? 🙄 >>

Up until now, while working on text-based machine learning models, you probably had to decide on an input preprocessing strategy. This basically means going through all the steps defined above for the TextVectorization layer by yourself, fitting your model on the preprocessed data, saving it, and then somehow reproducing the same preprocessing steps at inference time before passing the inputs to the served model to get back a prediction.
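To make that burden concrete, here is a rough sketch of such a hand-rolled pipeline in plain Python. It assumes the steps are lowercasing, punctuation stripping, whitespace splitting and integer indexing; the function names (`standardize`, `build_vocab`, `encode`) and the toy sentences are made up for illustration, not taken from any library.

```python
import json
import string


def standardize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocab(texts):
    """Map each distinct token to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"": 0, "[UNK]": 1}
    for text in texts:
        for token in standardize(text).split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, length):
    """Turn one string into a fixed-length list of token ids."""
    ids = [vocab.get(token, 1) for token in standardize(text).split()]
    return (ids + [0] * length)[:length]


# Training time: build the vocabulary and persist it next to the model.
train_texts = ["The cat sat on the mat.", "The dog ate my homework!"]
vocab = build_vocab(train_texts)
vocab_json = json.dumps(vocab)  # in practice this would be written to a file

# Inference time: reload the vocabulary and repeat exactly the same steps.
serving_vocab = json.loads(vocab_json)
encoded = encode("The cat ate my pizza", serving_vocab, length=6)
```

Every one of these helpers, plus the saved vocabulary, has to travel with the model and behave identically at serving time, which is exactly where training and inference tend to drift apart.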

This could be done in a number of ways, from implementing the whole preprocessing pipeline entirely “outside” TensorFlow, to using the powerful API. Of course, there are also ways to include your own operations in the exported graph, and TensorFlow is flexible enough to let you implement your own preprocessing layer, but it’s a long way to the top if you wanna rock’n’roll.

What this layer gives you is the opportunity to include any text preprocessing logic in the model itself.

By doing so, you use the same preprocessing and analysis code consistently, avoid differences between the data used for training and the data fed to your trained model in production, and only have to write that code once.
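A minimal sketch of the idea, assuming a TensorFlow 2 installation: newer releases expose the layer as `tf.keras.layers.TextVectorization`, while older ones keep it under `tf.keras.layers.experimental.preprocessing`, so the snippet tries both. The toy corpus, the layer sizes and the tiny classifier are invented for illustration only.

```python
import tensorflow as tf

# Newer releases expose the layer directly; older ones keep it under
# the experimental preprocessing namespace.
TextVectorization = getattr(tf.keras.layers, "TextVectorization", None)
if TextVectorization is None:
    from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

MAX_TOKENS = 1000  # illustrative vocabulary size
SEQ_LEN = 6        # illustrative sequence length

# Toy corpus used only to learn the vocabulary.
corpus = ["The cat sat on the mat.", "The dog ate my homework!"]

vectorizer = TextVectorization(
    max_tokens=MAX_TOKENS,
    output_mode="int",
    output_sequence_length=SEQ_LEN,
)
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(corpus).batch(2))

# The preprocessing layer is simply the first layer of the model,
# so the model takes raw strings as input.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorizer,
    tf.keras.layers.Embedding(input_dim=MAX_TOKENS, output_dim=8),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Untrained weights, so the value itself is meaningless; the point is
# that a raw string goes in and a prediction comes out.
preds = model.predict(tf.constant([["The cat ate my pizza"]]))
```

Because the vectorizer sits inside the model, the standardization, splitting and vocabulary lookup happen in the same graph at training and at prediction time.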

But here’s where our dreams come to an end.

Unfortunately, exporting a model that includes this layer is not implemented yet.

This means that using this layer in production is not yet a good idea, since you wouldn’t be able to export your model and serve it with solutions like TensorFlow Serving. However, this layer is still marked as experimental, and chances are that a future TensorFlow release will add the ability to export it.