How to run TensorFlow.js on a serverless platform

Original article can be found here (source): Artificial Intelligence on Medium

In a previous article, we introduced neural networks and TensorFlow framework basics.

Today, we present how to use TensorFlow.js: from ready-to-use online models to model conversion and web-based versus server-based deployment. We then show how to use an online TensorFlow.js model and deploy it rapidly using our WarpJS JavaScript serverless Function-as-a-Service (FaaS).

TensorFlow, from Python to JavaScript

As we introduced in the first article, the original Python TensorFlow (TF) consisted of a declarative-style API.

The declarative style (which by nature requires a specific debugging environment, TensorBoard) and the breadth of the API made for a relatively long learning curve on the developer’s side:

  • as a first step, the user constructs a “graph” of all TensorFlow operations (from simple operators on tensors to operations with complete networks, including connections with data sources and sinks),
  • then, the user creates a “session”, in which TensorFlow analyzes the graph, resolves the operation schedule and executes all computations.
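
The two steps above can be illustrated with a toy sketch in plain JavaScript. The helper functions and the Session class below are hypothetical and are not the TensorFlow API; they only mimic the idea of first describing a computation and then executing it, in contrast with the imperative style discussed next.

```javascript
// Toy illustration only: these helpers are hypothetical, NOT the TensorFlow API.

// Step 1: building the "graph". Each call only records an operation node;
// nothing is computed at this point.
const constant = (value) => ({ op: 'const', value });
const placeholder = (name) => ({ op: 'placeholder', name });
const add = (a, b) => ({ op: 'add', inputs: [a, b] });
const mul = (a, b) => ({ op: 'mul', inputs: [a, b] });

// Step 2: the "session" walks the graph, resolves dependencies and computes.
class Session {
  run(node, feeds = {}) {
    switch (node.op) {
      case 'const':
        return node.value;
      case 'placeholder':
        if (!(node.name in feeds)) {
          throw new Error(`missing feed: ${node.name}`);
        }
        return feeds[node.name];
      case 'add': {
        const [a, b] = node.inputs.map((n) => this.run(n, feeds));
        return a + b;
      }
      case 'mul': {
        const [a, b] = node.inputs.map((n) => this.run(n, feeds));
        return a * b;
      }
      default:
        throw new Error(`unknown op: ${node.op}`);
    }
  }
}

// Declarative style: describe y = (x + 2) * 3 first, execute later.
const x = placeholder('x');
const y = mul(add(x, constant(2)), constant(3));
const session = new Session();
const declarativeResult = session.run(y, { x: 4 });

// Imperative ("eager") style: each expression is computed immediately.
const eagerResult = (4 + 2) * 3;
```

Both variables hold the same value; the difference is that in the declarative version an error in the graph only surfaces when the session runs it, which is one reason debugging is harder.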

Due to the long time needed to learn and master this API, Google introduced two significant improvements to Python TF toward user-friendliness:

  • an imperative execution mode (“eager execution”), which is much more intuitive for Python (and other scripting-language) programmers and makes debugging easier. It was, however, not fully compatible with all existing features,
  • the Keras API: a user-friendly set of high-level operations (network assembly, inference and training) dedicated to neural networks, which Google integrated from the Keras project in 2017.

TensorFlow.js, the JavaScript version of TensorFlow (imperative execution), does not include all TF functionalities available in “declarative” mode, but supports, amongst others, the full Keras API.

Ready to use TensorFlow.js models

Pre-trained models are available for public use by non-experts in machine learning in the TensorFlow.js model repository, for various applications:

  • Image processing: classification, object detection, body/hand pose estimation, body segmentation, face meshing,
  • Text processing: toxicity detection, sentence encoding,
  • Speech processing: command recognition,
  • Language processing: the newly released MobileBERT model enables applications like chatbots, …

All of these are also hosted on NPM. Feel free to visit the repository for more details.

More than 1000 available TensorFlow models and variants are being centralized in the TensorFlow Hub, which includes models for Python and the models mentioned above, usable in JavaScript.

As mentioned in our previous article, the Magenta project (music and art using ML), hosted on NPM as well, provides a JavaScript API built on several models, amongst which recurrent neural networks (RNNs).

Converting a Python TF model for JavaScript

Although many ready-to-use models are available online, a specific application case usually requires re-training (or at least fine-tuning), if not re-architecting the model.

As Python is widely used in model design and training, situations arise where a model developed with Python TF has to be used with JavaScript (browser or Node.js).

Knowing the Python TF history that was briefly summarized above, when the time comes to save or export a trained model, one won’t be surprised to see different formats:

  • saved model format: includes a complete model architecture, weights and optimizer configuration in a single folder. Such a model can be used without access to the original python code. Training can be resumed from the checkpoint reached by the time it was saved,
  • Keras saved model (‘hdf5’ format): models created using the Keras API can be saved in a single file (‘.h5’). Basically, it contains the same info as the saved model,
  • frozen model (‘.pb’): a variant of a saved model, but that cannot be trained anymore (only architecture and weights are saved). It is aimed at being used for inference only.

TensorFlow provides a converter for the Python environment: tensorflowjs_converter.

It can be installed easily using:

$ pip install tensorflowjs

This utility converts various model file formats generated by the TF python API into a JSON file with additional binary files containing weights.
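
To make the output layout more concrete, here is a minimal sketch of how the JSON file references the binary weight files. The field names (modelTopology, weightsManifest, paths, weights) follow the TensorFlow.js model.json layout as I understand it; the layer names, shapes and file name are made up for illustration, and the size computation assumes unquantized weights.

```javascript
// Hypothetical, hand-written example of a converted model manifest.
// Layer names, shapes and the shard file name are illustrative only.
const modelJson = {
  format: 'layers-model',
  modelTopology: {}, // network architecture, omitted in this sketch
  weightsManifest: [
    {
      paths: ['group1-shard1of1.bin'],
      weights: [
        { name: 'dense/kernel', shape: [4, 3], dtype: 'float32' },
        { name: 'dense/bias', shape: [3], dtype: 'float32' },
      ],
    },
  ],
};

// Bytes per element, assuming weights are stored unquantized.
const BYTES_PER_DTYPE = { float32: 4, int32: 4, bool: 1 };

// Lists the binary files a loader has to fetch and the total weight size.
function summarizeWeights(manifest) {
  const files = [];
  let totalBytes = 0;
  for (const group of manifest) {
    files.push(...group.paths);
    for (const { name, shape, dtype } of group.weights) {
      if (!(dtype in BYTES_PER_DTYPE)) {
        throw new Error(`unsupported dtype for ${name}: ${dtype}`);
      }
      const elements = shape.reduce((count, dim) => count * dim, 1);
      totalBytes += elements * BYTES_PER_DTYPE[dtype];
    }
  }
  return { files, totalBytes };
}

const summary = summarizeWeights(modelJson.weightsManifest);
```

In this made-up example, the summary lists one binary file holding 15 float32 values, that is, 60 bytes of weights.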

For more details, see the TensorFlow.js model converter documentation.

In addition, the TensorFlow.js team just released a model conversion wizard (announced at TensorFlow dev summit 2020).

Converting with the command-line utility

Example for a frozen graph model’s ‘.pb’ file. The output node of the TensorFlow graph must be specified:

tensorflowjs_converter \
    --input_format=tf_frozen_model \
    --output_node_names='MobilenetV2/Predictions/Reshape_1' \
    /mobilenet/frozen_model.pb \
    /mobilenet/web_model

Example for a ‘.h5’ keras model file:

tensorflowjs_converter --input_format=keras /my_path/my_model.h5 /my_tfjsmodel_path

Both examples create a JSON model file and binary weight files.

Generating a converted model in python code

For Keras models, the tensorflowjs Python module includes an API, callable from Python TF code, that directly outputs the JSON format.


# in Python code where the model is created and trained
import tensorflowjs as tfjs

def train(…):
    model = keras.models.Sequential()  # create a layered keras model
    model.compile(…)
    …)  # train model
    tfjs.converters.save_keras_model(model, my_tfjsmodel_path)

Once converted, the model can be loaded in a JavaScript environment with the TensorFlow.js model loading utility that matches its type (Graph or Keras):

// in JavaScript code inferring the converted model
// for a Graph model:
const model = await tf.loadGraphModel('myTfjsmodelPath/model.json');

// or, for a Keras model:
const model = await tf.loadLayersModel('myTfjsmodelPath/model.json');

The model can then be used for inference:

const prediction = model.predict(inputData);

Operating a JavaScript model

At some point, a neural network model is sufficiently stable to be used on significant data sets. Depending on the application case, this usage may consist of:

  • inference only: analyzing “production” data sets (texts, images or other media content, etc…) without further training (at least during the analysis),
  • inference and training: part of the “production” data sets is also used for continuous network training in order to increase performance with application-specific experience.

Although the browser-based and Node-based TensorFlow.js APIs are equivalent in terms of functionality, several key factors besides performance come into play when selecting the best way to operate the model: data volumes, transfer bandwidth and privacy.

Browser-based execution is interesting in highly interactive applications, particularly when processing media that are streamed in or out locally (webcam, graphical user interfaces, sound, …), and for moderate-size neural networks (NN) whose load time is not crippling for the user experience.

Browser-based execution has some drawbacks for standard-size models, which significantly impact the user experience:

  • the performance of the model is limited, and only moderate-size NN models can be used, despite the acceleration provided by the TensorFlow.js WebGL and Wasm backends,
  • loading a model can take 15s or even a minute due to the size of models and the performance of the mobile network, which is a long time for the user,
  • memory requirements to run the model are high. On small-memory devices, this restricts the use of the model and can break application features,
  • not all mobile phones/browsers are up to date, so the model may not run on all devices.
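
As a rough worked example of the load-time drawback, download time can be approximated as model size divided by bandwidth. The sketch below is a back-of-the-envelope estimate only: the function name and the figures (a 25 MB model, 4 Mbit/s and 1 Gbit/s links) are hypothetical, and latency, caching, compression and model initialization are ignored.

```javascript
// Rough estimate: time (s) = size (megabits) / bandwidth (megabits per second).
// Ignores latency, caching, compression and model initialization time.
function estimateDownloadSeconds(modelMegabytes, bandwidthMbps) {
  if (bandwidthMbps <= 0) {
    throw new RangeError('bandwidth must be positive');
  }
  const megabits = modelMegabytes * 8; // 1 byte = 8 bits
  return megabits / bandwidthMbps;
}

// Hypothetical figures, for illustration only.
const onSlowMobile = estimateDownloadSeconds(25, 4);   // 25 MB over 4 Mbit/s
const onFastLink = estimateDownloadSeconds(25, 1000);  // 25 MB over 1 Gbit/s
```

With these assumed figures, the same model takes 50 seconds to download on the slow mobile link and 0.2 seconds on the fast one.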

Of course, these drawbacks reflect the current state, and Google is making progress on some of these issues. In the short term, server-based execution with Node.js is an excellent solution that addresses them:

  • performance of the model is close to Python TF thanks to the native or GPU-accelerated versions of TF.js for Node.js, so model complexity is far less constrained,
  • a server typically has a fast network connection, so the time to load a model is significantly decreased. Servers can also be kept ready to run, with models preloaded,
  • a server can be sized with the memory required to run any model size,
  • the server environment is controlled, so the model is guaranteed to run there,
  • the new drawbacks mostly relate to remote data transfers to the server: in particular, moving sensitive data off the device must be managed and defined with the service provider…

Server-based execution also opens the possibility of running inference and training within, or at the edge of, the network boundary where the data is stored, which reduces latency and data transfer times.

In that case, only the inference results (usually lighter than the input data flows) count as payload, both for latency and for cost on infrastructures whose pricing includes outbound data transfer.

Finally, on the server side, the TensorFlow ecosystem provides TFX (TensorFlow Extended) to deploy production machine-learning pipelines. The AutoML tool (provided by Google Cloud) also provides a GUI-based suite to train and deploy custom ML models without requiring extensive machine-learning and NN expertise.

Using an online model with TensorFlow.js

Many public models can be retrieved from online repositories.

We’ll use the “toxicity” pre-trained model in the next sections as an example.

The toxicity model detects whether text contains toxic content such as threatening language, insults, obscenities, identity-based hate, or sexually explicit language. The model was trained on the Civil Comments dataset, which contains ~2 million comments labeled for toxicity. The model is built on top of the Universal Sentence Encoder (Cer et al., 2018).

Browser-based usage:

The model can be directly loaded for use in JavaScript at:

In the html, add:

<script src=""></script>
<script src=""></script>

Then, in the JS code:

// sets the minimum prediction confidence
const threshold = 0.9

// load and init the model
const model = await toxicity.load(threshold);
. . .

// apply an inference
const predictions = await model.classify(inputText);
. . .

Node-based usage:

Toxicity is also available as an npm module for Node.js (the package actually loads the model from the storage link above):

$ npm install @tensorflow-models/toxicity

Then, in the JS code:

const toxicity = require('@tensorflow-models/toxicity');

// sets the minimum prediction confidence
const threshold = 0.9

// load and init the model
const model = await toxicity.load(threshold);
. . .

// apply an inference
const predictions = await model.classify(inputText);
. . .
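
To show how the classification output can be exploited, here is a small sketch working on mocked predictions. It assumes the output shape described in the toxicity package documentation: one entry per label, each with a results array holding, for every input sentence, the two probabilities (no match, match) and a match flag that is true, false, or null when neither probability reaches the threshold. The helper names and the probability values are made up for illustration.

```javascript
// Mocked output with the shape returned by model.classify() for ONE input
// sentence. The labels exist in the toxicity model; the numbers are made up
// and assume a threshold of 0.9.
const mockPredictions = [
  { label: 'insult', results: [{ probabilities: [0.03, 0.97], match: true }] },
  { label: 'threat', results: [{ probabilities: [0.95, 0.05], match: false }] },
  { label: 'toxicity', results: [{ probabilities: [0.4, 0.6], match: null }] },
];

// Labels for which the model is confident that the sentence matches.
function matchedLabels(predictions, inputIndex = 0) {
  return predictions
    .filter(({ results }) => results[inputIndex].match === true)
    .map(({ label }) => label);
}

// Labels for which neither probability reached the threshold.
function undecidedLabels(predictions, inputIndex = 0) {
  return predictions
    .filter(({ results }) => results[inputIndex].match === null)
    .map(({ label }) => label);
}

const matched = matchedLabels(mockPredictions);
const undecided = undecidedLabels(mockPredictions);
```

Separating the undecided case matters when choosing a policy: an application may decide to treat an uncertain result as toxic, as non-toxic, or to send it to human review.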

Deploying a model with WarpJS

As discussed before, inference on big data sets in the browser quickly falls short in terms of performance, due to model and data loading times as well as limited computing capabilities (even with accelerated backends).

Node.js pushes the performance limit further by allowing deployment on a high-performance GPU engine, in the network neighborhood of the dataset. However, the user still faces complexity when trying to take the next performance step: distributed processing.

The WarpJS JavaScript FaaS enables easy serverless process distribution with very little development effort.

Example: toxicity model serverless deployment

WarpJS installation guidelines can be found here: Getting started with WarpJS

You can register to WarpJS with my invitation code (as I write these lines, WarpJS is in private beta).

This article also provides a good tutorial on all steps to operate WarpJS.

In our WarpJS serverless operation, the browser acts as the primary input/output interface, through an index.html file.

It contains a text box to submit the input text to be analyzed and a “classify” button triggering the inference process.

<!DOCTYPE html>
<html>
<body>
  <h1>TensorFlow.js toxicity demo with WarpJS</h1>
  <form id="form">
    <input id="classifyNewTextInput" placeholder="i.e. 'you suck'" required>
    <button>Classify</button>
  </form>
  <p id="result"></p>
</body>
</html>

The index.js file (see below) contains the inference function to be run at each user input and distributed on the serverless infrastructure. WarpJS is a function-as-a-service platform for JavaScript. Instead of creating HTTP endpoints and using HTTP calls to perform a remote inference, we simply tell WarpJS to manage the execution of a JavaScript function on its FaaS by Warp-calling it, which is very similar to a regular JavaScript call. In our case, we Warp-call the classify function, which then runs on the WarpJS FaaS.


/**
 * Copyright 2020 ScaleDynamics SAS. All rights reserved.
 * Licensed under the MIT license.
 */

'use strict'

// import WarpJS module
import { defaultWarper as warper } from '@warpjs/warp'
import engine from '@warpjs/engine'

// init WarpJS
engine.init()

// warp prediction function
const classify = async inputs => {
  'warp +server -client'

  // predict with tensorflow model
  const predictions = await model.classify(inputs)

  // check toxicity results
  const toxic = predictions.some(({ results }) => results[0].match !== false)
  return toxic
}

// main script
// listen to button click event
document.getElementById('form').addEventListener('submit', async event => {
  event.preventDefault()
  result.innerHTML = '<h2>Remote inference running</h2>'

  // scan textbox content
  const text = classifyNewTextInput.value

  // invoke inference
  const toxic = await, [text])

  // render result on html page
  if (toxic) {
    result.innerHTML = `<h2 style="color:red">Your sentence is TOXIC :(</h2><img src="/img/Pdown.png" alt="">`
  } else {
    result.innerHTML = `<h2 style="color:green">Your sentence is NON TOXIC :)</h2><img src="/img/Pup.png" alt="">`
  }
})

As with any FaaS, functions executed serverless are stateless. Instead of loading and initializing the model at each function request, WarpJS provides an easy way to add some initialization on the server that will run the functions.

To do so, we add the initialization of TensorFlow.js and of the toxicity model in init-server.js. Once the model is loaded, we set global.model so that the functions can use the model directly in their code.


/**
 * Copyright 2020 ScaleDynamics SAS. All rights reserved.
 * Licensed under the MIT license.
 */

require('@tensorflow/tfjs')
require('@tensorflow/tfjs-node')
const toxicity = require('@tensorflow-models/toxicity')

// The minimum prediction confidence
const threshold = 0.9

// Load the model
let modelLoaded = false
toxicity.load(threshold).then(model => {
  global.model = model
  modelLoaded = true
})

// Force waiting for the async TensorFlow model load.
// The "deasync" lib turns async function into sync via JS wrapper of Node event loop.
// The "loopWhile" function will wait for the condition resolution to continue.
require('deasync').loopWhile(() => !modelLoaded)
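
The "load once, reuse for every request" idea can also be sketched without blocking the event loop, by caching the loading promise. This is a generic JavaScript pattern, not the WarpJS initialization mechanism described in this article; loadModel below is a hypothetical stand-in for an expensive asynchronous load such as toxicity.load(), and its fake classifier is only there to make the sketch runnable.

```javascript
// Hypothetical stand-in for an expensive asynchronous model load.
// The fake "model" flags every non-empty sentence, for demonstration only.
let loadCount = 0;
async function loadModel() {
  loadCount += 1;
  return {
    classify: async (inputs) => inputs.map((text) => text.length > 0),
  };
}

// The promise is created on first use and cached, so concurrent and later
// calls all share a single load. On failure the cache is cleared so that
// the next call can retry.
let modelPromise = null;
function getModel() {
  if (modelPromise === null) {
    modelPromise = loadModel().catch((err) => {
      modelPromise = null;
      throw err;
    });
  }
  return modelPromise;
}

// Each request awaits the shared model instead of reloading it.
async function classifyOnce(inputs) {
  const model = await getModel();
  return model.classify(inputs);
}
```

Calling classifyOnce several times, even concurrently, triggers the load a single time. The trade-off compared with the synchronous wait above is that the first request pays the loading cost unless getModel() is called at startup.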

Deploying to the WarpJS FaaS is straightforward: just use “npm run deploy” to get the URL of the deployed site and start playing with TensorFlow.js.

Feel free to open the deployed URL to see the demo in action.



About the author

Dominique d’Inverno holds an MSc in telecommunications engineering. After 20 years of experience including embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.