Should I get a cloud server? Head-to-head performance analysis of TensorFlow.js

Implementation Details: Task, Models, Dataset, Hardware

For each experiment, we’ll be running a simple image classification task on the FashionMNIST dataset. We’ll be utilizing the following models:

  • A dense neural net (DNN): one dense layer
  • A small convolutional neural net (small CNN): one conv2D layer
  • A medium CNN: two conv2D layers and one max-pooling layer
  • A big CNN: three conv2D layers and two max-pooling layers
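
To make this concrete, here is a sketch of how the small CNN might be written with the TF.js layers API. The filter count, kernel size, and optimizer choice are illustrative placeholders; the exact hyperparameters are in the linked GitHub repository.

```typescript
import * as tf from '@tensorflow/tfjs';

// Sketch of the "small CNN" for 28x28x1 FashionMNIST images.
// Filter count, kernel size, and optimizer are illustrative placeholders.
function buildSmallCnn(): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.conv2d({
    inputShape: [28, 28, 1],
    filters: 16,
    kernelSize: 3,
    activation: 'relu',
  }));
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));
  model.compile({
    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model;
}
```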

Do these models seem too small? The browser can barely handle training even the medium CNN, simple as it is, which says a lot about the training limitations of the browser.

However, it is important that we also try a bigger, pre-trained model, especially for deployment performance. We spoke earlier about the limitations of the Python-to-JavaScript conversion API, and we would prefer not to hand-code large neural nets like VGG. We'll therefore be using the biggest (and only) large pre-trained architecture available for TF.js: MobileNetV1.
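
Loading the pre-trained MobileNetV1 in the browser amounts to a single call. The sketch below uses the Google-hosted model.json from the official TF.js examples; note that older TF.js releases expose the loader as tf.loadModel rather than tf.loadLayersModel.

```typescript
import * as tf from '@tensorflow/tfjs';

// Google-hosted MobileNetV1 graph (width multiplier 0.25, 224x224 input),
// as used in the official TF.js examples.
const MOBILENET_URL =
  'https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json';

async function loadMobileNet(): Promise<tf.LayersModel> {
  const model = await tf.loadLayersModel(MOBILENET_URL);
  // Warm up once so later timings don't include WebGL shader compilation.
  tf.tidy(() => model.predict(tf.zeros([1, 224, 224, 3])));
  return model;
}
```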

In terms of hardware, we'll be training the browser models on Google Chrome with WebGL acceleration (which is built into TF.js), and the Python models on an AWS DLAMI t2.medium instance.
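
Since a silent fallback to the CPU backend would badly skew the browser-side numbers, it's worth confirming that WebGL is actually active; a minimal check looks like this.

```typescript
import * as tf from '@tensorflow/tfjs';

// Request the WebGL backend explicitly; TF.js falls back to 'cpu'
// if WebGL is unavailable, which would make the browser look far slower.
async function ensureWebGL(): Promise<void> {
  await tf.setBackend('webgl');
  console.log('Active TF.js backend:', tf.getBackend()); // expect 'webgl'
}
```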

We used a batch size of 128 or 256 with the default learning rate and initialization. For further implementation details, such as model hyperparameters, please see the attached GitHub repository.
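
For reference, the browser-side training call is just model.fit with the batch size set explicitly and everything else left at the TF.js defaults; the epoch count below is a placeholder.

```typescript
import * as tf from '@tensorflow/tfjs';

// Train with an explicit batch size; learning rate and weight
// initialization are left at the TF.js defaults. Epochs is a placeholder.
async function train(
  model: tf.Sequential,
  trainXs: tf.Tensor4D, // [numExamples, 28, 28, 1]
  trainYs: tf.Tensor2D, // [numExamples, 10] one-hot labels
): Promise<void> {
  await model.fit(trainXs, trainYs, { batchSize: 128, epochs: 5 });
}
```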

Experiment 1: Pure Training Throughput Comparison

We'll train the various architectures in the browser and on the server and compare the training throughput. This experiment provides a baseline of sorts: a raw, throughput-to-throughput comparison.
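
On the browser side, the throughput number can be obtained by timing a fixed amount of training and dividing by the number of examples seen, roughly as sketched below; the server-side measurement is the analogous timing around the Python model's fit call.

```typescript
import * as tf from '@tensorflow/tfjs';

// Raw training throughput (examples/second) for one in-browser model.
// trainXs/trainYs are assumed to be pre-loaded FashionMNIST tensors.
async function trainingThroughput(
  model: tf.Sequential,
  trainXs: tf.Tensor4D,
  trainYs: tf.Tensor2D,
  epochs = 1,
): Promise<number> {
  const start = performance.now();
  await model.fit(trainXs, trainYs, { batchSize: 128, epochs });
  const seconds = (performance.now() - start) / 1000;
  return (trainXs.shape[0] * epochs) / seconds;
}
```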

Experiment 2: Full Pipeline Custom Deployment Time Comparison

As we discussed earlier, TF.js really isn't meant for training. The previous result is helpful, but it will most likely tell us what we already know: do not train in the browser. However, what about doing only forward passes? That is, we've already trained the model and simply deploy it on the website as a classifier. This is where TF.js may really shine.

We're going to deploy the same models we used for training; in the browser, we can measure the time to pass the test set through each one. But if we treat the server as a "deployed product," we can't simply pass the test set through the model on the server. For a fair comparison, we need to measure the latency of the entire pipeline: the communication from the front end to the server, data acquisition, the test-set forward pass, and the response. Network latency and the like may bring the two test times much closer together.
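
Concretely, the two quantities being compared are (a) the time to push the test set through the in-browser model and (b) the wall-clock time of the full request/response cycle against the server. A sketch follows; the /predict endpoint and payload are hypothetical stand-ins for our actual job format.

```typescript
import * as tf from '@tensorflow/tfjs';

// (a) In-browser: time a forward pass of the whole test set.
function browserInferenceMs(model: tf.LayersModel, testXs: tf.Tensor4D): number {
  const start = performance.now();
  tf.tidy(() => {
    const preds = model.predict(testXs, { batchSize: 256 }) as tf.Tensor;
    preds.dataSync(); // block until the GPU work actually finishes
  });
  return performance.now() - start;
}

// (b) Server: time the entire round trip (request, server-side inference, response).
// The endpoint name and payload shape are hypothetical.
async function serverPipelineMs(job: object): Promise<number> {
  const start = performance.now();
  await fetch('/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(job),
  });
  return performance.now() - start;
}
```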

Experiment 3: Full Pipeline Built-In Deployment Time Comparison

It's vital that we use a large architecture in order to get some sense of model feasibility, and we'd like to see how well TF.js's built-in models perform! We will repeat the previous full-pipeline experiment of passing the test set through the model, using TF.js's MobileNetV1 in the browser and a Python MobileNetV1 running on the server.
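
The browser side reuses the timing harness above, just with the pre-trained MobileNetV1. One caveat: MobileNetV1 expects 224x224 RGB inputs, so the 28x28 grayscale FashionMNIST images must be resized and replicated across three channels first; the preprocessing below is an illustrative assumption, not necessarily what our pipeline does.

```typescript
import * as tf from '@tensorflow/tfjs';

// Resize 28x28x1 FashionMNIST batches to the 224x224x3 input MobileNetV1 expects.
// This preprocessing (and the lack of normalization) is illustrative only.
function toMobileNetInput(batch: tf.Tensor4D): tf.Tensor4D {
  return tf.tidy(() => {
    const resized = tf.image.resizeBilinear(batch, [224, 224]);
    return tf.concat([resized, resized, resized], 3) as tf.Tensor4D; // gray -> RGB
  });
}

// Time one prepared batch through the loaded MobileNetV1.
function mobileNetBatchMs(mobilenet: tf.LayersModel, batch: tf.Tensor4D): number {
  const start = performance.now();
  tf.tidy(() => {
    (mobilenet.predict(batch) as tf.Tensor).dataSync(); // wait for the GPU
  });
  return performance.now() - start;
}
```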

Architecture

We utilized the following architecture for our experiments:

Fig 1: Experiment Architecture

From the client side, one can submit various jobs. First off, both training and testing jobs can be submitted to the TF.js models sitting right on the client side.

If you want to utilize the server (purely for testing), you can send it a job. A job is a configuration file specifying the model to use (DNN, small CNN, medium CNN, big CNN, MobileNetV1), the dataset to use (we currently support MNIST and FashionMNIST; our experiments use FashionMNIST), and other parameters such as batch size and number of epochs. Upon receiving a job, the instance pulls the dataset either from the internet or from an S3 bucket (for custom datasets).
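
The job payload itself is small. A sketch of what it might look like follows; the field names are hypothetical and simply mirror the options described above.

```typescript
// Shape of a job sent from the front end to the server.
// Field names are hypothetical, mirroring the options described above.
interface Job {
  model: 'dnn' | 'small_cnn' | 'medium_cnn' | 'big_cnn' | 'mobilenet_v1';
  dataset: 'mnist' | 'fashion_mnist';
  datasetUrl?: string; // S3 location for a custom dataset
  batchSize: number;
  epochs: number;
}

const exampleJob: Job = {
  model: 'big_cnn',
  dataset: 'fashion_mnist',
  batchSize: 128,
  epochs: 5,
};
```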

The proxy causes only a slight additional slowdown, and we introduced it for several reasons. We use a "pull-push" model in which jobs are pushed to the proxy and the EC2 instance pulls them as fast as it can; the completion message sent back works on the same principle. We found that not only is this more robust to network failures, it also opens up optimization and prioritization opportunities for pulls when the server is processing multiple jobs. Compared to a direct HTTP request, the proxy did not cause a significant deviation in the results.
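
A minimal version of the pull side of this loop, as it might run on the EC2 instance, is sketched below; the proxy endpoints and job fields are hypothetical.

```typescript
// EC2-side pull loop: poll the proxy for queued jobs, run them, and push
// a completion message back. Endpoint names and job fields are hypothetical.
async function pullLoop(proxyUrl: string): Promise<void> {
  while (true) {
    const res = await fetch(`${proxyUrl}/jobs/next`);
    if (res.status === 204) {
      // Nothing queued; back off briefly before polling again.
      await new Promise<void>((resolve) => setTimeout(resolve, 500));
      continue;
    }
    const job: { id: string } = await res.json();
    const result = await runJob(job); // run the requested model/dataset
    await fetch(`${proxyUrl}/jobs/${job.id}/complete`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(result),
    });
  }
}

// Placeholder for the actual server-side model execution.
async function runJob(job: { id: string }): Promise<object> {
  return { jobId: job.id, status: 'done' };
}
```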

The front end of our experiment is public and available for testing. Please see the link below.