Deploying a Deep Learning pipeline with Docker on AWS

Our model was built with Keras, using TensorFlow as the backend

Note: this article was written with Mohamed Labouardy

Deep learning models are tricky to design and tune. Every data scientist has felt the relief of seeing a model perform as well on new data as it did during the training, validation and test phases. However, this is not an achievement but the beginning of a new turbulence zone: how can a model (which in our case will be in production as a REST API) scale to support the number of predictions it will be asked for? You might have thousands or even millions of simultaneous requests, and this is where the trouble begins.

Note: Our architecture is based on the amazing article by Adrian Rosebrock, quoted by François Chollet (the creator of Keras) as the reference architecture for building a scalable deep learning model-serving API. As in that article, we worked with Keras (TensorFlow backend), Flask and Redis, but we wrapped everything in Docker containers to scale services out and in based on traffic, and we used AWS Lambda to trigger the deployment process through Jenkins.

The whole infrastructure

We decided to build an infrastructure where each part of the process is independent but can communicate with the others through a push and pull model (AWS SNS and Redis). It is inspired by the microservices architecture well known in software engineering.

Our infrastructure is divided into five parts:

The learning containers: the source code of our algorithm is wrapped in Docker containers. This improves portability, since not everyone on our data team works with the same library versions. The code can then run on any computer, which was helpful during the training process. Once the model is ready to go to production (because it reaches the minimum accuracy level required), a service loads all the necessary data (in our case some .pickle, .h5, .csv and .xlsx objects) into Redis and S3. We definitely intend to keep improving our model, so for every new version the service erases the previous objects in AWS and Redis and replaces them with the new ones. We strongly recommend having several deployment environments (three in our case) so each new version can go through a test process before reaching production.
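The "erase and replace" step can be sketched as follows. This is a minimal illustration, not our production service: the function name, the key layout (`<env>/model/` in S3, `model:<env>:` in Redis) and the environment names are assumptions made for the example. `db` is expected to behave like a `redis.Redis` client and `s3` like a boto3 S3 client.

```python
def publish_model_version(db, s3, bucket, env, artifacts):
    """Replace the artifacts of one environment with a new model version.

    db        -- Redis-like client (scan_iter, delete, set)
    s3        -- boto3-like S3 client (list_objects_v2, delete_object, put_object)
    env       -- hypothetical environment name, e.g. "dev", "staging" or "prod"
    artifacts -- dict mapping a file name (e.g. "model.h5") to its raw bytes
    """
    s3_prefix = f"{env}/model/"      # assumed key layout in the bucket
    redis_prefix = f"model:{env}:"   # assumed key layout in Redis

    # 1. Erase the objects of the previous version.
    #    Note: list_objects_v2 returns at most 1000 keys per call, which is
    #    assumed to be enough for a handful of model files.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=s3_prefix)
    for obj in listing.get("Contents", []):
        s3.delete_object(Bucket=bucket, Key=obj["Key"])
    old_keys = list(db.scan_iter(match=redis_prefix + "*"))
    if old_keys:
        db.delete(*old_keys)

    # 2. Write the new objects to both stores.
    for name, body in artifacts.items():
        s3.put_object(Bucket=bucket, Key=s3_prefix + name, Body=body)
        db.set(redis_prefix + name, body)

    return sorted(artifacts)
```

Keeping the environment in the key prefix means a new version can be published to a test environment without touching the objects served in production.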

The application builder: once the new objects are loaded, a message is published through AWS SNS and an AWS Lambda function is triggered. It launches the final unit and integration tests and eventually deploys our application through Jenkins, an open source automation server.

Lambda on AWS triggered by SNS
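To make the SNS to Lambda to Jenkins chain more concrete, here is a minimal sketch of such a Lambda handler. It is an illustration under assumptions, not our actual function: the `model_version` field of the SNS message, the `MODEL_VERSION` build parameter and the `JENKINS_*` environment variables are hypothetical names. The handler relies on the Jenkins remote API, where a parameterized job can be started with an authenticated POST to `/job/<name>/buildWithParameters`.

```python
import base64
import json
import os
import urllib.parse
import urllib.request


def build_jenkins_request(event, jenkins_url, job, user, token):
    """Turn an SNS-triggered Lambda event into a Jenkins build request."""
    # SNS delivers the published message as a string in Records[i].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    # "model_version" and "MODEL_VERSION" are hypothetical names.
    params = urllib.parse.urlencode({"MODEL_VERSION": message["model_version"]})
    url = (
        f"{jenkins_url.rstrip('/')}/job/{urllib.parse.quote(job)}"
        f"/buildWithParameters?{params}"
    )
    # Basic authentication with a Jenkins user name and API token.
    credentials = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda when the SNS topic receives a message."""
    request = build_jenkins_request(
        event,
        os.environ["JENKINS_URL"],
        os.environ["JENKINS_JOB"],
        os.environ["JENKINS_USER"],
        os.environ["JENKINS_TOKEN"],
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"status": response.status}
```

The Jenkins job itself would then run the unit and integration tests and deploy the application. Depending on how the Jenkins server is configured, additional settings may be needed for remote triggering.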

The producer containers: they hold our Flask application. The producer has one task: handle the request from a client (in our case another microservice of our infrastructure), send it to a queue (we use Redis), wait for the request to be handled by the consumer and then send the prediction back to the client. The producer stays “on hold” until it receives the answer from the consumer, so we make sure a request is only processed once.
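The core of the producer can be sketched as a single function that a Flask view would call. This is a simplified illustration, assuming a queue named `prediction_requests`, one reply list per request named `response:<id>`, and JSON messages with `id` and `payload` fields; all of these names are hypothetical. `db` is expected to behave like a `redis.Redis` client.

```python
import json
import uuid

REQUEST_QUEUE = "prediction_requests"  # hypothetical queue name


def enqueue_and_wait(db, payload, timeout=30):
    """Push one request on the queue and block until its prediction arrives."""
    request_id = str(uuid.uuid4())
    db.rpush(REQUEST_QUEUE, json.dumps({"id": request_id, "payload": payload}))

    # BLPOP keeps the producer "on hold": it returns a (key, value) pair as
    # soon as a consumer pushes the answer, or None once the timeout (in
    # seconds) has expired.
    reply = db.blpop(f"response:{request_id}", timeout=timeout)
    if reply is None:
        raise TimeoutError(f"no prediction received for request {request_id}")
    _key, raw = reply
    return json.loads(raw)
```

Because every request carries its own identifier and waits on its own reply list, several producer containers can share the same Redis instance without reading each other's answers.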

The consumer containers: this is where the magic happens. Each consumer container constantly loops on the queue, looking for new data for our deep learning model to predict on, and sends the response back to another queue. In our case, a prediction can take a few seconds (while it only takes a few milliseconds for the producer to send a message to the queue), so it is important to constantly monitor the activity of the service, because you should scale the consumers and producers independently. Otherwise you will not get the best performance out of your architecture.
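A consumer loop along these lines could look like the sketch below. It assumes that requests arrive as JSON objects with `id` and `payload` fields on a Redis list named `prediction_requests`, and that answers go to a per-request list named `response:<id>`; these names, and the `predict` callable standing in for the Keras model, are hypothetical. `db` is expected to behave like a `redis.Redis` client.

```python
import json

REQUEST_QUEUE = "prediction_requests"  # hypothetical queue name
RESPONSE_TTL_SECONDS = 60              # assumed lifetime of an unread answer


def consume_once(db, predict, timeout=1):
    """Handle at most one queued request; return True if one was processed."""
    # BLPOP removes the item atomically, so a request taken by this consumer
    # cannot be taken by another one.
    item = db.blpop(REQUEST_QUEUE, timeout=timeout)
    if item is None:
        return False  # queue stayed empty for `timeout` seconds
    _queue, raw = item
    job = json.loads(raw)

    prediction = predict(job["payload"])  # the slow part: model inference

    response_key = f"response:{job['id']}"
    db.rpush(response_key, json.dumps({"id": job["id"], "prediction": prediction}))
    # Expire the answer in case the producer has already given up waiting.
    db.expire(response_key, RESPONSE_TTL_SECONDS)
    return True


def run_forever(db, predict):
    """Main loop of a consumer container."""
    while True:
        consume_once(db, predict)
```

Since each consumer only talks to Redis, adding capacity for predictions is a matter of starting more consumer containers, without touching the producers.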

The monitoring: in addition to the number of requests handled by our API, we have to monitor many metrics within our application to quickly detect unusual values, bugs or any other anomaly. We mainly use the ELK stack to build interactive and dynamic dashboards and Grafana to monitor the infrastructure resource usage.

Monitoring the number of requests
Monitoring resource usage
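One way to turn such metrics into a scaling decision for the consumers is a simple back-of-the-envelope rule based on the backlog of the request queue (which, with Redis, can be read with the `LLEN` command). The sketch below is only an illustration of that reasoning: the function, its parameters and the default bounds are assumptions, not the rule we run in production.

```python
import math


def desired_consumers(backlog, seconds_per_prediction, drain_target_seconds,
                      min_consumers=1, max_consumers=20):
    """Estimate how many consumers are needed to empty the queue in time.

    backlog                -- number of requests currently waiting in the queue
    seconds_per_prediction -- average time one consumer spends on one request
    drain_target_seconds   -- how quickly the backlog should be cleared
    """
    # Total work waiting (in seconds) divided by the time allowed to do it.
    needed = math.ceil(backlog * seconds_per_prediction / drain_target_seconds)
    # Clamp to the bounds the infrastructure is allowed to use.
    return max(min_consumers, min(max_consumers, needed))
```

For example, with 120 waiting requests, 2 seconds per prediction and a target of 30 seconds to clear the queue, the estimate is `ceil(120 * 2 / 30) = 8` consumers, while the number of producers can stay unchanged.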

There are many other architectures, frameworks and tools to deploy a deep learning pipeline, depending on your needs. Like many other startups, we faced the scalability issue and found a way to solve it from scratch.

About Foxintelligence

Foxintelligence delivers the best insights on the latest European ecommerce trends by unlocking intelligence from the e-receipts of hundreds of merchants and thousands of brands, thanks to a panel of millions of online shoppers. We transform the panel industry into a service, at the service of business leaders and marketing professionals who want to drive growth with the most accurate and freshest competitive and consumer information.

Source: Deep Learning on Medium