Docker, Container and Data Scientist

Original article was published on Artificial Intelligence on Medium

What is Docker?

Ref: ibexa.co

Docker is the world’s leading software container platform. Let’s take a real example. As we know, data science is a team effort that must coordinate with other areas such as the client side (front-end development), the backend (server), the database, and the environment/library dependencies needed to run the model. The model will not be deployed alone; it will be deployed along with other software applications to form the final product.

From the above picture, we can see a technology stack with different components, each running in its own environment. We need to make sure that every component in the stack is compatible with every possible hardware platform. In reality, working across all these platforms becomes complex because each component has its own computing environment. This is a major problem in the industry, and Docker can solve it. But how?

Let’s take one more practical use case, this time from the shipping industry.

Everybody knows that ships carry all types of goods to different countries. Have you ever noticed that the shipped products differ in size? Each ship carries all types of products; there are no separate ships for each product. In the picture above we can see a car, food items, a truck, steel plates, compressors, and furniture. These products differ in nature, size, packaging, and so on. Some items are fragile, and some, like food and furniture, need special packaging and handling in transit. It is a complex problem, and the shipping industry solved it using containers. Whatever the items are, the only thing we need to do is package them and place them inside a container. Containers help the shipping industry export goods easily, safely, and efficiently.

Now let’s come back to our problem, which is similar in kind. Instead of items, we have different components (the technology stack), and the solution is to use containers with the help of Docker.

Docker is a tool that makes it simple to create, deploy, and run applications using containers.

A container helps the data scientist or developer package up an application with all the parts it needs, such as libraries and other dependencies, and deploy it as one package.
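As a sketch of what this packaging looks like in practice, here is a minimal hypothetical Dockerfile for a Python model-serving app. The file names `requirements.txt`, `serve.py`, and `model.pkl`, and the port number, are illustrative assumptions, not from the article:

```dockerfile
# Start from an official slim Python base image
FROM python:3.10-slim

# Set the working directory inside the container
WORKDIR /app

# Copy and install the model's library dependencies first,
# so code changes don't force a re-install of every package
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code and the serialized model into the image
COPY serve.py model.pkl ./

# Document the port the serving script listens on
EXPOSE 8000

# Command run when the container starts
CMD ["python", "serve.py"]
```

With a file like this in the project directory, `docker build -t model-api .` produces an image and `docker run -p 8000:8000 model-api` starts it; the same image then runs unchanged on any machine with Docker installed.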

In simpler terms, a developer or data scientist packages all the software, the models, and their components into a box called a container, and Docker takes care of shipping this container to different platforms. The developer and data scientist can focus on the code, model, software, and its dependencies, and put them into the container; they don’t need to worry about deployment to each platform, because Docker takes care of that. An ML model has several dependencies, and Docker helps download and build them automatically.
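To tie this back to the technology stack from earlier (front end, backend, database, model), each component can run in its own container and be wired together. Below is a hedged sketch of a Docker Compose file; the service names and image names are illustrative assumptions, except `postgres`, which is a real official image:

```yaml
# docker-compose.yml — one container per stack component (sketch)
services:
  frontend:              # client-side app (hypothetical image name)
    image: my-frontend:latest
    ports:
      - "80:80"
  backend:               # server / API layer (hypothetical image name)
    image: my-backend:latest
    depends_on:
      - db
      - model
  model:                 # the packaged ML model service (hypothetical image name)
    image: my-model:latest
  db:                    # database, using the official Postgres image
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
```

Running `docker compose up` would then start the whole stack, with each component isolated in its own container but able to talk to the others, which is exactly the compatibility problem described above.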