Source: Deep Learning on Medium
A summary of what I’ve learned from a course about Full Stack Deep Learning
Hi everyone, how's everything? Today I'm going to write about what I learned from watching the Full Stack Deep Learning (FSDL) March 2019 course. It is a great online course that teaches us how to do a Full Stack Deep Learning project. What I love most is that they teach not only how to create the Deep Learning architecture, but also the Software Engineering concerns that come up when doing a Deep Learning project.
When we do a Deep Learning project, we need to know which steps to follow and which technology to use. Knowing these enhances the quality of the project. This is useful especially when we want to do the project in a team; we do not want the project to become messy when the team collaborates.
This article focuses on the tools and on what to do at every step of a full stack Deep Learning project according to the FSDL course (plus a few additions about tools that I know). There will be a brief description of what to do at each step. This article only covers the tools that caught my eye in that course. The programming language this article focuses on is Python.
This article will cover the following topics:
- Planning and Project Setup
- Data Collection and Labeling
- Codebase Development
- Training and Debugging

These are the steps that the FSDL course teaches:
- Planning and Project Setup
- Data Collection and Labeling
- Training and Debugging
- Deploying and Testing
Each of the steps can loop back to a previous step or jump forward (it is not a waterfall process). The course also suggests that we work iteratively, meaning that we start with small progress and improve continuously. For example, we start with a simple model on small data, then improve both as time goes by.
Planning and Project Setup
This is the first step you will do. We need to state what the project is going to make and the goal of the project. We also need to state the metric and baseline of the project. The substeps of this step are as follows:
Define Project Goals
First, we need to define what the project is going to make. There are two considerations when picking what to build: impact and feasibility.
We need to make sure that the project is impactful. What is the value of the application we want to make in the project? Two questions that you need to answer are:
- Where can you take advantage of cheap prediction?
- Where can you automate a complicated manual software pipeline?
If the application we choose to build can produce cheap predictions, it can create great value by reducing the cost of other tasks.
Feasibility is also something we need to watch out for. Since Deep Learning is centered on data, we need to make sure that the data is available and fits the project requirements and cost budget.
We need to consider the accuracy requirement, for which we need to set a minimum target. Since project costs tend to scale super-linearly with the accuracy requirement, we need to weigh our requirement against the maximum cost we can tolerate. Also consider that in some cases a failed prediction does not matter much, while in others the model must have as low an error as possible.
Finally, we need to assess the problem difficulty: how hard the project is. To measure the difficulty, we can look at published works on similar problems, for example by searching for papers on arXiv or in conferences that tackle a similar problem. With these, we can grasp the difficulty of the project.
See Figure 4 for more detail on assessing the feasibility of the project.
A metric is a measurement of a particular characteristic of the performance or efficiency of the system.
Since Machine Learning systems work best when optimizing a single number, we need to define a metric that satisfies the requirements with a single number, even though there may be many metrics that should be calculated. For a problem where we need several metrics, we need to pick a formula for combining them. The options are:
- Simple average / weighted average
- Threshold n-1 metrics, evaluate the nth metric
- Domain specific formula (for example mAP)
Here are some examples of how to combine two metrics (Precision and Recall):
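As a rough sketch of the three options above (the function names and the precision floor are mine, not from the course; F1 is the usual domain-specific formula for precision and recall):

```python
def simple_average(precision, recall):
    # Option 1: equal-weight average of the two metrics.
    return (precision + recall) / 2

def f1_score(precision, recall):
    # Option 3 (domain-specific formula): harmonic mean,
    # which penalizes an imbalance between the two metrics.
    return 2 * precision * recall / (precision + recall)

def thresholded(precision, recall, precision_floor=0.9):
    # Option 2: threshold n-1 metrics, evaluate the nth.
    # Require a minimum precision, then judge the model by recall alone.
    return recall if precision >= precision_floor else 0.0

print(simple_average(0.9, 0.6))  # 0.75
print(f1_score(0.9, 0.6))        # ~0.72
print(thresholded(0.9, 0.6))     # 0.6
```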
After we choose the metric, we need to choose our baseline. A baseline is an expected value or condition against which our work's performance will be compared. It gives us a lower bound on expected model performance. The tighter the baseline is, the more useful it is.
So why is the baseline important? Why not skip this step? By comparing to the baseline, we can measure how good our model is. Knowing how good or bad the model is lets us choose our next move on what to tweak.
To look for the baseline, there are several sources that you can use:
- External baseline, where you form the baseline from business or engineering requirements. You can also use published results as the baseline.
- Internal baseline, where you use a scripted baseline or create a simple Machine Learning (ML) model, such as a standard feature-based model or another simple model.
The baseline is chosen according to your needs. For example, if you want a system that surpasses humans, you need to add a human baseline.
Create the codebase that will be the core of the further steps. The source code in the codebase is developed according to what the project currently needs. For example, if the current step is collecting data, we write the code used to collect the data (if needed). We will mostly go back and forth to this step.
We should make sure that the source code in the codebase is reproducible and scalable, especially when doing the project in a group. To make that happen, you need to use the right tools. This article will tell you about them later.
Data Collection and Labeling
After we define what we are going to create, the baseline, and the metrics of the project, the most painful step begins: data collection and labeling.
Most Deep Learning applications require a lot of data, which needs to be labeled, and most of the time will be consumed by this process. Although you can also use public datasets, often the labeled dataset our project needs is not publicly available.
Here are the substeps:
We need to plan how to obtain the complete dataset. There are multiple ways to obtain the data. One thing you should keep in mind is that the data needs to align with what we want to create in the project.
If the strategy is to obtain data from the internet by scraping and crawling websites, we need tools to do it. Scrapy is one tool that can be helpful for the project.
Scrapy is a Python scraping and crawling library that can be used to scrape and crawl websites. It can be used to collect data such as images and texts from websites. We can also scrape images from Bing, Google, or Instagram with it. To use this library, learn from the tutorial that is available on its website. Do not worry, it is not hard to learn.
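Scrapy is the tool the course points to, but the core idea of scraping can be sketched with nothing more than the Python standard library. The `ImageExtractor` class below is a hypothetical illustration, not Scrapy's API; it pulls image URLs out of an HTML page:

```python
from html.parser import HTMLParser

class ImageExtractor(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

# In a real crawler, the HTML would come from an HTTP response,
# e.g. urllib.request.urlopen(url).read().decode().
page = '<html><body><img src="cat.jpg"><img src="dog.png" alt="dog"></body></html>'
parser = ImageExtractor()
parser.feed(page)
print(parser.image_urls)  # ['cat.jpg', 'dog.png']
```

Scrapy adds what this sketch lacks: following links across pages, politeness (rate limiting), and export pipelines.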
After we collect the data, the next problem is where to store it. Since you are not doing the project alone, you need to make sure that the data can be accessed by everyone. We also need to choose the format in which the data will be saved. Below are solutions for storing data in the cloud.
For storing binary data such as images and videos, you can use cloud storage such as Amazon S3 or GCP, which provide object storage with an API over the file system. We can also build versioning into the service. See their websites for more detail. You need to pay to use them (there are also free plans).
A database is used for persistent, fast, scalable storage and retrieval of structured data. It is used to save non-binary data that will be accessed frequently. You will save the metadata (labels, user activity) here. There are several tools you can use; one that is recommended is PostgreSQL.
It can store structured SQL data and can also be used to save unstructured JSON data. It is still actively maintained.
When you have unstructured data aggregated from multiple sources in multiple formats, with a high cost of transformation, you can use a data lake. Basically, you dump all the data into it, and it transforms the data for specific needs.
Amazon Redshift is one canonical solution for the data lake.
When we run the training process, we need to move the data the model needs to our local file system.
The data should be versioned to make sure progress is revertible. Version control applies not only to the source code but also to the data. We will dive into data version control after we talk about data labeling.
In this section, we will see how to label the data. There are several sources of labor that you can use to label the data:
- Hire annotators by yourself
- Crowdsource (Mechanical Turk)
- Use full-service data labeling companies such as FigureEight, Scale.ai, and LabelBox
If you want your own team to annotate the data, here are several tools that you can use:
Online collaborative annotation tool: DataTurks. The free plan is limited to 10,000 annotations and the data must be public. It offers annotation tools for several NLP tasks (sequence tagging, classification, etc.) and Computer Vision tasks (image segmentation, image bounding boxes, classification, etc.). The FSDL course uses this as the labeling tool.
A free open-source annotation tool for NLP tasks. It supports sequence tagging, classification, and machine translation tasks. It can also be set up as a collaborative annotation tool, but it needs a server.
An offline annotation tool for Computer Vision tasks, released by Intel as open source. It can label bounding boxes and image segmentations.
If you want to search for public datasets, see this article by Stacy Stanford for a list of public datasets:
"What are the best datasets for machine learning and data science?" (medium.com)
It is still actively updated and maintained.
There are levels of data versioning:
- Level 0: Unversioned. We should not attempt this. Deployments need to be versioned, and if the data is not versioned, the deployed models are not versioned either. The problem with unversioned data is the inability to get back to a previous result.
- Level 1: Versioned via snapshot. We store all of the data used for each version. It works, but it is a bit hacky; we would like versioning data to be as easy as versioning code.
- Level 2: Data is versioned as a mix of code and assets. The heavy files are stored on another server (such as Amazon S3), with a JSON file (or similar) referencing the relevant metadata, which can contain labels, user activity, etc. The JSON file is versioned. Since the JSON files can become big, we can use git-lfs to version them. This level should be acceptable for the project.
- Level 3: Use a specialized software solution for versioning the data. If you think that Level 2 is not enough for your project, you can do this. One tool for data versioning is DVC.
DVC is built to make ML models shareable and reproducible. It is designed to handle large files, datasets, machine learning models, and metrics as well as code. It is a solution for versioning ML models together with their datasets. We can connect the version control to cloud storage such as Amazon S3 and GCP.
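The Level 2 approach described above can be sketched in a few lines of Python. The bucket path and metadata fields below are hypothetical examples of what such a versioned metadata file might contain; the heavy assets stay in object storage while only this lightweight JSON goes into git:

```python
import hashlib
import json

def file_fingerprint(data: bytes) -> str:
    # Content hash, so a metadata entry pins an exact version of the asset.
    return hashlib.sha256(data).hexdigest()

# Heavy assets live in object storage; only this lightweight metadata
# file is committed to git (with git-lfs if it grows large).
metadata = {
    "images/0001.jpg": {
        "remote": "s3://my-bucket/images/0001.jpg",  # hypothetical bucket
        "sha256": file_fingerprint(b"raw image bytes here"),
        "label": "cat",
    },
}

with open("dataset.json", "w") as f:
    json.dump(metadata, f, indent=2, sort_keys=True)
```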
When we do the project, expect to write code for every step. Reproducibility is one thing we must care about when writing the code; we need to make sure that our codebase is reproducible.
Before we dive into tools, we need to choose the language and framework of our codebase.
For the programming language, I prefer Python over anything else. Python has the largest community for data science and is great to develop in. Popular Deep Learning software is also mostly supported in Python. The language is also easy to learn.
Deep Learning Framework
There are several choices you can make for the Deep Learning framework. The most popular frameworks in Python are Tensorflow, Keras, and PyTorch. Use the one that you like.
For easier debugging, you can use PyTorch as the Deep Learning framework. Keras is also easy to use and has a good UX; it is a wrapper over Tensorflow, Theano, and other Deep Learning frameworks that makes them easier to use. Tensorflow is also a choice if you like its environment; it can be a wise decision because of its community support and its great tools for deployment.
Do not worry about deployment: there is software that can convert a model from one format to another. For example, you can convert a model produced by PyTorch to Tensorflow. We will see this later.
I think the main factor in choosing the language and framework is how active the community behind it is, since an active community gives birth to a high number of custom packages that can be integrated with it.
One of the important things in a project is version control. We don't want to be unable to restore our codebase when someone accidentally wrecks it. We also need to keep track of each update to the code to see what changes someone else made. This is not possible without tooling; Git is one solution.
Okay, we know that version control is important, especially for collaborative work. Currently, Git is one of the best solutions for version control. It is also used to share your code with other people on your team. Without it, I don't think you can collaborate well with others on the project.
There is another important practice: code review. Code reviews are an early protection against incorrect or bad-quality code that passes the unit or integration tests. When you collaborate, have someone check and review your code. Most version control services support this feature.
When we first create the project folder, we may wonder how to structure it. Then we give up and put all the code in the root project folder. That is a bad practice that leads to bad-quality code.
One solution that I found is cookiecutter-data-science. It gives a template for how to structure the project, how to name each file, and where to put it. Be sure to use it so your codebase does not become messy. Consider reading its website to learn how to use it.
Integrated Development Environment (IDE)
An IDE is one of the tools you can use to write code faster. It has integrated tools that are useful for development. There are several IDEs that you can use:
PyCharm is an IDE released by JetBrains. It can be used not only for Deep Learning projects but also for other work such as web development. PyCharm has code auto-completion, code cleaning, refactoring, and many integrations with other tools that are important when developing in Python (you need to install the plugins first). It has a nice environment for debugging. It can also run notebook (.ipynb) files.
Jupyter Lab is an easy-to-use, interactive data science environment which can be used not only as an IDE but also as a presentation tool. Its user interface (UI) makes it great as a visualization or tutorial tool. We can write documentation in Markdown format and also insert pictures into the notebook.
Personally, I write source code in PyCharm. When I create a tutorial, test something, or do Exploratory Data Analysis (EDA), I use Jupyter Lab. Just do not put your reusable code into your notebook files; it has bad reproducibility.
"Hey, what the hell!? Why can't I run the training process on this version?" — A
"Idk, I just pushed my code, and I thought it worked on my notebook.. wait a minute.. I got an error on this line.. I didn't copy all of my code into the implementation" — B
"So why did you push!?!?" — A
Before we push our work to the repository, we need to make sure that the code really works and has no errors. To do that, we should test the code before it and the model get pushed to the repository. Unit or integration tests must be done.
Unit tests check each module's functionality; integration tests check the integration of modules. They check whether your logic is correct or not. Do this to find your mistakes before running experiments.
CircleCI is one solution for Continuous Integration. It can run unit tests and integration tests. It can use a Docker image (we will dive into Docker later) as a containerized environment (which we should use). Similar tools include Jenkins and TravisCI.
Here are several libraries you can use to test your code in Python:
pipenv check: scans our Python package dependency graph for known security vulnerabilities
pylint: does static analysis of Python files and reports both style and bug problems
mypy: does static type checking of Python files
bandit: performs static analysis to find common security vulnerabilities in Python code
shellcheck: finds bugs and potential bugs in shell scripts (if you use them)
pytest: Python testing library for unit and integration tests
Add them to your CI and make sure they pass. If a check fails, rewrite your code and find where the error is.
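As an illustration, a minimal CircleCI configuration running some of these checks might look like the sketch below (the image tag, paths, and job names are assumptions, not from the course):

```yaml
# .circleci/config.yml
version: 2.1
jobs:
  test:
    docker:
      - image: cimg/python:3.8   # any Python Docker image works here
    steps:
      - checkout
      - run: pip install pipenv && pipenv install --dev
      - run: pipenv run pylint src/
      - run: pipenv run pytest tests/
workflows:
  main:
    jobs:
      - test
```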
Here is an example of writing unit tests for a Deep Learning system.
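For instance, a plain-Python sketch of what such unit tests might look like for a hypothetical preprocessing function (pytest collects any function named `test_*`; each assert is one check):

```python
def normalize(batch):
    """Scale a batch of pixel values from [0, 255] to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in batch]

def test_normalize_range():
    # The output must stay inside the target range.
    batch = [[0, 128, 255]]
    out = normalize(batch)
    assert all(0.0 <= p <= 1.0 for row in out for p in row)

def test_normalize_keeps_shape():
    # Preprocessing must not silently change the batch shape.
    batch = [[0, 1], [2, 3]]
    out = normalize(batch)
    assert len(out) == len(batch)
    assert len(out[0]) == len(batch[0])
```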
Custom Environment and Containerization
"Hey, I've tested it on my computer and it works well."
"What? No dude, it fails on my computer. How the hell does it work on yours!?"
Ever experienced that? I have. One cause of that situation is a difference between your working environment and the others'. For example, you work on Windows and the others work on Linux. A difference between your libraries and theirs can also trigger the problem.
To solve that, you need to write your library dependencies explicitly in a file called requirements.txt, then use a Python virtual environment tool such as pipenv. This solves the library dependency problem. Nevertheless, it still cannot solve differences in the team's environments and operating systems. To solve those, you can use Docker.
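A requirements.txt is just a plain list of pinned dependencies; the packages and version numbers below are purely illustrative:

```text
# requirements.txt — pin exact versions so every machine resolves the
# same dependency graph (packages and versions are illustrative)
numpy==1.16.4
pandas==0.24.2
torch==1.1.0
```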
Docker is a container technology that can be used to set up a virtual environment. We can install library dependencies and set environment variables inside the container. With this, you won't have to fear errors caused by environment differences. Docker can also be a vital tool when we want to deploy the application, because it forces the deployment target to use the desired environment.
To share the container, we first write all the steps for creating the environment into a Dockerfile and build a Docker image from it. The image can be pushed to DockerHub; another person can then pull the image from DockerHub and run it on their machine.
To learn more about Docker, there is a good beginner-friendly article written by Preethi Kasireddy:
"If you're a programmer or techie, chances are you've at least heard of Docker: a helpful tool for packing, shipping…" (medium.freecodecamp.org)
Figure 17 is an example of how to create the Dockerfile.
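In the same spirit, a minimal Dockerfile for a Python project might look like the sketch below (the base image tag and the `train.py` entry point are illustrative assumptions, not from the course):

```dockerfile
# Start from a slim Python base image (tag is illustrative).
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the codebase and define the default command.
COPY . .
CMD ["python", "train.py"]
```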
Training and Debugging
Now we are at the Training and Debugging step. This is the step where you run the experiments and produce the model. Here are the substeps:
With your chosen Deep Learning framework, code the neural network with a simple architecture (e.g., a neural network with one hidden layer). Then use default hyperparameters, such as no regularization and the default Adam optimizer. Do not forget to normalize the input if needed. Finally, start with a simple version of the problem (e.g., a small dataset).
Implement and Debug
To implement the neural network, there are several tricks that you should follow sequentially.
1. Get the Model to Run
First, we should get the model we created with our DL framework to run, meaning that no exceptions occur up to the point where the weights are updated.
The exceptions that often occur are as follows:
- Shape Mismatch
- Casting Issue
- Out of Memory
2. Overfit a Single Batch
After that, we should overfit a single batch to see whether the model can learn or not. Overfitting here means that we do not care about validation at all and only focus on whether the model can learn according to our needs. We do this until the model overfits (accuracy near 100%). Here are common issues that occur in this process:
- Error goes up (can be caused by: learning rate too high, wrong sign on the loss function, etc.)
- Error explodes / goes NaN (can be caused by: numerical issues such as log or exp operations, a high learning rate, etc.)
- Error oscillates (can be caused by: corrupted data labels, learning rate too high, etc.)
- Error plateaus (can be caused by: learning rate too low, corrupted data labels, etc.)
3. Compare to a known result
After we make sure that our model trains well, we need to compare the results to other known results. Here is the hierarchy of known results:
We do this to make sure that our model can really learn the data and is on the right track for the task. We keep iterating until the model performs up to expectation.
We calculate the bias-variance decomposition from the error of our current best model, measured with the chosen metric. The formulas are as follows:
irreducible error = the error of the baseline
bias = training error - irreducible error
variance = validation error - training error
validation overfitting = test error - validation error
Here is an example of implementing the bias-variance decomposition.
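A minimal pure-Python sketch of that decomposition, directly following the formulas above (the function name and the sample error values are mine):

```python
def decompose_errors(baseline_error, train_error, val_error, test_error):
    """Split observed errors into the components described above.

    All inputs are error rates measured with the project's chosen metric.
    """
    irreducible_error = baseline_error
    bias = train_error - irreducible_error
    variance = val_error - train_error
    val_overfitting = test_error - val_error
    return {
        "irreducible error": irreducible_error,
        "bias": bias,
        "variance": variance,
        "validation overfitting": val_overfitting,
    }

# e.g. a human baseline of 2% error, train 8%, validation 12%, test 13%:
print(decompose_errors(0.02, 0.08, 0.12, 0.13))
```

A large bias suggests underfitting (train longer, bigger model), a large variance suggests overfitting (more data, regularization).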
Knowing the values of bias, variance, and validation overfitting helps us choose what to improve in the next step.
If the model has met the requirements, deploy it. If not, use the evaluation results to decide whether to improve the data or tune the hyperparameters. Consider looking at what goes wrong when the model predicts certain groups of instances. Iterate until the model satisfies the requirements (or you give up).
Here are some tools that can be helpful at this step:
Here we go again: version control. We have version control for the code and the data; now it is time to version the models. Here are the tools that can be used:
A version control system for a model's results, with a nice user interface and experience. It can save the parameters used by the model, samples of the model's results, and the model's weights and biases, all versioned. Furthermore, it can visualize the results of the model in real time. Moreover, we can revert the model to a previous run (including its weights), which makes it easier to reproduce models. It also scales well, since it can integrate with Kubeflow (Kubernetes for ML, which manages resources and services for containerized applications).
This is also a version control system for models. It saves the model's results and the hyperparameters used for an experiment in real time, and it can estimate when training will finish. It trains the model every time you push your code to the repository (on a designated branch). It also visualizes the model's results in real time.
For optimizing or tuning hyperparameters such as the learning rate, there are some libraries and tools available:
For the Keras DL framework: Hyperas
For the PyTorch DL framework: Hypersearch
Others: Hyperopt
WANDB also offers a solution for hyperparameter optimization, though you need to contact them first to enable it.
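Under the hood, the simplest of these strategies is random search; as a hedged illustration, it can be written in a few lines of pure Python (the toy objective below stands in for a real training-and-validation run):

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample hyperparameters at random and keep the best-scoring trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)  # e.g. validation loss of a short training run
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for "train a model, return validation loss":
def objective(params):
    return (params["learning_rate"] - 0.01) ** 2

space = {"learning_rate": (0.0001, 0.1)}
params, score = random_search(objective, space)
print(params, score)
```

Libraries like Hyperopt add smarter samplers (e.g. tree-structured Parzen estimators) on top of the same loop.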
The final step is this one. The substeps are as follows:
Piloting in production means verifying the system by testing it on a selected group of end users. By doing that, we hope to gain feedback on the system before fully deploying it. For testing, there are several kinds of tests you can run on your system besides unit and integration tests, for example penetration testing, stress testing, etc.
After we are sure that the model and the system have met the requirements, it is time to deploy the model. There are several ways to deploy:
- Web Server Deployment
- Embedded System and Mobile
Web Server Deployment
There are several strategies we can use to deploy to the web. Before that, we need to make sure that we create a RESTful API which serves predictions in response to HTTP requests (GET, POST, DELETE, etc.). The strategies are as follows:
- Deploy code to cloud instances; scale by adding instances.
- Deploy code as containers (Docker); scale via orchestration. The app code is packaged into Docker containers. Example: AWS Fargate.
- Deploy code as a "serverless function". The app code is packaged into zip files, and the serverless platform manages everything: instant scaling, requests per second, load balancing, etc. Unlike the two options above, with serverless functions you pay for compute time rather than uptime. Examples: AWS Lambda, Google Cloud Functions, and Azure Functions.
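As a sketch of the serverless option, an AWS Lambda-style handler serving predictions over HTTP might look like the following. The `fake_predict` function is a stand-in for a real model; only the `handler(event, context)` signature comes from AWS Lambda:

```python
import json

def fake_predict(text):
    # Stand-in for a real model's inference call.
    return {"label": "positive" if "good" in text else "negative"}

def handler(event, context):
    """Lambda-style entry point: parse the HTTP body, run the model,
    and return a JSON response."""
    body = json.loads(event["body"])
    prediction = fake_predict(body["text"])
    return {
        "statusCode": 200,
        "body": json.dumps(prediction),
    }

# Simulate an API Gateway request locally:
event = {"body": json.dumps({"text": "this is good"})}
print(handler(event, None))
```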
Embedded System and Mobile
To deploy to embedded systems or mobile, we can use Tensorflow Lite. It is smaller, faster, and has fewer dependencies than Tensorflow, so it can be deployed on embedded systems or mobile. Unfortunately, it has a limited set of operators.
There is also a tool called TensorRT. It optimizes the inference engine used for prediction, thus speeding up the inference process. It is built on CUDA. On embedded systems, the NVIDIA Jetson TX2 works well with it.
ONNX (Open Neural Network Exchange) is an open-source format for Deep Learning models that makes it easy to convert models between supported Deep Learning frameworks. ONNX supports Tensorflow, PyTorch, and Caffe2. It lets us mix frameworks, so that a framework that is good for development (PyTorch) does not need to be good at deployment and inference (Tensorflow / Caffe2).
If you deploy the application to a cloud server, there should be a monitoring solution. We can set alarms for when things go wrong by recording events in the monitoring system. With this, we will know what can be improved in the model and can fix problems.
In this article, we got to know the steps of doing Full Stack Deep Learning according to the FSDL March 2019 course. First, we set up and plan the project, defining the goals, metrics, and baseline. Then we collect the data and label it with the available tools. In building the codebase, there are tools, described above, that can maintain the quality of the project. Then we do the modeling with testing and debugging. Finally, after the model meets the requirements, we know the steps and tools for deploying and monitoring the application on the desired interface.
That's it, my article about the tools and steps introduced by the course. Why did I write this article? I found out that I can more easily remember and better understand content if I write about it. Moreover, in the process of writing, I got a chance to review the content of the course. Furthermore, it lets me share my knowledge with everyone. I am happy to share something good with everyone :).
I gained a lot of new things from the course, especially about the tools of the Deep Learning stack. I also got to know how to troubleshoot Deep Learning models, since they are not easy to debug. It also taught me the tools, steps, and tricks of doing Full Stack Deep Learning. To sum it up, it's a great course and free to access, so I recommend it to anyone who wants to learn how to do a Deep Learning project.
To be honest, I haven't tried all the tools in this article. The tools and descriptions presented here are taken from the FSDL course and some sources that I've read. Tell me if there is any misinformation, especially about the tools.
I welcome any feedback that can improve me and this article. I'm still learning to write and to become better, and I appreciate feedback that helps with that. Make sure to give feedback in a proper manner 😄.
See ya in my next article.
References:
- "Pilot Testing is verifying a component of the system or the entire system under real-time operating conditions" (guru99.com)
- "Many enterprises organize data in disparate silos, making it much more challenging to ask questions that require data…" (medium.com)
- https://docs.google.com/presentation/d/1yHLPvPhUs2KGI5ZWo0sU-PKU3GimAk3iTsI38Z-B5Gw/ (presentation at ICLR 2019 about reproducibility by Joel Grus). Figures 14 and 16 are taken from this source.
- Full Stack Deep Learning (fullstackdeeplearning.com): hands-on program for software developers familiar with the basics of deep learning seeking to expand their skills. The other figures are taken from this source.