Guide to become a Full Stack Machine Learning Engineer(Part 1): Train

Source: Deep Learning on Medium


Objective: To create a Machine Learning prediction model, serve it via an HTTP server on a remote cloud server, and manage it as you would in production.

This is a hands-on guide where we will write code from scratch, with explanations of all the code and jargon.
The final code for reference can be found here

This is part 1 of a 4 part series:

  1. Training a very basic ML model (which tells whether there is any beer in a given image) using Keras (a Python deep-learning library). We won’t go into much depth; this part is mostly intuition based.
  2. Writing an independent predict function which serves the model, then exposing the prediction via an HTTP POST API. (here)
  3. Dockerising the prediction API: setting up a Dockerfile, building our own Docker image, and deploying the prediction container on a cloud machine. We will be able to call the prediction API from anywhere and inspect the status and logs of the server for debugging. (here)
  4. Setting up a basic CI/CD pipeline, i.e. setting up our infra such that performing a commit automatically deploys the new code to production. (here)

Project structure

├── data/                       # This folder contains the training data
│   └── dataset/
│       ├── train/
│       │   ├── beer/
│       │   └── not_beer/
│       └── validate/
│           ├── beer/
│           └── not_beer/
├── scripts/                    # Standalone scripts
├── src/                        # Source files
│   ├── training/
│   ├── serving/
│   └── prediction/
├── test/                       # Automated tests
├── .gitignore                  # Files to be ignored in commits
├── Makefile
├── serving-requirements.txt
└── training-requirements.txt

Start by creating a blank folder with mkdir to-beer-or-not-to-beer and cd into it.

Dependency Management

Then create a sandbox environment using virtualenv (pip install virtualenv). This lets us experiment with pip package versions different from the ones installed system-wide. Create it with virtualenv training-venv (a folder named training-venv gets created which will contain all your pip packages) and activate it with source ./training-venv/bin/activate

Install the dependencies we will need for training:
pip install numpy==1.17.4 tensorflow==1.14 keras==2.2.4 pillow==6.2.1 google-images-download==2.8.0

It is helpful to keep track of all the packages you use in a project, along with their version numbers, so that your code is easy to set up and breaking changes in your dependencies don’t break your working code (this happens more often than you would imagine!).

To record the packages you are using, run pip freeze > training-requirements.txt . This creates a new file, “training-requirements.txt”, listing the packages you have installed and their version numbers. Reinstalling them is as simple as running pip install -r training-requirements.txt .
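For reference, the generated training-requirements.txt for the packages installed above would look roughly like this (pip freeze will also pin the transitive dependencies, so your actual file will be longer):

```
numpy==1.17.4
tensorflow==1.14.0
Keras==2.2.4
Pillow==6.2.1
google-images-download==2.8.0
```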

Collecting the data

Now let’s start collecting the data. Create the scripts directory and write a web scraper which downloads images from Google for a given keyword. Using it you can download your beer and not-beer images. I used this script to download 1000 images of “beer” and 1000 images of “not_beer” for training, and 200 each for validation. Make sure you put the data into the folders shown in the project structure above.

# ./scripts/scrapers/google/
from google_images_download import google_images_download

def image_download(output_dir, keywords, number_images):
    """Download images from Google.

    keywords : the keywords being searched
    number_images : total number of images to be downloaded
    """
    response = google_images_download.googleimagesdownload()
    arguments = {"keywords": keywords,
                 "limit": number_images,
                 "print_urls": True,
                 "output_directory": output_dir}  # arguments for the scraper
    paths = response.download(arguments)  # run the download
    print(paths)  # absolute paths of the downloaded images

image_download('./data/datasets/train/beer', 'beer', 100)

Managing the configuration centrally

Before we jump into training, let’s define a file which we will use to manage the model-related configuration. To experiment with parameters, we will only have to tweak this file, which helps keep the code clean.

# ./src/config.py
dataset_dir = "data/datasets/"
train_data_dir = "data/datasets/beer/train/"
validation_data_dir = "data/datasets/beer/validation/"
trained_model_name = "latest_model_ekdum_kadak"
trained_model_file = "trained_models/beer/latest.h5"
epochs = 50
img_width = 150
img_height = 150
test_img_dir = "data/datasets/beer/test_validation/"
test_img_dir_file = "data/datasets/beer/test_validation/beer.007.jpg"

Training the model

Now that we have the data, we are ready to train the model.
The right tool for an image classification job is a convnet, so let’s try to train one on our data as an initial baseline. Since we only have a few examples, our number one concern should be overfitting. Overfitting happens when a model exposed to too few examples learns patterns that do not generalise to new data, i.e. when the model starts using irrelevant features for making predictions. For instance, if you, as a human, only see three images of people who are lumberjacks and three images of people who are sailors, and among them only one lumberjack wears a cap, you might start thinking that wearing a cap is a sign of being a lumberjack as opposed to a sailor. You would then make a pretty lousy lumberjack/sailor classifier.

Data augmentation is one way to fight overfitting, but it isn’t enough since our augmented samples are still highly correlated. Your main focus for fighting overfitting should be the entropic capacity of your model — how much information your model is allowed to store. A model that can store a lot of information has the potential to be more accurate by leveraging more features, but it is also more at risk to start storing irrelevant features. Meanwhile, a model that can only store a few features will have to focus on the most significant features found in the data, and these are more likely to be truly relevant and to generalise better.

There are different ways to modulate entropic capacity. The main one is the choice of the number of parameters in your model, i.e. the number of layers and the size of each layer. Another is weight regularisation, such as L1 or L2 regularisation, which consists in forcing model weights to take smaller values.
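As a hedged sketch (this toy model is not part of this project’s code), adding L2 weight regularisation to a Keras layer looks like this:

```python
from keras import regularizers
from keras.layers import Dense
from keras.models import Sequential

# A toy model where the Dense layer's weights are penalised by
# 0.01 * sum(w^2) in the loss, nudging them toward smaller values.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,),
                kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
```

We won’t need explicit weight regularisation for our small convnet below; dropout and data augmentation will do the heavy lifting.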

In our case we will use a very small convnet with few layers and few filters per layer, alongside data augmentation and dropout. Dropout also helps reduce overfitting, by preventing a layer from seeing twice the exact same pattern, thus acting in a way analogous to data augmentation (you could say that both dropout and data augmentation tend to disrupt random correlations occurring in your data).

The code snippet below is our first model: a simple stack of 3 convolution layers with ReLU activations, each followed by a max-pooling layer, topped with a small fully-connected head. This is very similar to the architectures that Yann LeCun advocated in the 1990s for image classification (with the exception of ReLU).

from src import config
import os
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K

if __name__ == "__main__":
    print("Starting training using train_data_dir->", config.train_data_dir)
    train_data_dir = config.train_data_dir
    validation_data_dir = config.validation_data_dir
    nb_train_samples = 2000
    nb_validation_samples = 400
    epochs = config.epochs
    batch_size = 16

    if K.image_data_format() == 'channels_first':
        input_shape = (3, config.img_width, config.img_height)
    else:
        input_shape = (config.img_width, config.img_height, 3)

    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    # this is the augmentation configuration we will use for training
    train_datagen = ImageDataGenerator(rescale=1. / 255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
    # this is the augmentation configuration we will use for validation:
    # only rescaling
    test_datagen = ImageDataGenerator(rescale=1. / 255)

    train_generator = train_datagen.flow_from_directory(train_data_dir, target_size=(config.img_width, config.img_height), batch_size=batch_size, class_mode='binary')
    validation_generator = test_datagen.flow_from_directory(validation_data_dir, target_size=(config.img_width, config.img_height), batch_size=batch_size, class_mode='binary')

    model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size, epochs=epochs, validation_data=validation_generator, validation_steps=nb_validation_samples // batch_size)

    # save the architecture and the trained weights to a single .h5 file
    os.makedirs(os.path.dirname(config.trained_model_file), exist_ok=True)
    model.save(config.trained_model_file)

For data preprocessing and augmentation, ImageDataGenerator offers the following options (our training script uses rescale, shear_range, zoom_range and horizontal_flip):

  • rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures
  • width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally
  • rescale is a value by which we will multiply the data before any other processing. Our original images consist of RGB coefficients in the 0–255 range, but such values would be too high for our models to process (given a typical learning rate), so we target values between 0 and 1 instead by scaling with a 1/255. factor.
  • shear_range is for randomly applying shearing transformations
  • zoom_range is for randomly zooming inside pictures
  • horizontal_flip is for randomly flipping half of the images horizontally –relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
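To build intuition for these options, you can instantiate a generator with them and pull a few randomly augmented variants of an image out of it. This is just a sketch; the random array below stands in for a real photo:

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # rotate up to 40 degrees
    width_shift_range=0.2,   # shift horizontally by up to 20% of the width
    height_shift_range=0.2,  # shift vertically by up to 20% of the height
    rescale=1. / 255,        # scale RGB values into [0, 1]
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')     # fill pixels exposed by shifts/rotations

# A dummy 150x150 RGB "image" stands in for a real photo here.
x = np.random.randint(0, 256, size=(1, 150, 150, 3)).astype('float32')

# flow() yields endless batches of randomly augmented images.
batches = datagen.flow(x, batch_size=1)
augmented = next(batches)
print(augmented.shape)  # (1, 150, 150, 3)
```

Calling next(batches) repeatedly returns a differently transformed variant of the same image each time, which is exactly what the training generator feeds the network.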

After this code finishes executing, if everything goes well, a trained model file will be created at “./trained_models/beer/latest.h5”. The network trains faster on a machine with a GPU, but predictions don’t need one: they only need the trained weights and the architecture of the model. So we will use this generated file to run predictions separately, independent of training.

Getting the perfect model requires many iterations

Note that this is just a basic model to build intuition. Using transfer learning on a pre-trained model would likely give much better accuracy, but this is enough to begin with; we can always come back later and train a better model. In the end a “<something>.h5” file gets generated which we will use for prediction. It can be used directly, as it contains both the network architecture and the weights of the neurons of the model.
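To see that a single .h5 file really does carry both architecture and weights, here is a self-contained sketch with a deliberately tiny toy model (not our beer classifier) that is saved and restored in one round trip:

```python
import os
import tempfile
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# Build and save a tiny model to a temporary .h5 file.
model = Sequential([Dense(1, activation='sigmoid', input_shape=(4,))])
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
path = os.path.join(tempfile.mkdtemp(), "tiny.h5")
model.save(path)

# load_model needs no Python class definitions or compile arguments --
# everything it needs is inside the file.
restored = load_model(path)

# The restored model reproduces the original's predictions exactly.
x = np.ones((1, 4), dtype='float32')
print(np.allclose(model.predict(x), restored.predict(x)))  # True
```

The prediction service in part 2 will do the same load_model call on our generated “latest.h5”, with no training code in sight.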

In the next parts we will serve it via an HTTP server on a remote cloud server and manage it as you would in production.