Fastai to prevent distracted city-goers from walking into on-coming traffic

I recently came across a few articles about young people walking into oncoming traffic while distracted by their phones, and I thought that surely, with all the advances in technology and machine learning, there is a way to prevent this in at least many cases. I decided to build a very simple model to show how it could be done, and after seeing how simple it can be, hopefully the makers of smartphones such as Apple and Google will think about implementing similar functionality in their mobile operating systems.

In this article we will walk through each step in the process. The code is written in Python and implemented inside a Jupyter notebook. If you find it interesting and want to check out the full details you can find the repo here.

We look at each step, starting with labelling a dataset that we gather ourselves. This is one of the most important steps and is almost never covered in machine learning books or articles. Then we move on to training and evaluating our model.

So let's get started.

The first step is to gather and label some data. The data for this example is a continuous video that I asked my sister to take while walking to the station in the city. I asked someone else to do this so that I didn't unintentionally influence the angle or direction the camera was pointing. I told her to do everything she would normally do while Instagramming, texting or whatever, just recording instead. After insisting that I was “so annoying”, she agreed.

I have created a few helpful classes for this project to make labelling and evaluating easier. You can find the implementation of these objects in the repo here.

First we will create our Labeler object.

from helper_classes import Labeler

VIDEO_FILE_NAME = 'video.mp4'
LABELS_FILE_NAME = 'labels.csv'
TRAIN_PCT = .9

labeler = Labeler(VIDEO_FILE_NAME, LABELS_FILE_NAME, TRAIN_PCT);

The Labeler object uses the ipywidgets module to render GUI elements directly into our Jupyter notebook, which is pretty cool!
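
If you are curious how a tool like this can be put together, here is a minimal sketch of the idea, assuming we simply attach a click handler to one button per class; the actual implementation lives in the helper classes in the repo.

import ipywidgets as widgets
from IPython.display import display

# One button per class; clicking records a label for the current frame.
# (In the real Labeler, clicking would also advance to the next video frame.)
recorded_labels = []
safe_btn = widgets.Button(description='SAFE', button_style='success')
danger_btn = widgets.Button(description='DANGER', button_style='danger')

def on_click(btn):
    recorded_labels.append(btn.description.lower())

safe_btn.on_click(on_click)
danger_btn.on_click(on_click)
display(widgets.HBox([safe_btn, danger_btn]))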

With this simple tool we can give each frame an appropriate label. This project is a simple binary classification problem, so we only want to label each frame as either SAFE, indicating it is safe to walk forward, or DANGER, indicating we are about to walk onto a road. When we have finished labelling each frame we hit save, and then we are ready to roll.

The labels are stored as a pandas DataFrame and can be inspected like so,

labeler.labels.head()

We can also inspect the unique classes contained within our labels DataFrame to make sure they are what we expect.

labeler.get_classes()

output:
['safe', 'danger']
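
It is also worth a quick check that the two classes are reasonably balanced. This is not part of the Labeler, just plain pandas, and it assumes the DataFrame stores the class in a column named 'label':

labeler.labels['label'].value_counts()  # frame counts per class ('label' column name is an assumption)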

Next we need to create the directory structure for the dataset. We will be creating our fastai DataBunch using the built-in from_folder method, so we need a directory structure of the following form.

data/
    train/
        class1/
        class2/
        ...
    valid/
        class1/
        class2/
        ...

More information on this can be found in the fastai documentation here. Our Labeler class will do this for us.
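
Under the hood this only takes a few lines with pathlib. The following is just a sketch of what clean_dir_structure roughly does, not the actual implementation:

from pathlib import Path

for split in ['train', 'valid']:
    for cls in ['safe', 'danger']:
        # parents=True also creates 'data/' if it doesn't exist yet
        Path('data', split, cls).mkdir(parents=True, exist_ok=True)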

labeler.clean_dir_structure()

output:
Dataset file structure
-------------
data/train/safe/
data/train/danger/
data/valid/safe/
data/valid/danger/

Now we have what we need to create the dataset.

labeler.create_dataset()
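
Behind the scenes, creating the dataset boils down to reading the video frame by frame and writing each frame as an image into the folder that matches its label. Here is a rough sketch with OpenCV; frame_labels and n_train are hypothetical stand-ins for the label lookup and train/validation split point that the Labeler keeps track of.

import cv2

cap = cv2.VideoCapture('video.mp4')
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    label = frame_labels[frame_idx]  # 'safe' or 'danger' (hypothetical lookup)
    split = 'train' if frame_idx < n_train else 'valid'  # hypothetical split point
    cv2.imwrite(f'data/{split}/{label}/{frame_idx:05d}.jpg', frame)
    frame_idx += 1
cap.release()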

Next, create a fastai DataBunch.

from fastai import *
from fastai.vision import *
data = ImageDataBunch.from_folder(path='data/', ds_tfms=get_transforms()).normalize(imagenet_stats)
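
from_folder picks up the train and valid folders we created above, applies the default fastai augmentations from get_transforms(), and normalizes the images with the ImageNet statistics that our pre-trained model expects. If you want more control, you can also pass an image size and batch size; the values below are purely illustrative.

data = ImageDataBunch.from_folder(path='data/', ds_tfms=get_transforms(),
                                  size=224, bs=32).normalize(imagenet_stats)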

Show a few random samples from the dataset.

data.show_batch(2)

Woo, now we have everything we need to move on to training a model.

Establishing a baseline

Our Evaluator object will allow us to generate a video of the results, so we can see with our own eyes how well the model performs instead of just looking at the accuracy.

from helper_classes import Evaluator
evaluator = Evaluator()

Create a fastai Learner with a pre-trained ResNet50.

learn = cnn_learner(data, models.resnet50, metrics=accuracy)

Let's evaluate our model before we start training. We want to do this to establish a baseline, so that we can see how much the trained model improves upon what is essentially random guessing.

The generate_result function evaluates the validation images in order, so that we can write each image, with an overlay showing the prediction, to a video file.
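
Conceptually, for every validation frame it predicts a class, draws that prediction onto the frame and appends the frame to an output video. A hedged sketch of the idea with OpenCV looks like this (the real code lives in the Evaluator class, and the resolution and codec here are just placeholders):

import cv2
from pathlib import Path

size = (1280, 720)  # illustrative output resolution
out = cv2.VideoWriter('result.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, size)
# Sort by file name so the frames come out in their original order
for img_path in sorted(Path('data/valid').glob('*/*.jpg')):
    pred_class, _, _ = learn.predict(open_image(img_path))
    frame = cv2.resize(cv2.imread(str(img_path)), size)
    cv2.putText(frame, str(pred_class), (30, 60),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 3)
    out.write(frame)
out.release()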

evaluator.generate_result(learn=learn, name='result-1')
evaluator.get_accuracy(learn)
output:
0.5460784435272217
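
This baseline accuracy is essentially what fastai itself would report; get_accuracy is just a thin wrapper around something like the following, since Learner.validate returns the validation loss followed by whatever metrics we passed to cnn_learner (accuracy in our case).

loss, acc = learn.validate()
print(f'validation accuracy: {float(acc):.4f}')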

Let's see how it looks.

evaluator.show('result-1')

As we can see, the model's predictions look a lot like random guessing.

Training the model

The fastai library comes with a tool that helps us find an appropriate learning rate for our model.

learn.lr_find()
learn.recorder.plot()

I chose an initial value of 1e-02. It has a relatively low loss and sits in a section of steep decline. Choosing the correct learning rate with this method is not an exact science. If you are interested in finding out more, definitely check out the fastai course here. Jeremy Howard, the person who runs the course (our lord and saviour), is the best ML teacher out there.

Train the model.

from fastai.callbacks import *

learn.fit_one_cycle(6, max_lr=slice(1e-02), callbacks=[
    SaveModelCallback(learn, monitor='accuracy', mode='max', name='model-1')
])

Load the best model from the training cycle.

learn = learn.load('model-1')

Unfreeze the model. As mentioned before, the model we created was a pre-trained ResNet50. In the first training cycle we only trained a small part of the model, essentially a few layers added to the very end of the ResNet50, while the pre-trained part was frozen. The reason we do this is that the weights in the pre-trained part are already pretty close to what we want them to be.

learn.unfreeze()

Continue training the entire model.

learn.fit_one_cycle(6, max_lr=slice(1e-05, 1e-03), callbacks=[
    SaveModelCallback(learn, monitor='accuracy', mode='max', name='model-2')
])

Load the best model from the second stage of training.

learn = learn.load('model-2')

Evaluate our trained model's performance.

evaluator.generate_result(learn=learn, name='result-2')
evaluator.get_accuracy(learn)
output:
0.8264706134796143

Not bad. Let's have a look.

evaluator.show('result-2')

The model performs pretty well, awesome!

Conclusion

So there we go. We took a short video, labelled our own data, established a baseline, trained a model and evaluated its performance. I had a lot of fun building this little project, and I think that the big smartphone providers should definitely look into something like this.

All of the code can be found on my GitHub here.

I hope you enjoyed my first article 🙂 If you have any thoughts you would like to share, please comment below.