Football Commentaries for The Visually Impaired

Source: Deep Learning on Medium


Around the world, according to the World Health Organization, 217 million people suffer from moderate to severe vision impairment, and another 36 million are blind. This means these people cannot enjoy even the little things we take for granted, such as watching a football match on a laptop, without assistance from someone else.

In this video, someone in Colombia created a way of communicating the situation of a football match to his blind friend on a small football board. However, this is only one person; what about the many others? How can we scale this kind of help up? You may think translating traditional commentaries would be enough, but these commentaries tend to focus on the players. I want to explain the general situation in a match using deep learning and computer vision. This is my final project at the Metis Bootcamp. Let us start with the methodology.


First, I acquire a video of a football match; in this case, a 2017 match between Real Madrid and Barcelona. Then I need to detect the objects we are interested in. I use the state-of-the-art YOLO (You Only Look Once) version 3 deep learning model, which is trained on the Microsoft COCO dataset of 80 everyday object classes. After detection, we need a way to classify which team each person belongs to. Finally, I extract data to generate heuristic commentaries and translate them into braille.

How did I do it? I mainly use three tools.

Image Preprocess and Object Detection

With the methodology explained, let us see the actions in detail. As we know, a video is composed of images, or frames. For my purpose, I am only interested in frames that depict an overview of the match with at least five to six players, such as these two. Now I know which frames I want to use.
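A simple way to select frames is to sample the video at a fixed interval rather than processing every frame. Here is a minimal sketch of that sampling logic; the function name and the one-frame-per-second default are my own choices, not from the original project:

```python
def sample_frame_indices(total_frames, fps, every_n_seconds=1.0):
    """Return the indices of frames to keep, one every `every_n_seconds`.

    total_frames: number of frames in the video
    fps: frames per second of the video
    """
    step = max(1, int(fps * every_n_seconds))  # never step by zero
    return list(range(0, total_frames, step))
```

In practice, each selected frame would then be checked for the "overview with five to six players" condition before being passed on to detection.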

I need to detect the objects I am interested in. In a football match, the most prominent objects of interest are the players and the ball. Using YOLO version 3 with a confidence threshold of 40%, the model can detect most of the players on the field.
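After YOLO produces raw detections, the relevant ones are filtered by class and confidence. In the 80-class COCO label list used by YOLOv3, `person` is class 0 and `sports ball` is class 32. The sketch below assumes a simple detection tuple format of my own choosing, `(class_id, confidence, box)`:

```python
PERSON, SPORTS_BALL = 0, 32  # COCO class ids in the YOLOv3 label list

def filter_detections(detections, conf_threshold=0.4):
    """Keep only person / sports-ball detections at or above the
    confidence threshold. Each detection is (class_id, confidence,
    (x, y, w, h))."""
    keep = {PERSON, SPORTS_BALL}
    return [d for d in detections if d[0] in keep and d[1] >= conf_threshold]
```

The 40% threshold matches the one used in the project; a higher threshold misses distant players, while a lower one admits false positives from the crowd.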

Team Player Classification

Now let us look even closer. How do we know which team a person belongs to? I tested four methods. The first two use histogram distributions and measure similarity by either Euclidean distance or cosine similarity, but the results are not satisfactory. So I tried reducing each image to a 4-by-4-pixel thumbnail and classifying by Euclidean distance; the result is okay. However, can we improve it further? Finally, I used color masking, which produces a better result of 92% accuracy.
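The 4-by-4 thumbnail method can be sketched as follows: shrink each player crop by block averaging, then assign it to the reference team whose thumbnail is nearest in Euclidean distance. The function names and the block-averaging implementation are my own; the original may have used a library resize instead:

```python
import numpy as np

def thumbnail(img, size=4):
    """Shrink an (H, W, 3) image to (size, size, 3) by block averaging."""
    h, w, _ = img.shape
    ys = np.linspace(0, h, size + 1, dtype=int)
    xs = np.linspace(0, w, size + 1, dtype=int)
    out = np.zeros((size, size, 3))
    for i in range(size):
        for j in range(size):
            out[i, j] = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(0, 1))
    return out

def classify(crop, references):
    """Assign the crop to the reference thumbnail (a dict of
    label -> (size, size, 3) array) with the smallest Euclidean distance."""
    t = thumbnail(crop).ravel()
    return min(references, key=lambda k: np.linalg.norm(t - references[k].ravel()))
```

Shrinking to 4 by 4 pixels discards pose and background detail while keeping the dominant kit color, which is why it outperforms the raw histogram comparisons.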

To share a bit more about color masking: there are five groups of people of interest, namely the two teams, their respective goalkeepers, and the referee. Using the RGB colors in a sample, I can calculate which group a person belongs to.
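Color masking can be sketched as counting, for each group, the fraction of pixels in a crop that falls inside that group's RGB range, and picking the group with the largest fraction. The RGB ranges and group labels below are hypothetical placeholders; the real values would be sampled from the actual match footage:

```python
import numpy as np

# Hypothetical (low, high) RGB ranges for the five groups of interest.
GROUPS = {
    "team_a":   ((200, 200, 200), (255, 255, 255)),  # e.g. white kit
    "team_b":   ((120, 0, 60), (255, 80, 160)),      # e.g. dark red kit
    "keeper_a": ((0, 120, 0), (100, 255, 100)),      # e.g. green kit
    "keeper_b": ((200, 120, 0), (255, 200, 80)),     # e.g. orange kit
    "referee":  ((0, 0, 0), (60, 60, 60)),           # e.g. black kit
}

def mask_fraction(img, low, high):
    """Fraction of pixels whose R, G, and B all lie inside [low, high]."""
    low, high = np.array(low), np.array(high)
    mask = np.all((img >= low) & (img <= high), axis=-1)
    return mask.mean()

def classify_by_mask(img, groups=GROUPS):
    """Assign the crop to the group whose color mask covers the most pixels."""
    return max(groups, key=lambda g: mask_fraction(img, *groups[g]))
```

Because the mask only counts pixels inside a fixed kit-color range, grass and crowd pixels are ignored, which helps explain the jump to 92% accuracy.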

Commentaries Generation

Finally, I can generate some data. Using the data, I calculate further analytics, such as the distance between teams and ball possession. Then I deduce the situation of the match, including which team is attacking and which team is holding the ball. The English commentaries are generated based on these actions.
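One possible possession heuristic is to treat the team whose player is closest to the ball as the one holding it, and to phrase a commentary line from that. This is my own minimal sketch of such a rule, not the project's exact logic:

```python
import math

def possession(ball, players):
    """players: {team: [(x, y), ...]} pixel positions per team.
    The team whose nearest player is closest to the ball holds it."""
    def nearest(team):
        return min(math.dist(ball, p) for p in players[team])
    return min(players, key=nearest)

def commentary(ball, players):
    """Turn the possession deduction into a plain-English comment."""
    return f"{possession(ball, players)} has the ball."
```

Analytics such as the distance between the teams' centroids can be derived from the same position data to decide which team is attacking.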

Once I have the commentaries, I can process them according to these steps and translate them into braille. The visually impaired user can feel the braille using a refreshable braille display.
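The braille step can be sketched with Unicode braille patterns: the Braille Patterns block starts at U+2800 and encodes dots 1 through 6 as bits 0 through 5, so each letter's character can be computed from its raised dots. This is a minimal Grade 1 sketch covering letters and spaces only; real braille output (numbers, capitals, contractions) needs more rules:

```python
# Grade 1 braille: letter -> raised dots (1..6).
DOTS = {
    'a': (1,), 'b': (1, 2), 'c': (1, 4), 'd': (1, 4, 5), 'e': (1, 5),
    'f': (1, 2, 4), 'g': (1, 2, 4, 5), 'h': (1, 2, 5), 'i': (2, 4),
    'j': (2, 4, 5), 'k': (1, 3), 'l': (1, 2, 3), 'm': (1, 3, 4),
    'n': (1, 3, 4, 5), 'o': (1, 3, 5), 'p': (1, 2, 3, 4),
    'q': (1, 2, 3, 4, 5), 'r': (1, 2, 3, 5), 's': (2, 3, 4),
    't': (2, 3, 4, 5), 'u': (1, 3, 6), 'v': (1, 2, 3, 6),
    'w': (2, 4, 5, 6), 'x': (1, 3, 4, 6), 'y': (1, 3, 4, 5, 6),
    'z': (1, 3, 5, 6),
}

def to_braille(text):
    """Translate English text to Unicode braille cells (letters and
    spaces only; other characters are dropped in this sketch)."""
    out = []
    for ch in text.lower():
        if ch in DOTS:
            # Dots 1..6 map to bits 0..5 of the U+2800 code point.
            out.append(chr(0x2800 + sum(1 << (d - 1) for d in DOTS[ch])))
        elif ch == ' ':
            out.append('\u2800')  # blank braille cell
    return ''.join(out)
```

The same Unicode cells can be printed to a terminal, as in the demo, or sent to a refreshable braille display.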

Final Demo

Here is a demo of the final product. As the game goes on, the team players and the ball are identified, and commentaries are generated. I chose to display the braille commentaries in the terminal along with the English text and the time of each comment.

Areas of Improvement

  • With a more powerful GPU, we can enable real-time commentary.
  • I can train the model to detect more objects, such as the goal, to identify scoring.
  • Last but not least, if I could use a stationary camera feed that captures most of the football field, I could generate more analytics for better commentaries.
