Predicting football results using neural networks

Source: Deep Learning on Medium

Author: Andre Ostrak

The aim of this project is to see if neural networks can help predict outcomes of football matches. We have a database of over 25 000 matches from 11 different leagues in Europe from seasons 2008 to 2016 (the data is from the following Kaggle competition).

The data is divided into different tables, the most important of which are:

  1. Match (25979 rows, 115 columns): consists of match IDs, IDs of the playing teams, match results, squad formations, dates and the coefficients (betting odds) of different bookies.
  2. Player attributes (183978 rows, 42 columns): player IDs and attributes for different seasons.
  3. Team attributes (1458 rows, 25 columns): team IDs and attributes for different seasons.

Since there is too much data to feed into a single prediction network, I had to make a choice about what I was going to focus on. I chose to try using player attributes, squad formations, team attributes and the results of the previous games of the playing teams.

To test my prediction models, I used two different measures. First, I looked at what share of the outcomes the model predicts correctly. Second, I tried to see if the predictions were good enough to earn some money by betting with the bookies.

Player attributes

The first idea I had was to see if the player attributes could be used to predict the results. Looking around the internet, the overall ratings of players (with squad formations) already seem to have been used for a successful prediction model in the following post.

However, other player attributes don’t seem to have been used as successfully so far. Therefore, I set out to see if they could give us better results. The first step was to explore the data. It turns out that there are 38 attributes for each player, so with 22 players on the field the number of attributes for one match is 38 × 22 = 836. Since this seemed a bit much, I looked for attributes that could be derived from the others, as well as attributes that might be less significant than the rest.

What I learned by exploring the data:

  1. The data in the attack and defense rate columns is often faulty. For this reason, I decided to drop them for all players.
  2. About 4000 matches are missing attributes for some players. I removed the rows that were missing the attributes for at least one player, leaving over 21 000 matches.

To see if there are some attributes that can be derived from others, and to see if there are attributes that are correlated to the result of the match, I drew some heat maps.

The following is the heat map of the attributes of the goalkeepers of the home teams. From this, we can see that the overall rating is correlated with the result of the match. We also see that the other attributes that are correlated with the result are also highly correlated with the overall rating (in this case, the goalkeeping statistics and reactions). The same turned out to be true for other players.

The following is the same heat map for defensive players:

Once again we see that reactions, potential and overall ratings are the most influential attributes, but they are also highly correlated. This leaves us with a paradoxical situation, where seemingly the most important attributes give us less additional information.
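As a minimal sketch of the numbers behind such a heat map, the block below computes pairwise Pearson correlations between a few columns. The column names, the toy values and the -1/0/1 encoding of the result are all assumptions made for illustration; they are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """columns: dict mapping a column name to its list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy values (hypothetical, not from the Kaggle data).
toy = {
    "overall_rating": [60, 65, 70, 75, 80],
    "reactions": [58, 66, 69, 77, 79],
    "home_result": [-1, 0, 0, 1, 1],  # assumed encoding: -1 loss, 0 draw, 1 win
}
matrix = correlation_matrix(toy)
```

A heat map is then just this matrix drawn with one colored cell per pair. When two attributes are strongly correlated with each other, as the overall rating and reactions are here, the second one adds little information beyond the first.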

With this knowledge, I decided to try training my models on just the overall ratings and on all the ratings. In addition, I tried to see if dropping the goalkeeping attributes for all the players except the goalkeepers would improve the prediction.

Player attributes

Setting up some baselines.

At first I tried to set up some very simple baselines. I divided my data into training and test data (with 90% being training and 10% test data). The results in the test data were divided as follows:

  1. home wins: 45.2% of the games
  2. draws: 25.4% of the games
  3. away wins: 29.2% of the games

So the very first baseline, always predicting a home win, is 45.2%. The next step was seeing how well a kNN classifier could predict the results if we just gave it the team IDs and the dates of the games. With the number of neighbors set to 150, the accuracy on the test data was 47.3%.
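The two baselines can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the labels "H", "D" and "A", the two-dimensional toy features and the squared Euclidean distance are hypothetical choices, and the post does not say how the team IDs and dates were encoded for the kNN.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == most_common)
    return hits / len(test_labels)


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    ranked = sorted(zip(train_x, train_y), key=lambda pair: squared_distance(pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): "H" home win, "D" draw, "A" away win.
train_labels = ["H", "H", "D", "A", "H"]
test_labels = ["H", "A", "H", "D"]
baseline = majority_baseline_accuracy(train_labels, test_labels)

train_x = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["H", "H", "A", "A", "A"]
prediction = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
```

With a large number of neighbors, such as the 150 used here, the vote is smoothed over many matches, so the kNN tends to behave like a locally adjusted version of the majority baseline.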

Now it was time to train kNN models with the player attributes, with and without squad formations (these were given as the X and Y coordinates of each player on the field). The results are as follows:

We can see that the best result came with just the overall ratings, giving us 53.7% accuracy on the test data. However, overall ratings with squad formations are almost equal, with 53.6% accuracy. These are already quite impressive. The next goal was to see if we could surpass them with neural networks.

Trying neural networks.

My first idea when approaching this task was to use convolutional neural networks. However, as I saw no natural way to group the data for the convolutional layers, I dropped this idea at first (and later came back to it with no success) and instead started out with fully connected networks. The architectures of the networks were usually very simple, with the first layer having 2^n nodes and each following layer having half the number of nodes of the previous one.

An example of a fully connected network I used. The number of nodes per layer in this instance is: 16, 8, 4, 3.
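The halving rule can be written down as a small helper. This sketch assumes the halving stops once the next layer would be no wider than the 3-node output layer, which reproduces the 16, 8, 4, 3 example; the stopping rule for the larger networks is not stated in the post, and the function names are hypothetical.

```python
def halving_layer_sizes(n, num_classes=3):
    """Layer widths starting at 2**n, halving each time, ending in num_classes."""
    sizes = []
    width = 2 ** n
    while width > num_classes:  # assumed stopping rule
        sizes.append(width)
        width //= 2
    sizes.append(num_classes)  # output layer: home win, draw, away win
    return sizes


def parameter_count(input_size, sizes):
    """Number of weights and biases in a fully connected network."""
    total = 0
    previous = input_size
    for width in sizes:
        total += previous * width + width  # weight matrix plus bias vector
        previous = width
    return total


small = halving_layer_sizes(4)
large = halving_layer_sizes(8)
# 22 inputs would correspond to one overall rating per player.
small_params = parameter_count(22, small)
large_params = parameter_count(22, large)
```

Counting parameters this way gives a rough feel for capacity: the network starting at 256 nodes has far more weights than the one starting at 16 for the same 22 inputs.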

The neural networks didn’t give very stable results, and the exact architecture usually didn’t change the result much. I used mean squared error as the loss function and Adam as the optimizer. I tried to gather the typical results in the following table.
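To make the loss concrete, here is a small sketch that assumes the loss is the mean squared error between the three network outputs and a one-hot encoding of the match result. The class order and the helper names are assumptions for illustration.

```python
CLASSES = ("home", "draw", "away")  # assumed output order


def one_hot(label):
    """Encode a result as a 3-element target vector."""
    return [1.0 if c == label else 0.0 for c in CLASSES]


def mse(prediction, target):
    """Mean squared error between an output vector and its target."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def accuracy(predictions, labels):
    """Share of matches where the largest output matches the true result."""
    hits = sum(
        1 for p, label in zip(predictions, labels)
        if CLASSES[p.index(max(p))] == label
    )
    return hits / len(labels)


outputs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = ["home", "draw"]
loss_first = mse(outputs[0], one_hot(labels[0]))
acc = accuracy(outputs, labels)
```

The loss and the accuracy measure different things: a prediction can lower the loss by being less confidently wrong without changing which outcome is picked, which is one reason accuracy can stay flat while the loss keeps moving.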

The typical results for the given inputs.

In the table, the biggest layer indicates the number of nodes in the biggest layer of the network (so with one type of input, I tried 5 or 6 different networks). When the biggest layer had 256 or 512 nodes, the models started overfitting quite fast and were able to capture enough information from the inputs to drive the training error close to zero. When the biggest layer was 16 or 32, the models didn’t seem to start overfitting at all, which led me to believe that they couldn’t capture all of the information in the data. Therefore, I gravitated towards models with the biggest layer being 64 or 128.

Some graphs (input: overall ratings and squad formations):

When the biggest layer had 16 nodes (architecture 16–8–4–3), the model seemed to reach a peak and stop training.
When the biggest layer had 64 nodes, the model seemed to start overfitting slowly.
When the biggest layer had 256 nodes, the model was able to capture more information from the data and started overfitting more.

In the end, the best results I got were with overall ratings (with or without formations). Keeping all or almost all of the attributes gave me wildly varying results on the test data and the accuracy was not as good as with just the overall ratings.

Team attributes

Trying to use team attributes seemed like the next logical step after the individual player attributes. However, looking through the team attributes more closely, I wasn’t very hopeful. There were 21 attributes for a team and 10 of them had numerical values. The attributes were also more descriptive of play style than of the skill of the team: a third of them describe the team’s build-up play, another third the team’s chance creation and the last third the team’s defense.

Nonetheless, I ran the kNN and some fully connected networks on these statistics and got around 47% accuracy, which was basically the same as we got with just the team IDs and match dates. Therefore, while team attributes might still be useful when added to other statistics, I decided to abandon them for the rest of my project.

Results of previous games

Lastly, I tried to see how well we can predict the results of the upcoming match from the results of the previous games of the home and away team.

For each match, the input size was 20: the results of the previous 5 home games and 5 away games of each of the two teams. Once again, I set up the baselines with kNN.
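The 20-value input described above could be assembled roughly as follows. This is a sketch under assumptions: the dictionary keys, the 1/0/-1 encoding of win, draw and loss, and the zero padding for teams with fewer than 5 earlier games are hypothetical choices, and the post does not say how such cases were handled.

```python
def result_for(team, match):
    """Assumed encoding from the team's point of view: 1 win, 0 draw, -1 loss."""
    if team == match["home"]:
        diff = match["home_goals"] - match["away_goals"]
    else:
        diff = match["away_goals"] - match["home_goals"]
    return (diff > 0) - (diff < 0)


def last_results(team, venue, history, n=5):
    """Results of the team's last n games at the venue ('home' or 'away')."""
    games = [m for m in history if m[venue] == team]
    results = [result_for(team, m) for m in games[-n:]]
    # Simplification: pad with 0 when fewer than n earlier games exist.
    return [0] * (n - len(results)) + results


def previous_results_features(home_team, away_team, history, n=5):
    """history: earlier matches only, in chronological order."""
    features = []
    for team in (home_team, away_team):
        for venue in ("home", "away"):
            features += last_results(team, venue, history, n)
    return features  # 2 teams * 2 venues * n results


history = [
    {"home": "A", "away": "B", "home_goals": 2, "away_goals": 0},
    {"home": "B", "away": "A", "home_goals": 1, "away_goals": 1},
    {"home": "A", "away": "C", "home_goals": 0, "away_goals": 1},
]
features = previous_results_features("A", "B", history)
```

Only matches played before the one being predicted should go into `history`; otherwise the input leaks the result it is meant to predict.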

Then I tried to train fully connected neural networks (the architectures were the same as with player attributes). The results were as follows.

The previous results didn’t give results as good as the overall ratings, but they did seem to be useful. This led me to try to train a network with overall ratings, formations and previous results as input.

Player attributes with previous results

Seeing that player attributes were the same throughout the season, I thought the previous results could give some information on how the team is doing during the season. Therefore, I decided to train networks on overall ratings, formations and the results of the previous games. Once again I tried my regular fully connected networks, with the biggest layers ranging from 16 to 512. However, the accuracy of the models stayed between 51 and 52%, giving us worse results than with just overall ratings.

My final try was to see if splitting the overall ratings and player attributes, feeding them into the models I had previously trained, concatenating the results and having one fully connected layer with 3 nodes at the end would work better. Since this was my only custom network, I add the code here.

I froze all the layers besides the final one and tried to train my new model. However, the results were still ranging from 51% to 52%. This led me back to the models with overall ratings.
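The following is not the original code of this network, only a minimal plain-Python sketch of the idea: two already trained sub-models are treated as fixed functions, their outputs are concatenated, and a single 3-node layer on top is the only part left to train. The stand-in sub-models, the weights and all names are hypothetical.

```python
import math


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def combined_forward(first_input, second_input, first_model, second_model,
                     final_weights, final_biases):
    """Frozen sub-models feed a trainable 3-node layer via concatenation."""
    merged = first_model(first_input) + second_model(second_input)
    return softmax(dense(merged, final_weights, final_biases))


# Stand-ins for the previously trained (frozen) networks.
def first_model(_inputs):
    return [0.6, 0.3, 0.1]


def second_model(_inputs):
    return [0.5, 0.2, 0.3]


# Only these parameters would be updated during training.
final_weights = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
final_biases = [0.0, 0.0, 0.0]

probabilities = combined_forward([], [], first_model, second_model,
                                 final_weights, final_biases)
```

Because the sub-models are frozen, the final layer can only reweight what they already output, so it cannot recover information that neither sub-model captured.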

Testing the best models with the bookies

Now that I had found my best models, it was time to see if they can predict the results well enough to earn money from the bookies.

First, let’s take the overall ratings with formations. I am going to see how much money I would make if I bet on the matches in the test data. We have the coefficients from 10 different bookies. Instead of trying to test my model on each of them individually, I took the maximum coefficient for the home win, maximum for draw and maximum for away win for each match, put one imaginary euro on my prediction for each match and tried to see how much I would have over the test data. However, before I did this, I set up some baselines.

First, let’s try to see how much we would get if we just bet on the home win each match. For this, I created a table that has the results, our predictions (in this case home wins) and the coefficients:

To get the winnings, we multiply the results, the predictions and the coefficients element-wise for each row and subtract 1 (this is the one euro we are betting). The result of betting just on the home team is that we lose 54.1 euros over 1971 games.
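The calculation can be sketched as follows, assuming decimal coefficients, one-hot results and predictions, and a one euro stake per match. The function names and the toy numbers are made up for illustration.

```python
OUTCOMES = ("home", "draw", "away")


def best_odds(bookmaker_odds):
    """Maximum coefficient per outcome over several bookies.

    bookmaker_odds: list of (home, draw, away) tuples, one per bookie.
    """
    return tuple(max(column) for column in zip(*bookmaker_odds))


def profit_per_match(result, prediction, odds, stake=1.0):
    """result * prediction * odds, summed over the outcomes, minus the stake."""
    result_vec = [1.0 if o == result else 0.0 for o in OUTCOMES]
    prediction_vec = [1.0 if o == prediction else 0.0 for o in OUTCOMES]
    payout = sum(r * p * c for r, p, c in zip(result_vec, prediction_vec, odds))
    return stake * payout - stake


def total_profit(results, predictions, odds_rows):
    """Sum of the per-match profits over a whole test set."""
    return sum(
        profit_per_match(r, p, o)
        for r, p, o in zip(results, predictions, odds_rows)
    )


# Toy example (made-up odds): always bet on the home team.
odds_rows = [
    best_odds([(2.0, 3.2, 3.5), (2.1, 3.1, 3.6)]),
    best_odds([(1.8, 3.4, 4.2), (1.7, 3.5, 4.5)]),
]
results = ["home", "draw"]
predictions = ["home", "home"]
profit = total_profit(results, predictions, odds_rows)
```

A correct bet returns the coefficient minus the stake, and a wrong one loses the whole stake, so a model needs to win often enough at high enough coefficients to cover its misses.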

The tally of our winnings by betting on the home team.

Next we try betting on the favorites, that is, on the result that has the lowest coefficient. With this strategy, we actually win 15.9 euros.
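Picking the favorite is a one-line rule. The sketch below assumes decimal coefficients, where the reciprocal of a coefficient is the probability it implies, so the lowest coefficient marks the outcome the bookies consider most likely. The names and numbers are hypothetical.

```python
OUTCOMES = ("home", "draw", "away")


def favorite(odds):
    """Outcome with the lowest coefficient in a (home, draw, away) tuple."""
    lowest = min(range(len(OUTCOMES)), key=lambda i: odds[i])
    return OUTCOMES[lowest]


def implied_probabilities(odds):
    """Reciprocals of the coefficients, rescaled so that they sum to 1."""
    raw = [1.0 / c for c in odds]
    total = sum(raw)
    return [r / total for r in raw]


pick = favorite((2.1, 3.2, 3.6))
probabilities = implied_probabilities((2.1, 3.2, 3.6))
```

The rescaling removes the bookies' margin, which is why the raw reciprocals usually add up to slightly more than 1.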

The tally of our winnings by betting on the favorite.

Next we try our kNN model. This gives 88.8 euros by the end, with quite a stable increase over the games.

The tally of our winnings by betting with the kNN model.

Finally, we try our luck with the fully connected network. This gives us 99.8 euros. With 1971 euros put in the game, this is an average return of about 5%.

The tally of our winnings by betting with the fully connected network model.

However, these were the results on one fixed training and test set, and they might not indicate the actual predictive quality as well as one might think. I trained my models a few more times, randomizing the training and test data, and there were times when these models lost money and times when they won money. However, the result was almost always better than our baselines (always betting on the home team or always betting on the favorites).

Finally, I set the last 2000 matches of the final season as the test data and the rest as training data, to see if we can predict the results of future games based on the previous matches. With neural networks, this gave around 51% accuracy, which was way worse than the results we got with randomized test and training data. However, reducing the size of the test data to 1200 matches already gives 52.5% accuracy. This leads me to believe that the networks get better over the course of the season and therefore predict better at the end of it. However, I didn’t have time to fully test this theory.
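The difference between the two ways of splitting the data can be sketched like this. The `date` key and the ISO date strings are assumptions about how the matches are stored, and the function names are hypothetical.

```python
import random


def random_split(matches, test_fraction=0.1, seed=0):
    """Shuffle the matches and hold out a fraction of them as test data."""
    shuffled = list(matches)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def chronological_split(matches, n_test):
    """Train on the earliest matches and test on the latest n_test."""
    ordered = sorted(matches, key=lambda m: m["date"])
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Toy matches with ISO dates, which sort correctly as plain strings.
matches = [{"id": i, "date": f"2016-01-{i + 1:02d}"} for i in range(10)]

train_random, test_random = random_split(matches)
train_chrono, test_chrono = chronological_split(matches, n_test=3)
```

With a random split, the training data contains matches played after some of the test matches, so the model can pick up information about a season it is being tested on. The chronological split is closer to real betting, where only past matches are available.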


In this project I tried to train neural networks to predict the results of football games based on the player attributes, team attributes and the results of the previous matches the teams have played. The best results came with just the overall ratings of the players (with or without the squad formations). To test the network, I set up some basic baselines and also trained a kNN model. It turned out that neural networks gave very similar results to the kNN (around 53.5% accuracy on the test set). The results also seemed to suggest that neural networks could be predicting the results of the games well enough to make money from betting.