Autoencoder & K-Means — Clustering EPL Players by their Career Statistics

Source: Deep Learning on Medium

I chose two neurons in the latent layer to reduce the input data to two dimensions, which makes visualization on the 2D plane simple and effective. However, contrary to my expectations, the outputs of the latent layer turned out to lie along the x-axis. The clusters are shown in different colors in the graph.
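As a rough illustration of the setup described above, here is a minimal autoencoder with a 2-neuron latent layer, written from scratch in NumPy. The input width (10 features), the single hidden layer, and the activation choices are all assumptions for the sketch; the article does not specify the full architecture, and the random matrix stands in for scaled player statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # stand-in for standardized player stats

n_in, n_hid, n_lat = X.shape[1], 8, 2   # 2-neuron latent layer, as in the article
lr = 0.01

# encoder: input -> hidden -> 2-d latent; decoder mirrors it
W1 = rng.normal(scale=0.5, size=(n_in, n_hid))
W2 = rng.normal(scale=0.5, size=(n_hid, n_lat))
W3 = rng.normal(scale=0.5, size=(n_lat, n_hid))
W4 = rng.normal(scale=0.5, size=(n_hid, n_in))

def forward(X):
    h1 = np.tanh(X @ W1)
    z = h1 @ W2                         # latent codes: one 2-d point per player
    h2 = np.tanh(z @ W3)
    X_hat = h2 @ W4                     # reconstruction of the input stats
    return h1, z, h2, X_hat

loss0 = np.mean((forward(X)[3] - X) ** 2)

# plain full-batch gradient descent on the MSE reconstruction loss
for _ in range(500):
    h1, z, h2, X_hat = forward(X)
    d_out = (X_hat - X) / len(X)
    g4 = h2.T @ d_out
    d_h2 = (d_out @ W4.T) * (1 - h2 ** 2)
    g3 = z.T @ d_h2
    d_z = d_h2 @ W3.T
    g2 = h1.T @ d_z
    d_h1 = (d_z @ W2.T) * (1 - h1 ** 2)
    g1 = X.T @ d_h1
    W1 -= lr * g1; W2 -= lr * g2; W3 -= lr * g3; W4 -= lr * g4

_, latent, _, recon = forward(X)
loss1 = np.mean((recon - X) ** 2)
print(latent.shape)                     # (200, 2): the 2-d codes that get plotted
```

The `latent` array is what ends up on the 2D scatter plot; in the article's case these points collapsed onto the x-axis, which suggests one of the two latent neurons carried little information.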

Summary of the clusters (number of players, the average number of appearances, wins and losses for each class)

Here are some reviews of each cluster (named ‘class’ here). Other statistics may well have influenced the clustering, but I arbitrarily chose three to compare across classes — the average number of appearances, wins, and losses. My assumption before running this task was that players within each cluster would share similar patterns, such as position or ratio statistics like goals per match. It did not turn out exactly that way; however, some classes show interesting results.
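The per-class summary described above can be sketched with a pandas `groupby`. The column names (`appearances`, `wins`, `losses`) and the `class` label column are assumptions standing in for the real dataset, and the five rows are made-up values:

```python
import pandas as pd

# Toy stand-in for the clustered player table; column names are hypothetical.
df = pd.DataFrame({
    "class":       [1, 1, 4, 4, 4],
    "appearances": [120, 85, 310, 150, 201],
    "wins":        [50, 30, 160, 70, 95],
    "losses":      [40, 35, 80, 45, 60],
})

# Number of players plus average appearances, wins, and losses per class.
summary = (
    df.groupby("class")
      .agg(n_players=("class", "size"),
           avg_appearances=("appearances", "mean"),
           avg_wins=("wins", "mean"),
           avg_losses=("losses", "mean"))
)
print(summary)
```

Each row of `summary` corresponds to one class, matching the three comparison statistics chosen above.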

Top 5 and bottom 5 rows of Class 1
  • Class 1: consists of defenders and midfielders who usually play in defensive roles, with the exception of Peter Crouch, the only forward in this class.
Top 5 and bottom 5 rows of Class 4
  • Class 4: consists of defenders who are (or were) core first-team members. All of them played more than 100 matches in their EPL careers.
t-SNE plot for latent layer outputs with clustering by K-Means

I also visualized the output of the representation layer using the t-SNE algorithm. t-SNE is known to be weak at dimensionality reduction when the target dimension is greater than 3, but it is also known to improve the quality of visualizations built from the data representations produced by deep neural networks.