Source: Deep Learning on Medium

Before the last layer (the output layer), the features we have obtained are in the form of 2D feature maps, but the output we fetch from the last layer is expected to be a single vector. That is because the class probabilities (or the single regression value) we want at the end come in the form of a 1D vector. For example, if we are classifying cats and dogs, the output should give the probability that the image is a cat (0) or a dog (1), so a 1D vector is sufficient to hold the result. This is why we flatten the values of the second-to-last layer before converting them into probabilities. I hope this gives at least a rough intuition. If you want to study CNNs in more depth, check out CS231n (Convolutional Neural Networks for Visual Recognition) by Stanford; just google it.
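As a rough sketch of the idea above (using NumPy with made-up shapes and random weights, purely for illustration), flattening the 2D feature maps and feeding them to a dense output layer looks like this:

```python
import numpy as np

# Hypothetical feature maps from the last convolutional layer:
# 8 channels, each a 5x5 spatial map (shapes are illustrative).
feature_maps = np.random.rand(8, 5, 5)

# Flatten the 2D maps into a single 1D vector of 8 * 5 * 5 = 200 values.
flat = feature_maps.reshape(-1)

# A dense output layer for 2 classes (cat = 0, dog = 1).
# The weights and bias are random here; in a real network they are learned.
W = np.random.rand(2, flat.size)
b = np.random.rand(2)
logits = W @ flat + b

# Softmax turns the two logits into class probabilities that sum to 1.
probs = np.exp(logits) / np.exp(logits).sum()

print(flat.shape)   # the flattened 1D vector: (200,)
print(probs.sum())  # probabilities sum to 1
```

The key step is the `reshape(-1)` call: it collapses all spatial and channel dimensions into one axis so the dense layer can treat every activation as an ordinary input feature.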