Birds of a feather flock together (KNN)


K-Nearest Neighbors

k-Nearest Neighbors (k-NN) is a non-parametric learning algorithm. Unlike other learning algorithms, which allow the training data to be discarded once the model is built, k-NN keeps all training examples in memory.

Once a new, previously unseen example x comes in, the k-NN algorithm finds the k training examples closest to x and returns the majority label (in the case of classification) or the average of the labels (in the case of regression).

The Euclidean Distance Formula

KNN calculates the distance between data points. For this, we use the simple Euclidean Distance formula.
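For two points p = (p1, …, pn) and q = (q1, …, qn) with n features, the formula is:

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
$$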

The above formula takes in any number of dimensions n, which in machine learning we can think of as our features, so the same computation works regardless of how many features we have. The data point located at the minimum distance from the test point is assumed to belong to the same class.
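As a quick sketch of the same idea in code (NumPy is my own choice here, not something shown in the original post):

```python
import numpy as np

def euclidean_distance(p, q):
    """Euclidean distance between two points with any number of features."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((q - p) ** 2))

# Works the same for 2 features or 4 features.
print(euclidean_distance([1, 2], [4, 6]))              # 5.0
print(euclidean_distance([1, 2, 3, 4], [2, 3, 4, 5]))  # 2.0
```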

In pattern recognition, the k-Nearest Neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

  • In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
  • In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

How does the KNN algorithm work?

K-NN works by analogy: the idea is that you are what you resemble. So when we want to classify a point, we look at its k closest (most similar) neighbors and classify the point as the majority class among those neighbors.

KNN depends on two things: the metric used to compute the distance between two points, and the value of "k", the number of neighbors to consider.

When "k" is a very small number, KNN can overfit: it classifies based only on the closest neighbors instead of learning a good separating frontier between the classes. But if "k" is a very big number, KNN will underfit; in the limit, if k = n, KNN will assign every point to the class with the most samples.
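To make the effect of k concrete, here is a minimal sketch, assuming scikit-learn and its bundled iris dataset (neither is mentioned in the original post), that compares cross-validated accuracy for a very small, a moderate, and a very large k:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A very small k (k=1) tends to overfit; a very large k tends to underfit.
for k in (1, 15, 100):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k:3d}  mean accuracy={scores.mean():.3f}")
```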

KNN can also be used for regression: to predict the value for a new point, just average the values of its k nearest neighbors. One nice advantage of KNN is that it can work fine even if you only have a few samples for some of the classes.
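For the regression case, a similarly minimal sketch, again assuming scikit-learn's KNeighborsRegressor, shows that the prediction is just the average of the targets of the k nearest training points:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D regression data: y is roughly 2*x with some noise.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)

# The prediction for x=3.5 is the mean of the targets of its 3 nearest training points.
print(knn.predict([[3.5]]))
```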

Example of kNN with K=1

Python Implementation
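Below is a minimal from-scratch sketch of kNN classification (the class name SimpleKNN and the toy data are my own illustrations, not taken from the article). It follows the steps described above: store the training data, compute Euclidean distances to a query point, take the k closest points, and return the majority label.

```python
import numpy as np
from collections import Counter

class SimpleKNN:
    """A minimal k-nearest-neighbors classifier (majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # kNN is "lazy": fitting just stores the training data in memory.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        return np.array([self._predict_one(x) for x in np.asarray(X, dtype=float)])

    def _predict_one(self, x):
        # Euclidean distance from x to every training point.
        distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
        # Indices of the k closest training points.
        nearest = np.argsort(distances)[: self.k]
        # Majority vote among their labels.
        return Counter(self.y_train[nearest]).most_common(1)[0][0]


# Tiny usage example with two well-separated classes.
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = [0, 0, 0, 1, 1, 1]

model = SimpleKNN(k=3).fit(X_train, y_train)
print(model.predict([[1.5, 1.5], [8.5, 8.5]]))  # expected: [0 1]
```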

Conclusion:

I hope I was able to clarify kNN a little for you; it is one of the basic algorithms. I will be publishing a lot more explanations of algorithms, because why not 🙂

This is my personal research; if you have any comments, please reach out to me.

Github, LinkedIn, Zahra Elhamraoui, Upwork