K-Nearest Neighbors is one of the most basic yet essential
classification algorithms in Machine Learning.
It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining, and intrusion detection.
Basic k-nearest neighbor classification
Training method :
- Save the training examples
At prediction time :
- Find the k training examples (x1, y1), ..., (xk, yk) that are closest to the test example x
- Classification : Predict the most frequent class among the yi's.
- Regression : Predict the average of the yi's (a minimal sketch of this procedure follows below).
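A minimal sketch of the training and prediction steps above, in plain NumPy (the toy dataset, the Euclidean metric, and k = 3 are illustrative assumptions, not part of the original notes):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, regression=False):
    """Basic k-nearest-neighbor prediction for a single test example x."""
    # Euclidean distance from x to every stored training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    if regression:
        # Regression: average of the neighbors' target values
        return np.mean(y_train[nearest])
    # Classification: most frequent class among the neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage (made-up data): two clusters, query point near the first one.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```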
Improvements :
- Weighting examples from the neighborhood
- Measuring "closeness"
- Finding "close" examples in a large training set quickly
Average of k points more reliable when :
- noise in attributes
- noise in class labels
Equal weight to all attributes :
- Only if the scales of the attributes and their differences are similar
- Scale attributes to equal range or equal variance (see the sketch after this list)
- Assumes classes are spherical
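As a hedged illustration of the scaling point above, attributes can be rescaled to zero mean and unit variance before distances are computed (the tiny height/weight array is made up):

```python
import numpy as np

# Two attributes on very different scales, e.g. height (cm) and weight (kg).
X = np.array([[170.0, 65.0],
              [160.0, 80.0],
              [180.0, 70.0]])

# Standardize each attribute so no column dominates the Euclidean distance
# simply because of its units.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.std(axis=0))   # each column now has unit standard deviation
```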
[Figure omitted: an example dataset with three classes, shown in red, green, and black.]
Small k : captures the fine structure of the problem space better; may be necessary for a small training set.
Large k : less sensitive to noise (particularly class noise); gives better probability estimates for discrete classes; a larger training set allows the use of a larger k.
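One common way to navigate this small-k / large-k tradeoff is to pick k by cross-validation; here is a sketch with scikit-learn, where the synthetic dataset and the candidate values of k are assumptions made for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real problem.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Try several neighborhood sizes and keep the one with the best CV accuracy.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 11, 21]},
                      cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```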
Weighted Euclidean Distance
large weights => attribute is more important
small weights => attribute is less important
zero weights => attribute doesn't matter
- Weights allow KNN to be effective with axis-parallel elliptical classes
- Where do the weights come from?
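They might be set by hand or tuned, e.g. by cross-validation; the sketch below simply takes a hand-picked weight vector (the numbers are made up) and computes the weighted Euclidean distance sqrt(sum_i w_i * (x_i - z_i)^2):

```python
import numpy as np

def weighted_euclidean(x, z, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - z_i)^2)."""
    return np.sqrt(np.sum(w * (x - z) ** 2))

x = np.array([1.0, 10.0, 3.0])
z = np.array([2.0, 12.0, 3.0])
w = np.array([1.0, 0.25, 0.0])   # zero weight: the third attribute is ignored
print(weighted_euclidean(x, z, w))  # sqrt(1*1 + 0.25*4 + 0) ≈ 1.414
```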
Distance-Weighted KNN
The tradeoff between small and large k can be difficult :
- use a large k, but put more emphasis on nearer neighbors ?
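A sketch of one common distance-weighting scheme, where each of the k neighbors votes with weight 1/d^2 (the inverse-square weighting and the small epsilon are assumptions; other fall-off functions work as well):

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x, k=5):
    """Classify x by distance-weighted voting among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        # Nearer neighbors get larger votes; the epsilon avoids division by
        # zero when a training point coincides with the query.
        w = 1.0 / (dists[i] ** 2 + 1e-12)
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```

Because distant neighbors contribute very little, a fairly large k can be used without the far-away points swamping the vote.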
Locally Weighted Averaging
Let k = number of training points
Let the weight fall off rapidly with distance
KernelWidth controls the size of the neighborhood that has a large effect on the value (analogous to k)
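A sketch of locally weighted averaging consistent with the description above: every training point contributes, the weight falls off rapidly with distance (here via a Gaussian-style kernel, an assumed form), and KernelWidth plays a role analogous to k:

```python
import numpy as np

def locally_weighted_average(X_train, y_train, x, kernel_width=1.0):
    """Predict a value for x as a weighted average over all training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    # Weight falls off rapidly with distance; KernelWidth controls how large a
    # neighborhood still has an appreciable effect on the prediction.
    weights = np.exp(-(dists ** 2) / (kernel_width ** 2))
    return np.sum(weights * y_train) / np.sum(weights)
```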