Nonlinear SVMs: Feature Space
- General idea: map the original input space to some higher-dimensional feature space, x → Φ(x), in which the training data become linearly separable.
Nonlinear SVMs: The Kernel Trick
- With this mapping, the discriminant function becomes:
g(x) = w·Φ(x) + b = Σi∈SV αi yi Φ(xi)·Φ(x) + b
- Notice that both training and testing use the feature vectors only through dot products Φ(xi)·Φ(xj); the mapping Φ never has to be computed explicitly.
- A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:
K(xa, xb) = Φ(xa)·Φ(xb)
- Often K(xa, xb) is very inexpensive to compute even though Φ(xa) may be extremely high-dimensional.
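To make this concrete, here is a minimal sketch of evaluating the kernelized discriminant; the support vectors, multipliers (alphas), labels, and bias b are assumed to come from training elsewhere, and the RBF kernel shown is just one possible choice:

```python
import numpy as np

def rbf_kernel(xa, xb, sigma=1.0):
    # K(xa, xb) = exp(-||xa - xb||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xa - xb) ** 2) / (2.0 * sigma ** 2))

def discriminant(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    # g(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    # Only kernel evaluations are needed; Phi(x) is never formed explicitly.
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b
```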
Kernel Example
2-dimensional vector x = [x1, x2]
Let K(xi, xj) = (1 + xi·xj)²
We need to show that K(xi, xj) = Φ(xi)·Φ(xj):
K(xi, xj) = (1 + xi·xj)² = 1 + xi1²xj1² + 2·xi1xj1xi2xj2 + xi2²xj2² + 2·xi1xj1 + 2·xi2xj2
= [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2] · [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
= Φ(xi)·Φ(xj), where Φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
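A quick numerical check of this identity (a minimal sketch; the feature map phi below is the one derived above, and the test points are arbitrary):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(xi, xj) = (1 + xi . xj)^2 in 2 dimensions
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi, xj = np.array([0.5, -1.2]), np.array([2.0, 0.3])
lhs = (1 + xi @ xj) ** 2      # kernel evaluated directly in the input space
rhs = phi(xi) @ phi(xj)       # dot product in the expanded feature space
print(np.isclose(lhs, rhs))   # True
```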
Commonly-used kernel functions
- Linear kernel: K(xi, xj) = xi·xj
- Polynomial of power p: K(xi, xj) = (1 + xi·xj)^p
- Gaussian (radial-basis function): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
- Sigmoid: K(xi, xj) = tanh(β0·xi·xj + β1)
In general, functions that satisfy Mercer's condition can be kernel functions.
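For illustration, a minimal sketch of these kernels as Gram-matrix computations over a data matrix X (rows are examples); the parameter names p, sigma, beta0, and beta1 mirror the formulas above and are placeholders:

```python
import numpy as np

def linear_kernel(X):
    return X @ X.T

def polynomial_kernel(X, p=2):
    return (1 + X @ X.T) ** p

def gaussian_kernel(X, sigma=1.0):
    # ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 * xi.xj
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * sigma ** 2))

def sigmoid_kernel(X, beta0=1.0, beta1=0.0):
    return np.tanh(beta0 * (X @ X.T) + beta1)
```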
Kernel Functions
- A kernel function can be thought of as a similarity measure between the input objects.
- Not all similarity measures can be used as kernel functions.
- Mercer's condition states that any positive semi-definite kernel K(x, y), i.e. any K for which
Σi Σj K(xi, xj) ci cj ≥ 0 for every finite set of points {xi} and all real coefficients {ci},
- can be expressed as a dot product in a high-dimensional space.
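One practical sanity check of this condition on a finite sample is to verify that the Gram matrix has no negative eigenvalues; a minimal sketch, here using a Gaussian Gram matrix on random data purely for illustration:

```python
import numpy as np

def is_psd(gram, tol=1e-8):
    # A symmetric matrix is positive semi-definite iff all eigenvalues are >= 0.
    eigvals = np.linalg.eigvalsh(gram)
    return bool(np.all(eigvals >= -tol))

X = np.random.randn(50, 3)
sq_norms = np.sum(X ** 2, axis=1)
gram = np.exp(-(sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T) / 2.0)
print(is_psd(gram))  # expected: True for the Gaussian kernel
```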
- The user must choose the kernel function and its parameters.
- SVMs can be expensive in time and space for big datasets:
- The computation of the maximum-margin hyperplane depends on the square of the number of training cases.
- All of the support vectors need to be stored.
- The kernel trick can also be used to do PCA in a much higher-dimensional space, thus giving a non-linear version of PCA in the original space.
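For concreteness, a minimal sketch of that idea (kernel PCA), assuming a helper kernel(X) that returns the n×n Gram matrix for the rows of X (e.g. the gaussian_kernel sketched above); symbols and names here are illustrative:

```python
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    # Kernel PCA: eigendecompose the centered Gram matrix instead of the
    # covariance matrix, so PCA happens implicitly in the expanded feature space.
    n = X.shape[0]
    K = kernel(X)                                   # n x n Gram matrix
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)   # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # keep the largest components
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]
    # Projections of the training points onto the nonlinear principal components.
    return alphas * np.sqrt(np.maximum(lambdas, 0.0))
```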
Multi-class classification
- SVMs can only handle two-class outputs
- Learn N SVMs:
- SVM 1 learns Class 1 vs REST
- SVM 2 learns Class 2 vs REST
- ...
- SVM N learns Class N vs REST
- Then, to predict the output for a new input, predict with each SVM and find the one that puts the prediction furthest into the positive region.
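A minimal sketch of this one-vs-rest scheme, here using scikit-learn's SVC for the binary SVMs (the function names train_one_vs_rest and predict_one_vs_rest are just for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, classes, **svm_params):
    # One binary SVM per class: class k vs. the rest.
    models = {}
    for k in classes:
        binary_labels = np.where(y == k, 1, -1)
        models[k] = SVC(kernel="rbf", **svm_params).fit(X, binary_labels)
    return models

def predict_one_vs_rest(models, X):
    # Pick the class whose SVM pushes the point furthest into its positive region.
    scores = np.column_stack([m.decision_function(X) for m in models.values()])
    class_list = list(models.keys())
    return np.array([class_list[i] for i in np.argmax(scores, axis=1)])
```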