What is the first thing that comes to mind when you read or hear the word kernel? For me, it is either a post in the army or, as a computer scientist, the operating system kernel that manages the hardware according to the instructions it is given. A **Kernel in Machine Learning** is something else, though it is loosely analogous to an operating system kernel: it sits inside the function that a model/trainer learns from experience/examples/data points. That raises three questions. What actually is machine learning? What is the model/trainer? And what are experience/examples/data points? Here we will try to answer all these questions.

**Basic concepts related to machine learning:**

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.

The basic, formal definition of machine learning is:

*"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."* (Tom Mitchell, Carnegie Mellon University)

So, if you want to train your program to predict gender, e.g. male/female (**Task T**), you can train it by giving it different examples that distinguish male from female (**experience E**). Once your model has learned from the given examples, how accurately it predicts gender on new examples is the **performance measure P**.

**Kernel in machine learning**

The kernel of a Support Vector Machine (SVM) is the best way to explain kernels and their functionality. In simple words, a kernel is a similarity function. Given two objects, the kernel returns a similarity score, and the classifier uses those scores to classify. The objects can be anything: two simple integers, any kind of text, integer vectors, images, or any entity in the real world. It is the responsibility of the kernel to define the relationship between two objects; the classifier then learns from experience how to use that relationship. The simplest example of a kernel in machine learning is the linear kernel of an SVM, which is simply the dot product: it relates two vectors through the length of the projection of one onto the other. Another example is the Gaussian kernel, which uses a radius (width) parameter to reweight the distance between two vectors **X** and **Y**.
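As a minimal sketch of these two similarity functions, the snippet below writes them out with NumPy. The function names and the default `sigma=1.0` are assumptions chosen for illustration, and the Gaussian kernel is written in its common form exp(-||x - y||² / (2σ²)).

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain dot product <x, y>."""
    return float(np.dot(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed width."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(linear_kernel(x, y))         # 32.0
print(gaussian_kernel(x, x))       # 1.0: identical vectors are maximally similar
print(gaussian_kernel(x, y))       # near 0: far apart relative to sigma = 1
print(gaussian_kernel(x, y, 5.0))  # a wider sigma gives a larger similarity
```

A larger `sigma` makes the Gaussian kernel more tolerant of distance, which is the reweighting role of the radius parameter.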

**Importance of Kernel in Machine Learning:**

The decision to classify an example depends on the decision function, and the kernel itself is not the decision function. The decision function uses the kernel internally: it compares the example to each support vector and weights those comparisons by the learned parameters α. So we can say that the kernel acts as a **weighting factor** for the examples/data points: a support vector that is more similar to the new example contributes more to the decision, and a less similar one contributes less. Another function of the kernel is to **change the dimension of the data according to the situation**, implicitly mapping it into a space where it is easier to separate. Some kernels also map one data sequence onto another in a one-to-one manner under given criteria, such as missing or reordered data. In that case it is the responsibility of the kernel to crop, stretch, expand, bend, or shrink one sequence so that it maps one-to-one onto the other.
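The relationship between kernel and decision function can be sketched with scikit-learn. The snippet below rebuilds the decision values of a fitted `SVC` by hand from its support vectors, its dual coefficients (α multiplied by the class sign), and its intercept. The dataset and the values `gamma=0.5` and `C=1.0` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

gamma = 0.5  # assumed kernel width, fixed so the manual computation matches the model
X, y = make_blobs(n_samples=60, centers=2, random_state=20)
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# K[i, j] = kernel similarity between sample i and support vector j
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)

# dual_coef_[0] holds alpha_j * y_j for each support vector (binary case)
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]

# The hand-built values should agree with the library's decision function
print(np.allclose(manual, clf.decision_function(X)))
```

The kernel only supplies the similarity scores in `K`; the learned weights and the intercept turn those scores into a decision.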

**Example**: a = (a1, a2, a3); b = (b1, b2, b3).

Then for the function f(a) = (a1a1, a1a2, a1a3, a2a1, a2a2, a2a3, a3a1, a3a2, a3a3),

the kernel is K(a, b) = (<a, b>)^{2}.

Let's plug in some numbers to make it clearer:

Suppose a = (1, 2, 3); b = (4, 5, 6). Then:

f(a) = (1, 2, 3, 2, 4, 6, 3, 6, 9)

f(b) = (16, 20, 24, 20, 25, 30, 24, 30, 36)

<f(a), f(b)> = 16 + 40 + 72 + 40 + 100+ 180 + 72 + 180 + 324 = 1024

Computing the result this way takes a lot of algebra, mainly because f is a mapping from three-dimensional to nine-dimensional space.

Now let us see the magic of the kernel:

K(a, b) = (4 + 10 + 18)^{2} = 32^{2} = 1024

By using the kernel we get the same result, but the calculation is much faster and easier because we never leave three-dimensional space.
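The worked example can be checked in a few lines of plain Python. This is only a sketch; the helper names `feature_map`, `dot`, and `poly2_kernel` are made up for illustration.

```python
from itertools import product

def feature_map(v):
    """Explicit map f: every pairwise product v_i * v_j (3-D input -> 9-D output)."""
    return [vi * vj for vi, vj in product(v, repeat=2)]

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(u, v):
    """Kernel shortcut K(u, v) = (<u, v>)^2, computed in the original space."""
    return dot(u, v) ** 2

a = (1, 2, 3)
b = (4, 5, 6)

explicit = dot(feature_map(a), feature_map(b))  # nine multiplications in 9-D
shortcut = poly2_kernel(a, b)                   # one dot product in 3-D, then a square

print(feature_map(a))      # [1, 2, 3, 2, 4, 6, 3, 6, 9]
print(explicit, shortcut)  # 1024 1024
```

The explicit route grows with the square of the input dimension, while the kernel route stays a single dot product.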

This is how a kernel makes our life easy. Sometimes there is a single input for each output, and it is easy to find the relationship between input and output, as in Figure 1.

But sometimes there is more than one input vector, and learning the function is more difficult. An example is shown in Figure 2.

**Amazing functionalities of kernel:**

The beauty of a kernel is that it allows classification in infinite dimensions without ever computing in that space explicitly. But keep in mind that this does not always help: higher-dimensional data is harder to classify, and sometimes the kernel cannot support a useful learned function for such data. In machine learning, adding dimensions can degrade results, a phenomenon known as the curse of dimensionality. Mapping the data to a higher or infinite-dimensional space through F(x) pays off only when that mapping makes sense for the problem. In such cases the kernel gives an amazing shortcut for dealing with the data.

**SVM kernels and their suitability:**

There are different types of kernels that can be used with an SVM, and each performs well on data of a particular nature. The kernel types are:

- Linear kernel: suitable for large, sparse data.
- Non-linear kernels: a general category, suitable for high-dimensional data that is not linearly separable; they map it into a space where it becomes linearly separable. The kernels below are of this kind.
- Polynomial kernel: popular in Digital Image Processing (DIP).
- Radial basis function (RBF) kernel: suitable when there is no prior knowledge about the data.
- Sigmoid kernel: borrowed from Artificial Neural Networks (ANN), where the same function serves as an activation.

Here we use the linear kernel as an example, with code that fits the model and plots the linearly separable data it classifies. Linearly separable data is easy for the kernel to separate and classify. The output for the linear data is shown in Figure 4.

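```python
# make_blobs is imported from sklearn.datasets directly;
# the samples_generator submodule was removed in newer scikit-learn releases
from sklearn.datasets import make_blobs
from sklearn import svm
import matplotlib.pyplot as plot

# Generate 60 points in two linearly separable clusters
A, b = make_blobs(n_samples=60, centers=2, random_state=20)

# Fit an SVM with a linear kernel (named clf so the svm module is not shadowed)
clf = svm.SVC(kernel='linear', C=1)
clf.fit(A, b)

# Plot the points, coloured by class label
plot.scatter(A[:, 0], A[:, 1], c=b, s=30, cmap=plot.cm.Paired)
plot.show()
```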
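To look at what the linear kernel does behind the scenes, the following sketch reads the learned separating line back out of a fitted model. It assumes the same blob data as above; the names `clf`, `w`, and `w0` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

A, b = make_blobs(n_samples=60, centers=2, random_state=20)
clf = SVC(kernel="linear", C=1).fit(A, b)

# With a linear kernel the decision function is w . x + w0
w = clf.coef_[0]
w0 = clf.intercept_[0]

# Distance between the two margin lines on either side of the boundary
margin_width = 2.0 / np.linalg.norm(w)

# Classify by the sign of the score and compare with the model's own predictions
scores = A @ w + w0
predicted = (scores > 0).astype(int)

print("w =", w, "intercept =", w0)
print("margin width =", margin_width)
print("number of support vectors:", len(clf.support_vectors_))
print("matches clf.predict:", np.array_equal(predicted, clf.predict(A)))
```

Only the support vectors determine `w` and `w0`; the remaining points could move without changing the boundary, as long as they stay outside the margin.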

Non-linear data is more difficult to classify. As discussed above, it is the responsibility of the kernel to map the data into a higher-dimensional space, for example from two dimensions to three, so that the classes can be separated easily.

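```python
# Registers the 3-D projection on older Matplotlib versions; unused in this 2-D plot
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_circles
import matplotlib.pyplot as plot

# Generate two concentric circles, which no straight line can separate in 2-D
A, b = make_circles(n_samples=500, noise=0.02)

# Plot the points, coloured by class label
plot.scatter(A[:, 0], A[:, 1], c=b, marker='.')
plot.show()
```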

In Figure 5 the data is non-linearly distributed, and it is difficult to draw a separating boundary in two dimensions. In 3-D it is easy, so the kernel maps the data into 3-D, where it looks like Figure 6.

In three dimensions it is easy to separate the non-linear data; the job of a non-linear kernel is to map the data into a space where it becomes linearly separable.
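The lift into three dimensions can be sketched by hand. The snippet below adds a third coordinate z = x1² + x2² to the circle data and fits a linear SVM before and after the lift. The choice of z, and the values `factor=0.5` and `random_state=0`, are assumptions made for illustration. An actual kernel performs a mapping of this kind implicitly and never builds the extra coordinate.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# factor=0.5 and random_state=0 are assumed values, used for a clear, repeatable gap
A, b = make_circles(n_samples=500, noise=0.02, factor=0.5, random_state=0)

# Hand-made third coordinate: squared distance from the origin
z = (A ** 2).sum(axis=1)
A_lifted = np.column_stack([A, z])

# A linear SVM on the raw 2-D points versus the lifted 3-D points
raw_acc = SVC(kernel="linear").fit(A, b).score(A, b)
lifted_acc = SVC(kernel="linear").fit(A_lifted, b).score(A_lifted, b)

print("training accuracy in 2-D:", raw_acc)
print("training accuracy in 3-D:", lifted_acc)
```

In the lifted space the inner circle sits at low z and the outer circle at high z, so a flat plane between them separates the two classes.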

Successful classification depends on choosing the right kernel for the right scenario. No single kernel performs well on every kind of data. The choice is problem specific, and you have to choose the kernel wisely according to your needs and the nature of the data.
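One practical way to choose is to try several kernels on the same data and compare cross-validated accuracy. The sketch below does this on concentric circles. The dataset parameters, `C=1`, and the 5-fold split are assumptions for illustration, and the ranking it produces applies to this dataset only.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumed toy data: two concentric circles, not linearly separable in 2-D
A, b = make_circles(n_samples=500, noise=0.02, factor=0.5, random_state=0)

results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # 5-fold cross-validation with scikit-learn's default kernel settings
    scores = cross_val_score(SVC(kernel=kernel, C=1), A, b, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:8s} mean accuracy = {results[kernel]:.3f}")
```

On data like this the RBF kernel should score clearly higher than the linear kernel, but on large sparse data the ordering could well reverse.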

**Description:**

Kernels are a central part of both operating systems and many machine learning models. A **Kernel in Machine Learning** is used inside the decision function of a model: the decision function compares an example to the support vectors, weighted by the learned parameters α. SVM provides different kinds of kernels, such as the linear kernel, non-linear kernels, the RBF kernel, and the sigmoid kernel. Every kernel has its own functionality, pros and cons, and nature of work.