What is the first thing that comes to mind when you read or hear the word kernel? For me, it's a post in the army or, as a computer scientist, the operating system kernel, which manages the hardware according to given instructions. In machine learning, however, a kernel is something else. It is somewhat like an operating system kernel in that it manages the function a model/trainer learns from experience/examples/data points. This raises some questions: what actually is machine learning? What is the model/trainer? And what are experience/examples/data points? Here we will try to answer all these questions.
Basic concepts related to machine learning:
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
A more formal definition of machine learning is:
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” (Tom Mitchell, Carnegie Mellon University)
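To make T, E and P concrete, here is a minimal sketch using scikit-learn. It assumes synthetic two-class data from `make_classification` in place of real labelled examples: the task T is classifying points, the experience E is the number of training examples, and the performance measure P is accuracy on held-out data. With more experience the accuracy typically rises, although this is not guaranteed for every dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (an assumption, standing in for real labelled examples)
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

accuracy = {}
for n in (30, 100, 300, len(X_train)):                       # experience E: number of examples
    model = SVC(kernel="rbf").fit(X_train[:n], y_train[:n])  # task T: learn to classify
    accuracy[n] = model.score(X_test, y_test)                # performance P: test accuracy
    print(f"{n:4d} training examples -> test accuracy {accuracy[n]:.3f}")
```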
So, if you want to train your program to predict gender, e.g. male or female (task T), you can train it by giving it different examples that distinguish male from female (experience E). If your model learns from the given examples, its accuracy in predicting the gender of new examples (performance measure P) improves.
Kernel in machine learning:
The kernel of a Support Vector Machine (SVM) is the best way to explain kernels and their functionality. In simple words, a kernel is a similarity function: given two objects, it returns a similarity score, and a classifier such as an SVM uses these scores to classify them. The objects can be anything: two simple integers, any kind of text, integer vectors, images, or any entity in the real world. The kernel defines the function or relationship between the objects that the classifier combines with what it has learned from experience in order to classify them. The simplest example of a kernel in machine learning is the linear kernel of an SVM, which is simply the dot product. The linear kernel relates two vectors through the length of the projection of one onto the other (scaled by the other's length). Another example is the Gaussian kernel, which uses a width (radius) parameter to turn the distance between two vectors X and Y into a similarity score.
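A minimal NumPy sketch of the two kernels just described. The helper names `linear_kernel` and `gaussian_kernel` are hypothetical, and the Gaussian form exp(-||x - y||^2 / (2 sigma^2)) is the standard one, with `sigma` playing the role of the width ("radius") parameter. Nearby points get a Gaussian similarity close to 1, distant points a value close to 0, and a larger `sigma` makes distant points look more similar.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain dot product <x, y>."""
    return float(np.dot(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); sigma is the width."""
    sq_dist = float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))

x = np.array([1.0, 2.0])
near = np.array([1.1, 2.1])
far = np.array([4.0, -1.0])

print(linear_kernel(x, near), linear_kernel(x, far))
print(gaussian_kernel(x, near), gaussian_kernel(x, far))  # high vs. very low similarity
print(gaussian_kernel(x, far, sigma=5.0))                 # wider kernel: far point looks closer
```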
Importance of kernel:
The decision to classify an example depends on the decision function, and the kernel itself is not the decision function. The decision function uses the kernel internally: it compares the example $x$ with each support vector $x_i$ through the kernel and weights the results using the learned parameters α, i.e. $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, where $y_i$ is the label of support vector $x_i$. So we can say that the kernel acts as a weighting factor that assigns weights to the examples/data points. Because the kernel value depends on the example being classified, the same support vector can receive more weight for one example and less weight for another. Another role of the kernel is to change the dimension of the data according to the situation. It can also map one data sequence onto another in a one-to-one manner according to given criteria, for example to handle missing or reordered data; in that case the kernel crops, stretches, expands, bends, or shrinks the data sequence so that it maps one-to-one onto the other.
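The claim that the decision function is a weighted sum of kernel values over the support vectors can be checked with scikit-learn. This sketch assumes a binary problem (`make_circles`) and a fixed RBF `gamma` so the kernel can be recomputed by hand; in scikit-learn, `dual_coef_` stores the products $y_i \alpha_i$ and `intercept_` stores $b$.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
gamma = 1.0  # fixed (an assumption) so the same kernel can be recomputed below
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

X_new = X[:5]
# Kernel value between each new example and every support vector
K = rbf_kernel(X_new, clf.support_vectors_, gamma=gamma)
# Weight each kernel value by y_i * alpha_i and add the intercept b
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]

print(manual)
print(clf.decision_function(X_new))
```

The two printed arrays should agree: the kernel supplies the similarity scores, and the learned α decide how much each one counts.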
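As an example of a kernel that implicitly changes the dimension, take $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$.
Then for the mapping $f(x) = (x_1x_1, x_1x_2, x_1x_3, x_2x_1, x_2x_2, x_2x_3, x_3x_1, x_3x_2, x_3x_3)$,
the kernel is $K(a, b) = \langle a, b \rangle^2$.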
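The reason this kernel matches the nine-dimensional dot product in general is a one-line rearrangement of the double sum:

$$
\langle f(a), f(b) \rangle = \sum_{i=1}^{3}\sum_{j=1}^{3} (a_i a_j)(b_i b_j) = \Big(\sum_{i=1}^{3} a_i b_i\Big)\Big(\sum_{j=1}^{3} a_j b_j\Big) = \langle a, b \rangle^2 = K(a, b).
$$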
Let’s add a concrete example to make it clearer.
Suppose a = (1, 2, 3) and b = (4, 5, 6). Then:
f(a) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
f(b) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
<f(a), f(b)> = 16 + 40 + 72 + 40 + 100+ 180 + 72 + 180 + 324 = 1024
Computing the result this way takes a lot of algebra, mainly because f maps from a three-dimensional to a nine-dimensional space.
Now let's see the magic of the kernel:
$K(a, b) = (4 + 10 + 18)^2 = 32^2 = 1024$
Using the kernel we get the same result, but the calculation is much faster and easier, because it stays in the original three-dimensional space.
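The arithmetic above can be reproduced in a few lines of NumPy. The helper names `f` and `K` simply mirror the notation of the example; `np.outer(x, x).ravel()` lists the pairwise products $x_i x_j$ in the same order as $f$.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

def f(x):
    """Explicit map from 3-D to 9-D: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

def K(x, y):
    """Kernel computed in the original 3-D space: (<x, y>)^2."""
    return np.dot(x, y) ** 2

print(f(a))                # [1 2 3 2 4 6 3 6 9]
print(np.dot(f(a), f(b)))  # 1024, via the 9-D feature space
print(K(a, b))             # 1024, via the kernel shortcut
```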
This is how kernels make our life easy. Sometimes there is a single input for each output, and it is easy to find the relationship between input and output, as in Figure 1.
But sometimes there are several input vectors, and learning the function is harder in that case. An example is shown in Figure 2.
Amazing functionalities of kernel:
The beauty of the kernel is that it allows classification in very high, even infinite, dimensions without us ever having to compute in that space explicitly. But keep in mind that this is not always possible: high-dimensional data can be difficult to classify, and sometimes the kernel cannot find rules or learn a function for such big data. In machine learning, the drop in performance as the number of dimensions grows is called the curse of dimensionality. A mapping f(x) of data into high or infinite dimensions helps only when it makes sense for the problem and we have an idea of how to deal with the result. In such cases, the kernel gives an amazing shortcut for working with the data.
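One standard way to see that the Gaussian (RBF) kernel works in infinite dimensions is to expand it as a Taylor series. This sketch assumes 1-D inputs and `gamma = 0.5`; it uses the identity $e^{-\gamma(x-y)^2} = e^{-\gamma x^2} e^{-\gamma y^2} \sum_k \frac{(2\gamma)^k}{k!} x^k y^k$, so each term $k$ is one feature of an infinite feature map. Truncating the map to more and more features approaches the exact kernel value, which the kernel itself computes in one step.

```python
import math
import numpy as np

def rbf(x, y, gamma=0.5):
    """Exact 1-D Gaussian kernel exp(-gamma * (x - y)^2)."""
    return math.exp(-gamma * (x - y) ** 2)

def truncated_features(x, n_terms, gamma=0.5):
    """First n_terms features phi_k(x) = exp(-gamma x^2) * sqrt((2 gamma)^k / k!) * x^k."""
    return np.array([
        math.exp(-gamma * x ** 2) * math.sqrt((2 * gamma) ** k / math.factorial(k)) * x ** k
        for k in range(n_terms)
    ])

x, y = 0.8, -0.3
exact = rbf(x, y)
errors = {}
for n_terms in (1, 2, 5, 10, 20):
    approx = truncated_features(x, n_terms) @ truncated_features(y, n_terms)
    errors[n_terms] = abs(approx - exact)
    print(f"{n_terms:2d} features: approx={approx:.10f}  exact={exact:.10f}")
```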
SVM kernels and their suitability:
Several types of kernel can be used with an SVM, and each performs well on data of a particular nature. The following are common kernel types:
- Linear kernel: suitable for large sparse data.
- Non-linear kernel: suitable for mapping high-dimensional data that is not linearly separable into a space where it becomes linearly separable
- Polynomial kernel: popular in Digital Image Processing (DIP)
- Radial basis function (RBF) kernel: suitable when there is no prior knowledge about the data
- Sigmoid kernel: used in Artificial Neural Network (ANN)
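To see what each kernel in the list actually computes, this sketch evaluates the standard formulas by hand and compares them with the functions in `sklearn.metrics.pairwise`. The vectors and the values of `gamma`, `coef0` and `degree` are arbitrary assumptions. "Non-linear kernel" is a category rather than a single formula; the polynomial, RBF and sigmoid kernels are its usual members.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel,
)

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
X, Y = x.reshape(1, -1), y.reshape(1, -1)  # scikit-learn expects 2-D arrays
gamma, coef0, degree = 0.1, 1.0, 3         # illustrative hyperparameters

by_hand = {
    "linear": x @ y,                                   # <x, y>
    "polynomial": (gamma * (x @ y) + coef0) ** degree,  # (gamma <x, y> + coef0)^degree
    "rbf": np.exp(-gamma * np.sum((x - y) ** 2)),        # exp(-gamma ||x - y||^2)
    "sigmoid": np.tanh(gamma * (x @ y) + coef0),         # tanh(gamma <x, y> + coef0)
}
from_sklearn = {
    "linear": linear_kernel(X, Y)[0, 0],
    "polynomial": polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(X, Y, gamma=gamma)[0, 0],
    "sigmoid": sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)[0, 0],
}
for name in by_hand:
    print(f"{name:10s} by hand={by_hand[name]:.6f}  sklearn={from_sklearn[name]:.6f}")
```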
Here we use the linear kernel as an example, with code that plots linearly separable data of the kind the linear kernel classifies behind the scenes. Linearly separable data is easy for the kernel to separate and classify. The output for the linear data is shown in Figure 4.
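```python
from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn
from sklearn import svm
import matplotlib.pyplot as plt

# Two linearly separable clusters of points
A, b = make_blobs(n_samples=60, centers=2, random_state=20)

# Fit an SVM with a linear kernel (named clf so it does not shadow the svm module)
clf = svm.SVC(kernel='linear', C=1)
clf.fit(A, b)

# Plot the points, coloured by class
plt.scatter(A[:, 0], A[:, 1], c=b, s=30, cmap=plt.cm.Paired)
plt.show()
```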
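With a linear kernel, the fitted SVM boils down to a single weight vector, so the classification "behind the scenes" is just a dot product plus a bias. This sketch reuses the same `make_blobs` data and checks that `decision_function` equals $w \cdot x + b_0$, where `coef_` holds $w$ (available only for the linear kernel) and `intercept_` holds $b_0$. The name `b0` is used here only to avoid clashing with the labels `b`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

A, b = make_blobs(n_samples=60, centers=2, random_state=20)
clf = SVC(kernel="linear", C=1).fit(A, b)

w = clf.coef_[0]        # weight vector learned with the linear kernel
b0 = clf.intercept_[0]  # bias term
scores = A @ w + b0     # w . x + b0 for every point

print(np.allclose(scores, clf.decision_function(A)))
print("number of support vectors:", len(clf.support_vectors_))
# Points on the separating line satisfy w[0] * x + w[1] * y + b0 = 0
```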
Non-linear data is harder to classify. As discussed above, it is the kernel's responsibility to transform the data into a higher dimension (here, from two to three dimensions) so that the non-linear data can be separated and classified easily.
```python
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

# Two concentric circles: not linearly separable in 2-D
A, b = make_circles(n_samples=500, noise=0.02)

# Plot the points, coloured by class
plt.scatter(A[:, 0], A[:, 1], c=b, marker='.')
plt.show()
```
In Figure 5 the data is distributed non-linearly, and it is difficult for the kernel to separate it or draw a boundary in 2-D. In 3-D it becomes easy, so the kernel converts the data into a 3-D form, which looks like Figure 6.
In three dimensions the non-linear data is easy to separate; converting data into a linearly separable distribution is exactly what a non-linear kernel does.
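The 2-D to 3-D conversion can be imitated explicitly. This sketch assumes the third coordinate $z = x^2 + y^2$ (the squared distance from the origin), which is one common choice for concentric circles and not necessarily the mapping behind Figure 6. A linear kernel fails in 2-D but separates the lifted 3-D data; a degree-2 polynomial kernel reaches a similar result implicitly, without building `A_3d`.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

A, b = make_circles(n_samples=500, noise=0.02, random_state=0)

# Explicit lift to 3-D (an assumed mapping): append z = x^2 + y^2
z = (A ** 2).sum(axis=1)
A_3d = np.column_stack([A, z])

flat = SVC(kernel="linear").fit(A, b).score(A, b)           # linear kernel, 2-D
lifted = SVC(kernel="linear").fit(A_3d, b).score(A_3d, b)  # linear kernel, 3-D
implicit = SVC(kernel="poly", degree=2, coef0=1).fit(A, b).score(A, b)  # kernel does the lift

print(f"linear kernel in 2-D:         {flat:.3f}")
print(f"linear kernel in 3-D:         {lifted:.3f}")
print(f"degree-2 polynomial in 2-D:   {implicit:.3f}")
```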
Successful classification depends on choosing the right kernel for the right scenario. Not every kernel performs well on, or fits, every kind of data. The choice is problem-specific, so choose the kernel wisely according to your needs and the nature of your data.
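One practical way to choose the kernel is to let cross-validation compare the candidates. This sketch assumes the concentric-circles data again and a small, arbitrary grid of `C` values; `GridSearchCV` fits an `SVC` for every combination and reports the best one by mean 5-fold accuracy. On other data the winning kernel may differ.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

# Candidate kernels and regularisation strengths (an illustrative grid)
param_grid = {"kernel": ["linear", "poly", "rbf", "sigmoid"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```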
Kernels are a central part of operating systems and of kernel-based models in machine learning. In machine learning, the kernel is used inside the decision function: the decision function compares an example with the support vectors through the kernel and weights the results using the learned parameters α. SVM provides different kinds of kernels, such as the linear, non-linear (e.g. polynomial), RBF, and sigmoid kernels. Every kernel has its own functionality, pros and cons, and way of working.