Introduction to Support Vector Machine:
Support Vector Machine (SVM) [1] is a supervised machine learning classification algorithm that is efficient for both small and large numbers of data samples. It can be used for both regression and classification problems, but it is mostly used for classification because of its high accuracy on classification tasks. To understand what a support vector machine is, we first have to understand the following concepts:
- Data points
- Support Vectors
- Hyperplane
Data points:
Data points are the class instances of our dataset, plotted in n-dimensional space (here n is the number of features each data point has). If each data point has two attributes, it can be plotted in two-dimensional space; if it has three attributes, a three-dimensional space (x, y and z axes) is required to plot it. But if it has more than three attributes, we cannot plot it in a way that can be perceived by the human eye; in other words, we cannot draw it visually. Figure 1 (a) and (b) depict the data representations in 2D and 3D planes respectively.

Support Vectors:
So what are the support vectors? Support vectors are the data points that lie nearest to the hyperplane. Because of this property, they are the most important data points: they are the ones that determine the optimal SVM model, and, as the name suggests, SVM is named after them (see Figure 2).

Hyperplane:
For data in two-dimensional space, you can think of the hyperplane as a separating line that divides the data points into two classes. The further the data points lie from the hyperplane, the more confident we are that they are classified correctly. So we try to create a hyperplane from which the data points are as far away as possible. In short, we try to maximize the margin between the two lines that are parallel to the hyperplane and pass through the support vectors, as shown in Figure 3.

Training the SVM (Finding the right hyperplane)
SVM is able to learn both simple and complex models, and it also achieves good accuracy on unseen data by avoiding overfitting [2]. The job of SVM is to find the best-oriented hyperplane (decision boundary), i.e., the one that separates the data points with the maximum possible margin between the support vectors.
The function of SVM is to predict the class label for any given data point:

$$f(x) = \operatorname{sign}(w \cdot x + b),$$

where $w$ is the weight vector normal to the hyperplane and $b$ is the bias term.
Here, x is the feature vector (also called a data point) that is to be classified by the SVM; its entries must be real numbers, i.e., $x \in \mathbb{R}^n$.

And y is the corresponding label, representing the positive or negative class in binary classification, i.e., $y \in \{-1, +1\}$. Here is the problem in mathematical terms: the margin width equals $2/\lVert w \rVert$, so maximizing the margin is equivalent to

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, m.$$
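As a quick worked illustration of this decision function, here is a minimal sketch; the weight vector and bias below are hypothetical values, not parameters learned from real data:

```python
import numpy as np

# Hypothetical learned parameters (illustrative values only)
w = np.array([2.0, -1.0])  # weight vector normal to the hyperplane
b = -3.0                   # bias term

def predict(x):
    """Predict the class (+1 or -1) of feature vector x."""
    return int(np.sign(np.dot(w, x) + b))

print(predict(np.array([3.0, 1.0])))  # 2*3 - 1*1 - 3 = +2  ->  +1
print(predict(np.array([1.0, 2.0])))  # 2*1 - 1*2 - 3 = -3  ->  -1
```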

To predict the label of any new instance x, a classifier is first trained on the training dataset. The model generated during the training phase is then used for inference. The more accurate the generated model, the more accurately it classifies the test data.
The accuracy of the model depends on the hyperplane: if the hyperplane is well defined and optimal, the model will be more accurate.
For linear problems, in which the data is linearly separable, a simple linear hyperplane is sufficient to classify the data, as shown above in Figure 3.
But if the data is so complicated that it cannot be handled by a simple linear decision hyperplane, the data is mapped to a higher-dimensional feature space, as shown in Figure 4. Here the kernel trick helps to generate the optimal model: in the new space the data becomes separable using different kernel functions, such as the polynomial kernel for non-linearly separable problems. Similarly, a radial basis function (RBF) kernel can also be employed.
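As a minimal sketch of how kernel choice might look in scikit-learn (the toy two-moons dataset and default parameters are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy non-linearly separable dataset
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear hyperplane struggles here, while the poly and rbf kernels
# implicitly map the data to a space where it becomes separable.
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```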

To train the best model we have to tune some hyper-parameters; a regularization mechanism [3] is also used to avoid overfitting.
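For instance, the regularization parameter C and the RBF kernel width gamma are often tuned with a cross-validated grid search; a sketch, with an assumed candidate grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values for the regularization parameter C and kernel width gamma
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# 5-fold cross-validated grid search over the hyper-parameters
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```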
Support Vector Machine Pipeline:
Training an SVM involves a few steps. By following them, we can efficiently and effectively fit a Support Vector Machine to a specific type of data.
1. Problem Understanding
Problem understanding is the first phase of training an SVM classifier. You first have to understand the problem you are working on and do some literature review to find out whether SVM is suitable for it. As described previously, SVM can handle both regression and classification, but it performs best on classification problems.
2. Data Understanding and Preprocessing
Understanding your data is best practice in any machine learning problem. For this purpose you should visualize the data, build an understanding of it, and perform some pre-processing operations such as normalization and discretization. Converting your data into numeric form is also necessary, as SVM does not handle alphabetical data or strings; class labels are likewise encoded numerically (with scikit-learn's SVM, an integer label encoding is typically used rather than one-hot encoding), as sketched below.
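A minimal preprocessing sketch along these lines (the toy feature matrix and labels are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy dataset: two numeric features and string class labels (illustrative)
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 100.0]])
labels = ["spam", "ham", "spam"]

# Normalize the features: SVMs are sensitive to feature scale
X_scaled = StandardScaler().fit_transform(X)

# Convert string labels to integers, as SVM works on numeric data
y = LabelEncoder().fit_transform(labels)  # e.g., ham -> 0, spam -> 1
print(X_scaled, y)
```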
3. Setting Up the SVM Classifier
Finally, import the SVM classifier from the sklearn library in your code and pass the iris dataset to the classifier for training. You also have to import the numpy library.
After importing the libraries, the iris dataset is downloaded. We then separate the labels from the class data, after which an SVM object is created to start training. The pickle library is used to save the model; the saved model is then loaded to predict the class labels of the test data.
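Put together, the steps described above might look like the following sketch (the linear kernel, 80/20 split, and file name are assumptions):

```python
import pickle

import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset and separate the labels from the class data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Create an SVM object and train it
clf = svm.SVC(kernel="linear")
clf.fit(X_train, y_train)

# Save the trained model with pickle (file name is an assumption)
with open("svm_iris.pkl", "wb") as f:
    pickle.dump(clf, f)

# Load the saved model and predict the class labels of the test data
with open("svm_iris.pkl", "rb") as f:
    model = pickle.load(f)
print(model.predict(X_test))
print("accuracy:", model.score(X_test, y_test))
```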


After training, we can visualize our data and model using the matplotlib library. We can see that the value of C (which is a hyperparameter) affects the generated model's hyperplane.
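A sketch of such a visualization, drawing the decision regions for two illustrative values of C on the first two iris features (the values 0.01 and 100 are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # keep two features so the decision regions can be drawn

# Grid covering the feature space, used to colour the decision regions
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, C in zip(axes, [0.01, 100]):  # small C: wide margin; large C: tight fit
    clf = svm.SVC(kernel="linear", C=C).fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    ax.set_title(f"C = {C}")
plt.show()
```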

Support Vector Machine Uses:
SVM is mostly used for classification tasks in which we have to categorize an input into some class. Some tasks worth mentioning are spam detection, sentiment analysis and image classification. To classify data, we first extract features from it using feature engineering [4] techniques; these features are then classified by the SVM, which provides the class of the input data. SVM is also used in handwritten digit recognition to automate postal services.
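For example, a spam detector might pair TF-IDF features with a linear SVM; a minimal sketch, where the tiny corpus is invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus (an assumption, not a real dataset)
texts = ["win a free prize now", "meeting at noon tomorrow",
         "free money click here", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham"]

# TF-IDF turns raw text into numeric feature vectors, which the SVM classifies
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["claim your free prize"]))  # expected: ['spam']
```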
Pros and Cons of Support Vector Machines:
Pros
- Good accuracy
- Memory-efficient, because only a subset of the training points (the support vectors) is used in the decision function
- Optimization behaves well because the objective function is convex, so training reaches the global minimum instead of a local one
- Can be used for both linearly separable (hard margin) and non-linearly separable (soft margin) data
Cons
- Works well only on relatively clean and small datasets
- Less efficient on noisy datasets in which the classes overlap with each other
- Computationally unsuited to large datasets, as training time grows rapidly with the number of rows
- Performs poorly in Natural Language Processing problems, where structured information such as word embeddings is used for sequential data
- Does not directly provide the probability estimates that are desired in most classification problems
- Can over-fit if the number of features is much greater than the number of samples