Logistic Regression for Machine Learning
Machine Learning is a task of learning from the examples in a training dataset by mapping the outcome labels with input variables, which can then be used to predict the outcome of a new event. There are many classification tasks that people do on a routine basis. Identification of whether or not a message is a Spam, identification of whether a tumor is malignant or benign, classification of whether or not a page is fake, etc. These are typical examples that can make our lives much easier with machine learning algorithms.
A key component of machine learning is supervised learning. Classification, in Machine Leaning Challenges is a very popular and important supervised technique. Most of the machine learning algorithms are designed to resolve (discrete, not continuous) classification issues. Simple classification models produces better results for binary class problems where there are only 2 classes to predict. But the problem having 10 classes as there are 10 digits (0–9) requires Multi-Class Classification.
After linear regression, logistic regression is the most popular machine learning algorithm. Linear regression and logistical regression are similar in many ways. Yet, what they are used for is the biggest difference. Linear regression algorithms are used for predicting values, but for classification tasks, logistic regression is used.
What is Logistic Regression?
Logistic Regression is one of the most useful machine learning technique which relates with the field of statistics. Logistic Regression is one of the most commonly used and influential among many Machine Learning Classification Algorithms. It can be used in problems of both binary and multi-class classification. In this blog we will discuss that how the logistic regression algorithm works. Depending on the number of categories, Logistic regression can be classified as:
- Binomial: The resulting parameter can only have 2 different types: “0” and “1” which can reflect “win” versus “miss,” “pass” versus “fail,” “dead” versus “alive,” etc.
- Multinomial: Resulting parameter may have 3 or more potential non-ordered types (i.e. types with no numerical meaning) such as ‘ disease A ‘ vs ‘ disease B ‘ vs ‘ disease C. ‘
- Ordinal: This deals with the target class variables. A test performance, for example, can be graded as: “very poor “poor,” “good,” “extremely good.” Here a score of 0, 1, 2, 3 can be assigned to each group.
What is Logistic Regression Algorithm?
As the name “Logistic” suggests that there may be a function known as Logistic that is involved in the Machine Learning Algorithm’s hypothesis. A linear equation with independent predictors is also used by the logistic regression algorithm to predict a value. Is it in the right direction for our thinking? Yeah, that’s it! Additionally, the Sigmoid function is also known as the logistic function.
The Sigmoid function production varies from 0 to 1. But a specific label that indicates a class should be predicted for classification. In that case, a threshold (obviously a value between 0 and 1) must be set in such a way as to achieve the optimal predictive output. Following is the graphical demonstration of sigmoid function.
Generally, there is also a cost function involved when there is a hypothesis, often known as the Binary Cross Entropy variable. This cost function must be reduced here. It is therefore necessary to find minima (theta 0, theta 1, theta 2, …, theta n). Batch Gradient Descent can be used to find this minimum as an optimization technique. Cost function is represented by,
Only when a decision threshold is placed into the frame, logistic regression becomes a classification technique. Setting the threshold value is a very important aspect of logistic regression which relies on the issue of classification itself. The decision on the threshold value is largely affected by the precision and recall values. Ideally, we want both precision and recall to be 1, but this is rarely the case. We use the following arguments in the case of a precision-recall trade off to decide on the threshold.
Low Precision or High Recall:
For situations where we want to reduce the number of false negatives by explicitly decreasing the number of false positives, we choose a decision value that has either a small precision value or a high recall value.
High Precision or Low Recall:
For situations where we want to minimize the number of false positives by explicitly reducing the number of false negatives, we choose a decision value with a high precision value or a low recall value.
Advantages and Disadvantages?
Logistic regression can be a reasonable and effective option, as long as the data set suits. If your data set is not fitted for a logistic regression, it can be a terrible (and therefore terrible ineffective) approach. For making some logical prediction, more details would be required. There is no version of regression that is always preferable to others.
Logistic regression attempts to predict results based on a set of independent variables, but the model will have little to no predictive value if researchers include the wrong independent variables. For example, if college admission decisions are more based on recommendation letters than test scores, and researchers do not include a metric for recommendation letters in their data set, then the trained model will not provide meaningful or reliable predictions. This means that operational regression is not a useful tool when researchers define all the related independent variables already.
Logistic regression may be quite effective (compared to other forms of regression) and may be inefficient. There’s a reason why your regression model’s errors can be measured in so many ways. You would then use the template to give you the best outcomes with the reliability you need for the information you have.
Conclusion
Logistic regression is a basic algorithm that can be used in the classification of binary / multivariate functions. In this algorithm, the maximum likelihood estimation (MLE) is used to calculate the parameters rather than the ordinary least squares (OLS) and it is thus based on large-sample approximations. I hope you’d get a basic understanding of how the logistic regression algorithm operates by now.
The future work includes the use of other advanced optimization techniques other than Gradient Descent, which do not allow learning levels to be provided as input, but are able to find the Global Minima of the Cost Function (theta) accurately or approximately.