Machine Learning Series (Support Vector Machines)

Introduction

In this series of posts, I'll introduce the theory behind Support Vector Machines, a class of machine learning algorithms used for classification purposes.

1.0 Support Vector Machines

Support Vector Machines (SVMs) are machine learning algorithms used for classification tasks. The primary goal of an SVM is to find an optimal separating hyperplane that divides the data into its classes. This optimal separating hyperplane is found by maximizing the margin of the training data.
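As a quick, concrete sketch of this idea, here's how one might fit a linear SVM with scikit-learn on a small made-up two-dimensional dataset and read off the learned hyperplane; the data and the C value are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D data: two small clusters.
X = np.array([[1, 2], [2, 3], [2, 1],
              [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM searches for the maximum-margin separating hyperplane.
clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
clf.fit(X, y)

print("w =", clf.coef_[0])           # normal vector of the hyperplane
print("b =", clf.intercept_[0])      # offset
print("support vectors:\n", clf.support_vectors_)
```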

1.1 Separating Hyperplane

Let's take a look at a scatterplot of the iris dataset, which could easily be handled by an SVM:

[Figure: scatter plot of two iris classes that are linearly separable]

Just by looking at it, it's fairly obvious that the two classes can be easily separated. The line that separates the two classes is called the separating hyperplane.

In this example the data are two-dimensional, so the hyperplane is just a line, but SVMs work in any number of dimensions, which is why we use the more general term hyperplane.

1.1.1 Optimal Separating Hyperplane

Going off the scatter plot above, there are many possible separating hyperplanes. The job of the SVM is to find the optimal one.

To accomplish this, we choose the separating hyperplane that maximizes the distance to the nearest data points of each class. This gives us a hyperplane that generalizes well to unseen data.

1.2 Margins

Given a hyperplane, we can compute the distance between it and the closest data point. Doubling this value gives the margin. Inside the margin, there are no data points.

The larger the margin, the farther the hyperplane is from the nearest data points of either class, so the SVM looks for the hyperplane that maximizes the margin.

[Figure: a separating hyperplane and its margin between the two classes]
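As a small numeric sketch of this definition: the distance from a point to the hyperplane wᵀx = 0 is the length of the point's projection onto w, which works out to |wᵀx| / ||w||. The snippet below uses the same hyperplane and point that appear in the worked example later in this post (w = (2, 1), A = (3, 4)); the other two points are made up:

```python
import numpy as np

# Hyperplane w^T x = 0 with w = (2, 1) -- the same one used in the
# worked example below. The last two points are made up.
w = np.array([2.0, 1.0])
points = np.array([[3.0, 4.0],    # point A from the example below
                   [5.0, 3.0],
                   [2.0, 8.0]])

# Distance from each point to the hyperplane: |w^T x| / ||w||
distances = np.abs(points @ w) / np.linalg.norm(w)

# The margin is twice the distance to the closest point.
margin = 2 * distances.min()
print(distances)    # [4.47... 5.81... 5.36...]
print(margin)       # 8.944..., i.e. 4 * sqrt(5)
```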

1.3 Equation

Recall the equation of a hyperplane: wᵀx = 0. Here, w and x are vectors. If we combine this equation with y = ax + b, we get:

wᵀx = 0, with w = (-b, -a, 1) and x = (1, x, y)

This is because we can rewrite y = ax + b as y - ax - b = 0. This then becomes:

wᵀx = -b × 1 + (-a) × x + 1 × y

This is just another way of writing wᵀx = y - ax - b. We use this form instead of the traditional y = ax + b because it's easier to work with once we have more than two dimensions.
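As a quick sanity check of this rewriting, here's a short sketch; the line y = 2x + 1 and the sample x values are made-up numbers:

```python
import numpy as np

# Made-up line y = ax + b with a = 2, b = 1.
a, b = 2.0, 1.0
w = np.array([-b, -a, 1.0])     # w = (-b, -a, 1)

for x_val in [-1.0, 0.0, 3.0]:
    point = np.array([1.0, x_val, a * x_val + b])   # x = (1, x, y), a point on the line
    print(np.dot(w, point))     # prints 0.0 -- every point on the line satisfies w^T x = 0
```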

1.3.1 Example

Let's take a look at an example scatter plot with the hyperplane graphed:

[Figure: scatter plot with the hyperplane x₂ = -2x₁ drawn through the data]

Here, the hyperplane is x₂ = -2x₁. Let's turn this into the vector equivalent:

wᵀx = 0, with w = (2, 1) and x = (x₁, x₂)

Let's calculate the distance between point A and the hyperplane. We begin this process by projecting point A onto the hyperplane's normal vector w.

[Figure: the point A = (3, 4) and the hyperplane wᵀx = 0]

The vector a runs from the origin to point A. If we project it onto the normal vector w:

[Figure: the projection p of the vector a onto the normal vector w]

This gives us the projected vector p. With a = (3, 4) and w = (2, 1), we can compute ||p||. It will take a few steps to get there.

We begin by computing ||w||:

||w|| = √(2² + 1²) = √5. If we divide the coordinates of w by its magnitude ||w||, we get the direction of w: the unit vector u = (2/√5, 1/√5).

Now, p is the orthogonal projection of a onto w, so:

p = (u · a) u = (3 × 2/√5 + 4 × 1/√5) u = (10/√5) u = 2√5 u = (4, 2), so ||p|| = √(4² + 2²) = 2√5.
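The same projection can be checked numerically; here's a short numpy sketch using the numbers above:

```python
import numpy as np

a_vec = np.array([3.0, 4.0])    # the point A as a vector from the origin
w = np.array([2.0, 1.0])        # normal vector of the hyperplane

u = w / np.linalg.norm(w)       # unit vector in the direction of w
p = np.dot(u, a_vec) * u        # orthogonal projection of a onto w

print(p)                        # [4. 2.]
print(np.linalg.norm(p))        # 4.472..., i.e. 2 * sqrt(5)
```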

1.3.2 Margin Computation

Now that we have ||p||, the distance between A and the hyperplane, the margin is defined by:

margin = 2||p|| = 2 × 2√5 = 4√5. This is the margin of the hyperplane!
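Continuing the numpy sketch above, doubling ||p|| reproduces this value:

```python
import numpy as np

p = np.array([4.0, 2.0])            # projection computed above
margin = 2 * np.linalg.norm(p)

print(margin)                       # 8.944...
print(4 * np.sqrt(5))               # 8.944... -- matches 4 * sqrt(5)
```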

Resources

If you want to learn more, stay tuned for further posts by me. Also feel free to visit my GitHub for more content on Data Science, Machine Learning, and more.