A Support Vector Machine (SVM) is, in its basic form, a linear classifier. We first consider SVM for linearly separable binary sets. The goal is to design a hyperplane, that is, a subspace whose dimension is one less than that of its ambient space. If a space is 3-dimensional, then its hyperplanes are the 2-dimensional planes.

The hyperplane separates the training vectors into two classes. Many hyperplanes may classify all the elements of the feature set correctly, but the best choice is the hyperplane that leaves the maximum margin from both classes. By margin we mean the distance between the hyperplane and the elements closest to it.
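To make the idea of a margin concrete, here is a minimal sketch in base R. The toy points, the candidate hyperplane `w`, `b`, and the helper `distance_to_hyperplane` are all hypothetical and only illustrate the definition; an SVM would search for the `w` and `b` that make this margin as large as possible.

```r
# Signed distance from each row of X to the hyperplane w . x + b = 0
distance_to_hyperplane <- function(X, w, b) {
  as.vector(X %*% w + b) / sqrt(sum(w^2))
}

# Hypothetical toy data: two points per class in 2 dimensions
X <- rbind(c(1, 1), c(2, 0), c(4, 4), c(5, 3))
y <- c(-1, -1, 1, 1)

# Hypothetical candidate hyperplane: x1 + x2 - 5 = 0
w <- c(1, 1)
b <- -5

d <- distance_to_hyperplane(X, w, b)
all(sign(d) == y)        # TRUE when every point lies on its own side
margin <- min(abs(d))    # distance from the hyperplane to the closest point
margin
```

Trying a different `w` and `b` and comparing the resulting `margin` shows why some separating hyperplanes are preferable to others.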

```
data(iris)
summary(iris)
```

```
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
```

`head(iris)`

```
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
```

```
# Scatter plot of the petal measurements, colored by species
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()
```

We are using the **iris** dataset, which has 4 numerical variables and 1 factor with 3 levels, as described above. The numerical variables also have different ranges, so it is good practice to normalize the data.
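As a minimal sketch of what normalizing means here, the four numeric columns can be standardized with base R's `scale()`. The name `iris_scaled` is just an illustrative choice. Note that `svm()` from **e1071** has a `scale` argument that defaults to `TRUE`, so the models fitted later in this post should already standardize the predictors internally; the explicit step below is only for illustration.

```r
data(iris)

# Standardize the four numeric columns to mean 0 and standard deviation 1
iris_scaled <- iris
iris_scaled[, 1:4] <- as.data.frame(scale(iris[, 1:4]))

colMeans(iris_scaled[, 1:4])       # numerically zero for every column
apply(iris_scaled[, 1:4], 2, sd)   # one for every column
```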

We create a classification model that helps us predict the correct species. From the graph above, we can see that the data separate by **Species**: for example, **setosa** is very far from the other two groups, while **versicolor** and **virginica** overlap slightly.

With a **Support Vector Machine (SVM)** we look for the optimal separating hyperplane between two classes. To do that, the SVM maximizes the margin around the hyperplane. The points that lie on the boundary are called **Support Vectors**, and the middle line is the **Separating Hyperplane**.

In situations where we cannot obtain a linear separator, the data are projected into a higher-dimensional space so that the data points can become linearly separable.

In this case, we use the **Kernel Trick** with the **Gaussian Radial Basis Function**.
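The Gaussian Radial Basis Function measures similarity between two points as `exp(-gamma * ||x - y||^2)`. The helper `rbf_kernel` below is a hypothetical base R sketch of that formula, not the routine the library uses internally. The value `gamma = 0.25` is chosen to match the model summary shown later; because `svm()` standardizes the predictors by default, the kernel values inside the fitted model will differ from these raw-unit numbers.

```r
# Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)
rbf_kernel <- function(x, y, gamma) {
  exp(-gamma * sum((x - y)^2))
}

x1 <- c(5.1, 3.5, 1.4, 0.2)   # first iris observation (setosa)
x2 <- c(4.9, 3.0, 1.4, 0.2)   # second iris observation (setosa)

rbf_kernel(x1, x1, gamma = 0.25)   # identical points give exactly 1
rbf_kernel(x1, x2, gamma = 0.25)   # nearby points give a value close to 1
```

The kernel lets the SVM work with these similarities directly, without ever computing the coordinates of the higher-dimensional space.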

```
library(e1071)
mymodel <- svm(Species~., data=iris)
summary(mymodel)
```

```
Call:
svm(formula = Species ~ ., data = iris)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
Number of Support Vectors: 51
( 8 22 21 )
Number of Classes: 3
Levels:
setosa versicolor virginica
```

```
# Plot two-dimensional projection of the data with highlighting classes and support vectors
# The Species classes are shown in different shadings
plot(mymodel, data = iris,
     Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4)) # named values for the dimensions held constant
```

```
# Confusion Matrix and Misclassification Error
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
tab
```

```
Actual
Predicted setosa versicolor virginica
setosa 50 0 0
versicolor 0 48 2
virginica 0 2 48
```

```
# Misclassification Rate
1-sum(diag(tab))/sum(tab)
```

`[1] 0.02666667`

As the model summary above shows, the SVM uses the Gaussian Radial Basis Function kernel, and **cost** is the cost of constraint violation.

The two-dimensional plot above is a projection of the data that highlights the classes and the support vectors. The **Species** classes are shown in different shadings. Inside the **blue class setosa** we have 8 points depicted with a cross; these are the support vectors for **setosa**. Similarly, red crosses mark the support vectors for **versicolor**, and green crosses those for **virginica**.
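The support vectors can also be inspected directly on the fitted object. The sketch below refits the same default model under the local name `fit` (an illustrative name) and reads the `tot.nSV`, `nSV`, and `index` components of the returned object; treat it as one possible way to look at them rather than the only one.

```r
library(e1071)
data(iris)

fit <- svm(Species ~ ., data = iris)   # same call as above, stored under a local name

fit$tot.nSV                      # total number of support vectors
fit$nSV                          # number of support vectors per class
table(iris$Species[fit$index])   # species of the rows used as support vectors
head(iris[fit$index, ])          # first few support vectors in the original units
```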

From the **Confusion Matrix** above, only 2 observations are misclassified for **versicolor**, and 2 observations are misclassified for **virginica**.

We also have a misclassification rate of about **2.7%**.
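The arithmetic behind these numbers can be reproduced in base R from the confusion matrix printed above. The matrix is typed in by hand here, and the names `misclassified`, `error_rate`, and `per_class_recall` are illustrative.

```r
# Confusion matrix copied from the output above (rows: predicted, columns: actual)
species <- c("setosa", "versicolor", "virginica")
tab <- matrix(c(50,  0,  0,
                 0, 48,  2,
                 0,  2, 48),
              nrow = 3, byrow = TRUE,
              dimnames = list(Predicted = species, Actual = species))

misclassified <- sum(tab) - sum(diag(tab))     # off-diagonal counts: 4
error_rate <- misclassified / sum(tab)         # 4 / 150
per_class_recall <- diag(tab) / colSums(tab)   # share of each actual class predicted correctly

misclassified
error_rate
per_class_recall
```

The per-class values show that all of the error comes from confusion between **versicolor** and **virginica**, while **setosa** is classified perfectly.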

If we use an SVM with a **linear kernel** (not shown here) instead of a **radial kernel**, the misclassification rate is a bit higher.
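For readers who want to try the linear kernel themselves, a minimal sketch could look like the following. It assumes the same formula and default settings as the radial model; the names `linear_model`, `linear_pred`, and `linear_tab` are illustrative, and no output is reproduced here.

```r
library(e1071)
data(iris)

# Same model as before, but with a linear kernel
linear_model <- svm(Species ~ ., data = iris, kernel = "linear")

# Confusion matrix and misclassification rate on the training data
linear_pred <- predict(linear_model, iris)
linear_tab <- table(Predicted = linear_pred, Actual = iris$Species)
linear_tab
linear_error <- 1 - sum(diag(linear_tab)) / sum(linear_tab)
linear_error
```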

```
mymodel <- svm(Species ~ ., data = iris,
               kernel = "polynomial")
plot(mymodel, data = iris,
     Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```

```
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
1-sum(diag(tab))/sum(tab)
```

`[1] 0.04666667`

If we instead use an SVM with a **polynomial kernel**, as shown in the graph above, the misclassification rate increases to about **4.7%**.

We can try to tune the model in order to get a better classification rate. Tuning is also called hyperparameter optimization, and it helps to select the best model.

```
# Tuning
set.seed(123)
tmodel <- tune(svm, Species~., data=iris,
               ranges = list(epsilon = seq(0, 1, 0.1), # sequence from 0 to 1 with an increment of 0.1
                             cost = 2^(2:7)))          # cost is the cost of constraint violation
# if cost is too high, margin violations are penalized heavily and the model can overfit
plot(tmodel)
```

We use **epsilon** and **cost** as tuning parameters.

The **cost** parameter captures the cost of constraint violation. If **cost is too high**, margin violations by non-separable points are penalized heavily, so the model fits the training data too closely, leading to **overfitting**. On the contrary, if **cost is too small**, we may end up with **underfitting**.

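The value of **epsilon** defines a margin of tolerance within which no penalty is given to errors. This parameter belongs to the insensitive-loss function of SVM regression, so for the C-classification model fitted here it is not expected to change the result, and the grid search effectively tunes **cost**. In SVM we can have **hard** or **soft** margins, where a soft margin allows observations inside the margin. A soft margin is used when the two classes are not linearly separable.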
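To illustrate how a soft margin and **cost** interact, the base R sketch below computes the slack of each point, `max(0, 1 - y * f(x))`, for a linear decision function `f(x) = w . x + b`. The toy data, the hyperplane, and the helper `slack` are hypothetical; the objective shown is the usual soft-margin form and is meant as an illustration, not as the exact computation performed by the library.

```r
# Slack of each point for labels y in {-1, +1}: max(0, 1 - y * f(x))
slack <- function(X, y, w, b) {
  pmax(0, 1 - y * as.vector(X %*% w + b))
}

# Hypothetical toy data and hyperplane x1 = 2 (margin boundaries at x1 = 1 and x1 = 3)
X <- rbind(c(0, 0), c(3, 0), c(1.5, 0), c(2.5, 0))
y <- c(-1, 1, -1, -1)
w <- c(1, 0)
b <- -2

s <- slack(X, y, w, b)
s   # 0: outside or on the margin; between 0 and 1: inside the margin; above 1: wrong side

# Soft-margin objective: a wide margin (small ||w||) traded against total slack
cost <- 1
objective <- 0.5 * sum(w^2) + cost * sum(s)
objective
```

Raising `cost` makes every unit of slack more expensive, which pushes the solution toward a narrower margin with fewer violations.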

The plot above shows the performance of the SVM over the grid of **epsilon** and **cost** values. Darker regions mean better results, that is, lower misclassification error. By interpreting this graph we can choose the best model parameters.
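Besides reading the plot, the search results can be inspected numerically. The sketch below uses a reduced grid over **cost** only (an assumption made to keep it quick) and stores the result under the illustrative name `cost_search`. By default `tune()` estimates the error with 10-fold cross-validation, so the exact numbers depend on the seed.

```r
library(e1071)
data(iris)

set.seed(123)
# Reduced grid over cost only (an assumption, not the grid used above)
cost_search <- tune(svm, Species ~ ., data = iris,
                    ranges = list(cost = 2^(2:7)))

cost_search$best.parameters    # parameter value with the lowest error
cost_search$best.performance   # its cross-validated misclassification error
cost_search$performances       # error and dispersion for every grid point
```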

```
mymodel <- tmodel$best.model
summary(mymodel)
```

```
Call:
best.tune(method = svm, train.x = Species ~ ., data = iris, ranges = list(epsilon = seq(0,
1, 0.1), cost = 2^(2:7)))
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 8
gamma: 0.25
Number of Support Vectors: 35
( 6 15 14 )
Number of Classes: 3
Levels:
setosa versicolor virginica
```

```
plot(mymodel, data = iris,
     Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```

```
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
1-sum(diag(tab))/sum(tab)
```

`[1] 0.01333333`

From the summary above, we now have **35 support vectors**: **6** for **setosa**, **15** for **versicolor**, and **14** for **virginica**. The graph above shows the result obtained with the best model. Looking at the confusion matrix and the misclassification error on the training data, we can see that only 2 observations are misclassified and the misclassification error is 1.3%, which is significantly lower than what we got earlier.