

[R] Support Vector Machine

1. What is Support Vector Machine?

  • Find a plane that separates the classes in feature space.
  • Soften what we mean by "separates", and enrich and enlarge the feature space so that separation becomes possible.
  • Three related methods
    • Maximal Margin Classifier
    • Support Vector Classifier (SVC)
    • Support Vector Machine (SVM)

 

1.1 Hyperplane

  • A hyperplane in p dimensions is a flat affine subspace of dimension p-1.
  • General equation for a hyperplane : \(\beta_0 + \beta_1 X_1 + ... + \beta_p X_p = 0\)
    • If \(X = (X_1, ..., X_p)^T\) satisfies the equation above, then \(X\) lies on the hyperplane.
    • If \(X = (X_1, ..., X_p)^T\) does not satisfy the equation above, then \(X\) lies on one side of the hyperplane.
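  • [Ex] A small worked case (the coefficients below are chosen here only for illustration): in \(p = 2\) dimensions the hyperplane \(1 + 2X_1 + 3X_2 = 0\) is a line.
    • The point \((-1/2, 0)\) gives \(1 + 2(-1/2) + 3(0) = 0\), so it lies on the hyperplane.
    • The point \((1, 1)\) gives \(1 + 2(1) + 3(1) = 6 > 0\), so it lies to one side of the hyperplane.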

 

1.2 Separating Hyperplanes

  • If \(f(x) = \beta_0 + \beta_1 X_1 + ... + \beta_p X_p\), then \(f(x) > 0\) for points on one side of the hyperplane, and \(f(x) < 0\) for points on the other.
  • If a separating hyperplane exists, we can use it to construct a very natural classifier.
  • For a test observation \(x^{*}\) : \(f(x^*) = \beta_0 + \beta_1 x_1^* + ... + \beta_p x_p^*\)
    • If \(f(x^*) > 0\), we assign the test observation to class 1. 
    • If \(f(x^*) < 0\), we assign the test observation to class 2.
  • If \(|f(x^*)|\) is relatively large, the class assignment is confident.
  • If \(|f(x^*)|\) is relatively small, the class assignment is less confident.
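
A minimal R sketch of this decision rule (the hyperplane coefficients below are hypothetical, chosen only for illustration):

# Classify test points by the sign of f(x*) = beta0 + beta1*x1 + beta2*x2
# (hypothetical coefficients, for illustration only)
beta0 <- 1; beta1 <- 2; beta2 <- 3
xstar <- rbind(c(1, 1), c(-2, -1))                 # two test observations
f <- beta0 + beta1*xstar[, 1] + beta2*xstar[, 2]
class <- ifelse(f > 0, 1, 2)                       # class 1 if f(x*) > 0, class 2 otherwise
cbind(f, class)                                    # |f(x*)| reflects confidence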

 

2. Maximal Margin Classifier

  • The maximal margin hyperplane is the separating hyperplane that is farthest from the training observations.
  • Among all separating hyperplanes, find the one that makes the biggest gap or margin between the two classes.
  • Constrained optimization problem :
    • \(\text{maximize}_{\beta_0, \beta_1, ..., \beta_p} \; M\) subject to \(\sum_{j=1}^p \beta_j^2 = 1\)
    • \(y_i(\beta_0 + \beta_1 x_{i1} + ... + \beta_p x_{ip}) \geq M\) for all \(i = 1, ..., n\)
  • The function svm() in the R package e1071 solves this problem efficiently (see the sketch below).
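
A sketch (not from the original post) of how a maximal margin classifier can be approximated with svm(): on linearly separable data, a very large cost leaves essentially no budget for margin violations.

# Sketch: approximate a maximal margin classifier with svm() from e1071
# by using a very large cost on well-separated simulated data
library(e1071)
set.seed(10)
x <- matrix(rnorm(20*2), ncol=2)
y <- c(rep(-1, 10), rep(1, 10))
x[y==1, ] <- x[y==1, ] + 3        # shift class 1 so the two classes separate
dat <- data.frame(x, y=as.factor(y))
mmfit <- svm(y~., data=dat, kernel="linear", cost=1e5, scale=FALSE)
summary(mmfit)                    # only a few support vectors define the maximal margin
plot(mmfit, dat)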

 

 

2.1 The Non-separable Case

  • The maximal margin classifier is a very natural way to perform classification.
  • However, in many cases no separating hyperplane exists, and so there is no maximal margin classifier.
  • If a separating hyperplane doesn't exist, the optimization problem has no solution with \(M > 0\).
  • However, we can develop a hyperplane that almost separates the classes, using a so-called soft margin.
  • The generalization of the maximal margin classifier to the non-separable case is known as the support vector classifier.

 

3. Support Vector Classifier

 

  • We need to consider a classifier based on a hyperplane that does not perfectly separate the two classes, in the interest of
    • Greater robustness to individual observations
    • Better classification of most of the training observations
  • It could be worthwhile to misclassify a few training observations in order to do a better job in classifying the remaining observations.
  • The support vector classifier (soft margin classifier) allows some observations to be on the incorrect side of the margin, or even on the incorrect side of the hyperplane.
  • The margin is soft because it can be violated by some of the training observations.
  • Optimization of SVC :
    • \(\text{maximize}_{\beta_0, \beta_1, ..., \beta_p, \epsilon_1, ..., \epsilon_n} \; M\) subject to \(\sum_{j=1}^p \beta_j^2 = 1\)
    • \(y_i(\beta_0 + \beta_1 x_{i1} + ... + \beta_p x_{ip}) \geq M(1-\epsilon_i)\)
    • \(\epsilon_i \geq 0\), and \(\sum_{i=1}^n \epsilon_i \leq C\)
    • \(C\) is a nonnegative tuning parameter: a budget for how much the margin can be violated (see the sketch below).
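
In e1071, the cost argument of svm() plays the role of the tuning parameter and is inversely related to the budget \(C\) above: a small cost allows many margin violations (many support vectors), while a large cost allows few. A minimal sketch on simulated data (values chosen only for illustration):

# Sketch: effect of the cost argument in e1071 (inversely related to the budget C)
library(e1071)
set.seed(2)
x <- matrix(rnorm(40*2), ncol=2)
y <- c(rep(-1, 20), rep(1, 20))
x[y==1, ] <- x[y==1, ] + 1
dat <- data.frame(x, y=as.factor(y))
fit.small <- svm(y~., data=dat, kernel="linear", cost=0.01, scale=FALSE)   # wide, soft margin
fit.large <- svm(y~., data=dat, kernel="linear", cost=100, scale=FALSE)    # narrow margin
c(small.cost=length(fit.small$index), large.cost=length(fit.large$index))  # number of support vectors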

 

3.1 Margins and Slack Variables

  • \(M\) is the width of the margin.
  • \(\epsilon_1, ..., \epsilon_n\) are slack variables.
    • If \(\epsilon_i = 0\), the \(i\)th obs. is on the correct side of the margin.
    • If \(\epsilon_i > 0\), the \(i\)th obs. is on the wrong side of the margin.
    • If \(\epsilon_i > 1\), the \(i\)th obs. is on the wrong side of the hyperplane.

 

3.2 [Ex] Support Vector Classifier

# Simple example (simulate data set)
set.seed(1)
x <- matrix(rnorm(20*2), ncol=2)
y <- c(rep(-1, 10), rep(1, 10))
x[y==1, ] <- x[y==1, ] + 1
plot(x, col=(3-y), pch=19, xlab="X1", ylab="X2")

# Support vector classifier with cost=10
library(e1071)
# y must be coded as a factor so that svm() performs classification (labels here are -1/1)
dat <- data.frame(x, y=as.factor(y))
# Tuning parameter cost (inversely related to the budget C)
svmfit <- svm(y~., data=dat, kernel="linear", cost=10, scale=FALSE)
plot(svmfit, dat)
summary(svmfit)
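
A follow-up sketch (not in the original post): svmfit$index holds the row indices of the support vectors, and the decision values \(f(x_i)\) can be used to mirror the slack-variable cases in 3.1, since svm() scales the margin boundaries to \(f(x) = \pm 1\) (the dashed lines plotted in 3.3).

# Which observations are support vectors?
svmfit$index
# Decision values f(x_i) for the training data
fx <- drop(attributes(predict(svmfit, dat, decision.values=TRUE))$decision.values)
# The sign convention of the decision values depends on the internal class ordering,
# so align them with the numeric labels y before checking margin violations
if (cor(fx, y) < 0) fx <- -fx
sum(y*fx < 1)   # on the wrong side of the margin (epsilon_i > 0)
sum(y*fx < 0)   # on the wrong side of the hyperplane (epsilon_i > 1)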

 

 

 

3.3 [Ex] Support Vector Classifier with different margins

# Support vector classifier with a smaller cost (cost=0.1)
svmfit <- svm(y~., data=dat, kernel="linear", cost=0.1, scale=FALSE)
# Indices of the support vectors
svmfit$index
# Recover the hyperplane coefficients from the fitted model (linear kernel)
beta <- drop(t(svmfit$coefs)%*%x[svmfit$index,])
beta0 <- svmfit$rho

# Visualize results: decision boundary (solid line) and margins (dashed lines)
plot(x, col=(3-y), pch=19, xlab="X1", ylab="X2")
points(x[svmfit$index, ], pch=5, cex=2)   # mark the support vectors
abline(beta0 / beta[2], -beta[1] / beta[2])
abline((beta0 - 1) / beta[2], -beta[1] / beta[2], lty = 2)
abline((beta0 + 1) / beta[2], -beta[1] / beta[2], lty = 2)
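
As a sanity check (a sketch, not in the original post): for a linear kernel the decision function fitted by e1071 is \(f(x) = \beta^T x - \rho\), so the manual decision values computed from beta and beta0 above should agree with the values returned by predict().

# Sanity check: manual decision values x %*% beta - beta0 should agree with
# the decision values returned by predict()
dec.manual <- drop(x %*% beta) - beta0
dec.svm <- drop(attributes(predict(svmfit, dat, decision.values=TRUE))$decision.values)
head(cbind(dec.manual, dec.svm))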

 

 

3.4 [Ex] Support Vector Classifier with hyperparameter tuning

# Cross validation to find the optimal tuning parameter (cost) 
# Training using tune function 
set.seed(1)
tune.out <- tune(svm, y~., data=dat, kernel="linear",
                 ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
bestmod <- tune.out$best.model
summary(bestmod)

# Generate test set 
set.seed(4321)
xtest <- matrix(rnorm(20*2), ncol=2)
ytest <- sample(c(-1,1), 20, rep=TRUE)
xtest[ytest==1, ] <- xtest[ytest==1, ] + 1
testdat <- data.frame(xtest, y=as.factor(ytest))

# Compute the misclassification rate for the optimal model
ypred <- predict(bestmod, testdat)
table(predict=ypred, truth=testdat$y)
mean(ypred!=testdat$y)

 

  • Misclassification error rate : 0.15
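
A short follow-up (a sketch): the tune object also stores the cross-validation error for every candidate cost, which shows why this model was selected as the best one.

# Cross-validation error and dispersion for each candidate cost
tune.out$performances
# Cost value selected by 10-fold cross-validation
tune.out$best.parameters
plot(tune.out)   # CV error as a function of cost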

 

3.5 [Ex] Simulation Study of Support Vector Classifier

# Load required packages
library(mnormt)
library(e1071)

# Function for calculating Misclassification Rate of SVC
SVC.MCR <- function(x.tran, x.test, y.tran, y.test,
                    cost=c(0.01,0.1,1,10,100)) {
  dat <- data.frame(x.tran, y=as.factor(y.tran))
  testdat <- data.frame(x.test, y=as.factor(y.test))
  MCR <- rep(0, length(cost)+1)
  for (i in 1:length(cost)) {
    svmfit <- svm(y~., data=dat, kernel="linear",
                  cost=cost[i])
    MCR[i] <- mean(predict(svmfit, testdat)!=testdat$y)
  }
  tune.out <- tune(svm, y~., data=dat, kernel="linear",
                   ranges=list(cost=cost))
  pred <- predict(tune.out$best.model, testdat)
  MCR[length(cost)+1] <- mean(pred!=testdat$y)
  MCR
}

# Simulation test for 100 replications
set.seed(123)
K <- 100
RES <- matrix(NA, K, 6)
colnames(RES) <- c("0.01", "0.1", "1", "10", "100", "CV")
for (i in 1:K) {
  x.A <- rmnorm(100, rep(0, 2), matrix(c(1,-0.5,-0.5,1),2))
  x.B <- rmnorm(100, rep(1, 2), matrix(c(1,-0.5,-0.5,1),2))
  x.tran <- rbind(x.A[1:50, ], x.B[1:50, ])
  x.test <- rbind(x.A[-c(1:50), ], x.B[-c(1:50), ])
  y.tran <- factor(rep(0:1, each=50))
  y.test <- factor(rep(0:1, each=50))
  RES[i,] <- SVC.MCR(x.tran, x.test, y.tran, y.test)
}
apply(RES, 2, summary)
boxplot(RES, boxwex=0.5, col=2:7,
        names=c("0.01", "0.1", "1", "10", "100", "CV"),
        main="", ylab="Classification Error Rates")

 

 

  • The models with cost = 10 or 100 achieve the lowest misclassification error rates on the test sets.