
[R] Classification Problem : LDA(Linear Discriminant Analysis)

1. Discriminant Analysis

  • Bayes Theorem : \(Pr(Y=k|X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^K \pi_l f_l(x)}\)
  • The density for \(X\) in class \(k\) : \(f_k(x) = \frac{1}{\sqrt{2\pi}\sigma_k}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}\)
  • The prior probability for class \(k\) : \(\pi_k = Pr(Y=k)\)
  • If the distribution of \(X\) within each class is approximately normal, LDA and QDA are more stable.

 

1.1 LDA(Linear Discriminant Analysis)

  • Assume a common standard deviation across classes: \(\sigma_1 = \sigma_2 = \dots = \sigma_K = \sigma\)
  • \(f_k(x)\)
    • p = 1 : \(f_k(x) \sim N(\mu_k, \sigma^2)\)
    • p > 1 : \(f_k(x) \sim N(\mu_k, \Sigma)\)
  •  \(p_k(x) =  \frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu_k)^2}{2\sigma^2}}}{\sum_{l=1}^K \pi_l \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu_l)^2}{2\sigma^2}}}\)
  • To classify at the value \(X=x\), see which of the \(p_k(x)\) is largest.
  • Because the denominator of \(p_k(x)\) is the same for every class, we only need to compare the numerators.
    • Discriminant score :
      • p = 1 : \(\delta_k(x) = x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)\)
      • p > 1 : \(\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log(\pi_k)\)
    • \(\delta_k(x)\) is a linear function of \(x\). 
  • With two classes and equal priors, the decision boundary is at \(x = \frac{\hat{\mu}_1 + \hat{\mu}_2}{2}\)
  • Need to estimate : \((\mu_1, \dots, \mu_K)\), \((\pi_1, \dots, \pi_K)\), and \(\Sigma\)
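The p = 1 rule above can be sketched end to end on simulated data. This is a minimal illustration, not the method used later in the post: the two-class setup, sample sizes, and class means are assumptions made here for the example.

```r
# Minimal sketch of the p = 1 discriminant rule on simulated data
# (the two-class setup, sample sizes, and means are illustrative assumptions)
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))  # common sd = 1
y <- rep(1:2, each = 50)

# Estimate the class means, pooled variance, and priors
mu   <- tapply(x, y, mean)
sig2 <- sum((x - mu[y])^2) / (length(x) - 2)      # pooled estimate of sigma^2
pi.k <- as.vector(table(y)) / length(y)

# delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k)
delta <- function(x0, k) x0 * mu[k] / sig2 - mu[k]^2 / (2 * sig2) + log(pi.k[k])

# Classify a new point by the largest score; with equal priors the two
# scores cross exactly at x = (mu_1 + mu_2) / 2
x0 <- 1.8
which.max(c(delta(x0, 1), delta(x0, 2)))
```

Since the two estimated priors are equal here, the two scores are equal exactly at the midpoint of the estimated means, matching the decision-boundary bullet above.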

 

1.2 [Ex] Iris Data

Training model with LDA

 

# open the iris dataset 
data(iris)
str(iris)
summary(iris)
plot(iris[, -5], col=as.numeric(iris$Species) + 1)

# Apply LDA for iris data
library(MASS)
g <- lda(Species ~., data=iris)
plot(g)
plot(g, dimen=1)

 

 

Compute Misclassification Rate for the training set

 

# Compute misclassification error for training sets
pred <- predict(g)
table(pred$class, iris$Species)
mean(pred$class!=iris$Species)

 

 

We can get the inferred posterior probabilities \(\hat{p}_k(x)\) from predict(g, iris)$posterior (indexing the test rows, e.g. posterior[!tran, ], when a training subset tran is used as in the comparison below).
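The posterior matrix has one row per observation and one column per class; a quick sketch on the full-data iris fit from above:

```r
# Inspect the posterior probabilities hat{p}_k(x) from the iris LDA fit
library(MASS)
g <- lda(Species ~ ., data = iris)
post <- predict(g, iris)$posterior

head(round(post, 3))          # one column per species
summary(rowSums(post))        # each row sums to 1
```

The predicted class returned by predict(g)$class is simply the column with the largest posterior in each row.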

 

Performance comparison between LDA vs Multinomial Regression

 

library(nnet)
set.seed(1234)
K <- 100
RES <- array(0, c(K, 2))
for (i in 1:K) {
    tran.num <- sample(nrow(iris), size=floor(nrow(iris)*2/3))
    tran <- as.logical(rep(0, nrow(iris)))
    tran[tran.num] <- TRUE
    g1 <- lda(Species ~., data=iris, subset=tran)
    g2 <- multinom(Species ~., data=iris, subset=tran, trace=FALSE)
    pred1 <- predict(g1, iris[!tran,])$class
    pred2 <- predict(g2, iris[!tran,])
    RES[i, 1] <- mean(pred1!=iris$Species[!tran])
    RES[i, 2] <- mean(pred2!=iris$Species[!tran])
}
apply(RES, 2, mean)

# Average test misclassification rate: LDA 0.0254, multinomial 0.0436

 

1.3 [Ex] Default Dataset

# Importing library 
library(ISLR)
data(Default)
attach(Default)
library(MASS)

# Training model 
g <- lda(default ~ ., data = Default)
pred <- predict(g, Default)
table(pred$class, Default$default)
mean(pred$class != Default$default)

 

 

  • False positive rate : the fraction of negative ("No") examples that are classified as positive (0.22%)
  • False negative rate : the fraction of positive ("Yes") examples that are classified as negative (76.2%)
  • If we always classified to the majority class "No" (the prior), we would make 333/10000 errors, which is only 3.33%.
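Because missed defaulters (false negatives) are usually the costlier error here, a common adjustment is to lower the posterior threshold for predicting "Yes" below the default 0.5. The sketch below assumes the lda fit on Default from above; the 0.2 threshold is an illustrative choice, not a prescription.

```r
# Lower the threshold for "Yes" to reduce the false negative rate
library(ISLR)
library(MASS)
data(Default)

g    <- lda(default ~ ., data = Default)
post <- predict(g, Default)$posterior[, "Yes"]

pred20 <- ifelse(post > 0.2, "Yes", "No")  # threshold 0.2 instead of 0.5
table(pred20, Default$default)
mean(pred20 != Default$default)            # overall error rises a little,
                                           # but fewer defaulters are missed
```

This trades a higher false positive rate for a lower false negative rate, which is often the right trade when the cost of missing a defaulter is high.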