1. Discriminant Analysis
- Bayes' theorem: \(Pr(Y=k|X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^K \pi_l f_l(x)}\)
- The density for \(X\) in class \(k\): \(f_k(x) = \frac{1}{\sqrt{2\pi}\sigma_k}e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}\)
- The prior probability for class \(k\): \(\pi_k = Pr(Y=k)\)
- When the distribution of \(X\) within each class is approximately normal, LDA and QDA are more stable.
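As a quick numeric illustration of the formulas above, the posterior can be computed directly. This is a minimal sketch; the priors, means, and variance below are made-up values, not estimates from any dataset:

```r
# Bayes' theorem with normal class densities (hypothetical two-class setup)
pi_k  <- c(0.6, 0.4)   # priors pi_k = Pr(Y = k)
mu_k  <- c(0, 2)       # class means
sigma <- 1             # common standard deviation
x     <- 1             # point to classify

f_k  <- dnorm(x, mean = mu_k, sd = sigma)   # class densities f_k(x)
post <- pi_k * f_k / sum(pi_k * f_k)        # Pr(Y = k | X = x)
post   # posteriors sum to 1; the largest one gives the predicted class
```

Here \(x = 1\) is equidistant from both means, so the densities cancel and the posterior reduces to the priors.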
1.1 LDA (Linear Discriminant Analysis)
- Assume a common variance across classes: \(\sigma_1 = \sigma_2 = \cdots = \sigma_K = \sigma\)
- \(f_k(x)\)
- p = 1: \(f_k(x) \sim N(\mu_k, \sigma^2)\)
- p > 1: \(f_k(x) \sim N(\mu_k, \Sigma)\)
- \(p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu_k)^2}{2\sigma^2}}}{\sum_{l=1}^K \pi_l \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu_l)^2}{2\sigma^2}}}\)
- To classify at the value \(X=x\), see which of the \(p_k(x)\) is largest.
- Because the denominator of \(p_k(x)\) is the same for every class, we only need to compare the numerators.
- Discriminant score:
- p = 1: \(\delta_k(x) = x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)\)
- p > 1: \(\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log(\pi_k)\)
- \(\delta_k(x)\) is a linear function of \(x\).
- With \(K = 2\) and \(\pi_1 = \pi_2\), the decision boundary is at \(x = \frac{\hat{\mu}_1 + \hat{\mu}_2}{2}\)
- Need to estimate: \(\mu_1, \dots, \mu_K\), \(\pi_1, \dots, \pi_K\), and \(\Sigma\)
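The p = 1 discriminant score can be computed by hand. A minimal sketch with made-up parameter estimates (not fitted to any data):

```r
# LDA discriminant scores, p = 1, two classes (hypothetical estimates)
mu     <- c(0, 3)      # estimated class means mu_k
pi_k   <- c(0.5, 0.5)  # estimated priors pi_k
sigma2 <- 1            # pooled variance estimate
x      <- 1.2          # observation to classify

delta <- x * mu / sigma2 - mu^2 / (2 * sigma2) + log(pi_k)
which.max(delta)       # predicted class: largest discriminant score is class 1
# with equal priors the boundary sits at (mu[1] + mu[2]) / 2 = 1.5,
# so x = 1.2 falls on the class-1 side
```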
1.2 [Ex] Iris Data
Training the model with LDA
# open the iris dataset
data(iris)
str(iris)
summary(iris)
plot(iris[, -5], col=as.numeric(iris$Species) + 1)
# Apply LDA for iris data
library(MASS)
g <- lda(Species ~., data=iris)
plot(g)
plot(g, dimen=1)
Compute the misclassification rate on the training set
# Compute misclassification error for training sets
pred <- predict(g)
table(pred$class, iris$Species)
mean(pred$class!=iris$Species)
We can get the estimated posterior probabilities \(\hat{p}_k(x)\) from predict(g, iris)$posterior[-tran, ].
Performance comparison: LDA vs. multinomial logistic regression
library(nnet)
set.seed(1234)
K <- 100
RES <- array(0, c(K, 2))
for (i in 1:K) {
  # split the data: 2/3 training, 1/3 test
  tran.num <- sample(nrow(iris), size=floor(nrow(iris)*2/3))
  tran <- as.logical(rep(0, nrow(iris)))
  tran[tran.num] <- TRUE
  # fit both models on the training rows only
  g1 <- lda(Species ~., data=iris, subset=tran)
  g2 <- multinom(Species ~., data=iris, subset=tran, trace=FALSE)
  # test-set predictions and error rates
  pred1 <- predict(g1, iris[!tran, ])$class
  pred2 <- predict(g2, iris[!tran, ])
  RES[i, 1] <- mean(pred1 != iris$Species[!tran])
  RES[i, 2] <- mean(pred2 != iris$Species[!tran])
}
apply(RES, 2, mean)
# 0.0254 0.0436
1.3 [Ex] Default Dataset
# Importing library
library(ISLR)
data(Default)
attach(Default)
library(MASS)
# Training model
g <- lda(default~., data=Default)
pred <- predict(g, Default)
table(pred$class, default)
mean(pred$class!=default)
- False positive rate: the fraction of negative examples that are classified as positive (0.22%)
- False negative rate: the fraction of positive examples that are classified as negative (76.2%)
- If we used only the prior and always classified to class "No", we would make 333/10000 errors, which is only 3.33%.
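The rates above come straight from the confusion matrix. A sketch of the computation (the counts below are hypothetical placeholders, not the actual table(pred$class, default) output):

```r
# rows = predicted class, columns = true class (hypothetical counts)
tab <- matrix(c(9644, 252,
                  23,  81),
              nrow = 2, byrow = TRUE,
              dimnames = list(pred = c("No", "Yes"),
                              true = c("No", "Yes")))
fpr <- tab["Yes", "No"] / sum(tab[, "No"])   # true negatives predicted "Yes"
fnr <- tab["No", "Yes"] / sum(tab[, "Yes"])  # true positives predicted "No"
c(FPR = fpr, FNR = fnr)
```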