1. Two types of misclassification errors
We can change the two error rates by changing the threshold from 0.5 to some other value in [0, 1].
- Classify as "Yes" when \(\hat{Pr}(Default = Yes|Balance, Student) \ge \alpha\).
- \(\alpha\) is the threshold.
- If \(\alpha\) ↑, FN increases while FP decreases.
- If \(\alpha\) ↓, FN decreases while FP increases.
2. [Ex] Changes in error rates across thresholds
# Prerequisites: LDA fit on the Default data (same setup as in the ROC example below)
library(ISLR)
library(MASS)
data(Default)
attach(Default)
g <- lda(default ~ ., data = Default)
pred <- predict(g, Default)

thresholds <- seq(0, 1, 0.01)
res <- matrix(NA, length(thresholds), 3)
# Compute overall error, false positive, and false negative rates for each threshold
for (i in 1:length(thresholds)) {
  decision <- rep("No", length(default))
  decision[pred$posterior[, 2] >= thresholds[i]] <- "Yes"
  res[i, 1] <- mean(decision != default)                 # overall error rate
  res[i, 2] <- mean(decision[default == "No"] == "Yes")  # false positive rate
  res[i, 3] <- mean(decision[default == "Yes"] == "No")  # false negative rate
}
k <- 1:51   # thresholds from 0 to 0.5
matplot(thresholds[k], res[k, ], col = c(1, "orange", 4), lty = c(1, 4, 2),
        type = "l", xlab = "Threshold", ylab = "Error Rate", lwd = 2)
legend("top", c("Overall Error", "False Positive", "False Negative"),
       col = c(1, "orange", 4), lty = c(1, 4, 2), cex = 1.2)
apply(res, 2, which.min)
- Over most of the plotted range the overall error rate decreases as \(\alpha\) grows, because the rapid drop in false positives dominates (defaulters are only a small minority of the data).
- However, it increases slightly after the turning point, once the growing false negative rate takes over.
- In practice we can tune the classifier by choosing a threshold appropriate to the specific situation or problem.
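To make the which.min output above easier to read, the winning index can be mapped back to a threshold value; a minimal sketch using the res and thresholds objects defined above:
# Threshold minimizing the overall error rate on this grid (a rough check only;
# in practice alpha should reflect the relative costs of FP and FN errors)
best <- which.min(res[, 1])
c(threshold = thresholds[best], overall_error = res[best, 1])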
3. Confusion Matrix
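A confusion matrix can be produced directly with table(); a minimal sketch, assuming the pred and default objects from the example above and the usual 0.5 threshold:
# Confusion matrix at threshold 0.5 (rows = predicted class, columns = actual class)
decision <- ifelse(pred$posterior[, 2] >= 0.5, "Yes", "No")
table(Predicted = decision, Actual = default)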
4. ROC curve
- Class-specific performance measures used in medicine and biology: sensitivity (TPR) and specificity (TNR).
- The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate (1 - TNR) as the threshold \(\alpha\) varies.
- \(\alpha = 1\) (TPR = 0, TNR = 1) gives the lower-left point; \(\alpha = 0\) (TPR = 1, TNR = 0) gives the upper-right point.
- The overall performance of a classifier is summarized by the AUC (area under the ROC curve).
- The larger the AUC, the better the classifier.
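For reference, the standard definitions, which match the Way 2 computation below:
- \(\mathrm{TPR} = \mathrm{Sensitivity} = \frac{TP}{TP + FN}\)
- \(\mathrm{TNR} = \mathrm{Specificity} = \frac{TN}{TN + FP}\)
- \(\mathrm{FPR} = 1 - \mathrm{TNR} = \frac{FP}{FP + TN}\)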
5. [Ex] ROC curve
Way 1 : Drawing the ROC curve from sensitivity and specificity
# Prerequisites
library(ISLR)
data(Default)
attach(Default)
library(MASS)
# Train model
g <- lda(default~., data=Default)
pred <- predict(g, Default)   # predict on the full Default data frame
# Threshold grid
thre <- seq(0,1,0.001)
Sen <- Spe <- NULL
RES <- matrix(NA, length(thre), 4)
# Classification metrics
colnames(RES) <- c("TP", "TN", "FP", "FN")
for (i in 1:length(thre)) {
  decision <- rep("No", length(default))
  decision[pred$posterior[, 2] >= thre[i]] <- "Yes"
  Sen[i] <- mean(decision[default == "Yes"] == "Yes")    # sensitivity (TPR)
  Spe[i] <- mean(decision[default == "No"] == "No")      # specificity (TNR)
  RES[i, 1] <- sum(decision[default == "Yes"] == "Yes")  # TP
  RES[i, 2] <- sum(decision[default == "No"] == "No")    # TN
  RES[i, 3] <- sum(decision == "Yes") - RES[i, 1]        # FP
  RES[i, 4] <- sum(default == "Yes") - RES[i, 1]         # FN
}
# Visualize ROC curve
plot(1-Spe, Sen, type="b", pch=20, xlab="False positive rate",
col="darkblue", ylab="True positive rate", main="ROC Curve")
abline(0, 1, lty=3, col="gray")
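As a rough cross-check of the AUC discussed in the previous section, a trapezoidal approximation can be computed from the same Sen and Spe grid (a sketch only; the exact value from ROCR appears in Way 3):
# Trapezoidal approximation of the AUC from the manual (Sen, Spe) grid
fpr <- 1 - Spe
auc_approx <- sum(abs(diff(fpr)) * (head(Sen, -1) + tail(Sen, -1)) / 2)
auc_approx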
Way 2 : Drawing the ROC curve from TPR and TNR
# Calculate TPR and TNR from the counts in RES
TPR <- RES[,1] / (RES[,1] + RES[,4])
TNR <- RES[,2] / (RES[,2] + RES[,3])
plot(1-TNR, TPR, type="b", pch=20, xlab="False positive rate",
col="darkblue", ylab="True positive rate", main="ROC Curve")
abline(0, 1, lty=3, col="gray")
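Since TPR and TNR are just sensitivity and specificity recomputed from counts, a quick sanity check against Way 1:
# These should agree exactly with the rates computed directly in Way 1
all.equal(TPR, Sen)
all.equal(TNR, Spe)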
Way3 : ROC curve with ROCR package
library(ROCR)
# Compute ROC curve
label <- factor(default, levels=c("Yes","No"),
labels=c("TRUE","FALSE"))
preds <- prediction(pred$posterior[,2], label)
perf <- performance(preds, "tpr", "fpr" )
# Visualization
plot(perf, lwd=4, col="darkblue")
abline(a=0, b=1, lty=2)
slotNames(perf)
k <- 1:100
# X - axis values : FPR
list(perf@x.name, perf@x.values[[1]][k])
# Y - axis values : TPR
list(perf@y.name, perf@y.values[[1]][k])
# alpha - cutoffs
list(perf@alpha.name, perf@alpha.values[[1]][k])
# Compute AUC
performance(preds, "auc")@y.values
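As a usage example, the cutoff slots inspected above can also be used to pick a threshold that reaches a target sensitivity; the 0.80 target here is purely illustrative:
# Highest cutoff whose TPR already reaches 0.80
# (ROCR stores cutoffs in decreasing order, so the first index crossing the target works)
tpr  <- perf@y.values[[1]]
cuts <- perf@alpha.values[[1]]
cuts[min(which(tpr >= 0.80))]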