

[R] Assessment of the Performance of a Classifier

1. Two types of misclassification errors

We can trade the two error rates off against each other by changing the threshold from 0.5 to some other value in [0, 1].

  • Classify as Yes if \(\hat{Pr}(Default = Yes \mid Balance, Student) \ge \alpha\).
  • \(\alpha\) is the threshold.
  • If \(\alpha\) ↑, FN increases while FP decreases.
  • If \(\alpha\) ↓, FN decreases while FP increases (see the short sketch below).
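
A minimal sketch of this trade-off, assuming the pred object from the LDA fit on the ISLR Default data used in the examples below (the two threshold values are arbitrary):

# Hypothetical illustration: FP and FN counts at two thresholds
for (a in c(0.2, 0.5)) {
  decision <- ifelse(pred$posterior[, "Yes"] >= a, "Yes", "No")
  cat("alpha =", a,
      " FP =", sum(decision == "Yes" & default == "No"),
      " FN =", sum(decision == "No" & default == "Yes"), "\n")
}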

 

2. [Ex] Changes in errors as the threshold varies

# Prerequisites: LDA fit on the ISLR Default data (same setup as in Section 5)
library(ISLR)
library(MASS)
data(Default)
attach(Default)
g <- lda(default ~ ., data = Default)
pred <- predict(g, Default)

thresholds <- seq(0, 1, 0.01)
res <- matrix(NA, length(thresholds), 3)

# Compute overall error, false positive and false negative rates
for (i in 1:length(thresholds)) {
    decision <- rep("No", length(default))
    decision[pred$posterior[, 2] >= thresholds[i]] <- "Yes"
    res[i, 1] <- mean(decision != default)                  # overall error rate
    res[i, 2] <- mean(decision[default == "No"] == "Yes")   # false positive rate
    res[i, 3] <- mean(decision[default == "Yes"] == "No")   # false negative rate
}

# Plot the three error rates for thresholds in [0, 0.5]
k <- 1:51
matplot(thresholds[k], res[k, ], col=c(1, "orange", 4), lty=c(1, 4, 2), type="l",
        xlab="Threshold", ylab="Error Rate", lwd=2)
legend("top", c("Overall Error", "False Positive", "False Negative"),
       col=c(1, "orange", 4), lty=c(1, 4, 2), cex=1.2)

# Index of the threshold minimizing each error rate
apply(res, 2, which.min)

 

 

  • The overall error seems to decrease at every α, because the false positive count shrinks to only 22 cases.
  • However, it actually increases slightly after the turning point, once the growing false negatives start to outweigh the remaining false positives.
  • By setting an adequate threshold we can adapt the model to a specific situation or problem.
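
For instance, the indices returned by which.min() can be mapped back to the corresponding threshold values (continuing with the thresholds grid and res matrix from the code above):

best <- apply(res, 2, which.min)   # row index of the minimum of each column
thresholds[best[1]]                # threshold that minimizes the overall error rate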

 

3. Confusion Matrix
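
A confusion matrix cross-tabulates the predicted classes against the actual ones. A minimal sketch with table(), assuming the pred object from the LDA fit used in the other examples and the default 0.5 threshold:

decision <- rep("No", length(default))
decision[pred$posterior[, 2] >= 0.5] <- "Yes"
table(Predicted = decision, Actual = default)

The diagonal entries are the correct classifications; the off-diagonal entries are the FP and FN counts.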

 

4. ROC curve

 

  • Class-specific performance measures in medicine and biology: sensitivity (TPR) and specificity (TNR).
  • The ROC (Receiver Operating Characteristic) curve plots the TPR against the FPR (= 1 − TNR) as the threshold \(\alpha\) varies.
  • The lower-left point corresponds to (\(\alpha=1\), TPR=0, TNR=1) and the upper-right point to (\(\alpha=0\), TPR=1, TNR=0).
  • The overall performance of a classifier is summarized by the AUC (the area under the ROC curve).
    • The larger the AUC, the better the classifier.

 

5. [Ex] ROC curve

Way 1: Drawing the ROC curve manually

# Prerequisites
library(ISLR)
data(Default)
attach(Default)
library(MASS)

# Train the LDA model
g <- lda(default ~ ., data = Default)
pred <- predict(g, Default)

# Error grids
thre <- seq(0,1,0.001)
Sen <- Spe <- NULL
RES <- matrix(NA, length(thre), 4)

# Classification metrics 
colnames(RES) <- c("TP", "TN", "FP", "FN")
for (i in 1:length(thre)) {
  decision <- rep("No", length(default))
  decision[pred$posterior[,2] >= thre[i]] <- "Yes"
  Sen[i] <- mean(decision[default=="Yes"] == "Yes")
  Spe[i] <- mean(decision[default=="No"] == "No")
  RES[i,1] <- sum(decision[default=="Yes"] == "Yes")
  RES[i,2] <- sum(decision[default=="No"] == "No")
  RES[i,3] <- sum(decision=="Yes") - RES[i,1]
  RES[i,4] <- sum(default=="Yes") - RES[i,1]
}

# Visualize the ROC curve
plot(1-Spe, Sen, type="b", pch=20, xlab="False positive rate",
     col="darkblue", ylab="True positive rate", main="ROC Curve")
abline(0, 1, lty=3, col="gray")
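
As a quick usage example (the 0.5 cutoff is chosen arbitrarily), the sensitivity and specificity at a particular threshold can be read off the grids computed above:

i <- which.min(abs(thre - 0.5))                 # grid position closest to the 0.5 cutoff
c(Sensitivity = Sen[i], Specificity = Spe[i])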

 

 

Way 2: Drawing the ROC curve from TPR and TNR

# Way 2: calculate TPR and TNR from the confusion counts
TPR <- RES[,1] / (RES[,1] + RES[,4])   # sensitivity = TP / (TP + FN)
TNR <- RES[,2] / (RES[,2] + RES[,3])   # specificity = TN / (TN + FP)

plot(1 - TNR, TPR, type="b", pch=20, xlab="False positive rate",
     col="darkblue", ylab="True positive rate", main="ROC Curve")
abline(0, 1, lty=3, col="gray")

 

Way 3: ROC curve with the ROCR package

library(ROCR)

# Compute ROC curve
label <- factor(default, levels=c("Yes","No"),
                labels=c("TRUE","FALSE"))
preds <- prediction(pred$posterior[,2], label)
perf <- performance(preds, "tpr", "fpr")

# Visualization 
plot(perf, lwd=4, col="darkblue")
abline(a=0, b=1, lty=2)
slotNames(perf)

k <- 1:100
# X-axis values: FPR
list(perf@x.name, perf@x.values[[1]][k])
# Y-axis values: TPR
list(perf@y.name, perf@y.values[[1]][k])
# Alpha cutoffs
list(perf@alpha.name, perf@alpha.values[[1]][k])
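
As a small usage example (the 5% target FPR is arbitrary), these slots can be combined to look up the cutoff whose false positive rate is closest to a desired value:

fpr <- perf@x.values[[1]]
cutoffs <- perf@alpha.values[[1]]
cutoffs[which.min(abs(fpr - 0.05))]   # cutoff giving an FPR closest to 5%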

# Compute AUC
performance(preds, "auc")@y.values
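
As a sanity check on the ROCR value, the AUC can also be approximated with the trapezoidal rule from the TPR and TNR vectors of Way 2 (the rev() calls just sort the curve by increasing FPR):

fpr <- rev(1 - TNR)                                    # FPR in increasing order
tpr <- rev(TPR)
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)   # trapezoidal approximation of the AUC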