An ensemble method is an approach that combines many simple building block models in order to obtain a single and potentially very powerful model.
These simple building block models are sometimes known as weak learners.
Methods:
Bagging
Random Forest
Boosting
Bayesian additive regression trees
2. Bagging
2.1 Bootstrap methods
Treating the training set as the population, sample n observations with replacement.
Repeat B times, producing B bootstrap sets.
Calculate the statistic of interest on each sampled set.
Aggregate the statistic across the B bootstrap sets.
# Population of training set
seq(20)
# Bootstrap set (n = 20 draws with replacement)
sort(sample(seq(20), 20))
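Putting the four steps together, here is a minimal sketch that bootstraps a simple statistic (the mean) of the toy population above and aggregates it across B replicates; the choice B <- 1000 is illustrative.
# Bootstrap the mean of the toy population 1..20
pop <- seq(20)
B <- 1000
# Statistic computed on each of B bootstrap sets
boot.means <- replicate(B, mean(sample(pop, length(pop), replace=TRUE)))
# Aggregated statistics: bootstrap estimate of the mean and its standard error
mean(boot.means)
sd(boot.means)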
2.2 Bagging Tree methods
Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method.
Repeat the statistic calculation on every sampled set:
Take repeated samples from the training set.
Generate B different bootstrapped training data sets and fit a tree to each.
For classification trees: for each test observation, record the class predicted by each of the B trees and take a majority vote; the overall prediction is the most commonly occurring class among the B predictions. (For regression trees, the B predictions are averaged instead; see the sketch below.)
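For regression, the aggregation step is an average rather than a vote. A minimal sketch, assuming a data frame df with a numeric response named by the string y, and new observations newdata (all placeholder names):
library(tree)
# Bagging for regression: average the B tree predictions
bag.regress <- function(df, y, newdata, B=500) {
  preds <- matrix(0, nrow(newdata), B)
  for (b in 1:B) {
    # Bootstrap sample of the rows of df
    index <- sample(nrow(df), replace=TRUE)
    fit <- tree(as.formula(paste(y, "~ .")), df[index, ])
    preds[, b] <- predict(fit, newdata)
  }
  rowMeans(preds)  # aggregated (averaged) prediction
}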
3. Building a Bagged Decision Tree
# Separate training and test sets
library(tree)   # classification and regression trees
set.seed(123)
train <- sample(1:nrow(Heart), nrow(Heart)/2)
test <- setdiff(1:nrow(Heart), train)
# Classification error rate for a single tree
heart.tran <- tree(AHD ~ ., data=Heart, subset=train)
heart.pred <- predict(heart.tran, Heart[test, ], type="class")
tree.err <- mean(Heart$AHD[test] != heart.pred)
tree.err
# Bagging
set.seed(12345)
B <- 500
n <- nrow(Heart)
Vote <- rep(0, length(test))
bag.err <- NULL
for (i in 1:B) {
  # Bootstrap training set drawn from the train indices
  index <- sample(train, replace=TRUE)
  heart.tran <- tree(AHD ~ ., Heart[index, ])
  heart.pred <- predict(heart.tran, Heart[test, ], type="class")
  # Accumulate the "Yes" votes across the i trees so far
  Vote[heart.pred == "Yes"] <- Vote[heart.pred == "Yes"] + 1
  preds <- rep("Yes", length(test))
  # Majority rule: decide "No" when fewer than i/2 trees voted "Yes"
  preds[Vote < i/2] <- "No"
  bag.err[i] <- mean(Heart$AHD[test] != preds)
}
# Visualize the bagging error rate against the single-tree error
plot(bag.err, type="l", xlab="Number of Trees", col=1, ylab="Classification Error Rate")
abline(h=tree.err, lty=2, col=2)
legend("topright", c("Single tree", "Bagging"), col=c(2,1), lty=c(2,1))
The misclassification rate converges to roughly 0.23.
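The same bagged classifier can be fit in a single call, since bagging is a random forest with mtry equal to the number of predictors. A sketch assuming the randomForest package and a complete-case Heart (apply na.omit first if Heart contains missing values):
library(randomForest)
set.seed(12345)
p <- ncol(Heart) - 1   # consider all predictors at each split
bag.heart <- randomForest(AHD ~ ., data=Heart, subset=train, mtry=p, ntree=500)
bag.pred <- predict(bag.heart, Heart[test, ])
mean(Heart$AHD[test] != bag.pred)   # comparable to bag.err[500] above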
4. Out-of-Bag Error Estimation
The key to bagging is that trees are repeatedly fit to bootstrapped subsets of the observations.
One can show that on average, each bagged tree makes use of around two-thirds of the observations.
The remaining one-third of the observations not used to fit a given bagged tree are referred to as out-of-bag (OOB) observations.
We can predict the response for the ith observation using each of the trees in which that observation was OOB. This will yield around B/3 predictions for the ith observation on average.
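The two-thirds figure follows because a given observation is left out of one bootstrap sample with probability (1 - 1/n)^n, which approaches e^(-1) ≈ 0.368 as n grows; a quick check:
# Probability that an observation is out-of-bag for one bootstrap sample
n <- nrow(Heart)
(1 - 1/n)^n   # close to exp(-1), about 0.368
# Empirical check with a single bootstrap sample
mean(!(1:n %in% sample(n, replace=TRUE)))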
4.1 [Ex] Average misclassification rate of a single tree
# Average misclassification rate of a single tree
# over 50 replications
set.seed(12345)
K <- 50
Err <- NULL
for (i in 1:K) {
  train <- sample(1:nrow(Heart), nrow(Heart)*2/3)
  test <- setdiff(1:nrow(Heart), train)
  heart.tran <- tree(AHD ~ ., data=Heart, subset=train)
  heart.pred <- predict(heart.tran, Heart[test, ], type="class")
  Err[i] <- mean(Heart$AHD[test] != heart.pred)
}
summary(Err)
Tree.Err <- mean(Err)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1616  0.2121  0.2424  0.2473  0.2727  0.3333
Over 50 replications, the mean misclassification rate is 0.2473 (median 0.2424).
4.2 [Ex] Out-of-bag misclassification rate
# OOB
set.seed(1234)
Valid <- Vote <- Mis <- rep(0, nrow(Heart))
OOB.err <- NULL
for (i in 1:B) {
  # Bootstrap sample of row indices from Heart
  index <- sample(1:nrow(Heart), replace=TRUE)
  # OOB observations: rows not drawn into the bootstrap sample
  test <- setdiff(1:nrow(Heart), unique(index))
  # Count how many times each observation has been OOB so far
  Valid[test] <- Valid[test] + 1
  # Train a tree on the bootstrap sample
  heart.tran <- tree(AHD ~ ., Heart[index, ])
  # Predict the OOB observations
  heart.pred <- predict(heart.tran, Heart[test, ], type="class")
  # Accumulate "Yes" votes for the OOB observations
  Vote[test] <- Vote[test] + (heart.pred == "Yes")
  preds <- rep("Yes", length(test))
  # Majority rule over the trees for which the observation was OOB
  preds[Vote[test]/Valid[test] < 0.5] <- "No"
  # Mark misclassified (-1) and correctly classified (+1) OOB cases
  wh <- which(Heart$AHD[test] != preds)
  Mis[test[wh]] <- -1
  Mis[test[-wh]] <- 1
  OOB.err[i] <- sum(Mis == -1)/sum(Mis != 0)
}
# Summary statistics of the OOB error estimates
summary(OOB.err)
# Drop the first 100 trees, where the estimate is still unstable
summary(OOB.err[-c(1:100)])
# Visualize results
plot(OOB.err, type="l", xlab="Number of Trees", col=1,
ylab="Classification Error Rate", ylim=c(0.1,0.4))
abline(h=Tree.Err, lty=2, col=2)
legend("topright", c("Single tree", "OOB"), col=c(2,1), lty=c(2,1))