1. What is Boosting?
- Boosting is a general approach that can be applied to many statistical learning methods for regression or classification.
- Boosting grows trees sequentially: each new tree is fit using information from the trees grown so far.
- The boosting approach learns slowly.
- Given the current model, we fit a decision tree to the residuals of that model, and then add the new tree into the fitted function in order to update the residuals.
- Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter \(d\) in the algorithm.
- Well-known boosting algorithms include:
  - XGBoost (available in R through the xgboost package)
  - AdaBoost
1.1 Boosting algorithm
- Set \(\hat{f}(x) = 0\) and \(r_i = y_i\) for all \(i\) in the training set.
- For \(b = 1, 2, \dots, B\), repeat:
- Fit a tree \(\hat{f}^b\) with \(d\) splits to the training data \((X, r)\).
- Update \(\hat{f}\) by adding in a shrunken version of the new tree: \(\hat{f}(x) \leftarrow \hat{f}(x) + \lambda\hat{f}^b(x)\)
- Update the residuals: \(r_i \leftarrow r_i - \lambda\hat{f}^b(x_i)\)
- Output the boosted model: \(\hat{f}(x) = \sum_{b=1}^B \lambda \hat{f}^b(x)\).
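The algorithm above translates almost directly into R. The following is a minimal sketch for regression using small rpart trees; rpart.control(maxdepth=d) stands in for "a tree with \(d\) splits", so this is illustrative rather than an exact implementation.
# Minimal boosting-for-regression sketch (illustrative)
library(rpart)
boost_regression <- function(X, y, B=100, lambda=0.01, d=1) {
  r <- y                           # set r_i = y_i
  trees <- vector("list", B)
  for (b in 1:B) {
    df <- data.frame(X, r=r)
    # Fit a small tree f^b to the training data (X, r)
    trees[[b]] <- rpart(r ~ ., data=df, control=rpart.control(maxdepth=d))
    # Update the residuals: r_i <- r_i - lambda * f^b(x_i)
    r <- r - lambda * predict(trees[[b]], df)
  }
  trees
}
# The boosted model is the sum of the shrunken trees
predict_boost <- function(trees, X, lambda=0.01) {
  rowSums(sapply(trees, function(f) lambda * predict(f, data.frame(X))))
}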
1.2 Tuning parameters for boosting
- The number of trees \(B\): Boosting can overfit if \(B\) is too large, although this overfitting tends to occur slowly if at all.
- The shrinkage parameter \(\lambda\): Controls the rate at which boosting learns. Typical values are 0.01 or 0.001.
- The number of splits \(d\) in each tree: Controls the complexity of the boosted ensemble. Often \(d = 1\) (a stump) works well.
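In gbm(), these three parameters map directly onto the arguments n.trees (\(B\)), shrinkage (\(\lambda\)), and interaction.depth (\(d\)). A sketch with illustrative values, using the Heart0 data constructed in Section 2 below:
boost <- gbm(AHD~., data=Heart0[train, ], distribution="bernoulli",
             n.trees=1000,          # B: the number of trees
             shrinkage=0.01,        # lambda (gbm's default is 0.1)
             interaction.depth=2)   # d: the number of splits per tree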
2. Building a Boosted Decision Tree for a Classification Problem
Step 1: Prerequisites
# Prerequisites: load the library and the data, then preprocess
library(gbm)
# The Heart data are assumed to be the ISLR Heart.csv file;
# row.names=1 drops the csv's index column (an assumption about the file layout)
Heart <- read.csv("Heart.csv", row.names=1, stringsAsFactors=TRUE)
# Train-test split
set.seed(123)
train <- sample(1:nrow(Heart), nrow(Heart)/2)
test <- setdiff(1:nrow(Heart), train)
# Recode the factor response AHD as 0/1, as required by distribution="bernoulli"
Heart0 <- Heart
Heart0[,"AHD"] <- as.numeric(Heart$AHD)-1
Step 2: Training models with different tuning parameters
# boosting (d=1)
boost.d1 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
                distribution="bernoulli", interaction.depth=1)
# Relative influence of each predictor (variable importance)
summary(boost.d1)
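# Partial dependence of AHD on a single predictor; "Ca" is used here
# purely for illustration (any predictor column of Heart0 works)
plot(boost.d1, i.var="Ca", n.trees=1000, type="response")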
# Calculate the misclassification error;
# the n.trees argument must be supplied when predicting from a gbm model
yhat.d1 <- predict(boost.d1, newdata=Heart0[test, ], type="response", n.trees=1000)
phat.d1 <- rep(0, length(yhat.d1))
phat.d1[yhat.d1 > 0.5] <- 1
mean(phat.d1!=Heart0[test, "AHD"])
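# Confusion matrix for the d=1 model (rows: predicted, columns: actual)
table(predicted=phat.d1, actual=Heart0[test, "AHD"])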
# boosting (d=2)
# Training a model with interaction.depth = 2
boost.d2 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=2)
# Make predictions
yhat.d2 <- predict(boost.d2, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d2 <- rep(0, length(yhat.d2))
phat.d2[yhat.d2 > 0.5] <- 1
mean(phat.d2!=Heart0[test, "AHD"])
# boosting (d=3)
# Training a model with interaction.depth = 3
boost.d3 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=3)
# Make predictions
yhat.d3 <- predict(boost.d3, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d3 <- rep(0, length(yhat.d3))
phat.d3[yhat.d3 > 0.5] <- 1
mean(phat.d3!=Heart0[test, "AHD"])
# boosting (d=4)
# Training a model with interaction.depth = 4
boost.d4 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=4)
# Make predictions
yhat.d4 <- predict(boost.d4, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d4 <- rep(0, length(yhat.d4))
phat.d4[yhat.d4 > 0.5] <- 1
mean(phat.d4!=Heart0[test, "AHD"])
| gbm(interaction.depth=1) | gbm(interaction.depth=2) | gbm(interaction.depth=3) | gbm(interaction.depth=4) |
|---|---|---|---|
| 0.2013423 | 0.2013423 | 0.2281879 | 0.1879105 |

On this split, interaction.depth = 4 gives the lowest test error rate, but a single train/test split is noisy; Section 3 examines the effect of both tuning parameters more systematically.
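The four code blocks above differ only in interaction.depth, so the comparison can be collapsed into a short helper function. This sketch reuses the objects defined above; without re-seeding before each fit, the error rates will not match the table exactly:
boost_err <- function(d) {
  fit <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
             distribution="bernoulli", interaction.depth=d)
  p <- predict(fit, newdata=Heart0[test, ], type="response", n.trees=1000)
  mean(as.numeric(p > 0.5) != Heart0[test, "AHD"])
}
sapply(1:4, boost_err)   # test error rates for d = 1, 2, 3, 4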
3. Boosting Decision Tree with Hyperparameter Tuning (n.trees, interaction.depth)
# Simulation: boosting with d = 1, 2, 3 and 4
# Number of trees: 1 to 3000
# Pre-allocate the test-error matrix (rows: n.trees, columns: d)
set.seed(1111)
Err <- matrix(0, 3000, 4)
# Train one model per interaction.depth (1 to 4)
for (k in 1:4) {
boost <- gbm(AHD~., data=Heart0[train, ], n.trees=3000,
distribution="bernoulli", interaction.depth=k)
for (i in 1:3000) {
# Predict using only the first i trees
yhat <- predict(boost, newdata=Heart0[test, ],
type="response", n.trees=i)
phat <- rep(0, length(yhat))
phat[yhat > 0.5] <- 1
Err[i,k] <- mean(phat!=Heart0[test, "AHD"])
}
}
# Visualize results
labels <- c("d = 1", "d = 2", "d = 3", "d = 4")
matplot(Err, type="l", xlab="Number of Trees", lty=2, col=1:4,
ylab="Classification Error Rate")
legend("topright", legend=labels, col=1:4, lty=1)
# Summary statistics of the test error rates for each depth
colnames(Err) <- labels
apply(Err, 2, summary)
# The same summaries, excluding the first 100 trees where the model is still underfit
apply(Err[-c(1:100),], 2, summary)
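Instead of scanning the test error by hand, gbm can also estimate a good number of trees by internal cross-validation via its cv.folds argument together with gbm.perf(). A sketch (cv.folds=5 is an illustrative choice):
set.seed(1111)
boost.cv <- gbm(AHD~., data=Heart0[train, ], distribution="bernoulli",
                n.trees=3000, interaction.depth=2, cv.folds=5)
best.iter <- gbm.perf(boost.cv, method="cv")   # CV estimate of the best n.trees
yhat.cv <- predict(boost.cv, newdata=Heart0[test, ],
                   type="response", n.trees=best.iter)
mean(as.numeric(yhat.cv > 0.5) != Heart0[test, "AHD"])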
- Because boosting builds the model up gradually, predictions based on only the first few trees are inaccurate; the error rate drops as more trees are added, which is why the second summary excludes the first 100 iterations.