

[R] Tree-Based Methods : Boosting

1. What is Boosting?

  • Boosting is a general approach that can be applied to many statistical learning methods for regression or classification.
  • Boosting grows trees sequentially: each tree is fit using information from the previously grown trees.
  • The boosting approach learns slowly.
  • Given the current model, we fit a decision tree to the residuals from the model. We then add this new decision tree into the fitted function in order to update the residuals.
  • Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter \(d\) in the algorithm.
  • Popular boosting algorithms include:
    • XGBoost
    • AdaBoost

 

1.1 Boosting algorithm

  1. Set \(\hat{f}(x) = 0\) and \(r_i = y_i\) for all \(i\) in the training set.
  2. For \(b = 1, 2, ..., B\), repeat :
    1. Fit a tree \(\hat{f}^b\) with \(d\) splits to the training data \((X, r)\).
    2. Update \(\hat{f}\) by adding in a shrunken version of the new tree : \(\hat{f}(x) \leftarrow \hat{f}(x) + \lambda\hat{f}^b(x)\)
    3. Update the residuals, \(r_i \leftarrow r_i - \lambda\hat{f}^b(x_i)\)
  3. Output the boosted model : \(\hat{f}(x) = \sum_{b=1}^B \lambda \hat{f}^b(x)\).
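The loop above can be written almost verbatim in R. Below is a minimal sketch for a numeric (regression) response using small rpart trees fit to the residuals; the function boost_sketch, the default values of B, lambda and d, and the use of maxdepth as a stand-in for "d splits" are illustrative assumptions, not the gbm() workflow used later in this post.

# Minimal sketch of the boosting loop (illustrative; assumes rpart is installed)
library(rpart)

boost_sketch <- function(X, y, B = 100, lambda = 0.01, d = 1) {
    r <- y                                   # step 1: residuals start as the raw response
    trees <- vector("list", B)
    for (b in 1:B) {
        # step 2a: fit a small tree to the current residuals
        # (maxdepth is used here as a rough stand-in for "d splits")
        fit <- rpart(r ~ ., data = data.frame(X, r = r),
                     control = rpart.control(maxdepth = d, cp = 0))
        trees[[b]] <- fit
        # steps 2b-2c: add a shrunken copy of the tree and update the residuals
        r <- r - lambda * predict(fit, newdata = data.frame(X))
    }
    # step 3: the boosted prediction is the shrunken sum over all B trees
    function(newX) {
        rowSums(sapply(trees, function(t) lambda * predict(t, newdata = data.frame(newX))))
    }
}

# Example usage with made-up data:
# set.seed(1); X <- data.frame(x = runif(200)); y <- sin(2*pi*X$x) + rnorm(200, sd = 0.1)
# f_hat <- boost_sketch(X, y, B = 500, lambda = 0.05, d = 1)
# plot(X$x, y); points(X$x, f_hat(X), col = "red")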

 

1.2 Tuning parameters for boosting 

  • The number of trees \(B\) : Boosting can overfit if \(B\) is too large, although this overfitting tends to occur slowly if at all.
  • The shrinkage parameter \(\lambda\) : Controls the rate at which boosting learns. Typical values are 0.01 or 0.001.
  • The number of splits \(d\) in each tree : Controls the complexity of the boosted ensemble.
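These three parameters map directly onto gbm() arguments: n.trees is \(B\), shrinkage is \(\lambda\), and interaction.depth is \(d\). A rough sketch of choosing \(B\) with gbm's built-in cross-validation (cv.folds and gbm.perf) follows; it assumes the Heart0 data frame and train index constructed in Step 1 of the next section, and the specific values are illustrative, not recommendations.

# Illustrative mapping of the tuning parameters onto gbm() arguments
library(gbm)
boost.cv <- gbm(AHD ~ ., data = Heart0[train, ],
                distribution = "bernoulli",
                n.trees = 3000,           # B : number of trees
                shrinkage = 0.01,         # lambda : shrinkage / learning rate
                interaction.depth = 2,    # d : splits per tree
                cv.folds = 5)             # built-in cross-validation
gbm.perf(boost.cv, method = "cv")         # CV estimate of the best number of trees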

 

2. Building a Boosting Decision Tree for a Classification Problem

Step 1 : Prerequisites

# Prerequisites : import library, load dataset, preprocessing
library(gbm)

# Load the Heart data (assumed to be Heart.csv from the ISLR companion site,
# already placed in the working directory)
Heart <- read.csv("Heart.csv", stringsAsFactors = TRUE)

# Train-Test split
set.seed(123)
train <- sample(1:nrow(Heart), nrow(Heart)/2)
test <- setdiff(1:nrow(Heart), train)

# Create a (0,1) response: gbm with distribution="bernoulli" expects a numeric 0/1 outcome
Heart0 <- Heart
Heart0[,"AHD"] <- as.numeric(Heart$AHD)-1

 

Step 2 : Training models with different tuning parameters

# Boosting (d=1)
boost.d1 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
                distribution="bernoulli", interaction.depth=1)

# Relative influence of each predictor
summary(boost.d1)

# Calculate the misclassification error
# predict() for gbm objects requires n.trees to be set explicitly
yhat.d1 <- predict(boost.d1, newdata=Heart0[test, ], type="response", n.trees=1000)
phat.d1 <- rep(0, length(yhat.d1))
phat.d1[yhat.d1 > 0.5] <- 1
mean(phat.d1!=Heart0[test, "AHD"])
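A confusion table (an optional addition, not part of the original output) shows where the d = 1 model's test errors occur:

# Optional: confusion table for the d = 1 model on the test set
table(Predicted = phat.d1, Actual = Heart0[test, "AHD"])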

# Boosting (d=2)
# Training model with interaction.depth = 2
boost.d2 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=2)

# Make predictions 
yhat.d2 <- predict(boost.d2, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d2 <- rep(0, length(yhat.d2))
phat.d2[yhat.d2 > 0.5] <- 1
mean(phat.d2!=Heart0[test, "AHD"])

# Boosting (d=3)
# Training model with interaction.depth = 3
boost.d3 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=3)

# Make predictions
yhat.d3 <- predict(boost.d3, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d3 <- rep(0, length(yhat.d3))
phat.d3[yhat.d3 > 0.5] <- 1
mean(phat.d3!=Heart0[test, "AHD"])

# Boosting (d=4)
# Training model with interaction.depth = 4
boost.d4 <- gbm(AHD~., data=Heart0[train, ], n.trees=1000,
distribution="bernoulli", interaction.depth=4)

# Make predictions 
yhat.d4 <- predict(boost.d4, newdata=Heart0[test, ],
type="response", n.trees=1000)
phat.d4 <- rep(0, length(yhat.d4))
phat.d4[yhat.d4 > 0.5] <- 1
mean(phat.d4!=Heart0[test, "AHD"])

 

 

Test misclassification error (n.trees = 1000):

gbm(interaction.depth=1) : 0.2013423
gbm(interaction.depth=2) : 0.2013423
gbm(interaction.depth=3) : 0.2281879
gbm(interaction.depth=4) : 0.1879105

 

3. Boosting Decision Tree with Hyperparameter Tuning (n.trees, interaction.depth)

# Simulation: Boosting with d=1, 2, 3 and 4
# The number of trees: 1 to 3000

# Error matrix: rows = number of trees (1 to 3000), columns = interaction.depth (1 to 4)
set.seed(1111)
Err <- matrix(0, 3000, 4)

# Fit one model per interaction.depth, then evaluate it at every number of trees
for (k in 1:4) {
    boost <- gbm(AHD~., data=Heart0[train, ], n.trees=3000,
                 distribution="bernoulli", interaction.depth=k)
    for (i in 1:3000) {
        # Predictions using only the first i trees
        yhat <- predict(boost, newdata=Heart0[test, ],
                        type="response", n.trees=i)
        phat <- rep(0, length(yhat))
        phat[yhat > 0.5] <- 1
        Err[i,k] <- mean(phat!=Heart0[test, "AHD"])
    }
}

# Visualize the results
labels <- c("d = 1", "d = 2", "d = 3", "d = 4")
matplot(Err, type="l", xlab="Number of Trees", lty=2, col=1:4,
        ylab="Classification Error Rate")
legend("topright", legend=labels, col=1:4, lty=2)

# Summary statistics of the error rates (all trees, and excluding the first 100 trees)
colnames(Err) <- labels
apply(Err, 2, summary)
apply(Err[-c(1:100),], 2, summary)
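As a side note, predict() for gbm objects accepts a vector of values for n.trees and returns one column of predictions per value, so the slow inner loop can be avoided. A sketch of that shortcut (results will differ slightly from the loop above because the models are refit):

# Faster variant (sketch): one predict() call per depth instead of 3000
for (k in 1:4) {
    boost <- gbm(AHD~., data=Heart0[train, ], n.trees=3000,
                 distribution="bernoulli", interaction.depth=k)
    # prob is an (n_test x 3000) matrix: one column per number of trees
    prob <- predict(boost, newdata=Heart0[test, ], type="response", n.trees=1:3000)
    Err[, k] <- colMeans((prob > 0.5) != Heart0[test, "AHD"])
}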

 

 

  • Because boosting builds the model sequentially, the ensemble is inaccurate when only a few trees have been added; the error rate drops as more trees are fitted and then levels off.
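To read the bottom of each curve directly, the error matrix from the simulation can be queried for the best number of trees per depth (an optional addition):

# Optional: best number of trees and the corresponding error rate for each depth
best.trees <- apply(Err, 2, which.min)
best.err   <- apply(Err, 2, min)
rbind(best.trees, best.err)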