1. Tree-Based Methods
- Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions.
- The set of splitting rules used to segment the predictor space can be summarized in a tree.
- Tree-based methods are simple and useful for interpretation.
- Bagging and Random Forests grow multiple trees which are combined to yield a single consensus prediction.
- Combining a large number of trees can result in dramatic improvements in prediction accuracy, at the expense of some loss in interpretability.
2. Decision Tree
- At a given internal node, the label (of the form $X_j < t_k$) indicates the left-hand branch emanating from that split, and the right-hand branch corresponds to $X_j \geq t_k$.
- The number in each leaf is the mean of the response for the observations that fall there.
2.1 [Ex] Decision Trees using tree package
# Import library and dataset
library(ISLR)
library(tree)
data(Hitters)
str(Hitters)
# Missing data
summary(Hitters$Salary)
miss <- is.na(Hitters$Salary)
# Fit a regression tree
g <- tree(log(Salary) ~ Years + Hits, data = Hitters, subset = !miss)
g
summary(g)
# Draw a tree
plot(g)
text(g, pretty=0)

tree(formula, data, weights, subset, x, y)
A tree is grown by binary recursive partitioning, using the response in the specified formula and choosing splits from the terms of the right-hand side.
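Once g is fitted, predictions for new observations are obtained by following the splits down to a leaf and returning that leaf's mean. A minimal sketch with predict(); the new player's Years and Hits values below are made up for illustration.
# Predict log(Salary) for a hypothetical player
new_player <- data.frame(Years = 6, Hits = 120)
predict(g, newdata = new_player)
# Training-set fit: mean squared error on the log scale
pred_train <- predict(g)
mean((log(Hitters$Salary[!miss]) - pred_train)^2)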
# Prune a tree
g2 <- prune.tree(g, best=3)
plot(g2)
text(g2, pretty=0)

prune.tree(tree, k=NULL, best=NULL, newdata)
Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits.
- tree : fitted model object of class 'tree'.
- best : integer requesting the size of a specific subtree in the cost-complexity sequence to be returned.
2.2 [Ex] Decision Tree using rpart package
# Another R package for tree
library(rpart)
library(rpart.plot)
# Fit a regression tree
g3 <- rpart(log(Salary) ~ Years + Hits, data = Hitters, subset = !miss)
g3
summary(g3)
# Draw a fancy tree
prp(g3, branch.lwd=4, branch.col="darkgreen", extra=101)
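rpart controls pruning through its complexity parameter cp rather than prune.tree()'s best or k. A sketch of one common workflow, using the cross-validation results stored in the fitted object's cptable; picking the cp with the smallest xerror is one convention, not the only choice.
# Cross-validated error for each candidate value of cp
printcp(g3)
plotcp(g3)
# Prune back to the cp with the smallest cross-validated error
best_cp <- g3$cptable[which.min(g3$cptable[, "xerror"]), "CP"]
g4 <- prune(g3, cp = best_cp)
prp(g4, branch.lwd = 4, branch.col = "darkgreen", extra = 101)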

3. Details of the Tree-building Process
- Decision trees are typically drawn upside down, in the sense that the leaves are at the bottom of the tree.
- The points along the tree where the predictor space is split are referred to as internal nodes.
- The predictor space : the set of possible values of the predictors $X_1, X_2, \ldots, X_p$.
- Divide the predictor space into $J$ distinct, non-overlapping regions $R_1, R_2, \ldots, R_J$.
- For every observation that falls into the region $R_j$, make the same prediction, which is simply the mean of the response values for the training observations in $R_j$.
- For simplicity, choose to divide the predictor space into high-dimensional rectangles, or boxes.
- The goal of the tree-building process is to find boxes $R_1, \ldots, R_J$ that minimize $\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \hat{y}_{R_j})^2$, where $\hat{y}_{R_j}$ is the mean response of the training observations in the $j$th box (a small numerical check follows this list).
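To see that each leaf predicts the mean response of its region, the fitted values from g (Section 2.1) can be compared with group means computed by hand; a quick sketch.
# Terminal-node assignment of each training observation and its fitted value
leaf <- g$where
fit  <- predict(g)
# Group means of log(Salary) by leaf reproduce the fitted values
tapply(log(Hitters$Salary[!miss]), leaf, mean)
tapply(fit, leaf, unique)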
3.1 Recursive Binary Splitting
- Creating a binary decision tree is actually a process of dividing up the input space.
- A greedy approach called recursive binary splitting is used to divide the space.
- It is greedy because, at each step of the tree-building process, the best split is made at that particular step rather than picking a split that would lead to a better tree in a future step.
- Select the predictor $X_j$ and the cutpoint $s$ such that splitting the predictor space into the regions $\{X \mid X_j < s\}$ and $\{X \mid X_j \geq s\}$ leads to the greatest possible reduction in RSS (a brute-force sketch of this search follows this list).
- Repeat the process, looking for the best predictor and best cutpoint in order to split the data further so as to minimize the RSS within each of the resulting regions.
- The process continues until a stopping criterion is reached, for example until no region contains more than five observations.
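To make the greedy criterion concrete, the first split can be found by brute force: for every predictor and every candidate cutpoint, compute the RSS of the two resulting half-planes and keep the minimizer. This is only an illustrative sketch, not how tree() is implemented internally; the selected variable and partition should match the top split of g, though the printed cutpoint may differ since tree() reports midpoints between observed values.
# Brute-force search for the best first split on Years and Hits
y <- log(Hitters$Salary[!miss])
X <- Hitters[!miss, c("Years", "Hits")]
split_rss <- function(x, s, y) {
  left <- x < s
  sum((y[left] - mean(y[left]))^2) + sum((y[!left] - mean(y[!left]))^2)
}
best <- NULL
for (j in names(X)) {
  for (s in sort(unique(X[[j]]))[-1]) {   # candidate cutpoints (skip the minimum)
    rss <- split_rss(X[[j]], s, y)
    if (is.null(best) || rss < best$rss) best <- list(var = j, cut = s, rss = rss)
  }
}
best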
3.2 Pruning a Tree
- A tree with many splits may produce good predictions on the training set, but is likely to overfit the data, leading to poor test set performance.
- A smaller tree with fewer splits might lead to lower variance and better interpretation at the cost of a little bias.
- Grow a very large tree $T_0$, and then prune it back in order to obtain a subtree.
- Cost complexity pruning (weakest link pruning) : consider subtrees indexed by a nonnegative tuning parameter $\alpha$; for each value of $\alpha$ there is a subtree $T \subset T_0$ that minimizes $\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|$, where $|T|$ is the number of terminal nodes of $T$.
- $\alpha$ controls a trade-off between the subtree's complexity and its fit to the training data.
- Select an optimal value $\hat{\alpha}$ using cross-validation (a sketch of the cost-complexity sequence in R follows this list).
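In the tree package, the tuning parameter $\alpha$ corresponds to the k argument of prune.tree(). Calling prune.tree() without best or k returns the whole cost-complexity sequence; a short sketch, with the value of k below chosen arbitrarily for illustration.
# Full cost-complexity sequence: subtree sizes, deviances, and k (alpha) values
seq_g <- prune.tree(g)
seq_g$size   # number of terminal nodes of each subtree
seq_g$dev    # corresponding total deviance (RSS for a regression tree)
seq_g$k      # alpha at which each subtree becomes optimal
# Prune with an explicit alpha instead of a target size
g_alpha <- prune.tree(g, k = 2)   # k = 2 chosen arbitrarily for illustration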
3.3 Summary : Tree Algorithm
- Step 1 : Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations (stopping criterion).
- Step 2 : Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of $\alpha$. We can use the prune.tree(tree, best=K) function.
- Step 3 : Use K-fold cross-validation to choose $\alpha$. For each fold $k = 1, \ldots, K$ :
  - Repeat Steps 1 and 2 on the $\frac{K-1}{K}$ fraction of the training data, excluding the $k$th fold.
  - Evaluate the mean squared prediction error on the data in the left-out $k$th fold, as a function of $\alpha$.
  - Average the results over the folds, and pick $\alpha$ to minimize the average error.
- Step 4 : Return the subtree from Step 2 that corresponds to the chosen value of $\alpha$ (a worked cv.tree() sketch follows below).
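The whole procedure, including the cross-validation step, is available through cv.tree() in the tree package. A sketch using the fit g from Section 2.1; the seed and the minimum-deviance rule are arbitrary choices for illustration.
# 10-fold cross-validation over the cost-complexity sequence
set.seed(1)
cv_g <- cv.tree(g, FUN = prune.tree, K = 10)
plot(cv_g$size, cv_g$dev, type = "b",
     xlab = "Tree size (terminal nodes)", ylab = "CV deviance")
# Pick the size with the smallest cross-validated deviance and prune to it
best_size <- cv_g$size[which.min(cv_g$dev)]
g_final <- prune.tree(g, best = best_size)
plot(g_final)
text(g_final, pretty = 0)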