1. OLS (Ordinary Least Squares) model
- The linear regression model: \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon\)
- OLS
- Ordinary least squares (OLS) is a linear least-squares method for estimating the unknown parameters of a linear regression model.
- The OLS estimators of the parameters are unbiased:
- \(E(\hat{\beta}^{OLS}) = \beta\)
- Among linear unbiased estimators, \(Var(\hat{\beta}^{OLS})\) is the smallest (Gauss–Markov theorem).
- Problems in multiple linear regression
- OLS cannot be computed when n < p (high-dimensional data), because \(X^TX\) is singular and has no inverse (see the sketch after this list).
- OLS estimates have relatively large variance, especially when p is close to n.
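As a reminder of why OLS breaks down when n < p, the closed-form estimator is \(\hat{\beta}^{OLS} = (X^TX)^{-1}X^Ty\), which requires \(X^TX\) to be invertible. Below is a minimal R sketch of this point; the matrix sizes (`p_small`, `p_large`) are illustrative assumptions, not values from the post.

```r
set.seed(1)

n <- 100                      # number of observations
p_small <- 5                  # n > p : OLS is well defined
p_large <- 150                # n < p : X'X is singular

X <- matrix(rnorm(n * p_small), n, p_small)
y <- X[, 1] + rnorm(n)

# Closed-form OLS estimate: (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# With n < p the cross-product matrix is rank-deficient, so the same
# computation fails (and lm() would return NA for some coefficients).
X_big <- matrix(rnorm(n * p_large), n, p_large)
qr(t(X_big) %*% X_big)$rank   # rank is at most n (= 100) < p (= 150)
```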
2. Unbiasedness of the OLS estimators
```r
set.seed(123)
n <- 100
# pp : the numbers of predictors to try
pp <- c(10, 50, 80, 95, 97, 98, 99)
B <- matrix(0, 100, length(pp))

for (i in 1:100) {
  for (j in 1:length(pp)) {
    beta <- rep(0, pp[j])
    # only the first coefficient is 1; all other coefficients are 0
    beta[1] <- 1
    x <- matrix(rnorm(n * pp[j]), n, pp[j])
    # x %*% beta is the same as x[, 1] because only beta[1] is nonzero
    y <- x %*% beta + rnorm(n)
    g <- lm(y ~ x)
    # g$coef[1] is the intercept, so g$coef[2] is the estimate of beta_1
    B[i, j] <- g$coef[2]
  }
}

# Visualization of the sampling distribution of beta_hat_1 for each p
boxplot(B, col = "orange", boxwex = 0.6, ylab = "Coefficient estimates",
        names = pp, xlab = "The number of predictors", ylim = c(-5, 5))
abline(h = 1, col = 2, lty = 2, lwd = 2)

# Mean and variance of the estimates for each value of p
apply(B, 2, mean)
apply(B, 2, var)
```
- The mean of \(\hat{\beta}_1^{OLS}\) is approximately 1 for every value of p, which confirms that the OLS estimator is unbiased.
- As the number of predictors p increases toward n, the variance of the estimates grows sharply (see the sketch below).
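To see the second point numerically, the empirical variance of \(\hat{\beta}_1\) can be plotted against p using the matrix `B` from the simulation above. This is a small sketch that assumes `B` and `pp` are still in the workspace; the plot labels are my own.

```r
# Empirical variance of the estimated coefficient for each value of p
emp_var <- apply(B, 2, var)

plot(pp, emp_var, type = "b", pch = 19,
     xlab = "Number of predictors (p)",
     ylab = "Empirical variance of the estimate",
     main = "Variance of the OLS estimate grows as p approaches n")
```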