Data Science/R
[R] Linear Model
See_the_forest
2022. 10. 5. 16:47
1. OLS (Ordinary Least Squares) model
- The linear regression model: \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon\)
- OLS
- Ordinary least squares (OLS) is a linear least squares method for estimating the unknown parameters in a linear regression model.
- The OLS estimator is unbiased:
- \(E(\hat{\beta}^{OLS}) = \beta\)
- By the Gauss-Markov theorem, \(Var(\hat{\beta}^{OLS})\) is the smallest among all linear unbiased estimators.
- Problems in multiple linear regression
- OLS cannot be computed when n < p (high-dimensional data), because \(X^TX\) is singular.
- The variance of the OLS estimates grows large as p approaches n.
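The OLS estimate has a closed form, \(\hat{\beta} = (X^TX)^{-1}X^Ty\). A minimal sketch (variable names are illustrative) computing it from the normal equations and checking it against lm():

```r
set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
beta <- c(1, rep(0, p - 1))           # true coefficients
y <- x %*% beta + rnorm(n)

# closed-form OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
X <- cbind(1, x)                      # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# agrees with lm() up to numerical error
max(abs(beta_hat - coef(lm(y ~ x))))
```

This also shows why n < p is fatal: with more columns than rows, \(X^TX\) is singular and solve() fails.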
2. Unbiased estimators of OLS model
set.seed(123)
n <- 100
# pp : candidate numbers of predictors
pp <- c(10, 50, 80, 95, 97, 98, 99)
B <- matrix(0, 100, length(pp))
for (i in 1:100) {
  for (j in 1:length(pp)) {
    # true coefficients: beta_1 = 1, all others 0
    beta <- rep(0, pp[j])
    beta[1] <- 1
    x <- matrix(rnorm(n*pp[j]), n, pp[j])
    # since only beta_1 is nonzero, x %*% beta equals x[, 1]
    y <- x %*% beta + rnorm(n)
    g <- lm(y ~ x)
    # store the estimate of beta_1 (coef[1] is the intercept)
    B[i, j] <- g$coef[2]
  }
}
# Visualization of E(beta_{ols})
boxplot(B, col="orange", boxwex=0.6, ylab="Coefficient estimates",
names=pp, xlab="The number of predictors", ylim=c(-5,5))
abline(h=1, col=2, lty=2, lwd=2)
apply(B, 2, mean)  # close to 1 for every p (unbiasedness)
apply(B, 2, var)   # grows as p approaches n
- The mean of \(\hat{\beta}^{OLS}\) is approximately 1 for every p, confirming that the estimator is unbiased.
- As the number of predictors p increases toward n, the variance of the estimates grows sharply.
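The simulation stops at p = 99 because, as noted above, OLS breaks down once p ≥ n. A small sketch of that failure: the design matrix becomes rank-deficient, so lm() returns NA for the coefficients it cannot estimate.

```r
set.seed(1)
n <- 100
x <- matrix(rnorm(n * (n + 5)), n, n + 5)   # p = 105 > n = 100
y <- x[, 1] + rnorm(n)
g <- lm(y ~ x)
# the model matrix (intercept + 105 predictors) has rank n = 100,
# so 106 - 100 = 6 coefficients come back as NA
sum(is.na(coef(g)))
```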