Data Science/R
[R] Linear Model
See_the_forest
2022. 10. 5. 16:47
1. OLS (Ordinary Least Squares) model
- The linear regression model: \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon\)
- OLS
- Ordinary least squares (OLS) is a linear least squares method for estimating the unknown parameters in a linear regression model.
- The OLS estimator is unbiased:
- \(E(\hat{\beta}^{OLS}) = \beta\)
- By the Gauss-Markov theorem, \(Var(\hat{\beta}^{OLS})\) is the smallest among all linear unbiased estimators.
- Problems in multiple linear regression
- OLS cannot be computed when n < p (high-dimensional data), because \(X^TX\) is singular.
- The variance of the OLS estimates grows large as p approaches n.
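The OLS estimate has a closed form, \(\hat{\beta} = (X^TX)^{-1}X^Ty\). A minimal sketch (variable names are illustrative) computing it from the normal equations and checking it against lm():

```r
set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
beta <- c(1, rep(0, p - 1))           # true coefficients
y <- x %*% beta + rnorm(n)

# closed-form OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
X <- cbind(1, x)                      # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# agrees with lm() up to numerical error
max(abs(beta_hat - coef(lm(y ~ x))))
```

This also shows why n < p is fatal: with more columns than rows, \(X^TX\) is singular and solve() fails.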
2. Unbiased estimators of OLS model
set.seed(123)
n <- 100
# pp : candidate numbers of predictors
pp <- c(10, 50, 80, 95, 97, 98, 99)
B <- matrix(0, 100, length(pp))
for (i in 1:100) {
  for (j in 1:length(pp)) {
    # true coefficients: beta_1 = 1, all others 0
    beta <- rep(0, pp[j])
    beta[1] <- 1
    x <- matrix(rnorm(n*pp[j]), n, pp[j])
    # since only beta_1 is nonzero, x %*% beta equals x[, 1]
    y <- x %*% beta + rnorm(n)
    g <- lm(y ~ x)
    # store the estimate of beta_1 (coef[1] is the intercept)
    B[i, j] <- g$coef[2]
  }
}
# Visualization of E(beta_{ols})
boxplot(B, col="orange", boxwex=0.6, ylab="Coefficient estimates",
names=pp, xlab="The number of predictors", ylim=c(-5,5))
abline(h=1, col=2, lty=2, lwd=2)
apply(B, 2, mean)  # close to 1 for every p (unbiasedness)
apply(B, 2, var)   # grows as p approaches n
- The mean of \(\hat{\beta}^{OLS}\) is approximately 1 for every p, confirming that the estimator is unbiased.
- As the number of predictors p increases toward n, the variance of the estimates grows sharply.
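The simulation stops at p = 99 because, as noted above, OLS breaks down once p ≥ n. A small sketch of that failure: the design matrix becomes rank-deficient, so lm() returns NA for the coefficients it cannot estimate.

```r
set.seed(1)
n <- 100
x <- matrix(rnorm(n * (n + 5)), n, n + 5)   # p = 105 > n = 100
y <- x[, 1] + rnorm(n)
g <- lm(y ~ x)
# the model matrix (intercept + 105 predictors) has rank n = 100,
# so 106 - 100 = 6 coefficients come back as NA
sum(is.na(coef(g)))
```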