See_the_forest 2022. 10. 5. 16:47

1. OLS (Ordinary Least Squares) model

  • The linear regression model: \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon\)
  • OLS
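    • Ordinary least squares (OLS) is a linear least squares method for estimating the unknown parameters of a linear regression model. It chooses the coefficients that minimize the residual sum of squares \(\sum_{i=1}^{n}\big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\big)^2\).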
    • Under the model assumptions (in particular \(E(\epsilon \mid X) = 0\)), the OLS estimators of all coefficients are unbiased.
    • \(E(\hat{\beta}^{OLS}) = \beta\)
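    • Among all linear unbiased estimators, OLS has the smallest variance when the errors are uncorrelated with constant variance (Gauss–Markov theorem: OLS is BLUE).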
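    • Problems in multiple linear regression
      • OLS cannot be computed uniquely when n < p (high-dimensional data): \(X^\top X\) is then singular, so infinitely many coefficient vectors fit the data equally well.
      • Even when n > p, OLS can have a large variance, especially when p is close to n or the predictors are highly correlated.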
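When \(X^\top X\) is invertible, the OLS estimate has the closed form \(\hat{\beta}^{OLS} = (X^\top X)^{-1} X^\top y\), where X includes a column of ones for the intercept. The R snippet below is a minimal sketch on simulated data (the sample sizes, the true coefficients and the helper name `ols_normal_eq` are made up for illustration). It checks the closed form against `lm()`, then shows the n < p problem: the design matrix is rank-deficient, and `lm()` reports `NA` for the coefficients it cannot estimate.

```r
# Minimal sketch on simulated data; ols_normal_eq() is a hypothetical helper.
set.seed(42)
n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, 1, 0, -1)                 # beta0 (intercept), beta1, beta2, beta3
y <- drop(cbind(1, X) %*% beta + rnorm(n))

# OLS via the normal equations: solve (X'X) b = X'y instead of inverting X'X
ols_normal_eq <- function(X, y) {
  X1 <- cbind(1, X)                    # prepend the intercept column
  drop(solve(crossprod(X1), crossprod(X1, y)))
}
b_hat <- ols_normal_eq(X, y)
b_lm  <- unname(coef(lm(y ~ X)))
all.equal(unname(b_hat), b_lm)         # TRUE: same estimates as lm()

# High-dimensional case: n2 = 20 observations, p2 = 30 predictors
n2 <- 20; p2 <- 30
X2 <- matrix(rnorm(n2 * p2), n2, p2)
y2 <- rnorm(n2)
qr(cbind(1, X2))$rank                  # 20, fewer than the p2 + 1 = 31 coefficients
sum(is.na(coef(lm(y2 ~ X2))))          # 11 coefficients are NA (not estimable)
```

Solving the linear system with `solve(A, b)` avoids forming \((X^\top X)^{-1}\) explicitly; `lm()` itself works through a QR decomposition, which is numerically more stable than the normal equations.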


2. Unbiasedness and variance of the OLS estimator: a simulation


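set.seed(1)   # for reproducible results
n <- 100      # number of observations (samples)
# pp : number of predictors p in each scenario
pp <- c(10, 50, 80, 95, 97, 98, 99)
# B[i, j] : estimate of beta1 in replication i with pp[j] predictors
B <- matrix(0, 100, length(pp))

for (i in 1:100) {
  for (j in 1:length(pp)) {
    # true coefficients: beta1 = 1; beta0 (intercept), beta2, ..., betap = 0
    beta <- rep(0, pp[j])
    beta[1] <- 1
    x <- matrix(rnorm(n*pp[j]), n, pp[j])
    # x %*% beta is the same as x[, 1]
    y <- x %*% beta + rnorm(n)
    g <- lm(y ~ x)
    # coef(g)[1] is the intercept, coef(g)[2] is the estimate of beta1
    B[i, j] <- coef(g)[2]
  }
}

# Visualization of the sampling distribution of the OLS estimate of beta1
boxplot(B, col="orange", boxwex=0.6, ylab="Coefficient estimates",
        names=pp, xlab="The number of predictors", ylim=c(-5,5))
abline(h=1, col=2, lty=2, lwd=2)   # true value beta1 = 1
apply(B, 2, mean)   # Monte Carlo mean of the estimates for each p
apply(B, 2, var)    # Monte Carlo variance of the estimates for each p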


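  • The boxplots are centered near the true value \(\beta_1 = 1\) (red dashed line), and the mean of \(\hat{\beta}_1^{OLS}\) is approximately 1: the OLS estimator is unbiased.
  • As the number of predictors p increases toward n, the boxes widen and the values from apply(B, 2, var) grow: the variance of \(\hat{\beta}_1^{OLS}\) increases with p.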
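The growth of the variance can be made explicit for this particular design. The derivation below is a sketch that assumes exactly the simulation's setup (all \(x_{ij}\) i.i.d. N(0, 1), noise variance \(\sigma^2 = 1\), an intercept fitted by lm) and uses the standard partial-regression (Frisch–Waugh–Lovell) identity. Here \(x_1\) is the first predictor column, \(Z = [\mathbf{1}, x_2, \ldots, x_p]\) holds the intercept and the other p − 1 predictors, and \(P_Z\) is the projection onto the column space of Z.

```latex
\begin{aligned}
\operatorname{Var}(\hat\beta_1^{OLS} \mid X)
  &= \sigma^2 \big[(X^\top X)^{-1}\big]_{x_1 x_1}
   = \frac{\sigma^2}{x_1^\top (I - P_Z)\, x_1}, \\
x_1^\top (I - P_Z)\, x_1 \mid Z
  &\sim \chi^2_{n-p}
  \qquad (x_{i1} \overset{iid}{\sim} N(0,1),\ \operatorname{rank}(Z) = p), \\
\operatorname{Var}(\hat\beta_1^{OLS})
  &= E\big[\operatorname{Var}(\hat\beta_1^{OLS} \mid X)\big]
   + \operatorname{Var}\big(E[\hat\beta_1^{OLS} \mid X]\big) \\
  &= \sigma^2\, E\!\left[\frac{1}{\chi^2_{n-p}}\right] + 0
   = \frac{\sigma^2}{n-p-2}, \qquad n - p > 2.
\end{aligned}
```

With n = 100 and \(\sigma^2 = 1\), this gives about 0.011 for p = 10 (1/88), 0.021 for p = 50 (1/48), 0.056 for p = 80 (1/18), 0.333 for p = 95 (1/3) and 1 for p = 97. For p = 98 and p = 99, n − p − 2 ≤ 0 and the variance is infinite, so the corresponding columns of apply(B, 2, var) do not estimate a finite quantity and can change a lot from run to run.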