[R] Linear Model

1. OLS (Ordinary Least Squares) model

  • The linear regression model : \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon\)
  • OLS
    • Ordinary least squares (OLS) is a linear least squares method for estimating the unknown parameters of a linear regression model.
    • The OLS coefficient estimates are unbiased, provided the linear model is correct and the errors have mean zero.
    • \(E(\hat{\beta}^{OLS}) = \beta\)
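    • \(Var(\hat{\beta}^{OLS})\) is the smallest among all linear unbiased estimators when the errors are uncorrelated with equal variance (Gauss-Markov theorem).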
    • Problems in multiple linear regression
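      • OLS has no unique solution when n < p (high-dimensional data), because the normal equations then have infinitely many solutions.
      • OLS estimates can have relatively large variance, especially when p is close to n.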
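The n < p problem can be seen directly in R. The following is a minimal sketch, not part of the original simulation: the values n = 20 and p = 30 and the variable names are assumptions chosen only for illustration. With an intercept, the design matrix has p + 1 columns but its rank cannot exceed n, so lm() leaves the surplus coefficients as NA.

```r
set.seed(1)
n <- 20   # number of observations (illustrative choice)
p <- 30   # number of predictors, deliberately larger than n

x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + rnorm(n)          # only the first predictor matters

# Design matrix with an intercept column: n rows, p + 1 columns
design <- cbind(1, x)
rank_design <- qr(design)$rank  # rank is limited by the number of rows

# lm() still runs, but cannot determine all p + 1 coefficients
fit <- lm(y ~ x)
n_undetermined <- sum(is.na(coef(fit)))

cat("rank of design matrix:", rank_design, "\n")
cat("coefficients reported as NA:", n_undetermined, "\n")
```

The coefficients that are estimated fit the training data exactly, so the fit gives no information about how far the estimates are from the true values.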

 

2. Unbiasedness of the OLS estimator
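As a short sketch of why the estimates should center on the true value, write the model in matrix form as \(Y = X\beta + \epsilon\), where \(X\) is the design matrix including the intercept column. This matrix notation is introduced here for illustration only. Assuming \(X\) has full column rank and \(E(\epsilon \mid X) = 0\):

\[\hat{\beta}^{OLS} = (X^T X)^{-1} X^T Y = \beta + (X^T X)^{-1} X^T \epsilon\]

\[E(\hat{\beta}^{OLS} \mid X) = \beta + (X^T X)^{-1} X^T E(\epsilon \mid X) = \beta\]

The simulation in this section checks this numerically for the coefficient of the first predictor.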

 

set.seed(123)
n <- 100
# pp : the numbers of predictors (p) to try; n is the number of samples
pp <- c(10, 50, 80, 95, 97, 98, 99)
B <- matrix(0, 100, length(pp))


for (i in 1:100) {
  for (j in seq_along(pp)) {
    beta <- rep(0, pp[j])
    # True coefficients: beta_1 = 1, beta_2 = ... = beta_p = 0 (true intercept beta_0 = 0)
    beta[1] <- 1
    x <- matrix(rnorm(n*pp[j]), n, pp[j])
    # Only beta[1] is nonzero, so x %*% beta equals x[, 1]
    y <- x %*% beta + rnorm(n)
    g <- lm(y~x)
    # coef(g)[1] is the intercept, so coef(g)[2] is the estimate of beta_1
    B[i,j] <- coef(g)[2]
  }
}

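# Boxplots of the 100 estimates of beta_1 for each number of predictors
boxplot(B, col="orange", boxwex=0.6, ylab="Coefficient estimates",
        names=pp, xlab="The number of predictors", ylim=c(-5,5))
# Dashed red line marks the true value beta_1 = 1
abline(h=1, col=2, lty=2, lwd=2)
# Monte Carlo mean and variance of the estimates for each p
apply(B, 2, mean)
apply(B, 2, var)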

 

  • The mean of \(\hat{\beta}_1^{OLS}\) is approximately 1, the true value of \(\beta_1\), for every number of predictors, which is consistent with unbiasedness.
  • As the number of predictors p increases toward n, the variance of the estimate becomes larger.
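The growth in variance can also be checked against the standard formula \(Var(\hat{\beta}^{OLS} \mid X) = \sigma^2 (X^T X)^{-1}\), where \(X\) is the design matrix including the intercept column. The sketch below is an illustration under assumptions rather than part of the original analysis: it uses \(\sigma^2 = 1\) (the noise is rnorm(n)), one random draw of the predictors per setting, and only a subset of the predictor counts used above. The name cond_var_beta1 is made up for this example.

```r
set.seed(123)
n <- 100                  # number of observations, as in the simulation
sigma2 <- 1               # error variance, since the noise is standard normal
pp <- c(10, 50, 80, 95)   # subset of the predictor counts used above

# Conditional variance of the beta_1 estimate for one draw of x per p
cond_var_beta1 <- sapply(pp, function(p) {
  x <- matrix(rnorm(n * p), n, p)
  design <- cbind(1, x)                # add the intercept column
  xtx_inv <- solve(crossprod(design))  # (X^T X)^{-1}
  sigma2 * xtx_inv[2, 2]               # entry for beta_1 (column 2)
})
names(cond_var_beta1) <- pp
print(cond_var_beta1)
```

Because each value depends on a single random design matrix, the numbers will not match the Monte Carlo variances from apply(B, 2, var) exactly, but the value at p = 95 should be far larger than the value at p = 10.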

 

 
