

[R] Useful Functions for Regression Problems

1. model.matrix()

  • Make dummy variables for categorical columns, with an intercept column : model.matrix(~., x)
  • Make dummy variables without the intercept column (for glmnet) : model.matrix(~., x)[, -1]
  • Make predictions from an lm or glm fit : model.matrix(~., x) %*% coef(g)
  • Make predictions from regsubsets : model.matrix(~., x)[, names(coef(g, id=i))] %*% coef(g, id=i)
  • Make predictions from glmnet : model.matrix(~., x) %*% coef(g, s=g$lambda[i])

 

 

The model.matrix function converts the original data into dummy-encoded categorical columns plus an intercept column. To pass this matrix to glmnet, drop the intercept column, because glmnet fits its own intercept (its intercept argument defaults to TRUE). However, when computing predictions by hand from the output of coef(), multiply by the matrix that still contains the intercept column, since coef() returns the intercept as its first entry.
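A minimal sketch of this workflow, assuming a hypothetical data frame x with factor columns and a numeric response y:

    library(glmnet)

    X_int   <- model.matrix(~., x)          # dummy columns plus an "(Intercept)" column
    X_noint <- model.matrix(~., x)[, -1]    # intercept dropped: this is the form glmnet expects

    g <- glmnet(X_noint, y, alpha = 1)      # glmnet fits its own intercept (intercept = TRUE by default)

    # coef() returns the intercept as its first entry, so multiply by the matrix WITH the intercept
    pred <- X_int %*% coef(g, s = g$lambda[1])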

 

 

2. regsubsets()

  • Training model : g <- regsubsets(x, y, nvmax=15, nbest=1)
  • Make prediction for regsubsets : model.matrix(~., x)[, names(coef(g, id=i))] %*% coef(g, id=i) (see the sketch after this list)
  • Get selection criteria (from sg <- summary(g))
    • RSS : sg$rss
    • \(R^2\) : sg$rsq
    • \(C_p\) : sg$cp
    • AIC : not returned directly by summary(); compute it from the RSS if needed
    • BIC : sg$bic
    • Adjusted \(R^2\) : sg$adjr2
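A minimal sketch, assuming a hypothetical numeric design matrix x (with column names), a response y, and the leaps package:

    library(leaps)

    g  <- regsubsets(x, y, nvmax = 15, nbest = 1)
    sg <- summary(g)                         # the selection criteria live in the summary object
    sg$rss; sg$rsq; sg$cp; sg$bic; sg$adjr2

    # prediction for the i-th model: keep only the columns that model actually uses
    i     <- which.min(sg$bic)
    coefi <- coef(g, id = i)
    Xfull <- model.matrix(~., as.data.frame(x))
    pred  <- Xfull[, names(coefi)] %*% coefi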

 

3. glmnet()

3.1 Training model

  • Linear regression : g <- glmnet(x, y, lambda=lambda)
    • Ridge : g <- glmnet(x, y, alpha=0, lambda=lambda)
    • Lasso : g <- glmnet(x, y, alpha=1, lambda=lambda)
  • Logistic regression (binary) : g <- glmnet(x, y, family="binomial")
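A minimal sketch, assuming a hypothetical numeric matrix x, a numeric response y, a 0/1 response ybin, and a decreasing lambda grid:

    library(glmnet)

    lambda <- 10^seq(2, -3, length.out = 100)             # a hypothetical grid of penalty values

    g_ridge <- glmnet(x, y, alpha = 0, lambda = lambda)   # ridge: pure L2 penalty
    g_lasso <- glmnet(x, y, alpha = 1, lambda = lambda)   # lasso: pure L1 penalty
    g_logit <- glmnet(x, ybin, family = "binomial")       # logistic regression on glmnet's own lambda path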

 

3.2 Check model attributes

  • \(\lambda\) path : g$lambda
  • \(\beta\) : g$beta
  • Degrees of freedom (number of nonzero coefficients, e.g. for the lasso) : g$df
  • Coefficients at the i-th lambda : coef(g, s=g$lambda[i])
  • Get names of the nonzero coefficients : coefi@Dimnames[[1]][which(coefi != 0)][-1] (where coefi <- coef(g, s=g$lambda[i]))
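A minimal sketch of inspecting a fitted object, assuming a hypothetical lasso fit on x and y:

    library(glmnet)

    g <- glmnet(x, y, alpha = 1)

    g$lambda                                    # the lambda path actually used
    g$beta                                      # sparse matrix of coefficients, one column per lambda
    g$df                                        # number of nonzero coefficients at each lambda

    i     <- 10                                 # a hypothetical index into the lambda path
    coefi <- coef(g, s = g$lambda[i])           # coefficients (including intercept) at that lambda
    coefi@Dimnames[[1]][which(coefi != 0)][-1]  # names of the variables kept by the lasso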

 

 

3.3 Predict result

  • Linear model
    • Result of prediction : predict(g, s=g$lambda[i], newx=test_X)
    • Result of prediction using coef : model.matrix(~., test_X) %*% coef(g, s=g$lambda[i])
  • Binomial model
    • Result as probability : predict(g, s=g$lambda[i], newx=test_X, type="response")
    • Result as class : predict(g, s=g$lambda[i], newx=test_X, type="class")
    • Result as deviance : -2*sum(yval*log(prob) + (1-yval)*log(1-prob))
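A minimal sketch, assuming hypothetical training data frames train_x (with numeric response y and 0/1 response ybin), a test data frame test_x, and held-out 0/1 labels yval:

    library(glmnet)

    Xtr <- model.matrix(~., train_x)[, -1]     # drop the intercept column for glmnet
    Xte <- model.matrix(~., test_x)[, -1]
    i   <- 10                                  # a hypothetical index into the lambda path

    # linear model: predict() and the manual coef() route give the same numbers
    g     <- glmnet(Xtr, y, alpha = 1)
    pred1 <- predict(g, newx = Xte, s = g$lambda[i])
    pred2 <- model.matrix(~., test_x) %*% coef(g, s = g$lambda[i])

    # binomial model: probabilities, classes, and the deviance on the held-out labels
    gb   <- glmnet(Xtr, ybin, family = "binomial")
    prob <- predict(gb, newx = Xte, s = gb$lambda[i], type = "response")
    cls  <- predict(gb, newx = Xte, s = gb$lambda[i], type = "class")
    dev  <- -2 * sum(yval * log(prob) + (1 - yval) * log(1 - prob))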

 

4. cv.glmnet()

4.1 Linear Regression

  • Training model : gcvl <- cv.glmnet(x, y, alpha=0, lambda=lam)
  • Get parameters
    • The mean cross-validation error : gcvl$cvm
    • The standard error of the cross-validation error : gcvl$cvsd
    • The upper band of the CV error (cvm + cvsd) : gcvl$cvup
    • The lower band of the CV error (cvm - cvsd) : gcvl$cvlo
    • Lambda min : gcvl$lambda.min
    • Lambda 1se : gcvl$lambda.1se
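A minimal sketch, assuming a hypothetical numeric matrix x, a numeric response y, and a lambda grid lam:

    library(glmnet)

    lam  <- 10^seq(2, -3, length.out = 100)
    gcvl <- cv.glmnet(x, y, alpha = 0, lambda = lam)

    gcvl$cvm          # mean cross-validated error at each lambda
    gcvl$cvsd         # its standard error
    gcvl$cvup         # cvm + cvsd (upper band)
    gcvl$cvlo         # cvm - cvsd (lower band)
    gcvl$lambda.min   # lambda giving the smallest CV error
    gcvl$lambda.1se   # largest lambda within one standard error of the minimum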

 

4.2 Binary Classification 

  • Training model : gcvl <- cv.glmnet(x, y, family="binomial", lambda=lam, nfolds=5, type.measure="class")
  • Training model with personal folds : gcvl <- cv.glmnet(x, y, family="binomial", foldid=folds, type.measure="auc")
  • Plot the trained model with the one-standard-error rule marked : plot(gcvl)
  • Get parameters
    • The mean cross-validation error : gcvl$cvm
    • The standard error of the cross-validation error : gcvl$cvsd
    • The upper band of the CV error (cvm + cvsd) : gcvl$cvup
    • The lower band of the CV error (cvm - cvsd) : gcvl$cvlo
    • Lambda min : gcvl$lambda.min
    • Lambda 1se : gcvl$lambda.1se
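A minimal sketch, assuming a hypothetical numeric matrix x and a 0/1 response y:

    library(glmnet)

    # 5-fold CV scored by misclassification error
    gcvb <- cv.glmnet(x, y, family = "binomial", nfolds = 5, type.measure = "class")

    # CV with user-supplied fold assignments, scored by AUC
    folds <- sample(rep(1:5, length.out = nrow(x)))
    gcva  <- cv.glmnet(x, y, family = "binomial", foldid = folds, type.measure = "auc")

    plot(gcvb)          # CV curve with error bars; dotted lines mark lambda.min and lambda.1se
    gcvb$lambda.min
    gcvb$lambda.1se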