

[R] Useful Functions for Regression Problems

1. model.matrix()

  • Make dummy variables for categorical columns, with an intercept column : model.matrix(~., x)
  • Make dummy variables without the intercept column (for glmnet) : model.matrix(~., x)[, -1]
  • Make predictions from an lm or glm fit : model.matrix(~., x) %*% coef(g)
  • Make predictions from regsubsets : model.matrix(~., x)[, names(coef(g, id=i))] %*% coef(g, id=i)
  • Make predictions from glmnet : model.matrix(~., x) %*% coef(g, s=g$lambda[i])

 

 

The model.matrix function converts the original data into dummy-encoded categorical columns plus an intercept column. To pass this matrix to glmnet, drop the intercept column, because glmnet fits its own intercept (its intercept argument defaults to TRUE). However, when computing predictions by hand from the output of coef(), multiply by the matrix that still contains the intercept column, since coef() returns the intercept as its first entry.
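A minimal sketch of this workflow, assuming a hypothetical data frame x with factor columns and a numeric response y:

    library(glmnet)

    X_int   <- model.matrix(~., x)          # dummy columns plus an "(Intercept)" column
    X_noint <- model.matrix(~., x)[, -1]    # intercept dropped: this is the form glmnet expects

    g <- glmnet(X_noint, y, alpha = 1)      # glmnet fits its own intercept (intercept = TRUE by default)

    # coef() returns the intercept as its first entry, so multiply by the matrix WITH the intercept
    pred <- X_int %*% coef(g, s = g$lambda[1])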

 

 

2. regsubsets()

  • Training model : g <- regsubsets(x, y, nvmax=15, nbest=1)
  • Make prediction for regsubsets : model.matrix(~., x)[, names(coef(g, id=i))] %*% coef(g, id=i) (see the sketch after this list)
  • Get selection criteria (from sg <- summary(g))
    • RSS : sg$rss
    • \(R^2\) : sg$rsq
    • \(C_p\) : sg$cp
    • AIC : not returned directly by summary(); compute it from the RSS if needed
    • BIC : sg$bic
    • Adjusted \(R^2\) : sg$adjr2
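A minimal sketch, assuming a hypothetical numeric design matrix x (with column names), a response y, and the leaps package:

    library(leaps)

    g  <- regsubsets(x, y, nvmax = 15, nbest = 1)
    sg <- summary(g)                         # the selection criteria live in the summary object
    sg$rss; sg$rsq; sg$cp; sg$bic; sg$adjr2

    # prediction for the i-th model: keep only the columns that model actually uses
    i     <- which.min(sg$bic)
    coefi <- coef(g, id = i)
    Xfull <- model.matrix(~., as.data.frame(x))
    pred  <- Xfull[, names(coefi)] %*% coefi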

 

3. glmnet()

3.1 Training model

  • Linear regression : g <- glmnet(x, y, lambda=lambda)
    • Ridge : g <- glmnet(x, y, alpha=0, lambda=lambda)
    • Lasso : g <- glmnet(x, y, alpha=1, lambda=lambda)
  • Logistic regression (binary) : g <- glmnet(x, y, family="binomial")
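A minimal sketch, assuming a hypothetical numeric matrix x, a numeric response y, a 0/1 response ybin, and a decreasing lambda grid:

    library(glmnet)

    lambda <- 10^seq(2, -3, length.out = 100)             # a hypothetical grid of penalty values

    g_ridge <- glmnet(x, y, alpha = 0, lambda = lambda)   # ridge: pure L2 penalty
    g_lasso <- glmnet(x, y, alpha = 1, lambda = lambda)   # lasso: pure L1 penalty
    g_logit <- glmnet(x, ybin, family = "binomial")       # logistic regression on glmnet's own lambda path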

 

3.2 Check model attributes

  • \(\lambda\) path : g$lambda
  • \(\beta\) : g$beta
  • Degrees of freedom (number of nonzero coefficients, e.g. for the lasso) : g$df
  • Coefficients at the i-th lambda : coef(g, s=g$lambda[i])
  • Get names of the nonzero coefficients : coefi@Dimnames[[1]][which(coefi != 0)][-1] (where coefi <- coef(g, s=g$lambda[i]))
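A minimal sketch of inspecting a fitted object, assuming a hypothetical lasso fit on x and y:

    library(glmnet)

    g <- glmnet(x, y, alpha = 1)

    g$lambda                                    # the lambda path actually used
    g$beta                                      # sparse matrix of coefficients, one column per lambda
    g$df                                        # number of nonzero coefficients at each lambda

    i     <- 10                                 # a hypothetical index into the lambda path
    coefi <- coef(g, s = g$lambda[i])           # coefficients (including intercept) at that lambda
    coefi@Dimnames[[1]][which(coefi != 0)][-1]  # names of the variables kept by the lasso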

 

 

3.3 Predict result

  • Linear model
    • Result of prediction : predict(g, s=g$lambda[i], newx=test_X)
    • Result of prediction using coef : model.matrix(~., test_X) %*% coef(g, s=g$lambda[i])
  • Binomial model
    • Result as probability : predict(g, s=g$lambda[i], newx=test_X, type="response")
    • Result as class : predict(g, s=g$lambda[i], newx=test_X, type="class")
    • Result as deviance : -2*sum(yval*log(prob) + (1-yval)*log(1-prob))
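A minimal sketch, assuming hypothetical training data frames train_x (with numeric response y and 0/1 response ybin), a test data frame test_x, and held-out 0/1 labels yval:

    library(glmnet)

    Xtr <- model.matrix(~., train_x)[, -1]     # drop the intercept column for glmnet
    Xte <- model.matrix(~., test_x)[, -1]
    i   <- 10                                  # a hypothetical index into the lambda path

    # linear model: predict() and the manual coef() route give the same numbers
    g     <- glmnet(Xtr, y, alpha = 1)
    pred1 <- predict(g, newx = Xte, s = g$lambda[i])
    pred2 <- model.matrix(~., test_x) %*% coef(g, s = g$lambda[i])

    # binomial model: probabilities, classes, and the deviance on the held-out labels
    gb   <- glmnet(Xtr, ybin, family = "binomial")
    prob <- predict(gb, newx = Xte, s = gb$lambda[i], type = "response")
    cls  <- predict(gb, newx = Xte, s = gb$lambda[i], type = "class")
    dev  <- -2 * sum(yval * log(prob) + (1 - yval) * log(1 - prob))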

 

4. cv.glmnet()

4.1 Linear Regression

  • Training model : gcvl <- cv.glmnet(x, y, alpha=0, lambda=lam)
  • Get parameters
    • The mean cross-validation error : gcvl$cvm
    • The standard error of the cross-validation error : gcvl$cvsd
    • The upper band of the CV error (cvm + cvsd) : gcvl$cvup
    • The lower band of the CV error (cvm - cvsd) : gcvl$cvlo
    • Lambda min : gcvl$lambda.min
    • Lambda 1se : gcvl$lambda.1se
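A minimal sketch, assuming a hypothetical numeric matrix x, a numeric response y, and a lambda grid lam:

    library(glmnet)

    lam  <- 10^seq(2, -3, length.out = 100)
    gcvl <- cv.glmnet(x, y, alpha = 0, lambda = lam)

    gcvl$cvm          # mean cross-validated error at each lambda
    gcvl$cvsd         # its standard error
    gcvl$cvup         # cvm + cvsd (upper band)
    gcvl$cvlo         # cvm - cvsd (lower band)
    gcvl$lambda.min   # lambda giving the smallest CV error
    gcvl$lambda.1se   # largest lambda within one standard error of the minimum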

 

4.2 Binary Classification 

  • Training model : gcvl <- cv.glmnet(x, y, family="binomial", lambda=lam, nfolds=5, type.measure="class")
  • Training model with personal folds : gcvl <- cv.glmnet(x, y, family="binomial", foldid=folds, type.measure="auc")
  • Plot the trained model with the one-standard-error rule marked : plot(gcvl)
  • Get parameters
    • The mean cross-validation error : gcvl$cvm
    • The standard error of the cross-validation error : gcvl$cvsd
    • The upper band of the CV error (cvm + cvsd) : gcvl$cvup
    • The lower band of the CV error (cvm - cvsd) : gcvl$cvlo
    • Lambda min : gcvl$lambda.min
    • Lambda 1se : gcvl$lambda.1se
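A minimal sketch, assuming a hypothetical numeric matrix x and a 0/1 response y:

    library(glmnet)

    # 5-fold CV scored by misclassification error
    gcvb <- cv.glmnet(x, y, family = "binomial", nfolds = 5, type.measure = "class")

    # CV with user-supplied fold assignments, scored by AUC
    folds <- sample(rep(1:5, length.out = nrow(x)))
    gcva  <- cv.glmnet(x, y, family = "binomial", foldid = folds, type.measure = "auc")

    plot(gcvb)          # CV curve with error bars; dotted lines mark lambda.min and lambda.1se
    gcvb$lambda.min
    gcvb$lambda.1se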