1. What are Non-Linear Models?
- Kinds of non-linear models:
- Polynomials
- Step functions
- Splines
- Local regression
- Generalized Additive Models (GAM); a short fitting sketch follows after this list
- Population : \(y_i = \sum_{j=1}^p f_j(x_{ij}) + \epsilon_i\)
- Inference : \(\hat{y}_i = \sum_{j=1}^p \hat{f}_j(x_{ij})\)
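- A GAM of this form can be fit directly in R. The sketch below is a minimal illustration (not part of the original notes), assuming the gam package used in ISLR is available; s() requests a smoothing spline with the given degrees of freedom for each term.
# Fit a GAM on the Wage data (illustrative sketch)
library(ISLR)
library(gam)
gam.fit <- gam(wage ~ s(age, 5) + s(year, 4) + education, data=Wage)
summary(gam.fit)
# Plot each fitted function f_j with pointwise standard-error bands
par(mfrow=c(1, 3))
plot(gam.fit, se=TRUE, col="darkblue")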
2. Polynomial Regression
- Not really interested in the coefficients.
- More interested in fitted function values.
- \(\hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0 + \hat{\beta}_2 x_0^2 + \hat{\beta}_3 x_0^3 + \hat{\beta}_4 x_0^4\)
- Compute the fit and a pointwise confidence band from the standard error : \(\hat{f}(x_0) \pm 2 \cdot \mathrm{se}[\hat{f}(x_0)]\).
2.1 [Ex] Polynomial Regression on the Wage dataset
# install.packages('ISLR')
# Import library
library(ISLR)
attach(Wage)
# Orthogonal polynomials:
# Each column is a linear orthogonal combination of
# age, age^2, age^3 and age^4
fit <- lm(wage ~ poly(age, 4), data=Wage)
summary(fit)
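# Added check (not in the original notes): the columns returned by poly(age, 4)
# are mutually orthogonal, so the off-diagonal entries of their cross-product
# are (numerically) zero
round(crossprod(poly(age, 4)), 10)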
# Direct power of age
fit2 <- lm(wage ~ poly(age, 4, raw=T), data=Wage)
summary(fit2)
fit2a <- lm(wage ~ age + I(age^2) + I(age^3) + I(age^4), Wage)
fit2b <- lm(wage ~ cbind(age, age ^2, age^3, age^4), Wage)
round(data.frame(fit=coef(fit), fit2=coef(fit2),
fit2a=coef(fit2a), fit2b=coef(fit2b)), 5)
# Grid of age values used as test inputs
age.grid <- seq(min(age), max(age))
# se=TRUE also returns the pointwise standard error of the fitted values
preds <- predict(fit, newdata=list(age=age.grid), se=TRUE)
# Calculate the confidence bands (fitted value +/- 2 standard errors)
se.bands <- cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
# Visualize results
plot(age, wage, xlim=range(age), cex=.5, col="darkgrey")
title("Degree-4 Polynomial", outer=T)
lines(age.grid, preds$fit, lwd=2, col="darkblue")
matlines(age.grid, se.bands, lwd=2, col="darkblue", lty=2)
- The narrower the confidence bands, the more stable the predictions.
- Looking at the plot, the bands are narrow where the training data are dense, but they widen toward the edges of the age range, where the data are sparse and predictions for genuinely new values are less reliable (a quick numerical check follows below).
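- A quick numerical check (not in the original notes): the pointwise standard errors stored in preds are typically much larger at the ends of the age grid than near its middle.
# Compare standard errors at the ends of the age grid with one near the middle
idx <- c(1, round(length(age.grid)/2), length(age.grid))
round(data.frame(age=age.grid[idx], se=preds$se.fit[idx]), 3)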
# Orthogonal vs. Non-orthogonal polynomial regression
preds2 <- predict(fit2, newdata=list(age=age.grid), se=TRUE)
data.frame(fit=preds$fit, fit2=preds2$fit)
sum(abs(preds$fit-preds2$fit))
- The difference between the predictions of the orthogonal and the raw (non-orthogonal) polynomial regressions is negligible : the total absolute difference is just 1.103956e-09.
2.2 [Ex] Choosing the optimal polynomial degree : Statistical tests
# Anova test to find the optimal polynomial degree
fit.1 <- lm(wage ~ age, data=Wage)
fit.2 <- lm(wage ~ poly(age, 2), data=Wage)
fit.3 <- lm(wage ~ poly(age, 3), data=Wage)
fit.4 <- lm(wage ~ poly(age, 4), data=Wage)
fit.5 <- lm(wage ~ poly(age, 5), data=Wage)
g <- anova(fit.1, fit.2, fit.3, fit.4, fit.5)
g
- The ANOVA compares each pair of successive nested models with an F-test.
- For Model 2 (quadratic) versus Model 1 (linear), for example,
- \(H_0 : \beta_2 = 0\) : Model 1 is sufficient.
- \(H_1 : \beta_2 \neq 0\) : Model 2 is preferable.
- We can choose an adequate polynomial degree from this sequence of tests (the F-statistic it uses is spelled out below).
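- For reference (standard result, not stated in the original notes): when a smaller model 1 with residual sum of squares \(RSS_1\) and residual degrees of freedom \(d_1\) is nested in a larger model 2 with \(RSS_2\) and \(d_2\), anova() reports \(F = \dfrac{(RSS_1 - RSS_2)/(d_1 - d_2)}{RSS_2 / d_2}\) together with its p-value.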
# Perform T-test
coef(summary(fit.5))
round(coef(summary(fit.5)), 5)
# For orthogonal polynomials, the squared t-statistics equal the ANOVA F-statistics
summary(fit.5)$coef[-c(1, 2), 3]
summary(fit.5)$coef[-c(1, 2), 3]^2
g$F[-1]
- From the t-tests on the fit.5 model we can choose the optimal polynomial degree.
- The cubic term (\(\beta_3\)) is the last clearly significant one, so degree 3 is preferable.
2.3 [Ex] Choosing the optimal polynomial degree : K-fold CV simulation
# 10-fold cross-validation to choose the optimal polynomial
set.seed(1111)
N <- 10 # simulation replications
K <- 10 # 10-fold CV
# CV error matrix: one row per simulation replication, one column per degree
CVE <- matrix(0, N, 10)
# Model training - calculate the cross-validated test error
for (k in 1:N) {
  gr <- sample(rep(seq(K), length=nrow(Wage)))
  # Per-observation squared errors, one column per polynomial degree
  pred <- matrix(NA, nrow(Wage), 10)
  for (i in 1:K) {
    tran <- (gr != i)
    test <- (gr == i)
    for (j in 1:10) {
      g <- lm(wage ~ poly(age, j), data=Wage, subset=tran)
      # Predict on the full data set, then keep only the test-fold squared errors
      yhat <- predict(g, newdata=Wage)
      mse <- (Wage$wage - yhat)^2
      pred[test, j] <- mse[test]
    }
  }
  CVE[k, ] <- apply(pred, 2, mean)
}
RES <- apply(CVE, 2, mean)
RES
# Visualize Result
par(mfrow=c(1,2))
matplot(t(CVE), type="l", xlab="Degree of Polynomial",
        ylab="Mean Squared Error")
plot(seq(10), RES, type="b", col=2, pch=20,
     xlab="Degree of Polynomial", ylab="Mean Squared Error")
- Looking at the plot, the CVE drops sharply when the polynomial degree increases from 1 to 2, and changes little after that.
- When building a model, it is better to use the least complex model that is statistically justified, rather than simply the one with the lowest CVE (a sketch of the one-standard-error rule follows below).
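- One common way to formalize this idea (not in the original notes) is the one-standard-error rule: choose the smallest degree whose mean CV error is within one standard error of the minimum.
# One-standard-error rule: an illustrative sketch using the CVE matrix from above
cve.mean <- apply(CVE, 2, mean)
cve.se <- apply(CVE, 2, sd) / sqrt(nrow(CVE))
best <- which.min(cve.mean)
# Smallest degree whose mean CV error is within one SE of the minimum
chosen <- min(which(cve.mean <= cve.mean[best] + cve.se[best]))
chosen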
2.4 [Ex] Logistic Regression
# Logistic regression using binary response
# 1 for wage > 250 and 0 for wage <= 250
fit <- glm(I(wage > 250) ~ poly(age, 4), data=Wage, family="binomial")
# Predict dataset and calculate confidence bands
preds <- predict(fit, newdata=list(age=age.grid), se=T)
# Inverse logit: transform the predictions from the logit scale back to probabilities
pfit <- exp(preds$fit) / (1 + exp(preds$fit))
# Confidence bands on the logit scale, then mapped back to the probability scale
se.bands.logit <- cbind(preds$fit + 2*preds$se.fit, preds$fit - 2*preds$se.fit)
se.bands <- exp(se.bands.logit)/(1 + exp(se.bands.logit))
# Alternative: type="response" returns probabilities directly
preds2 <- predict(fit, newdata=list(age=age.grid), type="response", se=T)
cbind(pfit, preds2$fit)
# Visualize result of logistic regression
plot(age , I(wage > 250), xlim=range(age), type="n", ylim=c(0, .2))
points(jitter(age), I((wage>250)/5), cex=.5, pch="|", col="darkgrey")
lines(age.grid, pfit, lwd=2, col="darkblue")
matlines(age.grid, se.bands, lwd=2, col="darkblue", lty=2)
- When wage is converted into a binary response (wage > 250) and the logistic model is fit, the confidence bands are again narrow where the training data are dense, but they become very wide at the upper end of the age range, where few observations (and almost no high earners) are available.