
[R] Supervised Learning

1. Model based on Supervised Learning

  • Ideal model : \(Y = f(X) + \epsilon\) (a minimal simulation of this model is sketched just after this list).
  • A good estimate of \(f(X)\) makes it possible to predict \(Y\) at new points \(X = x\).
  • Statistical learning refers to a set of approaches for estimating the function \(f(X)\).
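
A minimal simulation of the ideal model \(Y = f(X) + \epsilon\), as referenced above. The particular \(f\) and noise level here are made-up choices for illustration only, not part of the original example.

# Simulate Y = f(X) + eps with a made-up linear f(x) = 2 + 3x
set.seed(1)
x   <- runif(100)
eps <- rnorm(100, sd = 0.5)      # irreducible noise
y   <- 2 + 3 * x + eps           # Y = f(X) + eps
fit <- lm(y ~ x)                 # estimate f from the simulated data
coef(fit)                        # estimates should be close to (2, 3)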

 

# Drop the first column (the row index) of the Advertising data
AD <- Advertising[, -1]

# Multiple linear regression of sales on all remaining predictors
lm.fit <- lm(sales ~ ., data = AD)
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)

# Visualizing models: standard diagnostic plots
par(mfrow = c(2, 2))
plot(lm.fit)

dev.off()
plot(predict(lm.fit), residuals(lm.fit))   # residuals vs fitted values
plot(predict(lm.fit), rstudent(lm.fit))    # studentized residuals vs fitted values
plot(hatvalues(lm.fit))                    # leverage statistics
which.max(hatvalues(lm.fit))               # observation with the largest leverage
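
The block above assumes the Advertising data frame is already in the workspace. A minimal sketch of one way to load it; the URL mirrors the Income1.csv link used later in this post and is an assumption, not something stated in the original.

# Assumed location of Advertising.csv (ISLR companion site)
url.ad <- "https://www.statlearning.com/s/Advertising.csv"
Advertising <- read.csv(url.ad, header = TRUE)
head(Advertising)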

 

2. Estimation of \(f\) for Prediction

  • \(\hat{Y} = \hat{f}(X)\)
  • \(\hat{f}\) : Estimation for \(f\).
  • \(\hat{Y}\) : Prediction for \(Y\).
  • The ideal \(f\) is the regression function: \(f(x) = E(Y \mid X = x)\).
  • Reducible error : \([f(x) - \hat{f}(x)]^2\), the part of the prediction error that comes from estimating \(f\) imperfectly.
  • Irreducible error : \(\mathrm{Var}(\epsilon)\), where \(\epsilon = Y - f(X)\); it remains even with a perfect estimate of \(f\) (see the decomposition after this list).
  • Statistical learning techniques for estimating \(f\) aim to minimize the reducible error.
  • Statistical learning looks for the \(\hat{f}\) that is as close as possible to \(f\).
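
Putting the two pieces together (treating \(\hat{f}\) and \(x\) as fixed), the expected squared prediction error splits into a reducible and an irreducible part:

\[
E\big[(Y - \hat{Y})^2 \mid X = x\big] \;=\; \underbrace{\big[f(x) - \hat{f}(x)\big]^2}_{\text{reducible}} \;+\; \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}.
\]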

 

3. [Ex] Income Data

 

# Load the Income data set
url.in <- "https://www.statlearning.com/s/Income1.csv"
Income <- read.csv(url.in, header = TRUE)

# Polynomial regression fit 
par(mfrow = c(1,2)) 
plot(Income~Education, col=2, pch=19, xlab="Years of Education", 
     ylab="Income", data=Income) 

g <- lm(Income ~ poly(Education, 3), data=Income)
plot(Income~Education, col=2, pch=19, xlab="Years of Education", 
     ylab="Income", data=Income)
lines(Income$Education, g$fit, col="darkblue", lwd=4)
      
# Training MSE, computed two equivalent ways
y <- Income$Income
mean((predict(g) - y)^2) 
mean(residuals(g)^2)
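
These two quantities are the same number: residuals(g) is exactly Income$Income - predict(g), so both lines compute the training mean squared error. A quick check:

# Both expressions give the training MSE
all.equal(mean((predict(g) - y)^2), mean(residuals(g)^2))   # TRUE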

 

 

# Polynomial fits for degrees k = 1, ..., 12
dist <- NULL
par(mfrow=c(3,4)) 
for (k in 1:12) { 
    g <- lm(Income ~ poly(Education, k), data=Income) 
    dist[k] <- mean(residuals(g)^2)      # training MSE for degree k
    plot(Income~Education, col=2, pch=19, xlab="Years of Education", ylab="Income",
         data=Income, main=paste("k =", k)) 
    lines(Income$Education, g$fit, col="darkblue", lwd=3)
}

 

 

x11()   # open a new graphics window (dev.new() is the cross-platform alternative)
plot(dist, type="b", xlab="Degree of Polynomial", 
     ylab="Mean squared distance")
