1. Definitions of Statistical Learning
- Statistical Learning is a set of tools for modeling and understanding complex datasets.
- Supervised Statistical Learning builds a statistical model for predicting or estimating an output based on one or more inputs.
- Unsupervised Statistical Learning learns relationships and structure from data that has inputs but no supervising output.
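The sketch below makes the distinction concrete with R's built-in mtcars and iris data sets. These data sets were chosen only for illustration and are not part of the original post. lm() stands in as one example of a supervised method, and k-means clustering (kmeans()) as one example of an unsupervised method.

```r
# Supervised: mtcars gives an output (mpg) for every car, so a model can be
# fit that predicts mpg from an input (wt, the car's weight).
sup_fit <- lm(mpg ~ wt, data = mtcars)
coef(sup_fit)                                  # intercept and slope learned from input/output pairs

# Unsupervised: drop the Species label from iris and look for structure in
# the four measurements alone, here with k-means clustering.
set.seed(1)                                    # k-means uses random starting centers
inputs <- iris[, 1:4]                          # inputs only, no supervising output
unsup_fit <- kmeans(inputs, centers = 3, nstart = 20)
table(unsup_fit$cluster)                       # sizes of the groups found without labels
```

The supervised fit is judged against a known output. The clusters have no output to check against, so they describe structure only.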
2. Supervised Learning Problem
- Outcome measurement \(Y\) (dependent variable, response, target)
- Vector of \(p\) predictor measurements \(X\) (independent variables, inputs, regressors, features)
- Regression problem: \(Y\) is quantitative.
- Classification problem: \(Y\) takes values in a finite, unordered set.
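In R, the two problem types can be told apart by the type of the response column. The sketch below uses the built-in mtcars data as an assumed example. The quantitative response mpg is fit with lm(). The transmission indicator am is recoded as a two-class factor and fit with logistic regression (glm() with family = binomial), which is one common classification method among several.

```r
# Regression: mpg is numeric, so the model predicts a number for each car.
is.numeric(mtcars$mpg)
reg_fit <- lm(mpg ~ hp, data = mtcars)

# Classification: recode am (0/1) as a factor with two unordered classes.
cars <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                      labels = c("automatic", "manual")))
is.factor(cars$am)
# With a factor response, binomial glm models the probability of the
# second level ("manual").
class_fit <- glm(am ~ wt, data = cars, family = binomial)

# Turn predicted probabilities into class labels with a 0.5 cutoff.
p_manual <- predict(class_fit, type = "response")
pred_class <- ifelse(p_manual > 0.5, "manual", "automatic")
table(predicted = pred_class, actual = cars$am)
```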
3. Statistical Learning vs Machine Learning
- Machine Learning has a greater emphasis on large-scale applications and prediction accuracy.
- Statistical Learning emphasizes models, interpretability, precision and uncertainty.
4. [Ex] Advertising Data
# Load the Advertising data set from the book website
url.ad <- "https://www.statlearning.com/s/Advertising.csv"
Advertising <- read.csv(url.ad, header = TRUE)
# Least squares fit of a simple linear regression of sales on each medium
par(mfrow = c(1, 3))
plot(sales ~ TV, data = Advertising, col = 2, xlab = "TV", ylab = "Sales")
abline(lm(sales ~ TV, data = Advertising), lwd = 3, col = "darkblue")
plot(sales ~ radio, data = Advertising, col = 2, xlab = "Radio", ylab = "Sales")
abline(lm(sales ~ radio, data = Advertising), lwd = 3, col = "darkblue")
plot(sales ~ newspaper, data = Advertising, col = 2, xlab = "Newspaper", ylab = "Sales")
abline(lm(sales ~ newspaper, data = Advertising), lwd = 3, col = "darkblue")
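The three panels draw their least squares lines without printing them. The helper below is a minimal sketch for comparing those lines numerically. It assumes a data frame with columns named sales, TV, radio and newspaper, as in the Advertising data loaded above. The name simple_fits is hypothetical and not part of any package.

```r
# Hypothetical helper: for each predictor, fit the simple least squares
# regression response ~ predictor and return its intercept and slope.
simple_fits <- function(df, response = "sales",
                        predictors = c("TV", "radio", "newspaper")) {
  est <- vapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = df)
    unname(coef(fit))                  # c(intercept, slope)
  }, numeric(2))
  out <- t(est)                        # one row per predictor
  colnames(out) <- c("intercept", "slope")
  out
}
```

After the data are loaded, simple_fits(Advertising) returns a 3 x 2 matrix with one row per medium.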
In this example, sales is the response (target) that we wish to predict, and TV is a feature (input, predictor) that we can denote \(X_1\).
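Writing \(X_2\) for radio and \(X_3\) for newspaper (a labeling assumed here by analogy with \(X_1\)), the three inputs form one input vector. The standard formulation of the supervised setting then models the response as an unknown function of that vector plus a random error. \(f\) is fixed but unknown, and \(\varepsilon\) is an error term independent of \(X\) with mean zero:

\[
X = (X_1, X_2, X_3) = (\text{TV}, \text{radio}, \text{newspaper}), \qquad
Y = \text{sales} = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0
\]

Each least squares line in the plots is a crude estimate of \(f\) that uses only one of the three inputs at a time.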