[R] Introduction to Statistical Learning

1. Definitions of Statistical Learning

  • Statistical Learning is a set of tools for modeling and understanding complex datasets.
  • Supervised Statistical Learning builds a statistical model for predicting or estimating an output based on one or more inputs.
  • Unsupervised Statistical Learning learns relationships and structure from data that has inputs but no supervising output.
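The contrast between the two settings can be sketched in R. This is a minimal illustration on simulated data, not an example from the original post: the variable names (`x1`, `x2`, `y`), the choice of `lm()` for the supervised case, and the choice of `kmeans()` for the unsupervised case are all assumptions made for the sketch.

```r
set.seed(42)

# Simulated inputs: two groups of 50 points each in two dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
colnames(x) <- c("x1", "x2")

# Simulated output that depends on the inputs (used only in the supervised case)
y <- 1 + 2 * x[, "x1"] - x[, "x2"] + rnorm(nrow(x), sd = 0.5)

# Supervised: the output y guides the fit, and the model predicts y from x1 and x2
sup_fit <- lm(y ~ x1 + x2, data = data.frame(x, y = y))
coef(sup_fit)

# Unsupervised: no output is available, so we look for structure in the inputs alone
unsup_fit <- kmeans(x, centers = 2, nstart = 10)
table(unsup_fit$cluster)
```

In the supervised fit the quality of the model can be judged against the known `y`. In the clustering there is no such reference, which is what "no supervising output" means in practice.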

 

2. Supervised Learning Problem

  • Outcome measurement \(Y\) (dependent variable, response, target)
  • Vector of \(p\) predictor measurements \(X\) (independent variables, inputs, regressors, features)
  • Regression problem : \(Y\) is quantitative
  • Classification problem : \(Y\) takes values in a finite, unordered set.
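The two problem types differ only in the kind of response. A minimal sketch using R's built-in `mtcars` data (not a dataset from the original post) is given below. Linear regression and logistic regression are used here as one possible choice of method for each case, and the 0.5 threshold is an assumption of the sketch.

```r
# Regression: the response mpg (miles per gallon) is quantitative
reg_fit <- lm(mpg ~ wt, data = mtcars)
coef(reg_fit)

# Classification: the response am takes one of two unordered labels
# (in mtcars, am is coded 0 = automatic, 1 = manual)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
clf_fit <- glm(am ~ wt, data = cars, family = binomial)

# glm() models the probability of the second factor level ("manual")
prob_manual <- predict(clf_fit, type = "response")
pred_class <- ifelse(prob_manual > 0.5, "manual", "automatic")

# Cross-tabulate predicted against actual classes
table(predicted = pred_class, actual = cars$am)
```

The regression model returns a number on the scale of `mpg`, whereas the classifier returns a label from the finite set of categories.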

 

3. Statistical Learning vs Machine Learning

  • Machine Learning places greater emphasis on large-scale applications and prediction accuracy.
  • Statistical Learning emphasizes models, interpretability, precision and uncertainty.

 

4. [Ex] Advertising Data

# Load the Advertising dataset from the book website
url.ad <- "https://www.statlearning.com/s/Advertising.csv"
Advertising <- read.csv(url.ad, header = TRUE)
attach(Advertising)  # make the columns (TV, radio, newspaper, sales) available by name

# Least squares fit of a simple linear regression for each predictor
par(mfrow = c(1, 3))  # three panels side by side
plot(sales~TV, col=2, xlab="TV", ylab="Sales")
abline(lm(sales~TV)$coef, lwd=3, col="darkblue")

plot(sales~radio, col=2, xlab="Radio", ylab="Sales")
abline(lm(sales~radio)$coef, lwd=3, col="darkblue")

plot(sales~newspaper, col=2, xlab="Newspaper", ylab="Sales")
abline(lm(sales~newspaper)$coef, lwd=3, col="darkblue")

 

Sales is the response, or target, that we wish to predict. TV is a feature (also called an input or predictor), which we denote \(X_1\).
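To make the roles of response and predictor concrete, here is a minimal sketch that fits the same kind of least squares line and uses it for prediction. It runs on simulated stand-in data so that it works offline: the data frame `ads`, its values, and the TV budget of 100 are made up for illustration and are not taken from the real Advertising dataset.

```r
# Simulated stand-in for the Advertising data (values are made up)
set.seed(1)
n <- 200
ads <- data.frame(TV = runif(n, 0, 300))
ads$sales <- 7 + 0.05 * ads$TV + rnorm(n, sd = 3)

# Response Y = sales, predictor X1 = TV
fit <- lm(sales ~ TV, data = ads)
coef(fit)  # intercept and slope of the least squares line

# Predicted sales for a hypothetical TV budget of 100
predict(fit, newdata = data.frame(TV = 100))
```

Passing `data = ads` to `lm()` avoids the need for `attach()`, which can be convenient when several data frames share column names.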
