
[Theorem] Validation Sets

1. Deciding What to Try Next

In machine learning, a model can still produce large errors even when the algorithm is implemented correctly. If so, what should we try next? There are several options:

 

  1. Get more training examples 
  2. Try smaller sets of features 
  3. Try getting additional features 
  4. Try adding polynomial features
  5. Try decreasing lambda 
  6. Try increasing lambda 

How do we choose among the options above? We need to evaluate the learning algorithm by running machine learning diagnostics.

 

2. Evaluating a Hypothesis

2.1 Linear Regression 

When linear regression overfits, it is hard to tell which features to fix or remove once there are many of them. So we divide the examples, roughly 70% versus 30%, into a training set and a test set, and evaluate the hypothesis with the two sets as follows:

 

For example, when the model overfits, the training error \(J(\theta)\) is low while the test error \(J_{test}(\theta)\) is high.

 

  • Training set (70%) : \(\begin{bmatrix}(x^{(1)},y^{(1)})\\ \vdots \\(x^{(m)},y^{(m)})\end{bmatrix}\)
  • Test set (30%) : \(\begin{bmatrix}(x_{test}^{(1)},y_{test}^{(1)})\\ \vdots \\(x_{test}^{(m_{test})},y_{test}^{(m_{test})})\end{bmatrix}\)

 

$$ J(\theta )=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})^2 $$

$$ J_{test}(\theta )=\frac{1}{2m_{test}}\sum _{i=1}^{m_{test}}(h_{\theta }(x_{test}^{(i)})-y_{test}^{(i)})^2 $$
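To make this concrete, here is a minimal NumPy sketch of the 70%/30% procedure. The synthetic data, the split code, and the helper name `j_mse` are ours, purely for illustration: fit \(\theta\) on the training split only, then compare \(J(\theta)\) and \(J_{test}(\theta)\).

```python
import numpy as np

def j_mse(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * m)

# Synthetic data for illustration: y is roughly linear in x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 1, size=100)

# Shuffle, then split 70% / 30% into training and test indices.
idx = rng.permutation(len(x))
split = int(0.7 * len(x))
train, test = idx[:split], idx[split:]

# Design matrix with a bias column; fit theta on the training set only.
X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

print("J_train =", j_mse(theta, X[train], y[train]))
print("J_test  =", j_mse(theta, X[test], y[test]))
```

With an overfit model, the first number would be small and the second large, which is exactly the symptom described above.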

 

2.2 Logistic Regression 

In logistic regression, there are two things to measure. One is the overfitting problem, and the other is the misclassification error.

 

Overfitting is handled the same way as in linear regression, by dividing the examples into a training set and a test set. The misclassification error is obtained as the average 0/1 error over the test set.

 

$$ J(\theta )=-\frac{1}{m}\sum _{i=1}^m\left[y^{(i)}\log h_{\theta }(x^{(i)})+(1-y^{(i)})\log \left(1-h_{\theta }(x^{(i)})\right)\right] $$

$$ J_{test}(\theta )=-\frac{1}{m_{test}}\sum _{i=1}^{m_{test}}\left[y_{test}^{(i)}\log h_{\theta }(x_{test}^{(i)})+(1-y_{test}^{(i)})\log \left(1-h_{\theta }(x_{test}^{(i)})\right)\right] $$

$$ \text{Test error}=\frac{1}{m_{test}}\sum _{i=1}^{m_{test}}err\left(h_{\theta }(x_{test}^{(i)}),\ y_{test}^{(i)}\right) $$

where \(err(h_{\theta }(x),y)=1\) if the thresholded prediction disagrees with the label (i.e. \(h_{\theta }(x)\geq 0.5\) and \(y=0\), or \(h_{\theta }(x)<0.5\) and \(y=1\)), and \(0\) otherwise.
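A minimal sketch of this 0/1 error in NumPy; the sigmoid outputs and labels below are hypothetical values we made up for illustration:

```python
import numpy as np

def misclassification_error(h, y):
    """Average 0/1 error: err = 1 when the 0.5-thresholded
    prediction disagrees with the true label, else 0."""
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y)

# Hypothetical sigmoid outputs h_theta(x_test) and true test labels.
h_test = np.array([0.9, 0.3, 0.6, 0.2, 0.8])
y_test = np.array([1, 0, 0, 1, 1])

print(misclassification_error(h_test, y_test))  # 2 wrong out of 5 -> 0.4
```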

 

3. Model Selection

In polynomial regression we must also choose the degree \(d\); if we pick \(d\) using the test set, the test error is no longer a fair estimate of how well the parameters generalize to new examples. So we divide the examples into three datasets: training set (60%) + validation set (20%) + test set (20%).

 

  • Training set (60%) : \(\begin{bmatrix}(x^{(1)},y^{(1)})\\ \vdots \\(x^{(m)},y^{(m)})\end{bmatrix}\)
  • Validation set (20%) : \(\begin{bmatrix}(x_{val}^{(1)},y_{val}^{(1)})\\ \vdots \\(x_{val}^{(m_{val})},y_{val}^{(m_{val})})\end{bmatrix}\)
  • Test set (20%) : \(\begin{bmatrix}(x_{test}^{(1)},y_{test}^{(1)})\\ \vdots \\(x_{test}^{(m_{test})},y_{test}^{(m_{test})})\end{bmatrix}\)

 

  • Train set : \(J(\theta )=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})^2\)
  • CV set : \(J_{val}(\theta )=\frac{1}{2m_{val}}\sum _{i=1}^{m_{val}}(h_{\theta }(x_{val}^{(i)})-y_{val}^{(i)})^2\)
  • Test set : \(J_{test}(\theta )=\frac{1}{2m_{test}}\sum _{i=1}^{m_{test}}(h_{\theta }(x_{test}^{(i)})-y_{test}^{(i)})^2\)

We can then choose a model with the following procedure (a sketch follows the list):

  1. Optimize the parameters \(\theta\) using the training set. 
  2. Pick the polynomial degree \(d\) with the lowest validation error \(J_{val}(\theta)\). 
  3. Estimate the generalization error \(J_{test}(\theta)\) using the test set. 
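Here is a minimal NumPy sketch of the three steps. The synthetic cubic data, the 60/20/20 split code, and the helper names `poly_design` and `j_mse` are ours for illustration, not from the original course material:

```python
import numpy as np

def poly_design(x, d):
    """Design matrix [1, x, x^2, ..., x^d] for polynomial degree d."""
    return np.column_stack([x ** k for k in range(d + 1)])

def j_mse(theta, X, y):
    """Squared-error cost 1/(2m) * sum((X @ theta - y)^2)."""
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Synthetic data for illustration; the true curve is cubic.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
y = x ** 3 - x + rng.normal(0, 0.3, size=200)

# 60% / 20% / 20% split into training, validation, and test indices.
idx = rng.permutation(len(x))
a, b = int(0.6 * len(x)), int(0.8 * len(x))
tr, va, te = idx[:a], idx[a:b], idx[b:]

# Steps 1-2: fit theta on the training set for each candidate degree d,
# then keep the degree with the lowest validation error.
best_d, best_theta, best_jval = None, None, np.inf
for d in range(1, 11):
    X = poly_design(x, d)
    theta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    jv = j_mse(theta, X[va], y[va])
    if jv < best_jval:
        best_d, best_theta, best_jval = d, theta, jv

# Step 3: estimate the generalization error on the untouched test set.
X_best = poly_design(x, best_d)
print("chosen d =", best_d, " J_test =", j_mse(best_theta, X_best[te], y[te]))
```

Because \(d\) is chosen on the validation set and the test set is only touched once at the end, the reported test error remains a fair estimate of generalization.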

 

Difference between validation sets and test sets: the validation set is used to compare the performance of different models and to select one of them, while the test set is used to measure the performance characteristics of the chosen model, such as its accuracy. In that sense, the validation set takes part in the model-building process, but the test set must not.
