1. Regularization of Linear Regression
Because we don't know which of the parameters \(\theta_j\) are responsible for overfitting, we shrink all of them by penalizing large values in the cost function.
$$ J(\theta )=\frac{1}{2m}\left[\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})^2+\lambda \sum _{j=1}^n\theta _j^2\right] $$
\(\lambda\) is called the regularization parameter, and it controls a trade-off between two different goals. The first goal is to fit the training set well; the second is to keep the parameters small. If \(\lambda\) is chosen too large, it may cause underfitting; if it is chosen too small, the overfitting problem remains.
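As a concrete illustration, here is a minimal NumPy sketch of this regularized cost. The function name `regularized_cost` and the assumption that `X` already contains a leading column of ones (the \(x_0\) feature) are my own, not from the original notes.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) (illustrative sketch).

    X: (m, n+1) design matrix whose first column is all ones (x_0)
    theta: (n+1,) parameter vector, y: (m,) targets, lam: lambda.
    """
    m = len(y)
    errors = X @ theta - y                              # h_theta(x^(i)) - y^(i)
    fit_term = (errors @ errors) / (2 * m)              # squared-error term
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)   # penalize theta_1..theta_n only
    return fit_term + reg_term
```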
To address the overfitting problem with regularization, we also need to change the parameter update in gradient descent.
$$ \theta _j:=\theta _j-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_j^{(i)} $$
$$ j=0,\dots ,n $$
But the regularized cost function applies the penalty \(\lambda \theta_j^2\) only for \(j \ge 1\) (the bias \(\theta_0\) is not penalized), so the gradient descent update has to be split in the same way.
$$ \theta _0:=\theta _0-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_0^{(i)} $$
$$ \theta _j:=\theta _j-\alpha \left[\left(\frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_j^{(i)}\right)+\frac{\lambda }{m}\theta _j\right] $$
$$ j=1,\dots ,n $$
$$ \theta _j:=\theta _j\left(1-\alpha \frac{\lambda }{m}\right)-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_j^{(i)} $$
Because \((1-\alpha \frac{\lambda}{m})\) is always slightly less than 1, every update shrinks \(\theta_j\) by a small amount before applying the usual gradient step.
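A minimal sketch of one such update step in NumPy, assuming the same design-matrix convention as above (the name `gradient_descent_step` is illustrative):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update (illustrative sketch).

    theta_0 is updated without the regularization term; every other theta_j
    is first shrunk by (1 - alpha * lambda / m) and then moved along the
    usual gradient direction.
    """
    m = len(y)
    errors = X @ theta - y                     # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m                  # unregularized gradient for all j
    new_theta = theta * (1 - alpha * lam / m) - alpha * grad
    new_theta[0] = theta[0] - alpha * grad[0]  # do not shrink the bias term theta_0
    return new_theta
```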
With the normal equation, the formula is the same as the original one, except that we add another term inside the parentheses.
$$ \theta =(X^TX+\lambda L)^{-1}X^Ty $$
$$ L=\begin{bmatrix}0&&&&\\&1&&&\\&&1&&\\&&&\ddots &\\&&&&1\end{bmatrix} $$
With this equation we handle both the overfitting problem and the non-invertibility problem: for \(\lambda > 0\), the matrix \(X^TX+\lambda L\) is always invertible even when \(X^TX\) itself is not.
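A short sketch of this closed-form solution, again assuming NumPy and a design matrix with a leading column of ones (the helper name `normal_equation_regularized` is hypothetical):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """Closed-form solution theta = (X^T X + lambda * L)^(-1) X^T y (sketch).

    L is the (n+1) x (n+1) identity matrix with its top-left entry zeroed
    out, so that the bias parameter theta_0 is not regularized.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0                          # do not regularize theta_0
    # For lambda > 0, X^T X + lambda * L is invertible even when X^T X is
    # singular (e.g. m <= n or redundant features).
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```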
2. Regularization of Logistic Regression
We can regularize logistic regression in a similar way to linear regression, by adding a regularization term to the end of the cost function:
$$ J(\theta )=-\frac{1}{m}\left[\sum _{i=1}^m y^{(i)}\log h_{\theta }(x^{(i)})+(1-y^{(i)})\log (1-h_{\theta }(x^{(i)}))\right]+\frac{\lambda }{2m}\sum _{j=1}^n\theta _j^2 $$
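A minimal sketch of this regularized cross-entropy cost in NumPy (the helper names `sigmoid` and `logistic_cost_regularized` are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_regularized(theta, X, y, lam):
    """Regularized logistic-regression cost (illustrative sketch).

    Cross-entropy averaged over the m examples plus lambda/(2m) times the
    sum of squared parameters, excluding the bias theta_0.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return cross_entropy + reg_term
```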
The gradient descent updates for regularized logistic regression look the same as those for linear regression; the only difference is the hypothesis \(h_\theta(x)\), which is now the sigmoid function.
$$ \theta _0:=\theta _0-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_0^{(i)} $$
$$ \theta _j:=\theta _j-\alpha \left[\left(\frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_j^{(i)}\right)+\frac{\lambda }{m}\theta _j\right] $$
$$ j=1,\dots ,n $$
$$ \theta _j:=\theta _j\left(1-\alpha \frac{\lambda }{m}\right)-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x^{(i)})-y^{(i)})\,x_j^{(i)} $$
$$ h_{\theta }(x)=\frac{1}{1+e^{-\theta ^Tx}} $$
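Putting the sigmoid hypothesis together with the regularized update, a minimal training-loop sketch might look like the following; the function name and the default values for alpha, lambda, and the iteration count are arbitrary placeholders, not values from the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regularized(X, y, alpha=0.1, lam=1.0, iters=1000):
    """Regularized logistic regression via batch gradient descent (sketch).

    The update has the same form as in linear regression; the only change is
    that h_theta(x) is the sigmoid of theta^T x.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        errors = sigmoid(X @ theta) - y        # h_theta(x^(i)) - y^(i)
        grad = (X.T @ errors) / m
        grad[1:] += (lam / m) * theta[1:]      # regularize every theta_j except theta_0
        theta -= alpha * grad
    return theta
```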