
[Theorem] Regularization

1. Regularization of Linear Regression

Because we don't know in advance which parameters θ contribute to overfitting, we shrink all of them by adding a penalty term to the cost function.

 

J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]

 

λ is called the regularization parameter, and it controls a trade-off between two different goals. The first goal is to fit the training set well. The second goal is to keep the parameters small. If λ is chosen too large, it may cause underfitting; if it is chosen too small, the penalty has little effect and overfitting can remain.
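As a small sketch of the cost function above (a NumPy illustration, not code from the original post — the function name and the assumption that the first column of X is all ones are mine):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).

    Assumes X is an (m, n+1) design matrix whose first column is
    all ones; theta[0] (the intercept) is not regularized, so the
    penalty sums theta_j^2 only for j >= 1.
    """
    m = len(y)
    errors = X @ theta - y
    fit_term = (errors @ errors) / (2 * m)            # squared-error part
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)  # penalty part
    return fit_term + reg_term
```

With λ = 0 this reduces to the ordinary squared-error cost; increasing λ raises the cost of large parameters, which is exactly the trade-off described above.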

 

To address the overfitting problem, we also need to change the parameter update rule in gradient descent.

 

\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

j = 0, \dots, n

 

But the regularized cost function adds the λθ_j² penalty only for j ≥ 1, so the gradient must change to match: θ_0 keeps its original update, while every other θ_j gains an extra (λ/m)θ_j term.

 

\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]

j = 1, \dots, n

\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

Because (1 - α·λ/m) is less than 1 (for α, λ > 0), every update first shrinks θ_j by a small factor before applying the usual gradient step.

 

In the normal equation, the formula is the same as the original, except that we add another term inside the parentheses.

 

\theta = \left(X^T X + \lambda L\right)^{-1} X^T y

L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}

Here L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, so that θ_0 is not regularized.

 

This equation addresses both the overfitting problem and the non-invertibility problem: for λ > 0, the matrix XᵀX + λL is always invertible, even when XᵀX itself is not.
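The closed-form solution can be sketched directly (a NumPy illustration with my own function name; it assumes, as above, that the first column of X is the intercept column):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized solution theta = (X^T X + lam*L)^{-1} X^T y.

    L is the identity matrix with its (0, 0) entry zeroed out, so the
    intercept column (assumed to be X[:, 0]) is not regularized.
    """
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

With λ = 0 this is the ordinary normal equation; with λ > 0 the added λL term keeps the system solvable even for rank-deficient XᵀX.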

 

2. Regularization of Logistic Regression

We can regularize logistic regression in a similar way to linear regression, by adding a term to the end of the cost function:

 

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

 

The gradient descent updates for logistic regression look the same as those for linear regression; only the hypothesis h_θ differs.

 

\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]

j = 1, \dots, n

\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

 

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
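The logistic update above can be sketched the same way as the linear one (a NumPy illustration with my own names; only the hypothesis changes to the sigmoid):

```python
import numpy as np

def sigmoid(z):
    """h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for logistic regression.

    Identical in form to the linear-regression update; only the
    prediction h_theta is now the sigmoid of X @ theta.
    """
    m = len(y)
    errors = sigmoid(X @ theta) - y   # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m         # unregularized gradient
    reg = (lam / m) * theta
    reg[0] = 0.0                      # theta_0 is not regularized
    return theta - alpha * (grad + reg)
```

This makes the point of the section concrete: the update rules are shared between the two models, and swapping `X @ theta` for `sigmoid(X @ theta)` is the only difference.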
