
[Theorem] Regularization

1. Regularization of Linear Regression

Because we don't know in advance which parameters θ contribute to overfitting, we shrink all of them by adding a penalty term to the cost function.

 

J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]

 

λ is called the regularization parameter, and it controls a trade-off between two different goals. The first goal is to fit the training set well. The second goal is to keep the parameters small. If λ is chosen too large, it may cause underfitting; if it is chosen too small, the penalty has little effect and overfitting can remain.
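As a small sketch of the cost function above (a NumPy illustration, not code from the original post — the function name and the assumption that the first column of X is all ones are mine):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).

    Assumes X is an (m, n+1) design matrix whose first column is
    all ones; theta[0] (the intercept) is not regularized, so the
    penalty sums theta_j^2 only for j >= 1.
    """
    m = len(y)
    errors = X @ theta - y
    fit_term = (errors @ errors) / (2 * m)            # squared-error part
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)  # penalty part
    return fit_term + reg_term
```

With λ = 0 this reduces to the ordinary squared-error cost; increasing λ raises the cost of large parameters, which is exactly the trade-off described above.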

 

To address the overfitting problem, we also need to change the parameter update rule in gradient descent.

 

\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

j = 0, \dots, n

 

But the regularized cost function adds the λθ_j² penalty only for j ≥ 1, so the gradient must change to match: θ_0 keeps its original update, while every other θ_j gains an extra (λ/m)θ_j term.

 

\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]

j = 1, \dots, n

\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

Because (1 - α·λ/m) is less than 1 (for α, λ > 0), every update first shrinks θ_j by a small factor before applying the usual gradient step.

 

In the normal equation, the formula is the same as the original, except that we add another term inside the parentheses.

 

\theta = \left(X^T X + \lambda L\right)^{-1} X^T y

L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}

Here L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, so that θ_0 is not regularized.

 

This equation addresses both the overfitting problem and the non-invertibility problem: for λ > 0, the matrix XᵀX + λL is always invertible, even when XᵀX itself is not.
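The closed-form solution can be sketched directly (a NumPy illustration with my own function name; it assumes, as above, that the first column of X is the intercept column):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized solution theta = (X^T X + lam*L)^{-1} X^T y.

    L is the identity matrix with its (0, 0) entry zeroed out, so the
    intercept column (assumed to be X[:, 0]) is not regularized.
    """
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

With λ = 0 this is the ordinary normal equation; with λ > 0 the added λL term keeps the system solvable even for rank-deficient XᵀX.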

 

2. Regularization of Logistic Regression

We can regularize logistic regression in a similar way to linear regression, by adding a term to the end of the cost function:

 

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

 

The gradient descent updates for logistic regression look the same as those for linear regression; only the hypothesis h_θ differs.

 

\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]

j = 1, \dots, n

\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

 

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
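The logistic update above can be sketched the same way as the linear one (a NumPy illustration with my own names; only the hypothesis changes to the sigmoid):

```python
import numpy as np

def sigmoid(z):
    """h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for logistic regression.

    Identical in form to the linear-regression update; only the
    prediction h_theta is now the sigmoid of X @ theta.
    """
    m = len(y)
    errors = sigmoid(X @ theta) - y   # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m         # unregularized gradient
    reg = (lam / m) * theta
    reg[0] = 0.0                      # theta_0 is not regularized
    return theta - alpha * (grad + reg)
```

This makes the point of the section concrete: the update rules are shared between the two models, and swapping `X @ theta` for `sigmoid(X @ theta)` is the only difference.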
