1. What is a Hypothesis Function?
In Supervised Learning, we use a 'Regression Algorithm' when we face a problem such as predicting a continuous output. In linear regression, using the known data \(x, y\), we can predict \(y(n)\) for a new input \(x(n)\) once we have a function relating \(x\) and \(y\). Below is that function when we have one variable.
$$ H_{\theta}(x)=Y=\theta _0 + \theta _1 X $$
- \(m\) : number of records
- \(x\) : input data
- \(y\) : output data
- \(\theta\) : parameter
So, our task is to learn a good predictor \(h : X \to Y\) that maps inputs to output values.
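As a concrete illustration, here is a minimal Python sketch of the single-variable hypothesis \(h_{\theta}(x) = \theta_0 + \theta_1 x\). The parameter values used below are illustrative assumptions, not values from the text.

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    """Single-variable hypothesis: H_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameters (assumed, not from the text): theta0 = 1.0, theta1 = 2.0.
x = np.array([0.0, 1.0, 2.0, 3.0])
print(hypothesis(x, 1.0, 2.0))  # [1. 3. 5. 7.]
```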
2. Cost Function, Squared Error Function
To measure the accuracy of our hypothesis function, we use a cost function. It takes the average of the squared differences between the hypothesis's result for each input \(x\) and the actual output \(y\).
$$J(\theta _0,\theta _1)=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x_i)-y_i)^2=\frac{1}{2m}\sum _{i=1}^m((\theta _0+\theta _1x_i)-y_i)^2 $$
The goal is to choose the parameters that minimize the function \(J\).
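Below is a minimal sketch of the squared error cost. The toy dataset, which lies exactly on the line \(y = 1 + 2x\), is an illustrative assumption, not data from the text.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost: J = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(x)
    errors = (theta0 + theta1 * x) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data on the line y = 1 + 2x, so the cost at (theta0, theta1) = (1, 2) is zero.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(cost(1.0, 2.0, x, y))  # 0.0  -- perfect fit
print(cost(0.0, 0.0, x, y))  # 10.5 -- worse fit
```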
3. How to Find Accurate Parameters
$$ H_{\theta}(x)=Y=\theta _0 + \theta _1 X $$
$$J(\theta _0,\theta _1)=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x_i)-y_i)^2=\frac{1}{2m}\sum _{i=1}^m((\theta _0+\theta _1x_i)-y_i)^2 $$
Of the two functions above, \(H(x)\) is a function of \(x\). In contrast, the cost function \(J(\theta)\) is a function of the parameters \(\theta\). The optimization objective of our algorithm is to choose the values of \(\theta\) that minimize \(J\). \(J\) measures the gap between the hypothesis function \(H\) and the real data \(Y\).
If we find 'good' parameters, \(J\) will be close to zero (exactly zero only when the data lie perfectly on a line), and the hypothesis will produce more accurate outputs.
4. How to Find Accurate Parameters: Gradient Descent
Gradient Descent is a way of finding the parameters by repeatedly applying the partial derivative terms, updating all parameters simultaneously. By using gradient descent, we move toward the optimum step by step and obtain \(\theta\).
$$ \theta _j\ :=\theta _j-\alpha \frac{\partial }{\partial \theta _j}J(\theta _0,\theta _1) $$
- Operator (:=) : assignment. In a computer, take the value computed on the right-hand side and use it to overwrite whatever value is currently stored in \(\theta_j\).
- \(\alpha\) : the learning rate. It controls how big a step we take downhill on each descent step. If \(\alpha\) is very large, gradient descent takes very aggressive steps and can overshoot the minimum; if it is very small, convergence is slow.
Even though the learning rate stays fixed, each update becomes smaller as the derivative term becomes smaller. As we approach a local minimum, gradient descent automatically takes smaller steps, so we don't need to shrink the step size ourselves.
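To see this shrinking-step behavior, here is a tiny sketch of the generic update \(\theta := \theta - \alpha \frac{d}{d\theta}J(\theta)\) on a one-parameter example. The toy cost \(J(\theta) = (\theta - 3)^2\), the starting point, and \(\alpha = 0.1\) are illustrative assumptions, not values from the text.

```python
def dJ(theta):
    """Derivative of the toy cost J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # starting point (assumed)
alpha = 0.1   # fixed learning rate (assumed)
for step in range(5):
    grad = dJ(theta)
    theta = theta - alpha * grad
    print(f"step {step}: gradient = {grad:.3f}, theta = {theta:.3f}")

# The gradient (and hence the step taken) shrinks every iteration even though
# alpha never changes, because the derivative approaches zero near the minimum.
```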
5. Summary of Linear Regression
The objective of the Linear Regression model is to find the \(\theta_0, \theta_1\) that minimize \(J(\theta_0, \theta_1)\). We can connect the gradient descent algorithm with the hypothesis \(H\).
$$ H_{\theta }(x)=Y=\theta _0 + \theta _1 X $$
$$ J(\theta _0,\theta _1)=\frac{1}{2m}\sum _{i=1}^m(h_{\theta }(x_i)-y_i)^2=\frac{1}{2m}\sum _{i=1}^m((\theta _0+\theta _1x_i)-y_i)^2 $$
After connecting the two, that is, substituting \(H\) into the derivative term, we get the update rules below.
$$ \theta _0\ :=\theta _0-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x_i)-y_i) $$
$$ \theta _1\ :=\theta _1-\alpha \frac{1}{m}\sum _{i=1}^m(h_{\theta }(x_i)-y_i)\times x_i $$
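Putting these update rules together gives batch gradient descent for single-variable linear regression. The following is a minimal sketch; the toy data (generated from \(y = 1 + 2x\)), the learning rate, and the iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

def gradient_descent(x, y, alpha, num_iters):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y                  # h_theta(x_i) - y_i
        new_theta0 = theta0 - alpha * np.sum(errors) / m
        new_theta1 = theta1 - alpha * np.sum(errors * x) / m
        theta0, theta1 = new_theta0, new_theta1             # simultaneous update
    return theta0, theta1

# Toy data from the line y = 1 + 2x; the learned parameters should be close to (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
print(gradient_descent(x, y, alpha=0.05, num_iters=2000))  # roughly (1.0, 2.0)
```

Both partial derivatives are computed from the old values of \(\theta_0\) and \(\theta_1\) before either is overwritten, so the update is simultaneous, matching the definition in section 4.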
The last page's question was: 'if the steps stop at a local minimum, what if that position is not the global minimum?' It doesn't matter here, because the cost function \(J\) is a 'convex function', a bowl-shaped function that always has exactly one global minimum.