With a high-variance function, the hypothesis may fit the training set very well and drive J(ϴ) close to its minimum,
but it is not a good regression function because it varies too much and fails to generalize to new examples.
This is called overfitting (i.e. high variance).
Recap:
- Underfit: a high-bias function, too simple for the data (e.g. a straight line).
- Just right: a function that fits the trend of the data well.
- Overfit: a high-variance function, too complex for the data.
- Using the sigmoid of a too-simple function g(h(x)) might produce an underfit.
- Using a well-fitted function gives a good fit.
- Using a complex, high-variance function causes overfitting (see the sketch below).
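As an illustration (my own sketch, not from the lecture), fitting polynomials of different degrees to the same small made-up dataset shows the three regimes; the dataset and the chosen degrees are assumptions:

import numpy as np

# Made-up, roughly quadratic training data with a little noise
# (the data itself is an assumption, just for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

# Fit hypotheses of increasing complexity:
# degree 1 -> underfit (high bias), degree 2 -> just right,
# degree 6 -> overfit (high variance).
for degree in (1, 2, 6):
    theta = np.polyfit(x, y, degree)                  # least-squares fit
    train_mse = np.mean((np.polyval(theta, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.6f}")

# The degree-6 fit drives the training error close to zero, but its curve
# oscillates between the points -- low training error does not mean a good
# regression function.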
How to address overfitting:
1. Reduce the number of features. (This helps when we have too many features and not much training data, but we might throw meaningful features away.)
- Manually select which features to keep.
- Model selection algorithm.
2. Regularization.
- Keep all the features, but reduce magnitude/values of parameters ϴ.
- Works well when we have a lot of features, each of which contributes a bit to predicting y.
--------
Regularization, cost function:
Small values for the parameters ϴ0, ϴ1, ..., ϴn:
- give a "simpler" hypothesis
- make it less prone to overfitting
λ: the regularization parameter. It controls the trade-off between two goals (see the cost function below):
- fitting the training data well
- keeping the parameters ϴ small.
[My understanding: ϴ acts like the slope/gradient, so smaller ϴ means y changes less dramatically as x changes.]
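For reference, the regularized cost function for linear regression (standard form; the first sum is the "fit the training data well" term, the λ-weighted sum is the "keep the ϴ small" penalty, and by convention ϴ0 is not penalized):

J(\theta) = \frac{1}{2m}\left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]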
If we choose λ too large, the algorithm results in underfitting
(it fails to fit even the training set).
Reason: when λ is too large, the penalty term dominates the cost, so minimizing J(ϴ) pushes all the parameters ϴ1, ..., ϴn toward 0 and makes the function too flat.
i.e. the hypothesis reduces to roughly h(x) ≈ ϴ0, a horizontal line, which underfits the data.
-----------------
Regularized Linear Regression:
Gradient descent:
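The standard regularized update (repeated until convergence; ϴ0 is kept out of the penalty):

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}

\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \quad j = 1, \dots, n

The factor (1 - αλ/m) is slightly less than 1, so every iteration shrinks ϴj a little before applying the usual gradient step.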
Normal Equation:
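The closed-form solution with regularization (standard form), where L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0 so that ϴ0 is not regularized:

\theta = \left( X^T X + \lambda L \right)^{-1} X^T y

A minimal numpy sketch of this formula (assumed shapes: X is m×(n+1) with a leading column of ones, y has length m; the function name is my own, not course code):

import numpy as np

def regularized_normal_equation(X, y, lam):
    # L: identity matrix with a 0 in the top-left so theta_0 is not penalized
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    # Solve (X^T X + lambda * L) theta = X^T y rather than forming the inverse
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)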
Non-invertibility:
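If m ≤ n (no more examples than features), X^T X may be singular / non-invertible. With regularization, as long as λ > 0, the matrix X^T X + λL above is invertible, so regularization also takes care of the non-invertibility problem.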
----------------
Regularized Logistic Regression:
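The cost function is the logistic regression cost plus the same λ penalty on ϴ1, ..., ϴn (standard form; h_ϴ(x) = g(ϴᵀx) is the sigmoid):

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2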
Gradient descent:
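The update rule looks identical to the regularized linear regression update above; the only difference is that the hypothesis is now the sigmoid, h_ϴ(x) = 1 / (1 + e^{-ϴᵀx}).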
Advanced optimization:
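The lecture uses Octave's fminunc with a costFunction that returns the cost jVal and the gradient. A rough Python analogue (my own sketch under assumed data shapes, not the course code) using scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    # Regularized logistic regression cost and gradient.
    # X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    m = y.size
    h = sigmoid(X @ theta)
    cost = (-1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) \
           + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = (1.0 / m) * (X.T @ (h - y))
    grad[1:] += (lam / m) * theta[1:]
    return cost, grad

# Usage (X, y, and the lambda value are placeholders for your own data):
# result = minimize(cost_and_grad, np.zeros(X.shape[1]),
#                   args=(X, y, 1.0), jac=True, method='BFGS')
# theta_opt = result.x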