With a high-variance function, the hypothesis may fit the training set very well and drive J(ϴ) close to its minimum,
but it is not a good regression function because it varies too much and fails to generalize to new examples.
This is called overfitting (i.e. high variance).
Recap:
- Underfit: a high-bias function, too simple for the data (e.g. a straight line).
- Just right: a function that fits the trend of the data well.
- Overfit: a high-variance function, too complex for the data.
- Using the sigmoid of a too-simple function g(h(x)) might produce an underfit.
- Using a well-fitted function gives a good fit.
- Using a complex, high-variance function causes overfitting (see the sketch below).
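As an illustration (my own sketch, not from the lecture), fitting polynomials of different degrees to the same small made-up dataset shows the three regimes; the dataset and the chosen degrees are assumptions:

import numpy as np

# Made-up, roughly quadratic training data with a little noise
# (the data itself is an assumption, just for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

# Fit hypotheses of increasing complexity:
# degree 1 -> underfit (high bias), degree 2 -> just right,
# degree 6 -> overfit (high variance).
for degree in (1, 2, 6):
    theta = np.polyfit(x, y, degree)                  # least-squares fit
    train_mse = np.mean((np.polyval(theta, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.6f}")

# The degree-6 fit drives the training error close to zero, but its curve
# oscillates between the points -- low training error does not mean a good
# regression function.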
How to address overfitting:
1. Reduce the number of features. (This helps when we have too many features and not much training data, but we might throw meaningful features away.)
- Manually select which features to keep.
- Model selection algorithm.
2. Regularization.
- Keep all the features, but reduce magnitude/values of parameters ϴ.
- Works well when we have a lot of features, each of which contributes a bit to predicting y.
--------
Regularization, cost function:
Small values for the parameters ϴ0, ϴ1, ..., ϴn:
- give a "simpler" hypothesis
- make it less prone to overfitting
λ: the regularization parameter. It controls the trade-off between two goals (see the cost function below):
- fitting the training data well
- keeping the parameters ϴ small.
[My understanding: ϴ acts like the slope/gradient, so smaller ϴ means y changes less dramatically as x changes.]
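For reference, the regularized cost function for linear regression (standard form; the first sum is the "fit the training data well" term, the λ-weighted sum is the "keep the ϴ small" penalty, and by convention ϴ0 is not penalized):

J(\theta) = \frac{1}{2m}\left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]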
If we choose λ too large, the algorithm results in underfitting
(it fails to fit even the training set).
Reason: when λ is too large, the penalty term dominates the cost, so minimizing J(ϴ) pushes all the parameters ϴ1, ..., ϴn toward 0 and makes the function too flat.
i.e. the hypothesis reduces to roughly h(x) ≈ ϴ0, a horizontal line, which underfits the data.
-----------------
Regularized Linear Regression:
Gradient descent:
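The standard regularized update (repeated until convergence; ϴ0 is kept out of the penalty):

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}

\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \quad j = 1, \dots, n

The factor (1 - αλ/m) is slightly less than 1, so every iteration shrinks ϴj a little before applying the usual gradient step.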
Normal Equation:
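The closed-form solution with regularization (standard form), where L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0 so that ϴ0 is not regularized:

\theta = \left( X^T X + \lambda L \right)^{-1} X^T y

A minimal numpy sketch of this formula (assumed shapes: X is m×(n+1) with a leading column of ones, y has length m; the function name is my own, not course code):

import numpy as np

def regularized_normal_equation(X, y, lam):
    # L: identity matrix with a 0 in the top-left so theta_0 is not penalized
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    # Solve (X^T X + lambda * L) theta = X^T y rather than forming the inverse
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)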
Non-invertibility:
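If m ≤ n (no more examples than features), X^T X may be singular / non-invertible. With regularization, as long as λ > 0, the matrix X^T X + λL above is invertible, so regularization also takes care of the non-invertibility problem.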
----------------
Regularized Logistic Regression:
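The cost function is the logistic regression cost plus the same λ penalty on ϴ1, ..., ϴn (standard form; h_ϴ(x) = g(ϴᵀx) is the sigmoid):

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2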
Gradient descent:
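The update rule looks identical to the regularized linear regression update above; the only difference is that the hypothesis is now the sigmoid, h_ϴ(x) = 1 / (1 + e^{-ϴᵀx}).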
Advanced optimization:
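The lecture uses Octave's fminunc with a costFunction that returns the cost jVal and the gradient. A rough Python analogue (my own sketch under assumed data shapes, not the course code) using scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    # Regularized logistic regression cost and gradient.
    # X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    m = y.size
    h = sigmoid(X @ theta)
    cost = (-1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) \
           + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = (1.0 / m) * (X.T @ (h - y))
    grad[1:] += (lam / m) * theta[1:]
    return cost, grad

# Usage (X, y, and the lambda value are placeholders for your own data):
# result = minimize(cost_and_grad, np.zeros(X.shape[1]),
#                   args=(X, y, 1.0), jac=True, method='BFGS')
# theta_opt = result.x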