Debugging a learning algorithm:
1. Get more training examples. Really? Think.
2. Try smaller sets of features.
3. Try getting additional features.
4. Try adding polynomial features.
5. Try decreasing λ
6. Try increasing λ
Which should you choose? Please don't just pick one at random!
Think!
-----------------
How to evaluate algorithm:
When there are many features, plotting the hypothesis
isn't practical for checking whether the learning algorithm fits the data.
Idea:
Given a dataset, we can first separate it into 2 portions:
one training set and one test set, with a 70/30 split.
Randomly choose which examples go to the training set and which go to the test set.
Training/testing procedure for linear regression: learn ϴ on the training set,
then compute the squared test error Jtest(ϴ) = (1/2m_test) Σ (h(x) − y)².
Training/testing procedure for logistic regression: learn ϴ on the training set,
then compute Jtest(ϴ), or more simply the misclassification (0/1) error, on the test set.
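The random 70/30 split can be sketched like this (a minimal Python/NumPy sketch; the function name and the toy data are my own, not from the course):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.3, seed=0):
    """Randomly split a dataset into a 70/30 train/test portion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))        # shuffle, so the split is random
    n_test = int(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# toy data: y = 2x + noise
X = np.arange(100, dtype=float)
y = 2 * X + np.random.default_rng(1).normal(0, 1, 100)
X_train, y_train, X_test, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 70 30
```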
---
Model selection:
Which degree of polynomial should we choose,
so that the model neither underfits nor overfits?
d = degree of polynomial.
d=1, fit ϴ(1), compute test-set error Jtest(ϴ(1))
d=2, fit ϴ(2), compute test-set error Jtest(ϴ(2))
...
d=10, fit ϴ(10), compute test-set error Jtest(ϴ(10))
Choose the degree that yields the minimum test error.
Say, we choose d=5.
Problem:
Jtest(ϴ(5)) is likely to be an optimistic estimate of the generalization error,
i.e. our extra parameter (d, the degree of the polynomial) was itself fit to the test set.
To address the problem:
1. Instead of splitting the dataset into 2 portions, split it into
3 portions.
2. Training set (60%), cross-validation set (CV) (20%), test set (20%).
{Train/validation/test} error:
Instead of picking d with the test set, evaluate each candidate ϴ(d)
on the cross-validation set, and keep the test set for the final estimate.
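The selection procedure above can be sketched as follows (a minimal Python/NumPy sketch; the toy cubic data and the split sizes are my own assumptions, and `np.polyfit` stands in for an iterative fit of ϴ):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: a cubic target with noise
x = rng.uniform(-2, 2, 300)
y = x**3 - 2 * x + rng.normal(0, 0.3, 300)

# 60/20/20 split into train / cross-validation / test
idx = rng.permutation(300)
tr, cv, te = idx[:180], idx[180:240], idx[240:]

def mse(theta, xs, ys):
    """Mean squared error of a fitted polynomial on a given set."""
    return np.mean((np.polyval(theta, xs) - ys) ** 2)

# fit theta(d) on the training set for each degree d, score it on the CV set
cv_err = {d: mse(np.polyfit(x[tr], y[tr], d), x[cv], y[cv]) for d in range(1, 11)}
best_d = min(cv_err, key=cv_err.get)
theta = np.polyfit(x[tr], y[tr], best_d)
print(best_d, round(mse(theta, x[te], y[te]), 3))  # test error = the honest estimate
```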
-----------------
Bias vs. Variance:
Question: how can we tell whether the error comes from high bias
or high variance?
Easy...
When d=1 or another low degree of polynomial,
the hypothesis is too simple to capture the data,
so both Jtrain(ϴ) and Jcv(ϴ) are high: the error comes from high bias (underfitting).
When d is large, the hypothesis fits the training set almost perfectly
(Jtrain(ϴ) is low) but generalizes poorly (Jcv(ϴ) >> Jtrain(ϴ)):
the error comes from high variance (overfitting).
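As a rule of thumb, the diagnosis can be written down directly (the function and the threshold values here are hypothetical illustrations of mine, not from the course):

```python
def diagnose(j_train, j_cv, baseline=0.10, gap=0.05):
    """Rough bias/variance diagnosis from Jtrain and Jcv (hypothetical thresholds)."""
    if j_train > baseline and j_cv - j_train <= gap:
        return "high bias"        # can't even fit the training set; Jcv ~ Jtrain
    if j_cv - j_train > gap:
        return "high variance"    # fits the training set, fails to generalize
    return "just right"

print(diagnose(0.30, 0.32))  # high bias
print(diagnose(0.02, 0.40))  # high variance
```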
-----------
To prevent overfitting, we use regularization.
1. When λ is LARGE, every ϴj (j ≥ 1) is penalized heavily during training and driven close to 0.
Thus ϴ0 dominates the h(x) function: underfitting (high bias).
2. When λ is close to 0, the ϴ values fit the training set too closely, causing h(x) to overfit (high variance).
3. When λ is intermediate, "just right", h(x) generalizes well.
How do we find the right λ?
1. Define Jtrain(ϴ) without the regularization term.
2. Define Jcv(ϴ) without the regularization term.
3. Define Jtest(ϴ) without the regularization term.
Choosing the regularization parameter λ:
1. Try λ = 0, 0.01, 0.02, 0.04, ... , 10 (roughly doubling each step);
for each λ, minimize the regularized J(ϴ) to get ϴ(λ).
2. Evaluate Jcv(ϴ(λ)) for each of the λ values, using the ϴ found in step 1.
3. Pick the λ with the smallest Jcv(ϴ(λ)); that is the chosen regularization λ.
4. Last, evaluate Jtest(ϴ) for the picked λ to see how well the result generalizes.
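Steps 1–4 can be sketched with a closed-form regularized fit (a minimal Python/NumPy sketch; the toy quadratic data, the degree-8 features, and the split are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: quadratic target, but degree-8 polynomial features (prone to overfit)
x = rng.uniform(-1, 1, 120)
y = 1 + x - 2 * x**2 + rng.normal(0, 0.2, 120)
Phi = np.vander(x, 9, increasing=True)       # columns: 1, x, ..., x^8

idx = rng.permutation(120)
tr, cv, te = idx[:72], idx[72:96], idx[96:]  # 60/20/20 split

def fit_ridge(P, t, lam):
    """Minimize the regularized J(theta) in closed form."""
    R = np.eye(P.shape[1])
    R[0, 0] = 0.0                            # do not penalize the bias term theta0
    return np.linalg.solve(P.T @ P + lam * R, P.T @ t)

def j(theta, P, t):
    """Unregularized half mean squared error, used for Jtrain/Jcv/Jtest."""
    return np.mean((P @ theta - t) ** 2) / 2

lams = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]
cv_err = {lam: j(fit_ridge(Phi[tr], y[tr], lam), Phi[cv], y[cv]) for lam in lams}
best_lam = min(cv_err, key=cv_err.get)       # step 3: smallest Jcv wins
theta = fit_ridge(Phi[tr], y[tr], best_lam)
print(best_lam, round(j(theta, Phi[te], y[te]), 4))  # step 4: final Jtest
```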
------------------
Learning Curve:
Plot:
Jtrain
Jcv
m = number of training examples.
Let's now vary m.
When m is small, h(x) can fit the training set almost perfectly, so Jtrain is low but Jcv is high.
As m grows, fitting every example gets harder, so Jtrain rises while Jcv falls.
If a learning algorithm is suffering from high bias, getting more training data
won't help.
If a learning algorithm is suffering from high variance, getting more training data
is likely to help.
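The high-bias case can be checked numerically (a minimal Python/NumPy sketch; the quadratic data fitted by a straight line is my own toy example):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: quadratic target, but we fit a straight line (d=1) -> high bias
x = rng.uniform(-3, 3, 400)
y = x**2 + rng.normal(0, 0.5, 400)
tr, cv = np.arange(300), np.arange(300, 400)  # fixed train pool / CV set

def half_mse(theta, xs, ys):
    return np.mean((np.polyval(theta, xs) - ys) ** 2) / 2

# grow the training-set size m and record Jtrain and Jcv at each m
curve = {}
for m in (5, 20, 80, 300):
    theta = np.polyfit(x[tr[:m]], y[tr[:m]], 1)
    curve[m] = (half_mse(theta, x[tr[:m]], y[tr[:m]]),
                half_mse(theta, x[cv], y[cv]))
    print(m, round(curve[m][0], 2), round(curve[m][1], 2))
# both Jtrain and Jcv plateau at a high value: more data won't fix high bias
```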
------------
Recap:
There are several variables that can be tuned:
1. d = degree of polynomial.
2. λ, the regularization parameter.
3. m, the number of training examples.
--------------
1. Get more training examples: fixes high variance.
2. Try smaller sets of features: fixes high variance. (Won't work for a high bias problem.)
3. Try getting additional features: fixes high bias, since the current hypothesis is too simple.
4. Try adding polynomial features: fixes high bias.
5. Try decreasing λ: fixes high bias, since a smaller λ penalizes the ϴj less
and lets the hypothesis fit the data more flexibly.
6. Try increasing λ: fixes high variance, since a larger λ shrinks the ϴj found during training,
which makes the higher-degree terms less effective.
----------------
Back to neural network:
1. A larger network with regularization usually performs better than a smaller one (which underfits), although it takes more computation.
-----------------
Machine learning diagnostic:
Diagnostic: (can take time to implement)
A test that you can run to gain insight into what is/isn't
working with a learning algorithm, and guidance as to how best to improve its
performance.