Feb 21, 2016

[ML] Unsupervised learning / K-means algorithm / Principal Component Analysis (PCA)

Unsupervised learning:

No labels.
Clustering Algorithm.

Used for:
Market segmentation
Social network analysis
Organize computing clusters
Astronomical data analysis


---
K-means algorithm (the most widely used clustering algorithm)
1. An iterative algorithm.
2. Randomly initialize the K cluster centroids.
3. Cluster assignment: for each example in the data set, find
the closest centroid and assign the example to that centroid.

4. Move centroids: set each centroid to the mean of the examples
assigned to it. This moves the centroid.

5. Repeat steps 3 and 4 until the assignments stop changing.
(A code sketch follows the input spec below.)


Input:
    1. K(number of clusters)
    2. Training set {x(1), x(2), ..., x(m)}
   
x(i) ∈ R^n (drop x(0) = 1 convention)
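A minimal NumPy sketch of these steps (the function and variable names are my own, not from the lecture):

    import numpy as np

    def kmeans(X, K, n_iters=100):
        # X: (m, n) training set, K: number of clusters
        m = X.shape[0]
        # Step 2: initialize centroids as K distinct training examples
        centroids = X[np.random.choice(m, K, replace=False)].astype(float)
        for _ in range(n_iters):
            # Step 3: assign each example to its closest centroid
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            c = dists.argmin(axis=1)  # c[i] = index of closest centroid
            # Step 4: move each centroid to the mean of its assigned examples
            for k in range(K):
                if (c == k).any():  # skip empty clusters (see note below)
                    centroids[k] = X[c == k].mean(axis=0)
        return c, centroids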



If a centroid has no examples assigned to it
(i.e., no example is closest to that centroid),
the centroid can be removed,
or re-initialized at random.
(Usually removing the centroid is the more common choice.)


K-means also works on non-separated clusters, i.e., data with no obvious cluster boundaries (e.g., segmenting a continuum of body measurements into S/M/L T-shirt sizes).



------
Optimization objective

1. Helps with debugging the algorithm
2. Helps K-means find better clusters


While K-means is running, keep track of two sets of variables:
1. c(i): index of the cluster (1, 2, ..., K) to which example
    x(i) is currently assigned

2. u(k): cluster centroid k (u(k) ∈ R^n)


Distortion cost function:

    J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) Σ_{i=1..m} ||x(i) − u_{c(i)}||²

i.e., the average squared distance between each example and the centroid it is assigned to. K-means minimizes J over both the assignments c(i) and the centroid positions u(k).
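Under these definitions, J is a one-liner (continuing the hypothetical sketch above):

    import numpy as np

    def distortion(X, c, centroids):
        # Average squared distance from each example to its assigned centroid
        return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))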



-------------
Random initialization

1. Should have K < m,
i.e., the number of centroids should be less than the number of training examples.

2. Randomly pick K training examples

3. Set u(1),...,u(k) equal to these K examples
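In NumPy this is just a sample without replacement (a sketch, names mine):

    import numpy as np

    def init_centroids(X, K):
        # Pick K distinct training examples as the initial centroids (K < m)
        idx = np.random.choice(X.shape[0], K, replace=False)
        return X[idx].astype(float)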

Be aware that different choices of the initial centroids can end up
with different results.

Local optima: a bad random initialization can leave K-means stuck in a local optimum of the distortion cost J.



To deal with local optima, run K-means with different random initializations multiple times (usually 50 ~ 1000 times).

Pick the run with the lowest distortion cost J as the best solution.
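A sketch of the multi-restart loop, reusing the hypothetical kmeans() and distortion() helpers above:

    import numpy as np

    def best_of_restarts(X, K, n_restarts=100):
        # e.g., 50 ~ 1000 random initializations; keep the lowest-J run
        best_c, best_mu, best_J = None, None, np.inf
        for _ in range(n_restarts):
            c, mu = kmeans(X, K)
            J = distortion(X, c, mu)
            if J < best_J:
                best_c, best_mu, best_J = c, mu, J
        return best_c, best_mu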

----

Choosing the number of clusters.
Elbow method: run K-means for a range of K values, plot the distortion
cost J against K, and pick the K at the "elbow" where the curve stops
dropping sharply. (Worth a shot, but not always useful: often the curve
is smooth and there is no clear elbow. See the sketch below.)
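One way to plot the curve, reusing the helpers sketched above (X is the (m, n) data matrix):

    import matplotlib.pyplot as plt

    Ks = range(1, 11)
    Js = []
    for K in Ks:
        c, mu = best_of_restarts(X, K)
        Js.append(distortion(X, c, mu))
    plt.plot(list(Ks), Js, 'o-')
    plt.xlabel('K (number of clusters)')
    plt.ylabel('distortion J')
    plt.show()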




Choosing the value of K: often K is chosen by hand, based on the downstream purpose the clusters will serve (e.g., how many T-shirt sizes to manufacture), rather than automatically.



----------------------
Dimensionality reduction
(e.g., 2D -> 1D)

Motivation:
1. Data compression

2D -> 1D



3D -> 2D / 10000D -> 1000D


Motivation:
2. Data visualization



----------
How do we reduce the dimension?

Principal Component Analysis (PCA): problem formulation

Projection error: the distance from each sample to the line (or subspace) it is projected onto.

PCA:
Reduce from 2 dimensions to 1 dimension: find a direction
(a vector u(1) ∈ R^n)
onto which to project the data so as to minimize the projection error.

Reduce from n dimensions to k dimensions: find k vectors u(1), u(2), ..., u(k) onto which to project the data, so as to minimize the projection error.


PCA is not linear regression:
linear regression minimizes the vertical distances to the line (the error
in predicting y), while PCA minimizes the orthogonal projection distances
and treats all features equally (there is no y).

Left: linear regression, right: PCA



-----
PCA algorithm:





0. Preprocess: mean normalization (subtract each feature's mean;
optionally also scale the features).

1. Compute the 'covariance matrix':

    Sigma = (1/m) Σ_{i=1..m} x(i) x(i)^T    (an n×n matrix)

2. Compute the 'eigenvectors' of matrix Sigma; in practice via the
singular value decomposition, [U, S, V] = svd(Sigma). The first k
columns of U are the principal components u(1), ..., u(k), and
z(i) = U_reduce^T x(i) is the reduced representation.
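A compact NumPy sketch of the whole algorithm (my own naming, assuming the data is already loaded into an (m, n) matrix):

    import numpy as np

    def pca(X, k):
        X = X - X.mean(axis=0)          # mean normalization
        m = X.shape[0]
        Sigma = (X.T @ X) / m           # covariance matrix (n x n)
        U, S, _ = np.linalg.svd(Sigma)  # columns of U = principal components
        U_reduce = U[:, :k]             # keep the first k components
        Z = X @ U_reduce                # z(i) = U_reduce^T x(i)
        return Z, U_reduce, S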





--------
Reconstruction from the compressed representation,
i.e., mapping the compressed z back to an approximation of the original data:

    x_approx(i) = U_reduce z(i) ≈ x(i)
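Continuing the pca() sketch above:

    def reconstruct(Z, U_reduce):
        # x_approx(i) = U_reduce @ z(i): back in R^n, approximating the
        # (mean-normalized) original examples
        return Z @ U_reduce.T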



------
Dimensionality reduction:
Choosing the number of principal components.(k)



Method 1:
Start from k = 1 and keep increasing k (k = 1, 2, ..., 17, ...),
re-running PCA each time, until the smallest k such that

    (average squared projection error) / (total variation)
    = (Σ_{i=1..m} ||x(i) − x_approx(i)||²) / (Σ_{i=1..m} ||x(i)||²) ≤ 0.01

i.e., "99% of variance is retained". Re-running PCA for every k is expensive.




Method 2:
Run SVD once, [U, S, V] = svd(Sigma), and use the singular values on the
diagonal of S: pick the smallest k such that

    (Σ_{i=1..k} S_ii) / (Σ_{i=1..n} S_ii) ≥ 0.99

This gives the same "99% of variance retained" test without re-running
PCA for each k.
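A sketch of Method 2, using the S returned by the pca() helper above:

    import numpy as np

    def choose_k(S, variance_retained=0.99):
        # S: singular values of Sigma, in decreasing order
        ratio = np.cumsum(S) / np.sum(S)  # variance retained by first k comps
        return int(np.searchsorted(ratio, variance_retained)) + 1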




To summarize: pick the smallest k that retains 99% of the variance ("99% of variance retained").




-------
Advice for applying PCA


Using PCA on supervised data:
1. Extract the inputs, giving an unlabeled dataset:
        x(1), ..., x(m) ∈ R^100000
    and map it through PCA to get:
        z(1), ..., z(m) ∈ R^1000

2. New training set:
        (z(1), y(1)), ..., (z(m), y(m))
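Note that the PCA mapping (the feature means and U_reduce) should be fit on the training set only, then reused for cross-validation/test examples. A sketch with the pca() helper above (the logistic regression classifier is my own choice, not from the lecture):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # X_train, y_train, X_test assumed given
    mu = X_train.mean(axis=0)                   # training-set feature means
    Z_train, U_reduce, _ = pca(X_train, k=1000)
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

    # Apply the SAME mu and U_reduce to new inputs before predicting
    Z_test = (X_test - mu) @ U_reduce
    y_pred = clf.predict(Z_test)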
   



To summarize: PCA compresses the inputs, which can greatly speed up the learning algorithm.



Bad uses of PCA:
1. Using it to prevent overfitting (because it reduces the number of features).
2. This is a bad idea: PCA does not honor the labels y, so it may throw
away information that matters for predicting y.

Use regularization instead.

3. Putting PCA into the design of an ML system by default.

Start WITHOUT PCA _first_; only add it if the raw data does not do what
you want (e.g., the algorithm is too slow or memory is an issue).

