Unsupervised learning:
No labels.
Clustering algorithms.
Used for:
Market segmentation
Social network analysis
Organize computing clusters
Astronomical data analysis
---
K-means algorithm (most commonly used)
1. Iterative algorithm.
2. Randomly initialize the cluster centroids.
3. Cluster assignment: assign each data point to its closest centroid.
4. Move centroids: set each centroid to the mean of the points assigned to it.
This moves the centroid.
5. Repeat from step 3 until convergence.
Input:
1. K(number of clusters)
2. Training set {x(1), x(2), ..., x(m)}
x(i) ∈ R^n (drop x(0) = 1 convention)
If a centroid has no data assigned to it
(i.e. no data point is closest to that centroid),
the centroid can be removed,
or re-initialized randomly.
(Usually removing the centroid is more common.)
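The steps above can be sketched in numpy. This is a minimal illustration, not the course's exact implementation; here an empty cluster is re-initialized to a random example rather than removed.

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Minimal K-means sketch. X: (m, n) data matrix, K: number of clusters."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment: each example goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move centroids: each becomes the mean of the examples assigned to it.
        for k in range(K):
            members = X[c == k]
            if len(members) == 0:
                # Empty cluster: re-initialize it to a random example
                # (the notes say removing it is the more common choice).
                centroids[k] = X[rng.integers(len(X))]
            else:
                centroids[k] = members.mean(axis=0)
    return c, centroids
```

On well-separated data this converges in a handful of iterations; a production version would also check for convergence instead of running a fixed number of iterations.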
K-means also works for non-separated clusters.
------
Optimization objective
1. Helps with debugging the algorithm
2. Helps K-means find better clusters
While K-means is running, monitor two sets of variables:
1. c(i): index of the cluster (1, 2, ..., K) to which example
x(i) is currently assigned
2. u(k): cluster centroid k (u(k) ∈ R^n)
Distortion cost function:
J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) Σ_i ||x(i) - u(c(i))||^2,
i.e. the average squared distance between each example and the centroid it is assigned to. K-means minimizes J.
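The distortion cost is straightforward to compute; a small numpy sketch (the helper name `distortion` is my own):

```python
import numpy as np

def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x(i) - u(c(i))||^2: the average squared distance
    from each example to the centroid it is assigned to."""
    # centroids[c] selects, for each example, its assigned centroid.
    return float(np.mean(np.sum((X - centroids[c]) ** 2, axis=1)))
```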
-------------
Random initialization:
1. Should have K < m
i.e. the number of centroids should be less than the number of training examples.
2. Randomly pick K training examples
3. Set u(1), ..., u(K) equal to these K examples
Be aware that different random initializations can end up
with different results.
Local optima:
To deal with local optima, run K-means with different random initializations multiple times (usually 50 ~ 1000 times).
Pick the run with the lowest distortion cost J as the best solution.
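A sketch of this multiple-restarts idea. The helper name `best_of_n_runs`, and the `kmeans`/`distortion` callables it takes, are my own illustration of the interface, not from the notes:

```python
import numpy as np

def best_of_n_runs(X, K, kmeans, distortion, n_runs=100):
    """Run K-means from many random initializations; keep the lowest-J run.
    `kmeans(X, K, seed=...)` and `distortion(X, c, centroids)` are assumed
    callables matching the steps described earlier in these notes."""
    best = None
    for seed in range(n_runs):
        c, centroids = kmeans(X, K, seed=seed)
        J = distortion(X, c, centroids)
        if best is None or J < best[0]:
            best = (J, c, centroids)
    return best  # (J, assignments, centroids) of the best run
```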
----
Choosing the number of clusters (K):
Elbow method: plot the distortion J against K and pick the K at the "elbow", where J stops decreasing rapidly. (Worth a shot, but not always conclusive.)
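The elbow method can be sketched numerically. Picking the K with the largest drop-off in improvement (maximum second difference) is my own illustration of the idea, not a rule from the course:

```python
def elbow(Js):
    """Js[i] is the distortion J for K = i + 1 clusters.
    Return the K after which the marginal improvement shrinks the most
    (largest second difference) -- a rough stand-in for eyeballing the plot."""
    # How much J drops when going from K to K + 1 clusters.
    drops = [Js[i] - Js[i + 1] for i in range(len(Js) - 1)]
    # How sharply those drops themselves shrink (the "bend" in the curve).
    curvature = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return curvature.index(max(curvature)) + 2  # index 0 corresponds to K = 2
```

In practice the curve is often too smooth for a clear elbow, which is why the notes call the method only "worth a shot".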
----------------------
Dimensionality reduction
(e.g. 2D -> 1D)
Motivation:
1. Data compression
2D -> 1D
3D -> 2D / 10000D -> 1000D
2. Data visualization
----------
How to reduce the dimension?
Principal Component Analysis (PCA): problem formulation
Projection error: the distance from a sample to the line (or subspace) it is projected onto.
PCA:
Reduce from 2-dimension to 1-dimension: Find a direction
(a vector u(1) ∈ R^n)
onto which to project the data so as to minimize the projection error.
Reduce from n dimensions to k dimensions: find k vectors u(1), u(2), ..., u(k) onto which to project the data, so as to minimize the projection error.
PCA is not linear regression: linear regression minimizes the vertical distances to the fitted line (prediction errors in y), while PCA minimizes the perpendicular projection error and treats no variable as special.
(Left: linear regression; right: PCA)
-----
PCA algorithm:
1. Preprocessing: mean normalization (and feature scaling if needed).
2. Compute the covariance matrix Σ.
3. Compute the eigenvectors of Σ (e.g. via [U, S, V] = svd(Σ)).
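These steps can be sketched in numpy (`pca` is my own helper name; the SVD of the symmetric covariance matrix yields its eigenvectors):

```python
import numpy as np

def pca(X, k):
    """PCA sketch: mean-normalize, form the covariance matrix, take the
    top-k eigenvectors via SVD, and project the data onto them."""
    X = X - X.mean(axis=0)              # mean normalization
    Sigma = (X.T @ X) / len(X)          # covariance matrix (n x n)
    U, S, _ = np.linalg.svd(Sigma)      # columns of U: eigenvectors of Sigma
    U_reduce = U[:, :k]                 # keep the first k principal components
    Z = X @ U_reduce                    # z(i) = U_reduce^T x(i)
    return Z, U_reduce, S
```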
--------
Reconstruction from compressed representation,
i.e. mapping back to an approximation of the original data: x_approx = U_reduce · z.
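A minimal sketch of the reconstruction step, assuming z was produced as z = U_reduce^T x (the helper name is my own; add back the mean if the data was mean-normalized):

```python
import numpy as np

def reconstruct(Z, U_reduce, mu=0.0):
    """Map compressed z back to the approximation x_approx = U_reduce @ z.
    Z is (m, k), U_reduce is (n, k); mu is the mean removed before PCA."""
    return Z @ U_reduce.T + mu
```

The reconstruction is only an approximation: whatever variance lay outside the k retained components is lost.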
------
Dimensionality reduction:
Choosing the number of principal components (k)
Method 1:
Start from k=1 ... k=17
Method 2:
Summary:
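Method 2 can be sketched using the singular values S from the SVD of the covariance matrix: choose the smallest k that retains, say, 99% of the variance, i.e. sum(S[:k]) / sum(S) >= 0.99. The helper name `choose_k` is my own:

```python
import numpy as np

def choose_k(S, variance_retained=0.99):
    """S: singular values of the covariance matrix, in decreasing order.
    Return the smallest k with sum(S[:k]) / sum(S) >= variance_retained."""
    ratios = np.cumsum(S) / np.sum(S)          # variance retained for each k
    return int(np.searchsorted(ratios, variance_retained) + 1)
```

This avoids re-running PCA for every candidate k (Method 1), since one SVD gives the retained-variance ratio for all k at once.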
-------
Advice for applying PCA
Use PCA on supervised data:
1. Extract the inputs as an unlabeled dataset:
x(1), ..., x(m) ∈ R^10000
through PCA:
z(1), ..., z(m) ∈ R^1000
2. New training set:
(z(1), y(1)), ..., (z(m), y(m))
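A sketch of this pipeline in numpy. The helper names are my own; the key point is that the PCA mapping is fit on the training inputs only and then reused unchanged on cross-validation and test inputs:

```python
import numpy as np

def fit_pca_mapping(X_train, k):
    """Fit PCA on the TRAINING inputs only; return a function x -> z
    that can be reused on cross-validation and test inputs."""
    mu = X_train.mean(axis=0)
    Sigma = (X_train - mu).T @ (X_train - mu) / len(X_train)
    U, _, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]
    def project(X):
        # Apply the same mean and projection learned from the training set.
        return (X - mu) @ U_reduce
    return project

# Usage sketch:
# project = fit_pca_mapping(X_train, k=1000)
# Z_train = project(X_train)   # train the classifier on (Z_train, y_train)
# Z_test  = project(X_test)    # reuse the SAME mapping at test time
```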
Summary:
Bad uses of PCA:
1. To prevent overfitting, just because it reduces the number of features:
PCA does not take the label y into account, so it may discard information useful for prediction.
Use regularization instead.
2. In the design of an ML system:
Start without PCA _FIRST_; add it only if needed.