Unsupervised learning:
No labels.
Clustering algorithms.
Used for:
Market segmentation
Social network analysis
Organize computing clusters
Astronomical data analysis
---
K-means algorithm (most commonly used)
1. Iterative algorithm.
2. Randomly initialize the cluster centroids.
3. Cluster assignment: assign each data point to its closest centroid.
4. Move centroids: set each centroid to the mean of the points assigned to it.
This moves the centroid.
5. Repeat from step 3 until convergence.
Input:
1. K(number of clusters)
2. Training set {x(1), x(2), ..., x(m)}
x(i) ∈ R^n (drop x(0) = 1 convention)
If a centroid has no data assigned to it
(i.e. no data point is closest to that centroid),
the centroid can be removed,
or re-initialized randomly.
(Usually removing the centroid is more common.)
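The steps above can be sketched in numpy. This is a minimal illustration, not the course's exact implementation; here an empty cluster is re-initialized to a random example rather than removed.

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Minimal K-means sketch. X: (m, n) data matrix, K: number of clusters."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment: each example goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move centroids: each becomes the mean of the examples assigned to it.
        for k in range(K):
            members = X[c == k]
            if len(members) == 0:
                # Empty cluster: re-initialize it to a random example
                # (the notes say removing it is the more common choice).
                centroids[k] = X[rng.integers(len(X))]
            else:
                centroids[k] = members.mean(axis=0)
    return c, centroids
```

On well-separated data this converges in a handful of iterations; a production version would also check for convergence instead of running a fixed number of iterations.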
K-means also works for non-separated clusters.
------
Optimization objective
1. Helps with debugging the algorithm
2. Helps K-means find better clusters
While K-means is running, monitor two sets of variables:
1. c(i): index of the cluster (1, 2, ..., K) to which example
x(i) is currently assigned
2. u(k): cluster centroid k (u(k) ∈ R^n)
Distortion cost function:
J(c(1), ..., c(m), u(1), ..., u(K)) = (1/m) Σ_i ||x(i) - u(c(i))||^2,
i.e. the average squared distance between each example and the centroid it is assigned to. K-means minimizes J.
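The distortion cost is straightforward to compute; a small numpy sketch (the helper name `distortion` is my own):

```python
import numpy as np

def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x(i) - u(c(i))||^2: the average squared distance
    from each example to the centroid it is assigned to."""
    # centroids[c] selects, for each example, its assigned centroid.
    return float(np.mean(np.sum((X - centroids[c]) ** 2, axis=1)))
```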
-------------
Random initialization:
1. Should have K < m
i.e. the number of centroids should be less than the number of training examples.
2. Randomly pick K training examples
3. Set u(1), ..., u(K) equal to these K examples
Be aware that different random initializations can end up
with different results.
Local optima:
To deal with local optima, run K-means with different random initializations multiple times (usually 50 ~ 1000 times).
Pick the run with the lowest distortion cost J as the best solution.
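A sketch of this multiple-restarts idea. The helper name `best_of_n_runs`, and the `kmeans`/`distortion` callables it takes, are my own illustration of the interface, not from the notes:

```python
import numpy as np

def best_of_n_runs(X, K, kmeans, distortion, n_runs=100):
    """Run K-means from many random initializations; keep the lowest-J run.
    `kmeans(X, K, seed=...)` and `distortion(X, c, centroids)` are assumed
    callables matching the steps described earlier in these notes."""
    best = None
    for seed in range(n_runs):
        c, centroids = kmeans(X, K, seed=seed)
        J = distortion(X, c, centroids)
        if best is None or J < best[0]:
            best = (J, c, centroids)
    return best  # (J, assignments, centroids) of the best run
```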
----
Choosing the number of clusters (K):
Elbow method: plot the distortion J against K and pick the K at the "elbow", where J stops decreasing rapidly. (Worth a shot, but not always conclusive.)
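The elbow method can be sketched numerically. Picking the K with the largest drop-off in improvement (maximum second difference) is my own illustration of the idea, not a rule from the course:

```python
def elbow(Js):
    """Js[i] is the distortion J for K = i + 1 clusters.
    Return the K after which the marginal improvement shrinks the most
    (largest second difference) -- a rough stand-in for eyeballing the plot."""
    # How much J drops when going from K to K + 1 clusters.
    drops = [Js[i] - Js[i + 1] for i in range(len(Js) - 1)]
    # How sharply those drops themselves shrink (the "bend" in the curve).
    curvature = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return curvature.index(max(curvature)) + 2  # index 0 corresponds to K = 2
```

In practice the curve is often too smooth for a clear elbow, which is why the notes call the method only "worth a shot".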
----------------------
Dimensionality reduction
(e.g. 2D -> 1D)
Motivation:
1. Data compression
2D -> 1D
3D -> 2D / 10000D -> 1000D
2. Data visualization
----------
How to reduce the dimension?
Principal Component Analysis (PCA): problem formulation
Projection error: the distance from a sample to the line (or subspace) it is projected onto.
PCA:
Reduce from 2-dimension to 1-dimension: Find a direction
(a vector u(1) ∈ R^n)
onto which to project the data so as to minimize the projection error.
Reduce from n dimensions to k dimensions: find k vectors u(1), u(2), ..., u(k) onto which to project the data, so as to minimize the projection error.
PCA is not linear regression: linear regression minimizes the vertical distances to the fitted line (prediction errors in y), while PCA minimizes the perpendicular projection error and treats no variable as special.
(Left: linear regression; right: PCA)
-----
PCA algorithm:
1. Preprocessing: mean normalization (and feature scaling if needed).
2. Compute the covariance matrix Σ.
3. Compute the eigenvectors of Σ (e.g. via [U, S, V] = svd(Σ)).
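These steps can be sketched in numpy (`pca` is my own helper name; the SVD of the symmetric covariance matrix yields its eigenvectors):

```python
import numpy as np

def pca(X, k):
    """PCA sketch: mean-normalize, form the covariance matrix, take the
    top-k eigenvectors via SVD, and project the data onto them."""
    X = X - X.mean(axis=0)              # mean normalization
    Sigma = (X.T @ X) / len(X)          # covariance matrix (n x n)
    U, S, _ = np.linalg.svd(Sigma)      # columns of U: eigenvectors of Sigma
    U_reduce = U[:, :k]                 # keep the first k principal components
    Z = X @ U_reduce                    # z(i) = U_reduce^T x(i)
    return Z, U_reduce, S
```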
--------
Reconstruction from compressed representation,
i.e. mapping back to an approximation of the original data: x_approx = U_reduce · z.
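A minimal sketch of the reconstruction step, assuming z was produced as z = U_reduce^T x (the helper name is my own; add back the mean if the data was mean-normalized):

```python
import numpy as np

def reconstruct(Z, U_reduce, mu=0.0):
    """Map compressed z back to the approximation x_approx = U_reduce @ z.
    Z is (m, k), U_reduce is (n, k); mu is the mean removed before PCA."""
    return Z @ U_reduce.T + mu
```

The reconstruction is only an approximation: whatever variance lay outside the k retained components is lost.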
------
Dimensionality reduction:
Choosing the number of principal components (k)
Method 1:
Start from k=1 ... k=17
Method 2:
Summary:
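Method 2 can be sketched using the singular values S from the SVD of the covariance matrix: choose the smallest k that retains, say, 99% of the variance, i.e. sum(S[:k]) / sum(S) >= 0.99. The helper name `choose_k` is my own:

```python
import numpy as np

def choose_k(S, variance_retained=0.99):
    """S: singular values of the covariance matrix, in decreasing order.
    Return the smallest k with sum(S[:k]) / sum(S) >= variance_retained."""
    ratios = np.cumsum(S) / np.sum(S)          # variance retained for each k
    return int(np.searchsorted(ratios, variance_retained) + 1)
```

This avoids re-running PCA for every candidate k (Method 1), since one SVD gives the retained-variance ratio for all k at once.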
-------
Advice for applying PCA
Use PCA on supervised data:
1. Extract the inputs as an unlabeled dataset:
x(1), ..., x(m) ∈ R^10000
through PCA:
z(1), ..., z(m) ∈ R^1000
2. New training set:
(z(1), y(1)), ..., (z(m), y(m))
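A sketch of this pipeline in numpy. The helper names are my own; the key point is that the PCA mapping is fit on the training inputs only and then reused unchanged on cross-validation and test inputs:

```python
import numpy as np

def fit_pca_mapping(X_train, k):
    """Fit PCA on the TRAINING inputs only; return a function x -> z
    that can be reused on cross-validation and test inputs."""
    mu = X_train.mean(axis=0)
    Sigma = (X_train - mu).T @ (X_train - mu) / len(X_train)
    U, _, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]
    def project(X):
        # Apply the same mean and projection learned from the training set.
        return (X - mu) @ U_reduce
    return project

# Usage sketch:
# project = fit_pca_mapping(X_train, k=1000)
# Z_train = project(X_train)   # train the classifier on (Z_train, y_train)
# Z_test  = project(X_test)    # reuse the SAME mapping at test time
```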
Summary:
Bad uses of PCA:
1. To prevent overfitting, just because it reduces the number of features:
PCA does not take the label y into account, so it may discard information useful for prediction.
Use regularization instead.
2. In the design of an ML system:
Start without PCA _FIRST_; add it only if needed.