Basic knowledge
Machine Learning - Unsupervised Learning
These notes follow Andrew Ng's Stanford "Machine Learning" video course; topics already covered by Li Hang's "Statistical Learning Methods" are only outlined here rather than elaborated.
1 Clustering algorithm
1.1 K-Means algorithm
Steps
Randomly initialize K cluster centroids (each an n-dimensional vector), then iterate two steps (see the sketch after this list):
Cluster assignment: for each sample, find the nearest cluster centroid and assign the sample to that cluster
Move centroids: set each centroid to the mean of the samples assigned to its cluster (if a cluster receives no samples, remove it, or re-initialize it randomly if exactly K clusters are needed)
Repeat until the centroids no longer change
K-means can also be applied to clusters that are not well separated (e.g., segmenting customers into T-shirt sizes)
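As a concrete illustration of the two alternating steps, here is a minimal NumPy sketch; the function and variable names are my own, not from the course:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means: X is (m, n); returns centroids (K, n) and assignments (m,)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Cluster assignment step: index of the nearest centroid for each sample.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move-centroid step: mean of the samples assigned to each cluster.
        new_centroids = np.array([
            X[c == k].mean(axis=0) if np.any(c == k)
            else X[rng.integers(len(X))]  # empty cluster: re-initialize randomly
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, c
```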
Optimization objective
$c^{(i)}$: index of the cluster to which sample $x^{(i)}$ is currently assigned
$\mu_k$: cluster centroid $k$
$\mu_{c^{(i)}}$: centroid of the cluster to which sample $x^{(i)}$ is assigned
$J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - \mu_{c^{(i)}}\|^2$
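The distortion $J$ can be computed directly from the assignments produced by the sketch above (reusing its `numpy` import and hypothetical names):

```python
def distortion(X, centroids, c):
    """Cost J: mean squared distance of each sample to its assigned centroid."""
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))
```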
Random initialization: randomly pick K training samples as the initial centroids
Local optima: run K-means many times with different random initializations and keep the solution with the lowest cost $J$ (most helpful when K is small, e.g. K = 2 to 10; see the sketch below)
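A minimal way to run multiple random initializations and keep the lowest-cost clustering, reusing the hypothetical `kmeans` and `distortion` helpers above:

```python
def best_of_n_kmeans(X, K, n_init=50):
    """Run K-means n_init times and keep the result with the lowest distortion J."""
    best = None
    for seed in range(n_init):
        centroids, c = kmeans(X, K, seed=seed)
        J = distortion(X, centroids, c)
        if best is None or J < best[0]:
            best = (J, centroids, c)
    return best  # (J, centroids, assignments)
```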
Selection of K Value
Elbow method: plot the cost $J$ as a function of K and look for the "elbow" where J stops decreasing sharply (sketch after this list)
Choose K based on the downstream purpose the clusters will serve
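A sketch of the elbow plot, assuming a data matrix `X`, matplotlib, and the hypothetical helpers above:

```python
import matplotlib.pyplot as plt

Ks = range(1, 11)
costs = [best_of_n_kmeans(X, K)[0] for K in Ks]  # lowest J found for each K
plt.plot(Ks, costs, marker='o')
plt.xlabel('K (number of clusters)')
plt.ylabel('cost J')
plt.show()  # look for the "elbow" where J stops dropping sharply
```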
1.2 Dimensionality Reduction
1.2.1 Goal I: Data Compression
Problem: redundant data / highly correlated features
1.2.2 Goal II: Data Visualization
Problem: high-dimensional data cannot be plotted directly, so reduce it to 2 or 3 dimensions
1.2.3 Principal Component Analysis
Find a lower-dimensional surface onto which to project the data such that the projection error is minimized
2D → 1D: find one vector $u^{(1)}$ that minimizes the projection error
nD → kD: find k vectors $u^{(1)},\ldots,u^{(k)}$ that minimize the projection error
PCA vs. Linear Regression
PCA: minimizes the perpendicular projection error; nothing is being predicted
Linear regression: maps x → y, minimizing the vertical prediction error in order to make predictions
Data preprocessing
Feature Scaling/Mean Normalization
Computation of covariance matrix
$\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T = \frac{1}{m}X^T X$
Computation of eigenvectors of the covariance matrix
[U, S, V] = svd(Sigma)
U is an n×n matrix whose columns are $[u^{(1)}\ u^{(2)}\ \cdots\ u^{(n)}]$; taking its first k columns gives the n×k matrix $U_{reduce}$
$z^{(i)} = U_{reduce}^T x^{(i)} = [u^{(1)}\ u^{(2)}\ \cdots\ u^{(k)}]^T x^{(i)}$, a k-dimensional vector
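Putting the PCA steps together in NumPy, as a sketch under the assumption that X is an m×n data matrix (`np.linalg.svd` stands in for the course's Octave `svd` call; all names are my own):

```python
import numpy as np

def pca(X, k):
    """Project X (m, n) onto its first k principal components."""
    # Mean normalization (add feature scaling here if feature scales differ widely).
    mu = X.mean(axis=0)
    Xn = X - mu
    # Covariance matrix: Sigma = (1/m) * X^T X.
    Sigma = Xn.T @ Xn / len(Xn)
    # SVD; the columns of U are the eigenvectors u^(1), ..., u^(n).
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]    # first k columns: an n x k matrix
    Z = Xn @ U_reduce      # rows are z^(i) = U_reduce^T x^(i)
    return Z, U_reduce, mu, S
```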
Choosing the number of principal components k
Ratio: $\dfrac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2}$
The numerator is the average squared distance between the original points and their projections (the projection error).
The smaller this ratio, the more faithfully the reduced data represents the original data.
If the ratio is at most 0.01, the reduced data retains 99% of the variance.
In practice, choose the smallest k that keeps the ratio below 0.01 (99% of the variance retained) or 0.05 (95% retained). Equivalently, using the singular values from svd(Sigma), the ratio equals $1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$, so k can be chosen without re-running PCA for each candidate (see the sketch after this list).
For visualization, k = 2 or k = 3 is usually chosen
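Using the singular values `S` returned by the hypothetical `pca` above, the smallest k retaining, say, 99% of the variance can be found in one pass (a sketch, assuming `numpy` is imported as `np`):

```python
def choose_k(S, retain=0.99):
    """Smallest k such that the retained-variance fraction reaches `retain`."""
    retained = np.cumsum(S) / np.sum(S)  # fraction retained by the first k components
    return int(np.searchsorted(retained, retain) + 1)
```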
Reconstruction from the compressed representation
$x^{(i)}_{\text{approx}} = U_{reduce}\, z^{(i)}$
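Reconstruction is the reverse mapping; a sketch reusing the assumed `pca` outputs:

```python
def reconstruct(Z, U_reduce, mu):
    """x_approx = U_reduce z, with the mean added back since pca() subtracted it."""
    return Z @ U_reduce.T + mu
```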
Advice for applying PCA
Speeding up a learning algorithm: map each input $x^{(i)}$ through PCA to a low-dimensional representation $z^{(i)}$ and train on the resulting smaller training set; the PCA mapping should be learned on the training set only and then applied to the cross-validation and test sets (see the sketch after this list)
Preventing overfitting: a bad use of PCA! PCA discards information without looking at the labels y; use regularization instead
Designing an ML system: do not plan PCA in from the start; first run the system on the raw data, and introduce PCA only if the raw data does not work (e.g., training is too slow or memory is insufficient)
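A sketch of the "fit on the training set only" convention from the first item above (`X_train`, `X_cv`, `X_test`, and `k=100` are hypothetical; `pca` is the helper assumed earlier):

```python
# Learn the PCA mapping on the training set only...
Z_train, U_reduce, mu, _ = pca(X_train, k=100)
# ...then apply the same mapping (same mu, same U_reduce) to the CV and test sets.
Z_cv = (X_cv - mu) @ U_reduce
Z_test = (X_test - mu) @ U_reduce
```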