Basic knowledge
Machine Learning - Unsupervised Learning
These notes follow Andrew Ng's Stanford "Machine Learning" video course; topics already covered by Li Hang's "Statistical Learning Methods" are only outlined here rather than elaborated.
1 Clustering algorithm
1.1 K-Means algorithm
Steps
Randomly initialize K cluster centroids (each an n-dimensional vector), then iterate two steps (see the sketch after this list):
Cluster assignment: for each sample, find the nearest cluster centroid and assign the sample to that cluster
Move centroids: set each centroid to the mean of the samples assigned to its cluster (if a cluster receives no samples, remove it, or re-initialize it randomly if exactly K clusters are needed)
Repeat until the centroids no longer change
K-means can also be applied to clusters that are not well separated (e.g., segmenting customers into T-shirt sizes)
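As a concrete illustration of the two alternating steps, here is a minimal NumPy sketch; the function and variable names are my own, not from the course:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means: X is (m, n); returns centroids (K, n) and assignments (m,)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Cluster assignment step: index of the nearest centroid for each sample.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move-centroid step: mean of the samples assigned to each cluster.
        new_centroids = np.array([
            X[c == k].mean(axis=0) if np.any(c == k)
            else X[rng.integers(len(X))]  # empty cluster: re-initialize randomly
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, c
```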
Optimization objective
$c^{(i)}$: index of the cluster to which sample $x^{(i)}$ is currently assigned
$\mu_k$: cluster centroid $k$
$\mu_{c^{(i)}}$: centroid of the cluster to which sample $x^{(i)}$ is assigned
$J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - \mu_{c^{(i)}}\|^2$
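The distortion $J$ can be computed directly from the assignments produced by the sketch above (reusing its `numpy` import and hypothetical names):

```python
def distortion(X, centroids, c):
    """Cost J: mean squared distance of each sample to its assigned centroid."""
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))
```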
Random initialization: randomly pick K training samples as the initial centroids
Local optima: run K-means many times with different random initializations and keep the solution with the lowest cost $J$ (most helpful when K is small, e.g. K = 2 to 10; see the sketch below)
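A minimal way to run multiple random initializations and keep the lowest-cost clustering, reusing the hypothetical `kmeans` and `distortion` helpers above:

```python
def best_of_n_kmeans(X, K, n_init=50):
    """Run K-means n_init times and keep the result with the lowest distortion J."""
    best = None
    for seed in range(n_init):
        centroids, c = kmeans(X, K, seed=seed)
        J = distortion(X, centroids, c)
        if best is None or J < best[0]:
            best = (J, centroids, c)
    return best  # (J, centroids, assignments)
```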
Selection of K Value
Elbow method: plot the cost $J$ as a function of K and look for the "elbow" where J stops decreasing sharply (sketch after this list)
Choose K based on the downstream purpose the clusters will serve
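A sketch of the elbow plot, assuming a data matrix `X`, matplotlib, and the hypothetical helpers above:

```python
import matplotlib.pyplot as plt

Ks = range(1, 11)
costs = [best_of_n_kmeans(X, K)[0] for K in Ks]  # lowest J found for each K
plt.plot(Ks, costs, marker='o')
plt.xlabel('K (number of clusters)')
plt.ylabel('cost J')
plt.show()  # look for the "elbow" where J stops dropping sharply
```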
1.2 Dimensionality Reduction
1.2.1 Goal I: Data Compression
Problem: redundant data / highly correlated features
1.2.2 Goal II: Data Visualization
Problem: high-dimensional data cannot be plotted directly, so reduce it to 2 or 3 dimensions
1.2.3 Principal Component Analysis
Find a lower-dimensional surface onto which to project the data such that the projection error is minimized
2D → 1D: find one vector $u^{(1)}$ that minimizes the projection error
nD → kD: find k vectors $u^{(1)},\ldots,u^{(k)}$ that minimize the projection error
PCA vs. Linear Regression
PCA: minimizes the perpendicular projection error; nothing is being predicted
Linear regression: maps x → y, minimizing the vertical prediction error in order to make predictions
Data preprocessing
Feature Scaling/Mean Normalization
Computation of covariance matrix
$\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T = \frac{1}{m}X^T X$
Computation of eigenvectors of the covariance matrix
[U, S, V] = svd(Sigma)
U is an n×n matrix whose columns are $[u^{(1)}\ u^{(2)}\ \cdots\ u^{(n)}]$; taking its first k columns gives the n×k matrix $U_{reduce}$
$z^{(i)} = U_{reduce}^T x^{(i)} = [u^{(1)}\ u^{(2)}\ \cdots\ u^{(k)}]^T x^{(i)}$, a k-dimensional vector
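Putting the PCA steps together in NumPy, as a sketch under the assumption that X is an m×n data matrix (`np.linalg.svd` stands in for the course's Octave `svd` call; all names are my own):

```python
import numpy as np

def pca(X, k):
    """Project X (m, n) onto its first k principal components."""
    # Mean normalization (add feature scaling here if feature scales differ widely).
    mu = X.mean(axis=0)
    Xn = X - mu
    # Covariance matrix: Sigma = (1/m) * X^T X.
    Sigma = Xn.T @ Xn / len(Xn)
    # SVD; the columns of U are the eigenvectors u^(1), ..., u^(n).
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]    # first k columns: an n x k matrix
    Z = Xn @ U_reduce      # rows are z^(i) = U_reduce^T x^(i)
    return Z, U_reduce, mu, S
```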
Choosing the number of principal components k
Ratio: $\dfrac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - x^{(i)}_{\text{approx}}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2}$
The numerator is the average squared distance between the original points and their projections (the projection error).
The smaller this ratio, the more faithfully the reduced data represents the original data.
If the ratio is at most 0.01, the reduced data retains 99% of the variance.
In practice, choose the smallest k that keeps the ratio below 0.01 (99% of the variance retained) or 0.05 (95% retained). Equivalently, using the singular values from svd(Sigma), the ratio equals $1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$, so k can be chosen without re-running PCA for each candidate (see the sketch after this list).
For visualization, k = 2 or k = 3 is usually chosen
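Using the singular values `S` returned by the hypothetical `pca` above, the smallest k retaining, say, 99% of the variance can be found in one pass (a sketch, assuming `numpy` is imported as `np`):

```python
def choose_k(S, retain=0.99):
    """Smallest k such that the retained-variance fraction reaches `retain`."""
    retained = np.cumsum(S) / np.sum(S)  # fraction retained by the first k components
    return int(np.searchsorted(retained, retain) + 1)
```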
Reconstruction from the compressed representation
$x^{(i)}_{\text{approx}} = U_{reduce}\, z^{(i)}$
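Reconstruction is the reverse mapping; a sketch reusing the assumed `pca` outputs:

```python
def reconstruct(Z, U_reduce, mu):
    """x_approx = U_reduce z, with the mean added back since pca() subtracted it."""
    return Z @ U_reduce.T + mu
```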
Advice for applying PCA
Speeding up a learning algorithm: map each input $x^{(i)}$ through PCA to a low-dimensional representation $z^{(i)}$ and train on the resulting smaller training set; the PCA mapping should be learned on the training set only and then applied to the cross-validation and test sets (see the sketch after this list)
Preventing overfitting: a bad use of PCA! PCA discards information without looking at the labels y; use regularization instead
Designing an ML system: do not plan PCA in from the start; first run the system on the raw data, and introduce PCA only if the raw data does not work (e.g., training is too slow or memory is insufficient)
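A sketch of the "fit on the training set only" convention from the first item above (`X_train`, `X_cv`, `X_test`, and `k=100` are hypothetical; `pca` is the helper assumed earlier):

```python
# Learn the PCA mapping on the training set only...
Z_train, U_reduce, mu, _ = pca(X_train, k=100)
# ...then apply the same mapping (same mu, same U_reduce) to the CV and test sets.
Z_cv = (X_cv - mu) @ U_reduce
Z_test = (X_test - mu) @ U_reduce
```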