Machine Learning Algorithms: Unsupervised Learning

9.1 Unsupervised Learning
We have learned many machine learning algorithms, including linear regression, logistic regression, neural networks, and support vector machines. These algorithms have one thing in common: every training sample comes with its own label. For example, when using linear regression to predict housing prices, each training sample consists of one or more variables (such as area, floor, etc.) together with a label, namely the housing price. Logistic regression, neural networks, and support vector machines are likewise used for classification problems. For example, when classifying spam, we use existing spam (labeled 1) and non-spam (labeled 0); when recognizing handwritten digits, the variables are the values of each pixel and the label is the digit itself. An algorithm that learns from labeled training samples in this way is called Supervised Learning. The training samples of supervised learning can be written in the following unified form, where x is the variable and y is the label:

$$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$$
Obviously, not all data in real life are labeled (or the labels may be unknown). So we need to learn from unlabeled training samples to reveal the intrinsic structure and regularities of the data. We call this Unsupervised Learning. The training samples of unsupervised learning therefore take the following form, containing only the features:

$$\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$$
Figure 9-1 shows the difference between supervised learning and unsupervised learning. Panel (1) shows labeled samples separated into two classes on either side of a decision boundary (one class drawn as circles, the other as crosses); panel (2) shows unlabeled samples (all drawn as circles) grouped into clusters based on the variables $x_1$ and $x_2$.
Figure 9-1 An example of the difference between supervised learning and unsupervised learning
Unsupervised learning has many applications. One example of clustering is grouping a collection of papers according to characteristics of each paper, such as word frequency, sentence length, page count, etc. Clustering has many other applications, as shown in Figure 9-2. An example of a non-clustering application is the cocktail party algorithm, which recovers meaningful signals (information) from noisy data; for instance, at a noisy cocktail party you can still notice when someone calls your name. The cocktail party algorithm can therefore be used for speech recognition (see Wikipedia for details).
There is more discussion of the difference between supervised learning and unsupervised learning on Quora.
Figure 9-2 Some Clustering Applications
9.2 K-means algorithm
The basic idea of clustering is to partition the samples in a data set into several (usually disjoint) subsets, each of which is called a cluster. After partitioning, each cluster may correspond to some concept (property). For example, clustering papers into 2 clusters by page count and sentence length might yield one cluster containing mostly master's theses and another containing mostly bachelor's theses.
The K-means algorithm is a widely used clustering algorithm. Its steps are described below.
1. Randomly initialize K points, called the cluster centroids.
2. Cluster assignment: for each sample, assign it to the nearest cluster centroid.
3. Move centroids: for each cluster, compute the mean of all samples assigned to it, and move the cluster centroid to that mean.
4. Repeat steps 2 and 3 until the desired clusters are found (that is, until the optimization objective is met; see section 9.3 below).
Figure 9-3 illustrates the case where both the number of features and the number of clusters K equal 2.
Figure 9-3 Demonstration of the K-means algorithm
With the above description, we formalize the K-means algorithm.
Input:

- K (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

Algorithm:

Randomly initialize K cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for i = 1 to m:
        $c^{(i)} :=$ index (from 1 to K) of the cluster centroid closest to $x^{(i)}$
    for k = 1 to K:
        $\mu_k :=$ mean of the samples assigned to cluster k
}
In the above algorithm, the first loop corresponds to the cluster assignment step: we construct a vector $c$ such that $c^{(i)}$ equals the index of the cluster to which $x^{(i)}$ belongs, i.e., the index of the cluster centroid nearest to $x^{(i)}$. Expressed mathematically:

$$c^{(i)} := \arg\min_{k} \left\| x^{(i)} - \mu_k \right\|^2$$
The second loop corresponds to the step of moving the cluster centroids, that is, moving each centroid to the mean of its cluster. Expressed mathematically:

$$\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}$$

where $C_k$ is the set of indices of the samples assigned to cluster k.
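As a concrete illustration, here is a minimal NumPy sketch of the two loops above. The function and variable names are our own (not from the original text), and this is a sketch rather than a reference implementation:

```python
import numpy as np

def k_means(X, K, num_iters=100):
    """Minimal K-means sketch. X: (m, n) array of samples; K: number of clusters."""
    m = X.shape[0]
    # Randomly pick K distinct samples as the initial cluster centroids (see 9.4).
    centroids = X[np.random.choice(m, K, replace=False)].astype(float)
    for _ in range(num_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to X[i].
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # Move-centroids step: each centroid becomes the mean of its assigned samples.
        for k in range(K):
            members = X[c == k]
            if len(members) > 0:  # a centroid with no assigned samples is left in place
                centroids[k] = members.mean(axis=0)
    return c, centroids
```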
If a cluster centroid has no samples assigned to it, we can either reinitialize that centroid or simply remove it.
After several iterations the algorithm converges, that is, further iterations no longer change the clusters.
In some applications the samples vary continuously and there appear to be no well-separated clusters, but we can still use the K-means algorithm to partition the samples into K subsets for practical use. For example, T-shirt sizes can be determined from people's heights and weights, as shown in Figure 9-4.
Figure 9-4 K-means for non-separated clusters
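As a hypothetical illustration of this T-shirt example (the data below is synthetic, and k_means is the sketch from section 9.2, not code from the original text):

```python
import numpy as np

# Hypothetical height (cm) / weight (kg) data; real data would be measured.
rng = np.random.default_rng(0)
people = rng.normal(loc=[170.0, 65.0], scale=[10.0, 12.0], size=(300, 2))

# Partition into K = 3 groups, e.g. S / M / L T-shirt sizes.
sizes, centroids = k_means(people, K=3)
print(centroids)  # one (height, weight) prototype per size
```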
9.3 Optimization objective
We first restate the variables used in the K-means algorithm:

- $c^{(i)}$: the index of the cluster (from 1 to K) to which sample $x^{(i)}$ is currently assigned
- $\mu_k$: the k-th cluster centroid ($\mu_k \in \mathbb{R}^n$)
- $\mu_{c^{(i)}}$: the centroid of the cluster to which sample $x^{(i)}$ has been assigned
Using these variables, we define our cost function (also called the distortion function) as follows:

$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$
So our optimization objective is

$$\min_{c^{(1)}, \ldots, c^{(m)}, \, \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$$
Comparing with the algorithm described in section 9.2, we can see that:

- In the cluster assignment step, we minimize J by changing $c^{(1)}, \ldots, c^{(m)}$ while holding $\mu_1, \ldots, \mu_K$ fixed.
- In the move-centroids step, we minimize J by changing $\mu_1, \ldots, \mu_K$ while holding the assignments fixed.
Note that in the K-means algorithm the cost function can never increase; it should decrease (or stay the same) with every iteration. This differs from gradient descent, where a poorly chosen learning rate can cause the cost to increase.
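To make this concrete, the distortion J can be computed as below (a sketch using the same symbols as the formulas above):

```python
import numpy as np

def distortion(X, c, centroids):
    """Cost J: mean squared distance from each sample to its assigned centroid."""
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))
```

Calling this after every iteration of the loop in section 9.2 should produce a non-increasing sequence of values; an increase indicates a bug.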
9.4 Random Initialization
A recommended method for initializing the cluster centroids is:

1. Ensure K < m, i.e., the number of clusters is smaller than the number of training samples.
2. Randomly pick K training samples.
3. Set $\mu_1, \ldots, \mu_K$ equal to these K samples.
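Below is a minimal sketch of this initialization, reusing the hypothetical k_means and distortion helpers defined earlier. Running K-means several times from different random initializations and keeping the lowest-cost result is common practice (an addition here, not stated in the text above), because a single run can converge to a poor local optimum:

```python
import numpy as np

def random_init(X, K):
    """Pick K distinct training samples as the initial centroids (requires K < m)."""
    m = X.shape[0]
    assert K < m, "need fewer clusters than samples"
    return X[np.random.choice(m, K, replace=False)].astype(float)

def best_of_n_runs(X, K, n_runs=50):
    """Run K-means n_runs times and keep the clustering with the lowest cost J."""
    best_J, best_c, best_centroids = np.inf, None, None
    for _ in range(n_runs):
        c, centroids = k_means(X, K)  # each call uses a fresh random initialization
        J = distortion(X, c, centroids)
        if J < best_J:
            best_J, best_c, best_centroids = J, c, centroids
    return best_c, best_centroids
```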