Machine Learning from Scratch in Python: The Simplest Algorithm, kNN
Author: Suk 1900
Source: Senior Migrant Workers (ID: Mocun6)
ABSTRACT: Learning the simplest kNN algorithm from scratch.
Starting today, I'm going to write a machine learning tutorial series. To be honest, machine learning is more practical, and more valuable in the job market, than web scraping.
Most machine learning tutorials online are not beginner-friendly: they either call the scikit-learn package directly, or drown you in abstract, dry algorithmic formulas. Tutorials that walk through the algorithms in handwritten Python code, which are what beginners really need, are rare. Recently I watched Mr. Bobo's machine learning course on Muchow.com and was hooked; it is one of the best machine learning courses I have seen. Based on his course and my own understanding, I plan to write this "Machine Learning from Scratch" series of posts.
This first post won't cover survey material such as what machine learning is or which algorithms it includes. Before you have actually seen it in action, that kind of overview won't stick with you; it only adds mental load.
So, just as in my earlier web-scraping tutorials, I'll jump straight into a hands-on algorithm. Once you feel how interesting it is, you'll want to learn more.
Let's start with a scenario story.
01 Setting the Scene
You're in a bar, and on the counter stand ten nearly identical glasses of red wine. The owner jokingly offers you a game: if you win, you drink for free; if you lose, you pay three times the price of the wine. The odds of winning are 50%. Being the adventurous type, you agree on the spot.
The owner then says: "These ten glasses of red wine differ slightly. The first five are Cabernet Sauvignon and the last five are Pinot Noir. Now I'll pour a fresh glass. Using only the ten glasses in front of you, tell me exactly which variety the new one belongs to."
Hearing this, you feel a twinge of doubt: you know nothing about wine and couldn't tell the varieties apart by looking or tasting. But then you remember you're a machine learning practitioner, and you agree with even more confidence.
Instead of rushing to taste anything, you ask the owner for specific measurements of each glass, such as alcohol concentration and color depth, plus a pen and paper. While the owner pours the new glass, you scribble away furiously. Soon you announce that the new wine should be a Cabernet Sauvignon.
The owner is stunned. No one had ever answered without so much as a sip; countless people had tasted the wines over and over and still ended up guessing wrong. You smile mysteriously, and the owner keeps his word and lets you drink to your heart's content. Finally, he can't help asking how you did it.
You show off a little: nothing special, just a bit of machine learning.
02 Introduction to the kNN Algorithm
Now let's approach machine learning through this story. Machine learning strikes many people as "difficult", so I made up the story above to introduce its simplest algorithm: the k-Nearest Neighbors algorithm, commonly abbreviated kNN.
Don't be scared by the word "algorithm". I promise you can learn this one with nothing more than high-school math and a little Python.
Learning the kNN algorithm takes only three steps:
Understand the idea behind the kNN algorithm
Grasp the mathematical principle behind it (don't worry, you learned it in junior high)
Implement it in simple Python code
Before the algorithm itself, two concepts need introducing: sample and feature.
Each glass of wine above is called a "sample", and the ten glasses together form a sample set. Measurements such as alcohol concentration and color depth are called "features". The ten glasses are points distributed in a multi-dimensional feature space. Speaking of space, we can only perceive up to three dimensions, so for ease of understanding let's assume that just two features, alcohol concentration and color depth, are enough to distinguish Cabernet Sauvignon from Pinot Noir. That way everything can be shown on a two-dimensional plot.
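In code, a sample set is just a list of feature vectors plus a parallel list of labels. Here is a minimal sketch; the numbers are invented for illustration, since the story never gives actual measurements:

```python
# Each row is one sample: (alcohol concentration, color depth).
# All values are made up for illustration.
X = [(12.0, 1.5), (12.2, 1.8), (12.5, 1.6), (12.8, 1.4), (12.4, 1.7),  # Cabernet Sauvignon
     (14.0, 4.0), (14.2, 4.3), (13.8, 4.1), (14.5, 4.4), (14.1, 3.9)]  # Pinot Noir
# One label per sample: the grape variety it belongs to
y = ["Cabernet Sauvignon"] * 5 + ["Pinot Noir"] * 5

print(len(X), len(y))  # → 10 10
```

Each sample lives in a two-dimensional feature space, so a sample is simply a pair of numbers and the whole set is ten such pairs.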
The horizontal axis is alcohol concentration and the vertical axis is color depth. The ten glasses become ten points on the plot: five green points for the five glasses of Cabernet Sauvignon, and five red points for the five glasses of Pinot Noir. You can see a clear boundary between the two varieties. The owner's new glass is the yellow point in the picture.
Remember our question: is the new wine a Cabernet Sauvignon or a Pinot Noir? Judging visually by distance, the answer is obvious: it should be a Cabernet Sauvignon.
This is exactly the idea behind the k-Nearest Neighbors algorithm. The algorithm first takes a parameter k; a common empirical default is 3, so let's use 3 for now and discuss how to choose it later. For each new point, what kNN does is find the k sample points closest to the new point among all samples, count the categories of those k points, and let them vote. The category with the most votes becomes the category of the new point.
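The procedure above (measure distances, take the k closest, vote) can be sketched in a few lines of plain Python. The data values are hypothetical, chosen only so the two varieties form two visibly separate clusters:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from x_new to every training sample
    distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new)))
                 for x in X_train]
    # Indices of the k smallest distances
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Vote: the most common label among the k neighbours wins
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical samples: (alcohol concentration, color depth)
X_train = [(12.0, 1.5), (12.2, 1.8), (12.5, 1.6), (12.8, 1.4), (12.4, 1.7),  # Cabernet Sauvignon
           (14.0, 4.0), (14.2, 4.3), (13.8, 4.1), (14.5, 4.4), (14.1, 3.9)]  # Pinot Noir
y_train = ["Cabernet Sauvignon"] * 5 + ["Pinot Noir"] * 5

# A new glass close to the first cluster
print(knn_predict(X_train, y_train, (12.3, 1.6)))  # → Cabernet Sauvignon
```

With these made-up numbers, the three nearest neighbours of the new point are all Cabernet Sauvignon samples, so the vote is 3:0, matching the picture described above.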
The picture above has two categories, green and red. The three points nearest the yellow point are all green, so green beats red 3:0. Green wins, the yellow point is classified as green, and the new glass is a Cabernet Sauvignon.
That is the whole k-Nearest Neighbors algorithm. Its essence is to judge how similar two samples are by the distance between them: samples that are close enough are assumed to belong to the same category. Of course, comparing against only one sample is not reliable and the error would be large, so we compare against the k nearest samples, see which category the majority of them belong to, and assign the new sample to that category.
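The "distance" used throughout is ordinary straight-line (Euclidean) distance, computed with the Pythagorean theorem from junior-high math. A tiny sketch with two hypothetical samples:

```python
import math

# Two hypothetical samples: (alcohol concentration, color depth)
a = (12.5, 3.1)
b = (13.0, 2.8)

# Euclidean distance: square the per-feature differences, sum, take the root
dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
print(round(dist, 3))  # → 0.583
```

The same formula works unchanged in any number of dimensions, which is why kNN extends naturally beyond our two-feature example.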
Simple, isn't it?
As another exercise, suppose the owner pours yet another glass for you to guess.
Please read the Chinese version for details.