Learn the Simplest Machine Learning Algorithm, k-NN, with OpenCV
Preface: OpenCV was built to provide a common infrastructure for computer vision applications. It has become a classic, comprehensive tool set of integrated algorithms for computer vision and machine learning. As an open-source project, researchers, commercial users, and government departments can easily use and modify its code.
The k-NN algorithm can be considered one of the simplest machine learning algorithms. This article shows how to implement it using basic OpenCV and Python.
01
Predicting Categories with a Classification Model: Posing the Problem
Suppose that in a town called Random Town, people are crazy about their two sports teams, the Random City Red Team and the Random City Blue Team. The Red Team has a long history and is well loved. But then some millionaires from out of town arrived, bought the Red Team's best scorer, and formed a new team, the Blue Team.
Besides making most Red Team fans unhappy, the best scorer went on to win championship after championship with the Blue Team. A few years later he returned to the Red Team, although some fans never forgave him for his earlier career choice.
In any case, you can see that relations between Red Team fans and Blue Team fans are not good. In fact, the two fan bases are so unwilling to be neighbors that they live in separate areas. I've even heard stories of Red Team fans deliberately moving away when Blue Team fans move in nearby. A true story!
Anyway, we arrive in town knowing none of this and try to sell Blue Team merchandise door to door. From time to time, though, a die-hard Red Team fan yells at us and chases us off the lawn for selling blue goods. Very unfriendly! If we could avoid those houses and visit only the homes of Blue Team fans, we would face less hostility and make much better use of our time.
Believing we can learn to predict where Red Team fans live, we start recording every visit: if we come to a Red Team fan's home, we draw a red triangle at that spot on our town map; otherwise, we draw a blue square. After a while, we have a good picture of where everyone lives.
Figure: Town map of Random Town
Now, however, we are standing in front of the house marked with a green circle on the map. Should we knock on the door? We look for clues about which team the owners support (perhaps a team flag on the back balcony), but we find none. How can we know whether it is safe to knock?
This silly example illustrates exactly the kind of problem a supervised learning algorithm can solve. We have a set of observations (the houses, their locations, and the team colors their owners support) that make up the training data. We can learn from this experience, so that when faced with the task of predicting which team the owner of a new house supports, we have enough information to make an informed guess.
As mentioned earlier, Red Team fans are so passionate about their team that they would never live next to Blue Team fans. Could we use this information and look at the neighboring houses to figure out what kind of fan lives in the new house?
This is exactly the problem the k-NN algorithm deals with.
02
Understanding the k-NN Algorithm
The k-NN algorithm can be considered one of the simplest machine learning algorithms because all we need to do is store the training data set. To predict a new data point, we simply find its nearest neighbors in the training data set.
Simply put, the k-NN algorithm assumes that a data point is likely to belong to the same class as its nearest neighbors. Think about it: if our neighbors were Red Team fans, we would probably be Red Team fans too; otherwise we would have moved away long ago. The same goes for Blue Team fans.
Of course, some neighborhoods may be a bit more complicated. In that case, we consider not just the class of the single nearest neighbor (k = 1) but the classes of the k nearest neighbors. For the example above, if we were Red Team fans, we probably would not move to a place where most of the neighbors are Blue Team fans.
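The nearest-neighbor idea described above can be sketched in a few lines of plain NumPy, independent of OpenCV. The function name `knn_predict` and the toy house coordinates below are made up for illustration:

```python
import numpy as np

def knn_predict(train_pts, train_labels, new_pt, k=3):
    """Classify new_pt by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_pts - new_pt, axis=1)  # Euclidean distances
    nearest = train_labels[np.argsort(dists)[:k]]       # labels of k closest
    return np.bincount(nearest).argmax()                # most common label

# toy map: three red houses (0) on the left, three blue houses (1) on the right
pts = np.array([[1., 1.], [2., 1.], [1., 2.], [8., 8.], [9., 8.], [8., 9.]])
lbl = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(pts, lbl, np.array([2., 2.])))  # → 0 (a red neighborhood)
```

With k = 1 only the single closest house decides; larger k makes the vote more robust to an isolated fan living in the "wrong" part of town.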
That's really all there is to it.
03
Implementing k-NN with OpenCV
With OpenCV, it is easy to create a k-NN model via the cv2.ml.KNearest_create() function. We then go through the following steps:
Generate some training data.
Specify the k value and create a k-NN object.
Find k nearest neighbors of the new data points you want to classify.
Use majority voting to assign class labels to new data points.
Draw the result diagram.
First, import all the necessary modules: OpenCV for the k-NN algorithm, NumPy for data handling, and Matplotlib for plotting. If you are working in a Jupyter Notebook, don't forget the %matplotlib inline magic command.
1. Generating training data
The first step is to generate some training data. We will use NumPy's random number generator to do this, fixing its seed so that rerunning the script always produces the same values.
In [3]: np.random.seed(42)
Okay, now we can start. So what should our training data look like?
In the previous example, the data points were the houses on the town map. Each data point has two features: the x and y coordinates of its location on the town map.
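One way to generate such data is with NumPy's random integers. The counts and coordinate range here (11 houses on a 100×100 map) are illustrative assumptions; OpenCV's `ml` module expects 32-bit float samples, hence the cast:

```python
import numpy as np

np.random.seed(42)

# 11 houses, each a random (x, y) position on a 100x100 town map;
# cast to float32 because OpenCV's ml module expects float samples
train_data = np.random.randint(0, 100, size=(11, 2)).astype(np.float32)

# one random label per house: 0 = red triangle, 1 = blue square
labels = np.random.randint(0, 2, size=(11, 1)).astype(np.float32)
```

Each row of `train_data` is one house; the matching row of `labels` records which team's fan lives there.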
Please read the Chinese version for details.