1 KNN recognizes handwritten digits

1.1 Using KNN (K-nearest neighbor algorithm) to recognize handwritten digits

1.1.1 Introduction to KNN

  1. KNN (K-Nearest Neighbor) is a supervised learning method. Its working mechanism is very simple, and it has no explicit training phase: the labeled training set is simply stored and consulted at prediction time. It is one of the simpler classical machine learning algorithms and can handle both classification and regression problems.
  2. Core idea

In the feature space, if most of the k samples nearest to a given sample (that is, its nearest neighbors in the feature space) belong to a certain category, then that sample is assigned to the same category.

More formally, given a training data set and a new input instance, the K-nearest neighbor algorithm finds the K instances in the training data set that are closest to the input instance (the K neighbors mentioned above). The input instance is then assigned to the class to which the majority of these K instances belong.
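The rule above can be written compactly. The notation is introduced here for illustration only: $x$ is the new instance with $n$ features, $(x_i, y_i)$ are the labeled training instances, and $N_K(x)$ is the set of the K training instances nearest to $x$. Euclidean distance is assumed as a common choice of distance; the text does not specify which distance is used.

```latex
d(x, x_i) = \sqrt{\sum_{j=1}^{n} \left( x^{(j)} - x_i^{(j)} \right)^2},
\qquad
\hat{y} = \arg\max_{c} \sum_{x_i \in N_K(x)} \mathbb{1}\left[ y_i = c \right]
```

The first expression measures how far each training instance is from $x$; the second counts the labels among the K nearest instances and picks the most frequent one.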

  3. Working principle

We start with a sample data set, also known as the training sample set, in which every sample has a label; that is, we know which category each sample belongs to. When unlabeled data is input, each feature of the new data is compared with the corresponding feature of the samples in the training set, and the classification labels of the most similar samples (the nearest neighbors) are extracted. Generally, only the top K most similar samples are considered, which is where the K in K-nearest neighbor comes from; K is usually an integer no greater than 20. Finally, the category that occurs most often among these K most similar samples is chosen as the category of the new data.

  4. Advantages and disadvantages of KNN
  5. KNN implementation steps:
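The working principle described above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of knn.py: the function name `knn_classify`, the toy data, and the use of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_classify(query, samples, labels, k=3):
    """Return the majority label among the k training samples nearest to query."""
    # Step 1: distance from the query to every labeled sample (Euclidean, by assumption).
    distances = [math.dist(query, sample) for sample in samples]
    # Step 2: indices of the k samples with the smallest distances.
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Step 3: majority vote over the labels of those k neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small clusters in a 2-D feature space.
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

print(knn_classify((0.9, 0.8), samples, labels, k=3))  # prints A
print(knn_classify((0.1, 0.2), samples, labels, k=3))  # prints B
```

Note that nothing is fitted in advance: all the work happens at query time, so the cost of one prediction grows with the size of the training set.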

1.1.2 Implementing the KNN algorithm, using handwritten digit recognition as an example

  1. Data set

The test set is used to evaluate the algorithm; see the directory ~/KNN/knn-digits/testDigits

  2. Use a drawing program on Windows to create a handwritten digit image

image-20220222151155632

image-20220305122748713

After drawing, save the picture as a PNG image (this example uses 8.png) and copy it to the project directory with the WinSCP tool.

  3. Convert the image (.png) to text (.txt)

Code location: ~/KNN/img2file.py

img_path: the name of the image file to convert

txt_name: the name of the converted text file, which is saved in the ~/KNN/knn-digits/testDigits directory; in this example it is 8_1.txt

After the program runs, a file named 8_1.txt is generated in the KNN directory. Open it to view the converted text, which looks like this:

image-20220222161205064

As you can see, the characters that mark the drawn strokes roughly trace the shape of the digit 8.

  4. Run the KNN recognition program

Code location: ~/KNN/knn.py

image-20220222162113689

As shown in the figure, the digit is recognized as 8, which is correct. If the recognition result differs from the true digit, copy the converted txt file to knn-digits/trainingDigits/, rename it following the pattern 8_801.txt, and then retrain so that the digit is recognized correctly.

Train first, then identify.
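For KNN, "training" amounts to loading the labeled samples. A minimal sketch of that loading step follows. It is not the contents of knn.py: the helper names are hypothetical, and it assumes that the part of the file name before the underscore is the digit label (as in 8_801.txt) and that each file holds rows of `0` and `1` characters.

```python
import os


def label_from_filename(filename):
    """Extract the digit label from a name such as '8_801.txt' (gives 8)."""
    return int(os.path.basename(filename).split("_")[0])


def text_to_vector(text):
    """Flatten rows of '0'/'1' characters into a single list of ints."""
    return [int(ch) for line in text.splitlines() for ch in line.strip()]


def load_digits(directory):
    """Read every .txt file in directory and return (vectors, labels)."""
    vectors, labels = [], []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            with open(os.path.join(directory, name)) as f:
                vectors.append(text_to_vector(f.read()))
            labels.append(label_from_filename(name))
    return vectors, labels


print(label_from_filename("8_801.txt"))    # prints 8
print(text_to_vector("0110\n1001\n"))      # prints [0, 1, 1, 0, 1, 0, 0, 1]
```

With this layout, adding a misrecognized sample to knn-digits/trainingDigits/ under the right name is enough for it to take part in the next vote, which is why copying the file and rerunning can correct the result.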