TensorFlow First Practice: A CAPTCHA Recognizer
Since mid-April, we have gradually picked up some basic knowledge of neural networks. We chose the TensorFlow framework for deep learning. Its graph-based style can take some getting used to at first, but working with TensorFlow also helps, to a certain extent, in understanding how a neural network is structured.
I am writing this on Labor Day to encourage myself to keep learning.
I. Project Introduction
1. Goal: Build a recognizer for digit-only CAPTCHAs
2. Principle: CNN (convolutional neural network)
3. Tool: TensorFlow
4. Significance: Become familiar with the TensorFlow framework, work through the forward-propagation process, and deepen the understanding of CNNs
II. Data Acquisition: ImageCaptcha
Both the training data and the test data come from the ImageCaptcha class in the captcha.image library.
ImageCaptcha generates a CAPTCHA image from the text passed to it.
Because this is the first time we work with image data, most of the early effort goes into image preprocessing.
First, write a script that tests generating a CAPTCHA text and its matching CAPTCHA image.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import random
# Load the data set generator
from captcha.image import ImageCaptcha
number = ['0','1','2','3','4','5','6','7','8','9']
alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
ALPHABET = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
A CAPTCHA is generated by randomly combining the digits and letters above. With a length of 4, there are 62*62*62*62 = 14,776,336 different labels in total.
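The label count can be checked with a few lines of Python. This is only a sketch: it assumes the character set is exactly the 10 digits plus 26 lowercase and 26 uppercase letters, and it builds that set from the standard `string` module rather than from the lists above.

```python
import string

# Assumed character set: 10 digits + 26 lowercase + 26 uppercase letters.
char_set = string.digits + string.ascii_lowercase + string.ascii_uppercase
captcha_size = 4

# Each of the 4 positions is drawn independently from the full character set,
# so the number of distinct labels is len(char_set) ** captcha_size.
num_labels = len(char_set) ** captcha_size
print(len(char_set), num_labels)
```

A label space this large is one reason to start with a smaller character set when training time is limited.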
# random_captcha_text returns a new CAPTCHA on every call; its characters are stored in a list.
def random_captcha_text(char_set=number + alphabet + ALPHABET, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):  # loop four times
        c = random.choice(char_set)  # random.choice picks one element at random
        captcha_text.append(c)
    return captcha_text
# Turn the CAPTCHA characters into the matching image using the captcha.image library.
def get_captcha_text_and_image():
    # Create an ImageCaptcha instance, which renders CAPTCHA images
    image = ImageCaptcha()
    # Get four random characters from random_captcha_text()
    captcha_text = random_captcha_text()
    # Join the four characters into one complete CAPTCHA string
    captcha_text = ''.join(captcha_text)
    # Render the image for the generated text; generate() returns an in-memory image stream
    captcha = image.generate(captcha_text)
    # PIL.Image.open() accepts a file path or a file-like object, so it reads the stream directly
    captcha_image = Image.open(captcha)
    # Convert the image to a NumPy array
    captcha_image = np.array(captcha_image)
    return captcha_text, captcha_image
Finally, check that the generated data is correct.
if __name__ == '__main__':
    text, image = get_captcha_text_and_image()
    f = plt.figure()
    ax = f.add_subplot(111)
    # Draw the CAPTCHA text inside the axes, near the top-left corner
    ax.text(0.1, 0.9, text, ha='center', va='center', transform=ax.transAxes)
    plt.imshow(image)
    plt.show()
The generated image matches its text label, which is the expected result.
III. Data Preprocessing
To save time, this exercise trains only a recognizer for digit-only CAPTCHAs.
The data acquisition step shows that the generated CAPTCHA is a color image, and its shape attribute shows that its dimensions are 60*160*3. The data therefore has to be preprocessed first, which comes down to:
1. Converting the color image to a grayscale image to simplify computation, by taking the mean along the color-channel axis (the lazy method)
2. Converting the image into an array (dimension reduction)
3. Converting the text CAPTCHA (e.g. 4123) into a vector, similar to one-hot encoding: each position of the CAPTCHA has 10 classes, the correct class is set to 1, and the rest are 0.
Supplementary: the reverse conversion from vector to text is also needed, because the vector-form CAPTCHA obtained in the final test phase has to be converted back into text.
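The preprocessing steps above can be sketched with NumPy as follows. The helper names `convert2gray`, `text2vec` and `vec2text` are hypothetical and chosen only for illustration. Reading "dimension reduction" as flattening the grayscale image into a 1-D array is an assumption, and a zero-filled array stands in for a real CAPTCHA image.

```python
import numpy as np

MAX_CAPTCHA = 4    # characters per CAPTCHA
CHAR_SET_LEN = 10  # classes per position (digits 0-9)


def convert2gray(img):
    """Average over the color-channel axis (the 'lazy' grayscale conversion)."""
    if img.ndim == 3:
        return img.mean(axis=-1)
    return img  # already single-channel


def text2vec(text):
    """Encode e.g. '4123' as a 40-element vector with a single 1 per position."""
    if len(text) != MAX_CAPTCHA or not text.isdigit():
        raise ValueError("expected exactly %d digits" % MAX_CAPTCHA)
    vec = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
    for i, ch in enumerate(text):
        vec[i * CHAR_SET_LEN + int(ch)] = 1
    return vec


def vec2text(vec):
    """Invert text2vec by taking the arg-max of each 10-wide slot."""
    slots = np.asarray(vec).reshape(MAX_CAPTCHA, CHAR_SET_LEN)
    return "".join(str(d) for d in slots.argmax(axis=1))


# Placeholder for a generated CAPTCHA image of shape 60*160*3.
fake_image = np.zeros((60, 160, 3), dtype=np.uint8)
gray = convert2gray(fake_image)  # shape (60, 160)
flat = gray.flatten()            # shape (9600,), assumed "dimension reduction"
label = text2vec("4123")         # shape (40,)
print(gray.shape, flat.shape, label.shape, vec2text(label))
```

Because `vec2text` takes the arg-max of each slot, it also works on a network's raw per-position scores, not only on exact 0/1 vectors.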
import numpy as np
import tensorflow as tf
from captcha.image import ImageCaptcha
import matplotlib.pyplot as plt
from PIL import Image
import random
# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160
MAX_CAPTCHA = 4  # maximum CAPTCHA length is 4
CHAR_SET_LEN = 10  # each CAPTCHA position has 10 classes
checkpoint_dir = ''  # path used to save the model
number = ['0','1','2','3','4','5','6','7','8','9']
# A CAPTCHA is generated by randomly combining the digits above. With a length of 4, there are 10 * 10 * 10 * 10 = 10,000 different labels.
# random_captcha_text returns a new CAPTCHA on every call; its characters are stored in a list.
def random_captcha_text(char_set=number, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):  # loop four times
        c = random.choice(char_set)  # random.choice picks one element at random
        captcha_text.append(c)
    return captcha_text
# Turn the CAPTCHA characters into the matching image using the captcha.image library.
def gen_captcha_text_and_image():
    # Create an ImageCaptcha instance, which renders CAPTCHA images
    image = ImageCaptcha()
    # Using random_ca