TensorFlow First Practice: CAPTCHA Recognizer

Since mid-April, we have gradually picked up some basic knowledge of neural networks. When choosing the TensorFlow framework for deep learning, its graph-based style may feel unfamiliar at first, but TensorFlow also helps us understand the structure of neural networks to some extent.

I would like to encourage myself to persist in learning on Labor Day.

I. Project Introduction

1. Goal: Build a recognizer for digits-only CAPTCHAs

2. Principle: CNN

3. Tools: TensorFlow

4. Significance: Become familiar with the TensorFlow framework, work through the forward propagation process, and deepen our understanding of CNNs

II. Data Acquisition: ImageCaptcha

The training data and test data come from the ImageCaptcha class in the captcha.image module.

ImageCaptcha generates a CAPTCHA image from the input text.

Since this is our first time processing image data, most of the early effort goes into image preprocessing.

First, write a script to test generating a text CAPTCHA and its corresponding image.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import random

# Load the data set generator
from captcha.image import ImageCaptcha

number = ['0','1','2','3','4','5','6','7','8','9']

alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

ALPHABET = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']

A CAPTCHA is generated by randomly combining the digits and letters above. With a CAPTCHA length of 4 and 62 possible characters per position, there are 62*62*62*62 different labels in total.
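As a quick check of the label count, the size of the label space is the character-set size raised to the CAPTCHA length. This is a minimal sketch; it assumes the standard-library string constants as stand-ins for the three character lists defined above.

```python
import string

# Assumption: these constants stand in for the number, alphabet and ALPHABET lists.
char_set = string.digits + string.ascii_lowercase + string.ascii_uppercase
captcha_size = 4

# Each position is drawn independently, with repetition allowed,
# so the number of distinct labels is len(char_set) ** captcha_size.
total_labels = len(char_set) ** captcha_size
print(len(char_set), total_labels)
```

With 62 characters and length 4 this gives 14,776,336 labels, which is one reason the later sections restrict the problem to digits only.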

# Define random_captcha_text: each call generates one CAPTCHA and returns its characters in a list.
def random_captcha_text(char_set=number + alphabet + ALPHABET, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):  # loop four times
        x = random.choice(char_set)  # random.choice picks one element at random
        captcha_text.append(x)
    return captcha_text

# Map the characters of the CAPTCHA to an image generated by the captcha.image module.
def get_captcha_text_and_image():
    # Create an ImageCaptcha instance to generate CAPTCHA images
    image = ImageCaptcha()
    # Use random_captcha_text() to get four characters
    captcha_text = random_captcha_text()
    # Join the four characters into one complete CAPTCHA string
    captcha_text = ''.join(captcha_text)
    # Generate the CAPTCHA image that corresponds to the text
    captcha = image.generate(captcha_text)
    # generate() returns an in-memory image stream; PIL.Image.open() reads the image from it
    captcha_image = Image.open(captcha)
    # Convert the image into an array
    captcha_image = np.array(captcha_image)
    return captcha_text, captcha_image

Finally, test whether the data is correct.

if __name__ == '__main__':
    text, image = get_captcha_text_and_image()
    f = plt.figure()
    ax = f.add_subplot(111)
    # Draw the CAPTCHA text on the figure so it can be compared with the image
    ax.text(0.1, 0.9, text, ha='center', va='center', transform=ax.transAxes)
    plt.imshow(image)
    plt.show()

The generated results are as shown above and are in line with the expected results.

III. Data Preprocessing

To save time, this exercise only trains a recognizer for CAPTCHAs made up of digits.

According to the data acquisition results, the generated CAPTCHA image is a color image, and its shape attribute shows dimensions of 60*160*3. We therefore need some preprocessing first, mainly:

1. Convert the color image to grayscale to simplify computation: take the mean along the color-channel axis (the lazy method)

2. Convert the image into a flat array (dimension reduction)

3. Convert the text CAPTCHA (e.g. 4123) into a vector, similar to one-hot encoding: each position of the CAPTCHA has 10 classes, the correct class is set to 1, and the rest are 0.
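The three steps above can be sketched with NumPy alone. This is an illustration under assumptions, not the original author's code: the helper names convert2gray and text2vec are hypothetical, and a random array stands in for a generated CAPTCHA image.

```python
import numpy as np

IMAGE_HEIGHT, IMAGE_WIDTH = 60, 160
MAX_CAPTCHA = 4    # characters per CAPTCHA
CHAR_SET_LEN = 10  # digits 0-9

def convert2gray(img):
    """Hypothetical helper: grayscale by averaging over the color-channel axis."""
    if img.ndim == 3:
        return img.mean(axis=-1)
    return img

def text2vec(text):
    """Hypothetical helper: encode a digit string as MAX_CAPTCHA one-hot blocks."""
    if len(text) > MAX_CAPTCHA:
        raise ValueError("CAPTCHA text is longer than MAX_CAPTCHA")
    vec = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
    for pos, ch in enumerate(text):
        # Block `pos` covers indices [pos * 10, pos * 10 + 9]; set the digit's slot to 1.
        vec[pos * CHAR_SET_LEN + int(ch)] = 1
    return vec

# Assumption: random pixels stand in for a real 60*160*3 CAPTCHA image.
rng = np.random.default_rng(0)
color = rng.integers(0, 256, size=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), dtype=np.uint8)

gray = convert2gray(color)  # step 1: shape (60, 160)
flat = gray.flatten()       # step 2: shape (9600,)
label = text2vec("4123")    # step 3: shape (40,)
print(gray.shape, flat.shape, label.shape)
```

For "4123" the ones land at indices 4, 11, 22 and 33, one per 10-wide block.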

Supplementary: converting a vector back into text. In the final test phase, the vector-form CAPTCHA that is obtained is converted back into text.
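The reverse conversion can be sketched as follows. The helper name vec2text is hypothetical, and the sketch assumes the vector layout described above: four consecutive blocks of 10 values, one block per position. Taking the argmax of each block works for exact one-hot labels and for real-valued scores alike.

```python
import numpy as np

MAX_CAPTCHA = 4    # characters per CAPTCHA
CHAR_SET_LEN = 10  # digits 0-9

def vec2text(vec):
    """Hypothetical helper: decode a length-40 vector into a digit string."""
    blocks = np.asarray(vec).reshape(MAX_CAPTCHA, CHAR_SET_LEN)
    # The index of the largest value in each block is the digit at that position.
    return "".join(str(d) for d in blocks.argmax(axis=1))

# Build the one-hot vector for "4123" by hand and decode it again.
vec = np.zeros(MAX_CAPTCHA * CHAR_SET_LEN)
for pos, digit in enumerate([4, 1, 2, 3]):
    vec[pos * CHAR_SET_LEN + digit] = 1

decoded = vec2text(vec)
print(decoded)
```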

import numpy as np
import tensorflow as tf
from captcha.image import ImageCaptcha
import matplotlib.pyplot as plt
from PIL import Image
import random

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160
MAX_CAPTCHA = 4  # maximum CAPTCHA length: 4
CHAR_SET_LEN = 10  # each position of the CAPTCHA has 10 classes
checkpoint_dir = ''  # path used to save the model

number = ['0','1','2','3','4','5','6','7','8','9']

# A CAPTCHA is generated by randomly combining the digits above. With a length of 4, there are 10 * 10 * 10 * 10 = 10,000 different labels.

# Define random_captcha_text: each call generates one CAPTCHA and returns its characters in a list.
def random_captcha_text(char_set=number, captcha_size=4):
    captcha_text = []
    for i in range(captcha_size):  # loop four times
        x = random.choice(char_set)  # random.choice picks one element at random
        captcha_text.append(x)
    return captcha_text

# Map the characters of the CAPTCHA to an image generated by the captcha.image module.
def gen_captcha_text_and_image():
    # Create an ImageCaptcha instance to generate CAPTCHA images
    image = ImageCaptcha()

# Using random_ca

Please read the Chinese version for details.
