Basic knowledge

Deep learning framework - entry level


This article is a quick start for the deep learning frameworks we use most often. There are seven in total: Caffe, TensorFlow, PyTorch, PaddlePaddle, Keras, MXNet, and CNTK.

We use a classification task as the running example. We have prepared 500 smiling images and 500 non-smiling images, placed in the data directory and already resized to 60*60. A preview follows.

Non-smiling images:

Smiling images:

Caffe

We start with Caffe, a mainstream open-source framework, and walk through the full workflow from training to test results. From here on, I assume you already know the basics of deep learning and understand how convolutional networks work.

1.1 What is Caffe

Caffe is written mainly in C++/CUDA and is one of the earliest deep learning frameworks, predating TensorFlow, MXNet, and PyTorch. It provides command-line, Python, and Matlab interfaces, is convenient to use in single-machine multi-GPU and multi-machine multi-GPU setups, and switches seamlessly between CPU and GPU.

For entry-level tasks such as image classification, Caffe has the lowest barrier to getting started: you can begin training with almost no code, so I recommend it as a first framework to learn.

Compared with frameworks such as TensorFlow that install with a single pip command, compiling and installing Caffe takes slightly more effort, but it is still simple. Taking Ubuntu 16.04 as an example, the installation commands from the official site are sufficient:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

After installing these dependencies, clone the code from Git and edit Makefile.config; then you can compile and install. If you run into problems, search Google first, and contact us if anything remains unresolved. Readers with a GPU also need to install CUDA and the Nvidia driver.

1.2 Caffe training

To run a training job, Caffe needs three things: train.prototxt, the network configuration file; solver.prototxt, the optimization parameter configuration file; and the training file list.

In addition, in most cases a pre-trained model is needed to initialize the weights.

(1) Prepare the network configuration file

We have prepared a 3*3 convolutional neural network and saved its definition in train.prototxt. You can see the file after cloning the code from Git.

Now let's analyze this Caffe network configuration file. Every layer is defined in the form layer{}. A layer's bottom and top are its input and output, and type is its kind: some are data layers, some are convolution layers, and some are loss layers.
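The bottom/top wiring can be pictured as a simple data structure. The following is a minimal Python sketch, not Caffe code: the layer names and the shortened layer list are hypothetical (they are not copied from train.prototxt), while the type strings are standard Caffe layer types. It checks that every bottom a layer reads was produced as a top by an earlier layer.

```python
# Hypothetical, simplified mirror of a Caffe layer list. Each entry has a
# name, a type, its inputs ("bottom") and its outputs ("top").
# Names and structure are illustrative, not taken from train.prototxt.
layers = [
    {"name": "data", "type": "ImageData", "bottom": [], "top": ["data", "label"]},
    {"name": "conv1", "type": "Convolution", "bottom": ["data"], "top": ["conv1"]},
    # ReLU is commonly run in place, so bottom and top share one blob name.
    {"name": "relu1", "type": "ReLU", "bottom": ["conv1"], "top": ["conv1"]},
    {"name": "fc", "type": "InnerProduct", "bottom": ["conv1"], "top": ["fc"]},
    {"name": "loss", "type": "SoftmaxWithLoss", "bottom": ["fc", "label"], "top": ["loss"]},
]


def check_wiring(layer_list):
    """Return (layer name, blob) pairs whose bottom was never produced as a top."""
    available = set()
    missing = []
    for layer in layer_list:
        for blob in layer["bottom"]:
            if blob not in available:
                missing.append((layer["name"], blob))
        available.update(layer["top"])
    return missing


if __name__ == "__main__":
    print(check_wiring(layers))
```

An empty result means the layers form a valid chain; a misspelled bottom name would show up as a missing blob.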

We use netscope to visualize the model.

The visualization shows the structure directly: the input layer is the data layer, followed by three convolution layers, each followed by a relu layer; finally, ip1-mouth and fc-mouth are fully connected layers. Loss and acc are the layers that compute the loss and the accuracy, respectively.

Each layer's configuration has parameters. For example, conv1 specifies the learning rate of the convolution kernels, the kernel size, the number of output channels, and the initialization method. These can be studied in detail later.
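To make the kernel size and the number of output channels concrete, here is a small Python sketch of the standard arithmetic for a convolution layer's output size and parameter count. The 60-pixel input comes from the image size mentioned earlier; the kernel size of 3, the strides, and the 12 output channels are assumptions for illustration, not values read from train.prototxt.

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension (floor division)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable weights, plus one bias per output channel if enabled."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    # Assumed values: 60x60 RGB input, 3x3 kernel, 12 output channels.
    print(conv_output_size(60, kernel_size=3, stride=1))
    print(conv_output_size(60, kernel_size=3, stride=2))
    print(conv_param_count(in_channels=3, out_channels=12, kernel_size=3))
```

Running the same calculation layer by layer shows how quickly the feature map shrinks, which helps when choosing the size of the first fully connected layer.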

(2) Prepare the training list

Look at the data layer above. Inside image_data_param there is this setting:

source: "all_shuffle_train.txt"

What is it? It is the list of inputs used for training, and its content looks like this:

../../../../datas/mouth/1/182smile.jpg 1
../../../../datas/mouth/1/435smile.jpg 1
../../../../datas/mouth/0/40neutral.jpg 0
../../../../datas/mouth/1/206smile.jpg 1
../../../../datas/mouth/0/458neutral.jpg 0
../../../../datas/mouth/0/158neutral.jpg 0
../../../../datas/mouth/1/322smile.jpg 1
../../../../datas/mouth/1/83smile.jpg 1
../../../../datas/mouth/0/403neutral.jpg 0
../../../../datas/mouth/1/425smile.jpg 1
../../../../datas/mouth/1/180smile.jpg 1
../../../../datas/mouth/1/233smile.jpg 1
../../../../datas/mouth/1/213smile.jpg 1
../../../../datas/mouth/1/144smile.jpg 1
../../../../datas/mouth/0/327neutral.jpg 0

The format is image path + space + label. This is Caffe's default input format for image classification.
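A list in this format is easy to generate with a few lines of Python. The sketch below is one possible way to do it, not the script used to produce all_shuffle_train.txt; the function names and the fixed shuffle seed are assumptions made for the example.

```python
import random


def make_list_lines(samples, seed=0):
    """Turn (image_path, label) pairs into shuffled 'path label' lines."""
    lines = [f"{path} {label}" for path, label in samples]
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(lines)
    return lines


def parse_list_line(line):
    """Split one list line back into (image_path, integer label)."""
    # Split on the last space so that paths containing spaces still parse.
    path, label = line.rsplit(" ", 1)
    return path, int(label)


if __name__ == "__main__":
    samples = [
        ("../../../../datas/mouth/1/182smile.jpg", 1),
        ("../../../../datas/mouth/0/40neutral.jpg", 0),
    ]
    for line in make_list_lines(samples):
        print(line)
```

Writing the returned lines to a text file, one per line, gives a file that the source setting of the data layer can point to.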

(3) Prepare the optimization configuration file:

net: "./train.prototxt"
test_iter: 100
test_interval: 10
base_lr: 0.00001
momentum: 0.9
type: "Adam"
lr_policy: "fixed"
display: 100
max_iter: 10000
snapshot: 2000
snapshot_prefix: "./snaps/conv3_finetune"
solver_mode: GPU

The parameters above are explained below.

net is the path to the network configuration. test_interval is the number of training iterations between tests. test_iter is the number of batches used in each test. If it equals 1, only a single batch of data is used for testing; if the batch size is too small, the metrics computed for a classification task are unreliable, so it is best for each test to use all of the test data. For this reason, test_iter*test_batchsize is usually set equal to the size of the test set.
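The relationship test_iter*test_batchsize = test set size can be turned into a small helper. This Python sketch is only an illustration: the function name is hypothetical, the example sizes are assumed rather than taken from the actual test list, and rounding up when the sizes do not divide evenly is a choice made here so that every test image is visited at least once.

```python
import math


def required_test_iter(test_set_size, test_batch_size):
    """Smallest test_iter whose batches cover the whole test set."""
    if test_set_size <= 0 or test_batch_size <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(test_set_size / test_batch_size)


if __name__ == "__main__":
    # Assumed sizes: with test_iter set to 100 as in the solver above,
    # a test batch size of 10 would cover a test set of 1000 images.
    print(required_test_iter(1000, 10))
    # When the division is not exact, the value is rounded up.
    print(required_test_iter(200, 16))
```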

base_lr, momentum, type, and lr_policy are parameters related to the learning rate. base_lr and lr_policy determine how the learning rate changes. type is the optimization method.
