Deploying Deep Learning with TensorFlow in Production (Part I: Environment Preparation)
I have recently been researching how to deploy TensorFlow Serving in a production environment, especially on GPU servers, and I ran into many pitfalls along the way. This post summarizes them as lessons learned.
1 System Background
The operating system is Ubuntu 16.04:
ubuntu@ubuntu:/usr/bin$ cat /etc/issue
Ubuntu 16.04.5 LTS \n \l
or
ubuntu@ubuntu:/usr/bin$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
The graphics card is an NVIDIA Tesla P40:
ubuntu@ubuntu:~$ nvidia-smi
Thu Jan  3 16:53:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0    49W / 250W |  22152MiB / 22912MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    108329      C   python                                     4963MiB  |
|    0    133840      C   tensorflow_model_server                   17179MiB  |
+-----------------------------------------------------------------------------+
TensorFlow is at the latest version, 1.12.0.
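The installed version can be confirmed from the command line:
python -c "import tensorflow as tf; print(tf.__version__)"
# 1.12.0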
2 Background Knowledge
Before introducing how to deploy, let's review the relevant concepts.
2.1 TensorFlow Serving
Reference material:
TensorFlow Serving technical architecture
TensorFlow Serving usage tutorial
TensorFlow Serving is a production deployment solution provided by Google. Generally speaking, after training, a model is exported and then used directly by the application.
The usual approach is to embed the TensorFlow model in a web service such as Flask and expose a REST API as the service interface. To handle concurrency and stay highly available, multi-process deployment is typically used, i.e. several Flask processes run on the same cloud server at once, each holding part of the GPU's resources, which is an obvious waste of resources.
Google offers a different idea for production environments: its tensorflow-serving service automatically loads all models under a given path and serves them directly over RPC or REST, using each model's predefined inputs, outputs, and computation graph.
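Concretely, the watched path holds one directory per model and one numeric subdirectory per version. A minimal sketch of the layout (the model name my_model and the path /models are illustrative):
/models/my_model/1/saved_model.pb   # version 1: serialized graph
/models/my_model/1/variables/       # version 1: weights
/models/my_model/2/...              # version 2: picked up automatically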
On the one hand, it supports hot deployment of multiple versions (for example, if production currently serves a version-1 model and training produces a version-2 model, TensorFlow Serving automatically loads the new version and stops serving the previous one).
On the other hand, TensorFlow Serving achieves high availability through asynchronous invocation, and automatically batches incoming requests to make better use of GPU compute resources.
The invocation path of the whole model therefore becomes:
client -> web service (Flask or Tornado) -> gRPC or REST -> TensorFlow Serving
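Once the serving process is up, the REST endpoint can be exercised directly. A minimal sketch, assuming the default REST port 8501, a model named my_model, and a model that takes a vector of three floats:
# query the predict endpoint of the (hypothetical) my_model
curl -X POST http://localhost:8501/v1/models/my_model:predict \
    -d '{"instances": [[1.0, 2.0, 5.0]]}'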
If we want to replace the model or roll out a new version, we only need to train it and save the export to the fixed directory.
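For example (all paths illustrative), rolling out version 2 is just a copy into the watched directory:
# tensorflow_model_server notices the new version directory,
# loads it, and retires version 1
cp -r /tmp/export/2 /models/my_model/2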
2.2 Docker
Reference material:
Docker tutorial
Docker in Practice
Docker is, simply put, a container technology. Anyone who has done technical support knows the pain of installing software: different system environments lead to all kinds of installation errors. Docker solves this problem; as long as Docker is installed on the server, it shields all the underlying hardware differences, and you can pull a single image and start providing services directly.
Installing Docker is also easy. On macOS, download the DMG file and double-click to install; on Ubuntu, simply run:
sudo apt-get install docker.io
However, after installation on Ubuntu, Docker can only be used as root. If you want other users to use it, they need to be added to the docker group; the details are easy to look up.
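Roughly, the standard post-install steps are:
# create the docker group (it may already exist) and add the current user
sudo groupadd docker
sudo usermod -aG docker $USER
# log out and back in for the group change to take effect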
There are only a few commonly used commands:
# list currently running containers (deployed services)
docker ps
# run a container service
docker run
# stop (kill) a running container
docker kill XXX
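As a concrete docker run example, the official serving image can be started roughly like this (a sketch; the host model path and the model name my_model are assumptions):
docker pull tensorflow/serving
# map the REST port and bind-mount the exported model into the container
docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/models/my_model,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving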
2.3 Nvidia-docker
Reference material:
nvidia-docker official GitHub repository
Because Docker is virtualized on top of the operating system, it shields a lot of underlying information. If you want to use the graphics card hardware, one idea is to map the host's driver and compute libraries directly into the container, but that sacrifices portability.
The other way is to mount a driver-like plug-in when the container starts: this is nvidia-docker.
For details, you can read the Chinese version of the documentation.
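With nvidia-docker 2 installed, the GPU build of the serving image is started by selecting the nvidia runtime. A sketch under assumptions (image tag, device index, and model path are illustrative):
# expose GPU 0 to the container via the nvidia runtime
docker run --runtime=nvidia -p 8501:8501 \
    -e NVIDIA_VISIBLE_DEVICES=0 \
    --mount type=bind,source=/path/to/models/my_model,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving:latest-gpu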