Deploying Deep Learning with TensorFlow in Production (Part I: Environment Preparation)
I have recently been researching how to deploy TensorFlow Serving in a production environment, especially on GPU servers, and I ran into many pitfalls along the way. This post summarizes them as lessons learned.
1 System Background
The operating system is Ubuntu 16.04:
ubuntu@ubuntu:/usr/bin$ cat /etc/issue
Ubuntu 16.04.5 LTS \n \l
or
ubuntu@ubuntu:/usr/bin$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
The graphics card is an NVIDIA Tesla P40:
ubuntu@ubuntu:~$ nvidia-smi
Thu Jan  3 16:53:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0    49W / 250W |  22152MiB / 22912MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    108329      C   python                                     4963MiB  |
|    0    133840      C   tensorflow_model_server                   17179MiB  |
+-----------------------------------------------------------------------------+
TensorFlow is at the latest version, 1.12.0.
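The installed version can be confirmed from the command line:
python -c "import tensorflow as tf; print(tf.__version__)"
# 1.12.0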
2 Background Knowledge
Before introducing how to deploy, let's review the relevant concepts.
2.1 TensorFlow Serving
Reference material:
TensorFlow Serving technical architecture
TensorFlow Serving usage tutorial
TensorFlow Serving is a production deployment solution provided by Google. Generally speaking, after training, a model is exported and then used directly by the application.
The usual approach is to embed the TensorFlow model in a web service such as Flask and expose a REST API as the service interface. To handle concurrency and stay highly available, multi-process deployment is typically used, i.e. several Flask processes run on the same cloud server at once, each holding part of the GPU's resources, which is an obvious waste of resources.
Google offers a different idea for production environments: its tensorflow-serving service automatically loads all models under a given path and serves them directly over RPC or REST, using each model's predefined inputs, outputs, and computation graph.
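Concretely, the watched path holds one directory per model and one numeric subdirectory per version. A minimal sketch of the layout (the model name my_model and the path /models are illustrative):
/models/my_model/1/saved_model.pb   # version 1: serialized graph
/models/my_model/1/variables/       # version 1: weights
/models/my_model/2/...              # version 2: picked up automatically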
On the one hand, it supports hot deployment of multiple versions (for example, if production currently serves a version-1 model and training produces a version-2 model, TensorFlow Serving automatically loads the new version and stops serving the previous one).
On the other hand, TensorFlow Serving achieves high availability through asynchronous invocation, and automatically batches incoming requests to make better use of GPU compute resources.
The invocation path of the whole model therefore becomes:
client -> web service (Flask or Tornado) -> gRPC or REST -> TensorFlow Serving
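Once the serving process is up, the REST endpoint can be exercised directly. A minimal sketch, assuming the default REST port 8501, a model named my_model, and a model that takes a vector of three floats:
# query the predict endpoint of the (hypothetical) my_model
curl -X POST http://localhost:8501/v1/models/my_model:predict \
    -d '{"instances": [[1.0, 2.0, 5.0]]}'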
If we want to replace the model or roll out a new version, we only need to train it and save the export to the fixed directory.
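For example (all paths illustrative), rolling out version 2 is just a copy into the watched directory:
# tensorflow_model_server notices the new version directory,
# loads it, and retires version 1
cp -r /tmp/export/2 /models/my_model/2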
2.2 Docker
Reference material:
Docker tutorial
Docker in Practice
Docker is, simply put, a container technology. Anyone who has done technical support knows the pain of installing software: different system environments lead to all kinds of installation errors. Docker solves this problem; as long as Docker is installed on the server, it shields all the underlying hardware differences, and you can pull a single image and start providing services directly.
Installing Docker is also easy. On macOS, download the DMG file and double-click to install; on Ubuntu, simply run:
sudo apt-get install docker.io
However, after installation on Ubuntu, Docker can only be used as root. If you want other users to use it, they need to be added to the docker group; the details are easy to look up.
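Roughly, the standard post-install steps are:
# create the docker group (it may already exist) and add the current user
sudo groupadd docker
sudo usermod -aG docker $USER
# log out and back in for the group change to take effect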
There are only a few commonly used commands:
# list currently running containers (deployed services)
docker ps
# run a container service
docker run
# stop (kill) a running container
docker kill XXX
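As a concrete docker run example, the official serving image can be started roughly like this (a sketch; the host model path and the model name my_model are assumptions):
docker pull tensorflow/serving
# map the REST port and bind-mount the exported model into the container
docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/models/my_model,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving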
2.3 Nvidia-docker
Reference material:
nvidia-docker official GitHub repository
Because Docker is virtualized on top of the operating system, it shields a lot of underlying information. If you want to use the graphics card hardware, one idea is to map the host's driver and compute libraries directly into the container, but that sacrifices portability.
The other way is to mount a driver-like plug-in when the container starts: this is nvidia-docker.
For details, you can read the Chinese version of the documentation.
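With nvidia-docker 2 installed, the GPU build of the serving image is started by selecting the nvidia runtime. A sketch under assumptions (image tag, device index, and model path are illustrative):
# expose GPU 0 to the container via the nvidia runtime
docker run --runtime=nvidia -p 8501:8501 \
    -e NVIDIA_VISIBLE_DEVICES=0 \
    --mount type=bind,source=/path/to/models/my_model,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving:latest-gpu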