
My Way to AI: Building a GPU Deep Learning Environment from a Bare Machine

I've been running on CPU for a long time, and since my work is mostly NLP I could just about put up with it. Recently I started working with images, and the CPU can no longer cope. Fortunately, my managers agreed to buy a virtual machine so I could try it out first. Because the machine was newly purchased, I had to work out the environment myself. I fell into several traps and took many detours, so I am recording the correct process for installing a deep learning environment from a bare machine. (Everything is done as the root user!)

A Brief Introduction to the Bare Machine

The server is an Aliyun CentOS 7.4 instance. The CUDA driver version selected by default is wrong: for TensorFlow versions above 1.5 you should choose CUDA 9.0. Not too high and not too low; TF is very picky!

First, here is what the bare machine already comes with.

Git: 1.8.x, if I remember correctly.

Python: 2.7

Let's start the installation.

Step 1: Upgrade to Python 3

The version I chose is 3.6.6. My rule of thumb is to take the newest release series that is considered stable, and within it the highest point release. If you want to download it to your local machine, you can go directly to this address:

https://www.python.org/ftp/python/3.6.6/

To download it directly on the server, use:

wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz

Then unpack the archive:

tar zxvf Python-3.6.6.tgz

Enter the extracted directory:

cd Python-3.6.6

First create the Python 3 installation directory:

mkdir /usr/local/python3

Compile and install:

./configure --prefix=/usr/local/python3

make && make install

Rename the old Python executable:

mv /usr/bin/python /usr/bin/python_old2

Create a new symbolic link for Python 3:

ln -s /usr/local/python3/bin/python3 /usr/bin/python

Verify with commands:

[root@izwz9fnfgk9709s3h9ex47z ~]# python -V

Python 3.6.6

Then don't forget to link the pip3 executable as well:

ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3

Now you can install Python 3 modules through pip3.

[root@izwz9fnfgk9709s3h9ex47z ~]# pip3 -V

pip 18.0 from /usr/local/python3/lib/python3.6/site-packages/pip (python 3.6)

At this point, the Python upgrade is complete.

Because /usr/bin/python now points to Python 3, system scripts written for Python 2 may break. Two scripts need to be edited, /usr/bin/yum and /usr/libexec/urlgrabber-ext-down: change the first line of each from #!/usr/bin/python to #!/usr/bin/python2.7.
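
The shebang change described above can also be scripted. The following is only a minimal sketch, assuming GNU sed (the default on CentOS); the function name pin_python2_shebang is my own invention, not part of yum or any other tool. It touches only the first line of each file, and only when that line is a plain /usr/bin/python shebang (an optional space after #! is tolerated).

```shell
# pin_python2_shebang FILE...
# Hypothetical helper: if the first line of FILE is "#!/usr/bin/python"
# (optionally written "#! /usr/bin/python"), rewrite it to
# "#!/usr/bin/python2.7". All other lines, and files with any other
# shebang, are left unchanged. Assumes GNU sed for the -i option.
pin_python2_shebang() {
    for f in "$@"; do
        sed -i '1s|^#![[:space:]]*/usr/bin/python[[:space:]]*$|#!/usr/bin/python2.7|' "$f"
    done
}

# On the server this would be run as root, for example:
# pin_python2_shebang /usr/bin/yum /usr/libexec/urlgrabber-ext-down
```

It is worth keeping a backup copy of both files before running anything like this on a real system.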

In addition, because I am on Aliyun, pip downloads packages very quickly. On other providers or on physical machines, you sometimes need to point pip at a mirror temporarily, which you can do with the following command:

pip3 install xxx -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
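
If you are installing several packages in a row, repeating the -i and --trusted-host options gets tedious. One possible alternative, sketched below, relies on pip reading its options from PIP_<OPTION> environment variables, so the mirror applies only to the current shell session. The two function names are hypothetical helpers of mine, not pip commands.

```shell
# use_aliyun_pypi
# Hypothetical helper: point pip at the Aliyun mirror for this shell
# session only. pip reads PIP_INDEX_URL and PIP_TRUSTED_HOST as if
# -i and --trusted-host had been passed on the command line.
use_aliyun_pypi() {
    export PIP_INDEX_URL="http://mirrors.aliyun.com/pypi/simple/"
    export PIP_TRUSTED_HOST="mirrors.aliyun.com"
}

# use_default_pypi
# Hypothetical helper: drop the override and return to pip's default index.
use_default_pypi() {
    unset PIP_INDEX_URL PIP_TRUSTED_HOST
}

# Usage:
# use_aliyun_pypi
# pip3 install xxx
```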

Reference: https://www.cnblogs.com/idotest/p/5442173.html

Step 2: Upgrade Git

The default Git on CentOS is 1.8, which is a bit old, so it is worth reinstalling.

Before installing, you need GCC and a few related dependencies. Run the following commands in order:

# Install GCC and related dependencies

yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel

yum install gcc perl-ExtUtils-MakeMaker

# You can choose the latest version to download here.

wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.15.tar.gz

tar zxvf libiconv-1.15.tar.gz

cd libiconv-1.15

./configure --prefix=/usr/local/libiconv

make && make install

# Uninstall the old git first

yum remove git

# Download the new Git

wget https://github.com/git/git/archive/v2.18.0.tar.gz

tar zxvf v2.18.0.tar.gz

cd git-2.18.0

make configure

./configure --prefix=/usr/local/git --with-iconv=/usr/local/libiconv

make all doc

make install install-doc install-html

echo "export PATH=$PATH:/usr/local/git/bin" >> /etc/bashrc

source /etc/bashrc

Then verify again:

[root@izwz9fnfgk9709s3h9ex47z soft]# git --version

git version 2.18.0

My development workflow is as follows. We have our own Git repository; I write code locally and push it to GitLab, then clone the code onto the Aliyun server and run it there. Because the repository contains a lot of image resources, there is a trick worth using when cloning:

git clone XXXX --depth 1

This limits the depth of the clone; otherwise git downloads the entire commit history. If many of the training images or models have since been deleted or replaced, there is no need to download them.
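
To see the effect of --depth 1 for yourself, you can count the commits reachable from HEAD after cloning. The sketch below wraps the clone command above and adds that check; shallow_clone and commit_count are hypothetical helper names of mine, and git -C assumes a reasonably recent Git such as the 2.18.0 installed above.

```shell
# shallow_clone URL DIR
# Hypothetical wrapper around the command above: fetch only the newest
# commit of the default branch into DIR.
shallow_clone() {
    git clone --quiet --depth 1 "$1" "$2"
}

# commit_count DIR
# Print how many commits are reachable from HEAD in the repository at DIR.
commit_count() {
    git -C "$1" rev-list --count HEAD
}

# Usage (XXXX stands for your repository URL, as above):
# shallow_clone XXXX myproject
# commit_count myproject
```

In a shallow clone the count should be 1 regardless of how long the real history is. If the full history is needed later, git fetch --unshallow retrieves it.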

Reference: https://blog.csdn.net/z_dianjun/article/details/50819908

Step 3: Install CUDA

I remember that back in school I wrote a hands-on tutorial on installing CUDA 6 on Windows, and quite a lot of people read it at the time. This time, installing on Linux is easier.

First, go to the official website and download CUDA. If you don't know which CUDA version to install, first decide which version of TensorFlow you want to use, then look at the configure.py file in the TensorFlow GitHub repository:

https://github.com/tensorflow/tensorflow/blob/3379bae787d73d6db67d66a284bd1a076b2cbdba/configure.py

It lists the corresponding CUDA version:

_DEFAULT_CUDA_VERSION = '9.0'

_DEFAULT_CUDNN_VERSION = '7'

_DEFAULT_NCCL_VERSION = '2.2'

_DEFAULT_CUDA_COMPUTE_CAPABILITIES = '3.5,7.0'

_DEFAULT_CUDA_PATH = '/usr/local/cuda'

_DEFAULT_CUDA_PATH_LINUX = '/opt/cuda'

_DEFAULT_CUDA_PATH_WIN = ('C:/Program Files/NVIDIA GPU Computing '
                          'Toolkit/CUDA/v%s' % _DEFAULT_CUDA_VERSION)

_DEFAULT_TENSORRT_PATH_LINUX = '/usr/lib/%s-linux-gnu' % platform.machine()

_TF_OPENCL_VERSION = '1.2'

_DEFAULT_COMPUTECPP_TOOLKIT_PATH = '/usr/local/computecpp'

_DEFAULT_TRIS
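
If you check these defaults often, for example when comparing TensorFlow releases, the lookup can be scripted. The following is a rough sketch, assuming you have already saved a copy of configure.py locally; tf_default is a hypothetical helper name, and it only understands simple one-line assignments of a single-quoted string like the ones listed above.

```shell
# tf_default NAME [FILE]
# Hypothetical helper: print the single-quoted value assigned to
# _DEFAULT_<NAME> in a local copy of TensorFlow's configure.py.
# FILE defaults to ./configure.py. Multi-line or computed assignments
# (such as _DEFAULT_CUDA_PATH_WIN) are not handled and print nothing.
tf_default() {
    sed -n "s/^_DEFAULT_$1[[:space:]]*=[[:space:]]*'\([^']*\)'[[:space:]]*\$/\1/p" "${2:-configure.py}"
}

# Usage:
# tf_default CUDA_VERSION configure.py
# tf_default CUDNN_VERSION configure.py
```

With the file contents listed above, the two usage lines would print 9.0 and 7, which are the CUDA and cuDNN versions to install for that TensorFlow commit.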

Please read the Chinese version for details.
