My Road to AI: Building a GPU Deep Learning Environment from a Bare Machine
I've been running everything on CPU for a long time. Since I work mainly on NLP, that was just about tolerable, but recently I started working with images and the CPU could no longer cope. Fortunately, my managers agreed to buy a virtual machine so I could try it out first. Because the machine was newly purchased, I had to work out the environment by myself. I fell into many traps and took many detours, so I recorded the correct process for installing a deep learning environment from a bare machine. (Everything here is done as the root user!)
A Brief Introduction to the Bare Machine
The server is Aliyun's CentOS 7.4. The CUDA driver selected by default is wrong: for tensorflow 1.5 and above you should choose CUDA 9.0. Make sure the version is neither too high nor too low; TF is very picky!
First, here is what is already available on the bare machine.
Git: a 1.8.x version, as far as I can remember.
Python: 2.7
Now let's start the installation.
Step 1: Upgrade to Python 3
The version I chose here is 3.6.6. My rule of thumb is to pick the newest stable major release line and then the highest minor release within it. If you want to download it locally, you can go directly to this address:
https://www.python.org/ftp/python/3.6.6/
To download directly on the server, use this command:
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
Then extract the archive:
tar zxvf Python-3.6.6.tgz
Then enter the directory:
cd Python-3.6.6
First create the Python 3 installation directory:
mkdir /usr/local/python3
Start compiling and installing:
./configure --prefix=/usr/local/python3
make && make install
Rename the old Python executable:
mv /usr/bin/python /usr/bin/python_old2
Create a new symbolic link for Python 3:
ln -s /usr/local/python3/bin/python3 /usr/bin/python
Verify with commands:
[root@izwz9fnfgk9709s3h9ex47z ~]# python -V
Python 3.6.6
Then don't forget to add a symbolic link for the pip3 executable:
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
Now you can install Python 3 modules through pip3.
[root@izwz9fnfgk9709s3h9ex47z ~]# pip3 -V
pip 18.0 from /usr/local/python3/lib/python3.6/site-packages/pip (python 3.6)
At this point, the Python upgrade is complete.
Because /usr/bin/python now points to Python 3, system scripts written for the old version may break. We therefore need to modify two scripts, /usr/bin/yum and /usr/libexec/urlgrabber-ext-down, changing the first line of each from #!/usr/bin/python to #!/usr/bin/python2.7.
In addition, I'm on Aliyun, where pip downloads are very fast. On other services or physical machines you sometimes need to temporarily specify the package index (mirror) for pip, which you can do with the following command:
pip3 install xxx -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
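The shebang change described above can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with CentOS); fix_shebang is a hypothetical helper name of my own, not part of yum or CentOS. It only touches the first line, and only when that line is exactly the old interpreter path.

```shell
#!/usr/bin/env bash
# Sketch only: fix_shebang is a hypothetical helper, not a system tool.
# It rewrites line 1 of a script from /usr/bin/python to /usr/bin/python2.7.
fix_shebang() {
    local script="$1"
    # Match "#!/usr/bin/python" (optional spaces after "#!") on line 1 only.
    # A line that already says python2.7 does not match and is left alone.
    sed -i '1s|^#![[:space:]]*/usr/bin/python[[:space:]]*$|#!/usr/bin/python2.7|' "$script"
}

# Intended use on the machine described above (as root, after a backup):
#   cp /usr/bin/yum /usr/bin/yum.bak
#   fix_shebang /usr/bin/yum
#   fix_shebang /usr/libexec/urlgrabber-ext-down
```

Restricting the substitution to line 1 keeps any other mention of /usr/bin/python inside the script untouched.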
Reference: https://www.cnblogs.com/idotest/p/5442173.html
Step 2: Upgrade Git
The default Git on CentOS is 1.8, which is a bit old, so we can install a newer one.
Before installing, you need GCC and some related dependencies. Run the following script step by step:
# Install GCC and related dependencies
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
# Here you can choose the latest version to download
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.15.tar.gz
tar zxvf libiconv-1.15.tar.gz
cd libiconv-1.15
./configure --prefix=/usr/local/libiconv
make && make install
# Uninstall the old git
yum remove git
# Download the new Git
wget https://github.com/git/git/archive/v2.18.0.tar.gz
tar zxvf v2.18.0.tar.gz
cd git-2.18.0
make configure
./configure --prefix=/usr/local/git --with-iconv=/usr/local/libiconv
make all doc
make install install-doc install-html
echo "export PATH=$PATH:/usr/local/git/bin" >> /etc/bashrc
source /etc/bashrc
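One caveat with appending to /etc/bashrc: running the script a second time adds the same export line again. A small sketch of an idempotent variant follows; add_line_once is a hypothetical helper name of my own, and the single quotes in the usage comment are my choice so that $PATH is expanded at login rather than when the line is written.

```shell
#!/usr/bin/env bash
# Sketch only: add_line_once is a hypothetical helper.
# add_line_once FILE LINE appends LINE to FILE unless an identical line exists.
add_line_once() {
    local file="$1" line="$2"
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use for the Git install above:
#   add_line_once /etc/bashrc 'export PATH=$PATH:/usr/local/git/bin'
#   source /etc/bashrc
```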
Then verify again:
[root@izwz9fnfgk9709s3h9ex47z soft]# git --version
git version 2.18.0
My development workflow is as follows. We have our own git repository: I write code locally and push it to GitLab, then clone the code on the Aliyun server and run it there. Because the repository contains many image resources, there is a trick when cloning:
git clone XXXX --depth 1
This specifies the clone depth; otherwise git downloads the entire commit history. If many training images or models have been deleted or replaced over time, there is no need to download those old versions.
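To see the effect of the depth option, you can compare commit counts. This is a small sketch; shallow_clone and count_commits are hypothetical helper names of my own, and it assumes a Git new enough to support `git -C` (1.8.5 or later, so the 2.18.0 built above qualifies).

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical wrappers around real git commands.

# shallow_clone URL DIR: fetch only the most recent commit of the default branch.
shallow_clone() {
    git clone --depth 1 "$1" "$2"
}

# count_commits DIR: number of commits reachable from HEAD in that repository.
count_commits() {
    git -C "$1" rev-list --count HEAD
}

# Intended use (XXXX stands for your repository URL, as in the text):
#   shallow_clone XXXX myproject
#   count_commits myproject    # a depth-1 clone reports a single commit
```

If the full history is needed later, `git fetch --unshallow` run inside the clone downloads the rest.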
Reference: https://blog.csdn.net/z_dianjun/article/details/50819908
Step 3: Install CUDA
I remember that back in school I wrote a step-by-step tutorial on installing CUDA 6 on Windows, and a lot of people read it at the time. Installing on Linux this time is easier.
First go to the official website to download CUDA. If you don't know which version of CUDA to install, first decide which version of tensorflow you want to use, then look at the configure.py file in tensorflow's GitHub repository:
https://github.com/tensorflow/tensorflow/blob/3379bae787d73d6db67d66a284bd1a076b2cbdba/configure.py
The corresponding CUDA version is listed there:
_DEFAULT_CUDA_VERSION = '9.0'
_DEFAULT_CUDNN_VERSION = '7'
_DEFAULT_NCCL_VERSION = '2.2'
_DEFAULT_CUDA_COMPUTE_CAPABILITIES = '3.5,7.0'
_DEFAULT_CUDA_PATH = '/usr/local/cuda'
_DEFAULT_CUDA_PATH_LINUX = '/opt/cuda'
_DEFAULT_CUDA_PATH_WIN = ('C:/Program Files/NVIDIA GPU Computing '
                          'Toolkit/CUDA/v%s' % _DEFAULT_CUDA_VERSION)
_DEFAULT_TENSORRT_PATH_LINUX = '/usr/lib/%s-linux-gnu' % platform.machine()
_TF_OPENCL_VERSION = '1.2'
_DEFAULT_COMPUTECPP_TOOLKIT_PATH = '/usr/local/computecpp'
_DEFAULT_TRIS
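Instead of reading the file by eye, the default CUDA version can be pulled out with sed. This is a rough sketch, assuming configure.py has already been saved locally and that the tensorflow version in question defines _DEFAULT_CUDA_VERSION in this form; default_cuda_version is a hypothetical helper name of my own.

```shell
#!/usr/bin/env bash
# Sketch only: default_cuda_version is a hypothetical helper.
# It prints the quoted value assigned to _DEFAULT_CUDA_VERSION in FILE,
# or nothing if the file has no such assignment.
default_cuda_version() {
    sed -n "s/^_DEFAULT_CUDA_VERSION[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# Intended use, with configure.py in the current directory:
#   default_cuda_version configure.py
```

For the configure.py listing shown above, this would print the value of the first line of the listing.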
Please read the Chinese version for details.