Deep Learning Environment Setup and Deployment (DeepLearning Neural Networks)
Working environment
System: Ubuntu 16.04.5 LTS
Graphics card: GPU
NVIDIA driver: 410.93
CUDA: 10.0
Python: 3.x
Software to deploy
conda environment
nccl2 environment
openmpi environment
horovod environment
1. Create the conda environment
Download a suitable installer file, then run it.
cd init
sudo wget /archive/Anaconda3-2019.03-Linux-x86_64.sh
bash Anaconda3-2019.03-Linux-x86_64.sh
Follow the prompts and choose an installation directory; by default Anaconda is installed under ~/anaconda3/.
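For scripted setups the installer can also run without prompts. This is a sketch, not part of the original steps: `install_anaconda` is a hypothetical helper, and it relies on the Anaconda installer's `-b` (batch mode, no questions asked) and `-p` (installation prefix) options. Batch mode does not run `conda init`, so conda still has to be initialized afterwards as described next.

```shell
# Hypothetical helper (sketch): unattended Anaconda install.
# Assumes the installer accepts -b (batch mode) and -p PREFIX.
install_anaconda() {
  local installer="$1"
  local prefix="${2:-$HOME/anaconda3}"   # same default as the interactive install

  if [ ! -f "$installer" ]; then
    echo "installer not found: $installer" >&2
    return 1
  fi
  # Do not reinstall over an existing installation.
  if [ -x "$prefix/bin/conda" ]; then
    echo "conda already present in $prefix"
    return 0
  fi
  bash "$installer" -b -p "$prefix"
}
```

Usage, from the download directory: `install_anaconda Anaconda3-2019.03-Linux-x86_64.sh`.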
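Note on initialization:
(1) If you keep the default and do not initialize, there is no conda command after installation, and you have to initialize conda manually.
Note: to avoid exposing the username, it has been replaced with $USER throughout.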
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>>
You have chosen to not have conda modify your shell scripts at all.
To activate conda's base environment in your current shell session:
eval "$(/home/$USER/anaconda3/bin/conda shell.YOUR_SHELL_NAME hook)"
To install conda's shell functions for easier access, first activate, then:
conda init
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
Thank you for installing Anaconda3!
===========================================================================
Anaconda and JetBrains are working together to bring you Anaconda-powered
environments tightly integrated in the PyCharm IDE.
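If you answered no, the `eval` hint printed above can be wrapped in a function that loads conda into the current bash session only, leaving ~/.bashrc untouched. A sketch: `load_conda` is a hypothetical helper, and it assumes bash (so `shell.bash` stands in for YOUR_SHELL_NAME) and the default ~/anaconda3 prefix.

```shell
# Hypothetical helper (sketch): load conda into the *current* bash session only,
# following the installer's hint: eval "$(<prefix>/bin/conda shell.bash hook)".
# Nothing is written to ~/.bashrc.
load_conda() {
  local prefix="${1:-$HOME/anaconda3}"
  if [ ! -x "$prefix/bin/conda" ]; then
    echo "conda not found under $prefix" >&2
    return 1
  fi
  # The hook prints shell code (the conda function, PATH changes); eval runs it here.
  eval "$("$prefix/bin/conda" shell.bash hook)"
}
```

Because it changes the calling shell, define it in (or source it into) your interactive shell and then run `load_conda && conda activate py36`; calling it from a separate script only affects that script's process.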
(2) If you choose to initialize, the installer modifies ~/.bashrc and sets up the conda command.
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
WARNING: pat module is deprecated and will be removed in a future release.
no change /home/$USER/anaconda3/condabin/conda
no change /home/$USER/anaconda3/bin/conda
no change /home/$USER/anaconda3/bin/conda-env
no change /home/$USER/anaconda3/bin/activate
no change /home/$USER/anaconda3/bin/deactivate
no change /home/$USER/anaconda3/etc/profile.d/conda.sh
no change /home/$USER/anaconda3/etc/fish/conf.d/conda.fish
no change /home/$USER/anaconda3/shell/condabin/Conda.psm1
no change /home/$USER/anaconda3/shell/condabin/conda-hook.ps1
no change /home/$USER/anaconda3/lib/python3.7/site-packages/xonsh/conda.xsh
no change /home/$USER/anaconda3/etc/profile.d/conda.csh
modified /home/$USER/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
Thank you for installing Anaconda3!
===========================================================================
Anaconda and JetBrains are working together to bring you Anaconda-powered environments tightly integrated in the PyCharm IDE.
PyCharm for Anaconda is available at:
/pycharm
Run the following command so that the conda setup takes effect in the current shell:
source ~/.bashrc
2. Enter a Python 3.6 conda environment
conda create -n py36 python=3.6
conda activate py36
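If the setup is scripted and may run more than once, the environment can be created only when it is missing. A sketch, assuming conda is already on PATH (for example after `source ~/.bashrc`); `ensure_conda_env` is a hypothetical helper, and `-y` only skips conda's confirmation prompt.

```shell
# Hypothetical helper (sketch): create a conda env only if no env with that name exists yet.
ensure_conda_env() {
  local name="$1" pyver="$2"
  # `conda env list` prints one env per line ("<name>  [*]  <path>"); comment lines start with '#'.
  if conda env list | awk -v n="$name" '$1 == n {found=1} END {exit !found}'; then
    echo "conda env '$name' already exists"
  else
    conda create -y -n "$name" "python=$pyver"
  fi
}
```

Usage: `ensure_conda_env py36 3.6 && conda activate py36`.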
3. Install the required packages
# Switch pip to the Tsinghua mirror
mkdir ~/.pip
touch ~/.pip/pip.conf
# Write the following into pip.conf
[global]
index-url = pypi.tuna.tsinghua.edu/simple
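Creating the directory, the file, and its contents can also be done in one step with a heredoc. A sketch: `write_pip_conf` is a hypothetical helper, the mirror URL is passed in rather than hard-coded (use the address given above), and it writes the per-user location ~/.pip/pip.conf, which pip still reads on Linux.

```shell
# Hypothetical helper (sketch): write a per-user pip config that points at a mirror.
# Usage: write_pip_conf <index-url> [config-dir]   (config-dir defaults to ~/.pip)
write_pip_conf() {
  local url="$1" dir="${2:-$HOME/.pip}"
  if [ -z "$url" ]; then
    echo "usage: write_pip_conf <index-url> [config-dir]" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Overwrites any existing pip.conf in that directory.
  cat > "$dir/pip.conf" <<EOF
[global]
index-url = $url
EOF
}
```

Afterwards, `pip config list` (pip 10 or newer) should show a `global.index-url` entry.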
Install the packages:
pip install numpy==1.16.2
pip install opencv-python==4.1.0.25
pip install keras==2.1.4
pip install tensorflow-gpu==1.13.1
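To confirm that pip actually kept the pinned versions (for example, that a later install did not upgrade numpy), the output of `pip freeze` can be compared against the pins. A sketch: `check_pins` is a hypothetical helper, and names are compared case-insensitively because `pip freeze` prints each project's own capitalization (for example `Keras`).

```shell
# Hypothetical helper (sketch): compare `pip freeze` output (stdin) with name==version pins (args).
# Prints one line per unsatisfied pin and returns 1 if any pin does not match.
check_pins() {
  local freeze pin name want have status=0
  freeze="$(tr '[:upper:]' '[:lower:]')"   # read all of stdin, lower-cased
  for pin in "$@"; do
    name="$(printf '%s' "${pin%%==*}" | tr '[:upper:]' '[:lower:]')"
    want="${pin#*==}"
    have="$(printf '%s\n' "$freeze" | awk -F'==' -v n="$name" '$1 == n {print $2; exit}')"
    if [ "$have" != "$want" ]; then
      echo "MISMATCH $name: want $want, have ${have:-none}"
      status=1
    fi
  done
  return "$status"
}
```

Usage inside the py36 environment: `pip freeze | check_pins numpy==1.16.2 opencv-python==4.1.0.25 keras==2.1.4 tensorflow-gpu==1.13.1`.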
4. Install nccl2
Download the nccl2 package that matches your OS and CUDA version.
sudo dpkg -i nccl-repo-ubuntu1604-2.4.7-ga-cuda10.0_1-1_amd64.deb
sudo apt-key add /var/nccl-repo-2.4.7-ga-cuda10.0/7fa2af80.pub   # run the command exactly as dpkg prompts
sudo apt update
sudo apt install libnccl2=2.4.7-1+cuda10.0 libnccl-dev=2.4.7-1+cuda10.0
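After apt finishes, it is worth confirming that the exact NCCL build for CUDA 10.0 is the one installed, since a later `apt upgrade` may replace it. A sketch using `dpkg-query`; `check_deb_version` is a hypothetical helper. Optionally, `sudo apt-mark hold libnccl2 libnccl-dev` keeps apt from upgrading these packages.

```shell
# Hypothetical helper (sketch): check that an installed .deb package has exactly the expected version.
check_deb_version() {
  local pkg="$1" want="$2" have
  # dpkg-query -W -f='${Version}' prints the installed version and fails for unknown packages.
  have="$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null)" || have=""
  if [ "$have" = "$want" ]; then
    echo "OK $pkg $have"
  else
    echo "MISMATCH $pkg: want $want, have ${have:-not installed}"
    return 1
  fi
}
```

Usage: `check_deb_version libnccl2 2.4.7-1+cuda10.0 && check_deb_version libnccl-dev 2.4.7-1+cuda10.0`.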
5. Install libcudnn
sudo dpkg -i libcudnn7_7.6.0.64-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.0.64-1+cuda10.0_amd64.deb
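To check which cuDNN version the installed headers report (useful when several versions have been installed over time), the version macros can be read from cudnn.h. A sketch: `cudnn_version` is a hypothetical helper, and the default path /usr/include/cudnn.h is an assumption about where the libcudnn7-dev package places the header; pass another path if yours differs.

```shell
# Hypothetical helper (sketch): print "MAJOR.MINOR.PATCH" from the cuDNN header's #define lines.
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"   # assumed default location
  if [ ! -r "$header" ]; then
    echo "cannot read $header" >&2
    return 1
  fi
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { if (major == "") exit 1; print major "." minor "." patch }' "$header"
}
```

For the packages above, the expected result is 7.6.0.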
6. Install openmpi
sudo wget download.open-mpi/release/open-mpi/v4.0/openmpi-4.0.
gunzip -c openmpi-4.0. | tar xf -
cd openmpi-4.0.1/
sudo ./configure --prefix=/usr/local
sudo make all install
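Because this builds Open MPI into /usr/local, a quick check that the `mpirun` found on PATH reports the expected version can save confusion if another MPI is also installed. A sketch: `check_mpi_version` is a hypothetical helper that parses the first `Open MPI` line of `mpirun --version`. If `mpirun` then fails to find its shared libraries, running `sudo ldconfig` to refresh the linker cache is a common remedy.

```shell
# Hypothetical helper (sketch): confirm the mpirun on PATH is the expected Open MPI release.
check_mpi_version() {
  local want="${1:-4.0.1}" have
  # The first line of `mpirun --version` looks like: "mpirun (Open MPI) 4.0.1"
  have="$(mpirun --version 2>/dev/null | awk '/Open MPI/ {print $NF; exit}')"
  if [ "$have" = "$want" ]; then
    echo "OK Open MPI $have ($(command -v mpirun))"
  else
    echo "MISMATCH Open MPI: want $want, have ${have:-none}"
    return 1
  fi
}
```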
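7. Install horovod
HOROVOD_GPU_ALLREDUCE=NCCL pip install --no-cache-dir horovod
Note: adding HOROVOD_WITH_TENSORFLOW=1 makes the build fail with an error if TensorFlow support cannot be compiled, instead of silently skipping it, which helps when debugging the install.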
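Once Horovod is built, a short multi-process run confirms that it imports with TensorFlow support and that the processes can see each other. A sketch based on the mpirun flags Horovod's documentation recommends (`-bind-to none -map-by slot`, exporting PATH and LD_LIBRARY_PATH with `-x`); `hvd_smoke_test` is a hypothetical helper, and it assumes the py36 environment is active and all GPUs are on this one machine.

```shell
# Hypothetical helper (sketch): start N local processes that each initialize Horovod
# and print their rank; a working install prints "rank k of N" once per process.
hvd_smoke_test() {
  local np="${1:-2}"
  mpirun -np "$np" -H "localhost:$np" \
    -bind-to none -map-by slot \
    -x PATH -x LD_LIBRARY_PATH \
    python -c 'import horovod.tensorflow as hvd; hvd.init(); print("rank", hvd.rank(), "of", hvd.size())'
}
```

Open MPI refuses to start as root unless `--allow-run-as-root` is added, so run this as a normal user.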
At this point the deep learning environment is installed, and you can move on to training.
Common conda environment commands
Do not activate the conda environment by default:
conda config --set auto_activate_base false
Exit the conda environment:
conda deactivate
Enter the conda environment:
conda activate
Problems that may come up during installation:
1.
ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
Cause: cuDNN is not installed, or the wrong version is installed.
Fix: install the matching cuDNN packages:
sudo dpkg -i libcudnn7_7.6.0.64-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.0.64-1+cuda10.0_amd64.deb
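2.
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
Cause: usually a CUDA version mismatch. libcuda.so.1 itself ships with the NVIDIA driver rather than with the CUDA toolkit, so a missing or unloaded driver produces the same error.
Fix: install the CUDA version that matches the driver (here, CUDA 10.0 with driver 410.93).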
3.
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Cause: usually the CUDA libraries are not registered with the dynamic linker.
Fix: run the following command:
sudo ldconfig /usr/local/cuda/lib64
4. A strange one:
ModuleNotFoundError: No module named 'cv2'
If opencv-python is not installed, install it directly:
pip install opencv-python==4.1.0.25
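The three library errors above can be checked in one pass by asking the dynamic linker cache which of the libraries it knows about. A sketch: `diagnose_dl_libs` is a hypothetical helper, it relies on `ldconfig -p` (on Ubuntu, /sbin/ldconfig is normally on PATH), and the hints only repeat the fixes listed above. A library that is reachable only through LD_LIBRARY_PATH is not in the cache and is therefore reported as missing.

```shell
# Hypothetical helper (sketch): report which of the libraries from the errors above
# the dynamic linker cache (ldconfig -p) knows about, with the matching fix.
diagnose_dl_libs() {
  local cache lib hint status=0
  cache="$(ldconfig -p 2>/dev/null)"
  for lib in libcudnn.so.7 libcuda.so.1 libcublas.so.10.0; do
    # ldconfig -p lines look like: "<tab>libfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1"
    if printf '%s\n' "$cache" | awk -v n="$lib" '$1 == n {found=1} END {exit !found}'; then
      echo "found   $lib"
    else
      case "$lib" in
        libcudnn.so.7)     hint="install the libcudnn7 / libcudnn7-dev packages" ;;
        libcuda.so.1)      hint="check the NVIDIA driver / CUDA version" ;;
        libcublas.so.10.0) hint="sudo ldconfig /usr/local/cuda/lib64" ;;
      esac
      echo "missing $lib -> $hint"
      status=1
    fi
  done
  return "$status"
}
```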