Install CUDA yourself beforehand; it is not covered here.
Prerequisites: NCCL (for multi-GPU), the gcc compiler, the Anaconda gxx compiler, tensorflow-gpu, openmpi
- install nccl
- install gcc version 4.9
- if you use an Anaconda virtual environment, install the Anaconda g++ compiler
- install tensorflow-gpu
- install openmpi
- install horovod with tensorflow
-
install nccl
Download the Network Installer for Ubuntu 16.04, then run:
$ sudo dpkg -i nvidia-machine-learning-repo-<version>.deb
$ sudo apt update
$ sudo apt install libnccl2=2.5.6-1+cuda10.0 libnccl-dev=2.5.6-1+cuda10.0
version = 2.5.6
Version check:
locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
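The `sed` expression above strips everything up to and including `.so.`, so only the version suffix of the library file name remains. As a minimal sketch, the same extraction can be wrapped in a small function that works on any path, without relying on the `locate` database being up to date. The function name `nccl_version_from_path` is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version suffix of a libnccl shared-object path.
# If the path has no ".so." part, it is printed unchanged.
nccl_version_from_path() {
    # Same substitution as the one-liner above: drop everything through ".so."
    # (-E is the portable spelling of GNU sed's -r).
    printf '%s\n' "$1" | sed -E 's/^.*\.so\.//'
}
```

For example, `nccl_version_from_path /usr/lib/x86_64-linux-gnu/libnccl.so.2.5.6` prints `2.5.6`.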
-
install gcc version 4.9
Managing GCC and G++ versions
sudo update-alternatives --display gcc
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 40 --slave /usr/bin/g++ g++ /usr/bin/g++-7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 20 --slave /usr/bin/g++ g++ /usr/bin/g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.9
gcc --version
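After registering the alternatives, it can help to confirm that the active compiler really is in the 4.9 series before building Horovod. The sketch below assumes the version string comes from `gcc -dumpversion`; the function name `is_gcc_4_9` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given version string is in the 4.9 series.
is_gcc_4_9() {
    case "$1" in
        4.9|4.9.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Usage (not run here): check the compiler currently selected by update-alternatives.
# if is_gcc_4_9 "$(gcc -dumpversion)"; then echo "gcc 4.9 is active"; fi
```

If a different version turns out to be active, `sudo update-alternatives --config gcc` lets you pick one of the registered versions interactively.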
-
if you use an Anaconda virtual environment, install the Anaconda g++ compiler
conda install -c anaconda gxx_linux-64
-
install tensorflow-gpu
conda install -c anaconda tensorflow-gpu=2.0.0
-
install openmpi
conda install -c conda-forge openmpi
-
install horovod with tensorflow
Make sure to set the NCCL header path, the NCCL library path, and the CUDA path correctly.
HOROVOD_NCCL_INCLUDE=/usr/include HOROVOD_NCCL_LIB=/usr/lib/x86_64-linux-gnu HOROVOD_CUDA_HOME=/usr/local/cuda-10.0 HOROVOD_WITH_TENSORFLOW=1 pip install --force-reinstall --no-deps --no-cache-dir horovod==0.18.1
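Because the build depends on all three paths being right, a quick pre-check before running `pip install` may save a long rebuild. The following is a minimal sketch, assuming the NCCL header is named `nccl.h`, the library directory contains a file whose name starts with `libnccl.so`, and the CUDA directory contains `bin/nvcc`. The function name `check_horovod_paths` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: verify the three paths passed to the Horovod build.
# Arguments: <nccl include dir> <nccl lib dir> <cuda home>
# Prints one line per missing item and returns non-zero if anything is missing.
check_horovod_paths() {
    local include_dir="$1" lib_dir="$2" cuda_home="$3" status=0

    if [ ! -f "$include_dir/nccl.h" ]; then
        echo "missing: $include_dir/nccl.h"
        status=1
    fi
    # ls fails when the glob matches nothing, i.e. no libnccl.so* file exists.
    if ! ls "$lib_dir"/libnccl.so* >/dev/null 2>&1; then
        echo "missing: $lib_dir/libnccl.so*"
        status=1
    fi
    if [ ! -x "$cuda_home/bin/nvcc" ]; then
        echo "missing: $cuda_home/bin/nvcc"
        status=1
    fi
    return "$status"
}
```

With the paths used in this post, the call would be `check_horovod_paths /usr/include /usr/lib/x86_64-linux-gnu /usr/local/cuda-10.0 && echo "paths look fine"`.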