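Install CUDA on your own beforehand; it is not covered here. The commands below assume CUDA 10.0 installed under /usr/local/cuda-10.0.
Prerequisites: NCCL (for multi-GPU), the GCC compiler, the Anaconda gxx compiler, tensorflow-gpu, OpenMPI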

  1. install nccl
  2. check the installed NCCL version
  3. install gcc version 4.9
  4. if you use an Anaconda virtual environment, install the Anaconda g++ compiler
  5. install tensorflow-gpu
  6. install openmpi
  7. install horovod with tensorflow
  1. install nccl
    Download the Network Installer for Ubuntu 16.04 (the nvidia-machine-learning-repo .deb package) from NVIDIA's NCCL download page, then run:

    $ sudo dpkg -i nvidia-machine-learning-repo-<version>.deb
    $ sudo apt update
    $ sudo apt install libnccl2=2.5.6-1+cuda10.0 libnccl-dev=2.5.6-1+cuda10.0
    

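    This installs NCCL version 2.5.6, built for CUDA 10.0.

  2. check the installed NCCL version
    If NCCL was just installed, run sudo updatedb first so that locate can see the new files. The command below should then print 2.5.6.

    locate nccl | grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'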
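As a cross-check that does not depend on the locate database, the version can also be read from the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH macros in nccl.h, which is the header Horovod compiles against. This is a minimal sketch. It assumes the header is at /usr/include/nccl.h, the same directory passed as HOROVOD_NCCL_INCLUDE later. The function name nccl_header_version is hypothetical.

```shell
# Hypothetical helper: print the NCCL version recorded in nccl.h.
# Assumption: default header path /usr/include/nccl.h (matches HOROVOD_NCCL_INCLUDE=/usr/include).
nccl_header_version() {
  local header major minor patch
  header="${1:-/usr/include/nccl.h}"
  if [ ! -r "$header" ]; then
    echo "cannot read $header" >&2
    return 1
  fi
  # Pick the third field of lines such as "#define NCCL_MAJOR 2"
  major=$(awk '$1 == "#define" && $2 == "NCCL_MAJOR" { print $3 }' "$header")
  minor=$(awk '$1 == "#define" && $2 == "NCCL_MINOR" { print $3 }' "$header")
  patch=$(awk '$1 == "#define" && $2 == "NCCL_PATCH" { print $3 }' "$header")
  if [ -z "$major" ] || [ -z "$minor" ] || [ -z "$patch" ]; then
    echo "NCCL version macros not found in $header" >&2
    return 1
  fi
  echo "$major.$minor.$patch"
}
```

After sourcing it, running nccl_header_version (or nccl_header_version /path/to/nccl.h) should report 2.5.6 with the packages from step 1.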

  3. install gcc version 4.9
    Manage the GCC/G++ versions with update-alternatives:

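    # show the alternatives currently registered for gcc
    sudo update-alternatives --display gcc
    # register each installed version; g++ follows gcc as a slave link,
    # and in automatic mode the highest priority (gcc-4.9 at 60) is selected
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 40 --slave /usr/bin/g++ g++ /usr/bin/g++-7
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 20 --slave /usr/bin/g++ g++ /usr/bin/g++-5
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.9

    # confirm that gcc now resolves to 4.9
    gcc --version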
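You must install gcc-4.9 and g++-4.9 before you can register them. On Ubuntu 16.04 you can presumably do this with sudo apt install gcc-4.9 g++-4.9, assuming the packages are available in your apt sources. Because gcc-4.9 is registered with the highest priority (60), automatic mode selects it. You can also switch manually with sudo update-alternatives --config gcc. The hypothetical helper below checks that a compiler's -dumpversion output starts with the expected version. It accepts both "4.9" and "4.9.x" style output.

```shell
# Hypothetical helper: succeed if compiler $1 reports a version starting with $2.
# Returns 0 on match, 1 on a different version, 2 if the compiler cannot be run.
check_compiler_version() {
  local compiler expected actual
  compiler="$1"
  expected="$2"
  if ! actual=$("$compiler" -dumpversion 2>/dev/null); then
    echo "$compiler: not found or failed to run" >&2
    return 2
  fi
  case "$actual" in
    "$expected" | "$expected".*)
      echo "$compiler $actual: OK"
      return 0 ;;
    *)
      echo "$compiler $actual: expected $expected" >&2
      return 1 ;;
  esac
}
```

For example, check_compiler_version gcc 4.9 && check_compiler_version g++ 4.9 confirms that the g++ slave link moved together with gcc.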
    
  4. if you use an Anaconda virtual environment, install the Anaconda g++ compiler

    conda install -c anaconda gxx_linux-64
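Inside an activated environment, the conda compiler package is expected to export CXX pointing at its own g++. This is an assumption about its activation scripts, so re-activate the environment after installing and check echo $CXX. The hypothetical helper below prints the active environment and the compiler that $CXX resolves to. If CXX is unset, it falls back to the system g++.

```shell
# Hypothetical helper: show the active conda env and the C++ compiler $CXX resolves to.
# Assumption: gxx_linux-64 exports CXX on activation; falls back to g++ otherwise.
report_cxx() {
  local cxx path
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no conda environment is active (CONDA_PREFIX is unset)" >&2
    return 1
  fi
  cxx="${CXX:-g++}"
  if ! path=$(command -v "$cxx"); then
    echo "C++ compiler '$cxx' not found" >&2
    return 1
  fi
  echo "env: $CONDA_PREFIX"
  echo "cxx: $path"
}
```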
    
  5. install tensorflow-gpu

    conda install -c anaconda tensorflow-gpu=2.0.0
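To confirm that this TensorFlow build actually sees the GPUs, the sketch below asks TensorFlow for its version and counts the visible GPU devices with tf.config.experimental.list_physical_devices('GPU'), which is available in TensorFlow 2.0. The interpreter defaults to python and can be overridden with a PYTHON variable. Both that variable and the function name check_tf_gpu are hypothetical.

```shell
# Hypothetical helper: print the TensorFlow version and visible GPU count;
# succeed only if at least one GPU is visible. PYTHON overrides the interpreter.
check_tf_gpu() {
  local py out version ngpu
  py="${PYTHON:-python}"
  if ! out=$("$py" -c "import tensorflow as tf; print(tf.__version__, len(tf.config.experimental.list_physical_devices('GPU')))" 2>/dev/null); then
    echo "could not import tensorflow with $py" >&2
    return 1
  fi
  out=$(printf '%s\n' "$out" | tail -n 1)   # keep only the last stdout line
  version=${out%% *}
  ngpu=${out##* }
  echo "tensorflow $version, visible GPUs: $ngpu"
  case "$ngpu" in
    '' | *[!0-9]*) return 1 ;;
  esac
  [ "$ngpu" -gt 0 ]
}
```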
    
  6. install openmpi

    conda install -c conda-forge openmpi
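Horovod's build is expected to find MPI through the mpicxx compiler wrapper on PATH. It therefore helps to confirm that the conda-forge Open MPI is the one being picked up. The hypothetical helper below parses the first line of mpirun --version, which for Open MPI reads like "mpirun (Open MPI) <version>". The MPIRUN override is also hypothetical.

```shell
# Hypothetical helper: print the Open MPI version reported by mpirun --version.
# MPIRUN overrides which mpirun binary is queried.
openmpi_version() {
  local mpirun out first
  mpirun="${MPIRUN:-mpirun}"
  if ! out=$("$mpirun" --version 2>/dev/null); then
    echo "$mpirun could not be run" >&2
    return 1
  fi
  first=$(printf '%s\n' "$out" | head -n 1)
  # Open MPI's first line looks like: mpirun (Open MPI) 4.0.2
  case "$first" in
    *"(Open MPI) "*) echo "${first##*"(Open MPI) "}" ;;
    *) echo "not Open MPI: $first" >&2; return 1 ;;
  esac
}
```

Running command -v mpirun and command -v mpicxx shows whether both come from the conda environment's bin directory.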
    
  7. install horovod with tensorflow
    The NCCL header path, the NCCL library path, and the CUDA path must all be set correctly.

    HOROVOD_NCCL_INCLUDE=/usr/include HOROVOD_NCCL_LIB=/usr/lib/x86_64-linux-gnu HOROVOD_CUDA_HOME=/usr/local/cuda-10.0 HOROVOD_WITH_TENSORFLOW=1 pip install --force-reinstall --no-deps --no-cache-dir horovod==0.18.1
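A quick post-install check is to import Horovod's TensorFlow binding and initialise it in a single process. Outside horovodrun, the job size should be 1 and the rank 0. This is a minimal sketch. The PYTHON override and the function name check_horovod_tf are hypothetical.

```shell
# Hypothetical helper: import horovod.tensorflow, call hvd.init() in one process
# and expect "size rank" to be "1 0". PYTHON overrides the interpreter.
check_horovod_tf() {
  local py out
  py="${PYTHON:-python}"
  if ! out=$("$py" -c "import horovod.tensorflow as hvd; hvd.init(); print(hvd.size(), hvd.rank())" 2>/dev/null); then
    echo "horovod.tensorflow failed to import or initialise with $py" >&2
    return 1
  fi
  out=$(printf '%s\n' "$out" | tail -n 1)   # keep only the last stdout line
  echo "horovod size/rank: $out"
  [ "$out" = "1 0" ]
}
```

A multi-GPU run would then typically be launched with horovodrun -np 2 -H localhost:2 python train.py, where train.py stands in for your own script. Newer Horovod releases also provide horovodrun --check-build, which lists the frameworks and operations that were compiled in. If NCCL turns out not to be used for GPU allreduce, note that Horovod's GPU guide for releases of this period sets HOROVOD_GPU_ALLREDUCE=NCCL at build time. That variable may be worth adding to the command above.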
    
