Setting up NVIDIA GPU environment

This guide explains the process of setting up an NVIDIA GPU environment, including installing drivers and libraries, creating an instance using a base image, and adding GPU drivers.

Basic information
  • Estimated time: 30 minutes
  • Recommended OS: macOS, Ubuntu
  • Region: kr-central-2

About this scenario

Setting up an NVIDIA GPU environment in KakaoCloud is an important step to effectively utilize GPU resources.

This document explains how to install and configure NVIDIA drivers, CUDA, and cuDNN libraries in the KakaoCloud environment for machine learning or deep learning tasks using NVIDIA GPUs. Users can create instances with the default Ubuntu 20.04 NVIDIA image or a specific NVIDIA/CUDA version of their choice, and then install the necessary drivers and libraries.

Prework

Before you start, create a VPC, a subnet, and a security group.

Create VPC and subnet

Refer to the Create VPC and Create subnet documentation to create a new VPC and subnet.

Create security group

Refer to the Create security group documentation to create a security group. Add the following inbound policy:

CIDR                 | Protocol | Port | Role
{Your Public IP}/32  | TCP      | 22   | ssh
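
The CIDR value in the rule above is your public IP followed by /32, which limits SSH access to that single address. The sketch below builds that value and rejects input that is not a dotted IPv4 address. The helper name to_ssh_cidr is hypothetical, and the lookup service in the commented example is an assumption; any service that returns your public IP works.

```shell
# Build the /32 CIDR for one IPv4 address (for the SSH inbound rule).
# to_ssh_cidr is a hypothetical helper name, not a KakaoCloud command.
to_ssh_cidr() {
  ip=$1
  # Reject empty input and anything other than digits separated by single dots.
  case "$ip" in
    *[!0-9.]* | *..* | .* | *. | "")
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac
  # Split on dots and check that there are four octets, each at most 255.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  for octet in "$@"; do
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  printf '%s/32\n' "$ip"
}

# Example (assumes outbound HTTPS and that this lookup service is reachable):
# to_ssh_cidr "$(curl -s https://checkip.amazonaws.com)"
```

Paste the printed value into the CIDR field of the inbound rule.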

Step-by-step process

To create a GPU instance using the base image Ubuntu 20.04 (NVIDIA VERSION) provided by KakaoCloud, follow these steps. This image includes NVIDIA Driver Version 470.199.02 and CUDA Version 11.4, so no additional NVIDIA Driver or CUDA installation is required.

Installing specific NVIDIA/CUDA version

If you need to install a specific version of the NVIDIA driver or CUDA, refer to Appendix. Install specific NVIDIA/CUDA version.

Step 1. Create GPU instance

  1. Complete the Prework.

  2. Go to KakaoCloud Console > Beyond Compute Service > GPU.

  3. In the Instance tab, click the [Create instance] button.

  4. Under Create instance, configure the VM instance as follows, then click the [Create] button.

    Field             | Setting
    Basic information | - Name: Set as desired
                      | - Count: 1
    Image             | Select Ubuntu 20.04 - 5.4.0-173 (NVIDIA) under the Base tab
    Instance type     | p2i.6xlarge
    Volume            | Root volume: 50 GB or more
    Key pair          | Private key in .pem format; create a new key pair or use an existing one
    Network           | - VPC: Select the VPC created in the prework
                      | - Subnet: Select the subnet created in the prework
                      | - Security group: Select the security group created in the prework
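
If you create a new key pair, the downloaded .pem file is later passed to ssh with -i. OpenSSH refuses private keys that other users can read, so it helps to restrict the file permissions right after download. A minimal sketch; protect_key is a hypothetical helper name and the path in the example is a placeholder.

```shell
# Make a private key readable by its owner only, as OpenSSH expects.
# protect_key is a hypothetical helper name.
protect_key() {
  key_file=$1
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  chmod 400 "$key_file"
}

# Example (placeholder path):
# protect_key ~/Downloads/test.pem
```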

Step 2. Associate public IP

Associate a public IP with the GPU instance.

  1. Go to KakaoCloud Console > Beyond Compute Service > GPU.
  2. Click the [More] icon > Associate public IP.
  3. In the popup, select Create new public IP and assign it automatically, then click [OK].
  4. Verify the public IP in the Public IP column.

Step 3. Install cuDNN

Install NVIDIA cuDNN (NVIDIA CUDA Deep Neural Network library), a GPU-accelerated library for deep neural networks.

  1. To install cuDNN, access the instance via SSH.

    ssh ubuntu@${HOST_PUBLIC_IP} -i ${PRIVATE_KEY_FILE}.pem

    # Example: ssh ubuntu@210.100.00.000 -i test.pem

    Parameter        | Type   | Required | Description
    HOST_PUBLIC_IP   | String | Yes      | Public IP address of the GPU instance
    PRIVATE_KEY_FILE | String | Yes      | Private key file name, without the .pem extension

    Image: SSH connected

  2. Follow the cuDNN Installation Guide provided by NVIDIA to install a cuDNN build that matches the preinstalled CUDA 11.4. That guide provides the instructions and commands for installing cuDNN on Ubuntu 20.04.

  3. Verify the installation of cuDNN using the following command:

    cat /usr/include/x86_64-linux-gnu/cudnn_version*.h | grep CUDNN_MAJOR

    # Example output: #define CUDNN_MAJOR 8

    Image: cuDNN installed
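
The grep in the last step shows only the major version. To see the full cuDNN version, the three version macros in the header can be combined, as in the sketch below. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; cudnn_version is a hypothetical helper name.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cuDNN version header.
# cudnn_version is a hypothetical helper name.
cudnn_version() {
  header=$1
  if [ ! -r "$header" ]; then
    echo "cannot read header: $header" >&2
    return 1
  fi
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      # Fail when the header does not define the major version at all.
      if (major == "") exit 1
      printf "%s.%s.%s\n", major, minor, patch
    }
  ' "$header"
}

# Example, using the header location from the step above:
# for h in /usr/include/x86_64-linux-gnu/cudnn_version*.h; do cudnn_version "$h"; done
```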

Appendix. Install specific NVIDIA/CUDA version

If you need an NVIDIA driver or CUDA version other than the ones preinstalled in the NVIDIA image, create the instance from the general Ubuntu image and install the desired versions yourself.

Step 1. Create GPU instance

  1. Complete the prework.

  2. Go to KakaoCloud Console > Beyond Compute Service > GPU.

  3. Click the [Create instance] button.

  4. Configure the VM instance as follows, then click the [Create] button.

    Field             | Setting
    Basic information | - Name: Set as desired
                      | - Count: 1
    Image             | Select Ubuntu 20.04 - 5.4.0-173 under the Base tab
    Instance type     | p2i.6xlarge
    Volume            | Root volume: 50 GB or more
    Key pair          | Private key in .pem format; create a new key pair or use an existing one
    Network           | - VPC: Select the VPC created in the prework
                      | - Subnet: Select the subnet created in the prework
                      | - Security group: Select the security group created in the prework

Step 2. Associate public IP

Associate a public IP with the GPU instance.

  1. In the KakaoCloud Console, select Beyond Compute Service > GPU.
  2. Click the [More] icon > Associate public IP.
  3. In the popup, select Create new public IP and assign it automatically, then click [OK].
  4. Verify the public IP in the Public IP column.

Step 3. Access the GPU instance and verify the environment

  1. Navigate to the directory where your private key file is located.

  2. Use SSH to connect to the public IP associated above and verify that the instance is reachable.

    ssh ubuntu@${HOST_PUBLIC_IP} -i ${PRIVATE_KEY_FILE}.pem

    Parameter        | Type   | Required | Description
    HOST_PUBLIC_IP   | String | Yes      | Public IP address of the GPU instance
    PRIVATE_KEY_FILE | String | Yes      | Private key file name, without the .pem extension

  3. Verify the image information and NVIDIA device details of the GPU instance.

    cat /etc/*release
    lspci | grep -i NVIDIA
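
The release check above can also be scripted, for example to stop early when the instance is not running the Ubuntu release this guide targets. A minimal sketch, assuming the image ships the standard /etc/os-release file; os_version_id is a hypothetical helper name.

```shell
# Print VERSION_ID from an os-release file (default: /etc/os-release).
# os_version_id is a hypothetical helper name.
os_version_id() {
  release_file=${1:-/etc/os-release}
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$release_file" && printf '%s\n' "$VERSION_ID" )
}

# Example:
# [ "$(os_version_id)" = "20.04" ] || echo "warning: this guide targets Ubuntu 20.04" >&2
```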

Step 4. Install NVIDIA driver

Verify and install the necessary drivers and libraries to set up the GPU environment.

GPU type    | NVIDIA driver version | CUDA version                | cuDNN version
NVIDIA A100 | 450.80.02 or higher   | CUDA Toolkit 11.4 or higher | 8.1 or higher

If you are following this guide on a different environment or require a specific version, you can download the necessary version from the CUDA Toolkit Archive.
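
The "or higher" requirements in the table can be checked with a version-aware comparison rather than by eye. The sketch below relies on the -V (version sort) option of GNU sort, which is assumed to be available as it is on Ubuntu; version_ge is a hypothetical helper name.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V; version_ge is a hypothetical helper name.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  # If the required version sorts first (or both are equal), $1 is new enough.
  [ "$lowest" = "$2" ]
}

# Example, once a driver is installed:
# installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# version_ge "$installed" 450.80.02 && echo "driver meets the A100 minimum"
```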

  1. Update the package list and upgrade the installed packages.

    sudo apt-get update
    sudo apt-get -y upgrade

    Image: Package list update complete

  2. Depending on your environment, an NVIDIA driver might already be installed, which can conflict with the new driver. Remove any existing NVIDIA packages, then install the build tools and kernel headers:

    sudo apt-get -y remove 'nvidia*' && sudo apt-get -y autoremove
    sudo apt-get install -y build-essential linux-headers-generic

  3. Add the graphics driver repository to the package sources.

    sudo add-apt-repository ppa:graphics-drivers/ppa # If a prompt appears with additional information, press 'Enter' to proceed.

    sudo apt-get update
  4. Install nvidia-driver-470, one of the drivers that support Ubuntu 20.04 LTS:

    sudo apt install -y nvidia-driver-470

    Image: nvidia-driver-470 installed

  5. Reboot the instance to apply the installed driver. After a few moments, you can reconnect.

    sudo reboot
  6. After the instance reboots, reconnect over SSH to the public IP associated earlier and verify that the instance is reachable.

    ssh ubuntu@${HOST_PUBLIC_IP} -i ${PRIVATE_KEY_FILE}.pem

    Parameter        | Type   | Required | Description
    HOST_PUBLIC_IP   | String | Yes      | Public IP address of the GPU instance
    PRIVATE_KEY_FILE | String | Yes      | Private key file name, without the .pem extension

  7. Verify the installation results by entering the following command.

    nvidia-smi
    # Installation success:
    # Thu Nov 3 02:21:13 2022
    # +-----------------------------------------------------------------------------+
    # | NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
    # |-------------------------------+----------------------+----------------------+
    # | ...

    Image: Installation success
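
To use the versions from the nvidia-smi banner in a script, they can be extracted with sed. This is a sketch that assumes the banner keeps the "Driver Version: ... CUDA Version: ..." layout shown above; smi_versions is a hypothetical helper name.

```shell
# Read nvidia-smi output on stdin and print the driver and CUDA versions
# found in the banner line. smi_versions is a hypothetical helper name.
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.][0-9.]*\).*CUDA Version: *\([0-9.][0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Example:
# nvidia-smi | smi_versions
```

The CUDA version in this banner is the highest CUDA version the driver supports. It does not show that the CUDA Toolkit is installed, which is why the toolkit is checked separately with nvcc.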

Step 5. Install NVIDIA CUDA toolkit

The NVIDIA CUDA Toolkit is a development platform for creating GPU-accelerated applications. It includes GPU-accelerated libraries, debugging and optimization tools, a compiler, and the CUDA runtime for deploying applications. For more details, refer to the NVIDIA official website.

Check CUDA installation

Run the nvcc -V command to check whether the CUDA Toolkit is already installed. If the command prints the CUDA compiler version, skip Step 6 and proceed to Step 7 (Install cuDNN).

    nvcc -V

    # nvcc: NVIDIA (R) Cuda compiler driver
    # Copyright (c) 2005-2021 NVIDIA Corporation
    # ...
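
On a fresh instance nvcc is usually not on the PATH, so running it directly ends in a "command not found" error. The sketch below tests for the command first and prints which step comes next. The helper names has_command and cuda_next_step are hypothetical.

```shell
# Succeed when the named command is available on the PATH.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

# Print which step of this guide to continue with.
# Both helper names are hypothetical.
cuda_next_step() {
  if has_command nvcc; then
    echo "nvcc found: skip Step 6 and continue with Step 7"
  else
    echo "nvcc not found: continue with Step 6"
  fi
}

# Example:
# cuda_next_step
```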

Step 6. Install CUDA

  1. Download the CUDA installation package.

    wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
  2. Run the downloaded CUDA installation file.

    sudo sh cuda_11.4.0_470.42.01_linux.run
  3. During installation, if you see a message like the one below, select [Continue]. The message appears because the driver was installed with the package manager in Step 4.

    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ Existing package manager installation of the driver found. It is strongly │
    │ recommended that you remove this before continuing. │
    │   Abort │
    │ > Continue │
    ...

  4. Next, you are prompted to accept the End User License Agreement (EULA). You must agree to it to continue. Enter accept when prompted.

    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ End User License Agreement │
    │ -------------------------- │
    │ │
    │ The CUDA Toolkit ...

    ...

    │──────────────────────────────────────────────────────────────────────────────│
    │ Do you accept the above EULA? (accept/decline/quit): │
    │ accept │
    └──────────────────────────────────────────────────────────────────────────────┘
  5. Because the NVIDIA driver was already installed in Step 4, uncheck the Driver item. Then select [Install] to proceed with the installation.

    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ CUDA Installer │
    │ - [ ] Driver │
    │      [ ] 470.42.01 │
    │ + [X] CUDA Toolkit 11.4 │
    │   [X] CUDA Samples 11.4 │
    │   [X] CUDA Demo Suite 11.4 │
    │   [X] CUDA service 11.4 │
    │   Options │
    │ > Install │
    ...

  6. After the installation is complete, add the environment variables related to the CUDA Toolkit. These exports apply only to the current shell session.

    export PATH=$PATH:/usr/local/cuda/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
    export CUDADIR=/usr/local/cuda
  7. Run the nvcc -V command to verify the installed CUDA Toolkit.

    nvcc -V

    # Success:
    # nvcc: NVIDIA (R) Cuda compiler driver
    # Copyright (c) 2005-2021 NVIDIA Corporation
    # ...
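
Variables set with export are lost when the SSH session ends. One way to keep them is to append the same lines to ~/.bashrc, taking care not to add duplicates on repeated runs. A minimal sketch; append_once is a hypothetical helper name, and using ~/.bashrc assumes the default bash login shell of the ubuntu user.

```shell
# Append a line to a file only if an identical line is not already there.
# append_once is a hypothetical helper name.
append_once() {
  line=$1
  file=$2
  # -x matches whole lines, -F treats the text as a fixed string, not a regex.
  grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH unexpanded until the shell starts):
# append_once 'export PATH=$PATH:/usr/local/cuda/bin' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' ~/.bashrc
# append_once 'export CUDADIR=/usr/local/cuda' ~/.bashrc
```

Open a new SSH session or run source ~/.bashrc for the change to take effect.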

Step 7. Install cuDNN

Install NVIDIA cuDNN (NVIDIA CUDA Deep Neural Network library), a GPU-accelerated library for deep neural networks. For more details, refer to the NVIDIA official website.

  1. Visit the NVIDIA cuDNN page, log in, and click the [Download cuDNN library] button to download the appropriate version. This tutorial uses the Local Installer for Linux x86_64 (Tar) build for CUDA 11.x. Keep the downloaded file compressed; it is extracted on the instance in a later step.

    Image: Downloading the installation file

  2. Navigate to the directory containing your private key file, then run the following command on your local machine to transfer the downloaded cuDNN file to the instance.

    scp -i ${PRIVATE_KEY_FILE}.pem ${CUDNN_INSTALL_FILE} ubuntu@${HOST_PUBLIC_IP}:~/

    # Example: scp -i ~/Downloads/test.pem ~/Downloads/cudnn-linux-x86_64-8.9.7.29_cuda11-archive.tar.xz ubuntu@210.100.00.000:~/

    Parameter          | Type   | Required | Description
    PRIVATE_KEY_FILE   | String | Yes      | Path to the private key file, without the .pem extension
    CUDNN_INSTALL_FILE | String | Yes      | Path to the cuDNN installation file
    HOST_PUBLIC_IP     | String | Yes      | Public IP address of the GPU instance

  3. SSH into the instance to install the cuDNN files.

    ssh ubuntu@${HOST_PUBLIC_IP} -i ${PRIVATE_KEY_FILE}.pem

    # Example: ssh ubuntu@210.100.00.000 -i test.pem

    Parameter        | Type   | Required | Description
    HOST_PUBLIC_IP   | String | Yes      | Public IP address of the GPU instance
    PRIVATE_KEY_FILE | String | Yes      | Private key file name, without the .pem extension

    Image: SSH connected

  4. Use the tar command to extract the cuDNN package.

    tar -xvf cudnn-linux-x86_64*.tar.xz
  5. Install the cuDNN files into the directory where CUDA is installed. The default installation path is assumed to be /usr/local/cuda/.

    sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
    sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
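  6. Verify the installation of cuDNN using the following command. Because the headers were copied into /usr/local/cuda/include in the previous step, check the version header there.

    cat /usr/local/cuda/include/cudnn_version*.h | grep CUDNN_MAJOR

    # Example output: #define CUDNN_MAJOR 8

    Image: cuDNN installation complete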
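
If the copy in step 5 fails with "No such file or directory", the extracted archive may not have the layout those commands expect. The sketch below checks an extracted directory for the include and lib subdirectories and for at least one cuDNN header and library file. check_cudnn_archive is a hypothetical helper name.

```shell
# Check that an extracted cuDNN archive directory contains include/ and lib/
# with cuDNN headers and libraries. check_cudnn_archive is a hypothetical name.
check_cudnn_archive() {
  dir=$1
  for sub in include lib; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing directory: $dir/$sub" >&2
      return 1
    fi
  done
  # ls fails when a glob matches nothing, which signals a missing file set.
  if ! ls "$dir"/include/cudnn*.h >/dev/null 2>&1; then
    echo "no cuDNN headers in $dir/include" >&2
    return 1
  fi
  if ! ls "$dir"/lib/libcudnn* >/dev/null 2>&1; then
    echo "no cuDNN libraries in $dir/lib" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Example, run in the directory where the archive was extracted:
# for d in cudnn-*-archive; do check_cudnn_archive "$d"; done
```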