Tutorial series | Kubeflow basic workflow

Deploy Jupyter Notebooks on Kubeflow

This tutorial explains how to set up a Jupyter Notebook using the Kubeflow service on KakaoCloud's Kubernetes environment.

Basic information
  • Estimated time: 30 minutes
  • Recommended OS: macOS, Ubuntu

About this scenario

With KakaoCloud's Kubernetes Engine and Kubeflow services, you can build an efficient MLOps environment. This scenario walks you through creating Kubeflow from the console, accessing the dashboard, and launching a Jupyter Notebook instance for data analysis and model training.

Key topics:

  • Setting up a Kubernetes cluster and File Storage environment
  • Launching a Jupyter Notebook for data analysis and model training

Before you start

As preparation, you'll need to create and configure a Kubernetes cluster and File Storage.

1. Create Kubernetes cluster

Set up a basic Kubernetes cluster to serve as the foundation for Kubeflow components.

  1. From the KakaoCloud console, go to Container Pack > Kubernetes Engine and click [Create Cluster]. Refer to the following settings:

Cluster settings

| Item | Value |
| --- | --- |
| Cluster name | kc-handson |
| Kubernetes version | 1.28 |
| Cluster network settings | Choose a network with an IP range that allows external communication |
Note

If the network uses a private subnet, the nodes can't access the internet. You'll need to set up NAT for external communication. Refer to Using a NAT Instance.

Node pool settings

| Name | Count | Description |
| --- | --- | --- |
| pool-ingress | 1 | Type: Virtual Machine / Instance type: m2a.large / Volume: 50GB / Autoscaling: Disabled |
| pool-worker | 6 | Type: Virtual Machine / Instance type: m2a.xlarge / Volume: 100GB / Autoscaling: Disabled |
| pool-gpu | 1 | Type: GPU / Instance type: p2i.6xlarge / Volume: 100GB / Autoscaling: Disabled |
  2. Wait until all node pools show Running status.
  3. Follow kubectl control setup to configure your cluster's kubectl file.
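With kubectl configured, a quick sanity check confirms the cluster is reachable. This is a sketch; the expected node count assumes the pool settings above (1 + 6 + 1 nodes):

```shell
# Confirm the cluster is reachable and list all registered nodes.
kubectl get nodes -o wide

# Count Ready nodes; with the pools above you should see 8 in total.
kubectl get nodes --no-headers | awk '$2 == "Ready"' | wc -l
```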

2. Create File Storage

Create File Storage for persistent volume use by notebooks. Ensure the storage is on the same network and subnet as your cluster.

  1. From the KakaoCloud console, go to Beyond Storage Service > File Storage and click [Create Instance]. Use the settings below:
| Item | Description |
| --- | --- |
| Instance | kc-handson-fs |
| Volume size | 1TB |
| Network | Same as the Kubernetes cluster |
| Subnet | Same as the Kubernetes cluster |
| Access control | Allow all private IPs in the network |
| Mount path | handson |
  2. Confirm the instance status is Active.

Getting started

Now let's configure the Jupyter Notebook environment.

Step 1. Create Kubeflow

Deploy Kubeflow on your Kubernetes cluster using the following settings:

  1. Go to AI Service > Kubeflow in the KakaoCloud console and click [Create Kubeflow].

Kubeflow settings

| Item | Value |
| --- | --- |
| Kubeflow name | kc-handson |
| Version | 1.8 |
| Service type | Essential+HPT+ServingAPI |

Cluster settings

| Item | Value |
| --- | --- |
| Cluster | kc-handson |
| Ingress node pool | pool-ingress |
| Worker node pool | pool-worker |
| CPU node pool | pool-worker |
| GPU node pool | pool-gpu |
| GPU MIG | 1g.10gb - 7 units |
| Default File Storage | kc-handson-fs |

User and workload auth settings

| Category | Item | Value |
| --- | --- | --- |
| Object Storage | Type | Object Storage or MinIO |
| Kubeflow Owner | Email | ${ADMIN_EMAIL} (example@kakaocloud.com) |
| Kubeflow Owner | Namespace | kubeflow-tutorial |
| Kubeflow Owner | Namespace File Storage | kc-handson-fs |
| DB | Type | Kubeflow Internal DB |
| DB | Port | 3306 |
| DB | Password | ${DB_PASSWORD} |
| DB | Confirm Password | ${DB_PASSWORD} |
| | Domain (optional) | Valid domain format |
  2. Confirm the Kubeflow status is Active.
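Once the status is Active, you can optionally verify the deployment from kubectl. This is a sketch; the `kubeflow-tutorial` namespace follows the owner settings chosen above:

```shell
# Kubeflow components run in their own namespace; core pods should be Running.
kubectl get pods -n kubeflow

# The user profile namespace created during setup (kubeflow-tutorial above).
kubectl get pods -n kubeflow-tutorial
```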

Step 2. Access the dashboard

Access the deployed Kubeflow environment through its dashboard to manage resources and configure your Jupyter Notebook environment.

You can connect using either the public IP of the load balancer or kubectl port forwarding.

  1. Go to Load Balancing > Load Balancer in the KakaoCloud console.
  2. Locate the load balancer named kube_service_{PROJECT_ID}_{IKE_CLUSTER_NAME}_ingress-nginx_ingress-nginx-controller for the Kubeflow ingress and check its public IP. If none is assigned, click the [More] icon and assign one.

Assign public IP

  3. In your browser, access the public IP on port 80:

     open http://{LB_PUBLIC_IP}

  4. Once connected to the dashboard, log in using the admin email and the initial password sent to that email during Kubeflow setup.
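As an alternative to the public IP, you can reach the dashboard through kubectl port forwarding, as mentioned above. This is a sketch; the namespace and service name are assumptions inferred from the load balancer name, so verify them with `kubectl get svc -A`:

```shell
# Forward local port 8080 to the ingress controller, then browse to
# http://localhost:8080 (Ctrl+C stops the forward).
kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 8080:80
```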

Step 3. Create Jupyter Notebook

You can create a Jupyter Notebook instance directly from the dashboard. In this step, you’ll configure the notebook specifications and launch it.

  1. From the left panel of the Kubeflow dashboard, click [Notebooks].

    Create Jupyter Notebook

  2. On the Notebooks page, click the [+ New Notebook] button in the top right corner. Refer to the following settings to fill in your notebook information.

  3. To create a notebook with GPU support, use the following reference table:

    | Item | Field | Description |
    | --- | --- | --- |
    | Name | Name | Used to identify the notebook instance in the dashboard |
    | Name | Namespace | Kubernetes namespace for the notebook |
    | Docker Image | Image | Specify the Docker image |
    | CPU / RAM | Minimum CPU | Number of CPU cores allocated to the notebook |
    | CPU / RAM | Minimum Memory Gi | Memory allocated in GiB |
    | GPUs | Number of GPUs | GPU count for the notebook |
    | GPUs | GPU Vendor | Select the GPU driver and software toolkit |
    | Affinity / Tolerations | Affinity Config | Select the GPU node pool; defines which node the notebook runs on |
    | Affinity / Tolerations | Tolerations Group | Allow tolerations for specific node taints |
  4. Example input values:

    | Item | Value |
    | --- | --- |
    | Name | handson |
    | Image | kc-kubeflow/jupyter-pyspark-pytorch-cuda:v1.8.0.py311.1a |
    | Minimum CPU | 2 |
    | Minimum Memory Gi | 12 |
    | Number of GPUs | 4 |
    | GPU Vendor | NVIDIA MIG - 1g.10gb |
    | Affinity Config | pool-gpu |
  5. Click the [LAUNCH] button to create the notebook instance.
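After launching, you can optionally watch the notebook pod start from kubectl. This is a sketch; the `notebook-name` label is set by the Kubeflow notebook controller, and the name and namespace follow the values used in this tutorial:

```shell
# Watch the notebook pod in the profile namespace until it reaches Running.
kubectl get pods -n kubeflow-tutorial -l notebook-name=handson -w
```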

Step 4. Access Jupyter Notebook

Access your running Jupyter Notebook instance to begin working on your machine learning project.

  1. Click the [CONNECT] button for the notebook instance you created.

    Click Connect button

  2. In the notebook UI, select the Python3 kernel.

    Click Python3 button

  3. Enter and run the following sample code to test GPU availability:

    import torch

    def check_gpu_available():
        # torch.cuda.is_available() returns True when a CUDA device is visible.
        if torch.cuda.is_available():
            print("GPU is available on this system.")
        else:
            print("GPU is not available on this system.")

    check_gpu_available()
    Note

    If you're using a single GPU instance, set the CUDA_VISIBLE_DEVICES environment variable to 0:

    import os
    import torch

    def set_cuda_devices():
        # Restrict PyTorch to the first GPU device only.
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    def check_gpu_available():
        if torch.cuda.is_available():
            print("GPU is available on this system.")
        else:
            print("GPU is not available on this system.")

    set_cuda_devices()
    check_gpu_available()
  4. For GPU-enabled notebooks, you can also open a Terminal inside the notebook and run nvidia-smi to confirm the NVIDIA device:
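For example (the expectation of MIG instances is an assumption based on the GPU MIG setting chosen above):

```shell
# List visible GPU devices; with the MIG profile above you should see
# 1g.10gb MIG instances rather than full GPUs.
nvidia-smi -L

# Full device status: driver version, memory usage, and running processes.
nvidia-smi
```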

    Check NVIDIA device