
Deploy Jupyter Notebooks on Kubeflow

This tutorial explains how to set up a Jupyter Notebook using the Kubeflow service on KakaoCloud's Kubernetes environment.

Basic information

Estimated time: 30 minutes
Recommended OS: macOS, Ubuntu

About this scenario

With KakaoCloud's Kubernetes Engine and Kubeflow services, you can build an efficient MLOps environment. This scenario walks you through creating Kubeflow from the console, accessing the dashboard, and launching a Jupyter Notebook instance for data analysis and model training.

Key topics:

Setting up a Kubernetes cluster and File Storage environment
Launching a Jupyter Notebook for data analysis and model training

Before you start

As preparation, you'll need to create and configure a Kubernetes cluster and File Storage.

1. Create Kubernetes cluster

Set up a basic Kubernetes cluster to serve as the foundation for Kubeflow components.

From the KakaoCloud console, go to Container Pack > Kubernetes Engine and click [Create Cluster]. Refer to the following settings:

Cluster settings

Item	Value
Cluster name	kc-handson
Kubernetes version	1.28
Cluster network settings	Choose a network with an IP range that allows external communication

Note

If the network is a private subnet, the nodes can't access the internet. You'll need to set up NAT so that the nodes can reach external services such as the container registry (CR). Refer to Using a NAT Instance.

Node pool settings

Name	Count	Description
pool-ingress	1	Type: Virtual Machine; Instance type: `m2a.large`; Volume: 50GB; Nodes: 1; Autoscaling: Disabled
pool-worker	6	Type: Virtual Machine; Instance type: `m2a.xlarge`; Volume: 100GB; Nodes: 6; Autoscaling: Disabled
pool-gpu	1	Type: GPU; Instance type: `p2i.6xlarge`; Volume: 100GB; Autoscaling: Disabled

Wait until all node pools show Running status.
Follow the kubectl control setup guide to configure the kubeconfig file for your cluster.
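
Once kubectl is configured, the node status can also be checked from a script instead of the console. The following is a minimal sketch, assuming `kubectl` is on your PATH and points at the kc-handson cluster; the function names `fetch_nodes_json` and `not_ready_nodes` are illustrative and not part of any KakaoCloud tooling.

```python
import json
import subprocess


def not_ready_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    pending = []
    for node in json.loads(nodes_json).get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(node["metadata"]["name"])
    return pending


def fetch_nodes_json() -> str:
    """Run `kubectl get nodes -o json` against the current kubeconfig context."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `not_ready_nodes(fetch_nodes_json())` should return an empty list once every node in the three node pools has joined the cluster and reports Ready.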

2. Create File Storage

Create a File Storage instance to provide persistent volumes for notebooks. Make sure the storage is on the same network and subnet as your cluster.

From the KakaoCloud console, go to Beyond Storage Service > File Storage and click [Create Instance]. Use the settings below:

Item	Value
Instance	kc-handson-fs
Volume size	1TB
Network	Same as the Kubernetes cluster
Subnet	Same as the Kubernetes cluster
Access control	Allow all private IPs in the network
Mount path	handson

Confirm the instance status is Active.

Getting started

Now let's configure the Jupyter Notebook environment.

Step 1. Create Kubeflow

Deploy Kubeflow on the Kubernetes cluster you created.

Go to AI Service > Kubeflow in the KakaoCloud console and click [Create Kubeflow]. Refer to the following settings:

Kubeflow settings

Item	Value
Kubeflow name	kc-handson
Version	1.8
Service type	Essential+HPT+ServingAPI

Cluster settings

Item	Value
Cluster	kc-handson
Ingress node pool	pool-ingress
Worker node pool	pool-worker
CPU node pool	pool-worker
GPU node pool	pool-gpu
GPU MIG	1g.10gb - 7 units
Default File Storage	kc-handson-fs

User and workload auth settings

Category	Item	Value
Object Storage	Type	`Object Storage` or `MinIO`
Kubeflow Owner	Email	`${ADMIN_EMAIL}` (example@kakaocloud.com)
	Namespace	kubeflow-tutorial
	Namespace File Storage	kc-handson-fs
DB	Type	Kubeflow Internal DB
	Port	3306
	Password	`${DB_PASSWORD}`
	Confirm Password	`${DB_PASSWORD}`
Domain (optional)		Valid domain format

Confirm the Kubeflow status is Active.

Step 2. Access the dashboard

Access the deployed Kubeflow environment through its dashboard to manage resources and configure your Jupyter Notebook environment.

You can connect using either the public IP of the load balancer or kubectl port forwarding.

Using load balancer public IP

Go to Load Balancing > Load Balancer in the KakaoCloud console.
Locate the load balancer named kube_service_{PROJECT_ID}_{IKE_CLUSTER_NAME}_ingress-nginx_ingress-nginx-controller for the Kubeflow ingress and check its public IP. If none is assigned, click the [More] icon and assign one.

In your browser, access the public IP on port 80:

open http://{LB_PUBLIC_IP}

Using kubectl port forwarding

Instead of accessing the Kubeflow gateway directly, you can use the kubectl CLI to port-forward to the Kubeflow Istio gateway.

Use kubectl to connect to the Kubernetes cluster used during Kubeflow setup.
Forward a local port (e.g., 8080) to the Kubeflow dashboard port:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

In your browser, open the forwarded local port:

open http://localhost:8080
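
Before opening the browser, you can check from a script that the dashboard endpoint responds. This is a minimal sketch using only the Python standard library; `dashboard_url` and `is_reachable` are hypothetical helper names, and the host and port are whatever you used above (the load balancer public IP on port 80, or localhost on the forwarded port).

```python
import http.client
import urllib.error
import urllib.request


def dashboard_url(host: str, port: int = 80) -> str:
    """Build the dashboard URL, omitting the port when it is the HTTP default."""
    return f"http://{host}" if port == 80 else f"http://{host}:{port}"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response comes back from the URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so it is still reachable.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a malformed response.
        return False
```

For example, `is_reachable(dashboard_url("localhost", 8080))` should return True while the port-forward command is running. A True result only means that something answered on that address; it does not confirm that login will succeed.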

Once connected to the dashboard, log in using the admin email and the initial password sent to that email during Kubeflow setup.

Step 3. Create Jupyter Notebook

You can create a Jupyter Notebook instance directly from the dashboard. In this step, you’ll configure the notebook specifications and launch it.

From the left panel of the Kubeflow dashboard, click [Notebooks].
On the Notebooks page, click the [+ New Notebook] button in the top right corner. Refer to the following settings to fill in your notebook information.

Notebook using GPU image

To create a notebook with GPU support, refer to the table below:

Item	Field	Description
Name	Name	Used to identify the notebook instance in the dashboard
	Namespace	Kubernetes namespace for the notebook
Docker Image	Image	Specify the Docker image
CPU / RAM	Minimum CPU	Number of CPU cores allocated to the notebook
	Minimum Memory Gi	Memory allocated in GiB
GPUs	Number of GPUs	GPU count for the notebook
	GPU Vendor	Select GPU driver and software toolkit
Affinity / Tolerations	Affinity Config	Select GPU node pool - Defines which node the notebook runs on
	Tolerations Group	Allow tolerations for specific node taints

Example input values:

Item	Value
Name	handson
Image	kc-kubeflow/jupyter-pyspark-pytorch-cuda:v1.8.0.py311.1a
Minimum CPU	2
Minimum Memory Gi	12
Number of GPUs	4
GPU Vendor	NVIDIA MIG - 1g.10gb
Affinity Config	pool-gpu

Notebook using CPU image

To create a notebook without a GPU, refer to the table below:

Item	Field	Description
Name	Name	Used to identify the notebook instance in the dashboard
	Namespace	Kubernetes namespace for the notebook
Docker Image	Image	Specify the Docker image
CPU / RAM	Minimum CPU	Number of CPU cores allocated to the notebook
	Minimum Memory Gi	Memory allocated in GiB
GPUs	Number of GPUs	GPU usage (None)
Affinity / Tolerations	Affinity Config	Select CPU node pool - Defines which node the notebook runs on
	Tolerations Group	Allow tolerations for specific node taints

Example input values:

Item	Value
Name	handson
Image	kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py311.1a
Minimum CPU	2
Minimum Memory Gi	12
Number of GPUs	None
Affinity Config	pool-worker

Click the [LAUNCH] button to create the notebook instance.
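
For orientation, a notebook instance in Kubeflow is represented by a `Notebook` custom resource (`kubeflow.org/v1`). The sketch below builds a reduced manifest from the CPU example values. It is an illustration under assumptions, not the exact object the dashboard generates: it assumes that Minimum CPU and Minimum Memory Gi map to container resource requests, it omits volumes, affinity, and tolerations, and `notebook_manifest` is a hypothetical helper name.

```python
import json


def notebook_manifest(name: str, namespace: str, image: str,
                      cpu: int, memory_gi: int) -> dict:
    """Build a reduced Notebook custom resource as a plain dictionary."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Assumption: the dashboard minimums become requests.
                            "resources": {
                                "requests": {
                                    "cpu": str(cpu),
                                    "memory": f"{memory_gi}Gi",
                                }
                            },
                        }
                    ]
                }
            }
        },
    }


manifest = notebook_manifest(
    name="handson",
    namespace="kubeflow-tutorial",
    image="kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py311.1a",
    cpu=2,
    memory_gi=12,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON is meant for reading alongside the form fields; the dashboard remains the supported way to create the notebook in this tutorial.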

Step 4. Access Jupyter Notebook

Access your running Jupyter Notebook instance to begin working on your machine learning project.

Click the [CONNECT] button for the notebook instance you created.
In the notebook UI, select the Python3 kernel.

Enter and run the following sample code to test GPU availability:

import torch

def check_gpu_available():
    if torch.cuda.is_available():
        print("GPU is available on this system.")
    else:
        print("GPU is not available on this system.")

check_gpu_available()
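
If you also want to see which devices PyTorch can use, the check above can be extended as in the following sketch. `list_cuda_devices` is a hypothetical helper name; it returns an empty list when PyTorch is not installed or no CUDA device is visible, so it is safe to run in the CPU notebook as well.

```python
def list_cuda_devices() -> list[str]:
    """Return the names of the CUDA devices visible to PyTorch, if any."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_name(index)
        for index in range(torch.cuda.device_count())
    ]


for index, name in enumerate(list_cuda_devices()):
    print(f"cuda:{index} -> {name}")
```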

Note

If you're using a single GPU instance, set the CUDA_VISIBLE_DEVICES environment variable to 0:

import os
import torch

def set_cuda_devices():
    # Expose only device 0. Run this before the first CUDA call in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def check_gpu_available():
    if torch.cuda.is_available():
        print("GPU is available on this system.")
    else:
        print("GPU is not available on this system.")

set_cuda_devices()
check_gpu_available()

For GPU-enabled notebooks, you can also open a Terminal inside the notebook and run nvidia-smi to confirm that the NVIDIA device is visible.
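
The same check can be run from a notebook cell. The sketch below calls `nvidia-smi -L`, which lists GPUs and any MIG devices, and counts the entries. The helper names are hypothetical, and the line prefixes used for counting ("GPU " and "MIG ") are an assumption about the listing format that may vary between driver versions.

```python
import shutil
import subprocess
from typing import Optional


def count_devices(listing: str) -> dict[str, int]:
    """Count GPU and MIG entries in the text printed by `nvidia-smi -L`."""
    counts = {"gpu": 0, "mig": 0}
    for line in listing.splitlines():
        stripped = line.strip()
        if stripped.startswith("GPU "):
            counts["gpu"] += 1
        elif stripped.startswith("MIG "):
            counts["mig"] += 1
    return counts


def nvidia_smi_listing() -> Optional[str]:
    """Return the output of `nvidia-smi -L`, or None if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else None


listing = nvidia_smi_listing()
if listing is None:
    print("nvidia-smi is not available in this environment.")
else:
    print(count_devices(listing))
```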
