Deploy Jupyter Notebooks on Kubeflow
This tutorial explains how to set up a Jupyter Notebook using the Kubeflow service on KakaoCloud's Kubernetes environment.
- Estimated time: 30 minutes
- Recommended OS: macOS, Ubuntu
About this scenario
With KakaoCloud's Kubernetes Engine and Kubeflow services, you can build an efficient MLOps environment. This scenario walks you through creating Kubeflow from the console, accessing the dashboard, and launching a Jupyter Notebook instance for data analysis and model training.
Key topics:
- Setting up a Kubernetes cluster and File Storage environment
- Launching a Jupyter Notebook for data analysis and model training
Before you start
As preparation, you'll need to create and configure a Kubernetes cluster and File Storage.
1. Create Kubernetes cluster
Set up a basic Kubernetes cluster to serve as the foundation for Kubeflow components.
- From the KakaoCloud console, go to Container Pack > Kubernetes Engine and click [Create Cluster]. Refer to the following settings:
Cluster settings
Item | Value |
---|---|
Cluster name | kc-handson |
Kubernetes version | 1.28 |
Cluster network settings | Choose a network with an IP range that allows external communication |
If the cluster network uses a private subnet, the nodes can't access the internet. You'll need to set up NAT for external communication (e.g., with external container registries). Refer to Using a NAT Instance.
Node pool settings
Name | Type | Instance type | Volume | Node count | Autoscaling |
---|---|---|---|---|---|
pool-ingress | Virtual Machine | m2a.large | 50GB | 1 | Disabled |
pool-worker | Virtual Machine | m2a.xlarge | 100GB | 6 | Disabled |
pool-gpu | GPU | p2i.6xlarge | 100GB | 1 | Disabled |
- Wait until all node pools show `Running` status.
- Follow the kubectl control setup guide to configure kubectl access to your cluster.
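Once kubectl access is configured, a quick check confirms the cluster is ready (a minimal sketch; node names in your cluster will differ):
```bash
# Every node from the three pools should report STATUS "Ready"
kubectl get nodes -o wide
```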
2. Create File Storage
Create File Storage for persistent volume use by notebooks. Ensure the storage is on the same network and subnet as your cluster.
- From the KakaoCloud console, go to Beyond Storage Service > File Storage and click [Create Instance]. Use the settings below:
Item | Value |
---|---|
Instance | kc-handson-fs |
Volume size | 1TB |
Network | Same as the Kubernetes cluster |
Subnet | Same as the Kubernetes cluster |
Access control | Allow all private IPs in the network |
Mount path | handson |
- Confirm the instance status is `Active`.
Getting started
Now let's configure the Jupyter Notebook environment.
Step 1. Create Kubeflow
Deploy Kubeflow on your Kubernetes cluster using the following settings:
- Go to AI Service > Kubeflow in the KakaoCloud console and click [Create Kubeflow].
Kubeflow settings
Item | Value |
---|---|
Kubeflow name | kc-handson |
Version | 1.8 |
Service type | Essential+HPT+ServingAPI |
Cluster settings
Item | Value |
---|---|
Cluster | kc-handson |
Ingress node pool | pool-ingress |
Worker node pool | pool-worker |
CPU node pool | pool-worker |
GPU node pool | pool-gpu |
GPU MIG | 1g.10gb - 7 units |
Default File Storage | kc-handson-fs |
User and workload auth settings
| Category | Item | Value |
|---|---|---|
| Object Storage | Type | Object Storage or MinIO |
| Kubeflow Owner | | ${ADMIN_EMAIL} (example@kakaocloud.com) |
| Namespace | | kubeflow-tutorial |
| Namespace File Storage | | kc-handson-fs |
| DB | Type | Kubeflow Internal DB |
| | Port | 3306 |
| | Password | ${DB_PASSWORD} |
| | Confirm Password | ${DB_PASSWORD} |
| Domain (optional) | | Valid domain format |
- Confirm the Kubeflow status is `Active`.
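As an optional sanity check (a sketch; namespace names can vary by Kubeflow version, but `kubeflow` and `istio-system` are typical), you can list the Kubeflow system pods with kubectl:
```bash
# All pods should eventually reach Running or Completed status
kubectl get pods -n kubeflow
kubectl get pods -n istio-system
```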
Step 2. Access the dashboard
Access the deployed Kubeflow environment through its dashboard to manage resources and configure your Jupyter Notebook environment.
You can connect using either the public IP of the load balancer or `kubectl` port forwarding.
Using the load balancer public IP
- Go to Load Balancing > Load Balancer in the KakaoCloud console.
- Locate the load balancer named `kube_service_{PROJECT_ID}_{IKE_CLUSTER_NAME}_ingress-nginx_ingress-nginx-controller`, which serves the Kubeflow ingress, and check its public IP. If none is assigned, click the [More] icon and assign one.
- In your browser, access the public IP on port 80:
```bash
open http://{LB_PUBLIC_IP}
```
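If the page doesn't load, a quick request from your terminal can confirm the gateway is reachable (a minimal check; the exact status code depends on your ingress and auth configuration, though a redirect toward the login page is typical):
```bash
# Expect an HTTP redirect (commonly 302) rather than a connection error
curl -sI http://{LB_PUBLIC_IP} | head -n 5
```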
Using kubectl port forwarding
Alternatively, you can use the `kubectl` CLI to port-forward to the Kubeflow Istio gateway.
- Use `kubectl` to connect to the Kubernetes cluster used during Kubeflow setup.
- Forward a local port (e.g., `8080`) to the Kubeflow dashboard port, and keep the command running while you use the dashboard:
```bash
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
```
- In your browser, open the forwarded local port:
```bash
open http://localhost:8080
```
- Once connected to the dashboard, log in using the admin email and the initial password sent to that email during Kubeflow setup.
Step 3. Create Jupyter Notebook
You can create a Jupyter Notebook instance directly from the dashboard. In this step, you’ll configure the notebook specifications and launch it.
- From the left panel of the Kubeflow dashboard, click [Notebooks].
- On the Notebooks page, click the [+ New Notebook] button in the top right corner. Refer to the following settings to fill in your notebook information.
You can create the notebook with either a GPU image or a CPU image.
Notebook using GPU image
To create a notebook with GPU support, use the following reference table:

| Item | Field | Description |
|---|---|---|
| Name | Name | Used to identify the notebook instance in the dashboard |
| | Namespace | Kubernetes namespace for the notebook |
| Docker Image | Image | Specify the Docker image |
| CPU / RAM | Minimum CPU | Number of CPU cores allocated to the notebook |
| | Minimum Memory Gi | Memory allocated in GiB |
| GPUs | Number of GPUs | GPU count for the notebook |
| | GPU Vendor | Select GPU driver and software toolkit |
| Affinity / Tolerations | Affinity Config | Select the GPU node pool; defines which node the notebook runs on |
| | Tolerations Group | Allow tolerations for specific node taints |

Example input values:

Item | Value |
---|---|
Name | handson |
Image | kc-kubeflow/jupyter-pyspark-pytorch-cuda:v1.8.0.py311.1a |
Minimum CPU | 2 |
Minimum Memory Gi | 12 |
Number of GPUs | 4 |
GPU Vendor | NVIDIA MIG - 1g.10gb |
Affinity Config | pool-gpu |
Notebook using CPU image
To create a notebook without GPU, refer to the table below:

| Item | Field | Description |
|---|---|---|
| Name | Name | Used to identify the notebook instance in the dashboard |
| | Namespace | Kubernetes namespace for the notebook |
| Docker Image | Image | Specify the Docker image |
| CPU / RAM | Minimum CPU | Number of CPU cores allocated to the notebook |
| | Minimum Memory Gi | Memory allocated in GiB |
| GPUs | Number of GPUs | GPU usage (None) |
| Affinity / Tolerations | Affinity Config | Select the CPU node pool; defines which node the notebook runs on |
| | Tolerations Group | Allow tolerations for specific node taints |

Example input values:

Item | Value |
---|---|
Name | handson |
Image | kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py311.1a |
Minimum CPU | 2 |
Minimum Memory Gi | 12 |
Number of GPUs | None |
Affinity Config | pool-worker |
- Click the [LAUNCH] button to create the notebook instance.
Step 4. Access Jupyter Notebook
Access your running Jupyter Notebook instance to begin working on your machine learning project.
- Click the [CONNECT] button for the notebook instance you created.
- In the notebook UI, select the Python3 kernel.
- Enter and run the following sample code to test GPU availability:
```python
import torch

def check_gpu_available():
    # Report whether PyTorch can see a CUDA-capable GPU
    if torch.cuda.is_available():
        print("GPU is available on this system.")
    else:
        print("GPU is not available on this system.")

check_gpu_available()
```

Note: If you're using a single GPU instance, set the `CUDA_VISIBLE_DEVICES` environment variable to `0`:

```python
import torch
import os

def set_cuda_devices():
    # Restrict PyTorch to the first CUDA device
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def check_gpu_available():
    if torch.cuda.is_available():
        print("GPU is available on this system.")
    else:
        print("GPU is not available on this system.")

set_cuda_devices()
check_gpu_available()
```
- For GPU-enabled notebooks, you can also open a Terminal inside the notebook and run `nvidia-smi` to confirm the NVIDIA device:
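```bash
# Expected to show the driver/CUDA versions and the MIG slice
# (e.g., 1g.10gb) assigned to this notebook; exact output varies.
nvidia-smi
```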