Setting up Jupyter Notebook environment using Kubeflow
This guide explains the steps to configure a Jupyter Notebook environment using the Kubeflow service on KakaoCloud's Kubernetes platform.
- Estimated time: 30 minutes
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites:
Before starting
Using KakaoCloud's Kubernetes Engine and Kubeflow, you can establish an efficient foundation for an MLOps environment. In this document, you'll learn how to perform data analysis and model training using Jupyter Notebook, and how to optimize machine learning workflows using various features of Kubeflow.
About this scenario
In this scenario, we guide you through creating Kubeflow on the KakaoCloud console, accessing the dashboard, and creating a Jupyter Notebook instance. The main topics covered in this scenario are:
- Setting up a Kubernetes cluster and file storage
- Performing data analysis and model training by creating a Jupyter Notebook
Prework
As a prerequisite for setting up the Kubeflow environment, create and configure a Kubernetes cluster and file storage.
1. Create Kubernetes cluster
Configure a basic Kubernetes cluster for the Kubeflow environment. This cluster serves as the foundation for deploying various Kubeflow components.
- In the KakaoCloud Console > Container Pack > Kubernetes Engine, click [Create cluster].

  Cluster settings
  - Cluster name: kc-handson
  - Kubernetes version: 1.28
  - Cluster network settings: Select a network with an IP range that supports external communication from the created VPC and subnet

  Info: If the cluster's network is a private subnet, nodes in that subnet cannot communicate over the internet. To let them reach external container registries, NAT communication is required. You can use a NAT instance for this; for details, refer to Appendix: NAT instance.
Node pool settings

- pool-ingress
  - Node pool type: Virtual Machine
  - Instance type: m2a.large
  - Volume type/size: 50GB
  - Node count: 1
  - Autoscale: Disabled
- pool-worker
  - Node pool type: Virtual Machine
  - Instance type: m2a.xlarge
  - Volume type/size: 100GB
  - Node count: 6
  - Autoscale: Disabled
- pool-gpu
  - Node pool type: GPU
  - Instance type: p2i.6xlarge
  - Volume type/size: 100GB
  - Node count: 1
  - Autoscale: Disabled
- Ensure that the status of the created node pool is Running.
- Follow the steps in Kubectl control setup to configure the kubectl file for the cluster.
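Once kubectl is configured, you can sanity-check the connection from your local machine. A minimal sketch, assuming the kubeconfig from the Kubectl control setup step is active:

```shell
# Confirm kubectl is pointing at the new cluster
kubectl config current-context

# Nodes from all three pools (pool-ingress, pool-worker, pool-gpu)
# should report STATUS "Ready"
kubectl get nodes
```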
2. Create file storage
Create file storage required for data management and storage in Kubeflow. This storage will be used as a Persistent Volume for the notebook instance, ensuring safe storage of data and models. Configure the file storage instance in the same network and subnet as the selected cluster.
- In the KakaoCloud Console > Beyond Storage Service > File Storage, click [Create instance].

  File storage settings
  - Instance name: kc-handson-fs
  - Volume size: 1TB
  - Network settings: Same as the Kubernetes cluster
  - Subnet settings: Same as the Kubernetes cluster
  - Access control settings: Allow access from all private IPs within the configured network
  - Mount information: handson
- Ensure that the status of the created instance changes to Active.
Step-by-step process
The main steps for configuring the Jupyter Notebook environment are as follows.
Step 1. Create Kubeflow
Deploy and configure Kubeflow on the prepared Kubernetes cluster. This process ensures that you can utilize Kubeflow's various features through the initial configuration.
- In the KakaoCloud Console > AI Service > Kubeflow menu, click [Create Kubeflow]. Refer to the configuration values below to create Kubeflow.

  Kubeflow settings
  - Kubeflow name: kc-handson
  - Kubeflow version: 1.8
  - Kubeflow service type: Essential+HPT+ServingAPI

  Cluster settings
  - Cluster connection: kc-handson
  - Ingress node pool: pool-ingress
  - Worker node pool: pool-worker
  - CPU node pool: pool-worker
  - GPU node pool: pool-gpu
  - GPU MIG: 1g.10gb - 7 count
  - Default file storage: kc-handson-fs
  Authentication information for users and workloads
  - Object storage settings
    - Object storage type: Object Storage or MinIO
  - Kubeflow owner settings
    - Owner email account: ${ADMIN_EMAIL} (example@kakaocloud.com)
    - Namespace name: kubeflow-tutorial
    - Namespace file storage: kc-handson-fs
  - DB settings
    - DB type: Kubeflow Internal DB
    - Port: 3306
    - Password: ${DB_PASSWORD}
    - Confirm password: ${DB_PASSWORD}
  - Domain connection (optional): Enter a valid domain format
- Ensure that the created Kubeflow status changes to Active.
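After the status turns Active, you can also confirm the deployment from the CLI. A hedged sketch, assuming kubectl targets the cluster and Kubeflow's system components run in the upstream default kubeflow namespace:

```shell
# Kubeflow system pods should settle into Running or Completed
kubectl get pods -n kubeflow

# The default file storage (kc-handson-fs) backs PersistentVolumeClaims
kubectl get pvc --all-namespaces
```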
Step 2. Access the dashboard
To access the deployed Kubeflow environment, connect to the dashboard. From here, you can manage various Kubeflow resources and configure the Jupyter Notebook environment.
There are two main methods to access the Kubeflow dashboard: via the Load Balancer's Public IP, or via kubectl port forwarding.
- Using Load Balancer Public IP
- Using Kubectl port forwarding
- In the KakaoCloud Console, go to Load Balancing > Load Balancer.
- Find the load balancer named kube_service_{project_id}_{IKE cluster_name}_ingress-nginx_ingress-nginx-controller, created for Kubeflow's Ingress, and check its Public IP. If there is no Public IP, assign a new one from the options menu.
- Open your browser and access the Public IP of the load balancer on port 80.

  open http://{LB_PUBLIC_IP}
You can access the Kubeflow gateway or Kubeflow Istio gateway by establishing a port-forwarding connection via the kubectl CLI.

- Use the kubectl command to connect to the Kubernetes cluster where Kubeflow is installed.
- Forward a local port (e.g., 8080) to the Kubeflow dashboard port.

  kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

- Open your browser and access the Kubeflow dashboard at localhost on port 8080.

  open http://localhost:8080
- After accessing the dashboard, log in using the owner email account provided during the Kubeflow creation step and the initial password sent to the owner’s email.
Step 3. Create Jupyter Notebook
Through the dashboard, users can create a Jupyter Notebook instance. In this step, you will select the specifications for the notebook and configure the necessary settings.
- In the Kubeflow dashboard, click the Notebooks tab on the left side.
- On the Notebooks page, click the [+ New Notebook] button at the top right. Refer to the information below to create a new notebook.
- GPU-based Notebook
- CPU-based Notebook
- For a GPU-based notebook, refer to the following configuration fields:

  - Name
    - Name: Used to identify the notebook instance in the Kubeflow dashboard
    - Namespace: Kubernetes namespace where the notebook instance will be created
  - Docker Image
    - Image: Specify the Docker image
  - CPU / RAM
    - Minimum CPU: Number of CPU cores allocated to the notebook instance
    - Minimum Memory Gi: Amount of memory (GiB) allocated to the notebook instance
  - GPUs
    - Number of GPUs: GPU resources to be used by the notebook instance
  - Affinity / Tolerations
    - Affinity Config: Select the node pool where the notebook will be created; this specifies the node on which the notebook instance runs
    - Tolerations Group: Allows tolerating specific node taints

- Enter the information for the notebook you want to create. Refer to the example values below.

  - Name: handson
  - Image: kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py311.1a
  - Minimum CPU: 2
  - Minimum Memory Gi: 12
  - Number of GPUs: 4
  - GPU Vendor: NVIDIA MIG - 1g.10gb
  - Affinity Config: pool-gpu
- Refer to the following information to create a CPU-based notebook:

  - Name
    - Name: Used to identify the notebook instance in the Kubeflow dashboard
    - Namespace: The Kubernetes namespace where the notebook instance will be created
  - Docker Image
    - Image: mlops-pipelines/jupyter-pyspark-pytorch:v1.0.1.py36 (specify the Docker image)
  - CPU / RAM
    - Requested CPUs: 2 (number of CPU cores allocated to the notebook instance)
    - Requested memory in Gi: 8 (memory allocation for the notebook instance, in GiB)
  - GPUs
    - Number of GPUs: None (no GPU resources allocated)
  - Affinity / Tolerations
    - Affinity Config: Select the CPU node pool where the notebook will be created
    - Tolerations Group: None (sets tolerations for specific node taints)

- Enter the information for the notebook you want to create. Refer to the example values below.

  - Name: handson
  - Image: kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py311.1a
  - Minimum CPU: 2
  - Minimum Memory Gi: 12
  - Number of GPUs: None
  - Affinity Config: pool-worker
- Click the [LAUNCH] button to create the notebook.
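If you prefer the CLI, a Jupyter Notebook created this way is a Kubeflow Notebook custom resource in your user namespace. A sketch, assuming kubectl targets the cluster and the kubeflow-tutorial namespace from the setup above:

```shell
# Each Jupyter instance is a Notebook custom resource in the user's namespace
kubectl get notebooks -n kubeflow-tutorial

# The backing pod should reach Running before you connect
kubectl get pods -n kubeflow-tutorial
```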
Step 4. Access Jupyter Notebook
Once the Jupyter Notebook instance is created, you can access it to work on real machine learning projects.
- Click the [CONNECT] button next to the created notebook instance to access it.
- Select the Python3 kernel in the notebook.
- Enter the following example code. After running it, check the output message to confirm the result.
import torch

def check_gpu_available():
    if torch.cuda.is_available():
        print("GPU is available on this system.")
    else:
        print("GPU is not available on this system.")

check_gpu_available()

Basic information: Unlike this tutorial, when using a Single GPU instance you need to set the environment variable CUDA_VISIBLE_DEVICES to 0.
import torch
import os

def set_cuda_devices():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def check_gpu_available():
    if torch.cuda.is_available():
        print("GPU is available on the current system.")
    else:
        print("GPU is not available on the current system.")

set_cuda_devices()
check_gpu_available()
- For notebooks using GPUs, open a terminal within the notebook and run the nvidia-smi command to check the NVIDIA devices.
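For example, from the notebook terminal (or from a notebook cell prefixed with !), assuming the NVIDIA driver is exposed to the container:

```shell
# Summary of visible GPUs, driver version, memory, and utilization
nvidia-smi

# Just the device names, one per line
nvidia-smi --query-gpu=name --format=csv,noheader
```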