Hyperparameter tuning with Kubeflow
This guide walks through the steps to perform hyperparameter tuning on the MNIST dataset using KakaoCloud Kubeflow and Katib.
- Estimated time: 10 minutes
- User environment
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites:
- Reference documents:
Before you start
Before performing hyperparameter tuning with Kubeflow and Katib, prepare the MNIST dataset and the minimum required Kubeflow environment. This lets you find the optimal hyperparameter combination and improve model performance.
About this scenario
This scenario covers the following key topics using Katib:
- Optimizing model performance through hyperparameter tuning
- Exploring the best hyperparameter combination via automated machine learning experiments
- Hands-on practice with hyperparameter tuning using the MNIST dataset
Supported tools
Tool | Version | Description |
---|---|---|
Katib | 0.15.0 | An open-source AutoML project that improves model performance by automatically testing various hyperparameter combinations. |
For more details on Katib, please refer to the Kubeflow > Katib official documentation.
1. Prepare training data
This exercise uses the MNIST dataset. When you follow the steps below, the dataset is downloaded automatically; no separate action is required.
MNIST image dataset
The MNIST dataset consists of handwritten digit images from 0 to 9 and is widely used in the field of computer vision. It contains a total of 70,000 grayscale images, each 28x28 pixels.
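As a quick sanity check on these numbers, the raw pixel footprint of the dataset can be computed directly (a back-of-the-envelope sketch; the actual files downloaded by the training image are compressed archives and are smaller on disk):

```python
# MNIST: 70,000 grayscale images, each 28x28 pixels, 1 byte per pixel
num_images = 70_000
width = height = 28

raw_bytes = num_images * width * height  # uncompressed pixel data
print(f"{raw_bytes:,} bytes ≈ {raw_bytes / 2**20:.1f} MiB")  # → 54,880,000 bytes ≈ 52.3 MiB
```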
2. Set up the Kubeflow environment
Before using Katib on Kubeflow, check the MIG settings and GPU node pool specifications suitable for this exercise. If the Kubeflow environment is not prepared, refer to the Setting up Jupyter Notebook environment using Kubeflow document to create a Kubeflow environment with a GPU pipeline node pool configured.
Minimum requirements
- MIG minimum configuration: at least 3 instances of 1g.10gb (one per parallel trial in this exercise)
- GPU pipeline node pool required
- Node pool size: 100 GB or more
Getting started
The specific steps for hyperparameter tuning on the MNIST dataset using Katib are as follows.
Step 1. Create a new experiment with an exercise example
1. Access the Kubeflow dashboard.
2. Select the Experiments (AutoML) tab on the left.
3. Click the [NEW EXPERIMENT] button at the top right.
4. On the Create an Experiment screen, click [Edit and submit YAML], copy and paste the YAML code example below, then click the [CREATE] button.
Exercise example: YAML script code example

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: test-automl2
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
    additionalMetricNames:
      - loss
  metricsCollectorSpec:
    source:
      filter:
        metricsFormat:
          - "{metricName: ([\\w|-]+), metricValue: ((-?\\d+)(\\.\\d+)?)}"
      fileSystemPath:
        path: "/katib/mnist.log"
        kind: File
    collector:
      kind: File
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: momentum
      parameterType: double
      feasibleSpace:
        min: "0.3"
        max: "0.7"
  trialTemplate:
    retain: true
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: momentum
        description: Momentum for the training model
        reference: momentum
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: 'false'
          spec:
            containers:
              - name: training-container
                image: bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/katib-pytorch-mnist-gpu:v0.15.0.1a
                command:
                  - "python3"
                  - "/opt/pytorch-mnist/mnist.py"
                  - "--epochs=1"
                  - "--log-path=/katib/mnist.log"
                  - "--lr=${trialParameters.learningRate}"
                  - "--momentum=${trialParameters.momentum}"
                resources:
                  requests:
                    cpu: '1'
                    memory: 2Gi
                  limits:
                    nvidia.com/mig-1g.10gb: 1
            restartPolicy: Never
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: kakaoi.io/kke-nodepool
                          operator: In
                          values:
                            - ${GPU_PIPELINE_NODEPOOL_NAME}
```

Environment variable | Description |
---|---|
GPU_PIPELINE_NODEPOOL_NAME | Enter the name of your GPU pipeline node pool, e.g. "gpu-node" |

The YAML above defines the configuration for the Katib experiment. It includes the experiment's objective, the metrics to optimize, the search algorithm to use, and the hyperparameter search space.
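For intuition, the `random` algorithm explores the search space defined above roughly as in the following sketch (a simplified illustration, not Katib's actual implementation): each trial draws `lr` and `momentum` uniformly at random from their feasible ranges, up to `maxTrialCount` trials.

```python
import random

# Feasible space and trial budget from the experiment spec above
SPACE = {"lr": (0.01, 0.03), "momentum": (0.3, 0.7)}
MAX_TRIALS = 12  # maxTrialCount

def sample_trial(space):
    """Draw one hyperparameter combination uniformly at random."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in space.items()}

trials = [sample_trial(SPACE) for _ in range(MAX_TRIALS)]

# Katib would train a model for each combination (up to parallelTrialCount
# at a time) and report the combination with the best objective metric.
for t in trials:
    assert SPACE["lr"][0] <= t["lr"] <= SPACE["lr"][1]
    assert SPACE["momentum"][0] <= t["momentum"] <= SPACE["momentum"][1]
```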
Step 2. Verify the created experiment
1. After running the experiment, you can check the results by navigating to the Experiments (AutoML) tab on the Kubeflow dashboard.
   Verify experiment creation
2. If the experiment runs successfully, you can click on it to view detailed information, including the best hyperparameter values.
   View experiment details
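The accuracy and loss values shown in the dashboard are extracted from each trial's `/katib/mnist.log` by the file metrics collector, using the `metricsFormat` regular expression from the experiment spec. A quick check of that pattern against a sample log line (the log line itself is illustrative):

```python
import re

# metricsFormat pattern from the experiment YAML (YAML escaping removed for Python)
pattern = re.compile(r"{metricName: ([\w|-]+), metricValue: ((-?\d+)(\.\d+)?)}")

sample_line = "{metricName: accuracy, metricValue: 0.9123}"  # example log entry
m = pattern.search(sample_line)
if m:
    print(m.group(1), m.group(2))  # → accuracy 0.9123
```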
Info: For more details on Katib, please refer to the Kubeflow > Katib official documentation.