Kubeflow hyperparameter tuning

This tutorial walks through hyperparameter tuning on the MNIST dataset, step by step, using Kubeflow and Katib on KakaoCloud.

Basic information
  • Estimated time: 10 minutes
  • Recommended OS: macOS, Ubuntu
  • Note:
    • In private network environments, training file downloads may not work properly.

About this scenario

Before tuning hyperparameters with Kubeflow and Katib, you prepare the MNIST dataset and the minimal Kubeflow environment that the exercise requires. You then run a Katib experiment that searches for the hyperparameter combination giving the best model performance.

Key topics include:

  • Optimizing model performance through hyperparameter tuning
  • Discovering the best hyperparameter combination using automated machine learning experiments
  • Hands-on practice with the MNIST dataset for hyperparameter tuning

Supported tools

Tool | Version | Description
Katib | 0.15.0 | An open-source project for improving model performance by tuning hyperparameters. Enables testing a wide range of hyperparameter combinations.

Note

For more details about Katib, refer to the Kubeflow > Katib official documentation.

Before you start

1. Prepare training data

This exercise uses the MNIST dataset. The dataset will be automatically downloaded during the tutorial steps, so no manual download is necessary.

Figure: MNIST image dataset

The MNIST dataset contains grayscale images of handwritten digits (0 through 9) and is widely used in the field of computer vision. It consists of 70,000 images, each 28x28 pixels.
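
To make the figures above concrete, the following short sketch works out the size of one flattened image and of the whole dataset. It assumes one byte (an unsigned 8-bit value) per grayscale pixel and ignores any file-format overhead; the variable names are illustrative only.

```python
# Dataset dimensions as stated in the text above.
NUM_IMAGES = 70_000
HEIGHT, WIDTH = 28, 28

# Number of pixel values in a single flattened image.
pixels_per_image = HEIGHT * WIDTH

# Assumption: 1 byte per grayscale pixel, no compression or headers.
total_bytes = NUM_IMAGES * pixels_per_image

print(f"pixels per image: {pixels_per_image}")
print(f"raw pixel data: {total_bytes / 1024**2:.1f} MiB")
```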

2. Set up Kubeflow environment

Before using Katib in Kubeflow, make sure that your environment meets the MIG (Multi-Instance GPU) and GPU node pool requirements listed below. If you haven't set up Kubeflow yet, refer to Deploy Jupyter Notebooks on Kubeflow and create an environment with a GPU pipeline node pool.

Minimum requirements

  • MIG setup: At least 3 instances of 1g.10gb
  • GPU pipeline node pool is required
  • Node pool size: 100GB or more
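
One plausible reading of the MIG requirement is that it matches the experiment used later in this tutorial, which runs 3 trials in parallel with each trial limited to one `nvidia.com/mig-1g.10gb` instance. The sketch below only checks that arithmetic; the helper name and the `AVAILABLE_MIG_INSTANCES` value are assumptions for illustration, not part of Kubeflow or Katib.

```python
# Values taken from the example experiment in this tutorial.
PARALLEL_TRIAL_COUNT = 3   # spec.parallelTrialCount
MIG_PER_TRIAL = 1          # limits: nvidia.com/mig-1g.10gb: 1

# Assumption: the cluster offers exactly the documented minimum.
AVAILABLE_MIG_INSTANCES = 3


def required_mig_instances(parallel_trials: int, mig_per_trial: int) -> int:
    """MIG instances needed so that all parallel trials can be scheduled at once."""
    return parallel_trials * mig_per_trial


needed = required_mig_instances(PARALLEL_TRIAL_COUNT, MIG_PER_TRIAL)
enough = AVAILABLE_MIG_INSTANCES >= needed
print(f"needed={needed}, available={AVAILABLE_MIG_INSTANCES}, enough={enough}")
```

With fewer instances than trials running in parallel, some trial pods would likely wait in a pending state until an instance is freed.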

Getting started

The following steps walk through hyperparameter tuning on the MNIST dataset using Katib.

Step 1. Create new experiment with example YAML

  1. Log in to the Kubeflow dashboard.
  2. Select the Experiments (AutoML) tab on the left panel.
  3. Click the [NEW EXPERIMENT] button on the top right.
  4. In the Create an experiment screen, click [Edit and submit YAML], paste the example YAML below, and click [CREATE].

Figure: YAML script example

Exercise example: YAML script
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: test-automl2
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
    additionalMetricNames:
      - loss
  metricsCollectorSpec:
    source:
      filter:
        metricsFormat:
          - "{metricName: ([\\w|-]+), metricValue: ((-?\\d+)(\\.\\d+)?)}"
      fileSystemPath:
        path: "/katib/mnist.log"
        kind: File
    collector:
      kind: File
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: momentum
      parameterType: double
      feasibleSpace:
        min: "0.3"
        max: "0.7"
  trialTemplate:
    retain: true
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: momentum
        description: Momentum for the training model
        reference: momentum
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: 'false'
          spec:
            containers:
              - name: training-container
                image: bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/katib-pytorch-mnist-gpu:v0.15.0.1a
                command:
                  - "python3"
                  - "/opt/pytorch-mnist/mnist.py"
                  - "--epochs=1"
                  - "--log-path=/katib/mnist.log"
                  - "--lr=${trialParameters.learningRate}"
                  - "--momentum=${trialParameters.momentum}"
                resources:
                  requests:
                    cpu: '1'
                    memory: 2Gi
                  limits:
                    nvidia.com/mig-1g.10gb: 1
            restartPolicy: Never
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: kakaoi.io/kke-nodepool
                          operator: In
                          values:
                            - ${GPU_PIPELINE_NODEPOOL_NAME}

Environment variable | Description
GPU_PIPELINE_NODEPOOL_NAME | Replace with the name of your GPU pipeline node pool, e.g. "gpu-node"
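
If you prefer to fill in the node pool name programmatically instead of editing the YAML by hand, a minimal sketch could look like the following. The function name `fill_nodepool` and the shortened `snippet` string are hypothetical; the point is that only `${GPU_PIPELINE_NODEPOOL_NAME}` is replaced, while the `${trialParameters...}` placeholders are left untouched because Katib fills those in for each trial.

```python
PLACEHOLDER = "${GPU_PIPELINE_NODEPOOL_NAME}"


def fill_nodepool(yaml_text: str, nodepool_name: str) -> str:
    """Replace the node pool placeholder and leave all other ${...} markers as-is."""
    if PLACEHOLDER not in yaml_text:
        raise ValueError("node pool placeholder not found in YAML text")
    return yaml_text.replace(PLACEHOLDER, nodepool_name)


# Shortened stand-in for the experiment YAML (illustration only).
snippet = (
    "command:\n"
    '  - "--lr=${trialParameters.learningRate}"\n'
    "values:\n"
    "  - ${GPU_PIPELINE_NODEPOOL_NAME}\n"
)

filled = fill_nodepool(snippet, "gpu-node")
print(filled)
```

Plain string replacement is used here on purpose: a generic template engine might also try to interpret the Katib trial placeholders.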

This YAML defines the configuration of the Katib experiment, including its objective, optimization metric, algorithm, hyperparameter search space, and trial template.
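
To see what the `metricsFormat` filter is meant to capture, the sketch below applies the same pattern to a log line using Python's `re` module. This is only an approximation: Katib's metrics collector is not written in Python, and the sample log line is an assumed example of the format, not output copied from the training image. Note that the doubled backslashes in the YAML string (`\\w`, `\\d`) become single backslashes in the actual pattern.

```python
import re

# Pattern from spec.metricsCollectorSpec.source.filter.metricsFormat,
# after YAML unescaping of "\\w", "\\d" and "\\.".
METRICS_FORMAT = r"{metricName: ([\w|-]+), metricValue: ((-?\d+)(\.\d+)?)}"

GOAL = 0.99  # spec.objective.goal


def parse_metrics(line: str) -> dict:
    """Return {metric name: value} for every match found in one log line."""
    found = re.findall(METRICS_FORMAT, line)
    # Each match has four groups: name, full value, integer part, fraction.
    return {name: float(value) for name, value, _int, _frac in found}


# Hypothetical log line in the shape the pattern expects.
sample = "{metricName: accuracy, metricValue: 0.9876};{metricName: loss, metricValue: 0.0412}"

metrics = parse_metrics(sample)
print(metrics)
print("goal reached:", metrics["accuracy"] >= GOAL)
```

One consequence of this pattern is that values written in scientific notation (for example `1e-05`) do not match, so the training script needs to log plain decimal numbers.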

Step 2. Verify created experiment

  1. After running the experiment, go to the Experiments (AutoML) tab in the Kubeflow dashboard to check the results.

Figure: Experiment successfully created

  2. Once the experiment has completed successfully, click it to view detailed information and the optimal hyperparameter values.

Figure: View experiment details
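
As a rough mental model of how the optimal values shown in the experiment details are reached, the sketch below mimics a random search over the same search space (`lr` in [0.01, 0.03], `momentum` in [0.3, 0.7]) with at most 12 trials and a goal of 0.99. It is not Katib's implementation: trials run one after another here rather than 3 in parallel, and `toy_accuracy` is a made-up stand-in for a real MNIST training run.

```python
import random

# Search space and limits from the example experiment.
SEARCH_SPACE = {"lr": (0.01, 0.03), "momentum": (0.3, 0.7)}
MAX_TRIAL_COUNT = 12
GOAL = 0.99


def sample(rng: random.Random) -> dict:
    """Draw one hyperparameter combination uniformly from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def toy_accuracy(params: dict) -> float:
    """Hypothetical objective, NOT a real model: best near lr=0.02, momentum=0.5."""
    return 0.99 - 2.0 * abs(params["lr"] - 0.02) - 0.1 * abs(params["momentum"] - 0.5)


def random_search(seed: int = 0):
    """Return (best params, best accuracy) after up to MAX_TRIAL_COUNT trials."""
    rng = random.Random(seed)
    best_params, best_acc = None, float("-inf")
    for _ in range(MAX_TRIAL_COUNT):
        params = sample(rng)
        acc = toy_accuracy(params)
        if acc > best_acc:  # objective type is "maximize"
            best_params, best_acc = params, acc
        if acc >= GOAL:     # stop early once the goal is met
            break
    return best_params, best_acc


best_params, best_acc = random_search()
print(best_params, best_acc)
```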
