Hyperparameter tuning with Kubeflow

This guide walks through the steps to perform hyperparameter tuning on the MNIST dataset using KakaoCloud Kubeflow and Katib.

Basic information

Before starting

Before performing hyperparameter tuning with Kubeflow and Katib, prepare the MNIST dataset and a Kubeflow environment that meets the minimum requirements. With both in place, you can search for the hyperparameter combination that gives the best model performance.

About this scenario

This scenario covers the following key topics using Katib:

  • Optimizing model performance through hyperparameter tuning
  • Exploring the best hyperparameter combination via automated machine learning experiments
  • Hands-on practice with hyperparameter tuning using the MNIST dataset

Supported tools

| Tool | Version | Description |
| --- | --- | --- |
| Katib | 0.15.0 | An open-source project that adjusts hyperparameters to improve model performance. It allows testing various hyperparameter combinations to improve model performance. |
Info: For more details on Katib, please refer to the Kubeflow > Katib official documentation.

Prework

1. Prepare training data

This exercise uses the MNIST dataset. The dataset is downloaded automatically as you follow the steps below, so no separate action is required.

[Image: MNIST image dataset]

The MNIST dataset consists of handwritten digit images from 0 to 9 and is widely used in the field of computer vision. It contains a total of 70,000 grayscale images, each 28x28 pixels.

2. Set up the Kubeflow environment

Before using Katib on Kubeflow, check that the MIG settings and GPU node pool specifications meet the requirements for this exercise. If the Kubeflow environment is not prepared, refer to the Setting up Jupyter Notebook environment using Kubeflow document to create a Kubeflow environment with a GPU pipeline node pool configured.

Minimum requirements

  • MIG minimum configuration: At least 3 instances of 1g.10gb
  • GPU pipeline node pool required
  • Node pool size: 100GB or more
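
The MIG minimum appears to line up with the exercise example used in Step 1, which runs three trials in parallel (`parallelTrialCount: 3`) and limits each trial to one `nvidia.com/mig-1g.10gb` instance. As a rough sanity check, assuming every running trial holds exactly its MIG limit, the required instance count can be sketched as follows. The function name is illustrative only and is not part of Kubeflow or Katib.

```python
def required_mig_instances(parallel_trial_count: int, mig_per_trial: int) -> int:
    """Return how many MIG instances must be free for all parallel trials to run at once.

    Assumption: each trial pod requests exactly `mig_per_trial` instances
    and nothing else on the node pool competes for them.
    """
    if parallel_trial_count < 1 or mig_per_trial < 1:
        raise ValueError("both counts must be positive")
    return parallel_trial_count * mig_per_trial


# Values taken from the exercise YAML: parallelTrialCount: 3, nvidia.com/mig-1g.10gb: 1
print(required_mig_instances(3, 1))  # prints 3
```

If you raise `parallelTrialCount` in your own experiments, the same arithmetic suggests how many MIG instances the node pool would need to provide.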

Step-by-step process

The specific steps for hyperparameter tuning on the MNIST dataset using Katib are as follows.

Step 1. Create a new experiment from the exercise example

  1. Access the Kubeflow dashboard.

  2. Select the Experiments (AutoML) tab on the left.

  3. Click the [NEW EXPERIMENT] button at the top right.

  4. On the Create an Experiment screen, click [Edit and submit YAML], copy and paste the YAML code example below, and click the [CREATE] button.

    [Image: Insert YAML script code example]

    Exercise example. YAML script code example
    apiVersion: kubeflow.org/v1beta1
    kind: Experiment
    metadata:
      namespace: kubeflow
      name: test-automl2
    spec:
      objective:
        type: maximize
        goal: 0.99
        objectiveMetricName: accuracy
        additionalMetricNames:
          - loss
      metricsCollectorSpec:
        source:
          filter:
            metricsFormat:
              - "{metricName: ([\\w|-]+), metricValue: ((-?\\d+)(\\.\\d+)?)}"
          fileSystemPath:
            path: "/katib/mnist.log"
            kind: File
        collector:
          kind: File
      algorithm:
        algorithmName: random
      parallelTrialCount: 3
      maxTrialCount: 12
      maxFailedTrialCount: 3
      parameters:
        - name: lr
          parameterType: double
          feasibleSpace:
            min: "0.01"
            max: "0.03"
        - name: momentum
          parameterType: double
          feasibleSpace:
            min: "0.3"
            max: "0.7"
      trialTemplate:
        retain: true
        primaryContainerName: training-container
        trialParameters:
          - name: learningRate
            description: Learning rate for the training model
            reference: lr
          - name: momentum
            description: Momentum for the training model
            reference: momentum
        trialSpec:
          apiVersion: batch/v1
          kind: Job
          spec:
            template:
              metadata:
                annotations:
                  sidecar.istio.io/inject: 'false'
              spec:
                containers:
                  - name: training-container
                    image: bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/katib-pytorch-mnist-gpu:v0.15.0.1a
                    command:
                      - "python3"
                      - "/opt/pytorch-mnist/mnist.py"
                      - "--epochs=1"
                      - "--log-path=/katib/mnist.log"
                      - "--lr=${trialParameters.learningRate}"
                      - "--momentum=${trialParameters.momentum}"
                    resources:
                      requests:
                        cpu: '1'
                        memory: 2Gi
                      limits:
                        nvidia.com/mig-1g.10gb: 1
                restartPolicy: Never
                affinity:
                  nodeAffinity:
                    requiredDuringSchedulingIgnoredDuringExecution:
                      nodeSelectorTerms:
                        - matchExpressions:
                            - key: kakaoi.io/kke-nodepool
                              operator: In
                              values:
                                - ${GPU_PIPELINE_NODEPOOL_NAME}

    | Environment variable | Description |
    | --- | --- |
    | GPU_PIPELINE_NODEPOOL_NAME | Enter the name of your GPU pipeline node pool, e.g. "gpu-node" |

    The above YAML file defines the configuration for the Katib experiment. It includes the experiment’s objective, metrics to optimize, algorithm to use, and hyperparameter search space.
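
To illustrate how the `metricsFormat` filter picks metrics out of the log file, the sketch below applies the same pattern with Python's `re` module. This is only an approximation for understanding: Katib's metrics collector applies the pattern itself, and the sample log line here is hypothetical (the real lines are whatever `mnist.py` writes to `/katib/mnist.log`).

```python
import re

# Same pattern as metricsFormat in the YAML, after YAML unescapes "\\" to "\".
METRICS_FORMAT = r"{metricName: ([\w|-]+), metricValue: ((-?\d+)(\.\d+)?)}"
PATTERN = re.compile(METRICS_FORMAT)


def parse_metrics(line: str) -> dict:
    """Extract every metric name/value pair found in one log line.

    Group 1 is the metric name and group 2 is the full numeric value.
    """
    return {m.group(1): float(m.group(2)) for m in PATTERN.finditer(line)}


# Hypothetical log line used only for illustration.
sample = (
    "2024-01-01T00:00:00Z INFO "
    "{metricName: accuracy, metricValue: 0.9712};"
    "{metricName: loss, metricValue: 0.0931}"
)
print(parse_metrics(sample))  # prints {'accuracy': 0.9712, 'loss': 0.0931}
```

Under this reading, `accuracy` is the value Katib compares against the objective (`objectiveMetricName`), while `loss` is recorded as an additional metric.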

Step 2. Verify the created experiment

  1. After the experiment is created, check its status and results on the Experiments (AutoML) tab of the Kubeflow dashboard.

    [Image: Verify experiment creation]

  2. If the experiment runs successfully, you can click on it to view detailed information and the best hyperparameter values.

    [Image: View experiment details]
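
To make "best hyperparameter values" more concrete, the following is a simplified sketch of what a random search over this experiment's search space does: it samples `lr` and `momentum` uniformly from their feasible ranges, scores each trial, and keeps the trial with the highest objective value. It is not Katib's implementation. It runs trials one at a time rather than in parallel, ignores failed trials, and uses a made-up `toy_accuracy` function in place of real training.

```python
import random

# Search space and limits copied from the exercise YAML.
SEARCH_SPACE = {"lr": (0.01, 0.03), "momentum": (0.3, 0.7)}
MAX_TRIAL_COUNT = 12
GOAL = 0.99


def toy_accuracy(lr: float, momentum: float) -> float:
    """Hypothetical stand-in for a training run; NOT real MNIST accuracy."""
    return 0.98 - 50 * (lr - 0.02) ** 2 - 0.1 * (momentum - 0.5) ** 2


def random_search(seed: int = 0):
    """Sample up to MAX_TRIAL_COUNT trials and return (best_trial, all_trials)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(MAX_TRIAL_COUNT):
        params = {name: rng.uniform(low, high) for name, (low, high) in SEARCH_SPACE.items()}
        accuracy = toy_accuracy(**params)
        trials.append((accuracy, params))
        if accuracy >= GOAL:  # objective type is "maximize": stop once the goal is met
            break
    best = max(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search()
print(f"trials run: {len(trials)}")
print(f"best accuracy: {best[0]:.4f} with {best[1]}")
```

Because the toy objective never reaches the goal of 0.99, all 12 trials run in this sketch. In a real experiment, the best trial shown on the details page is the one with the highest reported `accuracy`.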
