Skip to main content
Tutorial series | Kubeflow traffic prediction model

2. Tune model hyperparameters

📈 This section explains the hyperparameter tuning process in the traffic prediction model tutorial.

Basic information
  • Estimated time: 40 minutes
  • Recommended OS: MacOS, Ubuntu

About this scenario

In this tutorial, you will automatically optimize the hyperparameters of a traffic prediction model using Kubeflow Katib. You will go through the entire process including writing the tuning code, building the Docker image, configuring and running the experiment, and reviewing the results.

To use Katib, the following components are required:

  • A Python script (tune.py) that trains a model using input hyperparameters
  • A Docker image and dependency file (requirements.txt) to run the script
  • An Experiment YAML configuration file (katib-experiment.yaml) that defines the experiment execution

Before you start

This tutorial assumes you have completed the previous tutorial, Traffic prediction tutorial (Part 1). In particular, the preprocessed dataset (nlb-sample.csv) for model training must be ready, and the environment must have Kubeflow Katib enabled.

Prerequisites
ItemDescription
Kubeflow environmentMust include Katib functionality
PVC configurationData must be mounted to the /dataset path as dataset-pvc
Container registryYou can use a public registry or KakaoCloud Container Registry

Getting started

Step 1. Write tuning code and build image

1. Write tuning code

Create a file named tune.py for hyperparameter tuning and enter the code below.
This code is based on the GradientBoosting model used in the previous tutorial and trains the model by combining several parameters: n_estimators, learning_rate, max_depth, and subsample.

The dataset is split into five folds, and cross-validation is used to evaluate the performance of each combination.
The Mean Absolute Error (MAE) measured in each iteration is averaged and printed, which is used as the evaluation metric for the Katib experiment.

tune.py
import argparse
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

parser = argparse.ArgumentParser()
parser.add_argument('--n_estimators', type=int, default=100)
parser.add_argument('--learning_rate', type=float, default=0.1)
parser.add_argument('--max_depth', type=int, default=3)
parser.add_argument('--subsample', type=float, default=1.0)
parser.add_argument('--data_version', type=str, default='sample')
parser.add_argument('--random_state', type=int, default=1234)

if __name__ == "__main__":
args = parser.parse_args()
df = pd.read_csv(f'/dataset/nlb-{args.data_version}.csv') # Path to preprocessed dataset

model = GradientBoostingRegressor(
n_estimators=args.n_estimators,
learning_rate=args.learning_rate,
max_depth=args.max_depth,
subsample=args.subsample,
random_state=args.random_state
)

X = df[['x1', 'x2', 'x3', 'x4']]
y = df['y']

scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
mae = -scores.mean()
print("MAE=%f" % mae)
Note

The training dataset nlb-sample.csv must be located at the /dataset path.
This file is the result generated from the previous tutorial, and you can use it in the Katib experiment by mounting dataset-pvc to this path.

2. Build tuning image

Since Katib runs user-defined container images to perform experiments, you need to build a Docker image that includes tune.py and push it to a container registry.

  1. Prepare tune.py, requirements.txt, and Dockerfile in the same directory.

    Dockerfile
    FROM python:3.10.0-slim 
    WORKDIR /app
    COPY requirements.txt /app/requirements.txt
    COPY tune.py /app/tune.py
    RUN pip install -r requirements.txt
    RUN rm requirements.txt
    CMD ["python", "/app/tune.py"]
    Requirements.txt
    pandas==2.2.2 
    numpy==2.1.1
    scikit-learn==1.5.1
    scipy==1.14.1
  2. In the terminal, use the command below to build the Docker image and push it to a container registry accessible from your cluster.

    docker build -t <your-registry>/lb-pred-model.edu . 
    docker push <your-registry>/lb-pred-model.edu
    Using public image

    If building a container image and registering it to a registry is difficult, you can use a public image provided by KakaoCloud.

    • Image path: bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/lb-pred-model:edu

This image includes the tune.py script and all required libraries.
If your dataset is mounted to the /dataset path, the image can be used as-is.
It is useful when you want to quickly start the Katib experiment without building your own image. :::

Step 2. Set up experiment

To perform hyperparameter tuning with Katib, you need to define a katib-experiment.yaml file that includes the experiment objective, parameter ranges, execution environment, and other related configurations.

Before you begin
  • IMAGE: Enter the path to the Docker image you want to use.
  • IMAGE_SECRET: Required only when using a private registry. You can omit this field if you are using a public image.
    Refer to KakaoCloud documentation on Registry authentication and management for details.
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
name: lb-gbr-tune
spec:
objective:
type: minimize
goal: 0.0
objectiveMetricName: MAE
metricCollectorSpec:
collector:
kind: StdOut
parallelTrialCount: 2
maxTrialCount: 10
maxFailedTrialCount: 3
algorithm:
algorithmName: random
parameters:
- name: n_estimators
parameterType: int
feasibleSpace:
min: "50"
max: "200"
- name: learning_rate
parameterType: double
feasibleSpace:
min: "0.01"
max: "0.2"
- name: max_depth
parameterType: int
feasibleSpace:
min: "2"
max: "10"
- name: subsample
parameterType: double
feasibleSpace:
min: "0.5"
max: "1.0"

trialTemplate:
primaryContainerName: training-container
trialParameters:
- name: n_estimators
description: "Number of estimators"
reference: "n_estimators"
- name: learning_rate
description: "Learning rate"
reference: "learning_rate"
- name: max_depth
description: "Maximum depth of the trees"
reference: "max_depth"
- name: subsample
description: "Subsample ratio"
reference: "subsample"
trialSpec:
apiVersion: batch/v1
kind: Job
spec:
template:
metadata:
annotations:
sidecar.istio.io/inject: 'false'
spec:
containers:
- name: training-container
image: {{IMAGE}}
command:
- "python"
- "/app/tune.py"
args:
- "--n_estimators"
- "${trialParameters.n_estimators}"
- "--learning_rate"
- "${trialParameters.learning_rate}"
- "--max_depth"
- "${trialParameters.max_depth}"
- "--subsample"
- "${trialParameters.subsample}"
resources:
requests:
cpu: '0.5'
memory: 1Gi
limits:
cpu: '1'
memory: 2Gi
volumeMounts:
- name: dataset
mountPath: /dataset
imagePullSecrets:
- name: {IMAGE_SECRET}
restartPolicy: Never
volumes:
- name: dataset
persistentVolumeClaim:
claimName: dataset-pvc
Description of key fields
FieldDescription
spec.objectiveThe target metric to optimize (maximize or minimize).
Must be printed from the image.
spec.parametersSpecifies the hyperparameters to explore, including their types and ranges.
trialTemplate.trialParametersDeclares the hyperparameters used in the experiment.
These appear as input fields in the Katib UI.
trialTemplate.trialSpecSpecifies the pod configuration for running the experiment.

Step 3. Run experiment

Follow the steps below to create a Katib experiment from the Kubeflow dashboard:

  1. Go to the Kubeflow dashboard and select the Experiments (AutoML) menu on the left.
  2. Click the [+ New Experiment] button, then click the Edit button at the bottom center.
  3. Paste the contents of the katib-experiment.yaml file created in the previous step, and click CREATE.

experiment

Step 4. Check tuning results

After the experiment is created, click lb-gbr-tune to view the tuning results.
Katib automatically logs the parameter combinations and evaluation results of each trial, allowing you to easily identify the best-performing configuration.

tuning results