Tutorial series

Kubeflow traffic prediction model

This tutorial series walks you through the process of building a traffic prediction model using load balancer log data on Kubeflow in KakaoCloud.
You will preprocess time-series data, develop a machine learning model, tune hyperparameters, deploy the model as an API, and automate the entire pipeline with MLOps practices, all through hands-on exercises.

This content is designed for practitioners and engineers interested in designing traffic prediction models, building MLOps workflows with Kubeflow, and serving and automating Scikit-learn models in production.

Before you start

1. Set up Kubeflow environment

To follow this tutorial, you must have access to a pre-configured Kubeflow environment.
Refer to the Deploy Jupyter Notebooks on Kubeflow guide and make sure your environment has a CPU-based node pool configured.

This tutorial does not use GPUs. A CPU-only node pool is sufficient.

2. Configure storage class (NFS)

To create PVCs, your Kubernetes cluster must have a StorageClass capable of dynamic volume provisioning.
If not yet configured, follow the NFS Client Provisioner setup guide to create an NFS Client Provisioner.

3. Create PVC volumes

You’ll need to create the following 3 PVCs to store data, models, and artifacts used in the tutorial:

| Name | Mount path | Recommended size | Access mode |
| --- | --- | --- | --- |
| dataset-pvc | /home/jovyan/dataset | 2Gi | ReadOnlyMany |
| model-pvc | /home/jovyan/models | 2Gi | ReadOnlyMany |
| artifact-pvc | /home/jovyan/artifacts | 2Gi | ReadOnlyMany |

Each PVC can be defined using a YAML file. Update {YOUR_KUBEFLOW_NAMESPACE} and {YOUR_STORAGE_CLASS} according to your environment:

dataset-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-pvc
  namespace: {YOUR_KUBEFLOW_NAMESPACE}
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: {YOUR_STORAGE_CLASS}

You can create model-pvc.yaml and artifact-pvc.yaml by duplicating this file and updating only the name field.
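If you prefer not to copy the file by hand, the three manifests can be rendered from a single template. The following is a minimal sketch using only the Python standard library. The helper names `render_pvc` and `write_manifests` are hypothetical, and the `NAMESPACE` and `STORAGE_CLASS` values are placeholders that you must replace with your own.

```python
# Sketch: render the three PVC manifests from one shared template.
# NAMESPACE and STORAGE_CLASS are placeholders; replace them with your own values.
from pathlib import Path
from string import Template

NAMESPACE = "{YOUR_KUBEFLOW_NAMESPACE}"
STORAGE_CLASS = "{YOUR_STORAGE_CLASS}"

PVC_TEMPLATE = Template("""\
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $name
  namespace: $namespace
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: $storage_class
""")

# The three PVC names listed in this tutorial.
PVC_NAMES = ["dataset-pvc", "model-pvc", "artifact-pvc"]


def render_pvc(name, namespace=NAMESPACE, storage_class=STORAGE_CLASS):
    """Return the manifest text for one PVC; only the name differs between the three."""
    return PVC_TEMPLATE.substitute(
        name=name, namespace=namespace, storage_class=storage_class
    )


def write_manifests(output_dir="."):
    """Write <name>.yaml for each PVC into output_dir and return the created paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in PVC_NAMES:
        path = out / f"{name}.yaml"
        path.write_text(render_pvc(name))
        paths.append(path)
    return paths
```

Calling `write_manifests()` in the directory where you run kubectl produces dataset-pvc.yaml, model-pvc.yaml, and artifact-pvc.yaml with identical specs and different names.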

Create PVCs
kubectl apply -f dataset-pvc.yaml
kubectl apply -f model-pvc.yaml
kubectl apply -f artifact-pvc.yaml
Notes when creating PVCs
  • Be sure to create the resources in the active Kubeflow namespace.
  • When applying with kubectl, include the -n flag to specify the namespace (for example, kubectl apply -f dataset-pvc.yaml -n {YOUR_KUBEFLOW_NAMESPACE}).
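If you drive kubectl from a notebook or script, one way to avoid forgetting the namespace is to build the command in a single place. This is a rough sketch, not part of the original tutorial. The function names `build_apply_command` and `apply_manifests` are hypothetical, and the code assumes kubectl is installed and already configured for your cluster.

```python
# Sketch: build "kubectl apply" commands that always carry an explicit -n flag.
# Assumes kubectl is on PATH and points at the right cluster.
import subprocess

MANIFESTS = ["dataset-pvc.yaml", "model-pvc.yaml", "artifact-pvc.yaml"]


def build_apply_command(manifest, namespace):
    """Build the kubectl argument list for one manifest, with an explicit namespace."""
    return ["kubectl", "apply", "-f", manifest, "-n", namespace]


def apply_manifests(namespace, manifests=MANIFESTS, execute=False):
    """Return the commands; run them only when execute is True."""
    commands = [build_apply_command(m, namespace) for m in manifests]
    if execute:
        for cmd in commands:
            # check=True raises CalledProcessError if kubectl exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

With the default `execute=False`, the function only returns the commands so you can review them first. If a manifest also sets metadata.namespace, the value passed with -n should match it, since kubectl rejects a mismatch.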

4. Sample data

This tutorial uses synthetic log data, and each step will provide a downloadable sample.

Tutorial structure

This traffic prediction model tutorial series is organized into the following steps:

  1. Explore data and develop model: Preprocess log data, engineer features to capture repeating time patterns, and build ML models.
  2. Tune model hyperparameters: Use Kubeflow Katib to optimize model hyperparameters and improve performance.
  3. Create model serving API: Deploy the trained model as a KServe-based InferenceService and make predictions via API.
  4. Automate pipeline workflow: Build an automated workflow covering data processing, training, and serving.
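As a preview of what "features to capture repeating time patterns" in step 1 can look like, the sketch below shows one common technique: sine/cosine encoding of the hour of day and day of week. This is an illustrative assumption, not necessarily the feature set used later in the series, and the function name `cyclical_time_features` is hypothetical.

```python
# Sketch: cyclical encoding of time, so that 23:00 and 00:00 end up close together.
import math
from datetime import datetime


def cyclical_time_features(ts):
    """Encode hour-of-day and day-of-week as points on a circle."""
    hour = ts.hour + ts.minute / 60.0  # fractional hour in [0, 24)
    dow = ts.weekday()                 # Monday = 0, Sunday = 6
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "dow_sin": math.sin(2 * math.pi * dow / 7),
        "dow_cos": math.cos(2 * math.pi * dow / 7),
    }
```

Unlike a raw hour value, where 23 and 0 look far apart, the sine/cosine pair places them next to each other, which tends to suit daily and weekly traffic cycles.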