Kubeflow traffic prediction model
This tutorial series walks you through the process of building a traffic prediction model using load balancer log data on Kubeflow in KakaoCloud.
You will preprocess time-series data, develop a machine learning model, tune hyperparameters, deploy the model as an API, and automate the entire pipeline with MLOps practices, all in a hands-on format.
This content is designed for practitioners and engineers interested in designing traffic prediction models, building MLOps workflows with Kubeflow, and serving and automating scikit-learn models in production.
Before you start
1. Set up Kubeflow environment
To follow this tutorial, you must have access to a pre-configured Kubeflow environment.
Refer to the Deploy Jupyter Notebooks on Kubeflow guide and make sure your environment has a CPU-based node pool configured.
This tutorial does not use GPUs, so a CPU-only node pool is sufficient.
2. Configure storage class (NFS)
To create PVCs, your Kubernetes cluster must have a StorageClass capable of dynamic volume provisioning.
If one is not yet configured, follow the NFS Client Provisioner setup guide to create an NFS Client Provisioner.
3. Create PVC volumes
You’ll need to create the following 3 PVCs to store data, models, and artifacts used in the tutorial:
Name | Mount path | Recommended size | Access mode |
---|---|---|---|
dataset-pvc | /home/jovyan/dataset | 2Gi | ReadWriteMany |
model-pvc | /home/jovyan/models | 2Gi | ReadWriteMany |
artifact-pvc | /home/jovyan/artifacts | 2Gi | ReadWriteMany |
Each PVC can be defined using a YAML file. Update {YOUR_KUBEFLOW_NAMESPACE} and {YOUR_STORAGE_CLASS} according to your environment:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-pvc
  namespace: {YOUR_KUBEFLOW_NAMESPACE}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: {YOUR_STORAGE_CLASS}
You can create model-pvc.yaml and artifact-pvc.yaml by duplicating this file and updating only the name field.
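If you would rather script the duplication than copy the file by hand, the following is a minimal sketch. It assumes the manifest above has been saved as dataset-pvc.yaml with your namespace and storage class already filled in. The helper names `rename_manifest` and `derive_manifests` are hypothetical and not part of any Kubeflow or Kubernetes tooling. The script only rewrites the first `name:` line and leaves every other field untouched.

```python
import re
from pathlib import Path

# Matches the first "name:" line of the manifest (the one under metadata).
# "namespace:" is not matched because the pattern requires "name" followed by ":".
_NAME_LINE = re.compile(r"^([ \t]*name:[ \t]*).*$", flags=re.MULTILINE)


def rename_manifest(manifest_text: str, new_name: str) -> str:
    """Return a copy of a PVC manifest with its metadata name replaced."""
    renamed, count = _NAME_LINE.subn(
        lambda match: match.group(1) + new_name, manifest_text, count=1
    )
    if count == 0:
        raise ValueError("no 'name:' field found in manifest")
    return renamed


def derive_manifests(base_file: str, names: list[str]) -> list[Path]:
    """Write <name>.yaml next to base_file for each name and return the paths."""
    base = Path(base_file)
    text = base.read_text(encoding="utf-8")
    written = []
    for name in names:
        target = base.with_name(f"{name}.yaml")
        target.write_text(rename_manifest(text, name), encoding="utf-8")
        written.append(target)
    return written
```

Calling `derive_manifests("dataset-pvc.yaml", ["model-pvc", "artifact-pvc"])` from the directory that holds the base file would produce the two remaining manifests. Review the generated files before applying them.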
kubectl apply -f dataset-pvc.yaml
kubectl apply -f model-pvc.yaml
kubectl apply -f artifact-pvc.yaml
- Be sure to create resources in the active Kubeflow namespace.
- When applying via kubectl, include the -n flag to specify the namespace.
4. Sample data
This tutorial uses synthetic log data, and each step will provide a downloadable sample.
Tutorial structure
This traffic prediction model tutorial series is organized into the following steps:
- Explore data and develop model: Preprocess log data, engineer features to capture repeating time patterns, and build ML models.
- Tune model hyperparameters: Use Kubeflow Katib to optimize model hyperparameters and improve performance.
- Create model serving API: Deploy the trained model as a KServe-based InferenceService and make predictions via API.
- Automate pipeline workflow: Build an automated workflow covering data processing, training, and serving.
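To give a sense of what "features to capture repeating time patterns" can mean in the first step, here is a minimal sketch of one common technique, sine/cosine (cyclical) encoding of the hour of day and day of week. This is an illustration under assumptions, not necessarily the feature set used later in the series. The function name `cyclical_time_features` and the feature names are hypothetical.

```python
import math
from datetime import datetime


def cyclical_time_features(ts: datetime) -> dict[str, float]:
    """Encode hour-of-day and day-of-week as points on a unit circle.

    Mapping a periodic value to (sin, cos) keeps neighbouring times close
    together: 23:00 and 00:00 end up near each other, which a raw hour
    value (23 vs 0) would not express.
    """
    # Fraction of the day elapsed, in [0, 1).
    hour_fraction = (ts.hour + ts.minute / 60) / 24
    # Fraction of the week elapsed, with Monday as 0.
    dow_fraction = ts.weekday() / 7

    return {
        "hour_sin": math.sin(2 * math.pi * hour_fraction),
        "hour_cos": math.cos(2 * math.pi * hour_fraction),
        "dow_sin": math.sin(2 * math.pi * dow_fraction),
        "dow_cos": math.cos(2 * math.pi * dow_fraction),
    }
```

Features like these could be computed from each log entry's timestamp and passed to a scikit-learn model alongside other inputs.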