Tutorial series | Kubeflow basic workflow

Predictive model training in Kubeflow Pipelines

This tutorial introduces how to automate machine learning model training using the Kubeflow service on KakaoCloud.

Basic information
  • Estimated time: 10 minutes
  • Recommended OS: macOS, Ubuntu
  • Notes:
    • In a private network environment, file downloads may not work properly.

About this scenario

This scenario explains the core concepts and functionality of Kubeflow Pipelines through a training dataset and a hands-on example. You will learn how to create and combine pipeline components and how to automate workflows for data processing and model training. In particular, it walks through automating the model training process step by step with Kubeflow Pipelines, so you can build and operate efficient workflows.

Key topics include:

  • Understanding the basics of Kubeflow Pipelines and its components
  • Creating and running pipelines
  • Managing the model training process with Experiments and Runs

Supported tools

| Tool | Version | Description |
| --- | --- | --- |
| KF Pipelines | 2.0.5 | A core component of Kubeflow that helps build, deploy, and manage machine learning workflows. Supports fast experimentation and repeatable ML workflows through a simplified interface. Offers parameter tuning, experiment tracking, and model versioning features. |
info

For more details on KF Pipelines, refer to the Kubeflow > KF Pipelines official documentation.

Key concepts

  • Component: A reusable task unit that supports various languages and libraries. You can combine multiple components to create an experiment.
  • Experiment: A full workflow composed of connected components. You can test combinations of parameters and data.
  • Run: Executes an Experiment and tracks results of each step. You can retry failed tasks or reuse results from previous runs.
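
To make these concepts concrete, the sketch below uses the KFP SDK (v2) to define two minimal components and connect them into a pipeline. It is only an illustration: the function names, base images, and parameters are placeholders, not part of this tutorial's sample manifest.

```python
from kfp import dsl

# A component is a reusable task unit; here one is built from a plain Python function.
@dsl.component(base_image="python:3.11")
def preprocess(raw_path: str) -> str:
    # Placeholder preprocessing logic
    return raw_path + ".cleaned"

@dsl.component(base_image="python:3.11")
def train(data_path: str, epochs: int) -> str:
    # Placeholder training logic
    return f"trained on {data_path} for {epochs} epochs"

# A pipeline wires components together; each execution of it is tracked as a Run under an Experiment.
@dsl.pipeline(name="taxi-fare-demo")
def taxi_pipeline(raw_path: str = "s3://example-bucket/trips.csv", epochs: int = 5):
    cleaned = preprocess(raw_path=raw_path)
    train(data_path=cleaned.output, epochs=epochs)
```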

Pipeline management

  • Pipeline components are visualized in the form of a Directed Acyclic Graph (DAG).
  • All pipelines can be managed as code, either through the KFP SDK or by manually uploading compressed pipeline files, as sketched below.
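
As one example of the SDK route, a pipeline function such as the taxi_pipeline sketch above can be compiled into a YAML package; this is also the kind of file you upload manually through the dashboard. The snippet assumes taxi_pipeline from the previous sketch is in scope.

```python
from kfp import compiler

# Compile the pipeline function into a package file accepted by the Pipelines UI and API.
# taxi_pipeline is the placeholder pipeline function defined in the earlier sketch.
compiler.Compiler().compile(
    pipeline_func=taxi_pipeline,
    package_path="taxi_pipeline.yaml",
)
```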

Pipeline images

KakaoCloud provides Kubeflow pipeline images that include various ML frameworks such as TensorFlow and PyTorch. You can also use your own custom Docker images.

info

The image registry endpoint is bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/(image-name).
For example, to pull the image kmlp-tensorflow:v1.0.0.py36.cpu, use:
bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/kmlp-tensorflow:v1.0.0.py36.cpu

Supported pipeline images
| Image name | Framework | Version | GPU Supported |
| --- | --- | --- | --- |
| kmlp-tensorflow:v1.8.0.py38.cpu.1a | tensorflow | 2.13.1 | X |
| kmlp-tensorflow:v1.8.0.py38.cuda.1a | tensorflow | 2.13.1 | O |
| kmlp-tensorflow:v1.8.0.py311.cpu.1a | tensorflow | 2.15.1 | X |
| kmlp-tensorflow:v1.8.0.py311.cuda.1a | tensorflow | 2.15.1 | O |
| kmlp-pytorch:v1.8.0.py38.cpu.1a | pytorch | 2.3.0 | X |
| kmlp-pytorch:v1.8.0.py38.cuda.1a | pytorch | 2.3.0 | O |
| kmlp-pytorch:v1.8.0.py311.cpu.1a | pytorch | 2.3.0 | X |
| kmlp-pytorch:v1.8.0.py311.cuda.1a | pytorch | 2.3.0 | O |
| kmlp-pyspark-tensorflow:v1.8.0.py38.cpu.1a | tensorflow | 2.13.1 | X |
| kmlp-pyspark-tensorflow:v1.8.0.py38.cuda.1a | tensorflow | 2.13.1 | O |
| kmlp-pyspark-tensorflow:v1.8.0.py311.cpu.1a | tensorflow | 2.15.1 | X |
| kmlp-pyspark-tensorflow:v1.8.0.py311.cuda.1a | tensorflow | 2.15.1 | O |
| kmlp-pyspark-pytorch:v1.8.0.py38.cpu.1a | pytorch | 2.3.0 | X |
| kmlp-pyspark-pytorch:v1.8.0.py38.cuda.1a | pytorch | 2.3.0 | O |
| kmlp-pyspark-pytorch:v1.8.0.py311.cpu.1a | pytorch | 2.3.0 | X |
| kmlp-pyspark-pytorch:v1.8.0.py311.cuda.1a | pytorch | 2.3.0 | O |
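
To run a component on one of the images above rather than a generic Python image, you can pass its full registry path as the component's base_image. A minimal sketch, assuming the packages you import (for example TensorFlow) are the ones listed for that image:

```python
from kfp import dsl

# Full registry path of a KakaoCloud-provided TensorFlow CPU image (from the table above).
TF_IMAGE = "bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/kmlp-tensorflow:v1.8.0.py311.cpu.1a"

@dsl.component(base_image=TF_IMAGE)
def check_tf_version() -> str:
    # TensorFlow is expected to be preinstalled in the base image.
    import tensorflow as tf
    return tf.__version__
```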

Before you start

1. Prepare training dataset

This tutorial uses TLC Trip Record Data from New York City and a sample pipeline manifest file for a simple preprocessing and training pipeline exercise.

| Item | Description |
| --- | --- |
| Goal | Build a taxi fare prediction model |
| Data | NYC Yellow Taxi fare data (2009–2015), including pickup/drop-off time and location, trip distance, fare, payment type, passenger count, etc. |

2. Prepare Kubeflow environment

This tutorial uses a GPU node pool environment.

If you haven't set up Kubeflow yet, follow the Kubeflow setup guide to create the environment.

Getting started

Here's how to create an Experiment and Run in Kubeflow and build the training pipeline:

Step 1. Create pipeline

Instructions for creating a pipeline using the sample manifest file:

  1. Access the Kubeflow dashboard and click the Pipelines tab. Then, click [Upload pipeline].

    Access Pipelines tab in Kubeflow dashboard

  2. Select Upload a file and upload the sample pipeline manifest (.yaml) file you prepared in the Before you start section.

    Upload pipeline
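
If you prefer to register the pipeline from code instead of the dashboard, the KFP SDK client can upload the same package. This is only a sketch: the host URL is a placeholder for your Kubeflow Pipelines endpoint, and depending on how your deployment handles authentication you may need to pass extra arguments to kfp.Client.

```python
import kfp

# Placeholder host; replace with your Kubeflow Pipelines endpoint.
client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")

# Upload the compiled package (e.g., the taxi_pipeline.yaml produced earlier).
client.upload_pipeline(
    pipeline_package_path="taxi_pipeline.yaml",
    pipeline_name="taxi-fare-demo",
)
```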

Step 2. Create Experiment

Create an Experiment from either the Experiments (KFP) tab or the details page of a specific pipeline in the Pipelines tab.

  1. Access the Kubeflow dashboard and select the pipeline where you want to create an Experiment from the Pipelines tab.

  2. In the pipeline detail view, click [Create experiment].

    Pipeline detail page

  3. Enter the Experiment name and click [Next].

    Create experiment
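
The same step is available from the SDK when scripting the workflow; the experiment name below is only an example, and the host is a placeholder as before.

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Create an experiment to group related runs.
experiment = client.create_experiment(name="taxi-fare-experiment")
```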

Step 3. Create and manage Run

After creating the Experiment, you will proceed to the Run creation step.

info

If you want to create a Run later, you can do so using one of the following methods:

  • Click [Create run] from the Runs tab.
  • In the Pipelines tab, go to the pipeline detail page and click [Create run].
  • In the Experiments (KFP) tab, go to the experiment detail page and click [Create run].

  1. On the Start a run screen, enter the required information and click [Start].

    • Because the pipeline manifest file was uploaded in Step 1, the values on this screen are auto-filled.

    Start a run

  2. Move to the Runs tab and select the created Run to access the detailed view. You can check all Run-related information on this screen.

    Run detail page
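
For automation, a Run can also be started from the SDK by submitting the compiled package under an experiment. This is a sketch with placeholder host, names, and parameters; the argument keys must match the parameters defined in your pipeline.

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Submit a run of the compiled package under the experiment created earlier.
run = client.create_run_from_pipeline_package(
    pipeline_file="taxi_pipeline.yaml",
    arguments={"raw_path": "s3://example-bucket/trips.csv", "epochs": 5},
    run_name="taxi-fare-run-1",
    experiment_name="taxi-fare-experiment",
)

# Optionally block until the run finishes (timeout in seconds).
client.wait_for_run_completion(run_id=run.run_id, timeout=3600)
```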

Step 4. Manage run results

  1. In the Kubeflow dashboard, go to the Runs tab, select the Run to archive, and click [Archive].

    Archive run

  2. Archived runs can be found under the “Archived” filter in the Runs tab. To restore a Run, select it and click [Restore].

    Restore run
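
Recent KFP SDK releases also expose run archiving from code; if your SDK version does not provide these methods, use the dashboard flow described above. The run ID is a placeholder you can look up with list_runs() or from the Run detail page.

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Archive a run by ID, and restore it later if needed (placeholder ID).
client.archive_run(run_id="<run-id>")
client.unarchive_run(run_id="<run-id>")
```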

Step 5. Delete a Run

Once the experiment is complete or no longer needed, it's good practice to delete unused resources:

  1. In the Runs tab, select the Run to delete and click [Archive].

    Delete run - archive

  2. From the Archived tab, select the Run and click [Delete].

    Delete run

  3. You can verify that the corresponding pod has also been deleted.

    Confirm pod deletion
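
Run deletion can likewise be scripted; the run ID below is a placeholder.

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Permanently delete a run by its ID. Deleted runs cannot be restored.
client.delete_run(run_id="<run-id>")
```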

Step 6. Archive an Experiment

  1. Go to the Experiments (KFP) tab in the Kubeflow dashboard and select the Experiment to archive.

  2. In the detail view, click [Archive] in the upper-right corner.

    Archive experiment

  3. You can view archived experiments in the “Archived” section of the Experiments tab. To restore one, click [Restore].

    Restore experiment
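
The equivalent SDK calls, with a placeholder experiment ID:

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Archive an experiment by ID, and restore it later if needed.
client.archive_experiment(experiment_id="<experiment-id>")
client.unarchive_experiment(experiment_id="<experiment-id>")
```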

Step 7. Delete a pipeline

After completing the tutorial or if a pipeline is no longer used, delete it as follows:

  1. Access the Pipelines tab in the Kubeflow dashboard.

  2. From the list view, select the pipeline to delete and click the [Delete] button in the top-right corner.

    Delete pipeline
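
If you are cleaning up from code instead, a sketch with a placeholder pipeline ID (depending on the backend, existing pipeline versions may need to be removed first with delete_pipeline_version):

```python
import kfp

client = kfp.Client(host="https://<your-kubeflow-host>/pipeline")  # placeholder host

# Delete a pipeline by its ID (look it up in the Pipelines tab or via list_pipelines()).
client.delete_pipeline(pipeline_id="<pipeline-id>")
```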

info

For more details on Kubeflow Pipelines, see the official Kubeflow Pipelines documentation.