Training a predictive model using Kubeflow Pipelines
This guide introduces how to automate machine learning model training processes using pipelines in the KakaoCloud Kubeflow environment.
- Estimated time: 10 minutes
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites
- Reference documents
Before starting
This tutorial introduces the core concepts and features of Kubeflow Pipelines through a dataset and a practical example. It walks through creating and combining pipeline components, processing data, and training a model. By following this guide, you can learn how machine learning workflows operate and how to automate them.
About this scenario
This tutorial explains how to create and use Kubeflow Pipelines to automate the predictive model training process step-by-step. The key topics include:
- Understanding the basics of Kubeflow Pipelines and components
- Creating and running pipelines
- Managing model training processes using 'Experiment' and 'Run'
Supported tools
Tool | Version | Description |
---|---|---|
KF Pipelines | 2.0.5 | A core component of Kubeflow that builds, deploys, and manages machine learning workflows. Provides a simplified interface for rapid experimentation and repeatable workflows. Offers features such as parameter tuning, experiment management, and model versioning. |
For more details, refer to the Kubeflow > KF Pipelines official documentation.
Key Concepts
- Component: Reusable task units that support various languages and libraries. Multiple components can be combined to configure an experiment.
- Experiment: A workspace that groups related pipeline runs, letting you test various parameter and data combinations.
- Run: Executes the experiment and tracks results for each stage. Failed tasks can be re-executed, and previous results can be reused.
Pipeline management
- Pipeline components are visualized as a Directed Acyclic Graph (DAG).
- All pipelines are managed as code through the SDK or by manually uploading compressed files.
Pipeline images
Kubeflow pipeline images supported by KakaoCloud include frameworks such as TensorFlow and PyTorch. You can also use custom Docker images as needed.
The image registry endpoint is bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/(image_name).
For example, to pull the kmlp-tensorflow:1.0.0.py36.cpu image, use bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/kmlp-tensorflow:1.0.0.py36.cpu.
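The full image reference is just the registry endpoint joined with an image name from the table below; a tiny helper makes this explicit. The registry host comes from this guide, and the resulting string can also be passed as a component's base image.

```python
# Builds the full image reference for the KakaoCloud Kubeflow registry
# endpoint described above.
REGISTRY = "bigdata-150.kr-central-2.kcr.dev/kc-kubeflow"

def image_ref(image_name: str) -> str:
    """Return the fully qualified image reference for a supported image."""
    return f"{REGISTRY}/{image_name}"

print(image_ref("kmlp-tensorflow:1.0.0.py36.cpu"))
# → bigdata-150.kr-central-2.kcr.dev/kc-kubeflow/kmlp-tensorflow:1.0.0.py36.cpu
```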
Supported pipeline information
Image name | Framework | Version | GPU support |
---|---|---|---|
kmlp-tensorflow:1.8.0.py38.cpu.1a | tensorflow | 2.13.1 | X |
kmlp-tensorflow:1.8.0.py38.cuda.1a | tensorflow | 2.13.1 | O |
kmlp-tensorflow:1.8.0.py311.cpu.1a | tensorflow | 2.15.1 | X |
kmlp-tensorflow:1.8.0.py311.cuda.1a | tensorflow | 2.15.1 | O |
kmlp-pytorch:1.8.0.py38.cpu.1a | pytorch | 2.3.0 | X |
kmlp-pytorch:1.8.0.py38.cuda.1a | pytorch | 2.3.0 | O |
kmlp-pytorch:1.8.0.py311.cpu.1a | pytorch | 2.3.0 | X |
kmlp-pytorch:1.8.0.py311.cuda.1a | pytorch | 2.3.0 | O |
kmlp-pyspark-tensorflow:1.8.0.py38.cpu.1a | tensorflow | 2.13.1 | X |
kmlp-pyspark-tensorflow:1.8.0.py38.cuda.1a | tensorflow | 2.13.1 | O |
kmlp-pyspark-tensorflow:1.8.0.py311.cpu.1a | tensorflow | 2.15.1 | X |
kmlp-pyspark-tensorflow:1.8.0.py311.cuda.1a | tensorflow | 2.15.1 | O |
kmlp-pyspark-pytorch:1.8.0.py38.cpu.1a | pytorch | 2.3.0 | X |
kmlp-pyspark-pytorch:1.8.0.py38.cuda.1a | pytorch | 2.3.0 | O |
kmlp-pyspark-pytorch:1.8.0.py311.cpu.1a | pytorch | 2.3.0 | X |
kmlp-pyspark-pytorch:1.8.0.py311.cuda.1a | pytorch | 2.3.0 | O |
Prework
1. Prepare training dataset
This tutorial uses the publicly available TLC Trip Record Data from New York City and example pipeline manifest files for simple preprocessing and training pipeline practice.
Item | Description |
---|---|
Goal | Implement a taxi fare prediction model |
Data information | NYC Taxi and Limousine Commission data, 2009 to 2015; contains pickup/dropoff times and locations, trip distance, fare amount, payment type, passenger count, etc. |
Original dataset information
2. Prepare the Kubeflow environment
This tutorial uses a GPU node pool environment.
If the Kubeflow service or an appropriate environment is not yet set up, refer to the Create and manage Kubeflow document to create one.
Step-by-step process
The detailed steps for creating an 'Experiment' and a 'Run' and building a training pipeline in the Kubeflow environment are as follows.
Step 1. Create pipeline
Learn how to create a pipeline using the provided manifest file.
1. After accessing the Kubeflow dashboard, click the Pipelines tab and then the [Upload pipeline] button.
Access the Pipelines tab in the Kubeflow dashboard
2. Select the Upload a file option and upload the practice manifest file (zip) prepared in the Prework.
Upload a pipeline
For more detailed instructions on creating a pipeline, refer to the Kubeflow > Kubeflow Pipeline > Quick Start document.
Step 2. Create Experiment
Create an Experiment in the Experiments (KFP) tab or the pipeline detail page within the Pipelines tab of the Kubeflow dashboard.
1. After accessing the Kubeflow dashboard, select the pipeline for which you want to create the experiment in the Pipelines tab.
2. Click the [Create experiment] button on the pipeline detail page.
Pipeline detail page
3. Enter the Experiment name and click [Next].
Step 3. Create and manage Run
After creating an Experiment, proceed to the Run creation phase.
If you want to create a Run later, use one of the following methods:
- Click the [Create run] button in the Runs tab.
- Access the pipeline detail page in the Pipelines tab and click [Create run].
- Access the experiment detail page in the Experiments (KFP) tab and click [Create run].
1. In the Start a run screen, fill in the necessary information and click [Start].
- In this practice, all fields are filled in automatically because the manifest file was uploaded in Step 1.
Start a run
2. Navigate to the Runs tab and select the created Run to open its detail page, where you can check detailed information about the Run.
Run details page
Step 4. Manage Run results
1. In the Runs tab of the Kubeflow dashboard, select the Run you want to archive from the list, and click the [Archive] button.
Archive a run
2. Archived runs can be viewed in the Archived section of the Runs tab. You can restore a run by selecting it and clicking the [Restore] button.
Restore a run
Step 5. Delete Run
To free resources when a Run has completed or is no longer needed, follow the steps below to delete it.
1. In the Runs tab of the Kubeflow dashboard, select the Run you want to delete, and click the [Archive] button.
Archive a run
2. Archived runs can be viewed in the Archived section of the Runs tab. You can delete the Run by selecting it and clicking the [Delete] button.
Delete a run
3. After deleting the Run, verify that its pods have also been deleted.
Confirm run deletion
Step 6. Archive Experiment
1. In the Experiments (KFP) tab of the Kubeflow dashboard, select the experiment you want to archive from the list.
2. On the experiment detail page, click the [Archive] button at the top right.
Archive an experiment
3. Archived experiments can be viewed in the Archived section of the Experiments tab. You can restore an experiment by selecting it and clicking the [Restore] button.
Restore an experiment
Step 7. Delete pipeline
To remove unused pipelines, follow the steps below.
1. In the Pipelines tab of the Kubeflow dashboard, select the pipeline you want to delete from the list.
2. Click the [Delete] button at the top right to delete it.
Delete a pipeline
For more detailed instructions, refer to the Kubeflow Pipelines documentation.