Key Concepts
Kubeflow on KakaoCloud is an open-source platform that makes it easy to build and run machine learning (ML) workflows in a cloud-native environment. Because it is built on Kubernetes, it uses Kubernetes' cluster management capabilities to simplify the management of ML workflows.
It provides key components for ML workflows such as data preprocessing, model training, and model serving, allowing developers to quickly and easily build ML models using a consistent interface and high-level abstractions.
Resource structure
Item | Description |
---|---|
⓵ Cloud | Runs in the KakaoCloud environment. Leverages cloud resources to make machine learning tasks easier to run. |
⓶ Kubernetes engine and cluster | Kubeflow is based on the Kubernetes engine and cluster. Facilitates development, deployment, and management of ML models. |
⓷ Kubeflow application | Provides essential features for developing, tuning, deploying, and managing ML models |
⓸ Kubeflow scaffolding | Supports deployment and management of ML models |
⓹ Machine learning tools | Kubeflow supports various ML tools. Users can choose their preferred tools for ML tasks. |
ML workflow
Phase | Step | Description |
---|---|---|
Experiment phase | Data collection and preprocessing | Collect and preprocess data for ML model training |
Experiment phase | Data transformation | Convert data into model-readable formats, reduce size, extract features for ML training |
Experiment phase | Write model code | Write code to develop a model based on the selected ML algorithm |
Experiment phase | Model training | Train the model using training data, modify hyperparameters to generate and compare model versions |
Experiment phase | Hyperparameter tuning | Tune model hyperparameters to find the optimal configuration |
Production phase | Train model and evaluate performance | Train the selected model and evaluate its performance |
Production phase | Deploy model | Deploy the trained model to provide prediction services |
Production phase | Monitor and manage model | Monitor model performance and stability, respond to issues when needed |
Kubeflow lifecycle and status
Kubeflow lifecycle
Status | Description | Color |
---|---|---|
Creating | Kubeflow resource is being created | Yellow |
Active | Kubeflow resource is active | Green |
Failed | Resource creation failed or unexpected error occurred | Red |
Expired | Resource expired or associated cluster deleted | Red |
Terminating | Resource is being terminated | Yellow |
Terminated | Resource has been terminated (deleted) | Gray |
For the status of connected clusters, node pools, and nodes, refer to Kubernetes engine > Resource status information.
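The lifecycle table can be mirrored in client-side code, for example when a script needs to decide whether to keep waiting or to raise an alert. The following is a minimal sketch only: the `KubeflowStatus` enum and the helper names are hypothetical and do not come from any KakaoCloud SDK. The status names and colors are copied from the table above.

```python
from enum import Enum


class KubeflowStatus(Enum):
    """Lifecycle statuses as listed in the table (hypothetical enum)."""
    CREATING = "Creating"
    ACTIVE = "Active"
    FAILED = "Failed"
    EXPIRED = "Expired"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"


# Indicator color shown for each status, copied from the table.
STATUS_COLOR = {
    KubeflowStatus.CREATING: "Yellow",
    KubeflowStatus.ACTIVE: "Green",
    KubeflowStatus.FAILED: "Red",
    KubeflowStatus.EXPIRED: "Red",
    KubeflowStatus.TERMINATING: "Yellow",
    KubeflowStatus.TERMINATED: "Gray",
}


def is_transitional(status: KubeflowStatus) -> bool:
    """Yellow statuses indicate an operation that is still in progress."""
    return STATUS_COLOR[status] == "Yellow"


def needs_attention(status: KubeflowStatus) -> bool:
    """Red statuses indicate a failure or an expired resource."""
    return STATUS_COLOR[status] == "Red"
```

A polling script could, under these assumptions, keep waiting while `is_transitional` returns `True` and stop with an error once `needs_attention` returns `True`.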
Kubeflow user and group status
Status | Description | Color |
---|---|---|
Pending | User or group is being created or updated | Yellow |
Active | Creation/update completed successfully | Green |
Failed | Creation/update failed or unexpected error occurred | Red |
Deleted | Deletion completed successfully | Not displayed |
Kubeflow component architecture
Kubeflow components are modular and essential for managing ML workflows. They enhance flexibility and scalability and automate tasks for ML model development and deployment.
Component type | Description |
---|---|
Dashboard | Web console to access Kubeflow components |
JupyterLab | Web-based ML development tool integrated with Kubeflow SDK |
Kubeflow Pipelines | Visual console to manage ML workflows such as preprocessing, training, and serving |
Katib | Hyperparameter tuning for model training. Supports distributed training and optimal model discovery. |
KServe | Model deployment and inference component. Supports model serving and REST API-based inference. |
Trainer | Model training component supporting distributed learning with frameworks like TensorFlow and PyTorch |
Model registry | Model registration and versioning. Stores metadata and integrates with other components. |
Spark operator | Declarative execution of Apache Spark applications. Automates spark-submit and supports scheduling, retries, and monitoring. |
Supported components by version and service type
Version | Service type | Components and versions |
---|---|---|
1.8.0 | Essential + Hyper param tuning (HPT) + Serving API | JupyterLab 4.2.1, KF Pipelines 2.0.5, Trainer v1-855e096, Katib 0.16.0, Tensorboard 2.5.1, KServe 0.11.2, Model registry 0.2.5-alpha |
1.10.0 | Essential + Hyper param tuning (HPT) + Serving API | JupyterLab 4.3.5, KF Pipelines 2.4.1, Trainer v1-3f15cb, Katib 0.18.0, Tensorboard 2.5.1, KServe 0.15.0, Model registry 0.2.19, Spark operator 2.1.0 (supports Spark 2.3 and above) |
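When pinning client libraries or checking whether a component is available, it can help to keep the version table in a lookup structure. The sketch below is illustrative only: `SUPPORTED_COMPONENTS`, `component_version`, and `added_components` are hypothetical names, and the version strings are copied from the table above.

```python
from typing import List, Optional

# Component versions per Kubeflow version, copied from the table.
SUPPORTED_COMPONENTS = {
    "1.8.0": {
        "JupyterLab": "4.2.1",
        "KF Pipelines": "2.0.5",
        "Trainer": "v1-855e096",
        "Katib": "0.16.0",
        "Tensorboard": "2.5.1",
        "KServe": "0.11.2",
        "Model registry": "0.2.5-alpha",
    },
    "1.10.0": {
        "JupyterLab": "4.3.5",
        "KF Pipelines": "2.4.1",
        "Trainer": "v1-3f15cb",
        "Katib": "0.18.0",
        "Tensorboard": "2.5.1",
        "KServe": "0.15.0",
        "Model registry": "0.2.19",
        "Spark operator": "2.1.0",
    },
}


def component_version(kubeflow_version: str, component: str) -> Optional[str]:
    """Return the bundled component version, or None if it is not listed."""
    return SUPPORTED_COMPONENTS.get(kubeflow_version, {}).get(component)


def added_components(old: str, new: str) -> List[str]:
    """List components that appear in `new` but not in `old`."""
    return sorted(set(SUPPORTED_COMPONENTS[new]) - set(SUPPORTED_COMPONENTS[old]))
```

For example, `component_version("1.8.0", "Spark operator")` returns `None`, because the table lists the Spark operator only for 1.10.0.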
Manage Kubeflow roles
Kubeflow roles grant different levels of access to the console, the dashboard, and namespaces.
Roles include Owner, User, and Group user. A single user may have multiple roles.
Role types at the Kubeflow level
To obtain the Owner role, you must have at least the IAM project member permission.
If IAM project access is revoked after a user has been assigned as Owner, that user may lose access to the console and dashboard.
Role | Description |
---|---|
Kubeflow owner | Highest-level role, automatically assigned to the user who creates Kubeflow. Manages users, namespaces, and groups via the console. Each Kubeflow must have at least one owner (up to 5 owners can be assigned). Requires the project admin or member IAM permission. |
Kubeflow user | Standard user who can manage owned namespaces and participate in groups. Must be registered by an Owner or Org Admin via the console. Has no console access and uses only the dashboard. Can be promoted to Owner if they own a namespace and have IAM permissions. |
Kubeflow group user | User added to a group who uses the dashboard based on group permissions. Registered by an Owner, Org Admin, or group Admin via the dashboard. Can belong to multiple groups. Role types: Admin / Edit / View. |
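The owner rules in the table (at least one owner, at most five, IAM permission required, promotion only for users who own a namespace) can be expressed as simple checks. This is a sketch under stated assumptions, not the console's actual logic: the `Member` class, the `"admin"`/`"member"` role strings, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_OWNERS = 5  # "up to 5 owners can be assigned"
MIN_OWNERS = 1  # "each Kubeflow must have at least one owner"


@dataclass
class Member:
    name: str
    iam_role: Optional[str]  # hypothetical values: "admin", "member", or None
    owns_namespace: bool


def can_promote_to_owner(member: Member, current_owner_count: int) -> bool:
    """A user qualifies if they have IAM permission, own a namespace,
    and the owner limit has not been reached."""
    has_iam = member.iam_role in {"admin", "member"}
    return has_iam and member.owns_namespace and current_owner_count < MAX_OWNERS


def can_remove_owner(current_owner_count: int) -> bool:
    """Removing an owner must leave at least one owner in place."""
    return current_owner_count > MIN_OWNERS
```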
Console permissions by role
Only Owners have console access; no other role can use the console. Console access also requires IAM project permissions.
Refer to IAM for more information.
Console permission | Kubeflow owner |
---|---|
View Kubeflow details | ✓ |
Request Kubeflow deletion | ✓ |
Add/edit/delete owner | ✓ |
Add/edit/delete user | ✓ |
Create/edit/delete group | ✓ |
Add/edit/delete group user | ✓ |
Dashboard permissions by role
User roles are valid within assigned namespaces.
Users may hold multiple roles across namespaces. See the table below for role-based dashboard permissions.
Dashboard permission | Owner | User | Group admin | Group edit | Group view |
---|---|---|---|---|---|
View other namespaces | ✓ | | | | |
View own namespace | ✓ | ✓ | ✓ | ✓ | ✓ |
Manage group users | ✓ | | | | |
View notebooks | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit notebooks | ✓ | ✓ | ✓ | ✓ | |
View tensorboard | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit tensorboard | ✓ | ✓ | ✓ | ✓ | |
View pipelines | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit pipelines | ✓ | ✓ | ✓ | ✓ | |
View AutoML (Katib) | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit AutoML (Katib) | ✓ | ✓ | ✓ | ✓ | |
View model serving (KServe) | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit model serving (KServe) | ✓ | ✓ | ✓ | ✓ | |
View model registry | ✓ | ✓ | ✓ | ✓ | ✓ |
Create/delete/edit model registry | ✓ | ✓ | ✓ | ✓ | ✓ |
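The dashboard permission table can also be read as a lookup from permission to the set of roles that hold it. The sketch below mirrors the table exactly as printed; the role keys, permission keys, and `is_allowed` function are hypothetical names chosen for illustration and are not part of Kubeflow or KakaoCloud.

```python
ROLES = ("owner", "user", "group_admin", "group_edit", "group_view")
RESOURCES = ("notebooks", "tensorboard", "pipelines", "katib", "kserve", "model_registry")

ALL_ROLES = frozenset(ROLES)
# Every role except Group view can create, delete, or edit most resources.
EDIT_ROLES = ALL_ROLES - {"group_view"}

PERMISSIONS = {
    "view_other_namespaces": frozenset({"owner"}),
    "view_own_namespace": ALL_ROLES,
    "manage_group_users": frozenset({"owner"}),
}
for resource in RESOURCES:
    PERMISSIONS[f"view_{resource}"] = ALL_ROLES
    PERMISSIONS[f"modify_{resource}"] = EDIT_ROLES

# As printed, the table marks model registry changes for every role,
# including Group view, so that row is mirrored here without changes.
PERMISSIONS["modify_model_registry"] = ALL_ROLES


def is_allowed(role: str, permission: str) -> bool:
    """Return True if the table grants `permission` to `role`."""
    if role not in ALL_ROLES:
        raise ValueError(f"unknown role: {role}")
    return role in PERMISSIONS[permission]
```

Because roles apply per namespace, a real check would also need the namespace in which the role was granted; this sketch leaves that out.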