Practical machine learning workflows starting with Kubeflow

May 23, 2025 · 5 min read

Jin (손진광)

Developer

Owen (정지성)

Developer

Machine learning and AI in the cloud are no longer limited to specialist developers and researchers. They are increasingly within reach of practitioners who plan or operate services, and even of beginners encountering AI for the first time.

In line with this trend, KakaoCloud provides the latest version of Kubeflow, and we are now releasing two hands-on tutorial series that let anyone build machine learning pipelines on Kubeflow.

The new tutorials cover LLM (large language model) practice and web service traffic prediction. They go beyond simple code examples and walk through the full practical process, from model training to serving, optimization, and automation.

📘 Build generative AI yourself - LLM workflow tutorial series

The first series is the LLM workflow tutorial. It walks through the entire process in a Kubeflow environment: serving a large language model, fine-tuning it for your purpose, and finally building a document-based question answering system using retrieval-augmented generation (RAG).

In particular, this series uses Meta Llama 3.2 from Hugging Face Hub together with Kanana, a model developed by Kakao. You can directly experience various LLM usage scenarios, from real-time inference to domain-specific training.

The LLM series consists of three parts.

Part 1: Create an LLM model serving endpoint
Deploy a pretrained LLM to a cloud environment using KServe and create an endpoint that supports real-time inference.
Part 2: Fine-tune an LLM model
Efficiently retrain a selected model on domain-specific data using PEFT methods such as LoRA, then save and reuse the trained model.
Part 3: Implement document-based RAG
Complete an LLM use case by embedding user text documents into vectors, storing them in FAISS, and configuring a question answering API using LangChain.
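
As a rough illustration of the retrieval step in Part 3, the sketch below ranks documents against a question and builds a prompt from the best match. It is not the tutorial's code: word-count vectors and a brute-force cosine search stand in for a real embedding model and a FAISS index, LangChain is not used, and the sample documents and the names `embed`, `retrieve`, and `build_prompt` are hypothetical, chosen only for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt that would be sent to an LLM endpoint."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical sample documents for illustration only.
documents = [
    "Katib automates hyperparameter tuning for training jobs.",
    "KServe deploys trained models as inference endpoints.",
    "Notebooks provide a Jupyter environment inside the cluster.",
]
question = "How are models deployed as endpoints?"
top = retrieve(question, documents, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline, the assembled prompt would be sent to a serving endpoint such as the one created in Part 1, and the vector search would be delegated to an index rather than a full scan.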

Because this series lets you set up an LLM yourself on CPU or GPU resources in a cloud environment, we believe it will be a useful starting point for developers and AI planners who want to evaluate whether an idea can become a real product.

📌 Go to the Kubeflow-based LLM workflow series

📈 From logs to insights - Traffic prediction model tutorial series

The second series is a hands-on tutorial for building a traffic prediction model. This series walks through the process of collecting access log data from a web service and creating a time-series machine learning model that predicts future traffic based on that data.

In particular, this tutorial does not stop at analysis. It also covers serving the trained model as an API and automating the entire process with Kubeflow Pipelines. In other words, you can experience an end-to-end pipeline that covers data preprocessing, model development, hyperparameter optimization, deployment, and operations all at once.

The traffic prediction series consists of four parts.

Part 1: Collect and preprocess traffic data
Collect web server log data and refine it into a form suitable for time-series analysis. Create features that reflect periodic patterns such as day of week and time of day, and build a dataset that can be used as input for machine learning models.
Part 2: Tune model hyperparameters
Based on the results of baseline model training, use Kubeflow Katib to perform hyperparameter optimization and improve performance.
Part 3: Create a model serving API
Deploy the trained model as a KServe-based InferenceService and perform predictions through API requests.
Part 4: Configure a model pipeline
Automate the entire process, from data preprocessing and model training to performance validation and serving deployment, with Kubeflow Pipelines.
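
To make the feature engineering in Part 1 more concrete, here is a minimal sketch that buckets request timestamps by hour and derives day-of-week and time-of-day features. It assumes the timestamps have already been parsed out of the access logs into ISO 8601 strings; the sample values, the `is_weekend` feature, and the function names are hypothetical and not taken from the tutorial.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps already extracted from access logs.
timestamps = [
    "2025-05-19T09:05:12",
    "2025-05-19T09:40:03",
    "2025-05-19T10:15:47",
    "2025-05-24T23:59:59",
]


def hourly_counts(stamps):
    """Count requests per one-hour bucket."""
    buckets = Counter()
    for s in stamps:
        t = datetime.fromisoformat(s)
        buckets[t.replace(minute=0, second=0, microsecond=0)] += 1
    return buckets


def to_rows(buckets):
    """Turn hourly buckets into feature rows sorted by time."""
    rows = []
    for hour, count in sorted(buckets.items()):
        rows.append({
            "hour_start": hour.isoformat(),
            "requests": count,                  # prediction target
            "day_of_week": hour.weekday(),      # Monday = 0 ... Sunday = 6
            "hour_of_day": hour.hour,           # 0 ... 23
            "is_weekend": hour.weekday() >= 5,  # Saturday or Sunday
        })
    return rows


rows = to_rows(hourly_counts(timestamps))
for row in rows:
    print(row)
```

Each row pairs a traffic count with the calendar features that capture periodic patterns, which is the shape a time-series model typically takes as input.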

This series is highly recommended for MLOps beginners and data engineers because it lets you practice the complete flow of an operational machine learning service directly in a cloud environment.

📌 Go to the Kubeflow-based traffic prediction model series

🚀 Practical machine learning workflows starting with Kubeflow

Both series are built on KakaoCloud Kubeflow. Kubeflow is a tool that simplifies complex MLOps processes and makes reproducible machine learning experiments easier to manage. In the KakaoCloud console, you can configure machine learning infrastructure such as GPU, storage, and network settings, and the service provides features for deploying and operating various machine learning workloads in a consistent way.

These tutorials are designed as practical learning paths that go beyond simply following steps and teach workflows you can apply to real work. Across topics ranging from generative AI technologies such as LLMs to predictive models and pipeline configuration, you do not merely copy and run complex code. Instead, you set up each step yourself, understand its technical context, and build practical intuition.

You can directly practice and experience two machine learning fields currently receiving attention, generative AI and time-series prediction, in the KakaoCloud environment. Start building practical machine learning pipelines with Kubeflow-based hands-on tutorials.

📝 View all Machine Learning & AI tutorials
👉 Start KakaoCloud now

Building MLOps workflows with Kubeflow

January 31, 2024 · 7 min read

Jin (손진광)

Developer

Hello. In this post, we introduce Kubeflow, a core platform for machine learning operations.

Kubeflow is an open-source project designed to reduce the complexity of machine learning and help data scientists and developers develop and deploy machine learning models more easily and quickly. The official Kubeflow site introduces it as a project that helps comprehensively manage and operate various open-source machine learning tools on Kubernetes.

Kubeflow grew out of TensorFlow Extended (TFX), which Google previously used internally, and has since expanded into one of the most widely known end-to-end solutions for running machine learning workflows in Kubernetes-based environments.

One of Kubeflow's most innovative approaches is the integration of AutoML and Kubeflow Pipelines. This allows users to automate and optimize the training, evaluation, and deployment stages of models, reducing repetitive work in machine learning projects. In addition, multi-tenant support has been strengthened so that multiple teams can effectively share the same Kubeflow instance while isolating resources. The Kubeflow service provided by KakaoCloud is also designed to maximize the efficiency of machine learning work and make it easy for users to access.

In this post, we introduce Kubeflow's major components, latest features, and various tutorial scenarios for using Kubeflow on KakaoCloud.

Kubeflow features

Kubeflow aims to make machine learning models easy to scale and convenient to deploy to production in Kubernetes environments. To that end, it offers the following.

Easy, repeatable, and portable deployment: Pipelines created through Kubeflow make deployment easier across multiple environments, including cloud and on-premises environments.
Independent microservice deployment and management system: Based on a microservices architecture, Kubeflow enables independent management of each component.
Responsive scaling based on user requirements: Resources are automatically scaled according to user requirements to ensure optimal performance.

Key Kubeflow components

Kubeflow consists of multiple open-source components such as Central Dashboard, Jupyter Notebooks, TensorBoard, and Pipelines, each supporting a specific stage of the machine learning workflow. These components are designed to help users manage machine learning projects more efficiently.

Source: Kubeflow Ecosystem

Using these key components on Kubernetes, Kubeflow efficiently supports the entire process from machine learning model development and deployment to resource management.

Key Kubeflow component	Description
Central Dashboard	Provides a dashboard web console for accessing and monitoring multiple components.
Notebooks	Provides a Jupyter Notebook environment where data scientists can code directly within a cluster.
TensorBoard	Creates and manages TensorBoard servers. TensorBoard visualizes training progress and training data for frameworks such as TensorFlow and PyTorch.
Pipelines	Simplifies complex machine learning workflows through scalable Docker-based pipelines.
Katib	Automates hyperparameter tuning for model training as Kubeflow's AutoML component.
Training Operator	Supports various machine learning frameworks and enables flexible training jobs.
KServe	Deploys and serves models efficiently as a model-serving add-on, and provides them as real-time APIs internally and externally.
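
To show what calling a model served as a real-time API might look like, the sketch below builds a prediction request in the style of KServe's V1 inference protocol, which posts a JSON body with an `instances` list to `/v1/models/<name>:predict`. The host, the model name, and the feature vector are hypothetical, authentication is omitted, and whether a given deployment uses the V1 protocol is an assumption; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_predict_request(base_url: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build a V1-protocol predict request without sending it."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint, model name, and input row.
request = build_predict_request(
    "http://traffic-model.example.com/",
    "traffic-model",
    [[0, 9, 0]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending it would look like this (needs a reachable endpoint):
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)["predictions"]
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live service.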

KakaoCloud Kubeflow

KakaoCloud supports the latest features, including Kubeflow 1.6, and provides an optimized cloud environment that enables users to perform machine learning tasks easily and quickly. In particular, KakaoCloud Kubeflow has the following features.

Support for all Kubeflow 1.6 features

KakaoCloud Kubeflow lets you use all major Kubeflow components and add-ons introduced above. You can also install and use frameworks and libraries such as TensorFlow, PyTorch, Apache MXNet, MPI, XGBoost, Chainer, Hugging Face, and OpenAI SDK.

Granular access management

With RBAC, users are assigned namespaces according to their tasks and roles, and permissions can be managed efficiently by user or group. Administrators can also set quotas per namespace to allocate CPU, memory, GPU memory, and storage resources according to the configured usage.

Flexible storage options

In addition to standalone MinIO storage, KakaoCloud supports Object Storage as a storage backend, enabling more flexible serving of model result files.

Optimized for NVIDIA MIG instances

KakaoCloud Kubeflow provides optimized MIG (Multi-Instance GPU) instances based on the NVIDIA A100. MIG settings partition a GPU into separate instances, so users can run multiple workloads efficiently on the same GPU.

Multi File Storage support

Each user or group can dynamically attach as many independent File Storage volumes as needed, making it easier to share files between work pipelines and notebooks.

Usage examples with Kubeflow

KakaoCloud technical documentation provides rich Kubeflow tutorials that cover various stages of machine learning projects, from Jupyter Notebook setup to building parallel training models and creating model-serving APIs. By referring to these tutorials, you can learn about efficient model development, training, optimization, and deployment using KakaoCloud Kubeflow.

The Kubeflow-related tutorials currently available in KakaoCloud technical documentation are as follows.

Configure a Jupyter Notebook environment using Kubeflow
Introduces the process of configuring Jupyter Notebook using the Kubeflow service in a Kubernetes environment.
Implement a predictive model with Kubeflow Notebook
A hands-on example that implements a taxi fare prediction model using TLC Trip Record Data.
Train a predictive model using Kubeflow Pipelines
Introduces how to automate the training process of a machine learning model using Kubeflow Pipelines.
Manage machine learning experiments using Kubeflow Tensorboard
A hands-on example that uses the TensorBoard component to manage and visualize log data generated during machine learning experiments.
Tune hyperparameters with Kubeflow
A scenario that performs hyperparameter tuning for the MNIST dataset using Kubeflow and Katib.
Implement a parallel training model with a Kubeflow MIG instance
A scenario that implements a parallel training model using Kubeflow MIG (Multi-Instance GPU) instances and Training Operator.
Create a Kubeflow model serving API
A scenario that builds a machine learning pipeline using a dataset and provides the generated model as a web API.
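
As a small illustration of the idea behind the hyperparameter tuning scenario, the sketch below runs a random search over two parameters. It is not Katib code: Katib runs each trial as a job in the cluster, while this loop evaluates a toy objective locally. The `validation_loss` function, the search ranges, and the parameter names are hypothetical stand-ins for a real training run.

```python
import random


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Toy objective standing in for a training run; lowest near lr=0.01, batch=64."""
    return (learning_rate - 0.01) ** 2 * 1e4 + ((batch_size - 64) / 64) ** 2


def random_search(trials: int, seed: int = 0):
    """Sample parameters at random and keep the trial with the lowest loss."""
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    results = []
    for _ in range(trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # sampled on a log scale
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        results.append((validation_loss(**params), params))
    best = min(results, key=lambda r: r[0])
    return best, results


(best_loss, best_params), all_results = random_search(trials=20)
print(f"best loss {best_loss:.4f} with {best_params}")
```

Random search is one of several strategies a tuning service can apply; the point is that each trial is independent, so trials can run in parallel when resources allow.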

Closing

Kubeflow is currently one of the most widely used open-source MLOps platforms in Korea and abroad. As a result, educational content, experience cases, and example source code are relatively abundant, helping data scientists and working analysts who are using it for the first time adapt quickly.

KakaoCloud Kubeflow takes advantage of the cloud environment to offer easy provisioning, GPU optimization, and powerful resource management features. We will continue improving the Kubeflow service so that KakaoCloud users can fully benefit from an MLOps platform that combines machine learning efficiency with enhanced security. If you are considering a Kubeflow service for machine learning, be sure to try KakaoCloud's service.

Thank you.
