Tutorial series

Kubeflow LLM workflows

This tutorial series is a hands-on guide to the end-to-end process of preparing, fine-tuning, and serving large language models (LLMs) with Kubeflow on KakaoCloud.
By working with models such as Kakao’s Kanana and Meta’s Llama 3.2, you will learn how to operate LLMs in a production-like environment, from setting up inference endpoints to fine-tuning and building RAG-based applications.

This series is designed for developers already familiar with Kubeflow or users looking to adopt LLMs in MLOps environments.


Before you start

  1. Ensure that a Kubeflow environment (CPU- or GPU-based) is already set up.
    Refer to the Deploy Jupyter Notebooks on Kubeflow guide to prepare your environment.

  2. The following node pool specifications are recommended for LLM serving and fine-tuning tasks:

    | Type | Recommended specs |
    | --- | --- |
    | CPU-based | m2a.2xlarge (8 vCPU, at least 32 GiB RAM); volume of 100 GiB or more |
    | GPU-based | p2i.6xlarge (A100 80GB, 24 vCPU, at least 192 GiB RAM); MIG: at least one 1g.10gb instance; volume of 100 GiB or more |
  3. Download links for the example datasets and models used in each tutorial are provided on the corresponding tutorial page.


Tutorial structure

The LLM workflow series consists of the following stages:

  1. Create LLM model serving endpoint:
    Use KServe to deploy an LLM and expose an inference endpoint compatible with LangChain (a minimal client sketch follows this list).

  2. Fine-tune LLM model:
    Practice fine-tuning a pre-trained model on domain-specific data with PEFT and Unsloth (a LoRA setup sketch follows below).

  3. Build RAG with LLM model:
    Implement a retrieval-augmented generation (RAG) system using LangChain and FAISS to search documents and generate answers to user queries (a retrieval sketch follows below).
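
For orientation, the sketch below shows the kind of client code the serving tutorial builds toward. It assumes the KServe InferenceService is backed by an OpenAI-compatible runtime (such as vLLM); the endpoint URL and model name are placeholders to be replaced with values from your own deployment.

```python
# Minimal sketch: calling a KServe-hosted LLM from LangChain.
# Assumes an OpenAI-compatible serving runtime (e.g. vLLM); the endpoint URL
# and model name are placeholders, not values from this tutorial series.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://<inference-service-host>/v1",  # replace with your KServe endpoint
    api_key="not-needed",  # many self-hosted runtimes ignore the key but require a value
    model="llama-3.2",     # placeholder served-model name
    temperature=0.2,
)

response = llm.invoke("Summarize what Kubeflow is in one sentence.")
print(response.content)
```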
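
The fine-tuning tutorial uses PEFT through Unsloth. As a rough preview, assuming the standard Unsloth LoRA workflow, the setup looks roughly like this; the base model name and hyperparameters are illustrative only, not the tutorial's actual values.

```python
# Rough preview of an Unsloth + PEFT (LoRA) setup; the base model and
# hyperparameters below are illustrative placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,  # quantize to fit limited GPU memory (e.g. a MIG slice)
)

# Wrap the base model with LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Training itself is typically driven by a supervised fine-tuning trainer on a
# domain-specific dataset; that part is covered in the fine-tuning tutorial.
```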
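
The RAG tutorial combines the serving endpoint with a vector store. Under the same assumptions as above (an OpenAI-compatible endpoint, placeholder names and documents), a minimal LangChain + FAISS retrieval flow can be sketched as follows.

```python
# Minimal RAG sketch: index a few documents in FAISS, retrieve the most
# relevant ones for a query, and pass them to the LLM as context.
# The embedding model, endpoint, and documents are placeholders.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Kubeflow is an MLOps platform that runs ML workflows on Kubernetes.",
    "KServe serves machine learning models as scalable inference endpoints.",
]
vector_store = FAISS.from_texts(docs, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

llm = ChatOpenAI(
    base_url="http://<inference-service-host>/v1",  # placeholder KServe endpoint
    api_key="not-needed",
    model="llama-3.2",
)

question = "What does KServe do?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```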