Kubeflow LLM workflows
This tutorial series provides a hands-on guide to the end-to-end process of preparing, training, and using large language models (LLMs) with Kubeflow on KakaoCloud.
By working with models like Kakao’s Kanana and Meta’s Llama 3.2, you’ll learn how to operate LLMs in a production-like environment — from setting up inference endpoints to fine-tuning and implementing RAG-based applications.
This series is designed for developers already familiar with Kubeflow or users looking to adopt LLMs in MLOps environments.
Before you start
- Before getting started, ensure that a Kubeflow environment (CPU- or GPU-based) is already set up. Refer to the Deploy Jupyter Notebooks on Kubeflow guide to prepare your environment.
- The following node pool specifications are recommended for LLM serving and fine-tuning tasks:

  | Type | Recommended specs |
  | --- | --- |
  | CPU-based | m2a.2xlarge (8 vCPU, at least 32 GiB RAM)<br/>Volume: 100 GiB or more |
  | GPU-based | p2i.6xlarge (A100 80 GB, 24 vCPU, at least 192 GiB RAM)<br/>MIG: at least one 1g.10gb instance<br/>Volume: 100 GiB or more |

- Example datasets and models used in the tutorials are provided on each tutorial page with download links.
Tutorial structure
The LLM workflow series consists of the following stages:
- Create LLM model serving endpoint: Use KServe to deploy an LLM model and expose an inference endpoint compatible with LangChain (see the client sketch after this list).
- Fine-tune LLM model: Practice fine-tuning a pre-trained model on domain-specific data with PEFT and Unsloth (see the LoRA sketch after this list).
- Build RAG with LLM model: Implement a retrieval-augmented generation (RAG) system using LangChain and FAISS to search documents and generate answers to user queries (see the RAG sketch after this list).
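As a quick preview of the serving stage, here is a minimal client sketch that queries a deployed endpoint with LangChain. It assumes the KServe InferenceService exposes an OpenAI-compatible API (as vLLM-backed runtimes commonly do); the endpoint URL and model name are hypothetical placeholders, and the actual deployment steps are covered in the serving tutorial.

```python
# Minimal sketch: querying a deployed LLM endpoint from LangChain.
# Assumes the KServe InferenceService exposes an OpenAI-compatible API
# (e.g., a vLLM-backed runtime); the URL and model name are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://llm-service.kubeflow-user.svc.cluster.local/v1",  # hypothetical in-cluster endpoint
    api_key="EMPTY",    # in-cluster endpoints typically do not require a real key
    model="kanana-8b",  # placeholder model name
    temperature=0.2,
)

response = llm.invoke("Summarize what Kubeflow is in one sentence.")
print(response.content)
```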
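For the fine-tuning stage, the sketch below shows the core idea behind parameter-efficient fine-tuning with PEFT: attaching LoRA adapters to a pre-trained model so that only a small set of weights is trained. The model name and hyperparameters are illustrative assumptions, and the tutorial itself additionally uses Unsloth to accelerate training.

```python
# Minimal sketch: attaching LoRA adapters to a pre-trained causal LM with PEFT.
# Model name and hyperparameters are illustrative only; the tutorial also uses
# Unsloth, which wraps a similar workflow for faster training.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,                                 # rank of the LoRA update matrices
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
# From here, train with transformers' Trainer or TRL's SFTTrainer on the domain dataset.
```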
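For the RAG stage, the sketch below outlines the basic retrieve-then-generate loop with LangChain and FAISS: embed documents into an in-memory index, retrieve the passages most similar to the question, and pass them to the served LLM as context. The documents, embedding model, and endpoint settings are placeholders (it also assumes the faiss-cpu and sentence-transformers packages are installed); the tutorial covers loading real documents and wiring in your own endpoint.

```python
# Minimal sketch: retrieval-augmented generation with LangChain and FAISS.
# Documents, embedding model, and endpoint settings are placeholders.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

documents = [
    "Kubeflow is an open-source MLOps platform built on Kubernetes.",
    "KServe provides serverless model inference on Kubernetes.",
]

# Build an in-memory FAISS index over the documents.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(documents, embeddings)

# Retrieve the passages most relevant to the user's question.
question = "What is KServe used for?"
context = "\n".join(doc.page_content for doc in vectorstore.similarity_search(question, k=2))

# Ask the served LLM to answer using only the retrieved context
# (same hypothetical OpenAI-compatible endpoint as in the serving sketch).
llm = ChatOpenAI(base_url="http://llm-service.kubeflow-user.svc.cluster.local/v1",
                 api_key="EMPTY", model="kanana-8b")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```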