Tutorial series

Kubeflow LLM workflows

This tutorial series is a hands-on guide to the end-to-end process of preparing, fine-tuning, and serving large language models (LLMs) with Kubeflow on KakaoCloud.
By working with models such as Kakao’s Kanana and Meta’s Llama 3.2, you will learn how to operate LLMs in a production-like environment, from setting up inference endpoints to fine-tuning and building RAG-based applications.

This series is designed for developers already familiar with Kubeflow or users looking to adopt LLMs in MLOps environments.


Before you start

  1. Ensure that a Kubeflow environment (CPU- or GPU-based) is already set up.
    Refer to the Deploy Jupyter Notebooks on Kubeflow guide to prepare your environment.

  2. The following node pool specifications are recommended for LLM serving and fine-tuning tasks:

    | Type | Recommended specs |
    | --- | --- |
    | CPU-based | m2a.2xlarge (8 vCPU, at least 32 GiB RAM); volume of 100 GiB or more |
    | GPU-based | p2i.6xlarge (A100 80GB, 24 vCPU, at least 192 GiB RAM); MIG: at least one 1g.10gb instance; volume of 100 GiB or more |
  3. Download links for the example datasets and models used in each tutorial are provided on the corresponding tutorial page.


Tutorial structure

The LLM workflow series consists of the following stages:

  1. Create LLM model serving endpoint:
    Use KServe to deploy an LLM and expose an inference endpoint compatible with LangChain (a minimal client sketch follows this list).

  2. Fine-tune LLM model:
    Practice fine-tuning a pre-trained model on domain-specific data with PEFT and Unsloth (a LoRA setup sketch follows below).

  3. Build RAG with LLM model:
    Implement a retrieval-augmented generation (RAG) system using LangChain and FAISS to search documents and generate answers to user queries (a retrieval sketch follows below).
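
For orientation, the sketch below shows the kind of client code the serving tutorial builds toward. It assumes the KServe InferenceService is backed by an OpenAI-compatible runtime (such as vLLM); the endpoint URL and model name are placeholders to be replaced with values from your own deployment.

```python
# Minimal sketch: calling a KServe-hosted LLM from LangChain.
# Assumes an OpenAI-compatible serving runtime (e.g. vLLM); the endpoint URL
# and model name are placeholders, not values from this tutorial series.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://<inference-service-host>/v1",  # replace with your KServe endpoint
    api_key="not-needed",  # many self-hosted runtimes ignore the key but require a value
    model="llama-3.2",     # placeholder served-model name
    temperature=0.2,
)

response = llm.invoke("Summarize what Kubeflow is in one sentence.")
print(response.content)
```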
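
The fine-tuning tutorial uses PEFT through Unsloth. As a rough preview, assuming the standard Unsloth LoRA workflow, the setup looks roughly like this; the base model name and hyperparameters are illustrative only, not the tutorial's actual values.

```python
# Rough preview of an Unsloth + PEFT (LoRA) setup; the base model and
# hyperparameters below are illustrative placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,  # quantize to fit limited GPU memory (e.g. a MIG slice)
)

# Wrap the base model with LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Training itself is typically driven by a supervised fine-tuning trainer on a
# domain-specific dataset; that part is covered in the fine-tuning tutorial.
```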
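
The RAG tutorial combines the serving endpoint with a vector store. Under the same assumptions as above (an OpenAI-compatible endpoint, placeholder names and documents), a minimal LangChain + FAISS retrieval flow can be sketched as follows.

```python
# Minimal RAG sketch: index a few documents in FAISS, retrieve the most
# relevant ones for a query, and pass them to the LLM as context.
# The embedding model, endpoint, and documents are placeholders.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Kubeflow is an MLOps platform that runs ML workflows on Kubernetes.",
    "KServe serves machine learning models as scalable inference endpoints.",
]
vector_store = FAISS.from_texts(docs, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

llm = ChatOpenAI(
    base_url="http://<inference-service-host>/v1",  # placeholder KServe endpoint
    api_key="not-needed",
    model="llama-3.2",
)

question = "What does KServe do?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```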