2. Fine-tune LLM model
📘 This guide covers the complete workflow, from fine-tuning pretrained models to performing inference.
- Estimated duration: 60 minutes
- Recommended operating system: Ubuntu
About this scenario
This guide describes how to fine-tune Kakao’s Kanana and Meta’s Llama 3.2 pretrained models in a GPU environment using Jupyter Notebook within Kubeflow. Using KakaoCloud technical documentation data, you can build a domain-specific LLM, save the model, and test inference.
The main contents are as follows:
- Creating a GPU-based notebook instance
- Preprocessing and tokenizing training data
- Performing LLM fine-tuning using PEFT or Unsloth
- Saving the tuned model and uploading it to Object Storage
- Confirming model inference results through tests
Supported tools
Tool | Version | Description |
---|---|---|
Jupyter Notebook | 4.2.1 | Web-based development environment supporting integration with various ML frameworks and Kubeflow SDK |
PEFT | latest | Huggingface library for efficient fine-tuning of pretrained LLMs - Also used for LoRA fine-tuning of Kanana models |
Unsloth | 2025.3 | QLoRA-based fine-tuning tool optimized for Llama3 |
Before you start
1. Prepare Kubeflow environment
To fine-tune LLM models on Kubeflow, you need a GPU node environment with the specifications below. First, prepare an environment with a GPU node pool configured by referring to the prerequisites.
2. Create Object Storage bucket
Create an Object Storage bucket to save the fine-tuned models. The bucket you create will be used later as the reference path for model files when serving models.
- For detailed instructions on bucket creation, refer to Object Storage bucket creation and management.
- If you already selected KakaoCloud Object Storage during Kubeflow creation, you can use the automatically created bucket named in the format `kubeflow-{kubeflow id}`.
3. Prepare training dataset
Perform step-by-step fine-tuning exercises in a GPU environment using KakaoCloud technical documentation, specifically Kubeflow service guides and tutorial texts, as training data. Download the sample data below.
- Training data download: sample_train_data.csv
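
Before moving on, you can optionally inspect the downloaded file to see the columns that are used later for prompt construction. This is a minimal check with pandas, assuming the CSV sits next to your notebook; the column names match those shown in Step 4.

import pandas as pd

df = pd.read_csv("sample_train_data.csv")
print(df.shape)                      # number of rows and columns
print(df.columns.tolist())           # expected: ['Unnamed: 0', 'instruction', 'output', 'input']
print(df[["instruction", "output"]].head(2))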
Getting started
This scenario explains how to fine-tune pretrained models based on Kakao’s Kanana and Meta Llama 3 using LoRA or QLoRA techniques. LoRA is implemented using Huggingface’s PEFT library, while QLoRA is implemented using Unsloth, optimized for Llama3. Kanana models are compatible with PEFT and Transformers, enabling LoRA-based fine-tuning.
Both methods have strengths in parameter efficiency and low resource usage, allowing you to select according to your environment and objectives.
For a detailed explanation of each technique, refer to the official Huggingface PEFT documentation and the official Unsloth documentation.
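
The core idea behind both techniques is that only a small low-rank update is trained while the pretrained weights stay frozen. The sketch below illustrates this with toy tensors; the hidden size, rank, and alpha values are illustrative only and are not taken from the models used in this guide.

import torch

d, r = 4096, 8                       # hidden size and LoRA rank (illustrative values)
alpha = 32                           # LoRA scaling factor (illustrative)

W = torch.randn(d, d)                # frozen pretrained weight (not updated)
A = torch.randn(r, d) * 0.01         # trainable low-rank factor A
B = torch.zeros(d, r)                # trainable low-rank factor B (starts at zero, so the update starts at zero)

W_adapted = W + (alpha / r) * (B @ A)    # effective weight applied at inference

full_params = W.numel()
lora_params = A.numel() + B.numel()
print(f"Trainable params: {lora_params:,} vs. full fine-tuning: {full_params:,}")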
Step 1. Create Jupyter Notebook instance
Create a GPU-based notebook instance for fine-tuning exercises in the Kubeflow dashboard.
- In the Kubeflow dashboard, select the Notebooks tab.

- Click the [New Notebook] button in the upper-right corner to create a new instance.

- Enter the following information:
  - Notebook Image: select `kc-kubeflow/jupyter-pytorch-cuda-full:v1.8.0.py311.1a`
  - Minimum Notebook specifications: enter at least 3 vCPUs and 6 GB of memory
  - GPU connection: select Number of GPUs as 1 and GPU Vendor as `NVIDIA MIG - 1g.10gb`

- After entering the configuration, click the [LAUNCH] button to create the instance.
Step 2. Install packages and verify GPU connection
Install the Python packages required for fine-tuning and verify that GPU resources are correctly connected.
- Install the required packages by running the command corresponding to your chosen method below.
  - PEFT
    ! pip install transformers peft datasets
  - Unsloth
    ! pip install transformers unsloth datasets Pillow==9.1.0
- Run the following code to check whether the GPU driver and CUDA environment are properly configured.

Verify PyTorch GPU environment
import os
import torch

os.environ["NVIDIA_VISIBLE_DEVICES"] = "0"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.is_available())

If the GPU is properly connected, the last line prints True, as shown below.

Example output
2.6.0+cu124
12.4
90100
True
Step 3. Load pre-trained model
Download the pre-trained model files from the Hugging Face Model Hub path specified in model_name, and load the model and tokenizer. Depending on the selected method (PEFT or Unsloth), use the following code to load the model into the GPU environment.
- PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model for fine-tuning
model_name = "kakaocorp/kanana-nano-2.1b-base" # or "meta-llama/Llama-3.1-8B"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
).to("cuda")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")

# Tokenizer settings: for Llama-type models, specify the pad_token as shown below
tokenizer.pad_token = tokenizer.eos_token

- Unsloth

from unsloth import FastLanguageModel

model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"

# Set maximum sequence length (RoPE scaling is automatically supported internally)
max_seq_length = 4096

# None for auto detection
dtype = None

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=True  # Use 4-bit quantization to reduce memory usage
)

Example output
==((====))== Unsloth 2025.3.11: Fast Llama3 patching. Transformers: 4.50.3.
\\ /| NVIDIA A100 80GB PCIe MIG 1g.10gb. Num GPUs = 1. Max memory: 9.5 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = True]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
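
Regardless of which tab you followed, you can optionally confirm that the weights actually landed on the GPU and get a rough size estimate before moving on. This is a minimal sanity check, assuming the Unsloth variable name model; if you followed the PEFT tab, substitute base_model.

# Quick sanity check on the loaded model (use `base_model` if you followed the PEFT tab)
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameter count: {n_params / 1e9:.2f}B")
print(f"First parameter device: {next(model.parameters()).device}")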
Step 4. Prepare dataset
After uploading the provided training data in CSV format to the notebook environment, use the Huggingface Datasets library to load the dataset, then preprocess and tokenize it into a format suitable for LLM training.
- Upload the sample_train_data.csv file to your local notebook environment, then load it using the Huggingface Datasets library.

Load CSV file
from datasets import Dataset

dataset = Dataset.from_csv('sample_train_data.csv')
dataset

Example output
Generating train split: 111/0 [00:00<00:00, 7271.09 examples/s]
Dataset({
    features: ['Unnamed: 0', 'instruction', 'output', 'input'],
    num_rows: 111
})
- Convert the instruction-input-output columns of the training data into the Alpaca-style prompt format suitable for LLM training. To do this, define a preprocessing function (formatting_prompts_func) and apply it to the dataset to generate the text inputs required for model training. The prompt template is defined at the top level so that it can be reused for inference in Step 7.

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
dataset = dataset.remove_columns(['Unnamed: 0'])
dataset

Example output
Map: 100% 111/111 [00:00<00:00, 5646.33 examples/s]
Dataset({
    features: ['instruction', 'output', 'input', 'text'],
    num_rows: 111
})
- For LoRA or QLoRA-based fine-tuning, both the Kakao Kanana-Nano-2.1B and Meta Llama 3.2 models require a tokenized input dataset. The following code is an example of a preprocessing function that converts the text data into a training format using a Hugging Face tokenizer.

Tokenization preprocessing function
def tokenize_function(examples):
    tokens = tokenizer(examples["text"], padding=True, return_tensors="pt")
    tokens["labels"] = tokens["input_ids"]
    return tokens

dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
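
To optionally confirm that tokenization produced what the trainer expects, you can decode one sample back into text. This is a minimal check, assuming the dataset and tokenizer variables created above.

sample = dataset[0]
print(len(sample["input_ids"]))                                                 # padded sequence length
print(tokenizer.decode(sample["input_ids"], skip_special_tokens=True)[:300])    # start of the reconstructed prompt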
Step 5. Train model
Fine-tune the pretrained LLM using either the LoRA or QLoRA method. In this step, you will convert the model structure, configure training parameters, and run the actual training process.
1. Convert model structure and configure training
Depending on the method used (PEFT or Unsloth), convert the pretrained model into a LoRA or QLoRA format and set the necessary training parameters. You can check the configuration for each method in the tabs below.
- PEFT model conversion

- Use the get_peft_model function from the PEFT library to create a LoRA-based PeftModel and perform fine-tuning.

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
task_type="CAUSAL_LM",
r=8,
lora_alpha=32,
lora_dropout=0.1,
target_modules=[
"q_proj",
"k_proj",
"v_proj"]
)
model = get_peft_model(base_model, lora_config)

- Use the Trainer and TrainingArguments classes from the Transformers library to configure the necessary hyperparameters for training.
Parameter | Description |
---|---|
output_dir | Path to save the model |
num_train_epochs | Number of training epochs |
per_device_train_batch_size | Batch size per device |
gradient_accumulation_steps | Gradient accumulation steps (memory efficient) |
evaluation_strategy | Evaluation strategy ("no", "epoch", "steps") |
save_strategy | Save strategy ("no", "epoch", "steps") |
save_steps | Save interval (in steps) |
learning_rate | Learning rate |
weight_decay | Weight decay |
logging_steps | Logging interval (in steps) |
fp16 | Use mixed precision (set to False for CPU) |
from transformers import Trainer, TrainingArguments
trainer = Trainer(
model=model,
train_dataset=dataset,
args=TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 5,
max_steps = 60,
learning_rate = 2e-4,
bf16 = True,
logging_steps = 1,
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 1234,
output_dir = "outputs",
report_to = "none"
)
)
- Unsloth model conversion

- Use the get_peft_model function from unsloth.FastLanguageModel to create a PeftModel and fine-tune the model using that instance.

model = FastLanguageModel.get_peft_model(
model,
r=16, # Any positive number such as 8, 16, 32, 64, or 128 is recommended
lora_alpha=32, # Set the LoRA alpha value
lora_dropout=0.05, # Supports dropout
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
], # Specify target modules
bias="none", # Supports bias
# Use True or "unsloth" to reduce VRAM usage by 30% and support 2x larger batch sizes for long context
use_gradient_checkpointing="unsloth",
random_state=1234,
use_rslora=False,
loftq_config=None
)

- Configure key training parameters such as batch size, gradient accumulation steps, learning rate, and logging interval.
Parameter | Description |
---|---|
output_dir | Path to save the model |
num_train_epochs | Number of training epochs |
per_device_train_batch_size | Batch size per device |
gradient_accumulation_steps | Gradient accumulation steps (memory efficient) |
evaluation_strategy | Evaluation strategy ("no", "epoch", "steps") |
save_strategy | Save strategy ("no", "epoch", "steps") |
save_steps | Save interval (in steps) |
learning_rate | Learning rate |
weight_decay | Weight decay |
logging_steps | Logging interval (in steps) |
fp16 | Use mixed precision (set to False for CPU) |
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = False, # Packing can speed up training by up to 5x for short sequences.
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 5,
max_steps = 60,
learning_rate = 2e-4,
bf16 = True,
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 1234,
output_dir = "outputs",
report_to = "none", # This can be used when utilizing logging tools such as WandB.
)
)
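
Both tabs above wrap the base model in a PeftModel, so before training you can optionally confirm how small the trainable portion actually is. This is a minimal check using PEFT's print_trainable_parameters, assuming the model variable created above.

# Show trainable vs. total parameters of the LoRA/QLoRA model
model.print_trainable_parameters()

# Equivalent manual count
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,} ({100 * trainable / total:.2f}%)")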
2. Check GPU memory before training
Before starting model training, check the current GPU memory status to assess resource usage.
# Code to display current memory status
gpu_stats = torch.cuda.get_device_properties(0) # Retrieve GPU properties
start_gpu_memory = round(
torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3
) # Calculate reserved GPU memory at initialization
max_memory = round(
gpu_stats.total_memory / 1024 / 1024 / 1024, 3
) # Calculate maximum GPU memory
print(
f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB."
) # Display GPU name and maximum memory
print(f"{start_gpu_memory} GB of memory reserved.") # Output reserved GPU memory
Example output
GPU = NVIDIA A100 80GB PCIe MIG 1g.10gb. Max memory = 9.5 GB.
9.252 GB of memory reserved.
3. Run model training
Execute model training using the configured Trainer
instance.
# Train model (keep the returned stats for the summary in the next step)
trainer_stats = trainer.train()

Example output
Step  Training Loss
1 0.441900
2 0.440900
3 0.372000
4 0.435600
5 0.281700
6 0.328600
7 0.300900
8 0.263100
9 0.319700
10 0.238000
11 0.296800
12 0.305000
13 0.211000
14 0.272500
15 0.150800
16 0.143300
17 0.179500
18 0.166800
19 0.138900
20 0.169000
21 0.174700
22 0.219700
23 0.185800
24 0.174700
25 0.153700
26 0.145200
27 0.160600
28 0.183200
29 0.105200
30 0.096700
31 0.105100
32 0.087900
33 0.073600
34 0.086600
35 0.079100
36 0.078400
37 0.091000
38 0.082200
39 0.086900
40 0.099400
41 0.121900
42 0.078300
43 0.061300
44 0.055300
45 0.053300
46 0.059000
47 0.050900
48 0.062600
49 0.057600
50 0.058700
51 0.053400
52 0.065500
53 0.061800
54 0.057600
55 0.077700
56 0.075800
57 0.054700
58 0.047000
59 0.043600
60 0.054600
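
If you prefer to inspect or plot the loss curve programmatically instead of reading the table above, the Trainer keeps per-step logs in trainer.state.log_history. This is a minimal sketch, assuming the trainer object configured earlier.

# Collect the per-step training losses recorded by the Trainer
losses = [entry["loss"] for entry in trainer.state.log_history if "loss" in entry]
print(f"{len(losses)} logged steps, final loss = {losses[-1]:.4f}")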
4. Display status after training
After model training is completed, display GPU memory usage and training duration.
# Final memory and time statistics
used_memory = round(
torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3
) # Calculate peak reserved memory in GB
used_memory_for_lora = round(
used_memory - start_gpu_memory, 3
) # Calculate memory used for LoRA in GB
used_percentage = round(
used_memory / max_memory * 100, 3
) # Percentage of peak memory used
lora_percentage = round(
used_memory_for_lora / max_memory * 100, 3
) # Percentage used for LoRA
print(
f"{trainer_stats.metrics['train_runtime']} seconds used for training."
) # Training time in seconds
print(
f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
) # Training time in minutes
print(
f"Peak reserved memory = {used_memory} GB."
) # Peak reserved memory in GB
print(
f"Peak reserved memory for training = {used_memory_for_lora} GB."
) # Reserved memory for training in GB
print(
f"Peak reserved memory % of max memory = {used_percentage} %."
) # Memory usage percentage
print(
f"Peak reserved memory for training % of max memory = {lora_percentage} %."
) # LoRA usage percentage
Example output
379.7191 seconds used for training.
6.33 minutes used for training.
Peak reserved memory = 9.252 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 97.389 %.
Peak reserved memory for training % of max memory = 0.0 %.
Step 6. Save model
Save the fine-tuned model to Object Storage. The saved model will be used later for deployment or inference.
- PEFT
The fine-tuned PEFT model can be saved either as LoRA adapter weights only or as a fully merged model.
1. Save LoRA adapter weights only
model_dir = "kanana-2-1b-kcdocs-adapt" # or "llama-3-8b-it-kcdocs-adapt"
model.save_pretrained(model_dir)
2. Merge with base model and save
If the model will be used for Huggingface-based serving, save the merged model and tokenizer using merge_and_unload().
model_dir = "kanana-2-1b-kcdocs" # or "llama-3-8b-it-kcdocs"
# Merge with base model and save
model = model.merge_and_unload()
model.save_pretrained(model_dir)
# Save tokenizer for Huggingface serving
tokenizer.save_pretrained(model_dir)
3. Load adapter weights into base model
Alternatively, load the base model first and then apply the LoRA adapter weights.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
# 1. Load base model
model_id = "kakaocorp/kanana-nano-2.1b-instruct" # or "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
trust_remote_code=True
).to("cuda")
adapter_dir = "kanana-2-1b-kcdocs-adapt" # or "llama-3-8b-it-kcdocs-adapt"
# 2. Apply adapter weights to base model
model = PeftModel.from_pretrained(base_model, adapter_dir)
4. Load merged model directly
Load the saved merged model directly for inference.
model_dir = "kanana-2-1b-kcdocs" # or "llama-3-8b-it-kcdocs"
base_model = AutoModelForCausalLM.from_pretrained(
model_dir,
torch_dtype=torch.bfloat16,
trust_remote_code=True
).to("cuda")
- Unsloth

The fine-tuned Unsloth model can be saved not only as a regular model but also as a merged model for vLLM or in GGUF format.
1. Save model and tokenizer together
model_dir = "kanana-2-1b-kcdocs" # or "llama-3-8b-it-kcdocs"
model.save_pretrained(model_dir)
2. Save merged model for vLLM (16-bit)
To optimize for inference, save a 16-bit merged model for use with vLLM.
# Save 16-bit merged model for vLLM
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
3. Save in GGUF format
Save the model in GGUF format for lightweight inference environments (e.g., runtimes supporting GGUF).
# Save as GGUF file
model.save_pretrained_gguf("Llama-3.2-3B-Instruct-kcdocs-gguf", tokenizer, quantization_method = "f16")
4. Load fine-tuned model
Load the saved model using FastLanguageModel in an optimized format such as 4-bit.
from unsloth import FastLanguageModel
model_dir = "kanana-2-1b-kcdocs" # or "llama-3-8b-it-kcdocs"
max_seq_length = 4096
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=model_dir,
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
# Ready for inference
#text_streamer = TextStreamer(tokenizer)
#_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
5. Upload files to Object Storage
Upload all files in the fine-tuned model directory (model_dir) to your KakaoCloud Object Storage bucket.
- Refer to S3 > Authentication > Using temporary credentials for S3 requests to issue and use EC2 credentials.
- See Using Object Storage with the S3 API > Type 2. Python SDK (Boto3) Example for details.
import boto3
# Enter issued EC2 credentials
AWS_ACCESS_KEY_ID = '<YOUR EC2 CREDENTIAL ACCESS KEY>'
AWS_SECRET_ACCESS_KEY = '<YOUR EC2 CREDENTIAL SECRET KEY>'
ENDPOINT = "https://objectstorage.kr-central-2.kakaocloud.com"
# boto client
client = boto3.client(
region_name="kr-central-2",
endpoint_url=ENDPOINT,
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
service_name="s3"
)
# Create bucket (if needed)
bucket_name = "kbm-llm-tutorial"
client.create_bucket(Bucket=bucket_name)
# Upload files to bucket
from glob import glob
file_paths = glob(f"{model_dir}/*")
object_names = [f.split("/")[-1] for f in file_paths]
for file_name, object_name in zip(file_paths, object_names):
    print(file_name)
    client.upload_file(
        Filename=file_name,
        Bucket=bucket_name,
        Key=object_name
    )
response = client.list_buckets()
Example output
./kbm-llm-tutorial/tokenizer_config.json
./kbm-llm-tutorial/tokenizer.json
./kbm-llm-tutorial/config.json
./kbm-llm-tutorial/model.safetensors
./kbm-llm-tutorial/generation_config.json
./kbm-llm-tutorial/special_tokens_map.json
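
To optionally confirm that the upload succeeded, you can list the objects now stored in the bucket. This is a minimal check with Boto3, assuming the client and bucket_name defined above.

# List the objects uploaded to the bucket
response = client.list_objects_v2(Bucket=bucket_name)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])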
Step 7. Run inference on trained model
Load the fine-tuned model and perform inference to generate responses for given input prompts. The following example demonstrates how to validate the model's output.
Example 1: "Recommend an AI platform."
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
inputs = tokenizer(
[
alpaca_prompt.format(
"Recommend an AI platform.", # instruction
"", # input (leave blank if not applicable)
"", # output (leave blank for generation)
)
], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=2048, use_cache=False)
<bos>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Recommend an AI platform.
### Input:
### Response:
KakaoCloud's AI service provides the open-source platform Kubeflow, which helps users easily build and run machine learning workflows.<eos>
Example 2: "Tell me how to create Kubeflow on KakaoCloud."
inputs = tokenizer(
[
alpaca_prompt.format(
"Tell me how to create Kubeflow on KakaoCloud.", # instruction
"", # input
"", # output
)
], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=2048, use_cache=False)
<bos>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Tell me how to create Kubeflow on KakaoCloud.
### Input:
### Response:
To create Kubeflow on KakaoCloud, go to the KakaoCloud Console, navigate to **Container Pack > Kubeflow**, and click **\[Create Kubeflow]** to configure the basic settings and quotas. After the Kubeflow instance is created, set up the `kubectl` configuration file and access the Kubeflow dashboard to create your user namespace.<eos>
Example 3: "Tell me the key features of KakaoCloud Kubeflow."
inputs = tokenizer(
[
alpaca_prompt.format(
"Tell me the key features of KakaoCloud Kubeflow." # instruction
"", # input
"", # output
)
], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=2048, use_cache=False)
<bos>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Tell me the key features of KakaoCloud Kubeflow.
### Input:
### Response:
KakaoCloud Kubeflow provides key features for data preprocessing, model training, and model serving.
It supports fast and easy ML model development by offering a consistent interface and a high level of abstraction for developers. <eos>
Example 4: "Explain Katib in Kubeflow."
inputs = tokenizer(
[
alpaca_prompt.format(
"Explain Katib in Kubeflow.", # instruction
"", # input
"", # output
)
], return_tensors = "pt").to("cuda")
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=2048, use_cache=False)
<bos>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Explain Katib in Kubeflow.
### Input:
### Response:
Katib in Kubeflow is a tool for A/B testing and hyperparameter optimization provided within the Kubeflow platform.<eos>
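
The streamer prints the full prompt followed by the generated answer. If you only want the response text, for example to compare outputs programmatically, the following is a minimal post-processing sketch that reuses the inputs, model, and tokenizer variables from the examples above.

# Generate without streaming and keep only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=256, use_cache=False)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
print(response)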