Kubeflow model serving API setup
This tutorial explains how to build a machine learning pipeline using a sample dataset in the Kubeflow environment on KakaoCloud, and serve the trained model as a web API.
- Estimated time: 20 minutes
- Recommended OS: macOS, Ubuntu
- Reference docs:
- Note: File downloads may not work properly in private network environments.
About this scenario
This tutorial walks you through how to use KServe in the KakaoCloud Kubeflow environment to serve a trained model as a real-time prediction API. You will learn how to efficiently deploy and manage your model with scalability and minimal infrastructure setup.
You’ll implement the model serving API process using KServe and understand how to manage real-time prediction systems.
Key topics include:
- Creating and configuring a KServe model server instance in Kubeflow
- Deploying a trained model as a real-time prediction API
- Managing and optimizing the serving process using KServe
Supported tools
| Tool | Version | Description |
|---|---|---|
| KServe | 0.11.2 | Model serving tool that supports fast deployment, updates, high availability, and scalability. Automatically handles common ML serving tasks such as load balancing, versioning, and failure recovery. |
For more information about KServe, refer to the Kubeflow > KServe official docs.
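The pipeline notebook in this tutorial creates the model server for you, but for orientation, the sketch below shows what deploying a PyTorch model as a KServe InferenceService looks like with the KServe Python SDK. The namespace and storage URI are placeholder assumptions; the service name `torchserve` matches the one used later in this tutorial.

```python
# Minimal sketch of creating a KServe InferenceService with the KServe Python SDK.
# Namespace and storage_uri are placeholders; the tutorial notebook performs the
# equivalent step as part of the pipeline.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1TorchServeSpec,
    constants,
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="torchserve", namespace="kbm-admin"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            pytorch=V1beta1TorchServeSpec(
                storage_uri="pvc://model-store/torch-model"  # placeholder model location
            )
        )
    ),
)

KServeClient().create(isvc)  # deploys the model server into the cluster
```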
Before you start
This section explains the required setup and resources to use the model serving API.
1. Verify Kubeflow domain connection
To proceed with this tutorial, a domain must be configured in the Domain Connection (Optional) step when creating Kubeflow. Also, if a namespace quota is set, it may interfere with this tutorial, so make sure to proceed without quota restrictions.
See the Create Kubeflow and Configure Kubeflow Quotas docs for more details.
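Before continuing, you can quickly confirm that the configured domain is reachable; an unauthenticated request to the dashboard should redirect to the Dex login page. The host value below is a placeholder.

```python
import requests

host = "testkbm.dev.kakaoi.io"  # placeholder: the domain configured for your Kubeflow

# An unauthenticated request to a reachable Kubeflow dashboard is redirected to Dex login.
resp = requests.get(f"https://{host}", verify=False, allow_redirects=True)
print(resp.status_code)  # 200 after following redirects
print(resp.url)          # typically a /dex/auth/... login URL
```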
2. Prepare training data
You’ll implement a model that predicts restaurant review ratings using Yelp restaurant review data from the 2015 Yelp challenge.
- Download sample datasets:
| Item | Description |
|---|---|
| Goal | Build a model to predict review ratings based on text |
| Data | Restaurant review texts and ratings from Yelp users (original Yelp challenge dataset) |
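Once downloaded, the review data can be inspected with pandas before training. The file and column names below (`yelp_review_train.csv`, `text`, `stars`) are illustrative assumptions; adjust them to match the actual sample files.

```python
import pandas as pd

# Hypothetical file and column names; adjust to match the downloaded sample dataset.
df = pd.read_csv("yelp_review_train.csv")

print(df.shape)
print(df.head())

# The modeling task: predict the rating (e.g., a 'stars' column) from the review text.
texts, labels = df["text"], df["stars"]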
3. Prepare GPU-based Notebook
This tutorial uses a GPU node pool for the notebook environment.
If you haven't prepared the Kubeflow environment, refer to the Create Jupyter Notebook guide to create a notebook with a GPU image.
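After the notebook starts on the GPU node pool, you can verify that the GPU is actually visible from inside the notebook, for example with PyTorch:

```python
import torch

# Confirm the notebook pod can see the GPU assigned from the GPU node pool.
print(torch.cuda.is_available())           # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the attached GPU
```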
Getting started
Step 1. Create pipeline and model server in Notebook
If you encounter errors at the Serve a model with KServe step, it may be due to insufficient node pool resources. In that case, increase the number of worker nodes and re-run the pipeline.
- Download the example notebook:
- Access your notebook instance and upload the file via the browser.
- After uploading, check the content in the right pane and enter the required information in the second cell (see the authentication sketch after this list):
  - Kubeflow domain address
  - Kubeflow account email
  - Kubeflow account password
- Execute all steps in the notebook up to the [Model Serving API Test] section. This will create the training model, serving component, model component, and pipeline.
- After model creation, go to the Models tab to access the serving API.
  (In Kubeflow v1.8, the menu name has changed to Endpoints.)
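The domain, email, and password entered in the second cell are typically used by the notebook to log in through Dex and talk to the Kubeflow Pipelines API. A rough sketch of that authentication flow, assuming kfp 1.x and placeholder values, is shown below; the actual notebook cells may differ.

```python
import requests
import kfp

# Placeholder values; use the ones you entered in the notebook's second cell.
host = "testkbm.dev.kakaoi.io"
username = "kbm@kakaoenterprise.com"
password = "kbm@password"
namespace = "kbm-admin"

# Log in through Dex and keep the resulting session cookie.
session = requests.Session()
resp = session.get(f"https://{host}", verify=False)
session.post(
    resp.url,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={"login": username, "password": password},
)
cookie = session.cookies.get_dict()["authservice_session"]

# Create a Pipelines client that reuses the session cookie (kfp 1.x style).
client = kfp.Client(
    host=f"https://{host}/pipeline",
    cookies=f"authservice_session={cookie}",
    namespace=namespace,
)
print(client.list_experiments())
```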
Step 2. Use model serving API
Test from within Kubeflow Notebook or internal network
You can use the Cluster IP to test the serving API from within the notebook or the internal Kubernetes network where Kubeflow is installed.
- In the notebook, run the [Model Serving API Test] section.
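For reference, a request from inside the cluster does not need the Dex login flow. A sketch of such a call, assuming the default cluster-local address of the InferenceService and the namespace `kbm-admin`, looks like this:

```python
import requests

namespace = "kbm-admin"        # your Kubeflow namespace (assumption)
model_name = "torch-model"
model_serv_name = "torchserve"

# Cluster-local address of the InferenceService; reachable from the notebook pod
# or any other workload running inside the same Kubernetes cluster.
url = (
    f"http://{model_serv_name}.{namespace}.svc.cluster.local"
    f"/v1/models/{model_name}:predict"
)

data = {"instances": [{"data": "Hello World!"}]}
resp = requests.post(url, json=data)
print(resp.text)
```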
Test inference API from outside of Kubeflow
To access the serving API from outside the cluster, a domain must have been set when creating Kubeflow. If you've done so, you can use the script below to test:
- Fill in the appropriate host, kbm_namespace, username, and password values, then run the script:
Sample script:

```python
import requests

host = "${HOST}"                  # Kubeflow domain without 'http://'
kbm_namespace = "${NAMESPACE}"    # Kubeflow namespace
username = "${USER_EMAIL}"        # Kubeflow account email
password = "${USER_PASSWORD}"     # Kubeflow account password

input_text_data = "Hello World!"  # test string
model_name = "torch-model"
model_serv_name = "torchserve"

# Log in to the Kubeflow dashboard through Dex and obtain the session cookie.
session = requests.Session()
_kargs = {
    "verify": False
}
response = session.get(
    "https://" + host, **_kargs
)
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}
session.post(response.url, headers=headers, data={"login": username, "password": password})
session_cookie = session.cookies.get_dict()["authservice_session"]
print(session_cookie)

# Build the prediction URL and the Host header pointing at the InferenceService.
url = f"http://{host}/v1/models/{model_name}:predict"
serving_host = f"{model_serv_name}.{kbm_namespace}.{host}"
print(url)
print(serving_host)

# Send the prediction request, authenticating with the session cookie.
cookies = {"authservice_session": session_cookie}
data = {"instances": [{"data": input_text_data}]}
headers = {
    "Host": serving_host,
}
x = requests.post(
    url=url,
    cookies=cookies,
    headers=headers,
    json=data,
)
print(f"Input: {data}")
print(f"Result: {x.text}")
```

Environment variables:

| Variable | Description |
|---|---|
| HOST | Kubeflow domain without `http://`, e.g., testkbm.dev.kakaoi.io |
| NAMESPACE | Kubeflow namespace, e.g., kbm-admin |
| USER_EMAIL | Kubeflow account email, e.g., kbm@kakaoenterprise.com |
| USER_PASSWORD | Kubeflow account password, e.g., kbm@password |

- Sample output:
Step 3. Delete the model server
For more on KServe, refer to Kubeflow > KServe documentation.
To delete a model server, go to the Models tab and click the [Delete Server] button on the corresponding row.
(In Kubeflow v1.8, the menu is called Endpoints.)
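If you prefer to clean up programmatically rather than through the dashboard, the KServe SDK call below removes the same InferenceService; the namespace is an assumption, and the service name matches the one used earlier in this tutorial.

```python
from kserve import KServeClient

# Delete the InferenceService created by the tutorial pipeline.
KServeClient().delete("torchserve", namespace="kbm-admin")
```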
Step 4. Delete the Run
It is recommended to delete completed or unused Runs for resource management.
- Go to the Runs tab in the Kubeflow dashboard, select the Run to delete, and click [Archive].
- Go to the Archived section of the Runs tab, select the archived Run, and click [Delete].
- The corresponding pods will also be deleted.
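The same cleanup can also be scripted against the Kubeflow Pipelines API. The sketch below assumes an authenticated `kfp.Client` (see the login sketch in Step 1) and the kfp 1.x low-level run service exposed as `client.runs`; method names may differ in other kfp versions.

```python
# Assumes `client` is an authenticated kfp.Client (see the login sketch in Step 1).
# `client.runs` is the low-level run service in kfp 1.x; verify it in your version.
for run in client.list_runs(page_size=100).runs or []:
    if run.status in ("Succeeded", "Failed", "Error"):
        client.runs.archive_run(id=run.id)   # equivalent to [Archive] in the UI
        client.runs.delete_run(id=run.id)    # equivalent to [Delete] in Archived
```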