Creating Kubeflow model serving API
This guide explains how to build a machine learning pipeline using the KakaoCloud Kubeflow environment and serve the generated model as a web API.
- Estimated time: 20 minutes
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites:
- Reference document: Setting up Jupyter Notebook using Kubeflow
Before starting
This tutorial walks through the complete process of serving a trained model as a web API with KServe in the Kubeflow environment, so that you can build and manage an API for real-time predictions.
About this scenario
In this tutorial, you will learn step-by-step how to build and manage a model serving API using KServe in the KakaoCloud Kubeflow environment. KServe allows you to serve and scale models without managing complex infrastructure. The key topics covered in this scenario are:
- Creating and configuring a KServe model server instance in Kubeflow
- Building a real-time prediction API using a trained model
- Learning how to manage and optimize the model serving process using KServe
Supported tools
Tool | Version | Description |
---|---|---|
KServe | 0.11.2 | A model serving tool that supports fast model deployment and updates with high availability and scalability. It automatically handles common model serving concerns (load balancing, model versioning, failure recovery, etc.). |
For more information about KServe, refer to the Kubeflow > KServe official documentation.
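For orientation, a KServe model server is described by an InferenceService resource. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The name `torchserve` and the namespace `kbm-admin` are borrowed from the sample values later in this guide; the `storageUri` value is a placeholder, and the manifest that the exercise notebook actually generates may differ.

```python
import json


def build_inference_service(name: str, namespace: str, storage_uri: str) -> dict:
    """Return a minimal KServe v1beta1 InferenceService manifest for a PyTorch model."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "pytorch"},
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = build_inference_service(
    name="torchserve",
    namespace="kbm-admin",
    storage_uri="pvc://<your-pvc>/<model-path>",  # placeholder, not a real location
)
print(json.dumps(manifest, indent=2))
```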
Prework
This section describes how to prepare the environment and resources needed to use the model serving API.
1. Verify Kubeflow domain connection
To proceed with this exercise, a domain must be set in the Domain Connection (optional) section during the Create Kubeflow process. Also, to avoid issues during this exercise, ensure that no namespace quota is set and that domain connection is enabled.
For more details, refer to the Create Kubeflow and Kubeflow Quota Settings documents.
2. Prepare training data
You can build a review rating prediction model using restaurant review data from the 2015 Yelp restaurant rating prediction competition.
- Download the dataset:
Item | Description |
---|---|
Goal | Build a restaurant review rating prediction model using text |
Data details | Yelp platform user restaurant review text, review ratings |
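As a rough illustration of the data shape described above, the sketch below parses a few review rows and converts 1 to 5 star ratings into zero-based class labels, a common convention for classification training. The column names (`stars`, `text`) and the sample rows are invented for illustration; check the downloaded dataset for its actual layout.

```python
import csv
import io

# Made-up sample rows; the real dataset's columns may be named differently.
SAMPLE_CSV = """stars,text
5,"Great food and friendly staff."
2,"Long wait, and the soup was cold."
"""


def load_reviews(csv_text: str) -> list[tuple[str, int]]:
    """Return (review_text, label) pairs, mapping 1-5 stars to labels 0-4."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["text"], int(row["stars"]) - 1) for row in rows]


reviews = load_reviews(SAMPLE_CSV)
for text, label in reviews:
    print(label, text)
```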
3. Prepare GPU-based notebook
This tutorial uses a notebook in a GPU node pool environment.
If the Kubeflow service or appropriate environment is not ready, refer to the Create a Jupyter Notebook document to create a GPU-based notebook.
Step-by-step process
Step 1. Create pipeline and model server in the notebook
- If an error occurs during the Serve a model with KServe step, there may be insufficient resources in the node pool.
- In such cases, increase the number of nodes in the Worker node pool and rerun the process.
- Download the following exercise example.
  - Example download: yelp_review_pytorch_deploy_model_build_pipeline_gpu.ipynb
- After downloading, access the created notebook instance and upload the file through the browser.
Upload file to Jupyter Notebook console
- Once the upload is complete, review the content on the right side and enter the necessary information in the second cell.
  - Enter the KUBEFLOW domain address
  - Enter the KUBEFLOW account email
  - Enter the KUBEFLOW account password
Example file upload completed
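The second cell asks for the account credentials in plain text. If you prefer not to store a password in the notebook file, one option is to read the values from environment variables. This is a sketch only: the variable names `KUBEFLOW_HOST`, `KUBEFLOW_USERNAME`, and `KUBEFLOW_PASSWORD` are hypothetical and are not defined by the exercise notebook.

```python
import os


def load_kubeflow_settings(env=os.environ) -> dict:
    """Collect connection settings, failing early if any value is missing."""
    names = {
        "host": "KUBEFLOW_HOST",          # domain address, without the scheme
        "username": "KUBEFLOW_USERNAME",  # Kubeflow account email
        "password": "KUBEFLOW_PASSWORD",  # Kubeflow account password
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Demonstration with an explicit mapping (this guide's sample values)
# instead of the real process environment.
settings = load_kubeflow_settings({
    "KUBEFLOW_HOST": "testkbm.dev.kakaoi.io",
    "KUBEFLOW_USERNAME": "kbm@kakaoenterprise.com",
    "KUBEFLOW_PASSWORD": "kbm@password",
})
print(settings["host"])
```

Calling `load_kubeflow_settings()` with no argument reads from the actual environment and raises `ValueError` if any of the three variables is unset.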
- Run the notebook up to the [Model Serving API Test] step to generate the training model, serving components, model components, and pipeline.
  - Once completed, you will see the yelp_review_nlp_model_Pipeline run.
Check pipeline run
- Once the model is created, you can access the Models tab to use the serving API.
Model server confirmation
Step 2. Using model serving API
Testing within a Kubeflow notebook or internal environment
You can use the serving API within the internal network of Kubernetes or from a notebook using the Cluster IP.
- Run the [Model Serving API Test] section in the notebook to test the serving API.
Testing model serving API
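A request from inside the cluster does not go through the external domain. The sketch below only builds the URL and JSON body for a KServe V1 `:predict` call, so it can run anywhere without a cluster. It assumes the standard Kubernetes service DNS form `<service>.<namespace>.svc.cluster.local`; the notebook's test cell may address the server differently, for example by Cluster IP. The names `torchserve`, `torch-model`, and `kbm-admin` are the sample values used in this guide.

```python
import json


def build_internal_predict_request(service: str, namespace: str,
                                   model_name: str, text: str) -> tuple[str, dict]:
    """Return (url, payload) for a V1-protocol predict call inside the cluster."""
    url = f"http://{service}.{namespace}.svc.cluster.local/v1/models/{model_name}:predict"
    payload = {"instances": [{"data": text}]}
    return url, payload


url, payload = build_internal_predict_request(
    service="torchserve",
    namespace="kbm-admin",
    model_name="torch-model",
    text="Hello World!",
)
print(url)
print(json.dumps(payload))
# To send it from a notebook: requests.post(url, json=payload)
```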
Testing the inference API externally from Kubeflow
To use the serving API externally from Kubeflow, you must enter a domain address in the Domain Connection section when creating Kubeflow. If you entered a domain during Kubeflow creation, you can test it with the Python script below.
- Enter the appropriate host, kbm_namespace, username, password, and run the script.
Example script

import requests

host = "${HOST}"
kbm_namespace = "${NAMESPACE}"
username = "${USER_EMAIL}"
password = "${USER_PASSWORD}"

input_text_data = "Hello World!"  # Test string
model_name = "torch-model"
model_serv_name = "torchserve"

# Log in to obtain the authservice session cookie.
session = requests.Session()
_kargs = {
    "verify": False  # Skips TLS certificate verification
}
response = session.get(
    "https://" + host, **_kargs
)
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
}
session.post(response.url, headers=headers, data={"login": username, "password": password})
session_cookie = session.cookies.get_dict()["authservice_session"]
print(session_cookie)

# Build the prediction URL and the Host header that routes to the model server.
url = f"http://{host}/v1/models/{model_name}:predict"
host = f"{model_serv_name}.{kbm_namespace}.{host}"
print(url)
print(host)

# Send the prediction request with the session cookie.
cookies = {"authservice_session": session_cookie}
data = {"instances": [{"data": input_text_data}]}
headers = {
    "Host": host,
}
x = requests.post(
    url=url,
    cookies=cookies,
    headers=headers,
    json=data
)
print(f"Input: {data}")
print(f"Output: {x.text}")

Environment variable | Description |
---|---|
HOST | Domain address excluding 'http://' (e.g., testkbm.dev.kakaoi.io) |
NAMESPACE | Kubeflow namespace (e.g., kbm-admin) |
USER_EMAIL | Kubeflow account email (e.g., kbm@kakaoenterprise.com) |
USER_PASSWORD | Kubeflow account password (e.g., kbm@password) |
- The execution result is shown below.
Running the serving API externally
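The example script prints the raw response text. When calling the API from an application, you will usually want to check the result and extract the predictions. The helper below is a sketch that assumes the server replies in the KServe V1 format, a JSON object with a `predictions` list; the sample body is invented and does not reflect this model's actual output.

```python
import json


def extract_predictions(status_code: int, body: str) -> list:
    """Return the predictions list from a V1 predict response, or raise on failure."""
    if status_code != 200:
        raise RuntimeError(f"Prediction request failed with HTTP {status_code}: {body[:200]}")
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        # A non-JSON body (such as an HTML login page) may mean the session cookie was not accepted.
        raise RuntimeError("Response was not JSON") from exc
    if not isinstance(parsed, dict) or "predictions" not in parsed:
        raise RuntimeError(f"No 'predictions' field in response: {parsed}")
    return parsed["predictions"]


# Invented sample body for illustration only.
sample_body = '{"predictions": [4]}'
print(extract_predictions(200, sample_body))
```

With the example script, the helper could be called as `extract_predictions(x.status_code, x.text)`.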
Step 3. Deleting model server
For more information about the KServe tool, please refer to the Kubeflow > KServe official documentation.
To remove a model server that is no longer in use, open the Models tab and click the [Delete Server] button in that server's row.
Deleting the model server
Step 4. Deleting Run
We recommend deleting completed or unused runs to manage resources effectively.
- Access the Kubeflow dashboard, click on the Runs tab, select the run you wish to delete, and click the [Archive] button.
Archiving run
- You can view archived runs in the Archived section of the Runs tab. Select the run and click the [Delete] button to delete it.
Deleting run
- After the run is deleted, its associated pods are deleted as well.
Confirming run deletion