Kubeflow model serving API setup

This tutorial explains how to build a machine learning pipeline using a sample dataset in the Kubeflow environment on KakaoCloud, and serve the trained model as a web API.

Basic information
  • Estimated time: 20 minutes
  • Recommended OS: macOS, Ubuntu
  • Note:
    • File downloads may not work properly in private network environments.

About this scenario

This tutorial walks you through how to use KServe in the KakaoCloud Kubeflow environment to serve a trained model as a real-time prediction API. You will learn how to efficiently deploy and manage your model with scalability and minimal infrastructure setup.

You’ll implement the model serving workflow with KServe and learn how to operate and manage a real-time prediction system.

Key topics include:

  • Creating and configuring a KServe model server instance in Kubeflow
  • Deploying a trained model as a real-time prediction API
  • Managing and optimizing the serving process using KServe
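
For orientation, the sketch below shows what creating a KServe model server looks like in code, using the kserve Python SDK (the same KServe 0.11 release this tutorial uses). The service name, namespace, and storage URI are hypothetical placeholders; the tutorial notebook builds its own equivalent as a pipeline component.

    # A minimal sketch, assuming the kserve SDK (pip install kserve==0.11.2).
    # The name, namespace, and storage_uri are hypothetical placeholders.
    from kubernetes import client as k8s_client
    from kserve import (
        KServeClient,
        V1beta1InferenceService,
        V1beta1InferenceServiceSpec,
        V1beta1PredictorSpec,
        V1beta1TorchServeSpec,
        constants,
    )

    isvc = V1beta1InferenceService(
        api_version=constants.KSERVE_V1BETA1,
        kind=constants.KSERVE_KIND,
        metadata=k8s_client.V1ObjectMeta(name="torchserve", namespace="kbm-admin"),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                pytorch=V1beta1TorchServeSpec(
                    storage_uri="pvc://workspace-pvc/torch-model/"  # hypothetical model location
                )
            )
        ),
    )

    KServeClient().create(isvc)  # submits the InferenceService to the cluster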

Supported tools

| Tool | Version | Description |
| --- | --- | --- |
| KServe | 0.11.2 | Model serving tool that supports fast deployment, updates, high availability, and scalability. Automatically handles common ML serving tasks such as load balancing, versioning, and failure recovery. |
info

For more information about KServe, refer to the Kubeflow > KServe official docs.

Before you start

This section explains the required setup and resources to use the model serving API.

1. Verify Kubeflow domain connection

To proceed with this tutorial, a domain must have been configured in the Domain Connection (Optional) step when the Kubeflow instance was created. Also, a namespace quota may interfere with this tutorial, so make sure to proceed without quota restrictions.

info

See the Create Kubeflow and Configure Kubeflow Quotas docs for more details.

2. Prepare training data

You’ll implement a model that predicts restaurant review ratings using Yelp restaurant review data from the 2015 Yelp challenge.

| Item | Description |
| --- | --- |
| Goal | Build a model to predict review ratings based on text |
| Data | Restaurant review texts and ratings from Yelp users |

Original Yelp challenge dataset
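
As a rough illustration of the data shape, each sample pairs a free-text review with a 1–5 star rating. The file and column names below are hypothetical; the tutorial notebook handles the actual data loading.

    import pandas as pd

    # Hypothetical file and column names, for illustration only.
    reviews = pd.read_csv("yelp_reviews.csv")  # columns: "text", "stars"
    print(reviews.head())

    # The modeling task: predict the star rating (1-5) from the review text.
    texts = reviews["text"].tolist()
    labels = reviews["stars"].tolist()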

3. Prepare GPU-based Notebook

This tutorial uses a GPU node pool for the notebook environment.

If you haven't prepared the Kubeflow environment, refer to the Create Jupyter Notebook guide to create a notebook with a GPU image.
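
Once the notebook is up, you can quickly confirm that the kernel actually sees a GPU. This assumes PyTorch is available in the GPU notebook image:

    import torch

    # Verifies that the notebook pod has a CUDA device attached.
    print(torch.cuda.is_available())           # True if a GPU is visible
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))   # name of the GPU on the node pool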

Getting started

Step 1. Create pipeline and model server in Notebook

info

If you encounter errors at the Serve a model with KServe step, it may be due to insufficient node pool resources. In that case, increase the number of worker nodes and re-run the pipeline.

  1. Download the example notebook:

  2. Access your notebook instance and upload the file via the browser.

    Upload to Jupyter file browser

  3. After uploading, check the content in the right pane and enter the required information in the second cell (the sketch after this list shows how these values are typically used to sign in):

    • Kubeflow domain address
    • Kubeflow account email
    • Kubeflow account password

    Notebook file uploaded

  4. Execute all steps in the notebook up to the [Model Serving API Test] section. This will create the training model, serving component, model component, and pipeline.

    Pipeline run confirmed

  5. After model creation, go to the Models tab to access the serving API.
    (In Kubeflow v1.8, the menu name has changed to Endpoints.)

    Check model server
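
For reference, the domain, e-mail, and password entered in the second cell are typically used to log in through Dex and hand the resulting session cookie to the KFP client, the same login flow as the external test script in Step 2. A sketch with placeholder values (the exact cell contents may differ):

    import requests
    import kfp

    HOST = "testkbm.dev.kakaoi.io"        # Kubeflow domain (placeholder)
    USERNAME = "kbm@kakaoenterprise.com"  # account e-mail (placeholder)
    PASSWORD = "kbm@password"             # account password (placeholder)
    NAMESPACE = "kbm-admin"               # Kubeflow namespace (placeholder)

    # Log in through Dex to obtain the authservice_session cookie.
    session = requests.Session()
    resp = session.get(f"https://{HOST}", verify=False)
    session.post(resp.url,
                 headers={"Content-Type": "application/x-www-form-urlencoded"},
                 data={"login": USERNAME, "password": PASSWORD})
    cookie = session.cookies.get_dict()["authservice_session"]

    # Pass the cookie to the KFP client so runs are submitted as your user.
    client = kfp.Client(host=f"https://{HOST}/pipeline",
                        cookies=f"authservice_session={cookie}",
                        namespace=NAMESPACE)
    print(client.list_experiments(namespace=NAMESPACE))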

Step 2. Use model serving API

Test from within Kubeflow Notebook or internal network

You can use the Cluster IP to test the serving API from within the notebook or the internal Kubernetes network where Kubeflow is installed.

  • In the notebook, run the [Model Serving API Test] section.

    Test serving API
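
If you want to call the API by hand instead of running the notebook cell, a cluster-internal request looks roughly like the sketch below. It assumes the InferenceService is named torchserve and serves torch-model (the names used by the external script in the next section), and that the default KServe cluster-local hostname scheme applies:

    import requests

    # Cluster-local call; works from a pod (e.g., the notebook) inside the cluster.
    namespace = "kbm-admin"  # your Kubeflow namespace (placeholder)
    url = (f"http://torchserve.{namespace}.svc.cluster.local"
           "/v1/models/torch-model:predict")

    payload = {"instances": [{"data": "Hello World!"}]}  # test string
    resp = requests.post(url, json=payload)
    print(resp.text)  # e.g., {"predictions": [...]}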

Test inference API from outside of Kubeflow

To access the serving API externally, you must set a domain when creating Kubeflow. If you've done so, you can use the script below to test:

  1. Fill in the appropriate host, kbm_namespace, username, password, and run the script:

    Sample script
    import requests

    host = "${HOST}"
    kbm_namespace = "${NAMESPACE}"
    username = "${USER_EMAIL}"
    password = "${USER_PASSWORD}"
    input_text_data = "Hello World!"  # test string

    model_name = "torch-model"
    model_serv_name = "torchserve"

    # Log in through Dex to obtain the authservice_session cookie.
    session = requests.Session()
    _kargs = {
        "verify": False  # skip TLS verification (e.g., self-signed certificates)
    }
    response = session.get(
        "https://" + host, **_kargs
    )

    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
    }

    session.post(response.url, headers=headers, data={"login": username, "password": password})
    session_cookie = session.cookies.get_dict()["authservice_session"]
    print(session_cookie)

    # Build the predict URL and the virtual-host header for the model server.
    url = f"http://{host}/v1/models/{model_name}:predict"
    host = f"{model_serv_name}.{kbm_namespace}.{host}"
    print(url)
    print(host)

    cookies = {"authservice_session": session_cookie}
    data = {"instances": [{"data": input_text_data}]}

    headers = {
        "Host": host,  # routes the request to the InferenceService through the gateway
    }

    x = requests.post(
        url=url,
        cookies=cookies,
        headers=headers,
        json=data,
    )

    print(f"Input: {data}")
    print(f"Result: {x.text}")
    | Environment variable | Description |
    | --- | --- |
    | HOST | Kubeflow domain without `http://`, e.g., testkbm.dev.kakaoi.io |
    | NAMESPACE | Kubeflow namespace, e.g., kbm-admin |
    | USER_EMAIL | Kubeflow account e-mail, e.g., kbm@kakaoenterprise.com |
    | USER_PASSWORD | Kubeflow account password, e.g., kbm@password |
  2. Sample output:

    External serving API test

Step 3. Delete the model server

info

For more on KServe, refer to Kubeflow > KServe documentation.

To delete a model server, go to the Models tab and click the [Delete Server] button on the corresponding row.
(In Kubeflow v1.8, the menu is called Endpoints.)

Delete model server
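
If you prefer to clean up from code instead of the UI, the kserve SDK client can delete the InferenceService directly. A sketch, assuming the service name and namespace used earlier in this tutorial:

    from kserve import KServeClient

    # Deletes the InferenceService and its serving pods.
    KServeClient().delete("torchserve", namespace="kbm-admin")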

Step 4. Delete the Run

info

It is recommended to delete completed or unused Runs for resource management.

  1. Go to the Runs tab in the Kubeflow dashboard, select the Run to delete, and click [Archive].

    Archive run

  2. Go to the Archived section of the Runs tab, select the archived Run, and click [Delete].

    Delete run

  3. The corresponding pod will also be deleted.

    Confirm deletion
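
The same cleanup can be scripted. A sketch, assuming the KFP SDK v2 client (which exposes archive_run and delete_run) and the Dex cookie flow shown earlier; the run ID can be copied from the run's detail page in the dashboard:

    import kfp

    # Placeholders: reuse the host and session cookie from the earlier login flow.
    client = kfp.Client(host="https://${HOST}/pipeline",
                        cookies="authservice_session=${SESSION_COOKIE}")

    client.archive_run(run_id="${RUN_ID}")  # mirrors the [Archive] step in the UI
    client.delete_run(run_id="${RUN_ID}")   # removes the run; its pods are cleaned up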