Creating Kubeflow model serving API

This guide explains how to build a machine learning pipeline using the KakaoCloud Kubeflow environment and serve the generated model as a web API.

Basic information

Before starting

This tutorial explains the process of serving a model in the Kubeflow environment, allowing users to learn how to build and manage an API for real-time predictions. Through this guide, you will understand and implement the complete process of serving a trained model as a web API using KServe in Kubeflow.

About this scenario

In this tutorial, you will learn step-by-step how to build and manage a model serving API using KServe in the KakaoCloud Kubeflow environment. KServe allows you to serve and scale models without managing complex infrastructure. The key topics covered in this scenario are:

  • Creating and configuring a KServe model server instance in Kubeflow
  • Building a real-time prediction API using a trained model
  • Learning how to manage and optimize the model serving process using KServe

Supported tools

Tool   | Version | Description
KServe | 0.11.2  | A model serving tool that supports fast model deployment and updates with high availability and scalability. It automatically handles common model serving concerns (load balancing, model versioning, failure recovery, etc.).
info

For more information about KServe, refer to the Kubeflow > KServe official documentation.
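
For reference, each model server in this tutorial corresponds to a KServe InferenceService resource. Below is a minimal sketch of creating such a resource with the KServe Python SDK; the name, namespace, and storage URI are hypothetical placeholders, and the exact model classes can differ between KServe versions.

    # A minimal sketch, assuming the KServe Python SDK (pip install kserve).
    # The name, namespace, and storage URI are hypothetical placeholders.
    from kubernetes import client
    from kserve import (
        KServeClient,
        V1beta1InferenceService,
        V1beta1InferenceServiceSpec,
        V1beta1PredictorSpec,
        V1beta1TorchServeSpec,
        constants,
    )

    isvc = V1beta1InferenceService(
        api_version=constants.KSERVE_V1BETA1,
        kind=constants.KSERVE_KIND,
        metadata=client.V1ObjectMeta(name="torchserve", namespace="kbm-admin"),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                pytorch=V1beta1TorchServeSpec(
                    # Location of the packaged model archive (placeholder)
                    storage_uri="pvc://workspace-volume/torch-model",
                )
            )
        ),
    )

    KServeClient().create(isvc)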

Prework

Here is a guide to preparing the environment and necessary resources for using the model serving API.

1. Verify Kubeflow domain connection

To proceed with this exercise, a domain must be set in the Domain Connection (optional) section during the Create Kubeflow process. Also, to avoid issues during this exercise, ensure that no namespace quota is set and that domain connection is enabled.

info

For more details, refer to the Create Kubeflow and Kubeflow Quota Settings documents.

2. Prepare training data

You can build a review rating prediction model using restaurant review data from the 2015 Yelp restaurant rating prediction competition.

Item         | Description
Goal         | Build a restaurant review rating prediction model using review text
Data details | Yelp platform user restaurant review text and review ratings

3. Prepare GPU-based notebook

This tutorial uses a notebook in a GPU node pool environment.

If the Kubeflow service or appropriate environment is not ready, refer to the Create a Jupyter Notebook document to create a GPU-based notebook.
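
Before running the exercise, you can quickly confirm that the notebook can see the GPU. A minimal check, assuming PyTorch is installed in the notebook image:

    # Sanity check that the GPU is visible from the notebook
    # (assumes PyTorch is installed in the notebook image).
    import torch

    print(torch.cuda.is_available())          # True when a GPU is attached
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # name of the attached GPU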


Step-by-step process

Step 1. Create pipeline and model server in the notebook

info

  • If an error occurs during the Serve a model with KServe step, there may be insufficient resources in the node pool.
  • In such cases, increase the number of nodes in the Worker node pool and rerun the process.

  1. Download the exercise example file.

  2. After downloading, open the notebook instance you created and upload the file through the browser.

    Image. Upload file to Jupyter Notebook console

  3. Once the upload is complete, review the content on the right side and enter the necessary information in the second cell.

    • Enter the KUBEFLOW domain address
    • Enter the KUBEFLOW account email
    • Enter the KUBEFLOW account password

    Image. Example file upload completed

  4. Run the notebook up to the [Model Serving API Test] step to generate the training model, serving components, model components, and pipeline.

    • Once completed, you will see the yelp_review_nlp_model_Pipeline run. You can also confirm the run with the Pipelines SDK, as shown in the sketch after this list.

    Image. Check pipeline run

  5. Once the model is created, you can access the Models tab to use the serving API.

    Image. Model server confirmation
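
For reference, the run created by the notebook can also be confirmed with the Kubeflow Pipelines SDK. A minimal sketch, assuming the kfp SDK and the authservice_session cookie obtained by logging in with the account entered above (all values below are placeholders):

    # A minimal sketch of connecting to the Pipelines API using the
    # Dex login session cookie. All values below are placeholders.
    import kfp

    HOST = "testkbm.dev.kakaoi.io"   # Kubeflow domain, without 'http://'
    SESSION_COOKIE = "<authservice_session value>"
    NAMESPACE = "kbm-admin"          # Kubeflow namespace

    client = kfp.Client(
        host=f"https://{HOST}/pipeline",
        cookies=f"authservice_session={SESSION_COOKIE}",
        namespace=NAMESPACE,
    )

    # The yelp_review_nlp_model_Pipeline run should appear here once
    # the notebook has finished running.
    for run in client.list_runs(page_size=5).runs or []:
        print(run.name, run.status)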

Step 2. Use the model serving API

Testing within a Kubeflow notebook or internal environment

You can call the serving API from inside the Kubernetes cluster network, for example from a notebook, using the cluster IP.

  • Run the [Model Serving API Test] section in the notebook to test the serving API.

    Image. Testing model serving API
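
For reference, an in-cluster call follows the KServe V1 inference protocol. A minimal sketch using the example notebook's names (torchserve, kbm-admin, torch-model); the exact in-cluster hostname can differ depending on the KServe and Knative configuration:

    # A minimal sketch of calling the serving API from inside the cluster,
    # for example from the notebook. Names follow the example notebook.
    import requests

    model_serv_name = "torchserve"
    kbm_namespace = "kbm-admin"
    model_name = "torch-model"

    # Cluster-internal DNS name of the model server
    url = (
        f"http://{model_serv_name}.{kbm_namespace}.svc.cluster.local"
        f"/v1/models/{model_name}:predict"
    )

    # KServe V1 inference protocol request body
    data = {"instances": [{"data": "Hello World!"}]}

    response = requests.post(url, json=data)
    print(response.text)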

Testing the inference API externally from Kubeflow

To use the serving API from outside Kubeflow, a domain address must have been entered in the Domain Connection section when creating Kubeflow. If so, you can test the API with the Python script below.

  1. Enter appropriate values for host, kbm_namespace, username, and password, then run the script.

    Example script

    import requests

    host = "${HOST}"                   # Kubeflow domain, without 'http://'
    kbm_namespace = "${NAMESPACE}"     # Kubeflow namespace
    username = "${USER_EMAIL}"         # Kubeflow account email
    password = "${USER_PASSWORD}"      # Kubeflow account password
    input_text_data = "Hello World!"   # Test string

    model_name = "torch-model"
    model_serv_name = "torchserve"

    # Log in through Dex to obtain the authservice_session cookie
    session = requests.Session()
    response = session.get(f"https://{host}", verify=False)

    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
    }
    session.post(response.url, headers=headers,
                 data={"login": username, "password": password})
    session_cookie = session.cookies.get_dict()["authservice_session"]
    print(session_cookie)

    # Prediction URL and the virtual host name of the model server
    url = f"http://{host}/v1/models/{model_name}:predict"
    model_host = f"{model_serv_name}.{kbm_namespace}.{host}"
    print(url)
    print(model_host)

    cookies = {"authservice_session": session_cookie}
    data = {"instances": [{"data": input_text_data}]}

    headers = {
        "Host": model_host,  # route the request to the model server
    }

    x = requests.post(url=url, cookies=cookies, headers=headers, json=data)

    print(f"Input: {data}")
    print(f"Output: {x.text}")

    Environment variable | Description
    HOST                 | Domain address excluding 'http://', e.g., testkbm.dev.kakaoi.io
    NAMESPACE            | Kubeflow namespace, e.g., kbm-admin
    USER_EMAIL           | Kubeflow account email, e.g., kbm@kakaoenterprise.com
    USER_PASSWORD        | Kubeflow account password, e.g., kbm@password
  2. The execution result is shown below.

    Image. Running the serving API externally

Step 3. Delete the model server

info

For more information about the KServe tool, please refer to the Kubeflow > KServe official documentation.

In the Models tab, click the [Delete Server] button on the row of a model server that is no longer in use to remove it.

Image. Deleting the model server
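
For reference, the same model server can also be removed outside the dashboard by deleting its InferenceService resource through the Kubernetes API. A minimal sketch, assuming the official kubernetes Python client and the placeholder names used above:

    # A minimal sketch of deleting the InferenceService behind the model
    # server. Namespace and name are the placeholders used in this guide.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() inside the cluster

    client.CustomObjectsApi().delete_namespaced_custom_object(
        group="serving.kserve.io",
        version="v1beta1",
        namespace="kbm-admin",   # placeholder namespace
        plural="inferenceservices",
        name="torchserve",       # placeholder model server name
    )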

Step 4. Delete the run

info

We recommend deleting completed or unused runs to manage resources effectively.

  1. Access the Kubeflow dashboard, click on the Runs tab, select the run you wish to delete, and click the [Archive] button.

    Image. Archiving a run

  2. You can view archived runs in the Archived section of the Runs tab. Select the run and click the [Delete] button to delete it.

    Image. Deleting a run

  3. When the run is deleted, its associated pods are deleted as well.

    Image. Confirming run deletion