Managing machine learning experiments with Kubeflow TensorBoard

This guide explains how to use the TensorBoard component in the KakaoCloud Kubeflow environment to manage and visualize log data generated during machine learning experiments.

Basic information

Before starting

TensorBoard is a tool for monitoring and analyzing the training process of machine learning models. In the Kubeflow environment, you can use it to follow the progress of machine learning experiments in real time and to compare the results of different runs. This guide also covers how to read the key metrics used to optimize model performance and how to manage experiments with TensorBoard.

About this scenario

This tutorial walks step by step through creating a TensorBoard instance in the KakaoCloud Kubeflow environment, then visualizing and analyzing the logs of a real training run.

Key topics covered in this scenario include:

  • Creating and configuring TensorBoard instances in Kubeflow
  • Monitoring and visualizing training log data in real time
  • Analyzing the model training process with TensorBoard

Supported tools

Tool | Version | Description
TensorBoard | 2.1.0 | A visualization tool for machine learning experiments that tracks metrics, visualizes them, and tracks histograms of weights and tensors.

For more information on TensorBoard, refer to the official TensorBoard documentation.

Prework

This section covers the environment setup and necessary resources to use TensorBoard.

1. Prepare Kubeflow environment

Before using TensorBoard in Kubeflow, ensure that the node pool meets the necessary specifications. If the environment needs to be set up, refer to the Setting up Jupyter Notebook using Kubeflow guide to configure an appropriate Kubeflow environment.

Minimum requirements

  • Node pool: at least 4 vCPUs and 8 GB of memory
  • File Storage size: 10 GiB or more

2. Create volume for storing logs

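Create a volume where training logs are stored so that TensorBoard can read them. The volume is attached later as a PersistentVolumeClaim (PVC), both to the TensorBoard instance and to the notebook.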

  1. Access the Kubeflow dashboard and navigate to the Volumes tab.

  2. Click the [New Volume] button at the top to create a new volume.

  3. In the New Volume screen, enter the necessary information and click [Create] to create the volume.

    Image. Create a volume
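
For reference, the volume created in the dashboard corresponds to a Kubernetes PersistentVolumeClaim. The following is a minimal sketch of such a manifest built in Python, not the KakaoCloud procedure itself. The name `tensorboard-logs`, the namespace `kubeflow-user`, the `ReadWriteMany` access mode, and the helper `build_pvc_manifest` are assumptions chosen for illustration. The 10Gi size mirrors the minimum File Storage size listed above, and the storage class is left out so that the cluster default applies.

```python
import json


def build_pvc_manifest(name, namespace, size="10Gi", access_mode="ReadWriteMany"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    All argument values are illustrative; use the names and sizes that
    match your own Kubeflow namespace and storage setup.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteMany is assumed because both the notebook and the
            # TensorBoard instance mount this volume; availability depends
            # on the storage class in use.
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical volume name and namespace.
pvc = build_pvc_manifest("tensorboard-logs", "kubeflow-user")

# JSON is accepted wherever a Kubernetes manifest file is expected.
print(json.dumps(pvc, indent=2))
```

Whether `ReadWriteMany` is supported depends on the storage backing the volume, so treat the access mode as a setting to confirm rather than a recommendation.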

Step-by-step process

The following steps will guide you through managing and visualizing machine learning experiment logs using TensorBoard.

Step 1. Create TensorBoard instance

  1. Select the Tensorboards tab and click the [New TensorBoard] button.

  2. In the New TensorBoard screen, enter the necessary information and click [Create] to create a TensorBoard instance.

    Image. Create a TensorBoard

    Field | Value
    Storage Type | PVC
    PVC Name | Name of the volume created in Prework step 2 (Create volume for storing logs)
    Mount Path | Path within the selected volume from which TensorBoard reads logs
  3. Click the [CONNECT] button to access the created TensorBoard instance.

    Image. Check TensorBoard
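
In upstream Kubeflow, the fields in this form map to a `Tensorboard` custom resource whose `spec.logspath` combines the PVC name and the path as `pvc://<pvc-name>/<path>`. The sketch below builds such a manifest, assuming that KakaoCloud Kubeflow follows the upstream resource format. The helper names `build_logspath` and `build_tensorboard_manifest`, and the instance, namespace, and volume names, are hypothetical.

```python
import json


def build_logspath(pvc_name, mount_path=""):
    """Combine a PVC name and a path inside it into a pvc:// log path."""
    path = mount_path.strip("/")  # avoid doubled or trailing slashes
    return f"pvc://{pvc_name}/{path}"


def build_tensorboard_manifest(name, namespace, pvc_name, mount_path=""):
    """Return a Tensorboard custom resource as a plain dict.

    Assumes the upstream Kubeflow API group and version.
    """
    return {
        "apiVersion": "tensorboard.kubeflow.org/v1alpha1",
        "kind": "Tensorboard",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"logspath": build_logspath(pvc_name, mount_path)},
    }


# Hypothetical names: the PVC is the volume created during Prework.
manifest = build_tensorboard_manifest(
    name="my-tensorboard",
    namespace="kubeflow-user",
    pvc_name="tensorboard-logs",
    mount_path="/logs/",
)
print(json.dumps(manifest, indent=2))
```

The path component must match the directory, relative to the volume root, where the notebook writes its event files. Otherwise TensorBoard starts but shows no data.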

Step 2. Create notebook instance for practice

In this step, create a notebook instance in Kubeflow to run the training example.

  1. In the Kubeflow dashboard, select the Notebooks tab.

  2. Click the [New Notebook] button at the top to create a notebook instance.

  3. In the New notebook setup screen, enter the following details:

    • Docker Image: Select kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py38.1a.
    • Workspace Volume: Click the trash icon to remove the default volume. Then select the [Attach existing volume] option and choose the TensorBoard log volume created in Prework step 2 (Create volume for storing logs).
  4. Once the setup is complete, click the [LAUNCH] button to create the instance.

Step 3. Train model and visualize the results with TensorBoard

This step explains how to train a model and visualize the training results using TensorBoard.

  1. Download the example project from the link below and upload it to the notebook instance you created:

  2. In the notebook, change the TENSORBOARD_URL variable to the address of the TensorBoard instance created in Step 1.

    • Image: Update TensorBoard URL
  3. Run the notebook code to train the model, and monitor training progress as it changes in the TensorBoard view embedded in the notebook as an IFrame.

    • Image: Check TensorBoard results
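
A training loop typically feeds TensorBoard by writing one scalar per step to event files on the mounted volume. The following is an illustrative sketch of that pattern, not the code from the example project. `log_losses`, `RecordingWriter`, the tag `Loss/train`, and the loss values are hypothetical. The function only needs an object with an `add_scalar(tag, scalar_value, global_step)` method, which matches `add_scalar` on PyTorch's `torch.utils.tensorboard.SummaryWriter`. A small in-memory stand-in keeps the sketch runnable without PyTorch.

```python
class RecordingWriter:
    """In-memory stand-in for a TensorBoard writer (illustration only)."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        # A real writer would append this value to an event file.
        self.scalars.append((tag, float(scalar_value), global_step))

    def close(self):
        pass


def log_losses(writer, losses, tag="Loss/train"):
    """Write one scalar per training step so TensorBoard can plot a curve."""
    for step, loss in enumerate(losses):
        writer.add_scalar(tag, loss, global_step=step)
    return len(losses)


writer = RecordingWriter()
count = log_losses(writer, [0.9, 0.6, 0.4])  # made-up loss values
writer.close()
print(count, writer.scalars[-1])
```

In the notebook, the stand-in would be replaced by `SummaryWriter(log_dir=...)`, with `log_dir` pointing to the directory on the attached volume that matches the Mount Path set for the TensorBoard instance. The dashboard can then be embedded with `IPython.display.IFrame(TENSORBOARD_URL, width=..., height=...)`.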