Tutorial series | Kubeflow basic workflow

ML experiment management with Kubeflow TensorBoard

This tutorial explains how to manage and visualize log data generated during machine learning experiments by using the TensorBoard component in the Kubeflow environment on KakaoCloud.

Basic information
  • Estimated time: 10 minutes
  • Recommended OS: macOS, Ubuntu
  • Notes:
    • In a private network environment, file downloads may not function properly.

About this scenario

TensorBoard is an essential tool for intuitively monitoring and analyzing the training process of machine learning models. Within the Kubeflow environment, you can use TensorBoard to monitor training progress in real time and compare different experiment results. This scenario will walk you through how to interpret key metrics for optimizing model performance and how to manage experiments using TensorBoard.

This tutorial introduces the step-by-step process of visualizing and analyzing logs from actual training data using TensorBoard in KakaoCloud’s Kubeflow environment.

Key topics include:

  • Creating and configuring a TensorBoard instance in Kubeflow
  • Monitoring and visualizing training logs in real time
  • Analyzing the training process using TensorBoard

Supported tools

Tool        | Version | Description
TensorBoard | 2.1.0   | A visualization tool for ML experiments. It enables tracking and visualizing metrics, weight histograms, and tensors.

For more details on TensorBoard, refer to the official TensorBoard documentation.
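
As a quick illustration of the kind of data TensorBoard visualizes, the sketch below writes a few scalar summaries with TensorFlow's tf.summary API. This is not part of the tutorial's example project; the logs/fit/demo directory name is only an assumption chosen to match the log path used later in this tutorial.

```python
import tensorflow as tf

# Hypothetical log directory; this tutorial later points TensorBoard at "logs/fit".
log_dir = "logs/fit/demo"
writer = tf.summary.create_file_writer(log_dir)

# Write a few scalar values; TensorBoard plots them as a curve over training steps.
with writer.as_default():
    for step in range(100):
        tf.summary.scalar("demo/loss", 1.0 / (step + 1), step=step)
writer.flush()
```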

Before you start

This section covers the environment setup and required resources to use TensorBoard.

1. Prepare Kubeflow environment

Before using TensorBoard in Kubeflow, verify that your node pool meets the minimum specs for the tutorial. If you need to configure the environment, refer to the guide Deploy Jupyter Notebooks on Kubeflow.

Minimum requirements:

  • Node pool spec: At least 4 vCPUs and 8GB RAM
  • Available file storage: At least 10GiB

Getting started

This section outlines the concrete steps for managing and visualizing ML experiment logs with TensorBoard.

Step 1. Create notebook instance for practice

This step walks you through creating a notebook instance in Kubeflow for the exercise.

  1. In the Kubeflow dashboard, go to the Notebooks tab.
  2. Click the [New Notebook] button to create an instance.
  3. In the New notebook setup screen, enter the following:
    • Notebook Image: Select kc-kubeflow/jupyter-tensorflow-full:v1.8.0.py311.1a
    • Notebook Specs: At least 1 vCPU and 2GB RAM
  4. Click [LAUNCH] to create the notebook instance.

Step 2. Train the model

This step involves training a model whose logs will be visualized using TensorBoard.

  1. Download the example project from the link below and upload it to the notebook instance you created:

  2. Run the code block to create a log directory.

    Create training log directory

  3. Run the model definition and training code block to start training the practice model (a minimal sketch covering this and the previous step appears after this list).

    Train model
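
The example project's code blocks are not reproduced here. As a rough, hedged sketch of what steps 2 and 3 typically look like, the code below creates a timestamped log directory and trains a small Keras model with the tf.keras TensorBoard callback. The MNIST dataset and the model architecture are illustrative assumptions (and downloading MNIST may fail in a private network environment); only the logs/fit path is taken from this tutorial, matching the Mount Path configured in Step 3.

```python
import datetime
import tensorflow as tf

# Step 2-2: create a timestamped training log directory under logs/fit
# ("logs/fit" matches the Mount Path used for TensorBoard in Step 3).
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Step 2-3: define and train a small model (MNIST is only an illustrative choice).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The TensorBoard callback writes scalars and weight histograms into log_dir.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(x_train, y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_cb])
```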

Step 3. Create TensorBoard instance

  1. Go to the Tensorboards tab and click [New TensorBoard].

  2. Fill in the required information on the New Tensorboard screen, then click [Create].

    Create TensorBoard

    Field        | Value
    Storage type | PVC
    PVC Name     | Name of the Workspace Volume from the notebook created in Step 1
    Mount Path   | Path where training logs are stored (for this tutorial: logs/fit)
  3. Click the [CONNECT] button to open and view the TensorBoard instance.

    View TensorBoard
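
If the board comes up empty after connecting, you can check from the notebook that event files actually exist under the path you entered as the Mount Path. The snippet below is a small, optional check assuming the logs/fit directory used in this tutorial, relative to the workspace volume root.

```python
import os

# The Mount Path entered above must point at the directory the training code writes to.
# In this tutorial that is "logs/fit"; expect files named "events.out.tfevents.*".
for root, dirs, files in os.walk("logs/fit"):
    for name in files:
        print(os.path.join(root, name))
```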

Step 4. Check training results in TensorBoard from the notebook

This step shows how to train the model and review the training results in TensorBoard.

  1. In the notebook, set the TENSORBOARD_URL variable to the address of the TensorBoard created in Step 3.

    Set TensorBoard URL

  2. Run the notebook code to train the model and monitor the process through an IFrame displaying TensorBoard (sketched after this list).

    Monitor training with TensorBoard
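
The notebook's final cells are not reproduced here. As a minimal sketch, and reusing the TENSORBOARD_URL variable name from step 1 above, embedding the board in the notebook could look like the following. The URL shown is a placeholder assumption; copy the actual address opened by the [CONNECT] button in Step 3.

```python
from IPython.display import IFrame

# Placeholder address: replace with the URL opened by the [CONNECT] button in Step 3.
TENSORBOARD_URL = "https://<your-kubeflow-domain>/<path-to-your-tensorboard>/"

# Embed TensorBoard inside the notebook so training can be monitored alongside the code.
IFrame(src=TENSORBOARD_URL, width=1000, height=600)
```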