Managing machine learning experiments with Kubeflow TensorBoard
This guide explains how to use the TensorBoard component in the KakaoCloud Kubeflow environment to manage and visualize log data generated during machine learning experiments.
- Estimated time: 10 minutes
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites
- Reference document
Before starting
TensorBoard is an essential tool for intuitively monitoring and analyzing the training process of machine learning models. By using TensorBoard in the Kubeflow environment, you can monitor the progress of machine learning experiments in real time and compare the results of different runs. You will also learn how to interpret key metrics for optimizing model performance and how to manage experiments with TensorBoard.
About this scenario
This tutorial provides a step-by-step introduction to setting up TensorBoard in the KakaoCloud Kubeflow environment and to visualizing and analyzing logs from actual training runs.
Key topics covered in this scenario include:
- Creating and configuring TensorBoard instances in Kubeflow
- Monitoring and visualizing training log data in real time
- Analyzing model training processes using TensorBoard
Supported tools
Tool | Version | Description |
---|---|---|
TensorBoard | 2.1.0 | A visualization tool for machine learning experiments that tracks and visualizes metrics and displays histograms of weights and tensors. |
For more information on TensorBoard, refer to the official TensorBoard documentation.
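The snippet below is a minimal sketch of the kind of data TensorBoard tracks: scalar metrics logged with `add_scalar` appear under the Scalars tab, and weight distributions logged with `add_histogram` under the Histograms tab. It uses PyTorch's `torch.utils.tensorboard.SummaryWriter` (PyTorch is included in the notebook image used later in this tutorial); the metric values and the `logs/demo-run` directory are placeholders for illustration only.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Placeholder log directory; in this tutorial, logs must ultimately be written
# to the volume that the TensorBoard instance mounts.
writer = SummaryWriter(log_dir="logs/demo-run")

for step in range(100):
    loss = 1.0 / (step + 1)                      # dummy scalar metric
    writer.add_scalar("train/loss", loss, step)  # shown under the Scalars tab

    weights = torch.randn(256) * (1 - step / 100)
    writer.add_histogram("layer1/weights", weights, step)  # shown under the Histograms tab

writer.close()
```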
Prework
This section covers the environment setup and necessary resources to use TensorBoard.
1. Prepare Kubeflow environment
Before using TensorBoard in Kubeflow, ensure that the node pool meets the necessary specifications. If the environment needs to be set up, refer to the Setting up Jupyter Notebook using Kubeflow guide to configure an appropriate Kubeflow environment.
Minimum requirements
- Node pool minimum specs: At least 4 vCPUs and 8 GB memory
- Sufficient File Storage size: 10 GiB or more
2. Create volume for storing logs
Create a Persistent Volume (PV) for TensorBoard to store training logs.
- Access the Kubeflow dashboard and navigate to the Volumes tab.
- Click the [New Volume] button at the top to create a new volume.
- In the New Volume screen, enter the necessary information and click [Create] to create the volume.
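If you want to confirm from code that the volume (PVC) was created, the Kubernetes Python client can list the PVCs in your Kubeflow profile namespace. This is an optional check, shown only as a sketch: it assumes the `kubernetes` package is installed, that the service account running the code is allowed to list PVCs, and the namespace name below is a placeholder to replace with your own profile namespace.

```python
from kubernetes import client, config

# Placeholder: replace with your own Kubeflow profile namespace.
NAMESPACE = "kbm-u-kubeflow-tutorial"

# Inside a Kubeflow notebook pod, use the in-cluster config; elsewhere fall
# back to a local kubeconfig.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()
for pvc in v1.list_namespaced_persistent_volume_claim(namespace=NAMESPACE).items:
    size = pvc.spec.resources.requests.get("storage")
    print(f"{pvc.metadata.name}: phase={pvc.status.phase}, size={size}")
```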
Step-by-step process
The following steps will guide you through managing and visualizing machine learning experiment logs using TensorBoard.
Step 1. Create TensorBoard instance
- Select the Tensorboards tab and click the [New TensorBoard] button.
- In the New TensorBoard screen, enter the necessary information and click [Create] to create a TensorBoard instance.

Field | Value |
---|---|
Storage Type | PVC |
PVC Name | Name of the volume created in Prework step 2 |
Mount Path | Path on the selected volume from which TensorBoard will read and display logs (see the sketch below) |

- Click the [CONNECT] button to access the created TensorBoard instance.
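TensorBoard treats each subdirectory under the configured Mount Path as a separate run, which is what makes it possible to compare experiments side by side. The sketch below illustrates this layout; it assumes the same volume is later mounted at `/home/jovyan/tensorboard` in the practice notebook and that the Mount Path is set to `logs`, both of which are placeholder values to replace with your own.

```python
from torch.utils.tensorboard import SummaryWriter

# Placeholder paths: volume mounted at /home/jovyan/tensorboard in the notebook,
# TensorBoard instance Mount Path assumed to be "logs".
LOG_ROOT = "/home/jovyan/tensorboard/logs"

# Each subdirectory under LOG_ROOT appears as one selectable run in TensorBoard.
run_a = SummaryWriter(log_dir=f"{LOG_ROOT}/baseline")
run_b = SummaryWriter(log_dir=f"{LOG_ROOT}/higher-lr")

for step in range(10):
    run_a.add_scalar("train/loss", 1.0 / (step + 1), step)
    run_b.add_scalar("train/loss", 0.5 / (step + 1), step)

run_a.close()
run_b.close()
```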
Step 2. Create notebook instance for practice
In this step, you'll learn how to create a notebook instance for practice in Kubeflow.
- In the Kubeflow dashboard, select the Notebooks tab.
- Click the [New Notebook] button at the top to create a notebook instance.
- In the New notebook setup screen, enter the following details:
  - Docker Image: Select `kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py38.1a`.
  - Workspace Volume: To remove the default volume, click the trash icon. Then select the [Attach existing volume] option and choose the tensorboard volume created in Prework step 2.
- Once the setup is complete, click the [LAUNCH] button to create the instance.
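Once the notebook is running, it is worth confirming that the attached volume is visible and creating the directory the TensorBoard instance reads from. A minimal sketch, where both paths are assumptions to replace with the mount path you chose in the notebook setup and the Mount Path you set on the TensorBoard instance:

```python
import os

# Placeholder mount point of the attached tensorboard volume inside the notebook.
VOLUME_MOUNT = "/home/jovyan/tensorboard"

# Directory that should match the Mount Path configured on the TensorBoard
# instance in Step 1 ("logs" is an assumed value).
LOG_DIR = os.path.join(VOLUME_MOUNT, "logs")
os.makedirs(LOG_DIR, exist_ok=True)

print("Volume contents:", os.listdir(VOLUME_MOUNT))
print("Training logs should be written under:", LOG_DIR)
```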
Step 3. Train model and visualize the results with TensorBoard
This step explains how to train a model and visualize the training results using TensorBoard.
- Download the example project from the link below and upload it to the notebook instance you created:
  - Example download: Using TensorBoard.ipynb
- In the notebook, change the `TENSORBOARD_URL` variable to the address of the TensorBoard instance created in Step 1.
- Run the notebook code to train the model, and monitor the training progress in real time through TensorBoard embedded in an IFrame (a condensed sketch of this pattern follows below).
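The example notebook follows this overall pattern. Below is a condensed, hypothetical sketch (not the notebook's exact code) of a training loop that writes TensorBoard logs and then embeds the TensorBoard UI in the notebook with an IFrame. `TENSORBOARD_URL` and `LOG_DIR` are placeholders for your own instance address (shown by the [CONNECT] button) and a directory on the attached volume.

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
from IPython.display import IFrame

# Placeholders: replace with your TensorBoard instance's address and a
# directory on the volume the instance reads from.
TENSORBOARD_URL = "https://<kubeflow-domain>/tensorboard/<namespace>/<instance-name>/"
LOG_DIR = "/home/jovyan/tensorboard/logs/linear-demo"

# Toy model and data, for illustration only.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(512, 10), torch.randn(512, 1)

writer = SummaryWriter(log_dir=LOG_DIR)
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    writer.add_scalar("train/loss", loss.item(), epoch)  # appears live in TensorBoard
writer.close()

# Embed the TensorBoard UI in the notebook output.
IFrame(src=TENSORBOARD_URL, width=1000, height=600)
```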