Managing machine learning experiments with Kubeflow TensorBoard
This guide explains how to use the TensorBoard component in the KakaoCloud Kubeflow environment to manage and visualize log data generated during machine learning experiments.
- Estimated time: 10 minutes
- Recommended OS: macOS, Ubuntu
- Region: kr-central-2
- Prerequisites
- Reference document
Before starting
TensorBoard is an essential tool for intuitively monitoring and analyzing the training process of machine learning models. By using TensorBoard in the Kubeflow environment, you can monitor the progress of machine learning experiments in real time and compare the results of different runs. You will also learn how to interpret key metrics for optimizing model performance and how to manage experiments with TensorBoard.
About this scenario
This tutorial provides a step-by-step introduction to setting up TensorBoard in the KakaoCloud Kubeflow environment and to visualizing and analyzing logs from actual training runs.
Key topics covered in this scenario include:
- Creating and configuring TensorBoard instances in Kubeflow
- Monitoring and visualizing training log data in real time
- Analyzing model training processes using TensorBoard
Supported tools
Tool | Version | Description |
---|---|---|
TensorBoard | 2.1.0 | A visualization tool for machine learning experiments that tracks and visualizes metrics and displays histograms of weights and tensors. |
For more information on TensorBoard, refer to the official TensorBoard documentation.
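The snippet below is a minimal sketch of the kind of data TensorBoard tracks: scalar metrics logged with `add_scalar` appear under the Scalars tab, and weight distributions logged with `add_histogram` under the Histograms tab. It uses PyTorch's `torch.utils.tensorboard.SummaryWriter` (PyTorch is included in the notebook image used later in this tutorial); the metric values and the `logs/demo-run` directory are placeholders for illustration only.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Placeholder log directory; in this tutorial, logs must ultimately be written
# to the volume that the TensorBoard instance mounts.
writer = SummaryWriter(log_dir="logs/demo-run")

for step in range(100):
    loss = 1.0 / (step + 1)                      # dummy scalar metric
    writer.add_scalar("train/loss", loss, step)  # shown under the Scalars tab

    weights = torch.randn(256) * (1 - step / 100)
    writer.add_histogram("layer1/weights", weights, step)  # shown under the Histograms tab

writer.close()
```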
Prework
This section covers the environment setup and necessary resources to use TensorBoard.
1. Prepare Kubeflow environment
Before using TensorBoard in Kubeflow, ensure that the node pool meets the necessary specifications. If the environment needs to be set up, refer to the Setting up Jupyter Notebook using Kubeflow guide to configure an appropriate Kubeflow environment.
Minimum requirements
- Node pool minimum specs: At least 4 vCPUs and 8 GB memory
- Sufficient File Storage size: 10 GiB or more
2. Create volume for storing logs
Create a Persistent Volume (PV) for TensorBoard to store training logs.
- Access the Kubeflow dashboard and navigate to the Volumes tab.
- Click the [New Volume] button at the top to create a new volume.
- In the New Volume screen, enter the necessary information and click [Create] to create the volume.
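If you want to confirm from code that the volume (PVC) was created, the Kubernetes Python client can list the PVCs in your Kubeflow profile namespace. This is an optional check, shown only as a sketch: it assumes the `kubernetes` package is installed, that the service account running the code is allowed to list PVCs, and the namespace name below is a placeholder to replace with your own profile namespace.

```python
from kubernetes import client, config

# Placeholder: replace with your own Kubeflow profile namespace.
NAMESPACE = "kbm-u-kubeflow-tutorial"

# Inside a Kubeflow notebook pod, use the in-cluster config; elsewhere fall
# back to a local kubeconfig.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()
for pvc in v1.list_namespaced_persistent_volume_claim(namespace=NAMESPACE).items:
    size = pvc.spec.resources.requests.get("storage")
    print(f"{pvc.metadata.name}: phase={pvc.status.phase}, size={size}")
```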
Step-by-step process
The following steps will guide you through managing and visualizing machine learning experiment logs using TensorBoard.
Step 1. Create TensorBoard instance
- Select the Tensorboards tab and click the [New TensorBoard] button.
- In the New TensorBoard screen, enter the necessary information and click [Create] to create a TensorBoard instance.

Field | Value |
---|---|
Storage Type | PVC |
PVC Name | Name of the volume created in Prework step 2 |
Mount Path | Path on the selected volume from which TensorBoard will read and display logs (see the sketch below) |

- Click the [CONNECT] button to access the created TensorBoard instance.
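TensorBoard treats each subdirectory under the configured Mount Path as a separate run, which is what makes it possible to compare experiments side by side. The sketch below illustrates this layout; it assumes the same volume is later mounted at `/home/jovyan/tensorboard` in the practice notebook and that the Mount Path is set to `logs`, both of which are placeholder values to replace with your own.

```python
from torch.utils.tensorboard import SummaryWriter

# Placeholder paths: volume mounted at /home/jovyan/tensorboard in the notebook,
# TensorBoard instance Mount Path assumed to be "logs".
LOG_ROOT = "/home/jovyan/tensorboard/logs"

# Each subdirectory under LOG_ROOT appears as one selectable run in TensorBoard.
run_a = SummaryWriter(log_dir=f"{LOG_ROOT}/baseline")
run_b = SummaryWriter(log_dir=f"{LOG_ROOT}/higher-lr")

for step in range(10):
    run_a.add_scalar("train/loss", 1.0 / (step + 1), step)
    run_b.add_scalar("train/loss", 0.5 / (step + 1), step)

run_a.close()
run_b.close()
```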
Step 2. Create notebook instance for practice
In this step, you'll learn how to create a notebook instance for practice in Kubeflow.
- In the Kubeflow dashboard, select the Notebooks tab.
- Click the [New Notebook] button at the top to create a notebook instance.
- In the New notebook setup screen, enter the following details:
  - Docker Image: Select `kc-kubeflow/jupyter-pyspark-pytorch:v1.8.0.py38.1a`.
  - Workspace Volume: To remove the default volume, click the trash icon. Then select the [Attach existing volume] option and choose the tensorboard volume created in Prework step 2.
- Once the setup is complete, click the [LAUNCH] button to create the instance.
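Once the notebook is running, it is worth confirming that the attached volume is visible and creating the directory the TensorBoard instance reads from. A minimal sketch, where both paths are assumptions to replace with the mount path you chose in the notebook setup and the Mount Path you set on the TensorBoard instance:

```python
import os

# Placeholder mount point of the attached tensorboard volume inside the notebook.
VOLUME_MOUNT = "/home/jovyan/tensorboard"

# Directory that should match the Mount Path configured on the TensorBoard
# instance in Step 1 ("logs" is an assumed value).
LOG_DIR = os.path.join(VOLUME_MOUNT, "logs")
os.makedirs(LOG_DIR, exist_ok=True)

print("Volume contents:", os.listdir(VOLUME_MOUNT))
print("Training logs should be written under:", LOG_DIR)
```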
Step 3. Train model and visualize the results with TensorBoard
This step explains how to train a model and visualize the training results using TensorBoard.
- Download the example project from the link below and upload it to the notebook instance you created:
  - Example download: Using TensorBoard.ipynb
- In the notebook, change the `TENSORBOARD_URL` variable to the address of the TensorBoard instance created in Step 1.
- Run the notebook code to train the model, and monitor the training progress in real time through TensorBoard embedded in an IFrame (a condensed sketch of this pattern follows below).
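The example notebook follows this overall pattern. Below is a condensed, hypothetical sketch (not the notebook's exact code) of a training loop that writes TensorBoard logs and then embeds the TensorBoard UI in the notebook with an IFrame. `TENSORBOARD_URL` and `LOG_DIR` are placeholders for your own instance address (shown by the [CONNECT] button) and a directory on the attached volume.

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
from IPython.display import IFrame

# Placeholders: replace with your TensorBoard instance's address and a
# directory on the volume the instance reads from.
TENSORBOARD_URL = "https://<kubeflow-domain>/tensorboard/<namespace>/<instance-name>/"
LOG_DIR = "/home/jovyan/tensorboard/logs/linear-demo"

# Toy model and data, for illustration only.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(512, 10), torch.randn(512, 1)

writer = SummaryWriter(log_dir=LOG_DIR)
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    writer.add_scalar("train/loss", loss.item(), epoch)  # appears live in TensorBoard
writer.close()

# Embed the TensorBoard UI in the notebook output.
IFrame(src=TENSORBOARD_URL, width=1000, height=600)
```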