
Create and manage Kubeflow

Before you start

Kubeflow runs on Kubernetes. Therefore, before using Kubeflow, you must first have a cluster created through KakaoCloud's Kubernetes Engine service, along with KakaoCloud File Storage configured on the same network and subnet as the cluster.

To perform these tasks, you need IAM permissions of 'project member' or higher. Additionally, the 'Kubeflow Admin' role is required for creating Kubeflow. For details, refer to IAM roles > Project roles.

info

If the cluster's network is a private subnet, the nodes in the private subnet cannot access the internet. NAT communication is required for those nodes to communicate externally.
You can use a NAT instance for this. See Use NAT instance for more information.

Create Kubernetes cluster

In the KakaoCloud Console, go to the Kubernetes Engine menu to create a Kubernetes-based cluster. If you already have a cluster, you may start from Create node pool.

info
  • Clusters already used with Kubeflow cannot be reused during Kubeflow creation. A new cluster must be used.
  • For detailed steps on creating a cluster, refer to Create cluster.
  • If the VPC does not have an internet gateway, Kubeflow cannot be installed successfully.

Create node pool

Create the node pools required by Kubeflow (the mandatory pools, plus optional pools as needed).

  1. Go to KakaoCloud Console > Container Pack > Kubernetes Engine.

  2. Click the Cluster menu, and navigate to the detail page of the cluster to be linked with Kubeflow.

  3. Go to the Node pool tab and click [Create node pool].

  4. On the node pool creation page, fill in the required information and click [Create].
    (To use Kubeflow, the cluster must have both an Ingress node pool and a Worker node pool pre-created.)

    Minimum specifications for Kubeflow cluster node pools

    Ingress node pool (required)
    - Instance type: t1i.small or higher
    - Volume: 50 GB or more
    - Node count: at least 1
    - Auto scaling: Disabled

    Worker node pool (required)
    - Instance type: vCPU ≥ 2, memory ≥ 3 GiB
    - Volume: at least 100 GB
    - Node count: at least 5
    - Auto scaling: Disabled

    CPU node pool (optional)
    - Instance type: t1i.small or higher
    - Volume: at least 100 GB
    - Auto scaling: Disabled

    GPU node pool (optional)
    - Instance type: A100 (p2i instance family)
    - Volume: at least 100 GB
    - Auto scaling: Disabled
caution
  • If you do not assign CPU/GPU node pools, some Kubeflow components may not function properly.
  • If the ingress-nginx controller was installed manually before creating Kubeflow, errors may occur.
    Remove any existing ingress-nginx resources before proceeding with Kubeflow creation (a pre-flight check is sketched after this list).
  • If the pod scheduling setting of a node pool is set to Block, Kubeflow cannot be created.
    Changing it to Block after Kubeflow creation may also cause issues. Always set it to Allow.
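
A leftover ingress-nginx installation or a node pool that blocks pod scheduling can be checked before creating Kubeflow. The sketch below uses the official kubernetes Python client against the cluster's kubeconfig; it assumes the manual installation used the ingress-nginx namespace, and the assumption that the Block pod scheduling setting surfaces as node.spec.unschedulable is ours, not something the console guarantees.

```python
# Pre-flight check before creating Kubeflow (sketch; assumes kubeconfig access
# to the Kubernetes Engine cluster and the official `kubernetes` Python client).
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# 1. A manually installed ingress-nginx controller conflicts with Kubeflow creation.
namespaces = [ns.metadata.name for ns in v1.list_namespace().items]
if "ingress-nginx" in namespaces:
    pods = v1.list_namespaced_pod("ingress-nginx").items
    print(f"ingress-nginx namespace found with {len(pods)} pod(s); remove these resources first.")

# 2. Unschedulable nodes (assumed to correspond to the node pool's 'Block'
#    pod scheduling setting) prevent Kubeflow from being created.
for node in v1.list_node().items:
    if node.spec.unschedulable:
        print(f"Node {node.metadata.name} is unschedulable; set pod scheduling to Allow.")
```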

Create File Storage instance

Kubeflow allows you to assign File Storage per namespace in addition to the default one. Here's how to create a File Storage instance:

info

You can only use File Storage instances that are on the same network and subnet as the cluster used to create Kubeflow.

  1. Go to KakaoCloud Console > Beyond Storage Service > File Storage.

  2. Click the Instance menu, then click [Create instance].

  3. Refer to the table below, fill in the fields, and click [Create].

    Network configuration
    - Network: Select the same network as the cluster used to create Kubeflow
    - Subnet: Select the same subnet as the cluster used to create Kubeflow

    Access control
    - Allow access from all private IPs in the configured network

Create Kubeflow

  1. Go to the KakaoCloud Console and navigate to the Kubeflow menu.

  2. Click the Kubeflow menu, then click the [Create Kubeflow] button.

  3. Enter the required information and click the [Create] button.

    Kubeflow name
    - Enter 4–20 characters using lowercase letters and hyphens (-) only.
    - Must start with a lowercase letter; cannot contain consecutive hyphens or end with a hyphen.
    - Cannot be duplicated within the same project.

    Kubeflow configuration
    - Kubeflow version: Select a Kubeflow version.
    - Kubeflow service type: Select based on the Kubeflow version. Refer to Service types and supported components.
    - Cluster connection: Select a Kubernetes Engine cluster to link with Kubeflow. Refer to Before you start if no cluster exists. Clusters already used for Kubeflow cannot be reused.

    Cluster configuration
    - Ingress node pool (required): Select the Ingress node pool from the chosen cluster.
    - Worker node pool (required): Select the Worker node pool from the chosen cluster.
    - CPU node pool (optional): Select a CPU node pool from the cluster.
    - GPU node pool (optional): Select a GPU node pool from the cluster.
      - GPU MIG: Set MIG instance specs for the selected pool.
      - GPU MIG (Default): Set default MIG specs for future node additions.
      - Refer to Configure MIG instance.
    - Default File Storage: Select a File Storage configured on the same network and subnet as the cluster.

    Object Storage
    - (MinIO) ID: Auto-generated.
    - (MinIO) ID/password: Auto-generated MinIO ID and password.
    - (Object Storage) Bucket: Creates a bucket named kubeflow-{kubeflow id}. Check the bucket in the Object Storage menu (View bucket list).

    Owner settings
    - Email: Enter a valid email address (requires IAM 'project member' or higher).
    - Namespace name: Enter a namespace name.
    - Namespace File Storage: Must be on the same network as the cluster. May overlap with the default File Storage.

    Database
    - (MySQL) MySQL: Select a MySQL instance in the same VPC. Refer to Create instance group.
    - (MySQL) MySQL user ID: MySQL admin account or a user with database creation permissions.
    - (MySQL) Password: Password of the MySQL user.
    - (Internal DB) Port: Enter a numeric port.
    - (Internal DB) Password: Enter a password.

    Domain connection (optional)
    - Enter a valid domain.
    - Visual Studio Code and RStudio may not function without a domain connection.
info
  • The Ingress node pool cannot be shared with other pools.
  • The Kubeflow name and the Ingress/Worker node pools cannot be changed after creation.
  • To scale node pools, refer to Manage node pool scaling.

Configure MIG instance

MIG (Multi Instance GPU) allows multiple users to run separate workloads on the same GPU without interference.

  • KakaoCloud Kubeflow requires maximum utilization of each MIG instance.
  • If 2 or more GPU cards are used, the MIG configuration applies per card, so the total is [number of MIG instances × number of GPU cards] (a worked example follows the tables below).
  • If no MIG configuration is set, or the default is selected, the smallest configuration applies.
  • Some combinations may be restricted by vendor policy. Refer to MIG instance exceptions.

Default specifications

A100 instance family: 7 instances with the 1g.10gb configuration

MIG instance exceptions

A100 instance family: 3g.40gb and 4g.40gb cannot be used simultaneously due to vendor policy.
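
To make the "[number of MIG instances × number of GPU cards]" rule concrete, here is a minimal sketch of the multiplication; the function name is illustrative, and the 7 × 1g.10gb figure comes from the default specification above.

```python
# Total MIG instances exposed by a node = MIG instances per card x GPU cards.
def total_mig_instances(instances_per_card: int, gpu_cards: int) -> int:
    return instances_per_card * gpu_cards

# A100 default profile: 7 x 1g.10gb per card, so a node with 2 GPU cards
# exposes 7 x 2 = 14 MIG instances.
print(total_mig_instances(instances_per_card=7, gpu_cards=2))  # -> 14
```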

Manage Kubeflow

Kubeflow utilizes Kubernetes cluster management to simplify and streamline machine learning workflows. Here's how to manage Kubeflow instances.

View Kubeflow list

  1. Go to the KakaoCloud Console and select the Kubeflow menu.

  2. The menu displays the list of created Kubeflow instances.

    Filter: Filter or search by name or status.
    Name: User-defined Kubeflow name.
    Status: Creating, Active, Failed, Expired, Terminating, Terminated. Refer to Lifecycle and status.
    Service type: Selected service type (e.g., Essential+HPT+Serving API).
    Version: Version of Kubeflow.
    GPU enabled: GPU activation status (Enabled/Disabled).
    Profiles: Number of user and group profiles (kbm-g namespace).
    Created at: Timestamp of creation.
    Uptime: Operational duration of the instance.
    More options: Delete (Active status only), modify CPU/GPU node pools.
caution

Deleted Kubeflow instances remain visible in the list for only 1 day. Cluster usage fees continue to accrue after Kubeflow is deleted unless the cluster itself is also deleted.

View Kubeflow details

You can check detailed information about Kubeflow, including groups, users, components, and monitoring data.

  1. In the KakaoCloud Console, navigate to the Kubeflow menu.

  2. In the Kubeflow menu, select the Kubeflow instance for which you want to view information.

  3. On the Kubeflow detail screen, view the relevant information.

    Status: Creating, Active, Failed, Expired, Terminating, Terminated. For detailed explanations of each status value, refer to Kubeflow lifecycle and statuses.
    Number of nodes: Total number of instances used in the cluster configuration connected to Kubeflow.
    GPU: [MIG instance] button. Click to view, in JSON format, the GPU MIG instance configuration specified during Kubeflow creation. Not displayed if no GPU node pool is used.
    Quick launch: [View dashboard] button. Click either the Private Dashboard or Public Dashboard button to go to the Kubeflow dashboard.

Kubeflow tab information

You can view details such as Kubeflow ID, creation time, and the connected cluster.

Kubeflow information
- Kubeflow ID: Unique ID of the Kubeflow instance
- Version: Kubeflow version (1.6 / 1.8)
- Service type: Type of service (e.g., Essential+HPT+Serving API). For details on supported versions, refer to Kubeflow service types and component support.
- Creator: User account that created the Kubeflow instance
- Creation time: Time of creation (uptime); displays the time at which the creation request was responded to (the operational time of Kubeflow)

Connected cluster
- Cluster name: Name of the cluster connected to Kubeflow
- Default file storage: File storage configured in the connected cluster; used for data storage and sharing
- VPC: VPC where the connected cluster is deployed
- Subnet: Subnet where the connected cluster is deployed
- DB: DB settings configured during Kubeflow creation (MySQL, Kubeflow internal DB)
- Ingress node pool: Ingress node pool information of the connected cluster
- Worker node pool: Worker node pool information of the connected cluster
- CPU node pool: CPU node pool information of the connected cluster
- GPU node pool: GPU node pool information of the connected cluster
info

Metrics for GPU instance types are only available in Kubeflow detail screen > Monitoring.

  • Support for custom dashboards in the KakaoCloud Monitoring service will be added later.

Modify Kubeflow cluster node pools

You can add new node pools or delete existing ones by modifying the CPU/GPU node pools configured during Kubeflow creation.

caution

Node pool modification may fail depending on the cluster status. Nodes with running pods cannot be modified. Please check the cluster status and clean up the nodes before proceeding.
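
Before opening the modification modal, you can list the pods still running on each node from outside the console. This is a minimal sketch using the official kubernetes Python client, assuming kubeconfig access to the connected cluster.

```python
# List running pods per node so the affected nodes can be cleaned up before
# modifying CPU/GPU node pools (sketch; uses the official `kubernetes` client).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    name = node.metadata.name
    pods = v1.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={name},status.phase=Running"
    ).items
    print(f"{name}: {len(pods)} running pod(s)")
    for pod in pods:
        print(f"  {pod.metadata.namespace}/{pod.metadata.name}")
```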

  1. Go to the Kubeflow menu in the KakaoCloud Console.

  2. In the Kubeflow menu, select the Kubeflow instance whose node pool you want to modify.

  3. Click the [Modify cluster node pool] button at the top of the page or select the more options icon.

  4. In the cluster node pool modification modal, edit the items listed below.

    Kubeflow name: Not editable
    Ingress / Worker node pool: Not editable
    CPU node pool: Editable (add or delete CPU node pools)
    GPU node pool: Editable (add or delete GPU node pools, modify MIG configuration)

Configure Kubeflow quotas

Quotas define the amount of resources that can be used within a namespace and can be set based on the resources available in the Kubernetes Engine where Kubeflow is installed.

caution

By default, 5 CPU cores and 4 GiB of memory are consumed when a namespace is created. For stable operation, assign quotas that exceed these values.

Quotas can be configured on the console screens for creating or editing users/groups. (Cannot be configured from the dashboard.)

  1. Go to the Kubeflow menu in the KakaoCloud Console.

  2. In the Kubeflow tab, select the Kubeflow instance you want to use.

  3. On the detail screen, go to the Users tab.

  4. Click the [Add user] button.

  5. Check “Enable quota assignment” and specify the desired quota.

info

You can add or modify quotas by clicking the [More] icon for each user. For group quotas, go to the group tab and follow the same method.

caution

If you increase or decrease (+/-) the quota for an already active namespace, it may affect running resources. Always verify resource usage before making changes.

Quota assignment conditions

  1. Quotas are divided into CPU count (cores), CPU memory (GiB), and GPU memory (GiB). The maximum values are based on the sum of all node pools excluding the Worker and Ingress node pools.
    (This means that additional node pools are required aside from Worker and Ingress for quota settings.)

    CPU count: maximum is the total number of CPU cores in all node pools except Worker and Ingress (at least 5 cores)
    CPU memory: maximum is the total CPU memory in all node pools except Worker and Ingress (at least 4 GiB)
    GPU memory: maximum is the total GPU memory across all GPU node pools
    File storage size: maximum is the size of the mapped file storage (can be shared with other users)
  2. A user namespace or group namespace is required to assign quotas.

  3. If the maximum quota value has already been used, no additional quota can be assigned.

Maximum quota calculation method

The maximum quota value is calculated as the total of all resources except Ingress and Worker node pools.
Node pools used multiple times are not double-counted.

Example calculation

Node pool | Flavor | CPU cores | CPU memory (GiB) | GPU memory (GiB) | Node count
ingressNodePool | m2a.large | 2 | 8 | - | 1
workerNodePool | m2a.xlarge | 4 | 16 | - | 6
cpuNotebookNodePool | m2a.2xlarge | 8 | 32 | - | 6
cpuPipelineNodePool | Same as cpuNotebookNodePool | 8 | 32 | - | 6
gpuNotebookNodePool | p2i.6xlarge | 24 | 192 | 40 | 2
gpuPipelineNodePool | p2i.12xlarge | 48 | 384 | 80 | 2

Assuming the node pools above were configured during Kubeflow creation, the maximum available quota values are as follows (a short script reproducing this arithmetic is sketched after the list):

  • Maximum CPU cores: cpuNotebookNodePool (48) + cpuPipelineNodePool (0) + gpuNotebookNodePool (48) + gpuPipelineNodePool (96) = 192 cores
  • Maximum CPU memory: cpuNotebookNodePool (192) + cpuPipelineNodePool (0) + gpuNotebookNodePool (384) + gpuPipelineNodePool (768) = 1344 GiB
  • Maximum GPU memory: cpuNotebookNodePool (0) + cpuPipelineNodePool (0) + gpuNotebookNodePool (80) + gpuPipelineNodePool (160) = 240 GiB
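
The same arithmetic can be reproduced with a short script. The pool names and per-node figures mirror the example table, the de-duplication reflects the rule that a node pool used multiple times is counted once, and the dictionary layout is purely illustrative.

```python
# Maximum quota = sum over all pools except Ingress/Worker, counting each pool once.
# Per-node figures (CPU cores, CPU memory GiB, GPU memory GiB, node count) mirror
# the example table above; cpuPipelineNodePool is the same pool as
# cpuNotebookNodePool, so it is not listed again.
pools = {
    "ingressNodePool":     {"role": "ingress", "cpu": 2,  "mem": 8,   "gpu_mem": 0,  "nodes": 1},
    "workerNodePool":      {"role": "worker",  "cpu": 4,  "mem": 16,  "gpu_mem": 0,  "nodes": 6},
    "cpuNotebookNodePool": {"role": "cpu",     "cpu": 8,  "mem": 32,  "gpu_mem": 0,  "nodes": 6},
    "gpuNotebookNodePool": {"role": "gpu",     "cpu": 24, "mem": 192, "gpu_mem": 40, "nodes": 2},
    "gpuPipelineNodePool": {"role": "gpu",     "cpu": 48, "mem": 384, "gpu_mem": 80, "nodes": 2},
}

quota_pools = [p for p in pools.values() if p["role"] not in ("ingress", "worker")]
max_cpu     = sum(p["cpu"]     * p["nodes"] for p in quota_pools)  # 48 + 48 + 96    = 192 cores
max_mem     = sum(p["mem"]     * p["nodes"] for p in quota_pools)  # 192 + 384 + 768 = 1344 GiB
max_gpu_mem = sum(p["gpu_mem"] * p["nodes"] for p in quota_pools)  # 80 + 160        = 240 GiB
print(max_cpu, max_mem, max_gpu_mem)  # -> 192 1344 240
```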

Used quota calculation method

  1. The used (occupied) quota is the sum of all assigned quotas for users and groups with namespaces in the Kubeflow instance.

  2. Namespaces that exist without assigned quotas are not included in the used quota calculation.

Available quota calculation method

  1. Available quota is calculated as maximum quota value - used quota value.

  2. Resource usage without quota assignment is not counted as used quota.

Connect Kubeflow dashboard via public IP

You can access the Kubeflow dashboard by assigning a public IP.

Assign a public IP to the load balancer

Assign a public IP to the load balancer created by the Kubeflow ingress controller during service creation. Then access the dashboard using the assigned IP address.

info

When Kubeflow is created, the necessary load balancer listeners and target groups are automatically defined via the ingress controller. No separate creation or modification is required.

  1. Go to the Load balancer menu in the KakaoCloud Console.

  2. In the Load balancer menu, select the load balancer named according to this pattern:
    kube_service_{project ID}_{IKE cluster name}_ingress-nginx_ingress-nginx-controller.

  3. On the load balancer detail page, click the [More] icon.

  4. In the public IP connection settings modal, enter the necessary information and click [Save].

    Image: Configure public IP for load balancer

    Load balancer name: Name of the selected load balancer
    Private IP: Private IP of the selected load balancer
    Public IP assignment: Choose one of the following:
    - Automatically assign a new public IP
    - Select from existing public IPs: choose a public IP to assign
  5. After assigning the public IP, go to the Quick launch section in the Kubeflow detail tab and select the connected public IP.

  6. Verify that the dashboard is accessible (a simple reachability check is sketched after these steps).

    Image: Verify Kubeflow dashboard connection
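
As an optional reachability check, you can confirm that the assigned public IP answers HTTP requests before opening the dashboard in a browser. This is a minimal sketch using the requests library; the IP address is a placeholder, and redirect/certificate behavior depends on your domain and HTTPS setup.

```python
# Quick reachability check for the Kubeflow dashboard behind the public IP.
# Replace PUBLIC_IP with the address assigned to the load balancer.
import requests

PUBLIC_IP = "203.0.113.10"  # placeholder address

try:
    # Kubeflow's ingress typically redirects to its login page; any response
    # below HTTP 500 means the load balancer and ingress controller answered.
    resp = requests.get(f"http://{PUBLIC_IP}", timeout=5, allow_redirects=True)
    print(f"Dashboard endpoint responded with HTTP {resp.status_code}")
except requests.exceptions.RequestException as exc:
    print(f"Dashboard endpoint not reachable: {exc}")
```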

info

For more usage examples of the Kubeflow service, see the Machine Learning & AI category in the tutorials section.