Skip to main content

2 posts tagged with "gpu"

View All Tags

KakaoCloud service updates - VM and Hadoop performance improvements, IAM security settings, and more

· 4 min read
Mia (정혜원)
Technical Contents Manager
update

This year, KakaoCloud is continuing to move forward without pause to provide users with a more convenient and secure cloud environment. With the warm arrival of spring, we are sharing a roundup of major service updates from March.

If the recently announced user-centered console renewal was a major change to screen structure and experience (UX), this post focuses on service feature enhancements that strengthen the foundation. Along with work to improve system stability, review the details of this update, which further improves resource management efficiency and security.


🖥️ Infrastructure management efficiency and service scalability

  • GPU service integrated into Virtual Machine (VM): For more intuitive resource management, the previously separate GPU service has been integrated into the Virtual Machine service.

    • Integrated environment provided: You can now select and manage general instances and GPU instances within the same workflow when creating a VM.
    • Automatic notification policy conversion: As part of the service integration, Alert Center notification policies previously configured in the GPU service have been safely and automatically converted into Virtual Machine service policies. You can continue using the existing monitoring environment without separate reconfiguration.
  • Virtual Machine supports "start credits" for t1i instances: To improve workload processing efficiency, the start credit feature has been added to t1i, a burstable instance type. Instances can now temporarily maintain high CPU utilization during boot, dramatically improving initial startup speed.

  • Hadoop Eco expands node volume size up to 16 TB: To support large-scale data analysis, the maximum volume size per node (master, worker, task) in Hadoop Eco has been significantly increased from 5 TB to up to 16 TB. Analyze larger volumes of data without storage constraints.

  • Object Storage product name changed: To make it easier for users to recognize the storage services they are using, Object Storage product names have been changed as follows. Pricing remains the same, and changes will be applied sequentially starting with March billing statements.

    • Data capacity: Hot Bucket → Standard Storage Class
    • API calls: The Standard- prefix is added before existing request names (for example, Standard-PUT, Standard-GET, and so on)

🔑 Security enhancements

  • IAM security settings enhanced: To protect valuable organizational resources, various security settings have been added to Account settings and IAM service items in the console.

    • Password reauthentication when deleting resources: When deleting a user account or project service account, a password reauthentication step has been added to prevent simple mistakes.
    • Immediate session and token expiration option: When changing a password, all currently logged-in sessions and issued access tokens can be invalidated immediately. This helps respond quickly to security incidents in emergency situations where account leakage is suspected.
    • Expanded Cloud Trail audit logs: 17 new event types have been added so that security policy and account management history can be tracked in more detail.

🛠️ Improved developer convenience

  • New OpenAPI support for MySQL: OpenAPI support for developers has been expanded further. With this update, MySQL OpenAPI has been newly added, allowing KakaoCloud MySQL to be controlled directly by API and used for management automation. For detailed OpenAPI updates, see OpenAPI Changelogs.

That is all for this update. In addition to the feature improvements introduced here, detailed changes for each service and previous update history can be found in the service-specific release notes in the technical documentation.

KakaoCloud will continue doing its best to provide stable infrastructure and user-centered features.
If you have any questions about using the service, please contact KakaoCloud Support anytime.

👉 Start KakaoCloud now

Building MLOps workflows with Kubeflow

· 7 min read
Update Kubeflow


Hello. In this post, we introduce Kubeflow, a core platform for machine learning operations.

Kubeflow is an open-source project designed to reduce the complexity of machine learning and help data scientists and developers develop and deploy machine learning models more easily and quickly. In the first sentence introducing Kubeflow on the official Kubeflow site, it is described as a project that helps comprehensively manage and operate various open-source tools for machine learning on Kubernetes.

Starting from TensorFlow Extended (TFX), which Google used internally in the past, Kubeflow has now expanded into one of the most widely known end-to-end solutions for running machine learning workflows in various Kubernetes-based environments.

One of Kubeflow's most innovative approaches is the integration of AutoML and Kubeflow Pipelines. This allows users to automate and optimize the training, evaluation, and deployment stages of models, reducing repetitive work in machine learning projects. In addition, multi-tenant support has been strengthened so that multiple teams can effectively share the same Kubeflow instance while isolating resources. The Kubeflow service provided by KakaoCloud is also designed to maximize the efficiency of machine learning work and make it easy for users to access.

In this post, we introduce Kubeflow's major components, latest features, and various tutorial scenarios for using Kubeflow on KakaoCloud.

Kubeflow features

Kubeflow supports the following tasks in Kubernetes environments with the goal of flexible scaling and easy, convenient production deployment of machine learning models.

  • Easy, repeatable, and portable deployment: Pipelines created through Kubeflow make deployment easier across multiple environments, including cloud and on-premises environments.
  • Independent microservice deployment and management system: Based on a microservices architecture, Kubeflow enables independent management of each component.
  • Responsive scaling based on user requirements: Resources are automatically scaled according to user requirements to ensure optimal performance.

Key Kubeflow components

Kubeflow consists of multiple open-source components such as Central Dashboard, Jupyter Notebooks, Tensorboard, and Pipelines, each supporting a specific stage of the machine learning workflow. These components are designed to help users manage machine learning projects more efficiently.


Source: Kubeflow Ecosystem

Using these key components on Kubernetes, Kubeflow efficiently supports the entire process from machine learning model development and deployment to resource management.

Key Kubeflow componentDescription
Central Dashboard   Provides a dashboard web console for accessing and monitoring multiple components.
NotebooksProvides a Jupyter Notebook environment where data scientists can code directly within a cluster.
TensorboardCreates and manages Tensorboard Server, a tool for visualizing model training processes and training data provided by frameworks such as Tensorflow and PyTorch.
PipelinesSimplifies complex machine learning workflows through scalable Docker-based pipelines.
KatibAutomates hyperparameter tuning for model training through AutoML components such as Katib.
Training OperatorSupports various machine learning frameworks and enables flexible training jobs.
KServeEnables efficient model deployment and serving through model-serving add-ons such as KServe, and provides them as real-time APIs internally and externally.

KakaoCloud Kubeflow

KakaoCloud supports the latest features, including Kubeflow 1.6, and provides an optimized cloud environment that enables users to perform machine learning tasks easily and quickly. In particular, KakaoCloud Kubeflow has the following features.

Support for all Kubeflow 1.6 features

KakaoCloud Kubeflow lets you use all major Kubeflow components and add-ons introduced above. You can also install and use frameworks and libraries such as Tensorflow, PyTorch, Apache MXNet, MPI, XGBoost, Chainer, HuggingFace, and OpenAI SDK.

Granular access management

By providing RBAC, users can be assigned namespaces according to their tasks and roles, and permissions can be managed efficiently by user or group. Administrators can also assign quota features by namespace and allocate CPU, memory, GPU memory, and storage resources according to configured usage.

Flexible storage options

In addition to the independent MinIO type, KakaoCloud supports storage repositories of the Object Storage type, enabling more flexible serving of model result files.

Optimized for Nvidia MIG instances

KakaoCloud Kubeflow provides optimized MIG (Multi Instance GPU) instances based on Nvidia A100. MIG instance settings allow GPU resources to be partitioned, enabling users to run multiple workloads efficiently on the same GPU.

Multi File Storage support

Users can dynamically use as much independent File Storage as needed by user or group, making it easier to share files between work pipelines and notebooks.

Usage examples with Kubeflow

KakaoCloud technical documentation provides rich Kubeflow tutorials that cover various stages of machine learning projects, from Jupyter Notebook setup to building parallel training models and creating model-serving APIs. By referring to these tutorials, you can learn about efficient model development, training, optimization, and deployment using KakaoCloud Kubeflow.

The Kubeflow-related tutorials currently available in KakaoCloud technical documentation are as follows.

Closing

Kubeflow is currently one of the most widely used open-source MLOps platforms in Korea and abroad. As a result, educational content, experience cases, and example source code are relatively abundant, helping data scientists and working analysts who are using it for the first time adapt quickly.

KakaoCloud Kubeflow provides GPU optimization and powerful resource management features through easy provisioning that takes advantage of the cloud environment. We will continue improving the Kubeflow service so KakaoCloud users can fully benefit from an MLOps platform with machine learning efficiency and enhanced security. If you are considering using a Kubeflow service for machine learning, be sure to try KakaoCloud's service.

Thank you.