2 posts tagged with "hadoop-eco"

KakaoCloud service updates - VM and Hadoop performance improvements, IAM security settings, and more

· 4 min read
Mia (정혜원)
Technical Contents Manager
update

This year, KakaoCloud continues working to provide users with a more convenient and secure cloud environment. As spring arrives, here is a roundup of the major service updates from March.

Whereas the recently announced user-centered console renewal was a major change to screen structure and user experience (UX), this post focuses on feature enhancements that strengthen the underlying services. Alongside work to improve system stability, this update further improves resource management efficiency and security. The details follow.


🖥️ Infrastructure management efficiency and service scalability

  • GPU service integrated into Virtual Machine (VM): For more intuitive resource management, the previously separate GPU service has been integrated into the Virtual Machine service.

    • Integrated environment provided: You can now select and manage general instances and GPU instances within the same workflow when creating a VM.
    • Automatic notification policy conversion: As part of the service integration, Alert Center notification policies previously configured in the GPU service have been safely and automatically converted into Virtual Machine service policies. You can continue using the existing monitoring environment without separate reconfiguration.
  • Virtual Machine supports "start credits" for t1i instances: The start credit feature has been added to t1i, a burstable instance type, to improve workload processing efficiency. Instances can now temporarily sustain high CPU utilization during boot, which speeds up initial startup.

  • Hadoop Eco expands node volume size up to 16 TB: To support large-scale data analysis, the maximum volume size per node (master, worker, task) in Hadoop Eco has been raised from 5 TB to 16 TB, leaving more room for larger datasets.

  • Object Storage product name changed: To make it easier for users to recognize the storage services they are using, Object Storage product names have been changed as follows. Pricing remains the same, and changes will be applied sequentially starting with March billing statements.

    • Data capacity: Hot Bucket → Standard Storage Class
    • API calls: The Standard- prefix is added before existing request names (for example, Standard-PUT, Standard-GET, and so on)
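
For teams that parse billing statements in scripts, the renaming rules above can be sketched as a small mapping function. This is a minimal illustration only: the function name, the category labels, and the exact item strings are assumptions made for the example, not names taken from KakaoCloud billing data.

```python
# Hypothetical helper: maps pre-change Object Storage billing item names to
# the new names described in this post. The category labels ("capacity",
# "api_call") and item strings are illustrative assumptions.

def rename_billing_item(category: str, name: str) -> str:
    """Return the post-change name for an Object Storage billing item."""
    if category == "capacity":
        # Data capacity: Hot Bucket -> Standard Storage Class
        return "Standard Storage Class" if name == "Hot Bucket" else name
    if category == "api_call":
        # API calls: prepend "Standard-" unless it is already present
        return name if name.startswith("Standard-") else f"Standard-{name}"
    # Unknown categories are passed through unchanged
    return name


if __name__ == "__main__":
    samples = [("capacity", "Hot Bucket"), ("api_call", "PUT"), ("api_call", "GET")]
    for category, name in samples:
        print(f"{name} -> {rename_billing_item(category, name)}")
```

Because the function leaves already-prefixed names untouched, it can be applied to statements from both before and after the change.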

🔑 Security enhancements

  • IAM security settings enhanced: To protect valuable organizational resources, various security settings have been added to Account settings and IAM service items in the console.

    • Password reauthentication when deleting resources: Deleting a user account or project service account now requires a password reauthentication step, which helps prevent accidental deletion.
    • Immediate session and token expiration option: When changing a password, you can immediately invalidate all currently logged-in sessions and issued access tokens. This helps you respond quickly in emergencies where an account compromise is suspected.
    • Expanded Cloud Trail audit logs: 17 new event types have been added so that security policy and account management history can be tracked in more detail.
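
To illustrate what the immediate session and token expiration option means in practice, the sketch below shows one common, generic way such invalidation can work: the account records a cutoff time, and any token issued before that cutoff is rejected. It is not KakaoCloud's implementation. All class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    password_hash: str
    # Tokens issued before this timestamp are treated as expired (assumption).
    tokens_valid_after: float = 0.0


@dataclass(frozen=True)
class Token:
    issued_at: float


def change_password(account: Account, new_hash: str, now: float,
                    expire_existing: bool) -> None:
    """Update the password and optionally invalidate all earlier tokens."""
    account.password_hash = new_hash
    if expire_existing:
        # Moving the cutoff to "now" rejects every previously issued token.
        account.tokens_valid_after = now


def is_token_valid(account: Account, token: Token) -> bool:
    """A token is valid only if it was issued at or after the cutoff."""
    return token.issued_at >= account.tokens_valid_after


if __name__ == "__main__":
    account = Account(password_hash="old-hash")
    old_token = Token(issued_at=100.0)
    change_password(account, "new-hash", now=200.0, expire_existing=True)
    print(is_token_valid(account, old_token))                # False
    print(is_token_valid(account, Token(issued_at=201.0)))   # True
```

A cutoff timestamp avoids having to track and revoke each token individually, which is why it is a frequent choice when invalidation must take effect at once.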

🛠️ Improved developer convenience

  • New OpenAPI support for MySQL: OpenAPI coverage for developers has been expanded again. This update adds the MySQL OpenAPI, so KakaoCloud MySQL can be controlled directly through the API and used for management automation. For detailed OpenAPI updates, see OpenAPI Changelogs.

That is all for this update. In addition to the feature improvements introduced here, detailed changes for each service and previous update history can be found in the service-specific release notes in the technical documentation.

KakaoCloud will continue doing its best to provide stable infrastructure and user-centered features.
If you have any questions about using the service, please contact KakaoCloud Support anytime.

👉 Start KakaoCloud now

Hadoop Eco adds features for operational efficiency in data lake architecture

· 5 min read
Evan (진은용)
Service Manager
HDE update

Enterprises designing large-scale, cloud-based data lake architectures have reached a point where simply accumulating data is no longer enough and operational efficiency must be maximized. Achieving that efficiency requires a balanced set of core elements: high-performance processing, flexible separation of compute resources, and robust data governance.

When this balance breaks down, compound problems follow, such as batch jobs delaying real-time analytics queries, or difficulty determining where the needed data is located and how reliable it is.

KakaoCloud Hadoop Eco (HDE) recently received a large-scale update to address these problems and to improve the processing power and operational manageability of analytics environments. Built on the release of the new HDE-2.3.0 version, this update includes major changes such as improved integration with Iceberg catalogs, a next-generation metastore, and the introduction of task nodes optimized for workloads.

In this post, we briefly introduce how these improvements can be used within HDE to improve analytics workflows.

🚀 New HDE-2.3.0 version and powerful components added

With this update, HDE-2.3.0 is newly provided, and JupyterLab, Impala, and Kudu components have been added to effectively support data analytics and processing workflows.

Create HDE cluster

  • JupyterLab: Provides a web-based programming and shell environment in which data exploration and analysis code can be run directly on cluster nodes.
  • Impala: A powerful query engine that supports fast interactive queries against data stores such as Kudu based on Hive Metastore.
  • Kudu: Serves as a columnar data store that supports low-latency reads and writes.

In addition, Druid, a core component of Dataflow-type clusters, has been upgraded to v33.0.0, and Superset has been upgraded to v5.0.0, further improving performance and stability.

💡 View the Hadoop Eco component list

⚙️ Securing cluster structure flexibility: introducing task nodes

One of the tricky parts of cluster operations is separating batch processing and interactive processing resources to minimize mutual interference. In this update, the newly introduced task node effectively reduces operational burden.

Task node settings

  • Role separation: Task nodes are mainly used as dedicated compute resources for executing large-scale batch computation jobs (YARN Jobs). By separating their role from worker nodes, they ensure the stability of core data processing resources and effectively prevent performance degradation caused by resource contention.
  • More accurate capacity planning: With the introduction of task nodes, the method for calculating YARN available resources has been changed to include the number and flavor of task nodes. This makes cluster capacity planning more accurate and predictable.
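
As a rough illustration of capacity planning that accounts for task nodes, the sketch below sums vCPUs and memory across worker and task node groups. The post does not publish the actual HDE formula, so everything here is an assumption made for the example: the `NodeGroup` type, the flavor sizes, and the share of node memory handed to YARN (80 percent).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class NodeGroup:
    """A group of identical nodes (hypothetical type for this sketch)."""
    count: int        # number of nodes in the group
    vcpus: int        # vCPUs per node, determined by the flavor
    memory_gib: int   # memory per node in GiB, determined by the flavor


def estimate_yarn_capacity(groups: Iterable[NodeGroup],
                           usable_percent: int = 80) -> Tuple[int, int]:
    """Return (total vcores, usable memory in GiB) across all node groups.

    usable_percent is an assumed share of node memory given to YARN;
    the real value depends on cluster configuration.
    """
    groups = list(groups)
    total_vcores = sum(g.count * g.vcpus for g in groups)
    total_memory = sum(g.count * g.memory_gib for g in groups)
    return total_vcores, total_memory * usable_percent // 100


if __name__ == "__main__":
    workers = NodeGroup(count=3, vcpus=8, memory_gib=32)
    tasks = NodeGroup(count=2, vcpus=16, memory_gib=64)
    print(estimate_yarn_capacity([workers]))         # workers only
    print(estimate_yarn_capacity([workers, tasks]))  # workers plus task nodes
```

Comparing the two calls shows how much headroom the task nodes contribute, which is the kind of estimate that becomes possible once task node count and flavor are part of the calculation.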

⚠️ Note when using task nodes: Task nodes can only be added when creating a cluster. Because they cannot be added after creation, decide carefully during the initial design stage whether to include them. Once a cluster has task nodes, however, you can reduce their number to 0 and increase it again later.

🧊 Iceberg catalog integration, now with one click

As KakaoCloud Data Catalog officially supports the Apache Iceberg format, Iceberg catalog integration when creating a Hadoop Eco cluster has been dramatically simplified.

Iceberg catalog integration

With this improvement, the Hadoop Eco console lets you directly select and connect a Data Catalog Iceberg catalog in the external metastore integration setting during cluster creation. This reduces human error, shortens integration time, and lets you start analytics work right away.

In addition, a new option lets users choose whether data is automatically retained during the 90-day data retention period after a cluster is deleted. Use it to avoid unnecessary metadata retention costs and to make governance clearer.

This Hadoop Eco update is not just a feature expansion. It further strengthens the operational efficiency of data lake architecture around three axes: stable metadata governance, high-performance interactive analytics environments, and flexible compute resource management.

Operate analytics workflows more efficiently and systematically with KakaoCloud's new Hadoop Eco.

Thank you.

👉 Start KakaoCloud now