Hadoop Eco overview

KakaoCloud Hadoop Eco is a data analytics ecosystem designed to perform a wide range of large-scale data processing and analytics tasks efficiently. The Hadoop Eco service is based on Apache Hadoop and supports open-source analytics frameworks such as Hadoop, HBase, Spark, Hive, Trino, and Kafka.

Hadoop Eco is designed to scale out horizontally, from a single computer to clusters of thousands of machines. Each machine provides local computation and storage, enabling efficient storage and processing of datasets ranging from gigabytes to petabytes.
Additionally, it integrates with KakaoCloud's data management tool, Data Catalog, to help operate and manage data more efficiently.
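To make the scaling model above more concrete: in Apache Hadoop, HDFS splits each file into fixed-size blocks, stores replicas of every block on different nodes, and tries to run computation on the nodes that already hold the data. The Python sketch below estimates how a single file is laid out. It assumes the Apache Hadoop 3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a Hadoop Eco cluster may be configured with different values, and the function name `hdfs_footprint` is purely illustrative.

```python
import math

# Assumed Apache Hadoop 3.x defaults; a Hadoop Eco cluster may use other values.
BLOCK_SIZE_MB = 128   # dfs.blocksize default (128 MB)
REPLICATION = 3       # dfs.replication default


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple[int, int, float]:
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # HDFS splits a file into fixed-size blocks; the last block may be smaller
    # and only occupies as much disk space as the data it actually holds.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    one_tb_mb = 1024 * 1024  # 1 TB expressed in MB
    blocks, replicas, raw_mb = hdfs_footprint(one_tb_mb)
    print(f"blocks={blocks}, replicas={replicas}, raw={raw_mb / 1024 / 1024:.0f} TB")
```

Under these assumptions, a 1 TB file becomes 8,192 blocks and 24,576 block replicas, consuming about 3 TB of raw disk. Because the blocks are spread across many machines, each one can be processed in parallel on the node that stores it.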

Terminology

Apache Hadoop: An open-source framework for distributed storage and processing of large datasets across clusters of computers using simple programming models. Rather than relying on hardware for high availability, Hadoop detects and handles failures at the application layer, so the cluster as a whole remains available even when individual machines fail. For more details, refer to the official Apache Hadoop documentation.
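The core programming model in Apache Hadoop is MapReduce. A job is expressed as a map function, applied to each input record, and a reduce function, applied to all values that share a key. The framework handles distribution, the shuffle between the two phases, and re-execution of failed tasks. The single-process Python sketch below only simulates these phases to show the data flow. It does not use Hadoop's Java API or Hadoop Streaming, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    # Shuffle: group all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a real cluster, map tasks run near their input blocks and reduce
    # tasks may run on other nodes; here every phase runs in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(sorted(reduce_phase(k, v) for k, v in grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because map calls are independent of each other, and reduce calls depend only on their own key's values, the framework can spread both phases across machines and rerun any failed task without restarting the whole job.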

Purpose and use cases

When processing and analyzing large-scale data, traditional databases or spreadsheets may struggle to handle large datasets efficiently, resulting in slower processing speeds. Additionally, building infrastructure and tools for distributed processing can introduce complex challenges related to data distribution and management. These factors may delay data analysis tasks, making it difficult to quickly extract insights and incorporate them into business decisions.

KakaoCloud's Hadoop Eco service is designed to efficiently process and analyze large datasets. By leveraging Hadoop Eco, users can handle data more effectively, accelerate data processing tasks, and meet data management and security requirements. This allows users to extract insights from data and make faster, more accurate business decisions.

Features

Easy cluster creation

Install big data analysis frameworks and create clusters with minimal effort.
Select a cluster type and basic settings to get a ready-to-use data analysis environment.

Efficient task scheduling

After a cluster is created, Hive and Spark jobs can be scheduled by specifying their executable files and options.
Clusters can be terminated automatically when a task fails, which simplifies cluster management.

High availability configuration

Provides both standard environments with a single master node and high-availability environments with multiple master nodes.
Ensures stable data processing even during unexpected failures by using multiple master nodes.
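Why multiple master nodes help can be shown with quorum arithmetic. In Apache Hadoop high-availability setups, master-side coordination services such as HDFS JournalNodes and ZooKeeper keep working only while a majority of their members are alive. The HDFS HA documentation states that with N JournalNodes the system tolerates at most (N - 1) / 2 failures. The sketch below assumes the Hadoop Eco master nodes host majority-quorum services of this kind. The exact layout of Hadoop Eco masters is not described here, and the function name is illustrative.

```python
def tolerated_failures(quorum_members: int) -> int:
    """Maximum members that can fail while a majority quorum survives."""
    if quorum_members < 1:
        raise ValueError("a quorum needs at least one member")
    # A majority quorum needs floor(n / 2) + 1 live members,
    # so up to (n - 1) // 2 members may fail.
    return (quorum_members - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, "members ->", tolerated_failures(n), "failure(s) tolerated")
```

A single master tolerates no failures, three tolerate one, and a fourth member does not raise the tolerance above that of three. This is why quorum-based services are usually deployed with an odd number of members.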

Support for various frameworks

Built on Apache Hadoop and supports commonly used open-source frameworks like HBase, Spark, and Hive.

Data insight delivery

Integrates with various data analytics tools to deliver actionable insights, enabling more accurate business decision-making.

Getting started

For detailed usage guides on Hadoop Eco, refer to the How-to Guides.
If you are new to KakaoCloud, start with the Start section.
