Key Concepts
KakaoCloud's Hadoop Eco is a cloud platform service designed for distributed processing tasks using open-source frameworks such as Hadoop, Hive, HBase, Spark, Trino, and Kafka. It provides provisioning services for Hadoop, HBase, Trino, and Dataflow using KakaoCloud's Virtual Machines. The key concepts of Hadoop Eco are as follows.
Cluster
A cluster is a collection of nodes provisioned using Virtual Machines.
Cluster types
Hadoop Eco offers the following cluster types: Core Hadoop, HBase, Trino, and Dataflow.
Type | Description |
---|---|
Core Hadoop | Includes Hadoop, Hive, Spark, and Tez. Data is stored in HDFS and analyzed using Hive and Spark. |
HBase | Includes Hadoop and HBase. Data is stored in HDFS, and NoSQL services are provided using HBase. |
Trino | Includes Hadoop, Trino, Hive, and Tez. Data is stored in HDFS and analyzed using Trino and Hive. |
Dataflow | Includes Hadoop, Kafka, Druid, and Superset. Data is collected via Kafka and analyzed using Druid and Superset. |
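As a rough illustration of how the table above can guide the choice of cluster type, the following Python sketch transcribes the component lists into a dictionary and looks up which types include a given set of components. The names `CLUSTER_COMPONENTS` and `cluster_types_with` are hypothetical helpers for this example and are not part of any Hadoop Eco API.

```python
# Components per cluster type, transcribed from the table above.
CLUSTER_COMPONENTS = {
    "Core Hadoop": {"Hadoop", "Hive", "Spark", "Tez"},
    "HBase": {"Hadoop", "HBase"},
    "Trino": {"Hadoop", "Trino", "Hive", "Tez"},
    "Dataflow": {"Hadoop", "Kafka", "Druid", "Superset"},
}


def cluster_types_with(*components):
    """Return the cluster types (in table order) that include every named component."""
    wanted = set(components)
    return [name for name, parts in CLUSTER_COMPONENTS.items() if wanted <= parts]


if __name__ == "__main__":
    print(cluster_types_with("Hive"))            # types that ship Hive
    print(cluster_types_with("Kafka", "Druid"))  # types that ship both Kafka and Druid
```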
Cluster availability types
To ensure operational stability, a cluster can be created with one of two availability types: Standard (Single) or High Availability (HA).
Availability Type | Description |
---|---|
Standard (Single) | Composed of one master node and multiple worker nodes. Because there is only one master node, HDFS and YARN may not function if it fails. |
High Availability (HA) | Composed of three master nodes and multiple worker nodes. HDFS and YARN are configured for HA, and the master is automatically recovered in case of failure. |
Cluster versions
The Hadoop Eco (HDE) version determines the versions of the installed components. HDE clusters support Core Hadoop for data analysis and HBase for HDFS-based NoSQL services; Trino and Dataflow are available from HDE 1.1.2 onward. HDE 2.0.1 supports Hadoop 3.x, HBase 2.x, and Hive 3.x.
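The version rule above can be sketched as a small check. This is only an illustration under stated assumptions: the 1.1.2 threshold for Trino and Dataflow comes from the text, while the `(0,)` entries are placeholders meaning "no minimum stated here", and `parse_hde_version`, `MIN_VERSION`, and `is_type_available` are hypothetical names.

```python
def parse_hde_version(text):
    """Turn 'HDE 1.1.2', 'HDE-2.0.0' or '2.0.1' into a comparable tuple of ints."""
    digits = text.upper().replace("HDE", "").strip(" -")
    return tuple(int(part) for part in digits.split("."))


# Minimum HDE version per cluster type.
# Only the (1, 1, 2) threshold comes from the text; (0,) is a placeholder
# meaning that no minimum version is stated in this document.
MIN_VERSION = {
    "Core Hadoop": (0,),
    "HBase": (0,),
    "Trino": (1, 1, 2),
    "Dataflow": (1, 1, 2),
}


def is_type_available(cluster_type, hde_version):
    """Return True if the cluster type is offered in the given HDE version."""
    return parse_hde_version(hde_version) >= MIN_VERSION[cluster_type]


if __name__ == "__main__":
    for version in ("HDE 1.0.0", "HDE 1.1.2", "HDE 2.0.1"):
        print(version, "Trino available:", is_type_available("Trino", version))
```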
Components Installed by Cluster Type Per Version
- Core Hadoop
- HBase
- Trino
- Dataflow
Cluster lifecycle
Hadoop Eco clusters go through various states and lifecycles, allowing users to check and manage the operational and task statuses. After the initial creation request, the lifecycle includes installation, operation, and deletion stages. The states of the cluster and instances may vary based on user actions.
Cluster Lifecycle
Cluster and Node States
State | Description |
---|---|
Initializing | Metadata is being saved and VM creation has been requested |
Creating | VM creation in progress |
Installing | Installing Hadoop Eco components on the created VM |
Starting | Hadoop Eco components are running |
Running | All components are running and the cluster is operational |
Running(Scale out initializing) | VM creation requested for cluster scaling |
Running(Scale out creating) | VM creation in progress |
Running(Scale out installing) | Installing Hadoop Eco components on the created VM |
Running(Scale out starting) | Components are running |
Running(Scale out running) | Verifying the operation of the existing cluster and newly scaled-out VMs |
Running(Scale in Initializing) | Verifying whether the target VM can be deleted for scaling down |
Running(Scale in ready) | Shutting down components on the scaling-down VM |
Running(Scale in starting) | Checking if component shutdown on the scaling-down VM is successful |
Running(Scale in terminating) | Deleting VM |
Failed to scale out | Failed to create the scaling-out VM |
Failed to scale out vm | Failed to install or run components on the scaling-out VM |
Failed to scale in | Failed to delete the scaling-down VM |
Failed to scale in vm | Failed to properly shut down components on the scaling-down VM |
Terminating | Cluster termination in progress |
Terminated(User) | Cluster terminated by the user |
Terminated(UserCommand) | Cluster terminated after successful task scheduling |
Terminated(Scale in) | Cluster scaled down and VM terminated successfully |
Terminated(Error) | Cluster terminated due to error |
Terminated(Failed to create vm) | Error during VM creation |
Terminated(Failed to destroy vm) | Error during VM termination |
Terminated(Check time over) | Cluster creation exceeded time limits |
Terminated(Install error) | Cluster terminated due to component installation or execution failure |
Terminated(Failed to scale out) | VM terminated due to scaling-out failure |
Terminated(Failed to scale in) | Forced termination of the VM after scaling-down failure |
Terminated(User deleted VM) | User manually deleted the Hadoop Eco cluster VM |
Pending | Hadoop Eco creation requests are available after Open API is enabled |
Processing | Hadoop Eco creation and job scheduling in progress after Open API is enabled |
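Many of the state names above follow a `Base(detail)` pattern, for example `Running(Scale out creating)` or `Terminated(Install error)`. The sketch below shows one way a monitoring script might split and group them. The category labels (`provisioning`, `scaling`, `open-api`, and so on) are an assumption made for this example, not official Hadoop Eco terminology, and the function names are hypothetical.

```python
import re

# Base name, optionally followed by a parenthesized detail.
STATE_PATTERN = re.compile(r"^(?P<base>[^()]+?)\s*(?:\((?P<detail>.+)\))?$")

PROVISIONING = {"Initializing", "Creating", "Installing", "Starting"}


def parse_state(state):
    """Split 'Running(Scale out creating)' into ('Running', 'Scale out creating')."""
    match = STATE_PATTERN.match(state.strip())
    if match is None:
        raise ValueError(f"unrecognized state: {state!r}")
    return match.group("base"), match.group("detail")


def classify(state):
    """Group a state string into a coarse, illustrative category."""
    base, detail = parse_state(state)
    if base.startswith("Failed"):
        return "failed"
    if base == "Terminated":
        return "terminated"
    if base == "Terminating":
        return "terminating"
    if base == "Running":
        # A detail such as 'Scale out creating' means a scaling step is in progress.
        return "scaling" if detail else "running"
    if base in {"Pending", "Processing"}:
        return "open-api"
    if base in PROVISIONING:
        return "provisioning"
    raise ValueError(f"unrecognized state: {state!r}")


if __name__ == "__main__":
    for s in ("Creating", "Running", "Running(Scale in ready)", "Terminated(User)"):
        print(s, "->", classify(s))
```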
Instance and cluster states
Instances are KakaoCloud's Virtual Machines that make up a cluster. The states of instances and clusters may differ.
Here are scenarios where the instance and cluster states differ:
- If the master node's instance is not in the Active state, the cluster will not operate correctly if the availability type is Single.
- If the availability type is HA, the cluster can operate correctly as long as one of the master node instances (1st or 2nd node) is Active.
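The two scenarios above can be expressed as a small predicate. This is a minimal sketch, assuming the master instance states are available as a list ordered master 1, master 2, master 3. The function name is hypothetical, and `"Stopped"` is a made-up stand-in for any state other than `Active`.

```python
def cluster_can_operate(availability_type, master_states):
    """Apply the two rules above.

    master_states: instance states ordered master 1, master 2, master 3.
    """
    if availability_type == "Single":
        # The single master must be Active.
        return len(master_states) >= 1 and master_states[0] == "Active"
    if availability_type == "HA":
        # At least one of the 1st or 2nd master instances must be Active.
        return any(state == "Active" for state in master_states[:2])
    raise ValueError(f"unknown availability type: {availability_type!r}")


if __name__ == "__main__":
    # "Stopped" is a hypothetical non-Active state used only for illustration.
    print(cluster_can_operate("Single", ["Stopped"]))
    print(cluster_can_operate("HA", ["Stopped", "Active", "Active"]))
    print(cluster_can_operate("HA", ["Stopped", "Stopped", "Active"]))
```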
Components
The components running in a Hadoop Eco cluster are as follows:
Core Hadoop
Standard (Single)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | TimelineServer | http://{HadoopMST-cluster-1}:8188 |
Master 1 | JobHistoryServer | http://{HadoopMST-cluster-1}:19888 |
Master 1 | SparkHistoryServer | http://{HadoopMST-cluster-1}:18082 |
Master 1 | SparkThriftServer | http://{HadoopMST-cluster-1}:20000 |
Master 1 | Tez UI | http://{HadoopMST-cluster-1}:9999 |
Master 1 | HiveServer2 (HS2) | http://{HadoopMST-cluster-1}:10002 |
Master 1 | Hue | http://{HadoopMST-cluster-1}:8888 |
Master 1 | Zeppelin | http://{HadoopMST-cluster-1}:8180 |
Master 1 | Oozie | http://{HadoopMST-cluster-1}:11000 |
HA (High Availability)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | HiveServer2 (HS2) | http://{HadoopMST-cluster-1}:10002 |
Master 2 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-2}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-2}:9870 |
Master 2 | YARN ResourceManager | http://{HadoopMST-cluster-2}:8088 |
Master 2 | HiveServer2 (HS2) | http://{HadoopMST-cluster-2}:10002 |
Master 3 | TimelineServer | http://{HadoopMST-cluster-3}:8188 |
Master 3 | JobHistoryServer | http://{HadoopMST-cluster-3}:19888 |
Master 3 | SparkHistoryServer | http://{HadoopMST-cluster-3}:18082 |
Master 3 | SparkThriftServer | http://{HadoopMST-cluster-1}:20000 |
Master 3 | Tez UI | http://{HadoopMST-cluster-3}:9999 |
Master 3 | HiveServer2 (HS2) | http://{HadoopMST-cluster-3}:10002 |
Master 3 | Hue | http://{HadoopMST-cluster-3}:8888 |
Master 3 | Zeppelin | http://{HadoopMST-cluster-3}:8180 |
Master 3 | Oozie | http://{HadoopMST-cluster-3}:11000 |
HBase
Standard (Single)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | HMaster | http://{HadoopMST-cluster-1}:16010 |
Master 1 | TimelineServer | http://{HadoopMST-cluster-1}:8188 |
Master 1 | JobHistoryServer | http://{HadoopMST-cluster-1}:19888 |
Master 1 | Hue | http://{HadoopMST-cluster-1}:8888 |
HA (High Availability)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | HMaster | http://{HadoopMST-cluster-1}:16010 |
Master 2 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-2}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-2}:9870 |
Master 2 | YARN ResourceManager | http://{HadoopMST-cluster-2}:8088 |
Master 2 | HMaster | http://{HadoopMST-cluster-2}:16010 |
Master 3 | HMaster | http://{HadoopMST-cluster-3}:16010 |
Master 3 | TimelineServer | http://{HadoopMST-cluster-3}:8188 |
Master 3 | JobHistoryServer | http://{HadoopMST-cluster-3}:19888 |
Master 3 | Hue | http://{HadoopMST-cluster-3}:8888 |
Trino
Standard (Single)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | Trino Coordinator | http://{HadoopMST-cluster-1}:8780 |
Master 1 | TimelineServer | http://{HadoopMST-cluster-1}:8188 |
Master 1 | JobHistoryServer | http://{HadoopMST-cluster-1}:19888 |
Master 1 | Tez UI | http://{HadoopMST-cluster-1}:9999 |
Master 1 | HiveServer2 (HS2) | http://{HadoopMST-cluster-1}:10002 |
Master 1 | Hue | http://{HadoopMST-cluster-1}:8888 |
Master 1 | Zeppelin | http://{HadoopMST-cluster-1}:8180 |
HA (High Availability)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | HiveServer2 (HS2) | http://{HadoopMST-cluster-1}:10002 |
Master 2 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-2}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-2}:9870 |
Master 2 | YARN ResourceManager | http://{HadoopMST-cluster-2}:8088 |
Master 2 | HiveServer2 (HS2) | http://{HadoopMST-cluster-2}:10002 |
Master 3 | Trino Coordinator | http://{HadoopMST-cluster-3}:8780 |
Master 3 | TimelineServer | http://{HadoopMST-cluster-3}:8188 |
Master 3 | JobHistoryServer | http://{HadoopMST-cluster-3}:19888 |
Master 3 | Tez UI | http://{HadoopMST-cluster-3}:9999 |
Master 3 | HiveServer2 (HS2) | http://{HadoopMST-cluster-3}:10002 |
Master 3 | Hue | http://{HadoopMST-cluster-3}:8888 |
Master 3 | Zeppelin | http://{HadoopMST-cluster-3}:8180 |
Dataflow
Standard (Single)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | TimelineServer | http://{HadoopMST-cluster-1}:8188 |
Master 1 | JobHistoryServer | http://{HadoopMST-cluster-1}:19888 |
Master 1 | Kafka Broker | http://{HadoopMST-cluster-1}:9092 |
Master 1 | Druid Master | http://{HadoopMST-cluster-1}:3001 |
Master 1 | Druid Broker | http://{HadoopMST-cluster-1}:3002 |
Master 1 | Druid Router | http://{HadoopMST-cluster-1}:3008 |
Master 1 | Superset | http://{HadoopMST-cluster-1}:4000 |
Master 1 | Hue | http://{HadoopMST-cluster-1}:8888 |
HA (High Availability)
Location | Component | URL |
---|---|---|
Master 1 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-1}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-1}:9870 |
Master 1 | YARN ResourceManager | http://{HadoopMST-cluster-1}:8088 |
Master 1 | Kafka Broker | http://{HadoopMST-cluster-1}:9092 |
Master 1 | Druid Master | http://{HadoopMST-cluster-1}:3001 |
Master 1 | Druid Broker | http://{HadoopMST-cluster-1}:3002 |
Master 2 | HDFS NameNode | Below HDE-2.0.0: http://{HadoopMST-cluster-2}:50070; HDE-2.0.0 or above: http://{HadoopMST-cluster-2}:9870 |
Master 2 | YARN ResourceManager | http://{HadoopMST-cluster-2}:8088 |
Master 2 | Kafka Broker | http://{HadoopMST-cluster-2}:9092 |
Master 2 | Druid Master | http://{HadoopMST-cluster-2}:3001 |
Master 2 | Druid Broker | http://{HadoopMST-cluster-2}:3002 |
Master 3 | TimelineServer | http://{HadoopMST-cluster-3}:8188 |
Master 3 | JobHistoryServer | http://{HadoopMST-cluster-3}:19888 |
Master 3 | Kafka Broker | http://{HadoopMST-cluster-3}:9092 |
Master 3 | Druid Master | http://{HadoopMST-cluster-3}:3001 |
Master 3 | Druid Broker | http://{HadoopMST-cluster-3}:3002 |
Master 3 | Druid Router | http://{HadoopMST-cluster-3}:3008 |
Master 3 | Superset | http://{HadoopMST-cluster-3}:4000 |
Master 3 | Hue | http://{HadoopMST-cluster-3}:8888 |
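The component tables list the HDFS NameNode web UI on port 50070 below HDE-2.0.0 and on port 9870 from HDE-2.0.0 onward, while several other ports are the same in every table. The sketch below builds a URL from those values. The helper name `component_url` and the host name `hadoopmst-demo-1` are hypothetical, and only a subset of the components is included.

```python
# Ports that are identical in every table above, regardless of HDE version.
FIXED_PORTS = {
    "YARN ResourceManager": 8088,
    "TimelineServer": 8188,
    "JobHistoryServer": 19888,
    "Hue": 8888,
}


def component_url(master_host, component, hde_version):
    """Build a web UI URL for a component on the given master host.

    hde_version is a dotted string such as '1.1.2' or '2.0.1'.
    """
    if component == "HDFS NameNode":
        version = tuple(int(part) for part in hde_version.split("."))
        # The NameNode port depends on the HDE version (see the tables above).
        port = 9870 if version >= (2, 0, 0) else 50070
    else:
        port = FIXED_PORTS[component]
    return f"http://{master_host}:{port}"


if __name__ == "__main__":
    host = "hadoopmst-demo-1"  # hypothetical master host name
    print(component_url(host, "HDFS NameNode", "1.1.2"))
    print(component_url(host, "HDFS NameNode", "2.0.1"))
    print(component_url(host, "Hue", "2.0.1"))
```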
Instance
Instances can be checked from the Cluster List, and they behave in the same way as standard VMs.
For stable operation, master node instances should have at least 16 GB of RAM, and worker node instances at least 32 GB.
Volume
A volume is the base storage on which the image is installed when an instance is created, and its size determines the capacity of HDFS. Selecting an appropriate size is essential for stable HDFS operation. For detailed information on volumes, refer to the Volume Creation and Management document.
Network and security
All instances created by Hadoop Eco run within a VPC environment. Before a cluster can be created, a security group must exist with the inbound rules that the cluster's components require. For detailed information on network and security settings, refer to the Security Group document.