Hadoop Eco Dataflow clusters now available
The following overview of Hadoop Eco was written based on information available in December 2023. For the latest information about KakaoCloud Hadoop Eco, see Hadoop Eco.
Gartner, a global information technology (IT) research and consulting company, researches and announces Data & Analytics (D&A) trends every year.
According to this year's Gartner report (Gartner Identifies the Top 10 Data and Analytics Trends for 2023), data and analytics teams must do more than manage data resources and generate insights from them. Beyond simply collecting massive amounts of data, they are required to collect the right data with the right tools at the right time and derive business insights from it. To do this, the report suggests that enterprise data and analytics teams should follow trends such as value optimization, data sharing, observability, data and analytics sustainability, and data fabric.
To keep pace with rapidly evolving data analytics trends, KakaoCloud added a new cluster type, Dataflow, to Hadoop Eco in November 2023. Previously, Hadoop Eco offered the Core Hadoop, HBase, and Trino types. With Dataflow clusters, you can now collect and analyze data using Hadoop, Kafka, Druid, and Superset.
Dataflow, based on Apache Beam, is a unified batch and streaming data processing model widely adopted by users around the world. It is an open-source framework optimized for streaming data analytics that aims to minimize latency, processing time, and cost through autoscaling and batch processing. It also supports multiple execution engines (such as Flink and Spark) and multiple programming languages.
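To make the idea of a "unified batch and streaming" model more concrete, here is a minimal plain-Python sketch. It does not use Apache Beam or any KakaoCloud API. It only mimics one Beam idea: the same windowed transform is applied unchanged to a bounded collection (batch) and to an unbounded generator (stream). The function name `windowed_counts`, the `(timestamp, key)` event shape, and the assumption that events arrive in timestamp order are all hypothetical.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, key such as an HTTP method)
Event = Tuple[int, str]


def windowed_counts(events: Iterable[Event], window_size: int) -> Iterator[Tuple[int, dict]]:
    """Count events per key in fixed (tumbling) windows.

    Assumes events arrive in non-decreasing timestamp order, so a window is
    emitted as soon as an event from a later window appears. The same code
    works on a finite list (batch) or an endless generator (stream).
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_size  # start of the window this event falls into
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # previous window is complete
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last window (batch input only)


def log_stream() -> Iterator[Event]:
    """Hypothetical unbounded source: one GET request every 4 seconds, forever."""
    t = 0
    while True:
        yield t, "GET"
        t += 4


# Batch: a bounded list of events
batch = [(0, "GET"), (3, "POST"), (5, "GET"), (12, "GET")]
print(list(windowed_counts(batch, 10)))

# Stream: take only the first two completed windows from the endless source
print(list(itertools.islice(windowed_counts(log_stream(), 10), 2)))
```

The design choice worth noting is that the transform never asks whether its input is finite; only the final flush depends on the input ending. A real Beam pipeline also handles out-of-order data with watermarks and triggers, which this sketch deliberately ignores.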
With the newly added Dataflow clusters in KakaoCloud Hadoop Eco, users can experience the following benefits.
- More efficient data collection and analysis: Efficiently collect data through Kafka and analyze data in real time using Druid and Superset.
- Diverse analysis tools: Visualize data and perform a variety of analysis tasks with Druid and Superset.
- Scalability and high availability: Both Standard (Single) and high availability (HA) types are available, designed with stable cluster operation in mind.
In the Standard (Single) type, a single master node instance runs one ResourceManager and one NameNode, which makes it suitable for small-scale jobs. The high availability (HA) type provides three master node instances and runs the ResourceManager and NameNode in HA mode, so jobs can continue without interruption even if one of the master nodes goes down or is restarted.
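Why three masters rather than two? Hadoop HA components such as ZooKeeper and the HDFS Quorum Journal Manager rely on a majority quorum, and the arithmetic below shows how many node failures a majority can survive. This is a minimal sketch that assumes, without confirmation from this post, that the Dataflow HA type depends on such majority quorums; `quorum_size` and `tolerated_failures` are hypothetical helper names.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of n voting nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many of n nodes can be down while a majority remains available."""
    return n - quorum_size(n)


# Compare a single master, the three-master HA layout, and larger ensembles
for n in (1, 3, 4, 5):
    print(f"{n} masters: majority={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

With one master, losing that node leaves no majority. With three, one master can be restarted while the remaining two still form a majority. A fourth node adds no extra tolerance, which is why odd node counts are typical for quorum-based services.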
Selecting a Dataflow cluster
Try Dataflow, the unified batch and streaming data processing model provided by Apache Beam, in KakaoCloud Hadoop Eco.
Thank you.
In the hands-on tutorial "Real-time web server log analysis and monitoring using Hadoop Eco Dataflow", you can learn in detail how to use Dataflow clusters to collect and analyze data efficiently.
