5. Real-time data analysis based on Data Query

Run SQL queries on the Kafka message table registered in Data Catalog to perform real-time analysis.

Basic information

Estimated time: 15 minutes
Recommended OS: macOS, Ubuntu
IAM permission: Project administrator role
Prerequisites
- Complete the "Analyze Kafka messages using Data Catalog and Data Query" tutorial

About this scenario

This tutorial is the final step in the real-time data pipeline and configures an environment for SQL-based analysis of Kafka streaming data.

After registering Kafka data stored in Object Storage with Data Catalog, you use Data Query to query real-time data from the table and perform analysis tasks such as filtering, aggregation, and processing.

You will cover the following:

- Query tables in the Data Query console
- Write and run queries
- Check results and download CSV files

Before you start

This tutorial covers real-time SQL analysis for a Kafka message table registered in Data Catalog. To proceed smoothly, complete metadata registration through Data Catalog in advance and confirm the table name and storage location.

Step 1. View the registered table

Check whether the table to use for analysis (kafka_message_table) is registered in Data Catalog.

1. Go to KakaoCloud console > Data & Analytics > Data Query.
2. Find the registered table in the explorer on the left.
   - Catalog: default value (awsdatacatalog)
   - Database: automatically generated name, such as default, or the DB name you specified
   - Table: kafka_message_table
3. Click the table to check its schema (column structure) and preview the data.

If the table does not appear in the explorer, complete metadata registration through Data Catalog first.

Step 2. Write and run a query

Write and run an SQL query to retrieve the data you want from the table.

On the Write query tab, enter the following SQL.

SELECT *
FROM kafka_message_table
WHERE message IS NOT NULL
ORDER BY timestamp DESC
LIMIT 10;

Click Run query to check the result. This query returns the 10 most recent rows whose message column is not NULL, sorted by timestamp in descending order.
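
To see what the filter, sort, and limit do before running the query against real data, the following minimal Python sketch runs the same SQL against a local in-memory SQLite table. SQLite is only a stand-in here and is not the Data Query engine. The two-column table layout, the text timestamps, and all sample rows are assumptions made for illustration; only the column names message and timestamp come from the query above.

```python
import sqlite3

# Local stand-in for kafka_message_table (assumption: two text columns).
# Only the column names `message` and `timestamp` come from the tutorial query;
# the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kafka_message_table (message TEXT, timestamp TEXT)")
sample_rows = [(f"event-{i}", f"2024-01-01T00:00:{i:02d}") for i in range(12)]
sample_rows.append((None, "2024-01-01T00:01:00"))  # newest row, but message is NULL
conn.executemany("INSERT INTO kafka_message_table VALUES (?, ?)", sample_rows)

# Same query as in the tutorial.
query = """
SELECT *
FROM kafka_message_table
WHERE message IS NOT NULL
ORDER BY timestamp DESC
LIMIT 10
"""
rows = conn.execute(query).fetchall()
for message, ts in rows:
    print(ts, message)
```

The newest sample row is skipped because its message is NULL, and only the 10 most recent of the 12 remaining rows are returned.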

Step 3. Check and download results

If the query results are displayed successfully, you can use them for analysis or save them.

Click Download CSV at the top right of the results table.
If needed, open the file in Excel, a BI tool, or another analysis tool for further analysis.
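
As one possible follow-up outside the console, the downloaded file can be summarized with a short Python script. This is a sketch under assumptions: it presumes the CSV starts with a header row that contains a message column, which should be checked against the actual export. The function name count_messages and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def count_messages(csv_file):
    """Count how often each non-empty `message` value occurs in a CSV export."""
    reader = csv.DictReader(csv_file)
    return Counter(row["message"] for row in reader if row.get("message"))


# Made-up sample standing in for a downloaded file. For a real export, pass
# open("<downloaded file>.csv", newline="", encoding="utf-8") instead.
sample_csv = io.StringIO(
    "message,timestamp\n"
    "login,2024-01-01T00:00:03\n"
    "logout,2024-01-01T00:00:02\n"
    "login,2024-01-01T00:00:01\n"
)
counts = count_messages(sample_csv)
for message, count in counts.most_common():
    print(message, count)
```

Rows with an empty message value are ignored, mirroring the IS NOT NULL filter used in the query.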

Wrap-up and next steps

You practiced the full pipeline from storing data received from Kafka in Object Storage to registering it with Data Catalog and performing SQL-based analysis. You can now extend this structure to various real-time data analysis scenarios, such as dashboard visualization, event detection, and scheduled reporting.

Build a real-time data pipeline series

This series explains the entire process of building a real-time data pipeline centered on Kafka step by step. Message ingestion, storage, metadata registration, and analysis are connected into a single flow, and each step is written for a real operational environment.

Overall flow: Pub/Sub -> Kafka -> Object Storage -> Data Catalog -> Data Query

① Message processing through Kafka
② Configure Kafka streaming based on Pub/Sub messages
③ Load Kafka data into Object Storage
④ Register a Kafka message table using Data Catalog
⑤ Real-time data analysis based on Data Query
