5. Real-time data analysis based on Data Query
Run SQL queries on the Kafka message table registered in Data Catalog to perform real-time analysis.
- Estimated time: 15 minutes
- Recommended OS: macOS, Ubuntu
- IAM permission: Project administrator role
- Prerequisites
About this scenario
This tutorial is the final step in the real-time data pipeline and configures an environment for SQL-based analysis of Kafka streaming data.
After registering Kafka data stored in Object Storage with Data Catalog, you use Data Query to query real-time data from the table and perform analysis tasks such as filtering, aggregation, and processing.
You will cover the following:
- Query tables in the Data Query console
- Write and run queries
- Check results and download CSV files
Before you start
This tutorial covers real-time SQL analysis for a Kafka message table registered in Data Catalog. To proceed smoothly, complete metadata registration through Data Catalog in advance and confirm the table name and storage location.
Step 1. View the registered table
Check whether the table to use for analysis (kafka_message_table) is registered in Data Catalog.
- Go to KakaoCloud console > Data & Analytics > Data Query.
- Find the registered table in the explorer on the left.
  - Catalog: default value (awsdatacatalog)
  - Database: automatically generated name, such as default, or the DB name you specified
  - Table: kafka_message_table
- Click the table to check the schema (column structure) and preview.
If it is not registered, complete metadata registration through Data Catalog first.
Step 2. Write and run a query
Write and run an SQL query to retrieve the data you want from the table.
- On the Write query tab, enter the following SQL.
SELECT *
FROM kafka_message_table
WHERE message IS NOT NULL
ORDER BY timestamp DESC
LIMIT 10;
- Click Run query to check the result. This query retrieves the 10 most recent messages whose message field is not null, in reverse chronological order.
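To check what the query does before running it in the console, you can try it locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Data Query. The sample rows and the two-column schema (timestamp, message) are assumptions made for illustration; your real table schema comes from Data Catalog, and the Data Query SQL dialect may differ from SQLite in some details.

```python
import sqlite3

# Hypothetical sample rows (timestamp, message). The real columns depend on
# the schema registered in Data Catalog.
ROWS = [
    ("2024-01-01T00:00:01Z", "order created"),
    ("2024-01-01T00:00:02Z", None),  # filtered out by IS NOT NULL
    ("2024-01-01T00:00:03Z", "order paid"),
    ("2024-01-01T00:00:04Z", "order shipped"),
]

# Same query as in the tutorial.
QUERY = """
SELECT *
FROM kafka_message_table
WHERE message IS NOT NULL
ORDER BY timestamp DESC
LIMIT 10;
"""


def run_latest_messages(rows):
    """Load rows into an in-memory SQLite table and run the tutorial query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE kafka_message_table (timestamp TEXT, message TEXT)"
        )
        conn.executemany("INSERT INTO kafka_message_table VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


latest = run_latest_messages(ROWS)
for ts, msg in latest:
    print(ts, msg)
```

The ISO 8601 strings in this sketch sort correctly as text, which is why ORDER BY works on a TEXT column here. If your table stores timestamps in another format or type, the ordering behavior should be verified against the actual schema.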
Step 3. Check and download results
If the query results are displayed successfully, you can use them for analysis or save them.
- Click Download CSV at the top right of the results table.
- If needed, open the file in Excel, a BI tool, or another analysis tool for further analysis.
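As one example of further analysis, the downloaded CSV can be summarized with a short script. The sketch below counts non-empty messages per minute using only the Python standard library. It assumes the CSV has a header row with timestamp and message columns and that timestamps are ISO 8601 strings; both are assumptions, so adjust the column names to match your export.

```python
import csv
import io
from collections import Counter


def count_messages_per_minute(csv_file):
    """Count rows with a non-empty message, grouped by minute.

    Assumes 'timestamp' and 'message' columns (hypothetical names) and
    ISO 8601 timestamps such as 2024-01-01T00:00:04Z.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if not row.get("message"):
            continue  # skip rows with an empty or missing message
        minute = row["timestamp"][:16]  # keep 'YYYY-MM-DDTHH:MM'
        counts[minute] += 1
    return dict(counts)


# Inline sample standing in for the downloaded file.
SAMPLE_CSV = """timestamp,message
2024-01-01T00:00:04Z,order shipped
2024-01-01T00:00:03Z,order paid
2024-01-01T00:01:10Z,order created
"""

per_minute = count_messages_per_minute(io.StringIO(SAMPLE_CSV))
print(per_minute)
```

To run this against a real export, pass a file opened with open(path, newline="", encoding="utf-8") in place of the io.StringIO sample, where path is wherever you saved the download.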
Wrap-up and next steps
You have now worked through the full pipeline: storing data received from Kafka in Object Storage, registering it with Data Catalog, and analyzing it with SQL in Data Query. You can extend this structure to other real-time data analysis scenarios, such as dashboard visualization, event detection, and scheduled reporting.
This series explains the entire process of building a real-time data pipeline centered on Kafka step by step. Message ingestion, storage, metadata registration, and analysis are connected into a single flow, and each step is written for a real operational environment.
Overall flow: Pub/Sub -> Kafka -> Object Storage -> Data Catalog -> Data Query
① Message processing through Kafka
② Configure Kafka streaming based on Pub/Sub messages
③ Load Kafka data into Object Storage
④ Register a Kafka message table using Data Catalog
⑤ Real-time data analysis based on Data Query