3 posts tagged with "data-query"

Query Cloud Trail and DNS resolver logs with Data Query

May 29, 2026 · 6 min read

Cloud Engineer

In production environments, logs are reference data for troubleshooting incidents and reviewing security. However, storing logs is not enough. To analyze operational issues and identify causes, you must be able to query the data quickly with the conditions you need. For logs that are repeatedly used for audits and diagnostics, it is especially important to consider the storage location, file structure, and query method from the beginning.

The two newly added tutorials explain how to use Data Catalog and Data Query to query and analyze operational logs stored in Object Storage with SQL.

Both documents use the same data analysis architecture, but they focus on different operational scenarios. One covers security auditing and change tracking based on user activity and resource change history. The other covers network diagnostics based on DNS query flows inside a VPC. Accordingly, the target logs are Cloud Trail logs and DNS resolver query logs.

Operational log storage -> Object Storage -> Data Catalog -> Data Query

Although the logs being analyzed are different, the workflow is the same: store logs in Object Storage, configure table metadata in Data Catalog, and query the data with SQL in Data Query.

This post looks at when each log is useful and how to analyze operational logs with SQL.

Cloud Trail logs: Reference data for checking resource change history

Cloud Trail records user activity and resource operation history in KakaoCloud as events. For example, you can check when a specific user logged in, which resources were created or modified, and which service generated an event. When these logs are connected to Data Query, you can answer common security audit and history tracking questions with SQL.

Which user changed which resource on a specific date?
Is there any operation history for a specific service or IP address?
Did create, update, or delete events occur for a specific resource?

The Query Cloud Trail logs with Data Query tutorial explains how to store Cloud Trail logs in gz format in Object Storage and configure Data Catalog and Data Query based on the project_event and domain_event paths. In this workflow, it is important to use partition columns such as date_id and hour_id as query conditions so that you only query the required period.

Because Cloud Trail logs are used for security audits and change tracking, it is better to narrow the query by conditions such as when, from which service, by whom, and for which resource, rather than scanning all logs at once.
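As a minimal sketch of this kind of narrowing, the Python helper below builds a SQL string that filters on the `date_id` and `hour_id` partition columns. The table name `project_event`, the `YYYY-MM-DD` format for `date_id`, and the zero-padded two-digit format for `hour_id` are assumptions for illustration; check the tutorial for the actual table schema and value formats.

```python
from datetime import date


def build_trail_query(table: str, day: date, hours: list[int]) -> str:
    """Return a SQL string that reads only the given day and hours.

    Assumed (not confirmed by the tutorial): date_id is 'YYYY-MM-DD'
    and hour_id is a zero-padded two-digit string.
    """
    # Reject anything that is not a plain identifier before interpolating it.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    if not hours or any(h < 0 or h > 23 for h in hours):
        raise ValueError("hours must be a non-empty list of values from 0 to 23")

    hour_list = ", ".join(f"'{h:02d}'" for h in sorted(set(hours)))
    return (
        f"SELECT * FROM {table} "
        f"WHERE date_id = '{day.isoformat()}' "
        f"AND hour_id IN ({hour_list}) "
        f"LIMIT 100"
    )


# Hypothetical table name; hours 09 to 11 on a single day.
query = build_trail_query("project_event", date(2026, 5, 1), [9, 10, 11])
print(query)
```

Further conditions such as user, service, or resource would be added to the same `WHERE` clause, after the partition filters have already limited the period that is read.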

DNS resolver query logs: Check DNS queries and responses inside a VPC

DNS resolver query logs record DNS query and response information generated inside a VPC. You can check which domains an application queried, whether responses were normal, and whether failed responses were concentrated in a specific time period. With Data Query, you can answer operational questions such as:

Which domains were queried most on a specific date?
During which time periods were non-NOERROR responses concentrated?
Did a specific VPC query a specific domain unusually often?
Are DNS queries with long response times recurring?

The Query DNS resolver query logs with Data Query tutorial configures tables based on the Object Storage path structure KCLogs/{region-name}/{year=yyyy/month=mm/day=dd} used by DNS resolver query logs. It then shows how to synchronize the year, month, and day partitions in Data Query and aggregate query counts or failed response counts by domain.
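To make the path structure concrete, the following sketch builds the Object Storage prefix for one day of logs. It reads the braces in the pattern as placeholders, so the result has the form `KCLogs/<region>/year=yyyy/month=mm/day=dd`. The function name and the region value `example-region` are hypothetical.

```python
from datetime import date


def dns_log_prefix(region_name: str, day: date) -> str:
    """Build the Object Storage prefix for one day of DNS resolver query logs."""
    return (
        f"KCLogs/{region_name}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )


# "example-region" is a placeholder, not a real region name.
print(dns_log_prefix("example-region", date(2026, 5, 9)))
```

Because the partition values are encoded in the path itself, filtering on `year`, `month`, and `day` in a query maps directly to the set of prefixes that need to be read.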

DNS logs are useful not only for network incident analysis, but also for checking external dependencies of internal services, unexpected domain queries, and repeated failed responses. If Cloud Trail shows the history of users and resource operations, DNS resolver query logs show how applications inside a VPC perform name resolution.
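The per-domain aggregation can be tried locally before running it against real logs. The sketch below uses an in-memory SQLite database purely as a stand-in so that the SQL is runnable; it is not Data Query, and the table name `dns_query_log`, the columns `query_name` and `rcode`, and the sample rows are all made up for illustration.

```python
import sqlite3

# SQLite stands in for a Data Query table so the SQL can run locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dns_query_log ("
    "query_name TEXT, rcode TEXT, year TEXT, month TEXT, day TEXT)"
)

# Made-up sample rows, for illustration only.
rows = [
    ("api.example.com", "NOERROR", "2026", "05", "01"),
    ("api.example.com", "NOERROR", "2026", "05", "01"),
    ("api.example.com", "SERVFAIL", "2026", "05", "01"),
    ("db.example.internal", "NXDOMAIN", "2026", "05", "01"),
    ("api.example.com", "NOERROR", "2026", "05", "02"),
]
conn.executemany("INSERT INTO dns_query_log VALUES (?, ?, ?, ?, ?)", rows)

# Count all queries and non-NOERROR responses per domain for one day,
# filtering on the partition columns first.
sql = """
SELECT query_name,
       COUNT(*) AS total_queries,
       SUM(CASE WHEN rcode <> 'NOERROR' THEN 1 ELSE 0 END) AS failed_queries
FROM dns_query_log
WHERE year = '2026' AND month = '05' AND day = '01'
GROUP BY query_name
ORDER BY total_queries DESC, query_name
"""
result = conn.execute(sql).fetchall()
for query_name, total, failed in result:
    print(query_name, total, failed)
```

The SQL dialect in Data Query may differ from SQLite in details, so treat the statement as a starting shape rather than a copy-and-paste query.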

Why the two tutorials use the same pattern

Cloud Trail logs and DNS resolver query logs have different characteristics, but operators handle them in similar ways.

First, logs are stored in Object Storage. Next, metadata is configured in Data Catalog based on file paths and partition structures. Finally, Data Query uses SQL to query the logs with conditions such as specific periods, services, users, domains, and response codes.

After this common pattern is established, you can reuse the same flow of storage location, metadata configuration, and SQL querying even when the analysis target changes.

Item	Cloud Trail logs	DNS resolver query logs
Storage location	Object Storage	Object Storage
Main path	`trail/project_event`, `trail/domain_event`	`KCLogs/{region-name}/{year=yyyy/month=mm/day=dd}`
Main partitions	`date_id`, `hour_id`	`year`, `month`, `day`
Main analysis focus	User activity, resource changes, service events	Domain queries, DNS response codes, VPC-specific query patterns
Query tool	Data Query	Data Query

This structure helps manage operational log analysis in a consistent way. Instead of learning different query languages and storage structures for each separate tool, you can standardize the query flow around Object Storage, Data Catalog, and Data Query.

How to start operational log analysis

If you need to retain operational logs for a long time and query them when necessary, we recommend reviewing both tutorials together. For example, when an incident occurs during a specific time period, you can check resource change history with Cloud Trail logs and also review DNS query failures or response delays from the same time period with DNS resolver query logs. When logs with different characteristics can be queried in the same way, it becomes easier to interpret individual events in a broader operational context.

KakaoCloud technical documentation provides various tutorials based on practical operational scenarios. Use the following documents to learn how to store operational logs, configure them in a queryable form, and analyze them step by step with the conditions you need.

👉 Query Cloud Trail logs with Data Query
👉 Query DNS resolver query logs with Data Query

Building a Kafka-based real-time data pipeline

September 25, 2025 · 3 min read

Erin (오예진)

Cloud Engineer

Services continuously generate logs, user events, and transaction information. Storing this data is important, but it becomes a truly "meaningful flow" only when it can be analyzed quickly.

The Kafka-based real-time data pipeline tutorial series introduced here is a hands-on series that walks you through implementing this "flow of data" on KakaoCloud.

This series consists of three parts and guides you step by step through the entire process, from receiving real-time messages to storage and analysis. It is designed so that you can connect Kafka, Object Storage, Data Catalog, and Data Query, understand the overall structure through which data flows, and implement it directly.

Architecture for building a real-time data pipeline

Part 1: Build a structure for receiving Kafka messages

In the first tutorial, you create a Kafka cluster and configure an environment for sending and receiving messages through topics. You create Kafka topics, configure producers and consumers, and send and receive messages to establish the foundation for real-time data collection. This process focuses on understanding the basic structure of an event-driven system and creating the starting point of message flow.

👉 View the message processing through Kafka tutorial

Part 2: Store received messages in Object Storage

The second tutorial covers the flow of periodically collecting messages received through Kafka and storing them in Object Storage. Messages are collected at regular intervals and stored as a single file, and the stored files are used later as data sources for analysis. In this process, you can also consider the boundary between streaming and batch and how file formats and structures should be designed.
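The idea of collecting messages at regular intervals into a single file can be sketched without any Kafka client. The example below groups already-received `(timestamp, payload)` pairs into fixed time windows and produces one newline-delimited file body per window. The key layout `kafka/<date>/<hour>/<window start>.jsonl`, the function names, and the five-minute interval are assumptions for illustration, not the layout used in the tutorial.

```python
from collections import defaultdict
from datetime import datetime


def window_key(ts: datetime, interval_minutes: int) -> str:
    """Return a hypothetical object key for the fixed window containing ts."""
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must be a positive divisor of 60")
    # Round the timestamp down to the start of its window.
    minute = (ts.minute // interval_minutes) * interval_minutes
    start = ts.replace(minute=minute, second=0, microsecond=0)
    return start.strftime("kafka/%Y-%m-%d/%H/%Y%m%dT%H%M.jsonl")


def group_into_files(messages, interval_minutes=5):
    """Map each window's object key to one newline-delimited file body."""
    files = defaultdict(list)
    for ts, payload in messages:
        files[window_key(ts, interval_minutes)].append(payload)
    return {key: "\n".join(payloads) + "\n" for key, payloads in files.items()}


# Made-up messages: two fall in the 10:00 window, one in the 10:05 window.
messages = [
    (datetime(2025, 9, 25, 10, 1), '{"id": 1}'),
    (datetime(2025, 9, 25, 10, 4), '{"id": 2}'),
    (datetime(2025, 9, 25, 10, 6), '{"id": 3}'),
]
for key, body in group_into_files(messages).items():
    print(key, len(body.splitlines()))
```

A shorter interval makes data available for analysis sooner but produces more and smaller files, which is one form of the streaming-versus-batch trade-off mentioned above.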

👉 View the tutorial for loading Kafka data into Object Storage

Part 3: Real-time analysis with Data Catalog and Data Query

The final tutorial configures an environment where data stored in Object Storage is registered in Data Catalog and SQL-based analysis can be performed through Data Query. Tables registered in the catalog are managed by partition, and new data can be automatically reflected through periodic synchronization settings. The most important part of this stage is converting real-time data collected through Kafka into a structure that can be analyzed immediately without a separate complex pipeline.

👉 View the tutorial for analyzing Kafka messages using Data Catalog and Data Query

This real-time data pipeline tutorial series is more than a simple code example. It is based on an architecture and settings that can be used as-is in production environments. By following the entire process of receiving Kafka messages, storing them in Object Storage, and connecting them to analysis with Data Catalog and Data Query, you can quickly build practical intuition for designing real-time services, monitoring systems, and event-based statistics pipelines.

If you are designing a Kafka-based real-time data pipeline for the first time or want to expand an existing pipeline on KakaoCloud, this tutorial will be a good reference.

🖥️ Try it now!
View the Kafka-based real-time data pipeline tutorial series at a glance

KakaoCloud Data Query officially released as GA

August 22, 2025 · 4 min read

Chloe (이다예슬)

Service Manager

KakaoCloud's serverless interactive query service, Data Query, has been officially released as GA (General Availability). In this release, features, performance, and pricing have been refined based on numerous customer cases from the internal beta and preview stages, so that the service can be used reliably in real customer environments.

Data Query is a serverless query engine that lets users query data stored in Object Storage directly using SQL without managing separate infrastructure. Users can explore large-scale data with a single query without building a data warehouse themselves or worrying about complex cluster operations.

Simpler and more transparent pricing

In the GA version, a data-scan-based pay-as-you-go pricing model is applied. Fees are charged at KRW 5,850 per TiB based on the amount of data scanned when a query is executed, and no cost is incurred for metadata queries or DDL statements (CREATE, DROP, SHOW TABLE).
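As a rough illustration of the arithmetic, the sketch below converts a scanned byte count into a charge at the stated rate of KRW 5,850 per TiB. It assumes simple linear proration with 1 TiB = 2^40 bytes and no minimum charge or rounding rule; those billing details are not stated here, so treat the result as an estimate only.

```python
from decimal import Decimal

KRW_PER_TIB = Decimal("5850")      # rate stated in this post
BYTES_PER_TIB = Decimal(2) ** 40   # 1 TiB = 2^40 bytes


def estimate_scan_cost_krw(bytes_scanned: int) -> Decimal:
    """Estimate the charge for one query, assuming linear proration.

    Minimum charges and rounding rules are not modeled (assumption).
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return Decimal(bytes_scanned) / BYTES_PER_TIB * KRW_PER_TIB


# A query that scans half a TiB under these assumptions.
print(estimate_scan_cost_krw(2**39))
```

Under this model, restricting a query to the needed partitions lowers the cost in direct proportion to the data that is no longer scanned.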

A particularly notable change in this version is that the billing policy for failed and canceled queries has been clarified. If a user cancels a query directly, only the data scanned up to the cancellation point is charged. If a system timeout occurs, fees are charged based on the amount scanned immediately before the timeout. From an operator's perspective, this policy helps avoid unnecessary charges and lets users try experimental queries or large-scale exploration tasks with less risk.

Data Query is also most efficient when used together with Object Storage. Because data can be queried as-is without separate replication or movement, no additional overhead occurs beyond standard Object Storage pricing. As a result, operators can secure flexibility in data analytics while reducing unnecessary costs.

Real example: log analysis based on a data lake

Data Query works closely with Object Storage and enables analysis in the same way even as data scale grows. One of the most frequently mentioned cases during the beta service stage was service log analysis. One customer stored tens of TB of service logs in Object Storage and used Data Query to explore abnormal traffic patterns in near real time. With the existing approach, logs had to be collected, loaded, and then ingested into a separate analysis system. With the GA version of Data Query, however, results can be checked directly with SQL queries without separate ETL.

For example, users can quickly check the distribution of error codes concentrated during a specific time period or instantly analyze API response times by user segment. This usage clearly shows the value of a serverless query service in a data lake architecture.

Data analysis closer to real operations

The GA release of Data Query is an important starting point for KakaoCloud's expansion into the data platform area. You can now explore data stored in Object Storage directly without separately building or managing a query-only cluster. In particular, by providing a predictable cost model rather than a complex billing structure, it can improve stability and efficiency in actual service operations. After this GA release, support for various additional data sources will continue to expand, and additional features such as IAM Role integration and more sophisticated query optimization will be provided sequentially.

Data analysis is no longer the job of dedicated teams alone. With Data Query, operators, developers, planners, and other users can immediately explore the data they need and make decisions quickly.
Try the GA version of Data Query now and see how it changes your data analysis workflow.

Want to learn more about Data Query?
👉 View Data Query documentation
