Data Query overview

Beta

KakaoCloud's Data Query service is a serverless interactive query service that lets you query, analyze, and process data stored in various data sources using standard SQL. It automatically deploys a processing engine optimized for the amount of data scanned, so you can analyze large-scale data without setting up complex infrastructure. It also supports real-time analysis and user-defined processing logic, which adds agility and flexibility to how you use your data.

Glossary

Query: A command or request sent to a database to retrieve or manipulate data, written in SQL (Structured Query Language).
DML (Data Manipulation Language): Commands used to query and manipulate data in a database, such as SELECT, INSERT, UPDATE, and DELETE statements on tables.
DDL (Data Definition Language): Commands used to define and manage database structures, mainly to create, modify, or delete database objects such as tables (for example, CREATE, ALTER, and DROP statements).
Data Catalog: A data management service that makes it easy to search for and retrieve metadata scattered across storage systems and databases. Data Query connects to Data Catalog to fetch data sources.
MySQL: A fully managed database service built on MySQL, the open-source relational database management system (RDBMS). Data Query fetches data sources from MySQL by creating a connection to it.
Object Storage: An object-based storage service optimized for storing and processing large amounts of data in key-value format, offering scalability and reliability. Data Query stores query result data here.
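
To make the DDL/DML distinction concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in database. It does not connect to Data Query, and the table name `orders` and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in (not Data Query).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: defines structure by creating a table object (hypothetical "orders").
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# DML: manipulates rows stored in that structure.
cur.executemany("INSERT INTO orders (item, qty) VALUES (?, ?)",
                [("apple", 3), ("pear", 5)])
cur.execute("UPDATE orders SET qty = qty + 1 WHERE item = ?", ("apple",))
cur.execute("DELETE FROM orders WHERE item = ?", ("pear",))

# DML: SELECT retrieves the remaining data.
rows = cur.execute("SELECT item, qty FROM orders").fetchall()
print(rows)  # [('apple', 4)]

# DDL: ALTER modifies the table's structure; DROP deletes the object.
cur.execute("ALTER TABLE orders ADD COLUMN note TEXT")
cur.execute("DROP TABLE orders")
conn.close()
```

The DML statements change only the rows, while the DDL statements change the table itself.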

Purpose and use cases

As stored data grows rapidly and data sources become more diverse, organizations often struggle to query and analyze their data. Complex data infrastructure and inefficient data processing delay the point at which data can be put to use, and unnecessary duplication and preprocessing increase costs and resource consumption. In addition, when data analysis depends on only a small group of technically skilled users, the organization loses agility in using its data.
Data Query addresses these issues with a serverless interactive query service for querying and analyzing data using standard SQL. It processes data from diverse data sources in real time without complex infrastructure setup and offers a cost-effective analysis environment that helps organizations get the most out of their data.

Features

Accelerate data analysis

Built on Trino, an open-source distributed SQL query engine, together with parallel processing technology, Data Query significantly speeds up the analysis of large datasets. During query execution, it scans only the information needed from each data source and minimizes processing time through optimized execution plans. This architecture supports near real-time data analysis, enabling faster data-driven decision-making.

Enable efficient data utilization

Supports standard SQL for easy integration with existing data analysis tools and provides a user-friendly interface to conduct data analysis efficiently.
Allows integrated querying across multiple data sources with a single SQL query, simplifying data processing workflows.
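
As a local analogy only (this is not Data Query or Trino), the following sketch uses SQLite's `ATTACH DATABASE` to join tables that live in two separate databases with one SQL statement. The database aliases `main` (SQLite's default) and `crm`, and all table and column names, are hypothetical.

```python
import sqlite3

# Primary in-memory database, standing in for the first data source.
conn = sqlite3.connect(":memory:")
# Attach a second, independent in-memory database under the alias "crm".
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Populate each "source" separately (hypothetical tables and values).
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Kim"), (2, "Lee")])

# A single SQL statement spanning both databases, qualified by alias.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Kim', 15.0), ('Lee', 7.5)]
conn.close()
```

In Trino, a comparable query would qualify each table with a catalog name (catalog.schema.table) rather than an attached-database alias.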

Maintain data integration and consistency

Supports direct querying of source data, which prevents the data loss or inconsistency that data movement or transformation can cause and keeps analysis results consistent with the source data.
Users can check metadata and schemas via Data Catalog and query and analyze data stored in MySQL databases using standard SQL. This integrated environment reduces the complexity of working across data sources and provides reliable data management and analysis workflows.

Getting started

For detailed usage guides on Data Query, refer to the How-to Guides.
If you are new to KakaoCloud, start with the Start section.
