Key Concepts

Catalog

A catalog in Data Catalog is a central repository for storing and managing metadata of user data.

Metadata is not shared between catalogs in different user networks.
Once Data Catalog is activated, users can create a catalog by specifying a VPC (Subnet).
The catalog operates with high availability (HA).
Users can store, modify, and delete metadata such as table definitions and storage paths in the catalog.
The catalog is compatible with Apache Hive Metastore.

Database

A database in Data Catalog serves as a container for storing tables.

The database is used to structure metadata tables.
A table can belong to only one database.
The database list in the KakaoCloud console allows users to view all databases within the project.

Table

In Data Catalog, a table represents metadata for data stored in a data store. Tables can be created in the KakaoCloud console, and the metadata values are displayed in the table list.

A table includes lower-level metadata such as schema, partitions, and table properties.
Tables can be manually created and their information can be edited.
When using Data Catalog as a metastore for Hadoop Eco, information on migrated tables can also be modified.

Crawler

A crawler in Data Catalog scans MySQL data, extracts metadata, and automatically updates Data Catalog, simplifying data discovery. Crawlers can be created in the KakaoCloud console, and tables created by the crawler are displayed in the table list.

The schema extracted by the crawler is stored in Data Catalog tables. Table names follow the pattern Prefix + MySQL database name + "_" + table name.
Crawler run history is retained for up to 90 days, after which it is automatically deleted.
Crawlers can be scheduled to run at specific times.
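
The naming and retention rules above can be sketched in a few lines of Python. This is an illustration only, not Data Catalog's implementation: the function names are hypothetical, the prefix is assumed to be concatenated as-is (no separator is added after it), and the exact moment at which a 90-day-old run record is deleted is an assumption.

```python
from datetime import datetime, timedelta

# Retention period for crawler run history, as stated in the text.
RUN_HISTORY_RETENTION = timedelta(days=90)


def crawler_table_name(prefix: str, mysql_database: str, mysql_table: str) -> str:
    """Build a table name as Prefix + MySQL database name + "_" + table name.

    Assumption: the prefix is used verbatim, with no separator added after it.
    """
    return f"{prefix}{mysql_database}_{mysql_table}"


def history_expired(run_finished_at: datetime, now: datetime) -> bool:
    """Return True once a run record is older than the retention period.

    Assumption: a record that is exactly 90 days old is still retained.
    """
    return now - run_finished_at > RUN_HISTORY_RETENTION
```

For example, `crawler_table_name("raw_", "shop", "orders")` returns `"raw_shop_orders"`.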

Resource status and lifecycle

Data Catalog tracks status for catalogs, databases, tables, and crawlers. Creating a catalog provisions a fully managed central repository for storing and managing the metadata of user-owned data assets; creation takes approximately 10 minutes. From creation through termination, the catalog moves through a set of statuses that users can monitor to see its current state.

The status information for each resource is as follows:

Catalog lifecycle

Catalog status

Status	Description
`INIT`	Immediately after creating the catalog
`PROVISIONING`	VM for the catalog is being created
`RUNNING`	Catalog is running and available
`FATAL`	Catalog has encountered an unrecoverable error
`TERMINATING`	Catalog is being terminated and its hardware resources are being released
`TERMINATED`	Catalog is terminated and no longer available
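
Because catalog creation takes roughly 10 minutes, client code typically has to wait for the `RUNNING` status before using a catalog. The following is a minimal sketch of such a wait loop. It assumes a caller-supplied `get_status` function that returns the current status string; that function, the timeout, and the polling interval are all hypothetical, and no real KakaoCloud API call is shown. Treating `FATAL`, `TERMINATING`, and `TERMINATED` as states that never lead to `RUNNING` is also an assumption based on the descriptions in the table.

```python
import time
from enum import Enum
from typing import Callable


class CatalogStatus(str, Enum):
    INIT = "INIT"
    PROVISIONING = "PROVISIONING"
    RUNNING = "RUNNING"
    FATAL = "FATAL"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"


# Assumption: a catalog in one of these statuses will not become RUNNING.
_DEAD_END = {CatalogStatus.FATAL, CatalogStatus.TERMINATING, CatalogStatus.TERMINATED}


def wait_until_running(
    get_status: Callable[[], str],  # hypothetical status getter supplied by the caller
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> CatalogStatus:
    """Poll until the catalog reports RUNNING, or fail early."""
    waited = 0.0
    while True:
        status = CatalogStatus(get_status())
        if status is CatalogStatus.RUNNING:
            return status
        if status in _DEAD_END:
            raise RuntimeError(f"catalog cannot become available: {status.value}")
        if waited >= timeout_s:
            raise TimeoutError(f"catalog still {status.value} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.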

Database and table status

Databases and tables change status as they are created, modified, or deleted, and the current status determines which actions are allowed next. A table is governed by its own status and by the status of its database. For example, tables can be created or modified only while the database is in `ACTIVE` or `ALTERING` status.

Status	Description
`CREATING`	Database or table is being created
`ALTERING`	Database or table is being modified
`DELETING`	Database or table is being deleted
`ACTIVE`	Database or table is available for use
`INACTIVE`	Database or table is unavailable
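
The rule that tables can be created or modified only while the database is in `ACTIVE` or `ALTERING` status can be expressed as a simple guard. This is a sketch with hypothetical names (`ResourceStatus`, `can_write_table`); it models only the rule stated in this section, not the full set of checks the service performs.

```python
from enum import Enum


class ResourceStatus(str, Enum):
    CREATING = "CREATING"
    ALTERING = "ALTERING"
    DELETING = "DELETING"
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


# Database statuses in which tables may be created or modified (from the text).
_TABLE_WRITE_ALLOWED = frozenset({ResourceStatus.ACTIVE, ResourceStatus.ALTERING})


def can_write_table(database_status: str) -> bool:
    """Return True if a table may be created or modified in this database."""
    return ResourceStatus(database_status) in _TABLE_WRITE_ALLOWED
```

A caller could check `can_write_table(db_status)` before submitting a table change and report the blocking database status otherwise.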

Crawler status

A crawler changes status as it is created, modified, run, or deleted. Crawler operations also depend on the status of the database and the MySQL instance. For example, a crawler can be created or run only while the MySQL instance is in Available status.

Status	Description
`CREATING`	Crawler is being created
`ALTERING`	Crawler is being modified
`DELETING`	Crawler is being deleted
`ACTIVE`	Crawler is available for use
`RUNNING`	Crawler is currently running
`INACTIVE`	Crawler is unavailable (e.g., if the database is deleted, the crawler changes to `INACTIVE` but its logs remain accessible)
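
A pre-flight check for starting a crawler might combine the crawler's own status with the MySQL instance status. The sketch below is hypothetical: only the `Available` requirement for the MySQL instance comes from this section, while requiring the crawler itself to be `ACTIVE` before a run is an assumption made for illustration.

```python
from typing import List


def crawler_run_blockers(crawler_status: str, mysql_instance_status: str) -> List[str]:
    """Return the reasons a crawler run should not start (empty list means OK)."""
    blockers = []
    # Stated in the text: the MySQL instance must be in Available status.
    if mysql_instance_status != "Available":
        blockers.append(f"MySQL instance is {mysql_instance_status}, not Available")
    # Assumption: only a crawler in ACTIVE status can start a new run.
    if crawler_status != "ACTIVE":
        blockers.append(f"crawler is {crawler_status}, not ACTIVE")
    return blockers
```

Returning a list of reasons rather than a single boolean lets a caller show every blocking condition at once.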
