
Key Concepts

Catalog

A catalog in Data Catalog is a central repository for storing and managing metadata about user data.

  • Metadata is not shared between catalogs in different user networks.
  • Once Data Catalog is activated, users can create a catalog by specifying a VPC and subnet.
  • The catalog operates with high availability (HA).
  • Users can store, modify, and delete metadata such as table definitions and storage paths in the catalog.
  • The catalog is compatible with Apache Hive Metastore.
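Because the catalog is compatible with Apache Hive Metastore, Hive-aware engines such as Spark can in principle be pointed at it with their standard metastore settings. The sketch below only assembles those settings. It assumes the catalog exposes a Thrift endpoint; the host name is a hypothetical placeholder, not a KakaoCloud address, and 9083 is simply the conventional Hive Metastore port.

```python
# Sketch only: the metastore URI used below is a hypothetical placeholder,
# not a documented KakaoCloud endpoint.


def hive_metastore_spark_conf(metastore_uri: str) -> dict[str, str]:
    """Return Spark settings that point a session at a Hive-compatible metastore."""
    if not metastore_uri.startswith("thrift://"):
        raise ValueError("Hive Metastore URIs normally use the thrift:// scheme")
    return {
        # Use the Hive catalog implementation instead of Spark's in-memory catalog.
        "spark.sql.catalogImplementation": "hive",
        # spark.hadoop.* keys are copied into the Hadoop configuration,
        # where the Hive client reads hive.metastore.uris.
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }


if __name__ == "__main__":
    conf = hive_metastore_spark_conf("thrift://catalog.example.internal:9083")
    for key, value in conf.items():
        print(f"{key}={value}")
```

With PySpark installed, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `enableHiveSupport().getOrCreate()`. Check the actual endpoint and any authentication requirements for your catalog before relying on this.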

Database

A database in Data Catalog serves as a container for storing tables.

  • Databases are used to organize metadata tables.
  • A table can belong to only one database.
  • The database list in the KakaoCloud Console shows all databases in the project.
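The containment rules above (a catalog holds databases, a database holds tables, and each table belongs to exactly one database) can be illustrated with a small in-memory model. This is a hypothetical sketch for explanation only, not the Data Catalog API; the class and method names are invented. Tables are identified by the pair (database, table), so the same table name may appear in two different databases as two distinct tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical identifier: a table is always qualified by its one database."""
    database: str
    name: str


class Catalog:
    """Hypothetical in-memory model of the catalog > database > table hierarchy."""

    def __init__(self) -> None:
        # Maps each database name to the set of table names it contains.
        self._databases: dict[str, set[str]] = {}

    def create_database(self, name: str) -> None:
        if name in self._databases:
            raise ValueError(f"database {name!r} already exists")
        self._databases[name] = set()

    def create_table(self, database: str, name: str) -> TableRef:
        if database not in self._databases:
            raise KeyError(f"unknown database {database!r}")
        tables = self._databases[database]
        if name in tables:
            raise ValueError(f"table {database}.{name} already exists")
        tables.add(name)
        # The returned reference carries exactly one owning database.
        return TableRef(database, name)

    def list_databases(self) -> list[str]:
        return sorted(self._databases)

    def list_tables(self, database: str) -> list[str]:
        return sorted(self._databases[database])
```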

Table

In Data Catalog, a table represents the metadata for data stored in a data store. Tables can be created in the KakaoCloud Console, and their metadata is displayed in the table list.

  • A table includes lower-level metadata such as schema, partitions, and table properties.
  • Tables can be manually created and their information can be edited.
  • When using Data Catalog as a metastore for Hadoop Eco, information on migrated tables can also be modified.
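To make "schema, partitions, and table properties" concrete, the sketch below models a table's metadata and derives a partition's storage path. The field names are hypothetical, and the `key=value` directory layout is the Apache Hive convention for partitioned tables, assumed here because the catalog is Hive Metastore compatible. Unlike Hive, this sketch does not escape special characters in partition values.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical shape of table metadata: schema, partitions, properties."""
    name: str
    location: str                        # storage path of the table's data
    columns: list[tuple[str, str]]       # schema as (column name, type) pairs
    partition_keys: list[str] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)

    def partition_location(self, values: dict[str, str]) -> str:
        """Build a Hive-style partition path such as <location>/dt=2024-01-01."""
        missing = [key for key in self.partition_keys if key not in values]
        if missing:
            raise ValueError(f"missing partition values: {missing}")
        # Partition directories follow the order of the partition keys.
        parts = [f"{key}={values[key]}" for key in self.partition_keys]
        return "/".join([self.location.rstrip("/"), *parts])
```

For example, a table at `s3a://bucket/warehouse/orders/` partitioned by `dt` and `region` would place the `dt=2024-01-01`, `region=kr` partition under `s3a://bucket/warehouse/orders/dt=2024-01-01/region=kr` (the bucket path is illustrative).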

Crawler

A crawler in Data Catalog scans MySQL data, extracts metadata, and automatically updates Data Catalog, simplifying data discovery. Crawlers can be created in the KakaoCloud Console, and tables created by the crawler are displayed in the table list.

  • The schema extracted by the crawler is stored in Data Catalog tables, whose names are formed as Prefix + MySQL database name_table name (the prefix, followed by the MySQL database name, an underscore, and the MySQL table name).
  • Crawler run history is retained for up to 90 days, after which it is automatically deleted.
  • Crawlers can be scheduled to run at specific times.
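The naming rule and the 90-day history retention described above can be expressed as two small helpers. This is an illustrative sketch: the function names are hypothetical, it assumes the prefix is prepended as-is with no separator added, and it treats a run exactly 90 days old as still retained, which is an assumption about the boundary.

```python
from datetime import datetime, timedelta

# Documented retention period for crawler run history.
RETENTION = timedelta(days=90)


def crawler_table_name(prefix: str, mysql_database: str, mysql_table: str) -> str:
    """Compose Prefix + <MySQL database>_<MySQL table>.

    Assumption: the prefix is concatenated directly, without a separator.
    """
    return f"{prefix}{mysql_database}_{mysql_table}"


def is_history_expired(run_at: datetime, now: datetime) -> bool:
    """Return True once a run record is older than the 90-day retention period."""
    return now - run_at > RETENTION
```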

Resource status and lifecycle

The resources whose statuses can be tracked in Data Catalog are catalogs, databases, tables, and crawlers. Creating a catalog establishes a central repository for storing and managing metadata about user-owned data assets; creation takes approximately 10 minutes. The catalog is a fully managed repository that moves through several statuses, from provisioning to running to termination, which users can monitor to see its current state.

The status information for each resource is as follows:

Figure: Catalog lifecycle

Catalog status

Status        Description
INIT          The catalog has just been created
PROVISIONING  The VM for the catalog is being created
RUNNING       The catalog is running and available
FATAL         The catalog has encountered an unrecoverable error
TERMINATING   Hardware resources are being returned to terminate the catalog
TERMINATED    The catalog is terminated and no longer available
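Since catalog creation takes roughly 10 minutes, a client would typically poll the catalog status until it leaves INIT and PROVISIONING. The sketch below assumes a caller-supplied `get_status` function (hypothetical; how you actually fetch the status, via the console or another interface, is not covered here) and treats RUNNING, FATAL, TERMINATING, and TERMINATED as settled states. The 15-minute timeout and 30-second interval are arbitrary defaults.

```python
import time
from enum import Enum
from typing import Callable


class CatalogStatus(Enum):
    INIT = "INIT"
    PROVISIONING = "PROVISIONING"
    RUNNING = "RUNNING"
    FATAL = "FATAL"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"


# Assumption: once a new catalog reaches one of these, further polling
# will not bring it to RUNNING on its own.
SETTLED = {CatalogStatus.RUNNING, CatalogStatus.FATAL,
           CatalogStatus.TERMINATING, CatalogStatus.TERMINATED}


def wait_until_settled(get_status: Callable[[], str],
                       timeout_s: float = 15 * 60,
                       interval_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> CatalogStatus:
    """Poll get_status() until the catalog leaves INIT/PROVISIONING."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = CatalogStatus(get_status())
        if status in SETTLED:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"catalog still {status.value} after {timeout_s}s")
        sleep(interval_s)
```

Callers should check the returned value: only RUNNING means the catalog is available, while FATAL indicates an unrecoverable error.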

Database and table status

Databases and tables change status as they are created, modified, or deleted, and they are managed according to that status. The current status determines which operations can be performed next. A table has its own status but is also affected by the status of its database. For example, tables can be created or modified only while the database is in ACTIVE or ALTERING status.

Status    Description
CREATING  The database or table is being created
ALTERING  The database or table is being modified
DELETING  The database or table is being deleted
ACTIVE    The database or table is available for use
INACTIVE  The database or table is unavailable
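The dependency of tables on their database's status can be written as a simple pre-check. This sketch encodes only the documented rule (tables can be created or modified only while the database is ACTIVE or ALTERING); the enum and function names are hypothetical and do not correspond to a published SDK.

```python
from enum import Enum


class ResourceStatus(Enum):
    CREATING = "CREATING"
    ALTERING = "ALTERING"
    DELETING = "DELETING"
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


# Documented rule: table create/modify requires the database to be ACTIVE or ALTERING.
TABLE_WRITABLE_DB_STATES = frozenset({ResourceStatus.ACTIVE, ResourceStatus.ALTERING})


def can_create_or_modify_table(database_status: ResourceStatus) -> bool:
    """Return True if the parent database's status allows table changes."""
    return database_status in TABLE_WRITABLE_DB_STATES


def require_table_writable(database_status: ResourceStatus) -> None:
    """Raise before attempting a table change the database status would block."""
    if not can_create_or_modify_table(database_status):
        raise RuntimeError(
            f"database is {database_status.value}; tables cannot be created or modified"
        )
```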

Crawler status

The status of a crawler changes based on creation, modification, execution, or deletion actions and is managed accordingly. Crawler functionality is also affected by the status of the database and MySQL instance. For example, a crawler can only be created or executed if the MySQL instance is in Available status.

Status    Description
CREATING  The crawler is being created
ALTERING  The crawler is being modified
DELETING  The crawler is being deleted
ACTIVE    The crawler is available for use
RUNNING   The crawler is currently running
INACTIVE  The crawler is unavailable (for example, if the database is deleted, the crawler changes to INACTIVE, but its logs remain accessible)
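The crawler preconditions can likewise be sketched as checks. The MySQL rule (the instance must be in Available status to create or run a crawler) comes from the text above; the additional requirement that only an ACTIVE crawler can start a new run is an assumption inferred from the status table, not a documented rule. All names here are hypothetical.

```python
from enum import Enum


class CrawlerStatus(Enum):
    CREATING = "CREATING"
    ALTERING = "ALTERING"
    DELETING = "DELETING"
    ACTIVE = "ACTIVE"
    RUNNING = "RUNNING"
    INACTIVE = "INACTIVE"


def can_create_crawler(mysql_status: str) -> bool:
    # Documented rule: the MySQL instance must be in Available status.
    return mysql_status == "Available"


def can_run_crawler(crawler: CrawlerStatus, mysql_status: str) -> bool:
    # Documented rule: MySQL must be Available.
    # Assumption: a crawler that is RUNNING, INACTIVE, or mid-change cannot start a new run.
    return can_create_crawler(mysql_status) and crawler is CrawlerStatus.ACTIVE
```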