Skip to main content

Extracting MySQL metadata using the data catalog crawler

By utilizing the KakaoCloud Data Catalog crawler, you can extract and automatically update metadata from KakaoCloud MySQL. This document explains the process of extracting metadata from two MySQL databases using the Data Catalog crawler.

info
  • Estimated time: 20 minutes
  • User environment
    • Recommended OS: MacOS, Ubuntu
  • Region: kr-central-2
  • Prerequisites:

About this scenario

In this scenario, we will explain the process of using the KakaoCloud Data Catalog crawler to extract metadata from two MySQL databases and manage it in the Data Catalog database. We will create the kakao_shop table in the MySQL-1 database and the kakao_library table in the MySQL-2 database, then use the Data Catalog crawler to extract metadata. The main steps include:

  • VPC and network setup: Creating a VPC and subnets
  • Creating managed MySQL databases: Creating two KakaoCloud MySQL databases
  • Creating a MySQL client instance: Creating a MySQL client instance and using it to create tables
  • Creating and running the Data Catalog crawler: Using the crawler to scan MySQL data and extract metadata

Scenario

Before you start

In this step, we will set up the environment necessary for the tutorial.
We will confirm the information below to create the VPC, subnets, security group, MySQL client instance, and MySQL.

Create VPC and subnets

Create the VPC and subnets.

  1. Go to the KakaoCloud Console > Beyond Networking Service > VPC menu.

  2. Click the [Create VPC] button on the right, then create a VPC and subnets as follows:

    CategoryItemConfiguration/Input
    VPC InformationVPC Nametutorial-vpc
    VPC IP CIDR Block10.0.0.0/16
    Availability ZoneNumber of Availability Zones1
    First AZkr-central-2-a
    Subnet settingsNumber of public subnets per AZ1
    Number of private subnets per AZ1
    kr-central-2-a
    • Public Subnet IPv4 CIDR Block: 10.0.0.0/20
    • Private Subnet IPv4 CIDR Block: 10.0.16.0/20
  3. After confirming the generated topology at the bottom, click the [Create] button if everything looks good.

    • The status of the subnet changes in the following order: Pending Create > Pending Update > Active. You can proceed to the next step once it reaches the Active status.
    • It may take 5 to 10 minutes for the creation process to complete.

Configure security group

To enhance the security of the VM when it runs in the VPC, create a security group and add inbound policies as follows.

  1. Go to the KakaoCloud Console > VPC menu, then select Security Group.

  2. Click the [+ Create Security Group] button on the right, then configure the security group as follows:

    1. Enter the security group name and description as below:

      Security Group NameSecurity Group Description (optional)
      tutorial-mysql-client-sgSecurity policy for MySQL client instance
    2. Click the [+ Add] button at the bottom, set the inbound rule conditions as shown below, then click [Apply].

      Check my public IP

      Click the button below to check your current public IP.

      ProtocolSourcePort NumberPolicy Description
      TCP  {Your Public IP}/32   22   mysql-client inbound policy

Create and set up the MySQL client instance

After installing the MySQL client on the MySQL client instance, you will be able to access the KakaoCloud MySQL resource.

  1. Go to the KakaoCloud Console > Beyond Compute Service > Virtual Machine menu.

  2. In the Instance menu, click the [Create instance] button, then create a VM instance as follows:

    CategoryItemConfiguration/InputRemarks
    Basic InfoNametutorial-mysql-client
    Number1
    ImageUbuntu 20.04 - 5.4.0-173⚠️ Select a regular image, not an NVIDIA image
    Instance Typem2a.large
    VolumeRoot Volume10
    Key Pair{USER_KEYPAIR}⚠️ Safely store the key pair during the first use.
    Lost keys cannot be recovered and must be reissued.
    NetworkVPCtutorial-vpc
    Subnetmain (10.0.0.0/20)
    Security Grouptutorial-mysql-client-sg
  3. In the Virtual Machine > Instance menu, select the tutorial-mysql-client instance you created, then click the [Associate public IP] button in the Instance Actions panel on the right.

    • The associated public IP can be confirmed in the Beyond Networking Service > VPC > Public IP tab.

Create managed MySQL database

KakaoCloud MySQL is a fully managed database service provided in a VPC environment. To extract metadata using the Data Catalog crawler, we will create two KakaoCloud MySQL databases.

  1. Go to the KakaoCloud Console > Data Store > MySQL menu.

  2. In the Instance Group tab, click the [Create instance group] button, then Create two instance groups.

    tutorial-mysql-1
    CategoryItemConfiguration/Input
    Basic SettingsInstance Group Nametutorial-mysql-1
    MySQL SettingsEngine VersionUse the specified version
    MySQL Username{USER_NAME}
    MySQL Password{USER_PASSWORD}
    Instance Availability/NumberSingle (Primary Instance)
    Instance Typem2a.large
    Storage Type/Size100
    Log Storage Type/Size100
    Network SettingsVPCtutorial-vpc
    Subnettutorial-vpc_{VPC_ID}_sn_1 (10.0.16.0/20)
    Auto BackupAuto Backup OptionsDisabled
    tutorial-mysql-2
    CategoryItemConfiguration/Input
    Basic SettingsInstance Group Nametutorial-mysql-2
    MySQL SettingsEngine VersionUse the specified version
    MySQL Username{USER_NAME}
    MySQL Password{USER_PASSWORD}
    Instance Availability/NumberSingle (Primary Instance)
    Instance Typem2a.large
    Storage Type/Size100
    Log Storage Type/Size100
    Network SettingsVPCtutorial-vpc
    Subnettutorial-vpc_{VPC_ID}_sn_1 (10.0.16.0/20)
    Auto BackupAuto Backup OptionsDisabled

Create MySQL databases and tables

Connect to the MySQL client instance via SSH, then access the MySQL client to create the tables.

  1. On your local machine, open the terminal and navigate to the folder where the key pair file was downloaded.

    • If you created the private key file previously, it is typically saved in the Downloads folder. (cd ~/Downloads)
  2. Run the following command to connect to the MySQL client instance via SSH.

    chmod 400 ${PRIVATE_KEY}.pem # Grant read permission
    ssh -i ${PRIVATE_KEY}.pem ubuntu@${MYSQL_CLIENT_PUBLIC_IP}
    환경변수설명
    PRIVATE_KEY🖌Key File Name
    MYSQL_CLIENT_PUBLIC_IP🖌Check the public IP of the tutorial-mysql-client instance in the Virtual Machine > Instance tab
  3. Install MySQL client.

    Install MySQL client
    sudo apt-get install mysql-client
  4. Connect to tutorial-mysql-1 and tutorial-mysql-2 databases and create tables.

    tutorial-mysql-1
    1. Run the following command to connect to the MySQL database.

    mysql -u --host ${END_POINT} -P 3306 ${USER_NAME} -p"${USER_PASSWORD}"
    환경변수설명
    END_POINT🖌tutorial-mysql-1 endpoint address
    USER_NAME🖌tutorial-mysql-1 username
    USER_PASSWORD🖌tutorial-mysql-1 password


    1. Create the kakao_shop database and tables using SQL queries.

    -- Create database
    CREATE DATABASE kakao_shop;

    -- Select the database
    USE kakao_shop;

    -- Create products table
    CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT NOT NULL,
    category VARCHAR(50),
    seller VARCHAR(100)
    );

    -- Create orders table
    CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    product_id INT,
    quantity INT NOT NULL,
    order_date DATE,
    delivery_status ENUM('Processing', 'Shipped', 'Delivered') DEFAULT 'Processing'
    );
    tutorial-mysql-2
    1. Run the following command to connect to the MySQL database.

    mysql -u --host ${END_POINT} -P 3306 ${USER_NAME} -p"${USER_PASSWORD}"
    환경변수설명
    END_POINT🖌tutorial-mysql-2 Endpoint
    USER_NAME🖌tutorial-mysql-2 User name
    USER_PASSWORD🖌tutorial-mysql-2 Password


    1. Create the kakao_library database and table using SQL queries.

    -- Create a database
    CREATE DATABASE kakao_library;

    -- Select the database
    USE kakao_library;

    -- Create a book information table
    CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(100) NOT NULL,
    genre VARCHAR(50),
    publication_date DATE,
    isbn VARCHAR(20),
    available BOOLEAN DEFAULT TRUE
    );

    -- Create a member information table
    CREATE TABLE members (
    member_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    phone VARCHAR(20),
    address VARCHAR(255),
    member_since DATE
    );

Getting started

Data Catalog is a fully managed service that helps you identify and efficiently manage your organization and user data assets within KakaoCloud. It consists of a catalog, database, table, and crawler. To create a Data Catalog crawler, you need a catalog, database, and Object Storage Bucket.

Step 1. Create Data Catalog catalog

A catalog is a fully managed central repository within a VPC, and must be created first to use the Data Catalog service.

  1. Go to KakaoCloud Console > Analytics > Data Catalog menu.

  2. In the Catalog menu, click the [Create catalog] button on the right, and create a catalog as follows.

    CategoryItemSetting/Input Value
    Nametutorial_catalog
    VPC settingsVPCtutorial-vpc
    Subnetmain

Step 2. Create Object Storage bucket

An Object Storage Bucket is required to create a Data Catalog database.

  1. Go to KakaoCloud Console > Beyond Storage Service > Object Storage menu.

  2. In the Bucket menu, click the [Create bucket] button on the right, and create a bucket as follows.

    ClassificationSettings/Input Value
    TypeSTANDARD
    Nametutorial-bucket
    EncryptionNot used
  3. In the created bucket, click the [Create folder] button on the right, and create a folder as follows.

    ClassificationSettings/Input Value
    Folder Nametutorial-folder

Step 3. Create Data Catalog database

A database is a container that stores tables. To create a database, you must first create a catalog and an Object Storage Bucket.

  1. Go to KakaoCloud Console > Analytics > Data Catalog menu.

  2. In the Database menu, click the [Create database] button on the right, and create a database as follows.

    CategoryItemSetting/Input Value
    Catalogtutorial_catalog
    Nametutorial_database
    PathBucket Nametutorial-bucket
    Directorytutorial-folder

Step 4. Create Data Catalog crawler

Create two crawlers to extract two MySQL metadata.

  1. Go to KakaoCloud Console > Analytics > Data Catalog menu.

  2. Click the [Create crawler] button on the right side of the Crawler menu, and create a crawler as follows.

    tutorial_crawler_1
    CategoryItemSetting/Input Value
    Databasetutorial_database
    Crawler NameVPCtutorial_crawler_1
    MySQL Full PathMySQL to Connecttutorial-mysql-1
    MySQL Database Namekakao_shop
    MySQL AccountMySQL User ID{USER_ID}
    MySQL Password{USER_PASSWORD}
    Table Prefixtutorial_
    ScheduleOn-Demand
    tutorial_crawler_2
    CategoryItemSetting/Input Value
    Databasetutorial_database
    Crawler NameVPCtutorial_crawler_2
    MySQL Full PathMySQL to Connecttutorial-mysql-2
    MySQL Database Namekakao_library
    MySQL AccountMySQL User ID{USER_ID}
    MySQL Password{USER_PASSWORD}
    Table Prefixtutorial_
    ScheduleOn-Demand
info

A crawler can be created or run only when MySQL's status is Available.

info

When creating a crawler, you must successfully test the connection to the MySQL account entry to create the crawler.

Step 5. Run Data Catalog crawler and check results

  1. Select a crawler from the Crawler menu, then click the Run button on the upper right.
  2. You can check the extracted metadata information from the Table menu.

Table information

alt text

Schema information

alt text