Extracting MySQL metadata using the data catalog crawler
By utilizing the KakaoCloud Data Catalog crawler, you can extract and automatically update metadata from KakaoCloud MySQL. This document explains the process of extracting metadata from two MySQL databases using the Data Catalog crawler.
- Estimated time: 20 minutes
- User environment
- Recommended OS: MacOS, Ubuntu
- Region: kr-central-2
- Prerequisites:
About this scenario
In this scenario, we will explain the process of using the KakaoCloud Data Catalog crawler to extract metadata from two MySQL databases and manage it in the Data Catalog database. We will create the kakao_shop
table in the MySQL-1 database and the kakao_library
table in the MySQL-2 database, then use the Data Catalog crawler to extract metadata.
The main steps include:
- VPC and network setup: Creating a VPC and subnets
- Creating managed MySQL databases: Creating two KakaoCloud MySQL databases
- Creating a MySQL client instance: Creating a MySQL client instance and using it to create tables
- Creating and running the Data Catalog crawler: Using the crawler to scan MySQL data and extract metadata
Before you start
In this step, we will set up the environment necessary for the tutorial.
We will confirm the information below to create the VPC, subnets, security group, MySQL client instance, and MySQL.
Create VPC and subnets
Create the VPC and subnets.
-
Go to the KakaoCloud Console > Beyond Networking Service > VPC menu.
-
Click the [Create VPC] button on the right, then create a VPC and subnets as follows:
Category Item Configuration/Input VPC Information VPC Name tutorial-vpc VPC IP CIDR Block 10.0.0.0/16 Availability Zone Number of Availability Zones 1 First AZ kr-central-2-a Subnet settings Number of public subnets per AZ 1 Number of private subnets per AZ 1 kr-central-2-a - Public Subnet IPv4 CIDR Block: 10.0.0.0/20
- Private Subnet IPv4 CIDR Block: 10.0.16.0/20
- Public Subnet IPv4 CIDR Block: 10.0.0.0/20
-
After confirming the generated topology at the bottom, click the [Create] button if everything looks good.
- The status of the subnet changes in the following order:
Pending Create
>Pending Update
>Active
. You can proceed to the next step once it reaches theActive
status. - It may take 5 to 10 minutes for the creation process to complete.
- The status of the subnet changes in the following order:
Configure security group
To enhance the security of the VM when it runs in the VPC, create a security group and add inbound policies as follows.
-
Go to the KakaoCloud Console > VPC menu, then select Security Group.
-
Click the [+ Create Security Group] button on the right, then configure the security group as follows:
-
Enter the security group name and description as below:
Security Group Name Security Group Description (optional) tutorial-mysql-client-sg Security policy for MySQL client instance -
Click the [+ Add] button at the bottom, set the inbound rule conditions as shown below, then click [Apply].
Check my public IPClick the button below to check your current public IP.
Protocol Source Port Number Policy Description TCP
{Your Public IP}/32
22 mysql-client inbound policy
-
Create and set up the MySQL client instance
After installing the MySQL client on the MySQL client instance, you will be able to access the KakaoCloud MySQL resource.
-
Go to the KakaoCloud Console > Beyond Compute Service > Virtual Machine menu.
-
In the Instance menu, click the [Create instance] button, then create a VM instance as follows:
Category Item Configuration/Input Remarks Basic Info Name tutorial-mysql-client Number 1 Image Ubuntu 20.04 - 5.4.0-173 ⚠️ Select a regular image, not an NVIDIA image Instance Type m2a.large Volume Root Volume 10 Key Pair {USER_KEYPAIR}
⚠️ Safely store the key pair during the first use.
Lost keys cannot be recovered and must be reissued.Network VPC tutorial-vpc Subnet main (10.0.0.0/20) Security Group tutorial-mysql-client-sg
-
In the Virtual Machine > Instance menu, select the
tutorial-mysql-client
instance you created, then click the [Associate public IP] button in the Instance Actions panel on the right.-
The associated public IP can be confirmed in the Beyond Networking Service > VPC > Public IP tab.
-
Create managed MySQL database
KakaoCloud MySQL is a fully managed database service provided in a VPC environment. To extract metadata using the Data Catalog crawler, we will create two KakaoCloud MySQL databases.
-
Go to the KakaoCloud Console > Data Store > MySQL menu.
-
In the Instance Group tab, click the [Create instance group] button, then Create two instance groups.
tutorial-mysql-1
Category Item Configuration/Input Basic Settings Instance Group Name tutorial-mysql-1 MySQL Settings Engine Version Use the specified version MySQL Username {USER_NAME}
MySQL Password {USER_PASSWORD}
Instance Availability/Number Single (Primary Instance) Instance Type m2a.large Storage Type/Size 100 Log Storage Type/Size 100 Network Settings VPC tutorial-vpc Subnet tutorial-vpc_{VPC_ID}_sn_1 (10.0.16.0/20)
Auto Backup Auto Backup Options Disabled tutorial-mysql-2
Category Item Configuration/Input Basic Settings Instance Group Name tutorial-mysql-2 MySQL Settings Engine Version Use the specified version MySQL Username {USER_NAME}
MySQL Password {USER_PASSWORD}
Instance Availability/Number Single (Primary Instance) Instance Type m2a.large Storage Type/Size 100 Log Storage Type/Size 100 Network Settings VPC tutorial-vpc Subnet tutorial-vpc_{VPC_ID}_sn_1 (10.0.16.0/20)
Auto Backup Auto Backup Options Disabled
Create MySQL databases and tables
Connect to the MySQL client instance via SSH, then access the MySQL client to create the tables.
-
On your local machine, open the terminal and navigate to the folder where the key pair file was downloaded.
- If you created the private key file previously, it is typically saved in the Downloads folder. (cd ~/Downloads)
-
Run the following command to connect to the MySQL client instance via SSH.
chmod 400 ${PRIVATE_KEY}.pem # Grant read permission
ssh -i ${PRIVATE_KEY}.pem ubuntu@${MYSQL_CLIENT_PUBLIC_IP}환경변수 설명 PRIVATE_KEY🖌︎ Key File Name MYSQL_CLIENT_PUBLIC_IP🖌︎ Check the public IP of the tutorial-mysql-client instance in the Virtual Machine > Instance tab -
Install MySQL client.
Install MySQL clientsudo apt-get install mysql-client
-
Connect to tutorial-mysql-1 and tutorial-mysql-2 databases and create tables.
tutorial-mysql-1
- Run the following command to connect to the MySQL database.
mysql -u --host ${END_POINT} -P 3306 ${USER_NAME} -p"${USER_PASSWORD}"
환경변수 설명 END_POINT🖌︎ tutorial-mysql-1 endpoint address USER_NAME🖌︎ tutorial-mysql-1 username USER_PASSWORD🖌︎ tutorial-mysql-1 password - Create the
kakao_shop
database and tables using SQL queries.
-- Create database
CREATE DATABASE kakao_shop;
-- Select the database
USE kakao_shop;
-- Create products table
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
stock_quantity INT NOT NULL,
category VARCHAR(50),
seller VARCHAR(100)
);
-- Create orders table
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
product_id INT,
quantity INT NOT NULL,
order_date DATE,
delivery_status ENUM('Processing', 'Shipped', 'Delivered') DEFAULT 'Processing'
);tutorial-mysql-2
- Run the following command to connect to the MySQL database.
mysql -u --host ${END_POINT} -P 3306 ${USER_NAME} -p"${USER_PASSWORD}"
환경변수 설명 END_POINT🖌︎ tutorial-mysql-2 Endpoint USER_NAME🖌︎ tutorial-mysql-2 User name USER_PASSWORD🖌︎ tutorial-mysql-2 Password - Create the kakao_library database and table using SQL queries.
-- Create a database
CREATE DATABASE kakao_library;
-- Select the database
USE kakao_library;
-- Create a book information table
CREATE TABLE books (
book_id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
author VARCHAR(100) NOT NULL,
genre VARCHAR(50),
publication_date DATE,
isbn VARCHAR(20),
available BOOLEAN DEFAULT TRUE
);
-- Create a member information table
CREATE TABLE members (
member_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
phone VARCHAR(20),
address VARCHAR(255),
member_since DATE
); - Run the following command to connect to the MySQL database.
Getting started
Data Catalog is a fully managed service that helps you identify and efficiently manage your organization and user data assets within KakaoCloud. It consists of a catalog, database, table, and crawler. To create a Data Catalog crawler, you need a catalog, database, and Object Storage Bucket.
Step 1. Create Data Catalog catalog
A catalog is a fully managed central repository within a VPC, and must be created first to use the Data Catalog service.
-
Go to KakaoCloud Console > Analytics > Data Catalog menu.
-
In the Catalog menu, click the [Create catalog] button on the right, and create a catalog as follows.
Category Item Setting/Input Value Name tutorial_catalog VPC settings VPC tutorial-vpc Subnet main
Step 2. Create Object Storage bucket
An Object Storage Bucket is required to create a Data Catalog database.
-
Go to KakaoCloud Console > Beyond Storage Service > Object Storage menu.
-
In the Bucket menu, click the [Create bucket] button on the right, and create a bucket as follows.
Classification Settings/Input Value Type STANDARD Name tutorial-bucket
Encryption Not used -
In the created bucket, click the [Create folder] button on the right, and create a folder as follows.
Classification Settings/Input Value Folder Name tutorial-folder
Step 3. Create Data Catalog database
A database is a container that stores tables. To create a database, you must first create a catalog and an Object Storage Bucket.
-
Go to KakaoCloud Console > Analytics > Data Catalog menu.
-
In the Database menu, click the [Create database] button on the right, and create a database as follows.
Category Item Setting/Input Value Catalog tutorial_catalog Name tutorial_database
Path Bucket Name tutorial-bucket
Directory tutorial-folder
Step 4. Create Data Catalog crawler
Create two crawlers to extract two MySQL metadata.
-
Go to KakaoCloud Console > Analytics > Data Catalog menu.
-
Click the [Create crawler] button on the right side of the Crawler menu, and create a crawler as follows.
tutorial_crawler_1
Category Item Setting/Input Value Database tutorial_database Crawler Name VPC tutorial_crawler_1 MySQL Full Path MySQL to Connect tutorial-mysql-1
MySQL Database Name kakao_shop
MySQL Account MySQL User ID {USER_ID}
MySQL Password {USER_PASSWORD}
Table Prefix tutorial_ Schedule On-Demand tutorial_crawler_2
Category Item Setting/Input Value Database tutorial_database Crawler Name VPC tutorial_crawler_2 MySQL Full Path MySQL to Connect tutorial-mysql-2
MySQL Database Name kakao_library
MySQL Account MySQL User ID {USER_ID}
MySQL Password {USER_PASSWORD}
Table Prefix tutorial_ Schedule On-Demand
A crawler can be created or run only when MySQL's status is Available
.
When creating a crawler, you must successfully test the connection to the MySQL account entry to create the crawler.
Step 5. Run Data Catalog crawler and check results
- Select a crawler from the Crawler menu, then click the Run button on the upper right.
- You can check the extracted metadata information from the Table menu.