Data Catalog Type
This document explains how to call procedures for registering, deleting, and synchronizing partitions in Data Catalog-based tables.
Preparing a partitioned table
Create a table that includes partition columns and configure it to store data in an external S3 path.
For details, refer to the Create Table Guide.
Creating an empty partition
Creates an empty partition that exists only in the metastore and does not contain actual data.
Overview
CALL data_source_name.system.create_empty_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values']);
Parameters
schema_name
- The name of the database schema where the partition will be created.
table_name
- The name of the table to which the partition will be added.
partition_columns
- An array of column names that define the partition.
These must match the columns specified inpartitioned_by
when creating the partitioned table.
partition_values
- An array of values corresponding to the
partition_columns
.
The number and order of values must exactly match thepartition_columns
.
Example
CALL my_data_source.system.create_empty_partition(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey'],
partition_values => ARRAY['1']);
Register partition
Registers a specific partition in the metastore for a given path in S3.
Overview
CALL data_source_name.system.register_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values'],
[ location => '<S3 path>' ]);
Parameters
[ location => 'S3 path' ]
- Specifies the actual S3 path where the partition data is stored.
If not specified, the default partition path (a subdirectory under the table location) will be automatically assigned.
Example
CALL my_data_source.system.register_partition(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['3', '3'],
location => 's3a://test-bucket/tpch/nation_test/nationkey=3/regionkey=3');
Partition deletion
Deletes specified partition metadata from the metastore.
Note: The actual data stored in S3 is not deleted.
Overview
CALL data_source_name.system.unregister_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values']);
Example
CALL my_data_source.system.unregister_partition(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['2', '2']);
Partition Synchronization
Compares partition information between the S3 path and the metastore, and updates any differences (additions, deletions, or full sync).
Overview
CALL data_source_name.system.sync_partition_metadata(
schema_name => 'schema_name',
table_name => 'table_name',
mode => [ ADD | DROP | FULL ],
[ case_sensitive => [ TRUE | FALSE ] ]);
Parameters
MODE
- Specifies the synchronization mode. Choose one of the following:
ADD
: Adds partitions that exist in S3 but not in the metastore.DROP
: Removes partitions that exist in the metastore but not in S3.FULL
: Performs both ADD and DROP operations.
[ CASE_SENSITIVE ]
- Determines whether to treat directory and partition names as case-sensitive.
TRUE
: Default value. Case-sensitive comparison.FALSE
: Case-insensitive comparison.
Example
CALL my_data_source.system.sync_partition_metadata(
schema_name => 'tpch',
table_name => 'nation_test',
mode => 'FULL',
case_sensitive => true);
Partition listing
You can query an internal system table to retrieve the list of partitions registered in a table.
Overview
SELECT * FROM "table_name$partitions";
Example
SELECT * FROM "nation_test$partitions";
Metadata Cache Flush
Flushes the cache of table or partition metadata stored in the metastore to reflect the latest state.
Overview
CALL data_source_name.system.flush_metadata_cache(
[ schema_name => 'schema_name' ],
[ table_name => 'table_name' ],
[ partition_columns => ARRAY['partition_columns'] ],
[ partition_values => ARRAY['partition_values'] ]);
Examples
-- Flush all metadata cache
CALL my_data_source.system.flush_metadata_cache();
-- Flush cache for a specific table
CALL my_data_source.system.flush_metadata_cache(
schema_name => 'tpch',
table_name => 'nation_test');
-- Flush cache for a specific partition
CALL my_data_source.system.flush_metadata_cache(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['2', '2']);