Skip to main content

Data Catalog Type

This document explains how to call procedures for registering, deleting, and synchronizing partitions in Data Catalog-based tables.

Preparing a partitioned table

Create a table that includes partition columns and configure it to store data in an external S3 path.
For details, refer to the Create Table Guide.

Creating an empty partition

Creates an empty partition that exists only in the metastore and does not contain actual data.

Overview

CALL data_source_name.system.create_empty_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values']);

Parameters

schema_name
  • The name of the database schema where the partition will be created.
table_name
  • The name of the table to which the partition will be added.
partition_columns
  • An array of column names that define the partition.
    These must match the columns specified in partitioned_by when creating the partitioned table.
partition_values
  • An array of values corresponding to the partition_columns.
    The number and order of values must exactly match the partition_columns.

Example

Create Empty Partition Example
CALL my_data_source.system.create_empty_partition( 
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey'],
partition_values => ARRAY['1']);

Register partition

Registers a specific partition in the metastore for a given path in S3.

Overview

CALL data_source_name.system.register_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values'],
[ location => '<S3 path>' ]);

Parameters

[ location => 'S3 path' ]
  • Specifies the actual S3 path where the partition data is stored.
    If not specified, the default partition path (a subdirectory under the table location) will be automatically assigned.

Example

Register Partition Example
CALL my_data_source.system.register_partition(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['3', '3'],
location => 's3a://test-bucket/tpch/nation_test/nationkey=3/regionkey=3');

Partition deletion

Deletes specified partition metadata from the metastore.
Note: The actual data stored in S3 is not deleted.

Overview

CALL data_source_name.system.unregister_partition(
schema_name => 'schema_name',
table_name => 'table_name',
partition_columns => ARRAY['partition_columns'],
partition_values => ARRAY['partition_values']);

Example

Partition Deletion Example
CALL my_data_source.system.unregister_partition(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['2', '2']);

Partition Synchronization

Compares partition information between the S3 path and the metastore, and updates any differences (additions, deletions, or full sync).

Overview

CALL data_source_name.system.sync_partition_metadata(
schema_name => 'schema_name',
table_name => 'table_name',
mode => [ ADD | DROP | FULL ],
[ case_sensitive => [ TRUE | FALSE ] ]);

Parameters

MODE
  • Specifies the synchronization mode. Choose one of the following:
    • ADD: Adds partitions that exist in S3 but not in the metastore.
    • DROP: Removes partitions that exist in the metastore but not in S3.
    • FULL: Performs both ADD and DROP operations.
[ CASE_SENSITIVE ]
  • Determines whether to treat directory and partition names as case-sensitive.
    • TRUE: Default value. Case-sensitive comparison.
    • FALSE: Case-insensitive comparison.

Example

Partition Synchronization Example
CALL my_data_source.system.sync_partition_metadata(
schema_name => 'tpch',
table_name => 'nation_test',
mode => 'FULL',
case_sensitive => true);

Partition listing

You can query an internal system table to retrieve the list of partitions registered in a table.

Overview

SELECT * FROM "table_name$partitions";

Example

Partition listing Example
SELECT * FROM "nation_test$partitions";

Metadata Cache Flush

Flushes the cache of table or partition metadata stored in the metastore to reflect the latest state.

Overview

CALL data_source_name.system.flush_metadata_cache(
[ schema_name => 'schema_name' ],
[ table_name => 'table_name' ],
[ partition_columns => ARRAY['partition_columns'] ],
[ partition_values => ARRAY['partition_values'] ]);

Examples

-- Flush all metadata cache
CALL my_data_source.system.flush_metadata_cache();

-- Flush cache for a specific table
CALL my_data_source.system.flush_metadata_cache(
schema_name => 'tpch',
table_name => 'nation_test');

-- Flush cache for a specific partition
CALL my_data_source.system.flush_metadata_cache(
schema_name => 'tpch',
table_name => 'nation_test',
partition_columns => ARRAY['nationkey', 'regionkey'],
partition_values => ARRAY['2', '2']);