Using Information Schema in Unity Catalog

Using Information Schema in Unity Catalog

ยท

2 min read

In this article, we will take a quick look at the information schema and some of the analytics that can be done using it. For reference please note the hierarchy of objects in the Unity Catalog:

  • Metastore: top-level container for metadata and exposes a three-level namespace ( catalog.schema.table ) to organize your data.

  • Catalog: the first object layer in the metastore. A metastore can have many catalogs.

  • Schema: the second layer in the metastore. A catalog can have many schemas.

  • Table: the last layer in the namespace. Tables can be external or managed and a schema can have many tables.

The information schema is provided with each catalog in the metastore, with the exception to the hive_metastore catalog. Each information_schema provided will contain information for the catalog that it is associated with.

Some of the handy Examples I have used recently. Please note the queries can be used in the notebooks and Databricks SQL Warehouse (previously know as SQL Endpoint)

List the table details in the catalog

SELECT table_catalog, table_schema, table_name, table_type,created_by 
FROM INFORMATION_SCHEMA.tables 
WHERE table_schema = 'your_catalog_name';

Retrieve information about the columns in a specific table

SELECT column_name, data_type
FROM INFORMATION_SCHEMA.columns
WHERE table_schema = 'your_catalog_name'
AND table_name = 'your_table_name';

Scan the schema to identify tables with specific column names

SET var.col_string = your_column_name ;-- use the sql variable in a like statement 
SELECT * FROM columns WHERE column_name LIKE '%${var.col_string}%'

Table Relationships

You can filter the same using the where clause to identify relationships to the primary table.


SELECT DISTINCT p.table_catalog as ptable_catalog -- table catalog
  , p.table_schema as ptable_schemav --table schema
  , p.table_name as ptable_name --primary table name
  , s.table_schema as stable_schema --schema
  , s.table_name as stable_name -- secondary table name
  , rc.constraint_name -- foreign key
  , pc.column_name as constraint_column_name
  , rc.unique_constraint_name -- primary key
  , sc.column_name as unique_constraint_column_name
FROM referential_constraints rc 
INNER JOIN table_constraints p on p.constraint_name =  rc.constraint_name 
INNER JOIN table_constraints s on s.constraint_name =  rc.unique_constraint_name 
INNER JOIN constraint_column_usage pc on pc.constraint_name = rc.constraint_name 
INNER JOIN constraint_column_usage sc on sc.constraint_name = rc.unique_constraint_name 
;

Remember to replace 'your_catalog_name' , 'your_table_name' and 'your_column_name' with the actual names you're interested in. The INFORMATION_SCHEMA is a valuable tool for understanding the structure and metadata of your database objects within the Unity Catalog, making it easier to work with and analyze your data.

You can refer to the Entity relationship diagram for the tables in the Information Schema in the following article Databricks SQL Information Schema. Using this you can use relevant tables to make more queries as per your requirement.

Did you find this article valuable?

Support Nitin Khattar by becoming a sponsor. Any amount is appreciated!

ย