Unity Catalog Overview
Why This Matters
The Problem: Without centralized governance, organizations face data silos, inconsistent permissions, no audit trail, and no visibility into how data flows through their platform.
The Solution: Unity Catalog provides a single governance layer across all Databricks workspaces -- unified permissions, data lineage, auditing, and data discovery in one place.
Real Impact: Unity Catalog enables organizations to comply with GDPR, HIPAA, and SOC 2 while giving data teams self-service access to the data they need.
Real-World Analogy
Think of Unity Catalog as a library catalog system:
- Metastore = The entire library system (spans multiple branches/workspaces)
- Catalog = A section of the library (e.g., "Science", "History")
- Schema = A shelf within a section (e.g., "Physics", "Biology")
- Table/View = An individual book on the shelf
- Permissions = Library cards that control who can access what
Three-Level Namespace
Unity Catalog organizes all data assets into a three-level hierarchy: catalog.schema.table. This replaces the old Hive metastore's two-level database.table pattern and adds a top-level catalog for multi-environment or multi-team organization.
-- Create a catalog
CREATE CATALOG IF NOT EXISTS production;
-- Create schemas within the catalog
CREATE SCHEMA IF NOT EXISTS production.sales;
CREATE SCHEMA IF NOT EXISTS production.analytics;
CREATE SCHEMA IF NOT EXISTS production.ml_features;
-- Create a managed table
CREATE TABLE production.sales.orders (
order_id BIGINT,
customer_id BIGINT,
product STRING,
amount DECIMAL(12,2),
order_date DATE
) USING DELTA
COMMENT 'Production order data from e-commerce platform';
-- Create a view
CREATE VIEW production.analytics.daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM production.sales.orders
GROUP BY order_date;
-- Set default catalog and schema
USE CATALOG production;
USE SCHEMA sales;
-- Now you can reference just by table name
SELECT * FROM orders LIMIT 10;
GRANT & REVOKE Permissions
Unity Catalog uses a hierarchical permission model. Permissions granted at the catalog level cascade to all schemas and tables within it. You can grant privileges to users, groups, or service principals.
-- Grant catalog-level permissions
GRANT USE CATALOG ON CATALOG production TO `[email protected]`;
GRANT CREATE SCHEMA ON CATALOG production TO `[email protected]`;
-- Grant schema-level permissions
GRANT USE SCHEMA ON SCHEMA production.sales TO `[email protected]`;
GRANT SELECT ON SCHEMA production.sales TO `[email protected]`;
-- Grant table-level permissions
GRANT SELECT ON TABLE production.sales.orders TO `[email protected]`;
GRANT MODIFY ON TABLE production.sales.orders TO `etl-service-principal`;
GRANT ALL PRIVILEGES ON TABLE production.sales.orders TO `[email protected]`;
-- Revoke permissions
REVOKE SELECT ON TABLE production.sales.orders FROM `[email protected]`;
-- View grants on a table
SHOW GRANTS ON TABLE production.sales.orders;
-- View grants for a specific principal
SHOW GRANTS TO `[email protected]`;
-- Row-level security with row filters
ALTER TABLE production.sales.orders
SET ROW FILTER region_filter ON (region);
-- Column masking for sensitive data
ALTER TABLE production.sales.customers
ALTER COLUMN email SET MASK mask_email;
| Privilege | Applies To | Description |
|---|---|---|
| USE CATALOG | Catalog | Access objects within the catalog |
| USE SCHEMA | Schema | Access objects within the schema |
| SELECT | Table, View | Read data from the object |
| MODIFY | Table | INSERT, UPDATE, DELETE, MERGE |
| CREATE TABLE | Schema | Create new tables in the schema |
| CREATE SCHEMA | Catalog | Create new schemas in the catalog |
| ALL PRIVILEGES | Any | Grant all applicable permissions |
| OWNERSHIP | Any | Full control including GRANT/REVOKE |
Data Lineage
Unity Catalog automatically tracks column-level lineage -- how data flows from source tables through transformations to downstream tables and dashboards. This is critical for impact analysis, debugging, and compliance.
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Get lineage for a specific table
lineage = w.data_lineage.get_table_lineage(
table_name="production.analytics.daily_revenue"
)
# Show upstream dependencies
print("Upstream tables:")
for dep in lineage.upstream_tables:
print(f" - {dep.catalog_name}.{dep.schema_name}.{dep.name}")
# Show downstream consumers
print("Downstream tables:")
for dep in lineage.downstream_tables:
print(f" - {dep.catalog_name}.{dep.schema_name}.{dep.name}")
# Column-level lineage
col_lineage = w.data_lineage.get_column_lineage(
table_name="production.analytics.daily_revenue",
column_name="revenue"
)
for col in col_lineage.upstream_columns:
print(f" revenue derived from: {col.table_name}.{col.name}")
External Locations
External locations connect Unity Catalog to cloud storage paths outside the managed metastore storage. This lets you govern data that already exists in S3, ADLS, or GCS without moving it.
-- Create a storage credential (links to cloud IAM)
CREATE STORAGE CREDENTIAL my_adls_credential
WITH MANAGED IDENTITY '<managed-identity-id>';
-- Create an external location
CREATE EXTERNAL LOCATION raw_data_lake
URL 'abfss://[email protected]/'
WITH (STORAGE CREDENTIAL my_adls_credential)
COMMENT 'Raw data lake in ADLS Gen2';
-- Grant access to the external location
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION raw_data_lake
TO `[email protected]`;
-- Create an external table pointing to existing data
CREATE TABLE production.sales.legacy_orders
USING DELTA
LOCATION 'abfss://[email protected]/orders/';
-- List external locations
SHOW EXTERNAL LOCATIONS;
Delta Sharing
Delta Sharing is an open protocol for secure, real-time data sharing across organizations, clouds, and platforms -- without copying data. Recipients can be Databricks users or use any client that supports the protocol (pandas, Spark, Power BI, Tableau).
-- Create a share (a named collection of tables)
CREATE SHARE customer_analytics;
-- Add tables to the share
ALTER SHARE customer_analytics
ADD TABLE production.analytics.daily_revenue;
ALTER SHARE customer_analytics
ADD TABLE production.sales.orders
PARTITION (order_date >= '2024-01-01'); -- Share only recent data
-- Create a recipient (external organization)
CREATE RECIPIENT partner_company
USING ID 'partner-sharing-id';
-- Grant the share to the recipient
GRANT SELECT ON SHARE customer_analytics TO RECIPIENT partner_company;
-- View share details
SHOW ALL IN SHARE customer_analytics;
-- As a recipient: create a catalog from a share
CREATE CATALOG partner_data
USING SHARE provider_org.customer_analytics;
Practice Problems
Problem 1: Design a Multi-Team Governance Model
MediumYour company has three teams: data engineering (builds pipelines), data science (trains models), and analytics (builds dashboards). Design a Unity Catalog structure with appropriate catalogs, schemas, and permissions so that: (1) Data engineers can write to all schemas, (2) Data scientists can read all data and write to ML feature tables, (3) Analysts can only read from curated gold tables.
Problem 2: Set Up Cross-Organization Data Sharing
MediumYour company needs to share daily revenue data with a partner organization. Set up Delta Sharing so that the partner can only see revenue data from the last 90 days, with no access to customer PII.
Problem 3: Audit and Lineage Investigation
HardYour compliance team reports that customer PII might be exposed in a downstream analytics dashboard. Use Unity Catalog lineage and audit logs to: (1) Trace which tables feed into the dashboard, (2) Identify if any PII columns are included, (3) Add column masking to protect the data.
Quick Reference
| Concept | Command | Key Detail |
|---|---|---|
| Create Catalog | CREATE CATALOG name | Top-level container for schemas |
| Create Schema | CREATE SCHEMA catalog.name | Container for tables and views |
| Grant Permissions | GRANT priv ON obj TO principal | Cascading permission model |
| Revoke Permissions | REVOKE priv ON obj FROM principal | Remove specific access |
| External Location | CREATE EXTERNAL LOCATION | Connect cloud storage to UC |
| Delta Sharing | CREATE SHARE + GRANT TO RECIPIENT | Cross-org data sharing |
| Column Masking | ALTER COLUMN SET MASK func | Dynamic data masking |
| Row Filters | ALTER TABLE SET ROW FILTER | Row-level security |