Overview
Why This Matters
Build a compliance platform on DataHub: automate PII discovery with classification tags, track PII flow via lineage, enforce access policies, and generate audit reports. Covers GDPR Right to Erasure and CCPA data mapping.
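As a rough illustration of automated PII discovery, the sketch below matches column names against a few regular expressions and returns the classification tag each matching column would receive. The patterns, the `classify_columns` helper, and the tag names (`urn:li:tag:pii_email` and so on) are illustrative assumptions, not DataHub built-ins; a production setup would usually also inspect sampled column values rather than names alone.

```python
import re

# Hypothetical name-based patterns, checked in insertion order.
PII_PATTERNS = {
    "urn:li:tag:pii_email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "urn:li:tag:pii_phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "urn:li:tag:pii_national_id": re.compile(
        r"ssn|national[-_]?id|passport", re.IGNORECASE
    ),
}


def classify_columns(columns):
    """Map each column name to the first PII tag URN whose pattern matches.

    Columns that match no pattern are left out of the result.
    """
    tagged = {}
    for column in columns:
        for tag_urn, pattern in PII_PATTERNS.items():
            if pattern.search(column):
                tagged[column] = tag_urn
                break
    return tagged


if __name__ == "__main__":
    print(classify_columns(["user_email", "order_total", "contact_phone"]))
```

The returned mapping is the input a later step would need in order to attach tags to schema fields.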
Core Concepts
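A data compliance platform on DataHub rests on four metadata capabilities: classification tags that mark PII, lineage that shows where tagged data flows, access policies that restrict who can view or change it, and audit reports derived from that metadata. Understanding how these pieces fit together helps you implement effective metadata management.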
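Tracking PII flow via lineage amounts to a graph traversal: start at a dataset known to hold PII and walk its downstream edges. The sketch below uses a plain in-memory adjacency map in place of DataHub's lineage graph; the `downstream_of` map, the dataset names, and the `pii_reach` helper are hypothetical.

```python
from collections import deque


def pii_reach(downstream_of, source):
    """Return every dataset reachable downstream of `source` (breadth-first).

    `downstream_of` maps a dataset name to the list of its direct downstreams.
    The source itself is never included, even if the graph contains a cycle.
    """
    seen = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in downstream_of.get(current, []):
            if child not in seen and child != source:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    lineage = {
        "raw.users": ["staging.users"],
        "staging.users": ["marts.customers", "marts.churn_features"],
        "marts.customers": ["reports.weekly_kpis"],
    }
    print(sorted(pii_reach(lineage, "raw.users")))
```

The resulting set is the kind of data map a GDPR erasure request or a CCPA inventory would start from: every dataset that may hold copies of the source's personal data.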
Configuration
DataHub supports both UI-based and API-based configuration of compliance metadata. Most settings, such as tags and access policies, can be managed through the admin panel or programmatically via GraphQL.
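For the programmatic route, a tag can be attached to a column with a GraphQL mutation. The sketch below only builds the JSON request body and does not send it. It assumes the `addTag` mutation and the `TagAssociationInput` fields (`tagUrn`, `resourceUrn`, `subResource`, `subResourceType`) as found in recent DataHub GraphQL schemas, so check the schema of your deployed version. The `build_add_tag_request` helper and the example URNs are hypothetical.

```python
import json

# Assumed mutation shape; verify against your DataHub version's schema.
ADD_TAG_MUTATION = """
mutation addTag($input: TagAssociationInput!) {
  addTag(input: $input)
}
"""


def build_add_tag_request(dataset_urn, field_path, tag_urn):
    """Build the JSON body for tagging one schema field of a dataset."""
    return {
        "query": ADD_TAG_MUTATION,
        "variables": {
            "input": {
                "tagUrn": tag_urn,
                "resourceUrn": dataset_urn,
                "subResource": field_path,
                "subResourceType": "DATASET_FIELD",
            }
        },
    }


if __name__ == "__main__":
    body = build_add_tag_request(
        "urn:li:dataset:(urn:li:dataPlatform:postgres,shop.public.users,PROD)",
        "email",
        "urn:li:tag:pii_email",
    )
    # POST this body to /api/graphql with Content-Type: application/json.
    print(json.dumps(body, indent=2))
```

Passing the input as GraphQL variables avoids hand-escaping URNs inside the query string.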
Integration
The compliance platform builds on DataHub's ingestion framework, search index, and event system. Changes to tags, lineage, and policies are propagated across the platform automatically.
Automation
Use DataHub Actions to automate compliance workflows. Trigger actions on metadata changes, schedule periodic checks, and integrate with external systems.
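The core of such an automation is a handler that inspects each change event and decides whether to act. The sketch below is plain Python rather than a DataHub Actions plugin. It assumes events arrive as dictionaries with `category`, `operation`, `entityUrn`, and `modifier` keys, loosely modeled on the Entity Change Event, and that PII tags share a hypothetical `urn:li:tag:pii_` prefix. In a real deployment this logic would sit inside a custom action's event callback.

```python
PII_TAG_PREFIX = "urn:li:tag:pii_"  # hypothetical naming convention


def needs_access_review(event):
    """Return True when the event records a PII tag being added to an entity."""
    return (
        event.get("category") == "TAG"
        and event.get("operation") == "ADD"
        and str(event.get("modifier", "")).startswith(PII_TAG_PREFIX)
    )


def handle(events):
    """Collect the URNs of entities that just gained a PII tag."""
    return [e["entityUrn"] for e in events if needs_access_review(e)]


if __name__ == "__main__":
    sample = [
        {
            "category": "TAG",
            "operation": "ADD",
            "entityUrn": "urn:li:dataset:(...)",
            "modifier": "urn:li:tag:pii_email",
        },
        {
            "category": "OWNER",
            "operation": "ADD",
            "entityUrn": "urn:li:dataset:(...)",
            "modifier": "urn:li:corpuser:jdoe",
        },
    ]
    print(handle(sample))
```

Keeping the decision in a pure function makes it straightforward to unit-test before wiring it to a live event stream.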
Monitoring
Track usage and effectiveness through DataHub's analytics. Monitor adoption metrics, classification coverage (for example, the share of datasets whose PII tags have been reviewed), and compliance with organizational standards.
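Coverage can be reduced to a simple ratio. The sketch below computes the share of datasets whose PII classification has been reviewed, given a hypothetical inventory of `(dataset, reviewed)` pairs; how that inventory is exported from DataHub is left open here.

```python
def classification_coverage(inventory):
    """Return the fraction of datasets whose PII classification was reviewed.

    `inventory` is a list of (dataset_name, is_reviewed) pairs.
    An empty inventory yields 0.0 rather than raising a division error.
    """
    if not inventory:
        return 0.0
    reviewed = sum(1 for _, is_reviewed in inventory if is_reviewed)
    return reviewed / len(inventory)


if __name__ == "__main__":
    inventory = [
        ("raw.users", True),
        ("staging.users", True),
        ("marts.customers", False),
        ("reports.weekly_kpis", False),
    ]
    print(f"coverage: {classification_coverage(inventory):.0%}")
```

Recording this number on a regular schedule gives a trend line rather than a single snapshot.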
How It Works
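# Update a dataset aspect via the DataHub CLI
# (-d expects a path to a JSON file holding the aspect payload)
echo '{"description": "Configured via CLI"}' > dataset_properties.json
datahub put --urn "urn:li:dataset:(...)" \
--aspect "datasetProperties" \
-d dataset_properties.json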
# Or via the Python SDK
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter("http://localhost:8080")
# Wrap the aspect in a Metadata Change Proposal and emit it
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn="urn:li:dataset:(...)",
        aspect=DatasetPropertiesClass(description="Updated via SDK"),
    )
)
Architecture Integration
When compliance metadata such as a PII tag is updated, DataHub's metadata service persists the change and publishes a Metadata Change Log (MCL) event to Kafka (older releases used MCE and MAE events for this). Downstream consumers update the search index (Elasticsearch) and the graph index, so all views stay consistent in near real time.
Hands-On Tutorial
# Step 1: Verify DataHub is running
curl -s http://localhost:8080/config | python3 -m json.tool
# Step 2: Update the dataset via GraphQL
# (updateDataset returns a Dataset object, so a selection set such as { urn } is required)
curl -X POST http://localhost:8080/api/graphql \
-H "Content-Type: application/json" \
-d '{"query": "mutation { updateDataset(urn: \"urn:li:dataset:(...)\", input: {}) { urn } }"}'
# Step 3: Verify in the UI
# Navigate to http://localhost:9002 and check the entity page
Best Practices
- Start small: Begin with your most critical data assets and expand
- Automate: Use ingestion recipes and Actions for consistency
- Measure: Track coverage and adoption metrics weekly
- Iterate: Gather feedback from data consumers and improve
- Document: Maintain runbooks for common compliance operations, such as handling an erasure request
Practice Problems
Practice 1
Design a data compliance strategy for a data team with 500 datasets across 8 databases. What do you prioritize? How do you measure success?
Practice 2
A new data engineer joins your team and needs to understand how the data compliance platform works in DataHub. Create a 30-minute onboarding guide covering the essentials.
Practice 3
Your organization's adoption of the data compliance platform is at 30% after 3 months. Identify potential blockers and design an adoption acceleration plan.
Quick Reference
| Feature | Access | Notes |
|---|---|---|
| UI Configuration | DataHub web UI (http://localhost:9002) | Point-and-click setup |
| GraphQL API | POST /api/graphql | Programmatic access |
| Python SDK | pip install acryl-datahub | High-level client |
| CLI | datahub put / datahub get | Command-line operations |
| Actions | Event-driven triggers | Automation framework |