GraphQL & REST APIs

Medium 25 min read

Overview

API-First Platform

DataHub is built API-first. Everything you can do in the UI can be done via the GraphQL API (primary) or REST API (OpenAPI). This enables automation, CI/CD integration, and custom tooling.

Core Concepts

GraphQL API

Primary API for querying and mutating metadata. Supports search, entity CRUD, lineage traversal. Available at /api/graphql.
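
As a rough sketch of what a call to /api/graphql looks like on the wire, the snippet below builds a POST request whose JSON body carries a `query` string and a `variables` object. It uses only the Python standard library. The host `http://localhost:8080` is a placeholder for a local deployment, and `build_graphql_request` is a hypothetical helper name, not part of DataHub.

```python
import json
import urllib.request


def build_graphql_request(base_url, query, variables=None):
    """Build (but do not send) a POST request for the GraphQL endpoint."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(
    "http://localhost:8080",  # placeholder host, adjust for your deployment
    'query { search(input: { type: DATASET, query: "revenue", start: 0, count: 10 }) { total } }',
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything touches the server.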

REST (OpenAPI)

RESTful endpoints for entity operations. Swagger docs at /openapi/swagger-ui.

Python SDK

High-level Python client wrapping both APIs. Install via pip install acryl-datahub.

Authentication

Token-based auth with personal access tokens (PATs), plus OIDC for single sign-on. A token carries the permissions of the user it was issued to.
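
A token is normally sent as a bearer token in the `Authorization` header of each request. The helper below is a minimal sketch of that idea: `auth_headers` and the environment variable name `DATAHUB_TOKEN` are assumptions made for this example, so substitute whatever your deployment uses.

```python
import os


def auth_headers(token=None):
    """Return HTTP headers, adding a bearer token when one is available."""
    # DATAHUB_TOKEN is a hypothetical variable name chosen for this sketch.
    token = token or os.environ.get("DATAHUB_TOKEN")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


print(sorted(auth_headers("example-token")))
```

Reading the token from the environment keeps it out of source control, which matters because the token grants the same access as its owner.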

How It Works

GraphQL Queries
# Search for datasets
query { search(input: { type: DATASET, query: "revenue", start: 0, count: 10 }) {
    total searchResults { entity { urn type ... on Dataset { name } } }
} }

# Get dataset with lineage
query { dataset(urn: "urn:li:dataset:(...)") {
    name properties { description }
    ownership { owners { owner { ... on CorpUser { urn } ... on CorpGroup { urn } } } }
    lineage(input: { direction: UPSTREAM, count: 10 }) { relationships { entity { urn } } }
} }

# Add a tag
mutation { addTag(input: { tagUrn: "urn:li:tag:PII", resourceUrn: "urn:li:dataset:(...)" }) }
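
When these operations are issued from a script, it is usually safer to pass URNs as GraphQL variables than to splice them into the query string. The sketch below builds such a payload for the addTag mutation. The input type name `TagAssociationInput` is an assumption that should be checked against your server's schema, `add_tag_payload` is a hypothetical helper, and the dataset URN is left elided as in the examples above.

```python
import json

# The input type name is an assumption; verify it against the live schema.
ADD_TAG_MUTATION = """
mutation addTag($input: TagAssociationInput!) {
  addTag(input: $input)
}
"""


def add_tag_payload(tag_urn, resource_urn):
    """Return the JSON body for an addTag call, with URNs passed as variables."""
    return {
        "query": ADD_TAG_MUTATION,
        "variables": {"input": {"tagUrn": tag_urn, "resourceUrn": resource_urn}},
    }


payload = add_tag_payload("urn:li:tag:PII", "urn:li:dataset:(...)")
print(json.dumps(payload["variables"], indent=2))
```

Variables avoid quoting problems, since dataset URNs contain parentheses and commas that are awkward to escape inside a query string.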

Hands-On Tutorial

Python SDK
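from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Connect to the metadata service; add token="..." if authentication is enabled
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
# Run a GraphQL query and print the returned data
results = graph.execute_graphql('{ search(input: { type: DATASET, query: "revenue" }) { total } }')
print(results)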
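
A single search call returns one page of results, so scripts that need every match have to step through pages with `start` and `count`. The generator below is one possible sketch. It accepts any callable `execute(query, variables)` that returns the `data` portion of the response, which is assumed to be the shape `graph.execute_graphql` provides. `iter_search_urns`, `fake_execute`, and the example URNs are made up for illustration, and the `SearchInput` type name should be checked against the schema.

```python
def iter_search_urns(execute, query_text, page_size=10):
    """Yield entity URNs page by page until `total` results have been seen."""
    gql = """
    query search($input: SearchInput!) {
      search(input: $input) { total searchResults { entity { urn } } }
    }
    """
    start = 0
    while True:
        variables = {"input": {"type": "DATASET", "query": query_text,
                               "start": start, "count": page_size}}
        page = execute(gql, variables)["search"]
        results = page["searchResults"]
        for result in results:
            yield result["entity"]["urn"]
        start += len(results)
        # Stop on an empty page or once every reported result has been seen.
        if not results or start >= page["total"]:
            break


def fake_execute(query, variables):
    """Stand-in for a real client so the sketch runs without a server."""
    urns = [f"urn:li:dataset:example-{i}" for i in range(5)]  # made-up URNs
    start = variables["input"]["start"]
    count = variables["input"]["count"]
    page = [{"entity": {"urn": u}} for u in urns[start:start + count]]
    return {"search": {"total": len(urns), "searchResults": page}}


print(list(iter_search_urns(fake_execute, "revenue", page_size=2)))
```

Against a live server, passing `graph.execute_graphql` in place of `fake_execute` should work if it returns data in the assumed shape.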

Best Practices

Practice Problems

Practice 1

Write a script that finds all datasets without an owner and notifies via Slack.
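
One possible starting point, offered as a sketch and not a full solution: filter search results for entities whose ownership is missing or empty, then build a message for a Slack incoming webhook. The result shape mirrors the ownership fields queried earlier. The function names, sample data, and webhook URL are placeholders invented for this example, and fetching the results from DataHub is left to you.

```python
import json
import urllib.request


def unowned_datasets(search_results):
    """Return URNs of entities whose ownership is missing or has no owners."""
    missing = []
    for result in search_results:
        entity = result["entity"]
        owners = (entity.get("ownership") or {}).get("owners") or []
        if not owners:
            missing.append(entity["urn"])
    return missing


def slack_request(webhook_url, urns):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    text = "Datasets without an owner:\n" + "\n".join(f"- {u}" for u in urns)
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up sample data standing in for real search results.
sample = [
    {"entity": {"urn": "urn:li:dataset:example-a", "ownership": None}},
    {"entity": {"urn": "urn:li:dataset:example-b",
                "ownership": {"owners": [{"owner": {"urn": "urn:li:corpuser:example"}}]}}},
]
missing = unowned_datasets(sample)
request = slack_request("https://hooks.slack.com/services/EXAMPLE", missing)
print(missing)
# urllib.request.urlopen(request) would send the message.
```

A complete script would also page through all datasets and skip the Slack call when nothing is missing.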

Quick Reference

Endpoint             Method     Purpose
/api/graphql         POST       All metadata operations
/openapi/v2/entity   GET/POST   REST CRUD