skills/data-lineage-tracker/SKILL.md
OpenLineage, DataHub, Marquez for data lineage tracking and impact analysis. Activate on: data lineage, OpenLineage, DataHub, Marquez, impact analysis, data catalog, column lineage, data discovery. NOT for: data quality validation (use data-quality-guardian), dbt documentation (use dbt-analytics-engineer).
npx skillsauth add curiositech/windags-skills data-lineage-tracker
Implement end-to-end data lineage tracking using OpenLineage, DataHub, and Marquez for impact analysis, debugging, and compliance.
Activate on: "data lineage", "OpenLineage", "DataHub", "Marquez", "impact analysis", "data catalog", "column lineage", "data discovery", "data provenance", "downstream impact"
NOT for: Data quality checks → data-quality-guardian | dbt model documentation → dbt-analytics-engineer | Schema evolution → schema-evolution-manager
| Domain | Technologies |
|--------|-------------|
| Lineage Standard | OpenLineage 1.x (open standard, LF AI & Data) |
| Catalogs | DataHub 0.14+, OpenMetadata 1.5+, Amundsen |
| Lineage Backend | Marquez (OpenLineage reference implementation), DataHub lineage |
| Integrations | Airflow OpenLineage, Spark OpenLineage, dbt, Great Expectations |
| Visualization | DataHub lineage graph, Marquez UI, dbt docs |
```
Airflow Task           Spark Job              dbt Model
│                      │                      │
├─ START event ──→     ├─ START event ──→     ├─ START event ──→
│                      │                      │
├─ RUNNING event ─→    ├─ RUNNING event ─→    │
│                      │                      │
├─ COMPLETE event ─→   ├─ COMPLETE event ─→   ├─ COMPLETE event ─→
│ (with I/O datasets)  │ (with I/O datasets)  │ (with I/O datasets)
↓                      ↓                      ↓
┌────────────────────────────────────┐
│        OpenLineage Backend         │
│    (Marquez / DataHub / custom)    │
│                                    │
│  Stores: job runs, datasets,       │
│  input/output relationships,       │
│  column-level lineage              │
└────────────────────────────────────┘
```
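To make the event flow concrete, here is a minimal sketch that builds OpenLineage RunEvents as plain dicts and POSTs them to Marquez's `/api/v1/lineage` endpoint using only the standard library. It assumes Marquez is listening on `http://localhost:5000` and uses the spec version in `SCHEMA_URL`. The producer URI, job name, and Snowflake namespace are hypothetical placeholders.

```python
import json
import uuid
import urllib.request
from datetime import datetime, timezone

# Hypothetical producer URI identifying the code that emits the events.
PRODUCER = "https://example.com/my-pipeline"
# Assumed spec version; check which OpenLineage version your backend accepts.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def build_run_event(event_type, run_id, job_namespace, job_name,
                    inputs=(), outputs=()):
    """Return an OpenLineage RunEvent as a plain dict.

    inputs/outputs are (namespace, name) tuples identifying datasets.
    """
    return {
        "eventType": event_type,  # START, RUNNING, COMPLETE, FAIL, ABORT, OTHER
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def emit(event, base_url="http://localhost:5000"):
    """POST one event to Marquez's lineage endpoint (base_url is an assumption)."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# One run = one runId shared by its START and COMPLETE events.
run_id = str(uuid.uuid4())
start = build_run_event("START", run_id, "airflow", "daily_orders.load")
complete = build_run_event(
    "COMPLETE", run_id, "airflow", "daily_orders.load",
    inputs=[("snowflake://example-account", "raw.orders")],
    outputs=[("snowflake://example-account", "prod.fct_orders")],
)
# emit(start); emit(complete)  # requires a running Marquez instance
```

The START and COMPLETE events share one `runId`. The backend uses that shared ID to stitch them into a single run and attach the input/output datasets to it.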
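```python
# DataHub GraphQL: find entities directly downstream of a dataset
import requests

query = """
query downstreams($urn: String!) {
  dataset(urn: $urn) {
    downstream: relationships(
      input: { types: ["DownstreamOf"], direction: INCOMING, start: 0, count: 50 }
    ) {
      relationships {
        entity {
          urn
          ... on Dataset {
            name
            platform { name }
            properties { description }
          }
        }
      }
    }
  }
}
"""

variables = {"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.fct_orders,PROD)"}

# Assumed local deployment: GMS serves GraphQL at /api/graphql (the frontend
# uses /api/v2/graphql). Replace <token> with a DataHub access token.
resp = requests.post(
    "http://localhost:8080/api/graphql",
    headers={"Authorization": "Bearer <token>"},
    json={"query": query, "variables": variables},
)
resp.raise_for_status()
downstream = resp.json()["data"]["dataset"]["downstream"]["relationships"]

# Result: entities one hop downstream of fct_orders via DownstreamOf edges
# (typically datasets, including dbt models). Charts and dashboards usually link
# through other relationship types, and indirect dependents need a multi-hop
# lineage search, so treat this as a first pass rather than the full impact set.
# Use this BEFORE making schema changes
```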
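The query above returns only direct dependents, but a schema change can break consumers several hops away. Transitive impact analysis can be done with a breadth-first walk over one-hop edges. In this minimal sketch, the edges come from a hypothetical in-memory dict. In practice, you would populate them from repeated lineage queries against the catalog. `downstream_impact` is an illustrative helper, not a DataHub API.

```python
from collections import deque


def downstream_impact(edges, root, max_depth=None):
    """Breadth-first walk of a lineage graph.

    edges maps an entity to the entities that read from it (one hop
    downstream). Returns {entity: hops_from_root}, excluding root.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue  # do not expand past the depth limit
        for child in edges.get(node, ()):
            if child not in depth:  # visited check also guards against cycles
                depth[child] = depth[node] + 1
                queue.append(child)
    del depth[root]
    return depth


# Hypothetical lineage: each key's list is its one-hop downstream set.
lineage = {
    "fct_orders": ["agg_daily_revenue", "orders_export"],
    "agg_daily_revenue": ["revenue_dashboard"],
    "orders_export": [],
}

impact = downstream_impact(lineage, "fct_orders")
# {'agg_daily_revenue': 1, 'orders_export': 1, 'revenue_dashboard': 2}
```

Recording the hop count helps triage. Depth-1 consumers read the changed table directly, while deeper ones are affected only if the change propagates through intermediate models.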
```
Source Column             Transformation            Target Column
─────────────             ──────────────            ─────────────
raw_payments.amount   ──→ SUM(amount)           ──→ fct_revenue.total_revenue
raw_payments.currency ──→ exchange_rate_convert ──→ fct_revenue.total_revenue_usd
raw_orders.order_id   ──→ JOIN key              ──→ fct_revenue.order_id
raw_customers.name    ──→ COALESCE(name, email) ──→ fct_revenue.customer_name
```
Column-level lineage answers questions such as "Where does fct_revenue.total_revenue_usd come from?"
→ raw_payments.amount + raw_payments.currency via exchange_rate_convert
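The column mappings above can also be stored as data. This sketch uses the minimal shape of OpenLineage's `columnLineage` dataset facet: `fields` maps each output column to its `inputFields`, and each input field has a `namespace`, `name`, and `field`. It omits the `_producer`/`_schemaURL` metadata that a real facet carries. The namespace is a hypothetical placeholder, and only three of the four target columns are encoded. `sources_of` is a hypothetical helper that answers the one-hop "where does this column come from?" question.

```python
# Hypothetical namespace for the warehouse holding the raw tables.
NS = "snowflake://example-account"

# Column lineage for fct_revenue in the shape of OpenLineage's columnLineage
# facet: output column -> list of input columns it is derived from.
column_lineage = {
    "fields": {
        "total_revenue": {"inputFields": [
            {"namespace": NS, "name": "raw_payments", "field": "amount"}]},
        "total_revenue_usd": {"inputFields": [
            {"namespace": NS, "name": "raw_payments", "field": "amount"},
            {"namespace": NS, "name": "raw_payments", "field": "currency"}]},
        "order_id": {"inputFields": [
            {"namespace": NS, "name": "raw_orders", "field": "order_id"}]},
    }
}


def sources_of(facet, column):
    """Return the (table, column) pairs a target column is derived from."""
    entry = facet["fields"].get(column, {"inputFields": []})
    return [(f["name"], f["field"]) for f in entry["inputFields"]]


print(sources_of(column_lineage, "total_revenue_usd"))
# [('raw_payments', 'amount'), ('raw_payments', 'currency')]
```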