skills/lakehouse-architect/SKILL.md
Delta Lake, Apache Iceberg, Hudi for ACID transactions on object storage. Activate on: lakehouse, Delta Lake, Iceberg, Hudi, table format, ACID on S3, time travel, data lake, open table format. NOT for: warehouse query tuning (use data-warehouse-optimizer), streaming ingestion (use streaming-pipeline-architect).
npx skillsauth add curiositech/windags-skills lakehouse-architectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design data lakehouse architectures using Delta Lake, Apache Iceberg, or Apache Hudi for ACID transactions, time travel, and schema evolution on object storage.
Activate on: "lakehouse", "Delta Lake", "Apache Iceberg", "Hudi", "table format", "ACID on S3", "time travel", "data lake", "open table format", "Databricks", "catalog"
NOT for: Warehouse query tuning → data-warehouse-optimizer | Streaming pipeline design → streaming-pipeline-architect | Data quality rules → data-quality-guardian
| Domain | Technologies | |--------|-------------| | Table Formats | Apache Iceberg 1.7+, Delta Lake 3.x, Apache Hudi 1.x | | Compute | Spark 3.5+, Trino, DuckDB, Snowflake (Iceberg), Flink | | Catalogs | Unity Catalog, AWS Glue, Nessie, REST Catalog, Polaris | | Storage | S3, GCS, ADLS, MinIO | | Managed | Databricks Lakehouse, AWS Lake Formation, Snowflake Iceberg |
Bronze (Raw) Silver (Cleaned) Gold (Business)
───────────── ──────────────── ───────────────
Raw JSON/CSV/Parquet → Typed, deduplicated → Aggregated, modeled
Append-only → Merge/upsert → Materialized views
Schema-on-read → Schema enforced → Star schema
Full history → Latest + SCD Type 2 → Pre-aggregated metrics
Storage: S3/GCS All layers use Iceberg/Delta
Format: Parquet ACID transactions at each layer
-- Create Iceberg table with partitioning
CREATE TABLE catalog.silver.orders (
order_id STRING,
customer_id STRING,
amount DECIMAL(10,2),
status STRING,
order_date DATE,
_loaded_at TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(order_date))
TBLPROPERTIES (
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100'
);
-- Upsert (merge) new data
MERGE INTO catalog.silver.orders t
USING staging.new_orders s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
-- Time travel: query as of yesterday
SELECT * FROM catalog.silver.orders
FOR SYSTEM_TIME AS OF TIMESTAMP '2026-03-19 00:00:00';
-- Maintenance: compact small files
CALL catalog.system.rewrite_data_files('silver.orders');
CALL catalog.system.expire_snapshots('silver.orders', TIMESTAMP '2026-03-01');
Feature Iceberg Delta Lake Hudi
────────── ─────── ────────── ────
ACID transactions Yes Yes Yes
Time travel Yes Yes Yes
Schema evolution Full Full Full
Partition evolution Yes (hidden) No (requires Limited
rewrite)
Engine support Widest Spark/Databricks Spark/Flink
Catalog REST/Nessie Unity/Hive Hive
Community Apache Linux Foundation Apache
Best for Multi-engine Databricks users CDC workloads
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.