skills/streaming-pipeline-architect/SKILL.md
Kafka Streams, Flink, Spark Streaming, and CDC for real-time data pipelines. Activate on: streaming, Kafka Streams, Flink, Spark Streaming, CDC, Debezium, real-time pipeline, event stream processing. NOT for: message broker setup (use event-driven-architecture-expert), batch processing (use batch-processing-optimizer).
npx skillsauth add curiositech/windags-skills streaming-pipeline-architectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design and build real-time data pipelines using Kafka Streams, Apache Flink, Spark Structured Streaming, and Change Data Capture.
Activate on: "streaming pipeline", "Kafka Streams", "Flink", "Spark Streaming", "CDC", "Debezium", "real-time pipeline", "event stream processing", "stream-table join", "windowed aggregation"
NOT for: Message broker configuration → event-driven-architecture-expert | Batch ETL optimization → batch-processing-optimizer | Data warehouse loading → data-warehouse-optimizer
| Domain | Technologies | |--------|-------------| | Stream Processing | Apache Flink 1.20+, Kafka Streams 3.8+, Spark Structured Streaming | | CDC | Debezium 2.7+, Fivetran, Airbyte, Maxwell | | Managed | Confluent Cloud, AWS Kinesis, GCP Dataflow | | Connectors | Kafka Connect, Flink CDC connectors, Spark connectors | | State | RocksDB (Flink/Kafka Streams), Delta Lake checkpoints |
PostgreSQL Debezium Kafka Stream Processor
┌──────────┐ ┌─────────────────┐ ┌──────────┐ ┌─────────────────┐
│ WAL │────→│ Debezium │────→│ Topics │────→│ Flink / KS │
│ (logical │ │ (Kafka Connect)│ │ per │ │ - Enrich │
│ repl.) │ │ │ │ table │ │ - Aggregate │
└──────────┘ └─────────────────┘ └──────────┘ │ - Transform │
└────────┬────────┘
│
┌────────────────────────┤
↓ ↓
Elasticsearch Snowflake / Delta
(search index) (analytics)
-- Flink SQL: 5-minute tumbling window revenue aggregation
CREATE TABLE orders (
order_id STRING,
amount DECIMAL(10,2),
store_id STRING,
event_time TIMESTAMP(3),
WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'orders',
'format' = 'json'
);
SELECT
store_id,
TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUP BY
store_id,
TUMBLE(event_time, INTERVAL '5' MINUTE);
Source Topic: raw-events
↓
Filter (discard invalid)
↓
Map (normalize schema)
↓
Branch ──→ [high-priority] → enrich → Priority Topic
│
└──→ [standard] → aggregate(5min window) → Metrics Topic
KafkaStreams topology = builder.build();
topology.describe(); // prints processing graph
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.