skills/cloud/gcp/dataflow-pipeline/SKILL.md
---
name: dataflow-pipeline
description: Use when building Apache Beam pipelines on Google Cloud Dataflow (batch ETL, streaming, windowing, triggers, or Dataflow vs Dataproc decisions). Covers GCP-PDE domain: Ingest and process data (~25-30%).
---

# Dataflow Pipeline

## When to Use

- Building ETL pipelines (batch or streaming) on GCP
- Choosing between Dataflow and Dataproc for a workload
- Designing windowing or late-data handling for streaming
- Preparing for GCP Professional Data Engineer e

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add kienbui1995/magic-powers skills/cloud/gcp/dataflow-pipeline`

| Factor | Choose Dataflow | Choose Dataproc |
|--------|----------------|-----------------|
| Runtime | Apache Beam pipelines | Spark/Hadoop ecosystem |
| Management | Fully managed, serverless | Cluster to manage (or autoscaling) |
| Streaming | Native (Pub/Sub → BQ) | Spark Streaming (more complex) |
| Existing code | Greenfield | Migrating existing Spark jobs |
| Cost model | Per vCPU/memory/hour | Cluster uptime |

Core abstractions:

`.withAllowedLateness(Duration.standardMinutes(10))`

`.orFinally()`, `.repeatedly()`

`.withAllowedLateness()`? `.withAllowedLateness()` to capture it

`DirectRunner` = local testing; `DataflowRunner` = GCP execution
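
To make the `.withAllowedLateness(Duration.standardMinutes(10))` setting more concrete, here is a minimal stdlib-only sketch of the semantics it controls. It is not the Beam API: `LatenessSketch`, `windowStart`, and `isAccepted` are hypothetical names, the one-minute fixed window is an assumed value, and boundary handling is simplified compared with what a real runner does. The idea is that an element is assigned to a fixed window by its event time, and it is still accepted as long as the watermark has not passed the end of that window plus the allowed lateness.

```java
import java.time.Duration;
import java.time.Instant;

// Stdlib-only illustration of fixed-window assignment and allowed lateness.
// This is NOT the Beam API; class and method names are hypothetical.
final class LatenessSketch {

    // Start of the epoch-aligned fixed window that contains eventTime.
    static Instant windowStart(Instant eventTime, Duration size) {
        long sizeMs = size.toMillis();
        long ts = eventTime.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    // True while the watermark has not passed windowEnd + allowedLateness.
    // Simplification: a real runner's exact boundary rules differ slightly.
    static boolean isAccepted(Instant eventTime, Instant watermark,
                              Duration size, Duration allowedLateness) {
        Instant windowEnd = windowStart(eventTime, size).plus(size);
        return !watermark.isAfter(windowEnd.plus(allowedLateness));
    }

    public static void main(String[] args) {
        Duration size = Duration.ofMinutes(1);       // assumed window size
        Duration lateness = Duration.ofMinutes(10);  // value from the fragment above
        Instant event = Instant.parse("2024-01-01T12:00:30Z");

        // Watermark 5 minutes in: the window is still within allowed lateness.
        System.out.println(isAccepted(event, Instant.parse("2024-01-01T12:05:00Z"), size, lateness));
        // Watermark 12 minutes in: the window has expired, the element would be dropped.
        System.out.println(isAccepted(event, Instant.parse("2024-01-01T12:12:00Z"), size, lateness));
    }
}
```

With these assumed values the event falls in the 12:00:00 to 12:01:00 window, which stays open to late elements until the watermark passes 12:11:00. A larger allowed lateness captures more late data, but the window's state has to be kept around for longer.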