Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

desenyon/senior-data-engineer

Name: senior-data-engineer
Author: desenyon

.github/skills/engineering-team/senior-data-engineer/SKILL.md

npx skillsauth add desenyon/infinitecontex senior-data-engineer

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

Trigger Phrases
Quick Start
Workflows
- Building a Batch ETL Pipeline
- Implementing Real-Time Streaming
- Data Quality Framework Setup
Architecture Decision Framework
Tech Stack
Reference Documentation
Troubleshooting

Trigger Phrases

Activate this skill when you see:

Pipeline Design:

"Design a data pipeline for..."
"Build an ETL/ELT process..."
"How should I ingest data from..."
"Set up data extraction from..."

Architecture:

"Should I use batch or streaming?"
"Lambda vs Kappa architecture"
"How to handle late-arriving data"
"Design a data lakehouse"

Data Modeling:

"Create a dimensional model..."
"Star schema vs snowflake"
"Implement slowly changing dimensions"
"Design a data vault"

Data Quality:

"Add data validation to..."
"Set up data quality checks"
"Monitor data freshness"
"Implement data contracts"

Performance:

"Optimize this Spark job"
"Query is running slow"
"Reduce pipeline execution time"
"Tune Airflow DAG"

Quick Start

Core Tools

# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend

Workflows

→ See references/workflows.md for details

Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

Batch vs Streaming

| Criteria | Batch | Streaming | |----------|-------|-----------| | Latency requirement | Hours to days | Seconds to minutes | | Data volume | Large historical datasets | Continuous event streams | | Processing complexity | Complex transformations, ML | Simple aggregations, filtering | | Cost sensitivity | More cost-effective | Higher infrastructure cost | | Error handling | Easier to reprocess | Requires careful design |

Decision Tree:

Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute

Lambda vs Kappa Architecture

| Aspect | Lambda | Kappa | |--------|--------|-------| | Complexity | Two codebases (batch + stream) | Single codebase | | Maintenance | Higher (sync batch/stream logic) | Lower | | Reprocessing | Native batch layer | Replay from source | | Use case | ML training + real-time serving | Pure event-driven |

When to choose Lambda:

Need to train ML models on historical data
Complex batch transformations not feasible in streaming
Existing batch infrastructure

When to choose Kappa:

Event-sourced architecture
All processing can be expressed as stream operations
Starting fresh without legacy systems

Data Warehouse vs Data Lakehouse

| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) | |---------|-------------------------------|---------------------------| | Best for | BI, SQL analytics | ML, unstructured data | | Storage cost | Higher (proprietary format) | Lower (open formats) | | Flexibility | Schema-on-write | Schema-on-read | | Performance | Excellent for SQL | Good, improving | | Ecosystem | Mature BI tools | Growing ML tooling |

Tech Stack

| Category | Technologies | |----------|--------------| | Languages | Python, SQL, Scala | | Orchestration | Airflow, Prefect, Dagster | | Transformation | dbt, Spark, Flink | | Streaming | Kafka, Kinesis, Pub/Sub | | Storage | S3, GCS, Delta Lake, Iceberg | | Warehouses | Snowflake, BigQuery, Redshift, Databricks | | Quality | Great Expectations, dbt tests, Monte Carlo | | Monitoring | Prometheus, Grafana, Datadog |

Reference Documentation

1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:

Lambda vs Kappa architecture patterns
Batch processing with Spark and Airflow
Stream processing with Kafka and Flink
Exactly-once semantics implementation
Error handling and dead letter queues

2. Data Modeling Patterns

See references/data_modeling_patterns.md for:

Dimensional modeling (Star/Snowflake)
Slowly Changing Dimensions (SCD Types 1-6)
Data Vault modeling
dbt best practices
Partitioning and clustering

3. DataOps Best Practices

See references/dataops_best_practices.md for:

Data testing frameworks
Data contracts and schema validation
CI/CD for data pipelines
Observability and lineage
Incident response

Troubleshooting

→ See references/troubleshooting.md for details

desenyon/senior-data-engineer

.github/skills/engineering-team/senior-data-engineer/SKILL.md

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

development

Updated Apr 4, 2026

$ install --global

skillsauth

npx skillsauth add desenyon/infinitecontex senior-data-engineer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 7:50 AM4.0s1 file scanned

SKILL.md

name:: senior-data-engineer
description:: Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

Trigger Phrases
Quick Start
Workflows
- Building a Batch ETL Pipeline
- Implementing Real-Time Streaming
- Data Quality Framework Setup
Architecture Decision Framework
Tech Stack
Reference Documentation
Troubleshooting

Trigger Phrases

Activate this skill when you see:

Pipeline Design:

"Design a data pipeline for..."
"Build an ETL/ELT process..."
"How should I ingest data from..."
"Set up data extraction from..."

Architecture:

"Should I use batch or streaming?"
"Lambda vs Kappa architecture"
"How to handle late-arriving data"
"Design a data lakehouse"

Data Modeling:

"Create a dimensional model..."
"Star schema vs snowflake"
"Implement slowly changing dimensions"
"Design a data vault"

Data Quality:

"Add data validation to..."
"Set up data quality checks"
"Monitor data freshness"
"Implement data contracts"

Performance:

"Optimize this Spark job"
"Query is running slow"
"Reduce pipeline execution time"
"Tune Airflow DAG"

Quick Start

Core Tools

# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend

Workflows

→ See references/workflows.md for details

Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

Batch vs Streaming

Decision Tree:

Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute

Lambda vs Kappa Architecture

When to choose Lambda:

Need to train ML models on historical data
Complex batch transformations not feasible in streaming
Existing batch infrastructure

When to choose Kappa:

Event-sourced architecture
All processing can be expressed as stream operations
Starting fresh without legacy systems

Data Warehouse vs Data Lakehouse

Tech Stack

Reference Documentation

1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:

Lambda vs Kappa architecture patterns
Batch processing with Spark and Airflow
Stream processing with Kafka and Flink
Exactly-once semantics implementation
Error handling and dead letter queues

2. Data Modeling Patterns

See references/data_modeling_patterns.md for:

Dimensional modeling (Star/Snowflake)
Slowly Changing Dimensions (SCD Types 1-6)
Data Vault modeling
dbt best practices
Partitioning and clustering

3. DataOps Best Practices

See references/dataops_best_practices.md for:

Data testing frameworks
Data contracts and schema validation
CI/CD for data pipelines
Observability and lineage
Incident response

Troubleshooting

→ See references/troubleshooting.md for details

Related Skills

desenyon/form-cro

testing

VerifiedTrustedCommunity

When the user wants to optimize any form that is NOT signup/registration — including lead capture forms, contact forms, demo request forms, application forms, survey forms, or checkout forms. Also use when the user mentions "form optimization," "lead form conversions," "form friction," "form fields," "form completion rate," or "contact form." For signup/registration forms, see signup-flow-cro. For popups containing forms, see popup-cro.

SKILL.mdUpdated Apr 4, 2026

desenyon/financial-analyst

development

VerifiedTrustedCommunity

Performs financial ratio analysis, DCF valuation, budget variance analysis, and rolling forecast construction for strategic decision-making. Use when analyzing financial statements, building valuation models, assessing budget variances, or constructing financial projections and forecasts. Also applicable when users mention financial modeling, cash flow analysis, company valuation, financial projections, or spreadsheet analysis.

SKILL.mdUpdated Apr 4, 2026

desenyon/financial-analyst

desenyon/saas-metrics-coach

testing

VerifiedTrustedCommunity

SaaS financial health advisor. Use when a user shares revenue or customer numbers, or mentions ARR, MRR, churn, LTV, CAC, NRR, or asks how their SaaS business is doing.

SKILL.mdUpdated Apr 4, 2026

desenyon/saas-metrics-coach

desenyon/financial-analyst

development

VerifiedTrustedCommunity

SKILL.mdUpdated Apr 4, 2026

desenyon/financial-analyst

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/desenyon/infinitecontex.git

# Copy into Claude Code skills folder (global)
cp -r infinitecontex/.github/skills/engineering-team/senior-data-engineer ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

desenyon/infinitecontex

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

desenyon/senior-data-engineer

$ install --global

Security Scan Results

SKILL.md

Senior Data Engineer

Table of Contents

Trigger Phrases

Quick Start

Core Tools

Workflows

Architecture Decision Framework

Batch vs Streaming

Lambda vs Kappa Architecture

Data Warehouse vs Data Lakehouse

Tech Stack

Reference Documentation

1. Data Pipeline Architecture

2. Data Modeling Patterns

3. DataOps Best Practices

Troubleshooting

Related Skills

desenyon/form-cro

desenyon/financial-analyst

desenyon/saas-metrics-coach

desenyon/financial-analyst

desenyon/senior-data-engineer

$ install --global

Security Scan Results

SKILL.md

Senior Data Engineer

Table of Contents

Trigger Phrases

Quick Start

Core Tools

Workflows

Architecture Decision Framework

Batch vs Streaming

Lambda vs Kappa Architecture

Data Warehouse vs Data Lakehouse

Tech Stack

Reference Documentation

1. Data Pipeline Architecture

2. Data Modeling Patterns

3. DataOps Best Practices

Troubleshooting

Related Skills

desenyon/form-cro

desenyon/financial-analyst

desenyon/saas-metrics-coach

desenyon/financial-analyst