.claude/skills/ts-dlt/SKILL.md
You are an expert in dlt, the open-source Python library for building data pipelines. You help developers load data from any API, file, or database into warehouses and lakes using simple Python decorators — with automatic schema inference, incremental loading, and built-in data contracts. dlt is the "requests library for data pipelines."
import dlt

# Simplest pipeline: Python generator → warehouse
@dlt.resource(write_disposition="append")
def github_events():
    """Load GitHub events for a repository."""
    import requests

    response = requests.get("https://api.github.com/repos/org/repo/events")
    response.raise_for_status()  # stop on HTTP errors instead of loading an error body
    yield from response.json()

# Run pipeline
pipeline = dlt.pipeline(
    pipeline_name="github_events",
    destination="bigquery",  # or: postgres, snowflake, duckdb, motherduck
    dataset_name="raw_github",
)
load_info = pipeline.run(github_events())
print(load_info)  # Schema inferred automatically
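To try the pipeline above without warehouse credentials, one option is to point it at a local DuckDB file. The following is a minimal sketch, assuming dlt is installed with the duckdb extra. `sample_events`, `run_locally`, and the pipeline name `github_events_local` are hypothetical names introduced for illustration, and the sample rows are made up rather than real API output.

```python
def sample_events():
    """Hypothetical stand-in for the GitHub API response (made-up rows)."""
    yield {"id": "1", "type": "PushEvent", "actor": {"login": "alice"}}
    yield {"id": "2", "type": "IssuesEvent", "actor": {"login": "bob"}}


def run_locally():
    """Load the sample rows into a local DuckDB file; requires dlt[duckdb]."""
    import dlt  # imported here so sample_events() stays usable without dlt

    pipeline = dlt.pipeline(
        pipeline_name="github_events_local",  # hypothetical name
        destination="duckdb",
        dataset_name="raw_github",
    )
    # table_name sets the destination table explicitly for a plain generator
    return pipeline.run(sample_events(), table_name="github_events")
```

Calling `print(run_locally())` should print a load summary in the same way as the BigQuery example.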
@dlt.resource(
    write_disposition="merge",  # Upsert: update existing, insert new
    primary_key="id",
)
def orders(
    updated_at=dlt.sources.incremental(
        "updated_at",
        initial_value="2025-01-01T00:00:00Z",
    )
):
    """Load orders incrementally: only new/changed since last run.

    dlt tracks the cursor automatically between runs.
    No need to store state manually.
    """
    import requests

    page = 1
    while True:
        response = requests.get("https://api.shop.com/orders", params={
            "updated_after": updated_at.last_value,
            "page": page,
            "per_page": 100,
        })
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        data = response.json()
        if not data:  # an empty page means there is nothing left to fetch
            break
        yield from data
        page += 1
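The cursor tracking described in the docstring above can be pictured with a small plain-Python sketch. This is a conceptual illustration only, not dlt's implementation: `select_new_rows` is a hypothetical helper, it uses a strict greater-than comparison, and dlt's own handling of rows that share the boundary value is more involved. It also assumes timestamps are ISO 8601 strings in one format and time zone, so that string comparison matches chronological order.

```python
def select_new_rows(rows, last_value, cursor_field="updated_at"):
    """Return rows newer than last_value plus the advanced cursor value."""
    new_rows = [row for row in rows if row[cursor_field] > last_value]
    # The cursor moves to the newest value seen, or stays put if nothing is new
    new_cursor = max((row[cursor_field] for row in new_rows), default=last_value)
    return new_rows, new_cursor


rows = [
    {"id": 1, "updated_at": "2025-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2025-02-01T00:00:00Z"},
    {"id": 3, "updated_at": "2025-03-01T00:00:00Z"},
]
first_batch, cursor = select_new_rows(rows, "2025-01-15T00:00:00Z")
second_batch, cursor_after = select_new_rows(rows, cursor)
```

The first call picks up orders 2 and 3 and advances the cursor. The second call, run against the same data, finds nothing new and leaves the cursor unchanged.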
from dlt.sources.rest_api import rest_api_source

# Declarative API source: no code needed for standard REST APIs
source = rest_api_source({
    "client": {
        "base_url": "https://api.hubspot.com/crm/v3/",
        "auth": {"type": "bearer", "token": dlt.secrets["hubspot_token"]},
        "paginator": {"type": "offset", "limit": 100, "offset_param": "offset"},
    },
    "resources": [
        {
            "name": "contacts",
            "endpoint": {"path": "objects/contacts"},
            "write_disposition": "merge",
            "primary_key": "id",
        },
        {
            "name": "deals",
            "endpoint": {"path": "objects/deals"},
            "write_disposition": "merge",
            "primary_key": "id",
        },
    ],
})

pipeline = dlt.pipeline(destination="bigquery", dataset_name="raw_hubspot")
pipeline.run(source)
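The "offset" paginator in the configuration above saves you from writing the paging loop by hand. As a rough picture of what offset pagination involves, here is a plain-Python sketch. `paginate_by_offset` and `fetch_page` are hypothetical names, and stopping on a short page is a simplifying assumption rather than a description of dlt's exact stop conditions.

```python
def paginate_by_offset(fetch_page, limit=100):
    """Yield items page by page until a short or empty page comes back."""
    offset = 0
    while True:
        items = fetch_page(offset=offset, limit=limit)
        yield from items
        if len(items) < limit:  # a short page is treated as the last page
            break
        offset += limit


# Usage with an in-memory stand-in for an API endpoint
records = list(range(250))
requested_offsets = []


def fake_fetch(offset, limit):
    requested_offsets.append(offset)
    return records[offset:offset + limit]


loaded = list(paginate_by_offset(fake_fetch, limit=100))
```

With 250 records and a limit of 100, the loop requests offsets 0, 100, and 200, then stops.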
# Declare column types and choose how dlt reacts to unexpected schema changes
@dlt.resource(
    write_disposition="merge",
    primary_key="id",
    columns={
        "id": {"data_type": "bigint", "nullable": False},
        "email": {"data_type": "text", "nullable": False},
        "plan": {"data_type": "text", "nullable": False},
        "mrr_cents": {"data_type": "bigint"},
    },
    # "evolve" accepts changes; "freeze" fails loudly;
    # "discard_value" and "discard_row" drop the offending value or row
    schema_contract="evolve",
)
def customers():
    # If the API returns unexpected fields, dlt handles them per the contract setting.
    # fetch_customers() is a placeholder for your own API client function.
    yield from fetch_customers()
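To make the four contract modes concrete, the sketch below mimics their effect on a single row that carries a column the schema does not know. It is a plain-Python illustration of the behavior, not dlt code: `apply_column_contract` and `ContractViolation` are hypothetical names, and it only models unexpected columns, not new tables or data type changes.

```python
class ContractViolation(Exception):
    """Raised by the sketch when mode is "freeze" and a row has extra columns."""


def apply_column_contract(row, known_columns, mode):
    """Return the row as it would be loaded, or None if the row is dropped."""
    extra = set(row) - set(known_columns)
    if not extra or mode == "evolve":
        return dict(row)  # nothing unexpected, or new columns are accepted
    if mode == "freeze":
        raise ContractViolation(f"unexpected columns: {sorted(extra)}")
    if mode == "discard_row":
        return None  # the whole row is skipped
    if mode == "discard_value":
        # keep the row but drop the values of unknown columns
        return {key: value for key, value in row.items() if key in known_columns}
    raise ValueError(f"unknown contract mode: {mode}")


known = {"id", "email", "plan", "mrr_cents"}
row = {"id": 1, "email": "a@example.com", "plan": "pro", "referrer": "ad"}
trimmed = apply_column_contract(row, known, "discard_value")
```

Here `trimmed` keeps `id`, `email`, and `plan` and drops `referrer`. The same row passes through unchanged under "evolve" and raises under "freeze".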
pip install "dlt[bigquery]"  # dlt plus the BigQuery destination dependencies (quotes keep shells like zsh from expanding the brackets)
# Other destinations: dlt[snowflake], dlt[postgres], dlt[duckdb], dlt[motherduck]
- Start with destination="duckdb"; switch to BigQuery/Snowflake for production.
- Use dlt.sources.incremental for stateful loading; dlt tracks the cursor between runs.
- Use rest_api_source for standard REST APIs; write custom resources only for complex APIs.
- Use write_disposition="merge" with primary_key for entity tables, and append for event streams.
- Use schema_contract="freeze" in production to catch breaking API changes immediately.
- Read credentials through dlt.secrets["key"], backed by environment variables or .dlt/secrets.toml.
- Use add_map() for row-level transforms during loading; heavier transforms belong in dbt.
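The add_map() recommendation above can be sketched as follows. `mask_email` and `customers_with_masking` are hypothetical names introduced for illustration. The sketch assumes `customers_resource` is a dlt resource instance, such as the result of calling a function decorated with `@dlt.resource`.

```python
def mask_email(row):
    """Row-level transform: keep the domain, hide the local part of the email."""
    masked = dict(row)  # copy so the incoming row is not mutated
    email = masked.get("email") or ""
    _local, sep, domain = email.partition("@")
    if sep:
        masked["email"] = "***@" + domain
    return masked


def customers_with_masking(customers_resource):
    """Attach the transform so it runs on every row during loading."""
    return customers_resource.add_map(mask_email)


example = mask_email({"id": 1, "email": "jane@example.com"})
```

Keeping the transform a pure function makes it easy to unit test without running a pipeline. Joins and aggregations are better left to dbt, as the list above suggests.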