skills/datapackage/SKILL.md
Explore and query any dataset annotated with a Frictionless Data Package descriptor (datapackage.json). Use this skill whenever a user wants to discover what tables or resources a dataset contains, look up column names and descriptions, surface usage warnings embedded in metadata, or understand how to load data from Parquet files, DuckDB or SQLite databases, or CSV files described by a datapackage.json. Also use when the user has a datapackage.json and wants to know what's in it, how to query it efficiently, or how to connect its metadata to actual data files. Pairs well with dataset-specific skills (like `pudl`) that layer domain knowledge on top.
npx skillsauth add catalyst-cooperative/skills datapackage

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill covers any dataset described by a
Frictionless Data Package descriptor file
(datapackage.json). It is intentionally generic — it works for any conforming
datapackage, regardless of who published it or what the data contains.
For PUDL-specific knowledge (S3 bucket paths, table tier conventions, data source
context, usage warnings), also use the pudl skill on top of this one.
A datapackage.json is a JSON file that describes a collection of tabular data
resources. Each resource represents one table (or file) and includes:
- `name`: machine-readable identifier
- `description`: human-readable description, often including processing notes, primary keys, and usage warnings
- `path`: filename or URL of the actual data file
- `schema.fields`: list of columns, each with a name and description

The file can be large (hundreds of resources, megabytes of JSON). Always query it selectively; never load it whole into context.
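As a rough illustration of selective querying, the sketch below parses the descriptor locally and returns only the requested slice (resource names, or one resource's columns) so that the full JSON never needs to be printed. The function names are hypothetical and not part of this skill; the skill itself relies on jq for the same purpose.

```python
import json
from pathlib import Path


def list_resources(descriptor_path):
    """Return only the resource names, not the full descriptor."""
    descriptor = json.loads(Path(descriptor_path).read_text(encoding="utf-8"))
    return [res["name"] for res in descriptor.get("resources", [])]


def describe_columns(descriptor_path, resource_name):
    """Return (column name, description) pairs for a single resource."""
    descriptor = json.loads(Path(descriptor_path).read_text(encoding="utf-8"))
    for res in descriptor.get("resources", []):
        if res.get("name") == resource_name:
            schema = res.get("schema")
            # A schema may be a path or URL string instead of an inline
            # object; in that case there are no inline fields to report.
            if not isinstance(schema, dict):
                return []
            return [
                (field["name"], field.get("description", ""))
                for field in schema.get("fields", [])
            ]
    raise KeyError(f"no resource named {resource_name!r}")
```

Calling `list_resources("datapackage.json")` first and then `describe_columns` for a single table keeps the output small even when the descriptor holds hundreds of resources.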
Before querying metadata, verify jq is available:
command -v jq
If not found, tell the user how to install it:
- `brew install jq`
- `sudo apt install jq`
- `conda install jq`
- `winget install jqlang.jq`

For data loading and SQL queries, the attach-db and query skills from
duckdb-skills must be installed. Install them from duckdb/duckdb-skills.
- datapackage.json (see below).
- frictionless validate. See Frictionless Validate.
- frictionless CLI to validate packages, check data quality, infer schemas, and diagnose unfamiliar descriptors; read when the user wants to validate a descriptor, check if data matches its schema, or understand what the frictionless tool can tell them about a package

The datapackage standard is permissive: publishers frequently add non-standard fields. Two conventions are worth knowing immediately:
- The `_` prefix convention marks system-generated or platform-specific keys (e.g. `_cache`, `_platformVersion`). Some publishers add custom keys without the prefix (e.g. PUDL adds `duckdb_table`, `sqlite_table` on database-backed resources). Treat unknown fields as informational metadata, not errors.
- A resource with a `.gz` or `.zip` path may have an explicit `"compression": "gz"` field. The `bytes` and `hash` fields apply to the compressed file, not the uncompressed original.

For other patterns (catalogs, versioning, external foreign keys, translation support, field relationships, etc.), fetch the relevant page on demand:
Both pages cover largely the same set of community conventions; consult whichever matches the descriptor version you're working with.
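The compression convention above (that `bytes` and `hash` describe the compressed file) can be sketched as a small integrity check. The function name is hypothetical. The hash handling assumes the usual Data Package convention of a bare MD5 digest or an `algorithm:digest` prefix such as `sha256:...`; check the descriptor you actually have before relying on it.

```python
import hashlib
from pathlib import Path


def check_file_against_resource(resource, file_path):
    """Compare a resource's bytes/hash fields with the file as stored on disk.

    The file is read as-is (still compressed), because the metadata
    describes the compressed bytes rather than the decompressed content.
    """
    path = Path(file_path)
    results = {}
    if "bytes" in resource:
        results["bytes"] = path.stat().st_size == resource["bytes"]
    if "hash" in resource:
        # Assumption: "algo:digest" prefix form, else a bare MD5 digest.
        algo, sep, digest = resource["hash"].partition(":")
        if not sep:
            algo, digest = "md5", resource["hash"]
        hasher = hashlib.new(algo)
        with path.open("rb") as fh:
            # Stream in 1 MiB chunks so large files are not held in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                hasher.update(chunk)
        results["hash"] = hasher.hexdigest() == digest.lower()
    return results
```

A mismatch on both keys for a `.gz` resource often means the values were compared against the decompressed data instead of the file on disk.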
This skill delegates actual data querying to:
- `/duckdb-skills:attach-db`: attach a .duckdb or .sqlite database file and set up a persistent session for querying
- `/duckdb-skills:query`: run SQL or natural language queries against attached databases, ad-hoc files (Parquet, CSV, remote HTTPS/S3), and JSON files including datapackage.json itself (via DuckDB's `read_json`)

These skills must be installed. See skills-lock.json in the project root.
- Use `uv` to install Python packages: prefer `uv add <package>` over `pip install <package>`. uv is faster and installs into a virtual environment rather than globally. Fall back to pip only if uv is not available (`command -v uv` returns nothing).

Two versions of the Frictionless Data Package standard are in common use. Identify the version from the top-level descriptor before parsing:
| Field present | Version | Example value |
| ------------- | -------------------------------- | --------------------------------------------------------- |
| "$schema" | v2.0 | "https://datapackage.org/profiles/2.0/datapackage.json" |
| "profile" | v1.0 | "tabular-data-package" or "data-package" |
| neither | ambiguous (treat as v1 baseline) | — |
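The table can be expressed as a small helper, shown here as a minimal sketch. The function name and return labels are hypothetical. Checking `"$schema"` before `"profile"` when both keys are present is an assumption, since the table does not cover that case.

```python
def detect_version(descriptor):
    """Classify a parsed top-level descriptor using the table above."""
    if "$schema" in descriptor:
        return "v2"
    if "profile" in descriptor:
        return "v1"
    # Neither key is present: ambiguous, so callers should treat it
    # as a v1 baseline.
    return "ambiguous"
```

For example, `detect_version({"profile": "tabular-data-package"})` returns `"v1"`.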
Key differences between versions that affect parsing:
- Contributor roles: v1 has `"role": "author"` (singular string); v2 has `"roles": ["author"]` (array). Both may appear in the wild.
- `name`: v1 is restricted to `[-a-z0-9._/]`; v2 is unrestricted.
- `version` field: present in v2, absent in v1.

Bundled schemas:
- assets/datapackage-v1.schema.json: v1.0 (JSON Schema draft-04). Used by FERC XBRL packages and many older datasets.
- assets/datapackage-v2.schema.json: v2.0 (JSON Schema draft-07). The current standard. Canonical version always at: https://datapackage.org/profiles/2.0/datapackage.json

Read the appropriate schema when you need to understand which fields are valid in a descriptor or validate one programmatically.
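Because both the v1 `"role"` string and the v2 `"roles"` array may appear in the wild, parsing code can normalize them to one shape. The helper below is a minimal sketch with a hypothetical name. It prefers `roles` when both keys are present, which is an assumption and not something the standard mandates.

```python
def contributor_roles(contributor):
    """Return a contributor's roles as a list, whichever version wrote them."""
    roles = contributor.get("roles")
    if isinstance(roles, list):
        return list(roles)  # v2 style: already an array
    role = contributor.get("role")
    if isinstance(role, str):
        return [role]  # v1 style: wrap the single string
    return []  # no role information recorded
```

Downstream code can then iterate over the returned list without first checking the descriptor version.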