skills/by-role/data-engineer/schema-spec/SKILL.md
Write a data schema specification document. Use when the user says "schema spec", "document this schema", "schema design doc", "data model spec", "table spec", "field definitions", "schema contract", "data dictionary", "define these fields", or needs to formally document a database schema, event schema, or API payload schema - even if they don't explicitly say "schema spec".
npx skillsauth add qa-aman/claude-skills schema-spec
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Based on Fundamentals of Data Engineering (Reis & Housley) and Designing Data-Intensive Applications (Kleppmann). A schema spec is the data contract between producers and consumers. Kleppmann's rule: schema changes are the most dangerous category of change in a data system - the spec exists to make schema evolution explicit and backward-compatible.
The test: can a consumer engineer write a query or build an integration using only this document, without reading source code or asking the producer team?
Schema: [table name / event name / object name]
Domain: [which product or service owns this data]
Owner: [team]
Type: [relational table / event / API payload / Avro / Parquet / JSON]
Storage: [database name, topic name, or S3 path]
Created: [date]
Last modified: [date]
Version: [v1 / v2 / etc.]
Use a table for readability:
| Field | Type | Nullable | Default | Description |
|-------|------|----------|---------|-------------|
| id | UUID | No | (none) | Unique record identifier, generated at insert |
| user_id | BIGINT | No | (none) | Foreign key to users.id |
| status | ENUM | No | 'pending' | Current state: pending, active, cancelled, completed |
| amount_cents | INTEGER | No | (none) | Transaction amount in cents. Never fractional. |
| created_at | TIMESTAMPTZ | No | NOW() | UTC timestamp of record creation |
| metadata | JSONB | Yes | NULL | Unstructured key-value pairs. Schema not enforced. |
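A field table like the one above can also be mirrored in code so that consumers can check rows against it. The following is a minimal sketch, not part of the skill itself: `FieldSpec`, `FIELDS`, and `validate_row` are hypothetical names, and the Python types are assumed stand-ins for the SQL types (UUID and TIMESTAMPTZ are treated as strings).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    name: str
    py_type: type      # assumed Python stand-in for the SQL type
    nullable: bool


# Hypothetical mirror of the example field table.
FIELDS = [
    FieldSpec("id", str, False),
    FieldSpec("user_id", int, False),
    FieldSpec("status", str, False),
    FieldSpec("amount_cents", int, False),
    FieldSpec("created_at", str, False),
    FieldSpec("metadata", dict, True),
]

STATUSES = {"pending", "active", "cancelled", "completed"}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of violations for a fully materialized row."""
    errors = []
    for spec in FIELDS:
        value = row.get(spec.name)
        if value is None:
            if not spec.nullable:
                errors.append(f"{spec.name}: must not be null")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, spec.py_type):
            errors.append(f"{spec.name}: expected {spec.py_type.__name__}")
    status = row.get("status")
    if isinstance(status, str) and status not in STATUSES:
        errors.append(f"status: unknown value {status!r}")
    return errors
```

The check treats defaults as already applied, so it fits rows read back from storage rather than insert payloads.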
For each field, ensure:
Primary key: id (UUID)
Unique constraints: (user_id, order_id) - one record per user per order
Foreign keys:
- user_id → users.id (CASCADE DELETE)
- order_id → orders.id (SET NULL on delete)
Indexes:
- (user_id, created_at DESC) — supports user activity queries
- (status) WHERE status != 'completed' — partial index for active records
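The constraints and indexes listed above could be expressed as DDL roughly as follows. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the production database (the spec's types suggest PostgreSQL), the table name `payments` and the index names are hypothetical, and the column types are simplified to what SQLite accepts.

```python
import sqlite3

# Hypothetical DDL; SQLite types stand in for UUID / BIGINT / TIMESTAMPTZ.
DDL = """
CREATE TABLE users  (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE payments (
    id         TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    order_id   INTEGER REFERENCES orders(id) ON DELETE SET NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, order_id)
);
-- Supports user activity queries.
CREATE INDEX idx_payments_user_created ON payments (user_id, created_at DESC);
-- Partial index covering only records that are not completed.
CREATE INDEX idx_payments_open ON payments (status) WHERE status != 'completed';
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(DDL)
    return conn


def demo():
    conn = build_db()
    conn.execute("INSERT INTO users (id) VALUES (1)")
    conn.execute("INSERT INTO orders (id) VALUES (10)")
    conn.execute("INSERT INTO payments (id, user_id, order_id) VALUES ('a', 1, 10)")
    try:
        # Second record for the same user and order violates the unique constraint.
        conn.execute("INSERT INTO payments (id, user_id, order_id) VALUES ('b', 1, 10)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.execute("DELETE FROM orders WHERE id = 10")   # SET NULL on payments.order_id
    order_id_after = conn.execute(
        "SELECT order_id FROM payments WHERE id = 'a'").fetchone()[0]
    conn.execute("DELETE FROM users WHERE id = 1")     # CASCADE removes the payment
    remaining = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
    return duplicate_rejected, order_id_after, remaining
```

Running `demo()` exercises each documented behavior once: the duplicate insert is rejected, deleting the order nulls `order_id`, and deleting the user removes the dependent record.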
Rules enforced at the application or database level that don't appear in column definitions:
Business rules:
- amount_cents must be > 0 (enforced by CHECK constraint)
- status transitions are one-way: pending → active → completed or cancelled
- cancelled records are never hard-deleted; cancellation is a soft delete recorded in status
- created_at is immutable after insert
- metadata keys must match the approved key registry (not enforced by DB - application responsibility)
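Rules like these are easier to review when the spec also shows how an application-level check might look. The sketch below is illustrative only: `check_update` and `APPROVED_METADATA_KEYS` are hypothetical, the key registry contents are made up, and the transition map assumes the rule reads as pending → active, then active → completed or cancelled (the text does not say whether pending → cancelled is allowed).

```python
# Assumed reading of the one-way transition rule; adjust if pending -> cancelled is valid.
ALLOWED_TRANSITIONS = {
    "pending": {"active"},
    "active": {"completed", "cancelled"},
    "completed": set(),
    "cancelled": set(),
}

# Hypothetical approved key registry for the metadata column.
APPROVED_METADATA_KEYS = {"source", "campaign"}


def check_update(old: dict, new: dict) -> list[str]:
    """Compare the stored row with a proposed update and list rule violations."""
    errors = []
    if new["amount_cents"] <= 0:
        errors.append("amount_cents: must be > 0")
    if new["status"] != old["status"]:
        if new["status"] not in ALLOWED_TRANSITIONS[old["status"]]:
            errors.append(f"status: {old['status']} -> {new['status']} not allowed")
    if new["created_at"] != old["created_at"]:
        errors.append("created_at: immutable after insert")
    unapproved = set(new.get("metadata") or {}) - APPROVED_METADATA_KEYS
    if unapproved:
        errors.append(f"metadata: unapproved keys {sorted(unapproved)}")
    return errors
```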
Partitioning: by created_at (monthly)
Retention: 7 years (regulatory requirement - SOX compliance)
Archival policy: records older than 1 year moved to cold storage tier
PII fields: [list any fields containing personal data and their handling policy]
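The partitioning, archival, and retention values above translate into simple date arithmetic. A minimal sketch, assuming timezone-aware `created_at` values; the partition naming scheme, the `payments` prefix, and the tier labels `hot`, `cold`, and `expired` are hypothetical and not defined by the spec.

```python
from datetime import datetime, timezone

ARCHIVE_AFTER_YEARS = 1    # records older than 1 year move to cold storage
RETENTION_YEARS = 7        # retention period from the spec


def partition_name(created_at: datetime) -> str:
    """Monthly partition for a timezone-aware timestamp (naming is an assumption)."""
    return f"payments_{created_at.astimezone(timezone.utc):%Y_%m}"


def add_years(dt: datetime, years: int) -> datetime:
    try:
        return dt.replace(year=dt.year + years)
    except ValueError:
        # Feb 29 in a leap year has no counterpart in a non-leap target year.
        return dt.replace(year=dt.year + years, day=28)


def lifecycle_tier(created_at: datetime, now: datetime) -> str:
    if now >= add_years(created_at, RETENTION_YEARS):
        return "expired"   # past retention; what happens next is a policy decision
    if now >= add_years(created_at, ARCHIVE_AFTER_YEARS):
        return "cold"
    return "hot"
```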
This section prevents breaking changes:
Evolution rules:
- Adding nullable columns: backward-compatible, no coordination required
- Adding NOT NULL columns: requires default value or migration, coordinate with consumers
- Renaming columns: never rename in place - add new column, backfill, deprecate old
- Removing columns: 90-day deprecation window, announce to consumer teams
- Type widening (INT → BIGINT): coordinate with consumers before applying
- Type narrowing: prohibited without explicit consumer sign-off
Current deprecations: [list any fields marked for removal with target date]
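The evolution rules above can be applied mechanically when reviewing a proposed change. The following sketch compares two versions of a schema and labels each difference with the matching rule. It is an illustration under assumptions: the dictionary layout, the `classify_changes` name, and the short `WIDENING` list are hypothetical, and a rename shows up as a remove plus an add because the diff has no way to tell them apart.

```python
# Illustrative, deliberately incomplete list of widening conversions.
WIDENING = {("INT", "BIGINT"), ("INTEGER", "BIGINT")}


def classify_changes(old: dict, new: dict) -> list[tuple[str, str]]:
    """old/new map column name -> {"type": str, "nullable": bool}."""
    findings = []
    for col in sorted(new.keys() - old.keys()):
        if new[col]["nullable"]:
            findings.append((col, "add nullable: backward-compatible"))
        else:
            findings.append(
                (col, "add NOT NULL: needs default or migration; coordinate with consumers"))
    for col in sorted(old.keys() - new.keys()):
        findings.append((col, "remove: 90-day deprecation window; announce to consumers"))
    for col in sorted(old.keys() & new.keys()):
        before, after = old[col]["type"], new[col]["type"]
        if before == after:
            continue
        if (before, after) in WIDENING:
            findings.append((col, "type widening: coordinate with consumers"))
        else:
            # Anything not known to be a widening is treated conservatively.
            findings.append((col, "type change: prohibited without consumer sign-off"))
    return findings
```

Treating every unrecognized type change as narrowing errs on the side of asking consumers first, which matches the intent of the rules.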
1. Ambiguous types and units
Bad: amount FLOAT with no description
Good: amount_cents INTEGER NOT NULL + "Amount in USD cents. Divide by 100 for display. Never stored as float to avoid rounding errors."
2. Missing enum values
Bad: status VARCHAR with description "the status"
Good: status ENUM('pending','active','cancelled','completed') NOT NULL + each value explained
3. No evolution policy
Bad: Schema doc with no guidance on how to make changes
Good: Explicit rules for each change category (add, rename, remove, retype) with consumer notification requirements
4. PII undocumented
Bad: Fields containing email, name, or location with no annotation
Good: Each PII field flagged, data classification noted (e.g., PII-direct, PII-quasi), and handling policy cited
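The first anti-pattern (ambiguous types and units) is worth seeing in numbers. The sketch below shows the rounding problem that integer cents avoid and one way to follow the "divide by 100 for display" note without going through floats; `format_usd` is a hypothetical helper, not something the spec defines.

```python
# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.10 + 0.20      # not exactly 0.30
cents_total = 10 + 20          # exactly 30


def format_usd(cents: int) -> str:
    """Render integer cents for display using integer arithmetic only."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{remainder:02d}"
```

Storing `amount_cents` as an integer keeps sums exact, and conversion to a display string happens only at the edge.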