plugins/backend-toolkit/skills/data-validation/SKILL.md
Validate untrusted input once at the trust boundary and return a typed parsed value (parse, don't validate). Use when adding an endpoint, accepting external input, or when invalid data leaks past the boundary into business logic. Not for defining the API contract/schema itself (use api-contract) or downstream business-rule logic — parse only at the trust boundary.
npx skillsauth add jaykim88/claude-ai-engineering data-validationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Validate untrusted input exactly once, at the system boundary, and convert it into a typed value that downstream code is statically guaranteed is valid. Stop passing raw/unparsed data inward and re-checking it everywhere.
Universal — the "parse, don't validate" principle (validate at the boundary, return a typed parsed value) applies to any typed language; only the validation library differs.
Identify every trust boundary
Define a schema per boundary input
Parse, don't validate — return a typed value, not a boolean
const data = Schema.parse(raw) → data is now a typed, guaranteed-valid objectif (isValid(raw)) { use raw as any } — downstream still sees untyped datasafeParse at boundaries to convert failures into a 400 response, not a thrown 500Validate at the boundary ONLY — trust inward
Coerce and normalize during parse
5b. Cap size + bound dangerous types at the boundary
Schema.string().max(N), .array().max(N) — prevent ReDoS / pathological allocationsBigInt for IDs / counters that may exceed Number.MAX_SAFE_INTEGER; reject NaN / InfinityDate / instant; reject ambiguous local time without zone — date-string handling is the #1 silent corruption source*? patterns (ReDoS); use a regex library with timeout or pre-validated patterns| ❌ Anti-pattern | ✅ Correct |
|---|---|
| if (isValid(x)) { use x } (boolean check) | const parsed = Schema.parse(x) (typed value) |
| Re-validating the same data in 5 inner functions | Parse once at boundary, trust the type inward |
| Trusting external API responses without parsing | Parse external responses too — they're untrusted |
| body as RequestType (type assertion, no runtime check) | Runtime parse that produces the type |
| Throwing 500 on bad input | safeParse → 400 with field errors |
| No request-body size cap (10GB JSON OOMs the server) | Body-size limit at HTTP layer + .max() in schema |
| Unbounded string / array fields | .max(N) in the schema |
| Storing user input as a local-time date with no zone | Parse to canonical UTC; reject ambiguous local-time strings |
| User-supplied or unbounded regex | Pre-validated patterns + execution timeout |
as assertions on raw input)safeParse → 400 mappingfeat(validation): parse <endpoint> input at boundarysafeParse at the controller boundary, OR NestJS class-validator DTOs with ValidationPipeform-ux)ZodError → RFC 9457 400 response in a global filtergo-playground/validator on structs; or parse into typed structs explicitlyapi-contract — the contract's request schema is the validation schemabackend-security-audit — input validation is the first injection defenseauthentication — validate token claims as untrusted inputdevelopment
Design webhooks correctly on both sides — sending (HMAC signing, retries with backoff, at-least-once) and receiving (verify signature on raw body, enqueue + 200 fast, dedupe on event id). Use when adding webhook delivery or consuming a provider's webhooks. Not for internal service-to-service events (use async-messaging) or general outbound-call retry policy (use resilience-patterns).
testing
Use transactions and isolation levels correctly — keep them short, no network calls inside, explicit isolation, retry on serialization conflicts, and choose optimistic vs pessimistic locking. Use when a write spans multiple tables, when concurrent updates corrupt data, or when designing money/inventory flows. Not for cross-service event delivery (use async-messaging Outbox) or schema-level constraints (use schema-design).
development
Backend testing pyramid — unit for pure logic, integration against a real DB (Testcontainers), and consumer-driven contract testing (Pact) for service boundaries. Use before a feature, after a bug fix, or when services break each other on deploy. Not for load testing (use performance-profiling) or security testing (use backend-security-audit).
data-ai
Design a relational schema — normalize to 3NF then denormalize with justification, choose the right Postgres index type per data shape, enforce constraints at the DB. Use when modeling a new domain, when queries are slow, or before a migration. Not for diagnosing slow queries (use query-optimization) or shipping the change without downtime (use migration-strategy).