skills/2026-legal-research-agent/SKILL.md
Expert legal research agent for finding and scraping expungement data state by state. Knows authoritative sources, URL patterns, Firecrawl configuration, and 2026 legal landscape.
name: 2026-legal-research-agent
description: >
Expert legal research agent for finding and scraping expungement data state by state.
Knows authoritative sources, URL patterns, Firecrawl configuration, and 2026 legal landscape.
Activate on "find expungement data", "scrape state laws", "legal research", "court URLs",
"statute sources", "Clean Slate laws", "automatic expungement research".
NOT for interpreting laws (use national-expungement-expert), building UI, or legal advice.
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
- WebFetch
- WebSearch
- Task
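Use this skill when you need to find or scrape expungement data state by state: statute sources, court URLs, Clean Slate laws, and automatic expungement research.
Do NOT use this skill for interpreting laws (use national-expungement-expert), building UI, or legal advice.
When researching expungement laws, prioritize sources in this order: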
Tier 1 (Primary Authority):
├── State Legislature websites (statute text)
├── State Court Administrative Office
└── State Attorney General publications
Tier 2 (Official Secondary):
├── State Bar Association guides
├── Court self-help centers
└── Public law databases (public.law, justia.com)
Tier 3 (Tertiary but Valuable):
├── Legal aid organizations (LSC grantees)
├── Law school clinics
└── Reentry organizations (CCRC, NACDL)
Tier 4 (Verification Only):
├── Commercial legal databases
├── News articles about law changes
└── Attorney blog posts
Shibboleth: A novice scrapes the first Google result. An expert knows that courts.{state}.gov contains the self-help forms while legislature.{state}.gov contains the statute text—and both are needed.
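The tier ordering above can be approximated in code when triaging a list of candidate URLs. The sketch below is a coarse heuristic, not part of the project: `sourceTier` and `rankSources` are hypothetical names, and the hostname rules are assumptions (for example, a state bar site on `.org` lands in Tier 3 even though the list places bar guides in Tier 2).

```typescript
// Hypothetical helper: coarse tier guess from the hostname alone.
type Tier = 1 | 2 | 3 | 4;

function sourceTier(url: string): Tier {
  const host = new URL(url).hostname.toLowerCase();
  // Tier 1: official government hosts (legislatures, courts, attorneys general).
  if (host.endsWith(".gov") || /\.state\.[a-z]{2}\.us$/.test(host)) return 1;
  // Tier 2: public law databases named in the list above.
  if (host.endsWith("public.law") || host.endsWith("justia.com")) return 2;
  // Tier 3 (assumption): legal aid, clinics, and reentry groups are usually .org or .edu.
  if (host.endsWith(".edu") || host.endsWith(".org")) return 3;
  // Tier 4: everything else (commercial databases, news, blogs).
  return 4;
}

// Sorts candidate URLs so the most authoritative sources come first.
function rankSources(urls: string[]): string[] {
  return [...urls].sort((a, b) => sourceTier(a) - sourceTier(b));
}
```

A hostname check cannot tell a court self-help page from a court administrative office, so treat the result as a starting order for manual review.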
States organize their legal resources differently. Know the patterns:
Unified Court Systems (courts own everything):
California: courts.ca.gov/selfhelp-expungement.htm
Oregon: courts.oregon.gov/programs/exp/Pages/default.aspx
Washington: courts.wa.gov/forms/?fa=forms.contribute&formID=101
Split Systems (legislature + court separate):
Texas: txcourts.gov (forms) + texas.public.law (statutes)
New York: nycourts.gov (forms) + nysenate.gov/legislation/laws (statutes)
Florida: flcourts.gov (forms) + leg.state.fl.us/statutes (statutes)
Public.law States (excellent statute hosting):
oregon.public.law, california.public.law, texas.public.law
michigan.public.law, washington.public.law
Shibboleth: Knowing that apps.leg.wa.gov/RCW/ is Washington's statute database while leg.wa.gov is the general legislature site—the RCW subdomain is where the actual law text lives.
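For split systems, a small lookup keeps the forms host and the statute host side by side so neither is forgotten. This is a minimal sketch: the three entries are copied from the split-system examples above, while `SPLIT_SYSTEM_SOURCES`, `sourcesToScrape`, and the `https://` scheme are assumptions for illustration.

```typescript
interface StateSources {
  forms: string;    // where court forms and self-help pages live
  statutes: string; // where the statute text lives
}

// Entries taken from the split-system examples above; other states are omitted.
const SPLIT_SYSTEM_SOURCES: Record<string, StateSources> = {
  tx: { forms: "txcourts.gov", statutes: "texas.public.law" },
  ny: { forms: "nycourts.gov", statutes: "nysenate.gov/legislation/laws" },
  fl: { forms: "flcourts.gov", statutes: "leg.state.fl.us/statutes" },
};

// Returns both URLs for a state, because forms and statutes are both needed.
function sourcesToScrape(stateCode: string): string[] {
  const entry = SPLIT_SYSTEM_SOURCES[stateCode.toLowerCase()];
  if (!entry) throw new Error(`No source entry for state "${stateCode}"`);
  // Assumption: every host answers over https.
  return [`https://${entry.forms}`, `https://${entry.statutes}`];
}
```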
As of 2026, these major changes affect research:
Clean Slate States (automatic expungement passed):
Marijuana Expungement (specific statutes):
2025-2026 Law Changes to Verify:
Shibboleth: Knowing that "automatic expungement" doesn't mean immediate—Pennsylvania's Clean Slate has a 10-year waiting period for arrests and varies by offense. Research must capture these nuances.
When configuring scrape jobs:
Extraction Schema Design:
// For statute pages, extract:
{
  statuteCitation: "string", // e.g., "ORS 137.225"
  title: "string",           // e.g., "Setting aside conviction"
  fullText: "string",        // Complete statute text
  effectiveDate: "string",   // When current version took effect
  lastAmended: "string",     // Most recent amendment date
  subsections: "array",      // Parsed subsections
}
// For court self-help pages, extract:
{
  stateName: "string",
  expungementPageUrl: "string",
  formsLibraryUrl: "string",
  selfHelpUrl: "string",
  contactPhone: "string",
  feeScheduleUrl: "string",
}
// For forms, extract:
{
  formNumber: "string",      // e.g., "MC-440"
  formTitle: "string",
  pdfUrl: "string",
  applicableTo: "array",     // ["misdemeanor", "arrest"]
  lastUpdated: "string",
}
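The statute schema above maps naturally onto a TypeScript interface with a runtime guard, so malformed scrape output is caught before it reaches the state data files. This is a sketch under assumptions: `StatuteRecord` and `isStatuteRecord` are hypothetical names, and `subsections` is assumed to be an array of strings, which the schema does not specify.

```typescript
interface StatuteRecord {
  statuteCitation: string;
  title: string;
  fullText: string;
  effectiveDate: string;
  lastAmended: string;
  subsections: string[]; // assumption: each parsed subsection is plain text
}

const STRING_FIELDS = [
  "statuteCitation",
  "title",
  "fullText",
  "effectiveDate",
  "lastAmended",
] as const;

// Runtime check that an extracted object has the shape of the statute schema.
function isStatuteRecord(value: unknown): value is StatuteRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const stringsOk = STRING_FIELDS.every((field) => typeof record[field] === "string");
  const subs = record["subsections"];
  return stringsOk && Array.isArray(subs) && subs.every((s) => typeof s === "string");
}
```

The same pattern applies to the court self-help and forms schemas.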
Rate Limiting for Government Sites:
rateLimit: 2, // 2 requests/second max for .gov sites
timeout: 90000, // Government sites can be slow
maxRetries: 3, // Retry on timeout
waitFor: 3000, // Wait for JavaScript on modern court sites
Shibboleth: Knowing to set onlyMainContent: true for statute pages (to skip navigation chrome) but onlyMainContent: false for forms pages (where the form links are often in sidebars).
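The settings above and the `onlyMainContent` rule can be combined into one options builder per page type. This is an illustrative sketch: `PageKind`, `ScrapeOptions`, and `scrapeOptionsFor` are hypothetical, and the field names simply mirror the fragment above. Whether the Firecrawl client accepts all of them under these names is not verified here.

```typescript
type PageKind = "statute" | "forms";

interface ScrapeOptions {
  rateLimit: number;        // requests per second
  timeout: number;          // milliseconds
  maxRetries: number;
  waitFor: number;          // milliseconds to wait for JavaScript
  onlyMainContent: boolean;
}

// Values copied from the government-site settings above.
const GOV_DEFAULTS = { rateLimit: 2, timeout: 90000, maxRetries: 3, waitFor: 3000 };

function scrapeOptionsFor(kind: PageKind): ScrapeOptions {
  // Statute pages: skip navigation chrome. Forms pages: keep sidebars, where form links often live.
  return { ...GOV_DEFAULTS, onlyMainContent: kind === "statute" };
}
```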
After scraping, validate:
□ Statute citations match official format (e.g., "ORS" not "Or. Rev. Stat.")
□ Effective dates are parseable and reasonable (not future, not too old)
□ URLs are live and return 200 status
□ PDF form links actually download PDFs (not HTML error pages)
□ Phone numbers are in consistent format
□ Fee amounts are numeric and reasonable ($0-$500 typical range)
□ State code extracted correctly (watch for ambiguous URLs)
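Several of these checks are mechanical and can be sketched as small pure functions. The names below are hypothetical, and two thresholds are assumptions: the "too old" cutoff of 1900 and the `XXX-XXX-XXXX` phone format. The `%PDF-` prefix test relies on the standard PDF file header.

```typescript
// Effective date must parse, must not be in the future, and must not predate the assumed cutoff.
function isReasonableEffectiveDate(value: string, now: Date = new Date()): boolean {
  const parsed = new Date(value).getTime();
  if (Number.isNaN(parsed)) return false;
  const earliest = new Date("1900-01-01").getTime(); // assumed cutoff
  return parsed <= now.getTime() && parsed >= earliest;
}

// Fees outside the typical $0-$500 range are flagged for review, not proven wrong.
function isTypicalFee(amount: number): boolean {
  return Number.isFinite(amount) && amount >= 0 && amount <= 500;
}

// Normalizes US phone numbers to XXX-XXX-XXXX; returns null when the digits do not fit.
function normalizePhone(raw: string): string | null {
  let digits = raw.replace(/\D/g, "");
  if (digits.length === 11 && digits.startsWith("1")) digits = digits.slice(1);
  if (digits.length !== 10) return null;
  return `${digits.slice(0, 3)}-${digits.slice(3, 6)}-${digits.slice(6)}`;
}

// A real PDF starts with the bytes "%PDF-"; an HTML error page does not.
function looksLikePdf(bytes: Uint8Array): boolean {
  const header = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return header.every((byte, index) => bytes[index] === byte);
}
```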
Common Extraction Errors:
To identify missing data for a state:
# Check what we have
ls src/data/scraped/states/{state}/
# Expected files for complete coverage:
# - statutes.json (eligibility rules from statute text)
# - court-system.json (court URLs, contacts, forms links)
# - forms/ (actual PDF forms)
# - fees.json (filing fee amounts)
# - counties/ (county-specific court data)
# Cross-reference with state data file
grep -l "waitingPeriods\|eligibilityRules" src/data/states/{state}.ts
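The same gap check can be expressed as a pure function over a directory listing. In this sketch the expected entry names come from the comment block above, while `EXPECTED_ENTRIES` and `missingEntries` are hypothetical names.

```typescript
// Entries expected under src/data/scraped/states/{state}/ for complete coverage.
const EXPECTED_ENTRIES = [
  "statutes.json",
  "court-system.json",
  "forms",
  "fees.json",
  "counties",
] as const;

// Returns the expected entries that are absent from a directory listing.
function missingEntries(present: string[]): string[] {
  // Tolerate trailing slashes on directory names such as "forms/".
  const have = new Set(present.map((name) => name.replace(/\/$/, "")));
  return EXPECTED_ENTRIES.filter((name) => !have.has(name));
}
```

The output of Node's `fs.readdirSync` for a state directory can be passed in directly as `present`.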
Priority order for filling gaps:
❌ Scraping Wikipedia for statute text
✅ Scraping the state legislature's official code
❌ Using findlaw.com as primary source
✅ Using findlaw.com to find the citation, then scraping the official source
❌ Assuming "expungement" is the only term
✅ Searching for: expungement, sealing, set-aside, dismissal, destruction, pardons
❌ Treating waiting periods as simple numbers
✅ Capturing offense-specific waiting periods (felonies vs misdemeanors vs arrests)
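The last two pairs translate directly into data shapes. The sketch below expands one state into a query per relief term and stores waiting periods per offense category instead of as a single number. All names are hypothetical, the trailing word "statute" in each query is an assumption, and no real waiting-period values are included.

```typescript
// Terms listed above; states use different words for the same relief.
const RELIEF_TERMS = [
  "expungement",
  "sealing",
  "set-aside",
  "dismissal",
  "destruction",
  "pardons",
] as const;

// One search query per term, so no synonym is missed.
function searchQueries(stateName: string): string[] {
  return RELIEF_TERMS.map((term) => `${stateName} ${term} statute`);
}

type OffenseCategory = "felony" | "misdemeanor" | "arrest";

interface WaitingPeriod {
  category: OffenseCategory;
  years: number | null;    // null until the statute has been checked
  citation: string | null; // statute section that sets the period
}

// Looks up the period for one category; undefined means it has not been researched yet.
function waitingPeriodFor(
  periods: WaitingPeriod[],
  category: OffenseCategory,
): WaitingPeriod | undefined {
  return periods.find((period) => period.category === category);
}
```

Keeping `years` nullable makes unresearched categories visible instead of silently defaulting to a number.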
This skill is designed for the National Expungement Guide project:
- scripts/firecrawl/
- scripts/firecrawl/jobs.ts (P0-P4 priority jobs)
- scripts/firecrawl/config.ts (all 50 states)
- src/data/scraped/states/{state}/
- src/data/states/ (TypeScript files per state)
# Set API key first
export FIRECRAWL_API_KEY=your_key
# Run P0 (state statutes + courts) - ~$0.20 cost
npx tsx scripts/firecrawl/run-p0.ts
# Dry run to preview
npx tsx scripts/firecrawl/run-p0.ts --dry-run
# Check reports
cat scripts/firecrawl/reports/p0-*.json
Firecrawl scrape → src/data/scraped/states/{state}/*.json
↓
Manual review + cleanup
↓
Integrated into src/data/states/{state}.ts
↓
Used by eligibility wizard + PDF generator
See references/ folder for:
- url-patterns-by-state.md - Complete URL patterns for all 50 states
- clean-slate-timeline.md - When each Clean Slate law passed and took effect
- firecrawl-schemas.md - All extraction schemas used
User request: "Research California's 2026 expungement laws and scrape the latest data"
Agent workflow:
ls src/data/scraped/states/ca/
california.public.law
scripts/firecrawl/config.ts if URLs changed