.claude/skills/draft-benchmark/SKILL.md
# SKILL: draft-benchmark

Use when: user asks to run draft pipeline benchmarks, fill Draft_test.xlsx, compare with ChatGPT-5.4, or analyze draft errors.

## Instructions for Claude

When this skill is invoked, follow these steps ONE BY ONE:

### Step 1: Run Pipeline on Each Scenario (Fill "Draft Agent" Column C)

For each of the 10 scenarios in `docs/Draft_test.xlsx` column B:

1. Read the scenario text from the Excel file using openpyxl
2. Run the drafting pipeline:
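```python
import asyncio, openpyxl, sys, time
from pathlib import Path

# Make the project root importable before importing the drafting graph.
sys.path.insert(0, str(Path("c:/Girish/Fundamental_Projects/ActionAi/Agent_steer_backend/BoardingMcp-Server")))
from app.agents.drafting_agents.drafting_graph import get_drafting_graph

async def run_scenario(scenario_text):
    # ainvoke is a coroutine, so it must be awaited inside an async function.
    graph = get_drafting_graph()
    return await graph.ainvoke({"user_request": scenario_text})

# scenario_text: the column B text read from the Excel file
result = asyncio.run(run_scenario(scenario_text))
```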
Read the draft text from `result["final_draft"]["draft_artifacts"][0]["text"]` first, then from `result["draft"]["draft_artifacts"][0]["text"]`.

Do this ONE scenario at a time. Do NOT batch. Save after each one.
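The lookup order above can be sketched as a small helper. This is a minimal sketch, assuming `result` is a plain dict shaped like the two paths above; the name `extract_draft_text` is hypothetical and not part of the pipeline.

```python
from typing import Any, Optional


def extract_draft_text(result: dict[str, Any]) -> Optional[str]:
    """Return the first draft artifact's text, or None if neither key has one."""
    # Try "final_draft" first, then "draft" (order taken from the step above).
    for key in ("final_draft", "draft"):
        artifacts = (result.get(key) or {}).get("draft_artifacts") or []
        if artifacts and artifacts[0].get("text"):
            return artifacts[0]["text"]
    return None
```

Returning `None` instead of raising lets the caller decide whether to leave the cell empty or record the failure for that scenario.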
After all 10 drafts are filled, for each scenario:
Run `research/run_draft_benchmark.py --compare` OR manually check:

Error Categories to Check:

| Category | Severity | What to Look For |
|----------|----------|------------------|
| Fabrication | CRITICAL (-2.0) | Invented AIR/SCC/ILR citations, fake annexures for documents not in input, invented events/dates |
| Wrong Statute | CRITICAL (-2.0) | Indian Evidence Act 1872 (repealed→BSA 2023), CrPC 1973 (repealed→BNSS 2023), IPC (repealed→BNS 2023), phantom S.27A SRA |
| Missing Section | HIGH (-1.0) | No verification clause, no prayer, no jurisdiction, no cause of action, no valuation, no court fee |
| Legal Error | HIGH/MEDIUM (-1.0/-0.5) | Limitation anchored to notice date, pendente lite cites S.34 CPC (should be Order XX Rule 11), facts-law section mixing, "and/or" usage |
| Placeholder Excess | MEDIUM (-0.5) | More than 15 {{PLACEHOLDER}} in one draft |
| Structural | MEDIUM/LOW (-0.5/-0.25) | Missing paragraph numbers, numbering that is not continuous through the document |
Scoring: Start at 10.0, deduct per severity above. Min 0, max 10.
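
The scoring rule can be sketched as follows. This is a minimal sketch, assuming each detected error has already been assigned a single severity label; the names `DEDUCTIONS` and `score_draft` are hypothetical.

```python
# Deductions per severity, copied from the table above.
DEDUCTIONS = {"CRITICAL": 2.0, "HIGH": 1.0, "MEDIUM": 0.5, "LOW": 0.25}


def score_draft(severities: list[str]) -> float:
    """Start at 10.0, subtract one deduction per error, clamp to [0, 10]."""
    total = 10.0 - sum(DEDUCTIONS[s] for s in severities)
    return max(0.0, min(10.0, total))


print(score_draft(["CRITICAL", "HIGH", "MEDIUM"]))  # 6.5
```

For categories with two possible severities (for example HIGH/MEDIUM for Legal Error), the caller picks the label before scoring.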
Write into column E ("Compare"):

```
Winner: [pipeline/chatgpt] ([score diff])
Pipeline: [score]/10 ([word_count]w, [placeholder_count] placeholders)
ChatGPT: [score]/10 ([word_count]w, [placeholder_count] placeholders)
```
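
One way to build that cell text is sketched below. It assumes words are whitespace-separated tokens, placeholders look like `{{NAME}}`, the score difference is shown as an absolute value, and a tie goes to the pipeline; all of these are assumptions, and `draft_stats` and `compare_cell` are hypothetical names.

```python
import re

# Matches {{PLACEHOLDER}}-style tokens.
PLACEHOLDER_RE = re.compile(r"\{\{[^{}]+\}\}")


def draft_stats(text: str) -> tuple[int, int]:
    """Return (word_count, placeholder_count) for one draft."""
    return len(text.split()), len(PLACEHOLDER_RE.findall(text))


def compare_cell(pipeline_score: float, pipeline_text: str,
                 chatgpt_score: float, chatgpt_text: str) -> str:
    """Format the three-line Compare cell from scores and draft texts."""
    p_words, p_ph = draft_stats(pipeline_text)
    c_words, c_ph = draft_stats(chatgpt_text)
    # Assumption: a tie is reported as a pipeline win.
    winner = "pipeline" if pipeline_score >= chatgpt_score else "chatgpt"
    diff = abs(pipeline_score - chatgpt_score)
    return "\n".join([
        f"Winner: {winner} ({diff:.1f})",
        f"Pipeline: {pipeline_score:.1f}/10 ({p_words}w, {p_ph} placeholders)",
        f"ChatGPT: {chatgpt_score:.1f}/10 ({c_words}w, {c_ph} placeholders)",
    ])
```

The same placeholder count can feed the "Placeholder Excess" check, which triggers above 15 placeholders in one draft.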
Write into column F ("Improvements"), listing every pipeline error:

```
[CRITICAL] Fabricated citation: AIR 2019 SC 456
[HIGH] Missing: cause of action section
[MEDIUM] Pendente lite cites S.34 CPC instead of Order XX Rule 11
[MISSING] Sections present in ChatGPT but not pipeline: schedule of property
```
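
A minimal sketch of producing that cell from a list of findings follows. It assumes each finding is a `(tag, message)` pair and that the most severe entries should come first; the ordering and the name `improvements_cell` are assumptions, not something the skill specifies.

```python
# Assumed display order, most severe first.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "MISSING"]


def improvements_cell(errors: list[tuple[str, str]]) -> str:
    """Format findings as '[TAG] message' lines, one per pipeline error."""
    # sorted() is stable, so findings with the same tag keep their input order.
    ranked = sorted(errors, key=lambda e: SEVERITY_ORDER.index(e[0]))
    return "\n".join(f"[{tag}] {message}" for tag, message in ranked)
```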
Save Excel after each scenario comparison
After all 10 are compared, print:
- `docs/Draft_test.xlsx` (columns: s.no, Civil Draft Scenarios, Draft Agent, Chatgpt-5.4, Compare, Improvements)
- `app/agents/drafting_agents/drafting_graph.py` → `get_drafting_graph()`
- `research/run_draft_benchmark.py` (`--draft` or `--compare` mode)
- `research/output/`

```bash
# Fill Draft Agent column
agent_steer/Scripts/python.exe research/run_draft_benchmark.py --draft

# Compare + fill Compare and Improvements columns
agent_steer/Scripts/python.exe research/run_draft_benchmark.py --compare
```