skills/doc2spec-synthesizing-formal-programming/SKILL.md
Synthesize formal programming specifications from natural-language API docs using grammar induction. Extracts rules from documentation, induces a domain-specific specification grammar (EBNF), and generates validated formal specs. Use when: 'formalize this API documentation', 'extract specifications from these docs', 'generate formal rules from this README', 'convert these requirements to formal specs', 'induce a grammar for these programming rules', 'validate API usage against documentation'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ndpvt-web/arxiv-claude-skills doc2spec-synthesizing-formal-programming
This skill enables Claude to convert natural-language programming documentation (API docs, READMEs, RFCs, ERC standards) into formal, machine-checkable specifications by first inducing a domain-specific grammar from the documentation and then generating specifications that conform to that grammar. The technique comes from the Doc2Spec multi-agent framework (Xia et al., 2026), which demonstrated that automated grammar induction constrains the specification space, enforces consistent representations, and produces higher-quality specs than direct LLM translation. It works across Solidity, Rust, Java, and other languages.
Grammar induction before specification generation. The core insight of Doc2Spec is that asking an LLM to translate natural-language rules directly into formal specs produces inconsistent results: equivalent semantics get encoded differently, predicate names drift, and the output format varies. Doc2Spec instead first induces a domain-specific specification grammar (in EBNF) from the rules. This grammar fixes a vocabulary of sorts (types of programming entities like Pointer, Allocator, Function) and predicates (properties like AllocatedByAllocator(Pointer, Allocator) or ThrowsOnUnauthorized(Function, Condition)). Every spec is then generated to conform to this grammar and validated by parsing it against the grammar.
The universal DSL template. Every specification follows a constrained structure: an optional Boolean condition plus a predicate check. The template grammar is:
<rule> ::= <check> ["if" <expr>] ";"
<check> ::= "check(" <predicate> "," BOOLEAN ")"
<expr> ::= <unary> | <binary>
<unary> ::= "not" <value>
<binary> ::= <value> ("and" | "or") <value>
<value> ::= BOOLEAN | <predicate> | <expr>
Domain-specific sorts and predicates are plugged into this template. This constraint reduces average token count (~30% smaller specs) and enforces consistency across all rules in a document.
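The sketch below makes "validated by parsing" concrete with a minimal, stdlib-only recursive-descent validator for the template. The paper uses Lark; this version swaps in a hand-written parser so the example has no dependencies. Several choices are assumptions that go beyond the template as written:

- `not` binds tighter than `and`, which binds tighter than `or`.
- `check(...)` is accepted as a condition operand, because the examples in this skill use it that way.
- Predicate arguments are identifiers (dotted names such as `msg.sender` are allowed) or nested expressions.
- The domain extension is reduced to a hypothetical catalog that maps each predicate name to its arity.

`SpecParser`, `validate`, and `SpecError` are illustrative names.

```python
import re

# Identifiers (dotted names allowed) or punctuation; any other non-space char is an error.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][\w.]*|[(),;])|(\S))")
RESERVED = {"check", "if", "not", "and", "or", "true", "false", "(", ")", ",", ";"}


class SpecError(ValueError):
    """Carries a message that can be fed back to the generator on retry."""


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group(2):
            raise SpecError(f"unexpected character {m.group(2)!r} at offset {m.start(2)}")
        tokens.append(m.group(1))
    return tokens


class SpecParser:
    def __init__(self, text, catalog):
        self.toks, self.i = tokenize(text), 0
        self.catalog = catalog  # hypothetical induced catalog: predicate name -> arity

    def peek(self, offset=0):
        j = self.i + offset
        return self.toks[j] if j < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SpecError(f"expected {expected!r}, got {tok!r} at token {self.i}")
        self.i += 1
        return tok

    def rule(self):  # <check> ["if" <expr>] ";"
        head, cond = self.check(), None
        if self.peek() == "if":
            self.take("if")
            cond = self.expr()
        self.take(";")
        if self.peek() is not None:
            raise SpecError(f"trailing input {self.peek()!r} at token {self.i}")
        return ("rule", head, cond)

    def check(self):  # "check(" <predicate> "," BOOLEAN ")"
        self.take("check")
        self.take("(")
        pred = self.predicate()
        self.take(",")
        flag = self.take()
        if flag not in ("true", "false"):
            raise SpecError(f"expected BOOLEAN, got {flag!r}")
        self.take(")")
        return ("check", pred, flag == "true")

    def predicate(self):  # Name "(" arg {"," arg} ")", checked against the catalog
        name = self.take()
        if name not in self.catalog:
            raise SpecError(f"unknown predicate {name!r} (not in the induced grammar)")
        self.take("(")
        args = [self.expr()]
        while self.peek() == ",":
            self.take(",")
            args.append(self.expr())
        self.take(")")
        if len(args) != self.catalog[name]:
            raise SpecError(f"{name} takes {self.catalog[name]} argument(s), got {len(args)}")
        return ("pred", name, args)

    def expr(self, level=0):  # assumed precedence: or < and < not
        ops = ("or", "and")
        if level == len(ops):
            return self.atom()
        node = self.expr(level + 1)
        while self.peek() == ops[level]:
            self.take()
            node = (ops[level], node, self.expr(level + 1))
        return node

    def atom(self):
        tok = self.peek()
        if tok == "not":
            self.take()
            return ("not", self.atom())
        if tok in ("true", "false"):
            self.take()
            return ("bool", tok == "true")
        if tok == "check":
            return self.check()
        if tok is None or tok in RESERVED:
            raise SpecError(f"unexpected token {tok!r} at token {self.i}")
        if self.peek(1) == "(":
            return self.predicate()
        return ("term", self.take())


def validate(spec, catalog):
    return SpecParser(spec, catalog).rule()
```

`validate(spec, catalog)` either returns a nested-tuple parse tree or raises `SpecError`. The error message is the kind of text a retry step can feed back to the generator. Fixing a precedence has one visible consequence: a condition written as `A or B and not C` parses as `A or (B and not C)`. A rule that needs the other grouping is safer split into two rules.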
Multi-agent pipeline with validation loops. The workflow uses five specialized stages: entity localization, attribute extraction, NL rule extraction, grammar induction, and formal spec generation. Each stage validates its output (JSON schema checks, Lark parser validation for grammar conformance) and retries up to three times on failure. This makes the process robust to LLM formatting errors.
Identify the source document and target entities. Read the documentation (API doc, ERC spec, Rustdoc, Javadoc). List every programming entity mentioned: functions, structs, classes, interfaces, parameters, return types. Use regex patterns or structural cues (headings, code blocks, function signatures) to locate them.
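Here is a minimal sketch of the regex route for ERC-style documents. It assumes signatures appear as Solidity `function` or `event` declarations. `SIGNATURE_RE`, `parse_params`, and `localize_entities` are hypothetical names. Rustdoc or Javadoc input would need its own patterns.

```python
import re

# Assumed input: Solidity-style declarations as printed in ERC documents, e.g.
#   function transferFrom(address _from, address _to, uint256 _value) public returns (bool success)
#   event Transfer(address indexed _from, address indexed _to, uint256 _value)
SIGNATURE_RE = re.compile(
    r"\b(?P<kind>function|event)\s+(?P<name>[A-Za-z_]\w*)\s*\((?P<params>[^)]*)\)"
)


def parse_params(params):
    """Split 'address indexed _from, uint256 _value' into [{'type': ..., 'name': ...}, ...]."""
    parsed = []
    for raw in params.split(","):
        words = raw.split()
        if not words:
            continue  # e.g. an empty parameter list such as totalSupply()
        # First word is the type; the last word (if there are several) is the parameter name.
        parsed.append({"type": words[0], "name": words[-1] if len(words) > 1 else None})
    return parsed


def localize_entities(doc):
    """Return every function/event signature found in the text, in document order."""
    return [
        {"kind": m["kind"], "name": m["name"], "params": parse_params(m["params"])}
        for m in SIGNATURE_RE.finditer(doc)
    ]
```

The pattern requires an opening parenthesis right after the name, so prose such as "The function SHOULD throw" is not picked up as an entity.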
Extract entity attributes. For each entity, determine its attribute schema: name, parameter types, return type, visibility, mutability, ownership semantics. First discover the full schema across all entities, then instantiate values per entity.
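The two-pass order (schema first, then values) can be sketched as follows. The sketch assumes each entity's attributes have already been pulled out as a dict, for example by an LLM call. `discover_schema` and `instantiate` are illustrative helpers, and `None` stands for an attribute the documentation does not state.

```python
def discover_schema(raw_entities):
    """Pass 1: the union of attribute names over all entities, in first-seen order."""
    schema = []
    for entity in raw_entities:
        for key in entity:
            if key not in schema:
                schema.append(key)
    return schema


def instantiate(raw_entities, schema):
    """Pass 2: give every entity every attribute; None marks 'not stated in the docs'."""
    return [{key: entity.get(key) for key in schema} for entity in raw_entities]
```

Fixing the schema first means every entity is described with the same columns. A missing attribute then shows up explicitly as `None` instead of silently disappearing.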
Extract natural-language rules. Scan the documentation for constraint-bearing sentences. Classify each sentence as a rule or non-rule using dual evaluation: (a) does it state a MUST/SHOULD/SHALL requirement? (b) does it constrain behavior, state, or usage? Combine the two scores and keep only sentences whose aggregate confidence meets a threshold. Discard purely descriptive text.
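Doc2Spec performs the dual evaluation with LLM judgments. The sketch below substitutes two crude keyword heuristics purely to show how two scores combine under a threshold. The cue list, the 0.5 threshold, and the `require_agreement` switch are all assumptions. The switch corresponds to the stricter mode the troubleshooting table below recommends.

```python
import re

# Evaluation (a): an explicit requirement keyword (RFC 2119 style, any case).
MODAL_RE = re.compile(r"\b(must|shall|should|required)\b", re.IGNORECASE)
# Evaluation (b): crude cues that a sentence constrains behavior, state, or usage.
CONSTRAINT_CUES = ("throw", "revert", "panic", "must", "only if", "unless",
                   "undefined behavior", "non-zero", "null", "invalid")


def requirement_score(sentence):
    return 1.0 if MODAL_RE.search(sentence) else 0.0


def constraint_score(sentence):
    hits = sum(cue in sentence.lower() for cue in CONSTRAINT_CUES)
    return min(1.0, hits / 2)  # two or more cues count as full confidence


def is_rule(sentence, threshold=0.5, require_agreement=False):
    a, b = requirement_score(sentence), constraint_score(sentence)
    if require_agreement and (a == 0 or b == 0):
        return False  # stricter mode: both evaluations must fire
    return (a + b) / 2 >= threshold
```

With the defaults, a sentence such as "Callers should consider caching the result." passes on the modal keyword alone. Requiring agreement filters it out, which is the over-extraction trade-off the table describes.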
Induce sorts from extracted rules. Identify the types of programming entities referenced across all rules. These become the sorts of the grammar -- e.g., Pointer, Allocator, Token, Address, Function. Process rules incrementally; only introduce a new sort when no existing sort covers the entity.
Induce predicates from extracted rules. For each rule, identify the property or constraint it asserts and define a predicate with typed parameters -- e.g., AllocatedByAllocator(Pointer, Allocator) -> Bool. Reuse existing predicates when semantically equivalent rules appear. The set of sorts + predicates forms the domain-specific extension to the DSL template.
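The incremental registry for sorts and predicates can be sketched as below. It assumes a predicate is identified by its name plus its parameter sorts. In Doc2Spec the LLM decides semantic equivalence. Here that decision is a pluggable `same_meaning` callback, and its default only compares names case-insensitively. `Predicate` and `InducedGrammar` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Predicate:
    name: str
    params: tuple  # sort names, e.g. ("Pointer", "Allocator")

    def signature(self):
        return f"{self.name}({', '.join(self.params)}) -> Bool"


def same_name(a, b):
    # Placeholder judgment; in practice an LLM (or a reviewer) decides equivalence here.
    return a.name.lower() == b.name.lower()


@dataclass
class InducedGrammar:
    sorts: list = field(default_factory=list)
    predicates: dict = field(default_factory=dict)  # name -> Predicate

    def add_sort(self, sort):
        if sort not in self.sorts:  # only introduce a sort no existing one covers
            self.sorts.append(sort)

    def add_predicate(self, name, params, same_meaning=same_name):
        for sort in params:
            self.add_sort(sort)
        candidate = Predicate(name, tuple(params))
        for existing in self.predicates.values():
            # Reuse an existing predicate with the same signature and meaning.
            if existing.params == candidate.params and same_meaning(existing, candidate):
                return existing
        self.predicates[name] = candidate
        return candidate
```

Because `add_predicate` returns the predicate that was actually kept, callers can always emit the canonical name. This is what prevents the naming drift described earlier.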
Assemble the full EBNF grammar. Combine the universal DSL template with the induced sorts and predicates to produce a complete grammar. Validate that the grammar itself is well-formed by loading it into a parser generator (Lark, ANTLR) or, failing that, by checking the EBNF by hand.
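Assembly can be sketched as string generation from the template plus the induced catalog. A real check would load the result into Lark or ANTLR. To stay dependency-free, this sketch only verifies that every referenced nonterminal is defined. It also assumes every sort instance is written as an `IDENTIFIER`. `assemble_grammar` and `undefined_nonterminals` are illustrative names.

```python
import re

# The universal template, copied from the grammar above.
TEMPLATE = [
    '<rule> ::= <check> ["if" <expr>] ";"',
    '<check> ::= "check(" <predicate> "," BOOLEAN ")"',
    "<expr> ::= <unary> | <binary>",
    '<unary> ::= "not" <value>',
    '<binary> ::= <value> ("and" | "or") <value>',
    "<value> ::= BOOLEAN | <predicate> | <expr>",
]


def assemble_grammar(sorts, predicates):
    """predicates maps a predicate name to its parameter sorts, e.g. {"IsNull": ["Key"]}."""
    lines = list(TEMPLATE)
    lines.append("<predicate> ::= " + " | ".join(f"<{name}>" for name in predicates))
    for name, params in predicates.items():
        args = ' "," '.join(f"<{sort}>" for sort in params)
        lines.append(f'<{name}> ::= "{name}(" {args} ")"')
    for sort in sorts:
        # Simplifying assumption: an instance of any sort is written as an identifier.
        lines.append(f"<{sort}> ::= IDENTIFIER")
    return "\n".join(lines)


def undefined_nonterminals(grammar):
    """Cheap well-formedness check: every <name> used must also be defined."""
    defined = set(re.findall(r"^<(\w+)> ::=", grammar, re.MULTILINE))
    used = set(re.findall(r"<(\w+)>", grammar))
    return sorted(used - defined)
```

A non-empty result from `undefined_nonterminals` typically means a predicate was induced over a sort that never made it into the sort list.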
Generate formal specifications for each rule. Translate each natural-language rule into a formal spec that conforms to the assembled grammar. Use the check(predicate, BOOLEAN) [if expr] structure. Validate each spec by parsing it against the grammar.
Handle validation failures. If a generated spec fails to parse against the grammar, feed the error message and original rule back for retry (up to 3 attempts). If retries fail, flag the rule for manual review and continue with remaining rules.
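The generate, validate, and retry loop from the two steps above can be sketched with the LLM call and the parser passed in as functions. `generate` and `validate` are placeholders for whatever model call and grammar parser you use; they are not a Doc2Spec API. Rules that still fail after three attempts go into a review list, and processing continues.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_ATTEMPTS = 3  # the step above caps retries at 3 attempts


def formalize_rule(rule: str, grammar: str,
                   generate: Callable[[str, str, Optional[str]], str],
                   validate: Callable[[str], Optional[str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (spec, None) on success, or (None, last_error) after MAX_ATTEMPTS failures.

    generate(rule, grammar, previous_error) proposes a spec (previous_error is None on the
    first attempt); validate(spec) returns None if the spec parses, else an error message.
    """
    error = None
    for _ in range(MAX_ATTEMPTS):
        spec = generate(rule, grammar, error)
        error = validate(spec)
        if error is None:
            return spec, None
    return None, error


def formalize_all(rules: List[str], grammar: str, generate, validate) -> Tuple[Dict[str, str], Dict[str, str]]:
    specs, needs_review = {}, {}
    for rule in rules:
        spec, error = formalize_rule(rule, grammar, generate, validate)
        if spec is None:
            needs_review[rule] = error  # flag for manual review and keep going
        else:
            specs[rule] = spec
    return specs, needs_review
```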
Output the specification bundle. Produce: (a) the induced grammar in EBNF, (b) the list of sorts with descriptions, (c) the predicate catalog with typed signatures, (d) the formal specs mapped to their source NL rules, and (e) any rules that could not be formalized.
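One possible shape for the bundle, serialized as JSON, is sketched below. `SpecBundle` and its field names are assumptions chosen to mirror items (a) to (e).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SpecBundle:
    grammar_ebnf: str                                        # (a) induced grammar
    sorts: Dict[str, str]                                    # (b) sort -> description
    predicates: Dict[str, str]                               # (c) predicate -> typed signature
    specs: List[dict] = field(default_factory=list)          # (d) {"source_rule", "spec"}
    unformalized: List[dict] = field(default_factory=list)   # (e) {"source_rule", "error"}

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```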
Optionally generate executable checks. Convert the formal specs into runtime assertions, property-based tests, or static analysis rules in the target language (Solidity require() statements, Rust assert!() macros, Java assert or JML annotations).
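Whatever the target language, a natural runtime reading of a spec is an implication: `check(P, b) if C` holds when C is false or when P evaluates to b. This reading is an assumption of the sketch below, which is written in language-neutral Python. Predicates are supplied as plain functions over a hypothetical environment dict. In Solidity the same implication would become a `require(...)`, in Rust an `assert!(...)`, and in Java an `assert` or JML clause. `Spec` and `assert_specs` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Env = dict  # names visible at the check site (arguments, state); an assumption of this sketch


@dataclass
class Spec:
    """check(<predicate>, <expected>) [if <condition>]; with predicates given as functions."""
    name: str
    predicate: Callable[[Env], bool]
    expected: bool = True
    condition: Optional[Callable[[Env], bool]] = None

    def holds(self, env: Env) -> bool:
        # Read "check(P, b) if C" as the implication C -> (P == b).
        if self.condition is not None and not self.condition(env):
            return True
        return self.predicate(env) == self.expected


def assert_specs(specs, env):
    """Raise AssertionError listing every violated spec (explicit raise, so python -O keeps it)."""
    violated = [spec.name for spec in specs if not spec.holds(env)]
    if violated:
        raise AssertionError(f"violated specs: {violated}")
```

For example, the two dealloc preconditions from the Rust example can be modeled as `Spec` objects over a dict of live blocks. Calling `assert_specs` before each deallocation then turns them into a runtime guard.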
Example 1: ERC-20 Token Standard
User: "Formalize the rules in the ERC-20 specification for the transferFrom function."
Approach:
- Entities: transferFrom(address _from, address _to, uint256 _value), allowance, Transfer event, balanceOf
- NL rule: "The function SHOULD throw unless the _from account has deliberately authorized the sender of the message via some mechanism"
- Sorts: Function, Address, Amount, Event, Condition
- Predicates:
  - IsAuthorized(Address, Address) -> Bool: sender is authorized by owner
  - HasSufficientBalance(Address, Amount) -> Bool: address holds enough tokens
  - EmitsEvent(Function, Event) -> Bool: function fires event
  - ThrowsOnCondition(Function, Condition) -> Bool: function reverts when the condition is met
  - IsValidTransfer(Address, Address, Amount) -> Bool: the transfer's preconditions hold

Output:
# Induced Grammar Extension
Sorts: Function, Address, Amount, Event, Condition
Predicates:
IsAuthorized(Address, Address) -> Bool
HasSufficientBalance(Address, Amount) -> Bool
EmitsEvent(Function, Event) -> Bool
ThrowsOnCondition(Function, Condition) -> Bool
IsValidTransfer(Address, Address, Amount) -> Bool
# Formal Specifications
Rule 1: check(ThrowsOnCondition(transferFrom, not IsAuthorized(_from, msg.sender)), true);
Rule 2: check(EmitsEvent(transferFrom, Transfer), true) if check(IsValidTransfer(_from, _to, _value), true);
Rule 3: check(ThrowsOnCondition(transferFrom, not HasSufficientBalance(_from, _value)), true);
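The three specs can be exercised at runtime with a hypothetical in-memory token model and a checker, sketched in Python below. The checker makes three readings, all of which are assumptions:

- IsAuthorized(_from, sender) means "the allowance covers _value".
- HasSufficientBalance applies to _from, the account being debited.
- IsValidTransfer means "authorized and funded".

`ToyToken` and `violated_rules` are illustrative names, not part of ERC-20.

```python
class ToyToken:
    """Hypothetical in-memory ERC-20 model, used only to exercise the three specs."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.allowances = {}  # (owner, spender) -> remaining allowance
        self.events = []

    def approve(self, owner, spender, value):
        self.allowances[(owner, spender)] = value

    def transfer_from(self, sender, _from, _to, _value):
        if self.allowances.get((_from, sender), 0) < _value:
            raise PermissionError("sender not authorized by _from")
        if self.balances.get(_from, 0) < _value:
            raise ValueError("insufficient balance")
        self.allowances[(_from, sender)] -= _value
        self.balances[_from] -= _value
        self.balances[_to] = self.balances.get(_to, 0) + _value
        self.events.append(("Transfer", _from, _to, _value))


def violated_rules(token, sender, _from, _to, _value):
    """Call transfer_from once and report which of Rules 1-3 it broke."""
    authorized = token.allowances.get((_from, sender), 0) >= _value  # IsAuthorized(_from, sender)
    sufficient = token.balances.get(_from, 0) >= _value              # HasSufficientBalance(_from, _value)
    n_events = len(token.events)
    try:
        token.transfer_from(sender, _from, _to, _value)
        threw = False
    except (PermissionError, ValueError):
        threw = True
    emitted = ("Transfer", _from, _to, _value) in token.events[n_events:]
    broken = []
    if not authorized and not threw:
        broken.append("Rule 1")
    if authorized and sufficient and not emitted:  # IsValidTransfer read as authorized and funded
        broken.append("Rule 2")
    if not sufficient and not threw:
        broken.append("Rule 3")
    return broken
```

An implementation that skips its checks or its event shows up as a non-empty list of broken rules. This makes the checker usable as a small property-based test oracle.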
Example 2: Rust Memory Allocator Safety Rules
User: "Extract formal specs from the Rust GlobalAlloc documentation."
Approach:
- Entities: alloc, dealloc, realloc, Layout, pointer types
- NL rules:
  - "ptr parameter must denote a block of memory currently allocated via this allocator" (dealloc)
  - "layout parameter must match the Layout used to allocate the block" (dealloc)
  - "layout has non-zero size" (alloc)
- Sorts: Pointer, Layout, Allocator, Function
- Predicates:
  - AllocatedBy(Pointer, Allocator) -> Bool
  - LayoutMatches(Layout, Layout) -> Bool
  - NonZeroSize(Layout) -> Bool

Output:
# Formal Specifications for dealloc(ptr, layout)
Precondition 1: check(AllocatedBy(ptr, self), true);
Precondition 2: check(LayoutMatches(layout, original_layout), true);
# Formal Specifications for alloc(layout)
Precondition 1: check(NonZeroSize(layout), true);
Example 3: Java NavigableMap Contract
User: "Formalize the rules for NavigableMap.subMap from the Javadoc."
Approach:
- Entities: subMap(K fromKey, boolean fromInclusive, K toKey, boolean toInclusive), NavigableMap<K,V>
- NL rules:
  - "IllegalArgumentException if fromKey is greater than toKey"
  - "NullPointerException if fromKey or toKey is null and this map does not permit null keys"
  - "Returns a view of the portion of this map whose keys range from fromKey to toKey"
- Sorts: Key, Map, Function, Exception
- Predicates:
  - KeyOrder(Key, Key) -> Bool: first key <= second key
  - PermitsNullKeys(Map) -> Bool
  - IsNull(Key) -> Bool
  - ThrowsException(Function, Exception) -> Bool

Output:
# Formal Specifications for subMap
Rule 1: check(ThrowsException(subMap, IllegalArgumentException), true)
if not check(KeyOrder(fromKey, toKey), true);
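Rule 2: check(ThrowsException(subMap, NullPointerException), true)
if check(IsNull(fromKey), true) and not check(PermitsNullKeys(this), true);
Rule 3: check(ThrowsException(subMap, NullPointerException), true)
if check(IsNull(toKey), true) and not check(PermitsNullKeys(this), true);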
Define each sort as a type of programming entity (e.g., Address, Pointer), not an individual instance. Keep the sort count manageable (typically 3-8 per domain).

| Problem | Symptom | Resolution |
|---------|---------|------------|
| Ambiguous NL rule | Rule sentence contains hedging language ("may", "could", "typically") | Mark as low-confidence; extract only MUST/SHOULD/SHALL rules by default, flag others for user review |
| Predicate drift | Same property named differently across rules (e.g., HasBalance vs SufficientFunds) | During grammar induction, explicitly check each new predicate against existing ones for semantic overlap before adding |
| Spec fails grammar validation | Generated spec uses a predicate or sort not in the grammar | Retry with the error message and grammar definition provided as context; if 3 retries fail, flag for manual review |
| Entity extraction misses entities | Documentation uses non-standard formatting | Fall back to LLM-based entity extraction without regex, or ask the user to provide an entity list |
| Over-extraction of rules | Descriptive text misclassified as rules | Tighten the dual-evaluation threshold; require both evaluations to agree before classifying as rule |
AllocatedBy(ptr, self)) requires separate work specific to the target language and verification framework.

Paper: Doc2Spec: Synthesizing Formal Programming Specifications from Natural Language via Grammar Induction (Xia et al., 2026). Look for: the EBNF DSL template (Figure 5), the five-agent pipeline architecture (Section 3), the incremental sort/predicate induction algorithm (Section 3.4), and the evaluation across ERC standards, Rust allocators, and Java APIs (Section 4).