22-agent-native-research-artifact/compiler/SKILL.md
Compiles any research input — PDF papers, GitHub repositories, experiment logs, code directories, or raw notes — into a complete Agent-Native Research Artifact (ARA) with cognitive layer (claims, concepts, heuristics), physical layer (configs, code stubs), exploration graph, and grounded evidence. Use when ingesting a paper or codebase into a structured, machine-executable knowledge package, building an ARA from scratch, or converting research outputs into a falsifiable, agent-traversable form.
npx skillsauth add Orchestra-Research/AI-Research-SKILLs ara-compilerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the ARA Universal Compiler. Your job: take ANY research input and produce a complete, validated ARA artifact. You operate as a first-class Claude Code agent — use your native tools (Read, Write, Edit, Bash, Glob, Grep) directly. No API wrapper needed.
The compiler is open-ended. It accepts anything that contains research knowledge — there is no fixed input schema. Your job is to figure out what you've been given and extract maximum structured knowledge from it.
Possible inputs include (but are NOT limited to):
.py, .ipynb, .rs, .cpp, etc.)When arguments are provided ($ARGUMENTS), interpret them flexibly:
--output <dir> → where to write the ARA (default: ./ara-output/)--rubric <path> → PaperBench rubric for coverage mappingAdapt to whatever you receive:
1. READ all inputs
2. REASON through the 4-stage epistemic protocol (see below)
3. GENERATE all ARA files using Write tool
4. COVERAGE CHECK loop (max 3 rounds): re-read source → diff against ARA → patch gaps
5. VALIDATE by running Seal Level 1
6. FIX any failures, re-validate
7. REPORT summary to user
Read ALL provided inputs thoroughly before generating anything. For PDFs, read every page, including appendices — appendices often carry reproduction-critical content and should be treated with the same priority as main-text pages.
For repos, prioritize: README → core algorithm files → configs → environment files.
Before writing any files, reason through these 4 stages. Think carefully about each stage.
Stage 1 — Semantic Deconstruction Strip narrative framing. Extract the raw knowledge atoms:
Before moving on, perform an evidence capture pass:
Table 2, Figure 4, etc.)Table N unless it reproduces the original source table faithfullyStage 2 — Cognitive Mapping
Map extracted atoms to /logic/:
Appendix content (worked examples, prompt templates, enumerated taxonomies, annotation schemas, extended analyses, prescriptive content) should be routed into the ARA layers where it fits best, preserving the granularity the source uses. Never silently drop an appendix section.
When writing claims:
Statement at the strongest level directly supported by the cited evidenceEvidence basisInterpretationrelated_work.md should reflect the paper's full citation footprint, not only the
closest predecessors. Works with a specific technical delta get full RW blocks; remaining
citations from the paper's References list should still be captured (more briefly) so the
intellectual neighborhood is preserved.
Stage 3 — Physical Stubbing
Generate /src/:
rubric/requirements.md mapping every leaf nodeStage 4 — Exploration Graph Extraction
Reconstruct the research DAG for /trace/exploration_tree.yaml:
also_depends_on for DAG convergence pointsexplicit from source material or inferred from reconstructionWrite ALL mandatory files. See references/ara-schema.md for the complete directory structure and field-level requirements for every file.
Mandatory files (all must exist and be non-trivial):
PAPER.md — YAML frontmatter (title, authors, year, venue, doi, ara_version, domain, keywords, claims_summary, abstract) + Layer Indexlogic/problem.md — Observations (O1, O2...), Gaps (G1, G2...), Key Insight, Assumptionslogic/claims.md — Claims (C01, C02...) each with Statement, Status, Falsification criteria, Proof, Evidence basis, Interpretation, Dependencies, Tagslogic/concepts.md — ≥5 concepts each with Notation, Definition, Boundary conditions, Related conceptslogic/experiments.md — ≥3 experiments (E01, E02...) each with Verifies, Setup, Procedure, Metrics, Expected outcome (directional only!), Baselines, Dependencieslogic/solution/architecture.md — Component graph with inputs/outputslogic/solution/algorithm.md — Math formulation + pseudocode + complexitylogic/solution/constraints.md — Boundary conditions and limitationslogic/solution/heuristics.md — Heuristics (H01, H02...) each with Rationale, Sensitivity, Bounds, Code ref, Sourcelogic/related_work.md — Related work (RW01, RW02...) each with DOI, Type, Delta, Claims affectedsrc/configs/training.md — Hyperparameters with Value, Rationale, Search range, Sensitivity, Sourcesrc/configs/model.md — Model/architecture configssrc/execution/{module}.py — ≥1 code stub with typed signaturessrc/environment.md — Python version, framework, hardware, dependencies, seedstrace/exploration_tree.yaml — Research DAG (≥8 nodes, nested YAML)evidence/README.md — Index table mapping every evidence file to claimsevidence/tables/*.md — ALL result tables (exact cell values, never rounded)evidence/figures/*.md — ALL quantitative figures (extracted data points)Evidence-generation rules:
table3_...) must match that source object's caption and contentsderived_, subset_, or equivalent, and the file must state what it was derived fromBefore running Seal validation, verify that the ARA faithfully covers the source material. Repeat up to 3 rounds; stop early if a round produces no patches.
Each round: re-read the source, identify anything not yet captured or only shallowly captured in the ARA, patch those gaps, then note how many fixes were made. If zero, exit early. Pay particular attention to appendix content and to citations from the paper's References list, which are easy to miss on the first pass.
The coverage loop does not replace validation — it ensures the ARA is semantically complete before structural checks run.
Run ARA Seal Level 1 validation. Perform these checks:
logic/, logic/solution/, src/, src/configs/, trace/, evidence/support_level: explicit|inferredsupport_level: explicit include source referencesFor each validation failure:
Typically converges in 2-3 rounds.
Print a summary:
experiments.md contains only directional/relative expected outcomes. Exact numbers go in evidence/Table N or Figure N unless it faithfully reproduces the original source objectFor detailed schema specifications, load these on demand:
development
Performs ARA Seal Level 2 semantic epistemic review on Agent-Native Research Artifacts, scoring six dimensions (evidence relevance, falsifiability, scope calibration, argument coherence, exploration integrity, methodological rigor) and producing a constructive, severity-ranked report with a Strong Accept-to-Reject recommendation. Use after Level 1 structural validation passes, when an ARA needs an objective epistemic critique before publication or release.
testing
Records research provenance as a post-task epilogue, scanning conversation history at the end of a coding or research session to extract decisions, experiments, dead ends, claims, heuristics, and pivots, and writing them into the ara/ directory with user-vs-AI provenance tags. Use as a session epilogue — never during execution — to maintain a faithful, auditable trace of how a research project actually evolved.
testing
Comprehensive guide for writing systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides paragraph-level structural blueprints, writing patterns, venue-specific checklists, reviewer guidelines, LaTeX templates, and conference deadlines. Use this skill for all systems conference paper writing.
development
Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing agent prompts and skills against benchmarks, or implementing automated agent evaluation loops.