22-agent-native-research-artifact/research-manager/SKILL.md
Records research provenance as a post-task epilogue, scanning conversation history at the end of a coding or research session to extract decisions, experiments, dead ends, claims, heuristics, and pivots, and writing them into the ara/ directory with user-vs-AI provenance tags. Use as a session epilogue — never during execution — to maintain a faithful, auditable trace of how a research project actually evolved.
npx skillsauth add Orchestra-Research/AI-Research-SKILLs ara-research-manager

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are the Live PM — a post-task research recorder. You run ONLY at the END of a coding
session, after the user's request has been fully addressed. You review what happened in
the conversation, then update the ara/ artifact accordingly.
- Do not read or write ara/ while working on the user's request.
- The ara/ directory should not be loaded into context until the epilogue phase.

When invoked (after the task is done), read the ara/ files to get current IDs, existing
claims, and current tree state. If ara/ does not exist, create it (see Initialization below).

Scan the conversation for these event types:

| Event Type | Signals | Routes To |
|------------|--------|-----------|
| Decision | User chose between alternatives | trace/exploration_tree.yaml |
| Experiment | Test ran, benchmark completed, quantitative result | trace/exploration_tree.yaml + evidence/ |
| Dead End | Approach abandoned, "doesn't work", reverted | trace/exploration_tree.yaml |
| Pivot | Major direction change based on evidence | trace/exploration_tree.yaml |
| Claim | Assertion about the system, hypothesis stated | logic/claims.md |
| Heuristic | Implementation trick, workaround, "the trick is" | logic/solution/heuristics.md |
| AI Action | Agent wrote code, ran command, created file | Session record only |
| Observation | Interesting but unclassified | staging/observations.yaml |
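
The routing in the table above can be sketched as a lookup. This is a minimal illustration, not the skill's own implementation: the names `ROUTES` and `route_event`, the snake_case event keys, and the choice to send unknown event types to staging are all assumptions.

```python
# Hypothetical routing table mirroring the event-type table above.
# Paths are relative to ara/.
ROUTES = {
    "decision": ["trace/exploration_tree.yaml"],
    "experiment": ["trace/exploration_tree.yaml", "evidence/"],
    "dead_end": ["trace/exploration_tree.yaml"],
    "pivot": ["trace/exploration_tree.yaml"],
    "claim": ["logic/claims.md"],
    "heuristic": ["logic/solution/heuristics.md"],
    "ai_action": [],  # recorded in the session record only
    "observation": ["staging/observations.yaml"],
}


def route_event(event_type: str) -> list[str]:
    """Return the ara/-relative targets for an event type.

    Assumption: anything not in the table is treated as an
    unclassified observation and goes to staging.
    """
    return ROUTES.get(event_type, ROUTES["observation"])
```

An experiment therefore lands in two places (the tree and evidence/), while an AI action produces no file target of its own.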
SKIP (not worth recording):
Every entry must carry a provenance marker:
| Tag | When | Example |
|-----|------|---------|
| user | User explicitly stated or confirmed | "Let's use GQA" |
| ai-suggested | AI inferred; user did NOT confirm | AI notices a pattern |
| ai-executed | AI performed the action | AI wrote scheduler.py |
| user-revised | AI suggested, user corrected | "No, threshold is 90%" |
Default to ai-suggested when uncertain. Never mark inferences as user.
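
One way to keep the default rule honest is to make ai-suggested the fall-through case. The sketch below is illustrative only; the function name and its three boolean flags are hypothetical, and deciding what counts as "stated" or "corrected" is left to whoever reviews the conversation.

```python
def provenance_tag(
    *,
    ai_performed: bool = False,
    user_stated: bool = False,
    user_corrected_ai: bool = False,
) -> str:
    """Pick a provenance tag, falling back to ai-suggested when unsure."""
    if ai_performed:
        return "ai-executed"   # the AI carried out the action itself
    if user_corrected_ai:
        return "user-revised"  # AI suggested, user corrected
    if user_stated:
        return "user"          # user explicitly stated or confirmed
    return "ai-suggested"      # default: an inference is never marked as user
```

Because every flag defaults to False, calling `provenance_tag()` with no evidence yields ai-suggested, which matches the rule above.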
ara/
  PAPER.md                      # Root manifest + layer index
  logic/                        # What & Why
    problem.md                  # Problem definition + gaps
    claims.md                   # Falsifiable assertions + proof refs
    concepts.md                 # Term definitions
    experiments.md              # Experiment plans (declarative)
    solution/
      architecture.md           # System design
      algorithm.md              # Math + pseudocode
      constraints.md            # Boundary conditions
      heuristics.md             # Tricks + rationale + sensitivity
    related_work.md             # Typed dependency graph
  src/                          # How (code artifacts)
    configs/
    kernel/
    environment.md
  trace/                        # Journey
    exploration_tree.yaml       # Research DAG
    sessions/
      session_index.yaml        # Master session index
      YYYY-MM-DD_NNN.yaml       # Individual session records
  evidence/                     # Raw Proof
    README.md
    tables/
    figures/
  staging/                      # Unclassified observations
    observations.yaml

The tree is a nested YAML structure where parent-child relationships are expressed
via the children: key. This forms a research DAG showing how decisions led to
experiments, which led to further decisions or dead ends — capturing how researchers
navigate the search space.

- Root nodes are the top-level entries of the tree: list.
- A node may carry a children: key containing nested child nodes (indented).
- Use also_depends_on: [N{XX}] for cross-edges when a node depends on multiple parents.
- Leaf nodes have no children: key.

When adding a new node, determine which existing node it logically follows from
(its parent) and nest it under that node's children:. If it is a new top-level
research thread, add it as a root node.

tree:
  - id: N01
    type: question
    title: "{root research question}"
    provenance: user
    timestamp: "YYYY-MM-DDTHH:MM"
    description: >
      {what is being explored}
    children:
      - id: N02
        type: experiment
        title: "{what was tested}"
        provenance: ai-executed
        timestamp: "YYYY-MM-DDTHH:MM"
        result: >
          {what happened; include numbers}
        evidence: [C{XX}, "{figure/table refs}"]
        children:
          - id: N03
            type: decision
            title: "{choice made based on N02 results}"
            provenance: user
            timestamp: "YYYY-MM-DDTHH:MM"
            choice: >
              {what was chosen and why}
            alternatives:
              - "{option not chosen}"
            evidence: >
              {what motivated this; reference parent nodes}
            children:
              - id: N04
                type: dead_end
                title: "{approach that failed}"
                provenance: user
                timestamp: "YYYY-MM-DDTHH:MM"
                hypothesis: >
                  {what was expected to work}
                failure_mode: >
                  {why it failed}
                lesson: >
                  {what was learned}
              - id: N05
                type: experiment
                title: "{alternative that worked}"
                also_depends_on: [N02]  # cross-edge: also informed by N02
                provenance: ai-executed
                timestamp: "YYYY-MM-DDTHH:MM"
                result: >
                  {outcome}
                evidence: [C{XX}]
      - id: N06
        type: dead_end
        title: "{sibling approach tried from N01}"
        provenance: user
        timestamp: "YYYY-MM-DDTHH:MM"
        hypothesis: >
          {what was expected}
        failure_mode: >
          {why it failed}
        lesson: >
          {what was learned; motivated N02's direction}
  - id: N07
    type: pivot
    title: "{new top-level research thread}"
    provenance: user
    timestamp: "YYYY-MM-DDTHH:MM"
    from: "{previous direction}"
    to: "{new direction}"
    trigger: "{what caused the change}"

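
Nesting a new node under its parent, as described above, might look like the following once the tree has been parsed into plain Python lists and dicts. How the YAML is loaded and written back is deliberately left out; `find_node` and `add_node` are hypothetical helper names, not part of the skill.

```python
from typing import Optional


def find_node(nodes: list[dict], node_id: str) -> Optional[dict]:
    """Depth-first search through the nested children lists."""
    for node in nodes:
        if node["id"] == node_id:
            return node
        found = find_node(node.get("children", []), node_id)
        if found is not None:
            return found
    return None


def add_node(tree: list[dict], new_node: dict, parent_id: Optional[str] = None) -> None:
    """Nest new_node under parent_id, or append it as a root node."""
    if parent_id is None:
        tree.append(new_node)  # new top-level research thread
        return
    parent = find_node(tree, parent_id)
    if parent is None:
        raise KeyError(f"unknown parent node: {parent_id}")
    # Leaf nodes have no children key yet, so create it on demand.
    parent.setdefault("children", []).append(new_node)


# Usage: N03 follows from N02; N04 starts a new root thread.
tree = [{"id": "N01", "type": "question",
         "children": [{"id": "N02", "type": "experiment"}]}]
add_node(tree, {"id": "N03", "type": "decision"}, parent_id="N02")
add_node(tree, {"id": "N04", "type": "pivot"})
```

Cross-edges stay a plain field on the node (also_depends_on), so this sketch never has to place a node under more than one parent.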
| Type | Required Fields | When to Use |
|------|----------------|-------------|
| question | description | Root research question or sub-question |
| decision | choice, alternatives, evidence | User chose between options |
| experiment | result, evidence | Test/benchmark produced a result |
| dead_end | hypothesis, failure_mode, lesson | Approach abandoned |
| pivot | from, to, trigger | Major direction change |
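
The required-field table lends itself to a small check before a node is written. This is a sketch under the assumption that nodes are plain dicts; `REQUIRED_FIELDS` and `missing_fields` are illustrative names.

```python
# Required fields per node type, copied from the table above.
REQUIRED_FIELDS = {
    "question": ["description"],
    "decision": ["choice", "alternatives", "evidence"],
    "experiment": ["result", "evidence"],
    "dead_end": ["hypothesis", "failure_mode", "lesson"],
    "pivot": ["from", "to", "trigger"],
}


def missing_fields(node: dict) -> list[str]:
    """Return required fields absent from the node; raise on an unknown type."""
    node_type = node.get("type")
    if node_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown node type: {node_type!r}")
    return [field for field in REQUIRED_FIELDS[node_type] if field not in node]
```

For example, a pivot node that has from and to but no trigger would be reported as missing "trigger".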

Claim entry (logic/claims.md):

## C{XX}: {title}
- **Statement**: {falsifiable assertion}
- **Status**: hypothesis | untested | testing | supported | weakened | refuted | revised
- **Provenance**: user | ai-suggested | user-revised
- **Falsification criteria**: {what would disprove this}
- **Proof**: [{evidence refs or "pending"}]
- **Dependencies**: [C{YY}, ...]
- **Tags**: {comma-separated}

Heuristic entry (logic/solution/heuristics.md):

## H{XX}: {title}
- **Rationale**: {why this works}
- **Provenance**: user | ai-suggested | user-revised
- **Sensitivity**: low | medium | high
- **Code ref**: [{file paths}]
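
New claim and heuristic entries need an unused C{XX} or H{XX} number. A possible way to pick one is to scan the existing file text for the highest number in use. The helper name `next_id` is hypothetical, and the two-digit zero padding is an assumption based on the N01-style IDs in the tree example.

```python
import re


def next_id(prefix: str, text: str, width: int = 2) -> str:
    """Return the next free ID (e.g. C04) given the current file contents."""
    pattern = rf"\b{re.escape(prefix)}(\d+)\b"
    numbers = [int(match) for match in re.findall(pattern, text)]
    # With no existing IDs, numbering starts at 1.
    return f"{prefix}{max(numbers, default=0) + 1:0{width}d}"
```

Taking the maximum rather than the count means a gap left by a removed entry is not reused, so old references never point at a different entry.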

Observation entry (staging/observations.yaml):

- id: O{XX}
  timestamp: "YYYY-MM-DDTHH:MM"
  provenance: user | ai-suggested | ai-executed
  content: "{raw observation}"
  context: "{what was happening}"
  potential_type: claim | heuristic | decision | unknown
  promoted: false

Session record (trace/sessions/YYYY-MM-DD_NNN.yaml):

session:
  id: "YYYY-MM-DD_NNN"
  timestamp: "YYYY-MM-DDTHH:MM"
  summary: "{one-line summary of what happened}"
events_logged:
  - type: decision | experiment | dead_end | pivot | claim | heuristic | observation
    id: "{N/C/H/O}{XX}"
    provenance: user | ai-suggested | ai-executed | user-revised
    summary: "{what}"
ai_actions:
  - action: "{what AI did}"
    provenance: ai-executed
    files_changed: ["{paths}"]
claims_touched:
  - id: C{XX}
    action: created | advanced | weakened | confirmed
    provenance: user | ai-suggested
open_threads:
  - "{what needs follow-up}"
ai_suggestions_pending:
  - "{unconfirmed AI suggestions from this session}"

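
The session ID format YYYY-MM-DD_NNN suggests a date plus a counter. The sketch below assumes NNN is a per-day sequence starting at 001, which the text does not state outright; `next_session_id` is a hypothetical helper and expects well-formed existing names.

```python
from datetime import date


def next_session_id(existing: list[str], today: date) -> str:
    """Return YYYY-MM-DD_NNN, one past the highest NNN already used today.

    `existing` may hold IDs or file names such as 2025-01-15_002.yaml.
    """
    prefix = today.isoformat() + "_"
    used = [
        int(name[len(prefix):len(prefix) + 3])
        for name in existing
        if name.startswith(prefix)
    ]
    return f"{prefix}{max(used, default=0) + 1:03d}"
```

Sessions from other days are ignored, so the counter restarts at 001 on each new date.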
Create the full directory structure and seed files automatically. Do not ask.
mkdir -p ara/{logic/solution,src/{configs,kernel},trace/sessions,evidence/{tables,figures},staging}
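
The mkdir command above relies on brace expansion, which is a Bash feature rather than plain POSIX sh. A shell-independent sketch of the same initialization, including the seed files listed in this section, could look like this. `init_ara`, `DIRS`, and `SEEDS` are illustrative names; skipping files that already exist is an added safety assumption, and PAPER.md is left out because its content has to be inferred from project context.

```python
from pathlib import Path

# Directories from the mkdir command, relative to ara/.
DIRS = [
    "logic/solution", "src/configs", "src/kernel", "trace/sessions",
    "evidence/tables", "evidence/figures", "staging",
]

# Seed files and their initial contents, relative to ara/.
SEEDS = {
    "trace/sessions/session_index.yaml": "sessions: []\n",
    "trace/exploration_tree.yaml": "tree: []\n",
    "staging/observations.yaml": "observations: []\n",
    "logic/claims.md": "# Claims\n",
    "logic/problem.md": "# Problem\n",
    "logic/solution/heuristics.md": "# Heuristics\n",
    "evidence/README.md": "# Evidence Index\n",
}


def init_ara(root: Path) -> Path:
    """Create ara/ under root with its directories and seed files."""
    ara = root / "ara"
    for rel_dir in DIRS:
        (ara / rel_dir).mkdir(parents=True, exist_ok=True)
    for rel_file, content in SEEDS.items():
        path = ara / rel_file
        if not path.exists():  # assumption: never overwrite an existing record
            path.write_text(content, encoding="utf-8")
    return ara
```

Running it twice is harmless: directories are created with exist_ok=True and existing seed files are left untouched.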
Then write:
- ara/PAPER.md, the root manifest (infer title, authors, venue from project context)
- ara/trace/sessions/session_index.yaml, containing sessions: []
- ara/trace/exploration_tree.yaml, containing tree: []
- ara/staging/observations.yaml, containing observations: []
- ara/logic/claims.md, containing # Claims
- ara/logic/problem.md, containing # Problem
- ara/logic/solution/heuristics.md, containing # Heuristics
- ara/evidence/README.md, containing # Evidence Index

While reviewing staging/observations.yaml:

- […] ai-suggested)
- […] evidence/
- […] <!-- CONFLICT: contradicts C{XX} -->
- […] stale: true

- Read the ara/ files to get current state (IDs, claims, tree).
- Write the session record to ara/trace/sessions/YYYY-MM-DD_NNN.yaml.
- Update ara/trace/sessions/session_index.yaml.
- ai-suggested stays until the user explicitly confirms.

For detailed protocol and taxonomy specifications, load on demand: