src/autoskillit/skills_extended/generate-report/SKILL.md
Synthesize experiment results into a structured research report in the research/ folder. Supports --inconclusive flag.
Synthesize scope findings, experiment design, and experiment results into a
structured research report. The report is committed to the research/ directory
in the worktree and becomes the primary deliverable of the research recipe.
This skill handles both conclusive and inconclusive outcomes — inconclusive results are valid findings, not failures.
- Research recipe (phase 2)
- After `/autoskillit:run-experiment` has produced results (or after retry exhaustion)

/autoskillit:generate-report {worktree_path} {results_path} [--inconclusive]
[--output-mode {local|pr}] [--issue-url {url}]
[--experiment-type {type}] [--methodology-traditions {tradition}]
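As an illustration of how the synopsis above maps to values, here is a minimal sketch using Python's standard `argparse`. It assumes whitespace-separated tokens and treats every optional flag as a plain string. `parse_skill_args` is a hypothetical helper, not the skill's own argument handling.

```python
from __future__ import annotations

import argparse


def parse_skill_args(tokens: list[str]) -> argparse.Namespace:
    """Hypothetical parser mirroring the generate-report synopsis."""
    parser = argparse.ArgumentParser(prog="generate-report")
    parser.add_argument("worktree_path")  # first path-like token
    parser.add_argument("results_path")   # second path-like token
    parser.add_argument("--inconclusive", action="store_true")
    parser.add_argument("--output-mode", choices=["local", "pr"])
    for flag in (
        "--issue-url",
        "--experiment-type",
        "--methodology-traditions",
        "--design-review-verdict",
        "--disambiguation-rule-applied",
        "--tier-c-lens",
        "--classification-timestamp",
    ):
        parser.add_argument(flag)  # plain string; None when the flag is absent
    return parser.parse_args(tokens)
```

`argparse` converts dashes to underscores, so `--output-mode` is read back as `args.output_mode`.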
- `{worktree_path}` — Absolute path to the worktree (required). First path-like token after the skill name.
- `{results_path}` — Absolute path to the experiment results file (required). Second path-like token.
- `--inconclusive` — Optional flag indicating experiments were inconclusive (retry exhaustion or insufficient evidence). When present, the report emphasizes what was learned and why evidence was insufficient, rather than framing as a failure.
- `--output-mode {local|pr}` — Optional. When `local`, the report header must include an issue reference link if `--issue-url` is also provided.
- `--issue-url {url}` — Optional. GitHub issue URL. When `output_mode=local` and this is supplied, inject a blockquote reference at the top of `report.md`. In `pr` mode, skip — the PR body handles the issue link.
- `--experiment-type {type}` — Optional. Experiment type classification (e.g., `"benchmark"`, `"causal_inference"`). When absent or `"null"`, write `experiment_type: null` in frontmatter.
- `--methodology-traditions {tradition}` — Optional. Methodology tradition slug from Tier-C routing (e.g., `"controlled_intervention"`). When absent or `"null"`, write `methodology_traditions: []` in frontmatter; otherwise wrap in a YAML list.
- `--design-review-verdict {verdict}` — Optional. GO/REVISE/STOP verdict from design review.
- `--disambiguation-rule-applied {rule}` — Optional. Disambiguation rule applied during Tier-C routing.
- `--tier-c-lens {lens}` — Optional. Tier-C vis-lens selected.
- `--classification-timestamp {timestamp}` — Optional. UTC timestamp of design classification.

In addition to the arguments above, this skill reads from the worktree:

- `${RESEARCH_DIR}/visualization-plan.md` — figure inventory and `yaml:figure-spec` blocks produced by plan-visualization. Read in Step 2.5 to drive plot generation.
- `${RESEARCH_DIR}/report-plan.md` — section outline mapping figure IDs to report sections. Read in Step 3 to place figure references correctly.

NEVER:
- `research/` directory
- `research/` directory
- `run_in_background: true` is prohibited)

ALWAYS:

- `model: "sonnet"` when spawning all subagents via the Task tool
- `research/` in the worktree root

When context is exhausted mid-execution, the report may be partially written but not yet committed to the worktree. The recipe routes to `on_context_limit: test`, preserving whatever was written.

Before emitting structured output tokens:

1. Check for uncommitted changes: `git -C {worktree_path} status --porcelain`
2. If any are listed, commit them: `git -C {worktree_path} add -A && git -C {worktree_path} commit -m "chore: commit partial report before context limit"`

This ensures the partial report is committed and the downstream test step can evaluate whatever research output was produced.
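The pre-emit commit can be sketched in Python with `subprocess`, assuming `git` is on `PATH`. The helper names are hypothetical. `has_uncommitted_changes` relies on `git status --porcelain` printing nothing for a clean tree, so the commit only runs when there is something to save.

```python
from __future__ import annotations

import subprocess


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed path, nothing when clean."""
    return any(line.strip() for line in porcelain_output.splitlines())


def commit_partial_report(worktree_path: str) -> bool:
    """Commit whatever is in the worktree; return True if a commit was made."""
    status = subprocess.run(
        ["git", "-C", worktree_path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    if not has_uncommitted_changes(status):
        return False  # nothing to commit; emit tokens directly
    subprocess.run(["git", "-C", worktree_path, "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", worktree_path, "commit", "-m",
         "chore: commit partial report before context limit"],
        check=True,
    )
    return True
```

Checking status first avoids the non-zero exit that `git commit` returns when there is nothing to commit.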
Read all available artifacts from the worktree:

- `{{AUTOSKILLIT_TEMP}}/experiment-plan.md`
- `{{AUTOSKILLIT_TEMP}}/scope/` (if available in worktree)
- `{results_path}`
- `{{AUTOSKILLIT_TEMP}}/run-experiment/`

Check `{{AUTOSKILLIT_TEMP}}/run-experiment/` for `*_metrics.json` files (e.g., `accuracy_metrics.json`, `parity_metrics.json`). If present, read them; they populate the Standardized Metrics section of the report.

Parse arguments for `--output-mode` and `--issue-url`.
If `--output-mode local` AND `--issue-url {url}` is present, inject a blockquote reference at the top of `report.md`, before the title:

> This research addresses [Issue #{N}]({url})

(Include a blank line after the blockquote before the title.)

In `pr` mode: omit this step entirely — the PR body contains the issue link.
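A sketch of this rule, assuming the issue number `{N}` is taken from the `/issues/{N}` segment of a GitHub issue URL. `issue_blockquote` is a hypothetical helper.

```python
from __future__ import annotations

import re
from typing import Optional


def issue_blockquote(output_mode: Optional[str], issue_url: Optional[str]) -> str:
    """Blockquote to place after the frontmatter and before the title, or "" to omit it."""
    if output_mode != "local" or not issue_url:
        return ""  # pr mode (or no URL): the PR body carries the issue link
    match = re.search(r"/issues/(\d+)", issue_url)
    if match is None:
        raise ValueError(f"not a GitHub issue URL: {issue_url}")
    # The trailing blank line separates the blockquote from the title heading.
    return f"> This research addresses [Issue #{match.group(1)}]({issue_url})\n\n"
```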
Based on the --inconclusive flag and the experiment results status:
Conclusive (no --inconclusive flag):
Inconclusive (`--inconclusive` flag or status = INCONCLUSIVE/FAILED):

- Read `visualization-plan.md` if it exists.
- For each figure whose `data_source.path` exists and has data, generate the plot (run Step 2.5 normally for that figure).
- For each remaining figure, insert this placeholder instead:

  > **[Figure {id} not produced]** — experiment concluded inconclusively;
  > data required for this figure was not produced.

  Preserve the original `yaml:figure-spec` YAML block in the report for reproducibility, indented under a `<details>` collapsible.

If `${RESEARCH_DIR}/visualization-plan.md` exists:
Read visualization-plan.md. If it contains zero figure specs (empty plan),
omit all sub-steps and proceed to Step 3.
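Both the empty-plan check and the inconclusive-mode placeholder start from the `yaml:figure-spec` blocks. The following is a sketch that assumes each spec is a triple-backtick fenced block whose info string is exactly `yaml:figure-spec`. `figure_specs` and `placeholder` are hypothetical names, and the `<summary>` text is an arbitrary choice. The fence string is assembled at runtime only so the example can be embedded in Markdown.

```python
from __future__ import annotations

import re

FENCE = "`" * 3  # assembled at runtime so this example can be embedded in Markdown
SPEC_RE = re.compile(
    rf"^{FENCE}yaml:figure-spec[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def figure_specs(plan_text: str) -> list[str]:
    """Return the raw YAML body of every yaml:figure-spec block in the plan."""
    return SPEC_RE.findall(plan_text)


def placeholder(fig_id: str, spec_yaml: str) -> str:
    """Inconclusive-mode stand-in that keeps the original spec under <details>."""
    return "\n".join([
        f"> **[Figure {fig_id} not produced]** — experiment concluded inconclusively;",
        "> data required for this figure was not produced.",
        "",
        "<details>",
        f"<summary>Figure {fig_id} spec</summary>",
        "",
        f"{FENCE}yaml:figure-spec",
        spec_yaml.rstrip("\n"),
        FENCE,
        "",
        "</details>",
    ])
```

An empty list from `figure_specs` corresponds to the empty-plan case, where Step 2.5 is skipped.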
Identify the experiment's Docker image. The image tag is `research-{slug}`, where
`{slug}` is the experiment directory name. Check that the image exists, and build it if it does not:
docker image inspect "research-{slug}" > /dev/null 2>&1 || \
(cd "${RESEARCH_DIR}" && docker build --build-arg MAMBA_ENV={slug} -t "research-{slug}" .)
For each yaml:figure-spec block in visualization-plan.md:
a. Write a Python plotting script to
${RESEARCH_DIR}/scripts/fig{N}_{slug}.py
that reads from data_source.path (or scans results/ and data/ if
the path does not exist — treat data_source.path as a hint).
b. Run the script inside the experiment container (volume-mount research dir):
docker run --rm \
-v "${RESEARCH_DIR}:/workspace" \
"research-{slug}" \
bash -c "pip install --quiet matplotlib seaborn plotly kaleido 2>/dev/null; \
python /workspace/scripts/fig${N}_${slug}.py"
c. Confirm output exists at ${RESEARCH_DIR}/images/fig-${N}.{png,svg}.
d. On failure: emit MISSING: fig-${N} — {error summary} to stdout and
continue with remaining figures. Do not abort the skill.
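The per-figure loop can be sketched as follows. The sketch assumes the specs are already parsed into dicts with an `id` key, and that the caller-supplied `render` callable wraps the `docker run` command above. The fallback in `resolve_data_source`, which looks for the hint's file name under `results/` and then `data/`, is one possible reading of "treat data_source.path as a hint", not a documented rule. Both helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional


def resolve_data_source(research_dir: Path, hint: str) -> Optional[Path]:
    """Use data_source.path if it exists; otherwise search results/ then data/
    for a file with the same name (assumed fallback strategy)."""
    candidate = research_dir / hint
    if candidate.exists():
        return candidate
    name = Path(hint).name
    for sub in ("results", "data"):
        folder = research_dir / sub
        if folder.is_dir():
            matches = sorted(folder.rglob(name))
            if matches:
                return matches[0]
    return None


def generate_figures(
    specs: list[dict], render: Callable[[dict], None]
) -> tuple[list[str], list[str]]:
    """Render each figure; report failures as MISSING lines and keep going."""
    produced: list[str] = []
    missing: list[str] = []
    for spec in specs:
        fig_id = str(spec["id"])
        try:
            render(spec)  # e.g. runs the docker command for this figure
            produced.append(fig_id)
        except Exception as exc:  # one failed figure must not abort the skill
            line = f"MISSING: fig-{fig_id} — {exc}"
            print(line)
            missing.append(line)
    return produced, missing
```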
Commit scripts and images (if any were produced):
git add research/ && git commit -m "Add visualization scripts and figures"
The default section order below suits software engineering and computational research. Adjust to match domain conventions when the research question originates from a non-engineering field:
Rules:
Create the report directory and file:
research/YYYY-MM-DD-{slug}/
report.md # The main research report
scripts/ # Extracted experiment scripts (optional, if complex)
The {slug} is a kebab-case summary of the research topic (max 40 chars).
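One way to derive the directory name is sketched below. It assumes an ASCII topic string; non-ASCII letters are simply dropped, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import date
from typing import Optional


def make_slug(topic: str, max_len: int = 40) -> str:
    """Kebab-case slug: lowercase alphanumeric runs joined by single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    # Truncate to the limit without leaving a dangling hyphen.
    return slug[:max_len].rstrip("-")


def report_dir(topic: str, today: Optional[date] = None) -> str:
    """research/YYYY-MM-DD-{slug}/ relative to the worktree root."""
    day = today or date.today()
    return f"research/{day.isoformat()}-{make_slug(topic)}/"
```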
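Write a YAML frontmatter block (fenced with `---`) at the very top of `report.md`,
before the title heading. Use the values from the corresponding flags
(`--experiment-type`, `--methodology-traditions`, `--disambiguation-rule-applied`,
`--tier-c-lens`, `--design-review-verdict`, `--classification-timestamp`). If a flag is
absent or its value is the literal string `null`, write the key with value `null`
(not omitted); the one exception is `methodology_traditions`, which is written as an
empty list `[]`. Always include `generated_at`. If `--output-mode local` is used with
`--issue-url`, the issue blockquote goes AFTER the frontmatter, before the title.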
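The null-handling rules can be illustrated with a stdlib-only sketch. It assumes the flags arrive as a dict keyed by underscore names. Strings are emitted with `json.dumps`, which produces double-quoted scalars that YAML parsers also accept. `audit_trail_path` is omitted here because it depends on the slug, and all helper names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Optional

SCALAR_KEYS = (
    "disambiguation_rule_applied",
    "tier_c_lens",
    "design_review_verdict",
    "classification_timestamp",
)


def normalize_flag(value: Optional[str]) -> Optional[str]:
    """Absent flags and the literal string "null" both mean null."""
    if value is None or value.strip() == "null":
        return None
    return value


def yaml_scalar(value: Optional[str]) -> str:
    # json.dumps yields a double-quoted string, which YAML also accepts.
    return "null" if value is None else json.dumps(value)


def render_frontmatter(flags: dict, generated_at: str) -> str:
    experiment_type = normalize_flag(flags.get("experiment_type"))
    lines = ["---", f"experiment_type: {yaml_scalar(experiment_type)}"]
    tradition = normalize_flag(flags.get("methodology_traditions"))
    if tradition is None:
        lines.append("methodology_traditions: []")  # empty list, not null
    else:
        lines += ["methodology_traditions:", f"  - {yaml_scalar(tradition)}"]
    for key in SCALAR_KEYS:
        lines.append(f"{key}: {yaml_scalar(normalize_flag(flags.get(key)))}")
    lines += [f"generated_at: {yaml_scalar(generated_at)}", "---"]
    return "\n".join(lines) + "\n"
```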
The report structure:
---
experiment_type: {value from --experiment-type, or null}
methodology_traditions:
- {value from --methodology-traditions, or empty list if null}
disambiguation_rule_applied: {value from --disambiguation-rule-applied, or null}
tier_c_lens: {value from --tier-c-lens, or null}
design_review_verdict: {value from --design-review-verdict, or null}
classification_timestamp: {value from --classification-timestamp, or null}
audit_trail_path:
design_review: research/{slug}/audit/design-review-dashboard.md
visualization_trace: research/{slug}/audit/visualization-plan-trace.md
generated_at: {ISO 8601 timestamp}
---
# {Research Title}
> Research report for [Issue #{N}]({issue_url}) — {date}
## Executive Summary
### Data Scope Statement (mandatory — include at start of Executive Summary)
Every report must begin the Executive Summary with a Data Scope Statement:
> **Data Scope:** All benchmarks were conducted on {comma-separated list of data types
> used, e.g., "synthetic Gaussian blobs (10K–100K points)"}. {Domain target} data was
> {present and used | absent — all results derive from synthetic data | partial — only
> {subset} was available}.
**Rules:**
- If ALL benchmarks used ONLY synthetic data and the research task was domain-specific:
state this explicitly. Do not claim domain-specific performance improvements derived
from synthetic data without this qualifier.
- If some hypotheses were marked N/A or BLOCKED due to missing data: state which
hypotheses were affected and why.
- Read the experiment plan's `data_manifest` (if available) to determine what data was
planned vs. what was actually used.
{2-3 paragraph overview: what was investigated, key methodology, headline
finding, and recommendation. Written last, placed first.}
## Background and Research Question
{Context: why this investigation was initiated, what decision it informs,
what was known before this experiment.}
## Methodology
### Experimental Design
{From the experiment design: hypothesis, variables, controls. Include
enough detail for independent reproduction.}
### Environment
- **Repository commit:** {output of `git rev-parse HEAD` — the exact commit this experiment ran against}
- **Branch:** {current branch name}
- **Package versions:** {output of the project's package manager — e.g., `pip freeze`, `conda list`, or the contents of lock files (e.g., `requirements.txt`, `environment.yml`). Include ALL relevant dependency versions, not just top-level.}
- **Hardware/OS:** {if relevant to the experiment}
- **Custom environment:** {if a micromamba/conda environment.yml was used, note it and its location}
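The commit, branch, and package fields above can be collected with a short sketch. It assumes a pip-managed environment on `PATH` as `python`; substitute `conda list` or equivalent for conda/micromamba environments. The runner is injectable so the formatting can be checked without a real repository, and all names are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its trimmed stdout; raises if it fails."""
    return subprocess.run(
        list(cmd), capture_output=True, text=True, check=True
    ).stdout.strip()


def environment_section(worktree: str, runner: Runner = run) -> str:
    """Build the Environment bullets from git and pip output."""
    commit = runner(["git", "-C", worktree, "rev-parse", "HEAD"])
    branch = runner(["git", "-C", worktree, "rev-parse", "--abbrev-ref", "HEAD"])
    # Assumes a pip-managed environment; swap in `conda list` etc. as needed.
    packages = runner(["python", "-m", "pip", "freeze"])
    lines = [
        f"- **Repository commit:** {commit}",
        f"- **Branch:** {branch}",
        "- **Package versions:**",
    ]
    lines += [f"  - {pkg}" for pkg in packages.splitlines() if pkg.strip()]
    return "\n".join(lines)
```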
### Procedure
{Step-by-step description of what was executed.}
## Results
{Present data from the experiment. Use tables, code blocks, or whatever
format best represents the measurements. No interpretation in this
section — just facts.}
### Figure References
Reference figures by ID and caption only. NEVER embed images with `![...](...)` syntax.
The HTML report rendered by `bundle-local-report` reads `yaml:figure-spec` metadata
and inserts `<img>` tags at the correct sections. Markdown prose uses:
> "Figure 1 shows ..." or "(see Figure 1)"
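A quick self-check for this rule might scan `report.md` for Markdown image syntax before committing. The following is a sketch with a hypothetical helper name. It only flags the `![alt](path)` form.

```python
from __future__ import annotations

import re

# Markdown image syntax: ![alt text](path "optional title")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def find_embedded_images(report_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that embed an image instead of citing it."""
    return [
        (number, line)
        for number, line in enumerate(report_text.splitlines(), start=1)
        if IMAGE_RE.search(line)
    ]
```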
### Metrics Provenance Check (mandatory before including any metrics)
Before including data from any `*_metrics.json` file:
1. **Check generation timestamp**: The file's modification time must be within the
current experiment's execution window. If the file predates the experiment run,
it is stale.
2. **Check content relevance**: Verify the metrics file's contents relate to the
hypotheses under test. If a metrics file contains data from a different subsystem
or experiment, it is irrelevant.
3. **Disposition:**
- **Current and relevant**: Include normally.
- **Stale**: Disclose in the report: "Note: {filename} predates the current
experiment run and was not regenerated. Excluded from analysis."
- **Irrelevant**: Disclose: "Note: {filename} contains {description of actual
contents} which is unrelated to the hypotheses under test. Excluded."
- **NEVER** silently drop a metrics file. Always disclose the reason for exclusion.
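The disposition rules above can be sketched as follows, assuming the start time of the current run is known as a Unix timestamp. The relevance judgment is supplied by the caller as `judge(path) -> (relevant, description)`. The function names are hypothetical. Every excluded file yields a disclosure note, so nothing is dropped silently.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional


def provenance_status(mtime: float, run_started_at: float, relevant: bool) -> str:
    """Classify a metrics file as current, stale, or irrelevant."""
    if mtime < run_started_at:
        return "stale"  # predates the current experiment run
    return "current" if relevant else "irrelevant"


def disclosure(path: Path, status: str, description: str = "") -> Optional[str]:
    """Report note for an excluded file; None means include it normally."""
    if status == "stale":
        return (f"Note: {path.name} predates the current experiment run "
                "and was not regenerated. Excluded from analysis.")
    if status == "irrelevant":
        return (f"Note: {path.name} contains {description} which is unrelated "
                "to the hypotheses under test. Excluded.")
    return None


def review_metrics(
    run_dir: Path,
    run_started_at: float,
    judge: Callable[[Path], tuple[bool, str]],
) -> tuple[list[Path], list[str]]:
    """Split *_metrics.json files into included paths and disclosure notes."""
    included: list[Path] = []
    notes: list[str] = []
    for path in sorted(run_dir.glob("*_metrics.json")):
        relevant, description = judge(path)
        status = provenance_status(path.stat().st_mtime, run_started_at, relevant)
        note = disclosure(path, status, description)
        if note is None:
            included.append(path)
        else:
            notes.append(note)  # never drop a file without saying why
    return included, notes
```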
### Standardized Metrics
{Include this section when `*_metrics.json` files are present in
`{{AUTOSKILLIT_TEMP}}/run-experiment/`. Omit entirely if no metrics JSON was produced.}
| Metric | Dimension | Dataset | Value | Threshold | Status |
|--------|-----------|---------|-------|-----------|--------|
| {metric_name} | {Accuracy/Parity} | {dataset} | {value} | {threshold} | ✅ PASS / ❌ FAIL |
{If any metrics failed: note which solver level or dataset showed the failure
and whether it is within acceptable range for the experiment's scope.}
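Rows for the table above could be generated as in this sketch. The metrics schema is hypothetical: a list of dicts with `metric`, `dimension`, `dataset`, `value`, `threshold`, and an assumed `higher_is_better` field that defaults to `True`. Adapt the field names to whatever the `*_metrics.json` files actually contain.

```python
from __future__ import annotations

HEADER = [
    "| Metric | Dimension | Dataset | Value | Threshold | Status |",
    "|--------|-----------|---------|-------|-----------|--------|",
]


def metric_row(m: dict) -> str:
    """One table row; `higher_is_better` is an assumed field (default True)."""
    if m.get("higher_is_better", True):
        passed = m["value"] >= m["threshold"]
    else:
        passed = m["value"] <= m["threshold"]  # e.g. error metrics
    status = "✅ PASS" if passed else "❌ FAIL"
    return (f"| {m['metric']} | {m['dimension']} | {m['dataset']} "
            f"| {m['value']} | {m['threshold']} | {status} |")


def metrics_table(metrics: list[dict]) -> str:
    return "\n".join(HEADER + [metric_row(m) for m in metrics])
```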
## Observations
{Notable patterns, anomalies, unexpected behaviors discovered during
the experiment.}
### Gate Enforcement (mandatory for all hypothesis results)
When reporting on pre-specified hypothesis gates:
1. **Use the gate threshold from the experiment plan**, not a different hypothesis's
threshold. Each hypothesis has its own pre-specified success criterion — do not
conflate them.
2. **When a gate is NOT met**: State this as a failure. Example: "H6 targeted ≥3×
speedup at n=100K. Measured: 2.04× at n=50K (estimated ~1.95× at n=100K). **FAIL.**"
3. **When recommending GO**: The GO recommendation must reference the specific gate(s)
that were met and their measured values. A GO on H1 (which has a ≥1.5× threshold)
does not satisfy H6 (which has a ≥3× threshold).
4. **NEVER** silently substitute one hypothesis's threshold for another. If H6's gate
is not met, H6 is a FAIL regardless of whether H1's lower threshold was met by the
same measurement.
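The core of these rules is that each hypothesis is checked only against its own threshold. The sketch below reuses the H1/H6 thresholds from the example. It assumes higher-is-better speedup gates and that each measured value was taken at the gate's own conditions (for example, H6 at n=100K). The names are hypothetical, and `N/A` marks a gate with no measurement.

```python
from __future__ import annotations


def evaluate_gates(
    thresholds: dict[str, float], measured: dict[str, float]
) -> dict[str, str]:
    """Check each hypothesis only against its own pre-specified threshold."""
    verdicts: dict[str, str] = {}
    for hypothesis, threshold in thresholds.items():
        value = measured.get(hypothesis)
        if value is None:
            verdicts[hypothesis] = "N/A"  # no measurement at this gate's conditions
        else:
            verdicts[hypothesis] = "PASS" if value >= threshold else "FAIL"
    return verdicts


def go_justification(
    thresholds: dict[str, float],
    measured: dict[str, float],
    verdicts: dict[str, str],
) -> list[str]:
    """Lines a GO recommendation must cite: only the gates actually met."""
    return [
        f"{h}: measured {measured[h]}x vs. required >= {thresholds[h]}x"
        for h, v in verdicts.items()
        if v == "PASS"
    ]
```

The same 2.04× measurement passes H1 (≥1.5×) but still fails H6 (≥3×).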
## Analysis
{Interpret the results. Compare against the hypothesis. Explain anomalies.
Connect findings to the original research question. Include statistical
analysis if relevant to the experiment type.}
## What We Learned
{Regardless of outcome, document:}
- {Key insight 1}
- {Key insight 2}
- {Boundary conditions established}
- {Methodology learnings for future experiments}
## Conclusions
{Direct answer to the research question.}
## Design Review Summary
**Verdict:** {design_review_verdict}
The design review evaluated the experiment plan against {experiment_type} criteria.
The review confirmed the experimental design meets quality standards.
For detailed evaluation dimensions, scorecard, and adversarial findings, see
[`audit/design-review-dashboard.md`](audit/design-review-dashboard.md).
For visualization tier routing decisions and methodology tradition analysis, see
[`audit/visualization-plan-trace.md`](audit/visualization-plan-trace.md).
## Recommendations
*For biology and social science domains, use "Discussion and Future Directions" as the section title instead.*
{Actionable next steps based on findings — what to keep, revert, modify,
or investigate further. Include justification for each recommendation.
In non-engineering domains, frame this section as "Discussion and Future Directions":
interpret findings in the context of existing literature, explain implications,
and propose follow-on studies rather than prescriptive next steps.}
## Data Availability
{Optional — include when required by the target journal or domain conventions
(biology, medical, social science). State where datasets, code, and supplementary
materials are available, or explain any access restrictions (e.g., patient data,
proprietary datasets). Omit this section for internal engineering reports.}
## Appendix: Experiment Scripts
{Include key experiment scripts as fenced code blocks. These are preserved
for reproducibility even after the worktree is cleaned up.}
### {script_name.ext}
```{language}
{script content}
```
{Enumerate each script in ${RESEARCH_DIR}/scripts/fig*.py produced during
Step 2.5. Include the full script as a fenced Python code block. These are
preserved for figure reproducibility even after the worktree is cleaned up.}
{If raw data is small enough, include inline. Otherwise, reference the files committed alongside this report.}
### Step 4 — Commit and Emit
1. Create the research directory in the worktree:
`mkdir -p research/YYYY-MM-DD-{slug}/`
2. Write `report.md` to that directory.
3. If experiment scripts are complex (>50 lines), also save them as separate
files in `research/YYYY-MM-DD-{slug}/scripts/`.
4. Commit to the worktree:
   `git add research/ && git commit -m "Add research report: {brief title}"`
After committing, emit the structured output token as the very last line of
your text output:
> **IMPORTANT:** Emit the structured output tokens as **literal plain text with no
> markdown formatting on the token names**. Do not wrap token names in `**bold**`,
> `*italic*`, or any other markdown. The adjudicator performs a regex match on the
> exact token name — decorators cause match failure.
report_path = {absolute_path_to_report.md}
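To see why decoration breaks the match, here is a sketch of a line-anchored pattern of the kind described. The adjudicator's real regex is not documented here, so `TOKEN_RE` is an assumption. It also assumes the path contains no spaces.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical pattern: token name at line start, no Markdown decoration.
TOKEN_RE = re.compile(r"^report_path = (?P<path>/\S+)[ \t]*$", re.MULTILINE)


def extract_report_path(output: str) -> Optional[str]:
    """Return the last report_path value in the output, or None."""
    matches = TOKEN_RE.findall(output)
    return matches[-1] if matches else None
```

With this pattern, `**report_path** = ...` yields no match because the line no longer starts with the bare token name.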