.github/skills/openvmm-ci-investigation/SKILL.md
Investigate CI failures on OpenVMM PRs. Load when a PR has failing CI checks, you need to download and analyze test artifacts, or you need to diagnose build, fmt, clippy, or VMM test failures.
When a PR has failing CI checks, always start by running the investigation script. Do not query the GitHub API or download artifacts by hand; the script handles all of that automatically.
# Investigate a PR by number (ALWAYS use this first)
python3 repo_support/investigate_ci.py 2946
# Or by run ID directly
python3 repo_support/investigate_ci.py 23017249697
The script automatically:
- downloads the *-unit-tests-junit-xml artifacts and parses the JUnit XML for unit test failures
- downloads the *-vmm-tests-logs artifacts, if they exist
- finds petri.failed markers and extracts ERROR/WARN lines
Read the script's output to identify what failed, then use that information to diagnose the issue and suggest fixes.
Always try to diagnose the failure from CI logs, test code, and error messages first. Most failures can be understood without local reproduction. Read the relevant test source, trace the error through the code, and form a hypothesis before considering local repro.
If you cannot diagnose the failure from logs alone, ask the user whether they want to attempt local reproduction before proceeding. Do not automatically start building or running tests on the user's machine.
If the user agrees, check whether the failing platform matches the host architecture: tests can only be reproduced locally on the same architecture. If it doesn't match, explain this to the user and continue diagnosing from CI logs and test code.
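The architecture check can be sketched in Python, assuming every CI platform name starts with x64 or aarch64 (true of all platforms listed in this skill). The helper names and the machine-string mapping are hypothetical; platform.machine() reports values such as x86_64 (Linux), AMD64 (Windows), aarch64 (Linux), or arm64/ARM64 (macOS/Windows), which are normalized here.

```python
import platform
from typing import Optional

# Assumed mapping from platform.machine() strings to the arch prefixes used in
# CI platform names such as x64-linux or aarch64-windows.
_ARCH_PREFIXES = {
    "x86_64": "x64",
    "amd64": "x64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def host_arch_prefix(machine: Optional[str] = None) -> Optional[str]:
    """Map a machine string (default: this host's) to x64/aarch64, or None if unknown."""
    if machine is None:
        machine = platform.machine()
    return _ARCH_PREFIXES.get(machine.lower())


def same_arch_as_host(ci_platform: str, machine: Optional[str] = None) -> bool:
    """Compare only the arch prefix of a CI platform name with the host arch.

    OS and hardware suffixes (e.g. -windows-amd-snp) are deliberately ignored,
    matching the arch-only rule in the text.
    """
    host = host_arch_prefix(machine)
    return host is not None and ci_platform.split("-", 1)[0] == host
```

For example, same_arch_as_host("x64-windows-amd-snp") returns True on any x64 host, since only the architecture is compared.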
To reproduce locally, load the vmm-tests skill for instructions on
running with cargo xflowey vmm-tests-run. Do not use
cargo nextest run -p vmm_tests directly, because it won't have the
required artifacts.
Use the manual steps below only if the script fails or isn't available, or if you need more control to dig deeper into a specific artifact. In normal usage, the script above is sufficient.
# Get the run ID for a PR
gh pr checks <PR_NUMBER> -R microsoft/openvmm
# Or list runs for a specific commit
gh run list -R microsoft/openvmm --commit <SHA>
# List the failed jobs in a run (job names and IDs)
gh run view <RUN_ID> -R microsoft/openvmm --json jobs \
-q '[.jobs[] | select(.conclusion == "failure") | {name, databaseId}]'
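If the failed-job list feeds further Python processing, the same filter can be applied to the raw output of gh run view <RUN_ID> -R microsoft/openvmm --json jobs (without -q). A minimal sketch; the function name is hypothetical, and it assumes only the jobs, name, conclusion, and databaseId fields used by the jq filter.

```python
import json
from typing import Dict, List


def failed_jobs(run_view_json: str) -> List[Dict[str, object]]:
    """Return name/databaseId pairs for jobs whose conclusion is 'failure'.

    Expects the JSON printed by `gh run view <RUN_ID> --json jobs`,
    i.e. an object with a "jobs" array.
    """
    data = json.loads(run_view_json)
    return [
        {"name": job["name"], "databaseId": job["databaseId"]}
        for job in data.get("jobs", [])
        if job.get("conclusion") == "failure"
    ]
```

The databaseId values are the job IDs that gh run view <RUN_ID> --job <JOB_ID> --log expects.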
Unit test results are stored in JUnit XML artifacts named
{platform}-unit-tests-junit-xml. Known platforms include:
- x64-linux
- aarch64-linux
- aarch64-linux-musl
# Download unit test JUnit XML for a platform
gh run download <RUN_ID> -R microsoft/openvmm \
-n aarch64-linux-unit-tests-junit-xml -D /tmp/junit-xml
# Parse failures from the XML
python3 -c "
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
for f in Path(sys.argv[1]).rglob('*.xml'):
    for tc in ET.parse(f).iter('testcase'):
        fail = tc.find('failure')
        if fail is None:
            fail = tc.find('error')
        if fail is not None:
            print(f'FAIL: {tc.get(\"classname\", \"\")}::{tc.get(\"name\", \"\")}')
            print(f'    {fail.get(\"message\", \"\")[:200]}')
" /tmp/junit-xml
VMM test results are stored in artifacts named {platform}-vmm-tests-logs.
The known platforms are:
- x64-windows-intel
- x64-windows-intel-tdx
- x64-windows-amd
- x64-windows-amd-snp
- x64-linux
- aarch64-windows
# Download a specific platform's test logs
gh run download <RUN_ID> -R microsoft/openvmm \
-n x64-windows-amd-snp-vmm-tests-logs -D /tmp/test-logs
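To pull the VMM test logs for several platforms in one go, the gh run download command above can be wrapped in a small script. This is a sketch: the function names and the per-platform subdirectories under /tmp/test-logs are assumptions, and if gh exits non-zero (for example, because a platform has no such artifact), check=True raises subprocess.CalledProcessError.

```python
import subprocess
from typing import List

REPO = "microsoft/openvmm"


def download_cmd(run_id: int, artifact: str, dest: str) -> List[str]:
    """Build the `gh run download` argv for one artifact (mirrors the command above)."""
    return ["gh", "run", "download", str(run_id), "-R", REPO, "-n", artifact, "-D", dest]


def download_vmm_logs(run_id: int, platforms: List[str], base_dir: str = "/tmp/test-logs") -> None:
    """Download each platform's {platform}-vmm-tests-logs artifact into its own subdirectory."""
    for plat in platforms:
        artifact = f"{plat}-vmm-tests-logs"
        subprocess.run(download_cmd(run_id, artifact, f"{base_dir}/{plat}"), check=True)
```

Calling download_vmm_logs with a run ID and a list such as ["x64-linux", "x64-windows-amd-snp"] puts each platform's logs in a separate directory, so test directories from different platforms don't collide.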
Each test gets its own directory inside the artifact. Look for petri.failed
marker files (passing tests have petri.passed instead):
find /tmp/test-logs -name "petri.failed"
The petri.failed file contains the test name.
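The find command above can also be done from Python when the failing test names are needed in a script. A minimal sketch; the function name is hypothetical, and it relies only on the layout described here: one directory per test, with a petri.failed marker whose contents name the test.

```python
from pathlib import Path
from typing import Dict


def failed_tests(log_root: str) -> Dict[str, str]:
    """Map each test directory containing petri.failed to the marker's contents.

    The marker holds the test name; surrounding whitespace is stripped.
    """
    results: Dict[str, str] = {}
    for marker in sorted(Path(log_root).rglob("petri.failed")):
        text = marker.read_text(encoding="utf-8", errors="replace")
        results[str(marker.parent)] = text.strip()
    return results
```

The returned directory paths are where petri.jsonl and the other per-test logs for each failure live.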
The petri.jsonl file in each test directory is the primary structured log.
Each line is a JSON object with fields: timestamp, source, severity,
message. Filter for ERROR and WARN severity for a quick diagnosis:
python3 -c "
import json, sys
with open(sys.argv[1], encoding='utf-8') as log:
    for line in log:
        try:
            e = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if e.get('severity') in ('ERROR', 'WARN'):
            msg = str(e.get('message', '')).strip()[:200]
            print(f'[{e[\"severity\"]}] {e.get(\"source\", \"?\")}: {msg}')
" /tmp/test-logs/<test-dir>/petri.jsonl
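For a long petri.jsonl, counting ERROR and WARN entries per source can show which component produces most of the problems before you read individual messages. A sketch, assuming only the timestamp/source/severity/message fields described above; the function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_problems_by_source(lines: Iterable[str]) -> Counter:
    """Count ERROR/WARN entries in petri.jsonl lines, keyed by (severity, source)."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(entry, dict):
            continue  # each log line should be a JSON object
        severity = entry.get("severity")
        if severity in ("ERROR", "WARN"):
            counts[(severity, entry.get("source", "?"))] += 1
    return counts
```

Pass it an open petri.jsonl file object and call .most_common() on the result to list the noisiest sources first.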
Artifacts named {platform}-unit-tests-junit-xml contain JUnit XML files
with <testcase> elements. Failed tests have <failure> or <error>
children with message attributes describing the failure. These are the
primary artifacts for diagnosing unit test / cargo-nextest failures.
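To make that structure concrete, here is a small made-up JUnit document (crate and test names are hypothetical) and a parser that reports each failing testcase once, checking for failure before error as the parsing command above does. Real cargo-nextest output carries more attributes and text than this sample.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def junit_failures(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (classname::name, kind, message) for each failed testcase.

    kind is 'failure' or 'error', depending on which child element was found.
    """
    root = ET.fromstring(xml_text)
    failures = []
    for tc in root.iter("testcase"):
        for kind in ("failure", "error"):
            child = tc.find(kind)
            if child is not None:
                test_id = f"{tc.get('classname', '')}::{tc.get('name', '')}"
                failures.append((test_id, kind, child.get("message", "")))
                break  # report each testcase once
    return failures


SAMPLE = """<testsuites>
  <testsuite name="hypothetical_crate">
    <testcase classname="hypothetical_crate" name="passes"/>
    <testcase classname="hypothetical_crate" name="fails">
      <failure message="assertion failed: left == right"/>
    </testcase>
    <testcase classname="hypothetical_crate" name="errors">
      <error message="test binary crashed"/>
    </testcase>
  </testsuite>
</testsuites>"""
```

A passing testcase has neither child element, so it produces no entry.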
Each test directory contains:
- petri.jsonl: structured JSON Lines log (the primary file for investigation)
- petri.log: plain-text version of the test log
- petri.passed or petri.failed: pass/fail marker
- openhcl.log: OpenHCL serial console output, if the test exercises OpenHCL
- hyperv.log: Hyper-V event log, if the test exercises the Hyper-V backend
- openvmm.log: OpenVMM serial console output, if the test exercises the OpenVMM backend
- guest.log, uefi.log: guest OS serial output
- screenshot_*.png: periodic screenshots of the guest
- dumpfile.dmp
Test results are also uploaded to Azure Blob Storage and can be viewed at:
https://openvmm.dev/test-results/#/runs/<RUN_ID>
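Which of these files are present depends on the backend and on what the test exercised, so a quick inventory helps decide where to look first. A sketch: the function names are hypothetical, the file names come from the list above, and results_url only substitutes a run ID into the results page URL.

```python
from pathlib import Path
from typing import Dict

# File names from the per-test directory listing; screenshots are matched by glob.
KNOWN_LOGS = ["petri.jsonl", "petri.log", "openhcl.log", "hyperv.log",
              "openvmm.log", "guest.log", "uefi.log", "dumpfile.dmp"]


def summarize_test_dir(test_dir: str) -> Dict[str, object]:
    """Report pass/fail status plus which known logs and screenshots exist."""
    d = Path(test_dir)
    if (d / "petri.failed").exists():
        status = "failed"
    elif (d / "petri.passed").exists():
        status = "passed"
    else:
        status = "unknown"
    present = [name for name in KNOWN_LOGS if (d / name).exists()]
    screenshots = sorted(p.name for p in d.glob("screenshot_*.png"))
    return {"status": status, "logs": present, "screenshots": screenshots}


def results_url(run_id: int) -> str:
    """Build the test-results page URL for a CI run."""
    return f"https://openvmm.dev/test-results/#/runs/{run_id}"
```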
Common failure patterns:
- The unit tests job failed. The script downloads the JUnit XML artifacts and shows the failing test names and messages. Common causes include new test code that relies on OS capabilities not available in CI (e.g. TAP devices, elevated permissions).
- The quick check [fmt, clippy x64-linux] job failed. No test artifacts will exist. Run cargo xtask fmt --fix locally and check the job log for the specific rule that failed.
- Check petri.jsonl for Hyper-V Worker/Chipset errors. These are often infrastructure-related rather than caused by the PR's code changes.
- Check guest.log for the guest OS serial output.
- Read a job's full log with gh run view <RUN_ID> -R microsoft/openvmm --job <JOB_ID> --log.
gh run view has no artifacts field for --json, so gh run view --json artifacts fails. To list the artifacts
for a run, use the GitHub API directly:
gh api repos/microsoft/openvmm/actions/runs/<RUN_ID>/artifacts
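The artifacts endpoint returns JSON with an artifacts array whose entries include, among other fields, a name and an expired flag. A sketch that groups the test artifacts by platform using the {platform}-unit-tests-junit-xml and {platform}-vmm-tests-logs naming described above; the function name is hypothetical. The endpoint pages its results (30 per page by default, up to 100 with the per_page query parameter), so a run with many artifacts may need ?per_page=100 appended to the path.

```python
import json
from typing import Dict, List

SUFFIXES = ("-unit-tests-junit-xml", "-vmm-tests-logs")


def artifacts_by_platform(api_json: str) -> Dict[str, List[str]]:
    """Group non-expired test artifacts by the platform prefix of their name."""
    grouped: Dict[str, List[str]] = {}
    for artifact in json.loads(api_json).get("artifacts", []):
        if artifact.get("expired"):
            continue  # expired artifacts can no longer be downloaded
        name = artifact["name"]
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                grouped.setdefault(name[: -len(suffix)], []).append(name)
                break
    return grouped
```

Artifacts that match neither suffix (build outputs, for example) are ignored.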