steipete/openclaw-qa-testing

Name: openclaw-qa-testing
Author: steipete

.agents/skills/openclaw-qa-testing/SKILL.md

npx skillsauth add steipete/clawdis openclaw-qa-testing

OpenClaw QA Testing

Use this skill for qa-lab / qa-channel work. Repo-local QA only.

Read first

docs/concepts/qa-e2e-automation.md
docs/help/testing.md
docs/channels/qa-channel.md
qa/README.md
qa/scenarios/index.md
extensions/qa-lab/src/suite.ts
extensions/qa-lab/src/character-eval.ts

Model policy

Live OpenAI lane: openai/gpt-5.4
Fast mode: on
Do not use:
- openai/gpt-5.4-pro
- openai/gpt-5.4-mini
Only change model policy if the user explicitly asks.
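
The policy above can be read as a small guard, sketched here in TypeScript. This is a minimal illustration only: `resolveLiveModel`, `DEFAULT_LIVE_MODEL`, and `DISALLOWED_MODELS` are hypothetical names, not part of the repo, and the "user explicitly asked" flag is an assumption about how an override would be signalled.

```typescript
// Hypothetical helper; names are illustrative and do not come from the repo.
const DEFAULT_LIVE_MODEL = "openai/gpt-5.4";
const DISALLOWED_MODELS = ["openai/gpt-5.4-pro", "openai/gpt-5.4-mini"];

function resolveLiveModel(
  requested: string | undefined,
  userExplicitlyAsked: boolean,
): string {
  // No request, or a request for the default, always resolves to the default lane model.
  if (requested === undefined || requested === DEFAULT_LIVE_MODEL) {
    return DEFAULT_LIVE_MODEL;
  }
  // Any other model is a policy change, which needs an explicit user request.
  if (!userExplicitlyAsked) {
    const reason =
      DISALLOWED_MODELS.indexOf(requested) !== -1
        ? "is excluded by the model policy"
        : "differs from the default live lane model";
    throw new Error(`${requested} ${reason}; ask the user before changing it`);
  }
  return requested;
}
```

Under these assumptions, `resolveLiveModel(undefined, false)` yields the default lane model, and a `-pro` or `-mini` ref is rejected unless the override flag is set.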

Default workflow

Read the scenario pack and current suite implementation.
Decide lane:
- mock/dev: mock-openai
- real validation: live-frontier
For live OpenAI, use:

OPENCLAW_LIVE_OPENAI_KEY="${OPENAI_API_KEY}" \
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model openai/gpt-5.4 \
  --alt-model openai/gpt-5.4 \
  --output-dir .artifacts/qa-e2e/run-all-live-frontier-<tag>

Watch outputs:
- summary: .artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-summary.json
- report: .artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-report.md
If the user wants to watch the live UI, find the current openclaw-qa listen port and report http://127.0.0.1:<port>.
If a scenario fails, fix the product or harness root cause, then rerun the full lane.
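
As a rough sketch of the workflow above, the lane choice, the command arguments, and the two output paths can be derived from a single tag. `planSuiteRun` and its types are hypothetical. Two things are assumed rather than confirmed by the text: that `mock-openai` is accepted as a `--provider-mode` value, and that the mock lane uses the same `run-all-<lane>-<tag>` directory naming.

```typescript
// Hypothetical planner; only the live-frontier flags and paths are taken from the text above.
type Lane = "mock-openai" | "live-frontier";

interface SuiteRun {
  args: string[]; // arguments to pass to `pnpm`
  summaryPath: string;
  reportPath: string;
}

function planSuiteRun(lane: Lane, tag: string): SuiteRun {
  // Assumption: the mock lane follows the same directory naming as the live lane.
  const outputDir = `.artifacts/qa-e2e/run-all-${lane}-${tag}`;
  const args = ["openclaw", "qa", "suite", "--provider-mode", lane];
  if (lane === "live-frontier") {
    // Same model for primary and alt, matching the live OpenAI command above.
    args.push("--model", "openai/gpt-5.4", "--alt-model", "openai/gpt-5.4");
  }
  args.push("--output-dir", outputDir);
  return {
    args,
    summaryPath: `${outputDir}/qa-suite-summary.json`,
    reportPath: `${outputDir}/qa-suite-report.md`,
  };
}
```

Keeping the paths next to the arguments means the summary and report locations reported to the user always match the `--output-dir` that was actually passed.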

Character evals

Use qa character-eval for style/persona/vibe checks across multiple live models.

pnpm openclaw qa character-eval \
  --model openai/gpt-5.4,thinking=xhigh \
  --model openai/gpt-5.2,thinking=xhigh \
  --model openai/gpt-5,thinking=xhigh \
  --model anthropic/claude-opus-4-6,thinking=high \
  --model anthropic/claude-sonnet-4-6,thinking=high \
  --model zai/glm-5.1,thinking=high \
  --model moonshot/kimi-k2.5,thinking=high \
  --model google/gemini-3.1-pro-preview,thinking=high \
  --judge-model openai/gpt-5.4,thinking=xhigh,fast \
  --judge-model anthropic/claude-opus-4-6,thinking=high \
  --concurrency 16 \
  --judge-concurrency 16 \
  --output-dir .artifacts/qa-e2e/character-eval-<tag>

Runs local QA gateway child processes, not Docker.
Preferred model spec syntax is provider/model,thinking=<level>[,fast|,no-fast|,fast=<bool>] for both --model and --judge-model.
Do not add new examples with separate --model-thinking; keep that flag as legacy compatibility only.
Defaults to candidate models openai/gpt-5.4, openai/gpt-5.2, openai/gpt-5, anthropic/claude-opus-4-6, anthropic/claude-sonnet-4-6, zai/glm-5.1, moonshot/kimi-k2.5, and google/gemini-3.1-pro-preview when no --model is passed.
Candidate thinking defaults to high, with xhigh for OpenAI models that support it. Prefer inline --model provider/model,thinking=<level>; --thinking <level> and --model-thinking <provider/model=level> remain compatibility shims.
OpenAI candidate refs default to fast mode so priority processing is used where supported. Use inline ,fast, ,no-fast, or ,fast=false for one model; use --fast only to force fast mode for every candidate.
Judges default to openai/gpt-5.4,thinking=xhigh,fast and anthropic/claude-opus-4-6,thinking=high.
Report includes judge ranking, run stats, durations, and full transcripts; do not include raw judge replies. Duration is benchmark context, not a grading signal.
Candidate and judge concurrency default to 16. Use --concurrency <n> and --judge-concurrency <n> to override when local gateways or provider limits need a gentler lane.
Scenario source should stay markdown-driven under qa/scenarios/.
For isolated character/persona evals, write the persona into SOUL.md and blank IDENTITY.md in the scenario flow. Use SOUL.md + IDENTITY.md only when intentionally testing how the normal OpenClaw identity combines with the character.
Keep prompts natural and task-shaped. The candidate model should receive character setup through SOUL.md, then normal user turns such as chat, workspace help, and small file tasks; do not ask "how would you react?" or tell the model it is in an eval.
Prefer at least one real task, such as creating or editing a tiny workspace artifact, so the transcript captures character under normal tool use instead of pure roleplay.
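
The inline model spec syntax described above, `provider/model,thinking=<level>[,fast|,no-fast|,fast=<bool>]`, can be illustrated with a small parser. This is a sketch, not the parser in `extensions/qa-lab/src/character-eval.ts`: `parseModelSpec` and `ModelSpec` are hypothetical names, and accepting only `true` and `false` for `fast=<bool>` is an assumption. Defaults (thinking level, fast mode for OpenAI refs) are deliberately not applied here.

```typescript
// Hypothetical parser for the documented spec syntax; the real implementation may differ.
interface ModelSpec {
  provider: string;
  model: string;
  thinking?: string; // undefined when not given, so the caller can apply defaults
  fast?: boolean; // undefined when not given
}

function parseModelSpec(spec: string): ModelSpec {
  const parts = spec.split(",").map((part) => part.trim());
  const ref = parts[0] ?? "";
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected provider/model, got "${ref}"`);
  }
  const result: ModelSpec = {
    provider: ref.slice(0, slash),
    model: ref.slice(slash + 1),
  };
  for (const option of parts.slice(1)) {
    if (option === "fast" || option === "fast=true") {
      result.fast = true;
    } else if (option === "no-fast" || option === "fast=false") {
      result.fast = false;
    } else if (option.startsWith("thinking=") && option.length > "thinking=".length) {
      result.thinking = option.slice("thinking=".length);
    } else {
      throw new Error(`Unknown model option "${option}" in "${spec}"`);
    }
  }
  return result;
}
```

For example, `parseModelSpec("openai/gpt-5.4,thinking=xhigh,fast")` gives provider `openai`, model `gpt-5.4`, thinking `xhigh`, and `fast` set to `true`, while a spec without a fast option leaves `fast` undefined.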

Codex CLI model lane

Use model refs shaped like codex-cli/<codex-model> whenever QA should exercise Codex as a model backend.

Examples:

pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model codex-cli/<codex-model> \
  --alt-model codex-cli/<codex-model> \
  --scenario <scenario-id> \
  --output-dir .artifacts/qa-e2e/codex-<tag>

pnpm openclaw qa manual \
  --model codex-cli/<codex-model> \
  --message "Reply exactly: CODEX_OK"

Treat the concrete Codex model name as user/config input; do not hardcode it in source, docs examples, or scenarios.
Live QA preserves CODEX_HOME so Codex CLI auth/config works while keeping HOME and OPENCLAW_HOME sandboxed.
Mock QA should scrub CODEX_HOME.
If Codex returns fallback/auth text every turn, first check CODEX_HOME, ~/.profile, and gateway child logs before changing scenario assertions.
For model comparison, include codex-cli/<codex-model> as another candidate in qa character-eval; the report should label it as an opaque model name.
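
The environment rules above (live QA keeps `CODEX_HOME`, mock QA scrubs it, and `HOME` and `OPENCLAW_HOME` are sandboxed in both cases) might look like the following when building a child process environment. `buildChildEnv`, `Sandbox`, and the sandbox directory layout are hypothetical; the actual logic lives in the gateway child harness and may differ.

```typescript
// Hypothetical sketch of the CODEX_HOME handling; not the repo's implementation.
type Env = Record<string, string | undefined>;

interface Sandbox {
  home: string; // sandboxed HOME for the child
  openclawHome: string; // sandboxed OPENCLAW_HOME for the child
}

function buildChildEnv(parent: Env, lane: "live" | "mock", sandbox: Sandbox): Env {
  // Copy first so the parent environment is never mutated.
  const env: Env = {
    ...parent,
    HOME: sandbox.home,
    OPENCLAW_HOME: sandbox.openclawHome,
  };
  if (lane === "mock") {
    // Mock runs should not see real Codex CLI auth or config.
    delete env["CODEX_HOME"];
  }
  return env;
}
```

When Codex returns fallback or auth text on every turn, comparing the environment a function like this produces against what the child actually received is a quick first check.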

Repo facts

Seed scenarios live in qa/.
Main live runner: extensions/qa-lab/src/suite.ts
QA lab server: extensions/qa-lab/src/lab-server.ts
Child gateway harness: extensions/qa-lab/src/gateway-child.ts
Synthetic channel: extensions/qa-channel/

What “done” looks like

Full suite green for the requested lane.
User gets:
- watch URL if applicable
- pass/fail counts
- artifact paths
- concise note on what was fixed
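
The four items in that list can be assembled into one short message. The sketch below is illustrative only: `QaOutcome` and `formatOutcome` are hypothetical names, and the line labels are an arbitrary choice.

```typescript
// Hypothetical formatter for the end-of-run message; field names are illustrative.
interface QaOutcome {
  passed: number;
  failed: number;
  artifactPaths: string[];
  fixNote: string;
  watchPort?: number; // only set when the user asked to watch the live UI
}

function formatOutcome(outcome: QaOutcome): string {
  const lines: string[] = [];
  if (outcome.watchPort !== undefined) {
    lines.push(`Watch: http://127.0.0.1:${outcome.watchPort}`);
  }
  lines.push(`Result: ${outcome.passed} passed, ${outcome.failed} failed`);
  for (const artifactPath of outcome.artifactPaths) {
    lines.push(`Artifact: ${artifactPath}`);
  }
  lines.push(`Fixed: ${outcome.fixNote}`);
  return lines.join("\n");
}
```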

Common failure patterns

Live timeout too short:
- widen live waits in extensions/qa-lab/src/suite.ts
Discovery cannot find repo files:
- point prompts at repo/... inside seeded workspace
Subagent proof too brittle:
- prefer stable final reply evidence over transient child-session listing
Harness “rebuild” delay:
- dirty tree can trigger a pre-run build; expect that before ports appear
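
For the "subagent proof too brittle" pattern, one way to prefer stable final-reply evidence is to assert on the last assistant turn of the transcript rather than on a child-session listing that may already be gone. The transcript shape and the `finalReplyContains` helper below are assumptions made for illustration; the suite's real transcript types are not shown in this document.

```typescript
// Hypothetical transcript shape and check; the suite's real types may differ.
interface TranscriptTurn {
  role: "user" | "assistant";
  text: string;
}

function finalReplyContains(transcript: TranscriptTurn[], marker: string): boolean {
  // Walk backwards to the last assistant turn and check only that reply.
  for (let i = transcript.length - 1; i >= 0; i--) {
    const turn = transcript[i];
    if (turn !== undefined && turn.role === "assistant") {
      return turn.text.indexOf(marker) !== -1;
    }
  }
  return false; // no assistant reply at all
}
```

Because the check depends only on the finished transcript, it gives the same answer whether it runs immediately or after child sessions have been cleaned up.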

When adding scenarios

Add or update scenario markdown under qa/scenarios/
Keep kickoff expectations in qa/scenarios/index.md aligned
Add executable coverage in extensions/qa-lab/src/suite.ts
Prefer end-to-end assertions over mock-only checks
Save outputs under .artifacts/qa-e2e/

Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.

357,588 stars

development

Updated Apr 15, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy: Container and dependency vulnerability scanner. Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): Open source security scanning. Skipped (50%)
- Socket.dev: Supply chain security analysis. Skipped (50%)
- VirusTotal: Multi-engine malware detection. Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check. Skipped (50%)

Last scanned: Apr 15, 2026, 6:22 AM (7.9s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/steipete/clawdis.git

# Copy into the Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r clawdis/.agents/skills/openclaw-qa-testing ~/.claude/skills/

Repository: steipete/clawdis (357,588 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT