jc1122/perf-benchmark

Name: perf-benchmark
Author: jc1122

/SKILL.md

npx skillsauth add jc1122/perf-benchmark-skill perf-benchmark

Performance Benchmark Pipeline

Overview

Run a deterministic Linux performance diagnosis. The pipeline profiles an explicit benchmark target or binary, scores a 7-dimension rubric, writes raw artifacts, and can emit shared-schema PERF findings plus an append-only trend ledger.

Use this skill when you need evidence for algorithmic scaling, wall-time stability, CPU efficiency, cache behavior, branch prediction, memory profile, or ASM-level review.

Command

python scripts/perf_benchmark_pipeline.py \
  --root /path/to/repo \
  --out-dir /tmp/perf-bench \
  --target "python -m benchmark_entrypoint {SIZE}" \
  --sizes 1000,4000,16000 \
  --tier fast \
  --max-cv 5.0 \
  --findings-out /tmp/perf-findings.json \
  --baseline-ledger /tmp/perf-ledger.jsonl

For standalone binaries, pass --binary ./program instead of --target. Non-pytest repos should always pass --target or --binary explicitly; pytest benchmark autodiscovery is only a convenience for Python repos. An explicit target that runs at multiple sizes must include the {SIZE} placeholder.
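As a rough illustration of the {SIZE} rule, the sketch below expands a command template once per size. The helper name expand_target is hypothetical and this is not the pipeline's actual implementation; it only shows why a multi-size target without the placeholder cannot vary its input.

```python
import shlex


def expand_target(template: str, sizes: list[int]) -> list[list[str]]:
    """Return one argv list per size by substituting the {SIZE} placeholder.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    if len(sizes) > 1 and "{SIZE}" not in template:
        raise ValueError("multi-size targets must include {SIZE}")
    return [shlex.split(template.replace("{SIZE}", str(n))) for n in sizes]


commands = expand_target("python -m benchmark_entrypoint {SIZE}", [1000, 4000, 16000])
for argv in commands:
    print(argv)
```

Each printed list is an argv suitable for subprocess.run, one per entry in --sizes.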

Key Flags

--root: repository root under analysis.
--out-dir: output directory for reports and raw profiler artifacts.
--target: command template to benchmark; include {SIZE} with --sizes.
--binary: standalone binary entrypoint.
--source-prefix: project source filter; repeat for multiple prefixes.
--tier: fast, medium, deep, or asm.
--sizes: comma-separated input sizes for scaling checks.
--expected-complexity: expected growth class for scaling scoring.
--max-cv: timing-noise gate; noisy timing dimensions become N/A (noise).
--baseline: point-in-time benchmark_summary.json comparison.
--baseline-ledger: append JSONL history and compare vs last/best entries.
--findings-out: shared-schema PERF findings for FAIL/WARN dimensions.
--perf-record: opt-in native sampled hotspots in deep/asm runs.
--asm-audit: include objdump/Numba ASM checks in asm tier.
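To make the --max-cv gate concrete, here is a minimal sketch of a coefficient-of-variation check. It assumes the threshold is a percentage (as the 5.0 in the example command suggests) and that CV means sample standard deviation divided by the mean; both are assumptions, and the function names are hypothetical rather than taken from the pipeline.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(samples)
    return 100.0 * statistics.stdev(samples) / mean


def timing_verdict(samples: list[float], max_cv: float = 5.0) -> str:
    """Label a timing series as usable or too noisy (illustrative labels)."""
    cv = coefficient_of_variation(samples)
    return "N/A (noise)" if cv > max_cv else "ok"


stable = [1.00, 1.01, 0.99, 1.00, 1.02]  # synthetic timings in seconds
noisy = [1.00, 1.40, 0.80, 1.25, 0.95]   # synthetic timings in seconds
print(timing_verdict(stable), "|", timing_verdict(noisy))
```

The stable series has a CV of roughly 1% and passes the gate, while the noisy series is around 22% and would be reported as N/A (noise) under these assumptions.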

Tiers

fast: pytest-benchmark or direct timing, tracemalloc, GNU time.
medium: fast plus cachegrind and callgrind.
deep: medium plus massif, perf stat, and optional --perf-record.
asm: deep plus objdump and optional Numba ASM inspection.
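Because each tier is defined as the previous tier plus more tools, the full tool set for a tier can be derived cumulatively. The sketch below models that with a plain dictionary; the names TIER_ADDITIONS and tools_for_tier are hypothetical, and the optional extras (--perf-record, Numba ASM inspection) are left out.

```python
# Tools each tier adds on top of the previous one, as listed above.
TIER_ADDITIONS = {
    "fast": ["pytest-benchmark or direct timing", "tracemalloc", "GNU time"],
    "medium": ["cachegrind", "callgrind"],
    "deep": ["massif", "perf stat"],
    "asm": ["objdump"],
}
TIER_ORDER = ["fast", "medium", "deep", "asm"]


def tools_for_tier(tier: str) -> list[str]:
    """Return every tool a tier runs, including those inherited from lower tiers."""
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier}")
    tools: list[str] = []
    for name in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        tools.extend(TIER_ADDITIONS[name])
    return tools


print(tools_for_tier("deep"))
```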

Outputs

benchmark_report.md: human report with scorecard and prescriptions.
benchmark_summary.json: machine summary and regression comparison data.
perf_findings.json: PERF findings when --findings-out is set.
baseline_ledger.jsonl: trend history when --baseline-ledger is set.
tier1/ through tier4/: raw profiler artifacts by tier.
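The ledger is described as append-only JSONL that is compared against the last and best entries. The sketch below shows that general pattern only. The field name wall_time_s and both helper functions are hypothetical, and the real ledger schema is not documented here.

```python
import json
import os
import tempfile


def append_entry(ledger_path: str, entry: dict) -> None:
    """Append one run as a single JSON line; existing lines are never rewritten."""
    with open(ledger_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")


def last_and_best(ledger_path: str, metric: str) -> tuple[dict, dict]:
    """Return the most recent entry and the entry with the lowest metric value."""
    with open(ledger_path, encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    if not entries:
        raise ValueError("ledger is empty")
    return entries[-1], min(entries, key=lambda e: e[metric])


ledger = os.path.join(tempfile.mkdtemp(), "ledger.jsonl")
for wall in (1.20, 0.95, 1.10):  # synthetic wall times in seconds
    append_entry(ledger, {"wall_time_s": wall})  # hypothetical field name

last, best = last_and_best(ledger, "wall_time_s")
print("last:", last["wall_time_s"], "best:", best["wall_time_s"])
```

Opening the file in append mode keeps earlier history intact, which is what makes trend comparison across runs possible.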

Interpretation

The rubric scores seven dimensions from 0 to 4: Algorithmic Scaling, Wall-Time Stability, CPU Efficiency, L1 Cache, Last-Level Cache, Branch Prediction, and Memory Profile. If Algorithmic Scaling fails, fix asymptotic behavior before attempting cache, branch, or ASM tuning. Full scaling evidence requires multi-size runs, and full Algorithmic Scaling scoring also requires the deep or asm tier because its allocation-churn input comes from massif.
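One common way to turn multi-size timings into scaling evidence is to fit the slope of log(time) against log(size), which approximates the growth exponent. The sketch below does that with synthetic data; it is an illustration of the idea, not the rubric's actual scoring method, whose thresholds live in references/rubric.md.

```python
import math


def growth_exponent(sizes: list[int], times: list[float]) -> float:
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


sizes = [1000, 4000, 16000]
quadratic_times = [2e-6 * n**2 for n in sizes]  # synthetic data, not real measurements
linear_times = [3e-6 * n for n in sizes]        # synthetic data, not real measurements

print(round(growth_exponent(sizes, quadratic_times), 2))
print(round(growth_exponent(sizes, linear_times), 2))
```

A slope near 2 suggests quadratic growth and a slope near 1 suggests linear growth. With only a single size there is no slope to fit, which is why multi-size runs are needed.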

Regression example:

python scripts/perf_benchmark_pipeline.py \
  --root . --out-dir /tmp/bench --sizes 1000,4000 \
  --target "./path/to/benchmark {SIZE}" --baseline /path/to/previous/benchmark_summary.json

Agent Parallelism

Keep Tier 1 isolated, with no other work running alongside it, because timing and tracemalloc measurements are noise-sensitive. If you split work across subagents, do it after the pipeline finishes, with one subagent per artifact or per rubric dimension.

References

references/rubric.md: thresholds and scoring details.
references/tool-guide.md: profiler selection and limitations.
references/perf-remediation-playbook.md: measure/change/re-measure rules.
references/question-bank.md: advisory diagnosis prompts.
references/finding-schema.json: PERF finding schema.
references/sample-report.md: compact example report.

Limits

Linux only; /proc, /sys, Valgrind, and perf availability shape depth.
Valgrind slows runs heavily and models L1 plus last-level cache only.
perf stat and --perf-record require permissive perf_event_paranoid.
tracemalloc sees Python allocations, not all native/C extension memory.
Noisy timing is refused by --max-cv instead of being over-interpreted.
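Since tool availability and perf_event_paranoid determine how deep a run can go, a quick preflight check can save a failed run. The sketch below is a hypothetical helper, not part of the skill. It relies only on the standard Linux sysctl file /proc/sys/kernel/perf_event_paranoid, where lower values are more permissive, and it does not assume any particular threshold.

```python
import shutil
from pathlib import Path
from typing import Optional

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_perf_paranoid(path: str = PARANOID_PATH) -> Optional[int]:
    """Return the kernel's perf_event_paranoid level, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def preflight() -> dict:
    """Report which profilers are on PATH and the current paranoid level."""
    return {
        "valgrind": shutil.which("valgrind") is not None,
        "perf": shutil.which("perf") is not None,
        "objdump": shutil.which("objdump") is not None,
        "perf_event_paranoid": read_perf_paranoid(),
    }


print(preflight())
```

On a non-Linux host the paranoid level comes back as None, which matches the Linux-only limit above.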

Use when profiling Linux Python or C workloads for algorithmic scaling, cache, branch, memory, or ASM bottlenecks, or when comparing a benchmark run against a saved performance baseline.

1 star

development

Updated Jun 16, 2026

Install this skill globally with one command:

npx skillsauth add jc1122/perf-benchmark-skill perf-benchmark

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Clean (95%). Container and dependency vulnerability scanner.
Semgrep: Clean (95%). Static code analysis for vulnerabilities.
mcp-scan (Snyk): Clean (95%). Model Context Protocol security validation.
Snyk (dep): Skipped (50%). Open source security scanning.
Socket.dev: Skipped (50%). Supply chain security analysis.
VirusTotal: Skipped (50%). Multi-engine malware detection.
CrowdStrike: Skipped (50%). Advanced threat intelligence.
OSV-Scanner: Skipped (50%). Open Source Vulnerability database check.
OWASP Dep-Check: Skipped (50%).

Last scanned: Jun 16, 2026, 8:18 AM (122.4s, 53 files scanned)

SKILL.md

name: perf-benchmark
version: 0.6.1
description: >

Related Skills

jc1122/perf-optimization
tools
Verified, Trusted, Community
Evidence-driven performance optimization that consumes perf-benchmark findings, selects one bounded candidate, re-measures with identical profiling, and records accepted wins or honest no-win outcomes.
1 | SKILL.md | Updated Jun 11, 2026

openclaw/openclaw-secret-scanning-maintainer
development
Verified, Trusted, Community
Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.
357,764 | SKILL.md | Updated Apr 15, 2026

openclaw/openclaw-release-maintainer
development
Verified, Trusted, Community
Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.
357,764 | SKILL.md | Updated Apr 10, 2026

openclaw/openclaw-qa-testing
development
Verified, Trusted, Community
Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.
357,764 | SKILL.md | Updated Apr 10, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually

# Clone the repo
git clone https://github.com/jc1122/perf-benchmark-skill.git

# Copy into the Claude Code skills folder (global), creating it first if missing
mkdir -p ~/.claude/skills
cp -r perf-benchmark-skill ~/.claude/skills/


Repository

jc1122/perf-benchmark-skill

1 star

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT