Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

pyramidheadshark/.claude/skills/critical-analysis

Name: .claude/skills/critical-analysis
Author: pyramidheadshark

.claude/skills/critical-analysis/SKILL.md

npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/critical-analysis

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Skill: Critical Analysis (v1.1)

Multi-role critique system for ML experiments and architecture decisions. Based on: Solo Performance Prompting (NAACL 2024), CrewAI role taxonomy, Bermingham 13-agent DA-as-gate, De Bono Six Thinking Hats (Black Hat gating).

When to Load (Auto-Default Behavior)

Before any of the following — run Quick Mode without waiting for user request:

Architecture or design decision (new integration, refactor, layer change)
ML experiment launch (clustering, training run, hyperparameter search)
Infrastructure change (Terraform, Docker, secrets, permissions)
Parameter choice without prior ablation

Design doc check: for any architectural decision, verify design-doc.md exists and guides this decision. If missing → [DA]: "No design doc — decisions are ungrounded."

Quick Mode: if all 8 roles return "clear" — proceed. If any role flags CRITICAL — surface it to the user before proceeding.

This is NOT optional. Critique runs before code, not after.

8-Role Taxonomy

| Role | Source | Key Question | |------|--------|--------------| | [Security] Security Sentinel | CrewAI Code Security Auditor | Injection, auth, secrets, attack surface — what can be exploited? | | [Perf] Performance Analyst | CrewAI Performance Optimizer | O(n²), wasted iterations, memory — where is 80% of cost wasted? | | [DA] Devil's Advocate | Bermingham 13-agent + Black Hat | What is the strongest argument AGAINST this? What breaks in 90 days? | | [Crutch] Crutch Identifier | Project history: defectoscopy + hub | Is this a reusable pattern or a workaround being institutionalized? | | [Strategy] Strategic Horizon | DX/tech debt, 6-month view | New external dep / >3 files / new interface contract? Flag only if yes. | | [ML] ML Experiment Auditor | arxiv 2603.15916, project history | Are we at a plateau? Is this experiment worth running at all? | | [TestCov] Testing Coverage | Qodo/CodeRabbit/Anthropic tool | Does this change need new tests? Are edge cases covered? | | [Obs] Observability Enforcer | Google SRE, OpenTelemetry practices | Can we debug this in prod? Are logs/metrics/traces in place? |

Quick Mode Protocol (SPP — Solo Performance Prompting)

Run all 8 roles simultaneously in your reasoning. Output before proceeding:

QUICK CRITIQUE:
[Security]:  <finding + what was evaluated, or "clear — evaluated: X">
[Perf]:      <finding + what was evaluated, or "clear — evaluated: X">
[DA]:        <top objection with 90-day scenario, or "no blockers — reason">
[Crutch]:    <pattern or crutch + evidence, or "pattern — reusable: evidence">
[Strategy]:  <flag only if new dep/3+ files/new contract; else "below threshold">
[ML]:        <plateau/pivot signal, or "N/A — not an experiment">
[TestCov]:   <missing test areas, or "clear — existing tests cover: X">
[Obs]:       <observability gap, or "clear — logs/metrics in place">

VERDICT: PROCEED / BLOCKED by [Role] — [reason]

Anti-collapse rule: Before writing "clear" for any role — cite what was evaluated. WRONG: [Security]: clear RIGHT: [Security]: clear — evaluated: no new endpoints, no user input, no new deps

[Strategy] falsifiability threshold: Flag ONLY if ≥1 of:

Introduces new external dependency
Touches >3 files
Creates new interface contract
Estimated follow-up work >5 person-days Otherwise: below threshold — not flagging

If BLOCKED: surface the finding to the user. Do not write code around a CRITICAL finding.

ML Experiment Audit Protocol

Before any experiment launch, answer all 4 checks:

PLATEAU CHECK Plateau = absolute delta < 0.01 AND relative delta < 5% (both conditions). Example: metric=0.65, new=0.659 → 0.009 abs, 1.4% rel → PLATEAU Example: metric=0.65, new=0.670 → 0.020 abs, 3.1% rel → NOT plateau

PLATEAU → STOP. "Ceiling at [metric]. Pivot to: [2 alternatives from different algorithm families]."
NOT plateau → continue, document delta

HYPOTHESIS QUALITY

Falsifiable? What result disproves it?
Baseline exists? (If NO → block tuning. Run baseline first.)
Success threshold defined? ("any improvement" is not a threshold)

OBSERVABILITY SLA

ETA > 30 min? Set CronCreate check at ETA / 2.
Hung condition: CPU < 5% for 15 min OR no log output for 30 min
Artifact save: checkpoint, metrics JSON, sample predictions, manifest entry
Kill-switch defined?

MACRO PRIORITY

Is this module a bottleneck? (Show profiling data or reasoning.)
Will improvement here move the overall system metric?
What else could run in parallel?

Pre-flight template (required before any experiment > 30 min):

Hypothesis: [specific falsifiable claim]
Baseline:   [prior result, or "NONE — run baseline first"]
Success:    [metric >= X, or cost <= Y]
Pilot:      [5% subset, ~5 min — validates I/O before full run]
SLA:        [check interval, hung condition, artifact path]

Crutch vs. Pattern Rubric

Crutch indicators:

Hardcoded value without named constant or config
"temporary" comment older than 1 sprint
Logic duplicated in 2+ places without abstraction
TODO/FIXME with no owner or date
Architecture built for current single user, not contract

Pattern indicators (safe):

Reused across 3+ contexts with same interface
Decision documented with explicit rationale
Testable contract exists
Another team could use it without asking questions

Deep Mode (trigger: `[CRITIQUE]` in user prompt)

Launch 8 subagents via Agent tool in parallel — one per role. Each receives: context, the proposed decision, role system prompt from resources/role-prompts.md.

Synthesize via D3:

Debate: all 8 findings, unfiltered
Deliberate: classify each as CRITICAL / HIGH / LOW
Decide: recommendation + explicit acknowledgment of each CRITICAL

Patterns from Project History (Grounded in Real Failures)

From defectoscopy (ML pipeline):

K=3000, 6h, no timer, no pilot, no baseline → [ML] + [Perf] + [DA] would block
Hexagonal arch scaffolded before any real code → [DA] + [Crutch] + [Perf] would block
Clustering optimized when clustering was not the bottleneck → [ML] + [Perf] + [DA]

From techcon_hub (infrastructure):

GitHub PAT tokens in chat (3 incidents) → [Security] CRITICAL
terraform apply -auto-approve destroyed controller node → [Security] + [DA]
Secrets in CI only — developers locked out of local dev → [Security] + [Strategy]

Resource Index

Load only when explicitly needed (not auto-injected):

resources/role-prompts.md — Full SPP prompts for all 8 roles (Deep Mode)
resources/ml-audit-protocol.md — Plateau detection, pivot logic, experiment manifest
resources/failure-patterns.md — Annotated failure catalog (defectoscopy + hub)

pyramidheadshark/.claude/skills/critical-analysis

.claude/skills/critical-analysis/SKILL.md

## Skill: Critical Analysis (v1.1) Multi-role critique system for ML experiments and architecture decisions. Based on: Solo Performance Prompting (NAACL 2024), CrewAI role taxonomy, Bermingham 13-agent DA-as-gate, De Bono Six Thinking Hats (Black Hat gating). --- ### When to Load (Auto-Default Behavior) **Before any of the following — run Quick Mode without waiting for user request:** - Architecture or design decision (new integration, refactor, layer change) - ML experiment launch (clusteri

4 stars

development

Updated Apr 15, 2026

$ install --global

skillsauth

npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/critical-analysis

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 20, 2026, 12:15 PM143.8s1 file scanned

SKILL.md

Skill: Critical Analysis (v1.1)

When to Load (Auto-Default Behavior)

Before any of the following — run Quick Mode without waiting for user request:

Architecture or design decision (new integration, refactor, layer change)
ML experiment launch (clustering, training run, hyperparameter search)
Infrastructure change (Terraform, Docker, secrets, permissions)
Parameter choice without prior ablation

Design doc check: for any architectural decision, verify design-doc.md exists and guides this decision. If missing → [DA]: "No design doc — decisions are ungrounded."

Quick Mode: if all 8 roles return "clear" — proceed. If any role flags CRITICAL — surface it to the user before proceeding.

This is NOT optional. Critique runs before code, not after.

8-Role Taxonomy

Quick Mode Protocol (SPP — Solo Performance Prompting)

Run all 8 roles simultaneously in your reasoning. Output before proceeding:

QUICK CRITIQUE:
[Security]:  <finding + what was evaluated, or "clear — evaluated: X">
[Perf]:      <finding + what was evaluated, or "clear — evaluated: X">
[DA]:        <top objection with 90-day scenario, or "no blockers — reason">
[Crutch]:    <pattern or crutch + evidence, or "pattern — reusable: evidence">
[Strategy]:  <flag only if new dep/3+ files/new contract; else "below threshold">
[ML]:        <plateau/pivot signal, or "N/A — not an experiment">
[TestCov]:   <missing test areas, or "clear — existing tests cover: X">
[Obs]:       <observability gap, or "clear — logs/metrics in place">

VERDICT: PROCEED / BLOCKED by [Role] — [reason]

[Strategy] falsifiability threshold: Flag ONLY if ≥1 of:

Introduces new external dependency
Touches >3 files
Creates new interface contract
Estimated follow-up work >5 person-days Otherwise: below threshold — not flagging

If BLOCKED: surface the finding to the user. Do not write code around a CRITICAL finding.

ML Experiment Audit Protocol

Before any experiment launch, answer all 4 checks:

PLATEAU → STOP. "Ceiling at [metric]. Pivot to: [2 alternatives from different algorithm families]."
NOT plateau → continue, document delta

HYPOTHESIS QUALITY

Falsifiable? What result disproves it?
Baseline exists? (If NO → block tuning. Run baseline first.)
Success threshold defined? ("any improvement" is not a threshold)

OBSERVABILITY SLA

ETA > 30 min? Set CronCreate check at ETA / 2.
Hung condition: CPU < 5% for 15 min OR no log output for 30 min
Artifact save: checkpoint, metrics JSON, sample predictions, manifest entry
Kill-switch defined?

MACRO PRIORITY

Is this module a bottleneck? (Show profiling data or reasoning.)
Will improvement here move the overall system metric?
What else could run in parallel?

Pre-flight template (required before any experiment > 30 min):

Hypothesis: [specific falsifiable claim]
Baseline:   [prior result, or "NONE — run baseline first"]
Success:    [metric >= X, or cost <= Y]
Pilot:      [5% subset, ~5 min — validates I/O before full run]
SLA:        [check interval, hung condition, artifact path]

Crutch vs. Pattern Rubric

Crutch indicators:

Hardcoded value without named constant or config
"temporary" comment older than 1 sprint
Logic duplicated in 2+ places without abstraction
TODO/FIXME with no owner or date
Architecture built for current single user, not contract

Pattern indicators (safe):

Reused across 3+ contexts with same interface
Decision documented with explicit rationale
Testable contract exists
Another team could use it without asking questions

Deep Mode (trigger: `[CRITIQUE]` in user prompt)

Launch 8 subagents via Agent tool in parallel — one per role. Each receives: context, the proposed decision, role system prompt from resources/role-prompts.md.

Synthesize via D3:

Debate: all 8 findings, unfiltered
Deliberate: classify each as CRITICAL / HIGH / LOW
Decide: recommendation + explicit acknowledgment of each CRITICAL

Patterns from Project History (Grounded in Real Failures)

From defectoscopy (ML pipeline):

K=3000, 6h, no timer, no pilot, no baseline → [ML] + [Perf] + [DA] would block
Hexagonal arch scaffolded before any real code → [DA] + [Crutch] + [Perf] would block
Clustering optimized when clustering was not the bottleneck → [ML] + [Perf] + [DA]

From techcon_hub (infrastructure):

GitHub PAT tokens in chat (3 incidents) → [Security] CRITICAL
terraform apply -auto-approve destroyed controller node → [Security] + [DA]
Secrets in CI only — developers locked out of local dev → [Security] + [Strategy]

Resource Index

Load only when explicitly needed (not auto-injected):

resources/role-prompts.md — Full SPP prompts for all 8 roles (Deep Mode)
resources/ml-audit-protocol.md — Plateau detection, pivot logic, experiment manifest
resources/failure-patterns.md — Annotated failure catalog (defectoscopy + hub)

Related Skills

pyramidheadshark/tests/fixtures/project-with-status/.claude/skills/design-doc-creator

testing

VerifiedTrustedCommunity

# Design Doc Creator ## When to Load This Skill Load when: design documents, requirements, new project start. Short fixture skill for testing (optional/meta skill).

4SKILL.mdUpdated Apr 17, 2026

pyramidheadshark/tests/fixtures/project-with-status/.claude/skills/design-doc-creator

pyramidheadshark/.claude/skills/windows-developer

development

VerifiedTrustedCommunity

# Windows Developer Guide ## When to Load Automatically loaded on Windows (`platform_trigger: "win32"`). Applies to: `.py`, `.ps1`, `.bat`, `.cmd` files and any Windows-specific workflow. ## Python on Windows ### Encoding (CRITICAL) Windows defaults to `cp1251` / `cp1252` for file I/O. Always specify UTF-8 explicitly: ```python with open("file.txt", "r", encoding="utf-8") as f: content = f.read() Path("file.txt").read_text(encoding="utf-8") Path("file.txt").write_text(content, encodin

4SKILL.mdUpdated Apr 15, 2026

pyramidheadshark/.claude/skills/windows-developer

pyramidheadshark/.claude/skills/test-first-patterns

development

VerifiedTrustedCommunity

# Test-First Patterns ## When to Load This Skill Load when writing tests, creating `.feature` files, setting up conftest, discussing test strategy, or reviewing coverage. ## Philosophy Tests are written BEFORE code. Always. No exceptions. The order is: Design Doc → BDD Scenarios → Unit Tests → Implementation. BDD scenarios come from the design document's use cases section — they are a direct translation of business requirements into executable specifications. This makes tests the living do

4SKILL.mdUpdated Apr 15, 2026

pyramidheadshark/.claude/skills/test-first-patterns

pyramidheadshark/.claude/skills/supply-chain-auditor

testing

VerifiedTrustedCommunity

# Skill: Supply Chain Auditor ## When to Load Auto-load when: adding dependencies, reviewing packages, updating versions, or discussing `requirements.txt`, `pyproject.toml`, `package.json`. Triggers on `dependency`, `install`, `package`, `CVE`, `audit`, `vulnerable` (≥2 keywords). ## Core Rules Every new dependency addition must pass this checklist before merging: 1. **Pinned** — exact version in production (`==1.2.3` for pip, `"1.2.3"` for npm, not `^` or `~`). 2. **Maintained** — last com

4SKILL.mdUpdated Apr 15, 2026

pyramidheadshark/.claude/skills/supply-chain-auditor

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/pyramidheadshark/ml-claude-infra.git

# Copy into Claude Code skills folder (global)
cp -r ml-claude-infra/.claude/skills/critical-analysis ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

pyramidheadshark/ml-claude-infra

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

pyramidheadshark/.claude/skills/critical-analysis

$ install --global

Security Scan Results

SKILL.md

Skill: Critical Analysis (v1.1)

When to Load (Auto-Default Behavior)

8-Role Taxonomy

Quick Mode Protocol (SPP — Solo Performance Prompting)

ML Experiment Audit Protocol

Crutch vs. Pattern Rubric

Deep Mode (trigger: [CRITIQUE] in user prompt)

Patterns from Project History (Grounded in Real Failures)

Resource Index

Related Skills

pyramidheadshark/tests/fixtures/project-with-status/.claude/skills/design-doc-creator

pyramidheadshark/.claude/skills/windows-developer

pyramidheadshark/.claude/skills/test-first-patterns

pyramidheadshark/.claude/skills/supply-chain-auditor

pyramidheadshark/.claude/skills/critical-analysis

$ install --global

Security Scan Results

SKILL.md

Skill: Critical Analysis (v1.1)

When to Load (Auto-Default Behavior)

8-Role Taxonomy

Quick Mode Protocol (SPP — Solo Performance Prompting)

ML Experiment Audit Protocol

Crutch vs. Pattern Rubric

Deep Mode (trigger: [CRITIQUE] in user prompt)

Patterns from Project History (Grounded in Real Failures)

Resource Index

Related Skills

pyramidheadshark/tests/fixtures/project-with-status/.claude/skills/design-doc-creator

pyramidheadshark/.claude/skills/windows-developer

pyramidheadshark/.claude/skills/test-first-patterns

pyramidheadshark/.claude/skills/supply-chain-auditor

Deep Mode (trigger: `[CRITIQUE]` in user prompt)

Deep Mode (trigger: `[CRITIQUE]` in user prompt)