Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

ahrav/design-tournament

Name: design-tournament
Author: ahrav

.claude/skills/design-tournament/SKILL.md

npx skillsauth add ahrav/gossip-rs design-tournament

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Design Tournament

A two-phase "wisdom of crowds" pattern for design work. Multiple independent agents explore the same problem in isolation, then independent rankers evaluate and merge the findings into a confidence-weighted stack ranking.

When to Use

Designing a new feature, data structure, algorithm, or API
Exploring multiple valid architectural approaches
Story/task breakdown where the "right" decomposition isn't obvious
Any design decision where independent perspectives reduce blind spots

Invocation

/design-tournament <problem statement>

The <problem statement> is the full description of what needs to be designed. If no argument is given, ask the user for the problem statement before proceeding.

Phase 1 — Diverge (Parallel Exploration)

Launch 5 independent agents in parallel using the Task tool with subagent_type=general-purpose. Every agent receives identical instructions so that their outputs are independently generated with zero cross-contamination.

All 5 agents MUST be launched in a single message (one message, five Task tool calls) so they run concurrently.

Agent Prompt Template (identical for all 5)

Each agent receives this prompt, with {PROBLEM} replaced by the user's problem statement and {AGENT_ID} set to a unique label (Alpha, Beta, Gamma, Delta, Epsilon):

You are Design Agent {AGENT_ID} in an independent design tournament.

## Problem
{PROBLEM}

## Your Task

Produce a COMPLETE, standalone design proposal. You are working independently —
do not assume anyone else's output exists.

### Required Sections

1. **Problem Restatement** — Restate the problem in your own words to confirm
   understanding. Call out any ambiguities or assumptions you are making.

2. **Design Proposal** — Your recommended design. Include:
   - Core idea / approach (1-2 sentence summary)
   - Detailed description with concrete types, signatures, or pseudocode
   - Key invariants or contracts the design enforces
   - How the design handles edge cases and failure modes

3. **Alternatives Considered** — At least 2 alternative approaches you
   evaluated and why you rejected them.

4. **Trade-offs** — Explicit trade-off analysis:
   - What does this design optimize for?
   - What does it sacrifice?
   - Under what conditions would a different design be better?

5. **Risks & Open Questions** — What could go wrong? What remains uncertain?

6. **Self-Assessment** — Rate your own confidence (0-100) in this design and
   explain why.

### Rules

- Be specific and concrete — pseudocode or type signatures over hand-waving.
- Do NOT hedge with "it depends" — commit to a design and defend it.
- Explore the codebase if you need context (use Glob, Grep, Read).
- Aim for depth over breadth. One well-defended design beats three sketches.

### Output Format

Return your response as a single markdown document with the sections above.
Start with: `# Design Proposal — Agent {AGENT_ID}`

Collecting Results

After all 5 agents complete, gather their outputs. If any agent fails or times out, proceed with the agents that succeeded (minimum 3 required).

Phase 2 — Converge (Parallel Ranking)

Launch 2 independent ranking agents in parallel using the Task tool with subagent_type=general-purpose. Both receive identical instructions and the collected Phase 1 outputs.

Both rankers MUST be launched in a single message (one message, two Task tool calls) so they run concurrently.

Ranker Prompt Template (identical for both)

You are Ranking Judge {JUDGE_ID} in a design tournament.

## Original Problem
{PROBLEM}

## Design Proposals

Below are {N} independently generated design proposals for the same problem.
Evaluate them WITHOUT knowing which agent produced each one (they are labeled
Alpha through Epsilon for reference only).

{PROPOSALS}

## Your Task

### 1. Evaluation Criteria

Score each proposal on these dimensions (1-10 scale):

| Criterion | Weight | Description |
|-----------|--------|-------------|
| **Correctness** | 25% | Does the design actually solve the problem? Are invariants sound? |
| **Simplicity** | 20% | Is this the simplest design that could work? Minimal moving parts? |
| **Robustness** | 20% | How well does it handle edge cases, failures, and adversarial inputs? |
| **Extensibility** | 15% | Can the design accommodate likely future requirements without rework? |
| **Specificity** | 10% | How concrete and implementable is the proposal? |
| **Trade-off Awareness** | 10% | Does the author understand what they're sacrificing? |

### 2. Per-Proposal Evaluation

For EACH proposal, provide:
- Scores on each criterion
- Weighted total score
- Top strength (1 sentence)
- Top weakness (1 sentence)
- Key insight unique to this proposal (if any)

### 3. Stack Ranking

Produce a final ranking from best to worst. For each position:
- Rank position (1 = best)
- Agent label
- Weighted total score
- Confidence in this ranking position (0-100):
  - 90-100: Clear winner/loser, large score gap
  - 70-89: Solid ranking, meaningful score gap
  - 50-69: Close call, could reasonably swap with adjacent rank
  - Below 50: Essentially tied, ranking is arbitrary

### 4. Synthesis Recommendation

After ranking, provide:
- **Recommended design**: Which proposal to use as the starting point
- **Cherry-pick list**: Specific ideas from OTHER proposals that should be
  incorporated into the winner
- **Composite confidence** (0-100): Your overall confidence that the #1
  ranked proposal (with cherry-picks) is the right design

### Rules

- Judge designs on MERIT, not on writing quality or verbosity.
- If two designs are substantively identical, say so — don't force a
  false distinction.
- Be willing to rank a simple, correct design above an elaborate one.
- Explore the codebase if you need context to evaluate feasibility.

### Output Format

Return your response as a single markdown document.
Start with: `# Ranking Report — Judge {JUDGE_ID}`

Merging Rankings

After both rankers complete, merge their results:

For each proposal, average the weighted scores from both judges.
For each rank position, compute:
- Average score
- Agreement: did both judges assign the same rank? (yes/no)
- Combined confidence: average of both judges' per-position confidence, reduced by 15 points if they disagree on rank position
Final stack ranking by averaged weighted score (descending).
Synthesize cherry-pick lists from both judges into a unified list.

Final Output Format

Present the merged result to the user as:

## Design Tournament Results

### Problem
{one-line restatement}

### Stack Ranking

| Rank | Agent | Avg Score | Judge Agreement | Confidence | Key Strength |
|------|-------|-----------|-----------------|------------|--------------|
| 1    | ...   | 8.2/10    | Yes             | 92%        | ...          |
| 2    | ...   | 7.5/10    | Yes             | 78%        | ...          |
| 3    | ...   | 6.8/10    | No (J1:#2,J2:#4)| 54%       | ...          |
| ...  | ...   | ...       | ...             | ...        | ...          |

### Recommended Design

**Base**: Agent {winner}'s proposal
**Cherry-picks from other proposals**:
- From Agent X: {specific idea}
- From Agent Y: {specific idea}

**Composite Confidence**: {average of both judges' composite}%

### Winning Proposal Summary
{2-3 paragraph summary of the #1 design with cherry-picks integrated}

### Full Proposals (collapsed)
<details><summary>Agent Alpha — Score: X.X</summary>
{full proposal text}
</details>
{repeat for each agent}

### Ranking Details (collapsed)
<details><summary>Judge 1 Report</summary>
{full ranking report}
</details>
<details><summary>Judge 2 Report</summary>
{full ranking report}
</details>

Configuration

The default is 5 explorers + 2 rankers (7 total agents). If the user specifies a different count (e.g., /design-tournament 3 agents: ...), respect it but enforce these minimums:

Phase 1: at least 3 explorer agents
Phase 2: always exactly 2 ranking agents

Tips

For large or ambiguous problems, include relevant file paths or context in the problem statement so explorer agents can read the codebase.
If the problem involves existing code, tell agents which files to examine.
The skill works for any design task: features, data structures, algorithms, API shapes, system architecture, task decomposition, migration plans.

ahrav/design-tournament

.claude/skills/design-tournament/SKILL.md

Use when facing a design decision with multiple viable approaches, when you want competing proposals evaluated objectively, or when brainstorming needs structured diverge-then-converge evaluation. 3-5 independent design agents plus ranked synthesis.

1 stars

testing

Updated Apr 10, 2026

$ install --global

skillsauth

npx skillsauth add ahrav/gossip-rs design-tournament

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 10, 2026, 3:11 AM4.2s1 file scanned

SKILL.md

name:: design-tournament
description:: Use when facing a design decision with multiple viable approaches, when you want competing proposals evaluated objectively, or when brainstorming needs structured diverge-then-converge evaluation. 3-5 independent design agents plus ranked synthesis.

Design Tournament

When to Use

Designing a new feature, data structure, algorithm, or API
Exploring multiple valid architectural approaches
Story/task breakdown where the "right" decomposition isn't obvious
Any design decision where independent perspectives reduce blind spots

Invocation

/design-tournament <problem statement>

The <problem statement> is the full description of what needs to be designed. If no argument is given, ask the user for the problem statement before proceeding.

Phase 1 — Diverge (Parallel Exploration)

All 5 agents MUST be launched in a single message (one message, five Task tool calls) so they run concurrently.

Agent Prompt Template (identical for all 5)

Each agent receives this prompt, with {PROBLEM} replaced by the user's problem statement and {AGENT_ID} set to a unique label (Alpha, Beta, Gamma, Delta, Epsilon):

You are Design Agent {AGENT_ID} in an independent design tournament.

## Problem
{PROBLEM}

## Your Task

Produce a COMPLETE, standalone design proposal. You are working independently —
do not assume anyone else's output exists.

### Required Sections

1. **Problem Restatement** — Restate the problem in your own words to confirm
   understanding. Call out any ambiguities or assumptions you are making.

2. **Design Proposal** — Your recommended design. Include:
   - Core idea / approach (1-2 sentence summary)
   - Detailed description with concrete types, signatures, or pseudocode
   - Key invariants or contracts the design enforces
   - How the design handles edge cases and failure modes

3. **Alternatives Considered** — At least 2 alternative approaches you
   evaluated and why you rejected them.

4. **Trade-offs** — Explicit trade-off analysis:
   - What does this design optimize for?
   - What does it sacrifice?
   - Under what conditions would a different design be better?

5. **Risks & Open Questions** — What could go wrong? What remains uncertain?

6. **Self-Assessment** — Rate your own confidence (0-100) in this design and
   explain why.

### Rules

- Be specific and concrete — pseudocode or type signatures over hand-waving.
- Do NOT hedge with "it depends" — commit to a design and defend it.
- Explore the codebase if you need context (use Glob, Grep, Read).
- Aim for depth over breadth. One well-defended design beats three sketches.

### Output Format

Return your response as a single markdown document with the sections above.
Start with: `# Design Proposal — Agent {AGENT_ID}`

Collecting Results

After all 5 agents complete, gather their outputs. If any agent fails or times out, proceed with the agents that succeeded (minimum 3 required).

Phase 2 — Converge (Parallel Ranking)

Launch 2 independent ranking agents in parallel using the Task tool with subagent_type=general-purpose. Both receive identical instructions and the collected Phase 1 outputs.

Both rankers MUST be launched in a single message (one message, two Task tool calls) so they run concurrently.

Ranker Prompt Template (identical for both)

You are Ranking Judge {JUDGE_ID} in a design tournament.

## Original Problem
{PROBLEM}

## Design Proposals

Below are {N} independently generated design proposals for the same problem.
Evaluate them WITHOUT knowing which agent produced each one (they are labeled
Alpha through Epsilon for reference only).

{PROPOSALS}

## Your Task

### 1. Evaluation Criteria

Score each proposal on these dimensions (1-10 scale):

| Criterion | Weight | Description |
|-----------|--------|-------------|
| **Correctness** | 25% | Does the design actually solve the problem? Are invariants sound? |
| **Simplicity** | 20% | Is this the simplest design that could work? Minimal moving parts? |
| **Robustness** | 20% | How well does it handle edge cases, failures, and adversarial inputs? |
| **Extensibility** | 15% | Can the design accommodate likely future requirements without rework? |
| **Specificity** | 10% | How concrete and implementable is the proposal? |
| **Trade-off Awareness** | 10% | Does the author understand what they're sacrificing? |

### 2. Per-Proposal Evaluation

For EACH proposal, provide:
- Scores on each criterion
- Weighted total score
- Top strength (1 sentence)
- Top weakness (1 sentence)
- Key insight unique to this proposal (if any)

### 3. Stack Ranking

Produce a final ranking from best to worst. For each position:
- Rank position (1 = best)
- Agent label
- Weighted total score
- Confidence in this ranking position (0-100):
  - 90-100: Clear winner/loser, large score gap
  - 70-89: Solid ranking, meaningful score gap
  - 50-69: Close call, could reasonably swap with adjacent rank
  - Below 50: Essentially tied, ranking is arbitrary

### 4. Synthesis Recommendation

After ranking, provide:
- **Recommended design**: Which proposal to use as the starting point
- **Cherry-pick list**: Specific ideas from OTHER proposals that should be
  incorporated into the winner
- **Composite confidence** (0-100): Your overall confidence that the #1
  ranked proposal (with cherry-picks) is the right design

### Rules

- Judge designs on MERIT, not on writing quality or verbosity.
- If two designs are substantively identical, say so — don't force a
  false distinction.
- Be willing to rank a simple, correct design above an elaborate one.
- Explore the codebase if you need context to evaluate feasibility.

### Output Format

Return your response as a single markdown document.
Start with: `# Ranking Report — Judge {JUDGE_ID}`

Merging Rankings

After both rankers complete, merge their results:

For each proposal, average the weighted scores from both judges.
For each rank position, compute:
- Average score
- Agreement: did both judges assign the same rank? (yes/no)
- Combined confidence: average of both judges' per-position confidence, reduced by 15 points if they disagree on rank position
Final stack ranking by averaged weighted score (descending).
Synthesize cherry-pick lists from both judges into a unified list.

Final Output Format

Present the merged result to the user as:

## Design Tournament Results

### Problem
{one-line restatement}

### Stack Ranking

| Rank | Agent | Avg Score | Judge Agreement | Confidence | Key Strength |
|------|-------|-----------|-----------------|------------|--------------|
| 1    | ...   | 8.2/10    | Yes             | 92%        | ...          |
| 2    | ...   | 7.5/10    | Yes             | 78%        | ...          |
| 3    | ...   | 6.8/10    | No (J1:#2,J2:#4)| 54%       | ...          |
| ...  | ...   | ...       | ...             | ...        | ...          |

### Recommended Design

**Base**: Agent {winner}'s proposal
**Cherry-picks from other proposals**:
- From Agent X: {specific idea}
- From Agent Y: {specific idea}

**Composite Confidence**: {average of both judges' composite}%

### Winning Proposal Summary
{2-3 paragraph summary of the #1 design with cherry-picks integrated}

### Full Proposals (collapsed)
<details><summary>Agent Alpha — Score: X.X</summary>
{full proposal text}
</details>
{repeat for each agent}

### Ranking Details (collapsed)
<details><summary>Judge 1 Report</summary>
{full ranking report}
</details>
<details><summary>Judge 2 Report</summary>
{full ranking report}
</details>

Configuration

The default is 5 explorers + 2 rankers (7 total agents). If the user specifies a different count (e.g., /design-tournament 3 agents: ...), respect it but enforce these minimums:

Phase 1: at least 3 explorer agents
Phase 2: always exactly 2 ranking agents

Tips

For large or ambiguous problems, include relevant file paths or context in the problem statement so explorer agents can read the codebase.
If the problem involves existing code, tell agents which files to examine.
The skill works for any design task: features, data structures, algorithms, API shapes, system architecture, task decomposition, migration plans.

Related Skills

ahrav/first-principles

development

VerifiedTrustedCommunity

Deep first-principles code explanation that builds real understanding through phased walkthroughs with diagrams. Covers algorithms, data structures, memory layout, concurrency patterns, and performance tricks — especially for systems code in Rust. Use whenever the user asks to explain, walk through, break down, deep dive into, or understand code. Trigger on "how does this work", "what's happening here", "teach me about this", "why is it done this way", or when the user references a file with @ and wants to understand it. Proactively use when examining code involving lock-free algorithms, atomics/CAS, memory ordering,

1SKILL.mdUpdated Apr 17, 2026

ahrav/first-principles

ahrav/task-forge

development

VerifiedTrustedCommunity

Use when creating implementation-ready beads tasks that need testing strategy, optimal implementation approach, and documentation requirements baked in — composes /create-task with parallel enrichment agents that analyze the codebase and produce concrete test specifications, algorithm/data-structure guidance, and doc quality standards so implementing agents don't need to re-research

1SKILL.mdUpdated Apr 10, 2026

ahrav/.claude/skills/autoresearch

development

VerifiedTrustedCommunity

--- name: autoresearch description: Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Supports bounded iteration via Iterations: N inline config. version: 1.9.11 --- # Claude Autoresearch — Autonomous Goal-directed Iteration Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work — not just ML research. **Core id

1SKILL.mdUpdated Apr 10, 2026

ahrav/.claude/skills/autoresearch

ahrav/test-pipeline

development

VerifiedTrustedCommunity

Use when implementing a new feature and assessing coverage gaps, during periodic test hygiene, when test suites feel bloated, or before merging code that changes coordination or hot paths. Two-phase assess-then-improve testing pipeline.

1SKILL.mdUpdated Apr 2, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/ahrav/gossip-rs.git

# Copy into Claude Code skills folder (global)
cp -r gossip-rs/.claude/skills/design-tournament ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

ahrav/gossip-rs

1 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT