beam-ai-team/beam-ape-optimizer

Name: beam-ape-optimizer
Author: beam-ai-team

skills/beam/beam-tools/beam-ape-optimizer/SKILL.md

npx skillsauth add beam-ai-team/beam-next-skills beam-ape-optimizer

Beam APE Optimizer

Systematic prompt optimization for Beam agent nodes using Automated Prompt Engineering (APE). Runs test batches, identifies which prompt instructions cause failures, and surgically rewrites only the failing parts.

Safety Contract

Before adding reasoning params, patching prompts, running test batches that create Beam tasks, or publishing optimized prompts, show the workspace, agent ID, target node IDs, dataset/sample count, proposed prompt/param changes, and expected side effects. Require explicit user approval in the current turn. Dataset analysis, local critique/edit drafting, and dry-run summaries do not require approval.
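
A minimal sketch of how the approval gate could be modelled. The `ChangePlan` record, `render_plan`, and `is_approved` are hypothetical names, not part of the skill; the fields simply mirror the items the contract asks to show, and the accepted approval wording is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePlan:
    """Hypothetical record of what must be shown before a side-effecting action."""
    workspace: str
    agent_id: str
    node_ids: list[str]
    sample_count: int
    proposed_changes: list[str]
    expected_side_effects: list[str] = field(default_factory=list)


def render_plan(plan: ChangePlan) -> str:
    """Format the plan as plain text for the user to review."""
    lines = [
        f"Workspace: {plan.workspace}",
        f"Agent ID: {plan.agent_id}",
        f"Target nodes: {', '.join(plan.node_ids)}",
        f"Samples: {plan.sample_count}",
        "Proposed changes:",
        *[f"  - {change}" for change in plan.proposed_changes],
        "Expected side effects:",
        *[f"  - {effect}" for effect in plan.expected_side_effects],
    ]
    return "\n".join(lines)


def is_approved(user_reply: str) -> bool:
    """Accept only an explicit approval in the current turn (assumed wording)."""
    return user_reply.strip().lower() in {"approve", "approved", "yes"}
```

Anything other than an explicit approval, including silence or an ambiguous reply, is treated as "not approved" in this sketch.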

Prerequisites

A deployed Beam agent with nodes to optimize
A test dataset with ground truth (e.g., golden dataset with verified classifications)
Agent ID and node IDs for target nodes
Beam API credentials in .env

Workflow

Step 1: Add Reasoning to Target Nodes

For each node being optimized, add a reasoning output parameter:

| Param | Type | Description |
|-------|------|-------------|
| reasoning | string | Step-by-step chain-of-thought explaining each decision |

Append reasoning instructions to the node's prompt (before the Input section):

# Reasoning
Before producing your outputs, think through each step:
1. [Step specific to this node's task]
2. [Another step]
3. [Final decision step]
Write your full reasoning in the 'reasoning' output.
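
The insertion described above could be sketched as follows. This assumes the Input section starts with a line that is exactly `# Input`; that heading text and the `add_reasoning` helper are assumptions for illustration, so adjust them to the node's real prompt layout.

```python
REASONING_BLOCK = """# Reasoning
Before producing your outputs, think through each step:
1. [Step specific to this node's task]
2. [Another step]
3. [Final decision step]
Write your full reasoning in the 'reasoning' output.
"""


def add_reasoning(prompt: str, input_heading: str = "# Input") -> str:
    """Insert REASONING_BLOCK before the first line equal to input_heading.

    If no such line exists, the block is appended at the end instead.
    """
    if "# Reasoning" in prompt:
        return prompt  # already present; avoid duplicating the section
    lines = prompt.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == input_heading:
            return "".join(lines[:i]) + REASONING_BLOCK + "\n" + "".join(lines[i:])
    separator = "" if prompt.endswith("\n") else "\n"
    return prompt + separator + "\n" + REASONING_BLOCK
```

The early return makes the helper safe to call twice on the same prompt.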

Why reasoning only (no combined_output): APE analyzes each node's prompt independently. We need to see WHY the model made each decision, then compare individual outputs against ground truth. Aggregating outputs at the exit node adds complexity without value — each node is analyzed separately.

Deploy reasoning additions:

1. PATCH prompt with appended reasoning instructions
2. PATCH input-output-params to add reasoning output
3. Publish

Step 2: Analyze Test Dataset for Coverage

Before running tests, analyze the dataset to ensure maximum coverage with minimum samples:

1. Map categories: list all classifications, edge types, and routing paths in the agent
2. Count samples per category: how many test cases cover each path
3. Select minimum representative sample: pick the fewest samples that cover ALL categories
4. If minimum > 15 samples: split into chunks of 10-15
   - Chunk 1: run first, iterate prompts based on results
   - Chunk 2+: run with updated prompts (fresh samples prevent overfitting)

Output a coverage table:

| Category | Count | Selected Samples |
|----------|-------|-----------------|
| product_complaint | 12 | GD-44, GD-52 |
| spare_parts | 8 | GD-88, GD-91 |
| missing_info | 5 | GD-38 |
| ... | ... | ... |
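
One way to sketch the sample selection is a greedy set cover. This is a standard approximation and does not guarantee the true minimum, and `select_samples` and `chunk` are hypothetical helpers, not part of the skill. Chunks are balanced with an upper bound of 15, so some counts (16 samples, for example) cannot also meet the lower bound of 10.

```python
import math


def select_samples(categories_by_sample: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: repeatedly pick the sample covering the most uncovered categories."""
    uncovered = set().union(*categories_by_sample.values())
    selected: list[str] = []
    while uncovered:
        # Iterating over sorted IDs makes ties resolve to the smallest sample ID.
        best = max(sorted(categories_by_sample),
                   key=lambda s: len(categories_by_sample[s] & uncovered))
        selected.append(best)
        uncovered -= categories_by_sample[best]
    return selected


def chunk(samples: list[str], max_size: int = 15) -> list[list[str]]:
    """Split into roughly even chunks of at most max_size samples each."""
    if not samples:
        return []
    n_chunks = math.ceil(len(samples) / max_size)
    size = math.ceil(len(samples) / n_chunks)
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

A sample that exercises several categories or routing paths is preferred, which is what keeps the selected set small.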

Step 3: Run Test Batch

Run selected samples one at a time (concurrent tasks cause input corruption):

python3 04-workspace/scripts/trigger_beam_task.py \
  --agent {AGENT_ID} \
  --msg "path/to/sample.msg" \
  --poll --poll-timeout 600

For each completed task, collect:

Node-level outputs (the fields being validated)
reasoning output (the chain-of-thought trace)
Task ID for reference

CRITICAL: Never rerun tasks without explicit user approval. Present all results first, report stuck/failed tasks, and wait for the user to say "rerun".
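
A minimal sketch of a strictly sequential runner, using only the script path and flags shown in the command above. `build_command` and `run_batch` are hypothetical helpers, and what a non-zero exit code means for this script is an assumption; the sketch only records it and never retries.

```python
import subprocess

SCRIPT = "04-workspace/scripts/trigger_beam_task.py"


def build_command(agent_id: str, msg_path: str, poll_timeout: int = 600) -> list[str]:
    """Build the argv for one task from the flags documented above."""
    return ["python3", SCRIPT, "--agent", agent_id, "--msg", msg_path,
            "--poll", "--poll-timeout", str(poll_timeout)]


def run_batch(agent_id: str, msg_paths: list[str]) -> list[tuple[str, int]]:
    """Run samples one at a time and collect (path, exit code) pairs.

    Failed or stuck tasks are reported, not rerun; rerunning needs user approval.
    """
    results = []
    for path in msg_paths:
        # subprocess.run blocks until the script exits, so tasks never overlap.
        proc = subprocess.run(build_command(agent_id, path))
        results.append((path, proc.returncode))
    return results
```

Because each call blocks until polling finishes, the next task is only triggered after the previous one has completed or timed out.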

Step 4: Compare Against Ground Truth

For each test case, compare node outputs against ground truth:

Ground truth precedence (if using reviewed dataset):

Review comments that correct the original → corrected value is ground truth
Review comments that confirm "CORRECT" → original value is ground truth
No review → original value is ground truth (flag as unverified)
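
The precedence rules above can be sketched as a small function. How a review is stored (a `correction` value and a `confirmed` flag) is an assumption here, as are the `GroundTruth` and `resolve_ground_truth` names.

```python
from typing import NamedTuple, Optional


class GroundTruth(NamedTuple):
    value: str
    verified: bool  # False means "use it, but flag as unverified"


def resolve_ground_truth(original: str,
                         correction: Optional[str] = None,
                         confirmed: bool = False) -> GroundTruth:
    """Apply the precedence: correction, then confirmation, then unreviewed original."""
    if correction is not None:
        return GroundTruth(correction, True)   # reviewer corrected the original
    if confirmed:
        return GroundTruth(original, True)     # reviewer marked it "CORRECT"
    return GroundTruth(original, False)        # no review: original, unverified
```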

Create a results table:

| Sample | Expected | Got | Match | Node | Reasoning Summary |
|--------|----------|-----|-------|------|-------------------|
| GD-44 | product_complaint | product_complaint | PASS | N4 | — |
| GD-38 | missing_info | spare_parts | FAIL | N4 | "Found product ref..." |
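
A sketch of building that table from raw comparisons, assuming an exact string match between expected and actual values (no normalization). `results_table` is a hypothetical helper; it also returns the accuracy as a `passed/total` string.

```python
def results_table(rows: list[tuple[str, str, str, str, str]]) -> tuple[str, str]:
    """rows: (sample, expected, got, node, reasoning_summary). Returns (table, accuracy)."""
    lines = [
        "| Sample | Expected | Got | Match | Node | Reasoning Summary |",
        "|--------|----------|-----|-------|------|-------------------|",
    ]
    passed = 0
    for sample, expected, got, node, summary in rows:
        ok = expected == got  # exact match only
        passed += ok
        # The reasoning summary is only shown for failures.
        shown = "-" if ok else f'"{summary}"'
        match = "PASS" if ok else "FAIL"
        lines.append(f"| {sample} | {expected} | {got} | {match} | {node} | {shown} |")
    return "\n".join(lines), f"{passed}/{len(rows)}"
```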

Step 5: Credit Assignment — 3-Agent System

For each mismatch (a FAIL row in the results table), run the Critic and Editor agents.

Agent 1: Doer (Already Done)

The Beam agent itself is the Doer. The reasoning output is the chain-of-thought trace.

Agent 2: Critic

A Claude prompt that takes the failing node's prompt + reasoning + output + ground truth, and assigns credit to each instruction:

Input:

Node prompt (segmented into numbered instructions)
Reasoning trace from the test run
Model output (what it produced)
Ground truth (correct answer)

Output: Per-instruction labels:

KEEP — instruction followed correctly
MODIFY — instruction contributed to the error (with explanation)
NEUTRAL — instruction not relevant to this error

See references/critic-prompt.md for the full Critic prompt template.
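
A sketch of the data shapes around the Critic, not the Critic prompt itself (that lives in the template referenced above). It assumes one instruction per non-empty prompt line, which is a simplification, and `segment`, `Label`, and `validate_labels` are hypothetical names.

```python
from enum import Enum


class Label(Enum):
    KEEP = "KEEP"
    MODIFY = "MODIFY"
    NEUTRAL = "NEUTRAL"


def segment(prompt: str) -> dict[int, str]:
    """Number each non-empty line as one instruction, starting at 1."""
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    return dict(enumerate(lines, start=1))


def validate_labels(instructions: dict[int, str], labels: dict[int, str]) -> dict[int, Label]:
    """Check that every instruction has exactly one known label."""
    missing = set(instructions) - set(labels)
    extra = set(labels) - set(instructions)
    if missing or extra:
        raise ValueError(f"label mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    # Label(...) raises ValueError for anything other than KEEP, MODIFY, NEUTRAL.
    return {n: Label(labels[n]) for n in instructions}
```

Validating the Critic's output before it reaches the Editor catches skipped or invented instruction numbers early.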

Agent 3: Editor

Takes the Critic's output and rewrites ONLY the MODIFY instructions:

Rules:

Never change KEEP instructions
Rewrite MODIFY instructions using the Critic's failure explanation as a guide
Be specific — vague instructions cause model errors
Add examples where the model was confused
Keep the same overall structure and section ordering

See references/editor-prompt.md for the full Editor prompt template.
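
The first rule can be enforced mechanically. Below is a sketch of a guard that rejects any rewrite aimed at an instruction not labelled MODIFY; `apply_edits` is a hypothetical helper, and labels are assumed to be plain strings keyed by instruction number.

```python
def apply_edits(instructions: dict[int, str],
                labels: dict[int, str],
                rewrites: dict[int, str]) -> dict[int, str]:
    """Return a new instruction set in the same order, changing only MODIFY entries."""
    illegal = [n for n in rewrites if labels.get(n) != "MODIFY"]
    if illegal:
        raise ValueError(f"rewrites target non-MODIFY instructions: {sorted(illegal)}")
    # Untouched instructions are copied verbatim, preserving structure and ordering.
    return {n: rewrites.get(n, text) for n, text in instructions.items()}
```

In this sketch NEUTRAL instructions are protected in the same way as KEEP ones, since only MODIFY entries may be rewritten.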

Step 6: Redeploy and Retest

PATCH updated prompt to Beam
Publish
Run the SAME test cases again (to verify the fix and confirm nothing regressed)
Also run next chunk of fresh samples (to verify generalization)
Compare accuracy: did it improve without regression?
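
The before/after comparison could be sketched as a diff over per-sample pass/fail results. `compare_runs` is a hypothetical helper; each run is assumed to be a mapping from sample ID to whether it passed.

```python
def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Classify each sample present in both runs by how its status changed."""
    shared = sorted(before.keys() & after.keys())
    return {
        "fixed": [s for s in shared if not before[s] and after[s]],
        "regressed": [s for s in shared if before[s] and not after[s]],
        "still_failing": [s for s in shared if not before[s] and not after[s]],
    }
```

Fresh samples from the next chunk appear only in the second run, so they are left out of this diff and judged on accuracy alone.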

Step 7: Iterate Until Convergence

Convergence criteria:

Target accuracy: 90%+ correct on test batch
No regressions: previously correct cases must stay correct
Max iterations: 5 per node (diminishing returns after that)
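
These criteria can be sketched as a single stop check. `should_stop` is a hypothetical helper, and `iteration` is assumed to count completed rewrite rounds for the node.

```python
def should_stop(correct: int, total: int, regressions: int, iteration: int,
                target: float = 0.90, max_iterations: int = 5) -> tuple[bool, str]:
    """Return (stop?, reason) for one node after a test batch."""
    if total > 0 and regressions == 0 and correct / total >= target:
        return True, "converged"
    if iteration >= max_iterations:
        return True, "max iterations reached"
    return False, "continue"
```

Convergence is checked first, so a node that reaches the target on its last allowed iteration is reported as converged and not as exhausted.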

Iteration order:

Start with the node that has the most impact on final output
Then upstream nodes (errors propagate downstream)
After individual nodes stabilize, check for cross-node propagation errors

Prompt versioning: Save each version:

plan/prompt-versions/
  n1-v1.txt, n1-v2.txt, ...
  n4-v1.txt, n4-v2.txt, ...
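
A small sketch of saving versions under that layout. `save_prompt_version` is a hypothetical helper; it assumes file names follow the `<node>-v<N>.txt` pattern shown above and picks the next free number.

```python
import re
from pathlib import Path


def save_prompt_version(node: str, prompt: str,
                        root: Path = Path("plan/prompt-versions")) -> Path:
    """Write the prompt as <node>-v<N>.txt, one past the highest existing version."""
    root.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(rf"{re.escape(node)}-v(\d+)\.txt")
    versions = [int(m.group(1)) for p in root.iterdir()
                if (m := pattern.fullmatch(p.name))]
    path = root / f"{node}-v{max(versions, default=0) + 1}.txt"
    path.write_text(prompt, encoding="utf-8")
    return path
```

Earlier versions are never overwritten, which keeps a rollback target available if a later iteration regresses.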

Results tracking:

| Iteration | Node | Accuracy | Changes | Regressions |
|-----------|------|----------|---------|-------------|
| v1 (baseline) | N4 | 7/10 | — | — |
| v2 | N4 | 8/10 | Rule 2 rewrite | 0 |
| v3 | N4 | 9/10 | Added examples | 0 |

References

critic-prompt.md — Full Critic agent prompt template
editor-prompt.md — Full Editor agent prompt template
beam-agent-manager — PATCH workflows, publishing rules, API behavioral rules
beam-get-task-details — Fetch detailed task output for analysis

Related Skills

beam-graph-creator — Create and deploy the agent graph (do this before APE)
beam-agent-manager — API rules and PATCH/publish workflows

Automated Prompt Engineering (APE) optimization loop for Beam agent nodes. Adds reasoning traces, runs test batches against a golden dataset, uses 3-agent credit assignment (Doer/Critic/Editor) to identify and fix failing prompt instructions, and redeploys improved prompts. Use when user says "optimize prompts", "APE loop", "improve beam agent accuracy", "prompt engineering", "test and fix prompts", "run APE", "credit assignment", or when agent accuracy needs systematic improvement rather than manual prompt editing.

development

Updated Jul 8, 2026

npx skillsauth add beam-ai-team/beam-next-skills beam-ape-optimizer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Score |
|---------|-------------|--------|-------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jul 8, 2026, 3:32 AM · 151.7s · 3 files scanned

SKILL.md

name:: beam-ape-optimizer
type:: skill
version:: 1.0
description:: Automated Prompt Engineering (APE) optimization loop for Beam agent nodes.
author:: Abdul Rafay
category:: integrations
platform:: Beam AI
updated:: 2026-03-15
visibility:: team
- approval:: beam_prompt_optimization

Related Skills

beam-ai-team/use-case-proposal

tools

Build a Palantir-shape, PDF-native use-case proposal document for a sophisticated enterprise account: research-grounded use cases (each with description, challenge, impact, value), an operating-graph ontology page, a recommended PoC with a week-by-week plan, and a closing page that asks for one decision. Load when a client asks us to 'propose high-impact use cases', requests a use-case presentation/catalog for a function (finance, HR, ops), or when a technical evaluation team will review candidates to pick a PoC. NOT for single-account cold outreach (use prospect-brief), full process diagnostics (use operating-diagnostic), or priced proposals (use proposal-creation).

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-figma-to-html-slides

development

Convert Beam Figma slide designs into high-fidelity, editable HTML presentation decks. Use when Codex is asked to audit Figma slides, extract slide templates, rebuild Beam slides as HTML decks, decide whether Figma imagery should be exported or rebuilt in HTML/CSS, create Beam/Prism-compatible deck templates, or improve fidelity of existing Beam HTML slide rebuilds.

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-ai-slide-library

development

Use the Beam AI reusable slide library: individual HTML slide templates extracted from Beam Figma rebuilds, kept separate from deck themes and full deck templates. Load when the user asks for a slide library, specific Beam slide patterns, reusable Figma-inspired slides, Prism slide-library items, or slide-level HTML templates.

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-ai-deck-templates

development

Use Beam AI deck and report design packs, HTML templates, and curated examples to create sales decks, customer intro decks, RPO decks, and DIN A4 use-case proposal reports. Load when the user asks for Beam-branded presentation templates, Prism-compatible deck templates, Beam report templates, customer intro decks, commercial proposals, or reusable HTML deck/report examples.

SKILL.md · Updated Jul 8, 2026

Install manually

# Clone the repo
git clone https://github.com/beam-ai-team/beam-next-skills.git

# Copy into Claude Code skills folder (global)
cp -r beam-next-skills/skills/beam/beam-tools/beam-ape-optimizer ~/.claude/skills/

Repository

beam-ai-team/beam-next-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT