
ivzc07/aeo-core

Name: aeo-core
Author: ivzc07

skills/aeo-core/SKILL.md

npx skillsauth add ivzc07/aeo-skills aeo-core


AEO Core - Confidence Engine

Purpose: Calculate confidence scores (0-1) and decide whether to execute autonomously or involve the human.

Activation

Loads when user types /aeo in Claude Code.

Confidence Calculation

Phase 0: Spec Validation (First Gate)

# Invoke spec-validator to check whether the task is well-defined (score 0-100)
spec_score = aeo_spec_validator.validate(task)

if spec_score < 40:
    return REFUSE("Spec too unclear - need more details")

# Otherwise continue with the confidence calculation

Phase 1: Rule-Based Score

Start with base confidence of 0.50, then adjust:

Add for Clarity (+0.15 each):

[ ] Explicit acceptance criteria defined
[ ] Tech stack specified
[ ] Dependencies listed
[ ] Test requirements defined

Add for Context (+0.10 each):

[ ] Similar successful task exists in memory
[ ] Familiar codebase area
[ ] Recent successful commits in this area

Subtract for Risk (-0.10 each):

[ ] Touching authentication/security
[ ] Modifying core infrastructure
[ ] Large scope (>5 files, >500 LOC)
[ ] Unclear dependencies

base_confidence = clamp(0.50 + clarity_score + context_score - risk_score, 0.0, 1.0)
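
The Phase 1 rules above can be sketched in Python as follows. This is a minimal illustration rather than the skill's actual implementation: the function name `rule_based_confidence` and its count-based parameters are hypothetical, and it assumes each checked item contributes its weight exactly once.

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def rule_based_confidence(clarity_hits: int, context_hits: int, risk_hits: int) -> float:
    """Phase 1 score from the number of checked items in each list.

    Hypothetical helper: the weights follow the checklists above
    (+0.15 per clarity item, +0.10 per context item, -0.10 per risk item).
    """
    clarity_score = 0.15 * clarity_hits
    context_score = 0.10 * context_hits
    risk_score = 0.10 * risk_hits
    return clamp(0.50 + clarity_score + context_score - risk_score)


# Two clarity items, one context item, one risk item: 0.50 + 0.30 + 0.10 - 0.10
print(round(rule_based_confidence(2, 1, 1), 2))  # prints 0.8
```

With all four clarity items and all three context items checked, the raw sum is 1.40, so the clamp is what keeps the score at 1.0.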

Phase 2: Spec Score Adjustment

# Adjust based on spec quality (scores from 60 to 79 leave confidence unchanged)
if spec_score >= 80:
    base_confidence += 0.10  # Excellent spec
elif spec_score < 60:
    base_confidence -= 0.10  # Poor spec

Phase 3: Security Multipliers

# Critical areas get a confidence penalty
if task.touches_payments:
    base_confidence *= 0.5  # Payments need human oversight
elif task.touches_auth:
    base_confidence *= 0.7  # Auth needs review

Phase 4: LLM Adjustment (Optional)

If you are uncertain about the task, set llm_adjustment to a value within ±0.10:

final_confidence = clamp(base_confidence + llm_adjustment, 0.0, 1.0)
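
Phases 2 through 4 can be chained into one function, sketched below. The name `adjust_confidence` and its keyword parameters are hypothetical; the sketch assumes the phases run in the order listed and that `llm_adjustment` is limited to the ±0.10 range before it is applied.

```python
def adjust_confidence(
    base_confidence: float,
    spec_score: int,
    touches_payments: bool = False,
    touches_auth: bool = False,
    llm_adjustment: float = 0.0,
) -> float:
    """Apply the spec adjustment, security multiplier, and LLM adjustment."""
    # Phase 2: spec quality bonus or penalty (60-79 leaves the score unchanged)
    if spec_score >= 80:
        base_confidence += 0.10
    elif spec_score < 60:
        base_confidence -= 0.10

    # Phase 3: security multiplier; payments take precedence over auth
    if touches_payments:
        base_confidence *= 0.5
    elif touches_auth:
        base_confidence *= 0.7

    # Phase 4: optional LLM adjustment, limited to the range -0.10 to +0.10
    llm_adjustment = max(-0.10, min(0.10, llm_adjustment))
    return max(0.0, min(1.0, base_confidence + llm_adjustment))


# A Phase 1 score of 0.80 with a spec score of 72 on an auth task
print(round(adjust_confidence(0.80, 72, touches_auth=True), 2))  # prints 0.56
```

Because the multiplier is applied after the additive spec adjustment, a good spec raises the value that the security penalty then scales down.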

Decision Thresholds

Based on final_confidence, decide execution path:

≥ 0.85: AUTONOMOUS

Execute without asking
Note: "Confidence: 0.XX - proceeding autonomously"
Continue to execution loop

≥ 0.70 (and below 0.85): ADVISORY

Note risk clearly
Offer to pause: "Confidence: 0.XX - [CONCERNS]. I can proceed or pause."
If there is no response within 5 seconds, continue
If the human responds, pause and follow their input

≥ 0.50 (and below 0.70): BLOCKING

Explain concerns
Wait for confirmation before proceeding

Format:

⚠️ CONFIDENCE BELOW THRESHOLD

Confidence: 0.XX
Threshold: 0.70

Concerns:
• [Spec] Missing acceptance criteria
• [Risk] Touching authentication
• [Context] No similar tasks in memory

Options:
1. Proceed with assumptions
2. Clarify spec first
3. Break into smaller tasks

Please confirm (1-3):

< 0.50: REFUSE

Explain why task can't be executed
Request clarification or spec improvement

Format:

❌ CANNOT EXECUTE - INSUFFICIENT CONFIDENCE

Confidence: 0.XX

Why:
• Spec score: 35/100 (below 40 threshold)
• Touching security without clear requirements
• No acceptance criteria defined

What's needed:
1. Clear acceptance criteria
2. Security requirements specified
3. Test requirements defined

Please improve spec and try again.
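
The four thresholds above amount to a simple ordered lookup. The sketch below is illustrative only; the function name `decide` is hypothetical, and it returns the path names used in this section as plain strings.

```python
def decide(final_confidence: float) -> str:
    """Map a final confidence score to one of the four execution paths."""
    if final_confidence >= 0.85:
        return "AUTONOMOUS"
    if final_confidence >= 0.70:
        return "ADVISORY"
    if final_confidence >= 0.50:
        return "BLOCKING"
    return "REFUSE"


for score in (0.90, 0.71, 0.56, 0.49):
    print(f"{score:.2f} -> {decide(score)}")
```

Checking the highest threshold first means each band implicitly excludes the ones above it, so ADVISORY covers 0.70 up to but not including 0.85.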

Learning from Outcomes

After task completes, write signal to memory:

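# Append one JSON object per line to the signal log.
# jq builds the record so the timestamp is expanded and the output is valid JSON.
# actual_difficulty is one of: easy, medium, hard
jq -nc --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '{
  timestamp: $ts,
  task_id: "unique-id",
  task_description: "brief description",
  predicted_confidence: 0.85,
  actual_difficulty: "easy",
  success: true,
  adjustment: 0.05
}' >> ~/.claude/MEMORY/aeo-signals.jsonl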

Actual Difficulty Rating:

easy: Task went smoothly, no blockers
medium: Minor issues or clarifications needed
hard: Significant problems, multiple iterations

Confidence Adjustment:

easy + success: +0.05
medium + success: +0.00
hard + success: -0.05
any failure: -0.10
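
The adjustment table above can be expressed as a small lookup. This is a sketch under the assumption that failure overrides the difficulty rating; the function name `outcome_adjustment` is hypothetical.

```python
def outcome_adjustment(difficulty: str, success: bool) -> float:
    """Return the confidence adjustment for a completed task.

    difficulty must be "easy", "medium", or "hard"; any failure maps
    to -0.10 regardless of difficulty.
    """
    if not success:
        return -0.10
    on_success = {"easy": 0.05, "medium": 0.0, "hard": -0.05}
    if difficulty not in on_success:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    return on_success[difficulty]


print(outcome_adjustment("easy", True))   # prints 0.05
print(outcome_adjustment("hard", False))  # prints -0.1
```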

Rolling Window: Keep last 100 signals, calculate adjustment average

Reading Past Signals

On startup, read recent signals to calibrate:

# Average the adjustment over the last 100 signals (0 if the log is empty)
tail -n 100 ~/.claude/MEMORY/aeo-signals.jsonl | jq -s 'map(.adjustment) | if length > 0 then add / length else 0 end'

Apply average adjustment as offset to all confidence calculations.
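
The same rolling average can be computed without shell tools. The sketch below assumes each line of the log holds one complete JSON object with a numeric `adjustment` field; the function name `average_adjustment` is hypothetical, and a missing or empty log is treated as an offset of 0.

```python
import json
from collections import deque
from pathlib import Path


def average_adjustment(log_path: Path, window: int = 100) -> float:
    """Average the adjustment field over the last `window` signals."""
    if not log_path.exists():
        return 0.0
    # A bounded deque keeps only the most recent `window` values
    recent = deque(maxlen=window)
    with log_path.open(encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if line:  # skip blank lines
                recent.append(json.loads(line)["adjustment"])
    return sum(recent) / len(recent) if recent else 0.0


if __name__ == "__main__":
    signals = Path.home() / ".claude" / "MEMORY" / "aeo-signals.jsonl"
    print(f"confidence offset: {average_adjustment(signals):+.3f}")
```

The returned value is the offset that would be added to later confidence calculations.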

Integration Flow

User activates: Types /aeo
Calculate confidence: Follow phases 0-4
Make decision: Based on thresholds
If autonomous: Execute task
If advisory/blocking: Invoke aeo-escalation skill
Post-execution: Invoke aeo-qa-agent for review
Record outcome: Write to signal log
Update model: Adjust future confidence based on outcome

Memory Files

Signals: $PAI_DIR/MEMORY/aeo-signals.jsonl
Escalations: $PAI_DIR/MEMORY/aeo-escalations.jsonl
Patterns: $PAI_DIR/MEMORY/aeo-failure-patterns.json

Escalation Triggers

Invoke aeo-escalation skill when:

Confidence < 0.85 (advisory/blocking)
Spec score < 40 (refuse)
QA veto occurs
Failure pattern can't be resolved
Cost limit approaching (if cost-governor enabled)

Example Session

User: /aeo
User: Add user authentication with email verification

AEO: [Invoking aeo-spec-validator]
AEO: Spec score: 72/100
AEO: Calculating confidence...
      - Base: 0.50
      - Clarity: +0.30 (acceptance criteria, tech stack)
      - Context: +0.10 (similar task in memory)
      - Risk: -0.10 (touching auth)
      - Spec adj: +0.00 (spec score 60-79)
      - Security mult: ×0.7
      - Final: 0.56

AEO: [Invokes aeo-escalation]
AEO: ⚠️ CONFIDENCE BELOW THRESHOLD
     Confidence: 0.56

     Concerns:
     • [Spec] Missing security requirements
     • [Risk] Touching authentication
     • [Context] Need email service details

     Options:
     1. Add security requirements and proceed
     2. Provide email service details
     3. Break into smaller tasks

     Please clarify (1-3):

User: 2
User: We use Resend for emails, API key in .env

AEO: Recalculating confidence with added context...
     Final: 0.71

AEO: ⚡ ADVISORY - Confidence: 0.71
     [Acceptance criteria defined]
     [Tech stack: Node.js, bcrypt, jwt]
     [Email: Resend, API key in .env]

     Proceeding with implementation. I'll pause if issues arise.

[Implementation proceeds]

Special Cases

Repeated Tasks

If same task done successfully 3+ times:

Add +0.10 to confidence
Flag as "routine - can be autonomous"

High-Risk Areas

Never reach full autonomy for:

Payment processing (max 0.70)
Authentication changes (max 0.75)
Database migrations (max 0.80)
Production deployments (max 0.85)
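
One way to enforce these ceilings is to cap the final score by the lowest limit among the areas a task touches. The sketch below is illustrative: the area keys and the names `RISK_CAPS` and `apply_risk_caps` are hypothetical, and the source does not say how overlapping areas combine. As written, the 0.85 cap for production deployments equals the AUTONOMOUS threshold, so a deployment capped at 0.85 would still classify as autonomous unless the cap is set lower.

```python
from typing import Iterable

# Ceilings from the list above; the keys are hypothetical labels
RISK_CAPS = {
    "payments": 0.70,
    "authentication": 0.75,
    "database_migration": 0.80,
    "production_deployment": 0.85,
}


def apply_risk_caps(confidence: float, areas: Iterable[str]) -> float:
    """Cap confidence at the lowest ceiling among the touched areas."""
    caps = [RISK_CAPS[area] for area in areas if area in RISK_CAPS]
    return min([confidence, *caps])


print(apply_risk_caps(0.95, ["payments", "authentication"]))  # prints 0.7
print(apply_risk_caps(0.60, ["payments"]))                    # prints 0.6
```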

Emergency Rollbacks

If task causes test failures or errors:

Immediately roll back
Write failure signal
Reduce confidence by 0.20
Require human review before retry


Main AEO skill - calculates confidence scores and decides execution path. Auto-loads on /aeo command.

testing

Updated Apr 6, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add ivzc07/aeo-skills aeo-core

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 8:38 PM · 1.8s · 1 file scanned


Related Skills

ivzc07/aeo-spec-validator (testing)
Verified · Trusted · Community
Validate that tasks are sufficiently defined before execution. Returns score 0-100.
SKILL.md · Updated Apr 6, 2026

ivzc07/aeo-qa-agent (development)
Verified · Trusted · Community
Internal code reviewer with veto power. Reviews changes before commit, blocks security issues.
SKILL.md · Updated Apr 6, 2026

ivzc07/skills/aeo-failure-patterns (testing)
Verified · Trusted · Community
Recognize common errors and apply known fixes automatically. Uses hybrid architecture: core curated patterns + project-specific learned patterns.
Core patterns (in its SKILL.md): 20-30 curated patterns with high-confidence fixes; portable across projects; Confidence ≥ 0.85 →
SKILL.md · Updated Apr 6, 2026

ivzc07/aeo-escalation (testing)
Verified · Trusted · Community
Human-AI interface for when to interrupt and involve humans. Presents clear options and records decisions.
SKILL.md · Updated Apr 6, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


# Clone the repo
git clone https://github.com/ivzc07/aeo-skills.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r aeo-skills/skills/aeo-core ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

ivzc07/aeo-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT