skills/hong-et-al-2024-metagpt/SKILL.md
Applies insights from the MetaGPT paper on how structured communication, role specialization, and standard operating procedures (SOPs) prevent coordination failures in multi-agent LLM systems.
npx skillsauth add curiositech/windags-skills hong-et-al-2024-metagpt
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
IF task requires <3 sequential steps with minimal handoffs
└── Use single agent with structured output templates
    └── Add validation checkpoints at each step
IF task requires 3-7 steps with clear role boundaries
├── Use role-based specialization (PM→Architect→Engineer→QA)
│   └── Implement structured artifacts at each handoff
└── Add pub-sub message pool if >5 agents
IF task requires >7 agents or complex information sharing
└── Mandatory pub-sub architecture with typed messages
    ├── Define message schemas first, then agent roles
    └── Implement subscription patterns by information type
IF task has established human SOP (software dev, research, content)
├── Encode existing workflow as agent roles
│   └── Map human deliverables to structured artifacts
└── Validate each role produces testable outputs
IF task is novel/experimental
├── Start with 3-agent proof-of-concept (Generator→Validator→Refiner)
│   └── Identify handoff points where information degrades
└── Add specialization only when bottlenecks appear
IF output is executable (code, API calls, database queries)
└── Mandatory execution feedback loop
    ├── Capture concrete errors (stack traces, response codes)
    └── Feed errors back to generating agent
IF output is structured data (JSON, documents, reports)
├── Schema validation first
└── Content validation via executable checks where possible
IF output is creative/subjective (writing, designs)
└── Use structured evaluation criteria with binary checkpoints
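The execution feedback branch above (capture concrete errors, feed them back to the generating agent) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MetaGPT's own implementation: `generate` is a hypothetical stand-in for the generating agent, `fake_agent` is a toy replacement for an LLM call, and `max_rounds` and `timeout` are assumed limits.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple


def run_candidate(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Execute candidate code in a fresh interpreter; return (ok, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stderr


def feedback_loop(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_rounds: int = 3,  # assumed retry cap, not from the source
) -> Tuple[Optional[str], List[str]]:
    """Regenerate until the code executes cleanly or the rounds run out.

    Returns (passing code or None, list of captured error texts).
    """
    errors: List[str] = []
    last_error: Optional[str] = None
    for _ in range(max_rounds):
        code = generate(task, last_error)
        ok, stderr = run_candidate(code)
        if ok:
            return code, errors
        # Keep the concrete traceback and hand it to the next attempt.
        last_error = stderr
        errors.append(stderr)
    return None, errors


def fake_agent(task: str, error: Optional[str]) -> str:
    """Hypothetical agent: emits a bug first, fixes it after seeing an error."""
    if error is None:
        return "print(undefined_name)"
    return "print('ok')"


code, errors = feedback_loop(fake_agent, "print ok")
```

The loop never asks the agent whether its code works; success is decided only by the interpreter's exit status, and the captured traceback is the only feedback the agent receives.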
Symptoms: Output quality degrades with each agent handoff; later agents produce confident but incorrect results built on earlier errors.
Detection: Compare the first agent's output quality with the final output; if the final output is significantly worse, errors are cascading.
Fix: Implement structured artifacts with validation at each handoff; require concrete verification before passing work to the next agent.
Symptoms: Agents lose critical information from earlier steps; requirements get "interpreted" differently at each stage.
Detection: Agents ask for information that was provided earlier, or the final output doesn't match the initial requirements.
Fix: Use persistent structured documents (PRDs, design specs) that agents read from, rather than relying on message passing.
Symptoms: Agents endlessly negotiate, ask clarifying questions, or produce conflicting outputs.
Detection: High message volume between agents with little progress on actual deliverables; circular dependencies in agent communication.
Fix: Switch to pub-sub with pre-defined message types; replace agent-to-agent negotiation with structured information publishing.
Symptoms: Code or other outputs look correct but fail when tested; agents claim "validation complete" without actually running tests.
Detection: An agent reports success, but execution reveals errors that should have been caught.
Fix: Make execution feedback mandatory and automatic; never accept agent self-assessment without concrete verification.
Symptoms: Agents perform tasks outside their specialization; ownership is unclear when outputs fail.
Detection: You can't answer "which agent is responsible for X deliverable?", or agents produce overlapping outputs.
Fix: Redefine roles by output artifact; each agent owns exactly one type of structured deliverable.
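Several of the fixes above rely on a shared message pool with pre-defined message types. The sketch below shows one way that could look. It is an illustration only: the `MessagePool` class, the `Requirement` and `DesignSpec` message types, and the lambda "agents" are hypothetical names invented for this example, not MetaGPT's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Type


@dataclass(frozen=True)
class Requirement:
    """Hypothetical message type published by a product-manager role."""
    text: str


@dataclass(frozen=True)
class DesignSpec:
    """Hypothetical message type published by an architect role."""
    summary: str


class MessagePool:
    """Shared pool: agents publish typed messages and subscribe by type."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[Type, List[Callable]] = defaultdict(list)
        self.history: List[object] = []  # persistent record any agent can read

    def subscribe(self, message_type: Type, handler: Callable) -> None:
        self._subscribers[message_type].append(handler)

    def publish(self, message: object) -> None:
        self.history.append(message)
        # Deliver only to handlers registered for this exact message type.
        for handler in list(self._subscribers[type(message)]):
            handler(message)


pool = MessagePool()
engineer_inbox: List[DesignSpec] = []

# The architect reacts to requirements by publishing a design spec.
pool.subscribe(
    Requirement,
    lambda req: pool.publish(DesignSpec(summary=f"design for: {req.text}")),
)
# The engineer only ever sees design specs, never raw requirements.
pool.subscribe(DesignSpec, engineer_inbox.append)

pool.publish(Requirement(text="export report as CSV"))
```

Because agents subscribe by message type instead of addressing each other directly, there is no channel for open-ended negotiation, and each message type maps to exactly one owning role.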
Scenario: Analyze 50 academic papers to identify trends in multi-agent systems
Initial Approach (fails): Single agent reads papers and produces analysis
MetaGPT Approach:
Paper Processor agent produces structured summaries:
{
  "title": "...",
  "methodology": "...",
  "key_findings": ["...", "..."],
  "evaluation_metrics": {...}
}
Trend Analyzer subscribes to summary messages, produces trend reports:
{
  "trend_name": "...",
  "supporting_papers": ["id1", "id2"],
  "evidence_strength": "high|medium|low",
  "counter_evidence": [...]
}
Synthesis Agent produces final analysis with traceable references
Key Decisions Made:
Failure Recovery: When the Trend Analyzer produced unsupported claims, the structured format allowed automatic verification against the source summaries; claims without supporting evidence were flagged and regenerated.
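The automatic verification described above can be approximated with a simple cross-check of each trend report's `supporting_papers` against the stored summaries. This is a sketch under assumptions: summaries are assumed to be keyed by paper id (the summary schema above has no id field), `min_papers` is an assumed threshold, and the sample data is invented for illustration.

```python
from typing import Dict, List


def find_unsupported_trends(
    summaries: Dict[str, dict],
    trends: List[dict],
    min_papers: int = 2,  # assumed threshold, not from the source
) -> List[dict]:
    """Return trends that cite unknown papers or too few known ones."""
    flagged = []
    for trend in trends:
        cited = trend.get("supporting_papers", [])
        missing = [pid for pid in cited if pid not in summaries]
        known = [pid for pid in cited if pid in summaries]
        if missing or len(known) < min_papers:
            flagged.append(
                {
                    "trend_name": trend["trend_name"],
                    "missing_ids": missing,
                    "known_count": len(known),
                }
            )
    return flagged


# Hypothetical data, for illustration only.
summaries = {
    "id1": {"title": "Paper one", "key_findings": ["finding a"]},
    "id2": {"title": "Paper two", "key_findings": ["finding b"]},
}
trends = [
    {"trend_name": "A", "supporting_papers": ["id1", "id2"]},
    {"trend_name": "B", "supporting_papers": ["id1", "id9"]},
    {"trend_name": "C", "supporting_papers": []},
]
flagged = find_unsupported_trends(summaries, trends)
```

Here trend B is flagged for citing an id with no summary and trend C for citing nothing, so only those two would be sent back for regeneration. A check like this catches missing references only; whether a cited paper actually supports the claim still needs a content-level check.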
Scenario: Generate technical blog posts from API documentation
Decision Tree Navigated:
Implementation:
Research Agent produces structured fact sheet with verifiable claims:
{
  "api_endpoints": [{"url": "...", "verified": true/false}],
  "code_examples": [{"code": "...", "tested": true/false}],
  "use_cases": [...]
}
Outline Agent subscribes to fact sheets, produces structured outline
Draft Agent writes content following outline structure
Technical Editor validates code examples through execution
Failure Recovery Scenario: The Draft Agent claimed that an API endpoint returned a specific JSON structure. The Technical Editor executed the API call, discovered a different response format, and published a correction to the message pool. The Draft Agent then automatically regenerated the affected sections with the correct structure.
What a Novice Would Miss: The need to run code examples to verify they work; a novice accepts plausible-sounding API behavior without testing it.
What an Expert Catches: Every technical claim must be verifiable by execution; structure enables automatic error localization and recovery.
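One way to enforce the expert's rule is to gate the fact sheet before drafting, so that only claims marked `verified` or `tested` reach the Draft Agent. The sketch below assumes the fact-sheet shape shown earlier; the function name `split_fact_sheet`, the choice to treat a missing flag as unverified, and the sample data are assumptions made for this example.

```python
from typing import Dict, List


def split_fact_sheet(fact_sheet: dict) -> Dict[str, List[dict]]:
    """Separate claims that passed execution checks from those that did not.

    A missing flag is treated as unverified (an assumption of this sketch).
    """
    usable: List[dict] = []
    blocked: List[dict] = []
    for endpoint in fact_sheet.get("api_endpoints", []):
        target = usable if endpoint.get("verified") is True else blocked
        target.append(endpoint)
    for example in fact_sheet.get("code_examples", []):
        target = usable if example.get("tested") is True else blocked
        target.append(example)
    return {"usable": usable, "blocked": blocked}


# Hypothetical fact sheet, for illustration only.
fact_sheet = {
    "api_endpoints": [
        {"url": "/v1/items", "verified": True},
        {"url": "/v1/orders", "verified": False},
    ],
    "code_examples": [
        {"code": "print('hello')", "tested": True},
        {"code": "client.list_items()"},  # no flag: treated as untested
    ],
    "use_cases": [],
}
result = split_fact_sheet(fact_sheet)
```

Anything in `blocked` would go back to the Research Agent or the Technical Editor for execution before the Draft Agent is allowed to cite it.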
Don't use MetaGPT principles for:
For these alternatives, use:
basic-prompting
human-ai-collaboration
direct-agent-coordination
experimental-multi-agent
structured-creativity-frameworks