skills/agyn-multi-agent-system-team-based/SKILL.md
Orchestrate multi-agent teams for autonomous software engineering using the Agyn methodology: coordinator, researcher, implementer, and reviewer agents with structured communication, isolated sandboxes, and iterative review loops. Use when: 'set up a multi-agent team to fix this bug', 'use agent swarm to implement this feature', 'resolve this GitHub issue with a team of agents', 'coordinate agents to refactor this module', 'spin up an engineering team to tackle this task', 'use Agyn-style agents to solve this problem'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ndpvt-web/arxiv-claude-skills agyn-multi-agent-system-team-based
This skill enables Claude to orchestrate autonomous software engineering teams modeled after the Agyn architecture (Benkovich & Valkov, 2026). Instead of treating issue resolution as a single monolithic prompt, Claude decomposes work across four specialized agent roles (Coordinator, Researcher, Implementer, and Reviewer). Each role operates in an isolated sandbox and communicates through structured messages. The agents follow a defined development methodology of analysis, task specification, implementation, and iterative review, mirroring how real engineering teams operate. The paper reports that this approach resolves 72.2% of the SWE-bench 500 tasks, outperforming comparable single-agent baselines.
The Agyn system's core insight is that organizational design matters as much as model capability. Rather than feeding a single agent an issue description and hoping it produces a correct patch, Agyn replicates the structure of a real engineering team. Four agents — Coordinator, Researcher, Implementer, and Reviewer — communicate through structured messages, each with a narrow mandate and isolated execution environment. The Coordinator decomposes problems and routes work; the Researcher analyzes the codebase to produce structured findings; the Implementer writes code guided by those findings; and the Reviewer validates against the original requirements, triggering re-research when needed.
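To make the structured-message idea concrete, here is a minimal TypeScript sketch of a message envelope and the hub-and-spoke routing rule implied by the Coordinator's role. The field names (`task`, `files`, `constraints`) and the `isRoutable` and `renderMessage` helpers are hypothetical and only illustrate the idea. They are not Agyn's published wire format.

```typescript
// Illustrative message envelope; field names are assumptions, not Agyn's format.
type Role = "coordinator" | "researcher" | "implementer" | "reviewer";

interface FileRef {
  path: string;
  line?: number; // optional line anchor, rendered as path:line
}

interface AgentMessage {
  from: Role;
  to: Role;
  task: string;          // task parameters for the recipient
  files: FileRef[];      // file references to focus on
  constraints: string[]; // execution constraints, e.g. "read-only"
}

// Hub-and-spoke: the Coordinator is on one end of every message,
// so specialists never hand work to each other directly.
function isRoutable(msg: AgentMessage): boolean {
  if (msg.from === msg.to) return false;
  return msg.from === "coordinator" || msg.to === "coordinator";
}

// Render the envelope as a compact, structured prompt for the recipient.
function renderMessage(msg: AgentMessage): string {
  const files = msg.files
    .map((f) => (f.line === undefined ? f.path : `${f.path}:${f.line}`))
    .join(", ");
  return [
    `FROM: ${msg.from}`,
    `TO: ${msg.to}`,
    `TASK: ${msg.task}`,
    `FILES: ${files || "(none)"}`,
    `CONSTRAINTS: ${msg.constraints.join("; ") || "(none)"}`,
  ].join("\n");
}

const example: AgentMessage = {
  from: "coordinator",
  to: "researcher",
  task: "Trace null handling in the /api/export route",
  files: [{ path: "src/routes/export.ts", line: 45 }],
  constraints: ["read-only"],
};
console.log(renderMessage(example));
```

Keeping every hop through the Coordinator is a design choice. It makes the Coordinator the single place where task state and routing decisions live, at the cost of one extra hop per hand-off.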
The iterative review loop is what distinguishes Agyn from pipeline-based multi-agent systems. When the Reviewer identifies failures — test regressions, unmet requirements, logical errors — it produces specific feedback that flows back through the Coordinator to the Researcher, who re-analyzes the relevant code sections. The Implementer then creates targeted fixes rather than rewriting from scratch. This loop continues until the Reviewer passes the changes or an iteration limit is reached. This progressive refinement mimics the PR review cycle in real teams and prevents the compounding errors that plague single-pass approaches.
Sandbox isolation is the third pillar. Each agent operates within a bounded environment with defined resource constraints and repository access limits. This prevents agents from making unintended modifications and enables safe experimentation (e.g., the Researcher can explore code paths without side effects). It also ensures that only the Implementer's final, reviewed changes are applied. Structured communication replaces the lossy context passing of monolithic prompts: each message carries task parameters, file references, and execution constraints.
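A minimal sketch of per-role sandbox permissions follows. It assumes three action types (`read`, `write`, `runTests`) and a single repository root per sandbox. The role-to-permission mapping follows the mandates in the steps below: a non-coding Coordinator, a read-only Researcher, a writing Implementer, and a test-running Reviewer. The `isAllowed` function and its path checks are illustrative and are not Agyn's actual enforcement mechanism.

```typescript
type Role = "coordinator" | "researcher" | "implementer" | "reviewer";
type Action = "read" | "write" | "runTests";

// Per-role permissions mirroring each agent's mandate (an illustrative mapping).
const PERMISSIONS: Record<Role, readonly Action[]> = {
  coordinator: [],                            // orchestrates; touches no files
  researcher: ["read"],                       // read-only exploration
  implementer: ["read", "write", "runTests"], // writes and tests the patch
  reviewer: ["read", "runTests"],             // reads the diff, runs the suite
};

// Decide whether a role may perform an action on a path inside its sandbox.
function isAllowed(role: Role, action: Action, path: string, repoRoot: string): boolean {
  if (!PERMISSIONS[role].includes(action)) return false;
  // Reject ".." segments so a path cannot climb out of the repository.
  // Symlinks are not handled; a real sandbox would enforce this at the OS level.
  if (path.split("/").includes("..")) return false;
  return path === repoRoot || path.startsWith(repoRoot + "/");
}

console.log(isAllowed("researcher", "write", "/repo/src/app.ts", "/repo")); // false
```

A lexical check like this is only a first line of defense. The isolation described above comes from running each agent in its own environment, with this kind of policy applied at the tool boundary.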
Receive and analyze the issue: Parse the user's request into a concrete issue specification with success criteria, target repository paths, and constraints. If the user provides a GitHub issue URL, fetch its details. Identify whether this is a bug fix, feature implementation, or refactoring task.
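One way to represent the parsed request is a small spec object. The sketch below assumes a hypothetical `IssueSpec` shape and extracts owner, repo, and issue number from a standard GitHub issue URL. It then uses a crude keyword heuristic to label the task as a bug, feature, or refactor. In practice the Coordinator would make that call with the model and fill in success criteria and target paths itself. The helper names are illustrative.

```typescript
type TaskKind = "bug" | "feature" | "refactor";

interface GitHubIssueRef {
  owner: string;
  repo: string;
  number: number;
}

// Hypothetical shape for the Coordinator's issue specification.
interface IssueSpec {
  summary: string;
  kind: TaskKind;
  successCriteria: string[]; // filled in by the Coordinator
  targetPaths: string[];     // filled in once research identifies them
  constraints: string[];
  githubIssue?: GitHubIssueRef;
}

const ISSUE_URL = /github\.com\/([\w.-]+)\/([\w.-]+)\/issues\/(\d+)/;

// Extract owner/repo/number from a standard GitHub issue URL, if present.
function parseIssueUrl(text: string): GitHubIssueRef | undefined {
  const m = ISSUE_URL.exec(text);
  return m ? { owner: m[1], repo: m[2], number: Number(m[3]) } : undefined;
}

// Crude keyword heuristic; a real Coordinator would let the model decide.
function classify(request: string): TaskKind {
  const text = request.toLowerCase();
  if (/\b(refactor|migrate|restructure)\b/.test(text)) return "refactor";
  if (/\b(bug|error|crash|fails?|broken|regression)\b/.test(text)) return "bug";
  return "feature";
}

// First draft of the spec; criteria and paths start empty on purpose.
function draftIssueSpec(request: string, constraints: string[] = []): IssueSpec {
  return {
    summary: request.trim().split("\n")[0],
    kind: classify(request),
    successCriteria: [],
    targetPaths: [],
    constraints,
    githubIssue: parseIssueUrl(request),
  };
}

console.log(classify("Refactor the authentication module to use JWT")); // refactor
```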
Spawn the Coordinator agent: Create a team using TeamCreate and launch a general-purpose agent as the Coordinator. The Coordinator owns the task list, decomposes the problem, and manages all inter-agent routing. It does NOT write code — it orchestrates.
Launch the Researcher agent in an Explore role: Spawn an Explore-type agent (read-only) tasked with codebase analysis. The Researcher's mandate is: identify relevant files, trace execution paths, map dependencies, understand existing patterns, and produce a structured research report. Give it the issue description and any known entry points.
Collect research findings into a task specification: The Coordinator receives the Researcher's report — a list of relevant files, root cause analysis (for bugs), or design context (for features) — and produces a precise task specification. This spec includes: files to modify, the expected behavior, constraints to respect, and test commands to validate.
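The hand-off from research report to task specification can be sketched as a pure function. The `ResearchReport` and `TaskSpec` shapes below are assumptions modeled on the fields listed above: relevant files, root cause or design context, files to modify, expected behavior, constraints, and test commands. The paper's actual formats may differ. The usage encodes Example 1 from this skill as data.

```typescript
// Hypothetical shapes; the paper's report and spec formats may differ.
interface ResearchReport {
  relevantFiles: string[];
  fixLocations: string[]; // "path:line" locations the Coordinator chose to act on
  testGaps: string[];     // test files missing coverage for the issue
  rootCause?: string;     // present for bug fixes
  designContext?: string; // present for features and refactors
}

interface TaskSpec {
  filesToModify: string[];
  expectedBehavior: string;
  constraints: string[];
  testCommands: string[];
}

function toTaskSpec(
  report: ResearchReport,
  expectedBehavior: string,
  testCommands: string[],
): TaskSpec {
  // Strip ":line" suffixes and de-duplicate so the Implementer gets a clean file list.
  const files = [...report.fixLocations, ...report.testGaps].map((loc) => loc.replace(/:\d+$/, ""));
  const constraints = ["Keep the diff minimal", "Follow existing patterns in the listed files"];
  if (report.rootCause) constraints.push(`Address the root cause: ${report.rootCause}`);
  if (report.designContext) constraints.push(`Respect the design context: ${report.designContext}`);
  return {
    filesToModify: Array.from(new Set(files)),
    expectedBehavior,
    constraints,
    testCommands,
  };
}

// Example 1 from this skill, expressed as data.
const spec = toTaskSpec(
  {
    relevantFiles: ["src/routes/export.ts", "src/services/exporter.ts", "src/utils/date.ts"],
    fixLocations: ["src/utils/date.ts:23"],
    testGaps: ["tests/export.test.ts"],
    rootCause: "formatTimestamp() calls .toISOString() on null",
  },
  "Exports with null timestamps succeed instead of returning a 500",
  ['npm test -- --grep "export"'],
);
console.log(spec.filesToModify.join(", ")); // src/utils/date.ts, tests/export.test.ts
```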
Launch the Implementer agent: Spawn a general-purpose agent (with write access) that receives the task specification and research findings. The Implementer writes code changes, runs tests in its sandbox, and produces a candidate patch. It should NOT explore the codebase broadly — that was the Researcher's job.
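Because the Implementer is meant to stay within the task specification, the Coordinator can cheaply check a candidate patch before sending it to review. This sketch assumes the patch arrives as `git diff` output in unified format. It reports any changed file that the spec did not list. `changedFiles` and `outOfScope` are hypothetical helpers, and the sample diff is invented.

```typescript
// Paths of files changed in a unified diff (as printed by `git diff`).
// Deleted files appear as "+++ /dev/null" and are deliberately not reported.
function changedFiles(diff: string): string[] {
  const files: string[] = [];
  for (const line of diff.split(/\r?\n/)) {
    const m = /^\+\+\+ b\/(.+)$/.exec(line);
    if (m) files.push(m[1]);
  }
  return files;
}

// Files the patch touches that the task specification did not authorize.
function outOfScope(diff: string, filesToModify: string[]): string[] {
  const allowed = new Set(filesToModify);
  return changedFiles(diff).filter((f) => !allowed.has(f));
}

const sampleDiff = [
  "diff --git a/src/utils/date.ts b/src/utils/date.ts",
  "--- a/src/utils/date.ts",
  "+++ b/src/utils/date.ts",
  "@@ -23 +23,2 @@",
  "+  if (value === null) return null;",
  "   return value.toISOString();",
  "diff --git a/src/services/cache.ts b/src/services/cache.ts",
  "--- a/src/services/cache.ts",
  "+++ b/src/services/cache.ts",
  "@@ -1 +1 @@",
  "-export const TTL = 60;",
  "+export const TTL = 120;",
].join("\n");

console.log(outOfScope(sampleDiff, ["src/utils/date.ts", "tests/export.test.ts"]).join(", ")); // src/services/cache.ts
```

A non-empty result does not have to mean rejection. It is a signal for the Coordinator to ask why the extra file changed before spending a review iteration on it.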
Launch the Reviewer agent: Spawn a general-purpose agent to review the Implementer's changes. The Reviewer reads the diff, runs the test suite, checks alignment with the original issue requirements, and produces a structured review: PASS (with rationale) or FAIL (with specific issues and file:line references).
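The PASS/FAIL contract is easiest to enforce if the Reviewer replies in a fixed format. The plain-text format parsed below is an assumption and is not prescribed by the paper. The first line is `PASS` or `FAIL`, followed either by a rationale or by `- path:line: message` issue lines. The parser rejects a FAIL that cites no file:line location, because such feedback cannot be routed precisely.

```typescript
interface ReviewIssue {
  file: string;
  line: number;
  message: string;
}

type Review =
  | { verdict: "PASS"; rationale: string }
  | { verdict: "FAIL"; issues: ReviewIssue[] };

// Parse a reviewer reply in an assumed plain-text format:
//   PASS                       FAIL
//   <rationale ...>            - src/file.ts:12: what is wrong
function parseReview(reply: string): Review {
  const lines = reply.trim().split(/\r?\n/).map((l) => l.trim());
  const head = lines[0];
  if (head === "PASS") {
    return { verdict: "PASS", rationale: lines.slice(1).join(" ") };
  }
  if (head === "FAIL") {
    const issues: ReviewIssue[] = [];
    for (const l of lines.slice(1)) {
      const m = /^-\s*([^:\s]+):(\d+):\s*(.+)$/.exec(l);
      if (m) issues.push({ file: m[1], line: Number(m[2]), message: m[3] });
    }
    // A FAIL is only actionable if it points at concrete locations.
    if (issues.length === 0) throw new Error("FAIL review must cite at least one file:line issue");
    return { verdict: "FAIL", issues };
  }
  throw new Error(`Unrecognized verdict line: ${head}`);
}

const review = parseReview("FAIL\n- src/auth/refresh.ts:28: creates a session even for JWT users");
if (review.verdict === "FAIL") console.log(review.issues[0].file); // src/auth/refresh.ts
```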
Handle review feedback in an iterative loop: If the Reviewer returns FAIL, the Coordinator routes the specific feedback back. For understanding-related failures, re-engage the Researcher to gather additional context. For implementation-related failures, send targeted fix instructions to the Implementer. Each iteration should be narrowly scoped — fix the specific failure, don't rewrite.
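The Coordinator's routing decision can be sketched as follows. The split rule is an assumption made for illustration. An issue in a file that the task spec never listed is treated as a context gap and goes to the Researcher, as with the refresh endpoint in Example 3 below. Every other issue goes to the Implementer as a narrowly scoped fix. The sample issues are invented.

```typescript
interface ReviewIssue {
  file: string;
  line: number;
  message: string;
}

interface Routing {
  toResearcher: ReviewIssue[];  // context gaps: re-research before fixing
  toImplementer: ReviewIssue[]; // targeted fixes in already-researched files
}

// Assumed rule: an issue in a file the task spec never listed means the
// research missed context; anything else is an implementation slip.
function routeFeedback(issues: ReviewIssue[], specFiles: string[]): Routing {
  const known = new Set(specFiles);
  const routing: Routing = { toResearcher: [], toImplementer: [] };
  for (const issue of issues) {
    if (known.has(issue.file)) routing.toImplementer.push(issue);
    else routing.toResearcher.push(issue);
  }
  return routing;
}

// One narrowly scoped instruction per issue, so the next pass fixes rather than rewrites.
function fixInstructions(issues: ReviewIssue[]): string {
  return issues.map((i) => `Fix only ${i.file}:${i.line}: ${i.message}`).join("\n");
}

const routed = routeFeedback(
  [
    { file: "src/auth/refresh.ts", line: 28, message: "creates a session for JWT users" },
    { file: "src/auth/session.ts", line: 10, message: "missing type annotation" },
  ],
  ["src/auth/session.ts"],
);
console.log(routed.toResearcher.length, routed.toImplementer.length); // 1 1
```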
Cap iterations and converge: Set a maximum of 3 review iterations. If the patch passes, apply it. If the iteration limit is reached with a still-failing review, present the best attempt to the user with a summary of remaining issues and the Reviewer's final feedback.
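The loop and its cap can be sketched as a small driver function. Here `implement` and `review` are hypothetical synchronous stand-ins for messages to the Implementer and Reviewer agents; real agent calls would be asynchronous. "Best attempt" is assumed to mean the attempt with the fewest open review failures.

```typescript
// An attempt is a candidate patch plus the Reviewer's open failures (empty = PASS).
interface Attempt {
  patch: string;
  failures: string[];
}

type Outcome =
  | { status: "approved"; patch: string; iterations: number }
  | { status: "needs-human"; best: Attempt; iterations: number };

// Drive implement -> review until PASS or the cap is hit.
function reviewLoop(
  implement: (feedback: string[]) => string,
  review: (patch: string) => string[],
  maxIterations = 3,
): Outcome {
  if (maxIterations < 1) throw new Error("maxIterations must be at least 1");
  let feedback: string[] = [];
  let best: Attempt | undefined;
  for (let i = 1; i <= maxIterations; i++) {
    const patch = implement(feedback);
    const failures = review(patch);
    if (failures.length === 0) return { status: "approved", patch, iterations: i };
    // Remember the attempt with the fewest open failures to show the user.
    if (best === undefined || failures.length < best.failures.length) {
      best = { patch, failures };
    }
    feedback = failures; // only the specific failures go back, not a fresh brief
  }
  // The first failing pass always sets `best`, so it is defined here.
  return { status: "needs-human", best: best!, iterations: maxIterations };
}

let round = 0;
const outcome = reviewLoop(
  () => `patch-v${++round}`,
  (patch) => (patch === "patch-v2" ? [] : ["rate limit key uses IP only"]),
);
console.log(outcome.status, outcome.iterations); // approved 2
```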
Aggregate results and clean up: Collect the final diff, test results, and a summary of the methodology (research findings, implementation decisions, review outcomes). Shut down all agents gracefully using SendMessage with shutdown_request. Clean up the team with TeamDelete.
Present the solution: Show the user the final changes with context: what was found during research, what approach was chosen and why, what the Reviewer validated, and any caveats.
Example 1: Fixing a bug reported in a GitHub issue
User: "Resolve this issue: users report that the /api/export endpoint returns
a 500 error when the dataset has null values in the 'timestamp' column."
Approach:
1. Coordinator creates tasks: research the export endpoint, identify null handling,
   implement fix, validate with tests.
2. Researcher (Explore agent) traces the /api/export route:
   - Finds route handler in src/routes/export.ts:45
   - Follows data flow to src/services/exporter.ts:112, where timestamps are
     formatted without null checks
   - Identifies that formatTimestamp() in src/utils/date.ts:23 throws on null input
   - Notes existing test file: tests/export.test.ts (no null-value test case)
   Research report:
   - Root cause: formatTimestamp() at src/utils/date.ts:23 calls .toISOString()
     on null
   - Fix location: src/utils/date.ts:23 or src/services/exporter.ts:112
   - Related pattern: other formatters in date.ts handle null (see formatDate:31)
   - Test gap: no null-column test in tests/export.test.ts
3. Coordinator produces task spec:
   - Add null guard in formatTimestamp() following existing pattern from formatDate()
   - Add test case for null timestamp column in export test suite
   - Run: npm test -- --grep "export"
4. Implementer adds null check and test case, runs tests: all pass.
5. Reviewer validates:
   - Diff is minimal and follows existing patterns: PASS
   - New test covers the exact failure scenario: PASS
   - No regressions in test suite: PASS
   - Review: APPROVED
Output: Two-file patch (src/utils/date.ts, tests/export.test.ts) with explanation.
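For reference, the fix in Example 1 might look like the following. This is a hypothetical reconstruction because the example does not show the real signatures of `formatTimestamp()` and `formatDate()`. Both are assumed to take `Date | null` and return `string | null`. `exportRows` is a simplified stand-in for the exporter at src/services/exporter.ts.

```typescript
// Hypothetical reconstruction of src/utils/date.ts; Date | null in and
// string | null out are assumptions, not the project's real signatures.

// Existing formatter that already handles null (the pattern to copy).
function formatDate(value: Date | null): string | null {
  if (value === null) return null;
  return value.toISOString().slice(0, 10);
}

// Fixed formatter: the null guard mirrors formatDate(). Before the fix,
// calling value.toISOString() on null threw a TypeError at runtime.
function formatTimestamp(value: Date | null): string | null {
  if (value === null) return null;
  return value.toISOString();
}

interface ExportRow {
  id: number;
  timestamp: Date | null;
}

// Simplified stand-in for the exporter: null timestamps become empty cells.
function exportRows(rows: ExportRow[]): string[] {
  return rows.map((row) => `${row.id},${formatTimestamp(row.timestamp) ?? ""}`);
}

// The regression case the new test should pin down: a null timestamp column.
console.log(exportRows([{ id: 1, timestamp: null }, { id: 2, timestamp: new Date(0) }]).join("\n"));
// prints "1," then "2,1970-01-01T00:00:00.000Z"
```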
Example 2: Implementing a new feature with unfamiliar codebase
User: "Add rate limiting to all public API endpoints. Use a sliding window
algorithm. I want it configurable per-route."
Approach:
1. Coordinator identifies subtasks: research API routing layer, research
   existing middleware patterns, implement rate limiter, implement per-route
   config, review.
2. Researcher analyzes:
   - Middleware chain in src/middleware/index.ts: finds auth, logging, cors
   - Route registration in src/routes/index.ts: finds registerPublicRoutes()
   - Config pattern in src/config/index.ts: env-based with defaults
   - Existing Redis connection in src/services/redis.ts (available for state)
   Research report:
   - Middleware pattern: export function, register in chain at src/middleware/index.ts:8
   - Config pattern: add to src/config/index.ts with RATE_LIMIT_ prefix
   - Redis client available at src/services/redis.ts:getClient()
   - Public routes identifiable via registerPublicRoutes() in src/routes/index.ts:34
   - Per-route config: recommend decorator/metadata pattern matching existing
     route options at src/routes/types.ts:12
3. Implementer creates:
   - src/middleware/rateLimit.ts: sliding window using Redis sorted sets
   - Adds route-level config option to src/routes/types.ts
   - Registers middleware in src/middleware/index.ts
   - Adds config defaults in src/config/index.ts
   - Writes tests in tests/middleware/rateLimit.test.ts
4. Reviewer flags: "Rate limit key uses IP only. Behind a load balancer
   this will limit all users together. Should use X-Forwarded-For with fallback."
5. Iteration 2: Implementer fixes key extraction to use the X-Forwarded-For header
   with a configurable trust proxy setting.
6. Reviewer: APPROVED. All tests pass, and the pattern matches existing middleware.
Output: Multi-file patch with rate limiting middleware, per-route configuration,
and test suite.
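Here is a minimal sketch of the two pieces the Reviewer cared about: the sliding-window check and the client key. It keeps each key's request timestamps in an in-memory `Map` instead of the Redis sorted sets used in the example. A sorted set plays the same role, with the timestamp as the score. `RouteLimit`, `clientKey`, `checkRequest`, and the `TRUST_PROXY` flag are hypothetical names, and the limit values are illustrative.

```typescript
interface RouteLimit {
  limit: number;    // max requests allowed inside the window
  windowMs: number; // window length in milliseconds
}

// Sliding-window log kept in memory. The example stores the same per-key
// timestamp log in Redis sorted sets (score = timestamp); a Map keeps this runnable.
class SlidingWindowLimiter {
  private readonly log = new Map<string, number[]>();

  allow(key: string, cfg: RouteLimit, now: number = Date.now()): boolean {
    const cutoff = now - cfg.windowMs;
    // Keep only requests still inside the window ending at `now`.
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < cfg.limit;
    if (allowed) recent.push(now); // rejected requests are not recorded
    this.log.set(key, recent);
    return allowed;
  }
}

// Client identity after the iteration-2 fix. X-Forwarded-For is trusted only
// when trustProxy is on, because clients can forge it unless the proxy in
// front of the service overwrites it.
function clientKey(
  headers: Record<string, string | undefined>,
  remoteAddress: string,
  trustProxy: boolean,
): string {
  const forwarded = headers["x-forwarded-for"];
  if (trustProxy && forwarded) {
    const first = forwarded.split(",")[0].trim(); // leftmost = originating client
    if (first) return first;
  }
  return remoteAddress;
}

// Per-route limits with a fallback default (hypothetical values and flag).
const TRUST_PROXY = true;
const DEFAULT_LIMIT: RouteLimit = { limit: 100, windowMs: 60_000 };
const ROUTE_LIMITS: Partial<Record<string, RouteLimit>> = {
  "/api/export": { limit: 2, windowMs: 1_000 },
};

const limiter = new SlidingWindowLimiter();

function checkRequest(
  route: string,
  headers: Record<string, string | undefined>,
  remoteAddress: string,
  now: number,
): boolean {
  const cfg = ROUTE_LIMITS[route] ?? DEFAULT_LIMIT;
  // Key by route and client so each route keeps its own window.
  return limiter.allow(`${route}|${clientKey(headers, remoteAddress, TRUST_PROXY)}`, cfg, now);
}
```

Because rejected requests are not recorded, a client that keeps retrying does not extend its own lockout. Counting rejected requests instead is an equally valid, stricter policy.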
Example 3: Coordinating a refactoring task
User: "Refactor the authentication module to use JWT instead of session cookies.
Don't break existing endpoints."
Approach:
1. Coordinator creates phased plan: research current auth flow, research all
   consumers, implement JWT auth, implement backward compatibility, review.
2. Researcher maps the full auth surface:
   - Session creation in src/auth/session.ts
   - 14 endpoints that read req.session.user
   - 3 middleware functions checking session validity
   - Frontend cookie handling in src/client/api.ts
   - Test fixtures using session mocking in tests/helpers/auth.ts
3. Coordinator creates a task spec prioritizing backward compatibility:
   - Support both JWT (Authorization header) and session cookies during migration
   - New JWT middleware that falls back to session check
4. Implementer creates JWT utilities, dual-mode auth middleware, and updated
   test helpers, then runs the full test suite.
5. Reviewer catches: "The refresh token endpoint still creates a session object.
   This will leak sessions even for JWT-authenticated users."
6. Iteration 2: Researcher confirms the refresh endpoint at src/auth/refresh.ts:28
   creates sessions unconditionally. Implementer adds a conditional check.
7. Reviewer: APPROVED after verifying all 14 endpoints work with both auth methods.
Output: Comprehensive refactoring patch with dual-mode authentication,
no breaking changes, and full test coverage.
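Below is a sketch of the dual-mode middleware and the iteration-2 fix. `verifyJwt` is a hypothetical stand-in for a real JWT library's verification call, and the request shape is simplified. The sketch shows two things: the order of checks (Bearer token first, session cookie as fallback), and recording which method authenticated the caller. The refresh endpoint uses that record to skip session creation for JWT users.

```typescript
interface User {
  id: string;
}

// Simplified request shape (hypothetical; a real framework's type differs).
interface AuthRequest {
  headers: Record<string, string | undefined>;
  session?: { user?: User };
  user?: User;
  authMethod?: "jwt" | "session";
}

// Stand-in for a JWT library's verify call: returns the user or null.
type VerifyJwt = (token: string) => User | null;

// Dual-mode auth: a Bearer JWT is checked first; the session cookie is the fallback.
function authenticate(req: AuthRequest, verifyJwt: VerifyJwt): boolean {
  const header = req.headers["authorization"];
  if (header !== undefined && header.startsWith("Bearer ")) {
    const user = verifyJwt(header.slice("Bearer ".length));
    if (user === null) return false; // invalid token: reject, do not fall back
    req.user = user;
    req.authMethod = "jwt";
    return true;
  }
  const sessionUser = req.session?.user;
  if (sessionUser !== undefined) {
    req.user = sessionUser;
    req.authMethod = "session";
    return true;
  }
  return false;
}

// Iteration-2 fix: the refresh endpoint creates a session only for callers
// that authenticated with one, so JWT users no longer leak sessions.
function shouldCreateSessionOnRefresh(req: AuthRequest): boolean {
  return req.authMethod === "session";
}

const demoVerify: VerifyJwt = (token) => (token === "valid-token" ? { id: "u1" } : null);
const req: AuthRequest = { headers: { authorization: "Bearer valid-token" } };
console.log(authenticate(req, demoVerify), shouldCreateSessionOnRefresh(req)); // true false
```

One design choice here is that a request presenting an invalid Bearer token is rejected instead of falling back to its session cookie. A bad token therefore cannot be masked by a stale session.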
Do:
Avoid:
SendMessage to the specific agent that needs to act.
Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering (Benkovich & Valkov, 2026). Focus on Section 3 (System Architecture) for agent role definitions and communication protocols, and Section 4 (Development Methodology) for the iterative review loop that drives the 72.2% SWE-bench resolution rate.