skills/fipa-00023-agent-management/SKILL.md
FIPA standard for agent platform management including lifecycle, directory, and communication services
metadata:
  name: fipa-agent-management
  version: XC00023H
  description: >
    Canonical patterns for naming, registering, lifecycling, and managing
    autonomous agents in multi-agent systems. Derived from the FIPA Agent
    Management Specification, the normative standard for interoperable
    agent infrastructure.
  activation_triggers:
    - designing multi-agent orchestration systems
    - implementing agent discovery or capability routing
    - building skill registries or service directories
    - handling agent failures or retry logic in distributed systems
    - designing agent identity, addressing, or transport layers
    - modeling agent state or lifecycle transitions
    - building federated or hierarchical agent platforms
IF building single-organization agent platform (< 50 agents):
├─ Use single AMS + single DF
├─ Direct resolution of AIDs to transport addresses
└─ Flat capability search across all agents
ELSE IF building multi-organization/federated system:
├─ Use federated DF architecture
├─ Each organization runs own AMS/DF pair
├─ DFs register with parent/peer DFs
└─ Search propagates with max-depth bounds
IF agent needs specific capability:
├─ Query local DF first with service description
├─ IF no results AND federated topology exists:
│ └─ Propagate search to federated DFs (depth ≤ 3)
└─ ELSE IF no results in flat topology:
   └─ Return "capability not available" (don't retry)
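The capability-search branch can be sketched as a recursive DF lookup. This is a minimal in-memory illustration, not a FIPA-compliant DF: the `DirectoryFacilitator` class, its flat key/value service descriptions, and the `max_depth` parameter (modeled loosely on FIPA's `max-depth` search constraint) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirectoryFacilitator:
    """Hypothetical in-memory DF: agent name -> flat service description."""
    name: str
    entries: dict[str, dict[str, str]] = field(default_factory=dict)
    federated: list[DirectoryFacilitator] = field(default_factory=list)

    def register(self, aid: str, description: dict[str, str]) -> None:
        self.entries[aid] = description

    def search(self, template: dict[str, str], max_depth: int = 3,
               visited: set[str] | None = None) -> list[str]:
        """Return AIDs whose description matches every key/value in `template`.

        The local registry is checked first. Only when it has no match is the
        query forwarded to federated DFs, with max_depth reduced by one per hop.
        """
        visited = set() if visited is None else visited
        if self.name in visited:
            return []  # reached again through a federation cycle
        visited.add(self.name)

        local = [aid for aid, desc in self.entries.items()
                 if all(desc.get(k) == v for k, v in template.items())]
        if local or max_depth <= 0:
            return local  # local hit, or depth budget exhausted

        results: list[str] = []
        for peer in self.federated:
            results.extend(peer.search(template, max_depth - 1, visited))
        return results
```

With `max_depth=3`, a query reaches DFs at most three hops from the originating one, and the shared `visited` set keeps a peer-to-peer federation cycle from re-querying the same DF.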
IF agent joins platform:
├─ Register with AMS first (establish identity + lifecycle state)
├─ THEN register capabilities with DF
└─ Registration order matters: existence before capabilities
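A minimal sketch of the registration order, assuming two separate in-memory registries. `AMS`, `DF`, `join_platform`, and the exception classes are hypothetical names; a real platform would exchange FIPA `register` requests with the AMS and DF agents rather than call methods directly. New agents are set straight to Active here as a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class NotRegistered(Exception):
    """Raised when an operation references an agent the AMS does not know."""


class AlreadyRegistered(Exception):
    """Raised when the same identity is registered twice."""


@dataclass
class AMS:
    """White pages: agent existence and lifecycle state only."""
    states: dict[str, str] = field(default_factory=dict)

    def register(self, aid: str) -> None:
        if aid in self.states:
            raise AlreadyRegistered(aid)
        self.states[aid] = "Active"  # simplification: skip any initial state

    def set_state(self, aid: str, state: str) -> None:
        if aid not in self.states:
            raise NotRegistered(aid)
        self.states[aid] = state


@dataclass
class DF:
    """Yellow pages: capabilities only; reads the AMS just to check existence."""
    ams: AMS
    services: dict[str, set[str]] = field(default_factory=dict)

    def register(self, aid: str, service_type: str) -> None:
        # Existence before capabilities: refuse agents the AMS has never seen.
        if aid not in self.ams.states:
            raise NotRegistered(aid)
        self.services.setdefault(service_type, set()).add(aid)


def join_platform(ams: AMS, df: DF, aid: str, service_types: list[str]) -> None:
    """Hypothetical join sequence: identity with the AMS, then capabilities with the DF."""
    ams.register(aid)
    for service_type in service_types:
        df.register(aid, service_type)
```

Because the DF only reads the AMS to check existence, suspending an agent changes one AMS record and leaves its capability registrations untouched.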
IF receive typed FIPA exception:
├─ unsupported/unrecognised → route to different agent type
├─ missing/malformed → fix request parameters, retry once
├─ unauthorised → escalate to platform admin
├─ not-registered → agent reference stale, re-discover via DF
├─ already-registered → idempotent operation, continue
└─ internal-error → exponential backoff retry (max 3 attempts)
ELSE IF receive non-FIPA error or timeout:
├─ Mark agent as Unknown in AMS
├─ Trigger platform-level health check
└─ Do not retry until health restored
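The exception branches can be expressed as a lookup table plus one retry loop. In this sketch `FipaError`, `ACTIONS`, and `call_with_recovery` are hypothetical; the `kind` strings mirror the exception names used in the tree above, and Python's `TimeoutError`/`ConnectionError` stand in for non-FIPA transport failures.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class FipaError(Exception):
    """Hypothetical exception carrying a FIPA exception name, e.g. 'not-registered'."""

    def __init__(self, kind: str) -> None:
        super().__init__(kind)
        self.kind = kind


# Recovery action per exception kind, following the decision tree.
ACTIONS = {
    "unsupported": "reroute",
    "unrecognised": "reroute",
    "missing": "fix-params-retry-once",
    "unauthorised": "escalate",
    "not-registered": "rediscover",
    "already-registered": "continue",
    "internal-error": "backoff",
}


def call_with_recovery(
    send: Callable[[], str],
    mark_unknown: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Optional[str], str]:
    """Return (result, action). Only internal-error is retried here."""
    for attempt in range(max_attempts):
        try:
            return send(), "ok"
        except FipaError as err:
            action = ACTIONS.get(err.kind, "escalate")  # unlisted kinds: escalate
            if action != "backoff":
                return None, action  # not transient: caller applies the action
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1 s, then 0.2 s
        except (TimeoutError, ConnectionError):
            # Non-FIPA failure: flag the agent and stop until health is restored.
            mark_unknown()
            return None, "health-check"
    return None, "give-up"
```

Only `internal-error` loops; every other kind returns its action immediately, so the caller decides how to reroute, rediscover via the DF, or correct parameters before its single retry. Injecting `sleep` keeps the backoff testable.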
IF need to send request to agent:
├─ Query AMS for current lifecycle state
├─ IF state = Active: send immediately
├─ IF state = Waiting: send with acknowledgment protocol
├─ IF state = Suspended: queue or route to alternative
└─ IF state = Unknown/Transit: wait for state resolution
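A sketch of the state check before sending, assuming the AMS state table has already been fetched into a dict. `Dispatcher` and its return strings are hypothetical; a real client would query the AMS at send time, and the Suspended branch could equally route to an alternative provider.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical sender that consults an AMS state table before every send."""
    states: dict[str, str]  # AID -> lifecycle state, as reported by the AMS
    sent: list[tuple[str, str, bool]] = field(default_factory=list)  # (aid, msg, ack_required)
    held: deque[tuple[str, str]] = field(default_factory=deque)      # messages not yet sent

    def send(self, aid: str, message: str) -> str:
        state = self.states.get(aid, "Unknown")  # no AMS record: treat as Unknown
        if state == "Active":
            self.sent.append((aid, message, False))
            return "sent"
        if state == "Waiting":
            # Waiting agents may not respond promptly: require an acknowledgment.
            self.sent.append((aid, message, True))
            return "sent-with-ack"
        if state == "Suspended":
            self.held.append((aid, message))  # queue (or reroute) instead of timing out
            return "queued"
        # Unknown / Transit: hold until the AMS reports a resolved state.
        self.held.append((aid, message))
        return "deferred"
```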
IF agent reports failure/becomes unresponsive:
├─ AMS transitions agent to Unknown state
├─ DF keeps capability registrations (for recovery)
├─ Platform stops routing new requests to agent
└─ Initiate recovery protocol (restart/reregister)
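The failure-handling steps can be sketched with the AMS and DF kept as separate tables. `Platform`, `on_failure`, `on_recovered`, and `providers` are hypothetical names; the point is that a failure touches only the AMS record, while routing filters DF results through the AMS state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical platform view with AMS states and DF registrations kept apart."""
    ams_state: dict[str, str] = field(default_factory=dict)          # AID -> state
    df_services: dict[str, list[str]] = field(default_factory=dict)  # service -> AIDs

    def on_failure(self, aid: str) -> None:
        # Only the AMS record changes; DF registrations are kept for recovery.
        self.ams_state[aid] = "Unknown"

    def on_recovered(self, aid: str) -> None:
        self.ams_state[aid] = "Active"

    def providers(self, service: str) -> list[str]:
        """Route new requests only to providers the AMS reports as Active."""
        return [aid for aid in self.df_services.get(service, [])
                if self.ams_state.get(aid) == "Active"]
```

Because the failed agent's DF entry survives, recovery does not need to rebuild capability records, and while the agent is Unknown, `providers` still returns any other Active agent registered for the same service.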
Symptoms: Direct calls to http://agent.host:8080, hard-coded URLs in routing logic, broken references when agents move
Detection Rule: If you store transport addresses as permanent identifiers, you have this anti-pattern
Fix: Store AIDs only; resolve transport addresses via AMS at invocation time
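A minimal illustration of the fix, assuming an AMS-like lookup table. `AID`, `AddressBook`, and `invoke` are hypothetical; real FIPA agent identifiers can also carry `addresses` and `resolvers`, which this sketch omits so that the stored identifier contains no transport detail. `invoke` only formats a request line instead of performing network I/O.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AID:
    """Hypothetical agent identifier: a stable name with no transport address."""
    name: str


class AddressBook:
    """Stand-in for AMS address resolution; entries change when agents migrate."""

    def __init__(self) -> None:
        self._addresses: dict[str, str] = {}

    def update(self, aid: AID, address: str) -> None:
        self._addresses[aid.name] = address

    def resolve(self, aid: AID) -> str:
        return self._addresses[aid.name]


def invoke(book: AddressBook, aid: AID, payload: str) -> str:
    # Resolve at call time and never persist the returned address.
    address = book.resolve(aid)
    return f"POST {address} {payload}"
```

Callers hold only the `AID`; when the agent migrates and its entry is updated, the next `invoke` picks up the new address without any client change.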
Symptoms: Single table/store tracking both agent existence and capabilities, lifecycle changes break service discovery
Detection Rule: If updating agent state requires touching capability records, you've collapsed the registries
Fix: Separate AMS (white pages) from DF (yellow pages); independent update operations
Symptoms: Generic error handling that retries all failures identically, no differentiation between temporary vs permanent failures
Detection Rule: If your retry logic doesn't branch on failure type, you're in blind retry
Fix: Parse FIPA exception types; different recovery strategies per exception class
Symptoms: Sending requests without checking agent lifecycle state, timeouts on suspended agents
Detection Rule: If you invoke agents without AMS state check, you're state-blind
Fix: Query lifecycle state before invocation; handle each state appropriately
Symptoms: Single central DF handling all capability queries, search latency increases with agent count
Detection Rule: If all capability searches hit the same registry, you have a bottleneck
Fix: Implement federated DF architecture with bounded search propagation
Scenario: Agent in Organization A needs text-translation capability, which exists in Organization B's platform.
Step-by-step execution:
QUERY-REF(service-type=translation, input-lang=en, output-lang=es)
…, service-description=...
Novice mistakes: Querying Organization B directly, storing B's transport address permanently, not using AID resolution
Expert insights: Federated search is bounded (prevents query storms), AID resolution happens at invocation time (handles agent migration), performative choice signals intent clearly
Scenario: Active translation agent crashes mid-task, client needs to detect failure and recover.
Step-by-step execution:
not-registered exception from message transport
Novice mistakes: Infinite retry without state checking, assuming agent death means capability unavailable
Expert insights: AMS state change is separate from DF re-registration, multiple agents can offer same capability, exception type determines recovery strategy
Scenario: Agent platform adds MQTT transport alongside existing HTTP, agent needs to be reachable via both protocols.
Step-by-step execution:
…, addresses=[http://host:8080/agent, mqtt://broker:1883/agent]
Novice mistakes: Creating separate agent identities per protocol, forcing all clients to update references
Expert insights: Identity stability across transport changes, AID supports multiple addresses natively, transport choice happens at resolution time
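Transport selection at resolution time can be sketched as a preference-ordered scan over an AID's addresses. The `AID` dataclass and `pick_address` function are hypothetical; the addresses match the scenario above, and matching on URL scheme via `urllib.parse.urlparse` is an assumption about how a client would tell transports apart.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AID:
    """Hypothetical AID: one stable name, any number of transport addresses."""
    name: str
    addresses: tuple[str, ...]


def pick_address(aid: AID, supported: list[str]) -> str | None:
    """Choose a transport at resolution time, in the caller's preference order.

    `supported` lists URL schemes the caller can speak, most preferred first.
    Returns None when no address uses a supported scheme.
    """
    for scheme in supported:
        for address in aid.addresses:
            if urlparse(address).scheme == scheme:
                return address
    return None
```

An HTTP-only client and an MQTT-capable client share the same AID; each picks its own transport, and adding another protocol only means appending an address.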
Do NOT use this skill for:
This skill covers infrastructure-level coordination only: how agents find each other, communicate, and manage shared platform resources. For everything that happens inside an individual agent, delegate to other skills.