skills/integration-testing/SKILL.md
Run end-to-end integration tests with all 15 agents and the supervisor in the local Docker Compose dev environment. Validates agent discovery, multi-agent routing, checkpoint persistence, and cross-agent follow-up conversations.
Run end-to-end integration tests against the full CAIPE multi-agent stack in the local Docker Compose dev environment.
Prerequisites:

- `docker-compose.dev.yaml` present in the repo root
- `.env` file configured with required API keys and agent enable flags
- MongoDB (`caipe-mongodb-dev`) running

Edit `.env` and ensure all agent flags are enabled:

```bash
ENABLE_ARGOCD=true
ENABLE_AWS=true
ENABLE_BACKSTAGE=true
ENABLE_CONFLUENCE=true
ENABLE_GITHUB=true
ENABLE_GITLAB=true
ENABLE_JIRA=true
ENABLE_KOMODOR=true
ENABLE_NETUTILS=true
ENABLE_PAGERDUTY=true
ENABLE_SLACK=true
ENABLE_SPLUNK=true
ENABLE_VICTOROPS=true
ENABLE_WEATHER=true
ENABLE_WEBEX=true
```
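A small preflight check can scan `.env` for all 15 flags before you bring the stack up, which catches a flag that was missed or misspelled. This is a minimal sketch and is not part of the repo. `check_agent_flags` and `AGENT_FLAGS` are hypothetical names, and the check assumes plain `KEY=value` lines without `export` prefixes or quotes.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repo).
# Assumes .env uses plain KEY=value lines (no "export", no quotes).

# The 15 agents listed in the .env block above.
AGENT_FLAGS="ARGOCD AWS BACKSTAGE CONFLUENCE GITHUB GITLAB JIRA KOMODOR NETUTILS PAGERDUTY SLACK SPLUNK VICTOROPS WEATHER WEBEX"

# Prints every ENABLE_<NAME> flag that is missing or not "true".
# Returns 1 if at least one flag needs attention, 0 otherwise.
check_agent_flags() {
  env_file="${1:-.env}"
  missing=0
  for name in $AGENT_FLAGS; do
    # Allow trailing whitespace (including a stray CR from Windows line endings).
    if ! grep -q "^ENABLE_${name}=true[[:space:]]*$" "$env_file"; then
      echo "ENABLE_${name} is not set to true"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the repo root, for example `check_agent_flags .env && echo "all agent flags enabled"`.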
Verify MongoDB checkpoint type:
```bash
grep "^LANGGRAPH_CHECKPOINT_TYPE=" .env
# Should output: LANGGRAPH_CHECKPOINT_TYPE=mongodb
```
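`grep` prints every matching line, so a duplicated `LANGGRAPH_CHECKPOINT_TYPE` entry is easy to overlook. The sketch below fails unless the key is defined exactly once with the expected value. `check_env_value` is a hypothetical helper, and it assumes simple `KEY=value` lines with optional surrounding quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): verify that KEY appears exactly
# once in an env file and that its value (quotes stripped) equals EXPECTED.
check_env_value() {
  env_file=$1 key=$2 expected=$3
  if [ ! -f "$env_file" ]; then
    echo "no such file: ${env_file}"
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps
  # the helper safe under "set -e".
  matches=$(grep -c "^${key}=" "$env_file" || true)
  if [ "$matches" -ne 1 ]; then
    echo "${key}: expected exactly 1 definition, found ${matches}"
    return 1
  fi
  # Drop "KEY=", a trailing CR, and one pair of surrounding double or single quotes.
  value=$(grep "^${key}=" "$env_file" | cut -d= -f2- | tr -d '\r' \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/")
  if [ "$value" != "$expected" ]; then
    echo "${key}=${value} (expected ${expected})"
    return 1
  fi
}
```

Usage: `check_env_value .env LANGGRAPH_CHECKPOINT_TYPE mongodb` prints nothing and returns 0 when the setting is correct.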
Bring up all containers:
```bash
IMAGE_TAG=latest docker compose -f docker-compose.dev.yaml up -d
```
Wait for agents to be healthy (agents take 15-30 seconds to initialize):
```bash
# Check all agent containers are running
docker ps --filter "name=agent-" --format "table {{.Names}}\t{{.Status}}" | sort
```
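You can also script the readiness check instead of reading the sorted table by eye. The sketch below reads output from `docker ps -a --filter "name=agent-" --format '{{.Names}}\t{{.Status}}'`. Without the `table` prefix there is no header row, and `-a` also lists agents that have exited. `agents_ready` is a hypothetical helper. The `(healthy)`, `(unhealthy)`, and `(health: starting)` suffixes only appear if the agent services define a Docker healthcheck.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): summarize agent readiness from
# "<name>\t<status>" lines on stdin. A container counts as ready when its status
# starts with "Up" and does not report "unhealthy" or "health: starting".
# Exits 0 only if the number of ready agents equals EXPECTED (default 15).
agents_ready() {
  expected=${1:-15}
  awk -F'\t' -v expected="$expected" '
    $2 ~ /^Up/ && $2 !~ /unhealthy|health: starting/ { ready++; next }
    NF { print "not ready: " $1 " (" $2 ")" }
    END {
      printf "%d/%d agents ready\n", ready, expected
      exit (ready + 0 == expected + 0 ? 0 : 1)
    }'
}
```

Usage: `docker ps -a --filter "name=agent-" --format '{{.Names}}\t{{.Status}}' | agents_ready 15`.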
Restart supervisor after agents are warm (avoids race condition where supervisor starts before agents are ready):
```bash
docker restart caipe-supervisor
```
Verify supervisor discovered all agents:
```bash
docker logs caipe-supervisor 2>&1 | grep -E "(ONLINE|subagents|tools)" | tail -5
```

Expected: `Deep agent updated with 15 tools and 15 subagents`
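With the default logging driver, `docker logs` also returns output from before the restart. An older discovery line with fewer subagents can therefore still appear in the log. The minimal sketch below checks only the most recent summary. It assumes the exact wording of the expected line above, and `check_discovery` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): read supervisor logs on stdin, take
# the LAST "Deep agent updated with N tools and M subagents" line, and check that
# both N and M equal EXPECTED (default 15).
check_discovery() {
  expected=${1:-15}
  line=$(grep -o 'Deep agent updated with [0-9][0-9]* tools and [0-9][0-9]* subagents' | tail -n 1)
  if [ -z "$line" ]; then
    echo "no discovery summary found"
    return 1
  fi
  # Fields: Deep(1) agent(2) updated(3) with(4) N(5) tools(6) and(7) M(8) subagents(9)
  tools=$(echo "$line" | awk '{print $5}')
  subagents=$(echo "$line" | awk '{print $8}')
  echo "tools=${tools} subagents=${subagents} expected=${expected}"
  [ "$tools" -eq "$expected" ] && [ "$subagents" -eq "$expected" ]
}
```

Usage: `docker logs caipe-supervisor 2>&1 | check_discovery 15`.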
Run the checkpoint validation script:
```bash
./skills/persistence/validate_agent_checkpoints.sh
```
This checks:
Open the CAIPE UI at http://localhost:3000 and test multi-agent routing:
| Test | Query | Expected Agents | Verification |
|------|-------|-----------------|--------------|
| AWS | "list EKS clusters" | supervisor → aws | checkpoints_aws has docs |
| ArgoCD | "show argocd version" | supervisor → argocd | checkpoints_argocd has docs |
| Jira | "show recent Jira issues" | supervisor → jira | checkpoints_jira has docs |
| Splunk | "check latest splunk logs" | supervisor → splunk | checkpoints_splunk has docs |
| Weather | "what's the weather in San Jose?" | supervisor → weather | checkpoints_weather has docs |
| Multi-agent | "list EKS clusters and show ArgoCD version" | supervisor → aws + argocd | Both checkpoint collections grow |
After each query, verify checkpoint writes:
```bash
docker exec caipe-mongodb-dev mongosh "mongodb://admin:changeme@localhost:27017/caipe?authSource=admin" --quiet --eval '
  db.getCollectionNames().filter(c => c.includes("checkpoint")).sort().forEach(function(c) {
    print(c + ": " + db.getCollection(c).countDocuments() + " docs");
  });
'
```
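The Verification column asks whether specific collections grew. One way to check this is to save the output of the command above before and after a query, then compare the counts. The sketch below assumes input lines in the `<collection>: <n> docs` format that the command prints. `checkpoint_growth` and the snapshot file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): compare two snapshots of
# "<collection>: <n> docs" lines. Prints "<collection> +<delta>" for every
# collection whose count increased (new collections count from 0). Any extra
# arguments name collections that MUST have grown; returns 1 if one did not.
checkpoint_growth() {
  before=$1 after=$2
  shift 2
  # FILENAME == ARGV[1] (rather than NR == FNR) still works when BEFORE is empty.
  grown=$(awk '
    FILENAME == ARGV[1] { sub(/:$/, "", $1); old[$1] = $2; next }
    { sub(/:$/, "", $1); if ($2 > old[$1] + 0) print $1 " +" ($2 - old[$1]) }
  ' "$before" "$after")
  if [ -n "$grown" ]; then echo "$grown"; fi
  for coll in "$@"; do
    # Trailing space avoids matching a longer collection name with the same prefix.
    if ! echo "$grown" | grep -q "^${coll} "; then
      echo "did not grow: ${coll}"
      return 1
    fi
  done
}
```

For the multi-agent row, save the count output to `before.txt`, send the query, and save the output again to `after.txt`. Then run `checkpoint_growth before.txt after.txt checkpoints_aws checkpoints_argocd`.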
```bash
docker restart caipe-supervisor agent-aws agent-argocd agent-jira
```
```bash
docker exec caipe-mongodb-dev mongosh "mongodb://admin:changeme@localhost:27017/caipe?authSource=admin" --quiet --eval '
  var colls = db.getCollectionNames().filter(c => c.startsWith("checkpoints_"));
  var threadMap = {};
  colls.forEach(function(coll) {
    db.getCollection(coll).distinct("thread_id").forEach(function(tid) {
      if (!threadMap[tid]) threadMap[tid] = [];
      threadMap[tid].push(coll);
    });
  });
  var shared = 0;
  Object.keys(threadMap).forEach(function(tid) {
    if (threadMap[tid].length > 1) {
      shared++;
      print("thread " + tid.substring(0,8) + "... → " + threadMap[tid].join(", "));
    }
  });
  if (shared > 0) {
    print(shared + " threads shared across collections (expected — supervisor forwards context_id)");
  } else {
    print("No shared threads — each agent is fully isolated");
  }
'
```
Shared thread IDs between supervisor and subagent collections are expected, because the supervisor forwards its `context_id` as the subagent's `thread_id`. What matters is that each agent's graph state is stored only in its own collection.
```bash
# Check agent logs for startup errors
docker logs agent-<name> 2>&1 | tail -20
# Verify the ENABLE flag in .env
grep "ENABLE_<NAME>" .env
# Restart supervisor after agent is running
docker restart caipe-supervisor
```

```bash
# Check for InMemorySaver fallback in logs
docker logs agent-<name> 2>&1 | grep -i "InMemorySaver\|checkpointer"
# Verify the agent imports get_checkpointer()
grep -r "get_checkpointer\|MemorySaver\|InMemorySaver" ai_platform_engineering/agents/<name>/
```

```bash
# Remove stale containers
docker rm -f agent-<name>
docker compose -f docker-compose.dev.yaml up -d agent-<name>
```
Agents take 15-30 seconds to initialize. If the supervisor starts first, it marks those agents as offline.
```bash
# Wait for all agents, then restart supervisor
sleep 30
docker restart caipe-supervisor
```
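A fixed `sleep 30` can be too short on a slow machine and wastes time on a fast one. A bounded polling loop is a common alternative. The sketch below is a hypothetical `wait_until` helper, not part of the repo, that retries any command until it succeeds or a timeout in seconds expires.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): run "$@" every INTERVAL seconds
# until it succeeds, giving up after TIMEOUT seconds.
# Usage: wait_until TIMEOUT INTERVAL command [args...]
wait_until() {
  timeout=$1 interval=$2
  shift 2
  start=$(date +%s)
  until "$@"; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

The following example assumes the agent services define a Docker healthcheck. Without one, the `health=healthy` filter matches nothing and the wait times out: `wait_until 120 5 sh -c '[ "$(docker ps --filter name=agent- --filter health=healthy -q | wc -l | tr -d " ")" -ge 15 ]' && docker restart caipe-supervisor`.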
```bash
# Full validation (run after stack is up)
./skills/persistence/validate_agent_checkpoints.sh

# Validate specific agents
./skills/persistence/validate_agent_checkpoints.sh aws jira argocd

# Check supervisor agent discovery
docker logs caipe-supervisor 2>&1 | grep -E "subagents|ONLINE|OFFLINE" | tail -20

# MongoDB checkpoint overview
docker exec caipe-mongodb-dev mongosh "mongodb://admin:changeme@localhost:27017/caipe?authSource=admin" --quiet --eval 'db.getCollectionNames().filter(c => c.includes("checkpoint")).sort().forEach(c => print(c + ": " + db.getCollection(c).countDocuments() + " docs"))'

# Watch agent logs in real-time
docker logs -f agent-aws 2>&1 | grep -i "checkpoint\|error"
```
- … `docker restart`, not rebuild
- The `validate_agent_checkpoints.sh` script is the single source of truth for checkpoint health
- Shared `thread_id` values across supervisor and subagent collections are expected behavior
- … `.env`