Local E2E Testing Guide

Run full end-to-end tests of the agent swarm locally with a real API server and Docker containers.

When to Use This Skill

This skill should be invoked in two modes:

User-requested QA: The user asks you to run E2E tests, verify a feature, or QA a specific flow. Follow the steps below targeting what they asked for.
Automated change verification: After implementing changes that touch the API, runner, polling, task lifecycle, session logs, Docker entrypoint, or worker/lead behavior — use this skill proactively to verify the changes work end-to-end. Determine what's testable based on the diff:
- Task lifecycle changes (poll, runner, store-progress): Create assigned + pool tasks, verify they complete and have correct logs
- Session log changes: Run two sequential tasks on the same agent, verify log isolation (unique sessionIds, no cross-contamination)
- Docker / entrypoint changes: Build image, start containers, verify boot logs and registration
- UI changes: Start the dashboard, use agent-browser/qa-use to verify rendering
- API endpoint changes: Call the endpoint directly and verify the response

You do not need to run every step — pick the subset relevant to the changes being tested.

Prerequisites

- OrbStack or Docker Desktop running (open -a OrbStack if needed)
- .env with API_KEY and PORT configured
- .env.docker-lead with lead config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN, MCP_BASE_URL)
- .env.docker with worker config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN or OPENROUTER_API_KEY, MCP_BASE_URL)

Step 1: Determine Your Port

Check .env for the configured port — do not assume 3013:

grep '^PORT=' .env

Use this value as $PORT throughout, and set it in your shell (for example, export PORT=<value>) so the shell commands below expand it. Each worktree may have a different port, so always verify and use the value from .env.

Also verify the Docker env files match:

grep MCP_BASE_URL .env.docker-lead .env.docker
# Both should point to http://host.docker.internal:$PORT

If they don't match, update them before starting containers.
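The consistency check above can also be scripted. The following is a minimal sketch, assuming the env files contain plain KEY=value lines without quoting; parseEnv and checkMcpBaseUrl are hypothetical helper names that are not part of the project, and the sample port is illustrative only.

```javascript
// Hypothetical helpers (not part of the project): parse KEY=value text
// and check that MCP_BASE_URL targets the API port configured in .env.
function parseEnv(text) {
  const env = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // not a KEY=value line
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

function checkMcpBaseUrl(apiEnvText, dockerEnvText) {
  const port = parseEnv(apiEnvText).PORT;
  const actual = parseEnv(dockerEnvText).MCP_BASE_URL;
  const expected = `http://host.docker.internal:${port}`;
  return { expected, actual, ok: actual === expected };
}

// Inline sample contents; in practice, read .env and .env.docker* from disk.
const result = checkMcpBaseUrl(
  'API_KEY=secret\nPORT=3055\n',
  'MCP_BASE_URL=http://host.docker.internal:3055\n'
);
console.log(result.ok); // true
```

Run the check once per Docker env file (.env.docker-lead and .env.docker) and fix any file where ok is false before starting containers.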

Step 2: Clean DB + Start API Server

# Kill any existing API process on your port
lsof -ti :$PORT | xargs kill 2>/dev/null

# Clean DB for fresh state
rm -f agent-swarm-db.sqlite agent-swarm-db.sqlite-wal agent-swarm-db.sqlite-shm

# Start API server
bun run start:http &
# Wait ~3s for startup, confirm "MCP HTTP server running on http://localhost:$PORT/mcp"

Step 3: Build Docker Image

bun run docker:build:worker

This builds agent-swarm-worker:latest from the current code. Rebuild after every code change.

Step 4: Start Lead Container

Use a unique container name to avoid conflicts with other worktrees (e.g. include branch name or feature):

docker run --rm -d \
  --name e2e-lead-$(git branch --show-current | tr '/' '-') \
  --env-file .env.docker-lead \
  -e AGENT_ROLE=lead \
  -e MAX_CONCURRENT_TASKS=1 \
  -p 3201:3000 \
  agent-swarm-worker:latest

Wait ~15s, then verify:

docker logs e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[lead] Polling for triggers (0/1 active)..."

If port 3201 is taken by another worktree, pick a different host port (e.g. -p 3211:3000).

Step 5: Start Worker Container

docker run --rm -d \
  --name e2e-worker-$(git branch --show-current | tr '/' '-') \
  --env-file .env.docker \
  -e MAX_CONCURRENT_TASKS=1 \
  -p 3203:3000 \
  agent-swarm-worker:latest

Wait ~15s, then verify:

docker logs e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[worker] Polling for triggers (0/1 active)..."
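The lead and worker commands in Steps 4 and 5 differ only in the container name, the env file, the AGENT_ROLE variable, and the host port. As a rough sketch of that difference, the hypothetical helper below (dockerRunArgs is an illustrative name, not part of the project) builds the argument list for either role; the branch name in the example is made up.

```javascript
// Hypothetical helper: builds the `docker run` argument list,
// mirroring the lead and worker commands from Steps 4 and 5.
function dockerRunArgs({ role, branch, hostPort }) {
  // Same transformation as: git branch --show-current | tr '/' '-'
  const name = `e2e-${role}-${branch.replace(/\//g, '-')}`;
  const args = ['run', '--rm', '-d', '--name', name];
  if (role === 'lead') {
    args.push('--env-file', '.env.docker-lead', '-e', 'AGENT_ROLE=lead');
  } else {
    args.push('--env-file', '.env.docker');
  }
  args.push(
    '-e', 'MAX_CONCURRENT_TASKS=1',
    '-p', `${hostPort}:3000`, // host port varies per worktree, container port is 3000
    'agent-swarm-worker:latest'
  );
  return args;
}

const leadArgs = dockerRunArgs({ role: 'lead', branch: 'feature/logs', hostPort: 3201 });
console.log(['docker', ...leadArgs].join(' '));
```

The returned array can be passed to a process launcher such as Node's child_process.spawnSync('docker', args) if you prefer to start containers from a script.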

Step 6: Verify Registration

Use context-mode execute (not curl directly due to hook restrictions):

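// $API_KEY and $PORT are placeholders: JavaScript does not expand them inside quoted strings, so substitute the values from .env.
const headers = { 'Authorization': 'Bearer $API_KEY', 'Content-Type': 'application/json' };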
const agents = await (await fetch('http://localhost:$PORT/api/agents', { headers })).json();
for (const a of agents.agents) {
  console.log(`${a.name} | isLead: ${a.isLead} | status: ${a.status} | id: ${a.id}`);
}

Should show both lead and worker registered as idle. Save the agent IDs for task creation.
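To turn the visual check into a pass/fail result, something like the following could be applied to the response. This is a sketch that assumes the response shape used above (an agents array with name, isLead, status, and id) and that the idle status is the literal string 'idle'; checkRegistration is a hypothetical name and the sample data is invented for illustration.

```javascript
// Hypothetical helper: validates the /api/agents response and
// extracts the IDs needed for task creation.
function checkRegistration(response) {
  const agents = response.agents ?? [];
  const lead = agents.find(a => a.isLead);
  const worker = agents.find(a => !a.isLead);
  const problems = [];
  if (!lead) problems.push('no lead registered');
  if (!worker) problems.push('no worker registered');
  for (const a of agents) {
    // Assumption: a freshly registered agent reports status 'idle'
    if (a.status !== 'idle') problems.push(`${a.name} is ${a.status}, expected idle`);
  }
  return { leadId: lead?.id, workerId: worker?.id, problems };
}

// Invented sample data in the assumed response shape.
const sample = {
  agents: [
    { name: 'lead', isLead: true, status: 'idle', id: 'lead-id' },
    { name: 'worker', isLead: false, status: 'idle', id: 'worker-id' },
  ],
};
console.log(checkRegistration(sample));
```

An empty problems array means both agents are registered and idle; leadId is the value to use as LEAD_ID in Step 7.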

Step 7: Create Tasks

Assigned task (picked up by lead)

const t = await (await fetch('http://localhost:$PORT/api/tasks', {
  method: 'POST', headers,
  body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.', agentId: LEAD_ID })
})).json();
console.log('Task:', t.id, '| status:', t.status);

Important: Use agentId (not assignedTo) to assign tasks. Wrong param silently creates an unassigned task.

Pool task (auto-claimed by worker)

const t = await (await fetch('http://localhost:$PORT/api/tasks', {
  method: 'POST', headers,
  body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.' })
})).json();
console.log('Pool task:', t.id, '| status:', t.status);

Workers auto-claim unassigned tasks at poll time. Leads do not auto-claim pool tasks.
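Because a wrong key silently produces a pool task, it can help to build the request body through one small function that rejects the mistake early. The sketch below is an illustration only; buildTaskBody is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: builds the POST /api/tasks body and guards
// against the assignedTo mistake described above.
function buildTaskBody(task, options = {}) {
  if ('assignedTo' in options) {
    throw new Error('Use agentId, not assignedTo: the wrong key silently creates an unassigned task');
  }
  const body = { task };
  if (options.agentId) body.agentId = options.agentId; // omit agentId for a pool task
  return JSON.stringify(body);
}

console.log(buildTaskBody('Say hello', { agentId: 'lead-id' })); // {"task":"Say hello","agentId":"lead-id"}
console.log(buildTaskBody('Say hello'));                         // {"task":"Say hello"}
```

The returned string can be used directly as the body in the fetch calls shown in this step.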

Step 8: Monitor Progress

# Watch lead logs (use your container name)
# --tail 20 starts from the last 20 lines; piping -f output into tail prints nothing until the stream ends
docker logs -f --tail 20 e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1

# Watch worker logs
docker logs -f --tail 20 e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1

Poll task status:

const t = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>', { headers })).json();
console.log(t.status);  // pending → in_progress → completed/failed
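Rather than re-running the status check by hand, the poll can be wrapped in a loop with a timeout. The following is a minimal sketch, assuming completed and failed are the only terminal statuses as the comment above suggests; waitForTask and its interval and timeout defaults are hypothetical choices, and the status lookup is injected so the loop itself does not depend on the network.

```javascript
// Assumption: these are the only terminal statuses.
const TERMINAL = new Set(['completed', 'failed']);

// Hypothetical helper: calls getStatus() until it returns a terminal
// status, or throws once timeoutMs has elapsed.
async function waitForTask(getStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await getStatus();
    if (TERMINAL.has(status)) return status;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for task; last status: ${status}`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Demo with a canned status sequence instead of a real API call.
const demoStatuses = ['pending', 'in_progress', 'completed'];
const demo = waitForTask(async () => demoStatuses.shift(), { intervalMs: 1, timeoutMs: 1000 });
demo.then(status => console.log('Final status:', status));
```

Against the real API, getStatus would wrap the fetch shown above, for example async () => (await (await fetch(url, { headers })).json()).status, where url is the task URL with your port and task ID filled in.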

Step 9: Verify Session Logs

const logs = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>/session-logs', { headers })).json();
console.log('Log count:', logs.logs.length);
// Should be > 0 for completed tasks

For log isolation verification (multiple sequential tasks from same agent):

const [l1, l2] = await Promise.all([
  fetch('http://localhost:$PORT/api/tasks/<task1>/session-logs', { headers }).then(r => r.json()),
  fetch('http://localhost:$PORT/api/tasks/<task2>/session-logs', { headers }).then(r => r.json()),
]);
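const s1 = [...new Set(l1.logs.map(l => l.sessionId))];
const s2 = [...new Set(l2.logs.map(l => l.sessionId))];
// Isolated only if both tasks have logs and no sessionId appears in both
console.log('Unique sessionIds:', s1.length > 0 && s2.length > 0 && s1.every(id => !s2.includes(id)));  // Should be true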
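When more than two tasks run on the same agent, the same idea extends to any number of log sets. The sketch below assumes each log entry carries a sessionId field as in the snippet above; findSharedSessionIds is a hypothetical helper and the sample entries are invented.

```javascript
// Hypothetical helper: given { taskId: [logEntry, ...] }, returns the
// sessionIds that appear under more than one task (cross-contamination).
function findSharedSessionIds(logsByTask) {
  const owners = new Map(); // sessionId -> Set of task ids that logged it
  for (const [taskId, logs] of Object.entries(logsByTask)) {
    for (const entry of logs) {
      if (!owners.has(entry.sessionId)) owners.set(entry.sessionId, new Set());
      owners.get(entry.sessionId).add(taskId);
    }
  }
  return [...owners]
    .filter(([, tasks]) => tasks.size > 1)
    .map(([sessionId]) => sessionId);
}

// Invented sample: session 'a' leaks into task2's logs.
const leaked = findSharedSessionIds({
  task1: [{ sessionId: 'a' }, { sessionId: 'a' }],
  task2: [{ sessionId: 'b' }, { sessionId: 'a' }],
});
console.log(leaked); // [ 'a' ]
```

An empty result means every sessionId belongs to exactly one task.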

Step 10: Test the Dashboard UI

Start the dashboard to visually verify tasks, logs, and agent status:

cd new-ui && pnpm run dev &
# Defaults to port from APP_URL in .env (check with: grep APP_URL ../.env)

If the UI port is taken by another worktree, start on an alternate:

cd new-ui && pnpm run dev --port 5276

The UI connects to the API via VITE_API_URL. Check new-ui/.env for the value; if it is not set there, it defaults to http://localhost:$PORT.

Visual verification with agent-browser / qa-use

Use agent-browser or qa-use to automate UI checks:

# Quick visual gut-check with agent-browser
agent-browser --url http://localhost:5175 snapshot

# Or use qa-use to verify specific flows
qa-use explore http://localhost:5175

Things to verify in the UI:

- Agents page: Lead and worker both show as registered with correct status
- Tasks page: Tasks appear with correct status, assigned agent, and timestamps
- Task detail → Logs tab: Session logs render in the conversation viewer (not "No session data available")
- Task detail → Outcome tab: Completed tasks show output
- Costs: Session costs appear for completed tasks

Step 11: Cleanup

# Stop containers (use your branch-specific names)
docker stop e2e-lead-$(git branch --show-current | tr '/' '-') e2e-worker-$(git branch --show-current | tr '/' '-') 2>/dev/null

# Stop API server
lsof -ti :$PORT | xargs kill 2>/dev/null

# Stop UI dev server (if started; replace 5175 with the port you actually used)
lsof -ti :5175 | xargs kill 2>/dev/null

Troubleshooting

Docker daemon not running

ERROR: Cannot connect to the Docker daemon

Fix: open -a OrbStack and wait ~5s.

Container name conflict

docker: Error response from daemon: Conflict. The container name "..." is already in use

Another worktree has a container with the same name. Either stop it (docker stop <name>) or use a different name suffix.

Lead not picking up tasks

- Verify the task was created with agentId (not assignedTo); the wrong param silently creates an unassigned task
- Check the task status isn't already in_progress (e.g. from a manual poll call that consumed the trigger)
- Restart the container if stuck: docker restart <container-name>

Worker not picking up pool tasks

- Workers auto-claim via poll. Leads do not claim pool tasks.
- Check the worker has capacity: docker logs <container> 2>&1 | grep "capacity"
- If the log shows "At capacity", a previous task is still running. Wait or restart.

Poll returns 404

- Poll endpoint is GET /api/poll (not POST)
- Requires X-Agent-ID header with a valid agent UUID
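For reference, a correctly shaped poll request might look like the sketch below. The method, path, and X-Agent-ID header come from the points above; the Authorization header is an assumption carried over from the other API calls in this guide, and buildPollRequest is a hypothetical helper. Note that a manual poll can consume a trigger meant for a running container, so use it sparingly.

```javascript
// Hypothetical helper: describes a GET /api/poll request.
// Assumption: the endpoint expects the same Bearer token as other API calls.
function buildPollRequest(port, apiKey, agentId) {
  return {
    url: `http://localhost:${port}/api/poll`,
    options: {
      method: 'GET', // POST is the common mistake behind the 404
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'X-Agent-ID': agentId, // must be a registered agent UUID
      },
    },
  };
}

// Placeholder values for illustration only.
const pollRequest = buildPollRequest(3055, 'my-api-key', '00000000-0000-0000-0000-000000000000');
console.log(pollRequest.url, pollRequest.options.method);
```

The request can then be sent with fetch(pollRequest.url, pollRequest.options).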

Port conflicts (worktrees)

lsof -i :$PORT  # Check what's using the port

If another worktree is running, set a different PORT in .env and update MCP_BASE_URL in .env.docker* to http://host.docker.internal:<new-port>.

Session logs show 0 entries

- Task must have actually run (status completed or failed, not just in_progress)
- Check claudeSessionId is set on the task: GET /api/tasks/<id> should show it
- If logs were stored under wrong taskId, check the session_logs table directly

Task cancellation doesn't stop Claude

Direct API cancellation (POST /api/tasks/<id>/cancel) updates the DB but doesn't kill the Claude process inside Docker. Use docker restart <container> to force-stop.

Keep tasks trivial

Use simple tasks like "Say hello" for E2E tests. Complex tasks waste time and API credits.

UI shows stale data

The dashboard auto-polls every 5 seconds. If data looks stale, hard-refresh (Cmd+Shift+R) or check VITE_API_URL points to the correct API port.

