skills/swarm/SKILL.md
N coordinated agents on shared task list with SQLite-based atomic claiming
npx skillsauth add MeroZemory/oh-my-droid swarmInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Spawn N coordinated agents working on a shared task list with SQLite-based atomic claiming. Like a dev team tackling multiple files in parallel—fast, reliable, and with full fault tolerance.
/swarm N:agent-type "task description"
/swarm 5:executor "fix all TypeScript errors"
/swarm 3:build-fixer "fix build errors in src/"
/swarm 4:designer "implement responsive layouts for all components"
/swarm 2:architect "analyze and document all API endpoints"
User: "/swarm 5:executor fix all TypeScript errors"
|
v
[SWARM ORCHESTRATOR]
|
+--+--+--+--+--+
| | | | |
v v v v v
E1 E2 E3 E4 E5
| | | | |
+--+--+--+--+
|
v
[SQLITE DATABASE]
┌─────────────────────┐
│ tasks table │
├─────────────────────┤
│ id, description │
│ status (pending, │
│ claimed, done, │
│ failed) │
│ claimed_by, claimed_at
│ completed_at, result│
│ error │
├─────────────────────┤
│ heartbeats table │
│ (agent monitoring) │
└─────────────────────┘
Key Features:
run_in_background: true for allWhen spawning swarm agents, ALWAYS wrap the task with the worker preamble to prevent recursive sub-agent spawning:
import { wrapWithPreamble } from '../droids/preamble.js';
// When spawning each agent:
const agentPrompt = wrapWithPreamble(`
Connect to swarm at ${cwd}/.omd/state/swarm.db
Claim tasks with claimTask('agent-${n}')
Complete work with completeTask() or failTask()
Send heartbeat every 60 seconds
Exit when hasPendingWork() returns false
`);
Task({
subagent_type: 'oh-my-droid:executor',
prompt: agentPrompt,
run_in_background: true
});
The worker preamble ensures agents:
Each agent follows this loop:
LOOP:
1. Call claimTask(agentId)
2. SQLite transaction:
- Find first pending task
- UPDATE status='claimed', claimed_by=agentId, claimed_at=now
- INSERT/UPDATE heartbeat record
- Atomically commit (only one agent succeeds)
3. Execute task
4. Call completeTask(agentId, taskId, result) or failTask()
5. GOTO LOOP (until hasPendingWork() returns false)
Atomic Claiming Details:
IMMEDIATE transaction prevents race conditionsheartbeat(agentId) every 60 seconds (or custom interval)Exit when ANY of:
/cancel.omd/state/swarm.db)The swarm uses a single SQLite database stored at .omd/state/swarm.db. This provides:
tasks Table SchemaCREATE TABLE tasks (
id TEXT PRIMARY KEY,
description TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
-- pending: waiting to be claimed
-- claimed: claimed by an agent, in progress
-- done: completed successfully
-- failed: completed with error
claimed_by TEXT, -- agent ID that claimed this task
claimed_at INTEGER, -- Unix timestamp when claimed
completed_at INTEGER, -- Unix timestamp when completed
result TEXT, -- Optional result/output from task
error TEXT -- Error message if task failed
);
heartbeats Table SchemaCREATE TABLE heartbeats (
agent_id TEXT PRIMARY KEY,
last_heartbeat INTEGER NOT NULL, -- Unix timestamp of last heartbeat
current_task_id TEXT -- Task agent is currently working on
);
session Table SchemaCREATE TABLE session (
id TEXT PRIMARY KEY,
agent_count INTEGER NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
active INTEGER DEFAULT 1
);
The core strength of the new implementation is transactional atomicity:
function claimTask(agentId: string): ClaimResult {
// Transaction ensures only ONE agent succeeds
const claimTransaction = db.transaction(() => {
// Step 1: Find first pending task
const task = db.prepare(
'SELECT id, description FROM tasks WHERE status = "pending" ORDER BY id LIMIT 1'
).get();
if (!task) {
return { success: false, reason: 'No pending tasks' };
}
// Step 2: Attempt claim (will only succeed if status is still 'pending')
const result = db.prepare(
'UPDATE tasks SET status = "claimed", claimed_by = ?, claimed_at = ? WHERE id = ? AND status = "pending"'
).run(agentId, Date.now(), task.id);
if (result.changes === 0) {
// Another agent claimed it between SELECT and UPDATE - try next
return { success: false, reason: 'Task was claimed by another agent' };
}
// Step 3: Update heartbeat to show we're alive and working
db.prepare(
'INSERT OR REPLACE INTO heartbeats (agent_id, last_heartbeat, current_task_id) VALUES (?, ?, ?)'
).run(agentId, Date.now(), task.id);
return { success: true, taskId: task.id, description: task.description };
}).immediate(); // Explicitly acquire RESERVED lock for immediate transaction
return claimTransaction(); // Atomic execution
}
Why SQLite Transactions Work:
.immediate() to acquire RESERVED lockTasks are automatically released if claimed too long without heartbeat:
function cleanupStaleClaims(leaseTimeout: number = 5 * 60 * 1000) {
// Default 5-minute timeout
const cutoffTime = Date.now() - leaseTimeout;
const cleanupTransaction = db.transaction(() => {
// Find claimed tasks where:
// 1. Claimed longer than timeout, OR
// 2. Agent hasn't sent heartbeat in that time
const staleTasks = db.prepare(`
SELECT t.id
FROM tasks t
LEFT JOIN heartbeats h ON t.claimed_by = h.agent_id
WHERE t.status = 'claimed'
AND t.claimed_at < ?
AND (h.last_heartbeat IS NULL OR h.last_heartbeat < ?)
`).all(cutoffTime, cutoffTime);
// Release each stale task back to pending
for (const staleTask of staleTasks) {
db.prepare('UPDATE tasks SET status = "pending", claimed_by = NULL, claimed_at = NULL WHERE id = ?')
.run(staleTask.id);
}
return staleTasks.length;
}).immediate(); // Explicitly acquire RESERVED lock for immediate transaction
return cleanupTransaction();
}
How Recovery Works:
Agents interact with the swarm via a TypeScript API:
import { startSwarm, connectToSwarm } from './swarm';
// Orchestrator starts the swarm
await startSwarm({
agentCount: 5,
tasks: ['fix a.ts', 'fix b.ts', ...],
leaseTimeout: 5 * 60 * 1000, // 5 minutes (default)
heartbeatInterval: 60 * 1000 // 60 seconds (default)
});
// Agents join existing swarm
await connectToSwarm(process.cwd());
import {
claimTask,
completeTask,
failTask,
heartbeat,
hasPendingWork,
disconnectFromSwarm
} from './swarm';
const agentId = 'agent-1';
// Main work loop
while (hasPendingWork()) {
// Claim next task
const claim = claimTask(agentId);
if (!claim.success) {
console.log('No tasks available:', claim.reason);
break;
}
const { taskId, description } = claim;
console.log(`Agent ${agentId} working on: ${description}`);
try {
// Do the work...
const result = await executeTask(description);
// Mark complete
completeTask(agentId, taskId, result);
console.log(`Agent ${agentId} completed task ${taskId}`);
} catch (error) {
// Mark failed
failTask(agentId, taskId, error.message);
console.error(`Agent ${agentId} failed on ${taskId}:`, error);
}
// Send heartbeat every 60 seconds (while working on long tasks)
heartbeat(agentId);
}
// Cleanup
disconnectFromSwarm();
startSwarm(config: SwarmConfig): Promise<boolean>Initialize the swarm with task pool and start cleanup timer.
const success = await startSwarm({
agentCount: 5,
tasks: ['task 1', 'task 2', 'task 3'],
leaseTimeout: 5 * 60 * 1000,
heartbeatInterval: 60 * 1000
});
stopSwarm(deleteDatabase?: boolean): booleanStop the swarm and optionally delete the database.
stopSwarm(true); // Delete database on cleanup
claimTask(agentId: string): ClaimResultClaim the next pending task. Returns { success, taskId, description, reason }.
const claim = claimTask('agent-1');
if (claim.success) {
console.log(`Claimed: ${claim.description}`);
}
completeTask(agentId: string, taskId: string, result?: string): booleanMark a task as done. Only succeeds if agent still owns the task.
completeTask('agent-1', 'task-1', 'Fixed the bug');
failTask(agentId: string, taskId: string, error: string): booleanMark a task as failed with error details.
failTask('agent-1', 'task-1', 'Could not compile: missing dependency');
heartbeat(agentId: string): booleanSend a heartbeat to indicate agent is alive. Call every 60 seconds during long-running tasks.
heartbeat('agent-1');
cleanupStaleClaims(leaseTimeout?: number): numberManually trigger cleanup of expired claims. Called automatically every 60 seconds.
const released = cleanupStaleClaims(5 * 60 * 1000);
console.log(`Released ${released} stale tasks`);
hasPendingWork(): booleanCheck if there are unclaimed tasks available.
if (!hasPendingWork()) {
console.log('All tasks claimed or completed');
}
isSwarmComplete(): booleanCheck if all tasks are done or failed.
if (isSwarmComplete()) {
console.log('Swarm finished!');
}
getSwarmStats(): SwarmStats | nullGet task counts and timing info.
const stats = getSwarmStats();
console.log(`${stats.doneTasks}/${stats.totalTasks} done`);
getActiveAgents(): numberGet count of agents with recent heartbeats.
const active = getActiveAgents();
console.log(`${active} agents currently active`);
getAllTasks(): SwarmTask[]Get all tasks with current status.
const tasks = getAllTasks();
const pending = tasks.filter(t => t.status === 'pending');
getTasksWithStatus(status: string): SwarmTask[]Filter tasks by status: 'pending', 'claimed', 'done', 'failed'.
const failed = getTasksWithStatus('failed');
getAgentTasks(agentId: string): SwarmTask[]Get all tasks claimed by a specific agent.
const myTasks = getAgentTasks('agent-1');
retryTask(agentId: string, taskId: string): ClaimResultAttempt to reclaim a failed task.
const retry = retryTask('agent-1', 'task-1');
if (retry.success) {
console.log('Task reclaimed, trying again...');
}
interface SwarmConfig {
agentCount: number; // Number of agents (1-5)
tasks: string[]; // Task descriptions
agentType?: string; // Agent type (default: 'executor')
leaseTimeout?: number; // Milliseconds (default: 5 min)
heartbeatInterval?: number; // Milliseconds (default: 60 sec)
cwd?: string; // Working directory
}
interface SwarmTask {
id: string;
description: string;
status: 'pending' | 'claimed' | 'done' | 'failed';
claimedBy: string | null;
claimedAt: number | null;
completedAt: number | null;
error?: string;
result?: string;
}
interface ClaimResult {
success: boolean;
taskId: string | null;
description?: string;
reason?: string;
}
interface SwarmStats {
totalTasks: number;
pendingTasks: number;
claimedTasks: number;
doneTasks: number;
failedTasks: number;
activeAgents: number;
elapsedTime: number;
}
heartbeat() at least this oftencleanupStaleClaims() to release orphaned tasks.omd/state/swarm.db)
completeTask() but is no longer the owner (was released)startSwarm() returns false if database initialization failsclaimTask() returns { success: false, reason: 'Database not initialized' }isSwarmReady() before proceedinggetActiveAgents() === 0 or hasPendingWork() === falseclaimTask() returns success=false with reason 'No pending tasks available'hasPendingWork() before loopingUser can cancel via /cancel:
/swarm 5:executor "fix all TypeScript type errors"
Spawns 5 executors, each claiming and fixing individual files.
/swarm 3:designer "implement Material-UI styling for all components in src/components/"
Spawns 3 designers, each styling different component files.
/swarm 4:security-reviewer "review all API endpoints for vulnerabilities"
Spawns 4 security reviewers, each auditing different endpoints.
/swarm 2:writer "add JSDoc comments to all exported functions"
Spawns 2 writers, each documenting different modules.
claimTask(), completeTask(), heartbeat()IMPORTANT: Delete state files on completion - do NOT just set active: false
When all tasks are done:
# Delete swarm state files
rm -f .omd/state/swarm-state.json
rm -f .omd/state/swarm-tasks.json
rm -f .omd/state/swarm-claims.json
The orchestrator (main skill handler) is responsible for:
startSwarm()getSwarmStats() and getActiveAgents()cleanupStaleClaims() automatically (via setInterval)isSwarmComplete()Each agent is a standard Task invocation with:
run_in_background: trueimport { claimTask, completeTask, ... } from './swarm'await connectToSwarm(cwd) to join existing swarmclaimTask() → do work → completeTask() or failTask()documentation
Agentic memory system for writers - track characters, relationships, scenes, and themes
development
Decompose multi-step tasks into parallel sub-agent workloads, route each sub-task to the cheapest capable model tier (Haiku/Sonnet/Opus), run long-running commands in the background, and verify all deliverables before stopping. Use when the user asks to 'go fast', 'parallelize', 'ultrawork', or when a request contains 3+ independent sub-tasks that benefit from concurrent execution.
tools
QA cycling workflow - test, verify, fix, repeat until goal met
development
Parallel autopilot with file ownership partitioning