skills/execution-lifecycle-manager/SKILL.md
Manage DAG execution lifecycles including start, stop, pause, resume, and cleanup. Activate on 'execution lifecycle', 'stop execution', 'abort DAG', 'graceful shutdown', 'kill process'. NOT for cost estimation, DAG building, or skill selection.
Centralized state management for running DAG executions with graceful shutdown patterns.
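✅ Use for: starting, stopping, pausing, resuming, and cleaning up DAG executions.
❌ NOT for: cost estimation, DAG building, or skill selection.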
Always use SIGTERM first, then escalate to SIGKILL:
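
    // CORRECT: Two-phase shutdown
    import type { ChildProcess } from 'node:child_process';

    const GRACEFUL_TIMEOUT_MS = 2000;

    async function terminateProcess(proc: ChildProcess): Promise<void> {
      // Phase 1: ask the process to exit so it can clean up.
      proc.kill('SIGTERM');

      // Phase 2: force-kill if it is still alive after the grace period.
      const forceKillTimer = setTimeout(() => {
        // proc.killed only means a signal was sent, not that the process exited,
        // so check the exit state instead.
        if (proc.exitCode === null && proc.signalCode === null) {
          proc.kill('SIGKILL');
        }
      }, GRACEFUL_TIMEOUT_MS);

      await waitForExit(proc); // helper that resolves on the 'exit' event
      clearTimeout(forceKillTimer);
    }
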
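The `waitForExit` helper used above is not defined in this document. A minimal sketch of what it might look like, assuming it only needs to resolve once the process has exited. `ExitSource` is a hypothetical structural type (not part of Node) that Node's `ChildProcess` should satisfy:

```typescript
// Hypothetical minimal view of a child process: only what waitForExit needs.
interface ExitSource {
  exitCode: number | null;
  signalCode: string | null;
  once(
    event: 'exit',
    listener: (code: number | null, signal: string | null) => void,
  ): unknown;
}

// Resolves when the process exits, or immediately if it already has.
function waitForExit(proc: ExitSource): Promise<void> {
  // After exit, exitCode or signalCode is set and 'exit' will not fire again,
  // so subscribing at that point would hang forever.
  if (proc.exitCode !== null || proc.signalCode !== null) {
    return Promise.resolve();
  }
  return new Promise<void>((resolve) => {
    proc.once('exit', () => resolve());
  });
}
```

Checking the exit state before subscribing matters when the process can finish between being spawned and being stopped.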
Use AbortController for cancellation propagation:

    // Parent (DAGExecutor)
    const abortController = new AbortController();

    // Pass signal to child executors
    await executor.execute({
      ...request,
      abortSignal: abortController.signal,
    });

    // To abort all children (from another code path, such as a stop request,
    // while execute() is still pending):
    abortController.abort();

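Passing the signal only helps if the child actually observes it. The following sketch shows one way a child task could do that. `runNode` is a hypothetical stand-in for real node work (here just a timer), not part of the executors described in this document:

```typescript
// Hypothetical child task: finishes after `ms` milliseconds unless the
// signal aborts first, in which case it rejects and cancels its timer.
function runNode(ms: number, abortSignal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    // Refuse to start work for an execution that is already aborted.
    if (abortSignal.aborted) {
      reject(new Error('aborted before start'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    };
    const timer = setTimeout(() => {
      // Work finished normally: stop listening for aborts.
      abortSignal.removeEventListener('abort', onAbort);
      resolve('done');
    }, ms);
    abortSignal.addEventListener('abort', onAbort, { once: true });
  });
}
```

Several `runNode` calls that share one `abortController.signal` all reject as soon as `abortController.abort()` is called, which is the behavior the parent relies on.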
Track active executions for monitoring and cleanup:

    interface ActiveExecution {
      executionId: string;
      abortController: AbortController;
      status: 'running' | 'stopping' | 'stopped' | 'completed' | 'failed';
      startedAt: number;
      stoppedAt?: number;
    }

    class ExecutionManager {
      private executions: Map<string, ActiveExecution> = new Map();

      create(id: string): ActiveExecution { /* ... */ }
      stop(id: string, reason: string): Promise<StopResult> { /* ... */ }
      listActive(): ActiveExecution[] { /* ... */ }
    }

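The method bodies above are elided. One possible way to fill them in is sketched below. The `StopResult` shape and the `waitForCleanup` hook are assumptions made for illustration, since this document does not define them:

```typescript
type ExecutionStatus = 'running' | 'stopping' | 'stopped' | 'completed' | 'failed';

interface ActiveExecution {
  executionId: string;
  abortController: AbortController;
  status: ExecutionStatus;
  startedAt: number;
  stoppedAt?: number;
}

// Hypothetical result shape: the document does not define StopResult.
interface StopResult {
  executionId: string;
  status: ExecutionStatus;
  reason: string;
}

class ExecutionManager {
  private executions: Map<string, ActiveExecution> = new Map();
  // Assumed hook: resolves once the execution's processes have exited.
  private readonly waitForCleanup: (id: string) => Promise<void>;

  constructor(waitForCleanup: (id: string) => Promise<void> = async () => {}) {
    this.waitForCleanup = waitForCleanup;
  }

  create(id: string): ActiveExecution {
    if (this.executions.has(id)) throw new Error(`Execution ${id} already exists`);
    const execution: ActiveExecution = {
      executionId: id,
      abortController: new AbortController(),
      status: 'running',
      startedAt: Date.now(),
    };
    this.executions.set(id, execution);
    return execution;
  }

  async stop(id: string, reason: string): Promise<StopResult> {
    const execution = this.executions.get(id);
    if (!execution) throw new Error(`Unknown execution ${id}`);
    if (execution.status === 'running') {
      execution.status = 'stopping';
      execution.abortController.abort(); // propagate to all children
      await this.waitForCleanup(id); // resolve only after cleanup completes
      execution.status = 'stopped';
      execution.stoppedAt = Date.now();
    }
    // Any other status is reported as-is (for example, a stop already in progress).
    return { executionId: id, status: execution.status, reason };
  }

  listActive(): ActiveExecution[] {
    return Array.from(this.executions.values()).filter(
      (e) => e.status === 'running' || e.status === 'stopping',
    );
  }
}
```

In this sketch `stop()` resolves only after the cleanup hook has finished, so the caller sees the final state instead of an intermediate one.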
Novice thinking: "Just kill it immediately"
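Reality: SIGKILL cannot be caught or handled, so the process gets no chance to clean up.
Timeline: send SIGTERM, wait up to the graceful timeout (2000 ms in the example above), then send SIGKILL only if the process is still running.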
Correct approach: Always SIGTERM first, SIGKILL as fallback.
Novice thinking: "Just track the top-level execution"
Reality: Without signal propagation, child processes become orphans: they keep running after the parent execution has been aborted, and nothing tracks or stops them.
Correct approach: Pass AbortSignal through entire execution tree.
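At the bottom of the tree, the signal has to be turned into an actual process termination. A small sketch of that wiring follows. `linkAbortToCleanup` is a hypothetical helper name, and the `terminateProcess` call in the comment assumes the two-phase shutdown function shown earlier:

```typescript
// Runs `cleanup` once when `signal` aborts.
// Returns a function that removes the listener again.
function linkAbortToCleanup(signal: AbortSignal, cleanup: () => void): () => void {
  // If the execution was aborted before the child started, clean up right away.
  if (signal.aborted) {
    cleanup();
    return () => {};
  }
  const handler = () => cleanup();
  signal.addEventListener('abort', handler, { once: true });
  return () => signal.removeEventListener('abort', handler);
}

// Possible use inside a process executor (names are assumptions):
//   const unlink = linkAbortToCleanup(abortSignal, () => void terminateProcess(proc));
//   proc.once('exit', unlink); // stop listening once the child is gone
```

Removing the listener after a normal exit keeps a long-lived signal from holding references to finished children.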
Novice thinking: "Stop should return immediately"
Reality: Stopping is asynchronous. Processes need time to terminate, so a call that returns immediately cannot report the final state.
Correct approach: Return Promise with final state after cleanup completes.

             ┌──────────┐
             │   idle   │
             └────┬─────┘
                  │ start()
                  ▼
             ┌──────────┐
        ┌───►│ running  │◄───┐
        │    └────┬─────┘    │
        │         │          │ resume()
        │         │ pause()  │
        │         ▼          │
        │    ┌──────────┐    │
        │    │  paused  │────┘
        │    └────┬─────┘
        │         │ stop()
        │         ▼
        │    ┌──────────┐
        └────│ stopping │ (transitional - 2-10s)
             └────┬─────┘
                  │
         ┌────────┴────────┐
         ▼                 ▼
    ┌──────────┐      ┌──────────┐
    │ stopped  │      │  failed  │
    └──────────┘      └──────────┘

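The diagram can be enforced in code with a transition table. The sketch below is one reading of it, not a definitive one: it assumes `stop()` is valid from both `running` and `paused`, and it models only user-driven actions, so the move from `stopping` to `stopped` or `failed` is left to the cleanup outcome. The type and function names are hypothetical:

```typescript
type LifecycleState = 'idle' | 'running' | 'paused' | 'stopping' | 'stopped' | 'failed';
type LifecycleAction = 'start' | 'pause' | 'resume' | 'stop';

// Allowed action-driven transitions.
// ASSUMPTION: stop() from 'running' is allowed, as well as from 'paused'.
const TRANSITIONS: Record<LifecycleState, Partial<Record<LifecycleAction, LifecycleState>>> = {
  idle: { start: 'running' },
  running: { pause: 'paused', stop: 'stopping' },
  paused: { resume: 'running', stop: 'stopping' },
  stopping: {}, // leaves via the cleanup result, not a user action
  stopped: {},
  failed: {},
};

// Returns the next state, or throws if the action is not valid in `state`.
function transition(state: LifecycleState, action: LifecycleAction): LifecycleState {
  const next = TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Cannot ${action}() while ${state}`);
  }
  return next;
}
```

Rejecting invalid actions up front, such as `pause()` on a stopped execution, keeps the manager's state consistent with what is actually running.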

    interface StopResponse {
      status: 'stopped';
      executionId: string;
      reason: string; // 'user_abort' | 'timeout' | 'error'
      finalCostUsd: number;
      stoppedAt: number;
      summary: {
        nodesCompleted: number;
        nodesFailed: number;
        nodesTotal: number;
        durationMs: number;
      };
    }


    // In server.ts
    process.on('SIGINT', async () => {
      console.log('Shutting down...');

      // Stop all active executions gracefully
      const active = executionManager.listActive();
      await Promise.all(
        active.map(e => executionManager.stop(e.executionId, 'server_shutdown'))
      );

      server.close();
    });

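If a single `stop()` call rejects or hangs, the handler above never reaches `server.close()`. One way to guard against that is to bound the whole shutdown with a deadline, sketched below. `stopAllWithDeadline` and the `Stoppable` interface are hypothetical names, and the deadline value is an assumption:

```typescript
// Minimal view of the manager: only what shutdown needs.
interface Stoppable {
  listActive(): { executionId: string }[];
  stop(id: string, reason: string): Promise<unknown>;
}

// Stops every active execution, but gives up waiting after `deadlineMs`.
async function stopAllWithDeadline(
  manager: Stoppable,
  deadlineMs: number,
): Promise<'clean' | 'timed_out'> {
  const stops = Promise.all(
    manager.listActive().map((e) =>
      // One failed stop should not prevent the others from finishing.
      manager.stop(e.executionId, 'server_shutdown').catch(() => undefined),
    ),
  ).then((): 'clean' => 'clean');

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'timed_out'>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), deadlineMs);
  });

  try {
    return await Promise.race([stops, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Possible use from both SIGINT and SIGTERM handlers (values are assumptions):
//   const outcome = await stopAllWithDeadline(executionManager, 10000);
//   server.close(() => process.exit(outcome === 'clean' ? 0 : 1));
```

Registering the same routine for SIGTERM as well as SIGINT would cover process managers that send SIGTERM on shutdown.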
| Component | Responsibility |
|-----------|----------------|
| ExecutionManager | Tracks executions, coordinates stop |
| DAGExecutor | Owns AbortController, orchestrates waves |
| ProcessExecutor | Spawns processes, handles SIGTERM/SIGKILL |
| /api/execute/stop | HTTP interface for stop requests |
See /references/process-signals.md for Unix signal handling details.