skills/cost-accrual-tracker/SKILL.md
Track real-time API cost accrual during LLM execution. Activate on 'cost tracking', 'token usage', 'API costs', 'budget monitoring', 'usage metrics'. NOT for cost estimation, pricing tiers, or billing systems.
Real-time tracking of API costs during LLM execution with support for partial costs on abort.
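✅ Use for: cost tracking, token usage, API costs, budget monitoring, usage metrics
❌ NOT for: cost estimation, pricing tiers, or billing systems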
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;  // Tokens served from the prompt cache (cache hits)
  cacheWriteTokens?: number; // Tokens written to the prompt cache (cache creation)
}
interface CostCalculation {
  inputCostUsd: number;
  outputCostUsd: number;
  cacheSavingsUsd?: number;
  totalCostUsd: number;
}
function calculateCost(usage: TokenUsage, model: string): CostCalculation {
  const pricing = MODEL_PRICING[model];
  if (!pricing) {
    // Fail with a clear message instead of a TypeError on the property access below
    throw new Error(`No pricing configured for model: ${model}`);
  }
  // Base input/output cost only; cache tokens are not priced in this function
  const inputCostUsd = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
  const outputCostUsd = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
  return {
    inputCostUsd,
    outputCostUsd,
    totalCostUsd: inputCostUsd + outputCostUsd,
  };
}
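`MODEL_PRICING` is not defined in this section. A minimal sketch of what such a table might look like, with a worked calculation, follows. The model names and per-million-token rates are placeholders chosen for easy arithmetic, not real prices; `EXAMPLE_PRICING` and `costUsd` are hypothetical names.

```typescript
// Hypothetical pricing table: model names and rates are placeholders, not real prices.
interface ModelPricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

const EXAMPLE_PRICING: Record<string, ModelPricing> = {
  "example-large": { inputPerMTok: 10, outputPerMTok: 30 },
  "example-small": { inputPerMTok: 1, outputPerMTok: 4 },
};

function costUsd(inputTokens: number, outputTokens: number, model: string): number {
  const pricing = EXAMPLE_PRICING[model];
  if (!pricing) {
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (outputTokens / 1_000_000) * pricing.outputPerMTok
  );
}

// 200,000 input tokens at $10/MTok = $2.00
//  50,000 output tokens at $30/MTok = $1.50
const exampleTotal = costUsd(200_000, 50_000, "example-large");
console.log(exampleTotal.toFixed(2)); // prints "3.50"
```

Keeping rates per million tokens matches how the calculation above divides by 1_000_000 before multiplying.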
Track costs as they accrue, not just at completion:
class CostAccrualTracker {
private totalInputTokens = 0;
private totalOutputTokens = 0;
private accruedCostUsd = 0;
private readonly model: string;
constructor(model: string) {
this.model = model;
}
/**
* Called after each API response (streaming or complete)
*/
recordUsage(usage: TokenUsage): void {
this.totalInputTokens += usage.inputTokens;
this.totalOutputTokens += usage.outputTokens;
const cost = calculateCost(usage, this.model);
this.accruedCostUsd += cost.totalCostUsd;
}
/**
* Get current accrued cost (for real-time display)
*/
getCurrentCost(): number {
return this.accruedCostUsd;
}
/**
* Finalize on completion or abort
*/
finalize(reason: 'completed' | 'aborted' | 'failed'): CostReport {
return {
totalInputTokens: this.totalInputTokens,
totalOutputTokens: this.totalOutputTokens,
totalCostUsd: this.accruedCostUsd,
completionReason: reason,
finalizedAt: Date.now(),
};
}
}
Critical: Always capture partial costs on abort:
// In execution handler
const tracker = new CostAccrualTracker(model);
try {
for await (const chunk of executeStream(request)) {
if (abortSignal.aborted) {
// CRITICAL: Capture cost BEFORE throwing
const partialCost = tracker.finalize('aborted');
onCostUpdate(partialCost);
throw new AbortError('Execution aborted');
}
// Not every chunk necessarily carries usage data; skip those that do not
if (chunk.usage) {
tracker.recordUsage(chunk.usage);
onCostUpdate(tracker.getCurrentCost());
}
}
return tracker.finalize('completed');
} catch (error) {
if (error instanceof AbortError) {
throw error; // Already handled
}
return tracker.finalize('failed');
}
Auto-stop execution when budget is exceeded:
interface BudgetConfig {
maxCostUsd: number;
warnAtPercentage: number; // e.g., 0.8 for 80%
onWarn?: (current: number, max: number) => void;
onExceed?: (current: number, max: number) => void;
}
function createBudgetGuard(config: BudgetConfig) {
return {
check(currentCostUsd: number): 'ok' | 'warn' | 'exceed' {
const percentage = currentCostUsd / config.maxCostUsd;
if (percentage >= 1.0) {
config.onExceed?.(currentCostUsd, config.maxCostUsd);
return 'exceed';
}
if (percentage >= config.warnAtPercentage) {
config.onWarn?.(currentCostUsd, config.maxCostUsd);
return 'warn';
}
return 'ok';
}
};
}
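The guard above only reports a status; something else has to stop the work. One possible wiring, sketched here under the assumption that execution proceeds in discrete steps and that cancellation goes through a standard `AbortController`, is shown below. `checkBudget` is a compact stand-in for the guard so the sketch runs on its own, and `runWithBudget` is a hypothetical helper.

```typescript
type BudgetStatus = "ok" | "warn" | "exceed";

// Compact stand-in for the guard above so this sketch is self-contained.
function checkBudget(currentCostUsd: number, maxCostUsd: number, warnAt: number): BudgetStatus {
  const fraction = currentCostUsd / maxCostUsd;
  if (fraction >= 1.0) return "exceed";
  if (fraction >= warnAt) return "warn";
  return "ok";
}

// Hypothetical helper: accrues per-step costs and aborts once the budget is exceeded.
function runWithBudget(stepCostsUsd: number[], maxCostUsd: number, warnAt: number) {
  const controller = new AbortController();
  let accruedUsd = 0;
  let warned = false;
  let stepsRun = 0;

  for (const stepCost of stepCostsUsd) {
    if (controller.signal.aborted) break; // do not start further work after abort
    accruedUsd += stepCost;
    stepsRun += 1;

    const budgetStatus = checkBudget(accruedUsd, maxCostUsd, warnAt);
    if (budgetStatus === "warn" && !warned) {
      warned = true; // surface the warning once, not on every step
    }
    if (budgetStatus === "exceed") {
      controller.abort(); // signal consumers of controller.signal to stop
    }
  }
  return { accruedUsd, stepsRun, warned, aborted: controller.signal.aborted };
}

// Five steps of $0.30 against a $1.00 budget with a warning at 80%.
const budgetRun = runWithBudget([0.3, 0.3, 0.3, 0.3, 0.3], 1.0, 0.8);
console.log(budgetRun.stepsRun, budgetRun.aborted); // prints: 4 true
```

Note that the check runs after each step's cost has already been incurred, so the final total lands slightly over budget (about $1.20 here). If a hard ceiling matters, set `maxCostUsd` with headroom for one step.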
Novice thinking: "Just throw an error when aborted"
Reality: If you don't capture costs before aborting, you lose:
Timeline: This has always been an issue, but it became critical with expensive models (GPT-4, Claude Opus).
Correct approach: Always call finalize() with partial data BEFORE throwing abort errors.
Novice thinking: "Poll cost endpoint every 100ms for real-time updates"
Reality:
Correct approach: Poll at 1-2 second intervals, or use event-driven updates from the execution stream.
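For the event-driven option, one way to keep update volume bounded is to throttle what the execution stream publishes. The sketch below is an illustration only: `createThrottledPublisher` is a hypothetical helper, and the injectable `now` clock exists purely so the behavior can be demonstrated deterministically.

```typescript
// Hypothetical throttled publisher: forwards at most one cost update per interval
// and remembers the most recent suppressed value so it can be flushed at the end.
function createThrottledPublisher(
  publish: (costUsd: number) => void,
  minIntervalMs: number,
  now: () => number = Date.now,
) {
  let lastSentAt = Number.NEGATIVE_INFINITY;
  let pending: number | undefined;

  return {
    update(costUsd: number): void {
      const t = now();
      if (t - lastSentAt >= minIntervalMs) {
        lastSentAt = t;
        pending = undefined;
        publish(costUsd);
      } else {
        pending = costUsd; // suppressed for now; kept for flush()
      }
    },
    // Call on completion or abort so the final value is never dropped.
    flush(): void {
      if (pending !== undefined) {
        publish(pending);
        pending = undefined;
      }
    },
  };
}

// Demonstration with a fake clock (milliseconds).
let fakeClock = 0;
const sentUpdates: number[] = [];
const publisher = createThrottledPublisher((c) => { sentUpdates.push(c); }, 1000, () => fakeClock);

publisher.update(0.01);                    // t=0: sent
fakeClock = 200; publisher.update(0.02);   // suppressed
fakeClock = 900; publisher.update(0.03);   // suppressed
fakeClock = 1000; publisher.update(0.04);  // sent
fakeClock = 1100; publisher.update(0.05);  // suppressed
publisher.flush();                         // sends the last suppressed value

console.log(sentUpdates); // prints: [ 0.01, 0.04, 0.05 ]
```

The `flush()` call matters on abort: without it, the last suppressed update (and therefore the final partial cost) would never reach the UI.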
Novice thinking: "Just multiply tokens by price per token"
Reality: Claude's prompt caching changes the cost model:
Timeline:
Correct approach: Track cache_read_input_tokens and cache_creation_input_tokens separately.
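A minimal sketch of a cache-aware calculation follows. It assumes that `inputTokens` counts only uncached input (so cache reads and writes are billed separately from it) and that the pricing table carries distinct cache rates. The type and function names are hypothetical, and the rates in the example are placeholders, not real prices.

```typescript
interface CacheAwareUsage {
  inputTokens: number;        // uncached input tokens (assumption: excludes cached tokens)
  outputTokens: number;
  cacheReadTokens?: number;   // from cache_read_input_tokens
  cacheWriteTokens?: number;  // from cache_creation_input_tokens
}

interface CacheAwarePricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;   // typically lower than the input rate
  cacheWritePerMTok: number;  // typically higher than the input rate
}

function calculateCacheAwareCost(usage: CacheAwareUsage, pricing: CacheAwarePricing) {
  const priced = (tokens: number, ratePerMTok: number) => (tokens / 1_000_000) * ratePerMTok;
  const cacheRead = usage.cacheReadTokens ?? 0;
  const cacheWrite = usage.cacheWriteTokens ?? 0;

  const inputCostUsd = priced(usage.inputTokens, pricing.inputPerMTok);
  const outputCostUsd = priced(usage.outputTokens, pricing.outputPerMTok);
  const cacheReadCostUsd = priced(cacheRead, pricing.cacheReadPerMTok);
  const cacheWriteCostUsd = priced(cacheWrite, pricing.cacheWritePerMTok);

  // Savings relative to paying the full input rate for tokens served from cache.
  const cacheSavingsUsd = priced(cacheRead, pricing.inputPerMTok) - cacheReadCostUsd;

  return {
    inputCostUsd,
    outputCostUsd,
    cacheReadCostUsd,
    cacheWriteCostUsd,
    cacheSavingsUsd,
    totalCostUsd: inputCostUsd + outputCostUsd + cacheReadCostUsd + cacheWriteCostUsd,
  };
}

// Placeholder rates for illustration only.
const placeholderRates: CacheAwarePricing = {
  inputPerMTok: 4,
  outputPerMTok: 20,
  cacheReadPerMTok: 0.4,
  cacheWritePerMTok: 5,
};

const cachedCall = calculateCacheAwareCost(
  { inputTokens: 100_000, outputTokens: 10_000, cacheReadTokens: 500_000 },
  placeholderRates,
);
// input 0.40 + output 0.20 + cache reads 0.20 = 0.80; savings 2.00 - 0.20 = 1.80
console.log(cachedCall.totalCostUsd.toFixed(2), cachedCall.cacheSavingsUsd.toFixed(2)); // prints: 0.80 1.80
```

Multiplying all input-side tokens by the plain input rate would have reported $2.60 for this call instead of $0.80, which is the kind of drift this anti-pattern describes.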
Novice thinking: "Create new tracker for each request"
Reality: For DAG execution with multiple nodes:
Correct approach: Hierarchical tracking - per-node trackers that roll up to execution-level.
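One way to sketch the roll-up, assuming each DAG node gets its own tracker that holds an optional reference to its parent, is shown below. `NodeCostTracker` is a hypothetical class, separate from the `CostAccrualTracker` defined earlier, and it tracks cost only to keep the example short.

```typescript
// Hypothetical hierarchical tracker: each node reports its costs up the parent chain.
class NodeCostTracker {
  readonly label: string;
  private readonly parent?: NodeCostTracker;
  private ownCostUsd = 0;
  private descendantCostUsd = 0;

  constructor(label: string, parent?: NodeCostTracker) {
    this.label = label;
    this.parent = parent;
  }

  // Record a cost incurred directly by this node.
  record(costUsd: number): void {
    this.ownCostUsd += costUsd;
    this.parent?.absorb(costUsd);
  }

  // Accumulate a cost incurred by some descendant and keep propagating upward.
  private absorb(costUsd: number): void {
    this.descendantCostUsd += costUsd;
    this.parent?.absorb(costUsd);
  }

  get totalCostUsd(): number {
    return this.ownCostUsd + this.descendantCostUsd;
  }
}

const executionTracker = new NodeCostTracker("execution");
const nodeA = new NodeCostTracker("node-a", executionTracker);
const nodeB = new NodeCostTracker("node-b", executionTracker);

nodeA.record(0.25);
nodeB.record(0.5);
nodeA.record(0.25);

console.log(nodeA.totalCostUsd, nodeB.totalCostUsd, executionTracker.totalCostUsd); // prints: 0.5 0.5 1
```

Because every `record` call propagates immediately, the execution-level total is current even if a node is aborted partway through, with no separate aggregation pass needed.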
              ┌─────────────────────────────────────────┐
              │           CostAccrualTracker            │
              └─────────────────────────────────────────┘
                                   │
         ┌─────────────────────────┼─────────────────────────┐
         │                         │                         │
         ▼                         ▼                         ▼
┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│ recordUsage()   │       │ getCurrentCost()│       │ finalize()      │
│                 │       │                 │       │                 │
│ After each API  │       │ For real-time   │       │ On completion,  │
│ response        │       │ display         │       │ abort, or fail  │
└─────────────────┘       └─────────────────┘       └─────────────────┘
         │                         │                         │
         │                         │                         │
         ▼                         ▼                         ▼
  ┌─────────────────────────────────────────────────────────────────┐
  │                           CostReport                            │
  │  { inputTokens, outputTokens, totalCostUsd, completionReason }  │
  └─────────────────────────────────────────────────────────────────┘
For real-time cost display in execution UIs:
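// Poll every 2 seconds while executing
useEffect(() => {
  if (status !== 'running') return;
  let cancelled = false;
  const interval = setInterval(async () => {
    try {
      const response = await fetch(`/api/execute/${executionId}`);
      if (!response.ok) return; // Skip this tick on HTTP errors
      const data = await response.json();
      if (cancelled) return; // Effect was cleaned up while the request was in flight
      setAccruedCost(data.cost.accruedUsd);
      setTokens({
        input: data.cost.inputTokens,
        output: data.cost.outputTokens,
      });
    } catch {
      // Network or parse error: keep the last known values and retry on the next tick
    }
  }, 2000);
  return () => {
    cancelled = true;
    clearInterval(interval);
  };
}, [executionId, status]);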
// Display format
<div className="cost-display">
<span className="cost-amount">${accruedCost.toFixed(4)}</span>
<span className="token-count">
{tokens.input.toLocaleString()} in / {tokens.output.toLocaleString()} out
</span>
</div>
| Component | Responsibility |
|-----------|----------------|
| CostAccrualTracker | Per-execution token counting and cost calculation |
| ExecutionManager | Aggregates costs across DAG executions |
| BudgetGuard | Threshold monitoring and auto-stop |
| /api/execute/:id | Exposes current cost via polling |
| Cost Display Widget | Real-time UI rendering |
See /references/claude-api-pricing.md for current Claude API pricing.