skills/fischer-lynch-paterson-1985-flp-impossibility/SKILL.md
Foundational impossibility result proving consensus cannot be guaranteed in asynchronous systems with even one faulty process
npx skillsauth add curiositech/windags-skills fischer-lynch-paterson-1985-flp-impossibilityInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Given system constraints → Choose approach:
IF no timing assumptions allowed AND exact consensus required
├── THEN consensus is impossible
├── → Redesign to avoid coordination
└── → OR accept potential non-termination + add manual escape
IF timing assumptions acceptable (timeouts, heartbeats allowed)
├── AND need strong consistency
├── → Use Paxos/Raft family protocols
└── → Monitor timing assumption violations
IF approximate results acceptable
├── OR partial agreement sufficient
├── → Use best-effort protocols
└── → Trade precision for availability
IF static participant set AND known failure bound
├── → Use majority protocol
└── → (Consensus IS possible in this case)
System stuck without deciding → Check cause:
IF cannot distinguish slow from crashed processes
├── → You've hit FLP boundary
├── → Add failure detector (timeout) OR accept non-termination
└── → Document synchrony assumptions if using timeouts
IF system has timing assumptions but still hangs
├── → Assumptions violated (network partition, load spike)
├── → Check: timeout too aggressive? GC pause? Network delay?
└── → Adjust assumptions OR add assumption violation handling
IF in bivalent state (can decide either way)
├── → Normal FLP scenario - not a bug
├── → Add escape hatch: manual override, quorum adjustment
└── → OR wait for network/timing conditions to improve
Setting failure detection timeout → Balance false positives vs negatives:
IF timeout too short
├── → False positives: declare live process dead
├── → Leads to: split decisions, unnecessary failovers
└── → SOLUTION: make adaptive, increase under load
IF timeout too long
├── → False negatives: don't detect real failures
├── → Leads to: long waits, poor user experience
└── → SOLUTION: expose partial results, manual override
ALWAYS:
├── → Log timeout violations to validate assumptions
├── → Design protocol to handle false positives gracefully
└── → Make timing assumptions explicit in documentation
Aggregating parallel task results → Check operation properties:
IF aggregation is commutative/associative (order doesn't matter)
├── → Process results as they arrive
├── → Minimal coordination needed
└── → Example: sum, max, set union
IF aggregation requires specific order OR all inputs
├── → Requires coordination protocol
├── → Apply coordination strategy selection (Decision Point 1)
└── → Example: consensus on final decision, ordered operations
IF some tasks might not complete
├── → Wait forever (pure asynchrony) OR timeout (synchrony assumption)
├── → Recommend: expose both modes
└── → Strict mode vs best-effort mode
Symptom: System still hangs despite hardware upgrades, faster networks, optimized code Diagnosis: Treating impossibility as performance problem Detection Rule: If you're saying "we'll make it fast enough that failures don't matter" Fix: Accept the trade-off, choose which guarantee to sacrifice (usually pure asynchrony via timeouts)
Symptom: System works in testing but fails unpredictably in production during network issues Diagnosis: Using timeouts/heartbeats without acknowledging synchrony assumptions Detection Rule: If your "asynchronous" system has any timeout values Fix: Document timing assumptions explicitly, monitor violations, design for assumption breaches
Symptom: System appears frozen, users can't proceed, no error message or recovery option Diagnosis: Protocol blocks indefinitely waiting for consensus without escape hatch Detection Rule: If system can enter a state where it waits forever with no user recourse Fix: Add timeouts, manual override, quorum adjustment, or "decide with available data" mode
Symptom: Operators can't tell if system is making progress or stuck, debugging is impossible Diagnosis: Treating consensus as black box without exposing internal state Detection Rule: If you can't answer "is the system stuck or still working?" Fix: Expose bivalent/univalent state, show which processes voted, add progress indicators
Symptom: Applying FLP reasoning to problems that don't need binary consensus Diagnosis: Over-applying impossibility result to leader election, ordering, or best-effort coordination Detection Rule: If you're using "FLP says it's impossible" for non-consensus problems Fix: Distinguish exact consensus from other coordination patterns, use appropriate guarantees
Scenario: 5 AI agents must reach consensus on whether to approve a pull request. Each agent analyzes different aspects (security, performance, style, tests, logic).
Expert Decision Process:
What novice misses:
Expert solution:
Scenario: 100 agents processing data chunks, need to aggregate results into final report. Some agents might fail.
Expert Decision Process:
FLP implications:
Expert solution:
Scenario: Raft-based coordination system appears stuck during leader election.
Expert Diagnosis Process:
What novice sees: "Consensus is broken, FLP proves this is impossible" What expert sees: "Synchrony assumptions violated, this is expected behavior"
Expert solution:
Task completion checklist for applying FLP knowledge:
Do NOT use this skill for:
Delegate to other skills when:
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.