skills/lakatos/SKILL.md
Philosophy of science methodology examining how research programs evolve through proofs and refutations
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add curiositech/windags-skills lakatos
license: Apache-2.0
name: proofs-and-refutations
version: 1.0
source: "Proofs and Refutations: The Logic of Mathematical Discovery — Imre Lakatos"
description: >
A framework for reasoning under uncertainty, handling counterexamples,
refining concepts iteratively, and understanding how knowledge actually
grows through conjecture-refutation dialectics rather than accumulation
of certified truths.
activation_triggers:
- debugging complex systems where errors are ambiguous or hard to localize
- designing ontologies, schemas, or type systems
- handling edge cases, boundary conditions, or unexpected inputs
- evaluating whether a proof, argument, or reasoning chain is sound
- deciding how to respond to falsification or refutation of a hypothesis
- building systems that must learn or improve from failure
- coordinating across disagreement about definitions or success criteria
- any situation where "is this really a counterexample?" is a live question
Load this skill when:
A proof does not establish truth — it breaks a conjecture into sub-conjectures (lemmas), each independently criticizable. Even a proof of a false conjecture is productive: it maps the hidden assumptions and creates new targets for inquiry. The "failed" reasoning chain is a diagnostic, not just waste.
Operational implication: When a reasoning chain produces a wrong answer, don't discard it — trace which lemma failed. The map of failure is often more valuable than the original goal.
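One way to make this concrete is to represent a reasoning chain as a list of named lemmas, each with its own check, so that a wrong conclusion can be traced to the step that failed. The sketch below is illustrative only: `Lemma`, `trace_failures`, and the sample debugging chain are hypothetical names and data introduced here, not part of the skill.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Lemma:
    """One independently criticizable step in a reasoning chain (hypothetical type)."""
    name: str
    check: Callable[[], bool]  # returns True if the lemma holds on the evidence


def trace_failures(chain: List[Lemma]) -> List[str]:
    """Return the names of every lemma whose check fails, in chain order."""
    return [lemma.name for lemma in chain if not lemma.check()]


# Hypothetical chain behind a wrong conclusion: "the cache is the bottleneck".
chain = [
    Lemma("latency rose after the deploy", lambda: True),
    Lemma("cache hit rate dropped", lambda: False),  # was assumed, never measured
    Lemma("cache misses dominate request time", lambda: True),
]

failed = trace_failures(chain)
print(failed)
```

The conclusion was wrong, but the chain is kept: the output names the step to re-examine instead of discarding the whole argument.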
When a counterexample appears, there are fundamentally different responses: surrender, monster-barring, exception-barring, lemma-incorporation, and concept-stretching.
Monster-barring feels like clarification but is concept-contraction driven by the desire to protect a result. Systems that keep redefining success criteria to exclude inconvenient cases appear to learn while becoming more brittle.
A counterexample can attack a lemma of the proof (a local counterexample) or the conjecture itself (a global counterexample).
Conflating these causes two failure modes: (a) abandoning good conjectures because a sub-argument failed, or (b) defending bad conjectures by patching sub-arguments indefinitely.
Triage rule: Before deciding what a counterexample means, first determine what it hits.
The right definition of a concept cannot be determined at the outset — it emerges from pressure applied by counterexamples. Every boundary case is an opportunity to discover what the concept actually needs to be. Initial ontologies are provisional hypotheses, not fixed infrastructure.
Operational implication: Treat schema freeze dates with suspicion. The concepts that matter most will be the ones that haven't been stress-tested yet.
The "Perfect Definition" move — define the domain as exactly the set of things for which the conjecture holds — collapses inquiry into triviality. Progress requires maintaining tension between conjecture and domain. Any architecture too eager to resolve ambiguity, lock down schemas, or close open questions will systematically prevent learning that only happens at the boundary between what works and what doesn't.
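The difference between a testable domain and a trivializing one can be sketched with the running example of Lakatos's book, Euler's formula V - E + F = 2 for polyhedra. This is a minimal illustration under stated assumptions: the `Solid` type, the choice of sample solids, and the helper names are all introduced here and are not definitions from the skill.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Solid:
    name: str
    vertices: int
    edges: int
    faces: int
    convex: bool

    @property
    def euler(self) -> int:
        """Euler characteristic V - E + F."""
        return self.vertices - self.edges + self.faces


SOLIDS = [
    Solid("tetrahedron", 4, 6, 4, convex=True),
    Solid("cube", 8, 12, 6, convex=True),
    Solid("picture frame", 16, 32, 16, convex=False),
    Solid("cube with cubic cavity", 16, 24, 12, convex=False),
]

Predicate = Callable[[Solid], bool]


def counterexamples(conjecture: Predicate, domain: Predicate) -> List[str]:
    """Solids that lie inside the domain but violate the conjecture."""
    return [s.name for s in SOLIDS if domain(s) and not conjecture(s)]


def perfect_domain(conjecture: Predicate) -> Predicate:
    """The trivializing move: the domain is whatever satisfies the conjecture."""
    return conjecture


def euler_is_2(s: Solid) -> bool:
    return s.euler == 2


def euler_is_0(s: Solid) -> bool:  # a deliberately false general claim
    return s.euler == 0


def is_convex(s: Solid) -> bool:  # a domain defined independently of the claim
    return s.convex


# An independent domain can refute a bad conjecture...
print(counterexamples(euler_is_2, is_convex))
print(counterexamples(euler_is_0, is_convex))
# ...while the "perfect" domain protects any conjecture, including a false one.
print(counterexamples(euler_is_0, perfect_domain(euler_is_0)))
```

With the convex domain, the false claim is caught by the tetrahedron and the cube. With the perfect definition, the same false claim has no counterexamples by construction, which is why the result carries no information.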
Is the failure in the argument or the claim?
├── Attack on a lemma (local) → Repair the proof; conjecture survives
│   ├── Can the lemma be rescued? → Revise sub-argument
│   └── Lemma is genuinely false → Weaken the conjecture's conditions (lemma-incorporation)
└── Attack on the conjecture (global) → Revision required
    ├── Is the counterexample a "monster"? → Check: am I excluding it to protect the conjecture?
    │   ├── Yes (monster-barring) → Dangerous; examine what you're giving up
    │   └── No (genuinely outside scope) → Narrow domain explicitly and document why
    └── Counterexample is legitimate → Revise conjecture or surrender
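
The tree above can be sketched as a small function. This is one possible encoding, not a definitive one: the `Counterexample` fields and the returned action strings are hypothetical, and the judgments themselves (for example, whether an exclusion is motivated by protecting the conjecture) still have to be made by a reasoner before the function is called.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LEMMA = "local"
    CONJECTURE = "global"


@dataclass
class Counterexample:
    """Hypothetical record of the judgments the tree asks for."""
    target: Target
    lemma_rescuable: bool = False      # only meaningful when target is LEMMA
    want_to_exclude: bool = False      # is someone proposing to rule it out of scope?
    excluded_to_protect: bool = False  # only meaningful when want_to_exclude is True


def triage(cx: Counterexample) -> str:
    """Walk the decision tree and return the recommended action."""
    if cx.target is Target.LEMMA:
        # Local attack: the proof needs repair, the conjecture survives.
        if cx.lemma_rescuable:
            return "revise sub-argument"
        return "lemma-incorporation: weaken the conjecture's conditions"
    # Global attack: some revision is required.
    if cx.want_to_exclude:
        if cx.excluded_to_protect:
            return "monster-barring: examine what you are giving up"
        return "narrow domain explicitly and document why"
    return "revise conjecture or surrender"


print(triage(Counterexample(Target.LEMMA, lemma_rescuable=True)))
print(triage(Counterexample(Target.CONJECTURE, want_to_exclude=True,
                            excluded_to_protect=True)))
```

Keeping the target as the first branch enforces the triage rule: what the counterexample hits is decided before what it means.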
| Response | What it does | When it's legitimate | When it's a trap |
|---|---|---|---|
| Surrender | Abandons conjecture | Global counterexample is valid | Counterexample was only local |
| Monster-barring | Redefines to exclude | Genuinely outside intended scope | Protecting conjecture from real falsification |
| Exception-barring | Restricts domain explicitly | Principled scope limitation | Scope kept shrinking to preserve result |
| Lemma-incorporation | Adds condition to conjecture | Local failure, fixable | Used to hide unresolvable problems |
| Concept-stretching | Expands definition to cover case | Genuine generalization | Forces alien cases into concept |
Is there pressure to finalize definitions before boundary cases are tested?
└── Yes → Treat current definitions as v0 hypotheses; document known gaps
Are there cases the current schema handles awkwardly?
└── Yes → These are not "edge cases to handle later"; they are diagnostic pressure
    └── What would the concept need to be for these to fit naturally?
        └── That question is more valuable than patching the current schema
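
Treating a definition as a v0 hypothesis with documented gaps might look like the following sketch. Everything in it is an assumption made for illustration: the `ConceptDefinition` type, its methods, and the sample "user" concept are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ConceptDefinition:
    """A provisional definition plus the boundary cases it handles awkwardly."""
    name: str
    statement: str
    version: int = 0
    known_gaps: Tuple[str, ...] = ()

    def with_gap(self, case: str) -> "ConceptDefinition":
        """Record an awkward case instead of silently patching around it."""
        return replace(self, known_gaps=self.known_gaps + (case,))

    def revise(self, statement: str, resolved: List[str]) -> "ConceptDefinition":
        """Adopt a new statement; gaps it does not resolve carry over."""
        remaining = tuple(g for g in self.known_gaps if g not in resolved)
        return replace(self, statement=statement,
                       version=self.version + 1, known_gaps=remaining)


# Hypothetical example: the concept "user" under pressure from boundary cases.
v0 = ConceptDefinition("user", "a person with exactly one email address")
v0 = v0.with_gap("service account with no email")
v0 = v0.with_gap("shared team inbox")

v1 = v0.revise("any principal that can authenticate",
               resolved=["service account with no email"])
print(v1.version, v1.known_gaps)
```

The unresolved gap travels with the definition into the next version, so the open question stays visible after the revision.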
What does this proof actually establish?
1. List the lemmas (explicit and implicit)
2. Which lemmas are independently verified vs. assumed?
3. If a lemma fails, does the conjecture fall or just the argument?
4. What domain assumptions are embedded silently?
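
The four questions above can be recorded as a small audit structure. This is a hedged sketch: `AuditedLemma`, `audit`, and the sample claim about a database migration are hypothetical and serve only to show the shape of the answers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AuditedLemma:
    statement: str
    explicit: bool         # stated in the argument, or only implied?
    verified: bool         # independently checked, or assumed?
    fatal_if_false: bool   # would the conjecture fall, or just this argument?


def audit(lemmas: List[AuditedLemma], domain_assumptions: List[str]) -> Dict[str, List[str]]:
    """Summarize what a proof rests on, following the four questions."""
    return {
        "lemmas": [l.statement for l in lemmas],
        "implicit": [l.statement for l in lemmas if not l.explicit],
        "assumed": [l.statement for l in lemmas if not l.verified],
        "fatal_if_false": [l.statement for l in lemmas if l.fatal_if_false],
        "domain_assumptions": list(domain_assumptions),
    }


# Hypothetical claim under audit: "the migration is safe to run twice".
report = audit(
    [
        AuditedLemma("each statement is idempotent", True, True, True),
        AuditedLemma("no other writer runs during the migration", False, False, True),
        AuditedLemma("the dry run covered every table", True, False, False),
    ],
    domain_assumptions=["single-region deployment"],
)
print(report["assumed"])
```

The lemmas that appear in both `assumed` and `fatal_if_false` are the ones worth checking first.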
| File | When to Load |
|---|---|
| references/proofs-as-decomposition-not-certification.md | When evaluating what a proof or reasoning chain actually establishes; when extracting value from a "failed" argument; when assessing implicit assumptions in a chain of reasoning |
| references/monster-barring-and-concept-stretching.md | When a counterexample appears and there's temptation to redefine terms; when a system seems to be "learning" by progressively narrowing its success criteria; when definitions keep shifting after failures |
| references/local-vs-global-counterexamples-error-triage.md | When triaging a failure — deciding whether it invalidates the whole approach or just a sub-component; when doing root cause analysis; when deciding whether to abandon or repair an approach |
| references/definitions-as-products-not-preconditions.md | When designing or freezing ontologies, schemas, or type systems; when initial categories are failing under boundary cases; when asked to specify definitions before testing |
| references/the-dialectic-of-conjecture-and-refutation.md | When understanding the overall arc of how knowledge grows; when a system's learning process needs to be evaluated or designed; foundational framing for most other references |
| references/the-problem-of-concept-extension-and-boundary-cases.md | When boundary or edge cases are accumulating and need a principled treatment; when deciding whether to expand, restrict, or revise a concept under pressure |
| references/the-social-structure-of-rigorous-inquiry.md | When coordinating across agents or team members who disagree; when a single reasoner is making all calls without challenge; when epistemic role distribution matters |
| references/the-gap-between-knowing-and-proving.md | When a conjecture is highly credible but unproven; when deciding whether to act on an unproven hypothesis; when distinguishing justified confidence from formal proof |
These are the failure modes Lakatos most directly warns against:
1. Monster-Barring by Drift: Redefining terms incrementally after each counterexample, so no single redefinition looks unreasonable, but the cumulative effect is that the "theorem" now applies to almost nothing. The warning sign: each revision feels like a clarification.
2. Treating Schema Freeze as Progress: Locking down definitions and ontologies before they've been stress-tested by boundary cases. The result is a system that handles everything it was tested on and fails badly at everything it wasn't.
3. Conflating Proof Failure with Conjecture Failure: Abandoning a correct conjecture because a particular argument for it failed. The argument failing is information about the argument, not necessarily the claim.
4. The Trivializing Perfect Definition: Defining the domain as exactly the set of things for which the result holds, then claiming the theorem is proven. This is the death of inquiry dressed as rigor.
5. Missing the Value in Failed Reasoning: Discarding a reasoning chain that reached a wrong conclusion instead of tracing which lemma failed. Failed proofs are maps; throwing away the map is waste.
6. Concept-Stretching Under Pressure: Forcing genuinely alien cases into an existing concept rather than admitting the concept needs revision or a new concept is needed. Produces categories that are simultaneously too broad and too narrow.
7. Solo Epistemics: A single agent making all decisions about what counts as a counterexample, what counts as monster-barring, and whether a proof is sound, without structures that enable genuine challenge. Lakatos's dialogue format is itself an argument about process.
How to tell if someone has actually internalized this book vs. just read the summary:
They have internalized it if they:
They have only read the summary if they:
Load reference files on demand as specific situations arise. The dialectic of conjecture and refutation cannot be shortcut, and neither can the work of choosing which references the situation actually requires.