skills/windags-looking-back/SKILL.md
Polya's four questions after every WinDAGs execution. Q1 (contract satisfied?) and Q2 (unstated assumptions?) are mandatory on every DAG. Q3 (generalizable?) and Q4 (broader connections?) are conditional on quality and cost thresholds, run asynchronously, and never delay result delivery. Activate on "looking back", "Polya", "retrospective", "post-execution review", "was the contract satisfied", "unstated assumptions", "generalize", "problem class". NOT for learning engine updates (use windags-curator), failure scanning (use windags-premortem), or DAG construction (use windags-architect).
Ask Polya's four questions at the end of every execution. Q1 and Q2 are mandatory and block completion. Q3 and Q4 are conditional, asynchronous, and never delay result delivery. Produce a LookingBackResult that feeds into the learning archive.
Model Tier: Q1-Q2 = Tier 1 (Haiku-class), Q3-Q4 = Tier 2 (Sonnet-class)

Behavioral Contracts: BC-LEARN-003, BC-CROSS-009
Use this skill when:
Do NOT use for:
- Learning engine updates (use windags-curator)
- Failure scanning (use windags-premortem)
- (use windags-mutator)

Derived from Polya's "How to Solve It" -- the Looking Back phase. Adapted for DAG execution.
```mermaid
flowchart TD
  EX[Execution complete] --> Q1[Q1: Was the contract satisfied?]
  Q1 --> Q2[Q2: Were there unstated assumptions?]
  Q2 --> DELIVER[Deliver result to user]
  Q2 --> GATE{Quality >= 0.8 AND cost <= $0.01?}
  GATE -->|Yes| Q3[Q3: Can this method be generalized?]
  GATE -->|No| SKIP3[Skip Q3]
  Q3 --> GATE2{Quality >= 0.9 AND cost <= $0.02?}
  GATE2 -->|Yes| Q4[Q4: Broader problem class?]
  GATE2 -->|No| SKIP4[Skip Q4]
  Q4 --> APPEND[Append Q3/Q4 results when available]
  SKIP3 --> DONE[Done]
  SKIP4 --> APPEND
  APPEND --> DONE
```
| Question | Requirement | Model Tier | Blocks Delivery? | Cost Gate |
|----------|-------------|------------|------------------|-----------|
| Q1 | MANDATORY, every DAG | Tier 1 | Yes | None |
| Q2 | MANDATORY, every DAG | Tier 1 | Yes | None |
| Q3 | CONDITIONAL | Tier 2 | No | quality >= 0.8, cost <= $0.01 |
| Q4 | CONDITIONAL | Tier 2 | No | quality >= 0.9, cost <= $0.02 |
BC-LEARN-003: Q1 and Q2 run on every DAG, including trivial single-node DAGs. No exceptions.
BC-CROSS-009: Q3 and Q4 are non-blocking. They run asynchronously after the result has been delivered to the user. Their findings are appended to the LookingBackResult when available.
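The gating rules can be sketched as a small pure function. This is a minimal illustration, not the skill's actual implementation: `GateDecision` and `decide_gates` are hypothetical names, and the sketch assumes Q4 is only considered when Q3 runs, as the flowchart shows.

```python
from dataclasses import dataclass

# Thresholds taken from the gating table.
Q3_MIN_QUALITY, Q3_MAX_COST = 0.8, 0.01
Q4_MIN_QUALITY, Q4_MAX_COST = 0.9, 0.02


@dataclass(frozen=True)
class GateDecision:
    """Hypothetical container: which conditional questions should run."""
    run_q3: bool
    run_q4: bool


def decide_gates(average_quality: float, est_q3_cost: float,
                 est_q4_cost: float) -> GateDecision:
    """Decide which conditional questions run. Q1 and Q2 are never gated."""
    run_q3 = average_quality >= Q3_MIN_QUALITY and est_q3_cost <= Q3_MAX_COST
    # Assumption from the flowchart: the Q4 gate is only reached after Q3.
    run_q4 = (run_q3
              and average_quality >= Q4_MIN_QUALITY
              and est_q4_cost <= Q4_MAX_COST)
    return GateDecision(run_q3, run_q4)
```

Keeping the decision separate from execution means the gates can be checked cheaply before any Tier 2 model is invoked.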
Verify that the DAG's output matches what was promised.
Retrieve the original contract: Pull the Sensemaker's problem statement and the Decomposer's output specification. The contract is what was promised, not what was attempted.
Check output presence: Verify that every required output field exists in the final result. Missing fields are an automatic Q1 failure.
Check output types: Verify that each output field matches its specified type. A field that exists but contains the wrong type (string where number expected, empty array where populated array expected) is a type violation.
Check output content: For each output field, verify that the content addresses the original problem statement. An output that is present, correctly typed, but irrelevant is a content failure.
Check completeness: If the contract specified multiple deliverables, verify all are present. Partial completion is noted with the specific gaps identified.
```
q1_satisfied = all of:
  - All required fields present
  - All fields correctly typed
  - All fields contain relevant content
  - All specified deliverables accounted for

q1_evidence = list of:
  - For each check: pass/fail + specific details
  - If failed: which specific aspect failed and why
```
When Q1 fails, note the specific failure mode. This feeds back into the Evaluator's quality model and the Curator's Thompson updates.
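The presence and type checks are mechanical and can be sketched in a few lines. This is an illustration under stated assumptions: `check_contract` is a hypothetical helper, the contract is reduced to a field-name-to-type mapping, and every list field is assumed to require at least one element. The content-relevance check is omitted because it needs a model judgment.

```python
from typing import Any


def check_contract(output: dict[str, Any],
                   spec: dict[str, type]) -> tuple[bool, list[dict[str, str]]]:
    """Run presence and type checks; return (q1_satisfied, q1_evidence)."""
    evidence: list[dict[str, str]] = []
    for name, expected in spec.items():
        if name not in output:
            # Missing fields are an automatic Q1 failure.
            evidence.append({"check": f"{name} present", "result": "fail",
                             "detail": "required field is missing"})
            continue
        evidence.append({"check": f"{name} present", "result": "pass",
                         "detail": "field exists"})
        value = output[name]
        type_ok = isinstance(value, expected)
        # bool is a subclass of int in Python; do not accept it as a number.
        if isinstance(value, bool) and expected is not bool:
            type_ok = False
        detail = f"expected {expected.__name__}, got {type(value).__name__}"
        # Assumption: list fields must be populated, so empty is a violation.
        if type_ok and expected is list and not value:
            type_ok, detail = False, "expected a populated list, got an empty one"
        evidence.append({"check": f"{name} type",
                         "result": "pass" if type_ok else "fail",
                         "detail": detail})
    q1_satisfied = all(item["result"] == "pass" for item in evidence)
    return q1_satisfied, evidence
```

Each evidence entry uses the `check` / `result` / `detail` shape, so a failed entry names the specific aspect that failed.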
Surface assumptions that the DAG made implicitly but never validated.
Scan for each category in order.
1. Environment Assumptions
2. Data Assumptions
3. Dependency Assumptions
4. Context Assumptions
For each node in the DAG:
Collect all detected assumptions. Rate each by impact: low, medium, or high.
High-impact assumptions are the most dangerous because they produce confident-incorrect outputs (Principle 8).
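One way to hold and order the collected assumptions is sketched below. The field names mirror the `q2_assumptions` entries of the output format. `triage` and `needs_attention` are hypothetical helpers, and treating "high impact and not confirmed" as the set to surface first is an assumption, not something the skill prescribes.

```python
from dataclasses import dataclass

# Sort key: high-impact assumptions come first.
IMPACT_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Assumption:
    category: str           # Environment | Data | Dependency | Context
    assumption: str         # What was assumed
    impact: str             # "low" | "medium" | "high"
    validation_status: str  # "confirmed" | "unvalidated" | "violated"


def triage(assumptions: list[Assumption]) -> list[Assumption]:
    """Order by impact, high first. sorted() is stable, so the original
    scan order is preserved within each impact level."""
    return sorted(assumptions, key=lambda a: IMPACT_ORDER[a.impact])


def needs_attention(assumptions: list[Assumption]) -> list[Assumption]:
    """High-impact assumptions that were never confirmed."""
    return [a for a in assumptions
            if a.impact == "high" and a.validation_status != "confirmed"]
```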
Only run when: average_quality >= 0.8 AND estimated_q3_cost <= $0.01
This question uses a Tier 2 model because generalization requires reasoning about abstraction.
Extract the method: What sequence of operations did this DAG use to solve the problem? Abstract away the specific inputs and outputs. Describe the method as a template.
Identify the invariant: What properties of the input made this method work? What would need to be true of a new input for the same method to apply?
Identify the variables: What parts of the method changed based on the specific problem? These are the parameters of the generalized method.
Test generalizability: Mentally apply the method to 2-3 hypothetical problems in the same domain. Does it still make sense? Would it produce reasonable results?
Assess scope: How broad is the class of problems this method covers?
```yaml
q3_generalizable: true | false
q3_method_description: string    # The generalized method as a template
q3_scope: "narrow" | "medium" | "broad"
q3_invariants: [string]          # Conditions that must hold
q3_parameters: [string]          # Parts that vary per problem
```
When q3_generalizable = true and q3_scope != "narrow", this feeds into the Curator's crystallization pipeline. A generalizable method is a candidate for a new skill.
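The hand-off rule reduces to a one-line predicate. The helper name is hypothetical. The sketch also treats a `null` scope (Q3 did not run) as "not a candidate", which is an assumption.

```python
from typing import Optional


def is_crystallization_candidate(q3_generalizable: Optional[bool],
                                 q3_scope: Optional[str]) -> bool:
    """True when Q3 found a generalizable method with non-narrow scope."""
    # None means Q3 was skipped or failed, so there is nothing to hand off.
    return q3_generalizable is True and q3_scope in ("medium", "broad")
```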
Only run when: average_quality >= 0.9 AND estimated_q4_cost <= $0.02
This question uses a Tier 2 model because identifying problem class connections requires broad reasoning.
Identify the problem class: What type of problem was this? Classification options:
Find structural analogies: What other problems share the same structure, even if the domain is different? Look for:
Identify transfer opportunities: If this method worked here, where else might it work? Be specific:
```yaml
q4_connections:
  - problem_class: string        # The broader class name
    structural_analogy: string   # What structure is shared
    transfer_targets:
      - domain: string
        adaptation_needed: string
        transfer_difficulty: "low" | "medium" | "high"
```
Q3 and Q4 run asynchronously after the result is delivered to the user.
```mermaid
sequenceDiagram
  participant LB as Looking Back
  participant User as User/Executor
  participant KL as Knowledge Library
  LB->>LB: Run Q1 (mandatory)
  LB->>LB: Run Q2 (mandatory)
  LB->>User: Deliver result with Q1 + Q2
  par Async Q3-Q4
    LB->>LB: Check Q3 gate (quality >= 0.8, cost <= $0.01)
    LB->>LB: Run Q3 if gated
    LB->>LB: Check Q4 gate (quality >= 0.9, cost <= $0.02)
    LB->>LB: Run Q4 if gated
    LB->>KL: Append Q3/Q4 findings
  end
```
The user receives their result without waiting for Q3 or Q4. These findings are available for subsequent executions and for the Curator's crystallization pipeline.
If Q3 or Q4 fails (model error, timeout, cost overrun), skip it silently. Both are enrichment, not requirements.
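The deliver-first, enrich-later flow can be sketched with `asyncio`. This is a minimal illustration under stated assumptions: `looking_back`, `run_enrichment`, the callables passed in, and the `q3` / `q4` result keys are all hypothetical, and the timeouts reuse the Q3 and Q4 latency targets (10s and 15s) as hard limits, which the skill does not state.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumption: the Q3/Q4 latency targets double as hard timeouts.
Q3_TIMEOUT_S, Q4_TIMEOUT_S = 10.0, 15.0

Question = Callable[[], Awaitable[Any]]


async def run_enrichment(result: dict, q3: Question, q4: Question,
                         run_q3: bool, run_q4: bool) -> None:
    """Run the gated questions in the background and append what succeeds."""
    plan = (("q3", q3, run_q3, Q3_TIMEOUT_S),
            ("q4", q4, run_q4, Q4_TIMEOUT_S))
    for key, question, enabled, timeout in plan:
        if not enabled:
            continue
        try:
            result[key] = await asyncio.wait_for(question(), timeout=timeout)
        except Exception:
            # Model error, timeout, or cost overrun: skip silently,
            # leaving the field as None.
            pass


async def looking_back(run_q1_q2: Callable[[], Awaitable[dict]],
                       deliver: Callable[[dict], None],
                       q3: Question, q4: Question,
                       run_q3: bool, run_q4: bool) -> tuple[dict, asyncio.Task]:
    result = await run_q1_q2()        # Q1 + Q2 block delivery
    result.update(q3=None, q4=None)   # null until enrichment lands
    deliver(result)                   # the user gets the result now
    task = asyncio.create_task(
        run_enrichment(result, q3, q4, run_q3, run_q4))
    return result, task               # caller may await the task or let it run
```

Because `deliver` is called before the background task is created, a slow or failing Q3/Q4 cannot delay the result.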
Produce a LookingBackResult with these fields:
```yaml
LookingBackResult:
  q1_satisfied: boolean
  q1_evidence:
    - check: string              # What was verified
      result: "pass" | "fail"
      detail: string             # Specific evidence
  q2_assumptions:
    - category: string           # Environment | Data | Dependency | Context
      assumption: string         # What was assumed
      impact: "low" | "medium" | "high"
      validation_status: "confirmed" | "unvalidated" | "violated"
  q3_generalizable: boolean | null    # null if not run
  q3_method_description: string | null
  q3_scope: "narrow" | "medium" | "broad" | null
  q3_invariants: [string] | null
  q3_parameters: [string] | null
  q4_connections: list | null         # null if not run
    - problem_class: string
      structural_analogy: string
      transfer_targets:
        - domain: string
          adaptation_needed: string
          transfer_difficulty: "low" | "medium" | "high"
  execution_summary:
    total_nodes: number
    successful_nodes: number
    average_quality: number
    total_cost: number
    total_duration_seconds: number
```
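For callers that want a typed representation, the schema maps naturally onto dataclasses. The sketch below is one possible encoding, not a definition shipped with the skill. The Q3 and Q4 fields default to `None` to represent "not run", and `q4_connections` is left as a list of plain dicts for brevity.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Evidence:
    check: str    # What was verified
    result: str   # "pass" | "fail"
    detail: str   # Specific evidence


@dataclass
class Assumption:
    category: str           # Environment | Data | Dependency | Context
    assumption: str
    impact: str             # "low" | "medium" | "high"
    validation_status: str  # "confirmed" | "unvalidated" | "violated"


@dataclass
class ExecutionSummary:
    total_nodes: int
    successful_nodes: int
    average_quality: float
    total_cost: float
    total_duration_seconds: float


@dataclass
class LookingBackResult:
    q1_satisfied: bool
    q1_evidence: list[Evidence]
    q2_assumptions: list[Assumption]
    execution_summary: ExecutionSummary
    # Q3/Q4 fields stay None when the question was gated out or failed.
    q3_generalizable: Optional[bool] = None
    q3_method_description: Optional[str] = None
    q3_scope: Optional[str] = None
    q3_invariants: Optional[list[str]] = None
    q3_parameters: Optional[list[str]] = None
    q4_connections: Optional[list[dict]] = None
```

`dataclasses.asdict(result)` converts the whole structure, including nested entries, into plain dicts and lists suitable for serialization.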
Looking Back is the final agent in the meta-DAG pipeline:
```mermaid
flowchart LR
  CU[Curator] --> LB[Looking Back]
  LB -->|Q1 + Q2| USER[Deliver to User]
  LB -.->|Q3 + Q4 async| KL[Knowledge Library]
```
Looking Back reads the Evaluator's scores and the Curator's learning updates but does not modify them. It produces an independent assessment. If Q1 finds a contract violation that the Evaluator missed, this is logged as a discrepancy for future calibration.
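The discrepancy check can be sketched as a read-only comparison. Everything here is hypothetical: the function name, the record shape, and in particular `pass_threshold`, since the skill does not say what Evaluator score counts as a pass.

```python
from typing import Optional


def find_discrepancy(evaluator_score: float, q1_satisfied: bool,
                     pass_threshold: float = 0.8) -> Optional[dict]:
    """Return a discrepancy record when the Evaluator scored the run as
    passing but Q1 found a contract violation; otherwise return None.

    Assumption: pass_threshold is an illustrative cutoff, not a documented one.
    """
    evaluator_passed = evaluator_score >= pass_threshold
    if evaluator_passed and not q1_satisfied:
        return {
            "type": "evaluator_missed_contract_violation",
            "evaluator_score": evaluator_score,
            "q1_satisfied": False,
        }
    return None
```

The function only reads its inputs, consistent with Looking Back not modifying the Evaluator's scores.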
| Operation | Target |
|-----------|--------|
| Q1 verification | < 2s |
| Q2 assumption scan | < 3s |
| Q3 generalization (Tier 2) | < 10s |
| Q4 connection analysis (Tier 2) | < 15s |
| Q1 + Q2 total (blocking) | < 5s |
| Q3 + Q4 total (non-blocking) | < 25s |
Q1 and Q2 together must complete in under 5 seconds. They block result delivery and must be fast. Q3 and Q4 have more latitude because they run in the background.