skills/kieras-goms-for-task-analysis/SKILL.md
GOMS cognitive modeling methodology for analyzing human-computer interaction task performance
Install this skill globally with one command: `npx skillsauth add curiositech/windags-skills kieras-goms-for-task-analysis`. Works with Claude Code, Cursor, and Windsurf.
license: Apache-2.0
Load this skill when facing:
Core insight: GOMS bridges informal task descriptions and executable cognitive models, making human performance predictable through hierarchical-sequential decomposition constrained by cognitive architecture.
Concept: A procedural model must be generative—capable of producing correct behavior for any instance within a task class—not just describe specific action sequences.
Why it matters:
Application: When modeling a task, ask "Can this procedure handle variations?" If your model only works for one specific example, you're modeling behavior, not knowledge.
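As a minimal sketch of the distinction, the snippet below contrasts a recorded trace with a procedure that generates steps for any instance. The task (deleting a file), the step names, and the `confirm_needed` parameter are hypothetical and chosen only for illustration.

```python
from typing import List

# A behavioral trace: valid for exactly one instance (deleting "report.txt").
TRACE = ["point-to report.txt", "click", "press Delete", "press Enter"]


def delete_file(filename: str, confirm_needed: bool) -> List[str]:
    """Hypothetical generative method: produces the step sequence for any
    file, including the variant where no confirmation dialog appears."""
    steps = [f"point-to {filename}", "click", "press Delete"]
    if confirm_needed:
        steps.append("press Enter")
    return steps


# The generative method reproduces the trace and also covers a variation.
print(delete_file("report.txt", confirm_needed=True) == TRACE)
print(delete_file("notes.md", confirm_needed=False))
```

The trace answers "what happened once"; the function answers "what must the user know to handle every case in the class".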
Concept: Breaking tasks into goal-method-operator-selection hierarchies makes consistency (or its absence) structurally visible.
Why it matters:
Application: Map two "similar" tasks side-by-side in hierarchical form. Where their trees diverge unnecessarily, you've found a consistency problem users will feel as confusion.
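One way to sketch the side-by-side comparison is to store each hierarchy as a mapping from goal to steps and compare the sets of step names. Both hierarchies below are invented examples, and a set comparison is a simplification that ignores step order.

```python
from typing import Dict, List, Set

Hierarchy = Dict[str, List[str]]

# Hypothetical goal-method hierarchies for two "similar" tasks.
move_file: Hierarchy = {
    "move file": ["select object", "issue cut", "select destination", "issue paste"],
    "select object": ["point-to object", "click"],
}
move_text: Hierarchy = {
    "move text": ["select text", "issue cut", "select destination", "issue paste"],
    "select text": ["point-to start", "hold button", "drag-to end", "release"],
}


def submethods(h: Hierarchy) -> Set[str]:
    """All step names that appear anywhere in the hierarchy."""
    return {step for steps in h.values() for step in steps}


shared = submethods(move_file) & submethods(move_text)     # learned once, reused
divergent = submethods(move_file) ^ submethods(move_text)  # learned separately
print(sorted(shared))
print(sorted(divergent))
```

Steps in `divergent` are candidates for review: if the divergence is not required by the task, it is a consistency problem.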
Concept: Working memory isn't unlimited storage. It is a coordination mechanism that uses explicit tags (`<filename>`, `<target>`) and reflects real cognitive constraints.
Why it matters:
Application: Count the working memory tags active at any point in your procedure. More than 3-4 simultaneously active tags signals likely cognitive overload.
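The tag count can be sketched as a small pass over a procedure's memory operations. The representation of a procedure as `(operation, tag)` pairs and the `<new-name>` tag are assumptions made for this example; the Retain, Recall, and Delete names follow the operations used in this skill.

```python
from typing import List, Tuple


def peak_memory_load(ops: List[Tuple[str, str]]) -> int:
    """Return the largest number of tags held at the same time.

    Only Retain and Delete change what is held; Recall reads a tag
    and must refer to one that is currently retained.
    """
    active = set()
    peak = 0
    for op, tag in ops:
        if op == "Retain":
            active.add(tag)
        elif op == "Delete":
            active.discard(tag)
        elif op == "Recall" and tag not in active:
            raise ValueError(f"Recall of {tag} before Retain")
        peak = max(peak, len(active))
    return peak


procedure = [
    ("Retain", "<filename>"),
    ("Retain", "<target>"),
    ("Recall", "<filename>"),
    ("Delete", "<filename>"),
    ("Retain", "<new-name>"),
]
print(peak_memory_load(procedure))
```

A peak above the three-to-four range suggests the procedure asks users to hold too much at once.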
Concept: When encountering processes too complex to model practically (reading comprehension, creative judgment, problem-solving), bypass them by treating results as "already available on a yellow pad."
Why it matters:
Application: If modeling a process requires PhD-level domain theory, bypass it. Focus on what the interface contributes to task difficulty, not domain complexity.
Concept: Errors aren't model failures—they're foreseeable events triggering error-recovery methods. Well-designed systems have simple, consistent recovery procedures.
Why it matters:
Application: For each operator in your model, ask "What if this fails?" If recovery requires a completely different mental model, the design is brittle.
If you need to compare design alternatives before implementation → Use GOMS to get quantitative learning/execution predictions
If the task is primarily perceptual-motor with little decision-making → Use simpler keystroke-level models (KLM)
If the task involves complex problem-solving or creative work → Use the bypass heuristic and model only the interface-mediated portions
If you need to understand why users find a system confusing → Build hierarchical GOMS to reveal procedural inconsistencies
If comparing macro-level design approaches → Model to method level only, showing goal decomposition
If predicting actual execution time → Model to operator level, including primitive actions
If analyzing working memory load → Include all Retain, Delete, and Recall operations explicitly
If the detail varies greatly across design alternatives → Only that differing detail matters; use bypass heuristic elsewhere
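For operator-level time prediction, a rough sketch is to sum per-operator durations over a sequence. The values below are commonly cited Keystroke-Level Model estimates, used here as placeholder defaults rather than calibrated figures, and both operator sequences are hypothetical.

```python
# Commonly cited Keystroke-Level Model estimates, in seconds.
# Treat these as placeholders; calibrate for your own users and devices.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press (average typist)
    "P": 1.10,  # point with a mouse to a target
    "H": 0.40,  # move hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def estimate_seconds(operators: str) -> float:
    """Sum operator times for a sequence such as 'MHPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical alternatives for the same goal.
menu = "HMPKMPK"  # reach for mouse, then think/point/click twice
shortcut = "MKK"  # think, then press a two-key chord
print(round(estimate_seconds(menu), 2), round(estimate_seconds(shortcut), 2))
```

The absolute numbers matter less than the comparison: the same operator table applied to both designs shows which one costs more and where.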
If achieving similar high-level goals requires learning different low-level methods → Inconsistent (redesign for shared submethods)
If the same submethod serves multiple parent goals → Consistent (users learn once, apply broadly)
If selection rules differ arbitrarily for similar situations → Inconsistent (standardize decision criteria)
If error recovery uses the same mechanisms as normal operation → Consistent (reduced cognitive load)
If conditions at execution time determine which procedure to follow → One method with selection rules
If users think of these as different tasks → Separate methods (even if procedures are similar)
If the procedures share 80%+ structure with small variations → One parameterized method
If the procedures are fundamentally different approaches → Multiple methods (cognitive distinction exists)
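The "one goal, several methods, chosen at execution time" case can be sketched as an ordered list of selection rules. The goal, the conditions, the 20-line threshold, and the method names below are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Rule = Tuple[Callable[[Context], bool], str]

# Hypothetical selection rules for the goal "go to target location".
SELECTION_RULES: List[Rule] = [
    (lambda ctx: ctx["distance_lines"] <= 20, "use arrow keys"),
    (lambda ctx: bool(ctx["knows_search_term"]), "use search"),
    (lambda ctx: True, "use scroll bar"),  # default method
]


def select_method(ctx: Context) -> str:
    """Return the first method whose condition holds in this context."""
    for condition, method in SELECTION_RULES:
        if condition(ctx):
            return method
    raise LookupError("no method applies")


print(select_method({"distance_lines": 5, "knows_search_term": False}))
print(select_method({"distance_lines": 300, "knows_search_term": True}))
print(select_method({"distance_lines": 300, "knows_search_term": False}))
```

If two goals need rule lists with arbitrarily different conditions for similar situations, that difference is itself something users must learn.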
| Reference File | When to Load | Key Content |
|---------------|--------------|-------------|
| hierarchical-decomposition-as-coordination-mechanism.md | Designing multi-agent systems, task distribution, understanding how hierarchy enables reuse | How goal-method hierarchies enable distributed intelligence to coordinate without global communication; structure as shared protocol |
| generative-models-versus-behavioral-traces.md | Distinguishing knowledge from behavior, evaluating if a model captures transferable understanding | The fundamental distinction between models that generate behavior for any task instance vs. those describing specific sequences; why generativity matters |
| bypassing-complexity-yellow-pad-heuristic.md | Encountering intractable domain complexity, scoping modeling effort, defending modeling boundaries | How to explicitly non-model complex processes; separating interface-mediated work from domain complexity; pragmatic tractability |
| working-memory-as-coordination-mechanism.md | Analyzing cognitive load, designing information handoffs, multi-agent state management | Tagged working memory as coordination mechanism; explicit Retain/Recall/Delete operations; memory as constraint not storage |
| judgment-calls-and-how-users-view-tasks.md | Making modeling decisions about user mental models, handling observational ambiguity | The irreducible judgment calls in task modeling; how analyst perspective shapes models; representing unobservable user understanding |
| failure-modes-in-procedural-systems.md | Predicting errors, designing error recovery, understanding when procedures break down | Where GOMS models fail (novel situations, skill development); error as goal trigger; designing for fault tolerance |
Symptom: Your model only works for the exact example you documented, not for variations.
Why it fails: You've captured what an expert does in one case, not what users must learn to handle all cases.
GOMS corrective: Build procedures with parameters and selection rules that generate correct behavior across instances.
Symptom: Flat lists of steps; no sense of goal-subgoal structure.
Why it fails: You can't see consistency patterns; every task looks equally complex; no reuse visible.
GOMS corrective: Decompose to goals and methods; shared submethods become obvious; inconsistency appears as duplicated-but-different structures.
Symptom: Procedure assumes users "just remember" 7+ pieces of information indefinitely.
Why it fails: Real users experience cognitive overload; information requires deliberate retention and recall; access has time cost.
GOMS corrective: Explicitly tag all working memory contents; count active tags; add Retain/Recall/Delete operations; if more than 4 tags are active at once, redesign to reduce load.
Symptom: Spending days modeling complex domain calculations that are identical across design alternatives.
Why it fails: Wastes effort on areas where interface design has no impact; obscures what actually matters.
GOMS corrective: Use bypass heuristic for complex invariant processes; focus detail where design alternatives differ.
Symptom: "The model doesn't account for mistakes, so it's unrealistic."
Why it fails: Errors are predictable, not random; error-prone steps have characteristics; recovery is designable.
GOMS corrective: Identify error-prone operators (complex calculations, delayed feedback); model error recovery as goal-triggered methods; design for consistent recovery procedures.
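Error recovery as a goal-triggered method can be sketched as follows: a failing step triggers its recovery method, after which the step is retried. The step names, the `execute` callback, and the assumption that the single retry succeeds are all illustrative simplifications.

```python
from typing import Callable, Dict, List


def predict_trace(steps: List[str],
                  execute: Callable[[str], bool],
                  recovery: Dict[str, List[str]]) -> List[str]:
    """Return the predicted step sequence.

    A failing step triggers its recovery method (a list of steps),
    followed by one retry that this sketch assumes succeeds.
    """
    trace: List[str] = []
    for step in steps:
        if execute(step):
            trace.append(step)
            continue
        trace.append(f"FAILED {step}")
        trace.extend(recovery.get(step, ["ask for help"]))
        trace.append(f"retry {step}")
    return trace


# Hypothetical method in which typing the filename goes wrong once.
method = ["open dialog", "type filename", "press Enter"]
recovery_methods = {"type filename": ["select field", "clear field"]}
trace = predict_trace(method, lambda s: s != "type filename", recovery_methods)
print(trace)
```

When many steps share the same short recovery method, recovery is consistent; when each step needs its own unrelated recovery list, the design is likely brittle.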
Symptom: Organizing model by menu structure or page flow rather than user goals.
Why it fails: Interface organization often doesn't match task logic; forces artificial procedural sequences.
GOMS corrective: Organize by user goals, not interface geography; let goal-method hierarchy be natural; poor fit between task and interface structure becomes visible.
Someone who has truly internalized GOMS doesn't just analyze tasks—they see procedural structure in real-time during design discussions. They notice when "similar" features actually require unrelated procedural knowledge. They instinctively count working memory tags. They distinguish fluently between behavioral observations and generative knowledge. They know when to model deeply and when to bypass pragmatically.
The ultimate shibboleth: They can explain why a proposed interface change will increase learning time before anyone builds it, and do so by pointing to specific structural changes in the goal-method hierarchy, not through vague intuition.