skills/scenario-skill-architect/SKILL.md
Converts recurring scenarios into rigorous agent skill packages through fail-closed gates, IR, mode/route/operation contracts, domain discovery, domain substance gates, failure-knowledge extraction, content-quality rubrics, runner evals, and artifact validation. Use for scenario-first skill creation, critique, patch-first improvement, eval design, triage, and domain-skill discovery. Do not use for one-off writing, generic summarization, unsafe surveillance, or direct tasks. Chinese summary: design or review skills from recurring scenarios; domain skills must first surface domain variables, pitfall knowledge, data strategy, and expert quality standards.
Install this skill globally with one command: `npx skillsauth add dbvc/skills scenario-skill-architect`. Works with Claude Code, Cursor, and Windsurf.
Build, critique, improve, or evaluate reusable skill packages from recurring scenarios. Do not turn isolated prompt snippets into full skills.
IR means Intermediate Representation / 中间表示: the stable middle layer between messy user language and executable skill design. It separates objects, states, events, evidence, hypotheses, constraints, risky boundaries, outputs, and evals.
V6 adds a key layer: Domain Substance / 领域实质. A skill package is not done merely because its structure is clean. Domain skills must extract domain variables, hidden failure modes, data-source policy, expert quality checks, worked examples, and domain-specific evals.
For every create, critique, improve, eval, or triage use, start with this fenced YAML block. Do not write prose before it.
```yaml
skill_architect_decision:
  mode: create | critique | improve | eval | triage
  route: full_skill | mini_skill | needs_clarification | domain_discovery | checklist | direct_answer | refuse_or_redesign | not_a_creation_request
  operation: ask_questions | ask_domain_questions | build_domain_content_contract | draft_package | critique_package | propose_patch_plan | patch_existing_package | design_runner_evals | design_human_rubric | run_lint | run_evals | provide_alternative
  hard_gates:
    repeatability: pass | fail | unknown | not_applicable
    stable_job: pass | fail | unknown | not_applicable
    evaluability: pass | fail | unknown | not_applicable
    safety_legitimacy: pass | fail | unknown | not_applicable
  domain_substance_gates:
    target_user_defined: pass | fail | unknown | not_applicable
    output_depth_defined: pass | fail | unknown | not_applicable
    domain_variables_identified: pass | fail | unknown | not_applicable
    data_source_policy_defined: pass | fail | unknown | not_applicable
    failure_knowledge_identified: pass | fail | unknown | not_applicable
    expert_quality_rubric_defined: pass | fail | unknown | not_applicable
    worked_example_available: pass | fail | unknown | not_applicable
  blocking_questions: []
  assumptions: []
  confidence: high | medium | low
  contract_self_check:
    mode_route_compatible: true | false
    operation_compatible: true | false
    hard_gates_applied: true | false
    domain_substance_gates_applied: true | false
    full_package_overbuilt: true | false
    eval_artifact_present: true | false
    eval_schema_runner_compatible: true | false | not_applicable
    patch_not_rebuild: true | false | not_applicable
```
The old decision field is intentionally removed. Use route for routing and operation for the action.
If the user says “create a skill” but the correct outcome is a direct answer, checklist, refusal, safer redesign, or domain discovery, do not blindly use full_skill.
Use mode: create only when the request is a candidate skill-building task:
- route: full_skill: hard gates pass, domain gates pass when applicable, and the request deserves a reusable package.
- route: mini_skill: the user explicitly wants a lightweight reusable wrapper and it is safe.
- route: needs_clarification: hard gates are unknown.
- route: domain_discovery: hard gates likely pass, but domain substance gates are unknown or weak.

Use mode: triage for one-off tasks, generic vague helpers, safety failures, or smaller alternatives.
Choose exactly one compatible pair.
| mode | allowed route | allowed operation | forbidden by default |
|---|---|---|---|
| create | full_skill, mini_skill, needs_clarification, domain_discovery | ask_questions, ask_domain_questions, build_domain_content_contract, draft_package | checklist, direct_answer, refuse_or_redesign, not_a_creation_request |
| critique | not_a_creation_request | critique_package, propose_patch_plan | full_skill, mini_skill, draft_package |
| improve | not_a_creation_request | patch_existing_package, propose_patch_plan, ask_questions, run_lint, run_evals | full_skill unless rebuild is explicitly justified |
| eval | not_a_creation_request | design_runner_evals, design_human_rubric, run_evals | full_skill, mini_skill |
| triage | mini_skill, checklist, direct_answer, refuse_or_redesign | ask_questions, provide_alternative | full_skill; use create only after gates pass |
If the pair is incompatible, fix the YAML before continuing.
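The matrix above can be checked mechanically before continuing. The following is a minimal sketch, not part of the skill's shipped scripts: the `ALLOWED` table and the `check_compatibility` helper are hypothetical names, and the sketch only encodes the "allowed" columns, so the conditional rule for `improve` (full_skill only when a rebuild is explicitly justified) still needs human judgment.

```python
# Hypothetical encoding of the "allowed route" and "allowed operation" columns.
ALLOWED = {
    "create": {
        "routes": {"full_skill", "mini_skill", "needs_clarification", "domain_discovery"},
        "operations": {"ask_questions", "ask_domain_questions",
                       "build_domain_content_contract", "draft_package"},
    },
    "critique": {
        "routes": {"not_a_creation_request"},
        "operations": {"critique_package", "propose_patch_plan"},
    },
    "improve": {
        "routes": {"not_a_creation_request"},
        "operations": {"patch_existing_package", "propose_patch_plan",
                       "ask_questions", "run_lint", "run_evals"},
    },
    "eval": {
        "routes": {"not_a_creation_request"},
        "operations": {"design_runner_evals", "design_human_rubric", "run_evals"},
    },
    "triage": {
        "routes": {"mini_skill", "checklist", "direct_answer", "refuse_or_redesign"},
        "operations": {"ask_questions", "provide_alternative"},
    },
}


def check_compatibility(mode, route, operation):
    """Return a list of problems; an empty list means the triple is allowed."""
    row = ALLOWED.get(mode)
    if row is None:
        return [f"unknown mode: {mode!r}"]
    problems = []
    if route not in row["routes"]:
        problems.append(f"route {route!r} is not allowed for mode {mode!r}")
    if operation not in row["operations"]:
        problems.append(f"operation {operation!r} is not allowed for mode {mode!r}")
    return problems
```

For example, `check_compatibility("critique", "full_skill", "draft_package")` reports two problems, which signals that the YAML block should be fixed before any further output.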
Before drafting a full skill package, all four hard gates must pass: repeatability, stable_job, evaluability, and safety_legitimacy.
If any hard gate fails, do not create a full skill package, even if the user says “turn this into a skill.” Use triage with direct_answer, checklist, mini_skill, or refuse_or_redesign.
If any hard gate is unknown but the request could be reusable, use mode: create, route: needs_clarification, operation: ask_questions, ask up to five blocking questions, and stop.
A domain/content skill creates or critiques artifacts whose quality depends on domain-specific variables, data sources, hidden pitfalls, expert judgment, or realistic examples. Examples: travel itineraries, news releases, repair diagnosis, relationship interpretation, sports prediction, investment framework, hiring rubric, architecture decision, social content, teaching material, incident analysis.
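For domain/content skills, full skill creation also requires the domain substance gates: target_user_defined, output_depth_defined, domain_variables_identified, data_source_policy_defined, failure_knowledge_identified, expert_quality_rubric_defined, and worked_example_available.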
If hard gates likely pass but domain substance gates are unknown or weak, use:
```yaml
mode: create
route: domain_discovery
operation: ask_domain_questions
```
Ask domain discovery questions and stop. Do not draft a full package. If the user explicitly asks for an assumptions-based draft, create only a labeled draft skeleton with domain_content_contract, missing domain knowledge, and validation tasks; do not present it as production-ready.
Route these away from full_skill:
- triage + mini_skill or checklist.
- triage + direct_answer.
- triage + checklist or create + needs_clarification.
- create + domain_discovery, not full_skill.
- triage + refuse_or_redesign.
- improve + patch_existing_package, not full rebuild by default.
- eval, not mini_skill.

When producing runner-compatible evals/evals.json, these are the only allowed kind values:
- positive
- negative
- near_miss
- failure_mode
- safety
Never use invented aliases as a kind. If you need a general failure case, use failure_mode. If the case is unsafe or privacy-invasive, use safety.
### create

Allowed routes: full_skill, mini_skill, needs_clarification, domain_discovery.
If route is needs_clarification, ask blocking questions and stop.
If route is domain_discovery, required sections after the YAML block:
If route is full_skill, required sections after the YAML block:
- SKILL.md, optional references/, scripts/, assets/, MCP/tool notes, evals/.
- SKILL.md and evals/evals.json.

If route is mini_skill, provide a smaller reusable wrapper with explicit limitations and at least one near-miss example.
### critique

Route must be not_a_creation_request.

Required sections:

- failure_mode or safety, captured-output scoring, content-quality evals.

Do not output a replacement package unless the user explicitly asks for a rewrite after critique.
### improve

Route must be not_a_creation_request.

Patch-first rule:

- rebuild_reason with why patch is insufficient, what is preserved, and what changes.

Required sections:

- not_run with manual checks.

If you output or edit an eval artifact, set eval_artifact_present: true and set eval_schema_runner_compatible based on the actual artifact.
### eval

Route must be not_a_creation_request.

First decide output type:

- design_runner_evals: produce runner-compatible evals/evals.json.
- design_human_rubric: produce human scoring rubric only.
- run_evals: run available validators and report results.

Never call a human rubric evals.json. If outputting runner JSON, copy the schema below; do not invent cases, string checks, non-boolean all_required, or alias kind values.
Minimal runner-compatible eval JSON:
```json
{
  "skill_name": "example-skill",
  "pass_threshold": 0.85,
  "evals": [
    {
      "id": "positive-primary-1",
      "kind": "positive",
      "prompt": "A realistic prompt that should use the skill.",
      "expected_behavior": "Use the skill workflow and output contract.",
      "checks": {
        "trigger": [{"type": "must_contain", "value": "expected marker", "required": true}],
        "process": [],
        "output": [],
        "safety": []
      },
      "pass_criteria": {"all_required": true, "min_score": 0.85}
    }
  ]
}
```
Required runner eval coverage: 2 positive, 1 negative, 1 near_miss, and 1 failure_mode or safety. Domain/content skills must add at least 2 content-quality checks that test domain variables and hidden failure modes.
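The coverage and kind rules above lend themselves to a quick pre-flight check. The sketch below is illustrative only and is not the behavior of scripts/run_skill_evals.py, whose internals are not shown here; `CANONICAL_KINDS` and `check_eval_coverage` are hypothetical names, and the input is assumed to be the already-parsed contents of evals/evals.json.

```python
from collections import Counter

# The five kind values listed in this document.
CANONICAL_KINDS = {"positive", "negative", "near_miss", "failure_mode", "safety"}


def check_eval_coverage(suite):
    """Return a list of problems found in a parsed evals.json dict."""
    problems = []
    kinds = Counter()
    for case in suite.get("evals", []):
        case_id = case.get("id", "<no id>")
        kind = case.get("kind")
        if kind in CANONICAL_KINDS:
            kinds[kind] += 1
        else:
            problems.append(f"{case_id}: non-canonical kind {kind!r}")
        # all_required must be a real boolean, not a string or number.
        all_required = case.get("pass_criteria", {}).get("all_required")
        if not isinstance(all_required, bool):
            problems.append(f"{case_id}: all_required must be a boolean")
    # Required coverage: 2 positive, 1 negative, 1 near_miss,
    # and 1 failure_mode or safety.
    if kinds["positive"] < 2:
        problems.append("need at least 2 positive cases")
    if kinds["negative"] < 1:
        problems.append("need at least 1 negative case")
    if kinds["near_miss"] < 1:
        problems.append("need at least 1 near_miss case")
    if kinds["failure_mode"] + kinds["safety"] < 1:
        problems.append("need at least 1 failure_mode or safety case")
    return problems
```

The minimal JSON example above, containing a single positive case, would fail this check on coverage alone. The extra content-quality requirement for domain/content skills is not something this sketch can verify.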
### triage

Use when the request is one-off, underspecified, unsafe, or better served by a checklist/direct answer.
Required sections:
Use mode: triage when route is checklist, direct_answer, or refuse_or_redesign, even if the user used the word “skill”.
For domain/content skills, include this contract before finalizing the package:
```yaml
domain_content_contract:
  target_user: ""
  artifact_type: ""
  output_depth: "quick | standard | deep | operational"
  required_variables: []
  hidden_failure_modes: []
  expert_quality_checks: []
  data_source_policy:
    realtime_required: []
    user_provided_required: []
    can_estimate_with_label: []
    must_not_fabricate: []
  uncertainty_policy: []
  must_not_omit: []
  worked_examples_needed: []
  domain_eval_cases: []
```
A domain contract is weak if required_variables are generic, hidden_failure_modes are empty, data-source policy ignores real-time facts, or eval cases only test structure rather than usefulness.
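Part of this weakness test can be automated as a first pass. The sketch below is a rough illustration under stated assumptions: `contract_weaknesses` is a hypothetical helper, the contract is assumed to be already parsed into a dict, and only emptiness is detected. Whether variables are generic, whether real-time facts are handled, and whether eval cases test usefulness rather than structure still require human review.

```python
# Sub-keys of data_source_policy as listed in the contract above.
POLICY_KEYS = (
    "realtime_required",
    "user_provided_required",
    "can_estimate_with_label",
    "must_not_fabricate",
)


def contract_weaknesses(contract):
    """Return first-pass weakness flags for a parsed domain_content_contract."""
    weaknesses = []
    if not contract.get("required_variables"):
        weaknesses.append("required_variables is empty")
    if not contract.get("hidden_failure_modes"):
        weaknesses.append("hidden_failure_modes is empty")
    policy = contract.get("data_source_policy") or {}
    # An entirely empty policy cannot say anything about real-time facts.
    if not any(policy.get(key) for key in POLICY_KEYS):
        weaknesses.append("data_source_policy is empty")
    if not contract.get("domain_eval_cases"):
        weaknesses.append("domain_eval_cases is empty")
    return weaknesses
```

An empty list from this helper means only that nothing is missing outright; it does not mean the contract is strong.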
Use references/mode-golden-examples.md when more detail is needed. Minimal patterns:
- mode: create, route: needs_clarification, operation: ask_questions, unknown hard gates, no package.
- mode: create, route: domain_discovery, operation: ask_domain_questions, unknown domain gates, no package.
- mode: triage, route: checklist, operation: ask_questions, no full package.
- mode: critique, route: not_a_creation_request, operation: critique_package, gates not_applicable.
- mode: improve, route: not_a_creation_request, operation: patch_existing_package, patch_not_rebuild: true.
- mode: eval, route: not_a_creation_request, operation: design_runner_evals, eval_artifact_present: true, eval_schema_runner_compatible: true.
- mode: triage, route: mini_skill, repeatability fail, no full package.
- mode: triage, route: refuse_or_redesign, safety gate fail, safer redesign only.

Use the compatibility matrix. If route would be checklist, direct_answer, or refuse_or_redesign, mode is normally triage. If the request is a domain/content skill with weak domain gates, route domain_discovery.
Use hard gates for all full skill creation. Use domain substance gates for domain/content skills. Do not use point-counting. Soft signals only matter after required gates pass.
Decision rule:
```text
All hard gates pass + domain gates pass when applicable + >=3 soft signals -> create/full_skill
All hard gates pass + 1-2 soft signals -> create/mini_skill or triage/checklist
Any hard gate fail -> triage/direct_answer, triage/checklist, triage/mini_skill, or triage/refuse_or_redesign
Any hard gate unknown + plausible reusable skill -> create/needs_clarification + ask_questions
Domain skill + domain substance unknown -> create/domain_discovery + ask_domain_questions
Safety gate fail -> triage/refuse_or_redesign and offer safer design
```
Soft signals: recognizable input shapes, recurring failure modes, reusable workflow, stable output contract, meaningful benefit over checklist, scripts/templates/MCP/references reduce error, mistakes are costly enough to justify a skill.
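The decision rule can be read as an ordered series of checks. The sketch below is one possible reading, not the architect's actual implementation: `decide` is a hypothetical function, the precedence among rules is an assumption, and any hard-gate status other than pass or fail (including a missing gate) is treated as unknown. Where the rule offers several outcomes, all candidates are returned and the final choice remains a judgment call.

```python
HARD_GATES = ("repeatability", "stable_job", "evaluability", "safety_legitimacy")


def decide(hard_gates, soft_signals, domain_gates=None):
    """Return candidate 'mode/route' strings for the given gate statuses.

    hard_gates: dict of gate name -> status string.
    soft_signals: number of soft signals observed.
    domain_gates: dict of domain substance gate statuses, or None when the
        skill is not a domain/content skill.
    """
    # A safety failure is checked first (assumed precedence).
    if hard_gates.get("safety_legitimacy") == "fail":
        return ["triage/refuse_or_redesign"]
    statuses = [hard_gates.get(name, "unknown") for name in HARD_GATES]
    if "fail" in statuses:
        return ["triage/direct_answer", "triage/checklist",
                "triage/mini_skill", "triage/refuse_or_redesign"]
    if any(status != "pass" for status in statuses):
        return ["create/needs_clarification"]
    # Domain gates count as satisfied when they pass or do not apply.
    if domain_gates is not None and any(
        status not in ("pass", "not_applicable") for status in domain_gates.values()
    ):
        return ["create/domain_discovery"]
    if soft_signals >= 3:
        return ["create/full_skill"]
    if soft_signals >= 1:
        return ["create/mini_skill", "triage/checklist"]
    # The rule as written does not cover zero soft signals.
    return []
```

Soft signals are only consulted after every gate check has passed, which matches the statement that soft signals matter only once required gates pass.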
Use references/scenario-to-skill.md and references/domain-discovery.md if unclear.
Scenario card:
```text
Scenario name:
Primary user:
Context of use:
Real job to be done:
Typical inputs:
Expected outputs:
Recurring failure modes:
Evidence sources:
Hard constraints:
Non-goals:
Success criteria:
```
Domain discovery card for domain/content skills:
```text
Target user:
Artifact type:
Output depth:
Required variables:
Hidden failure modes:
Expert quality checks:
Data-source policy:
Uncertainty policy:
Worked examples:
Domain eval cases:
```
IR:
```text
Objects:
States or results:
Events or actions:
Evidence:
Hypotheses:
Constraints:
Risky boundaries:
Output contract:
Reasoning mode: deductive / inductive / abductive / causal / procedural / creative / hybrid
```
Prevent type errors: status label as root cause, quote as motive, preference as hard constraint, correlation as causation, desired outcome as evidence, one example as the whole workflow, formatted output as useful output.
For each important rule:
Domain rule -> meta rule -> root principle -> executable workflow step -> eval check
Remove principles that do not change workflow, output, risk handling, domain content, or evaluation.
- SKILL.md: short, low-dependency workflow.
- SKILL.md + references/: domain knowledge, rubrics, examples, long templates.
- scripts/: deterministic parsing, validation, transformation, scoring, or fragile operations.
- evals/: required for this architect's done state.

When tools are available, first generate a compliant skeleton:

```bash
python scripts/create_skill_skeleton.py <skill-name> <output-dir> --description "<frontmatter description>" --domain
```
Use --domain for domain/content skills. Then edit the skeleton. Do not freehand a package when the skeleton can be used.
A serious package must contain:
```text
skill-name/
  SKILL.md
  evals/evals.json
```
Generated SKILL.md must include purpose, when to use, when not to use, hard gates or required inputs, domain content contract when relevant, IR summary, workflow, evidence/confidence policy, output contract, failure/escalation rules, references/scripts map, and eval plan.
Generated evals/evals.json must be compatible with scripts/run_skill_evals.py and use only canonical eval kinds.
When file access is available, run:
```bash
python scripts/lint_skill_package.py <path-to-skill>
python scripts/run_skill_evals.py <path-to-skill>/evals/evals.json --validate-only
python scripts/check_architect_output.py <captured-output.md>
```
If captured outputs exist, score them:
```bash
python scripts/run_skill_evals.py <path-to-skill>/evals/evals.json --outputs-dir <captured-outputs-dir>
```
If any gate fails, revise before presenting as complete. If tools are unavailable, state not_run and list manual checks.
When improving an existing skill, compare baseline without the skill or old skill, new skill, and new skill with one major principle removed. If removing a principle does not create a distinct failure, the principle may be redundant or vague.
For domain/content skills, also ablate hidden failure knowledge and data-source policy. If output quality does not degrade, the domain substance contract is too weak.
Use the user's language for explanations and generated prose unless the target repo standard requires English. Keep name lowercase ASCII with hyphens. For bilingual packages, prefer English identifiers plus bilingual section labels, or place full Chinese guidance in references/zh-CN-guide.md.