generated/claude/skills/bdd/SKILL.md
Behavior-Driven Development with Gherkin specifications and black-box testing. Use when working with BDD projects, writing feature files, implementing step definitions, or designing acceptance tests around observable behaviors. Triggers on: 'use bdd mode', 'bdd', 'behavior driven', 'gherkin', 'feature file', 'scenario', 'step definitions', 'acceptance test', 'given when then', 'cucumber', 'godog', 'behave', 'specflow'. Full access mode - can write feature files, step definitions, and tests.
npx skillsauth add mcouthon/agents bdd
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Write executable specifications that describe what the system does, not how it works.
"Scenarios are not tests. They are executable specifications — living documentation of how the system behaves, written in the language of the business."
Proactively load this skill when any of these indicators are present:
| Indicator | What to Look For |
| ------------------ | ---------------------------------------------------------------------------- |
| Feature files | `*.feature` files anywhere in the project |
| Features directory | `features/` or `specs/` directory at project root |
| Step definitions | `*_steps.*`, `*_test.go` with godog imports, `steps/*.py` |
| BDD config | `cucumber.js`, `cucumber.yml`, `.specflow/`, `behave.ini`, godog in `go.mod` |
| Test runner config | BDD-related entries in test configuration or CI pipeline |
When detected, apply all guidance below to feature files, step definitions, and related test code.
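The indicator table can be approximated with a small file-system scan. The following is a minimal sketch, not part of the skill itself: the function name `detect_bdd_indicators` and the label strings are hypothetical, and it covers only the file-based rows of the table (it does not inspect Go test files for godog imports or parse CI configuration).

```python
from pathlib import Path

# Glob patterns loosely mirroring the indicator table; labels are illustrative.
INDICATOR_PATTERNS = {
    "feature files": ["**/*.feature"],
    "features directory": ["features", "specs"],
    "step definitions": ["**/*_steps.*", "**/steps/*.py"],
    "bdd config": ["cucumber.js", "cucumber.yml", ".specflow", "behave.ini"],
}


def detect_bdd_indicators(root: str) -> dict[str, list[str]]:
    """Return the indicators found under root, keyed by indicator label."""
    base = Path(root)
    found: dict[str, list[str]] = {}
    for label, patterns in INDICATOR_PATTERNS.items():
        # A set removes duplicates when one file matches several patterns.
        hits = sorted(
            {
                path.relative_to(base).as_posix()
                for pattern in patterns
                for path in base.glob(pattern)
            }
        )
        if hits:
            found[label] = hits
    # godog shows up as a Go module dependency rather than a config file.
    go_mod = base / "go.mod"
    if go_mod.is_file() and "godog" in go_mod.read_text(encoding="utf-8"):
        found.setdefault("bdd config", []).append("go.mod (godog)")
    return found
```

An empty result suggests the project does not use BDD tooling; any non-empty result is a hint to apply the guidance in this document.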
- Keep feature files focused: `login.feature`, `checkout.feature`, not `everything.feature`
- Naming: lowercase hyphenated file names (`password-reset.feature`), Title Case for Feature/Scenario titles
# Good: focused feature, outcome-oriented scenarios
Feature: Password Reset
  Users can reset forgotten passwords via email

  Scenario: Successful password reset
    Given a registered user with email "[email protected]"
    When they request a password reset
    Then a reset link is sent to "[email protected]"

  Scenario: Reset request for unknown email
    Given no user exists with email "[email protected]"
    When they request a password reset for "[email protected]"
    Then no email is sent
    And no error is revealed to the requester
Describe what happens, not how the user clicks through the UI:
# BAD — imperative (UI mechanics)
Scenario: User logs in
  Given I am on the login page
  When I fill in "username" with "alice"
  And I fill in "password" with "secret123"
  And I click the "Login" button
  And I wait for the dashboard to load
  Then I should see "Welcome, Alice"
# GOOD — declarative (behavior)
Scenario: Successful login
  Given a registered user "alice"
  When alice logs in with valid credentials
  Then alice sees her dashboard
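The imperative/declarative distinction can be checked mechanically to a first approximation. The sketch below is an assumption-laden illustration, not a real linter: `find_imperative_steps` and the phrase list are hypothetical, and a project would tune the patterns to its own UI vocabulary.

```python
import re

# Hypothetical phrase list for UI mechanics; extend to fit the project.
IMPERATIVE_PATTERNS = [
    r"\bclick(s|ed)?\b",
    r"\bfill(s)? in\b",
    r"\bwait(s)? for\b",
    r"\bon the \w+ page\b",
]

STEP_PREFIXES = ("Given ", "When ", "Then ", "And ", "But ")


def find_imperative_steps(feature_text: str) -> list[str]:
    """Return step lines that mention UI mechanics instead of behavior."""
    flagged = []
    for raw_line in feature_text.splitlines():
        line = raw_line.strip()
        if not line.startswith(STEP_PREFIXES):
            continue  # skip titles, comments, tables, and blank lines
        if any(re.search(p, line, re.IGNORECASE) for p in IMPERATIVE_PATTERNS):
            flagged.append(line)
    return flagged
```

Run against the two login scenarios above, this flags five of the six steps in the imperative version and none in the declarative one. A phrase match is only a hint; a reviewer still decides whether the step describes behavior.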
| Keyword | Meaning | Rule |
| ----------- | --------------------------------------------- | ------------------------------------------------- |
| Given | Precondition: system state before the action | Set up state, never assert |
| When | Action: the single thing being tested | One per scenario (use And for multi-step actions) |
| Then | Observable outcome: what changed | Assert only observable results |
| And/But | Continuation of the previous keyword | Same semantics as the keyword it follows |
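The "one When per scenario" rule lends itself to a simple check. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the parsing is line-based rather than a real Gherkin parser, so it ignores tags, doc strings, and localized keywords.

```python
def count_when_steps(feature_text: str) -> dict[str, int]:
    """Count literal When steps per scenario title (And/But are not counted)."""
    counts: dict[str, int] = {}
    current = None
    for raw_line in feature_text.splitlines():
        line = raw_line.strip()
        if line.startswith(("Scenario:", "Scenario Outline:")):
            current = line.split(":", 1)[1].strip()
            counts[current] = 0
        elif current is not None and line.startswith("When "):
            counts[current] += 1
    return counts


def scenarios_without_single_when(feature_text: str) -> list[str]:
    """Return titles of scenarios that do not have exactly one When step."""
    return [
        title
        for title, count in count_when_steps(feature_text).items()
        if count != 1
    ]
```

Because multi-step actions continue with And, a scenario with a second literal When is usually a sign that two behaviors are being tested at once.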
Use Background when most scenarios share the same preconditions:
Feature: Shopping Cart

  Background:
    Given a customer with an active account
    And the product catalog is loaded

  Scenario: Add item to cart
    When the customer adds "Widget" to their cart
    Then the cart contains 1 item

  Scenario: Remove item from cart
    Given the customer has "Widget" in their cart
    When they remove "Widget"
    Then the cart is empty
Use outlines when the same behavior applies across different inputs — never duplicate scenarios:
Scenario Outline: Shipping cost by region
  Given a package weighing <weight> kg
  When shipped to <region>
  Then the shipping cost is <cost>

  Examples:
    | weight | region   | cost   |
    | 1      | domestic | $5.00  |
    | 1      | europe   | $15.00 |
    | 5      | domestic | $12.00 |
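To make the outline mechanism concrete, the sketch below shows roughly how a runner turns one outline plus an Examples table into concrete scenarios by substituting each `<placeholder>`. The functions `parse_examples_table` and `expand_outline` are hypothetical illustrations, not the API of any particular BDD tool.

```python
def parse_examples_table(table_text: str) -> tuple[list[str], list[list[str]]]:
    """Split a pipe-delimited Examples table into a header and data rows."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table_text.strip().splitlines()
    ]
    return rows[0], rows[1:]


def expand_outline(
    step_templates: list[str], header: list[str], rows: list[list[str]]
) -> list[list[str]]:
    """Produce one concrete list of steps per Examples row."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        steps = []
        for template in step_templates:
            step = template
            for name, value in values.items():
                step = step.replace(f"<{name}>", value)
            steps.append(step)
        scenarios.append(steps)
    return scenarios
```

With the shipping outline above, the three Examples rows yield three scenarios, the first reading "Given a package weighing 1 kg", "When shipped to domestic", "Then the shipping cost is $5.00". Adding a case means adding a row, not another scenario.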
BDD scenarios test through public interfaces only — the system is a black box:
# BAD — reaches into implementation
Then the database contains a row in "users" with email "[email protected]"
And the password hash starts with "$2b$"
# GOOD — asserts observable behavior
Then alice can log in with her new password
And a welcome email is received at "[email protected]"
Step definitions are thin glue code — they translate Gherkin into application calls:
# GOOD — thin glue, delegates to application code
from behave import then, when
@when('the customer adds "{item}" to their cart')
def add_to_cart(context, item):
    context.response = context.client.post("/cart/items", json={"item": item})
# behave's default matcher treats "item(s)" as literal text, so register both forms
@then('the cart contains {count:d} item')
@then('the cart contains {count:d} items')
def check_cart_count(context, count):
    cart = context.client.get("/cart").json()
    assert len(cart["items"]) == count
# BAD — logic and DB access in step definition
@when('the customer adds "{item}" to their cart')
def add_to_cart(context, item):
    product = db.query("SELECT * FROM products WHERE name = %s", (item,))
    db.execute("INSERT INTO cart_items ...") # Direct DB manipulation!
    context.cart_count = db.query("SELECT COUNT(*) FROM cart_items ...")[0]
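The thin-glue idea can be exercised without a BDD runner installed. The sketch below is an illustration under assumptions: `InMemoryCartClient` is a hypothetical stand-in for the application's public interface, the step functions omit the behave decorators so the block runs on the standard library alone, and `run_scenario` is a toy driver rather than how behave executes steps.

```python
from types import SimpleNamespace


class InMemoryCartClient:
    """Hypothetical stand-in for the application's public cart API."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def add_item(self, item: str) -> None:
        self._items.append(item)

    def remove_item(self, item: str) -> None:
        self._items.remove(item)

    def list_items(self) -> list[str]:
        return list(self._items)


# In a behave project these functions would carry @when/@then decorators.
def step_add_to_cart(context, item):
    context.client.add_item(item)  # delegate, no logic in the step


def step_remove_from_cart(context, item):
    context.client.remove_item(item)


def step_cart_contains(context, count):
    assert len(context.client.list_items()) == count


def run_scenario(steps):
    """Run (function, args) pairs against a fresh context object."""
    context = SimpleNamespace(client=InMemoryCartClient())
    for step, args in steps:
        step(context, *args)
    return context
```

Each step is one line and touches only the context object, so scenarios stay independent: every call to `run_scenario` starts from a fresh client, and no step reads module-level state.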
| Anti-Pattern | Problem | Better Approach |
| --- | --- | --- |
| Incidental details: "alice" with password "Str0ng!" at "9:30 AM" | Noise hides the behavior under test | Only include details relevant to the outcome |
| Testing implementation: `Then UserService.validate() returns true` | Coupled to code structure, breaks on refactor | Assert observable outcomes: "Then the user is logged in" |
| Coupled step defs: steps call each other or share mutable global state | Fragile chain; one change breaks many scenarios | Independent steps sharing state through a context object |
| Scenario as test script: 15 Given/When/Then steps in sequence | Unreadable, tests multiple behaviors at once | One behavior per scenario, 3–7 steps maximum |
| UI-coupled steps: "click button", "fill field", "wait for element" | Brittle, breaks on any UI change | Declarative: "When the user submits the form" |
| Copy-paste scenarios: same steps with different data | Maintenance burden, inconsistent updates | Scenario Outlines with Examples tables |
| Missing Why: Feature with no description, no business context | Can't tell if the feature is still needed | Add 1-line description under Feature explaining business value |
Before committing feature files and step definitions:
- [ ] Feature files are readable by non-developers
- [ ] Each scenario tests exactly one behavior
- [ ] Steps are declarative (no UI mechanics or implementation details)
- [ ] No implementation coupling (scenarios survive internal refactors)
- [ ] Step definitions are thin glue (1–5 lines, delegate to app code)
- [ ] Shared state flows through context/world object, not globals
- [ ] Scenario Outlines used for data variations (no copy-paste scenarios)
- [ ] Feature descriptions explain the business value
| Excuse | Reality | Required Action |
| --- | --- | --- |
| "We can add scenarios later" | Missing scenarios are missing requirements | Write scenarios before implementation; they ARE the spec |
| "This is too simple for BDD" | Simple behaviors still need documented acceptance criteria | Write the feature file even if steps are trivial |
| "I'll just verify through the DB" | DB assertions couple tests to implementation | Assert through the public API/UI (black-box only) |
| "One big scenario covers more" | Long scenarios test multiple behaviors and hide failures | Split into focused scenarios, one outcome each |
| "Imperative steps are more precise" | They couple to UI/implementation and break on refactor | Declarative steps describe intent, not mechanics |
| "Step reuse isn't worth the effort" | Duplicated steps diverge and create maintenance burden | Parameterize and share steps across features from day one |