.claude/skills/workflow/ai-regression-testing/SKILL.md
Test specifically for AI-introduced regressions that repeat without tests
npx skillsauth add andrem-sec/psc-comet ai-regression-testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Write tests for bugs that were found, not for code that works.
AI re-introduces the same bug repeatedly without tests. Real example from ECC research: the same notification_settings bug was introduced 4 times in a row before a regression test was added.
Why this happens:
The solution: regression tests. Once a bug appears, test it. The test prevents recurrence.
Symptom: Works in development, fails in production.
Cause: Hardcoded paths, environment assumptions, missing environment variable checks.
Example:
# AI writes this in sandbox
data = pd.read_csv('/Users/claude/data/users.csv')
# Fails in production (path doesn't exist)
Test to write:
def test_data_path_uses_config():
"""Regression: ensure data paths come from config, not hardcoded"""
from config import get_data_path
path = get_data_path('users')
assert not path.startswith('/Users/')
assert os.path.exists(path)
Symptom: Query returns all columns when only specific columns needed.
Cause: AI defaults to SELECT * for simplicity.
Example:
# AI writes this
query = "SELECT * FROM users WHERE active = true"
# Should be (performance issue with 50+ columns)
query = "SELECT id, email, name FROM users WHERE active = true"
Test to write:
def test_user_query_selects_only_required_columns():
"""Regression: ensure query doesn't SELECT *"""
query = get_active_users_query()
assert 'SELECT *' not in query
assert 'id' in query and 'email' in query
Symptom: Errors in one operation affect subsequent operations.
Cause: Shared state not cleaned up after exceptions.
Example:
# AI writes this
def process_batch(items):
for item in items:
cache[item.id] = item # Sets cache
process(item) # May raise exception
cache.clear() # Never reached if exception raised
Test to write:
def test_cache_cleared_even_on_exception():
"""Regression: ensure cache doesn't leak on errors"""
items = [valid_item, invalid_item, valid_item]
with pytest.raises(ProcessError):
process_batch(items)
assert len(cache) == 0 # Cache must be empty
Symptom: UI shows success but backend operation failed.
Cause: UI updated before async operation completes.
Example:
// AI writes this
function deleteUser(id) {
users = users.filter(u => u.id !== id) // Update UI immediately
api.delete(`/users/${id}`) // May fail, UI not reverted
}
Test to write:
test('user remains in list if delete API fails', async () => {
// Regression: ensure rollback on API failure
api.delete.mockRejectedValue(new Error('Network error'))
await deleteUser(123)
expect(users.find(u => u.id === 123)).toBeDefined()
})
Wrong approach: "Let's add tests for the auth module."
Right approach: "This auth bug happened. Let's test for it."
After every bug fix: If you fixed it, test it.
After every "this worked before" incident: Regression detected. Add test.
After every repeated mistake: If the same bug appears twice, add test immediately.
During code review: Reviewer sees a pattern. "Have we tested for error case X?"
Testing only happy paths: AI regressions occur in error cases, edge cases, race conditions. Test those.
Writing tests after the fact: Test while the bug is fresh. Delay = forgotten context.
Generic tests for specific bugs: A test for "auth works" doesn't prevent the specific bug "logout doesn't clear session cookie". Be specific.
Skipping the "verify failure" step: Always confirm the test fails before the fix. Otherwise you might have written a test that would never catch the bug.
This is the /fix pipeline. AI regression testing is step 3.
Use descriptive names that explain the regression:
# Good: Explains what was wrong
def test_logout_clears_session_cookie():
"""Regression: logout was leaving session cookie set"""
# Bad: Generic, doesn't explain the bug
def test_logout():
"""Test logout functionality"""
After adding a regression test, log to context/learnings.md:
[2026-03-28] pattern | AI regression: SELECT * query on users table caused performance issue. Added test_user_query_selects_only_required_columns() to prevent recurrence. Pattern: AI defaults to SELECT * for simplicity without considering column count impact.
This creates institutional memory. Future sessions can reference these patterns.
data-ai
Parallel agent swarm — decomposes work into independent units, spawns isolated workers, tracks PRs via fan-in
testing
Audit animations and transitions for motion accessibility, performance safety, and design intent. Enforces prefers-reduced-motion compliance and blocks layout-triggering transitions.
development
Framework for decomposing agent-driven tasks into independently verifiable units
development
Framework for designing quality agents with proper action space and contracts