andrem-sec/ai-regression-testing

Name: ai-regression-testing
Author: andrem-sec

.claude/skills/workflow/ai-regression-testing/SKILL.md

npx skillsauth add andrem-sec/psc-comet ai-regression-testing

AI Regression Testing Skill

Write tests for bugs that were found, not for code that works.

What Claude Gets Wrong Without This Skill

Without regression tests, AI re-introduces the same bug repeatedly. A real example from ECC research: the same notification_settings bug was introduced 4 times in a row before a regression test was added.

Why this happens:

AI has no memory of past bugs across sessions
AI optimizes for "looks correct" not "is correct"
Similar contexts trigger similar mistakes
Pattern recognition works both ways (reinforces both correct and incorrect patterns)

The solution: regression tests. Once a bug appears, test it. The test prevents recurrence.

The Four Primary AI Regression Patterns

Pattern 1: Sandbox/Production Path Mismatch (most common)

Symptom: Works in development, fails in production.

Cause: Hardcoded paths, environment assumptions, missing environment variable checks.

Example:

# AI writes this in sandbox
data = pd.read_csv('/Users/claude/data/users.csv')

# Fails in production (path doesn't exist)

Test to write:

import os

def test_data_path_uses_config():
    """Regression: ensure data paths come from config, not hardcoded"""
    from config import get_data_path
    path = get_data_path('users')
    assert not path.startswith('/Users/')  # no developer-machine home directory
    assert os.path.exists(path)
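
The test above imports get_data_path from a config module that the skill does not show. A minimal sketch of what such a helper might look like, assuming the data directory comes from an environment variable. The DATA_DIR variable name, the "data" default, and the .csv suffix are assumptions made for illustration, not part of the original skill:

```python
import os
from pathlib import Path


def get_data_path(name: str) -> str:
    """Resolve a dataset path from the environment instead of hardcoding it.

    DATA_DIR is a hypothetical variable name; when it is unset, the path
    falls back to a "data" directory relative to the working directory.
    """
    base = Path(os.environ.get("DATA_DIR", "data"))
    return str(base / f"{name}.csv")
```

With a helper like this, the sandbox and production environments differ only in the value of DATA_DIR, so the same code path runs in both.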

Pattern 2: SELECT Clause Omission

Symptom: Query returns all columns when only specific columns are needed.

Cause: AI defaults to SELECT * for simplicity.

Example:

# AI writes this
query = "SELECT * FROM users WHERE active = true"

# Should be (performance issue with 50+ columns)
query = "SELECT id, email, name FROM users WHERE active = true"

Test to write:

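def test_user_query_selects_only_required_columns():
    """Regression: ensure query doesn't SELECT *"""
    query = get_active_users_query()
    # Normalize case and whitespace so "select  *" is caught as well
    normalized = ' '.join(query.upper().split())
    assert 'SELECT *' not in normalized
    assert 'ID' in normalized and 'EMAIL' in normalized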
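
One way to make this test pass is to build the query from an explicit column list. The sketch below assumes get_active_users_query is a plain function returning a SQL string, which the skill does not confirm. The column names are taken from the example above, and REQUIRED_USER_COLUMNS is a hypothetical name:

```python
# Hypothetical constant: the columns the caller actually needs.
REQUIRED_USER_COLUMNS = ("id", "email", "name")


def get_active_users_query() -> str:
    """Build the active-users query from an explicit column list."""
    columns = ", ".join(REQUIRED_USER_COLUMNS)
    return f"SELECT {columns} FROM users WHERE active = true"
```

Keeping the column list in one place means a new column has to be added deliberately rather than arriving through SELECT *.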

Pattern 3: Error State Leakage

Symptom: Errors in one operation affect subsequent operations.

Cause: Shared state not cleaned up after exceptions.

Example:

# AI writes this
def process_batch(items):
    for item in items:
        cache[item.id] = item  # Sets cache
        process(item)  # May raise exception
    cache.clear()  # Never reached if exception raised

Test to write:

import pytest

def test_cache_cleared_even_on_exception():
    """Regression: ensure cache doesn't leak on errors"""
    # valid_item, invalid_item, cache, and ProcessError come from the code under test
    items = [valid_item, invalid_item, valid_item]
    with pytest.raises(ProcessError):
        process_batch(items)
    assert len(cache) == 0  # Cache must be empty
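
The fix this test guards is usually a try/finally around the loop. A minimal self-contained sketch follows. Items are modeled as plain dicts rather than objects with an id attribute, and ProcessError, process, and the "valid" flag are stand-ins invented for illustration:

```python
cache = {}


class ProcessError(Exception):
    """Stand-in for the real processing error type."""


def process(item):
    # Hypothetical processing step: rejects items flagged as invalid.
    if not item.get("valid", True):
        raise ProcessError(f"cannot process item {item['id']}")


def process_batch(items):
    try:
        for item in items:
            cache[item["id"]] = item  # populate shared state
            process(item)             # may raise ProcessError
    finally:
        cache.clear()  # runs whether or not process() raised
```

The exception still propagates to the caller, so pytest.raises(ProcessError) in the test keeps working. Only the cleanup is now guaranteed.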

Pattern 4: Optimistic Update Without Rollback

Symptom: UI shows success but backend operation failed.

Cause: UI updated before async operation completes.

Example:

// AI writes this
function deleteUser(id) {
    users = users.filter(u => u.id !== id)  // Update UI immediately
    api.delete(`/users/${id}`)  // May fail, UI not reverted
}

Test to write:

test('user remains in list if delete API fails', async () => {
    // Regression: ensure rollback on API failure
    api.delete.mockRejectedValue(new Error('Network error'))
    await deleteUser(123)
    expect(users.find(u => u.id === 123)).toBeDefined()
})
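
The rollback this test expects can be sketched as follows. The sketch is written in Python rather than JavaScript, but the idea is the same: snapshot the list, apply the optimistic update, and restore the snapshot if the backend call fails. UserStore, delete_remote, and failing_delete are hypothetical names, and swallowing the error and returning False is one design choice among several (re-raising after the rollback is another):

```python
import asyncio


class UserStore:
    """Holds the user list shown in the UI (hypothetical stand-in for UI state)."""

    def __init__(self, users, delete_remote):
        self.users = list(users)
        self._delete_remote = delete_remote  # async callable that may raise

    async def delete_user(self, user_id):
        snapshot = list(self.users)  # copy to restore on failure
        self.users = [u for u in self.users if u["id"] != user_id]  # optimistic update
        try:
            await self._delete_remote(user_id)
        except Exception:
            self.users = snapshot  # roll back so the UI matches the backend
            return False
        return True


async def failing_delete(user_id):
    # Simulates the backend call failing, like mockRejectedValue in the test above.
    raise ConnectionError("Network error")


store = UserStore([{"id": 123}, {"id": 456}], failing_delete)
deleted = asyncio.run(store.delete_user(123))
```

After the failed call, deleted is False and user 123 is still in store.users, which is the behavior the regression test asserts.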

Strategy: Test Bugs, Not Features

Wrong approach: "Let's add tests for the auth module."

Leads to testing happy paths that already work
Misses edge cases where bugs actually occur
Low ROI (writing tests for working code)

Right approach: "This auth bug happened. Let's test for it."

Directly targets actual failure modes
High ROI (prevents specific regressions)
Documents the bug in executable form

When to Write AI Regression Tests

After every bug fix: If you fixed it, test it.

After every "this worked before" incident: Regression detected. Add test.

After every repeated mistake: If the same bug appears twice, add test immediately.

During code review: Reviewer sees a pattern. "Have we tested for error case X?"

Anti-Patterns

Testing only happy paths: AI regressions occur in error cases, edge cases, race conditions. Test those.

Writing tests long after the fix: Write the test while the bug is fresh. Delay means forgotten context.

Generic tests for specific bugs: A test for "auth works" doesn't prevent the specific bug "logout doesn't clear session cookie". Be specific.

Skipping the "verify failure" step: Always confirm the test fails before the fix. Otherwise you might have written a test that would never catch the bug.

Integration with Fix Workflow

1. Bug reported: User reports issue or test fails
2. Reproduce: Confirm the bug exists
3. Write regression test: Test that fails against current code
4. Fix the bug: Modify code to pass the test
5. Verify: Test now passes, all other tests still pass
6. Document: Add entry to learnings.md noting the pattern

This is the /fix pipeline. AI regression testing is step 3.
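
The "fails before, passes after" part of this workflow can be checked mechanically by running one regression check against both versions of the code. A minimal sketch using the logout example from this skill; Session, buggy_logout, fixed_logout, and the fails helper are hypothetical stand-ins, not part of any real codebase:

```python
class Session:
    def __init__(self):
        self.logged_in = True
        self.cookies = {"session": "abc123"}


def buggy_logout(session):
    # Bug: marks the user as logged out but leaves the session cookie set.
    session.logged_in = False


def fixed_logout(session):
    session.logged_in = False
    session.cookies.pop("session", None)


def check_logout_clears_session_cookie(logout):
    """Regression check, parameterized on the implementation under test."""
    session = Session()
    logout(session)
    assert "session" not in session.cookies


def fails(check, implementation):
    """Return True if the check raises AssertionError for this implementation."""
    try:
        check(implementation)
    except AssertionError:
        return True
    return False
```

If fails(...) returns False for the buggy version, the test would never have caught the bug and needs to be rewritten before the fix is applied.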

Test Naming Convention

Use descriptive names that explain the regression:

# Good: Explains what was wrong
def test_logout_clears_session_cookie():
    """Regression: logout was leaving session cookie set"""

# Bad: Generic, doesn't explain the bug
def test_logout():
    """Test logout functionality"""

Documenting in Learnings

After adding a regression test, log to context/learnings.md:

[2026-03-28] pattern | AI regression: SELECT * query on users table caused performance issue. Added test_user_query_selects_only_required_columns() to prevent recurrence. Pattern: AI defaults to SELECT * for simplicity without considering column count impact.

This creates institutional memory. Future sessions can reference these patterns.
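
Appending entries in a consistent format can be scripted. A minimal sketch that mirrors the "[date] category | message" layout of the entry above; the function names and the default path argument are assumptions for illustration:

```python
from datetime import date
from pathlib import Path


def format_learning(entry_date: date, category: str, message: str) -> str:
    """Format one entry as "[YYYY-MM-DD] category | message"."""
    return f"[{entry_date.isoformat()}] {category} | {message}"


def log_learning(message, category="pattern", path="context/learnings.md", entry_date=None):
    """Append one entry to the learnings file, creating the directory if needed."""
    line = format_learning(entry_date or date.today(), category, message)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Opening the file in append mode keeps earlier entries intact, so the log grows by one line per regression.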

Mandatory Checklist

Verify the regression test targets a specific bug that actually occurred (not hypothetical)
Verify the test fails against the buggy code before the fix
Verify the test passes after the fix is applied
Verify the test name clearly describes what regression it prevents
Verify the bug pattern matches one of the four primary patterns (or documents a new pattern)
Verify the regression is documented in context/learnings.md with the pattern category
Verify all existing tests still pass after the fix

Test specifically for AI-introduced regressions that repeat without tests

1 star

testing

Updated Apr 16, 2026

npx skillsauth add andrem-sec/psc-comet ai-regression-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 24, 2026, 8:49 PM (1.7s, 1 file scanned)

SKILL.md

name: ai-regression-testing
description: Log to learnings.md for future reference
version: 1.0.0
level: 3
- name: Document Pattern

Related Skills

andrem-sec/batch (data-ai): Parallel agent swarm that decomposes work into independent units, spawns isolated workers, and tracks PRs via fan-in. Updated Apr 16, 2026.

andrem-sec/animation-safe (testing): Audit animations and transitions for motion accessibility, performance safety, and design intent. Enforces prefers-reduced-motion compliance and blocks layout-triggering transitions. Updated Apr 16, 2026.

andrem-sec/agentic-engineering (development): Framework for decomposing agent-driven tasks into independently verifiable units. Updated Apr 16, 2026.

andrem-sec/agent-harness-construction (development): Framework for designing quality agents with proper action space and contracts. Updated Apr 16, 2026.

Install manually

# Clone the repo
git clone https://github.com/andrem-sec/psc-comet.git

# Copy into Claude Code skills folder (global)
cp -r psc-comet/.claude/skills/workflow/ai-regression-testing ~/.claude/skills/

Repository

andrem-sec/psc-comet

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT