.claude/skills/testing-quality/when-validating-code-works-use-functionality-audit/SKILL.md
Validates that code actually works through sandbox testing, execution verification, and systematic debugging. Use this skill after code generation or modification to ensure functionality is genuine rather than assumed. The skill creates isolated test environments, executes code with realistic inputs, identifies bugs through systematic analysis, and applies best practices to fix issues without breaking existing functionality.
npx skillsauth add DNYoussef/ai-chrome-extension when-validating-code-works-use-functionality-auditInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Trigger Conditions:
Situations Requiring Functionality Audit:
This skill systematically validates that code delivers its intended behavior through actual execution rather than static analysis alone. It creates isolated testing environments (sandboxes), executes code with realistic inputs, captures outputs and errors, identifies root causes of failures through systematic debugging, and applies fixes using best practices that preserve existing functionality.
The skill emphasizes genuine functionality over appearance, detecting "theater code" that looks correct but fails during execution. It combines automated testing, manual validation, and debugging expertise to ensure code reliability.
Agents: tester (lead), coder (support) Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 1: Setup Testing Environment"
npx claude-flow swarm init --topology hierarchical --max-agents 2
# Spawn agents
npx claude-flow agent spawn --type tester --capabilities "sandbox-setup,environment-config,dependency-management"
npx claude-flow agent spawn --type coder --capabilities "tooling-setup,script-generation"
# Memory coordination - store environment config
npx claude-flow memory store --key "testing/functionality-audit/phase-1/tester/sandbox-config" --value '{"isolated":true,"snapshot_enabled":true}'
npx claude-flow memory store --key "testing/functionality-audit/phase-1/coder/dependencies" --value '{"package_manager":"npm","install_command":"npm install"}'
# Execute phase work
# 1. Create isolated sandbox environment
echo "Creating sandbox for code execution..."
mkdir -p /tmp/functionality-audit-sandbox
cd /tmp/functionality-audit-sandbox
# 2. Install dependencies and setup tools
npm init -y 2>/dev/null || true
npm install --save-dev jest @types/jest ts-node typescript 2>/dev/null || true
# 3. Configure testing framework
cat > jest.config.js << 'EOF'
module.exports = {
preset: 'ts-jest',
testEnvironment: 'node',
collectCoverage: true,
coverageDirectory: 'coverage',
testMatch: ['**/*.test.ts', '**/*.test.js'],
verbose: true
};
EOF
# Complete phase
npx claude-flow hooks post-task --task-id "phase-1-setup"
npx claude-flow memory store --key "testing/functionality-audit/phase-1/output" --value '{"status":"complete","sandbox_ready":true}'
Memory Pattern:
testing/functionality-audit/phase-0/user/code-to-validatetesting/functionality-audit/phase-1/tester/sandbox-readytesting/functionality-audit/shared/environment-configSuccess Criteria:
Deliverables:
Agents: tester (lead), production-validator (support) Duration: 15-20 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 2: Execute Code with Realistic Inputs"
npx claude-flow swarm scale --target-agents 2
# Retrieve sandbox config from Phase 1
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-1/output"
# Memory coordination - define test scenarios
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/test-scenarios" --value '{"unit_tests":true,"integration_tests":true,"edge_cases":true}'
# Execute phase work
# 1. Copy code to sandbox
echo "Copying code to sandbox environment..."
# (Code paths retrieved from memory)
# 2. Generate test cases with realistic inputs
cat > sandbox-tests.test.js << 'EOF'
// Auto-generated test cases for functionality validation
describe('Functionality Audit Tests', () => {
test('Happy path with valid inputs', async () => {
// Test implementation with realistic data
});
test('Edge cases and boundary conditions', async () => {
// Test edge cases
});
test('Error handling with invalid inputs', async () => {
// Test error scenarios
});
});
EOF
# 3. Execute tests and capture output
npm test -- --coverage --verbose > test-output.log 2>&1
TEST_EXIT_CODE=$?
# 4. Store results in memory
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/execution-results" --value "{\"exit_code\":$TEST_EXIT_CODE,\"timestamp\":\"$(date -Iseconds)\"}"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-2-execute"
Memory Pattern:
testing/functionality-audit/phase-1/tester/sandbox-readytesting/functionality-audit/phase-2/tester/execution-resultstesting/functionality-audit/shared/test-logsSuccess Criteria:
Deliverables:
Agents: coder (lead), tester (support), reviewer (validation) Duration: 20-30 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 3: Debug Issues"
npx claude-flow swarm scale --target-agents 3
# Retrieve execution results from Phase 2
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Memory coordination - analyze failures
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/debugging-strategy" --value '{"method":"systematic-root-cause","tools":["debugger","logs","profiler"]}'
# Execute phase work
# 1. Analyze test failures systematically
echo "Analyzing failures and errors..."
grep -E "(FAIL|ERROR|Exception)" test-output.log > failures.txt || true
# 2. Identify root causes using debugging techniques
# - Stack trace analysis
# - Variable inspection
# - Logic flow validation
# - Dependency conflict detection
# 3. Categorize issues by type
cat > issue-analysis.json << 'EOF'
{
"syntax_errors": [],
"runtime_errors": [],
"logic_errors": [],
"integration_failures": [],
"dependency_issues": []
}
EOF
# 4. Prioritize fixes by impact
# Critical: Code doesn't run at all
# High: Core functionality broken
# Medium: Edge cases failing
# Low: Minor issues, optimizations
# Store debugging results
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/root-causes" --value "$(cat issue-analysis.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-3-debug"
Memory Pattern:
testing/functionality-audit/phase-2/tester/execution-resultstesting/functionality-audit/phase-3/coder/root-causestesting/functionality-audit/shared/issue-trackerSuccess Criteria:
Deliverables:
Agents: production-validator (lead), reviewer (quality gate), tester (regression) Duration: 15-25 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 4: Validate Functionality"
npx claude-flow swarm scale --target-agents 3
# Retrieve fixes from Phase 3
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
# Memory coordination - define validation criteria
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/criteria" --value '{"functional_correctness":true,"performance_acceptable":true,"no_regressions":true}'
# Execute phase work
# 1. Apply fixes and rerun tests
echo "Applying fixes and validating..."
npm test -- --coverage --verbose > retest-output.log 2>&1
RETEST_EXIT_CODE=$?
# 2. Compare results with baseline
diff test-output.log retest-output.log > validation-diff.txt || true
# 3. Validate no regressions introduced
# - Check previously passing tests still pass
# - Verify no new failures introduced
# - Confirm fixes resolved original issues
# 4. Production readiness assessment
cat > production-readiness.json << 'EOF'
{
"all_tests_passing": false,
"coverage_threshold_met": false,
"performance_acceptable": false,
"security_validated": false,
"ready_for_deployment": false
}
EOF
# Update readiness based on results
if [ $RETEST_EXIT_CODE -eq 0 ]; then
echo "Tests passing, updating readiness..."
fi
# Store validation results
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/readiness" --value "$(cat production-readiness.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-4-validate"
Memory Pattern:
testing/functionality-audit/phase-3/coder/root-causestesting/functionality-audit/phase-4/validator/readinesstesting/functionality-audit/shared/validation-resultsSuccess Criteria:
Deliverables:
Agents: reviewer (lead), tester (metrics) Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 5: Report Results"
npx claude-flow swarm scale --target-agents 2
# Retrieve all phase outputs
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-4/validator/readiness"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Execute phase work
# 1. Compile comprehensive audit report
cat > functionality-audit-report.md << 'EOF'
# Functionality Audit Report
## Executive Summary
- **Status**: [PASS/FAIL/PARTIAL]
- **Tests Executed**: X
- **Tests Passed**: Y
- **Code Coverage**: Z%
- **Critical Issues**: N
## Phase-by-Phase Results
### Phase 1: Environment Setup
- Status: Complete
- Issues: None
### Phase 2: Execution Testing
- Total Tests: X
- Passed: Y
- Failed: Z
- Coverage: N%
### Phase 3: Debugging
- Issues Identified: X
- Root Causes: Y categories
- Fixes Applied: Z
### Phase 4: Validation
- Regression Status: No regressions
- Production Ready: Yes/No
- Performance: Acceptable
## Recommendations
1. [Action items for addressing remaining issues]
2. [Performance optimization suggestions]
3. [Code quality improvements]
## Artifacts
- Test logs: test-output.log
- Coverage report: coverage/
- Issue analysis: issue-analysis.json
- Validation results: production-readiness.json
EOF
# 2. Generate metrics dashboard
cat > audit-metrics.json << 'EOF'
{
"timestamp": "",
"duration_minutes": 0,
"tests_total": 0,
"tests_passed": 0,
"coverage_percentage": 0,
"issues_found": 0,
"issues_fixed": 0,
"production_ready": false
}
EOF
# 3. Store final results
npx claude-flow memory store --key "testing/functionality-audit/final-report" --value "$(cat functionality-audit-report.md)"
# Complete phase and export metrics
npx claude-flow hooks post-task --task-id "phase-5-report" --export-metrics true
npx claude-flow hooks session-end --export-metrics true
Memory Pattern:
testing/functionality-audit/phase-{1-4}/*/outputtesting/functionality-audit/final-reporttesting/functionality-audit/shared/complete-auditSuccess Criteria:
Deliverables:
testing/functionality-audit/phase-{N}/{agent-type}/{data-type}
Examples:
testing/functionality-audit/phase-1/tester/sandbox-configtesting/functionality-audit/phase-2/tester/execution-resultstesting/functionality-audit/phase-3/coder/root-causestesting/functionality-audit/phase-4/validator/readinesstesting/functionality-audit/shared/environment-config// Store outputs for downstream agents
mcp__claude_flow__memory_store({
key: "testing/functionality-audit/phase-2/tester/output",
value: JSON.stringify({
status: "complete",
results: {
tests_run: 45,
tests_passed: 38,
tests_failed: 7,
coverage: 82.5,
execution_time_ms: 3420
},
timestamp: Date.now(),
artifacts: {
logs: "/tmp/functionality-audit-sandbox/test-output.log",
coverage: "/tmp/functionality-audit-sandbox/coverage/"
}
})
})
// Retrieve inputs from upstream agents
mcp__claude_flow__memory_retrieve({
key: "testing/functionality-audit/phase-1/tester/sandbox-ready"
}).then(data => {
const sandboxConfig = JSON.parse(data);
console.log(`Using sandbox: ${sandboxConfig.path}`);
})
# Pre-task hook - Initialize audit session
npx claude-flow hooks pre-task \
--description "Functionality audit for module: auth-service" \
--session-id "functionality-audit-$(date +%s)"
# Post-edit hook - Track code changes during debugging
npx claude-flow hooks post-edit \
--file "src/auth/login.ts" \
--memory-key "testing/functionality-audit/coder/edits"
# Post-task hook - Finalize and export metrics
npx claude-flow hooks post-task \
--task-id "functionality-audit" \
--export-metrics true
# Session end - Generate summary and persist state
npx claude-flow hooks session-end \
--export-metrics true \
--summary "Functionality audit completed: 38/45 tests passing, 7 issues debugged and fixed"
Before finalizing each phase, validate:
Does this approach align with successful past work?
Do the outputs support the stated objectives?
Is the chosen method appropriate for the context?
Are there any internal contradictions?
Define objective precisely
Decompose into sub-goals
Identify dependencies
Evaluate options
Synthesize solution
Planning Phase:
Validation Gate 1: Review strategy against objectives
Implementation Phase:
Validation Gate 2: Verify outputs and performance
Optimization Phase:
Validation Gate 3: Confirm targets met before concluding
Related Skills:
when-detecting-fake-code-use-theater-detection - Pre-audit detection of non-functional codewhen-reviewing-code-comprehensively-use-code-review-assistant - Post-audit quality reviewwhen-verifying-quality-use-verification-quality - Comprehensive quality validationwhen-auditing-code-style-use-style-audit - Code style and conventions validationCoordination Points:
Memory Sharing:
# Share audit results with code review skill
npx claude-flow memory store \
--key "code-review/input/functionality-audit-results" \
--value "$(cat functionality-audit-report.md)"
# Retrieve theater detection findings
npx claude-flow memory retrieve \
--key "theater-detection/output/suspicious-code"
Symptoms: Dependency installation errors, configuration failures Root Cause: Missing system dependencies, incompatible versions Solution:
# Validate system dependencies
node --version
npm --version
# Clean install
rm -rf node_modules package-lock.json
npm install --force
# Use Docker for isolation if local setup problematic
docker run -it --rm -v $(pwd):/workspace node:18 bash
Symptoms: Runtime errors referencing undefined config Root Cause: Environment variables not set in sandbox Solution:
# Create .env file with required variables
cat > .env << 'EOF'
NODE_ENV=test
API_KEY=test-key-12345
DATABASE_URL=sqlite::memory:
EOF
# Load environment in test setup
require('dotenv').config();
Symptoms: Coverage report shows gaps in tested code Root Cause: Missing test cases for edge cases, error paths Solution:
Symptoms: Fixing one issue breaks other functionality Root Cause: Tight coupling, shared state, hidden dependencies Solution:
Symptoms: Tests pass but execution time significantly increased Root Cause: Inefficient fix implementation, excessive validation Solution:
# Profile execution to identify bottlenecks
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Optimize hot paths
# - Cache repeated computations
# - Reduce unnecessary iterations
# - Use appropriate data structures
# Initialize audit for API endpoint
npx claude-flow hooks pre-task --description "Audit /api/users endpoint"
# Phase 1: Setup
mkdir -p /tmp/api-audit
cd /tmp/api-audit
npm init -y
npm install --save-dev jest supertest
# Phase 2: Execute tests
cat > users.test.js << 'EOF'
const request = require('supertest');
const app = require('../src/app');
describe('GET /api/users', () => {
test('returns 200 with user list', async () => {
const response = await request(app).get('/api/users');
expect(response.statusCode).toBe(200);
expect(Array.isArray(response.body)).toBe(true);
});
test('handles authentication', async () => {
const response = await request(app)
.get('/api/users')
.set('Authorization', 'Bearer invalid-token');
expect(response.statusCode).toBe(401);
});
});
EOF
npm test
# Phase 3: Debug failures (if any)
# Analyze error messages and stack traces
# Fix authentication logic or data access issues
# Phase 4: Validate fixes
npm test -- --coverage
# Phase 5: Report
echo "API audit complete: All endpoints functional"
// Phase 2: Create test with realistic data
const { processUserData } = require('./data-processor');
describe('processUserData', () => {
test('processes valid user data correctly', () => {
const input = {
name: 'John Doe',
email: '[email protected]',
age: 30,
roles: ['user', 'admin']
};
const result = processUserData(input);
expect(result).toHaveProperty('id');
expect(result.email).toBe('[email protected]');
expect(result.roles).toContain('admin');
});
test('handles missing fields gracefully', () => {
const input = { name: 'Jane' };
const result = processUserData(input);
expect(result).toHaveProperty('email');
expect(result.email).toBe(''); // Default value
});
test('validates email format', () => {
const input = { email: 'invalid-email' };
expect(() => processUserData(input)).toThrow('Invalid email format');
});
});
// Phase 3: Debug - Found that email validation regex was incorrect
// Fix: Update regex to properly match email patterns
// Phase 4: Validate - All tests pass, coverage 95%
// Phase 2: Create component test with user interactions
import { render, screen, fireEvent } from '@testing-library/react';
import LoginForm from './LoginForm';
describe('LoginForm', () => {
test('submits form with valid credentials', async () => {
const handleSubmit = jest.fn();
render(<LoginForm onSubmit={handleSubmit} />);
fireEvent.change(screen.getByLabelText(/email/i), {
target: { value: '[email protected]' }
});
fireEvent.change(screen.getByLabelText(/password/i), {
target: { value: 'password123' }
});
fireEvent.click(screen.getByRole('button', { name: /login/i }));
await screen.findByText(/logging in/i);
expect(handleSubmit).toHaveBeenCalledWith({
email: '[email protected]',
password: 'password123'
});
});
test('displays validation errors', () => {
render(<LoginForm onSubmit={jest.fn()} />);
fireEvent.click(screen.getByRole('button', { name: /login/i }));
expect(screen.getByText(/email is required/i)).toBeInTheDocument();
expect(screen.getByText(/password is required/i)).toBeInTheDocument();
});
});
// Phase 3: Debug - Found that form validation wasn't triggering
// Fix: Add form validation logic with proper error state management
// Phase 4: Validate - Component renders and validates correctly
development
Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
development
Comprehensive framework for analyzing, creating, and refining prompts for AI systems using evidence-based techniques
data-ai
Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement
development
Create new Claude Code Skills with proper YAML frontmatter, progressive disclosure structure, and complete directory organization