Task Quality KPI Framework

Overview

The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.

Key Architecture: KPIs are generated automatically by a hook. You read the results; you do not run scripts.

┌─────────────────────────────────────────────────────────────┐
│  HOOK (auto-executes)                                       │
│  Trigger: PostToolUse on TASK-*.md                          │
│  Script: task-kpi-analyzer.py                               │
│  Output: TASK-XXX--kpi.json                                 │
├─────────────────────────────────────────────────────────────┤
│  SKILL / AGENT (reads output)                               │
│  Input: TASK-XXX--kpi.json                                  │
│  Action: Make evaluation decisions                          │
└─────────────────────────────────────────────────────────────┘
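
To make the trigger concrete, here is a minimal sketch of the filtering a PostToolUse hook script might perform before analyzing anything. It assumes the hook receives the Claude Code tool payload (already parsed from JSON) as a dict with `tool_name` and `tool_input.file_path`, and that the KPI file is written next to the task file as `<task stem>--kpi.json`. `kpi_output_for` is a hypothetical helper, not the actual code of task-kpi-analyzer.py.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional


def kpi_output_for(payload: dict) -> Optional[Path]:
    """Return where the KPI JSON would be written, or None if the hook should skip.

    Hypothetical helper; assumes a payload shaped like
    {"tool_name": "Write", "tool_input": {"file_path": ".../TASK-001.md"}}.
    """
    # Only file-writing tools (Write / Edit) should trigger an analysis
    if payload.get("tool_name") not in {"Write", "Edit"}:
        return None
    file_path = payload.get("tool_input", {}).get("file_path")
    if not file_path:
        return None
    task_file = Path(file_path)
    # Only Markdown task files match; the generated --kpi.json never does
    if not fnmatch(task_file.name, "TASK-*.md"):
        return None
    # Assumed naming: TASK-001.md -> TASK-001--kpi.json in the same directory
    return task_file.with_name(f"{task_file.stem}--kpi.json")


payload = {"tool_name": "Edit", "tool_input": {"file_path": "docs/specs/001-feature/tasks/TASK-001.md"}}
print(kpi_output_for(payload))
```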

Why This Architecture?

| Problem | Solution |
|---------|----------|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |

KPI File Location

After any task file modification, find KPI data at:

docs/specs/[ID]/tasks/TASK-XXX--kpi.json

KPI Categories

┌─────────────────────────────────────────────────────────────┐
│                    OVERALL SCORE (0-10)                     │
├─────────────────────────────────────────────────────────────┤
│  Spec Compliance (30%)                                      │
│  ├── Acceptance Criteria Met (0-10)                         │
│  ├── Requirements Coverage (0-10)                           │
│  └── No Scope Creep (0-10)                                  │
├─────────────────────────────────────────────────────────────┤
│  Code Quality (25%)                                         │
│  ├── Static Analysis (0-10)                                 │
│  ├── Complexity (0-10)                                      │
│  └── Patterns Alignment (0-10)                              │
├─────────────────────────────────────────────────────────────┤
│  Test Coverage (25%)                                        │
│  ├── Unit Tests Present (0-10)                              │
│  ├── Test/Code Ratio (0-10)                                 │
│  └── Coverage Percentage (0-10)                             │
├─────────────────────────────────────────────────────────────┤
│  Contract Fulfillment (20%)                                 │
│  ├── Provides Verified (0-10)                               │
│  └── Expects Satisfied (0-10)                               │
└─────────────────────────────────────────────────────────────┘

Category Weights

| Category | Weight | Why |
|----------|--------|-----|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
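
The weights combine into the overall score as a weighted sum. The sketch below assumes overall_score = sum of (category score x weight / 100), which is consistent with the weighted_score of 2.55 (8.5 x 30%) in the sample KPI file under "Understanding the Data". The category scores in the usage line are made up for illustration.

```python
# Category weights from the table above, in percent (they sum to 100)
WEIGHTS = {
    "Spec Compliance": 30,
    "Code Quality": 25,
    "Test Coverage": 25,
    "Contract Fulfillment": 20,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 category scores (assumed aggregation rule)."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = sum(category_scores[name] * weight / 100 for name, weight in WEIGHTS.items())
    return round(total, 2)


# Illustrative scores, not taken from a real KPI file:
# 8.5*0.30 + 7.0*0.25 + 8.0*0.25 + 9.0*0.20 = 2.55 + 1.75 + 2.0 + 1.8
print(overall_score({
    "Spec Compliance": 8.5,
    "Code Quality": 7.0,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 9.0,
}))  # 8.1
```

Because Spec Compliance carries the largest weight, a one-point drop there costs 0.3 overall, versus 0.2 for Contract Fulfillment.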

When to Use

Reading KPI data for task quality evaluation
Understanding quality metrics and scoring breakdown
Deciding whether to iterate or approve based on quantitative data
Integrating KPI checks into automated loops (agents_loop.py)
Generating evidence-based evaluation reports

Instructions

1. Reading KPI Data (Primary Use)

DO NOT run scripts - read the auto-generated file:

Read the KPI file:
  docs/specs/001-feature/tasks/TASK-001--kpi.json

2. Understanding the Data

The KPI file contains:

{
  "task_id": "TASK-001",
  "evaluated_at": "2026-01-15T10:30:00Z",
  "overall_score": 8.2,
  "passed_threshold": true,
  "threshold": 7.5,
  "kpi_scores": [
    {
      "category": "Spec Compliance",
      "weight": 30,
      "score": 8.5,
      "weighted_score": 2.55,
      "metrics": {
        "acceptance_criteria_met": 9.0,
        "requirements_coverage": 8.0,
        "no_scope_creep": 8.5
      },
      "evidence": [
        "Acceptance criteria: 9/10 checked",
        "Requirements coverage: 8/10"
      ]
    }
  ],
  "recommendations": [
    "Code Quality: Moderate improvements possible"
  ],
  "summary": "Score: 8.2/10 - PASSED"
}
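
A minimal reader for this structure, assuming the field names shown above. Besides giving the data a typed shape, it cross-checks passed_threshold against overall_score >= threshold; that relationship is implied by the sample (8.2 vs 7.5) but is an assumption, not a documented guarantee. `KpiReport`, `parse_kpi` and `load_kpi` are hypothetical names.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class KpiReport:
    """Typed view of the fields shown in the sample KPI file."""
    task_id: str
    overall_score: float
    threshold: float
    passed_threshold: bool
    category_scores: dict[str, float]
    recommendations: list[str]


def parse_kpi(data: dict) -> KpiReport:
    """Build a KpiReport from already-decoded KPI JSON."""
    report = KpiReport(
        task_id=data["task_id"],
        overall_score=float(data["overall_score"]),
        threshold=float(data["threshold"]),
        passed_threshold=bool(data["passed_threshold"]),
        category_scores={e["category"]: float(e["score"]) for e in data.get("kpi_scores", [])},
        recommendations=list(data.get("recommendations", [])),
    )
    # Assumed invariant: the pass flag agrees with score >= threshold
    if report.passed_threshold != (report.overall_score >= report.threshold):
        raise ValueError(f"{report.task_id}: passed_threshold disagrees with the scores")
    return report


def load_kpi(path: Path) -> KpiReport:
    """Read and parse a TASK-XXX--kpi.json file."""
    return parse_kpi(json.loads(path.read_text(encoding="utf-8")))
```

For example, `load_kpi(Path("docs/specs/001-feature/tasks/TASK-001--kpi.json"))` would return a report whose category_scores maps "Spec Compliance" to 8.5 for the sample above.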

3. Making Decisions

Use overall_score and passed_threshold:

IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
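
The same rule as a small function, a sketch assuming the KPI JSON has already been loaded into a dict. The returned labels "approve" and "fix" are illustrative, not names defined by the framework.

```python
def decide(kpi: dict) -> tuple[str, list[str]]:
    """Return ("approve", []) or ("fix", fix_targets) from a loaded KPI dict."""
    if kpi["passed_threshold"]:
        return "approve", []
    # The recommendations become the targets of the fix specification
    return "fix", list(kpi.get("recommendations", []))


decision, targets = decide({
    "passed_threshold": False,
    "recommendations": ["Test Coverage: Add unit tests"],
})
print(decision, targets)
```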

Integration with Workflow

In Task Review (evaluator-agent)

## Review Process

1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold

In agents_loop

import json

# spec_path, task_id, advance_state(), create_fix_task() and log_warning()
# are provided by the surrounding agents_loop code.

# Check KPI file exists
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"

if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text(encoding="utf-8"))

    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work: turn the recommendations into a fix task
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet: the task may not be implemented (or saved) yet
    log_warning("No KPI data found")

Multi-Iteration Loop

Instead of stopping after a fixed maximum of 3 retries, iterate until the quality threshold is met:

Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions  
Iteration 3: Score 7.8 → PASSED → Proceed

Each iteration updates the KPI file automatically on task save.
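
A sketch of the iterate-until-passing loop. `evaluate` and `apply_fixes` are hypothetical callables standing in for "save the task, let the hook regenerate the KPI file, read it" and "implement the fix specification". The optional `max_iterations` guard is an added assumption: left as None it keeps the no-fixed-limit behaviour, while an integer makes a task that never converges fail loudly instead of looping forever.

```python
from typing import Callable, Optional


def iterate_until_passing(
    evaluate: Callable[[], dict],
    apply_fixes: Callable[[list], None],
    max_iterations: Optional[int] = None,
) -> list:
    """Repeat evaluate -> fix until passed_threshold is true; return the score history."""
    history = []
    while True:
        kpi = evaluate()
        history.append(kpi["overall_score"])
        if kpi["passed_threshold"]:
            return history
        if max_iterations is not None and len(history) >= max_iterations:
            raise RuntimeError(f"threshold not met after {len(history)} iterations: {history}")
        apply_fixes(kpi.get("recommendations", []))


# Replaying the scores from the iteration example above, with a threshold of 7.5
scripted_scores = iter([6.2, 7.1, 7.8])


def fake_evaluate() -> dict:
    score = next(scripted_scores)
    return {"overall_score": score, "passed_threshold": score >= 7.5}


print(iterate_until_passing(fake_evaluate, apply_fixes=lambda targets: None))  # [6.2, 7.1, 7.8]
```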

Threshold Guidelines

| Score | Quality Level | Action |
|-------|---------------|--------|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve only if at or above the configured threshold (e.g. 7.5-7.9 with threshold 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
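
The bands above can be encoded as a lookup. This sketch assumes the lower bound of each band is inclusive, so a score of 8.95 falls in "Good", not "Exceptional".

```python
# (inclusive lower bound, level, action) from the Threshold Guidelines table, highest first
BANDS = [
    (9.0, "Exceptional", "Approve, document best practices"),
    (8.0, "Good", "Approve with minor notes"),
    (7.0, "Acceptable", "Approve only if the score meets the configured threshold"),
    (6.0, "Below Standard", "Request specific improvements"),
]


def quality_level(score: float) -> tuple[str, str]:
    """Map a 0-10 overall score to (quality level, suggested action)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    for lower, level, action in BANDS:
        if score >= lower:
            return level, action
    return "Poor", "Significant rework required"


print(quality_level(7.8))
```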

Recommended Thresholds

| Project Type | Threshold | Rationale |
|--------------|-----------|-----------|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
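
If one setup serves several kinds of project, the recommended thresholds can live in a small lookup. This is a sketch only: where the hook actually reads its threshold from is not described here, so `THRESHOLDS` and `passes` are hypothetical.

```python
# Recommended thresholds from the table above (hypothetical config keys)
THRESHOLDS = {
    "production_mvp": 8.0,
    "internal_tool": 7.0,
    "prototype": 6.0,
    "critical_system": 8.5,
}


def passes(overall_score: float, project_type: str) -> bool:
    """True if the score meets the recommended threshold for the project type."""
    try:
        threshold = THRESHOLDS[project_type]
    except KeyError:
        raise ValueError(f"unknown project type: {project_type!r}") from None
    return overall_score >= threshold


# The same 7.8 passes for an internal tool but not for a production MVP
print(passes(7.8, "internal_tool"), passes(7.8, "production_mvp"))
```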

Metric Details

Spec Compliance Metrics

Acceptance Criteria Met

Calculates: (checked_criteria / total_criteria) * 10
Source: Task file checkbox count
Example: 9/10 checked = 9.0
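
A sketch of the checkbox count, assuming acceptance criteria are written as GitHub-style Markdown task-list items (`- [ ]` / `- [x]`). It counts every checkbox in the text it receives, so pass it only the acceptance-criteria section; returning 0.0 when there are no checkboxes is an added assumption, and the real analyzer's parsing may differ.

```python
import re

# Matches "- [ ] item" / "* [x] item" task-list lines; group 1 is the mark
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]", re.MULTILINE)


def acceptance_criteria_score(markdown: str) -> float:
    """(checked / total) * 10 over all task-list checkboxes; 0.0 if there are none."""
    marks = CHECKBOX.findall(markdown)
    if not marks:
        return 0.0
    checked = sum(1 for mark in marks if mark.lower() == "x")
    return round(checked / len(marks) * 10, 1)


criteria = "\n".join(["- [x] criterion"] * 9 + ["- [ ] criterion"])
print(acceptance_criteria_score(criteria))  # 9.0
```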

Requirements Coverage

Calculates: Count of REQ-IDs this task covers
Source: traceability-matrix.md
Example: 4 requirements covered = 8.0

No Scope Creep

Calculates: (implemented_files / expected_files) * 10
Source: Task "Files to Create" vs actual files
Penalizes: Missing files or unexpected additions

Code Quality Metrics

Static Analysis

Java: Maven Checkstyle
TypeScript: ESLint
Python: ruff
Score: 10 if passes, 5 if issues found
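
The pass/fail mapping can be sketched by running the linter and checking its exit code. This assumes the tool exits with 0 when no issues are found and non-zero otherwise (`ruff check` behaves this way). The sketch does not distinguish "issues found" from a crash or configuration error, which also exit non-zero, and it lets FileNotFoundError propagate if the tool is not installed.

```python
import subprocess
from typing import Sequence


def static_analysis_score(command: Sequence[str]) -> float:
    """10.0 if the linter exits cleanly, 5.0 for any non-zero exit code."""
    # FileNotFoundError propagates if the tool is not on PATH
    result = subprocess.run(command, capture_output=True, text=True)
    return 10.0 if result.returncode == 0 else 5.0


# Hypothetical usage for a Python task (requires ruff on PATH):
# static_analysis_score(["ruff", "check", "src/"])
```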

Complexity

Calculates: Share of functions longer than 50 lines (long_functions_ratio)
Score: 10 - (long_functions_ratio * 5)
Penalizes: Large, complex functions
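
The formula worked through in code, assuming long_functions_ratio is the fraction of functions longer than 50 lines and that a file with no functions scores 10. Function lengths would normally come from a parser; here they are passed in directly.

```python
def complexity_score(function_lengths: list[int], max_lines: int = 50) -> float:
    """10 - (long_functions_ratio * 5), where "long" means more than max_lines lines."""
    if not function_lengths:
        return 10.0
    long_count = sum(1 for n in function_lengths if n > max_lines)
    ratio = long_count / len(function_lengths)
    return round(10 - ratio * 5, 2)


# 1 of 4 functions is long -> 10 - 0.25 * 5
print(complexity_score([12, 30, 80, 45]))  # 8.75
```

Note that with this formula the metric never drops below 5.0, even when every function is over the limit.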

Patterns Alignment

Checks: Knowledge Graph patterns
Source: knowledge-graph.json
Validates: Implementation follows project patterns

Test Coverage Metrics

Unit Tests Present

Calculates: min(10, test_files * 5)
2 test files = maximum score
Penalizes: Missing tests

Test/Code Ratio

Calculates: (test_count / code_count) * 10
1:1 ratio = 10/10
Ideal: At least 1 test file per code file

Coverage Percentage

Source: Coverage reports (JaCoCo, lcov, etc.)
Calculates: coverage_percent / 10
80% coverage = 8.0
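
The three formulas above, combined into a category score. Two assumptions go beyond the text: the ratio metric is capped at 10 (so tests beyond 1:1 are not rewarded further), and the category score is the plain mean of its metrics, which matches the Spec Compliance numbers in the sample KPI file (9.0, 8.0 and 8.5 average to 8.5). Counts are taken as file counts, following the "1 test file per code file" guidance.

```python
def coverage_category_metrics(test_files: int, code_files: int, coverage_percent: float) -> dict[str, float]:
    """Compute the three Test Coverage metrics plus their mean as the category score."""
    if code_files <= 0:
        raise ValueError("code_files must be positive")
    unit_tests_present = min(10.0, test_files * 5.0)              # 2+ test files -> 10
    test_code_ratio = min(10.0, test_files / code_files * 10)    # 1:1 -> 10, cap assumed
    coverage_percentage = coverage_percent / 10                  # 80% -> 8.0
    metrics = {
        "unit_tests_present": unit_tests_present,
        "test_code_ratio": round(test_code_ratio, 2),
        "coverage_percentage": round(coverage_percentage, 2),
    }
    # Assumed aggregation: plain mean of the three metrics
    metrics["category_score"] = round(sum(metrics.values()) / len(metrics), 2)
    return metrics


print(coverage_category_metrics(test_files=2, code_files=4, coverage_percent=80))
```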

Contract Fulfillment Metrics

Provides Verified

Checks: Files exist and export expected symbols
Source: Task provides frontmatter
Validates: Contract satisfied

Expects Satisfied

Checks: Dependencies provide required files/symbols
Source: Task expects frontmatter
Validates: Prerequisites met
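
A naive sketch of both contract checks. The frontmatter format is not shown here, so this assumes provides/expects have already been parsed into a mapping of file path to expected symbol names, and it treats a symbol as exported if its name appears as a whole word in the file, a rough stand-in for real export analysis. `contract_score`, scoring as the fraction of satisfied entries times 10, and scoring an empty contract as 10 are all hypothetical choices.

```python
import re
from pathlib import Path


def contract_score(contract: dict[str, list[str]], root: Path) -> float:
    """Fraction of (file, symbol) entries that are satisfied, scaled to 0-10."""
    checks: list[bool] = []
    for rel_path, symbols in contract.items():
        file = root / rel_path
        text = file.read_text(encoding="utf-8") if file.is_file() else None
        # A file with no listed symbols only needs to exist
        for symbol in symbols or [None]:
            if text is None:
                checks.append(False)
            elif symbol is None:
                checks.append(True)
            else:
                # Whole-word match as a crude proxy for "exports the symbol"
                checks.append(re.search(rf"\b{re.escape(symbol)}\b", text) is not None)
    return round(sum(checks) / len(checks) * 10, 1) if checks else 10.0
```

The same function serves both metrics: call it with the task's own provides for Provides Verified, and with the dependencies' outputs named in expects for Expects Satisfied.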

When KPI File is Missing

If TASK-XXX--kpi.json doesn't exist:

Task was never modified - Hook runs on file save
Hook failed - Check Claude Code logs
Task is new - Save the file first to trigger hook

DO NOT try to calculate KPIs manually. The hook runs automatically when:

Task file is saved (Write tool)
Task file is edited (Edit tool)

Best Practices

1. Always Check KPI File Exists

Before evaluating:

Check if KPI file exists:
  docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
  - Task may not be implemented yet
  - Ask user to save the task file first

2. Trust the Metrics

The KPIs are objective. Override them only with documented evidence, such as:

Critical security issue not in metrics
Logic error not caught by static analysis
Exceptional quality not measured

3. Iterate on Low KPIs

Target specific categories:

❌ "Fix code quality issues"
✅ "Improve Code Quality KPI from 5.2 to 7.0:
    - Complexity: Refactor processData() (5→8)
    - Patterns: Add error handling (6→8)"
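
To derive targets like these from the KPI file rather than by eye, rank the metrics that fall below a target. A sketch, assuming the kpi_scores/metrics layout from the sample file; the 7.0 target is an example value, and the Code Quality metric keys in the usage lines are illustrative, since the sample file only shows the Spec Compliance keys.

```python
def low_metrics(kpi: dict, target: float = 7.0) -> list[tuple[str, str, float]]:
    """Return (category, metric, score) for every metric below target, lowest first."""
    low = [
        (entry["category"], name, score)
        for entry in kpi.get("kpi_scores", [])
        for name, score in entry.get("metrics", {}).items()
        if score < target
    ]
    return sorted(low, key=lambda item: item[2])


target = 7.0
kpi = {"kpi_scores": [{
    "category": "Code Quality",
    "metrics": {"static_analysis": 10.0, "complexity": 5.0, "patterns_alignment": 6.0},
}]}
for category, metric, score in low_metrics(kpi, target):
    print(f"{category}: {metric} at {score} (target {target})")
```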

4. Track KPI Trends

Monitor quality over time:

Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
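
The per-sprint averages and deltas can be computed from collected overall_score values. The per-task scores below are invented so that their averages reproduce the figures above; how tasks are grouped into sprints is left to you.

```python
from statistics import mean
from typing import Optional


def kpi_trend(scores_by_sprint: dict[str, list[float]]) -> list[tuple[str, float, Optional[float]]]:
    """Average overall_score per sprint plus the change from the previous sprint."""
    trend = []
    previous = None
    for sprint, scores in scores_by_sprint.items():  # dicts preserve insertion order
        avg = round(mean(scores), 1)
        delta = None if previous is None else round(avg - previous, 1)
        trend.append((sprint, avg, delta))
        previous = avg
    return trend


sprints = {
    "Sprint 1": [6.5, 7.1],
    "Sprint 2": [7.0, 7.6],
    "Sprint 3": [7.5, 8.3],
}
for sprint, avg, delta in kpi_trend(sprints):
    change = "" if delta is None else f" ({delta:+.1f})"
    print(f"{sprint}: Average KPI {avg}{change}")
```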

Troubleshooting

KPI File Not Generated

Check:

Hook enabled in hooks.json
Task file name matches pattern TASK-*.md
File was actually saved (not just viewed)

KPI Scores Seem Wrong

Validate:

Check evidence field for data sources
Verify files exist at expected paths
Some metrics need build tools (Maven, npm)

Low Scores Despite Good Code

Possible causes:

Missing test files
No coverage report generated
Acceptance criteria not checked
Lint rules too strict

Fix the root cause, not just the score.

Examples

Example 1: Reading KPI Data

Read the KPI file to evaluate task quality:
  docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement

Example 2: Iteration Decision

Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓

Example 3: agents_loop Integration

import json

# In agents_loop, after implementation step
# (spec_dir, task_id, advance_state() and log_warning() come from agents_loop)
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"

if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text(encoding="utf-8"))

    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
else:
    # No KPI yet: the hook has not run for this task file
    log_warning("No KPI data found")

References

evaluator-agent.md - Agent that uses KPI data for evaluation
hooks.json - Hook configuration for auto-generation
task-kpi-analyzer.py - Hook script (do not execute directly)
agents_loop.py - Orchestrator that reads KPI for decisions

giuseppe-trisciuoglio/task-quality-kpi
