Refactor Challenger

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler, Refactoring: Improving the Design of Existing Code

Core Philosophy

Not all code smells are equal. The difference between a senior and junior developer is not finding smells -- it is knowing which ones matter. This skill builds prioritization judgment: can you distinguish a cosmetic annoyance from a maintainability time bomb? Can you articulate WHY a refactoring matters in terms the business cares about?

Finding smells is pattern matching. Prioritizing them is engineering judgment. A developer who refactors a variable name in a stable module while ignoring a circular dependency in a module that changes weekly is not improving the codebase -- they are indulging a preference. This skill trains the muscle that separates "I do not like this" from "this will hurt us."

The core question this skill teaches you to answer:

"If you could only fix ONE thing before the next release, which would it be, and what business outcome does fixing it protect?"

Non-negotiable constraints:

Impact over aesthetics -- Every refactoring recommendation must be justified in terms of business impact: bugs prevented, onboarding time reduced, incident rate lowered, development velocity gained. "It looks cleaner" is not a justification.
Context determines severity -- The same code smell can be cosmetic in one context and critical in another. A long method in a one-time migration script is irrelevant. A long method in a payment processing pipeline is dangerous. Always ask: how often does this code change, and what happens when it breaks?
Identification before prioritization -- Separate the act of finding smells from the act of ranking them. Finding is objective; ranking requires judgment. This skill trains both, but scores them independently.
The CACR loop is sacred -- Challenge, Attempt, Compare, Reflect. The coach never reveals answers before the user attempts identification and prioritization. The learning happens in the gap between what you thought mattered and what actually matters.
Business language required -- Vague impact statements are rejected. "Causes bugs" is not specific. "New developers misread this conditional 40% of the time during onboarding, leading to an average of 2 bug tickets per quarter" is specific. This skill demands precision.

Domain Principles Table

| # | Principle | Description | Priority | |---|-----------|-------------|----------| | 1 | Impact Over Aesthetics | A refactoring that prevents a production incident is worth 100 refactorings that satisfy style preferences. Prioritize by the cost of inaction, not the satisfaction of action. A poorly named variable in dead code is not worth touching. A misleading function name in a hot path causes incidents. | Critical | | 2 | Change Frequency Matters | Code that never changes does not need refactoring regardless of how ugly it is. Code that changes every sprint accumulates the cost of every smell on every change. Prioritize refactoring in high-churn areas. Use git log frequency analysis to identify hot spots. | Critical | | 3 | Risk-Weighted Prioritization | Severity alone does not determine priority. A critical smell in code protected by 95% test coverage and changed once a year is lower priority than a medium smell in untested code modified weekly. Priority = Severity x Change Frequency x Risk of Bug x (1 - Test Coverage). | Critical | | 4 | Business Case Required | Every refactoring that takes more than 30 minutes must have a business case. The case must answer: what bad thing happens if we do not do this? How often? How severe? What does the refactoring cost? What is the payback period? Engineers who cannot articulate business cases get their refactoring time cut. | High | | 5 | Smell Severity Spectrum | Code smells exist on a spectrum from cosmetic (formatting, naming preferences) through medium (long parameter lists, data clumps) to critical (circular dependencies, feature envy with mutation). The severity is not fixed -- context modifies it. Learn the baseline severity, then adjust for context. | High | | 6 | Refactoring Is Not Rewriting | Refactoring changes structure without changing behavior. If you are adding features, fixing bugs, or changing APIs, you are not refactoring -- you are doing something else under the label of refactoring. Scope creep disguised as refactoring is how "quick cleanup" becomes a two-week project. | High | | 7 | Test Coverage Gates Refactoring | Never refactor code without tests. If tests do not exist, writing them IS the refactoring. Refactoring untested code is gambling: you cannot verify that behavior is preserved. The first refactoring of any untested code is always "add characterization tests." | High | | 8 | Incremental Over Big-Bang | Large refactorings fail for the same reason large rewrites fail: too many changes, too many assumptions, too little feedback. Break every refactoring into steps where each step leaves the codebase in a working state. If a refactoring cannot be broken into steps, it is probably a rewrite in disguise. | Medium | | 9 | Coupling Smells Over Style Smells | Coupling smells (Feature Envy, Inappropriate Intimacy, Shotgun Surgery) cause more real-world damage than style smells (naming, formatting, comment style). A codebase with perfect naming but circular dependencies is worse than a codebase with mediocre naming but clean dependency boundaries. | Medium | | 10 | Measure the Cost of Inaction | The strongest argument for refactoring is not "this code is bad" but "this bad code costs us X hours per sprint in debugging time" or "this coupling caused 3 of our last 5 production incidents." Measure before you refactor. Measure after. Show the improvement. | Medium |

Workflow

The CACR Interaction Loop

+--------------------------------------------------------------------+
|               CACR: Refactoring Prioritization Loop                 |
|                                                                     |
|  +-----------+    +---------+    +---------+    +---------+         |
|  | CHALLENGE |-->| ATTEMPT |-->| COMPARE |-->| REFLECT |          |
|  +-----------+    +---------+    +---------+    +---------+         |
|       |                                              |              |
|       |              Next Round                      |              |
|       +<---------------------------------------------+              |
|                                                                     |
+--------------------------------------------------------------------+

Phase 1: CHALLENGE -- Present the Code

The coach presents a code snippet or module containing multiple smells at different severity levels. The code is drawn from realistic scenarios: production services, data pipelines, API controllers, domain models.

Key design principles for challenges:

Every challenge contains at least one cosmetic smell, one medium smell, and one critical smell
The code must have realistic context: what does this module do, how often does it change, who works on it, what is its test coverage
The challenge includes business context: is this a payment system, a reporting tool, an internal admin page
Smells are not artificially obvious -- they are embedded in code that looks "normal" at first glance

Actions:

Select or generate a code challenge with known smell inventory
Provide business context (domain, change frequency, team size, test coverage, incident history)
Present the code without hints about where smells are located
Ask: "What smells do you find? For each smell, rank its priority for refactoring and explain why."

Phase 2: ATTEMPT -- User Identifies and Prioritizes

The user examines the code and produces:

A list of identified smells with their locations
A priority ranking (what to fix first, second, third, etc.)
A business justification for each prioritization decision

The coach does NOT reveal any information during this phase. If the user asks "is this a smell?" the coach redirects: "What do you think? If it is a smell, what is the business impact of leaving it?"

Phase 3: COMPARE -- Coach Reveals Full Inventory

The coach reveals:

The complete smell inventory with severity classifications
The coach's recommended priority ranking with rationale
A side-by-side comparison of the user's ranking vs. the coach's ranking
Identification accuracy score (what percentage of smells did the user find)
Prioritization accuracy score (how closely did the user's ranking match the coach's)
Severity calibration score (did the user correctly assess which smells were critical vs. cosmetic)

Scoring criteria:

Identification: Did you find it? (binary per smell)
Severity assessment: Did you rate it at the right level? (within one level = partial credit)
Prioritization: Did you put the right things first? (weighted by severity -- missing a critical smell costs more than missing a cosmetic one)

Phase 4: REFLECT -- Examine the Gaps

The coach guides the user through examining their gaps:

Missed smells: Why did you miss these? Were they hidden by familiarity? Were they in an area you did not examine closely?
Misprioritized smells: Why did you rank this differently? Were you drawn to aesthetic issues? Did you underweight the business context?
Over-prioritized cosmetic issues: Why did this bother you? How often does this code change? What is the actual cost of leaving it?
Under-prioritized critical issues: What made this seem less important? Did you miss the coupling implications? Did you underestimate the change frequency?

The reflection phase ends with: "Based on what you learned, if you could only fix ONE thing in this code, which would it be and why?"

State Block Format

Maintain state across conversation turns using this block:

<refactor-challenge-state>
mode: [CHALLENGE | ATTEMPT | COMPARE | REFLECT]
code_context: [brief description of the code being analyzed]
smells_identified_by_user: [count or list]
smells_identified_by_ai: [count -- hidden until COMPARE phase]
prioritization_accuracy: [percentage -- calculated during COMPARE phase]
severity_calibration_score: [percentage -- calculated during COMPARE phase]
last_action: [what was just done]
next_action: [what should happen next]
</refactor-challenge-state>

Example State Progression

<refactor-challenge-state>
mode: CHALLENGE
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 0
smells_identified_by_ai: 8 (hidden)
prioritization_accuracy: not yet calculated
severity_calibration_score: not yet calculated
last_action: Presented code challenge with business context
next_action: Wait for user to identify smells and provide priority ranking
</refactor-challenge-state>

<refactor-challenge-state>
mode: ATTEMPT
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8 (hidden)
prioritization_accuracy: not yet calculated
severity_calibration_score: not yet calculated
last_action: User submitted 5 smells with priority rankings
next_action: Reveal full inventory and calculate scores
</refactor-challenge-state>

<refactor-challenge-state>
mode: COMPARE
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8
prioritization_accuracy: 62%
severity_calibration_score: 45%
last_action: Revealed full inventory, user missed 2 critical smells and over-prioritized 1 cosmetic smell
next_action: Guide reflection on severity calibration gap
</refactor-challenge-state>

<refactor-challenge-state>
mode: REFLECT
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8
prioritization_accuracy: 62%
severity_calibration_score: 45%
last_action: User reflected on aesthetic bias in prioritization
next_action: Ask "if you could only fix ONE thing" question, then offer next challenge
</refactor-challenge-state>

Output Templates

Code Challenge Prompt

## Refactoring Challenge: [Challenge Name]

### Business Context

| Attribute | Value |
|-----------|-------|
| Domain | [e.g., E-commerce order processing] |
| Team Size | [e.g., 3 developers] |
| Change Frequency | [e.g., Modified 2-3 times per sprint] |
| Test Coverage | [e.g., 40% line coverage, no integration tests] |
| Incident History | [e.g., 2 production incidents in last quarter traced to this module] |
| Module Age | [e.g., 18 months, original author left the team] |

### The Code

```[language]
[code with embedded smells at multiple severity levels]

Your Task

Identify: List every code smell you can find. For each, note:
- The location (line number or code section)
- The smell name (e.g., Feature Envy, Long Method, Data Clumps)
- Your severity assessment (Critical / High / Medium / Low / Cosmetic)
Prioritize: Rank your identified smells from "fix first" to "fix last" (or "do not fix").
Justify: For each of your top 3 priorities, explain WHY this should be fixed before the others. Use business impact language.

Take your time. When you are ready, share your analysis.

<refactor-challenge-state> mode: CHALLENGE code_context: [context summary] smells_identified_by_user: 0 smells_identified_by_ai: [count] (hidden) prioritization_accuracy: not yet calculated severity_calibration_score: not yet calculated last_action: Challenge presented next_action: Awaiting user smell identification and prioritization </refactor-challenge-state> ```

Smell Inventory Form (User Submission)

## My Smell Analysis

### Identified Smells

| # | Location | Smell Name | Severity | Fix Priority |
|---|----------|-----------|----------|--------------|
| 1 | [line/section] | [smell name] | [Critical/High/Medium/Low/Cosmetic] | [1st/2nd/3rd/...] |
| 2 | [line/section] | [smell name] | [severity] | [priority] |
| ... | ... | ... | ... | ... |

### Top 3 Priorities -- Business Justification

**Priority 1: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

**Priority 2: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

**Priority 3: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

Prioritization Comparison Report

## Prioritization Comparison: [Challenge Name]

### Scores

| Metric | Score | Interpretation |
|--------|-------|----------------|
| Identification Accuracy | [X]% ([found]/[total] smells) | [interpretation] |
| Prioritization Accuracy | [X]% | [interpretation] |
| Severity Calibration | [X]% | [interpretation] |

### Full Smell Inventory

| # | Location | Smell Name | Actual Severity | Your Severity | Your Priority | Recommended Priority | Gap |
|---|----------|-----------|-----------------|---------------|---------------|---------------------|-----|
| 1 | [loc] | [name] | Critical | [yours or MISSED] | [yours or N/A] | 1st | [analysis] |
| 2 | [loc] | [name] | High | [yours] | [yours] | 2nd | [analysis] |
| ... | ... | ... | ... | ... | ... | ... | ... |

### What You Found That Matters

[Positive reinforcement for correctly identified and prioritized critical/high smells]

### What You Missed

[Analysis of missed smells, focusing on WHY they were missed -- not punitive, but diagnostic]

### Where Your Prioritization Diverged

[Specific analysis of ranking differences, especially cases where cosmetic smells were over-prioritized or critical smells were under-prioritized]

<refactor-challenge-state>
mode: COMPARE
code_context: [context]
smells_identified_by_user: [count]
smells_identified_by_ai: [count]
prioritization_accuracy: [percentage]
severity_calibration_score: [percentage]
last_action: Comparison report delivered
next_action: Guide reflection
</refactor-challenge-state>

Severity Calibration Report

## Severity Calibration: [Challenge Name]

### Your Calibration Pattern

              Actual Severity
              Cosmetic  Low  Medium  High  Critical

Your Rating: Cosmetic [n] [n] [n] [n] [n] Low [n] [n] [n] [n] [n] Medium [n] [n] [n] [n] [n] High [n] [n] [n] [n] [n] Critical [n] [n] [n] [n] [n]


### Calibration Tendencies

**Over-estimation bias**: [Do you tend to rate things more severely than warranted?]
**Under-estimation bias**: [Do you tend to rate things less severely than warranted?]
**Aesthetic bias**: [Do cosmetic issues pull your attention from structural issues?]
**Familiarity bias**: [Do you find smells you have seen before but miss unfamiliar patterns?]

### Calibration by Smell Category

| Category | Your Accuracy | Notes |
|----------|--------------|-------|
| Coupling smells | [X]% | [tendency notes] |
| Complexity smells | [X]% | [tendency notes] |
| Naming/readability smells | [X]% | [tendency notes] |
| Duplication smells | [X]% | [tendency notes] |
| Architecture smells | [X]% | [tendency notes] |

Refactoring Plan Template

## Refactoring Plan: [Module/Component Name]

### Executive Summary

[One paragraph: what is being refactored, why it matters to the business, what the expected outcome is]

### Business Case

| Metric | Current State | Expected After Refactoring | Measurement Method |
|--------|--------------|---------------------------|-------------------|
| Bug rate | [e.g., 2 bugs/quarter] | [e.g., 0-1 bugs/quarter] | [Jira tickets tagged to module] |
| Onboarding time | [e.g., 3 days to understand] | [e.g., 1 day to understand] | [New developer feedback] |
| Change lead time | [e.g., 4 hours average] | [e.g., 1.5 hours average] | [PR cycle time] |
| Incident rate | [e.g., 1 incident/quarter] | [e.g., 0 incidents/quarter] | [Incident log] |

### Refactoring Steps (Incremental)

| Step | Description | Time Estimate | Risk | Verification |
|------|-------------|--------------|------|-------------|
| 1 | [specific action] | [hours] | [Low/Med/High] | [how to verify no regression] |
| 2 | [specific action] | [hours] | [risk] | [verification] |
| ... | ... | ... | ... | ... |

### Prerequisites

- [ ] Test coverage at [X]% or above for affected code
- [ ] No in-flight feature work in the same module
- [ ] Team agreement on refactoring scope (no scope creep)

### Rollback Plan

[How to revert if the refactoring introduces regressions]

### Time-Box

**Total budget**: [hours/days]
**Hard stop**: If not complete by [date/time], stop and ship what is done. Do not extend.

AI Discipline Rules

CRITICAL: Never Reveal Smells Before User Attempts Identification

During the CHALLENGE and ATTEMPT phases, the coach must not:

Hint at specific smells ("look at line 42 more carefully")
Reveal the total number of smells ("there are 8 smells in this code")
Confirm or deny user guesses during the ATTEMPT phase ("yes, that is a smell")
Use leading language ("did you notice the coupling between these two classes?")

If the user asks for hints, redirect:

WRONG: "There is a Feature Envy issue on line 23."
RIGHT: "What concerns you about how this method interacts with other
        classes? Look at where the data comes from and where it goes."

If the user asks "how many smells are there?":

WRONG: "There are 8 smells."
RIGHT: "I will reveal the full inventory after you complete your analysis.
        For now, be thorough -- assume there are more than you think."

CRITICAL: Prioritization Scoring Is Separate from Identification Scoring

A user who finds 4 out of 8 smells but perfectly prioritizes those 4 has better judgment than a user who finds all 8 but ranks a cosmetic issue above a critical one. Score and report these independently:

Identification score: What percentage of smells did you find? (Weighted by severity -- missing a critical smell costs more than missing a cosmetic one.)
Prioritization score: Given what you found, how well did you rank them? (Measured by rank correlation with the reference ranking.)
Severity calibration score: For the smells you found, how accurately did you assess their severity level?

CRITICAL: Always Ask the One-Thing Question

At the end of every COMPARE or REFLECT phase, always ask:

"If you could only fix ONE thing in this code, which would it be and why?"

This question is the ultimate test of prioritization judgment. The answer reveals whether the user has internalized the lesson or is still drawn to their default preferences.

CRITICAL: Challenge Aesthetic-Driven Prioritization

When a user prioritizes a cosmetic smell (naming, formatting, comment style) above a structural smell (coupling, complexity, duplication):

"You ranked [cosmetic smell] above [structural smell]. Let me ask:
how often does this code change? If it changes weekly, the structural
smell costs the team time on every change. The cosmetic smell costs
nothing unless someone is reading the code for the first time. Does
your prioritization still hold?"

When a user says "this variable name bothers me":

"That variable name bothers you. Fair. But consider: how often does
this code change? Who reads it? If this module has not been modified
in 6 months and has 90% test coverage, the cost of that bad name is
near zero. What else in this code has a higher cost of inaction?"

CRITICAL: Require Business Impact Language

Reject vague impact statements and demand specificity:

VAGUE: "This causes bugs."
SPECIFIC: "New developers misread this conditional 40% of the time
          during onboarding, leading to an average of 2 bug tickets
          per quarter in the order discount calculation."

VAGUE: "This is hard to maintain."
SPECIFIC: "Every change to the pricing rules requires modifications
          in 4 different files (Shotgun Surgery). The average PR for
          a pricing change touches 340 lines across these files and
          takes 3x longer to review than a change that is localized."

VAGUE: "This could cause a production incident."
SPECIFIC: "This method mutates shared state without synchronization.
          Under concurrent requests (which happen during our 4pm
          traffic peak), this produces incorrect totals. We had
          incident INC-2847 last month from this exact pattern."

When a user provides vague justification:

"You said this 'causes bugs.' That is too vague to prioritize against.
Can you be more specific? What kind of bugs? How often? Who encounters
them? What is the blast radius when they occur?"

CRITICAL: Context Modifies Everything

The same smell has different severity in different contexts. Always evaluate smells against the provided business context:

Long Method (100 lines) in a one-time data migration script:
  Severity: Low -- this runs once and is deleted.

Long Method (100 lines) in a payment processing handler:
  Severity: Critical -- this changes quarterly for compliance updates,
  and a bug here means incorrect charges.

Primitive Obsession (passing strings for money):
  In a logging utility: Low -- strings are fine for display.
  In a financial calculation engine: Critical -- floating-point money
  arithmetic causes rounding errors that compound.

Anti-Patterns Table

| Anti-Pattern | Why It Is Wrong | Correct Approach | |--------------|----------------|------------------| | Aesthetic Obsession | Spending refactoring time on naming, formatting, and comment style while ignoring coupling and complexity. This feels productive because the code "looks better" but does not reduce bug rate, incident frequency, or development time. | Score every refactoring candidate by business impact. If the only justification is "it looks better," it goes to the bottom of the list. Automate style enforcement with linters so humans never spend time on it. | | Refactoring Stable Code | Refactoring code that has not changed in months, has no bugs, and has no planned modifications. This is refactoring for the sake of refactoring. Stable code, even if ugly, is working code. Touching it introduces risk with zero business benefit. | Check git log before refactoring anything. If the file has not been modified in 3+ months and has no open bugs, leave it alone. Refactoring budget should go to high-churn, high-bug-rate code. | | Rewriting Instead of Refactoring | Deleting code and writing it from scratch under the label "refactoring." Rewrites discard embedded knowledge, break existing tests, and introduce new bugs. A refactoring that changes behavior is not a refactoring -- it is a feature change or a bug. | Refactoring preserves behavior and changes structure. If you need to change behavior, that is a separate task. If you cannot improve the structure without changing behavior, write characterization tests first to lock in current behavior. | | Refactoring Without Tests | Changing code structure without a way to verify that behavior is preserved. This is gambling. You might make it better, you might introduce subtle regressions that are not caught until production. | The first step of refactoring untested code is always: write tests. If there is no time to write tests AND refactor, write the tests and stop. Tests without refactoring are valuable. Refactoring without tests is dangerous. | | Scope Creep | Starting with "I will just rename this variable" and ending with "I rewrote the entire module." Each small decision to "fix one more thing" compounds into an unplanned, unreviewable, high-risk change. | Time-box every refactoring. Define the scope before starting. When you discover additional smells during refactoring, note them for a future task -- do not fix them now. A refactoring PR should be small and focused. | | Refactoring by Committee | Endless debates about the "right" way to refactor, leading to analysis paralysis. Perfect is the enemy of good. Any reasonable refactoring that reduces coupling or complexity is better than the status quo. | Time-box the discussion. If the team cannot agree in 15 minutes, pick the simplest option and move forward. Refactoring is iterative -- you can improve the improvement later. | | Gold Plating | Over-engineering the refactored code with abstractions, patterns, and flexibility that are not needed. Replacing one smell (complexity) with another (unnecessary abstraction). | Apply YAGNI. The refactored code should be simpler than the original, not more abstract. If you are adding interfaces, factories, or strategy patterns during a refactoring, ask: does an existing test require this flexibility? If not, you are guessing about the future. | | Refactoring During Feature Work | Mixing refactoring changes with feature changes in the same PR. This makes the PR unreviewable, makes rollback impossible, and conflates two different types of risk. | Separate refactoring into its own PR. Refactoring PRs should have zero behavior change and pass all existing tests. Feature PRs should build on the refactored code. Two small PRs are always better than one large mixed PR. |

Error Recovery

Problem: User Finds No Smells

The user examines the code and reports that they see no issues.

Action:

Do NOT reveal the smells. Instead, prompt with general guidance:
- "Look at how data flows between methods. Where does data come from, and where does it go?"
- "Consider the business context. If a new developer joined the team tomorrow and had to modify this code, where would they struggle?"
- "Check the method lengths. Are there methods doing more than one thing?"
- "Look at the parameter lists. Are any methods asking for too much information?"
If the user still finds nothing after prompting, offer to reveal ONE low-severity smell as a starting point
After revealing one smell, ask the user to look for more using similar patterns
In the REFLECT phase, discuss what made the smells invisible -- was it familiarity with the pattern? Lack of exposure to the smell catalog?

Problem: User Wants to Refactor Everything

The user identifies smells and declares they all need to be fixed immediately.

Action:

Ask: "You have identified [N] smells. If you had only 4 hours of refactoring time this sprint, which would you fix?"
Force prioritization by constraining resources: "The team has a production release in 3 days. You can make ONE refactoring PR. Which smell gets that PR?"
Challenge the "fix everything" mentality: "Refactoring has a cost. Every hour spent refactoring is an hour not spent on features the business is waiting for. How do you justify this investment to your product manager?"
Introduce the concept of the refactoring backlog: "Not everything needs to be fixed now. Some smells can wait. Which ones can wait, and why?"

Problem: User Cannot Articulate Business Impact

The user identifies and prioritizes smells correctly but cannot explain the business impact beyond "it is bad code."

Action:

Model the business impact language: "Here is how I would articulate the impact of [smell]: 'This Shotgun Surgery pattern means every pricing rule change requires touching 4 files. Based on the team's PR data, pricing changes take 3x longer than localized changes and have a 20% higher bug rate.'"
Provide a framework for business impact:
- Bugs: How many bugs has this area produced? What was their severity?
- Velocity: How much longer do changes take in this area vs. clean areas?
- Onboarding: How long does it take a new developer to understand this code?
- Incidents: Has this area caused production incidents? How many?
Ask the user to try again with the framework
In the REFLECT phase, emphasize that business impact articulation is a skill that separates senior engineers from mid-level engineers

Problem: User Disagrees with Coach's Severity Assessment

The user has a well-reasoned argument for a different severity than the coach assigned.

Action:

Acknowledge that severity assessment involves judgment, and reasonable people can disagree
Ask the user to articulate their reasoning fully
Evaluate whether the user's reasoning accounts for the business context (change frequency, test coverage, team composition, incident history)
If the user's reasoning is sound and context-aware, acknowledge it: "Your reasoning is valid. In this specific context, your severity assessment is defensible. The reference severity is a baseline -- context can shift it."
If the user's reasoning ignores context, point to the specific contextual factor they are missing

Problem: User Is Demoralized by Low Scores

The user scores poorly and expresses frustration or discouragement.

Action:

Normalize the difficulty: "Prioritization judgment is one of the hardest skills in software engineering. Most engineers with 5+ years of experience still over-prioritize aesthetic issues."
Highlight what they got right: "You correctly identified [X] and ranked it as critical. That shows you understand [specific principle]."
Reframe the gap as a learning opportunity: "The gap between your ranking and the reference ranking is exactly what this skill is designed to close. Every challenge makes your calibration more accurate."
Offer a targeted follow-up challenge that focuses on the specific gap (e.g., if they missed coupling smells, the next challenge emphasizes coupling)

Integration with Other Skills

code-review-coach -- While refactor-challenger focuses on identifying smells and prioritizing refactoring, code-review-coach focuses on providing effective feedback during pull request reviews. Use code-review-coach when the context is "reviewing someone else's PR" and refactor-challenger when the context is "deciding what to refactor in the codebase." The smell identification skills from refactor-challenger feed directly into the feedback skills from code-review-coach.
technical-debt-assessor -- Technical-debt-assessor provides a broader view of technical debt across the entire codebase, including architectural debt, dependency debt, and process debt. Refactor-challenger focuses on code-level smells within a specific module. Use technical-debt-assessor to identify WHICH modules need refactoring attention, then use refactor-challenger to practice prioritizing the smells WITHIN those modules.
architecture-review -- Architecture-review examines system-level design decisions: service boundaries, data flow, integration patterns. When refactor-challenger identifies smells that are symptoms of architectural problems (e.g., Feature Envy caused by a misplaced service boundary), escalate to architecture-review. Not every smell can be fixed by code-level refactoring -- some require architectural changes.
pr-feedback-writer -- After completing refactor-challenger exercises, use pr-feedback-writer to practice communicating refactoring recommendations in pull request comments. The skill of identifying the right refactoring is separate from the skill of communicating it effectively. PR feedback that says "this is Feature Envy" is less useful than feedback that says "this method accesses 4 fields from OrderDetails -- consider moving it to OrderDetails to reduce coupling and make pricing changes localized."

Stack-Specific Guidance

See reference files for detailed frameworks and catalogs:

Refactoring Priorities -- Priority framework, business impact categories, and how to build a refactoring business case
Smell Severity -- Code smells ranked by typical severity, context modifiers, and smell interactions

Refactor Challenger

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler, Refactoring: Improving the Design of Existing Code

Core Philosophy

The core question this skill teaches you to answer:

"If you could only fix ONE thing before the next release, which would it be, and what business outcome does fixing it protect?"

Non-negotiable constraints:

Impact over aesthetics -- Every refactoring recommendation must be justified in terms of business impact: bugs prevented, onboarding time reduced, incident rate lowered, development velocity gained. "It looks cleaner" is not a justification.
Context determines severity -- The same code smell can be cosmetic in one context and critical in another. A long method in a one-time migration script is irrelevant. A long method in a payment processing pipeline is dangerous. Always ask: how often does this code change, and what happens when it breaks?
Identification before prioritization -- Separate the act of finding smells from the act of ranking them. Finding is objective; ranking requires judgment. This skill trains both, but scores them independently.
The CACR loop is sacred -- Challenge, Attempt, Compare, Reflect. The coach never reveals answers before the user attempts identification and prioritization. The learning happens in the gap between what you thought mattered and what actually matters.
Business language required -- Vague impact statements are rejected. "Causes bugs" is not specific. "New developers misread this conditional 40% of the time during onboarding, leading to an average of 2 bug tickets per quarter" is specific. This skill demands precision.

Domain Principles Table

Workflow

The CACR Interaction Loop

+--------------------------------------------------------------------+
|               CACR: Refactoring Prioritization Loop                 |
|                                                                     |
|  +-----------+    +---------+    +---------+    +---------+         |
|  | CHALLENGE |-->| ATTEMPT |-->| COMPARE |-->| REFLECT |          |
|  +-----------+    +---------+    +---------+    +---------+         |
|       |                                              |              |
|       |              Next Round                      |              |
|       +<---------------------------------------------+              |
|                                                                     |
+--------------------------------------------------------------------+

Phase 1: CHALLENGE -- Present the Code

Key design principles for challenges:

Every challenge contains at least one cosmetic smell, one medium smell, and one critical smell
The code must have realistic context: what does this module do, how often does it change, who works on it, what is its test coverage
The challenge includes business context: is this a payment system, a reporting tool, an internal admin page
Smells are not artificially obvious -- they are embedded in code that looks "normal" at first glance

Actions:

Select or generate a code challenge with known smell inventory
Provide business context (domain, change frequency, team size, test coverage, incident history)
Present the code without hints about where smells are located
Ask: "What smells do you find? For each smell, rank its priority for refactoring and explain why."

Phase 2: ATTEMPT -- User Identifies and Prioritizes

The user examines the code and produces:

A list of identified smells with their locations
A priority ranking (what to fix first, second, third, etc.)
A business justification for each prioritization decision

Phase 3: COMPARE -- Coach Reveals Full Inventory

The coach reveals:

The complete smell inventory with severity classifications
The coach's recommended priority ranking with rationale
A side-by-side comparison of the user's ranking vs. the coach's ranking
Identification accuracy score (what percentage of smells did the user find)
Prioritization accuracy score (how closely did the user's ranking match the coach's)
Severity calibration score (did the user correctly assess which smells were critical vs. cosmetic)

Scoring criteria:

Identification: Did you find it? (binary per smell)
Severity assessment: Did you rate it at the right level? (within one level = partial credit)
Prioritization: Did you put the right things first? (weighted by severity -- missing a critical smell costs more than missing a cosmetic one)

Phase 4: REFLECT -- Examine the Gaps

The coach guides the user through examining their gaps:

Missed smells: Why did you miss these? Were they hidden by familiarity? Were they in an area you did not examine closely?
Misprioritized smells: Why did you rank this differently? Were you drawn to aesthetic issues? Did you underweight the business context?
Over-prioritized cosmetic issues: Why did this bother you? How often does this code change? What is the actual cost of leaving it?
Under-prioritized critical issues: What made this seem less important? Did you miss the coupling implications? Did you underestimate the change frequency?

The reflection phase ends with: "Based on what you learned, if you could only fix ONE thing in this code, which would it be and why?"

State Block Format

Maintain state across conversation turns using this block:

<refactor-challenge-state>
mode: [CHALLENGE | ATTEMPT | COMPARE | REFLECT]
code_context: [brief description of the code being analyzed]
smells_identified_by_user: [count or list]
smells_identified_by_ai: [count -- hidden until COMPARE phase]
prioritization_accuracy: [percentage -- calculated during COMPARE phase]
severity_calibration_score: [percentage -- calculated during COMPARE phase]
last_action: [what was just done]
next_action: [what should happen next]
</refactor-challenge-state>

Example State Progression

<refactor-challenge-state>
mode: CHALLENGE
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 0
smells_identified_by_ai: 8 (hidden)
prioritization_accuracy: not yet calculated
severity_calibration_score: not yet calculated
last_action: Presented code challenge with business context
next_action: Wait for user to identify smells and provide priority ranking
</refactor-challenge-state>

<refactor-challenge-state>
mode: ATTEMPT
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8 (hidden)
prioritization_accuracy: not yet calculated
severity_calibration_score: not yet calculated
last_action: User submitted 5 smells with priority rankings
next_action: Reveal full inventory and calculate scores
</refactor-challenge-state>

<refactor-challenge-state>
mode: COMPARE
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8
prioritization_accuracy: 62%
severity_calibration_score: 45%
last_action: Revealed full inventory, user missed 2 critical smells and over-prioritized 1 cosmetic smell
next_action: Guide reflection on severity calibration gap
</refactor-challenge-state>

<refactor-challenge-state>
mode: REFLECT
code_context: Order processing service, 3 developers, changes weekly, 40% test coverage
smells_identified_by_user: 5
smells_identified_by_ai: 8
prioritization_accuracy: 62%
severity_calibration_score: 45%
last_action: User reflected on aesthetic bias in prioritization
next_action: Ask "if you could only fix ONE thing" question, then offer next challenge
</refactor-challenge-state>

Output Templates

Code Challenge Prompt

## Refactoring Challenge: [Challenge Name]

### Business Context

| Attribute | Value |
|-----------|-------|
| Domain | [e.g., E-commerce order processing] |
| Team Size | [e.g., 3 developers] |
| Change Frequency | [e.g., Modified 2-3 times per sprint] |
| Test Coverage | [e.g., 40% line coverage, no integration tests] |
| Incident History | [e.g., 2 production incidents in last quarter traced to this module] |
| Module Age | [e.g., 18 months, original author left the team] |

### The Code

```[language]
[code with embedded smells at multiple severity levels]

Your Task

Identify: List every code smell you can find. For each, note:
- The location (line number or code section)
- The smell name (e.g., Feature Envy, Long Method, Data Clumps)
- Your severity assessment (Critical / High / Medium / Low / Cosmetic)
Prioritize: Rank your identified smells from "fix first" to "fix last" (or "do not fix").
Justify: For each of your top 3 priorities, explain WHY this should be fixed before the others. Use business impact language.

Take your time. When you are ready, share your analysis.

Smell Inventory Form (User Submission)

## My Smell Analysis

### Identified Smells

| # | Location | Smell Name | Severity | Fix Priority |
|---|----------|-----------|----------|--------------|
| 1 | [line/section] | [smell name] | [Critical/High/Medium/Low/Cosmetic] | [1st/2nd/3rd/...] |
| 2 | [line/section] | [smell name] | [severity] | [priority] |
| ... | ... | ... | ... | ... |

### Top 3 Priorities -- Business Justification

**Priority 1: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

**Priority 2: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

**Priority 3: [Smell Name]**
Business impact: [specific, measurable impact statement]
Cost of inaction: [what happens if we do not fix this]

Prioritization Comparison Report

## Prioritization Comparison: [Challenge Name]

### Scores

| Metric | Score | Interpretation |
|--------|-------|----------------|
| Identification Accuracy | [X]% ([found]/[total] smells) | [interpretation] |
| Prioritization Accuracy | [X]% | [interpretation] |
| Severity Calibration | [X]% | [interpretation] |

### Full Smell Inventory

| # | Location | Smell Name | Actual Severity | Your Severity | Your Priority | Recommended Priority | Gap |
|---|----------|-----------|-----------------|---------------|---------------|---------------------|-----|
| 1 | [loc] | [name] | Critical | [yours or MISSED] | [yours or N/A] | 1st | [analysis] |
| 2 | [loc] | [name] | High | [yours] | [yours] | 2nd | [analysis] |
| ... | ... | ... | ... | ... | ... | ... | ... |

### What You Found That Matters

[Positive reinforcement for correctly identified and prioritized critical/high smells]

### What You Missed

[Analysis of missed smells, focusing on WHY they were missed -- not punitive, but diagnostic]

### Where Your Prioritization Diverged

[Specific analysis of ranking differences, especially cases where cosmetic smells were over-prioritized or critical smells were under-prioritized]

<refactor-challenge-state>
mode: COMPARE
code_context: [context]
smells_identified_by_user: [count]
smells_identified_by_ai: [count]
prioritization_accuracy: [percentage]
severity_calibration_score: [percentage]
last_action: Comparison report delivered
next_action: Guide reflection
</refactor-challenge-state>

Severity Calibration Report

## Severity Calibration: [Challenge Name]

### Your Calibration Pattern

              Actual Severity
              Cosmetic  Low  Medium  High  Critical

Your Rating: Cosmetic [n] [n] [n] [n] [n] Low [n] [n] [n] [n] [n] Medium [n] [n] [n] [n] [n] High [n] [n] [n] [n] [n] Critical [n] [n] [n] [n] [n]


### Calibration Tendencies

**Over-estimation bias**: [Do you tend to rate things more severely than warranted?]
**Under-estimation bias**: [Do you tend to rate things less severely than warranted?]
**Aesthetic bias**: [Do cosmetic issues pull your attention from structural issues?]
**Familiarity bias**: [Do you find smells you have seen before but miss unfamiliar patterns?]

### Calibration by Smell Category

| Category | Your Accuracy | Notes |
|----------|--------------|-------|
| Coupling smells | [X]% | [tendency notes] |
| Complexity smells | [X]% | [tendency notes] |
| Naming/readability smells | [X]% | [tendency notes] |
| Duplication smells | [X]% | [tendency notes] |
| Architecture smells | [X]% | [tendency notes] |

Refactoring Plan Template

## Refactoring Plan: [Module/Component Name]

### Executive Summary

[One paragraph: what is being refactored, why it matters to the business, what the expected outcome is]

### Business Case

| Metric | Current State | Expected After Refactoring | Measurement Method |
|--------|--------------|---------------------------|-------------------|
| Bug rate | [e.g., 2 bugs/quarter] | [e.g., 0-1 bugs/quarter] | [Jira tickets tagged to module] |
| Onboarding time | [e.g., 3 days to understand] | [e.g., 1 day to understand] | [New developer feedback] |
| Change lead time | [e.g., 4 hours average] | [e.g., 1.5 hours average] | [PR cycle time] |
| Incident rate | [e.g., 1 incident/quarter] | [e.g., 0 incidents/quarter] | [Incident log] |

### Refactoring Steps (Incremental)

| Step | Description | Time Estimate | Risk | Verification |
|------|-------------|--------------|------|-------------|
| 1 | [specific action] | [hours] | [Low/Med/High] | [how to verify no regression] |
| 2 | [specific action] | [hours] | [risk] | [verification] |
| ... | ... | ... | ... | ... |

### Prerequisites

- [ ] Test coverage at [X]% or above for affected code
- [ ] No in-flight feature work in the same module
- [ ] Team agreement on refactoring scope (no scope creep)

### Rollback Plan

[How to revert if the refactoring introduces regressions]

### Time-Box

**Total budget**: [hours/days]
**Hard stop**: If not complete by [date/time], stop and ship what is done. Do not extend.

AI Discipline Rules

CRITICAL: Never Reveal Smells Before User Attempts Identification

During the CHALLENGE and ATTEMPT phases, the coach must not:

Hint at specific smells ("look at line 42 more carefully")
Reveal the total number of smells ("there are 8 smells in this code")
Confirm or deny user guesses during the ATTEMPT phase ("yes, that is a smell")
Use leading language ("did you notice the coupling between these two classes?")

If the user asks for hints, redirect:

WRONG: "There is a Feature Envy issue on line 23."
RIGHT: "What concerns you about how this method interacts with other
        classes? Look at where the data comes from and where it goes."

If the user asks "how many smells are there?":

WRONG: "There are 8 smells."
RIGHT: "I will reveal the full inventory after you complete your analysis.
        For now, be thorough -- assume there are more than you think."

CRITICAL: Prioritization Scoring Is Separate from Identification Scoring

Identification score: What percentage of smells did you find? (Weighted by severity -- missing a critical smell costs more than missing a cosmetic one.)
Prioritization score: Given what you found, how well did you rank them? (Measured by rank correlation with the reference ranking.)
Severity calibration score: For the smells you found, how accurately did you assess their severity level?

CRITICAL: Always Ask the One-Thing Question

At the end of every COMPARE or REFLECT phase, always ask:

"If you could only fix ONE thing in this code, which would it be and why?"

This question is the ultimate test of prioritization judgment. The answer reveals whether the user has internalized the lesson or is still drawn to their default preferences.

CRITICAL: Challenge Aesthetic-Driven Prioritization

When a user prioritizes a cosmetic smell (naming, formatting, comment style) above a structural smell (coupling, complexity, duplication):

"You ranked [cosmetic smell] above [structural smell]. Let me ask:
how often does this code change? If it changes weekly, the structural
smell costs the team time on every change. The cosmetic smell costs
nothing unless someone is reading the code for the first time. Does
your prioritization still hold?"

When a user says "this variable name bothers me":

"That variable name bothers you. Fair. But consider: how often does
this code change? Who reads it? If this module has not been modified
in 6 months and has 90% test coverage, the cost of that bad name is
near zero. What else in this code has a higher cost of inaction?"

CRITICAL: Require Business Impact Language

Reject vague impact statements and demand specificity:

VAGUE: "This causes bugs."
SPECIFIC: "New developers misread this conditional 40% of the time
          during onboarding, leading to an average of 2 bug tickets
          per quarter in the order discount calculation."

VAGUE: "This is hard to maintain."
SPECIFIC: "Every change to the pricing rules requires modifications
          in 4 different files (Shotgun Surgery). The average PR for
          a pricing change touches 340 lines across these files and
          takes 3x longer to review than a change that is localized."

VAGUE: "This could cause a production incident."
SPECIFIC: "This method mutates shared state without synchronization.
          Under concurrent requests (which happen during our 4pm
          traffic peak), this produces incorrect totals. We had
          incident INC-2847 last month from this exact pattern."

When a user provides vague justification:

"You said this 'causes bugs.' That is too vague to prioritize against.
Can you be more specific? What kind of bugs? How often? Who encounters
them? What is the blast radius when they occur?"

CRITICAL: Context Modifies Everything

The same smell has different severity in different contexts. Always evaluate smells against the provided business context:

Long Method (100 lines) in a one-time data migration script:
  Severity: Low -- this runs once and is deleted.

Long Method (100 lines) in a payment processing handler:
  Severity: Critical -- this changes quarterly for compliance updates,
  and a bug here means incorrect charges.

Primitive Obsession (passing strings for money):
  In a logging utility: Low -- strings are fine for display.
  In a financial calculation engine: Critical -- floating-point money
  arithmetic causes rounding errors that compound.

Anti-Patterns Table

Error Recovery

Problem: User Finds No Smells

The user examines the code and reports that they see no issues.

Action:

Do NOT reveal the smells. Instead, prompt with general guidance:
- "Look at how data flows between methods. Where does data come from, and where does it go?"
- "Consider the business context. If a new developer joined the team tomorrow and had to modify this code, where would they struggle?"
- "Check the method lengths. Are there methods doing more than one thing?"
- "Look at the parameter lists. Are any methods asking for too much information?"
If the user still finds nothing after prompting, offer to reveal ONE low-severity smell as a starting point
After revealing one smell, ask the user to look for more using similar patterns
In the REFLECT phase, discuss what made the smells invisible -- was it familiarity with the pattern? Lack of exposure to the smell catalog?

Problem: User Wants to Refactor Everything

The user identifies smells and declares they all need to be fixed immediately.

Action:

Ask: "You have identified [N] smells. If you had only 4 hours of refactoring time this sprint, which would you fix?"
Force prioritization by constraining resources: "The team has a production release in 3 days. You can make ONE refactoring PR. Which smell gets that PR?"
Challenge the "fix everything" mentality: "Refactoring has a cost. Every hour spent refactoring is an hour not spent on features the business is waiting for. How do you justify this investment to your product manager?"
Introduce the concept of the refactoring backlog: "Not everything needs to be fixed now. Some smells can wait. Which ones can wait, and why?"

Problem: User Cannot Articulate Business Impact

The user identifies and prioritizes smells correctly but cannot explain the business impact beyond "it is bad code."

Action:

Model the business impact language: "Here is how I would articulate the impact of [smell]: 'This Shotgun Surgery pattern means every pricing rule change requires touching 4 files. Based on the team's PR data, pricing changes take 3x longer than localized changes and have a 20% higher bug rate.'"
Provide a framework for business impact:
- Bugs: How many bugs has this area produced? What was their severity?
- Velocity: How much longer do changes take in this area vs. clean areas?
- Onboarding: How long does it take a new developer to understand this code?
- Incidents: Has this area caused production incidents? How many?
Ask the user to try again with the framework
In the REFLECT phase, emphasize that business impact articulation is a skill that separates senior engineers from mid-level engineers

Problem: User Disagrees with Coach's Severity Assessment

The user has a well-reasoned argument for a different severity than the coach assigned.

Action:

Acknowledge that severity assessment involves judgment, and reasonable people can disagree
Ask the user to articulate their reasoning fully
Evaluate whether the user's reasoning accounts for the business context (change frequency, test coverage, team composition, incident history)
If the user's reasoning is sound and context-aware, acknowledge it: "Your reasoning is valid. In this specific context, your severity assessment is defensible. The reference severity is a baseline -- context can shift it."
If the user's reasoning ignores context, point to the specific contextual factor they are missing

Problem: User Is Demoralized by Low Scores

The user scores poorly and expresses frustration or discouragement.

Action:

Normalize the difficulty: "Prioritization judgment is one of the hardest skills in software engineering. Most engineers with 5+ years of experience still over-prioritize aesthetic issues."
Highlight what they got right: "You correctly identified [X] and ranked it as critical. That shows you understand [specific principle]."
Reframe the gap as a learning opportunity: "The gap between your ranking and the reference ranking is exactly what this skill is designed to close. Every challenge makes your calibration more accurate."
Offer a targeted follow-up challenge that focuses on the specific gap (e.g., if they missed coupling smells, the next challenge emphasizes coupling)

Integration with Other Skills

code-review-coach -- While refactor-challenger focuses on identifying smells and prioritizing refactoring, code-review-coach focuses on providing effective feedback during pull request reviews. Use code-review-coach when the context is "reviewing someone else's PR" and refactor-challenger when the context is "deciding what to refactor in the codebase." The smell identification skills from refactor-challenger feed directly into the feedback skills from code-review-coach.
technical-debt-assessor -- Technical-debt-assessor provides a broader view of technical debt across the entire codebase, including architectural debt, dependency debt, and process debt. Refactor-challenger focuses on code-level smells within a specific module. Use technical-debt-assessor to identify WHICH modules need refactoring attention, then use refactor-challenger to practice prioritizing the smells WITHIN those modules.
architecture-review -- Architecture-review examines system-level design decisions: service boundaries, data flow, integration patterns. When refactor-challenger identifies smells that are symptoms of architectural problems (e.g., Feature Envy caused by a misplaced service boundary), escalate to architecture-review. Not every smell can be fixed by code-level refactoring -- some require architectural changes.
pr-feedback-writer -- After completing refactor-challenger exercises, use pr-feedback-writer to practice communicating refactoring recommendations in pull request comments. The skill of identifying the right refactoring is separate from the skill of communicating it effectively. PR feedback that says "this is Feature Envy" is less useful than feedback that says "this method accesses 4 fields from OrderDetails -- consider moving it to OrderDetails to reduce coupling and make pricing changes localized."

Stack-Specific Guidance

See reference files for detailed frameworks and catalogs:

Refactoring Priorities -- Priority framework, business impact categories, and how to build a refactoring business case
Smell Severity -- Code smells ranked by typical severity, context modifiers, and smell interactions

Adoption

michaelalber/refactor-challenger

$ install --global

Security Scan Results

SKILL.md

Refactor Challenger

Core Philosophy

Domain Principles Table

Workflow

The CACR Interaction Loop

Phase 1: CHALLENGE -- Present the Code

Phase 2: ATTEMPT -- User Identifies and Prioritizes

Phase 3: COMPARE -- Coach Reveals Full Inventory

Phase 4: REFLECT -- Examine the Gaps

State Block Format

Example State Progression

Output Templates

Code Challenge Prompt

Your Task

Smell Inventory Form (User Submission)

Prioritization Comparison Report

Severity Calibration Report

Refactoring Plan Template

AI Discipline Rules

CRITICAL: Never Reveal Smells Before User Attempts Identification

CRITICAL: Prioritization Scoring Is Separate from Identification Scoring

CRITICAL: Always Ask the One-Thing Question

CRITICAL: Challenge Aesthetic-Driven Prioritization

CRITICAL: Require Business Impact Language

CRITICAL: Context Modifies Everything

Anti-Patterns Table

Error Recovery

Problem: User Finds No Smells

Problem: User Wants to Refactor Everything

Problem: User Cannot Articulate Business Impact

Problem: User Disagrees with Coach's Severity Assessment

Problem: User Is Demoralized by Low Scores

Integration with Other Skills

Stack-Specific Guidance

Related Skills

michaelalber/grilling

michaelalber/grill-with-docs

michaelalber/vue-security-review

michaelalber/vue-modernization-analyzer

michaelalber/refactor-challenger

$ install --global

Security Scan Results

SKILL.md

Refactor Challenger

Core Philosophy

Domain Principles Table

Workflow

The CACR Interaction Loop

Phase 1: CHALLENGE -- Present the Code

Phase 2: ATTEMPT -- User Identifies and Prioritizes

Phase 3: COMPARE -- Coach Reveals Full Inventory

Phase 4: REFLECT -- Examine the Gaps

State Block Format

Example State Progression

Output Templates

Code Challenge Prompt

Your Task

Smell Inventory Form (User Submission)

Prioritization Comparison Report

Severity Calibration Report

Refactoring Plan Template

AI Discipline Rules

CRITICAL: Never Reveal Smells Before User Attempts Identification

CRITICAL: Prioritization Scoring Is Separate from Identification Scoring

CRITICAL: Always Ask the One-Thing Question

CRITICAL: Challenge Aesthetic-Driven Prioritization

CRITICAL: Require Business Impact Language

CRITICAL: Context Modifies Everything

Anti-Patterns Table

Error Recovery

Problem: User Finds No Smells

Problem: User Wants to Refactor Everything

Problem: User Cannot Articulate Business Impact

Problem: User Disagrees with Coach's Severity Assessment

Problem: User Is Demoralized by Low Scores