/analyze-project — Root Cause Analyst Workflow

Analyze AI-assisted coding sessions in brain/ and produce a diagnostic report that explains not just what happened, but why it happened, who/what caused it, and what should change next time.

This workflow is not a simple metrics dashboard. It is a forensic analysis workflow for AI coding sessions.

Primary Objective

For each session, determine:

What changed from the initial ask to the final executed work
Whether the change was caused primarily by:
- the user/spec
- the agent
- the codebase/repo
- testing/verification
- legitimate task complexity
Whether the original prompt was sufficient for the actual job
Which subsystems or files repeatedly correlate with struggle
What concrete changes would most improve future sessions

Core Principles

Treat .resolved.N counts as signals of iteration intensity, not proof of failure
Do not label struggle based on counts alone; classify the shape of rework
Separate human-added scope from necessary discovered scope
Separate agent error from repo friction
Every diagnosis must include evidence
Every recommendation must map to a specific observed pattern
Use confidence levels:
- High = directly supported by artifact contents or timestamps
- Medium = supported by multiple indirect signals
- Low = plausible inference, not directly proven

Step 1: Discovery — Find Relevant Conversations

Read the conversation summaries available in the system context.
List all subdirectories in: `~/.gemini/antigravity/brain/`
Build a Conversation Index by cross-referencing summaries with UUID folders.
Record for each conversation:
- conversation_id
- title
- objective
- created
- last_modified
If the user supplied a keyword/path, filter on that. Otherwise analyze all workspace conversations.

Output: indexed list of conversations to analyze.
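
The discovery step above could be sketched roughly as follows. This is a minimal sketch, assuming each conversation lives in its own subdirectory of the brain folder and that file modification times are an acceptable stand-in for created / last_modified. The function name `build_conversation_index` and the keyword filter over artifact text are illustrative assumptions, not part of the workflow; title and objective still have to come from the conversation summaries.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_conversation_index(brain_dir, keyword=None):
    """Return one record per conversation folder under brain_dir."""
    index = []
    for folder in sorted(Path(brain_dir).expanduser().iterdir()):
        if not folder.is_dir():
            continue
        files = [p for p in folder.rglob("*") if p.is_file()]
        # Optional filter (assumption): keep folders whose artifacts mention the keyword.
        if keyword and not any(
            keyword in p.read_text(errors="ignore") for p in files
        ):
            continue
        # Assumption: oldest/newest file mtimes approximate created/last_modified.
        mtimes = [p.stat().st_mtime for p in files] or [folder.stat().st_mtime]
        index.append({
            "conversation_id": folder.name,
            "title": None,      # to be filled in from the conversation summaries
            "objective": None,  # to be filled in from the conversation summaries
            "created": datetime.fromtimestamp(min(mtimes), tz=timezone.utc),
            "last_modified": datetime.fromtimestamp(max(mtimes), tz=timezone.utc),
        })
    return index
```

Calling `build_conversation_index("~/.gemini/antigravity/brain/")` would then give the raw list to cross-reference against the summaries.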

Step 2: Artifact Extraction — Build Session Evidence

For each conversation, read all structured artifacts that exist.

2a. Core Artifacts

task.md
implementation_plan.md
walkthrough.md

2b. Metadata

*.metadata.json

2c. Version Snapshots

task.md.resolved.0 ... N
implementation_plan.md.resolved.0 ... N
walkthrough.md.resolved.0 ... N
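
Counting the version snapshots listed above might look like this. It is a sketch that assumes snapshots sit directly inside the conversation folder and follow the `<artifact>.md.resolved.<N>` naming shown above; `count_versions` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches names such as "task.md.resolved.3".
RESOLVED = re.compile(r"^(?P<base>.+\.md)\.resolved\.(?P<n>\d+)$")


def count_versions(conversation_dir):
    """Map each artifact name to the number of .resolved.N snapshots found."""
    counts = {}
    for path in Path(conversation_dir).iterdir():
        match = RESOLVED.match(path.name)
        if match:
            counts[match["base"]] = counts.get(match["base"], 0) + 1
    return counts
```

The resulting counts can feed task_versions, plan_versions, and walkthrough_versions, keeping in mind that they signal iteration intensity rather than failure.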

2d. Additional Signals

other .md artifacts
report/evaluation files
timestamps across artifact updates
file/folder names mentioned in plans and walkthroughs
repeated subsystem references
explicit testing/validation language
explicit non-goals or constraints, if present

2e. Record Per Conversation

Presence / Lifecycle

has_task
has_plan
has_walkthrough
is_completed
is_abandoned_candidate = has task but no walkthrough

Revision / Change Volume

task_versions
plan_versions
walkthrough_versions
extra_artifacts

Scope

task_items_initial
task_items_final
task_completed_pct
scope_delta_raw
scope_creep_pct_raw
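
The workflow does not spell out how the scope fields are computed, so the following is only one plausible reading. It assumes task.md uses markdown checkboxes (`- [ ]`, `- [x]`, `- [/]`), that scope_delta_raw is final minus initial item count, and that scope_creep_pct_raw is that delta relative to the initial count. `scope_metrics` is a hypothetical helper.

```python
import re

# Assumption: task items are markdown checkboxes; the captured group is the state.
CHECKBOX = re.compile(r"^\s*[-*]\s*\[(?P<state>[ xX/])\]", re.MULTILINE)


def scope_metrics(initial_task_md, final_task_md):
    """Derive the Scope fields from the first and last task.md snapshots."""
    initial = len(CHECKBOX.findall(initial_task_md))
    final_states = CHECKBOX.findall(final_task_md)
    final = len(final_states)
    done = sum(1 for state in final_states if state in ("x", "X"))
    delta = final - initial
    return {
        "task_items_initial": initial,
        "task_items_final": final,
        "task_completed_pct": round(100 * done / final, 1) if final else None,
        "scope_delta_raw": delta,
        # Undefined when there were no initial items to grow from.
        "scope_creep_pct_raw": round(100 * delta / initial, 1) if initial else None,
    }
```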

Timing

created_at
completed_at
duration_minutes

Content / Quality Signals

objective_text
initial_plan_summary
final_plan_summary
initial_task_excerpt
final_task_excerpt
walkthrough_summary
mentioned_files_or_subsystems
validation_requirements_present
acceptance_criteria_present
non_goals_present
scope_boundaries_present
file_targets_present
constraints_present

Step 3: Prompt Sufficiency Analysis

For each conversation, score the opening objective/request on a 0–2 scale for each dimension:

Clarity — is the ask understandable?
Boundedness — are scope limits defined?
Testability — are success conditions or acceptance criteria defined?
Architectural specificity — are files/modules/systems identified?
Constraint awareness — are non-goals, constraints, or environment details included?
Dependency awareness — does the prompt acknowledge affected systems or hidden coupling?

Create:

prompt_sufficiency_score
prompt_sufficiency_band = High / Medium / Low

Then note which missing ingredients likely contributed to later friction.

Important: Do not assume a low-detail prompt is bad by default. Short prompts can still be good if the task is narrow and the repo context is obvious.
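
As an illustration of the scoring above, the six 0–2 dimension scores can be summed into prompt_sufficiency_score (range 0 to 12) and mapped to a band. The band cut-offs used here (9 and 5) are assumptions, since the workflow does not define them, and the dimension keys are hypothetical identifiers for the six dimensions listed above.

```python
DIMENSIONS = (
    "clarity",
    "boundedness",
    "testability",
    "architectural_specificity",
    "constraint_awareness",
    "dependency_awareness",
)


def prompt_sufficiency(scores):
    """Sum six 0-2 dimension scores and assign a High / Medium / Low band."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    if any(scores[d] not in (0, 1, 2) for d in DIMENSIONS):
        raise ValueError("each dimension must be scored 0, 1, or 2")
    total = sum(scores[d] for d in DIMENSIONS)
    # Assumed cut-offs; adjust to taste.
    band = "High" if total >= 9 else "Medium" if total >= 5 else "Low"
    return {
        "prompt_sufficiency_score": total,
        "prompt_sufficiency_band": band,
        # Dimensions scored 0 are candidates for "missing ingredients".
        "missing_ingredients": [d for d in DIMENSIONS if scores[d] == 0],
    }
```

A Low band on its own is not a verdict: a short prompt for a narrow task may score low here and still have been sufficient.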

Step 4: Scope Change Classification

Do not treat all scope growth as the same.

For each conversation, classify scope delta into:

4a. Human-Added Scope

New items clearly introduced beyond the initial ask. Examples:

optional enhancements
follow-on refactors
“while we are here” additions
cosmetic or adjacent work added later

4b. Necessary Discovered Scope

Work that was not in the opening ask but appears required to complete it correctly. Examples:

dependency fixes
required validation work
hidden integration tasks
migration fallout
coupled module updates

4c. Agent-Introduced Scope

Work that appears not requested and not necessary, likely introduced by agent overreach.

For each conversation record:

scope_change_type_primary
scope_change_type_secondary (optional)
scope_change_confidence
evidence for classification
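
Before classifying scope change, it helps to isolate which task items were actually added. The sketch below only surfaces candidates by diffing the first and last task.md snapshots; deciding whether each one is human-added, necessary discovered, or agent-introduced still requires reading the artifacts. It assumes checkbox-style task items, and `added_task_items` is a hypothetical helper.

```python
import re

# Assumption: task items are markdown checkboxes followed by the item text.
ITEM = re.compile(r"^\s*[-*]\s*\[[ xX/]\]\s*(?P<text>.+?)\s*$", re.MULTILINE)


def added_task_items(initial_task_md, final_task_md):
    """Return final task items whose text does not appear in the initial snapshot."""
    initial = {text.lower() for text in ITEM.findall(initial_task_md)}
    return [
        text for text in ITEM.findall(final_task_md)
        if text.lower() not in initial
    ]
```

Note that a reworded item shows up as "added" under this exact-text comparison, so the output should be treated as a starting list rather than as evidence by itself.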

Step 5: Rework Shape Analysis

Do not just count revisions. Determine the shape of session rework.

Classify each conversation into one of these patterns:

Clean execution — little change, smooth completion
Early replan then stable finish — plan changed early, then execution converged
Progressive scope expansion — work kept growing throughout the session
Reopen/reclose churn — repeated task adjustments/backtracking
Late-stage verification churn — implementation mostly done, but testing/validation caused loops
Abandoned mid-flight — work started but did not reach walkthrough
Exploratory / research session — iterations are high but expected due to problem discovery

Record:

rework_shape
rework_shape_confidence
supporting evidence
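
Since the shape of rework depends on when revisions happened rather than how many there were, one way to gather supporting evidence is to place each plan revision on a normalized session timeline. This is a rough sketch: the function names are hypothetical, and the early/late cut-offs (first and last third of the session) are assumptions, not thresholds defined by this workflow.

```python
def revision_positions(revision_times, created_at, completed_at):
    """Place each revision on a 0..1 timeline between session start and end."""
    span = (completed_at - created_at).total_seconds()
    if span <= 0:
        return [0.0 for _ in revision_times]
    return [
        round((t - created_at).total_seconds() / span, 2)
        for t in revision_times
    ]


def plan_revision_timing(positions, early_cutoff=0.33, late_cutoff=0.67):
    """Summarize where plan revisions cluster: none, early, late, or spread."""
    if not positions:
        return "none"
    if max(positions) <= early_cutoff:
        return "early"   # consistent with "Early replan then stable finish"
    if min(positions) >= late_cutoff:
        return "late"    # may point to late-stage verification churn
    return "spread"      # revisions throughout; inspect for scope expansion or churn
```

The result is a hint for the classification, to be recorded alongside the artifact contents that confirm or contradict it.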

Step 6: Root Cause Analysis

For every non-clean session, assign:

6a. Primary Root Cause

Choose one:

SPEC_AMBIGUITY
HUMAN_SCOPE_CHANGE
REPO_FRAGILITY
AGENT_ARCHITECTURAL_ERROR
VERIFICATION_CHURN
LEGITIMATE_TASK_COMPLEXITY

6b. Secondary Root Cause

Optional if a second factor materially contributed.

6c. Evidence Requirements

Every root cause assignment must include:

evidence from artifacts or metadata
why competing causes were rejected
confidence level
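
The evidence requirements above can be enforced structurally. The sketch below uses the six root cause labels from 6a; the class and field names (`RootCauseAssignment`, `rejected`, and so on) are illustrative assumptions about how a record might be shaped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RootCause(Enum):
    SPEC_AMBIGUITY = "SPEC_AMBIGUITY"
    HUMAN_SCOPE_CHANGE = "HUMAN_SCOPE_CHANGE"
    REPO_FRAGILITY = "REPO_FRAGILITY"
    AGENT_ARCHITECTURAL_ERROR = "AGENT_ARCHITECTURAL_ERROR"
    VERIFICATION_CHURN = "VERIFICATION_CHURN"
    LEGITIMATE_TASK_COMPLEXITY = "LEGITIMATE_TASK_COMPLEXITY"


@dataclass
class RootCauseAssignment:
    primary: RootCause
    confidence: str    # "High", "Medium", or "Low"
    evidence: list     # quotes or references from artifacts or metadata
    rejected: dict     # competing RootCause -> why it was rejected
    secondary: Optional[RootCause] = None

    def __post_init__(self):
        # Refuse assignments that skip any of the evidence requirements.
        if self.confidence not in ("High", "Medium", "Low"):
            raise ValueError("confidence must be High, Medium, or Low")
        if not self.evidence:
            raise ValueError("a root cause needs at least one piece of evidence")
        if not self.rejected:
            raise ValueError("state why competing causes were rejected")
        if self.secondary is self.primary:
            raise ValueError("secondary cause must differ from primary")
```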

6d. Root Cause Heuristics

SPEC_AMBIGUITY

Use when the opening ask lacked boundaries, targets, criteria, or constraints, and the plan had to invent them.

HUMAN_SCOPE_CHANGE

Use when the task set expanded due to new asks, broadened goals, or post-hoc additions.

REPO_FRAGILITY

Use when hidden coupling, unclear architecture, brittle files, or environmental issues forced extra work.

AGENT_ARCHITECTURAL_ERROR

Use when the agent chose the wrong approach, wrong files, wrong assumptions, or hallucinated structure.

VERIFICATION_CHURN

Use when implementation mostly succeeded but tests, validation, QA, or fixes created repeated loops.

LEGITIMATE_TASK_COMPLEXITY

Use when revisions were reasonable given the difficulty and do not strongly indicate avoidable failure.

Step 7: Subsystem / File Clustering

Across all conversations, cluster repeated struggle by subsystem, folder, or file mentions.

Examples:

frontend/auth/*
db.py
ui.py
video_pipeline/*

For each cluster, calculate:

number of conversations touching it
average revisions
completion rate
abandonment rate
common root causes

Output the top recurring friction zones.

Goal: Identify whether struggle is prompt-driven, agent-driven, or concentrated in specific repo areas.
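
The per-cluster calculations above could be sketched like this. It assumes each session has already been reduced to a dict with the hypothetical keys `subsystems`, `revisions`, `completed`, `abandoned`, and `root_cause`; how those are derived is up to the earlier steps.

```python
from collections import Counter, defaultdict
from statistics import mean


def cluster_friction(sessions):
    """Aggregate per-subsystem stats and return clusters, busiest first."""
    grouped = defaultdict(list)
    for session in sessions:
        for subsystem in set(session["subsystems"]):
            grouped[subsystem].append(session)

    clusters = []
    for name, group in grouped.items():
        n = len(group)
        causes = Counter(s["root_cause"] for s in group if s["root_cause"])
        clusters.append({
            "cluster": name,
            "conversations": n,
            "avg_revisions": round(mean(s["revisions"] for s in group), 2),
            "completion_rate": round(sum(s["completed"] for s in group) / n, 2),
            "abandonment_rate": round(sum(s["abandoned"] for s in group) / n, 2),
            "common_root_causes": [cause for cause, _ in causes.most_common(2)],
        })
    # Rank by how many conversations touch the cluster, then by revision load.
    return sorted(clusters, key=lambda c: (-c["conversations"], -c["avg_revisions"]))
```

Clusters touched by only one or two conversations are weak evidence of a friction zone and are best reported with Low confidence.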

Step 8: Comparative Cohort Analysis

Compare these cohorts:

first-shot successes vs re-planned sessions
completed vs abandoned
high prompt sufficiency vs low prompt sufficiency
narrow-scope vs high-scope-growth
short sessions vs long sessions
low-friction subsystems vs high-friction subsystems

For each comparison, identify:

what differs materially
which prompt traits correlate with smoother execution
which repo traits correlate with repeated struggle

Do not merely restate averages. Extract causal-looking patterns cautiously and label them as inference where appropriate.
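
A cohort comparison can be kept honest by reporting sample sizes next to every difference. In this sketch, `compare_cohorts` and the `min_n` threshold of 5 are assumptions chosen for illustration; the point is that small cohorts should cap the confidence of any pattern drawn from them.

```python
from statistics import mean


def compare_cohorts(sessions, in_cohort, metric, min_n=5):
    """Compare the mean of one metric between sessions inside and outside a cohort."""
    inside = [s[metric] for s in sessions if in_cohort(s)]
    outside = [s[metric] for s in sessions if not in_cohort(s)]

    def summarize(values):
        return {"n": len(values), "mean": round(mean(values), 2) if values else None}

    return {
        "in": summarize(inside),
        "out": summarize(outside),
        # Assumed rule: with fewer than min_n sessions on either side,
        # any difference should be labeled as Low-confidence inference.
        "confidence_cap": "Low" if min(len(inside), len(outside)) < min_n else None,
    }
```

For example, `compare_cohorts(sessions, lambda s: s["band"] == "High", "plan_versions")` would contrast plan revisions for high-sufficiency prompts against the rest, where `band` and `plan_versions` are hypothetical per-session keys.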

Step 9: Non-Obvious Findings

Generate 3–7 findings that are not simple metric restatements.

Good examples:

“Most replans happen in sessions with weak file targeting, not weak acceptance criteria.”
“Scope growth usually begins after the first successful implementation, suggesting post-success human expansion.”
“Auth-related sessions cluster around repo fragility rather than agent hallucination.”
“Abandoned work is strongly associated with missing validation criteria.”

Bad examples:

“Some sessions had many revisions.”
“Some sessions were longer than others.”

Each finding must include:

observation
why it matters
evidence
confidence

Step 10: Report Generation

Create session_analysis_report.md in the current conversation’s brain folder.

Use this structure:

📊 Session Analysis Report — [Project Name]

Generated: [timestamp]
Conversations Analyzed: [N]
Date Range: [earliest] → [latest]

Executive Summary

| Metric | Value | Rating |
|:---|:---|:---|
| First-Shot Success Rate | X% | 🟢/🟡/🔴 |
| Completion Rate | X% | 🟢/🟡/🔴 |
| Avg Scope Growth | X% | 🟢/🟡/🔴 |
| Replan Rate | X% | 🟢/🟡/🔴 |
| Median Duration | Xm | — |
| Avg Revision Intensity | X | 🟢/🟡/🔴 |

Then include a short narrative summary:

what is going well
what is breaking down
whether the main issue is prompt quality, repo fragility, or workflow discipline

Root Cause Breakdown

| Root Cause | Count | % | Notes |
|:---|:---|:---|:---|
| Spec Ambiguity | X | X% | ... |
| Human Scope Change | X | X% | ... |
| Repo Fragility | X | X% | ... |
| Agent Architectural Error | X | X% | ... |
| Verification Churn | X | X% | ... |
| Legitimate Task Complexity | X | X% | ... |

Prompt Sufficiency Analysis

common traits of high-sufficiency prompts
common missing inputs in low-sufficiency prompts
which missing prompt ingredients correlate most with replanning or abandonment

Scope Change Analysis

Separate:

Human-added scope
Necessary discovered scope
Agent-introduced scope

Show top offenders in each category.

Rework Shape Analysis

Summarize how sessions tend to fail:

early replan then recover
progressive scope expansion
late verification churn
abandonments
reopen/reclose cycles

Friction Hotspots

Cluster repeated struggle by subsystem/file/domain. Show which areas correlate with:

replanning
abandonment
verification churn
agent architectural mistakes

First-Shot Successes

List the cleanest sessions and extract what made them work:

scope boundaries
acceptance criteria
file targeting
validation clarity
narrowness of change surface

Non-Obvious Findings

List 3–7 high-value findings with evidence and confidence.

Recommendations

Each recommendation must use this format:

Recommendation [N]

Observed pattern
Likely cause
Evidence
Change to make
Expected benefit
Confidence

Recommendations must be specific, not generic.

Per-Conversation Breakdown

| # | Title | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Complete? |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|

Add short notes only where meaningful.
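
Filling in the Root Cause Breakdown table from the report structure above can be done mechanically once each session has a primary root cause. This is a minimal sketch; `root_cause_table` is a hypothetical helper, and the Notes column is left as a placeholder because it needs human-readable evidence rather than a computed value.

```python
from collections import Counter


def root_cause_table(primary_causes):
    """Render a markdown Root Cause Breakdown table from a list of cause labels."""
    counts = Counter(primary_causes)
    total = sum(counts.values())
    lines = ["| Root Cause | Count | % | Notes |", "|:---|:---|:---|:---|"]
    for cause, count in counts.most_common():
        label = cause.replace("_", " ").title()  # SPEC_AMBIGUITY -> Spec Ambiguity
        lines.append(f"| {label} | {count} | {100 * count / total:.0f}% | ... |")
    return "\n".join(lines)
```

With only a handful of sessions the percentages can look more precise than they are, so it is worth stating N next to the table.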

Step 11: Auto-Optimize — Improve Future Sessions

11a. Update Project Health State

Example path (update to your actual location):

`~/.gemini/antigravity/.agent/skills/project-health-state/SKILL.md`

Update:

session analysis metrics
recurring fragile files/subsystems
recurring failure modes
last updated timestamp

11b. Generate Prompt Improvement Guidance

Create prompt_improvement_tips.md

Do not give generic advice. Instead extract:

traits of high-sufficiency prompts
examples of effective scope boundaries
examples of good acceptance criteria
examples of useful file targeting
common missing details that led to replans

11c. Suggest Missing Skills / Workflows

If multiple struggle sessions cluster around the same subsystem or repeated sequence, recommend:

a targeted skill
a repeatable workflow
a reusable prompt template
a repo note / architecture map

Only recommend workflows when the pattern appears repeatedly.

Final Output Standard

The workflow must produce:

A metrics summary
A root-cause diagnosis
A subsystem/friction map
A prompt-sufficiency assessment
Evidence-backed recommendations
Non-obvious findings

If evidence is weak, say so. Do not overclaim. Prefer explicit uncertainty over fake precision.

How to invoke this skill
Just say any of these in a new conversation:

“Run analyze-project on the workspace”
“Do a full session analysis report”
“Root cause my recent brain/ sessions”
“Update project health state”

The agent will automatically discover and use the skill.



aliabbaschadhar/analyze-project
