engineering/collab-proof/skills/collab-proof/SKILL.md
Use when you want to understand what Claude contributed vs what you drove in a session. Triggers on: /collab-proof, session retrospective, ai contribution analysis, collaboration evidence, what did claude do.
npx skillsauth add alirezarezvani/claude-skills collab-proofInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Surfaces AI collaboration evidence the developer didn't consciously record. Vela 3-layer pipeline × ADHD 4-frame reasoning — prompt-native, zero dependencies.
Run git log --oneline -10 and git diff --stat HEAD~3..HEAD first.
Classify signal level using this rubric (pick the highest that matches):
HIGH → full artifacts (DECISIONS.md + session-history + WORKLOG + HTML)
BUG_FIXING special rule — override file count: Even if only 1 file changed, classify as HIGH if the conversation contains:
MEDIUM → WORKLOG only
LOW → silence, tell user "Routine session — nothing recorded."
Show the user: Signal: HIGH / MEDIUM / LOW — [one-line reason]
Run all four frames simultaneously against conversation context + git diff. Score each frame 0.0–1.0 using the rubric below. Then apply pruning and classification rules.
Frame A — Technical (code churn complexity)
1.0 New module/file created, complex logic added (state machine, Lua script, novel algorithm)0.5 Existing function logic modified, simple API endpoint added0.1 Typo fix, comment change, plain text editFrame B — Uncertainty (developer doubt signals)
1.0 Code written then fully rolled back, explicit doubt expressed ("이게 맞나?", "동작 안 하네"), git revert0.5 Advice sought from Claude mid-implementation, 2+ revision requests on same area0.0 Uninterrupted directive execution — developer knew exactly what to buildFrame C — Fork (decision branch presence)
1.0 Two or more alternatives explicitly compared in conversation (A vs B)0.5 No explicit comparison but tradeoff mentioned (performance vs readability)0.0 Single standard approach applied, no alternatives consideredFrame D — AI contribution (Claude's actual impact)
1.0 Claude identified a bug/edge case the developer hadn't noticed and proposed the fix0.6 Claude generated structural boilerplate/skeleton that significantly accelerated execution0.2 Claude reformatted or transcribed developer-directed code without independent contributionPrune any frame scoring < 0.4.
Exception — High-Speed Execution Guard:
If Frame A >= 0.8 AND Frame D >= 0.6, do NOT prune and do NOT silence the session,
even if Frame B = 0.0 and Frame C = 0.0.
This is a boilerplate-heavy FEATURE_BUILDING session. Classify immediately as FEATURE_BUILDING with HIGH signal.
Rationale: zero uncertainty in a fast-moving session is a feature, not a reason to discard it.
| Surviving frames | Dominant intent | Meaning |
|---|---|---|
| A high + D mid-high (B, C low) | FEATURE_BUILDING | High-velocity feature generation, Claude scaffolding |
| B high + A/D high | BUG_FIXING or STUCK | Active debugging or unresolved looping |
| C high + A high | REFACTORING or EXPLORING | Architecture exploration, weighing alternatives |
| All frames < 0.4 | FLOW_STATE or LOW | Routine typing, silence unless Layer 01 was HIGH |
If multiple intents tie, pick the one with the highest combined frame score. Record the runner-up — it belongs in the session narrative.
Before proceeding to Layer 03, resolve to this structure (show it to the user):
{
"frames": {
"technical": 0.0,
"uncertainty": 0.0,
"fork": 0.0,
"ai_contribution": 0.0
},
"pruned": ["list of pruned frame names"],
"intent": "FEATURE_BUILDING",
"signal": "HIGH",
"calibration_note": "one sentence explaining any exception rule applied"
}
Append to DECISIONS.md — one entry per real fork (Frame C must confirm alternatives existed):
## [YYYY-MM-DD] <title>
**Context**: [Frame A — what forced this choice]
**Decision**: what was chosen
**Alternatives considered**: [Frame C — road not taken]
**Reasoning**: why — prefix "inferred:" if reconstructed from context
**AI contribution**:
- Identified: [Frame D — something developer missed]
- Suggested: [Frame D — approach or alternative]
- Developer-driven: [what the developer decided independently]
**Intent class**: [from Layer 02]
**Signal score**: HIGH
**Outcome**: implemented | pending | reversed
If no real fork existed → write nothing. Never fabricate decisions.
BUG_FIXING intent: use this format instead:
## [YYYY-MM-DD] <bug title>
**Root cause**: what actually caused the bug — the WHY, not just the what
**Symptom**: what the developer observed
**Fix**: what was changed
**Why this fix**: rationale — inferred if not stated explicitly
**Alternative fixes considered**: other approaches discussed (if any)
**AI contribution**:
- Identified: [Frame D — did Claude spot the root cause?]
- Suggested: [Frame D — fix approach or diagnostic step]
- Developer-driven: [what the developer diagnosed/decided independently]
**Intent class**: BUG_FIXING
**Signal score**: HIGH
**Outcome**: fixed | workaround | deferred
Create session-history/YYYY-MM-DD-HHMM.md:
# Session [YYYY-MM-DD HH:MM]
**Intent**: [class] (runner-up: [class if any])
**Signal**: HIGH
**Frames active**: A ([score]) / B ([score]) / C ([score]) / D ([score])
## What shipped
[grounded in git log]
## What was figured out
[Frame B + C — the reasoning, tradeoffs, debugging — what developers forget]
## Decisions made this session
[refs to DECISIONS.md entries]
## Where it got hard
[Frame B findings — uncertainty, reverts, EXPLORING/STUCK signals]
## AI contribution summary
[Frame D synthesis — one honest paragraph, calibrated]
## Next steps inferred
[what's obviously incomplete]
Append to WORKLOG.md:
YYYY-MM-DD HH:MM | [intent] | HIGH | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase> — <why it mattered>
Fields:
D:[score] — Frame D AI contribution score (0.0–1.0)cache:[hit%]% — cache hit rate from token analysis (or cache:n/a if no data)tok:[total] — total tokens this session (input + cache_read + cache_create + output, in K e.g. 45K)Collect token usage (bash — run this and capture output):
python3 -c "
import json, sys
from pathlib import Path
projects = Path.home() / '.claude/projects'
files = sorted(projects.rglob('*.jsonl'), key=lambda f: f.stat().st_mtime, reverse=True)
if not files:
print('no_data'); sys.exit()
with open(files[0]) as fp:
lines = [json.loads(l) for l in fp if l.strip()]
ti = to = cr = cc = 0
turns = []
for i, line in enumerate(lines):
if line.get('type') == 'assistant':
u = line.get('message', {}).get('usage', {})
if not u: continue
inp = u.get('input_tokens', 0)
ti += inp; to += u.get('output_tokens', 0)
cr += u.get('cache_read_input_tokens', 0)
cc += u.get('cache_creation_input_tokens', 0)
prompt = ''
for j in range(i-1, -1, -1):
if lines[j].get('type') == 'user':
c = lines[j].get('message', {}).get('content', '')
prompt = (c if isinstance(c, str) else next((x.get('text','') for x in c if isinstance(x,dict) and x.get('type')=='text'), ''))[:80]
break
turns.append((inp, prompt))
total = ti + cr + cc
hit = cr / total * 100 if total else 0
print(f'input={ti} output={to} cache_read={cr} cache_create={cc} hit={hit:.0f} turns={len(turns)}')
turns.sort(reverse=True)
for idx, (tok, p) in enumerate(turns[:3]):
print(f'top{idx+1}={tok}|{p}')
"
Parse the output and include token stats in the session narrative. Then:
Generate session-history/YYYY-MM-DD-HHMM-proof.html — write a self-contained HTML file. Structure and class names are fixed — do not rename or reorder sections.
Fixed CSS tokens (use exactly):
#0d1117, Card: #161b22, Border: #30363dfont-family: 'Courier New', monospacehigh → #3fb950, low → #f85149, pruned → #8b949eai-identified → #a371f7, ai-suggested → #d29922, ai-developer → #3fb950Fixed HTML structure (class names must match exactly):
<div class="header">
<div class="header-top">
<div class="project-name">
<span class="badge"> <!-- intent class -->
<div class="meta-row"> <!-- date, branch, signal level text -->
<div class="signal-container">
<div class="signal-label">
<div class="signal-track">
<div class="signal-fill"> <!-- width % driven by signal score -->
<div class="section"> <!-- frames -->
<div class="section-title"> ... <span class="count">Layer 02 · ADHD tree-of-thought</span>
<div class="frames-grid">
<div class="frame-card"> <!-- pruned: class="frame-card pruned" -->
<div class="frame-label"> <!-- Frame A / B / C / D -->
<div class="frame-name">
<div class="frame-score high|low"> <!-- score value -->
<div class="section"> <!-- decisions — skip section if none -->
<div class="section-title"> ... <span class="count">N recorded</span>
<div class="decision-card"> <!-- one per DECISIONS.md entry -->
<div class="decision-header">
<div class="decision-title">
<div class="decision-date">
<div class="decision-fields">
<div class="field-row">
<div class="field-label"> <!-- Context / Decision / Alternatives / Reasoning -->
<div class="field-value">
<div class="field-row"> <!-- AI contribution row -->
<div class="field-label">AI contribution</div>
<div class="field-value">
<div class="ai-block">
<div class="ai-line ai-identified|ai-suggested|ai-developer">
<span class="tag">IDENTIFIED|SUGGESTED|DEV-DRIVEN</span>
<div class="field-row"> <!-- Outcome row -->
<div class="field-label">Outcome</div>
<div class="field-value">
<span class="outcome-badge outcome-implemented|outcome-pending|outcome-reversed">
<div class="section"> <!-- session narrative -->
<div class="section-title">Session narrative</div>
<div class="narrative-grid">
<div class="narrative-card"> <!-- What shipped -->
<div class="narrative-card"> <!-- What was figured out -->
<div class="narrative-card"> <!-- Where it got hard -->
<div class="narrative-card"> <!-- Next steps inferred -->
<div class="section"> <!-- AI contribution summary -->
<div class="section-title">AI contribution summary</div>
<div class="narrative-card"> <!-- Frame D synthesis paragraph -->
<div class="section"> <!-- token usage -->
<div class="section-title">Token usage</div>
<div class="narrative-card"> <!-- cache hit rate bar + top turns + optimization note -->
<div class="section"> <!-- worklog tail -->
<div class="section-title"> ... <span class="count">last N entries</span>
<div class="worklog-entry"> <!-- one per recent WORKLOG line -->
<div class="footer"> <!-- last commit hash · "Generated by collab-proof · timestamp" -->
Write the HTML using bash:
cat > session-history/YYYY-MM-DD-HHMM-proof.html << 'HTMLEOF'
<!DOCTYPE html>
... (full HTML with inline CSS, no external resources)
HTMLEOF
After writing, show: open session-history/YYYY-MM-DD-HHMM-proof.html
Append one line to WORKLOG.md only:
YYYY-MM-DD HH:MM | [intent] | MEDIUM | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase>
Tell user: "Signal: LOW — Routine session, nothing recorded."
When context compaction is about to happen (triggered by the PreCompact hook), run a lightweight mid-session checkpoint before context is lost:
session-history/.tmp-TIMESTAMP.json:{
"timestamp": "YYYY-MM-DD HH:MM:SS",
"trigger": "pre-compact",
"signal": "HIGH / MEDIUM / LOW",
"frames": { "technical": 0.0, "uncertainty": 0.0, "fork": 0.0, "ai_contribution": 0.0 },
"intent": "FEATURE_BUILDING",
"key_moments": [
"one-line description of the most important decision or finding so far"
]
}
When /collab-proof runs at session end:
session-history/.tmp-*.json fileskey_moments arrays — these preserve tradeoff discussions that were compacted away.tmp-*.json files after mergingdata-ai
Personal coach that teaches users to become Claude power users. Use this skill the FIRST time a user asks to "learn Claude", "be a power user", "coach me", "teach me Claude tricks", "what can Claude do", "make me better at prompting", or any variation. After activation, also use it on EVERY subsequent turn to detect missed optimization opportunities (vague prompts, ignored capabilities, manual work Claude could automate) and surface a single power-user tip. Trigger generously — most users do not know what they do not know, so err on the side of coaching.
development
Use when designing or revisiting product pricing — selecting a pricing model (subscription seat-based, usage-based, value-based, freemium, or hybrid), running Van Westendorp Price Sensitivity Meter analysis on WTP survey data, or designing Good/Better/Best packaging tiers. Recommends a model and a price range with trade-offs, never a single number. For Commercial leads, Product Marketing, and CMOs at the pricing-design moment — not deal-by-deal discounting, not brand positioning.
testing
Use when a startup is approached by a prospective partner and someone has to decide should we sign this partner, at what partner tier (referral / reseller / OEM / SI-consulting / strategic alliance), with what joint GTM commitment, and at what revshare. Classifies partner tier from independent-demand evidence vs. preferential-terms hunting, designs a 90-day joint GTM plan, models revshare against direct-sale margin, and surfaces kill criteria for unwinding under-performing partnerships. For Head of Partnerships, Head of BD, and Founder-CEOs doing reseller agreement, OEM deal, or strategic alliance review — not technical sale enablement, not channel cost economics, not M&A.
testing
Use when reviewing a specific inbound deal before close — when sales has asked for a discount that exceeds AE authority, when the customer has redlined the MSA, when per-deal economics (margin after discount, multi-year payment shape, indemnity exposure) need to be quantified, or when discount approval needs to be routed to a named human approver (Sales Director, VP Sales, CFO, CRO, General Counsel). Covers deal review, discount approval routing, per-deal margin scoring, deal exception handling, MSA redline triage, contract landmine detection (uncapped indemnity, MFN, perpetual license-back, missing DPA), and named-approver chain assembly. NEVER auto-approves — every output is a numeric scorecard plus a routing recommendation to a named human.