skills/ctrlcot-dual-granularity-chain-of-thought-compress/SKILL.md
Compress chain-of-thought reasoning using CtrlCoT's dual-granularity framework: hierarchical semantic abstraction combined with logic-preserving token pruning. Use when asked to 'compress reasoning', 'shorten chain of thought', 'optimize CoT tokens', 'reduce reasoning verbosity', 'budget-controlled reasoning', or 'efficient step-by-step thinking'.
npx skillsauth add ndpvt-web/arxiv-claude-skills ctrlcot-dual-granularity-chain-of-thought-compress
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to produce compressed chain-of-thought reasoning that uses significantly fewer tokens while preserving correctness. Based on the CtrlCoT framework, it combines two complementary strategies: semantic abstraction (rewriting reasoning at coarser granularity levels) and logic-preserving pruning (removing redundant tokens while retaining numbers, operators, and logical connectives). The result is reasoning traces that are 30-60% shorter than naive CoT while maintaining or improving accuracy, particularly on mathematical and logical tasks.
Dual-granularity compression addresses a fundamental tension: semantic-level compression (rewriting steps at a higher abstraction) is conservative and often leaves redundancy, while token-level pruning (deleting individual tokens) is aggressive and can destroy critical reasoning cues like numbers, variable names, and operators. CtrlCoT harmonizes both through three interlocking mechanisms.
Hierarchical Reasoning Abstraction generates reasoning at four semantic tiers -- Detailed, Standard, Concise, and Ultra-Concise -- each produced by controlling the verbosity of the explanation. The key insight is that a correct proof at the "Concise" level is not just a shorter version of "Detailed" -- it restructures the argument, merging steps and eliding obvious sub-derivations. Answer-consistency filtering ensures that only traces producing the correct final answer survive at each tier.
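Answer-consistency filtering can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Trace` type, the `filter_by_answer` helper, and the sample traces are all hypothetical, and it assumes each candidate trace already has its final answer extracted as a string.

```python
from typing import Dict, List, NamedTuple


class Trace(NamedTuple):
    text: str          # the reasoning trace written at some tier
    final_answer: str  # the answer that trace arrives at


def filter_by_answer(
    candidates: Dict[str, List[Trace]], reference_answer: str
) -> Dict[str, List[Trace]]:
    """Keep, per tier, only the traces whose final answer matches the reference."""
    expected = reference_answer.strip()
    return {
        tier: [t for t in traces if t.final_answer.strip() == expected]
        for tier, traces in candidates.items()
    }


# Hypothetical candidates for a problem whose correct answer is 33.
candidates = {
    "Detailed": [Trace("f(5) = 2(25) - 3(5) + 1 = 36; f(2) = 3; 36 - 3 = 33", "33")],
    "Concise": [
        Trace("f(5) = 36, f(2) = 3, difference 33", "33"),
        Trace("f(5) = 36, f(2) = 5, difference 31", "31"),  # wrong answer, filtered out
    ],
}
surviving = filter_by_answer(candidates, "33")
```

Filtering per tier, rather than globally, keeps a tier available even when some of its candidates went wrong during rewriting.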
Logic-Preserving Pruning then operates within a chosen semantic tier, removing filler tokens (transition phrases, restatements, hedging language) while explicitly protecting logic-critical tokens: numerical values, mathematical operators, variable bindings, conditional keywords (if, then, else), and intermediate results. This is not generic summarization -- it is domain-aware compression that understands which tokens carry reasoning load. The pruning ratio is controllable via a parameter (0.3 to 1.0) that gives the fraction of tokens to keep: 0.3 means "keep 30% of tokens" and 1.0 means "keep everything." Empirically, 0.3-0.4 is the sweet spot for math reasoning.
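The idea of pruning filler while protecting logic-critical tokens can be illustrated with a rule-based toy. Everything here is an assumption made for illustration: the `FILLER` vocabulary, the `is_protected` and `prune` helpers, the `keep_ratio` parameter name, and whitespace tokenization. The framework itself is described as learned, domain-aware compression, which this sketch does not reproduce.

```python
import math

OPERATORS = set("+-*/=<>")
CONDITIONALS = {"if", "then", "else"}
# Hypothetical filler vocabulary, for illustration only.
FILLER = {"now", "so", "well", "basically", "just", "we", "need", "to", "that", "this", "means"}


def is_protected(token: str) -> bool:
    """True for tokens that carry reasoning load and must never be pruned.

    Deliberately conservative: any token containing a digit or operator
    character is protected, including hyphenated words.
    """
    if any(ch.isdigit() for ch in token):
        return True
    if any(ch in OPERATORS for ch in token):
        return True
    return token.strip(".,;:").lower() in CONDITIONALS


def prune(trace: str, keep_ratio: float) -> str:
    """Drop filler tokens, left to right, until about keep_ratio of the tokens remain."""
    if not 0.3 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be between 0.3 and 1.0")
    tokens = trace.split()
    budget = math.ceil(len(tokens) * keep_ratio)
    to_drop = len(tokens) - budget
    kept = []
    for token in tokens:
        is_filler = token.strip(".,;:").lower() in FILLER
        if to_drop > 0 and is_filler and not is_protected(token):
            to_drop -= 1  # spend one unit of the pruning budget
        else:
            kept.append(token)
    return " ".join(kept)


pruned = prune("Now we just need to divide 42 / 6 so x = 7", keep_ratio=0.5)
print(pruned)  # divide 42 / 6 x = 7
```

If the trace contains too little filler to reach the budget, this sketch stops early rather than touching protected tokens, so the result can be longer than the requested ratio.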
Distribution Alignment resolves the final problem: pruned traces read like telegrams ("calculate... divide... get 7") and so diverge from fluent reasoning style. The compressed trace is re-rendered into natural language that flows coherently at the target length, avoiding fragmentation while respecting the token budget.
1. Classify the reasoning task. Determine whether the problem is mathematical, logical, code-based, or analytical. This determines which tokens are "logic-critical" (numbers and operators for math; variable names and control flow for code; premises and conclusions for logic).
2. Generate a full-detail reasoning trace first. Solve the problem with complete step-by-step reasoning internally. This is your "Detailed" tier -- the ground-truth derivation you will compress from.
3. Select the target semantic granularity. Choose one of the four tiers (Detailed, Standard, Concise, or Ultra-Concise) based on the user's needs.
4. Rewrite at the chosen granularity. Restructure the reasoning at the target level. This is not truncation -- merge sequential steps that share a single logical move, drop sub-derivations the reader can infer, and collapse intermediate arithmetic into single expressions.
5. Identify and protect logic-critical tokens. Before any further compression, tag tokens that must survive: all numerical values, operators (+, -, *, /, =, <, >), variable names, function names, conditional keywords, and final/intermediate results. These are non-negotiable.
6. Prune filler tokens. Remove transition phrases ("Let's think about this," "Now we need to," "This means that"), restatements of the problem, hedging language ("probably," "I think"), and redundant renamings. Preserve sentence structure around protected tokens.
7. Re-render into fluent language. Smooth the pruned trace so it reads naturally. Replace telegram-style fragments with grammatical sentences. Ensure each step logically connects to the next without requiring the reader to mentally reconstruct missing words.
8. Verify answer consistency. Re-derive the final answer from only the compressed trace. If the compressed reasoning no longer supports the correct answer, back off to a less aggressive compression tier and repeat from step 4.
9. Present the compressed reasoning. Output the final compressed CoT, optionally prefixed with the compression tier used and approximate token savings if the user requested that information.
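The verify-and-back-off loop in the workflow above can be sketched as a small control function. This is an illustration under stated assumptions: `compress_with_backoff`, `last_number`, the `TIERS` ordering, and the toy traces are hypothetical, and the `compress` callback stands in for the rewrite, protect, prune, and re-render steps.

```python
import re
from typing import Callable, Optional, Tuple

# Tiers ordered from most to least compressed; backing off moves to the right.
TIERS = ["Ultra-Concise", "Concise", "Standard", "Detailed"]


def compress_with_backoff(
    target_tier: str,
    reference_answer: str,
    compress: Callable[[str], str],
    derive_answer: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (tier, trace) for the most compressed tier that still yields the answer."""
    for tier in TIERS[TIERS.index(target_tier):]:
        trace = compress(tier)                        # rewrite, protect, prune, re-render
        if derive_answer(trace) == reference_answer:  # verify answer consistency
            return tier, trace
    raise ValueError("no tier reproduced the reference answer")


def last_number(trace: str) -> Optional[str]:
    """Toy answer extractor: treat the last integer in the trace as its answer."""
    numbers = re.findall(r"\d+", trace)
    return numbers[-1] if numbers else None


# Hypothetical traces; the Ultra-Concise one lost a term and gives the wrong total.
toy_traces = {
    "Ultra-Concise": "Alice 15, Bob 35. Total: 15+35 = 50",
    "Concise": "Alice 3*5=15, Bob 7*5=35, Carol 2*3*5=30. Total: 15+35+30 = 80",
    "Standard": "Alice spends 3*5=15, Bob 7*5=35, Carol buys 6 so 6*5=30. Total 80",
    "Detailed": "Each widget costs 5. Alice buys 3, so 15. Bob buys 7, so 35. "
                "Carol buys 2*3=6, so 30. The total is 15+35+30 = 80",
}
tier, trace = compress_with_backoff("Ultra-Concise", "80", toy_traces.__getitem__, last_number)
print(tier)  # Concise
```

Starting from the requested tier and moving only toward less compression means the caller never receives a trace more aggressive than it asked for.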
Example 1: Mathematical problem with budget constraint
User: "Solve this concisely: If f(x) = 2x^2 - 3x + 1, find f(5) - f(2)."
Full internal reasoning (Detailed tier, ~80 tokens):
f(5) = 2(25) - 3(5) + 1 = 50 - 15 + 1 = 36
f(2) = 2(4) - 3(2) + 1 = 8 - 6 + 1 = 3
f(5) - f(2) = 36 - 3 = 33
Compressed output (Concise tier, ~35 tokens):
f(5) = 2(25) - 15 + 1 = 36
f(2) = 2(4) - 6 + 1 = 3
Difference: 36 - 3 = **33**
What was removed: Redundant expansions (each product now appears once, e.g. "3(5)" is written directly as "15") and transition words. What was preserved: Every operator, every value needed to follow the arithmetic, and the results f(5) = 36, f(2) = 3, and 33.
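The figures in this example can be checked mechanically, which is one way to perform the answer-consistency step. The helper name `f` mirrors the problem statement; the rest is a minimal sketch.

```python
def f(x: int) -> int:
    """The polynomial from the example: f(x) = 2x^2 - 3x + 1."""
    return 2 * x**2 - 3 * x + 1


f5, f2 = f(5), f(2)
difference = f5 - f2
print(f5, f2, difference)  # 36 3 33
```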
Example 2: Code debugging with compressed reasoning
User: "Why does this function return None? Keep the explanation brief."
def find_max(lst):
    if len(lst) == 0:
        return None
    max_val = lst[0]
    for item in lst:
        if item > max_val:
            max_val = item
Full reasoning (Detailed, ~120 tokens):
The function defines max_val and iterates through the list correctly.
The comparison logic is right. However, after the loop completes,
there is no return statement for max_val. Python functions without
an explicit return statement return None by default. The fix is to
add "return max_val" after the for loop.
Compressed output (Concise, ~40 tokens):
The loop logic is correct, but the function never returns `max_val` --
it falls off the end after the `for` block. Add `return max_val` after
the loop.
What was removed: Restatement of what the code does ("defines max_val and iterates"), explanation of Python's default None behavior (the user likely knows this). What was preserved: The root cause ("never returns max_val"), the location ("after the for block"), the fix.
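For reference, the fix described in the compressed output looks like this. The type hints are an addition for readability and are not part of the user's original snippet.

```python
from typing import List, Optional


def find_max(lst: List[int]) -> Optional[int]:
    """Return the largest item in lst, or None for an empty list."""
    if len(lst) == 0:
        return None
    max_val = lst[0]
    for item in lst:
        if item > max_val:
            max_val = item
    return max_val  # the return statement the original function was missing


print(find_max([3, 9, 2]))  # 9
```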
Example 3: Multi-step logic problem at Ultra-Concise tier
User: "Compress your reasoning to the minimum. All widgets cost $5. Alice buys 3, Bob buys 7, Carol buys twice what Alice bought. Total spent by all three?"
Compressed output (Ultra-Concise, ~25 tokens):
Alice: 3*5=15, Bob: 7*5=35, Carol: 2*3*5=30.
Total: 15+35+30 = **$80**
Every number and operator preserved. Zero filler words. The reasoning spine is fully traceable.
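Because the Ultra-Concise trace keeps every operand, it can be replayed directly as arithmetic. The variable names below are illustrative.

```python
PRICE = 5  # dollars per widget

# Quantities as stated in the problem; Carol buys twice what Alice bought.
quantities = {"Alice": 3, "Bob": 7, "Carol": 2 * 3}
spent = {name: qty * PRICE for name, qty in quantities.items()}
total = sum(spent.values())
print(spent, total)  # {'Alice': 15, 'Bob': 35, 'Carol': 30} 80
```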
Do:
Avoid:
Compression destroys correctness: If the compressed trace produces a different answer than the full trace, immediately back off one granularity tier. The hierarchy is your safety net: Concise fails? Use Standard. Standard fails? Use Detailed. Never ship a compressed trace that yields the wrong answer.
Logic-critical token accidentally pruned: If a number, operator, or variable name was dropped and the reasoning gap is noticed during verification, restore it and re-prune only the surrounding filler. The protected-token list is inviolable.
Fluency degradation: If the compressed trace reads as fragmented bullet points rather than coherent reasoning, spend a few tokens on connective tissue ("so," "thus," "giving"). A 5% token increase for readability is worth it -- the goal is compression, not minimalism at all costs.
Ambiguous compression request: If the user says "be brief" without specifying a tier, default to Concise (35-50% tokens). This balances informativeness with economy. Only use Ultra-Concise when explicitly requested or in automated pipelines.
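The check for accidentally pruned logic-critical tokens can be partly automated for numbers. The sketch below compares a trace before and after pruning within one tier; `missing_numbers` and the sample strings are hypothetical, and a real check would also cover operators and variable names.

```python
import re
from typing import List

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def missing_numbers(before_pruning: str, after_pruning: str) -> List[str]:
    """List numeric literals present before pruning but absent afterwards."""
    before = set(NUMBER.findall(before_pruning))
    after = set(NUMBER.findall(after_pruning))
    return sorted(before - after)


before = "Now we need to divide 42 by 6, which gives x = 7"
after = "divide 42 by, gives x = 7"  # the divisor was pruned by mistake
print(missing_numbers(before, after))  # ['6']
```

An empty result does not prove the trace is sound, since it ignores operators and repeated values, but a non-empty one is a cheap signal to restore the dropped token and re-prune only the surrounding filler.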
Paper: CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning (Fan et al., 2026). Focus on Section 3 for the three-component framework (HRA, LPD, DAG) and Table 2 for compression-ratio-vs-accuracy tradeoffs across MATH-500 and GSM8K benchmarks.