skills/claude-4-6-features/extended-thinking/SKILL.md
Use Claude's extended thinking (reasoning) mode effectively — budget tokens, interleaved thinking with tool use, when it helps, when it wastes tokens, and how to inspect the thinking trace. Use this skill when building reasoning-heavy features (math, code generation, multi-step planning), debugging why a model is shallow on hard problems, or deciding whether to enable thinking. Activate when: extended thinking, thinking tokens, budget_tokens, reasoning mode, interleaved thinking, thinking blocks.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add latestaiagents/agent-skills extended-thinking`
Extended thinking gives the model a scratchpad before the final answer. Pay for reasoning tokens, get deeper answers. Use it surgically, not everywhere.
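```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: {
    type: "enabled",
    budget_tokens: 10_000, // upper bound on thinking tokens
  },
  messages: [{ role: "user", content: "Prove that every prime > 3 is of the form 6k±1." }],
});
```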
`budget_tokens` is the maximum number of thinking tokens, and the model may use fewer. It must be at least 1,024. `max_tokens` must be greater than `budget_tokens`, because thinking tokens count toward the `max_tokens` total.
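A small helper can enforce these constraints before a request goes out. This is a minimal sketch that assumes you size `max_tokens` as the thinking budget plus headroom for the visible answer. `thinkingParams` and `answerHeadroom` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical helper: build the token-related request fields for extended thinking.
// Assumption: max_tokens = thinking budget + room for the visible answer.
const MIN_THINKING_BUDGET = 1_024; // API minimum for budget_tokens

interface ThinkingParams {
  max_tokens: number;
  thinking: { type: "enabled"; budget_tokens: number };
}

function thinkingParams(budgetTokens: number, answerHeadroom: number): ThinkingParams {
  if (!Number.isInteger(budgetTokens) || budgetTokens < MIN_THINKING_BUDGET) {
    throw new RangeError(`budget_tokens must be an integer >= ${MIN_THINKING_BUDGET}`);
  }
  if (!Number.isInteger(answerHeadroom) || answerHeadroom <= 0) {
    throw new RangeError("answerHeadroom must be a positive integer");
  }
  // Thinking counts toward max_tokens, so the total must exceed the budget.
  return {
    max_tokens: budgetTokens + answerHeadroom,
    thinking: { type: "enabled", budget_tokens: budgetTokens },
  };
}

// Usage: reproduces the request above (10K thinking, 16K total).
const params = thinkingParams(10_000, 6_000);
console.log(params.max_tokens); // 16000
```

Spread the result into `client.messages.create({ model, messages, ...params })` so both numbers always change together.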
| Task | Typical budget (tokens) |
|---|---|
| Short multi-step reasoning | 2,000-5,000 |
| Code generation with planning | 5,000-10,000 |
| Complex math/proofs | 10,000-32,000 |
| Deep agent planning | 10,000-20,000 |
| Research synthesis | 16,000-32,000 |
Start at 5,000 and measure. Bigger budget ≠ better answers past a point.
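One way to "measure" is to run the same eval set at several budgets and keep the smallest budget whose quality is close to the best. The following is a sketch under stated assumptions. `sweepBudgets` and the `evaluate` callback are hypothetical. You supply `evaluate`, for example a function that runs your eval prompts at that budget and returns an average score. The `tolerance` threshold is an arbitrary illustrative default.

```typescript
// Hypothetical harness: score several thinking budgets and pick the smallest one
// whose score is within `tolerance` of the best. Nothing here calls the API directly;
// `evaluate` is your own function (e.g. average eval score in [0, 1] at that budget).
interface BudgetResult {
  budget: number;
  score: number;
}

async function sweepBudgets(
  budgets: number[],
  evaluate: (budget: number) => Promise<number>,
  tolerance = 0.01,
): Promise<{ results: BudgetResult[]; chosen: number }> {
  if (budgets.length === 0) throw new Error("need at least one budget");
  const sorted = [...budgets].sort((a, b) => a - b);
  const results: BudgetResult[] = [];
  for (const budget of sorted) {
    // Sequential on purpose: keeps rate-limit behavior predictable.
    results.push({ budget, score: await evaluate(budget) });
  }
  const best = Math.max(...results.map((r) => r.score));
  // Smallest budget that is "good enough" relative to the best score.
  const chosen = results.find((r) => r.score >= best - tolerance)!.budget;
  return { results, chosen };
}
```

Usage might look like `await sweepBudgets([2_000, 5_000, 10_000, 16_000], myEval)`. When scores plateau, the harness returns the cheaper budget rather than the largest one.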
```typescript
// Separate the reasoning trace from the final answer.
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("REASONING:", block.thinking);
  } else if (block.type === "text") {
    console.log("ANSWER:", block.text);
  }
}
```
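In practice, you usually want the reasoning and the answer as two separate strings: one for logs, one for the user. The sketch below uses minimal local types in place of the SDK's own. It also counts `redacted_thinking` blocks, which the API can return as encrypted reasoning that cannot be displayed. `splitResponse` is a hypothetical helper name.

```typescript
// Minimal local types mirroring the content blocks used here; with the SDK you
// could use its own types instead.
type Block =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface SplitResponse {
  reasoning: string; // visible thinking, for logs and debugging only
  redactedBlocks: number; // encrypted thinking that cannot be read
  answer: string; // what you would show the user
}

function splitResponse(content: Block[]): SplitResponse {
  const reasoning: string[] = [];
  const answer: string[] = [];
  let redactedBlocks = 0;
  for (const block of content) {
    switch (block.type) {
      case "thinking":
        reasoning.push(block.thinking);
        break;
      case "redacted_thinking":
        redactedBlocks += 1;
        break;
      case "text":
        answer.push(block.text);
        break;
      default:
        break; // tool_use and other blocks are handled elsewhere
    }
  }
  return { reasoning: reasoning.join("\n\n"), redactedBlocks, answer: answer.join("") };
}
```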
The thinking block reveals the model's reasoning. Useful for:
Do not show thinking blocks to end users verbatim in production; they are not polished prose. And do not modify them before passing them back in a multi-turn conversation, or signature validation will fail.
With the `interleaved-thinking-2025-05-14` beta, the model can think between tool calls, reasoning about each tool result before picking the next step:
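```typescript
// searchTool, fetchTool and summarizeTool are your own tool definitions
// (name, description, input_schema), declared elsewhere.
const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 16_000,
    thinking: { type: "enabled", budget_tokens: 10_000 },
    tools: [searchTool, fetchTool, summarizeTool],
    messages: [{ role: "user", content: "Research X and write a brief." }],
  },
  // Request options: opt in to interleaved thinking via the beta header.
  { headers: { "anthropic-beta": "interleaved-thinking-2025-05-14" } },
);
```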
Without interleaved thinking, the model only thinks once at the start. With it, the model can reassess after every tool result — critical for agents that operate under uncertainty.
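A tool-use loop only benefits from this if each assistant turn, thinking blocks included, goes back to the API unchanged. Tool results go back as the next user turn. Below is a minimal sketch with the model call and tool execution injected as hypothetical functions (`callModel`, `runTool`), so the loop logic can be read and tested without network access. In a real agent, `callModel` would wrap the `client.messages.create(...)` call above.

```typescript
// Minimal local types; the SDK's own types could replace these.
type ContentBlock =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

type Message =
  | { role: "user"; content: string | ToolResult[] }
  | { role: "assistant"; content: ContentBlock[] };

interface ModelReply {
  stop_reason: string;
  content: ContentBlock[];
}

// Hypothetical agent loop: callModel and runTool are injected by the caller.
async function runAgent(
  prompt: string,
  callModel: (messages: Message[]) => Promise<ModelReply>,
  runTool: (name: string, input: unknown) => Promise<string>,
  maxSteps = 10,
): Promise<{ reply: ModelReply; messages: Message[] }> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages);
    // Append the assistant turn unchanged: thinking blocks and signatures included.
    messages.push({ role: "assistant", content: reply.content });
    if (reply.stop_reason !== "tool_use") return { reply, messages };

    const results: ToolResult[] = [];
    for (const block of reply.content) {
      if (block.type === "tool_use") {
        const output = await runTool(block.name, block.input);
        results.push({ type: "tool_result", tool_use_id: block.id, content: output });
      }
    }
    // Tool results go back as a user turn; with interleaving the model can think again.
    messages.push({ role: "user", content: results });
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}
```

The `maxSteps` cap is a design choice for this sketch: an agent that keeps calling tools should fail loudly rather than run up thinking tokens indefinitely.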
When continuing a conversation that included thinking, pass the assistant's full message back unchanged (including thinking blocks):
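```typescript
// `messages` holds the conversation so far, starting with the original user prompt.
messages.push({ role: "assistant", content: response.content }); // thinking blocks included, unmodified
// The converse is false (25 = 6·4 + 1), so ask for a proof or a counterexample.
messages.push({ role: "user", content: "Great. Is the converse also true? Prove it or give a counterexample." });

const next = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages,
});
```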
Thinking blocks carry signatures that the API validates. Reordering or editing them breaks the request.
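When a follow-up request fails signature validation, a quick debugging step is to compare the thinking blocks you are about to send with the ones the API returned. This is a hypothetical guard, not an SDK feature. It checks that the same `thinking` and `redacted_thinking` blocks appear in the same order, with identical text, signature, and data fields.

```typescript
// Hypothetical debugging guard: are the outgoing thinking blocks identical, in order,
// to the ones the API returned?
type AnyBlock = { type: string; [key: string]: unknown };

function thinkingBlocksIntact(received: AnyBlock[], outgoing: AnyBlock[]): boolean {
  const isThinking = (b: AnyBlock) => b.type === "thinking" || b.type === "redacted_thinking";
  // Compare only the fields that matter for validation, independent of key order.
  const key = (b: AnyBlock) =>
    JSON.stringify([b.type, b.thinking ?? null, b.signature ?? null, b.data ?? null]);
  const a = received.filter(isThinking).map(key);
  const b = outgoing.filter(isThinking).map(key);
  return a.length === b.length && a.every((k, i) => k === b[i]);
}
```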
Thinking tokens are billed as output tokens. A call with 10K thinking + 2K answer costs 12K output tokens.
Rough rule: thinking makes a reasoning-heavy call two to three times as expensive. Confirm it is worth it by A/B testing the same prompts with and without thinking.
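The arithmetic above is easy to encode for a quick estimate. This is a sketch that takes prices as parameters, because they differ by model and change over time; no real prices are assumed here. `thinkingMultiplier` also assumes the answer is the same length with and without thinking, which you should check in your own A/B test.

```typescript
// Rough cost model: thinking tokens are billed as output tokens.
interface Usage {
  inputTokens: number;
  thinkingTokens: number;
  answerTokens: number;
}

// Prices are USD per million tokens, supplied by the caller.
function costUsd(u: Usage, inputPerMTok: number, outputPerMTok: number): number {
  const outputTokens = u.thinkingTokens + u.answerTokens; // 10K + 2K -> 12K output tokens
  return (u.inputTokens * inputPerMTok + outputTokens * outputPerMTok) / 1_000_000;
}

// Ratio of a thinking call's cost to the same call with zero thinking tokens.
function thinkingMultiplier(u: Usage, inputPerMTok: number, outputPerMTok: number): number {
  const without = costUsd({ ...u, thinkingTokens: 0 }, inputPerMTok, outputPerMTok);
  return costUsd(u, inputPerMTok, outputPerMTok) / without;
}
```

If the multiplier your traffic produces is well above the rough two-to-three-times rule, that is a signal to lower `budget_tokens` or disable thinking for that route.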
```typescript
const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages, // the conversation so far
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "thinking_delta") {
    // event.delta.thinking carries reasoning text: show a "thinking..." spinner or subtle text
  } else if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
```
In UX, show a distinct "thinking" indicator, then switch to streaming the answer.
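That UX switch can be driven by a small reducer over stream events, kept separate from rendering. This is a hypothetical sketch. `StreamEvent` declares only the fields the reducer reads, the phase names are illustrative, and events it does not recognize (signature deltas, block stops, pings) leave the phase unchanged.

```typescript
// Hypothetical UI reducer: map stream events to a display phase.
type Phase = "waiting" | "thinking" | "answering" | "done";

// Only the event fields this reducer reads; real SDK events carry more.
interface StreamEvent {
  type: string;
  content_block?: { type: string };
  delta?: { type: string };
}

function nextPhase(phase: Phase, event: StreamEvent): Phase {
  if (event.type === "message_stop") return "done";
  const blockType = event.type === "content_block_start" ? event.content_block?.type : undefined;
  const deltaType = event.type === "content_block_delta" ? event.delta?.type : undefined;
  if (blockType === "thinking" || blockType === "redacted_thinking" || deltaType === "thinking_delta") {
    return "thinking"; // show the distinct thinking indicator
  }
  if (blockType === "text" || deltaType === "text_delta") {
    return "answering"; // hide the indicator, stream the answer
  }
  return phase; // anything else keeps the current phase
}
```

In the streaming loop above, you would call `phase = nextPhase(phase, event)` for every event and re-render only when the phase changes.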