Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

latestaiagents/extended-thinking

Name: extended-thinking
Author: latestaiagents

skills/claude-4-6-features/extended-thinking/SKILL.md

npx skillsauth add latestaiagents/agent-skills extended-thinking

Extended Thinking

Extended thinking gives the model a scratchpad before the final answer. Pay for reasoning tokens, get deeper answers. Use it surgically, not everywhere.

When to Use

Complex reasoning: math, proofs, multi-step logic
Code generation where the model needs to plan before writing
Agent planning: deciding which of many tools to call and in what order
Debugging subtle issues: model can "think through" root causes
Writing where structure and coherence matter more than speed

When NOT to Use

Simple classification, extraction, or formatting — pure overhead
Latency-sensitive paths — thinking adds 2-30 seconds
Small prompts where the model gets it right zero-shot anyway
High-volume batch tasks on a tight cost budget
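The two lists above can be folded into a small gate that decides, per request, whether to send the `thinking` parameter at all. This is a minimal sketch: the task kinds and the `latencySensitive` and `tightCostBudget` flags are hypothetical labels chosen for illustration, not part of any API.

```javascript
// Hypothetical task kinds, mirroring the "When to Use" list.
const THINKING_KINDS = new Set([
  "math",
  "codegen",
  "agent-planning",
  "debugging",
  "long-form-writing",
]);

// Decide whether a request should enable extended thinking.
// Latency-sensitive or cost-constrained paths never think, whatever the kind.
function shouldEnableThinking(task) {
  if (task.latencySensitive || task.tightCostBudget) return false;
  return THINKING_KINDS.has(task.kind);
}

// Build the optional `thinking` field to spread into a request body.
function thinkingParams(task, budgetTokens = 5_000) {
  return shouldEnableThinking(task)
    ? { thinking: { type: "enabled", budget_tokens: budgetTokens } }
    : {};
}
```

A request body could then be assembled as `{ model, max_tokens, ...thinkingParams(task), messages }`, so easy tasks go out without the parameter.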

Enabling Thinking

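import Anthropic from "@anthropic-ai/sdk";

// The client reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000, // total output cap: thinking plus answer
  thinking: {
    type: "enabled",
    budget_tokens: 10_000, // upper bound on thinking tokens
  },
  messages: [{ role: "user", content: "Prove that every prime > 3 is of the form 6k±1." }],
});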

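budget_tokens is the maximum number of tokens the model may spend on thinking; it may use fewer. The API's minimum budget is 1,024 tokens. max_tokens must be greater than budget_tokens, because thinking tokens count toward the max_tokens total.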
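The constraint above can be checked before a request is sent. The following is a sketch, not SDK functionality: `checkThinkingLimits` and `answerHeadroom` are hypothetical helper names, and the 1,024-token minimum is taken from Anthropic's documentation and should be verified against the current version.

```javascript
// Minimum thinking budget (assumption: value from Anthropic's docs).
const MIN_BUDGET_TOKENS = 1_024;

// Return a list of problems with a max_tokens / budget_tokens pair.
// An empty list means the pair satisfies both constraints.
function checkThinkingLimits(maxTokens, budgetTokens) {
  const problems = [];
  if (budgetTokens < MIN_BUDGET_TOKENS) {
    problems.push(`budget_tokens must be at least ${MIN_BUDGET_TOKENS}`);
  }
  if (maxTokens <= budgetTokens) {
    problems.push("max_tokens must be greater than budget_tokens");
  }
  return problems;
}

// Tokens left for the visible answer if the model spends its whole budget.
function answerHeadroom(maxTokens, budgetTokens) {
  return maxTokens - budgetTokens;
}
```

With the values used above (max_tokens 16,000 and budget_tokens 10,000), at least 6,000 tokens remain for the answer even if the model uses its full thinking budget.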

Budget Sizing

| Task | Typical budget |
|---|---|
| Short multi-step reasoning | 2,000-5,000 |
| Code generation with planning | 5,000-10,000 |
| Complex math/proofs | 10,000-32,000 |
| Deep agent planning | 10,000-20,000 |
| Research synthesis | 16,000-32,000 |

Start at 5,000 and measure. Bigger budget ≠ better answers past a point.
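One way to turn the sizing guidance into code is a lookup that starts each task at the low end of its range and raises the budget in steps only while measurements justify it. This is a sketch under assumptions: the task keys, the doubling factor, and the 32,000 fallback ceiling are illustrative choices, not values prescribed by the API.

```javascript
// Typical budget ranges in tokens, keyed by hypothetical task names.
const BUDGET_RANGES = new Map([
  ["short-reasoning", [2_000, 5_000]],
  ["codegen", [5_000, 10_000]],
  ["math-proofs", [10_000, 32_000]],
  ["agent-planning", [10_000, 20_000]],
  ["research-synthesis", [16_000, 32_000]],
]);

// Start at the low end of the range; fall back to 5,000 for unknown tasks.
function startingBudget(taskKind) {
  const range = BUDGET_RANGES.get(taskKind);
  return range ? range[0] : 5_000;
}

// Raise the budget one step, never past the top of the task's range.
function nextBudget(taskKind, current, factor = 2) {
  const range = BUDGET_RANGES.get(taskKind);
  const ceiling = range ? range[1] : 32_000;
  return Math.min(Math.round(current * factor), ceiling);
}
```

Capping at the top of the range reflects the point made above: past a certain size, a larger budget adds cost without improving answers.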

Inspecting the Thinking Trace

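// Walk the content blocks in order; thinking blocks precede the answer text.
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("REASONING:", block.thinking);
  } else if (block.type === "redacted_thinking") {
    // Encrypted reasoning: not readable, but keep the block for multi-turn.
    console.log("REASONING: [redacted]");
  } else if (block.type === "text") {
    console.log("ANSWER:", block.text);
  }
}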

The thinking block reveals the model's reasoning. Useful for:

Debugging why the model chose a wrong answer
Surfacing rationale to power users (e.g., "show reasoning" toggle)
Auditing agent decisions

Do not show thinking blocks to end users as-is in production; they are not polished prose. And do not modify them before passing them back in a multi-turn conversation, because signature validation will fail.
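Both rules can be respected by deriving a display string from the content blocks without touching the blocks themselves. A minimal sketch follows; `splitContent`, `renderForUser`, and the `showReasoning` option are hypothetical names for illustration.

```javascript
// Separate a response's content blocks into reasoning and answer text.
// The input array is only read, never modified, so it can still be
// passed back to the API unchanged. Other block types are ignored here.
function splitContent(content) {
  const reasoning = [];
  const answer = [];
  for (const block of content) {
    if (block.type === "thinking") reasoning.push(block.thinking);
    else if (block.type === "text") answer.push(block.text);
  }
  return { reasoning: reasoning.join("\n"), answer: answer.join("") };
}

// Show the answer always; show the reasoning only behind an explicit toggle.
function renderForUser(content, { showReasoning = false } = {}) {
  const { reasoning, answer } = splitContent(content);
  if (showReasoning && reasoning) {
    return `Reasoning:\n${reasoning}\n\nAnswer:\n${answer}`;
  }
  return answer;
}
```

Keeping the rendering separate from the stored `response.content` means a "show reasoning" toggle cannot accidentally alter what is sent back on the next turn.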

Interleaved Thinking with Tools

With the interleaved-thinking-2025-05-14 beta, the model thinks between tool calls — reasoning about each tool result before picking the next:

const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 16_000,
    thinking: { type: "enabled", budget_tokens: 10_000 },
    tools: [searchTool, fetchTool, summarizeTool],
    messages: [{ role: "user", content: "Research X and write a brief." }],
  },
  { headers: { "anthropic-beta": "interleaved-thinking-2025-05-14" } },
);

Without interleaved thinking, the model only thinks once at the start. With it, the model can reassess after every tool result — critical for agents that operate under uncertainty.
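The request above is one step of a loop: the caller runs each requested tool and sends the result back so the model can reassess. The sketch below shows that loop with the API call and tool execution injected as hypothetical functions, `callModel` and `runTool`, so the control flow can be read without network access. The block shapes (`tool_use`, `tool_result`, `stop_reason`) follow the Messages API; everything else is an assumption.

```javascript
// Run a tool-use loop, passing every assistant turn back unchanged so that
// thinking blocks keep their signatures.
// `callModel(messages)` would wrap client.messages.create in real code;
// `runTool(name, input)` executes one tool. Both are supplied by the caller.
async function runAgent(callModel, runTool, userPrompt, maxSteps = 10) {
  const messages = [{ role: "user", content: userPrompt }];
  for (let step = 0; step < maxSteps; step++) {
    const response = await callModel(messages);
    // Append the full content array, thinking blocks included, untouched.
    messages.push({ role: "assistant", content: response.content });
    if (response.stop_reason !== "tool_use") return { response, messages };

    // Answer every tool_use block with a matching tool_result.
    const results = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const output = await runTool(block.name, block.input);
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: String(output),
      });
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

The `maxSteps` cap is a design choice for this sketch: it bounds cost if the model keeps requesting tools without converging.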

Multi-Turn Conversations

When continuing a conversation that included thinking, pass the assistant's full message back unchanged (including thinking blocks):

messages.push({ role: "assistant", content: response.content });
messages.push({ role: "user", content: "Great. Now prove the converse." });

const next = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages,
});

Thinking blocks carry signatures that the API validates. Reordering or editing them breaks the request.

Cost

Thinking tokens are billed as output tokens. A call with 10K thinking + 2K answer costs 12K output tokens.

As a rough rule, thinking doubles or triples the cost of a reasoning-heavy call. Confirm that it is worth it by A/B testing against the same call with thinking disabled.
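The billing arithmetic is simple enough to encode directly when estimating spend. In this sketch the price is a parameter rather than a constant, because per-token prices vary by model and over time; `usdPerMillionOutputTokens` is a placeholder to be filled in from the current price list.

```javascript
// Output tokens billed for one call: thinking and answer both count as output.
function billedOutputTokens(thinkingTokens, answerTokens) {
  return thinkingTokens + answerTokens;
}

// Output-side cost in USD for one call, given a caller-supplied price.
function outputCostUsd(thinkingTokens, answerTokens, usdPerMillionOutputTokens) {
  const tokens = billedOutputTokens(thinkingTokens, answerTokens);
  return (tokens * usdPerMillionOutputTokens) / 1_000_000;
}
```

For the example above, `billedOutputTokens(10_000, 2_000)` is 12,000. At a purely hypothetical price of $15 per million output tokens, that is $0.18 of output cost for the call; input tokens are billed separately and are not included here.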

Streaming

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16_000,
  thinking: { type: "enabled", budget_tokens: 10_000 },
  messages: [...],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "thinking_delta") {
    // show a "thinking..." spinner or subtle text
  } else if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

In UX, show a distinct "thinking" indicator, then switch to streaming the answer.
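The indicator-then-answer behavior can be modeled as a small state tracker driven by the delta types in the stream. The following is a sketch: `createPhaseTracker` and the action names it returns are hypothetical, and a real UI would map those actions onto its own components.

```javascript
// Track which phase of the stream we are in, based on delta types.
// Returns a function that maps each stream event to a UI action, or null
// when nothing needs to change on screen.
function createPhaseTracker() {
  let phase = "idle"; // becomes "thinking" or "answer" as deltas arrive
  return function onEvent(event) {
    if (event.type !== "content_block_delta") return null;
    if (event.delta.type === "thinking_delta") {
      if (phase === "thinking") return null; // indicator already visible
      phase = "thinking";
      return { action: "show-indicator" };
    }
    if (event.delta.type === "text_delta") {
      const first = phase !== "answer";
      phase = "answer";
      return {
        action: first ? "hide-indicator-and-append" : "append",
        text: event.delta.text,
      };
    }
    return null; // other delta types (e.g. signature_delta) need no UI change
  };
}
```

Inside the `for await` loop above, each event would be passed to the tracker and the returned action applied. Because the phase can switch back to thinking, the same tracker also covers streams where thinking resumes between tool calls.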

Anti-Patterns

Thinking on every request — massive cost increase with no quality gain for easy tasks
Tiny budget on hard tasks: a budget near the 1,024-token minimum isn't enough for real reasoning, so the model cuts its thinking short
Modifying thinking blocks between turns — breaks signature; API rejects
Showing raw thinking to end users in production — messy, not polished

Best Practices

Gate thinking on task complexity — classify the query first, enable thinking only for hard ones
Start budget at 5K; measure quality and cost; adjust
Enable interleaved thinking for any agent with > 3 tools
Pass thinking blocks back unmodified in multi-turn
Stream the response so users see progress during long thinks
Log thinking tokens used per request — detect when the model is consistently maxing out budget

Use Claude's extended thinking (reasoning) mode effectively — budget tokens, interleaved thinking with tool use, when it helps, when it wastes tokens, and how to inspect the thinking trace. Use this skill when building reasoning-heavy features (math, code generation, multi-step planning), debugging why a model is shallow on hard problems, or deciding whether to enable thinking. Activate when: extended thinking, thinking tokens, budget_tokens, reasoning mode, interleaved thinking, thinking blocks.

2 stars

tools

Updated Apr 23, 2026

npx skillsauth add latestaiagents/agent-skills extended-thinking

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean: Trivy, container and dependency vulnerability scanner (95%)
Clean: Semgrep, static code analysis for vulnerabilities (95%)
Clean: mcp-scan (Snyk), Model Context Protocol security validation (95%)
Skipped: Snyk (dep), open source security scanning (50%)
Skipped: Socket.dev, supply chain security analysis (50%)
Skipped: VirusTotal, multi-engine malware detection (50%)
Skipped: CrowdStrike, advanced threat intelligence (50%)
Skipped: OSV-Scanner, Open Source Vulnerability database check (50%)
Skipped: OWASP Dep-Check (50%)

Last scanned: Apr 24, 2026, 2:55 AM · 8.6s · 1 file scanned

SKILL.md

name: extended-thinking
description: |
Activate when: extended thinking, thinking tokens, budget_tokens, reasoning mode, interleaved thinking, thinking blocks.

Related Skills

latestaiagents/skill-testing

development

Verified · Trusted · Community

Test skills for correct activation, content quality, and regression — both automated checks (frontmatter validity, lint) and manual verification (query-suite activation testing). Covers CI integration and how to catch skill regressions before users do. Use this skill when adding skills to a repo, setting up CI for a skill library, or debugging "the skill exists but doesn't work". Activate when: test skills, validate skills, skill CI, skill linting, skill activation test, skill regression.

2 · SKILL.md · Updated Apr 23, 2026

latestaiagents/skill-frontmatter

documentation

Verified · Trusted · Community

Write the YAML frontmatter for a SKILL.md file so it activates reliably — name, description, and activation keywords that the model matches against. Covers length, tone, and the most common frontmatter mistakes. Use this skill when authoring a new skill, fixing a skill that isn't auto-activating, or reviewing skills for publication. Activate when: SKILL.md frontmatter, skill description, skill activation, skill YAML, write a skill, author a skill.

2 · SKILL.md · Updated Apr 23, 2026

latestaiagents/skill-activation-patterns

development

Verified · Trusted · Community

Design skills that fire at the right moment — neither over-eager (noise) nor under-eager (silent). Covers activation specificity, trigger phrases, disambiguation between overlapping skills, and debugging activation. Use this skill when multiple skills could fire on the same query, a skill never fires, or a skill fires too often. Activate when: skill won't activate, skill over-activates, overlapping skills, skill triggers, skill selection, skill disambiguation.

2 · SKILL.md · Updated Apr 23, 2026

latestaiagents/progressive-disclosure

development

Verified · Trusted · Community

Structure SKILL.md content so the model reads just enough — concise summary up front, progressively deeper detail, examples on demand. Covers section ordering, length budgets, when to split into multiple skills. Use this skill when writing or refactoring a skill body, one skill has grown too long, or a skill is wordy but not useful. Activate when: SKILL.md structure, skill content, skill too long, split skill, progressive disclosure, skill body.

2 · SKILL.md · Updated Apr 23, 2026

Install manually

# Clone the repo
git clone https://github.com/latestaiagents/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/claude-4-6-features/extended-thinking ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

latestaiagents/agent-skills

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT