latestaiagents/long-context-1m

Name: long-context-1m
Author: latestaiagents

skills/claude-4-6-features/long-context-1m/SKILL.md

npx skillsauth add latestaiagents/agent-skills long-context-1m

1M Context Window

Claude Opus 4.6 and Sonnet 4.6 support 1M token context with the context-1m-2025-08-07 beta header. Use it well or burn money for nothing.

When to Use

You have a codebase, book, log bundle, or document set that fits in 1M tokens
You need cross-document reasoning that chunked RAG can't deliver
You're deciding between 1M-context vs a RAG pipeline
You want to cache a giant system prompt / knowledge base across requests

Enabling 1M Context

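```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

// `giantDocument` is assumed to be a string you have already loaded.
const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    messages: [{ role: "user", content: giantDocument + "\n\nSummarize." }],
  },
  // Per-request options: send the 1M-context beta header.
  { headers: { "anthropic-beta": "context-1m-2025-08-07" } },
);
```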

Without the beta header, requests over 200K tokens will error.

Pricing Tiers

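Long context is priced differently once a request exceeds 200K input tokens. Check your provider's current rates; as a rule of thumb, input in this tier costs ~2× the base rate. Output may also carry a premium in this tier (Anthropic has listed roughly 1.5×), so check both rates rather than assuming output is unchanged.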

Rule: if you're only going to use 200K, don't enable 1M. Only pay for long-context pricing when you actually need > 200K.
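To make the pricing rule concrete, here is a minimal sketch of an input-cost estimate. Everything in it is an assumption for illustration: the `estimateInputCostUsd` helper and the `LongContextPricing` shape are hypothetical, and the rates are whatever you pass in, not published prices. Because providers differ on whether the premium applies to the whole request or only to the tokens above the threshold, the sketch takes that as a flag.

```typescript
// Hypothetical pricing description; fill in real numbers from your provider's rate card.
interface LongContextPricing {
  baseInputUsdPerMTok: number;    // base input rate, USD per million tokens (assumed)
  longContextMultiplier: number;  // multiplier in the long-context tier (~2 per the rule of thumb)
  thresholdTokens: number;        // 200_000 per the text above
  premiumOnWholeRequest: boolean; // true: all input tokens billed at the premium once over threshold
}

function estimateInputCostUsd(inputTokens: number, p: LongContextPricing): number {
  const perToken = p.baseInputUsdPerMTok / 1_000_000;
  // At or below the threshold, everything is billed at the base rate.
  if (inputTokens <= p.thresholdTokens) return inputTokens * perToken;
  // Over the threshold: either the whole request moves to the premium rate...
  if (p.premiumOnWholeRequest) return inputTokens * perToken * p.longContextMultiplier;
  // ...or only the tokens above the threshold do.
  const over = inputTokens - p.thresholdTokens;
  return p.thresholdTokens * perToken + over * perToken * p.longContextMultiplier;
}
```

Running the same prompt size through both settings of `premiumOnWholeRequest` shows how much the billing model matters before you commit to a design.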

1M Context vs RAG

| When 1M context wins | When RAG wins |
|---|---|
| Cross-document synthesis | Fresh data that updates hourly |
| Full-codebase refactoring | Unbounded corpus (> 1M tokens) |
| Holistic code review | Per-user personal data (privacy isolation) |
| Single-shot analysis | Many cheap lookups on small queries |
| Exploration where you don't know what's relevant | Known query patterns |

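Hybrid: use RAG to retrieve the most relevant ~500K tokens, then place those in the 1M context. You get retrieval's reach over a large corpus and long context's cross-document reasoning.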
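The hybrid approach comes down to filling a token budget with the best-ranked retrieval results. The sketch below is one simple way to do that, assuming you already have relevance scores and per-chunk token counts. The `RankedChunk` shape and `packChunks` helper are hypothetical names, and the greedy strategy is an illustrative choice rather than a recommended algorithm.

```typescript
// Hypothetical shape for a retrieved chunk.
interface RankedChunk {
  id: string;
  text: string;
  tokens: number; // token count for this chunk, measured beforehand
  score: number;  // retrieval relevance, higher is better
}

// Greedily pick the highest-scoring chunks that still fit in the budget.
function packChunks(chunks: RankedChunk[], budgetTokens: number): RankedChunk[] {
  const ranked = [...chunks].sort((a, b) => b.score - a.score); // copy, do not mutate the input
  const picked: RankedChunk[] = [];
  let used = 0;
  for (const chunk of ranked) {
    // Skip a chunk that would overflow the budget, but keep trying smaller ones.
    if (used + chunk.tokens > budgetTokens) continue;
    picked.push(chunk);
    used += chunk.tokens;
  }
  return picked;
}
```

With a budget of 500_000 this returns the selection to place in the prompt. The result is ordered by score, so re-sort by document position first if reading order matters for your task.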

Structuring Long Inputs for Recall

Claude's long-context recall is strong but not uniform. Tips:

Put the instruction at the end: "Given the above, answer X" is recalled more reliably than instruction-then-context.
Mark sections with XML tags such as <document index="1" title="...">...</document>, which the model uses as landmarks.
Repeat critical instructions: once at the top, once at the bottom.
Avoid homogeneous blobs: chunk with delimiters, because recall degrades in undifferentiated text.

```typescript
const prompt = `
You will analyze the codebase below, then answer questions.

<codebase>
<file path="src/auth.ts">...</file>
<file path="src/db.ts">...</file>
...
</codebase>

Given the codebase above, answer: <question>How does auth flow work?</question>
`;
```
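The template above can be generated from a list of files rather than written by hand. This is a minimal sketch under stated assumptions: `SourceFile`, `escapeAttr`, and `buildCodebasePrompt` are hypothetical helpers, and the tag names simply mirror the example.

```typescript
// Hypothetical input shape: a file path plus its contents.
interface SourceFile {
  path: string;
  content: string;
}

// Escape characters that would break the path="..." attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Wrap each file in its own <file> tag and put the question last.
function buildCodebasePrompt(files: SourceFile[], question: string): string {
  const body = files
    .map((f) => `<file path="${escapeAttr(f.path)}">\n${f.content}\n</file>`)
    .join("\n");
  return [
    "You will analyze the codebase below, then answer questions.",
    "",
    "<codebase>",
    body,
    "</codebase>",
    "",
    `Given the codebase above, answer: <question>${question}</question>`,
  ].join("\n");
}
```

File contents are inserted verbatim, so a file that itself contains a literal `</file>` would confuse the structure. Handling that case is left out of this sketch.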

Combine with Prompt Caching

1M context is expensive per call. If you're asking multiple questions against the same corpus, cache it:

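```typescript
// `client` is the Anthropic client created earlier; `giantCodebase` is assumed to be a preloaded string.
const response = await client.messages.create(
  {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    system: [
      { type: "text", text: "You are a code reviewer." },
      // Mark the large, static block as cacheable for one hour.
      { type: "text", text: giantCodebase, cache_control: { type: "ephemeral", ttl: "1h" } },
    ],
    messages: [{ role: "user", content: "What's the auth flow?" }],
  },
  // Still required when the cached corpus pushes the request over 200K tokens.
  { headers: { "anthropic-beta": "context-1m-2025-08-07" } },
);
```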

First call: full input cost plus a cache-write premium, which is higher for the 1h TTL than for the default TTL. Subsequent calls in the TTL window: ~10% of input cost for the cached portion. See the prompt-caching-ttl skill.
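Whether caching pays off depends on how many questions you ask within the TTL window. The sketch below works that out in relative units, where 1 is the cost of sending the corpus once uncached. The `CacheRates` multipliers are assumptions you supply (the text above suggests about 0.1 for reads), the helper names are hypothetical, and the small per-question cost of the uncached user message is ignored.

```typescript
// Assumed multipliers, relative to one uncached pass over the corpus.
interface CacheRates {
  writeMultiplier: number; // cost of the first call, which writes the cache
  readMultiplier: number;  // cost of each later call that hits the cache
}

// Relative input cost of `questions` calls when the corpus is cached.
function cachedCost(questions: number, r: CacheRates): number {
  if (questions <= 0) return 0;
  return r.writeMultiplier + (questions - 1) * r.readMultiplier;
}

// Relative input cost of `questions` calls with no caching at all.
function uncachedCost(questions: number): number {
  return Math.max(questions, 0);
}

// Smallest number of questions at which caching is strictly cheaper, or null if none is found.
function breakEvenQuestions(r: CacheRates, maxQuestions = 1000): number | null {
  for (let n = 1; n <= maxQuestions; n++) {
    if (cachedCost(n, r) < uncachedCost(n)) return n;
  }
  return null;
}
```

If the write multiplier were 2 and the read multiplier 0.1, caching would start to win at the third question. Substitute your provider's actual figures before relying on the result.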

Latency

A 1M-token input takes longer to process: time to first token (TTFT) can be 10-30s for a full context. To mitigate:

Stream the response so the user sees output early
Enable extended thinking only when the task needs deeper reasoning, since it adds latency
Cache so subsequent calls skip the bulk of input processing

Token Accounting

Use the token counter before sending:

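```typescript
// `client` is the Anthropic client created earlier; `text` is the prompt you plan to send.
const { input_tokens } = await client.messages.countTokens({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: text }],
});

const CONTEXT_LIMIT = 1_000_000;
// Pad the count by 5% so small tokenization differences do not push the request over.
if (input_tokens * 1.05 > CONTEXT_LIMIT) throw new Error("Over context limit");
```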

Budget with a 5% safety margin — actual tokenization varies slightly.
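The count can also decide whether a request needs the beta header at all. The sketch below is one way to express that decision. The `chooseRoute` helper and its `Route` labels are hypothetical, and it assumes the context window has to hold the padded input plus the requested output tokens.

```typescript
type Route = "standard" | "long-context" | "too-large";

const STANDARD_WINDOW = 200_000;       // threshold from the text above
const LONG_CONTEXT_WINDOW = 1_000_000; // 1M-context beta limit

// Decide how to send a request, given a measured token count.
function chooseRoute(inputTokens: number, maxOutputTokens: number, safetyMargin = 0.05): Route {
  // Pad the input by the safety margin, then reserve room for the response.
  const padded = Math.ceil(inputTokens * (1 + safetyMargin)) + maxOutputTokens;
  if (padded <= STANDARD_WINDOW) return "standard";         // no beta header needed
  if (padded <= LONG_CONTEXT_WINDOW) return "long-context"; // attach the beta header
  return "too-large";                                       // trim or retrieve a subset first
}
```

A caller would attach the `anthropic-beta` header only when the result is `"long-context"`, and fall back to trimming or retrieval when it is `"too-large"`.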

Anti-Patterns

Always-on 1M: sending the beta header on every request regardless of size. Oversized prompts then go through at long-context rates instead of failing fast, which wastes money
Dump-and-pray: no structure in the input. Model can't find what it needs
Instruction at top only: with 900K of content below, the instruction gets diluted
No caching for repeat queries: every question against the same document pays full price

Best Practices

Check the token count before enabling 1M beta
Cache the static portion (the document/codebase) with 1h TTL
Put actionable instructions at the end, not the start
Use XML tags to structure sections
Stream responses — users don't want to wait 30s staring at nothing
When you will ask many questions of the same large corpus, prefer cached 1M context over RAG

Use Claude's 1M-token context window effectively — when to use it, how to structure inputs for recall, how to price it, and how to combine with prompt caching to keep it affordable. Use this skill when building apps that feed large codebases, long documents, or entire conversation histories to Claude, or when weighing 1M context vs RAG. Activate when: 1M context, long context, big context window, context vs RAG, Claude 1 million tokens, context-beta header.

2 stars

development

Updated Apr 23, 2026

npx skillsauth add latestaiagents/agent-skills long-context-1m

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 2:55 AM (13.7s, 1 file scanned)

Related Skills

latestaiagents/skill-testing

development

Test skills for correct activation, content quality, and regression — both automated checks (frontmatter validity, lint) and manual verification (query-suite activation testing). Covers CI integration and how to catch skill regressions before users do. Use this skill when adding skills to a repo, setting up CI for a skill library, or debugging "the skill exists but doesn't work". Activate when: test skills, validate skills, skill CI, skill linting, skill activation test, skill regression.

latestaiagents/skill-frontmatter

documentation

Write the YAML frontmatter for a SKILL.md file so it activates reliably — name, description, and activation keywords that the model matches against. Covers length, tone, and the most common frontmatter mistakes. Use this skill when authoring a new skill, fixing a skill that isn't auto-activating, or reviewing skills for publication. Activate when: SKILL.md frontmatter, skill description, skill activation, skill YAML, write a skill, author a skill.

latestaiagents/skill-activation-patterns

development

Design skills that fire at the right moment — neither over-eager (noise) nor under-eager (silent). Covers activation specificity, trigger phrases, disambiguation between overlapping skills, and debugging activation. Use this skill when multiple skills could fire on the same query, a skill never fires, or a skill fires too often. Activate when: skill won't activate, skill over-activates, overlapping skills, skill triggers, skill selection, skill disambiguation.

latestaiagents/progressive-disclosure

development

Structure SKILL.md content so the model reads just enough — concise summary up front, progressively deeper detail, examples on demand. Covers section ordering, length budgets, when to split into multiple skills. Use this skill when writing or refactoring a skill body, one skill has grown too long, or a skill is wordy but not useful. Activate when: SKILL.md structure, skill content, skill too long, split skill, progressive disclosure, skill body.

Install manually

```shell
# Clone the repo
git clone https://github.com/latestaiagents/agent-skills.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/claude-4-6-features/long-context-1m ~/.claude/skills/
```

Repository

latestaiagents/agent-skills

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT