skills/mcp-to-skill/SKILL.md
Converts MCP servers into on-demand skills to cut context window usage, classifying each tool by replacement strategy and generating the skill package. Triggers on: "convert MCP", "MCP to skill", "reduce context size", "too many tools", "tool token bloat", "MCP migration".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add mathews-tom/armory mcp-to-skill
Convert MCP servers into on-demand skills. MCP tool schemas sit in the system prompt
on every turn (~500-2000 tokens per tool, regardless of whether they're used). Skills
cost zero tokens until loaded via view. For a typical setup with 4-5 MCP servers
exposing 20-40 tools, this reclaims 10,000-30,000 tokens of context per turn.
This matters because 10,000-30,000 tokens is 10-30% of a 100,000-token context window, spent before the conversation even starts. The cost also recurs: every turn re-injects the full schema.
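The arithmetic behind these figures can be sketched in a few lines. The function names and the 100,000-token window are assumptions made for illustration; they do not come from the skill itself.

```python
def schema_overhead(tool_count: int, tokens_per_tool: int) -> int:
    """Tokens spent on MCP tool schemas on every turn."""
    return tool_count * tokens_per_tool


def share_of_window(overhead_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window consumed by the schemas."""
    return overhead_tokens / window_tokens


# 20 tools at the low end of the per-tool range quoted above.
low = schema_overhead(tool_count=20, tokens_per_tool=500)

# ASSUMPTION: a 100,000-token window, chosen only for illustration.
print(low, f"{share_of_window(low, 100_000):.0%}")  # prints: 10000 10%
```

A larger window lowers the percentage, but the absolute per-turn cost stays the same.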
Not every MCP should become a skill. Apply this heuristic:
Convert when the MCP wraps a REST API (use curl/web_fetch), wraps a CLI tool (gh, aws, gcloud — invoke directly), implements a reasoning/planning pattern (capture as methodology), or when you use fewer than half its tools regularly.
Keep as MCP when it maintains persistent server-side state (DB connections, WebSocket sessions), handles binary protocols or streaming, provides real-time event subscriptions, or is tiny (1-2 tools, under 500 tokens — negligible overhead).
Hybrid approach — convert the stateless tools to a skill, keep stateful ones as a slimmed-down MCP. This is often the sweet spot for large MCP servers.
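One way to picture the heuristic is as a small decision function. This is a rough sketch: the `McpProfile` fields and the order in which the checks run are assumptions, since the text does not rank the criteria.

```python
from dataclasses import dataclass


@dataclass
class McpProfile:
    """Hypothetical summary of one MCP server; field names are illustrative."""
    tool_count: int
    schema_tokens: int
    stateful_tools: int  # tools needing persistent state, streaming, or subscriptions


def recommend(profile: McpProfile) -> str:
    # Tiny servers (1-2 tools, under 500 tokens) are not worth converting.
    if profile.tool_count <= 2 and profile.schema_tokens < 500:
        return "keep"
    # Every tool is stateful: nothing converts cleanly.
    if profile.stateful_tools >= profile.tool_count:
        return "keep"
    # Mixed server: convert the stateless tools, keep a slimmed-down MCP.
    if profile.stateful_tools > 0:
        return "hybrid"
    return "convert"


print(recommend(McpProfile(tool_count=30, schema_tokens=20_000, stateful_tools=5)))  # hybrid
```

The result is only a starting point for the conversation with the user, who may know of usage patterns that override it.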
Proceed through 5 phases. Present findings at each phase boundary and wait for user confirmation before continuing. The user knows their usage patterns better than any analysis can infer — lean on their input.
Acquire the MCP's tool definitions. Try these sources in order:
1. Active session tools: Inspect tools visible in the current conversation. Ask the user to identify which tools belong to the target MCP. This is the most reliable source because you see the exact schema consuming context.
2. MCP config file: Parse the user's MCP configuration:
   - ~/Library/Application Support/Claude/claude_desktop_config.json
   - .cursor/mcp.json in the project root
   - ~/.claude/settings.json or project .mcp.json
3. MCP server source code: If the user points to a repo or local path, look for tool definitions: FastMCP @mcp.tool() decorators, SDK server.setRequestHandler, or similar patterns. Extract name, description, parameter schemas, return types.
4. Package registry: For published MCPs, run npm info <pkg> or pip show <pkg>, then fetch the README or source to find tool definitions.
5. User-provided schema: Ask the user to paste or upload tool definitions.
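For the config-file source, a minimal sketch of listing the configured servers might look like the following. It assumes the servers sit under a top-level `mcpServers` object with `command` and `args` fields; the server name and package in the example are made up. A config file typically tells you how each server is launched rather than what tools it exposes, so the tool definitions still have to come from one of the other sources.

```python
import json


def list_mcp_servers(config_text: str) -> dict:
    """Map each configured server name to the command line that launches it."""
    config = json.loads(config_text)
    # ASSUMPTION: servers live under a top-level "mcpServers" object.
    servers = config.get("mcpServers", {})
    return {
        name: " ".join([spec.get("command", ""), *spec.get("args", [])]).strip()
        for name, spec in servers.items()
    }


# Hypothetical config content; the package name is a placeholder.
example = '{"mcpServers": {"github": {"command": "npx", "args": ["-y", "example-github-mcp"]}}}'
print(list_mcp_servers(example))  # {'github': 'npx -y example-github-mcp'}
```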
Produce a structured inventory for each tool:
Tool: tool_name
Description: what it does
Parameters: param list with types
Returns: return type/shape
Estimated tokens: rough schema size
Present this and ask: "Are these all the tools? Did I miss any?"
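As a sketch, the inventory entry could be built from a tool definition like this. It assumes the definition is a dict with `name`, `description`, and a JSON Schema under `inputSchema`, and it uses a rough rule of about 4 characters per token. The `create_issue` tool is a made-up example.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolEntry:
    name: str
    description: str
    parameters: dict
    returns: str
    estimated_tokens: int


def to_entry(tool: dict) -> ToolEntry:
    props = tool.get("inputSchema", {}).get("properties", {})
    return ToolEntry(
        name=tool["name"],
        description=tool.get("description", ""),
        parameters={p: spec.get("type", "any") for p, spec in props.items()},
        returns="unspecified",  # fill in from docs or source when known
        # ASSUMPTION: roughly 4 characters per token; a heuristic, not a tokenizer.
        estimated_tokens=len(json.dumps(tool)) // 4,
    )


def render(entry: ToolEntry) -> str:
    params = ", ".join(f"{p} ({t})" for p, t in entry.parameters.items()) or "none"
    return "\n".join([
        f"Tool: {entry.name}",
        f"Description: {entry.description}",
        f"Parameters: {params}",
        f"Returns: {entry.returns}",
        f"Estimated tokens: {entry.estimated_tokens}",
    ])


# Hypothetical tool definition used only for illustration.
sample = {
    "name": "create_issue",
    "description": "Create an issue",
    "inputSchema": {"type": "object", "properties": {"title": {"type": "string"}}},
}
print(render(to_entry(sample)))
```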
Classify each tool along two dimensions. This classification drives the entire replacement strategy, so getting it right matters.
Replacement category:
| Category | Signals | Replacement Approach |
| --------------- | ------------------------------------------ | ----------------------- |
| REST_API | HTTP endpoints, URL patterns, auth headers | curl or web_fetch |
| CLI_WRAPPER | Wraps known CLI (git, gh, aws, docker) | Direct CLI invocation |
| LOGIC_PATTERN | Structures reasoning, no external calls | Methodology in SKILL.md |
| FILE_OP | Reads/writes/transforms local files | bash commands or Python |
| STATEFUL | Maintains connections, sessions, caches | Keep as MCP (flag it) |
| COMPOSITE | Orchestrates multiple sub-operations | Multi-step workflow |
Usage frequency — Ask the user directly:
| Frequency | Action |
| -------------- | ---------------------------------------------- |
| ESSENTIAL | Must be in the generated skill |
| NICE_TO_HAVE | Include if the replacement is clean |
| RARELY_USED | Skip — user can fall back to manual invocation |
Present a classification table and ask: "Does this look right? Which tools do you actually use regularly?"
Flag any STATEFUL tools explicitly — these are the ones that may not convert
cleanly, and the user should understand the trade-off.
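A first-pass classification can be sketched as keyword matching on each tool's description, followed by a filter on the user's frequency answers. The keyword lists below are assumptions loosely based on the Signals column, and `UNKNOWN` is a placeholder meaning "ask the user", not a category from the table.

```python
import re

# ASSUMPTION: illustrative keywords per category; STATEFUL is checked first
# so that tools needing a flag are never silently classified as convertible.
SIGNALS = {
    "STATEFUL": ("session", "connection", "subscribe", "websocket"),
    "REST_API": ("http", "endpoint", "api", "url"),
    "CLI_WRAPPER": ("git", "gh", "aws", "docker"),
    "FILE_OP": ("file", "read", "write"),
}


def guess_category(description: str) -> str:
    words = set(re.findall(r"[a-z]+", description.lower()))
    for category, keywords in SIGNALS.items():
        if words & set(keywords):
            return category
    return "UNKNOWN"  # needs a human decision (LOGIC_PATTERN, COMPOSITE, ...)


def tools_to_convert(frequency: dict) -> list:
    """Keep only tools the user marked ESSENTIAL or NICE_TO_HAVE."""
    return [name for name, freq in frequency.items()
            if freq in ("ESSENTIAL", "NICE_TO_HAVE")]


print(guess_category("Open a WebSocket session to stream logs"))  # STATEFUL
print(tools_to_convert({"create_issue": "ESSENTIAL", "list_gists": "RARELY_USED"}))
```

Keyword guesses like these only seed the classification table; the user's confirmation remains the deciding input.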
For each tool marked ESSENTIAL or NICE_TO_HAVE, design the concrete replacement.
Read references/replacement-patterns.md — it contains detailed patterns for
each category: REST API wrappers, CLI mappings, logic patterns, file operations,
stateful workarounds, composite workflows, auth patterns, and output parsing.
For each tool, determine:
Also identify multi-tool workflows — sequences of tools the user commonly chains. These become "Common Workflows" sections in the generated skill, which is where skills often provide more value than the MCP because workflows make the multi-step pattern explicit rather than relying on the agent to discover it.
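To make the REST_API case concrete, here is a minimal sketch that renders a tool call as a curl command for the generated skill. The helper name, the URL, and the `EXAMPLE_TOKEN` variable are hypothetical, and the bearer-token header is an assumed auth scheme, not something the source prescribes.

```python
import json
import shlex
from typing import Optional


def curl_command(method: str, url: str, token_env: str,
                 payload: Optional[dict] = None) -> str:
    parts = [
        "curl", "-sS", "-X", method,
        # Double quotes so the shell expands the token variable at run time.
        f'-H "Authorization: Bearer ${token_env}"',
        shlex.quote(url),
    ]
    if payload is not None:
        parts += [
            '-H "Content-Type: application/json"',
            "-d", shlex.quote(json.dumps(payload)),
        ]
    return " ".join(parts)


# Hypothetical endpoint and env var, for illustration only.
print(curl_command("POST", "https://api.example.com/issues",
                   "EXAMPLE_TOKEN", {"title": "Bug"}))
```

Referencing the token by environment variable keeps the secret out of the skill text; only the variable name is written down.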
Ask the user:
If the target environment is unclear, read references/environment-guide.md
for environment-specific constraints (Claude.ai vs Claude Code vs Cursor vs API).
Generate the complete skill package.
Read references/skill-template.md for the output template, sizing guide,
frontmatter checklist, and quality checklist.
The generated skill structure:
skill-name/
  SKILL.md
    Frontmatter (name, description with aggressive triggers)
    Quick Reference table (old tool name to new command mapping)
    Prerequisites (CLI tools, env vars, auth setup)
    Core Operations (one subsection per essential tool)
    Common Workflows (multi-step patterns)
    Error Handling and Troubleshooting
  references/ (only if SKILL.md exceeds ~400 lines)
    api-reference.md (overflow for complex tool replacements)
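The skeleton of such a SKILL.md could be rendered along these lines. This is a sketch under assumptions: the `##` heading level and the table column titles are guesses, the authoritative layout is in references/skill-template.md, and the skill name and mapping in the example are invented.

```python
import json


def render_skill(name: str, description: str, mapping: dict) -> str:
    """Render frontmatter plus the Quick Reference table and empty sections."""
    rows = [f"| {old} | `{new}` |" for old, new in mapping.items()]
    lines = [
        "---",
        f"name: {name}",
        # json.dumps yields a double-quoted string, which keeps YAML valid
        # even when the description contains ": " (as in "Triggers on: ...").
        f"description: {json.dumps(description)}",
        "---",
        "",
        "## Quick Reference",
        "",
        "| MCP tool | Replacement |",
        "| -------- | ----------- |",
        *rows,
        "",
    ]
    for section in ("Prerequisites", "Core Operations", "Common Workflows",
                    "Error Handling and Troubleshooting"):
        lines += [f"## {section}", ""]
    return "\n".join(lines)


# Hypothetical skill name and tool mapping.
print(render_skill("github-cli", "Manage issues. Triggers on: open an issue",
                   {"create_issue": "gh issue create"}))
```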
Generation rules — these exist to ensure the generated skill actually triggers and works correctly in practice:
After generating, validate the skill and estimate savings.
Run the token estimation using scripts/estimate_tokens.py:
python3 scripts/estimate_tokens.py --mcp-tools TOOL_COUNT --avg-schema-chars AVG_CHARS
This shows before/after token savings per turn and across a typical conversation.
Opus 4.7 note: Input tokens run 1.0–1.35× Opus 4.6 for the same text due to a tokenizer update. Treat pre-4.7 baselines as a lower bound — actual savings on Opus 4.7 may be larger than the estimator reports.
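The kind of calculation involved can be sketched as follows. This is not the bundled scripts/estimate_tokens.py, whose internals are not shown here. The function name, the rule of about 4 characters per token, and the turn count are assumptions, and the sketch ignores the cost of the skill itself once it is loaded.

```python
def estimate_savings(tool_count: int, avg_schema_chars: int, turns: int,
                     chars_per_token: float = 4.0,
                     tokenizer_factor: float = 1.0) -> tuple:
    """Return (tokens saved per turn, tokens saved over the conversation)."""
    # ASSUMPTION: schema size in characters divided by ~4 approximates tokens.
    per_turn = round(tool_count * avg_schema_chars / chars_per_token * tokenizer_factor)
    return per_turn, per_turn * turns


baseline = estimate_savings(tool_count=30, avg_schema_chars=4000, turns=20)
# Upper end of the 1.0-1.35x range mentioned in the note above.
upper = estimate_savings(tool_count=30, avg_schema_chars=4000, turns=20,
                         tokenizer_factor=1.35)
print(baseline, upper)  # (30000, 600000) (40500, 810000)
```

Reporting both ends of the range gives the user a bracket rather than a single figure.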
Generate 2-3 test scenarios — realistic prompts that would trigger the new skill and show the replacement commands in action. Present them to the user.
Migration checklist:
Validate the generated skill structure:
Verify the generated skill directory contains a valid SKILL.md with frontmatter (name, description),
and that all file references in the body resolve to existing files within the skill directory.
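A minimal sketch of this structural check is shown below. It assumes that the frontmatter is delimited by `---` lines at the top of the file and that file references in the body look like `references/...` or `scripts/...`; both the function name and those patterns are illustrative.

```python
import re
import tempfile
from pathlib import Path


def validate_skill(skill_dir: Path) -> list:
    """Return a list of problems; an empty list means the structure looks valid."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]
    text = skill_md.read_text(encoding="utf-8")
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return ["frontmatter block is missing"]
    front, body = match.group(1), text[match.end():]
    problems = []
    for key in ("name", "description"):
        if not re.search(rf"^{key}:\s*\S", front, re.MULTILINE):
            problems.append(f"frontmatter lacks '{key}'")
    # ASSUMPTION: references are relative paths under references/ or scripts/.
    found = re.findall(r"\b(?:references|scripts)/[\w./-]+", body)
    for ref in sorted({r.rstrip(".") for r in found}):
        if not (skill_dir / ref).is_file():
            problems.append(f"missing file: {ref}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "SKILL.md").write_text(
            "---\nname: demo\ndescription: Demo skill\n---\n"
            "See references/api-reference.md.\n", encoding="utf-8")
        print(validate_skill(root))  # ['missing file: references/api-reference.md']
```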
Present the complete package to the user. Offer to iterate on any section.
Conversions succeed best for stateless tools and REST/CLI wrappers. Inherent constraints:
These exist to prevent common failure modes in generated skills: