plugins/tooling/skills/mcp-server-craft/SKILL.md
Build MCP servers that AI agents actually want to use. Covers the full lifecycle — tool design (naming, schemas, descriptions), resource design (URIs, templates, subscriptions), project structure, transport selection (stdio vs Streamable HTTP), security, error handling, and testing. Use this skill when building a new MCP server, adding tools or resources to an existing one, reviewing an MCP server for quality, choosing between stdio and HTTP transport, designing tool schemas for LLM consumption, or hardening an MCP server for production. Also activates for questions about tool naming conventions, Pydantic Field descriptions, Zod validation for MCP, resource URI schemes, or MCP server security patterns.
Build MCP servers that LLMs and AI agents can use reliably. A good MCP server makes the agent feel competent — clear tool names, helpful descriptions, structured errors, and predictable behavior.
This skill covers the full lifecycle:
| Phase | What you do | Key question |
|-------|-------------|-------------|
| Design | Tool names, schemas, descriptions, resource URIs | Can the LLM understand what to call and why? |
| Build | Project structure, transport, implementation | Is the server clean and maintainable? |
| Harden | Security, error handling, validation | Can the server handle malicious or unexpected input? |
| Test | Functional, integration, agent workflow tests | Does it work when a real agent calls it? |
State which phase you need, or describe what you're building.
For expanded tool and resource design patterns with examples, load `references/tool-design.md`.
The most important thing about an MCP server is whether the LLM can figure out how to use it. Tool names, descriptions, and schemas are your API — the LLM reads them to decide what to call and how.
Names should be verbs that tell the model exactly what happens:
| Pattern | Examples | Why it works |
|---------|----------|-------------|
| verb_noun | read_file, search_issues, create_bucket | Action is clear, noun scopes it |
| get_status | get_build_status, get_user_profile | Read-only intent is obvious |
| list_* | list_repositories, list_connections | Signals pagination/collection |
Rules:
- Names may contain `_` or `-`, which allows both `snake_case` and `kebab-case`
- Put the verb first: `search_code`, not `code_search`

Descriptions are documentation the LLM reads at inference time. They directly affect whether the agent picks the right tool.
Good: "Search for code across repositories using a text query.
Returns matching file paths and line numbers.
Use this when the user wants to find where something is defined or used."
Bad: "Code search functionality."
Include in every description: what the tool does, what it returns, and when to use it.
Define input schemas with rich metadata. The LLM reads field descriptions to fill in parameters correctly.
TypeScript (Zod):
server.tool("search_issues", "Search for issues by query text, label, or status.",
  {
    query: z.string().describe("Search text to match against issue title and body"),
    status: z.enum(["open", "closed", "all"]).default("open")
      .describe("Filter by issue status. Default: open"),
    // .int() rejects fractional limits such as 2.5
    limit: z.number().int().min(1).max(100).default(20)
      .describe("Maximum results to return. Default: 20"),
  },
  async ({ query, status, limit }) => { /* ... */ }
);
Python (Pydantic Field):
from typing import Literal
from pydantic import Field

@mcp.tool()
async def search_issues(
    query: str = Field(..., description="Search text to match against issue title and body"),
    status: Literal["open", "closed", "all"] = Field("open", description="Filter by issue status"),
    limit: int = Field(20, ge=1, le=100, description="Maximum results to return"),
) -> list[Issue]:
    """Search for issues by query text, label, or status.

    Returns matching issues with title, body preview, and metadata."""
    ...
Key patterns:
- `Field(...)` (required) vs `Field(default)` (optional): never leave this ambiguous
- Add constraints (`ge`, `le`, `min`, `max`, `Literal`, `enum`) so the LLM knows valid ranges
- Put critical hints in field descriptions, for example "IMPORTANT: Provide the full absolute path, not relative"

Resources expose read-only data via URIs. Use them for context the agent needs before picking a tool.
resource://connections → List available service connections
resource://schema/users → Database schema for users table
file:///workspace/config.yaml → Project configuration
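A concrete URI such as `resource://schema/users` is typically served by a parameterized template such as `resource://schema/{table_name}`. The sketch below shows one way a server might match URIs against such a template. The helper names `compile_template` and `match_resource` are hypothetical, and MCP SDKs ship their own template handling, so treat this as an illustration of the idea only.

```python
import re
from typing import Dict, Optional


def compile_template(template: str) -> "re.Pattern":
    """Turn 'resource://schema/{table_name}' into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", template)
    # Even indexes hold literal text, odd indexes hold placeholder names.
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def match_resource(template: str, uri: str) -> Optional[Dict[str, str]]:
    """Return the extracted parameters, or None when the URI does not match."""
    found = compile_template(template).match(uri)
    return found.groupdict() if found else None
```

Each placeholder matches a single path segment, so a URI with extra trailing segments is rejected instead of being silently accepted.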
Rules:
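- Use domain-specific URI schemes where they fit (`postgres://`, `jira://`)
- Declare a MIME type for each resource (`application/json`, `text/markdown`)
- Use URI templates for parameterized resources: `resource://schema/{table_name}`

One server, one domain. A github-mcp-server with 8 focused tools beats an everything-server with 50 tools where the LLM can't tell search_code from search_files.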
TypeScript:
mcp-server-myservice/
├── src/
│ ├── index.ts # Entry point, transport setup
│ ├── server.ts # MCP server, tool/resource registration
│ ├── tools/ # Tool implementations
│ │ ├── search.ts
│ │ └── create.ts
│ ├── resources/ # Resource implementations
│ │ └── schema.ts
│ ├── types.ts # Shared types
│ └── utils/ # Helpers (http client, validation)
├── tests/
├── package.json
├── tsconfig.json
└── README.md
Python:
mcp-server-myservice/
├── src/
│ └── myservice_mcp/
│ ├── __init__.py # __version__
│ ├── server.py # MCP server, main() entry point
│ ├── models.py # Pydantic models
│ ├── consts.py # Constants (UPPER_SNAKE_CASE)
│ └── tools/ # Tool implementations
├── tests/
├── pyproject.toml
└── README.md
Key rules:
- Expose a `main()` that creates the server and starts the transport

| Transport | When to use | Client examples |
|-----------|------------|-----------------|
| stdio | Local tools, desktop clients | Claude Desktop, local dev |
| Streamable HTTP | Remote access, cloud deployment | Cursor, cloud agents, multi-tenant |
| HTTP/SSE (legacy) | Backward compatibility only | Older MCP clients |
Prefer Streamable HTTP for anything deployed. Use stdio for local-only tools. Support both by keeping server logic transport-agnostic.
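One way to keep server logic transport-agnostic is to put all behavior in a function that maps a parsed request to a response, and make each transport a thin adapter around it. The sketch below illustrates that separation with the standard library only. The message shape (`tool`, `arguments`) is a simplification invented for this example and is not the MCP wire protocol. In a real server the SDK provides the transports.

```python
import io
import json
from typing import Callable, Dict, TextIO

# Hypothetical tool registry: maps a tool name to a plain Python callable.
TOOLS: Dict[str, Callable[..., dict]] = {
    "get_status": lambda: {"status": "ok"},
}


def handle_request(request: dict) -> dict:
    """Transport-independent core: takes a parsed request, returns a response."""
    tool = TOOLS.get(request.get("tool", ""))
    if tool is None:
        return {"isError": True, "error": f"Unknown tool: {request.get('tool')}"}
    return {"isError": False, "result": tool(**request.get("arguments", {}))}


def serve_stdio(stdin: TextIO, stdout: TextIO) -> None:
    """stdio adapter: one JSON request per input line, one JSON response per output line."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(handle_request(json.loads(line))) + "\n")


def handle_http_body(body: bytes) -> bytes:
    """HTTP adapter: the same core behind a request-body/response-body interface."""
    return json.dumps(handle_request(json.loads(body))).encode("utf-8")


# Usage: drive the stdio adapter with in-memory streams.
out = io.StringIO()
serve_stdio(io.StringIO('{"tool": "get_status"}\n'), out)
print(out.getvalue().strip())
```

Because both adapters call `handle_request`, tests can exercise the core directly without starting either transport.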
Every tool and resource handler should be async. Use concurrency for independent operations:
# Good: concurrent fetches
results = await asyncio.gather(
    fetch_issues(repo_a),
    fetch_issues(repo_b),
    fetch_issues(repo_c),
)
# Bad: sequential when independent
result_a = await fetch_issues(repo_a)
result_b = await fetch_issues(repo_b)
result_c = await fetch_issues(repo_c)
For expanded security patterns, input validation, sandboxing, and testing strategies, load `references/security-and-testing.md`.
Return errors inside tool results so the LLM can react — don't throw protocol-level exceptions that crash the conversation.
// Good: structured error the LLM can interpret
return {
  content: [{ type: "text", text: JSON.stringify({
    error: "Repository not found",
    suggestion: "Check the repository name. Use list_repositories to see available repos."
  })}],
  isError: true,
};
// Bad: raw exception that kills the tool call
throw new Error("ENOENT");
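For Python servers, the same pattern can be applied once instead of in every handler. The sketch below wraps an async handler so that failures come back as structured results. `ToolError`, `error_result`, `returns_structured_errors`, and `get_repository` are hypothetical names for this illustration, not SDK APIs. The result dictionary mirrors the shape of the TypeScript example above.

```python
import asyncio
import functools
import json


class ToolError(Exception):
    """Hypothetical exception type carrying a hint the agent can act on."""

    def __init__(self, message: str, suggestion: str):
        super().__init__(message)
        self.suggestion = suggestion


def error_result(message: str, suggestion: str) -> dict:
    """Build a tool result with the same shape as the TypeScript example."""
    payload = json.dumps({"error": message, "suggestion": suggestion})
    return {"content": [{"type": "text", "text": payload}], "isError": True}


def returns_structured_errors(handler):
    """Wrap an async handler so failures come back as results, not exceptions."""

    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        except ToolError as exc:
            return error_result(str(exc), exc.suggestion)
        except Exception:
            # Replace raw errors such as ENOENT with a message the agent can act on.
            return error_result("Unexpected server error", "Retry, or try a different tool.")

    return wrapper


@returns_structured_errors
async def get_repository(name: str) -> dict:
    # Stand-in for a real lookup; always fails to demonstrate the error path.
    raise ToolError("Repository not found", "Use list_repositories to see available repos.")


print(asyncio.run(get_repository("missing")))
```

Whether a catch-all `except Exception` is appropriate depends on the server. Some teams prefer to log and re-raise unexpected failures so they surface during development.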
Error handling rules:
- Set `isError: true` in results for recoverable errors

Validate everything at the boundary. The LLM generates parameters, and they will sometimes be wrong.
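- Reject path traversal attempts (`../../../etc/passwd`)

If your server executes user-provided code (diagram generators, script runners), sandbox it; `references/security-and-testing.md` covers sandboxing patterns.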
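A path traversal check at the boundary can be sketched as follows. The workspace root `/workspace`, the length cap, and the function name `resolve_inside_workspace` are assumptions made for this example. A real server would read the root from its configuration.

```python
from pathlib import Path

# Assumed workspace root; a real server would read this from configuration.
WORKSPACE_ROOT = Path("/workspace")
MAX_PATH_LENGTH = 4096  # illustrative cap on oversized inputs


def resolve_inside_workspace(user_path: str, root: Path = WORKSPACE_ROOT) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    if not isinstance(user_path, str) or not user_path:
        raise ValueError("path must be a non-empty string")
    if len(user_path) > MAX_PATH_LENGTH:
        raise ValueError("path is too long")
    root = root.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes the workspace: {user_path!r}") from None
    return candidate
```

Resolving first and comparing afterwards avoids string-level checks such as searching for `..`, which are easy to bypass with absolute paths or symlinks.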
| Layer | What to test | How |
|-------|-------------|-----|
| Unit | Individual tool logic, validation, error paths | Mock external dependencies |
| Integration | Tool → real service round-trip | Use test accounts or sandboxes |
| Contract | Protocol compliance, schema correctness | Validate against MCP spec |
| Agent workflow | End-to-end with a real LLM client | Call tools from an agent, check results |
Agent workflow testing is the most important and most neglected. Your tools may pass unit tests but confuse the LLM because the descriptions are ambiguous or the return format is unexpected.
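Agent workflow tests need a real client, but the unit layer is easy to sketch. The example below tests a tool body with its external dependency mocked out. It assumes a hypothetical `search_issues` function that receives its API client as an argument; the client and its `search` method are stand-ins, not a real library.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock


async def search_issues(client, query: str, limit: int = 20) -> dict:
    """Hypothetical tool body; `client` stands in for an external API wrapper."""
    if not 1 <= limit <= 100:
        return {"isError": True, "error": "limit must be between 1 and 100"}
    issues = await client.search(query=query, limit=limit)
    return {"isError": False, "issues": issues}


class SearchIssuesTest(unittest.TestCase):
    def test_passes_arguments_to_client(self):
        client = AsyncMock()
        client.search.return_value = [{"title": "Crash on start"}]
        result = asyncio.run(search_issues(client, "crash", limit=5))
        client.search.assert_awaited_once_with(query="crash", limit=5)
        self.assertEqual(result["issues"], [{"title": "Crash on start"}])

    def test_rejects_out_of_range_limit(self):
        client = AsyncMock()
        result = asyncio.run(search_issues(client, "crash", limit=500))
        self.assertTrue(result["isError"])
        # Validation should fail before any external call is made.
        client.search.assert_not_awaited()
```

Saved in a test module, these cases run with `python -m unittest`. Passing the client in as a parameter is what makes the mock straightforward.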
- Error cases return `isError: true` and a suggestion

Set the `instructions` field; the LLM reads it before using any tool:
const server = new McpServer(
  { name: "github-server", version: "1.0.0" },
  {
    // Server-level guidance is passed in the options (second) argument
    instructions: "Read-only access to GitHub repos. Use search_code to find definitions, list_issues to browse bugs, get_file to read files. Always provide full repo name (owner/repo)."
  }
);
Design → Build: Every tool has a verb-noun name ≤ 64 chars? Descriptions include what/returns/when? Schemas have constraints and field descriptions?
Build → Harden: Server starts cleanly on both stdio and HTTP? All handlers are async? Tool responses are structured JSON the LLM can parse?
Harden → Test: Input validation covers path traversal, oversized inputs, invalid types? Errors use isError: true with suggestions? Rate limiting in place for external API calls?
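For the rate-limiting check, a token bucket is one common choice. The sketch below is a minimal single-threaded version, offered as an assumption about how a server might throttle calls to an external API. The class name and parameters are illustrative, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should back off."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

When `try_acquire()` returns `False`, a tool can return a structured error with a suggestion to retry later instead of forwarding the call.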
- Errors use `isError: true` with suggestions, not raw exceptions