skills-templates/moonshot-ai/SKILL.md
Moonshot AI Kimi API - Trillion-parameter MoE model with 256K context, tool calling, and agentic capabilities for chat, coding, and autonomous task execution
Moonshot AI provides the Kimi large language model series, featuring the flagship Kimi K2 - a state-of-the-art mixture-of-experts (MoE) model with 1 trillion total parameters. The API offers OpenAI-compatible endpoints with 256K context length, strong tool calling capabilities, and competitive pricing.
Key Value Proposition: Access a trillion-parameter model optimized for agentic tasks, tool use, and coding at a significantly lower cost than competing models (reportedly up to 100x cheaper than GPT-4 for some tasks), with strong multilingual support for Chinese and English.
┌─────────────────────────────────────────────────────────────────┐
│                      Moonshot AI Platform                       │
│                      platform.moonshot.ai                       │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │    Kimi K2    │     │  moonshot-v1  │     │   Tool Use    │
   │   (Latest)    │     │   (Legacy)    │     │               │
   ├───────────────┤     ├───────────────┤     ├───────────────┤
   │ • 1T params   │     │ • v1-8k       │     │ • Functions   │
   │ • 32B active  │     │ • v1-32k      │     │ • Web Search  │
   │ • 128K-256K   │     │ • v1-128k     │     │ • Code Exec   │
   │ • MoE arch    │     │               │     │ • Custom      │
   └───────────────┘     └───────────────┘     └───────────────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                                 ▼
                       ┌───────────────────┐
                       │   API Endpoints   │
                       ├───────────────────┤
                       │ • OpenAI compat   │
                       │ • Anthropic compat│
                       │ • Streaming       │
                       │ • Tool calling    │
                       └───────────────────┘
| Model | Parameters | Active | Context | Best For |
|-------|------------|--------|---------|----------|
| kimi-k2-0905-preview | 1T | 32B | 256K | Latest, agentic tasks |
| kimi-k2-turbo-preview | 1T | 32B | 128K | Fast, general use |
| kimi-k2-thinking | 1T | 32B | 128K | Multi-step reasoning |
| moonshot-v1-8k | - | - | 8K | Short context |
| moonshot-v1-32k | - | - | 32K | Medium context |
| moonshot-v1-128k | - | - | 128K | Long documents |
| kimi-latest | - | - | Auto | Auto-selects tier |
Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1 Trillion
Activated Parameters: 32 Billion per token
Layers: 61 (including 1 dense layer)
Experts: 384 total, 8 selected per token
Attention: MLA (Multi-head Latent Attention)
Activation: SwiGLU
Vocabulary: 160K tokens
Context: 128K tokens (256K for 0905-preview)
Training Data: 15.5T tokens
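The sparsity implied by these figures can be worked out directly. The sketch below uses only the numbers listed above; the variable names are illustrative.

```python
# Figures taken from the specification list above.
total_params = 1_000_000_000_000   # 1 trillion total parameters
active_params = 32_000_000_000     # 32 billion activated per token
experts_total = 384
experts_selected = 8

# Share of parameters and of routed experts used for any single token.
active_param_fraction = active_params / total_params
active_expert_fraction = experts_selected / experts_total

print(f"Active parameters per token: {active_param_fraction:.1%}")
print(f"Routed experts per token: {active_expert_fraction:.2%}")
```

The parameter share comes out higher than the expert share, which is what one would expect if components such as attention and the dense layer run for every token regardless of routing.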
export MOONSHOT_API_KEY="your-api-key-here"
# Optional: Use China endpoint
export MOONSHOT_API_BASE="https://api.moonshot.cn/v1"
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],  # read the key exported above instead of hardcoding it
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.6,  # Recommended
    max_tokens=1024,
)

print(response.choices[0].message.content)
| Region | URL |
|--------|-----|
| Global | https://api.moonshot.ai/v1 |
| China | https://api.moonshot.cn/v1 |
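Choosing between the two base URLs can be wrapped in a small helper. This is a minimal sketch: `resolve_base_url` and the `ENDPOINTS` mapping are hypothetical names, and it assumes your own code is what reads `MOONSHOT_API_BASE`, since the OpenAI SDK does not pick that variable up automatically.

```python
import os

# Base URLs copied from the table above.
ENDPOINTS = {
    "global": "https://api.moonshot.ai/v1",
    "china": "https://api.moonshot.cn/v1",
}


def resolve_base_url(region: str = "global") -> str:
    """Return MOONSHOT_API_BASE if it is set, else the URL for the given region."""
    override = os.environ.get("MOONSHOT_API_BASE")
    if override:
        return override
    try:
        return ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"Unknown region: {region!r}") from None
```

The result can be passed as `base_url=resolve_base_url()` when constructing the client.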
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Endpoint: POST /v1/chat/completions
Request Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model identifier |
| messages | array | Yes | Conversation history |
| temperature | float | No | 0.0-1.0, recommended 0.6 |
| max_tokens | int | No | Maximum response length |
| stream | bool | No | Enable streaming |
| top_p | float | No | Nucleus sampling |
| tools | array | No | Function definitions |
| tool_choice | string | No | auto, none, or specific |
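The required fields and the temperature range in the table can be checked before a request is sent. The helper below is a sketch only; `build_chat_request` is a hypothetical name, not part of any SDK, and it covers just a subset of the parameters.

```python
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    temperature: float = 0.6,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a chat-completions request body, checking the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    body: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }
    # Optional fields are only included when the caller sets them.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body
```

The returned dictionary can be unpacked into `client.chat.completions.create(**body)` or serialized as the JSON body of a raw HTTP request.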
Message Format:
{
  "messages": [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User message"},
    {"role": "assistant", "content": "Previous response"},
    {"role": "user", "content": [
      {"type": "text", "text": "Multimodal content"}
    ]}
  ]
}
Response:
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "kimi-k2-0905-preview",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
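When working with the raw JSON rather than the SDK objects, the useful fields can be pulled out as in the sketch below. `summarize_response` is a hypothetical helper, and treating a `finish_reason` of `"length"` as truncation assumes the API follows the OpenAI convention.

```python
import json

# Sample body copied from the response shown above.
raw_response = """
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "kimi-k2-0905-preview",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Response text"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 100, "total_tokens": 150}
}
"""


def summarize_response(body: dict) -> dict:
    """Pull out the reply text, finish reason, and token counts."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        # Assumption: "length" means the reply hit max_tokens, as in the OpenAI API.
        "truncated": choice["finish_reason"] == "length",
    }


summary = summarize_response(json.loads(raw_response))
print(summary)
```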
Kimi K2 has strong native support for tool calling, enabling agentic applications.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "required": ["city"],
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                }
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "required": ["query"],
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                }
            }
        }
    }
]
response = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools,
    tool_choice="auto",
    temperature=0.6
)

# Check if model wants to call a tool
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
import json

def execute_tool(name: str, args: dict) -> str:
    """Execute tool and return result."""
    if name == "get_weather":
        return json.dumps({"temp": 22, "condition": "sunny"})
    elif name == "search_web":
        return json.dumps({"results": ["Result 1", "Result 2"]})
    return json.dumps({"error": "Unknown tool"})

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

while True:
    response = client.chat.completions.create(
        model="kimi-k2-0905-preview",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=0.6
    )

    message = response.choices[0].message
    messages.append(message)

    if not message.tool_calls:
        # No more tool calls, done
        print(message.content)
        break

    # Execute each tool call
    for tool_call in message.tool_calls:
        result = execute_tool(
            tool_call.function.name,
            json.loads(tool_call.function.arguments)
        )
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result
        })
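In the loop above, a tool name the code does not know or arguments that are not valid JSON would either raise or fall through to a generic error. One possible hardening is a registry-based dispatcher that always returns a JSON string, so the model sees the failure and can try again. This is a sketch: `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the weather handler is a stub.

```python
import json
from typing import Callable


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub weather lookup; replace with a real data source."""
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}


# Maps tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_call(name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a JSON string for the tool message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments or "{}")
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})
```

Inside the loop, `run_tool_call(tool_call.function.name, tool_call.function.arguments)` would take the place of the `execute_tool(...)` call. Capping the number of loop iterations is a further safeguard worth considering.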
stream = client.chat.completions.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True,
    temperature=0.6
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
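If the full text is needed after streaming finishes (for logging or for appending to the message history), the deltas can be collected as they arrive. The sketch below uses stand-in chunk objects so that it runs offline; `collect_stream` and `fake_chunk` are hypothetical names, and the guard for chunks without choices is a defensive assumption.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like an SDK stream chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]
print(collect_stream(fake_stream))  # → Hello, world
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in directly.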
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: 'https://api.moonshot.ai/v1'
});

async function chat() {
  const stream = await client.chat.completions.create({
    model: 'kimi-k2-0905-preview',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

// Run the example and surface any request error
chat().catch(console.error);
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905-preview",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| kimi-k2-0905-preview | ~$0.15 | ~$2.50 |
| kimi-k2-turbo-preview | ~$0.15 | ~$2.50 |

| Context Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|--------------|----------------------|------------------------|
| 8K | $0.20 | $2.00 |
| 32K | $1.00 | $3.00 |
| 128K | $2.00 | $5.00 |

| Tool | Cost per Call |
|------|---------------|
| $web_search | ~$0.005 |
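The per-token prices above translate into a per-request estimate with simple arithmetic. The sketch below copies the listed figures, which are approximate and may change. `estimate_cost` is a hypothetical helper, and mapping the 8K/32K/128K tiers to the `moonshot-v1-*` model names is an assumption.

```python
# USD per 1M tokens as (input, output), copied from the tables above.
# Assumption: the 8K/32K/128K context tiers correspond to the moonshot-v1 models.
PRICES = {
    "kimi-k2-0905-preview": (0.15, 2.50),
    "kimi-k2-turbo-preview": (0.15, 2.50),
    "moonshot-v1-8k": (0.20, 2.00),
    "moonshot-v1-32k": (1.00, 3.00),
    "moonshot-v1-128k": (2.00, 5.00),
}
WEB_SEARCH_COST = 0.005  # approximate USD per $web_search call


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  web_searches: int = 0) -> float:
    """Estimate the USD cost of one request from its token usage."""
    input_price, output_price = PRICES[model]
    token_cost = (prompt_tokens * input_price
                  + completion_tokens * output_price) / 1_000_000
    return token_cost + web_searches * WEB_SEARCH_COST


# Example: a request with 50 prompt tokens and 100 completion tokens.
print(f"${estimate_cost('kimi-k2-0905-preview', 50, 100):.7f}")
```

The `prompt_tokens` and `completion_tokens` values can be taken from the `usage` field of each response.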
from litellm import completion

response = completion(
    model="moonshot/kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello"}]
)
model_list:
  - model_name: kimi-k2
    litellm_params:
      model: moonshot/kimi-k2-0905-preview
      api_key: os.environ/MOONSHOT_API_KEY
  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY
LiteLLM automatically handles:
Moonshot also offers an Anthropic-compatible API endpoint:
from anthropic import Anthropic

client = Anthropic(
    api_key="your-moonshot-key",
    # The Anthropic-compatible API is served under /anthropic, not the
    # OpenAI-style /v1 path; verify against the current Moonshot docs.
    base_url="https://api.moonshot.ai/anthropic"
)

# Note: Temperature mapping
# real_temperature = request_temperature * 0.6
response = client.messages.create(
    model="kimi-k2-0905-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
    temperature=1.0  # Will become 0.6 internally
)
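The temperature mapping noted in the comments can be applied in either direction. The helpers below are a sketch of that arithmetic only; both function names are hypothetical, and the 0.6 factor is taken from the note above.

```python
ANTHROPIC_TEMPERATURE_SCALE = 0.6  # from: real_temperature = request_temperature * 0.6


def effective_temperature(request_temperature: float) -> float:
    """Temperature the model actually uses for a given request value."""
    return request_temperature * ANTHROPIC_TEMPERATURE_SCALE


def request_temperature_for(target: float) -> float:
    """Request value to send so that the model ends up at `target`."""
    return target / ANTHROPIC_TEMPERATURE_SCALE


print(effective_temperature(1.0))
print(request_temperature_for(0.3))
```

For instance, to end up at an effective temperature of 0.3 through the Anthropic-compatible endpoint, the request would carry a value of about 0.5.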
# Recommended default
temperature = 0.6
# For creative tasks
temperature = 0.8
# For factual/deterministic tasks
temperature = 0.3
# Default system prompt (good starting point)
system_prompt = "You are Kimi, an AI assistant created by Moonshot AI."
# Custom for specific tasks
system_prompt = """You are a coding assistant.
Provide clean, well-documented code with explanations.
Use Python unless otherwise specified."""
# For documents up to 256K tokens
response = client.chat.completions.create(
    model="kimi-k2-0905-preview",  # Supports 256K
    messages=[
        {"role": "system", "content": "Analyze the following document."},
        {"role": "user", "content": very_long_document}
    ],
    temperature=0.3  # Lower for analysis tasks
)
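Before sending a very long document, it can help to check roughly whether it fits the model's window. The sketch below is a heuristic only: the four-characters-per-token ratio is a common rule of thumb rather than the real tokenizer (and is less reliable for Chinese text), "K" is treated as 1,000 tokens to stay conservative, and `fits_in_context` is a hypothetical name.

```python
# Context windows in tokens, from the model table; "K" is read as 1,000 (assumption).
CONTEXT_LIMITS = {
    "kimi-k2-0905-preview": 256_000,
    "kimi-k2-turbo-preview": 128_000,
    "moonshot-v1-8k": 8_000,
    "moonshot-v1-32k": 32_000,
    "moonshot-v1-128k": 128_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; not a substitute for the real tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(text: str, model: str, reserve_for_output: int = 1024) -> bool:
    """Check whether the text plus an output budget stays within the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMITS[model]


print(fits_in_context("a" * 4_000, "moonshot-v1-8k"))
print(fits_in_context("a" * 40_000, "moonshot-v1-8k"))
```

When the check fails, the options are to switch to a model with a larger window or to split the document into chunks.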
| Benchmark | Score | Notes |
|-----------|-------|-------|
| AIME 2024 | 69.6% | Math reasoning |
| MATH-500 | 97.4% | Mathematics |
| LiveCodeBench | 53.7% | Code generation |
| SWE-bench Verified | 71.6% | Agentic coding |
| MMLU | 89.5% | General knowledge |
| MMLU-Redux | 92.7% | Updated evaluation |
| Tau2 Retail | 70.6% | Tool use |
| AceBench | 76.5% | Agent evaluation |
Error: 401 Unauthorized
Solutions:
Error: 429 Too Many Requests
Solutions:
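A common way to handle rate limiting is to retry with exponential backoff and jitter. The helper below is a generic sketch, not a Moonshot-specific mechanism: `with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so the example can be exercised without waiting.

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=1.0,
                 is_retryable=lambda exc: True, sleep=time.sleep):
    """Call `call()` and retry failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that should not be retried.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delay doubles each attempt; random jitter spreads out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Offline demonstration: a function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


print(with_backoff(flaky, sleep=lambda seconds: None))  # → ok
```

With the OpenAI SDK, `is_retryable=lambda exc: isinstance(exc, openai.RateLimitError)` would restrict retries to 429 responses. The SDK client also accepts a `max_retries` option, which may be enough on its own.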
Error: Context length exceeded
Solutions:
Error: Invalid tool definition
Solutions:
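Malformed tool definitions can often be caught locally before the request is sent. The checker below is a sketch: it mirrors the shape of the tool definitions shown earlier in this document and is not an official schema. `validate_tool` is a hypothetical name.

```python
def validate_tool(tool: dict) -> list[str]:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    if tool.get("type") != "function":
        problems.append('"type" must be "function"')

    fn = tool.get("function")
    if not isinstance(fn, dict):
        problems.append('"function" must be an object')
        return problems

    name = fn.get("name")
    if not isinstance(name, str) or not name:
        problems.append('"function.name" must be a non-empty string')

    params = fn.get("parameters")
    if not isinstance(params, dict) or params.get("type") != "object":
        problems.append('"function.parameters" must be an object schema')
        return problems

    # Every required parameter should be declared under "properties".
    properties = params.get("properties", {})
    for required_name in params.get("required", []):
        if required_name not in properties:
            problems.append(f'required parameter "{required_name}" is not in "properties"')
    return problems


good_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {"city": {"type": "string"}},
        },
    },
}
print(validate_tool(good_tool))  # → []
```

Running each entry of the `tools` list through such a check before calling the API gives a clearer error message than a rejected request.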