src/skills/ai-provider-cohere-sdk/SKILL.md
Official Cohere TypeScript SDK patterns -- CohereClientV2, chat, embeddings, rerank, RAG with citations, tool use, streaming, and model selection
Quick Guide: Use the cohere-ai npm package with CohereClientV2 for all new Cohere integrations. The V2 API requires model on every call. Use chatStream for streaming with content-delta events. Embeddings require inputType matching your use case (search_document for indexing, search_query for querying). Rerank scores documents by relevance. RAG works by passing documents to chat() -- the model returns inline citations automatically. Tool use follows a 4-step loop: user message, model returns tool_calls, you execute and return results, model generates cited response.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use CohereClientV2 (not CohereClient) for all new code -- V2 is the current API with required model parameter)
(You MUST specify inputType on every embed call -- search_document for indexing, search_query for querying -- mismatched types produce garbage similarity scores)
(You MUST handle the tool use loop correctly: append the full assistant message (with tool_calls) to messages, then append tool role results with matching tool_call_id)
(You MUST check finish_reason in responses -- MAX_TOKENS means the output was truncated)
(You MUST never hardcode API keys -- pass via token constructor parameter sourced from environment variables)
</critical_requirements>
Auto-detection: Cohere, cohere-ai, CohereClientV2, CohereClient, command-a, command-r, command-r-plus, embed-v4, rerank-v4, chatStream, content-delta, inputType, search_document, search_query, embeddingTypes, topN, CO_API_KEY, COHERE_API_KEY
When to use:
Key patterns covered:
- CohereClientV2 (token, timeout, platform configs)
- chat, chatStream, event types
- inputType for search/classification/clustering
When NOT to use:
The Cohere TypeScript SDK (cohere-ai) provides direct access to Cohere's API surface -- chat, embeddings, rerank, and RAG with citations. The SDK is auto-generated from Cohere's API spec using Fern.
Core principles:
- CohereClientV2 provides the modern API. model is required on every call. V1 methods on CohereClient are legacy.
- The inputType parameter (search_document, search_query, classification, clustering) is mandatory for v3+ models. Mismatching input types between indexing and querying silently degrades results.
- Pass documents directly to chat() and the model returns grounded answers with inline citations. No external retrieval framework is required for the grounding step.
When to use the Cohere SDK directly:
When NOT to use:
Initialize CohereClientV2. The token parameter is required (pass from environment).
// lib/cohere.ts -- basic setup
import { CohereClientV2 } from "cohere-ai";
const client = new CohereClientV2({
token: process.env.CO_API_KEY,
});
export { client };
// lib/cohere.ts -- production configuration
const TIMEOUT_MS = 30_000;
const client = new CohereClientV2({
token: process.env.CO_API_KEY,
timeout: TIMEOUT_MS,
});
Why good: Explicit token from env var, named timeout constant, named export
// BAD: Hardcoded key, default CohereClient (V1)
import { CohereClient } from "cohere-ai";
const client = new CohereClient({ token: "sk-abc123" });
Why bad: Hardcoded API key is a security breach risk, CohereClient is the legacy V1 client
See: examples/core.md for error handling, platform configs (Bedrock, Azure)
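The setup above passes `process.env.CO_API_KEY` straight to the constructor, and that value may be `undefined` at runtime. A minimal sketch of a fail-fast guard follows. The helper name `readApiKey` is hypothetical (it is not part of cohere-ai), and the environment is passed in as a plain record so the function can be tested without touching the real process environment.

```typescript
// Hypothetical helper (not part of cohere-ai): fail fast when the key is missing.
const API_KEY_ENV_VAR = "CO_API_KEY";

type Env = Record<string, string | undefined>;

function readApiKey(env: Env): string {
  const value = env[API_KEY_ENV_VAR]?.trim();
  if (!value) {
    throw new Error(`Missing ${API_KEY_ENV_VAR} environment variable`);
  }
  return value;
}

// Usage (assumes the client setup shown above):
// const client = new CohereClientV2({ token: readApiKey(process.env) });
```

Throwing at startup tends to give a clearer failure than an authentication error on the first API call.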
V2 chat uses messages array with system, user, assistant, and tool roles.
const response = await client.chat({
model: "command-a-03-2025",
messages: [
{ role: "system", content: "You are a helpful coding assistant." },
{ role: "user", content: "Explain TypeScript generics." },
],
});
console.log(response.message.content[0].text);
Why good: System message for instruction, model explicitly specified, correct V2 content access path
// BAD: Missing model (required in V2), wrong response access
const response = await client.chat({
messages: [{ role: "user", content: "Hello" }],
});
console.log(response.text); // WRONG: V2 uses response.message.content[0].text
Why bad: V2 requires model, response shape is response.message.content[0].text not response.text
See: examples/core.md for multi-turn, token tracking, temperature control
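The critical requirements ask for a `finish_reason` check on every response. A minimal sketch of that check is below. It assumes the TypeScript SDK exposes the field in camelCase as `finishReason` (mirroring how `input_type` becomes `inputType`); the type `ChatResultLike` and both function names are hypothetical stand-ins, not SDK exports.

```typescript
// Assumed shape: only the field this check needs. finishReason is the assumed
// camelCase form of the REST field finish_reason.
type ChatResultLike = { finishReason?: string };

const TRUNCATED_FINISH_REASON = "MAX_TOKENS";

// True when the model stopped because it ran out of output tokens.
function isTruncated(result: ChatResultLike): boolean {
  return result.finishReason === TRUNCATED_FINISH_REASON;
}

// Throws instead of letting truncated output flow further downstream.
function assertComplete(result: ChatResultLike): void {
  if (isTruncated(result)) {
    throw new Error("Response hit MAX_TOKENS; output is truncated");
  }
}
```

Calling `assertComplete(response)` right after `client.chat(...)` keeps a cut-off answer from being stored or shown as if it were complete.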
Use chatStream with for await and check event type for content-delta.
const stream = await client.chatStream({
model: "command-a-03-2025",
messages: [{ role: "user", content: "Explain async/await." }],
});
for await (const event of stream) {
if (event.type === "content-delta") {
process.stdout.write(event.delta?.message?.content?.text ?? "");
}
}
Why good: Checks event type before accessing delta, handles nullable content safely
// BAD: Not checking event type
for await (const event of stream) {
console.log(event.delta?.message); // Many events don't have message delta
}
Why bad: Only content-delta events have text content -- other events (message-start, citation-start, tool-plan-delta) have different shapes
See: examples/core.md for full streaming with all event types
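To collect the streamed text rather than print it, the event-type check can be pulled into a small pure function. This is a sketch only: `StreamEventLike` is a local stand-in covering just the fields used in the example above, not the SDK's real event union, and the function names are hypothetical.

```typescript
// Local stand-in for the SDK's stream event union; only the fields used here.
type StreamEventLike = {
  type: string;
  delta?: { message?: { content?: { text?: string } } };
};

const CONTENT_DELTA = "content-delta";

// Returns the text carried by a content-delta event, or "" for any other event.
function deltaText(event: StreamEventLike): string {
  if (event.type !== CONTENT_DELTA) {
    return "";
  }
  return event.delta?.message?.content?.text ?? "";
}

// Joins the text of all content-delta events, ignoring the rest.
function joinDeltas(events: StreamEventLike[]): string {
  return events.map(deltaText).join("");
}
```

Inside the `for await` loop this would be used as `fullText += deltaText(event)`, so other event types contribute nothing to the accumulated string.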
inputType is required for v3+ models. Mismatching types between indexing and querying silently degrades results.
const EMBEDDING_MODEL = "embed-v4.0";
// Index documents with search_document
const docEmbeddings = await client.embed({
model: EMBEDDING_MODEL,
inputType: "search_document",
texts: ["TypeScript is a typed superset of JavaScript."],
embeddingTypes: ["float"],
});
// Query with search_query
const queryEmbedding = await client.embed({
model: EMBEDDING_MODEL,
inputType: "search_query",
texts: ["What is TypeScript?"],
embeddingTypes: ["float"],
});
Why good: Correct inputType pairing, embeddingTypes explicitly specified, named model constant
// BAD: Same inputType for both indexing and querying
const docs = await client.embed({
model: "embed-v4.0",
inputType: "search_query", // WRONG for documents
texts: documents,
embeddingTypes: ["float"],
});
Why bad: Using search_query for document indexing silently produces worse similarity scores -- documents must use search_document
See: examples/embeddings-rerank.md for cosine similarity, dimension control, batch embedding
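Once document and query vectors exist, they are typically compared with cosine similarity. The sketch below is plain math with no SDK calls; it assumes the vectors have already been read out of the embed responses as `number[]` arrays, and the function names are illustrative.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // A zero vector has no direction; treat it as unrelated.
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns document indices sorted from most to least similar to the query.
function rankBySimilarity(query: number[], docs: number[][]): number[] {
  return docs
    .map((doc, index) => ({ index, score: cosineSimilarity(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```

The ranking is only meaningful when the query was embedded with `search_query` and the documents with `search_document`, as described above.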
Score documents by relevance to a query. Returns ordered results with relevance scores.
const RERANK_MODEL = "rerank-v4.0-pro";
const TOP_N = 3;
const result = await client.rerank({
model: RERANK_MODEL,
query: "What is TypeScript?",
documents: [
"TypeScript is a typed superset of JavaScript.",
"Python is a general-purpose language.",
"TypeScript compiles to JavaScript.",
],
topN: TOP_N,
});
for (const item of result.results) {
console.log(`Doc ${item.index}: score ${item.relevanceScore}`);
}
Why good: Named constants, topN limits results, accesses index and relevanceScore
See: examples/embeddings-rerank.md for embed + rerank pipeline, rank fields
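Rerank results carry an `index` into the input array rather than the document text, so a mapping step is usually needed before the results are used. A minimal sketch, assuming only the `index` and `relevanceScore` fields shown above; `RerankItemLike` and `toRankedDocuments` are hypothetical names, not SDK exports.

```typescript
// Local stand-in for one rerank result; field names follow the example above.
type RerankItemLike = { index: number; relevanceScore: number };

type RankedDocument = { text: string; score: number };

// Maps rerank results back to the original documents, preserving result order
// and skipping any index that falls outside the input array.
function toRankedDocuments(
  documents: string[],
  results: RerankItemLike[],
): RankedDocument[] {
  return results
    .filter((item) => item.index >= 0 && item.index < documents.length)
    .map((item) => ({ text: documents[item.index], score: item.relevanceScore }));
}
```

The sketch deliberately applies no fixed score cutoff, since relevance scores are only comparable within a single query.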
Pass documents to chat() and the model returns grounded answers with inline citations.
const response = await client.chat({
model: "command-a-03-2025",
messages: [{ role: "user", content: "What is TypeScript?" }],
documents: [
{
data: {
text: "TypeScript is a typed superset of JavaScript.",
title: "TS Docs",
},
},
{
data: {
text: "TypeScript was developed by Microsoft.",
title: "History",
},
},
],
});
console.log(response.message.content[0].text);
// Citations reference which documents support each claim
if (response.message.citations) {
for (const citation of response.message.citations) {
console.log(`"${citation.text}" from sources ${JSON.stringify(citation.sources)}`);
}
}
Why good: Documents passed inline with metadata, citations accessed from response, no external retrieval framework needed
See: examples/tools-rag.md for full RAG pipeline with embed + rerank + chat
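Application data rarely arrives in the `{ data: { text, title } }` shape that the `documents` parameter expects. The sketch below converts records into that shape. `SourceRecord` is a hypothetical application-side type, and the optional `id` field on each document is an assumption about the V2 document shape that should be confirmed against the SDK types.

```typescript
// Hypothetical application-side record; not an SDK type.
type SourceRecord = { id: string; title: string; body: string };

// Shape used in the example above, plus an optional id (an assumption).
type ChatDocumentLike = { id?: string; data: { text: string; title: string } };

// Converts records to chat documents, skipping records with no usable text.
function toChatDocuments(records: SourceRecord[]): ChatDocumentLike[] {
  return records
    .filter((record) => record.body.trim().length > 0)
    .map((record) => ({
      id: record.id,
      data: { text: record.body, title: record.title },
    }));
}
```

The result would be passed as `documents: toChatDocuments(records)` in the `chat()` call.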
4-step loop: user message -> model returns tool_calls -> execute tools -> return results with tool_call_id.
const tools = [
{
type: "function" as const,
function: {
name: "get_weather",
description: "Get weather for a city",
parameters: {
type: "object",
properties: {
location: { type: "string", description: "City name" },
},
required: ["location"],
},
},
},
];
const response = await client.chat({
model: "command-a-03-2025",
messages: [{ role: "user", content: "Weather in Paris?" }],
tools,
});
// Check if model wants to call tools
if (response.message.toolCalls) {
// See examples/tools-rag.md for the complete tool execution loop
}
Why good: Standard JSON Schema tool definition, checks for toolCalls before executing
See: examples/tools-rag.md for complete multi-step tool loop with tool result submission
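The middle of the loop (append the assistant message, then append one tool result per call) can be sketched as a pure function. Everything here is a local stand-in: the types are not SDK exports, and the camelCase names `toolCalls` and `toolCallId` are assumed to mirror the REST fields `tool_calls` and `tool_call_id`. Tool handlers are synchronous in this sketch to keep it short.

```typescript
// Local stand-ins for SDK message types (assumed camelCase field names).
type ToolCallLike = {
  id: string;
  function: { name: string; arguments: string };
};
type AssistantToolMessage = {
  role: "assistant";
  toolCalls: ToolCallLike[];
  toolPlan?: string;
};
type ToolResultMessage = { role: "tool"; toolCallId: string; content: string };
type UserMessage = { role: "user"; content: string };
type LoopMessage = UserMessage | AssistantToolMessage | ToolResultMessage;

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Appends the full assistant message, then one tool result per call, each
// carrying the id of the call it answers. Returns a new array.
function appendToolResults(
  messages: LoopMessage[],
  assistant: AssistantToolMessage,
  handlers: Map<string, ToolHandler>,
): LoopMessage[] {
  const next: LoopMessage[] = [...messages, assistant];
  for (const call of assistant.toolCalls) {
    const handler = handlers.get(call.function.name);
    const output = handler
      ? handler(JSON.parse(call.function.arguments) as Record<string, unknown>)
      : { error: `Unknown tool: ${call.function.name}` };
    next.push({
      role: "tool",
      toolCallId: call.id,
      content: JSON.stringify(output),
    });
  }
  return next;
}
```

The final step is to call `chat()` again with the returned messages and the same `tools`, so the model can produce its cited answer.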
Catch CohereError for API errors, CohereTimeoutError for timeouts.
import { CohereError, CohereTimeoutError } from "cohere-ai";
try {
const response = await client.chat({
model: "command-a-03-2025",
messages: [{ role: "user", content: "Hello" }],
});
} catch (error) {
if (error instanceof CohereTimeoutError) {
console.error("Request timed out");
} else if (error instanceof CohereError) {
console.error(`API Error [${error.statusCode}]: ${error.message}`);
console.error("Body:", error.body);
} else {
throw error; // Re-throw unknown errors
}
}
Why good: Specific error types with status codes, re-throws unexpected errors, timeout handled separately
See: examples/core.md for production error handling patterns
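When deciding whether a caught error is worth retrying, one common approach is to classify by status code and back off exponentially. The sketch below is an illustration under assumptions, not SDK behavior: it relies only on the `statusCode` field used above, the constants are arbitrary, and the SDK may already retry some failures on its own, so check its options before layering this on top.

```typescript
// Assumed shape: the statusCode field used in the example above.
type ApiErrorLike = { statusCode?: number };

const RATE_LIMIT_STATUS = 429;
const SERVER_ERROR_MIN = 500;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8_000;

// Treats rate limits and server-side failures as transient.
function isRetryable(error: ApiErrorLike): boolean {
  const status = error.statusCode;
  if (status === undefined) {
    return false;
  }
  return status === RATE_LIMIT_STATUS || status >= SERVER_ERROR_MIN;
}

// Exponential backoff for a zero-based attempt number, capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}
```

Client errors such as 400 or 401 are not retried here, because repeating the same request would fail the same way.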
</patterns>
General purpose (best) -> command-a-03-2025 (256K context, strongest)
Reasoning tasks -> command-a-reasoning-08-2025 (multi-step reasoning)
Vision/document analysis -> command-a-vision-07-2025 (images, charts, OCR)
Translation -> command-a-translate-08-2025 (23 languages)
Lightweight / edge -> command-r7b-12-2024 (7B, fast, 128K context)
Legacy (still supported) -> command-r-08-2024, command-r-plus-08-2024
Embeddings (best) -> embed-v4.0 (multimodal, 128K context, flexible dims)
Embeddings (English) -> embed-english-v3.0 (1024 dims)
Embeddings (multilingual) -> embed-multilingual-v3.0 (23 languages)
Rerank (quality) -> rerank-v4.0-pro (32K context, multilingual)
Rerank (speed) -> rerank-v4.0-fast (32K context, latency-optimized)
- Batch texts into a single embed() call instead of calling per-document
- Set topN in rerank -- limit results to reduce response size and cost
- Use outputDimension with embed-v4 -- reduce dimensions (256/512/1024) for faster similarity search at minimal quality loss
- Check finish_reason === "MAX_TOKENS" -- detect truncated output
- Set temperature: 0 for deterministic output (enables caching)
- Use int8/binary types for compressed storage with minimal quality loss
- Set strictTools: true to force tool calls to follow the schema exactly (structured outputs)
- Enable thinking: { type: "enabled" } with reasoning models for complex multi-step tasks
- Set toolChoice: "REQUIRED" when you always want the model to call a tool (command-r7b+ only)
<decision_framework>
New project?
+-- YES -> CohereClientV2 (always)
+-- Existing V1 code?
+-- Working fine? -> Keep CohereClient but plan migration
+-- Need V2 features? -> Migrate to CohereClientV2
What is your task?
+-- General chat/generation -> command-a-03-2025 (most capable)
+-- Reasoning / multi-step -> command-a-reasoning-08-2025
+-- Image/document analysis -> command-a-vision-07-2025
+-- Translation -> command-a-translate-08-2025
+-- Lightweight / low latency -> command-r7b-12-2024
+-- Embeddings -> embed-v4.0 (or embed-english-v3.0 for English-only)
+-- Rerank quality -> rerank-v4.0-pro
+-- Rerank speed -> rerank-v4.0-fast
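The decision tree above can be kept in one place as a typed lookup, which avoids model id strings scattered through the codebase. This is a sketch: the `Task` labels and the `modelFor` helper are hypothetical names, while the model ids are copied from the tree.

```typescript
type Task =
  | "chat"
  | "reasoning"
  | "vision"
  | "translation"
  | "lightweight"
  | "embeddings"
  | "rerank-quality"
  | "rerank-speed";

// Model ids copied from the decision tree above.
const MODEL_BY_TASK: Record<Task, string> = {
  chat: "command-a-03-2025",
  reasoning: "command-a-reasoning-08-2025",
  vision: "command-a-vision-07-2025",
  translation: "command-a-translate-08-2025",
  lightweight: "command-r7b-12-2024",
  embeddings: "embed-v4.0",
  "rerank-quality": "rerank-v4.0-pro",
  "rerank-speed": "rerank-v4.0-fast",
};

function modelFor(task: Task): string {
  return MODEL_BY_TASK[task];
}
```

Because `Record<Task, string>` must list every task, adding a new task label without a model becomes a compile-time error.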
inputType Selection
What are you embedding?
+-- Documents for a search index -> "search_document"
+-- Search queries against an index -> "search_query"
+-- Text for a classifier -> "classification"
+-- Text for clustering -> "clustering"
+-- Images -> "image" (embed-v4+ only)
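One way to make the indexing/querying mismatch hard to write is to derive `inputType` from the caller's purpose instead of passing it by hand. A minimal sketch covering the four text cases from the tree; `EmbedPurpose` and `embedRequest` are hypothetical names, and the returned object simply mirrors the `embed()` arguments used in the examples earlier. Image input is left out because its request shape is not shown in this document.

```typescript
type EmbedPurpose = "index" | "query" | "classify" | "cluster";
type TextInputType =
  | "search_document"
  | "search_query"
  | "classification"
  | "clustering";

const INPUT_TYPE_BY_PURPOSE: Record<EmbedPurpose, TextInputType> = {
  index: "search_document",
  query: "search_query",
  classify: "classification",
  cluster: "clustering",
};

// Builds embed() arguments with the inputType implied by the purpose.
function embedRequest(model: string, purpose: EmbedPurpose, texts: string[]) {
  return {
    model,
    inputType: INPUT_TYPE_BY_PURPOSE[purpose],
    texts,
    embeddingTypes: ["float" as const],
  };
}
```

Call sites then read `client.embed(embedRequest(EMBEDDING_MODEL, "index", docs))` for indexing and use `"query"` at search time.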
Do you have search results to re-order?
+-- YES -> Use rerank as a second-stage ranker
| +-- Quality matters most? -> rerank-v4.0-pro
| +-- Latency matters most? -> rerank-v4.0-fast
+-- NO -> Not applicable (rerank needs existing results to score)
Do you need grounded answers with citations?
+-- YES -> Pass documents to chat()
| +-- Have pre-retrieved documents? -> Pass directly via documents param
| +-- Need retrieval first? -> Use embed + vector search + rerank pipeline, then pass top results to chat()
+-- NO -> Use plain chat without documents
</decision_framework>
<red_flags>
High Priority Issues:
- Using CohereClient instead of CohereClientV2 for new code (V1 is legacy)
- Omitting the model parameter in V2 API calls (required on every call, unlike V1)
- Mismatching inputType for embeddings (search_query for documents or vice versa -- silently degrades results)
- Not appending the full assistant message (with tool_calls) before appending tool results in the tool use loop
Medium Priority Issues:
- Not specifying embeddingTypes (defaults may not match your storage format)
- Not checking finish_reason: "MAX_TOKENS" (output was silently truncated)
- Not handling CohereTimeoutError separately from CohereError
- Not checking the streaming event type (only content-delta has text)
- Using V1 parameters (preamble, connectors, conversation_id) with the V2 client
Common Mistakes:
- Accessing response.text instead of response.message.content[0].text (V2 response shape changed)
- Forgetting that embeddingTypes is required in the V2 Embed API
- Omitting tool_call_id when submitting tool results (model cannot correlate results)
- Passing documents with string values instead of { data: { text: "..." } } objects in V2
- Expecting response.message.citations to exist when no documents were provided (citations only appear with grounded responses)
Gotchas & Edge Cases:
- Pin the cohere-ai version in package.json to avoid breaking changes
- inputType is camelCase in the TypeScript SDK (inputType) but snake_case in the REST API (input_type)
- embed-v4.0 supports outputDimension for flexible sizing (256, 512, 1024, 1536) but v3 models have fixed dimensions
- relevanceScore is normalized 0-1 but not calibrated across queries -- compare scores within a single query only
- Streaming tool use emits tool-plan-delta before tool-call-start -- the model's reasoning about which tool to call
- V2 uses the system role for instructions (V1 used the preamble parameter)
- Citation sources in tool use responses reference tool_call_id values, not document indices
- The clientName constructor parameter is for logging/analytics, not authentication
- responseFormat: { type: "json_object" } is NOT supported in RAG mode (with documents, tools, or toolResults)
- toolChoice is only supported on command-r7b-12-2024 and newer models
- Requests with strictTools: true and a new tool set take longer (schema compilation)
- thinking (reasoning mode) is only available on reasoning-capable models like command-a-reasoning-08-2025
</red_flags>
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use CohereClientV2 (not CohereClient) for all new code -- V2 is the current API with required model parameter)
(You MUST specify inputType on every embed call -- search_document for indexing, search_query for querying -- mismatched types produce garbage similarity scores)
(You MUST handle the tool use loop correctly: append the full assistant message (with tool_calls) to messages, then append tool role results with matching tool_call_id)
(You MUST check finish_reason in responses -- MAX_TOKENS means the output was truncated)
(You MUST never hardcode API keys -- pass via token constructor parameter sourced from environment variables)
Failure to follow these rules will produce broken embeddings, missing citations, or insecure AI integrations.
</critical_reminders>