dist/plugins/api-ai-together-ai/skills/api-ai-together-ai/SKILL.md
Together AI SDK patterns for TypeScript — client setup, chat completions, streaming, structured output, function calling, embeddings, image generation, fine-tuning, and OpenAI-compatible endpoints
Quick Guide: Use the together-ai npm package to access 200+ open-source models (Llama, Qwen, Mistral, DeepSeek) via Together AI's fast inference API. The SDK mirrors the OpenAI API shape -- client.chat.completions.create() for chat, client.images.generate() for images, client.embeddings.create() for embeddings. Use response_format: { type: "json_schema" } with Zod-generated schemas for structured output. Function calling uses the same tools parameter shape as OpenAI. You can also use the OpenAI SDK directly by pointing baseURL to https://api.together.xyz/v1.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use the together-ai package (import Together from "together-ai") -- NOT the OpenAI SDK -- unless explicitly building an OpenAI-compatible integration)
(You MUST include the JSON schema in BOTH the response_format parameter AND the system prompt when using structured output -- the model needs both)
(You MUST handle errors using Together.APIError and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via process.env.TOGETHER_API_KEY)
</critical_requirements>
Auto-detection: Together AI, together-ai, together.ai, TOGETHER_API_KEY, client.chat.completions (together), client.images.generate, client.embeddings.create (together), Llama-3, Qwen3, Mistral, DeepSeek, FLUX, together.images, together.chat, together.embeddings, together.fineTuning, api.together.xyz
When to use:
Key patterns covered:
- Streaming with stream: true and for await...of
- Structured output with response_format: { type: "json_schema" } and Zod
- Function calling with the tools parameter
When NOT to use:
Together AI provides fast serverless inference for open-source models. The TypeScript SDK (together-ai) is auto-generated with Stainless and mirrors the OpenAI API shape, making migration straightforward.
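Because the API follows the OpenAI shape, the same request can also be expressed without any SDK. The following is a minimal sketch, assuming the OpenAI-style /chat/completions path under the base URL and bearer-token authentication; buildChatCompletionRequest and the type names are hypothetical helpers, not part of together-ai.

```typescript
// Base URL taken from this guide; the /chat/completions path is an assumption
// based on the OpenAI REST convention.
const TOGETHER_BASE_URL = "https://api.together.xyz/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
}

interface BuiltRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildChatCompletionRequest(apiKey: string, body: ChatRequestBody): BuiltRequest {
  return {
    url: `${TOGETHER_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed to fetch(built.url, built.init). In most projects the SDK remains the better choice because it adds typing, retries, and error classes.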
Core principles:
- OpenAI-compatible API shape: same client.chat.completions.create() pattern, same messages array, same tools parameter. Switching from OpenAI is often just changing the import and model name.
- Models are identified by Hugging Face-style IDs (e.g. meta-llama/Llama-3.3-70B-Instruct-Turbo).
- For structured output, pass the JSON schema in response_format and include it in the system prompt. Use Zod's z.toJSONSchema() to generate the JSON schema from a Zod schema.
When to use Together AI:
When NOT to use:
Initialize the Together client. It reads TOGETHER_API_KEY from the environment.
// lib/together.ts -- basic setup
import Together from "together-ai";
const client = new Together();
export { client };
// lib/together.ts -- production configuration
import Together from "together-ai";

const TIMEOUT_MS = 30_000;
const MAX_RETRIES = 3;

const client = new Together({
  apiKey: process.env.TOGETHER_API_KEY,
  timeout: TIMEOUT_MS,
  maxRetries: MAX_RETRIES,
});

export { client };
Why good: Minimal setup, env var auto-detected, named constants for production settings
// BAD: Hardcoded API key
const client = new Together({
  apiKey: "sk-abc123...",
});
Why bad: Hardcoded keys leak through version control and create a security breach risk
See: examples/core.md for error handling, OpenAI compatibility, per-request overrides
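The rule against hardcoded keys pairs well with a fail-fast check at startup, so a missing variable surfaces before the first request. A minimal sketch, assuming a hypothetical requireApiKey helper (not part of the together-ai SDK) that takes the environment as a plain object so it can be tested in isolation:

```typescript
// Hypothetical helper, not part of the together-ai SDK.
const API_KEY_ENV_VAR = "TOGETHER_API_KEY";

type EnvLike = Record<string, string | undefined>;

// Returns the trimmed key, or throws a descriptive error when it is missing or blank.
function requireApiKey(env: EnvLike): string {
  const key = env[API_KEY_ENV_VAR]?.trim();
  if (!key) {
    throw new Error(`${API_KEY_ENV_VAR} is not set. Export it before starting the app.`);
  }
  return key;
}
```

In an application this could be wired in as new Together({ apiKey: requireApiKey(process.env) }).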
Stateless text generation with open-source models.
const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(completion.choices[0].message.content);
Why good: Clear message roles, system message for behavior control, direct content access
// BAD: No system message, no model specified
const res = await client.chat.completions.create({
  messages: [{ role: "user", content: "do something" }],
});
Why bad: Missing model field will error, no system instruction means unpredictable behavior
See: examples/chat.md for multi-turn, vision models, model selection guide
Use streaming for user-facing responses.
const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Explain async/await." }],
  stream: true,
});

// Each chunk carries an incremental delta; write it as soon as it arrives
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Why good: Progressive output for better UX, standard async iterator pattern
// BAD: Not consuming the stream
const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});
// Stream never consumed -- tokens are lost
Why bad: Stream must be consumed via iteration, otherwise tokens are silently lost
See: examples/streaming.md for stream cancellation, controller access
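When the full text is needed after streaming (for logging or persistence), the deltas can be accumulated while they are forwarded. This is a minimal sketch: StreamChunkLike is a structural stand-in covering only the fields read here, and collectStreamText and fakeStream are hypothetical names used so the example runs without network access.

```typescript
// Structural type covering only the fields this sketch reads;
// the real SDK chunk type has more fields.
interface StreamChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Accumulates streamed deltas into one string while forwarding each piece
// to an optional callback (for example, to write to a terminal or socket).
async function collectStreamText(
  stream: AsyncIterable<StreamChunkLike>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      fullText += content;
      onDelta?.(content);
    }
  }
  return fullText;
}

// Stand-in for an SDK stream; the middle chunk has no content, as can happen
// with role-only or final chunks.
async function* fakeStream(): AsyncGenerator<StreamChunkLike> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: {} }] };
  yield { choices: [{ delta: { content: ", world" } }] };
}
```

With the SDK, the stream returned by client.chat.completions.create({ ..., stream: true }) would take the place of fakeStream(), assuming its chunks keep the choices[0].delta.content shape shown above.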
Use response_format: { type: "json_schema" } with Zod-generated schemas.
import Together from "together-ai";
import { z } from "zod";

const client = new Together();

const EventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

// Zod v4: convert the Zod schema to JSON Schema
const jsonSchema = z.toJSONSchema(EventSchema);

const completion = await client.chat.completions.create({
  model: "Qwen/Qwen3.5-9B",
  messages: [
    {
      role: "system",
      content: `Extract event details. Only answer in JSON. Follow this schema: ${JSON.stringify(jsonSchema)}`,
    },
    { role: "user", content: "Alice and Bob meet next Tuesday for lunch." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: { name: "calendar_event", schema: jsonSchema },
  },
});

const event = JSON.parse(completion.choices[0].message.content ?? "{}");
Why good: Zod generates schema, schema included in both system prompt and response_format, named schema object
// BAD: Schema only in response_format, not in system prompt
const completion = await client.chat.completions.create({
  model: "Qwen/Qwen3.5-9B",
  messages: [{ role: "user", content: "Extract event details." }],
  response_format: {
    type: "json_schema",
    json_schema: { name: "event", schema: jsonSchema },
  },
});
Why bad: Model needs the schema in the system prompt AND response_format for reliable structured output -- omitting the prompt instruction degrades output quality
See: examples/structured-output.md for regex mode, vision with JSON, complex schemas
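JSON.parse returns an untyped value, so the parsed content is still worth validating before use. With Zod in the project, EventSchema.safeParse would be the natural tool. The sketch below is a dependency-free alternative, with hypothetical names (CalendarEvent, parseCalendarEvent), that checks the three fields of the event schema by hand.

```typescript
interface CalendarEvent {
  name: string;
  date: string;
  participants: string[];
}

type EventParseResult =
  | { ok: true; value: CalendarEvent }
  | { ok: false; error: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: parses completion content and verifies the event shape
// instead of trusting the model output blindly.
function parseCalendarEvent(content: string | null | undefined): EventParseResult {
  if (!content) return { ok: false, error: "empty completion content" };

  let raw: unknown;
  try {
    raw = JSON.parse(content);
  } catch {
    return { ok: false, error: "content is not valid JSON" };
  }
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "expected a JSON object" };
  }

  const fields = raw as Record<string, unknown>;
  const title = fields.name;
  const date = fields.date;
  const participants = fields.participants;
  if (typeof title !== "string" || typeof date !== "string" || !isStringArray(participants)) {
    return { ok: false, error: "JSON does not match the event schema" };
  }
  return { ok: true, value: { name: title, date, participants } };
}
```

Returning a result object rather than throwing keeps the caller in control of what to do with malformed output, such as retrying the request or logging the raw content.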
Define functions the model can call. Same tools parameter shape as OpenAI.
const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" },
          },
          required: ["location"],
          additionalProperties: false,
        },
        strict: true,
      },
    },
  ],
});

// tool_calls is undefined when the model answers directly
const toolCall = completion.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  console.log(`Call ${toolCall.function.name} with:`, args);
}
Why good: Standard OpenAI-compatible tool format, strict mode for reliable arguments, additionalProperties: false prevents hallucinated fields
See: examples/tools.md for multi-step tool loops, tool_choice, parallel calls, supported models
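After the model requests a tool, the application has to run it and send the result back. The following is a minimal sketch of that dispatch step, assuming the OpenAI-style tool result message ({ role: "tool", tool_call_id, content }); ToolCallLike is a structural stand-in, and toolHandlers and runToolCall are hypothetical names. The weather handler returns placeholder data.

```typescript
// Structural type covering only the fields read here.
interface ToolCallLike {
  id: string;
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical local implementations keyed by tool name.
const toolHandlers: Record<string, ToolHandler> = {
  get_weather: (args) => ({ location: args.location, forecast: "placeholder forecast" }),
};

// Runs one tool call and wraps the outcome (or an error) as a tool message.
function runToolCall(call: ToolCallLike, handlers: Record<string, ToolHandler>): ToolResultMessage {
  const respond = (payload: unknown): ToolResultMessage => ({
    role: "tool",
    tool_call_id: call.id,
    content: JSON.stringify(payload),
  });

  // Own-property check so names like "toString" are not treated as tools.
  const toolName = call.function.name;
  const handler = Object.prototype.hasOwnProperty.call(handlers, toolName)
    ? handlers[toolName]
    : undefined;
  if (!handler) return respond({ error: `Unknown tool: ${toolName}` });

  let args: unknown;
  try {
    args = JSON.parse(call.function.arguments);
  } catch {
    return respond({ error: "Tool arguments were not valid JSON" });
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return respond({ error: "Tool arguments must be a JSON object" });
  }
  return respond(handler(args as Record<string, unknown>));
}
```

Reporting failures back as tool content, rather than throwing, gives the model a chance to correct its arguments on the next turn.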
Generate images with FLUX and Stable Diffusion models.
const response = await client.images.generate({
  model: "black-forest-labs/FLUX.1-schnell",
  prompt: "A serene mountain landscape at sunset with a lake reflection",
  steps: 4,
});
console.log(response.data[0].url);
Why good: Simple API, model-specific parameters, URL response by default
See: examples/images.md for FLUX variants, base64, reference images, multiple variations
Create embeddings for semantic search and RAG pipelines.
const EMBEDDING_MODEL = "BAAI/bge-large-en-v1.5";
const response = await client.embeddings.create({
  model: EMBEDDING_MODEL,
  input: "TypeScript provides static type checking.",
});
console.log(response.data[0].embedding);
Why good: Named model constant, simple single-input embedding, array response
See: examples/images.md for batch embeddings, semantic search with cosine similarity
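For the semantic search use case, embeddings are typically compared with cosine similarity. A minimal sketch follows, using hand-written vectors in place of real embedding output; cosineSimilarity, rankBySimilarity, and EmbeddedDoc are hypothetical names.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

interface EmbeddedDoc {
  text: string;
  embedding: number[];
}

// Scores every document against the query and returns the topK best matches.
function rankBySimilarity(
  queryEmbedding: readonly number[],
  docs: readonly EmbeddedDoc[],
  topK: number,
): Array<{ text: string; score: number }> {
  return docs
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In a real pipeline, queryEmbedding and each doc.embedding would come from response.data[i].embedding, all produced by the same embedding model.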
Always catch Together.APIError and its subclasses.
try {
  const completion = await client.chat.completions.create({
    model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof Together.APIError) {
    console.error(`API Error [${error.status}]: ${error.message}`);
    if (error instanceof Together.RateLimitError) {
      // The SDK retries automatically; reaching this point means retries were exhausted
      console.error("Rate limited -- SDK retries exhausted, back off before retrying.");
    }
    if (error instanceof Together.AuthenticationError) {
      throw new Error("Invalid API key. Check TOGETHER_API_KEY.");
    }
  } else {
    throw error; // Re-throw non-API errors
  }
}
Why good: Specific error types, re-throws unexpected errors, actionable error messages
See: examples/core.md for full production error handling, error type hierarchy
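Once the SDK's own retries are exhausted, the caller still has to decide what to do with the failure. The sketch below maps an HTTP status (such as error.status on an API error) to a follow-up action and computes a capped exponential backoff delay. Which statuses count as transient, the delay constants, and the names classifyStatus and backoffDelayMs are all assumptions made for illustration.

```typescript
type FailureAction = "fix-credentials" | "back-off" | "retry-later" | "fix-request";

// Assumed grouping: 401/403 are credential problems, 429 is rate limiting,
// 408/409/5xx are transient, and anything else is a problem with the request.
function classifyStatus(httpStatus: number | undefined): FailureAction {
  if (httpStatus === undefined) return "retry-later"; // no HTTP response, e.g. a connection error
  if (httpStatus === 401 || httpStatus === 403) return "fix-credentials";
  if (httpStatus === 429) return "back-off";
  if (httpStatus === 408 || httpStatus === 409 || httpStatus >= 500) return "retry-later";
  return "fix-request";
}

// Illustrative constants, not SDK defaults.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8_000;

// Exponential backoff: the delay doubles with each attempt, capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}
```

A job queue, for example, could requeue "back-off" and "retry-later" failures after backoffDelayMs(attempt) and surface the other two categories to an operator.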
</patterns>
Fast + cheap               -> Llama 3.3 70B Turbo, Qwen3.5 9B
Most capable               -> DeepSeek V3.1, Qwen3.5 397B
Complex reasoning          -> DeepSeek R1
Function calling           -> Llama 3.3 70B, Qwen3.5 9B, DeepSeek V3
Structured output (JSON)   -> Qwen3.5 9B, Llama 3.3 70B
Embeddings                 -> BAAI/bge-large-en-v1.5 (quality), UAE-Large-V1
Image generation (fast)    -> FLUX.1 schnell (4 steps)
Image generation (quality) -> FLUX.2 pro, FLUX.1.1 pro
Vision / multimodal        -> Qwen3-VL-8B-Instruct, Llama 3.2 Vision
- Use temperature: 0 for deterministic output when possible
- Batch inputs in a single client.embeddings.create() call instead of embedding one at a time
- Use steps: 4 for FLUX.1 schnell images (higher steps have diminishing returns)
<decision_framework>
What is your task?
+-- General chat / instruction following -> Llama 3.3 70B Turbo (fast, cheap)
+-- Most capable reasoning -> DeepSeek V3.1, Qwen3.5 397B
+-- Complex math / chain-of-thought -> DeepSeek R1
+-- Function calling / tool use -> Llama 3.3 70B, Qwen3.5 9B
+-- Structured JSON output -> Qwen3.5 9B (best JSON mode support)
+-- Vision / image understanding -> Qwen3-VL-8B-Instruct
+-- Code generation -> DeepSeek V3, Qwen Coder
+-- Embeddings -> BAAI/bge-large-en-v1.5 (default)
+-- Image generation (fast) -> FLUX.1 schnell
+-- Image generation (quality) -> FLUX.2 pro, FLUX.1.1 pro
Do you ONLY use Together AI models?
+-- YES -> Use together-ai package (purpose-built, full API coverage)
+-- NO -> Do you also use OpenAI models?
    +-- YES -> Two options:
    |   +-- Separate SDKs: together-ai for Together, openai for OpenAI
    |   +-- OpenAI SDK only: Point baseURL to api.together.xyz/v1
    +-- NO -> Use a provider-agnostic SDK
Is the response user-facing?
+-- YES -> Use streaming (stream: true)
+-- NO -> Use non-streaming
    +-- Background processing -> client.chat.completions.create()
    +-- Structured output -> Non-streaming with response_format
</decision_framework>
<red_flags>
High Priority Issues:
- Hardcoding TOGETHER_API_KEY instead of using environment variables (security breach risk)
- Bare catch blocks without checking Together.APIError (hides API errors)
- Not consuming the stream when using stream: true (tokens are silently lost)
- Using JSON.parse() on completion content without response_format (fragile, model may return non-JSON)
- Omitting the schema from the system prompt when using response_format: { type: "json_schema" } (degrades output quality)
Medium Priority Issues:
- Not configuring maxRetries / timeout for production deployments (default timeout is 1 minute)
- Omitting the system role message (no system instruction means unpredictable behavior)
- [...] tools parameter (will silently fail or error)
- Not checking that tool_calls is defined before accessing arguments
- Using width/height with FLUX schnell/Kontext models (use aspect_ratio instead)
Common Mistakes:
- Using OpenAI model IDs (e.g. gpt-4o) with the Together AI SDK -- Together uses Hugging Face-style IDs like meta-llama/Llama-3.3-70B-Instruct-Turbo
- Confusing client.images.generate() (Together) with client.images.create() (OpenAI) -- different method name
- Passing a Zod schema directly -- use z.toJSONSchema() (Zod v4) or zodToJsonSchema() (Zod v3) to convert schemas before passing to response_format
- Using the developer role (OpenAI-specific) instead of system role with Together AI models
- Using max_completion_tokens instead of max_tokens -- Together uses max_tokens
Gotchas & Edge Cases:
- [...] maxRetries: 0.
- Model IDs use the org/model-name format from Hugging Face.
- FLUX schnell and Kontext models use the aspect_ratio parameter; FLUX.1 Pro and FLUX.1.1 Pro use width/height.
- Image responses return a URL by default; use response_format: "base64" for inline data.
- response_format: { type: "json_schema" } requires telling the model to "only answer in JSON" in the system prompt -- the schema alone is not sufficient.
- The examples use z.toJSONSchema() (Zod v4) -- if using Zod v3, use zodToJsonSchema() from the zod-to-json-schema package.
- client.images.generate() is the method name, not client.images.create() like OpenAI.
- [...] messages array per line.
- The OpenAI-compatible endpoint (api.together.xyz/v1) supports chat, embeddings, images, vision, function calling, and structured output -- but not fine-tuning or model management.
</red_flags>
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use the together-ai package (import Together from "together-ai") -- NOT the OpenAI SDK -- unless explicitly building an OpenAI-compatible integration)
(You MUST include the JSON schema in BOTH the response_format parameter AND the system prompt when using structured output -- the model needs both)
(You MUST handle errors using Together.APIError and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via process.env.TOGETHER_API_KEY)
Failure to follow these rules will produce insecure, unreliable, or incorrectly structured AI integrations.
</critical_reminders>