src/skills/ai-infrastructure-replicate/SKILL.md
Replicate SDK patterns for TypeScript/Node.js -- client setup, predictions, streaming, webhooks, file handling, model versioning, deployments, and training
Quick Guide: Use the replicate npm package to run open-source ML models on serverless GPUs. Use replicate.run() for synchronous execution that returns output directly, replicate.stream() for SSE-based streaming, or replicate.predictions.create() for async background jobs with webhook notifications. Models are referenced as owner/model (uses latest version) or owner/model:version (pinned). File outputs are FileOutput objects implementing ReadableStream. Cold starts are expected for infrequently-used models -- use deployments with min_instances to keep models warm.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST never hardcode API tokens -- always use environment variables via process.env.REPLICATE_API_TOKEN)
(You MUST handle FileOutput objects for models that return files -- do not assume outputs are plain strings or URLs)
(You MUST validate webhooks using validateWebhook() from the replicate package -- never trust unverified webhook payloads)
(You MUST account for cold starts when running infrequently-used models -- use deployments for latency-sensitive applications)
(You MUST specify model versions (owner/model:version) in production to ensure reproducible results -- unversioned references use the latest, which can change)
</critical_requirements>
Auto-detection: Replicate, replicate, replicate.run, replicate.stream, replicate.predictions, replicate.deployments, replicate.trainings, replicate.models, FileOutput, validateWebhook, REPLICATE_API_TOKEN, serverless GPU, cold start, webhook_events_filter
When to use:
Key patterns covered:
- replicate.run(), replicate.predictions.create(), replicate.wait()
- replicate.stream() with SSE events
- owner/model vs owner/model:version
- FileOutput, file uploads, Buffer inputs
When NOT to use:
Replicate provides serverless GPU infrastructure for running open-source ML models. You send inputs, Replicate allocates GPU hardware, runs the model, and returns outputs. No Docker, no CUDA drivers, no GPU provisioning.
Core principles:
- replicate.com/explore. Run any public model with just its identifier.
- Pin versions (owner/model:abc123...) to guarantee identical behavior across deploys.
- replicate.run() for synchronous wait, replicate.stream() for real-time SSE output, replicate.predictions.create() for fire-and-forget with webhooks.
- FileOutput objects for file outputs.

Initialize the Replicate client. It auto-reads REPLICATE_API_TOKEN from the environment.
// lib/replicate.ts -- basic setup
import Replicate from "replicate";
const replicate = new Replicate();
export { replicate };
// lib/replicate.ts -- explicit auth + custom user agent
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN, // Auto-reads from env if omitted
  userAgent: "my-app/1.0.0",
});
export { replicate };
Why good: Minimal setup, env var auto-detected, explicit auth optional but useful for clarity
// BAD: Hardcoded token
const replicate = new Replicate({
  auth: "r8_abc123...",
});
Why bad: Hardcoded API token is a security risk, will leak in version control
See: examples/core.md for full constructor options, error handling patterns
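Because the client reads the token from the environment, a missing variable only surfaces when the first request fails. A minimal sketch of failing fast at startup instead; requireEnv and the injected env record are hypothetical names for illustration, not part of the replicate package.

```typescript
// Hypothetical helper (not part of the replicate package): return a required
// environment variable or throw a descriptive error.
type Env = Record<string, string | undefined>;

const REPLICATE_TOKEN_VAR = "REPLICATE_API_TOKEN";

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  // Treat an unset or blank value as missing.
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In lib/replicate.ts this could be wired in as auth: requireEnv(REPLICATE_TOKEN_VAR, process.env), which keeps the token out of source while still failing early.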
Use replicate.run() for synchronous execution. Returns the model output directly.
// Run an image generation model
const [output] = await replicate.run("black-forest-labs/flux-schnell", {
  input: {
    prompt: "a serene mountain landscape at sunset",
  },
});
// output is a FileOutput object for image models
console.log(output.url()); // URL of generated image

// Run an LLM -- output is a string for text models
const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Explain TypeScript generics in 3 sentences.",
    max_tokens: 512,
  },
});
console.log(output); // Text response
Why good: Simple API, returns output directly, destructuring works for array outputs (images)
// BAD: Not pinning version in production
const output = await replicate.run("community-user/experimental-model", {
  input: { prompt: "hello" },
});
Why bad: Community models without version pinning can change behavior unexpectedly when authors push updates
See: examples/core.md for version pinning, predictions.create() + wait(), and progress callbacks
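The shape of the value returned by replicate.run() follows the model's output schema, and some language models return the text as an array of string chunks rather than one string. A small normalizer keeps calling code uniform. This is a sketch under that assumption; toText is a hypothetical helper name.

```typescript
// Hypothetical helper: normalize text-model output to a single string.
// Assumes the output is a string or an array of string chunks; anything
// else is rejected rather than silently stringified.
function toText(output: unknown): string {
  if (typeof output === "string") {
    return output;
  }
  if (
    Array.isArray(output) &&
    output.every((chunk) => typeof chunk === "string")
  ) {
    return output.join("");
  }
  throw new TypeError("Expected text output (string or string[])");
}
```

With this in place, console.log(toText(output)) works for either shape, and an unexpected shape (for example a file output) fails loudly.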
Use replicate.stream() for real-time SSE output from language models.
const stream = replicate.stream("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Write a short poem about TypeScript.",
    max_tokens: 512,
  },
});
for await (const event of stream) {
  if (event.event === "output") {
    process.stdout.write(event.data);
  }
}
Why good: Progressive output for better UX, event-based with typed event and data fields
// BAD: Using replicate.run() for user-facing LLM output
const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: { prompt: "Write a long essay..." },
});
// User waits for entire generation to complete before seeing anything
Why bad: No progressive feedback, user sees a blank screen for seconds
See: examples/streaming-webhooks.md for event types, error handling, cancellation
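The loop above only handles "output" events. One way to also account for "error" and "done" is to fold each event into a small state object. The sketch below declares its own StreamEvent shape so it runs without the package (the SDK ships its own event type); applyEvent, StreamState, and INITIAL_STATE are hypothetical names.

```typescript
// Local stand-in for the streamed event shape: an event name plus string data.
interface StreamEvent {
  event: string;
  data: string;
}

interface StreamState {
  text: string; // output accumulated so far
  done: boolean; // true once a "done" event has been seen
}

const INITIAL_STATE: StreamState = { text: "", done: false };

// Fold one event into the accumulated state.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.event) {
    case "output":
      return { ...state, text: state.text + event.data };
    case "error":
      throw new Error(`Stream failed: ${event.data}`);
    case "done":
      return { ...state, done: true };
    default:
      return state; // ignore event types this sketch does not handle
  }
}
```

Inside the for await loop, state = applyEvent(state, event) would replace the inline if check, so a stream that ends without a "done" event can be detected afterwards via state.done.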
Models are referenced as owner/model (latest version) or owner/model:sha256hash (pinned version).
// Development: use latest version for convenience
const output = await replicate.run("stability-ai/sdxl", {
  input: { prompt: "a cat" },
});

// Production: pin to a specific version for reproducibility
const VERSION_HASH =
  "39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b";
const output = await replicate.run(`stability-ai/sdxl:${VERSION_HASH}`, {
  input: { prompt: "a cat" },
});
Why good: Pinned version guarantees identical behavior, hash is immutable
See: examples/core.md for listing model versions, getting version details
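The two reference formats can be told apart mechanically, which is useful for a startup check that rejects unpinned references in production. A minimal sketch; parseModelRef and isPinned are hypothetical helpers, and the accepted character set is an assumption rather than Replicate's exact naming rules.

```typescript
interface ModelRef {
  owner: string;
  name: string;
  version?: string; // present only for pinned references
}

// Split "owner/model" or "owner/model:version" into its parts.
function parseModelRef(ref: string): ModelRef {
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(ref);
  if (match === null) {
    throw new Error(`Invalid model reference: ${ref}`);
  }
  const owner = match[1];
  const name = match[2];
  const version = match[3];
  if (owner === undefined || name === undefined) {
    throw new Error(`Invalid model reference: ${ref}`);
  }
  return version === undefined ? { owner, name } : { owner, name, version };
}

// True when the reference carries an explicit version.
function isPinned(ref: string): boolean {
  return parseModelRef(ref).version !== undefined;
}
```

A production guard could then call isPinned() on every configured model string and refuse to boot when one is unpinned.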
Models that output files return FileOutput objects implementing ReadableStream.
import { writeFile } from "node:fs/promises";

const [output] = await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "a sunset over mountains" },
});
// FileOutput has .url() and .blob() methods
console.log(output.url()); // Underlying URL
// Save to disk
const blob = await output.blob();
const buffer = Buffer.from(await blob.arrayBuffer());
await writeFile("./output.png", buffer);

// File inputs: pass URLs, Buffers, or ReadStreams
import { readFile } from "node:fs/promises";

const imageBuffer = await readFile("./input.png");
const output = await replicate.run("some-user/image-model", {
  input: {
    image: imageBuffer, // Auto-uploaded (max 100 MiB)
  },
});
Why good: FileOutput is a ReadableStream, works with Node.js stream APIs, .url() for the underlying URL
// BAD: Treating file output as a plain URL string
const [output] = await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "hello" },
});
const url = output; // WRONG: output is a FileOutput object, not a string
Why bad: FileOutput is an object, not a string -- use .url() to get the URL
See: examples/core.md for file uploads, large file handling, encoding strategies
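Since outputs may be a single file, an array of files, or plain strings depending on the model, code that only needs URLs can check the shape at runtime instead of assuming it. The sketch below uses a structural stand-in for FileOutput, declared locally with only the .url() and .blob() methods mentioned above; FileOutputLike, isFileOutputLike, and outputUrls are hypothetical names.

```typescript
// Structural stand-in for the SDK's FileOutput, declared locally so the
// sketch runs without the package. Only the two methods named in the text.
interface FileOutputLike {
  url(): { toString(): string };
  blob(): Promise<unknown>;
}

function isFileOutputLike(value: unknown): value is FileOutputLike {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { url?: unknown }).url === "function" &&
    typeof (value as { blob?: unknown }).blob === "function"
  );
}

// Collect URLs from an output that may be one item or an array, where each
// item is either a file-like object or a plain string. Other items are skipped.
function outputUrls(output: unknown): string[] {
  const items: unknown[] = Array.isArray(output) ? output : [output];
  const urls: string[] = [];
  for (const item of items) {
    if (isFileOutputLike(item)) {
      urls.push(String(item.url()));
    } else if (typeof item === "string") {
      urls.push(item);
    }
  }
  return urls;
}
```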
Use replicate.predictions.create() for background jobs with webhook notifications.
const prediction = await replicate.predictions.create({
  model: "owner/model", // OR version: "sha256hash" for pinned version
  input: { prompt: "a painting of a cat" },
  webhook: "https://my.app/webhooks/replicate",
  webhook_events_filter: ["completed"],
});
console.log(prediction.id); // Use to track status
console.log(prediction.status); // "starting"
// Webhook signature validation (CRITICAL for security)
import { validateWebhook } from "replicate";

async function handleWebhook(request: Request): Promise<Response> {
  const secret = process.env.REPLICATE_WEBHOOK_SIGNING_SECRET;
  if (!secret) {
    // Fail closed when the signing secret is not configured
    return new Response("Webhook secret not configured", { status: 500 });
  }
  // Validate a clone so the body can still be read by request.json() below
  const isValid = await validateWebhook(request.clone(), secret);
  if (!isValid) {
    return new Response("Invalid signature", { status: 401 });
  }
  const prediction = await request.json();
  // Process prediction.output safely
  return new Response("OK", { status: 200 });
}
Why good: Decoupled processing, secure signature validation, filtered events reduce noise
See: examples/streaming-webhooks.md for webhook event types, polling alternative
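For intuition about what signature validation involves, the sketch below recomputes an HMAC-SHA256 signature by hand. It assumes the scheme Replicate documents for webhooks: the signed content is the webhook-id, webhook-timestamp, and raw body joined with dots; the key is the base64 part of a whsec_-prefixed secret; and the webhook-signature header holds space-separated v1,<base64> entries. computeSignature and signatureMatches are hypothetical names, and this is for understanding only; application code should keep calling validateWebhook().

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET_PREFIX = "whsec_"; // assumed prefix on the signing secret

// Recompute the base64 signature for one delivery.
function computeSignature(
  id: string,
  timestamp: string,
  body: string,
  secret: string,
): string {
  const encodedKey = secret.startsWith(SECRET_PREFIX)
    ? secret.slice(SECRET_PREFIX.length)
    : secret;
  const key = Buffer.from(encodedKey, "base64");
  return createHmac("sha256", key)
    .update(`${id}.${timestamp}.${body}`)
    .digest("base64");
}

// Check whether any "v1,<signature>" entry in the header equals the expected
// signature, using a constant-time comparison.
function signatureMatches(header: string, expected: string): boolean {
  const expectedBytes = Buffer.from(expected);
  return header.split(" ").some((entry) => {
    const candidate = Buffer.from(entry.split(",")[1] ?? "");
    return (
      candidate.length === expectedBytes.length &&
      timingSafeEqual(candidate, expectedBytes)
    );
  });
}
```

The sketch omits a timestamp freshness check, which a real verifier would typically add to limit replayed deliveries.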
Deployments give you a private, fixed endpoint with custom hardware and scaling.
// Create a prediction on a deployment (no cold start if min_instances > 0)
const prediction = await replicate.deployments.predictions.create(
  "my-org/my-deployment",
  {
    input: { prompt: "hello world" },
  },
);
const result = await replicate.wait(prediction);
console.log(result.output);
Why good: Predictable latency with min_instances, private endpoint, custom hardware selection
See: examples/deployments-training.md for creating/managing deployments, training API
Catch API errors with status codes. The SDK auto-retries on 429 and 5xx errors (5 retries by default with exponential backoff).
try {
  const output = await replicate.run("owner/model", {
    input: { prompt: "hello" },
  });
} catch (error) {
  if (error instanceof Error) {
    console.error(`Replicate error: ${error.message}`);
    // Check for specific HTTP status codes in the error
    if ("status" in error) {
      const status = (error as { status: number }).status;
      if (status === 401) {
        throw new Error("Invalid API token. Check REPLICATE_API_TOKEN.");
      }
      if (status === 422) {
        console.error("Invalid input parameters");
      }
      if (status === 429) {
        console.error(
          "Rate limited -- SDK auto-retries (5 attempts) exhausted",
        );
      }
    }
  }
  throw error;
}
Why good: Checks error type, handles specific status codes, re-throws unexpected errors
See: examples/core.md for full error handling example with status code handling
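Where the HTTP status lives on a thrown error can vary, so a defensive reader avoids relying on one shape. The sketch below assumes the status may sit either directly on error.status or on error.response.status (the latter if the error carries the fetch Response); both shapes are assumptions to verify against the SDK version in use. getStatus and describeError are hypothetical helper names.

```typescript
// Read an HTTP status from an unknown error, checking both assumed shapes:
// error.status and error.response.status.
function getStatus(error: unknown): number | undefined {
  if (typeof error !== "object" || error === null) {
    return undefined;
  }
  const direct = (error as { status?: unknown }).status;
  if (typeof direct === "number") {
    return direct;
  }
  const response = (error as { response?: unknown }).response;
  if (typeof response === "object" && response !== null) {
    const nested = (response as { status?: unknown }).status;
    if (typeof nested === "number") {
      return nested;
    }
  }
  return undefined;
}

const STATUS_MESSAGES: Record<number, string> = {
  401: "Invalid API token. Check REPLICATE_API_TOKEN.",
  422: "Invalid input parameters",
  429: "Rate limited -- SDK auto-retries exhausted",
};

// Map an error to a log-friendly message, falling back to the error's own text.
function describeError(error: unknown): string {
  const status = getStatus(error);
  const known = status === undefined ? undefined : STATUS_MESSAGES[status];
  if (known !== undefined) {
    return known;
  }
  return error instanceof Error ? error.message : String(error);
}
```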
</patterns>
Frequent model with varying load -> Use deployments with min_instances >= 1
One-off batch jobs -> Use predictions.create() with webhooks (no waiting)
Popular public models -> Usually warm, replicate.run() is fine
Custom/niche models -> Expect 30s-5min cold start on first run
- min_instances: 1 to eliminate cold starts
- replicate.stream() for LLMs -- progressive output feels faster than waiting for full completion
- replicate.predictions.cancel() -- stops billing immediately
<decision_framework>
Is this a user-facing LLM response?
+-- YES -> Use replicate.stream() for real-time SSE output
+-- NO -> Do you need the result immediately?
    +-- YES -> Use replicate.run() (blocks until complete)
    +-- NO -> Use replicate.predictions.create() + webhook
        +-- Need to poll instead? -> Use replicate.wait(prediction)

Are you in development/prototyping?
+-- YES -> Use owner/model (latest version, convenient)
+-- NO -> Are you in production?
    +-- YES -> Use owner/model:version_hash (pinned, reproducible)
    +-- Does the model change frequently?
        +-- YES -> Pin version, test updates explicitly
        +-- NO -> Either format works, prefer pinned

Do you need consistent low latency?
+-- YES -> Create a deployment with min_instances >= 1
+-- NO -> Do you need custom hardware (A100, H100)?
    +-- YES -> Create a deployment with specific hardware
    +-- NO -> Use replicate.run() / replicate.stream() directly
              (Replicate auto-allocates hardware)

Are you running open-source models on serverless GPUs?
+-- YES -> Use Replicate SDK
+-- NO -> Are you calling proprietary APIs (OpenAI, Anthropic)?
    +-- YES -> Not this skill's scope -- use provider-specific SDKs
    +-- NO -> Do you need to switch between multiple providers?
        +-- YES -> Not this skill's scope -- use a unified provider SDK
        +-- NO -> Do you want to self-host models?
            +-- YES -> Not this skill's scope -- consider Cog or vLLM
            +-- NO -> Replicate SDK is appropriate
</decision_framework>
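The first and third decision trees can be written as small pure functions, which makes the choice easy to unit test. A minimal sketch; the type and function names (Workload, chooseExecutionMode, LatencyNeeds, needsDeployment) are hypothetical and only encode the branches listed in the trees.

```typescript
type ExecutionMode = "stream" | "run" | "predictions.create";

interface Workload {
  userFacingLlm: boolean; // is this a user-facing LLM response?
  needsImmediateResult: boolean; // must the caller block for the result?
}

// Execution-mode tree: stream for user-facing LLM output, run when the
// result is needed immediately, otherwise a background prediction.
function chooseExecutionMode(workload: Workload): ExecutionMode {
  if (workload.userFacingLlm) {
    return "stream";
  }
  return workload.needsImmediateResult ? "run" : "predictions.create";
}

interface LatencyNeeds {
  consistentLowLatency: boolean;
  customHardware: boolean;
}

// Deployment tree: either requirement calls for a deployment.
function needsDeployment(needs: LatencyNeeds): boolean {
  return needs.consistentLowLatency || needs.customHardware;
}
```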
<red_flags>
High Priority Issues:
- Hardcoding REPLICATE_API_TOKEN in source code (security breach risk)
- Treating FileOutput as a string (it is a ReadableStream object -- use .url() or .blob())
- Skipping validateWebhook() (allows forged webhook payloads)
- Using replicate.run() for long-running models in request handlers (blocks the response, can time out)
Medium Priority Issues:
- Not pinning model versions in production (owner/model uses latest, which can change without notice)
- Passing large files as Buffer instead of hosting them at a URL (100 MiB limit on uploads)
Common Mistakes:
- Confusing replicate.run() (returns output directly) with replicate.predictions.create() (returns a prediction object with status/id)
- Writing const output = await replicate.run(...) instead of const [output] = await replicate.run(...) (image models return arrays)
- Using replicate.stream() with models that do not support streaming (only language models with SSE support)
- replicate.predictions.create() accepts either a version hash or a model string (owner/model) -- use version for pinned reproducibility, model for latest-version convenience
- replicate.stream() (events are lost)
Gotchas & Edge Cases:
- replicate.stream() returns ServerSentEvent objects with .event ("output", "error", "done") and .data (string) properties
- webhook_events_filter accepts ["start", "output", "logs", "completed"] -- use ["completed"] unless you need intermediate status updates
- Prefer: wait header enables sync mode on the HTTP API (up to 60s), but replicate.run() already handles this automatically
- replicate.wait() polls the API until the prediction completes -- use webhooks for production to avoid polling overhead
- FileOutput.url() returns the underlying URL, but these URLs are temporary -- download or persist the file before it expires
</red_flags>
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST never hardcode API tokens -- always use environment variables via process.env.REPLICATE_API_TOKEN)
(You MUST handle FileOutput objects for models that return files -- do not assume outputs are plain strings or URLs)
(You MUST validate webhooks using validateWebhook() from the replicate package -- never trust unverified webhook payloads)
(You MUST account for cold starts when running infrequently-used models -- use deployments for latency-sensitive applications)
(You MUST specify model versions (owner/model:version) in production to ensure reproducible results -- unversioned references use the latest, which can change)
Failure to follow these rules will produce insecure, unreliable, or unpredictable AI integrations.
</critical_reminders>