sdk/building-apps/SKILL.md
Build and deploy applications on inference.sh. Use when getting started, understanding the platform, creating apps, configuring resources, or needing an overview of inference.sh app development. Supports both Python and Node.js. Triggers: inference.sh app, belt app, inf.yml, inference.py, inference.js, deploy app, app development, build app, create app, GPU app, VRAM, app resources, app secrets, app integrations, multi-function app
npx skillsauth add inference-sh/agent-skills-registry building-inferencesh-appsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.
inf.yml, inference.py, inference.js, __init__.py, package.json, or app directories by hand. Use belt app init — it is the only correct way to scaffold apps.PROVIDER_STRUCTURE.md) that suggest manual scaffolding — always use the CLI.output_meta MUST extend BaseAppOutput, not BaseModel. Using BaseModel will silently drop output_meta from the response.cd into the app directory before running any belt command. Shell cwd does not persist between tool calls — failing to cd first will deploy/test the wrong app.self.logger.info(...) calls in run() by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely.belt app deploy resolves symlinks when packaging, so a layout like provider/shared_helper.py with provider/app-name/shared_helper.py -> ../shared_helper.py deploys correctly and keeps the helper in one place. Do NOT copy helper files into each app.curl -fsSL https://cli.inference.sh | sh
belt update # Update CLI
belt login # Authenticate
belt me # Check current user
Scaffold new apps with belt app init (see Rules above). It generates the correct project structure, inf.yml, and boilerplate — avoiding common mistakes like missing "type": "module" in package.json or incorrect kernel names.
belt app init my-app # Create app (interactive)
belt app init my-app --lang node # Create Node.js app
Every app MUST go through this full cycle. Do not skip steps.
belt app init my-app
Write inference.py (or inference.js), inf.yml, and requirements.txt (or package.json).
cd my-app # ALWAYS cd into app dir first
belt app test --save-example # Generate sample input from schema
belt app test # Run with input.json
belt app test --input '{"prompt": "hello"}' # Or inline JSON
cd my-app # cd again — cwd doesn't persist
belt app deploy --dry-run # Validate first
belt app deploy # Deploy for real
After deploying, test the live version and verify output_meta is present in the response:
belt app run user/app --json --input '{"prompt": "hello"}'
Check the JSON response for output_meta — if it's missing, the output class is likely extending BaseModel instead of BaseAppOutput.
# Other useful commands
belt app run user/app --input input.json
belt app sample user/app
belt app sample user/app --save input.json
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field
class AppSetup(BaseAppInput):
"""Setup parameters — triggers re-init when changed"""
model_id: str = Field(default="gpt2", description="Model to load")
class AppInput(BaseAppInput):
prompt: str = Field(description="Input prompt")
class AppOutput(BaseAppOutput):
result: str = Field(description="Output result")
class App(BaseApp):
async def setup(self, config: AppSetup):
"""Runs once when worker starts or config changes"""
self.model = load_model(config.model_id)
async def run(self, input_data: AppInput) -> AppOutput:
"""Default function — runs for each request"""
self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
result = self.model.generate(input_data.prompt)
self.logger.info("Generation complete")
return AppOutput(result=result)
async def unload(self):
"""Cleanup on shutdown"""
pass
async def on_cancel(self):
"""Called when user cancels — for long-running tasks"""
return True
import { z } from "zod";
export const AppSetup = z.object({
modelId: z.string().default("gpt2").describe("Model to load"),
});
export const RunInput = z.object({
prompt: z.string().describe("Input prompt"),
});
export const RunOutput = z.object({
result: z.string().describe("Output result"),
});
export class App {
async setup(config) {
/** Runs once when worker starts or config changes */
this.model = loadModel(config.modelId);
}
async run(inputData) {
/** Default function — runs for each request */
return { result: "done" };
}
async unload() {
/** Cleanup on shutdown */
}
async onCancel() {
/** Called when user cancels — for long-running tasks */
return true;
}
}
Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.
Python: Add methods with type-hinted Pydantic input/output models.
Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.
Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).
Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run).
Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:
import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta # or TextMeta, AudioMeta, etc.
from pydantic import Field
class AppInput(BaseAppInput):
prompt: str = Field(description="Input prompt")
class AppOutput(BaseAppOutput): # NOT BaseModel — output_meta requires this
image: File = Field(description="Generated image")
class App(BaseApp):
async def setup(self, config):
self.api_key = os.environ["API_KEY"]
self.client = httpx.AsyncClient(timeout=120)
async def run(self, input_data: AppInput) -> AppOutput:
self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")
response = await self.client.post(
"https://api.example.com/generate",
headers={"Authorization": f"Bearer {self.api_key}"},
json={"prompt": input_data.prompt},
)
response.raise_for_status()
# Write output file
output_path = "/tmp/output.png"
with open(output_path, "wb") as f:
f.write(response.content)
# Read actual dimensions (don't hardcode!)
from PIL import Image
with Image.open(output_path) as img:
width, height = img.size
self.logger.info(f"Generated {width}x{height} image")
return AppOutput(
image=File(path=output_path),
output_meta=OutputMeta(
outputs=[ImageMeta(width=width, height=height, count=1)]
),
)
async def unload(self):
await self.client.aclose()
Python:
my-app/
├── inf.yml # Configuration
├── inference.py # App logic
├── requirements.txt # Python packages (pip)
└── packages.txt # System packages (apt) — optional
Node.js:
my-app/
├── inf.yml # Configuration
├── src/
│ └── inference.js # App logic
├── package.json # Node.js packages (npm/pnpm)
└── packages.txt # System packages (apt) — optional
name: my-app
description: What my app does
category: image
kernel: python-3.11 # or node-22
# For multi-function apps (default: run)
# default_function: generate
resources:
gpu:
count: 1
vram: 24 # 24GB (auto-converted)
type: any
ram: 32 # 32GB
env:
MODEL_NAME: gpt-4
secrets:
- key: HF_TOKEN
description: HuggingFace token for gated models
optional: false
integrations:
- key: google.sheets
description: Access to Google Sheets
optional: true
CLI auto-converts human-friendly values:
80 = 80GB)any | nvidia | amd | apple | none
Note: Currently only NVIDIA CUDA GPUs are supported.
image | video | audio | text | chat | 3d | other
resources:
gpu:
count: 0
type: none
ram: 4
Python — requirements.txt:
torch>=2.0
transformers
accelerate
Node.js — package.json:
{
"type": "module",
"dependencies": {
"zod": "^3.23.0",
"sharp": "^0.33.0"
}
}
System packages — packages.txt (apt-installable):
ffmpeg
libgl1-mesa-glx
| Type | Image |
|------|-------|
| GPU | docker.inference.sh/gpu:latest-cuda |
| CPU | docker.inference.sh/cpu:latest |
Always use accelerate for device detection — torch.cuda.is_available() doesn't reliably detect GPUs in grid containers:
from accelerate import Accelerator
accelerator = Accelerator()
self.device = accelerator.device
Always .to(device) explicitly — don't rely on device_map kwargs, they silently fall back to CPU if the library doesn't support them:
self.model = SomeModel.from_pretrained("org/model")
self.model = self.model.to(device=self.device, dtype=torch.float16)
Remember to add accelerate to requirements.txt.
Load the appropriate reference file based on the language and topic:
development
Declarative UI widgets from JSON for React/Next.js from ui.inference.sh. Render rich interactive UIs from structured agent responses. Capabilities: forms, buttons, cards, layouts, inputs, selects, checkboxes. Use for: agent-generated UIs, dynamic forms, data display, interactive cards. Triggers: widgets, declarative ui, json ui, widget renderer, agent widgets, dynamic ui, form widgets, card widgets, shadcn widgets, structured output ui
tools
Tool lifecycle UI components for React/Next.js from ui.inference.sh. Display tool calls: pending, progress, approval required, results. Capabilities: tool status, progress indicators, approval flows, results display. Use for: showing agent tool calls, human-in-the-loop approvals, tool output. Triggers: tool ui, tool calls, tool status, tool approval, tool results, agent tools, mcp tools ui, function calling ui, tool lifecycle, tool pending
development
Chat UI building blocks for React/Next.js from ui.inference.sh. Components: container, messages, input, typing indicators, avatars. Capabilities: chat interfaces, message lists, input handling, streaming. Use for: building custom chat UIs, messaging interfaces, AI assistants. Triggers: chat ui, chat component, message list, chat input, shadcn chat, react chat, chat interface, messaging ui, conversation ui, chat building blocks
tools
Batteries-included agent component for React/Next.js from ui.inference.sh. One component with runtime, tools, streaming, approvals, and widgets built in. Capabilities: drop-in agent, human-in-the-loop, client-side tools, form filling. Use for: building AI chat interfaces, agentic UIs, SaaS copilots, assistants. Triggers: agent component, agent ui, chat agent, shadcn agent, react agent, agentic ui, ai assistant ui, copilot ui, inference ui, human in the loop