hsliuustc0106/vllm-omni-api

Name: vllm-omni-api
Author: hsliuustc0106

skills/vllm-omni-api/SKILL.md

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-api

vLLM-Omni API Integration

Overview

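vLLM-Omni exposes OpenAI-compatible REST endpoints for all modalities. Existing OpenAI client libraries work with minimal changes: in the examples below, only the base URL and a placeholder API key are set. The server supports chat completions, image generation, image editing, speech synthesis, custom voice upload, and video generation.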

Starting the Server

vllm serve <model-name> --omni --port 8091

Diffusion models benefit from multi-thread weight loading (enabled by default), which parallelizes safetensors shard loading for faster startup. See vllm-omni-perf for details.

Core Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/chat/completions | POST | Chat-based generation (text, image, audio) |
| /v1/images/generations | POST | Direct image generation |
| /v1/images/edits | POST | Image editing |
| /v1/audio/speech | POST | Text-to-speech (wav/mp3) |
| /v1/audio/voice/upload | POST | Upload custom voice for cloning |
| /v1/videos/generations | POST | Video generation (async poll) |
| /health | GET | Server health check |
| /v1/models | GET | List loaded models |

The /v1/audio/voice/upload endpoint has been restored. /v1/audio/speech supports response_format: "wav" with streaming.

/v1/images/generations supports client-side request cancellation via AbortController (or client.cancel() in the openai Python SDK). --max-generated-image-size is enforced on both /v1/images/generations and /v1/images/edits (returns HTTP 400 for oversized requests).

Chat Completions (Universal)

The chat completions endpoint handles all modalities through the message format:

Python (openai SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="unused")

response = client.chat.completions.create(
    model="Tongyi-MAI/Z-Image-Turbo",
    messages=[{"role": "user", "content": "a sunset over mountains"}],
    extra_body={
        "height": 1024,
        "width": 1024,
        "num_inference_steps": 50,
        "guidance_scale": 4.0,
        "seed": 42,
    },
)

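# The openai SDK leaves list-style message content unparsed, so each part is a plain dict.
# The value is a data URL ("data:image/...;base64,<payload>"), not bare base64.
image_data_url = response.choices[0].message.content[0]["image_url"]["url"]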
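
The curl variant of this call strips everything before the first comma and base64-decodes the rest, which suggests the returned URL is a base64 data URL. A minimal Python sketch of the same step follows; `decode_data_url` and `save_image` are hypothetical helper names, not part of vLLM-Omni or the openai SDK.

```python
import base64
from pathlib import Path


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected data:<mime>;base64,<payload>")
    # Strip the "data:" prefix and the ";base64" suffix to get the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def save_image(data_url: str, path: str) -> int:
    """Write the decoded image to `path` and return the number of bytes written."""
    _, raw = decode_data_url(data_url)
    return Path(path).write_bytes(raw)
```

With the response from the example above, `save_image(url, "sunset.png")` would mirror the `cut | base64 -d` pipeline, where `url` is the string read from the first content part.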

curl

curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "a sunset over mountains"}],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 50,
      "guidance_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' \
     | cut -d',' -f2 | base64 -d > sunset.png

Image Generation Endpoint

The endpoint accepts output_format (png, jpeg, or webp) and size in the request and echoes both in the response:

curl -s http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cup of coffee on a table",
    "size": "1024x1024",
    "n": 1,
    "output_format": "png"
  }' | jq '.data[0]'

The response includes output_format and size fields. When output_format is not specified, it defaults to png.
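
The same request can be issued from Python with only the standard library. The sketch below validates the fields described above before sending; `build_image_request` and `generate_image` are hypothetical helpers, and the 300-second timeout is an arbitrary assumption rather than a documented value.

```python
import json
import urllib.request

ALLOWED_FORMATS = {"png", "jpeg", "webp"}  # formats listed in the text above


def build_image_request(prompt: str, size: str = "1024x1024", n: int = 1,
                        output_format: str = "png") -> dict:
    """Build the JSON body for POST /v1/images/generations."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    width, _, height = size.partition("x")
    if not (width.isdigit() and height.isdigit()):
        raise ValueError(f"size must look like '1024x1024', got {size!r}")
    return {"prompt": prompt, "size": size, "n": n, "output_format": output_format}


def generate_image(payload: dict, base_url: str = "http://localhost:8091") -> dict:
    """POST the payload and return the first entry of the response's data list."""
    request = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumed timeout: image generation can take a while on a busy server.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["data"][0]
```

Rejecting a bad format or size locally avoids a round trip that would presumably end in an HTTP 400.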

Streaming Responses

For models supporting streaming (text/audio outputs):

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{"role": "user", "content": "Tell me about AI"}],
    stream=True,  # yield chunks as they are generated
)
for chunk in response:
    # Skip chunks that carry no choices or no text delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
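
When the full text is needed after streaming finishes, the deltas can be accumulated instead of printed. The following is a sketch only: `collect_text` is a hypothetical helper, and `fake_chunk` is a stand-in that imitates the chunk shape used in the loop above so the example runs without a server.

```python
from types import SimpleNamespace


def collect_text(chunks) -> str:
    """Concatenate delta.content from streamed chunks, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
print(collect_text(demo))  # prints "Hello, world"
```

Passing the streaming response object in place of `demo` should work the same way, since the helper only reads `choices[0].delta.content`.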

Multi-modal Input

Send images/audio as input to omni-modality models:

import base64

with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
            {"type": "text", "text": "Describe this image"},
        ],
    }],
)
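
For requests with several images, the data-URL construction can be factored into a helper. This sketch infers the MIME type from the file extension; `image_part` and `user_message` are hypothetical names, and only the image part format shown above is assumed (audio parts are not covered because the source does not show their shape).

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Return an image_url content part holding the file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def user_message(text: str, *image_paths: str) -> dict:
    """Build one user message with the images first, then the text prompt."""
    content = [image_part(p) for p in image_paths]
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}
```

`messages=[user_message("Describe this image", "photo.jpg")]` would then reproduce the request above.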

Error Handling

| Status Code | Meaning | Action |
|-------------|---------|--------|
| 200 | Success | Process response |
| 400 | Bad request | Check request body format |
| 404 | Model not found | Verify model name and server config |
| 413 | Input too large | Reduce input size or increase limits |
| 500 | Server error | Check server logs |
| 503 | Server overloaded | Retry with backoff |
| 507 | Insufficient storage (OOM) | Reduce resolution/batch or use quantization |
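
The "retry with backoff" action for 503 can be sketched as a small wrapper. Everything here is illustrative: `call_with_backoff` is a hypothetical helper, `send` is any zero-argument callable returning `(status_code, body)`, and the attempt count and delays are assumed defaults rather than values recommended by vLLM-Omni.

```python
import time

RETRYABLE = {503}  # the table above suggests retrying only when the server is overloaded


def call_with_backoff(send, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** attempt)
```

Other status codes are returned immediately, since the table calls for a change to the request or the server rather than a retry.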

Health Check

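import requests

# Use a timeout so the check fails fast when the server is unreachable.
resp = requests.get("http://localhost:8091/health", timeout=5)
assert resp.status_code == 200, f"unexpected status: {resp.status_code}"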
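
Because model loading can take some time after `vllm serve` starts, a client may want to poll the health endpoint before sending requests. The sketch below uses only the standard library; `is_healthy` and `wait_until_healthy` are hypothetical helpers, and the deadline and interval are assumed values.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8091", timeout: float = 2.0) -> bool:
    """Return True when GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_healthy(probe=is_healthy, deadline_s: float = 600.0, interval_s: float = 5.0,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll probe() until it succeeds or deadline_s elapses."""
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

The `probe`, `sleep`, and `clock` parameters exist so the loop can be exercised in tests without a running server.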

References

For full endpoint specifications and parameters, see references/endpoints.md

Integrate with vLLM-Omni using the OpenAI-compatible API for text, image, video, and audio generation. Use when building client applications, calling vllm-omni endpoints, sending requests to the API server, or integrating vllm-omni into an application.

59 stars

tools

Updated May 3, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-api

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 3, 2026, 5:46 AM (56.2s, 1 file scanned)

Related Skills

hsliuustc0106/vllm-omni-pre-check

development · Verified · Trusted · Community

Use before submitting a PR to vllm-project/vllm-omni: self-check the branch against project conventions, catch dead code, verify accuracy/performance claims, and confirm merge readiness. Use when the user says "pre-check", "self review", "pre-submit check", or "check my PR before I open it."

69 · SKILL.md · Updated May 29, 2026

hsliuustc0106/skills/vllm-omni-test-report

development · Verified · Trusted · Community

Two report kinds; **default output is always HTML** unless the user explicitly asks for Markdown (.md). **Release**: `scripts/compose_full_report.py` (**测试结论** (test conclusions), Buildkite metrics, **Test Result** = Common stack + optional `--log-dir-h*` nightly-style summaries + H100/CI block, **Issue tracking** = GitHub `ci-failure` + *local test* in:title, Open bugs); use `--format markdown` only when the user wants .md or `patch_report_*.py`. **Nightly**: `script

69 · SKILL.md · Updated May 3, 2026

hsliuustc0106/vllm-omni-review

testing · Verified · Trusted · Community

Review PRs on vllm-project/vllm-omni by routing to the right domain skills, checking critical evidence, and focusing comments on blocking issues. Use when reviewing pull requests or local branches, triaging review depth, running detailed or default review, or checking tests, benchmarks, and breaking changes in vllm-omni.

69 · SKILL.md · Updated May 3, 2026

hsliuustc0106/vllm-omni-video-gen

data-ai · Verified · Trusted · Community

Generate videos with vLLM-Omni using Wan2.2 and other video generation models. Use when generating videos from text, creating videos from images, configuring video generation parameters, or working with text-to-video or image-to-video models.

67 · SKILL.md · Updated May 3, 2026

Install manually

# Clone the repo
git clone https://github.com/hsliuustc0106/vllm-omni-skills.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r vllm-omni-skills/skills/vllm-omni-api ~/.claude/skills/

Repository

hsliuustc0106/vllm-omni-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT