skills/vllm-omni-cicd/SKILL.md
Set up CI/CD pipelines for vLLM-Omni model deployments including Docker builds, automated testing, rolling updates, and deployment validation. Use when creating deployment pipelines, automating model serving updates, setting up Docker workflows, or configuring GitHub Actions for vllm-omni.
npx skillsauth add hsliuustc0106/vllm-omni-skills vllm-omni-cicd
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill covers CI/CD patterns for deploying and updating vLLM-Omni model serving infrastructure. It includes Docker image builds, automated testing, deployment validation, and rollback strategies.
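# VLLM_OMNI_VERSION must be declared before FROM to be usable in the image tag
ARG VLLM_OMNI_VERSION
FROM vllm/vllm-omni:${VLLM_OMNI_VERSION}
ARG MODEL_NAME
ENV MODEL_NAME=${MODEL_NAME}
# Mark the container unhealthy if /health stops responding (requires curl in the image)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -sf http://localhost:8091/health || exit 1
EXPOSE 8091
# sh -c is needed so ${MODEL_NAME} is expanded when the container starts
CMD ["sh", "-c", "vllm serve ${MODEL_NAME} --omni --port 8091 --host 0.0.0.0"]
Build and push (VLLM_OMNI_VERSION is read from the calling shell's environment):
docker build --build-arg VLLM_OMNI_VERSION="$VLLM_OMNI_VERSION" \
    --build-arg MODEL_NAME=Tongyi-MAI/Z-Image-Turbo \
    -t my-registry/vllm-omni-z-image:latest .
docker push my-registry/vllm-omni-z-image:latest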
For faster container startup, bake model weights into the image:
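# VLLM_OMNI_VERSION must be declared before FROM to be usable in the image tag
ARG VLLM_OMNI_VERSION
FROM vllm/vllm-omni:${VLLM_OMNI_VERSION}
# Download the weights at build time so the container does not fetch them on startup
RUN python -c "from huggingface_hub import snapshot_download; \
    snapshot_download('Tongyi-MAI/Z-Image-Turbo', local_dir='/models/z-image')"
ENV MODEL_PATH=/models/z-image
CMD ["sh", "-c", "vllm serve ${MODEL_PATH} --omni --port 8091 --host 0.0.0.0"]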
name: vLLM-Omni CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pre-commit
      - run: pre-commit run --all-files
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      # GPU tests are excluded here; they run in the gpu-test job
      - run: pytest tests/ -v --ignore=tests/gpu
  docker:
    needs: [lint, test]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    # GITHUB_TOKEN needs packages: write to push images to ghcr.io
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          # Image names must be lowercase; adjust if the repository name contains uppercase characters
          tags: ghcr.io/${{ github.repository }}/vllm-omni:${{ github.sha }}
  gpu-test:
    runs-on: [self-hosted, gpu]
    needs: [lint]
    env:
      # Assumption: the image tag is supplied as a repository variable of this name
      VLLM_OMNI_VERSION: ${{ vars.VLLM_OMNI_VERSION }}
    steps:
      - uses: actions/checkout@v4
      - run: |
          docker run --gpus all --rm \
            -v "$(pwd):/workspace" \
            "vllm/vllm-omni:${VLLM_OMNI_VERSION}" \
            pytest /workspace/tests/gpu/ -v
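apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-omni
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels;
  # the label app: vllm-omni is an assumed name
  selector:
    matchLabels:
      app: vllm-omni
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: vllm-omni
    spec:
      containers:
        - name: vllm-omni
          image: my-registry/vllm-omni:latest
          # Keep the pod out of the Service until the model has loaded
          readinessProbe:
            httpGet:
              path: /health
              port: 8091
            initialDelaySeconds: 120
            periodSeconds: 10
          resources:
            limits:
              nvidia.com/gpu: 1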
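The rolling-update settings above bound how many pods can exist while a rollout is in progress. A minimal sketch of that arithmetic in Python, assuming absolute (non-percentage) values for maxSurge and maxUnavailable as in the manifest; the function name `rollout_bounds` is hypothetical and used only for illustration:

```python
def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple[int, int]:
    """Return (min_available, max_total) pod counts during a RollingUpdate.

    Assumes absolute values; percentage values are not handled in this sketch.
    """
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    min_available = replicas - max_unavailable  # pods that must stay available
    max_total = replicas + max_surge            # old + new pods allowed at once
    return min_available, max_total


min_available, max_total = rollout_bounds(replicas=3, max_surge=1, max_unavailable=0)
print(f"at least {min_available} available pods, at most {max_total} pods in total")
```

With one GPU per pod, this configuration needs capacity for a fourth GPU during a rollout. Without it, the surge pod stays Pending and the rollout stalls, because no old pod may be removed before a new one is ready.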
# Deploy green
kubectl apply -f deployment-green.yaml
# Validate green
./scripts/validate_deployment.sh http://green-service:8091
# Switch traffic
kubectl patch service vllm-omni -p '{"spec":{"selector":{"version":"green"}}}'
# Teardown blue (after validation period)
kubectl delete deployment vllm-omni-blue
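The ordering in the steps above matters: traffic should move to green only after validation passes. A minimal Python sketch of that guard, assuming `kubectl` is on the PATH and that the caller supplies the validation check as a function. The names `build_switch_command` and `switch_if_valid` are hypothetical:

```python
import json
import subprocess
from typing import Callable


def build_switch_command(service: str, version: str) -> list[str]:
    """Build the kubectl command that points the Service selector at `version`."""
    patch = json.dumps({"spec": {"selector": {"version": version}}})
    return ["kubectl", "patch", "service", service, "-p", patch]


def switch_if_valid(
    validate: Callable[[], bool],
    service: str = "vllm-omni",
    version: str = "green",
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Switch traffic only if validate() returns True. Returns True if switched."""
    if not validate():
        return False  # leave the Service pointing at the current deployment
    run(build_switch_command(service, version), check=True)
    return True
```

The `run` parameter is injectable so the logic can be exercised without a cluster. Leaving the blue deployment in place until the validation period ends keeps the switch reversible by patching the selector back.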
After every deployment, validate:
- /health returns 200
- /v1/models returns the expected model
Use the validation script:
./scripts/validate_deployment.sh http://localhost:8091
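The two checks can also be sketched in Python. This is not the contents of `scripts/validate_deployment.sh`, which is not shown here. It is an illustrative equivalent that assumes `/v1/models` returns an OpenAI-style JSON body of the form `{"data": [{"id": ...}]}`. The names `http_get` and `validate_deployment` are hypothetical:

```python
import json
import urllib.error
import urllib.request
from typing import Callable

Fetch = Callable[[str], tuple[int, bytes]]


def http_get(url: str, timeout: float = 5.0) -> tuple[int, bytes]:
    """Return (status, body); status 0 means the server could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code, err.read()
    except (urllib.error.URLError, OSError):  # connection refused, timeout, DNS
        return 0, b""


def validate_deployment(base_url: str, expected_model: str, fetch: Fetch = http_get) -> list[str]:
    """Run both checks and return a list of failure messages (empty means OK)."""
    base_url = base_url.rstrip("/")
    failures: list[str] = []

    status, _ = fetch(f"{base_url}/health")
    if status != 200:
        failures.append(f"/health returned {status}, expected 200")

    status, body = fetch(f"{base_url}/v1/models")
    if status != 200:
        failures.append(f"/v1/models returned {status}, expected 200")
    else:
        try:
            served = [m.get("id") for m in json.loads(body).get("data", [])]
        except (ValueError, AttributeError):  # not JSON, or not the assumed shape
            served = []
        if expected_model not in served:
            failures.append(f"{expected_model!r} not served; found {served}")
    return failures


# Example: validate_deployment("http://localhost:8091", "Tongyi-MAI/Z-Image-Turbo")
```

Returning a list of failures rather than a boolean lets a pipeline log every failed check in one pass.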
kubectl rollout undo deployment/vllm-omni
docker compose pull # pulls the tag referenced in the compose file; pin it to the previous known-good tag first
docker compose up -d
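For the Kubernetes case, the rollback can be tied to a health deadline: if the new version does not become healthy in time, run `kubectl rollout undo`. A minimal Python sketch, assuming `kubectl` is on the PATH and that the caller supplies the health check. The function names and the default timeout values are hypothetical:

```python
import subprocess
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def rollback_unless_healthy(
    check: Callable[[], bool],
    deployment: str = "vllm-omni",
    run: Callable[..., object] = subprocess.run,
    **wait_kwargs,
) -> bool:
    """Return True if a rollback was triggered, False if the deployment is healthy."""
    if wait_until_healthy(check, **wait_kwargs):
        return False
    run(["kubectl", "rollout", "undo", f"deployment/{deployment}"], check=True)
    return True
```

The timeout should exceed the model's load time; the readiness probe in the deployment manifest already waits 120 seconds before its first check.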