Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

Dirty13itch/deploy-model

Name: deploy-model
Author: Dirty13itch

.claude/skills/deploy-model/SKILL.md

npx skillsauth add Dirty13itch/kaizen deploy-model

Deploy Model

Deploy or manage inference models on Kaizen. Argument should be one of: reasoning, embedding, heavy, or a manifest path.

Pre-flight Checks

Verify target node is Ready: kubectl get nodes
Check GPU availability: kubectl describe node <node> | grep -A3 nvidia.com/gpu
Check existing deployments: kubectl get pods -n inference
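
The first check can be scripted rather than read by eye. The following is a minimal sketch, assuming the default `kubectl get nodes --no-headers` column layout (NAME, STATUS, ROLES, AGE, VERSION). `node_is_ready` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: succeeds only if the named node has STATUS exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# A cordoned node ("Ready,SchedulingDisabled") is deliberately treated as not ready.
node_is_ready() {
  awk -v node="$1" '
    $1 == node && $2 == "Ready" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage, with `<node>` as the target node name: `kubectl get nodes --no-headers | node_is_ready <node> || echo "node not Ready"`.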

Available Manifests

k8s/apps/inference/ — All inference deployments
k8s/apps/inference/sglang-core.yaml — Heavy model (Qwen2.5-72B, TP=4, CORE)
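
One way to turn the skill argument into a manifest path is sketched below. Only the `heavy` mapping is stated on this page; the manifests for `reasoning` and `embedding` are not named, so the sketch reports them as unresolved instead of guessing. `resolve_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: map the skill argument to a manifest path.
resolve_manifest() {
  case "$1" in
    heavy)
      # The only alias-to-file mapping given in the skill text.
      echo "k8s/apps/inference/sglang-core.yaml" ;;
    reasoning|embedding)
      # The skill does not name these files; do not invent a path.
      echo "no manifest listed for '$1'; look under k8s/apps/inference/" >&2
      return 1 ;;
    "")
      echo "usage: resolve_manifest <reasoning|embedding|heavy|manifest-path>" >&2
      return 2 ;;
    *)
      # Anything else is treated as an explicit manifest path.
      echo "$1" ;;
  esac
}
```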

GPU Constraints

INTERFACE: RTX 5090 (32GB) + RTX 4090 (24GB) — NO tensor parallelism
CORE: 4x RTX 5070 Ti (16GB each) — TP=4 supported
Deployment order determines GPU assignment on INTERFACE
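
These constraints can be expressed as a guard that runs before a manifest is applied. This is a sketch under assumptions: the skill only states that TP=4 is supported on CORE and that INTERFACE has no tensor parallelism. Accepting TP=1 and TP=2 on CORE is an assumption made here, and `tp_allowed` is a hypothetical helper name.

```shell
# Hypothetical guard: is tensor-parallel degree $2 acceptable on node group $1?
tp_allowed() {
  group="$1"; tp="$2"
  case "$group" in
    INTERFACE)
      # Mixed RTX 5090 + RTX 4090: no tensor parallelism, so only TP=1.
      [ "$tp" -eq 1 ] ;;
    CORE)
      # 4x RTX 5070 Ti: TP=4 is stated; 1 and 2 are assumed to be fine as well.
      [ "$tp" -eq 1 ] || [ "$tp" -eq 2 ] || [ "$tp" -eq 4 ] ;;
    *)
      # Unknown node group.
      return 2 ;;
  esac
}
```

For example, `tp_allowed CORE 4` succeeds, while `tp_allowed INTERFACE 2` fails.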

Deploy Steps

Read the manifest file
Validate: kubectl apply --dry-run=client -f <manifest>
Show the user what will be deployed (model, GPU, node, port)
Ask for confirmation before applying
Apply: kubectl apply -f <manifest>
Watch: kubectl rollout status -n inference deploy/<name> --timeout=300s
Test: curl http://10.10.10.10:<port>/v1/models
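
The steps above can be sketched as a single shell function. This is an illustration, not the skill's own implementation: `deploy_model` and its three arguments (manifest path, deployment name, port) are hypothetical, the manifest summary is reduced to one line, and the `-fsS` flags on `curl` are an addition so that an HTTP error fails the step.

```shell
# Sketch of the deploy flow. Arguments: manifest path, deployment name, service port.
deploy_model() {
  manifest="$1"; name="$2"; port="$3"

  # Validate client-side first; stop if the manifest is rejected.
  kubectl apply --dry-run=client -f "$manifest" || return 1

  # Summarize and require an explicit "y" before touching the cluster.
  printf 'About to apply %s (deploy/%s, port %s). Continue? [y/N] ' \
    "$manifest" "$name" "$port"
  read -r answer
  [ "$answer" = "y" ] || { echo "Aborted."; return 1; }

  # Apply, wait up to 300s for the rollout, then query the models endpoint.
  kubectl apply -f "$manifest" || return 1
  kubectl rollout status -n inference "deploy/$name" --timeout=300s || return 1
  curl -fsS "http://10.10.10.10:$port/v1/models"
}
```

Because every step returns early on failure, the endpoint test only runs after a rollout that completed within the timeout.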

Restart

kubectl rollout restart -n inference deploy/<name>
kubectl rollout status -n inference deploy/<name>

Dirty13itch/deploy-model

.claude/skills/deploy-model/SKILL.md

Deploy or redeploy an inference model on the cluster. Use when asked to deploy, update, or restart a model.

devops

Updated Apr 16, 2026

npx skillsauth add Dirty13itch/kaizen deploy-model

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (Container and dependency vulnerability scanner): Clean, 95%
Semgrep (Static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (Open source security scanning): Skipped, 50%
Socket.dev (Supply chain security analysis): Skipped, 50%
VirusTotal (Multi-engine malware detection): Skipped, 50%
CrowdStrike (Advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 16, 2026, 2:38 PM · 6.3s · 1 file scanned

SKILL.md

name: deploy-model
description: Deploy or redeploy an inference model on the cluster. Use when asked to deploy, update, or restart a model.
disable-model-invocation: true
allowed-tools: Bash, Read, Edit

Related Skills

Dirty13itch/validate
testing
Verified · Trusted · Community
Pre-commit validation suite for manifests, scripts, and configs
SKILL.md · Updated Apr 16, 2026

Dirty13itch/test
testing
Verified · Trusted · Community
Run the full integration test suite
SKILL.md · Updated Apr 16, 2026

Dirty13itch/status
testing
Verified · Trusted · Community
Check Kaizen system health across all nodes and services
SKILL.md · Updated Apr 16, 2026

Dirty13itch/session-resume
testing
Verified · Trusted · Community
Resume from last checkpoint: show progress, cluster state, next actions
SKILL.md · Updated Apr 16, 2026

Install manually

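# Clone the repo
git clone https://github.com/Dirty13itch/kaizen.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it.
# Without the mkdir, cp would create ~/.claude/skills as a copy of deploy-model itself.
mkdir -p ~/.claude/skills
cp -r kaizen/.claude/skills/deploy-model ~/.claude/skills/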

Claude Code Skills — official skills path docs.

Repository

Dirty13itch/kaizen

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT