skills/aimlops/llm-ops/SKILL.md
Manages deployment, scaling, and monitoring of large language models in AI/ML operations environments.
npx skillsauth add alphaonedev/openclaw-graph llm-ops
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill automates the deployment, scaling, and monitoring of large language models (LLMs) in AI/ML operations, handling infrastructure for models like GPT or BERT variants to ensure efficient runtime management.
Use this skill when deploying LLMs in production environments, such as scaling a chatbot backend during peak traffic, monitoring model performance in real-time, or updating models in Kubernetes-based ML ops setups. Apply it in scenarios involving resource-constrained environments or when integrating LLMs with CI/CD pipelines for automated deployments.
To deploy an LLM, first set the environment variable for authentication: export OPENCLAW_API_KEY=your_api_key. Then, use the CLI to initiate deployment with specific flags. For scaling, monitor metrics and trigger adjustments programmatically. Always specify the model ID and target environment in commands to avoid conflicts. For API-based usage, include the API key in headers and handle responses for asynchronous operations.
Use the OpenClaw CLI for quick operations; prefix commands with openclaw llm. For API calls, target the base endpoint https://api.openclaw.ai/llm and include the header Authorization: Bearer $OPENCLAW_API_KEY.
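As a minimal sketch of the API convention described above, the following builds (but does not send) an authenticated POST request using only the Python standard library. The helper name build_request is hypothetical, and the JSON content type is an assumption; the base endpoint, the /deploy path, and the Authorization header come from the text.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.openclaw.ai/llm"  # base endpoint quoted in the text


def build_request(path, payload, api_key=None):
    """Build, but do not send, an authenticated POST request (hypothetical helper)."""
    # Fall back to the environment variable named in the text.
    key = api_key or os.environ.get("OPENCLAW_API_KEY")
    if not key:
        raise RuntimeError("OPENCLAW_API_KEY is not set")
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            # Assumption: the API accepts JSON request bodies.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key for illustration only; nothing is sent over the network.
req = build_request("deploy", {"model_id": "my-llm-123", "replicas": 3}, api_key="example-key")
print(req.full_url)
print(req.get_header("Authorization"))
```

Keeping request construction separate from sending makes it easy to check the URL and headers before any call reaches the service.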
Deploy Command: openclaw llm deploy --model-id my-llm-123 --env production --replicas 3 --config-path ./config.json
Example config.json: {"image": "my-llm-image:v1", "resources": {"cpu": "2", "memory": "4Gi"}}
API equivalent in Python:
import os
import requests
response = requests.post('https://api.openclaw.ai/llm/deploy', json={'model_id': 'my-llm-123', 'replicas': 3}, headers={'Authorization': f'Bearer {os.environ["OPENCLAW_API_KEY"]}'})
print(response.json())
Scale Command: openclaw llm scale --model-id my-llm-123 --scale-to 5 --metric cpu_utilization
Corresponding JSON payload: {"model_id": "my-llm-123", "scale_to": 5}
Monitor Command: openclaw llm monitor --model-id my-llm-123 --duration 60 --output json
Add --alert-threshold 0.9 for CPU alerts.
Rollback Command: openclaw llm rollback --model-id my-llm-123 --version v1.0
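One way to trigger scaling adjustments programmatically is to turn an observed CPU utilization into a replica count and then into a scale command. The sketch below assumes a simple proportional rule with a 0.6 target utilization and a cap of 10 replicas; both numbers and both function names are illustrative assumptions, not OpenClaw behavior. Only the CLI flags are taken from the text.

```python
import math
import shlex


def target_replicas(current, cpu_utilization, target_utilization=0.6, max_replicas=10):
    """Proportional rule (an assumption): size the fleet so average CPU lands near the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    desired = math.ceil(current * cpu_utilization / target_utilization)
    # Clamp to a sane range so a metric spike cannot request unbounded replicas.
    return max(1, min(max_replicas, desired))


def scale_command(model_id, replicas, metric="cpu_utilization"):
    """Build the argv list for the scale command using the flags shown in the text."""
    return [
        "openclaw", "llm", "scale",
        "--model-id", model_id,
        "--scale-to", str(replicas),
        "--metric", metric,
    ]


replicas = target_replicas(current=3, cpu_utilization=0.9)
print(shlex.join(scale_command("my-llm-123", replicas)))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting issues.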
Config files use JSON, for example:
{
  "model_id": "my-llm-123",
  "deployment": {
    "type": "kubernetes",
    "namespace": "aiml"
  }
}
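A config of this shape can also be generated from a script rather than written by hand. The following is a minimal sketch; build_config and write_config are hypothetical helpers, and the only fields used are the ones shown in the example above.

```python
import json
import tempfile
from pathlib import Path


def build_config(model_id, namespace, deployment_type="kubernetes"):
    """Return a config dict mirroring the JSON example in the text."""
    if not model_id:
        raise ValueError("model_id must be non-empty")
    return {
        "model_id": model_id,
        "deployment": {"type": deployment_type, "namespace": namespace},
    }


def write_config(config, path):
    """Serialize the config as indented JSON and return the path written."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


# Write to a temporary directory and read it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(build_config("my-llm-123", "aiml"), Path(tmp) / "config.json")
    loaded = json.loads(config_path.read_text(encoding="utf-8"))

print(json.dumps(loaded, indent=2))
```

In real use the file would be written next to the project and passed to the deploy command via --config-path.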
Integrate this skill with existing ML ops tools by exporting metrics to Prometheus or using webhooks for CI/CD. For Kubernetes, apply manifests generated by openclaw llm generate-k8s --model-id my-llm-123. When combining with other OpenClaw skills, chain commands like openclaw llm deploy && openclaw monitoring setup. Use environment variables for secrets, e.g., set $OPENCLAW_API_KEY in your .env file and load it via dotenv in Python scripts. Ensure network accessibility to API endpoints; configure firewalls to allow traffic to api.openclaw.ai.
Check command exit codes; for example, if openclaw llm deploy fails with code 1, parse the error message for details like "Model not found". In API responses, handle HTTP status codes: 401 for authentication issues (retry with export OPENCLAW_API_KEY=new_key), 404 for missing models, or 500 for server errors (wait and retry with exponential backoff). Include try-except blocks in code snippets:
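import sys
import requests

try:
    # The remaining arguments are elided in the original snippet.
    response = requests.post('https://api.openclaw.ai/llm/deploy', ...)
    response.raise_for_status()  # raises HTTPError on 4xx/5xx responses
except requests.exceptions.HTTPError as e:
    print(f"Error: {e.response.status_code} - {e.response.text}")
    sys.exit(1)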
Log errors to files using --log-file errors.log in CLI commands and monitor for common issues like resource limits.
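The "wait and retry with exponential backoff" advice for server errors can be sketched as follows. TransientError is a stand-in for a retryable failure such as an HTTP 500, and the attempt count, delays, and jitter are illustrative assumptions rather than values recommended by OpenClaw. The demo uses a simulated operation and records delays instead of sleeping.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 500 response."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Simulated deploy call that fails twice before succeeding.
calls = {"count": 0}


def flaky_deploy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated 500")
    return {"status": "deployed"}


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_deploy, sleep=delays.append)
print(result, [round(d, 1) for d in delays])
```

Only errors that are plausibly temporary should be retried this way; a 401 or 404 will not fix itself and is better reported immediately.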
Set the API key: export OPENCLAW_API_KEY=abc123. Deploy a model with: openclaw llm deploy --model-id gpt-finetuned --env staging --replicas 2. Then scale it based on load: openclaw llm scale --model-id gpt-finetuned --scale-to 10 --metric request_rate.
Monitor the deployment: openclaw llm monitor --model-id gpt-finetuned --duration 300. If issues arise, roll back: openclaw llm rollback --model-id gpt-finetuned --version v2.1.
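A workflow like this can be scripted so that each step runs only if the previous one exited with code 0. The sketch below is one possible shape, with run_steps as a hypothetical helper. The workflow list reuses the commands from the example but is not executed here; the demo substitutes harmless Python subprocesses so it runs without the OpenClaw CLI installed.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) steps in order; stop at the first non-zero exit code.

    Returns (ok, completed_names). Raises FileNotFoundError if a program is missing.
    """
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{name} failed with exit code {result.returncode}: {result.stderr.strip()}",
                  file=sys.stderr)
            return False, completed
        completed.append(name)
    return True, completed


# The commands from the example, expressed as argv lists (not executed in this sketch).
workflow = [
    ("deploy", ["openclaw", "llm", "deploy", "--model-id", "gpt-finetuned",
                "--env", "staging", "--replicas", "2"]),
    ("scale", ["openclaw", "llm", "scale", "--model-id", "gpt-finetuned",
               "--scale-to", "10", "--metric", "request_rate"]),
]

# Stand-in steps: the second one fails, so the third never runs.
demo = [
    ("ok-step", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("failing-step", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("never-runs", [sys.executable, "-c", "raise SystemExit(0)"]),
]
ok, completed = run_steps(demo)
print(ok, completed)
```

With the CLI installed, calling run_steps(workflow) would apply the same stop-on-failure behavior to the real commands, and a False result could trigger the rollback step.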