Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

abhisheksharma-17/vllm

Name: vllm
Author: abhisheksharma-17

skills/vllm/SKILL.md

npx skillsauth add abhisheksharma-17/skills-graph vllm

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

vLLM

Source: docs.vllm.ai | Version tracked: 0.22.1 | pip install vllm

Reference Files

| File | Read When | |------|-----------| | references/00-overview.md | Starting with vLLM, understanding architecture, installation, quick start | | references/01-offline-inference.md | Batch inference with LLM class, SamplingParams, chat API, generate() | | references/02-serving.md | OpenAI-compatible API server, endpoints, vllm serve, client setup | | references/03-sampling-params.md | Generation parameters: temperature, top_p, top_k, penalties, stop tokens | | references/04-models.md | Supported model architectures, loading models, HuggingFace, model config | | references/05-quantization.md | FP8, GPTQ, AWQ, GGUF, BitsAndBytes, hardware compatibility matrix | | references/06-distributed-inference.md | Tensor/pipeline/expert parallelism, multi-GPU, multi-node, Ray | | references/07-speculative-decoding.md | Draft models, EAGLE, MTP, n-gram, --speculative-config | | references/08-structured-outputs.md | JSON schema, regex, grammar constraints, guided decoding backends | | references/09-tool-calling.md | Function calling, tool parsers, supported models, custom parsers | | references/10-multimodal.md | Vision, audio, video inputs, embedding inputs, media handling | | references/11-lora-adapters.md | LoRA serving, dynamic loading/unloading, multi-adapter, plugins | | references/12-production-deployment.md | Docker, Kubernetes, Prometheus metrics, autoscaling, best practices |

Installation

pip install vllm                    # NVIDIA CUDA (default)
uv pip install vllm --torch-backend=auto  # With uv (recommended)
pip install vllm-tpu                # Google TPU

Quick Reference

Docs | GitHub | PyPI

abhisheksharma-17/vllm

skills/vllm/SKILL.md

High-throughput LLM inference and serving engine with PagedAttention, continuous batching, and OpenAI-compatible API. MANDATORY TRIGGERS: vLLM, vllm, LLM serving, LLM inference engine, PagedAttention. Also trigger when the user wants to serve LLMs in production, deploy models with tensor parallelism, use speculative decoding, quantize models for inference, build OpenAI-compatible API servers, or optimize LLM throughput and latency. When in doubt about whether to use this skill for LLM serving tasks, use it.

1 stars

development

Updated Jun 12, 2026

$ install --global

skillsauth

npx skillsauth add abhisheksharma-17/skills-graph vllm

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 12, 2026, 4:06 AM35.6s18 files scanned

SKILL.md

name:: vllm
description:: High-throughput LLM inference and serving engine with PagedAttention, continuous batching, and OpenAI-compatible API. MANDATORY TRIGGERS: vLLM, vllm, LLM serving, LLM inference engine, PagedAttention. Also trigger when the user wants to serve LLMs in production, deploy models with tensor parallelism, use speculative decoding, quantize models for inference, build OpenAI-compatible API servers, or optimize LLM throughput and latency. When in doubt about whether to use this skill for LLM serving tasks, use it.
license:: MIT
version:: 1.0.0
author:: Abhishek Sharma
tags:: ["llm", "inference", "serving", "gpu", "quantization", "openai-api", "tensor-parallelism", "production"]

vLLM

Source: docs.vllm.ai | Version tracked: 0.22.1 | pip install vllm

Reference Files

Installation

pip install vllm                    # NVIDIA CUDA (default)
uv pip install vllm --torch-backend=auto  # With uv (recommended)
pip install vllm-tpu                # Google TPU

Quick Reference

Docs | GitHub | PyPI

Related Skills

abhisheksharma-17/pydantic-ai

tools

VerifiedTrustedCommunity

Type-safe Python agent framework for building production-grade GenAI applications with Pydantic validation, structured outputs, and dependency injection. MANDATORY TRIGGERS: pydantic-ai, pydantic_ai, PydanticAI, pydantic ai agent. Also trigger when the user wants to build type-safe AI agents in Python, create structured LLM outputs with Pydantic models, implement dependency injection for agents, use tools/capabilities with LLMs, or build multi-agent systems with Python type safety. When in doubt about whether to use this skill for Python AI agent tasks, use it.

1SKILL.mdUpdated Jun 11, 2026

abhisheksharma-17/pydantic-ai

abhisheksharma-17/temporal

development

VerifiedTrustedCommunity

Durable execution platform for building fault-tolerant workflows, long-running processes, and resilient distributed applications. MANDATORY TRIGGERS: temporal, temporal.io, temporalio, durable execution, workflow orchestration engine. Also trigger when the user wants to build fault-tolerant workflows, implement saga patterns, create long-running distributed processes, orchestrate microservices with retries and timeouts, or build durable AI agent pipelines. When in doubt about whether to use this skill for workflow orchestration or durable execution tasks, use it.

1SKILL.mdUpdated Jun 9, 2026

abhisheksharma-17/temporal

abhisheksharma-17/llamaindex

tools

VerifiedTrustedCommunity

AI framework for building RAG pipelines, agents, workflows, and data-augmented LLM applications with 300+ integrations. MANDATORY TRIGGERS: llamaindex, llama-index, llama_index, LlamaIndex, VectorStoreIndex, SimpleDirectoryReader, LlamaHub, LlamaParse. Also trigger when the user wants to build RAG applications with LlamaIndex, create document indexing pipelines, build agentic workflows with tool calling, implement structured data extraction from documents, or connect LLMs to custom data sources. When in doubt about whether to use this skill for RAG, document indexing, or LLM data augmentation tasks, use it.

1SKILL.mdUpdated Jun 7, 2026

abhisheksharma-17/llamaindex

abhisheksharma-17/wandb

development

VerifiedTrustedCommunity

AI developer platform for experiment tracking, LLM observability, hyperparameter sweeps, artifact versioning, and model registry. MANDATORY TRIGGERS: wandb, weights and biases, weights & biases, W&B, weave, wandb.init, wandb.log. Also trigger when the user wants to track ML experiments, log training metrics, tune hyperparameters with sweeps, version datasets or models, trace LLM calls, evaluate LLM applications, or monitor AI agents. When in doubt about whether to use this skill for ML experiment tracking or LLM observability tasks, use it.

1SKILL.mdUpdated Jun 6, 2026

abhisheksharma-17/wandb

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/abhisheksharma-17/skills-graph.git

# Copy into Claude Code skills folder (global)
cp -r skills-graph/skills/vllm ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

abhisheksharma-17/skills-graph

1 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT