Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

graphsignal/graphsignal-profiler

Name: graphsignal-profiler
Author: graphsignal

/SKILL.md

npx skillsauth add graphsignal/graphsignal-python graphsignal-profiler


Graphsignal Profiler

Graphsignal observes inference workloads from a sidecar process — the profiler. It never shares a process with CUDA: the profiler watches the workload externally via /dev/shm, OTLP/gRPC, Prometheus scraping, and NVML. Auto-instrumentation covers vLLM, SGLang, and PyTorch out of the box.

Install

There are two install patterns, depending on how you launch the profiler.

For graphsignal-run (CLI, recommended): install as a uv tool, isolated from your workload env.

uv tool install 'graphsignal[cu12]'   # CUDA 12.x
# or
uv tool install 'graphsignal[cu13]'   # CUDA 13.x

This puts graphsignal-run on PATH. The CUPTI injection library ships inside the wheel and the launcher references it by absolute path, so the workload doesn't need the package in its own environment.
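
To confirm that the tool install put the launcher on PATH, a quick check with the standard library may help. This is a minimal sketch: the helper name find_launcher is hypothetical, and it only assumes the executable is called graphsignal-run, as described above.

```python
import shutil
from typing import Optional


def find_launcher(name: str = "graphsignal-run") -> Optional[str]:
    """Return the path of the launcher if it resolves on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_launcher()
    if path is None:
        print("graphsignal-run not found on PATH; re-run the uv tool install step")
    else:
        print(f"graphsignal-run found at {path}")
```

If the lookup returns None right after installing, the uv tool bin directory is probably missing from PATH in the current shell.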

For graphsignal.watch() (in-process Python entry point): install into the app's own env.

uv add 'graphsignal[cu12]'    # or pip install -U 'graphsignal[cu12]'

The cu12 / cu13 extras are Linux-only and only needed for GPU profiling.

Configure

The profiler reads its config from environment variables.

| Variable | Purpose |
| --- | --- |
| GRAPHSIGNAL_API_KEY (required) | Account API key. |
| GRAPHSIGNAL_API_BASE | Override the API endpoint (defaults to https://api.graphsignal.com). |
| GRAPHSIGNAL_TAG_<KEY>=<value> | Arbitrary tag attached to all signals (e.g. GRAPHSIGNAL_TAG_DEPLOYMENT=us-prod). |

Set these before invoking graphsignal-run or calling graphsignal.watch().
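
A launch script can fail early when the required key is missing instead of starting the workload unprofiled. The sketch below mirrors the variables documented above; it is not the profiler's own parsing logic. The function name read_profiler_env and the returned dictionary layout are hypothetical, and keeping the tag suffix unchanged (no case folding) is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

TAG_PREFIX = "GRAPHSIGNAL_TAG_"


def read_profiler_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Collect the Graphsignal settings from an environment mapping.

    Raises ValueError when the required API key is missing or empty.
    """
    env = os.environ if env is None else env
    api_key = env.get("GRAPHSIGNAL_API_KEY", "")
    if not api_key:
        raise ValueError("GRAPHSIGNAL_API_KEY is required")
    # Assumption: the tag name is whatever follows the prefix, left as-is.
    tags = {
        key[len(TAG_PREFIX):]: value
        for key, value in env.items()
        if key.startswith(TAG_PREFIX) and len(key) > len(TAG_PREFIX)
    }
    return {
        "api_key": api_key,
        # Default endpoint as documented for GRAPHSIGNAL_API_BASE.
        "api_base": env.get("GRAPHSIGNAL_API_BASE", "https://api.graphsignal.com"),
        "tags": tags,
    }
```

Calling read_profiler_env() with no argument inspects the real process environment, which makes it usable as a pre-flight check before graphsignal-run or graphsignal.watch().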

Run

Option A — `graphsignal-run` CLI (recommended)

Wrap the launch command for your workload.

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000

The CLI sets up CUPTI env vars, spawns a profiler sidecar subprocess, and execv's into the workload.

Option B — `graphsignal.watch()` from Python

For applications that bootstrap themselves (long-lived servers, scripts, notebooks), call graphsignal.watch() once during startup, before any CUDA work happens.

import graphsignal

graphsignal.watch()
# ... your application code (PyTorch, vLLM, SGLang, etc.) ...

The call sets up the CUPTI env vars in the current process and spawns the profiler sidecar subprocess targeting os.getpid(). It returns the subprocess.Popen handle so the caller can wait() on or terminate() the sidecar.
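
Because the return value is an ordinary subprocess.Popen, standard process handling applies. The sketch below shows one way to shut the sidecar down with a bounded wait. The helper stop_sidecar is hypothetical, and whether explicit shutdown is needed at all is not stated here; the profiler may well clean up on its own.

```python
import subprocess


def stop_sidecar(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the sidecar to exit, escalating to kill() if it does not stop in time."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


# Intended use (requires graphsignal to be installed in this environment):
#   import atexit, graphsignal
#   sidecar = graphsignal.watch()
#   atexit.register(stop_sidecar, sidecar)
```

Registering the helper with atexit keeps the shutdown path in one place for servers, scripts, and notebooks alike.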

OpenTelemetry tracing (opt-in)

Distributed traces (engine / scheduler / attention spans over OTLP/gRPC) are off by default. Enable them with --enable-otel, which must come before the workload command:

graphsignal-run --enable-otel sglang serve --model-path Qwen/Qwen1.5-7B-Chat --port 8000

This injects the engine's trace flags and starts a local OTLP collector in the profiler. It requires OpenTelemetry installed in the engine's environment (e.g. pip install opentelemetry-sdk opentelemetry-exporter-otlp) — graphsignal can't provide it when installed in a separate env (e.g. uv tool), and SGLang ≥ 0.5.10 errors at startup if tracing is enabled without it. Prometheus metrics and CUPTI GPU profiling are captured regardless of this flag; OTEL injection applies only to graphsignal-run (not graphsignal.watch()).
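
When the launch command is assembled programmatically, the flag ordering is easy to get wrong. The sketch below builds the argument list so that --enable-otel always precedes the workload command. The helper build_launch_command is hypothetical; only the graphsignal-run name and the --enable-otel flag come from the text above.

```python
from typing import List, Sequence


def build_launch_command(workload: Sequence[str], enable_otel: bool = False) -> List[str]:
    """Return the argv for graphsignal-run, keeping launcher flags ahead of the workload."""
    if not workload:
        raise ValueError("workload command must not be empty")
    argv = ["graphsignal-run"]
    if enable_otel:
        argv.append("--enable-otel")  # must precede the workload command
    argv.extend(workload)
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True); using a list rather than a shell string avoids quoting issues with model paths.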

Engine-specific notes

vLLM

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000

Or from Python (before importing vLLM):

import graphsignal
graphsignal.watch()
import vllm
# ...

Captures vLLM's Prometheus metrics and CUPTI GPU profiling out of the box. Engine / scheduler / KV-cache / attention / output-processing OTEL spans are added with --enable-otel (see OpenTelemetry tracing above).

vLLM Docker (image without CUPTI):

docker run --gpus all \
  -p 8000:8000 --ipc=host \
  -e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint bash \
  vllm/vllm-openai:latest \
  -lc 'pip install --no-cache-dir "graphsignal[cu12]" \
       && exec graphsignal-run vllm serve \
           Qwen/Qwen2-VL-7B-Instruct \
           --trust-remote-code'

SGLang

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run sglang serve \
  --model-path Qwen/Qwen1.5-7B-Chat \
  --port 8000

Captures SGLang's Prometheus metrics and operation-level GPU profiling out of the box. OTEL spans are added with --enable-otel (see OpenTelemetry tracing above).

SGLang Docker:

docker run --gpus all \
  -p 8000:8000 --ipc=host \
  -e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint bash \
  your-sglang-image:latest \
  -lc 'pip install --no-cache-dir "graphsignal[cu12]" \
       && exec graphsignal-run sglang serve \
           --model-path Qwen/Qwen2.5-1.5B-Instruct \
           --port 8000'

PyTorch

PyTorch is auto-instrumented for common operator / module hot paths and CUDA memory metrics. Use either the CLI or watch():

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run python my_app.py

import graphsignal
graphsignal.watch()
import torch
# ...

dstack

dstack runs inference as services. Use graphsignal-run around the launch command, exactly like bare-metal SGLang/vLLM.

type: service
name: deepseek-r1

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run python3 -m sglang.launch_server \
      --model-path $MODEL_ID \
      --port 8000 \
      --trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

resources:
  gpu: 24GB

Deploy:

dstack apply -f service.dstack.yml

Troubleshooting

Ensure outgoing connections to https://api.graphsignal.com are allowed.
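
A basic reachability test can be run from the host or container that runs the workload. The sketch below only checks that a TCP connection to port 443 can be opened; it does not validate TLS, proxies, or the API key. The helper can_reach is hypothetical, and the use of port 443 is inferred from the https URL.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False
```

For example, can_reach("api.graphsignal.com") returning False from inside the container points to an egress firewall or DNS issue rather than a profiler problem.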

Reference

Full Profiler API: https://graphsignal.com/docs/reference/profiler-api/
vLLM integration: https://graphsignal.com/docs/integrations/vllm/
SGLang integration: https://graphsignal.com/docs/integrations/sglang/
dstack integration: https://graphsignal.com/docs/integrations/dstack/


Set up the Graphsignal Profiler for inference workloads — vLLM, SGLang, PyTorch, and dstack services. Use when the user wants GPU profiling, tracing, or monitoring for inference, asks about `graphsignal-run` or `graphsignal.watch()`, or asks about CUPTI / Prometheus / OTLP setup.

205 stars

testing

Updated Jun 6, 2026

npx skillsauth add graphsignal/graphsignal-python graphsignal-profiler

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 6, 2026, 2:02 AM. Scan time: 86.1s. 101 files scanned.



Install manually

# Clone the repo
git clone https://github.com/graphsignal/graphsignal-python.git

# Copy into Claude Code skills folder (global)
cp -r graphsignal-python/ ~/.claude/skills/

Repository

graphsignal/graphsignal-python

205 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
