skills/hf-mem/SKILL.md
CLI to estimate the required memory to load either Safetensors or GGUF model weights for inference from the Hugging Face Hub
npx skillsauth add alvarobartt/hf-mem hf-mem
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Estimates inference memory (model weights + optional KV cache) for models on the Hugging Face Hub using HTTP Range requests; no weights are downloaded.
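The Range-request approach works because a Safetensors file begins with an 8-byte little-endian length followed by a JSON header that lists each tensor's dtype and shape. The following is a minimal sketch of that idea, not hf-mem's actual code: it builds the header of a tiny made-up file in memory (the tensor names and the `fake_safetensors_prefix` helper are hypothetical) and sums the tensor sizes from the header alone, the way a client could after fetching only those bytes.

```python
import json
import struct

# Bytes per element for a few safetensors dtype tags (subset, for illustration).
DTYPE_BYTES = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1, "F8_E4M3": 1}


def header_length(first_8_bytes: bytes) -> int:
    """The first 8 bytes are a little-endian u64 giving the JSON header size."""
    return struct.unpack("<Q", first_8_bytes)[0]


def weight_bytes(header_json: bytes) -> int:
    """Sum tensor sizes using only the dtype and shape entries in the header."""
    header = json.loads(header_json)
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        count = 1
        for dim in info["shape"]:
            count *= dim
        total += count * DTYPE_BYTES[info["dtype"]]
    return total


def fake_safetensors_prefix() -> bytes:
    """Hypothetical helper: the header portion of a tiny, made-up file."""
    header = json.dumps({
        "__metadata__": {"format": "pt"},
        "embed.weight": {"dtype": "BF16", "shape": [1000, 64],
                         "data_offsets": [0, 128000]},
        "proj.weight": {"dtype": "F32", "shape": [64, 64],
                        "data_offsets": [128000, 144384]},
    }).encode("utf-8")
    return struct.pack("<Q", len(header)) + header


blob = fake_safetensors_prefix()
n = header_length(blob[:8])          # what a `Range: bytes=0-7` request would return
total = weight_bytes(blob[8:8 + n])  # what a second Range request for the header would return
print(total)  # 144384
```

Two small requests are enough to size the weights in this sketch, which is why no tensor data needs to be downloaded.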
uv installed (for uvx)
HF_TOKEN env var or --hf-token flag (gated/private models only)
Safetensors is auto-detected when the repo contains model.safetensors, model.safetensors.index.json, or model_index.json. This covers Transformers, Diffusers, and Sentence Transformers; no extra flags are needed.
uvx hf-mem --model-id <org/model>
GGUF is auto-detected when the repo contains only .gguf files. When both Safetensors and GGUF files coexist, pass --gguf-file to target a specific file. For sharded models, any shard path works.
uvx hf-mem --model-id <org/model> --gguf-file <path-in-repo>
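The format-selection rules described above can be read as a small decision function. This is a hypothetical sketch of one such reading, not hf-mem's implementation: `detect_format`, its error messages, and the assumption that marker files sit at the repository root are all illustrative.

```python
from typing import Optional

# Files whose presence signals a Safetensors-based repo, per the description above.
SAFETENSORS_MARKERS = (
    "model.safetensors",
    "model.safetensors.index.json",
    "model_index.json",
)


def detect_format(repo_files: list[str], gguf_file: Optional[str] = None) -> str:
    """Return "safetensors" or "gguf" for a list of repo file paths."""
    if gguf_file is not None:
        # An explicit --gguf-file wins, but it must exist in the repo.
        if gguf_file not in repo_files:
            raise ValueError(f"{gguf_file} doesn't match any file in the repository")
        return "gguf"
    if any(path in SAFETENSORS_MARKERS for path in repo_files):
        return "safetensors"
    if any(path.endswith(".gguf") for path in repo_files):
        return "gguf"
    raise ValueError("no Safetensors or GGUF weights found")


print(detect_format(["config.json", "model.safetensors"]))            # safetensors
print(detect_format(["README.md", "model-Q4_K_M.gguf"]))              # gguf
print(detect_format(["model.safetensors", "m.gguf"], gguf_file="m.gguf"))  # gguf
```

In this reading, a mixed repo resolves to Safetensors unless a GGUF path is passed explicitly, which matches the advice to use --gguf-file when both formats coexist.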
With --experimental, hf-mem adds KV cache memory on top of weights. This applies to LLMs (...ForCausalLM), VLMs (...ForConditionalGeneration), and GGUF models. It reads max_model_len from config.json by default; override with --max-model-len. KV cache dtype defaults to auto (reads torch_dtype/dtype from config.json, or the FP8 quantization format if applicable; for GGUF, auto = F16).
uvx hf-mem --model-id <org/model> [--gguf-file <path>] \
--experimental [--max-model-len N] [--batch-size N] \
[--kv-cache-dtype auto|bfloat16|fp8|fp8_e4m3|fp8_e5m2]
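To see how --max-model-len, --batch-size, and --kv-cache-dtype each scale the KV cache figure, here is a minimal sketch of the textbook formula for standard multi-head or grouped-query attention. It is an assumption that hf-mem computes something equivalent; architectures with sliding-window, latent, or hybrid attention may be handled differently. The function name and the example dimensions (32 layers, 8 KV heads, head size 128, 32768 tokens) are illustrative values, not read from any particular model.

```python
# Bytes per element for the KV cache dtypes accepted on the command line
# (besides "auto", which resolves to one of these).
KV_DTYPE_BYTES = {"bfloat16": 2, "fp8": 1, "fp8_e4m3": 1, "fp8_e5m2": 1}


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int = 1,
    kv_cache_dtype: str = "bfloat16",
) -> int:
    """Textbook KV cache size: one K and one V vector per layer, head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * KV_DTYPE_BYTES[kv_cache_dtype]
    return per_token * max_model_len * batch_size


# Illustrative dimensions only.
bf16 = kv_cache_bytes(32, 8, 128, 32768)
fp8 = kv_cache_bytes(32, 8, 128, 32768, kv_cache_dtype="fp8")
print(bf16 / 2**30)  # 4.0 (GiB)
print(fp8 / 2**30)   # 2.0 (GiB)
```

The estimate grows linearly with context length and batch size, and halves when moving from bfloat16 to an FP8 cache, which is why these three flags matter most when sizing a deployment.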
# Transformers
uvx hf-mem --model-id MiniMaxAI/MiniMax-M2
# Diffusers
uvx hf-mem --model-id Qwen/Qwen-Image
# Sentence Transformers
uvx hf-mem --model-id google/embeddinggemma-300m
# LLM with KV cache
uvx hf-mem --model-id mistralai/Mistral-7B-v0.1 --experimental
# GGUF with KV cache (sharded)
uvx hf-mem --model-id unsloth/Qwen3.5-397B-A17B-GGUF \
--gguf-file Q4_K_M/Qwen3.5-397B-A17B-Q4_K_M-00001-of-00006.gguf \
--experimental
HF_TOKEN or --hf-token.
--gguf-file path doesn't match any file in the repository.