.claude/skills/nlp-slm-patterns/SKILL.md
# NLP / SLM Patterns ## When to Load This Skill Load when working with: local language models, anonymization/PII detection, text classification, NER, Ollama, vLLM, Presidio, spaCy, Hugging Face Transformers, SLM inference pipelines. ## Philosophy Local models run locally during development, then deploy to YC GPU VM for production. The adapter interface is identical in both environments — only the endpoint URL changes via `.env`. Anonymization projects deal with real personal data. Default p
npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/nlp-slm-patternsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Load when working with: local language models, anonymization/PII detection, text classification, NER, Ollama, vLLM, Presidio, spaCy, Hugging Face Transformers, SLM inference pipelines.
Local models run locally during development, then deploy to YC GPU VM for production. The adapter interface is identical in both environments — only the endpoint URL changes via .env.
Anonymization projects deal with real personal data. Default posture: treat all input as sensitive until proven otherwise. Never log raw text containing PII.
| Option | When to Use | Infra | |---|---|---| | Ollama | Local dev, prototyping, CPU-only inference | Docker container, local | | vLLM | Production GPU inference, high throughput needed | YC GPU VM | | Transformers (direct) | Fine-tuned models, custom pipelines, offline only | Local or GPU VM |
import httpx
from src.project_name.core.config import settings
class OllamaAdapter:
def __init__(self) -> None:
self._base_url = settings.ollama_base_url
self._model = settings.ollama_model
async def generate(self, prompt: str, system: str = "") -> str:
async with httpx.AsyncClient(timeout=120.0) as client:
response = await client.post(
f"{self._base_url}/api/generate",
json={
"model": self._model,
"prompt": prompt,
"system": system,
"stream": False,
},
)
response.raise_for_status()
return response.json()["response"]
async def chat(self, messages: list[dict]) -> str:
async with httpx.AsyncClient(timeout=120.0) as client:
response = await client.post(
f"{self._base_url}/api/chat",
json={"model": self._model, "messages": messages, "stream": False},
)
response.raise_for_status()
return response.json()["message"]["content"]
Required .env:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b
Docker Compose for Ollama:
services:
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
ollama_data:
from openai import AsyncOpenAI
from src.project_name.core.config import settings
class VLLMAdapter:
def __init__(self) -> None:
self._client = AsyncOpenAI(
api_key="EMPTY",
base_url=settings.vllm_base_url,
)
self._model = settings.vllm_model_name
async def generate(self, prompt: str, system: str = "", max_tokens: int = 1024) -> str:
response = await self._client.chat.completions.create(
model=self._model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": prompt},
],
max_tokens=max_tokens,
temperature=0.1,
)
return response.choices[0].message.content or ""
Required .env (production):
VLLM_BASE_URL=http://your-gpu-vm-ip:8000/v1
VLLM_MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
vLLM launch command on GPU VM:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-7B-Instruct \
--port 8000 \
--tensor-parallel-size 1 \
--max-model-len 32768
Microsoft Presidio is the standard for PII detection and anonymization. It supports custom recognizers and operators.
from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig
class AnonymizationAdapter:
def __init__(self) -> None:
self._analyzer = AnalyzerEngine()
self._anonymizer = AnonymizerEngine()
def analyze(self, text: str, language: str = "ru") -> list[RecognizerResult]:
return self._analyzer.analyze(text=text, language=language)
def anonymize(self, text: str, language: str = "ru") -> str:
results = self.analyze(text, language)
anonymized = self._anonymizer.anonymize(
text=text,
analyzer_results=results,
operators={
"PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"}),
"PHONE_NUMBER": OperatorConfig("replace", {"new_value": "<PHONE>"}),
"EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
"LOCATION": OperatorConfig("replace", {"new_value": "<LOCATION>"}),
},
)
return anonymized.text
Required dependencies:
dependencies = [
"presidio-analyzer>=2.2.0",
"presidio-anonymizer>=2.2.0",
"spacy>=3.7.0",
]
Install spaCy Russian model:
uv run python -m spacy download ru_core_news_sm
uv run python -m spacy download en_core_web_sm
For domain-specific entities (e.g., employee IDs, contract numbers):
from presidio_analyzer import PatternRecognizer, Pattern
class EmployeeIdRecognizer(PatternRecognizer):
PATTERNS = [Pattern("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.85)]
CONTEXT = ["employee", "id", "сотрудник"]
def __init__(self) -> None:
super().__init__(
supported_entity="EMPLOYEE_ID",
patterns=self.PATTERNS,
context=self.CONTEXT,
)
Register with analyzer:
from presidio_analyzer import AnalyzerEngine
analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(EmployeeIdRecognizer())
from pathlib import Path
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
class TextClassifierAdapter:
def __init__(self, model_path: str | Path) -> None:
self._pipeline = pipeline(
"text-classification",
model=str(model_path),
tokenizer=str(model_path),
device=0 if torch.cuda.is_available() else -1,
top_k=None,
)
def classify(self, text: str) -> list[dict]:
results = self._pipeline(text, truncation=True, max_length=512)
return sorted(results[0], key=lambda x: x["score"], reverse=True)
ANONYMIZED_TEXT column naming convention to signal anonymized contentresources/spacy-ner.md — custom NER with spaCyresources/model-quantization.md — GGUF/AWQ quantization for local inferencetesting
# Design Doc Creator ## When to Load This Skill Load when: design documents, requirements, new project start. Short fixture skill for testing (optional/meta skill).
development
# Windows Developer Guide ## When to Load Automatically loaded on Windows (`platform_trigger: "win32"`). Applies to: `.py`, `.ps1`, `.bat`, `.cmd` files and any Windows-specific workflow. ## Python on Windows ### Encoding (CRITICAL) Windows defaults to `cp1251` / `cp1252` for file I/O. Always specify UTF-8 explicitly: ```python with open("file.txt", "r", encoding="utf-8") as f: content = f.read() Path("file.txt").read_text(encoding="utf-8") Path("file.txt").write_text(content, encodin
development
# Test-First Patterns ## When to Load This Skill Load when writing tests, creating `.feature` files, setting up conftest, discussing test strategy, or reviewing coverage. ## Philosophy Tests are written BEFORE code. Always. No exceptions. The order is: Design Doc → BDD Scenarios → Unit Tests → Implementation. BDD scenarios come from the design document's use cases section — they are a direct translation of business requirements into executable specifications. This makes tests the living do
testing
# Skill: Supply Chain Auditor ## When to Load Auto-load when: adding dependencies, reviewing packages, updating versions, or discussing `requirements.txt`, `pyproject.toml`, `package.json`. Triggers on `dependency`, `install`, `package`, `CVE`, `audit`, `vulnerable` (≥2 keywords). ## Core Rules Every new dependency addition must pass this checklist before merging: 1. **Pinned** — exact version in production (`==1.2.3` for pip, `"1.2.3"` for npm, not `^` or `~`). 2. **Maintained** — last com