
pyramidheadshark/.claude/skills/nlp-slm-patterns

Name: .claude/skills/nlp-slm-patterns
Author: pyramidheadshark

.claude/skills/nlp-slm-patterns/SKILL.md

npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/nlp-slm-patterns


NLP / SLM Patterns

When to Load This Skill

Load when working with: local language models, anonymization/PII detection, text classification, NER, Ollama, vLLM, Presidio, spaCy, Hugging Face Transformers, SLM inference pipelines.

Philosophy

Models run locally during development and are deployed to a YC GPU VM for production. The adapter interface is identical in both environments; only the endpoint URL changes, via .env.
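The idea of an identical adapter interface can be sketched with a structural type. This is a minimal illustration, assuming a hypothetical `TextGenerator` protocol and a stand-in `EchoAdapter`; neither name comes from the skill itself. Calling code depends only on `generate`, so swapping one backend for another would leave it untouched.

```python
import asyncio
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical interface that both serving adapters would satisfy."""

    async def generate(self, prompt: str, system: str = "") -> str: ...


class EchoAdapter:
    """Stand-in backend for illustration; a real adapter would call a model server."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url  # the only value that differs per environment

    async def generate(self, prompt: str, system: str = "") -> str:
        return f"[{self._base_url}] {prompt}"


async def summarize(llm: TextGenerator, text: str) -> str:
    # Calling code sees only the protocol, never the concrete backend.
    return await llm.generate(f"Summarize: {text}", system="Be brief.")


def run_demo() -> str:
    return asyncio.run(summarize(EchoAdapter("http://localhost:11434"), "hello"))
```

Any class with a matching `generate` method satisfies the protocol without inheriting from it, which is why the stand-in needs no base class.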

Anonymization projects deal with real personal data. Default posture: treat all input as sensitive until proven otherwise. Never log raw text containing PII.

Model Serving Options

| Option | When to Use | Infra |
|---|---|---|
| Ollama | Local dev, prototyping, CPU-only inference | Docker container, local |
| vLLM | Production GPU inference, high throughput needed | YC GPU VM |
| Transformers (direct) | Fine-tuned models, custom pipelines, offline only | Local or GPU VM |

Ollama Local Setup

import httpx
from src.project_name.core.config import settings


class OllamaAdapter:
    def __init__(self) -> None:
        self._base_url = settings.ollama_base_url
        self._model = settings.ollama_model

    async def generate(self, prompt: str, system: str = "") -> str:
        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(
                f"{self._base_url}/api/generate",
                json={
                    "model": self._model,
                    "prompt": prompt,
                    "system": system,
                    "stream": False,
                },
            )
            response.raise_for_status()
            return response.json()["response"]

    async def chat(self, messages: list[dict]) -> str:
        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(
                f"{self._base_url}/api/chat",
                json={"model": self._model, "messages": messages, "stream": False},
            )
            response.raise_for_status()
            return response.json()["message"]["content"]

Required .env:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b

Docker Compose for Ollama:

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:

vLLM Production Setup

from openai import AsyncOpenAI
from src.project_name.core.config import settings


class VLLMAdapter:
    def __init__(self) -> None:
        self._client = AsyncOpenAI(
            api_key="EMPTY",
            base_url=settings.vllm_base_url,
        )
        self._model = settings.vllm_model_name

    async def generate(self, prompt: str, system: str = "", max_tokens: int = 1024) -> str:
        # Skip the system message when no system prompt was given.
        messages = [{"role": "system", "content": system}] if system else []
        messages.append({"role": "user", "content": prompt})
        response = await self._client.chat.completions.create(
            model=self._model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.1,
        )
        # content can be None, so fall back to an empty string.
        return response.choices[0].message.content or ""

Required .env (production):

VLLM_BASE_URL=http://your-gpu-vm-ip:8000/v1
VLLM_MODEL_NAME=Qwen/Qwen2.5-7B-Instruct

vLLM launch command on GPU VM:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-7B-Instruct \
  --port 8000 \
  --tensor-parallel-size 1 \
  --max-model-len 32768

Anonymization with Presidio

Microsoft Presidio is a widely used open-source toolkit for PII detection and anonymization. It supports custom recognizers and operators.

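from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_analyzer.nlp_engine import NlpEngineProvider
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

# A default AnalyzerEngine() is configured for English only, so the
# Russian and English spaCy models are registered explicitly here.
NLP_CONFIGURATION = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "ru", "model_name": "ru_core_news_sm"},
        {"lang_code": "en", "model_name": "en_core_web_sm"},
    ],
}


class AnonymizationAdapter:
    def __init__(self) -> None:
        nlp_engine = NlpEngineProvider(nlp_configuration=NLP_CONFIGURATION).create_engine()
        self._analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["ru", "en"])
        self._anonymizer = AnonymizerEngine()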

    def analyze(self, text: str, language: str = "ru") -> list[RecognizerResult]:
        return self._analyzer.analyze(text=text, language=language)

    def anonymize(self, text: str, language: str = "ru") -> str:
        results = self.analyze(text, language)
        anonymized = self._anonymizer.anonymize(
            text=text,
            analyzer_results=results,
            operators={
                "PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"}),
                "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "<PHONE>"}),
                "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "<EMAIL>"}),
                "LOCATION": OperatorConfig("replace", {"new_value": "<LOCATION>"}),
            },
        )
        return anonymized.text
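
To make the replace step concrete, here is a dependency-free sketch of how detected spans could be turned into placeholders. It is not Presidio's implementation: `Span` and `replace_spans` are hypothetical names, the spans are hand-written rather than detected, and the sketch assumes the spans do not overlap.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Simplified stand-in for an analyzer result: character offsets plus a label."""

    start: int
    end: int
    entity_type: str


def replace_spans(text: str, spans: list[Span], placeholders: dict[str, str]) -> str:
    # Work from the end of the text so earlier offsets stay valid after each edit.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        placeholder = placeholders.get(span.entity_type, f"<{span.entity_type}>")
        text = text[: span.start] + placeholder + text[span.end :]
    return text


sample = "Call Ivan at 555-0100"  # synthetic data, not real PII
spans = [Span(5, 9, "PERSON"), Span(13, 21, "PHONE_NUMBER")]
result = replace_spans(sample, spans, {"PHONE_NUMBER": "<PHONE>"})
# result == "Call <PERSON> at <PHONE>"
```

Entity types without an explicit placeholder fall back to `<ENTITY_TYPE>`, which is how `PERSON` is handled in this example.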

Required dependencies:

dependencies = [
    "presidio-analyzer>=2.2.0",
    "presidio-anonymizer>=2.2.0",
    "spacy>=3.7.0",
]

Install the spaCy Russian and English models:

uv run python -m spacy download ru_core_news_sm
uv run python -m spacy download en_core_web_sm

Custom Presidio Recognizer

For domain-specific entities (e.g., employee IDs, contract numbers):

from presidio_analyzer import PatternRecognizer, Pattern


class EmployeeIdRecognizer(PatternRecognizer):
    PATTERNS = [Pattern("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.85)]
    CONTEXT = ["employee", "id", "сотрудник"]

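    def __init__(self, supported_language: str = "en") -> None:
        # PatternRecognizer defaults to "en"; pass "ru" (and register a second
        # instance) if the analyzer is called with language="ru".
        super().__init__(
            supported_entity="EMPLOYEE_ID",
            patterns=self.PATTERNS,
            context=self.CONTEXT,
            supported_language=supported_language,
        )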

from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(EmployeeIdRecognizer())
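
Before registering a pattern recognizer, it can help to check the regular expression on its own. The sketch below uses only the standard `re` module with the same expression as `EmployeeIdRecognizer`; the sample strings are made up for illustration.

```python
import re

# Same expression as the EmployeeIdRecognizer pattern above.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

samples = [
    "Ticket opened by EMP-004217 yesterday",   # matches: exactly six digits
    "Reference EMP-42 is too short",           # no match: fewer than six digits
    "Code XEMP-123456 is a different scheme",  # no match: no word boundary before EMP
    "Badge EMP-1234567 has seven digits",      # no match: no word boundary after six digits
]

matches = [EMPLOYEE_ID.findall(s) for s in samples]
# matches == [["EMP-004217"], [], [], []]
```

The `\b` anchors are what reject the last two samples, so removing them would widen the pattern considerably.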

Transformers Inference Pipeline

from pathlib import Path

import torch
from transformers import pipeline


class TextClassifierAdapter:
    def __init__(self, model_path: str | Path) -> None:
        self._pipeline = pipeline(
            "text-classification",
            model=str(model_path),
            tokenizer=str(model_path),
            device=0 if torch.cuda.is_available() else -1,
            top_k=None,
        )

    def classify(self, text: str) -> list[dict]:
        results = self._pipeline(text, truncation=True, max_length=512)
        return sorted(results[0], key=lambda x: x["score"], reverse=True)
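
A caller will often want a single decision rather than the full score list. The helper below is a small sketch of that step, assuming the list-of-dicts shape with `label` and `score` keys that `classify` returns; `top_label`, the 0.5 threshold, and the `"unknown"` fallback are illustrative choices, not part of the skill.

```python
def top_label(scores: list[dict], threshold: float = 0.5, fallback: str = "unknown") -> str:
    """Return the best label, or a fallback when the model is not confident enough."""
    if not scores:
        return fallback
    best = max(scores, key=lambda item: item["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Hand-written scores in the shape classify() returns; not real model output.
example = [
    {"label": "complaint", "score": 0.81},
    {"label": "question", "score": 0.15},
    {"label": "other", "score": 0.04},
]

decision = top_label(example)              # "complaint"
strict = top_label(example, threshold=0.9)  # "unknown"
```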

Privacy Rules

Never log raw user text in production — log only metadata (length, language, entity counts)
Anonymize text before storing it in a database
Use ANONYMIZED_TEXT column naming convention to signal anonymized content
Test anonymization with synthetic PII — never use real personal data in tests
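
The first rule can be sketched as a helper that builds the log record from metadata alone. This is a minimal illustration using the standard `logging` module; the function names and the logger name `anonymization` are assumptions, and the input is synthetic.

```python
import logging
from collections import Counter

logger = logging.getLogger("anonymization")


def build_log_metadata(text: str, language: str, entity_types: list[str]) -> dict:
    """Summarize a request without including any of the raw text."""
    return {
        "length": len(text),
        "language": language,
        "entity_counts": dict(Counter(entity_types)),
    }


def log_request(text: str, language: str, entity_types: list[str]) -> dict:
    metadata = build_log_metadata(text, language, entity_types)
    # Only the metadata dict is logged; `text` itself never reaches the logger.
    logger.info("anonymization request: %s", metadata)
    return metadata


# Synthetic input, in line with the rule against real personal data in tests.
meta = log_request("Call Ivan at 555-0100", "en", ["PERSON", "PHONE_NUMBER"])
```

Keeping the metadata builder separate from the logging call makes it straightforward to assert in tests that no raw text leaks into the record.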

Further Resources

resources/spacy-ner.md — custom NER with spaCy
resources/model-quantization.md — GGUF/AWQ quantization for local inference


4 stars

development

Updated Apr 15, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 15, 2026, 8:13 AM · 154.0s · 4 files scanned

Install manually

# Clone the repo
git clone https://github.com/pyramidheadshark/ml-claude-infra.git

# Copy into the Claude Code skills folder (global); create it first if it does not exist
mkdir -p ~/.claude/skills
cp -r ml-claude-infra/.claude/skills/nlp-slm-patterns ~/.claude/skills/

Repository

pyramidheadshark/ml-claude-infra

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT