
pyramidheadshark/.claude/skills/data-validation

Name: .claude/skills/data-validation
Author: pyramidheadshark

.claude/skills/data-validation/SKILL.md

npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/data-validation

Data Validation

When to Load This Skill

Load when working with: Pandera DataFrame schemas, Great Expectations suites, data quality checks, input validation for ML pipelines, data contracts between pipeline stages.

Pandera — DataFrame Schema Validation

Define schemas declaratively and validate at pipeline boundaries:

import pandera as pa
from pandera.typing import DataFrame, Series


class InputSchema(pa.DataFrameModel):
    user_id: Series[int] = pa.Field(ge=0, nullable=False)
    age: Series[float] = pa.Field(ge=0, le=120, nullable=True)
    category: Series[str] = pa.Field(isin=["A", "B", "C"])
    score: Series[float] = pa.Field(ge=0.0, le=1.0)

    class Config:
        strict = True
        coerce = True


@pa.check_types
def preprocess(df: DataFrame[InputSchema]) -> DataFrame[InputSchema]:
    return df.dropna(subset=["user_id"])

Validate without decorator:

try:
    InputSchema.validate(df, lazy=True)
except pa.errors.SchemaErrors as e:
    print(e.failure_cases)

Pydantic Data Contracts

Use Pydantic for row-level validation in ingestion endpoints:

from pydantic import BaseModel, Field, field_validator
from typing import Literal


class RecordInput(BaseModel):
    user_id: int = Field(ge=0)
    age: float | None = Field(default=None, ge=0, le=120)
    category: Literal["A", "B", "C"]
    score: float = Field(ge=0.0, le=1.0)

    @field_validator("score")
    @classmethod
    def score_precision(cls, v: float) -> float:
        return round(v, 6)

FastAPI Ingestion Endpoint with Validation

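import pandas as pd
import pandera as pa
from fastapi import APIRouter, HTTPException

# RecordInput (Pydantic) and InputSchema (Pandera) are the models defined above.
router = APIRouter()


@router.post("/ingest")
async def ingest_batch(records: list[RecordInput]) -> dict:
    # Pydantic has already validated each record; Pandera checks the assembled frame.
    df = pd.DataFrame([r.model_dump() for r in records])
    try:
        InputSchema.validate(df, lazy=True)
    except pa.errors.SchemaErrors as e:
        # Cast to str so NaN/None values serialize cleanly; one dict per failing value.
        detail = e.failure_cases.astype(str).to_dict(orient="records")
        raise HTTPException(status_code=422, detail=detail)
    return {"accepted": len(df)}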

ML Pipeline Input/Output Validation

Validate at each stage boundary:

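import pandera as pa
from pandera.typing import Series


# Input to the model stage: extra columns are allowed to pass through.
class FeatureSchema(pa.DataFrameModel):
    feature_1: Series[float] = pa.Field(nullable=False)
    feature_2: Series[float] = pa.Field(nullable=False)
    target: Series[int] = pa.Field(isin=[0, 1])

    class Config:
        strict = False


# Output of the model stage: one probability and one binary label per user.
class PredictionSchema(pa.DataFrameModel):
    user_id: Series[int]
    probability: Series[float] = pa.Field(ge=0.0, le=1.0)
    label: Series[int] = pa.Field(isin=[0, 1])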

Data Quality Checks (Custom)

For lightweight checks without a full framework:

from dataclasses import dataclass
from typing import Callable
import pandas as pd


@dataclass
class Check:
    name: str
    fn: Callable[[pd.DataFrame], bool]
    error_msg: str


def run_checks(df: pd.DataFrame, checks: list[Check]) -> list[str]:
    failures = []
    for check in checks:
        if not check.fn(df):
            failures.append(f"{check.name}: {check.error_msg}")
    return failures


QUALITY_CHECKS = [
    Check("no_nulls_user_id", lambda df: df["user_id"].notna().all(), "user_id has nulls"),
    Check("score_range", lambda df: df["score"].between(0, 1).all(), "score out of [0,1]"),
    Check("min_rows", lambda df: len(df) >= 10, "batch too small (< 10 rows)"),
]
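A check written as a lambda raises `KeyError` when its column is missing, which would abort the whole run. The sketch below is one possible variant that records such exceptions as failures instead; `run_checks_safe`, `CHECKS`, and the sample batch are hypothetical names added for illustration, and `Check` is repeated so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class Check:
    name: str
    fn: Callable[[pd.DataFrame], bool]
    error_msg: str


def run_checks_safe(df: pd.DataFrame, checks: list[Check]) -> list[str]:
    """Run every check; a check that raises is reported as a failure."""
    failures = []
    for check in checks:
        try:
            passed = bool(check.fn(df))
        except Exception as exc:  # e.g. KeyError for a missing column
            failures.append(f"{check.name}: check raised {type(exc).__name__}")
            continue
        if not passed:
            failures.append(f"{check.name}: {check.error_msg}")
    return failures


CHECKS = [
    Check("no_nulls_user_id", lambda df: df["user_id"].notna().all(), "user_id has nulls"),
    Check("score_range", lambda df: df["score"].between(0, 1).all(), "score out of [0,1]"),
    Check("min_rows", lambda df: len(df) >= 10, "batch too small (< 10 rows)"),
]

# This batch has a null user_id, no "score" column, and fewer than 10 rows.
batch = pd.DataFrame({"user_id": [1, None, 3]})
failures = run_checks_safe(batch, CHECKS)
for line in failures:
    print(line)
```

Whether a raised exception should count as a failed check or stop the pipeline is a design choice; this variant favors a complete report over failing fast.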

Known Pitfalls

Pandera strict=True rejects any column not in the schema; use strict=False for pass-through pipelines where extra columns are expected.
lazy=True in validate() collects all failures and raises a single SchemaErrors; use it for batch reporting. Without it, validation stops at the first error and raises SchemaError.
Pydantic field_validator runs after type coercion by default (mode="after"), so it receives the coerced value, not the raw input string; use mode="before" to inspect the raw input.
Never skip validation in "dev mode": data quality issues tolerated in dev become silent corruption in production.
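The Pydantic ordering pitfall can be observed directly. The sketch below registers one validator in each mode on the same field and records the type each one receives; the `Example` model and the `seen` list are hypothetical names used only for this demonstration, and it assumes Pydantic v2.

```python
from pydantic import BaseModel, field_validator

# Records (mode, type name) in the order the validators run.
seen: list[tuple[str, str]] = []


class Example(BaseModel):
    score: float

    @field_validator("score", mode="before")
    @classmethod
    def record_raw(cls, v):
        # Runs before coercion: receives whatever the caller passed in.
        seen.append(("before", type(v).__name__))
        return v

    @field_validator("score")  # default mode is "after"
    @classmethod
    def record_coerced(cls, v):
        # Runs after coercion: receives a float.
        seen.append(("after", type(v).__name__))
        return v


example = Example(score="0.25")
print(seen)
print(example.score)
```

Range or precision rules belong in the default "after" mode, while clean-up of raw strings (stripping units, for instance) needs `mode="before"`.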

Resources

Pandera docs: https://pandera.readthedocs.io/
Pydantic validation: https://docs.pydantic.dev/latest/concepts/validators/

4 stars

development

Updated Apr 15, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add pyramidheadshark/ml-claude-infra .claude/skills/data-validation

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 15, 2026, 8:03 AM (101.0 s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/pyramidheadshark/ml-claude-infra.git

# Copy into Claude Code skills folder (global)
cp -r ml-claude-infra/.claude/skills/data-validation ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

pyramidheadshark/ml-claude-infra

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT