Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

SiniyaYousuf/content-hash-cache-pattern

Name: content-hash-cache-pattern
Author: SiniyaYousuf

skills/content-hash-cache-pattern/SKILL.md

npx skillsauth add SiniyaYousuf/everything_claudecode content-hash-cache-pattern

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Content-Hash File Cache Pattern

Cache expensive file processing results (PDF parsing, text extraction, image analysis) using SHA-256 content hashes as cache keys. Unlike path-based caching, this approach survives file moves/renames and auto-invalidates when content changes.

When to Activate

Building file processing pipelines (PDF, images, text extraction)
Processing cost is high and same files are processed repeatedly
Need a --cache/--no-cache CLI option
Want to add caching to existing pure functions without modifying them

Core Pattern

1. Content-Hash Based Cache Key

Use file content (not path) as the cache key:

import hashlib
from pathlib import Path

_HASH_CHUNK_SIZE = 65536  # 64KB chunks for large files

def compute_file_hash(path: Path) -> str:
    """SHA-256 of file contents (chunked for large files)."""
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(_HASH_CHUNK_SIZE)
            if not chunk:
                break
            sha256.update(chunk)
    return sha256.hexdigest()

Why content hash? File rename/move = cache hit. Content change = automatic invalidation. No index file needed.

2. Frozen Dataclass for Cache Entry

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class CacheEntry:
    file_hash: str
    source_path: str
    document: ExtractedDocument  # The cached result

3. File-Based Cache Storage

Each cache entry is stored as {hash}.json — O(1) lookup by hash, no index file required.

import json
from typing import Any

def write_cache(cache_dir: Path, entry: CacheEntry) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{entry.file_hash}.json"
    data = serialize_entry(entry)
    cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

def read_cache(cache_dir: Path, file_hash: str) -> CacheEntry | None:
    cache_file = cache_dir / f"{file_hash}.json"
    if not cache_file.is_file():
        return None
    try:
        raw = cache_file.read_text(encoding="utf-8")
        data = json.loads(raw)
        return deserialize_entry(data)
    except (json.JSONDecodeError, ValueError, KeyError):
        return None  # Treat corruption as cache miss

4. Service Layer Wrapper (SRP)

Keep the processing function pure. Add caching as a separate service layer.

def extract_with_cache(
    file_path: Path,
    *,
    cache_enabled: bool = True,
    cache_dir: Path = Path(".cache"),
) -> ExtractedDocument:
    """Service layer: cache check -> extraction -> cache write."""
    if not cache_enabled:
        return extract_text(file_path)  # Pure function, no cache knowledge

    file_hash = compute_file_hash(file_path)

    # Check cache
    cached = read_cache(cache_dir, file_hash)
    if cached is not None:
        logger.info("Cache hit: %s (hash=%s)", file_path.name, file_hash[:12])
        return cached.document

    # Cache miss -> extract -> store
    logger.info("Cache miss: %s (hash=%s)", file_path.name, file_hash[:12])
    doc = extract_text(file_path)
    entry = CacheEntry(file_hash=file_hash, source_path=str(file_path), document=doc)
    write_cache(cache_dir, entry)
    return doc

Key Design Decisions

| Decision | Rationale | |----------|-----------| | SHA-256 content hash | Path-independent, auto-invalidates on content change | | {hash}.json file naming | O(1) lookup, no index file needed | | Service layer wrapper | SRP: extraction stays pure, cache is a separate concern | | Manual JSON serialization | Full control over frozen dataclass serialization | | Corruption returns None | Graceful degradation, re-processes on next run | | cache_dir.mkdir(parents=True) | Lazy directory creation on first write |

Best Practices

Hash content, not paths — paths change, content identity doesn't
Chunk large files when hashing — avoid loading entire files into memory
Keep processing functions pure — they should know nothing about caching
Log cache hit/miss with truncated hashes for debugging
Handle corruption gracefully — treat invalid cache entries as misses, never crash

Anti-Patterns to Avoid

# BAD: Path-based caching (breaks on file move/rename)
cache = {"/path/to/file.pdf": result}

# BAD: Adding cache logic inside the processing function (SRP violation)
def extract_text(path, *, cache_enabled=False, cache_dir=None):
    if cache_enabled:  # Now this function has two responsibilities
        ...

# BAD: Using dataclasses.asdict() with nested frozen dataclasses
# (can cause issues with complex nested types)
data = dataclasses.asdict(entry)  # Use manual serialization instead

When to Use

File processing pipelines (PDF parsing, OCR, text extraction, image analysis)
CLI tools that benefit from --cache/--no-cache options
Batch processing where the same files appear across runs
Adding caching to existing pure functions without modifying them

When NOT to Use

Data that must always be fresh (real-time feeds)
Cache entries that would be extremely large (consider streaming instead)
Results that depend on parameters beyond file content (e.g., different extraction configs)

SiniyaYousuf/content-hash-cache-pattern

skills/content-hash-cache-pattern/SKILL.md

Cache expensive file processing results using SHA-256 content hashes — path-independent, auto-invalidating, with service layer separation.

9 stars

content-media

Updated Apr 21, 2026

$ install --global

skillsauth

npx skillsauth add SiniyaYousuf/everything_claudecode content-hash-cache-pattern

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Mar 29, 2026, 7:26 AM38.3s1 file scanned

SKILL.md

name:: content-hash-cache-pattern
description:: Cache expensive file processing results using SHA-256 content hashes — path-independent, auto-invalidating, with service layer separation.
origin:: ECC

Content-Hash File Cache Pattern

When to Activate

Building file processing pipelines (PDF, images, text extraction)
Processing cost is high and same files are processed repeatedly
Need a --cache/--no-cache CLI option
Want to add caching to existing pure functions without modifying them

Core Pattern

1. Content-Hash Based Cache Key

Use file content (not path) as the cache key:

import hashlib
from pathlib import Path

_HASH_CHUNK_SIZE = 65536  # 64KB chunks for large files

def compute_file_hash(path: Path) -> str:
    """SHA-256 of file contents (chunked for large files)."""
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(_HASH_CHUNK_SIZE)
            if not chunk:
                break
            sha256.update(chunk)
    return sha256.hexdigest()

Why content hash? File rename/move = cache hit. Content change = automatic invalidation. No index file needed.

2. Frozen Dataclass for Cache Entry

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class CacheEntry:
    file_hash: str
    source_path: str
    document: ExtractedDocument  # The cached result

3. File-Based Cache Storage

Each cache entry is stored as {hash}.json — O(1) lookup by hash, no index file required.

import json
from typing import Any

def write_cache(cache_dir: Path, entry: CacheEntry) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{entry.file_hash}.json"
    data = serialize_entry(entry)
    cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

def read_cache(cache_dir: Path, file_hash: str) -> CacheEntry | None:
    cache_file = cache_dir / f"{file_hash}.json"
    if not cache_file.is_file():
        return None
    try:
        raw = cache_file.read_text(encoding="utf-8")
        data = json.loads(raw)
        return deserialize_entry(data)
    except (json.JSONDecodeError, ValueError, KeyError):
        return None  # Treat corruption as cache miss

4. Service Layer Wrapper (SRP)

Keep the processing function pure. Add caching as a separate service layer.

def extract_with_cache(
    file_path: Path,
    *,
    cache_enabled: bool = True,
    cache_dir: Path = Path(".cache"),
) -> ExtractedDocument:
    """Service layer: cache check -> extraction -> cache write."""
    if not cache_enabled:
        return extract_text(file_path)  # Pure function, no cache knowledge

    file_hash = compute_file_hash(file_path)

    # Check cache
    cached = read_cache(cache_dir, file_hash)
    if cached is not None:
        logger.info("Cache hit: %s (hash=%s)", file_path.name, file_hash[:12])
        return cached.document

    # Cache miss -> extract -> store
    logger.info("Cache miss: %s (hash=%s)", file_path.name, file_hash[:12])
    doc = extract_text(file_path)
    entry = CacheEntry(file_hash=file_hash, source_path=str(file_path), document=doc)
    write_cache(cache_dir, entry)
    return doc

Key Design Decisions

Best Practices

Hash content, not paths — paths change, content identity doesn't
Chunk large files when hashing — avoid loading entire files into memory
Keep processing functions pure — they should know nothing about caching
Log cache hit/miss with truncated hashes for debugging
Handle corruption gracefully — treat invalid cache entries as misses, never crash

Anti-Patterns to Avoid

# BAD: Path-based caching (breaks on file move/rename)
cache = {"/path/to/file.pdf": result}

# BAD: Adding cache logic inside the processing function (SRP violation)
def extract_text(path, *, cache_enabled=False, cache_dir=None):
    if cache_enabled:  # Now this function has two responsibilities
        ...

# BAD: Using dataclasses.asdict() with nested frozen dataclasses
# (can cause issues with complex nested types)
data = dataclasses.asdict(entry)  # Use manual serialization instead

When to Use

File processing pipelines (PDF parsing, OCR, text extraction, image analysis)
CLI tools that benefit from --cache/--no-cache options
Batch processing where the same files appear across runs
Adding caching to existing pure functions without modifying them

When NOT to Use

Data that must always be fresh (real-time feeds)
Cache entries that would be extremely large (consider streaming instead)
Results that depend on parameters beyond file content (e.g., different extraction configs)

Related Skills

SiniyaYousuf/x-api

development

VerifiedTrustedCommunity

X/Twitter API integration for posting tweets, threads, reading timelines, search, and analytics. Covers OAuth auth patterns, rate limits, and platform-native content posting. Use when the user wants to interact with X programmatically.

9SKILL.mdUpdated Apr 21, 2026

SiniyaYousuf/visa-doc-translate

documentation

VerifiedTrustedCommunity

Translate visa application documents (images) to English and create a bilingual PDF with original and translation

9SKILL.mdUpdated Apr 21, 2026

SiniyaYousuf/visa-doc-translate

SiniyaYousuf/videodb

tools

VerifiedTrustedCommunity

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.

9SKILL.mdUpdated Apr 21, 2026

SiniyaYousuf/video-editing

development

VerifiedTrustedCommunity

AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.

9SKILL.mdUpdated Apr 21, 2026

SiniyaYousuf/video-editing

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/SiniyaYousuf/everything_claudecode.git

# Copy into Claude Code skills folder (global)
cp -r everything_claudecode/skills/content-hash-cache-pattern ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

SiniyaYousuf/everything_claudecode

9 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT