skills/development/content-hash-cache-pattern/SKILL.md
Cache expensive file processing results using SHA-256 content hashes — path-independent, auto-invalidating, with service layer separation.
Cache expensive file processing results (PDF parsing, text extraction, image analysis) using SHA-256 content hashes as cache keys. Unlike path-based caching, this survives file moves/renames and auto-invalidates when content changes.
Hash file content, not the file path:
import hashlib
from pathlib import Path

_HASH_CHUNK_SIZE = 65536  # 64 KB chunks

def compute_file_hash(path: Path) -> str:
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(_HASH_CHUNK_SIZE):
            sha256.update(chunk)
    return sha256.hexdigest()
Rule: Always chunk large files — never load the entire file into memory just to hash it.
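To make the path-independence claim concrete, here is a minimal sketch that hashes the same bytes under two file names and then again after an edit. The file names and the `demo_path_independence` helper are hypothetical and exist only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

_HASH_CHUNK_SIZE = 65536  # 64 KB chunks, matching the function above


def compute_file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(_HASH_CHUNK_SIZE):
            sha256.update(chunk)
    return sha256.hexdigest()


def demo_path_independence() -> tuple[str, str, str]:
    """Hash a file, rename it, hash again, then edit it and hash once more."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.pdf"  # hypothetical file name
        original.write_bytes(b"same bytes" * 10_000)
        hash_before = compute_file_hash(original)

        # Rename only: the bytes are unchanged, so the hash should match.
        renamed = original.rename(Path(tmp) / "moved.pdf")
        hash_after_rename = compute_file_hash(renamed)

        # Change the content: the hash (and thus the cache key) should differ.
        renamed.write_bytes(b"different bytes")
        hash_after_edit = compute_file_hash(renamed)
    return hash_before, hash_after_rename, hash_after_edit
```

The first two digests match because only the path changed. The third differs because the bytes changed.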
Name each entry {hash}.json for O(1) lookup with no index file required:
import json
from pathlib import Path

def write_cache(cache_dir: Path, file_hash: str, data: dict) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{file_hash}.json").write_text(
        json.dumps(data, ensure_ascii=False), encoding="utf-8"
    )

# Note: the "dict | None" annotation requires Python 3.10+
def read_cache(cache_dir: Path, file_hash: str) -> dict | None:
    cache_file = cache_dir / f"{file_hash}.json"
    if not cache_file.is_file():
        return None
    try:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, ValueError, KeyError):
        return None  # Corruption = cache miss, never crash
Rule: Treat any deserialization error as a cache miss — graceful degradation is mandatory.
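One common source of corrupt entries is a process dying partway through a write. The sketch below is a variation on the functions above, not part of the original pattern: `write_cache_atomic` (a hypothetical name) writes to a temporary file in the cache directory and then swaps it into place with `os.replace`, and the reader treats missing, unreadable, and malformed entries alike as a miss.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_cache_atomic(cache_dir: Path, file_hash: str, data: dict) -> None:
    """Write {hash}.json via a temp file so readers never see a partial entry."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{file_hash}.json"
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        os.replace(tmp_name, target)
    except BaseException:
        # Leave no stray temp file behind if serialization or the rename fails.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def read_cache(cache_dir: Path, file_hash: str) -> Optional[dict]:
    """Return the cached entry, or None for missing, unreadable, or corrupt files."""
    cache_file = cache_dir / f"{file_hash}.json"
    try:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers JSON and Unicode decode errors
        return None


def demo_corruption_is_a_miss():
    """Read a good entry, a deliberately corrupted one, and a missing one."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp) / "cache"
        write_cache_atomic(cache_dir, "abc123", {"pages": 3})  # "abc123" is a stand-in key
        good = read_cache(cache_dir, "abc123")

        # Simulate on-disk corruption by overwriting the entry with invalid JSON.
        (cache_dir / "abc123.json").write_text("{not valid json", encoding="utf-8")
        corrupt = read_cache(cache_dir, "abc123")

        missing = read_cache(cache_dir, "does-not-exist")
    return good, corrupt, missing
```

In this sketch the corrupt and missing cases both come back as `None`, so the caller falls through to recomputing the result.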
Keep the processing function pure. Add caching as a separate wrapper — never mix the two responsibilities:
def process_with_cache(
    file_path: Path,
    *,
    cache_enabled: bool = True,
    cache_dir: Path = Path(".cache"),
):
    if not cache_enabled:
        return process_file(file_path)  # pure function, no cache knowledge
    file_hash = compute_file_hash(file_path)
    cached = read_cache(cache_dir, file_hash)
    if cached is not None:
        return cached
    result = process_file(file_path)
    write_cache(cache_dir, file_hash, result)
    return result
Rule: The core processing function must have zero knowledge of caching. Mixing the two violates the single-responsibility principle (SRP) and makes both harder to test and change.
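The following self-contained sketch exercises the wrapper end to end. The word-counting `process_file` is a hypothetical stand-in for real processing, and the helpers are compact versions of the ones above. To make a cache hit observable without adding state to the processor, the demo tampers with the stored entry between calls.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def process_file(path: Path) -> dict:
    """Hypothetical pure processor: knows nothing about caching."""
    return {"word_count": len(path.read_text(encoding="utf-8").split())}


def compute_file_hash(path: Path) -> str:
    # Whole-file read keeps the demo short; chunk the read for large files.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def read_cache(cache_dir: Path, file_hash: str) -> Optional[dict]:
    try:
        text = (cache_dir / f"{file_hash}.json").read_text(encoding="utf-8")
        return json.loads(text)
    except (OSError, ValueError):
        return None  # missing or corrupt entry counts as a miss


def write_cache(cache_dir: Path, file_hash: str, data: dict) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{file_hash}.json").write_text(json.dumps(data), encoding="utf-8")


def process_with_cache(
    file_path: Path,
    *,
    cache_enabled: bool = True,
    cache_dir: Path = Path(".cache"),
) -> dict:
    if not cache_enabled:
        return process_file(file_path)
    file_hash = compute_file_hash(file_path)
    cached = read_cache(cache_dir, file_hash)
    if cached is not None:
        return cached
    result = process_file(file_path)
    write_cache(cache_dir, file_hash, result)
    return result


def demo_wrapper():
    """Run a miss, a hit, and a bypass against the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "notes.txt"
        doc.write_text("one two three", encoding="utf-8")
        cache_dir = Path(tmp) / ".cache"

        first = process_with_cache(doc, cache_dir=cache_dir)  # miss: computes and stores

        # Overwrite the stored entry so a hit is distinguishable from a recompute.
        entry = cache_dir / f"{compute_file_hash(doc)}.json"
        entry.write_text(json.dumps({"word_count": -1}), encoding="utf-8")

        second = process_with_cache(doc, cache_dir=cache_dir)  # hit: returns stored entry
        bypass = process_with_cache(doc, cache_enabled=False, cache_dir=cache_dir)
    return first, second, bypass
```

The second call returns the tampered value, which shows it came from disk. The bypass call recomputes the real count without touching the cache.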
Content hashes self-invalidate: when file bytes change, the hash changes, producing a new cache key. No explicit invalidation logic is needed.
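A short sketch of self-invalidation in practice, using hypothetical names (`cache_key`, `demo_self_invalidation`) and a temporary directory: editing the file yields a new key, so the lookup misses without any explicit invalidation step.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(path: Path) -> str:
    """SHA-256 of the file bytes (whole-file read is fine for this tiny demo)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo_self_invalidation() -> tuple[bool, int]:
    """Return (hit after edit?, number of cache entries on disk at the end)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        cache_dir = Path(tmp) / "cache"
        cache_dir.mkdir()

        # Cache a result for the first version of the file.
        src.write_text("version 1", encoding="utf-8")
        key_v1 = cache_key(src)
        (cache_dir / f"{key_v1}.json").write_text(json.dumps({"v": 1}), encoding="utf-8")

        # Edit the file: the key changes, so the lookup misses.
        src.write_text("version 2", encoding="utf-8")
        key_v2 = cache_key(src)
        hit_after_edit = (cache_dir / f"{key_v2}.json").is_file()

        # Store the new result; the old entry is still present alongside it.
        (cache_dir / f"{key_v2}.json").write_text(json.dumps({"v": 2}), encoding="utf-8")
        entry_count = len(list(cache_dir.glob("*.json")))
    return hit_after_edit, entry_count
```

The entry for the old content stays on disk after the edit. Nothing in the pattern deletes it, so reclaiming disk space is a separate concern from correctness.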
Use cases where this pattern fits:
--cache / --no-cache flags
Use cases where it does NOT fit:
dataclasses.asdict() on nested frozen dataclasses: use manual serialization
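For the `--cache` / `--no-cache` flags mentioned above, one possible wiring uses `argparse.BooleanOptionalAction` (Python 3.9+), which generates both flags from a single argument. The `file` and `--cache-dir` arguments and the `build_parser` name are assumptions for this sketch, not part of the original skill.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser exposing --cache / --no-cache (caching on by default)."""
    parser = argparse.ArgumentParser(
        description="Process a file, optionally using the content-hash cache."
    )
    parser.add_argument("file", type=Path, help="file to process")
    parser.add_argument(
        "--cache",
        action=argparse.BooleanOptionalAction,  # also registers --no-cache
        default=True,
        help="read and write cached results",
    )
    parser.add_argument(
        "--cache-dir",
        type=Path,
        default=Path(".cache"),
        help="directory holding {hash}.json entries",
    )
    return parser
```

The parsed values map directly onto the wrapper's keyword arguments, for example `process_with_cache(args.file, cache_enabled=args.cache, cache_dir=args.cache_dir)`.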