confide/skills/audit/SKILL.md
Run a corpus-scale, STATS-ONLY PII audit over a folder of session transcripts LOCALLY and produce an aggregate report — counts by type and by layer, the per-session redaction-rate distribution, document lengths, and a coarse residual proxy. Use when the user says "audit my sessions", "scan folder for PII", "how much PII across these transcripts", "PII stats for my corpus", "is my redaction holding at scale", or points at a directory of transcripts and asks how much personal data it contains. Fully local — raw text never leaves the machine; the report carries ZERO PII values, transcript substrings, or filenames (only anonymized own-NN ids and counts), so the aggregates are safe to surface. Run it on a RED (raw) corpus to size the PII, or on a GREEN (already-redacted) corpus to check residual leakage.
Measure how much PII lives across a whole folder of sessions, without ever exposing any
of it. The audit runs the layered LOCAL detector stack from shared/confide_core.py
(regex → Natasha → local LLM) over each file and emits only aggregates. This mirrors
the real_session_eval privacy contract: read text only in-process, emit counts.
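The "read text only in-process, emit counts" contract can be illustrated with a regex-only sketch. This is not the code in shared/confide_core.py; the two patterns and the function names `count_spans` and `aggregate` are assumptions made up for illustration. The point is only that matched values are counted and then discarded, so the returned report holds numbers and type labels, never text.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; the real detector stack
# (regex -> Natasha -> local LLM) is not reproduced here.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def count_spans(text: str) -> Counter:
    """Count matches per PII type; the matched strings are never stored."""
    counts = Counter()
    for pii_type, pattern in PATTERNS.items():
        counts[pii_type] += sum(1 for _ in pattern.finditer(text))
    return counts


def aggregate(texts: list) -> dict:
    """Fold per-document counts into a corpus-level, counts-only report."""
    total = Counter()
    for text in texts:
        total.update(count_spans(text))
    return {"n_files": len(texts), "spans_by_type": dict(total)}
```

Because only `Counter` values leave `count_spans`, nothing downstream can leak a transcript substring by accident.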
Each file is referred to only by an anonymized id: own-00, own-01, …
The original path/name is never written or printed. On an unreadable file, only the
index + exception class name is recorded.

The report contains:
- n_files, total / mean / min / max document chars
- spans_by_type (PERSON, EMAIL, PHONE, DATE, …) and spans_by_layer (regex / natasha / llm)
- overall_redaction_rate plus the per-session redaction-rate distribution (min / median / mean / max)

Point it at a folder (recurses, processes every .md/.txt; skips confide's own
*.green.md / *.stats.json outputs):
python3 skills/audit/scripts/audit.py FOLDER
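The folder walk described above (recurse, keep .md/.txt, skip confide's own outputs, refer to files only by anonymized id) might look roughly like the sketch below. The source does not show how audit.py implements this, so `discover` and `safe_length` are hypothetical names, and assigning ids in sorted path order is an assumption.

```python
from pathlib import Path

TEXT_SUFFIXES = {".md", ".txt"}
# confide's own outputs; *.stats.json is already excluded by suffix,
# but it is listed here to mirror the documented skip rule.
SKIP_ENDINGS = (".green.md", ".stats.json")


def discover(folder):
    """Return (anonymized id, path) pairs; only the id should ever be reported."""
    root = Path(folder)
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix in TEXT_SUFFIXES
        and not p.name.endswith(SKIP_ENDINGS)
    )
    return [(f"own-{i:02d}", p) for i, p in enumerate(files)]


def safe_length(path):
    """Return (n_chars, None), or (None, exception class name) if unreadable."""
    try:
        return len(path.read_text(encoding="utf-8")), None
    except Exception as exc:
        # Record only the class name: the message may contain the path.
        return None, type(exc).__name__
```

Recording `type(exc).__name__` rather than `str(exc)` matters because OS error messages usually embed the filename.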
Options:
- --list paths.txt: also/instead audit absolute paths listed one per line.
- --layers regex,natasha,llm: choose detection layers (default from config). Use --layers regex for a fully offline, deterministic pass (no models/network).
- --out report.md: report path; a report.json sibling is written alongside.
- --html: also write a Tufte-ish dashboard (report.html, counts only).

Writes the markdown + json report (and optional HTML) and prints the aggregate summary, all counts only.
On a GREEN corpus, the counts in spans_by_type tell you whether redaction is holding at scale.

Layer availability (Natasha, local LLM via Ollama) comes from config; run
confide:setup if they aren't installed. --layers regex always works offline.
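For reading the report's overall_redaction_rate and the per-session distribution (min / median / mean / max), a rough sketch of the arithmetic follows. The source does not define the rate, so this assumes it is redacted characters divided by total characters per session; `rate_distribution` and `overall_rate` are hypothetical names, not functions from audit.py.

```python
from statistics import mean, median


def rate_distribution(sessions):
    """sessions: list of (redacted_chars, total_chars) pairs, one per session.

    Assumed definition: rate = redacted_chars / total_chars.
    Empty sessions are skipped to avoid dividing by zero.
    """
    rates = [r / t for r, t in sessions if t > 0]
    if not rates:
        return None
    return {
        "min": min(rates),
        "median": median(rates),
        "mean": mean(rates),
        "max": max(rates),
    }


def overall_rate(sessions):
    """Corpus-level rate: all redacted chars over all chars (length-weighted)."""
    total = sum(t for _, t in sessions)
    return sum(r for r, _ in sessions) / total if total else 0.0
```

Under this definition the overall rate is weighted by document length, so it can differ from the mean of the per-session rates when session lengths vary widely.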