skills/windagszip/SKILL.md
This skill should be used when a SKILL.md file needs compression, deduplication, or token reduction. It provides an embedding-based compression pipeline that detects and removes redundant chunks within SKILL.md files using local embeddings (all-MiniLM-L6-v2). Two-pass approach: (1) free intra-skill deduplication via cosine similarity clustering, (2) optional LLM-judged graded eval to detect pretraining overlap. Typical result: 25-46% token reduction with zero quality loss. This skill is not intended for editing skill content, creating new skills, routing optimization, or cross-skill deduplication.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

```bash
npx skillsauth add curiositech/windags-skills windagszip
```
Analyze SKILL.md files for redundancy and produce compressed variants that preserve behavioral quality while cutting token count.
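Two types of redundancy exist in skills. The first is intra-skill duplication, where chunks of the same SKILL.md repeat one another; embeddings detect it for free. The second is pretraining overlap, where a chunk restates what Claude already knows; only LLM-judged graded eval detects it.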
Always run embeddings first (free), then graded eval on survivors (targeted).
```mermaid
flowchart LR
    A[Chunk<br/>12 types] --> B[Embed<br/>384-dim]
    B --> C[Similarity<br/>Matrix]
    C --> D[Cluster<br/>BFS components]
    D --> E[Variants<br/>per-cluster + max]
    E --> F{Graded Eval?}
    F -->|Free pass| G[Ship compressed]
    F -->|Quality check| H[LLM Judge<br/>sonnet + haiku]
    H --> G
```
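The Similarity Matrix step in the diagram is pairwise cosine similarity between chunk embeddings. The sketch below is minimal and makes a few assumptions:

- Each chunk is already embedded as a list of floats.
- The 3-dim toy vectors stand in for the 384-dim all-MiniLM-L6-v2 output.
- `cosine` and `similarity_matrix` are hypothetical helpers, not the actual API of embed_ablate.py.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity for every pair of chunk embeddings."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical 3-dim toy embeddings standing in for 384-dim MiniLM vectors.
vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matrix = similarity_matrix(vecs)
for row in matrix:
    print([round(s, 3) for s in row])
```

In practice the vectors would come from a model such as all-MiniLM-L6-v2, for example via sentence-transformers' `SentenceTransformer.encode`. numpy would then do the matrix math instead of Python loops.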
All scripts live in tools/skill-compression/ relative to the WinDAGs repo root.
```bash
python tools/skill-compression/embed_ablate.py <skill-name>
```
Output shows redundancy clusters — groups of chunks saying the same thing:
```text
Cluster 1 (avg sim: 0.847, redundant tokens: 2,127)
  KEEP [reference ] 1821tk Full CSS layering reference...
  CUT  [code_block ] 312tk .aurora-container { position: absol...
  CUT  [code_block ] 287tk .atmosphere-layer { position: absol...
```
KEEP marks the canonical version (the most complete chunk in the cluster). CUT marks chunks that duplicate the canonical one.
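One way to choose KEEP and CUT inside a cluster is sketched below, under two assumptions. First, "most complete" is approximated by the largest token count; the script may use a different criterion. Second, a cluster's redundant tokens are the sum of its CUT chunks. The `Chunk` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk record: identifier, chunk type, token count.
    chunk_id: str
    kind: str
    tokens: int


def keep_and_cut(cluster: list[Chunk]) -> tuple[Chunk, list[Chunk], int]:
    """Return (keep, cuts, redundant_tokens) for one redundancy cluster.

    Assumes the canonical chunk is the one with the most tokens.
    """
    keep = max(cluster, key=lambda c: c.tokens)
    cuts = [c for c in cluster if c is not keep]
    redundant = sum(c.tokens for c in cuts)
    return keep, cuts, redundant


cluster = [
    Chunk("ref-1", "reference", 1821),
    Chunk("code-7", "code_block", 312),
    Chunk("code-9", "code_block", 287),
]
keep, cuts, redundant = keep_and_cut(cluster)
print(keep.chunk_id, [c.chunk_id for c in cuts], redundant)
```

The toy total here covers only the three rows listed in the sample output, so it does not reproduce the cluster's reported redundant-token figure.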
```bash
python tools/skill-compression/embed_ablate.py <skill-name> --generate
```
Produces per-cluster variants (remove one cluster's redundancy) and a max-compression variant (remove all redundancy). Output in ablations/<skill-name>/.
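Per-cluster variants and the max-compression variant relate as sketched below. This assumes each cluster has been reduced to a set of CUT chunk IDs. The variant ID format (`cluster-<id>`, `max`) and the function name are hypothetical.

```python
def build_variants(chunks: list[str], cuts_by_cluster: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return variant_id -> surviving chunk IDs in document order.

    Each per-cluster variant removes only that cluster's CUT chunks;
    the "max" variant removes every CUT chunk from every cluster.
    """
    variants: dict[str, list[str]] = {}
    for cluster_id, cuts in cuts_by_cluster.items():
        variants[f"cluster-{cluster_id}"] = [c for c in chunks if c not in cuts]
    all_cuts = set().union(*cuts_by_cluster.values())
    variants["max"] = [c for c in chunks if c not in all_cuts]
    return variants


variants = build_variants(["a", "b", "c", "d", "e"], {"1": {"b"}, "2": {"d", "e"}})
for variant_id, survivors in variants.items():
    print(variant_id, survivors)
```

Keeping the per-cluster variants separate lets graded eval attribute a quality change to a single cluster before committing to the max variant.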
If a test suite exists for the skill:
```bash
# Verify pipeline with 1 test case
python tools/skill-compression/eval_judge.py <skill-name> --test

# Run baseline + top 5 variants by token removal
python tools/skill-compression/eval_judge.py <skill-name> --top-n 5

# Full evaluation (all variants)
python tools/skill-compression/eval_judge.py <skill-name> --all

# Analyze saved results (no API calls)
python tools/skill-compression/eval_judge.py <skill-name> --analyze
```
Two-phase LLM judge: a sonnet executor generates a response using the ablated skill, then a haiku grader scores that response against the expected behavior. A score drop below 0.05 means the variant is safe to compress.
Skills fall into two categories. The regime determines the compression strategy:
```mermaid
flowchart TD
    A[Run embed_ablate.py] --> B{Redundancy ratio?}
    B -->|>20%| C[Code-Heavy]
    B -->|5-20%| D[Mixed]
    B -->|<5%| E[Knowledge-Dense]
    C --> F[Embeddings alone: 20-30% savings, zero API cost]
    D --> G[Embeddings first, then graded eval on survivors]
    E --> H[Skip to graded eval: savings come from pretraining overlap]
```
…`position: absolute; inset: 0` get subsumed by one reference.

| Redundancy ratio | Meaning | Action |
|------------------|---------|--------|
| >20% | Heavy self-duplication | Compress with embeddings only |
| 5-20% | Moderate duplication | Embeddings first, then graded eval |
| <5% | Minimal self-duplication | Skip to graded eval |
| 0% | No intra-skill redundancy | All savings come from pretraining overlap |
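The regime lookup in the redundancy-ratio table can be sketched as a small function. Two assumptions apply. The ratio is taken as redundant tokens divided by total skill tokens; the denominator is not spelled out here. Values of exactly 5% and 20% are placed in the Mixed band. The function name is hypothetical.

```python
def compression_regime(redundant_tokens: int, total_tokens: int) -> tuple[str, str]:
    """Map an intra-skill redundancy ratio to (regime, action).

    Assumption: ratio = redundant_tokens / total_tokens. Exactly 5% and exactly
    20% are both placed in the Mixed band (also an assumption).
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    ratio = redundant_tokens / total_tokens
    if ratio > 0.20:
        return "code-heavy", "compress with embeddings only"
    if ratio >= 0.05:
        return "mixed", "embeddings first, then graded eval"
    if ratio > 0.0:
        return "knowledge-dense", "skip to graded eval"
    return "knowledge-dense", "skip to graded eval; all savings come from pretraining overlap"


print(compression_regime(2_127, 8_000))  # hypothetical token counts
```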
| Score drop | Interpretation | Decision |
|------------|----------------|----------|
| < 0.05 | Within measurement noise | Safe to compress |
| 0.05-0.10 | Borderline | Run more test cases |
| > 0.10 | Real quality loss | Keep the chunk |
| Negative (improvement) | Chunk was actively hurting quality | Definitely remove |
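The score-drop table translates into a decision rule like the one below. This sketch assumes scores on a 0-1 scale and defines drop as baseline score minus variant score, so a negative drop means the variant scored higher. The function name and the handling of the exact 0.10 boundary are assumptions.

```python
def compression_decision(baseline: float, variant: float) -> str:
    """Apply the score-drop thresholds to one ablated variant.

    drop = baseline - variant; a negative drop means removal improved quality.
    """
    drop = baseline - variant
    if drop < 0:
        return "remove (chunk was hurting quality)"
    if drop < 0.05:
        return "safe to compress"
    if drop <= 0.10:
        return "borderline: run more test cases"
    return "keep the chunk"


for base, var in [(0.82, 0.80), (0.82, 0.75), (0.82, 0.60), (0.80, 0.85)]:
    print(base, var, compression_decision(base, var))
```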
Removing detailed content about well-known topics can IMPROVE quality. Claude's broader knowledge outperforms a skill's narrow checklist items. When a skill micromanages topics Claude has already mastered, the extra tokens compete with Claude's training data and produce worse responses. Less is more for pretraining-overlapping content. Detection: run graded eval. If the score improves when a chunk is removed (a negative score drop), that chunk was actively hurting performance.
Human intuition about which chunks are redundant is unreliable. Chunks that look unique may embed nearly identically (same semantic signal), while chunks that look similar may encode distinct signals. Always run embeddings first to get the objective similarity matrix, then validate with graded eval. Detection: Compare human-proposed cuts against embedding clusters. Mismatches reveal hidden redundancy or false duplicates.
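Comparing human-proposed cuts against the embedding clusters reduces to set arithmetic, as sketched below. The chunk IDs and the result keys are hypothetical.

```python
def compare_cuts(human_cuts: set[str], embedding_cuts: set[str]) -> dict[str, set[str]]:
    """Split chunk IDs into agreement and the two kinds of mismatch."""
    return {
        # Both a human and the embeddings marked these chunks as duplicates.
        "agreed": human_cuts & embedding_cuts,
        # Embeddings flag these, a human did not: possible hidden redundancy.
        "hidden_redundancy": embedding_cuts - human_cuts,
        # A human wanted these cut, embeddings see a distinct signal: possible false duplicates.
        "possible_false_duplicates": human_cuts - embedding_cuts,
    }


report = compare_cuts({"c3", "c7"}, {"c7", "c9"})
print(report)
```

Both mismatch sets are candidates for graded eval rather than automatic decisions.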
The chunker identifies 12 semantic types. Ablatable types (COMPOUND, PARAGRAPH, LIST_BLOCK, CODE_BLOCK, MERMAID, REFERENCE) are candidates for removal. Structural types (SECTION, SUBSECTION) and routing metadata (YAML_PAIRS_WITH) are never removed. A paragraph ending with : or containing "here's"/"for example" bonds to the next code block into a COMPOUND unit -- ablated together, never separately.
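The COMPOUND bonding rule can be sketched as a single pass over chunks in document order. This is a hypothetical illustration with several assumptions:

- Blocks arrive as `(kind, text)` pairs.
- Matching is case-insensitive.
- Only the straight apostrophe in "here's" is recognized.

The real chunker may differ on any of these.

```python
def bond_compounds(blocks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge a bonding paragraph with the code block that immediately follows it.

    A paragraph bonds if it ends with ":" or contains "here's" / "for example".
    The merged unit is emitted as ("compound", text) so it is ablated as one piece.
    """
    out: list[tuple[str, str]] = []
    i = 0
    while i < len(blocks):
        kind, text = blocks[i]
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        lowered = text.lower()
        bonds = kind == "paragraph" and (
            text.rstrip().endswith(":") or "here's" in lowered or "for example" in lowered
        )
        if bonds and nxt is not None and nxt[0] == "code_block":
            out.append(("compound", text + "\n" + nxt[1]))
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out


blocks = [
    ("paragraph", "Install it:"),
    ("code_block", "pip install x"),
    ("paragraph", "Plain prose."),
    ("code_block", "print(1)"),
    ("paragraph", "For example, see below"),
    ("list_block", "- a"),
]
print(bond_compounds(blocks))
```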
For the full taxonomy table, read ./references/technical-details.md.
Default cosine similarity threshold: 0.70.
```bash
# Aggressive (more duplicates caught, risk of false positives)
python tools/skill-compression/embed_ablate.py <skill-name> --threshold 0.60

# Conservative (only obvious duplicates)
python tools/skill-compression/embed_ablate.py <skill-name> --threshold 0.80
```
For production compression, use 0.70 and validate with graded eval.
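The threshold feeds the clustering step ("BFS components" in the pipeline diagram). Chunks become graph nodes, and an edge joins two chunks whose similarity meets the threshold. Each connected component with more than one chunk is a redundancy cluster. This is a sketch with three assumptions: the inclusive `>=` bound, the dropping of singletons, and the function name. Because components are transitive, A and C can share a cluster through B even when A and C alone fall below the threshold.

```python
from collections import deque


def bfs_clusters(sim: list[list[float]], threshold: float = 0.70) -> list[list[int]]:
    """Group chunk indices into connected components of the similarity graph.

    An edge joins i and j when sim[i][j] >= threshold (inclusive bound is an
    assumption). Singletons are dropped because they have no duplicates.
    """
    n = len(sim)
    seen = [False] * n
    clusters: list[list[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if not seen[j] and sim[i][j] >= threshold:
                    seen[j] = True
                    queue.append(j)
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters


# Hypothetical similarity matrix for four chunks.
sim = [
    [1.00, 0.85, 0.40, 0.10],
    [0.85, 1.00, 0.72, 0.05],
    [0.40, 0.72, 1.00, 0.20],
    [0.10, 0.05, 0.20, 1.00],
]
for t in (0.60, 0.70, 0.80):
    print(t, bfs_clusters(sim, t))
```

Raising the threshold from 0.70 to 0.80 breaks the 0-1-2 chain, which shows why a conservative setting catches fewer duplicates.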
For worked examples, embedding model internals, similarity computation details, clustering algorithm, dependency installation, script inventory, and rate-distortion theory background, read ./references/technical-details.md.
Key dependencies (for quick setup):
```bash
pip install sentence-transformers numpy
```
Primary entry points in tools/skill-compression/: embed_ablate.py (redundancy analysis) and eval_judge.py (LLM-judged quality evaluation).
This skill produces:
- ablations/<skill-name>/: compressed variants (per-cluster and max-compression)
- ablations/<skill-name>/variants.jsonl: metadata per variant (chunks removed, token delta, variant ID)
- eval-results/<skill-name>/: saved graded-eval results
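variants.jsonl holds one JSON object per line, so selecting the variants that remove the most tokens (as `--top-n` does) is a short parse and sort. This sketch uses a hypothetical schema: the field names `variant_id`, `chunks_removed` and `token_delta`, and the sign convention of `token_delta` being negative when tokens are removed, are all assumptions.

```python
import json


def top_variants(jsonl_text: str, n: int = 5) -> list[str]:
    """Return IDs of the n variants that remove the most tokens.

    Hypothetical schema: {"variant_id": str, "chunks_removed": [str], "token_delta": int},
    with token_delta negative when tokens are removed.
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    records.sort(key=lambda r: r["token_delta"])  # most negative = biggest cut first
    return [r["variant_id"] for r in records[:n]]


sample = "\n".join([
    '{"variant_id": "cluster-1", "chunks_removed": ["c7", "c9"], "token_delta": -599}',
    '{"variant_id": "cluster-2", "chunks_removed": ["c12"], "token_delta": -301}',
    '{"variant_id": "max", "chunks_removed": ["c7", "c9", "c12"], "token_delta": -900}',
])
print(top_variants(sample, n=2))
```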