confide/skills/annotate/SKILL.md
--- name: annotate description: Build and verify a PII gold set with HUMAN annotators (first-class). Launch the browser annotator, label spans per the codebook, export per-annotator label files, then compute inter-annotator agreement (Cohen's/Fleiss' kappa) and draft an adjudicated gold. Use when the user says "annotate PII", "label this transcript", "build a gold set", "inter-annotator agreement", "review annotations", "adjudicate labels", or wants to measure/defend a de-identification gold sta
Humans label PII spans in a transcript; you measure how much they agree (κ) and draft an adjudicated gold from their labels. Annotators are first-class here — most of this skill is plain instructions FOR a person doing the labelling, plus a coordinator path to score it.
… confide:anon) and annotate the GREEN copy.

- assets/annotator.html: zero-install browser annotation tool (EN/RU, runs offline).
- references/codebook.md: the labelling rulebook (10 PII types, direct/quasi, harm).
- references/tool-guide.md: how to drive the tool + scorer step by step.
- scripts/score_iaa.py: Cohen's/Fleiss' κ, span-F1, disagreement queue, draft gold (stdlib).
- scripts/gold_to_labels.py: turn an existing gold into a "reference annotator" to test solo.

- … assets/annotator.html (or open it in Chrome/Firefox/Safari). It runs entirely in your browser: nothing is uploaded, and labels stay on your machine until you Export.
- … references/codebook.md first. It defines the 10 types (PERSON, LOCATION, ORG, PHONE, EMAIL, ID, DATE, MEDICATION, AGE, PROFESSION), what counts as a span (the minimal identifying text), and direct vs. quasi-identifier.
- … A, B, or your name). Use only synthetic or consented text.
- … QUESTION: on the span (e.g. QUESTION: gym or city?). These flow straight into the adjudication queue.
- … labels.<doc>.<annotator>.json (schema: {doc_id, annotator, text, spans:[{start,end,text,type,...}]}). Keep it local and hand only this file to the coordinator. Two or more people should label the same doc independently (blind) for a meaningful κ.

- … labels.<doc>.<annotator>.json into one folder, e.g. labels/.
- python3 skills/annotate/scripts/score_iaa.py --labels-dir labels/ --out-dir results/
  It writes (per doc + overall): Cohen's κ (pairwise), Fleiss' κ (3+ annotators), span-F1, a disagreement queue (*-iaa-disagreements.json: every cluster annotators don't fully agree on, plus any QUESTION: spans), and a draft adjudicated gold (*-adjudicated-gold-draft.json: majority span per overlap-cluster, ties/questions marked needs_review:true). Character-level κ sidesteps tokenization disputes.
- … needs_review cluster. The resulting label set is the published gold; report post-adjudication κ too. Nothing is ever auto-finalised.

Treat an existing gold JSONL as one "reference annotator", label the same doc yourself in annotator.html as another, then score the pair:
python3 skills/annotate/scripts/gold_to_labels.py --gold GOLD.jsonl --name gold --out-dir labels/
# label the same doc yourself in annotator.html as "me" -> drop labels.<doc>.me.json into labels/
python3 skills/annotate/scripts/score_iaa.py --labels-dir labels/ --out-dir results/
(--sessions-dir DIR lets gold_to_labels.py read transcript text from disk so char offsets
match the gold exactly.)
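Because the pair is scored on character offsets, it can help to check a label file before scoring it. The sketch below is not part of the skill's scripts. It assumes the label schema named above (doc_id, annotator, text, spans with start/end/text/type) and, as a further assumption, that end is exclusive, so a span's text should equal text[start:end]. The function name find_offset_mismatches and the sample document are hypothetical.

```python
def find_offset_mismatches(label_doc):
    """Return the spans whose recorded text differs from text[start:end].

    Assumes `end` is an exclusive offset; adjust the slice if the
    annotator tool uses inclusive ends.
    """
    text = label_doc["text"]
    mismatches = []
    for span in label_doc["spans"]:
        if text[span["start"]:span["end"]] != span["text"]:
            mismatches.append(span)
    return mismatches


# Hypothetical synthetic label file content, shaped like labels.<doc>.<annotator>.json
example = {
    "doc_id": "demo",
    "annotator": "me",
    "text": "Call Anna at 555-0100.",
    "spans": [
        {"start": 5, "end": 9, "text": "Anna", "type": "PERSON"},
        {"start": 13, "end": 21, "text": "555-0100", "type": "PHONE"},
    ],
}

# A copy whose first span is shifted by one character, as happens when
# offsets were computed against a slightly different transcript text.
shifted = dict(example, spans=[
    {"start": 6, "end": 10, "text": "Anna", "type": "PERSON"},
    example["spans"][1],
])

print(len(find_offset_mismatches(example)))
print(len(find_offset_mismatches(shifted)))
```

An empty result means every span's text matches the transcript at its offsets. Any mismatch suggests the two label files were built from different text, which would distort a character-level comparison.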
IAA results (κ, F1) + a disagreement list + a draft adjudicated gold — labels/stats only. Transcript text and original PII stay local; only what's needed to adjudicate is shared.
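For intuition about the character-level Cohen's κ in those results, here is a minimal sketch of the textbook formula, κ = (p_o - p_e) / (1 - p_e), applied to per-character labels. It is an illustration under stated assumptions, not the code in score_iaa.py: the helper names char_labels and cohen_kappa are hypothetical, unlabelled characters are given the placeholder class "O", and span ends are assumed exclusive.

```python
from collections import Counter


def char_labels(text_len, spans):
    """Expand spans into one label per character ("O" = not PII)."""
    labels = ["O"] * text_len
    for span in spans:
        for i in range(span["start"], span["end"]):  # end assumed exclusive
            labels[i] = span["type"]
    return labels


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: share of characters with the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label
    return (p_o - p_e) / (1 - p_e)


# Two hypothetical annotators on a 10-character text:
# A marks characters 0-3 as PERSON, B marks only characters 0-1.
labels_a = char_labels(10, [{"start": 0, "end": 4, "type": "PERSON"}])
labels_b = char_labels(10, [{"start": 0, "end": 2, "type": "PERSON"}])

print(round(cohen_kappa(labels_a, labels_b), 4))  # 0.5455
```

Here the annotators agree on 8 of 10 characters (p_o = 0.8), and chance agreement is (4·2 + 6·8) / 100 = 0.56, so κ = 0.24 / 0.44 ≈ 0.5455. Scoring per character means a boundary disagreement lowers κ in proportion to the characters involved, without any need to agree on a tokenizer first.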