etanhey/wispr-mining

Name: wispr-mining
Author: etanhey

skills/golem-powers/wispr-mining/SKILL.md

npx skillsauth add etanhey/golems wispr-mining

Wispr Flow Mining

Extract ASR misrecognition patterns from Wispr Flow's SQLite database and generate clean, importable dictionary files.

Database Location

~/Library/Application Support/Wispr Flow/flow.sqlite

CRITICAL: Always work on a COPY. Before any operation:

cp ~/Library/Application\ Support/Wispr\ Flow/flow.sqlite /tmp/wispr-flow-readonly.sqlite

Then query /tmp/wispr-flow-readonly.sqlite exclusively. Never write to the production database.
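The copy-then-query rule can also be enforced in code rather than by convention. The following is a minimal sketch, assuming Python 3.7+ with the standard `sqlite3` module; `open_snapshot` is a hypothetical helper name, not part of the skill. It copies the file, then opens the copy in SQLite's read-only mode so that an accidental write fails instead of succeeding silently.

```python
import shutil
import sqlite3
from pathlib import Path

# Paths taken from the text above; adjust if your install differs.
LIVE_DB = Path.home() / "Library/Application Support/Wispr Flow/flow.sqlite"
SNAPSHOT = Path("/tmp/wispr-flow-readonly.sqlite")


def open_snapshot(src: Path = LIVE_DB, dst: Path = SNAPSHOT) -> sqlite3.Connection:
    """Copy src to dst, then open dst read-only (hypothetical helper)."""
    shutil.copy2(src, dst)
    # as_uri() percent-encodes spaces; mode=ro makes SQLite reject writes.
    uri = Path(dst).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Every later query then goes through the returned connection. If the app keeps a `flow.sqlite-wal` file next to the database (an assumption, not confirmed here), a plain file copy may miss the most recent rows.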

Schema Reference

Dictionary Table

| Column | Type | Purpose |
|--------|------|---------|
| id | VARCHAR(36) | UUID primary key |
| phrase | VARCHAR(255) | The dictionary entry (vocabulary word or trigger phrase) |
| replacement | VARCHAR(255) | NULL = vocabulary entry. Non-NULL = replacement mapping |
| frequencyUsed | INTEGER | How often this entry has fired |
| isDeleted | TINYINT(1) | Soft delete flag |
| isSnippet | TINYINT(1) | Snippet (long-form expansion), not a correction |
| manualEntry | TINYINT(1) | User-added vs auto-detected |
| createdAt | DATETIME | When added |
| lastUsed | DATETIME | Last trigger time |

History Table

| Column | Type | Purpose |
|--------|------|---------|
| transcriptEntityId | VARCHAR(36) | UUID primary key |
| asrText | TEXT | Raw ASR output (what the mic heard) |
| formattedText | TEXT | After Wispr's formatter + dictionary |
| editedText | TEXT | User's manual correction (NULL if no edit) |
| app | VARCHAR(255) | Which app was active |
| timestamp | DATETIME | When dictated |
| numWords | INTEGER | Word count |
| isArchived | TINYINT(1) | Archive flag |
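For trying the queries in this document without touching real data, the two tables can be recreated in a scratch database. This is a sketch only: the column names and types come from the schema reference above, while the `PRIMARY KEY` clauses, the absence of any other constraints or indexes, and the helper name `make_scratch_db` are assumptions.

```python
import sqlite3

# Column names and types follow the Schema Reference tables.
# Constraints beyond PRIMARY KEY are unknown and therefore omitted (assumption).
SCHEMA = """
CREATE TABLE Dictionary (
  id            VARCHAR(36) PRIMARY KEY,
  phrase        VARCHAR(255),
  replacement   VARCHAR(255),
  frequencyUsed INTEGER,
  isDeleted     TINYINT(1),
  isSnippet     TINYINT(1),
  manualEntry   TINYINT(1),
  createdAt     DATETIME,
  lastUsed      DATETIME
);
CREATE TABLE History (
  transcriptEntityId VARCHAR(36) PRIMARY KEY,
  asrText       TEXT,
  formattedText TEXT,
  editedText    TEXT,
  app           VARCHAR(255),
  timestamp     DATETIME,
  numWords      INTEGER,
  isArchived    TINYINT(1)
);
"""


def make_scratch_db() -> sqlite3.Connection:
    """Return an in-memory database with empty Dictionary and History tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Rows inserted into this scratch database make it possible to check what a mining query returns before running it against the copied snapshot.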

Mining Workflow

Step 1: Copy Database

cp ~/Library/Application\ Support/Wispr\ Flow/flow.sqlite /tmp/wispr-flow-readonly.sqlite

Step 2: Current Dictionary Audit

-- Active dictionary entries (not deleted, not snippets)
SELECT phrase, replacement, frequencyUsed, lastUsed
FROM Dictionary
WHERE isDeleted = 0 AND isSnippet = 0
ORDER BY frequencyUsed DESC;

-- Unused entries (candidates for cleanup)
SELECT phrase, replacement, createdAt
FROM Dictionary
WHERE isDeleted = 0 AND isSnippet = 0 AND frequencyUsed = 0
ORDER BY createdAt;

-- Snippets (long-form expansions)
SELECT phrase, replacement FROM Dictionary
WHERE isDeleted = 0 AND isSnippet = 1;

Step 3: Find ASR Misrecognition Patterns

-- Transcripts that ASR consistently gets wrong (asrText differs from formattedText)
-- Group by the lowercased ASR text and its corrected form to find systematic patterns
SELECT
  LOWER(asrText) AS asr_pattern,
  formattedText AS corrected_to,
  COUNT(*) AS occurrences,
  GROUP_CONCAT(DISTINCT app) AS apps
FROM History
WHERE asrText IS NOT NULL
  AND formattedText IS NOT NULL
  AND asrText != formattedText
  AND isArchived = 0
  AND LENGTH(asrText) < 100
-- formattedText is grouped as well; left ungrouped, SQLite would return an arbitrary row's value
GROUP BY LOWER(asrText), formattedText
HAVING COUNT(*) >= 3
ORDER BY occurrences DESC
LIMIT 50;
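The Step 3 query compares whole transcripts, while dictionary entries are individual words or short phrases. One way to bridge that gap is a word-level diff of each `asrText`/`formattedText` pair. The sketch below uses Python's standard `difflib` for this; it is an illustrative technique, not something the skill prescribes, and the function names are hypothetical. Punctuation added by the formatter will show up as a substitution too, so real rows may need normalizing first.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_substitutions(asr_text, formatted_text):
    """Return (heard, corrected) pairs for word spans that were replaced."""
    heard = asr_text.split()
    fixed = formatted_text.split()
    # Compare case-insensitively so capitalization alone is not a substitution.
    matcher = SequenceMatcher(
        a=[w.lower() for w in heard],
        b=[w.lower() for w in fixed],
        autojunk=False,
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(heard[i1:i2]), " ".join(fixed[j1:j2])))
    return pairs


def count_substitutions(rows):
    """Tally substitutions over (asrText, formattedText) rows."""
    counts = Counter()
    for asr_text, formatted_text in rows:
        counts.update(word_substitutions(asr_text, formatted_text))
    return counts
```

Calling `count_substitutions(rows).most_common(20)` on rows fetched from `History` gives a ranked list of candidate trigger and replacement pairs to review by hand.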

Step 4: Find User Edit Patterns

-- Cases where user manually corrected the formatted text
-- These are the HIGHEST-SIGNAL gaps
SELECT
  formattedText,
  editedText,
  app,
  timestamp
FROM History
WHERE editedText IS NOT NULL
  AND editedText != formattedText
  AND isArchived = 0
  AND LENGTH(editedText) < 200
ORDER BY timestamp DESC
LIMIT 50;

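IMPORTANT: Classify edits into two buckets:

- Whitespace-only edits (trailing spaces, leading newlines, paragraph breaks): LOW signal, ignore for dictionary purposes
- Real content corrections (word changes, missing words, ASR errors): HIGH signal, these drive new entries

Use TRIM(formattedText, char(9, 10, 13, 32)) != TRIM(editedText, char(9, 10, 13, 32)) to filter out edits that differ only in leading or trailing whitespace. SQLite's one-argument TRIM strips spaces only, so tabs and line breaks have to be listed explicitly, and whitespace changes in the middle of the text (such as an added paragraph break) still need to be classified by hand. Report both counts but only act on content corrections.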
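The two-bucket classification can also be done after fetching the rows, which makes it easy to catch whitespace changes anywhere in the text, not only at the ends. A minimal sketch, assuming rows of `(formattedText, editedText)` pairs from the Step 4 query; the function names are hypothetical.

```python
def classify_edit(formatted_text, edited_text):
    """Return 'none', 'whitespace', or 'content' for one History row."""
    if formatted_text == edited_text:
        return "none"
    # str.split() with no argument splits on any run of whitespace, so
    # re-joining erases differences in spaces, tabs, and line breaks.
    if " ".join(formatted_text.split()) == " ".join(edited_text.split()):
        return "whitespace"
    return "content"


def summarize_edits(rows):
    """Count both buckets and return the content corrections for review."""
    counts = {"whitespace": 0, "content": 0}
    content_rows = []
    for formatted_text, edited_text in rows:
        kind = classify_edit(formatted_text, edited_text)
        if kind == "none":
            continue
        counts[kind] += 1
        if kind == "content":
            content_rows.append((formatted_text, edited_text))
    return counts, content_rows
```

Both counts go into the report; only `content_rows` feed the dictionary recommendations.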

Step 5: Cross-Reference with Existing Dictionary

For each ASR pattern found in Step 3, check if it's already covered:

-- Check if a misrecognition is already in dictionary
SELECT phrase, replacement
FROM Dictionary
WHERE isDeleted = 0
  AND (phrase LIKE '%<pattern>%' OR replacement LIKE '%<pattern>%');

Only recommend NEW entries that aren't already covered.
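When the cross-reference runs once per pattern, a parameterized query avoids pasting patterns into the SQL text. The sketch below assumes a `sqlite3` connection to the copied database; `find_covering_entries` and `escape_like` are hypothetical helper names. It also escapes `%` and `_`, which `LIKE` would otherwise treat as wildcards (relevant for patterns such as file names containing underscores).

```python
import sqlite3


def escape_like(text):
    """Escape LIKE wildcards so they match literally (paired with ESCAPE '\\')."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_covering_entries(conn, pattern):
    """Return active Dictionary rows whose phrase or replacement contains pattern."""
    needle = "%" + escape_like(pattern) + "%"
    sql = (
        "SELECT phrase, replacement FROM Dictionary "
        "WHERE isDeleted = 0 "
        "AND (phrase LIKE ? ESCAPE '\\' OR replacement LIKE ? ESCAPE '\\')"
    )
    return conn.execute(sql, (needle, needle)).fetchall()
```

An empty result means the pattern is not yet covered and is a candidate for a new entry. Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, so `tomo` also matches `Tomo`.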

Step 6: Generate Output Files

OUTPUT FORMAT IS CRITICAL. Wispr Flow imports CSV files literally — every line becomes a dictionary entry.

Vocabulary File (new words to recognize)

File: wispr-vocabulary-update.csv

Format: One word per line. NO headers. NO comments. NO blank lines. NO quotes unless the word itself contains a comma.

toml
TOML
golems.toml
agentic

Replacements File (trigger → correction mappings)

File: wispr-replacements.csv

Format: trigger,replacement per line. NO headers. NO comments. NO blank lines. NO quotes unless values contain commas.

tomo,toml
golems.tomo,golems.toml
Claw.md,CLAUDE.md
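Both file formats can be produced with Python's standard `csv` module, which quotes a value only when it has to (for example, when it contains a comma). This is a sketch under that assumption; the function names are hypothetical, and whether Wispr Flow's importer accepts quoted values exactly as `csv` writes them has not been verified here.

```python
import csv


def write_vocabulary(path, words):
    """Write one word per line: no header, no comments, no blank lines."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        for word in words:
            if word.strip():  # skip empties so no blank or quoted-empty line is written
                writer.writerow([word.strip()])


def write_replacements(path, pairs):
    """Write trigger,replacement per line with minimal quoting."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        for trigger, replacement in pairs:
            if trigger.strip() and replacement.strip():
                writer.writerow([trigger.strip(), replacement.strip()])
```

For example, `write_replacements("wispr-replacements.csv", [("tomo", "toml")])` writes the single line `tomo,toml`. Analysis and statistics never pass through these functions; they belong in the separate report file.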

Output Validation Gate

MANDATORY before declaring done. Run these checks on every generated file:

# Check 0: The file exists (grep on a missing file would otherwise be reported as PASS)
[ -f OUTPUT_FILE ] || echo "FAIL: OUTPUT_FILE not found"

# Check 1: No comment lines (lines starting with #)
# grep -q prints nothing itself, so only the PASS/FAIL line appears
grep -q '^#' OUTPUT_FILE && echo "FAIL: Comments found" || echo "PASS: No comments"

# Check 2: No blank lines
grep -q '^$' OUTPUT_FILE && echo "FAIL: Blank lines found" || echo "PASS: No blank lines"

# Check 3: No header lines (a common CSV header word as the entire first field)
grep -qiE '^(phrase|word|trigger|replacement|vocabulary|category)(,|$)' OUTPUT_FILE && echo "FAIL: Header found" || echo "PASS: No headers"

# Check 4: Replacements file has exactly one comma per line
# (a quoted value that contains a comma is flagged too; review those lines by hand)
awk -F',' 'NF!=2 {print "FAIL line " NR ": " $0; exit 1}' REPLACEMENTS_FILE && echo "PASS: Format correct" || true

# Check 5: No markdown formatting
grep -qE '^\||\*\*|^##|^-' OUTPUT_FILE && echo "FAIL: Markdown found" || echo "PASS: No markdown"

If ANY check fails, fix the file and re-validate. Do NOT declare success with failing validation.
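The same gate can be expressed as one function that returns every problem it finds, which is convenient when the files are generated from a script. This is a sketch, not a replacement for the checks above: `validate_csv` is a hypothetical name, and the header test assumes a header word occupies the entire first field. Field counting uses the `csv` module, so a quoted value containing a comma counts as one field.

```python
import csv
import re
from pathlib import Path

HEADER_RE = re.compile(
    r"^(phrase|word|trigger|replacement|vocabulary|category)(,|$)", re.IGNORECASE
)
MARKDOWN_RE = re.compile(r"^\||\*\*|^##|^-")


def validate_csv(path, expected_fields):
    """Return a list of problems; an empty list means the file passed."""
    problems = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            message = "blank line"
        elif line.startswith("#"):
            message = "comment line"
        elif HEADER_RE.match(line):
            message = "header row"
        elif MARKDOWN_RE.search(line):
            message = "markdown formatting"
        elif len(next(csv.reader([line]))) != expected_fields:
            message = "expected %d field(s)" % expected_fields
        else:
            continue
        problems.append("line %d: %s" % (number, message))
    return problems
```

Use `expected_fields=1` for the vocabulary file and `expected_fields=2` for the replacements file, and treat any non-empty result as a failed gate.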

Anti-Patterns

| DO NOT | WHY | DO INSTEAD |
|--------|-----|------------|
| Add # comment lines to CSV | Wispr imports them as dictionary entries | Keep files pure data only |
| Add CSV headers (phrase,replacement) | Imported as a dictionary entry | Start with data directly |
| Put stats/analysis in the CSV | Gets imported as entries | Write analysis to a separate .md file |
| Add blank lines for readability | May create empty dictionary entries | No blank lines ever |
| Combine vocabulary + replacements | Different import workflows | Two separate files always |
| Query the production database | Risk of corruption | Copy to /tmp first |
| Add entries without checking existing | Creates duplicates | Always cross-reference Step 5 |

Output Deliverables

Every run MUST produce exactly 3 files:

- wispr-vocabulary-update.csv: New vocabulary words (one per line, no metadata)
- wispr-replacements.csv: New trigger→replacement mappings (trigger,replacement per line)
- wispr-mining-report.md: Analysis report with:
  - Total records analyzed
  - Top ASR misrecognition patterns (with counts)
  - User edit patterns found
  - Dictionary effectiveness stats (% of entries with >0 uses)
  - Recommended entries with justification
  - Gap analysis (high-frequency misrecognitions not in dictionary)
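The dictionary effectiveness figure for the report can be computed with a single aggregate query. The sketch below assumes the statistic covers active, non-snippet entries only (the same filter as the Step 2 audit); that scope and the function name are assumptions, since the report item does not spell them out.

```python
import sqlite3


def dictionary_effectiveness(conn):
    """Percent of active, non-snippet Dictionary entries with frequencyUsed > 0."""
    total, used = conn.execute(
        # In SQLite a comparison yields 0 or 1, so SUM counts the used entries.
        "SELECT COUNT(*), COALESCE(SUM(frequencyUsed > 0), 0) "
        "FROM Dictionary WHERE isDeleted = 0 AND isSnippet = 0"
    ).fetchone()
    return 100.0 * used / total if total else 0.0
```

A low percentage suggests cleanup candidates (entries that never fire), while the gap analysis item covers the opposite case: frequent misrecognitions with no entry at all.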

The .md report is where analysis goes. The CSVs are PURE DATA ONLY.

Periodic Mining Schedule

Run this skill quarterly or when:

- User reports frequent voice corrections in a new domain
- New project vocabulary emerges (new repo names, tool names, client names)
- Dictionary cleanup needed (removing unused entries)

Mine Wispr Flow SQLite database for ASR vocabulary gaps and correction patterns. Generates clean, importable CSV files (vocabulary + replacements). Use when: updating Wispr dictionary, finding ASR misrecognitions, auditing voice transcription quality, 'wispr mining', 'update wispr dictionary', 'voice vocabulary gaps'. NOT for: general voice processing (use voicelayer), speech-to-text implementation.

2 stars | testing | Updated Apr 17, 2026

npx skillsauth add etanhey/golems wispr-mining

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 17, 2026, 12:16 PM (6.4s, 3 files scanned)

Install manually

# Clone the repo
git clone https://github.com/etanhey/golems.git

# Copy into Claude Code skills folder (global)
cp -r golems/skills/golem-powers/wispr-mining ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository: etanhey/golems (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT