sharkitect-solutions/transcribe

Name: transcribe
Author: sharkitect-solutions

skills/transcribe/SKILL.md

npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit transcribe

Audio Transcribe

Transcribe audio using OpenAI's transcription API via the bundled CLI. Supports plain text, structured JSON, and speaker-diarized output with optional known-speaker identification.

File Index

| File | Purpose |
|------|---------|
| scripts/transcribe_diarize.py | Python CLI (277 lines) - transcription with diarization, known speakers, dry-run mode |
| references/api.md | API quick reference - input formats, size limits, response formats, known speaker notes |
| agents/openai.yaml | Agent interface definition - display name, icon, default prompt |
| assets/transcribe.png | Skill icon (large) |
| assets/transcribe-small.svg | Skill icon (small) |

Scope Boundary

| Task | This skill? | Use instead |
|------|-------------|-------------|
| Transcribe audio/video file to text | YES | - |
| Label speakers in recorded meeting | YES | - |
| Identify known speakers by voice sample | YES | - |
| Batch transcribe multiple audio files | YES | - |
| Real-time speech-to-text streaming | NO | voice-ai-development |
| Voice agent with conversation flow | NO | voice-agents |
| Text-to-speech synthesis | NO | voice-ai-development |
| Telephony IVR or call routing | NO | twilio-communications |
| Audio noise reduction or editing | NO | standard audio tools |

Model Selection Decision Matrix

First match wins. Stop at the first row where Signal is true.

| Signal | Model | Response Format | CLI flags |
|--------|-------|-----------------|-----------|
| Need speaker labels | gpt-4o-transcribe-diarize | diarized_json | --model gpt-4o-transcribe-diarize --response-format diarized_json |
| Known speakers to identify | gpt-4o-transcribe-diarize | diarized_json | above + --known-speaker "Name=path.wav" per speaker |
| Need timestamps in structured output | gpt-4o-mini-transcribe | json | --response-format json |
| Fast plain-text transcription (default) | gpt-4o-mini-transcribe | text | (no extra flags needed) |
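
The matrix above can be sketched as a small function that maps signals to CLI flags. This is only an illustration: `select_flags` and its parameter names are hypothetical and not part of the bundled script. Because the known-speaker row builds on the speaker-label row, both are handled in the same branch.

```python
from typing import List, Sequence

DIARIZE_MODEL = "gpt-4o-transcribe-diarize"


def select_flags(need_labels: bool = False,
                 known_speakers: Sequence[str] = (),
                 need_timestamps: bool = False) -> List[str]:
    """Return CLI flags for the first matching signal (hypothetical helper)."""
    if need_labels or known_speakers:
        flags = ["--model", DIARIZE_MODEL, "--response-format", "diarized_json"]
        # Each spec is a "Name=path.wav" string, one --known-speaker per speaker.
        for spec in known_speakers:
            flags += ["--known-speaker", spec]
        return flags
    if need_timestamps:
        return ["--response-format", "json"]
    # Default row: gpt-4o-mini-transcribe with text output needs no extra flags.
    return []
```

For example, `select_flags(need_timestamps=True)` yields `["--response-format", "json"]`, while a request with no signals yields an empty list and falls through to the default.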

Audio Pre-Assessment

| Condition | Action | Why |
|-----------|--------|-----|
| Clean recording, single speaker | Transcribe directly with mini-transcribe | Fastest, cheapest path |
| Multiple speakers, labels needed | Use diarize model with diarized_json | Only model that produces speaker segments |
| Multiple speakers, labels NOT needed | Use mini-transcribe with text | Faster, diarization unnecessary |
| Audio >30 seconds | Keep --chunking-strategy auto (default) | Mandatory for diarize model; recommended for all long audio |
| Audio file >25MB | Split file before sending | Hard API limit; requests over 25MB fail |
| Background noise or low quality | Add --language hint for expected language | Helps model compensate; accuracy still degrades |
| Non-standard format | Convert to mp3/wav/m4a first | Supported: mp3, mp4, mpeg, mpga, m4a, wav, webm |
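
The size and format rows can be checked before any API call. The sketch below is a hypothetical `preflight` helper, not code from the bundled script, and it assumes "25MB" means 25 * 1024 * 1024 bytes; the exact byte threshold used by the API is not stated here.

```python
import os
from typing import List

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB
SUPPORTED = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def preflight(path: str) -> List[str]:
    """Return a list of problems for one audio file; empty means OK to send."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        problems.append(
            f"unsupported format {ext or '(none)'}: convert to mp3/wav/m4a first"
        )
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 25MB: split before sending")
    return problems
```

Duration, speaker count, and recording quality are not covered; those still need a human or a separate audio tool.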

Transcription Strategy Procedure

Follow this sequence for every transcription request. Do not skip steps.

1. Assess audio characteristics: Identify speaker count, recording quality, duration, and file format. Check file size against the 25MB limit.
2. Determine output needs: Does the user need plain text, timestamps, or speaker labels? This determines the model and response format.
3. Select model using Decision Matrix: Match the first row where Signal is true. Do not default to the diarize model "just in case" -- it is slower and more expensive.
4. Validate with dry-run: Run --dry-run first on any non-trivial request (multiple files, known speakers, unfamiliar audio format). This catches configuration errors before consuming API credits.
5. Execute and verify: After transcription, spot-check speaker attributions if diarized. For any critical attribution, warn the user that labels are probabilistic.

Key mindset: The most common mistake is jumping to the diarize model for any multi-speaker audio. If the user does not need speaker labels, mini-transcribe is faster, cheaper, and often more accurate for pure text output.
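
The dry-run-then-execute part of the procedure can be sketched as a command planner. `build_command` and `plan_runs` are hypothetical names; only the script path and the `--dry-run`, `--known-speaker`, and `--out-dir` flags come from this document. The "unfamiliar audio format" trigger is a judgment call and is not modeled.

```python
import os
from typing import List, Sequence

SCRIPT = os.path.expanduser(
    "~/.claude/skills/transcribe/scripts/transcribe_diarize.py"
)


def build_command(files: Sequence[str], flags: Sequence[str] = (),
                  dry_run: bool = False) -> List[str]:
    """Assemble the argv list for one invocation of the bundled CLI."""
    cmd = ["python", SCRIPT, *files, *flags]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def plan_runs(files: Sequence[str], flags: Sequence[str] = ()) -> List[List[str]]:
    """Return commands in order: a dry run first when the request is non-trivial."""
    non_trivial = len(files) > 1 or "--known-speaker" in flags
    runs = []
    if non_trivial:
        runs.append(build_command(files, flags, dry_run=True))
    runs.append(build_command(files, flags))
    return runs
```

Each planned command could then be passed to `subprocess.run(cmd, check=True)`, stopping if the dry run fails.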

CLI Reference

Script location: ~/.claude/skills/transcribe/scripts/transcribe_diarize.py

Prerequisite: uv pip install openai (or pip install openai). OPENAI_API_KEY must be set in the environment.

By use case

Simple transcription (most common):

python ~/.claude/skills/transcribe/scripts/transcribe_diarize.py recording.mp3 --out transcript.txt

Speaker-labeled meeting notes:

python ~/.claude/skills/transcribe/scripts/transcribe_diarize.py meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --response-format diarized_json \
  --out-dir output/meeting

Known speaker identification (max 4 speakers):

python ~/.claude/skills/transcribe/scripts/transcribe_diarize.py interview.wav \
  --model gpt-4o-transcribe-diarize \
  --response-format diarized_json \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --out-dir output/interview

Batch transcription (multiple files):

python ~/.claude/skills/transcribe/scripts/transcribe_diarize.py file1.mp3 file2.wav file3.m4a --out-dir output/batch

Dry run (validate inputs, print payload, no API call):

python ~/.claude/skills/transcribe/scripts/transcribe_diarize.py audio.mp3 --dry-run

Diarization Gotchas

These are non-obvious constraints that cause silent failures or unexpected results:

- Max 4 known speaker references -- API hard limit. The CLI enforces this. If you have more speakers, omit the least important and let the model assign generic labels.
- Prompting NOT supported with diarize model -- passing the --prompt flag with gpt-4o-transcribe-diarize causes an error. The CLI blocks this combination. Use the --language hint instead for guidance.
- Known speaker refs must be audio, not text -- The API needs voice samples (wav/mp3/m4a files) to match speakers. Text descriptions of voices do not work.
- diarized_json format ONLY works with diarize model -- Using --response-format diarized_json with gpt-4o-mini-transcribe will fail. The CLI validates this.
- Speaker labels are probabilistic -- The model guesses speaker boundaries. Short utterances (<2 seconds) and speakers with similar voices cause misattribution. Always verify critical speaker attributions manually.
- Chunking is mandatory for diarize model on long audio -- The default --chunking-strategy auto handles this, but if overridden to none, audio >30s will fail with the diarize model.
- Known speaker audio quality matters -- Reference clips should be clean, single-speaker audio of 5-15 seconds. Noisy or multi-speaker reference clips degrade matching accuracy significantly.
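
The hard constraints in this list can be expressed as a validation pass. This is a sketch of the rules as stated above, not the validation code in the bundled script; `validate_options` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence

DIARIZE_MODEL = "gpt-4o-transcribe-diarize"


def validate_options(model: str = "gpt-4o-mini-transcribe",
                     response_format: str = "text",
                     known_speakers: Sequence[str] = (),
                     prompt: Optional[str] = None,
                     chunking_strategy: str = "auto",
                     duration_seconds: Optional[float] = None) -> List[str]:
    """Return one message per violated constraint; empty means the options are consistent."""
    errors = []
    diarize = model == DIARIZE_MODEL
    if len(known_speakers) > 4:
        errors.append("at most 4 known speaker references are allowed")
    if diarize and prompt:
        errors.append("--prompt is not supported with the diarize model")
    if response_format == "diarized_json" and not diarize:
        errors.append("diarized_json requires the diarize model")
    if (diarize and chunking_strategy == "none"
            and duration_seconds is not None and duration_seconds > 30):
        errors.append("audio over 30s needs chunking with the diarize model")
    return errors
```

The probabilistic-label and reference-clip-quality items are not checkable this way and still need manual review.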

Output Format Guide

| Format | Extension | Use for | Contains |
|--------|-----------|---------|----------|
| text | .txt | Direct reading, editing, summarization | Plain transcript text only |
| json | .json | Programmatic access, timestamp extraction | Structured segments with start/end times |
| diarized_json | .json | Meeting notes, interview analysis, attribution | Speaker-labeled segments with timestamps |
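
As an illustration of working with speaker-labeled segments, the sketch below turns a diarized payload into readable lines. The field names (`segments`, `speaker`, `start`, `end`, `text`) are an assumption about the diarized_json layout and should be checked against a real output file; the sample data is made up for the example.

```python
from typing import Dict, List


def format_diarized(payload: Dict) -> str:
    """Render speaker-labeled segments as '[start-end] Speaker: text' lines."""
    lines: List[str] = []
    for seg in payload.get("segments", []):
        lines.append(
            f"[{seg['start']:.1f}-{seg['end']:.1f}] "
            f"{seg['speaker']}: {seg['text'].strip()}"
        )
    return "\n".join(lines)


# Made-up sample in the assumed shape, for illustration only.
sample = {
    "segments": [
        {"speaker": "Alice", "start": 0.0, "end": 2.5, "text": " Hello there."},
        {"speaker": "Bob", "start": 2.5, "end": 4.0, "text": " Hi, Alice."},
    ]
}
print(format_diarized(sample))
# [0.0-2.5] Alice: Hello there.
# [2.5-4.0] Bob: Hi, Alice.
```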

Rationalization

| Concept | Why it is HERE and not in general knowledge |
|---------|----------------------------------------------|
| Model selection matrix (mini vs diarize) | OpenAI transcription models are new (2024-2025), selection criteria not widely documented, wrong choice = failed request |
| Known speaker reference mechanics | Underdocumented API feature using extra_body with base64 data URLs -- not guessable from standard SDK docs |
| diarized_json format constraints | Format-model coupling is a hard constraint that causes cryptic errors if violated |
| Chunking strategy requirements | Mandatory for diarize model on long audio but optional for mini -- asymmetric requirement not obvious |
| CLI script architecture | Bundled 277-line script with validation, dry-run, batch support -- must know it exists and how to invoke |
| Speaker attribution confidence | Probabilistic labeling with known failure modes (short utterances, similar voices) -- critical for meeting notes accuracy |
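
To make the "extra_body with base64 data URLs" row concrete, here is a minimal sketch of encoding a reference clip as a data URL. The helper names are hypothetical, and the `known_speaker_names` / `known_speaker_references` keys are an assumption about the request body rather than something this document confirms; the bundled script may build the payload differently.

```python
import base64
import os
from typing import Dict, List, Sequence

# Small fixed mapping so the result does not depend on the platform's MIME database.
MIME_BY_EXT = {".wav": "audio/wav", ".mp3": "audio/mpeg", ".m4a": "audio/mp4"}


def to_data_url(path: str) -> str:
    """Encode an audio file as a base64 data URL."""
    ext = os.path.splitext(path)[1].lower()
    mime = MIME_BY_EXT[ext]  # raises KeyError for other extensions
    with open(path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def known_speaker_body(specs: Sequence[str]) -> Dict[str, List[str]]:
    """Build an extra_body-style dict from "Name=path" specs (key names assumed)."""
    names, refs = [], []
    for spec in specs:
        name, _, path = spec.partition("=")
        names.append(name)
        refs.append(to_data_url(path))
    return {"known_speaker_names": names, "known_speaker_references": refs}
```

Running the CLI with `--dry-run` prints the payload it would actually send, which is the reliable way to confirm the real field names.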

Red Flags -- STOP and reassess

- User wants real-time voice conversation -- this is file-based transcription, not streaming STT
- User asks to generate speech from text -- this is speech-to-text only, not TTS
- User needs telephony integration -- transcription is API-based, not phone-system-connected
- Audio file exceeds 25MB -- must split before sending, cannot increase limit
- User expects 100% speaker attribution accuracy -- labels are probabilistic, warn about verification
- User wants to use --prompt with diarize model -- not supported, will error
- User provides text descriptions instead of audio files for known speakers -- API requires audio samples
- User needs transcription in a language the model may not support well -- set expectations about accuracy

NEVER

- NEVER call the OpenAI API directly when the bundled CLI script handles the use case -- the script has validation, error handling, and output formatting built in
- NEVER use diarized_json response format with gpt-4o-mini-transcribe -- format requires the diarize model
- NEVER pass --prompt flag with gpt-4o-transcribe-diarize -- prompting is not supported for this model
- NEVER send audio files larger than 25MB without splitting first -- hard API limit, request will be rejected
- NEVER present diarized speaker labels as ground truth -- labels are probabilistic and must be verified for critical attributions

Use when transcribing audio/video files to text, speech-to-text from recordings, speaker diarization, labeling speakers in interviews/meetings/podcasts, or extracting text from audio. NEVER for real-time STT/TTS pipelines or voice agent implementation (use voice-ai-development). NEVER for voice agent architecture or multi-agent voice systems (use voice-agents). NEVER for audio editing, mixing, or processing (use standard audio tools). NEVER for phone system configuration or IVR (use twilio-communications).

tools

Updated Apr 29, 2026

npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit transcribe

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 26, 2026, 8:12 AM (25.7s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/sharkitect-solutions/sharkitect-claude-toolkit.git

# Copy into Claude Code skills folder (global)
cp -r sharkitect-claude-toolkit/skills/transcribe ~/.claude/skills/

Repository

sharkitect-solutions/sharkitect-claude-toolkit

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT