
jeffvincent/voice-memo-transcriber

Name: voice-memo-transcriber
Author: jeffvincent

Path: skills/voice-memo-transcriber/SKILL.md

Install: npx skillsauth add jeffvincent/claude-config voice-memo-transcriber


Voice Memo Transcriber

Overview

This skill automatically transcribes voice memo files into plain text using OpenAI's Whisper (open-source, local processing). It handles multiple audio/video formats, generates both plain text and timestamped SRT files, and provides a summary of the content.

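All processing happens locally: no audio or transcript data is sent to cloud services (the Whisper model is downloaded once, on first use, but no user data leaves the machine). Voice memos may contain sensitive information, so keeping them on the machine protects the user's privacy.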

When to Apply

Use this skill when:

User provides a voice memo file path and asks to "transcribe" it
User says "create a transcript" from an audio/video file
User wants to convert a voice recording to text
User needs both plain text and timestamped SRT output from audio
User mentions formats like .m4a, .mp3, .mov, .mp4, .wav, or QuickTime files

Do NOT use this skill for:

Real-time transcription of ongoing audio
Speaker diarization (identifying different speakers)
Translation to other languages
Editing or cleaning up existing transcripts

Inputs

Required:

file_path: Absolute path to voice memo file
- Supported formats: .m4a, .mp3, .mov, .mp4, .wav, .aac, .flac, .ogg, .webm
- File must exist and contain audio

Optional:

output_dir: Directory for output files
- Default: Same directory as source file
- Must be writable
whisper_model: Whisper model size (affects accuracy and speed)
- Default: "base"
- Options: tiny, base, small, medium, large
- Larger models = better accuracy but slower processing

Outputs

Generated files (in output directory):

[filename].txt - Plain text transcript with no timestamps
[filename].srt - Subtitle file with timestamps (HH:MM:SS,mmm format)
[filename].mp3 - Audio file (if conversion from another format was needed)
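SRT timestamps use the layout HH:MM:SS,mmm (comma before the milliseconds). Whisper writes the .srt file itself; the POSIX shell sketch below only illustrates the format. The helper names srt_timestamp and srt_cue are hypothetical, and times are passed as whole milliseconds because POSIX shell arithmetic is integer-only.

```sh
#!/bin/sh
# Illustration only: Whisper produces the .srt file; these helper names are hypothetical.

# Format integer milliseconds as an SRT timestamp, HH:MM:SS,mmm.
srt_timestamp() {
  ms=$1
  printf '%02d:%02d:%02d,%03d\n' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# Print one SRT cue: sequence number, "start --> end" line, text, then a blank line.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_timestamp "$2")" "$(srt_timestamp "$3")" "$4"
}

srt_cue 1 0 2500 "Example caption text"
```

The last line prints a cue whose timing line reads 00:00:00,000 --> 00:00:02,500.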

Displayed to user:

Processing progress updates
Summary of transcript content (1-3 sentences generated by Claude)
File paths for all generated outputs
Total processing time

Instructions for Claude

Step 1: Validate Input

Check that file_path exists, using the Read or Bash tool
Verify that the file has one of the supported audio/video extensions
If output_dir is specified, verify that it exists and is writable
If whisper_model is specified, verify that it is one of: tiny, base, small, medium, large
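The four checks above can be sketched as one POSIX shell function. This is a sketch under stated assumptions: the names validate_inputs, file_ext, and in_list are hypothetical, the extension check is case-insensitive, and the wording of the output-directory and model errors is invented here because the skill does not define those messages.

```sh
#!/bin/sh
# Sketch of Step 1 (hypothetical names). Returns 0 if the inputs are valid;
# otherwise prints an error message and returns 1.

SUPPORTED_EXTS="m4a mp3 mov mp4 wav aac flac ogg webm"
SUPPORTED_MODELS="tiny base small medium large"

# Lower-cased extension of a path ("Memo.M4A" -> "m4a"); empty when there is none.
file_ext() {
  name=${1##*/}
  case "$name" in
    *.*) printf '%s\n' "${name##*.}" | tr '[:upper:]' '[:lower:]' ;;
    *) printf '\n' ;;
  esac
}

# Succeeds if word $1 appears in the space-separated list $2.
in_list() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

validate_inputs() {
  file_path=$1
  output_dir=$2              # empty means "same directory as the source file"
  whisper_model=${3:-base}

  if [ ! -f "$file_path" ]; then
    echo "Error: Could not find file at $file_path. Please check the path and try again."
    return 1
  fi
  if ! in_list "$(file_ext "$file_path")" "$SUPPORTED_EXTS"; then
    echo "Error: File format not supported. Supported formats: .m4a, .mp3, .mov, .mp4, .wav, .aac, .flac, .ogg, .webm"
    return 1
  fi
  # Assumed wording: the skill does not define messages for these two cases.
  if [ -n "$output_dir" ] && { [ ! -d "$output_dir" ] || [ ! -w "$output_dir" ]; }; then
    echo "Error: Output directory $output_dir does not exist or is not writable."
    return 1
  fi
  if ! in_list "$whisper_model" "$SUPPORTED_MODELS"; then
    echo "Error: Unknown Whisper model '$whisper_model'. Choose one of: $SUPPORTED_MODELS"
    return 1
  fi
  return 0
}
```

For example, validate_inputs "/path/to/memo.m4a" "" base prints nothing and succeeds when the file exists.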

Step 2: Check and Install Dependencies

Run these checks in sequence:

Check ffmpeg:

which ffmpeg

If not found, provide installation instructions:

macOS: brew install ffmpeg
Linux: sudo apt-get install ffmpeg (Debian/Ubuntu) or sudo yum install ffmpeg (Red Hat family, which may first need an extra repository such as RPM Fusion)
Windows: Download from https://ffmpeg.org/download.html

Check pipx:

which pipx

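If not found, install it (macOS, with Homebrew):

brew install pipx
pipx ensurepath

On Debian/Ubuntu, sudo apt install pipx followed by pipx ensurepath also works. Open a new shell afterwards so the PATH change made by ensurepath takes effect.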

Check openai-whisper:

pipx list | grep openai-whisper

If not found, install:

pipx install openai-whisper
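The dependency checks can be gathered into one function that reports everything missing at once, so the user gets all install instructions in a single message. A sketch, assuming POSIX sh: it uses command -v (a POSIX builtin) in place of which, keeps the skill's pipx list | grep check, and the function names missing_deps and install_hint are hypothetical.

```sh
#!/bin/sh
# Sketch of Step 2 (hypothetical names).

# Print the missing dependencies on one line, space-separated (empty if none).
missing_deps() {
  missing=""
  command -v ffmpeg >/dev/null 2>&1 || missing="$missing ffmpeg"
  if command -v pipx >/dev/null 2>&1; then
    # Same check as the skill: is openai-whisper among the pipx-installed packages?
    pipx list 2>/dev/null | grep -q openai-whisper || missing="$missing openai-whisper"
  else
    # Without pipx, openai-whisper cannot have been installed through it either.
    missing="$missing pipx openai-whisper"
  fi
  printf '%s\n' "${missing# }"
}

# Print the install instruction for one tool; $2 is the OS name (defaults to `uname -s`).
install_hint() {
  os=${2:-$(uname -s)}
  case "$1" in
    ffmpeg)
      case "$os" in
        Darwin) echo "brew install ffmpeg" ;;
        Linux)  echo "sudo apt-get install ffmpeg (or sudo yum install ffmpeg)" ;;
        *)      echo "Download from https://ffmpeg.org/download.html" ;;
      esac ;;
    pipx) echo "brew install pipx && pipx ensurepath" ;;
    openai-whisper) echo "pipx install openai-whisper" ;;
    *) echo "No install hint for $1"; return 1 ;;
  esac
}
```

A caller would run missing_deps, then show install_hint for each name it prints and ask the user to re-run the skill.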

Step 3: Prepare Audio File

Determine output directory:

If user specified output_dir, use it
Otherwise, extract directory from file_path

Check if file is already MP3:

If yes, skip conversion
If no, convert using ffmpeg:

ffmpeg -i "[input_file]" -vn -ar 16000 -ac 1 -b:a 96k "[output_dir]/[basename].mp3"

Flags explained:

-vn: No video (audio only)
-ar 16000: 16kHz sample rate (Whisper's native rate)
-ac 1: Mono audio
-b:a 96k: 96kbps bitrate (good quality, small size)
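Step 3 can be sketched in POSIX shell as below. The function names are hypothetical, the MP3 check is case-insensitive, and the ffmpeg call uses the skill's flags plus -nostdin, which is an addition here (not part of the skill's command) so that ffmpeg never waits for keyboard input when run non-interactively.

```sh
#!/bin/sh
# Sketch of Step 3 (hypothetical function names).

# Output directory: the user's output_dir if given, otherwise the source file's directory.
resolve_output_dir() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    dirname "$1"
  fi
}

# Audio file Whisper should read: the input itself if it is already an MP3
# (case-insensitive), otherwise <output_dir>/<basename without extension>.mp3.
audio_target() {
  name=${1##*/}
  case "$name" in
    *.[mM][pP]3) printf '%s\n' "$1" ;;
    *) printf '%s/%s.mp3\n' "$2" "${name%.*}" ;;
  esac
}

# Convert with the skill's ffmpeg flags when needed; prints the audio path to use.
prepare_audio() {
  target=$(audio_target "$1" "$2")
  if [ "$target" != "$1" ]; then
    # -nostdin (added here) disables interactive input, so an existing output file
    # makes ffmpeg exit with an error instead of waiting at an overwrite prompt.
    ffmpeg -nostdin -i "$1" -vn -ar 16000 -ac 1 -b:a 96k "$target" || return 1
  fi
  printf '%s\n' "$target"
}
```

Typical use is out_dir=$(resolve_output_dir "$file_path" "$output_dir") followed by audio=$(prepare_audio "$file_path" "$out_dir"); ffmpeg logs to stderr, so only the path lands in the variable.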

Step 4: Transcribe with Whisper

Run Whisper transcription:

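whisper "[audio_file]" \
  --model [whisper_model] \
  --output_dir "[output_dir]" \
  --output_format all \
  --language English

--output_format accepts a single value (txt, vtt, srt, tsv, json, or all); if the flag is repeated, only the last value is used, so passing txt and then srt would produce only the .srt file. With all, Whisper writes [basename].txt and [basename].srt, plus .vtt, .tsv, and .json files that can be deleted if only the transcript and subtitles are wanted.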

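Important notes:

The first run with a given model downloads it (about 140 MB for base) and caches it for later runs
Transcription time depends on hardware: roughly 1-5 minutes for 10 minutes of audio with the base model
Whisper prints each transcribed segment with its timestamps as it goes (verbose output is on by default); relay this to the user as progress updates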
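Step 4 can be wrapped as below. This is a sketch with hypothetical function names: it requests --output_format all because openai-whisper's CLI accepts a single value for that flag, deletes the .vtt, .tsv, and .json extras, and then confirms that the .txt and .srt files exist and are non-empty rather than relying on whisper's exit status alone.

```sh
#!/bin/sh
# Sketch of Step 4 (hypothetical function names). transcribe needs the `whisper`
# CLI from openai-whisper; whisper_output runs without it.

# Path Whisper uses for one output format: <output_dir>/<audio basename minus extension>.<ext>
whisper_output() {
  name=${1##*/}
  printf '%s/%s.%s\n' "$2" "${name%.*}" "$3"
}

# Transcribe $1 into directory $2 with model $3 (default "base").
transcribe() {
  audio=$1
  out_dir=$2
  model=${3:-base}
  # --output_format takes one value; "all" yields .txt, .vtt, .srt, .tsv, and .json.
  whisper "$audio" --model "$model" --output_dir "$out_dir" \
    --output_format all --language English || return 1
  # Keep only the two files the skill reports.
  for ext in vtt tsv json; do
    rm -f "$(whisper_output "$audio" "$out_dir" "$ext")"
  done
  # Check the files directly instead of trusting the exit status alone.
  for ext in txt srt; do
    if [ ! -s "$(whisper_output "$audio" "$out_dir" "$ext")" ]; then
      echo "Error: Could not process audio file. The file may be corrupted or empty." >&2
      return 1
    fi
  done
}
```

Looping over extensions rather than over a list of paths keeps file names with spaces (common for voice memos) intact.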

Step 5: Read and Summarize Transcript

Use Read tool to read the generated .txt file
Generate a concise 1-3 sentence summary of the content
Focus on main topics, key points, or action items if present

Step 6: Report Results

Display to user:

✓ Transcription complete!

Files generated:
- Transcript: [path]/[filename].txt
- Subtitles: [path]/[filename].srt
- Audio: [path]/[filename].mp3 (if converted)

Summary:
[Your 1-3 sentence summary here]

Processing time: [X] seconds
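The report can be produced by a small function, with elapsed time taken as the difference of two `date +%s` readings (seconds since the epoch; not in POSIX, but supported by GNU, BSD/macOS, and BusyBox date). A sketch only: the function name and argument order are assumptions.

```sh
#!/bin/sh
# Sketch of Step 6 (hypothetical function name). Arguments:
#   $1 output dir, $2 base name, $3 "yes" if an MP3 was created by conversion,
#   $4 summary text, $5 elapsed seconds
print_report() {
  printf '%s\n\n' "✓ Transcription complete!"
  printf '%s\n' "Files generated:"
  printf '%s\n' "- Transcript: $1/$2.txt"
  printf '%s\n' "- Subtitles: $1/$2.srt"
  if [ "$3" = yes ]; then
    printf '%s\n' "- Audio: $1/$2.mp3"
  fi
  printf '\n%s\n%s\n\n%s\n' "Summary:" "$4" "Processing time: $5 seconds"
}

# Timing (not executed here): record the start before Step 3, subtract at the end.
#   start=$(date +%s)
#   ...
#   print_report "$out_dir" "$base" yes "$summary" "$(( $(date +%s) - start ))"
```

Each line is passed to printf as a %s argument so that lines beginning with "-" are never parsed as options.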

Error Handling

File not found:

Message: "Error: Could not find file at [path]. Please check the path and try again."

Unsupported format:

Message: "Error: File format not supported. Supported formats: .m4a, .mp3, .mov, .mp4, .wav, .aac, .flac, .ogg, .webm"

Missing dependencies:

Provide clear installation instructions for the missing tool
Ask user to install and re-run the skill

Corrupted audio:

Message: "Error: Could not process audio file. The file may be corrupted or empty."

Insufficient disk space:

Message: "Error: Insufficient disk space. Transcription requires approximately [size] of free space."
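The error cases could be centralized so each message is worded exactly as above. A sketch under assumptions: the function names are hypothetical, the corrupted-audio check uses ffprobe (normally installed alongside ffmpeg) to look for an audio stream, and the disk-space check reads the Available column of POSIX df -P -k output. How much space to require is left to the caller, since the skill gives no figure.

```sh
#!/bin/sh
# Sketch of the error handling above (hypothetical function names).

# Print the skill's message for an error kind; $2 fills the path or size placeholder.
error_message() {
  case "$1" in
    not_found)   echo "Error: Could not find file at $2. Please check the path and try again." ;;
    unsupported) echo "Error: File format not supported. Supported formats: .m4a, .mp3, .mov, .mp4, .wav, .aac, .flac, .ogg, .webm" ;;
    corrupted)   echo "Error: Could not process audio file. The file may be corrupted or empty." ;;
    disk_space)  echo "Error: Insufficient disk space. Transcription requires approximately $2 of free space." ;;
    *)           echo "Error: $1"; return 1 ;;
  esac
}

# Succeeds if ffprobe finds at least one audio stream in file $1.
has_audio_stream() {
  [ -n "$(ffprobe -v error -select_streams a -show_entries stream=index -of csv=p=0 "$1" 2>/dev/null)" ]
}

# Succeeds if the filesystem holding directory $1 has at least $2 KiB available.
has_free_kib() {
  avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example wiring (not executed here):
#   has_audio_stream "$file_path" || { error_message corrupted; exit 1; }
```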

Examples

See resources/EXAMPLES.md for complete examples.

Testing Checklist

See resources/CHECKLIST.md for validation steps.

Security and Privacy

Local Processing:

All transcription happens on the user's machine
No data sent to external servers or APIs
Whisper models cached locally after first download

Sensitive Content:

Voice memos may contain personal, medical, or business-sensitive information
All files remain local and private
Consider adding output directories to .gitignore

File Handling:

Never commit voice files or transcripts to version control
Respect user's specified output directory
Clean up temporary files if conversion was needed (optional - ask user)
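One way to act on the .gitignore advice above is to append ignore patterns idempotently. A sketch only: the function name, the pattern list, and the transcripts/ directory are hypothetical and should be adjusted to wherever outputs are actually written. It leaves out *.txt because that pattern would also hide unrelated text files.

```sh
#!/bin/sh
# Sketch: append voice-memo ignore patterns to <repo>/.gitignore, skipping any
# pattern already listed. Function name and patterns are hypothetical.
ignore_transcripts() {
  gitignore="$1/.gitignore"
  touch "$gitignore"
  # Make sure the last existing line ends with a newline before appending.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  for pattern in '*.m4a' '*.mp3' '*.mov' '*.wav' '*.srt' 'transcripts/'; do
    grep -qxF -e "$pattern" "$gitignore" || printf '%s\n' "$pattern" >> "$gitignore"
  done
}
```

Running it twice leaves the file unchanged the second time, since grep -x -F matches each pattern as a whole literal line.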

Dependencies:

ffmpeg, pipx, and openai-whisper are all open-source tools
Whisper models are downloaded from OpenAI's public model storage (openaipublic.azureedge.net) and cached under ~/.cache/whisper by default
No telemetry or tracking in any dependency


Transcribe voice memos to text using Whisper. Use when user provides audio/video files (.m4a, .mp3, .mov, etc.) and asks to transcribe them into text and SRT format with timestamps.

5 stars

Category: development

Updated Apr 21, 2026

Install this skill globally with the npx skillsauth command shown above; it works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the other 6 were skipped. Review each entry below.

Trivy, container and dependency vulnerability scanner: Clean (95%)
Semgrep, static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
Snyk (dep), open source security scanning: Skipped (50%)
Socket.dev, supply chain security analysis: Skipped (50%)
VirusTotal, multi-engine malware detection: Skipped (50%)
CrowdStrike, advanced threat intelligence: Skipped (50%)
OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 22, 2026, 2:05 AM (123.4 s, 5 files scanned)



Install manually


# Clone the repo
git clone https://github.com/jeffvincent/claude-config.git

# Copy into the global Claude Code skills folder (create it first if it does not exist)
mkdir -p ~/.claude/skills
cp -r claude-config/skills/voice-memo-transcriber ~/.claude/skills/


Repository: jeffvincent/claude-config

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT