skills/podcast-edit/SKILL.md
Edit podcast audio — trim pre/post-show chat, remove filler words, cut silences, and enhance audio quality. Use when the user asks to edit a podcast, clean up audio, remove fillers, trim a recording, or improve voice quality.
Install: `npx skillsauth add OpenClaudia/openclaudia-skills podcast-edit` (installs the skill globally; works with Claude Code, Cursor, and Windsurf).
Process raw podcast/meeting recordings into polished podcast episodes.
Prerequisites:

- ffmpeg and ffprobe installed
- `OPENAI_API_KEY` in the environment (for Whisper API transcription)

Probe the input file:

```bash
ffprobe -v quiet -print_format json -show_format -show_streams "INPUT_FILE"
```
Note the duration, sample rate, channel count, codec, and bitrate.
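If you script this step, the same fields can be read from the JSON that ffprobe prints. A minimal Python sketch, assuming the file has at least one audio stream and that the first one is the stream being edited; `summarize_probe` is a hypothetical helper, not part of the skill:

```python
import json


def summarize_probe(probe_json: str) -> dict:
    """Hypothetical helper: pick the fields worth noting out of ffprobe's JSON."""
    info = json.loads(probe_json)
    fmt = info.get("format", {})
    # Assumption: the first audio stream is the one being edited.
    audio = next((s for s in info.get("streams", []) if s.get("codec_type") == "audio"), None)
    if audio is None:
        raise ValueError("no audio stream found")
    return {
        "duration": float(fmt["duration"]),        # ffprobe reports this as a string
        "sample_rate": int(audio["sample_rate"]),  # also a string, e.g. "44100"
        "channels": int(audio["channels"]),
        "codec": audio["codec_name"],
        # bit_rate can be missing for some containers, so treat it as optional.
        "bit_rate": int(fmt["bit_rate"]) if "bit_rate" in fmt else None,
    }
```

The `sample_rate` value is what decides the lowpass cutoff in the enhancement step.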
Split the input into 5-minute chunks and transcribe each one via the OpenAI Whisper API with segment-level timestamps (OFFSET is the chunk's start time in seconds: 0, 300, 600, …):
```bash
# Extract chunk
ffmpeg -y -i "INPUT_FILE" -ss OFFSET -t 300 -ar 16000 -ac 1 /tmp/chunk_OFFSET.mp3

# Transcribe
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file="@/tmp/chunk_OFFSET.mp3" \
  -F model="whisper-1" \
  -F response_format="verbose_json" \
  -F language="LANG" \
  -F 'timestamp_granularities[]=segment' > /tmp/transcript_OFFSET.json
```
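Before scanning, it helps to put every chunk's segments on one timeline: Whisper's timestamps are relative to the start of each chunk, so each chunk's OFFSET has to be added back. A minimal sketch, assuming the `/tmp/transcript_OFFSET.json` naming above and a verbose_json response whose `segments` entries carry `start`, `end`, and `text`; `load_segments` is a hypothetical helper:

```python
import json
from pathlib import Path


def load_segments(offsets, path_pattern="/tmp/transcript_{}.json"):
    """Hypothetical helper: merge per-chunk Whisper segments onto one timeline.

    Segment times in each file are relative to that chunk's start, so the
    chunk's offset (in seconds) is added to put them on the recording's clock.
    """
    merged = []
    for offset in offsets:
        data = json.loads(Path(path_pattern.format(offset)).read_text(encoding="utf-8"))
        for seg in data.get("segments", []):
            merged.append((seg["start"] + offset, seg["end"] + offset, seg["text"].strip()))
    return merged
```

Printing `load_segments([0, 300, 600])` line by line gives one readable transcript with absolute times, which makes the show's real start and end easy to spot.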
Scan the transcriptions for pre-show and post-show chat to find where the episode proper starts (START) and ends (END).
Do an initial trim with `-ss START -to END` and `-c copy` (no re-encode) to create a working file, TRIMMED_FILE.
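Scripted, the trim might look like this minimal sketch. `trim_command` is hypothetical; it assembles the same flags as an argv list, which can be passed to `subprocess.run(..., check=True)` without any shell quoting. Because `-c copy` does not re-encode, the cut points may snap to the nearest audio frame boundary rather than the exact timestamp.

```python
def trim_command(src, dst, start, end):
    """Hypothetical helper: argv for a lossless trim from START to END seconds."""
    if not 0 <= start < end:
        raise ValueError(f"expected 0 <= start < end, got start={start}, end={end}")
    # Same flags as the instructions: -ss START -to END with stream copy.
    return ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end), "-c", "copy", dst]
```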
Split the trimmed file into 5-minute chunks and transcribe each with word-level timestamps (DURATION is the trimmed file's duration in seconds):
```bash
# Extract chunks
for i in $(seq 0 300 DURATION); do
  ffmpeg -y -i "TRIMMED_FILE" -ss "$i" -t 300 -ar 16000 -ac 1 "/tmp/wchunk_${i}.mp3"
done

# Transcribe each chunk (can run in parallel)
for i in $(seq 0 300 DURATION); do
  curl -s https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F file="@/tmp/wchunk_${i}.mp3" \
    -F model="whisper-1" \
    -F response_format="verbose_json" \
    -F language="LANG" \
    -F 'timestamp_granularities[]=word' \
    -F 'timestamp_granularities[]=segment' > "/tmp/wtranscript_${i}.json" &
done
wait
```
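The word-level files can then be searched for fillers. This is a minimal sketch, not the shipped `filler_removal.py`: the `FILLERS` set is a hypothetical starting list, and `find_fillers` assumes the `/tmp/wtranscript_OFFSET.json` naming above and a `words` list whose entries have `word`, `start`, and `end`. Like the shell loop it walks offsets in 300 s steps, but it stops before an offset equal to the duration, since that chunk would be empty.

```python
import json
from pathlib import Path

# Hypothetical starter list; extend it for the show's language(s).
FILLERS = {"um", "uh", "erm", "嗯", "呃"}
PUNCTUATION = ",.!?;:，。！？；："


def find_fillers(duration, chunk=300, path_pattern="/tmp/wtranscript_{}.json", fillers=FILLERS):
    """Hypothetical helper: (start, end, word) per standalone filler, in trimmed-file seconds."""
    hits = []
    offset = 0
    while offset < duration:
        data = json.loads(Path(path_pattern.format(offset)).read_text(encoding="utf-8"))
        for w in data.get("words", []):
            token = w["word"].strip().strip(PUNCTUATION).lower()
            if token in fillers:
                # Word times are chunk-relative; add the chunk's offset back.
                hits.append((w["start"] + offset, w["end"] + offset, token))
        offset += chunk
    return hits
```

As the limitations note says, a token match like this only catches fillers Whisper emits as separate words.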
Then run the filler removal script that ships with this skill:
```bash
python3 ./filler_removal.py \
  --total-duration DURATION \
  --end-at END_TIMESTAMP \
  --cut START1:END1 --cut START2:END2 \
  --chunk-offsets 0,300,600,900,...
```
Arguments:

- `--total-duration`: duration of the trimmed input file in seconds (required)
- `--end-at`: cut everything after this timestamp (e.g., where the post-show chat starts)
- `--cut START:END`: cut a specific range; can be repeated
- `--chunk-offsets`: comma-separated chunk offsets (default: auto 0,300,600,…)

The script writes `/tmp/ffmpeg_filter.txt`, which contains an atrim+concat filter.
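To make the filter's shape concrete, here is a sketch of how cut ranges can be turned into an atrim+concat graph. It is an illustration under stated assumptions, not the script's actual code: `keep_ranges` and `atrim_concat_filter` are hypothetical names, the graph reuses the input label `[0:a]` once per kept range, and the final pad is labeled `[out]` to match `-map '[out]'` in Step A.

```python
def keep_ranges(total, end_at=None, cuts=()):
    """Hypothetical helper: invert (start, end) cuts into the ranges to keep."""
    stop = min(total, end_at) if end_at is not None else total
    keep, pos = [], 0.0
    for start, end in sorted(cuts):
        if start > pos:
            keep.append((pos, min(start, stop)))
        pos = max(pos, end)  # overlapping cuts simply extend the current gap
        if pos >= stop:
            break
    if pos < stop:
        keep.append((pos, stop))
    return [(s, e) for s, e in keep if e > s]


def atrim_concat_filter(ranges):
    """Hypothetical helper: one atrim per kept range, then concat into [out]."""
    if not ranges:
        raise ValueError("nothing left to keep")
    chains = [
        f"[0:a]atrim=start={s:.3f}:end={e:.3f},asetpts=PTS-STARTPTS[a{i}]"
        for i, (s, e) in enumerate(ranges)
    ]
    inputs = "".join(f"[a{i}]" for i in range(len(ranges)))
    chains.append(f"{inputs}concat=n={len(ranges)}:v=0:a=1[out]")
    return ";".join(chains)
```

`asetpts=PTS-STARTPTS` resets each piece's timestamps so concat can join them back to back; writing the returned string to a file gives something usable with `-filter_complex_script`.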
Apply the filter in two passes:
```bash
# Step A: Cut fillers → intermediate WAV (avoids re-encoding artifacts)
ffmpeg -y -i "TRIMMED_FILE" \
  -filter_complex_script /tmp/ffmpeg_filter.txt \
  -map '[out]' -c:a pcm_s16le -ar 44100 /tmp/podcast_cut.wav

# Step B: Enhance audio → final MP3
ffmpeg -y -i /tmp/podcast_cut.wav \
  -af "ENHANCEMENT_CHAIN" \
  -c:a libmp3lame -b:a 192k "OUTPUT_FILE"
```
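When both passes are driven from Python, passing argv lists to `subprocess.run` sidesteps the shell quoting around `'[out]'` and the filter chain. A minimal sketch; `two_pass_commands` and `run_two_passes` are hypothetical helpers that mirror the flags above, with the intermediate WAV path taken from Step A:

```python
import subprocess

CUT_WAV = "/tmp/podcast_cut.wav"  # intermediate file from Step A


def two_pass_commands(trimmed, output, chain, filter_script="/tmp/ffmpeg_filter.txt"):
    """Hypothetical helper: argv for Step A (cut to WAV) and Step B (enhance to MP3)."""
    step_a = ["ffmpeg", "-y", "-i", trimmed,
              "-filter_complex_script", filter_script,
              "-map", "[out]", "-c:a", "pcm_s16le", "-ar", "44100", CUT_WAV]
    step_b = ["ffmpeg", "-y", "-i", CUT_WAV,
              "-af", chain,
              "-c:a", "libmp3lame", "-b:a", "192k", output]
    return step_a, step_b


def run_two_passes(trimmed, output, chain):
    # check=True raises on a non-zero exit, so Step B never runs on a failed Step A.
    for cmd in two_pass_commands(trimmed, output, chain):
        subprocess.run(cmd, check=True)
```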
Limitations: Whisper word-level timestamps for Chinese can miss fillers that are blended into adjacent speech. The script catches standalone fillers reliably but may miss ~10–20% of embedded ones.
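Default chain (guest-friendly: it handles volume imbalance between speakers). The biggest mistake in past runs was adding a noise gate (`agate`), which silences the quieter guest entirely; never add `agate` back to the default chain. The chain is shown annotated; strip the `#` comments and join the filters into one comma-separated string to use as ENHANCEMENT_CHAIN.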
```text
highpass=f=80,                          # Remove room rumble
lowpass=f=12000,                        # Remove hiss (use 7500 for 16kHz sources)
afftdn=nf=-25:nr=8:nt=w,                # Gentle FFT noise reduction
equalizer=f=180:t=q:w=1.5:g=-2,         # Cut mud
equalizer=f=2500:t=q:w=1.2:g=3,         # Boost presence
equalizer=f=4500:t=q:w=1.5:g=1.5,       # Boost clarity
dynaudnorm=f=200:g=5:p=0.95:m=5:s=0,    # Rolling-window normalization; lifts the quieter speaker independently
acompressor=threshold=-20dB:ratio=2:attack=5:release=200:makeup=1,  # Gentle glue
loudnorm=I=-16:TP=-1.5:LRA=13           # Podcast standard loudness
```
Why dynaudnorm is the star: it computes a separate gain for each 200 ms frame (`f=200`), smoothed over a 5-frame Gaussian window (`g=5`), so the frames where the guest is speaking are lifted independently of the host's louder frames. Order matters: run dynaudnorm before acompressor so the compressor sees a balanced signal.
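If ENHANCEMENT_CHAIN is generated in code, keeping one filter per list entry keeps the comments and the ordering easy to maintain. A minimal sketch, assuming sources are either 16 kHz (lowpass at 7500 Hz, per the note in the chain) or full-band; `enhancement_chain` and `check_chain` are hypothetical helpers, and `check_chain` splits on commas, so it assumes no filter argument contains an escaped comma:

```python
DEFAULT_CHAIN = [
    "highpass=f=80",                        # remove room rumble
    "lowpass=f={lowpass}",                  # remove hiss; cutoff depends on source rate
    "afftdn=nf=-25:nr=8:nt=w",              # gentle FFT noise reduction
    "equalizer=f=180:t=q:w=1.5:g=-2",       # cut mud
    "equalizer=f=2500:t=q:w=1.2:g=3",       # boost presence
    "equalizer=f=4500:t=q:w=1.5:g=1.5",     # boost clarity
    "dynaudnorm=f=200:g=5:p=0.95:m=5:s=0",  # per-frame leveling, before the compressor
    "acompressor=threshold=-20dB:ratio=2:attack=5:release=200:makeup=1",  # gentle glue
    "loudnorm=I=-16:TP=-1.5:LRA=13",        # podcast loudness target
]


def enhancement_chain(sample_rate):
    """Hypothetical helper: the default chain as a single -af string."""
    lowpass = 7500 if sample_rate <= 16000 else 12000
    return ",".join(f.format(lowpass=lowpass) for f in DEFAULT_CHAIN)


def check_chain(chain):
    """Hypothetical helper: flag the two mistakes this skill warns about."""
    names = [f.split("=", 1)[0] for f in chain.split(",")]
    problems = []
    if "agate" in names:
        problems.append("agate silences the quieter speaker")
    if ("dynaudnorm" in names and "acompressor" in names
            and names.index("dynaudnorm") > names.index("acompressor")):
        problems.append("dynaudnorm must run before acompressor")
    return problems
```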
Never add these to the default chain:

- `agate` (noise gate): cuts off any speaker quieter than the threshold, which kills the guest.
- `loudnorm`: crushes natural speech dynamics.

Adjust `lowpass` based on the source sample rate:

- 16 kHz source: `lowpass=7500`
- Higher sample rates (e.g., 44.1 or 48 kHz): `lowpass=12000` (or skip)

Verify guest audibility after rendering: run `ffmpeg -i OUTPUT -af "ebur128=peak=true" -f null -` and check that `I:` is near −16 LUFS and `LRA:` is 4–6 LU (a tighter LRA is fine because dynaudnorm has already balanced each window). If the output sounds like the guest was cut, suspect that a gate or an aggressive compressor has crept back in.
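This check can be automated by parsing the summary that the ebur128 filter writes to stderr at the end of the run (for example from `subprocess.run([...], capture_output=True, text=True).stderr`). A minimal sketch; the 1 LU tolerance around −16 LUFS is an assumed reading of "near", and both function names are hypothetical. It parses only the text after the last `Summary:` line, because the per-frame log lines also contain `I:` and `LRA:` values.

```python
import re


def ebur128_summary(stderr_text):
    """Hypothetical helper: (integrated LUFS, LRA in LU) from ebur128's Summary block."""
    if "Summary:" not in stderr_text:
        raise ValueError("no ebur128 summary in the ffmpeg output")
    summary = stderr_text.rsplit("Summary:", 1)[1]
    integrated = re.search(r"\bI:\s+(-?\d+(?:\.\d+)?) LUFS", summary)
    lra = re.search(r"\bLRA:\s+(\d+(?:\.\d+)?) LU\b", summary)
    if not (integrated and lra):
        raise ValueError("could not find I: and LRA: in the summary")
    return float(integrated.group(1)), float(lra.group(1))


def loudness_ok(integrated, lra, target=-16.0, tolerance=1.0):
    """Integrated loudness near the target and LRA inside the 4-6 LU band."""
    return abs(integrated - target) <= tolerance and 4.0 <= lra <= 6.0
```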
```bash
ls -lh "OUTPUT_FILE"
ffprobe -v quiet -show_entries format=duration -of csv=p=0 "OUTPUT_FILE"
```
Report: duration, file size, what was removed (filler count, silence count, time saved).
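A small helper keeps the report consistent across runs. A minimal sketch, assuming the filler and silence cuts are available as (start, end) pairs in seconds; `edit_report` and its field names are hypothetical, and time saved is measured against the original recording, so it also includes the pre/post-show trim:

```python
def edit_report(original_s, final_s, filler_cuts, silence_cuts):
    """Hypothetical helper: the numbers for the end-of-run report."""
    def total(ranges):
        return round(sum(end - start for start, end in ranges), 3)

    return {
        "duration_s": round(final_s, 3),
        "time_saved_s": round(original_s - final_s, 3),
        "fillers_removed": len(filler_cuts),
        "filler_time_s": total(filler_cuts),
        "silences_removed": len(silence_cuts),
        "silence_time_s": total(silence_cuts),
    }
```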
If the host is producing bilingual Chinese/English show notes, the Chinese section must be written in actual Chinese — not Chinese grammar with English verbs and nouns sprinkled in. Code-switching like "close 了一个 deal", "build 出来的 agent", or "PR 不是 buy 来的" reads like a draft and is the #1 mistake to avoid.
Translate these common startup/tech English loanwords into Chinese:
$20K, $200K, or 200 美金 (either form is fine when paired with a number)

Re-read the Chinese section as a Chinese reader. If any sentence feels half-translated (e.g., it contains "build", "close", "deal", "view", "stack", or "leader" as standalone English words), rewrite those words in Chinese. The only English that should survive a re-read is brand names and the acronyms above.
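A quick automated pass can catch leftover English before the manual re-read. A minimal sketch; the `ALLOWED` set is a hypothetical placeholder for the brand names and acronyms you keep, and letters directly after a digit are ignored so amounts like $20K are not flagged:

```python
import re

# Hypothetical allowlist: replace with the show's brand names and kept acronyms.
ALLOWED = {"AI", "API", "OpenAI", "ChatGPT"}

# A run of Latin letters not glued to a preceding digit or letter (so "20K" is skipped).
ENGLISH_WORD = re.compile(r"(?<![0-9A-Za-z])[A-Za-z][A-Za-z'-]*")


def leftover_english(chinese_text, allowed=ALLOWED):
    """Hypothetical helper: English words left in a Chinese section, minus the allowlist."""
    return sorted({w for w in ENGLISH_WORD.findall(chinese_text) if w not in allowed})
```

A non-empty result is a prompt to rewrite, not proof of a problem; a brand missing from the allowlist will show up too.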
Whisper frequently mangles company names, product names, and personal names. Before generating show notes or any output that includes names and links:
acme.com, acmehq.com, or something else entirely. Always ask.

This is especially important when generating backlinks or social posts: a misspelled domain is a wasted link.
Two separate sections — Chinese first, then English (or whichever languages the show targets). Do NOT interleave or put them side-by-side.
Heading rule: only use H2 (##). Avoid H3 or deeper — flatten all sub-sections to H2.
Timestamp format: always MM:SS with leading zeros (e.g., 08:25, 00:00, 42:10). Never 0:00 or 1:05.
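A minimal formatter for this rule, assuming times come in as seconds. Fractional seconds are truncated, and minutes keep counting past 59 (e.g., 62:05), which is an assumption since the rule does not say how to handle hour-long episodes:

```python
def mmss(seconds):
    """Hypothetical helper: format an offset in seconds as MM:SS with leading zeros."""
    minutes, secs = divmod(int(seconds), 60)  # int() truncates fractional seconds
    return f"{minutes:02d}:{secs:02d}"
```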
```markdown
EP{NNN}: {Episode title}

---

## 中文

**嘉宾:** {中文姓名 English Name}, {中文职位} {公司} (URL)

## 简介

{完整中文段落}

## 时间轴

- 00:00 — {中文描述}
- 08:25 — {中文描述}

## 核心要点

- {中文要点}

## 相关链接

- {品牌名}:{URL}

---

## English

**Guest:** {English Name}, {Title} at {Company} (URL)

## Summary

{Full English paragraph}

## Timestamps

- 00:00 — {English description}
- 08:25 — {English description}

## Key Takeaways

- {English takeaway}

## Links

- {Brand}: {URL}
```
Why two sections instead of bilingual bullets: Chinese readers want clean Chinese prose, English readers want clean English prose. Alternating "中文 / English" on every bullet makes both halves harder to read. Write each section as if it were the only one.
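Before publishing, the heading and timestamp rules can be linted mechanically. A minimal sketch; `lint_show_notes` is hypothetical and checks only the two rules that are easy to test by pattern: H2-only headings and MM:SS timestamps at the start of bullets.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s")
BULLET_TIME = re.compile(r"^-\s+(\d+:\d+)\b")
MMSS = re.compile(r"\d{2,}:\d{2}")


def lint_show_notes(markdown):
    """Hypothetical helper: report headings other than H2 and malformed timestamps."""
    problems = []
    for n, line in enumerate(markdown.splitlines(), start=1):
        heading = HEADING.match(line)
        if heading and len(heading.group(1)) != 2:
            problems.append(f"line {n}: only H2 (##) headings are allowed")
        stamp = BULLET_TIME.match(line)
        if stamp and not MMSS.fullmatch(stamp.group(1)):
            problems.append(f"line {n}: timestamp {stamp.group(1)} should be MM:SS")
    return problems
```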
If the user just wants a simple trim (e.g., "cut the first 3s"):
```bash
ffmpeg -y -i "INPUT" -ss 3 -c copy "OUTPUT"
```
Use -c copy for instant lossless trim when no audio processing is needed.