journalism-core/skills/interview-transcription/SKILL.md
Transcription workflows, recording management, and quote extraction for journalists. Use when processing audio/video recordings, generating transcripts with timestamps, extracting quotes for fact-checking, or building source-and-recording databases. For interview question design and pre-interview preparation, see the interview-prep skill.
Practical workflows for journalists managing interviews from preparation through publication.
For pre-interview research, question design, attribution agreements, and consent scripts, use the interview-prep skill. The notes here cover only the recording configuration that affects transcription quality.
```python
# Standard recording configuration for clean transcription
RECORDING_SETTINGS = {
    'format': 'wav',       # Lossless for transcription
    'sample_rate': 16000,  # Whisper resamples to 16k anyway; 16k saves disk
    'channels': 1,         # Mono is fine for speech; stereo only if mics are positionally distinct
    'backup': True,        # Always run a backup recorder
}

# File naming convention
# YYYY-MM-DD_source-lastname_topic.wav
# Example: 2026-05-08_smith_budget-hearing.wav
```
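The naming convention is easy to apply by hand and just as easy to drift from. A minimal sketch of a helper that builds a conforming name is shown here; `recording_filename` and its slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) are illustrative assumptions, not part of the skill.

```python
import re
from datetime import date


def recording_filename(day: date, source_lastname: str, topic: str,
                       ext: str = "wav") -> str:
    """Build a YYYY-MM-DD_source-lastname_topic.ext file name."""
    def slug(text: str) -> str:
        # Lowercase, then collapse anything that is not a-z or 0-9 into one hyphen
        return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')

    return f"{day.isoformat()}_{slug(source_lastname)}_{slug(topic)}.{ext}"


print(recording_filename(date(2026, 5, 8), "Smith", "Budget hearing"))
# prints: 2026-05-08_smith_budget-hearing.wav
```

Underscores separate the three fields and hyphens separate words inside a field, so the name can be split back into date, source, and topic later.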
Two-device rule. Always record on two devices, with a phone as the minimum backup. If using a wireless lav mic, the recorder built into the lav unit is one device; the phone running a backup app is the second.
Mono is preferred unless each speaker has their own dedicated microphone routed to a distinct channel. Stereo with both speakers bleeding into both channels is worse for diarization than clean mono.
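Before sending a file to transcription, it can help to confirm that it actually matches the settings above. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only; the function name `check_recording` and its defaults are assumptions chosen to mirror `RECORDING_SETTINGS`.

```python
import wave


def check_recording(path: str, expected_rate: int = 16000,
                    expected_channels: int = 1) -> list[str]:
    """Return a list of mismatches between a WAV file and the expected settings."""
    problems = []
    with wave.open(path, 'rb') as wav:
        if wav.getnchannels() != expected_channels:
            problems.append(
                f"channels: expected {expected_channels}, found {wav.getnchannels()}")
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate: expected {expected_rate}, found {wav.getframerate()}")
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

An empty list means the header looks right. It says nothing about audio quality, so a quick listen is still worthwhile.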
Vanilla OpenAI Whisper transcribes audio to text but does not assign speaker labels. To get diarized output ("Speaker 1:" / "Speaker 2:" / etc.) you need a tool that combines Whisper with a diarization model — typically WhisperX (m-bain/whisperX), which wraps faster-whisper transcription with pyannote.audio diarization and produces word-level timestamps with speaker IDs in one pass.
```python
from pathlib import Path
import subprocess
import json


def transcribe_interview(
    audio_path: str,
    output_dir: str = "./transcripts",
    diarize: bool = True,
    hf_token: str | None = None,
    min_speakers: int = 2,
    max_speakers: int = 2,
) -> dict:
    """
    Transcribe an interview using WhisperX (Whisper + pyannote diarization).

    Returns a transcript with word-level timestamps, plus speaker labels
    when diarize=True.

    Diarization needs a Hugging Face token with access to the pyannote
    speaker-diarization-3.1 model. Accept the model EULA at
    huggingface.co/pyannote/speaker-diarization-3.1 once, then pass the token.
    """
    # parents=True so a nested output path is created in one call
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    cmd = [
        'whisperx', audio_path,
        '--model', 'large-v3',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--compute_type', 'int8',  # CPU-friendly; use 'float16' on GPU
    ]
    if diarize:
        # Speaker-count hints and the token only matter when diarization runs
        cmd += [
            '--diarize',
            '--min_speakers', str(min_speakers),
            '--max_speakers', str(max_speakers),
        ]
        if hf_token:
            cmd += ['--hf_token', hf_token]

    subprocess.run(cmd, check=True, capture_output=True)

    # WhisperX names the output after the input file's stem
    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def format_for_editing(transcript: dict) -> str:
    """Convert to journalist-friendly format with timestamps and speaker labels."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        # 'speaker' is only present on diarized segments
        speaker = segment.get('speaker')
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{timestamp}] {prefix}{text}")
    return '\n\n'.join(lines)
```
Falling back to plain Whisper. If diarization is overkill or you can't get a Hugging Face token, drop the --diarize flag: the model still produces accurate timestamped transcription, and you label speakers manually based on context. faster-whisper (CTranslate2 backend) is the speed-optimized variant; it is a Python library rather than a command-line tool, so you call it from a script. whisper.cpp is the C++ port for resource-constrained machines (Raspberry Pi, older laptops); it doesn't include diarization but runs the small/medium models on CPU comfortably.
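Manual speaker labeling can be kept lightweight: note the moments where the speaker changes while listening, then let a script apply those labels to every segment. The sketch below assumes the transcript dict has a `segments` list whose items carry a `start` time in seconds, as in the transcription example above. `label_segments` and the `turns` structure are hypothetical names introduced for illustration.

```python
from bisect import bisect_right


def label_segments(transcript: dict, turns: list[tuple[float, str]]) -> dict:
    """Attach a speaker to each segment from hand-noted turn changes.

    turns holds (start_seconds, speaker_name) pairs marking where each
    speaker begins talking. A segment gets the speaker of the latest turn
    that starts at or before the segment does.
    """
    turns = sorted(turns)
    starts = [start for start, _ in turns]
    labeled = []
    for segment in transcript.get('segments', []):
        index = bisect_right(starts, segment['start']) - 1
        speaker = turns[index][1] if index >= 0 else 'UNKNOWN'
        # Copy the segment so the original transcript is left untouched
        labeled.append({**segment, 'speaker': speaker})
    return {**transcript, 'segments': labeled}


transcript = {'segments': [
    {'start': 0.0, 'text': 'Thanks for making time.'},
    {'start': 12.0, 'text': 'Happy to.'},
]}
labeled = label_segments(transcript, [(0.0, 'Reporter'), (10.0, 'Smith')])
print([s['speaker'] for s in labeled['segments']])
# prints: ['Reporter', 'Smith']
```

Segments that begin before the first noted turn are marked `UNKNOWN` rather than guessed.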
For sensitive interviews or when AI transcription fails:
## Transcript: [Source] - [Date]
**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No
---
[00:00:15] **Q**: [Your question]
[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]
[00:01:30] **Q**: [Follow-up]
[00:01:42] **A**: [Response]
---
## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]
## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
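A transcript typed into this template can still feed the same tooling as an automated one. As a rough sketch, the parser below reads the `[HH:MM:SS] **Q**:` and `[HH:MM:SS] **A**:` lines into structured entries. The function name and the output keys are assumptions, and it expects each question or answer to sit on a single line.

```python
import re

# Matches lines such as: [00:00:45] **A**: response text
LINE_RE = re.compile(r'^\[(\d{2}):(\d{2}):(\d{2})\]\s+\*\*(Q|A)\*\*:\s*(.*)$')


def parse_manual_transcript(text: str) -> list[dict]:
    """Turn Q/A lines from the manual template into a list of entries."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # headings, rules, notes, and quote bullets are skipped
        hours, minutes, seconds, role, body = match.groups()
        entries.append({
            'seconds': int(hours) * 3600 + int(minutes) * 60 + int(seconds),
            'role': 'question' if role == 'Q' else 'answer',
            'text': body,
        })
    return entries
```

Lines under "Potential quotes" start with a bullet, so they are not mistaken for transcript lines.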
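```python
from dataclasses import dataclass
from typing import Optional
import re


@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None


class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes: list[Quote] = []

    def extract_quote(self, transcript: str, start_time: str,
                      end_time: str, speaker: str, context: str) -> Optional[Quote]:
        """Extract and store a quote with metadata.

        Takes the text from the [start_time] marker up to the [end_time]
        marker. If no segment starts at end_time, only the segment at
        start_time is used. Returns None when start_time is not found.
        """
        start_tag = f'[{start_time}]'
        begin = transcript.find(start_tag)
        if begin == -1:
            return None
        begin += len(start_tag)

        end = transcript.find(f'[{end_time}]', begin)
        if end == -1:
            # Fall back to the next timestamp marker, or the end of the transcript
            following = re.search(r'\[\d', transcript[begin:])
            end = begin + following.start() if following else len(transcript)

        # Remove timestamp markers inside the span and collapse whitespace
        text = re.sub(r'\[\d{2}:\d{2}:\d{2}\]', ' ', transcript[begin:end])
        quote = Quote(
            text=' '.join(text.split()),
            timestamp=start_time,
            speaker=speaker,
            context=context
        )
        self.quotes.append(quote)
        return quote

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark quote as verified against original recording.

        Call this only after listening to audio_path at quote.timestamp and
        confirming accuracy; it records the result and does not check the audio.
        """
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = []
        for q in self.quotes:
            if q.verified:
                output.append(f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]')
        return '\n\n'.join(output)
```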
Before publishing any quote:
- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)
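The first item on the checklist goes faster when the fact-checker gets a short clip instead of the full recording. The sketch below only builds an ffmpeg command around a transcript timestamp; running it is left to the caller. The function names and the default window (5 seconds of lead-in, 30 seconds total) are assumptions, not values from the skill.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert an HH:MM:SS transcript timestamp to seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def review_clip_command(audio_path: str, timestamp: str, clip_path: str,
                        lead_in: int = 5, duration: int = 30) -> list[str]:
    """Build an ffmpeg command that cuts a short clip around a quote."""
    start = max(0, timestamp_to_seconds(timestamp) - lead_in)
    return [
        'ffmpeg',
        '-ss', str(start),     # seek to a few seconds before the quote
        '-t', str(duration),   # keep a short window around it
        '-i', audio_path,
        '-c', 'copy',          # copy the audio stream without re-encoding
        clip_path,
    ]


print(review_clip_command('interview.wav', '00:01:42', 'quote-clip.wav'))
```

Pass the returned list to `subprocess.run(..., check=True)` to produce the clip. The lead-in keeps enough surrounding speech to judge whether context was preserved.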
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum


class SourceStatus(Enum):
    ACTIVE = "active"          # Currently engaged
    DORMANT = "dormant"        # Not recently contacted
    DECLINED = "declined"      # Refused to participate
    OFF_RECORD = "off_record"  # Background only


class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"


@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""

    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale


@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""


def find_sources_for_story(sources: List[Source], topic: str,
                           beat: Optional[str] = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue

        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue

        # Check if they've spoken on similar topics (substring match on interview notes)
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break

    # Sort by trust level, most trusted first
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```
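These records live only in memory, so a database built from them needs some way to persist. One possible approach is a JSON round trip. The sketch uses trimmed-down stand-ins (`Status`, `SourceRecord`) instead of the full classes to stay self-contained, and all names in it are illustrative. The point it shows is that `Enum` and `datetime` fields need explicit encoding and decoding.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime
from enum import Enum


class Status(Enum):       # stand-in for SourceStatus
    ACTIVE = "active"
    DORMANT = "dormant"


@dataclass
class SourceRecord:       # trimmed-down stand-in for Source
    name: str
    beat: str
    status: Status
    first_contact: datetime


def to_json(record: SourceRecord) -> str:
    """Serialize a record, encoding Enum and datetime fields as strings."""
    def encode(value):
        if isinstance(value, Enum):
            return value.value
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(asdict(record), default=encode, indent=2)


def from_json(payload: str) -> SourceRecord:
    """Rebuild a record, decoding the string fields back to their types."""
    data = json.loads(payload)
    return SourceRecord(
        name=data['name'],
        beat=data['beat'],
        status=Status(data['status']),
        first_contact=datetime.fromisoformat(data['first_contact']),
    )
```

A source file written this way holds names and contact details in plain text, so it belongs on encrypted storage, not in a shared folder.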
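```python
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed


def batch_transcribe(recordings_dir: str, output_dir: str,
                     hf_token: str | None = None,
                     max_workers: int = 2) -> dict:
    """Process all recordings in a directory.

    Assumes transcribe_interview() from the transcription example is in scope.
    """
    recordings = [
        rec
        for pattern in ('*.wav', '*.mp3', '*.m4a')
        for rec in Path(recordings_dir).glob(pattern)
    ]

    results = {}

    # Threads are enough here: each job only waits on a whisperx subprocess.
    # Every job loads its own copy of the model, so keep max_workers low
    # unless the machine has the RAM/VRAM for more.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(transcribe_interview, str(rec), output_dir,
                            hf_token=hf_token): rec
            for rec in recordings
        }

        # Collect results as jobs finish rather than in submission order
        for future in as_completed(futures):
            rec = futures[future]
            try:
                transcript = future.result()
                results[rec.name] = {
                    'status': 'success',
                    'transcript': transcript
                }
            except Exception as e:
                results[rec.name] = {
                    'status': 'error',
                    'error': str(e)
                }

    return results
```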
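Re-running a batch over a folder that is partly transcribed wastes hours of compute. A small pre-filter, sketched below, lists only the recordings that have no transcript yet. It assumes each transcript is saved as `<recording stem>.json` in the output directory, which is the naming the transcription function above reads back. `pending_recordings` itself is a hypothetical helper.

```python
from pathlib import Path

AUDIO_SUFFIXES = {'.wav', '.mp3', '.m4a'}


def pending_recordings(recordings_dir: str, output_dir: str) -> list[Path]:
    """Return recordings that do not yet have a matching transcript JSON."""
    done = {path.stem for path in Path(output_dir).glob('*.json')}
    return sorted(
        rec for rec in Path(recordings_dir).iterdir()
        if rec.suffix.lower() in AUDIO_SUFFIXES and rec.stem not in done
    )
```

Because matching is by file stem, `a.wav` and `a.mp3` count as the same recording, which is one more reason to follow the naming convention strictly.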
```python
import subprocess
from pathlib import Path


def extract_audio_from_video(video_path: str, output_path: str | None = None) -> str:
    """Extract audio track from video for transcription."""
    if output_path is None:
        # Swap the extension for .wav, leaving dots in directory names alone
        output_path = str(Path(video_path).with_suffix('.wav'))

    subprocess.run([
        'ffmpeg', '-i', video_path,
        '-vn',                   # No video
        '-acodec', 'pcm_s16le',  # 16-bit PCM, the standard WAV encoding
        '-ar', '16000',          # 16 kHz, matching RECORDING_SETTINGS
        '-ac', '1',              # Mono
        output_path
    ], check=True)

    return output_path
```
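Both the extraction step and the transcription step shell out to external programs, and a missing binary otherwise surfaces as a bare `FileNotFoundError` partway through a batch. A short preflight check, sketched here with the hypothetical helper `require_tools`, reports everything that is missing up front.

```python
import shutil


def require_tools(*names: str) -> None:
    """Raise a clear error if any required command is not on PATH."""
    missing = [name for name in names if shutil.which(name) is None]
    if missing:
        raise RuntimeError(
            f"Missing required tools on PATH: {', '.join(missing)}")
```

Calling `require_tools('ffmpeg', 'whisperx')` at the start of a run is one way to use it.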
## Recording consent record
**Date**:
**Source name**:
**Recording type**: [ ] Audio [ ] Video
**Interview type**: [ ] On record [ ] Background [ ] Off record
### Consent obtained:
- [ ] Verbal consent recorded at start of interview
- [ ] Written consent form signed
- [ ] Email confirmation of consent
### Jurisdiction notes:
- Interview location state/country:
- One-party or two-party consent jurisdiction:
- Any specific restrictions agreed:
### Agreed terms:
- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]:
For the per-state breakdown of one-party vs. all-party consent, hidden-recording rules, and federal preemption, use the interview-prep skill (which points to the Reporters Committee for Freedom of the Press Reporter's Recording Guide — the authoritative continuously-updated source).
Always get explicit consent on recording regardless of jurisdiction. Note the consent verbatim at the head of every transcript file (timestamp, speaker, response). This protects you legally everywhere and gives the fact-checker a clean starting point.
| Tool | Purpose | Notes |
|------|---------|-------|
| OpenAI Whisper | Local transcription, no diarization | Free, runs offline. large-v3 is the current best model |
| WhisperX | Whisper + speaker diarization | m-bain/whisperX. Free. Word-level timestamps with speaker IDs. Needs a Hugging Face token for the pyannote model |
| faster-whisper | Speed-optimized Whisper | CTranslate2 backend. ~4x faster than vanilla Whisper at the same accuracy. Used internally by WhisperX |
| whisper.cpp | CPU-friendly Whisper port | C++ implementation. Runs the small/medium models on a Raspberry Pi |
| pyannote.audio | Standalone speaker diarization | Use directly when you already have transcripts from another source |
| MacWhisper / Buzz | GUI wrappers for Whisper | macOS / cross-platform GUIs for journalists who don't want a CLI |
| Otter.ai | Cloud transcription, real-time | Verify privacy posture before using with sensitive sources — Otter Pilot has historically joined meetings unannounced and indexed transcripts; check current settings |
| Descript | Edit audio like text | Good for pulling clips. Cloud-hosted |
| Rev (human + AI) | Human transcription for sensitive material | Slower, more accurate. Cloud-hosted |
| Trint | Journalist-focused, collaboration | Cloud-hosted. Has team features |
| oTranscribe | Free web-based manual transcription aid | Local-only (browser); no upload. Good for off-the-record material you can't hand to a cloud service |
| Field | Value |
|-------|-------|
| version | 1.0.0 |
| created | 2025-12-26 |
| updated | 2026-05-08 |
| author | Joe Amditis |
| domain | journalism, research |
| complexity | intermediate |