claude/skills/clean/SKILL.md
Clean noisy machine-generated text into readable output. Unlike converting (pandoc), cleaning handles fragmented, duplicated, or metadata-heavy formats. Supports VTT and SRT subtitle files via cleansubs. Triggers: 'clean vtt', 'clean srt', 'clean subtitles', 'vtt to text', 'srt to text', 'vtt to markdown', 'clean transcript', 'subtitle to text', 'cleansubs'.
npx skillsauth add kendreaditya/.config clean
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Clean noisy, machine-generated text into readable output.
Converting (e.g. with pandoc) applies to structured data and only changes the syntax. Cleaning applies to noisy data: it strips metadata, deduplicates repeated text, stitches fragmented lines together, and finds paragraph boundaries.
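To make the distinction concrete, here is a minimal sketch of what "cleaning" can involve for a subtitle file: dropping headers and cue timings, removing inline tags, skipping consecutive repeats, and stitching the remaining lines together. This is not the code in scripts/cleansubs.py; the names `clean_lines` and `stitch` are hypothetical, and dropping only exact consecutive repeats is a simpler stand-in for whatever dedup cleansubs actually performs.

```python
import re

# Cue timing lines, e.g. "00:00:01.000 --> 00:00:03.000" (VTT) or with a comma (SRT).
TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}[.,]\d{3}\s+-->")
# Inline markup such as <c>, </c>, or <00:00:00.500> word-level timestamps.
INLINE_TAG = re.compile(r"<[^>]+>")
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:")


def clean_lines(raw: str) -> list[str]:
    """Return caption text only: no headers, timings, tags, or consecutive repeats."""
    kept: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(HEADER_PREFIXES):
            continue  # blank separators and file-level metadata
        if TIMING.match(line) or line.isdigit():
            # Timing lines and numeric cue counters.
            # Caveat: a caption consisting only of digits is dropped too.
            continue
        text = INLINE_TAG.sub("", line).strip()
        if text and (not kept or kept[-1] != text):
            kept.append(text)  # skip a line identical to the previous one
    return kept


def stitch(lines: list[str]) -> str:
    """Join cleaned caption lines into one running paragraph."""
    return " ".join(lines)


SAMPLE = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:04.000
hello world
this is a test
"""

print(stitch(clean_lines(SAMPLE)))
```

A converter would have no reason to drop or merge anything here; the repeated "hello world" line and the inline timing tags are exactly the kind of noise a cleaner has to deal with.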
CLI: cleansubs (at scripts/cleansubs.py)
# Plain text
cleansubs input.vtt # stdout
cleansubs input.srt # SRT directly, no conversion
cleansubs input.vtt -o output.txt # file
# Markdown with metadata (matches yt-research output format)
cleansubs input.vtt --md # stdout
cleansubs input.srt --md --title "Movie" -o t.md # SRT to markdown
cleansubs input.vtt --md --title "T" --id "abc" # with metadata
# Stdin
cat /tmp/VIDEO_ID.en.vtt | cleansubs
cat /tmp/VIDEO_ID.en.vtt | cleansubs --md --title "My Video"
# Batch — multiple files to one markdown
cleansubs /tmp/*.vtt --md --heading "Research" -o combined.md
# Force format when auto-detect is wrong
cleansubs input.vtt --format srt
cleansubs input.vtt --format youtube
Flags:
| Flag | Description |
|------|-------------|
| --md | Output markdown with ## Title, Video ID, URL, transcript, --- |
| --title | Video title (inferred from filename if omitted) |
| --id | Video ID (inferred from yt-dlp filename pattern ID.lang.vtt) |
| --url | Video URL (auto-generated from ID if omitted) |
| --heading | H1 heading for combined batch output |
| --format | Force format: auto (default), youtube, srt, structured_vtt |
| -o | Output file (stdout if omitted) |
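The table says the video ID is inferred from the yt-dlp filename pattern ID.lang.vtt and the URL is generated from the ID. A rough sketch of how that inference could work is shown below. The function name `infer_metadata`, the fallback title rule, and the `https://www.youtube.com/watch?v=` URL template are all assumptions for illustration; the source only states that these values are inferred, not how.

```python
import re
from pathlib import Path

# Assumed shape of a yt-dlp subtitle filename: <id>.<lang>.<ext>, e.g. VIDEO_ID.en.vtt
SUB_NAME = re.compile(
    r"^(?P<id>.+)\.(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\.(?:vtt|srt)$"
)


def infer_metadata(path: str) -> dict:
    """Guess title, video ID, and URL from a subtitle file path (hypothetical helper)."""
    name = Path(path).name
    match = SUB_NAME.match(name)
    if match:
        video_id = match.group("id")
        # Assumption: a YouTube watch URL; the real tool may build it differently.
        url = f"https://www.youtube.com/watch?v={video_id}"
        title = video_id
    else:
        # No ID.lang.ext pattern: fall back to the bare filename as the title.
        video_id = None
        url = None
        title = Path(path).stem
    return {"title": title, "id": video_id, "url": url}


print(infer_metadata("/tmp/VIDEO_ID.en.vtt"))
print(infer_metadata("input.vtt"))
```

Passing --title, --id, or --url explicitly avoids depending on any such guess, which matters for files that were renamed after download.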
Format auto-detection:
- .srt extension or comma-timestamps → SRT pipeline (block-based, timing-gap paragraphs)
- Kind: captions header or no cue IDs → YouTube auto-gen VTT (overlap-merge dedup)

What it cleans:
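The format auto-detection rules above could be sketched roughly as follows. This is an illustration, not the logic in scripts/cleansubs.py: the function name `detect_format`, the way a "cue ID" is recognised (a non-empty, non-header line directly before a timing line), and the fallback to structured_vtt when neither listed rule matches are all assumptions.

```python
import re
from pathlib import Path

# SRT uses a comma before the milliseconds; VTT uses a dot.
SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s+-->")
VTT_TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->")
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:", "NOTE")


def detect_format(filename: str, text: str) -> str:
    """Return 'srt', 'youtube', or 'structured_vtt' for a subtitle file."""
    # Rule 1: .srt extension or comma-timestamps -> SRT pipeline.
    if Path(filename).suffix.lower() == ".srt" or SRT_TIMING.search(text):
        return "srt"

    lines = [ln.strip() for ln in text.splitlines()]
    has_kind_header = any(ln.startswith("Kind: captions") for ln in lines[:5])

    # Assumed definition of a cue ID: a non-empty, non-header line that
    # sits directly above a timing line.
    has_cue_ids = any(
        VTT_TIMING.match(ln)
        and i > 0
        and lines[i - 1]
        and not lines[i - 1].startswith(HEADER_PREFIXES)
        for i, ln in enumerate(lines)
    )

    # Rule 2: Kind: captions header or no cue IDs -> YouTube auto-gen VTT.
    if has_kind_header or not has_cue_ids:
        return "youtube"

    # Assumed fallback: the remaining --format value.
    return "structured_vtt"


print(detect_format("clip.srt", ""))
print(detect_format("clip.vtt", "WEBVTT\nKind: captions\n\n00:00:00.000 --> 00:00:02.000\nhi\n"))
print(detect_format("clip.vtt", "WEBVTT\n\nintro-1\n00:00:00.000 --> 00:00:02.000\nhi\n"))
```

Heuristics like these can misfire on unusual files, which is the case the --format flag covers.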
Use with youtube skill:
# Download subs + clean to plain text
yt-dlp --write-auto-sub --sub-lang en --sub-format vtt --skip-download \
-o "/tmp/%(id)s.%(ext)s" "URL" && cleansubs /tmp/VIDEO_ID.en.vtt
# Download subs + clean to markdown with metadata
yt-dlp --write-auto-sub --sub-lang en --sub-format vtt --skip-download \
-o "/tmp/%(id)s.%(ext)s" "URL" && \
cleansubs /tmp/VIDEO_ID.en.vtt --md --title "Video Title" -o transcript.md
Dependencies: Python 3 (stdlib only — no pip packages)