skills/dlai-transcript-fetcher/SKILL.md
Fetch and organize course transcripts from DeepLearning.AI. Use this skill whenever the user mentions DeepLearning.AI courses, wants to download course transcripts, subtitles, or VTT files from a course, or asks to organize lesson transcripts from learn.deeplearning.ai. It does NOT trigger for general video subtitle downloading — only for DeepLearning.AI courses specifically.
npx skillsauth add duruii/scientific-skills dlai-transcript-fetcher
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Fetches video transcripts (VTT subtitles) from DeepLearning.AI courses and saves them as clean, organized Markdown files.
Required: Chrome DevTools MCP + user logged into DeepLearning.AI
Before using this skill, the user must have:
- Chrome DevTools MCP configured with --auto-connect in ~/.claude.json
- Chrome remote debugging enabled at chrome://inspect/#remote-debugging
See references/mcp-setup.md for the full setup guide.
If the MCP is not configured, tell the user to follow the setup guide first. Do not attempt to fetch transcripts without it — the VTT URL discovery requires the DevTools protocol.
The DeepLearning.AI video hosting infrastructure stores VTT subtitles at predictable paths. This skill discovers those paths by examining network requests on the course lesson page.
Use Chrome DevTools MCP to navigate to the course's first lesson page. Then evaluate this JavaScript to find all video resources:
() => {
  // 'Radixark' is the course ID of the SGLang example; other courses use a different ID.
  return performance.getEntriesByType('resource')
    .filter(r => r.name.includes('Radixark'))
    .map(r => r.name);
}
From the results, extract all unique VTT subtitle URLs. They look like:
https://video.deeplearning.ai/Radixark/C1/L0/subtitle/eng/sc-Radixark-C1-L0-eng.vtt
https://video.deeplearning.ai/Radixark/C1/L1/subtitle/eng/sc-Radixark-C1-L1-eng.vtt
https://video.deeplearning.ai/Radixark/C1/Conclusion/subtitle/eng/sc-Radixark-C1-Conclusion-eng.vtt
Important mapping rules:
- Numbered lessons: L{n}/subtitle/eng/sc-Radixark-C1-L{n}-eng.vtt (L0, L1, L2, ...)
- Conclusion lesson: Conclusion/subtitle/eng/sc-Radixark-C1-Conclusion-eng.vtt
Also extract all lesson page URLs from the sidebar to map slugs to lesson names:
() => {
  const links = document.querySelectorAll('a[href*="/lesson/"]');
  return [...new Set([...links].map(l => l.href))];
}
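The "extract all unique VTT subtitle URLs" step can be sketched in Python, assuming the resource names returned by the first snippet have been copied into a list. The helper name `unique_vtt_urls` is hypothetical and not part of the skill; it keeps only entries whose URL path ends in `.vtt` and drops repeats.

```python
from urllib.parse import urlparse


def unique_vtt_urls(resource_names):
    """Return the .vtt URLs from resource_names, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    for name in resource_names:
        # Compare against the URL path so a query string does not hide the extension.
        path = urlparse(name).path
        if path.lower().endswith(".vtt") and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

Non-subtitle resources (video segments, scripts, images) are filtered out, so the result can be passed straight to the download step.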
Use curl to download each VTT file. Use the -L flag to follow redirects:
curl -s -L -o "temp_L{n}.vtt" "https://video.deeplearning.ai/Radixark/C1/L{n}/subtitle/eng/sc-Radixark-C1-L{n}-eng.vtt"
For the Conclusion lesson:
curl -s -L -o "temp_Conclusion.vtt" "https://video.deeplearning.ai/Radixark/C1/Conclusion/subtitle/eng/sc-Radixark-C1-Conclusion-eng.vtt"
Visit each lesson URL individually and run performance.getEntries to find its L-number. The first lesson visited gives you the CDN domain and COURSE_ID; subsequent lessons increment L{n}.
Alternative shortcut: once you have one VTT URL (e.g. L0 from the introduction page), increment L{n} sequentially and batch-download them all — most courses number lessons consecutively from L0. Verify by checking the HTTP status; skip any that return 403.
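A minimal Python sketch of this shortcut, assuming one numbered VTT URL has already been discovered. The function names `sequential_vtt_urls` and `download_if_available` are hypothetical; the first rewrites the L{n} segments of the known URL, and the second downloads a URL and reports False on a 403 so the caller can skip that lesson.

```python
import re
import urllib.error
import urllib.request

# Matches the lesson segment in both the directory ("/L0/") and the filename ("-L0-" or "-L0.").
LESSON_SEGMENT = re.compile(r"(?<=[/-])L\d+(?=[/.-])")


def sequential_vtt_urls(known_url, count):
    """Build URLs for L0 .. L{count-1} from one discovered numbered-lesson URL."""
    if not LESSON_SEGMENT.search(known_url):
        raise ValueError("no L{n} segment found; named lessons must be discovered individually")
    return [LESSON_SEGMENT.sub(f"L{n}", known_url) for n in range(count)]


def download_if_available(url, dest):
    """Save url to dest; return False when the server answers 403, True on success."""
    try:
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        return True
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return False
        raise
```

Because the whole discovered URL is used as the template, the CDN domain, course ID, and filename suffix are carried over unchanged rather than assumed.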
Write and run a Python script to parse the VTT files and output clean Markdown. The script should be placed in the project folder and run from there:
import re

# Pattern A (SGLang): sc-Radixark-C1-L{n}-eng.vtt
# Pattern B (Claude Code): sc-Anthropic-C3-L{n}.vtt
# The script handles both; just match the actual filename

# Output Markdown filename -> downloaded VTT file
files = {
    'claude-code-01-introduction.md': 'temp_L0.vtt',
    'claude-code-02-what-is-claude-code.md': 'temp_L1.vtt',
    # ... fill in based on discovered lesson names
}

for md_name, vtt_name in files.items():
    with open(vtt_name, 'r', encoding='utf-8') as f:
        content = f.read()

    lines = content.split('\n')
    transcript_lines = []
    skip_patterns = ('WEBVTT', 'X-TIMESTAMP-MAP')

    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line or line.startswith(skip_patterns):
            i += 1
            continue
        if re.match(r'^\d+$', line):  # cue id
            i += 1
            continue
        if '-->' in line:  # timestamp
            i += 1
            continue
        transcript_lines.append(line)
        i += 1

    # Join the caption lines into one paragraph and collapse repeated whitespace
    text = ' '.join(transcript_lines)
    text = re.sub(r'\s+', ' ', text).strip()

    with open(md_name, 'w', encoding='utf-8') as f:
        f.write(f'# {md_name.replace(".md","").replace("-"," ").title()}\n\n')
        f.write(f'> Source: DeepLearning.AI\n\n---\n\n{text}\n')
Save all .md files to a transcripts/ subdirectory within the target project folder. Original .vtt files can be kept alongside for reference.
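One way to route output into the transcripts/ subdirectory is sketched below. The helper name `transcript_path` is hypothetical; it assumes the caller passes the project folder and the Markdown filename, and it creates the directory if it does not exist yet.

```python
from pathlib import Path


def transcript_path(project_dir, md_name):
    """Return <project_dir>/transcripts/<md_name>, creating the directory if needed."""
    out_dir = Path(project_dir) / "transcripts"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / md_name
```

In the parsing script, opening `transcript_path('.', md_name)` instead of `md_name` for writing would place each file under transcripts/ relative to the folder the script runs from.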
VTT URLs are discovered via performance.getEntries on the lesson page — do not guess the URL structure. However, after discovering one VTT URL, you can infer the pattern for that course.
Each course has its own CDN domain and course ID. From a VTT URL:
https://{CDN}/{COURSE_ID}/{LESSON_ID}/subtitle/eng/sc-{COURSE_ID}-{LESSON_ID}[-eng].vtt
Examples from real courses:
| Course | CDN | Course ID | VTT Pattern |
|--------|-----|-----------|-------------|
| SGLang | video.deeplearning.ai | Radixark/C1 | sc-Radixark-C1-L{n}[-eng].vtt |
| Claude Code | dyckms5inbsqq.cloudfront.net | Anthropic/C3 | sc-Anthropic-C3-L{n}.vtt |
- Numbered lessons use L{n} (L0, L1, L2, ...).
- The Conclusion lesson uses Conclusion (literal word) instead of a number. Always discover it from network requests.
Some courses use sc-{COURSE_ID}-L{n}.vtt (no suffix), others use sc-{COURSE_ID}-L{n}-eng.vtt. Always check the actual discovered URL: the script should extract from what exists, not assume a suffix.
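The URL pattern above can be taken apart with a short Python sketch, assuming discovered URLs follow the {CDN}/{COURSE_ID}/{LESSON_ID}/subtitle/{lang}/{filename} layout. The function name `parse_vtt_url` and the returned keys are hypothetical. The parts are read from the URL itself, so nothing about the CDN, course ID, or suffix is hardcoded.

```python
import re
from urllib.parse import urlparse

# COURSE_ID may contain slashes (e.g. "Radixark/C1"); LESSON_ID is the last segment before /subtitle/.
VTT_PATH = re.compile(
    r"^/(?P<course_id>.+)/(?P<lesson_id>[^/]+)/subtitle/(?P<lang>[^/]+)/(?P<filename>[^/]+\.vtt)$"
)


def parse_vtt_url(url):
    """Split a discovered VTT URL into CDN, course ID, lesson ID, and suffix information."""
    parsed = urlparse(url)
    match = VTT_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"URL does not follow the expected subtitle layout: {url}")
    lang = match["lang"]
    return {
        "cdn": parsed.netloc,
        "course_id": match["course_id"],
        "lesson_id": match["lesson_id"],
        "filename": match["filename"],
        # True when the filename ends in "-eng.vtt" (or the matching language code).
        "has_lang_suffix": match["filename"].endswith(f"-{lang}.vtt"),
    }
```

The `lesson_id` value is either a numbered ID such as L0 or a named one such as Conclusion, which makes it easy to separate lessons that can be incremented from those that must be discovered individually.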
Use performance.getEntries to find at least one VTT URL.
403 on some VTT URLs: If a lesson returns 403, skip it; quiz lessons typically have no transcript. Some courses also have non-video pages (e.g., "course-notes", "prompts & summaries") that have no VTT.
Named lesson IDs: "Conclusion" uses literal "Conclusion" instead of a number. Some courses may have other named lessons. Always discover via performance.getEntries.
VTT filename suffix varies: Some courses use sc-{id}-L{n}.vtt (no suffix), others use sc-{id}-L{n}-eng.vtt. Always discover the actual filename from the lesson page — do not assume either form.
CDN and COURSE_ID vary per course: Each course has its own CDN domain and course ID. Never hardcode video.deeplearning.ai or Radixark/C1 — extract from the actual VTT URL discovered on the page.
Sidebar lesson count ≠ VTT count: Courses often have non-video lessons (course-notes, prompts & summaries, quiz) that won't have VTT files. Only count the L-numbered lessons.