letta-ai/transcribe

Name: transcribe
Author: letta-ai

tools/transcribe/SKILL.md

npx skillsauth add letta-ai/skills transcribe

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under output/transcribe/ when working in this repo.

Decision rules

Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
If audio is longer than ~30 seconds, keep --chunking-strategy auto.
Prompting is not supported for gpt-4o-transcribe-diarize.
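
The decision rules above can be sketched as a small helper that turns a request into CLI flags. This is an illustrative sketch, not part of the skill: the function name build_cli_flags and its parameters are hypothetical, and it assumes the CLI accepts the default model and chunking strategy when they are passed explicitly.

```python
from typing import List


def build_cli_flags(want_speaker_labels: bool, long_audio: bool = True) -> List[str]:
    """Map the decision rules onto CLI flags (flag names come from the rules above)."""
    if want_speaker_labels:
        # Diarization uses the dedicated model and its JSON format.
        # No prompt is added because prompting is not supported for this model.
        flags = ["--model", "gpt-4o-transcribe-diarize",
                 "--response-format", "diarized_json"]
    else:
        # Fast default: plain text from the mini model.
        flags = ["--model", "gpt-4o-mini-transcribe",
                 "--response-format", "text"]
    if long_audio:
        # Audio longer than roughly 30 seconds keeps automatic chunking.
        flags += ["--chunking-strategy", "auto"]
    return flags
```

Calling build_cli_flags(True) yields the diarization flags plus the chunking flag; build_cli_flags(False, long_audio=False) yields only the fast text defaults.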

Output conventions

Use output/transcribe/<job-id>/ for evaluation runs.
Use --out-dir for multiple files to avoid overwriting.
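
One way to keep evaluation runs from overwriting each other is to derive the --out-dir value from a job id. The sketch below is an assumption about how a caller might do this; job_output_dir and out_dir_flags are hypothetical names, and the id validation is an added safety check rather than something the skill specifies.

```python
from pathlib import Path
from typing import List


def job_output_dir(job_id: str, root: Path = Path("output/transcribe")) -> Path:
    """Return output/transcribe/<job-id>, rejecting ids that would escape the root."""
    if not job_id or "/" in job_id or "\\" in job_id or job_id in {".", ".."}:
        raise ValueError(f"invalid job id: {job_id!r}")
    return root / job_id


def out_dir_flags(job_id: str) -> List[str]:
    """Build the --out-dir flag pair for one job."""
    # as_posix() keeps forward slashes on every platform.
    return ["--out-dir", job_output_dir(job_id).as_posix()]
```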

Dependencies (install if missing)

Prefer uv for dependency management.

uv pip install openai

If uv is unavailable:

python3 -m pip install openai
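
The "install if missing, prefer uv" rule can be expressed as a check that returns the command to run. This is a minimal sketch: install_command is a hypothetical helper, and it substitutes the current interpreter (sys.executable) for the literal python3 in the fallback command.

```python
import importlib.util
import shutil
import sys
from typing import List, Optional


def install_command(package: str = "openai") -> Optional[List[str]]:
    """Return the install command to run, or None if the package is importable."""
    if importlib.util.find_spec(package) is not None:
        return None  # already installed, nothing to do
    if shutil.which("uv"):
        # Preferred path: uv is available on PATH.
        return ["uv", "pip", "install", package]
    # Fallback: pip under the interpreter that is running this code.
    return [sys.executable, "-m", "pip", "install", package]
```

The function only builds the command; running it (for example with subprocess.run) is left to the caller so the user can confirm first.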

Environment

OPENAI_API_KEY must be set for live API calls.
If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
Never ask the user to paste the full key in chat.
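
A pre-flight check along these lines can enforce the rules above without ever handling the key's value. It is a sketch under assumptions: api_key_problem is a hypothetical name, and treating a blank value as missing is an added choice.

```python
import os
from typing import Mapping, Optional


def api_key_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a user-facing message if OPENAI_API_KEY is unset or blank, else None.

    The key's value is never returned, logged, or echoed.
    """
    if env.get("OPENAI_API_KEY", "").strip():
        return None
    return (
        "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI and "
        "export it in your shell; do not paste it into this chat."
    )
```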

Skill path (set once)

# Point at the CLI script inside the directory containing this SKILL.md
export TRANSCRIBE_CLI="<path-to-skill>/scripts/transcribe_diarize.py"

Replace <path-to-skill> with the actual skill installation directory (e.g. .skills/transcribe or ~/.letta/skills/transcribe).
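
If the installation directory is not known in advance, a lookup over the two example locations can locate the script. This is a sketch only: find_cli and CANDIDATE_DIRS are hypothetical, and the two directories are just the examples given above, not an exhaustive list of install locations.

```python
from pathlib import Path
from typing import Iterable, Optional

# Example locations taken from the text above; real installs may differ.
CANDIDATE_DIRS = (
    Path(".skills/transcribe"),
    Path.home() / ".letta" / "skills" / "transcribe",
)


def find_cli(skill_dirs: Iterable[Path] = CANDIDATE_DIRS) -> Optional[Path]:
    """Return the first existing scripts/transcribe_diarize.py, or None."""
    for skill_dir in skill_dirs:
        script = skill_dir / "scripts" / "transcribe_diarize.py"
        if script.is_file():
            return script
    return None
```

The returned path is the value to export as TRANSCRIBE_CLI.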

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
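
The repeated --known-speaker "Name=path" arguments and the four-speaker limit can be generated and checked programmatically. The helper below is a hypothetical sketch; rejecting names that contain "=" assumes the CLI splits each value on the first "=", which the text does not confirm.

```python
from typing import Dict, List

MAX_KNOWN_SPEAKERS = 4  # limit stated in the quick start above


def known_speaker_flags(speakers: Dict[str, str]) -> List[str]:
    """Turn {name: reference_audio_path} into repeated --known-speaker flags."""
    if len(speakers) > MAX_KNOWN_SPEAKERS:
        raise ValueError(f"at most {MAX_KNOWN_SPEAKERS} known speakers are supported")
    flags: List[str] = []
    for name, ref_path in speakers.items():
        if not name or "=" in name:
            # Assumed parsing rule: the name is everything before the first "=".
            raise ValueError(f"invalid speaker name: {name!r}")
        flags += ["--known-speaker", f"{name}={ref_path}"]
    return flags
```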

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt

Reference map

references/api.md: supported formats, limits, response formats, and known-speaker notes.

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

95 stars

content-media

Updated May 5, 2026

npx skillsauth add letta-ai/skills transcribe

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner. Clean, 95%.
Semgrep: static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): open source security scanning. Skipped, 50%.
Socket.dev: supply chain security analysis. Skipped, 50%.
VirusTotal: multi-engine malware detection. Skipped, 50%.
CrowdStrike: advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check: Skipped, 50%.

Last scanned: May 5, 2026, 7:51 AM (128.3s, 5 files scanned)

Install manually

# Clone the repo
git clone https://github.com/letta-ai/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/tools/transcribe ~/.claude/skills/

Repository: letta-ai/skills (95 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT