Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

oimiragieo/telegram-voice-pipeline

Name: telegram-voice-pipeline
Author: oimiragieo

.claude/skills/_archive/telegram-voice-pipeline/SKILL.md

npx skillsauth add oimiragieo/agent-studio telegram-voice-pipeline

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Telegram Voice Pipeline

Overview

This skill handles Telegram voice messages end-to-end:

Detect voice message via attachment_file_id in channel tag
Download OGG/Opus file via Telegram MCP tool
Transcribe with Whisper (transcribe-anything)
Process transcribed text as the user query — generate a text response
Generate TTS audio via ElevenLabs (fallback: OpenAI TTS) → MP3
Reply with the MP3 audio file via Telegram MCP tool

When to Invoke

Skill({ skill: 'telegram-voice-pipeline' });

Invoke when:

A Telegram channel tag has an attachment_file_id attribute (voice or audio message)
The channel agent needs to respond with audio rather than text
Any step in the voice pipeline needs to be re-run after a failure

Detection Pattern

Telegram voice messages arrive as channel tags:

<channel source="telegram" chat_id="123456" message_id="789" user="username" ts="1234567890" attachment_file_id="BQACAgIAAxkBAAIBc2...">

Key detection logic:

attachment_file_id present → voice/audio message → invoke this skill
image_path present → photo → use image handling instead
Neither present → text message → normal text response

Workflow

Step 1: Download the Voice File

Call the Telegram MCP download tool with the file_id from the channel tag:

// MCP tool call (agent uses this directly)
mcp__telegram - relay__download_attachment({ file_id: '<attachment_file_id>' });
// Returns: local file path, e.g. /tmp/voice_abc123.ogg

Verify: The returned path exists and is non-empty (> 1KB for a real voice message).

Error handling: If download fails, reply with a text message: "Sorry, I couldn't download your voice message. Please try again."

Step 2: Transcribe with Whisper

Install transcribe-anything if not present:

pip install transcribe-anything

Run transcription:

transcribe-anything /tmp/voice_abc123.ogg --model medium --output_dir /tmp/tg_voice/

Read the transcript:

cat /tmp/tg_voice/voice_abc123.txt

Model selection (trade-off between speed and accuracy):

| Model | Speed | Accuracy | Use when | | ---------- | ----- | -------- | ------------------------------ | | tiny | ~2s | Low | Rapid prototyping only | | small | ~5s | Medium | Short messages, speed priority | | medium | ~12s | High | Default — best balance | | large-v3 | ~30s | Best | Long/complex messages |

Override via env: WHISPER_MODEL=small (default: medium)

Verify: /tmp/tg_voice/<filename>.txt exists and is non-empty.

Error handling: If transcription fails or output is empty, reply: "I received your voice message but couldn't transcribe it. Could you try sending it again or type your message?"

Step 3: Process Transcription as User Message

Use the transcribed text as the user input. Generate a text response using the agent's normal response logic.

transcribed_text = contents of /tmp/tg_voice/<filename>.txt
response_text = <agent's generated response to transcribed_text>

Guard max length for TTS: response_text[:4000] (ElevenLabs limit) or response_text[:4096] (OpenAI TTS limit).

Step 4: Generate TTS Audio

Primary: ElevenLabs (requires `ELEVENLABS_API_KEY`)

import os
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
audio = client.text_to_speech.convert(
    text=response_text[:4000],
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George — clear, neutral voice
    model_id="eleven_turbo_v2",
    output_format="mp3_44100_128",
)
output_path = "/tmp/tg_voice_response.mp3"
with open(output_path, "wb") as f:
    for chunk in audio:
        f.write(chunk)
print(f"TTS written to {output_path}")

Override voice via env: ELEVENLABS_VOICE_ID=<voice_id> (default: JBFqnCBsd6RMkjVDRZzb)

Fallback: OpenAI TTS (requires `OPENAI_API_KEY`, no `ELEVENLABS_API_KEY`)

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input=response_text[:4096],
) as response:
    response.stream_to_file("/tmp/tg_voice_response.mp3")
print("TTS written to /tmp/tg_voice_response.mp3")

Selection logic:

if os.environ.get("ELEVENLABS_API_KEY"):
    # Use ElevenLabs
else:
    # Use OpenAI TTS fallback

Verify: /tmp/tg_voice_response.mp3 exists and is > 1KB.

Error handling: If TTS fails, send response_text as a plain text reply instead of audio.

Step 5: Reply with Audio File

// MCP tool call
mcp__telegram -
  relay__reply({
    chat_id: '<chat_id from channel tag>',
    text: response_text, // Also send the transcript so user can read it
    files: ['/tmp/tg_voice_response.mp3'],
  });

Note: Including text alongside the audio file gives the user both a readable transcript and the audio reply — useful for accessibility and noisy environments.

Verify: No error returned from the reply tool.

Step 6: Cleanup Temp Files

After a successful reply, clean up to avoid disk accumulation:

rm -f /tmp/tg_voice_response.mp3
rm -rf /tmp/tg_voice/

Environment Variables

| Variable | Required | Default | Purpose | | --------------------- | ---------------------- | ---------------------- | ------------------------- | | ELEVENLABS_API_KEY | NO (if OpenAI set) | — | ElevenLabs TTS API key | | ELEVENLABS_VOICE_ID | NO | JBFqnCBsd6RMkjVDRZzb | ElevenLabs voice (George) | | OPENAI_API_KEY | NO (if ElevenLabs set) | — | OpenAI TTS fallback key | | WHISPER_MODEL | NO | medium | Whisper model size |

At least one of ELEVENLABS_API_KEY or OPENAI_API_KEY must be set for TTS to work.

Full Pipeline Example

[Telegram] User sends 10-second voice message
   ↓
[Agent] Detects attachment_file_id in channel tag
   ↓
[MCP] download_attachment(file_id) → /tmp/voice_abc123.ogg
   ↓
[Bash] transcribe-anything /tmp/voice_abc123.ogg --model medium → "What is the weather like today?"
   ↓
[Agent] Generates response: "I don't have real-time weather data, but I can help you check..."
   ↓
[Python] ElevenLabs TTS → /tmp/tg_voice_response.mp3
   ↓
[MCP] reply(chat_id, text="I don't have...", files=["/tmp/tg_voice_response.mp3"])
   ↓
[Telegram] User receives text + audio reply
   ↓
[Bash] rm /tmp/tg_voice_response.mp3 && rm -rf /tmp/tg_voice/

Total time for 10-second voice message: ~15-25 seconds (download 1s + transcribe 12s + TTS 2s + reply 1s).

Anti-Patterns

Never skip the download step — attachment_file_id is not a file path, it must be resolved via the MCP tool
Never use shell: true for subprocess calls in transcription — use array args with shell: false
Never skip the text fallback — if TTS fails, always send the text response so the user gets an answer
Never process image attachments through this skill — photos use image_path, not attachment_file_id; route them differently
Never leave temp files on disk — clean up after every reply to prevent disk accumulation

Related Skills

enable-telegram — Start the channel daemon for background Telegram monitoring
transcription — Whisper transcription workflow (used in Step 2)
tts-generation — ElevenLabs and OpenAI TTS (used in Step 4)

References

Telegram MCP tools: mcp__telegram-relay__download_attachment, mcp__telegram-relay__reply
transcribe-anything: https://github.com/modal-labs/transcribe-anything
ElevenLabs Python SDK: https://github.com/elevenlabs/elevenlabs-python
OpenAI TTS docs: https://platform.openai.com/docs/guides/text-to-speech

Search Protocol

For code discovery and search tasks, follow this priority order:

pnpm search:code "<query>" (Primary intent-based search).
ripgrep (for exact keyword/regex matches).
semantic/structural search via code tools if available.

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md
cat .claude/context/memory/decisions.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

oimiragieo/telegram-voice-pipeline

.claude/skills/_archive/telegram-voice-pipeline/SKILL.md

End-to-end voice message pipeline for Telegram — download OGG attachment, transcribe with Whisper, generate a text response, convert to MP3 via ElevenLabs TTS, and reply with the audio file.

23 stars

devops

Updated Apr 7, 2026

$ install --global

skillsauth

npx skillsauth add oimiragieo/agent-studio telegram-voice-pipeline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 7, 2026, 8:08 PM5.8s10 files scanned

SKILL.md

name:: telegram-voice-pipeline
description:: End-to-end voice message pipeline for Telegram — download OGG attachment, transcribe with Whisper, generate a text response, convert to MP3 via ElevenLabs TTS, and reply with the audio file.
version:: 1.0.0
invoked_by:: both
user_invocable:: false
tools:: [Bash, Read, TaskUpdate]
agents:: [developer, general-assistant]
category:: External Integrations
tags:: [telegram, voice, tts, whisper, elevenlabs, speech-to-text, audio, channels]
error_handling:: strict
streaming:: not-applicable

Telegram Voice Pipeline

Overview

This skill handles Telegram voice messages end-to-end:

Detect voice message via attachment_file_id in channel tag
Download OGG/Opus file via Telegram MCP tool
Transcribe with Whisper (transcribe-anything)
Process transcribed text as the user query — generate a text response
Generate TTS audio via ElevenLabs (fallback: OpenAI TTS) → MP3
Reply with the MP3 audio file via Telegram MCP tool

When to Invoke

Skill({ skill: 'telegram-voice-pipeline' });

Invoke when:

A Telegram channel tag has an attachment_file_id attribute (voice or audio message)
The channel agent needs to respond with audio rather than text
Any step in the voice pipeline needs to be re-run after a failure

Detection Pattern

Telegram voice messages arrive as channel tags:

<channel source="telegram" chat_id="123456" message_id="789" user="username" ts="1234567890" attachment_file_id="BQACAgIAAxkBAAIBc2...">

Key detection logic:

attachment_file_id present → voice/audio message → invoke this skill
image_path present → photo → use image handling instead
Neither present → text message → normal text response

Workflow

Step 1: Download the Voice File

Call the Telegram MCP download tool with the file_id from the channel tag:

// MCP tool call (agent uses this directly)
mcp__telegram - relay__download_attachment({ file_id: '<attachment_file_id>' });
// Returns: local file path, e.g. /tmp/voice_abc123.ogg

Verify: The returned path exists and is non-empty (> 1KB for a real voice message).

Error handling: If download fails, reply with a text message: "Sorry, I couldn't download your voice message. Please try again."

Step 2: Transcribe with Whisper

Install transcribe-anything if not present:

pip install transcribe-anything

Run transcription:

transcribe-anything /tmp/voice_abc123.ogg --model medium --output_dir /tmp/tg_voice/

Read the transcript:

cat /tmp/tg_voice/voice_abc123.txt

Model selection (trade-off between speed and accuracy):

Override via env: WHISPER_MODEL=small (default: medium)

Verify: /tmp/tg_voice/<filename>.txt exists and is non-empty.

Error handling: If transcription fails or output is empty, reply: "I received your voice message but couldn't transcribe it. Could you try sending it again or type your message?"

Step 3: Process Transcription as User Message

Use the transcribed text as the user input. Generate a text response using the agent's normal response logic.

transcribed_text = contents of /tmp/tg_voice/<filename>.txt
response_text = <agent's generated response to transcribed_text>

Guard max length for TTS: response_text[:4000] (ElevenLabs limit) or response_text[:4096] (OpenAI TTS limit).

Step 4: Generate TTS Audio

Primary: ElevenLabs (requires `ELEVENLABS_API_KEY`)

import os
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
audio = client.text_to_speech.convert(
    text=response_text[:4000],
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George — clear, neutral voice
    model_id="eleven_turbo_v2",
    output_format="mp3_44100_128",
)
output_path = "/tmp/tg_voice_response.mp3"
with open(output_path, "wb") as f:
    for chunk in audio:
        f.write(chunk)
print(f"TTS written to {output_path}")

Override voice via env: ELEVENLABS_VOICE_ID=<voice_id> (default: JBFqnCBsd6RMkjVDRZzb)

Fallback: OpenAI TTS (requires `OPENAI_API_KEY`, no `ELEVENLABS_API_KEY`)

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input=response_text[:4096],
) as response:
    response.stream_to_file("/tmp/tg_voice_response.mp3")
print("TTS written to /tmp/tg_voice_response.mp3")

Selection logic:

if os.environ.get("ELEVENLABS_API_KEY"):
    # Use ElevenLabs
else:
    # Use OpenAI TTS fallback

Verify: /tmp/tg_voice_response.mp3 exists and is > 1KB.

Error handling: If TTS fails, send response_text as a plain text reply instead of audio.

Step 5: Reply with Audio File

// MCP tool call
mcp__telegram -
  relay__reply({
    chat_id: '<chat_id from channel tag>',
    text: response_text, // Also send the transcript so user can read it
    files: ['/tmp/tg_voice_response.mp3'],
  });

Note: Including text alongside the audio file gives the user both a readable transcript and the audio reply — useful for accessibility and noisy environments.

Verify: No error returned from the reply tool.

Step 6: Cleanup Temp Files

After a successful reply, clean up to avoid disk accumulation:

rm -f /tmp/tg_voice_response.mp3
rm -rf /tmp/tg_voice/

Environment Variables

At least one of ELEVENLABS_API_KEY or OPENAI_API_KEY must be set for TTS to work.

Full Pipeline Example

[Telegram] User sends 10-second voice message
   ↓
[Agent] Detects attachment_file_id in channel tag
   ↓
[MCP] download_attachment(file_id) → /tmp/voice_abc123.ogg
   ↓
[Bash] transcribe-anything /tmp/voice_abc123.ogg --model medium → "What is the weather like today?"
   ↓
[Agent] Generates response: "I don't have real-time weather data, but I can help you check..."
   ↓
[Python] ElevenLabs TTS → /tmp/tg_voice_response.mp3
   ↓
[MCP] reply(chat_id, text="I don't have...", files=["/tmp/tg_voice_response.mp3"])
   ↓
[Telegram] User receives text + audio reply
   ↓
[Bash] rm /tmp/tg_voice_response.mp3 && rm -rf /tmp/tg_voice/

Total time for 10-second voice message: ~15-25 seconds (download 1s + transcribe 12s + TTS 2s + reply 1s).

Anti-Patterns

Never skip the download step — attachment_file_id is not a file path, it must be resolved via the MCP tool
Never use shell: true for subprocess calls in transcription — use array args with shell: false
Never skip the text fallback — if TTS fails, always send the text response so the user gets an answer
Never process image attachments through this skill — photos use image_path, not attachment_file_id; route them differently
Never leave temp files on disk — clean up after every reply to prevent disk accumulation

Related Skills

enable-telegram — Start the channel daemon for background Telegram monitoring
transcription — Whisper transcription workflow (used in Step 2)
tts-generation — ElevenLabs and OpenAI TTS (used in Step 4)

References

Telegram MCP tools: mcp__telegram-relay__download_attachment, mcp__telegram-relay__reply
transcribe-anything: https://github.com/modal-labs/transcribe-anything
ElevenLabs Python SDK: https://github.com/elevenlabs/elevenlabs-python
OpenAI TTS docs: https://platform.openai.com/docs/guides/text-to-speech

Search Protocol

For code discovery and search tasks, follow this priority order:

pnpm search:code "<query>" (Primary intent-based search).
ripgrep (for exact keyword/regex matches).
semantic/structural search via code tools if available.

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md
cat .claude/context/memory/decisions.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

Related Skills

oimiragieo/neurokit2

tools

VerifiedTrustedCommunity

Comprehensive biosignal processing toolkit for analyzing physiological data including ECG, EEG, EDA, RSP, PPG, EMG, and EOG signals. Use this skill when processing cardiovascular signals, brain activity, electrodermal responses, respiratory patterns, muscle activity, or eye movements. Applicable for heart rate variability analysis, event-related potentials, complexity measures, autonomic nervous system assessment, psychophysiology research, and multi-modal physiological signal integration.

24SKILL.mdUpdated Apr 15, 2026

oimiragieo/networkx

tools

VerifiedTrustedCommunity

Comprehensive toolkit for creating, analyzing, and visualizing complex networks and graphs in Python. Use when working with network/graph data structures, analyzing relationships between entities, computing graph algorithms (shortest paths, centrality, clustering), detecting communities, generating synthetic networks, or visualizing network topologies. Applicable to social networks, biological networks, transportation systems, citation networks, and any domain involving pairwise relationships.

24SKILL.mdUpdated Apr 15, 2026

oimiragieo/molfeat

data-ai

VerifiedTrustedCommunity

Molecular featurization for ML (100+ featurizers). ECFP, MACCS, descriptors, pretrained models (ChemBERTa), convert SMILES to features, for QSAR and molecular ML.

24SKILL.mdUpdated Apr 15, 2026

oimiragieo/modal

development

VerifiedTrustedCommunity

Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.

24SKILL.mdUpdated Apr 15, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/oimiragieo/agent-studio.git

# Copy into Claude Code skills folder (global)
cp -r agent-studio/.claude/skills/_archive/telegram-voice-pipeline ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

oimiragieo/agent-studio

23 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

oimiragieo/telegram-voice-pipeline

$ install --global

Security Scan Results

SKILL.md

Telegram Voice Pipeline

Overview

When to Invoke

Detection Pattern

Workflow

Step 1: Download the Voice File

Step 2: Transcribe with Whisper

Step 3: Process Transcription as User Message

Step 4: Generate TTS Audio

Primary: ElevenLabs (requires ELEVENLABS_API_KEY)

Fallback: OpenAI TTS (requires OPENAI_API_KEY, no ELEVENLABS_API_KEY)

Step 5: Reply with Audio File

Step 6: Cleanup Temp Files

Environment Variables

Full Pipeline Example

Anti-Patterns

Related Skills

References

Search Protocol

Memory Protocol (MANDATORY)

Related Skills

oimiragieo/neurokit2

oimiragieo/networkx

oimiragieo/molfeat

oimiragieo/modal

oimiragieo/telegram-voice-pipeline

$ install --global

Security Scan Results

SKILL.md

Telegram Voice Pipeline

Overview

When to Invoke

Detection Pattern

Workflow

Step 1: Download the Voice File

Step 2: Transcribe with Whisper

Step 3: Process Transcription as User Message

Step 4: Generate TTS Audio

Primary: ElevenLabs (requires ELEVENLABS_API_KEY)

Fallback: OpenAI TTS (requires OPENAI_API_KEY, no ELEVENLABS_API_KEY)

Step 5: Reply with Audio File

Step 6: Cleanup Temp Files

Environment Variables

Full Pipeline Example

Anti-Patterns

Related Skills

References

Search Protocol

Memory Protocol (MANDATORY)

Related Skills

oimiragieo/neurokit2

oimiragieo/networkx

oimiragieo/molfeat

oimiragieo/modal

Primary: ElevenLabs (requires `ELEVENLABS_API_KEY`)

Fallback: OpenAI TTS (requires `OPENAI_API_KEY`, no `ELEVENLABS_API_KEY`)

Primary: ElevenLabs (requires `ELEVENLABS_API_KEY`)

Fallback: OpenAI TTS (requires `OPENAI_API_KEY`, no `ELEVENLABS_API_KEY`)