Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

reaatech/skills/audio-format-conversion

Name: skills/audio-format-conversion
Author: reaatech

skills/audio-format-conversion/SKILL.md

npx skillsauth add reaatech/voice-agent-kit skills/audio-format-conversion

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Audio Format Conversion

Capability

Handles audio encoding and decoding between various formats (mulaw, linear16, PCM) and sample rates, enabling interoperability between Twilio (mulaw 8kHz) and STT/TTS providers that may use different audio formats.

MCP Tools

| Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | audio.decode | z.object({ data: z.instanceof(Buffer), fromEncoding: z.enum(['mulaw', 'linear16', 'pcm']), toEncoding: z.enum(['mulaw', 'linear16', 'pcm']), sampleRate: z.number() }) | { audio: Buffer, encoding: string, sampleRate: number } | 1000 RPM | | audio.resample | z.object({ data: z.instanceof(Buffer), fromSampleRate: z.number(), toSampleRate: z.number() }) | { audio: Buffer, sampleRate: number } | 1000 RPM | | audio.encode | z.object({ data: z.instanceof(Buffer), encoding: z.enum(['mulaw', 'linear16', 'pcm']), sampleRate: z.number() }) | { data: Buffer } | 1000 RPM | | audio.validate | z.object({ data: z.instanceof(Buffer), expectedEncoding: z.string(), expectedSampleRate: z.number() }) | { valid: boolean, detected?: { encoding: string, sampleRate: number } } | 100 RPM |

Usage Examples

Example 1: Convert Twilio mulaw to linear16 for STT

User intent: Convert inbound Twilio audio for provider that requires linear16

Tool call:

{
  "name": "audio.decode",
  "arguments": {
    "data": "<Buffer mulaw 8kHz>",
    "fromEncoding": "mulaw",
    "toEncoding": "linear16",
    "sampleRate": 8000
  }
}

Expected response:

{
  "audio": "<Buffer linear16 8kHz>",
  "encoding": "linear16",
  "sampleRate": 8000
}

Example 2: Resample audio from 16kHz to 8kHz

User intent: Downsample high-sample-rate audio for Twilio

Tool call:

{
  "name": "audio.resample",
  "arguments": {
    "data": "<Buffer 16kHz audio>",
    "fromSampleRate": 16000,
    "toSampleRate": 8000
  }
}

Expected response:

{
  "audio": "<Buffer 8kHz audio>",
  "sampleRate": 8000
}

Example 3: Encode audio to Twilio format

User intent: Convert TTS output to Twilio-compatible mulaw 8kHz

Tool call:

{
  "name": "audio.encode",
  "arguments": {
    "data": "<Buffer PCM 24kHz from Deepgram>",
    "encoding": "mulaw",
    "sampleRate": 8000
  }
}

Expected response:
```
{
  "data": "<Buffer mulaw 8kHz>"
}
```

Example 4: Validate audio format

User intent: Check if audio matches expected format

Tool call:

{
  "name": "audio.validate",
  "arguments": {
    "data": "<Buffer audio>",
    "expectedEncoding": "mulaw",
    "expectedSampleRate": 8000
  }
}

Expected response:
```
{
  "valid": true
}
```

Error Handling

Known Failure Modes

| Failure | Cause | Recovery | |---------|-------|----------| | Invalid encoding | Unknown format | Return error with supported formats | | Sample rate mismatch | Unexpected sample rate | Auto-detect or return error | | Buffer too small | Incomplete audio chunk | Pad with silence or return error | | Conversion overflow | Values out of range | Clamp values, log warning | | Memory exhaustion | Very large buffer | Process in chunks, return error if too large |

Recovery Strategies

Format errors: Return original buffer with error, let caller decide
Sample rate detection: Use audio metadata or heuristic detection
Large buffers: Process in 20ms chunks to limit memory usage

Security Considerations

PII Handling

Never log raw audio content
Do not store converted audio unless explicitly configured
Process audio in-memory without disk persistence

Permissions

Audio conversion is a local operation (no external API calls)
Rate limit to prevent CPU exhaustion
Validate buffer sizes to prevent memory attacks

Audit Logging

Log conversion operations (format, sample rate, duration)
Track conversion errors by type
Record performance metrics (conversion time)

Audio Format Specifications

Mulaw (µ-law)

Type: Lossy companded audio
Bit depth: 8 bits per sample
Dynamic range: ~14 bits (companded)
Common sample rates: 8000 Hz (telephony)
Use case: Twilio, PSTN, legacy telephony

Linear16 (PCM 16-bit)

Type: Uncompressed PCM
Bit depth: 16 bits per sample
Dynamic range: 96 dB
Common sample rates: 8000, 16000, 22050, 44100, 48000 Hz
Use case: Most STT/TTS providers

PCM (Generic)

Type: Uncompressed PCM
Bit depth: 8, 16, 24, or 32 bits
Endianness: Little-endian (typically)
Use case: Raw audio from various sources

Conversion Matrix

| From → To | Mulaw 8k | Linear16 8k | Linear16 16k | Linear16 24k | Linear16 48k | |-----------|----------|-------------|--------------|--------------|--------------| | Mulaw 8k | — | Decode | Resample+Decode | Resample+Decode | Resample+Decode | | Linear16 8k | Encode | — | Resample | Resample | Resample | | Linear16 16k | Resample+Encode | Resample | — | Resample | Resample | | Linear16 24k | Resample+Encode | Resample | Resample | — | Resample | | Linear16 48k | Resample+Encode | Resample | Resample | Resample | — |

Performance Considerations

Conversion Latency

| Operation | Typical Latency (per 20ms chunk) | |-----------|----------------------------------| | Mulaw ↔ Linear16 | < 1ms | | Resample (2x) | < 2ms | | Resample (3x, 6x) | < 5ms |

Memory Usage

Mulaw: 1 byte per sample (8000 Hz = 160 bytes per 20ms)
Linear16: 2 bytes per sample (8000 Hz = 320 bytes per 20ms)
Linear16 16k: 2 bytes per sample (16000 Hz = 640 bytes per 20ms)

Optimization Tips

Batch conversions — Convert multiple chunks together when possible
Use native bindings — Use optimized native libraries for resampling
Cache conversion state — Reuse resampler instances across chunks
Stream processing — Process audio in small chunks to minimize latency

Metrics and Observability

Key Metrics

| Metric | Type | Description | |--------|------|-------------| | audio.conversions.total | Counter | Total conversions performed | | audio.conversions.errors | Counter | Conversion errors | | audio.conversions.latency_ms | Histogram | Conversion latency | | audio.resampling.total | Counter | Resampling operations |

Tracing

| Span | Attributes | |------|------------| | audio.decode | from_encoding, to_encoding, sample_rate, buffer_size | | audio.resample | from_rate, to_rate, buffer_size | | audio.encode | encoding, sample_rate, buffer_size |

Related Skills

STT Provider Interface
TTS Provider Interface
Twilio Media Streams

reaatech/skills/audio-format-conversion

skills/audio-format-conversion/SKILL.md

# Audio Format Conversion ## Capability Handles audio encoding and decoding between various formats (mulaw, linear16, PCM) and sample rates, enabling interoperability between Twilio (mulaw 8kHz) and STT/TTS providers that may use different audio formats. ## MCP Tools | Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | `audio.decode` | `z.object({ data: z.instanceof(Buffer), fromEncoding: z.enum(['mulaw', 'linear16', 'pcm']), toEncoding: z.enum(['mulaw

tools

Updated Apr 25, 2026

$ install --global

skillsauth

npx skillsauth add reaatech/voice-agent-kit skills/audio-format-conversion

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 25, 2026, 2:57 PM51.6s1 file scanned

SKILL.md

Audio Format Conversion

Capability

MCP Tools

Usage Examples

Example 1: Convert Twilio mulaw to linear16 for STT

User intent: Convert inbound Twilio audio for provider that requires linear16

Tool call:

{
  "name": "audio.decode",
  "arguments": {
    "data": "<Buffer mulaw 8kHz>",
    "fromEncoding": "mulaw",
    "toEncoding": "linear16",
    "sampleRate": 8000
  }
}

Expected response:

{
  "audio": "<Buffer linear16 8kHz>",
  "encoding": "linear16",
  "sampleRate": 8000
}

Example 2: Resample audio from 16kHz to 8kHz

User intent: Downsample high-sample-rate audio for Twilio

Tool call:

{
  "name": "audio.resample",
  "arguments": {
    "data": "<Buffer 16kHz audio>",
    "fromSampleRate": 16000,
    "toSampleRate": 8000
  }
}

Expected response:

{
  "audio": "<Buffer 8kHz audio>",
  "sampleRate": 8000
}

Example 3: Encode audio to Twilio format

User intent: Convert TTS output to Twilio-compatible mulaw 8kHz

Tool call:

{
  "name": "audio.encode",
  "arguments": {
    "data": "<Buffer PCM 24kHz from Deepgram>",
    "encoding": "mulaw",
    "sampleRate": 8000
  }
}

Expected response:
```
{
  "data": "<Buffer mulaw 8kHz>"
}
```

Example 4: Validate audio format

User intent: Check if audio matches expected format

Tool call:

{
  "name": "audio.validate",
  "arguments": {
    "data": "<Buffer audio>",
    "expectedEncoding": "mulaw",
    "expectedSampleRate": 8000
  }
}

Expected response:
```
{
  "valid": true
}
```

Error Handling

Known Failure Modes

Recovery Strategies

Format errors: Return original buffer with error, let caller decide
Sample rate detection: Use audio metadata or heuristic detection
Large buffers: Process in 20ms chunks to limit memory usage

Security Considerations

PII Handling

Never log raw audio content
Do not store converted audio unless explicitly configured
Process audio in-memory without disk persistence

Permissions

Audio conversion is a local operation (no external API calls)
Rate limit to prevent CPU exhaustion
Validate buffer sizes to prevent memory attacks

Audit Logging

Log conversion operations (format, sample rate, duration)
Track conversion errors by type
Record performance metrics (conversion time)

Audio Format Specifications

Mulaw (µ-law)

Type: Lossy companded audio
Bit depth: 8 bits per sample
Dynamic range: ~14 bits (companded)
Common sample rates: 8000 Hz (telephony)
Use case: Twilio, PSTN, legacy telephony

Linear16 (PCM 16-bit)

Type: Uncompressed PCM
Bit depth: 16 bits per sample
Dynamic range: 96 dB
Common sample rates: 8000, 16000, 22050, 44100, 48000 Hz
Use case: Most STT/TTS providers

PCM (Generic)

Type: Uncompressed PCM
Bit depth: 8, 16, 24, or 32 bits
Endianness: Little-endian (typically)
Use case: Raw audio from various sources

Conversion Matrix

Performance Considerations

Conversion Latency

| Operation | Typical Latency (per 20ms chunk) | |-----------|----------------------------------| | Mulaw ↔ Linear16 | < 1ms | | Resample (2x) | < 2ms | | Resample (3x, 6x) | < 5ms |

Memory Usage

Mulaw: 1 byte per sample (8000 Hz = 160 bytes per 20ms)
Linear16: 2 bytes per sample (8000 Hz = 320 bytes per 20ms)
Linear16 16k: 2 bytes per sample (16000 Hz = 640 bytes per 20ms)

Optimization Tips

Batch conversions — Convert multiple chunks together when possible
Use native bindings — Use optimized native libraries for resampling
Cache conversion state — Reuse resampler instances across chunks
Stream processing — Process audio in small chunks to minimize latency

Metrics and Observability

Key Metrics

Tracing

Related Skills

STT Provider Interface
TTS Provider Interface
Twilio Media Streams

Related Skills

reaatech/skills/twilio-media-streams

tools

VerifiedTrustedCommunity

# Twilio Media Streams ## Capability Handles Twilio Media Streams WebSocket connections for real-time bidirectional audio communication, parsing inbound messages, encoding outbound audio, and managing call lifecycle events. ## MCP Tools | Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | `twilio.handleStart` | `z.object({ message: z.object({ event: z.literal('start'), callSid: z.string(), streamSid: z.string(), format: z.string(), tracks: z.array(z.st

SKILL.mdUpdated Apr 25, 2026

reaatech/skills/twilio-media-streams

reaatech/skills/tts-provider-interface

tools

VerifiedTrustedCommunity

# TTS Provider Interface ## Capability Provides a unified interface for text-to-speech (TTS) providers, enabling streaming audio synthesis with first-byte latency tracking, voice selection, and output format conversion. ## MCP Tools | Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | `tts.synthesize` | `z.object({ text: z.string(), config: z.object({ provider: z.string(), voice: z.string().optional(), speed: z.number().optional() }) })` | `{ chunks: A

SKILL.mdUpdated Apr 25, 2026

reaatech/skills/tts-provider-interface

reaatech/skills/telephony-lifecycle

tools

VerifiedTrustedCommunity

# Telephony Lifecycle ## Capability Manages the complete lifecycle of voice calls from TwiML webhook initiation through call completion, including call connect, transfer, conference, and disconnect handling with proper session cleanup. ## MCP Tools | Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | `telephony.generateTwiML` | `z.object({ sessionId: z.string(), wsUrl: z.string().url() })` | `{ twiml: string }` | 100 RPM | | `telephony.handleConnect` |

SKILL.mdUpdated Apr 25, 2026

reaatech/skills/telephony-lifecycle

reaatech/skills/stt-provider-interface

tools

VerifiedTrustedCommunity

# STT Provider Interface ## Capability Provides a unified interface for speech-to-text (STT) providers, enabling real-time streaming transcription with interim results, endpoint detection, and automatic reconnection handling. ## MCP Tools | Tool | Input Schema | Output | Rate Limit | |------|-------------|--------|------------| | `stt.connect` | `z.object({ provider: z.enum(['deepgram', 'aws-transcribe', 'google-cloud']), config: z.object({ apiKey: z.string().optional(), sampleRate: z.number

SKILL.mdUpdated Apr 25, 2026

reaatech/skills/stt-provider-interface

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/reaatech/voice-agent-kit.git

# Copy into Claude Code skills folder (global)
cp -r voice-agent-kit/skills/audio-format-conversion ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

reaatech/voice-agent-kit

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT