skills/pipeline-orchestration/SKILL.md
# Pipeline Orchestration
## Capability

Coordinates the complete voice interaction pipeline, from audio input to audio output. It manages the flow between the STT, MCP client, and TTS stages using an event-driven architecture and enforces a latency budget.
## MCP Tools

| Tool | Input Schema | Output | Rate Limit |
|------|-------------|--------|------------|
| `pipeline.create` | `z.object({ sessionId: z.string(), config: z.object({ stt: z.string(), tts: z.string(), mcp: z.string() }) })` | `{ pipelineId: string, status: 'running' }` | 10 RPM |
| `pipeline.processAudio` | `z.object({ pipelineId: z.string(), chunk: z.instanceof(Buffer) })` | `{ events: PipelineEvent[] }` | 1000 RPM |
| `pipeline.cancel` | `z.object({ pipelineId: z.string() })` | `{ cancelled: boolean }` | 60 RPM |
| `pipeline.status` | `z.object({ pipelineId: z.string() })` | `{ status: string, currentStage: string, latencyMs: number }` | 60 RPM |
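The per-tool rate limits in the table could be enforced in several ways; the source does not say which one the skill uses. The following is a minimal sketch of one option, a sliding-window limiter keyed by tool name. `RATE_LIMITS_RPM` and `SlidingWindowLimiter` are hypothetical names introduced for illustration, and rejecting unknown tools is an assumption.

```typescript
// Per-tool limits copied from the table above (requests per minute).
const RATE_LIMITS_RPM: Record<string, number> = {
  "pipeline.create": 10,
  "pipeline.processAudio": 1000,
  "pipeline.cancel": 60,
  "pipeline.status": 60,
};

// Hypothetical sliding-window limiter; not part of the skill's actual API.
class SlidingWindowLimiter {
  // Timestamps (ms) of accepted calls, per tool.
  private readonly calls = new Map<string, number[]>();
  private readonly windowMs: number;

  constructor(windowMs: number = 60_000) {
    this.windowMs = windowMs;
  }

  // Returns true and records the call if `tool` is under its limit at `nowMs`.
  tryAcquire(tool: string, nowMs: number = Date.now()): boolean {
    const limit = RATE_LIMITS_RPM[tool];
    if (limit === undefined) return false; // assumption: unknown tools are rejected

    // Keep only the calls that still fall inside the window.
    const cutoff = nowMs - this.windowMs;
    const recent = (this.calls.get(tool) ?? []).filter((t) => t > cutoff);

    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs);
    this.calls.set(tool, recent);
    return allowed;
  }
}
```

A caller would check `limiter.tryAcquire("pipeline.create")` before dispatching the tool call. Filtering the timestamp array on every call is simple but linear in the limit; a ring buffer or token bucket would be a reasonable replacement for the 1000 RPM tool.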
Example `pipeline.create` call:

```json
{
  "name": "pipeline.create",
  "arguments": {
    "sessionId": "sess-abc123",
    "config": {
      "stt": "deepgram",
      "tts": "deepgram",
      "mcp": "http://mcp-server:8080"
    }
  }
}
```

Response:

```json
{
  "pipelineId": "pipe-xyz789",
  "status": "running"
}
```
Example `pipeline.processAudio` call:

```json
{
  "name": "pipeline.processAudio",
  "arguments": {
    "pipelineId": "pipe-xyz789",
    "chunk": "<base64-encoded-mulaw-audio>"
  }
}
```

Response:

```json
{
  "events": [
    { "type": "stt:interim", "transcript": "Hello, I need to..." },
    { "type": "stt:final", "transcript": "Hello, I need to reset my password" },
    { "type": "stt:eos" },
    { "type": "mcp:request", "utterance": "Hello, I need to reset my password" },
    { "type": "mcp:response", "text": "I can help with that." },
    { "type": "tts:start" },
    { "type": "tts:first_byte", "latencyMs": 150 },
    { "type": "tts:chunk", "audio": "<base64-audio>" },
    { "type": "tts:complete" },
    { "type": "pipeline:turn:end", "totalLatencyMs": 720 }
  ]
}
```
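The event list above suggests a discriminated union on the `type` field. The sketch below models only the variants that appear in the example and folds one turn into a summary that is checked against a latency budget. The real `PipelineEvent` type may differ, `summarizeTurn` and `TurnSummary` are hypothetical names, and the budget value is supplied by the caller because the source does not state one.

```typescript
// Event shapes inferred from the example response; the real PipelineEvent
// type may carry more fields or more variants.
type PipelineEvent =
  | { type: "stt:interim"; transcript: string }
  | { type: "stt:final"; transcript: string }
  | { type: "stt:eos" }
  | { type: "mcp:request"; utterance: string }
  | { type: "mcp:response"; text: string }
  | { type: "tts:start" }
  | { type: "tts:first_byte"; latencyMs: number }
  | { type: "tts:chunk"; audio: string }
  | { type: "tts:complete" }
  | { type: "pipeline:turn:end"; totalLatencyMs: number };

interface TurnSummary {
  finalTranscript: string | null;
  ttsFirstByteMs: number | null;
  totalLatencyMs: number | null;
  withinBudget: boolean | null; // stays null until the turn has ended
}

// Hypothetical helper: folds one turn's events into a summary and compares
// the reported total latency with a caller-supplied budget.
function summarizeTurn(events: PipelineEvent[], budgetMs: number): TurnSummary {
  const summary: TurnSummary = {
    finalTranscript: null,
    ttsFirstByteMs: null,
    totalLatencyMs: null,
    withinBudget: null,
  };
  for (const event of events) {
    switch (event.type) {
      case "stt:final":
        summary.finalTranscript = event.transcript;
        break;
      case "tts:first_byte":
        summary.ttsFirstByteMs = event.latencyMs;
        break;
      case "pipeline:turn:end":
        summary.totalLatencyMs = event.totalLatencyMs;
        summary.withinBudget = event.totalLatencyMs <= budgetMs;
        break;
      default:
        break; // remaining events carry nothing this summary needs
    }
  }
  return summary;
}
```

Switching on `type` lets the compiler narrow each branch, so `event.latencyMs` is only accessible where the variant actually defines it.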
| Failure | Cause | Recovery |
|---------|-------|----------|
| Pipeline already exists | Duplicate session ID | Return existing pipeline or error |
| Stage timeout | Provider not responding | Cancel stage, emit timeout event, use fallback |
| Backpressure | Audio input faster than processing | Buffer with limits, drop oldest if full |
| Cancellation during MCP | Barge-in detected | Cannot cancel MCP; complete, then handle new utterance |
| Invalid config | Missing provider config | Return validation error with missing fields |
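The backpressure recovery in the table ("buffer with limits, drop oldest if full") can be illustrated with a small bounded queue. This is a sketch under assumptions: `DropOldestBuffer` is a hypothetical class, the capacity is chosen by the caller, and the skill's actual buffering strategy is not shown in the source.

```typescript
// Hypothetical bounded queue for audio chunks: when full, the oldest chunk
// is evicted so the most recent audio is always retained.
class DropOldestBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private dropped = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Adds a chunk; returns the evicted chunk if the buffer was already full.
  push(item: T): T | undefined {
    let evicted: T | undefined;
    if (this.items.length === this.capacity) {
      evicted = this.items.shift();
      this.dropped++;
    }
    this.items.push(item);
    return evicted;
  }

  // Removes and returns the oldest buffered chunk, or undefined if empty.
  shift(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }

  // Number of chunks evicted so far; useful for a backpressure metric.
  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping the oldest chunk favors freshness over completeness, which suits live speech where stale audio has little value. Exposing `droppedCount` lets the caller emit an event or metric when audio is being lost.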