partner-built/zoom-plugin/skills/rtms/SKILL.md
Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.
Background reference for live Zoom media pipelines. Prefer build-zoom-bot first, then use this skill for stream types, capabilities, and RTMS-specific implementation constraints.
Expert guidance for accessing live audio, video, transcript, chat, and screen share data from Zoom meetings, webinars, Video SDK sessions, and Zoom Contact Center Voice in real time. RTMS uses a WebSocket-based protocol built on open standards and does not require a meeting bot to capture the media plane.
RTMS is primarily a backend media ingestion service.
Optional architecture (common):
Use RTMS for media/data plane, and use frontend frameworks/Zoom Apps for presentation + user interactions.
- Official Documentation: https://developers.zoom.us/docs/rtms/
- SDK Reference (JS): https://zoom.github.io/rtms/js/
- SDK Reference (Python): https://zoom.github.io/rtms/py/
- Sample Repository: https://github.com/zoom/rtms-samples
New to RTMS? Follow this path:
Complete Implementation:
Reference:
Having issues?
| Product | Webhook Event | Payload ID | App Type |
|---------|--------------|------------|----------|
| Meetings | meeting.rtms_started / meeting.rtms_stopped | meeting_uuid | General App |
| Webinars | webinar.rtms_started / webinar.rtms_stopped | meeting_uuid (same!) | General App |
| Video SDK | session.rtms_started / session.rtms_stopped | session_id | Video SDK App |
| Zoom Contact Center Voice | Product-specific RTMS/ZCC Voice events | Product-specific stream/session identifiers | Contact Center / approved RTMS integration |
Once connected, the core signaling/media socket model is shared across products. Meetings, webinars, and Video SDK sessions use the familiar start/stop webhooks. Zoom Contact Center Voice adds its own RTMS/ZCC Voice event family and should be treated as the same transport model with product-specific event payloads.
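The table above implies a small routing step before any connection is opened: pick the right identifier field for the event that arrived. A minimal sketch, assuming the start-event payloads carry the fields shown in the table plus `rtms_stream_id`; the helper name `resolveStreamTarget` is hypothetical and not part of any Zoom SDK. Zoom Contact Center Voice is left out because its event names are product-specific.

```javascript
// Which payload field identifies the session for each RTMS start event.
// Mapping copied from the product table; webinars reuse meeting_uuid.
const ID_FIELD_BY_EVENT = new Map([
  ['meeting.rtms_started', 'meeting_uuid'],
  ['webinar.rtms_started', 'meeting_uuid'],
  ['session.rtms_started', 'session_id'],
]);

// Hypothetical helper: returns null for events that are not RTMS start events,
// and throws when a start event arrives without its expected identifier.
function resolveStreamTarget(event, payload) {
  const field = ID_FIELD_BY_EVENT.get(event);
  if (!field) return null;
  const id = payload[field];
  if (!id) throw new Error(`${event} payload is missing ${field}`);
  return { field, id, streamId: payload.rtms_stream_id };
}
```

Failing loudly on a missing identifier is a design choice: a webinar payload inspected for `webinar_uuid` would otherwise produce an undefined value that only surfaces later as a signature mismatch.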
RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions without participant bots. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.
| Media Type | Format | Use Cases |
|------------|--------|-----------|
| Audio | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| Video | H.264, JPG, PNG | Recording, AI vision, thumbnails, active participant selection |
| Screen Share | H.264, JPG, PNG | Content capture, slide extraction |
| Transcript | JSON text | Meeting notes, search, compliance |
| Chat | JSON text | Archive, sentiment analysis |
- Transcript language is controlled with src_language and enable_lid. Default behavior is LID enabled. Set enable_lid: false to force a fixed language.
- Single-participant video applies when data_opt is set to VIDEO_SINGLE_INDIVIDUAL_STREAM.
- For a graceful close, send STREAM_CLOSE_REQ over the signaling socket and wait for STREAM_CLOSE_RESP.

| Approach | Best For | Complexity |
|----------|----------|------------|
| SDK (@zoom/rtms) | Most use cases | Low - handles WebSocket complexity |
| Manual WebSocket | Custom protocols, other languages | High - full protocol implementation |
Need RTMS access? Post in Zoom Developer Forum requesting RTMS access with your use case.
```javascript
import rtms from "@zoom/rtms";

// RTMS start events across products (stop events are not handled here)
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onAudioData((data, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
  });

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  client.onJoinConfirm((reason) => {
    console.log(`Joined session: ${reason}`);
  });

  // SDK handles all WebSocket connections automatically
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});
```
For full control or non-SDK languages, implement the two-phase WebSocket protocol:
```javascript
const express = require('express');
const WebSocket = require('ws');
const crypto = require('crypto');

const app = express();
app.use(express.json()); // parse JSON webhook bodies into req.body

// Credentials come from the environment (ZOOM_CLIENT_ID / ZOOM_CLIENT_SECRET)
const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

// 1. Generate signature
// For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// 2. Handle webhook
app.post('/webhook', (req, res) => {
  res.status(200).send(); // CRITICAL: Respond immediately!

  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    connectToRTMS(payload);
  }
});

// 3. Connect to signaling WebSocket
function connectToRTMS(payload) {
  const { server_urls, rtms_stream_id } = payload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = payload.meeting_uuid || payload.session_id;
  const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);

  const signalingWs = new WebSocket(server_urls);

  signalingWs.on('open', () => {
    signalingWs.send(JSON.stringify({
      msg_type: 1, // Handshake request
      protocol_version: 1,
      meeting_uuid: idValue,
      rtms_stream_id,
      signature,
      media_type: 9 // AUDIO(1) | TRANSCRIPT(8)
    }));
  });

  // ... handle responses, connect to media WebSocket
}
```
See: Manual WebSocket Guide for complete implementation.
Combine types with bitwise OR:
| Type | Value | Description |
|------|-------|-------------|
| Audio | 1 | PCM audio samples |
| Video | 2 | H.264/JPG video frames |
| Screen Share | 4 | Separate from video! |
| Transcript | 8 | Real-time speech-to-text |
| Chat | 16 | In-meeting chat messages |
| All | 32 | All media types |
Example: Audio + Transcript = 1 | 8 = 9
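The bitmask arithmetic can be wrapped in two small helpers so the numeric value never has to be computed by hand. This is an illustrative sketch: the flag values come from the table above, while the constant and function names (`MEDIA_TYPE`, `combineMediaTypes`, `describeMediaTypes`) are hypothetical and not taken from the Zoom SDK.

```javascript
// Flag values from the media type table; the names are illustrative only.
const MEDIA_TYPE = Object.freeze({
  AUDIO: 1,
  VIDEO: 2,
  SCREEN_SHARE: 4,
  TRANSCRIPT: 8,
  CHAT: 16,
  ALL: 32,
});

// Combine named flags with bitwise OR, rejecting unknown names.
function combineMediaTypes(...names) {
  return names.reduce((mask, name) => {
    if (!Object.hasOwn(MEDIA_TYPE, name)) {
      throw new Error(`Unknown media type: ${name}`);
    }
    return mask | MEDIA_TYPE[name];
  }, 0);
}

// List the flag names whose bit is set in a mask (useful when logging a handshake).
function describeMediaTypes(mask) {
  return Object.keys(MEDIA_TYPE).filter((name) => (mask & MEDIA_TYPE[name]) !== 0);
}

console.log(combineMediaTypes('AUDIO', 'TRANSCRIPT')); // 9
console.log(describeMediaTypes(9)); // [ 'AUDIO', 'TRANSCRIPT' ]
```

Because Screen Share has its own bit, requesting `VIDEO` alone (2) does not deliver shared screens; `combineMediaTypes('VIDEO', 'SCREEN_SHARE')` yields 6.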
| Issue | Solution |
|-------|----------|
| Only 1 connection allowed | New connections kick out existing ones. Track active sessions! |
| Respond 200 immediately | If the webhook response is delayed, Zoom retries the event, which creates duplicate connections |
| Heartbeat mandatory | Respond to msg_type 12 with msg_type 13, or connection dies |
| Reconnection is YOUR job | RTMS doesn't auto-reconnect. Media keep-alive tolerance is now about 65s; signaling remains around 60s |
| Transcript language drift | Use src_language plus enable_lid: false when you want fixed-language transcription instead of automatic language switching |
| Single participant video only | VIDEO_SINGLE_INDIVIDUAL_STREAM supports one participant at a time. A new VIDEO_SUBSCRIPTION_REQ overrides the previous selection |
| Graceful close is explicit now | Use STREAM_CLOSE_REQ / STREAM_CLOSE_RESP when your backend wants to terminate the stream cleanly |
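The first two rows of the table suggest keeping a registry of streams that already have a connection, so a retried webhook does not open a second socket and kick out the first. A minimal in-memory sketch, assuming streams are keyed by `rtms_stream_id` and that the stop event carries the same identifier; the class name `ActiveStreams` is hypothetical, and a multi-process deployment would need a shared store instead of a `Map`.

```javascript
// Hypothetical in-memory registry of streams this process is connected to.
class ActiveStreams {
  constructor() {
    this.streams = new Map();
  }

  // Returns true when the caller should open a connection,
  // false when this stream is already being handled (duplicate webhook).
  claim(streamId) {
    if (this.streams.has(streamId)) return false;
    this.streams.set(streamId, { startedAt: Date.now() });
    return true;
  }

  // Call on the matching *.rtms_stopped event or after a clean close.
  release(streamId) {
    return this.streams.delete(streamId);
  }

  get size() {
    return this.streams.size;
  }
}

const active = new ActiveStreams();
console.log(active.claim('stream-1')); // true  -> connect
console.log(active.claim('stream-1')); // false -> retried webhook, skip
console.log(active.release('stream-1')); // true
```

In a webhook handler, the claim would happen after the 200 response has been sent, so that bookkeeping never delays the acknowledgement.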
# Required - Authentication
ZM_RTMS_CLIENT=your_client_id # Zoom OAuth Client ID
ZM_RTMS_SECRET=your_client_secret # Zoom OAuth Client Secret
# Optional - Webhook server
ZM_RTMS_PORT=8080 # Default: 8080
ZM_RTMS_PATH=/webhook # Default: /
# Optional - Logging
ZM_RTMS_LOG_LEVEL=info # error, warn, info, debug, trace
ZM_RTMS_LOG_FORMAT=progressive # progressive or json
ZM_RTMS_LOG_ENABLED=true
ZOOM_CLIENT_ID=your_client_id
ZOOM_CLIENT_SECRET=your_client_secret
ZOOM_SECRET_TOKEN=your_webhook_token # For webhook validation
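`ZOOM_SECRET_TOKEN` is used for two checks in a manual implementation: answering Zoom's endpoint URL validation challenge and verifying the signature on each incoming webhook. The sketch below follows Zoom's documented webhook scheme as best understood here (`endpoint.url_validation` with a `plainToken`, and an `x-zm-signature` header of the form `v0=<hex HMAC>` over `v0:<timestamp>:<body>`); treat the header and field names as assumptions to confirm against the current webhook documentation. The function names are hypothetical.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 of a message, hex encoded.
function hmacHex(secret, message) {
  return crypto.createHmac('sha256', secret).update(message).digest('hex');
}

// Body to return when the webhook event is endpoint.url_validation (assumed event name).
function buildUrlValidationResponse(plainToken, secretToken) {
  return { plainToken, encryptedToken: hmacHex(secretToken, plainToken) };
}

// Compare the x-zm-signature header (assumed name) against a locally computed value.
// rawBody is the request body as a string; timestampHeader is x-zm-request-timestamp.
function isValidZoomSignature({ signatureHeader, timestampHeader, rawBody, secretToken }) {
  const message = `v0:${timestampHeader}:${rawBody}`;
  const expected = Buffer.from(`v0=${hmacHex(secretToken, message)}`);
  const received = Buffer.from(String(signatureHeader || ''));
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

With Express, `rawBody` is commonly taken as `JSON.stringify(req.body)`; capturing the raw bytes in the body parser is more robust if exact byte equality matters.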
Webhook events:

- meeting.rtms_started
- meeting.rtms_stopped
- webinar.rtms_started (if using webinars)
- webinar.rtms_stopped (if using webinars)

Scopes:

- meeting:read:meeting_audio
- meeting:read:meeting_video
- meeting:read:meeting_transcript
- meeting:read:meeting_chat
- webinar:read:webinar_audio (if using webinars)
- webinar:read:webinar_video (if using webinars)
- webinar:read:webinar_transcript (if using webinars)
- webinar:read:webinar_chat (if using webinars)

Video SDK webhook events:

- session.rtms_started
- session.rtms_stopped

| Repository | Description |
|------------|-------------|
| rtms-samples | RTMSManager, boilerplates, AI samples |
| rtms-quickstart-js | JavaScript SDK quickstart |
| rtms-quickstart-py | Python SDK quickstart |
| rtms-sdk-cpp | C++ SDK |
| zoom-rtms | Main SDK repository |
| Sample | Description |
|--------|-------------|
| rtms-meeting-assistant-starter-kit | AI meeting assistant with summaries |
| arlo-meeting-assistant | Production meeting assistant with DB |
| videosdk-rtms-transcribe-audio | Whisper transcription |
Need help? Start with Integrated Index section below for complete navigation.
This section was migrated from SKILL.md.
RTMS provides real-time access to live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions.
Treat RTMS as a backend service for receiving and processing media streams.
Do not model RTMS as a frontend-only SDK.
If you're new to RTMS, follow this order:

1. Run preflight checks first -> RUNBOOK.md
2. Understand the architecture -> concepts/connection-architecture.md
3. Choose your approach -> SDK or Manual
4. Understand the lifecycle -> concepts/lifecycle-flow.md
5. Configure media types -> references/media-types.md
6. Troubleshoot issues -> troubleshooting/common-issues.md
rtms/
├── SKILL.md # Main skill overview
├── SKILL.md # This file - navigation guide
│
├── concepts/ # Core architectural patterns
│ ├── connection-architecture.md # Two-phase WebSocket design
│ └── lifecycle-flow.md # Webhook to streaming flow
│
├── examples/ # Complete working code
│ ├── sdk-quickstart.md # Using @zoom/rtms SDK
│ ├── manual-websocket.md # Raw protocol implementation
│ ├── rtms-bot.md # Complete RTMS bot implementation
│ └── ai-integration.md # Transcription and analysis
│
├── references/ # Reference documentation
│ ├── media-types.md # Audio, video, transcript, chat, share
│ ├── data-types.md # All enums and constants
│ ├── connection.md # WebSocket protocol details
│ └── webhooks.md # Event subscription
│
└── troubleshooting/ # Problem solving guides
└── common-issues.md # FAQ and solutions
- Meetings: meeting.rtms_started. Uses General App with OAuth.
- Webinars: webinar.rtms_started. Payload still uses meeting_uuid (NOT webinar_uuid).
- Video SDK: session.rtms_started. Payload uses session_id (NOT meeting_uuid).

concepts/connection-architecture.md
RTMS uses two separate WebSocket connections:
examples/sdk-quickstart.md vs examples/manual-websocket.md
| SDK | Manual |
|-----|--------|
| Handles WebSocket complexity | Full protocol control |
| Automatic reconnection | DIY reconnection |
| Less code | More code |
| Best for most use cases | Best for custom requirements |
troubleshooting/common-issues.md
Two-Phase WebSocket Design
Webhook Response Timing
Heartbeat is Mandatory
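In a manual implementation the heartbeat amounts to answering every keep-alive request (msg_type 12) with a keep-alive response (msg_type 13) on the socket it arrived on. A minimal sketch, assuming frames are JSON text and that the response echoes the request's `timestamp` field; the echo and the helper name `buildKeepAliveReply` are assumptions, not confirmed protocol details.

```javascript
const KEEP_ALIVE_REQ = 12;
const KEEP_ALIVE_RESP = 13;

// Returns the JSON string to send back, or null when the frame
// is not a keep-alive request (or is not valid JSON).
function buildKeepAliveReply(rawMessage) {
  let msg;
  try {
    msg = JSON.parse(rawMessage.toString());
  } catch {
    return null;
  }
  if (!msg || msg.msg_type !== KEEP_ALIVE_REQ) return null;
  // Assumption: the response carries the request's timestamp back unchanged.
  return JSON.stringify({ msg_type: KEEP_ALIVE_RESP, timestamp: msg.timestamp });
}
```

With the `ws` package, this would typically be wired as `socket.on('message', (data) => { const reply = buildKeepAliveReply(data); if (reply) socket.send(reply); })` on both the signaling and the media socket.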
Signature Generation
- HMAC-SHA256(clientSecret, "clientId,meetingUuid,streamId")
- For Video SDK, use session_id in place of meetingUuid
- Webinars still use meeting_uuid (not webinar_uuid)

Media Types are Bitmasks
Screen Share is SEPARATE from Video
-> Common Issues
-> Webhook timing
-> Media Types - Check configuration
-> Manual WebSocket
-> Data Types
-> AI Integration
Based on Zoom RTMS SDK v1.x and official documentation as of 2026.
Happy coding!
Remember: Start with SDK Quickstart for the fastest path, or Manual WebSocket if you need full control.