beam-ai-team/google-speech-to-text

Name: google-speech-to-text
Author: beam-ai-team

skills/integrations/google/google-speech-to-text/SKILL.md

npx skillsauth add beam-ai-team/beam-next-skills google-speech-to-text


Google Speech-to-Text

Transcribe audio files (MP3, WAV, FLAC, etc.) to text using Google Cloud Speech-to-Text API. Supports short files (sync) and long files (batch via Cloud Storage).

Safety Contract

Before transcribing audio, uploading to Cloud Storage, splitting media, or writing transcripts, show the source audio/video path, language settings, service mode, output path, temporary cloud objects when used, and whether private speech content is included. Require explicit user approval in the current turn before sending audio to Google APIs, uploading media, or writing transcript files. Inspecting file metadata does not require approval.
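The contract above amounts to a summary-then-approve gate. The following is a minimal sketch of that gate, not the skill's own code: `TranscriptionPlan`, `summarize_plan`, and `is_approved` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptionPlan:
    """Hypothetical container for what the contract asks to show up front."""
    source_path: str
    languages: List[str]
    service_mode: str                 # e.g. "sync", "chunked", or "batch"
    output_path: Optional[str]        # None means print to stdout
    temp_cloud_objects: List[str] = field(default_factory=list)
    contains_private_speech: bool = True


def summarize_plan(plan: TranscriptionPlan) -> str:
    """Render the plan as the text shown to the user before asking approval."""
    cloud = ", ".join(plan.temp_cloud_objects) or "none"
    lines = [
        f"Source: {plan.source_path}",
        f"Languages: {', '.join(plan.languages)}",
        f"Service mode: {plan.service_mode}",
        f"Output: {plan.output_path or 'stdout'}",
        f"Temporary cloud objects: {cloud}",
        f"Private speech content: {'yes' if plan.contains_private_speech else 'no'}",
    ]
    return "\n".join(lines)


def is_approved(reply: str) -> bool:
    """Treat only an explicit yes as approval; anything else blocks the run."""
    return reply.strip().lower() in {"y", "yes"}
```

Defaulting `contains_private_speech` to `True` is a deliberate choice in this sketch: when the content is unknown, the summary errs on the side of warning the user.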

Quick Start

Transcribe a short audio file (≤60 seconds)

python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py transcribe path/to/audio.mp3

Transcribe with output to file

python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py transcribe path/to/audio.mp3 --output transcript.txt

Output is automatically formatted into readable paragraphs. Use --no-format for raw output.

Language (auto-detect default)

Default: Auto-detects German + English. Use --language to force a single language:

# Auto-detect (de-DE, en-US) - default
python3 ... transcribe audio.opus

# Force German
python3 ... transcribe audio.opus --language de-DE

# Force English
python3 ... transcribe audio.opus --language en-US
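The language behavior described above can be sketched as a small resolver. This is an illustration only: `resolve_languages` is a hypothetical helper, and how the actual script passes language codes to the API is not shown in this document.

```python
from typing import List, Optional

# The documented auto-detect pair.
DEFAULT_LANGUAGES = ["de-DE", "en-US"]


def resolve_languages(language: Optional[str]) -> List[str]:
    """Return the candidate language codes for one transcription request.

    No --language value means auto-detect between the defaults;
    an explicit value forces that single language.
    """
    if language is None:
        return list(DEFAULT_LANGUAGES)
    return [language]
```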

Long audio – local file (no GCS): `transcribe-long`

For long local audio or video (e.g. MP4, Opus), use transcribe-long. It extracts audio (if video), splits into 60s chunks, transcribes each, and merges. Requires ffmpeg on PATH (or FFMPEG_PATH).

# Video (e.g. MP4) or long audio – German + English by default
python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py transcribe-long /path/to/file.mp4 --output transcript.txt

Use the project’s venv if you installed google-cloud-speech there:

.venv-speech/bin/python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py transcribe-long /path/to/file.mp4 --output transcript.txt
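The extract, split, transcribe, and merge flow described for transcribe-long can be sketched as follows. This is an assumption about the general technique, not the script's actual implementation: `chunk_ranges`, `ffmpeg_chunk_command`, and `merge_transcripts` are hypothetical helpers, and the mono 16 kHz FLAC target is an illustrative choice. The sketch only builds the ffmpeg argument list; it does not run ffmpeg.

```python
from typing import List, Tuple

CHUNK_SECONDS = 60  # matches the sync limit stated in this document


def chunk_ranges(total_seconds: float,
                 chunk_seconds: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs that cover the whole recording."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        return []
    ranges = []
    start = 0.0
    while start < total_seconds:
        duration = min(chunk_seconds, total_seconds - start)
        ranges.append((start, duration))
        start += chunk_seconds
    return ranges


def ffmpeg_chunk_command(ffmpeg: str, source: str, start: float,
                         duration: float, dest: str) -> List[str]:
    """Build (not run) an ffmpeg call that cuts one audio-only chunk."""
    return [
        ffmpeg, "-y",
        "-ss", str(start),     # seek to the chunk start
        "-t", str(duration),   # chunk length in seconds
        "-i", source,
        "-vn",                 # drop any video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        dest,
    ]


def merge_transcripts(parts: List[str]) -> str:
    """Join per-chunk transcripts in order, skipping empty chunks."""
    return " ".join(p.strip() for p in parts if p.strip())
```

Cutting on fixed 60-second boundaries can split a word across two chunks, so a merged transcript may show small artifacts at chunk edges.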

Long audio – GCS batch (up to 8 hours)

For files >60 seconds without local chunking, upload to Google Cloud Storage and use batch mode:

# 1. Upload to GCS (gcloud storage shown; gsutil cp also works)
gcloud storage cp audio.mp3 gs://your-bucket/audio.mp3

# 2. Batch transcribe
python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py transcribe-batch gs://your-bucket/audio.mp3 --output transcript.txt

Pre-Flight Check

Speech-to-Text is a Google Cloud API and does not use the Google Workspace OAuth setup. It requires Application Default Credentials (ADC).

# Check if configured
python3 skills/integrations/google/google-speech-to-text/scripts/transcribe_operations.py check

If not configured:

gcloud auth application-default login

Then set GOOGLE_CLOUD_PROJECT in .env or export GOOGLE_CLOUD_PROJECT=your-project-id.
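What the check subcommand verifies internally is not documented here. As a rough, stdlib-only approximation of a pre-flight check, the sketch below looks for a credentials file where Application Default Credentials are commonly found and confirms that GOOGLE_CLOUD_PROJECT is set. `find_adc_file` and `preflight` are hypothetical names. The file check is a heuristic: ADC can also come from sources such as a metadata server, which this sketch does not cover.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the credentials file ADC would likely pick up, or None."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    # Default location written by `gcloud auth application-default login`
    # on Linux and macOS; Windows uses a gcloud folder under %APPDATA%.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


def preflight(env: Mapping[str, str], home: Path) -> List[str]:
    """Return a list of human-readable problems; empty means ready."""
    problems = []
    if find_adc_file(env, home) is None:
        problems.append(
            "No credentials file found; run: gcloud auth application-default login"
        )
    if not env.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is not set")
    return problems
```

Passing `env` and `home` as arguments (for example `os.environ` and `Path.home()`) keeps the check easy to test without touching real credentials.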

Setup

1. Enable the API

Go to the Google Cloud Console
Select your project (or create one)
Enable the Speech-to-Text API
Ensure billing is enabled (new accounts receive $300 in free trial credits)

2. Authenticate

# Option A: User credentials (recommended for local use)
gcloud auth application-default login

# Option B: Service account (for automation)
gcloud iam service-accounts create speech-transcribe --display-name "Speech Transcription"
# ... grant role, then:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

3. Set project ID

Add to .env at Beam Next root:

GOOGLE_CLOUD_PROJECT=your-project-id

Or use GOOGLE_PROJECT_ID from your existing Google setup (same project can host both Workspace and Speech APIs).
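A sketch of how the project ID could be resolved from the two variable names mentioned above. The precedence shown (process environment before .env, GOOGLE_CLOUD_PROJECT before GOOGLE_PROJECT_ID) is an assumption, and `parse_dotenv` and `resolve_project_id` are hypothetical helpers, not functions from the skill.

```python
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_project_id(env: Mapping[str, str],
                       dotenv: Mapping[str, str]) -> Optional[str]:
    """Pick the first non-empty project ID (assumed precedence, see above)."""
    for source in (env, dotenv):
        for key in ("GOOGLE_CLOUD_PROJECT", "GOOGLE_PROJECT_ID"):
            if source.get(key):
                return source[key]
    return None
```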

Supported formats

| Format | Sync | Batch |
|--------|------|-------|
| WAV, FLAC, LINEAR16 | ✅ | ✅ |
| MP3, OGG, AMR | ✅ (auto-decode) | ✅ |
| WebM, Opus | ✅ | ✅ |

Model options

--model long: Best for long-form (meetings, podcasts) - default for batch
--model short: Default for sync, optimized for short utterances
--model latest_long: Latest long-form model (Chirp 3)

API Reference

| Mode | Limit | Use case |
|------|-------|----------|
| Sync | ≤60 sec | Quick clips, voice notes |
| Batch | ≤480 min | Meetings, podcasts, interviews |
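The limits in the table suggest a simple routing rule between the three subcommands named in this document. The sketch below is one possible reading of that rule, with a hypothetical `choose_subcommand` helper; the real script may decide differently.

```python
SYNC_LIMIT_SECONDS = 60          # sync limit from the table
BATCH_LIMIT_SECONDS = 480 * 60   # batch limit from the table (8 hours)


def choose_subcommand(duration_seconds: float, in_gcs: bool) -> str:
    """Pick a subcommand from the audio length and where the file lives."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if in_gcs:
        # Files already in Cloud Storage go through batch mode.
        if duration_seconds > BATCH_LIMIT_SECONDS:
            raise ValueError("audio exceeds the 480-minute batch limit")
        return "transcribe-batch"
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "transcribe"
    # Long local files are chunked locally instead of uploaded.
    return "transcribe-long"
```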

Note: Google Cloud Speech-to-Text is a separate API from Google Workspace (Gmail, Docs). It uses Application Default Credentials, not OAuth tokens. You can use the same Google Cloud project.

Additional resources

For setup: references/setup-guide.md
Google Cloud Speech-to-Text docs

Transcribe audio files to text using Google Cloud Speech-to-Text API. Load when user mentions 'transcribe', 'speech to text', 'audio to text', 'transcribe audio', 'voice to text', 'transcription', or converting audio/recordings to text.

development

Updated Jul 8, 2026

npx skillsauth add beam-ai-team/beam-next-skills google-speech-to-text

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jul 8, 2026, 3:51 AM · 139.5s · 3 files scanned

SKILL.md

name:: google-speech-to-text
type:: skill
version:: 1.0
description:: Transcribe audio files to text using Google Cloud Speech-to-Text API.
category:: integrations
platform:: Google Workspace
updated:: 2026-02-26
visibility:: public
- approval:: audio_processing

Related Skills

beam-ai-team/use-case-proposal

tools

Verified · Trusted · Community

Build a Palantir-shape, PDF-native use-case proposal document for a sophisticated enterprise account: research-grounded use cases (each with description, challenge, impact, value), an operating-graph ontology page, a recommended PoC with a week-by-week plan, and a closing page that asks for one decision. Load when a client asks us to 'propose high-impact use cases', requests a use-case presentation/catalog for a function (finance, HR, ops), or when a technical evaluation team will review candidates to pick a PoC. NOT for single-account cold outreach (use prospect-brief), full process diagnostics (use operating-diagnostic), or priced proposals (use proposal-creation).

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-figma-to-html-slides

development

Verified · Trusted · Community

Convert Beam Figma slide designs into high-fidelity, editable HTML presentation decks. Use when Codex is asked to audit Figma slides, extract slide templates, rebuild Beam slides as HTML decks, decide whether Figma imagery should be exported or rebuilt in HTML/CSS, create Beam/Prism-compatible deck templates, or improve fidelity of existing Beam HTML slide rebuilds.

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-ai-slide-library

development

Verified · Trusted · Community

Use the Beam AI reusable slide library: individual HTML slide templates extracted from Beam Figma rebuilds, kept separate from deck themes and full deck templates. Load when the user asks for a slide library, specific Beam slide patterns, reusable Figma-inspired slides, Prism slide-library items, or slide-level HTML templates.

SKILL.md · Updated Jul 8, 2026

beam-ai-team/beam-ai-deck-templates

development

Verified · Trusted · Community

Use Beam AI deck and report design packs, HTML templates, and curated examples to create sales decks, customer intro decks, RPO decks, and DIN A4 use-case proposal reports. Load when the user asks for Beam-branded presentation templates, Prism-compatible deck templates, Beam report templates, customer intro decks, commercial proposals, or reusable HTML deck/report examples.

SKILL.md · Updated Jul 8, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/beam-ai-team/beam-next-skills.git

# Copy into Claude Code skills folder (global)
cp -r beam-next-skills/skills/integrations/google/google-speech-to-text ~/.claude/skills/

Repository

beam-ai-team/beam-next-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
