skills/compute/speech-to-text/SKILL.md
# Speech-to-Text Transcription

## Metadata

- **Category**: compute
- **SDK**: `@0glabs/0g-serving-broker` ^0.6.5, `ethers` ^6.13.0
- **Activation Triggers**: "transcribe", "speech-to-text", "Whisper", "audio transcription"

## Purpose

Transcribe audio files using 0G Compute Network providers running Whisper Large V3. Supports multiple audio formats and output types (JSON, text, SRT subtitles).

## Prerequisites

- Node.js >= 22
- `@0glabs/0g-serving-broker` and `ethers` installed
- Funded and … speech-to-text service
- `.env` with `PRIVATE_KEY`, `RPC_URL`, `PROVIDER_ADDRESS`

## Core Rules

- Extract the ChatID from the `ZG-Res-Key` header ONLY (no body fallback for speech).
- Call `processResponse()` after every transcription.
- `processResponse()` param order: `(providerAddress, chatID, usageData)`.

## Code Examples

### Basic Transcription

```typescript
import { ethers } from 'ethers';
import { createZGComputeNetworkBroker } from '@0glabs/0g-serving-broker';
import * as fs from 'fs';
import 'dotenv/config';

async function transcribe(audioPath: string): Promise<string> {
  // Build the signer and broker from the .env values
  const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
  const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);
  const broker = await createZGComputeNetworkBroker(wallet);
  const providerAddress = process.env.PROVIDER_ADDRESS!;

  // Look up the provider's endpoint and model, then get the request headers
  const { endpoint, model } = await broker.inference.getServiceMetadata(providerAddress);
  const headers = await broker.inference.getRequestHeaders(providerAddress);

  // Audio is sent as multipart form data, not JSON
  const formData = new FormData();
  const audioBuffer = fs.readFileSync(audioPath);
  const audioBlob = new Blob([audioBuffer]);
  formData.append('file', audioBlob, audioPath.split('/').pop());
  formData.append('model', model);
  formData.append('response_format', 'json');

  const response = await fetch(`${endpoint}/audio/transcriptions`, {
    method: 'POST',
    headers: { ...headers },
    body: formData,
  });
  const data = await response.json();

  // ChatID from header ONLY for speech-to-text.
  // Headers.get() is case-insensitive, so one lookup covers 'zg-res-key' too.
  const chatID = response.headers.get('ZG-Res-Key') ?? undefined;

  await broker.inference.processResponse(
    providerAddress,
    chatID,
    data.usage ? JSON.stringify(data.usage) : undefined,
  );
  return data.text;
}

// Usage
const text = await transcribe('./audio/podcast.mp3');
console.log('Transcription:', text);
```
### Transcription with Output Format and Language

```typescript
// Reuses the imports from the basic example (ethers, broker, fs, dotenv)
type OutputFormat = 'json' | 'text' | 'srt' | 'verbose_json';

async function transcribeWithFormat(
  audioPath: string,
  format: OutputFormat = 'json',
  language?: string,
): Promise<any> {
  const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
  const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);
  const broker = await createZGComputeNetworkBroker(wallet);
  const providerAddress = process.env.PROVIDER_ADDRESS!;

  const { endpoint, model } = await broker.inference.getServiceMetadata(providerAddress);
  const headers = await broker.inference.getRequestHeaders(providerAddress);

  const formData = new FormData();
  const audioBuffer = fs.readFileSync(audioPath);
  formData.append('file', new Blob([audioBuffer]), audioPath.split('/').pop());
  formData.append('model', model);
  formData.append('response_format', format);
  if (language) formData.append('language', language);

  const response = await fetch(`${endpoint}/audio/transcriptions`, {
    method: 'POST',
    headers: { ...headers },
    body: formData,
  });

  // ChatID comes from the header only; Headers.get() is case-insensitive
  const chatID = response.headers.get('ZG-Res-Key') ?? undefined;

  // 'text' and 'srt' return a plain string body, so there is no usage object
  if (format === 'text' || format === 'srt') {
    const text = await response.text();
    if (chatID) {
      await broker.inference.processResponse(providerAddress, chatID);
    }
    return text;
  }

  // 'json' and 'verbose_json' return a JSON body
  const data = await response.json();
  await broker.inference.processResponse(
    providerAddress,
    chatID,
    data.usage ? JSON.stringify(data.usage) : undefined,
  );
  return data;
}

// Usage
const srt = await transcribeWithFormat('./audio/meeting.mp3', 'srt', 'en');
fs.writeFileSync('./output/meeting.srt', srt);
```
### Error Handling and Validation

```typescript
// Reuses the imports from the basic example (ethers, broker, fs, dotenv)
async function safeTranscribe(audioPath: string): Promise<string | null> {
  try {
    // Validate file exists
    if (!fs.existsSync(audioPath)) {
      throw new Error(`Audio file not found: ${audioPath}`);
    }

    // Validate file size (most providers have limits)
    const stats = fs.statSync(audioPath);
    const maxSize = 25 * 1024 * 1024; // 25MB typical limit
    if (stats.size > maxSize) {
      throw new Error(`File too large (${stats.size} bytes). Max: ${maxSize} bytes`);
    }

    // Broker setup sits inside the try block so setup failures also return null
    const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
    const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);
    const broker = await createZGComputeNetworkBroker(wallet);
    const providerAddress = process.env.PROVIDER_ADDRESS!;

    const { endpoint, model } = await broker.inference.getServiceMetadata(providerAddress);
    const headers = await broker.inference.getRequestHeaders(providerAddress);

    const formData = new FormData();
    const audioBuffer = fs.readFileSync(audioPath);
    formData.append('file', new Blob([audioBuffer]), audioPath.split('/').pop());
    formData.append('model', model);
    formData.append('response_format', 'json');

    const response = await fetch(`${endpoint}/audio/transcriptions`, {
      method: 'POST',
      headers: { ...headers },
      body: formData,
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${await response.text()}`);
    }

    const data = await response.json();
    // ChatID comes from the header only; Headers.get() is case-insensitive
    const chatID = response.headers.get('ZG-Res-Key') ?? undefined;
    await broker.inference.processResponse(
      providerAddress,
      chatID,
      data.usage ? JSON.stringify(data.usage) : undefined,
    );
    return data.text;
  } catch (error) {
    console.error('Transcription failed:', error);
    return null;
  }
}
```
## Supported Audio Formats

| Format | Extension | Notes |
| ------ | --------- | ------------ |
| MP3 | .mp3 | Most common |
| WAV | .wav | Uncompressed |
| OGG | .ogg | Compressed |
| FLAC | .flac | Lossless |
| WebM | .webm | Web native |
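
A pre-flight check against the table above can reject an unsupported file before any request is sent. The following is a minimal sketch, not part of the SDK: `SUPPORTED_EXTENSIONS` and `isSupportedAudio` are hypothetical names, and the list assumes a provider accepts exactly the extensions shown in the table, which may vary by provider.

```typescript
// Extensions copied from the table above (assumption: a provider may accept more or fewer).
const SUPPORTED_EXTENSIONS: readonly string[] = ['.mp3', '.wav', '.ogg', '.flac', '.webm'];

// Hypothetical helper: true when the file name ends in a listed extension (case-insensitive).
function isSupportedAudio(audioPath: string): boolean {
  // Take the last path segment, accepting both '/' and '\' separators.
  const fileName = audioPath.split(/[\\/]/).pop() ?? '';
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or a dotfile such as ".mp3"
  return SUPPORTED_EXTENSIONS.includes(fileName.slice(dot).toLowerCase());
}
```

Calling `isSupportedAudio(audioPath)` before reading the file keeps the check cheap; it inspects only the name, so a mislabeled file would still be rejected by the provider.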

## Output Formats

| Format         | Description                      |
| -------------- | -------------------------------- |
| `json`         | `{ "text": "..." }`              |
| `text`         | Plain text string                |
| `srt`          | SubRip subtitle format           |
| `verbose_json` | Includes timestamps and segments |
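
When the `srt` format is requested, the result is a single string that usually needs to be split into cues before further processing. The sketch below assumes the provider returns standard SubRip text (numbered cues, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between cues). `SrtCue` and `parseSrt` are hypothetical names used for illustration only.

```typescript
// Hypothetical shape for one subtitle cue.
interface SrtCue {
  index: number;
  start: string; // "HH:MM:SS,mmm"
  end: string;   // "HH:MM:SS,mmm"
  text: string;
}

// Minimal SubRip parser: skips any block that lacks a valid timing line.
function parseSrt(srt: string): SrtCue[] {
  const cues: SrtCue[] = [];
  // Normalize Windows line endings, then split on blank lines between cues.
  const blocks = srt.replace(/\r\n/g, '\n').trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n');
    if (lines.length < 3) continue; // need index, timing, and at least one text line
    const timing = lines[1].match(/^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})/);
    if (!timing) continue;
    cues.push({
      index: Number(lines[0]),
      start: timing[1],
      end: timing[2],
      text: lines.slice(2).join('\n'),
    });
  }
  return cues;
}
```

A string returned for the `srt` format can be passed straight to `parseSrt`; malformed blocks are dropped rather than raising, which suits best-effort post-processing but would hide provider-side formatting problems.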

## Cost

Approximately 0.0001 0G per minute of audio (varies by provider).
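
As a rough budgeting aid, the per-minute figure can be turned into an estimate for a given recording. This sketch assumes cost scales linearly with duration and uses the approximate rate quoted above as a default; actual provider pricing and rounding are not specified here. `estimateCost0G` and `RATE_0G_PER_MINUTE` are hypothetical names.

```typescript
// Approximate rate from the text above (assumption: varies by provider).
const RATE_0G_PER_MINUTE = 0.0001;

// Hypothetical helper: linear cost estimate in 0G for an audio duration in seconds.
function estimateCost0G(
  durationSeconds: number,
  ratePerMinute: number = RATE_0G_PER_MINUTE,
): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative finite number');
  }
  return (durationSeconds / 60) * ratePerMinute;
}
```

For a 45-minute recording (2700 seconds) this gives roughly 45 × 0.0001 = 0.0045 0G at the default rate.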

## Anti-Patterns

```typescript
// BAD: Sending audio as JSON
const response = await fetch(endpoint, {
  body: JSON.stringify({ audio: base64Data }), // WRONG: use FormData
});

// BAD: Getting chatID from body
const chatID = data.id; // WRONG for speech: header only

// BAD: Missing processResponse
const data = await response.json();
return data.text; // processResponse() never called!

// BAD: Hardcoding private keys
const wallet = new ethers.Wallet('0xabc123...', provider); // NEVER do this

// BAD: ethers v5 syntax
const provider = new ethers.providers.JsonRpcProvider(url); // v5! In v6 use new ethers.JsonRpcProvider(url)
```

## Common Errors

| Error                       | Cause               | Fix                              |
| --------------------------- | ------------------- | -------------------------------- |
| `Insufficient balance`      | Sub-account empty   | Transfer more funds              |
| `unsupported format`        | Wrong audio format  | Use mp3, wav, ogg, flac, or webm |
| `file too large`            | Audio file too big  | Split into smaller segments      |
| `Fee verification failed`   | Missing chatID      | Check `ZG-Res-Key` header        |
| `Provider not acknowledged` | First-time provider | `acknowledgeProviderSigner()`    |
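
The table above can also be encoded as a lookup so that a caught error is logged together with its likely fix. This is a sketch under one loud assumption: that the real error messages contain the phrases in the first column (matched case-insensitively here), which the table suggests but does not guarantee. `KNOWN_ERRORS` and `suggestFix` are hypothetical names.

```typescript
// One row of the error table above.
interface KnownError {
  pattern: RegExp;
  cause: string;
  fix: string;
}

// Patterns mirror the "Error" column (assumption: real messages contain these phrases).
const KNOWN_ERRORS: KnownError[] = [
  { pattern: /insufficient balance/i, cause: 'Sub-account empty', fix: 'Transfer more funds' },
  { pattern: /unsupported format/i, cause: 'Wrong audio format', fix: 'Use mp3, wav, ogg, flac, or webm' },
  { pattern: /file too large/i, cause: 'Audio file too big', fix: 'Split into smaller segments' },
  { pattern: /fee verification failed/i, cause: 'Missing chatID', fix: 'Check ZG-Res-Key header' },
  { pattern: /provider not acknowledged/i, cause: 'First-time provider', fix: 'acknowledgeProviderSigner()' },
];

// Hypothetical helper: returns "cause: fix" for a recognized error, otherwise null.
function suggestFix(error: unknown): string | null {
  const message = error instanceof Error ? error.message : String(error);
  const match = KNOWN_ERRORS.find((known) => known.pattern.test(message));
  return match ? `${match.cause}: ${match.fix}` : null;
}
```

Inside a `catch` block, `suggestFix(error)` could be logged alongside the raw error; a `null` result means the message did not match any row and should be investigated as-is.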