plugins/shiny-client/skills/shiny-speech/SKILL.md
Generate code using Shiny.Speech for cross-platform speech-to-text, text-to-speech, audio capture, and audio playback with pluggable cloud providers
You are an expert in Shiny Speech, a library that provides cross-platform speech-to-text, text-to-speech, audio capture, and audio playback for .NET MAUI and Blazor WebAssembly with pluggable cloud providers.
Invoke this skill when the user wants to:
GitHub: https://github.com/shinyorg/speech

NuGet Packages:
- Shiny.Speech: Core library with platform-native STT, TTS, audio capture, and playback
- Shiny.Speech.Cloud: Cloud provider abstractions
- Shiny.Speech.Azure: Azure AI Speech provider
- Shiny.Speech.ElevenLabs: ElevenLabs provider (TTS, plus Scribe STT)

Namespace: Shiny.Speech
Shiny Speech provides:
- Speech-to-text via ISpeechToTextService (iOS, Android, Windows, Browser/WASM)
  - ResultReceived, KeywordHeard, Error events allow multiple subscribers
  - Start() to begin listening, Stop() to end; Start() throws if already listening
  - Keyword detection: set Keywords in SpeechRecognitionOptions and subscribe to KeywordHeard
- Text-to-speech via ITextToSpeechService (iOS, Android, Windows, Browser/WASM)
- Audio capture via IAudioSource (raw PCM 16kHz, 16-bit, mono; all platforms including browser)
- Audio playback via IAudioPlayer (MP3 format; browser uses HTML5 Audio via base64 data URL)
- Pluggable cloud providers via ISpeechToTextProvider and ITextToSpeechProvider
- Convenience methods: ListenUntilSilence, StatementAfterKeyword, WaitListenForKeywords, ListenForKeywords
- Permission handling via AccessState and RequestAccess()
- Audio level metering: the AudioLevelChanged event on ITextToSpeechService and IAudioPlayer emits a normalized 0.0–1.0 RMS level during playback; IsPlayerAnalysisSupported reports per-platform availability

For platform-native speech only:
dotnet add package Shiny.Speech
For Azure AI Speech (cloud STT + TTS):
dotnet add package Shiny.Speech
dotnet add package Shiny.Speech.Azure
For ElevenLabs (cloud TTS and Scribe STT):
dotnet add package Shiny.Speech
dotnet add package Shiny.Speech.ElevenLabs
Platform-native speech services:
builder.Services.AddSpeechServices(); // Registers STT, TTS, AudioSource, AudioPlayer
// On Browser/WASM: auto-detected via OperatingSystem.IsBrowser()
Or register individually:
builder.Services.AddSpeechToText(); // ISpeechToTextService only
builder.Services.AddTextToSpeech(); // ITextToSpeechService only
builder.Services.AddAudioSource(); // IAudioSource only
builder.Services.AddAudioPlayer(); // IAudioPlayer only
Azure AI Speech (replaces platform-native with cloud):
builder.Services.AddAzureSpeech("your-subscription-key", "eastus");
// Automatically registers IAudioSource and IAudioPlayer for platform audio I/O
Or with config object and selective services:
builder.Services.AddAzureSpeech(
new AzureSpeechConfig { SubscriptionKey = "key", Region = "eastus" },
speechToText: true,
textToSpeech: true
);
ElevenLabs (replaces platform-native STT/TTS with cloud — Scribe + TTS):
// Register both STT (Scribe) and TTS at once
builder.Services.AddElevenLabsSpeech("your-api-key");
// Or selectively
builder.Services.AddElevenLabsSpeechToText("your-api-key"); // Scribe STT only
builder.Services.AddElevenLabsTextToSpeech("your-api-key"); // TTS only
// Auto-registers IAudioSource and/or IAudioPlayer for platform audio I/O as needed
// With a config object — overrides default Scribe model / TTS model / voice
builder.Services.AddElevenLabsSpeech(new ElevenLabsConfig
{
ApiKey = "your-api-key",
SpeechToTextModel = "scribe_v1",
TextToSpeechModel = "eleven_multilingual_v2",
DefaultVoiceId = "21m00Tcm4TlvDq8ikWAM"
});
ElevenLabs Scribe is request/response, not streaming: results are yielded as a single final SpeechRecognitionResult when the user calls Stop() (the captured audio is buffered, wrapped in a WAV container, and posted to /v1/speech-to-text). For continuous partial results, use Azure instead.
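To make the "wrapped in a WAV container" step concrete, here is a minimal sketch of prepending a canonical 44-byte RIFF/WAVE header to raw PCM. It is not taken from the library's source: `WavWriter` and `WrapPcm` are hypothetical names, and the defaults simply mirror the 16kHz, 16-bit, mono capture format described in this document.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Shiny.Speech.
public static class WavWriter
{
    // Wraps raw little-endian PCM bytes in a canonical 44-byte RIFF/WAVE header.
    public static byte[] WrapPcm(byte[] pcm, int sampleRate = 16000, short bitsPerSample = 16, short channels = 1)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                    // bytes per second

        using var ms = new MemoryStream(44 + pcm.Length);
        using var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true);

        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + pcm.Length);   // RIFF chunk size = total file size minus 8
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                // fmt chunk size for plain PCM
        w.Write((short)1);          // audio format 1 = uncompressed PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(pcm.Length);        // data chunk size in bytes
        w.Write(pcm);
        w.Flush();
        return ms.ToArray();
    }
}
```

One second of captured audio is 32,000 bytes of PCM, so the wrapped result is 32,044 bytes. Because the whole utterance is held in memory until Stop(), long dictation sessions grow the buffer at that rate.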
Android — Add to AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
MODIFY_AUDIO_SETTINGS is required for the TTS audio-level Visualizer and for the native STT beep suppression.
iOS — Add to Info.plist:
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for speech recognition</string>
Browser (Blazor WebAssembly) — No manifest changes needed. The browser prompts the user for microphone access automatically. Include the JS interop module in index.html:
<script src="shiny-speech.js"></script>
Note: IAudioSource captures raw PCM audio in the browser using the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, the same format as other platforms.
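The browser capture path runs in JavaScript, but the conversion it describes (float samples at the device rate down to 16kHz 16-bit) can be sketched in C# for illustration. This is an assumption about the general technique, not the code in shiny-speech.js: `PcmResampler` is a hypothetical name, and the linear interpolation used here skips the low-pass filter that a production resampler would apply first to limit aliasing.

```csharp
using System;

// Hypothetical helper, not part of Shiny.Speech.
public static class PcmResampler
{
    // Converts mono float samples in [-1, 1] at inputRate to 16 kHz signed 16-bit samples
    // using linear interpolation between neighbouring input samples.
    public static short[] To16kMono(float[] input, int inputRate)
    {
        const int targetRate = 16000;
        int outLength = (int)((long)input.Length * targetRate / inputRate);
        var output = new short[outLength];
        double step = (double)inputRate / targetRate; // input samples per output sample

        for (int i = 0; i < outLength; i++)
        {
            double pos = i * step;
            int i0 = (int)pos;
            int i1 = Math.Min(i0 + 1, input.Length - 1);
            double frac = pos - i0;

            double sample = input[i0] * (1 - frac) + input[i1] * frac;
            sample = Math.Clamp(sample, -1.0, 1.0); // guard against out-of-range input
            output[i] = (short)Math.Round(sample * short.MaxValue);
        }
        return output;
    }
}
```

A typical browser audio context runs at 44.1 or 48 kHz, so one second of 48 kHz input yields 16,000 output samples.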
Always check permissions before using STT. The service uses a Start/Stop model with events.
public class MyViewModel(ISpeechToTextService stt)
{
async Task StartListening()
{
var access = await stt.RequestAccess();
if (access != AccessState.Available)
return;
// Subscribe to events (multiple subscribers allowed)
stt.ResultReceived += (s, result) =>
{
// result.Text — recognized text
// result.IsFinal — true when segment is finalized
// result.Confidence — optional confidence score (0-1)
};
stt.KeywordHeard += (s, keyword) =>
{
// keyword — the matched keyword string
};
stt.Error += (s, error) =>
{
// error.Message — error description
// error.Exception — optional exception
};
// Start listening (throws InvalidOperationException if already listening)
await stt.Start(new SpeechRecognitionOptions
{
Culture = CultureInfo.GetCultureInfo("en-US"),
SilenceTimeout = TimeSpan.FromSeconds(3),
PreferOnDevice = true,
Keywords = ["Yes", "No", "Maybe"] // optional keyword detection
});
}
async Task StopListening()
{
await stt.Stop(); // no-op if not listening
}
}
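The ResultReceived handler above only notes which fields are available. As a sketch of one way to use them, the class below accumulates finalized segments into a transcript while tracking the latest interim text. `TranscriptCollector` is a hypothetical helper, deliberately independent of the library types, and it assumes that interim (non-final) results replace each other until a final result arrives; the document does not state that behaviour explicitly.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of Shiny.Speech.
public sealed class TranscriptCollector
{
    private readonly List<string> segments = new();

    // Latest interim text; cleared when a segment is finalized.
    public string Partial { get; private set; } = "";

    // All finalized segments joined with single spaces.
    public string Text => string.Join(" ", segments);

    public void OnResult(string text, bool isFinal)
    {
        if (isFinal)
        {
            segments.Add(text);
            Partial = "";
        }
        else
        {
            Partial = text; // assumed: each interim result supersedes the previous one
        }
    }
}
```

It could be wired up with `stt.ResultReceived += (_, r) => collector.OnResult(r.Text, r.IsFinal);`. The class is not thread-safe, so if events arrive on a background thread, marshal to a single thread or add locking before calling OnResult.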
public class MyViewModel(ISpeechToTextService stt)
{
async Task SimpleDictation(CancellationToken ct)
{
// Listen until silence — starts, waits for first final result, stops
var text = await stt.ListenUntilSilence(
new SpeechRecognitionOptions
{
Culture = CultureInfo.GetCultureInfo("en-US"),
SilenceTimeout = TimeSpan.FromSeconds(3)
},
ct
);
}
async Task WakeWordActivation(CancellationToken ct)
{
// "Hey Computer, do something" → returns "do something"
// Waits for keyword, then captures next final statement
var command = await stt.StatementAfterKeyword(
["Hey Computer"],
cancellationToken: ct
);
}
async Task WaitForAnswer(CancellationToken ct)
{
// Wait for one specific keyword (with optional timeout)
var answer = await stt.WaitListenForKeywords(
["Yes", "No", "Maybe"],
timeout: TimeSpan.FromSeconds(30),
cancellationToken: ct
);
// Returns matched keyword or null on timeout
}
async Task ContinuousKeywords(CancellationToken ct)
{
// Stream keywords continuously as IAsyncEnumerable
await foreach (var keyword in stt.ListenForKeywords(
["Up", "Down", "Left", "Right"],
cancellationToken: ct))
{
Console.WriteLine($"Direction: {keyword}");
}
}
}
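The wake-word comment above ("Hey Computer, do something" returns "do something") implies some keyword-stripping step. The library's actual matching rules are not described here, so the following is only an illustrative sketch for the case where keyword and statement arrive in the same transcript: a case-insensitive search followed by trimming separators. `KeywordMatcher` and `StatementAfter` are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of Shiny.Speech.
public static class KeywordMatcher
{
    // Returns the text following the first keyword (in list order) that occurs in the
    // transcript, or null when none of the keywords is present.
    public static string? StatementAfter(string transcript, params string[] keywords)
    {
        foreach (var keyword in keywords)
        {
            int index = transcript.IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
            if (index < 0)
                continue;

            var rest = transcript[(index + keyword.Length)..];
            // Drop the punctuation and spaces that usually follow a wake word.
            return rest.TrimStart(' ', ',', '.', '!', '?').TrimEnd();
        }
        return null;
    }
}
```

Plain substring matching will also fire inside longer words (for example "No" inside "Nothing"), so a real implementation would likely want word-boundary checks.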
public class MyViewModel(ITextToSpeechService tts)
{
async Task Speak()
{
// Simple speech
await tts.SpeakAsync("Hello, world!");
// With options
await tts.SpeakAsync("Hello, world!", new TextToSpeechOptions
{
SpeechRate = 1.2f,
Pitch = 1.0f,
Volume = 0.8f,
Culture = CultureInfo.GetCultureInfo("en-US")
});
// List available voices
var voices = await tts.GetVoicesAsync();
var voice = voices.FirstOrDefault(v => v.Name.Contains("Neural"));
// Speak with specific voice
await tts.SpeakAsync("Hello!", new TextToSpeechOptions { Voice = voice });
// Stop speaking
if (tts.IsSpeaking)
await tts.StopAsync();
}
}
public class MyViewModel(IAudioSource audioSource)
{
async Task CaptureAudio(CancellationToken ct)
{
// Returns raw PCM stream (16kHz, 16-bit, mono)
await using var stream = await audioSource.StartCaptureAsync(ct);
// Read audio data from stream...
// Stream remains open until StopCaptureAsync is called
await audioSource.StopCaptureAsync();
}
}
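The "read audio data from stream" step is left open above. Since the format is fixed at 16kHz, 16-bit, mono, one second of audio is 16,000 samples × 2 bytes = 32,000 bytes, which makes it straightforward to read a bounded amount. The helper below is a sketch under that assumption; `PcmReader` is a hypothetical name, and it works against any `Stream`, not specifically the one returned by StartCaptureAsync.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Shiny.Speech.
public static class PcmReader
{
    public const int SampleRate = 16000;   // samples per second
    public const int BytesPerSample = 2;   // 16-bit mono

    // Reads up to `duration` of audio; returns fewer bytes if the stream ends first.
    public static async Task<byte[]> ReadAsync(Stream pcm, TimeSpan duration, CancellationToken ct = default)
    {
        int wanted = (int)(duration.TotalSeconds * SampleRate) * BytesPerSample;
        var buffer = new byte[wanted];
        int filled = 0;

        while (filled < wanted)
        {
            int read = await pcm.ReadAsync(buffer.AsMemory(filled, wanted - filled), ct);
            if (read == 0)
                break; // end of stream
            filled += read;
        }
        return buffer[..filled];
    }
}
```

The loop matters because a single ReadAsync call may return fewer bytes than requested, which is common with live capture streams.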
public class MyViewModel(IAudioPlayer audioPlayer)
{
async Task PlayAudio(Stream mp3Stream, CancellationToken ct)
{
// Play MP3 format audio
await audioPlayer.PlayAsync(mp3Stream, ct);
// Check playback state
if (audioPlayer.IsPlaying)
await audioPlayer.StopAsync();
}
}
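On the browser, playback is described as going through HTML5 Audio via a base64 data URL. As a rough illustration of what that encoding involves, the sketch below builds such a URL from MP3 bytes. `DataUrl` is a hypothetical helper, and the `audio/mpeg` MIME type is an assumption (it is the registered type for MP3); the library's own interop code may differ.

```csharp
using System;

// Hypothetical helper, not part of Shiny.Speech.
public static class DataUrl
{
    // Encodes MP3 bytes as a data URL suitable for an HTML5 <audio> element's src.
    public static string FromMp3(byte[] mp3) =>
        "data:audio/mpeg;base64," + Convert.ToBase64String(mp3);
}
```

Base64 inflates the payload by roughly a third, so very long clips cost noticeably more memory in the browser than on native platforms.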
Subscribe to AudioLevelChanged on ITextToSpeechService (native + cloud TTS) or IAudioPlayer (generic audio playback). Each emitted value is a normalized RMS level in 0.0–1.0. Always gate UI on IsPlayerAnalysisSupported — it is false on Windows native TTS and Browser.
public partial class TtsViewModel : ObservableObject
{
    readonly ITextToSpeechService tts;
    [ObservableProperty] double audioLevel; // bind to ProgressBar.Progress
    public bool IsVuSupported => tts.IsPlayerAnalysisSupported;
    // Use one ordinary constructor: a primary constructor plus a second
    // constructor with the same signature does not compile.
    public TtsViewModel(ITextToSpeechService tts)
    {
        this.tts = tts;
        tts.AudioLevelChanged += (_, level) =>
            MainThread.BeginInvokeOnMainThread(() => AudioLevel = level);
    }
}
Platform behaviour:
| Surface | iOS / macOS | Android | Windows | Browser |
|---|---|---|---|---|
| Native TTS (ITextToSpeechService) | ✅ AVAudioEngine + player-node tap | ✅ OnAudioAvailable PCM RMS | ❌ | ❌ |
| Cloud TTS (CloudTextToSpeech) | ✅ forwarded from IAudioPlayer | ✅ forwarded from IAudioPlayer | ❌ | ❌ |
| Generic playback (IAudioPlayer) | ✅ AVAudioPlayer.MeteringEnabled | ✅ Visualizer on session | ❌ | ❌ |
Apple native TTS plays through AVAudioEngine + AVAudioPlayerNode so a tap on the player node can compute RMS. The engine is created lazily on first speak and kept warm — first utterance adds ~50–150 ms; subsequent utterances are indistinguishable. Reset AudioLevel to 0 on speak completion / StopAsync so the meter drains.
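For readers unfamiliar with the metric, a normalized RMS level over 16-bit PCM can be computed as below. This is a generic sketch of the calculation, not the per-platform code the library uses (which relies on the native taps and visualizers listed in the table); `AudioLevel.Rms` is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of Shiny.Speech.
public static class AudioLevel
{
    // Computes a normalized RMS level (0.0 to 1.0) for little-endian 16-bit PCM.
    public static double Rms(ReadOnlySpan<byte> pcm16)
    {
        int sampleCount = pcm16.Length / 2;
        if (sampleCount == 0)
            return 0.0;

        double sumSquares = 0;
        for (int i = 0; i < sampleCount; i++)
        {
            // Combine low and high bytes into one signed 16-bit sample.
            short sample = (short)(pcm16[2 * i] | (pcm16[2 * i + 1] << 8));
            double normalized = sample / 32768.0;
            sumSquares += normalized * normalized;
        }
        return Math.Min(1.0, Math.Sqrt(sumSquares / sampleCount));
    }
}
```

Silence yields 0.0 and a full-scale signal approaches 1.0. Speech usually sits well below full scale, so a meter UI may want to apply a gain or logarithmic curve before display.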
Implement ISpeechToTextProvider and/or ITextToSpeechProvider:
public class MyCloudSttProvider : ISpeechToTextProvider
{
// Required: surface non-fatal errors (e.g. a transient network blip between
// chunked requests in continuous mode) without aborting the IAsyncEnumerable.
// CloudSpeechToText subscribes to this and forwards to ISpeechToTextService.Error.
public event EventHandler<SpeechRecognitionError>? Error;
    public async IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(
        Stream audioStream,
        SpeechRecognitionOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        SpeechRecognitionResult? result = null;
        try
        {
            // Send audioStream to your cloud API (awaiting the request here)
            // and build a result from the response
            result = new SpeechRecognitionResult("Hello", IsFinal: true, Confidence: 0.95f);
        }
        catch (HttpRequestException ex)
        {
            // Non-fatal: signal the error and let the session keep running.
            // Throwing instead would terminate the enumerator and end the session.
            Error?.Invoke(this, new SpeechRecognitionError(ex.Message, ex));
        }
        // C# does not allow yield return inside a try block that has a catch clause,
        // so yield only after the try/catch has completed.
        if (result is { } recognized)
            yield return recognized;
    }
}
// Register in DI (IAudioSource is auto-registered)
builder.Services.AddCloudSpeechToText<MyCloudSttProvider>();
- Call RequestAccess() before STT operations
- Use Start() to begin and Stop() to end; Start() throws if already listening
- Subscribe to events before calling Start() to avoid missing results
- Unsubscribe from events after Stop() to avoid leaks
- ListenUntilSilence, StatementAfterKeyword, WaitListenForKeywords, ListenForKeywords handle Start/Stop/event wiring for you
- IAudioSource and IAudioPlayer implement IAsyncDisposable
- ListenUntilSilence: for simple dictation scenarios
- StatementAfterKeyword: for "Hey Siri" style wake word activation
- WaitListenForKeywords: for yes/no/choice scenarios
- ListenForKeywords: for continuous keyword detection as an async stream
- Cloud providers register IAudioSource and IAudioPlayer as needed via TryAdd, so manual registration is no longer required
- AccessState: check for NotSupported, Denied, and Restricted states
- IsListening/IsSpeaking/IsPlaying: check state before starting new listening/speech/playback
- PreferOnDevice: set to true for offline-capable STT when available
- AddSpeechServices() uses OperatingSystem.IsBrowser() at runtime to register browser implementations; no conditional code needed in your app
- In the browser, IAudioSource captures raw PCM via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono
- Include shiny-speech.js in index.html for speech services to work
- On iOS, the audio session uses PlayAndRecord with AllowBluetooth / AllowBluetoothA2dp / DefaultToSpeaker, so when CarPlay is active iOS automatically routes audio through the car's microphone and speakers; no CarPlay-specific code needed
- Check IsPlayerAnalysisSupported before showing meter UI; events do not fire on platforms where metering isn't available (Windows native TTS, Browser)
- Marshal AudioLevelChanged to the UI thread: the event fires from the audio render / synthesizer thread, so use MainThread.BeginInvokeOnMainThread in MAUI or the equivalent in Blazor before mutating bound properties
- Reset AudioLevel back to 0 after SpeakAsync returns or StopAsync is called so the meter drains visually

For detailed API documentation, see:
- reference/api-reference.md - Full API surface, interfaces, records, and configuration

dotnet add package Shiny.Speech # Core platform-native speech services
dotnet add package Shiny.Speech.Cloud # Cloud provider abstractions (included by Azure/ElevenLabs)
dotnet add package Shiny.Speech.Azure # Azure AI Speech provider
dotnet add package Shiny.Speech.ElevenLabs # ElevenLabs provider (TTS and Scribe STT)