.agents/skills/speech-recognition/SKILL.md
Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring on-device vs server-based recognition, handling authorization flows, or adopting the new SpeechAnalyzer API (iOS 26+) for modern async/await speech-to-text.
npx skillsauth add DFly7/iOS-FastAPI-Supabase-AI speech-recognitionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers SFSpeechRecognizer (iOS 10+) and the new SpeechAnalyzer API (iOS 26+).
SpeechAnalyzer is an actor-based API introduced in iOS 26 that replaces
SFSpeechRecognizer for new projects. It uses Swift concurrency, AsyncSequence
for results, and supports modular analysis via SpeechTranscriber.
import Speech
// 1. Create a transcriber module
guard let locale = SpeechTranscriber.supportedLocale(
equivalentTo: Locale.current
) else { return }
let transcriber = SpeechTranscriber(locale: locale, preset: .offlineTranscription)
// 2. Ensure assets are installed
if let request = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
try await request.downloadAndInstall()
}
// 3. Create input stream and analyzer
let (inputSequence, inputBuilder) = AsyncStream.makeStream(of: AnalyzerInput.self)
let audioFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
let analyzer = SpeechAnalyzer(modules: [transcriber])
// 4. Feed audio buffers (from AVAudioEngine or file)
Task {
// Append PCM buffers converted to audioFormat
let pcmBuffer: AVAudioPCMBuffer = // ... your audio buffer
inputBuilder.yield(AnalyzerInput(buffer: pcmBuffer))
inputBuilder.finish()
}
// 5. Consume results
Task {
for try await result in transcriber.results {
let text = String(result.text.characters)
print(text)
}
}
// 6. Run analysis
let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)
// 7. Finalize
if let lastSampleTime {
try await analyzer.finalizeAndFinish(through: lastSampleTime)
} else {
try analyzer.cancelAndFinishNow()
}
let transcriber = SpeechTranscriber(locale: locale, preset: .offlineTranscription)
let audioFile = try AVAudioFile(forReading: fileURL)
let analyzer = SpeechAnalyzer(
inputAudioFile: audioFile, modules: [transcriber], finishAfterFile: true
)
for try await result in transcriber.results {
print(String(result.text.characters))
}
| Feature | SFSpeechRecognizer | SpeechAnalyzer |
|---|---|---|
| Concurrency | Callbacks/delegates | async/await + AsyncSequence |
| Type | class | actor |
| Modules | Monolithic | Composable (SpeechTranscriber, SpeechDetector) |
| Audio input | append(_:) on request | AsyncStream<AnalyzerInput> |
| Availability | iOS 10+ | iOS 26+ |
| On-device | requiresOnDeviceRecognition | Asset-based via AssetInventory |
import Speech
// Default locale (user's current language)
let recognizer = SFSpeechRecognizer()
// Specific locale
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
// Check if recognition is available for this locale
guard let recognizer, recognizer.isAvailable else {
print("Speech recognition not available")
return
}
final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
private let recognizer = SFSpeechRecognizer()!
override init() {
super.init()
recognizer.delegate = self
}
func speechRecognizer(
_ speechRecognizer: SFSpeechRecognizer,
availabilityDidChange available: Bool
) {
// Update UI — disable record button when unavailable
}
}
Request both speech recognition and microphone permissions before starting
live transcription. Add these keys to Info.plist:
NSSpeechRecognitionUsageDescriptionNSMicrophoneUsageDescriptionimport Speech
import AVFoundation
func requestPermissions() async -> Bool {
let speechStatus = await withCheckedContinuation { continuation in
SFSpeechRecognizer.requestAuthorization { status in
continuation.resume(returning: status)
}
}
guard speechStatus == .authorized else { return false }
let micStatus: Bool
if #available(iOS 17, *) {
micStatus = await AVAudioApplication.requestRecordPermission()
} else {
micStatus = await withCheckedContinuation { continuation in
AVAudioSession.sharedInstance().requestRecordPermission { granted in
continuation.resume(returning: granted)
}
}
}
return micStatus
}
The standard pattern: AVAudioEngine captures microphone audio → buffers are
appended to SFSpeechAudioBufferRecognitionRequest → results stream in.
import Speech
import AVFoundation
final class LiveTranscriber {
private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
private let audioEngine = AVAudioEngine()
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?
func startTranscribing() throws {
// Cancel any in-progress task
recognitionTask?.cancel()
recognitionTask = nil
// Configure audio session
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
// Create request
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true
self.recognitionRequest = request
// Start recognition task
recognitionTask = recognizer.recognitionTask(with: request) { result, error in
if let result {
let text = result.bestTranscription.formattedString
print("Transcription: \(text)")
if result.isFinal {
self.stopTranscribing()
}
}
if let error {
print("Recognition error: \(error)")
self.stopTranscribing()
}
}
// Install audio tap
let inputNode = audioEngine.inputNode
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
buffer, _ in
request.append(buffer)
}
audioEngine.prepare()
try audioEngine.start()
}
func stopTranscribing() {
audioEngine.stop()
audioEngine.inputNode.removeTap(onBus: 0)
recognitionRequest?.endAudio()
recognitionRequest = nil
recognitionTask?.cancel()
recognitionTask = nil
}
}
Use SFSpeechURLRecognitionRequest for audio files on disk:
func transcribeFile(at url: URL) async throws -> String {
guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
throw SpeechError.unavailable
}
let request = SFSpeechURLRecognitionRequest(url: url)
request.shouldReportPartialResults = false
return try await withCheckedThrowingContinuation { continuation in
recognizer.recognitionTask(with: request) { result, error in
if let error {
continuation.resume(throwing: error)
} else if let result, result.isFinal {
continuation.resume(
returning: result.bestTranscription.formattedString
)
}
}
}
}
On-device recognition (iOS 13+) works offline but supports fewer locales:
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
// Check if on-device is supported for this locale
if recognizer.supportsOnDeviceRecognition {
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // Force on-device
}
Tip: On-device recognition avoids network latency and the one-minute audio limit imposed by server-based recognition. However, accuracy may be lower and not all locales are supported. Check
supportsOnDeviceRecognitionbefore forcing on-device mode.
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true // default is true
recognizer.recognitionTask(with: request) { result, error in
guard let result else { return }
if result.isFinal {
// Final transcription — recognition is complete
let final = result.bestTranscription.formattedString
} else {
// Partial result — may change as more audio is processed
let partial = result.bestTranscription.formattedString
}
}
recognizer.recognitionTask(with: request) { result, error in
guard let result else { return }
// Best transcription
let best = result.bestTranscription
// All alternatives (sorted by confidence, descending)
for transcription in result.transcriptions {
for segment in transcription.segments {
print("\(segment.substring): \(segment.confidence)")
}
}
}
let request = SFSpeechAudioBufferRecognitionRequest()
request.addsPunctuation = true
Improve recognition of domain-specific terms:
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SwiftUI", "Xcode", "CloudKit"]
// ❌ DON'T: Only request speech authorization for live audio
SFSpeechRecognizer.requestAuthorization { status in
// Missing microphone permission — audio engine will fail
self.startRecording()
}
// ✅ DO: Request both permissions before recording
SFSpeechRecognizer.requestAuthorization { status in
guard status == .authorized else { return }
AVAudioSession.sharedInstance().requestRecordPermission { granted in
guard granted else { return }
self.startRecording()
}
}
// ❌ DON'T: Assume recognizer stays available after initial check
let recognizer = SFSpeechRecognizer()!
// Recognition may fail if network drops or locale changes
// ✅ DO: Monitor availability via delegate
recognizer.delegate = self
func speechRecognizer(
_ speechRecognizer: SFSpeechRecognizer,
availabilityDidChange available: Bool
) {
recordButton.isEnabled = available
}
// ❌ DON'T: Leave audio engine running after recognition finishes
recognizer.recognitionTask(with: request) { result, error in
if result?.isFinal == true {
// Audio engine still running, wasting resources and battery
}
}
// ✅ DO: Clean up all audio resources
recognizer.recognitionTask(with: request) { result, error in
if result?.isFinal == true || error != nil {
self.audioEngine.stop()
self.audioEngine.inputNode.removeTap(onBus: 0)
self.recognitionRequest?.endAudio()
self.recognitionRequest = nil
}
}
// ❌ DON'T: Force on-device without checking support
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // May silently fail
// ✅ DO: Check support before requiring on-device
if recognizer.supportsOnDeviceRecognition {
request.requiresOnDeviceRecognition = true
} else {
// Fall back to server-based or inform user
}
// ❌ DON'T: Start one long continuous recognition session
func startRecording() {
// This will be cut off after ~60 seconds (server-based)
}
// ✅ DO: Restart recognition when approaching the limit
func startRecording() {
// Use a timer to restart before the limit
recognitionTimer = Timer.scheduledTimer(withTimeInterval: 55, repeats: false) {
[weak self] _ in
self?.restartRecognition()
}
}
// ❌ DON'T: Start a new task without canceling the previous one
func startRecording() {
recognitionTask = recognizer.recognitionTask(with: request) { ... }
// Previous task is still running — undefined behavior
}
// ✅ DO: Cancel existing task before creating a new one
func startRecording() {
recognitionTask?.cancel()
recognitionTask = nil
recognitionTask = recognizer.recognitionTask(with: request) { ... }
}
NSSpeechRecognitionUsageDescription is in Info.plistNSMicrophoneUsageDescription is in Info.plist (if using live audio)SFSpeechRecognizerDelegate is set to handle availabilityDidChangerecognitionRequest.endAudio() is called when done recordingrecognitionTask is canceled before starting a new onesupportsOnDeviceRecognition is checked before requiring on-device modeisFinal) resultsAssetInventory assets are installed before using SpeechAnalyzerSpeechTranscriber.supportedLocale(equivalentTo:) is checkeddevelopment
Resolve Swift concurrency compiler errors, adopt approachable concurrency (SE-0466), and write data-race-safe async code. Use when fixing Sendable conformance errors, actor isolation warnings, or strict concurrency diagnostics; when adopting default MainActor isolation, @concurrent, nonisolated(nonsending), or Task.immediate; when designing actor-based architectures, structured concurrency with TaskGroup, or background work offloading; or when migrating from @preconcurrency to full Swift 6 strict concurrency.
development
Implement Swift Codable models for JSON and property-list encoding and decoding with JSONDecoder, JSONEncoder, CodingKeys, and custom init(from:) or encode(to:). Use when parsing API responses, remapping keys, flattening nested JSON, handling date or data decoding strategies, decoding heterogeneous arrays, or integrating Codable with URLSession, SwiftData, or UserDefaults.
development
Implement, review, or improve data visualizations using Swift Charts. Use when building bar, line, area, point, pie, or donut charts; when adding chart selection, scrolling, or annotations; when plotting functions with vectorized BarPlot, LinePlot, AreaPlot, or PointPlot; when customizing axes, scales, legends, or foregroundStyle grouping; or when creating specialized visualizations like heat maps, Gantt charts, stacked/grouped bars, sparklines, or threshold lines.
data-ai
Postgres performance optimization and best practices from Supabase. Use this skill when writing, reviewing, or optimizing Postgres queries, schema designs, or database configurations.