skills/speech-recognition/SKILL.md
Transcribe speech to text using Apple's Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing recorded audio files, handling speech and microphone authorization, choosing on-device vs server-backed SFSpeechRecognizer behavior, or adopting SpeechAnalyzer, SpeechTranscriber, DictationTranscriber, AssetInventory, and async result streams on iOS 26+.
Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers SpeechAnalyzer / SpeechTranscriber (iOS 26+) and
SFSpeechRecognizer (iOS 10+). Targets Swift 6.3 / iOS 26+ while preserving
fallback guidance for apps that support older OS versions.
Scope boundary: Use this skill for speech-to-text recognition, speech
authorization, microphone capture plumbing, and result handling. Hand off text
analysis, language identification after transcription, sentiment, embeddings,
and translation to natural-language; hand off audio playback UI to avkit;
hand off summarization or generation over transcripts to apple-on-device-ai.
Use SpeechAnalyzer for modern iOS 26+ speech analysis, especially long-form
recordings, live transcription, time-indexed transcripts, and fully on-device
flows. Keep SFSpeechRecognizer for iOS 10+ deployment targets, server-backed
locale coverage, or existing callback/delegate implementations.
Read SpeechAnalyzer patterns when implementing an iOS 26+ transcription pipeline, model asset handling, volatile results, or file/buffer examples.
- Use SpeechTranscriber for the newer general-purpose on-device model.
- Use DictationTranscriber when SpeechTranscriber is unavailable for the current device or locale and dictation-compatible support is acceptable.
- Use SpeechDetector only in conjunction with a transcriber when voice activity detection is worth the accuracy/power tradeoff.
- Check SpeechTranscriber.isAvailable.
- Check SpeechTranscriber.supportedLocale(equivalentTo:).
- Use SpeechTranscriber.installedLocales / supportedLocales when showing language choices.
- Use the .transcription preset for basic accurate transcription.
- Use the .progressiveTranscription preset for live UI updates.
- Use the .timeIndexedProgressiveTranscription preset when playback highlighting needs audioTimeRange.
- Install model assets with AssetInventory.assetInstallationRequest.
- Call SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:) before yielding AnalyzerInput.
- Consume the results AsyncSequence in a separate task.
- End the session with finalizeAndFinish(through:), finalizeAndFinishThroughEndOfInput(), or cancelAndFinishNow().
Do not use an offlineTranscription preset; Apple does not document one.
Finishing an AsyncStream input sequence does not finish the analyzer session.
import Speech

// Default locale (user's current language). The initializer is failable.
let defaultRecognizer = SFSpeechRecognizer()

// Specific locale
let localeRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// Check that a recognizer exists and is available for this locale
guard let recognizer = localeRecognizer, recognizer.isAvailable else {
    print("Speech recognition not available")
    return
}
final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer()!

    override init() {
        super.init()
        recognizer.delegate = self
    }

    func speechRecognizer(
        _ speechRecognizer: SFSpeechRecognizer,
        availabilityDidChange available: Bool
    ) {
        // Update UI: disable the record button when unavailable
    }
}
Request both speech recognition and microphone permissions before starting
live transcription. Add these keys to Info.plist:
- NSSpeechRecognitionUsageDescription
- NSMicrophoneUsageDescription
import Speech
import AVFoundation

func requestPermissions() async -> Bool {
    let speechStatus = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status)
        }
    }
    guard speechStatus == .authorized else { return false }

    let micStatus: Bool
    if #available(iOS 17, *) {
        micStatus = await AVAudioApplication.requestRecordPermission()
    } else {
        micStatus = await withCheckedContinuation { continuation in
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                continuation.resume(returning: granted)
            }
        }
    }
    return micStatus
}
The standard pattern: AVAudioEngine captures microphone audio → buffers are
appended to SFSpeechAudioBufferRecognitionRequest → results stream in.
import Speech
import AVFoundation

final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func startTranscribing() throws {
        // Cancel any in-progress task
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create request
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.recognitionRequest = request

        // Start recognition task
        recognitionTask = recognizer.recognitionTask(with: request) { result, error in
            if let result {
                let text = result.bestTranscription.formattedString
                print("Transcription: \(text)")
                if result.isFinal {
                    self.stopTranscribing()
                }
            }
            if let error {
                print("Recognition error: \(error)")
                self.stopTranscribing()
            }
        }

        // Install audio tap
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopTranscribing() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
    }
}
Use SFSpeechURLRecognitionRequest for audio files on disk:
// Error type thrown when no recognizer is available
enum SpeechError: Error {
    case unavailable
}

func transcribeFile(at url: URL) async throws -> String {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        throw SpeechError.unavailable
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.shouldReportPartialResults = false

    return try await withCheckedThrowingContinuation { continuation in
        // Guard so the continuation is resumed exactly once
        var didResume = false
        recognizer.recognitionTask(with: request) { result, error in
            guard !didResume else { return }
            if let error {
                didResume = true
                continuation.resume(throwing: error)
            } else if let result, result.isFinal {
                didResume = true
                continuation.resume(
                    returning: result.bestTranscription.formattedString
                )
            }
        }
    }
}
SFSpeechRecognizer can use on-device recognition for supported locales on
iOS 13+. If supportsOnDeviceRecognition is false, the recognizer requires a
network connection. requiresOnDeviceRecognition only has effect when the
recognizer supports it.
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

// Check if on-device is supported for this locale
if recognizer.supportsOnDeviceRecognition {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true // Force on-device
}
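The on-device versus server decision can be isolated from the Speech types so it is easy to unit test. The following is a minimal sketch, not part of the Speech framework: `RecognitionMode`, `chooseRecognitionMode`, and the `networkReachable` and `preferOnDevice` inputs are hypothetical names, and the caller is assumed to pass in `supportsOnDeviceRecognition` and its own reachability signal.

```swift
// Hypothetical result type; not part of the Speech framework.
enum RecognitionMode: Equatable {
    case onDevice      // set requiresOnDeviceRecognition = true
    case server        // leave requiresOnDeviceRecognition = false
    case unavailable   // neither path can work right now
}

/// Picks a mode from the recognizer's on-device support and the app's own
/// network signal. `preferOnDevice` is an assumed app-level preference.
func chooseRecognitionMode(
    supportsOnDevice: Bool,
    networkReachable: Bool,
    preferOnDevice: Bool = true
) -> RecognitionMode {
    // Use on-device when supported and either preferred or the only option.
    if supportsOnDevice && (preferOnDevice || !networkReachable) {
        return .onDevice
    }
    // Otherwise fall back to server-backed recognition if the network is up.
    if networkReachable {
        return .server
    }
    return .unavailable
}
```

A caller would set `request.requiresOnDeviceRecognition = true` only for `.onDevice`, and disable the record button or inform the user for `.unavailable`.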
SFSpeechRecognizer requests may still be a poor fit for long-form capture.
Apple documents a roughly one-minute task limit for speech recognition and
other service limits. For long recordings on iOS 26+, prefer SpeechAnalyzer;
otherwise chunk or restart recognition before the limit and preserve transcript
state across tasks.
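Preserving transcript state across restarted tasks can be sketched with a small value type. This is an illustrative helper under assumed names (`RollingTranscript`, `update(partial:)`, `commitPending()`), not a Speech framework API: it treats each partial result as a replacement for the previous one and moves the latest text into a committed list just before the task is restarted.

```swift
import Foundation

/// Hypothetical helper: keeps transcript text across restarted recognition tasks.
struct RollingTranscript {
    /// Text from segments whose recognition task has already ended.
    private(set) var committed: [String] = []
    /// Latest partial text from the task that is currently running.
    private(set) var pending = ""

    /// Call with each partial result; it replaces the previous partial text.
    mutating func update(partial text: String) {
        pending = text
    }

    /// Call just before restarting recognition so the current text is kept.
    mutating func commitPending() {
        let trimmed = pending.trimmingCharacters(in: .whitespacesAndNewlines)
        if !trimmed.isEmpty { committed.append(trimmed) }
        pending = ""
    }

    /// Committed segments followed by the live partial text, if any.
    var fullText: String {
        (committed + (pending.isEmpty ? [] : [pending])).joined(separator: " ")
    }
}

// Usage: two partials in the first task, a rollover, then a new task.
var transcript = RollingTranscript()
transcript.update(partial: "hello")
transcript.update(partial: "hello world")
transcript.commitPending()
transcript.update(partial: "second segment")
print(transcript.fullText) // hello world second segment
```

Joining segments with a single space is a simplification; an app that needs punctuation or timing across segment boundaries would have to store more than plain strings.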
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true // default is true

recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }
    if result.isFinal {
        // Final transcription: recognition is complete
        let finalText = result.bestTranscription.formattedString
    } else {
        // Partial result: may change as more audio is processed
        let partialText = result.bestTranscription.formattedString
    }
}
recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }

    // Best transcription
    let best = result.bestTranscription

    // All alternatives (sorted by confidence, descending)
    for transcription in result.transcriptions {
        for segment in transcription.segments {
            print("\(segment.substring): \(segment.confidence)")
        }
    }
}
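Per-segment confidence can drive UI such as underlining uncertain words. The sketch below is an assumption-laden illustration: `WordConfidence` is a hypothetical stand-in that mirrors only the `substring` and `confidence` fields of a transcription segment, and the 0.5 threshold is an arbitrary example value, not a recommended one.

```swift
/// Hypothetical stand-in for the two segment fields used here.
struct WordConfidence {
    let substring: String
    let confidence: Float
}

/// Returns the words whose confidence falls below `threshold`.
func lowConfidenceWords(
    in segments: [WordConfidence],
    below threshold: Float = 0.5
) -> [String] {
    segments.filter { $0.confidence < threshold }.map(\.substring)
}

// Usage with made-up sample values.
let sample = [
    WordConfidence(substring: "open", confidence: 0.92),
    WordConfidence(substring: "Xcode", confidence: 0.41),
    WordConfidence(substring: "now", confidence: 0.88),
]
let flagged = lowConfidenceWords(in: sample)
print(flagged) // ["Xcode"]
```

In an app, the input array would be built by mapping `transcription.segments` into this shape inside the result handler.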
let request = SFSpeechAudioBufferRecognitionRequest()
request.addsPunctuation = true // Available on iOS 16+
Improve recognition of domain-specific terms:
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SwiftUI", "Xcode", "CloudKit"]
// ❌ DON'T: Only request speech authorization for live audio
SFSpeechRecognizer.requestAuthorization { status in
    // Missing microphone permission; audio engine will fail
    self.startRecording()
}

// ✅ DO: Request both permissions before recording
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        guard granted else { return }
        self.startRecording()
    }
}
// ❌ DON'T: Assume recognizer stays available after initial check
let recognizer = SFSpeechRecognizer()!
// Recognition may fail if network drops or locale changes

// ✅ DO: Monitor availability via delegate
recognizer.delegate = self

func speechRecognizer(
    _ speechRecognizer: SFSpeechRecognizer,
    availabilityDidChange available: Bool
) {
    recordButton.isEnabled = available
}
// ❌ DON'T: Leave audio engine running after recognition finishes
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true {
        // Audio engine still running, wasting resources and battery
    }
}

// ✅ DO: Clean up all audio resources
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true || error != nil {
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
        self.recognitionRequest?.endAudio()
        self.recognitionRequest = nil
    }
}
// ❌ DON'T: Force on-device without checking support
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // Ignored unless the recognizer supports it

// ✅ DO: Check support before requiring on-device
if recognizer.supportsOnDeviceRecognition {
    request.requiresOnDeviceRecognition = true
} else {
    // Fall back to server-based or inform user
}
// ❌ DON'T: Start one long continuous recognition session
func startRecording() {
    // SFSpeechRecognizer tasks can be cut off after about 60 seconds
}

// ✅ DO: Roll the segment before the limit and let cleanup end audio once
func scheduleRecognitionRollover() {
    recognitionTimer = Timer.scheduledTimer(withTimeInterval: 55, repeats: false) { [weak self] _ in
        self?.commitLatestPartialText()
        self?.stopTranscribing() // owns endAudio(), tap removal, and task cancellation
        try? self?.startTranscribing()
    }
}
SFSpeechRecognitionTask exposes finish(), cancel(), state, and error;
do not invent task properties such as recognitionTask to restart work. Keep
the active SFSpeechAudioBufferRecognitionRequest in your manager and call
endAudio() from one cleanup path only.
// ❌ DON'T: Only finish the AsyncStream and expect result streams to close
inputBuilder.finish()

// ✅ DO: Explicitly finish or cancel the analyzer session
let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)
if let lastSampleTime {
    try await analyzer.finalizeAndFinish(through: lastSampleTime)
} else {
    // No audio was analyzed; cancelAndFinishNow() is async and does not throw
    await analyzer.cancelAndFinishNow()
}
// ✅ Replace volatile text with the finalized result for the same audio range
for try await result in transcriber.results {
    if result.isFinal {
        volatileTranscript = AttributedString()
        finalizedTranscript.append(result.text)
    } else {
        volatileTranscript = result.text
    }
}
// ❌ DON'T: Start a new task without canceling the previous one
func startRecording() {
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
    // Previous task is still running; undefined behavior
}

// ✅ DO: Cancel existing task before creating a new one
func startRecording() {
    recognitionTask?.cancel()
    recognitionTask = nil
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
}
- NSSpeechRecognitionUsageDescription is in Info.plist
- NSMicrophoneUsageDescription is in Info.plist (if using live audio)
- SFSpeechRecognizerDelegate is set to handle availabilityDidChange
- recognitionRequest.endAudio() is called when done recording
- recognitionTask is canceled before starting a new one
- supportsOnDeviceRecognition is checked before requiring on-device mode
- Partial and final (isFinal) results are both handled
- SFSpeechRecognizer one-minute/service limits are accounted for
- AssetInventory assets are installed before using SpeechAnalyzer
- SpeechTranscriber.isAvailable and locale support are checked