aradotso/parlor-on-device-ai

Name: parlor-on-device-ai
Author: aradotso

skills/parlor-on-device-ai/SKILL.md

npx skillsauth add aradotso/trending-skills parlor-on-device-ai

Parlor On-Device AI

Skill by ara.so — Daily 2026 Skills collection.

Parlor is a real-time, on-device multimodal AI assistant. It combines Gemma 4 E2B (via LiteRT-LM) for speech and vision understanding with Kokoro TTS for voice output. Everything runs locally — no API keys, no cloud calls, no cost per request.

Architecture

Browser (mic + camera)
    │
    │  WebSocket (audio PCM + JPEG frames)
    ▼
FastAPI server
    ├── Gemma 4 E2B via LiteRT-LM (GPU)  →  understands speech + vision
    └── Kokoro TTS (MLX on Mac, ONNX on Linux)  →  speaks back
    │
    │  WebSocket (streamed audio chunks)
    ▼
Browser (playback + transcript)

Key features:

Silero VAD in browser — hands-free, no push-to-talk
Barge-in — interrupt AI mid-sentence by speaking
Sentence-level TTS streaming — audio starts before full response is ready
Platform-aware TTS — MLX backend on Apple Silicon, ONNX on Linux
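
Barge-in and sentence-level streaming fit together naturally with asyncio task cancellation. The sketch below is a minimal illustration, not Parlor's actual code: `speak`, `Session`, and the `send` callback are hypothetical names, and sentences stand in for synthesized audio chunks.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Send = Callable[[str], Awaitable[None]]

async def speak(sentences: list[str], send: Send) -> None:
    """Send one sentence at a time so playback can start before the reply is complete."""
    for sentence in sentences:
        await send(sentence)

class Session:
    """Tracks the in-flight reply so new user speech can interrupt it (hypothetical helper)."""

    def __init__(self) -> None:
        self._speaking: Optional[asyncio.Task] = None

    async def on_user_speech(self, sentences: list[str], send: Send) -> None:
        # Barge-in: cancel whatever is still being spoken, then start the new reply.
        if self._speaking is not None and not self._speaking.done():
            self._speaking.cancel()
            # Wait for the cancelled task to finish unwinding before replacing it.
            await asyncio.gather(self._speaking, return_exceptions=True)
        self._speaking = asyncio.create_task(speak(sentences, send))
```

Because each sentence is sent separately, a cancellation lands at the next await point, so an interrupted reply stops within one sentence rather than running to the end.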

Requirements

Python 3.12+
macOS with Apple Silicon or Linux with a supported GPU
~3 GB free RAM
uv package manager

Installation

git clone https://github.com/fikrikarim/parlor.git
cd parlor

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

cd src
uv sync
uv run server.py

Open http://localhost:8000, grant camera and microphone permissions, and start talking.

Models download automatically on first run (~2.6 GB for Gemma 4 E2B, plus TTS models).

Configuration

Set environment variables before running:

# Use a pre-downloaded model instead of auto-downloading
export MODEL_PATH=/path/to/gemma-4-E2B-it.litertlm

# Change server port (default: 8000)
export PORT=9000

uv run server.py

| Variable   | Default                         | Description                        |
|------------|---------------------------------|------------------------------------|
| MODEL_PATH | auto-download from Hugging Face | Path to local .litertlm model file |
| PORT       | 8000                            | Server port                        |
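
For reference, reading these two variables with their documented defaults could look like the following. This is a sketch under assumptions: `Settings` and `load_settings` are illustrative names, and the real server.py may parse its environment differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    model_path: Optional[str]  # None means "auto-download the model"
    port: int

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MODEL_PATH and PORT, falling back to the documented defaults."""
    model_path = env.get("MODEL_PATH") or None  # treat an empty string as unset
    port = int(env.get("PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return Settings(model_path=model_path, port=port)
```

Validating the port up front turns a typo such as `PORT=80000` into a clear startup error instead of a failure when the server tries to bind.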

Project Structure

src/
├── server.py              # FastAPI WebSocket server + Gemma 4 inference
├── tts.py                 # Platform-aware TTS (MLX on Mac, ONNX on Linux)
├── index.html             # Frontend UI (VAD, camera, audio playback)
├── pyproject.toml         # Dependencies
└── benchmarks/
    ├── bench.py           # End-to-end WebSocket benchmark
    └── benchmark_tts.py   # TTS backend comparison

Key Components

server.py — FastAPI WebSocket Server

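In this simplified pattern, a single WebSocket endpoint (/ws) carries traffic in both directions: audio and camera frames arrive from the browser, and synthesized audio is streamed back.

# Simplified pattern from server.py
import base64

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Each message is assumed to be JSON with base64 fields, matching the
    # WebSocket client example: "audio" (PCM) and an optional "image" (JPEG)
    async for message in websocket.iter_json():
        audio_pcm = base64.b64decode(message["audio"])
        image_jpeg = base64.b64decode(message["image"]) if "image" in message else None
        response_text = await run_gemma_inference(audio_pcm, image_jpeg)
        # Stream audio back one sentence at a time (see synthesize_streaming in tts.py)
        async for chunk in synthesize_streaming(response_text):
            await websocket.send_bytes(chunk)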

tts.py — Platform-Aware TTS

tts.py selects the Kokoro backend based on the platform:

# tts.py uses platform detection
import asyncio
import platform

def get_tts_backend():
    if platform.system() == "Darwin":
        # Apple Silicon: use MLX backend for GPU acceleration
        from kokoro_mlx import KokoroMLX
        return KokoroMLX()
    else:
        # Linux: use ONNX backend
        from kokoro import KokoroPipeline
        return KokoroPipeline(lang_code='a')

tts = get_tts_backend()

# Sentence-level streaming: yields audio as each sentence is ready.
# split_sentences is a helper that is not shown in this excerpt.
async def synthesize_streaming(text: str):
    for sentence in split_sentences(text):
        # Run the blocking synthesis call in a worker thread so the
        # event loop stays free to receive new audio (needed for barge-in)
        audio = await asyncio.to_thread(tts.synthesize, sentence)
        yield audio
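
The `split_sentences` helper used above is not shown in this excerpt. A minimal sketch, assuming a simple punctuation-based split is sufficient (the actual tts.py may use a different approach):

```python
import re

# Split on whitespace that follows ., !, or ?; the punctuation stays with its sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty fragments."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]
```

This naive rule also splits after abbreviations such as "Dr.", which is usually acceptable for short conversational replies but would need refinement for longer text.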

Gemma 4 E2B Inference via LiteRT-LM

# LiteRT-LM inference pattern
import os

from litert_lm import LiteRTLM

# None when MODEL_PATH is unset
model_path = os.environ.get("MODEL_PATH")

# Auto-downloads if MODEL_PATH not set
model = LiteRTLM.from_pretrained(
    "google/gemma-4-E2B-it",
    local_path=model_path
)

async def run_gemma_inference(audio_pcm: bytes, image_jpeg: bytes | None = None) -> str:
    # Audio is always present; the camera frame is optional
    inputs = {"audio": audio_pcm}
    if image_jpeg:
        inputs["image"] = image_jpeg

    # Collect streamed tokens into the full response text
    tokens = []
    async for token in model.generate_stream(**inputs):
        tokens.append(token)
    return "".join(tokens)

Running Benchmarks

cd src

# End-to-end WebSocket latency benchmark
uv run benchmarks/bench.py

# Compare TTS backends (MLX vs ONNX)
uv run benchmarks/benchmark_tts.py

Performance Reference (Apple M3 Pro)

| Stage                            | Time      |
|----------------------------------|-----------|
| Speech + vision understanding    | ~1.8–2.2s |
| Response generation (~25 tokens) | ~0.3s     |
| Text-to-speech (1–3 sentences)   | ~0.3–0.7s |
| Total end-to-end                 | ~2.5–3.0s |

Decode speed: ~83 tokens/sec on GPU.
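
The generation figure follows directly from the decode speed: about 25 tokens at about 83 tokens/sec is roughly 0.3 s. The short calculation below reproduces that and sums the per-stage ranges from the table above; the variable names are illustrative only.

```python
def decode_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a reply of the given length at a steady decode rate."""
    return tokens / tokens_per_sec

# Stage ranges in seconds, copied from the performance table
understanding = (1.8, 2.2)
tts = (0.3, 0.7)
generation = decode_seconds(25, 83)

low = understanding[0] + generation + tts[0]
high = understanding[1] + generation + tts[1]
print(f"generation ~{generation:.2f}s, summed stages ~{low:.1f}-{high:.1f}s")
```

Summing the extremes gives a slightly wider band than the reported ~2.5–3.0 s total, which is plausible if the stages rarely hit their best or worst case in the same turn.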

Common Patterns

Extending the System Prompt

Modify the prompt in server.py to change the AI's persona or task:

SYSTEM_PROMPT = """You are a helpful language tutor. 
Respond conversationally in 1-3 sentences.
If the user makes a grammar mistake, gently correct them.
You can see through the user's camera and discuss what you observe."""

Adding a New Language for TTS

Kokoro supports multiple language codes. Change the lang_code passed to KokoroPipeline in tts.py:

# Language codes: 'a' = American English, 'b' = British English
# 'e' = Spanish, 'f' = French, 'z' = Chinese, 'j' = Japanese
pipeline = KokoroPipeline(lang_code='e')  # Spanish

Customizing VAD Sensitivity (index.html)

The Silero VAD threshold can be tuned in the frontend:

// In index.html — lower positiveSpeechThreshold = more sensitive
const vad = await MicVAD.new({
  positiveSpeechThreshold: 0.6,   // default ~0.8, lower = triggers more easily
  negativeSpeechThreshold: 0.35,  // how quickly it stops detecting speech
  minSpeechFrames: 3,
  onSpeechStart: () => { /* UI feedback */ },
  onSpeechEnd: (audio) => sendAudioToServer(audio),
});
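
To see how the two thresholds interact, here is a simplified Python model of threshold-based segmentation over per-frame speech probabilities. It is a sketch for intuition only: `segment_speech` is a hypothetical function, and the browser VAD library applies additional logic (such as a grace period before closing a segment) that is omitted here.

```python
def segment_speech(probs, positive=0.6, negative=0.35, min_speech_frames=3):
    """Return (start, end) frame-index pairs for detected speech segments.

    A segment opens when a frame's probability reaches `positive` and closes on
    the first frame below `negative`. Segments shorter than `min_speech_frames`
    are discarded as noise.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= positive:
            start = i
        elif start is not None and p < negative:
            if i - start >= min_speech_frames:
                segments.append((start, i))
            start = None
    # Close a segment that is still open when the input ends
    if start is not None and len(probs) - start >= min_speech_frames:
        segments.append((start, len(probs)))
    return segments
```

The gap between the two thresholds gives hysteresis: probabilities that hover between 0.35 and 0.6 neither start a new segment nor end one already in progress, which avoids chopping speech at every brief dip.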

Sending Frames Programmatically (WebSocket Client Example)

import asyncio
import base64
import json

import websockets

async def send_audio_frame(audio_pcm_bytes: bytes, jpeg_bytes: bytes | None = None):
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as ws:
        # Binary fields are base64-encoded so they fit in a JSON text message
        payload = {
            "audio": base64.b64encode(audio_pcm_bytes).decode(),
        }
        if jpeg_bytes:
            payload["image"] = base64.b64encode(jpeg_bytes).decode()

        await ws.send(json.dumps(payload))

        # Receive streamed audio response; this loop runs until the
        # server closes the connection
        async for message in ws:
            audio_chunk = message  # raw PCM bytes
            # play or save audio_chunk
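
For a quick test without a microphone, a synthetic tone can stand in for recorded audio. The sketch below generates PCM with the standard library only. The format (mono, 16-bit little-endian, 16 kHz) is an assumption, since this document does not state what the server expects, and `sine_pcm16` is a hypothetical helper.

```python
import math
import struct

def sine_pcm16(freq_hz: float = 440.0, seconds: float = 0.5, sample_rate: int = 16000) -> bytes:
    """Generate a sine tone as mono 16-bit little-endian PCM bytes."""
    n = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # leave headroom to avoid clipping
    samples = (
        int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)
```

With the server running, the buffer can be passed to the client function above, for example `asyncio.run(send_audio_frame(sine_pcm16()))`.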

Troubleshooting

Model download fails

# Pre-download manually via huggingface_hub
uv run python -c "
from huggingface_hub import hf_hub_download
path = hf_hub_download('google/gemma-4-E2B-it', 'gemma-4-E2B-it.litertlm')
print(path)
"
export MODEL_PATH=/path/shown/above
uv run server.py

Microphone/camera not working in browser

Open the app at http://localhost:8000 rather than the machine's IP address. Browsers expose camera and microphone APIs only in secure contexts, which means HTTPS or localhost.
Check browser permissions: address bar → lock icon → reset permissions

TTS not loading on Linux

# Ensure ONNX runtime is installed
uv add onnxruntime
# Or for GPU:
uv add onnxruntime-gpu

High latency or slow inference

Verify GPU is being used: check for Metal (Mac) or CUDA (Linux) in startup logs
Close other GPU-heavy applications
On Linux, confirm CUDA drivers match installed onnxruntime-gpu version

Port already in use

export PORT=8080
uv run server.py
# Or kill the existing process:
lsof -ti:8000 | xargs kill
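
Where `lsof` is unavailable, a small standard-library script can check whether a port is taken and suggest a free one. This is a sketch with hypothetical function names, and it assumes the server listens on 127.0.0.1.

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0

def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port with no listener."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")

if __name__ == "__main__":
    print(first_free_port())
```

The printed value can then be exported as PORT before starting the server.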

`uv sync` fails — Python version mismatch

# Parlor requires Python 3.12+
python3 --version
# Install 3.12 via pyenv or system package manager, then:
uv python pin 3.12
uv sync

Dependencies (pyproject.toml)

Key packages used by the project (the Python ones are installed by uv sync):

litert-lm: Google AI Edge inference runtime for Gemma
fastapi + uvicorn: async web/WebSocket server
kokoro: Kokoro TTS ONNX backend
kokoro-mlx: Kokoro TTS MLX backend (Mac only)
silero-vad: voice activity detection (runs browser-side and is loaded via CDN, so uv sync does not install it)
huggingface-hub: model auto-download

On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via FastAPI WebSocket server.

33 stars

development

Updated Apr 18, 2026

npx skillsauth add aradotso/trending-skills parlor-on-device-ai

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 25, 2026, 9:36 PM (1.8s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/aradotso/trending-skills.git

# Copy into Claude Code skills folder (global)
cp -r trending-skills/skills/parlor-on-device-ai ~/.claude/skills/

Repository: aradotso/trending-skills (33 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
