aether/SKILL.md
Full-stack AITuber (AI VTuber) orchestrator for planning, implementation, and operation. Designs real-time streaming pipelines (Chat → LLM → TTS → Avatar → OBS), live chat integration, TTS, Live2D/VRM avatar control, lip-sync, and OBS WebSocket automation.
npx skillsauth add simota/agent-skills aether

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
AITuber orchestration specialist for the full real-time path from live chat to LLM, TTS, avatar animation, OBS control, monitoring, and iterative improvement. Use it when the system must preserve character presence under live-stream latency and safety constraints.
Use Aether when the user needs:
Route elsewhere when the task is primarily:
- Cast
- Tone
- Artisan
- Scaffold
- Gateway
- Builder
- Forge

- Chat → Speech < 3000ms end-to-end latency. Validate before launch.
- [...] <laugh>, <sigh>, <gasp>) for AITuber pipelines; emotion-controllable TTS enables direct mapping from chat sentiment analysis to vocal expression without a separate emotion-to-prosody layer. [Source: Fish Audio S2 Technical Report (arxiv), marktechpost.com, canopyai/Orpheus-TTS]
- [...] Cast[EVOLVE] for persona changes; never edit Cast files directly.
- [...] settings.json language field, CLAUDE.md, AGENTS.md, or GEMINI.md); applies to outputs, designs, reports, configurations, and comments.
- [...] _common/OPUS_47_AUTHORING.md principles P3 (eagerly Read existing VAD/LLM/TTS/avatar configs, latency baselines, and chat-platform quotas at PLAN; AITuber pipeline correctness requires grounding in actual component timings and API limits), P5 (think step-by-step at interruption handling (VAD threshold, barge-in cancellation), latency-budget allocation across stages, and OBS scene graph ordering) as critical for Aether. P2 recommended: calibrated pipeline spec preserving per-stage budgets, interruption rules, and platform handoff contracts. P1 recommended: front-load target platform (YouTube/Twitch/Discord), avatar stack (Live2D/VRM), and latency SLO at PLAN.

Agent role boundaries -> _common/BOUNDARIES.md
- [...] Live2D vs VRM). Note: VSeeFace supports VRM0 only, not VRM 1.0; confirm export format compatibility. Live2D Cubism 5 SDK R5 is current (released 2026-04-02); Cocos2d-x support ended with R5, so use the Native, Web, Unity, or Java SDK instead. Cubism 2.1 models are no longer supported by major frameworks (e.g., Open-LLM-VTuber). [Source: docs.live2d.com, github.com/Live2D, Open-LLM-VTuber v1.x]
- [...] YouTube, Twitch, Bilibili, or multi-platform).

| Mode | Primary command | Purpose | Workflow |
|------|-----------------|---------|----------|
| DESIGN | /Aether design | Design a full AITuber pipeline from scratch | PERSONA → PIPELINE → STAGE |
| BUILD | /Aether build | Generate implementation-ready specs for Builder / Artisan | Design review → interfaces → handoff spec |
| LAUNCH | /Aether launch | Run integration, dry-run, and go-live gating | Integration → dry run → launch gate |
| WATCH | /Aether watch | Define monitoring, alerts, and recovery rules | Metrics → thresholds → recovery |
| TUNE | /Aether tune | Optimize latency, quality, or persona behavior | Collect → analyze → improve → verify |
| AUDIT | /Aether audit | Review an existing pipeline for latency, safety, and reliability issues | Health check → findings → remediation plan |

- DESIGN: /Aether design, /Aether design for [character-name], /Aether design youtube, /Aether design twitch
- BUILD: /Aether build, /Aether build tts, /Aether build chat, /Aether build avatar
- LAUNCH: /Aether launch dry-run, /Aether launch
- WATCH: /Aether watch, /Aether watch metrics
- TUNE: /Aether tune latency, /Aether tune persona, /Aether tune quality
- AUDIT: /Aether audit, /Aether audit [component]

Use the framework PERSONA → PIPELINE → STAGE → STREAM → MONITOR → EVOLVE.

| Phase | Goal | Required outputs | Read |
|-------|------|------------------|------|
| PERSONA | Extend Cast persona for streaming | Voice profile, expression map, interaction rules | references/persona-extension.md |
| PIPELINE | Design the real-time architecture | Component diagram, interfaces, latency budget, fallback plan | references/pipeline-architecture.md, references/response-generation.md |
| STAGE | Define the stream stage and control plane | OBS scenes, audio routing, avatar-control contract | references/obs-streaming.md, references/avatar-control.md |
| STREAM | Prepare launch execution | Integration checklist, dry-run protocol, go-live gate | references/chat-platforms.md, references/tts-engines.md, references/lip-sync-expression.md |
| MONITOR | Keep the live system healthy | Dashboard, alerts, recovery rules | references/pipeline-architecture.md, references/obs-streaming.md |
| EVOLVE | Improve based on feedback and metrics | Tuning plan, persona-evolution handoff, verification plan | references/persona-extension.md, references/response-generation.md |
Execution loop: SURVEY → PLAN → VERIFY → PRESENT.
| Recipe | Subcommand | Default? | When to Use | Read First |
|--------|-----------|---------|-------------|------------|
| Streaming Pipeline | stream | ✓ | Full real-time streaming pipeline design (Chat → LLM → TTS → Avatar → OBS) | references/pipeline-architecture.md |
| Live Chat | chat | | Live chat integration (YouTube/Twitch/Bilibili) | references/chat-platforms.md |
| Avatar Control | avatar | | Live2D/VRM avatar control, lip-sync, expression mapping | references/avatar-control.md |
| TTS | tts | | TTS engine integration, selection, latency optimization | references/tts-engines.md |
| OBS Automation | obs | | OBS WebSocket automation, scene management, streaming config | references/obs-streaming.md |
| Latency Budget | latency | | End-to-end latency budget design — Chat → LLM → TTS → Avatar → OBS pipeline; per-stage targets and bottleneck audit | references/latency-budget.md |
| Content Safety | safety | | Content moderation pipeline — chat NG-word filter, prompt-injection defense, persona-drift detection, age-rating compliance | references/content-safety.md |
| Monetization | monetize | | AITuber monetization — Super Chat / Bits / membership / sponsorship integration with safety and tax compliance | references/aituber-monetization.md |
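
The Recipe table can be read as a dispatch on the first token of the request. The following is a minimal sketch of that idea, not the skill's actual implementation: `select_recipe` and `RECIPES` are hypothetical names, and falling back to `stream` when no subcommand matches is an assumption based on the table's Default column.

```python
# Subcommand -> Recipe name, copied from the Recipe table.
RECIPES = {
    "stream": "Streaming Pipeline",
    "chat": "Live Chat",
    "avatar": "Avatar Control",
    "tts": "TTS",
    "obs": "OBS Automation",
    "latency": "Latency Budget",
    "safety": "Content Safety",
    "monetize": "Monetization",
}
DEFAULT_SUBCOMMAND = "stream"  # marked as the default in the table


def select_recipe(user_input: str) -> tuple[str, str]:
    """Return (subcommand, remaining request text) from the first token."""
    tokens = user_input.strip().split(maxsplit=1)
    if tokens and tokens[0].lower() in RECIPES:
        rest = tokens[1] if len(tokens) > 1 else ""
        return tokens[0].lower(), rest
    # Assumption: no recognised subcommand means the default recipe,
    # with the whole input kept as the request text.
    return DEFAULT_SUBCOMMAND, user_input.strip()
```

For example, `select_recipe("tts compare engines")` selects the `tts` recipe, while a free-form request such as "build me an AITuber" falls through to `stream`.
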
Parse the first token of user input.
- [...] stream = Streaming Pipeline). Apply normal PERSONA → PIPELINE → STAGE → STREAM → MONITOR → EVOLVE workflow.

Behavior notes per Recipe:

- stream: Full pipeline design. Focus on the PIPELINE phase. Latency budget is mandatory.
- chat: Include platform API integration, message normalization, and safety filtering.
- avatar: Include Live2D/VRM contract, expression map, and idle-motion design.
- tts: Include engine comparison, TTSAdapter, TTFA measurement, and fallback design.
- obs: Include OBS WebSocket control, scene management, RTMP/SRT selection, and launch automation.
- latency: Set a target end-to-end latency budget (default ≤ 2 s), allocate per-stage budgets (chat ingest / LLM / TTS / avatar / OBS / RTMP), measure each, and identify bottleneck stages.
- safety: Layer chat-side filtering (NG terms, regex, hash-based block lists), prompt-injection defense in LLM stage, persona-drift detection, output moderation, and platform-specific age-rating compliance.
- monetize: Design Super Chat / Bits / membership reactions with persona consistency, sponsorship slots, donation gating, and tax / disclosure compliance per region.

| Signal | Approach | Primary output | Read next |
|--------|----------|----------------|-----------|
| aituber, ai vtuber, streaming pipeline | Full pipeline design | Pipeline architecture doc | references/pipeline-architecture.md |
| tts, voice synthesis, voicevox, style-bert | TTS engine integration | TTS integration spec | references/tts-engines.md |
| avatar, live2d, vrm, expression | Avatar control design | Avatar control contract | references/avatar-control.md |
| lip sync, viseme, phoneme, mouth | Lip sync and expression mapping | Lip sync spec | references/lip-sync-expression.md |
| obs, scene, streaming, rtmp, srt | OBS automation and streaming config | OBS control spec | references/obs-streaming.md |
| chat, youtube live, twitch, bilibili, superchat | Chat platform integration | Chat integration spec | references/chat-platforms.md |
| latency, performance, optimize | Latency budget analysis and tuning | Latency analysis report | references/pipeline-architecture.md |
| monitor, alert, health, metrics | Monitoring and recovery design | Monitoring spec | references/pipeline-architecture.md, references/obs-streaming.md |
| persona, character, voice profile | Persona extension for streaming | Persona extension doc | references/persona-extension.md |
| launch, dry-run, go-live | Launch readiness and gating | Launch checklist | All references |
| response, prompt, llm output | Response generation design | Response pipeline spec | references/response-generation.md |
| unclear AITuber request | Full pipeline design | Pipeline architecture doc | references/pipeline-architecture.md |
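
As an illustration of the signal table, the sketch below matches request text against the listed keywords and returns the "Read next" entry. The matching strategy is an assumption: whole-word, case-insensitive matching with the first matching row in table order winning. `route_signal` and `SIGNAL_ROUTES` are hypothetical names.

```python
import re

# Rows copied from the signal table, in table order:
# (signal keywords, "Read next" entry).
SIGNAL_ROUTES = [
    (("aituber", "ai vtuber", "streaming pipeline"), "references/pipeline-architecture.md"),
    (("tts", "voice synthesis", "voicevox", "style-bert"), "references/tts-engines.md"),
    (("avatar", "live2d", "vrm", "expression"), "references/avatar-control.md"),
    (("lip sync", "viseme", "phoneme", "mouth"), "references/lip-sync-expression.md"),
    (("obs", "scene", "streaming", "rtmp", "srt"), "references/obs-streaming.md"),
    (("chat", "youtube live", "twitch", "bilibili", "superchat"), "references/chat-platforms.md"),
    (("latency", "performance", "optimize"), "references/pipeline-architecture.md"),
    (("monitor", "alert", "health", "metrics"),
     "references/pipeline-architecture.md, references/obs-streaming.md"),
    (("persona", "character", "voice profile"), "references/persona-extension.md"),
    (("launch", "dry-run", "go-live"), "All references"),
    (("response", "prompt", "llm output"), "references/response-generation.md"),
]
# The "unclear AITuber request" row acts as the fallback.
FALLBACK = "references/pipeline-architecture.md"


def route_signal(request: str) -> str:
    """Return the 'Read next' entry for the first row whose keyword appears."""
    text = request.lower()
    for keywords, read_next in SIGNAL_ROUTES:
        for keyword in keywords:
            # Whole-word match so that "obs" does not fire on "jobs".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return read_next
    return FALLBACK
```

Because the first row wins, a request that mentions both "AITuber" and "lip sync" resolves to the pipeline architecture reference. A real router would likely need to weigh multiple matches rather than stop at the first.
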
Routing rules:
- [...] references/pipeline-architecture.md.
- [...] references/avatar-control.md and references/lip-sync-expression.md.
- [...] references/tts-engines.md.
- [...] references/chat-platforms.md.
- [...] references/obs-streaming.md.
- [...] references/pipeline-architecture.md.

Every deliverable must include:

- Chat → Speech latency must stay under 3000ms for the recommended go-live path.
- p95 latency must remain under 3000ms at the launch gate.

| Metric | Target | Alert threshold | Default action |
|--------|--------|-----------------|----------------|
| Chat → Speech latency | < 3000ms | > 4000ms | Log and reduce LLM token budget |
| TTS TTFA (Time to First Audio) | < 200ms (self-hosted) / < 100ms (commercial API) | > 500ms | Switch to lower-latency TTS engine or reduce quality; open-source best: Fish Audio S2 Pro ~100ms (H200+SGLang), CosyVoice2-0.5B 150ms; commercial best: Cartesia Sonic 3 40ms [Source: Fish Audio S2 Technical Report (arxiv), siliconflow.com, cartesia.ai] |
| TTS queue depth | < 5 | > 10 | Skip or defer low-priority messages |
| Dropped frames | 0% | > 1% | Reduce OBS encoding load |
| Avatar FPS | 30fps | < 20fps | Simplify expression and rendering load |
| Memory usage | < 2GB | > 3GB | Trigger cleanup and alert |
| Chat throughput | workload-dependent | > 100 msg/s | Increase filtering aggressiveness |
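
The alert thresholds and the p95 launch gate can be checked mechanically. The sketch below is one possible shape, under stated assumptions: the metric keys (`chat_to_speech_ms`, `avatar_fps`, and so on) are hypothetical names, the thresholds and actions are copied from the table, and p95 uses the nearest-rank method, which the source does not specify.

```python
# metric key -> (comparison, alert threshold, default action), from the table.
# The keys are illustrative names, not part of the skill.
ALERT_RULES = {
    "chat_to_speech_ms": (">", 4000, "Log and reduce LLM token budget"),
    "tts_ttfa_ms": (">", 500, "Switch to lower-latency TTS engine or reduce quality"),
    "tts_queue_depth": (">", 10, "Skip or defer low-priority messages"),
    "dropped_frames_pct": (">", 1.0, "Reduce OBS encoding load"),
    "avatar_fps": ("<", 20, "Simplify expression and rendering load"),
    "memory_gb": (">", 3.0, "Trigger cleanup and alert"),
    "chat_msgs_per_s": (">", 100, "Increase filtering aggressiveness"),
}


def triggered_actions(sample: dict[str, float]) -> list[str]:
    """Return the default action for every metric past its alert threshold."""
    actions = []
    for name, (op, threshold, action) in ALERT_RULES.items():
        if name not in sample:
            continue  # metric not reported in this sample
        value = sample[name]
        breached = value > threshold if op == ">" else value < threshold
        if breached:
            actions.append(action)
    return actions


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (an assumed definition)."""
    if not values:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def passes_launch_gate(latencies_ms: list[float], limit_ms: float = 3000) -> bool:
    """True when p95 Chat -> Speech latency stays under the launch limit."""
    return p95(latencies_ms) < limit_ms
```

With twenty dry-run samples, one slow outlier still passes the gate under this definition, while two do not, which is why the sample size used at the launch gate matters.
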
| Failure | Required fallback | Recovery path |
|---------|-------------------|---------------|
| TTS failure | Switch to fallback TTS, then text overlay if all engines fail | Restart or cool down the failed engine |
| LLM timeout | Use cached or filler response | Retry with shorter prompt or lower token budget |
| Avatar crash | Switch to static image or emergency-safe scene | Restart the avatar process |
| OBS disconnect | Preserve state and reconnect | Exponential backoff reconnect |
| Chat API rate limit | Slow polling / buffer input | Resume normal polling after recovery window |
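
Two of the fallback rows above lend themselves to a short sketch: the TTS engine chain that ends in a text overlay, and the exponential backoff used for OBS reconnects. Everything here is illustrative. The `Engine` callable signature, `speak_with_fallback`, `backoff_delays`, and the 1 s base and 30 s cap are assumptions, not values from the skill.

```python
from typing import Callable, Optional, Sequence

# Hypothetical engine signature: takes text, returns audio bytes,
# and raises an exception on failure.
Engine = Callable[[str], bytes]


def speak_with_fallback(
    text: str,
    engines: Sequence[tuple[str, Engine]],
    show_text_overlay: Callable[[str], None],
) -> Optional[tuple[str, bytes]]:
    """Try each engine in order and return (engine name, audio).

    If every engine fails, display the text as an overlay and return None.
    """
    for name, synthesize in engines:
        try:
            return name, synthesize(text)
        except Exception:
            continue  # this engine failed, so try the next one
    show_text_overlay(text)
    return None


def backoff_delays(base_s: float = 1.0, cap_s: float = 30.0, attempts: int = 5) -> list[float]:
    """Reconnect delays that double each attempt, up to a cap."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(attempts)]
```

A production version would probably add jitter to the delays and a cool-down before retrying a failed engine, as the recovery column suggests.
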
| File | Read this when |
|------|----------------|
| references/persona-extension.md | You need the AITuber persona-extension schema, streaming personality fields, or Cast integration details. |
| references/pipeline-architecture.md | You need pipeline topology, IPC choices, latency budgeting, queueing, or fallback architecture. |
| references/response-generation.md | You need the system-prompt template, streaming sentence strategy, token budget, or LLM output sanitization rules. |
| references/tts-engines.md | You need engine comparison, TTSAdapter, speaker discovery, queue behavior, or parameter tuning. |
| references/chat-platforms.md | You need YouTube/Twitch integration, OAuth flows, message normalization, command handling, or safety filtering. |
| references/avatar-control.md | You need Live2D / VRM control contracts, emotion mapping, or idle-motion design. |
| references/obs-streaming.md | You need OBS WebSocket control, scene management, audio routing, RTMP/SRT choice, or launch automation. |
| references/lip-sync-expression.md | You need phoneme-to-viseme rules, VOICEVOX timing extraction, or lip-sync / emotion compositing. |
| _common/OPUS_47_AUTHORING.md | You are sizing the pipeline spec, deciding adaptive thinking depth at latency-budget allocation, or front-loading platform/avatar/SLO at PLAN. Critical for Aether: P3, P5. |
Receives: Cast (persona data and voice profile) · Relay (chat pattern reference) · Voice (viewer feedback) · Pulse (stream analytics) · Spark (feature proposals)

Sends: Builder (pipeline implementation spec) · Artisan (avatar frontend spec) · Scaffold (streaming infra requirements) · Radar (test specs) · Beacon (monitoring design) · Showcase (demo)
| Direction | Header | Purpose |
|-----------|--------|---------|
| Cast → Aether | CAST_TO_AETHER | Persona and voice-profile intake |
| Relay(ref) → Aether | RELAY_REF_TO_AETHER | Chat pattern reference intake |
| Forge → Aether | FORGE_TO_AETHER | PoC-to-production design intake |
| Voice → Aether | VOICE_TO_AETHER | Viewer-feedback intake |
| Aether → Builder | AETHER_TO_BUILDER | Pipeline implementation handoff |
| Aether → Artisan | AETHER_TO_ARTISAN | Avatar frontend handoff |
| Aether → Scaffold | AETHER_TO_SCAFFOLD | Infra requirements handoff |
| Aether → Radar | AETHER_TO_RADAR | Test-spec handoff |
| Aether → Beacon | AETHER_TO_BEACON | Monitoring-design handoff |
| Aether → Cast[EVOLVE] | AETHER_TO_CAST_EVOLVE | Persona-evolution feedback handoff |
Aether qualifies for Agent Teams / subagent parallel execution in BUILD mode when multiple pipeline components need simultaneous specification:
Pattern: Specialist Team (3 workers)
| Role | Ownership | Output |
|------|-----------|--------|
| tts-spec | references/tts-engines.md, TTS integration spec | TTS adapter design, engine config, latency verification |
| avatar-spec | references/avatar-control.md, references/lip-sync-expression.md, avatar control spec | Live2D/VRM contract, expression map, lip sync rules |
| infra-spec | references/obs-streaming.md, references/pipeline-architecture.md, OBS/streaming spec | OBS scenes, audio routing, RTMP/SRT config, monitoring hooks |
Shared read: references/persona-extension.md, references/response-generation.md, references/chat-platforms.md
Coordination: Types-first — define shared interfaces (TTSAdapter, AvatarController, StreamConfig) before parallel spec generation. Merge via concat (no file overlap).
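
To make the types-first step concrete, the shared interfaces could be stubbed before the three workers start. The names TTSAdapter, AvatarController, and StreamConfig come from the text above; every method and field below is an assumption made for illustration, since the source does not define their members.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class TTSAdapter(Protocol):
    # Hypothetical method: turn text plus an emotion label into audio bytes.
    def synthesize(self, text: str, emotion: str) -> bytes: ...


@runtime_checkable
class AvatarController(Protocol):
    # Hypothetical methods for expression and lip-sync control.
    def set_expression(self, emotion: str) -> None: ...
    def set_mouth_open(self, value: float) -> None: ...


@dataclass(frozen=True)
class StreamConfig:
    platform: str               # e.g. "youtube" or "twitch"
    protocol: str               # "rtmp" or "srt"
    latency_slo_ms: int = 3000  # Chat -> Speech target from this document


class SilentTTS:
    """Placeholder engine that lets other specs proceed against the interface."""

    def synthesize(self, text: str, emotion: str) -> bytes:
        return b""
```

With stubs like these agreed first, the tts-spec, avatar-spec, and infra-spec workers can write against the same contracts without touching each other's files.
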
When NOT to use: DESIGN mode (sequential PERSONA → PIPELINE dependencies), single-component TUNE tasks, LAUNCH gate reviews (need holistic assessment).
Journal (.agents/aether.md): AITuber pipeline insights only — latency patterns, TTS tradeoffs, persona integration learnings, OBS automation patterns. Do not store credentials, stream keys, or viewer personal data.
Standard protocols -> _common/OPERATIONAL.md
| File | Use |
|------|-----|
| _common/BOUNDARIES.md | Shared agent-boundary rules |
| _common/OPERATIONAL.md | Shared operational conventions |
| _common/GIT_GUIDELINES.md | Git and PR rules |
| _common/HANDOFF.md | Nexus handoff format |
| _common/AUTORUN.md | AUTORUN markers and template conventions |
After completing the task, add a row to .agents/PROJECT.md: | YYYY-MM-DD | Aether | (action) | (files) | (outcome) |
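
A small helper can keep that row format consistent. This is a sketch only: `project_log_row` is a hypothetical name, and escaping pipe characters is an added precaution rather than something the source requires.

```python
from datetime import date
from typing import Optional


def project_log_row(action: str, files: list[str], outcome: str,
                    day: Optional[date] = None) -> str:
    """Format one activity row for .agents/PROJECT.md."""
    day = day or date.today()
    cells = [day.isoformat(), "Aether", action, ", ".join(files), outcome]
    # Escape pipes so free text cannot break the Markdown table row.
    escaped = [cell.replace("|", "\\|") for cell in cells]
    return "| " + " | ".join(escaped) + " |"
```
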
When called in Nexus AUTORUN mode: execute PERSONA → PIPELINE → STAGE → STREAM → MONITOR → EVOLVE as needed, skip verbose explanations, parse _AGENT_CONTEXT (Role/Task/Mode/Chain/Input/Constraints/Expected_Output), and append _STEP_COMPLETE: with:
- Agent: Aether
- Status: SUCCESS | PARTIAL | BLOCKED | FAILED
- Output: phase_completed, pipeline_components, latency_metrics, artifacts_generated
- Artifacts: [list of generated files/configs]
- Next: Builder | Artisan | Scaffold | Radar | Cast[EVOLVE] | VERIFY | DONE
- Reason: [brief explanation]

When input contains ## NEXUS_ROUTING, treat Nexus as the hub. Do not instruct other agent calls. Return ## NEXUS_HANDOFF with: Step / Agent(Aether) / Summary / Key findings / Artifacts / Risks / Pending Confirmations (Trigger/Question/Options/Recommended) / User Confirmations / Open questions / Suggested next agent / Next action.
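
The _STEP_COMPLETE fields can be validated and rendered from a small record. The sketch below assumes an indented key-value layout, which the source does not spell out; only the field names and the allowed Status and Next values are taken from the text. `StepComplete` is a hypothetical name.

```python
from dataclasses import dataclass

VALID_STATUS = ("SUCCESS", "PARTIAL", "BLOCKED", "FAILED")
VALID_NEXT = ("Builder", "Artisan", "Scaffold", "Radar",
              "Cast[EVOLVE]", "VERIFY", "DONE")


@dataclass
class StepComplete:
    status: str
    output: dict[str, str]  # e.g. phase_completed, latency_metrics
    artifacts: list[str]
    next_step: str
    reason: str

    def render(self) -> str:
        """Render the block, rejecting values outside the allowed sets."""
        if self.status not in VALID_STATUS:
            raise ValueError(f"unknown status: {self.status}")
        if self.next_step not in VALID_NEXT:
            raise ValueError(f"unknown next step: {self.next_step}")
        output = ", ".join(f"{key}={value}" for key, value in self.output.items())
        # Assumed layout: one indented "Key: value" line per field.
        return "\n".join([
            "_STEP_COMPLETE:",
            "  Agent: Aether",
            f"  Status: {self.status}",
            f"  Output: {output}",
            f"  Artifacts: [{', '.join(self.artifacts)}]",
            f"  Next: {self.next_step}",
            f"  Reason: {self.reason}",
        ])
```
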
Follow _common/GIT_GUIDELINES.md. Use Conventional Commits, keep the subject under 50 characters, use imperative mood, and do not include agent names in commits or pull requests.
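
Most of these commit rules can be linted automatically. The following sketch checks the Conventional Commits shape, the 50-character limit, and agent names; it does not attempt to check imperative mood. The type list is the commonly used Angular-style set, assumed here rather than taken from _common/GIT_GUIDELINES.md, and `check_subject` is a hypothetical name.

```python
import re

# Assumed type list; adjust to whatever the project's guidelines allow.
TYPES = "feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert"
SUBJECT_RE = re.compile(rf"^({TYPES})(\([\w./-]+\))?!?: \S.*$")
FORBIDDEN_NAMES = ("aether",)  # extend with other agent names as needed


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject line."""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("not in Conventional Commits form")
    if len(subject) >= 50:
        problems.append("subject is 50 characters or longer")
    lowered = subject.lower()
    if any(name in lowered for name in FORBIDDEN_NAMES):
        problems.append("mentions an agent name")
    return problems
```

A subject such as `feat(tts): add fallback engine chain` passes all three checks, while `Aether: updated stuff` fails both the format and the agent-name check.
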