skills/characteristic-voice/SKILL.md
Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 'talk like', 'speak like', 'companion voice', 'comfort me', 'cheer me up', 'sound more human', 'good night voice', 'good morning voice', or requests to add fillers, emotion, or personality to generated speech. Also use when the user wants to mimic a specific character's voice, apply speaking style presets (goodnight, morning, comfort, celebration, chatting), tune emotional parameters like warmth or tenderness, or make TTS output feel like a real person talking. If the user asks for a 'voice message', 'companion audio', 'character voice', or wants speech that sighs, laughs, hesitates, or sounds genuinely warm, use this skill. Do NOT use for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.
npx skillsauth add NoizAI/skills characteristic-voice
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Make your AI agent sound like a real companion — one who sighs, laughs, hesitates, and speaks with genuine feeling.
| Variable | Required | Description |
|---|---|---|
| NOIZ_API_KEY | Yes if using Noiz backend | API key from developers.noiz.ai. Not needed if using the local Kokoro backend. |
The script saves a normalised copy of the key to ~/.noiz_api_key (mode 600) for convenience. To set it:
bash skills/characteristic-voice/scripts/speak.sh config --set-api-key YOUR_KEY
The included speak.sh script requires curl and python3 at runtime. Depending on which backend and features you use, you may also need:
| Tool | When needed | Install hint |
|---|---|---|
| curl, python3 | Always (core script) | Usually pre-installed |
| kokoro-tts | Kokoro (local/offline) backend | uv tool install kokoro-tts |
| yt-dlp | Downloading reference audio for voice cloning | github.com/yt-dlp/yt-dlp |
| ffmpeg | Trimming reference audio clips | ffmpeg.org |
| rg (ripgrep) | Searching subtitle files | github.com/BurntSushi/ripgrep |
None of these are installed by the skill itself — provision them manually in your environment.
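A quick way to see which of the tools listed above are already on your PATH is a small check like the one below. This is only a sketch: the tool names come from the table, while the `check_tools` helper and the split into "core" and "optional" groups are illustrative and not part of the skill.

```shell
# Report which tools are available. check_tools is a hypothetical helper,
# not something shipped with speak.sh.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # The return status is the number of missing tools (0 means all found).
  return "$missing"
}

echo "Core (always needed):"
check_tools curl python3 || echo "-> install the missing core tools before running speak.sh"

echo "Optional (backend- or feature-specific):"
check_tools kokoro-tts yt-dlp ffmpeg rg || true
```

Only the optional tools for the backend and features you actually use need to be present.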
When using the Noiz backend, data is sent to https://noiz.ai/v1. If you supply --ref-audio, that audio file is uploaded for voice cloning.
Use the local Kokoro backend (--backend kokoro) if you want fully offline processing.

| Sound | Feeling | Use for |
|-------|---------|---------|
| hmm... | Thinking, gentle acknowledgment | Comfort, pondering |
| ah... | Realization, soft surprise | Discoveries, transitions |
| uh... | Hesitation, empathy | Careful moments |
| heh / hehe | Playful, mischievous | Teasing, light moments |
| haha | Laughter | Joy, humor |
| aww | Tenderness, sympathy | Deep comfort |
| oh? / oh! | Surprise, attention | Reacting to news |
| pfft | Stifled laugh | Playful disbelief |
| whew | Relief | After tension |
| ~ (tilde) | Drawn out, melodic ending | Warmth, playfulness |
Rules: 2–4 fillers per short message max. Place at natural pauses — sentence starts, thought shifts. Use ... after fillers for a beat of silence, ~ at word endings for warmth.
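The filler limit above can be checked mechanically before text is sent to the script. The sketch below counts the word fillers from the table and flags messages with more than four. `count_fillers` and `check_fillers` are hypothetical helpers. Leaving the tilde out of the count is an assumption, since the rule does not say whether `~` counts as a filler.

```shell
# Print how many filler words from the table appear in "$1".
# -o prints each match on its own line, -i ignores case, -w matches whole
# words only, so "ah" is not counted inside "yeah".
count_fillers() {
  printf '%s\n' "$1" \
    | { grep -oiwE 'hmm|ah|uh|hehe|heh|haha|aww|oh|pfft|whew' || true; } \
    | wc -l \
    | tr -d ' '
}

# Fail (status 1) when a message exceeds the 4-filler ceiling.
check_fillers() {
  n="$(count_fillers "$1")"
  if [ "$n" -gt 4 ]; then
    echo "too many fillers ($n): trim to 4 or fewer"
    return 1
  fi
  echo "ok ($n fillers)"
}

check_fillers "Hmm... rest well~ Sweet dreams."
```

The check only enforces the upper bound. Whether a message reads naturally still depends on placing fillers at real pauses.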
goodnight: Gentle, warm, slightly sleepy. Slow pace.
morning: Warm, cheerful but not overwhelming.
comfort: Soft, understanding, unhurried. Give space. Don't rush to "fix" things.
celebration: Excited, proud, genuinely happy.
chatting: Relaxed, playful, natural.
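To compare the five styles side by side, one option is to render a short sample for each. The sketch below assumes the styles map, in order, to the preset names goodnight, morning, comfort, celebration, and chatting. The first two appear in the documented commands; the other three are taken from the skill description and may not match the script's accepted values. The sample lines and the `RUN` switch are illustrative. By default the loop only prints the commands it would run.

```shell
SPEAK="skills/characteristic-voice/scripts/speak.sh"

# Illustrative sample line per preset; replace with your own text.
sample_text() {
  case "$1" in
    goodnight)   echo "Hmm... rest well~ Sweet dreams." ;;
    morning)     echo "Good morning~" ;;
    comfort)     echo "Aww... I'm right here." ;;
    celebration) echo "Oh! You did it, haha!" ;;
    chatting)    echo "Heh, so... what happened next?" ;;
    *)           return 1 ;;
  esac
}

# Print one speak.sh call per preset, or execute it when RUN=1 is set.
for preset in goodnight morning comfort celebration chatting; do
  text="$(sample_text "$preset")"
  if [ "${RUN:-0}" = "1" ]; then
    bash "$SPEAK" --preset "$preset" -t "$text" -o "sample-$preset.wav"
  else
    printf 'bash %s --preset %s -t "%s" -o sample-%s.wav\n' \
      "$SPEAK" "$preset" "$text" "$preset"
  fi
done
```

Check `speak.sh --help` for the preset names the script actually accepts before running with `RUN=1`.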
When a user says something like "speak in Hermione's voice" or "sound like Tony Stark", first check whether a reference audio file already exists in skills/characteristic-voice/. If one does, use it directly with --ref-audio.
If no reference audio exists, you can create one — but read the warnings below first.
You need a short (10–30 s) WAV clip of the target voice. Possible sources:
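yt-dlp and ffmpeg can download and trim audio. Example workflow:
# 1. Fetch only the auto-generated English subtitles (no media download)
yt-dlp "URL" --write-auto-sub --sub-lang en --skip-download -o tmp/clip
# 2. Search the subtitle file to find the timestamp of the line you want
rg -n "target line" tmp/clip.en.vtt
# 3. Download just that time range and extract the audio as WAV
yt-dlp "URL" -x --audio-format wav --download-sections "*00:00:00-00:00:25" -o tmp/clip
# 4. Trim to the final reference clip (here 18 s, within the 10–30 s target)
ffmpeg -i tmp/clip.wav -ss 00:00:02 -to 00:00:20 skills/characteristic-voice/character.wav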
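Before using a trimmed clip as reference audio, it can help to confirm that its length falls inside the 10–30 s window mentioned above. The following is a sketch under assumptions: `duration_ok`, `clip_duration`, and `check_clip` are hypothetical helpers, and `ffprobe` (distributed with ffmpeg) is assumed to be installed.

```shell
# Succeed (status 0) if a duration in seconds is within 10-30 s.
duration_ok() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}

# Print the clip length in seconds using ffprobe.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Report whether the clip at "$1" is a usable length.
check_clip() {
  d="$(clip_duration "$1")" || { echo "could not read $1"; return 2; }
  if duration_ok "$d"; then
    echo "ok: $1 is ${d}s"
  else
    echo "out of range: $1 is ${d}s (want 10-30 s)"
    return 1
  fi
}

# Usage: check_clip skills/characteristic-voice/character.wav
```

The range test is kept separate from the ffprobe call so it can be reused with durations obtained some other way.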
Copyright & privacy warning: Downloading and re-using someone's voice from copyrighted media (movies, TV, YouTube) may violate copyright or personality-rights laws depending on your jurisdiction. Do not upload private voice recordings or material you don't have permission to use. The reference audio is sent to https://noiz.ai/v1 for voice cloning when using the Noiz backend. If this is a concern, consider using the local Kokoro backend instead.
bash skills/characteristic-voice/scripts/speak.sh \
--preset goodnight -t "Hmm... rest well~ Sweet dreams." \
--ref-audio skills/characteristic-voice/character.wav -o night.wav
The --ref-audio flag uploads the file to the Noiz backend for voice cloning (requires NOIZ_API_KEY).
This skill provides speak.sh, a wrapper around the tts skill with companion-friendly presets.
# Use a preset (auto-sets emotion + speed)
bash skills/characteristic-voice/scripts/speak.sh \
--preset goodnight -t "Hmm... rest well~ Sweet dreams." -o night.wav
# Custom emotion override
bash skills/characteristic-voice/scripts/speak.sh \
-t "Aww... I'm right here." --emo '{"Tenderness":0.9}' --speed 0.75 -o comfort.wav
# With specific backend and voice
bash skills/characteristic-voice/scripts/speak.sh \
--preset morning -t "Good morning~" --voice-id voice_abc --backend noiz -o morning.mp3 --format mp3
Run bash skills/characteristic-voice/scripts/speak.sh --help for all options.