skills/content-writing/prompt-library-image-audio-video/SKILL.md
---
name: prompt-library-image-audio-video
description: "Provides structured prompt frameworks and quality standards for AI-generated image, audio, video, and music content, ensuring all non-text AI assets meet the Golden Rule: output that looks, sounds, and feels like it was produced by the best skilled human creatives. Invoke when generating any AI-powered visual, audio, or video content for a client campaign, thought leadership asset, personalised outreach sequence, or brand content programme."
---
- Follow this SKILL.md; do not skip mandatory steps or required fields.
- If a references/ directory is added later, treat its files as the deeper source material and keep this SKILL.md execution-focused.

Ask for the following before generating any AI media prompts:
Every AI-generated asset — image, audio, or video — must meet the same standard: it must look, sound, and feel as though it was produced by a skilled human creative. The failure signatures differ by medium:
Each medium has its own prompt structure to prevent these failures.
See image-prompt-engineer for the full Eight-Layer Prompt Anatomy, negative prompt library, platform technical parameters, and cultural accuracy review protocol.
Summary for this skill:
Cross-reference image-prompt-engineer for full platform syntax (Midjourney, DALL-E 3, Stable Diffusion, Flux, Adobe Firefly).
Tools: ElevenLabs, Murf, Resemble AI, Descript, NotebookLM (podcast-style audio)
Four factors cause AI audio to sound robotic. Address each before generating:
1. Sentence length. AI voices stumble on sentences over 25 words. Break long sentences before entering text into a TTS tool.
2. Punctuation as prosody. Commas create micro-pauses; em dashes create dramatic pauses; ellipses suggest hesitation. Use punctuation deliberately to shape the spoken rhythm of the script — not just for grammar.
3. Consonant clustering. Consecutive words starting with the same consonant produce robotic, over-emphasised stress. Rewrite alliterative sequences before generating.
4. Speaking style selection. Choose a speaking style explicitly in platform settings — "conversational," "authoritative," "warm and friendly" — never default to the neutral tone.
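The first and third checks above lend themselves to a quick automated pass before a script goes into a TTS tool. The following is a minimal sketch, not part of any TTS platform: the function names are hypothetical, the sentence splitter is deliberately naive, and the run length of three alliterative words is an assumed threshold (the 25-word limit comes from rule 1).

```python
import re

MAX_WORDS = 25  # limit taken from rule 1 above


def split_sentences(script: str) -> list[str]:
    """Naive split on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def long_sentences(script: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences that contain more than max_words words."""
    return [s for s in split_sentences(script) if len(s.split()) > max_words]


def alliterative_runs(script: str, min_run: int = 3) -> list[list[str]]:
    """Return runs of consecutive words that share an initial consonant.

    min_run is an assumed threshold, not a figure from the guidance above.
    """
    words = re.findall(r"[A-Za-z][A-Za-z']*", script)
    runs: list[list[str]] = []
    current: list[str] = []
    for word in words:
        first = word[0].lower()
        same_consonant = (
            first not in "aeiou"
            and bool(current)
            and current[-1][0].lower() == first
        )
        if same_consonant:
            current.append(word)
            continue
        # The run ends here: keep it if it is long enough, then start again.
        if len(current) >= min_run:
            runs.append(current)
        current = [] if first in "aeiou" else [word]
    if len(current) >= min_run:
        runs.append(current)
    return runs
```

Anything either function returns is a candidate for rewriting. Reading the script aloud remains the final check, since a script can pass both functions and still sound stiff.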
Script for [platform: LinkedIn video / Instagram Reel / explainer video / podcast]
Duration target: [X seconds / X minutes]
Speaking style: [conversational and warm / authoritative and clear / enthusiastic and energetic]
Audience: [describe the target listener — role, location, primary concern]
Core message: [one sentence — the single thing the listener must remember]
CTA at end: [specific instruction — WhatsApp number, website URL, or next step]
---
[Script text — write in spoken register, not written register.
Short sentences. Active voice. No jargon. One idea per sentence.]
Spoken register rules: Write "you can" not "one may." Write "let's" not "let us." Write "here's what that means" not "the following section describes." Read the script aloud before generating — if it sounds unnatural when spoken, rewrite it.
Upload source documents (a strategy report, blog post series, or research summary); NotebookLM generates a conversational two-host podcast episode.
Best use: internal knowledge-sharing, client education, thought leadership audio.
Important: Always review and edit the generated transcript before producing the final audio. The AI hosts occasionally add incorrect context, misattribute sources, or insert inaccurate statistics. The transcript is the editorial record — treat it as a draft.
Tools: HeyGen, Synthesia, D-ID, Runway, Pika Labs
[Opening — 5 seconds]
Hook. One sentence that names the viewer's problem or goal.
Example: "If you're losing customers to competitors and you don't know why, this is for you."
[Body — 60–90 seconds]
Three key points. One sentence per point. Transition word between each.
Point 1: [specific insight or fact]
Transition: "And there's more —" / "Here's why that matters —" / "But here's the part most people miss:"
Point 2: [specific insight or fact]
Transition: [as above]
Point 3: [specific insight or fact]
[CTA — 10 seconds]
One specific instruction. Include the contact method.
Example: "Send us a WhatsApp message on [number] today — we'll respond within the hour."
Total word count: under 250 words for a 90-second video.
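The word limit above implies a speaking pace of roughly 250 words per 90 seconds, which can be turned into a rough budget for other durations. This is a sketch under that single assumption: real avatar and TTS pacing varies by voice and platform, and the helper names are hypothetical.

```python
MAX_WORDS = 250         # from the guideline above
REFERENCE_SECONDS = 90  # duration that the guideline refers to


def word_count(script: str) -> int:
    """Count whitespace-separated words in the script."""
    return len(script.split())


def estimated_seconds(script: str) -> float:
    """Estimate spoken duration at the pace implied by 250 words per 90 seconds."""
    return word_count(script) * REFERENCE_SECONDS / MAX_WORDS


def word_budget(target_seconds: int) -> int:
    """Largest whole number of words that fits target_seconds at the same pace."""
    return target_seconds * MAX_WORDS // REFERENCE_SECONDS
```

For example, `word_budget(30)` gives a ceiling of 83 words for a 30-second cut. Treat the result as an upper bound rather than a target.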
Personalised video reportedly achieves 75% open rates and 40% response rates in B2B outreach (Roth and neuroflash Team, 2024/2025). Use HeyGen or Tavus for campaign-scale personalised video.
Prompt structure: Write one master script with placeholder variables:
- {{first_name}}: recipient's first name
- {{company_name}}: recipient's company
- {{specific_detail}}: one personalised observation about their business

The AI renders a unique video per recipient with the avatar appearing to speak directly to them. Review the master script carefully, because all placeholder replacements are machine-generated at scale.
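Because replacements happen at scale, it can help to render each script locally and catch empty fields before anything is sent to a video platform. The sketch below is a pre-flight check only. It does not call HeyGen, Tavus, or any other platform API, and `render_script` is a hypothetical helper name.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_script(master: str, recipient: dict[str, str]) -> str:
    """Fill every {{name}} placeholder; raise if a value is missing or blank."""
    missing = sorted(
        {
            name
            for name in PLACEHOLDER.findall(master)
            if not recipient.get(name, "").strip()
        }
    )
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: recipient[m.group(1)].strip(), master)
```

Running this over the whole recipient list first means a blank `specific_detail` fails loudly instead of producing a video with a gap in the sentence.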
Tools: Suno AI, Udio, Soundraw, AIVA, Beatoven.ai
Style: [genre: warm acoustic / energetic Afrobeats / calm ambient / professional corporate]
Mood: [emotional register: motivating / relaxing / celebratory / trustworthy / urgent]
Tempo: [slow / medium / upbeat]
Instruments: [specify if required: piano and strings / guitar and percussion / synths only]
Duration: [X seconds]
Use: [social media background / explainer video / podcast intro / presentation background]
Ching and Mothi (2025) document that AI-generated music — even when technically proficient — frequently lacks the emotional depth and cultural resonance of human-composed music. Suno AI tracks were found to be technically acceptable but thin in emotional weight.
The test: after generating a track, ask a human reviewer with musical knowledge — "Does this music feel right for this moment, or is it merely technically acceptable?" If the answer is the latter, regenerate with a more specific mood and instrument brief, or commission a human musician.
All AI-generated music for client use must pass this review before deployment.
Under the transparency obligations of the EU AI Act (Article 50) and emerging global standards, AI-generated audio and video must be disclosed where the content is presented as a real person or used commercially.
Apply the following to every AI media asset:
| Asset type | Disclosure requirement |
|---|---|
| AI voice-over | Disclose in the production record; inform the client |
| AI avatar video (brand character) | No disclosure required if clearly a brand avatar, not a real person |
| AI avatar video (presented as real person) | Disclose to the end audience |
| AI-generated music (commercial distribution) | Disclose in metadata |
| AI-generated images (commercial use) | Confirm Adobe Firefly or equivalent for licensing-safe output |
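If the disclosure rules are checked in a release script, one option is to encode the table as a lookup that fails on unknown asset types. This is an illustrative sketch: the dictionary keys are hypothetical identifiers, and the values are copied from the table.

```python
# Keys are hypothetical identifiers; values mirror the disclosure table.
DISCLOSURE_RULES = {
    "ai_voice_over": "Disclose in the production record; inform the client",
    "ai_avatar_brand_character": "No disclosure required if clearly a brand avatar, not a real person",
    "ai_avatar_real_person": "Disclose to the end audience",
    "ai_music_commercial": "Disclose in metadata",
    "ai_image_commercial": "Confirm Adobe Firefly or equivalent for licensing-safe output",
}


def disclosure_for(asset_type: str) -> str:
    """Return the disclosure requirement, refusing asset types not in the table."""
    try:
        return DISCLOSURE_RULES[asset_type]
    except KeyError:
        raise ValueError(
            f"unknown asset type {asset_type!r}; add it to the table before release"
        ) from None
```

Failing on unknown types is a deliberate choice here, so that a new kind of asset cannot ship without someone deciding its disclosure rule.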
SynthID (Google DeepMind) is a widely cited watermarking technology for AI-generated content, including audio, images, and video. Where the generation tool supports it, apply SynthID or an equivalent watermark to all AI audio produced for commercial client distribution.
Maintain consistency across all AI-generated assets in a single campaign:
Note the production record fields: Asset ID | Tool | Prompt or Script | Seed/Voice ID | Style Settings | Date Generated | Human Reviewer | Approved (Y/N).
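The production record fields map naturally onto a small data structure that can be exported to a spreadsheet. The following is a minimal sketch, assuming a CSV export is acceptable. The class name, the snake_case field names, and the `to_csv` helper are hypothetical, and the order of the fields follows the list above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class ProductionRecord:
    asset_id: str
    tool: str
    prompt_or_script: str
    seed_or_voice_id: str
    style_settings: str
    date_generated: date
    human_reviewer: str
    approved: bool


def to_csv(records: list[ProductionRecord]) -> str:
    """Serialise records to CSV text, one row per asset, with a header row."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(ProductionRecord)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["date_generated"] = record.date_generated.isoformat()
        row["approved"] = "Y" if record.approved else "N"  # matches Approved (Y/N)
        writer.writerow(row)
    return buffer.getvalue()
```

Recording the seed or voice ID alongside the style settings is what makes an asset reproducible later in the same campaign.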
Good output from this skill meets all of the following standards: