docs/zh-CN/skills/fal-ai-media/SKILL.md
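Unified media generation through the fal.ai MCP: images, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, video, or audio with AI.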
Generate images, video, and audio with fal.ai models via MCP.
The fal.ai MCP server must be configured. Add it to ~/.claude.json:
"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}
Get an API key at fal.ai.
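The configuration step above can also be scripted. The following is a minimal sketch, not an official installer: it assumes that MCP servers are registered under a top-level `mcpServers` key in the configuration file, and `add_fal_server` is a hypothetical helper name. It works on an in-memory dictionary, so reading and writing `~/.claude.json` is left to the caller.

```python
import copy
import json


def add_fal_server(config: dict, fal_key: str) -> dict:
    """Return a copy of `config` with the fal-ai MCP server entry added."""
    updated = copy.deepcopy(config)
    # Assumption: MCP servers are listed under a top-level "mcpServers" key.
    servers = updated.setdefault("mcpServers", {})
    servers["fal-ai"] = {
        "command": "npx",
        "args": ["-y", "fal-ai-mcp-server"],
        "env": {"FAL_KEY": fal_key},
    }
    return updated


# Example: merge into a configuration that already lists another server.
existing = {"mcpServers": {"other": {"command": "node"}}}
print(json.dumps(add_fal_server(existing, "YOUR_FAL_KEY_HERE"), indent=2))
```

Returning a copy rather than mutating the input keeps the existing configuration intact until the result is explicitly written back.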
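The fal.ai MCP provides the following tools:

- search: find available models by keyword
- find: get model details and parameters
- generate: run a model with parameters
- result: check the status of an asynchronous generation
- status: check job status
- cancel: cancel a running job
- estimate_cost: estimate generation cost
- models: list popular models
- upload: upload a file to use as input

Nano Banana 2 is best for: fast iteration, drafts, text-to-image, and image editing.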
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)
Nano Banana Pro is best for: production-grade images, photorealism, typography, and detailed prompts.
generate(
  app_id: "fal-ai/nano-banana-pro",
  input_data: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
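| Parameter | Type | Options | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Describe what you want |
| image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio |
| num_images | number | 1-4 | Number of images to generate |
| seed | number | any integer | Reproducibility |
| guidance_scale | number | 1-20 | How closely to follow the prompt (higher values are more literal) |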
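The parameter table above can be turned into a small pre-flight check so that an invalid request is caught before it is sent. This is a sketch under the assumption that the ranges in the table are enforced as written; `build_image_input` is a hypothetical helper, not part of fal.ai or the MCP server.

```python
from typing import Optional

# Allowed values taken from the parameter table.
IMAGE_SIZES = {
    "square",
    "portrait_4_3",
    "landscape_16_9",
    "portrait_16_9",
    "landscape_4_3",
}


def build_image_input(
    prompt: str,
    image_size: str = "square",
    num_images: int = 1,
    seed: Optional[int] = None,
    guidance_scale: Optional[float] = None,
) -> dict:
    """Validate image parameters and return an input_data dictionary."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unsupported image_size: {image_size!r}")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    if guidance_scale is not None and not 1 <= guidance_scale <= 20:
        raise ValueError("guidance_scale must be between 1 and 20")

    input_data = {
        "prompt": prompt,
        "image_size": image_size,
        "num_images": num_images,
    }
    # Optional parameters are only included when the caller sets them.
    if seed is not None:
        input_data["seed"] = seed
    if guidance_scale is not None:
        input_data["guidance_scale"] = guidance_scale
    return input_data
```

The returned dictionary has the same shape as the `input_data` argument used in the `generate` calls in this document.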
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
# First upload the source image
upload(file_path: "/path/to/image.png")
# Then generate with image input
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)
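The upload-then-generate sequence above can be sketched as a single function. This is an illustration only: `upload` and `generate` are passed in as callables standing in for the MCP tools, and their Python signatures and return values here are assumptions, as are the `fake_upload` and `fake_generate` stand-ins and the `example.invalid` URL.

```python
from typing import Callable


def edit_image(
    upload: Callable[[str], str],
    generate: Callable[[str, dict], dict],
    file_path: str,
    prompt: str,
    image_size: str = "landscape_16_9",
) -> dict:
    """Upload a source image, then run an edit that references its URL."""
    # Step 1: the upload must finish first, because generate needs the URL.
    image_url = upload(file_path)
    # Step 2: pass the uploaded URL as the image input.
    input_data = {
        "prompt": prompt,
        "image_url": image_url,
        "image_size": image_size,
    }
    return generate("fal-ai/nano-banana-2", input_data)


# Hypothetical stand-ins so the flow can be exercised without a server.
def fake_upload(file_path: str) -> str:
    return "https://example.invalid/" + file_path.rsplit("/", 1)[-1]


def fake_generate(app_id: str, input_data: dict) -> dict:
    return {"app_id": app_id, "input_data": input_data}


request = edit_image(
    fake_upload,
    fake_generate,
    "/path/to/image.png",
    "same scene but in watercolor style",
)
```

Passing the two tools in as arguments keeps the ordering logic testable without network access.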
Seedance 1.0 Pro is best for: text-to-video and image-to-video with high motion quality.
generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)
Kling Video v3 Pro is best for: text-to-video and image-to-video with native audio generation.
generate(
  app_id: "fal-ai/kling-video/v3/pro",
  input_data: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)
Veo 3 is best for: video with generated sound and high visual quality.
generate(
  app_id: "fal-ai/veo-3",
  input_data: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)
Start from an existing image:
generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)
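| Parameter | Type | Options | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Describe the video content |
| duration | string | "5s", "10s" | Video length |
| aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio |
| seed | number | any integer | Reproducibility |
| image_url | string | URL | Source image for image-to-video |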
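The video parameters above can be assembled the same way for both text-to-video and image-to-video requests. The sketch below assumes the option lists in the table are exhaustive, and treats every field except `prompt` as optional because the examples in this document omit some of them. `build_video_input` is a hypothetical helper name.

```python
from typing import Optional

# Allowed values taken from the parameter table.
DURATIONS = {"5s", "10s"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def build_video_input(
    prompt: str,
    duration: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    seed: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Validate video parameters and return an input_data dictionary."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and duration not in DURATIONS:
        raise ValueError(f"unsupported duration: {duration!r}")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")

    input_data = {"prompt": prompt}
    optional = {
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "seed": seed,
        # Supplying image_url turns the request into image-to-video.
        "image_url": image_url,
    }
    # Only include the fields the caller actually set.
    input_data.update({k: v for k, v in optional.items() if v is not None})
    return input_data
```

Omitting `image_url` yields a text-to-video payload; supplying it yields the image-to-video form shown in the Seedance example.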
CSM-1B provides text-to-speech with natural, conversational voice quality.
generate(
  app_id: "fal-ai/csm-1b",
  input_data: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)
ThinkSound generates audio that matches the content of a video.
generate(
  app_id: "fal-ai/thinksound",
  input_data: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)
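For professional speech synthesis, use ElevenLabs directly:
import os

import requests

# Replace <voice_id> with the ID of the ElevenLabs voice to use
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,
)
# Fail loudly rather than writing an error response into the audio file
resp.raise_for_status()

# The response body is the audio itself
with open("output.mp3", "wb") as f:
    f.write(resp.content)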
If VideoDB is configured, use its generative audio:
# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")
# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)
# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
Before generating, check the estimated cost:
estimate_cost(
  estimate_type: "unit_price",
  endpoints: {
    "fal-ai/nano-banana-pro": {
      "unit_quantity": 1
    }
  }
)
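Once unit prices are known, a budget check before a batch run is simple arithmetic. The sketch below assumes the unit prices have already been read from the `estimate_cost` result into a plain dictionary; the response format of that tool is not described here, and the price used in the example is a made-up placeholder, not a real fal.ai rate.

```python
def total_estimated_cost(unit_prices: dict, quantities: dict) -> float:
    """Sum unit_price * unit_quantity over every endpoint to be called."""
    return sum(unit_prices[endpoint] * qty for endpoint, qty in quantities.items())


def within_budget(unit_prices: dict, quantities: dict, budget: float) -> bool:
    """Return True when the planned generations fit inside the budget."""
    return total_estimated_cost(unit_prices, quantities) <= budget


# Placeholder price for illustration only.
prices = {"fal-ai/nano-banana-pro": 0.25}
plan = {"fal-ai/nano-banana-pro": 4}
ok = within_budget(prices, plan, budget=2.0)
```

An endpoint that appears in the plan but has no known price raises `KeyError`, which is deliberate: a missing price should stop the run rather than count as free.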
Find models for a specific task:
search(query: "text to video")
find(endpoint_ids: ["fal-ai/seedance-1-0-pro"])
models()
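Tips:

- seed: set it to get reproducible results
- estimate_cost: check the estimated cost before generating

Related skills:

- videodb: video processing, editing, and streaming
- video-editing: AI-powered video editing workflows
- content-engine: content creation for social media platforms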