plugins/specweave/skills/image/SKILL.md
Generate and edit images using AI. Powered by Nano Banana Pro (Google Gemini image models) with Pollinations.ai and Imagen 4 fallback. Supports text-to-image, image editing, aspect ratios, 2K/4K, and batch generation. Use when generating images, creating visuals, AI art, text-to-image, image generation, create picture, make illustration, generate photo, nano banana, edit image, batch images.
npx skillsauth add anton-abyzov/specweave plugins/specweave/skills/image
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate and edit images from text prompts using AI. Powered by Nano Banana Pro (Google Gemini image models — gemini-3.1-flash-image-preview, gemini-2.5-flash-image, gemini-3-pro-image-preview), with Pollinations.ai and Imagen 4 as fallbacks.
Note: If the `nanobanana` skill is available in your session, this skill delegates to it automatically. Otherwise, it uses its own built-in Nano Banana implementation as a fallback.
Tier 1: Gemini Native (FREE) ─── gemini-3.1-flash-image-preview (Nano Banana 2) ──┐
↓ on error │
gemini-2.5-flash-image ─────────────────────────────────────────────────────┤
↓ on error │
gemini-3-pro-image-preview (Nano Banana Pro) ───────────────────────────────┤
↓ on error │
Tier 2: Pollinations.ai (FREE, no key) ─────────────────────────────────────────────┤
↓ on error │
Tier 3: Imagen 4 (PAID, billing required) ──────────────────────────────────────────┘
High-quality mode (`--hq` or "high quality" in prompt)
Tier 1: Imagen 4 (PAID, ~$0.04/image) ────────────────────────┐
↓ on error │
Tier 2: gemini-3-pro-image-preview (Nano Banana Pro) ──────────┤
↓ on error │
Tier 3: gemini-3.1-flash-image-preview (Nano Banana 2) ────────┤
↓ on error │
Tier 4: Pollinations.ai (FREE) ────────────────────────────────┘
| Feature | Nano Banana 2 | Nano Banana Pro | Pollinations | Imagen 4 |
|---------|:---:|:---:|:---:|:---:|
| Text-to-image | ✓ | ✓ | ✓ | ✓ |
| Image editing | ✓ | ✓ | — | — |
| Aspect ratios | ✓ | ✓ | ✓ | ✓ |
| 2K/4K output | ✓ | ✓ | — | ✓ |
| Search grounding | ✓ | ✓ | — | — |
| Batch generation | ✓ | ✓ | ✓ | ✓ |
Aspect ratios supported: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Before running the built-in fallback chain, check if the nanobanana skill is available in the current session. If it is, delegate the entire request to it — it provides a richer, dedicated Gemini image generation experience.
How to check: Look for nanobanana in the available skills list (system-reminder). If present:
- Invoke `Skill({ skill: "nanobanana" })` with the user's original prompt.
- The `nanobanana` skill handles everything (generation, editing, batch, aspect ratios, 2K/4K).

When to skip and use the built-in chain instead:
- The `nanobanana` skill is NOT listed in available skills.

If `nanobanana` is not available, proceed with the built-in chain below.
Extract from the user's prompt:
- Quality: standard or high. Detect from: "high quality", "hq", "best quality", "maximum quality", "premium".
- Resolution: standard, 2K, or 4K. Detect from: "2K", "4K", "high-res", "high resolution".
- Aspect ratio: 1:1 (default), portrait→9:16, landscape→16:9, widescreen→21:9, vertical→9:16.
- Output directory (`./generated-media/`)

If the user mentions "nano banana", they mean this built-in capability. Explain the available options and proceed.
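The detection rules above can be sketched as three small shell helpers. This is only an illustration: the function names are hypothetical, and treating "high-res" / "high resolution" as 2K is an assumption, since the text does not say which resolution those phrases select.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the skill itself.

# Print "high" when the prompt contains a high-quality trigger phrase, else "standard".
detect_quality() {
  if printf '%s' "$1" | grep -qiwE 'high quality|hq|best quality|maximum quality|premium'; then
    echo "high"
  else
    echo "standard"
  fi
}

# Print "4K", "2K", or "standard".
# ASSUMPTION: "high-res" and "high resolution" are mapped to 2K.
detect_resolution() {
  if printf '%s' "$1" | grep -qiw '4k'; then
    echo "4K"
  elif printf '%s' "$1" | grep -qiwE '2k|high-res|high resolution'; then
    echo "2K"
  else
    echo "standard"
  fi
}

# Print an explicit ratio if the prompt names a supported one,
# otherwise map layout keywords, otherwise default to 1:1.
detect_aspect_ratio() {
  local explicit
  explicit=$(printf '%s' "$1" \
    | grep -owE '21:9|16:9|9:16|1:1|2:3|3:2|3:4|4:3|4:5|5:4' \
    | head -n 1) || true
  if [ -n "$explicit" ]; then
    echo "$explicit"
    return 0
  fi
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *widescreen*)          echo "21:9" ;;
    *portrait*|*vertical*) echo "9:16" ;;
    *landscape*)           echo "16:9" ;;
    *)                     echo "1:1" ;;
  esac
}
```

An explicit ratio in the prompt wins over a keyword, so "portrait photo, 4:5" resolves to 4:5 rather than 9:16.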
Quality modes:
Resolution modes:
- 2K: append ", ultra detailed, 2048px quality" to the prompt
- 4K: append ", maximum quality 4K ultra detailed, sharp text rendering, 3840px" to the prompt

mkdir -p ./generated-media
# Source .env if it exists
if [ -f .env ]; then
export $(grep -E '^GEMINI_API_KEY=' .env | xargs)
fi
# Check parent dirs (monorepo support)
if [ -z "$GEMINI_API_KEY" ] && [ -f ../.env ]; then
export $(grep -E '^GEMINI_API_KEY=' ../.env | xargs)
fi
# Also load POLLINATIONS_API_KEY if available
if [ -f .env ]; then
export $(grep -E '^POLLINATIONS_API_KEY=' .env | xargs 2>/dev/null) 2>/dev/null || true
fi
TIMESTAMP=$(date +%s)
PROMPT="YOUR_PROMPT_HERE" # The full prompt (with resolution suffix if 2K/4K)
ASPECT_RATIO="1:1" # Set from user request (default: 1:1)
OUTFILE="generated-media/image-${TIMESTAMP}.png"
TMPFILE="/tmp/gemini-img-response-${TIMESTAMP}.json"
INPUT_IMAGE="" # Path to input image (for editing), or empty
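The `export $(grep ... | xargs)` pattern used for loading keys prints every exported variable when the key is missing, and it mishandles values that contain spaces. A more defensive loader might look like the sketch below. `load_env_key` is a hypothetical helper name, and it assumes a simple `KEY=value` line format with optional surrounding quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: load_env_key KEY FILE
# Exports KEY from FILE when the file defines it; returns 1 otherwise.
# ASSUMPTION: lines look like KEY=value, optionally wrapped in one pair of quotes.
load_env_key() {
  local key="$1" file="$2" line value
  [ -f "$file" ] || return 1
  # Take the last definition if the key appears more than once.
  line=$(grep -E "^${key}=" "$file" | tail -n 1) || true
  [ -n "$line" ] || return 1
  value="${line#*=}"
  # Strip one pair of surrounding double or single quotes, if present.
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  export "$key=$value"
}

# Same lookup order as above: current directory first, then the parent (monorepo).
if [ -z "${GEMINI_API_KEY:-}" ]; then
  load_env_key GEMINI_API_KEY .env || load_env_key GEMINI_API_KEY ../.env || true
fi
load_env_key POLLINATIONS_API_KEY .env || true
```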
If editing an image: Encode input image as base64:
if [ -n "$INPUT_IMAGE" ]; then
INPUT_B64=$(base64 -i "$INPUT_IMAGE" | tr -d '\n')
INPUT_MIME=$(file -b --mime-type "$INPUT_IMAGE")
fi
IMPORTANT: Try each provider in order. On ANY error (quota, billing, network), move to next tier. Write API responses to temp files to avoid JSON parsing issues with large base64 payloads.
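If high-quality mode: start with Imagen 4 (Tier 3 of the default chain), then fall back to gemini-3-pro-image-preview, gemini-3.1-flash-image-preview, and finally Pollinations.ai.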
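The "try each provider in order, move on at any error" rule can be expressed as a small generic runner. The sketch below assumes each tier is wrapped in its own shell function that returns 0 on success; the function names (`gemini_flash`, `gemini_25_flash`, `gemini_pro`, `pollinations`, `imagen4`) are hypothetical placeholders for the tier code, not names the skill defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: try_in_order FUNC...
# Runs each provider function until one returns 0, prints the winner's name
# on stdout, and returns 1 if every provider fails.
try_in_order() {
  local provider
  for provider in "$@"; do
    echo "Trying $provider..." >&2
    if "$provider"; then
      echo "$provider"
      return 0
    fi
    echo "$provider failed, trying next..." >&2
  done
  return 1
}

# Print the provider order for a quality mode ("standard" or "high"),
# following the two chains described in this document.
provider_order() {
  if [ "$1" = "high" ]; then
    echo "imagen4 gemini_pro gemini_flash pollinations"
  else
    echo "gemini_flash gemini_25_flash gemini_pro pollinations imagen4"
  fi
}

# Usage, once the provider functions exist:
#   WINNER=$(try_in_order $(provider_order "$QUALITY")) || echo "ERROR: All providers failed."
```

Keeping the order in one place means high-quality mode only changes the list, not the per-tier code.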
Models (try in order): gemini-3.1-flash-image-preview, gemini-2.5-flash-image, gemini-3-pro-image-preview
if [ -n "$GEMINI_API_KEY" ]; then
for MODEL in "gemini-3.1-flash-image-preview" "gemini-2.5-flash-image" "gemini-3-pro-image-preview"; do
echo "Trying $MODEL (Nano Banana)..."
# Build request JSON — with or without input image
if [ -n "$INPUT_IMAGE" ]; then
# Image editing mode: pass both image and text
REQUEST_JSON="{
\"contents\": [{
\"parts\": [
{\"inlineData\": {\"mimeType\": \"${INPUT_MIME}\", \"data\": \"${INPUT_B64}\"}},
{\"text\": \"${PROMPT}\"}
]
}],
\"generationConfig\": {
\"responseModalities\": [\"TEXT\", \"IMAGE\"],
\"aspectRatio\": \"${ASPECT_RATIO}\"
}
}"
else
# Text-to-image mode
REQUEST_JSON="{
\"contents\": [{
\"parts\": [{\"text\": \"${PROMPT}\"}]
}],
\"generationConfig\": {
\"responseModalities\": [\"TEXT\", \"IMAGE\"],
\"aspectRatio\": \"${ASPECT_RATIO}\"
}
}"
fi
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-o "$TMPFILE" \
-d "$REQUEST_JSON"
if python3 -c "
import json, sys, base64
with open('$TMPFILE') as f:
    data = json.load(f)
if 'error' in data:
    print(f'Error: {data[\"error\"][\"message\"][:200]}', file=sys.stderr)
    sys.exit(1)
# Save the first inline image found in any candidate
for candidate in data.get('candidates', []):
    for part in candidate.get('content', {}).get('parts', []):
        if 'inlineData' in part:
            img_bytes = base64.b64decode(part['inlineData']['data'])
            with open('$OUTFILE', 'wb') as f:
                f.write(img_bytes)
            print(f'Saved: $OUTFILE')
            sys.exit(0)
print('No image in response', file=sys.stderr)
sys.exit(1)
" 2>/dev/null; then
echo "Generated with $MODEL (Nano Banana, free)"
rm -f "$TMPFILE"
# Only one loop encloses this point, so a plain break is enough
break
fi
echo "$MODEL failed, trying next..."
done
fi
If Tier 1 fails, continue to Tier 2.
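Interpolating `${PROMPT}` directly into a JSON string breaks as soon as the prompt contains a double quote, backslash, or newline. One way around this is to let Python's `json` module build the body. The sketch below mirrors the request shape used in the Tier 1 code; `build_gemini_request` is a hypothetical helper name, and it takes the image path rather than the base64 string so that large images are not passed through command-line arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build_gemini_request PROMPT ASPECT_RATIO [IMAGE_PATH IMAGE_MIME]
# Prints a JSON request body with the same fields as the Tier 1 request.
build_gemini_request() {
  python3 - "$@" <<'PY'
import base64
import json
import sys

prompt, aspect_ratio = sys.argv[1], sys.argv[2]
parts = []
# Image editing mode: the input image goes first, then the instruction text.
if len(sys.argv) > 4 and sys.argv[3]:
    with open(sys.argv[3], "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    parts.append({"inlineData": {"mimeType": sys.argv[4], "data": encoded}})
parts.append({"text": prompt})

print(json.dumps({
    "contents": [{"parts": parts}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "aspectRatio": aspect_ratio,
    },
}))
PY
}

# Usage: write the body to a file and send it with curl's --data-binary @file,
# which also avoids putting a large base64 payload on the command line.
#   build_gemini_request "$PROMPT" "$ASPECT_RATIO" "$INPUT_IMAGE" "$INPUT_MIME" > /tmp/request.json
```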
Free models: flux (best), gptimage, turbo
if [ ! -f "$OUTFILE" ] || [ ! -s "$OUTFILE" ]; then
echo "Trying Pollinations.ai..."
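# Pass the prompt as an argument so quotes or backslashes in it cannot break the Python snippet;
# safe='' also percent-encodes "/" so the prompt stays a single URL path segment
ENCODED_PROMPT=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=''))" "$PROMPT")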
POLL_MODEL="flux"
POLL_OK=false
# Determine width/height from aspect ratio
case "$ASPECT_RATIO" in
"16:9") POLL_W=1344; POLL_H=768 ;;
"9:16") POLL_W=768; POLL_H=1344 ;;
"4:3") POLL_W=1024; POLL_H=768 ;;
"3:4") POLL_W=768; POLL_H=1024 ;;
"21:9") POLL_W=1512; POLL_H=648 ;;
"1:1"|*) POLL_W=1024; POLL_H=1024 ;;
esac
# Try authenticated endpoint first
if [ -n "${POLLINATIONS_API_KEY:-}" ]; then
curl -s -L --max-time 120 \
-H "Authorization: Bearer $POLLINATIONS_API_KEY" \
-o "$OUTFILE" \
"https://gen.pollinations.ai/image/${ENCODED_PROMPT}?model=${POLL_MODEL}&width=${POLL_W}&height=${POLL_H}&nologo=true"
if [ -f "$OUTFILE" ] && [ -s "$OUTFILE" ]; then
FILETYPE=$(file -b "$OUTFILE" | head -1)
echo "$FILETYPE" | grep -qiE "image|PNG|JPEG|GIF|WebP" && POLL_OK=true || rm -f "$OUTFILE"
fi
fi
# Fall back to anonymous endpoint
if [ "$POLL_OK" != "true" ]; then
curl -s -L --max-time 120 \
-o "$OUTFILE" \
"https://image.pollinations.ai/prompt/${ENCODED_PROMPT}?model=${POLL_MODEL}&width=${POLL_W}&height=${POLL_H}&nologo=true"
if [ -f "$OUTFILE" ] && [ -s "$OUTFILE" ]; then
FILETYPE=$(file -b "$OUTFILE" | head -1)
echo "$FILETYPE" | grep -qiE "image|PNG|JPEG|GIF|WebP" && POLL_OK=true || { echo "Pollinations returned non-image"; rm -f "$OUTFILE"; }
fi
fi
[ "$POLL_OK" = "true" ] && echo "Generated with Pollinations.ai (free)"
fi
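The `case` statement in the Pollinations tier covers six ratios; the other supported ratios (2:3, 3:2, 4:5, 5:4) fall through to 1024×1024. A computed fallback could look like the sketch below. `dims_for_ratio` is a hypothetical helper, the 1024-pixel long side is an arbitrary choice, and rounding to a multiple of 8 is a conservative assumption rather than a documented Pollinations requirement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dims_for_ratio W:H [LONG_SIDE]
# Prints "WIDTH HEIGHT" with the longer side fixed at LONG_SIDE (default 1024)
# and the shorter side rounded down to a multiple of 8.
dims_for_ratio() {
  local ratio="$1" long_side="${2:-1024}" rw rh
  rw="${ratio%%:*}"   # part before the colon
  rh="${ratio##*:}"   # part after the colon
  if [ "$rw" -ge "$rh" ]; then
    echo "$long_side $(( long_side * rh / rw / 8 * 8 ))"
  else
    echo "$(( long_side * rw / rh / 8 * 8 )) $long_side"
  fi
}

# Usage:
#   read -r POLL_W POLL_H <<< "$(dims_for_ratio "$ASPECT_RATIO")"
```

For 2:3 this yields 680×1024, and for 4:5 it yields 816×1024.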
if [ ! -f "$OUTFILE" ] || [ ! -s "$OUTFILE" ]; then
if [ -n "$GEMINI_API_KEY" ]; then
# Escape the dollar sign; inside double quotes an unescaped $0 expands to the script name
echo "Trying Imagen 4 (paid ~\$0.04)..."
IMAGEN_MODEL="imagen-4.0-generate-001"
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/${IMAGEN_MODEL}:predict" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-o "$TMPFILE" \
-d "{
\"instances\": [{\"prompt\": \"${PROMPT}\"}],
\"parameters\": {
\"sampleCount\": 1,
\"aspectRatio\": \"${ASPECT_RATIO}\"
}
}"
python3 -c "
import json, sys, base64
with open('$TMPFILE') as f:
    data = json.load(f)
if 'predictions' in data:
    img = base64.b64decode(data['predictions'][0]['bytesBase64Encoded'])
    with open('$OUTFILE', 'wb') as f:
        f.write(img)
    print('Saved: $OUTFILE')
elif 'error' in data:
    print(f'Imagen error: {data[\"error\"][\"message\"][:200]}', file=sys.stderr)
    sys.exit(1)
else:
    # Neither predictions nor an error: fail so no success message is printed
    print('No image in response', file=sys.stderr)
    sys.exit(1)
" 2>/dev/null && echo "Generated with Imagen 4 (paid)"
rm -f "$TMPFILE"
fi
fi
When user requests high quality, reverse order: Imagen 4 → Gemini Pro → Gemini Flash → Pollinations. Before starting, inform the user:
"High-quality mode — trying Imagen 4 first (~$0.04/image).
If billing isn't enabled, falling back to Gemini Pro (free, Nano Banana Pro quality)."
If user requests multiple images, loop the generation:
COUNT=3 # from user request
for i in $(seq 1 $COUNT); do
OUTFILE="generated-media/image-$(date +%s)-${i}.png"
TMPFILE="/tmp/gemini-img-${RANDOM}.json"
# ... run Tier 1 → Tier 2 → Tier 3 chain for each image ...
sleep 2 # Avoid rate limits between generations
done
Inform user: "Generating $COUNT images..." and show each file as it completes.
if [ -f "$OUTFILE" ] && [ -s "$OUTFILE" ]; then
file "$OUTFILE"
SIZE=$(du -h "$OUTFILE" | cut -f1)
echo "Image generated: $OUTFILE ($SIZE)"
else
echo "ERROR: All providers failed."
echo " - Gemini: Daily quota exceeded (resets at midnight PT)"
echo " - Pollinations: Service temporarily down"
echo " - Imagen 4: Billing not enabled"
echo ""
echo "Fix: Set GEMINI_API_KEY in .env or enable billing at https://aistudio.google.com/"
fi
Tell the user:
When user provides an input image + edit instruction:
"Change the background to blue" [attached: photo.jpg]
"Remove the logo from this image" [path: ./logo-image.png]
"Make this look like a watercolor painting" [input image provided]
Supported edits:
Note: Image editing requires Gemini (Nano Banana) — Pollinations and Imagen 4 tiers do not support editing.
| Ratio | Use Case |
|-------|----------|
| 1:1 | Square, social profile, Instagram post (default) |
| 16:9 | Landscape, YouTube thumbnail, desktop wallpaper |
| 9:16 | Portrait, Instagram Story, TikTok, mobile |
| 4:3 | Standard photo, presentation slide |
| 3:4 | Portrait photo, Pinterest |
| 21:9 | Cinematic, ultra-wide, banner |
| 4:5 | Instagram feed portrait |
| 2:3 | Print portrait |
| Error | Action |
|-------|--------|
| Gemini quota exceeded | Auto-fallback to Pollinations, then Imagen 4 |
| Pollinations 502/timeout | Auto-fallback to Imagen 4 |
| Imagen billing not enabled | Report all failed, suggest enabling billing |
| GEMINI_API_KEY not set | Skip Gemini tiers, use Pollinations only |
| Content policy block | Report prompt blocked, suggest rewording |
| No image in response | Try next model in the chain |
| All providers fail | Show diagnostic with links |
If GEMINI_API_KEY is not set:
Using Pollinations.ai only (free, aspect-ratio limited, may be unreliable).
For full Nano Banana Pro capabilities (image editing, 2K/4K, all aspect ratios):
- Go to https://aistudio.google.com/
- Click "Get API key" → Create API key
- Add to your `.env`: `GEMINI_API_KEY=your-key-here`

The free tier includes `gemini-3.1-flash-image-preview` (Nano Banana 2) and `gemini-3-pro-image-preview` (Nano Banana Pro) with daily quota.
"Nano Banana" is the nickname for Google's Gemini image generation models:
- `gemini-3.1-flash-image-preview`: fast, great quality, default
- `gemini-3-pro-image-preview`: highest quality, slower
- `gemini-2.5-flash-image`: alternative fast model

All three use the same `generateContent` API endpoint and are FREE with a daily quota. No separate skill install needed; this skill includes everything.
generate image, create image, make image, AI image, text-to-image, image generation, create picture, make illustration, generate photo, AI art, create visual, generate artwork, make a picture, nano banana, edit image, edit photo, image editing, batch images, batch generate, 2K image, 4K image, high resolution image, widescreen image, portrait image, square image, cinematic image