skills/integrations/google/google-gemini-image/SKILL.md
Generate, edit, and refine images with Google Gemini. Load when user says 'generate image', 'edit image', 'refine image', 'text to image', 'gemini image', 'modify image'.
npx skillsauth add beam-ai-team/beam-next-skills google-gemini-image
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Unified skill for text-to-image generation, image editing, and iterative refinement.
Replaces: gemini-generate-image, gemini-edit-image, gemini-refine-image.
# Generate from text prompt
uv run python scripts/gemini_image.py generate "a cat in space"
uv run python scripts/gemini_image.py generate "sunset over mountains" --aspect 16:9 --size 2K
uv run python scripts/gemini_image.py generate "abstract art" --output my_art.png
# Edit an existing image
uv run python scripts/gemini_image.py edit photo.png "make the sky blue"
uv run python scripts/gemini_image.py edit scene.jpg "add clouds" --aspect 4:3 --size 1K --output result.png
# Refine the last generated/edited image
uv run python scripts/gemini_image.py refine "add more stars"
uv run python scripts/gemini_image.py refine "make it brighter" --image specific.png
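The refine command works on the last generated or edited image unless --image names a file. How the script tracks "last" is not stated here; the sketch below assumes it means the most recently modified PNG in the output directory. The function name latest_image and its parameters are hypothetical, not taken from scripts/gemini_image.py.

```python
from pathlib import Path
from typing import Optional


def latest_image(directory: Path, explicit: Optional[str] = None) -> Optional[Path]:
    """Pick the image that a refine step would operate on.

    Assumption: "last" means the newest PNG by modification time.
    An explicit path (as with --image) always wins.
    """
    if explicit:
        return Path(explicit)
    candidates = [p for p in directory.glob("*.png") if p.is_file()]
    # max() with default=None returns None when the directory holds no PNGs.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Returning None for an empty directory leaves the caller free to print a clear "nothing to refine yet" message instead of failing inside the helper.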
All actions use gemini-3.1-flash-image-preview (Nano Banana 2). Inputs are text-only, or image + text for edit/refine.
Legacy gemini-2.0-flash-exp* models are deprecated, and Google has scheduled their shutdown for June 1, 2026. This skill targets 3.1 only.
--aspect: 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9 (default 1:1).
--size: 512, 1K, 2K, 4K (default 1K). Passed to the API as image_config.image_size.
GEMINI_API_KEY in .env (get from https://aistudio.google.com/app/apikey)
pip install google-genai Pillow
Images save as PNG to 04-workspace/generated-images/ (or --output path).
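The allowed --aspect and --size values can be enforced at the command line before any API call is made. The following is a minimal sketch using argparse, not the actual parser in scripts/gemini_image.py. It omits the generate/edit/refine subcommands, the names build_parser and image_config are hypothetical, and the aspect_ratio key is an assumption (only image_size is named above).

```python
import argparse

# Allowed values copied from the option list above.
ASPECTS = ["1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1", "4:3",
           "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"]
SIZES = ["512", "1K", "2K", "4K"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that rejects undocumented --aspect and --size values."""
    parser = argparse.ArgumentParser(prog="gemini_image.py")
    parser.add_argument("prompt", help="text prompt for the image")
    parser.add_argument("--aspect", choices=ASPECTS, default="1:1")
    parser.add_argument("--size", choices=SIZES, default="1K")
    parser.add_argument("--output", default=None, help="optional output path")
    return parser


def image_config(args: argparse.Namespace) -> dict:
    """Shape the options as a plain dict.

    "image_size" follows the note above; "aspect_ratio" is an assumed key name.
    """
    return {"aspect_ratio": args.aspect, "image_size": args.size}
```

With choices set, argparse exits with a usage error for a value such as --size 8K, so an invalid option never reaches the request.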
Filenames auto-generated with timestamp: generated_20260323_143000.png, edited_..., refined_....
Shared path helpers live in scripts/gemini_client.py.
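The timestamped naming scheme can be sketched as a small path helper. This is an illustration under stated assumptions rather than the code in scripts/gemini_client.py: the function name default_output_path, the PREFIXES mapping, and the now parameter (added so the result is reproducible) are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Default directory and filename prefixes as described above.
OUTPUT_DIR = Path("04-workspace/generated-images")
PREFIXES = {"generate": "generated", "edit": "edited", "refine": "refined"}


def default_output_path(action: str,
                        output: Optional[str] = None,
                        now: Optional[datetime] = None) -> Path:
    """Return the --output path if given, else a timestamped PNG path."""
    if output:
        return Path(output)
    # Format matches the example name generated_20260323_143000.png.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return OUTPUT_DIR / f"{PREFIXES[action]}_{stamp}.png"
```

For example, calling default_output_path("generate", now=datetime(2026, 3, 23, 14, 30, 0)) yields a path ending in generated_20260323_143000.png, matching the filename shown above.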