tools/visual-identity/SKILL.md
Build and maintain a persistent visual identity for your agent using Flux Kontext Pro. Use when the user asks the agent to generate selfies, avatars, character art, or any image that should look like the same person across generations.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add letta-ai/skills visual-identity
Build a persistent visual identity that stays consistent across sessions. Supports OpenAI (gpt-image-1) and Flux Kontext Pro.
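Two workflows: visual identity (reference-based generation that preserves the same face across images) and simple one-off image generation.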
The script auto-detects which provider to use based on environment variables:
| Priority | Env var | Provider | Notes |
|----------|---------|----------|-------|
| 1st | OPENAI_API_KEY | OpenAI gpt-image-1 | Recommended. Most users already have this. |
| 2nd | BFL_API_KEY | Flux Kontext Pro | Better face consistency. Requires BFL account. |
You can override with --provider openai or --provider flux.
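The priority order in the table can be sketched as a small function. This is a minimal illustration under stated assumptions, not the logic in generate_image.py: the name `detect_provider` and its `env` and `override` parameters are hypothetical.

```python
import os


def detect_provider(env=None, override=None):
    """Pick a provider using the priority order from the table above.

    Hypothetical helper for illustration; generate_image.py may differ.
    """
    env = os.environ if env is None else env
    if override is not None:
        # Mirrors --provider openai / --provider flux.
        if override not in ("openai", "flux"):
            raise ValueError(f"unknown provider: {override}")
        return override
    if env.get("OPENAI_API_KEY"):  # 1st priority
        return "openai"
    if env.get("BFL_API_KEY"):  # 2nd priority
        return "flux"
    return None  # neither key set: guide the user to create one


# Both keys present: OpenAI wins unless overridden.
both = {"OPENAI_API_KEY": "placeholder", "BFL_API_KEY": "placeholder"}
print(detect_provider(both))  # openai
print(detect_provider(both, override="flux"))  # flux
```

Returning `None` rather than raising keeps the "neither key is set" case easy to handle with a guidance message.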
If neither key is set, guide the user:
- Check whether the user has OPENAI_API_KEY set. If not, direct them to https://platform.openai.com/api-keys
- Never ask the user to paste the full key in chat.
Install if missing (prefer uv):
uv pip install requests Pillow
If uv is unavailable:
pip3 install requests Pillow
Visual identity is the primary workflow. The goal is to establish a reference appearance and then generate new scenes that preserve the same face, bone structure, and features.
Either the user provides a photo, or you generate a base character:
Option A -- User provides a reference photo: The user pastes or specifies an image file. Save it to the persistent identity directory (see "Persisting Visual Identity" below).
Option B -- Generate a base character from text: Use text-to-image to create the initial character. Be very specific about physical features. Example prompt:
A portrait of a young woman with shoulder-length auburn hair, green eyes, light freckles, wearing a black leather jacket. Clean background, studio lighting, 3:4 portrait.
Save the result as the reference image.
Pass the reference image with --reference; it is sent to the API as a base64-encoded input_image:
python3 <path-to-skill>/scripts/generate_image.py edit \
--reference /path/to/canonical.jpg \
--prompt "The same person is sitting at a desk coding late at night, lit by monitor glow" \
--out /tmp/identity_coding.jpg
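For readers calling the API directly, base64-encoding the reference image might look like the sketch below. The function name `build_edit_payload` is hypothetical, and the field names `prompt` and `input_image` are assumptions taken from the wording above; check references/api.md for the actual request shape.

```python
import base64
import tempfile
from pathlib import Path


def build_edit_payload(reference_path, prompt):
    """Return a request body with the reference image base64-encoded.

    Assumption: the body uses "prompt" and "input_image" fields.
    """
    image_bytes = Path(reference_path).read_bytes()
    return {
        "prompt": prompt,
        "input_image": base64.b64encode(image_bytes).decode("ascii"),
    }


# Demo with placeholder bytes instead of a real photo.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "canonical.jpg"
    ref.write_bytes(b"\xff\xd8placeholder-bytes")
    payload = build_edit_payload(ref, "The same person is sitting at a desk coding late at night")

print(sorted(payload))  # ['input_image', 'prompt']
```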
Always include an identity-anchoring phrase in every prompt that uses a reference. This tells the model to preserve facial features:
Keep his/her exact face, bone structure, eye color, and hair.
Or more naturally woven into the prompt:
The same man is relaxing on a tropical beach at sunset, wearing a linen shirt. Golden hour lighting. Keep his exact face, bone structure, eye color, and hair.
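One way to make sure the anchoring phrase is never forgotten is to append it programmatically. The helper below is a hypothetical sketch; its check for an existing anchor is a crude substring heuristic, not part of the skill.

```python
ANCHOR = "Keep the exact same face, bone structure, eye color, and hair from the reference image."


def anchor_prompt(scene, anchor=ANCHOR):
    """Append the identity-anchoring phrase unless the prompt already has one."""
    scene = scene.strip()
    lowered = scene.lower()
    # Crude heuristic: any "exact face" wording counts as an existing anchor.
    if "exact face" in lowered or "exact same face" in lowered:
        return scene
    # End the scene description with punctuation before appending.
    if scene and scene[-1] not in ".!?":
        scene += "."
    return f"{scene} {anchor}".strip()


print(anchor_prompt("The same man is relaxing on a tropical beach at sunset"))
```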
Two things persist across sessions: the reference image (binary) and the identity metadata (markdown). They live in different places.
Save the canonical reference image to ~/.letta/agents/$AGENT_ID/reference/visual-identity/canonical.jpg. This is outside memfs because binary images would bloat the git-backed memory repo. The reference/ directory persists across sessions.
mkdir -p ~/.letta/agents/$AGENT_ID/reference/visual-identity
cp /tmp/generated_portrait.jpg ~/.letta/agents/$AGENT_ID/reference/visual-identity/canonical.jpg
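The same two shell steps can be done from Python. This is a minimal sketch: `save_canonical` is a hypothetical name, and the `home` parameter exists only so the demo can run against a temporary directory instead of the real home directory.

```python
import shutil
import tempfile
from pathlib import Path


def save_canonical(source, agent_id, home=None):
    """Copy an image to the persistent reference location described above."""
    home = Path.home() if home is None else Path(home)
    dest_dir = home / ".letta" / "agents" / agent_id / "reference" / "visual-identity"
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    dest = dest_dir / "canonical.jpg"
    shutil.copyfile(source, dest)
    return dest


# Demo in a throwaway directory with placeholder bytes.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "generated_portrait.jpg"
    src.write_bytes(b"placeholder")
    saved = save_canonical(src, "agent-123", home=tmp)
    relative = saved.relative_to(tmp).as_posix()
    copied = saved.read_bytes()

print(relative)  # .letta/agents/agent-123/reference/visual-identity/canonical.jpg
```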
After establishing a visual identity, create a memory file at reference/visual-identity.md in the agent's memory filesystem. This syncs via git like all other memory files.
Use the Memory tool to create it:
memory(command="create", reason="Store visual identity metadata",
file_path="reference/visual-identity.md",
description="Agent's persistent visual identity -- reference image path and appearance description.",
file_text="## Reference Image\n~/.letta/agents/$AGENT_ID/reference/visual-identity/canonical.jpg\n\n## Appearance\n- Hair: shoulder-length auburn, slight wave\n- Eyes: green\n- Skin: light with freckles\n- Build: athletic\n- Distinguishing: small scar above left eyebrow\n\n## Anchoring Phrase\nKeep the exact same face, bone structure, eye color, and hair from the reference image.\n\n## History\n- Established: 2026-04-15\n- User feedback: \"make the hair a bit darker\" -> regenerated, approved")
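When this skill is loaded, check the agent's memory tree for reference/visual-identity.md. If it exists, use the reference image path, appearance description, and anchoring phrase it records for identity-based generations.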
If it does not exist, the agent has no visual identity yet. Offer to create one if the user asks for images.
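The load-time check can be sketched as follows. It assumes the memory tree is readable as a local directory (`memory_root` is a hypothetical parameter) and that the file follows the "## Reference Image" layout shown earlier; an agent would normally read the file through its Memory tool instead.

```python
import tempfile
from pathlib import Path


def load_reference_path(memory_root):
    """Return the image path recorded in visual-identity.md, or None if absent."""
    memory_file = Path(memory_root) / "reference" / "visual-identity.md"
    if not memory_file.is_file():
        return None  # no visual identity yet
    lines = memory_file.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "## Reference Image":
            # The first non-empty line under the heading is the path.
            for candidate in lines[i + 1:]:
                if candidate.startswith("## "):
                    break
                if candidate.strip():
                    return candidate.strip()
    return None


SAMPLE = (
    "## Reference Image\n"
    "~/.letta/agents/$AGENT_ID/reference/visual-identity/canonical.jpg\n\n"
    "## Appearance\n- Hair: shoulder-length auburn, slight wave\n"
)

with tempfile.TemporaryDirectory() as root:
    missing = load_reference_path(root)  # memory file not created yet
    ref_dir = Path(root) / "reference"
    ref_dir.mkdir()
    (ref_dir / "visual-identity.md").write_text(SAMPLE, encoding="utf-8")
    found = load_reference_path(root)

print(missing, found)
```

The returned string still contains `~` and `$AGENT_ID`; expand them (for example with `os.path.expanduser` and `os.path.expandvars`) before opening the image.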
If the user wants to change their visual identity:
- Replace canonical.jpg in the reference directory with the new reference image.

For one-off image generation that does not need identity persistence.
python3 <path-to-skill>/scripts/generate_image.py generate \
--prompt "A corgi wearing a tiny space helmet on the moon" \
--out /tmp/corgi_moon.jpg
Or inline with requests (OpenAI):
import base64
import os

import requests

# Request a single image from the OpenAI Images API.
resp = requests.post(
    "https://api.openai.com/v1/images/generations",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-image-1",
        "prompt": "A corgi wearing a tiny space helmet on the moon",
        "n": 1,
        "size": "1024x1024",
        "quality": "medium",
    },
    timeout=300,
)
resp.raise_for_status()  # surface HTTP errors (400, 429, ...) before parsing

# The image comes back base64-encoded in b64_json.
img = base64.b64decode(resp.json()["data"][0]["b64_json"])
with open("/tmp/corgi_moon.png", "wb") as f:
    f.write(img)
| Parameter | Values | Default | Provider | Notes |
|-----------|--------|---------|----------|-------|
| --prompt | string | required | Both | Scene description |
| --reference | file path | none | Both | Reference photo for identity mode (edit only) |
| --provider | openai, flux | auto | Both | Override provider auto-detection |
| --aspect-ratio | 1:1, 3:4, 4:3, 16:9, 9:16 | 3:4 | Both | Use 3:4 for portraits |
| --output-format | png, jpeg, webp | png | Both | |
| --quality | low, medium, high | medium | OpenAI | Image quality |
| --seed | integer | random | Flux | Fix for reproducible results |
| --safety-tolerance | 0-6 | 2 | Flux | Higher = more permissive |
| --guidance | 1.5-100 | varies | Flux | Prompt adherence strength |
- Common aspect ratios: 3:4 or 4:3, and 16:9.
- If a request stays Pending for over 120 seconds, retry once.
- URLs in the Ready response are signed and expire; save images immediately.

Full CLI documentation: references/api.md
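The advice to retry once when a request stays Pending for over 120 seconds could be implemented roughly like this. It is a sketch with injected callables: `submit` and `get_status` are hypothetical stand-ins for the real API calls, and the `"Ready"` and `"Pending"` status strings are taken from the tips above.

```python
import time


def wait_for_result(submit, get_status, timeout=120.0, interval=2.0,
                    max_retries=1, clock=time.monotonic, sleep=time.sleep):
    """Submit a job and poll it; resubmit if it stays Pending past the timeout.

    submit() returns a job handle; get_status(job) returns (status, result).
    """
    for _attempt in range(max_retries + 1):
        job = submit()
        deadline = clock() + timeout
        while clock() < deadline:
            status, result = get_status(job)
            if status == "Ready":
                return result  # save the image right away; signed URLs expire
            if status != "Pending":
                raise RuntimeError(f"job ended with status {status!r}")
            sleep(interval)
    raise TimeoutError(f"still Pending after {max_retries + 1} attempts")


# Demo with fake callables: Pending twice, then Ready.
responses = iter([
    ("Pending", None),
    ("Pending", None),
    ("Ready", "https://example.invalid/signed.jpg"),
])
url = wait_for_result(lambda: "job-1", lambda job: next(responses), sleep=lambda s: None)
print(url)  # https://example.invalid/signed.jpg
```

Passing `clock` and `sleep` as parameters keeps the loop testable without waiting two minutes.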
Common commands:
# Text-to-image
python3 <path-to-skill>/scripts/generate_image.py generate \
--prompt "..." --out output.jpg
# Reference-based editing (visual identity)
python3 <path-to-skill>/scripts/generate_image.py edit \
--reference photo.jpg --prompt "..." --out output.jpg
# Dry run (show request without sending)
python3 <path-to-skill>/scripts/generate_image.py generate \
--prompt "..." --dry-run
# Custom aspect ratio and seed
python3 <path-to-skill>/scripts/generate_image.py generate \
--prompt "..." --aspect-ratio 16:9 --seed 42 --out wide.jpg
OpenAI:
- HTTP 400/422: Usually a malformed request or content policy violation
- HTTP 429: Rate limited -- wait and retry
- Missing OPENAI_API_KEY: Guide the user to https://platform.openai.com/api-keys

Flux:
- Insufficient credits: Direct user to https://api.bfl.ai/credits
- HTTP 422: Usually a malformed request -- check prompt and parameters
- Pending timeout: Retry the request; the queue may be congested
- Missing BFL_API_KEY: Guide the user to https://api.bfl.ai
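The HTTP status hints above can be kept in a small lookup so an agent reports them consistently. This is an illustrative sketch: `guidance_for` is a hypothetical helper, and the fallback message is an assumption rather than part of the skill.

```python
# Hint strings are copied from the troubleshooting notes above.
GUIDANCE = {
    ("openai", 400): "Usually a malformed request or content policy violation",
    ("openai", 422): "Usually a malformed request or content policy violation",
    ("openai", 429): "Rate limited -- wait and retry",
    ("flux", 422): "Usually a malformed request -- check prompt and parameters",
}


def guidance_for(provider, status_code):
    """Return the troubleshooting hint for an HTTP status, or a generic fallback."""
    # Assumption: unknown statuses get a generic message.
    fallback = f"Unexpected HTTP {status_code}; show the response body to the user"
    return GUIDANCE.get((provider, status_code), fallback)


print(guidance_for("openai", 429))  # Rate limited -- wait and retry
```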