docs/skills/sdxl-character-lora-training/SKILL.md
# SDXL Character LoRA Training: Pipeline Reference

## Overview
Character LoRAs are trained using Kohya sd-scripts (`sdxl_train_network.py`) on RunPod GPU pods. Training runs as a fire-and-forget batch job: the orchestrator creates the pod, the pod trains, uploads the result, and POSTs a webhook on completion.

**Base model for training:** SDXL 1.0 base (NOT Juggernaut Ragnarok). Training against the base model produces portable LoRAs that work across all SDXL fine-tunes (Juggernaut, RealVisXL, epiCRealism, etc.).

**Inference checkpoint:** Juggernaut XL Ragnarok (or any SDXL fine-tune). The trained LoRA is injected into the workflow at inference time.
We use a two-pass training approach for maximum character flexibility:
Why two passes: The first LoRA learns from a narrow distribution (one checkpoint's interpretation of the character). The second pass uses that LoRA to generate more diverse training data, producing a LoRA that generalises across scenes, poses, and styles. This is the approach used by the Everly Heights Story Studio project for serialised visual storytelling.
Cost: ~$0.60–1.40 per character (doubled from single-pass, negligible for the quality gain).
Target a roughly 60/40 face-to-body ratio with more repeats on face images:
| Category | Count | Repeats | Purpose |
|----------|-------|---------|---------|
| Face close-ups | 10–12 | 40 | Facial identity: bone structure, eyes, nose, lips |
| Head-and-shoulders | 6–8 | 25 | Face + upper body transition |
| Waist-up | 6–8 | 20 | Upper body proportions, clothing variety |
| Full-body | 8–10 | 20 | Body proportions, full figure, posture |
Total steps target: ~1500–2000 (repeats × images × epochs)
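The step formula above can be checked with a few lines of Python before a pod is rented. This is only a minimal sketch: the `Bucket` type, the function names, and the small example plan are hypothetical. Dividing by batch size is an assumption about how steps are counted, since the formula above does not mention it.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Bucket:
    name: str
    images: int   # number of images in this category
    repeats: int  # repeat count assigned to this category


def samples_per_epoch(buckets: list[Bucket]) -> int:
    """Image presentations in one epoch: sum of images x repeats."""
    return sum(b.images * b.repeats for b in buckets)


def total_steps(buckets: list[Bucket], epochs: int, batch_size: int = 1) -> int:
    """Steps over the whole run, assuming one step per batch
    and a partial final batch in each epoch (an assumption)."""
    return ceil(samples_per_epoch(buckets) / batch_size) * epochs


# Hypothetical plan with small numbers, purely for illustration.
plan = [
    Bucket("face", images=10, repeats=4),
    Bucket("full_body", images=8, repeats=2),
]

print(samples_per_epoch(plan))                     # 56
print(total_steps(plan, epochs=10))                # 560
print(total_steps(plan, epochs=10, batch_size=2))  # 280
```

Substituting the counts and repeats from the table shows how far a given plan lands from the ~1500–2000 target.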
Must include across the dataset:
Must AVOID:
All female characters have specific body types defined in their character data. The LoRA must capture body proportions alongside facial identity. The dataset must include enough full-body and waist-up shots to teach the model the character's figure.
| Parameter | Value | Notes |
|-----------|-------|-------|
| Base model | SDXL 1.0 base | NOT the inference checkpoint, for portability |
| Network dim | 32 | Higher than Pony's 8; photorealistic needs more capacity for facial detail |
| Network alpha | 16 | Half of dim |
| Optimizer | Prodigy | Self-adjusting learning rate |
| Learning rate | 1.0 | Prodigy handles the actual LR internally |
| Scheduler | cosine_with_restarts | |
| Noise offset | 0.03 | |
| Resolution | 1024 | SDXL native |
| Batch size | 2 | |
| Clip skip | 1 | SDXL standard (Pony used 2; don't carry that over) |
| Epochs | 10–15 | Save checkpoints every 2 epochs to find the sweet spot |
| Save every N epochs | 2 | Test intermediate checkpoints; the best LoRA is rarely the last epoch |
| Mixed precision | fp16 | |
| Cache latents | Yes | Reduces VRAM, maintains quality |
| Cache text encoder | Yes | |
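As a rough illustration, the parameter table could be translated into arguments for `sdxl_train_network.py` along these lines. This is a sketch, not the contents of `train_entrypoint.py`: the helper name and all paths are hypothetical, and the flag names reflect my understanding of Kohya sd-scripts, so they should be verified against the version installed in the trainer image.

```python
def build_train_args(base_model: str, data_dir: str, output_dir: str,
                     output_name: str, epochs: int = 10) -> list[str]:
    """Argument list for sdxl_train_network.py (launcher command not included)."""
    if not 10 <= epochs <= 15:
        raise ValueError("the parameter table recommends 10-15 epochs")
    return [
        "--pretrained_model_name_or_path", base_model,  # SDXL 1.0 base
        "--train_data_dir", data_dir,
        "--output_dir", output_dir,
        "--output_name", output_name,
        "--network_module", "networks.lora",
        "--network_dim", "32",
        "--network_alpha", "16",
        "--optimizer_type", "Prodigy",
        "--learning_rate", "1.0",          # Prodigy adapts the effective LR
        "--lr_scheduler", "cosine_with_restarts",
        "--noise_offset", "0.03",
        "--resolution", "1024,1024",
        "--train_batch_size", "2",
        "--max_train_epochs", str(epochs),
        "--save_every_n_epochs", "2",
        "--mixed_precision", "fp16",
        "--cache_latents",
        "--cache_text_encoder_outputs",
        # Clip skip 1 is treated as the SDXL default here, so no flag is
        # passed for it (an assumption, not something the table states).
    ]


# Hypothetical paths and output name, for illustration only.
args = build_train_args(
    base_model="/models/sd_xl_base_1.0.safetensors",
    data_dir="/data/lindiwe_nsw",
    output_dir="/output",
    output_name="lindiwe_nsw",
    epochs=12,
)
print(" ".join(args))
```

Some sd-scripts versions may also require U-Net-only training when text encoder outputs are cached. The table does not cover this, so the sketch leaves it out.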
Key changes from Pony pipeline:
Use natural language captions (not Booru tags). Ragnarok was trained with both styles, but natural language produces better results for photorealistic character LoRAs.
{trigger_word}, a [age] [ethnicity] [gender] [action/pose], [clothing], [expression], [setting], [lighting]
The trigger word carries the character's identity. Strip these from captions so the model associates them with the trigger word, not with explicit text:
Example caption:
lindiwe_nsw, a young woman smiling, fitted blazer and tailored trousers, warm expression looking at camera, modern office interior, soft window light
NOT:
lindiwe_nsw, a young Black South African woman with medium-brown skin, oval face, high cheekbones, dark brown eyes, neat braids in low bun, slim curvaceous figure, smiling...
The second version bakes identity into the text — the model should learn identity from the images, not from repeated text descriptions.
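The caption rule above can be sketched as a small builder plus a leak check. This is a hypothetical illustration, not the logic in `training-caption-builder.ts`: the function names are made up, and the list of identity terms is supplied by the caller rather than taken from this document.

```python
def build_caption(trigger_word: str, age: str, gender: str, action: str,
                  clothing: str, expression: str, setting: str,
                  lighting: str) -> str:
    """Assemble a natural-language caption that leaves identity to the trigger word.

    Always uses the article "a"; "an" is not handled in this sketch.
    """
    subject = " ".join(w for w in ("a", age, gender, action) if w)
    parts = [trigger_word, subject, clothing, expression, setting, lighting]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def find_identity_leaks(caption: str, identity_terms: list[str]) -> list[str]:
    """Return the identity terms that still appear in the caption text."""
    lowered = caption.lower()
    return [t for t in identity_terms if t.lower() in lowered]


caption = build_caption(
    "lindiwe_nsw", "young", "woman", "smiling",
    "fitted blazer and tailored trousers",
    "warm expression looking at camera",
    "modern office interior",
    "soft window light",
)
print(caption)

# Caller-supplied terms, taken here from the counter-example above.
terms = ["oval face", "high cheekbones", "dark brown eyes"]
print(find_identity_leaks(caption, terms))  # []
print(find_identity_leaks(
    "lindiwe_nsw, a young woman with oval face, high cheekbones, smiling",
    terms,
))  # ['oval face', 'high cheekbones']
```

Running the leak check over every generated caption before training is one way to catch identity descriptors that slipped through.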
Each generated dataset image is evaluated by Claude Vision on these criteria (1–10 each):
Pass threshold: Weighted score ≥ 6.0

Minimum passed images: 25 (per pass)
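The threshold logic can be sketched as a weighted average followed by a count. Only the 6.0 threshold and the 25-image minimum come from this document. The criterion names and weights below are placeholders, not the ones used in `training-image-evaluator.ts`.

```python
PASS_THRESHOLD = 6.0      # from this document
MIN_PASSED_IMAGES = 25    # from this document, per pass


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (each on a 1-10 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def image_passes(scores: dict[str, float], weights: dict[str, float]) -> bool:
    return weighted_score(scores, weights) >= PASS_THRESHOLD


def dataset_ready(all_scores: list[dict[str, float]],
                  weights: dict[str, float]) -> bool:
    """True when enough images pass to start a training pass."""
    passed = sum(1 for s in all_scores if image_passes(s, weights))
    return passed >= MIN_PASSED_IMAGES


# Placeholder criteria and weights, for illustration only.
weights = {"face": 2.0, "body": 1.0, "quality": 1.0}
good = {"face": 7, "body": 5, "quality": 6}
weak = {"face": 5, "body": 6, "quality": 7}

print(weighted_score(good, weights))         # 6.25
print(image_passes(good, weights))           # True
print(image_passes(weak, weights))           # False (5.75)
print(dataset_ready([good] * 25, weights))   # True
print(dataset_ready([good] * 24 + [weak], weights))  # False
```

Weighting one criterion more heavily means a strong score there can carry an image that is average elsewhere, which is why `good` passes and `weak` does not despite equal raw totals.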
Each evaluation must produce:
The user can override any AI evaluation:
The user's judgment is final. The AI evaluation is a first pass to save time, not an authority.
After training completes:
If validation fails → retry training with adjusted parameters (lower learning rate, different epoch checkpoint).
| File | Purpose |
|------|---------|
| packages/image-gen/src/lora-trainer.ts | Pipeline orchestrator (replaces pony-lora-trainer.ts) |
| packages/image-gen/src/dataset-generator.ts | Training image generation (replaces pony-dataset-generator.ts) |
| packages/image-gen/src/character-lora-validator.ts | Post-training validation (replaces pony-character-lora-validator.ts) |
| packages/image-gen/src/character-lora/training-image-evaluator.ts | Dataset curation |
| packages/image-gen/src/character-lora/training-caption-builder.ts | Caption generation |
| infra/kohya-trainer/train_entrypoint.py | Kohya training script on RunPod pod |
| apps/web/app/api/lora-training-webhook/route.ts | Webhook handler for training completion |
- getAvailableGpusSortedByPrice(): 24GB+ VRAM, Secure cloud, $1.00/hr cap
- ghcr.io/g858-debug/nsw-kohya-trainer (update tag when rebuilding)
- nsw-comfyui-models: checkpoints and trained LoRAs stored here
- sd_xl_base_1.0.safetensors): download from HuggingFace as fallback if not present
- getAvailableGpusSortedByPrice()
- write:packages scope)
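The GPU selection rule (24GB+ VRAM, Secure cloud, $1.00/hr cap, cheapest first) can be illustrated with a small filter-and-sort. This is a sketch of the rule only, not the project's `getAvailableGpusSortedByPrice()`: the `GpuOffer` type, the function name, and the offers listed are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuOffer:
    name: str
    vram_gb: int
    price_per_hour: float
    secure_cloud: bool


def eligible_gpus_sorted_by_price(offers: list[GpuOffer],
                                  min_vram_gb: int = 24,
                                  max_price: float = 1.00) -> list[GpuOffer]:
    """Keep Secure cloud offers that meet the VRAM floor and price cap, cheapest first."""
    eligible = [
        o for o in offers
        if o.secure_cloud
        and o.vram_gb >= min_vram_gb
        and o.price_per_hour <= max_price
    ]
    return sorted(eligible, key=lambda o: o.price_per_hour)


# Made-up offers; names and prices do not describe real hardware.
offers = [
    GpuOffer("gpu-a", vram_gb=24, price_per_hour=0.70, secure_cloud=True),
    GpuOffer("gpu-b", vram_gb=48, price_per_hour=0.55, secure_cloud=True),
    GpuOffer("gpu-c", vram_gb=16, price_per_hour=0.30, secure_cloud=True),   # too little VRAM
    GpuOffer("gpu-d", vram_gb=80, price_per_hour=1.90, secure_cloud=True),   # over the cap
    GpuOffer("gpu-e", vram_gb=24, price_per_hour=0.40, secure_cloud=False),  # not Secure cloud
]

print([o.name for o in eligible_gpus_sorted_by_price(offers)])  # ['gpu-b', 'gpu-a']
```

Trying the cheapest eligible offer first and falling back down the list keeps the per-character cost close to the low end of the estimate.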