Fine-tune Gemma 4 and 3n models with audio, images, and text on Apple Silicon using PyTorch and Metal Performance Shaders.
Skill by ara.so — Daily 2026 Skills collection.
Fine-tune Gemma 4 and Gemma 3n models on text, images, and audio data entirely on Apple Silicon (MPS), with support for streaming large datasets from GCS/BigQuery without filling local storage.
```bash
# Install Python 3.12 if needed
brew install python@3.12

# Create venv
python3.12 -m venv .venv
source .venv/bin/activate

# Verify arm64 (must show arm64, not x86_64)
python -c "import platform; print(platform.machine())"

# Install PyTorch
pip install torch torchaudio

# Clone and install
git clone https://github.com/mattmireles/gemma-tuner-multimodal
cd gemma-tuner-multimodal
pip install -e .

# For Gemma 4 support (separate venv recommended)
pip install -r requirements/requirements-gemma4.txt
```
```bash
huggingface-cli login
# Or set environment variable:
export HF_TOKEN=your_token_here
```
```bash
# Check system is ready
gemma-macos-tuner system-check

# Guided setup wizard (recommended for first run)
gemma-macos-tuner wizard

# Prepare dataset
gemma-macos-tuner prepare <dataset-profile>

# Fine-tune a model
gemma-macos-tuner finetune <profile> --json-logging

# Evaluate a run
gemma-macos-tuner evaluate <profile-or-run>

# Export merged HF/SafeTensors (merges LoRA when adapter_config.json present)
gemma-macos-tuner export <run-dir-or-profile>

# Blacklist bad samples from errors
gemma-macos-tuner blacklist <profile>

# List training runs
gemma-macos-tuner runs list
```
Configuration lives in `config/config.ini`, a hierarchical INI file layered as defaults → groups → models → datasets → profiles.
```ini
[defaults]
output_dir = output
batch_size = 2
gradient_accumulation_steps = 8
learning_rate = 2e-4
num_train_epochs = 3

[model:gemma-3n-e2b-it]
group = gemma
base_model = google/gemma-3n-E2B-it

[model:gemma-4-e2b-it]
group = gemma
base_model = google/gemma-4-E2B-it

[dataset:my-audio-dataset]
data_dir = data/datasets/my-audio-dataset
audio_column = audio_path
text_column = transcript

[profile:my-audio-profile]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
max_seq_length = 512
```
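With these `[defaults]`, each optimizer step accumulates `batch_size × gradient_accumulation_steps = 2 × 8 = 16` samples. A minimal sketch of that arithmetic, plus the accumulation count that keeps the effective batch unchanged when `batch_size` is lowered to save memory. Both helpers are hypothetical and not part of gemma_tuner:

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Samples contributing to one optimizer step on a single device."""
    return batch_size * gradient_accumulation_steps


def compensating_accumulation(old_batch: int, old_accum: int, new_batch: int) -> int:
    """Accumulation steps that keep the effective batch at least as large
    after changing the per-step batch size (hypothetical helper)."""
    target = effective_batch_size(old_batch, old_accum)
    return math.ceil(target / new_batch)


if __name__ == "__main__":
    print(effective_batch_size(2, 8))          # effective batch with the defaults above
    print(compensating_accumulation(2, 8, 1))  # accumulation needed if batch_size drops to 1
```

Peak activation memory scales with the per-step `batch_size` (and `max_seq_length`), not with the number of accumulation steps, which is why this trade is a common first response to out-of-memory errors.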
Use the `GEMMA_TUNER_CONFIG` environment variable to point to a config file outside the repo root:

```bash
export GEMMA_TUNER_CONFIG=/path/to/my/config.ini
```
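A minimal sketch of how a layered INI like this can be resolved with the standard-library `configparser`, assuming later layers override earlier ones (defaults → group → model → dataset → profile) and that `GEMMA_TUNER_CONFIG`, when set, takes precedence over `config/config.ini`. The `[group:<name>]` section naming and the `resolve_profile` helper are assumptions for illustration; gemma_tuner's own `load_config` may merge differently.

```python
import configparser
import os
from pathlib import Path

# Trimmed copy of the example config above, used for the demo below.
EXAMPLE = """
[defaults]
output_dir = output
batch_size = 2
learning_rate = 2e-4

[model:gemma-3n-e2b-it]
group = gemma
base_model = google/gemma-3n-E2B-it

[dataset:my-audio-dataset]
data_dir = data/datasets/my-audio-dataset
audio_column = audio_path

[profile:my-audio-profile]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
lora_r = 16
"""


def config_path(default: str = "config/config.ini") -> Path:
    """GEMMA_TUNER_CONFIG wins when set (assumed precedence); else the repo default."""
    return Path(os.environ.get("GEMMA_TUNER_CONFIG", default))


def section(parser: configparser.ConfigParser, name: str) -> dict[str, str]:
    """Section contents as a plain dict, or {} if the section is absent."""
    return dict(parser[name]) if parser.has_section(name) else {}


def resolve_profile(parser: configparser.ConfigParser, profile: str) -> dict[str, str]:
    """Merge defaults -> group -> model -> dataset -> profile; later layers win."""
    prof = section(parser, f"profile:{profile}")
    if not prof:
        raise KeyError(f"no [profile:{profile}] section")
    model = section(parser, f"model:{prof['model']}")
    # Hypothetical [group:<name>] section; skipped when the file has none.
    group = section(parser, f"group:{model['group']}") if "group" in model else {}
    dataset = section(parser, f"dataset:{prof['dataset']}")
    merged: dict[str, str] = {}
    for layer in (section(parser, "defaults"), group, model, dataset, prof):
        merged.update(layer)
    return merged


if __name__ == "__main__":
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(EXAMPLE)  # for a real file: parser.read(config_path())
    resolved = resolve_profile(parser, "my-audio-profile")
    print(resolved["base_model"], resolved["batch_size"], resolved["lora_r"])
```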
Instruction tuning (user/assistant pairs):
```ini
[profile:text-instruction]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
```
Completion tuning (full sequence trained):
```ini
[profile:text-completion]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = completion
text_column = text
max_seq_length = 2048
```
CSV format for instruction tuning (`data/datasets/my-text-dataset/train.csv`):

```csv
prompt,response
"What is photosynthesis?","Photosynthesis is the process by which plants..."
"Explain LoRA fine-tuning","LoRA (Low-Rank Adaptation) is a parameter-efficient..."
```
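The two `text_sub_mode` values differ in which tokens contribute to the loss. A minimal sketch, assuming instruction mode masks the prompt (as the "full sequence trained" note on completion mode suggests) and the common Hugging Face convention of labelling ignored positions `-100`, which PyTorch's cross-entropy loss skips by default. The token ids are made up, and `build_example` is a hypothetical helper, not gemma_tuner's actual collator:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids: list[int], response_ids: list[int],
                  mode: str, max_seq_length: int) -> dict[str, list[int]]:
    """Concatenate prompt and response and build causal-LM labels.

    instruction: only response tokens are supervised (prompt is masked).
    completion:  every token is supervised.
    Labels stay aligned with input_ids; Hugging Face causal-LM models
    shift them internally when computing the loss.
    """
    input_ids = prompt_ids + response_ids
    if mode == "instruction":
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    elif mode == "completion":
        labels = list(input_ids)
    else:
        raise ValueError(f"unknown text_sub_mode: {mode}")
    # Truncate both sequences identically to max_seq_length.
    return {"input_ids": input_ids[:max_seq_length],
            "labels": labels[:max_seq_length]}
```

In completion mode there is no separate prompt, so `prompt_ids` would simply be empty and the whole `text` column is supervised.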
Image captioning:

```ini
[profile:image-caption]
model = gemma-3n-e2b-it
dataset = my-image-dataset
modality = image
image_sub_mode = captioning
image_token_budget = 256
prompt_column = prompt
text_column = caption
max_seq_length = 512
```
CSV format (`data/datasets/my-image-dataset/train.csv`):

```csv
image_path,prompt,caption
/data/images/img1.jpg,Describe this image,A dog sitting on a green lawn...
/data/images/img2.jpg,What is shown here,A bar chart showing quarterly revenue...
```
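Before running `prepare`, it can help to confirm that every `image_path` resolves to a file and that the columns named in the profile exist. A minimal stdlib sketch. `check_image_csv` is a hypothetical helper, and the accepted extension list is an assumption, not a documented gemma_tuner constraint:

```python
import csv
from pathlib import Path

# Assumed set of accepted formats; adjust to what your pipeline actually loads.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_image_csv(csv_path: str, image_column: str = "image_path",
                    prompt_column: str = "prompt",
                    text_column: str = "caption") -> list[str]:
    """Return a list of problems; an empty list means every row looks usable."""
    required = (image_column, prompt_column, text_column)
    problems: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in required if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {missing}"]
        for row_no, row in enumerate(reader, start=1):  # 1 = first data row
            path = Path(row[image_column])
            if path.suffix.lower() not in IMAGE_EXTENSIONS:
                problems.append(f"row {row_no}: unexpected extension {path.suffix!r}")
            if not path.is_file():
                problems.append(f"row {row_no}: file not found {path}")
            if not (row[text_column] or "").strip():
                problems.append(f"row {row_no}: empty {text_column}")
    return problems
```

Call it as `check_image_csv("data/datasets/my-image-dataset/train.csv")` and fix any reported rows before preparing the dataset.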
Audio transcription (ASR):

```ini
[profile:audio-asr]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
audio_column = audio_path
text_column = transcript
max_seq_length = 512
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
```
CSV format (`data/datasets/my-audio-dataset/train.csv`):

```csv
audio_path,transcript
/data/audio/recording1.wav,The patient presents with acute respiratory symptoms
/data/audio/recording2.wav,Counsel objects to the characterization of the evidence
```
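For audio, a similar pre-flight check can also report clip durations, which makes unusually long recordings easy to spot. A minimal sketch using the standard-library `wave` module, so it only reads uncompressed PCM `.wav` files; `audio_durations` is a hypothetical helper:

```python
import csv
import wave
from pathlib import Path


def audio_durations(csv_path: str, audio_column: str = "audio_path",
                    text_column: str = "transcript") -> tuple[dict[str, float], list[str]]:
    """Return ({path: seconds}, [problems]) for every row of the CSV."""
    durations: dict[str, float] = {}
    problems: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row_no, row in enumerate(csv.DictReader(f), start=1):
            path = Path(row[audio_column])
            if not (row[text_column] or "").strip():
                problems.append(f"row {row_no}: empty transcript")
            try:
                with wave.open(str(path), "rb") as wav:
                    # Duration = number of frames / frames per second.
                    durations[str(path)] = wav.getnframes() / wav.getframerate()
            except FileNotFoundError:
                problems.append(f"row {row_no}: file not found {path}")
            except (wave.Error, EOFError) as exc:
                problems.append(f"row {row_no}: unreadable WAV {path} ({exc})")
    return durations, problems
```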
| Model Key | Hugging Face ID | Notes |
|---|---|---|
| gemma-3n-e2b-it | google/gemma-3n-E2B-it | Default, ~2B instruct |
| gemma-3n-e4b-it | google/gemma-3n-E4B-it | ~4B instruct |
| gemma-4-e2b-it | google/gemma-4-E2B-it | Needs requirements-gemma4.txt |
| gemma-4-e4b-it | google/gemma-4-E4B-it | Needs requirements-gemma4.txt |
| gemma-4-e2b | google/gemma-4-E2B | Base, needs Gemma 4 stack |
| gemma-4-e4b | google/gemma-4-E4B | Base, needs Gemma 4 stack |
Add custom models with a `[model:your-name]` section using `group = gemma`.
Dataset layout:

```text
data/
└── datasets/
    └── <dataset-name>/
        ├── train.csv        # required
        ├── validation.csv   # optional
        └── test.csv         # optional
```
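A minimal sketch of locating the split files under this layout: it fails early when the required `train.csv` is absent and silently skips the optional splits. `find_splits` is a hypothetical helper, not part of gemma_tuner:

```python
from pathlib import Path

REQUIRED = ("train",)
OPTIONAL = ("validation", "test")


def find_splits(data_dir: str) -> dict[str, Path]:
    """Map split name -> CSV path for data/datasets/<dataset-name>/."""
    root = Path(data_dir)
    splits: dict[str, Path] = {}
    for name in REQUIRED + OPTIONAL:
        candidate = root / f"{name}.csv"
        if candidate.is_file():
            splits[name] = candidate
        elif name in REQUIRED:
            raise FileNotFoundError(f"{candidate} is required")
    return splits
```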
Output layout:

```text
output/
└── {run-id}-{profile}/
    ├── metadata.json
    ├── metrics.json
    ├── checkpoint-*/
    └── adapter_model/       # LoRA artifacts
```
```python
from gemma_tuner.core.config import load_config
from gemma_tuner.core.ops import run_finetune

# Load config
config = load_config("config/config.ini")

# Run fine-tuning for a profile
run_finetune(profile="my-audio-profile", config=config, json_logging=True)
```
```python
from gemma_tuner.utils.device import get_device, memory_hint

device = get_device()  # Returns "mps", "cuda", or "cpu"
print(f"Training on: {device}")

hint = memory_hint(model_key="gemma-3n-e2b-it")
print(hint)
```
```python
from gemma_tuner.utils.dataset_utils import load_csv_dataset

train_df, val_df = load_csv_dataset(
    data_dir="data/datasets/my-text-dataset",
    text_column="response",
    prompt_column="prompt",
)
print(f"Train samples: {len(train_df)}, Val samples: {len(val_df)}")
```
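The LoRA settings from the profiles map onto a PEFT `LoraConfig`:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model directly onto the Apple Silicon GPU (MPS)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3n-E2B-it",
    torch_dtype="auto",
    device_map="mps",
)

# Same values as lora_r / lora_alpha / lora_dropout in the profiles above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# Wrap the model so only the LoRA adapter weights are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```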
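For a frozen linear layer of shape `d_out × d_in`, LoRA adds two trainable matrices, `B` (`d_out × r`) and `A` (`r × d_in`), and scales their product by `lora_alpha / r` (PEFT's default, non-rsLoRA scaling). Each adapted layer therefore adds `r × (d_in + d_out)` parameters. A small worked sketch of that arithmetic; the 2048 hidden size is an illustrative assumption, not Gemma 3n's actual dimensions, and real counts come from `print_trainable_parameters()`:

```python
def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to B @ A under PEFT's default (non-rsLoRA) scaling."""
    return lora_alpha / r


if __name__ == "__main__":
    hidden = 2048  # illustrative assumption, not the real model width
    per_layer = lora_params_per_layer(hidden, hidden, r=16)
    print(per_layer)               # 65536
    # Assumes q/k/v/o are all square; with grouped-query attention k/v are narrower.
    print(4 * per_layer)           # 262144 per attention block
    print(lora_scaling(32, 16))    # 2.0 for lora_r = 16, lora_alpha = 32
```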
```bash
# 1. Prepare your data
mkdir -p data/datasets/my-dataset
cp train.csv data/datasets/my-dataset/
cp validation.csv data/datasets/my-dataset/

# 2. Add profile to config/config.ini
cat >> config/config.ini << 'EOF'
[dataset:my-dataset]
data_dir = data/datasets/my-dataset

[profile:my-text-run]
model = gemma-3n-e2b-it
dataset = my-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
EOF

# 3. Prepare dataset
gemma-macos-tuner prepare my-dataset

# 4. Fine-tune
gemma-macos-tuner finetune my-text-run --json-logging

# 5. Export merged weights
gemma-macos-tuner export my-text-run
```
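Step 1 assumes you already have separate `train.csv` and `validation.csv` files. If you start from a single CSV, here is a minimal stdlib sketch for a reproducible split. The 10% validation fraction, the seed, and `split_csv` itself are illustrative choices, not gemma_tuner defaults:

```python
import csv
import random
from pathlib import Path


def split_csv(source: str, out_dir: str, val_fraction: float = 0.1,
              seed: int = 42) -> tuple[int, int]:
    """Shuffle rows deterministically and write train.csv / validation.csv.

    Returns (train_rows, validation_rows).
    """
    with open(source, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    if not fieldnames:
        raise ValueError(f"{source} has no header row")
    random.Random(seed).shuffle(rows)  # same seed -> same split every time
    n_val = max(1, int(len(rows) * val_fraction)) if len(rows) > 1 else 0
    splits = {"validation": rows[:n_val], "train": rows[n_val:]}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, subset in splits.items():
        with open(out / f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
    return len(splits["train"]), len(splits["validation"])
```

Calling `split_csv("all.csv", "data/datasets/my-dataset")` (where `all.csv` is your combined file) would replace the two `cp` commands in step 1.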
To stream a large dataset from GCS instead of storing it locally:

```ini
[dataset:large-audio-gcs]
source = gcs
gcs_bucket = my-bucket
gcs_prefix = audio-training-data/
audio_column = audio_path
text_column = transcript

[profile:large-audio-run]
model = gemma-3n-e4b-it
dataset = large-audio-gcs
modality = audio
lora_r = 32
lora_alpha = 64
```
Set credentials, then start the run:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
gemma-macos-tuner finetune large-audio-run
```
To fine-tune your own Gemma checkpoint, register it as a model:

```ini
[model:my-custom-gemma]
group = gemma
base_model = my-org/my-gemma-checkpoint

[profile:custom-run]
model = my-custom-gemma
dataset = my-dataset
modality = text
text_sub_mode = instruction
```
Check that Python is a native arm64 build:

```bash
python -c "import platform; print(platform.machine())"
# Must be arm64; if it shows x86_64, reinstall Python natively:
brew install python@3.12
python3.12 -m venv .venv && source .venv/bin/activate
```
If training runs out of memory:

- Reduce `batch_size` (try 1)
- Increase `gradient_accumulation_steps` to compensate
- Use a smaller model variant (`e2b` instead of `e4b`)
- Reduce `max_seq_length`

```bash
# Gemma 4 requires the updated Transformers stack
pip install -r requirements/requirements-gemma4.txt
# Use a separate venv if you also need Gemma 3n
```
To run with a config outside the repo root, use an absolute path:

```bash
export GEMMA_TUNER_CONFIG=/absolute/path/to/config/config.ini
gemma-macos-tuner finetune my-profile
```
```bash
huggingface-cli login
# Or:
export HF_TOKEN=your_hf_token
# Accept Gemma license at: https://huggingface.co/google/gemma-3n-E2B-it
gemma-macos-tuner system-check
```
High memory use on text-only runs is a known v1 issue: the USM audio tower weights stay in memory even when `modality = text`. See `README/KNOWN_ISSUES.md`. Workaround: use a smaller model variant to stay within your RAM budget.
| File | Role |
|---|---|
| gemma_tuner/cli_typer.py | Main CLI entrypoint (gemma-macos-tuner) |
| gemma_tuner/core/ops.py | Dispatches prepare/finetune/evaluate/export |
| gemma_tuner/scripts/finetune.py | Router: Gemma models → models/gemma/finetune.py |
| gemma_tuner/models/gemma/finetune.py | Core training loop with LoRA |
| gemma_tuner/scripts/export.py | Merges LoRA → HF/SafeTensors tree |
| gemma_tuner/utils/device.py | MPS/CUDA/CPU selection and memory hints |
| gemma_tuner/utils/dataset_utils.py | CSV loading, blacklist/protection semantics |
| gemma_tuner/wizard/ | Interactive CLI wizard (questionary + Rich) |
| config/config.ini | Hierarchical INI configuration |