hugging-face-model-trainer/SKILL.md
Train/fine-tune LLMs using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO, reward modeling, GGUF conversion, dataset validation, hardware selection, cost estimation, Trackio monitoring, and model persistence. Invoke for cloud GPU training or GGUF conversion.
Train language models using TRL on managed Hugging Face infrastructure. Models train on cloud GPUs and results save to Hugging Face Hub automatically.
- hf_jobs() MCP tool: use hf_jobs("uv", {...}) with inline Python code in the script parameter. Local file paths do not work (jobs run in isolated Docker containers). When a user asks to train a model, create the script and submit immediately.
- Use the files in scripts/ as templates.
- Include push_to_hub=True, hub_model_id, and secrets={"HF_TOKEN": "$HF_TOKEN"} in every job.
- For dataset validation, see references/dataset_validation.md.

| Method | Use Case | Dataset Format |
|--------|----------|---------------|
| SFT | Instruction tuning | messages / text / prompt-completion |
| DPO | Preference alignment | prompt, chosen, rejected |
| GRPO | Online RL training | prompt-only |
| Reward | RLHF reward models | preference pairs |
For method details: references/training_methods.md or hf_doc_search("query", product="trl")
Use Unsloth when VRAM is limited, when speed matters, or when training large models (>13B) or VLMs. It is roughly 2x faster and uses about 60% less VRAM. See references/unsloth.md.
- hf_whoami()
- secrets={"HF_TOKEN": "$HF_TOKEN"} in job config
- datasets.load_dataset()
- references/dataset_validation.md

UV scripts use PEP 723 inline dependencies for self-contained training:
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",
        run_name="meaningful_run_name",
    )
)
trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
Demo tip: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory.
TRL config classes use max_length (not max_seq_length):
SFTConfig(max_length=512) # ✅ Truncate to 512 tokens
SFTConfig(max_seq_length=512) # ❌ TypeError
Default max_length=1024 works well for most training. Override for longer context, memory constraints, or vision models (max_length=None).
See references/cli_usage.md for:
- hf jobs uv run terminal commands (when MCP unavailable)
- trl-jobs sft one-liner training
- uv-scripts organization

| Model Size | Hardware | Cost (approx/hr) | Use Case |
|------------|----------|------------------|----------|
| <1B params | t4-small | ~$0.75 | Demos, quick tests (skip eval) |
| 1-3B params | t4-medium, l4x1 | ~$1.50-2.50 | Development |
| 3-7B params | a10g-small, a10g-large | ~$3.50-5.00 | Production |
| 7-13B params | a10g-large, a100-large | ~$5-10 | Large models (use LoRA) |
| 13B+ params | a100-large, a10g-largex2 | ~$10-20 | Very large (use LoRA) |
Use LoRA/PEFT for models >7B. Multi-GPU is handled automatically by TRL/Accelerate.
All flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8
See references/hardware_guide.md for detailed specifications.
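
The sizing table above can be read as a simple lookup. The following is a minimal sketch, not part of TRL or the Jobs API: `suggest_flavor` and `needs_lora` are hypothetical helpers, sizes are given in billions of parameters, and each range is treated as upper-exclusive (an assumption, since the table's ranges share their endpoints). It returns only the first flavor listed in each row.

```python
# Hypothetical lookup built from the sizing table; not an official API.
# Each entry is (exclusive upper bound in billions of params, first flavor in that row).
SIZING = [
    (1.0, "t4-small"),
    (3.0, "t4-medium"),
    (7.0, "a10g-small"),
    (13.0, "a10g-large"),
]


def suggest_flavor(params_billion: float) -> str:
    """Return the first flavor the table lists for a model of this size."""
    if params_billion <= 0:
        raise ValueError("params_billion must be positive")
    for upper_bound, flavor in SIZING:
        if params_billion < upper_bound:
            return flavor
    return "a100-large"  # 13B+ row


def needs_lora(params_billion: float) -> bool:
    """The guide recommends LoRA/PEFT for models larger than 7B."""
    return params_billion > 7.0
```

For example, `suggest_flavor(0.5)` gives `"t4-small"` for the Qwen2.5-0.5B model used earlier, while a 7B model falls into the `a10g-large` row under the upper-exclusive assumption.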
The training environment is ephemeral — all files are deleted when the job ends.
In training config:
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",
    hub_strategy="every_save",  # optional: push checkpoints
)
In job submission:
{"secrets": {"HF_TOKEN": "$HF_TOKEN"}}
Checklist: push_to_hub=True ✓ | hub_model_id set ✓ | secrets has HF_TOKEN ✓ | write access ✓
See references/hub_saving.md for troubleshooting.
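
The checklist above can be checked mechanically before a job is submitted. The following is a rough sketch under stated assumptions: `missing_persistence_settings` is a hypothetical helper, the training arguments are passed as a plain dict rather than an `SFTConfig`, and write access to the target repo cannot be verified offline, so it is left out.

```python
def missing_persistence_settings(training_args: dict, job: dict) -> list[str]:
    """Return the checklist items that are missing (empty list means all present).

    training_args: keyword arguments intended for the trainer config.
    job: the dict that would be passed to hf_jobs("uv", ...).
    """
    problems = []
    if training_args.get("push_to_hub") is not True:
        problems.append("push_to_hub=True")
    if not training_args.get("hub_model_id"):
        problems.append("hub_model_id")
    if "HF_TOKEN" not in job.get("secrets", {}):
        problems.append('secrets["HF_TOKEN"]')
    return problems
```

An empty result means the three local checks pass; write permission on the token still has to be confirmed separately.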
| Scenario | Recommended | Notes |
|----------|-------------|-------|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |
On timeout, the job is killed immediately and unsaved progress is lost. Add 20-30% buffer.
{"timeout": "2h"} # formats: "90m", "2h", "1.5h", or seconds as integer
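
To make the buffer rule concrete, here is a small sketch that converts the timeout formats listed above into seconds and pads an expected runtime. Both function names are hypothetical, and only the formats shown above (minutes, hours, or an integer number of seconds) are handled.

```python
import re

_UNIT_SECONDS = {"m": 60, "h": 3600}


def timeout_seconds(value) -> int:
    """Convert "90m", "2h", "1.5h", or an integer number of seconds to seconds."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([mh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout format: {value!r}")
    number, unit = match.groups()
    return round(float(number) * _UNIT_SECONDS[unit])


def with_buffer(expected_seconds: int, buffer_percent: int = 30) -> int:
    """Pad an expected runtime by buffer_percent, rounding up to a whole second."""
    total = expected_seconds * (100 + buffer_percent)
    return -(-total // 100)  # integer ceiling division
```

A run expected to take one hour would then be submitted with `with_buffer(3600)`, which is 4680 seconds (78 minutes).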
Offer to estimate cost when planning jobs with known parameters:
uv run scripts/estimate_cost.py \
--model meta-llama/Llama-2-7b-hf \
--dataset trl-lib/Capybara \
--hardware a10g-large \
--dataset-size 16000 \
--epochs 3
Output: estimated time, cost, recommended timeout, optimization suggestions.
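
For a quick sanity check without running the script, the cost is roughly runtime multiplied by the hourly rate. The sketch below is only that arithmetic, not the logic of scripts/estimate_cost.py (which is not shown here); `estimate_cost_range` is a hypothetical helper, and the rates are the approximate ranges from the hardware table.

```python
def estimate_cost_range(hours: float, hourly_rate_range: tuple[float, float]) -> tuple[float, float]:
    """Return (low, high) cost in USD for a run of the given length.

    hourly_rate_range: approximate (low, high) USD per hour, e.g. taken
    from the hardware table's "Cost (approx/hr)" column.
    """
    low_rate, high_rate = hourly_rate_range
    if hours < 0 or low_rate < 0 or high_rate < low_rate:
        raise ValueError("hours and rates must be non-negative, with low <= high")
    return (round(hours * low_rate, 2), round(hours * high_rate, 2))
```

A 2-hour job on hardware in the ~$3.50-5.00 row would come out at roughly $7 to $10 under this estimate.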
Include Trackio in every script (report_to="trackio"). Default config:
- {username}/trackio

See references/trackio_guide.md for complete setup and experiment grouping.
hf_jobs("ps") # List all jobs
hf_jobs("inspect", {"job_id": "your-job-id"}) # Job details
hf_jobs("logs", {"job_id": "your-job-id"}) # View logs
Validate format before GPU training — the #1 cause of training failures:
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
Output markers: ✓ READY (use directly) | ✗ NEEDS MAPPING (code provided) | ✗ INCOMPATIBLE
Skip validation for known TRL datasets (trl-lib/Capybara, trl-lib/ultrachat_200k, etc.).
See references/dataset_validation.md for full workflow and DPO mapping examples.
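
As a cheap first pass before launching the inspector job, the column names can be compared against the dataset formats listed in the method table. This is a minimal sketch, not the logic of dataset_inspector.py: `format_matches` is a hypothetical helper, "prompt-completion" is read as two columns named `prompt` and `completion`, and reward modeling is omitted because the table does not name its columns.

```python
# Accepted column sets per method, taken from the method table.
# Each method matches if ANY one of its column sets is fully present.
REQUIRED_COLUMNS = {
    "sft": [{"messages"}, {"text"}, {"prompt", "completion"}],
    "dpo": [{"prompt", "chosen", "rejected"}],
    "grpo": [{"prompt"}],
}


def format_matches(method: str, column_names) -> bool:
    """Return True if the columns satisfy one accepted format for the method."""
    if method not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown method: {method!r}")
    columns = set(column_names)
    return any(required <= columns for required in REQUIRED_COLUMNS[method])
```

With a loaded dataset this could be called as `format_matches("dpo", dataset.column_names)`. It checks names only, so the inspector is still needed to confirm that the column contents are well formed.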
Convert trained models for local inference (Ollama, LM Studio, llama.cpp):
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
See references/gguf_conversion.md for quantization options, hardware requirements, and troubleshooting.
| Problem | Fix |
|---------|-----|
| OOM | Reduce batch size (per_device_train_batch_size=1), increase accumulation (gradient_accumulation_steps=8), enable gradient_checkpointing=True, or upgrade the GPU |
| Dataset format | Validate with dataset inspector first (see above) |
| Job timeout | Increase timeout with 30% buffer, reduce epochs/dataset, save checkpoints with hub_strategy="every_save" |
| Hub push fails | Add secrets={"HF_TOKEN": "$HF_TOKEN"}, verify push_to_hub=True and hub_model_id, check write permissions |
| Missing deps | Add to PEP 723 header: # dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-pkg"] |
See references/troubleshooting.md for detailed solutions.
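
The OOM fix in the table works because gradient accumulation trades memory for steps while keeping the effective batch size constant. The arithmetic can be sketched as follows; `effective_batch_size` is a hypothetical helper, and the starting values in the usage note are illustrative rather than taken from any script here.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    if min(per_device_batch_size, gradient_accumulation_steps, num_devices) < 1:
        raise ValueError("all arguments must be at least 1")
    return per_device_batch_size * gradient_accumulation_steps * num_devices
```

Moving from a per-device batch of 8 with no accumulation to a per-device batch of 1 with 8 accumulation steps leaves the effective batch at 8, while only one example's activations are held in memory at a time.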
pip install -r requirements.txt
- references/training_methods.md: SFT, DPO, GRPO, KTO, PPO, Reward Modeling overview
- references/training_patterns.md: Common training patterns and examples
- references/dataset_validation.md: Dataset validation workflow
- references/cli_usage.md: CLI, TRL scripts, alternative submission methods
- references/unsloth.md: Unsloth for fast training (~2x speed, 60% less VRAM)
- references/gguf_conversion.md: GGUF conversion guide
- references/trackio_guide.md: Trackio monitoring setup
- references/hardware_guide.md: Hardware specs and selection
- references/hub_saving.md: Hub auth troubleshooting
- references/troubleshooting.md: Common issues and solutions
- scripts/train_sft_example.py: Production SFT template
- scripts/train_dpo_example.py: Production DPO template
- scripts/train_grpo_example.py: Production GRPO template
- scripts/unsloth_sft_example.py: Unsloth training template
- scripts/estimate_cost.py: Time and cost estimator
- scripts/convert_to_gguf.py: GGUF conversion script