skills/AI/AI-llm-architecture/7.2.-fine-tuning-to-follow-instructions/SKILL.md
How to fine-tune a pre-trained LLM to follow instructions and respond to tasks like a chatbot. Use this skill whenever the user wants to train an LLM on instruction-response pairs, format datasets for instruction tuning, evaluate fine-tuned model responses, or understand the complete instruction fine-tuning workflow. Make sure to use this skill when users mention instruction tuning, chatbot training, Alpaca format, Phi-3 format, or any scenario where they need to make an LLM respond to specific prompts rather than just generate text.
This skill guides you through fine-tuning a pre-trained LLM to follow instructions and respond to tasks like a chatbot, rather than just generating text.
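Use this skill when:
- You want to train an LLM on instruction-response pairs
- You need to format datasets for instruction tuning
- You want to evaluate fine-tuned model responses
- You want to understand the complete instruction fine-tuning workflow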
Instruction fine-tuning transforms a pre-trained language model into one that answers the prompt it is given, instead of simply continuing the input text.
You need a dataset with instructions and responses. Common formats:
Alpaca Style:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Calculate the area of a circle with a radius of 5 units.
### Response:
The area of a circle is calculated using the formula A = πr². Plugging in the radius of 5 units:
A = π(5)² = π × 25 = 25π square units.
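The Alpaca-style layout above can be produced with a small helper. This is a minimal sketch, not the `format_instruction_data.py` script itself; the dictionary keys `instruction` and `output` and both function names are assumptions chosen for illustration.

```python
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_alpaca_prompt(entry: dict) -> str:
    """Build the prompt part (header + instruction) of an Alpaca-style example.

    Assumes `entry` has an 'instruction' key (hypothetical schema).
    """
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{entry['instruction']}"


def format_alpaca_example(entry: dict) -> str:
    """Build the full training text: prompt followed by the response.

    Assumes `entry` also has an 'output' key holding the desired response.
    """
    return f"{format_alpaca_prompt(entry)}\n\n### Response:\n{entry['output']}"


example = {
    "instruction": "Calculate the area of a circle with a radius of 5 units.",
    "output": "A = π(5)² = 25π square units.",
}
text = format_alpaca_example(example)
```

At inference time only `format_alpaca_prompt` would be used, followed by `### Response:`, so the model generates the response itself.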
Phi-3 Style:
<|user|>
Can you explain what gravity is in simple terms?
<|assistant|>
Absolutely! Gravity is a force that pulls objects toward each other.
Use the format_instruction_data.py script (in scripts/) to convert your raw instruction-response pairs into the desired format. This handles:
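Passing `ignore_index=-100` to `cross_entropy(...)` tells PyTorch to skip every target equal to -100. Those positions contribute nothing to the loss or its gradient, so padding tokens whose target is set to -100 do not influence training.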
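To make the effect of `ignore_index=-100` concrete, here is a dependency-free sketch of the idea. `collate` pads a batch and marks surplus padding targets with -100, and `masked_cross_entropy` mimics in plain Python how the loss skips those positions. Both function names, and the pad id 50256 (GPT-2's end-of-text token), are assumptions for illustration; in a real training loop PyTorch's own `cross_entropy` would do the masking.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross_entropy


def collate(batch, pad_token_id=50256):
    """Pad token-id lists to one length and build next-token targets.

    pad_token_id=50256 is an assumption (GPT-2's end-of-text id).
    """
    max_len = max(len(seq) for seq in batch) + 1  # +1 leaves room for one end token
    inputs, targets = [], []
    for seq in batch:
        padded = seq + [pad_token_id] * (max_len - len(seq))
        inp, tgt = padded[:-1], padded[1:]  # targets are the inputs shifted by one
        keep = len(seq)  # real next-token targets plus the first pad token
        tgt = tgt[:keep] + [IGNORE_INDEX] * (len(tgt) - keep)
        inputs.append(inp)
        targets.append(tgt)
    return inputs, targets


def masked_cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose target is not ignored.

    `logits` is a list of per-position score lists; `targets` is a flat list.
    """
    total, count = 0.0, 0
    for row, tgt in zip(logits, targets):
        if tgt == ignore_index:
            continue  # masked position: no contribution to the loss
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[tgt]
        count += 1
    return total / count if count else float("nan")
```

For the batch `[[0, 1, 2, 3, 4], [5, 6]]`, the targets of the shorter sequence come out as `[6, 50256, -100, -100, -100]`: one end-of-text target is kept so the model learns to stop, and the remaining padding is ignored.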
Load your pre-trained LLM (this was covered in previous training steps). Then use your existing training function to fine-tune.
Watch both training and validation loss. Training loss that keeps falling while validation loss rises is the usual sign of overfitting.
Action: Stop training at the epoch where validation loss starts increasing to avoid overfitting.
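The stopping rule above can be sketched as a scan over per-epoch validation losses. The function name and the `patience` parameter are hypothetical, added here so a single noisy epoch does not have to end training.

```python
def stopping_epoch(val_losses, patience=1):
    """Return the index of the best epoch seen before early stopping triggers.

    Scanning stops once validation loss has failed to improve for
    `patience` consecutive epochs (hypothetical parameter).
    """
    best_loss, best_idx, bad_epochs = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving
    return best_idx


# Validation loss rises at epoch 3, so the checkpoint from epoch 2 is kept.
best = stopping_epoch([1.20, 0.90, 0.80, 0.85, 0.70])
```

With `patience=1` the scan stops at the first increase; a larger value tolerates brief fluctuations at the cost of extra epochs.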
Unlike classification tasks, you can't trust loss alone for instruction tuning. The model might reach a low loss and still produce responses that are incorrect, unhelpful, or badly formatted.
Loss won't catch this. You must evaluate response quality.
1. Manual Review
2. LLM-as-Judge
3. Standardized Benchmarks
| Benchmark | What It Tests | Link |
|-----------|---------------|------|
| MMLU | Knowledge across 57 subjects (humanities, sciences, etc.) | https://arxiv.org/abs/2009.03300 |
| LMSYS Chatbot Arena | Side-by-side chatbot comparison | https://arena.lmsys.org |
| AlpacaEval | GPT-4 evaluates model responses | https://github.com/tatsu-lab/alpaca_eval |
| GLUE | 9 NLU tasks (sentiment, entailment, QA) | https://gluebenchmark.com |
| SuperGLUE | Harder version of GLUE | https://super.gluebenchmark.com |
| BIG-bench | 200+ tasks (reasoning, translation, QA) | https://github.com/google/BIG-bench |
| HELM | Comprehensive evaluation (accuracy, robustness, fairness) | https://crfm.stanford.edu/helm |
| HumanEval | Code generation problems | https://github.com/openai/human-eval |
| SQuAD | Question answering on Wikipedia | https://rajpurkar.github.io/SQuAD-explorer |
| TriviaQA | Trivia questions with evidence | https://nlp.cs.washington.edu/triviaqa |
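The LLM-as-judge approach listed above can be sketched without committing to a particular model or API. In this minimal example the 0 to 100 scale, the prompt wording, the dictionary keys, and the `judge` callable (any function that takes a prompt string and returns the judge model's reply) are all assumptions for illustration.

```python
import re


def build_judge_prompt(instruction, reference, response):
    """Ask the judge model for a single integer score (assumed 0-100 scale)."""
    return (
        f"Given the instruction `{instruction}` "
        f"and the reference answer `{reference}`, "
        f"score the model response `{response}` on a scale from 0 to 100, "
        "where 100 is the best score. Respond with the integer number only."
    )


def parse_score(judge_reply):
    """Extract the first standalone 1-3 digit number; None if absent or out of range."""
    match = re.search(r"\b(\d{1,3})\b", judge_reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 0 <= score <= 100 else None


def average_score(examples, judge):
    """Average the parsed scores; replies without a usable score are skipped.

    `judge` is a hypothetical callable: prompt string in, reply string out.
    Each example is assumed to have 'instruction', 'output', and 'response' keys.
    """
    scores = []
    for ex in examples:
        prompt = build_judge_prompt(ex["instruction"], ex["output"], ex["response"])
        score = parse_score(judge(prompt))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None
```

Keeping the judge behind a plain callable makes it easy to swap a local model for a hosted one, and to unit-test the parsing with a stub.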