examples/book-sft-pipeline/SKILL.md
This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.
A complete system for converting books into SFT datasets and training style-transfer models. This skill teaches the pipeline from raw ePub to a model that writes in any author's voice.
Activate this skill when:
1. **Intelligent Segmentation.** Text chunks must be semantically coherent. Breaking mid-sentence teaches the model to produce fragmented output. Target: 150-400 words per chunk, always at natural boundaries.
2. **Diverse Instruction Generation.** Use multiple prompt templates and system prompts to prevent overfitting. A single prompt style leads to memorization. Use 15+ prompt templates with 5+ system prompts.
3. **Style Over Content.** The goal is learning the author's rhythm and vocabulary patterns, not memorizing plots. Synthetic instructions describe what happens without quoting the text.
```
┌─────────────────────────────────────────────────────────────────┐
│                       ORCHESTRATOR AGENT                        │
│  Coordinates pipeline phases, manages state, handles failures   │
└──────────────────────┬──────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┬───────────────┐
       ▼               ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  EXTRACTION  │ │ SEGMENTATION │ │ INSTRUCTION  │ │   DATASET    │
│    AGENT     │ │    AGENT     │ │    AGENT     │ │   BUILDER    │
│ ePub → Text  │ │ Text → Chunks│ │ Chunks →     │ │ Pairs →      │
│              │ │ 150-400 words│ │ Prompts      │ │ JSONL        │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
                       │
       ┌───────────────┴───────────────┐
       ▼                               ▼
┌──────────────┐                ┌──────────────┐
│   TRAINING   │                │  VALIDATION  │
│    AGENT     │                │    AGENT     │
│ LoRA on      │                │ AI detector  │
│ Tinker       │                │ Originality  │
└──────────────┘                └──────────────┘
```
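Extract from `<p>` tags to preserve paragraph breaks.
```python
# Extract text from ePub paragraphs.
# Assumption: this version uses the ebooklib package; the original sketch
# used epub2-style calls (EPub, book.flow, get_chapter).
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

def extract_epub(path):
    book = epub.read_epub(path)
    chapters = []
    # Document items are the XHTML files that hold the book's chapters
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), 'html.parser')
        paragraphs = [p.get_text().strip() for p in soup.find_all('p')]
        chapters.append('\n\n'.join(p for p in paragraphs if p))
    # Skip documents that contained no paragraph text
    return '\n\n'.join(c for c in chapters if c)
```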
Smaller chunks (150-400 words) produce more training examples and better style transfer than larger chunks (250-650).
```python
def segment(text, min_words=150, max_words=400):
    paragraphs = text.split('\n\n')
    chunks, buffer, buffer_words = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        if buffer_words + words > max_words and buffer_words >= min_words:
            chunks.append('\n\n'.join(buffer))
            # Keep last paragraph for overlap
            buffer = [buffer[-1], para] if buffer else [para]
            buffer_words = sum(len(p.split()) for p in buffer)
        else:
            buffer.append(para)
            buffer_words += words
    if buffer:
        chunks.append('\n\n'.join(buffer))
    return chunks
```
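
Because the segmenter above only splits at paragraph boundaries, a very long paragraph or a short final buffer can still fall outside the 150-400 word target. A minimal sketch of a size check follows; `word_count_report` is a hypothetical helper, not part of the original pipeline, and it only inspects a list of chunk strings.

```python
from statistics import mean

def word_count_report(chunks, min_words=150, max_words=400):
    """Summarize chunk sizes and list the indices outside the target range."""
    counts = [len(c.split()) for c in chunks]
    out_of_range = [i for i, n in enumerate(counts)
                    if n < min_words or n > max_words]
    return {
        "chunks": len(counts),
        "min": min(counts) if counts else 0,
        "max": max(counts) if counts else 0,
        "mean": round(mean(counts), 1) if counts else 0.0,
        "out_of_range": out_of_range,
    }
```

Calling `word_count_report(segment(text))` after segmentation gives a quick view of how many chunks need manual attention before instruction generation.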
For an 86,000-word book:
Using a single prompt template causes memorization. Diverse templates teach the underlying style.
```python
SYSTEM_PROMPTS = [
    "You are an expert creative writer capable of emulating specific literary styles.",
    "You are a literary writer with deep knowledge of classic prose styles.",
    "You are a creative writer skilled at emulating distinctive authorial voices.",
    "You write prose that captures the essence of modernist literature.",
    "You are a talented writer who can channel classic American authors.",
]

PROMPT_TEMPLATES = [
    "Write a passage in the style of {author}: {desc}",
    "Channel {author}'s voice to write about: {desc}",
    "In {author}'s distinctive prose style, describe: {desc}",
    "Write this scene as {author} would have: {desc}",
    "Using {author}'s repetitive technique, describe: {desc}",
    "Capture the rhythm of {author} in this passage: {desc}",
    "Write like {author}: {desc}",
    "In the voice of {author}, write: {desc}",
    "This is a literary exercise. Write like {author}: {desc}",
    "Can you write in {author}'s style? {desc}",
]
```
```python
INSTRUCTION_PROMPT = """Describe what is happening in this excerpt in 2-3 sentences.
Focus on: characters present, actions, emotions, setting.
Do NOT quote the text directly.

Excerpt:
{text}
"""

# Use a fast, cheap LLM (e.g., Gemini Flash).
# llm_call is a placeholder for your own client wrapper (prompt string in, text out).
instruction = llm_call(INSTRUCTION_PROMPT.format(text=chunk))
```
```json
{
  "messages": [
    {"role": "system", "content": "You are an expert creative writer..."},
    {"role": "user", "content": "Write in the style of Author: Scene description..."},
    {"role": "assistant", "content": "The actual book text from chunk..."}
  ]
}
```
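```python
def build_examples(chunk, instruction, author, variants=2):
    # chunk is expected to expose an integer .id and the passage .text
    examples = []
    for i in range(variants):
        # The system index depends only on i, so variants=2 selects
        # just the first two system prompts
        system = SYSTEM_PROMPTS[i % len(SYSTEM_PROMPTS)]
        # Offsetting by chunk.id rotates templates across chunks
        template = PROMPT_TEMPLATES[(chunk.id + i) % len(PROMPT_TEMPLATES)]
        user = template.format(author=author, desc=instruction)
        examples.append({"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": chunk.text}
        ]})
    return examples
```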
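
The dataset builder step (pairs to JSONL) could be sketched as follows. This is an illustration under assumptions, not the original implementation: `write_jsonl` and `read_jsonl` are hypothetical helper names, and the role check simply mirrors the system/user/assistant message layout shown above.

```python
import json

REQUIRED_ROLES = ["system", "user", "assistant"]

def write_jsonl(examples, path):
    """Write chat examples to `path`, one JSON object per line."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            roles = [m["role"] for m in ex["messages"]]
            if roles != REQUIRED_ROLES:
                raise ValueError(f"unexpected roles: {roles}")
            # ensure_ascii=False keeps curly quotes and accents readable
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
            written += 1
    return written

def read_jsonl(path):
    """Load the examples back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Validating the role order at write time surfaces malformed examples before a training run rather than during it.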
```python
CONFIG = {
    "model_name": "Qwen/Qwen3-8B-Base",  # Base, not instruct
    "lora_rank": 32,                     # 352MB adapter
    "learning_rate": 5e-4,               # Higher for LoRA
    "batch_size": 4,
    "epochs": 3,
}
```
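
As a rough sanity check on run length, the number of optimizer steps follows from the batch size and epoch count in the config. The sketch below assumes one optimizer step per batch and uses the 500-1000 examples per book range reported in this document; `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_examples, batch_size=4, epochs=3):
    """Total optimizer steps, assuming one step per batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

for n in (500, 1000):
    print(n, optimizer_steps(n))
# prints:
# 500 375
# 1000 750
```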
Use base (pretrained) models, not instruction-tuned versions:
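```python
import tinker
from tinker import types

async def train(service_client, batches, epochs=3, learning_rate=5e-4):
    # service_client and batches are assumed to be created by the caller;
    # wrapping the loop in a coroutine makes the awaits valid Python.
    training_client = await service_client.create_lora_training_client_async(
        base_model="Qwen/Qwen3-8B-Base",
        rank=32
    )
    for epoch in range(epochs):
        for batch in batches:
            await training_client.forward_backward_async(batch, loss_fn="cross_entropy")
            await training_client.optim_step_async(types.AdamParams(learning_rate=learning_rate))
    result = await training_client.save_weights_for_sampler_async(name="final")
    return result
```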
Test with scenarios that couldn't exist in the original book:
```python
TEST_PROMPTS = [
    "Write about a barista making lattes",
    "Describe lovers communicating through text messages",
    "Write about someone anxious about climate change",
]
```
If the model applies style markers to modern scenarios, it learned style, not content.
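```bash
# Search training data for output phrases (-F treats the phrase as a literal string)
grep -F "specific phrase from output" dataset.jsonl
# Expected: no output (grep exits with status 1 when nothing matches)
```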
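
Checking phrases one at a time with grep can be automated. The sketch below looks for word n-grams shared between a generated output and the assistant messages in the JSONL dataset. It is an assumption-laden illustration: `memorized_ngrams` is a hypothetical helper, the 8-word window is an arbitrary default, and matching is whitespace-based, so punctuation differences will hide some overlaps.

```python
import json

def ngrams(text, n):
    """Set of lowercase n-word sequences in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def memorized_ngrams(output, dataset_path, n=8):
    """Return n-word sequences from `output` found in any assistant message."""
    candidates = ngrams(output, n)
    hits = set()
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            for msg in json.loads(line)["messages"]:
                if msg["role"] == "assistant":
                    hits |= candidates & ngrams(msg["content"], n)
    return sorted(hits)
```

An empty list corresponds to the "no matches" outcome of the grep check; any returned sequence points at a passage the model may have copied.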
Test outputs with GPTZero, Pangram, or ZeroGPT.
Symptom: Model uses original character names in new scenarios. Cause: Limited name diversity from one book. Solution: Train on multiple books or add synthetic examples.
Symptom: Outputs contain exact sentences from training data. Cause: Too few prompt variations or too many epochs. Solution: Use 15+ templates, limit to 3 epochs.
Symptom: Sentences feel incomplete. Cause: Poor segmentation breaking mid-thought. Solution: Always break at paragraph boundaries.

| Metric | Value |
|--------|-------|
| Training examples | 500-1000 per book |
| Model | Qwen/Qwen3-8B-Base |
| LoRA rank | 32 |
| Adapter size | ~350 MB |
| Training time | ~15 min |
| Loss reduction | 90%+ |
| Style transfer success | ~50% perfect |

| Component | Cost |
|-----------|------|
| LLM (instruction generation) | ~$0.50 |
| Tinker training (15 min) | ~$1.50 |
| Total | ~$2.00 |

This example applies several skills from the Agent Skills for Context Engineering collection:
The pipeline follows the staged, idempotent architecture pattern:
Each phase is resumable and produces intermediate artifacts for debugging.
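
One way to make a phase resumable is to persist its output as an artifact and skip the work when that artifact already exists. The sketch below is a minimal illustration under assumptions: `run_stage`, the JSON artifact format, and the `<name>.json` naming are hypothetical choices, not the original pipeline's layout.

```python
import json
from pathlib import Path

def run_stage(name, fn, inputs, workdir):
    """Run `fn(inputs)` once; later calls reuse the saved artifact."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    artifact = workdir / f"{name}.json"
    if artifact.exists():
        # Stage already completed: load the intermediate result
        return json.loads(artifact.read_text(encoding="utf-8"))
    result = fn(inputs)
    # Write to a temporary file, then rename, so an interrupted write
    # does not leave a partial artifact under the final name
    tmp = workdir / f"{name}.json.tmp"
    tmp.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    tmp.replace(artifact)
    return result
```

Chaining calls such as `run_stage("extract", ...)` followed by `run_stage("segment", ...)` means a failed run restarts at the first stage whose artifact is missing, and the saved files double as debugging output.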
Segmentation is a form of context compression for training. The core insight from context-compression applies: information density matters more than information quantity. Smaller, coherent chunks (150-400 words) produce better style transfer than larger, diluted chunks.
The two-tier strategy mirrors context compression evaluation:
The pipeline uses the supervisor/orchestrator pattern:
This matches the principle that sub-agents exist primarily to isolate context rather than simulate roles.
Validation follows the end-state evaluation pattern:
The "modern scenario" test is a form of out-of-distribution evaluation that proves generalization.
Prompt diversity prevents attention collapse on single patterns. When training with identical prompt structures, the model memorizes the instruction-response mapping. Diverse templates force attention across the style patterns themselves.
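
Diversity can be measured by counting which (system prompt, template) pairs a rotation scheme actually emits. The sketch below assumes the index rotation used in `build_examples` (system index `i`, template index `chunk_id + i`) with 5 system prompts and 10 templates; `pair_coverage` and the figure of 300 chunks are hypothetical.

```python
def pair_coverage(num_chunks, variants, n_system, n_templates):
    """Count distinct (system, template) index pairs emitted by the rotation."""
    pairs = set()
    for chunk_id in range(num_chunks):
        for i in range(variants):
            system_idx = i % n_system
            template_idx = (chunk_id + i) % n_templates
            pairs.add((system_idx, template_idx))
    return len(pairs), n_system * n_templates

used, possible = pair_coverage(num_chunks=300, variants=2, n_system=5, n_templates=10)
print(f"{used} of {possible} pairs used")  # prints "20 of 50 pairs used"
```

Under these assumptions only the first two system prompts are ever selected, because the system index depends only on `i`; in this sketch, setting `variants=5` reaches all 50 pairs.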
- Created: 2025-12-26
- Last Updated: 2025-12-28
- Author: Muratcan Koylan
- Version: 2.0.0
- Standalone: Yes (separate from main context-engineering collection)