codex-plugin/skills/hint-corpus/SKILL.md
Build, convert, and fine-tune the Qwen3-0.6B hint model for personal fact extraction. Covers corpus generation, ChatML conversion, LoRA fine-tuning with unsloth, GGUF export, and Ollama registration.
npx skillsauth add genomewalker/cc-soul hint-corpusInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Full pipeline to produce chitta-hint-tuned (Qwen3-0.6B Q4_K_M) from scratch.
# 1. Generate corpus (requires Ollama + gemma4:26b or any capable model)
python3 $PLUGIN_DIR/scripts/generate_hint_corpus.py \
--out /maps/projects/caeg/scratch/kbd606/tmp/hint_corpus_raw.jsonl \
--model gemma4:26b \
--target 3000
# 2. Convert to Qwen3 ChatML for unsloth
python3 $PLUGIN_DIR/scripts/convert_to_chatml.py \
--in /maps/projects/caeg/scratch/kbd606/tmp/hint_corpus_raw.jsonl \
--out /maps/projects/caeg/scratch/kbd606/tmp/hint_corpus_chatml.jsonl \
--split 0.1
# 3. Fine-tune Qwen3-0.6B + export GGUF
bash $PLUGIN_DIR/scripts/finetune_hint_qwen.sh \
--data /maps/projects/caeg/scratch/kbd606/tmp/hint_corpus_chatml.jsonl \
--steps 300
# 4. Register with Ollama
bash $PLUGIN_DIR/chitta-mcp/enrichers/setup_hint_model.sh
Where $PLUGIN_DIR = /maps/projects/fernandezguerra/apps/repos/cc-soul (or installed plugin path).
generate_hint_corpus.py builds diverse synthetic conversation excerpts and labels them via a teacher LLM. It covers:
| Axis | Examples | |------|---------| | Profession | bioinformatician, nurse, teacher, architect, chef... | | Location | city, country, living situation | | Language background | native/non-native/bilingual | | Relationships | partner, children, pets | | Health | dietary restrictions, exercise habits | | Hobbies | sports, arts, gaming, gardening... | | Preferences | dark mode, editors, morning/evening person | | Education | PhD, self-taught, vocational |
35% hard negatives (questions, debugging requests, factual queries — output: -).
Key flags:
--target N # examples to generate (default: 1500; recommend 3000)
--model MODEL # teacher model (default: llama3.3:70b; gemma4:26b works well)
--neg-ratio 0.35 # fraction of negatives
--dry-run # preview templates, no LLM calls
Expected runtime: ~2h for 3000 examples with gemma4:26b on a single GPU node.
convert_to_chatml.py wraps each {"input", "output"} row in a ShareGPT conversation with the system prompt baked in.
System prompt (fixed, version-controlled):
You extract personal facts from conversation excerpts. Given a message or conversation, output a single concise third-person sentence about the user (e.g. "User lives in Copenhagen.", "User has two cats."). If no stable personal fact is present, output exactly: -
--split 0.1 writes a 10% eval holdout to <out>_eval.jsonl.
finetune_hint_qwen.sh runs QLoRA via unsloth:
| Hyperparameter | Default |
|----------------|---------|
| Base model | Qwen/Qwen3-0.6B |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Max steps | 200 |
| Batch size | 4 × grad_accum 4 = 16 effective |
| Learning rate | 2e-4 |
| Quantisation | 4-bit QLoRA (bitsandbytes) |
Requirements:
pip install "unsloth[colab-new]" xformers trl peft accelerate bitsandbytes
GPU note: Qwen3-0.6B fits in ~4 GB VRAM at 4-bit. CPU training is possible but slow (~30 min/100 steps).
After training, the script:
$OUT_DIR)convert_hf_to_gguf.py (needs llama.cpp)llama-quantize (~480 MB)Override paths via environment:
CHITTA_HINT_DATA=/path/to/corpus.jsonl
CHITTA_HINT_MODEL_DIR=/path/to/merged_output
CHITTA_HINT_GGUF_DIR=/path/to/gguf_output
LLAMA_CONVERT=/path/to/llama.cpp/convert_hf_to_gguf.py
LLAMA_QUANTIZE=/path/to/llama-quantize
setup_hint_model.sh registers the Q4_K_M GGUF with Ollama as chitta-hint-tuned.
It checks $CHITTA_HINT_GGUF_DIR for the GGUF, falls back to F16, then safetensors.
After registration, test with:
chitta hint_enrich --dry-run
# or via MCP:
chitta run_hint_enricher --dry_run true --limit 10
After registration, run the embedding benchmark:
python3 /maps/projects/caeg/scratch/kbd606/tmp/test_embeddings.py
Target metrics vs Qwen2.5-0.5B baseline: | Metric | Baseline | Target | |--------|----------|--------| | Personal↔Personal cosine | 0.76 | >0.85 | | Separation ratio (pp−pn) | 0.28 | >0.40 | | NN accuracy | 5/8 | 7/8+ |
Qwen3-0.6B shares its architecture with Qwen3-Embedding-0.6B (MTEB STS 86.57) — use --pooling last and L2-normalize embeddings.
<|endoftext|> as final token for embedding mode./hint-corpus again after accumulating new session data. Use --target 5000 if separation metrics plateau at 3k.development
Build, convert, and fine-tune the Qwen3-0.6B hint model for personal fact extraction. Covers corpus generation, ChatML conversion, LoRA fine-tuning with unsloth, GGUF export, and Ollama registration.
tools
Browse and resume tasks, threads, and background jobs across sessions
tools
Resume a thread by loading its ~800-token context capsule
tools
Browse and resume tasks, threads, and background jobs across sessions