skills/persona-knowledge/SKILL.md
Persistent, incremental, searchable persona knowledge base. Ingests data from Obsidian vaults, chat exports, X/Twitter archives, and more into a MemPalace-backed store with a Karpathy LLM Wiki knowledge layer. Exports training/ directories for persona-model-trainer.
npx skillsauth add acnlabs/openpersona persona-knowledge
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Persistent, incremental, searchable persona knowledge base — the data layer between raw sources and persona training.
Architecture: MemPalace (storage + search) + Knowledge Graph (relationships + timeline) + Karpathy LLM Wiki (knowledge accumulation)
Dependency chain: data sources → persona-knowledge → anyone-skill / persona-model-trainer
Trigger phrases:
Not suitable when:
Create a new persona dataset:
python scripts/init_knowledge.py --slug {slug} --name "Display Name"
This creates ~/.openpersona/knowledge/{slug}/ with:
~/.openpersona/knowledge/{slug}/
    dataset.json              # metadata: slug, name, created_at, stats
    .mempalace/               # MemPalace local data (per-dataset isolation via palace_path)
        palace/               # MemPalace internal store (ChromaDB + KG)
    sources/                  # immutable source file backups
        .source-index.json    # per-file metadata: hash, import time, line count, PII flags
    wiki/                     # Karpathy wiki (LLM-maintained derived artifact)
        _schema.md            # wiki maintenance rules
        identity.md
        voice.md
        values.md
        thinking.md
        relationships.md      # generated from Knowledge Graph
        timeline.md           # generated from Knowledge Graph
        _contradictions.md
        _changelog.md
        _evidence.md
MemPalace palace structure:
- hall_facts: Identity (background, career, education)
- hall_events: Memory (key events, turning points)
- hall_preferences: Personality (values, preferences, boundaries)
- hall_discoveries: Procedure (mental models, decision heuristics)
- hall_voice: Interaction (vocabulary, rhythm, humor, emotional temperature)

Gate: Confirm slug and display name with the user before proceeding.
Import data sources into the dataset. Can be called multiple times for incremental ingestion.
python scripts/ingest.py --slug {slug} --source <path> [--adapter <name>] [--since <date>]
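Incremental ingestion implies that re-running the command on an unchanged file should do nothing. A minimal sketch of how such a check could work, assuming `.source-index.json` maps each file name to a record with a `hash` field (the real index layout may differ, and `file_sha256` and `needs_ingest` are hypothetical helpers, not part of scripts/ingest.py):

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_ingest(source: Path, index_path: Path) -> bool:
    """True if `source` is new or has changed since it was last indexed.

    Assumed (hypothetical) index layout: {"<file name>": {"hash": "<sha256>"}}
    """
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    entry = index.get(source.name)
    return entry is None or entry.get("hash") != file_sha256(source)
```

Keying on the content hash rather than the modification time means a file that was copied or touched but not edited is still skipped.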
Three adapters cover all supported formats:
| Source | Adapter | Detection |
|--------|---------|-----------|
| Obsidian vault | universal | Directory containing .obsidian/ or *.md files |
| GBrain export dir | universal | Markdown directory with .raw/ sidecar dirs |
| .md / .txt / .csv / .pdf | universal | File extension |
| .jsonl / .json | universal | File extension |
| GBrain JSON export | universal | .json with memories key or --entity flag |
| WhatsApp .txt export | chat_export | Matches WhatsApp timestamp pattern |
| Telegram result.json | chat_export | JSON with chats key |
| Signal export | chat_export | JSON with Signal message format |
| iMessage .db | chat_export | SQLite with message + handle tables |
| X (Twitter) archive | social | Directory containing data/tweets.js |
| Instagram archive | social | Directory containing content/posts_1.json |
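The detection column above can be read as a small dispatch function. The following is only a sketch of that idea, not the logic in scripts/ingest.py: `detect_adapter` is a hypothetical name, and the content-based checks for WhatsApp, Signal, and iMessage are deliberately omitted, so those sources fall through to `universal` here.

```python
import json
from pathlib import Path


def detect_adapter(source: Path) -> str:
    """Guess an adapter name from cheap structural checks in the table.

    WhatsApp, Signal, and iMessage detection need content inspection
    and are not covered by this sketch.
    """
    if source.is_dir():
        if (source / "data" / "tweets.js").exists():
            return "social"  # X (Twitter) archive
        if (source / "content" / "posts_1.json").exists():
            return "social"  # Instagram archive
        return "universal"  # Obsidian vault, GBrain export dir, Markdown dir
    if source.suffix == ".json":
        try:
            data = json.loads(source.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return "universal"
        if isinstance(data, dict) and "chats" in data:
            return "chat_export"  # Telegram result.json
    return "universal"
```

When detection guesses wrong, the `--adapter <name>` flag shown in the ingest command overrides it.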
Ingest pipeline (per source):
[{role, content, timestamp, source_file, source_type}]

After each source is ingested, report:
✅ whatsapp-2024.txt → 1,247 messages (892 assistant turns)
PII: none detected
KG: +3 entities, +7 relationships
→ sources/whatsapp-2024.jsonl
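To make the record shape and the first report line concrete, here is a small sketch that writes normalized records to JSONL and builds the message summary. It assumes each record is a dict with the five keys listed above and that the persona's own messages carry `role == "assistant"`; `write_records` is a hypothetical helper, not the script's actual function.

```python
import json
from pathlib import Path


def write_records(records: list[dict], out_path: Path) -> str:
    """Write normalized records as JSONL and return a one-line summary."""
    with out_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    # Assumption: the persona's own turns are the ones tagged "assistant".
    assistant = sum(1 for rec in records if rec["role"] == "assistant")
    return f"{len(records):,} messages ({assistant:,} assistant turns)"
```

One JSON object per line keeps the file appendable and lets downstream tools stream it without loading everything into memory.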
After ingesting new data, the agent reads MemPalace content and Knowledge Graph relationships, then builds or updates the wiki pages following the Karpathy LLM Wiki pattern.
This phase is driven by agent intelligence (SKILL.md instructions), not by automated scripts. The LLM decides which pages to update, how to phrase entries, and how to tag evidence.
- [L1:source]: direct quote, traceable
- [L2]: reported/paraphrased, verifiable
- [L3:inferred]: reasonably inferred from multiple signals
- [L4:inspired]: impression-based
- [[page]] wikilink syntax
- _contradictions.md with both sides cited
- _changelog.md
- _evidence.md

When the user asks a question about the persona:
Run periodically or before export:
python scripts/lint_wiki.py --slug {slug}
Checks:
- [[links]] (referenced page doesn't exist)
- references/wiki-schema.md for full spec)

Each page follows this template:
# {Page Title}
> One-sentence summary of this page's scope.
## Content
{Structured content with [L?:source] evidence tags and [[backlinks]]}
## Sources
- {source_file}: {what was extracted} [L?]
## See also
- [[related_page]]
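Two properties of this template lend themselves to mechanical checks: every `[[link]]` should resolve to a page, and content should carry evidence tags. The sketch below illustrates both under simple assumptions (flat wiki directory, page name equals file stem). `broken_links` and `evidence_levels` are hypothetical helpers and are not taken from scripts/lint_wiki.py.

```python
import re
from collections import Counter
from pathlib import Path

# Capture the link target, stopping at "]", "|" (alias) or "#" (heading).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Match tags such as [L1:source] or [L2]; capture only the level.
EVIDENCE_TAG = re.compile(r"\[(L[1-4])(?::[a-z]+)?\]")


def broken_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each page file to the [[targets]] that have no matching .md file."""
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    report: dict[str, list[str]] = {}
    for page in sorted(wiki_dir.glob("*.md")):
        targets = WIKILINK.findall(page.read_text(encoding="utf-8"))
        missing = sorted({t.strip() for t in targets} - pages)
        if missing:
            report[page.name] = missing
    return report


def evidence_levels(text: str) -> Counter:
    """Count evidence tags in a page body by level (L1 to L4)."""
    return Counter(EVIDENCE_TAG.findall(text))
```

A page whose counts are dominated by L3 and L4 tags is a natural candidate for review before export, since little of it is directly traceable to a source.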
relationships.md and timeline.md are generated from the Knowledge Graph, not written freehand:
from mempalace.knowledge_graph import KnowledgeGraph
kg = KnowledgeGraph(palace_path)
kg.timeline(slug) # → chronological event list for timeline.md
kg.query_entity(slug) # → current relationships for relationships.md
After generating, the agent may annotate with evidence tags and additional context.
Generate a training/ directory compatible with persona-model-trainer:
python scripts/export_training.py --slug {slug} --output training/
Each export is automatically versioned (v1, v2, …). Override with --version:
python scripts/export_training.py --slug {slug} --output training/ --version v3
List export history:
python scripts/export_training.py --slug {slug} --list
# v1 2026-04-01 10:00 142 turns sha256:a3f9c2d1 3 sources
# v2 2026-04-10 14:22 198 turns sha256:c7d2e1f3 4 sources
Output:
training/
raw/ # copied from sources/ (authentic voice, unmodified)
conversations.jsonl # generated from wiki pages (structured Q-A pairs)
profile.md # summarized from wiki identity/voice/values
metadata.json # slug, source count, turn count, export version + hash
How each file is built:
- training/raw/: direct copy of sources/*.jsonl and sources/*.txt files
- training/conversations.jsonl: the agent reads wiki pages and generates distilled user/assistant turn pairs representing the persona's voice, knowledge, and values
- training/profile.md: 300-500 word character sheet derived from identity.md, voice.md, values.md
- training/metadata.json: slug, name, subject_type, created_at, source_count, total_words, distilled_turns, raw_files + versioning fields:
  - export_version: version tag (e.g. "v2")
  - export_hash: SHA-256 of conversations.jsonl (e.g. "sha256:c7d2e1f3...")
  - source_snapshot: {filename: sha256_hash} dict of all source files at export time

Export history is appended to dataset.json → export_history[] after each run.
Downstream traceability: persona-model-trainer's pipeline.sh reads export_version and export_hash from metadata.json and injects them as dataset_version / dataset_export_hash into training_summary.json, forming a complete provenance chain from source data to trained model adapter.
This output is directly consumable by persona-model-trainer's prepare_data.py — no changes needed downstream.
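The three versioning fields in metadata.json can be illustrated with a short sketch. It assumes that each `export_history[]` entry stores its tag under `export_version`, that snapshot hashes are bare hex digests, and that dotfiles such as `.source-index.json` are left out of the snapshot. None of these details is confirmed by the text above, and the function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def next_version(history: list[dict]) -> str:
    """Return the tag after the highest vN found in the export history."""
    highest = 0
    for entry in history:
        match = re.fullmatch(r"v(\d+)", str(entry.get("export_version", "")))
        if match:
            highest = max(highest, int(match.group(1)))
    return f"v{highest + 1}"


def versioning_fields(training_dir: Path, sources_dir: Path, history: list[dict]) -> dict:
    """Build export_version, export_hash, and source_snapshot."""
    return {
        "export_version": next_version(history),
        "export_hash": "sha256:" + sha256_file(training_dir / "conversations.jsonl"),
        "source_snapshot": {
            p.name: sha256_file(p)
            for p in sorted(sources_dir.iterdir())
            if p.is_file() and not p.name.startswith(".")
        },
    }
```

Comparing two exports' `source_snapshot` dicts shows exactly which source files were added or changed between versions.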
→ Next step — train a local persona model:
# --method options: mlx, unsloth (NVIDIA GPU), or colab (no GPU)
bash skills/persona-model-trainer/scripts/pipeline.sh \
  --slug {slug} \
  --model google/gemma-4-E4B-it \
  --source ./training \
  --method mlx \
  --preset gemma4 \
  --probes ./training/probes.json
Full guide:
persona-model-trainer/references/pipeline-guide.md
Query the dataset using MemPalace's semantic search and Knowledge Graph:
# Semantic search across all stored memories
mempalace search "how does this person handle conflict" --wing {slug}
# Knowledge Graph: look up an entity's relationships
python scripts/query_kg.py --slug {slug} --entity "Tom"
# Knowledge Graph: shortest path between two entities
python scripts/query_kg.py --slug {slug} --path "Tom" "Alice"
# Knowledge Graph: overall statistics
python scripts/query_kg.py --slug {slug} --stats
# Knowledge Graph: JSON output (for programmatic use)
python scripts/query_kg.py --slug {slug} --entity "Tom" --json
# Wake-up summary (~170 tokens)
mempalace wake-up --wing {slug}
The agent can also search programmatically during wiki build or distillation:
from mempalace.searcher import search_memories
results = search_memories("vocabulary patterns", palace_path="~/.openpersona/knowledge/{slug}/.mempalace/palace")
Ongoing dataset management:
- sources/ + re-index → run wiki lint to flag orphaned content
- python scripts/lint_wiki.py --slug {slug}  # health check
- python scripts/init_knowledge.py --slug {slug} --stats  # show current stats
- ls ~/.openpersona/knowledge/  # all available datasets

| Tool | Purpose |
|------|---------|
| Bash | Run init, ingest, export, lint scripts; MemPalace CLI commands |
| Read | Load source files, wiki pages, dataset.json |
| Write | Update wiki pages, write training exports |
| WebSearch | Fetch public figure data for ingestion |

| Script | Purpose |
|--------|---------|
| scripts/init_knowledge.py | Initialize knowledge directory + MemPalace wing + KG |
| scripts/ingest.py | Unified ingestion: adapter dispatch + PII scan + dedup + MemPalace + KG |
| scripts/export_training.py | Export sources/ + wiki → training/ directory |
| scripts/lint_wiki.py | Wiki health check: broken links, contradictions, coverage gaps |
| scripts/query_kg.py | Knowledge Graph query: entity lookup, shortest path, statistics |

| Adapter | Sources | Format |
|---------|---------|--------|
| universal | Obsidian vault, GBrain export, .md, .txt, .csv, .pdf, .jsonl, .json | All pure file reading |
| chat_export | WhatsApp / Telegram / Signal / iMessage | .txt / JSON / SQLite (special parsing) |
| social | X (Twitter) / Instagram archive | JS wrapper stripping + archive dirs |

- references/wiki-schema.md: Karpathy wiki structure specification and maintenance rules
- references/source-formats.md: supported data source formats and adapter details