skills/concept-synthesis/SKILL.md
Deduplicate and synthesize raw concept stubs into a tiered intellectual map (T1 Canon to T4 Riff), tracing idea evolution across sources over time. Transforms thousands of raw concept pages into a curated intellectual fingerprint.
npx skillsauth add garrytan/gbrain concept-synthesisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Convention: see conventions/quality.md for back-link enforcement and quote-fidelity requirements.
Convention: see _brain-filing-rules.md — output files under
concepts/per the primary-subject rule.
Many ingestion pipelines (signal-detector, idea-ingest, voice-note-ingest) create a concept page for every idea mentioned. Over months this produces:
This skill transforms that raw material into a curated intellectual map.
Phase 1: Dedup + merge (deterministic)
N stubs → ~N/4 canonical concepts
├── Jaccard dedup (word-overlap on titles + first-paragraph)
├── Substring dedup ("founder mode" vs "founder mode vs manager mode")
├── Semantic dedup (LLM: "are these the same idea?")
└── Merge timelines + aliases from duplicates into the canonical page
Phase 2: Score + tier (deterministic + heuristic)
Each canonical concept → scored and tiered
├── Frequency: distinct sources referencing this concept
├── Timespan: first mention → last mention in days
├── Breadth: distinct months it appears in
├── Engagement: avg engagement on concept-bearing sources (if available)
└── Tier: T1 Canon | T2 Developing | T3 Speculative | T4 Riff
Phase 3: Synthesize (LLM, T1+T2 only)
T1 + T2 concepts → rich synthesis
├── Evolution narrative: how the idea sharpened over time
├── Best articulation: highest-engagement or most precise quote
├── Related concepts: cross-links to other concepts
├── Context: what was happening when this idea emerged / evolved
└── Counter-positions: what this idea argues against
Phase 4: Cluster + map (LLM)
All tiered concepts → intellectual clusters
├── Group related concepts into domains (auto-named via LLM)
├── Generate cluster summary pages
├── Build a master concepts/README.md with the full map
└── Identify idea genealogies (concept A → evolved into concept B)
The skill is markdown agent instructions. The agent uses gbrain's existing operations + LLM passes:
# 1. List all concept pages
gbrain query "type:concept" --limit 10000 --json
# 2. Phase 1 dedup — agent applies Jaccard + substring locally,
# then LLM passes to identify semantic duplicates.
# 3. Phase 2 tier — agent scores each canonical concept based on
# frequency / timespan / breadth and writes tier into frontmatter.
# 4. Phase 3 synthesis — for each T1/T2, agent reads the timeline
# + associated source pages and writes a synthesis section
# onto the concept page via put_page.
# 5. Phase 4 clustering — agent reads the tiered concept list
# and writes concepts/README.md with the full intellectual map.
---
title: "concept name"
type: concept
tier: 1
tier_label: "Canon"
mention_count: 18
distinct_months: 8
first_mention: "YYYY-MM-DD"
last_mention: "YYYY-MM-DD"
composite_score: 78.4
aliases: ["alternate phrasing 1", "alternate phrasing 2"]
related: ["sibling-concept-1", "sibling-concept-2"]
---
# concept name
**Tier 1 — Canon** | 18 mentions across 8 months
## Synthesis
[2-4 paragraph narrative tracing how the idea evolved, what it means in
the user's worldview, why it matters. Third-person analytical voice.]
## Best Articulation
> "Verbatim quote from a source — the most precise or highest-engagement
> expression of this idea." — [Date](source-url)
## Evolution
| Period | Expression | Signal |
|--------|-----------|--------|
| YYYY-MM | "First articulation" | First use — aspiration frame |
| YYYY-MM | "Sharpening" | Anti-pattern emerges |
| YYYY-MM | "Peak form" | Cleanest expression |
## Related Concepts
- [sibling concept](sibling-concept.md) — relationship description
- [sibling concept](sibling-concept.md) — relationship description
## Timeline
[Full timeline with deduped entries, quotes, source links]
---
title: "concept name"
type: concept
tier: 4
tier_label: "Riff"
mention_count: 1
---
# concept name
**Tier 4 — Riff** | 1 mention
> "Quote from the source" — [Date](URL)
# Intellectual Universe
## Canon (T1) — N concepts
The permanent intellectual fingerprint. Ideas that recur across years.
### [Cluster Name]
- [concept-slug](concept-slug.md) — one-line characterization
- ...
### [Other Cluster]
- ...
## Developing (T2) — N concepts
Sharpening. Might become canon.
## Speculative (T3) — N concepts
Testing in public.
## Stats
- Total concepts: N
- T1 Canon: N
- T2 Developing: N
- T3 Speculative: N
- T4 Riff: N
- Earliest source: YYYY-MM-DD
- Latest source: YYYY-MM-DD
gbrain query "type:concept" and spot-check the count reduction.This is heavy work. Run on a cadence, not on every signal:
skills/signal-detector/SKILL.md — creates raw concept stubs from text channelsskills/voice-note-ingest/SKILL.md — same for audio channelsskills/idea-ingest/SKILL.md — same for links / articlesThis skill guarantees:
writes_to: (when applicable).quality.md, brain-first.md, _brain-filing-rules.md) are followed.The full behavior contract is documented in the body sections above; this section exists for the conformance test.
The skill's output shape is documented inline in the body sections above (see "Output", "Brain page format", or equivalent). The literal section header here exists for the conformance test (test/skills-conformance.test.ts).
tools
--- name: query-helper triggers: - find a page tools: - search - query writes_pages: false --- # query-helper This skill helps you query the brain. The first prose line becomes the description when no `description:` frontmatter is present.
testing
# broken This SKILL.md has no YAML frontmatter fence. It must still be listed (with the directory name as its name and empty triggers), never throw.
documentation
Read, enrich, and write brain pages with source attribution.
testing
Migrate a brain from gbrain-base (or any pack) to gbrain-base-v2's 14-canonical-type taxonomy via gbrain onboard --check + the unify-types Minion handler. Collapses 94 noisy types to 15 canonical with subtypes, alias rows, and link rows. Triggers when an agent notices pack_upgrade_available, type_proliferation, or asks "what is the canonical taxonomy / how do I clean up my page types".