skills/data-research/SKILL.md
Structured data research: search sources, extract structured data, archive raw sources, maintain canonical tracker pages, deduplicate. Parameterized via YAML recipes for investor updates, donations, company updates, or any email-to-structured-data pipeline.
`npx skillsauth add garrytan/gbrain data-research`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Structured research pipeline: search sources, extract structured data, archive raw, deduplicate, update canonical trackers, backlink entities.
One skill for any email-to-structured-data pipeline. The only differences between tracking investor updates, expenses, and company metrics are the search queries, extraction schemas, and tracker page format. All three use the same 7-phase pipeline with parameterized recipes.
Ask the user what they want to track. Either:
Recipes are YAML files at `~/.gbrain/recipes/{name}.yaml`. Use `gbrain research init` to scaffold a new one.
Brain first (maybe we already have this data). Then:
Deterministic first (regex patterns from the recipe), with an LLM fallback. Log every LLM fallback so the regexes can be improved later (fail-improve loop). Skip marketing, newsletters, and other noise based on the recipe's classification rules.
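The regex-first, LLM-fallback order described above can be sketched roughly as follows. This is a minimal illustration, not GBrain's implementation: the `Extractor` class, the shape of `patterns`, the `fallback_log` list, and the `fake_llm` stand-in are all hypothetical names invented for this example, and a real recipe may structure its patterns differently.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Extractor:
    # Hypothetical recipe shape: field name -> regex with one capture group.
    patterns: dict[str, str]
    # Stand-in for whatever LLM call is used; takes (field name, text).
    llm_fallback: Callable[[str, str], Optional[str]]
    # Every fallback is recorded so a regex can be written for it later.
    fallback_log: list[dict] = field(default_factory=list)

    def extract(self, text: str) -> dict[str, Optional[str]]:
        result: dict[str, Optional[str]] = {}
        for name, pattern in self.patterns.items():
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                # Deterministic path: the regex matched, no LLM involved.
                result[name] = match.group(1)
                continue
            # Fallback path: ask the LLM and log that the regex missed.
            value = self.llm_fallback(name, text)
            self.fallback_log.append(
                {"field": name, "value": value, "snippet": text[:80]}
            )
            result[name] = value
        return result


def fake_llm(field_name: str, text: str) -> Optional[str]:
    # Placeholder only; a real fallback would call a model here.
    return None


extractor = Extractor(
    patterns={
        "mrr": r"MRR[:\s]+\$?([\d.]+[KM]?)",
        "arr": r"ARR[:\s]+\$?([\d.]+[KM]?)",
    },
    llm_fallback=fake_llm,
)
row = extractor.extract("March update. MRR: $188K, growth strong.")
```

Here `mrr` is resolved by the regex, while `arr` falls through to the stand-in LLM and leaves one entry in `fallback_log`, which is the list to review when tightening the recipe's patterns.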
EXTRACTION INTEGRITY RULE:
This prevents a known hallucination bug in which all 13 of 13 batch-processed amounts, recalled from LLM working memory, were wrong even though the saved files were correct.
- `put_raw_data` for email bodies, API responses
- `file_upload` for PDF attachments, documents
- `.redirect.yaml` pointers for large files in storage

Before adding to tracker:
Three example recipes ship with GBrain (see ~/.gbrain/recipes/):
Brain page at the recipe's tracker_page path with markdown tables:
### 2026
| Date | Company | MRR | ARR | Growth | Status |
|------|---------|-----|-----|--------|--------|
| 2026-04-01 | Example Co | $188K | $2.3M | +14.7% MoM | [Source](link) |
Each entry links to its raw source. Running totals at the bottom of each section.
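As a rough illustration of a tracker section with source links and a running total, the sketch below renders a simplified table. The `Entry` fields, the reduced column set, the `format_usd` helper, and the bold `Total` row are assumptions made for this example; the actual layout comes from the recipe's tracker page format.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    date: str        # ISO date, so string sort is chronological
    company: str
    mrr_usd: int     # amount in whole dollars
    source_url: str  # link back to the archived raw source


def format_usd(amount: int) -> str:
    # Compact display: $2.3M, $188K, or plain dollars.
    if amount >= 1_000_000:
        return f"${amount / 1_000_000:.1f}M"
    if amount >= 1_000:
        return f"${amount / 1_000:.0f}K"
    return f"${amount}"


def render_section(year: int, entries: list[Entry]) -> str:
    lines = [
        f"### {year}",
        "| Date | Company | MRR | Source |",
        "|------|---------|-----|--------|",
    ]
    for e in sorted(entries, key=lambda e: e.date):
        lines.append(
            f"| {e.date} | {e.company} | {format_usd(e.mrr_usd)} "
            f"| [Source]({e.source_url}) |"
        )
    # Running total at the bottom of the section.
    total = sum(e.mrr_usd for e in entries)
    lines.append(f"| **Total** | | {format_usd(total)} | |")
    return "\n".join(lines)


section = render_section(
    2026,
    [
        Entry("2026-04-01", "Example Co", 188_000, "link"),
        Entry("2026-03-01", "Other Co", 42_000, "link2"),
    ],
)
section_lines = section.split("\n")
```

Keeping amounts as integers until render time means the total is computed from the stored values rather than from formatted strings.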
References skills/conventions/quality.md for citation and back-linking rules.