skills/data-research/SKILL.md
Structured data research: search sources, extract structured data, archive raw sources, maintain canonical tracker pages, deduplicate. Parameterized via YAML recipes for investor updates, donations, company updates, or any email-to-structured-data pipeline.
Structured research pipeline: search sources, extract structured data, archive raw sources, deduplicate, update canonical trackers, backlink entities.
One skill for any email-to-structured-data pipeline. The only differences between tracking investor updates, expenses, and company metrics are the search queries, extraction schemas, and tracker page format. All three use the same 7-phase pipeline with parameterized recipes.
Ask the user what they want to track. Either:
Recipes are YAML files at `~/.gbrain/recipes/{name}.yaml`. Use `gbrain research init` to scaffold a new one.
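A recipe only has to carry the parts that differ between pipelines: search queries, extraction patterns, noise filters, and the tracker location. The sketch below models a recipe as a Python dataclass so it stays within the standard library, rather than parsing YAML. Apart from `tracker_page`, which the tracker section mentions, every field name here is a hypothetical stand-in and not the actual GBrain recipe schema.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Recipe:
    """Hypothetical in-memory form of a ~/.gbrain/recipes/{name}.yaml file."""
    name: str
    tracker_page: str                  # brain page the tracker is written to
    search_queries: list[str]          # hypothetical: queries run against sources
    extract_patterns: dict[str, str]   # hypothetical: field name -> regex with one group
    skip_patterns: list[str] = field(default_factory=list)  # hypothetical noise filters

    def __post_init__(self) -> None:
        # Fail fast on a malformed recipe: every regex must compile, and each
        # extraction pattern must capture exactly one group (the field value).
        if not self.search_queries:
            raise ValueError("recipe needs at least one search query")
        for fname, pattern in self.extract_patterns.items():
            if re.compile(pattern).groups != 1:
                raise ValueError(f"pattern for {fname!r} must have exactly one capture group")
        for pattern in self.skip_patterns:
            re.compile(pattern)


investor_updates = Recipe(
    name="investor-updates",
    tracker_page="trackers/investor-updates",  # hypothetical path
    search_queries=["subject:(investor update)"],
    extract_patterns={"mrr": r"MRR[:\s]+\$?([\d.,]+[KMB]?)"},
    skip_patterns=[r"(?i)\bnewsletter\b", r"(?i)\blimited[- ]time offer\b"],
)
```

Validating the recipe when it is loaded means that a bad regex fails once, up front. Otherwise it would fail silently partway through a batch.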
Check the brain first (we may already have this data). Then:
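The brain-first ordering can be sketched as a cache-first lookup. This is a minimal sketch under these assumptions:

- `brain_search` and each entry of `external_sources` are hypothetical callables that take a query and return a list of result dicts.
- External sources are consulted only when the brain has nothing.

```python
from typing import Callable, Iterable

# Hypothetical signature: a search callable takes a query, returns result dicts.
Search = Callable[[str], list[dict]]


def brain_first_search(
    query: str, brain_search: Search, external_sources: Iterable[Search]
) -> tuple[str, list[dict]]:
    """Consult the brain before any external source.

    Returns ("brain", hits) when the brain already has the data, otherwise
    ("external", results) gathered from every external source in order.
    """
    hits = brain_search(query)
    if hits:
        return "brain", hits  # already known: no external calls at all
    results: list[dict] = []
    for source in external_sources:
        results.extend(source(query))
    return "external", results
```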
Extract deterministically first (regex patterns from the recipe), and fall back to the LLM only when those patterns miss. Log every LLM fallback so the regexes can be improved later (the fail-improve loop). Skip marketing, newsletters, and other noise according to the recipe's classification rules.
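The regex-first, LLM-fallback order can be sketched as follows. Assumptions:

- `patterns` maps each field to a regex with one capture group.
- `skip_patterns` stands in for the recipe's classification rules.
- `llm_extract` is a hypothetical hook for the model call.

Each fallback is appended to `fallback_log`, which is the raw material for the fail-improve loop.

```python
import re
from typing import Callable, Optional


def extract_fields(
    text: str,
    patterns: dict[str, str],
    skip_patterns: list[str],
    llm_extract: Callable[[str, str], Optional[str]],
    fallback_log: list[dict],
) -> Optional[dict[str, Optional[str]]]:
    """Regex first, LLM only for fields the regexes miss.

    Returns None for messages the classification rules mark as noise.
    """
    # Classification: drop marketing/newsletter noise before extracting anything.
    if any(re.search(p, text) for p in skip_patterns):
        return None
    record: dict[str, Optional[str]] = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            record[name] = match.group(1)  # deterministic path
        else:
            record[name] = llm_extract(text, name)
            # Log every fallback so a regex can be written for this case later.
            fallback_log.append({"field": name, "pattern": pattern, "snippet": text[:200]})
    return record
```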
EXTRACTION INTEGRITY RULE:
This prevents a known hallucination bug: in one batch, all 13 of 13 amounts taken from LLM working memory were wrong, while the saved files were correct.
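The rule's full text is not reproduced here. The sketch below therefore assumes only what the bug description implies: tracker values should come from the saved per-item files, never from values held in memory across a batch. The one-JSON-file-per-item layout and all function names are hypothetical.

```python
import json
from pathlib import Path


def save_extraction(out_dir: Path, item_id: str, record: dict) -> Path:
    """Write one extracted record to its own JSON file (hypothetical layout)."""
    path = out_dir / f"{item_id}.json"
    path.write_text(json.dumps(record, sort_keys=True))
    return path


def rows_from_saved_files(out_dir: Path) -> list[dict]:
    """Build tracker rows by re-reading saved files, not from values held in memory."""
    return [json.loads(p.read_text()) for p in sorted(out_dir.glob("*.json"))]


def mismatches(in_memory: dict[str, dict], out_dir: Path) -> list[str]:
    """Return ids whose in-memory record disagrees with the saved file."""
    return [
        item_id
        for item_id, record in sorted(in_memory.items())
        if json.loads((out_dir / f"{item_id}.json").read_text()) != record
    ]
```

Under this assumption, any id that `mismatches` returns signals that working memory has drifted. The tracker would then be built from `rows_from_saved_files` instead.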
- `put_raw_data` for email bodies, API responses
- `file_upload` for PDF attachments, documents
- `.redirect.yaml` pointers for large files in storage

Before adding to tracker:
Three example recipes ship with GBrain (see ~/.gbrain/recipes/):
The tracker is a brain page at the recipe's `tracker_page` path, built from markdown tables:
### 2026
| Date | Company | MRR | ARR | Growth | Status |
|------|---------|-----|-----|--------|--------|
| 2026-04-01 | Example Co | $188K | $2.3M | +14.7% MoM | [Source](link) |
Each entry links to its raw source, and each section ends with running totals.
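Rendering one section in this layout can be sketched as below. The sketch makes these assumptions:

- Each row is a dict keyed by column header.
- Each row also has a hypothetical machine-readable amount field, named by `amount_key`, that feeds the total.
- Dates are ISO `YYYY-MM-DD` strings, so they sort lexically.

The usage in the test totals a donation-style amount, since a summed column is where a running total is meaningful.

```python
from decimal import Decimal


def render_section(
    year: str, columns: list[str], rows: list[dict], amount_key: str, total_label: str
) -> str:
    """Render one tracker section: heading, table sorted by Date, total at the bottom."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join("-" * (len(c) + 2) for c in columns) + "|"
    lines = [f"### {year}", header, divider]
    total = Decimal("0")
    for row in sorted(rows, key=lambda r: r["Date"]):
        # amount_key holds a hypothetical numeric value (e.g. 188000), kept
        # separate from the human-formatted display cell (e.g. "$188K").
        total += Decimal(str(row[amount_key]))
        lines.append("| " + " | ".join(str(row[c]) for c in columns) + " |")
    lines.append("")
    lines.append(f"**{total_label}: {total}**")
    return "\n".join(lines)
```

`Decimal` avoids the float rounding drift that would otherwise creep into totals of currency amounts.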
See `skills/conventions/quality.md` for citation and back-linking rules.