life-efficient/data-research

Name: data-research
Author: life-efficient

skills/data-research/SKILL.md

npx skillsauth add life-efficient/jarvis data-research

Data Research

Structured research pipeline: search sources, extract structured data, archive raw, deduplicate, update canonical trackers, backlink entities.

Contract

One skill for any email-to-structured-data pipeline. The only differences between tracking investor updates, expenses, and company metrics are the search queries, extraction schemas, and tracker page format. All three use the same 7-phase pipeline with parameterized recipes.

When to Use

User wants to track structured data from email, web, or API sources
User says "research", "track", "extract from email", "build a tracker"
User mentions investor updates, donations, company metrics, filings
User wants to set up recurring data collection (with cron recipe)

Phases

Phase 1: Define Research Recipe

Ask the user what they want to track. Either:

Pick a built-in recipe: investor-updates, expense-tracker, company-updates
Define a custom recipe with: source queries, classification rules, extraction schema, tracker page path, tracker format

Recipes are YAML files at ~/.gbrain/recipes/{name}.yaml. Use gbrain research init to scaffold a new one.
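To make the five parts of a custom recipe concrete, here is a minimal sketch that models them as a Python data structure. The field names, the example query, and the tracker page path are hypothetical illustrations, not the actual GBrain recipe schema; the real YAML layout is whatever gbrain research init scaffolds.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical in-memory shape of a research recipe (not the real schema)."""
    name: str
    source_queries: list[str]              # searches to run against each source
    classification_rules: dict[str, str]   # label -> regex pattern
    extraction_schema: dict[str, str]      # field name -> regex with one capture group
    tracker_page: str                      # brain page that holds the tracker
    tracker_format: list[str] = field(default_factory=list)  # table column order

    def missing_parts(self) -> list[str]:
        """Return the names of recipe parts that are still empty."""
        parts = {
            "source_queries": self.source_queries,
            "classification_rules": self.classification_rules,
            "extraction_schema": self.extraction_schema,
            "tracker_page": self.tracker_page,
            "tracker_format": self.tracker_format,
        }
        return [part for part, value in parts.items() if not value]


# Illustrative values only; none of these come from a shipped recipe.
example = Recipe(
    name="investor-updates",
    source_queries=["subject:(investor update)"],
    classification_rules={"investor_update": r"(?i)investor update"},
    extraction_schema={"mrr": r"MRR[:\s]+\$([\d.,]+[KM]?)"},
    tracker_page="trackers/investor-updates",
)
print(example.missing_parts())
```

A check like missing_parts() is one way to confirm that a custom recipe defines every part before the pipeline runs.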

Phase 2: Search Sources

Search the brain first, since the data may already be there. Then search:

Email via credential gateway: windowed queries (quarterly, monthly if truncated)
Web via search: public filings, press releases, regulatory data
APIs: any structured data source the recipe defines
Attachments: PDF extraction, HTML stripping
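The windowed email queries can be sketched as follows: query one quarter at a time, and split a quarter into months only when its result comes back truncated. The is_truncated callback is a hypothetical stand-in for however the credential gateway reports a truncated result.

```python
from datetime import date


def quarter_windows(year: int) -> list[tuple[date, date]]:
    """Return the four (start, end_exclusive) quarter windows for a year."""
    starts = [date(year, m, 1) for m in (1, 4, 7, 10)] + [date(year + 1, 1, 1)]
    return list(zip(starts[:-1], starts[1:]))


def month_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into month windows; start must be the 1st of a month."""
    windows = []
    cursor = start
    while cursor < end:
        # First day of the following month (rolls over to January after December).
        nxt = date(cursor.year + cursor.month // 12, cursor.month % 12 + 1, 1)
        windows.append((cursor, min(nxt, end)))
        cursor = nxt
    return windows


def plan_windows(year: int, is_truncated) -> list[tuple[date, date]]:
    """Query quarterly; re-split any truncated quarter into monthly windows."""
    plan = []
    for start, end in quarter_windows(year):
        if is_truncated(start, end):  # hypothetical callback, see lead-in
            plan.extend(month_windows(start, end))
        else:
            plan.append((start, end))
    return plan
```

For example, plan_windows(2026, lambda s, e: s.month == 4) keeps Q1, Q3, and Q4 as quarterly windows and replaces Q2 with April, May, and June.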

Phase 3: Classify

Classify deterministically first, using the regex patterns from the recipe, and fall back to the LLM only when no pattern matches. Log every LLM fallback so the regex patterns can be improved later (fail-improve loop). Skip marketing, newsletters, and other noise based on the recipe's classification rules.
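A minimal sketch of regex-first classification with a logged LLM fallback is shown below. The rule labels, the llm_classify callable, and the shape of the log entries are assumptions made for illustration; the recipe format may express these differently.

```python
import re


def classify(text, rules, skip_labels, llm_classify, fallback_log):
    """Return a label for text, or None if the message should be skipped.

    rules maps label -> regex pattern (checked in order).
    llm_classify is a hypothetical callable used only when no pattern matches.
    """
    for label, pattern in rules.items():
        if re.search(pattern, text):
            return None if label in skip_labels else label
    # No deterministic match: fall back to the LLM and record the miss so a
    # new regex can be added for this kind of message later.
    label = llm_classify(text)
    fallback_log.append({"label": label, "sample": text[:200]})
    return None if label in skip_labels else label


rules = {
    "newsletter": r"(?i)unsubscribe",
    "investor_update": r"(?i)investor update",
}
log = []
label = classify(
    "Q1 Investor Update: MRR up",
    rules,
    skip_labels={"newsletter"},
    llm_classify=lambda text: "investor_update",  # stub standing in for a model call
    fallback_log=log,
)
```

Reviewing the fallback log periodically is what closes the fail-improve loop: each logged sample is a candidate for a new deterministic pattern.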

Phase 4: Extract Structured Data

EXTRACTION INTEGRITY RULE:

Save raw source immediately (before any extraction)
Extract fields using deterministic regex first, LLM fallback
When summarizing batch results: re-read from saved files
Never trust LLM working memory after batch processing

This prevents a known hallucination bug in which all 13 of 13 batch-processed amounts recalled from LLM working memory were wrong, while the saved files were correct.
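The rule can be sketched as three steps in code: write the raw source to disk before extracting, extract with a regex, and build any summary by re-reading the saved records. The file layout, the amount pattern, and the function names are assumptions for illustration, not the skill's actual storage API.

```python
import json
import re
from pathlib import Path

# Assumed pattern for dollar amounts such as "$1,200.50".
AMOUNT_RE = re.compile(r"\$\s?([\d,]+(?:\.\d+)?)")


def process_message(msg_id: str, body: str, raw_dir: Path, out_dir: Path) -> None:
    # 1. Save the raw source before any extraction happens.
    (raw_dir / f"{msg_id}.txt").write_text(body, encoding="utf-8")
    # 2. Deterministic regex first; an LLM fallback would handle amount=None.
    match = AMOUNT_RE.search(body)
    amount = float(match.group(1).replace(",", "")) if match else None
    record = {"id": msg_id, "amount": amount, "raw": f"{msg_id}.txt"}
    (out_dir / f"{msg_id}.json").write_text(json.dumps(record), encoding="utf-8")


def summarize(out_dir: Path) -> float:
    # 3. Re-read every saved record from disk; never total from memory.
    records = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(out_dir.glob("*.json"))
    ]
    return sum(r["amount"] for r in records if r["amount"] is not None)
```

Because summarize() takes only a directory, it cannot depend on anything held in memory during the batch.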

Phase 5: Archive Raw Sources

put_raw_data for email bodies, API responses
file_upload for PDF attachments, documents
Create .redirect.yaml pointers for large files in storage
Every tracker entry must link back to its raw source

Phase 6: Deduplicate

Before adding to tracker:

Exact match (same key fields) → skip
Fuzzy match (same entity + date + similar amount within tolerance) → flag for review
Different amount for same entity+date → add with note (could be correction)
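The three deduplication rules above can be sketched as a single decision function. The Entry fields and the 1% relative tolerance are assumptions for illustration; a real recipe would define its own key fields and tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entity: str
    date: str      # ISO date, e.g. "2026-04-01"
    amount: float


def dedupe_decision(new: Entry, existing: list[Entry], tolerance: float = 0.01) -> str:
    """Return 'skip', 'review', 'add_with_note', or 'add' for a candidate entry."""
    same_key = [e for e in existing if e.entity == new.entity and e.date == new.date]
    if any(e.amount == new.amount for e in same_key):
        return "skip"              # exact match on all key fields
    for e in same_key:
        scale = max(abs(e.amount), abs(new.amount))
        if abs(e.amount - new.amount) <= tolerance * scale:
            return "review"        # similar amount within tolerance
    if same_key:
        return "add_with_note"     # different amount, possibly a correction
    return "add"                   # nothing recorded for this entity and date


tracker = [Entry("Example Co", "2026-04-01", 188_000.0)]
decision = dedupe_decision(Entry("Example Co", "2026-04-01", 188_500.0), tracker)
```

Here the candidate differs from the recorded amount by well under 1%, so it is flagged for review rather than added.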

Phase 7: Update Canonical Tracker + Backlink

Parse existing tracker page (markdown table)
Append new entries in correct section (grouped by year/quarter/entity)
Compute running totals
Backlink every mentioned entity (person → people/ page, company → companies/ page)
Use the enrichment service for entity pages

Built-In Recipes

Three example recipes ship with GBrain (see ~/.gbrain/recipes/):

investor-updates — extract MRR, ARR, growth, burn, runway, headcount from investor update emails
expense-tracker — extract amounts, recipients, platforms from receipt emails (subscriptions, services, recurring charges)
company-updates — extract revenue, users, key metrics from portfolio company update emails

Anti-Patterns

Trusting LLM working memory for amounts after batch processing (use extraction integrity rule)
Creating tracker entries without raw source links
Running without deduplication (leads to double-counted entries)
Hardcoding source-specific patterns in the pipeline code (use recipes)

Output Format

The output is a brain page at the recipe's tracker_page path, containing markdown tables:

### 2026

| Date | Company | MRR | ARR | Growth | Source |
|------|---------|-----|-----|--------|--------|
| 2026-04-01 | Example Co | $188K | $2.3M | +14.7% MoM | [Source](link) |

Each entry links to its raw source. Running totals appear at the bottom of each section.
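As a rough sketch of how a tracker section like the one above could be read back and totaled, the following parses the markdown table rows into dictionaries and converts amounts such as $188K into numbers. The helper names and the choice to total the MRR column are assumptions; the skill does not specify which columns receive running totals.

```python
import re


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first markdown table in the text into one dict per data row."""
    rows = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(rows) < 2:
        return []
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    parsed = []
    for line in rows[2:]:  # rows[1] is the |---| separator line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        parsed.append(dict(zip(header, cells)))
    return parsed


def parse_amount(text: str) -> float:
    """Convert a string like '$188K' or '$2.3M' to a number of dollars."""
    match = re.fullmatch(r"\$([\d.,]+)([KM]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    scale = {"": 1, "K": 1_000, "M": 1_000_000}[match.group(2)]
    return float(match.group(1).replace(",", "")) * scale


section = """
### 2026

| Date | Company | MRR |
|------|---------|-----|
| 2026-04-01 | Example Co | $188K |
| 2026-04-02 | Other Co | $12K |
"""
rows = parse_table(section)
running_total = sum(parse_amount(row["MRR"]) for row in rows)
```

Recomputing the total from the parsed page, rather than carrying it forward, keeps the running total consistent with the rows actually present in the tracker.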

Conventions

See skills/conventions/quality.md for citation and back-linking rules.

Structured data research: search sources, extract structured data, archive raw sources, maintain canonical tracker pages, deduplicate. Parameterized via YAML recipes for investor updates, donations, company updates, or any email-to-structured-data pipeline.

testing

Updated Apr 21, 2026

Install

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add life-efficient/jarvis data-research

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percent |
|---------|-------------|--------|---------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 21, 2026, 11:04 PM (139.7s, 1 file scanned)

SKILL.md

name: data-research
version: 1.0.0
description: |
  Structured data research: search sources, extract structured data,
mutating: true

Related Skills

life-efficient/webhook-transforms (development)

Generic framework for converting external events (SMS, meetings, social mentions) into brain-ingestible signals. Define a transform function, register a webhook URL, and incoming events get processed through the brain pipeline.

SKILL.md, updated Apr 21, 2026

life-efficient/testing (development)

Skill validation framework. Validates every skill has SKILL.md with frontmatter, every reference exists, every env var is declared. The testing contract for the skill system itself.

SKILL.md, updated Apr 21, 2026

life-efficient/soul-audit (testing)

6-phase interactive interview that generates the agent's identity (SOUL.md), user profile (USER.md), access control (ACCESS_POLICY.md), and operational cadence (HEARTBEAT.md). Re-runnable anytime to update any section.

SKILL.md, updated Apr 21, 2026

life-efficient/skillpack-check (testing)

Run `gbrain skillpack-check` to produce an agent-readable JSON health report for the gbrain install. Wraps `gbrain doctor` + `gbrain apply-migrations --list` so a host agent (Wintermute's morning-briefing, any OpenClaw cron) can see at a glance whether the skillpack needs attention. Use when the user asks "is gbrain healthy?", when a cron fires a morning check, or proactively when something seems off (jobs not running, brain not updating, autopilot silent).

SKILL.md, updated Apr 21, 2026

Install manually

# Clone the repo
git clone https://github.com/life-efficient/jarvis.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r jarvis/skills/data-research ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: life-efficient/jarvis

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT