skills/research/news-collection/SKILL.md
Collect, filter, and freshness-qualify news items.
npx skillsauth add notque/claude-code-toolkit news-collection
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Gather news items on a topic, filter junk cheaply, verify freshness, and emit qualified items under an evidence contract. Pipeline-shaped: four phases, a gate between each. Content pipelines consume the JSON artifact; pair with fact-check to verify what this skill qualifies.
Goal: gather candidate items from available sources (feeds, search results, provided fixtures). Every item carries the five-fact evidence contract: title, url, outlet, author, published_at.
Rule (verbatim from the design): publish times are extracted (article
metadata), never guessed; missing timestamp → recorded as unknown, confidence
lowered and disclosed. A guessed timestamp poisons every downstream freshness
verdict; an honest "published_at": null keeps the item usable with known
uncertainty.
Record each item as one JSON object per the schema in
references/evidence-contract.md. Fill evidence_notes with where each fact
came from (meta tag, byline, JSON-LD, sitemap).
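The recording rule above can be sketched as follows. This is a minimal illustration, not the schema from references/evidence-contract.md: the helper name `record_item`, the default confidence value `"high"`, and the wording of the disclosure note are assumptions made for the example.

```python
import json
from typing import Dict, Optional

# The five facts named by the evidence contract.
FIVE_FACTS = ("title", "url", "outlet", "author", "published_at")


def record_item(facts: Dict[str, Optional[str]], evidence_notes: Dict[str, str]) -> dict:
    """Build one item record; published_at is passed through as extracted, or None."""
    item = {name: facts.get(name) for name in FIVE_FACTS}
    item["evidence_notes"] = dict(evidence_notes)
    # Assumption: "high" as the default level; only "low" is named in this document.
    item["confidence"] = "high"
    if item["published_at"] is None:
        # Never guess a timestamp: keep null, lower confidence, disclose.
        item["confidence"] = "low"
        item["evidence_notes"]["published_at"] = "unknown: no timestamp found in article metadata"
    return item


item = record_item(
    {"title": "Example headline", "url": "https://example.com/story",
     "outlet": "Example Outlet", "author": "A. Reporter", "published_at": None},
    {"title": "og:title meta tag", "author": "byline"},
)
print(json.dumps(item, indent=2))
```

The missing timestamp serializes as an explicit `"published_at": null`, so downstream phases can see the uncertainty instead of inheriting a guess.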
Distinguish outcomes: zero items because sources were unreachable is a collection failure (report it, stop); zero items from reachable sources is a valid empty feed (deliver an empty artifact with counts of zero).
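One way to keep the two zero-item outcomes apart is sketched below. The names `collect_outcome` and `CollectionFailure` are hypothetical, and the treatment of partial reachability (at least one reachable source counts as a valid run) is an assumption the text does not settle.

```python
from typing import Dict, List


class CollectionFailure(RuntimeError):
    """Raised when no source could be reached, so an empty result proves nothing."""


def collect_outcome(reachable: Dict[str, bool], items: List[dict]) -> str:
    """Classify a collection run from per-source reachability and the items found."""
    if items:
        return "collected"
    if not any(reachable.values()):
        # Zero items because nothing answered: report and stop.
        raise CollectionFailure("no source reachable: " + ", ".join(sorted(reachable)))
    # Zero items from reachable sources: deliver an empty artifact with zero counts.
    return "empty_feed"
```

For example, `collect_outcome({"feed-a": True}, [])` returns `"empty_feed"`, while `collect_outcome({"feed-a": False}, [])` raises `CollectionFailure`.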
Gate: every collected item has all five fields present — value or explicit
null with a confidence downgrade and a disclosure note. Items with silent
gaps stay in COLLECT until the gap is recorded.
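The COLLECT gate could be checked mechanically along these lines. This is a sketch under assumptions: `collect_gate_gaps` is a hypothetical helper, and it treats `confidence == "low"` plus an `evidence_notes` entry for the field as the downgrade and disclosure the gate asks for.

```python
from typing import List

FIVE_FACTS = ("title", "url", "outlet", "author", "published_at")


def collect_gate_gaps(item: dict) -> List[str]:
    """Return the silent gaps in one item; an empty list means the item passes."""
    gaps = []
    notes = item.get("evidence_notes", {})
    for field in FIVE_FACTS:
        if field not in item:
            gaps.append(f"{field}: field absent")
        elif item[field] is None:
            # An explicit null is allowed only with a downgrade and a disclosure.
            if item.get("confidence") != "low":
                gaps.append(f"{field}: null without confidence downgrade")
            if field not in notes:
                gaps.append(f"{field}: null without disclosure note")
    return gaps
```

An item that returns a non-empty list stays in COLLECT until each listed gap is recorded.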
Goal: cheap, high-recall pass over collected items. Three verdicts:
keep / monitor_only / reject, each with a reason code from
references/coarse-filter.md.
Rule (verbatim from the design): high-magnitude stories are downgraded to monitor_only at most, never rejected — a false keep is cheap, a silent drop is expensive. A keep costs one extra freshness check; a wrongly dropped major story costs the whole pipeline its value.
This phase runs on a cheap model when dispatched — it needs recall, not
judgment depth. See the dispatch note in references/coarse-filter.md.
Gate: every item has exactly one verdict and one reason code. reject
verdicts on items that look high-magnitude get re-checked once before the
phase closes.
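The magnitude rule and the gate above can be sketched as a single closing check. The function name, the `looks_high_magnitude` flag, and the reason codes `HIGH_MAGNITUDE_FLOOR` and `OFF_TOPIC` are hypothetical; the real reason codes live in references/coarse-filter.md.

```python
from typing import Tuple

VERDICTS = ("keep", "monitor_only", "reject")


def close_filter_verdict(verdict: str, reason_code: str,
                         looks_high_magnitude: bool) -> Tuple[str, str]:
    """Validate one verdict and apply the floor for high-magnitude items."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if not reason_code:
        raise ValueError("every verdict needs exactly one reason code")
    if verdict == "reject" and looks_high_magnitude:
        # High-magnitude stories are downgraded to monitor_only at most.
        # HIGH_MAGNITUDE_FLOOR is a placeholder code, not one from the reference file.
        return "monitor_only", "HIGH_MAGNITUDE_FLOOR"
    return verdict, reason_code
```

Ordinary rejects pass through unchanged, so the floor costs nothing for low-stakes items and only adds one freshness check for the rest.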
Goal: for each keep and monitor_only item, establish when the story
first became public and whether this page is the original coverage. Methods are
described in references/freshness-forensics.md.
Rule (verbatim from the design): two independent sources or verdict "unclear". Conservative default: unclear over guessed. An "unclear" verdict is recoverable downstream; a confidently wrong "fresh" verdict ships stale news.
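A minimal sketch of the two-source rule follows. It assumes each backing source is identified by a string and that distinct identifiers are independent, which is a simplification (syndicated copies of one wire story would not qualify). The function name and the `source_count` field name are hypothetical.

```python
from typing import List, Optional


def freshness_verdict(proposed: str, first_public_estimate: Optional[str],
                      backing_sources: List[str]) -> dict:
    """Accept a fresh/stale verdict only when two or more distinct sources back it."""
    distinct = sorted(set(backing_sources))
    if proposed in ("fresh", "stale") and len(distinct) >= 2:
        freshness = proposed
    else:
        # Conservative default: unclear over guessed.
        freshness = "unclear"
    return {
        "freshness": freshness,
        "first_public_estimate": first_public_estimate,
        "source_count": len(distinct),
    }
```

A single source, or the same source listed twice, yields `unclear` regardless of how confident the proposed verdict looked.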
Gate: every surviving item carries freshness: fresh | stale | unclear, a
first_public_estimate (or null), and the count of sources backing the
verdict. Duplicate clusters are consolidated to one canonical item with
duplicates_of links.
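Duplicate consolidation might look like the sketch below. Several things here are assumptions: the `cluster_id` field is hypothetical, `duplicates_of` is assumed to hold the canonical item's URL on each duplicate (and `None` on the canonical item), and the canonical pick is simply the earliest known `published_at`, compared as ISO 8601 UTC strings. The actual canonical-pick rubric is in references/freshness-forensics.md.

```python
from collections import defaultdict
from typing import List


def consolidate(items: List[dict]) -> List[dict]:
    """Link each duplicate to one canonical item per cluster; inputs are not mutated."""
    clusters = defaultdict(list)
    for item in items:
        clusters[item["cluster_id"]].append(dict(item))  # shallow copy
    result = []
    for members in clusters.values():
        # Earliest known timestamp first; unknown (None) timestamps sort last.
        members.sort(key=lambda m: (m["published_at"] is None, m["published_at"] or ""))
        canonical_url = members[0]["url"]
        for index, member in enumerate(members):
            member["duplicates_of"] = None if index == 0 else canonical_url
        result.extend(members)
    return result
```

Duplicates stay in the output with their links, which keeps them countable at DELIVER.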
Goal: emit qualified items as a structured JSON artifact (schema:
references/evidence-contract.md) plus a summary table. Every verdict state
appears in the artifact — monitor_only, unclear, and reject items ship
with their verdicts rather than vanishing, so consumers see the full triage.
Gate (deterministic phase checkpoint — emit this table before delivering the artifact; delivery without it is incomplete):
| Verdict | Count |
|---------|-------|
| keep | n |
| monitor_only | n |
| reject | n |
| unclear (freshness) | n |
| duplicates consolidated | n |
The counts make silent drops visible: collected total must equal keep + monitor_only + reject. If it does not, return to the phase that lost items.
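The reconciliation and the checkpoint table can be produced together, as in this sketch. The function name and the per-item `verdict` field name are assumptions; `freshness` and `duplicates_of` are the fields named in the freshness gate.

```python
from collections import Counter
from typing import List


def deliver_checkpoint(items: List[dict], collected_total: int) -> str:
    """Check that no item was silently dropped, then render the summary table."""
    verdicts = Counter(item["verdict"] for item in items)
    triaged = verdicts["keep"] + verdicts["monitor_only"] + verdicts["reject"]
    if triaged != collected_total:
        # A mismatch means some phase lost items; delivery must not proceed.
        raise ValueError(f"count mismatch: collected {collected_total}, triaged {triaged}")
    rows = [
        ("keep", verdicts["keep"]),
        ("monitor_only", verdicts["monitor_only"]),
        ("reject", verdicts["reject"]),
        ("unclear (freshness)", sum(1 for i in items if i.get("freshness") == "unclear")),
        ("duplicates consolidated", sum(1 for i in items if i.get("duplicates_of"))),
    ]
    lines = ["| Verdict | Count |", "|---------|-------|"]
    lines += [f"| {name} | {count} |" for name, count in rows]
    return "\n".join(lines)
```

Raising on a mismatch, rather than printing a warning, keeps the checkpoint deterministic: the artifact either ships with a balanced table or does not ship.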
No items collected
No timestamp found anywhere for an item
published_at: null, confidence: low, disclose in
evidence_notes; freshness verdict for that item is unclear.
Two sources disagree on first-public time
See references/freshness-forensics.md; if still split, verdict unclear.
Item count mismatch at DELIVER
| Signal | Load These Files | Why |
|---|---|---|
| Recording items, JSON artifact schema, confidence fields | evidence-contract.md | Five-fact contract and artifact schema |
| Assigning verdicts, reason codes, cheap-model dispatch | coarse-filter.md | Verdict definitions and dispatch note |
| Dating a story, syndication, duplicates, canonical pick | freshness-forensics.md | Forensic methods and rubrics |