skills/tvb-wiki-ingestion/SKILL.md
arXiv paper ingestion for the TVB research wiki: fetch new papers, extract entities/concepts, update raw/ and meta/.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add maedoc/tvb-wiki tvb-wiki-ingestion
Skill for fetching new arXiv papers relevant to connectome‑based whole‑brain modeling, extracting software/concept/author mentions, and updating the wiki’s raw paper library and entity counts.
Use this skill when:
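Prerequisites:
- A wiki checkout at ~/tvb-wiki (or the path in the WIKI_PATH environment variable)
- Python 3 with requests (third-party) and xml.etree.ElementTree (standard library)
Directory layout:
~/tvb-wiki/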
├── raw/papers/ # Raw paper markdown files (arxiv-*.md)
├── meta/entity_counts.json # Counts of software/concept/author mentions
└── scripts/hourly_update.py # Main ingestion script
The ingestion script (hourly_update.py) performs:
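- Fetches new arXiv papers matching the search queries
- Saves each new paper as a markdown file in raw/papers/
- Extracts software/concept/author mentions and updates meta/entity_counts.json
Run it from the wiki root:
cd ~/tvb-wiki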
python3 scripts/hourly_update.py
Output: Prints number of new papers added, e.g., Hourly update complete: 3 new papers.
List recently added raw papers:
ls -lt ~/tvb-wiki/raw/papers/arxiv-*.md | head -5
View a raw paper:
cat ~/tvb-wiki/raw/papers/arxiv-<ID>.md
The script updates meta/entity_counts.json with mention counts. View current counts:
cat ~/tvb-wiki/meta/entity_counts.json | jq .
Example structure:
{
  "software": {
    "TVB": 42,
    "NEST": 18,
    "NEURON": 12
  },
  "concepts": {
    "neural mass model": 56,
    "fMRI": 89,
    "EEG": 67
  },
  "authors": {
    "John Doe": 5,
    "Jane Smith": 3
  }
}
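The counts file can also be inspected from Python. The following is a minimal sketch, assuming the structure shown above (category, then name, then count). The helper names `load_counts` and `top_entities` are illustrative and are not part of the wiki's scripts.

```python
import json
import os
from pathlib import Path


def load_counts(wiki_path=None):
    """Read meta/entity_counts.json from the wiki checkout.

    Falls back to WIKI_PATH, then to ~/tvb-wiki.
    """
    root = Path(wiki_path or os.environ.get("WIKI_PATH", "~/tvb-wiki")).expanduser()
    with open(root / "meta" / "entity_counts.json", encoding="utf-8") as fh:
        return json.load(fh)


def top_entities(counts, n=3):
    """Return the n most-mentioned names per category, highest count first.

    Ties are broken alphabetically so the output is stable.
    """
    return {
        category: sorted(mentions.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for category, mentions in counts.items()
    }
```

Calling `top_entities(load_counts(), n=5)` would list the five most-mentioned names in each category.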
If you need to ingest papers from other arXiv categories or with different keywords, you can modify the search queries inside hourly_update.py (function fetch_arxiv_since). Default queries:
queries = [
    'cat:q-bio.NC+AND+all:connectome',
    'all:neural+mass+model',
    'all:dynamic+causal+modeling',
    'all:The+Virtual+Brain',
    'all:TVB',
    'all:whole+brain+model',
]
Add new queries, adjust max_results, or change the since_date threshold.
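To show how a query string, max_results, and since_date might fit together, here is a hedged sketch against the public arXiv API Atom feed. It is not the code in hourly_update.py: `build_url` and `parse_since` are hypothetical helpers, and the sort parameters are an assumption about what a date-based fetch would want.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_url(query, max_results=20):
    """Build a request URL; the query is already '+'-encoded, so it is used verbatim."""
    return (
        f"{ARXIV_API}?search_query={query}&start=0&max_results={max_results}"
        "&sortBy=submittedDate&sortOrder=descending"
    )


def parse_since(feed_xml, since_date):
    """Return (arxiv_id, title, published) for entries published on or after since_date."""
    papers = []
    for entry in ET.fromstring(feed_xml).findall("atom:entry", ATOM):
        published = datetime.strptime(
            entry.findtext("atom:published", namespaces=ATOM), "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if published < since_date:
            continue  # older than the threshold, skip it
        # The Atom <id> is a URL; keep only its last path segment.
        arxiv_id = entry.findtext("atom:id", namespaces=ATOM).rsplit("/", 1)[-1]
        # Titles in the feed can contain line breaks; collapse the whitespace.
        title = " ".join(entry.findtext("atom:title", namespaces=ATOM).split())
        papers.append((arxiv_id, title, published))
    return papers
```

A fetch loop would pass `requests.get(build_url(q), timeout=30).text` to `parse_since` for each query, pausing between requests.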
To extend beyond arXiv:
- Add a function fetch_biorxiv_since() that uses the bioRxiv RSS feed or API.
- Tag each paper with a source: bioRxiv field and save it with the filename prefix biorxiv-.
See the automated-research-wiki skill for a multi-source pattern.
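The saving half of that extension could look like the sketch below. Everything here is an assumption for illustration: the record keys (`doi`, `title`, `date`, `abstract`), the front-matter layout, and the helper names `biorxiv_filename` and `biorxiv_markdown` are not taken from the wiki's scripts, and the fetch itself is left out.

```python
import re


def biorxiv_filename(doi):
    """Turn a DOI into a filesystem-safe name with the biorxiv- prefix."""
    slug = re.sub(r"[^0-9A-Za-z.]+", "-", doi).strip("-")
    return f"biorxiv-{slug}.md"


def biorxiv_markdown(record):
    """Render one record (a dict with assumed keys) as markdown tagged 'source: bioRxiv'."""
    return "\n".join([
        "---",
        "source: bioRxiv",
        f"doi: {record['doi']}",
        f"date: {record['date']}",
        "---",
        f"# {record['title']}",
        "",
        record["abstract"],
        "",
    ])
```

Keeping the prefix distinct from arxiv- means the existing `arxiv-*.md` globs continue to select only arXiv papers.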
The current extraction uses simple keyword matching. Keywords are defined in extract_entities.
Limitation: Keyword matching may miss synonyms or paraphrases. For more robust extraction, consider using a lightweight NLP library (spaCy) or an LLM‑based extractor.
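A small sketch makes the limitation concrete. This is not the body of extract_entities: the `KEYWORDS` table and the `count_keywords` helper are hypothetical, and the matching is a case-sensitive whole-word search chosen only for illustration.

```python
import re

# Hypothetical keyword table; the real lists live in hourly_update.py.
KEYWORDS = {
    "software": ["TVB", "NEST", "NEURON"],
    "concepts": ["neural mass model", "fMRI", "EEG"],
}


def count_keywords(text, keywords=KEYWORDS):
    """Count case-sensitive whole-word occurrences of each keyword in text."""
    found = {}
    for category, names in keywords.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            hits = len(re.findall(pattern, text))
            if hits:
                found.setdefault(category, {})[name] = hits
    return found
```

With this approach "neural mass model" is counted, but the paraphrase "neural-mass dynamics" is not, which is the kind of miss an NLP-based or LLM-based extractor would be meant to catch.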
Entity counts are used by the wiki’s page‑creation logic (in hourly_full.py). When a software/concept/author reaches a threshold (default: 2 mentions), a corresponding page is created in entities/ or concepts/. This happens during the full pipeline, not during ingestion alone.
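The threshold check itself is simple to express. The sketch below assumes the counts structure shown earlier and treats "reaches a threshold" as count >= threshold. `pages_to_create` is an illustrative name, not a function from hourly_full.py, and it does not decide which directory a page goes in.

```python
def pages_to_create(counts, threshold=2):
    """List (category, name) pairs whose mention count has reached the threshold."""
    return sorted(
        (category, name)
        for category, mentions in counts.items()
        for name, n in mentions.items()
        if n >= threshold
    )
```

An entity mentioned once stays out of the result until a later ingestion run raises its count.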
Agents that discover this skill can:
- Run scripts/hourly_update.py to ingest new papers.
- Set WIKI_PATH if the repo is elsewhere.
This skill is listed in the repo’s skill-manifest.json:
{
  "name": "tvb-wiki-ingestion",
  "path": "skills/tvb-wiki-ingestion/SKILL.md",
  "description": "arXiv paper ingestion and entity/concept extraction.",
  "entry_point": "scripts/hourly_update.py",
  "dependencies": ["python3"],
  "schedule": "hourly"
}
- Rate limiting: time.sleep(1) between queries.
After ingestion, verify:
- New .md files appear in raw/papers/
- meta/entity_counts.json has updated counts
Run a quick count:
grep -c '^# ' ~/tvb-wiki/raw/papers/arxiv-*.md
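The same checks can be scripted. This is a minimal sketch, assuming each raw paper carries a line starting with "# " (as the grep pattern suggests) and that the counts file has the category-to-names structure shown earlier. `verify_ingestion` is a hypothetical helper.

```python
import json
from pathlib import Path


def verify_ingestion(wiki_root):
    """Summarise raw papers and entity counts under wiki_root."""
    root = Path(wiki_root).expanduser()
    papers = sorted((root / "raw" / "papers").glob("arxiv-*.md"))
    # Papers without any '# ' heading line are reported by file name.
    missing_title = [
        p.name
        for p in papers
        if not any(
            line.startswith("# ")
            for line in p.read_text(encoding="utf-8").splitlines()
        )
    ]
    counts = json.loads(
        (root / "meta" / "entity_counts.json").read_text(encoding="utf-8")
    )
    return {
        "papers": len(papers),
        "missing_title": missing_title,
        "entities": {category: len(names) for category, names in counts.items()},
    }
```

Running `verify_ingestion("~/tvb-wiki")` before and after an ingestion run and comparing the two reports shows whether new papers and counts actually landed.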
Related skills:
- arxiv – General arXiv search and download skill
- llm-wiki – Core wiki-building skill
- automated-research-wiki – Full automation pattern