Semantic Similarity Index for disease research literature using PubMedBERT embeddings
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add aaaaqwq/agi-super-team claw-semantic-sim
Measure how isolated or connected disease research is across the global biomedical literature, using PubMedBERT embeddings on PubMed abstracts spanning 175 GBD diseases.
If you ask ChatGPT to "measure research neglect for diseases," it will:
This skill encodes the correct methodological decisions:
Neglected tropical diseases (NTDs) are significantly more semantically isolated than other conditions (P < 0.001, Cohen's d = 0.8+). They exist in knowledge silos with limited cross-disciplinary research bridges. The 25 most isolated diseases are disproportionately Global South priority conditions.
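For readers unfamiliar with the effect size quoted above, here is a minimal sketch of Cohen's d using the standard pooled-standard-deviation formula. The function name `cohens_d` and the toy inputs are illustrative assumptions; the source does not show how the skill computes this statistic, and the numbers below are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy values chosen only to make the arithmetic easy to follow.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```

By the usual convention, d around 0.8 or higher counts as a large effect, which is the range the finding above reports.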
05-00-heim-sem-setup.py # Validate environment, create directories
05-01-heim-sem-fetch.py # Retrieve PubMed abstracts (checkpointed)
05-02-heim-sem-embed.py # Generate PubMedBERT embeddings (MPS/CPU)
05-03-heim-sem-compute.py # Compute SII, KTP, RCC, temporal drift
05-04-heim-sem-figures.py # Generate publication figures
05-05-heim-sem-integrate.py # Merge with biobank + clinical trial dimensions
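The source does not define how the compute step derives SII, KTP, or RCC, so the following is not the skill's formula. It is a rough sketch of one generic way to score semantic isolation from embeddings: average each disease's abstract embeddings into a centroid, then take one minus the mean cosine similarity to every other centroid. All function names and the 2-dimensional toy vectors are hypothetical; real PubMedBERT embeddings would be 768-dimensional.

```python
from math import sqrt


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def isolation_scores(embeddings_by_disease):
    """Assumed proxy: 1 - mean cosine similarity of a centroid to all others."""
    centroids = {
        name: centroid(vecs) for name, vecs in embeddings_by_disease.items()
    }
    scores = {}
    for name, own in centroids.items():
        sims = [
            cosine_similarity(own, other)
            for other_name, other in centroids.items()
            if other_name != name
        ]
        scores[name] = 1.0 - sum(sims) / len(sims)
    return scores


# Toy embeddings: disease_a and disease_b point the same way, disease_c is orthogonal.
toy = {
    "disease_a": [[1.0, 0.0], [1.0, 0.0]],
    "disease_b": [[2.0, 0.0], [4.0, 0.0]],
    "disease_c": [[0.0, 1.0], [0.0, 3.0]],
}
scores = isolation_scores(toy)
```

In this toy case `disease_c` gets the highest score because its centroid shares no direction with the other two.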
python semantic_sim.py --demo --output demo_report
The demo uses pre-computed embeddings and metrics for 175 GBD diseases and generates the full 4-panel figure instantly.
Semantic Similarity Index
=========================
Diseases analysed: 175
Total PubMed abstracts: 13,100,000
Embedding model: PubMedBERT (768-dim)
Metric Ranges:
SII: 0.0412 - 0.1893
KTP: 0.6234 - 0.9187
RCC: 0.0891 - 0.3421
Key Finding:
NTDs show +38% higher semantic isolation
P < 0.0001, Cohen's d = 0.84
14/25 most isolated diseases are Global South priority
Figures saved to: demo_report/
Fig5_Semantic_Structure.png (300 dpi)
Fig5_Semantic_Structure.pdf (vector)
Reproducibility:
commands.sh | environment.yml | checksums.sha256
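To check a report against its checksum manifest, something like the sketch below may be enough. It assumes `checksums.sha256` uses the common `sha256sum` layout (hex digest, whitespace, file name per line) and that paths are relative to the manifest's directory; the source confirms neither, and `verify_checksums` is a hypothetical helper, not part of the skill.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large figures need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path):
    """Return {file_name: bool} for each entry in a sha256sum-style manifest."""
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        actual = sha256_of(manifest.parent / name)
        results[name] = actual == expected.lower()
    return results
```

A call such as `verify_checksums("demo_report/checksums.sha256")` would then return a mapping of file names to pass/fail flags, assuming the manifest is written alongside the figures.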
If you use this skill in a publication, please cite: