biogeobears/SKILL.md
Set up and execute phylogenetic biogeographic analyses using BioGeoBEARS in R. Use when users request biogeographic reconstruction, ancestral range estimation, or want to analyze species distributions on phylogenies. Handles input file validation, data reformatting, RMarkdown workflow generation, and result visualization.
BioGeoBEARS (BioGeography with Bayesian and Likelihood Evolutionary Analysis in R Scripts) performs probabilistic inference of ancestral geographic ranges on phylogenetic trees. This skill helps set up complete biogeographic analyses by:
Use this skill when users request:
The skill triggers when users mention phylogenetic biogeography, ancestral area reconstruction, or provide tree + distribution data.
Users must provide:
- Phylogenetic tree (Newick format, .nwk, .tre, or .tree file)
- Geographic distribution data (any tabular format)
When a user requests a BioGeoBEARS analysis, ask for:
Input file paths:
Analysis parameters (if not specified):
Use the AskUserQuestion tool to gather this information efficiently:
Example questions:
- "Maximum range size" - options based on number of areas (e.g., for 4 areas: "All 4 areas", "3 areas", "2 areas")
- "Models to compare" - options: "All 6 models (recommended)", "Only base models (DEC, DIVALIKE, BAYAREALIKE)", "Only +J models", "Custom selection"
- "Visualization type" - options: "Pie charts (show probabilities)", "Text labels (show most likely states)", "Both"
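The answers to these questions can be translated into analysis parameters with a small lookup. The following is a minimal Python sketch and not part of the skill itself: the function names `models_for_choice` and `range_size_options` are hypothetical, and the option labels are assumed to match the examples listed above.

```python
# Hypothetical helpers; labels mirror the example questions in this document.
BASE_MODELS = ["DEC", "DIVALIKE", "BAYAREALIKE"]


def models_for_choice(choice):
    """Translate a 'Models to compare' answer into a comma-separated string."""
    plus_j = [m + "+J" for m in BASE_MODELS]
    if choice.startswith("All 6"):
        # Interleave so each base model is followed by its +J variant.
        selected = [m for pair in zip(BASE_MODELS, plus_j) for m in pair]
    elif choice.startswith("Only base"):
        selected = list(BASE_MODELS)
    elif choice.startswith("Only +J"):
        selected = plus_j
    else:
        raise ValueError("custom selections must be listed explicitly")
    return ",".join(selected)


def range_size_options(n_areas, n_options=3):
    """Build 'Maximum range size' labels, largest first, never below 2 areas."""
    labels = []
    for k in range(n_areas, max(n_areas - n_options, 1), -1):
        labels.append(f"All {k} areas" if k == n_areas else f"{k} areas")
    return labels
```

The string returned for the "All 6 models" answer has the same form as the `models` parameter passed when rendering the RMarkdown script.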
Use the Read tool to check the tree file:
```r
# In R, basic validation:
library(ape)
tr <- read.tree("path/to/tree.nwk")
print(paste("Tips:", length(tr$tip.label)))
print(paste("Rooted:", is.rooted(tr)))
print(tr$tip.label)  # Check species names
```
Verify:
Use scripts/validate_geography_file.py to validate or reformat the geography file.
If the file is already in PHYLIP format (starts with numbers):
```bash
python scripts/validate_geography_file.py path/to/geography.txt --validate --tree path/to/tree.nwk
```
This checks:
If the file is in CSV/TSV format (needs reformatting):
```bash
python scripts/validate_geography_file.py path/to/distribution.csv --reformat -o geography.data --delimiter ","
```
Or for tab-delimited:
```bash
python scripts/validate_geography_file.py path/to/distribution.txt --reformat -o geography.data --delimiter tab
```
The script will:
Always validate the reformatted file before proceeding:
```bash
python scripts/validate_geography_file.py geography.data --validate --tree path/to/tree.nwk
```
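As an illustration of what reformatting involves, the sketch below converts a presence/absence table into the PHYLIP-style layout BioGeoBEARS reads: a header line with the species count, the area count, and the area names in parentheses, followed by one tab-separated line per species with a string of 0/1 codes. It is not the code of `validate_geography_file.py`. The function name `csv_to_geography`, the assumed input layout (a header row, species in the first column, one 0/1 column per area), and the replacement of spaces in names with underscores are all assumptions made for the example.

```python
import csv
import io


def csv_to_geography(csv_text, delimiter=","):
    """Convert a presence/absence table to PHYLIP-style geography text.

    Assumed input: header row, species name in column 1, then one 0/1
    column per area. This layout is hypothetical.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if r]
    header, records = rows[0], rows[1:]
    areas = [a.strip() for a in header[1:]]
    lines = []
    for rec in records:
        # Assumption: spaces in names are replaced so they match Newick tip labels.
        name = rec[0].strip().replace(" ", "_")
        codes = [c.strip() for c in rec[1:]]
        if len(codes) != len(areas) or any(c not in ("0", "1") for c in codes):
            raise ValueError(f"bad presence/absence row for {name!r}")
        lines.append(f"{name}\t{''.join(codes)}")
    head = f"{len(lines)}\t{len(areas)} ({' '.join(areas)})"
    return "\n".join([head] + lines) + "\n"
```

Output produced this way should still be run through the validation command above, since the sketch does not compare names against the tree.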
Create an organized directory for the analysis:
```
biogeobears_analysis/
├── input/
│   ├── tree.nwk                  # Original or copied tree
│   ├── geography.data            # Validated/reformatted geography file
│   └── original_data/            # Original input files
│       ├── original_tree.nwk
│       └── original_distribution.csv
├── scripts/
│   └── run_biogeobears.Rmd       # Generated RMarkdown script
├── results/                      # Created by analysis (output directory)
│   ├── [MODEL]_result.Rdata      # Saved model results
│   └── plots/                    # Visualization outputs
│       ├── [MODEL]_pie.pdf
│       └── [MODEL]_text.pdf
└── README.md                     # Analysis documentation
```
Create this structure programmatically:
```bash
# Create the directory tree
mkdir -p biogeobears_analysis/input/original_data
mkdir -p biogeobears_analysis/scripts
mkdir -p biogeobears_analysis/results/plots
# Copy files
cp path/to/tree.nwk biogeobears_analysis/input/
cp geography.data biogeobears_analysis/input/
# original_files is a placeholder for the user's original tree and distribution files
cp original_files biogeobears_analysis/input/original_data/
```
Use the template at scripts/biogeobears_analysis_template.Rmd and customize it with user parameters.
Copy and customize the template:
```bash
cp scripts/biogeobears_analysis_template.Rmd biogeobears_analysis/scripts/run_biogeobears.Rmd
```
Create a parameter file or modify the YAML header in the Rmd to use the user's specific settings:
Example customization via R code:
```r
# Edit YAML parameters programmatically or provide as params when rendering
rmarkdown::render(
  "biogeobears_analysis/scripts/run_biogeobears.Rmd",
  params = list(
    tree_file = "../input/tree.nwk",
    geog_file = "../input/geography.data",
    max_range_size = 4,
    models = "DEC,DEC+J,DIVALIKE,DIVALIKE+J,BAYAREALIKE,BAYAREALIKE+J",
    output_dir = "../results"
  ),
  output_file = "../results/biogeobears_report.html"
)
```
Or create a run script:
```bash
#!/bin/bash
# biogeobears_analysis/run_analysis.sh
# The shebang must be the first line of the file.
# Run from scripts/ so the relative paths below resolve.
cd "$(dirname "$0")/scripts" || exit 1
R -e "rmarkdown::render('run_biogeobears.Rmd', params = list(
  tree_file = '../input/tree.nwk',
  geog_file = '../input/geography.data',
  max_range_size = 4,
  models = 'DEC,DEC+J,DIVALIKE,DIVALIKE+J,BAYAREALIKE,BAYAREALIKE+J',
  output_dir = '../results'
), output_file = '../results/biogeobears_report.html')"
```
Generate a README.md in the analysis directory explaining:
Example:
# BioGeoBEARS Analysis
## Overview
Biogeographic analysis of [NUMBER] species across [NUMBER] geographic areas.
## Input Data
- **Tree**: `input/tree.nwk` ([NUMBER] tips)
- **Geography**: `input/geography.data` ([NUMBER] species × [NUMBER] areas)
- **Areas**: [A, B, C, ...]
## Parameters
- Maximum range size: [NUMBER]
- Models tested: [LIST]
## Running the Analysis
### Option 1: Using RMarkdown directly
```r
library(rmarkdown)
render("scripts/run_biogeobears.Rmd",
       output_file = "../results/biogeobears_report.html")
```
### Option 2: Using the run script
```bash
bash run_analysis.sh
```
Results will be saved in `results/`:
- `biogeobears_report.html` - Full analysis report with visualizations
- `[MODEL]_result.Rdata` - Saved R objects for each model
- `plots/[MODEL]_pie.pdf` - Ancestral range reconstructions (pie charts)
- `plots/[MODEL]_text.pdf` - Ancestral range reconstructions (text labels)

The HTML report includes:
See references/biogeobears_details.md for detailed model descriptions.
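```r
# Install BioGeoBEARS
install.packages("rexpokit")
install.packages("cladoRcpp")
# devtools provides install_github(); install it first if it is missing
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github(repo = "nmatzke/BioGeoBEARS")
# Other packages
install.packages(c("ape", "rmarkdown", "knitr", "kableExtra"))
```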
### Step 6: Provide User Instructions
After setting up the analysis, provide clear instructions to the user:
Analysis Setup Complete!

Directory structure created at: `biogeobears_analysis/`

📁 Files created:
- ✓ `input/tree.nwk` - Phylogenetic tree ([N] tips)
- ✓ `input/geography.data` - Geographic distribution data (validated)
- ✓ `scripts/run_biogeobears.Rmd` - RMarkdown analysis script
- ✓ `README.md` - Documentation and instructions
- ✓ `run_analysis.sh` - Convenience script to run analysis

📋 Next steps:
1. Review the README.md for analysis details
2. Install BioGeoBEARS if not already installed:
   ```r
   install.packages("rexpokit")
   install.packages("cladoRcpp")
   if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
   devtools::install_github(repo = "nmatzke/BioGeoBEARS")
   ```
3. Run the analysis:
   ```bash
   cd biogeobears_analysis
   bash run_analysis.sh
   ```
   Or in R:
   ```r
   setwd("biogeobears_analysis")
   rmarkdown::render("scripts/run_biogeobears.Rmd",
                     output_file = "../results/biogeobears_report.html")
   ```
4. View results in `results/biogeobears_report.html`

⏱️ Expected runtime: [ESTIMATE based on tree size]

💡 The HTML report includes model comparison, parameter estimates, and visualization of ancestral ranges on your phylogeny.
## Analysis Parameter Guidance
When users ask for guidance on parameters, consult `references/biogeobears_details.md` and provide recommendations:
### Maximum Range Size
**Ask**: "What's the maximum number of areas a species in your group can realistically occupy?"
Common approaches:
- **Conservative**: Number of areas - 1 (prevents unrealistic cosmopolitan ancestral ranges)
- **Permissive**: All areas (if biologically plausible)
- **Data-driven**: Maximum observed in extant species
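**Impact**: The number of possible ranges (states) grows combinatorially with the number of areas and the maximum range size, so larger values sharply increase computation time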
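To make the cost of a larger maximum range size concrete, the sketch below counts the ranges the model has to consider: every combination of areas up to the maximum size, plus (by assumption here, matching the usual BioGeoBEARS default) one null range. The function name `num_states` is hypothetical. The rate matrix has one row and one column per state, so its size grows with the square of this count.

```python
from math import comb


def num_states(n_areas, max_range_size, include_null_range=True):
    """Count geographic ranges of size 1..max_range_size drawn from n_areas."""
    n = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    # Assumption: the empty (null) range is counted as one extra state.
    return n + 1 if include_null_range else n


for areas, max_size in [(4, 4), (5, 4), (10, 4), (10, 10)]:
    print(areas, max_size, num_states(areas, max_size))
```

For 10 areas, capping the range size at 4 gives 386 states instead of 1024, which is why restricting the maximum range size matters most when there are many areas.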
### Model Selection
**Default recommendation**: Run all 6 models for comprehensive comparison
- DEC, DIVALIKE, BAYAREALIKE (base models)
- DEC+J, DIVALIKE+J, BAYAREALIKE+J (+J variants)
**Rationale**:
- Model comparison is key to inference
- Adding the +J (jump dispersal, or founder-event speciation) parameter often improves model fit significantly
- Small additional computational cost
If computation is a concern, suggest starting with DEC and DEC+J.
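One common way to compare the fitted models is by AIC and Akaike weights. The sketch below assumes the default free parameters (2 for each base model, 3 for each +J variant) and takes log-likelihoods as input; the function name `aic_table` and the log-likelihood values in the usage lines are hypothetical and purely illustrative. Whether the template reports AIC, AICc, or another criterion is not stated here.

```python
from math import exp

# Assumed free-parameter counts: d and e for base models, plus j for +J variants.
N_PARAMS = {
    "DEC": 2, "DEC+J": 3,
    "DIVALIKE": 2, "DIVALIKE+J": 3,
    "BAYAREALIKE": 2, "BAYAREALIKE+J": 3,
}


def aic_table(lnl):
    """Return {model: (AIC, Akaike weight)} from {model: log-likelihood}."""
    aic = {m: 2 * N_PARAMS[m] - 2 * value for m, value in lnl.items()}
    best = min(aic.values())
    # Relative likelihood of each model given the best AIC in the set.
    rel = {m: exp(-0.5 * (a - best)) for m, a in aic.items()}
    total = sum(rel.values())
    return {m: (aic[m], rel[m] / total) for m in lnl}


# Hypothetical log-likelihoods, for illustration only.
table = aic_table({"DEC": -40.0, "DEC+J": -35.0})
for model, (aic, weight) in table.items():
    print(f"{model}: AIC={aic:.1f}, weight={weight:.3f}")
```

Lower AIC indicates better fit after penalizing extra parameters, and the weights sum to 1 across the models compared.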
### Visualization Options
**Pie charts** (`plotwhat = "pie"`):
- Show probability distributions across all possible states
- Better for conveying uncertainty
- Can be cluttered with many areas
**Text labels** (`plotwhat = "text"`):
- Show only the single most probable state at each node
- Cleaner, easier to read
- Doesn't show uncertainty
**Recommendation**: Generate both in the analysis (template does this automatically)
## Common Issues and Troubleshooting
### Species Name Mismatches
**Symptom**: Error about species in tree not in geography file (or vice versa)
**Solution**: Use the validation script with `--tree` option to identify mismatches, then either:
1. Edit the geography file to match tree tip labels
2. Edit tree tip labels to match geography file
3. Remove species that aren't in both
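The comparison behind this check can be sketched in a few lines of Python. This is an illustration only, not the validation script's implementation: the function names are hypothetical, and the regular expression assumes simple unquoted tip labels without spaces or parentheses.

```python
import re


def newick_tip_labels(newick):
    """Extract tip labels: names that directly follow '(' or ','."""
    # Internal node labels follow ')' and are therefore not captured.
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def geography_names(geog_text):
    """Read species names from PHYLIP-style geography text (skips the header)."""
    lines = [ln for ln in geog_text.splitlines() if ln.strip()]
    return {ln.split()[0] for ln in lines[1:]}


def compare_names(newick, geog_text):
    """Return (names only in the tree, names only in the geography file)."""
    tips, geo = newick_tip_labels(newick), geography_names(geog_text)
    return sorted(tips - geo), sorted(geo - tips)


only_tree, only_geog = compare_names(
    "((sp_a:1,sp_b:1):1,sp_c:2);",
    "3\t2 (A B)\nsp_a\t10\nsp_b\t01\nsp_d\t11\n",
)
print("Only in tree:", only_tree)
print("Only in geography file:", only_geog)
```

Both lists should be empty before the analysis is run; any names that appear point to the entries that need to be edited or removed.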
### Tree Not Rooted
**Symptom**: Error about unrooted tree
**Solution**:
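```r
library(ape)
tr <- read.tree("tree.nwk")
# resolve.root = TRUE adds a bifurcating root node; without it the result
# keeps a basal polytomy and is.rooted(tr) still returns FALSE
tr <- root(tr, outgroup = "outgroup_species_name", resolve.root = TRUE)
write.tree(tr, "tree_rooted.nwk")
```
Ask the user which species to use as the outgroup.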
**Symptom**: Validation errors about tabs, spaces, or binary codes
**Solution**: Use the reformat option:
```bash
python scripts/validate_geography_file.py input.csv --reformat -o geography.data
```
**Symptom**: NA values in parameter estimates or very negative log-likelihoods
**Possible causes**:
**Solution**: Check input data quality and try a simpler model first (DEC only)
Causes:
Solutions:
- `force_sparse = TRUE` in the run object

This skill includes:
- `validate_geography_file.py` - Validates and reformats geography files
  - `python validate_geography_file.py --help`
- `biogeobears_analysis_template.Rmd` - RMarkdown template for complete analysis
Load this reference when:
1. Always validate input files before analysis - saves time debugging later
2. Organize analysis in a dedicated directory - keeps everything together and reproducible
3. Run all 6 models by default - model comparison is crucial for biogeographic inference
4. Document parameters and decisions - analysis README helps with reproducibility
5. Generate both visualization types - pie charts for uncertainty, text labels for clarity
6. Save intermediate results - the RMarkdown template does this automatically
7. Check parameter estimates - unrealistic values suggest data or model issues
8. Provide context with visualizations - explain what dispersal/extinction rates mean for the user's system
When presenting results to users, explain:
User: "I have a phylogeny of 30 bird species and their distributions across 5 islands. Can you help me figure out where their ancestors lived?"
Claude (using this skill):
1. Ask for tree and distribution file paths
2. Validate tree file (check 30 tips, rooted)
3. Validate/reformat geography file (5 areas)
4. Ask about max_range_size (suggest 4 areas)
5. Ask about models (suggest all 6)
6. Set up biogeobears_analysis/ directory structure
7. Copy template RMarkdown script with parameters
8. Generate README.md and run_analysis.sh
9. Provide clear instructions to run analysis
10. Explain expected outputs and how to interpret them
Result: User has complete, ready-to-run analysis with documentation
This skill was created based on:
Time estimate for skill execution:
Analysis runtime (separate from skill execution):
Installation requirements (user must have):
When to consult references/:
- `biogeobears_details.md` when users need detailed explanations of models, parameters, or interpretation