Access earth and environmental science datasets via PANGAEA API
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research pangaea-data-api
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
PANGAEA is a large data repository for earth and environmental sciences, hosting 400K+ datasets with 20B+ data points. It archives research data from oceanography, paleoclimatology, geology, ecology, and atmospheric science. Each dataset has a DOI and is linked to the originating publication. The API provides search, metadata retrieval, and data download. It is free to use and requires no authentication.
# Search datasets by keyword
curl "https://www.pangaea.de/advanced/search.php?q=ocean+temperature&count=20&type=json"
# Search with geographic bounding box
curl "https://www.pangaea.de/advanced/search.php?\
q=sediment+core&minlat=-60&maxlat=-30&minlon=-180&maxlon=180&type=json"
# Filter by parameter (measurement type)
curl "https://www.pangaea.de/advanced/search.php?\
q=carbon+dioxide&param=Atmospheric+CO2&type=json"
# Filter by date range
curl "https://www.pangaea.de/advanced/search.php?\
q=Arctic+ice&mindate=2020-01-01&maxdate=2026-12-31&type=json"
# Full-text search via Elasticsearch
curl -X POST "https://ws.pangaea.de/es/pangaea/panmd/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "bool": {
        "must": [
          {"match": {"citation.title": "ocean temperature"}}
        ],
        "filter": [
          {"range": {"citation.year": {"gte": 2020}}}
        ]
      }
    },
    "size": 20
  }'
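The same request body can be assembled in Python before it is sent. This is a minimal sketch, assuming the field names `citation.title` and `citation.year` from the curl example are valid for the index; `build_es_query` is a hypothetical helper, and the code only builds and prints the JSON body without contacting the service.

```python
import json
from typing import Optional


def build_es_query(title_terms: str, min_year: Optional[int] = None,
                   size: int = 20) -> dict:
    """Build a bool query body mirroring the curl example (hypothetical helper)."""
    # Field names are copied from the curl example and are assumed, not verified.
    query = {"bool": {"must": [{"match": {"citation.title": title_terms}}]}}
    if min_year is not None:
        # The year restriction goes in "filter" so it does not affect scoring.
        query["bool"]["filter"] = [{"range": {"citation.year": {"gte": min_year}}}]
    return {"query": query, "size": size}


body = build_es_query("ocean temperature", min_year=2020)
print(json.dumps(body, indent=2))
```

The resulting dictionary could then be passed as the `json=` argument of `requests.post` against the search endpoint shown above.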
# Get dataset metadata
curl "https://doi.pangaea.de/10.1594/PANGAEA.123456?format=metainfo_json"
# Download dataset as tab-delimited text
curl "https://doi.pangaea.de/10.1594/PANGAEA.123456?format=textfile"
# Download as CSV
curl "https://doi.pangaea.de/10.1594/PANGAEA.123456?format=csv"
# List records
curl "https://ws.pangaea.de/oai/provider?verb=ListRecords&metadataPrefix=oai_dc"
# Get specific record
curl "https://ws.pangaea.de/oai/provider?verb=GetRecord&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.123456&metadataPrefix=oai_dc"
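OAI-PMH responses are XML, and `ListRecords` results are paged through a `resumptionToken` element defined by the protocol. The sketch below parses one page of `oai_dc` output with the standard library. The sample document is made up for illustration, and `parse_oai_records` is a hypothetical helper, not part of any PANGAEA client.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_oai_records(xml_text) -> tuple[list[dict], Optional[str]]:
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier",
                                       default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    # An empty or missing token means there are no further pages.
    return records, (token or None)


# Made-up sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:pangaea.de:doi:10.1594/PANGAEA.123456</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Placeholder title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_oai_records(SAMPLE)
print(records, token)
```

With a live response, passing the raw bytes (`resp.content`) lets the parser honor the XML encoding declaration. To fetch the next page, the protocol expects a request with `verb=ListRecords&resumptionToken=<token>` and no `metadataPrefix`.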
| Parameter | Description | Example |
|-----------|-------------|---------|
| q | Search query | q=coral+reef+bleaching |
| count | Results per page | count=50 |
| offset | Pagination offset | offset=20 |
| minlat/maxlat | Latitude bounds | -90 to 90 |
| minlon/maxlon | Longitude bounds | -180 to 180 |
| mindate/maxdate | Temporal filter | 2020-01-01 |
| param | Parameter/measurement | Temperature |
| topic | Topic filter | Atmosphere, Biosphere |
| type | Response format | json, xml |
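The parameters in the table combine into a single query string, and `count` with `offset` gives page-by-page access. The following sketch builds one URL per page without sending any requests. It assumes `offset` is zero-based and counted in results, which the table suggests but does not state; `page_urls` and the `total` argument are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.pangaea.de/advanced/search.php"


def page_urls(query: str, total: int, count: int = 50, **filters) -> list[str]:
    """Return one search URL per page, stepping `offset` by `count`."""
    urls = []
    for offset in range(0, total, count):
        params = {"q": query, "count": count, "offset": offset,
                  "type": "json", **filters}
        # urlencode escapes spaces as "+" and joins the pairs with "&".
        urls.append(f"{SEARCH_URL}?{urlencode(params)}")
    return urls


urls = page_urls("coral reef bleaching", total=120, count=50,
                 minlat=-30, maxlat=30)
for url in urls:
    print(url)
```

Here `total` is whatever upper bound the caller wants to fetch; in practice it would come from the result count reported by the first response.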
import requests
import pandas as pd
from io import StringIO

SEARCH_URL = "https://www.pangaea.de/advanced/search.php"
ES_URL = "https://ws.pangaea.de/es/pangaea/panmd/_search"


def search_pangaea(query: str, count: int = 20,
                   bbox: dict = None) -> list:
    """Search PANGAEA for earth science datasets."""
    params = {"q": query, "count": count, "type": "json"}
    if bbox:
        params.update({
            "minlat": bbox.get("south", -90),
            "maxlat": bbox.get("north", 90),
            "minlon": bbox.get("west", -180),
            "maxlon": bbox.get("east", 180),
        })
    resp = requests.get(SEARCH_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for item in data.get("results", []):
        results.append({
            "doi": item.get("URI", ""),
            "title": item.get("citation", ""),
            "year": item.get("year"),
            "size": item.get("size"),
            "parameters": item.get("params", []),
            "score": item.get("score"),
        })
    return results


def download_dataset(doi: str) -> pd.DataFrame:
    """Download a PANGAEA dataset as a pandas DataFrame."""
    url = f"https://doi.pangaea.de/{doi}?format=textfile"
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    lines = resp.text.split("\n")
    # Skip the metadata header, which ends at the line starting with "*/".
    header_end = next(
        (i for i, line in enumerate(lines) if line.startswith("*/")),
        -1,
    )
    data_text = "\n".join(lines[header_end + 1:])
    return pd.read_csv(StringIO(data_text), sep="\t")


def search_by_location(query: str, lat: float, lon: float,
                       radius_deg: float = 5.0) -> list:
    """Search datasets near a geographic location."""
    bbox = {
        "south": lat - radius_deg,
        "north": lat + radius_deg,
        "west": lon - radius_deg,
        "east": lon + radius_deg,
    }
    return search_pangaea(query, bbox=bbox)


# Example: find ocean temperature datasets
datasets = search_pangaea("sea surface temperature", count=5)
for ds in datasets:
    print(f"[{ds['year']}] {ds['title'][:80]}...")
    print(f"  DOI: {ds['doi']} | Size: {ds['size']}")

# Example: download a specific dataset
# df = download_dataset("10.1594/PANGAEA.123456")
# print(df.head())

# Example: find Arctic research data
arctic = search_by_location("permafrost", lat=70, lon=25)
for ds in arctic[:3]:
    print(f"{ds['title'][:80]}...")
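The header-stripping step in `download_dataset` can be checked offline before pointing it at a real DOI. This sketch assumes, as that function does, that the tab-delimited export opens with a metadata block closed by a line starting with `*/`. The sample text is made up, `split_header` is a hypothetical helper, and the standard `csv` module stands in for pandas to keep the example dependency-free.

```python
import csv
from io import StringIO


def split_header(text: str) -> tuple[str, list[dict]]:
    """Split an export into (metadata header, list of data rows)."""
    lines = text.splitlines()
    # Index of the line that closes the metadata block, or -1 if there is none.
    end = next((i for i, line in enumerate(lines) if line.startswith("*/")), -1)
    header = "\n".join(lines[: end + 1])
    reader = csv.DictReader(StringIO("\n".join(lines[end + 1:])), delimiter="\t")
    return header, list(reader)


# Made-up sample in the assumed layout: comment header, then a tab-separated table.
SAMPLE = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tplaceholder\n"
    "*/\n"
    "Depth water [m]\tTemp [deg C]\n"
    "0\t12.5\n"
    "10\t11.9\n"
)

header, rows = split_header(SAMPLE)
print(header)
print(rows)
```

When no closing `*/` line is found, the header comes back empty and every line is treated as data, which matches the fallback in `download_dataset`.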
| Topic | Coverage |
|-------|----------|
| Oceans | Temperature, salinity, currents, chemistry |
| Paleoclimate | Ice cores, sediment cores, tree rings |
| Atmosphere | CO2, aerosols, weather observations |
| Lithosphere | Geology, tectonics, geochemistry |
| Biosphere | Biodiversity, ecology, marine biology |
| Cryosphere | Sea ice, glaciers, permafrost |