skills/43-wentorai-research-plugins/skills/tools/scraping/SKILL.md
6 web scraping & data collection skills. Trigger: collecting web data, finding datasets, API access for research. Design: ethical scraping methods with rate limiting and data quality checks.
Install this skill globally with one command:

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research scraping-skills

Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Select the skill matching the user's need, then read its SKILL.md.
| Skill | Description |
|-------|-------------|
| academic-web-scraping | Ethical web scraping and API-based data collection for research |
| dataset-finder-guide | Search and download research datasets from Kaggle, HuggingFace, and repos |
| easy-spider-guide | Guide to EasySpider for visual no-code web data collection |
| google-scholar-scraper | Ethical Google Scholar data collection techniques and best practices |
| repository-harvesting-guide | Harvest metadata from open repositories using OAI-PMH protocol |
| web-scraping-ethics-guide | Scrape web data ethically and legally for research purposes |
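
The skills above share a design built around ethical scraping with rate limiting. As a rough illustration of that idea, and not code taken from any of the listed skills, the sketch below checks a site's robots.txt rules and spaces out requests using only the Python standard library. The class name `PoliteGate`, the user agent `research-bot`, and the `example.org` URLs are hypothetical placeholders; a real collector would download robots.txt from the target site and call `wait()` before each request.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Hypothetical helper: robots.txt permission check plus request spacing."""

    def __init__(self, robots_txt, user_agent, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Parse robots.txt content that the caller has already obtained.
        self._parser = RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        self._agent = user_agent
        # Honor Crawl-delay when it is longer than our own minimum interval.
        crawl_delay = self._parser.crawl_delay(user_agent)
        self._interval = max(min_interval, float(crawl_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def allowed(self, url):
        """Return True if robots.txt permits this user agent to fetch url."""
        return self._parser.can_fetch(self._agent, url)

    def wait(self):
        """Block until the interval since the last request has elapsed.

        Returns the number of seconds slept (0.0 for the first request).
        """
        waited = 0.0
        if self._last is not None:
            remaining = self._interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


# Sample robots.txt content, invented for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

gate = PoliteGate(SAMPLE_ROBOTS, "research-bot", min_interval=1.0)
print(gate.allowed("https://example.org/papers/1"))
print(gate.allowed("https://example.org/private/x"))
```

The clock and sleep functions are injectable so the spacing logic can be exercised in tests without real delays. Data quality checks and the actual HTTP requests are left out of this sketch.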