Guide to EasySpider for visual no-code web data collection
EasySpider is a visual, no-code web crawler tool with over 44K stars on GitHub. It provides a graphical interface where users design web scraping tasks by interacting directly with target web pages, clicking on elements to extract, and defining navigation flows visually. No programming knowledge is required to build functional scrapers, making it accessible to researchers across all disciplines.
For academic researchers, collecting data from web sources is a frequent need, but it is often a technical barrier. Whether gathering publication metadata from journal websites, collecting survey responses from public forums, extracting pricing data for economic research, or archiving web content for digital humanities projects, researchers can use EasySpider to build custom scrapers without writing Python or JavaScript code. The visual approach also makes scrapers easier to maintain and modify when target websites change their structure.
EasySpider runs as a desktop application on Windows, macOS, and Linux. It uses a built-in Chromium browser for rendering, which means it can handle JavaScript-heavy websites, single-page applications, and sites that require user interaction such as clicking buttons, scrolling, or filling forms. Scraped data can be exported as CSV, JSON, or directly to databases.
```bash
# Download the latest release for your platform from GitHub releases
# https://github.com/NaiboWang/EasySpider/releases

# macOS - download the .dmg file and drag to Applications

# Linux - download the AppImage
chmod +x EasySpider-linux-x86_64.AppImage
./EasySpider-linux-x86_64.AppImage

# Or run from source
git clone https://github.com/NaiboWang/EasySpider.git
cd EasySpider
npm install
npm start
```
EasySpider follows a visual task design approach with these steps:
Researchers can use EasySpider to gather publication information from journal websites, conference proceedings pages, or institutional repositories.
Example workflow for scraping a conference proceedings page:
```
Task: Daily scan of funding agency websites for new opportunities
Steps configured in EasySpider:
1. Navigate to funding agency announcement page
2. Extract: opportunity title, deadline, funding amount, eligibility
3. Filter: only new announcements (since last check)
4. Schedule: run daily at 8:00 AM
5. Export: append to CSV file, send notification email
```
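EasySpider's own filtering and scheduling settings are configured in its interface and are not detailed here. As a hedged sketch, the "only new announcements" and "append to CSV" steps could also be handled as a Python post-processing step over each export. The column names in `FIELDS` and the choice of `(title, deadline)` as the identity of an announcement are assumptions. Adapt them to the fields your task actually extracts.

```python
import csv
import os

# Hypothetical column names; match them to the fields defined in your task.
FIELDS = ["title", "deadline", "amount", "eligibility"]


def load_seen_keys(path: str) -> set[tuple[str, str]]:
    """Return the (title, deadline) pairs already stored in the CSV at `path`."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {(row["title"], row["deadline"]) for row in csv.DictReader(f)}


def append_new(path: str, rows: list[dict]) -> list[dict]:
    """Append only rows not seen in earlier runs and return the rows added."""
    seen = load_seen_keys(path)
    fresh = []
    for row in rows:
        key = (row["title"], row["deadline"])
        if key not in seen:
            seen.add(key)  # also drops duplicates within the current batch
            fresh.append(row)
    # Write the header only when starting a new (or empty) file.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(fresh)
    return fresh
```

Keying on title plus deadline treats a re-posted call with a new deadline as a new entry. If the site exposes a stable per-announcement URL, that is usually a sturdier key.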
For economics and social science research, EasySpider can collect publicly available data from government statistics portals, price comparison websites, and public registries.
```
Task: Collect commodity prices from public market websites
Fields to extract:
- commodity_name: product identifier
- price: current listed price
- unit: measurement unit
- date: listing date
- source_url: page URL for reference
Pagination: navigate through category pages
Schedule: weekly collection
Output: CSV with timestamp for time-series analysis
```
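To use the weekly CSV as a time series, each run's rows need a collection timestamp and a numeric price. Below is a minimal sketch in plain Python. It assumes prices arrive as display strings such as `$1,234.50 / kg`, that commas are thousands separators rather than decimal commas, and that a `collected_at` column (a hypothetical name) is added after export.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional

# Matches the first number in a price string, e.g. "1234.50" in "$1,234.50 / kg".
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Turn a listed price into a float, or None when no number is present."""
    # Assumes "," is a thousands separator; decimal-comma locales need other handling.
    match = PRICE_RE.search(text.replace(",", ""))
    return float(match.group()) if match else None


def stamp(rows: list[dict], collected_at: Optional[datetime] = None) -> list[dict]:
    """Add a numeric price and an ISO-8601 UTC collection timestamp to each row."""
    when = collected_at or datetime.now(timezone.utc)
    ts = when.isoformat(timespec="seconds")
    return [{**row, "price": parse_price(row["price"]), "collected_at": ts} for row in rows]


def to_series(rows: list[dict]) -> dict[str, list[tuple[str, Optional[float]]]]:
    """Group stamped rows into per-commodity (timestamp, price) lists, oldest first."""
    series = defaultdict(list)
    for row in rows:
        series[row["commodity_name"]].append((row["collected_at"], row["price"]))
    # ISO strings with the same UTC offset sort chronologically as plain text.
    return {name: sorted(points, key=lambda p: p[0]) for name, points in series.items()}
```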
```
Task: Archive public blog posts for discourse analysis
Configuration:
- Start URL: blog archive page
- Follow: links matching pattern /posts/*
- Extract per page:
  - post_title
  - post_date
  - author_name
  - post_content (full text)
  - comment_count
  - tags/categories
- Pagination: follow archive navigation links
- Output: JSON with full text content
```
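The `/posts/*` follow rule is set visually in EasySpider. The sketch below only illustrates what such a rule does to a page's links. It assumes glob-style matching on the URL path, keeps only links on the archive's own host, and drops `#fragment` duplicates. All three are illustrative choices, not documented EasySpider behavior.

```python
from fnmatch import fnmatchcase
from urllib.parse import urljoin, urlparse


def links_to_follow(base_url: str, hrefs: list[str], pattern: str = "/posts/*") -> list[str]:
    """Resolve hrefs against base_url and keep same-host links whose path matches pattern."""
    base_host = urlparse(base_url).netloc
    seen: set[str] = set()
    keep = []
    for href in hrefs:
        parsed = urlparse(urljoin(base_url, href))
        if parsed.netloc != base_host:
            continue  # skip links that leave the archive's site
        if not fnmatchcase(parsed.path, pattern):
            continue  # glob match; note that "*" also matches "/" here
        url = parsed._replace(fragment="").geturl()  # "#comments" is the same page
        if url not in seen:
            seen.add(url)
            keep.append(url)
    return keep
```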
EasySpider supports conditional branches in task flows:
Built-in text processing options can be applied during extraction:
For JavaScript-rendered pages, EasySpider provides options to:
For responsible scraping, EasySpider includes options to:
```
Request configuration:
- Delay between requests: 2-5 seconds (randomized)
- User-Agent rotation: enabled
- Concurrent requests: 1 (sequential for politeness)
- Respect robots.txt: check before scraping
- Rate limiting: max 30 requests per minute
```
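The request settings above translate into a small amount of pacing logic. The following is a hypothetical sketch: the `PoliteScheduler` class and `robots_allows` helper are illustrative and not part of EasySpider. It shows a randomized 2-5 second gap, a 30-per-minute cap over a sliding 60-second window, and a robots.txt check using Python's standard `urllib.robotparser`. User-Agent rotation is left out. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import random
import time
from collections import deque
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Sequential pacing: a random gap between requests plus a per-minute cap."""

    def __init__(
        self,
        min_delay: float = 2.0,
        max_delay: float = 5.0,
        max_per_minute: int = 30,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.rng = rng or random.Random()
        self._recent: deque = deque()  # start times of requests in the last 60 s

    def wait_turn(self) -> float:
        """Block until the next request may start; return the seconds waited."""
        waited = 0.0
        if self._recent:  # no delay before the very first request
            gap = self.rng.uniform(self.min_delay, self.max_delay)
            self.sleep(gap)
            waited += gap
        now = self.clock()
        # Forget requests that fell out of the sliding 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if len(self._recent) >= self.max_per_minute:
            # Wait until the oldest request in the window is 60 s old.
            extra = 60 - (now - self._recent[0])
            self.sleep(extra)
            waited += extra
            self._recent.popleft()
        self._recent.append(self.clock())
        return waited


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a live run you would fetch the site's real robots.txt with `RobotFileParser.set_url(...)` and `read()`, and call `wait_turn()` before every request. With 2-5 second gaps the 30-per-minute cap rarely triggers. It acts as a safety net if the delay settings are lowered.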
CSV is the most common format for researchers. Data is exported with column headers matching the field names defined during task design.
```csv
title,authors,year,journal,doi,abstract
"Machine Learning in Materials Science","Smith J, Lee K",2025,"Nature Materials","10.1038/xxx","Abstract text here..."
```
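Because `authors` is a single comma-separated cell in this export, it usually needs splitting before analysis. Here is a minimal sketch with the standard `csv` module. It assumes names never contain commas themselves, which holds for the `Smith J, Lee K` style shown but not for `Smith, J.`. The file name in the usage comment is hypothetical.

```python
import csv
from typing import Optional, TextIO


def load_records(f: TextIO) -> list[dict]:
    """Read a CSV export like the one above, splitting authors and typing the year."""
    records = []
    for row in csv.DictReader(f):
        # "Smith J, Lee K" -> ["Smith J", "Lee K"]; assumes no commas inside a name.
        row["authors"] = [name.strip() for name in row["authors"].split(",") if name.strip()]
        year: Optional[int] = int(row["year"]) if row["year"] else None
        row["year"] = year
        records.append(row)
    return records


# Usage with a file on disk (hypothetical name; newline="" is what csv expects):
# with open("proceedings.csv", newline="", encoding="utf-8") as f:
#     records = load_records(f)
```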
JSON output preserves nested structure for complex extractions:
```json
{
  "task_name": "proceedings_scrape",
  "extracted_at": "2026-03-10T14:30:00Z",
  "records": [
    {
      "title": "Machine Learning in Materials Science",
      "authors": ["Smith J", "Lee K"],
      "year": 2025,
      "metadata": {
        "journal": "Nature Materials",
        "doi": "10.1038/xxx"
      }
    }
  ]
}
```
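For tabular analysis, the nested records can be flattened into one row per record. This sketch assumes the structure shown above: a top-level `records` list, with an optional `metadata` object per record. Keys in `metadata` overwrite same-named top-level keys, and the file name in the usage comment is hypothetical.

```python
import json


def flatten_records(payload: dict) -> list[dict]:
    """Turn the nested export into one flat dict per record."""
    rows = []
    for record in payload.get("records", []):
        row = {key: value for key, value in record.items() if key != "metadata"}
        row.update(record.get("metadata", {}))  # metadata keys win on name clashes
        if isinstance(row.get("authors"), list):
            row["authors"] = "; ".join(row["authors"])  # one cell per record
        row["extracted_at"] = payload.get("extracted_at")  # keep provenance per row
        rows.append(row)
    return rows


# Usage with a file on disk (hypothetical file name):
# with open("proceedings_scrape.json", encoding="utf-8") as f:
#     rows = flatten_records(json.load(f))
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` or written out with `csv.DictWriter`.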
EasySpider can write directly to SQLite databases, which is convenient for subsequent analysis with Python pandas or R.
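```python
import sqlite3
from contextlib import closing

import pandas as pd

# Read EasySpider output database.
# The file and table names depend on your task's export settings; adjust them to match.
with closing(sqlite3.connect("easyspider_results.db")) as conn:
    df = pd.read_sql("SELECT * FROM scraped_data", conn)

# Process and analyze
print(f"Total records: {len(df)}")
print(df.describe())
```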
When using EasySpider for research data collection, follow these ethical guidelines: