Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

brycewang-stanford/easy-spider-guide

Name: easy-spider-guide
Author: brycewang-stanford

skills/43-wentorai-research-plugins/skills/tools/scraping/easy-spider-guide/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research easy-spider-guide

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

EasySpider Guide

Overview

EasySpider is a visual, no-code web crawler tool with over 44K stars on GitHub. It provides a graphical interface where users design web scraping tasks by interacting directly with target web pages, clicking on elements to extract, and defining navigation flows visually. No programming knowledge is required to build functional scrapers, making it accessible to researchers across all disciplines.

For academic researchers, data collection from web sources is a frequent need but often a technical barrier. Whether gathering publication metadata from journal websites, collecting survey responses from public forums, extracting pricing data for economic research, or archiving web content for digital humanities projects, EasySpider enables researchers to build custom scrapers without writing Python or JavaScript code. The visual approach also makes scrapers easier to maintain and modify when target websites change their structure.

EasySpider runs as a desktop application on Windows, macOS, and Linux. It uses a built-in Chromium browser for rendering, which means it can handle JavaScript-heavy websites, single-page applications, and sites that require user interaction such as clicking buttons, scrolling, or filling forms. Scraped data can be exported as CSV, JSON, or directly to databases.

Installation

Download and Setup

# Download the latest release for your platform from GitHub releases
# https://github.com/NaiboWang/EasySpider/releases

# macOS - download the .dmg file and drag to Applications

# Linux - download the AppImage
chmod +x EasySpider-linux-x86_64.AppImage
./EasySpider-linux-x86_64.AppImage

# Or run from source
git clone https://github.com/NaiboWang/EasySpider.git
cd EasySpider
npm install
npm start

System Requirements

Operating system: Windows 10+, macOS 10.15+, or Linux (Ubuntu 18.04+)
RAM: 4 GB minimum, 8 GB recommended for complex scraping tasks
Disk space: 500 MB for the application plus storage for scraped data
Network: stable internet connection for web scraping

Core Concepts

Task Design Workflow

EasySpider follows a visual task design approach with these steps:

Open target page - Enter the URL in EasySpider's built-in browser
Select elements - Click on the data elements you want to extract
Define fields - Name each extracted element (title, author, date, etc.)
Configure pagination - Click the "next page" button to set up pagination
Set extraction rules - Define how to handle lists, tables, and nested pages
Test and run - Preview results, then execute the full scraping task

Element Selection Modes

Single element - Click one element to extract that specific item
Similar elements - Click two similar items and EasySpider detects the pattern for all matching elements on the page
Table mode - Select a table header row to extract entire structured tables
Input mode - Define form fields to fill before extracting (useful for search-based data collection)

Research Use Cases

Collecting Publication Metadata

Researchers can use EasySpider to gather publication information from journal websites, conference proceedings pages, or institutional repositories.

Example workflow for scraping a conference proceedings page:

Navigate to the proceedings listing page
Click on the first paper title to mark it as a "title" field
Click on the second paper title; EasySpider recognizes the pattern and selects all titles
Similarly select author names, abstract snippets, and publication dates
If papers span multiple pages, click the "Next" pagination button
Configure "click into each paper" to follow links and extract full abstracts
Run the task and export as CSV

Monitoring Research Funding Opportunities

Task: Daily scan of funding agency websites for new opportunities

Steps configured in EasySpider:
1. Navigate to funding agency announcement page
2. Extract: opportunity title, deadline, funding amount, eligibility
3. Filter: only new announcements (since last check)
4. Schedule: run daily at 8:00 AM
5. Export: append to CSV file, send notification email

Gathering Economic Data from Public Sources

For economics and social science research, EasySpider can collect publicly available data from government statistics portals, price comparison websites, and public registries.

Task: Collect commodity prices from public market websites

Fields to extract:
- commodity_name: product identifier
- price: current listed price
- unit: measurement unit
- date: listing date
- source_url: page URL for reference

Pagination: navigate through category pages
Schedule: weekly collection
Output: CSV with timestamp for time-series analysis

Digital Humanities Web Archiving

Task: Archive public blog posts for discourse analysis

Configuration:
- Start URL: blog archive page
- Follow: links matching pattern /posts/*
- Extract per page:
  - post_title
  - post_date
  - author_name
  - post_content (full text)
  - comment_count
  - tags/categories
- Pagination: follow archive navigation links
- Output: JSON with full text content

Advanced Features

Conditional Logic

EasySpider supports conditional branches in task flows:

If element exists - Check for specific elements before attempting extraction
If text contains - Filter items based on content matching
Loop control - Set maximum iterations for pagination or nested page visits

Data Cleaning Options

Built-in text processing options can be applied during extraction:

Remove HTML tags from extracted text
Trim whitespace and normalize spacing
Extract numbers from mixed text fields
Apply regex patterns to clean specific formats
Convert date strings to standardized formats

Handling Dynamic Content

For JavaScript-rendered pages, EasySpider provides options to:

Wait for specific elements to appear before extracting
Scroll to load lazy-loaded content
Click "Load More" buttons automatically
Handle infinite scroll pages with configurable scroll limits

Anti-Detection Configuration

For responsible scraping, EasySpider includes options to:

Request configuration:
- Delay between requests: 2-5 seconds (randomized)
- User-Agent rotation: enabled
- Concurrent requests: 1 (sequential for politeness)
- Respect robots.txt: check before scraping
- Rate limiting: max 30 requests per minute

Exporting Research Data

CSV Export

The most common format for researchers. Data is exported with headers matching the field names defined during task design.

title,authors,year,journal,doi,abstract
"Machine Learning in Materials Science","Smith J, Lee K",2025,"Nature Materials","10.1038/xxx","Abstract text here..."

JSON Export

Preserves nested structure for complex extractions:

{
  "task_name": "proceedings_scrape",
  "extracted_at": "2026-03-10T14:30:00Z",
  "records": [
    {
      "title": "Machine Learning in Materials Science",
      "authors": ["Smith J", "Lee K"],
      "year": 2025,
      "metadata": {
        "journal": "Nature Materials",
        "doi": "10.1038/xxx"
      }
    }
  ]
}

Database Export

EasySpider can write directly to SQLite databases, which is convenient for subsequent analysis with Python pandas or R.

import sqlite3
import pandas as pd

# Read EasySpider output database
conn = sqlite3.connect("easyspider_results.db")
df = pd.read_sql("SELECT * FROM scraped_data", conn)

# Process and analyze
print(f"Total records: {len(df)}")
print(df.describe())
conn.close()

Ethical Web Scraping Guidelines for Researchers

When using EasySpider for research data collection, follow these ethical guidelines:

Check robots.txt before scraping any website
Respect rate limits and add appropriate delays between requests
Review terms of service for target websites
Use APIs when available rather than scraping HTML (many services offer research APIs)
Minimize data collection to only what is needed for the research question
Store data securely especially when collecting personal information
Cite data sources in publications and include data collection methodology
Obtain IRB approval if scraping involves human subjects data
Consider GDPR/privacy regulations when scraping data from EU sources

References

EasySpider GitHub repository: https://github.com/NaiboWang/EasySpider
EasySpider documentation: https://github.com/NaiboWang/EasySpider/wiki
EasySpider video tutorials: https://github.com/NaiboWang/EasySpider#tutorials
Web scraping ethics in research: https://doi.org/10.1177/2053951720943006

brycewang-stanford/easy-spider-guide

skills/43-wentorai-research-plugins/skills/tools/scraping/easy-spider-guide/SKILL.md

Guide to EasySpider for visual no-code web data collection

1,661 stars

development

Updated Jun 4, 2026

$ install --global

skillsauth

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research easy-spider-guide

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 4, 2026, 6:16 AM112.3s1 file scanned

SKILL.md

name:: easy-spider-guide
description:: Guide to EasySpider for visual no-code web data collection
emoji:: 🕷️
category:: tools
subcategory:: scraping
keywords:: ["web scraping", "visual crawler", "no-code scraping", "data collection", "research data", "web automation"]
source:: https://github.com/NaiboWang/EasySpider

EasySpider Guide

Overview

Installation

Download and Setup

# Download the latest release for your platform from GitHub releases
# https://github.com/NaiboWang/EasySpider/releases

# macOS - download the .dmg file and drag to Applications

# Linux - download the AppImage
chmod +x EasySpider-linux-x86_64.AppImage
./EasySpider-linux-x86_64.AppImage

# Or run from source
git clone https://github.com/NaiboWang/EasySpider.git
cd EasySpider
npm install
npm start

System Requirements

Operating system: Windows 10+, macOS 10.15+, or Linux (Ubuntu 18.04+)
RAM: 4 GB minimum, 8 GB recommended for complex scraping tasks
Disk space: 500 MB for the application plus storage for scraped data
Network: stable internet connection for web scraping

Core Concepts

Task Design Workflow

EasySpider follows a visual task design approach with these steps:

Open target page - Enter the URL in EasySpider's built-in browser
Select elements - Click on the data elements you want to extract
Define fields - Name each extracted element (title, author, date, etc.)
Configure pagination - Click the "next page" button to set up pagination
Set extraction rules - Define how to handle lists, tables, and nested pages
Test and run - Preview results, then execute the full scraping task

Element Selection Modes

Single element - Click one element to extract that specific item
Similar elements - Click two similar items and EasySpider detects the pattern for all matching elements on the page
Table mode - Select a table header row to extract entire structured tables
Input mode - Define form fields to fill before extracting (useful for search-based data collection)

Research Use Cases

Collecting Publication Metadata

Researchers can use EasySpider to gather publication information from journal websites, conference proceedings pages, or institutional repositories.

Example workflow for scraping a conference proceedings page:

Navigate to the proceedings listing page
Click on the first paper title to mark it as a "title" field
Click on the second paper title; EasySpider recognizes the pattern and selects all titles
Similarly select author names, abstract snippets, and publication dates
If papers span multiple pages, click the "Next" pagination button
Configure "click into each paper" to follow links and extract full abstracts
Run the task and export as CSV

Monitoring Research Funding Opportunities

Task: Daily scan of funding agency websites for new opportunities

Steps configured in EasySpider:
1. Navigate to funding agency announcement page
2. Extract: opportunity title, deadline, funding amount, eligibility
3. Filter: only new announcements (since last check)
4. Schedule: run daily at 8:00 AM
5. Export: append to CSV file, send notification email

Gathering Economic Data from Public Sources

For economics and social science research, EasySpider can collect publicly available data from government statistics portals, price comparison websites, and public registries.

Task: Collect commodity prices from public market websites

Fields to extract:
- commodity_name: product identifier
- price: current listed price
- unit: measurement unit
- date: listing date
- source_url: page URL for reference

Pagination: navigate through category pages
Schedule: weekly collection
Output: CSV with timestamp for time-series analysis

Digital Humanities Web Archiving

Task: Archive public blog posts for discourse analysis

Configuration:
- Start URL: blog archive page
- Follow: links matching pattern /posts/*
- Extract per page:
  - post_title
  - post_date
  - author_name
  - post_content (full text)
  - comment_count
  - tags/categories
- Pagination: follow archive navigation links
- Output: JSON with full text content

Advanced Features

Conditional Logic

EasySpider supports conditional branches in task flows:

If element exists - Check for specific elements before attempting extraction
If text contains - Filter items based on content matching
Loop control - Set maximum iterations for pagination or nested page visits

Data Cleaning Options

Built-in text processing options can be applied during extraction:

Remove HTML tags from extracted text
Trim whitespace and normalize spacing
Extract numbers from mixed text fields
Apply regex patterns to clean specific formats
Convert date strings to standardized formats

Handling Dynamic Content

For JavaScript-rendered pages, EasySpider provides options to:

Wait for specific elements to appear before extracting
Scroll to load lazy-loaded content
Click "Load More" buttons automatically
Handle infinite scroll pages with configurable scroll limits

Anti-Detection Configuration

For responsible scraping, EasySpider includes options to:

Request configuration:
- Delay between requests: 2-5 seconds (randomized)
- User-Agent rotation: enabled
- Concurrent requests: 1 (sequential for politeness)
- Respect robots.txt: check before scraping
- Rate limiting: max 30 requests per minute

Exporting Research Data

CSV Export

The most common format for researchers. Data is exported with headers matching the field names defined during task design.

title,authors,year,journal,doi,abstract
"Machine Learning in Materials Science","Smith J, Lee K",2025,"Nature Materials","10.1038/xxx","Abstract text here..."

JSON Export

Preserves nested structure for complex extractions:

{
  "task_name": "proceedings_scrape",
  "extracted_at": "2026-03-10T14:30:00Z",
  "records": [
    {
      "title": "Machine Learning in Materials Science",
      "authors": ["Smith J", "Lee K"],
      "year": 2025,
      "metadata": {
        "journal": "Nature Materials",
        "doi": "10.1038/xxx"
      }
    }
  ]
}

Database Export

EasySpider can write directly to SQLite databases, which is convenient for subsequent analysis with Python pandas or R.

import sqlite3
import pandas as pd

# Read EasySpider output database
conn = sqlite3.connect("easyspider_results.db")
df = pd.read_sql("SELECT * FROM scraped_data", conn)

# Process and analyze
print(f"Total records: {len(df)}")
print(df.describe())
conn.close()

Ethical Web Scraping Guidelines for Researchers

When using EasySpider for research data collection, follow these ethical guidelines:

Check robots.txt before scraping any website
Respect rate limits and add appropriate delays between requests
Review terms of service for target websites
Use APIs when available rather than scraping HTML (many services offer research APIs)
Minimize data collection to only what is needed for the research question
Store data securely especially when collecting personal information
Cite data sources in publications and include data collection methodology
Obtain IRB approval if scraping involves human subjects data
Consider GDPR/privacy regulations when scraping data from EU sources

References

EasySpider GitHub repository: https://github.com/NaiboWang/EasySpider
EasySpider documentation: https://github.com/NaiboWang/EasySpider/wiki
EasySpider video tutorials: https://github.com/NaiboWang/EasySpider#tutorials
Web scraping ethics in research: https://doi.org/10.1177/2053951720943006

Related Skills

brycewang-stanford/literature-review-tools

tools

VerifiedTrustedCommunity

Recommend AND run open-source AI tools, agents, Claude Code / Codex skills, and MCP servers for any stage of a literature review — searching, reading, extracting, synthesizing, screening, citation-checking, and paper writing. Use when the user asks "what tool should I use to..." OR "install/run/use <tool> to ..." for research/lit-review work: automating a survey or related-work section, PDF→Markdown extraction for LLMs (MinerU/marker/docling), PRISMA / systematic review (ASReview), citation-backed Q&A over PDFs (PaperQA2), wiring papers into Claude/Cursor via MCP (arxiv/paper-search/zotero servers), or chatting with a Zotero library. Ships a launcher (scripts/litrun.py) that installs each tool in an isolated venv and runs it. Curated catalog of 70+ vetted projects. 支持中英文（用于「文献综述工具选型」与「一键安装/运行」）。

3,109SKILL.mdUpdated Jul 28, 2026

brycewang-stanford/literature-review-tools

brycewang-stanford/auto-empirical-research-skills

development

VerifiedTrustedCommunity

Route empirical-research requests through the Auto-Empirical Research Skills catalog when this whole repository is installed as one skill in Codex, CodeBuddy, Claude Code, or another IDE. Use to choose and load the right vendored AERS skill for causal inference, econometrics, replication, data acquisition, manuscript writing, peer review and referee responses, citation checking, de-AIGC editing, or full empirical-paper workflows without reading the entire repository at once.

3,109SKILL.mdUpdated Jun 27, 2026

brycewang-stanford/auto-empirical-research-skills

brycewang-stanford/aer-preregistration

documentation

VerifiedTrustedCommunity

Use when the project collects primary data or runs a field, lab, or survey experiment, before the intervention begins — write the pre-analysis plan, size the sample from a power calculation, and register with the AEA RCT Registry. Apply after the design is chosen in aer-identification and before any outcome data are seen.

3,021SKILL.mdUpdated Jul 23, 2026

brycewang-stanford/aer-preregistration

brycewang-stanford/economist-data-skill

tools

VerifiedTrustedCommunity

Guide economists to authoritative data sources with explicit, confirmed data specifications before retrieval; interfaces with Playwright MCP to navigate portals and extract real data, not articles about data.

3,021SKILL.mdUpdated Jul 23, 2026

brycewang-stanford/economist-data-skill

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Copy into Claude Code skills folder (global)
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/43-wentorai-research-plugins/skills/tools/scraping/easy-spider-guide ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

1,661 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT