Name: actionbook-scraper
Author: actionbook

# Actionbook Scraper Skill

## ⚠️ CRITICAL: Two-Part Verification

Every generated script MUST pass BOTH checks:

| Check | What to Verify | Failure Example |
|-------|----------------|-----------------|
| Part 1: Script Runs | No errors, no timeouts | Selector not found |
| Part 2: Data Correct | Content matches expected | Extracted "Click to expand" instead of name |

```
┌─────────────────────────────────────────────────────┐
│   1. Generate Script                                │
│          ↓                                          │
│   2. Execute Script                                 │
│          ↓                                          │
│   3. Check Part 1: Script runs without errors?      │
│          ↓                                          │
│   4. Check Part 2: Data content is correct?         │
│      - Not empty                                    │
│      - Not placeholder text ("Loading...")          │
│      - Not UI text ("Click to expand")              │
│      - Fields mapped correctly                      │
│          ↓                                          │
│      ┌───┴───┐                                      │
│   BOTH Pass  Either Fails                           │
│      │           │                                  │
│      │           ↓                                  │
│      │       Is it Actionbook data issue?           │
│      │           │                                  │
│      │       ┌───┴───┐                              │
│      │      Yes      No                             │
│      │       │       │                              │
│      │       ↓       ↓                              │
│      │    Log to   Fix script                       │
│      │    .actionbook-issues.log                    │
│      │       │       │                              │
│      │       └───┬───┘                              │
│      │           ↓                                  │
│      │       Retry (max 3x)                         │
│      ↓                                              │
│   Output Script                                     │
└─────────────────────────────────────────────────────┘
```

## Default Output Format

```
/actionbook-scraper:generate <url>
```

DEFAULT = agent-browser script (bash commands)

```bash
agent-browser open "https://example.com"
agent-browser scroll down 2000
agent-browser get text ".selector"
agent-browser close
```

### With --standalone Flag

```
/actionbook-scraper:generate <url> --standalone
```

Output = Playwright JavaScript code

## Verification Requirements

### Two-Part Verification

Every generated script must pass BOTH checks:

| Check | What to Verify | Failure Action |
|-------|---------------|----------------|
| 1. Script Runs | No errors, no timeouts | Fix syntax/selector errors |
| 2. Data Correct | Content matches expected fields | Fix extraction logic |

### Part 1: Script Execution Check

- No runtime errors
- No timeout errors
- Browser closes properly

### Part 2: Data Content Check (CRITICAL)

Verify extracted data matches the expected structure:

```
Expected: Company name, description, website, year founded
Actual:   "Click to expand", "Loading...", empty strings

→ FAIL: Data content incorrect, need to fix extraction logic
```

Data validation rules:

| Rule | Example Failure | Fix |
|------|-----------------|-----|
| Fields not empty | name: "" | Check selector targets correct element |
| No placeholder text | name: "Loading..." | Add wait for dynamic content |
| No UI text | name: "Click to expand" | Extract after expanding, not button text |
| Correct data type | year: "View Details" | Wrong selector, fix field mapping |
| Reasonable count | Expected ~100, got 3 | Add scroll/pagination handling |
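The validation rules above could be checked mechanically before a script is accepted. The following is a minimal sketch, not part of the skill itself: the function name `validateRecords`, the `PLACEHOLDER_TEXT` list, and the `minCount` parameter are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: the names and the text list are illustrative assumptions.
const PLACEHOLDER_TEXT = new Set(['loading...', 'click to expand', 'view details']);

function validateRecords(records, requiredFields, minCount) {
  const problems = [];

  // Rule: reasonable count (e.g. expected ~100, got 3)
  if (records.length < minCount) {
    problems.push(`too few items: expected at least ${minCount}, got ${records.length}`);
  }

  records.forEach((record, index) => {
    for (const field of requiredFields) {
      const value = String(record[field] ?? '').trim();
      if (value === '') {
        // Rule: fields not empty
        problems.push(`item ${index}: "${field}" is empty`);
      } else if (PLACEHOLDER_TEXT.has(value.toLowerCase())) {
        // Rules: no placeholder text, no UI text
        problems.push(`item ${index}: "${field}" holds UI or placeholder text ("${value}")`);
      }
    }
  });

  return { ok: problems.length === 0, problems };
}
```

A caller would pass the extracted items, the field names it expects, and a lower bound on the item count, then treat `ok === false` as a Part 2 failure and use `problems` to decide what to fix.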

### For agent-browser Scripts

1. Execute the generated commands
2. Check script runs without errors
3. Check data content is correct:
   - Fields match expected structure
   - Values are actual data, not UI text
   - Count is reasonable
4. If failed:
   - Analyze what's wrong (script error vs data error)
   - Fix selector, wait logic, or extraction
   - Re-execute
5. If success:
   - Output the verified script
   - Show data preview with field validation

### For Playwright Scripts (--standalone)

1. Write script to a temp file (for example `/tmp/scraper.js`)
2. Run it with `node /tmp/scraper.js`
3. Check script runs without errors
4. Check output data is correct:
   - JSON structure matches expected fields
   - Values contain actual data
   - Count matches expected range
5. If failed:
   - Analyze error type
   - Fix script
   - Re-run
6. If success:
   - Output the verified script

## Architecture Overview

```
/generate <url>              → OUTPUT: agent-browser bash commands
/generate <url> --standalone → OUTPUT: Playwright .js file
```

```
┌─────────────────────────────────────────────────────────────┐
│                   /generate <url>                           │
│                                                             │
│   1. Search Actionbook → get selectors                      │
│   2. Generate OUTPUT:                                       │
│                                                             │
│      WITHOUT --standalone    │    WITH --standalone         │
│      ─────────────────────   │    ──────────────────        │
│      agent-browser commands  │    Playwright .js code       │
│                              │                              │
│      ```bash                 │    ```javascript             │
│      agent-browser open ...  │    const { chromium } = ...  │
│      agent-browser get ...   │    await page.goto(...)      │
│      agent-browser close     │    ```                       │
│      ```                     │                              │
└─────────────────────────────────────────────────────────────┘
```

## Tool Priority

| Operation | Primary Tool | Fallback | Notes |
|-----------|-------------|----------|-------|
| Find selectors for URL | search_actions | None | Search by domain/keywords |
| Get full selector details | get_action_by_id | None | Use action_id from search |
| List available sources | list_sources | search_sources | Browse all indexed sites |
| Generate agent-browser script | Agent (sonnet) | - | Default mode for /generate |
| Generate Playwright script | Agent (sonnet) | - | Use --standalone flag |
| Structure analysis | Agent (haiku) | - | Parse Actionbook response |
| Request new website | agent-browser | Manual | Submit to actionbook.dev (ONLY command that executes agent-browser) |

## Workflow Rules

### CRITICAL: Generate → Verify → Fix

Every generated script MUST be verified by executing it.

| Step | Action |
|------|--------|
| 1 | Generate script with Actionbook selectors |
| 2 | Execute script to verify it works |
| 3 | If failed: analyze error, fix script, go to step 2 |
| 4 | If success: output verified script + data preview |
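The four steps above amount to a bounded retry loop. Here is one way it might look, as a sketch only: `generate`, `execute`, `validate`, and `fix` are hypothetical caller-supplied functions, and reading "max 3x" as three attempts in total is an assumption.

```javascript
// Sketch only: generate, execute, validate, and fix are hypothetical callbacks.
async function generateVerified({ generate, execute, validate, fix, maxAttempts = 3 }) {
  let script = await generate();                  // Step 1: generate the script
  let lastProblems = [];

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      const data = await execute(script);         // Step 2 / Part 1: script runs
      const result = await validate(data);        // Part 2: data is correct
      if (result.ok) {
        return { script, data, attempts: attempt }; // Step 4: verified output
      }
      lastProblems = result.problems;
    } catch (error) {
      lastProblems = [error.message];             // runtime error or timeout
    }
    if (attempt < maxAttempts) {
      script = await fix(script, lastProblems);   // Step 3: fix, then go to step 2
    }
  }

  // Still failing after the last attempt: report the specific issue
  throw new Error(`Verification failed after ${maxAttempts} attempts: ${lastProblems.join('; ')}`);
}
```

Keeping the four operations as injected callbacks means the same loop can wrap either an agent-browser script or a Playwright script.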

### Verification Process

For agent-browser scripts:

```bash
# Execute each command
agent-browser open "https://example.com"
agent-browser wait --load networkidle
agent-browser get text ".selector"
# Check if data is returned
# If error → fix and retry
agent-browser close
```

For Playwright scripts (--standalone):

```bash
# Write to temp file and execute
node /tmp/scraper.js
# Check if output file has data
# If error → fix and retry
```

### Critical Rules

1. **ALWAYS verify generated scripts** - Execute and check BOTH parts
2. **Part 1: Script must run** - No errors, no timeouts
3. **Part 2: Data must be correct** - Not empty, not UI text, fields mapped correctly
4. **Fix errors automatically** - Don't output broken scripts or wrong data
5. **Use Actionbook MCP tools first** - Never guess selectors
6. Include scroll handling for lazy-loaded pages
7. Include expand/collapse logic for card-based layouts
8. **Always close browser** - Include `agent-browser close`
9. **Retry up to 3 times** - If still failing, report the specific issue

### Common Data Errors to Catch

| Error | Example | Fix |
|-------|---------|-----|
| Extracted button text | name: "Click to expand" | Extract content after expanding |
| Extracted placeholder | desc: "Loading..." | Add wait for dynamic content |
| Empty fields | name: "" | Fix selector |
| Wrong field mapping | year: "San Francisco" | Fix selector for each field |
| Too few items | Expected 100, got 3 | Add scroll/pagination |

### Record Actionbook Data Issues

If Actionbook selectors are wrong or outdated, record the issue in a local file:

```
.actionbook-issues.log
```

When to record:

- Selector doesn't exist on page
- Selector returns wrong element
- Page structure has changed
- Missing selectors for key elements

Log format:

```
[YYYY-MM-DD HH:MM] URL: {url}
Action ID: {action_id}
Issue Type: {selector_error | outdated | missing}
Details: {description}
Selector: {selector}
Expected: {what it should select}
Actual: {what it actually selects or error}
---
```
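For illustration, an entry in the log format above could be produced by a small Node.js helper. This is a sketch under assumptions: the function names and the shape of the `issue` object are hypothetical, and the timestamp uses local time because the format does not name a time zone.

```javascript
const fs = require('node:fs');

// Hypothetical helper: builds one entry in the log format described above.
function formatIssueEntry(issue, now = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const timestamp =
    `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())} ` +
    `${pad(now.getHours())}:${pad(now.getMinutes())}`;

  return [
    `[${timestamp}] URL: ${issue.url}`,
    `Action ID: ${issue.actionId}`,
    `Issue Type: ${issue.type}`,       // selector_error | outdated | missing
    `Details: ${issue.details}`,
    `Selector: ${issue.selector}`,
    `Expected: ${issue.expected}`,
    `Actual: ${issue.actual}`,
    '---',
    '',
  ].join('\n');
}

// Appends the entry to the local log file, creating the file if needed.
function logIssue(issue, file = '.actionbook-issues.log') {
  fs.appendFileSync(file, formatIssueEntry(issue));
}
```

Appending rather than overwriting keeps earlier entries, so repeated failures against the same selector remain visible.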

## Selector Priority

When Actionbook provides multiple selectors, prefer them in this order:

1. data-testid - Most stable, designed for automation
2. aria-label - Accessibility-based, semantic
3. css - Class-based selectors
4. xpath - Last resort, most fragile
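The ordering above can be expressed as a simple lookup. In this sketch the candidate shape (`{ type, value }`) is an assumption made for illustration; Actionbook's actual response format is described under Selector Data Structure.

```javascript
// Priority order taken from the list above, most stable first.
const SELECTOR_PRIORITY = ['data-testid', 'aria-label', 'css', 'xpath'];

// candidates: [{ type: 'css', value: '.card__title' }, ...] (assumed shape)
function pickSelector(candidates) {
  for (const type of SELECTOR_PRIORITY) {
    const match = candidates.find((c) => c.type === type && c.value);
    if (match) return match;
  }
  return null; // nothing usable: a case worth recording as a "missing" issue
}
```

For example, given an XPath and a CSS candidate for the same element, `pickSelector` returns the CSS one.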

## Commands

| Command | Description | Agent |
|---------|-------------|-------|
| `/actionbook-scraper:analyze <url>` | Analyze page structure and show available selectors | structure-analyzer |
| `/actionbook-scraper:generate <url>` | Generate agent-browser scraper script | code-generator |
| `/actionbook-scraper:generate <url> --standalone` | Generate Playwright/Puppeteer script | code-generator |
| `/actionbook-scraper:list-sources` | List websites with Actionbook data | - |
| `/actionbook-scraper:request-website <url>` | Request new website to be indexed (uses agent-browser) | website-requester |

## Data Flow

### Analyze Command

```
1. User: /actionbook-scraper:analyze https://example.com/page
2. Extract domain from URL → "example.com"
3. search_actions("example page") → [action_ids]
4. For best match: get_action_by_id(action_id) → full selector data
5. Structure-analyzer agent formats and presents findings
```
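Step 2 (extracting the domain) and the search query in step 3 could be derived from the URL as follows. This is a sketch, assuming the query is the domain followed by path keywords, which is the shape shown in the Generate flow (`"example.com page"`); the function name is hypothetical and the real agent may build its query differently.

```javascript
// Hypothetical helper: turns a page URL into a search_actions query string.
function buildSearchQuery(pageUrl) {
  const { hostname, pathname } = new URL(pageUrl);
  const domain = hostname.replace(/^www\./, '');        // "www.example.com" → "example.com"
  const keywords = pathname.split(/[^A-Za-z0-9]+/).filter(Boolean);
  return [domain, ...keywords].join(' ');
}
```

For `https://example.com/page` this yields `example.com page`.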

### Generate Command (Default: agent-browser script)

```
User: /actionbook-scraper:generate https://example.com/page

Step 1: Search Actionbook
  search_actions("example.com page") → action_ids

Step 2: Get selectors
  get_action_by_id(best_match) → selectors

Step 3: Generate agent-browser script
  agent-browser open "https://example.com/page"
  agent-browser wait --load networkidle
  agent-browser scroll down 2000
  agent-browser get text ".item-container"
  agent-browser close

Step 4: VERIFY script (REQUIRED)
  Execute the commands and check if data is extracted
  If failed → analyze error → fix script → retry (max 3x)

Step 5: Return verified script + data preview
```

**Example Output:**
````markdown
## Verified Scraper (agent-browser)

**Status**: ✅ Verified (extracted 50 items)

Run these commands to scrape:

```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

### Data Preview
```json
[
  {"name": "Item 1", "description": "..."},
  {"name": "Item 2", "description": "..."},
  // ... showing first 3 items
]
```
````

### Generate Command (--standalone: Playwright script)

```
User: /actionbook-scraper:generate https://example.com/page --standalone

Step 1: Search Actionbook for selectors
Step 2: Get full selector data
Step 3: Generate Playwright/Puppeteer script
Step 4: VERIFY script (REQUIRED)
  Write to temp file → node /tmp/scraper.js → check output
  If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview
```

**Example Output:**
````markdown
## Verified Scraper (Playwright)

**Status**: ✅ Verified (extracted 50 items)

```javascript
const { chromium } = require('playwright');
// ... generated code with Actionbook selectors
```

Usage:
```bash
npm install playwright
node scraper.js
```

### Data Preview
```json
[
  {"name": "Item 1", "description": "..."},
  // ... first 3 items
]
```
````

### Request Website Command

```
1. User: /actionbook-scraper:request-website https://newsite.com/page
2. Launch website-requester agent (uses agent-browser)
3. Agent workflow:
   a. agent-browser open "https://actionbook.dev/request-website"
   b. agent-browser snapshot -i (discover form selectors)
   c. agent-browser type <url-field> "https://newsite.com/page"
   d. agent-browser type <email-field> (optional)
   e. agent-browser type <usecase-field> (optional)
   f. agent-browser click <submit-button>
   g. agent-browser snapshot -i (verify submission)
   h. agent-browser close
4. Output: Confirmation of submission
```

## Selector Data Structure

Actionbook returns selector data in this format:

```json
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "content": "## Selector Reference\n\n| Element | CSS | XPath | Type |\n..."
}
```
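Because `content` is a Markdown string, a consumer has to parse the selector table out of it. The sketch below shows one way, assuming the table uses plain pipe-delimited rows with the header shown above; the function name is hypothetical, and cells that themselves contain a `|` character are not handled.

```javascript
// Hypothetical helper: extracts Markdown table rows from the `content` string.
function parseSelectorTable(content) {
  const rows = content
    .split('\n')
    .filter((line) => line.trim().startsWith('|'))
    .map((line) =>
      line.trim().replace(/^\||\|$/g, '').split('|').map((cell) => cell.trim())
    );
  if (rows.length < 2) return [];

  const [header, ...rest] = rows;
  return rest
    // Skip the |---|---| divider row under the header
    .filter((cells) => !cells.every((cell) => /^:?-+:?$/.test(cell)))
    // Key each cell by its column name, e.g. { Element, CSS, XPath, Type }
    .map((cells) => Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ''])));
}
```

Each returned object is keyed by the column names, so a row can be read as `row.CSS` or `row.XPath`.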

## Common Selector Patterns

Card-based layouts:

```
Container: .card-list, .grid-container
Card item: .card, .list-item
Card name: .card__title, .card-name
Card description: .card__description
Expand button: .card__expand, button.expand
```

Detail extraction (dt/dd pattern):

```javascript
// Common pattern for key-value pairs.
// `container` is the already-located detail element (e.g. an expanded card).
const details = {};
container.querySelectorAll('.info-item').forEach((item) => {
  // Optional chaining guards against items missing a <dt> or <dd>
  const label = item.querySelector('dt')?.textContent.trim();
  const value = item.querySelector('dd')?.textContent.trim();
  if (label) details[label] = value ?? '';
});
```

Table layouts:

```
Table: table, .data-table
Header: thead th, .table-header
Row: tbody tr, .table-row
Cell: td, .table-cell
```

## Page Type Detection

| Indicator | Page Type | Template |
|-----------|-----------|----------|
| Scroll to load more | Dynamic/Infinite | playwright-js (with scroll) |
| Click to expand | Card-based | playwright-js (with click) |
| Pagination links | Paginated | playwright-js (with pagination) |
| Static content | Static | puppeteer or playwright |
| SPA framework detected | SPA | playwright-js (network idle) |
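The table above maps observed indicators to a page type and template. A minimal sketch of that mapping follows; the signal names (`scrollToLoad`, `clickToExpand`, and so on) are hypothetical, and returning every matching rule with Static as the fallback is an assumption, since the table does not say how combined indicators should be resolved.

```javascript
// Rules copied from the table above; the `signal` keys are hypothetical names.
const PAGE_TYPE_RULES = [
  { signal: 'scrollToLoad', pageType: 'Dynamic/Infinite', template: 'playwright-js (with scroll)' },
  { signal: 'clickToExpand', pageType: 'Card-based', template: 'playwright-js (with click)' },
  { signal: 'paginationLinks', pageType: 'Paginated', template: 'playwright-js (with pagination)' },
  { signal: 'spaFramework', pageType: 'SPA', template: 'playwright-js (network idle)' },
];
const STATIC_PAGE = { signal: 'static', pageType: 'Static', template: 'puppeteer or playwright' };

// signals: { scrollToLoad: true, clickToExpand: false, ... }
function detectPageTypes(signals) {
  const matches = PAGE_TYPE_RULES.filter((rule) => signals[rule.signal]);
  // A page can match several rules (e.g. lazy-loaded cards); none means static
  return matches.length > 0 ? matches : [STATIC_PAGE];
}
```

A page with both lazy loading and expandable cards returns two rules, which signals that the generated script needs scroll handling and click handling together.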

## Output Formats

### Analysis Output

```markdown
## Page Analysis: {url}

### Matched Action
- **Action ID**: {action_id}
- **Confidence**: HIGH | MEDIUM | LOW

### Available Selectors

| Element | Selector | Type | Methods |
|---------|----------|------|---------|
| {name} | {selector} | {type} | {methods} |

### Page Structure
- **Type**: {static|dynamic|spa}
- **Data Pattern**: {cards|table|list}
- **Lazy Loading**: {yes|no}
- **Expand/Collapse**: {yes|no}

### Recommendations
- Suggested template: {template}
- Special handling needed: {notes}
```

### Generated Code Output

````markdown
## Generated Scraper

**Target URL**: {url}
**Template**: {template}
**Expected Output**: {description}

### Dependencies
```bash
npm install playwright
```

### Code
```
{generated_code}
```

### Usage
```bash
node scraper.js
```

### Output
Results saved to `{output_file}`
````


## Templates Reference

| Template | Flag | Output | Run With |
|----------|------|--------|----------|
| **agent-browser** | (default) | CLI commands | `agent-browser` CLI |
| playwright-js | --standalone | .js file | `node scraper.js` |
| playwright-python | --standalone --template playwright-python | .py file | `python scraper.py` |
| puppeteer | --standalone --template puppeteer | .js file | `node scraper.js` |
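As an illustration of how the flags in this table select a template, here is a small sketch. The function name and the argument-array input are assumptions; ignoring `--template` when `--standalone` is absent is also an assumption, since the table only lists `--template` together with `--standalone`.

```javascript
// Template names and run commands copied from the table above.
const TEMPLATES = {
  'agent-browser': { runWith: 'agent-browser CLI' },
  'playwright-js': { runWith: 'node scraper.js' },
  'playwright-python': { runWith: 'python scraper.py' },
  puppeteer: { runWith: 'node scraper.js' },
};

// args: the words after the URL, e.g. ['--standalone', '--template', 'puppeteer']
function resolveTemplate(args) {
  if (!args.includes('--standalone')) return 'agent-browser'; // default mode

  const index = args.indexOf('--template');
  const name = index >= 0 ? args[index + 1] : 'playwright-js';
  const known = Object.prototype.hasOwnProperty.call(TEMPLATES, name);
  if (!known || name === 'agent-browser') {
    throw new Error(`Unknown standalone template: ${name}`);
  }
  return name;
}
```

`TEMPLATES[resolveTemplate(args)].runWith` then gives the command shown in the "Run With" column.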

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| No actions found | URL not indexed | Use `/actionbook-scraper:request-website` to request indexing |
| Selectors not working | Page updated | Report to Actionbook, try alternative selectors |
| Timeout | Slow page load | Increase timeout, add retry logic |
| Empty data | Dynamic content | Add scroll/wait handling |
| Form submission failed | Network/page issue | Retry or submit manually at actionbook.dev |
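The "increase timeout, add retry logic" advice for timeouts might be implemented along these lines. This is a sketch with assumed names and defaults (`withRetry`, three attempts, a timeout that grows linearly per attempt); it is not taken from the skill's own templates.

```javascript
// Hypothetical helper: retries a task, giving it a larger timeout each time.
async function withRetry(task, { attempts = 3, baseTimeoutMs = 30000, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    const timeoutMs = baseTimeoutMs * attempt;   // slower pages get more time on each retry
    try {
      return await task({ attempt, timeoutMs });
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Short pause before retrying so a struggling server is not hammered
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;                               // report the specific issue
}
```

In a Playwright script the task could be `({ timeoutMs }) => page.goto(url, { timeout: timeoutMs })`, so each retry waits longer for a slow page.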

## agent-browser Usage

For the `request-website` command, the plugin uses **agent-browser CLI** to automate form submission.

### agent-browser Commands

```bash
# Open a URL
agent-browser open "https://actionbook.dev/request-website"

# Get page snapshot (discover selectors)
agent-browser snapshot -i

# Type into form field
agent-browser type "input[name='url']" "https://example.com"

# Click button
agent-browser click "button[type='submit']"

# Close browser (ALWAYS do this)
agent-browser close
```

### Selector Discovery

If form selectors are unknown, use snapshot to discover them:

```bash
agent-browser open "https://actionbook.dev/request-website"
agent-browser snapshot -i  # Returns page structure with selectors
```

### Always Close Browser

**Critical**: Always run `agent-browser close` at the end of any agent-browser session, even if errors occur.
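When agent-browser is driven from a Node.js wrapper, a `try`/`finally` block is one way to guarantee the close step. The sketch below is illustrative: `runSession` and the injectable `run` parameter are assumptions, and only subcommands that appear in this document (`open`, `get text`, `close`) are used.

```javascript
const { execFileSync } = require('node:child_process');

// Default runner shells out to the agent-browser CLI; a fake can be injected for testing.
const runCli = (args) => execFileSync('agent-browser', args, { encoding: 'utf8' });

// steps: e.g. [['open', 'https://example.com'], ['get', 'text', '.selector']]
function runSession(steps, run = runCli) {
  const outputs = [];
  try {
    for (const args of steps) outputs.push(run(args));
    return outputs;
  } finally {
    // Runs on success and on failure, so the browser is never left open
    try {
      run(['close']);
    } catch {
      // The browser may already be closed; nothing more to do
    }
  }
}
```

If any step throws, the error still propagates to the caller, but only after `close` has been attempted.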

## Rate Limiting

- **Actionbook MCP**: No rate limit for local usage
- **Target websites**: Respect robots.txt and add delays between requests
- **Recommended**: 1-2 second delay between page requests
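The recommended delay could be applied in a standalone script roughly as follows. This is a sketch: `scrapeSequentially` and the caller-supplied `scrapePage` function are hypothetical names, and robots.txt checking is not shown.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visits pages one at a time with a pause between requests.
// scrapePage is a caller-supplied async function (hypothetical).
async function scrapeSequentially(urls, scrapePage, { minDelayMs = 1000, maxDelayMs = 2000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i += 1) {
    if (i > 0) {
      // Random pause inside the recommended 1-2 second window
      await sleep(minDelayMs + Math.random() * (maxDelayMs - minDelayMs));
    }
    results.push(await scrapePage(urls[i]));
  }
  return results;
}
```

Processing pages sequentially rather than with `Promise.all` is deliberate here: it keeps at most one request in flight against the target site.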

## Examples

### Example 1: Generate agent-browser Script (Default)

```
/actionbook-scraper:generate https://firstround.com/companies
```

Output: agent-browser commands
```bash
agent-browser open "https://firstround.com/companies"
agent-browser scroll down 2000
agent-browser get text ".company-list-card-small"
agent-browser close
```

User runs these commands to scrape.

### Example 2: Generate Playwright Script

```
/actionbook-scraper:generate https://firstround.com/companies --standalone
```

Output: Playwright JavaScript code

```javascript
const { chromium } = require('playwright');
// ... full script
```

User runs: `node scraper.js`

### Example 3: Analyze Page Structure

```
/actionbook-scraper:analyze https://example.com/products
```

Output: Analysis showing:

- Available selectors
- Page structure
- Recommended approach

### Example 4: Request New Website

```
/actionbook-scraper:request-website https://newsite.com/data
```

Action: Submits form to actionbook.dev (this command DOES execute agent-browser)


## Best Practices

1. **Always analyze before generating** - Understand the page structure first
2. **Check list-sources** - Verify the site is indexed before attempting
3. **Review generated code** - Verify selectors match expected elements
4. **Add appropriate delays** - Be respectful to target servers
5. **Handle edge cases** - Empty states, loading states, errors
6. **Test incrementally** - Run on small subset before full scrape

