Meta Ad Spy — Competitor Ad Intelligence Skill

A two-phase skill for extracting and analyzing competitor ads from Meta platforms.

Architecture Overview

Phase 1: Playwright Scraper (No API key needed)
  └── facebook.com/ads/library → Ad creatives, copy, status, platforms, dates
  
Phase 2: Meta Graph API (Requires access token)
  └── graph.facebook.com/v23.0/ads_archive → Spend ranges, impressions, demographics
  
Analysis Layer: Claude synthesizes insights from both sources

PHASE 1: Playwright Scraper

When to use: Always as the first step, or when user has no API token.
What it gets: Ad creatives (image/video URLs), ad copy, CTA text, page name, start date, active status, platforms (Facebook/Instagram), ad format (carousel, video, static).
What it can't get: Spend ranges, impressions, demographic breakdown (those need Phase 2).

Setup

pip install playwright --break-system-packages
playwright install chromium
pip install asyncio --break-system-packages

Core Playwright Script

Write this to /tmp/meta_ad_scraper.py:

import asyncio
import json
import re
import sys
from playwright.async_api import async_playwright

async def scrape_ad_library(
    search_query: str = None,
    page_id: str = None,
    country: str = "ALL",
    ad_type: str = "all",           # all | political_and_issue_ads | housing_ads
    active_status: str = "active",  # active | inactive | all
    media_type: str = "all",        # all | image | meme | video | none
    max_ads: int = 50
) -> list[dict]:
    """
    Scrape Meta Ad Library for competitor ads.
    Either search_query or page_id must be provided.
    """
    results = []

    # Build URL
    base = "https://www.facebook.com/ads/library/?"
    params = {
        "active_status": active_status,
        "ad_type": ad_type,
        "country": country,
        "media_type": media_type,
    }
    if search_query:
        params["q"] = search_query
        params["search_type"] = "keyword_unordered"
    elif page_id:
        params["view_all_page_id"] = page_id
        params["search_type"] = "page"
    
    url = base + "&".join(f"{k}={v}" for k, v in params.items())

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--no-sandbox",
                "--disable-blink-features=AutomationControlled",
                "--disable-dev-shm-usage",
            ]
        )
        context = await browser.new_context(
            viewport={"width": 1440, "height": 900},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
            locale="en-US",
        )
        # Stealth: mask webdriver
        await context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
        """)

        page = await context.new_page()
        print(f"[Phase 1] Navigating to: {url}")
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await page.wait_for_timeout(3000)

        # Scroll to load more ads
        ads_loaded = 0
        scroll_attempts = 0
        while ads_loaded < max_ads and scroll_attempts < 20:
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)
            # Count ad cards
            ad_cards = await page.query_selector_all('[data-testid="ad-card"], ._7jvw, [class*="x8t9es0"]')
            ads_loaded = len(ad_cards)
            scroll_attempts += 1
            if scroll_attempts % 5 == 0:
                print(f"[Phase 1] Loaded {ads_loaded} ads so far...")

        # Extract ad data via JavaScript
        ads_data = await page.evaluate("""
            () => {
                const ads = [];
                // Meta Ad Library renders ads in divs; extract all visible text/image data
                // Look for ad archive links which contain library IDs
                const links = document.querySelectorAll('a[href*="ads/archive"]');
                const seen_ids = new Set();
                links.forEach(link => {
                    const href = link.href;
                    const id_match = href.match(/id=(\d+)/);
                    if (id_match && !seen_ids.has(id_match[1])) {
                        seen_ids.add(id_match[1]);
                        // Walk up to find the ad container
                        let container = link;
                        for (let i = 0; i < 8; i++) {
                            container = container.parentElement;
                            if (!container) break;
                        }
                        const getText = (el, fallback='') => el ? el.innerText.trim() : fallback;
                        const getAttr = (el, attr, fallback='') => el ? el.getAttribute(attr) || fallback : fallback;
                        
                        ads.push({
                            ad_archive_id: id_match[1],
                            ad_snapshot_url: href,
                            page_name: getText(container?.querySelector('[class*="page-name"], strong')),
                            ad_body: getText(container?.querySelector('[data-ad-preview="message"], [class*="body"]')),
                            ad_title: getText(container?.querySelector('[class*="title"]')),
                            cta_text: getText(container?.querySelector('[class*="cta"], button')),
                            image_url: getAttr(container?.querySelector('img[src*="fbcdn"]'), 'src'),
                            started_running: getText(container?.querySelector('[class*="started-running"]')),
                            platforms: Array.from(container?.querySelectorAll('[class*="platform"]') || []).map(el => el.innerText.trim()).filter(Boolean),
                            raw_text: container?.innerText?.substring(0, 500) || '',
                        });
                    }
                });
                return ads;
            }
        """)
        
        # Also capture network requests for richer data
        print(f"[Phase 1] Extracted {len(ads_data)} ads from DOM")
        results = ads_data[:max_ads]
        
        await browser.close()
    
    return results


async def main():
    query = sys.argv[1] if len(sys.argv) > 1 else "Nike shoes"
    ads = await scrape_ad_library(search_query=query, max_ads=20)
    print(json.dumps(ads, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    asyncio.run(main())

How to Run Phase 1

python /tmp/meta_ad_scraper.py "competitor brand name"

Or from within Python (for page ID lookups):

ads = await scrape_ad_library(page_id="434174436675167", active_status="active")

Filters Available in Phase 1

| Filter | Values | Notes | |--------|--------|-------| | active_status | active, inactive, all | active = currently running | | ad_type | all, political_and_issue_ads, housing_ads, employment_ads, credit_ads | Default: all | | country | ALL, US, IN, GB, DE, FR, AU, etc. | ISO codes | | media_type | all, image, meme, video, none | Filter by creative format | | search_query | Any keyword string | Brand name, product, keyword | | page_id | Facebook Page ID | More precise than keyword search |

PHASE 2: Meta Graph API

When to use: After Phase 1, or when user wants spend/impression/demographic data.
Requirements: Meta developer account + access token (see setup below).
What it gets: Spend ranges, impression ranges, demographic distribution (EU/political), delivery by region, ad creative details, estimated audience size.

Setup Instructions (tell the user)

Go to Meta for Developers → Create App
Go to facebook.com/ID → Confirm identity (required for spend data)
Generate a User Access Token with ads_read permission from Graph API Explorer
Set as env var: export META_ACCESS_TOKEN="your_token_here"

Core API Script

Write this to /tmp/meta_ad_api.py:

import requests
import json
import os
import time
import sys
from typing import Optional

META_API_VERSION = "v23.0"
BASE_URL = f"https://graph.facebook.com/{META_API_VERSION}/ads_archive"

# All available fields from the API
ALL_FIELDS = [
    "id",
    "ad_archive_id", 
    "ad_creative_bodies",
    "ad_creative_link_captions",
    "ad_creative_link_descriptions", 
    "ad_creative_link_titles",
    "ad_delivery_start_time",
    "ad_delivery_stop_time",
    "ad_snapshot_url",
    "bylines",
    "delivery_by_region",
    "demographic_distribution",
    "estimated_audience_size",
    "impressions",
    "page_id",
    "page_name",
    "publisher_platforms",
    "spend",
    "languages",
    "currency",
    "ad_creative_link_caption",
    "ad_creative_link_url",
]

def query_ad_library(
    access_token: str,
    search_terms: str = None,
    search_page_ids: list[str] = None,
    ad_reached_countries: list[str] = ["US"],
    ad_active_status: str = "ACTIVE",     # ACTIVE | INACTIVE | ALL
    ad_type: str = "ALL",                  # ALL | POLITICAL_AND_ISSUE_ADS | etc.
    ad_delivery_date_min: str = None,      # "YYYY-MM-DD"
    ad_delivery_date_max: str = None,      # "YYYY-MM-DD"
    publisher_platforms: list[str] = None, # ["FACEBOOK", "INSTAGRAM"]
    languages: list[str] = None,
    limit: int = 50,
    max_pages: int = 5,
) -> list[dict]:
    """
    Query Meta Ad Library API with full pagination support.
    Returns list of ad objects with all available fields.
    """
    if not access_token:
        raise ValueError("META_ACCESS_TOKEN is required for Phase 2")

    params = {
        "access_token": access_token,
        "ad_active_status": ad_active_status,
        "ad_type": ad_type,
        "ad_reached_countries": json.dumps(ad_reached_countries),
        "fields": ",".join(ALL_FIELDS),
        "limit": min(limit, 500),  # API max per page
    }
    
    if search_terms:
        params["search_terms"] = search_terms
    if search_page_ids:
        params["search_page_ids"] = ",".join(search_page_ids)
    if ad_delivery_date_min:
        params["ad_delivery_date_min"] = ad_delivery_date_min
    if ad_delivery_date_max:
        params["ad_delivery_date_max"] = ad_delivery_date_max
    if publisher_platforms:
        params["publisher_platforms"] = json.dumps(publisher_platforms)
    if languages:
        params["languages"] = json.dumps(languages)

    all_ads = []
    page_count = 0
    next_url = None

    while page_count < max_pages:
        try:
            if next_url:
                response = requests.get(next_url, timeout=30)
            else:
                response = requests.get(BASE_URL, params=params, timeout=30)
            
            response.raise_for_status()
            data = response.json()
            
            if "error" in data:
                print(f"[Phase 2] API Error: {data['error']}", file=sys.stderr)
                break
            
            ads = data.get("data", [])
            all_ads.extend(ads)
            page_count += 1
            print(f"[Phase 2] Page {page_count}: fetched {len(ads)} ads (total: {len(all_ads)})")
            
            # Pagination
            paging = data.get("paging", {})
            next_url = paging.get("next")
            if not next_url or len(all_ads) >= limit:
                break
            
            time.sleep(1)  # Rate limit courtesy
            
        except requests.exceptions.RequestException as e:
            print(f"[Phase 2] Request error: {e}", file=sys.stderr)
            break

    return all_ads[:limit]


def analyze_ads(ads: list[dict]) -> dict:
    """
    Extract competitive intelligence insights from raw ad data.
    """
    if not ads:
        return {"error": "No ads found"}

    # Spend analysis
    spends = []
    for ad in ads:
        spend = ad.get("spend", {})
        if isinstance(spend, dict):
            lo = spend.get("lower_bound", 0)
            hi = spend.get("upper_bound", 0)
            if lo and hi:
                spends.append({"ad_id": ad.get("ad_archive_id"), "min": int(lo), "max": int(hi), "midpoint": (int(lo)+int(hi))//2})

    # Platform distribution
    platform_counts = {}
    for ad in ads:
        for p in ad.get("publisher_platforms", []):
            platform_counts[p] = platform_counts.get(p, 0) + 1

    # Ad longevity (proxy for performance — longer running = likely working)
    from datetime import datetime
    long_running = []
    for ad in ads:
        start = ad.get("ad_delivery_start_time")
        if start:
            try:
                days = (datetime.now() - datetime.fromisoformat(start.replace("Z",""))).days
                long_running.append({"ad_id": ad.get("ad_archive_id"), "days_running": days, "page": ad.get("page_name")})
            except:
                pass
    long_running.sort(key=lambda x: x["days_running"], reverse=True)

    # Creative format distribution
    creative_bodies = [ad.get("ad_creative_bodies", []) for ad in ads if ad.get("ad_creative_bodies")]
    
    return {
        "total_ads": len(ads),
        "spend_analysis": {
            "ads_with_spend_data": len(spends),
            "estimated_total_min_spend": sum(s["min"] for s in spends),
            "estimated_total_max_spend": sum(s["max"] for s in spends),
            "top_spenders": sorted(spends, key=lambda x: x["midpoint"], reverse=True)[:5],
        },
        "platform_distribution": platform_counts,
        "longest_running_ads": long_running[:10],
        "pages_advertising": list(set(ad.get("page_name") for ad in ads if ad.get("page_name"))),
        "sample_creatives": [
            {
                "page": ad.get("page_name"),
                "body": (ad.get("ad_creative_bodies") or [""])[0][:300],
                "title": (ad.get("ad_creative_link_titles") or [""])[0],
                "platforms": ad.get("publisher_platforms", []),
                "snapshot_url": ad.get("ad_snapshot_url"),
            }
            for ad in ads[:10]
        ]
    }


if __name__ == "__main__":
    token = os.environ.get("META_ACCESS_TOKEN", "")
    search = sys.argv[1] if len(sys.argv) > 1 else "Nike"
    ads = query_ad_library(token, search_terms=search, ad_reached_countries=["US"], limit=50)
    analysis = analyze_ads(ads)
    print(json.dumps(analysis, indent=2, ensure_ascii=False))
    # Also save raw data
    with open("/tmp/meta_ads_raw.json", "w") as f:
        json.dump(ads, f, indent=2, ensure_ascii=False)
    print(f"\n[Phase 2] Raw data saved to /tmp/meta_ads_raw.json")

API Filter Reference

| Parameter | Values | Notes | |-----------|--------|-------| | search_terms | Any string | Keyword search in ad content | | search_page_ids | List of FB page IDs | Most precise competitor lookup | | ad_reached_countries | ["US"], ["IN"], ["GB","DE"] | Required parameter | | ad_active_status | ACTIVE, INACTIVE, ALL | ACTIVE = currently live | | ad_type | ALL, POLITICAL_AND_ISSUE_ADS, HOUSING_ADS, EMPLOYMENT_ADS, FINANCIAL_SERVICES | Filter by category | | ad_delivery_date_min | "2024-01-01" | Start of date range | | ad_delivery_date_max | "2024-12-31" | End of date range | | publisher_platforms | ["FACEBOOK"], ["INSTAGRAM"], ["FACEBOOK","INSTAGRAM"] | Platform filter | | languages | ["en"], ["hi"], ["es"] | Language codes |

Data Fields Available from API

Always available (all ads):

ad_archive_id — Unique ad ID
page_id, page_name — Advertiser page
ad_creative_bodies — Ad copy text(s)
ad_creative_link_titles, ad_creative_link_descriptions — Headlines
ad_delivery_start_time, ad_delivery_stop_time — Run dates
publisher_platforms — FB/Instagram/Messenger/Audience Network
ad_snapshot_url — Link to view the actual ad

EU/UK/Political ads only:

spend — {lower_bound, upper_bound, currency} — Spend RANGE, not exact
impressions — {lower_bound, upper_bound} — Impression RANGE
estimated_audience_size — {lower_bound, upper_bound}
demographic_distribution — [{age, gender, percentage}] array
delivery_by_region — Geographic breakdown
bylines — "Paid for by" disclaimer

⚠️ Important: Spend and impressions are RANGES, not exact numbers. For non-EU/non-political ads in most countries including US and India, spend/impression data will NOT be returned. The official API is primarily a transparency tool. For richer commercial ad data, see the third-party alternatives in references/alternatives.md.

ANALYSIS WORKFLOW

When a user wants competitor ad intelligence, follow this flow:

Step 1 — Clarify the Target

Ask (or infer from context):

Who — brand name OR Facebook Page ID (better)
Where — country/region (US, IN, ALL, etc.)
What — active only, or historical too?
Goal — creative inspiration, spend monitoring, format analysis, copy patterns?

Step 2 — Find the Page ID (if only brand name given)

# Tell user to visit: https://www.facebook.com/ads/library/?q=BRAND_NAME
# The page_id appears in the URL when clicking on a page
# OR use the search API:
curl "https://graph.facebook.com/v23.0/pages/search?q=BRAND_NAME&access_token=TOKEN"

Step 3 — Run Phase 1 (Playwright)

Always run Phase 1 first. Write and execute /tmp/meta_ad_scraper.py.

Step 4 — Run Phase 2 (API), if token available

Check for META_ACCESS_TOKEN env var. If present, run Phase 2. If missing, tell user what Phase 2 would add, and give setup instructions.

Step 5 — Synthesize & Report

Produce a structured competitive intelligence report covering:

## 🕵️ Competitor Ad Intelligence Report: [Brand Name]

### 1. Current Ad Activity
- How many ads active right now
- Platforms being used (FB vs Instagram split)
- Ad formats (video, image, carousel)

### 2. Creative Strategy Analysis  
- Common themes in ad copy
- CTA patterns (Shop Now, Learn More, Sign Up, etc.)
- Headline formulas being used
- Hook styles (question, statement, social proof, urgency)

### 3. Ad Longevity Signals
- Longest-running ads (strong = likely performing well)
- New ads launched recently (testing phase)

### 4. Spend & Scale Signals (Phase 2 only, EU/political)
- Estimated spend ranges
- Impression volume estimates
- Geographic distribution

### 5. Audience Signals (Phase 2 EU only)
- Age/gender demographic breakdown
- Platform delivery split

### 6. Strategic Recommendations
- Gaps in competitor's strategy you can exploit
- Formats/messages they're NOT using
- High-performing creative patterns to draw inspiration from

COMMON WORKFLOWS

"What ads is [Brand] running right now?"

# Phase 1
ads = await scrape_ad_library(search_query="Brand Name", active_status="active")

# Phase 2 (if token available)
ads_api = query_ad_library(token, search_terms="Brand Name", ad_active_status="ACTIVE")

"Show me competitor video ads in India"

ads = await scrape_ad_library(
    search_query="competitor name",
    country="IN",
    media_type="video",
    active_status="active"
)

"How much is [Brand] spending on ads?" (EU/political only)

ads = query_ad_library(
    token,
    search_terms="Brand",
    ad_reached_countries=["GB"],  # or EU countries
    ad_type="ALL",
)
analysis = analyze_ads(ads)
# Look at analysis["spend_analysis"]

"Show me ads that have been running the longest" (= likely winners)

ads = query_ad_library(token, search_terms="Brand", ad_active_status="ALL")
analysis = analyze_ads(ads)
# analysis["longest_running_ads"] — sorted by days running

"Find ads about [topic/product keyword]"

ads = await scrape_ad_library(search_query="keyword phrase", active_status="all")

LIMITATIONS & WORKAROUNDS

| Limitation | Workaround | |------------|------------| | Spend data only for EU/political | Target EU countries in API query | | No CTR/conversion data | Use ad longevity as performance proxy | | Phase 1 DOM selectors may break | Fall back to raw text extraction + Claude analysis | | Rate limits on API | Add time.sleep(1) between pages, use cursor pagination | | Max ~429 ads per API session | Run multiple targeted queries, filter by date ranges | | No exact targeting info | Infer from demographic distribution (EU only) |

NOTES ON LEGALITY & ETHICS

The Meta Ad Library is public data — no login required for commercial ads
Using it for competitive research is explicitly Meta's stated purpose for the tool
The official API is a transparency tool — use it as intended
Playwright scraping of public pages is generally legal (ref: hiQ v. LinkedIn, 2022)
Do NOT attempt to scrape user data, private profiles, or non-public content
Always respect rate limits and avoid aggressive scraping

Meta Ad Spy — Competitor Ad Intelligence Skill

A two-phase skill for extracting and analyzing competitor ads from Meta platforms.

Architecture Overview

Phase 1: Playwright Scraper (No API key needed)
  └── facebook.com/ads/library → Ad creatives, copy, status, platforms, dates
  
Phase 2: Meta Graph API (Requires access token)
  └── graph.facebook.com/v23.0/ads_archive → Spend ranges, impressions, demographics
  
Analysis Layer: Claude synthesizes insights from both sources

PHASE 1: Playwright Scraper

Setup

pip install playwright --break-system-packages
playwright install chromium
pip install asyncio --break-system-packages

Core Playwright Script

Write this to /tmp/meta_ad_scraper.py:

import asyncio
import json
import re
import sys
from playwright.async_api import async_playwright

async def scrape_ad_library(
    search_query: str = None,
    page_id: str = None,
    country: str = "ALL",
    ad_type: str = "all",           # all | political_and_issue_ads | housing_ads
    active_status: str = "active",  # active | inactive | all
    media_type: str = "all",        # all | image | meme | video | none
    max_ads: int = 50
) -> list[dict]:
    """
    Scrape Meta Ad Library for competitor ads.
    Either search_query or page_id must be provided.
    """
    results = []

    # Build URL
    base = "https://www.facebook.com/ads/library/?"
    params = {
        "active_status": active_status,
        "ad_type": ad_type,
        "country": country,
        "media_type": media_type,
    }
    if search_query:
        params["q"] = search_query
        params["search_type"] = "keyword_unordered"
    elif page_id:
        params["view_all_page_id"] = page_id
        params["search_type"] = "page"
    
    url = base + "&".join(f"{k}={v}" for k, v in params.items())

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--no-sandbox",
                "--disable-blink-features=AutomationControlled",
                "--disable-dev-shm-usage",
            ]
        )
        context = await browser.new_context(
            viewport={"width": 1440, "height": 900},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
            locale="en-US",
        )
        # Stealth: mask webdriver
        await context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
        """)

        page = await context.new_page()
        print(f"[Phase 1] Navigating to: {url}")
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await page.wait_for_timeout(3000)

        # Scroll to load more ads
        ads_loaded = 0
        scroll_attempts = 0
        while ads_loaded < max_ads and scroll_attempts < 20:
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)
            # Count ad cards
            ad_cards = await page.query_selector_all('[data-testid="ad-card"], ._7jvw, [class*="x8t9es0"]')
            ads_loaded = len(ad_cards)
            scroll_attempts += 1
            if scroll_attempts % 5 == 0:
                print(f"[Phase 1] Loaded {ads_loaded} ads so far...")

        # Extract ad data via JavaScript
        ads_data = await page.evaluate("""
            () => {
                const ads = [];
                // Meta Ad Library renders ads in divs; extract all visible text/image data
                // Look for ad archive links which contain library IDs
                const links = document.querySelectorAll('a[href*="ads/archive"]');
                const seen_ids = new Set();
                links.forEach(link => {
                    const href = link.href;
                    const id_match = href.match(/id=(\d+)/);
                    if (id_match && !seen_ids.has(id_match[1])) {
                        seen_ids.add(id_match[1]);
                        // Walk up to find the ad container
                        let container = link;
                        for (let i = 0; i < 8; i++) {
                            container = container.parentElement;
                            if (!container) break;
                        }
                        const getText = (el, fallback='') => el ? el.innerText.trim() : fallback;
                        const getAttr = (el, attr, fallback='') => el ? el.getAttribute(attr) || fallback : fallback;
                        
                        ads.push({
                            ad_archive_id: id_match[1],
                            ad_snapshot_url: href,
                            page_name: getText(container?.querySelector('[class*="page-name"], strong')),
                            ad_body: getText(container?.querySelector('[data-ad-preview="message"], [class*="body"]')),
                            ad_title: getText(container?.querySelector('[class*="title"]')),
                            cta_text: getText(container?.querySelector('[class*="cta"], button')),
                            image_url: getAttr(container?.querySelector('img[src*="fbcdn"]'), 'src'),
                            started_running: getText(container?.querySelector('[class*="started-running"]')),
                            platforms: Array.from(container?.querySelectorAll('[class*="platform"]') || []).map(el => el.innerText.trim()).filter(Boolean),
                            raw_text: container?.innerText?.substring(0, 500) || '',
                        });
                    }
                });
                return ads;
            }
        """)
        
        # Also capture network requests for richer data
        print(f"[Phase 1] Extracted {len(ads_data)} ads from DOM")
        results = ads_data[:max_ads]
        
        await browser.close()
    
    return results


async def main():
    query = sys.argv[1] if len(sys.argv) > 1 else "Nike shoes"
    ads = await scrape_ad_library(search_query=query, max_ads=20)
    print(json.dumps(ads, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    asyncio.run(main())

How to Run Phase 1

python /tmp/meta_ad_scraper.py "competitor brand name"

Or from within Python (for page ID lookups):

ads = await scrape_ad_library(page_id="434174436675167", active_status="active")

Filters Available in Phase 1

PHASE 2: Meta Graph API

Setup Instructions (tell the user)

Go to Meta for Developers → Create App
Go to facebook.com/ID → Confirm identity (required for spend data)
Generate a User Access Token with ads_read permission from Graph API Explorer
Set as env var: export META_ACCESS_TOKEN="your_token_here"

Core API Script

Write this to /tmp/meta_ad_api.py:

import requests
import json
import os
import time
import sys
from typing import Optional

META_API_VERSION = "v23.0"
BASE_URL = f"https://graph.facebook.com/{META_API_VERSION}/ads_archive"

# All available fields from the API
ALL_FIELDS = [
    "id",
    "ad_archive_id", 
    "ad_creative_bodies",
    "ad_creative_link_captions",
    "ad_creative_link_descriptions", 
    "ad_creative_link_titles",
    "ad_delivery_start_time",
    "ad_delivery_stop_time",
    "ad_snapshot_url",
    "bylines",
    "delivery_by_region",
    "demographic_distribution",
    "estimated_audience_size",
    "impressions",
    "page_id",
    "page_name",
    "publisher_platforms",
    "spend",
    "languages",
    "currency",
    "ad_creative_link_caption",
    "ad_creative_link_url",
]

def query_ad_library(
    access_token: str,
    search_terms: str = None,
    search_page_ids: list[str] = None,
    ad_reached_countries: list[str] = ["US"],
    ad_active_status: str = "ACTIVE",     # ACTIVE | INACTIVE | ALL
    ad_type: str = "ALL",                  # ALL | POLITICAL_AND_ISSUE_ADS | etc.
    ad_delivery_date_min: str = None,      # "YYYY-MM-DD"
    ad_delivery_date_max: str = None,      # "YYYY-MM-DD"
    publisher_platforms: list[str] = None, # ["FACEBOOK", "INSTAGRAM"]
    languages: list[str] = None,
    limit: int = 50,
    max_pages: int = 5,
) -> list[dict]:
    """
    Query Meta Ad Library API with full pagination support.
    Returns list of ad objects with all available fields.
    """
    if not access_token:
        raise ValueError("META_ACCESS_TOKEN is required for Phase 2")

    params = {
        "access_token": access_token,
        "ad_active_status": ad_active_status,
        "ad_type": ad_type,
        "ad_reached_countries": json.dumps(ad_reached_countries),
        "fields": ",".join(ALL_FIELDS),
        "limit": min(limit, 500),  # API max per page
    }
    
    if search_terms:
        params["search_terms"] = search_terms
    if search_page_ids:
        params["search_page_ids"] = ",".join(search_page_ids)
    if ad_delivery_date_min:
        params["ad_delivery_date_min"] = ad_delivery_date_min
    if ad_delivery_date_max:
        params["ad_delivery_date_max"] = ad_delivery_date_max
    if publisher_platforms:
        params["publisher_platforms"] = json.dumps(publisher_platforms)
    if languages:
        params["languages"] = json.dumps(languages)

    all_ads = []
    page_count = 0
    next_url = None

    while page_count < max_pages:
        try:
            if next_url:
                response = requests.get(next_url, timeout=30)
            else:
                response = requests.get(BASE_URL, params=params, timeout=30)
            
            response.raise_for_status()
            data = response.json()
            
            if "error" in data:
                print(f"[Phase 2] API Error: {data['error']}", file=sys.stderr)
                break
            
            ads = data.get("data", [])
            all_ads.extend(ads)
            page_count += 1
            print(f"[Phase 2] Page {page_count}: fetched {len(ads)} ads (total: {len(all_ads)})")
            
            # Pagination
            paging = data.get("paging", {})
            next_url = paging.get("next")
            if not next_url or len(all_ads) >= limit:
                break
            
            time.sleep(1)  # Rate limit courtesy
            
        except requests.exceptions.RequestException as e:
            print(f"[Phase 2] Request error: {e}", file=sys.stderr)
            break

    return all_ads[:limit]


def analyze_ads(ads: list[dict]) -> dict:
    """
    Extract competitive intelligence insights from raw ad data.
    """
    if not ads:
        return {"error": "No ads found"}

    # Spend analysis
    spends = []
    for ad in ads:
        spend = ad.get("spend", {})
        if isinstance(spend, dict):
            lo = spend.get("lower_bound", 0)
            hi = spend.get("upper_bound", 0)
            if lo and hi:
                spends.append({"ad_id": ad.get("ad_archive_id"), "min": int(lo), "max": int(hi), "midpoint": (int(lo)+int(hi))//2})

    # Platform distribution
    platform_counts = {}
    for ad in ads:
        for p in ad.get("publisher_platforms", []):
            platform_counts[p] = platform_counts.get(p, 0) + 1

    # Ad longevity (proxy for performance — longer running = likely working)
    from datetime import datetime
    long_running = []
    for ad in ads:
        start = ad.get("ad_delivery_start_time")
        if start:
            try:
                days = (datetime.now() - datetime.fromisoformat(start.replace("Z",""))).days
                long_running.append({"ad_id": ad.get("ad_archive_id"), "days_running": days, "page": ad.get("page_name")})
            except:
                pass
    long_running.sort(key=lambda x: x["days_running"], reverse=True)

    # Creative format distribution
    creative_bodies = [ad.get("ad_creative_bodies", []) for ad in ads if ad.get("ad_creative_bodies")]
    
    return {
        "total_ads": len(ads),
        "spend_analysis": {
            "ads_with_spend_data": len(spends),
            "estimated_total_min_spend": sum(s["min"] for s in spends),
            "estimated_total_max_spend": sum(s["max"] for s in spends),
            "top_spenders": sorted(spends, key=lambda x: x["midpoint"], reverse=True)[:5],
        },
        "platform_distribution": platform_counts,
        "longest_running_ads": long_running[:10],
        "pages_advertising": list(set(ad.get("page_name") for ad in ads if ad.get("page_name"))),
        "sample_creatives": [
            {
                "page": ad.get("page_name"),
                "body": (ad.get("ad_creative_bodies") or [""])[0][:300],
                "title": (ad.get("ad_creative_link_titles") or [""])[0],
                "platforms": ad.get("publisher_platforms", []),
                "snapshot_url": ad.get("ad_snapshot_url"),
            }
            for ad in ads[:10]
        ]
    }


if __name__ == "__main__":
    token = os.environ.get("META_ACCESS_TOKEN", "")
    search = sys.argv[1] if len(sys.argv) > 1 else "Nike"
    ads = query_ad_library(token, search_terms=search, ad_reached_countries=["US"], limit=50)
    analysis = analyze_ads(ads)
    print(json.dumps(analysis, indent=2, ensure_ascii=False))
    # Also save raw data
    with open("/tmp/meta_ads_raw.json", "w") as f:
        json.dump(ads, f, indent=2, ensure_ascii=False)
    print(f"\n[Phase 2] Raw data saved to /tmp/meta_ads_raw.json")

API Filter Reference

Data Fields Available from API

Always available (all ads):

ad_archive_id — Unique ad ID
page_id, page_name — Advertiser page
ad_creative_bodies — Ad copy text(s)
ad_creative_link_titles, ad_creative_link_descriptions — Headlines
ad_delivery_start_time, ad_delivery_stop_time — Run dates
publisher_platforms — FB/Instagram/Messenger/Audience Network
ad_snapshot_url — Link to view the actual ad

EU/UK/Political ads only:

spend — {lower_bound, upper_bound, currency} — Spend RANGE, not exact
impressions — {lower_bound, upper_bound} — Impression RANGE
estimated_audience_size — {lower_bound, upper_bound}
demographic_distribution — [{age, gender, percentage}] array
delivery_by_region — Geographic breakdown
bylines — "Paid for by" disclaimer

⚠️ Important: Spend and impressions are RANGES, not exact numbers. For non-EU/non-political ads in most countries including US and India, spend/impression data will NOT be returned. The official API is primarily a transparency tool. For richer commercial ad data, see the third-party alternatives in references/alternatives.md.

ANALYSIS WORKFLOW

When a user wants competitor ad intelligence, follow this flow:

Step 1 — Clarify the Target

Ask (or infer from context):

Who — brand name OR Facebook Page ID (better)
Where — country/region (US, IN, ALL, etc.)
What — active only, or historical too?
Goal — creative inspiration, spend monitoring, format analysis, copy patterns?

Step 2 — Find the Page ID (if only brand name given)

# Tell user to visit: https://www.facebook.com/ads/library/?q=BRAND_NAME
# The page_id appears in the URL when clicking on a page
# OR use the search API:
curl "https://graph.facebook.com/v23.0/pages/search?q=BRAND_NAME&access_token=TOKEN"

Step 3 — Run Phase 1 (Playwright)

Always run Phase 1 first. Write and execute /tmp/meta_ad_scraper.py.

Step 4 — Run Phase 2 (API), if token available

Check for META_ACCESS_TOKEN env var. If present, run Phase 2. If missing, tell user what Phase 2 would add, and give setup instructions.

Step 5 — Synthesize & Report

Produce a structured competitive intelligence report covering:

## 🕵️ Competitor Ad Intelligence Report: [Brand Name]

### 1. Current Ad Activity
- How many ads active right now
- Platforms being used (FB vs Instagram split)
- Ad formats (video, image, carousel)

### 2. Creative Strategy Analysis  
- Common themes in ad copy
- CTA patterns (Shop Now, Learn More, Sign Up, etc.)
- Headline formulas being used
- Hook styles (question, statement, social proof, urgency)

### 3. Ad Longevity Signals
- Longest-running ads (strong = likely performing well)
- New ads launched recently (testing phase)

### 4. Spend & Scale Signals (Phase 2 only, EU/political)
- Estimated spend ranges
- Impression volume estimates
- Geographic distribution

### 5. Audience Signals (Phase 2 EU only)
- Age/gender demographic breakdown
- Platform delivery split

### 6. Strategic Recommendations
- Gaps in competitor's strategy you can exploit
- Formats/messages they're NOT using
- High-performing creative patterns to draw inspiration from

COMMON WORKFLOWS

"What ads is [Brand] running right now?"

# Phase 1
ads = await scrape_ad_library(search_query="Brand Name", active_status="active")

# Phase 2 (if token available)
ads_api = query_ad_library(token, search_terms="Brand Name", ad_active_status="ACTIVE")

"Show me competitor video ads in India"

ads = await scrape_ad_library(
    search_query="competitor name",
    country="IN",
    media_type="video",
    active_status="active"
)

"How much is [Brand] spending on ads?" (EU/political only)

ads = query_ad_library(
    token,
    search_terms="Brand",
    ad_reached_countries=["GB"],  # or EU countries
    ad_type="ALL",
)
analysis = analyze_ads(ads)
# Look at analysis["spend_analysis"]

"Show me ads that have been running the longest" (= likely winners)

ads = query_ad_library(token, search_terms="Brand", ad_active_status="ALL")
analysis = analyze_ads(ads)
# analysis["longest_running_ads"] — sorted by days running

"Find ads about [topic/product keyword]"

ads = await scrape_ad_library(search_query="keyword phrase", active_status="all")

LIMITATIONS & WORKAROUNDS

NOTES ON LEGALITY & ETHICS

The Meta Ad Library is public data — no login required for commercial ads
Using it for competitive research is explicitly Meta's stated purpose for the tool
The official API is a transparency tool — use it as intended
Playwright scraping of public pages is generally legal (ref: hiQ v. LinkedIn, 2022)
Do NOT attempt to scrape user data, private profiles, or non-public content
Always respect rate limits and avoid aggressive scraping

Adoption

openclaw/meta-ad-spy

$ install --global

Security Scan Results

SKILL.md

Meta Ad Spy — Competitor Ad Intelligence Skill

Architecture Overview

PHASE 1: Playwright Scraper

Setup

Core Playwright Script

How to Run Phase 1

Filters Available in Phase 1

PHASE 2: Meta Graph API

Setup Instructions (tell the user)

Core API Script

API Filter Reference

Data Fields Available from API

ANALYSIS WORKFLOW

Step 1 — Clarify the Target

Step 2 — Find the Page ID (if only brand name given)

Step 3 — Run Phase 1 (Playwright)

Step 4 — Run Phase 2 (API), if token available

Step 5 — Synthesize & Report

COMMON WORKFLOWS

"What ads is [Brand] running right now?"

"Show me competitor video ads in India"

"How much is [Brand] spending on ads?" (EU/political only)

"Show me ads that have been running the longest" (= likely winners)

"Find ads about [topic/product keyword]"

LIMITATIONS & WORKAROUNDS

NOTES ON LEGALITY & ETHICS

SEE ALSO

Related Skills

openclaw/mcdonalds-skill

openclaw/scrapebadger

openclaw/slowmist-security-cc

openclaw/humanizer-cn

openclaw/meta-ad-spy

$ install --global

Security Scan Results

SKILL.md

Meta Ad Spy — Competitor Ad Intelligence Skill

Architecture Overview

PHASE 1: Playwright Scraper

Setup

Core Playwright Script

How to Run Phase 1

Filters Available in Phase 1

PHASE 2: Meta Graph API

Setup Instructions (tell the user)

Core API Script

API Filter Reference

Data Fields Available from API

ANALYSIS WORKFLOW

Step 1 — Clarify the Target

Step 2 — Find the Page ID (if only brand name given)

Step 3 — Run Phase 1 (Playwright)

Step 4 — Run Phase 2 (API), if token available

Step 5 — Synthesize & Report

COMMON WORKFLOWS

"What ads is [Brand] running right now?"

"Show me competitor video ads in India"

"How much is [Brand] spending on ads?" (EU/political only)

"Show me ads that have been running the longest" (= likely winners)

"Find ads about [topic/product keyword]"

LIMITATIONS & WORKAROUNDS

NOTES ON LEGALITY & ETHICS

SEE ALSO

Related Skills

openclaw/mcdonalds-skill

openclaw/scrapebadger

openclaw/slowmist-security-cc

openclaw/humanizer-cn