skills/substack-notes-scraper/SKILL.md
Scrapes a Substack Notes page and exports engagement data (likes, comments, restacks) to a formatted .xlsx file with conditional formatting and summary stats.
npx skillsauth add mohitagw15856/pm-claude-skills substack-notes-scraperInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Substack has no public API for Notes analytics. You can't see likes, comments, and restacks in one place without scrolling through your feed manually. This skill scrapes the rendered Notes page, filters to only your original content, and exports everything to a spreadsheet you can actually analyze.
Credit: Originally created by a Substack newsletter author — adapted and extended for this library.
| Input | Format | Example |
|---|---|---|
| Notes URL | Full URL to the Notes tab | https://substack.com/@handle/notes |
| Author handle or name | Exact handle or display name | @handle or Jane Smith |
| Date range | Plain English or explicit range | last 30 days or Jan 2026 – Mar 2026 |
Claude will ask for these if not provided upfront.
substack-notes-[handle]-[YYYY-MM-DD].xlsx
| Column | Description |
|---|---|
| Date | Publication date (YYYY-MM-DD) |
| Text Preview | First 200 characters of the note |
| Full Text | Complete note text |
| Likes | Like count at time of scrape |
| Comments | Comment count |
| Restacks | Restack count |
| Total Engagement | Likes + Comments + Restacks |
| Link | Direct URL to the note |
| Note Type | original or restack |
Formatting applied:
#FFF2CC)Scrape Date: [YYYY-MM-DD HH:MM UTC]
Author: [handle]
Date Range: [start] – [end]
Total Notes: [n]
Original Notes: [n]
Restacks Filtered: [n]
Avg Likes: [n.n]
Avg Comments: [n.n]
Avg Restacks: [n.n]
Avg Total Eng: [n.n]
Best Note (Likes): [date] — [first 80 chars] — [n] likes
Best Note (Eng): [date] — [first 80 chars] — [n] total engagement
Confirm the three required inputs are present. If any are missing, ask before proceeding. Parse the date range into a concrete start date and end date (convert relative ranges like "last 30 days" to explicit dates using today's date).
Use WebFetch to load the Notes URL. Substack Notes pages are JavaScript-rendered — request the full rendered HTML. If WebFetch returns a skeleton page without note content, note this in your response and ask the user to paste the page HTML manually or confirm browser access is available.
Substack Notes load incrementally. Repeat fetching or scrolling until either:
Rate-limit: wait 2 seconds between each paginated request. Do not hammer the endpoint.
For every note element found on the page, extract:
original if the author matches the specified author; restack if it belongs to someone elseKeep ALL rows in the data (restacks included as rows with Note Type = restack). The Summary sheet stats should count only original notes. Mark restacks clearly so the user can filter them out themselves in Excel if preferred.
Apply date filter: exclude any note outside the specified date range.
For each row: Total Engagement = Likes + Comments + Restacks
Sort original notes by Likes descending. Mark the top 20% (round up) for conditional formatting. These rows will be highlighted yellow in the output file.
Use Python with openpyxl to generate the file. Structure:
# Required libraries
import openpyxl
from openpyxl.styles import PatternFill, Font, Alignment
from openpyxl.utils import get_column_letter
from datetime import datetime
# Sheet 1: Notes Data
# - Write header row, bold, freeze row 1
# - Write all data rows
# - Apply auto-filter: ws.auto_filter.ref = ws.dimensions
# - Apply yellow fill to top-20% rows by likes
# - Auto-size columns (iterate cells to find max length)
# Sheet 2: Summary
# - Write summary stats as key-value pairs, no table format
Name the file substack-notes-[handle]-[YYYY-MM-DD].xlsx using today's date.
After generating the file, report:
original notes, not restacksdevelopment
Build a framework for creating shareable, high-reach social media content. Use when asked to plan viral content, develop a shareable content strategy, create a hook writing system, or build a repeatable process for content that gets shared. Produces a platform-specific viral content framework with hook formulas, content structures, shareability triggers, and a content testing system.
development
Generate article or newsletter thumbnail candidates using the Gemini API from inside Claude Code. Claude reads article copy, proposes composition concepts, writes image generation prompts incorporating brand specs, calls Gemini to generate the images, evaluates the results via computer vision, and returns ranked candidates with rationale. Use when asked to create thumbnails, generate cover images, or produce visual candidates for an article or newsletter.
testing
Flips Claude's default from "find reasons you're right" to "find reasons you're wrong." A genuine thinking partner, not a mirror with grammar. Use before high-stakes decisions, plans, assumptions, or pitches you haven't stress-tested.
testing
Audit an existing social media presence across all active platforms. Use when asked to review social media performance, analyse a brand's social presence, benchmark against competitors, or identify what's working and what isn't. Produces a scored audit with platform-by-platform analysis, content performance review, competitive benchmarking, and a prioritised action plan.