clawdbot/gallery-scraper/SKILL.md
Bulk download images from login-protected gallery websites using an attached browser session. Use when asked to scrape, download, or save images from authenticated gallery pages, extract full-size images from thumbnails, or batch download from multi-page galleries.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add jdrhyne/agent-skills gallery-scraper
Bulk download images from authenticated gallery websites via browser relay.
Ask user to:
Many gallery sites store full-size URLs in data attributes. Common patterns:
// Extract via browser evaluate
() => {
  // Try common patterns and report the first selector that matches
  const patterns = [
    'img[data-max]',        // data-max attribute
    'img[data-src]',        // lazy-load pattern
    'img[data-full]',       // full-size pattern
    'a[data-lightbox] img', // lightbox galleries
    '.gallery-item img'     // generic gallery
  ];
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}
Once the pattern is identified, extract all URLs:
// For data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
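The extracted values may be relative paths or contain repeats, so it can help to clean them before writing `urls.txt`. The following is a minimal Python sketch, assuming the evaluate result has been saved as a list of strings and that `page_url` is the gallery page it came from. The function name `normalize_urls` and the sample URLs are hypothetical.

```python
from urllib.parse import urljoin


def normalize_urls(raw_urls, page_url):
    """Resolve relative URLs against page_url; drop blanks and duplicates, keeping order."""
    seen = set()
    result = []
    for raw in raw_urls:
        # Skip missing or empty attribute values
        if not raw or not raw.strip():
            continue
        absolute = urljoin(page_url, raw.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    page = "https://site.example/galleries/view/123"  # hypothetical gallery page
    raw = ["/full/a.jpg", "https://cdn.site.example/full/b.jpg", "/full/a.jpg", ""]
    for url in normalize_urls(raw, page):
        print(url)
```

Redirecting the printed lines into `urls.txt` gives one absolute URL per line, which is the input the download steps expect.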
Check for multiple pages:
() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({text: a.textContent, href: a.href}));
}
Navigate to each page and collect URLs.
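The pagination query often returns the same page more than once, along with "Next" or "Prev" links. The following is a rough Python sketch of reducing that list to the pages still to visit, assuming page links carry numeric labels (an assumption that will not hold for every site). The name `page_links` is hypothetical.

```python
def page_links(pagination, current_url):
    """Return unique page URLs in page order from [{'text': ..., 'href': ...}] items.

    Only links whose label is a plain number are kept; the current page is excluded.
    """
    pages = {}
    for item in pagination:
        text = (item.get("text") or "").strip()
        href = item.get("href") or ""
        if not text.isdecimal() or not href:
            continue  # skips "Next", "Prev", and links without a target
        # Keep the first href seen for each page number
        pages.setdefault(int(text), href)
    ordered = [pages[number] for number in sorted(pages)]
    return [url for url in ordered if url != current_url]
```

Each returned URL can then be opened in the attached browser and passed through the same extraction step.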
When you need multiple galleries quickly and can’t automate CDP, you can load each gallery in a hidden iframe and extract data-max URLs:
async () => {
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    // Hidden off-screen iframe; contentDocument is readable only for same-origin pages
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    let imgs = [];
    try {
      await new Promise((resolve, reject) => {
        const t = setTimeout(() => reject(new Error('timeout load')), 20000);
        // Attach the handler before setting src so the load event cannot be missed
        iframe.onload = () => { clearTimeout(t); resolve(); };
        iframe.src = url;
        document.body.appendChild(iframe);
      });
      // Poll for up to 20 s in case images are rendered lazily
      const start = Date.now();
      while (Date.now() - start < 20000) {
        const doc = iframe.contentDocument;
        imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
        if (imgs.length) break;
        await new Promise(r => setTimeout(r, 500));
      }
    } finally {
      // Always remove the iframe, even if loading timed out
      iframe.remove();
    }
    results.push({ id: url.split('/').pop(), urls: imgs });
  }
  return results;
}
Test whether the CDN requires authentication or only a Referer header:
# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3
# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
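One way to read the two probe results is sketched below in Python. It assumes only the HTTP status codes from the direct request and the Referer request are available, and it treats anything outside 2xx (including redirects, which may point to a login page) as a failure. The function name and the three labels are hypothetical.

```python
def classify_cdn_access(status_direct, status_with_referer):
    """Classify CDN access from the status codes of the two HEAD probes.

    Returns "public" (plain download works), "referer" (Referer header is
    enough), or "session" (neither worked, so cookies or other credentials
    from the logged-in session are probably needed).
    """
    def ok(status):
        return 200 <= status < 300

    if ok(status_direct):
        return "public"
    if ok(status_with_referer):
        return "referer"
    return "session"
```

For example, `classify_cdn_access(403, 200)` suggests the Referer header used in the download commands is sufficient.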
Collect the URLs into a text file, then download in parallel:
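# Create output directory
mkdir -p ~/Downloads/gallery_name
# Download with Referer header (parallel, at most 8 jobs; wait -n needs bash 4.3+)
# Note: urls.txt is read from the output directory, because the loop runs after cd
cd ~/Downloads/gallery_name
while IFS= read -r url; do
  filename=$(basename "$url")
  # -f makes curl fail on HTTP errors instead of saving the error page
  curl -sf -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  [ "$(jobs -r | wc -l)" -ge 8 ] && wait -n
done < urls.txt
wait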
Python ThreadPool fallback (avoids shell quoting + wait -n issues):
import os
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    filename = os.path.join(outdir, os.path.basename(url))
    # Skip files that were already downloaded
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

with ThreadPoolExecutor(max_workers=8) as ex:
    futures = {ex.submit(download, url): url for url in urls}
    for fut in as_completed(futures):
        # Report failures that submit() alone would silently swallow
        if fut.exception() is not None:
            print(f'failed: {futures[fut]}: {fut.exception()}')
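Taking the basename of the raw URL keeps any query string (for example `?token=...`) in the file name and yields an empty name when the URL ends in a slash. The following is a small sketch of a more defensive naming helper, assuming a numbered fallback name is acceptable. The name `filename_from_url` and the `image_NNNN` fallback are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, index):
    """Derive a local file name from the URL path, ignoring query and fragment."""
    path = urlparse(url).path
    name = posixpath.basename(unquote(path))
    # Fall back to a numbered name for empty or directory-like names
    if name in ("", ".", ".."):
        name = f"image_{index:04d}"
    return name
```

Inside a download function this could replace the basename call, with `index` taken from `enumerate(urls)`.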
Some galleries have "lock" buttons to reveal hidden content. Look for:
// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}
Click each lock button before extracting URLs.
Optionally organize by gallery:
# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
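The folder name can be derived from the last path segment of the gallery URL, in the same way the iframe example takes its `id`. The following is a minimal Python sketch, assuming the ID is the final non-empty path segment. The function name and the `unknown` fallback are hypothetical.

```python
import re
from urllib.parse import urlparse


def gallery_dir_name(gallery_url):
    """Build a folder name such as gallery_123 from the last path segment of the URL."""
    segments = [s for s in urlparse(gallery_url).path.split("/") if s]
    gallery_id = segments[-1] if segments else "unknown"
    # Replace characters that are awkward in file names
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", gallery_id)
    return f"gallery_{safe_id}"
```

The result can be passed to `os.makedirs(..., exist_ok=True)` before downloading that gallery's files.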
document.cookie