.agents/skills/troubleshoot/SKILL.md
Diagnose and fix common issues in DataPipeline OS
Troubleshoot DataPipeline OS
When an error occurs, follow this triage order:
config/login_*.png for Twitter login issues
data_output/ for partial data that was written
Symptom: Engine falls back to error state with login failure message
Root Cause: Twitter frequently changes HTML selectors on the login page.
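One common way to cope with changing selectors is to keep an ordered list of candidates and use the first one that matches, which is what names like USERNAME_SELECTORS and PASSWORD_SELECTORS suggest. The following is a minimal sketch of that idea, not the engine's actual code: the selector strings, `EXAMPLE_USERNAME_SELECTORS`, and `first_working_selector` are hypothetical and are not copied from engines/twitter_engine.py.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidates for illustration only; the real lists live in
# engines/twitter_engine.py and will differ.
EXAMPLE_USERNAME_SELECTORS = [
    'input[autocomplete="username"]',
    'input[name="text"]',
    'input[type="text"]',
]


def first_working_selector(
    selectors: Sequence[str],
    is_present: Callable[[str], bool],
) -> Optional[str]:
    """Return the first selector for which is_present() is true, else None."""
    for selector in selectors:
        if is_present(selector):
            return selector
    return None
```

With Playwright's sync API, `is_present` could be something like `lambda s: page.locator(s).count() > 0`. When the function returns None, none of the candidates matched and the list probably needs a new entry.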
Fix:
config/login_debug.png or config/login_debug_pwd.png
engines/twitter_engine.py
USERNAME_SELECTORS (around line 314) and PASSWORD_SELECTORS (around line 320)
config/chromium_profile/ and config/twitter_session.json
Symptom: Playwright opens Twitter login but page is blank white/grey (no form rendered)
Root Cause: Twitter login is a React SPA. The page DOM loads but JavaScript hasn't mounted the React app yet.
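The wait-and-reload approach for an unmounted SPA can be sketched roughly as below. This is an illustration under assumptions, not the engine's implementation: `load_until_rendered` and `max_reloads` are hypothetical names, and `page` is assumed to behave like a Playwright sync `Page` (`goto`, `evaluate`, `reload`). The 200-pixel threshold mirrors the `document.body.scrollHeight < 200` check described in this section.

```python
from typing import Any

MIN_BODY_HEIGHT = 200  # a body shorter than this is treated as "not rendered"


def load_until_rendered(page: Any, url: str, max_reloads: int = 2) -> bool:
    """Open url and reload while the page body still looks empty.

    Returns True once the body is tall enough, False if it never is.
    """
    # "networkidle" waits for network activity to settle, which gives the
    # JavaScript app more time to mount than "domcontentloaded".
    page.goto(url, wait_until="networkidle")
    for attempt in range(max_reloads + 1):
        height = page.evaluate("document.body.scrollHeight")
        if height >= MIN_BODY_HEIGHT:
            return True
        if attempt < max_reloads:
            page.reload(wait_until="networkidle")
    return False
```

A False result means the form never rendered within the allowed reloads, at which point clearing the saved profile and session is the next thing to try.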
Fix (already applied in current engine):
wait_until="networkidle" instead of "domcontentloaded"
navigator.webdriver to prevent detection
document.body.scrollHeight < 200 and reloads
config/chromium_profile/ and config/twitter_session.json, then retry
Symptom: Log shows rate limit warning, engine stops early
Fix: This is by design. Engine auto-skips to next keyword. Wait 15 minutes before rerunning. For large date ranges, reduce keywords or narrow the range.
Symptom: OSError: [Errno 48] Address already in use (Errno 98 on Linux, WinError 10048 on Windows)
Fix:
# Windows
netstat -ano | findstr :8080
# macOS/Linux
lsof -i :8080
Alternatively, change the port in app.py (last line):
app.run(debug=True, port=8081, use_reloader=False)
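If a platform-independent check is more convenient than netstat or lsof, the same question can be asked from Python. A minimal sketch using only the standard library; `port_in_use` is a hypothetical helper, not part of the project:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8080 in use:", port_in_use(8080))
```

This only reports whether the port is taken; finding the owning process still requires netstat or lsof.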
Symptom: 403 Forbidden from Reddit API
Fix: Reddit requires accounts with some activity history. New accounts are rejected. Wait 5-7 days with normal activity, then try again.
Symptom: TooManyRequestsError from pytrends
Fix: pytrends is limited to ~10 requests per minute by Google. Reduce number of keywords or increase delay between requests in trends_engine.py.
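At roughly 10 requests per minute, requests need to be spaced about 6 seconds apart. The sketch below shows one way to enforce such a gap; it is not taken from trends_engine.py, and `min_delay_seconds` and `throttled` are hypothetical names. The `sleep` parameter is injectable so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def min_delay_seconds(requests_per_minute: float) -> float:
    """Smallest gap between requests that stays within the per-minute budget."""
    return 60.0 / requests_per_minute


def throttled(
    items: Iterable[T],
    requests_per_minute: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one at a time, pausing between them (not before the first)."""
    delay = min_delay_seconds(requests_per_minute)
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)
        yield item
```

Looping over `throttled(keywords)` instead of `keywords` would then pace one request per keyword; lowering `requests_per_minute` leaves more headroom if the error persists.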
Symptom: ModuleNotFoundError: No module named 'xxx'
Fix:
# Ensure venv is active
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux
# Reinstall
pip install -r requirements.txt --timeout 120
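To confirm that the virtual environment is really active and to see which modules are still missing after reinstalling, a short standard-library check can help. This is a sketch; `in_virtualenv` and `missing_modules` are hypothetical helpers, and the module names passed in should be top-level import names.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["playwright", "pytrends"])` returns an empty list when both packages are importable from the active interpreter.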
Symptom: Pipeline completes but CSV is empty or very small
Possible Causes:
When the API-related variables in .env are empty, engines will use simulation mode
Diagnostic:
# Check if engines ran in simulation mode
# Look for "_sim" in the source column of output CSV
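The check described in the comments above could be automated along these lines. This is a sketch under assumptions: it assumes the output CSV has a header row with a column literally named `source`, and `count_sources` and `simulated_share` are hypothetical helpers rather than project functions.

```python
import csv
from collections import Counter
from typing import TextIO


def count_sources(csv_file: TextIO, column: str = "source") -> Counter:
    """Count rows per value of the source column (missing values count as "")."""
    reader = csv.DictReader(csv_file)
    return Counter(row.get(column) or "" for row in reader)


def simulated_share(counts: Counter) -> float:
    """Fraction of rows whose source value contains "_sim"."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    simulated = sum(n for source, n in counts.items() if "_sim" in source)
    return simulated / total
```

Open the CSV from data_output/ with `newline=""` and pass the file object to `count_sources`. A share close to 1.0 suggests the engines ran in simulation mode, while an empty counter means no rows were written at all.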
Symptom: playwright._impl._errors.Error: Executable doesn't exist
Fix:
pip install playwright
playwright install chromium
If behind a proxy:
# Windows
set HTTPS_PROXY=http://proxy:port
# macOS/Linux
export HTTPS_PROXY=http://proxy:port
playwright install chromium
curl http://localhost:8080/api/state | python -m json.tool
Look at the engines object: each engine shows status, rows, and ram_mb.
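For a quicker overview than raw JSON, the response can be reduced to one line per engine. The sketch below assumes that `engines` is a JSON object keyed by engine name, which the original does not state explicitly; `fetch_state` and `summarize_engines` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_state(url: str = "http://localhost:8080/api/state") -> Dict[str, Any]:
    """Download and decode the state endpoint (requires the app to be running)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def summarize_engines(state: Dict[str, Any]) -> List[Tuple[str, str, int, float]]:
    """Return (name, status, rows, ram_mb) per engine, sorted by name."""
    engines = state.get("engines", {})
    summary = []
    for name in sorted(engines):
        info = engines[name]
        status = str(info.get("status", "unknown"))
        rows = int(info.get("rows", 0))
        ram_mb = float(info.get("ram_mb", 0.0))
        summary.append((name, status, rows, ram_mb))
    return summary
```

Calling `summarize_engines(fetch_state())` while the app is running gives a compact list that makes an engine with zero rows or an unexpected status easy to spot.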
To test without any API keys, ensure all API-related variables in .env are empty. All engines will automatically fall back to simulation.
curl http://localhost:8080/api/twitter/info
Returns: session_exists, profile_exists, username, has_password.