skills/osint/SKILL.md
Conduct deep OSINT research on individuals. Build full digital footprint, psychoprofile (MBTI/Big Five), career history, social graph with confidence scores. Recursive self-evaluation until completeness threshold is met. Includes internal intelligence (Telegram history, email, vault contacts) before going external. Use when: "osint", "досье", "research person", "find everything about", "пробей", "разведка", "due diligence", "background check", "digital footprint", "найди всё про", "собери информацию", "кто это", "профиль человека". NOT for: company/product research without a named person, competitive analysis, market research, content generation, or general web scraping tasks.
npx skillsauth add ValorInvestigator/claude-plugin-toolkit osint
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Systematic intelligence gathering on individuals. From a name or handle to a scored dossier with psychoprofile, career map, and entry points.
Determine entry point from context:
Default (full research request): Phase 0 → 1 → 1.5 → 2 → 3 → 4 → 5 → 6.
All API keys via environment variables. Never hardcode tokens.
- PERPLEXITY_API_KEY: Perplexity Sonar (fast answers + deep research)
- EXA_API_KEY: Exa AI (semantic search, company/people research, deep research)
- TAVILY_API_KEY: Tavily (agent-optimized search + extract, $0.005/req basic)
- APIFY_API_TOKEN: Apify scraping (LinkedIn, Instagram, Facebook)
- JINA_API_KEY: Jina reader/search/deepsearch
- PARALLEL_API_KEY: Parallel AI search
- BRIGHTDATA_MCP_URL: Bright Data MCP endpoint (full URL with token)
- MCPORTER_CONFIG: mcporter config path

Run from skill dir: bash scripts/<name>.sh.
Each validates env vars, exits with descriptive error + URL to get the key.
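The scripts themselves are bash, and their source is not shown here. As a rough illustration of the validate-then-fail-loudly idea, here is a minimal Python sketch. The helper names `missing_vars` and `require_env` are hypothetical, and grouping the variables into one tuple is an assumption (each real script presumably checks only the keys it needs). The sketch omits the "URL to get the key" part of the message because those URLs are not given in this document.

```python
import os

# Variable names come from the list above; treating them as one required
# set is an assumption made for this sketch.
REQUIRED_VARS = (
    "PERPLEXITY_API_KEY",
    "EXA_API_KEY",
    "TAVILY_API_KEY",
    "APIFY_API_TOKEN",
    "JINA_API_KEY",
    "PARALLEL_API_KEY",
    "BRIGHTDATA_MCP_URL",
)


def missing_vars(required, environ=None):
    """Return the names in `required` that are unset or blank in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


def require_env(required, environ=None):
    """Raise a descriptive error that names every missing variable."""
    missing = missing_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env(REQUIRED_VARS)` at startup reports every missing key in a single message instead of failing on the first one.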
Search & Research:
- diagnose.sh: run FIRST. Capability map of all tools.
- perplexity.sh: search <query> | sonar <query> (AI answer) | deep <query> (deep research)
- tavily.sh: search <query> (basic $0.005) | deep <query> (advanced) | extract <url>
- exa.sh: search <query> | company <name> | people <name> | crawl <url> | deep <prompt>
- first-volley.sh "Name" "context": parallel search, all engines at once.
- merge-volley.sh <outdir>: deduplicate and merge first-volley results.

Scraping:
- apify.sh: linkedin <url> | instagram <handle> | run | results | store-search
- run-actor.sh: universal Apify runner (55+ actors). Embedded from apify/agent-skills.
  Quick answer: bash scripts/run-actor.sh "actor/id" '{"input":"json"}'
  Export: bash scripts/run-actor.sh "actor/id" '{"input":"json"}' --output /tmp/out.csv
- jina.sh: read <url> | search <query> | deepsearch <query>
- parallel.sh: search <query> | extract <url>
- brightdata.sh: scrape <url> | scrape-batch | search | search-geo <cc> | search-yandex

Principle: from cheap to expensive, from fast to deep.
Level 1 (quick answers): ALWAYS start here. Get quick context before you dig. Run ALL of these in parallel:
# Perplexity Sonar: AI answer with citations
bash skills/osint/scripts/perplexity.sh sonar "Who is <Name>, <context>"
# Brave Search: classic search
web_search "<Name> <company> <role>"
# Tavily: agent-optimized search with AI answer
bash skills/osint/scripts/tavily.sh search "<Name> <context>"
# Exa: semantic search + company/people research
bash skills/osint/scripts/exa.sh search "<Name> <context>"
bash skills/osint/scripts/exa.sh people "<Name>"
→ You get: quick facts, links, context. → Decision: enough? → Phase 6. Need more? → Level 2.
Level 2: verify the sources from Level 1 via fetch:
# Read the URLs you found
web_fetch "<url_from_perplexity>"
bash skills/osint/scripts/jina.sh read "<url>"
bash skills/osint/scripts/parallel.sh extract "<url>"
→ You get: confirmed facts, cross-reference. → Do they match? → Add to the dossier. Need to go deeper? → Level 3.
Level 3: bring in scraping for social networks:
# LinkedIn
bash skills/osint/scripts/apify.sh linkedin "<url>"
# Instagram
bash skills/osint/scripts/apify.sh instagram "<handle>"
# Facebook, blocked sites
bash skills/osint/scripts/brightdata.sh scrape "<url>"
→ You get: structured profiles, photos, connections.
Level 4: if you need to dig even deeper, write a detailed prompt and send it to deep research. Run ALL of these in parallel (30-60 sec each):
# Perplexity Deep Research
bash skills/osint/scripts/perplexity.sh deep "<detailed research prompt about Name>"
# Exa Deep Research
bash skills/osint/scripts/exa.sh deep "<detailed prompt>"
# Parallel AI Deep Search
bash skills/osint/scripts/parallel.sh search "<detailed query>"
# Jina DeepSearch
bash skills/osint/scripts/jina.sh deepsearch "<query>"
Rule: a Level 4 prompt must be DETAILED. Include everything you already know from Levels 1-3, so that deep research digs further instead of repeating basic facts.
OSINT research runs as a swarm of parallel sub-agents on Sonnet. The main agent is the coordinator — it does NOT scrape itself.
- sessions_spawn with model: sonnet, mode: run
- streamers/youtube-channel-scraper for channel data
- apify/facebook-pages-scraper + apify/facebook-page-contact-information
- vdrmota/contact-info-scraper on found websites
- clockworks/tiktok-profile-scraper), local registries, press, university records, Yandex search, Google Maps (compass/crawler-google-places if business owner)
- /tmp/osint-<subject>-<task>.md
- bash skills/osint/scripts/diagnose.sh.
- tg.py (Telegram history), himalaya (email), vault contacts.

Start with Level 1 (quick answers) ALWAYS before heavy scraping.
bash skills/osint/scripts/perplexity.sh search "Who is <Name>, <context>"
web_search "<Name> <company>"
bash skills/osint/scripts/first-volley.sh "Full Name" "context"
web_fetch "<citation_url_1>"
web_fetch "<citation_url_2>"
bash skills/osint/scripts/merge-volley.sh /tmp/osint-<timestamp>
Rate limiting: wait 1s between Brave queries, 2s between Jina calls. Do NOT hammer APIs in tight loops; stagger parallel launches.
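One way to plan staggered launches is to compute a start offset per call before anything runs. The sketch below is illustrative only: `stagger_offsets` and the `MIN_GAP` table are hypothetical names. The 1s and 2s gaps come from the note above, and treating every other engine as needing no spacing is an assumption.

```python
# Minimum gap in seconds between calls to the same engine,
# taken from the rate-limiting note above.
MIN_GAP = {"brave": 1.0, "jina": 2.0}


def stagger_offsets(calls, min_gap=MIN_GAP):
    """Assign each call a launch offset so same-engine calls are spaced out.

    `calls` is a list of engine names in launch order. Engines without an
    entry in `min_gap` are assumed to need no spacing.
    """
    next_free = {}  # engine -> earliest offset for its next call
    schedule = []
    for engine in calls:
        offset = next_free.get(engine, 0.0)
        schedule.append((engine, offset))
        next_free[engine] = offset + min_gap.get(engine, 0.0)
    return schedule


plan = stagger_offsets(["brave", "jina", "brave", "jina", "exa"])
# plan == [("brave", 0.0), ("jina", 0.0), ("brave", 1.0), ("jina", 2.0), ("exa", 0.0)]
```

Different engines still start together, so the volley stays parallel. Only repeated calls to the same engine are delayed.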
Before going external, check what we already know. This phase mines local sources that may contain gold — prior conversations, emails, vault contacts.
If tg.py is available (check Phase 0):
# Search by name/handle in Telegram
python3 skills/telegram/scripts/tg.py search "Name" 20
# If we have their username/id — read conversation history
python3 skills/telegram/scripts/tg.py history <username_or_id> 50
What to extract from Telegram history:
⚠️ Telegram history is Grade A intelligence: unfiltered, real-time, authentic. Weight it higher than curated LinkedIn/Instagram profiles.
⚠️ Privacy: internal intelligence stays in the dossier. Never quote DMs in public outputs.
If himalaya is available:
# Search emails by name or domain
~/.local/bin/himalaya search "from:<email_address> OR to:<email_address>" -f INBOX
# Or by name
~/.local/bin/himalaya search "Name Surname" -f INBOX
~/.local/bin/himalaya search "Name Surname" -f Sent
What to extract from email:
# Check if we already have a card
grep -rl "Name" vault/crm/ vault/contacts/ 2>/dev/null
# Check MOC indexes (adjust paths to your vault structure)
grep -i "name" vault/MOC/*.md 2>/dev/null
If vault card exists: read it, note last_accessed, existing tags, prior interactions. Don't duplicate — enrich the existing card after research completes.
If meeting in person and node is available, nodes camera_snap can capture context.
Only with explicit user permission.
After Phase 1.5, you should know:
This context shapes Phase 2 priorities — if we already know their career from emails, focus external research on psychoprofile and social media instead.
Read references/platforms.md ONLY when needing URL patterns or extraction signals.
Tool priority (primary → fallback). If primary fails, switch immediately. Never retry same tool.
- apify.sh linkedin → brightdata.sh scrape → jina.sh read
- apify.sh instagram → brightdata.sh scrape
- run-actor.sh "apify/instagram-tagged-scraper" (who tags them), apify/instagram-comment-scraper (sentiment)
- brightdata.sh scrape → none (only Bright Data works)
- run-actor.sh "apify/facebook-pages-scraper" → brightdata.sh scrape
- run-actor.sh "clockworks/tiktok-profile-scraper" → clockworks/tiktok-scraper (comprehensive)
- run-actor.sh "clockworks/tiktok-user-search-scraper" (find by keywords)
- run-actor.sh "streamers/youtube-channel-scraper" → jina.sh read → brightdata.sh scrape
- web_fetch t.me/s/{channel} → jina.sh read
- python3 scripts/twitter.py tweet <url> → jina.sh read
- run-actor.sh "compass/crawler-google-places"
- run-actor.sh "vdrmota/contact-info-scraper" (extract emails/phones from any URL)
- jina.sh read → brightdata.sh scrape

run-actor.sh = universal Apify runner (embedded, 55+ actors). See references/tools.md for full actor catalog.
Read references/tools.md ONLY when troubleshooting a failed tool.
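The "switch immediately, never retry" rule for the primary → fallback chains can be sketched as a small loop. This is a hypothetical illustration, not the skill's implementation: `run_with_fallback` is an invented name, and each tool is modeled as a plain Python callable rather than a shell script. Counting a `None` result as a failure is also an assumption.

```python
def run_with_fallback(tools, target):
    """Try each (name, callable) exactly once, in priority order.

    Returns (name, result) from the first tool that succeeds. A tool counts
    as failed if it raises or returns None. No tool is ever retried.
    """
    errors = []
    for name, tool in tools:
        try:
            result = tool(target)
        except Exception as exc:  # any failure: move on to the next tool
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all tools failed: " + "; ".join(errors))
```

Collecting the per-tool errors keeps the final message useful for troubleshooting when the whole chain is exhausted.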
When you find YouTube, podcast, blog, or conference talks — read references/content-extraction.md immediately and extract 3-5 pieces of content on the spot.
Do NOT just note the URL. Extract transcripts/text NOW. A 20-minute YouTube video reveals more about a person than their entire LinkedIn. Content platforms are the #1 source for psychoprofile — skipping them = shallow dossier.
If initial searches return unusually little for someone who should have a footprint:
- web_fetch "https://web.archive.org/web/2024*/target-url": deleted profiles, old bios
- web_search "cache:domain.com/path": recently removed pages
- brightdata.sh search-yandex "Name": Yandex indexes CIS deeper and caches longer

List every claim as a row: fact | source 1 | source 2 | grade.
For each critical fact (employer, role, location, education):
If LinkedIn says "CEO" but company site says "Co-founder" — flag explicitly. Include both with sources. Do NOT silently pick one.
If common name — verify at least 2 facts (company + city, or photo + company) link to same person. If unsure, split into separate entities.
Internal intelligence (Phase 1.5) counts as an independent source.
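The "flag explicitly, include both with sources" rule can be sketched as a grouping step over collected claims. The function name `find_contradictions` and the `(field, value, source)` tuple layout are assumptions made for this example, and it uses exact string comparison, so near-duplicates such as "CEO" and "Chief Executive Officer" would be reported as a conflict.

```python
from collections import defaultdict


def find_contradictions(claims):
    """Group claims by field and return the fields whose sources disagree.

    `claims` is a list of (field, value, source) tuples. The result maps each
    conflicting field to {value: [sources]}, so both versions stay visible.
    """
    by_field = defaultdict(lambda: defaultdict(list))
    for field, value, source in claims:
        by_field[field][value].append(source)
    return {
        field: {value: list(sources) for value, sources in values.items()}
        for field, values in by_field.items()
        if len(values) > 1  # more than one distinct value means a conflict
    }


# Illustrative input only; the role values mirror the example above.
conflicts = find_contradictions([
    ("role", "CEO", "LinkedIn"),
    ("role", "Co-founder", "company site"),
    ("employer", "Acme", "LinkedIn"),
    ("employer", "Acme", "press"),
])
# conflicts == {"role": {"CEO": ["LinkedIn"], "Co-founder": ["company site"]}}
```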
Read references/psychoprofile.md ONLY at this phase.
9 mandatory checks. If any fail, flag as critical gap:
| Dimension | Weight | What to score (1-10) |
|-----------|--------|---------------------|
| Identity | 0.15 | Full name, DOB, location, education, photo |
| Career | 0.20 | Completeness of work history, current role clarity |
| Digital footprint | 0.15 | Number of platforms found, account activity level |
| Psychoprofile | 0.15 | MBTI confidence, writing style quantified, values deduced |
| Internal intel | 0.10 | Telegram/email history depth, vault data |
| Personal life | 0.05 | Family, hobbies, lifestyle, pets |
| Cross-reference | 0.10 | How many facts are A-grade, contradiction count |
| Actionability | 0.10 | Entry points identified, approach strategy clear |
Weighted sum (1-10) = Depth Score.
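As a worked illustration of the weighted sum, here is a minimal Python sketch. The weights are copied from the table above. The dictionary keys, the function name `depth_score`, the range check, and the rounding to two decimals are assumptions for this example.

```python
# Weights from the scoring table above; they sum to 1.0.
WEIGHTS = {
    "identity": 0.15,
    "career": 0.20,
    "digital_footprint": 0.15,
    "psychoprofile": 0.15,
    "internal_intel": 0.10,
    "personal_life": 0.05,
    "cross_reference": 0.10,
    "actionability": 0.10,
}


def depth_score(scores):
    """Weighted sum of per-dimension scores, each on a 1-10 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)
```

For example, scores of 9, 8, 7, 6, 5, 4, 7, 8 in table order give 1.35 + 1.60 + 1.05 + 0.90 + 0.50 + 0.20 + 0.70 + 0.80 = 7.10, which is below the 8.0 exit threshold.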
Count unique source types used (max 12): LinkedIn, Instagram, Facebook, Telegram DM, Telegram channel, VK, Twitter/X, company website, press/media articles, conference profiles, government/business registries, email correspondence.
| Depth Score | Coverage | Diagnosis | Action |
|------------|----------|-----------|--------|
| 8+ | All pass | Strong dossier | Proceed to Phase 6 |
| 8+ | Some fail | Deep but blind spots | Target failed checks, 1 more cycle |
| <7 | All pass | Wide but shallow | Deepen via interviews/articles/deepsearch |
| <7 | Some fail | Restart needed | Different search angle, new tool combination |
(a) Depth Score ≥ 8.0 AND all coverage checks pass → exit to Phase 6
(b) 3 cycles completed → deliver best available with honest assessment
(c) Two cycles with delta < 0.5 → plateau reached, deliver with note
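The three exit conditions can be read as one check that runs after each cycle. The sketch below is one possible reading, not the skill's own code: `exit_reason` and its return labels are hypothetical, and condition (c) is interpreted as "the score changed by less than 0.5 between the last two cycles", which is an assumption about what "two cycles with delta < 0.5" means.

```python
def exit_reason(scores_by_cycle, all_checks_pass,
                max_cycles=3, min_delta=0.5, target=8.0):
    """Return why the research loop should stop, or None to run another cycle.

    `scores_by_cycle` holds the Depth Score recorded after each completed cycle.
    """
    if not scores_by_cycle:
        return None
    current = scores_by_cycle[-1]
    if current >= target and all_checks_pass:
        return "complete"      # condition (a): exit to Phase 6
    if len(scores_by_cycle) >= max_cycles:
        return "cycle_limit"   # condition (b): deliver best available
    if len(scores_by_cycle) >= 2 and abs(current - scores_by_cycle[-2]) < min_delta:
        return "plateau"       # condition (c): deliver with a note
    return None
```

Checking (a) first means a dossier that reaches the threshold on the third cycle is reported as complete rather than as a cycle-limit delivery.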
Read assets/dossier-template.md before rendering. Follow the template structure exactly.
No markdown tables in output (Telegram cannot render). Bullet lists only.
Report Depth Score, source count, source types, and total API spend.
If internal intelligence was used, add a separate "из переписки" ("from correspondence") section (marked as internal/confidential, not for sharing outside).
$0.50: ask user before proceeding.
- brightdata.sh scrape as primary instead of Apify.
- apify.sh store-search "linkedin scraper" for alternatives. Actors on Apify are volatile; always have a Bright Data fallback.
- jina.sh deepsearch. Check Telegram history.
- bash scripts/apify.sh store-search "people search". If mcpc installed: APIFY_TOKEN=$APIFY_API_TOKEN mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call search-actors keywords:="people search" limit:=10. Check Telegram contacts by phone.
- clockworks/free-tiktok-scraper (free tier) as fallback. TikTok usernames often differ from other platforms; search by real name via clockworks/tiktok-user-search-scraper.
- vdrmota/contact-info-scraper: it crawls the site and extracts all contact info.