skills/utility/smart-screenshot/SKILL.md
Intelligent screenshot and screen capture with OCR, markdown conversion, and annotation. Triggered by PrtSc key, captures screen regions, extracts text with OCR, converts to markdown using MarkItDown, saves with auto-formatting. Similar to Windows Snipping Tool with AI enhancements for text extraction and document processing.
npx skillsauth add astoreyai/claude-skills smart-screenshotInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Intelligent screen capture with OCR, markdown conversion, and smart formatting. Capture screen regions, extract text, convert images/PDFs to markdown, and save with automated formatting.
Trigger methods:
PrtSc (customizable)python scripts/capture.pyWorkflow:
Core (required):
# Screenshot and OCR
pip install pillow pyautogui mss pytesseract pyscreenshot --break-system-packages
# MarkItDown (Microsoft's converter)
pip install markitdown --break-system-packages
# Keyboard hooks
pip install keyboard pynput --break-system-packages
# GUI for dialogs
pip install tkinter --break-system-packages # May be pre-installed
OCR engine (Tesseract):
Windows:
# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Install to: C:\Program Files\Tesseract-OCR\
# Add to PATH
macOS:
brew install tesseract
Linux:
sudo apt-get install tesseract-ocr
# or
sudo dnf install tesseract
Optional enhancements:
# Better OCR (EasyOCR - slower but more accurate)
pip install easyocr --break-system-packages
# PDF handling
pip install pdf2image pypdf2 --break-system-packages
# Image enhancement
pip install opencv-python --break-system-packages
# Clipboard integration
pip install pyperclip --break-system-packages
See reference/setup-guide.md for detailed installation.
1. Region Selection
2. Window Capture
3. Full Screen
4. Scrolling Capture
OCR Engines:
Smart text processing:
Using MarkItDown (Microsoft):
Conversion features:
Keyboard shortcut:
# Run as background service
python scripts/screenshot_service.py
# Now press PrtSc anytime:
# 1. Screen freezes
# 2. Choose "Image" or "Text"
# 3. Select region
# 4. Auto-process and save
Command line:
# Capture with UI
python scripts/capture.py
# Capture full screen immediately
python scripts/capture.py --fullscreen --output screenshot.png
# Capture region with coordinates
python scripts/capture.py --region 100,100,800,600 --output region.png
Interactive:
# Start capture
python scripts/capture.py --mode text
# Process:
# 1. Select region
# 2. OCR extracts text
# 3. MarkItDown formats
# 4. Save dialog opens
# 5. Save as .md file
Automatic:
# Capture and OCR
python scripts/capture_text.py --output extracted.md
# With specific language
python scripts/capture_text.py --lang eng+fra --output text.md
# With enhancement
python scripts/capture_text.py --enhance --output clean.md
Interactive:
# Start capture
python scripts/capture.py --mode image
# Process:
# 1. Select region
# 2. Annotation tools appear
# 3. Add arrows, boxes, text
# 4. Save dialog opens
With annotations:
# Capture and annotate
python scripts/capture_annotate.py --output annotated.png
# Annotation tools:
# - Arrow
# - Rectangle
# - Circle
# - Text
# - Highlight
# - Blur (redact sensitive info)
Convert PDF to markdown:
# Using MarkItDown
python scripts/pdf_to_markdown.py --input document.pdf --output document.md
# With OCR for scanned PDFs
python scripts/pdf_to_markdown.py --input scanned.pdf --ocr --output text.md
# Batch convert folder
python scripts/batch_pdf_convert.py --input ./pdfs/ --output ./markdown/
Process existing image:
# Extract text to markdown
python scripts/image_to_markdown.py --input screenshot.png --output text.md
# Clean up image first
python scripts/enhance_and_extract.py --input noisy.png --output clean.md
Settings file: config.yaml
# Keyboard shortcut
hotkey: "Print" # or "ctrl+shift+s", "cmd+shift+5", etc.
# Default capture mode
default_mode: "prompt" # "image", "text", or "prompt"
# OCR settings
ocr:
engine: "tesseract" # "tesseract", "easyocr", or "cloud"
language: "eng"
enhance: true # Pre-process image for better OCR
# Output settings
output:
directory: "~/Screenshots"
filename_pattern: "Screenshot-{date}-{time}"
auto_save: false # true = skip save dialog
clipboard: true # Copy to clipboard
# Markdown settings
markdown:
format_code_blocks: true
detect_tables: true
preserve_formatting: true
# Annotation defaults
annotation:
arrow_color: "#FF0000"
box_color: "#0000FF"
text_color: "#000000"
text_size: 12
line_width: 2
Scenario: Capture code from screen → Markdown documentation
# 1. Run screenshot service
python scripts/screenshot_service.py &
# 2. Press PrtSc on your keyboard
# 3. Select "Text" mode
# 4. Select code region on screen
# 5. OCR extracts code
# 6. MarkItDown formats as code block:
```python
def example_function():
return "formatted code"
### Workflow 2: Meeting Notes from Slides
**Scenario:** Capture presentation slides → Formatted notes
```bash
# Capture multiple slides
python scripts/capture_sequence.py \
--count 5 \
--delay 3 \
--mode text \
--output slides.md
# Result: All slides as markdown in one file
Scenario: Screenshot email → Extract and format text
# Capture email
python scripts/capture.py --mode text --enhance
# Text extracted, formatted, and saved
# Perfect for archiving or processing
Scenario: Screenshot paper → Annotate → Save
# Capture and annotate
python scripts/capture_annotate.py --output paper-notes.png
# Add arrows, highlights, notes
# Save annotated version
Scenario: Convert all PDFs to markdown
# Convert folder of PDFs
python scripts/batch_pdf_convert.py \
--input ~/Documents/PDFs/ \
--output ~/Documents/Markdown/ \
--ocr # Enable OCR for scanned docs
# Progress shown for each file
# All PDFs → Clean markdown
Microsoft's MarkItDown converts:
Images:
PDFs:
Documents:
Code:
Tables:
During capture:
Esc - Cancel captureSpace - Toggle crosshair/selectionEnter - Confirm selectionCtrl+Z - Undo annotationCtrl+C - Copy to clipboardCtrl+S - SaveAnnotation mode:
A - Arrow toolR - Rectangle toolC - Circle toolT - Text toolH - Highlight toolB - Blur toolDelete - Remove last annotationBetter results:
Pre-processing:
# Enhance before OCR
python scripts/enhance_image.py \
--input screenshot.png \
--output enhanced.png \
--operations "grayscale,contrast,denoise"
# Then OCR
python scripts/image_to_markdown.py --input enhanced.png
Capture from specific monitor:
# List monitors
python scripts/list_monitors.py
# Capture from monitor 2
python scripts/capture.py --monitor 2
# Capture all monitors
python scripts/capture.py --all-monitors
When save dialog appears:
Skip dialog (auto-save):
# config.yaml
output:
auto_save: true
directory: "~/Screenshots"
Copy to clipboard automatically:
# Text mode - copies markdown
python scripts/capture.py --mode text --clipboard
# Image mode - copies image
python scripts/capture.py --mode image --clipboard
# Both
python scripts/capture.py --clipboard --save
Paste from clipboard:
# Process clipboard image
python scripts/process_clipboard.py --output result.md
Install as Windows service:
# Install
python scripts/install_windows_service.py
# Service runs on startup
# PrtSc always available
Or use Task Scheduler:
1. Open Task Scheduler
2. Create Basic Task
3. Trigger: At log on
4. Action: Start program
5. Program: python
6. Arguments: path\to\screenshot_service.py
LaunchAgent setup:
# Install service
python scripts/install_macos_service.py
# Creates ~/Library/LaunchAgents/com.screenshot.service.plist
# Runs on login
Manual:
# Create LaunchAgent plist
# Load with launchctl
launchctl load ~/Library/LaunchAgents/com.screenshot.service.plist
Systemd service:
# Install
python scripts/install_linux_service.py
# Creates ~/.config/systemd/user/screenshot.service
# Enable and start:
systemctl --user enable screenshot
systemctl --user start screenshot
Or use autostart:
# Copy desktop entry
cp screenshot.desktop ~/.config/autostart/
For highest accuracy:
Azure Computer Vision:
export AZURE_CV_KEY="your-key"
export AZURE_CV_ENDPOINT="https://your-region.api.cognitive.microsoft.com/"
python scripts/capture.py --mode text --ocr-engine azure
Google Vision:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
python scripts/capture.py --mode text --ocr-engine google
Costs:
Capture:
capture.py - Main interactive capturecapture_text.py - Text mode onlycapture_annotate.py - Image with annotationscapture_sequence.py - Multiple capturesService:
screenshot_service.py - Background serviceinstall_windows_service.py - Windows installerinstall_macos_service.py - macOS installerinstall_linux_service.py - Linux installerConversion:
image_to_markdown.py - Image → Markdownpdf_to_markdown.py - PDF → Markdownbatch_pdf_convert.py - Batch conversionProcessing:
enhance_image.py - Image enhancementprocess_clipboard.py - Clipboard processingextract_tables.py - Table extractionUtilities:
list_monitors.py - List displaystest_ocr.py - Test OCR accuracyconfigure.py - Interactive config"Tesseract not found"
# Install Tesseract
# Windows: Download installer
# macOS: brew install tesseract
# Linux: apt install tesseract-ocr
# Check installation
tesseract --version
"Permission denied" (screenshot)
Windows: Run as Administrator
macOS: System Preferences → Security → Privacy → Screen Recording
Linux: Check X11 permissions
"Keyboard hook failed"
# Requires administrator/root privileges
# Windows: Run as Administrator
# macOS: Grant Accessibility permissions
# Linux: Run with sudo or add user to input group
"Poor OCR quality"
# Enhance image first
python scripts/enhance_image.py --input screenshot.png
# Try different OCR engine
python scripts/capture.py --ocr-engine easyocr
# Specify language
python scripts/capture.py --lang eng+fra
"MarkItDown not working"
pip install --upgrade markitdown --break-system-packages
# Check version
python -c "import markitdown; print(markitdown.__version__)"
See examples/ for complete workflows:
tools
# YouTube Transcriber Pipeline - Claude Code Skill **Version**: 1.0.0 | **Status**: Ready for Claude Code Integration ## Overview Complete 4-skill pipeline for extracting, formatting, organizing, and archiving YouTube transcripts. Integrates with Claude Code CLI using the system's configured API keys (no manual key management needed). ## Capabilities This skill provides a complete workflow: 1. **Extract Facts** - AI-powered fact extraction from transcripts (uses Claude API via Claude Code)
content-media
# YouTube Transcriber Skill Extract transcripts from YouTube videos, playlists, and channels with automatic intelligent processing. ## Overview One unified command that intelligently assesses input and handles everything—single videos, batch files, playlist expansion, channel extraction. No need to choose between commands; it figures out what to do. ## Capabilities - **Auto-Detect Input**: Single URL, file of URLs, playlist, channel - **Smart Expansion**: Automatically expands playlists/cha
development
# Trading Analysis Skill **Version**: 1.0.0 **Category**: Financial Analysis / Trading **Author**: Claude Code **Last Updated**: November 22, 2025 ## Overview Comprehensive trading performance analysis and edge identification system for Interactive Brokers accounts. Analyzes CSV statements to identify trading patterns, position sizing issues, time-of-day edges, and risk management problems. ## Features ### 1. **CSV Statement Parsing** - Parse Interactive Brokers activity statements (CSV for
development
# System Health Check & Cleanup Skill **Version**: 1.0.0 **Category**: System Administration / Performance Optimization **Author**: Claude Code **Last Updated**: November 22, 2025 ## Overview Automated system health monitoring and cleanup workflow. Diagnoses performance issues, identifies resource bottlenecks, fixes orphaned services, kills stale processes, and cleans cache bloat. Designed for archimedes (32c/125GB) but works on any Linux system. ## Features ### 1. **System Diagnostics** -