skills/egadams/receipt-scanner-master/SKILL.md
Master receipt scanning operations including parsing, debugging, enhancing accuracy, and database integration. Use when working with receipts, images, OCR issues, expense categorization, or troubleshooting receipt uploads.
Master the receipt scanning system, which uses AI-powered OCR to extract structured data from receipt images and store it in the database.
This skill helps you:
Receipt Scanner Component: /home/adamsl/planner/office-assistant/js/components/receipt-scanner.js
http://localhost:8080/receipt-scanner.html
Upload Component: /home/adamsl/planner/office-assistant/js/upload-component.js
Receipt Parser: app/services/receipt_parser.py
Receipt Engine: app/services/receipt_engine.py
API Endpoints: app/api/receipt_endpoints.py
/api/parse-receipt - Uploads and parses receipt image (returns temp data, doesn't save)
/api/receipt-items - Auto-saves individual line items when categorized
/api/save-receipt - Final save for categorized items (batch operation)
/api/receipts/{expense_id} - Retrieves receipt metadata
/api/receipts/file/{year}/{month}/{filename} - Serves stored receipt files
Data Models: app/models/receipt_models.py
ReceiptExtractionResult - Complete receipt data structure
ReceiptItem - Individual line items with categorization
ReceiptTotals - Subtotal, tax, tip, discount, total
ReceiptPartyInfo - Merchant details
ReceiptMeta - Parsing metadata and model info
PaymentMethod - Enum: CASH, CARD, BANK, OTHER
Tables:
expenses - Main expense entries (amount, date, category, method)
receipt_metadata - Parsing metadata (model, confidence, raw response)
Storage Structure:
app/data/receipts/
├── YYYY/
│   ├── MM/
│   │   ├── receipt_TIMESTAMP_filename.jpg
│   │   └── receipt_TIMESTAMP_filename.pdf
└── temp/
    └── temp_receipt_TIMESTAMP_filename.jpg
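The layout above can be sketched as a pair of path builders. This is only an illustration: the function names are hypothetical, the timestamp format is inferred from the sample temp_file_name shown later in this document, and it is an assumption that the YYYY/MM folders come from the upload time rather than the transaction date.

```python
from datetime import datetime, timezone
from pathlib import Path

# Root folder taken from the storage tree above.
RECEIPT_UPLOAD_DIR = Path("app/data/receipts")


def timestamp_token(moment: datetime) -> str:
    """Format a moment as a UTC token such as 20250115T120000Z (assumed format)."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def temp_receipt_path(moment: datetime, filename: str) -> Path:
    """Hypothetical helper: where a freshly uploaded receipt would sit in temp/."""
    return RECEIPT_UPLOAD_DIR / "temp" / f"temp_receipt_{timestamp_token(moment)}_{filename}"


def permanent_receipt_path(moment: datetime, filename: str) -> Path:
    """Hypothetical helper: where the receipt would sit under YYYY/MM/ after saving."""
    utc = moment.astimezone(timezone.utc)
    return (
        RECEIPT_UPLOAD_DIR
        / f"{utc:%Y}"
        / f"{utc:%m}"
        / f"receipt_{timestamp_token(utc)}_{filename}"
    )
```

Comparing the two results for the same upload is a quick way to tell whether a file is still waiting in temp/ or has been moved to permanent storage.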
Parse a receipt image to extract structured data:
# Start the API server if not running
python3 api_server.py
# Test with curl (from another terminal)
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@/path/to/receipt.jpg"
Expected Response:
{
  "parsed_data": {
    "transaction_date": "2025-01-15",
    "payment_method": "CARD",
    "party": {
      "merchant_name": "Walmart",
      "merchant_phone": null,
      "merchant_address": "123 Main St",
      "store_location": "Store #1234"
    },
    "items": [
      {
        "description": "MILK WHOLE GAL",
        "quantity": 1.0,
        "unit_price": 4.99,
        "line_total": 4.99
      }
    ],
    "totals": {
      "subtotal": 4.99,
      "tax_amount": 0.35,
      "tip_amount": 0.0,
      "discount_amount": 0.0,
      "total_amount": 5.34
    },
    "meta": {
      "currency": "USD",
      "receipt_number": "12345",
      "model_name": "gemini-2.5-pro"
    }
  },
  "temp_file_name": "temp_receipt_20250115T120000Z_receipt.jpg"
}
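To make the response shape concrete, here is a minimal sketch that condenses a parse response into a one-line summary. The function name summarize_receipt is hypothetical; the keys it reads are the ones shown in the expected response, and it assumes they are all present.

```python
def summarize_receipt(response: dict) -> str:
    """Build a one-line summary from a /api/parse-receipt response dict."""
    data = response["parsed_data"]
    # merchant_name may be null in the JSON, so fall back to a placeholder.
    merchant = data["party"].get("merchant_name") or "Unknown merchant"
    currency = data["meta"].get("currency") or ""
    item_count = len(data["items"])
    total = data["totals"]["total_amount"]
    summary = (
        f"{merchant} on {data['transaction_date']}: "
        f"{item_count} item(s), total {total:.2f} {currency}"
    )
    return summary.strip()
```

Feeding it the sample response above yields a summary naming Walmart, the date, one item, and the 5.34 USD total.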
When OCR produces incorrect amounts or descriptions:
Common Issues:
Debug Process:
Check the raw image quality:
# View the receipt image
open /path/to/receipt.jpg
# or
xdg-open /path/to/receipt.jpg
Review the Gemini prompt in app/services/receipt_engine.py:96-173:
Test with higher quality image:
RECEIPT_IMAGE_MAX_WIDTH_PX in settings
receipt_parser.py:80,83
Add validation logic:
quantity × unit_price = line_total for each item
sum(line_totals) ≈ subtotal
subtotal + tax - discount + tip = total
Examine the raw AI response:
# Add debug logging in receipt_engine.py:78
print(f"Raw Gemini Response: {json_response}")
To improve parsing accuracy and features:
Modify the Gemini Prompt (app/services/receipt_engine.py):
def _get_prompt(self) -> str:
    return """
You are an expert at extracting structured data from receipt images with EXTREME ACCURACY.
[Add new instructions here, such as:]
**NEW RULE**: For grocery store receipts, items often have:
- Short codes (e.g., "VEG", "DAIRY", "MEAT")
- Weight-based pricing (price per lb/kg)
- Multi-buy discounts (e.g., "2 for $5")
**VALIDATION ENHANCEMENT**: Before returning JSON:
1. Verify every item's math: quantity × unit_price = line_total
2. Sum all line_totals and compare to subtotal
3. Check: subtotal + tax - discount + tip = total_amount
4. If any validation fails, RE-EXAMINE the receipt more carefully
... [rest of prompt]
"""
Improve Image Processing (app/services/receipt_parser.py):
async def _process_image(self, image_data: bytes, mime_type: str):
    if mime_type.startswith("image/"):
        img = Image.open(BytesIO(image_data))
        # Add preprocessing steps:
        # 1. Auto-rotate based on EXIF
        # 2. Increase contrast for faded receipts
        # 3. Sharpen slightly for better OCR
        # 4. Convert to grayscale if color isn't needed
Add Custom Validation (app/api/receipt_endpoints.py):
@router.post("/parse-receipt")
async def parse_receipt_endpoint(file: UploadFile = File(...)):
    parsed_data, temp_file_name = await parser.process_receipt(file)
    # Add validation here:
    validation_errors = validate_receipt_data(parsed_data)
    if validation_errors:
        return JSONResponse(
            status_code=422,
            content={
                "errors": validation_errors,
                "parsed_data": parsed_data,
                "temp_file_name": temp_file_name
            }
        )
    return ParseReceiptResponse(...)
Test the complete workflow including UI:
Start the API server:
cd /home/adamsl/planner/nonprofit_finance_db
python3 api_server.py
Open the web interface:
cd /home/adamsl/planner/office-assistant
# Open index.html in browser or use a local server
python3 -m http.server 8080
# Navigate to http://localhost:8080
Test the upload flow:
Check database entries:
# Connect to database and verify
mysql -u root -p nonprofit_finance_db
-- Check latest expense entries
SELECT * FROM expenses ORDER BY id DESC LIMIT 5;
-- Check receipt metadata
SELECT * FROM receipt_metadata ORDER BY id DESC LIMIT 5;
-- Verify file storage
SELECT expense_id, receipt_url FROM expenses WHERE receipt_url IS NOT NULL LIMIT 5;
Common database integration problems:
Issue: Receipt parsed but not saved to database
Debug steps:
# Check API server logs
tail -f api_server.log
# Look for errors in save_receipt_endpoint
grep -A 10 "Error saving expense" api_server.log
# Verify database connection
python3 -c "from app.repositories.expenses import ExpenseRepository; repo = ExpenseRepository(); print('Connection OK')"
Issue: File saved to temp but not moved to permanent storage
Debug steps:
# Check temp directory
ls -lth app/data/receipts/temp/ | head -20
# Check permanent storage structure
ls -R app/data/receipts/ | grep ':$'
# Verify permissions
ls -ld app/data/receipts/
Issue: Categorization not working
Debug steps:
# Check categories table
mysql -u root -p -e "SELECT id, name, category_path FROM categories ORDER BY id;" nonprofit_finance_db
# Verify category_id assignments in parsed items
# Items without category_id are not saved to database
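Because items without a category_id are not saved, it can help to list them before digging into the database. The sketch below is hypothetical: it assumes each item is a dict that carries an optional category_id key next to the description field shown in the parse response.

```python
def split_by_category(items: list) -> tuple:
    """Separate items that have a category_id from those that would be skipped."""
    categorized = [item for item in items if item.get("category_id") is not None]
    uncategorized = [item for item in items if item.get("category_id") is None]
    return categorized, uncategorized


def describe_skipped(items: list) -> list:
    """Return the descriptions of items that still need a category."""
    _, uncategorized = split_by_category(items)
    return [item.get("description", "<no description>") for item in uncategorized]
```

An empty result from describe_skipped suggests the problem lies further along, for example in the save endpoint, rather than in category assignment.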
Manually verify OCR accuracy:
Get the parsed data:
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@/path/to/receipt.jpg" | jq '.'
Compare against actual receipt:
Calculate accuracy metrics:
# Create a validation script
import json
def validate_receipt(parsed_json, actual_receipt_data):
errors = []
# Check item count
if len(parsed_json['items']) != len(actual_receipt_data['items']):
errors.append(f"Item count mismatch: {len(parsed_json['items'])} vs {len(actual_receipt_data['items'])}")
# Check each item
for i, (parsed, actual) in enumerate(zip(parsed_json['items'], actual_receipt_data['items'])):
if parsed['line_total'] != actual['line_total']:
errors.append(f"Item {i}: ${parsed['line_total']} vs ${actual['line_total']}")
# Check total
if parsed_json['totals']['total_amount'] != actual_receipt_data['total']:
errors.append(f"Total: ${parsed_json['totals']['total_amount']} vs ${actual_receipt_data['total']}")
return errors
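The error list above says what went wrong; a single number is handier for tracking accuracy across many receipts. Here is a minimal sketch of one possible metric, the fraction of line totals parsed correctly. The function name and the position-by-position comparison are assumptions, not part of the existing code.

```python
from math import isclose


def line_total_accuracy(parsed_items: list, actual_items: list, tolerance: float = 0.005) -> float:
    """Fraction of items whose parsed line_total matches the real receipt.

    Items are compared by position, so a missing or extra item lowers the
    score because the denominator is the longer of the two lists.
    """
    if not parsed_items and not actual_items:
        return 1.0
    matches = sum(
        1
        for parsed, actual in zip(parsed_items, actual_items)
        if isclose(parsed["line_total"], actual["line_total"], abs_tol=tolerance)
    )
    return matches / max(len(parsed_items), len(actual_items))
```

A receipt with three items where one price was misread would score two thirds, which makes a 4-versus-9 regression easy to spot after a prompt change.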
Environment Variables (.env):
GEMINI_API_KEY=your_gemini_api_key_here
# Receipt settings
RECEIPT_MAX_SIZE_MB=10
RECEIPT_IMAGE_MAX_WIDTH_PX=2048
RECEIPT_IMAGE_MAX_HEIGHT_PX=2048
RECEIPT_PARSE_TIMEOUT_SECONDS=30
RECEIPT_UPLOAD_DIR=app/data/receipts
RECEIPT_TEMP_UPLOAD_DIR=app/data/receipts/temp
Settings (app/config.py):
class Settings(BaseSettings):
    GEMINI_API_KEY: str
    RECEIPT_MAX_SIZE_MB: int = 10
    RECEIPT_IMAGE_MAX_WIDTH_PX: int = 1024
    RECEIPT_IMAGE_MAX_HEIGHT_PX: int = 1024
    RECEIPT_PARSE_TIMEOUT_SECONDS: int = 30
    RECEIPT_UPLOAD_DIR: str = "app/data/receipts"
    RECEIPT_TEMP_UPLOAD_DIR: str = "app/data/receipts/temp"
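Note that the sample .env sets the image limits to 2048 while the class defaults are 1024. Environment values normally take precedence over class defaults in a settings class like this one, and the standard-library stand-in below illustrates that precedence. ReceiptLimits and load_receipt_limits are hypothetical names, and the sketch does not reproduce how app/config.py actually loads its values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiptLimits:
    max_width_px: int
    max_height_px: int
    timeout_seconds: int


def load_receipt_limits(env=None) -> ReceiptLimits:
    """Read limits from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if env is None else env
    return ReceiptLimits(
        # Defaults mirror the Settings class above; env values override them.
        max_width_px=int(env.get("RECEIPT_IMAGE_MAX_WIDTH_PX", 1024)),
        max_height_px=int(env.get("RECEIPT_IMAGE_MAX_HEIGHT_PX", 1024)),
        timeout_seconds=int(env.get("RECEIPT_PARSE_TIMEOUT_SECONDS", 30)),
    )
```

If parsing behaves as though images were capped at 1024 pixels despite the .env values, the .env file is probably not being picked up.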
CRITICAL: Items do NOT automatically save when you scan a receipt. You must categorize items for them to be saved.
expenses table
// When you select a category for an item:
_persistCategorizedItem(index, categoryId) {
  // Immediately POSTs to /api/receipt-items
  // Creates expense entry in database
  // Returns expense_id for the item
}
Solution:
# Add to .env file
echo 'GEMINI_API_KEY=your_key_here' >> .env
# Or export in current session
export GEMINI_API_KEY=your_key_here
Root Cause: Hit the free tier daily quota for a specific model
Solutions:
Model fallback (already implemented):
Wait for quota reset (24 hours)
Use different Google account:
.env
Upgrade to paid tier (higher quotas)
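The fallback idea can be sketched generically as trying each model in order and moving on when one reports an exhausted quota. This is not the code in receipt_engine.py: QuotaExceededError, parse_with_fallback, and the call_model callable are all hypothetical stand-ins for whatever the engine really uses.

```python
class QuotaExceededError(Exception):
    """Hypothetical error meaning a model's quota is exhausted."""


def parse_with_fallback(image_bytes: bytes, model_names: list, call_model):
    """Try each model in order and return (model_name, result) from the first success.

    call_model is any callable taking (model_name, image_bytes); it should
    raise QuotaExceededError when that model has no quota left.
    """
    last_error = None
    for name in model_names:
        try:
            return name, call_model(name, image_bytes)
        except QuotaExceededError as exc:
            # Remember the failure and move on to the next model in the list.
            last_error = exc
    raise RuntimeError("All configured models have exhausted their quota") from last_error
```

Only quota errors trigger the fallback here, so any other failure still surfaces immediately instead of being hidden by a retry on another model.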
Root Cause: Digit confusion (4 vs 9)
Solution: Enhance Gemini prompt with specific digit rules:
**DIGIT 4 vs 9 RECOGNITION**:
- 4 has sharp angles, often looks like "4" with a horizontal line and vertical line meeting
- 9 has a curved top, looks like "g" or "q" without the tail
- Context check: grocery items rarely cost $9.99, more often $4.99
Root Cause: Items at bottom of receipt or spanning multiple lines
Solution:
receipt_parser.py
**COMPLETE EXTRACTION**: Extract ALL items from top to bottom of receipt.
Do not skip items even if they are:
- At the very bottom of the receipt
- Spanning multiple lines
- In a different format or font
Root Cause: Some items are tax-exempt or have different tax rates
Solution:
ReceiptItem model
sum(item.tax_amount for item in items) = totals.tax_amount
Root Cause: Large image file or slow API response
Solutions:
# Increase timeout in settings
RECEIPT_PARSE_TIMEOUT_SECONDS=60
# Reduce image size before sending to API
# In receipt_parser.py, decrease max dimensions
max_width = 1024 # Instead of 2048
max_height = 1024
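Lowering the maximum dimensions amounts to scaling the image down while keeping its aspect ratio. Here is a minimal sketch of that arithmetic; fit_within is a hypothetical helper, and the real resizing code in receipt_parser.py may differ.

```python
def fit_within(width: int, height: int, max_width: int = 1024, max_height: int = 1024) -> tuple:
    """Return (width, height) scaled down to fit the limits, preserving aspect ratio.

    Images that already fit are returned unchanged (never upscaled).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 3000 by 4000 pixel phone photo comes out at 768 by 1024, which gives the API far fewer pixels to process.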
Root Cause: Frontend not polling or backend endpoint error
Debug steps:
# Check backend endpoint
curl http://localhost:8000/api/recent-downloads
# Check frontend console
# Open browser DevTools → Console → look for errors
# Verify file in Downloads folder
ls -lth ~/Downloads/*.pdf | head -5
app/services/receipt_parser.py - Main parsing logic
app/services/receipt_engine.py - AI engine integration
app/api/receipt_endpoints.py - REST API endpoints
app/models/receipt_models.py - Data models
app/repositories/receipt_metadata.py - Metadata storage
app/repositories/expenses.py - Expense storage
app/config.py - Configuration settings
/home/adamsl/planner/office-assistant/js/upload-component.js - Upload UI component
/home/adamsl/planner/office-assistant/js/app.js - Main application
/home/adamsl/planner/office-assistant/js/category-picker.js - Category selection
tests/test_receipt_processing.py - Receipt processing tests
tests/test_receipt_items_api.py - API endpoint tests
test_receipt_api.py - Integration tests
User request:
I want to scan my Meijer receipt and categorize the groceries
You would:
Direct user to the receipt scanner:
Open http://localhost:8080/receipt-scanner.html in your browser
Guide the workflow:
Verify in database:
View in Daily Expense Categorizer:
http://localhost:8080/daily_expense_categorizer.html
User request:
Parse this grocery receipt via API and extract all items with prices
You would:
Verify API server is running:
ps aux | grep api_server.py
# If not running: python3 api_server.py
Parse the receipt:
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@grocery_receipt.jpg" | jq '.'
Review the output:
items[] array for all products
totals.total_amount matches receipt
temp_file_name for saving later
If items are missing:
User request:
The receipt parser is reading $4.99 items as $9.99
You would:
Reproduce the issue:
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@problem_receipt.jpg" > parsed_output.json
# Compare parsed vs actual
jq '.parsed_data.items[] | {description, unit_price}' parsed_output.json
Read the current Gemini prompt:
grep -A 30 "DIGIT CONFUSION PREVENTION" app/services/receipt_engine.py
Enhance the prompt with specific 4 vs 9 rules:
# In receipt_engine.py, _get_prompt() method
**CRITICAL: DIGIT 4 vs DIGIT 9**:
- When you see what might be 4 or 9, examine the top of the digit
- 4: Angular top, horizontal line going right
- 9: Curved/circular top, like the letter "g"
- Common grocery prices: $4.99, $14.99, NOT $9.99, $19.99
- If unsure, default to 4 for items under $10
Test with the problematic receipt:
# Restart server to load new prompt
pkill -f api_server.py
python3 api_server.py &
# Re-test
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@problem_receipt.jpg" | jq '.parsed_data.items[].unit_price'
Verify improvement and test with other receipts
User request:
Validate that line totals match quantity times price
You would:
Read the current endpoint code:
grep -A 20 "parse_receipt_endpoint" app/api/receipt_endpoints.py
Create a validation function:
# Add to receipt_endpoints.py
def validate_receipt_math(parsed_data: ReceiptExtractionResult) -> List[str]:
    errors = []
    for i, item in enumerate(parsed_data.items):
        expected_total = round(item.quantity * item.unit_price, 2)
        if abs(expected_total - item.line_total) > 0.01:
            errors.append(
                f"Item {i} '{item.description}': "
                f"{item.quantity} × ${item.unit_price} = ${expected_total}, "
                f"but line_total is ${item.line_total}"
            )
    # Validate subtotal
    items_sum = sum(item.line_total for item in parsed_data.items)
    if abs(items_sum - parsed_data.totals.subtotal) > 0.50:
        errors.append(
            f"Items sum to ${items_sum:.2f} but subtotal is ${parsed_data.totals.subtotal:.2f}"
        )
    # Validate final total
    calculated_total = (
        parsed_data.totals.subtotal +
        (parsed_data.totals.tax_amount or 0) +
        (parsed_data.totals.tip_amount or 0) -
        (parsed_data.totals.discount_amount or 0)
    )
    if abs(calculated_total - parsed_data.totals.total_amount) > 0.01:
        errors.append(
            f"Calculated total ${calculated_total:.2f} != stated total ${parsed_data.totals.total_amount:.2f}"
        )
    return errors
Integrate validation into endpoint:
@router.post("/parse-receipt", response_model=ParseReceiptResponse)
async def parse_receipt_endpoint(file: UploadFile = File(...)):
    parser = get_receipt_parser()
    temp_file_name: Optional[str] = None
    try:
        parsed_data, temp_file_name = await parser.process_receipt(file)
        # Add validation
        validation_errors = validate_receipt_math(parsed_data)
        if validation_errors:
            # Log errors but still return the data
            print(f"Validation warnings: {validation_errors}")
        return ParseReceiptResponse(parsed_data=parsed_data, temp_file_name=temp_file_name)
    # The endpoint's existing except/cleanup handling (not shown) continues here
Test the validation:
# Use a receipt with known correct totals
curl -X POST "http://localhost:8000/api/parse-receipt" \
-F "file=@test_receipt_good.jpg"
# Use a receipt with deliberate errors (or mock the data)
# Check logs for validation warnings
tail -f api_server.log
User request:
Make Letta able to scan and categorize receipts
You would:
Ensure this skill is available to Letta:
# Skill already in .claude/skills/receipt-scanner/
# Letta can invoke Claude Code skills via agent tool calls
Create a Letta tool function:
# In letta_agent/tools/receipt_tools.py
from typing import Optional
import httpx

@tool
def scan_receipt(image_path: str) -> dict:
    """
    Scan a receipt image and extract structured data.

    Args:
        image_path: Path to the receipt image file

    Returns:
        Dictionary with merchant, items, totals, and metadata
    """
    with open(image_path, 'rb') as f:
        files = {'file': f}
        response = httpx.post(
            'http://localhost:8000/api/parse-receipt',
            files=files,
            timeout=60.0
        )
    if response.status_code == 200:
        return response.json()
    else:
        return {'error': response.text}
Register the tool with Letta agent:
# In hybrid_letta_persistent.py
from letta_agent.tools.receipt_tools import scan_receipt
agent = client.create_agent(
    name="finance_assistant",
    tools=[scan_receipt, ...],
    ...
)
Test with Letta:
# Chat with Letta
response = client.send_message(
    agent_id=agent.id,
    message="Scan the receipt at ~/Downloads/walmart_receipt.jpg and tell me the total"
)
print(response)
The skill is successful when: