.claude/skills/test-memory-system/SKILL.md
Comprehensive testing playbook for Local Brain Search memory improvements (Phases 1, 3, 4)
Comprehensively test the new memory improvements implemented in Local Brain Search (Phases 1, 3, and 4). The playbook reads from and writes to the following locations:
| Source | Location | Read | Write | Description |
|--------|----------|------|-------|-------------|
| Memory Config | resources/local-brain-search/memory_config.py | ✓ | | Central configuration |
| Q-Values | resources/local-brain-search/data/q_values.json | ✓ | | Learned preferences |
| Usage History | resources/local-brain-search/data/usage_history.jsonl | ✓ | | Event tracking |
| FAISS Index | resources/local-brain-search/data/brain.faiss | ✓ | | Vector index |
| Graph | resources/local-brain-search/data/brain_graph.pkl | ✓ | | Note graph |
| Test Report | resources/memory-test-reports/ | | ✓ | Output location |
- source resources/local-brain-search/venv/bin/activate
- data/brain.faiss exists
- data/brain_graph.pkl exists
- pip install -r requirements.txt

Verify all components are installed and accessible.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
source venv/bin/activate
# Check files exist
ls -la data/brain.faiss data/brain_metadata.pkl data/brain_graph.pkl
# Check Python dependencies
python -c "import faiss; import networkx; import sentence_transformers; print('All dependencies OK')"
# Check new modules exist
python -c "from intent import classify_intent; from spreading import spreading_activation; from learning import get_learning_stats; print('All new modules OK')"
Expected: All files exist, all imports succeed.
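The same checks can be collected into one Python helper that reports what is missing instead of stopping at the first failure. This is a minimal sketch: the file and module names are copied from the commands above, while the function name `check_environment` is a hypothetical helper, not part of Local Brain Search.

```python
import importlib.util
from pathlib import Path

# Files and third-party modules named in the shell checks above.
REQUIRED_FILES = ["data/brain.faiss", "data/brain_metadata.pkl", "data/brain_graph.pkl"]
REQUIRED_MODULES = ["faiss", "networkx", "sentence_transformers"]


def check_environment(base_dir, files=REQUIRED_FILES, modules=REQUIRED_MODULES):
    """Return (missing_files, missing_modules) for the given base directory."""
    base = Path(base_dir)
    missing_files = [f for f in files if not (base / f).is_file()]
    # find_spec locates a top-level module without importing it.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_files, missing_modules
```

Run it from inside the activated venv, for example `check_environment(".")` in the local-brain-search directory; two empty lists mean the prerequisites are in place.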
Test that queries are correctly classified into four intent types.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
source venv/bin/activate
python intent.py
Test Cases:
| Query | Expected Intent | Confidence |
|-------|-----------------|------------|
| "What is dopamine?" | factual | >70% |
| "How does motivation work?" | conceptual | >70% |
| "Connect dopamine and Buddhism" | synthesis | >70% |
| "Recent notes about AI agents" | temporal | >70% |
| "Dopamine" | factual | >50% |
| "relationship between identity and belief" | synthesis | >70% |
| "patterns across neuroscience and AI" | synthesis | >80% |
Record Results:
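One way to turn the test cases above into an accuracy figure is a small scoring harness. This sketch assumes the classifier can be called as `classify(query)` and returns an `(intent, confidence)` pair with confidence on a 0 to 1 scale; the real return shape of `classify_intent` is not shown in this playbook, so wrap it in an adapter if it differs.

```python
# Test cases copied from the table above: (query, expected intent, minimum confidence).
TEST_CASES = [
    ("What is dopamine?", "factual", 0.70),
    ("How does motivation work?", "conceptual", 0.70),
    ("Connect dopamine and Buddhism", "synthesis", 0.70),
    ("Recent notes about AI agents", "temporal", 0.70),
    ("Dopamine", "factual", 0.50),
    ("relationship between identity and belief", "synthesis", 0.70),
    ("patterns across neuroscience and AI", "synthesis", 0.80),
]


def score_intents(classify, cases=TEST_CASES):
    """Run classify(query) on each case and return (accuracy, failures).

    ASSUMPTION: classify returns (intent, confidence) with confidence in [0, 1].
    """
    failures = []
    for query, expected, min_conf in cases:
        intent, confidence = classify(query)
        # The table requires confidence strictly above the threshold.
        if intent != expected or confidence <= min_conf:
            failures.append((query, expected, intent, confidence))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures
```

The returned accuracy and failure list map directly onto the "Accuracy" and "Failures" fields of the report template.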
Compare results between static (traditional) and spreading activation modes.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Test 1: Static search
./run_search.sh "dopamine and motivation" --mode static --limit 5 --json | python -m json.tool
# Test 2: Spreading search (same query)
./run_search.sh "dopamine and motivation" --mode spreading --limit 5 --json | python -m json.tool
# Test 3: Synthesis query (where spreading should excel)
./run_search.sh "connect Buddhism neuroscience consciousness" --mode static --limit 5 --json
./run_search.sh "connect Buddhism neuroscience consciousness" --mode spreading --limit 5 --json
Expected Results:
Record:
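To record result overlap between the two modes, the JSON outputs can be compared by note identifier. This is a sketch under an assumption: it treats each mode's output as a list of objects carrying a `path` field. That field name is hypothetical, so pass a different `key` if the real output uses another name.

```python
def result_overlap(static_results, spreading_results, key="path"):
    """Compare two result lists by note identifier.

    ASSUMPTION: each result is a dict and `key` names the field that
    identifies the note. Returns (shared, only_static, only_spreading)
    as sorted lists.
    """
    static_ids = {r[key] for r in static_results}
    spreading_ids = {r[key] for r in spreading_results}
    return (
        sorted(static_ids & spreading_ids),
        sorted(static_ids - spreading_ids),
        sorted(spreading_ids - static_ids),
    )
```

With `--limit 5`, the length of the first list is the "X/5" overlap figure in the report, and the third list shows what spreading activation surfaced that static search did not.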
Verify that different intents produce different spreading parameters.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Factual query (should use minimal spreading)
./run_search.sh "What is dopamine?" --mode spreading --json 2>&1 | grep -E "(iterations|Intent)"
# Synthesis query (should use maximum spreading)
./run_search.sh "connect dopamine Buddhism identity" --mode spreading --json 2>&1 | grep -E "(iterations|Intent)"
Expected:
Verify that hub notes don't dominate results.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Search for topic where Dopamine.md is a known hub
./run_search.sh "motivation reward behavior" --mode spreading --limit 10 --json
# Check if Dopamine.md dominates or if diverse results appear
Expected:
Record:
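Hub dominance can be put into a number by measuring how much of the total score a single note captures. The sketch below assumes each result is a dict with `path` and `score` fields; both names are hypothetical and should be adjusted to the actual JSON output.

```python
from pathlib import PurePosixPath


def hub_share(results, hub_name, key="path", score_key="score"):
    """Fraction of the total score held by results whose file name is hub_name.

    ASSUMPTION: results are dicts with a path-like `key` field and a
    numeric `score_key` field.
    """
    total = sum(r[score_key] for r in results)
    if total == 0:
        return 0.0
    hub_total = sum(
        r[score_key] for r in results if PurePosixPath(r[key]).name == hub_name
    )
    return hub_total / total
```

Calling `hub_share(results, "Dopamine.md")` on the output of the search above gives a value that can be compared between runs; what counts as acceptable is a judgment call for the tester.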
Check current learning system state.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Check learning status
./run_learning.sh status
# View top notes by Q-value
./run_learning.sh top --limit 10
# Export full learning data
./run_learning.sh export --output /tmp/learning_export.json
Expected:
Record:
Verify that search operations are being tracked.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Count events before
BEFORE=$(wc -l < data/usage_history.jsonl)
# Run a search
./run_search.sh "test query for tracking" --mode spreading --limit 5
# Count events after
AFTER=$(wc -l < data/usage_history.jsonl)
echo "Events before: $BEFORE, after: $AFTER, new: $((AFTER - BEFORE))"
# Verify event structure
tail -5 data/usage_history.jsonl | python -m json.tool --json-lines
Expected:
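For a check that covers the whole history file rather than the last five lines, each line can be parsed individually. This is a minimal sketch that only verifies every line is a JSON object; it makes no assumption about which fields an event contains, and `load_events` is a hypothetical helper name.

```python
import json


def load_events(path):
    """Parse a JSON Lines file.

    Returns (events, bad_lines), where bad_lines holds the 1-based numbers
    of lines that are not valid JSON objects.
    """
    events, bad_lines = [], []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(number)
                continue
            if isinstance(event, dict):
                events.append(event)
            else:
                bad_lines.append(number)
    return events, bad_lines
```

`len(events)` before and after a search gives the same count as the `wc -l` comparison, and a non-empty `bad_lines` points at corrupted entries.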
Verify that different event types produce appropriate Q-value changes.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Get current Q-value for a test note
cat data/q_values.json | python -c "import json,sys; d=json.load(sys.stdin); print(d.get('02-Permanent/Dopamine.md', 'not found'))"
# Log a read event
./run_learning.sh log read "02-Permanent/Dopamine.md" --query "test"
# Check Q-value increased
cat data/q_values.json | python -c "import json,sys; d=json.load(sys.stdin); print(d.get('02-Permanent/Dopamine.md', 'not found'))"
Expected Q-value changes:
Note: Actual changes are modulated by learning_rate (0.1) and position factor.
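To see how a learning rate and a position factor can modulate a change, here is a worked sketch of one common incremental update rule. The formula is an assumption chosen for illustration, not the update used by Local Brain Search; only the default `learning_rate` of 0.1 comes from the note above.

```python
def update_q(q, reward, learning_rate=0.1, position_factor=1.0):
    """Move q toward reward by a step scaled by learning_rate and position_factor.

    ASSUMPTION: a standard incremental update; the real rule and the way
    the position factor is computed may differ.
    """
    return q + learning_rate * position_factor * (reward - q)
```

Under this rule, a note at Q = 0 that receives a reward of 1 moves by 0.1 with `position_factor=1.0` and by 0.05 with `position_factor=0.5`, which is why the observed change after a single event is much smaller than the raw reward.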
Verify that Q-values influence search result ranking.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# First, artificially boost a note's Q-value
python -c "
import json
with open('data/q_values.json', 'r') as f:
    q = json.load(f)
q['02-Permanent/Identity.md'] = 1.5  # High Q-value
with open('data/q_values.json', 'w') as f:
    json.dump(q, f, indent=2)
print('Q-value set for Identity.md')
"
# Search for something where Identity.md is somewhat relevant
./run_search.sh "belief systems self" --mode spreading --limit 10 --json
# Check if Identity.md ranks higher due to Q-value boost
Expected:
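Because this test overwrites a learned Q-value, it is worth restoring the file afterwards. The sketch below wraps the boost in a context manager that backs up the file and puts it back when the block exits. It assumes, as the snippet above does, that q_values.json is a flat mapping from note path to number; `temporary_q_value` is a hypothetical helper name.

```python
import json
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_q_value(q_file, note, value):
    """Set q_file[note] = value inside the block, then restore the original file."""
    q_path = Path(q_file)
    backup = q_path.with_name(q_path.name + ".bak")
    shutil.copy2(q_path, backup)  # keep an untouched copy
    try:
        data = json.loads(q_path.read_text(encoding="utf-8"))
        data[note] = value
        q_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        yield data
    finally:
        os.replace(backup, q_path)  # put the original back, even on error
```

Run the search inside `with temporary_q_value("data/q_values.json", "02-Permanent/Identity.md", 1.5):` and add `--no-track` to the search if new usage events should not alter the learned values during the test.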
Verify that --no-track flag prevents usage logging.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Count events before
BEFORE=$(wc -l < data/usage_history.jsonl)
# Run search with --no-track
./run_search.sh "no track test" --mode static --no-track
# Count events after
AFTER=$(wc -l < data/usage_history.jsonl)
echo "Events should be same: before=$BEFORE, after=$AFTER"
Expected: No new events logged when --no-track is used.
Verify memory_config.py is the single source of truth.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Print current configuration
python memory_config.py
# Verify spreading uses config
python -c "
from memory_config import MEMORY_CONFIG
print('Spreading max_iterations:', MEMORY_CONFIG['spreading']['max_iterations'])
print('Learning enabled:', MEMORY_CONFIG['learning']['enabled'])
print('Q-weight:', MEMORY_CONFIG['learning']['q_weight'])
"
Expected:
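A structural check can confirm that the keys read above are present before any test depends on them. The key paths in this sketch are the three used in the snippet above; `missing_keys` is a hypothetical helper, and the configuration may contain many other keys it does not inspect.

```python
# Key paths taken from the configuration snippet above.
REQUIRED_KEYS = [
    ("spreading", "max_iterations"),
    ("learning", "enabled"),
    ("learning", "q_weight"),
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return dotted names of required key paths absent from a nested dict."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

From the local-brain-search directory, `from memory_config import MEMORY_CONFIG` followed by `missing_keys(MEMORY_CONFIG)` should return an empty list.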
Measure search latency for both modes.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Benchmark static search
time (for i in {1..5}; do ./run_search.sh "dopamine" --mode static --limit 10 --no-track > /dev/null; done)
# Benchmark spreading search
time (for i in {1..5}; do ./run_search.sh "dopamine" --mode spreading --limit 10 --no-track > /dev/null; done)
Expected:
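The shell `time` loop reports a total for five runs. For the per-run averages the report asks for, a small Python timing harness can be used instead. This sketch builds the same command line as the benchmark above; the helper names are hypothetical, and the timings include process startup for each run.

```python
import statistics
import subprocess
import time


def time_runs(fn, runs=5):
    """Call fn() `runs` times and return per-call durations in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000)
    return durations


def search_command(mode, query="dopamine"):
    """Build the command line used in the shell benchmark above."""
    return ["./run_search.sh", query, "--mode", mode, "--limit", "10", "--no-track"]


def run_search(mode):
    """Run one search, discarding its output and raising if it fails."""
    subprocess.run(search_command(mode), stdout=subprocess.DEVNULL, check=True)


def summarize(durations):
    """Return mean and worst-case duration in milliseconds."""
    return {"mean_ms": statistics.mean(durations), "max_ms": max(durations)}
```

For example, `summarize(time_runs(lambda: run_search("static")))` and the same call with `"spreading"` give the two averages for the Performance section of the report.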
Test boundary conditions.
cd /Users/eugene/Dropbox/Agents/Cornelius/resources/local-brain-search
# Empty query
./run_search.sh "" --mode spreading --limit 5 2>&1
# Very long query
./run_search.sh "$(python -c 'print("dopamine " * 100)')" --mode spreading --limit 5 2>&1
# Non-existent topic
./run_search.sh "xyznonexistenttopicxyz" --mode spreading --limit 5 --json
# Unicode query
./run_search.sh "意識 consciousness" --mode spreading --limit 5 --json
Expected:
After running all tests, generate a comprehensive report.
REPORT_DIR="/Users/eugene/Dropbox/Agents/Cornelius/resources/memory-test-reports"
REPORT_FILE="$REPORT_DIR/test-report-$(date +%Y-%m-%d-%H%M).md"
mkdir -p "$REPORT_DIR"
# Memory System Test Report
**Date:** YYYY-MM-DD HH:MM
**Tester:** [name]
**System Version:** Cornelius v01.25
## Summary
| Component | Status | Notes |
|-----------|--------|-------|
| Intent Classification | ✓/✗ | |
| Spreading Activation | ✓/✗ | |
| Lateral Inhibition | ✓/✗ | |
| Usage Tracking | ✓/✗ | |
| Q-Value Learning | ✓/✗ | |
| Configuration Loading | ✓/✗ | |
| Performance | ✓/✗ | |
## Detailed Results
### Intent Classification
- Accuracy: X%
- Failures: [list]
### Spreading vs Static
- Result overlap: X/5
- Spreading found cross-domain connections: Y/N
- Example good result: [describe]
### Lateral Inhibition
- Hub dominance reduced: Y/N
- Diversity improved: Y/N
### Learning System
- Events tracked: X
- Q-values updated correctly: Y/N
- Ranking adjustment working: Y/N
### Performance
- Static search avg: Xms
- Spreading search avg: Xms
- Acceptable: Y/N
## Issues Found
1. [Issue description]
   - Severity: High/Medium/Low
   - Steps to reproduce: [steps]
   - Suggested fix: [suggestion]
## Recommendations
1. [Recommendation]
2. [Recommendation]
## Next Steps
- [ ] Address issues found
- [ ] Move to Phase 2 (Extended Graph) if all passes
- [ ] Enable spreading as default mode if stable
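The report file name and the summary table in the template above can also be produced from Python once the results are collected. This is a minimal sketch: the file name pattern mirrors the `date +%Y-%m-%d-%H%M` format used for REPORT_FILE, and the helper names and the shape of `statuses` are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def report_path(report_dir, now=None):
    """Return report_dir/test-report-YYYY-MM-DD-HHMM.md for the given time."""
    now = now or datetime.now()
    return Path(report_dir) / f"test-report-{now:%Y-%m-%d-%H%M}.md"


def summary_table(statuses):
    """Render the Summary table from {component: (passed, notes)}."""
    lines = ["| Component | Status | Notes |", "|-----------|--------|-------|"]
    for component, (passed, notes) in statuses.items():
        mark = "✓" if passed else "✗"
        lines.append(f"| {component} | {mark} | {notes} |")
    return "\n".join(lines)
```

Writing the result is then a matter of `report_path(REPORT_DIR).write_text(...)` with the table placed under the Summary heading, encoded as UTF-8 so the status marks survive.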
If tests fail:
- requirements.txt installed in venv
- python index_brain.py to rebuild index
- ./run_learning.sh reset --confirm to reset
- memory_config.py syntax