<!--
This file is the original documentation of an open-source Skill, included for study and research reference only.
Collected and organized by CoPaper.AI | https://copaper.ai
Source repository: https://github.com/kthorn/research-superpower
Project name: research-superpower
License: MIT License
Date collected: 2026-04-02
Notice: Copyright of this file belongs to the original author. It is included here to give empirical social science researchers a central reference for AI Agent Skills. If this infringes your rights, please contact us for removal.
-->
---
name: Cleaning Up Research Sessions
description: S
Remove intermediate files created during research workflow while preserving all important data.
Core principle: Conservative cleanup with user confirmation. Never delete anything important.
Use this skill when:
When NOT to use:
NEVER delete these (protected list):

Core outputs:
- `SUMMARY.md` - Enhanced findings with methodology
- `relevant-papers.json` - Filtered relevant papers
- `papers-reviewed.json` - Complete screening history
- `papers/` directory - All PDFs and supplementary files
- `citations/citation-graph.json` - Citation relationships

Methodology documentation:
- `screening-criteria.json` - Rubric definition (if exists)
- `test-set.json` - Rubric validation papers (if exists)
- `abstracts-cache.json` - Cached abstracts for re-screening (if exists)
- `rubric-changelog.md` - Rubric version history (if exists)

Auxiliary documentation (if exists):
- `README.md` - Project overview
- `TOP_PRIORITY_PAPERS.md` - Curated priority list
- `evaluated-papers.json` - Rich structured data

Project configuration:
- `.claude/` directory - Permissions and settings
- `*.py` helper scripts that were created - Keep for reproducibility

Candidates for removal (with confirmation):

Intermediate search results:
- `initial-search-results.json` - Raw PubMed results before screening

Temporary files:
- `*.tmp` files
- `*.swp` files (vim swap files)
- `.DS_Store` (macOS)
- `__pycache__/` (Python cache)
- `*.pyc` (Python compiled)

Log files:
- `*.log` files
- `debug-*.txt` files

cd research-sessions/YYYY-MM-DD-description/
# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
Identify files by category:
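The removal categories listed earlier can be sketched as a small classifier. This is a minimal illustration rather than the skill's actual implementation: the function name `removal_category` and the dictionary layout are hypothetical, while the patterns themselves are taken from the lists above. It assumes paths are given relative to the session directory, and it does not consult the protected list, which should still be checked separately and take precedence.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the "Candidates for removal" lists; the grouping is illustrative.
REMOVAL_CANDIDATES = {
    "intermediate search results": ["initial-search-results.json"],
    "temporary files": ["*.tmp", "*.swp", ".DS_Store", "*.pyc"],
    "log files": ["*.log", "debug-*.txt"],
}


def removal_category(relative_path):
    """Return the removal category for a file, or None if it is not a known candidate.

    Hypothetical helper: a None result means "leave the file alone".
    """
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return None
    # Anything inside a __pycache__/ directory counts as a temporary file.
    if "__pycache__" in parts[:-1]:
        return "temporary files"
    name = parts[-1]
    for category, patterns in REMOVAL_CANDIDATES.items():
        if any(fnmatch(name, pattern) for pattern in patterns):
            return category
    return None
```

Returning `None` for unrecognized files keeps the behavior conservative: only files that match a known pattern are ever offered for removal.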
Show what will be deleted:
🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
✅ SUMMARY.md (45 KB)
✅ relevant-papers.json (12 KB)
✅ papers-reviewed.json (28 KB)
✅ papers/ (14 PDFs, 32 MB)
✅ citations/citation-graph.json (5 KB)
✅ screening-criteria.json (2 KB)
✅ abstracts-cache.json (156 KB)
Files that CAN be removed (intermediate):
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
🗑️ .DS_Store (6 KB) - macOS metadata
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)
Options:
- y - Delete intermediate files
- n - Cancel cleanup, keep everything
- review - Show contents of each file before deciding

Before deleting ANY file:
Example confirmation:
About to delete:
- initial-search-results.json (8 KB)
This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.
Confirm deletion? (y/n)
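One conservative way to interpret the user's reply to these prompts is to treat anything other than an explicit yes as a refusal. The sketch below is only an illustration; `interpret_answer` and its `allow_review` flag are hypothetical names, not part of the skill.

```python
def interpret_answer(answer, allow_review=False):
    """Map a raw reply to "y", "n", or "review" (hypothetical helper).

    Anything unrecognized, including an empty reply, is treated as "n"
    so that an ambiguous answer never triggers a deletion.
    """
    choice = answer.strip().lower()
    if choice in ("y", "yes"):
        return "y"
    if allow_review and choice == "review":
        return "review"
    return "n"
```

Defaulting to "n" matches the core principle of conservative cleanup: deletion needs a clear confirmation, cancellation does not.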
Delete confirmed files:
# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/
# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/
# Or use rm if user confirms
rm initial-search-results.json
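A plain `mv` into a trash folder silently overwrites any file of the same name that is already there. The following sketch shows one way to avoid that in Python. It is an assumption-laden illustration: `move_to_trash` is a hypothetical helper, the caller supplies the trash directory (for example one of the paths shown above), and on Linux it does not write the metadata files that desktop trash tools normally expect.

```python
import shutil
from pathlib import Path


def move_to_trash(file_path, trash_dir):
    """Move a file into trash_dir without overwriting an existing entry.

    Hypothetical helper. Returns the final location of the moved file.
    """
    source = Path(file_path)
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)

    # If the name is taken, append a counter before the suffix: name.1.json, name.2.json, ...
    target = trash / source.name
    counter = 1
    while target.exists():
        target = trash / f"{source.stem}.{counter}{source.suffix}"
        counter += 1

    shutil.move(str(source), str(target))
    return target
```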
Report results:
✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept
After cleanup, verify critical files:
# Check core files exist
test -f SUMMARY.md && echo "✓ SUMMARY.md"
test -f relevant-papers.json && echo "✓ relevant-papers.json"
test -f papers-reviewed.json && echo "✓ papers-reviewed.json"
test -d papers && echo "✓ papers/ directory"
# Verify JSON files are valid
jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
Report to user:
✅ Integrity check passed
- All core files present
- All JSON files valid
- All PDFs intact
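The same integrity check can be sketched in Python, which avoids depending on `jq` being installed. This is a minimal illustration under stated assumptions: `check_integrity` and `CORE_FILES` are hypothetical names, the file list mirrors the shell checks above, and the PDF test only looks for the `%PDF-` marker at the start of each file, so it catches truncated or empty files but does not validate the full document.

```python
import json
from pathlib import Path

# Mirrors the files tested in the shell snippet above.
CORE_FILES = ["SUMMARY.md", "relevant-papers.json", "papers-reviewed.json"]


def check_integrity(session_dir):
    """Return a list of problems found; an empty list means the check passed."""
    session = Path(session_dir)
    problems = []

    for name in CORE_FILES:
        path = session / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.suffix == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
                problems.append(f"invalid JSON: {name} ({exc})")

    papers = session / "papers"
    if not papers.is_dir():
        problems.append("missing: papers/ directory")
    else:
        for pdf in sorted(papers.glob("*.pdf")):
            with pdf.open("rb") as handle:
                if handle.read(5) != b"%PDF-":
                    problems.append(f"not a PDF: papers/{pdf.name}")

    return problems
```

Returning the list of problems, rather than a single pass or fail flag, lets the report to the user name exactly which file needs attention.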
If abstracts-cache.json is very large (>100 MB):
⚠️ abstracts-cache.json is 256 MB
This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)
Choice? (1/2/3)
If user chooses compress:
gzip abstracts-cache.json
# Creates abstracts-cache.json.gz
echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
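Because the cache is what makes re-screening possible, it may be worth confirming that the compressed copy decompresses to the same bytes before the original goes away. The sketch below illustrates that idea in Python; `compress_and_verify` is a hypothetical helper, and like `gzip` on the command line it removes the original once the `.gz` file has been verified.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def _sha256(stream):
    """Hash a binary stream in 1 MiB chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compress_and_verify(path):
    """Gzip a file, verify the round trip, then remove the original.

    Hypothetical helper. Returns the path of the .gz file.
    """
    source = Path(path)
    target = source.with_name(source.name + ".gz")

    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Compare checksums of the original and the decompressed copy.
    with source.open("rb") as src:
        original = _sha256(src)
    with gzip.open(target, "rb") as packed:
        restored = _sha256(packed)

    if original != restored:
        target.unlink()
        raise ValueError(f"verification failed for {source.name}; original kept")

    source.unlink()
    return target
```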
If user created helper scripts during research:
📝 Found helper scripts:
- screen_papers.py (created for batch screening)
- deep_dive_papers.py (created for data extraction)
These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section
Keep scripts? (y/n)
If cleaning up multiple sessions:
# Find all research sessions (-mindepth 1 excludes research-sessions/ itself)
find research-sessions/ -mindepth 1 -maxdepth 1 -type d
# For each session:
for session in research-sessions/*/; do
  echo "Analyzing: $session"
  # Run cleanup analysis
done
Ask user:
Found 5 completed research sessions.
Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean
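To build the list of sessions offered under "select", the session folders can be enumerated in Python. This sketch assumes that session directories follow the `YYYY-MM-DD-description` naming shown earlier; `find_sessions` and `SESSION_NAME` are hypothetical names, and it does not try to decide whether a session is actually complete.

```python
import re
from pathlib import Path

# Assumed naming convention: a date prefix followed by a description.
SESSION_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-.+")


def find_sessions(root):
    """Return session directories under root, sorted by name (oldest date first)."""
    base = Path(root)
    return sorted(
        entry
        for entry in base.iterdir()
        if entry.is_dir() and SESSION_NAME.match(entry.name)
    )
```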
Maintain hardcoded list of patterns to NEVER delete:
PROTECTED_PATTERNS = [
    'SUMMARY.md',
    'relevant-papers.json',
    'papers-reviewed.json',
    'papers/*.pdf',
    'papers/*.zip',
    'citations/citation-graph.json',
    'screening-criteria.json',
    'test-set.json',
    'abstracts-cache.json',
    'rubric-changelog.md',
    'README.md',
    'TOP_PRIORITY_PAPERS.md',
    'evaluated-papers.json',
    '*.py',       # Helper scripts
    '.claude/*',  # Project settings
]
Before deleting any file:
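from fnmatch import fnmatch


def is_protected(filepath):
    """Check if file matches any protected pattern"""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False


def refuse_if_protected(file_to_delete):
    """Return True (after printing an error) when the file must not be deleted."""
    # Never delete protected files
    if is_protected(file_to_delete):
        print(f"⚠️ ERROR: {file_to_delete} is protected and cannot be deleted")
        return True
    return False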
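One detail worth watching with pattern matching is the form of the path. `fnmatch` compares the whole string, so a path such as `./SUMMARY.md` (the form `find .` prints) does not match the pattern `SUMMARY.md`. The sketch below shows one way to normalize paths before matching. It assumes paths are relative to the session directory; the pattern list is copied from above, and `is_protected_normalized` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PROTECTED_PATTERNS = [
    "SUMMARY.md", "relevant-papers.json", "papers-reviewed.json",
    "papers/*.pdf", "papers/*.zip", "citations/citation-graph.json",
    "screening-criteria.json", "test-set.json", "abstracts-cache.json",
    "rubric-changelog.md", "README.md", "TOP_PRIORITY_PAPERS.md",
    "evaluated-papers.json", "*.py", ".claude/*",
]


def is_protected_normalized(path_in_session):
    """Match a session-relative path against the protected patterns.

    PurePosixPath drops a leading "./", so "./SUMMARY.md" and "SUMMARY.md"
    are treated the same. Absolute paths are NOT handled by this sketch.
    """
    relative = PurePosixPath(path_in_session).as_posix()
    return any(fnmatchcase(relative, pattern) for pattern in PROTECTED_PATTERNS)
```

Note that `*` in `fnmatch` patterns also matches `/`, so `*.py` protects helper scripts in subdirectories as well as at the top level.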
Always show what will be deleted before doing it:
# Dry run (show only, don't delete)
# is_safe_to_delete is assumed to be a shell function defined elsewhere
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
  if is_safe_to_delete "$file"; then
    echo "Would delete: $file ($(du -h "$file" | cut -f1))"
  fi
done
echo ""
echo "Proceed with actual deletion? (y/n)"
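A dry run can also be expressed in Python, which makes it easy to include the total space that would be recovered. This is an illustrative sketch only: `dry_run` and `format_size` are hypothetical helpers, and the candidate list is assumed to have been filtered against the protected patterns already. Nothing in this code deletes files.

```python
from pathlib import Path


def format_size(num_bytes):
    """Render a byte count as a short human-readable string (1 KB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.0f} {unit}"
        size /= 1024


def dry_run(candidates):
    """Return the lines of a dry-run report for the given candidate paths."""
    lines = ["DRY RUN - No files will be deleted"]
    total = 0
    for path in map(Path, candidates):
        if not path.is_file():
            continue  # skip anything that vanished or is not a regular file
        size = path.stat().st_size
        total += size
        lines.append(f"Would delete: {path} ({format_size(size)})")
    lines.append(f"Total space to recover: {format_size(total)}")
    return lines
```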
After answering-research-questions workflow:
Add to answering-research-questions Phase 8:
### Optional: Cleanup
After reviewing outputs, optionally clean up intermediate files:
"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."
If yes: Use `cleaning-up-research-sessions` skill
- Deleting papers-reviewed.json: This is the deduplication database - NEVER delete → Always protect with hardcoded list
- Deleting abstracts-cache.json: Needed for re-screening → Ask user, default to keep
- Deleting helper scripts: Important for reproducibility → Keep by default, ask if user wants to remove
- Not showing user what will be deleted: User needs to see the plan → Always show dry run first
- Deleting without confirmation: Too risky → Always ask for final confirmation
- Not verifying after cleanup: Files could be corrupted → Run integrity check after deletion
| Action | Command |
|--------|---------|
| List all files with sizes | find . -type f -exec ls -lh {} \; \| sort -rh |
| Check if JSON valid | jq empty file.json |
| Move to trash (macOS) | mv file ~/.Trash/ |
| Move to trash (Linux) | mv file ~/.local/share/Trash/files/ |
| Compress large file | gzip file |
| Calculate folder size | du -sh . |
Cleanup successful when:
User: "Can you clean up the research session folder?"
Agent: "I'll analyze the research session and show you what can be safely cleaned up."
[Analyzes folder]
Agent: "🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
✅ SUMMARY.md (45 KB) - Final findings
✅ relevant-papers.json (12 KB) - Filtered results
✅ papers-reviewed.json (28 KB) - Complete history
✅ papers/ (14 PDFs, 32 MB) - Downloaded papers
✅ citations/citation-graph.json (5 KB) - Citation network
✅ screening-criteria.json (2 KB) - Rubric definition
✅ abstracts-cache.json (156 KB) - Cached abstracts
Files that CAN be removed:
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
→ Data is preserved in papers-reviewed.json
🗑️ .DS_Store (6 KB) - macOS metadata
→ Not needed
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)"
User: "y"
Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
All protected files will be preserved. Confirm? (y/n)"
User: "y"
Agent: [Deletes files]
"✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Running integrity check...
✅ All core files intact
✅ All JSON files valid
✅ All 14 PDFs present
Research session is ready for archiving or sharing."
After cleanup: