skills/deep-research-reports/SKILL.md
Process deep research report outputs for scRNAseq cell type annotation and non-metazoan gene characterization. Use when the user invokes /process-deep-research or asks to clean, convert to PDF/HTML, parse, or compile deep research reports from ChatGPT or Claude. Handles artifact removal, PDF/HTML generation, YAML header extraction, and summary table maintenance. Supports three report types: cluster-level cell-type-annotation, family-level cell-type-annotation (WGCNA-module-organized), and nonmetazoan characterization. Detects report type automatically from query.report_type field. Works on both local macOS and HPC cluster (auto-detects quarto path and LaTeX availability).
Processes deep research reports (from ChatGPT, Claude, or other platforms) generated by the deep-research-genelist skill. Cleans platform-specific artifacts, generates publication-quality PDF and/or HTML, extracts the YAML header, and maintains a growing annotation summary table.
Accept one of:
- "all": glob outs/deep_research*/**/*.md (note the * after deep_research, which catches deep_research/, deep_research_platynereis/, etc.) and skip files ending in _clean.md, _raw.md, or _prompt.md
- "new": glob outs/deep_research*/**/*.md with the same exclusions, then keep only files that don't yet have a corresponding _clean.md

If no outs/deep_research*/ directories exist, tell the user and stop.
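The discovery rules above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: `find_reports` and `has_clean_sibling` are hypothetical names, and the rule used to decide whether a file already has a "corresponding" `_clean.md` (same directory, same stem plus `_clean.md`) is an assumption, since the source does not spell it out.

```python
from pathlib import Path

EXCLUDED_SUFFIXES = ("_clean.md", "_raw.md", "_prompt.md")


def has_clean_sibling(path):
    # ASSUMPTION: "corresponding" means <stem>_clean.md in the same directory.
    return path.with_name(path.stem + "_clean.md").exists()


def find_reports(root, mode="all"):
    """Return candidate report paths, or None if no outs/deep_research*/ dir exists."""
    root = Path(root)
    dirs = sorted(d for d in root.glob("outs/deep_research*") if d.is_dir())
    if not dirs:
        return None  # caller tells the user and stops
    candidates = []
    for d in dirs:
        for md in sorted(d.rglob("*.md")):
            # Skip already-processed outputs and shared prompt files.
            if md.name.endswith(EXCLUDED_SUFFIXES):
                continue
            candidates.append(md)
    if mode == "new":
        candidates = [p for p in candidates if not has_clean_sibling(p)]
    return candidates
```

Returning `None` rather than an empty list keeps "no directory at all" distinct from "directory exists but nothing to process".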
Each module gets its own subdirectory under outs/deep_research/:
outs/deep_research/
├── annotation_summary.tsv                     ← cross-cluster summary (top level)
├── clade6sub25/                               ← per-module subdirectory
│   ├── clade6sub25_prompt.md                  ← shared prompt (no date/platform prefix)
│   ├── 260304_chatgpt_clade6sub25_raw.md
│   ├── 260304_chatgpt_clade6sub25_clean.md
│   ├── 260304_chatgpt_clade6sub25_report.pdf
│   ├── 260304_chatgpt_clade6sub25_report.html
│   ├── 260304_claude_clade6sub25_clean.md
│   ├── 260304_claude_clade6sub25_report.pdf
│   └── 260304_claude_clade6sub25_report.html
└── another_cluster/
    └── ...
Create the subdirectory outs/deep_research/{module_id}/ if it doesn't exist. All output files for that module go inside it.
Read the file as raw bytes. Detect the platform from artifact signatures:
| Signal | Platform |
|--------|----------|
| PUA chars (U+E000–U+F8FF) present | chatgpt (Deep Research mode) |
| entity[ or citeturn in text after PUA stripping | chatgpt (Deep Research mode) |
| None of the above | Ask user (see below) |
Artifact-free reports (no PUA, no entity tags) could be from Claude OR ChatGPT Pro (extended thinking). These two platforms produce clean output with no artifacts. When no artifacts are detected, ask the user which platform generated the report using AskUserQuestion:
- chatgpt_pro: ChatGPT Pro / extended thinking / o-series models
- claude: Claude (Anthropic)

This matters for the canonical filename (260330_chatgpt_pro_family_C vs 260330_claude_family_C) and for the summary table platform column.
Report detection to user: "Detected {platform} report." (or "No artifacts detected — asking platform.")
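The signature table above can be sketched as a small detector. This is an illustrative sketch only: `detect_platform` is a hypothetical name, and decoding the raw bytes as UTF-8 is an assumption about the input files.

```python
import re

PUA_RE = re.compile(r'[\ue000-\uf8ff]')


def detect_platform(raw_bytes):
    """Return 'chatgpt' if Deep Research artifacts are present, else None."""
    # ASSUMPTION: reports are UTF-8; undecodable bytes become U+FFFD,
    # which lies outside the PUA range and so never counts as an artifact.
    text = raw_bytes.decode('utf-8', errors='replace')
    n_pua = len(PUA_RE.findall(text))
    stripped = PUA_RE.sub('', text)  # tags only become visible after PUA stripping
    if n_pua or 'entity[' in stripped or 'citeturn' in stripped:
        return 'chatgpt'
    return None  # artifact-free: ask the user (chatgpt_pro or claude)
```

A `None` result is the cue to ask the user which platform produced the report, since Claude and ChatGPT Pro output cannot be told apart from artifacts alone.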
Before cleaning, extract the YAML front matter (between first --- line and next --- line) to get module_id and date_generated. These are needed for the canonical filename.
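A minimal sketch of the front matter extraction described here, assuming the header is delimited by lines consisting only of `---`. The function name `split_front_matter` is hypothetical.

```python
def split_front_matter(text):
    """Return (header, body); header is None when no --- ... --- block is found."""
    lines = text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == '---')
        end = next(i for i in range(start + 1, len(lines))
                   if lines[i].strip() == '---')
    except StopIteration:
        return None, text
    header = '\n'.join(lines[start + 1:end])
    body = '\n'.join(lines[end + 1:])
    return header, body
```

The header string is what later gets passed to the YAML parser; the body is kept separately for rendering.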
Pattern depends on report type:
- Cluster reports: YYMMDD_platform_moduleID (e.g., 260304_chatgpt_clade6sub25)
- Family reports: YYMMDD_platform_family_FAMILYID (e.g., 260330_claude_family_C)
- Nonmetazoan reports: YYMMDD_platform_kingdom (e.g., 260315_chatgpt_prokaryote)

Components:
- YYMMDD from query.date_generated (format: YYYY-MM-DD → YYMMDD). Fall back to today's date.
- platform: chatgpt, chatgpt_pro, or claude (lowercase)
- moduleID: from query.module_id (cluster reports)
- FAMILYID: from query.family (family reports; note: NOT query.module_id)
- kingdom: from query.kingdom (nonmetazoan reports)

ChatGPT reports (have artifacts):
Cleaning order is critical — PUA characters must be stripped first or other patterns won't match.
import re
# 1. Strip all PUA characters (U+E000–U+F8FF) — MUST be first
text = re.sub(r'[\ue000-\uf8ff]', '', text)
# 2. Replace entity["type","name","desc"] → name
text = re.sub(r'entity\["[^"]*","([^"]*)","[^"]*"\]', r'\1', text)
# 3. Remove citeturn... strings (the [N] refs are already inline)
text = re.sub(r'\s*citeturn\S+', '', text)
# 4. Remove image_group{...} lines
text = re.sub(r'^.*image_group\{.*\}.*$', '', text, flags=re.MULTILINE)
# 5. Normalize whitespace
text = re.sub(r' +', ' ', text) # collapse double spaces
text = re.sub(r'\n{3,}', '\n\n', text) # collapse triple+ newlines
text = re.sub(r' +$', '', text, flags=re.MULTILINE) # strip trailing whitespace
Output files:
- {base}_raw.md (preserves original with artifacts for reference)
- {base}_clean.md

Report artifact counts: "Removed X entity tags, Y citation markers, Z PUA characters."
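One way to obtain the artifact counts for that message is to run the same substitutions with `re.subn`, which returns the number of replacements alongside the new text. This is a sketch: `clean_chatgpt` is a hypothetical wrapper, and it covers only the three counted patterns, not the image_group or whitespace steps.

```python
import re


def clean_chatgpt(text):
    """Apply the counted cleaning steps in order and build the report message."""
    counts = {}
    # PUA stripping must run first, or the later patterns will not match.
    text, counts['pua'] = re.subn(r'[\ue000-\uf8ff]', '', text)
    text, counts['entity'] = re.subn(
        r'entity\["[^"]*","([^"]*)","[^"]*"\]', r'\1', text)
    text, counts['cite'] = re.subn(r'\s*citeturn\S+', '', text)
    message = (f"Removed {counts['entity']} entity tags, "
               f"{counts['cite']} citation markers, "
               f"{counts['pua']} PUA characters.")
    return text, counts, message
```

Note that `\S+` in the citation pattern also consumes punctuation attached directly to a marker, which is the behavior of the original pattern.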
Claude reports (already clean):
Claude deep research outputs have no known artifacts (verified: zero PUA chars, zero entity tags, zero citeturn markers).
- {base}_clean.md (no separate _raw.md since there are no artifacts to preserve)

Report: "Claude report — no artifacts detected. Renamed to {base}_clean.md."
Parse the YAML front matter from the clean file using Python yaml.safe_load().
Report type detection: Check query.report_type to determine which parser to use:
- "cluster" (or absent) → cluster cell-type-annotation parser
- "family" → family cell-type-annotation parser (see Step 3c below)
- "nonmetazoan_characterization" → nonmetazoan parser (see Step 3b below)

ChatGPT YAML indentation fix: ChatGPT inconsistently outputs YAML with 1-space or 2-space indentation across runs. 1-space indent is valid YAML for simple key-value mappings, but breaks for two specific constructs:
- Folded block scalars (summary: >): the continuation text lands at the same indent as sibling keys, so the parser can't distinguish block scalar content from a new key
- List item mappings (- cell_type: "..." followed by organism: "..."): continuation keys land at the same indent as the - marker, so they're parsed as siblings rather than part of the list item's mapping

Important: Do NOT "double all indentation"; this breaks normal mapping keys by making children indent past their siblings. Instead, apply a targeted fix that only touches the two broken patterns:
The fix: try yaml.safe_load() first. If it fails, apply fix_flat_yaml() (for zero-indent ChatGPT Pro output), then fix_chatgpt_yaml() (for 1-space indent issues), then retry.
fix_flat_yaml: zero-indent ChatGPT Pro / extended thinking output

ChatGPT Pro with extended thinking outputs YAML with ALL keys at column 0 (zero indent under section headers). This fixer detects and corrects this pattern before the existing fix_chatgpt_yaml runs.
import re

def fix_flat_yaml(yaml_text):
    """Fix YAML where all keys are at column 0 (ChatGPT Pro / extended thinking).
    Detects this pattern: top-level section keys (query:, annotation:, markers:)
    followed by child keys at the same indent level. Adds proper indent to all
    child keys under each section. Distinguishes mapping list items ("- key: val"
    → continuations get 6-space indent) from simple list items ('- "value"' →
    next sibling key resets to 2-space).
    """
    lines = yaml_text.split('\n')
    # Quick check: if we see 'query:' at col 0 followed by 'organism:' at col 0,
    # this is a flat YAML that needs fixing
    has_flat_pattern = False
    for i, line in enumerate(lines):
        if line.strip() == 'query:' and i + 1 < len(lines):
            next_line = lines[i + 1]
            if next_line and not next_line.startswith(' ') and ':' in next_line:
                has_flat_pattern = True
                break
    if not has_flat_pattern:
        return yaml_text  # Already properly indented
    fixed = []
    section = None
    section_keys = {'query', 'annotation', 'markers', 'classification',
                    'functional_categories', 'hgt_candidates', 'expression_patterns',
                    'biology'}
    in_block_scalar = False
    in_mapping_list = False  # inside "- key: val" list item (continuations need 6-space)
    in_simple_list = False   # inside "- value" list (next sibling key resets to 2-space)
    for line in lines:
        stripped = line.strip()
        n_spaces = len(line) - len(line.lstrip())
        if not stripped:
            # A blank line ends any block scalar or list context
            in_block_scalar = False
            in_mapping_list = False
            in_simple_list = False
            fixed.append('')
            continue
        # Block scalar continuation (4-space indent inferred: deeper than the 2-space key)
        if in_block_scalar and n_spaces == 0 and not any(
            stripped.startswith(k + ':') for k in section_keys
        ):
            fixed.append(' ' * 4 + stripped)
            continue
        elif in_block_scalar:
            in_block_scalar = False
        # Top-level section key
        key_candidate = stripped.split(':')[0].strip()
        if n_spaces == 0 and key_candidate in section_keys and stripped.endswith(':'):
            section = key_candidate
            in_mapping_list = False
            in_simple_list = False
            fixed.append(line)
            continue
        # Detect block scalar start
        if stripped.endswith(': >') or stripped.endswith(': |'):
            in_block_scalar = True
        # Child content under a section at col 0
        if section and n_spaces == 0:
            if stripped.startswith('---'):
                section = None
                in_mapping_list = False
                in_simple_list = False
                fixed.append(line)
                continue
            if key_candidate in section_keys:
                section = key_candidate
                in_mapping_list = False
                in_simple_list = False
                fixed.append(line)
                continue
            # List items (4-space indent inferred from the 6-space continuation rule)
            if stripped.startswith('- '):
                after_dash = stripped[2:].strip()
                if re.match(r'[\w_-]+\s*:', after_dash):
                    in_mapping_list = True
                    in_simple_list = False
                else:
                    in_simple_list = True
                    in_mapping_list = False
                fixed.append(' ' * 4 + stripped)
                continue
            # Continuation key inside a mapping list item
            if in_mapping_list and re.match(r'[\w_-]+\s*:', stripped):
                fixed.append(' ' * 6 + stripped)
                continue
            # Regular key (not inside a list, or after simple list ended)
            in_mapping_list = False
            in_simple_list = False
            fixed.append(' ' * 2 + stripped)
            continue
        # Already indented: add 2 more spaces for section nesting
        if section and n_spaces > 0:
            fixed.append(' ' * 2 + line)
            continue
        fixed.append(line)
    return '\n'.join(fixed)
fix_chatgpt_yaml: 1-space indent issues

This existing fixer handles the two specific constructs broken by 1-space indentation:
- After a key ending in : > or : |, adds 1 extra space to continuation lines (non-key, non-blank lines at the same indent) until a blank line or a real key is encountered
- After - key: value (list item mapping start), adds 2 extra spaces to subsequent non-- key lines at the same indent (these are continuation keys inside the mapping)

import re

def fix_chatgpt_yaml(yaml_text):
    """Fix ChatGPT 1-space indent YAML: folded block scalars + list item mapping continuations."""
    lines = yaml_text.split('\n')
    fixed = []
    in_block_scalar = False
    block_scalar_key_indent = -1
    in_list_item = False
    list_item_indent = -1
    for i, line in enumerate(lines):
        stripped = line.lstrip(' ')
        n_spaces = len(line) - len(stripped)
        if stripped == '':
            # A blank line ends a block scalar
            in_block_scalar = False
            fixed.append(line)
            continue
        # FIX 1: Folded/literal block scalar continuation
        if in_block_scalar:
            if (n_spaces == block_scalar_key_indent
                    and not re.match(r'[\w-]+\s*:', stripped)
                    and not stripped.startswith('- ')):
                fixed.append(' ' * (n_spaces + 1) + stripped)
                continue
            else:
                in_block_scalar = False
        # FIX 2: List item mapping continuation
        if in_list_item:
            if (n_spaces == list_item_indent
                    and re.match(r'[\w_-]+\s*:', stripped)
                    and not stripped.startswith('- ')):
                fixed.append(' ' * (n_spaces + 2) + stripped)
                continue
            elif n_spaces <= list_item_indent and not (
                    n_spaces == list_item_indent and stripped.startswith('- ')):
                in_list_item = False
        # Track the start of a block scalar or a list item mapping
        if re.search(r':\s*[>|]\s*$', stripped):
            in_block_scalar = True
            block_scalar_key_indent = n_spaces
        if re.match(r'- [\w_-]+\s*:', stripped):
            in_list_item = True
            list_item_indent = n_spaces
        fixed.append(line)
    return '\n'.join(fixed)
This handles both 1-space and 2-space ChatGPT output without breaking valid YAML.
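The try, fix, retry sequence described above can be sketched as a small helper. This is an illustrative sketch, not the skill's code: `parse_with_fallbacks` is a hypothetical name, and the parser and fixers are passed in as callables so the example stays runnable without PyYAML installed.

```python
def parse_with_fallbacks(text, parse, fixers, errors=(Exception,)):
    """Try parse(text); on failure apply every fixer in order, then retry once."""
    try:
        return parse(text)
    except errors:
        for fix in fixers:
            text = fix(text)
        # A second failure propagates to the caller, who reports it to the user.
        return parse(text)


# Intended use with the fixers from this section (requires PyYAML):
#   import yaml
#   data = parse_with_fallbacks(header, yaml.safe_load,
#                               [fix_flat_yaml, fix_chatgpt_yaml],
#                               errors=(yaml.YAMLError,))
```

The fixers run only after the first parse fails, so valid 2-space output is never rewritten.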
Validate required fields:
Required:
- query.module_id, query.organism, query.n_genes
- annotation.proposed_name, annotation.confidence, annotation.one_line
- markers.top_diagnostic (non-empty list)

Optional but checked:
- query.source_object, query.clustering_column, query.marker_file: warn if missing (these are provenance fields added in v2 of the genelist skill; older prompts won't have them)
- query.comparison_mode, query.clade_family: present when merged marker mode was used; absent for single-list prompts or older prompts. Default to empty string in summary table.
- query.report_type: "cluster" (default) or "family". Family reports are generated by the family-aware mode of the genelist skill.
- query.member_clusters: list of fine cluster names (family reports only). Default to empty list.
- query.n_member_clusters: integer count (family reports only). Default to 0.

If validation fails, report which fields are missing and ask whether to proceed.
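A minimal sketch of the cluster-report validation, assuming the parsed YAML is a nested dict. `get_path` and `validate_cluster` are hypothetical helper names; the field lists are taken from the required and warn-if-missing items above.

```python
REQUIRED = [
    'query.module_id', 'query.organism', 'query.n_genes',
    'annotation.proposed_name', 'annotation.confidence', 'annotation.one_line',
    'markers.top_diagnostic',
]
WARN_IF_MISSING = [
    'query.source_object', 'query.clustering_column', 'query.marker_file',
]


def get_path(data, dotted):
    """Follow a dotted path through nested dicts; return None if any step is absent."""
    cur = data
    for key in dotted.split('.'):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def validate_cluster(data):
    """Return (missing required fields, provenance fields to warn about)."""
    # None, '' and [] count as missing; a numeric 0 does not.
    missing = [p for p in REQUIRED if get_path(data, p) in (None, '', [])]
    warnings = [p for p in WARN_IF_MISSING if get_path(data, p) is None]
    return missing, warnings
```

A non-empty `missing` list is what gets reported to the user before asking whether to proceed.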
When query.report_type == "nonmetazoan_characterization", use this validation instead:
Detect template variant from which classification fields are present:
- classification.n_candidate_hgt → prokaryote variant (Bacteria, Archaea, combined)
- classification.n_database_bias → eukaryote variant (Fungi, Viridiplantae, etc.)

Required fields (both variants):
- query.kingdom, query.organism, query.n_genes, query.n_expressed
- classification.assessment_confidence
- functional_categories

Prokaryote-specific required:
- classification.n_symbiont_transcript, classification.n_candidate_hgt, classification.n_conserved, classification.n_ambiguous (can be 0)

Eukaryote-specific required:
- classification.n_database_bias, classification.n_conserved, classification.n_symbiont_transcript, classification.n_lateral_transfer, classification.n_ambiguous (can be 0)

Optional but checked:
- hgt_candidates: warn if absent for prokaryote reports
- expression_patterns.cell_type_enriched_genes: warn if empty
- biology.symbiosis_relevant: prokaryote only
- biology.evolutionary_insight: eukaryote only

Canonical base name for nonmetazoan reports:
Pattern: YYMMDD_platform_kingdom (e.g., 260315_chatgpt_prokaryote, 260315_claude_fungi)
- kingdom from query.kingdom, lowercased, spaces replaced with underscores

Directory: Reports go in outs/deep_research/nonmetazoan/{kingdom}/ (note: under deep_research/nonmetazoan/, not the script-04 outs/scmicrobiome/ directory; the processed reports live alongside cell-type-annotation reports).
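The variant detection and the variant-specific required fields for nonmetazoan reports can be sketched as below. The function names are hypothetical, and checking n_candidate_hgt before n_database_bias when both are present is an assumption. Presence is tested with `in` rather than truthiness because the counts are allowed to be 0.

```python
VARIANT_REQUIRED = {
    'prokaryote': ['n_symbiont_transcript', 'n_candidate_hgt',
                   'n_conserved', 'n_ambiguous'],
    'eukaryote': ['n_database_bias', 'n_conserved', 'n_symbiont_transcript',
                  'n_lateral_transfer', 'n_ambiguous'],
}


def detect_variant(classification):
    """Infer the template variant from which classification fields are present."""
    if 'n_candidate_hgt' in classification:
        return 'prokaryote'
    if 'n_database_bias' in classification:
        return 'eukaryote'
    return None


def missing_classification_fields(classification):
    """Return (variant, list of required classification keys that are absent)."""
    variant = detect_variant(classification)
    required = ['assessment_confidence'] + VARIANT_REQUIRED.get(variant, [])
    return variant, [k for k in required if k not in classification]
```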
When query.report_type == "family", use this validation:
Required:
- query.family, query.organism, query.n_modules or query.n_member_clusters
- annotation.proposed_family_name, annotation.confidence, annotation.one_line
- annotation.per_cluster (non-empty list with at least cluster and proposed_name per entry)

Optional but checked:
- query.subfamilies: subfamily structure. Default empty.
- annotation.best_matches: comparative matches. Default empty list.
- annotation.family_conservation: conservation level. Default empty.
- markers.family_defining: family marker list. Warn if empty.
- markers.family_specifying_tfs, markers.cluster_specifying_tfs: TF annotations.

Canonical base name: YYMMDD_platform_family_FAMILYID
Directory: Reports go in the same directory as the input file (family reports already live in their own subdirectory, e.g., outs/deep_research_platynereis/family_C/).
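The canonical base-name rules for the three report types can be sketched as one function. This is a sketch under stated assumptions: `canonical_base` and `yymmdd` are hypothetical names, and the input is the parsed `query` mapping. The date helper accepts both strings and date objects because `yaml.safe_load` turns an unquoted YYYY-MM-DD value into a `datetime.date`.

```python
from datetime import date, datetime


def yymmdd(date_generated):
    """Convert YYYY-MM-DD (string or date object) to YYMMDD; fall back to today."""
    if isinstance(date_generated, date):  # also covers datetime
        return date_generated.strftime('%y%m%d')
    try:
        return datetime.strptime(str(date_generated), '%Y-%m-%d').strftime('%y%m%d')
    except ValueError:
        return date.today().strftime('%y%m%d')


def canonical_base(query, platform):
    """Build YYMMDD_platform_<id> according to query.report_type."""
    report_type = query.get('report_type') or 'cluster'
    prefix = f"{yymmdd(query.get('date_generated'))}_{platform.lower()}"
    if report_type == 'family':
        return f"{prefix}_family_{query['family']}"  # query.family, NOT module_id
    if report_type == 'nonmetazoan_characterization':
        kingdom = str(query['kingdom']).lower().replace(' ', '_')
        return f"{prefix}_{kingdom}"
    return f"{prefix}_{query['module_id']}"
```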
Use AskUserQuestion:
Create a temporary _report_for_render.md from {base}_clean.md:
Strip the data YAML header — the large query:/annotation:/markers: (or classification:/functional_categories:) block is for programmatic parsing only, not for display
Insert a pandoc formatting YAML. The title format depends on report type:
- Cluster reports: "Gene Module Interpretation: *{organism}* {module_id}"
- Family reports: "Cell Type Family Report: *{organism}* Family {family}"
- Nonmetazoan reports: "Non-Metazoan Gene Characterization: *{organism}* — {kingdom}"

---
title: "<title per report type>"
subtitle: "Deep Research Report — {Platform} ({date_generated})"
geometry: margin=1in
fontsize: 11pt
header-includes:
- \renewcommand{\arraystretch}{1.4}
- \usepackage{booktabs}
- \usepackage{longtable}
---
The \arraystretch{1.4} adds 40% vertical padding to every table row — critical for readability.
Remove the H1 title from the body (it's now in the YAML title: field)
Normalize heading levels — ChatGPT reports nest headings too deep, producing tiny headers at H4/H5 in PDFs. Apply these fixes in order:
a. Remove redundant heading pairs. ChatGPT often outputs a generic ## heading immediately followed by the lettered ### section (e.g., ## Comparative biological analysis then ### H. Comparative Biological Analysis). Remove the redundant ## line:
body = re.sub(r'^##\s+[^\n]+\n\n(###\s+[A-L]\.)', r'\1', body, flags=re.MULTILINE)
b. Shift all headings up one level. Since the H1 title was removed, the remaining hierarchy is too deep (##→####). Shift ##→#, ###→##, ####→###, etc. (never shift H1):
def shift_heading(m):
    hashes = m.group(1)
    rest = m.group(2)
    if len(hashes) > 1:
        return '#' * (len(hashes) - 1) + rest
    return m.group(0)

body = re.sub(r'^(#{2,})([ \t].*)$', shift_heading, body, flags=re.MULTILINE)
After normalization, the typical hierarchy is: # (sections A–K), ## (subsections like "Core identity modules"), ### (sub-subsections).
Escape backslashes in gene names for LaTeX. ChatGPT reports sometimes include gene names with literal backslashes (e.g., Dmel\cg5579), which LaTeX interprets as control sequences. Escape them in the body only:
body = re.sub(r'\\(?=[a-zA-Z])', r'\\\\', body)
Detect rendering tools: Find quarto on PATH or common locations (~/.local/bin/quarto, /usr/local/bin/quarto). Check for xelatex availability — if not on PATH, try module load texlive (HPC cluster). If no LaTeX available, skip PDF and inform user. On cluster, shell commands for PDF rendering must include module load texlive && prefix.
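A sketch of the tool detection using `shutil.which`, with the fallback locations named above. `find_quarto` and `has_xelatex` are hypothetical names. The `module load texlive` step is not covered here: on most clusters it is a shell function, so it has to be part of the shell command itself rather than a Python call.

```python
import shutil
from pathlib import Path


def find_quarto(extra_paths=("~/.local/bin/quarto", "/usr/local/bin/quarto")):
    """Return a path to quarto from PATH or the common locations, else None."""
    found = shutil.which("quarto")
    if found:
        return found
    for candidate in extra_paths:
        path = Path(candidate).expanduser()
        if path.is_file():
            return str(path)
    return None


def has_xelatex():
    """True if xelatex is on PATH in the current environment."""
    return shutil.which("xelatex") is not None
```

If `has_xelatex()` is false and loading the texlive module does not help, PDF generation is skipped and only HTML is produced.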
Important: When writing the pandoc YAML header from Python, write the LaTeX commands directly to the file (not through Python f-strings). The backslashes in \renewcommand, \arraystretch, \usepackage must appear literally in the markdown file — do NOT double-escape them in Python. Use f.write() with raw strings or explicit line writes.
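One way to keep the backslashes literal is to hold the LaTeX lines as raw strings and interpolate only the title and subtitle. This is a minimal sketch: `pandoc_header` is a hypothetical helper, and the 2-space indent on the header-includes items is a formatting choice, not something the source specifies.

```python
def pandoc_header(title, subtitle):
    """Build the pandoc formatting YAML with literal LaTeX backslashes."""
    lines = [
        "---",
        f'title: "{title}"',
        f'subtitle: "{subtitle}"',
        "geometry: margin=1in",
        "fontsize: 11pt",
        "header-includes:",
        # Raw strings: each command is written with a single backslash.
        r"  - \renewcommand{\arraystretch}{1.4}",
        r"  - \usepackage{booktabs}",
        r"  - \usepackage{longtable}",
        "---",
        "",
    ]
    return "\n".join(lines)
```

The returned string can be written with `f.write()` ahead of the report body in `_report_for_render.md`.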
Generate PDF:
{quarto_path} pandoc \
"{base}_report_for_render.md" \
-o "{base}_report.pdf" \
--pdf-engine=xelatex \
-V colorlinks=true -V linkcolor=blue -V urlcolor=blue
On cluster, prepend module load texlive && before the quarto command.
Generate HTML:
{quarto_path} pandoc \
"{base}_report_for_render.md" \
-o "{base}_report.html" \
--standalone \
--css="$HOME/.claude/skills/deep-research-reports/templates/report-style.css" \
--embed-resources
The --embed-resources flag inlines the CSS so the HTML is fully self-contained.
Delete _report_for_render.md after successful conversion.
If pandoc fails (e.g., LaTeX error), fall back to HTML only and notify the user.
Extract fields from validated YAML and update outs/deep_research/annotation_summary.tsv.
Composite key: module_id + platform + date_generated. This allows the table to hold both ChatGPT and Claude annotations for the same cluster side by side.
Logic: read and write the table as tab-separated text (csv with delimiter='\t').
Columns: parse ALL YAML fields into the summary table:
module_id, organism, common_name, clade, dataset, module_type, report_type, source_object, clustering_column, marker_file, comparison_mode, clade_family, member_clusters, n_member_clusters, biological_context, n_genes, proposed_name, alternative_names, confidence, confidence_rationale, one_line, summary, cell_type_family, family_conservation, top_tfs, top_markers, top_diagnostic_ids, top_diagnostic_roles, receptors_channels, signaling_ligands, adhesion_molecules, secreted_products, key_pathways, metabolic_signature, n_uncharacterized, best_match_1, best_match_2, best_match_3, date_generated, platform, report_file, date_processed
Field mapping from YAML:
| Summary column | YAML source | Format |
|---------------|-------------|--------|
| module_id | query.module_id | verbatim |
| organism | query.organism | verbatim |
| common_name | query.common_name | verbatim |
| clade | query.clade | verbatim |
| dataset | query.dataset | verbatim |
| module_type | query.module_type | verbatim |
| report_type | query.report_type | verbatim; default "cluster" |
| source_object | query.source_object | verbatim |
| clustering_column | query.clustering_column | verbatim |
| marker_file | query.marker_file | verbatim |
| comparison_mode | query.comparison_mode | verbatim; default empty |
| clade_family | query.clade_family | verbatim; default empty |
| member_clusters | query.member_clusters | join with ; ; default empty (family reports only) |
| n_member_clusters | query.n_member_clusters | integer; default 0 (family reports only) |
| biological_context | query.biological_context | verbatim |
| n_genes | query.n_genes | integer |
| proposed_name | annotation.proposed_name | verbatim |
| alternative_names | annotation.alternative_names | join with ; |
| confidence | annotation.confidence | verbatim |
| confidence_rationale | annotation.confidence_rationale | verbatim |
| one_line | annotation.one_line | verbatim |
| summary | annotation.summary | verbatim (may be multi-line — flatten to single line) |
| cell_type_family | annotation.cell_type_family | verbatim |
| family_conservation | annotation.family_conservation | verbatim |
| top_tfs | markers.transcription_factors | join with ; |
| top_markers | markers.top_diagnostic[].name | top 5, join with ; |
| top_diagnostic_ids | markers.top_diagnostic[].gene_id | top 5, join with ; |
| top_diagnostic_roles | markers.top_diagnostic[].role | top 5, join with ; |
| receptors_channels | markers.receptors_channels | join with ; |
| signaling_ligands | markers.signaling_ligands | join with ; |
| adhesion_molecules | markers.adhesion_molecules | join with ; |
| secreted_products | markers.secreted_products | join with ; |
| key_pathways | markers.key_pathways | join with ; |
| metabolic_signature | markers.metabolic_signature | verbatim |
| n_uncharacterized | markers.n_uncharacterized_notable | integer |
| best_match_1 | annotation.best_matches[0] | cell_type (organism) [conservation] |
| best_match_2 | annotation.best_matches[1] | cell_type (organism) [conservation] |
| best_match_3 | annotation.best_matches[2] | cell_type (organism) [conservation] |
| date_generated | query.date_generated | verbatim |
| platform | detected platform | chatgpt, chatgpt_pro, or claude |
| report_file | computed path | {base}_clean.md |
| date_processed | current date | YYYY-MM-DD |
Fields may be absent in older reports — default to empty string.
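
The Format column of the mapping table can be sketched as a few small helpers. These names are hypothetical, the `"; "` separator (semicolon plus space) is an assumption about the intended join string, and the `conservation` key inside each best_matches entry is assumed from the `[conservation]` placeholder rather than confirmed by the source.

```python
def join_list(value, sep="; "):
    """Join a list for one TSV cell; absent or empty values become ''."""
    if not value:
        return ""
    if isinstance(value, (list, tuple)):
        return sep.join(str(v) for v in value)
    return str(value)


def flatten(text):
    """Collapse a multi-line summary into a single line."""
    return " ".join(str(text or "").split())


def best_match(matches, i):
    """Format best_matches[i] as 'cell_type (organism) [conservation]'."""
    if not matches or i >= len(matches):
        return ""
    m = matches[i]
    return (f"{m.get('cell_type', '')} ({m.get('organism', '')}) "
            f"[{m.get('conservation', '')}]")


def top_diagnostic(markers, field, n=5):
    """Join one field (name, gene_id, or role) over the top n diagnostic markers."""
    entries = (markers.get("top_diagnostic") or [])[:n]
    return join_list([e.get(field, "") for e in entries])
```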
Family report field mappings: For family reports, the summary table uses the same annotation_summary.tsv but with these field mappings:
| Summary column | Family YAML source |
|---|---|
| module_id | query.family (prefixed with family_, e.g., family_C) |
| proposed_name | annotation.proposed_family_name |
| member_clusters | query.member_clusters joined with ; |
| n_member_clusters | query.n_member_clusters |
All other columns map the same as cluster reports. The per_cluster annotations are NOT added to the summary table row — they are available in the clean markdown file for detailed parsing.
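A sketch of updating the summary table keyed on module_id + platform + date_generated, using the standard csv module with a tab delimiter. `upsert_row` is a hypothetical name, and replacing an existing row when the composite key matches (instead of appending a duplicate) is an assumption about the intended behavior.

```python
import csv
from pathlib import Path

KEY_FIELDS = ("module_id", "platform", "date_generated")


def upsert_row(tsv_path, row, columns, key_fields=KEY_FIELDS):
    """Add row to the TSV, or replace the row with the same composite key."""
    path = Path(tsv_path)
    rows = []
    if path.exists():
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.DictReader(fh, delimiter="\t"))
    key = tuple(str(row.get(k, "")) for k in key_fields)
    replaced = False
    for i, existing in enumerate(rows):
        if tuple(existing.get(k) or "" for k in key_fields) == key:
            rows[i] = row  # ASSUMPTION: a matching key means re-processing
            replaced = True
    if not replaced:
        rows.append(row)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t",
                                restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return "updated" if replaced else "added"
```

Because platform is part of the key, a ChatGPT row and a Claude row for the same module are kept side by side.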
For nonmetazoan reports (report_type == "nonmetazoan_characterization"), update a separate summary table at outs/deep_research/nonmetazoan_summary.tsv. Do NOT mix with annotation_summary.tsv — the column semantics are fundamentally different.
Composite key: kingdom + platform + date_generated
Columns:
| Column | Source | Format |
|--------|--------|--------|
| kingdom | query.kingdom | verbatim |
| phylum | query.phylum | verbatim |
| organism | query.organism | verbatim |
| common_name | query.common_name | verbatim |
| n_genes | query.n_genes | int |
| n_expressed | query.n_expressed | int |
| template_variant | detected from fields | "prokaryote" or "eukaryote" |
| n_symbiont_transcript | classification.n_symbiont_transcript | int; 0 if absent |
| n_candidate_hgt | classification.n_candidate_hgt | int; 0 if absent |
| n_database_bias | classification.n_database_bias | int; 0 if absent |
| n_lateral_transfer | classification.n_lateral_transfer | int; 0 if absent |
| n_conserved | classification.n_conserved | int |
| n_ambiguous | classification.n_ambiguous | int |
| assessment_confidence | classification.assessment_confidence | str |
| confidence_rationale | classification.confidence_rationale | str |
| n_functional_categories | len(functional_categories) | int |
| top_categories | top 5 category names by n_genes | join "; " |
| top_category_origins | matching likely_origin for top 5 | join "; " |
| symbiosis_relevant | biology.symbiosis_relevant | bool; empty for eukaryote |
| evolutionary_insight | biology.evolutionary_insight | str; empty for prokaryote |
| key_functions | biology.key_functions | join "; " |
| recommended_followup | biology.recommended_followup | join "; " |
| n_hgt_candidates | len(hgt_candidates.top_candidates) | int; 0 for eukaryote |
| top_hgt_genes | top 3 candidate gene_ids | join "; " |
| n_cell_type_enriched | len(expression_patterns.cell_type_enriched_genes) | int |
| top_enriched_genes | top 5 gene_id values | join "; " |
| notable_associations | expression_patterns.notable_associations | join "; " |
| date_generated | query.date_generated | str |
| platform | detected | "chatgpt", "chatgpt_pro", or "claude" |
| report_file | computed path | str |
| date_processed | current date | YYYY-MM-DD |
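
The derived category columns (n_functional_categories, top_categories, top_category_origins) can be sketched as below. `top_categories` is a hypothetical function name, and the `name` key for a category's label is an assumption; `n_genes` and `likely_origin` are the field names used in the table above.

```python
def top_categories(functional_categories, n=5):
    """Return (category count, top-n names by n_genes, their likely_origin values)."""
    ranked = sorted(functional_categories,
                    key=lambda c: c.get("n_genes", 0), reverse=True)[:n]
    # ASSUMPTION: each category entry stores its label under "name".
    names = "; ".join(str(c.get("name", "")) for c in ranked)
    origins = "; ".join(str(c.get("likely_origin", "")) for c in ranked)
    return len(functional_categories), names, origins
```

Sorting once and slicing keeps the names and origins aligned, so the i-th origin always belongs to the i-th category.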
Cell-type-annotation reports:
Processed: clade6sub25_annotation_report.md
Platform: Claude
Renamed: clade6sub25/260304_claude_clade6sub25_clean.md
PDF: outs/deep_research/clade6sub25/260304_claude_clade6sub25_report.pdf
HTML: outs/deep_research/clade6sub25/260304_claude_clade6sub25_report.html
Summary: annotation_summary.tsv (new row added)
Annotation: "hemocyte-like immune/scavenger cells (GCM+ phagocytes)"
Confidence: medium
Cell type family: immune/scavenger (hemocyte-like)
Nonmetazoan characterization reports:
Processed: deep-research-report (24).md
Platform: ChatGPT
Report type: nonmetazoan_characterization (prokaryote variant)
Cleaned: prokaryote/260315_chatgpt_prokaryote_clean.md
PDF: prokaryote/260315_chatgpt_prokaryote_report.pdf
HTML: prokaryote/260315_chatgpt_prokaryote_report.html
Summary: nonmetazoan_summary.tsv (new row added)
Kingdom: Prokaryote (369 genes, 178 expressed)
Classification: 310 symbiont, 12 HGT, 25 conserved, 22 ambiguous
Confidence: low
Top HGT candidates: c102759-g4 (Protein-ADP-ribose hydrolase), c101192-g1 (Deubiquitinase)
When processing multiple reports (via "all" or "new"), the skill:

- _clean.md files are the archival versions: they retain the YAML header for parsing and are the source of truth for the summary table.
- The prompt file ({module_id}_prompt.md) is shared across platforms and dates, so it doesn't get the date/platform prefix.
- templates/cleaning-patterns.md.