skills/tooluniverse-cell-line-profiling/SKILL.md
Cancer cell-line selection and profiling for experimental model choice. Cross-references DepMap, Cellosaurus, COSMIC, PharmacoDB to deliver identity verification, mutation/CNV profile, gene dependencies, drug sensitivities, and druggable targets. Use to answer 'which cell line should I use for studying gene X?' or 'is this cell line a good model for cancer Y?'. Outputs ranked recommendations with rationale, growth characteristics, and known pitfalls.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-cell-line-profiling
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Comprehensive profiling of cancer cell lines for experimental model selection. Transforms a query (cancer type, gene, or cell line name) into an actionable report covering identity verification, molecular features, gene dependencies, drug sensitivities, and druggable targets.
KEY PRINCIPLES:
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply for: cell line selection by cancer type/gene, cell line profiling, gene dependencies, drug sensitivity queries, cell line comparisons, mutation checks.
BEFORE calling ANY tool, verify parameters against this table.
| Tool | Key Parameters | Notes |
|------|---------------|-------|
| DepMap_search_cell_lines | query (required) | Search by name, e.g., "A549", "MCF" |
| DepMap_get_cell_line | model_name OR model_id | Name: "A549"; ID: "SIDM00001" |
| DepMap_get_cell_lines | tissue, cancer_type, page_size | Filter by tissue (e.g., "Lung") |
| DepMap_get_gene_dependencies | gene_symbol (required), model_id | Returns gene metadata only, not per-cell-line scores (see Phase 3); in DepMap gene effect data, negative = essential |
| DepMap_search_genes | query (required) | Validate gene symbol in DepMap first |
| cellosaurus_search_cell_lines | q (required), size | Solr syntax: id:HeLa, ox:9606 AND char:cancer |
| cellosaurus_get_cell_line_info | accession (required, CVCL_ format) | Full cell line record |
| cellosaurus_query_converter | query (required) | Natural language to Solr syntax |
| COSMIC_search_mutations | terms OR query, max_results | Search "BRAF V600E" or gene name |
| COSMIC_get_mutations_by_gene | gene OR gene_name, max_results | All mutations for a gene |
| PharmacoDB_get_cell_line | operation="get_cell_line", cell_name | Cell line metadata + datasets |
| PharmacoDB_get_experiments | operation="get_experiments", compound_name, cell_line_name, dataset_name, per_page | Drug response data (IC50, AAC, EC50) |
| PharmacoDB_get_biomarker_assoc | operation="get_biomarker_associations", compound_name, tissue_name, mdata_type, per_page | Gene-drug sensitivity correlations |
| PharmacoDB_search | operation="search", query | Find PharmacoDB IDs |
| CellMarker_search_cancer_markers | operation="search_cancer_markers", cancer_type, gene_symbol, cell_type | Cancer cell markers |
| CellMarker_search_by_gene | operation="search_by_gene", gene_symbol (required), species | Cell types expressing a gene |
| HPA_get_comparative_expression_by_gene_and_cellline | gene_name (required), cell_line (required) | Supported lines: ishikawa, hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251 |
| CLUE_get_cell_lines | operation="get_cell_lines", cell_id | L1000 CMap cell line info (requires CLUE_API_KEY) |
| SYNERGxDB_search_combos | drug_name_1, drug_name_2, sample (tissue or cell ID) | Drug combination synergy (ZIP, Bliss, Loewe) |
| SYNERGxDB_list_cell_lines | - | All cell lines in SYNERGxDB |
| DGIdb_get_drug_gene_interactions | genes: list[str] | Druggable gene interactions |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | Drugs targeting a gene |
| STRING_get_network | protein_ids: list[str], species: int (9606) | PPI network for gene context |
| MyGene_query_genes | query (NOT q) | Resolve gene symbol to Ensembl ID |
| cBioPortal_get_mutations | study_id, gene_list (STRING, not array) | Cell line mutations from CCLE |
Input: Cancer type AND/OR Gene of interest AND/OR Cell line name(s)
Phase 1: Cell Line Identification
- Search and verify cell line identity (Cellosaurus)
- Get metadata: species, disease, STR profile, cross-references
- If cancer type given without cell line: find candidate lines (DepMap)
Phase 2: Molecular Profiling
- Mutation landscape (COSMIC, cBioPortal CCLE)
- Gene expression (HPA, DepMap)
- Cancer markers (CellMarker)
Phase 3: Gene Dependencies (CRISPR Screens)
- Gene essentiality scores from DepMap
- Identify selectively essential genes
- Compare across cell lines if multiple candidates
Phase 4: Drug Sensitivity
- IC50/AAC from PharmacoDB (GDSC, CCLE, CTRPv2, PRISM)
- Biomarker associations for drug response
- Drug combination synergy (SYNERGxDB)
Phase 5: Target Druggability & Recommendations
- Druggable targets (DGIdb, OpenTargets)
- Final ranked recommendation with rationale
Goal: Verify cell line identity and find candidates.
If specific cell line given: (1) cellosaurus_search_cell_lines(q="id:<NAME>") → get CVCL accession, species, disease, contamination flags. (2) cellosaurus_get_cell_line_info(accession="CVCL_XXXX") for STR profile. (3) DepMap_get_cell_line(model_name="...") for tissue, cancer_type, MSI, ploidy. (4) PharmacoDB_get_cell_line(operation="get_cell_line", cell_name="...") for datasets.
If cancer type only: (1) DepMap_get_cell_lines(tissue="Lung", page_size=20). (2) Narrow by gene mutations/dependencies in Phases 2-3. (3) CellMarker_search_cancer_markers(operation="search_cancer_markers", cancer_type="Lung").
OUTPUT: Table of candidate cell lines with: name, tissue, cancer type, key identifiers.
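The identity-check steps above depend on two small string conventions: the `id:<NAME>` query and the `CVCL_` accession format. The sketch below shows one way to build and check them before calling the tools. The helper names are hypothetical, the accession pattern (prefix plus four uppercase alphanumeric characters) is an assumption about the format, and quoting names that contain spaces is an assumption about how the Solr-style query should be written.

```python
import re

# Assumption: accessions look like "CVCL_" followed by four uppercase
# alphanumeric characters (for example "CVCL_0023").
CVCL_PATTERN = re.compile(r"^CVCL_[A-Z0-9]{4}$")


def is_cvcl_accession(value: str) -> bool:
    """Return True if the value looks like a Cellosaurus accession."""
    return bool(CVCL_PATTERN.match(value.strip()))


def cellosaurus_id_query(name: str) -> str:
    """Build the `id:<NAME>` query string used in step (1)."""
    name = name.strip()
    # Assumption: names with whitespace are quoted so they stay one term.
    return f'id:"{name}"' if " " in name else f"id:{name}"


if __name__ == "__main__":
    print(cellosaurus_id_query("HeLa"))
    print(is_cvcl_accession("CVCL_0023"), is_cvcl_accession("A549"))
```

If only a name is available, run the search first and pass the returned accession to `cellosaurus_get_cell_line_info` once it passes the format check.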
Goal: Characterize mutational and expression landscape.
2A Mutations: COSMIC_get_mutations_by_gene(gene="EGFR") + cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="EGFR,KRAS,TP53"). Note: gene_list is a comma-separated STRING. CCLE study ID: ccle_broad_2019.
2B Expression: HPA_get_comparative_expression_by_gene_and_cellline(gene_name="EGFR", cell_line="a549"). Only 10 lines supported: hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa.
2C Cancer markers: CellMarker_search_by_gene(operation="search_by_gene", gene_symbol="EGFR", species="Human")
OUTPUT: Mutation table (gene, AA change, type) + expression summary per cell line.
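Two parameter pitfalls in this phase are easy to guard against in code: `gene_list` must be a single comma-separated string, and HPA only accepts ten lowercase cell line keys. The following is a minimal sketch of both guards. The function names are hypothetical, and the lowercase, punctuation-free key normalization is an assumption about how a name such as "MCF-7" maps onto the supported list.

```python
from typing import Iterable, Optional

# The ten lines listed as supported by the HPA comparative expression tool.
HPA_SUPPORTED_LINES = {
    "ishikawa", "hela", "mcf7", "a549", "hepg2",
    "jurkat", "pc3", "rh30", "siha", "u251",
}


def hpa_cell_line_key(name: str) -> Optional[str]:
    """Map a cell line name to its HPA key, or None if unsupported."""
    key = "".join(ch for ch in name.lower() if ch.isalnum())
    return key if key in HPA_SUPPORTED_LINES else None


def to_gene_list(genes: Iterable[str]) -> str:
    """Join gene symbols into the comma-separated string cBioPortal expects."""
    seen = []
    for gene in genes:
        symbol = gene.strip()
        # Skip blanks and duplicates while keeping the original order.
        if symbol and symbol not in seen:
            seen.append(symbol)
    return ",".join(seen)


if __name__ == "__main__":
    print(to_gene_list(["EGFR", "KRAS", "TP53"]))
    print(hpa_cell_line_key("MCF-7"), hpa_cell_line_key("H1975"))
```

When `hpa_cell_line_key` returns `None`, skip the HPA call for that line and note the gap in the report.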
Goal: Determine which genes are essential in candidate cell lines.
LIMITATION: DepMap_get_gene_dependencies returns gene metadata (HGNC ID, Ensembl ID) but NOT per-cell-line CRISPR scores. Full Chronos scores require depmap.org download.
Available tools: (1) DepMap_search_genes(query="EGFR") — validate gene exists. (2) DepMap_get_gene_dependencies(gene_symbol="EGFR") — metadata only. (3) Alternatives: cBioPortal CCLE for mutation data, PubMed for published screens, or direct user to depmap.org/portal.
Interpreting Chronos scores (from DepMap portal): ~0 = not essential; < -0.5 = essential; ~-1.0 = strongly essential. A selective dependency (essential in some lineages only) indicates a potential therapeutic window.
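The score bands above can be written as a small classifier. This is a sketch, not part of the skill: the function name is hypothetical, the -0.2 boundary is borrowed from the scoring rubric in Phase 5, and treating -1.0 as a hard cutoff is an assumption, since the text only says "around -1.0".

```python
def classify_chronos(score: float) -> str:
    """Bucket a Chronos gene effect score into a coarse essentiality label."""
    if score <= -1.0:
        # Assumption: "~-1.0" is treated as a hard threshold here.
        return "strongly essential"
    if score < -0.5:
        return "essential"
    if score < -0.2:
        # Band taken from the Phase 5 scoring rubric (-0.5 to -0.2).
        return "moderately essential"
    return "not essential"


if __name__ == "__main__":
    for value in (-1.2, -0.7, -0.3, 0.05):
        print(value, classify_chronos(value))
```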
OUTPUT: Gene validation + mutation status per cell line.
Per-cell-line Chronos scores (what the API can't give you): use the bundled script scripts/depmap_gene_dependency.py. It pulls the current DepMap Public release (CRISPRGeneEffect.csv + Model.csv) once via the public download index, caches it, and answers the dependency question directly:
# Cell lines most dependent on a gene (optionally within a lineage)
python scripts/depmap_gene_dependency.py gene KRAS --lineage Pancreas --top 20
# Genes a given cell line is most dependent on
python scripts/depmap_gene_dependency.py cell-line A375 --top 25
Output: cell line, lineage, primary disease, Chronos score (most negative = most dependent; < -0.5 ≈ dependency). For selective-dependency reasoning, compare a gene's scores across lineages.
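Comparing a gene's scores across lineages can be sketched as a group-by-mean followed by a threshold test. The example below uses only the standard library and made-up placeholder scores; it does not reproduce the bundled script's logic, and the function names, the `(lineage, score)` row shape, and the choice of the mean as the summary statistic are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_lineage(rows):
    """Average Chronos scores per lineage from (lineage, score) pairs."""
    grouped = defaultdict(list)
    for lineage, score in rows:
        grouped[lineage].append(score)
    return {lineage: mean(scores) for lineage, scores in grouped.items()}


def selective_lineages(rows, threshold=-0.5):
    """Return lineages whose mean score is below the threshold.

    The result is empty unless some, but not all, lineages are dependent,
    which is the pattern described as a selective dependency.
    """
    means = mean_score_by_lineage(rows)
    dependent = sorted(name for name, value in means.items() if value < threshold)
    return dependent if 0 < len(dependent) < len(means) else []


# Made-up placeholder values for illustration only, not DepMap data.
toy_rows = [
    ("LineageA", -1.1), ("LineageA", -0.9),
    ("LineageB", -0.6), ("LineageB", -0.2),
    ("LineageC", 0.0), ("LineageC", -0.1),
]

if __name__ == "__main__":
    print(selective_lineages(toy_rows))
```

A mean hides outliers, so for a real decision it is worth inspecting the per-line scores as well.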
If DepMap data is unavailable: Use cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") for mutation data, and the Quick Reference table below for common recommendations.
Goal: Profile drug response data.
4A PharmacoDB: PharmacoDB_get_experiments(operation="get_experiments", compound_name="Erlotinib", cell_line_name="A549", per_page=20) for dose-response (IC50, AAC, EC50). Omit compound_name to get all drugs for a cell line. Use PharmacoDB_get_biomarker_assoc(operation="get_biomarker_associations", compound_name="...", tissue_name="...", mdata_type="mutation") for sensitivity biomarkers.
4B SYNERGxDB: SYNERGxDB_search_combos(drug_name_1="gemcitabine", drug_name_2="erlotinib", sample="lung"). Positive ZIP = synergy. Coverage is mostly cytotoxic agents; most targeted therapies and biologics are absent, so an empty result may simply mean the drug is not in the database.
4C CLUE: CLUE_get_cell_lines(operation="get_cell_lines", cell_id="MCF7") — requires CLUE_API_KEY.
OUTPUT: Drug sensitivity table (drug, IC50, AAC, dataset) + synergy data if available.
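Building the drug sensitivity table usually ends with a ranking step: higher AAC means more sensitive, lower IC50 means more sensitive. The sketch below shows that ordering on plain dictionaries. The record keys (`cell_line`, `AAC`, `IC50`) and the values are hypothetical placeholders, not the actual PharmacoDB response shape.

```python
def rank_by_sensitivity(records, metric="AAC"):
    """Order drug-response records from most to least sensitive.

    AAC is sorted descending (higher = more sensitive);
    IC50 is sorted ascending (lower = more sensitive).
    Records with no value for the chosen metric are dropped.
    """
    if metric not in ("AAC", "IC50"):
        raise ValueError(f"unsupported metric: {metric}")
    usable = [rec for rec in records if rec.get(metric) is not None]
    return sorted(usable, key=lambda rec: rec[metric], reverse=(metric == "AAC"))


# Hypothetical records with made-up values, for illustration only.
toy_records = [
    {"cell_line": "LineA", "AAC": 0.10, "IC50": 5.0},
    {"cell_line": "LineB", "AAC": 0.45, "IC50": 0.2},
    {"cell_line": "LineC", "AAC": None, "IC50": 1.0},
]

if __name__ == "__main__":
    for rec in rank_by_sensitivity(toy_records, metric="AAC"):
        print(rec["cell_line"], rec["AAC"])
```

Rank within a single dataset where possible, since values from different screens are not necessarily on the same scale.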
5A Druggability: DGIdb_get_drug_gene_interactions(genes=["EGFR", "KRAS"]) + MyGene_query_genes(query="EGFR") → OpenTargets_get_associated_drugs_by_target_ensemblID(ensemblId="...", size=10) + STRING_get_network(protein_ids=["EGFR"], species=9606).
5B Final Recommendation: Synthesize all phases. Explain WHY one line is better for this specific use case.
| Criterion | Weight | Score 3 (Best) | Score 2 (Acceptable) | Score 1 (Poor) |
|-----------|--------|----------------|---------------------|----------------|
| Mutation match | x3 | Exact mutation (e.g., KRAS G12D) | Same gene, different mutation | No mutation in gene of interest |
| Co-mutation simplicity | x2 | Few co-mutations (cleaner background) | Moderate co-mutations | Complex background (3+ driver mutations) |
| Gene dependency | x2 | DepMap score < -0.5 (essential) | Score -0.5 to -0.2 (moderately essential) | Score > -0.2 (not essential) |
| Drug sensitivity data | x1 | In GDSC + CCLE + PRISM (3+ datasets) | In 1-2 datasets | No drug response data |
| Practical factors | x1 | Adherent, well-characterized, widely used | Suspension or less common | Hard to culture, contamination-prone |
Total score = sum of (criterion score × weight). Max = 27. Rank cell lines by total score.
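The weighted total can be computed directly from per-criterion scores. The following is a minimal sketch of that arithmetic using the weights from the rubric (3, 2, 2, 1, 1); the dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Weights from the scoring rubric; the key names are illustrative.
WEIGHTS = {
    "mutation_match": 3,
    "co_mutation_simplicity": 2,
    "gene_dependency": 2,
    "drug_sensitivity_data": 1,
    "practical_factors": 1,
}


def total_score(scores):
    """Sum of (criterion score x weight); each score must be 1, 2, or 3."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in WEIGHTS:
        if scores[name] not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3")
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


def rank_cell_lines(candidates):
    """Return (cell line, total) pairs sorted from highest to lowest total."""
    totals = [(name, total_score(scores)) for name, scores in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    best_possible = {name: 3 for name in WEIGHTS}
    print(total_score(best_possible))  # 27, the stated maximum
```

With every criterion at 3 the total is 3 × (3 + 2 + 2 + 1 + 1) = 27, which matches the stated maximum; the minimum is 9.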
The best cell line depends on what you're doing with it:
| Use Case | Key Requirements | Extra Considerations |
|----------|-----------------|---------------------|
| CRISPR knockout screen | Adherent growth, good lentiviral transduction, pre-existing Cas9 clones (check Cellosaurus for "-Cas9" derivatives) | Doubling time matters for library coverage; <72h ideal |
| Drug sensitivity testing | In PharmacoDB/GDSC, known IC50 for reference compounds | Check SYNERGxDB for combo data |
| Xenograft model | Known tumorigenicity in mice, available PDX data | Check if line forms tumors in nude/NSG mice (Cellosaurus often notes this) |
| Mechanism of action | Clean genetic background, gene dependency confirmed | Fewer co-mutations = easier to attribute phenotypes |
| Biomarker discovery | Isogenic pairs available, well-characterized omics | Check if isogenic knockouts exist (Cellosaurus) |
| Drug combination | In SYNERGxDB with combo data, known single-agent responses | ZIP score available for synergy assessment |
Check for pre-made derivatives — this can save months of lab work:
cellosaurus_search_cell_lines(q="ca:<PARENT_LINE>", size=20) finds all derivatives.
If DepMap_get_gene_dependencies fails (common for some genes):
cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="<GENE>") as an alternative source for cell line mutation data.
OUTPUT: Ranked cell line table with total scores, per-criterion breakdown, and a text recommendation explaining the top pick and runner-up with biological reasoning.
| Pattern | Question Type | Key Tools (in order) |
|---------|--------------|---------------------|
| 1 | "Which cell line for [cancer] + [gene]?" | DepMap_get_cell_lines → DepMap_get_gene_dependencies → COSMIC_get_mutations_by_gene → cBioPortal_get_mutations (ccle_broad_2019) → PharmacoDB_get_experiments → rank by mutation + dependency + drug sensitivity |
| 2 | "Profile cell line X" | cellosaurus_search → DepMap_get_cell_line → PharmacoDB_get_cell_line → cBioPortal_get_mutations → HPA expression (if supported) → PharmacoDB_get_experiments |
| 3 | "Which lines are sensitive to [drug]?" | DepMap_get_cell_lines (tissue filter) → PharmacoDB_get_experiments (compound) → PharmacoDB_get_biomarker_assoc → rank by AAC (higher=sensitive) or IC50 (lower=sensitive) |
| 4 | "Compare A vs B" | Run Pattern 2 for both in parallel → side-by-side comparison table |
| 5 | "Drug combos for [cell line]?" | SYNERGxDB_search_combos → PharmacoDB_get_experiments (single-agent baseline) → report synergistic pairs with ZIP scores |
| Cancer Type | Key Cell Lines | Common Mutations |
|-------------|---------------|-----------------|
| NSCLC | A549 (KRAS G12S), H1975 (EGFR L858R/T790M), PC-9 (EGFR del19), HCC827 (EGFR del19/amp), H460 (KRAS Q61H), H1299 (NRAS Q61K, TP53-null) | KRAS, EGFR, TP53, STK11 |
| Breast | MCF7 (ER+/PR+), MDA-MB-231 (TNBC, KRAS G13D), T-47D (ER+), BT-474 (HER2+), SK-BR-3 (HER2+) | PIK3CA, TP53, BRCA1/2 |
| Colorectal | HCT116 (KRAS G13D, MSI-H), SW480 (KRAS G12V), HT-29 (BRAF V600E), Caco-2 (APC), LoVo (KRAS G13D, MSI-H) | APC, KRAS, TP53, BRAF |
| Melanoma | A375 (BRAF V600E), SK-MEL-28 (BRAF V600E), WM266-4 (BRAF V600D), MeWo (WT BRAF) | BRAF, NRAS, TP53 |
| Pancreatic | PANC-1 (KRAS G12D), MIA PaCa-2 (KRAS G12C), AsPC-1 (KRAS G12D), Capan-1 (BRCA2 mut) | KRAS, TP53, CDKN2A, SMAD4 |
| Prostate | PC-3 (AR-negative), LNCaP (AR+, PTEN-null), DU145 (AR-negative), VCaP (AR amp, TMPRSS2-ERG) | AR, PTEN, TP53, RB1 |
| Ovarian | SKOV3 (HER2+, TP53 mut), OVCAR3 (TP53 mut), A2780 (sensitive), A2780cis (cisplatin-resistant) | TP53, BRCA1/2 |
| Leukemia | K562 (CML, BCR-ABL), Jurkat (T-ALL), HL-60 (AML), THP-1 (AML, monocytic) | BCR-ABL, FLT3, NPM1 |
| Glioblastoma | U251 (TP53 mut), U87MG (PTEN-null), T98G (TP53/PTEN mut), LN229 (TP53 mut, PTEN WT) | TP53, PTEN, EGFR, IDH1 |
| Liver | HepG2 (hepatoblastoma, WT TP53), Hep3B (HBV+, TP53-null), Huh7 (HCC, TP53 Y220C) | TP53, CTNNB1, AXIN1 |
Use cell line NAME as common key across databases. IDs: DepMap=SIDM, Cellosaurus=CVCL, cBioPortal=sample (e.g. A549_LUNG), PharmacoDB/SYNERGxDB=name string. When names differ ("HCT 116" vs "HCT116"), check Cellosaurus synonyms first.
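Because the name is the join key, spelling variants such as "HCT 116" and "HCT116" need to collapse to one form before matching across databases. The sketch below shows one simple normalization plus a helper for cBioPortal-style sample IDs. Both function names are hypothetical, and splitting the sample ID at the first underscore is an assumption based on the `A549_LUNG` example.

```python
def normalize_cell_line_name(name: str) -> str:
    """Uppercase the name and drop spaces, hyphens, and other punctuation."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


def name_from_ccle_sample(sample_id: str) -> str:
    """Take the part before the first underscore (e.g. 'A549_LUNG' -> 'A549')."""
    return sample_id.split("_", 1)[0]


def group_name_variants(names):
    """Group raw spellings that share one normalized key."""
    groups = {}
    for raw in names:
        groups.setdefault(normalize_cell_line_name(raw), []).append(raw)
    return groups


if __name__ == "__main__":
    print(group_name_variants(["HCT 116", "HCT116", "MDA-MB-231"]))
    print(name_from_ccle_sample("A549_LUNG"))
```

Stripping punctuation can merge names that refer to different lines, so treat a normalized match as a candidate and confirm it against Cellosaurus synonyms.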
Mutation-based filtering: cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") → filter by amino_acid_change → extract cell line names → query other databases.
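The filtering chain above can be sketched on plain dictionaries. The record layout here is hypothetical: `amino_acid_change` is the field named in the text, while `gene` and `sample_id` are assumed keys, and the second sample ID is a made-up placeholder. Check the actual tool response before reusing the key names.

```python
def cell_lines_with_change(records, gene, aa_change):
    """Return cell line names carrying a given amino acid change in a gene."""
    names = []
    for rec in records:
        if rec.get("gene") != gene:
            continue
        change = rec.get("amino_acid_change") or ""
        # Accept both "G12S" and "p.G12S" style notation.
        if change.startswith("p."):
            change = change[2:]
        if change != aa_change:
            continue
        # Assumption: sample IDs look like "<NAME>_<TISSUE>".
        name = rec.get("sample_id", "").split("_", 1)[0]
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical records; the A549 entry mirrors the quick-reference table,
# the second sample ID is a placeholder.
toy_mutations = [
    {"gene": "KRAS", "amino_acid_change": "p.G12S", "sample_id": "A549_LUNG"},
    {"gene": "KRAS", "amino_acid_change": "G12D", "sample_id": "LINEX_PANCREAS"},
    {"gene": "TP53", "amino_acid_change": "G12S", "sample_id": "LINEX_PANCREAS"},
]

if __name__ == "__main__":
    print(cell_lines_with_change(toy_mutations, "KRAS", "G12S"))
```

The returned names can then be passed to the DepMap, PharmacoDB, and SYNERGxDB tools as the common key.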
| Issue | Resolution |
|-------|-----------|
| DepMap returns no results for cell line name | Try alternative names: check Cellosaurus synonyms first |
| cBioPortal CCLE study ID unknown | Use ccle_broad_2019 as default CCLE study |
| PharmacoDB cell line name mismatch | Use PharmacoDB_search(operation="search", query="<name>") to find the canonical name |
| HPA cell line not supported | Only 10 lines supported (hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa). Skip HPA for other lines |
| CLUE requires API key | Skip CLUE tools if CLUE_API_KEY not set; note in report |
| Gene symbol not found in DepMap | Use DepMap_search_genes(query="<symbol>") to check aliases |
| Cellosaurus accession pattern | Must be CVCL_XXXX format; search first if you only have a name |
| SYNERGxDB no results for drug combo | Drug may not be in database; SYNERGxDB covers cytotoxic agents, not most targeted therapies |
Before finalizing the report, verify: