plugin/skills/tooluniverse-cell-line-profiling/SKILL.md
Cancer cell-line selection and profiling for experimental model choice. Cross-references DepMap, Cellosaurus, COSMIC, PharmacoDB to deliver identity verification, mutation/CNV profile, gene dependencies, drug sensitivities, and druggable targets. Use to answer 'which cell line should I use for studying gene X?' or 'is this cell line a good model for cancer Y?'. Outputs ranked recommendations with rationale, growth characteristics, and known pitfalls.
Comprehensive profiling of cancer cell lines for experimental model selection. Transforms a query (cancer type, gene, or cell line name) into an actionable report covering identity verification, molecular features, gene dependencies, drug sensitivities, and druggable targets.
KEY PRINCIPLES:
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Use this skill for: cell line selection by cancer type/gene, cell line profiling, gene dependencies, drug sensitivity queries, cell line comparisons, mutation checks.
BEFORE calling ANY tool, verify parameters against this table.
| Tool | Key Parameters | Notes |
|------|---------------|-------|
| DepMap_search_cell_lines | query (required) | Search by name, e.g., "A549", "MCF" |
| DepMap_get_cell_line | model_name OR model_id | Name: "A549"; ID: "SIDM00001" |
| DepMap_get_cell_lines | tissue, cancer_type, page_size | Filter by tissue (e.g., "Lung") |
| DepMap_get_gene_dependencies | gene_symbol (required), model_id | Gene effect scores; negative = essential |
| DepMap_search_genes | query (required) | Validate gene symbol in DepMap first |
| cellosaurus_search_cell_lines | q (required), size | Solr syntax: id:HeLa, ox:9606 AND char:cancer |
| cellosaurus_get_cell_line_info | accession (required, CVCL_ format) | Full cell line record |
| cellosaurus_query_converter | query (required) | Natural language to Solr syntax |
| COSMIC_search_mutations | terms OR query, max_results | Search "BRAF V600E" or gene name |
| COSMIC_get_mutations_by_gene | gene OR gene_name, max_results | All mutations for a gene |
| PharmacoDB_get_cell_line | operation="get_cell_line", cell_name | Cell line metadata + datasets |
| PharmacoDB_get_experiments | operation="get_experiments", compound_name, cell_line_name, dataset_name, per_page | Drug response data (IC50, AAC, EC50) |
| PharmacoDB_get_biomarker_assoc | operation="get_biomarker_associations", compound_name, tissue_name, mdata_type, per_page | Gene-drug sensitivity correlations |
| PharmacoDB_search | operation="search", query | Find PharmacoDB IDs |
| CellMarker_search_cancer_markers | operation="search_cancer_markers", cancer_type, gene_symbol, cell_type | Cancer cell markers |
| CellMarker_search_by_gene | operation="search_by_gene", gene_symbol (required), species | Cell types expressing a gene |
| HPA_get_comparative_expression_by_gene_and_cellline | gene_name (required), cell_line (required) | Supported lines: ishikawa, hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251 |
| CLUE_get_cell_lines | operation="get_cell_lines", cell_id | L1000 CMap cell line info (requires CLUE_API_KEY) |
| SYNERGxDB_search_combos | drug_name_1, drug_name_2, sample (tissue or cell ID) | Drug combination synergy (ZIP, Bliss, Loewe) |
| SYNERGxDB_list_cell_lines | - | All cell lines in SYNERGxDB |
| DGIdb_get_drug_gene_interactions | genes: list[str] | Druggable gene interactions |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | Drugs targeting a gene |
| STRING_get_network | protein_ids: list[str], species: int (9606) | PPI network for gene context |
| MyGene_query_genes | query (NOT q) | Resolve gene symbol to Ensembl ID |
| cBioPortal_get_mutations | study_id, gene_list (STRING, not array) | Cell line mutations from CCLE |
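The parameter rules above can also be checked mechanically before a call. The sketch below assumes each call is assembled as a tool name plus an `args` dict. `check_args`, `REQUIRED`, and `FORBIDDEN` are hypothetical helpers (not part of ToolUniverse), and they encode only a few rows of the table:

```python
# Hypothetical pre-flight checker; encodes only a subset of the parameter table.
REQUIRED = {
    "DepMap_search_cell_lines": {"query"},
    "DepMap_get_gene_dependencies": {"gene_symbol"},
    "cellosaurus_search_cell_lines": {"q"},
    "cellosaurus_get_cell_line_info": {"accession"},
    "MyGene_query_genes": {"query"},
    "cBioPortal_get_mutations": {"study_id", "gene_list"},
    "DGIdb_get_drug_gene_interactions": {"genes"},
}
# Parameter names that look plausible but are wrong for a given tool.
FORBIDDEN = {"MyGene_query_genes": {"q"}}


def check_args(tool: str, args: dict) -> list:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    missing = REQUIRED.get(tool, set()) - set(args)
    if missing:
        problems.append(f"{tool}: missing required {sorted(missing)}")
    wrong = FORBIDDEN.get(tool, set()) & set(args)
    if wrong:
        problems.append(f"{tool}: unsupported parameter {sorted(wrong)}")
    # Type rules from the Notes column.
    if tool == "cBioPortal_get_mutations" and "gene_list" in args and not isinstance(args["gene_list"], str):
        problems.append("cBioPortal_get_mutations: gene_list must be a comma-separated string")
    if tool == "DGIdb_get_drug_gene_interactions" and "genes" in args and not isinstance(args["genes"], list):
        problems.append("DGIdb_get_drug_gene_interactions: genes must be a list of symbols")
    if tool == "cellosaurus_get_cell_line_info" and not str(args.get("accession", "")).startswith("CVCL_"):
        problems.append("cellosaurus_get_cell_line_info: accession must be in CVCL_ format")
    return problems
```

For example, `check_args("MyGene_query_genes", {"q": "EGFR"})` returns two problems: the missing `query` and the unsupported `q`. Extend the two dicts to cover other rows as needed.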
Input: Cancer type AND/OR Gene of interest AND/OR Cell line name(s)
Phase 1: Cell Line Identification
- Search and verify cell line identity (Cellosaurus)
- Get metadata: species, disease, STR profile, cross-references
- If cancer type given without cell line: find candidate lines (DepMap)
Phase 2: Molecular Profiling
- Mutation landscape (COSMIC, cBioPortal CCLE)
- Gene expression (HPA, DepMap)
- Cancer markers (CellMarker)
Phase 3: Gene Dependencies (CRISPR Screens)
- Gene essentiality scores from DepMap
- Identify selectively essential genes
- Compare across cell lines if multiple candidates
Phase 4: Drug Sensitivity
- IC50/AAC from PharmacoDB (GDSC, CCLE, CTRPv2, PRISM)
- Biomarker associations for drug response
- Drug combination synergy (SYNERGxDB)
Phase 5: Target Druggability & Recommendations
- Druggable targets (DGIdb, OpenTargets)
- Final ranked recommendation with rationale
Phase 1 goal: Verify cell line identity and find candidates.
If specific cell line given: (1) cellosaurus_search_cell_lines(q="id:<NAME>") → get CVCL accession, species, disease, contamination flags. (2) cellosaurus_get_cell_line_info(accession="CVCL_XXXX") for STR profile. (3) DepMap_get_cell_line(model_name="...") for tissue, cancer_type, MSI, ploidy. (4) PharmacoDB_get_cell_line(operation="get_cell_line", cell_name="...") for datasets.
If cancer type only: (1) DepMap_get_cell_lines(tissue="Lung", page_size=20). (2) Narrow by gene mutations/dependencies in Phases 2-3. (3) CellMarker_search_cancer_markers(operation="search_cancer_markers", cancer_type="Lung").
OUTPUT: Table of candidate cell lines with: name, tissue, cancer type, key identifiers.
Phase 2 goal: Characterize the mutational and expression landscape.
2A Mutations: COSMIC_get_mutations_by_gene(gene="EGFR") + cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="EGFR,KRAS,TP53"). Note: gene_list is a comma-separated STRING. CCLE study ID: ccle_broad_2019.
2B Expression: HPA_get_comparative_expression_by_gene_and_cellline(gene_name="EGFR", cell_line="a549"). Only 10 lines supported: hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa.
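Only ten lines are accepted, so it can help to normalize a display name before calling the HPA tool. The sketch below uses `hpa_cell_line_key`, a hypothetical helper. It assumes the HPA keys are simply the lowercase alphanumeric forms listed above:

```python
import re
from typing import Optional

# The ten cell lines the HPA comparative-expression tool accepts (from the table above).
HPA_SUPPORTED = {"ishikawa", "hela", "mcf7", "a549", "hepg2",
                 "jurkat", "pc3", "rh30", "siha", "u251"}


def hpa_cell_line_key(name: str) -> Optional[str]:
    """Map a display name such as 'MCF-7' to the HPA key 'mcf7', or None if unsupported."""
    # Lowercase and drop spaces, hyphens, and other punctuation.
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return key if key in HPA_SUPPORTED else None
```

Names with extra suffixes do not match. For example, `"U-251 MG"` becomes `"u251mg"` and returns `None`, so pass `"u251"` directly in that case.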
2C Cancer markers: CellMarker_search_by_gene(operation="search_by_gene", gene_symbol="EGFR", species="Human")
OUTPUT: Mutation table (gene, AA change, type) + expression summary per cell line.
Phase 3 goal: Determine which genes are essential in candidate cell lines.
LIMITATION: DepMap_get_gene_dependencies returns gene metadata (HGNC ID, Ensembl ID) but NOT per-cell-line CRISPR scores. Full Chronos scores require depmap.org download.
Available tools: (1) DepMap_search_genes(query="EGFR") — validate gene exists. (2) DepMap_get_gene_dependencies(gene_symbol="EGFR") — metadata only. (3) Alternatives: cBioPortal CCLE for mutation data, PubMed for published screens, or direct user to depmap.org/portal.
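Interpreting Chronos scores (from the DepMap portal): ~0 = not essential; < -0.5 = essential; ~ -1.0 = strongly essential (on the Chronos scale, -1 corresponds to the median effect of common-essential genes). A selective dependency (essential in some lineages only) indicates a potential therapeutic window.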
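One way to make "selective dependency" concrete is to compare the fraction of dependent lines inside a lineage with the fraction elsewhere. This minimal sketch assumes a pandas DataFrame with one row per cell line for a single gene, using hypothetical columns `lineage` and `gene_effect` (for example, derived from DepMap's CRISPRGeneEffect.csv and Model.csv). What counts as "selective" is a judgment call; the -0.5 cutoff mirrors the interpretation above.

```python
import pandas as pd

ESSENTIAL_CUTOFF = -0.5  # Chronos threshold used in this skill


def lineage_summary(df: pd.DataFrame, cutoff: float = ESSENTIAL_CUTOFF) -> pd.DataFrame:
    """Per-lineage line count, fraction of dependent lines, and median gene effect for one gene."""
    out = df.groupby("lineage")["gene_effect"].agg(
        n_lines="size",
        frac_dependent=lambda s: (s < cutoff).mean(),
        median_effect="median",
    )
    return out.sort_values("frac_dependent", ascending=False)


def selectivity(df: pd.DataFrame, lineage: str, cutoff: float = ESSENTIAL_CUTOFF) -> tuple:
    """Return (fraction dependent inside the lineage, fraction dependent in all other lines)."""
    dependent = df["gene_effect"] < cutoff
    inside = df["lineage"] == lineage
    return float(dependent[inside].mean()), float(dependent[~inside].mean())
```

A high first value paired with a low second value points to a lineage-selective dependency. Similar values suggest a pan-essential gene.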
OUTPUT: Gene validation + mutation status per cell line.
Offline DepMap analysis (when API lacks CRISPR scores): Download CRISPRGeneEffect.csv + Model.csv from https://depmap.org/portal/download/all/. Load with pandas, find gene column (format: "KRAS (3845)"), merge with metadata, filter by lineage, sort by score. Most negative Chronos score = most dependent.
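The offline workflow above might look like this in pandas. This sketch rests on several assumptions. The gene-effect file is assumed to have one row per model, with the model ID as the first (index) column and gene columns formatted as "SYMBOL (EntrezID)". Model.csv is assumed to contain `ModelID`, `CellLineName`, and `OncotreeLineage` columns. DepMap column names have changed between releases, so check them against the release you download. `gene_column` and `rank_by_dependency` are hypothetical helpers.

```python
import pandas as pd


def gene_column(effect: pd.DataFrame, symbol: str) -> str:
    """Find the 'SYMBOL (EntrezID)' column for a gene symbol, e.g. 'KRAS (3845)'."""
    for col in effect.columns:
        # Exact symbol match, so 'KRAS' does not also match e.g. 'KRASP1 (...)'.
        if str(col).split(" (")[0] == symbol:
            return col
    raise KeyError(f"{symbol} not found in gene-effect matrix")


def rank_by_dependency(effect: pd.DataFrame, models: pd.DataFrame,
                       symbol: str, lineage=None) -> pd.DataFrame:
    """Merge one gene's Chronos scores with model metadata, most dependent first."""
    col = gene_column(effect, symbol)
    scores = (effect[col].dropna()          # drop models without a score for this gene
              .rename("gene_effect")
              .rename_axis("ModelID")
              .reset_index())               # columns: ModelID, gene_effect
    meta = models[["ModelID", "CellLineName", "OncotreeLineage"]]
    merged = scores.merge(meta, on="ModelID", how="inner")
    if lineage is not None:
        merged = merged[merged["OncotreeLineage"] == lineage]
    # Most negative Chronos score = strongest dependency.
    return merged.sort_values("gene_effect").reset_index(drop=True)


# Typical use with the downloaded files (paths and lineage value are placeholders):
# effect = pd.read_csv("CRISPRGeneEffect.csv", index_col=0)   # rows = models, columns = genes
# models = pd.read_csv("Model.csv")
# print(rank_by_dependency(effect, models, "KRAS", lineage="Pancreas").head(10))
```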
If DepMap data is unavailable: Use cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") for mutation data, and use the cancer-type reference table below (key cell lines and common mutations) for common recommendations.
Phase 4 goal: Profile drug response data.
4A PharmacoDB: PharmacoDB_get_experiments(operation="get_experiments", compound_name="Erlotinib", cell_line_name="A549", per_page=20) for dose-response (IC50, AAC, EC50). Omit compound_name to get all drugs for a cell line. Use PharmacoDB_get_biomarker_assoc(operation="get_biomarker_associations", compound_name="...", tissue_name="...", mdata_type="mutation") for sensitivity biomarkers.
4B SYNERGxDB: SYNERGxDB_search_combos(drug_name_1="gemcitabine", drug_name_2="erlotinib", sample="lung"). Positive ZIP = synergy. Coverage is mostly cytotoxic agents, and many targeted therapies and biologics are absent, so an empty result may simply mean a drug is not in the database.
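When `sample` is a tissue, the search may return several cell lines per drug pair. This minimal sketch summarizes them. It assumes the results are available as a list of dicts with hypothetical keys `drug_1`, `drug_2`, and `zip`; check the actual field names in the tool's response. The default `min_zip=0.0` follows the positive-ZIP rule above, and you can raise it for a stricter call.

```python
from collections import defaultdict
from statistics import mean


def summarize_combos(records, min_zip=0.0):
    """Group combo records by unordered drug pair and summarize their ZIP scores, best first."""
    by_pair = defaultdict(list)
    for rec in records:
        if rec.get("zip") is None:
            continue  # skip combos with no ZIP score reported
        # Sort names so (A, B) and (B, A) count as the same combination.
        pair = tuple(sorted((rec["drug_1"].lower(), rec["drug_2"].lower())))
        by_pair[pair].append(rec["zip"])
    rows = [
        {
            "pair": pair,
            "n_samples": len(scores),
            "mean_zip": mean(scores),
            "frac_synergistic": sum(s > min_zip for s in scores) / len(scores),
        }
        for pair, scores in by_pair.items()
    ]
    return sorted(rows, key=lambda r: r["mean_zip"], reverse=True)
```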
4C CLUE: CLUE_get_cell_lines(operation="get_cell_lines", cell_id="MCF7") — requires CLUE_API_KEY.
OUTPUT: Drug sensitivity table (drug, IC50, AAC, dataset) + synergy data if available.
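This minimal sketch assembles that table. It assumes the PharmacoDB experiment records have been flattened into dicts with hypothetical keys `compound`, `cell_line`, `dataset`, `ic50`, and `aac`; the real response fields may be named and nested differently. The sketch takes the median across datasets and ranks by AAC, where higher means more sensitive. IC50 and AAC are not harmonized across GDSC, CCLE, CTRPv2, and PRISM, which use different assays and dose ranges. Treat the cross-dataset medians as a rough summary, and compare within a single dataset when the distinction matters.

```python
import pandas as pd


def sensitivity_table(records) -> pd.DataFrame:
    """One row per (compound, cell line): median AAC/IC50 across datasets, most sensitive first."""
    df = pd.DataFrame.from_records(records)
    agg = (
        df.groupby(["compound", "cell_line"])
        .agg(
            median_aac=("aac", "median"),     # NaN values are skipped
            median_ic50=("ic50", "median"),
            n_datasets=("dataset", "nunique"),
            datasets=("dataset", lambda s: ",".join(sorted(set(s)))),
        )
    )
    # Higher AAC = more sensitive, so sort descending.
    return agg.sort_values("median_aac", ascending=False).reset_index()
```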
5A Druggability: DGIdb_get_drug_gene_interactions(genes=["EGFR", "KRAS"]) + MyGene_query_genes(query="EGFR") → OpenTargets_get_associated_drugs_by_target_ensemblID(ensemblId="...", size=10) + STRING_get_network(protein_ids=["EGFR"], species=9606).
5B Final Recommendation: Synthesize all phases. Explain WHY one line is better for this specific use case.
| Criterion | Weight | Score 3 (Best) | Score 2 (Acceptable) | Score 1 (Poor) |
|-----------|--------|----------------|---------------------|----------------|
| Mutation match | x3 | Exact mutation (e.g., KRAS G12D) | Same gene, different mutation | No mutation in gene of interest |
| Co-mutation simplicity | x2 | Few co-mutations (cleaner background) | Moderate co-mutations | Complex background (3+ driver mutations) |
| Gene dependency | x2 | DepMap score < -0.5 (essential) | Score -0.5 to -0.2 (moderately essential) | Score > -0.2 (not essential) |
| Drug sensitivity data | x1 | In GDSC + CCLE + PRISM (3+ datasets) | In 1-2 datasets | No drug response data |
| Practical factors | x1 | Adherent, well-characterized, widely used | Suspension or less common | Hard to culture, contamination-prone |
Total score = sum of (criterion score × weight). Max = 27. Rank cell lines by total score.
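The rubric arithmetic can be sketched as follows. The criterion keys are hypothetical names for the five table rows. `dependency_points` applies the table's Chronos thresholds; a score of exactly -0.5 falls in the "Score 2" band, following the "-0.5 to -0.2" wording.

```python
# Criterion weights from the rubric table; each criterion is scored 1 (poor) to 3 (best).
WEIGHTS = {
    "mutation_match": 3,
    "co_mutation_simplicity": 2,
    "gene_dependency": 2,
    "drug_sensitivity_data": 1,
    "practical_factors": 1,
}
MAX_SCORE = 3 * sum(WEIGHTS.values())  # 3 * 9 = 27


def dependency_points(chronos: float) -> int:
    """Map a DepMap/Chronos score to rubric points (< -0.5 -> 3, -0.5..-0.2 -> 2, > -0.2 -> 1)."""
    if chronos < -0.5:
        return 3
    if chronos <= -0.2:
        return 2
    return 1


def total_score(points: dict) -> int:
    """Sum of criterion points times weights; rejects missing or out-of-range points."""
    missing = set(WEIGHTS) - set(points)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    if any(points[k] not in (1, 2, 3) for k in WEIGHTS):
        raise ValueError("each criterion must be scored 1, 2, or 3")
    return sum(WEIGHTS[k] * points[k] for k in WEIGHTS)


def rank_candidates(candidates: dict) -> list:
    """candidates: {cell_line_name: {criterion: points}} -> [(name, total)], best first."""
    scored = [(name, total_score(pts)) for name, pts in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Assigning the 1 to 3 points for mutation match, co-mutations, drug data, and practical factors remains a judgment call based on Phases 1 to 4.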
The best cell line depends on what you're doing with it:
| Use Case | Key Requirements | Extra Considerations |
|----------|-----------------|---------------------|
| CRISPR knockout screen | Adherent growth, good lentiviral transduction, pre-existing Cas9 clones (check Cellosaurus for "-Cas9" derivatives) | Doubling time matters for library coverage; <72h ideal |
| Drug sensitivity testing | In PharmacoDB/GDSC, known IC50 for reference compounds | Check SYNERGxDB for combo data |
| Xenograft model | Known tumorigenicity in mice, available PDX data | Check if line forms tumors in nude/NSG mice (Cellosaurus often notes this) |
| Mechanism of action | Clean genetic background, gene dependency confirmed | Fewer co-mutations = easier to attribute phenotypes |
| Biomarker discovery | Isogenic pairs available, well-characterized omics | Check if isogenic knockouts exist (Cellosaurus) |
| Drug combination | In SYNERGxDB with combo data, known single-agent responses | ZIP score available for synergy assessment |
Check for pre-made derivatives; this can save months of lab work:
cellosaurus_search_cell_lines(q="ca:<PARENT_LINE>", size=20) finds all derivatives.
If DepMap_get_gene_dependencies fails (common for some genes), use cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="<GENE>") as an alternative source of cell line mutation data.
OUTPUT: Ranked cell line table with total scores, per-criterion breakdown, and a text recommendation explaining the top pick and runner-up with biological reasoning.
| Pattern | Question Type | Key Tools (in order) |
|---------|--------------|---------------------|
| 1 | "Which cell line for [cancer] + [gene]?" | DepMap_get_cell_lines → DepMap_get_gene_dependencies → COSMIC_get_mutations_by_gene → cBioPortal_get_mutations (ccle_broad_2019) → PharmacoDB_get_experiments → rank by mutation + dependency + drug sensitivity |
| 2 | "Profile cell line X" | cellosaurus_search_cell_lines → DepMap_get_cell_line → PharmacoDB_get_cell_line → cBioPortal_get_mutations → HPA_get_comparative_expression_by_gene_and_cellline (if the line is supported) → PharmacoDB_get_experiments |
| 3 | "Which lines are sensitive to [drug]?" | DepMap_get_cell_lines (tissue filter) → PharmacoDB_get_experiments (compound) → PharmacoDB_get_biomarker_assoc → rank by AAC (higher = more sensitive) or IC50 (lower = more sensitive) |
| 4 | "Compare A vs B" | Run Pattern 2 for both in parallel → side-by-side comparison table |
| 5 | "Drug combos for [cell line]?" | SYNERGxDB_search_combos → PharmacoDB_get_experiments (single-agent baseline) → report synergistic pairs with ZIP scores |
| Cancer Type | Key Cell Lines | Common Mutations |
|-------------|---------------|-----------------|
| NSCLC | A549 (KRAS G12S), H1975 (EGFR L858R/T790M), PC-9 (EGFR del19), HCC827 (EGFR del19/amp), H460 (KRAS Q61H), H1299 (NRAS Q61K, TP53-null) | KRAS, EGFR, TP53, STK11 |
| Breast | MCF7 (ER+/PR+), MDA-MB-231 (TNBC, KRAS G13D), T-47D (ER+), BT-474 (HER2+), SK-BR-3 (HER2+) | PIK3CA, TP53, BRCA1/2 |
| Colorectal | HCT116 (KRAS G13D, MSI-H), SW480 (KRAS G12V), HT-29 (BRAF V600E), Caco-2 (APC), LoVo (KRAS G13D, MSI-H) | APC, KRAS, TP53, BRAF |
| Melanoma | A375 (BRAF V600E), SK-MEL-28 (BRAF V600E), WM266-4 (BRAF V600D), MeWo (WT BRAF) | BRAF, NRAS, TP53 |
| Pancreatic | PANC-1 (KRAS G12D), MIA PaCa-2 (KRAS G12C), AsPC-1 (KRAS G12D), Capan-1 (BRCA2 mut) | KRAS, TP53, CDKN2A, SMAD4 |
| Prostate | PC-3 (AR-negative), LNCaP (AR+, PTEN-null), DU145 (AR-negative), VCaP (AR amp, TMPRSS2-ERG) | AR, PTEN, TP53, RB1 |
| Ovarian | SKOV3 (HER2+, TP53 mut), OVCAR3 (TP53 mut), A2780 (cisplatin-sensitive), A2780cis (cisplatin-resistant) | TP53, BRCA1/2 |
| Leukemia | K562 (CML, BCR-ABL), Jurkat (T-ALL), HL-60 (AML), THP-1 (AML, monocytic) | BCR-ABL, FLT3, NPM1 |
| Glioblastoma | U251 (TP53 mut), U87MG (PTEN-null), T98G (TP53/PTEN mut), LN229 (TP53 mut, PTEN WT) | TP53, PTEN, EGFR, IDH1 |
| Liver | HepG2 (hepatoblastoma, WT TP53), Hep3B (HBV+, TP53-null), Huh7 (HCC, TP53 Y220C) | TP53, CTNNB1, AXIN1 |
Use cell line NAME as common key across databases. IDs: DepMap=SIDM, Cellosaurus=CVCL, cBioPortal=sample (e.g. A549_LUNG), PharmacoDB/SYNERGxDB=name string. When names differ ("HCT 116" vs "HCT116"), check Cellosaurus synonyms first.
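This minimal sketch shows name normalization for cross-database matching. `normalize_cell_line_name`, `ccle_sample_to_name`, and `same_cell_line` are hypothetical helpers. Stripping punctuation and case gives only a first-pass match. Confirm any match against Cellosaurus synonyms, because distinct lines can have confusingly similar names.

```python
import re


def normalize_cell_line_name(name: str) -> str:
    """Uppercase and drop non-alphanumerics: 'HCT 116' -> 'HCT116', 'NCI-H1975' -> 'NCIH1975'."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def ccle_sample_to_name(sample_id: str) -> str:
    """CCLE sample IDs look like 'A549_LUNG'; keep the name before the first underscore."""
    return sample_id.split("_", 1)[0]


def same_cell_line(a: str, b: str) -> bool:
    """First-pass equality check after normalization (verify with Cellosaurus synonyms)."""
    return normalize_cell_line_name(a) == normalize_cell_line_name(b)
```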
Mutation-based filtering: cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS") → filter by amino_acid_change → extract cell line names → query other databases.
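The filtering step might look like the sketch below. It assumes each mutation record is a dict with the sample identifier under a hypothetical `sample_id` key and the protein change under `amino_acid_change`; adjust both keys to match the actual response. `gene_list_arg` builds the comma-separated string that `gene_list` requires.

```python
def gene_list_arg(genes) -> str:
    """cBioPortal_get_mutations expects gene_list as one comma-separated string."""
    return ",".join(g.strip().upper() for g in genes)


def _strip_p(change: str) -> str:
    # Accept both 'p.G12D' and 'G12D' notation.
    change = change.strip().upper()
    return change[2:] if change.startswith("P.") else change


def lines_with_mutation(records, aa_change, sample_key="sample_id", aa_key="amino_acid_change"):
    """Return sorted CCLE-style cell line names (e.g. 'PANC1') carrying an exact amino-acid change."""
    target = _strip_p(aa_change)
    hits = set()
    for rec in records:
        if _strip_p(str(rec.get(aa_key) or "")) == target:
            # CCLE sample IDs look like 'PANC1_PANCREAS'; keep the part before the first underscore.
            hits.add(str(rec[sample_key]).split("_", 1)[0])
    return sorted(hits)
```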
| Issue | Resolution |
|-------|-----------|
| DepMap returns no results for cell line name | Try alternative names: check Cellosaurus synonyms first |
| cBioPortal CCLE study ID unknown | Use ccle_broad_2019 as default CCLE study |
| PharmacoDB cell line name mismatch | Use PharmacoDB_search(operation="search", query="<name>") to find the canonical name |
| HPA cell line not supported | Only 10 lines supported (hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa). Skip HPA for other lines |
| CLUE requires API key | Skip CLUE tools if CLUE_API_KEY not set; note in report |
| Gene symbol not found in DepMap | Use DepMap_search_genes(query="<symbol>") to check aliases |
| Cellosaurus accession pattern | Must be CVCL_XXXX format; search first if you only have a name |
| SYNERGxDB no results for drug combo | Drug may not be in database; SYNERGxDB covers cytotoxic agents, not most targeted therapies |
Before finalizing the report, verify: