scientific-skills/Data Analysis/crispr-screen-analyzer/SKILL.md
Process CRISPR screening data to identify essential genes and hit candidates. Performs quality control, statistical analysis (RRA), and hit calling for pooled CRISPR screens including viability screens and drug resistance/sensitivity studies.
npx skillsauth add aipoch/medical-research-skills crispr-screen-analyzer
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyze pooled CRISPR screening data to identify essential genes, drug resistance/sensitivity candidates, and screen quality metrics. Supports Robust Rank Aggregation (RRA) analysis, quality control assessment, and hit identification for functional genomics studies.
Key Capabilities:
scripts/main.py.
references/ for task-specific guidance.
See ## Prerequisites above for related details.
Python: 3.10+. Repository baseline for current packaged skills.
numpy: unspecified. Declared in requirements.txt.
pandas: unspecified. Declared in requirements.txt.
scipy: unspecified. Declared in requirements.txt.
See ## Usage above for related details.
cd "20260318/scientific-skills/Data Analytics/crispr-screen-analyzer"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
CONFIG block or documented parameters if the script uses fixed settings.
python scripts/main.py with the validated inputs.
See ## Workflow above for related details.
scripts/main.py.
references/ contains supporting rules, prompts, or checklists.
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
python -m py_compile scripts/main.py
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
python scripts/main.py --help
Upstream Skills:
crispr-grna-designer: Design sgRNA libraries before screening; validate library composition
fastqc-report-interpreter: Assess sequencing quality before CRISPR screen analysis
alignment-quality-checker: Verify sgRNA alignment rates and mapping quality
Downstream Skills:
go-kegg-enrichment: Perform pathway enrichment on identified hit genes
pathway-visualization: Visualize hits in pathway contexts
hit-validation-planner: Design follow-up experiments for candidate genes
gene-essentiality-predictor: Compare screen results with known essential gene databases
Complete Workflow:
Library Design (crispr-grna-designer) → Transduction → Sequencing → fastqc-report-interpreter → crispr-screen-analyzer → go-kegg-enrichment → Hit Validation
Assess CRISPR screen quality using established metrics including Gini index, read depth, and sgRNA dropout rates.
from scripts.main import CRISPRScreenAnalyzer
# Initialize analyzer with count matrix and sample annotations
analyzer = CRISPRScreenAnalyzer(
    counts_file="sgrna_counts.txt",
    samplesheet="samples.csv"
)
# Calculate QC metrics
qc_results = analyzer.qc_metrics()
# Review key metrics
print("Quality Control Metrics:")
print("Total reads per sample:")
for sample, reads in qc_results['total_reads'].items():
    print(f"  {sample}: {reads:,} reads")
print("\nGini index (library representation):")
for sample, gini in qc_results['gini_index'].items():
    status = "✅ Good" if gini < 0.3 else "⚠️ Check" if gini < 0.4 else "❌ Poor"
    print(f"  {sample}: {gini:.3f} {status}")
print("\nZero-count sgRNAs (potential dropout):")
for sample, zeros in qc_results['zero_count_sgrnas'].items():
    # Express dropout as a percentage of all sgRNAs in the count matrix
    pct = (zeros / len(analyzer.counts)) * 100
    print(f"  {sample}: {zeros} ({pct:.1f}%)")
QC Metrics Explained:
| Metric | Target Range | Interpretation |
|--------|--------------|----------------|
| Gini Index | <0.3 | Measures library evenness; lower = more uniform |
| Total Reads | >10M per sample | Sufficient depth for statistical power |
| Zero-count sgRNAs | <5% | Acceptable dropout; higher indicates library loss |
| Read Distribution | Log-normal | Should follow expected distribution |
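As a rough illustration of the Gini index and zero-count metrics in the table above, here is a minimal sketch. It is not taken from scripts/main.py: the function names `gini_index` and `zero_count_fraction` are hypothetical, and computing the Gini index on raw counts is an assumption (some pipelines compute it on log-transformed counts instead, which changes the scale the thresholds apply to).

```python
import numpy as np


def gini_index(counts):
    """Gini coefficient of non-negative sgRNA counts for one sample.

    0 means every sgRNA has the same count; values near 1 mean a few
    sgRNAs hold almost all reads. Assumption: raw (untransformed) counts.
    """
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard closed form for sorted data
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def zero_count_fraction(counts):
    """Fraction of sgRNAs with zero reads in one sample."""
    x = np.asarray(counts)
    return float(np.mean(x == 0))


# Toy data (made up for illustration)
even_library = [5, 5, 5, 5]
skewed_library = [0, 0, 0, 10]
print(gini_index(even_library), gini_index(skewed_library))
print(zero_count_fraction(skewed_library))
```

Applied per column of a count matrix, these two numbers can be compared against the <0.3 and <5% targets listed in the table.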
Best Practices:
Common Issues and Solutions:
Issue: High Gini index (>0.4)
Issue: Excessive zero-count sgRNAs (>10%)
Calculate log2 fold changes between treatment and control conditions to identify enriched or depleted sgRNAs.
from scripts.main import CRISPRScreenAnalyzer
analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
# Define sample groups
control_samples = ["Control_1", "Control_2", "Control_3"]
treatment_samples = ["Drug_1", "Drug_2", "Drug_3"]
# Calculate log fold changes
lfc = analyzer.calculate_lfc(control_samples, treatment_samples)
# Analyze distribution
print("Log Fold Change Statistics:")
print(f" Mean: {lfc.mean():.3f}")
print(f" Std: {lfc.std():.3f}")
print(f" Max: {lfc.max():.3f}")
print(f" Min: {lfc.min():.3f}")
# Identify extreme changes
strong_depletion = lfc[lfc < -2] # Strong negative selection
strong_enrichment = lfc[lfc > 2] # Strong positive selection
print(f"\nStrongly depleted sgRNAs: {len(strong_depletion)}")
print(f"Strongly enriched sgRNAs: {len(strong_enrichment)}")
LFC Calculation:
lfc = log2((treatment_mean + 1) / (control_mean + 1))
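The formula above can be sketched with pandas as follows. This is an illustrative stand-in, not the body of `calculate_lfc`: the function name `log2_fold_change`, the `pseudocount` parameter, and the toy counts are assumptions, and the sketch assumes the counts have already been normalized for sequencing depth.

```python
import numpy as np
import pandas as pd


def log2_fold_change(counts, control_samples, treatment_samples, pseudocount=1.0):
    """Per-sgRNA log2((treatment_mean + 1) / (control_mean + 1)).

    `counts` is a DataFrame with sgRNAs as rows and samples as columns.
    Assumption: columns are already depth-normalized.
    """
    control_mean = counts[control_samples].mean(axis=1)
    treatment_mean = counts[treatment_samples].mean(axis=1)
    return np.log2((treatment_mean + pseudocount) / (control_mean + pseudocount))


# Toy count matrix (made up for illustration)
counts = pd.DataFrame(
    {
        "Ctrl_1": [100, 7, 50],
        "Ctrl_2": [100, 7, 50],
        "Treat_1": [24, 31, 50],
        "Treat_2": [26, 31, 50],
    },
    index=["sg_A", "sg_B", "sg_C"],
)
lfc = log2_fold_change(counts, ["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
print(lfc)  # sg_A is depleted, sg_B is enriched, sg_C is unchanged
```

The pseudocount of 1 keeps the ratio finite when an sgRNA has zero reads in either group, at the cost of shrinking fold changes for low-count sgRNAs.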
Interpretation:
| LFC Range | Interpretation | Biological Meaning |
|-----------|---------------|-------------------|
| LFC < -2 | Strong depletion | Essential gene or drug sensitivity |
| LFC -2 to -1 | Moderate depletion | Moderate effect |
| LFC -1 to 1 | No change | No significant effect |
| LFC 1 to 2 | Moderate enrichment | Moderate resistance |
| LFC > 2 | Strong enrichment | Resistance gene or suppressor |
Best Practices:
Common Issues and Solutions:
Issue: Skewed LFC distribution
Issue: Extreme outliers
Perform statistical analysis to identify significantly enriched or depleted sgRNAs using z-score and FDR correction.
from scripts.main import CRISPRScreenAnalyzer
analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
# Calculate LFC first
lfc = analyzer.calculate_lfc(
control_samples=["Ctrl_1", "Ctrl_2"],
treatment_samples=["Treat_1", "Treat_2"]
)
# Perform RRA analysis
results = analyzer.rra_analysis(lfc, fdr_threshold=0.05)
# Review top hits
print("Top 10 Most Significant sgRNAs:")
top_hits = results.nsmallest(10, 'fdr')
print(top_hits[['sgrna', 'lfc', 'pvalue', 'fdr']].to_string(index=False))
# Summary statistics
print(f"\nTotal sgRNAs tested: {len(results)}")
print(f"Significant at FDR < 0.05: {sum(results['fdr'] < 0.05)}")
print(f"Significant depletions: {sum((results['fdr'] < 0.05) & (results['lfc'] < 0))}")
print(f"Significant enrichments: {sum((results['fdr'] < 0.05) & (results['lfc'] > 0))}")
RRA Analysis Steps:
z = (lfc - mean) / std
Statistical Output:
| Column | Description | Usage |
|--------|-------------|-------|
| sgrna | sgRNA identifier | Mapping to genes |
| lfc | Log fold change | Effect size |
| pvalue | Raw p-value | Statistical significance |
| fdr | Adjusted p-value (FDR) | Multiple testing correction |
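The z-score and FDR steps described in this section can be sketched as below. This follows the z-score description given here, not a rank-based aggregation, and it is not the code in scripts/main.py. The function names `benjamini_hochberg` and `zscore_fdr`, the two-sided normal p-value, and the choice of Benjamini-Hochberg as the FDR method are assumptions; the source only says "FDR correction".

```python
import numpy as np
import pandas as pd
from scipy import stats


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def zscore_fdr(lfc):
    """Build a results table with the columns listed above.

    Assumption: most sgRNAs have no effect, so the LFC distribution
    itself is used as the null (z = (lfc - mean) / std).
    """
    z = ((lfc - lfc.mean()) / lfc.std()).to_numpy()
    pvalue = 2.0 * stats.norm.sf(np.abs(z))  # two-sided
    return pd.DataFrame(
        {
            "sgrna": lfc.index.to_numpy(),
            "lfc": lfc.to_numpy(),
            "pvalue": pvalue,
            "fdr": benjamini_hochberg(pvalue),
        }
    )
```

Because the mean and standard deviation are estimated from all sgRNAs, a screen with many true hits inflates the standard deviation and makes this test less sensitive; estimating the null from non-targeting controls is one common alternative.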
Best Practices:
Common Issues and Solutions:
Issue: No significant hits despite visible effects
Issue: Too many significant hits
Apply statistical and biological thresholds to identify candidate genes for follow-up validation.
from scripts.main import CRISPRScreenAnalyzer
analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
lfc = analyzer.calculate_lfc(["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
results = analyzer.rra_analysis(lfc)
# Identify hits with multiple thresholds
threshold_configs = [
    {"fdr": 0.05, "lfc": 1.0, "name": "Standard"},
    {"fdr": 0.01, "lfc": 1.5, "name": "Stringent"},
    {"fdr": 0.1, "lfc": 0.5, "name": "Permissive"}
]
for config in threshold_configs:
    hits = analyzer.identify_hits(
        results,
        fdr_threshold=config['fdr'],
        lfc_threshold=config['lfc']
    )
    # Split hits by direction of effect
    depletions = hits[hits['lfc'] < 0]
    enrichments = hits[hits['lfc'] > 0]
    print(f"\n{config['name']} (FDR<{config['fdr']}, |LFC|>{config['lfc']}):")
    print(f"  Total hits: {len(hits)}")
    print(f"  Depletions: {len(depletions)}")
    print(f"  Enrichments: {len(enrichments)}")
# Save hits for downstream analysis
standard_hits = analyzer.identify_hits(results, fdr_threshold=0.05, lfc_threshold=1.0)
standard_hits.to_csv("hits_standard.csv", index=False)
Hit Classification:
| Category | Criteria | Biological Interpretation |
|----------|----------|---------------------------|
| Essential | FDR<0.05, LFC<-1 | Required for cell viability |
| Drug Sensitive | FDR<0.05, LFC<-1 | Synthetic lethal with treatment |
| Drug Resistant | FDR<0.05, LFC>1 | Confers resistance to treatment |
| Suppressor | FDR<0.05, LFC>1 | Suppresses phenotype of interest |
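To make the criteria in the table concrete, here is a minimal sketch of threshold-based hit calling. It is not the implementation of `identify_hits`; the function name `call_hits` and the `direction` column are hypothetical. The numbers only say whether an sgRNA is depleted or enriched: whether a depleted hit is read as "essential" or "drug sensitive" depends on the screen design, as the table shows.

```python
import pandas as pd


def call_hits(results, fdr_threshold=0.05, lfc_threshold=1.0):
    """Keep rows passing both thresholds and label the direction of effect.

    `results` needs 'lfc' and 'fdr' columns, as in the statistical output.
    """
    significant = results["fdr"] < fdr_threshold
    out = results.copy()
    out["direction"] = "not_significant"
    out.loc[significant & (out["lfc"] < -lfc_threshold), "direction"] = "depleted"
    out.loc[significant & (out["lfc"] > lfc_threshold), "direction"] = "enriched"
    return out[out["direction"] != "not_significant"]


# Toy results (made up for illustration)
results = pd.DataFrame(
    {
        "sgrna": ["sg_1", "sg_2", "sg_3", "sg_4"],
        "lfc": [-2.0, 2.5, -3.0, 0.5],
        "fdr": [0.01, 0.001, 0.2, 0.01],
    }
)
print(call_hits(results))  # sg_3 fails the FDR cut, sg_4 fails the LFC cut
```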
Best Practices:
Common Issues and Solutions:
Issue: Single sgRNA hits
Issue: Off-target effects dominating
Aggregate sgRNA-level results to gene-level statistics for biological interpretation.
import pandas as pd
from scripts.main import CRISPRScreenAnalyzer
analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
lfc = analyzer.calculate_lfc(["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
results = analyzer.rra_analysis(lfc)
# Add gene annotations (example mapping)
sgrna_to_gene = pd.read_csv("library_annotation.csv") # sgRNA, Gene columns
results_with_gene = results.merge(sgrna_to_gene, on='sgrna')
# Aggregate to gene level
gene_results = results_with_gene.groupby('Gene').agg({
    'lfc': 'mean',     # Average LFC across sgRNAs
    'pvalue': 'min',   # Best p-value
    'fdr': 'min',      # Best FDR
    'sgrna': 'count'   # Number of sgRNAs
}).rename(columns={'sgrna': 'sgrna_count'})
# Filter genes with multiple sgRNAs
gene_results = gene_results[gene_results['sgrna_count'] >= 2]
# Identify gene-level hits
gene_hits = gene_results[
    (gene_results['fdr'] < 0.05) &
    (abs(gene_results['lfc']) > 1.0)
]
print(f"Gene-level hits: {len(gene_hits)}")
print("\nTop 10 hits:")
print(gene_hits.nsmallest(10, 'fdr')[['lfc', 'pvalue', 'fdr', 'sgrna_count']])
Gene Aggregation Methods:
| Method | Description | Best For |
|--------|-------------|----------|
| Mean LFC | Average across sgRNAs | General hit calling |
| Best FDR | Most significant sgRNA | Maximizing sensitivity; prone to single-sgRNA outliers |
| Second-best | Second most significant | Reduces outlier effects |
| STARS/RRA | Rank-based aggregation | Standard CRISPR analysis |
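The "Mean LFC", "Best" and "Second-best" rows of the table can be compared side by side with a small groupby sketch. The function name `aggregate_genes`, the output column names, and the toy data are assumptions for illustration; the rank-based STARS/RRA row is not covered here.

```python
import pandas as pd


def aggregate_genes(sgrna_results, min_sgrnas=2):
    """Summarize sgRNA-level rows per gene.

    Expects columns 'Gene', 'sgrna', 'lfc' and 'pvalue'.
    """

    def second_best(pvalues):
        # Second-smallest p-value; falls back to the only one if n == 1
        return pvalues.nsmallest(2).iloc[-1]

    grouped = sgrna_results.groupby("Gene")
    gene = pd.DataFrame(
        {
            "mean_lfc": grouped["lfc"].mean(),
            "best_pvalue": grouped["pvalue"].min(),
            "second_best_pvalue": grouped["pvalue"].apply(second_best),
            "sgrna_count": grouped["sgrna"].count(),
        }
    )
    return gene[gene["sgrna_count"] >= min_sgrnas]


# Toy sgRNA-level results (made up for illustration)
sgrna_results = pd.DataFrame(
    {
        "Gene": ["GENE1", "GENE1", "GENE1", "GENE2"],
        "sgrna": ["g1_a", "g1_b", "g1_c", "g2_a"],
        "lfc": [-3.0, -2.0, -0.1, -4.0],
        "pvalue": [0.001, 0.02, 0.5, 0.0001],
    }
)
print(aggregate_genes(sgrna_results))  # GENE2 is dropped: only one sgRNA
```

Requiring the second-best sgRNA to be significant means a gene cannot become a hit on the strength of one guide alone, which is the outlier protection the table refers to.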
Best Practices:
Common Issues and Solutions:
Issue: Discordant sgRNAs for same gene
Compare CRISPR screen results across multiple treatment conditions or time points.
from scripts.main import CRISPRScreenAnalyzer
analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
# Define multiple comparisons
comparisons = {
    "Drug_A": {
        "control": ["DMSO_1", "DMSO_2"],
        "treatment": ["DrugA_1", "DrugA_2"]
    },
    "Drug_B": {
        "control": ["DMSO_1", "DMSO_2"],
        "treatment": ["DrugB_1", "DrugB_2"]
    },
    "Combination": {
        "control": ["DMSO_1", "DMSO_2"],
        "treatment": ["Combo_1", "Combo_2"]
    }
}
# Analyze all conditions
all_results = {}
for comp_name, samples in comparisons.items():
    lfc = analyzer.calculate_lfc(samples['control'], samples['treatment'])
    results = analyzer.rra_analysis(lfc)
    hits = analyzer.identify_hits(results)
    all_results[comp_name] = {
        'lfc': lfc,
        'results': results,
        'hits': hits
    }
    print(f"{comp_name}: {len(hits)} hits")
# Find common hits across conditions
# Compare by sgRNA identifier rather than row index, which may not be stable across filtered tables
common_hits = set(all_results['Drug_A']['hits']['sgrna'])
for comp in ['Drug_B', 'Combination']:
    common_hits &= set(all_results[comp]['hits']['sgrna'])
print(f"\nCommon hits across all conditions: {len(common_hits)}")
# Compare LFC correlations between conditions
lfc_drugA = all_results['Drug_A']['lfc']
lfc_drugB = all_results['Drug_B']['lfc']
correlation = lfc_drugA.corr(lfc_drugB)
print(f"\nCorrelation between Drug A and Drug B: {correlation:.3f}")
Multi-Condition Analysis:
| Comparison Type | Question Addressed | Interpretation |
|----------------|-------------------|----------------|
| Drug vs Control | What genes mediate drug response? | Resistance/sensitivity mechanisms |
| Condition A vs B | Differential genetic dependencies | Context-specific essentiality |
| Time-course | How does genetic dependency change? | Temporal dynamics |
| Cell line comparison | Cell-type specific dependencies | Lineage-specific vulnerabilities |
Best Practices:
Common Issues and Solutions:
Issue: High variability between replicates
From count matrix to hit identification:
# Step 1: Run QC assessment
python scripts/main.py --counts sgrna_counts.txt --samples samples.csv --output qc_results
# Step 2: Perform differential analysis
python scripts/main.py \
--counts sgrna_counts.txt \
--samples samples.csv \
--control "Ctrl_1,Ctrl_2,Ctrl_3" \
--treatment "Drug_1,Drug_2,Drug_3" \
--output drug_screen \
--fdr 0.05
# Step 3: Review results
head -20 drug_screen_sgrna_results.csv
Python API Usage:
from scripts.main import CRISPRScreenAnalyzer
import pandas as pd
def analyze_crispr_screen(
    counts_file: str,
    samplesheet: str,
    control_samples: list,
    treatment_samples: list,
    output_prefix: str,
    fdr_threshold: float = 0.05,
    lfc_threshold: float = 1.0
) -> dict:
    """
    Complete CRISPR screen analysis workflow.
    """
    # Initialize analyzer
    analyzer = CRISPRScreenAnalyzer(counts_file, samplesheet)
    print(f"Loaded {analyzer.counts.shape[0]} sgRNAs x {analyzer.counts.shape[1]} samples")
    # Quality control
    print("\n1. Quality Control Assessment...")
    qc = analyzer.qc_metrics()
    # Check QC status
    qc_pass = all(gini < 0.4 for gini in qc['gini_index'].values())
    if not qc_pass:
        print("⚠️ Warning: High Gini index detected - check library representation")
    # Calculate fold changes
    print("\n2. Calculating log fold changes...")
    lfc = analyzer.calculate_lfc(control_samples, treatment_samples)
    # Statistical analysis
    print("\n3. Running RRA analysis...")
    results = analyzer.rra_analysis(lfc, fdr_threshold)
    # Identify hits
    print("\n4. Identifying significant hits...")
    hits = analyzer.identify_hits(results, fdr_threshold, lfc_threshold)
    # Categorize hits
    depletions = hits[hits['lfc'] < 0]
    enrichments = hits[hits['lfc'] > 0]
    # Save results
    results.to_csv(f"{output_prefix}_sgrna_results.csv", index=False)
    hits.to_csv(f"{output_prefix}_hits.csv", index=False)
    # Compile summary
    summary = {
        'total_sgrnas': len(results),
        'significant_hits': len(hits),
        'depletions': len(depletions),
        'enrichments': len(enrichments),
        'qc_metrics': qc,
        'output_files': {
            'full_results': f"{output_prefix}_sgrna_results.csv",
            'hits': f"{output_prefix}_hits.csv"
        }
    }
    # Print summary
    print(f"\n{'='*60}")
    print("ANALYSIS SUMMARY")
    print(f"{'='*60}")
    print(f"Total sgRNAs: {summary['total_sgrnas']}")
    print(f"Significant hits (FDR<{fdr_threshold}, |LFC|>{lfc_threshold}): {summary['significant_hits']}")
    print(f"  - Depletions: {summary['depletions']}")
    print(f"  - Enrichments: {summary['enrichments']}")
    print("\nResults saved:")
    print(f"  - {summary['output_files']['full_results']}")
    print(f"  - {summary['output_files']['hits']}")
    print(f"{'='*60}")
    return summary
# Execute workflow
results = analyze_crispr_screen(
    counts_file="sgrna_counts.txt",
    samplesheet="samples.csv",
    control_samples=["Ctrl_1", "Ctrl_2", "Ctrl_3"],
    treatment_samples=["Drug_1", "Drug_2", "Drug_3"],
    output_prefix="drug_resistance_screen",
    fdr_threshold=0.05,
    lfc_threshold=1.0
)
Expected Output Files:
analysis_results/
├── drug_resistance_screen_sgrna_results.csv # All sgRNA statistics
├── drug_resistance_screen_hits.csv # Significant hits only
└── qc_report.txt # Quality control summary
Scenario: Identify genes essential for cell survival by comparing T0 (transduction) vs T14 (14 days post-transduction).
{
"screen_type": "viability",
"comparison": "T14_vs_T0",
"expected_depletions": "Essential genes (ribosomal, splicing, etc.)",
"expected_enrichments": "None (unless suppressors of toxicity)",
"positive_controls": ["RPL30", "RPS19", "PCNA"],
"negative_controls": ["LacZ", "NTC"],
"analysis_parameters": {
"fdr_threshold": 0.05,
"lfc_threshold": 1.0,
"gene_aggregation": "mean"
}
}
Workflow:
Output Example:
Essential Gene Screen Results:
Total sgRNAs tested: 65,383
Significantly depleted: 3,847 sgRNAs (FDR<0.05, LFC<-1)
Top Essential Genes:
RPL30: mean LFC = -4.2, 5/5 sgRNAs significant
RPS19: mean LFC = -3.8, 4/5 sgRNAs significant
PCNA: mean LFC = -3.5, 5/5 sgRNAs significant
QC Metrics:
Gini index: 0.25 (excellent library representation)
Read depth: 25M per sample (sufficient)
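A viability screen like this one is usually sanity-checked against the positive and negative controls named in the configuration. The sketch below shows one way such a check might look; the function name `check_controls`, the returned keys, and the LacZ value are hypothetical, while the three positive-control LFCs reuse the numbers from the output example above.

```python
import pandas as pd


def check_controls(gene_lfc, positive_controls, negative_controls, lfc_threshold=1.0):
    """Report whether control genes behave as expected.

    `gene_lfc` is a Series of gene-level LFCs indexed by gene name.
    Positive controls should be depleted (LFC < -threshold);
    negative controls should stay within +/- threshold.
    """
    positives = gene_lfc.reindex(positive_controls).dropna()
    negatives = gene_lfc.reindex(negative_controls).dropna()
    return {
        "positive_depleted": {g: bool(v < -lfc_threshold) for g, v in positives.items()},
        "negative_neutral": {g: bool(abs(v) < lfc_threshold) for g, v in negatives.items()},
        "missing": [
            g for g in list(positive_controls) + list(negative_controls)
            if g not in gene_lfc.index
        ],
    }


gene_lfc = pd.Series({"RPL30": -4.2, "RPS19": -3.8, "PCNA": -3.5, "LacZ": 0.1})
report = check_controls(
    gene_lfc,
    positive_controls=["RPL30", "RPS19", "PCNA"],
    negative_controls=["LacZ", "NTC"],
)
print(report)
```

Controls reported under `missing` are worth tracing back to the library annotation before the hit list is interpreted.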
Scenario: Identify genes whose knockout confers resistance to a cytotoxic drug (e.g., vemurafenib in BRAF-mutant melanoma).
{
"screen_type": "drug_resistance",
"treatment": "vemurafenib (2 μM)",
"control": "DMSO",
"duration": "14 days",
"expected_depletions": "Drug sensitizers, synthetic lethal",
"expected_enrichments": "Drug resistance genes",
"known_resistance_genes": ["NRAS", "MAP2K1", "MED12"],
"analysis_parameters": {
"fdr_threshold": 0.05,
"lfc_threshold": 1.0,
"focus": "enrichments"
}
}
Workflow:
Output Example:
Drug Resistance Screen Results (Vemurafenib):
Significant enrichments: 156 sgRNAs (FDR<0.05, LFC>1)
Top Resistance Genes:
NRAS: mean LFC = +2.8, 4/5 sgRNAs enriched
MAP2K1: mean LFC = +2.5, 5/5 sgRNAs enriched
MED12: mean LFC = +2.1, 3/5 sgRNAs enriched
Validation recommended:
- Test individual sgRNAs in dose-response assay
- Confirm resistance phenotype with cell viability assay
- Check for known resistance mechanisms
Scenario: Identify genes that, when knocked out, sensitize cells to drug treatment (synthetic lethal interactions).
{
"screen_type": "drug_sensitivity",
"treatment": "PARP inhibitor (olaparib)",
"control": "DMSO",
"cell_line": "BRCA1-mutant ovarian cancer",
"expected_depletions": "DNA repair genes (synthetic lethal)",
"expected_enrichments": "Drug resistance mechanisms",
"known_synthetic_lethal": ["PARP1", "BRCA2", "PALB2"],
"analysis_parameters": {
"fdr_threshold": 0.05,
"lfc_threshold": 1.0,
"focus": "depletions"
}
}
Workflow:
Output Example:
Synthetic Lethality Screen (Olaparib in BRCA1-mutant):
Significant depletions: 234 sgRNAs (FDR<0.05, LFC<-1)
Top Synthetic Lethal Hits:
BRCA2: mean LFC = -3.2, 5/5 sgRNAs depleted
PALB2: mean LFC = -2.8, 4/5 sgRNAs depleted
RAD51C: mean LFC = -2.5, 5/5 sgRNAs depleted
Biological Interpretation:
- Strong enrichment of homologous recombination genes
- Consistent with known synthetic lethal interactions
- Potential combination therapy targets identified
Scenario: Compare genetic dependencies between two cell lines to identify lineage-specific vulnerabilities.
{
"screen_type": "comparative",
"comparison": "Melanoma_vs_Lung_cancer",
"cell_lines": ["A375", "SKMEL28", "A549", "H1299"],
"analysis_type": "differential_essentiality",
"expected_lineage_specific": {
"melanoma": ["MITF", "SOX10", "TYR"],
"lung": ["NKX2-1", "TP63"]
},
"analysis_parameters": {
"fdr_threshold": 0.05,
"lfc_threshold": 1.0,
"replicate_requirement": 2
}
}
Workflow:
Output Example:
Comparative Screen: Melanoma vs Lung Cancer
Melanoma-specific essential: 127 genes
Lung-specific essential: 203 genes
Common essential: 1,847 genes
Top Melanoma-Specific Dependencies:
MITF: LFC diff = -4.5 (essential in melanoma, not lung)
SOX10: LFC diff = -3.8
TYR: LFC diff = -3.2
Top Lung-Specific Dependencies:
NKX2-1: LFC diff = -3.9
TP63: LFC diff = -3.1
Therapeutic Implications:
- Lineage-specific targets identified
- Potential for tumor-type selective therapy
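The "LFC diff" values in the example output are not defined in this document. One plausible reading, sketched below under that assumption, is the mean gene-level LFC in one lineage minus the mean in the other. The function name `differential_lfc` and the per-cell-line numbers are made up for illustration; only the cell line and gene names come from the configuration above.

```python
import pandas as pd


def differential_lfc(gene_lfc, group_a, group_b):
    """Mean LFC across group_a cell lines minus mean LFC across group_b.

    `gene_lfc` has genes as rows and cell lines as columns. Negative
    values mean the gene is more depleted (more essential) in group_a.
    """
    return gene_lfc[group_a].mean(axis=1) - gene_lfc[group_b].mean(axis=1)


# Toy gene-level LFCs per cell line (made up for illustration)
gene_lfc = pd.DataFrame(
    {
        "A375": [-4.6, -0.2, -3.0],
        "SKMEL28": [-4.4, 0.0, -3.2],
        "A549": [-0.1, -3.8, -3.1],
        "H1299": [0.1, -4.0, -2.9],
    },
    index=["MITF", "NKX2-1", "RPL30"],
)
diff = differential_lfc(gene_lfc, ["A375", "SKMEL28"], ["A549", "H1299"])
print(diff.sort_values())
```

A gene such as RPL30 that is strongly depleted in both lineages ends up with a difference near zero, which is how common essential genes separate from lineage-specific ones in this kind of comparison.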
Pre-Analysis Checks:
During Analysis:
Post-Analysis Verification:
Before Validation or Publication:
Experimental Design Issues:
❌ Insufficient sequencing depth → Poor statistical power, missed hits
❌ Library bottleneck → Gini index >0.4, skewed representation
❌ Inadequate replicates → High variance, irreproducible results
❌ Wrong time point → Too early (no selection) or too late (extensive dropout)
Analysis Issues:
❌ Ignoring QC metrics → Analyzing poor quality data
❌ Incorrect sample assignment → Control/treatment mix-up
❌ Single sgRNA hits → Potential off-target effects
❌ Over-reliance on p-values → Many false positives with large library
Interpretation Issues:
❌ Ignoring cell number effects → Different growth rates confound results
❌ Off-target effects dominating → False positive hits
❌ Pan-essential vs selective → Misclassifying broadly essential genes
❌ Not validating hits → Publishing false positives
Technical Issues:
❌ Batch effects → Confounding by library prep or sequencing batch
❌ Contamination → Cross-sample contamination affects quantification
❌ Reference genome mismatch → sgRNAs not mapping correctly
❌ Incomplete annotation → sgRNAs missing gene mapping
Problem: No significant hits despite strong biological effect
Problem: Too many significant hits (1000s)
Problem: High Gini index (>0.4)
Problem: Known essential genes not identified
Problem: Discordant sgRNAs for same gene
Problem: Batch effects between replicates
Problem: Negative controls showing significant effects
Available in references/ directory:
External Resources:
Located in scripts/ directory:
main.py - CRISPR screen analysis engine with QC, RRA, and hit identification
| Screen Type | Comparison | Expected Hits | Typical Duration |
|-------------|-----------|---------------|------------------|
| Viability | T14 vs T0 | Essential genes depleted | 10-14 days |
| Drug Resistance | Drug vs DMSO | Resistance genes enriched | 14-21 days |
| Drug Sensitivity | Drug vs DMSO | Sensitizers depleted | 14-21 days |
| Comparative | Cell A vs Cell B | Lineage-specific dependencies | 10-14 days |
| Sensitizer | Drug A+B vs Drug A | Combination targets | 10-14 days |
| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| --counts, -c | string | - | Yes | sgRNA count matrix file |
| --samples, -s | string | - | Yes | Sample annotation file |
| --control | string | - | No | Control samples (comma-separated) |
| --treatment, -t | string | - | No | Treatment samples (comma-separated) |
| --output, -o | string | - | No | Output directory |
| --fdr | float | 0.05 | No | FDR threshold |
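For readers who want to see how the parameter table maps onto a command-line interface, here is a minimal argparse sketch that mirrors it. This is an assumption about how such a parser could be written, not a copy of scripts/main.py; `build_parser` and `split_samples` are hypothetical helper names.

```python
import argparse


def build_parser():
    """Parser mirroring the parameter table (flags, defaults, required)."""
    parser = argparse.ArgumentParser(description="Analyze pooled CRISPR screen counts")
    parser.add_argument("--counts", "-c", required=True, help="sgRNA count matrix file")
    parser.add_argument("--samples", "-s", required=True, help="Sample annotation file")
    parser.add_argument("--control", help="Control samples (comma-separated)")
    parser.add_argument("--treatment", "-t", help="Treatment samples (comma-separated)")
    parser.add_argument("--output", "-o", help="Output directory")
    parser.add_argument("--fdr", type=float, default=0.05, help="FDR threshold")
    return parser


def split_samples(value):
    """Turn 'Ctrl1, Ctrl2' into ['Ctrl1', 'Ctrl2']; None or '' gives []."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


# Parse an explicit argument list so the sketch runs without a shell
args = build_parser().parse_args(
    ["--counts", "counts.txt", "--samples", "samples.csv",
     "--control", "Ctrl1,Ctrl2", "--fdr", "0.01"]
)
print(args.fdr, split_samples(args.control), split_samples(args.treatment))
```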
# Analyze CRISPR screen data
python scripts/main.py --counts sgrna_counts.txt --samples samplesheet.csv
# With specific control and treatment
python scripts/main.py --counts counts.txt --samples samples.csv --control "Ctrl1,Ctrl2" --treatment "Treat1,Treat2"
# Custom FDR threshold
python scripts/main.py --counts counts.txt --samples samples.csv --fdr 0.01 --output ./results
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python script executed locally | Low |
| Network Access | No external API calls | Low |
| File System Access | Read count files, write results | Low |
| Data Exposure | Processes genomic screening data | Medium |
| PHI Risk | May contain cell line genetic info | Low |
# Python 3.7+
numpy
pandas
scipy
Last Updated: 2026-02-09
Skill ID: 183
Version: 2.0 (K-Dense Standard)
Every final response should make these items explicit when they are relevant:
If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
This skill accepts requests that match the documented purpose of crispr-screen-analyzer and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
crispr-screen-analyzer only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.