machine-learning/prediction-explanation/SKILL.md
Explains machine learning predictions on omics data using SHAP values and LIME for feature attribution. Identifies which genes or features drive classifier decisions. Use when interpreting biomarker classifiers or understanding model predictions.
npx skillsauth add GPTomics/bioSkills bio-machine-learning-prediction-explanation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pandas 2.2+, scikit-learn 1.4+
Before using code patterns, verify installed versions match. If versions differ:
- Run `pip show <package>`, then `help(module.function)` to check signatures.

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
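The same check can be done from inside Python. The sketch below is one possible way to do it, using only the standard library; `describe_callable` is a hypothetical helper name introduced here for illustration, not part of any package mentioned in this skill. It is demonstrated on `json.dumps` so that it runs without third-party installs.

```python
import importlib
import importlib.metadata
import inspect


def describe_callable(package, module_name, attr_path):
    """Return (installed_version, signature_text) for a dotted attribute.

    `package` is the distribution name as pip knows it. The version is None
    when no distribution metadata is found (e.g. standard-library modules).
    """
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None

    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        obj = getattr(obj, part)  # raises AttributeError if the API has moved
    return version, str(inspect.signature(obj))


# Demonstrated on a standard-library function so the sketch runs anywhere
version, signature = describe_callable("json", "json", "dumps")
print(version, signature)
```

With the relevant package installed, a call such as `describe_callable("shap", "shap", "TreeExplainer.shap_values")` should report the installed version and the signature to compare against the examples here.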
"Which genes drive my classifier's predictions?" -> Compute per-feature attribution scores using SHAP values or LIME to explain which genes or features contribute most to model decisions.
shap.TreeExplainer(model)(X), lime.lime_tabular.LimeTabularExplainer()

Goal: Compute exact SHAP values for tree-based models to quantify each feature's contribution to predictions.
Approach: Use TreeExplainer for polynomial-time exact Shapley value computation on Random Forest or boosted tree models.
import shap
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
explainer = shap.TreeExplainer(model)
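# CORRECT (v0.47+): Call explainer directly, NOT .shap_values()
explanation = explainer(X_test)
# explanation is an Explanation object
# .values holds per-feature contributions; .base_values holds the expected value
# scikit-learn classifiers usually give .values of shape
# (n_samples, n_features, n_classes), even for binary problems,
# so keep the positive class (index 1) to get (n_samples, n_features)
shap_values = explanation[:, :, 1] if explanation.values.ndim == 3 else explanation
print(f'SHAP values shape: {shap_values.values.shape}')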
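To make "exact Shapley value" concrete, the following is a minimal brute-force sketch in pure Python. It is not how TreeExplainer works internally: it enumerates every feature coalition (exponential cost) and, as a simplifying assumption, replaces absent features with a single reference sample. `exact_shapley` and `toy_model` are hypothetical names used only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Brute-force Shapley values for one sample `x`.

    Features outside a coalition take their `background` value.
    Cost grows as 2**n_features, so this only suits toy inputs.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: an additive term plus an interaction of features 1 and 2
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, background)
base_value = toy_model(background)

# Additivity: base value plus all contributions recovers the prediction
print(phi, base_value + sum(phi), toy_model(x))
```

The interaction term is split evenly between the two interacting features, and the contributions sum to the difference between the prediction and the base value. The same additivity property holds for the `.values` and `.base_values` of an Explanation object.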
import shap
import matplotlib.pyplot as plt

# Beeswarm plot: shows impact direction and magnitude
shap.plots.beeswarm(shap_values, max_display=20, show=False)
plt.tight_layout()
plt.savefig('shap_summary.png', dpi=150, bbox_inches='tight')
plt.close()

# Bar plot: mean absolute SHAP values
shap.plots.bar(shap_values, max_display=20, show=False)
plt.savefig('shap_bar.png', dpi=150, bbox_inches='tight')
plt.close()  # close each figure so later plots start on a clean canvas

# Explain single prediction
sample_idx = 0
shap.plots.force(shap_values[sample_idx], matplotlib=True, show=False)
plt.savefig('shap_force_single.png', dpi=150, bbox_inches='tight')
plt.close()

# Waterfall plot (cleaner alternative)
shap.plots.waterfall(shap_values[sample_idx], max_display=15, show=False)
plt.savefig('shap_waterfall.png', dpi=150, bbox_inches='tight')
plt.close()
from xgboost import XGBClassifier
import shap
xgb = XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')
xgb.fit(X_train, y_train)
explainer = shap.TreeExplainer(xgb)
shap_values = explainer(X_test)
# For XGBoost, shap_values contains log-odds contributions
shap.plots.beeswarm(shap_values, max_display=20)
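Because the XGBoost contributions are in log-odds units, they add up in log-odds space rather than in probability space. The sketch below shows the conversion for a single sample. All numbers are hypothetical, and `GENE2` and `GENE3` are placeholder names introduced for illustration.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for one sample (not taken from any real model)
base_value = -0.4  # expected log-odds over the background data
contributions = {"GENE1": 1.2, "GENE2": -0.3, "GENE3": 0.5}

# Contributions are additive in log-odds space
log_odds = base_value + sum(contributions.values())
probability = sigmoid(log_odds)
print(f"log-odds = {log_odds:.2f}, probability = {probability:.3f}")
```

A contribution of +1.2 log-odds therefore does not correspond to a fixed change in probability: its effect depends on where the other terms place the sample on the sigmoid curve.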
from lime.lime_tabular import LimeTabularExplainer
import numpy as np
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=['control', 'disease'],
    mode='classification'
)
# Explain single instance
sample_idx = 0
exp = explainer.explain_instance(
    X_test.iloc[sample_idx].values,
    model.predict_proba,
    num_features=20
)
exp.save_to_file('lime_explanation.html')
# Or get as list: exp.as_list()
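The idea behind LIME can be sketched without the library: perturb the instance, weight each perturbation by its proximity to the original, and fit a weighted linear surrogate whose coefficients serve as the local explanation. The example below is a deliberately simplified, single-feature illustration in pure Python. The kernel form, the sampling scale, and the names `local_linear_slope` and `black_box` are assumptions made for this sketch, not the library's internals.

```python
import math
import random


def local_linear_slope(predict, x0, n_samples=2000, scale=0.5,
                       kernel_width=0.5, seed=0):
    """Fit a proximity-weighted line to `predict` around `x0`; return its slope."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0.0, scale) for _ in range(n_samples)]
    ys = [predict(x) for x in xs]
    # Exponential kernel: perturbations closer to x0 receive more weight
    ws = [math.exp(-((x - x0) ** 2) / (kernel_width ** 2)) for x in xs]

    # Closed-form weighted least squares for a single feature
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov_xy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var_x = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    return cov_xy / var_x


# Hypothetical nonlinear "black box" standing in for a classifier output
def black_box(x):
    return x ** 2


slope = local_linear_slope(black_box, x0=3.0)
print(f"local slope near x0=3: {slope:.2f}")
```

The fitted slope approximates the local sensitivity of the black box around the instance (close to the derivative, 6, in this toy case). With many features, the same weighted fit yields one coefficient per feature, which is the kind of list `exp.as_list()` reports.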
import pandas as pd
import numpy as np
# Mean absolute SHAP value per feature
mean_shap = np.abs(shap_values.values).mean(axis=0)
feature_importance = pd.DataFrame({
    'feature': X_test.columns,
    'mean_shap': mean_shap
}).sort_values('mean_shap', ascending=False)
top_features = feature_importance.head(20)
top_features.to_csv('shap_top_features.csv', index=False)
# Shows how SHAP value varies with feature value
# Automatically colors by interacting feature
shap.plots.scatter(shap_values[:, 'GENE1'], color=shap_values, show=False)
plt.savefig('shap_dependence.png', dpi=150, bbox_inches='tight')
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)
# For multi-class, shap_values.values has shape (n_samples, n_features, n_classes)
# Access class-specific values:
class_idx = 1
shap.plots.beeswarm(shap_values[:, :, class_idx], max_display=20)