arbazkhan971/ml

Name: ml
Author: arbazkhan971

skills/ml/SKILL.md

npx skillsauth add arbazkhan971/godmode ml

Activate When

/godmode:ml, "train a model", "compare experiments"
"evaluate model", "check for bias", "dataset quality"
ML-related code detected (training loops, features)

Workflow

1. Experiment Definition

ID: EXP-<YYYY-MM-DD>-<NNN>
Hypothesis: <what you expect and why>
Objective: <metric to optimize>
Baseline: <current best or naive baseline>
Task: classification|regression|ranking|generation
Framework: PyTorch|TensorFlow|scikit-learn|JAX|XGBoost

# Check for ML frameworks
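# pip lists scikit-learn under its package name, and "tf" does not match "tensorflow"
pip list 2>/dev/null | grep -iE "torch|tensorflow|scikit-learn|jax|xgboost"
grep -iE "torch|tensorflow|scikit-learn|jax|xgboost" requirements.txt 2>/dev/null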
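
The experiment template above could be captured in code roughly as follows. This is a minimal sketch, not part of the skill itself: the `Experiment` class and the `make_experiment_id` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Task values taken from the template; everything else here is illustrative.
TASKS = {"classification", "regression", "ranking", "generation"}


@dataclass(frozen=True)
class Experiment:
    """Hypothetical container mirroring the template fields."""
    exp_id: str
    hypothesis: str
    objective: str
    baseline: str
    task: str
    framework: str

    def __post_init__(self):
        # Reject task types the template does not list.
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task}")


def make_experiment_id(day: date, sequence: int) -> str:
    """Format an ID as EXP-<YYYY-MM-DD>-<NNN>."""
    return f"EXP-{day.isoformat()}-{sequence:03d}"
```

Freezing the dataclass keeps a definition from being edited after the run has started, which fits the requirement to state the hypothesis up front.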

2. Hyperparameter Management

search:
  strategy: grid|random|bayesian|hyperband
  space:
    learning_rate: [1e-5, 1e-4, 1e-3, 1e-2]
    batch_size: [16, 32, 64, 128]
    dropout: uniform(0.1, 0.5)
    hidden_size: [128, 256, 512, 1024]
  trials: <total>

IF trials > 50: use Bayesian or Hyperband (not grid).
IF search space > 4 dimensions: use at least random search.
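
One way to read the two rules above, together with a random-search sampler over the example space. This is a sketch under assumptions: `pick_strategy` and `sample_config` are hypothetical helpers, and returning `"bayesian"` where the rule allows either Bayesian or Hyperband is an arbitrary choice.

```python
import random


def pick_strategy(trials: int, dimensions: int) -> str:
    """Apply the two rules in order; grid is the fallback for small searches."""
    if trials > 50:
        return "bayesian"  # Hyperband would satisfy the rule equally well
    if dimensions > 4:
        return "random"
    return "grid"


def sample_config(rng: random.Random) -> dict:
    """Draw one random-search configuration from the example space."""
    return {
        "learning_rate": rng.choice([1e-5, 1e-4, 1e-3, 1e-2]),
        "batch_size": rng.choice([16, 32, 64, 128]),
        "dropout": rng.uniform(0.1, 0.5),
        "hidden_size": rng.choice([128, 256, 512, 1024]),
    }
```

Passing an explicit `random.Random(seed)` keeps the sampled configurations reproducible, in line with the rule that every experiment records its seeds.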

3. Dataset Validation

Total samples: <N>
Split: train=<N>(<pct>%) / val=<N>(<pct>%) / test=<N>
Quality checks:
  Missing values: <count per feature>
  Duplicates: <count exact duplicates>
  Outliers: <count, method used>
  Class balance: <ratio of majority/minority>

IF class imbalance > 10:1: use stratified sampling plus class weights or oversampling.
IF missing > 5% for any feature: investigate before imputing.
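
The quality checks and the two thresholds above can be sketched with the standard library alone. The `quality_report` function and its input shape (a list of dicts, with `None` marking a missing value) are assumptions made for this example; outlier detection is left out because the skill does not fix a method.

```python
from collections import Counter


def quality_report(rows, labels):
    """Summarise missing values, exact duplicates and class balance.

    rows: non-empty list of dicts (feature -> value, None = missing)
    labels: one class label per row
    """
    n = len(rows)
    features = sorted({name for row in rows for name in row})
    missing = {f: sum(1 for row in rows if row.get(f) is None) for f in features}

    # Count every repeat of an identical row as one duplicate.
    seen = Counter(tuple(row.get(f) for f in features) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    counts = Counter(labels)
    imbalance = max(counts.values()) / min(counts.values())

    return {
        "total": n,
        "missing": missing,
        "duplicates": duplicates,
        "imbalance_ratio": imbalance,
        "needs_rebalancing": imbalance > 10,  # the 10:1 rule
        "investigate_missing": [f for f in features if missing[f] / n > 0.05],
    }
```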

4. Bias Detection

Protected attributes: <gender, race, age, geography>
Per-attribute:
| Attribute | Group | Samples | Accuracy | FPR | FNR |
IF max_group_accuracy - min_group_accuracy > 5%:
  FLAG bias. Investigate feature correlations.
IF FNR disparity > 10% across groups:
  BLOCK deployment until mitigated.
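
A sketch of the per-group table and the two gates above, assuming binary labels (1 = positive) and reading the 5% and 10% thresholds as absolute differences in rates. `group_rates` and `bias_verdict` are hypothetical names.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group sample count, accuracy, FPR and FNR for binary labels."""
    stats = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        cell = stats.setdefault(group, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class.
        key = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        cell[key] += 1

    rates = {}
    for group, c in stats.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            "samples": sum(c.values()),
            "accuracy": (c["tp"] + c["tn"]) / sum(c.values()),
            "fpr": c["fp"] / negatives if negatives else 0.0,
            "fnr": c["fn"] / positives if positives else 0.0,
        }
    return rates


def bias_verdict(rates):
    """Return BLOCK, FLAG or PASS according to the two rules."""
    accuracies = [r["accuracy"] for r in rates.values()]
    fnrs = [r["fnr"] for r in rates.values()]
    if max(fnrs) - min(fnrs) > 0.10:
        return "BLOCK"
    if max(accuracies) - min(accuracies) > 0.05:
        return "FLAG"
    return "PASS"
```

The FNR gate is checked first because it is the stricter outcome: a model can trip both rules, and blocking deployment should win over a flag.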

5. Training and Tracking

Epoch: <current>/<total>
Training loss: <value> (trend: decreasing|plateau)
Validation loss: <value> (trend)
Primary metric: <value> (best: <val> at epoch <N>)

IF val_loss increases 3 consecutive epochs: early stop.
IF train_loss << val_loss (gap > 2x): overfitting.
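
Both training checks above are framework-independent and can be sketched on plain lists of losses. The function names are hypothetical, and "gap > 2x" is read here as the ratio `val_loss / train_loss` exceeding 2.

```python
def should_early_stop(val_losses, patience=3):
    """True if validation loss rose in each of the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False  # not enough history for `patience` increases
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def is_overfitting(train_loss, val_loss, max_ratio=2.0):
    """True if validation loss is more than `max_ratio` times training loss."""
    return train_loss > 0 and val_loss / train_loss > max_ratio
```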

6. Model Evaluation

Test set: <N samples> (used ONCE for final eval)
Accuracy: <val>  Precision: <val>  Recall: <val>
F1: <val>  AUC-ROC: <val>  AUC-PR: <val>
Statistical significance vs baseline:
  p=<val> (paired bootstrap, 10K iterations)

IF p > 0.05: improvement not significant, iterate.
IF improvement < 1% absolute: likely noise.
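
The skill names a paired bootstrap but does not spell out the procedure. The sketch below shows one common variant, assuming per-example correctness scores (1 or 0) for both models on the same test examples: resample the examples with replacement and report the fraction of resamples in which the new model fails to beat the baseline. `paired_bootstrap_p` is a hypothetical name.

```python
import random


def paired_bootstrap_p(baseline_scores, model_scores, iterations=10_000, seed=0):
    """Estimate a one-sided p-value for 'model beats baseline'.

    Both inputs hold one score per test example, in the same order.
    """
    if len(baseline_scores) != len(model_scores):
        raise ValueError("score lists must be paired")
    n = len(model_scores)
    rng = random.Random(seed)
    # Pairing means we resample per-example differences, not each list alone.
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    not_better = 0
    for _ in range(iterations):
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if total <= 0:
            not_better += 1
    return not_better / iterations
```

Because the test set is meant to be used once, this would run a single time on the final predictions, with the seed logged alongside the result.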

7. Experiment Comparison

| Experiment | F1 | AUC | Latency | Size | Params |
Winner selection: best accuracy/latency tradeoff.
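
The skill does not define "best tradeoff". One possible reading, sketched here as an assumption, is to discard every experiment that another experiment matches or beats on both F1 and latency, leaving a Pareto front for a human or a further rule to choose from. The field names `name`, `f1` and `latency_ms` are hypothetical.

```python
def pareto_front(experiments):
    """Return names of experiments not dominated on (F1 up, latency down)."""
    front = []
    for exp in experiments:
        dominated = any(
            other["f1"] >= exp["f1"]
            and other["latency_ms"] <= exp["latency_ms"]
            and (other["f1"] > exp["f1"] or other["latency_ms"] < exp["latency_ms"])
            for other in experiments
        )
        if not dominated:
            front.append(exp["name"])
    return front
```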

8. Commit and Transition

Commit: "ml: EXP-<ID> — <metric>=<value> (<delta>)"
IF best found: -> /godmode:mlops to deploy.
IF bias detected: address before deployment.

Hard Rules

NEVER evaluate on test set during development.
EVERY experiment: git SHA, data version, params, seeds.
NEVER report without comparing to baseline.
NEVER deploy model failing bias checks.
ALWAYS test significance (bootstrap, p < 0.05).
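
The rule that every experiment records git SHA, data version, params and seeds could be met with a small record builder. This is a sketch: the helper names are hypothetical, and using a truncated SHA-256 of the data file as the "data version" is an assumption, since the skill does not say how data is versioned.

```python
import hashlib
import subprocess


def git_sha():
    """Return the current commit SHA, or None outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def data_version(path, chunk_size=1 << 20):
    """Fingerprint a data file by the first 12 hex digits of its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


def run_record(data_path, params, seed):
    """Collect the four items the rule requires for one experiment."""
    return {
        "git_sha": git_sha(),
        "data_version": data_version(data_path),
        "params": params,
        "seed": seed,
    }
```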

TSV Logging

Append .godmode/ml-results.tsv:

timestamp	experiment_id	model	metric	baseline	result	status
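
Appending a row in that column order might look like the sketch below. `log_result` is a hypothetical helper; writing the header only when the file is new or empty, and using UTC ISO timestamps, are choices made for this example rather than requirements stated by the skill.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "experiment_id", "model", "metric",
          "baseline", "result", "status"]


def log_result(path, experiment_id, model, metric, baseline, result, status):
    """Append one experiment row to a tab-separated log file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(FIELDS)  # header goes in once, on first write
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, experiment_id, model, metric,
                         baseline, result, status])
```

A call such as `log_result(".godmode/ml-results.tsv", "EXP-2026-04-17-001", "xgb", "f1", 0.80, 0.83, "keep")` would then add one line per experiment, kept or discarded.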

Keep/Discard

KEEP if: significant improvement AND bias passes
  AND no data leakage.
DISCARD if: no significance OR bias violation
  OR leakage found. Log both.

Stop Conditions

STOP when FIRST of:
  - Best model beats baseline significantly
  - Bias check passes all attributes
  - 3 consecutive experiments show no improvement

Autonomous Operation

On failure: git reset --hard HEAD~1. Never pause.

Error Recovery

Description: ML development and experimentation.
Stars: 9
Category: development
Updated: Apr 17, 2026

Install

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add arbazkhan971/godmode ml

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 17, 2026, 3:15 AM (4.0s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/arbazkhan971/godmode.git

# Copy into Claude Code skills folder (global)
cp -r godmode/skills/ml ~/.claude/skills/

Repository: arbazkhan971/godmode (9 stars)
Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT