
abelrguezr/ml-data-prep-eval

Name: ml-data-prep-eval
Author: abelrguezr

skills/AI/AI-Model-Data-Preparation-and-Evaluation/SKILL.md

npx skillsauth add abelrguezr/hacktricks-skills ml-data-prep-eval


ML Data Preparation & Evaluation

This skill helps you prepare raw data for machine learning and evaluate model performance. Follow the workflow below for systematic data preparation.

Quick Start

# Clean and prepare your data
python scripts/data_cleaning.py --input data.csv --output cleaned_data.csv

# Transform features
python scripts/data_transformation.py --input cleaned_data.csv --output transformed_data.csv

# Split for training
python scripts/data_splitting.py --input transformed_data.csv --train-ratio 0.7 --val-ratio 0.15

# Evaluate model predictions
python scripts/model_evaluation.py --actual actual.csv --predicted predictions.csv

Workflow Overview

Data Collection → Gather from databases, APIs, files, or web scraping
Data Cleaning → Handle missing values, remove duplicates, filter outliers
Data Transformation → Normalize, encode, engineer features
Data Splitting → Create train/validation/test sets
Model Evaluation → Calculate performance metrics

1. Data Collection

Supported Sources

| Source | Method | Example |
|--------|--------|---------|
| CSV/JSON files | pandas.read_csv() | pd.read_csv('data.csv') |
| SQL databases | sqlalchemy | pd.read_sql(query, connection) |
| APIs | requests | requests.get(url).json() |
| Web scraping | beautifulsoup4 | BeautifulSoup(html, 'html.parser') |

Best Practices

Validate data types immediately after loading
Check for encoding issues (UTF-8 is standard)
Log the number of records collected
Store metadata about collection time and source
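
The practices above can be sketched with the standard library alone. This is a minimal illustration, not part of the skill's scripts: `load_records`, its parameters, and the inline sample are hypothetical names chosen for the example.

```python
import csv
import io
from datetime import datetime, timezone


def load_records(text, source, required_types):
    """Parse CSV text, coerce the listed columns, and return rows plus metadata."""
    rows = []
    # Line numbering starts at 2 because line 1 is the header row.
    for line_no, raw in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        row = dict(raw)
        for column, caster in required_types.items():
            try:
                row[column] = caster(raw[column])  # validate the type right after loading
            except (KeyError, TypeError, ValueError) as exc:
                raise ValueError(f"line {line_no}: bad value for {column!r}") from exc
        rows.append(row)
    metadata = {
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(rows),
    }
    return rows, metadata


sample = "age,city\n34,Madrid\n27,Lisbon\n"
rows, metadata = load_records(sample, "inline-sample", {"age": int})
print(metadata["record_count"], rows[0]["age"])  # prints: 2 34
```

When reading from a file instead of a string, opening it with an explicit `encoding="utf-8"` surfaces encoding problems at load time.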

2. Data Cleaning

Missing Values

Strategies by data type:

| Type | Strategy | When to use |
|------|----------|-------------|
| Numeric | Mean/Median imputation | Small gaps, normal distribution |
| Numeric | KNN imputation | Complex relationships between features |
| Categorical | Mode (most frequent) | When category matters |
| Categorical | New category "Unknown" | When missingness is meaningful |
| Any | Drop rows/columns | When >50% missing or not critical |

Use the cleaning script:

python scripts/data_cleaning.py \
  --input data.csv \
  --numeric-strategy median \
  --categorical-strategy most_frequent \
  --remove-duplicates \
  --filter-outliers zscore:3
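
To make the median and most-frequent strategies concrete, here is a small standard-library sketch. It does not reproduce what `scripts/data_cleaning.py` does internally; `fit_imputer` and `apply_imputer` are hypothetical helpers, and missing values are assumed to be represented as `None`.

```python
from collections import Counter
from statistics import median


def fit_imputer(rows, numeric_cols, categorical_cols):
    """Learn one fill value per column from the given rows (ideally training rows only)."""
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]
    return fill


def apply_imputer(rows, fill):
    """Return copies of the rows with None replaced by the learned fill values."""
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]


train = [
    {"age": 30, "city": "Madrid"},
    {"age": None, "city": "Madrid"},
    {"age": 50, "city": None},
    {"age": 40, "city": "Lisbon"},
]
fill = fit_imputer(train, ["age"], ["city"])
print(fill)                            # {'age': 40, 'city': 'Madrid'}
print(apply_imputer(train, fill)[1])   # {'age': 40, 'city': 'Madrid'}
```

Separating the fit step from the apply step makes it possible to reuse the training-set fill values on validation and test rows.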

Duplicates

Always check for exact duplicates with df.duplicated().sum(), then remove them with df.drop_duplicates()
Check for near-duplicates on key columns
Decide whether to keep first, last, or aggregate

Outliers

Detection methods:

| Method | Use case | Threshold |
|--------|----------|-----------|
| Z-score | Normal distribution | \|z\| > 3 |
| IQR | Skewed distribution | Q1 - 1.5×IQR, Q3 + 1.5×IQR |
| Box plot | Visual inspection | Whisker bounds |

Decision framework:

Remove if clearly erroneous (e.g., age = 200)
Transform if valid but extreme (log transform)
Keep if legitimate edge cases (fraud detection)
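
The two numeric detection rules can be compared side by side. The sketch below uses only the standard library; the function names and the sample ages are illustrative assumptions, and the quartile method (`inclusive`) is one of several conventions.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs((v - mu) / sigma) > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


ages = [22, 25, 27, 29, 31, 33, 35, 38, 41, 200]
print(iqr_outliers(ages))     # [200]
print(zscore_outliers(ages))  # []
```

In this sample the z-score rule misses the obvious error because the extreme value inflates the standard deviation it is measured against. With only ten points, no value can exceed |z| = 3 under the population standard deviation, which is one reason to prefer the IQR rule on small or skewed samples.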

3. Data Transformation

Normalization & Standardization

| Method | Formula | Range | Use when |
|--------|---------|-------|----------|
| Min-Max | (X - min) / (max - min) | [0, 1] | Neural networks, distance-based algorithms |
| Z-Score | (X - μ) / σ | Mean=0, Std=1 | Linear models; mild outliers (less distorted than Min-Max) |
| Robust | (X - median) / IQR | Unbounded (median=0, IQR=1) | Heavy outliers |

Script usage:

python scripts/data_transformation.py \
  --input cleaned_data.csv \
  --normalize zscore \
  --columns "feature1,feature2,feature3"
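
The three formulas in the table translate directly into code. This is a standard-library sketch with hypothetical function names, not the implementation behind `--normalize`; it assumes a non-constant column, since a constant column would divide by zero.

```python
from statistics import mean, median, pstdev, quantiles


def min_max(values):
    """(X - min) / (max - min): maps the column onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """(X - mean) / std: centers on 0 with unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def robust(values):
    """(X - median) / IQR: uses statistics that outliers barely move."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print([round(v, 3) for v in min_max(data)])  # [0.0, 0.01, 0.02, 0.03, 1.0]
print(robust(data))                          # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With one extreme value, Min-Max squeezes the ordinary points into a sliver near 0, while robust scaling keeps them evenly spread and leaves the outlier visibly far away.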

Encoding Categorical Variables

| Method | Output | Use when |
|--------|--------|----------|
| One-Hot | Binary columns | Low cardinality (<10 categories) |
| Label | Integer 0,1,2... | Ordinal data or tree models |
| Ordinal | Ordered integers | Natural ordering exists |
| Target | Mean of target | High cardinality, supervised learning |
| Hashing | Fixed-size vector | Very high cardinality |
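
As a rough illustration of two rows from the table, the sketch below builds one-hot and ordinal encodings by hand. The helper names and sample values are assumptions for the example; in practice a library encoder would normally be used.

```python
def one_hot(values):
    """Return the sorted category list and one binary column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def ordinal(values, order):
    """Map each value to its position in an explicitly supplied ordering."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

sizes = ordinal(["small", "large", "medium"], ["small", "medium", "large"])
print(sizes)       # [0, 2, 1]
```

The ordinal helper takes the ordering as an argument because the order is domain knowledge; sorting the labels alphabetically would place "large" before "medium".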

Text encoding:

Bag of Words: Simple word counts
TF-IDF: Weighted by document frequency
Bigrams/Trigrams: Capture word sequences
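
The three text encodings listed above can be sketched in a few lines. The TF-IDF weighting shown here (term frequency times log(N / document frequency)) is one common variant chosen for illustration; libraries often apply smoothing and normalization, so their numbers will differ. All names below are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization; real pipelines usually do more."""
    return text.lower().split()


def bigrams(words):
    """Adjacent word pairs, which preserve local word order."""
    return list(zip(words, words[1:]))


def tf_idf(docs):
    """Weight each word by its in-document frequency and its rarity across documents."""
    tokenized = [tokens(d) for d in docs]
    doc_freq = Counter(w for words in tokenized for w in set(words))
    n_docs = len(docs)
    return [
        {w: (c / len(words)) * math.log(n_docs / doc_freq[w])
         for w, c in Counter(words).items()}
        for words in tokenized
    ]


docs = ["the cat sat", "the dog sat down", "the bird flew"]
bag = Counter(tokens(docs[0]))
print(dict(bag))                 # {'the': 1, 'cat': 1, 'sat': 1}
print(bigrams(tokens(docs[0])))  # [('the', 'cat'), ('cat', 'sat')]

weights = tf_idf(docs)
print(weights[0]["the"])         # 0.0
```

A word that appears in every document, such as "the" here, gets weight 0, while a word unique to one document gets the highest weight.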

Feature Engineering

Common patterns:

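import pandas as pd

# Assumes df is an existing DataFrame with the columns used below.
# Date/time features (parse first so the .dt accessor is available)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour
df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0 ... Sunday=6
df['is_weekend'] = df['day_of_week'].isin([5, 6])

# Ratios and combinations
df['price_per_sqft'] = df['price'] / df['sqft']
df['total_value'] = df['quantity'] * df['unit_price']

# Binning (include_lowest=True keeps age 0 in the first bin)
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 35, 50, 100],
                         labels=['child', 'young', 'middle', 'senior'],
                         include_lowest=True)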

4. Data Splitting

Standard Split Ratios

| Dataset Size | Train | Validation | Test |
|--------------|-------|------------|------|
| Small (<10K) | 70% | 15% | 15% |
| Medium (10K-100K) | 80% | 10% | 10% |
| Large (>100K) | 90% | 5% | 5% |

Splitting Strategies

Stratified Split (classification with imbalanced classes):

python scripts/data_splitting.py \
  --input data.csv \
  --stratify target_column \
  --train-ratio 0.7 \
  --val-ratio 0.15
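
To show what stratification guarantees, the sketch below splits each class separately so that every subset keeps roughly the original class ratio. It is an illustration under assumed names (`stratified_split`, the `target` key, a fixed seed), not the logic of `scripts/data_splitting.py`.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_ratio=0.7, val_ratio=0.15, seed=42):
    """Split each class independently so every subset keeps the class proportions."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed for reproducible splits
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * train_ratio)
        n_val = round(len(group) * val_ratio)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# 100 rows with a 20% positive class
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(100)]
train, val, test = stratified_split(rows, "target")
print(len(train), len(val), len(test))  # 70 15 15
print(sum(r["target"] for r in train))  # 14
```

The training set holds 14 positives out of 70 rows, the same 20% rate as the full dataset.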

Time Series Split (temporal data):

Train on earlier periods
Test on later periods
Never shuffle time series data
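
A time-ordered split can be as simple as sorting by timestamp and cutting once. The function name, the `day` key, and the 80/20 ratio below are assumptions made for the example.

```python
def chronological_split(rows, time_key, train_ratio=0.8):
    """Sort by time and cut once, so every test row is later than every training row."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


# Rows arrive out of order; the split must not depend on arrival order.
events = [{"day": d, "value": d * 10} for d in (5, 1, 4, 2, 3, 9, 7, 8, 6, 10)]
train, test = chronological_split(events, "day")
print([r["day"] for r in train])  # [1, 2, 3, 4, 5, 6, 7, 8]
print([r["day"] for r in test])   # [9, 10]
```

Because the cut is made on sorted data, the model is never trained on observations that come after the ones it is tested on.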

K-Fold Cross-Validation (small datasets):

K=5 or K=10 typical
Each fold used once as validation
Average metrics across folds
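
The fold mechanics described above can be sketched as an index generator. `k_fold_indices` is a hypothetical helper; it does not shuffle, so for non-temporal data the indices would typically be shuffled once beforehand.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) pairs; each index validates exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(10, k=5))
for train_idx, val_idx in folds:
    print(val_idx)
# prints [0, 1], [2, 3], [4, 5], [6, 7], [8, 9] on separate lines

# Per-fold scores (made-up numbers) are averaged into a single estimate.
fold_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
print(round(sum(fold_scores) / len(fold_scores), 2))  # 0.8
```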

5. Model Evaluation

Classification Metrics

| Metric | Formula | Best for |
|--------|---------|----------|
| Accuracy | (TP+TN) / Total | Balanced classes |
| Precision | TP / (TP+FP) | Costly false positives |
| Recall | TP / (TP+FN) | Costly false negatives |
| F1 Score | 2×(P×R)/(P+R) | Imbalanced classes |
| ROC-AUC | Area under ROC curve | Threshold-independent comparison |
| MCC | Correlation coefficient | Imbalanced, all confusion matrix cells |
| Specificity | TN / (TN+FP) | Costly false positives |

Script usage:

python scripts/model_evaluation.py \
  --actual actual_labels.csv \
  --predicted predictions.csv \
  --metrics "accuracy,precision,recall,f1,roc_auc,mcc"
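
For reference, the count-based metrics in the table can be computed directly from binary labels. This is a standard-library sketch with hypothetical function names, not the code in `scripts/model_evaluation.py`. It assumes labels are 0/1 and returns 0.0 when a denominator is zero, which is a convention chosen for the example.

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'specificity': 0.833, 'mcc': 0.583}
```

ROC-AUC is left out because it needs predicted scores rather than hard labels.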

Regression Metrics

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| MAE | mean(\|y - ŷ\|) | Average error in original units |
| MSE | mean((y - ŷ)²) | Penalizes large errors |
| RMSE | sqrt(MSE) | Error in original units |
| R² | 1 - SS_res/SS_tot | Proportion of variance explained |
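
A worked example of the four regression formulas, using only the standard library. The function name and the sample values are illustrative assumptions; the sketch also assumes the true values are not all identical, since SS_tot would then be zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total sum of squares
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
result = regression_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in result.items()})
# {'mae': 1.0, 'mse': 1.5, 'rmse': 1.225, 'r2': 0.7}
```

The single error of 2 contributes 4 of the 6 squared-error units, which is why MSE and RMSE react more strongly to large misses than MAE does.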

Confusion Matrix

                Predicted
              Positive  Negative
Actual Positive    TP        FN
Actual Negative    FP        TN

Key insights:

High FP: Model is too aggressive (raise the decision threshold)
High FN: Model is too conservative (lower the decision threshold)
Diagonal dominance: Good performance
Off-diagonal patterns: Systematic errors to investigate
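
The link between the decision threshold and the two error types is easiest to see by sweeping the threshold over a fixed set of scores. The scores, labels, and helper name below are made up for illustration.

```python
def fp_fn_at(scores, labels, threshold):
    """Count false positives and false negatives when predicting 1 for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    print(threshold, fp_fn_at(scores, labels, threshold))
# 0.25 (2, 0)
# 0.5 (1, 1)
# 0.75 (0, 2)
```

Raising the threshold trades false positives for false negatives, so the right setting depends on which error is more costly for the task.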

Common Patterns & Pitfalls

✅ Do

Always split data BEFORE fitting any learned transformation (scaling, imputation, target encoding)
Use stratified splits for imbalanced classification
Keep test set completely untouched until final evaluation
Document all transformations for reproducibility
Check for data leakage (future info in training)

❌ Don't

Don't normalize using test set statistics
Don't impute missing values before splitting (fit the imputer on the training set only)
Don't use accuracy for imbalanced datasets
Don't evaluate on training data only
Don't shuffle time series data
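
The first two points come down to the same pattern: learn the parameters on the training set, then reuse them unchanged everywhere else. Here is a minimal sketch with z-score scaling; `fit_scaler` and `transform` are hypothetical names and the numbers are invented.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn mean and standard deviation from the given values only."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Apply previously learned parameters without re-estimating them."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


train = [10.0, 12.0, 14.0, 16.0]
test = [18.0, 40.0]

params = fit_scaler(train)              # statistics come from training data only
train_scaled = transform(train, params)
test_scaled = transform(test, params)   # test rows reuse the training statistics

leaky_params = fit_scaler(train + test)  # what NOT to do: test rows shift the statistics
print(params[0], round(leaky_params[0], 2))  # 13.0 18.33
```

Fitting on the combined data moves the mean from 13.0 to about 18.33, so information about the test rows would reach the model before evaluation.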

⚠️ Watch Out For

Data leakage: Target information in features
Target imbalance: Use appropriate metrics (F1, MCC, ROC-AUC)
Overfitting: Large gap between train and test performance
Underfitting: Poor performance on both train and test
Feature scaling: Always scale before distance-based algorithms

Quick Reference

When to use which metric

| Scenario | Primary Metric | Secondary Metric |
|----------|----------------|------------------|
| Balanced classification | Accuracy | F1 Score |
| Imbalanced classification | F1 Score | ROC-AUC |
| Medical diagnosis | Recall | Precision |
| Fraud detection | Precision | Recall |
| Spam filtering | Recall | Specificity |
| Regression | MAE or RMSE | R² |
| Small dataset | MCC | F1 Score |

Script Quick Commands

# Full pipeline
python scripts/data_cleaning.py -i raw.csv -o clean.csv --remove-duplicates --filter-outliers zscore:3
python scripts/data_transformation.py -i clean.csv -o prep.csv --normalize zscore --encode onehot
python scripts/data_splitting.py -i prep.csv --stratify target --train-ratio 0.8
python scripts/model_evaluation.py -a actual.csv -p pred.csv --metrics all

Next Steps

After data preparation:

Train your model on the training set
Tune hyperparameters using validation set
Final evaluation on test set
Generate confusion matrix and detailed metrics
Analyze errors and iterate on features

For model training and deployment, consider using specialized ML frameworks (scikit-learn, TensorFlow, PyTorch).


Prepare and evaluate machine learning data. Use this skill whenever the user needs to clean, transform, or split datasets for ML training, or evaluate model performance with metrics like accuracy, precision, recall, F1, ROC-AUC, MAE, or confusion matrices. Trigger for any data preprocessing task, feature engineering, handling missing values, encoding categorical variables, normalization, or model evaluation requests.

5 stars

development

Updated Apr 16, 2026


npx skillsauth add abelrguezr/hacktricks-skills ml-data-prep-eval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 24, 2026, 8:50 PM (1.8s, 1 file scanned)




Install manually


# Clone the repo
git clone https://github.com/abelrguezr/hacktricks-skills.git

# Copy into Claude Code skills folder (global)
cp -r hacktricks-skills/skills/AI/AI-Model-Data-Preparation-and-Evaluation ~/.claude/skills/


Repository

abelrguezr/hacktricks-skills

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT