skills/AI/AI-Supervised-Learning-Algorithms/SKILL.md
How to implement supervised machine learning algorithms for cybersecurity tasks like intrusion detection, malware classification, phishing detection, and spam filtering. Use this skill whenever the user mentions machine learning, ML models, classification, regression, cybersecurity datasets, NSL-KDD, phishing detection, intrusion detection, malware analysis, or wants to build predictive models for security applications. This skill covers Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, Naive Bayes, k-NN, and Gradient Boosting with ready-to-use Python code.
npx skillsauth add abelrguezr/hacktricks-skills supervised-learning-cybersecurity
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill helps you implement supervised machine learning algorithms for cybersecurity applications. It provides ready-to-use Python code for common security tasks like intrusion detection, malware classification, phishing detection, and spam filtering.
Choose your task and algorithm, then run the corresponding script:
# For intrusion detection with Random Forest
python scripts/train_intrusion_detection.py --algorithm random_forest
# For phishing detection with Logistic Regression
python scripts/train_phishing_detection.py --algorithm logistic_regression
# For comparing multiple algorithms
python scripts/compare_algorithms.py --dataset nsl-kdd
| Algorithm | Best For | Speed | Accuracy | Interpretability |
|-----------|----------|-------|----------|------------------|
| Logistic Regression | Binary classification, baseline | Fast | Good | High |
| Decision Trees | Rule-based detection, explainability | Fast | Medium | Very High |
| Random Forests | General purpose, robust detection | Medium | High | Medium |
| SVM | High-dimensional data, complex boundaries | Slow | High | Low |
| Naive Bayes | Text classification, spam filtering | Very Fast | Medium | Medium |
| k-NN | Small datasets, anomaly detection | Slow | Medium | Low |
| Gradient Boosting | Best accuracy, tabular data | Medium | Very High | Low |
| Linear Regression | Predicting numeric values | Fast | Varies | High |
Detect network attacks from connection features.
python scripts/train_intrusion_detection.py --algorithm random_forest
What it does:
Expected results: Random Forest typically achieves 75-80% accuracy, 95%+ precision, 60-65% recall on NSL-KDD.
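High precision combined with much lower recall means the model rarely raises false alarms but still misses a large share of attacks. The sketch below shows how the three metrics relate to confusion-matrix counts. The counts are hypothetical, chosen only so the numbers land in the ranges quoted above; they are not measured results, and `classification_metrics` is an illustrative helper, not part of the skill's scripts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts (illustration only): 100 attack and 100 normal connections.
# 60 attacks caught, 40 missed, 3 normal connections wrongly flagged.
acc, prec, rec = classification_metrics(tp=60, fp=3, fn=40, tn=97)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# prints: accuracy=0.785 precision=0.952 recall=0.600
```

In an intrusion detection setting, the 40 missed attacks usually matter more than the 3 false alarms, which is why recall deserves as much attention as accuracy.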
Classify websites as phishing or legitimate.
python scripts/train_phishing_detection.py --algorithm svm
What it does:
Expected results: SVM and Gradient Boosting typically achieve 95%+ accuracy, 98%+ ROC AUC.
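ROC AUC can be read as the probability that a randomly chosen phishing sample receives a higher score than a randomly chosen legitimate one. The following is a minimal sketch of that definition using only the standard library; the labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper rather than the script's implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small examples; library implementations use a sort-based approach instead.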
Use Naive Bayes for text-based classification.
python scripts/train_spam_detection.py --algorithm naive_bayes
What it does:
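The script's internals are not shown here, so the block below is only a rough, from-scratch illustration of the technique itself: a multinomial Naive Bayes classifier with add-one (Laplace) smoothing over whitespace-separated words. The class name `TinyNaiveBayes` and the four training messages are invented for the example.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add per-word log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only.
train_docs = [
    "win free prize now",
    "free money claim now",
    "meeting agenda for monday",
    "project status report attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict("claim your free prize"))   # prints: spam
print(model.predict("monday project meeting"))  # prints: ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a class.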
Choose Logistic Regression when:
Choose Decision Trees when:
Choose Random Forests when:
Choose SVM when:
Choose Naive Bayes when:
Choose k-NN when:
Choose Gradient Boosting when:
Choose Linear Regression when:
For cybersecurity:
Combine multiple models for better performance:
python scripts/train_ensemble.py --method stacking
Voting Ensemble: Multiple models vote on final prediction
Stacking: Meta-model learns to combine base model predictions
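The voting idea can be sketched in a few lines of plain Python. This is a hard (majority) vote over label predictions that are assumed to have been produced already by three models; the prediction lists are made up, and `majority_vote` is an illustrative helper, not the contents of `train_ensemble.py`.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting."""
    combined = []
    # zip(*...) groups the predictions of all models for the same sample.
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Made-up predictions for five samples (1 = attack, 0 = normal).
rf_preds = [1, 0, 1, 1, 0]
svm_preds = [1, 0, 0, 1, 0]
nb_preds = [0, 0, 1, 1, 1]

final = majority_vote([rf_preds, svm_preds, nb_preds])
print(final)  # prints: [1, 0, 1, 1, 0]
```

Using an odd number of models with binary labels avoids ties. Stacking differs in that the per-model predictions become input features for a second model; in scikit-learn, `VotingClassifier` and `StackingClassifier` in `sklearn.ensemble` cover both approaches.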
Before training any model:
Problem: Model overfits (high train accuracy, low test accuracy)
Problem: Model underfits (low accuracy on both train and test)
Problem: Class imbalance (many more normal than attack samples)
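One common way to compensate for imbalance is to weight each class inversely to its frequency. The sketch below computes weights as `n_samples / (n_classes * class_count)`, which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The 90/10 label split is hypothetical, and `balanced_class_weights` is an illustrative helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


# Hypothetical split: 90 normal connections, 10 attacks.
labels = ["normal"] * 90 + ["attack"] * 10
weights = balanced_class_weights(labels)
for cls, w in weights.items():
    print(f"{cls}: {w:.4f}")
# normal gets 100 / (2 * 90), about 0.5556; attack gets 100 / (2 * 10) = 5.0
```

With these weights, misclassifying one attack sample costs the model nine times as much as misclassifying one normal sample, which offsets the 9:1 imbalance.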
Problem: Slow training time
Problem: Poor recall (missing attacks)
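When a model outputs scores or probabilities, one way to raise recall is to lower the decision threshold below the usual 0.5. The following sketch picks the highest threshold that still meets a target recall; the labels, scores, and the 0.75 target are made up, and both helper names are hypothetical.

```python
def recall_at(threshold, labels, scores):
    """Recall when every sample with score >= threshold is flagged as an attack."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def highest_threshold_for_recall(labels, scores, target_recall):
    """Return the highest observed score that, used as a threshold, meets the target."""
    for threshold in sorted(set(scores), reverse=True):
        if recall_at(threshold, labels, scores) >= target_recall:
            return threshold
    return None  # reached only if there are no positive samples


# Made-up validation data: 1 = attack, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

print(recall_at(0.5, labels, scores))                        # prints: 0.5
print(highest_threshold_for_recall(labels, scores, 0.75))    # prints: 0.45
```

Lowering the threshold trades precision for recall: at 0.45 the normal sample scored 0.6 is also flagged. The threshold should be chosen on a validation set, not on the test set used for final reporting.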