
abelrguezr/llm-fundamentals

Name: llm-fundamentals
Author: abelrguezr

skills/AI/AI-llm-architecture/0.-basic-llm-concepts/SKILL.md

npx skillsauth add abelrguezr/hacktricks-skills llm-fundamentals


LLM Fundamentals

A skill for explaining and teaching Large Language Model concepts, PyTorch operations, and neural network training fundamentals.

When to Use This Skill

Use this skill when the user:

Asks about how LLMs work or are trained
Wants to understand model architecture components (parameters, layers, attention heads)
Needs help with PyTorch tensor operations
Is learning about automatic differentiation or backpropagation
Wants to understand neural network training concepts
Asks about pretraining vs fine-tuning

Core Concepts to Explain

Pretraining

Pretraining is the foundational phase where an LLM learns language structure from vast text data. During pretraining:

The model learns grammar, vocabulary, syntax, and contextual relationships
It acquires broad world knowledge from diverse data
The model becomes capable of generating coherent, contextually relevant text
After pretraining, the model can be fine-tuned for specific tasks or domains

Key point: Pretraining creates the general language understanding; fine-tuning adapts it to specific applications.
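For GPT-style models such as the GPT-2 configuration below, pretraining is usually framed as next-token prediction: each position is trained to predict the token that follows it. The sketch below shows only how inputs and targets are derived from a token sequence and scored with cross-entropy. The token IDs are hypothetical, and the random `logits` stand in for a real model's output. Both are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size = 50257  # GPT-2 BPE vocabulary size
token_ids = torch.tensor([464, 3290, 3332, 319, 262, 2603])  # hypothetical tokenized text

# Next-token prediction: the target at position t is the token at position t + 1
inputs = token_ids[:-1]   # shape (5,)
targets = token_ids[1:]   # shape (5,), shifted left by one

# Stand-in for model(inputs): one row of vocabulary scores per input position
logits = torch.randn(inputs.shape[0], vocab_size)

# Pretraining minimizes the average cross-entropy over all positions
loss = F.cross_entropy(logits, targets)
print(inputs.tolist())
print(targets.tolist())
print(loss.item())
```

Because the targets come from the text itself, no human labels are needed. This is what lets pretraining scale to very large corpora.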

Main LLM Architecture Components

When discussing LLM configuration, explain these components:

| Component | Description | Typical Values |
|-----------|-------------|----------------|
| Parameters | Learnable weights and biases in the neural network | Millions to billions |
| Context Length | Maximum sequence length (in tokens) the model can process | 512 to 32K+ tokens |
| Embedding Dimension | Size of the vector representing each token | 768 to 16K+ |
| Hidden Dimension | Width of the hidden representations passed between layers | Usually equals the embedding dimension (feed-forward sublayers are often 4× wider) |
| Number of Layers | Depth of the network (number of transformer blocks) | 12 to 100+ |
| Attention Heads | Parallel attention mechanisms per layer | 12 to 128+ |
| Dropout | Fraction of activations randomly zeroed during training | 0 to 20% |

Example GPT-2 Configuration:

GPT_CONFIG_124M = {
    "vocab_size": 50257,      # BPE tokenizer vocabulary
    "context_length": 1024,   # Max sequence length
    "emb_dim": 768,           # Embedding dimension
    "n_heads": 12,            # Attention heads per layer
    "n_layers": 12,           # Number of transformer layers
    "drop_rate": 0.1,         # 10% dropout
    "qkv_bias": False         # No bias in QKV projections
}
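The "124M" in the configuration name can be estimated from the config alone. This sketch assumes a standard GPT-2-style block with these components:

- two LayerNorms, each with scale and shift
- Q/K/V projections with optional bias
- a biased attention output projection
- a feed-forward layer 4× wider than `emb_dim`

It also assumes an output head without bias whose weights may or may not be tied to the token embedding. `estimate_gpt_params` is a hypothetical helper written for this illustration, not part of any library.

```python
def estimate_gpt_params(cfg, tie_weights=True):
    """Count learnable parameters of a GPT-2-style model described by cfg (hypothetical helper)."""
    d, v = cfg["emb_dim"], cfg["vocab_size"]
    ctx, n = cfg["context_length"], cfg["n_layers"]

    token_emb = v * d            # token embedding table
    pos_emb = ctx * d            # learned positional embeddings

    layer_norm = 2 * d                                      # scale + shift
    qkv = 3 * d * d + (3 * d if cfg["qkv_bias"] else 0)     # query/key/value projections
    attn_out = d * d + d                                    # attention output projection
    ff = (d * 4 * d + 4 * d) + (4 * d * d + d)              # expand to 4*d, project back
    per_block = 2 * layer_norm + qkv + attn_out + ff

    final_norm = layer_norm
    out_head = 0 if tie_weights else v * d  # tied: reuses the token embedding matrix

    return token_emb + pos_emb + n * per_block + final_norm + out_head


GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

print(estimate_gpt_params(GPT_CONFIG_124M))                     # 124412160
print(estimate_gpt_params(GPT_CONFIG_124M, tie_weights=False))  # 163009536
```

Note that `n_heads` and `drop_rate` do not change the count. Heads split the same projections, and dropout has no weights. Setting `qkv_bias` to True, as in the released GPT-2 checkpoints, gives 124,439,808 with tied weights.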

PyTorch Tensors

Tensors are multi-dimensional arrays that serve as the fundamental data structure in PyTorch.

Tensor Ranks

Scalar (0D): Single number, e.g., 5
Vector (1D): One-dimensional array, e.g., [5, 1]
Matrix (2D): Two-dimensional array with rows/columns, e.g., [[1,3], [5,2]]
Higher-Rank (3D+): Multi-dimensional data, e.g., 3D tensors for images

Creating Tensors

import torch

# Scalar (0D)
tensor0d = torch.tensor(1)

# Vector (1D)
tensor1d = torch.tensor([1, 2, 3])

# Matrix (2D)
tensor2d = torch.tensor([[1, 2], [3, 4]])

# 3D Tensor
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
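A tensor's rank is its number of dimensions, which PyTorch exposes as `.ndim`. `.shape` gives the size along each dimension. This re-creates the tensors above to check both:

```python
import torch

tensor0d = torch.tensor(1)
tensor1d = torch.tensor([1, 2, 3])
tensor2d = torch.tensor([[1, 2], [3, 4]])
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for name, t in [("0D", tensor0d), ("1D", tensor1d), ("2D", tensor2d), ("3D", tensor3d)]:
    # ndim = rank; shape = size of each dimension
    print(name, t.ndim, tuple(t.shape))
# 0D 0 ()
# 1D 1 (3,)
# 2D 2 (2, 2)
# 3D 3 (2, 2, 2)

print(tensor0d.item())  # 1  (.item() extracts the Python number from a 0D tensor)
```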

Tensor Data Types

Integers default to torch.int64
Floats default to torch.float32
Check type with .dtype
Convert with .to()

tensor1d = torch.tensor([1, 2, 3])
print(tensor1d.dtype)  # torch.int64

float_tensor = tensor1d.to(torch.float32)
print(float_tensor.dtype)  # torch.float32
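Three details are worth showing alongside these defaults:

- The dtype is inferred from the Python values, so a single float makes the whole tensor float32.
- `.to()` returns a new tensor rather than converting in place.
- Mixing an integer tensor with a Python float promotes the result to the default float type. This is float32 unless it has been changed with `torch.set_default_dtype`.

```python
import torch

ints = torch.tensor([1, 2, 3])
mixed = torch.tensor([1, 2.0, 3])   # one float is enough to make the tensor float
print(ints.dtype)    # torch.int64
print(mixed.dtype)   # torch.float32

floats = ints.to(torch.float32)     # returns a new tensor
print(ints.dtype)    # torch.int64 (original unchanged)
print(floats.dtype)  # torch.float32

# Type promotion: int64 tensor + Python float -> default float dtype
print((ints + 0.5).dtype)  # torch.float32
```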

Common Tensor Operations

# Access shape
print(tensor2d.shape)  # torch.Size([2, 2])

# Reshape
reshaped = tensor2d.reshape(4, 1)

# Transpose (.T is meant for 2D tensors; for higher ranks use .transpose(dim0, dim1))
transposed = tensor2d.T

# Matrix multiplication
result = tensor2d @ tensor2d.T
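Printing the results makes these operations concrete. `reshape` keeps the elements in row-major order and accepts `-1` to infer one dimension. For rank 3+ tensors, `.transpose(dim0, dim1)` names the two dimensions to swap.

```python
import torch

tensor2d = torch.tensor([[1, 2], [3, 4]])

print(tensor2d.reshape(4, 1).tolist())  # [[1], [2], [3], [4]]
print(tensor2d.reshape(-1).tolist())    # [1, 2, 3, 4]  (-1 = infer this size)
print(tensor2d.T.tolist())              # [[1, 3], [2, 4]]

# Entry (i, j) is row i of tensor2d dotted with column j of tensor2d.T
print((tensor2d @ tensor2d.T).tolist())  # [[5, 11], [11, 25]]

# Rank 3+: swap two named dimensions instead of using .T
batch = torch.zeros(8, 2, 3)             # e.g. a batch of 8 matrices of shape 2x3
print(batch.transpose(1, 2).shape)       # torch.Size([8, 3, 2])
```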

Why Tensors Matter:

Store input data, weights, and biases
Enable forward and backward passes in training
Support automatic gradient computation via autograd
Can be moved to GPU for acceleration
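Moving tensors to a GPU is a single `.to(device)` call. A common pattern is to pick the device at runtime so the same code also runs on machines without CUDA. This is a minimal sketch that assumes only that PyTorch is installed; it falls back to the CPU.

```python
import torch

# Use a CUDA GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

weights = torch.randn(3, 3).to(device)  # .to() returns a copy on the target device
x = torch.randn(3).to(device)

y = weights @ x                          # operands must live on the same device
print(y.device)
```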

Automatic Differentiation

Automatic differentiation (autograd) efficiently computes derivatives for optimization algorithms like gradient descent.

The Chain Rule

The chain rule is the mathematical foundation of autograd:

If y = f(u) and u = g(x), then:

dy/dx = dy/du * du/dx
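Here is a small worked example. Take u = g(x) = x² and y = f(u) = sin(u). The chain rule gives dy/dx = cos(u) · 2x = cos(x²) · 2x. The sketch below compares this hand-derived value with the one autograd computes. The function is an arbitrary choice for illustration.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)

u = x ** 2          # u = g(x)
y = torch.sin(u)    # y = f(u)
y.backward()        # autograd applies the chain rule

# Hand-derived: dy/dx = dy/du * du/dx = cos(u) * 2x
manual = torch.cos(x.detach() ** 2) * 2 * x.detach()

print(x.grad.item(), manual.item())
print(torch.allclose(x.grad, manual))  # True
```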

Computational Graph

Autograd builds a computational graph where:

Each node represents an operation or variable
Traversing the graph in reverse, from the output back to the inputs, computes all the needed derivatives in a single pass
The graph is built dynamically during the forward pass
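The graph autograd records can be inspected directly:

- Every tensor produced by a tracked operation has a `grad_fn` node.
- `grad_fn.next_functions` links back toward the inputs.
- Leaf tensors created with `requires_grad=True` end in `AccumulateGrad` nodes, which is where gradients are written to `.grad`.

The `graph_nodes` helper below is hypothetical, written only for this illustration. Exact node class names may vary between PyTorch versions.

```python
import torch

def graph_nodes(fn, depth=0, out=None):
    """Depth-first walk from a grad_fn back toward the leaves, collecting node names."""
    if out is None:
        out = []
    if fn is None:  # an input that does not require gradients
        return out
    out.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        graph_nodes(next_fn, depth + 1, out)
    return out

x = torch.tensor([1.1])
w = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

z = x * w + b                 # the graph is recorded as these lines run
a = torch.sigmoid(z)

print(x.grad_fn, w.is_leaf)   # None True
for line in graph_nodes(a.grad_fn):
    print(line)
```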

PyTorch Autograd Example

import torch
import torch.nn.functional as F

# Define inputs
x = torch.tensor([1.1])
y = torch.tensor([1.0])

# Initialize parameters with gradient tracking
w = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Forward pass
z = x * w + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

# Backward pass - computes gradients
loss.backward()

# Access gradients
print("Gradient w.r.t w:", w.grad)
print("Gradient w.r.t b:", b.grad)

Key Points:

Set requires_grad=True to track operations
Call .backward() to compute gradients
Gradients are stored in the .grad attribute and accumulate (add up) across .backward() calls until cleared
Autograd handles the chain rule automatically
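Because gradients accumulate, calling `.backward()` twice without clearing `.grad` adds the second result to the first. This is why training loops call `optimizer.zero_grad()` before each backward pass. A minimal demonstration with a single parameter:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

def loss_fn():
    return (3 * w) ** 2     # d/dw (3w)^2 = 18w = 36 at w = 2

loss_fn().backward()
print(w.grad)               # tensor([36.])

loss_fn().backward()        # no reset: the new gradient is added to the old one
print(w.grad)               # tensor([72.])

w.grad = None               # reset; optimizer.zero_grad() does the equivalent for each parameter
loss_fn().backward()
print(w.grad)               # tensor([36.])
```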

Backpropagation in Neural Networks

Backpropagation is reverse-mode automatic differentiation applied to multi-layer networks: gradients flow from the loss back through every layer.

The Training Loop

Initialize network parameters (weights and biases)
Forward Pass: Compute outputs by passing inputs through layers
Compute Loss: Evaluate difference between output and target
Backward Pass: Compute gradients using chain rule (backpropagation)
Update Parameters: Apply optimization algorithm (e.g., gradient descent)

Simple Neural Network Example

import torch
import torch.nn as nn
import torch.optim as optim

# Define network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)   # Input to hidden
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)    # Hidden to output
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        h = self.relu(self.fc1(x))
        y_hat = self.sigmoid(self.fc2(h))
        return y_hat

# Training setup
net = SimpleNet()
criterion = nn.BCELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# One training step
inputs = torch.randn(1, 10)
labels = torch.tensor([[1.0]])  # shape (1, 1) to match the output; BCELoss requires equal shapes

optimizer.zero_grad()          # Clear previous gradients
outputs = net(inputs)          # Forward pass
loss = criterion(outputs, labels)  # Compute loss
loss.backward()                # Backward pass
optimizer.step()               # Update parameters
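Plain SGD, with no momentum or weight decay as configured above, applies p ← p − lr · p.grad to every parameter. The sketch below checks that update by hand for one step. It uses a smaller, hypothetical one-layer model and keeps the label shaped like the output, as `nn.BCELoss` requires.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())  # hypothetical small model
criterion = nn.BCELoss()
lr = 0.01
optimizer = optim.SGD(model.parameters(), lr=lr)

inputs = torch.randn(1, 10)
labels = torch.tensor([[1.0]])          # same (1, 1) shape as the output

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()

# Compute the update we expect from plain SGD, before stepping
expected = [(p - lr * p.grad).detach().clone() for p in model.parameters()]
optimizer.step()

for p, e in zip(model.parameters(), expected):
    print(torch.allclose(p, e))         # True for weight and bias
```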

Understanding the Backward Pass

During loss.backward():

PyTorch traverses the computational graph in reverse
Applies the chain rule at each operation
Accumulates gradients in .grad for each parameter
Gradients are ready for optimizer to use
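One way to confirm this is to check `.grad` before and after. Before the first backward pass every parameter's `.grad` is `None`; afterwards each holds a tensor with the same shape as the parameter. This sketch uses a hypothetical two-layer network shaped like `SimpleNet`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())

print([p.grad for p in net.parameters()])  # [None, None, None, None]

loss = nn.BCELoss()(net(torch.randn(1, 10)), torch.tensor([[1.0]]))
loss.backward()   # walks the graph in reverse, filling .grad

for name, p in net.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))
```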

Advantages of Automatic Differentiation

Efficiency: Reuses intermediate results, avoids redundant calculations
Accuracy: Provides exact derivatives up to machine precision
Simplicity: Eliminates manual derivative computation
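The accuracy point shows up when autograd is compared with a finite-difference estimate. A finite difference only approximates the derivative and needs extra function evaluations for each input. The function and step size below are arbitrary choices for illustration, and float64 keeps rounding error small.

```python
import torch

def f(x):
    return torch.exp(x) * torch.sin(x)

x = torch.tensor(0.7, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

with torch.no_grad():
    # Exact derivative for reference: d/dx e^x sin x = e^x (sin x + cos x)
    exact = (torch.exp(x) * (torch.sin(x) + torch.cos(x))).item()

    # Central difference: two extra evaluations, truncation error ~ h^2
    h = 1e-4
    numeric = ((f(x + h) - f(x - h)) / (2 * h)).item()

print(abs(autograd_grad - exact))  # essentially 0 (rounding only)
print(abs(numeric - exact))        # small but nonzero approximation error
```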

Teaching Patterns

When Explaining Concepts

Start with intuition before diving into math
Use concrete examples with actual numbers when possible
Connect to real applications (e.g., "this is how ChatGPT learns")
Show code that demonstrates the concept
Explain the why: why this matters for LLMs

Common Questions to Anticipate

"What's the difference between pretraining and fine-tuning?"
"How many parameters does a typical LLM have?"
"Why do we need automatic differentiation?"
"What happens during backpropagation?"
"How do tensors relate to neural networks?"

When to Reference Scripts

For hands-on demonstrations, reference the bundled scripts:

scripts/tensor_demo.py - Tensor operations examples
scripts/autograd_demo.py - Automatic differentiation walkthrough
scripts/simple_net.py - Complete neural network training example

Quick Reference

Tensor Operations Cheat Sheet

| Operation | Code | Description |
|-----------|------|-------------|
| Create | torch.tensor([1,2,3]) | Create from list |
| Shape | .shape | Get dimensions |
| Reshape | .reshape(4,1) | Change shape |
| Transpose | .T | Swap dimensions (2D) |
| Multiply | @ or .matmul() | Matrix multiplication |
| Type | .dtype | Check data type |
| Convert | .to(torch.float32) | Change type |

Training Loop Template

# Assumes model, criterion, optimizer, inputs, labels, and epochs are defined beforehand
for epoch in range(epochs):
    optimizer.zero_grad()      # 1. Clear gradients
    outputs = model(inputs)    # 2. Forward pass
    loss = criterion(outputs, labels)  # 3. Compute loss
    loss.backward()            # 4. Backward pass
    optimizer.step()           # 5. Update parameters
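With concrete pieces filled in, the template becomes runnable. The sketch below trains a small logistic-regression-style model on synthetic data. The data, model size, learning rate and epoch count are all hypothetical choices made so that the loop has something to learn. Because this particular problem is convex and the step size is modest, the loss should go down across epochs.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic binary task: label is 1 when the features sum to a positive number
inputs = torch.randn(64, 10)
labels = (inputs.sum(dim=1, keepdim=True) > 0).float()  # shape (64, 1), matches outputs

model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.5)
epochs = 50

losses = []
for epoch in range(epochs):
    optimizer.zero_grad()               # 1. Clear gradients
    outputs = model(inputs)             # 2. Forward pass
    loss = criterion(outputs, labels)   # 3. Compute loss
    loss.backward()                     # 4. Backward pass
    optimizer.step()                    # 5. Update parameters
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```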

Key Takeaways

Pretraining = general language learning; Fine-tuning = task-specific adaptation
Parameters are the learnable weights that define the model
Tensors are the data containers for all neural network operations
Autograd automatically computes gradients via the chain rule
Backpropagation computes the gradients; the optimizer then uses them to update the parameters


Explain and teach Large Language Model fundamentals including pretraining, model architecture, PyTorch tensors, automatic differentiation, and backpropagation. Use this skill whenever the user asks about LLM concepts, neural network training, PyTorch operations, gradient computation, or wants to understand how LLMs work internally. Trigger on questions about model parameters, context length, embedding dimensions, tensor operations, autograd, or backpropagation.

5 stars

data-ai

Updated Apr 16, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add abelrguezr/hacktricks-skills llm-fundamentals

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%

Semgrep (static code analysis for vulnerabilities): Clean, 95%

mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%

Snyk (dep) (open source security scanning): Skipped, 50%

Socket.dev (supply chain security analysis): Skipped, 50%

VirusTotal (multi-engine malware detection): Skipped, 50%

CrowdStrike (advanced threat intelligence): Skipped, 50%

OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%

OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 16, 2026, 2:03 AM (62.9 s, 2 files scanned)


Related Skills

abelrguezr/house-of-lore-exploit

testing · Verified · Trusted · Community

How to perform a House of Lore (small bin attack) heap exploitation. Use this skill whenever the user mentions heap exploitation, small bin attacks, fake chunks, glibc heap vulnerabilities, or needs to insert fake chunks into small bins for arbitrary read/write. Trigger for CTF challenges involving heap corruption, glibc 2.31+ exploitation, or when the user needs to bypass malloc sanity checks using fake chunk linking.

5 stars · SKILL.md · Updated Apr 16, 2026

abelrguezr/house-of-force-exploit

testing · Verified · Trusted · Community

How to perform House of Force heap exploitation attacks. Use this skill whenever the user mentions heap exploitation, House of Force, top chunk manipulation, arbitrary memory allocation, malloc manipulation, or wants to allocate chunks at specific addresses. Also trigger for CTF challenges involving heap overflows, top chunk size overwrites, or when the user needs to calculate evil_size for heap attacks. Make sure to use this skill for any binary exploitation task involving glibc heap manipulation, even if they don't explicitly say "House of Force".

5 stars · SKILL.md · Updated Apr 16, 2026

abelrguezr/house-of-einherjar

tools · Verified · Trusted · Community

How to perform House of Einherjar heap exploitation to allocate memory at arbitrary addresses. Use this skill whenever the user mentions heap exploitation, glibc heap attacks, arbitrary memory allocation, off-by-one overflow exploitation, tcache poisoning, fast bin attacks, or any CTF challenge involving heap manipulation. This is essential for binary exploitation tasks where you need to control malloc() return addresses.

5 stars · SKILL.md · Updated Apr 16, 2026

abelrguezr/heap-overflow-exploitation

testing · Verified · Trusted · Community

How to identify, analyze, and exploit heap overflow vulnerabilities in binary exploitation challenges and real-world scenarios. Use this skill whenever the user mentions heap overflows, memory corruption, heap grooming, tcache poisoning, fast-bin attacks, or any heap-related vulnerability in CTF challenges, binary analysis, or security research. This skill covers heap overflow fundamentals, exploitation techniques, heap grooming strategies, and real-world CVE analysis.

5 stars · SKILL.md · Updated Apr 16, 2026


Install manually


# Clone the repo
git clone https://github.com/abelrguezr/hacktricks-skills.git

# Copy into Claude Code skills folder (global)
cp -r hacktricks-skills/skills/AI/AI-llm-architecture/0.-basic-llm-concepts ~/.claude/skills/


Repository: abelrguezr/hacktricks-skills (5 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT