skills/AI/AI-llm-architecture/0.-basic-llm-concepts/SKILL.md
Explain and teach Large Language Model fundamentals including pretraining, model architecture, PyTorch tensors, automatic differentiation, and backpropagation. Use this skill whenever the user asks about LLM concepts, neural network training, PyTorch operations, gradient computation, or wants to understand how LLMs work internally. Trigger on questions about model parameters, context length, embedding dimensions, tensor operations, autograd, or backpropagation.
A skill for explaining and teaching Large Language Model concepts, PyTorch operations, and neural network training fundamentals.
Use this skill when the user:
Pretraining is the foundational phase where an LLM learns language structure from vast text data. During pretraining:
Key point: Pretraining creates the general language understanding; fine-tuning adapts it to specific applications.
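As a rough illustration of what "learning language structure" means in practice, the sketch below assumes the usual next-token prediction objective, which the text above does not spell out. The token IDs and the tiny `context_length` are hypothetical values chosen only for readability.

```python
import torch

# Hypothetical token IDs standing in for a tokenized text snippet.
token_ids = torch.tensor([10, 20, 30, 40, 50, 60])
context_length = 4  # illustrative window size, far smaller than a real model's

# The inputs are a window of tokens; the targets are the same window shifted by one.
inputs = token_ids[:context_length]
targets = token_ids[1:context_length + 1]

# Each prefix of the input is paired with the token that follows it.
for i in range(context_length):
    print(inputs[: i + 1].tolist(), "->", targets[i].item())
```

Every position in the window yields one training example, which is part of why raw, unlabeled text is enough for pretraining.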
When discussing LLM configuration, explain these components:
| Component | Description | Typical Values |
|-----------|-------------|----------------|
| Parameters | Learnable weights and biases in the neural network | Millions to billions |
| Context Length | Maximum sequence length the model can process | 512 to 32K+ tokens |
| Embedding Dimension | Size of vector representing each token | 768 to 16K+ |
| Hidden Dimension | Size of hidden layers in the network | Matches embedding dimension |
| Number of Layers | Depth of the network (transformer blocks) | 12 to 100+ |
| Attention Heads | Parallel attention mechanisms per layer | 12 to 128+ |
| Dropout | Percentage of neurons randomly disabled during training | 0-20% |
Example GPT-2 Configuration:
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE tokenizer vocabulary
    "context_length": 1024,  # Max sequence length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Attention heads per layer
    "n_layers": 12,          # Number of transformer layers
    "drop_rate": 0.1,        # 10% dropout
    "qkv_bias": False,       # No bias in QKV projections
}
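To connect the configuration to the "124M" in its name, here is a back-of-the-envelope parameter count. It is a sketch under assumptions the configuration itself does not state: learned positional embeddings, a feed-forward layer four times wider than `emb_dim`, LayerNorm with scale and shift, and an output head that shares weights with the token embedding. The function name `estimate_gpt_params` is hypothetical.

```python
GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}


def estimate_gpt_params(cfg, ffn_multiplier=4, tie_output_head=True):
    """Rough parameter count for a GPT-2-style model (assumed layout, see lead-in)."""
    d = cfg["emb_dim"]
    hidden = ffn_multiplier * d

    tok_emb = cfg["vocab_size"] * d          # token embedding table
    pos_emb = cfg["context_length"] * d      # learned positional embeddings

    qkv = 3 * d * d + (3 * d if cfg["qkv_bias"] else 0)  # query/key/value projections
    attn_out = d * d + d                                  # attention output projection
    ffn = (d * hidden + hidden) + (hidden * d + d)        # two linear layers
    layer_norms = 2 * (2 * d)                             # two LayerNorms per block
    per_layer = qkv + attn_out + ffn + layer_norms

    final_norm = 2 * d
    out_head = 0 if tie_output_head else cfg["vocab_size"] * d

    return tok_emb + pos_emb + cfg["n_layers"] * per_layer + final_norm + out_head


total = estimate_gpt_params(GPT_CONFIG_124M)
print(f"{total:,}")  # 124,412,160
```

Note that the number of attention heads does not affect the count: the heads split the same `emb_dim`-wide projections rather than adding new ones.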
Tensors are multi-dimensional arrays that serve as the fundamental data structure in PyTorch.
- Scalar (0D): 5
- Vector (1D): [5, 1]
- Matrix (2D): [[1,3], [5,2]]

import torch
# Scalar (0D)
tensor0d = torch.tensor(1)
# Vector (1D)
tensor1d = torch.tensor([1, 2, 3])
# Matrix (2D)
tensor2d = torch.tensor([[1, 2], [3, 4]])
# 3D Tensor
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
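- Python integers default to torch.int64
- Python floats default to torch.float32
- Check a tensor's type with .dtype
- Convert types with .to()

tensor1d = torch.tensor([1, 2, 3])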
print(tensor1d.dtype) # torch.int64
float_tensor = tensor1d.to(torch.float32)
print(float_tensor.dtype) # torch.float32
# Access shape
print(tensor2d.shape) # torch.Size([2, 2])
# Reshape
reshaped = tensor2d.reshape(4, 1)
# Transpose (2D only)
transposed = tensor2d.T
# Matrix multiplication
result = tensor2d @ tensor2d.T
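The operations above follow a few shape and dtype rules that are easier to see on a non-square tensor. The following is a minimal sketch with made-up values; the variable names are illustrative only.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3), dtype int64
b = a.T                                    # shape (3, 2)

# Matrix multiplication needs matching inner dimensions: (2, 3) @ (3, 2) -> (2, 2)
product = a @ b
print(product.shape)   # torch.Size([2, 2])

# reshape(-1) lets PyTorch infer the size, here flattening to 6 elements
flat = a.reshape(-1)
print(flat.shape)      # torch.Size([6])

# Adding a Python float to an integer tensor promotes the result to float32
mixed = a + 0.5
print(mixed.dtype)     # torch.float32
```

Checking `.shape` and `.dtype` after each step like this is a cheap way to catch mismatches before they surface as errors deeper in a model.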
Why Tensors Matter:
Automatic differentiation (autograd) efficiently computes derivatives for optimization algorithms like gradient descent.
The chain rule is the mathematical foundation of autograd:
If y = f(u) and u = g(x), then:
dy/dx = dy/du * du/dx
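The chain rule above can be checked numerically with a small example. This sketch picks the hypothetical functions u = 3x + 1 and y = u², so that dy/du = 2u and du/dx = 3, and compares the hand-computed product with what autograd reports.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

u = 3 * x + 1   # u = g(x), so du/dx = 3
y = u ** 2      # y = f(u), so dy/du = 2u

y.backward()    # autograd applies the chain rule for us

manual = 2 * u.item() * 3   # dy/du * du/dx = 2 * 7 * 3
print(x.grad.item(), manual)  # 42.0 42.0
```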
Autograd builds a computational graph where:
import torch
import torch.nn.functional as F
# Define inputs
x = torch.tensor([1.1])
y = torch.tensor([1.0])
# Initialize parameters with gradient tracking
w = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
# Forward pass
z = x * w + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)
# Backward pass - computes gradients
loss.backward()
# Access gradients
print("Gradient w.r.t w:", w.grad)
print("Gradient w.r.t b:", b.grad)
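As a sanity check on the example above, the gradients can also be derived by hand. For a sigmoid followed by binary cross-entropy, the chain rule simplifies to dL/dz = a - y, so dL/dw = (a - y) * x and dL/db = a - y. The sketch below recomputes the same forward pass and compares the two; the names `manual_w` and `manual_b` are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.1])
y = torch.tensor([1.0])
w = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Same forward and backward pass as in the example above
a = torch.sigmoid(x * w + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()

# Hand-derived gradients, computed outside the autograd graph
with torch.no_grad():
    dz = a - y          # dL/dz for sigmoid + binary cross-entropy
    manual_w = dz * x   # dz/dw = x
    manual_b = dz       # dz/db = 1

print(torch.allclose(w.grad, manual_w, atol=1e-6))
print(torch.allclose(b.grad, manual_b, atol=1e-6))
```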
Key Points:
- Set requires_grad=True to track operations
- Call .backward() to compute gradients
- Access gradients via the .grad attribute

Backpropagation extends automatic differentiation to multi-layer networks.
import torch
import torch.nn as nn
import torch.optim as optim
# Define network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)  # Input to hidden
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)   # Hidden to output
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        h = self.relu(self.fc1(x))
        y_hat = self.sigmoid(self.fc2(h))
        return y_hat
# Training setup
net = SimpleNet()
criterion = nn.BCELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)
# Training loop
inputs = torch.randn(1, 10)
labels = torch.tensor([[1.0]])  # Shape (1, 1) to match the network output; BCELoss requires equal shapes
optimizer.zero_grad() # Clear previous gradients
outputs = net(inputs) # Forward pass
loss = criterion(outputs, labels) # Compute loss
loss.backward() # Backward pass
optimizer.step() # Update parameters
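To make the effect of the backward pass concrete, the sketch below inspects each parameter's `.grad` before and after calling `loss.backward()`. It uses an `nn.Sequential` model with the same layer sizes as the network above purely for brevity; the batch of four random inputs is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as the network above, written as nn.Sequential for brevity
net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())
inputs = torch.randn(4, 10)
labels = torch.ones(4, 1)

# Before any backward pass, no gradients have been stored
grads_before = [p.grad for p in net.parameters()]
print(all(g is None for g in grads_before))  # True

loss = nn.BCELoss()(net(inputs), labels)
loss.backward()

# After the backward pass, each parameter holds a gradient of its own shape
for name, p in net.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))
```

Because every gradient has the same shape as its parameter, the optimizer can update each weight element by element.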
During loss.backward():
- Gradients are stored in .grad for each parameter

For hands-on demonstrations, reference the bundled scripts:
- scripts/tensor_demo.py - Tensor operations examples
- scripts/autograd_demo.py - Automatic differentiation walkthrough
- scripts/simple_net.py - Complete neural network training example

| Operation | Code | Description |
|-----------|------|-------------|
| Create | torch.tensor([1,2,3]) | Create from list |
| Shape | .shape | Get dimensions |
| Reshape | .reshape(4,1) | Change shape |
| Transpose | .T | Swap dimensions (2D) |
| Multiply | @ or .matmul() | Matrix multiplication |
| Type | .dtype | Check data type |
| Convert | .to(torch.float32) | Change type |
for epoch in range(epochs):
    optimizer.zero_grad()              # 1. Clear gradients
    outputs = model(inputs)            # 2. Forward pass
    loss = criterion(outputs, labels)  # 3. Compute loss
    loss.backward()                    # 4. Backward pass
    optimizer.step()                   # 5. Update parameters
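The template above leaves `model`, `inputs`, `labels`, `criterion`, `optimizer`, and `epochs` undefined. Below is one way to fill them in, offered as a sketch rather than the bundled script's implementation: a toy dataset of 64 random 2D points labeled by whether their coordinates sum to a positive number, and a single-layer model trained with full-batch SGD. All values are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data (hypothetical): label is 1 when the two features sum to a positive number
inputs = torch.randn(64, 2)
labels = (inputs.sum(dim=1, keepdim=True) > 0).float()

model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epochs = 100
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()              # 1. Clear gradients
    outputs = model(inputs)            # 2. Forward pass
    loss = criterion(outputs, labels)  # 3. Compute loss
    loss.backward()                    # 4. Backward pass
    optimizer.step()                   # 5. Update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Recording the loss at every epoch makes it easy to confirm that training is actually reducing it.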