abelrguezr/token-embeddings

Name: token-embeddings
Author: abelrguezr

skills/AI/AI-llm-architecture/3.-token-embeddings/SKILL.md

npx skillsauth add abelrguezr/hacktricks-skills token-embeddings

Token Embeddings Skill

This skill helps you create, understand, and work with token embeddings for large language models.

What This Skill Does

Explains token embedding concepts and initialization
Creates PyTorch embedding layers for your vocabulary
Adds positional embeddings (absolute, relative, RoPE)
Helps debug embedding-related issues
Provides code templates for common embedding tasks

When to Use This Skill

Use this skill when you:

Need to create token embeddings for a new vocabulary
Want to understand how embeddings work in your model
Need to add positional information to your embeddings
Are debugging embedding dimension mismatches
Want to extend context windows in RoPE-based models
Need to implement or understand different positional encoding strategies

Core Concepts

Token Embeddings

Token embeddings convert discrete tokens into continuous vectors. Each token ID indexes one row of a learned weight matrix of shape [vocab_size, embedding_dim], so every token maps to a vector of the same fixed length.

Key parameters:

vocab_size: Number of unique tokens (e.g., 50257 for GPT-2's BPE tokenizer)
embedding_dim: Length of each embedding vector (e.g., 256, 512, 768)

Example:

Vocabulary: [1, 2, 3, 4, 5, 6] (6 tokens)
Embedding dim: 3
Token 3 → [-0.4015, 0.9666, -1.1481]
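As a dependency-free illustration of the lookup above, the sketch below treats the embedding layer as a plain table indexed by token ID. The helper names `make_embedding_table` and `embed` are hypothetical, IDs are assumed to be 0-based, and the random values only stand in for weights that training would later adjust.

```python
import random

def make_embedding_table(vocab_size: int, embedding_dim: int, seed: int = 0) -> list[list[float]]:
    """Build a vocab_size x embedding_dim table of random starting values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)] for _ in range(vocab_size)]

def embed(token_ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Look up one row per token ID; no arithmetic is involved."""
    return [table[t] for t in token_ids]

table = make_embedding_table(vocab_size=6, embedding_dim=3)
vectors = embed([3, 1, 3], table)
print(len(vectors), len(vectors[0]))  # 3 3
print(vectors[0] == vectors[2])       # True: the same ID always returns the same row
```

torch.nn.Embedding performs the same row lookup on a trainable weight matrix.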

Positional Embeddings

Positional embeddings encode token positions in sequences. Without them, self-attention has no notion of order, so the model treats the sequence as a "bag of words."

Types:

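Absolute: One vector per position index, either learned (GPT-2, BERT) or fixed sinusoidal (original Transformer)
Relative: Encodes the distance between tokens rather than their absolute index (Transformer-XL, T5, DeBERTa)
RoPE: Rotary embeddings that rotate query/key vectors by a position-dependent angle (LLaMA and many recent decoder-only LLMs)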
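To make the "bag of words" point concrete, here is a small sketch with made-up two-dimensional vectors (the `tok` and `pos` tables and the `inputs` helper are illustrative assumptions, not values from any model). Without position vectors, two sentences with the same words in a different order present the model with the same collection of input vectors; adding a position vector to each token makes the two collections differ.

```python
# Hypothetical token and position vectors, chosen only for illustration.
tok = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}
pos = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]

def inputs(words, use_positions):
    """Return the input vector for each word, optionally adding its position vector."""
    out = []
    for i, w in enumerate(words):
        v = tok[w]
        if use_positions:
            v = [a + b for a, b in zip(v, pos[i])]
        out.append(tuple(v))
    return out

a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

# Sorting ignores order, so this compares the two sentences as unordered collections.
same_without = sorted(inputs(a, False)) == sorted(inputs(b, False))
same_with = sorted(inputs(a, True)) == sorted(inputs(b, True))
print(same_without)  # True: indistinguishable without positions
print(same_with)     # False: positions make the inputs differ
```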

Quick Start

Create Basic Token Embeddings

import torch

vocab_size = 50257  # GPT-2 BPE vocabulary size
embedding_dim = 256

token_embedding = torch.nn.Embedding(vocab_size, embedding_dim)

Add Absolute Positional Embeddings

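context_length = 512
pos_embedding = torch.nn.Embedding(context_length, embedding_dim)

# Example batch of token IDs (random here; use your tokenizer's output in practice)
token_ids = torch.randint(0, vocab_size, (2, 8))  # [batch, seq_len]
seq_len = token_ids.shape[1]

# Combine embeddings
token_emb = token_embedding(token_ids)  # [batch, seq_len, dim]
pos_emb = pos_embedding(torch.arange(seq_len))  # [seq_len, dim]
combined = token_emb + pos_emb  # broadcasts over batch: [batch, seq_len, dim]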

RoPE (Rotary Positional Embeddings)

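Most recent decoder-only LLMs use RoPE, which rotates query and key vectors by a position-dependent angle instead of adding a position vector to the input:

import torch

def rotate_half(x):
    # Split the last dimension in two halves and map (x1, x2) to (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    """Apply rotary positional embeddings; cos and sin must broadcast against q and k."""
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed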
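The property that makes this rotation useful is that the dot product between a rotated query and a rotated key depends only on the distance between their positions. The sketch below checks this numerically in plain Python for a single 4-dimensional vector. It assumes the half-split pairing (element i paired with element i + d/2) and the frequency schedule base^(-2i/d) with base 10000; the function name `rope` and the sample vectors are hypothetical.

```python
import math

def rope(x, position, base=10000.0):
    """Rotate each (x[i], x[i + d/2]) pair by an angle proportional to position."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (2) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 105), rope(k, 103))
print(abs(score_near - score_far) < 1e-9)  # True: only the offset matters
```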

Common Tasks

Task 1: Initialize Embedding Layer

import torch

def create_token_embeddings(vocab_size: int, embedding_dim: int) -> torch.nn.Embedding:
    """Create a token embedding layer."""
    return torch.nn.Embedding(vocab_size, embedding_dim)

# Usage
embedding_layer = create_token_embeddings(50257, 256)
print(embedding_layer.weight.shape)  # torch.Size([50257, 256])
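The weight shape also determines how many parameters the layer adds to the model. A quick back-of-the-envelope sketch follows; the helper name `embedding_parameter_count` is hypothetical, and the byte figure assumes 4-byte float32 weights.

```python
def embedding_parameter_count(vocab_size: int, embedding_dim: int, context_length: int = 0) -> int:
    """Parameters in a token table plus an optional learned absolute position table."""
    return vocab_size * embedding_dim + context_length * embedding_dim

token_only = embedding_parameter_count(50257, 256)
with_positions = embedding_parameter_count(50257, 256, context_length=512)
print(token_only)       # 12865792
print(with_positions)   # 12996864
print(token_only * 4)   # 51463168 bytes at float32
```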

Task 2: Create Positional Embeddings

def create_positional_embeddings(context_length: int, embedding_dim: int) -> torch.nn.Embedding:
    """Create absolute positional embeddings."""
    return torch.nn.Embedding(context_length, embedding_dim)

# Usage
pos_layer = create_positional_embeddings(512, 256)
pos_embeddings = pos_layer(torch.arange(512))
print(pos_embeddings.shape)  # torch.Size([512, 256])

Task 3: Combine Token and Positional Embeddings

def combine_embeddings(
    token_ids: torch.Tensor,
    token_embedding: torch.nn.Embedding,
    pos_embedding: torch.nn.Embedding
) -> torch.Tensor:
    """Combine token and positional embeddings.
    
    Args:
        token_ids: [batch_size, seq_len]
        token_embedding: Token embedding layer
        pos_embedding: Positional embedding layer
    
    Returns:
        Combined embeddings: [batch_size, seq_len, embedding_dim]
    """
    batch_size, seq_len = token_ids.shape
    
    # Get token embeddings
    token_emb = token_embedding(token_ids)  # [batch, seq_len, dim]
    
    # Get positional embeddings
    positions = torch.arange(seq_len, device=token_ids.device).expand(batch_size, -1)
    pos_emb = pos_embedding(positions)  # [batch, seq_len, dim]
    
    # Combine
    return token_emb + pos_emb

Task 4: Position Interpolation for Extended Context

def position_interpolation(
    pos_ids: torch.Tensor,
    original_context: int,
    new_context: int
) -> torch.Tensor:
    """Scale position indices for context window extension in RoPE models.
    
    Positions are returned as floats. Rounding them to integers would map
    several positions onto the same index and lose ordering information.
    
    Args:
        pos_ids: Original position indices
        original_context: Training context length (e.g., 2048)
        new_context: Target context length (e.g., 8192)
    
    Returns:
        Fractional positions within the trained range, for computing RoPE angles
    """
    scale = original_context / new_context
    return pos_ids.float() * scale

# Usage
original_ctx = 2048
new_ctx = 8192
positions = torch.arange(8192)
scaled_positions = position_interpolation(positions, original_ctx, new_ctx)
# scaled_positions runs 0, 0.25, 0.5, ... up to 2047.75
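The arithmetic behind the scaling step can be checked without PyTorch. This is a minimal sketch, assuming the scaled positions are kept fractional and fed into the RoPE angle computation; the function name `interpolate_positions` is hypothetical. It also shows what truncating to integers would do: many positions collapse onto the same index.

```python
def interpolate_positions(seq_len: int, original_context: int, new_context: int) -> list[float]:
    """Compress positions 0..seq_len-1 into the range seen during training."""
    scale = original_context / new_context
    return [p * scale for p in range(seq_len)]

scaled = interpolate_positions(8192, original_context=2048, new_context=8192)
print(scaled[:5])                        # [0.0, 0.25, 0.5, 0.75, 1.0]
print(max(scaled))                       # 2047.75, still inside the trained range
print(len(set(scaled)) == len(scaled))   # True: every position stays distinct

truncated = [int(p) for p in scaled]
print(len(set(truncated)))               # 2048: four positions share each integer index
```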

Debugging Checklist

When embeddings aren't working correctly, check:

Dimension Mismatches

# Sequence length and embedding dim must agree (the batch dimension may broadcast)
assert token_emb.shape[-2:] == pos_emb.shape[-2:], "Sequence length or embedding dim mismatch"

Vocabulary Size

# Ensure vocab_size matches your tokenizer
max_token_id = token_ids.max().item()
assert max_token_id < vocab_size, f"Token ID {max_token_id} is out of range for vocab_size {vocab_size}"

Context Length

# Ensure sequence doesn't exceed context length
seq_len = token_ids.shape[1]
assert seq_len <= context_length, f"Sequence {seq_len} exceeds context {context_length}"

Gradient Flow

# Verify embeddings are trainable
assert token_embedding.weight.requires_grad, "Embeddings should be trainable"
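The vocabulary and context-length checks can also be run on plain Python lists before any tensors are built, which gives a readable error instead of an index error inside the embedding lookup. The helper below is a hypothetical sketch (the name `validate_token_batch` and its message wording are not part of any library).

```python
def validate_token_batch(token_ids, vocab_size, context_length):
    """Return a list of problems found; an empty list means the batch looks valid."""
    problems = []
    lengths = {len(row) for row in token_ids}
    if len(lengths) > 1:
        problems.append(f"rows have different lengths: {sorted(lengths)}")
    if lengths and max(lengths) > context_length:
        problems.append(f"sequence length {max(lengths)} exceeds context_length {context_length}")
    flat = [t for row in token_ids for t in row]
    if flat and (min(flat) < 0 or max(flat) >= vocab_size):
        problems.append(f"token IDs span [{min(flat)}, {max(flat)}], outside [0, {vocab_size - 1}]")
    return problems

print(validate_token_batch([[1, 2, 3], [4, 5, 6]], vocab_size=10, context_length=8))
# []
print(validate_token_batch([[1, 2, 3], [4, 5, 10]], vocab_size=10, context_length=8))
# ['token IDs span [1, 10], outside [0, 9]']
```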

Best Practices

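Embedding Dimensions: Prefer sizes that are multiples of 64 (e.g., 256, 512, 768, 1024), which tend to map well onto GPU kernels
Initialization: nn.Embedding initializes weights from a standard normal distribution N(0, 1) by default (not Xavier); many LLM implementations re-initialize with a smaller standard deviation such as 0.02
Positional Encoding: RoPE is the common choice in recent decoder-only models; learned absolute embeddings are used by GPT-2 and BERT
Context Extension: For RoPE models, apply position interpolation and then fine-tune briefly at the longer context
Batch Processing: Process sequences in batches where possible for efficiency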

Example: Complete Embedding Setup

import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int, context_length: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, embedding_dim)
        self.pos_embedding = nn.Embedding(context_length, embedding_dim)
        self.context_length = context_length
    
    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        batch_size, seq_len = token_ids.shape
        
        # Token embeddings
        token_emb = self.token_embedding(token_ids)
        
        # Positional embeddings
        positions = torch.arange(seq_len, device=token_ids.device).expand(batch_size, -1)
        pos_emb = self.pos_embedding(positions)
        
        # Combine
        return token_emb + pos_emb

# Usage
vocab_size = 50257
embedding_dim = 256
context_length = 512

embedding_model = TokenEmbedding(vocab_size, embedding_dim, context_length)

# Test with sample input
batch_size = 8
seq_len = 4
token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))

output = embedding_model(token_ids)
print(output.shape)  # torch.Size([8, 4, 256])

References

Build a Large Language Model from Scratch
RoPE Paper
YaRN: Context Extension
Position Interpolation

Create and work with token embeddings for LLMs. Use this skill whenever you need to understand token embeddings, create embedding layers in PyTorch, add positional embeddings (absolute, relative, or RoPE), or debug embedding-related issues in your language model. This skill covers vocabulary setup, embedding initialization, positional encoding strategies, and context window extension techniques. Make sure to use this skill when working with any LLM architecture, training pipelines, or when you need to convert tokens to numerical vectors.

5 stars

tools

Updated Apr 16, 2026

npx skillsauth add abelrguezr/hacktricks-skills token-embeddings

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 16, 2026, 2:06 AM. Scan time: 167.7s. Files scanned: 2.

Install manually

# Clone the repo
git clone https://github.com/abelrguezr/hacktricks-skills.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r hacktricks-skills/skills/AI/AI-llm-architecture/3.-token-embeddings ~/.claude/skills/

Repository

abelrguezr/hacktricks-skills

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT