ADu2021/cartridges-long-context

Name: cartridges-long-context
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/cartridges-long-context/SKILL.md

npx skillsauth add ADu2021/skillXiv cartridges-long-context

Cartridges: Lightweight Long Context Representations

Core Concept

Cartridges are small KV caches trained offline to encode a large text corpus in a memory-efficient, reusable form. Rather than loading entire documents into the context window at inference time, users train a Cartridge once per corpus via "self-study" and then reuse it across many queries. Multiple Cartridges can also be composed without retraining.

Architecture Overview

Pre-training via self-study: Combines synthetic conversation generation with context-distillation training
Lightweight KV cache encoding: Stores corpus knowledge in a compact, trainable KV cache whose size is set independently of corpus length
Composability: Multiple trained Cartridges combine at inference without additional training
Efficiency: 38.6x less memory and 26.4x higher throughput than in-context learning

Implementation

Step 1: Generate Synthetic Conversations

Create training data by generating model conversations about corpus content:

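# `base_model` is a hypothetical wrapper; extract_summary, generate_question,
# and generate_answer stand in for prompting calls on your own inference stack.
class CartridgePretrainer:
    def __init__(self, base_model, corpus_documents: list):
        self.model = base_model
        # Each document is a dict such as {"id": "doc-0", "text": "..."}.
        self.corpus = corpus_documents

    def generate_synthetic_conversations(self,
                                        num_conversations: int = 1000
                                        ) -> list:
        """Generate synthetic QA pairs about corpus content."""
        if not self.corpus:
            return []
        conversations = []
        # At least one pair per document, even when num_conversations is
        # smaller than the number of documents.
        per_doc = max(1, num_conversations // len(self.corpus))

        for doc in self.corpus:
            text = doc["text"]
            # Extract key content from the document to seed questions
            doc_summary = self.model.extract_summary(text)

            # Generate multiple question-answer pairs; each answer is produced
            # with the document text in context, so it reflects the corpus.
            for _ in range(per_doc):
                question = self.model.generate_question(doc_summary)
                answer = self.model.generate_answer(
                    question,
                    text,
                    context_length=4096
                )

                conversations.append({
                    "corpus_context": text,
                    "question": question,
                    "answer": answer,
                    "doc_id": doc.get("id")
                })

        return conversations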
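The class above draws one summary per document. A variant closer in spirit to self-study samples a random window of the corpus and a seed prompt for each conversation, so long corpora are covered without fitting whole documents in one prompt. This is a minimal sketch under stated assumptions: the seed prompts, the whitespace tokenization, and the `generate(context, instruction)` callable are all hypothetical, and a stub replaces the real model.

```python
import random
from typing import Callable, Dict, List

# Illustrative seed prompts (assumed wording, not taken from the paper).
SEED_PROMPTS = [
    "Ask a question that requires recalling a specific detail from the text.",
    "Ask for a summary of one part of the text.",
    "Ask a question that connects two different parts of the text.",
]


def sample_chunk(tokens: List[str], chunk_len: int,
                 rng: random.Random) -> List[str]:
    """Return a random contiguous window of at most `chunk_len` tokens."""
    if len(tokens) <= chunk_len:
        return list(tokens)
    start = rng.randrange(len(tokens) - chunk_len + 1)
    return tokens[start:start + chunk_len]


def self_study(corpus_tokens: List[str],
               generate: Callable[[str, str], str],
               num_convs: int,
               chunk_len: int = 512,
               seed: int = 0) -> List[Dict[str, str]]:
    """Build synthetic conversations from random corpus chunks.

    `generate(context, instruction)` is a hypothetical callable that runs the
    model with `context` in its prompt and returns the completion.
    """
    rng = random.Random(seed)
    conversations = []
    for _ in range(num_convs):
        context = " ".join(sample_chunk(corpus_tokens, chunk_len, rng))
        # One call writes a question from a seed prompt, a second answers it;
        # both see the same corpus chunk.
        question = generate(context, rng.choice(SEED_PROMPTS))
        answer = generate(context, question)
        conversations.append({"corpus_context": context,
                              "question": question,
                              "answer": answer})
    return conversations


# Usage with a stub generator standing in for a real model.
def fake_generate(context: str, instruction: str) -> str:
    return f"[{len(context.split())} words of context] {instruction[:24]}"


words = ("Cartridges are trained KV caches " * 50).split()
convs = self_study(words, fake_generate, num_convs=4, chunk_len=64)
```

Each record uses the same keys as the training step expects, so the output can be fed to the distillation trainer unchanged.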

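Step 2: Train with Context Distillation

Distill corpus knowledge into the Cartridge's KV cache using the synthetic conversations. The base model stays frozen; only the Cartridge tensors are optimized, so that the model reading the Cartridge reproduces the predictions it makes with the corpus in context: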

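import torch
import torch.nn.functional as F


# `model` is a hypothetical wrapper exposing tokenize() -> 1-D LongTensor,
# forward_and_cache(tokens) -> {layer: {"key": T, "value": T}}, and
# forward_with_kv_cache(tokens, kv_cache) -> logits of shape [len(tokens), vocab].
class ContextDistillationTrainer:
    def __init__(self, model, init_text: str,
                 batch_size: int = 32, lr: float = 1e-4):
        self.model = model
        self.batch_size = batch_size

        # Freeze the base model: only the Cartridge (a KV cache) is trained.
        for p in model.parameters():
            p.requires_grad_(False)

        # Initialize the Cartridge from the KV cache of a short text
        # (for example, the first tokens of the corpus), then make it trainable.
        with torch.no_grad():
            init_cache = model.forward_and_cache(model.tokenize(init_text))
        self.cartridge = {
            layer: {name: t.clone().requires_grad_(True)
                    for name, t in kv.items()}
            for layer, kv in init_cache.items()
        }
        params = [t for kv in self.cartridge.values() for t in kv.values()]
        self.optimizer = torch.optim.Adam(params, lr=lr)

    def compute_kv_cache_loss(self, corpus_text: str,
                             question: str,
                             answer: str) -> torch.Tensor:
        """KL(teacher || student) on the answer tokens.

        Teacher: frozen model with the corpus text in context.
        Student: frozen model reading the trainable Cartridge instead.
        """
        question_tokens = self.model.tokenize(question)
        answer_tokens = self.model.tokenize(answer)
        seq = torch.cat([question_tokens, answer_tokens])

        # Teacher pass without gradients. Recomputed per example for clarity;
        # cache the corpus KV per document in practice.
        with torch.no_grad():
            corpus_kv = self.model.forward_and_cache(
                self.model.tokenize(corpus_text)
            )
            teacher_logits = self.model.forward_with_kv_cache(seq, corpus_kv)

        # Student pass: gradients flow into the Cartridge tensors only.
        student_logits = self.model.forward_with_kv_cache(seq, self.cartridge)

        # Logits at position t predict token t + 1, so the answer tokens are
        # predicted by positions len(question) - 1 ... len(seq) - 2.
        start = question_tokens.numel() - 1
        positions = slice(start, start + answer_tokens.numel())
        teacher_logp = F.log_softmax(teacher_logits[positions], dim=-1)
        student_logp = F.log_softmax(student_logits[positions], dim=-1)

        # kl_div(input, target) with log_target=True computes
        # sum(exp(target) * (target - input)), i.e. KL(teacher || student).
        return F.kl_div(student_logp, teacher_logp,
                        log_target=True, reduction="batchmean")

    def train_epoch(self, conversations: list) -> float:
        """Train one epoch on synthetic conversations; return mean batch loss."""
        total_loss = 0.0
        num_batches = 0

        for i in range(0, len(conversations), self.batch_size):
            batch = conversations[i:i + self.batch_size]

            self.optimizer.zero_grad()
            batch_loss = sum(
                self.compute_kv_cache_loss(
                    conv["corpus_context"],
                    conv["question"],
                    conv["answer"]
                )
                for conv in batch
            ) / len(batch)
            batch_loss.backward()
            self.optimizer.step()

            total_loss += batch_loss.item()
            num_batches += 1

        return total_loss / max(num_batches, 1)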
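Context distillation compares, at every position whose next token belongs to the answer, the teacher's distribution (corpus in context) with the student's (Cartridge in context). The one-position shift between logits and targets is easy to get wrong, so here is a dependency-free sketch of that per-position KL objective on plain lists of logits; the function names are hypothetical and no real model is involved.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl(teacher_logits: List[float], student_logits: List[float]) -> float:
    """KL(teacher || student) at a single position."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(ti) * (ti - si) for ti, si in zip(t, s))


def distillation_loss(teacher: List[List[float]],
                      student: List[List[float]],
                      prompt_len: int,
                      answer_len: int) -> float:
    """Mean KL over the positions that predict answer tokens.

    teacher/student hold one logit vector per position of prompt + answer.
    Position t predicts token t + 1, so answer tokens are predicted by
    positions prompt_len - 1 .. prompt_len + answer_len - 2.
    """
    start = prompt_len - 1
    positions = range(start, start + answer_len)
    return sum(kl(teacher[p], student[p]) for p in positions) / answer_len


# prompt_len=2, answer_len=3 -> 5 positions of 3-way logits.
teacher = [[2.0, 0.0, -1.0] for _ in range(5)]
student_same = [row[:] for row in teacher]
student_diff = [[0.0, 0.0, 0.0] for _ in range(5)]
print(distillation_loss(teacher, student_same, prompt_len=2, answer_len=3))  # 0.0
```

Positions outside the answer window (the first prompt position and the final position, which predicts nothing) do not contribute to the loss.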

Step 3: Store and Compose Cartridges

Save trained KV caches and compose them at inference:

import os

import torch


class CartridgeManager:
    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        os.makedirs(storage_path, exist_ok=True)
        self.cartridges = {}

    def _path(self, corpus_id: str) -> str:
        return os.path.join(self.storage_path, f"{corpus_id}.pt")

    def save_cartridge(self, corpus_id: str,
                      kv_cache: dict,
                      metadata: dict):
        """Save trained Cartridge with metadata."""
        cartridge = {
            # Detach from the autograd graph and move to CPU before saving.
            "kv_cache": {
                layer: {name: t.detach().cpu() for name, t in kv.items()}
                for layer, kv in kv_cache.items()
            },
            "corpus_id": corpus_id,
            "corpus_summary": metadata.get("summary"),
            "doc_count": metadata.get("doc_count"),
            "token_count": metadata.get("token_count")
        }

        torch.save(cartridge, self._path(corpus_id))
        self.cartridges[corpus_id] = cartridge

    def compose_cartridges(self, cartridge_ids: list) -> dict:
        """Combine multiple Cartridges at inference."""
        if not cartridge_ids:
            raise ValueError("cartridge_ids must not be empty")
        composed_kv = None
        metadata_list = []

        for cart_id in cartridge_ids:
            # Reuse an in-memory copy when available; otherwise load from disk.
            cartridge = self.cartridges.get(cart_id)
            if cartridge is None:
                cartridge = torch.load(self._path(cart_id),
                                       map_location="cpu")
                self.cartridges[cart_id] = cartridge
            metadata_list.append({
                "corpus_id": cart_id,
                "summary": cartridge["corpus_summary"]
            })

            # Merge KV caches (concatenate along sequence dimension)
            if composed_kv is None:
                composed_kv = cartridge["kv_cache"]
            else:
                composed_kv = self._merge_kv_caches(
                    composed_kv,
                    cartridge["kv_cache"]
                )

        return {
            "combined_kv_cache": composed_kv,
            "source_cartridges": metadata_list
        }

    def _merge_kv_caches(self, kv1: dict, kv2: dict) -> dict:
        """Concatenate KV caches along the sequence dimension.

        Assumes tensors shaped [..., seq_len, head_dim] (for example
        [num_kv_heads, seq_len, head_dim]), so the sequence axis is dim=-2.
        """
        merged = {}
        for layer in kv1.keys():
            merged[layer] = {
                "key": torch.cat([kv1[layer]["key"],
                                  kv2[layer]["key"]], dim=-2),
                "value": torch.cat([kv1[layer]["value"],
                                    kv2[layer]["value"]], dim=-2)
            }
        return merged
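Concatenation only works when every Cartridge was trained for the same base model, so all dimensions except the sequence axis must agree. A small sketch that checks this on shape tuples before any tensors are touched; it assumes a `[..., seq_len, head_dim]` layout with the sequence axis at position -2, and `composed_shape` is a hypothetical helper.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def composed_shape(shapes: List[Shape]) -> Shape:
    """Shape after concatenating per-layer K (or V) tensors along dim -2.

    Every shape must agree on all axes except the sequence axis (-2).
    """
    if not shapes:
        raise ValueError("need at least one shape")
    ref = shapes[0]
    for s in shapes[1:]:
        if len(s) != len(ref) or s[:-2] != ref[:-2] or s[-1] != ref[-1]:
            raise ValueError(f"incompatible KV shapes {ref} and {s}")
    total_seq = sum(s[-2] for s in shapes)
    return ref[:-2] + (total_seq,) + ref[-1:]


# Two hypothetical Cartridges with 8 KV heads and head_dim 128.
print(composed_shape([(8, 2048, 128), (8, 1024, 128)]))
```

Because the sequence lengths add up, the memory of a composed Cartridge grows linearly with the number of Cartridges combined.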

Step 4: Query with Composed Cartridges

Generate answers using pre-computed corpus representations:

import torch


def answer_query_with_cartridges(model,
                                 question: str,
                                 composed_cartridges: dict) -> str:
    """Answer question using composed Cartridge KV caches."""
    # `model` is the same hypothetical wrapper as above; generate_with_kv_cache
    # and detokenize stand in for your inference stack's equivalents.
    question_tokens = model.tokenize(question)

    # Inference only: no gradients through the Cartridge.
    with torch.no_grad():
        response = model.generate_with_kv_cache(
            question_tokens,
            kv_cache=composed_cartridges["combined_kv_cache"],
            max_length=512
        )

    return model.detokenize(response)
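Because a Cartridge is trained once and reused across many queries, a server can also avoid reloading and re-merging the same set of Cartridges for every request. A minimal stdlib sketch, assuming hypothetical `load` and `merge` callables (in practice these might wrap `torch.load` and `_merge_kv_caches`); stubs stand in for real caches here.

```python
from functools import lru_cache
from typing import Callable, Tuple


def make_cached_composer(load: Callable, merge: Callable, maxsize: int = 8):
    """Return compose(ids) that memoizes the merged cache per id tuple.

    `load(cart_id)` and `merge(a, b)` are hypothetical callables supplied
    by the caller.
    """
    @lru_cache(maxsize=maxsize)
    def compose(ids: Tuple[str, ...]):
        if not ids:
            raise ValueError("need at least one cartridge id")
        composed = load(ids[0])
        for cart_id in ids[1:]:
            composed = merge(composed, load(cart_id))
        return composed

    return compose


# Usage with stubs: a "cache" is a list of ids and merge is list concatenation.
loads = []


def fake_load(cart_id: str) -> list:
    loads.append(cart_id)
    return [cart_id]


compose = make_cached_composer(fake_load, lambda a, b: a + b)
first = compose(("docs", "faq"))
second = compose(("docs", "faq"))  # served from the memo, no reload
```

The key is an ordered tuple on purpose: concatenation order changes where each Cartridge sits in the sequence, so `("docs", "faq")` and `("faq", "docs")` are cached separately.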

Practical Guidance

Pre-training Strategy: Training on self-study synthetic conversations outperforms naive next-token prediction on the raw corpus text. Generate diverse QA pairs that cover different aspects of the corpus.

Memory Efficiency: Cartridges use 38.6x less memory than in-context learning because a Cartridge is a much smaller KV cache than the one produced by placing the full corpus in context. They also extend the effective context length, reaching 484K tokens on the MTOB benchmark.
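To see where the savings come from, here is a back-of-the-envelope estimate of KV-cache size. The model dimensions (32 layers, 8 KV heads, head_dim 128, 2-byte elements), the 128,000-token corpus, and the 2,048-slot Cartridge are all illustrative assumptions, not the paper's configuration.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes to store keys and values for `num_tokens` cache positions."""
    # Factor 2: one key vector and one value vector per head, per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


corpus_tokens = 128_000   # hypothetical corpus held in context (ICL)
cartridge_slots = 2_048   # hypothetical Cartridge size, fixed regardless of corpus

icl = kv_cache_bytes(corpus_tokens)
cart = kv_cache_bytes(cartridge_slots)
print(f"ICL KV cache:       {icl / 2**30:.2f} GiB")
print(f"Cartridge KV cache: {cart / 2**30:.2f} GiB")
print(f"Reduction:          {icl / cart:.1f}x")
```

With these assumed numbers each cached position costs 128 KiB, and the ratio is simply 128,000 / 2,048 = 62.5x; the 38.6x figure is the paper's measured result for its own settings.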

Composition Without Retraining: Trained Cartridges can be composed at inference time by concatenating their KV caches along the sequence dimension; no fine-tuning is needed to combine multiple corpora.

When to Apply: Use Cartridges for frequently-queried corpora, knowledge bases, or technical documentation where amortizing pre-training over many queries justifies the offline computation cost.
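The amortization argument can be made concrete with a break-even count: training pays off once the per-query savings have covered the one-time offline cost. A minimal sketch; the cost figures in the usage line are hypothetical and can be in any consistent unit (GPU-hours, dollars).

```python
import math
from typing import Optional


def break_even_queries(train_cost: float,
                       icl_cost_per_query: float,
                       cartridge_cost_per_query: float) -> Optional[int]:
    """Smallest number of queries at which offline training has paid off.

    Returns None when a Cartridge query is not cheaper than an ICL query,
    in which case training never pays for itself.
    """
    saving = icl_cost_per_query - cartridge_cost_per_query
    if saving <= 0:
        return None
    return math.ceil(train_cost / saving)


# Hypothetical costs: 1000 units to train, 3.0 vs 0.5 units per query.
print(break_even_queries(1000, 3.0, 0.5))
```

Corpora queried far more often than this break-even count are good candidates; rarely queried or fast-changing corpora are better served by plain in-context learning.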

Reference

Cartridges: Lightweight and general-purpose long context representations via self-study (https://arxiv.org/abs/2506.06266). Cartridges shift the work from loading documents into context at inference time to loading pre-computed KV representations. The self-study approach (synthetic conversations plus context distillation) is more effective than naive corpus encoding, and composability enables flexible corpus combinations without additional training.

Updated Apr 16, 2026

$ install --global

skillsauth

npx skillsauth add ADu2021/skillXiv cartridges-long-context

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 8:55 PM1.8s1 file scanned

SKILL.md

name:: cartridges-long-context
title:: Cartridges: Lightweight and general-purpose long context representations via self-study
version:: 0.0.2
engine:: skillxiv-v0.0.2-claude-opus-4.6
license:: MIT
url:: https://arxiv.org/abs/2506.06266
keywords:: [KV cache, context representation, efficient retrieval, composability]
description:: Train reusable pre-computed KV cache representations of large text corpora for efficient retrieval, achieving 38.6x memory reduction and 26.4x throughput improvement.

Install manually

# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Copy into Claude Code skills folder (global)
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/cartridges-long-context ~/.claude/skills/

Repository

ADu2021/skillXiv

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT