skills/cope-clipped-rope-as/SKILL.md
Implement CoPE (Clipped RoPE) soft clipping of low-frequency rotary positional embedding components to extend LLM context length without retraining. Use when: 'extend context window with CoPE', 'apply soft clipping to RoPE', 'fix long context degradation', 'implement CoPE positional embedding', 'scale RoPE to longer sequences', 'add cosine-decay frequency clipping'
This skill enables Claude to implement CoPE (Clipped RoPE), a training-free technique that extends the effective context length of RoPE-based LLMs by applying cosine-decay soft clipping to low-frequency positional embedding components. CoPE eliminates out-of-distribution position signal outliers, preserves semantic attention patterns, and avoids spectral leakage artifacts that plague hard-clipping approaches -- yielding gains from 4k to 256k context lengths as a drop-in modification to the RoPE frequency vector.
The Problem. RoPE encodes position information by rotating query/key vectors at dimension-specific frequencies. Lower dimensions rotate fast (high frequency, encoding local position), while higher dimensions rotate slowly (low frequency, encoding global position). When a model encounters sequence positions beyond its training window, these low-frequency components enter out-of-distribution territory, producing unreliable attention scores. Hard-clipping these frequencies (setting them to zero) introduces spectral leakage -- sinc-kernel oscillations with slow O(1/tau) decay that corrupt the attention pattern.
The CoPE Solution. Instead of hard-clipping, CoPE applies a cosine-decay taper to the last N entries of the inverse-frequency vector (inv_freq). The weight function is w = 0.5 * (1 + cos(theta)) where theta sweeps from 0 to pi across the clipped dimensions. This smoothly attenuates low-frequency components from full strength to zero, eliminating OOD outliers while preventing Gibbs oscillations. The modification touches only the inv_freq initialization -- no architectural changes, no attention mask modifications, fully compatible with FlashAttention.
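To make the weight function concrete, here is a minimal pure-Python sketch that tabulates the cosine taper for a small N and sets it next to hard clipping. The helper names `cosine_taper_weights` and `hard_clip_weights` are illustrative, not part of any library. Note that the first tapered entry receives weight 1 (theta = 0) and the last receives weight 0 (theta = pi), so with N = 1 the taper leaves the vector unchanged.

```python
import math

def cosine_taper_weights(n: int) -> list[float]:
    """Weights w = 0.5 * (1 + cos(theta)), theta evenly spaced over [0, pi]."""
    if n <= 0:
        return []
    if n == 1:
        return [1.0]  # a single point sits at theta = 0, so nothing is attenuated
    return [0.5 * (1.0 + math.cos(math.pi * k / (n - 1))) for k in range(n)]

def hard_clip_weights(n: int) -> list[float]:
    """Hard clipping for comparison: every clipped entry is zeroed at once."""
    return [0.0] * n

soft = cosine_taper_weights(5)
hard = hard_clip_weights(5)
print([round(w, 4) for w in soft])  # [1.0, 0.8536, 0.5, 0.1464, 0.0]
print(hard)                         # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The soft weights step down gradually, whereas the hard mask jumps from 1 to 0 at the clipping boundary; that jump is the discontinuity the text associates with spectral leakage.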
Why it works. CoPE unifies two previously separate goals: (1) OOD mitigation -- the tapered frequencies never produce extreme rotation angles at unseen positions; (2) semantic modeling -- by suppressing slow-rotating dimensions that encode positional rather than semantic information, attention scores more reliably reflect token similarity. On Llama-3-8B extended to 64k context, CoPE improves HELMET scores by 10.8% within the training range and nearly doubles performance under 256k extrapolation (14.37% to 28.48%), with zero degradation on short-context benchmarks (MMLU, GSM8K).
Identify the RoPE implementation. Locate the RotaryEmbedding class (or equivalent) in the model code. In HuggingFace transformers, this is typically in modeling_llama.py, modeling_qwen2.py, or modeling_mistral.py. Find where inv_freq is computed -- usually as inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim)).
Determine the critical dimension. Calculate which frequency dimensions correspond to rotation periods exceeding the pre-training context length. The formula is d_ct = 2 * ceil((d/2) * log_base(L_pre / (2*pi))) where d is head dimension, base is the RoPE base frequency, and L_pre is the pre-training context length. Dimensions beyond d_ct are OOD-prone.
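The critical-dimension formula can be evaluated directly. The sketch below is a rough illustration that assumes head_dim = 128, base = 500000, and an 8192-token pre-training window (the Llama-3-8B-style values used elsewhere in this document). The function names are hypothetical. It also cross-checks the closed form against a direct count of frequencies whose rotation period exceeds the window.

```python
import math

def critical_dimension(head_dim: int, base: float, l_pre: int) -> int:
    """d_ct = 2 * ceil((d/2) * log_base(L_pre / (2*pi)))."""
    return 2 * math.ceil((head_dim / 2) * math.log(l_pre / (2 * math.pi), base))

def count_ood_frequencies(head_dim: int, base: float, l_pre: int) -> int:
    """Count inv_freq entries whose rotation period 2*pi/theta_j exceeds l_pre."""
    theta_crit = 2 * math.pi / l_pre
    return sum(
        1 for j in range(head_dim // 2)
        if base ** (-2 * j / head_dim) < theta_crit
    )

# Assumed values: head_dim=128, base=500000, 8192-token pre-training window.
d_ct = critical_dimension(128, 500_000.0, 8192)
n_ood = count_ood_frequencies(128, 500_000.0, 8192)
print(d_ct, n_ood)  # 70 29
```

Under these assumptions, frequency entries 35 through 63 (dimensions 70 and up) never complete a full rotation within the training window, which gives 64 - 35 = 29 OOD-prone entries.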
Choose the clip count. The default is clip_low_n = 20 (the last 20 entries of inv_freq). For a 128-dimensional head (64 frequency entries), this clips ~31% of frequencies. The paper finds clipping ~75% of OOD frequencies optimal. Adjust based on head dimension and how aggressively you need to extrapolate.
Implement the cosine-decay soft mask. After computing inv_freq, apply:
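```python
# inv_freq: 1-D tensor of RoPE inverse frequencies, already computed
clip = min(clip_low_n, inv_freq.numel())
if clip > 0:
    start_idx = inv_freq.numel() - clip
    # theta sweeps 0 -> pi, so the weights fall smoothly from 1 to 0
    theta = torch.linspace(0.0, torch.pi, steps=clip, device=inv_freq.device, dtype=torch.float32)
    smooth_mask = 0.5 * (1.0 + torch.cos(theta))
    inv_freq = inv_freq.clone()  # avoid mutating a shared tensor in place
    inv_freq[start_idx:] = inv_freq[start_idx:] * smooth_mask.to(inv_freq.dtype)
```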
Handle dynamic frequency updates. If the model uses dynamic RoPE scaling (e.g., for sequences exceeding max_position_embeddings), re-apply the soft mask after each frequency recomputation. This ensures the taper remains active when inv_freq is recalculated for longer sequences.
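One way to guarantee the taper survives recomputation is to route every frequency refresh through a single function that ends with the mask. The sketch below uses a toy module, not the HuggingFace class: `ToyDynamicRotary`, `cope_taper`, and `maybe_update` are hypothetical names, and the NTK-style base rescale is an assumption chosen only to have something to recompute.

```python
import torch

def cope_taper(inv_freq: torch.Tensor, clip_n: int) -> torch.Tensor:
    """Cosine-decay taper over the last clip_n entries of inv_freq."""
    clip = min(clip_n, inv_freq.numel())
    if clip <= 0:
        return inv_freq
    theta = torch.linspace(0.0, torch.pi, steps=clip, device=inv_freq.device)
    out = inv_freq.clone()
    out[-clip:] *= (0.5 * (1.0 + torch.cos(theta))).to(out.dtype)
    return out

class ToyDynamicRotary(torch.nn.Module):
    """Hypothetical rotary module with an NTK-style dynamic base rescale."""

    def __init__(self, dim: int, base: float, max_len: int, clip_n: int):
        super().__init__()
        self.dim, self.base, self.max_len, self.clip_n = dim, base, max_len, clip_n
        self.register_buffer("inv_freq", self._compute(base), persistent=False)

    def _compute(self, base: float) -> torch.Tensor:
        exponents = torch.arange(0, self.dim, 2).float() / self.dim
        # The taper lives inside the only recomputation path,
        # so every refresh of inv_freq is clipped.
        return cope_taper(1.0 / (base ** exponents), self.clip_n)

    def maybe_update(self, seq_len: int) -> None:
        if seq_len > self.max_len:
            scale = seq_len / self.max_len
            new_base = self.base * scale ** (self.dim / (self.dim - 2))
            self.inv_freq = self._compute(new_base)

rope = ToyDynamicRotary(dim=128, base=500_000.0, max_len=8192, clip_n=20)
rope.maybe_update(seq_len=32768)
print(rope.inv_freq[-3:])  # the tail still tapers to zero after the update
```

Keeping the mask inside `_compute` rather than in `__init__` means there is no second code path that could silently restore unclipped frequencies.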
Register the modified inv_freq. Replace the original inv_freq buffer registration with the clipped version. Ensure the mask is applied before self.register_buffer("inv_freq", inv_freq, persistent=False).
Validate with a needle-in-haystack test. Run a simple retrieval test at 2x and 4x the training context length. CoPE should maintain recall (e.g., RULER NIAH: 60.5% vanilla RoPE vs 78.5% CoPE at 256k). If recall drops, increase clip_low_n slightly.
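A retrieval check of this kind can be sketched with a few lines of plain Python. Everything here is illustrative: `build_niah_prompt` and `recall` are hypothetical helpers, the filler text and needle are made up, and the model call is left as a commented placeholder. Size `n_filler` so the tokenized prompt reaches the context length you want to probe.

```python
def build_niah_prompt(needle: str, question: str, n_filler: int, depth: float) -> str:
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    filler = "The grass is green. The sky is blue. The sun is bright."
    sentences = [filler] * n_filler
    sentences.insert(int(depth * n_filler), needle)
    return " ".join(sentences) + f"\n\nQuestion: {question}\nAnswer:"

def recall(answers: list[str], expected: str) -> float:
    """Fraction of model answers that contain the expected string."""
    if not answers:
        return 0.0
    return sum(expected.lower() in a.lower() for a in answers) / len(answers)

needle = "The secret passcode is 7421."
prompts = [
    build_niah_prompt(needle, "What is the secret passcode?", n_filler=2000, depth=d)
    for d in (0.0, 0.25, 0.5, 0.75, 1.0)
]
# answers = [generate(p) for p in prompts]  # generate() is a placeholder for your model call
# print(recall(answers, "7421"))
```

Running the same prompts against the baseline and the CoPE-patched model gives a like-for-like recall comparison at each needle depth.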
Verify no short-context regression. Run a standard benchmark (MMLU or similar) to confirm scores remain within noise of the baseline. CoPE should not degrade short-context performance.
Optionally combine with ABF. CoPE stacks with Adjusted Base Frequency (increasing RoPE base from e.g. 500k to 10M). Apply ABF first to shift frequencies, then apply CoPE soft clipping to the resulting inv_freq.
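The ordering described above (ABF first, then CoPE) might look like the following sketch. The bases 500k and 10M are the example values from the text; `rope_inv_freq` and `abf_then_cope` are hypothetical helper names.

```python
import torch

def rope_inv_freq(dim: int, base: float) -> torch.Tensor:
    """Plain RoPE inverse frequencies, shape (dim // 2,)."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def abf_then_cope(dim: int, abf_base: float, clip_n: int) -> torch.Tensor:
    # Step 1 (ABF): recompute the frequencies with the enlarged base.
    inv_freq = rope_inv_freq(dim, abf_base)
    # Step 2 (CoPE): cosine-taper the last clip_n entries of the shifted vector.
    clip = min(clip_n, inv_freq.numel())
    if clip > 0:
        theta = torch.linspace(0.0, torch.pi, steps=clip)
        inv_freq[-clip:] *= 0.5 * (1.0 + torch.cos(theta))
    return inv_freq

original = rope_inv_freq(128, 500_000.0)
combined = abf_then_cope(128, abf_base=10_000_000.0, clip_n=20)
print(original[-1].item(), combined[-1].item())
```

Because a larger base changes which entries fall beyond the critical dimension, it may be worth recomputing the OOD count for the new base before settling on `clip_n`.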
Example 1: Adding CoPE to a HuggingFace LLaMA model
User: "I'm serving Llama-3-8B with transformers and it degrades badly past 8k context. Can you add CoPE to extend it?"
Approach:
- Add a use_cope config flag and clip parameter
- Apply the soft mask to inv_freq after initialization

Implementation -- modify LlamaRotaryEmbedding.__init__:
```python
# Patch inside modeling_llama.py, where torch, nn, and ROPE_INIT_FUNCTIONS are already in scope.
class LlamaRotaryEmbedding(nn.Module):
    def __init__(self, config, device=None):
        super().__init__()
        self.config = config
        # _get_rope_type stands in for the upstream rope_type lookup on the config
        self.rope_type = _get_rope_type(config)
        self.max_seq_len_cached = config.max_position_embeddings
        self.original_max_seq_len = config.max_position_embeddings
        inv_freq, self.attention_scaling = ROPE_INIT_FUNCTIONS[self.rope_type](config, device)

        # --- CoPE: soft-clip low-frequency components ---
        use_cope = getattr(config, "use_cope", False)
        self.clip_low_n = getattr(config, "cope_clip_n", 20) if use_cope else 0
        if self.clip_low_n > 0:
            freq_dim = inv_freq.numel()
            clip = min(self.clip_low_n, freq_dim)
            start_idx = freq_dim - clip
            theta = torch.linspace(0.0, torch.pi, steps=clip,
                                   device=inv_freq.device, dtype=torch.float32)
            smooth_mask = 0.5 * (1.0 + torch.cos(theta))
            inv_freq = inv_freq.clone()
            inv_freq[start_idx:] *= smooth_mask.to(inv_freq.dtype)
        # --- end CoPE ---

        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self.original_inv_freq = self.inv_freq
```
Enable it:
```python
from transformers import AutoModelForCausalLM, AutoConfig

# Takes effect only with the patched LlamaRotaryEmbedding shown above.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
config.use_cope = True
config.cope_clip_n = 20  # clip last 20 of 64 frequency entries
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", config=config)
```
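After enabling the flag, one way to confirm it was actually read is to compare the rotary buffer against unmodified RoPE frequencies. The helper below is a sketch under stated assumptions: `looks_cope_clipped` is a hypothetical name, and it assumes default (unscaled) RoPE with a known `dim` and `base`. The demo builds both tensors by hand so it runs without loading a model; in practice you would pass in the model's own inv_freq buffer.

```python
import torch

def looks_cope_clipped(inv_freq: torch.Tensor, dim: int, base: float, clip_n: int) -> bool:
    """True if the head matches plain RoPE and the tail is attenuated down to zero."""
    if clip_n < 2:
        raise ValueError("clip_n must be >= 2: a one-point taper has weight 1 and changes nothing")
    reference = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    inv_freq = inv_freq.detach().float().cpu()
    head_ok = torch.allclose(inv_freq[:-clip_n], reference[:-clip_n])
    # The first tapered entry has weight 1, so strict attenuation starts one entry later.
    tail_attenuated = bool((inv_freq[-clip_n + 1:] < reference[-clip_n + 1:]).all())
    ends_at_zero = abs(inv_freq[-1].item()) < 1e-12
    return head_ok and tail_attenuated and ends_at_zero

# Hand-built demo tensors (assumed dim=128, base=500000, clip_n=20).
plain = 1.0 / (500_000.0 ** (torch.arange(0, 128, 2).float() / 128))
clipped = plain.clone()
clipped[-20:] *= 0.5 * (1.0 + torch.cos(torch.linspace(0.0, torch.pi, steps=20)))

print(looks_cope_clipped(plain, 128, 500_000.0, 20))    # False
print(looks_cope_clipped(clipped, 128, 500_000.0, 20))  # True
```

If the check returns False on a patched model, the usual suspects are a config flag that never reached the rotary module or a later recomputation that overwrote the buffer.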
Example 2: Adding CoPE to a custom RoPE implementation
User: "I have a custom transformer with RoPE. How do I add CoPE soft clipping?"
Approach:
- Locate where inv_freq or freqs is computed

Standalone utility:
```python
import torch


def apply_cope_clipping(inv_freq: torch.Tensor, clip_n: int = 20) -> torch.Tensor:
    """Apply CoPE cosine-decay soft clipping to the last clip_n entries of inv_freq.

    Args:
        inv_freq: RoPE inverse frequency tensor, shape (dim // 2,)
        clip_n: Number of low-frequency (high-index) entries to taper.

    Returns:
        Modified inv_freq with soft-clipped low frequencies.
    """
    freq_dim = inv_freq.numel()
    clip = min(clip_n, freq_dim)
    if clip <= 0:  # nothing to clip; also guards against a negative clip_n
        return inv_freq
    start_idx = freq_dim - clip
    theta = torch.linspace(0.0, torch.pi, steps=clip,
                           device=inv_freq.device, dtype=torch.float32)
    smooth_mask = 0.5 * (1.0 + torch.cos(theta))
    inv_freq = inv_freq.clone()
    inv_freq[start_idx:] *= smooth_mask.to(inv_freq.dtype)
    return inv_freq
```
Usage in any RoPE setup:
```python
dim = 128
base = 500000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
inv_freq = apply_cope_clipping(inv_freq, clip_n=20)
```
Example 3: Choosing clip_n for a different model
User: "I'm using Qwen2-7B with head_dim=128 and base=1000000. What clip_n should I use?"
Approach:
```python
import math

head_dim = 128
base = 1_000_000
L_pre = 32768  # Qwen2 pre-training context

# Critical frequency: where the rotation period exceeds L_pre
# theta_j = 1 / base^(2j/d), period = 2*pi / theta_j
# Period > L_pre when theta_j < 2*pi / L_pre
theta_crit = 2 * math.pi / L_pre  # ~0.000192
freq_entries = head_dim // 2  # 64

ood_count = 0
for j in range(freq_entries):
    theta_j = 1.0 / (base ** (2 * j / head_dim))
    if theta_j < theta_crit:
        ood_count += 1

# ood_count is the number of OOD frequencies; clip ~75% of them
clip_n = round(0.75 * ood_count)
print(f"OOD frequencies: {ood_count}, recommended clip_n: {clip_n}")
```
Output: For the values above, this prints "OOD frequencies: 24, recommended clip_n: 18". Other models typically yield a clip_n between 15 and 25, depending on the base frequency and pre-training length.
- Apply the mask to inv_freq at initialization time, not during every forward pass. The mask is static and should only be recomputed if inv_freq itself is recalculated (dynamic RoPE scaling).
- Clone inv_freq before in-place modification (inv_freq = inv_freq.clone()) to avoid corrupting shared tensors in multi-GPU setups or gradient computation.
- For head_dim=128, inv_freq has 64 entries; for head_dim=64, it has 32 entries. Adjust clip_n proportionally.
- Verify the change by inspecting model.model.layers[0].self_attn.rotary_emb.inv_freq -- the last clip_n values should taper toward zero. If they match unmodified values, the config flag is not being read.
- Tune clip_n conservatively. Clipping too many in-distribution frequencies loses useful positional signal. Start with clip_n = 10 and increase.
- Ensure the smooth_mask dtype matches the inv_freq dtype. Mixed precision mismatches (e.g., float16 inv_freq with float32 mask) can cause issues on some hardware.
- If the model calls _dynamic_frequency_update for sequences exceeding max_seq_len_cached, ensure the CoPE mask is reapplied after the new inv_freq is computed in that method.
- Ensure inv_freq is clipped before it is sharded across devices. Apply CoPE in __init__ before any model parallelism wrapper.
- The optimal clip_n may differ for models with different head dimensions, base frequencies, or pre-training lengths.

Paper: CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs (Li et al., 2026). Look for Section 3 (the soft clipping weight function and spectral leakage analysis) and Table 1 (HELMET benchmark results across context lengths).
Code: github.com/hrlics/CoPE -- reference implementation for Llama-3-8B with training and evaluation scripts.