skills/draincode-stealthy-energy-consumption/SKILL.md
Evaluate and defend RAG-based code generation systems against energy-drain attacks that poison retrieval contexts to inflate LLM output length, latency, and GPU energy consumption. Use when: 'audit my RAG pipeline for energy attacks', 'test code retrieval poisoning resilience', 'detect adversarial triggers in retrieved code', 'harden my code generation system against context poisoning', 'benchmark energy cost of retrieval-augmented code generation', 'simulate DrainCode-style attacks on my pipeline'.
npx skillsauth add ndpvt-web/arxiv-claude-skills draincode-stealthy-energy-consumption
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to audit, test, and harden retrieval-augmented code generation (RAG) systems against DrainCode-style adversarial attacks. DrainCode (Wang et al., 2026) demonstrated that an attacker can poison a retrieval corpus with adversarial trigger tokens embedded in code snippets, causing LLMs to suppress their end-of-sequence (EOS) token and produce outputs 2-10x longer than normal -- inflating GPU latency by up to 182% and energy consumption by up to 155%, while largely preserving functional correctness. This skill teaches how to detect such poisoning, build defenses, and stress-test RAG pipelines for computational-efficiency vulnerabilities.
The Attack Model. DrainCode targets the standard RAG code-generation pipeline: a retriever (typically BM25 or embedding-based) fetches top-k code snippets from a corpus, concatenates them with the user's incomplete code as context, and feeds the combined prompt to an LLM for completion. The attacker poisons 1-3 snippets per query in the retrieval corpus by embedding short adversarial trigger token sequences within syntactically valid code blocks. These triggers are optimized via gradient-based search to minimize EOS probability across all generation positions (L1 loss) while maximizing hidden-state diversity to encourage varied, lengthy output (L2 nuclear-norm loss), subject to a KL-divergence constraint that keeps non-trigger output distributions close to clean baselines -- making the poisoning hard to detect by surface-level metrics.
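The optimization described above can be written compactly. The following is only a sketch: the symbols ($\delta$ for the trigger, $x$ for the clean context, $H$ for the matrix of stacked hidden states, $\lambda$ for the loss weight, $\varepsilon$ for the KL budget) are notation assumed here for illustration and are not taken from the paper.

```latex
\min_{\delta}\;
\underbrace{\frac{1}{T}\sum_{t=1}^{T} p_\theta\!\left(\mathrm{EOS}\mid x\oplus\delta,\; y_{<t}\right)}_{\mathcal{L}_1:\ \text{EOS suppression}}
\;+\;
\lambda\,\underbrace{\left(-\lVert H(x\oplus\delta)\rVert_{*}\right)}_{\mathcal{L}_2:\ \text{hidden-state diversity}}
\quad\text{s.t.}\quad
D_{\mathrm{KL}}\!\left(p_\theta(\cdot\mid x)\,\Vert\,p_\theta(\cdot\mid x\oplus\delta)\right)\le\varepsilon
```

Minimizing the first term lowers the EOS probability at every generation position, the negated nuclear norm rewards diverse hidden states, and the constraint keeps the poisoned output distribution close to the clean one.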
Why It Matters for Defense. Existing defenses performed poorly against DrainCode: SVM classifiers on hidden representations achieved only 30-37% detection accuracy, perplexity-based filters scored 28-29%, and fine-tuned CodeBERT reached 51-62% but is limited to 512 tokens. This means production systems need layered, purpose-built defenses: output-length anomaly detection, token-budget enforcement, retrieval-time perplexity gating on full-length snippets, and energy-consumption monitoring. The attack's transferability across models (tested on DeepSeek-Coder-7B, CodeQwen-7B, Internlm2-7B, Llama3-8B) and prompting strategies makes defense-in-depth essential.
The Defensive Opportunity. Because triggers are gradient-optimized token sequences, they often contain unusual token co-occurrences within otherwise normal code. Statistical anomaly detection on token bigram/trigram distributions within retrieved snippets, combined with strict output-length budgets and energy-per-query monitoring, provides the most practical defense stack.
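As a rough illustration of the bigram-based anomaly detection mentioned above, the sketch below scores a snippet by the average surprisal of its token bigrams under counts gathered from known-clean code. The regex tokenizer, the add-one smoothing, and the names `BigramAnomalyScorer` and `threshold_std` are assumptions made for this example; a production filter would more likely use the target model's own tokenizer and a much larger clean corpus.

```python
import math
import re
import statistics
from collections import Counter

# Hypothetical lightweight tokenizer: identifiers, numbers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


class BigramAnomalyScorer:
    """Flags snippets whose token bigrams are unusually rare versus clean code."""

    def __init__(self, clean_snippets: list[str], threshold_std: float = 2.0):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for snippet in clean_snippets:
            toks = tokenize(snippet)
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = max(len(self.unigrams), 1)
        # Calibrate the cutoff on the clean snippets themselves.
        scores = [self.score(s) for s in clean_snippets]
        self.threshold = (
            statistics.mean(scores) + threshold_std * statistics.pstdev(scores)
        )

    def score(self, snippet: str) -> float:
        """Mean negative log-probability per bigram (add-one smoothed)."""
        toks = tokenize(snippet)
        pairs = list(zip(toks, toks[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for first, second in pairs:
            prob = (self.bigrams[(first, second)] + 1) / (
                self.unigrams[first] + self.vocab_size
            )
            total -= math.log(prob)
        return total / len(pairs)

    def is_suspicious(self, snippet: str) -> bool:
        return self.score(snippet) > self.threshold
```

A snippet that carries a run of tokens never seen together in the clean corpus will score higher than its clean counterpart. Whether it crosses the threshold depends on how tightly the clean scores cluster, so calibration on a representative corpus matters.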
1. Map the retrieval architecture. Identify the retriever type (BM25, dense embedding, hybrid), corpus source (public repos, curated datasets), number of retrieved snippets (k), and how context is concatenated with user queries. Document the exact prompt template used.
2. Establish clean baselines. Run 100+ representative code-generation queries against the unmodified pipeline. Record per-query metrics: output token count, wall-clock latency, GPU energy (via nvidia-smi or codecarbon), and functional correctness (pass@1 on unit tests). Compute mean, median, P95, and P99 for each metric.
3. Craft probe snippets that simulate adversarial triggers. Insert sequences of low-frequency tokens (e.g., rarely-used Unicode identifiers, unusual variable names with high token entropy) into otherwise valid code snippets. Place these at comment boundaries, inside docstrings, and between function definitions -- the positions DrainCode targets.
4. Inject probe snippets into the retrieval corpus. Add 1-3 poisoned snippets per test query to the corpus, ensuring they rank in the top-k results for target queries. Re-run the baseline query set and compare output length, latency, and energy against clean baselines.
5. Detect anomalous outputs. Flag any query where output length exceeds 2x the P95 baseline, latency exceeds 1.5x baseline, or energy exceeds 1.5x baseline. These thresholds correspond to the lower bound of DrainCode's demonstrated impact.
6. Implement retrieval-time filtering. Add a perplexity gate: compute per-token perplexity of each retrieved snippet using a small language model (e.g., a 1B-parameter model). Reject snippets whose perplexity exceeds 2 standard deviations above the corpus mean. Also compute token bigram entropy and flag statistical outliers.
7. Enforce output-length budgets. Set a hard max_new_tokens limit at 2x the P95 clean output length for your task distribution. This directly caps the energy amplification factor regardless of context poisoning.
8. Deploy runtime energy monitoring. Instrument the inference server to log GPU energy per query (using codecarbon, pyJoules, or direct NVML calls). Set alerts when rolling-average energy exceeds 1.3x the clean baseline over a 5-minute window.
9. Validate defenses under attack simulation. Re-run the probe-snippet injection from step 4 with all defenses active. Confirm that flagged queries are caught, output lengths are capped, and energy stays within budget. Measure any impact on clean-query performance (false positive rate, latency overhead of filtering).
10. Document the threat model and residual risk. Record which attack vectors are mitigated (corpus poisoning, prompt injection), which are partially mitigated (white-box trigger optimization against your specific model), and which require ongoing monitoring (novel trigger patterns, model updates changing vulnerability surface).
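The baseline, flagging, and budget steps above reduce to a few lines of arithmetic. The sketch below is one possible shape for them, not a prescribed implementation: `QueryMetrics`, `summarize`, `token_budget`, and `flag` are hypothetical names, the percentile uses the nearest-rank method, and "baseline" for latency and energy is assumed to mean the clean-run mean, which the steps leave unspecified.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class QueryMetrics:
    output_tokens: int
    latency_s: float
    energy_wh: float


def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(runs: list[QueryMetrics]) -> dict:
    """Clean-baseline statistics used by the flagging rules."""
    tokens = [r.output_tokens for r in runs]
    return {
        "tokens_mean": statistics.mean(tokens),
        "tokens_median": statistics.median(tokens),
        "tokens_p95": percentile(tokens, 95),
        "tokens_p99": percentile(tokens, 99),
        # Assumption: "baseline" latency and energy are the clean means.
        "latency_mean": statistics.mean(r.latency_s for r in runs),
        "energy_mean": statistics.mean(r.energy_wh for r in runs),
    }


def token_budget(baseline: dict) -> int:
    """Hard max_new_tokens cap: 2x the clean P95 output length."""
    return 2 * baseline["tokens_p95"]


def flag(run: QueryMetrics, baseline: dict) -> list[str]:
    """Return the names of the thresholds this query exceeds."""
    reasons = []
    if run.output_tokens > 2 * baseline["tokens_p95"]:
        reasons.append("output_length")
    if run.latency_s > 1.5 * baseline["latency_mean"]:
        reasons.append("latency")
    if run.energy_wh > 1.5 * baseline["energy_mean"]:
        reasons.append("energy")
    return reasons
```

Returning the list of exceeded thresholds, rather than a single boolean, makes it easier to see in the audit report which signal caught each probe.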
Example 1: Auditing a Code Completion Service
User: "I run a RAG code completion service using BM25 retrieval over a public Python snippet corpus and DeepSeek-Coder-7B. How do I test if it's vulnerable to energy-drain attacks?"
Approach:
Use codecarbon to measure energy per query.

# Probe snippet example: adversarial tokens embedded in a valid function
def calculate_total(items):
    # xtq_7kz mf_2rp vbn_9wl  <-- unusual token sequence simulating trigger
    total = 0
    for item in items:
        total += item.price * item.quantity
    return total

Apply max_new_tokens=824 (2x the P95 baseline of 412 tokens) and perplexity filtering on retrieved snippets.

Output:
Baseline: mean=298 tokens, P95=412 tokens, energy=0.42 Wh/query
With probes: mean=847 tokens, P95=1,203 tokens, energy=0.89 Wh/query
Verdict: VULNERABLE (2.8x output inflation, 2.1x energy increase)
After mitigation: mean=305 tokens, P95=420 tokens, energy=0.44 Wh/query
False positive rate on clean queries: 0.8%
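The verdict line follows directly from the ratios of the reported means. A small sketch of that arithmetic, using only the figures from the example output above (the helper name `ratio` is hypothetical):

```python
def ratio(observed: float, baseline: float) -> float:
    """Amplification factor of an observed metric over its clean baseline."""
    return observed / baseline


# Figures copied from the example output above.
token_inflation = ratio(847, 298)       # mean tokens with probes vs. baseline
energy_increase = ratio(0.89, 0.42)     # Wh/query with probes vs. baseline
residual_inflation = ratio(305, 298)    # mean tokens after mitigation

print(f"output inflation: {token_inflation:.1f}x")     # 2.8x
print(f"energy increase:  {energy_increase:.1f}x")     # 2.1x
print(f"after mitigation: {residual_inflation:.1f}x")  # 1.0x
```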
Example 2: Building a Retrieval Filter
User: "Write a retrieval-time filter that screens out potentially poisoned code snippets before they reach the LLM."
Approach:
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class RetrievalPoisonFilter:
    def __init__(self, model_name="microsoft/phi-2", threshold_std=2.0):
        # A small causal LM is used only to score snippets, not to generate.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.float16, device_map="auto"
        )
        self.threshold_std = threshold_std
        self.corpus_stats = None  # calibrated during setup

    def calibrate(self, clean_snippets: list[str], sample_size=500):
        """Compute perplexity distribution over known-clean snippets."""
        perplexities = []
        for snippet in clean_snippets[:sample_size]:
            ppl = self._compute_perplexity(snippet)
            perplexities.append(ppl)
        self.corpus_stats = {
            "mean": np.mean(perplexities),
            "std": np.std(perplexities),
        }

    def is_suspicious(self, snippet: str) -> bool:
        """Return True if snippet perplexity is anomalously high."""
        if self.corpus_stats is None:
            raise RuntimeError("Call calibrate() first")
        ppl = self._compute_perplexity(snippet)
        # Reject anything more than threshold_std deviations above the clean mean.
        threshold = (
            self.corpus_stats["mean"]
            + self.threshold_std * self.corpus_stats["std"]
        )
        return bool(ppl > threshold)

    def filter_retrievals(self, snippets: list[str]) -> list[str]:
        """Remove suspicious snippets from retrieval results."""
        return [s for s in snippets if not self.is_suspicious(s)]

    def _compute_perplexity(self, text: str) -> float:
        # Snippets longer than max_length are truncated before scoring.
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True,
                                max_length=2048).to(self.model.device)
        with torch.no_grad():
            # Passing labels makes the model return mean cross-entropy loss.
            outputs = self.model(**inputs, labels=inputs["input_ids"])
        return torch.exp(outputs.loss).item()
Example 3: Energy Monitoring Dashboard
User: "Set up energy-per-query monitoring for my code generation API to detect ongoing attacks."
Approach:
import logging
import statistics
from collections import deque

from codecarbon import EmissionsTracker

logger = logging.getLogger(__name__)


class EnergyAnomalyDetector:
    def __init__(self, window_size=100, alert_multiplier=1.5):
        self.window = deque(maxlen=window_size)  # most recent per-query energies
        self.alert_multiplier = alert_multiplier
        self.baseline_median = None

    def calibrate(self, baseline_energies: list[float]):
        self.baseline_median = statistics.median(baseline_energies)

    def record_and_check(self, energy_kwh: float) -> dict:
        self.window.append(energy_kwh)
        rolling_median = statistics.median(self.window)
        # No alert is raised until calibrate() has set a baseline.
        is_anomalous = (
            self.baseline_median is not None
            and rolling_median > self.baseline_median * self.alert_multiplier
        )
        return {
            "energy_kwh": energy_kwh,
            "rolling_median": rolling_median,
            "baseline_median": self.baseline_median,
            "alert": is_anomalous,
        }


# Usage in inference endpoint:
def generate_with_monitoring(query, retrieved_context, model, detector):
    tracker = EmissionsTracker(log_level="error")
    tracker.start()
    # model.generate stands in for your own inference call.
    output = model.generate(retrieved_context + query)
    tracker.stop()  # the return value is emissions in kg CO2eq, not energy
    # Energy in kWh is read from the tracker's final emissions data
    # (attribute available in recent codecarbon releases).
    energy_kwh = tracker.final_emissions_data.energy_consumed
    result = detector.record_and_check(energy_kwh)
    if result["alert"]:
        logger.warning("energy_anomaly: %s", result)
    return output
Enforce max_new_tokens limits on all code generation endpoints. This is the single most effective mitigation: it directly caps the amplification factor regardless of trigger sophistication.
Wang, Y., Wu, J., Jiang, T., Liu, M., & Chen, J. (2026). DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning. arXiv:2601.20615v3. https://arxiv.org/abs/2601.20615v3
Key sections to study: Section 3 (attack formulation with EOS loss + nuclear-norm diversity loss + KL constraint), Section 4.2 (per-model results showing 118-226% output inflation), and Section 5 (defense evaluation showing all tested detectors below 62% accuracy).