skills/alienlm-alienization-api-boundary-privacy/SKILL.md
Implement AlienLM-style API-boundary privacy layers that protect sensitive text sent to black-box LLM APIs using vocabulary-scale bijective token remapping. Use when: 'add privacy layer to LLM API calls', 'protect prompts sent to external API', 'alienize text for API privacy', 'build token-level encryption for LLM pipeline', 'implement bijective vocabulary mapping', 'privacy-preserving LLM deployment'.
npx skillsauth add ndpvt-web/arxiv-claude-skills alienlm-alienization-api-boundary-privacy
This skill enables Claude to design and implement AlienLM-style privacy layers that protect sensitive prompts, outputs, and fine-tuning data transmitted to black-box LLM APIs. The core technique is a vocabulary-scale bijection — a one-to-one token-ID remapping that translates plaintext into an "Alien Language" before it crosses the API boundary, then losslessly recovers the original text client-side using the inverse mapping. Combined with Alien Adaptation Training (AAT), the target model learns to operate directly on alienized inputs, retaining over 81% of plaintext performance while exposing fewer than 0.22% of tokens to recovery attacks.
Vocabulary-Scale Bijection. AlienLM defines a bijection f: I → I on the model's token-ID set I that leaves special tokens (BOS, EOS, PAD, etc.) unchanged and permutes the non-special token IDs. Each non-special token ID i_k is mapped to a different token ID f(i_k), creating an "alien vocabulary" in which every token is swapped with another. Because f is bijective, the inverse f⁻¹ exists and recovery is lossless: D(E(x)) = x. The bijection is seeded by a secret key stored client-side; the API provider never sees the mapping. Critically, this operates at the token-ID level, not the character level. Character-level substitutions (like ROT13) break subword tokenizer boundaries and produce out-of-distribution token sequences, degrading performance below 25%. Token-ID bijection preserves the model's familiar subword structure.
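The ID-level mechanics can be illustrated with a toy vocabulary. This is a minimal sketch, assuming a plain seeded random permutation (the unoptimized baseline, not the paper's optimized bijection), a 10-token vocabulary, and IDs 0 and 1 as the special tokens; `make_bijection`, `encode_ids`, and `decode_ids` are hypothetical helper names.

```python
import random

def make_bijection(vocab_size: int, special_ids: set[int], seed: int):
    """Seeded random permutation of the non-special IDs; special IDs map to themselves.
    Unoptimized baseline: a shuffle may leave a few fixed points."""
    movable = [i for i in range(vocab_size) if i not in special_ids]
    shuffled = movable[:]
    random.Random(seed).shuffle(shuffled)
    f = {i: i for i in special_ids}
    f.update(zip(movable, shuffled))
    f_inv = {v: k for k, v in f.items()}  # well defined because f is one-to-one
    return f, f_inv

def encode_ids(ids: list[int], f: dict[int, int]) -> list[int]:
    return [f[i] for i in ids]

def decode_ids(ids: list[int], f_inv: dict[int, int]) -> list[int]:
    return [f_inv[i] for i in ids]

f, f_inv = make_bijection(vocab_size=10, special_ids={0, 1}, seed=42)
plain = [0, 5, 7, 3, 1]  # BOS, three content tokens, EOS
alien = encode_ids(plain, f)
```

At the ID level the round trip is exact by construction; string-level losslessness additionally depends on the tokenizer re-encoding decoded text to the same IDs.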
Bijection Optimization. Not all bijections are equal. A random permutation works but leaves performance on the table. AlienLM optimizes the bijection by (1) maximizing the surface-form edit distance between original and mapped tokens (so alienized text looks nothing like plaintext) while (2) minimizing the embedding-space distance between paired tokens (so the model can transfer learned representations). This is solved greedily: partition token IDs into k buckets by seed, retrieve k nearest neighbors in embedding space for each token, score candidates on edit-distance vs. embedding-similarity, and greedily assign symmetric pairs. A proxy model's embeddings (e.g., Qwen for LLaMA) suffice — cross-model alignment is strong enough that proxy-based bijection performs within ~1.75 accuracy points of using the target model's own embeddings.
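One way to write the greedy pairing objective, consistent with the `score_pair` function in Example 3 (notation ours, not the paper's): s(a) is the surface string of token a, d_edit is normalized Levenshtein distance in [0, 1], e_a is the proxy-model embedding of a, N_k(a) is its k nearest neighbors, A is the set of still-unassigned tokens, and μ ≥ 0 trades surface dissimilarity against embedding proximity.

```latex
\operatorname{score}(a, b) = d_{\mathrm{edit}}\big(s(a), s(b)\big)
  - \mu \left(1 - \frac{e_a^{\top} e_b}{\lVert e_a \rVert \, \lVert e_b \rVert}\right),
\qquad
f(a) = \arg\max_{b \in \mathcal{N}_k(a) \cap A} \operatorname{score}(a, b),
\quad f(f(a)) = a
```

The last condition expresses the symmetric pairing: each assigned pair swaps with each other, so f is its own inverse.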
Alien Adaptation Training (AAT). After constructing the bijection, the target model is fine-tuned exclusively on alienized data. Both inputs and outputs in the training set are passed through the encoder E before upload to the fine-tuning API. The training objective is standard causal language modeling loss over alienized token sequences. The API provider sees only alien text during training and inference — no plaintext ever crosses the boundary. AAT uses ~300K instruction-tuning examples plus optional domain-specific data and completes in approximately 12 hours on 4×A100-equivalent compute via commercial fine-tuning APIs.
Identify the vocabulary and special tokens. Load the target model's tokenizer. Extract the full token-ID set I and designate the special-token subset S (BOS, EOS, PAD, mask tokens, tool-call delimiters). Only tokens in I \ S participate in the bijection.
Generate the bijection seed (secret key). Generate a cryptographically random seed. Store it securely client-side (e.g., in a secrets manager or environment variable). This seed deterministically controls the bijection — different seeds produce different alien languages with ~1.4% pairwise token overlap.
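A minimal sketch of key generation and deterministic expansion, assuming a 256-bit key from Python's `secrets` module and a SHA-256-seeded `random.Random` shuffle as the expansion step (a stand-in, not the paper's procedure; for production, consider a keyed CSPRNG-based shuffle). `new_seed` and `derive_permutation` are hypothetical names.

```python
import hashlib
import random
import secrets

def new_seed() -> bytes:
    """256-bit secret key from the OS CSPRNG; keep it client-side (e.g. a secrets manager)."""
    return secrets.token_bytes(32)

def derive_permutation(seed: bytes, movable_ids: list[int]) -> dict[int, int]:
    """Deterministically expand the secret seed into a permutation of movable_ids.
    random.Random is not a CSPRNG; here it only expands an already-secret seed."""
    rng = random.Random(hashlib.sha256(seed).digest())
    shuffled = list(movable_ids)
    rng.shuffle(shuffled)
    return dict(zip(movable_ids, shuffled))

ids = list(range(2, 1000))  # toy non-special IDs
key = new_seed()
alien_lang = derive_permutation(key, ids)  # same key always yields the same alien language
```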
Build the optimized bijection. Using the seed, partition the non-special token IDs into k buckets. For each token in a bucket, retrieve its k nearest neighbors by embedding cosine similarity (using a proxy model's LM head if the target model's embeddings are unavailable). Score each candidate pair so that surface dissimilarity is rewarded and embedding distance is penalized: score = normalized_edit_distance(surface_a, surface_b) - μ * (1 - cosine_similarity(a, b)). Greedily assign the highest-scoring symmetric pairs. The paper reports that this runs in under 20 minutes for a 128K vocabulary.
Implement the encoder E and decoder D. The encoder tokenizes plaintext with the model's tokenizer τ, remaps each non-special token ID via f, then decodes back to a string via τ⁻¹. The decoder does the reverse with f⁻¹. Both are pure functions — no model inference required. Wrap these as a lightweight client library.
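The E/D pipeline (tokenize, remap, detokenize) can be sketched with a hypothetical whitespace tokenizer standing in for τ. The vocabulary, the fixed involution `f`, and `make_codec` are all illustrative assumptions; a real subword tokenizer replaces `tau`/`tau_inv`.

```python
from typing import Callable

# Hypothetical toy word-level tokenizer standing in for the model's tokenizer τ.
VOCAB = ["<bos>", "<eos>", "chest", "pain", "for", "3", "days", "fever", "cough", "since"]
TOK2ID = {t: i for i, t in enumerate(VOCAB)}
SPECIAL = {0, 1}

def tau(text: str) -> list[int]:
    return [TOK2ID[w] for w in text.split()]

def tau_inv(ids: list[int]) -> str:
    return " ".join(VOCAB[i] for i in ids)

def make_codec(f: dict[int, int]) -> tuple[Callable[[str], str], Callable[[str], str]]:
    f_inv = {v: k for k, v in f.items()}

    def E(text: str) -> str:  # τ, then f on non-special IDs, then τ⁻¹
        return tau_inv([i if i in SPECIAL else f[i] for i in tau(text)])

    def D(alien: str) -> str:  # same pipeline with f⁻¹
        return tau_inv([i if i in SPECIAL else f_inv[i] for i in tau(alien)])

    return E, D

# A fixed involution over non-special IDs 2..9 (each pair swaps with each other).
f = {2: 7, 7: 2, 3: 8, 8: 3, 4: 9, 9: 4, 5: 6, 6: 5}
E, D = make_codec(f)
plain = "<bos> chest pain for 3 days <eos>"
alien = E(plain)  # "<bos> fever cough since days 3 <eos>"
```

Both functions are pure and need no model inference. The toy round trip is exact because this tokenizer re-encodes its own output to the same IDs; check that property for a real subword tokenizer.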
Choose the alienization ratio ρ. The ratio ρ ∈ [0, 1] controls what fraction of the vocabulary is permuted. At ρ = 1.0, all non-special tokens are remapped (maximum privacy, ~81% performance recovery). At ρ = 0.6, performance recovery rises to ~86% while most tokens are still alienized. Select ρ based on your privacy-performance tradeoff.
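A minimal sketch of partial alienization, assuming the permuted subset is a uniformly random ρ-fraction drawn from the seed (the paper's subset selection may differ). `partial_bijection` and `moved_fraction` are hypothetical names.

```python
import random

def partial_bijection(movable_ids: list[int], rho: float, seed: int) -> dict[int, int]:
    """Permute only a rho-fraction of movable_ids; the rest map to themselves.
    Assumption: the permuted subset is drawn uniformly at random from the seed."""
    rng = random.Random(seed)
    chosen = rng.sample(movable_ids, round(rho * len(movable_ids)))
    targets = chosen[:]
    rng.shuffle(targets)
    f = {i: i for i in movable_ids}
    f.update(zip(chosen, targets))  # a permutation of `chosen` onto itself
    return f

def moved_fraction(f: dict[int, int]) -> float:
    return sum(f[i] != i for i in f) / len(f)

ids = list(range(2, 10_002))
for rho in (0.6, 1.0):
    # Expect just under rho: shuffling `chosen` leaves about one fixed point on average.
    print(rho, round(moved_fraction(partial_bijection(ids, rho, seed=7)), 3))
```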
Prepare alienized training data. Take your instruction-tuning dataset {(x_i, y_i)}. Apply the encoder to both inputs and outputs: {(E(x_i), E(y_i))}. Upload only the alienized pairs to the fine-tuning API. The provider never receives plaintext.
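Preparing the upload can be sketched as follows, assuming a chat-style JSONL layout with a `messages` list (common for hosted fine-tuning APIs, but match your provider's schema) and any encoder `E` passed in as a function. `alienize_dataset` and the word-swap `toy_E` are hypothetical.

```python
import json
from typing import Callable, Iterable

def alienize_dataset(pairs: Iterable[tuple[str, str]], encode: Callable[[str], str]) -> list[str]:
    """Turn plaintext (x_i, y_i) pairs into JSONL lines holding (E(x_i), E(y_i)) only."""
    lines = []
    for x, y in pairs:
        record = {"messages": [
            {"role": "user", "content": encode(x)},
            {"role": "assistant", "content": encode(y)},
        ]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines

# Toy usage with a stand-in word-swap encoder instead of a real token-level E.
toy_E = lambda s: " ".join({"hi": "zorp", "there": "quax"}.get(w, w) for w in s.split())
jsonl = alienize_dataset([("hi there", "hi")], toy_E)
```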
Run Alien Adaptation Training (AAT). Fine-tune the target model on the alienized dataset using the provider's standard fine-tuning API. Use standard causal LM loss. Include domain-specific alienized examples if you need strong performance on specialized tasks (e.g., adding alienized math data improves GSM8K from 41.7% to 55.5%).
Deploy the inference pipeline. At inference time: (a) client alienizes the prompt via E, (b) sends alien text to the API, (c) receives alien response, (d) client de-alienizes via D. The round trip is lossless. The API provider sees only alien tokens at every stage.
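Steps (a) to (d) can be wired as a thin wrapper. This is a sketch, assuming `call_api` is a placeholder for your provider client (hypothetical, not a real SDK call) and using a toy word-swap involution as both E and D.

```python
from typing import Callable

def private_completion(prompt: str,
                       encode: Callable[[str], str],
                       decode: Callable[[str], str],
                       call_api: Callable[[str], str]) -> str:
    """(a) alienize the prompt, (b)-(c) exchange alien text with the API, (d) de-alienize.
    `call_api` is a placeholder for your provider client; it only ever receives alien text."""
    alien_prompt = encode(prompt)
    alien_reply = call_api(alien_prompt)
    return decode(alien_reply)

# Toy usage: a word-swap involution (so E == D) and a fake API that logs and echoes.
swap = {"chest": "zyx", "zyx": "chest", "pain": "qwv", "qwv": "pain"}
codec = lambda s: " ".join(swap.get(w, w) for w in s.split())
seen = []

def fake_api(alien_text: str) -> str:
    seen.append(alien_text)
    return alien_text  # an AAT-tuned model would instead answer in alien text

answer = private_completion("chest pain", codec, codec, fake_api)
```

Logging what the fake API receives is a cheap way to assert in tests that no plaintext crosses the boundary.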
Implement key rotation. To rotate keys, generate a new seed, rebuild the bijection, re-alienize training data, and re-run AAT. Each key rotation produces an independent alien language. For multi-tenant scenarios, note that per-key AAT outperforms shared multi-seed training.
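To check that a rotated key really yields an independent alien language, one can measure how often two keys map a token to the same target. This sketch assumes unrestricted seeded random permutations, whose expected overlap is about 1/len(ids); the ~1.4% figure quoted in the key-generation step presumably reflects the optimized bijection, whose candidates are restricted to nearest neighbors, so measure overlap on your actual bijections. Function names are hypothetical.

```python
import random

def seeded_permutation(ids: list[int], seed: int) -> dict[int, int]:
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(ids, shuffled))

def mapping_overlap(f_a: dict[int, int], f_b: dict[int, int]) -> float:
    """Fraction of token IDs that two keys send to the same alien token."""
    return sum(f_a[i] == f_b[i] for i in f_a) / len(f_a)

ids = list(range(2, 50_002))
old_key_map = seeded_permutation(ids, seed=1)
new_key_map = seeded_permutation(ids, seed=2)  # after rotation: rebuild, re-alienize, re-run AAT
overlap = mapping_overlap(old_key_map, new_key_map)
```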
Validate with recovery attack simulation. Test your deployment against the three threat tiers: (O1) frequency analysis on alien text, (O2) leakage of some plaintext-alien pairs, (O3) an adversary with access to the model weights. Verify that token recovery stays below your acceptable threshold; the paper reports under 0.22% recovery for O3-level adversaries against optimized bijections.
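An O1 harness can be sketched as unigram frequency-rank matching: the attacker ranks observed alien tokens by frequency and aligns them with plaintext token ranks from a reference corpus. This is a simple baseline attack of our choosing, not the paper's attack suite; the recovery rate it measures depends on how closely the attacker's reference corpus matches your traffic. Names and toy data are hypothetical.

```python
from collections import Counter

def frequency_attack(alien_ids: list[int], reference_freq: Counter) -> dict[int, int]:
    """O1 sketch: guess f^-1 by aligning alien and plaintext tokens by frequency rank.
    reference_freq: plaintext token counts from the attacker's public corpus."""
    alien_rank = [t for t, _ in Counter(alien_ids).most_common()]
    plain_rank = [t for t, _ in reference_freq.most_common()]
    return dict(zip(alien_rank, plain_rank))

def recovery_rate(plain_ids: list[int], alien_ids: list[int], guess: dict[int, int]) -> float:
    """Fraction of token positions the attacker's guess recovers correctly."""
    hits = sum(guess.get(a) == p for p, a in zip(plain_ids, alien_ids))
    return hits / len(plain_ids)

# Toy corpus with distinct token frequencies and a secret substitution f.
plain = [5] * 4 + [6] * 3 + [7] * 2 + [8]
f = {5: 8, 6: 7, 7: 5, 8: 6}
alien = [f[t] for t in plain]
perfect_prior = frequency_attack(alien, Counter(plain))      # attacker knows exact frequencies
skewed_prior = frequency_attack(alien, Counter({5: 1, 6: 4, 7: 3, 8: 2}))
```

With a perfect prior and distinct frequencies, a pure substitution is fully recovered, which is why this baseline belongs in the validation suite; run it on realistic volumes of your own traffic.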
Example 1: Building a privacy proxy for a medical chatbot
User: "I'm building a chatbot that answers patient questions using GPT-4 via API.
Patient messages contain PHI. I need a privacy layer so the API provider
never sees real patient data."
Approach:
1. Load GPT-4's tokenizer (cl100k_base). Identify special tokens
(e.g., <|endoftext|>, <|im_start|>, <|im_end|>).
2. Generate a random 256-bit seed, store in AWS Secrets Manager.
3. Build optimized bijection over ~100K non-special tokens using
a proxy model (e.g., Qwen 2.5 7B embeddings for cross-model alignment).
4. Implement E/D as a Python middleware:
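import hashlib
import random
from transformers import AutoTokenizer
class AlienProxy:
    def __init__(self, seed: bytes, tokenizer_name: str, rho: float = 1.0):
        # tokenizer_name must be a Hugging Face tokenizer ID (cl100k_base itself ships with tiktoken)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        self.bijection, self.inverse = self._build_bijection(seed, rho)
    def _build_bijection(self, seed: bytes, rho: float):
        # Seeded random permutation of a rho-fraction of non-special IDs (unoptimized baseline;
        # see Example 3 for the optimized pairing). Special tokens are never remapped.
        special = set(self.tokenizer.all_special_ids)
        movable = [i for i in range(len(self.tokenizer)) if i not in special]
        rng = random.Random(hashlib.sha256(seed).digest())
        chosen = rng.sample(movable, round(rho * len(movable)))
        targets = chosen[:]
        rng.shuffle(targets)
        forward = dict(zip(chosen, targets))
        return forward, {v: k for k, v in forward.items()}
    def alienize(self, text: str) -> str:
        ids = self.tokenizer.encode(text, add_special_tokens=False)
        alien_ids = [self.bijection.get(i, i) for i in ids]  # unmapped/special IDs pass through
        return self.tokenizer.decode(alien_ids)
    def dealienize(self, alien_text: str) -> str:
        # Assumes re-encoding the alien string reproduces the alien IDs; verify on your data.
        ids = self.tokenizer.encode(alien_text, add_special_tokens=False)
        plain_ids = [self.inverse.get(i, i) for i in ids]
        return self.tokenizer.decode(plain_ids)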
5. Alienize 300K instruction-tuning examples + 50K medical QA pairs.
Upload alienized data to fine-tuning API. Run AAT.
6. In production: patient message → alienize → API call → dealienize → display.
Output:
- Patient sends: "I've been having chest pain for 3 days"
- API receives: "omorphagr thixotropydef disjunc kramerwald fost 3 intermitía"
(alien text — no PHI exposed)
- API responds in alien text
- Client dealienizes to readable medical advice
Example 2: Protecting proprietary code in a coding assistant pipeline
User: "We use an external LLM API for code review. Our source code is
proprietary and we can't let the provider read it. Can we use AlienLM?"
Approach:
1. Load the coding model's tokenizer. Preserve code-structural special
tokens (indent markers, newlines) in the special-token set S.
2. Set ρ = 0.8 for a balance of privacy and code-task performance.
3. Build bijection, ensuring code-specific tokens (variable names,
keywords) are remapped while structural tokens stay intact.
4. Prepare alienized training data from open-source code review datasets
(e.g., CodeReviewer). Alienize both code snippets and review comments.
5. Fine-tune via AAT on the alienized code review data.
6. Deploy: developer submits code → client alienizes → API reviews
alienized code → client dealienizes review comments.
Key consideration:
- Code structure (indentation, brackets, control flow) is partially
preserved through special-token exemption, but variable names and
string literals become unreadable to the provider.
- Include alienized code-specific training data to maintain review quality.
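Step 1 of this example (extending S with code-structural tokens) can be sketched as a surface-form filter. The structural character set and the toy vocabulary are assumptions; for byte-level BPE vocabularies, whose raw vocab strings encode spaces with special characters, obtain each surface form with `tokenizer.decode([i])` instead of reading the vocab strings directly.

```python
STRUCTURAL_CHARS = set(" \t\n()[]{}:;,.")  # assumption: tune per language and tokenizer

def structural_ids(vocab: list[str], special_ids: set[int]) -> set[int]:
    """Extend the exempt set S with tokens made only of whitespace/bracket/punctuation characters."""
    exempt = set(special_ids)
    for i, surface in enumerate(vocab):
        if surface and all(c in STRUCTURAL_CHARS for c in surface):
            exempt.add(i)
    return exempt

# Toy vocabulary of decoded surface forms.
vocab = ["<eos>", "def", " foo", "(", "):", "\n    ", " return", " x", "\n"]
S = structural_ids(vocab, special_ids={0})
movable = [i for i in range(len(vocab)) if i not in S]  # identifiers and keywords get remapped
```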
Example 3: Implementing the bijection optimization algorithm
User: "Show me how to build the optimized bijection, not just a random shuffle."
Approach:
1. Load proxy model embeddings (LM head weights), shape [vocab_size, dim].
2. For each non-special token i, compute k=50 nearest neighbors by
cosine similarity in embedding space.
3. Score each candidate pair (i, j):
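import numpy as np
from rapidfuzz.distance import Levenshtein  # third-party: pip install rapidfuzz
def score_pair(i, j, tokenizer, embeddings, mu=0.5):
    surf_i = tokenizer.decode([i])
    surf_j = tokenizer.decode([j])
    edit_dist = Levenshtein.normalized_distance(surf_i, surf_j)  # in [0, 1]; higher = more different
    e_i, e_j = embeddings[i], embeddings[j]
    emb_sim = float(e_i @ e_j / (np.linalg.norm(e_i) * np.linalg.norm(e_j)))
    # Maximize surface edit distance, minimize embedding distance (1 - cosine similarity)
    return edit_dist - mu * (1 - emb_sim)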
4. Greedy symmetric assignment:
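import numpy as np
def compute_knn(emb_matrix, k=50, chunk=1024):
    # Cosine kNN in row-position space, chunked to avoid a full vocab x vocab matrix.
    # Returns k+1 candidate positions per row (the row itself may be among them).
    normed = emb_matrix / np.linalg.norm(emb_matrix, axis=1, keepdims=True)
    out = np.empty((len(normed), k + 1), dtype=np.int64)
    for start in range(0, len(normed), chunk):
        sims = normed[start:start + chunk] @ normed.T
        out[start:start + len(sims)] = np.argpartition(-sims, kth=k, axis=1)[:, :k + 1]
    return out
def build_optimized_bijection(token_ids, embeddings, tokenizer, seed, mu=0.5, k=50):
    token_ids = np.asarray(token_ids)
    nn_pos = compute_knn(embeddings[token_ids], k=k)  # neighbors are positions in token_ids
    rng = np.random.default_rng(seed)  # seed: int derived from the secret key
    bijection, available = {}, set(token_ids.tolist())
    # Simplification: the seed only randomizes the greedy visiting order (no bucket partition).
    for p in rng.permutation(len(token_ids)):
        i = int(token_ids[p])
        if i not in available:
            continue
        best_j, best_score = None, -np.inf
        for q in nn_pos[p]:
            j = int(token_ids[q])  # map neighbor position back to a token ID
            if j == i or j not in available:
                continue
            s = score_pair(i, j, tokenizer, embeddings, mu)
            if s > best_score:
                best_j, best_score = j, s
        if best_j is not None:
            bijection[i], bijection[best_j] = best_j, i  # symmetric pair, so f is its own inverse
            available -= {i, best_j}
    # Tokens whose neighbors were all taken get paired randomly among themselves
    rest = rng.permutation(sorted(available)).tolist()
    for a, b in zip(rest[0::2], rest[1::2]):
        bijection[a], bijection[b] = b, a
    return bijection  # an odd leftover token stays unmapped (maps to itself)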
Output: A symmetric dictionary mapping each token ID to its partner, so f is its own inverse.
The paper reports that bijection construction completes in under 20 minutes for a 128K vocabulary.
Paper: AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs — Kim & Kang, 2026. Focus on Section 3 (bijection optimization algorithm), Section 4 (AAT procedure), and Table 1 (performance recovery ratios across models and benchmarks).