skills/skillxiv-v0.0.2-claude-opus-4.6/efficient-machine-unlearning/SKILL.md
Framework for efficient machine unlearning that reformulates forgetting as inverse learning. Achieves significant computational speedup by replacing expensive Hessian operations with gradient-based optimization, enabling privacy-preserving model updates.
npx skillsauth add ADu2021/skillXiv efficient-machine-unlearning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Efficient Machine Unlearning addresses the critical challenge of removing specific training data from models for privacy compliance (e.g., GDPR right to be forgotten). Rather than expensive Hessian-based approaches, this framework establishes a theoretical connection between learning and unlearning, enabling efficient gradient-based deletion.
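For orientation, the contrast between the two families can be written out. The first line is the classical influence-function (one-step Newton) estimate of the parameters after removing one training point $z$ out of $n$; the second is a plain gradient-ascent step on the forget loss, which needs no Hessian. The symbols ($\hat\theta$, $H$, $\ell$, $\eta$, $n$) are notation assumed here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\text{Hessian-based:}\quad
  & \theta_{-z} \approx \hat\theta + \frac{1}{n}\, H_{\hat\theta}^{-1}\, \nabla_\theta \ell(z, \hat\theta),
  \qquad H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 \ell(z_i, \hat\theta) \\
\text{Gradient-based:}\quad
  & \theta_{t+1} = \theta_t + \eta\, \nabla_\theta \ell(z, \theta_t)
\end{aligned}
```

For a model with $d$ parameters, storing $H$ takes $O(d^2)$ memory and a direct inverse takes $O(d^3)$ time, whereas each gradient step costs $O(d)$ beyond the backward pass.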
The fundamental insight is that unlearning can be viewed as the inverse of incremental learning. By reformulating the problem through this lens:
The framework consists of:
Step 1: Implement efficient influence approximation
Approximate influence without expensive Hessian computation:
import torch
import torch.nn as nn
from typing import Callable, Dict, List, Optional, Tuple
import numpy as np


class InfluenceApproximator:
    """Efficiently approximates data influence on model parameters."""

    def __init__(self, model: nn.Module):
        self.model = model
        self.parameters = list(model.parameters())

    def _flatten(self, grads) -> torch.Tensor:
        """Flatten per-parameter gradients into one vector.

        Unused parameters (gradient None) contribute zeros so the result
        always has exactly one entry per model parameter.
        """
        return torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, self.parameters)
        ])

    def compute_gradient_norm(self, batch: Dict,
                              loss_fn: Callable) -> torch.Tensor:
        """
        Compute the flattened loss gradient for a batch.
        Despite the name, this returns the gradient vector, not its norm.
        Args:
            batch: Data batch
            loss_fn: Loss function
        Returns:
            Gradient vector (flattened)
        """
        # Forward pass
        outputs = self.model(batch['input_ids'], batch.get('attention_mask'))
        loss = loss_fn(outputs, batch.get('labels'))
        # First-order gradients only, so no graph needs to be kept
        grads = torch.autograd.grad(loss, self.parameters, allow_unused=True)
        return self._flatten(grads)

    def compute_influence_score(self, to_forget_batch: Dict,
                                other_batch: Dict,
                                loss_fn: Callable) -> float:
        """
        Compute influence of to_forget_batch on other_batch.
        Measures how much removing to_forget would change loss on other_batch.
        Args:
            to_forget_batch: Data to remove
            other_batch: Data to evaluate influence on
            loss_fn: Loss function
        Returns:
            Influence score (higher = more influential)
        """
        grad_to_forget = self.compute_gradient_norm(to_forget_batch, loss_fn)
        grad_other = self.compute_gradient_norm(other_batch, loss_fn)
        # First-order influence: dot product of the two gradients
        # (equivalent to replacing the Hessian with the identity).
        # High similarity means to_forget influences other's loss.
        return torch.dot(grad_to_forget, grad_other).item()

    def approximate_hessian_inverse_sqrt(self,
                                         data_batch: Dict,
                                         loss_fn: Callable,
                                         num_samples: int = 50,
                                         damping: float = 1e-3) -> torch.Tensor:
        """
        Crude diagonal approximation of H^-1 v.

        The Hessian diagonal is estimated with Hutchinson-style probes,
        diag(H) ~ E[z * (H z)] for Rademacher z, which needs only
        Hessian-vector products. This is not an exact inverse (and, despite
        the method name, not an inverse square root); run conjugate gradient
        on the Hessian-vector product when a more accurate solve is needed.
        Args:
            data_batch: Batch to estimate on
            loss_fn: Loss function
            num_samples: Number of random probes for the diagonal estimate
            damping: Added to the diagonal to avoid division by zero
        Returns:
            Approximated H^-1 v for a random Gaussian vector v
        """
        device = self.parameters[0].device
        num_params = sum(p.numel() for p in self.parameters)
        # Hutchinson estimate of the Hessian diagonal
        diag_est = torch.zeros(num_params, device=device)
        for _ in range(num_samples):
            z = torch.randint(0, 2, (num_params,), device=device).float() * 2 - 1
            diag_est += z * self._compute_hessian_vector_product(
                data_batch, loss_fn, z
            )
        diag_est /= num_samples
        # Random vector to apply the approximate inverse to
        v = torch.randn(num_params, device=device)
        # Damped diagonal inverse; abs() guards against negative curvature
        return v / (diag_est.abs() + damping)

    def _compute_hessian_vector_product(self,
                                        batch: Dict,
                                        loss_fn: Callable,
                                        v: torch.Tensor) -> torch.Tensor:
        """
        Compute Hessian-vector product: H v where H is the loss Hessian.
        Uses double backpropagation, so H is never formed explicitly.
        """
        # Forward pass
        outputs = self.model(batch['input_ids'], batch.get('attention_mask'))
        loss = loss_fn(outputs, batch.get('labels'))
        # First gradient, keeping the graph so it can be differentiated again
        grads = torch.autograd.grad(
            loss,
            self.parameters,
            create_graph=True,
            allow_unused=True
        )
        flat_grad = self._flatten(grads)
        # Gradient-vector dot product
        gvp = torch.dot(flat_grad, v)
        # Second gradient (Hessian-vector product)
        h_v = torch.autograd.grad(gvp, self.parameters, allow_unused=True)
        return self._flatten(h_v).detach()
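The comments in Step 1 mention finite differences and conjugate gradient as ways to obtain H^-1 v without forming the Hessian. The following is a minimal standalone sketch of that combination in plain Python, assuming a symmetric positive-definite Hessian. The helper names (`hvp_finite_diff`, `conjugate_gradient`) and the toy quadratic loss are hypothetical and are not part of the skill's code.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def hvp_finite_diff(grad_fn: Callable[[Vector], Vector], theta: Vector,
                    v: Vector, eps: float = 1e-4) -> Vector:
    """Central-difference Hessian-vector product: (g(t + eps*v) - g(t - eps*v)) / (2*eps)."""
    g_plus = grad_fn(axpy(eps, v, theta))
    g_minus = grad_fn(axpy(-eps, v, theta))
    return [(p - m) / (2 * eps) for p, m in zip(g_plus, g_minus)]


def conjugate_gradient(hvp: Callable[[Vector], Vector], b: Vector,
                       max_iter: int = 50, tol: float = 1e-10) -> Vector:
    """Solve H x = b for symmetric positive-definite H using only H*v products."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - H x, with x = 0 initially
    p = list(r)  # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        hp = hvp(p)
        alpha = rs_old / dot(p, hp)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, hp, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)  # new direction: r + beta * p
        rs_old = rs_new
    return x


# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, so grad = A theta and H = A.
A = [[4.0, 1.0], [1.0, 3.0]]


def grad_fn(theta: Vector) -> Vector:
    return [dot(row, theta) for row in A]


theta0 = [0.5, -0.2]
b = [1.0, 2.0]
h_inv_b = conjugate_gradient(lambda v: hvp_finite_diff(grad_fn, theta0, v), b)
print(h_inv_b)  # close to [1/11, 7/11], the exact solution of A x = b
```

Each conjugate-gradient iteration costs one Hessian-vector product, which is why this route scales to models where the full Hessian cannot be stored.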
Step 2: Reformulate unlearning as inverse learning
View the forgetting problem through the lens of incremental learning:
class IncrementalLearningPerspective:
    """
    Reformulates unlearning as the inverse of incremental learning.
    Key insight: if data x was added to model M to get M', then
    removing x from M' should reverse the process.
    """

    def __init__(self, model: nn.Module):
        self.model = model

    def compute_unlearning_gradient(self,
                                    to_forget_batch: Dict,
                                    loss_fn: Callable) -> Dict:
        """
        Compute gradient direction for removing influence of batch.
        Instead of computing H^-1 g (inverse problem),
        directly optimize to remove influence.
        Args:
            to_forget_batch: Batch to remove
            loss_fn: Loss function
        Returns:
            Direction to move parameters to unlearn data
        """
        # Compute loss on the forget batch
        outputs = self.model(to_forget_batch['input_ids'],
                             to_forget_batch.get('attention_mask'))
        loss = loss_fn(outputs, to_forget_batch.get('labels'))
        named_params = list(self.model.named_parameters())
        # Gradient g indicating how the loss on this data changes
        grads = torch.autograd.grad(
            loss,
            [p for _, p in named_params],
            allow_unused=True
        )
        # A learning step moves parameters along -g to fit the data.
        # The unlearning direction is the opposite, +g (gradient ascent),
        # which raises the loss on the forget batch.
        unlearn_direction = {
            name: g
            for (name, _), g in zip(named_params, grads)
        }
        return unlearn_direction

    def incremental_unlearning_update(self,
                                      to_forget_batch: Dict,
                                      learning_rate: float,
                                      loss_fn: Callable) -> Dict:
        """
        Single unlearning step using the incremental perspective.
        Moves parameters opposite to how they would move when learning.
        Returns:
            The applied update per parameter name
        """
        unlearn_dir = self.compute_unlearning_gradient(to_forget_batch, loss_fn)
        updates = {}
        with torch.no_grad():
            for name, param in self.model.named_parameters():
                direction = unlearn_dir.get(name)
                if direction is not None:
                    # Ascent step: theta <- theta + lr * g
                    step = learning_rate * direction
                    param.add_(step)
                    updates[name] = step
        return updates
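The "inverse of incremental learning" idea can be checked by hand on a one-parameter model. The sketch below takes one descent step on a sample and then one ascent step on the same sample, and measures how far the weight ends up from where it started. The squared loss, the sample values, and the learning rate are assumptions chosen for illustration.

```python
def grad_squared_loss(w: float, x: float, y: float) -> float:
    """d/dw of 0.5 * (w * x - y) ** 2."""
    return (w * x - y) * x


x, y = 2.0, 3.0  # hypothetical sample to learn and then forget
lr = 0.01
w0 = 0.5         # weight before the sample was seen

# Incremental learning: one descent step on the sample
g0 = grad_squared_loss(w0, x, y)
w_learned = w0 - lr * g0

# Unlearning: one ascent step on the same sample
w_unlearned = w_learned + lr * grad_squared_loss(w_learned, x, y)

learn_shift = abs(w_learned - w0)  # how far learning moved the weight
residual = abs(w_unlearned - w0)   # what the ascent step failed to undo

print(learn_shift, residual)
```

For this loss the leftover error works out to `lr**2 * x**2 * abs(g0)`, so the ascent step reverses the learning step up to a second-order term in the learning rate. With many steps or a non-quadratic loss the reversal is only approximate, which is why the verification step still matters.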
Step 3: Implement gradient-based unlearning optimizer
Perform efficient deletion through optimization:
class GradientBasedUnlearner:
    """Efficiently unlearns data through gradient optimization."""

    def __init__(self, model: nn.Module, learning_rate: float = 1e-4):
        self.model = model
        self.learning_rate = learning_rate
        # Only used to clear gradients; updates below are applied manually
        self.optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
        self.incremental = IncrementalLearningPerspective(model)

    def unlearn_sample(self, to_forget_sample: Dict,
                       loss_fn: Callable,
                       num_steps: int = 5) -> Dict:
        """
        Unlearn a single sample through iterative optimization.
        Args:
            to_forget_sample: Single sample to remove
            loss_fn: Loss function
            num_steps: Number of gradient steps
        Returns:
            Metrics about unlearning process
        """
        metrics = {
            'step_losses': [],
            'parameter_changes': [],
            'num_steps': num_steps
        }
        for step in range(num_steps):
            # Record the loss on the sample before this step
            with torch.no_grad():
                outputs = self.model(
                    to_forget_sample['input_ids'],
                    to_forget_sample.get('attention_mask')
                )
                loss = loss_fn(outputs, to_forget_sample.get('labels'))
            metrics['step_losses'].append(loss.item())
            # Unlearning step: move opposite to learning direction
            unlearn_update = self.incremental.incremental_unlearning_update(
                to_forget_sample,
                self.learning_rate,
                loss_fn
            )
            param_change = sum(
                torch.norm(v).item() for v in unlearn_update.values()
                if v is not None
            )
            metrics['parameter_changes'].append(param_change)
        # Loss after the last step; unlearning should raise it
        with torch.no_grad():
            outputs = self.model(
                to_forget_sample['input_ids'],
                to_forget_sample.get('attention_mask')
            )
            final_loss = loss_fn(outputs, to_forget_sample.get('labels')).item()
        initial_loss = metrics['step_losses'][0] if metrics['step_losses'] else final_loss
        metrics['loss_increase'] = final_loss - initial_loss
        return metrics

    def unlearn_batch(self, to_forget_batch: Dict,
                      loss_fn: Callable,
                      batch_num_steps: int = 10) -> Dict:
        """
        Unlearn all samples in a batch.
        Args:
            to_forget_batch: Full batch to remove
            loss_fn: Loss function
            batch_num_steps: Total optimization steps for batch
        Returns:
            Unlearning metrics
        """
        metrics = {
            'batch_loss_initial': None,
            'batch_loss_final': None,
            'num_steps': batch_num_steps,
            'total_param_change': 0.0
        }
        for step in range(batch_num_steps):
            # Forward pass
            outputs = self.model(
                to_forget_batch['input_ids'],
                to_forget_batch.get('attention_mask')
            )
            loss = loss_fn(outputs, to_forget_batch.get('labels'))
            if step == 0:
                metrics['batch_loss_initial'] = loss.item()
            # Backward pass
            self.optimizer.zero_grad()
            loss.backward()
            # Maximize loss on the forget data (gradient ascent):
            # this removes the learned associations
            with torch.no_grad():
                for param in self.model.parameters():
                    if param.grad is not None:
                        param.add_(self.learning_rate * param.grad)
                        metrics['total_param_change'] += (
                            self.learning_rate * torch.norm(param.grad).item()
                        )
        # Final evaluation
        with torch.no_grad():
            outputs = self.model(
                to_forget_batch['input_ids'],
                to_forget_batch.get('attention_mask')
            )
            final_loss = loss_fn(outputs, to_forget_batch.get('labels'))
        metrics['batch_loss_final'] = final_loss.item()
        return metrics
Step 4: Implement membership inference test
Verify that unlearning was successful:
class MembershipInferenceTest:
    """Tests whether data has been successfully unlearned."""

    def __init__(self, model: nn.Module):
        self.model = model

    def compute_membership_score(self, sample: Dict,
                                 loss_fn: Callable) -> float:
        """
        Compute membership score: the negated model loss on the sample.
        High loss = likely not a training sample (unlearned)
        Low loss = likely a training sample (still learned)
        Args:
            sample: Sample to test membership of
            loss_fn: Loss function
        Returns:
            Membership score (higher = more likely member)
        """
        with torch.no_grad():
            outputs = self.model(
                sample['input_ids'],
                sample.get('attention_mask')
            )
            loss = loss_fn(outputs, sample.get('labels'))
        # Negated loss as membership score
        # (lower loss = higher membership probability)
        return -loss.item()

    def membership_inference_attack(self,
                                    train_samples: List[Dict],
                                    test_samples: List[Dict],
                                    loss_fn: Callable) -> Dict:
        """
        Perform membership inference attack to test unlearning.
        Args:
            train_samples: Original training samples
            test_samples: Non-training samples
            loss_fn: Loss function
        Returns:
            AUC score indicating inference accuracy, plus score statistics
        """
        # Imported here so scikit-learn is only required for this method
        from sklearn.metrics import roc_auc_score

        train_scores = [
            self.compute_membership_score(s, loss_fn)
            for s in train_samples
        ]
        test_scores = [
            self.compute_membership_score(s, loss_fn)
            for s in test_samples
        ]
        # Compute AUC: can we distinguish train from test?
        y_true = [1] * len(train_scores) + [0] * len(test_scores)
        y_pred = train_scores + test_scores
        auc = roc_auc_score(y_true, y_pred)
        return {
            'auc': auc,
            'train_avg_score': np.mean(train_scores),
            'test_avg_score': np.mean(test_scores),
            'separation': np.mean(train_scores) - np.mean(test_scores)
        }

    def verify_unlearning(self,
                          unlearned_samples: List[Dict],
                          remaining_samples: List[Dict],
                          loss_fn: Callable,
                          threshold: float = 0.5) -> Dict:
        """
        Verify that samples have been unlearned.
        Args:
            unlearned_samples: Samples that should be forgotten
            remaining_samples: Samples that should still be known
            loss_fn: Loss function
            threshold: Minimum loss for a sample to count as unlearned
        Returns:
            Verification results
        """
        unlearned_scores = [
            self.compute_membership_score(s, loss_fn)
            for s in unlearned_samples
        ]
        remaining_scores = [
            self.compute_membership_score(s, loss_fn)
            for s in remaining_samples
        ]
        # Scores are negated losses, so successful unlearning gives the
        # forgotten samples a lower score than the remaining ones and
        # makes this gap positive
        if unlearned_scores and remaining_scores:
            unlearning_gap = np.mean(remaining_scores) - np.mean(unlearned_scores)
        else:
            unlearning_gap = float('nan')
        # A sample counts as unlearned when its loss (-score) exceeds threshold
        successful_unlearns = sum(
            1 for score in unlearned_scores
            if -score > threshold
        )
        total = len(unlearned_samples)
        success_rate = successful_unlearns / total if total else 0.0
        return {
            'unlearning_gap': unlearning_gap,
            'successful_unlearns': successful_unlearns,
            'total_samples': total,
            'success_rate': success_rate,
            'verified': success_rate > 0.9
        }
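To make the AUC check in Step 4 concrete, here is a dependency-free sketch of the same quantity that `roc_auc_score` reports: the probability that a randomly chosen member outscores a randomly chosen non-member. The function name `membership_auc` and all loss values are hypothetical, made up only to show the before/after pattern.

```python
from typing import List


def membership_auc(member_scores: List[float],
                   non_member_scores: List[float]) -> float:
    """Fraction of (member, non-member) pairs where the member scores higher.

    Ties count as half a win, which matches the usual AUC definition.
    """
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Hypothetical per-sample losses; score = -loss, as in compute_membership_score.
held_out_losses = [1.0, 0.25, 2.0]

# Before unlearning: the forget set still looks like training data (low loss).
forget_losses_before = [0.1, 0.2, 0.3]
auc_before = membership_auc([-loss for loss in forget_losses_before],
                            [-loss for loss in held_out_losses])

# After unlearning: forget-set losses resemble held-out losses.
forget_losses_after = [0.9, 0.3, 1.8]
auc_after = membership_auc([-loss for loss in forget_losses_after],
                           [-loss for loss in held_out_losses])

print(auc_before, auc_after)  # 8/9 and 5/9 for these made-up numbers
```

An AUC near 0.5 means the attacker cannot tell forgotten samples from never-seen ones, which is the target. An AUC far below 0.5 is also a signal worth investigating, since samples that were pushed to unusually high loss become distinguishable again.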
Step 5: Implement end-to-end unlearning pipeline
Integrate all components into complete unlearning workflow:
class UnlearningPipeline:
    """Complete efficient machine unlearning system."""

    def __init__(self, model: nn.Module, loss_fn: Callable,
                 learning_rate: float = 1e-4):
        self.model = model
        self.loss_fn = loss_fn
        self.unlearner = GradientBasedUnlearner(model, learning_rate)
        self.verifier = MembershipInferenceTest(model)
        self.influence = InfluenceApproximator(model)

    def unlearn_request(self, to_forget_data: List[Dict],
                        num_steps: int = 10,
                        retain_data: Optional[List[Dict]] = None) -> Dict:
        """
        Process unlearning request for data batch.
        Args:
            to_forget_data: Data to remove
            num_steps: Optimization steps
            retain_data: Optional samples that should stay learned; used as
                the reference for influence scores and verification
        Returns:
            Unlearning report
        """
        retain_data = retain_data or []
        report = {
            'num_samples': len(to_forget_data),
            'influence_scores': [],
            'unlearning_metrics': None,
            'verification': None,
            'success': False
        }
        # Step 1: Compute influence scores against a retained reference
        # sample (skipped when no retained data is supplied)
        if retain_data:
            reference = retain_data[0]
            for sample in to_forget_data:
                score = self.influence.compute_influence_score(
                    sample,
                    reference,
                    self.loss_fn
                )
                report['influence_scores'].append(score)
        # Step 2: Unlearn data
        batch_metrics = self.unlearner.unlearn_batch(
            {
                'input_ids': torch.cat([s['input_ids'] for s in to_forget_data]),
                'labels': torch.cat([s.get('labels', s['input_ids'])
                                     for s in to_forget_data])
            },
            self.loss_fn,
            batch_num_steps=num_steps
        )
        report['unlearning_metrics'] = batch_metrics
        # Step 3: Verify unlearning (pass a hold-out or retained set when
        # available so the unlearning gap can be computed)
        verification = self.verifier.verify_unlearning(
            to_forget_data,
            retain_data,
            self.loss_fn
        )
        report['verification'] = verification
        report['success'] = verification['verified']
        return report

    def evaluate_utility(self, test_data: List[Dict]) -> float:
        """
        Evaluate model utility preservation after unlearning.
        Args:
            test_data: Test set for evaluation
        Returns:
            Average loss on the test set (lower is better)
        """
        if not test_data:
            raise ValueError("test_data must contain at least one sample")
        total_loss = 0.0
        with torch.no_grad():
            for sample in test_data:
                outputs = self.model(
                    sample['input_ids'],
                    sample.get('attention_mask')
                )
                loss = self.loss_fn(outputs, sample.get('labels'))
                total_loss += loss.item()
        return total_loss / len(test_data)
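The whole pipeline (train, unlearn by gradient ascent, then check both forgetting and utility) can be traced on a model small enough to run without PyTorch. The sketch below uses a two-parameter logistic regression. The dataset, learning rates, and step counts are assumptions picked for illustration and say nothing about the settings used in the paper.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, binary label)


def predict(w: float, b: float, x: float) -> float:
    """Sigmoid probability of label 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def bce(w: float, b: float, data: List[Sample]) -> float:
    """Mean binary cross-entropy over the given samples."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def grads(w: float, b: float, data: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean cross-entropy with respect to (w, b)."""
    gw = sum((predict(w, b, x) - y) * x for x, y in data) / len(data)
    gb = sum(predict(w, b, x) - y for x, y in data) / len(data)
    return gw, gb


retain = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # data to keep
forget = [(0.5, 0)]                                  # data to remove

# Learning: gradient descent on retain + forget
w, b = 0.0, 0.0
for _ in range(500):
    gw, gb = grads(w, b, retain + forget)
    w, b = w - 0.1 * gw, b - 0.1 * gb
forget_before, retain_before = bce(w, b, forget), bce(w, b, retain)

# Unlearning: gradient ascent on the forget sample only
for _ in range(10):
    gw, gb = grads(w, b, forget)
    w, b = w + 0.05 * gw, b + 0.05 * gb
forget_after, retain_after = bce(w, b, forget), bce(w, b, retain)

print(f"forget loss: {forget_before:.4f} -> {forget_after:.4f}")
print(f"retain loss: {retain_before:.4f} -> {retain_after:.4f}")
```

The forget loss must rise here because cross-entropy is convex in the parameters, so every ascent step increases it. The retain loss plays the role of `evaluate_utility`: its change is not guaranteed in either direction, so it has to be measured after each unlearning request.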
When to use Efficient Machine Unlearning:
When NOT to use Efficient Machine Unlearning:
Key hyperparameters:
unlearning_learning_rate: 1e-4 to 1e-5 typical
num_unlearning_steps: 5-20 depending on data importance
batch_size: Larger batches more efficient
verification_threshold: 0.5-0.7 for membership test
Expected performance:
Privacy guarantees:
Efficient Machine Unlearning via Influence Approximation. arXiv:2507.23257