.claude/skills/dedup-audit/SKILL.md
Systematic multi-pass code deduplication audit for Rust workspaces. Use when duplication has accumulated across crates, when error boilerplate is excessive, when repeated From/Display/Error impls appear across modules, when onboarding thiserror, or when establishing CI duplication gates. Triggers on "find duplicates", "reduce duplication", "dedup audit", "thiserror migration", "error boilerplate".
Multi-pass deduplication campaign for Rust workspaces. Detects duplicates with two complementary tools, triages true vs incidental duplication, eliminates error boilerplate via thiserror, extracts macros for repeated routing impls, and installs CI prevention gates.
Core principle: Syntactic similarity is not semantic identity. Every tool report requires human triage before action. The wrong abstraction costs more than the duplication it removes.
Run both tools to get complementary views. Token-based catches broad patterns; AST-based catches structural clones with higher precision.
npm install -g jscpd # Token-based, 150+ languages
cargo install cargo-dupes # AST-based, Rust-native via syn
mkdir -p tmp/dedup-baseline
jscpd --min-tokens 50 --min-lines 5 \
--reporters json,html \
--ignore "target/**,fuzz/**,tmp/**" \
--format rust \
--output tmp/dedup-baseline \
crates/
Parse results:
python3 -c "
import json
data = json.load(open('tmp/dedup-baseline/jscpd-report.json'))
s = data['statistics']['total']
print(f'Duplication: {s[\"percentage\"]}% ({s[\"duplicatedLines\"]} lines, {s[\"clones\"]} pairs)')
"
Industry target: <5% for production code. 5-10% warrants review. >10% demands action.
# High threshold: near-exact clones
cargo dupes report --threshold 0.9 --exclude-tests --min-nodes 15 \
> tmp/dedup-baseline/cargo-dupes-exact.txt
# Lower threshold: structural similarity
cargo dupes report --threshold 0.7 --exclude-tests --min-nodes 20 \
> tmp/dedup-baseline/cargo-dupes-similar.txt
If cargo-dupes OOMs on large workspaces, run per-crate:
for crate in crates/*/; do
    cargo dupes report --threshold 0.9 --path "$crate" 2>/dev/null
done
For each clone pair, apply the incidental duplication test:
"If I change this code for caller A's requirements, must caller B also change?"
- Yes -> true duplication, actionable
- No -> incidental duplication, leave it alone
Categorize every detected pair as: true-duplication / intentional / deferred
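To make the test concrete, here is a small sketch with hypothetical names (validate_username and validate_project_slug are invented for illustration, as are the policies in the comments). The two functions are token-for-token clones today, so a duplication tool would flag them, yet they answer to different requirements.

```rust
/// Usernames: 3 to 16 ASCII alphanumerics (hypothetical account policy).
pub fn validate_username(s: &str) -> bool {
    (3..=16).contains(&s.len()) && s.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Project slugs: the same rule today, but owned by a different requirement
/// (hypothetical URL policy) that could later allow hyphens.
pub fn validate_project_slug(s: &str) -> bool {
    (3..=16).contains(&s.len()) && s.chars().all(|c| c.is_ascii_alphanumeric())
}
```

Relaxing the slug rule would not force any change to usernames, so under the test above this pair would be categorized as intentional rather than true-duplication.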
Common intentional duplication in Rust:
Error handling boilerplate is consistently the largest actionable duplication category in Rust workspaces. This pass is mechanical and low-risk.
# Root Cargo.toml
[workspace.dependencies]
thiserror = "2"
For types with #[derive(Debug)] (standard Debug), the conversion is:
// BEFORE:
use std::fmt;

#[derive(Debug)]
pub enum FooError {
    Bar(BarError),
    Baz { detail: String },
}
impl fmt::Display for FooError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Bar(e) => write!(f, "bar failed: {e}"),
            Self::Baz { detail } => write!(f, "baz: {detail}"),
        }
    }
}
impl std::error::Error for FooError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self { Self::Bar(e) => Some(e), _ => None }
    }
}

// AFTER:
#[derive(Debug, thiserror::Error)]
pub enum FooError {
    #[error("bar failed: {0}")]
    Bar(#[source] BarError),
    #[error("baz: {detail}")]
    Baz { detail: String },
}
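Before swapping a manual impl for the derive, it can help to pin the current behavior with assertions. The sketch below is std-only: BarError and its message are hypothetical stand-ins, and render_chain is a helper invented here, not part of thiserror. The same assertions should still pass once FooError derives thiserror::Error.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub struct BarError; // hypothetical inner error

impl fmt::Display for BarError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("bar unavailable")
    }
}
impl Error for BarError {}

#[derive(Debug)]
pub enum FooError {
    Bar(BarError),
    Baz { detail: String },
}

impl fmt::Display for FooError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Bar(e) => write!(f, "bar failed: {}", e),
            Self::Baz { detail } => write!(f, "baz: {}", detail),
        }
    }
}

impl Error for FooError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Bar(e) => Some(e),
            _ => None,
        }
    }
}

/// Renders an error followed by each error in its source chain,
/// joined with " <- ".
pub fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push_str(" <- ");
        out.push_str(&e.to_string());
        cur = e.source();
    }
    out
}
```

Comparing render_chain output before and after a conversion covers both the Display text and the source() chain in one string.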
Rules:
- Use #[source] where source() returns Some(inner) but no From impl exists
- Use #[from] ONLY where an existing From impl is being replaced
- Fields named source are auto-detected by thiserror (no annotation needed)
- Remove now-unused use std::fmt / use std::error::Error imports

thiserror v2 does NOT auto-derive Debug. Custom Debug impls for security redaction (hiding hash values, keys, credentials) are safe alongside thiserror:
#[derive(thiserror::Error)] // NO Debug here (io::Error also rules out Clone/PartialEq/Eq)
#[non_exhaustive]
pub enum SecretError {
    #[error("connection failed to {host}")]
    Connection { host: String, #[source] source: io::Error },
}
// Custom Debug with redaction remains untouched
impl fmt::Debug for SecretError { /* redacts host */ }
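As an illustration of what such a redacting impl might look like, here is a std-only sketch. The struct-style output and the exact placeholder text are assumptions, and the source field and the thiserror derive are left out so the block compiles on its own.

```rust
use std::fmt;

pub enum SecretError {
    Connection { host: String },
}

impl fmt::Debug for SecretError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Bind no fields: the host value never reaches the formatter.
            Self::Connection { .. } => f
                .debug_struct("Connection")
                .field("host", &format_args!("<redacted>"))
                .finish(),
        }
    }
}
```

Because the match arm never binds host, a later edit cannot leak it by accident without also changing the pattern.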
When #[error("...")] cannot express the formatting (conditional logic,
method calls, helper functions), keep manual Display but still derive
thiserror::Error for the Error impl:
#[derive(Debug, thiserror::Error)]
pub enum ComplexError {
    // No #[error] attributes; manual Display below
    Variant { source: InnerError }, // source auto-detected
}
impl fmt::Display for ComplexError { /* complex logic */ }
When 3+ From<SharedError> for SpecificError impls follow the same pattern
(accept some variants, reject others with unreachable!()), extract a macro:
macro_rules! impl_from_shared_error {
    ($target:ident,
     accept: [ $($variant:ident $({ $($field:ident),* })?),* $(,)? ],
     reject: [ $($rej:pat),* $(,)? ]
    ) => {
        impl From<SharedError> for $target {
            fn from(e: SharedError) -> Self {
                match e {
                    $( SharedError::$variant $({ $($field),* })? =>
                        Self::$variant $({ $($field),* })?, )*
                    $( $rej => unreachable!(
                        "SharedError variant not valid for {}", stringify!($target)
                    ), )*
                }
            }
        }
    };
}
Critical: Use explicit rejection arms (not wildcards) to preserve
compile-time exhaustiveness. Adding a new variant to SharedError must
force a compile error in every routing impl.
The same approach applies to From<XxxError> for RejectionKind impls in
simulation harnesses.
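A usage sketch may help show how the accept and reject lists read at a call site. The macro is repeated so the block compiles on its own; the SharedError and ReadError enums and their variants are hypothetical, chosen only to exercise unit and struct variants.

```rust
macro_rules! impl_from_shared_error {
    ($target:ident,
     accept: [ $($variant:ident $({ $($field:ident),* })?),* $(,)? ],
     reject: [ $($rej:pat),* $(,)? ]
    ) => {
        impl From<SharedError> for $target {
            fn from(e: SharedError) -> Self {
                match e {
                    $( SharedError::$variant $({ $($field),* })? =>
                        Self::$variant $({ $($field),* })?, )*
                    $( $rej => unreachable!(
                        "SharedError variant not valid for {}", stringify!($target)
                    ), )*
                }
            }
        }
    };
}

// Hypothetical shared error with one variant the read path never produces.
#[derive(Debug, PartialEq)]
pub enum SharedError {
    Timeout,
    NotFound { key: String },
    Corrupt { offset: u64 },
}

// Hypothetical narrower error: accepts two variants, rejects the third.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    Timeout,
    NotFound { key: String },
}

impl_from_shared_error!(ReadError,
    accept: [Timeout, NotFound { key }],
    reject: [SharedError::Corrupt { .. }]
);
```

Since the expansion has no wildcard arm, adding a variant to SharedError makes the generated match non-exhaustive, and this invocation stops compiling until the variant is listed under accept or reject.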
Review remaining duplicates flagged by cargo-dupes near-duplicate detection.
Decision framework for each candidate:
1. Is it test/bench code?
   - Yes -> Higher threshold applies. Leave it unless it causes maintenance pain.
   - No -> Continue.
2. Rule of three: Are there 3+ instances?
   - No -> Defer. Two instances are insufficient signal.
   - Yes -> Continue.
3. Can you name the abstraction clearly?
   - No -> The abstraction is premature. Leave the duplication.
   - Yes -> Continue.
4. Would extraction introduce lifetime complexity or generics on hot paths?
   - Yes -> Keep concrete. Use the "outline pattern" if generics are needed.
   - No -> Extract.
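Assuming "outline pattern" refers to the common idiom of a thin generic shim that forwards to a non-generic inner function, a minimal sketch looks like this. The function component_count is hypothetical and stands in for whatever shared logic is being extracted.

```rust
use std::path::Path;

/// Generic shim: instantiated once per caller type, but it only converts
/// the argument and forwards it.
pub fn component_count<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic body: compiled once, no matter how many `P` types call in.
    fn inner(path: &Path) -> usize {
        path.components().count()
    }
    inner(path.as_ref())
}
```

Callers keep the convenience of passing &str, String, or PathBuf, while the real work lives in a single concrete function, which limits monomorphization bloat.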
Rust-specific extraction risks:
- macro_rules!

Create .jscpd.json at workspace root:
{
  "threshold": 6,
  "reporters": ["json", "consoleFull"],
  "ignore": [
    "target/**",
    "fuzz/**",
    "tmp/**",
    "**/*test*",
    "**/*tests*",
    "**/benches/**"
  ],
  "minTokens": 50,
  "minLines": 5,
  "format": ["rust"]
}
Start threshold at current baseline + 1% margin. Ratchet down after each pass.
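One way to read the ratchet rule, as a sketch: the next threshold is the measured duplication plus the margin, capped at the threshold already in force. The function name and the choice never to raise the gate are assumptions, not something jscpd provides.

```rust
/// Returns the next CI threshold in percent: measured duplication plus a
/// margin, but never higher than the threshold currently in force.
pub fn ratchet_threshold(current: f64, measured: f64, margin: f64) -> f64 {
    current.min(measured + margin)
}
```

With a current threshold of 6 and a measured 4.5% after a pass, a 1% margin gives 5.5 as the new value for .jscpd.json; if duplication regressed to 7%, the threshold would stay at 6 and the gate would fail.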
Wire jscpd into CI. The .jscpd.json config is inert without an invocation step:
# GitHub Actions example
- name: Check code duplication
  run: npx jscpd --config .jscpd.json crates/
Add to project instructions:
- Use #[derive(thiserror::Error)] for error types

After each pass:
cargo fmt --all
cargo check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --workspace # exclude Docker-dependent crates if needed
RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
Specific checks:
- format!("{}", error) output identical before/after for every converted variant
- format!("{:?}", error) still shows <redacted> where custom Debug exists
- error.source() chain preserved

| What | Tool/Technique |
|------|---------------|
| Token-level scan | jscpd --format rust --min-tokens 50 |
| AST-level scan | cargo dupes report --threshold 0.9 |
| Error boilerplate | thiserror = "2" derive macro |
| Routing From impls | macro_rules! with accept/reject lists |
| CI gate | .jscpd.json with threshold ratchet |
| Pattern enforcement | ast-grep rules post-dedup |
| Binary bloat check | cargo bloat --release |
| Mistake | Fix |
|---------|-----|
| Merging incidental duplication | Apply the "caller A / caller B" test first |
| Using #[from] speculatively | Only replace existing From impls |
| Mixing refactoring with feature work | Pure refactoring PRs only |
| One giant PR for all conversions | One crate per PR |
| Deduplicating test code aggressively | Test clarity > DRY |
| Using generics on hot paths | Keep concrete; use outline pattern |
| Wildcard rejection in routing macros | Use explicit arms for exhaustiveness |