skills/dancing-chains-strategic-persuasion/SKILL.md
Apply Theory of Mind-based strategic persuasion to code reviews, PR rebuttals, RFC objections, and technical disagreements. Uses the ToM-Strategy-Response (TSR) pipeline from RebuttalAgent to model the reviewer's mental state, formulate a targeted persuasion strategy, and generate a grounded response.
Trigger phrases:
- "Help me respond to this code review"
- "Write a rebuttal for this PR feedback"
- "How should I push back on this review comment"
- "Draft a response to this RFC objection"
- "Help me persuade the reviewer"
- "Respond to this technical critique"
Install this skill globally with one command:
npx skillsauth add ndpvt-web/arxiv-claude-skills dancing-chains-strategic-persuasion
Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to craft persuasive, strategically grounded responses to code reviews, PR feedback, RFC objections, and technical critiques. It applies the ToM-Strategy-Response (TSR) pipeline from the RebuttalAgent paper (He et al., ICLR 2026), which treats rebuttal not as blunt counter-argument but as strategic communication under information asymmetry -- modeling what the reviewer thinks, deciding how to shift their position, and only then composing the response.
The core insight from "Dancing in Chains" is that effective rebuttal requires perspective-taking before argument construction. Most people jump directly to defending their code. The TSR pipeline inserts two critical reasoning stages before generating any text.
Stage 1 -- Theory of Mind (T): Build a hierarchical profile of the reviewer at two levels. At the macro level, infer their overall stance (supportive, skeptical, hostile), dominant concern (correctness, performance, maintainability, style), and expertise level (domain expert, generalist, junior). At the micro level, classify each specific comment by its concern category: significance of the issue raised, methodology/design critique, rigor of evidence (tests, benchmarks), or presentation/clarity complaints. This profile determines who you are talking to and what they actually care about.
Stage 2 -- Strategy (S): Translate the reviewer profile into an actionable persuasion plan. This is where strategic trade-offs happen: when to concede a point to build goodwill, when to stand firm with evidence, when to reframe the narrative, and when to propose a compromise. The strategy is a short explicit plan (2-4 sentences) that guides tone, structure, and argumentation before any response text is written.
Stage 3 -- Response (R): Generate the actual reply conditioned on both the reviewer profile and the strategy. This ensures the response is coherent, targeted, and persuasive rather than reactive. Each response addresses the reviewer's underlying concern (not just their surface objection), uses evidence aligned to their expertise level, and maintains the strategic tone decided in Stage 2.
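The three stages can be pictured as plain data handed from one step to the next. The sketch below is illustrative only: the class names (`ReviewerProfile`, `Strategy`), the `Move` enum, and the `response_brief` helper are hypothetical and do not come from the RebuttalAgent paper. The sample values are loosely based on the second example later in this document, and the "supportive" stance and "domain expert" label are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stance(Enum):
    SUPPORTIVE = "supportive"
    SKEPTICAL = "skeptical"
    HOSTILE = "hostile"


class Concern(Enum):
    SIGNIFICANCE = "significance"
    METHODOLOGY = "methodology"
    RIGOR = "rigor"
    PRESENTATION = "presentation"


class Move(Enum):
    CONCEDE = "concede"
    STAND_FIRM = "stand firm"
    REFRAME = "reframe"
    COMPROMISE = "compromise"


@dataclass
class ReviewerProfile:
    """Stage 1 output: macro fields plus one concern tag per comment."""
    stance: Stance
    dominant_concern: str  # e.g. "correctness", "maintainability"
    expertise: str         # "domain expert", "generalist", or "junior"
    comment_concerns: Dict[int, Concern] = field(default_factory=dict)


@dataclass
class Strategy:
    """Stage 2 output: a short plan and one planned move per comment."""
    plan: str
    moves: Dict[int, Move] = field(default_factory=dict)


def response_brief(profile: ReviewerProfile, strategy: Strategy,
                   comments: List[str]) -> str:
    """Assemble the context that Stage 3 is conditioned on."""
    lines = [
        f"Reviewer: {profile.stance.value}, {profile.expertise}, "
        f"cares most about {profile.dominant_concern}",
        f"Plan: {strategy.plan}",
    ]
    for number, comment in enumerate(comments, start=1):
        concern = profile.comment_concerns[number].value
        move = strategy.moves[number].value
        lines.append(f"{number}. [{concern} -> {move}] {comment}")
    return "\n".join(lines)


# Illustrative values only (assumed, not taken from the paper).
profile = ReviewerProfile(
    stance=Stance.SUPPORTIVE,
    dominant_concern="correctness",
    expertise="domain expert",
    comment_concerns={1: Concern.PRESENTATION, 2: Concern.RIGOR,
                      3: Concern.METHODOLOGY},
)
strategy = Strategy(
    plan="Concede #2 at once, offer a compromise on #1, stand firm on #3.",
    moves={1: Move.COMPROMISE, 2: Move.CONCEDE, 3: Move.STAND_FIRM},
)
comments = [
    "This function is 45 lines long, please break it up",
    "You're not handling the case where the API returns a 429",
    "Why are you using dataclasses instead of Pydantic here?",
]
brief = response_brief(profile, strategy, comments)
print(brief)
```

Keeping the profile and the strategy as separate objects mirrors the ordering the pipeline insists on: no response text exists until both have been written down.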
1. Collect all inputs. Gather the review comment(s), the relevant code/design under review, and any surrounding context (PR description, commit messages, related discussions). If the user provides only the review comment, ask for the code or design being reviewed.
2. Macro-level reviewer profiling. Analyze the full set of review comments to infer:
   - Overall stance (supportive, skeptical, hostile)
   - Dominant concern (correctness, performance, maintainability, style)
   - Expertise level (domain expert, generalist, junior)
3. Micro-level comment classification. For each individual comment, tag it with:
   - Concern category: significance, methodology/design, rigor of evidence (tests, benchmarks), or presentation/clarity
   - Validity (fully valid, partially valid, debatable) and emotional charge
4. Formulate explicit strategy. Based on the profile, write a 2-4 sentence strategy plan before drafting any response. Decide for each comment whether to:
   - Concede the point to build goodwill
   - Stand firm with evidence
   - Reframe the narrative
   - Propose a compromise
5. Structure the response for the reviewer's expertise level. For expert reviewers: be precise and technical, skip obvious context. For generalist reviewers: provide more rationale for domain-specific decisions. For junior reviewers: be encouraging and educational, not dismissive.
6. Draft each comment response. For each review point, write a response that:
   - Addresses the reviewer's underlying concern, not just the surface objection
   - Uses evidence aligned to the reviewer's expertise level
   - Maintains the strategic tone decided in Step 4
7. Calibrate tone across the full response. Read the complete set of replies together. Ensure the cumulative tone matches the strategy: if you conceded 3 minor points, your firm stance on the major point should feel earned, not combative.
8. Add structural signals. For multi-comment responses, use clear headers or numbered references to match the reviewer's original structure, making it easy for them to track your reply against their comments.
9. Self-check for strategy drift. Reread each response against the explicit strategy from Step 4. Flag any response that contradicts the planned approach (e.g., strategy says "concede" but the draft argues back).
10. Present the output with the reasoning visible. Show the user the reviewer profile, the strategy rationale, and the drafted responses -- so they can adjust the strategy before sending.
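Two of the later steps, adding structural signals and checking for strategy drift, are mechanical enough to sketch in code. The helpers below are hypothetical, and the drift check is a deliberately crude keyword heuristic: the marker list is an assumption, and in practice the check is a careful reread rather than a string match. The drafts are invented for illustration.

```python
from typing import Dict, List

# Assumed phrases that suggest a draft is arguing back (illustrative only).
PUSHBACK_MARKERS = ("i disagree", "that's not", "not necessary")


def find_strategy_drift(planned_moves: Dict[int, str],
                        drafts: Dict[int, str]) -> List[int]:
    """Return comment numbers where the plan says 'concede' but the draft pushes back."""
    flagged = []
    for number, move in sorted(planned_moves.items()):
        text = drafts.get(number, "").lower()
        if move == "concede" and any(marker in text for marker in PUSHBACK_MARKERS):
            flagged.append(number)
    return flagged


def format_threaded_reply(comments: List[str], drafts: Dict[int, str]) -> str:
    """Quote each numbered comment above its reply so the reviewer can track them."""
    blocks = []
    for number, comment in enumerate(comments, start=1):
        blocks.append(f'> {number}. "{comment}"\n{drafts[number]}')
    return "\n\n".join(blocks)


comments = [
    "This function is 45 lines long, please break it up",
    "You're not handling the case where the API returns a 429",
    "Why are you using dataclasses instead of Pydantic here?",
]
planned_moves = {1: "compromise", 2: "concede", 3: "stand firm"}
drafts = {
    1: "Good call. I've extracted the field-mapping logic.",
    2: "I disagree that this matters; the API rarely returns 429.",
    3: "I went with dataclasses because these are internal transfer types.",
}

drift = find_strategy_drift(planned_moves, drafts)  # draft 2 contradicts "concede"
reply = format_threaded_reply(comments, drafts)
print(drift)
print(reply)
```

A flagged number means the draft should be rewritten or the strategy revised before anything is shown to the user.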
Example 1: Defending an architectural choice in a PR review
User: Help me respond to this review comment on my PR:
Reviewer says: "I don't understand why you're using an event-driven
architecture here. This adds unnecessary complexity. A simple
synchronous call chain would be much easier to maintain and test.
Please simplify."
My PR adds an event bus for decoupling the notification system from
the order processing pipeline in our e-commerce service.
---
Approach:
[ToM Analysis]
- Macro: Reviewer is skeptical but constructive ("please simplify"
not "this is wrong"). Dominant concern is maintainability/complexity.
Expertise appears generalist -- they haven't engaged with the specific
coupling problem.
- Micro: Methodology critique, partially valid (event-driven IS more
complex), moderate emotional charge.
[Strategy]
Concede the complexity concern to build goodwill, then reframe: the
complexity is the *cost* of solving a real coupling problem that causes
bugs today. Provide concrete evidence of the problem. Offer a compromise
on scope.
[Response]
"You're right that event-driven adds indirection compared to direct
calls -- that's a real trade-off. The reason I went this direction is
that we currently have 4 places where order processing directly imports
and calls notification logic (OrderService.java:142, :203, :287,
RefundHandler.java:56). This coupling caused the outage last sprint
when the email provider timeout cascaded into order failures (see
incident #1247).
The event bus decouples these so notification failures can't block
orders. I've kept the event surface small -- just 2 event types
(OrderCompleted, OrderRefunded) rather than a general-purpose bus.
That said, I'm open to scoping this down: I could start with only
the OrderCompleted event (which covers the incident case) and handle
RefundHandler in a follow-up. Would that address your complexity
concern while still fixing the coupling issue?"
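To make the decoupling claim in that reply concrete, here is a minimal in-process sketch of a two-event bus. It is written in Python for illustration even though the example PR is Java, and every name in it (`EventBus`, `complete_order`, `send_confirmation_email`) is hypothetical; only the two event type names come from the example. It shows one property: an exception in a notification handler does not propagate into order processing.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

logger = logging.getLogger("event_bus")


@dataclass(frozen=True)
class OrderCompleted:
    order_id: str


@dataclass(frozen=True)
class OrderRefunded:
    order_id: str


class EventBus:
    """Tiny synchronous bus that isolates handler failures from the publisher."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> int:
        """Deliver the event to every subscriber; return how many handlers failed."""
        failures = 0
        for handler in self._handlers[type(event)]:
            try:
                handler(event)
            except Exception:
                # Log and continue: a failing subscriber must not break the caller.
                logger.exception("handler failed for %r", event)
                failures += 1
        return failures


def send_confirmation_email(event: OrderCompleted) -> None:
    # Stand-in for the notification call; simulates the provider timing out.
    raise TimeoutError("email provider timed out")


def complete_order(bus: EventBus, order_id: str) -> str:
    # Order persistence would happen here, before the event is published.
    bus.publish(OrderCompleted(order_id))
    return "completed"


bus = EventBus()
bus.subscribe(OrderCompleted, send_confirmation_email)
status = complete_order(bus, "A-100")
print(status)
```

Because this bus is synchronous, the order path still waits while a slow handler runs; it isolates errors, not latency. A production design would more likely hand events to a queue or background worker.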
Example 2: Responding to multiple review comments with mixed validity
User: I got these review comments on my Python PR. Help me respond:
1. "This function is 45 lines long, please break it up"
2. "You're not handling the case where the API returns a 429"
3. "Why are you using dataclasses instead of Pydantic here?"
The PR adds a data ingestion pipeline that fetches from a rate-limited
external API and transforms records into our internal format.
---
Approach:
[ToM Analysis]
- Macro: Reviewer is thorough and constructive. Three distinct concerns
spanning style, correctness, and design. Likely experienced -- the 429
catch is specific and practical.
- Micro:
1. Presentation concern, partially valid
2. Rigor concern, fully valid (genuine bug)
3. Methodology concern, debatable (both are reasonable)
[Strategy]
Concede #2 immediately and enthusiastically -- it's a real bug and
acknowledging it builds credibility. For #1, offer a targeted
compromise. For #3, stand firm with a brief rationale but keep the
door open.
[Response]
> 1. "This function is 45 lines long, please break it up"
Good call. I've extracted the field-mapping logic into
`_map_external_to_internal()` and the validation into
`_validate_record()`. The main function is now 18 lines and reads
as a clear pipeline: fetch -> map -> validate -> persist.
> 2. "You're not handling the case where the API returns a 429"
You're right, and that's a real gap given this API's rate limits.
I've added exponential backoff with jitter (3 retries, base 1s)
in commit abc123. Also added a test that mocks 429 responses to
verify the retry behavior. Thanks for catching this.
> 3. "Why are you using dataclasses instead of Pydantic here?"
I went with dataclasses because these objects are internal transfer
types that never touch serialization boundaries -- they're created
in `ingest()` and consumed in `persist()` within the same process.
Pydantic's validation overhead (~4x slower construction in our
benchmarks) doesn't buy us anything here since the data is already
validated in `_validate_record()`. Happy to reconsider if we later
need to serialize these, but for now the simpler tool fits.
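The retry behavior described in the reply to comment 2 (exponential backoff with jitter, 3 retries, base 1s) might look roughly like the sketch below. This is an assumption about what such a commit could contain, not the actual code: `fetch_with_backoff`, `RateLimitedError`, and the `fetch` callable returning a `(status, body)` pair are all hypothetical, and the "full jitter" variant is one of several common choices.

```python
import random
import time
from typing import Callable, Tuple


class RateLimitedError(Exception):
    """Raised when the API still answers 429 after all retries."""


def fetch_with_backoff(fetch: Callable[[], Tuple[int, str]],
                       max_retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(); on HTTP 429, wait with exponential backoff plus jitter and retry."""
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            # Any other status is returned as-is; error handling is out of scope here.
            return body
        if attempt == max_retries:
            break
        # Full jitter: wait a random time between 0 and base_delay * 2**attempt.
        sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RateLimitedError(f"still rate limited after {max_retries} retries")


# Usage with a fake API that is rate limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []  # record requested delays instead of actually sleeping
result = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, len(delays))
```

Passing `sleep` as a parameter is what makes the test mentioned in the reply cheap to write: the mock records the requested delays instead of waiting for them.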
Example 3: Responding to a hostile/dismissive review
User: Got this review on my RFC for migrating from REST to gRPC:
"This proposal is half-baked. You haven't considered backward
compatibility at all. The migration cost alone makes this DOA."
---
Approach:
[ToM Analysis]
- Macro: Reviewer is reject-leaning and dismissive. Attitude is terse
and emotionally charged ("half-baked", "DOA"). Dominant concern is
migration cost/compatibility. Likely has operational experience --
these are practical concerns, not theoretical.
- Micro: Significance + Methodology critique. The concerns are
legitimate even if the tone is harsh.
[Strategy]
Do NOT match the hostile tone or become defensive. Acknowledge the
valid underlying concerns with specific evidence that they were
considered. The goal is to demonstrate thoroughness, which directly
counters "half-baked." Provide concrete migration plan details to
shift the conversation from vibes to specifics.
[Response]
"Backward compatibility and migration cost are central concerns --
let me clarify how the RFC addresses them, since I may not have
made this prominent enough in the document.
Section 4.2 describes the dual-protocol phase: existing REST
endpoints remain live behind the API gateway while gRPC endpoints
come online service-by-service. No client breaks during migration.
I've estimated the timeline in Appendix B based on our 23 endpoints
-- the proposal is to migrate the 8 highest-traffic endpoints first
(which handle 91% of internal RPC volume) and leave the long-tail
REST endpoints as-is until Q3.
On cost: the RFC includes a comparison table (Table 3) showing
current REST serialization overhead vs. projected gRPC performance
at our p99 latency targets. The break-even point is ~4 months
post-migration for the top-8 endpoints.
I'll update the RFC to move the compatibility and cost analysis
into the main body rather than the appendix so these are more
visible. Would you be open to reviewing a revised version that
foregrounds these sections?"
He, Z., Lyu, Z., & Fung, Y.R. (2026). Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind. ICLR 2026. arXiv:2601.15715v2 -- See Section 3 for the full TSR pipeline specification, Section 4 for the critique-and-refine data synthesis method, and Appendix D for complete worked examples of the ToM profiling and strategy formulation stages.