skills/funny-or-persuasive-but/SKILL.md
Fine-grained multi-concept text control that avoids the compositionality trap where LLMs degrade when asked to be e.g. funny AND persuasive simultaneously. Use when: 'write a funny persuasive email', 'make this formal but warm', 'generate humorous and convincing copy', 'control tone on two axes at once', 'blend humor with authority in this text', 'write polite but assertive feedback'.
npx skillsauth add ndpvt-web/arxiv-claude-skills funny-or-persuasive-but
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill implements the evaluation-driven decomposition strategy from Labroo et al. (EACL 2026), which demonstrated that LLMs systematically fail at simultaneous multi-attribute text generation. When asked to produce text that is both humorous and persuasive (or formal and warm, or polite and assertive), models degrade on at least one dimension -- even though these concepts are linguistically independent. This skill teaches Claude to detect multi-concept requests, decompose them into sequential single-concept passes, and self-evaluate each dimension on a 1-5 intensity scale to ensure both attributes hit their targets.
The compositionality trap. Labroo et al. tested concept pairs -- humor vs. persuasiveness, formality vs. politeness, sentiment polarity -- across Llama, Gemma, and Qwen models on tasks like review generation, email composition, and argument construction. They measured output quality using fine-tuned classifiers and human annotators on a 1-5 Likert scale. The core finding: naive dual-concept prompts (e.g., "Write a humorous AND persuasive review") consistently score lower on one or both dimensions compared to single-concept prompts. The model cannot serve two masters in a single generation pass.
Sequential decomposition as mitigation. The paper identifies that explicit decomposition -- handling each concept in a distinct phase rather than a simultaneous instruction -- yields measurably better results. Rather than asking for "funny and persuasive" in one shot, you first generate with one concept dominant, then revise to layer in the second. This exploits the model's strong single-concept control while sidestepping the compositional failure mode.
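The two-phase idea can be sketched as a pair of prompt builders: one that asks for a single concept, and one that layers the second concept onto an existing draft. This is a minimal illustration, not the paper's implementation; the function names `primary_prompt` and `revision_prompt` are hypothetical, and the phase-1 wording is an assumption. The revision wording follows the instruction template given in the workflow.

```python
def primary_prompt(task: str, concept: str, intensity: int) -> str:
    """Phase 1: request one concept only; the second concept is never named."""
    return (
        f"{task}\n"
        f"Focus exclusively on {concept} at intensity {intensity}/5."
    )


def revision_prompt(draft: str, primary: str, secondary: str, intensity: int) -> str:
    """Phase 2: add the second concept to the draft without rewriting it."""
    return (
        f"Revise the following text to add {secondary} at intensity "
        f"{intensity}/5 while preserving the existing {primary}.\n\n"
        f"{draft}"
    )


if __name__ == "__main__":
    first = primary_prompt("Write a review for a standing desk.", "humor", 4)
    print(first)
    print(revision_prompt("<draft from phase 1>", "humor", "persuasiveness", 4))
```

Keeping the two prompts separate means the first generation pass never sees the second concept, which is the point of the decomposition.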
Self-evaluation closes the loop. Each concept is scored on a 1-5 intensity scale after generation. If either dimension falls below the target, a targeted revision pass adjusts only the deficient attribute. This mirrors the paper's evaluation pipeline (automated classifier + human annotation), adapted to be practical within a generation workflow.
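The target check itself is small enough to show directly. The sketch below assumes scores and targets are plain dictionaries keyed by concept name; `deficient_concepts` is a hypothetical helper, and treating a missing score as below target is an assumption made here for safety.

```python
def deficient_concepts(scores: dict, targets: dict) -> list:
    """Return the concepts whose score is below target, in target order.

    A concept with no score yet is treated as below target (assumption).
    """
    return [c for c, target in targets.items() if scores.get(c, 0) < target]


if __name__ == "__main__":
    targets = {"humor": 4, "persuasiveness": 4}
    # After a humor-only draft, only persuasiveness needs a revision pass.
    print(deficient_concepts({"humor": 4, "persuasiveness": 1}, targets))
    # prints ['persuasiveness']
```

An empty list means both targets are met and no further revision is needed.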
1. Identify the concept pair. Parse the user's request to extract exactly which stylistic attributes are being combined. Map informal language to canonical concepts: humor, persuasiveness, formality, politeness, sentiment, warmth, assertiveness, authority.
2. Assign target intensities. For each concept, determine the desired intensity on a 1-5 scale. If the user says "very funny, slightly persuasive," map to humor=5, persuasiveness=2. If no intensity is specified, default both to 3 (moderate).
3. Choose a primary concept. Select whichever concept is harder to retrofit after the fact. Humor and creativity are harder to add to existing text than persuasiveness or formality. Prioritize the more generative/creative attribute as primary.
4. Generate the primary-concept draft. Write the text focusing exclusively on the primary concept at its target intensity. Do not mention or attempt the secondary concept. This ensures the harder attribute is strongly established.
5. Score the primary concept. Rate the draft on the 1-5 scale for the primary concept. If it falls below target, revise before proceeding. Do not move to step 6 with a weak primary attribute.
6. Layer in the secondary concept via targeted revision. Revise the draft to introduce the secondary concept at its target intensity. The revision instruction must be explicit: "Revise the following text to add [concept] at intensity [N]/5 while preserving the existing [primary concept]." Constrain edits to additive phrasing changes, not structural rewrites.
7. Score both concepts on the revision. Rate the revised text on both dimensions independently. Check that neither has dropped below target. If the primary concept degraded during revision, the edit was too aggressive.
8. Iterative correction if needed. If one concept is below target, perform a focused micro-edit on only the deficient dimension. Limit to 2 correction passes maximum -- if it still fails, the concept pair may have genuine tension at those intensities (see Limitations).
9. Verify naturalness. Read the final text for coherence. Multi-concept text that sounds forced or awkward is worse than slightly missing one target. Flag any jarring juxtapositions to the user.
10. Present the output with a concept scorecard. Show the final text alongside a brief 1-5 rating for each attribute so the user can see where each dimension landed and request adjustments.
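The workflow above can be sketched as a small driver loop. This is an illustration under stated assumptions, not a reference implementation: `generate` and `score` are hypothetical callables standing in for a model call and a 1-5 rating step, the `RETROFIT_DIFFICULTY` numbers are illustrative, and revising a weak primary draft only once is a choice made here. The naturalness check is a human read and is left out.

```python
from typing import Callable, Optional

# Hypothetical ranking: higher means harder to retrofit onto existing text.
# The workflow only says humor and creativity are harder to add than
# persuasiveness or formality; the numbers are illustrative.
RETROFIT_DIFFICULTY = {"humor": 3, "warmth": 2, "assertiveness": 2}

# Intensity words taken from the "very funny, slightly persuasive" example.
INTENSITY_WORDS = {"slightly": 2, "very": 5}


def target_intensity(modifier: Optional[str]) -> int:
    """Step 2: map an intensity word to 1-5, defaulting to 3 (moderate)."""
    return INTENSITY_WORDS.get(modifier or "", 3)


def choose_primary(targets: dict) -> tuple:
    """Step 3: return (primary, secondary); expects exactly two concepts."""
    a, b = targets
    if RETROFIT_DIFFICULTY.get(b, 1) > RETROFIT_DIFFICULTY.get(a, 1):
        a, b = b, a
    return a, b


def run_workflow(
    task: str,
    targets: dict,
    generate: Callable[[str], str],    # hypothetical: prompt -> text
    score: Callable[[str, str], int],  # hypothetical: (text, concept) -> 1..5
    max_corrections: int = 2,
) -> tuple:
    primary, secondary = choose_primary(targets)

    # Steps 4-5: primary-only draft, strengthened once if it scores low.
    draft = generate(
        f"{task}\nFocus exclusively on {primary} at intensity {targets[primary]}/5."
    )
    if score(draft, primary) < targets[primary]:
        draft = generate(
            f"Revise the following text to strengthen {primary} to intensity "
            f"{targets[primary]}/5.\n\n{draft}"
        )

    # Step 6: layer in the secondary concept with the explicit instruction.
    text = generate(
        f"Revise the following text to add {secondary} at intensity "
        f"{targets[secondary]}/5 while preserving the existing {primary}.\n\n{draft}"
    )

    # Steps 7-8: score both concepts, then at most `max_corrections` micro-edits.
    scores = {c: score(text, c) for c in targets}
    for _ in range(max_corrections):
        low = [c for c in targets if scores[c] < targets[c]]
        if not low:
            break
        text = generate(
            f"Adjust only {low[0]} to intensity {targets[low[0]]}/5; "
            f"change nothing else.\n\n{text}"
        )
        scores = {c: score(text, c) for c in targets}

    # Step 10: the caller presents the text together with the scores.
    return text, scores
```

Passing `generate` and `score` in as parameters keeps the loop independent of any particular model or rating method, so the same driver works whether the scores come from self-evaluation or an external classifier.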
Example 1: Humorous and persuasive product review
User: Write a funny but convincing review for a standing desk.
Steps 1-2 - Identify concepts and targets: humor (target: 4/5), persuasiveness (target: 4/5)
Step 3 - Primary concept: humor (harder to retrofit)
Primary draft (humor-focused):
"I bought this standing desk because my doctor said sitting is the new
smoking, and I figured if I'm going to die, I'd rather do it on my feet
like a man. Three months in, my legs have calves now. Actual calves.
I didn't know that was possible after 30."
Score: humor=4/5, persuasiveness=1/5. Primary concept is strong.
Revision pass (adding persuasiveness):
"I bought this standing desk because my doctor said sitting is the new
smoking, and I figured if I'm going to die, I'd rather do it on my feet
like a man. Three months in, my legs have calves now. Actual calves.
I didn't know that was possible after 30. But seriously -- the
pneumatic lift is dead silent, the surface fits two monitors and a
coffee graveyard, and the wobble factor is zero. At $349, it costs less
than the back surgery I was headed toward. My productivity is up because
I pace while thinking, which apparently is what smart people do."
Final score: humor=4/5, persuasiveness=4/5.
Both targets met. Text reads naturally.
Example 2: Formal yet warm rejection email
User: Write a formal but warm email declining a job candidate.
Steps 1-2 - Identify concepts and targets: formality (target: 4/5), warmth (target: 4/5)
Step 3 - Primary concept: warmth (harder to add to formal text)
Primary draft (warmth-focused):
"Hi Jordan, I genuinely enjoyed our conversation last Thursday --
your take on distributed systems was one of the most thoughtful I've
heard this quarter. I want to be straightforward with you: we've
decided to move forward with another candidate whose background
more closely matches the immediate needs of this role. That said,
your skills are strong, and I'd be disappointed if we didn't cross
paths again. I'll reach out personally if something opens up that
fits your profile."
Score: warmth=4/5, formality=2/5. Needs formality lift.
Revision pass (adding formality):
"Dear Jordan, Thank you for taking the time to speak with us last
Thursday. Your perspective on distributed systems architecture was
genuinely one of the most thoughtful I have encountered this quarter.
After careful consideration, we have decided to move forward with
a candidate whose background more closely aligns with the immediate
requirements of the role. I want to be straightforward: this was a
difficult decision. Your technical depth and communication skills
left a strong impression on the team. I would welcome the
opportunity to reconnect should a position that suits your profile
become available, and I will reach out personally if it does."
Final score: warmth=4/5, formality=4/5.
Warmth preserved through specific compliments and personal commitment.
Formality achieved through structure, salutation, and register.
Example 3: Polite but assertive Slack message
User: Help me write a polite but firm message to a coworker who
keeps missing deadlines. Make it assertive enough to actually work.
Steps 1-2 - Identify concepts and targets: politeness (target: 3/5), assertiveness (target: 5/5)
Step 3 - Primary concept: assertiveness (user emphasized "actually work")
Primary draft (assertiveness-focused):
"The Q3 reports were due Friday. This is the third time this quarter
the deadline has been missed. I need the completed reports by end of
day Wednesday. If there's a recurring blocker, we need to address it
now -- not after the next missed deadline."
Score: assertiveness=5/5, politeness=1/5. Too blunt.
Revision pass (adding politeness):
"Hey Sam -- I noticed the Q3 reports didn't land by Friday's deadline,
and I want to flag this because it's the third time this quarter.
I know things get hectic, but I need the completed reports by end of
day Wednesday so the team can stay on track. If something is
consistently blocking you, I'd rather we figure that out together now
than keep running into the same wall. Can we grab 15 minutes tomorrow
to sort it out?"
Final score: politeness=3/5, assertiveness=4/5.
The assertiveness dipped slightly from 5 to 4 -- acceptable trade-off.
The deadline is still concrete, the pattern is named, but the tone
invites collaboration rather than issuing an ultimatum.
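The final-score lines in the examples above could be produced by a small scorecard formatter. The sketch below is one possible layout, not a prescribed format; `scorecard` is a hypothetical helper that also flags any concept that landed below its target, as happened with assertiveness in Example 3.

```python
def scorecard(scores: dict, targets: dict) -> str:
    """Render one line per concept, marking whether the target was met."""
    lines = []
    for concept, target in targets.items():
        got = scores[concept]
        status = "met" if got >= target else f"below target by {target - got}"
        lines.append(f"{concept}: {got}/5 (target {target}/5, {status})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Numbers from Example 3.
    print(scorecard(
        {"politeness": 3, "assertiveness": 4},
        {"politeness": 3, "assertiveness": 5},
    ))
    # politeness: 3/5 (target 3/5, met)
    # assertiveness: 4/5 (target 5/5, below target by 1)
```

Showing the shortfall explicitly lets the user decide whether a one-point dip is an acceptable trade-off or worth another pass.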
Primary concept degrades during secondary pass. This is the most common failure. Roll back to the primary draft and attempt a lighter touch -- add the secondary concept at a lower intensity (e.g., 2/5 instead of 4/5), then incrementally increase.
Both concepts score low after revision. The text has likely been over-edited into incoherence. Discard the revision and restart from step 4 with a fresh primary draft. Do not attempt to patch a broken revision.
User requests contradictory concepts at high intensity. Some pairs genuinely conflict at extreme levels (e.g., humor=5 + formality=5, or assertiveness=5 + politeness=5). Acknowledge the tension honestly, propose a feasible intensity split (e.g., humor=4, formality=3), and let the user decide which attribute to prioritize.
The text sounds mechanical or forced. Multi-concept text that reads like a checklist is worse than missing one target by a point. Prioritize naturalness. If blending feels artificial, lower one attribute by one level and check if the text flows better.
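The "lighter touch" recovery for a degraded primary concept can be sketched as an incremental ladder: start the secondary concept at a low intensity and raise it one level at a time, keeping the last version in which the primary concept still met its target. As before, `generate` and `score` are hypothetical callables, and the starting level of 2 follows the "2/5 instead of 4/5" example; treat this as an assumption-laden sketch.

```python
from typing import Callable


def layer_incrementally(
    primary_draft: str,
    primary: str,
    secondary: str,
    targets: dict,
    generate: Callable[[str], str],    # hypothetical: prompt -> text
    score: Callable[[str, str], int],  # hypothetical: (text, concept) -> 1..5
    start: int = 2,
) -> str:
    """Raise the secondary concept level by level from the primary draft.

    Stops and returns the last good version as soon as the primary
    concept drops below its target.
    """
    best = primary_draft
    goal = targets[secondary]
    for level in range(min(start, goal), goal + 1):
        candidate = generate(
            f"Revise the following text to add {secondary} at intensity "
            f"{level}/5 while preserving the existing {primary}.\n\n{best}"
        )
        if score(candidate, primary) < targets[primary]:
            break  # this edit was too aggressive; keep the previous version
        best = candidate
    return best
```

If the ladder stops early, the returned text reflects the highest secondary intensity the primary concept could tolerate, which is a natural point to report the tension to the user.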
Labroo, A., Sheth, I., Raina, V., Ahmed, A., & Fritz, M. (2026). Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs. EACL 2026. arXiv:2601.18483. Key takeaway: naive multi-attribute prompts systematically degrade output; sequential single-concept generation with self-evaluation scoring preserves both attributes.