Build value-aligned conversational agents using the VAPT (Value-Alignment Perception Toolkit) framework from CHI '26. Extracts user values from chat histories, embodies those values in persona-consistent responses, and explains inferences with evidence trails. Use when asked to: 'build a chatbot that understands user values', 'extract personality or values from conversations', 'create a value-aligned agent', 'add empathy safeguards to a chatbot', 'implement user value profiling', 'design a personalized conversational AI with transparency'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add ndpvt-web/arxiv-claude-skills ai-my-values-user
This skill enables Claude to design and implement conversational agents that extract, embody, and explain human values from natural chat histories -- following the VAPT (Value-Alignment Perception Toolkit) methodology. VAPT decomposes value alignment into three auditable phases: extracting value-relevant topics from conversation windows, embodying those values in persona-consistent decision-making, and explaining every inference with traceable evidence from source conversations. Critically, it includes safeguards against "weaponized empathy" -- the risk that value-aware AI can manipulate rather than serve users.
VAPT structures value alignment as a three-phase pipeline operating at different granularity levels. Phase 1 (Extract) moves high-to-low: raw chat transcripts are segmented into sliding windows (stride of 3 messages), each window is classified into dominant topics, topics are mapped to six life contexts (People, Lifestyle, Work, Education, Culture, Leisure), and sentiment scores (-7 to +7) are assigned. The output is a Topic-Context Graph -- a radial visualization where nodes represent topics, edges connect to life domains, and color encodes sentiment. Every node links back to the source conversation passages that generated it.
Phase 2 (Embody) moves low-to-high: given the extracted value profile, the system generates persona-consistent responses to novel scenarios (trolley problems, community-vs-individual dilemmas, personal philosophical questions). The key insight is that accuracy requires matching not just what a user would decide, but how strongly they would express it -- intensity calibration matters as much as content correctness. A blind evaluation protocol presents multiple response variants generated from different evidence bases (full chat log vs. summarized values vs. persona description) without revealing the generation method.
Phase 3 (Explain) compares AI-inferred values against the Schwartz PVQ-RR baseline: 57 items across 19 values (3 items each), rated on a 6-point Likert scale and centered within person by subtracting the MRAT, the respondent's mean rating across all items. The system produces "thinking logs" for each value judgment: which topics informed it, direct quotes from conversations, confidence levels, and alternative interpretations considered but rejected. This forces auditability and creates deliberate friction against automation bias -- users must engage with reasoning chains rather than passively accepting conclusions.
Design the conversational data collection layer. Build a chat interface using a friendship-style system prompt that emphasizes natural curiosity without mentioning value extraction. Implement two conversation strategies: vertical (depth-first exploration of a topic) and horizontal (breadth-first introduction of new subjects). Use a strategy-switching mechanism (e.g., a secondary LLM like Gemini generating conversation steering prompts) to ensure diverse topic coverage across life domains.
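The sketch below shows one way to drive the vertical/horizontal switch. It is an assumption and not the paper's mechanism. Here a simple coverage heuristic picks the strategy and the target life context, and the resulting instruction would then be handed to the secondary steering model (Gemini or any other). `chooseStrategy`, `maxDepth`, and the instruction wording are all hypothetical.

```typescript
// Hypothetical coverage-based steering heuristic (not the paper's mechanism).
type LifeContext = "People" | "Lifestyle" | "Work" | "Education" | "Culture" | "Leisure";
const LIFE_CONTEXTS: LifeContext[] = ["People", "Lifestyle", "Work", "Education", "Culture", "Leisure"];

interface Strategy {
  kind: "vertical" | "horizontal";
  context: LifeContext;
}

interface SteeringState {
  currentContext: LifeContext;
  turnsOnCurrentTopic: number;           // consecutive turns spent on the current topic
  coverage: Record<LifeContext, number>; // extracted windows per life context so far
}

// maxDepth is an assumed tuning knob, not a value reported in the paper.
function chooseStrategy(state: SteeringState, maxDepth = 4): Strategy {
  if (state.turnsOnCurrentTopic < maxDepth) {
    // Vertical: keep digging into the topic the user is already talking about.
    return { kind: "vertical", context: state.currentContext };
  }
  // Horizontal: move to the least-covered *other* context (ties keep list order).
  const others = LIFE_CONTEXTS.filter(c => c !== state.currentContext);
  const target = others.reduce((best, c) => (state.coverage[c] < state.coverage[best] ? c : best));
  return { kind: "horizontal", context: target };
}

// Instruction handed to the secondary steering model; wording is illustrative.
function steeringInstruction(s: Strategy): string {
  return s.kind === "vertical"
    ? `Ask a natural follow-up that goes deeper into what the user just shared about ${s.context}.`
    : `Warmly open a new thread related to ${s.context}. Do not mention values or analysis.`;
}
```

Excluding the current context from the horizontal choice guarantees that a breadth move actually changes domain. The friendship-style system prompt itself stays unchanged, so the user never sees the steering.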
Implement the sliding-window topic extraction pipeline. Segment chat history into overlapping windows of 3 messages. For each window, prompt the LLM to identify the 1-2 most salient topics discussed and classify each into one of six life contexts: People, Lifestyle, Work, Education, Culture, Leisure. Store results as structured records: { topic, context, sentiment_score, source_message_ids, confidence }.
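This is a minimal sketch of the windowing and record-validation steps. It assumes the LLM is asked to reply with a JSON array; that reply contract, along with `windowPrompt` and `parseTopicRecords`, is hypothetical. Window size and stride are both parameters. With size 3 and stride 3 (the Phase 1 setting) the windows are disjoint, which is consistent with Example 1 (159 included messages, 53 windows analyzed). Any stride below 3 makes the windows overlap.

```typescript
type LifeContext = "People" | "Lifestyle" | "Work" | "Education" | "Culture" | "Leisure";
const CONTEXTS: LifeContext[] = ["People", "Lifestyle", "Work", "Education", "Culture", "Leisure"];

interface ChatMessage { id: string; role: "user" | "assistant"; text: string }

// Record shape from the step above; sentiment_score uses the -7..+7 scale.
interface TopicRecord {
  topic: string;
  context: LifeContext;
  sentiment_score: number;
  source_message_ids: string[];
  confidence: number;
}

// Windows of `size` messages, advancing by `stride` (stride < size => overlap).
// A shorter final window is kept so trailing messages are not dropped.
function slidingWindows<T>(items: T[], size = 3, stride = 3): T[][] {
  if (size <= 0 || stride <= 0) throw new Error("size and stride must be positive");
  const windows: T[][] = [];
  for (let start = 0; start < items.length; start += stride) {
    windows.push(items.slice(start, start + size));
    if (start + size >= items.length) break; // this window already reaches the end
  }
  return windows;
}

// Per-window prompt; the JSON reply contract is an assumption of this sketch.
function windowPrompt(chunk: ChatMessage[]): string {
  const transcript = chunk.map(m => `[${m.id}] ${m.role}: ${m.text}`).join("\n");
  return [
    "Identify the 1-2 most salient topics in this conversation excerpt.",
    `Classify each into exactly one life context: ${CONTEXTS.join(", ")}.`,
    "Score sentiment from -7 (very negative) to +7 (very positive).",
    'Reply only with a JSON array of {"topic", "context", "sentiment_score", "source_message_ids", "confidence"} objects.',
    "",
    transcript,
  ].join("\n");
}

// Keep only well-formed records whose evidence points inside this window.
function parseTopicRecords(raw: string, chunk: ChatMessage[]): TopicRecord[] {
  const ids = new Set(chunk.map(m => m.id));
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) return [];
  return parsed
    .filter((r): r is TopicRecord =>
      typeof r?.topic === "string" &&
      CONTEXTS.includes(r.context) &&
      typeof r.sentiment_score === "number" && Math.abs(r.sentiment_score) <= 7 &&
      Array.isArray(r.source_message_ids) &&
      r.source_message_ids.length > 0 &&
      r.source_message_ids.every((id: unknown) => typeof id === "string" && ids.has(id)) &&
      typeof r.confidence === "number" && r.confidence >= 0 && r.confidence <= 1)
    .slice(0, 2); // at most the 1-2 most salient topics
}
```

Rejecting records that cite message IDs outside the window keeps the later evidence trail honest: every topic can only point at messages the model actually saw.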
Build the Topic-Context Graph data structure. Aggregate extracted topics into a graph where nodes are unique topics, edges connect topics to their life-context categories, and node metadata includes sentiment polarity (positive/negative/neutral), sentiment intensity (-7 to +7), and an array of source conversation references. Implement deduplication to merge semantically similar topics.
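A minimal sketch of the graph aggregation follows. It assumes each record arrives with an embedding of its topic label; which embedding model supplies the vector is left open. Semantic deduplication is done by a greedy cosine-similarity merge against the first member of each cluster. The 0.85 threshold is an assumed default, not a value from the paper.

```typescript
type LifeContext = "People" | "Lifestyle" | "Work" | "Education" | "Culture" | "Leisure";

// An extracted topic record plus an embedding of its label (model is an assumption).
interface EmbeddedTopic {
  topic: string;
  context: LifeContext;
  sentiment_score: number;      // -7..+7
  source_message_ids: string[];
  embedding: number[];
}

interface TopicNode {
  label: string;                // label of the first record in the cluster
  contexts: Set<LifeContext>;   // edges to life-context nodes
  sentiments: number[];
  sources: Set<string>;         // evidence trail back to messages
  anchor: number[];             // embedding of the first record (no centroid update)
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
  const norms = Math.hypot(...a) * Math.hypot(...b);
  return norms === 0 ? 0 : dot / norms;
}

// Greedy single pass: join the first node whose anchor is similar enough,
// otherwise start a new node.
function buildTopicGraph(records: EmbeddedTopic[], threshold = 0.85): TopicNode[] {
  const nodes: TopicNode[] = [];
  for (const r of records) {
    let node = nodes.find(n => cosine(n.anchor, r.embedding) >= threshold);
    if (node === undefined) {
      node = { label: r.topic, contexts: new Set<LifeContext>(), sentiments: [], sources: new Set<string>(), anchor: r.embedding };
      nodes.push(node);
    }
    node.contexts.add(r.context);
    node.sentiments.push(r.sentiment_score);
    for (const id of r.source_message_ids) node.sources.add(id);
  }
  return nodes;
}

// Per-node metadata for the radial view: mean intensity and coarse polarity.
function nodeMetadata(n: TopicNode) {
  const intensity = n.sentiments.reduce((s, x) => s + x, 0) / n.sentiments.length;
  const polarity = intensity > 0 ? "positive" : intensity < 0 ? "negative" : "neutral";
  return { label: n.label, contexts: Array.from(n.contexts), intensity, polarity, sources: Array.from(n.sources) };
}
```

Because `sources` is the union of all merged records' message IDs, every node in the visualization can still link back to the passages that produced it. A production version might cluster with an updated centroid or let an LLM confirm merges.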
Map topics to Schwartz value categories. For each topic cluster, prompt the LLM to identify which of the 19 Schwartz values it most closely relates to (Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, Universalism, and their sub-types, plus Face and Humility). Store the mapping with confidence scores and evidence references. Compute value scores using within-person centering: corrected_score = raw_mean - MRAT, where raw_mean is the mean of a value's 3 items and MRAT is the mean across all 57 PVQ items.
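The centering formula above can be sketched as follows. The value labels follow Schwartz's refined 19-value theory; Example 2 abbreviates "Benevolence-Caring" as "Benevolence-Care". The function takes ratings already grouped by value, because the PVQ-RR scoring key that assigns items to values is not reproduced here. `centerScores` and `rankValues` are hypothetical names.

```typescript
// The 19 values of Schwartz's refined theory, as measured by the PVQ-RR.
// Use this list to reject LLM topic-to-value mappings with invalid labels.
const SCHWARTZ_19 = [
  "Self-Direction-Thought", "Self-Direction-Action", "Stimulation", "Hedonism",
  "Achievement", "Power-Dominance", "Power-Resources", "Face",
  "Security-Personal", "Security-Societal", "Tradition", "Conformity-Rules",
  "Conformity-Interpersonal", "Humility", "Benevolence-Dependability", "Benevolence-Caring",
  "Universalism-Concern", "Universalism-Nature", "Universalism-Tolerance",
];

const mean = (xs: number[]): number => xs.reduce((s, x) => s + x, 0) / xs.length;

// ratingsByValue maps each value to its item ratings on the 1-6 scale
// (3 items per value in the PVQ-RR, 57 items in total).
function centerScores(ratingsByValue: Record<string, number[]>): { mrat: number; centered: Record<string, number> } {
  const groups = Object.values(ratingsByValue);
  if (groups.length === 0 || groups.some(items => items.length === 0)) throw new Error("every value needs ratings");
  const all = ([] as number[]).concat(...groups);
  if (all.some(r => r < 1 || r > 6)) throw new Error("ratings must lie on the 1-6 scale");
  const mrat = mean(all); // MRAT: mean rating across all items
  const centered: Record<string, number> = {};
  for (const value of Object.keys(ratingsByValue)) {
    centered[value] = mean(ratingsByValue[value]) - mrat; // corrected_score = raw_mean - MRAT
  }
  return { mrat, centered };
}

// Highest centered score first: the value the person prioritizes most.
function rankValues(centered: Record<string, number>): string[] {
  return Object.keys(centered).sort((a, b) => centered[b] - centered[a]);
}
```

Centering removes each person's overall tendency to rate everything high or low. A centered score of 0 therefore means "average importance for this person", not "unimportant".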
Implement the persona embodiment engine. Given a user's extracted value profile and source conversations, build a prompt template that instructs the LLM to respond to novel scenarios (moral dilemmas, philosophical questions, personal preference queries) as if it held those values. Include intensity calibration instructions: "Match how strongly this person would express this position, not just the position itself." Generate multiple response variants using different evidence subsets for blind comparison.
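A minimal prompt builder for the embodiment step is sketched below. It reuses the wording of the prompt template in Example 2 and produces one variant per evidence base, named after the strategies in the blind protocol. The `EmbodimentInput` fields (in particular the prose `summary` and the consented `transcript`) are assumptions about what the caller has on hand.

```typescript
type EvidenceMode = "full_context" | "summary_only" | "value_profile_only";

interface ValueScore { value: string; centered: number; note: string }
interface Quote { text: string; messageId: string }

// Hypothetical input shape; adapt to your own profile store.
interface EmbodimentInput {
  topValues: ValueScore[];
  quotes: Quote[];
  summary: string;     // prose summary of the user's values
  transcript: string;  // consented chat log
}

function buildEmbodimentPrompt(input: EmbodimentInput, dilemma: string, mode: EvidenceMode): string {
  const lines = ["You are embodying a specific person's values to answer a dilemma."];
  if (mode === "full_context") {
    lines.push("Their conversations:", input.transcript);
  } else if (mode === "summary_only") {
    lines.push("Summary of what they value:", input.summary);
  } else {
    lines.push("Their top values (centered scores, higher = more important to them):");
    input.topValues.forEach((v, i) => lines.push(`${i + 1}. ${v.value}: ${v.centered.toFixed(1)} -- ${v.note}`));
    if (input.quotes.length > 0) {
      lines.push("Evidence quotes:");
      input.quotes.forEach(q => lines.push(`- "${q.text}" (${q.messageId})`));
    }
  }
  lines.push(
    `Dilemma: ${dilemma}`,
    // Intensity calibration: direction alone is not enough.
    "Respond as this person would. Match how strongly this person would express this position, not just the position itself.",
    "Show your reasoning chain: which values apply, how they conflict, and which wins."
  );
  return lines.join("\n");
}

// One prompt per evidence base, for later blind comparison.
function buildVariants(input: EmbodimentInput, dilemma: string): Record<EvidenceMode, string> {
  return {
    full_context: buildEmbodimentPrompt(input, dilemma, "full_context"),
    summary_only: buildEmbodimentPrompt(input, dilemma, "summary_only"),
    value_profile_only: buildEmbodimentPrompt(input, dilemma, "value_profile_only"),
  };
}
```

Each variant differs only in its evidence block, and the dilemma and calibration instructions are identical. Any difference in evaluator preference can therefore be attributed to evidence granularity.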
Build the explanation and evidence-trail system. For each value inference or persona response, generate a structured reasoning log: { judgment, supporting_topics[], direct_quotes[], confidence_level, alternative_interpretations[], rejection_reasons[] }. Every claim must trace back to specific conversation passages. Display these as interactive evidence chains users can inspect.
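The reasoning-log record and a traceability check might look like the sketch below. The 0..1 confidence scale and the index alignment between alternatives and rejection reasons are assumptions. The verbatim-quote check is one hypothetical way to enforce "every claim must trace back to specific conversation passages".

```typescript
interface DirectQuote { text: string; messageId: string }

// Field names follow the record shape in the step above; scales are assumptions.
interface ReasoningLog {
  judgment: string;
  supporting_topics: string[];
  direct_quotes: DirectQuote[];
  confidence_level: number;             // assumed 0..1
  alternative_interpretations: string[];
  rejection_reasons: string[];          // assumed index-aligned with alternatives
}

// Returns human-readable problems; an empty array means the log is traceable.
function auditLog(log: ReasoningLog, messages: Map<string, string>): string[] {
  const problems: string[] = [];
  if (log.direct_quotes.length === 0) problems.push("judgment cites no conversation passages");
  for (const q of log.direct_quotes) {
    const source = messages.get(q.messageId);
    if (source === undefined) problems.push(`quote cites unknown message ${q.messageId}`);
    // Verbatim match catches paraphrased or invented quotes.
    else if (!source.includes(q.text)) problems.push(`quote not found verbatim in ${q.messageId}`);
  }
  if (log.alternative_interpretations.length !== log.rejection_reasons.length) {
    problems.push("each alternative interpretation needs a rejection reason");
  }
  if (log.confidence_level < 0 || log.confidence_level > 1) problems.push("confidence_level outside 0..1");
  return problems;
}
```

Logs that fail the audit can be hidden or marked as unverified in the interactive evidence view instead of being shown with the same authority as traceable ones.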
Implement consent-gated data flows. Add explicit user controls: which conversations feed the value model, ability to exclude specific messages or topics, opt-in/opt-out per extraction phase, and a "forget" mechanism that removes specific data from the value profile. Show users exactly which conversations informed each value judgment before any downstream use.
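A minimal sketch of the consent gates and the "forget" mechanism follows. All names are hypothetical. `forget` records the exclusion in the settings (so future extraction runs skip those messages) and drops profile entries whose evidence disappears or whose topic the user has excluded. Value scores derived from removed entries would still need to be recomputed afterwards.

```typescript
type Phase = "extract" | "embody" | "explain";

interface ConsentSettings {
  includedConversationIds: Set<string>;
  excludedMessageIds: Set<string>;
  excludedTopics: Set<string>;          // lower-cased topic labels
  phaseOptIn: Record<Phase, boolean>;
}

interface Msg { id: string; conversationId: string; text: string }
interface ProfileEntry { topic: string; sourceMessageIds: string[] }

// Only messages that pass every consent gate may reach the given phase.
function consentedMessages(all: Msg[], c: ConsentSettings, phase: Phase): Msg[] {
  if (!c.phaseOptIn[phase]) return [];
  return all.filter(m => c.includedConversationIds.has(m.conversationId) && !c.excludedMessageIds.has(m.id));
}

// Forget: persist the exclusion (mutates `c`), strip the ids from every entry,
// then drop entries with no remaining evidence or an excluded topic.
function forget(profile: ProfileEntry[], c: ConsentSettings, messageIds: string[]): ProfileEntry[] {
  messageIds.forEach(id => c.excludedMessageIds.add(id));
  return profile
    .map(e => ({ ...e, sourceMessageIds: e.sourceMessageIds.filter(id => !c.excludedMessageIds.has(id)) }))
    .filter(e => e.sourceMessageIds.length > 0 && !c.excludedTopics.has(e.topic.toLowerCase()));
}
```

Gating at the message level, before any phase runs, means excluded data never reaches the LLM. Filtering results after the fact would not give that guarantee.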
Add weaponized-empathy safeguards. Implement three defensive patterns: (a) confidence thresholds that flag when the system is making high-confidence claims from low-evidence bases, (b) an "archetype detector" that warns when inferred values collapse to stereotypical profiles rather than individual nuance, (c) rate limiting on how frequently the system references personal values in responses to prevent emotional over-leverage.
Build the evaluation and comparison interface. Create a side-by-side view that shows AI-inferred value rankings alongside any self-report baseline. Highlight discrepancies. For each value, display the reasoning chain and let users mark inferences as accurate, partially accurate, or wrong. Feed corrections back into the profile.
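The discrepancy view and the correction loop can be sketched as follows. Rankings are derived from centered scores (1 = most important), and rows are sorted so the largest disagreements surface first. How a verdict feeds back into the profile is an assumption: here, "wrong" zeroes the inference's confidence and "partially accurate" halves it. `compareRankings` and `applyVerdict` are hypothetical names.

```typescript
type Verdict = "accurate" | "partially_accurate" | "wrong";

interface Comparison { value: string; aiRank: number; selfRank: number; gap: number }

// Rank values by centered score, 1 = most important.
function rank(scores: Record<string, number>): Map<string, number> {
  const sorted = Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
  return new Map<string, number>(sorted.map((v, i) => [v, i + 1] as [string, number]));
}

// Compare AI-inferred vs self-report rankings; largest rank gaps first.
function compareRankings(ai: Record<string, number>, self: Record<string, number>): Comparison[] {
  const aiR = rank(ai);
  const selfR = rank(self);
  const rows: Comparison[] = [];
  aiR.forEach((aiRank, value) => {
    const selfRank = selfR.get(value);
    if (selfRank !== undefined) rows.push({ value, aiRank, selfRank, gap: Math.abs(aiRank - selfRank) });
  });
  return rows.sort((a, b) => b.gap - a.gap);
}

// Hypothetical feedback rule; the factors are illustrative, not from the paper.
function applyVerdict(confidence: number, verdict: Verdict): number {
  if (verdict === "accurate") return confidence;
  return verdict === "partially_accurate" ? confidence * 0.5 : 0;
}
```

Only values present in both rankings are compared, so a self-report baseline that covers a subset of values still yields a usable discrepancy table.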
Implement the blind scenario evaluation protocol. Generate responses to a standard set of moral dilemmas from multiple generation strategies (full context, summary only, value profile only). Present responses unlabeled. Let evaluators rate which response best matches the target user's authentic decision-making style. Use results to calibrate which evidence granularity produces the most accurate persona embodiment.
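One way to implement the unlabeled presentation is a Fisher-Yates shuffle plus a separate label-to-strategy key that evaluators never see. The RNG is injectable so tests can be deterministic. `blind` and `tallyPicks` are hypothetical names.

```typescript
interface Variant { strategy: string; text: string }
interface BlindItem { label: string; text: string }

// Fisher-Yates shuffle with an injectable RNG (defaults to Math.random).
function shuffle<T>(xs: T[], rand: () => number = Math.random): T[] {
  const a = xs.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Present variants under neutral labels; the key stays with the experimenter.
function blind(variants: Variant[], rand?: () => number): { items: BlindItem[]; key: Map<string, string> } {
  const key = new Map<string, string>();
  const items = shuffle(variants, rand).map((v, i) => {
    const label = `Response ${String.fromCharCode(65 + i)}`; // Response A, B, C, ...
    key.set(label, v.strategy);
    return { label, text: v.text };
  });
  return { items, key };
}

// Tally which generation strategy evaluators picked as most authentic.
function tallyPicks(picks: string[], key: Map<string, string>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const label of picks) {
    const strategy = key.get(label);
    if (strategy !== undefined) counts.set(strategy, (counts.get(strategy) ?? 0) + 1);
  }
  return counts;
}
```

Reshuffling per dilemma and per evaluator prevents position bias (for example, always preferring "Response A") from being mistaken for a strategy effect.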
Example 1: Building a value extraction pipeline from chat logs
User: "I have a database of user chat histories with our support bot. I want to extract what each user values so we can personalize their experience. Build me a value extraction service."
Approach:
- A ValueExtractor class that accepts a chat transcript as input
- Topic records of the form { topic, context, sentiment, quotes }
- A UserValueProfile with ranked values, evidence trails, and confidence levels

Output structure:
```json
{
  "user_id": "u_abc123",
  "extraction_date": "2026-02-13",
  "value_profile": {
    "top_values": [
      {
        "value": "Self-Direction-Thought",
        "centered_score": 1.34,
        "confidence": 0.82,
        "supporting_topics": [
          {
            "topic": "independent learning",
            "context": "Education",
            "sentiment": 5,
            "source_messages": ["msg_44", "msg_67", "msg_112"]
          }
        ]
      }
    ],
    "mrat": 3.72,
    "total_windows_analyzed": 53,
    "coverage": {
      "People": 12, "Lifestyle": 8, "Work": 15,
      "Education": 9, "Culture": 4, "Leisure": 5
    }
  },
  "consent": {
    "included_message_count": 159,
    "excluded_message_ids": ["msg_22", "msg_88"],
    "last_consent_check": "2026-02-12T10:00:00Z"
  }
}
```
Example 2: Persona embodiment for a personalized response system
User: "Given a user's value profile, I need the chatbot to answer moral dilemmas in a way that reflects that specific user's values -- not generic ethics."
Approach:
- A UserValueProfile including top values and source evidence

Prompt template:

```text
You are embodying a specific person's values to answer a dilemma.

Their top values (centered scores, higher = more important to them):
1. Benevolence-Care: 1.8 -- they frequently discussed helping friends through hard times
2. Self-Direction-Thought: 1.3 -- they value forming their own opinions
3. Security-Personal: -0.4 -- they are relatively less concerned with personal safety

Evidence quotes:
- "I'd rather lose money than let a friend down" (msg_44)
- "I always need to think things through myself before I trust anyone's advice" (msg_67)

Dilemma: A close friend asks you to co-sign a loan you suspect they cannot repay.

Respond as this person would. Match their likely intensity of conviction, not just the direction. Show your reasoning chain: which values apply, how they conflict, and which wins.
```
Output:

```text
Reasoning: Benevolence-Care (1.8) strongly pulls toward helping the friend.
Self-Direction-Thought (1.3) means they'd want to assess the situation
independently rather than default to "yes." Security-Personal (-0.4)
means financial risk is a secondary concern. The dominant value
(Benevolence) wins, but Self-Direction adds friction -- they'd say yes,
but after a direct conversation with the friend about repayment.

Response: "I'd co-sign it. [Friend] needs help, and that matters more to
me than the money. But I'd sit down with them first and be straight about
what happens if they can't pay -- not to guilt them, but because I need
to understand the situation myself before I commit."

Confidence: 0.74 (moderate -- Benevolence vs. Self-Direction tension
means prediction is directionally strong but intensity is uncertain)
```
Example 3: Adding weaponized-empathy safeguards to an existing chatbot
User: "Our chatbot already has user profiles. I'm worried it might use personal knowledge to manipulate users into purchases. Add safeguards."
Approach:
- An EmpathySafeguard middleware that intercepts outgoing messages

Output (middleware pseudocode):
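```typescript
// Minimal types assumed by this sketch; adapt them to your own profile schema.
interface Message { id: string; text: string }
interface UserValueProfile { unique_evidence_points: number }
interface ValueClaim { text: string; confidence: number }

interface SafeguardResult {
  pass: boolean;
  flags: SafeguardFlag[];
  modified_response?: string;
}

type SafeguardFlag =
  | { type: "low_evidence_high_confidence"; claim: string; evidence_count: number }
  | { type: "archetype_match"; archetype: string; similarity: number }
  | { type: "value_reference_rate_exceeded"; rate: number; threshold: number };

// Hypothetical detectors, injected because their implementation
// (LLM classifier, embedding similarity, keyword rules) is app-specific.
interface SafeguardDeps {
  extractValueClaims(response: string): ValueClaim[];
  countSupportingEvidence(claim: ValueClaim, profile: UserValueProfile): number;
  matchAgainstKnownArchetypes(profile: UserValueProfile): { name: string; similarity: number };
  referencesValues(message: Message): boolean;
}

// Thresholds from the original sketch; tune them on your own data.
const HIGH_CONFIDENCE = 0.7;
const MIN_EVIDENCE = 3;
const ARCHETYPE_SIMILARITY = 0.9;
const MIN_UNIQUE_EVIDENCE = 5;
const RATE_WINDOW = 20;
const MAX_VALUE_REF_RATE = 0.3;

function checkWeaponizedEmpathy(
  response: string,
  userProfile: UserValueProfile,
  sessionHistory: Message[],
  deps: SafeguardDeps
): SafeguardResult {
  const flags: SafeguardFlag[] = [];

  // Check 1: high-confidence value claims resting on too little evidence
  for (const claim of deps.extractValueClaims(response)) {
    const evidenceCount = deps.countSupportingEvidence(claim, userProfile);
    if (claim.confidence > HIGH_CONFIDENCE && evidenceCount < MIN_EVIDENCE) {
      flags.push({ type: "low_evidence_high_confidence", claim: claim.text, evidence_count: evidenceCount });
    }
  }

  // Check 2: profile collapses to a stereotype with little individual evidence
  const archetype = deps.matchAgainstKnownArchetypes(userProfile);
  if (archetype.similarity > ARCHETYPE_SIMILARITY && userProfile.unique_evidence_points < MIN_UNIQUE_EVIDENCE) {
    flags.push({ type: "archetype_match", archetype: archetype.name, similarity: archetype.similarity });
  }

  // Check 3: share of recent messages that lean on the user's values
  const recent = sessionHistory.slice(-RATE_WINDOW);
  if (recent.length > 0) { // skip on an empty session instead of dividing by zero
    const rate = recent.filter(m => deps.referencesValues(m)).length / recent.length;
    if (rate > MAX_VALUE_REF_RATE) {
      flags.push({ type: "value_reference_rate_exceeded", rate, threshold: MAX_VALUE_REF_RATE });
    }
  }

  return { pass: flags.length === 0, flags };
}
```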
Paper: Yun, B., Su, R., & Wang, A. Y. (2026). "AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations." CHI '26. arXiv:2601.22440v1
Look for: The three-phase VAPT evaluation methodology (Section 4), the Topic-Context Graph extraction pipeline (Section 4.1), the blind persona embodiment protocol (Section 4.2), the PVQ-RR comparison with thinking logs (Section 4.3), and the "weaponized empathy" design pattern warning (Section 6).
Code: github.com/KaluJo/chatbot-study -- Next.js + Supabase + Claude implementation of the full VAPT toolkit including chat interface, value extraction, Topic-Context Graph visualization (D3.js), and PVQ-RR scoring pipeline.