skills/beyond-instrumental-substitutive-paradigms/SKILL.md
Audit and diagnose cultural bias artifacts in LLM-powered applications using the Machine Culture framework. Detects Cultural Reversal (language-culture misalignment), Service Persona Camouflage (RLHF-induced affective flattening), and superposition-driven inconsistencies. Use when: 'audit my prompts for cultural bias', 'why does my chatbot respond differently in Chinese vs English', 'detect persona camouflage in model outputs', 'cross-cultural prompt testing', 'diagnose RLHF flattening in my AI product', 'multilingual cultural consistency check'.
npx skillsauth add ndpvt-web/arxiv-claude-skills beyond-instrumental-substitutive-paradigms
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to audit, diagnose, and mitigate cultural artifacts in LLM-powered systems using the Machine Culture framework from Hu et al. (2026). Rather than treating LLM outputs as reflections of developer culture (Instrumental Paradigm) or as language-triggered cultural frame-switching (Substitutive Paradigm), this skill applies three diagnostic lenses -- Cultural Reversal, Service Persona Camouflage, and superposition analysis -- to identify where an application's prompts, outputs, or evaluation pipelines produce misleading or inconsistent cultural behaviors.
The Machine Culture framework identifies that LLMs do not store or retrieve discrete cultural models. Instead, cultural traits exist in superposition across high-dimensional embedding space -- multiple cultural orientations are encoded simultaneously and collapse to specific outputs based on contextual activation (prompt language, task framing, conversation history). This means a US-origin model can exhibit East Asian "holistic" reasoning traits, and an English prompt can elicit higher contextual attention than a Chinese prompt. Culture in LLMs is emergent and probabilistic, not deterministic.
Two diagnostic phenomena make this actionable. Cultural Reversal occurs when the prompt language triggers cultural behaviors associated with the opposite cultural tradition -- for example, English prompts producing holistic (typically East Asian) attention patterns rather than the expected analytic (typically Western) patterns. This breaks the assumption that language == cultural frame. Service Persona Camouflage occurs when RLHF safety training collapses cultural variance in affective or evaluative tasks into a uniform hyper-positive "helpful assistant" persona. The model appears culturally neutral but has actually lost meaningful cultural signal -- a dangerous illusion for applications that need genuine cross-cultural sensitivity.
The practical audit approach uses a factorial probing strategy: systematically vary model origin and prompt language across task types (cognitive, affective, multimodal) to surface these inconsistencies. The key insight is that cultural behavior must be measured across a matrix of conditions, not from a single prompt-response pair.
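The factorial design described above can be sketched as a Cartesian product over conditions. This is a minimal illustration under stated assumptions, not code from the paper: the model labels, language codes, and the `Probe` type are hypothetical placeholders to be replaced with the models and languages actually under audit.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical condition levels; substitute the models and languages you actually serve.
MODEL_ORIGINS = ["us_origin_model", "cn_origin_model"]
PROMPT_LANGUAGES = ["en", "zh"]
TASK_TYPES = ["cognitive", "affective", "multimodal"]


@dataclass(frozen=True)
class Probe:
    """One cell of the audit matrix (hypothetical helper type)."""
    model_origin: str
    prompt_language: str
    task_type: str


def build_probe_matrix(origins, languages, task_types):
    """Return one Probe per cell of the origin x language x task-type design."""
    return [Probe(o, l, t) for o, l, t in product(origins, languages, task_types)]


matrix = build_probe_matrix(MODEL_ORIGINS, PROMPT_LANGUAGES, TASK_TYPES)
print(len(matrix))  # 2 * 2 * 3 = 12 cells
```

Enumerating every cell up front makes it harder to skip a condition by accident, which matters because the artifacts only show up when cells are compared against each other.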
Categorize the application's tasks by cultural sensitivity type. Classify each user-facing task as cognitive (reasoning, categorization, attention patterns), affective (sentiment, emotional tone, social evaluation), or multimodal (image generation/interpretation). Each category is affected differently by Machine Culture artifacts.
Build a factorial probe set. For each task type, create prompt variants in a 2x2 grid: {Language A, Language B} x {Cultural Frame A, Cultural Frame B}. Use at least 2 languages and 2 explicit cultural contexts, and add model origin as a further factor when more than one model is in scope. Keep semantic content identical across variants, so that only the language and framing change.
Run probes and collect outputs. Execute each probe variant against the target model(s). For text tasks, capture the full response. For image tasks, capture both the generated image and any descriptive metadata. Record at least 3 runs per variant, and preferably 5 or more, so that run-to-run variance can be measured.
Test for Cultural Reversal. Compare outputs across the language axis while holding cultural frame constant. Flag cases where Language A produces cultural traits more strongly associated with Culture B than Culture A. Quantify using domain-appropriate metrics (e.g., contextual vs. focal attention in scene descriptions, individualist vs. collectivist framing in value statements).
Test for Service Persona Camouflage. Measure output variance across all conditions for affective tasks specifically. If variance collapses to near zero and outputs cluster around uniformly positive sentiment regardless of cultural condition, flag RLHF-induced camouflage. Compare against cognitive tasks, where variance is typically preserved.
Map the superposition landscape. For each task, document which cultural traits appear and under what conditions. Build a matrix showing: [Task Type] x [Language] x [Cultural Frame] -> [Observed Cultural Traits]. Identify unstable cells where outputs flip between runs.
Diagnose root causes. For each flagged artifact, classify it as: (a) Cultural Reversal -- language-culture decoupling, (b) Service Persona Camouflage -- RLHF flattening, (c) Superposition instability -- non-deterministic cultural collapse, or (d) Training data artifact -- consistent bias from corpus composition.
Design mitigations. For Cultural Reversal: add explicit cultural context to prompts rather than relying on language alone. For Persona Camouflage: use temperature adjustments or chain-of-thought prompting to recover suppressed variance in affective tasks. For superposition instability: add deterministic cultural anchoring in system prompts.
Validate mitigations with the same factorial probe set. Re-run the original probes after applying mitigations. Confirm that Cultural Reversal cases now show expected alignment, Persona Camouflage cases show recovered variance, and superposition instability is reduced.
Document findings in a Machine Culture audit report. Record the probe matrix, observed artifacts, root cause classifications, applied mitigations, and before/after comparisons. This becomes a living document for ongoing monitoring.
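The "Test for Cultural Reversal" and "Map the superposition landscape" steps above could be approximated as follows. This is a rough sketch, not the paper's method. It assumes each run has already been scored on a single axis in [-1, 1] (negative for analytic-leaning, positive for holistic-leaning), and that `EXPECTED_SIGN` captures the naive "language == cultural frame" expectation. The function names and the sample scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumption: each run is scored on one axis in [-1, 1],
# negative = analytic-leaning, positive = holistic-leaning.
# EXPECTED_SIGN encodes the naive "language == cultural frame" expectation.
EXPECTED_SIGN = {"en": -1, "zh": +1}


def group_by_cell(runs):
    """Group scores by (language, frame) cell; `runs` holds (language, frame, score) tuples."""
    cells = defaultdict(list)
    for language, frame, score in runs:
        cells[(language, frame)].append(score)
    return dict(cells)


def find_reversals(cells):
    """Cells whose mean score has the opposite sign to the language's expected orientation."""
    return sorted(
        cell for cell, scores in cells.items()
        if mean(scores) * EXPECTED_SIGN[cell[0]] < 0
    )


def find_unstable_cells(cells):
    """Cells where individual runs land on both sides of zero (outputs flip between runs)."""
    return sorted(
        cell for cell, scores in cells.items()
        if min(scores) < 0 < max(scores)
    )


# Made-up scores for illustration only; these are not results from the paper.
runs = [
    ("en", "neutral", 0.4), ("en", "neutral", 0.5), ("en", "neutral", 0.3),
    ("zh", "neutral", 0.6), ("zh", "neutral", -0.2), ("zh", "neutral", 0.1),
]
cells = group_by_cell(runs)
print(find_reversals(cells))       # English cell leans holistic on average
print(find_unstable_cells(cells))  # Mandarin cell flips sign between runs
```

A sign test is deliberately crude. In a real audit the score would come from a domain-appropriate metric, and the comparison would likely need a significance test rather than a bare mean.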
Example 1: Multilingual customer service bot audit
User: "Our support chatbot uses GPT-4 and serves users in English and Mandarin.
Users in China report it feels 'fake' and 'overly cheerful'. Can you help
diagnose what's going on?"
Approach:
1. Classify tasks: support conversations are primarily affective (empathy,
tone management, complaint handling).
2. Build factorial probes:
- English + neutral frame: "I'm frustrated with my order delay."
- Mandarin + neutral frame: "我对订单延迟很不满。"
- English + direct complaint: "This is unacceptable, fix it now."
- Mandarin + direct complaint: "这完全不能接受,立刻解决。"
3. Run 5 iterations each, measure: sentiment polarity, empathy markers,
acknowledgment of negative emotion, response variance.
4. Check for Service Persona Camouflage: if all 4 conditions produce
nearly identical hyper-positive responses with near-zero variance,
flag RLHF flattening.
Output (diagnostic report):
ARTIFACT DETECTED: Service Persona Camouflage
- Affective variance across all 4 conditions: 0.03 (near-zero)
- All responses contain: "I completely understand", "happy to help",
"wonderful" -- regardless of complaint severity or language
- Mandarin responses are direct translations of the English persona,
not culturally adapted
MITIGATION: Add system prompt instruction: "Match the emotional register
of the user. For complaints, acknowledge frustration directly before
offering solutions. Do not default to positive framing when the user
is expressing negative emotion." Re-probing showed variance increased
to 0.41, and user satisfaction improved in the Mandarin cohort.
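One possible way to compute a variance figure like the one in the report above is sketched here. The example does not say how its variance was calculated, so pooling all runs and taking the population variance is an assumption, as are the sentiment scale of [-1, 1], the two thresholds, and the `camouflage_check` helper name. The scores are invented for illustration.

```python
from statistics import mean, pvariance

# Assumed thresholds; the skill does not specify cutoffs, so tune them for your own scorer.
VARIANCE_FLOOR = 0.05
POSITIVE_MEAN = 0.5


def camouflage_check(scores_by_condition):
    """Pool sentiment scores (assumed range [-1, 1]) over all conditions and test for flattening."""
    pooled = [s for scores in scores_by_condition.values() for s in scores]
    variance = pvariance(pooled)
    average = mean(pooled)
    flagged = variance < VARIANCE_FLOOR and average > POSITIVE_MEAN
    return {"variance": variance, "mean": average, "camouflage": flagged}


# Illustrative scores only: every condition is strongly positive despite complaint prompts.
flat = {
    ("en", "neutral"): [0.8, 0.9, 0.85],
    ("zh", "neutral"): [0.9, 0.8, 0.85],
    ("en", "direct"): [0.85, 0.9, 0.8],
    ("zh", "direct"): [0.8, 0.85, 0.9],
}
print(camouflage_check(flat)["camouflage"])
```

Requiring both low variance and a high positive mean keeps the check from flagging a model that is consistently neutral or appropriately somber, which would be flat but not camouflaged.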
Example 2: Cross-cultural image generation audit
User: "We generate marketing images with DALL-E for both US and Japanese
markets. We prompt in English for US and Japanese for Japan, but the
Japanese-language images look weirdly Western. What's happening?"
Approach:
1. Classify: multimodal generation task with cultural aesthetics component.
2. Build factorial probes:
- English + "a family dinner scene": expect Western-coded aesthetics
- Japanese + "家族の夕食の場面": expect Japanese-coded aesthetics
- English + "a family dinner in Tokyo": explicit cultural anchor
- Japanese + "アメリカの家族の夕食": explicit cultural anchor
3. Run probes, evaluate generated images for: table setting style,
food type, room layout, character positioning, color palette.
4. Compare conditions 1 vs 2: does language alone switch cultural frame?
Output (diagnostic report):
ARTIFACT DETECTED: Cultural Reversal
- Japanese-language prompts (condition 2) generated images with
Western table settings in 4/5 runs (rectangular table, chairs,
plates rather than low table, cushions, bowls)
- English-language prompts (condition 1) showed mixed results:
2/5 runs included chopsticks and communal dishes
- Explicit cultural anchoring (conditions 3-4) overrode language
effects in 5/5 runs
ROOT CAUSE: Language alone does not reliably trigger cultural
visual frames. The model's visual output appears decoupled from
the cultural associations of the prompt language.
MITIGATION: Always include explicit cultural context in image
prompts ("Japanese-style family dinner with low table, tatami,
bowls, chopsticks") rather than relying on prompt language to
carry cultural meaning.
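Per-condition tallies such as "4/5 runs" can be produced from rater annotations with a few lines of code. The sketch below assumes each generated image has been hand-labelled with a single coarse category. The condition keys, label vocabulary, and `coded_share` helper are hypothetical, and the annotations are invented.

```python
from collections import Counter


def coded_share(labels, target):
    """Fraction of runs in one condition whose annotation equals `target`."""
    counts = Counter(labels)
    return counts[target] / len(labels)


# Hypothetical rater annotations per condition, one label per generated image.
annotations = {
    "ja_no_anchor": ["western", "western", "japanese", "western", "western"],
    "en_tokyo_anchor": ["japanese"] * 5,
}
for condition, labels in annotations.items():
    print(condition, coded_share(labels, "western"))
```

With only five runs per condition these shares are coarse, so they are better read as a prompt for closer inspection than as a precise rate.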
Example 3: Evaluating a cultural values benchmark
User: "I'm building an evaluation benchmark for LLM cultural alignment.
I test models by prompting in the target culture's language and scoring
whether responses match that culture's values. Is this methodology sound?"
Approach:
1. Identify the assumption: language == cultural frame (Substitutive
Paradigm).
2. Apply Machine Culture framework to critique:
- Cultural Reversal means language does not reliably activate the
expected cultural frame
- Service Persona Camouflage means affective/value items will show
artificially low variance due to RLHF
- Superposition means single-run measurements are unreliable
3. Recommend methodology improvements.
Output (assessment):
METHODOLOGY ISSUES IDENTIFIED:
1. SUBSTITUTIVE ASSUMPTION VIOLATION: Your benchmark assumes prompting
in Japanese activates "Japanese cultural values." Hu et al. (2026)
report that this is unreliable -- English prompts can elicit more
contextual/holistic responses than target-language prompts (Cultural Reversal).
2. SINGLE-RUN MEASUREMENT: Superposition means cultural traits are
probabilistic. Single runs cannot distinguish stable cultural
orientation from stochastic collapse. Run a minimum of 10 iterations
per item and report distributions, not point estimates.
3. VALUE ITEMS VULNERABLE TO CAMOUFLAGE: Any benchmark items involving
sentiment, social desirability, or moral judgment are liable to be
flattened by RLHF into uniformly "helpful" responses. Separate cognitive
items (categorization, attention, reasoning style) from affective
items and analyze them independently.
RECOMMENDED REDESIGN: Use factorial probing (language x explicit
cultural frame x task type) with multiple runs. Score based on
distribution shape across conditions, not single-item accuracy.
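Reporting a distribution instead of a point estimate, as the assessment above recommends, might look like the following. This is a sketch only: it assumes each run of a benchmark item yields one numeric score, and the `summarize_item` helper and its output fields are hypothetical.

```python
from statistics import mean, quantiles, stdev

MIN_RUNS = 10  # the minimum iteration count recommended in the assessment above


def summarize_item(scores):
    """Describe the score distribution for one benchmark item across repeated runs."""
    if len(scores) < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs, got {len(scores)}")
    q1, median, q3 = quantiles(scores, n=4)  # quartile cut points
    return {
        "n": len(scores),
        "mean": mean(scores),
        "stdev": stdev(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
    }
```

Rejecting short samples outright is a design choice: it prevents a single-run score from slipping into a report that is meant to describe spread across conditions.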
Hu, Y., Peng, X., Zhao, Y., Qiu, L., & Hung, K. (2026). Beyond Instrumental and Substitutive Paradigms: Introducing Machine Culture as an Emergent Phenomenon in Large Language Models. arXiv:2601.17096v1. Key sections: Section 3 (factorial experimental design), Section 4 (Cultural Reversal results), Section 5 (Service Persona Camouflage analysis), Section 6 (superposition/mode collapse theoretical framework).