skills/beyond-translation-cross-cultural-meme/SKILL.md
Cross-cultural meme transcreation using a three-stage hybrid pipeline (cultural analysis, visual generation, assembly) that preserves humor and communicative intent while adapting culture-specific references between languages. Triggers: 'transcreate this meme', 'adapt meme for Chinese audience', 'convert meme to US culture', 'cross-cultural meme adaptation', 'localize this meme for another culture', 'meme cultural translation'
npx skillsauth add ndpvt-web/arxiv-claude-skills beyond-translation-cross-cultural-meme
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to guide the implementation of a hybrid meme transcreation pipeline that adapts memes between cultures -- not by translating text, but by replacing culture-specific humor, visual references, and linguistic conventions with equivalents that resonate in the target culture. Based on the MemeXGen framework, the pipeline separates culture-invariant elements (core emotion, humor mechanism, communicative intent) from culture-specific elements (idioms, visual symbols, pop culture references, text style) and handles each independently across three stages: cultural analysis with caption generation, visual template generation, and final meme assembly.
The MemeXGen framework treats meme adaptation as transcreation rather than translation. Translation assumes a one-to-one mapping of words; transcreation acknowledges that humor, cultural references, and visual conventions must be recreated in the target culture. The core insight is a strict separation: identify what is universal (the emotional payload, the humor type -- sarcasm, exaggeration, irony) and what is culture-bound (specific idioms, celebrity references, visual templates, text density conventions). Only culture-bound elements get replaced; universal elements are preserved as anchors.
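The separation described above can be pictured as a small data structure. The following is a minimal sketch, not the paper's implementation: the class name `MemeAnalysis`, the function `transcreate_elements`, and the field names are hypothetical, chosen only to show that universal anchors pass through untouched while culture-bound references are swapped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemeAnalysis:
    # Culture-invariant anchors: carried over to the target meme unchanged.
    intent: str
    humor_mechanism: str
    emotion: str
    # Culture-bound references found in the source meme (hypothetical field).
    culture_bound: tuple[str, ...] = ()


def transcreate_elements(analysis: MemeAnalysis,
                         mapping: dict[str, str]) -> MemeAnalysis:
    """Swap only the culture-bound references; keep the universal anchors."""
    missing = [ref for ref in analysis.culture_bound if ref not in mapping]
    if missing:
        raise KeyError(f"no target-culture equivalent for: {missing}")
    swapped = tuple(mapping[ref] for ref in analysis.culture_bound)
    return replace(analysis, culture_bound=swapped)


source = MemeAnalysis(
    intent="complain about overwork",
    humor_mechanism="exaggeration",
    emotion="sadness",
    culture_bound=("996 work culture",),
)
target = transcreate_elements(source, {"996 work culture": "hustle culture"})
print(target.culture_bound)  # ('hustle culture',)
```

Raising on an unmapped reference is a design choice in this sketch: it forces every culture-bound element to receive an explicit equivalent rather than leaking into the target meme untranslated.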
The framework operates in three sequential stages. Stage 1 (Cultural Analysis) uses a vision-language model (LLaVA 1.6 13B in the paper) to analyze the source meme, extract its communicative intent, identify culture-specific references, map those references to target-culture equivalents, and generate a transcreated caption plus visual recommendations. Stage 2 (Visual Generation) feeds those visual recommendations to an image generation model (FLUX.1 Schnell) to produce a culture-appropriate visual template. Stage 3 (Assembly) composites the transcreated caption onto the generated visual using culture-aware typography -- Chinese memes use denser text layouts, while English memes use more spaced, larger fonts with Impact-style conventions.
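As a rough illustration of how the three stages chain together, the sketch below wires them up as plain callables. Everything named here (`StageOneResult`, `run_pipeline`, and the stub functions) is an assumption made for illustration; in a real setup the callables would wrap a vision-language model, an image generator, and a compositing routine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageOneResult:
    caption: str        # transcreated caption in the target language
    visual_brief: str   # visual recommendations for the image model


def run_pipeline(
    meme_image: bytes,
    source_culture: str,
    target_culture: str,
    analyze: Callable[[bytes, str, str], StageOneResult],
    generate_visual: Callable[[str], bytes],
    assemble: Callable[[bytes, str, str], bytes],
) -> bytes:
    # Stage 1: cultural analysis yields a caption plus visual recommendations.
    stage_one = analyze(meme_image, source_culture, target_culture)
    # Stage 2: the image model sees only the visual recommendations.
    template = generate_visual(stage_one.visual_brief)
    # Stage 3: composite the caption using target-culture typography.
    return assemble(template, stage_one.caption, target_culture)


# Stub stages so the control flow can be exercised without any model.
def fake_analyze(image: bytes, src: str, tgt: str) -> StageOneResult:
    return StageOneResult(caption="caption", visual_brief="brief")


def fake_generate(brief: str) -> bytes:
    return b"IMG:" + brief.encode()


def fake_assemble(template: bytes, caption: str, culture: str) -> bytes:
    return template + b"|" + caption.encode() + b"|" + culture.encode()


result = run_pipeline(b"raw", "US", "Chinese",
                      fake_analyze, fake_generate, fake_assemble)
print(result)  # b'IMG:brief|caption|Chinese'
```

Passing the stages in as callables keeps each one replaceable, which matters when a failed output has to be re-run from a single stage rather than from scratch.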
A critical finding is directional asymmetry: US-to-Chinese transcreation scores 4.48/5.0 while Chinese-to-US scores 3.93/5.0. This gap arises because US memes rely on globally recognizable templates while Chinese memes depend on context-specific wordplay and implicit cultural concepts that lack direct Western equivalents. Any implementation must account for this asymmetry by investing more effort in the cultural analysis stage when adapting from high-context cultures.
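One simple way to encode "more effort in the analysis stage" is a per-direction budget for how many candidate cultural mappings to request. The sketch below is only an assumption about how that might look: the function name, the set of high-context source cultures, and the numbers are illustrative placeholders, not values from the paper.

```python
# Illustrative placeholder: which source cultures get extra analysis effort.
HIGH_CONTEXT_SOURCES = {"Chinese"}


def analysis_budget(source_culture: str,
                    base_candidates: int = 1,
                    extra_candidates: int = 2) -> int:
    """Return how many candidate cultural mappings to request in Stage 1."""
    if source_culture in HIGH_CONTEXT_SOURCES:
        # Harder direction: ask for several alternatives to choose from.
        return base_candidates + extra_candidates
    return base_candidates


print(analysis_budget("Chinese"))  # 3
print(analysis_budget("US"))       # 1
```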
1. Ingest the source meme -- Accept a meme image and its source culture label (e.g., "US", "Chinese"). If the meme contains text, run OCR or use a VLM to extract it. Store the raw image, extracted text, and culture label as structured input.
2. Perform cultural analysis -- Prompt a vision-language model with the meme image and instruct it to output: (a) a description of all culture-specific references (idioms, celebrities, visual symbols, meme templates), (b) the core communicative intent in one sentence, (c) the humor mechanism (sarcasm, wordplay, exaggeration, irony, absurdism), (d) the emotional tone (joy, anger, sadness, fear, disgust, surprise) with intensity 1-5, and (e) which elements are universal vs. culture-bound. Use temperature 0.7 for balanced creativity.
3. Map culture-specific elements to target equivalents -- For each culture-bound element identified in step 2, prompt the model to propose a target-culture equivalent. For example: a reference to "996 work culture" maps to "hustle culture" or "quiet quitting"; a Weibo-style reaction maps to a Twitter/Reddit-style reaction. Require the model to justify each mapping with a brief rationale.
4. Generate the transcreated caption -- Using the intent, humor mechanism, emotional tone, and mapped equivalents, prompt the model to write a meme caption in the target language that: preserves the original emotional effect, uses natural meme language conventions of the target culture (not formal/literary register), and incorporates the mapped cultural references. Avoid literal translation at all costs.
5. Produce visual recommendations -- From the cultural analysis, generate a structured description of what the target meme's visual should depict: character types, scene setting, composition style, and any culture-specific visual conventions (e.g., Chinese memes often use cartoon/anime-derived characters; US memes favor reaction photos and established templates).
6. Generate the target visual -- Feed the visual recommendations into an image generation model (e.g., FLUX, SDXL, or DALL-E). Specify 1024x1024 resolution and meme-appropriate style (bold, clear subjects, simple backgrounds). If the source meme uses a well-known template with a target-culture equivalent, reference that template explicitly.
7. Assemble the final meme -- Composite the transcreated caption onto the generated visual. Apply culture-aware typography: for Chinese targets, use denser text with appropriate CJK fonts (e.g., Noto Sans CJK); for English targets, use Impact or similar bold sans-serif with stroke outlines. Position text according to target culture conventions (top/bottom for US Impact-style, overlaid or sidebar for many Chinese formats).
8. Evaluate transcreation quality -- Score the output on six dimensions (each 1-5): Caption Quality (clarity, tone, meme-appropriateness), Image Quality (visual clarity, composition), Synergy (image-text coherence), Cultural Fit (relatability for target audience), Intent Preservation (fidelity to original message), and Overall Score (average). Use a VLM evaluator (Qwen-VL-Max shows strongest human correlation at r=0.926) or human reviewers.
9. Iterate on failures -- If any dimension scores below 3.0, diagnose the failure mode: formal speech dampening humor (rewrite caption in casual register), visual disconnect from caption (regenerate visual with more specific prompt), or humor mechanism lost (re-analyze source and try alternative cultural mapping). Re-run from the failed stage.
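The iteration step above can be sketched as a lookup from failing dimensions to a remedy and a stage to restart from. Note that the pairing of each dimension with a stage in `REMEDIES` is an assumption made for this sketch; the workflow names the three failure modes and remedies but does not tie them to specific score dimensions. `plan_retry` is likewise a hypothetical helper.

```python
from typing import Optional

THRESHOLD = 3.0  # scores below this count as failures

# Assumed mapping: failing dimension -> (stage to re-run, remedy).
REMEDIES = {
    "caption_quality": (1, "rewrite caption in casual register"),
    "cultural_fit": (1, "re-analyze source and try alternative cultural mapping"),
    "intent_preservation": (1, "re-analyze source and try alternative cultural mapping"),
    "image_quality": (2, "regenerate visual with more specific prompt"),
    "synergy": (2, "regenerate visual with more specific prompt"),
}


def plan_retry(scores: dict[str, float]) -> Optional[tuple[int, list[str]]]:
    """Return (earliest stage to re-run, remedies), or None if nothing failed."""
    failing = [dim for dim, score in scores.items()
               if dim in REMEDIES and score < THRESHOLD]
    if not failing:
        return None
    # Restart from the earliest affected stage; later stages re-run after it.
    stage = min(REMEDIES[dim][0] for dim in failing)
    actions = sorted({REMEDIES[dim][1] for dim in failing})
    return stage, actions


scores = {"caption_quality": 4.0, "image_quality": 2.5, "synergy": 2.0,
          "cultural_fit": 4.0, "intent_preservation": 4.5}
print(plan_retry(scores))  # (2, ['regenerate visual with more specific prompt'])
```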
Example 1: US meme to Chinese adaptation
User: "Adapt this 'distracted boyfriend' meme about choosing pizza over salad for a Chinese audience."
Approach:
Output: A meme using a familiar Chinese format with the milk tea vs. healthy choice framing that preserves the ironic self-awareness humor of the original while using references that resonate with Chinese internet culture.
Example 2: Chinese meme to US adaptation
User: "This Chinese meme shows a student saying '我太难了' (roughly, 'life is too hard for me') with a crying face about exam season. Adapt it for US audiences."
Approach:
Output: A US-format meme using "this is fine" energy with finals week context that preserves the hyperbolic self-deprecation while using formats and references native to US meme culture.
Example 3: Implementing the evaluation pipeline
User: "I need to evaluate whether my meme transcreations are good. Build me an evaluation script."
Approach:
Output:
```python
import json
from dataclasses import dataclass


@dataclass
class TranscreationEval:
    caption_quality: float      # Clarity, tone, meme-appropriateness
    image_quality: float        # Visual clarity, composition
    synergy: float              # Image-text coherence
    cultural_fit: float         # Relatability for target audience
    intent_preservation: float  # Fidelity to original message

    @property
    def overall(self) -> float:
        # Overall score is the plain average of the five dimensions.
        dims = [self.caption_quality, self.image_quality,
                self.synergy, self.cultural_fit, self.intent_preservation]
        return sum(dims) / len(dims)

    @property
    def failure_dimensions(self) -> list[str]:
        # Dimensions scoring below 3.0 trigger another iteration.
        return [name for name, val in {
            "caption_quality": self.caption_quality,
            "image_quality": self.image_quality,
            "synergy": self.synergy,
            "cultural_fit": self.cultural_fit,
            "intent_preservation": self.intent_preservation,
        }.items() if val < 3.0]


EVAL_PROMPT = """Rate this meme transcreation on a 1-5 scale for each dimension.
Source culture: {source_culture} | Target culture: {target_culture}
Dimensions:
1. Caption Quality: Is the caption clear, natural meme language, appropriate tone?
2. Image Quality: Is the visual clear, well-composed, recognizable?
3. Synergy: Do image and text work together to convey humor/emotion?
4. Cultural Fit: Would the target audience find this relatable and natural?
5. Intent Preservation: Does this preserve the original meme's message and feeling?
Return JSON: {{"caption_quality": N, "image_quality": N, "synergy": N,
"cultural_fit": N, "intent_preservation": N}}"""


def evaluate_pair(vlm_client, source_img, target_img,
                  source_culture: str, target_culture: str) -> TranscreationEval:
    prompt = EVAL_PROMPT.format(
        source_culture=source_culture, target_culture=target_culture
    )
    # vlm_client is expected to expose analyze(images=..., prompt=...)
    # and to return the model's reply as a JSON string.
    response = vlm_client.analyze(
        images=[source_img, target_img], prompt=prompt
    )
    scores = json.loads(response)
    return TranscreationEval(**scores)
```
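The script passes the raw model reply straight to `json.loads`, which fails if the reply carries any text around the JSON object or includes extra keys. A more tolerant parser might look like the sketch below; `parse_scores` and `DIMENSIONS` are hypothetical names, and the strict 1-5 range check is an assumption drawn from the rubric's scale.

```python
import json
import re

DIMENSIONS = ("caption_quality", "image_quality", "synergy",
              "cultural_fit", "intent_preservation")


def parse_scores(reply: str) -> dict[str, float]:
    """Extract the five dimension scores from a reply that may wrap the JSON."""
    # Take everything from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    scores: dict[str, float] = {}
    for name in DIMENSIONS:
        value = float(data[name])  # KeyError if a dimension is missing
        if not 1.0 <= value <= 5.0:
            raise ValueError(f"{name} out of range: {value}")
        scores[name] = value
    # Keys outside DIMENSIONS are dropped, so the result is safe to unpack
    # into a dataclass that has exactly these five fields.
    return scores


reply = ('Sure! {"caption_quality": 4, "image_quality": 5, "synergy": 3, '
         '"cultural_fit": 4, "intent_preservation": 4, "comment": "ok"} '
         'Hope this helps.')
print(parse_scores(reply))
```

The returned dictionary could then be unpacked into the evaluation dataclass in place of the direct `json.loads` result.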
Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language Models (Zhao, Zhang, Ignat, 2026). Look for: the three-stage hybrid pipeline architecture (Section 3), the six-dimension evaluation rubric (Section 4), directional asymmetry analysis (Section 5), and success/failure pattern taxonomy (Section 6). Code and dataset: github.com/AIM-SCU/MemeXGen.