skills/amem4rec-leveraging-cross-user-similarity/SKILL.md
Build agentic recommendation systems that learn collaborative filtering signals through cross-user memory evolution -- no CF model pre-training needed. Use when: 'build a recommender with memory', 'add collaborative filtering to LLM recommendations', 'cross-user pattern memory pool', 'agentic recommender system', 'memory-augmented ranking', 'evolving user behavior patterns for recommendations'.
npx skillsauth add ndpvt-web/arxiv-claude-skills amem4rec-leveraging-cross-user-similarity
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to implement AMEM4Rec, a recommendation architecture where an LLM-based agent learns collaborative filtering signals end-to-end by abstracting user behaviors into a shared global memory pool, linking similar memories via dual validation (embedding similarity + semantic LLM check), and iteratively evolving those memories to reinforce cross-user patterns. At inference time, retrieved memories augment the LLM ranker with collaborative context, eliminating the need for a separate pre-trained CF model.
The core insight: Instead of relying on a pre-trained collaborative filtering model (matrix factorization, graph neural networks, etc.), AMEM4Rec makes CF signals emerge from a shared memory pool. User interaction histories are processed through sliding windows and abstracted by an LLM into structured memory entries -- each containing a high-level behavior explanation and a concrete interaction pattern description. These memories are encoded via Sentence-BERT into an embedding space and stored in a global pool shared across all users.
Memory linking and evolution: When a new memory is created, it is compared against existing memories using cosine similarity. A dual-validator system then decides how to handle it. A similarity validator uses a distribution-aware decision tree with soft thresholds (tau_low=0.55, tau_high=0.9) to classify the new memory into STORE-only, UPDATE-and-STORE, or UPDATE-only actions. A semantic validator (an LLM call) confirms whether candidate links are truly semantically related, filtering out false positives from embedding similarity alone. Linked memories are then evolved: the LLM merges and reinforces shared patterns, causing related behavior archetypes to cluster more tightly in embedding space over iterations.
Inference: At recommendation time, the user's recent history is encoded and matched against the evolved memory pool. The top-k retrieved memories, which now encode cross-user collaborative patterns, are provided as context to an LLM ranker alongside the user's personal history and candidate items. The ranker reorders candidates informed by both individual preferences and population-level behavioral patterns.
Create a structured format for memories. Each memory m_k is a tuple (p_k, e_k) where p_k contains two fields and e_k is the embedding:
from dataclasses import dataclass

import numpy as np

@dataclass
class Memory:
    behavior_explanation: str      # 1-2 sentences: stable, high-level user tendency
    pattern_description: str       # 1-2 sentences: concrete interaction structure
    embedding: np.ndarray          # Sentence-BERT encoding of both fields concatenated
    linked_memory_ids: list[str]   # IDs of semantically linked memories
    evolution_count: int = 0       # how many times this memory has been updated
Partition each user's interaction sequence into overlapping windows of size w (default w=3). For each window, prompt the LLM to abstract the behavior:
Prompt: Given the following user interactions:
1. [Item title, category, action]
2. [Item title, category, action]
3. [Item title, category, action]
Generate:
- behavior_explanation: A 1-2 sentence description of the stable, high-level user tendency shown here. Do NOT mention specific item names.
- pattern_description: A 1-2 sentence description of the concrete interaction structure (e.g., "explores budget options then upgrades to premium").
Critical rule: behavior explanations must NEVER leak specific item names -- they must describe abstract behavioral tendencies only.
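The windowing step can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `sliding_windows` and the `stride` parameter are assumptions (the text only says the windows overlap, so a stride of 1 is used here).

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(interactions: Sequence[T], w: int = 3, stride: int = 1) -> list[list[T]]:
    """Split one user's interaction sequence into overlapping windows of size w.

    stride=1 is an assumption; any stride smaller than w yields overlapping windows.
    """
    if w <= 0 or stride <= 0:
        raise ValueError("w and stride must be positive")
    if len(interactions) < w:
        return []  # too short to fill a single window
    last_start = len(interactions) - w
    return [list(interactions[i:i + w]) for i in range(0, last_start + 1, stride)]


# Toy interaction log (hypothetical data) for one user.
windows = sliding_windows(["view:A", "cart:B", "buy:B", "view:C"], w=3)
```

Each resulting window would then be formatted into the abstraction prompt above and sent to the LLM. Users whose history is shorter than `w` produce no windows and need separate handling.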
Encode each memory's concatenated text (behavior_explanation + " " + pattern_description) using Sentence-BERT (or any sentence embedding model). Store all memories in a shared pool with an efficient nearest-neighbor index (FAISS or similar):
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('all-MiniLM-L6-v2')
memory_pool = {}                # id -> Memory
index = faiss.IndexFlatIP(384)  # cosine similarity via inner product on normalized vectors
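The inner-product index only yields cosine similarity if every vector is L2-normalized before it is added or queried. The sketch below shows that idea with a pure-Python stand-in for the FAISS index so it runs without extra dependencies. `normalize` and `BruteForcePool` are hypothetical names for illustration, not part of FAISS or of the paper.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class BruteForcePool:
    """Illustrative stand-in for an inner-product index over normalized vectors."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, memory_id: str, embedding: list[float]) -> None:
        self.ids.append(memory_id)
        self.vectors.append(normalize(embedding))

    def search(self, query: list[float], k: int = 5) -> list[tuple[str, float]]:
        """Return up to k (memory_id, cosine similarity) pairs, best first."""
        q = normalize(query)
        scored = [
            (memory_id, sum(a * b for a, b in zip(q, vec)))
            for memory_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


pool = BruteForcePool()
pool.add("m1", [1.0, 0.0])
pool.add("m2", [0.0, 2.0])
pool.add("m3", [3.0, 3.0])
hits = pool.search([2.0, 0.0], k=2)  # m1 is parallel to the query, m3 is at 45 degrees
```

With the real libraries, the same effect is typically obtained by passing `normalize_embeddings=True` to `encoder.encode` or by calling `faiss.normalize_L2` on the float32 array before `index.add`. A side list that maps FAISS row positions back to memory ids is also needed, since `IndexFlatIP` only returns integer positions.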
For each new memory m_n, retrieve top-k (k=5) nearest neighbors from the pool by cosine similarity. Compute the score distribution metrics:
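tau_low, tau_high, k = 0.55, 0.9, 5
# Inner product on normalized vectors equals cosine similarity; FAISS expects float32 input.
query = m_n.embedding.reshape(1, -1).astype("float32")
scores, neighbor_ids = index.search(query, k)
scores = scores[0]  # similarities of m_n to its top-k neighbors
s_max = max(scores)
p_high = sum(1 for s in scores if s >= tau_high) / k            # share of near-duplicates
p_medium = sum(1 for s in scores if tau_low <= s < tau_high) / k
p_low = sum(1 for s in scores if s < tau_low) / k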
Use the soft-threshold mechanism to decide the action:
if s_max < tau_low:
    action = "STORE_ONLY"        # novel pattern, no similar memories
elif s_max < tau_high:
    action = "UPDATE_AND_STORE"  # partially similar, keep both
elif p_high >= 0.6:
    action = "UPDATE_ONLY"       # highly redundant, merge into existing
else:
    action = "UPDATE_AND_STORE"  # high max but sparse distribution
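The decision rule above can be wrapped into a single testable function. This is a sketch under stated assumptions: the name `decide_action`, the `redundancy_ratio` parameter (the 0.6 cutoff from the snippet above), and the empty-pool behavior are illustrative choices, and the ratio is computed over the number of scores actually returned rather than a fixed k.

```python
def decide_action(
    scores: list[float],
    tau_low: float = 0.55,
    tau_high: float = 0.9,
    redundancy_ratio: float = 0.6,
) -> str:
    """Map top-k similarity scores to STORE_ONLY, UPDATE_AND_STORE, or UPDATE_ONLY."""
    if not scores:
        # Assumption: with an empty pool there is nothing to update, so just store.
        return "STORE_ONLY"
    s_max = max(scores)
    p_high = sum(1 for s in scores if s >= tau_high) / len(scores)
    if s_max < tau_low:
        return "STORE_ONLY"        # novel pattern
    if s_max < tau_high:
        return "UPDATE_AND_STORE"  # partially similar, keep both
    if p_high >= redundancy_ratio:
        return "UPDATE_ONLY"       # most neighbors are near-duplicates
    return "UPDATE_AND_STORE"      # one strong match, but the rest are dissimilar


novel = decide_action([0.30, 0.20, 0.10, 0.40, 0.50])
partial = decide_action([0.70, 0.60, 0.30, 0.20, 0.10])
redundant = decide_action([0.95, 0.93, 0.91, 0.60, 0.40])
sparse = decide_action([0.95, 0.60, 0.50, 0.40, 0.30])
```

Keeping the thresholds as parameters makes it straightforward to sweep `tau_low` and `tau_high` during tuning without touching the branching logic.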
For candidate links identified by embedding similarity, ask the LLM to confirm semantic relatedness:
Prompt: Given these two user behavior patterns:
Memory A: [behavior_explanation + pattern_description]
Memory B: [behavior_explanation + pattern_description]
Are these describing fundamentally the same or closely related user behavior tendency?
Answer YES or NO with a one-sentence justification.
Only link memories that pass both the similarity threshold AND the semantic validation.
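The dual check can be sketched as a filter that calls the LLM only for candidates that already pass the similarity threshold. Several details here are assumptions: the text does not say which threshold gates linking (`tau_low` is used), `ask_llm` is a hypothetical callable that takes a prompt string and returns the model's reply, and the YES/NO parsing is one simple convention.

```python
from typing import Callable

LINK_PROMPT = (
    "Given these two user behavior patterns:\n"
    "Memory A: {a}\n"
    "Memory B: {b}\n"
    "Are these describing fundamentally the same or closely related user behavior tendency?\n"
    "Answer YES or NO with a one-sentence justification."
)


def parse_yes_no(answer: str) -> bool:
    """Treat the reply as a confirmation only if its first word is YES."""
    words = answer.strip().split()
    return bool(words) and words[0].strip(".,:;!").upper() == "YES"


def validated_links(
    new_text: str,
    candidates: list[tuple[str, str, float]],  # (memory_id, memory_text, similarity)
    ask_llm: Callable[[str], str],
    tau_low: float = 0.55,
) -> list[str]:
    """Return ids of candidates that pass both the similarity and the semantic check."""
    links = []
    for memory_id, memory_text, score in candidates:
        if score < tau_low:
            continue  # fails the similarity validator, so skip the LLM call
        reply = ask_llm(LINK_PROMPT.format(a=new_text, b=memory_text))
        if parse_yes_no(reply):
            links.append(memory_id)
    return links


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (toy keyword rule, illustration only)."""
    memory_b = prompt.split("Memory B:")[1]
    return "YES, both describe upgrading." if "upgrades" in memory_b else "No. Unrelated."


links = validated_links(
    "starts cheap, then upgrades",
    [
        ("m1", "explores budget items then upgrades", 0.80),
        ("m2", "binge-reads news", 0.70),
        ("m3", "upgrades gear often", 0.40),
    ],
    fake_llm,
)
```

Running the cheap similarity filter first keeps the number of LLM calls bounded by the number of plausible candidates rather than by k.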
For UPDATE actions, prompt the LLM to merge the new memory with each validated linked memory:
Prompt: You have two related behavior patterns from different users:
Existing: [memory text]
New: [memory text]
Produce an evolved version that reinforces the shared pattern while preserving nuances.
Output the same format: behavior_explanation + pattern_description.
Re-encode the evolved memory and update it in the pool. Increment evolution_count.
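A minimal sketch of the evolution step follows, with the LLM merge and the encoder passed in as callables so the bookkeeping is visible. `MemoryRecord`, `evolve`, `merge`, and `encode` are hypothetical names. The record uses a plain list for the embedding to stay dependency-free, and recording the new memory's id as a link is an assumption about how links are tracked.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryRecord:
    behavior_explanation: str
    pattern_description: str
    embedding: list[float]
    linked_memory_ids: list[str] = field(default_factory=list)
    evolution_count: int = 0


def evolve(
    existing: MemoryRecord,
    new: MemoryRecord,
    new_id: str,
    merge: Callable[[MemoryRecord, MemoryRecord], tuple[str, str]],
    encode: Callable[[str], list[float]],
) -> None:
    """Merge `new` into `existing` in place, re-encode it, and bump its counter."""
    behavior, pattern = merge(existing, new)  # in practice, an LLM call with the prompt above
    existing.behavior_explanation = behavior
    existing.pattern_description = pattern
    existing.embedding = encode(f"{behavior} {pattern}")
    if new_id not in existing.linked_memory_ids:
        existing.linked_memory_ids.append(new_id)  # assumption: links are stored on the evolved memory
    existing.evolution_count += 1


# Stubs standing in for the LLM and the sentence encoder (illustration only).
def stub_merge(a: MemoryRecord, b: MemoryRecord) -> tuple[str, str]:
    return a.behavior_explanation + " (reinforced)", a.pattern_description


def stub_encode(text: str) -> list[float]:
    return [float(len(text))]


existing = MemoryRecord("Prefers gradual upgrades.", "Starts cheap, then buys premium.", [0.0])
incoming = MemoryRecord("Upgrades over time.", "Budget first, premium later.", [1.0])
evolve(existing, incoming, "m42", stub_merge, stub_encode)
```

When a FAISS index backs the pool, the re-encoded vector also has to replace the old one in the index. `IndexFlatIP` has no in-place update, so a common workaround is to rebuild the index periodically from the pool.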
Process all users' windows sequentially (or in batches). The memory pool grows and evolves organically -- early memories get reinforced as more users with similar patterns are processed. Memories with high evolution_count represent strong cross-user behavioral archetypes.
At inference time for a target user:
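# Encode user's recent history (encode_user_history is a helper you supply)
user_context = encode_user_history(user_recent_interactions)
# Retrieve top-k_mem memories (k_mem=5); memory_pool here is a wrapper exposing search(), not the raw dict defined earlier
relevant_memories = memory_pool.search(user_context, k=5)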
# Construct ranking prompt
prompt = f"""Given this user's recent interactions:
{format_history(user_recent_interactions)}
Cross-user behavioral patterns relevant to this user:
{format_memories(relevant_memories)}
Rank the following candidate items from most to least relevant:
{format_candidates(candidate_items)}
Return a ranked list with brief justifications."""
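The ordering returned by the ranker can be scored with NDCG@K. The sketch below assumes binary relevance (an item is relevant if it appears in the user's held-out interactions) and uses toy item names; `dcg_at_k` and `ndcg_at_k` are illustrative helper names.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (rank is 0-based)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_items: list[str], relevant_items: set[str], k: int) -> float:
    """NDCG@k with binary gains: 1.0 for a held-out item, 0.0 otherwise."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    ideal_gains = [1.0] * min(len(relevant_items), k)  # best case: all relevant items first
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Toy example (hypothetical data): the held-out item was ranked second.
ranking = ["kettlebell", "mat", "shaker"]
held_out = {"mat"}
scores = {k: ndcg_at_k(ranking, held_out, k) for k in (1, 5, 10)}
```

Here NDCG@1 is 0 because the relevant item is not in first place, while NDCG@5 and NDCG@10 both equal 1/log2(3). Averaging these per-user values over the test set gives the reported metric.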
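Measure NDCG@K (K in {1, 5, 10}) on a held-out test set. Key hyperparameters to tune:
- w: sliding-window size (default 3)
- k: number of nearest neighbors retrieved when linking a new memory (default 5)
- tau_low (default 0.55) and tau_high (default 0.9): similarity thresholds for the STORE/UPDATE decision
- k_mem: number of memories retrieved at inference time
Example 1: E-commerce product recommender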
User: "I have a dataset of Amazon purchase histories. Build me a recommender that uses LLM reasoning but also captures collaborative patterns across users."
Approach:
Output:
{
"user_id": "U123",
"retrieved_memories": [
{"pattern": "Budget-conscious fitness equipment exploration with gradual upgrade trajectory", "evolution_count": 47},
{"pattern": "Home workout setup building across complementary equipment categories", "evolution_count": 31}
],
"ranked_items": [
{"rank": 1, "item": "Adjustable Kettlebell 25lb", "reason": "Fits budget upgrade trajectory + home workout pattern"},
{"rank": 2, "item": "Exercise Mat Premium", "reason": "Complements existing equipment pattern"},
{"rank": 3, "item": "Protein Shaker Bottle", "reason": "Adjacent category to fitness equipment interest"}
]
}
Example 2: News article recommender
User: "I'm building a news recommendation system. Users have short reading histories and I need to capture what types of articles similar readers consume."
Approach:
Output:
Retrieved memories for user with history ["GPT-5 Announced", "OpenAI Valuation Soars"]:
- "AI industry follower: tracks product launches then reads business/financial implications" (evolved 82 times)
- "Tech investor mindset: reads announcements then seeks market analysis" (evolved 56 times)
Top recommendations:
1. "What GPT-5 Means for Enterprise AI Adoption" (analysis piece matching both memory patterns)
2. "AI Chip Stocks Rally After OpenAI News" (financial angle from investor pattern)
Example 3: Adding memory evolution to an existing agent framework
User: "I already have an LLM agent that makes recommendations from user history. How do I add the AMEM4Rec memory pool to it?"
Approach:
MemoryPool module with Sentence-BERT encoder and FAISS index
Integration code skeleton:
class MemoryAugmentedRecommender:
    def __init__(self, base_agent, memory_pool):
        self.agent = base_agent
        self.pool = memory_pool

    def recommend(self, user_id, candidates):
        history = self.agent.get_user_history(user_id)
        history_embedding = self.pool.encode(history)
        memories = self.pool.retrieve(history_embedding, k=5)
        ranking_context = {
            "user_history": history,
            "cross_user_patterns": [m.to_text() for m in memories],
            "candidates": candidates
        }
        return self.agent.rank(ranking_context)
- Track evolution_count on memories. Highly evolved memories (reinforced by many users) represent strong population-level signals and should be weighted higher during retrieval.
- Prune memories with evolution_count == 0 after all users are processed (orphan memories that never linked to any other). For active systems, set a TTL or cap pool size by evicting lowest-evolution-count entries.
- Users with fewer than w interactions cannot fill a sliding window. Handle them by encoding their partial history directly and retrieving memories -- the cross-user memory pool specifically helps here since it provides population-level context.
Paper: AMEM4Rec: Leveraging Cross-User Similarity for Memory Evolution in Agentic LLM Recommenders (Nguyen, Kieu, Le, 2026). Key sections: Section 3 for the full architecture, Algorithm 1 for the training/inference pseudocode, and Appendix A for the exact prompts used in memory abstraction, linking validation, evolution, and ranking.