skills/memorybench/SKILL.md
Automatically benchmark your custom memory implementation against established systems like Supermemory. Set up a public benchmark, or create your own. Compare solutions on quality, latency, features, and cost through a simple UI and CLI.
npx skillsauth add supermemoryai/memorybench benchmark-context
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Automatically benchmark your custom memory implementation against established systems like Supermemory, Mem0, and Zep.
When you invoke this skill from your project, it handles the complete benchmarking process end-to-end:
No manual commands needed - everything runs automatically from start to finish.
Use this skill when you:
Phase 1: Setup
./memorybench)
Phase 2: Discovery
Phase 3: Code Generation
Phase 4: Registration
src/types/provider.ts with your provider name
src/providers/index.ts
src/utils/config.ts
Phase 5: Configuration
.env.local with required API keys
Phase 6: Validation
Phase 7: Benchmark Execution
The skill will ask you these questions upfront:
What should we call your memory provider?
Where is your memory implementation?
src/lib/memory, packages/memory, src/services/context
Which dataset matches your use case?
LoCoMo - Long-term conversational memory across multiple sessions spanning days/weeks
LongMemEval - Memory with long documents and complex retrieval
ConvoMem - Multi-turn conversation understanding and context tracking
See Benchmarks Reference for detailed information.
Which systems to compare against?
How many questions to benchmark?
Important: You must run this skill from your project root, NOT from memorybench.
your-project/                 ← Run skill from here
├── src/
│   └── lib/memory/           ← Your memory implementation
└── memorybench/              ← Skill clones this automatically
    └── src/providers/        ← Your provider adapter goes here
The skill will:
./memorybench
../src/lib/memory) when analyzing your code
cd memorybench && bun run src/index.ts ...
After the skill runs, you'll have:
your-project/
└── memorybench/
    ├── .env.local                  # Your API keys
    ├── src/
    │   ├── providers/
    │   │   └── {yourname}/
    │   │       ├── index.ts        # Provider implementation
    │   │       └── prompts.ts      # Custom prompts (optional)
    │   ├── types/provider.ts       # Updated with your provider
    │   └── providers/index.ts      # Registered
    └── data/runs/{run-id}/         # Benchmark results
        ├── checkpoint.json         # Run state
        ├── results/                # Per-question results
        └── report.json             # Final metrics
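Once a run has finished, the layout above can be checked mechanically. The following is a minimal sketch, not part of memorybench: the function name `check_run_artifacts` is hypothetical, and it assumes only the three entries shown under `data/runs/{run-id}/` in the tree.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which of the expected artifacts exist
# in a run directory such as memorybench/data/runs/{run-id}.
# Returns 0 when all three are present, 1 otherwise.
check_run_artifacts() {
  local run_dir="$1" missing=0 item
  for item in checkpoint.json results report.json; do
    if [ -e "$run_dir/$item" ]; then
      echo "found:   $item"
    else
      echo "missing: $item"
      missing=1
    fi
  done
  return "$missing"
}
```

A caller would pass the run directory as the only argument, for example `check_run_artifacts memorybench/data/runs/{run-id}` with the real run ID substituted.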
Once the benchmark finishes, the skill shows:
Summary Scores:
Key Findings:
Next Steps:
cd memorybench && bun run src/index.ts serve
cd memorybench && bun run src/index.ts show-failures -r {run-id}
cd memorybench && bun run src/index.ts run -p {name} -b {benchmark}
If something goes wrong:
.env.local
See Debugging Reference for detailed troubleshooting.
When executing this skill, follow these steps:
Check that we're in the user's project (not in memorybench):
basename "$(pwd)" | grep -q "memorybench" && echo "ERROR" || echo "OK"
If ERROR, inform user to run from their project root and exit.
Check if memorybench already exists:
[ -d "memorybench" ] && echo "EXISTS" || echo "NOT_FOUND"
If NOT_FOUND, clone it using EXACTLY this command (do not modify the URL):
git clone https://github.com/supermemoryai/memorybench.git memorybench
Then install dependencies:
cd memorybench && bun install && cd ..
IMPORTANT: You MUST use the URL https://github.com/supermemoryai/memorybench.git - do not infer or use any other URL.
If EXISTS, use the existing installation (no action needed).
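The setup checks above could be combined into one small script. This is a minimal sketch under stated assumptions: the function names `check_project_root` and `ensure_memorybench` are hypothetical (they are not part of memorybench), and the clone step assumes `git` and `bun` are available on the PATH.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints ERROR when the given directory's name
# contains "memorybench" (the same check as above), OK otherwise.
# Defaults to the current working directory.
check_project_root() {
  local dir="${1:-$(pwd)}"
  if basename "$dir" | grep -q "memorybench"; then
    echo "ERROR"
  else
    echo "OK"
  fi
}

# Hypothetical helper: clones and installs memorybench only when the
# directory is missing; an existing installation is left untouched.
ensure_memorybench() {
  if [ -d "memorybench" ]; then
    echo "EXISTS"
  else
    git clone https://github.com/supermemoryai/memorybench.git memorybench || return 1
    (cd memorybench && bun install) || return 1
    echo "CLONED"
  fi
}
```

A caller might run `check_project_root` first and call `ensure_memorybench` only when it prints OK. The install step runs in a subshell, so the working directory stays at the project root.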
Use AskUserQuestion tool to collect all 5 questions at once:
Use Task tool with subagent_type=Explore to analyze the provided memory code location. Look for initialization, add/ingest methods, search/query methods, and configuration needs.
Based on discovery, create:
memorybench/src/providers/{providerName}/index.ts using template from references/provider-template.md
memorybench/src/providers/{providerName}/prompts.ts if custom formatting needed
Update these files in memorybench:
src/types/provider.ts - Add to ProviderName union type
src/providers/index.ts - Import and register in providers Record
src/utils/config.ts - Add case in getProviderConfig()
.env.example - Document environment variables
Ask user for API keys (OpenAI for judging, their provider keys, comparison provider keys).
Create or update memorybench/.env.local with provided values.
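One way to create or update the file without duplicating keys is sketched below. The helper name `upsert_env` is hypothetical, and the variable names in the usage note are placeholders; the real names are whatever memorybench's `.env.example` documents.

```shell
#!/usr/bin/env bash

# Hypothetical helper: set KEY=value in an env file, replacing any
# existing line for the same key and creating the file if needed.
#   $1 key, $2 value, $3 path to the env file
upsert_env() {
  local key="$1" value="$2" file="$3"
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep every line that does not already define this key.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, `upsert_env EXAMPLE_API_KEY "value-from-user" memorybench/.env.local` (a placeholder key name) would add the line on the first call and overwrite it on later calls.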
Run quick test:
cd memorybench && bun run src/index.ts test -p {providerName} -b {benchmark} -q question_1
If it fails, show the error and ask the user whether they want to debug or abort.
Based on user selections, run the benchmark command and LET IT COMPLETE without polling.
With comparisons:
cd memorybench && bun run src/index.ts compare -p {providerName},{others} -b {benchmark} -l {limit}
Without comparisons:
cd memorybench && bun run src/index.ts run -p {providerName} -b {benchmark} -l {limit}
(Omit -l {limit} if user selected "full")
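The choice between the two commands, including the "full" case, can be sketched as a small function. This is an illustration only: `build_benchmark_cmd` is a hypothetical name, and the provider and benchmark values in the usage note are placeholders rather than confirmed identifiers. It prints the command instead of running it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the benchmark command for the given selections.
#   $1 provider name
#   $2 comma-separated comparison providers (empty for no comparison)
#   $3 benchmark name
#   $4 question limit, or "full" for no limit
build_benchmark_cmd() {
  local provider="$1" others="$2" benchmark="$3" limit="$4"
  local cmd="cd memorybench && bun run src/index.ts"
  if [ -n "$others" ]; then
    cmd="$cmd compare -p ${provider},${others}"
  else
    cmd="$cmd run -p ${provider}"
  fi
  cmd="$cmd -b ${benchmark}"
  # "full" means the -l flag is omitted entirely.
  if [ "$limit" != "full" ]; then
    cmd="$cmd -l ${limit}"
  fi
  echo "$cmd"
}
```

With placeholder values, `build_benchmark_cmd myprovider "" mybenchmark full` prints the `run` form without `-l`, while passing a non-empty second argument and a numeric limit prints the `compare` form with `-l`.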
IMPORTANT INSTRUCTIONS FOR RUNNING:
run_in_background: false (or omit the parameter) so the command runs synchronously
timeout: 600000 for 10 minutes)
Alternative if synchronous doesn't work:
run_in_background: true
When the benchmark completes, parse the final output and present:
Look for sections in the output like:
The skill completes successfully when:
From your project root:
# Invoke the skill (in Claude Code CLI)
/memorybench
# Or if using skills programmatically
skill:memorybench
That's it! The skill handles everything else automatically.