dist/codex/nlweb-protocol/skills/nlweb-dev-patterns/SKILL.md
NLWeb development patterns — the mixed-mode programming philosophy, FastTrack vs Analysis parallel paths, config file precedence and the `mode: development` override trap, in-stream NLWS headers vs HTTP headers, embedding/ingest determinism, debugging the LLM-call chain, neural scorer selection (NLWebScorer ModernBERT+GAM), and the A2A / AgentFinder / DataFinder / ModelRouter subsystems. Use when designing the internal architecture of an NLWeb deployment or solving cross-cutting concerns.
Fetch live docs:
- core/baseHandler.py, core/router.py, core/retriever.py, core/ranking.py for current code paths.
Mixed-mode programming is NLWeb's defining design choice. Rather than one big LLM call per query, NLWeb makes many small calls, each with a strict JSON output schema (<returnStruc>), feeding Python control flow.
Implications:
When designing extensions, follow the same pattern: small, schema-constrained LLM calls, deterministic Python glue.
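A minimal sketch of that pattern, not NLWeb's actual code: one small call answers one question, its JSON output is checked against a flat schema, and plain Python decides what happens next. The names `constrained_call`, `fake_llm`, `choose_handler`, and `TYPE_SCHEMA` are hypothetical, and the stub stands in for a real provider call so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical schema: the keys and types one small call must return.
TYPE_SCHEMA = {"item_type": str, "confidence": float}


def constrained_call(llm: Callable[[str], str], prompt: str, schema: dict) -> dict:
    """Run one small LLM call and validate its JSON output against a flat schema."""
    data = json.loads(llm(prompt))  # raises ValueError on non-JSON output
    for key, expected in schema.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return {key: data[key] for key in schema}  # drop anything outside the schema


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call so the sketch runs offline."""
    return '{"item_type": "Recipe", "confidence": 0.92, "chatter": "ignored"}'


def choose_handler(query: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Deterministic glue: the LLM answers one question, the code branches."""
    result = constrained_call(llm, f"Classify the item type of: {query}", TYPE_SCHEMA)
    if result["confidence"] < 0.5:
        return "generic"
    return result["item_type"].lower()
```

Because the branching lives in Python, a malformed or low-confidence answer fails in one predictable place instead of leaking into the final response.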
NLWebHandler runs two paths in parallel:
| Path | What it does | When it wins |
|------|--------------|--------------|
| FastTrack | Immediate vector search → stream early results | Common queries with obvious retrieval matches |
| Analysis | Decontextualize → detect type → route via ToolSelector to a specific handler | Ambiguous queries, complex flows (compare, recipe substitution) |
Both paths stream into the same response. FastTrack results appear quickly; Analysis results appear when ready. The agent decides whether to render incrementally or wait.
Implications for handlers you write: if you write a slow, expensive handler, FastTrack will still beat you to first byte for simple queries. That's fine — it's the design.
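The two-path behavior can be pictured with a small asyncio sketch. This is an illustration under assumptions, not the NLWebHandler implementation: `fast_track`, `analysis`, and `respond` are hypothetical names, and the sleeps simulate a cheap vector search versus a slower multi-step analysis.

```python
import asyncio


async def fast_track(query: str, out: asyncio.Queue) -> None:
    """Hypothetical FastTrack path: immediate search, early results."""
    await asyncio.sleep(0.01)  # simulated cheap vector search
    await out.put(("fasttrack", f"early hit for {query!r}"))


async def analysis(query: str, out: asyncio.Queue) -> None:
    """Hypothetical Analysis path: decontextualize, detect type, route."""
    await asyncio.sleep(0.05)  # simulated chain of small LLM calls
    await out.put(("analysis", f"routed answer for {query!r}"))


async def respond(query: str) -> list:
    """Start both paths at once and collect chunks in arrival order."""
    out: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(fast_track(query, out)),
        asyncio.create_task(analysis(query, out)),
    ]
    chunks = []
    for _ in tasks:  # each path emits exactly one chunk in this sketch
        chunks.append(await out.get())
    await asyncio.gather(*tasks)  # surface any exception from either path
    return chunks
```

Calling `asyncio.run(respond("pasta"))` returns the FastTrack chunk ahead of the Analysis chunk, which is the ordering a client should expect for simple queries.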
Eight YAML config files live in config/. Precedence (highest first):
- mode: development in config_webserver.yaml
The mode: development override is a foot-gun in production. A query string such as ?write_endpoint=other_qdrant silently switches the write target. Always set mode: production before deploying.
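To make the trap concrete, here is a hedged sketch of how a mode-gated override behaves. It is not NLWeb's resolver: `resolve_write_endpoint` and the `write_endpoint` config key are hypothetical, and defaulting to production when `mode` is unset is an assumption of this sketch.

```python
def resolve_write_endpoint(config: dict, query_params: dict) -> str:
    """Return the write target, honoring query overrides only in development mode."""
    mode = config.get("mode", "production")  # assumption: fail closed when unset
    override = query_params.get("write_endpoint")
    if override and mode == "development":
        return override  # the request silently redirects writes
    return config["write_endpoint"]
```

With `mode: development`, the request `?write_endpoint=other_qdrant` wins over the configured value; with `mode: production`, the same request is ignored.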
NLWeb's "NLWS headers" mechanism is JSON message objects on the SSE channel, not HTTP response headers. Each carries a message_type:
| message_type | Carries |
|--------------|---------|
| license | Content license terms |
| data_retention | How long the agent may cache |
| cache_policy | Caching directives |
| usage_terms | Acceptable use |
| rate_limits | Calls/sec, daily quota |
| data_freshness | Last index time |
| api_version | NLWeb release identifier |
| ui_component | Optional rendering hint |
Client parsing rule: buffer message objects until you see a results chunk or terminal marker. Don't assume the first chunk is data.
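A client-side sketch of that rule, assuming each SSE `data:` line holds one JSON object. The `message_type` values `results` and `complete` are hypothetical placeholders for the data chunk and terminal marker; check the live stream for the real names.

```python
import json
from typing import Iterable

RESULT_TYPE = "results"     # hypothetical name for a data chunk
TERMINAL_TYPE = "complete"  # hypothetical name for the end-of-stream marker


def split_stream(lines: Iterable[str]) -> tuple:
    """Separate NLWS header messages from result chunks in an SSE body."""
    headers: dict = {}
    results: list = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip SSE comments, blank lines, and other fields
        message = json.loads(line[len("data:"):].strip())
        kind = message.get("message_type")
        if kind == TERMINAL_TYPE:
            break  # stop reading at the terminal marker
        if kind == RESULT_TYPE:
            results.append(message)
        else:
            headers[kind] = message  # license, rate_limits, api_version, ...
    return headers, results
```

Keeping headers in a dict keyed by `message_type` lets the caller apply `cache_policy` or `rate_limits` before it renders anything.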
The most common NLWeb bug: changing the embedding provider after ingest, which yields empty or garbage results.
Rule: pick the embedding provider FIRST, configure the retrieval backend's vector dimension to match, ingest with that provider, query with that provider. Never change mid-stream without re-ingesting.
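One way to enforce the rule is to record the provider and vector dimension at ingest time and check both before every query. The sketch below assumes you keep such a record yourself; `IndexMeta` and `assert_aligned` are hypothetical names, not part of NLWeb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical record written once at ingest time."""
    provider: str
    dimension: int


def assert_aligned(meta: IndexMeta, provider: str, vector: list) -> None:
    """Raise before querying if the provider or vector size differs from ingest."""
    if provider != meta.provider:
        raise ValueError(
            f"index built with {meta.provider!r}, query uses {provider!r}; re-ingest first"
        )
    if len(vector) != meta.dimension:
        raise ValueError(
            f"index dimension is {meta.dimension}, query vector has {len(vector)}"
        )
```

A dimension mismatch usually fails loudly in the backend, but two providers with the same dimension fail silently, which is why the provider name is checked as well.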
If you need to migrate embedding providers:
- preferred_provider
- db_load.py --only-delete delete-site <site> for each site
When /ask returns a bad answer, the bug is in one of these call sites:
| Call site | Symptom | Fix |
|-----------|---------|-----|
| Decontextualize | Query rewritten wrong; off-topic results | Pre-compute decontextualized_query, log the prompt's output |
| Type detection | Wrong handler invoked | Pass itemType explicitly, or check site_types.xml |
| Tool selection | Right type, wrong tool | Adjust tool descriptions; set tool_selection_enabled: false to bypass |
| Ranking | Top results are off | Check embedding alignment first; then try scorer=nlwebscorer |
| Summarize / generate | Final answer is poor | Improve Schema.org source data; bump model tier |
Isolate by mode: mode=list skips summarize/generate. If list is bad, the issue is retrieval or ranking, not synthesis.
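A small helper can generate the variants worth comparing while isolating a bad answer. This sketch assumes a local server on port 8000 and that the query text travels in a `query` parameter; `ask_url` and `isolation_urls` are hypothetical names, while `mode`, `scorer`, and `decontextualized_query` are the parameters mentioned above.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000/ask"  # assumption: local server on the REST port


def ask_url(query: str, **params: str) -> str:
    """Build an /ask URL; assumes the query text goes in a `query` parameter."""
    return f"{BASE}?{urlencode({'query': query, **params})}"


def isolation_urls(query: str) -> dict:
    """Variants that switch off one pipeline stage at a time."""
    return {
        # skip summarize/generate: tests retrieval and ranking only
        "retrieval_only": ask_url(query, mode="list"),
        # same, but ranked by the neural scorer instead of the LLM
        "neural_ranking": ask_url(query, mode="list", scorer="nlwebscorer"),
        # pin the rewrite so decontextualization cannot be the culprit
        "fixed_rewrite": ask_url(query, mode="list", decontextualized_query=query),
    }
```

Fetching each URL and diffing the top results narrows the fault to one call site before you touch any prompt.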
The NLWebScorer/ subsystem provides a ModernBERT + GAM neural reranker as an alternative to LLM-based ranking. Activate via ?scorer=nlwebscorer on /ask. Configure checkpoints in config_*.yaml:

    scorers:
      nlwebscorer:
        bert_checkpoint: ./checkpoints/modernbert.pt
        gam_checkpoint: ./checkpoints/gam.pt

Use cases:
Tradeoff: it's domain-specific — you may need to fine-tune on your data. See docs/training-recipe-modernbert-gam.md.
NLWeb's repo isn't just one server. Five top-level folders are conceptually distinct:
| Subsystem | Purpose | When relevant |
|-----------|---------|---------------|
| AskAgent/ | The core /ask and /mcp server | Always |
| AgentFinder/ | Cross-site NLWeb discovery (federated /who) | Multi-site federations |
| DataFinder/ | NL→SQL for enterprise sources (HubSpot, Dynamics, Jira) | Enterprise data, not vector-backed |
| ModelRouter/ | Cost/quality routing across LLM providers | Cost optimization at scale |
| NLWebScorer/ | Neural reranker (ModernBERT + GAM) | High-volume retrieval |
Most deployments use only AskAgent. The rest are opt-in.
NLWeb supports four transport bindings in parallel:
| Binding | Port or entry point | Audience |
|---------|---------------------|----------|
| REST /ask | port 8000 | Browsers, custom clients |
| MCP /mcp | port 8000 | AI agents (Claude, Gemini, native MCP) |
| A2A | webserver/a2a_wrapper.py, route a2a.py | Google Agent-to-Agent protocol |
| AppSDK adapter | port 8100 | ChatGPT specifically |
All share the same backend pipeline. No data duplication. Choose by audience, not by feature.
core/conversation_history.py persists exchanges per authenticated user. methods/conversation_search.py queries the persisted history.
Long-term memory (cross-conversation user preferences) is NOT shipped. Hook points to add it:
- NLWebHandler.respond(): extract durable facts, write to user profile
This is intentional: NLWeb leaves opinionated personalization to the integrator.
NLWeb doesn't define idempotency keys — /ask calls are read-side; replays are safe. /mcp follows JSON-RPC 2.0 semantics: include id in every request, retry with the same id if the connection drops mid-request (server may dedup if implemented).
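The retry rule for /mcp can be sketched as follows. The `send` callable is a hypothetical transport that posts a JSON string and returns the response body; only the JSON-RPC 2.0 envelope fields (`jsonrpc`, `id`, `method`, `params`) come from the specification.

```python
import json
import uuid
from typing import Callable


def call_with_retry(send: Callable[[str], str], method: str, params: dict,
                    attempts: int = 3) -> dict:
    """Send one JSON-RPC 2.0 request, reusing the same id on every retry."""
    request = {"jsonrpc": "2.0", "id": str(uuid.uuid4()),
               "method": method, "params": params}
    payload = json.dumps(request)  # built once, so the id never changes
    last_error = None
    for _ in range(attempts):
        try:
            response = json.loads(send(payload))
        except ConnectionError as exc:
            last_error = exc  # connection dropped mid-request: try again
            continue
        if response.get("id") != request["id"]:
            raise ValueError("response id does not match request id")
        return response
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Serializing the payload once, outside the loop, is the design point: a retry that regenerates the id looks like a new request to any server that deduplicates.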
For db_load.py, idempotency is upsert by URL. Re-running on the same source updates existing records rather than duplicating.
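Upsert-by-URL semantics are easy to picture with an in-memory stand-in for the retrieval backend. This is an illustration of the idea only; `upsert_by_url` is a hypothetical name and the dict replaces a real vector store.

```python
def upsert_by_url(store: dict, records: list) -> dict:
    """Insert or overwrite records keyed by URL, so re-runs never duplicate."""
    for record in records:
        store[record["url"]] = record  # same URL: replace; new URL: insert
    return store
```

Loading the same feed twice leaves one record per URL, and the second run's fields win.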
Every result carries a schema_object. Agents pattern-match on @type to render appropriately. Design rule: any new tool or handler you write should preserve the schema_object in its output. Don't strip it down to text — that defeats the whole point of NLWeb.
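A sketch of the design rule from both sides, under assumptions: `annotate_result` is a hypothetical handler step that adds a `description` field without touching `schema_object`, and `render_hint` is a hypothetical agent-side function with made-up component names.

```python
def annotate_result(result: dict, note: str) -> dict:
    """Hypothetical handler step: add a field, keep schema_object intact."""
    if "schema_object" not in result:
        raise KeyError("result has no schema_object to preserve")
    return {**result, "description": note}  # shallow copy; schema_object untouched


def render_hint(result: dict) -> str:
    """Agent side: pick a rendering path from the Schema.org @type."""
    schema_type = result["schema_object"].get("@type", "Thing")
    if isinstance(schema_type, list):  # JSON-LD allows a list of types
        schema_type = schema_type[0]
    # Component names here are illustrative only.
    components = {"Recipe": "recipe_card", "Product": "product_tile"}
    return components.get(schema_type, "generic_link")
```

If the handler had returned only the note text, `render_hint` would have nothing to match on and every result would fall back to a plain link.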
NLWeb publishes releases as dated Markdown files in docs/release_notes/, not as semver tags. When pinning a deployment:
Most extensibility goes via:
- config/*.yaml and XML files (preferred)
- methods/ (custom handlers)
- llm_providers/, embedding_providers/, retrieval_providers/
- webserver/middleware/
Avoid editing core/baseHandler.py, core/router.py, etc. They change frequently, and a fork that touches them rots quickly.
The default config enables three retrieval backends (qdrant_local, nlweb_west, shopify_mcp), the federated /who endpoint, and mode: development. For any non-demo deployment, set these:

    # config_webserver.yaml
    mode: production
    # config_nlweb.yaml
    who_endpoint_enabled: false
    # config_retrieval.yaml
    endpoints:
      nlweb_west: { enabled: false }
      shopify_mcp: { enabled: false }

These defaults make sense for hello-world demos. They are anti-patterns for production.
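A pre-deploy check along these lines can catch the demo defaults before they ship. It is a sketch under assumptions: the three arguments are the already-parsed contents of the three files (for example via a YAML loader), `production_issues` is a hypothetical name, and a missing key is treated as enabled because the defaults described above enable these features.

```python
def production_issues(webserver: dict, nlweb: dict, retrieval: dict) -> list:
    """List demo-only settings still active in the parsed config dicts."""
    issues = []
    if webserver.get("mode") != "production":
        issues.append("config_webserver.yaml: mode is not production")
    # Assumption: an absent key means the demo default (enabled) applies.
    if nlweb.get("who_endpoint_enabled", True):
        issues.append("config_nlweb.yaml: who_endpoint_enabled is not false")
    endpoints = retrieval.get("endpoints", {})
    for name in ("nlweb_west", "shopify_mcp"):
        if endpoints.get(name, {}).get("enabled", True):
            issues.append(f"config_retrieval.yaml: {name} is still enabled")
    return issues
```

Running it in CI and failing the build on a non-empty list keeps the check independent of whoever edits the YAML next.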
Always cross-reference with the latest docs/release_notes/ and the live core/ modules — patterns evolve and the code is the source of truth.