dist/codex/nlweb-protocol/skills/nlweb-llm-providers/SKILL.md
Configure NLWeb LLM and embedding providers — OpenAI, Azure OpenAI (default), Anthropic, Google Gemini, DeepSeek on Azure, Llama on Azure, HuggingFace, Inception Labs, Snowflake Cortex, Ollama, Pi Labs. Covers `config_llm.yaml` high/low tier model selection, the ModelRouter cost/quality routing logic, `config_embedding.yaml`, and adding a custom provider. Use when picking models, tuning cost, or wiring a new LLM backend.
Fetch live docs:
- `AskAgent/python/llm_providers/<provider>.py` for the SDK calls the provider class makes.

NLWeb's pipeline doesn't make one big LLM call per query. It makes many small calls: decontextualize the query, detect Schema.org item type, route to a tool, rank results, optionally summarize/generate. Each call has a strict `<returnStruc>` JSON schema in `prompts.xml`. Cost and latency are dominated by the number of calls, not the size of any single one.
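Because every small call is expected to return JSON matching its `<returnStruc>`, it can help to validate replies before they flow further down the pipeline. The following is a minimal sketch, not NLWeb's own code: the function name `parse_structured_reply` and the `EXPECTED_FIELDS` shape are hypothetical, and the real schemas live in `prompts.xml`.

```python
import json

# Hypothetical expected shape for one small pipeline call; the real
# <returnStruc> definitions live in prompts.xml and will differ.
EXPECTED_FIELDS = {"score": int, "description": str}


def parse_structured_reply(raw: str, expected: dict) -> dict:
    """Parse an LLM reply and check it against the expected field types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, kind in expected.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], kind):
            raise ValueError(f"field {name} should be {kind.__name__}")
    return data
```

A check like this is cheap compared with the LLM call itself, and it turns a malformed reply into an explicit error at the call site rather than a confusing failure several steps later.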
`config_llm.yaml` defines a high model and a low model per provider:

    providers:
      openai:
        high: gpt-4.1
        low: gpt-4.1-mini
        api_key_env: OPENAI_API_KEY

The codebase decides which tier to use per call site — e.g., decontextualization is "low", final generate is "high". The exact assignment lives in core/ modules and the ModelRouter subsystem.
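To make the tier idea concrete, here is a small sketch of resolving a model name from a call site. It is an illustration under stated assumptions, not the ModelRouter implementation: `CONFIG` mirrors the shape of this document's `config_llm.yaml` excerpts as a plain dict, and `TIER_BY_CALL_SITE` and `model_for` are hypothetical names. Only the decontextualize and generate rows come from the text; the other rows are guesses.

```python
# Mirrors the shape of this document's config_llm.yaml excerpts, as a dict.
CONFIG = {
    "preferred_endpoint": "openai",
    "providers": {
        "openai": {"high": "gpt-4.1", "low": "gpt-4.1-mini"},
    },
}

# Hypothetical call-site-to-tier table. Only "decontextualize" and
# "generate" are stated in the text; the other rows are illustrative.
TIER_BY_CALL_SITE = {
    "decontextualize": "low",
    "detect_item_type": "low",
    "rank": "low",
    "generate": "high",
}


def model_for(call_site: str, config: dict = CONFIG, default_tier: str = "low") -> str:
    """Return the model name for a call site under the preferred endpoint."""
    tier = TIER_BY_CALL_SITE.get(call_site, default_tier)
    provider = config["providers"][config["preferred_endpoint"]]
    return provider[tier]
```

With the config above, `model_for("generate")` resolves to the high model and every other call site falls back to the low one, which is the cost-saving default the text describes.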
Out of the box, NLWeb's preferred_endpoint (in config_llm.yaml) is azure_openai with gpt-4.1 / gpt-4.1-mini. Most users override this in .env or by editing the YAML.
(Verify the live config_llm.yaml for current models and key names.)
| Provider | Default high | Default low | Env var |
|----------|--------------|-------------|---------|
| OpenAI | gpt-4.1 | gpt-4.1-mini | OPENAI_API_KEY |
| Azure OpenAI | gpt-4.1 | gpt-4.1-mini | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT |
| Anthropic | claude-3-7-sonnet-latest | claude-3-5-haiku-latest | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro | gemini-2.0-flash-lite | GEMINI_API_KEY |
| DeepSeek on Azure | deepseek-coder-33b | deepseek-coder-7b | AZURE_DEEPSEEK_ENDPOINT |
| Llama on Azure | llama-2-70b | llama-2-13b | AZURE_LLAMA_ENDPOINT |
| HuggingFace | Qwen2.5-72B | Qwen2.5-Coder-7B | HF_TOKEN |
| Inception Labs | mercury-small | mercury-small | INCEPTION_API_KEY |
| Snowflake Cortex | claude-3-5-sonnet | llama3.1-8b | Snowflake creds |
| Ollama | configurable | configurable | local — no key |
| Pi Labs | (class present, may not be in default YAML) | — | — |

| Provider | Default model | Dim |
|----------|---------------|-----|
| OpenAI | text-embedding-3-small | 1536 |
| Azure OpenAI | text-embedding-3-small | 1536 |
| Gemini | text-embedding-004 | 768 |
| Snowflake | arctic-embed-m-v1.5 | 768 |
| Elasticsearch | multilingual-e5-small | 384 |
| Ollama | nomic-embed-text (typically) | 768 |

Set preferred_provider in config_embedding.yaml. This must match what you used at ingest time — the most common NLWeb bug is changing the embedding provider after data is loaded, then getting empty results.
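One way to catch this mistake early is to record which embedding setup was used at ingest time and compare it with the current config before serving queries. The sketch below assumes you store such a record yourself; `EmbeddingFingerprint` and `assert_compatible` are hypothetical names, not part of NLWeb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingFingerprint:
    """Hypothetical record of the embedding setup used for an index."""
    provider: str
    model: str
    dim: int


def assert_compatible(ingested: EmbeddingFingerprint,
                      current: EmbeddingFingerprint) -> None:
    """Raise if the query-time embedding setup differs from ingest time."""
    if ingested != current:
        raise RuntimeError(
            f"index was built with {ingested}, but config now says {current}; "
            "re-ingest before serving queries"
        )
```

Comparing provider and model as well as dimension matters: two models can share a dimension (for example 768) and still produce vectors that are not comparable.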
NLWeb's ModelRouter/ subsystem is a cost/quality router that picks the right model tier (high vs low) per call site. It's still evolving — verify whether it's active in your release.
R.V. Guha's design goal: NLWeb should run on whatever LLM stack the site operator already has. A Snowflake customer uses Cortex; an Azure shop uses Azure OpenAI; a privacy-conscious deployment uses Ollama on-premises. The provider abstraction is intentional.
In config_llm.yaml:

    preferred_endpoint: anthropic
    providers:
      anthropic:
        high: claude-3-7-sonnet-latest
        low: claude-3-5-haiku-latest
        api_key_env: ANTHROPIC_API_KEY

Set ANTHROPIC_API_KEY in .env. Restart the server.
Install Ollama, pull a model:

    ollama pull llama3.1:8b
    ollama pull nomic-embed-text

In `config_llm.yaml`:

    preferred_endpoint: ollama
    providers:
      ollama:
        high: llama3.1:8b
        low: llama3.1:8b
        base_url: http://localhost:11434

In `config_embedding.yaml`:

    preferred_provider: ollama
    providers:
      ollama:
        model: nomic-embed-text
        dim: 768

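Important: re-ingest after switching embedding provider. Vectors written by the old provider may have a different dimension, and even at the same dimension they are not comparable with the new provider's vectors.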
1. `llm_providers/` (look at `openai.py` or `anthropic.py` as templates).
2. `complete()` returning JSON-conformant output for the `<returnStruc>` schemas, plus optional streaming).
3. `core/llm.py`).
4. `config_llm.yaml`.
5. `<returnStruc>` prompt before deploying.

- `low` tier for everything except the final generate (default behavior; verify).
- `tool_selection_enabled: false` in `config_nlweb.yaml` to skip the router call entirely.
- `who_endpoint_enabled` to skip federated discovery.
- `decontextualized_query` client-side to skip that LLM call.

    # 1. Stop serving traffic
    # 2. Change config_embedding.yaml
    # 3. Drop the index
    python -m data_loading.db_load --only-delete delete-site <site>
    # 4. Re-ingest
    python -m data_loading.db_load <source> <site>
    # 5. Restart

You cannot mix-and-match embedding providers across a single retrieval index. Vectors are not portable across providers.
nlweb check runs connectivity diagnostics for all configured providers. Use it before debugging "the model isn't responding" issues — the answer is usually a missing env var.
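Since a missing env var is the usual culprit, a quick local preflight can rule it out before any network diagnostics run. This is a rough sketch and not what `nlweb check` does internally: `REQUIRED_ENV` and `missing_env` are hypothetical names, the variable names are copied from the provider table in this document, and the provider keys (such as `gemini`) are assumptions to verify against the live `config_llm.yaml`.

```python
import os

# Env var names taken from the provider table in this document.
# The dict keys are assumed provider identifiers; verify both against
# the live config_llm.yaml.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_env(provider: str, environ=None) -> list:
    """Return the required env vars that are unset or empty for a provider."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]
```

Calling `missing_env("azure_openai")` against the real environment returns an empty list when both variables are set, and otherwise names exactly what is missing.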
- `deployment_name` in config doesn't match what's deployed in Azure. They're per-deployment, not per-model.
- `ollama pull <model>` first.

Always re-fetch `config_llm.yaml` from the live repo; provider keys and model IDs change.