
orcaqubits/nlweb-llm-providers

Name: nlweb-llm-providers
Author: orcaqubits

dist/codex/nlweb-protocol/skills/nlweb-llm-providers/SKILL.md

npx skillsauth add orcaqubits/agentic-commerce-claude-plugins nlweb-llm-providers


NLWeb LLM & Embedding Providers

Before writing code

Fetch live docs:

Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-providers.md for the canonical provider list and config schema.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/config/config_llm.yaml for the exact model IDs and env-var names currently shipped.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/config/config_embedding.yaml for embedding defaults.
Inspect AskAgent/python/llm_providers/<provider>.py for the SDK calls the provider class makes.
Web-search the latest release notes — new providers and models get added often.

Conceptual Architecture

Mixed-Mode = Many Small LLM Calls

NLWeb's pipeline doesn't make one big LLM call per query. It makes many small calls: decontextualize the query, detect Schema.org item type, route to a tool, rank results, optionally summarize/generate. Each call has a strict <returnStruc> JSON schema in prompts.xml. Cost and latency are dominated by the number of calls, not the size of any single one.

High / Low Tier Model Selection

config_llm.yaml defines a high model and a low model per provider:

providers:
  openai:
    high: gpt-4.1
    low: gpt-4.1-mini
    api_key_env: OPENAI_API_KEY

The codebase decides which tier to use per call site — e.g., decontextualization is "low", final generate is "high". The exact assignment lives in core/ modules and the ModelRouter subsystem.
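To make the per-call-site idea concrete, here is a minimal Python sketch of tier resolution. It is not NLWeb's routing code: the `CALL_SITE_TIERS` table and the `resolve_model` helper are hypothetical names, and the config dict simply mirrors the YAML shape shown above.

```python
# Hypothetical sketch of per-call-site tier resolution; not NLWeb's actual code.
CONFIG = {
    "providers": {
        "openai": {
            "high": "gpt-4.1",
            "low": "gpt-4.1-mini",
            "api_key_env": "OPENAI_API_KEY",
        },
    }
}

# Assumed mapping of call site to tier. Only these two assignments are
# stated in the text; the key names themselves are illustrative.
CALL_SITE_TIERS = {
    "decontextualize": "low",
    "generate": "high",
}


def resolve_model(config, provider, call_site, default_tier="low"):
    """Return the model ID a call site should use for the given provider."""
    tier = CALL_SITE_TIERS.get(call_site, default_tier)
    try:
        return config["providers"][provider][tier]
    except KeyError as exc:
        raise KeyError(
            f"no {tier!r} model configured for provider {provider!r}"
        ) from exc


print(resolve_model(CONFIG, "openai", "decontextualize"))
print(resolve_model(CONFIG, "openai", "generate"))
```

Defaulting unknown call sites to the low tier is a design choice of this sketch: it keeps an unlisted step cheap rather than silently expensive.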

The Default Provider

Out of the box, NLWeb's preferred_endpoint (in config_llm.yaml) is azure_openai with gpt-4.1 / gpt-4.1-mini. Most users override this in .env or by editing the YAML.

All Supported LLM Providers

(Verify the live config_llm.yaml for current models and key names.)

| Provider | Default high | Default low | Env var |
|----------|--------------|-------------|---------|
| OpenAI | gpt-4.1 | gpt-4.1-mini | OPENAI_API_KEY |
| Azure OpenAI | gpt-4.1 | gpt-4.1-mini | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT |
| Anthropic | claude-3-7-sonnet-latest | claude-3-5-haiku-latest | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro | gemini-2.0-flash-lite | GEMINI_API_KEY |
| DeepSeek on Azure | deepseek-coder-33b | deepseek-coder-7b | AZURE_DEEPSEEK_ENDPOINT |
| Llama on Azure | llama-2-70b | llama-2-13b | AZURE_LLAMA_ENDPOINT |
| HuggingFace | Qwen2.5-72B | Qwen2.5-Coder-7B | HF_TOKEN |
| Inception Labs | mercury-small | mercury-small | INCEPTION_API_KEY |
| Snowflake Cortex | claude-3-5-sonnet | llama3.1-8b | Snowflake creds |
| Ollama | configurable | configurable | local, no key |
| Pi Labs | (class present, may not be in default YAML) | - | - |

Embedding Providers

| Provider | Default model | Dim |
|----------|---------------|-----|
| OpenAI | text-embedding-3-small | 1536 |
| Azure OpenAI | text-embedding-3-small | 1536 |
| Gemini | text-embedding-004 | 768 |
| Snowflake | arctic-embed-m-v1.5 | 768 |
| Elasticsearch | multilingual-e5-small | 384 |
| Ollama | nomic-embed-text (typically) | 768 |

Set preferred_provider in config_embedding.yaml. This must match what you used at ingest time — the most common NLWeb bug is changing the embedding provider after data is loaded, then getting empty results.
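One way to catch this early is to record the embedding settings at ingest time and compare them with the current config before serving. The sketch below assumes you keep such a record yourself; `ingest_meta`, `current_cfg`, and `embedding_mismatch` are hypothetical names, not part of NLWeb. Note that comparing dimensions alone is not enough, since several providers in the table share 768.

```python
# Hypothetical guard: compare the embedding settings recorded at ingest
# time with the currently configured ones. Not part of NLWeb.

def embedding_mismatch(ingest_meta, current_cfg):
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    for key in ("provider", "model", "dim"):
        before = ingest_meta.get(key)
        now = current_cfg.get(key)
        if before != now:
            problems.append(f"{key}: ingested with {before!r}, now configured {now!r}")
    return problems


# Same dimension (768), different provider and model: still incompatible.
ingested = {"provider": "gemini", "model": "text-embedding-004", "dim": 768}
configured = {"provider": "ollama", "model": "nomic-embed-text", "dim": 768}

for problem in embedding_mismatch(ingested, configured):
    print("embedding mismatch ->", problem)
```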

ModelRouter

NLWeb's ModelRouter/ subsystem is a cost/quality router that picks the right model tier (high vs low) per call site. It's still evolving — verify whether it's active in your release.

Why So Many Providers?

R.V. Guha's design goal: NLWeb should run on whatever LLM stack the site operator already has. A Snowflake customer uses Cortex; an Azure shop uses Azure OpenAI; a privacy-conscious deployment uses Ollama on prem. The provider abstraction is intentional.

Implementation Guidance

Switching the Primary LLM Provider

In config_llm.yaml:

preferred_endpoint: anthropic

providers:
  anthropic:
    high: claude-3-7-sonnet-latest
    low: claude-3-5-haiku-latest
    api_key_env: ANTHROPIC_API_KEY

Set ANTHROPIC_API_KEY in .env. Restart the server.

Running Locally with Ollama (Offline)

Install Ollama, then pull an LLM and an embedding model:

ollama pull llama3.1:8b
ollama pull nomic-embed-text

In config_llm.yaml:

preferred_endpoint: ollama
providers:
  ollama:
    high: llama3.1:8b
    low: llama3.1:8b
    base_url: http://localhost:11434

In config_embedding.yaml:

preferred_provider: ollama
providers:
  ollama:
    model: nomic-embed-text
    dim: 768

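Important: re-ingest after switching embedding provider. Vectors written by the previous provider come from a different model, and often have a different dimension, so they cannot be searched with the new one.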
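Before restarting, it can help to confirm that both pulled models are actually present on the Ollama host. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists local models; the helper names `missing_models` and `fetch_tags` are illustrative, and the base URL is the one from the config above.

```python
import json
from urllib.request import urlopen


def missing_models(required, tags_response):
    """Return the required model names that do not appear in an /api/tags response."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    missing = []
    for name in required:
        # A pull without an explicit tag is listed as "<name>:latest".
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the local model list from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    needed = ["llama3.1:8b", "nomic-embed-text"]
    try:
        absent = missing_models(needed, fetch_tags())
    except OSError as exc:  # connection refused, timeout, ...
        print(f"Ollama not reachable: {exc}")
    else:
        print("missing models:", absent or "none")
```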

Adding a Custom Provider

1. Subclass the base class in llm_providers/ (look at openai.py or anthropic.py as templates).
2. Implement the required methods (typically complete() returning JSON-conformant output for the <returnStruc> schemas, plus optional streaming).
3. Register in the provider factory (verify exact location; usually a registry in core/llm.py).
4. Add an entry in config_llm.yaml.
5. Test against a known-good <returnStruc> prompt before deploying.
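The shape of such a provider can be sketched as follows. Everything here is an assumption for illustration: `BaseProvider` is a stand-in for whatever base class the real llm_providers/ package exposes, and the `complete(prompt, schema)` signature is a guess that should be checked against openai.py or anthropic.py in the repo.

```python
import json
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Hypothetical stand-in for NLWeb's real provider base class."""

    @abstractmethod
    def complete(self, prompt: str, schema: dict) -> dict:
        """Return a dict containing every key named in `schema`."""


class CallableProvider(BaseProvider):
    """Toy provider wrapping any function that maps a prompt to raw model text."""

    def __init__(self, call_model):
        self._call_model = call_model

    def complete(self, prompt, schema):
        raw = self._call_model(prompt)
        data = json.loads(raw)  # fail loudly if the model did not return JSON
        missing = [key for key in schema if key not in data]
        if missing:
            raise ValueError(f"response is missing required keys: {missing}")
        return data


# Usage with a canned response standing in for a real SDK call.
provider = CallableProvider(lambda prompt: '{"score": 80, "description": "relevant"}')
result = provider.complete("Rank this item", {"score": "int", "description": "str"})
print(result["score"])
```

Validating the returned keys inside complete() keeps malformed model output from propagating into later pipeline steps, which is the point of step 5 above.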

Tuning Cost

Use low tier for everything except the final generate (default behavior — verify).
Set tool_selection_enabled: false in config_nlweb.yaml to skip the router call entirely.
Disable who_endpoint_enabled to skip federated discovery.
Pre-compute decontextualized_query client-side to skip that LLM call.
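As a rough mental model of what these switches buy you, the sketch below counts which pipeline steps still need an LLM call. It is deliberately simplified: the step names are labels taken from the architecture description earlier, `llm_steps` is a hypothetical helper, and the real pipeline may issue more than one call per step.

```python
# Simplified model of the pipeline steps described earlier. Step names are
# illustrative labels, not NLWeb identifiers.
PIPELINE_STEPS = [
    "decontextualize",
    "detect_item_type",
    "tool_selection",
    "rank",
    "generate",
]


def llm_steps(tool_selection_enabled=True, decontextualized_query=None, generate=True):
    """Return the steps that still require an LLM call under the given settings."""
    skipped = set()
    if decontextualized_query is not None:
        skipped.add("decontextualize")  # client already supplied the rewritten query
    if not tool_selection_enabled:
        skipped.add("tool_selection")   # router call is bypassed
    if not generate:
        skipped.add("generate")         # summarize/generate is optional
    return [step for step in PIPELINE_STEPS if step not in skipped]


print(len(llm_steps()), "steps by default")
print(llm_steps(tool_selection_enabled=False, decontextualized_query="red shoes under $50"))
```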

Switching Embedding Providers Safely

# 1. Stop serving traffic
# 2. Change config_embedding.yaml
# 3. Drop the index
python -m data_loading.db_load --only-delete delete-site <site>
# 4. Re-ingest
python -m data_loading.db_load <source> <site>
# 5. Restart

You cannot mix-and-match embedding providers across a single retrieval index. Vectors are not portable across providers.

Verifying Provider Wiring

nlweb check runs connectivity diagnostics for all configured providers. Use it before debugging "the model isn't responding" issues — the answer is usually a missing env var.
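For a quick check that does not depend on the CLI, the env-var side can be approximated in a few lines of Python. This is not what nlweb check does internally; the `REQUIRED_ENV` mapping is copied from the provider table above, and its dictionary keys (for example `gemini`) are assumed provider identifiers that should be verified against the live config_llm.yaml.

```python
import os

# Env vars per provider, taken from the provider table above.
# The dictionary keys are assumed identifiers; verify them against the live YAML.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_env(provider, env=None):
    """Return the env vars the provider needs that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        absent = missing_env(provider)
        status = "ok" if not absent else "missing " + ", ".join(absent)
        print(f"{provider}: {status}")
```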

Provider Failure Modes

OpenAI / Anthropic / Gemini 429s: rate limits. Add backoff in the provider class or reduce concurrency.
Azure OpenAI 404 on deployment: the deployment_name in config doesn't match what's deployed in Azure. Azure routes requests by deployment name, not by model ID.
Ollama "model not found": ollama pull <model> first.
Snowflake Cortex authentication: requires the warehouse + role to have Cortex enabled.
HuggingFace inference endpoint cold-start: first call takes 30-60s. Pre-warm.
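For the 429 case, a retry wrapper with exponential backoff and jitter might look like the sketch below. `RateLimitError` is a hypothetical stand-in for whatever exception the provider's SDK raises on HTTP 429, and `with_backoff` is an illustrative helper rather than anything shipped with NLWeb.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an SDK's HTTP 429 exception."""


def with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `call()` and retry on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent requests.
            sleep(delay + random.uniform(0, delay / 2))


# Usage: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky_call, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays; in production the default `time.sleep` applies.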

Always re-fetch config_llm.yaml from the live repo — provider keys and model IDs change.


Configure NLWeb LLM and embedding providers — OpenAI, Azure OpenAI (default), Anthropic, Google Gemini, DeepSeek on Azure, Llama on Azure, HuggingFace, Inception Labs, Snowflake Cortex, Ollama, Pi Labs. Covers `config_llm.yaml` high/low tier model selection, the ModelRouter cost/quality routing logic, `config_embedding.yaml`, and adding a custom provider. Use when picking models, tuning cost, or wiring a new LLM backend.

27 stars

development

Updated May 14, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add orcaqubits/agentic-commerce-claude-plugins nlweb-llm-providers

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 14, 2026, 5:55 AM (145.2s, 1 file scanned)



Install manually


# Clone the repo
git clone https://github.com/orcaqubits/agentic-commerce-claude-plugins.git

# Copy into Claude Code skills folder (global)
cp -r agentic-commerce-claude-plugins/dist/codex/nlweb-protocol/skills/nlweb-llm-providers ~/.claude/skills/


Repository: orcaqubits/agentic-commerce-claude-plugins (27 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT