orcaqubits/nlweb-deployment

Name: nlweb-deployment
Author: orcaqubits

dist/codex/nlweb-protocol/skills/nlweb-deployment/SKILL.md

npx skillsauth add orcaqubits/agentic-commerce-claude-plugins nlweb-deployment

NLWeb Deployment

Before writing code

Fetch live docs:

Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/setup-azure.md for Azure App Service deployment.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/setup-snowflake.md for Snowflake Container Services.
Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/setup-cloudflare-autorag.md for Cloudflare Worker + AutoRAG.
Fetch https://developers.cloudflare.com/ai-search/how-to/nlweb/ for Cloudflare's hosted NLWeb documentation.
Inspect deploy_azure_webapp.sh, setup.sh, startup_aiohttp.sh in the live repo for current commands.
Web-search the latest release notes for breaking deployment changes.

Conceptual Architecture

Deployment Targets Supported

| Target | Notes | Setup doc |
|--------|-------|-----------|
| Azure App Service | Reference deployment; ships shell scripts | docs/setup-azure.md |
| Snowflake Container Services | NLWeb runs inside Snowflake compute, closest to data | docs/setup-snowflake.md |
| Cloudflare Worker + AutoRAG | Edge deployment; CF manages indexing | docs/setup-cloudflare-autorag.md |
| Docker | Bring-your-own host | Build from Dockerfile if shipped, else manual |
| Bare Python | systemd + venv on a VM | Use app-aiohttp.py directly |
| WordPress plugin | For WP sites | code/wordpress/nlweb/ |

Production Hardening Checklist

Before exposing /ask or /mcp to the internet:

Set mode: production in config_webserver.yaml — disables query-string config overrides.
Lock down the sites: allowlist in config_nlweb.yaml — only the sites you want public.
Disable who_endpoint_enabled if you don't want federated traffic going to nlwm.azurewebsites.net.
Turn off unused retrieval backends in config_retrieval.yaml (nlweb_west, shopify_mcp unless needed).
Configure OAuth if you need auth (see nlweb-auth-multitenancy).
Set TLS at the edge (App Service, CF, ALB, etc.).
Set rate limits — NLWeb itself has limited built-in protection; do it at the edge.
Configure CORS if a browser client calls /ask directly.
Persist conversations to a real storage provider (config_storage.yaml), not in-memory.
Configure observability — logs, /mcp/health checks, latency metrics.
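
The first three checklist items can be verified mechanically before a deploy. The sketch below is a minimal illustration, not NLWeb code: `audit_config` is a hypothetical helper, it takes already-parsed YAML as plain dicts, and it assumes `mode`, `sites`, and `who_endpoint_enabled` are top-level keys (check the live config files for the real layout).

```python
from typing import Any


def audit_config(webserver: dict[str, Any], nlweb: dict[str, Any]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged.

    Assumption: both arguments are parsed YAML documents with flat,
    top-level keys. The real NLWeb schema may nest these differently.
    """
    findings: list[str] = []
    # mode: production disables query-string config overrides.
    if webserver.get("mode") != "production":
        findings.append("config_webserver.yaml: mode is not 'production'")
    # An empty or missing allowlist means no explicit choice was made.
    if not nlweb.get("sites"):
        findings.append("config_nlweb.yaml: sites allowlist is empty or missing")
    # Flag anything other than an explicit False so the choice is deliberate.
    if nlweb.get("who_endpoint_enabled") is not False:
        findings.append("who_endpoint_enabled is not explicitly disabled")
    return findings


if __name__ == "__main__":
    for finding in audit_config({"mode": "development"}, {"sites": []}):
        print("WARN:", finding)
```

Running a check like this in CI, and failing the job on any finding, keeps a development config from reaching an internet-facing instance.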

Env Vars vs YAML Config

Keep secrets in env vars, never in config_*.yaml. NLWeb's convention is to reference the variable by name:

# config_llm.yaml
providers:
  azure_openai:
    api_key_env: AZURE_OPENAI_API_KEY     # references env var, doesn't store value
    endpoint_env: AZURE_OPENAI_ENDPOINT

A .env file is typical for dev. In cloud deployments, use the platform's secret manager (Azure Key Vault, Snowflake secrets, Cloudflare Workers secrets, etc.) and inject the values as env vars.
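
To make the `*_env` indirection concrete, here is a minimal sketch of how a loader could resolve such references and fail at startup when a variable is missing. `resolve_env_refs` is a hypothetical helper written for illustration; it is not NLWeb's actual loader, and the input dict stands in for one parsed provider block.

```python
import os


def resolve_env_refs(provider_cfg: dict[str, str]) -> dict[str, str]:
    """Replace each '<name>_env' key with '<name>' holding the env var's value."""
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for key, value in provider_cfg.items():
        if key.endswith("_env"):
            env_value = os.environ.get(value)
            if env_value:
                resolved[key[: -len("_env")]] = env_value
            else:
                missing.append(value)  # unset or empty counts as missing
        else:
            resolved[key] = value  # plain settings pass through unchanged
    if missing:
        # Fail at startup instead of on the first /ask request.
        raise RuntimeError("missing environment variables: " + ", ".join(sorted(missing)))
    return resolved


if __name__ == "__main__":
    # Dummy values for the demo only; real values come from the secret manager.
    os.environ.setdefault("AZURE_OPENAI_API_KEY", "dummy-key-for-demo")
    os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://example.invalid")
    cfg = {"api_key_env": "AZURE_OPENAI_API_KEY", "endpoint_env": "AZURE_OPENAI_ENDPOINT"}
    print(sorted(resolve_env_refs(cfg)))  # prints key names only, never the secrets
```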

The Two Server Processes

A full production NLWeb deployment may have:

Main aiohttp server (port 8000) — /ask, /mcp, /who, /sites, /config, /api/oauth/*
AppSDK adapter (port 8100) — only if you're integrating with ChatGPT Apps SDK. Optional.

Plus optionally the Node.js MCP server in openai-apps-sdk-integration/ if you want the React widget for ChatGPT.

Reverse-Proxy Concerns

NLWeb streams SSE. Make sure your reverse proxy:

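Disables response buffering for /ask paths. NLWeb sends X-Accel-Buffering: no, which nginx honors by default; set proxy_buffering off explicitly in case proxy_ignore_headers suppresses that header, and configure the equivalent on other proxies.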
Sets long timeouts (60-300s) for /ask streams.
Forwards real client IP (X-Forwarded-For) for rate limiting.
Terminates TLS — NLWeb assumes plain HTTP behind a TLS-terminating proxy.
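
Forwarding X-Forwarded-For only helps if the rate limiter reads it safely, because clients can send their own value for that header. The sketch below shows one common approach under stated assumptions: each trusted proxy appends the address it received the connection from, and `trusted_hops` (a hypothetical parameter) is the number of proxies you control in front of the app. `client_ip` is an illustrative helper, not part of NLWeb.

```python
import ipaddress


def client_ip(xff_header: str, peer_ip: str, trusted_hops: int = 1) -> str:
    """Pick the client address from X-Forwarded-For, counting from the right."""
    # The rightmost entries were written by infrastructure we control;
    # anything further left may have been supplied by the client itself.
    hops = [part.strip() for part in xff_header.split(",") if part.strip()]
    if trusted_hops < 1 or len(hops) < trusted_hops:
        return peer_ip  # header absent or too short: trust only the socket peer
    candidate = hops[-trusted_hops]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return peer_ip  # malformed entry: fall back rather than key on garbage
    return candidate


if __name__ == "__main__":
    # The client spoofed "1.2.3.4"; the single trusted proxy appended the real peer.
    print(client_ip("1.2.3.4, 203.0.113.7", peer_ip="10.0.0.2"))
```

Taking the leftmost entry instead would let any caller choose its own rate-limit bucket.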

Data Reload as a CI Job

Most deployments reload site data on a schedule:

# .github/workflows/nlweb-reload.yml (sketch)
on:
  schedule:
    - cron: '0 3 * * *'   # daily 03:00 UTC
jobs:
  reload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: python -m data_loading.db_load https://example.com/feed.xml my-site
        env:
          AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
          AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}

Run reload as a separate process — don't bake it into the server's startup.

Scaling

NLWeb is stateless per-request (state is in conversation storage + the vector backend). Scale horizontally:

Multiple app instances behind a load balancer
Shared vector backend (cloud-hosted, not Qdrant local file)
Shared conversation storage (Qdrant remote / Azure Search / Elasticsearch)
Sticky sessions NOT required for /ask (each request is self-contained)

LLM and embedding API quota is usually the binding constraint, not CPU.
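
A back-of-the-envelope calculation shows why quota tends to bind first. This is a sketch with hypothetical numbers: the 600 requests-per-minute quota and 8 LLM calls per query are illustrative, so measure your own call count per /ask request.

```python
def max_queries_per_minute(llm_rpm_quota: int, llm_calls_per_query: int) -> float:
    """Upper bound on /ask throughput imposed by an LLM requests-per-minute quota."""
    if llm_calls_per_query <= 0:
        raise ValueError("llm_calls_per_query must be positive")
    return llm_rpm_quota / llm_calls_per_query


if __name__ == "__main__":
    # Hypothetical: 600 RPM quota, 8 LLM calls per query.
    print(max_queries_per_minute(600, 8))  # 75.0
```

Adding app instances does not raise this ceiling, since every instance draws on the same provider quota.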

Implementation Guidance

Azure App Service Deployment

Walk through deploy_azure_webapp.sh — it provisions:

App Service Plan + Web App (Linux, Python 3.11+)
Azure AI Search service
Azure OpenAI deployment
App settings (env vars) wired to the search/openai instances

Customize the resource names, set WEBSITES_PORT=8000 (or whichever the script uses), deploy via git push or az webapp deploy. Verify mode: production in the deployed config_webserver.yaml.

Snowflake Container Services

NLWeb runs as a containerized service inside Snowflake compute and queries Cortex Search, so the data stays in Snowflake tables. Follow setup-snowflake.md, which covers the SPCS service spec, image build, and Cortex Search setup.

Cloudflare Worker + AutoRAG

Cloudflare maintains a hosted variant. Two options:

Self-host on CF Workers following docs/setup-cloudflare-autorag.md — covers the worker template and AutoRAG wiring.
Use CF's managed deployment per https://developers.cloudflare.com/ai-search/how-to/nlweb/.

Docker

If a Dockerfile ships in the repo, use it. Otherwise:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "AskAgent/python/app-aiohttp.py"]

Mount config/ and .env as bind mounts or use env vars + ConfigMaps. Persist data/db/ (Qdrant local) on a volume if not using a remote vector store.

Health Checks

Liveness: GET /mcp/health (or /sites as a fallback)
Readiness: GET /sites — fails fast if config is broken
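
The two probes can be scripted with the standard library, for a deploy smoke test or a pre-warm step. This is a minimal sketch: `probe`, `is_live`, and `is_ready` are hypothetical helper names, and it assumes both endpoints answer a plain GET with a 2xx status when healthy.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 5.0) -> bool:
    """Return True when GET base_url + path answers with a 2xx status."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a URLError subclass; refused connections
        # and timeouts surface as URLError or OSError.
        return False


def is_live(base_url: str) -> bool:
    # Liveness: /mcp/health first, with /sites as the fallback.
    return probe(base_url, "/mcp/health") or probe(base_url, "/sites")


def is_ready(base_url: str) -> bool:
    # Readiness: /sites fails fast when the config is broken.
    return probe(base_url, "/sites")


if __name__ == "__main__":
    base = "http://localhost:8000"  # main aiohttp server port
    print("live:", is_live(base), "ready:", is_ready(base))
```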

Logs and Observability

NLWeb logs to stdout via Python logging. Wire to your platform's log aggregator (Azure Monitor, CloudWatch, etc.). Key metrics:

/ask latency (p50, p95, p99) — SSE makes this tricky; measure TTFB and total
LLM API errors / 429s
Retrieval backend latencies (per-backend)
Conversation storage write latency
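
For the /ask latency metric, one way to capture both numbers is to time the first non-empty chunk separately from the end of the stream. The sketch below is illustrative: `time_stream` is a hypothetical helper that accepts any iterable of byte chunks, and the clock starts when it is called, so pass an iterator that issues the request lazily if connection time should be included.

```python
import time
from typing import Callable, Iterable, Optional


def time_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> tuple[Optional[float], float]:
    """Consume a streamed body and return (ttfb_seconds, total_seconds).

    ttfb_seconds is None when the stream ended without any non-empty chunk.
    """
    start = clock()
    ttfb: Optional[float] = None
    for chunk in chunks:
        if ttfb is None and chunk:
            ttfb = clock() - start  # first non-empty chunk reached the client
    return ttfb, clock() - start


if __name__ == "__main__":
    def fake_sse():
        # Stand-in for an SSE response body; a real caller would iterate
        # over the HTTP response instead.
        time.sleep(0.05)
        yield b"data: first\n\n"
        time.sleep(0.05)
        yield b"data: last\n\n"

    ttfb, total = time_stream(fake_sse())
    print(f"ttfb={ttfb:.3f}s total={total:.3f}s")
```

The injectable `clock` keeps the function testable without real delays.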

Production Failure Modes

App boots but /ask 500s: usually an env var missing — check the log for the failing provider.
Streaming requests time out at the proxy: increase proxy read timeout; turn off proxy buffering.
Cold-start latency: first request after deploy takes 30-60s as models load. Pre-warm with a synthetic health check.
Bills are huge: too many LLM calls per query — tune tool_selection_enabled, model tiers, and who_endpoint_enabled.
Embedding rate limits during data reload: throttle --batch-size, use a separate embedding deployment, or run reloads off-peak.
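
For the embedding rate-limit case, a reload job can also retry with exponential backoff rather than fail on the first 429. This is a sketch under assumptions: `RateLimited` is a hypothetical exception standing in for whatever your embedding client raises on a 429, and `with_backoff` is an illustrative wrapper, not something NLWeb ships.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 from the embedding provider."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the CI job
            # Full jitter: wait a random time in [0, base_delay * 2**attempt].
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_embed() -> str:
        # Simulated provider: rate-limited twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimited()
        return "embedded"

    print(with_backoff(flaky_embed, base_delay=0.01), "after", state["calls"], "calls")
```

Jitter keeps parallel reload workers from retrying in lockstep and hitting the limit again at the same moment.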

Always re-fetch the per-target setup doc and deploy_*.sh scripts before deploying — these are the most release-sensitive parts of the codebase.

Deploy NLWeb to production — Azure App Service (`deploy_azure_webapp.sh` + AI Search + Azure OpenAI), Snowflake Container Services, Cloudflare Worker + AutoRAG, Docker, and self-hosted. Covers env-var conventions, `mode: production` lockdown, scaling, TLS, OAuth, and CI for data reloads. Use when going from local dev to a hosted, internet-facing NLWeb instance.

27 stars

development

Updated May 14, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add orcaqubits/agentic-commerce-claude-plugins nlweb-deployment

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 14, 2026, 5:55 AM · 147.1s · 1 file scanned

Install manually

# Clone the repo
git clone https://github.com/orcaqubits/agentic-commerce-claude-plugins.git

# Copy into Claude Code skills folder (global)
cp -r agentic-commerce-claude-plugins/dist/codex/nlweb-protocol/skills/nlweb-deployment ~/.claude/skills/

Repository

orcaqubits/agentic-commerce-claude-plugins

27 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT