dist/codex/nlweb-protocol/skills/nlweb-schema-org-grounding/SKILL.md
Prepare and structure site content as Schema.org JSON-LD for NLWeb ingestion — covers the supported types (Recipe, Product, Movie, Event, Article, RealEstate, Course, etc.), per-type behavior in NLWeb's tool routing, JSON-LD embedding patterns in HTML, sites.xml registration, and how the `schema_object` flows through ranking back to agent results. Use when authoring or auditing the structured data on a site that will be exposed via NLWeb.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

```shell
npx skillsauth add orcaqubits/agentic-commerce-claude-plugins nlweb-schema-org-grounding
```
Fetch live references:
- ... `<returnStruc>` shapes work.
- schema.org JSON-LD validator: Google's Rich Results Test is a quick way to validate before ingest.
- `AskAgent/python/methods/recipe_substitution.py`, `accompaniment.py`, `compare_items.py` for examples of how type-specific tools consume the `schema_object`.

NLWeb's defining design choice: results carry their full Schema.org object back to the agent. Unlike a generic RAG system that returns text chunks, NLWeb returns structured JSON-LD, so an agent receiving a Recipe result gets `recipeIngredient`, `cookTime`, `nutrition`, and `recipeYield`, not just a paragraph of text. This is what makes NLWeb results agent-actionable.
R.V. Guha (NLWeb's author) co-created Schema.org for exactly this reason — the data was already structured; NLWeb finally exposes it to agents.
site_types.xml enumerates the types with per-type tool / prompt overrides. Common types (verify the live file):
| Type | Use Case | Type-Specific Tools |
|------|----------|---------------------|
| Recipe | Cooking sites | recipe_substitution, accompaniment |
| Product | E-commerce | compare_items, item_details |
| Movie / TVSeries | Streaming/reviews | compare_items |
| Event | Calendars, ticketing | item_details |
| Article / NewsArticle / BlogPosting | News, blogs | summarize-mode default |
| RealEstate / Apartment / House | Listings | item_details, compare |
| Course | EdTech | item_details |
| Restaurant / LocalBusiness | Maps, directories | accompaniment, item_details |
| Book | Catalogs | compare, item_details |
| Person / Organization | Profiles | item_details |
NLWeb falls back to a default tool set for any Schema.org type not explicitly enumerated.
Schema.org JSON-LD is typically embedded in HTML via a <script type="application/ld+json"> tag:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Classic Tomato Soup",
  "url": "https://example.com/recipes/tomato-soup",
  "image": "https://example.com/images/tomato-soup.jpg",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2025-09-12",
  "description": "A simple weeknight tomato soup.",
  "recipeIngredient": ["6 ripe tomatoes", "1 onion", "..."],
  "recipeInstructions": [...],
  "nutrition": { "@type": "NutritionInformation", "calories": "200" },
  "cookTime": "PT30M",
  "recipeYield": "4 servings"
}
</script>
```
NLWeb's URL-list ingest path extracts this directly. The richer the JSON-LD, the more useful the result.
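To make the extraction step concrete, here is a minimal sketch of pulling JSON-LD out of a page using only the Python standard library. It is not NLWeb's own extractor: the class `JsonLdExtractor` and the function `extract_jsonld` are hypothetical names introduced for illustration, and the real ingest path may handle more cases (for example `@graph` wrappers).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.objects = []

    def handle_starttag(self, tag, attrs):
        # Only <script type="application/ld+json"> blocks are of interest.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.objects.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Malformed block: skip it rather than abort the whole page.
                pass


def extract_jsonld(html):
    """Return a list of parsed JSON-LD values found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.objects
```

Running this over a page before ingest is a quick way to see what structured data a crawler could actually recover from it.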
## `schema_object` Field in Responses

Every NLWeb result contains:
```json
{
  "url": "...",
  "name": "...",
  "site": "...",
  "score": 0.87,
  "description": "...",
  "schema_object": { /* the full JSON-LD as ingested */ }
}
```
Agents can pattern-match on schema_object.@type to render appropriately, extract specific properties (e.g., offers.price for products), or chain to a follow-up tool call.
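The pattern-matching described above could look roughly like the following sketch. It assumes a result dictionary shaped like the example in this section; `summarize_result` and `first` are hypothetical helpers, not part of NLWeb, and the rendering strings are arbitrary.

```python
def first(value):
    """Schema.org properties may hold one value or a list; take the first."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def summarize_result(result):
    """Render a one-line summary based on schema_object's @type."""
    obj = result.get("schema_object") or {}
    schema_type = first(obj.get("@type"))
    name = result.get("name") or obj.get("name")

    if schema_type == "Recipe":
        ingredients = obj.get("recipeIngredient") or []
        cook_time = obj.get("cookTime", "unknown")
        return f"{name}: {len(ingredients)} ingredients, cook time {cook_time}"

    if schema_type == "Product":
        # offers should be an Offer object; a bare string carries no price field.
        offer = first(obj.get("offers"))
        if isinstance(offer, dict):
            return f"{name}: {offer.get('price')} {offer.get('priceCurrency', '')}".strip()
        return f"{name}: price unavailable"

    # Any other type falls through to a generic rendering.
    return f"{name} ({schema_type or 'untyped'})"
```

The fallback branch mirrors NLWeb's own behavior of using defaults for types it does not know: an agent should degrade gracefully rather than fail on an unfamiliar `@type`.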
In addition to config_nlweb.yaml's sites: allowlist, the demo data ships with a sites.xml-style registry tying site names to crawl sources and Schema.org type defaults. Check the live repo for the current registration convention — this is an area that's been evolving.
| Type | Always include |
|------|----------------|
| Recipe | name, url, image, recipeIngredient, recipeInstructions, cookTime, recipeYield |
| Product | name, url, image, description, offers (price, priceCurrency, availability) |
| Article | headline, url, image, author, datePublished, description, articleBody (or summary) |
| Event | name, url, startDate, location, description |
| Movie | name, url, image, director, datePublished, genre, description |
| RealEstate | name, url, image, address, numberOfRooms, floorSize, price |
The fewer fields populated, the worse the result quality — especially for mode=generate answers.
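As a rough pre-ingest audit, the per-type field list above can be turned into a lookup. This is a sketch under stated assumptions: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, only top-level properties are checked (not the sub-fields of `offers`), and the Article entry leaves out `articleBody` because a summary is an accepted alternative.

```python
# Top-level properties to always include, copied from the table in this section.
REQUIRED_FIELDS = {
    "Recipe": ["name", "url", "image", "recipeIngredient",
               "recipeInstructions", "cookTime", "recipeYield"],
    "Product": ["name", "url", "image", "description", "offers"],
    "Article": ["headline", "url", "image", "author",
                "datePublished", "description"],
    "Event": ["name", "url", "startDate", "location", "description"],
    "Movie": ["name", "url", "image", "director",
              "datePublished", "genre", "description"],
    "RealEstate": ["name", "url", "image", "address",
                   "numberOfRooms", "floorSize", "price"],
}


def missing_fields(obj):
    """Return the required properties that are absent or empty for obj's @type."""
    schema_type = obj.get("@type")
    if isinstance(schema_type, list):
        # @type may be a list; use the first entry for the lookup.
        schema_type = schema_type[0] if schema_type else None
    required = REQUIRED_FIELDS.get(schema_type, [])
    return [field for field in required if obj.get(field) in (None, "", [], {})]
```

Types outside the table return an empty list, so the check never blocks content that NLWeb would handle with its default tool set.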
site_types.xml defines a tree:
This is mixed-mode programming in action — small, type-aware LLM calls drive the response.
Before ingest:
- Each page embeds its structured data in a `<script type="application/ld+json">` tag.
- The `@type` is one NLWeb's `site_types.xml` knows about; if not, results still work but use default prompts.
- The object declares `@context: "https://schema.org"`; NLWeb's parser keys off this.
- Every object has a `url`; it's the deduplication key across retrieval backends.
- The most specific type is used (`Recipe` not `CreativeWork`) so type-specific tools activate.
- Nested objects carry `@type` discriminators (e.g., `author` as `Person`, `offers` as `Offer`).

After loading, hit a result and inspect `schema_object`:
```shell
curl 'http://localhost:8000/ask?query=quick+dinners&site=recipes&streaming=false&mode=list' | jq '.results[0].schema_object'
```
If schema_object is missing key fields, fix the source HTML — not NLWeb's config.
If you want a custom domain (say, Podcast episodes) with type-specific tools:
1. Add a `<site_type>` entry in `site_types.xml` referencing your `@type` value.
2. Add type-specific prompts in `prompts.xml` (or inherit defaults).
3. Add tool handlers under `methods/` (see nlweb-tools-framework).

If your source isn't JSON-LD (CSV, proprietary API), map fields to Schema.org at ingest time, not query time. Update `rss2schema.py` or write a small adapter that emits Schema.org JSON before calling `db_load`. The richer the mapping, the better the agent experience.
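A small adapter of the kind described above might look like this sketch, which maps CSV rows to Schema.org `Product` objects. The column names (`title`, `link`, `image_url`, `summary`, `price`, `currency`, `in_stock`) and the functions `row_to_product` and `csv_to_jsonld` are assumptions made for illustration. The sketch stops at producing Schema.org dictionaries and does not show the input format `db_load` expects; check the live repo for that.

```python
import csv
import io


def row_to_product(row):
    """Map one CSV row (hypothetical column names) to a Schema.org Product."""
    in_stock = row["in_stock"] == "1"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": row["title"],
        "url": row["link"],  # should already be an absolute URL
        "image": row["image_url"],
        "description": row["summary"],
        # Emit offers as an Offer object, never a bare price string.
        "offers": {
            "@type": "Offer",
            "price": row["price"],
            "priceCurrency": row["currency"],
            "availability": "https://schema.org/InStock" if in_stock
            else "https://schema.org/OutOfStock",
        },
    }


def csv_to_jsonld(csv_text):
    """Convert CSV text with a header row into a list of Product objects."""
    return [row_to_product(row) for row in csv.DictReader(io.StringIO(csv_text))]
```

Doing the mapping once at ingest means every retrieval backend stores the same typed object, so nothing has to be translated when a query arrives.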
- `@type` is missing or non-Schema.org: results work but type-specific tools never fire.
- `url` is relative: this breaks deduplication; always emit absolute URLs.
- Dates are not ISO 8601: `datePublished: "2025-09-12"` works; `"Sept 12, 2025"` does not.
- `offers` is a bare string instead of an `Offer` object: agents lose the price field.

Always validate JSON-LD with an external tool before assuming ingest will work; silent parser failures are common.
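The pitfalls above can be caught with a small local lint pass before reaching for an external validator. This is a sketch, not a replacement for one: `lint_jsonld` is a hypothetical helper, it does not verify that `@type` is a real Schema.org type, and the date check only inspects the leading `YYYY-MM-DD` portion of `datePublished`.

```python
from datetime import date
from urllib.parse import urlparse


def lint_jsonld(obj):
    """Return a list of human-readable problems found in one JSON-LD object."""
    problems = []

    if not obj.get("@type"):
        problems.append("@type is missing")

    # An absolute URL needs both a scheme and a host.
    url = obj.get("url", "")
    parsed = urlparse(url) if isinstance(url, str) else None
    if not (parsed and parsed.scheme in ("http", "https") and parsed.netloc):
        problems.append("url is missing or not absolute")

    # Check only the date portion; time and zone suffixes are ignored here.
    published = obj.get("datePublished")
    if published is not None:
        try:
            date.fromisoformat(str(published)[:10])
        except ValueError:
            problems.append("datePublished is not ISO 8601")

    # offers must be an object (or a list of objects), not a bare string.
    if "offers" in obj and not isinstance(obj["offers"], (dict, list)):
        problems.append("offers is not an Offer object")

    return problems
```

An empty list means none of these specific problems were found; it does not guarantee the object is otherwise valid Schema.org.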