skills/google-ranking-signals/SKILL.md
This skill should be used when the user asks to "audit a page for SEO", "analyze ranking signals", "check Google ranking factors", "optimize content for Google", "explain how Google ranks pages", "analyze the Google leak", "check NavBoost signals", "evaluate site authority", "improve E-E-A-T", "understand why a page ranks", or requests any SEO analysis grounded in Google's internal ranking systems. Provides comprehensive expertise based on the 2024 Google Content Warehouse API leak (2,596 modules, 14,014 attributes).
In May 2024, internal documentation from Google's Content Warehouse API was publicly leaked, revealing 2,596 modules and 14,014 attributes that describe how Google evaluates, scores, and ranks web pages. This was the largest confirmed leak of Google's ranking system internals in the company's history.
The leak contradicted several of Google's long-standing public statements:
| Google's Public Claim | What the Leak Revealed |
|---|---|
| "We don't use domain authority" | siteAuthority attribute exists and is used |
| "Chrome data isn't used for ranking" | NavBoost system ingests Chrome click data extensively |
| "Click data is too noisy to use" | NavBoost is one of the most powerful re-ranking signals |
| "We don't sandbox new sites" | hostAge attribute confirms new-domain sandboxing |
| "PageRank is outdated" | PageRank is still computed for every indexed document |
Important caveats: The leak shows what attributes exist, not how heavily each is weighted. Some features may be experimental or deprecated. Treat each attribute's existence as confirmed, but treat its impact level as an informed estimate.
| Category | Primary Modules | Key Fields | Reference |
|---|---|---|---|
| NavBoost (Clicks) | QualityNavboostCrapsCrapsClickSignals (10), QualityNavboostCrapsCrapsData (15+) | goodClicks, badClicks, lastLongestClicks, unicornClicks | references/navboost-signals.md |
| Site Authority | QualityNsrNsrData (58), PerDocData (230+) | siteAuthority, nsr, pagerankNs, hostAge, chromeInTotal | references/site-authority-signals.md |
| Content Quality | QualityNsrPQData (19), QualityTimebasedSyntacticDate (14) | contentEffort, chard, tofu, OriginalContentScore, bylineDate | references/content-quality-signals.md |
| Author & Entity | RepositoryWebrefEntityJoin (12), PerDocData | annotatedEntityId, authorObfuscatedGaiaStr, topicalityScore | references/author-entity-signals.md |
| Link Signals | IndexingDocjoinerAnchorStatistics (53), AnchorsAnchor (38), AnchorsAnchorSource (25) | penguinPenalty, uniqueDomainCount, homePageInfo, pagerankWeight | references/link-signals.md |
| Chrome & UX | QualityNsrNsrData, NavBoost integration | chromeInTotal, directFrac, pnav, voterTokenCount | references/chrome-ux-signals.md |
| Special Treatments | CompressedQualitySignals (38) | pandaDemotion, scamness, unauthoritativeScore, productReview* | references/special-treatments.md |
| Ranking Architecture | CompositeDoc (44), CompressedQualitySignals, GDocumentBase (32) | scaledSelectionTierRank, crapsNew*Signals | references/ranking-architecture.md |
For attribute-level lookup, consult references/signal-catalog.md — an alphabetical index of 150+ confirmed attributes with module paths, data types, and impact levels.
To audit a page against the leaked ranking signals:
Gather page data — <head> data and structured data anywhere on the page must come from raw HTML. WebFetch converts HTML to markdown before its summarizer sees it, which strips <meta>, <link>, and <script> content (including <script type="application/ld+json"> blocks, wherever they appear in the document). So WebFetch cannot reliably report on <head> or on JSON-LD structured data regardless of how the prompt is phrased. (Note: WebFetch can report on JSON-LD that appears as visible page content — e.g., code examples on schema.org's docs — but that's documentation, not real schema markup.)
1a. Raw HTML for <head> (authoritative source) — Run via Bash:
curl -sL --compressed --max-time 10 -A 'Mozilla/5.0 (compatible; ClaudeCodeSEOAudit/1.0)' <url>
Extract the <head>...</head> block and read it verbatim — every <meta>, <link>, <title>, and <script src> tag inside <head>.
Search the entire document (not just <head>) for structured data, because JSON-LD, microdata, and RDFa can appear in <body>:
- JSON-LD: <script type="application/ld+json">...</script> anywhere in the document
- Microdata: itemscope, itemtype, or itemprop attributes
- RDFa: typeof, vocab, or property attributes

Only report structured data as missing if all three formats are absent from the whole document.
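The three-format search above can be sketched in Python with the standard-library HTML parser. This is a minimal illustration that assumes the raw HTML from curl is already in a string; the names `StructuredDataScanner` and `scan_structured_data` are hypothetical and not part of this skill.

```python
from html.parser import HTMLParser

MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"typeof", "vocab", "property"}


class StructuredDataScanner(HTMLParser):
    """Collects evidence of JSON-LD, microdata, and RDFa across the whole document."""

    def __init__(self):
        super().__init__()
        self.json_ld_blocks = []
        self.microdata_hits = 0
        self.rdfa_hits = 0
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        names = set(attr_map)
        script_type = (attr_map.get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        if names & MICRODATA_ATTRS:
            self.microdata_hits += 1
        # Note: Open Graph <meta property="og:..."> tags also carry `property`,
        # so an RDFa hit is a prompt for manual review, not proof of schema markup.
        if names & RDFA_ATTRS:
            self.rdfa_hits += 1

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld_blocks.append("".join(self._buffer).strip())
            self._in_json_ld = False


def scan_structured_data(raw_html):
    """Return which formats were seen; 'missing' is True only if all three are absent."""
    scanner = StructuredDataScanner()
    scanner.feed(raw_html)
    scanner.close()
    found = {
        "json_ld": len(scanner.json_ld_blocks) > 0,
        "microdata": scanner.microdata_hits > 0,
        "rdfa": scanner.rdfa_hits > 0,
    }
    found["missing"] = not any(found.values())
    return found
```

The scanner walks every tag rather than only those in <head>, which matches the requirement to search the whole document before reporting structured data as missing.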
Treat curl as unusable for <head> analysis if any of these are true:
- <head> contains zero <meta> tags (indicates JS-rendered shell, bot challenge page, or stub, e.g., Cloudflare/Datadome/"Please wait...")

1b. Visible content (WebFetch): Always use WebFetch for headings (H1/H2/H3), visible body content, author byline, publish date, word count, and link structure. Summarization is appropriate for this content.
Critical rule — handling confidence for <head> and structured data in the final report:
- When curl succeeded: <head> findings AND structured data findings are authoritative. Report present/absent with confidence.
- When curl is unusable: Do not report any <head> tag, JSON-LD block, microdata, or RDFa as "missing". Mark all <head>-derived AND structured-data findings as "could not verify: raw HTML unavailable". Never substitute WebFetch for these, because WebFetch strips both <head> content and <script> blocks during its markdown conversion. Diagnose the cause and tell the user how to recover (see Step 1d below).

1d. When curl is unusable, diagnose and explain: Identify the most likely cause from the curl output and body content, and include a "Why <head> couldn't be verified" section in the audit. Most audits are own-site audits, which opens up easier recovery paths (CMS/template inspection, allowlisting your own WAF, logged-in browser session). If you don't already know, ask: "Is this your own site, or one you're auditing externally?" and tailor the recovery suggestions accordingly. See examples/page-audit-workflow.md Step 1d for the full diagnostic table with both own-site and third-party recovery paths. Common causes:
- <head> from DevTools. Third-party: retry with a browser UA, or paste view-source.
- <div id="root">/__NEXT_DATA__/etc. with near-empty <head>. Own site: check the template/component building <head> (Next.js <Head>, React Helmet, Vue meta). Either: paste rendered <head> from DevTools Elements after page loads.

Always include in the audit output a line stating which method supplied the <head> data, e.g., "<head> source: curl (authoritative)" or "<head> source: unavailable; see 'Why <head> couldn't be verified' below".
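The confidence rule and the source line can be sketched as a few small helpers. This is an illustrative sketch only: it implements just the zero-<meta> criterion for deciding that curl output is unusable, the function names are hypothetical, and the status strings are adapted from the examples in the text.

```python
from html.parser import HTMLParser


class HeadMetaCounter(HTMLParser):
    """Counts <meta> tags that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            self.meta_count += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def curl_head_is_usable(raw_html):
    """Assumed heuristic: zero <meta> tags in <head> means a JS shell or challenge page."""
    counter = HeadMetaCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.meta_count > 0


def head_finding(tag_present, usable):
    """Never report 'absent' unless the raw HTML was actually usable."""
    if not usable:
        return "could not verify: raw HTML unavailable"
    return "present" if tag_present else "absent"


def head_source_line(usable):
    """Line stating which method supplied the <head> data."""
    if usable:
        return "<head> source: curl (authoritative)"
    return "<head> source: unavailable; see 'Why <head> couldn't be verified' below"
```

Separating the usability check from the finding status keeps a challenge page from ever producing an "absent" verdict for a tag that may exist in the real document.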
Evaluate by signal category — Walk through each category in the table above. For each, consult the corresponding reference file and check the page against the documented attributes. Key areas to assess:
Identify signal gaps — Flag categories where the page is weak or missing signals entirely. Prioritize gaps in high-impact categories (NavBoost, Content Quality, Links).
Prioritize improvements — Rank recommendations by estimated impact and implementation effort. Quick wins first, structural changes second.
Deliver actionable recommendations — Cite specific attributes from the leak to ground each recommendation. Reference the detailed guide in examples/page-audit-workflow.md.
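The prioritization step (quick wins first, structural changes second, each grounded in a named attribute) could be sketched as below. The 1-5 impact and effort scales, the `quick_win_max_effort` threshold, and the `Recommendation` type are assumptions made for illustration; the leak itself does not provide weights.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    title: str
    attribute: str  # leaked attribute cited to ground the recommendation
    impact: int     # hypothetical 1-5 estimate, higher means more impact
    effort: int     # hypothetical 1-5 estimate, higher means more work


def prioritize(recommendations, quick_win_max_effort=2):
    """Order quick wins before structural changes, highest impact first in each group."""
    quick_wins = [r for r in recommendations if r.effort <= quick_win_max_effort]
    structural = [r for r in recommendations if r.effort > quick_win_max_effort]

    def sort_key(rec):
        # Higher impact first; ties broken by lower effort.
        return (-rec.impact, rec.effort)

    return sorted(quick_wins, key=sort_key) + sorted(structural, key=sort_key)
```

For example, a title rewrite tied to titlematchScore with impact 4 and effort 1 would sort ahead of a content expansion tied to contentEffort with impact 5 and effort 4, because the latter falls into the structural group.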
To optimize content based on leak insights:
Analyze content signals — Check token count, vocabulary diversity (unique tokens vs total), title-query alignment (titlematchScore), and freshness indicators (BylineDate, SyntacticDate).
Evaluate author/entity presence — Verify author byline exists, credentials are discoverable, and entity associations are clear. Check for brand mentions across the web.
Assess freshness — Determine if the query triggers QDF. Review publish dates, last-modified signals, and content update history.
Optimize for engagement — Improve elements that drive goodClicks and lastLongestClicks: compelling titles, strong introductions, comprehensive coverage that satisfies search intent.
Apply the structured checklist in examples/content-optimization-checklist.md for a signal-by-signal walkthrough.
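The token count and vocabulary diversity check from the first step can be approximated locally. The sketch below assumes a simple lowercase regex tokenizer and a unique-to-total ratio; it is a rough proxy for auditing purposes, not a reproduction of how Google tokenizes or scores content, and `content_token_stats` is a hypothetical name.

```python
import re


def content_token_stats(text):
    """Return total tokens, unique tokens, and their ratio for visible body text."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    total = len(tokens)
    unique = len(set(tokens))
    diversity = unique / total if total else 0.0
    return {
        "total_tokens": total,
        "unique_tokens": unique,
        "diversity": diversity,
    }
```

Feed it the visible body content gathered during the audit. Comparing the ratio across pages of similar length is more meaningful than reading a single value in isolation, since diversity naturally drops as documents get longer.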
To answer specific questions about how Google's ranking system works:
Search the signal catalog — Consult references/signal-catalog.md for the relevant attribute(s). Each entry includes the module path, data type, impact level, and cross-references.
Read the category reference — For deeper context, read the corresponding category file from references/.
Cite specific attributes — Ground every answer in named attributes and modules from the leak. Distinguish between confirmed attributes and inferred behavior.
Note contradictions — When the answer contradicts Google's public statements, call this out explicitly with the evidence from the leak.
Detailed technical documentation organized by signal category:
- references/navboost-signals.md: Click data, pogo-sticking, 13-month window, Chrome integration
- references/site-authority-signals.md: Domain trust, age, sandbox, PageRank
- references/content-quality-signals.md: Freshness, tokens, title matching, version history
- references/author-entity-signals.md: E-E-A-T, author scoring, entity recognition
- references/link-signals.md: Backlink quality tiers, diversity, freshness, anchor text
- references/chrome-ux-signals.md: Engagement metrics, session data, scroll depth
- references/special-treatments.md: Whitelists, authority flags, YMYL treatments
- references/ranking-architecture.md: Ascorer, Twiddlers, re-ranking pipeline
- references/signal-catalog.md: Alphabetical index of 150+ attributes (use for quick lookup)

Practical workflows and templates:
- examples/page-audit-workflow.md: Complete audit walkthrough with sample report format
- examples/content-optimization-checklist.md: Signal-by-signal optimization checklist
- examples/signal-gap-analysis.md: Competitive signal comparison framework