# 100-Site GEO Survey: Reproduction Runbook

Last updated: 2026-05-11. Companion to https://geolocus.ai/multi-site-survey.

---

## What this is

Between 2026-03-30 and 2026-04-29, GeoLocus Group audited 100 websites across 31 industries against a 13-signal protocol: 8 binary readiness signals plus 5 quantitative metrics, each binarized at a defined threshold. Every site receives the same scan, the same thresholds, and the same arithmetic.

The page at `/multi-site-survey` publishes the per-site scorecard, the cohort-level pass rates, and the methodology. This runbook is the second half of that publication: the script, the thresholds, the cohort, and the output schema, so any third party can re-run the same audit on their own machine and get the same numbers.

The audit reads only public HTTP endpoints. No paid API keys are required for the 8 binary signals or the four publicly-measurable quantitative metrics (RR, RTC, RPS, LMR). Of those four, only RPS is a pure-infrastructure metric; RR and RTC are data-structure metrics (page-template efficiency) and LMR is a content-discipline metric (editorial freshness). The fifth metric, Source Grounding Ratio (SGR), is the second content-discipline metric; it uses an LLM (Claude Sonnet) to extract verifiable claims from the bot-UA HTML and is the only optional cost. SGR is treated as a separate quantitative add-on rather than a hard prerequisite for the 8-pillar pass/fail.

This runbook documents the v3.4 protocol used for the 2026-04-29 cohort run. The reference implementation in `audit-v3.js` is included verbatim below in the [Reference Implementation](#reference-implementation) section.

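
The "same thresholds, same arithmetic" idea described above can be sketched roughly as follows. This is an illustration only, not an excerpt from `audit-v3.js`: the function names `binarize` and `scoreSite`, the `passIf` field, and every cutoff value and direction are hypothetical placeholders rather than the v3.4 thresholds. Treating an unmeasured metric (for example, SGR when it is skipped) as excluded from the count is also an assumption.

```javascript
// Sketch only. Cutoffs and pass directions are HYPOTHETICAL placeholders,
// not the thresholds defined by the v3.4 protocol.

// Turn one quantitative metric into pass/fail; null means "not measured".
function binarize(value, { cutoff, passIf }) {
  if (value === null || value === undefined) return null;
  return passIf === 'gte' ? value >= cutoff : value <= cutoff;
}

// Combine the 8 binary signals with the binarized metrics.
function scoreSite(binarySignals, metrics, thresholds) {
  const metricPasses = {};
  for (const [name, rule] of Object.entries(thresholds)) {
    metricPasses[name] = binarize(metrics[name], rule);
  }
  const all = [...Object.values(binarySignals), ...Object.values(metricPasses)];
  const measured = all.filter((v) => v !== null); // assumption: skip unmeasured
  return {
    metricPasses,
    passed: measured.filter(Boolean).length,
    measured: measured.length,
  };
}

// Usage with made-up inputs (placeholder thresholds, invented metric values).
const placeholderThresholds = {
  RR: { cutoff: 0.5, passIf: 'gte' },
  RTC: { cutoff: 0.5, passIf: 'gte' },
  RPS: { cutoff: 0.5, passIf: 'gte' },
  LMR: { cutoff: 0.5, passIf: 'gte' },
  SGR: { cutoff: 0.5, passIf: 'gte' },
};
const result = scoreSite(
  { S1: true, S2: false, S3: false, S4: true, S5: true, S6: true, S7: false, S8: true },
  { RR: 0.9, RTC: 0.2, RPS: 0.7, LMR: 0.5, SGR: null }, // SGR skipped
  placeholderThresholds,
);
console.log(`${result.passed}/${result.measured} signals passed`);
// prints "8/12 signals passed"
```

Keeping the thresholds in a single object passed to every site's scoring call is one way to guarantee that no site is scored against different cutoffs than another.
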
---

## Prerequisites

- Node.js 20.x or later
- `curl` 7.x or later (system curl is fine; no special build flags)
- ~2 GB free RAM during the run (parallel sitemap crawls can spike memory)
- ~30 minutes wall-clock for a 100-site cohort at concurrency=5
- No paid API keys for the 8 binary signals + RR/RTC/RPS/LMR
- For SGR (the moat metric): an Anthropic API key (`ANTHROPIC_API_KEY` env var). Set `SKIP_SGR=1` to skip SGR; the binary score is still produced.

---

## How to run

```bash
# Clone or download the script (see Reference Implementation below)
mkdir geo-audit && cd geo-audit
# ... save audit-v3.js into this directory ...

# Optional: SGR via Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Run the full cohort (100 sites, concurrency 5)
node audit-v3.js

# Run a slice of the cohort (sites 1-10)
node audit-v3.js --start 1 --end 10

# Crank concurrency for a faster (less polite) run
node audit-v3.js --concurrency 10

# Audit your own site against the same 13 signals (single-site mode)
node audit-v3.js --single https://your-site.com
```

Output is written to `audit-receipts-v3/_.json`, one file per site. The run-level summary is written to `audit-receipts-v3/MANIFEST.json`.

---

## The 8 binary signals

| # | Signal | Test | Pass criteria |
|---|---|---|---|
| S1 | Robots AI bots allowed | Parse `https:///robots.txt` | No `Disallow: /` rule matching the `GPTBot`, `ClaudeBot`, or `PerplexityBot` UA |
| S2 | llms.txt present | `curl -sL https:///llms.txt` | HTTP 200 + body starts with `#` or `##` (markdown, not HTML or empty) |
| S3 | llms-full.txt present | `curl -sL https:///llms-full.txt` | HTTP 200 + non-empty body |
| S4 | Sitemap fresh | Walk the sitemap.xml tree, extract `lastmod` from each URL | Median `lastmod` across all URLs ≤ 30 days from run timestamp |
| S5 | JSON-LD structured data | `curl -sL https:///` and grep `application/ld+json` | At least one valid JSON-LD `