The pipeline at a glance
Every scan goes through four stages. We fetch the URL, extract structured signals from the HTML, run those signals through a deterministic SEO rubric, and ask an LLM to read the same page and produce an AI-automation score with an attached confidence percentage.
- Fetch. A single GET request with a custom user agent, 10-second timeout, 5 MB cap. We follow up to two redirects. Internal, local, and private hosts are blocked at the URL parser; a minimal version of this guard is sketched after this list.
- Parse. The page is parsed with linkedom, a server-side implementation of the browser DOM API. We extract metadata (title, description, canonical, robots), heading structure, link inventory, image inventory, schema markup, and the visible text body.
- SEO score. A fixed rubric of seven checks runs against the parsed signals. Each check returns pass / warn / fail and contributes a weighted share of the 0–100 score. No LLM in this path — the same page produces the same SEO score every time.
- AI score. The body text and a structural summary go to Claude with a cached scoring prompt. The model returns an AI-automation score, a confidence percentage, a breakdown across content / structure / imagery / tone, and up to three short findings. If confidence is below 0.6 the request is escalated from Haiku to Sonnet automatically.
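The stage-one guard referenced above is small enough to sketch. This is an illustrative version for Node 18+ with the built-in fetch; the hostname patterns, user-agent string, and function names are assumptions, not the production code.

```ts
// Illustrative stage-one guard: block private hosts, cap time, cap size.
// Assumes Node 18+ (global fetch, AbortSignal.timeout); names are placeholders.
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB cap
const TIMEOUT_MS = 10_000;         // 10-second timeout

function isBlockedHost(hostname: string): boolean {
  // Reject obvious internal, local, and private hosts before any request goes out.
  // A production guard would also check the resolved IP, not just the hostname.
  return (
    hostname === "localhost" ||
    hostname.endsWith(".local") ||
    hostname.endsWith(".internal") ||
    /^127\./.test(hostname) ||
    /^10\./.test(hostname) ||
    /^192\.168\./.test(hostname) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(hostname)
  );
}

async function fetchPage(rawUrl: string): Promise<string> {
  const url = new URL(rawUrl);
  if (isBlockedHost(url.hostname)) throw new Error("blocked host");

  const res = await fetch(url, {
    redirect: "follow", // the two-redirect cap would need manual redirect handling
    headers: { "user-agent": "crawlranker-scanner/1.0" },
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });

  // A production version would stream and abort early; a post-hoc size check keeps the sketch short.
  const body = await res.text();
  if (Buffer.byteLength(body) > MAX_BYTES) throw new Error("response too large");
  return body;
}
```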
The SEO audit
The audit grades pages against established SEO conventions. Nothing exotic — these are the same things any reviewer would look at on a first pass, encoded so the result is repeatable.
The seven checks
- Meta tags. Presence and length of <title> and meta description. Title under 60 chars, description under 160 chars, both non-empty.
- Heading structure. Exactly one h1. No skipped levels (h1 → h3 fails). Heading text is non-empty. A minimal version of this check is sketched after the list.
- Mobile readiness. Viewport meta present and configured for responsive layout. Touch targets and font-size cues from the rendered CSS where available.
- Page speed signals. Total transferred bytes, render-blocking resource count, and CLS-prone patterns (e.g. images with no width/height attribute).
- Schema markup. Presence of any JSON-LD, RDFa, or Microdata. Validated for parseability.
- Broken links. Sample of internal links HEAD-checked. 4xx and 5xx responses are flagged.
- Image alt text. Every <img> outside decorative contexts must have a non-empty alt.
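A minimal sketch of the heading-structure check referenced above, using linkedom's parseHTML. The pass/warn/fail shape matches the rubric; treating empty heading text as a warn rather than a fail is an assumption, as is the function shape.

```ts
import { parseHTML } from "linkedom";

type CheckResult = "pass" | "warn" | "fail";

// Illustrative heading-structure check: exactly one h1, no skipped levels, no empty headings.
function checkHeadings(html: string): CheckResult {
  const { document } = parseHTML(html);
  const headings = Array.from(document.querySelectorAll("h1, h2, h3, h4, h5, h6"));
  const levels = headings.map((h) => Number(h.localName[1])); // "h2" -> 2

  if (levels.filter((l) => l === 1).length !== 1) return "fail"; // must be exactly one h1
  if (headings.some((h) => (h.textContent ?? "").trim() === "")) return "warn";

  for (let i = 1; i < levels.length; i++) {
    if (levels[i] - levels[i - 1] > 1) return "fail"; // skipped level, e.g. h1 → h3
  }
  return "pass";
}
```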
The score
Each check carries a weight reflecting its impact on real-world ranking. Critical failures (no h1, no title) move the score more than soft warnings (description slightly over length). The final 0–100 maps to four verdicts: Excellent 90+, Good 75–89, Fair 60–74, Needs work below 60.
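As an illustration of the roll-up, here is a minimal weighted-score sketch. The weights and the half-credit for warnings are placeholders; only the verdict thresholds come from the rubric above.

```ts
// Sketch of the weighted roll-up: full weight on a pass, half on a warn, nothing on a fail.
// The weights below are placeholders, not the published rubric.
type CheckResult = "pass" | "warn" | "fail";

const WEIGHTS: Record<string, number> = {
  metaTags: 20,
  headings: 20,
  mobile: 15,
  speed: 15,
  schema: 10,
  brokenLinks: 10,
  altText: 10, // sums to 100
};

function seoScore(results: Record<string, CheckResult>): number {
  let score = 0;
  for (const [check, weight] of Object.entries(WEIGHTS)) {
    if (results[check] === "pass") score += weight;
    else if (results[check] === "warn") score += weight * 0.5;
  }
  return Math.round(score);
}

function seoVerdict(score: number): "Excellent" | "Good" | "Fair" | "Needs work" {
  if (score >= 90) return "Excellent";
  if (score >= 75) return "Good";
  if (score >= 60) return "Fair";
  return "Needs work";
}
```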
The AI-automation score
The AI score answers one question: how much of this site reads as machine-generated, top to bottom? It is a fingerprint of the site, not a verdict on any single sentence.
The four signal categories
- Content. Lexical patterns, sentence-shape variance, hedged phrasing, factual specificity, and the markers an LLM tends to leave when it writes long prose.
- Structure. Heading regularity, paragraph length distribution, list cadence, and the kind of templated section ordering that auto-generated pages produce.
- Imagery. File-name patterns, dimensions, repeated alt text, and visible artifacts characteristic of generative image models.
- Tone. Voice consistency across sections, register shifts, and the emotional flatness LLMs default to when there is no editorial pass.
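The structural summary mentioned in the pipeline is built deterministically before the model sees the page. A sketch of what it might carry; the field names and the choice of statistics are assumptions, not the production shape.

```ts
import { parseHTML } from "linkedom";

// Sketch of a structural summary sent to the model alongside the body text.
// Field names and the exact statistics are assumptions.
interface StructuralSummary {
  headingLevels: number[];    // e.g. [1, 2, 2, 3, 2] in document order
  paragraphLengths: number[]; // word count per paragraph
  listCount: number;
  imageAltTexts: string[];    // repeated alt text is a signal on its own
}

function summarizeStructure(html: string): StructuralSummary {
  const { document } = parseHTML(html);
  const headings = Array.from(document.querySelectorAll("h1, h2, h3, h4, h5, h6"));
  const paragraphs = Array.from(document.querySelectorAll("p"));
  const images = Array.from(document.querySelectorAll("img"));

  return {
    headingLevels: headings.map((h) => Number(h.localName[1])),
    paragraphLengths: paragraphs.map(
      (p) => (p.textContent ?? "").trim().split(/\s+/).filter(Boolean).length,
    ),
    listCount: document.querySelectorAll("ul, ol").length,
    imageAltTexts: images.map((img) => img.getAttribute("alt") ?? ""),
  };
}
```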
Confidence
Every score ships with a confidence percentage. Confidence is the model's own self-reported certainty, not a statistical measurement. Pages with mixed signals (a templated header, a clearly human blog post, a generated FAQ) deliberately produce lower confidence — the model says “I see both,” and the verdict surface reads Mixed signals rather than committing to a side. When confidence drops below 0.6 we re-run the analysis on a stronger model to either resolve the uncertainty or confirm it.
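The escalation rule itself reduces to a single comparison. A sketch, assuming a hypothetical scoreWith wrapper around the model call; the string identifiers stand in for whichever Haiku- and Sonnet-class models the scanner is configured to use.

```ts
// Sketch of confidence-based escalation. scoreWith is a hypothetical wrapper
// around the actual model call; it is passed in so the sketch stays self-contained.
interface AiScore {
  score: number;      // 0–100 AI-automation score
  confidence: number; // model-reported certainty, 0–1
}

type ScoreFn = (model: "haiku" | "sonnet", body: string, summary: unknown) => Promise<AiScore>;

const CONFIDENCE_FLOOR = 0.6;

async function scoreWithEscalation(
  body: string,
  summary: unknown,
  scoreWith: ScoreFn,
): Promise<AiScore> {
  const first = await scoreWith("haiku", body, summary); // cheap pass first
  if (first.confidence >= CONFIDENCE_FLOOR) return first;
  // Below the floor: re-run on the stronger model to resolve or confirm the uncertainty.
  return scoreWith("sonnet", body, summary);
}
```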
Verdict bands
- Mostly AI-generated. Score above 65, confidence above 0.7. Multiple categories agree.
- Mostly human-written. Score below 35, confidence above 0.7.
- Mixed signals. Anything else. Either the score is ambiguous or confidence is low — both lead to the same honest answer.
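Taken together, the bands reduce to a small mapping from score and confidence to a verdict. The thresholds below are exactly the ones listed above; the type and function names are illustrative.

```ts
// The three verdict bands as a direct translation of the rules above.
type AiVerdict = "Mostly AI-generated" | "Mostly human-written" | "Mixed signals";

function aiVerdict(score: number, confidence: number): AiVerdict {
  if (score > 65 && confidence > 0.7) return "Mostly AI-generated";
  if (score < 35 && confidence > 0.7) return "Mostly human-written";
  return "Mixed signals"; // ambiguous score or low confidence both land here
}
```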
What the AI score is not
It is not a plagiarism detector. It does not look at any other page, it does not check for word reuse, and it does not produce a “copied from” output. A site can be entirely original and still score high on AI-automation if a human used a model to draft the prose.
It is not a quality judgment. AI-generated content can be excellent and human-written content can be terrible. The score reads automation, not value.
It is not a moderation tool. We do not flag content for hosting providers or report it anywhere. The score exists on the public report URL and nowhere else.
Limitations we know about
- Single-page only. Phase 1 scans one URL, not a site crawl. A landing page that looks AI-generated may sit in front of a human-written blog the scanner never sees. We will add multi-page passes when the volume warrants it.
- Language coverage. Findings are produced in English regardless of the scanned site's language. Scoring works on non-English pages, but the rendered explanation is always English in v1.
- Auth walls and JS-only content. If the page renders most content client-side without server-rendered fallback, the audit sees only the shell. SEO checks reflect that honestly; AI scoring degrades gracefully but produces lower confidence.
- Adversarial sites. A site engineered to evade AI detection will eventually evade ours too. We do not claim a hard ceiling on either side.
How findings are written
Every finding has three fields: a severity (Strong / Clear / Worth noting), a category, and a one-sentence headline. Headlines describe what we saw, not what to do about it. Findings are deliberately short — they exist to anchor the score, not to prescribe edits.
The model returns at most three findings per scan, even when more signals are present. Forcing the model to pick its top three keeps the report scannable. The full signal set sits behind the score; it is not surfaced in v1.
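For illustration, the finding shape and the three-item cap look roughly like this; the field names are assumptions about the report payload, not a documented schema.

```ts
// Sketch of the finding shape: severity, category, one-sentence headline.
// At most three findings per scan, in the order the model ranked them.
type Severity = "Strong" | "Clear" | "Worth noting";
type FindingCategory = "content" | "structure" | "imagery" | "tone";

interface Finding {
  severity: Severity;
  category: FindingCategory;
  headline: string; // describes what was seen, not what to do about it
}

const MAX_FINDINGS = 3;

function capFindings(findings: Finding[]): Finding[] {
  return findings.slice(0, MAX_FINDINGS);
}
```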
Versioning
The scoring rubric and the AI prompt are versioned together. When either changes — a new check, a reweight, a prompt revision — the change is recorded in the methodology changelog with the date and the rationale. Past scans are not re-scored. A scan's scored-at timestamp implicitly fixes which methodology version it ran under.
Questions or disputes
If a scan looks wrong to you, send the report URL and what you expected to hello@crawlranker.com. Disputes that surface a real methodology issue land in the changelog.