AI Content Detector · Site-wide, not just paragraphs

Why paragraph detection misses the point

A paragraph detector reads four sentences and makes a call. That’s about as much as it can do.

It cannot see the eight headings of identical length stacked one after the other. It cannot see the section ordering you’ve already met on a hundred AI-built landing pages this year. Structural fingerprints scale with the page. The sentence doesn’t. Google’s helpful-content systems weigh the former more every year.

A skilled copy editor can rewrite a paragraph past a text detector in fifteen minutes. The whole-page fingerprint is much harder to mask. So we look there.

What we read

Content

Lexical patterns, sentence-shape variance, hedged phrasing, factual specificity. We also count how often the body text reaches for common LLM-marker phrases. “Leverage.” “Seamlessly.” “In today’s fast-paced world.” “Unlock the power of.” The hit count appears in the report so you can grep your own copy for it later.
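If you want to run the same kind of count on your own copy, here is a minimal sketch. The phrase list is just the four examples above, not the full list the scanner uses, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Illustrative list only: the four phrases quoted above, lowercased.
MARKER_PHRASES = [
    "leverage",
    "seamlessly",
    "in today's fast-paced world",
    "unlock the power of",
]

def count_marker_phrases(text: str, phrases=MARKER_PHRASES) -> Counter:
    """Count case-insensitive, whole-word occurrences of each phrase."""
    # Normalise curly apostrophes so typeset copy matches the plain-ASCII list.
    normalised = text.replace("\u2019", "'").lower()
    hits = Counter()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        found = len(re.findall(pattern, normalised))
        if found:
            hits[phrase] = found
    return hits

sample = "Unlock the power of AI. Leverage our platform to seamlessly leverage your data."
print(count_marker_phrases(sample))
```

Whole-word matching means "leveraged" does not count as a hit for "leverage". Loosen the pattern if you want inflected forms too.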

Structure

Heading regularity, paragraph length distribution, list cadence, the templated section ordering AI page-builders produce. We detect the builder itself where we can: Framer, Durable, Mixo, Lovable, v0, Webflow, Wix, Squarespace. The builder name is itself a high-signal input.
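Heading regularity is easy to approximate yourself. The sketch below uses the coefficient of variation of word counts as a stand-in. That metric is an assumption chosen for illustration, not necessarily what the scanner computes, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional

def length_variation(items: list[str]) -> Optional[float]:
    """Coefficient of variation of word counts.

    0.0 means every item has the same length. Returns None when there
    are fewer than two items, because regularity is undefined there.
    """
    counts = [len(item.split()) for item in items]
    if len(counts) < 2:
        return None
    average = mean(counts)
    if average == 0:
        return None
    return pstdev(counts) / average

uniform = ["Build faster with AI", "Scale smarter with data", "Grow further with us"]
varied = ["Pricing", "What our customers told us last year", "FAQ"]

print(length_variation(uniform))
print(length_variation(varied))
```

The same function works on paragraphs: pass the paragraph texts instead of the headings. Values near zero suggest templated output, though a short page can land there by accident.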

Imagery

Filename patterns, alt-text repetition, the visible artifacts a diffusion model leaves behind. A grid of Unsplash photos with identical gradient overlays reads very differently from real custom photography, and the score reflects that.

Tone

Voice consistency across sections, register shifts, the emotional flatness LLMs default to when nobody edits the output. A distinct first-person voice or a joke that actually lands moves this score the other direction.

The evidence in the report

Findings don’t just describe. They cite. Each finding carries three things:

Up to three verbatim quotes pulled from the page, tagged with where we found them: body, heading, meta tag, image alt, link URL.
A one-sentence recommendation. The single most actionable thing you could do about it.
Severity and category so you can scan the page at a glance.
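One way to picture a finding as data, if you plan to process reports in your own tooling. This is a sketch with hypothetical class and field names, not the report's actual schema. The location tags come from the list above, the category names are the four signal groups, and the severity labels are assumed.

```python
from dataclasses import dataclass, field
from typing import Literal

# Location tags as listed above.
Location = Literal["body", "heading", "meta tag", "image alt", "link URL"]
# The four signal categories described earlier on this page.
Category = Literal["content", "structure", "imagery", "tone"]
# Assumed labels: the page does not name the severity levels.
Severity = Literal["low", "medium", "high"]

MAX_QUOTES = 3

@dataclass
class Quote:
    text: str           # verbatim text pulled from the page
    location: Location  # where on the page it was found

@dataclass
class Finding:
    category: Category
    severity: Severity
    recommendation: str  # one sentence
    quotes: list[Quote] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "up to three quotes" rule at construction time.
        if len(self.quotes) > MAX_QUOTES:
            raise ValueError(f"a finding carries at most {MAX_QUOTES} quotes")

finding = Finding(
    category="content",
    severity="medium",
    recommendation="Replace stock phrases with a concrete claim about the product.",
    quotes=[Quote("Unlock the power of automation", "heading")],
)
print(finding.quotes[0].location)
```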

Reports also surface a “what’s working” block listing the page’s strongest human-authorship signals. The score is a fingerprint of automation, not a quality judgment. A page can be 90% AI-built and still be good. We say so when it is.

How this differs from text-only detectors

Whole-page input. You paste a URL, not a paragraph. No copying.
Structural signals. Section ordering, heading cadence, detected builder. All first-class inputs.
Imagery signals. Filename patterns and alt-text repetition feed the score. Pure text detectors are blind to them.
SEO pairing. Every scan ships an SEO audit next to the AI score, so you can read the page through both lenses without switching tools.
Visible confidence. Every score carries a confidence percentage. Anything under 60% reruns on a stronger model before the report finalises.
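The confidence rule in the last item can be sketched in a few lines. Everything here is illustrative: the scanner functions are stubs, the field names are hypothetical, and the only detail taken from the text is the 0.6 (60%) floor.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_FLOOR = 0.6  # 60%, the threshold quoted in the text

@dataclass
class ScanResult:
    ai_score: float    # hypothetical field, 0.0 to 1.0
    confidence: float  # 0.0 to 1.0
    model: str         # which model produced the result

Scanner = Callable[[str], ScanResult]

def scan_with_escalation(url: str, fast: Scanner, strong: Scanner,
                         floor: float = CONFIDENCE_FLOOR) -> ScanResult:
    """Run the fast scanner; rerun once on the strong one if confidence is under the floor."""
    first = fast(url)
    if first.confidence >= floor:
        return first
    return strong(url)

# Stub scanners standing in for real model calls.
def fast_stub(url: str) -> ScanResult:
    return ScanResult(ai_score=0.70, confidence=0.45, model="fast")

def strong_stub(url: str) -> ScanResult:
    return ScanResult(ai_score=0.62, confidence=0.80, model="strong")

result = scan_with_escalation("https://example.com", fast_stub, strong_stub)
print(result.model)
```

The sketch escalates once and returns whatever the stronger model says, even if that result is still under the floor. A low-confidence second pass would then surface as a mixed-signal verdict rather than trigger another rerun.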

Run a free AI content scan

Paste a URL on the homepage. Under 30 seconds for most pages. No signup, no payment, no API key.

Scan a site →

Common questions

Does CrawlRanker work like GPTZero or Originality.ai?

Same category, different scope. GPTZero, Originality.ai, Copyleaks: those tools read a paragraph of pasted text and return a probability. Useful for an essay or a single passage. CrawlRanker reads a URL and grades the whole page across four signal categories. If you want to check a paragraph, use a paragraph detector. If you want to read a site, this is the right shape of tool.

Will my page be flagged just because I used AI to draft it?

Probably not. AI-drafted-then-human-edited pages score in the middle of the band, not the top. The signal carries real weight only when AI output gets published with no editorial pass and the structure, copy, imagery, and tone all point in the same direction at once. If you wrote the copy yourself but the layout came out of a template, expect a mixed-signal result. That’s the honest answer, not a forced verdict.

How accurate is the score?

Useful enough to act on, honest enough to admit when it isn’t sure. Thin pages, mixed authorship, languages we have less coverage on: those produce a lower confidence percentage and a “Mixed signals” verdict rather than a forced call. When confidence drops below 60% the scan automatically escalates to a stronger model and reruns.