How LLMs decide which sources to cite

What ChatGPT, Claude, and Perplexity actually read from a site — and why clean schema.org annotations move the needle more than a third hero slider.

When a prospective customer asks ChatGPT or Claude for "the best WordPress agency in Berlin," what happens behind the answer has very little to do with classical SEO. The model isn't simply fetching the position-one URL. It's reading a handful of pages, extracting verifiable claims, and deciding which of them it can quote without being caught out. The sites that show up have built a better case for themselves; they have not repeated the keyword more often.

What an LLM actually reads

Short version: a lot less than the page shows a human. Hero carousels, animated backgrounds, and slogan-heavy marketing copy live in a visual layer the language model either never sees or cannot extract a claim from. What counts is the structured layer underneath: <script type="application/ld+json"> blocks, semantic headings, paragraphs with clear claims, and a reachable /llms.txt at the site root.
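To make that structured layer concrete, here is a minimal Python sketch, using only the standard library's html.parser, of what a text-only reader can pull out of a page: the parsed JSON-LD blocks, and nothing from the carousel. It is an illustration under that assumption, not a description of how any specific engine's crawler works; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; only the JSON-LD MIME type matters here.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies are passed through verbatim; everything else is ignored.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError:
                # A malformed block is as good as missing for a machine reader.
                pass


def extract_jsonld(html: str) -> list[object]:
    """Return every well-formed JSON-LD payload found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Fed a page whose only JSON-LD block is an Organization node, extract_jsonld returns a one-element list containing that node; the hero markup contributes nothing, and a block with a JSON syntax error is silently dropped.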

Concretely: when a model is deciding whether your agency is a fair answer, it looks for machine-readable signals on three fronts: identity (who is this?), provability (what can the claim be tied to?), and recency (is this still current?). Miss any of the three and the page tends to drop out of the answer set, even when its content is genuinely better than the competition's.

The three layers that matter

1. Identity: schema.org Organization

Without a full Organization annotation — with name, founder, foundingDate, address, contactPoint, and a sameAs array that resolves to real external profiles — an LLM can't tell whether the site behind the claim represents a real, checkable entity. The most common failure mode: a logo and a company name, but no structured data. The most common fix: one edit to the layout template that covers the whole site at once.
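A minimal sketch of that Organization node, built in Python and serialized into the <script type="application/ld+json"> tag a layout template would emit. The property names are schema.org's; the helper names, the `#organization` @id convention, and every value used with them are placeholders.

```python
import json


def organization_jsonld(
    *,
    name: str,
    url: str,
    founder_name: str,
    founding_date: str,
    street: str,
    postal_code: str,
    locality: str,
    country: str,
    email: str,
    telephone: str,
    same_as: list[str],
) -> dict:
    """Build a schema.org Organization node with the fields the identity layer needs."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # A stable @id lets other nodes (e.g. an Article's publisher) reference this one.
        "@id": f"{url.rstrip('/')}/#organization",
        "name": name,
        "url": url,
        # founder is a Person node, not the brand name as a string.
        "founder": {"@type": "Person", "name": founder_name},
        "foundingDate": founding_date,  # ISO 8601 date, taken from the public register
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "postalCode": postal_code,
            "addressLocality": locality,
            "addressCountry": country,
        },
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "customer service",
            "email": email,
            "telephone": telephone,
        },
        # External profiles that resolve to the same entity.
        "sameAs": same_as,
    }


def to_script_tag(node: dict) -> str:
    """Serialize a node into a JSON-LD script tag for the site-wide layout template."""
    # ensure_ascii=False keeps umlauts readable; "<\/" stops the payload from closing the tag early.
    payload = json.dumps(node, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Because the node is generated once and rendered from the layout, every page carries the same identity block without per-page work.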

2. Provability: citation and sameAs

LLMs prefer to quote claims that line up with something external. Saying "specializing in WordPress migrations since 2019" without a register filing, a LinkedIn profile, or an older post anchoring the date leaves the model with a free-floating statement. A populated sameAs list on your Organization node and explicit citation fields on Article nodes change the math: they give the model something outside your own site to check the claim against.
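Sketched the same way, an Article node can point at the Organization by its @id instead of restating it, and list external sources in schema.org's citation property, which accepts a CreativeWork or plain text. The helper name, the @id, and the example URLs are hypothetical.

```python
def article_jsonld(
    *,
    headline: str,
    page_url: str,
    org_id: str,
    date_published: str,
    date_modified: str,
    citations: list[str],
) -> dict:
    """Article node whose claims point at external, checkable sources."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": page_url,
        # Reference the site-wide Organization node by @id rather than duplicating it.
        "publisher": {"@id": org_id},
        "datePublished": date_published,
        "dateModified": date_modified,
        # Each citation is a CreativeWork identified by the URL it lives at.
        "citation": [{"@type": "CreativeWork", "url": u} for u in citations],
    }
```

Linking by @id keeps one authoritative identity node, so a correction to the Organization block propagates to every Article that references it.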

3. Recency: dateModified and /llms.txt

Models with live web access (ChatGPT with search, Perplexity, Google AI Overviews) tend to weight fresher pages higher. A precise dateModified on every page (the real last content change, not the build timestamp) signals that the claim still holds. A /llms.txt at the root that explains the site in 15 lines saves AI crawlers from reconstructing the same answer from 40 nested templates.
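Two small helpers illustrate the recency layer, both hypothetical. The first moves dateModified only when a hash of the whitespace-normalized page text changes, so a redeploy leaves the date alone. The second renders a short /llms.txt roughly following the layout of the llmstxt.org proposal (an H1 name, a blockquote summary, then sections of links). Where the previous hash and date are stored is left open; a database column or a front-matter field would both work.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_fingerprint(text: str) -> str:
    # Collapse whitespace so reformatting or a rebuild does not count as an edit.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def resolve_date_modified(
    text: str,
    stored_hash: Optional[str],
    stored_date: Optional[str],
    now: Optional[datetime] = None,
) -> tuple[str, str]:
    """Return (fingerprint, dateModified); the date moves only when the content does."""
    fingerprint = content_fingerprint(text)
    if fingerprint == stored_hash and stored_date is not None:
        return fingerprint, stored_date  # unchanged content keeps its original date
    if now is None:
        now = datetime.now(timezone.utc)
    return fingerprint, now.isoformat(timespec="seconds")


def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a short /llms.txt: H1 name, blockquote summary, then link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        # Each entry is (title, url, note), rendered as a markdown link plus a short note.
        lines.extend(f"- [{title}]({url}): {note}" for title, url, note in links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Hashing the rendered content rather than trusting file timestamps is the design choice here: a CI rebuild touches every file, but it does not change a single sentence a model could quote.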

What to do this week

  1. Fill out the Organization JSON-LD properly: founder as a real Person node, foundingDate from the public register, three or more sameAs URLs pointing at external profiles, a contactPoint with email and phone.
  2. Publish a /llms.txt — 10 to 15 lines is enough. Who you are, what you do, where you operate, which third-party sources corroborate the rest.
  3. Read dateModified off the actual content-change timestamp on every meaningful page, not hard-coded build dates.
  4. Ship a real FAQ page with FAQPage schema. Models pull FAQ answers disproportionately often as quotable snippets.
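For the FAQ item, a minimal FAQPage builder looks like this. Each visible question becomes a Question node with an acceptedAnswer. The helper name and the sample question are hypothetical, and the markup should mirror the FAQ text actually shown on the page rather than add answers readers can't see.

```python
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """FAQPage node: each (question, answer) pair becomes a Question with an acceptedAnswer."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    # Placeholder content; use the questions and answers that appear on the page.
    node = faq_page_jsonld([("Which cities do you serve?", "Berlin and remote clients.")])
    print(json.dumps(node, indent=2))
```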

None of this is exotic and none of it is a quarter-long project. Three of the four items are a single template edit. Hour for hour, the payoff beats any SEO move that worked between 2010 and 2022.

What we see in audits

In every AI-visibility audit we run at Digital Domination, at least one of the three layers is incomplete. Often /llms.txt is missing entirely. Often the founder field holds the brand name where a Person node should be. Often dateModified reflects the last deploy, not the last edit. Each of those is its own lever. Together, they decide whether the model picks the page up as a source or filters it out as marketing noise. Those signals only pay off if the engines actually change what they say about you, so it is worth checking the output, not just the markup. That is why we built Cited: it watches how five AI engines describe a brand over time, so a fix to schema or /llms.txt can be tied to whether the citations actually move.
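The three recurring findings translate fairly directly into automated checks. The sketch below is hypothetical and is not the audit tooling itself: the threshold of three sameAs URLs comes from the checklist earlier in this post, and comparing dateModified with the last deploy timestamp is only a heuristic for spotting build-generated dates.

```python
from typing import Any


def audit_page(
    org: dict[str, Any],
    page_date_modified: str,
    last_deploy: str,
    has_llms_txt: bool,
    min_same_as: int = 3,
) -> list[str]:
    """Flag the recurring gaps: founder not a Person, thin sameAs, deploy-date dateModified, no /llms.txt."""
    findings: list[str] = []

    # founder may be a single node or a list of nodes; every entry should be a Person.
    founder = org.get("founder")
    founders = founder if isinstance(founder, list) else ([] if founder is None else [founder])
    if not founders or not all(
        isinstance(f, dict) and f.get("@type") == "Person" for f in founders
    ):
        findings.append("founder is missing or is not a Person node")

    # sameAs may be serialized as a single string or as an array.
    same_as = org.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    if len(same_as) < min_same_as:
        findings.append(f"sameAs lists {len(same_as)} profile(s), fewer than {min_same_as}")

    # Heuristic: an exact match with the last deploy suggests the date comes from the build.
    if page_date_modified == last_deploy:
        findings.append("dateModified equals the last deploy timestamp")

    if not has_llms_txt:
        findings.append("/llms.txt is missing")

    return findings
```

An empty list means none of the three layers shows one of these specific gaps; it says nothing about whether the engines will actually cite the page.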