Do LLM Bots Actually Read JSON-LD, and Will They Cite It?

Live experiment

TL;DR - We built a secret page with a fact that lives only in its JSON-LD and nowhere in the visible content. A secret page with a secret fact. Major AI crawlers have already found the page. Over the coming days, weeks, and months I will ask various AIs about this mysterious term to see whether it gets discovered!

Scope: this tests whether structured data that sits in a page's source but not its visible render gets ingested and cited by AI answer engines. Vol. 02 covered whether those engines execute JavaScript at all; this is the next layer down.

Does JSON-LD Actually Feed the Answer?

Search engines have read structured data for years. The pitch every schema vendor makes is that this same invisible layer now feeds the AI engines too. Maybe it does. We wanted to actually test that, not assume it.

Forget JavaScript for a second. A ton of what a page tells a machine never shows up on screen for a human - JSON-LD, meta tags, HTML comments, stuff that sits in the source and never renders. So here is the question: if a fact lives only in the JSON-LD, with nothing visible backing it up anywhere on the page, will an AI engine swallow that fact and repeat it back when you ask? I honestly did not know going in.

What we built

One page. One coined framework buried inside it. The whole thing hinges on where that framework lives: its name and details sit only in the structured data and a few other non-visible layers. Not in the body copy, not in the headline, not in the meta description. Read the page the way a person does and you never see it.

To keep it honest, every layer carries its own marker, so a hit points back to exactly one source:

  • A visible control - a separate term written into the body copy that you can actually see on the page. If the engines repeat this one, I know the visible text got read, so silence on the structured data means something.
  • The JSON-LD layer - the framework and one coined sub-term live only here, and that sub-term exists nowhere else on the page or on the web.
  • A few other hidden layers - a custom meta tag, an HTML comment, a display:none block - each with its own marker, so I can see whether the engines treat them differently from JSON-LD.
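
To make the layering concrete, here is a rough sketch of how a page like this could be assembled. It is not the actual test page: the function name, the marker values, the meta tag name x-test-marker, and the choice of the schema.org DefinedTerm type are all placeholders I made up for illustration.

```python
import json
from html import escape


def build_test_page(markers: dict[str, str]) -> str:
    """Return an HTML page where each layer carries its own marker.

    `markers` maps a layer name (hypothetical keys: visible, json_ld,
    meta, comment, hidden_div) to the token planted in that layer.
    """
    # JSON-LD layer: the coined term lives only in this object.
    json_ld = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",  # assumed type, not necessarily the real one
        "name": markers["json_ld"],
        "description": "Placeholder description of the coined framework.",
    }
    # Escape "</" so the payload can never close the script element early.
    json_ld_text = json.dumps(json_ld, indent=2).replace("</", "<\\/")

    return f"""<!DOCTYPE html>
<html>
<head>
  <title>Test page</title>
  <meta name="x-test-marker" content="{escape(markers['meta'], quote=True)}">
  <script type="application/ld+json">
{json_ld_text}
  </script>
</head>
<body>
  <!-- {markers['comment']} -->
  <p>Visible body copy that mentions {escape(markers['visible'])}.</p>
  <div style="display:none">{escape(markers['hidden_div'])}</div>
</body>
</html>"""
```

The point of the structure is that every marker appears exactly once, in exactly one layer, so a hit on any marker can only have come from that layer.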

Every layer got a unique token. A coined sub-term cannot show up by accident - it does not exist anywhere I did not put it. So if an engine spits it back, it read the structured data. That is the whole trick.
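
If you want to try the same trick yourself, here is one way to mint the tokens. The real experiment uses coined words; this sketch swaps in random strings as a stand-in, and the layer names and the "zq" prefix are my own assumptions.

```python
import secrets

# Hypothetical layer names; use whatever layers your own page has.
LAYERS = ["visible", "json_ld", "meta", "comment", "hidden_div"]


def make_markers(layers: list[str], prefix: str = "zq") -> dict[str, str]:
    """Assign one random, distinct token to each layer."""
    markers: dict[str, str] = {}
    used: set[str] = set()
    for layer in layers:
        token = prefix + secrets.token_hex(4)
        # Collisions are wildly unlikely, but re-roll rather than assume.
        while token in used:
            token = prefix + secrets.token_hex(4)
        used.add(token)
        markers[layer] = token
    return markers
```

Random hex is easier to prove unique than a coined word, but a pronounceable term is easier to ask an engine about. Either way, search for the token before planting it to confirm it really returns nothing.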

How we are measuring success

Two tracks, same as the rest of this series.

First, did they even fetch the page? Our edge middleware logs every known AI crawler that hits the site - which bot, which path, what time. No guessing about whether the page got crawled.
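
For a sense of what that logging involves, here is a minimal sketch. It is not our middleware: the function name is hypothetical, and the user-agent list is a short illustrative sample of published crawler names, not the full set we track.

```python
from datetime import datetime, timezone
from typing import Optional

# Illustrative sample of AI crawler user-agent substrings (assumed list).
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
)


def log_ai_crawler_hit(
    user_agent: str, path: str, now: Optional[datetime] = None
) -> Optional[dict]:
    """Return a log record if the request came from a known AI crawler."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            timestamp = now or datetime.now(timezone.utc)
            return {"bot": token, "path": path, "time": timestamp.isoformat()}
    return None  # not a crawler on the list; nothing to log
```

User-agent strings can be spoofed, so a log like this shows who claimed to fetch the page. Checking the requesting IP against the ranges a vendor publishes would tighten that up.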

Second, the regurgitation test. Once the crawlers have had a few days, we ask the major engines about the planted terms: first the visible control, to confirm the visible text was ingested at all, then the structured-data-only term, to see if it comes back. That term has no other home, so any engine that returns it read the JSON-LD. I do not have to take any model's word for anything.
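
The scoring logic for that second track can be sketched in a few lines. This is a simplified stand-in, not our actual instrument: the function names and result labels are made up, and it assumes each question describes the planted term without naming it, since otherwise an engine that merely echoes the prompt would count as a hit.

```python
def contains_marker(answer: str, marker: str) -> bool:
    """Case-insensitive check for a planted token in an engine's answer."""
    return marker.casefold() in answer.casefold()


def classify_engine(
    control_answer: str, json_ld_answer: str, markers: dict[str, str]
) -> str:
    """Interpret one engine's two answers.

    `markers` needs the hypothetical keys "visible" and "json_ld".
    """
    control_seen = contains_marker(control_answer, markers["visible"])
    json_ld_seen = contains_marker(json_ld_answer, markers["json_ld"])

    if json_ld_seen:
        # The token exists only in the JSON-LD, so it was read there.
        return "json-ld ingested"
    if control_seen:
        # Visible text made it in, so silence on JSON-LD is meaningful.
        return "visible only"
    # Neither came back: the page may simply not be indexed yet.
    return "inconclusive"
```

The middle case is why the visible control exists. Without it, a miss on the JSON-LD term could mean the engine ignores structured data, or just that it has not processed the page at all.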

We Aren't Sharing the URL or the Secret Phrase

Two things stay quiet while this runs: the address of the test page, and the secret phrase planted on it. The phrase returns zero results anywhere right now, and that is the whole reason a later citation means anything. If I printed it here, on a page the search and AI crawlers already read, I would just be handing them the answer - and a citation after that proves nothing. Same goes for the URL. So both stay sealed until the measurement window closes, and then I will show you everything: the page, the phrase, and where it ended up.

What we expect

I do not know for sure. AI said it should happen fast; I think it will take weeks or months. My bet is that some AIs read the JSON-LD and some do not. If structured data turns out to be a live ingestion channel, that changes how hard a brand should lean on schema to state its facts. If it is a dead end, that is just as useful to know, and I will tell you what to do instead.

Follow along

The clock is running. As the bot log fills in and the engines start answering, I will update this post with the crawl timeline, the layer-by-layer results, and the full reveal of what I planted and where. Want the raw methodology or a look at the instrument once it is done? Let us know.