Is noindex an AI shield, or just a search instruction?
Everybody assumes that if a page is set to noindex, it stays out of the AI engines too. I was not sure that was true. noindex tells search engines one thing: do not list this page in results. But AI answer engines are not search engines, and they do not all play by the same rules.
So, two questions. Will an AI crawler fetch and ingest a noindex page at all? And if the page is in no index anywhere, can a link people pass around by hand - a post, a forwarded email - still drag its content into an AI answer? If the answer to both is yes, then noindex is not the shield people think it is, and sharing is its own way into the engines that has nothing to do with search.
What we built
One page, set to noindex with a meta-robots tag. We left it out of the sitemap, out of llms.txt, and we did not link to it from anywhere on the site. As far as the normal discovery machinery is concerned, the page does not exist.
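For anyone who wants to confirm a page really carries the directive before running a test like this, here is a minimal sketch in Python. It is not part of our setup. The `SAMPLE_PAGE` markup and the helper names are made up for illustration, and it only checks the meta-robots tag, not an `X-Robots-Tag` response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if any meta-robots tag in the markup lists noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup, standing in for the real test page.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex">
<title>Test page</title>
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```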
On it sits a coined term that returned zero results anywhere before launch and lives only on this one page. We kept it out of the messages we use to share the link - those carry the bare URL and a vague teaser, nothing else. That gap is the whole point. The term shows up nowhere in the share text, so any engine that later repeats it had to fetch the page itself, not just read our posts about it.
Then we hand it out on purpose - a few channels, staggered over several days, so each share can be lined up against the crawl log. We stay off search-owned surfaces too, so the only way search itself reaches the content is by crawling the noindex page directly.
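Lining a fetch up against staggered shares could look something like the sketch below. The channel names and dates are placeholders, not our real schedule, and the rule it applies (credit the most recent share at or before the fetch) is an assumption about what "most likely" means. A late fetch could still trace back to an earlier share.

```python
from datetime import datetime
from typing import Dict, Optional

# Hypothetical share schedule: channel name -> time the link went out.
SHARES: Dict[str, datetime] = {
    "channel-a": datetime(2025, 1, 6, 9, 0),
    "channel-b": datetime(2025, 1, 8, 9, 0),
    "channel-c": datetime(2025, 1, 10, 9, 0),
}


def likely_channel(fetch_time: datetime,
                   shares: Dict[str, datetime]) -> Optional[str]:
    """Return the most recent share at or before the fetch, or None."""
    earlier = {name: sent for name, sent in shares.items() if sent <= fetch_time}
    if not earlier:
        # The fetch predates every share, so no share can explain it.
        return None
    return max(earlier, key=earlier.get)


if __name__ == "__main__":
    print(likely_channel(datetime(2025, 1, 9, 12, 0), SHARES))
```

The wider the gap between shares, the more weight this kind of attribution can carry.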
How we are measuring it
Same two tracks as the rest of this series. Crawl confirmation comes from our edge middleware, which logs every known AI crawler, the path it asked for, and the time. Because the channels go out on different days, the log tells us more than whether the page got fetched - it tells us which share most likely set it off.
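To make the logging idea concrete, here is a rough Python stand-in. It is not our actual middleware. The token list is an illustrative, incomplete set of user-agent substrings that some AI crawlers are known to send, and `CrawlHit`, `match_crawler`, and `record_hit` are names invented for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Illustrative and incomplete. A real list needs regular upkeep.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
)


@dataclass
class CrawlHit:
    crawler: str       # which token matched
    path: str          # the path the crawler asked for
    seen_at: datetime  # when the request arrived (UTC)


def match_crawler(user_agent: str) -> Optional[str]:
    """Return the first known token found in the User-Agent, if any."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def record_hit(user_agent: str, path: str, log: List[CrawlHit],
               now: Optional[datetime] = None) -> Optional[CrawlHit]:
    """Append a CrawlHit to the log when the request is from a known crawler."""
    crawler = match_crawler(user_agent)
    if crawler is None:
        return None
    hit = CrawlHit(crawler, path, now or datetime.now(timezone.utc))
    log.append(hit)
    return hit
```

User-Agent strings can be spoofed, so a match here is a lead rather than proof. Checking the request against the crawler operator's published IP ranges would be the stricter test.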
Then the regurgitation test: ask the major engines about the coined term after a crawl window. A hit means the noindex page was read and ingested anyway, despite carrying every do-not-index signal short of blocking the crawler outright.
What we are withholding, and why
One thing I am holding back: the coined term itself. That stays sealed until the run is over. Naming it on an indexed post would put it in front of the engines as plain text, and then a later citation would prove nothing - the whole test rests on that term existing nowhere a crawler can already reach. The page is a different story. I am not linking it from here, but the TL;DR up top tells you how to find it, and I am fine with that. A crawler that stumbles onto the page through this post instead of a shared link muddies which channel gets the credit, not whether noindex held. The term is the part that has to stay quiet.
What we expect
On the record, so you can measure me against it later. My bet is that at least one engine fetches the page off a shared link despite noindex, because fetching and indexing are different pipes and the crawlers do not all treat the directive the same way. The negative case matters just as much. If nothing comes back, the bot log still tells us which of two stories it is: a fetch with no later citation means the page was read and the noindex directive was then respected, and no fetch at all means the shares never reached a crawler. Those are very different outcomes, and we can tell them apart.
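Spelled out as a decision table, the outcomes look roughly like this. The function name and labels are mine, for illustration. The fourth branch, a citation with no logged fetch, is not one of the stories above. I am adding it as an assumption, because it would point to a leak of the term or a crawler the log does not recognize.

```python
def classify_outcome(fetched: bool, cited: bool) -> str:
    """Map the two measurements (bot log, engine answers) to a reading."""
    if fetched and cited:
        return "read and ingested despite noindex"
    if fetched:
        return "read, noindex then respected"
    if cited:
        # Assumption: not covered by the original write-up.
        return "cited without a logged fetch: suspect a leak or an unlogged crawler"
    return "shares never reached a crawler"


if __name__ == "__main__":
    for fetched in (True, False):
        for cited in (True, False):
            print(fetched, cited, "->", classify_outcome(fetched, cited))
```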
Follow along
The clock is running. As the bot log fills in and the engines start answering, I will update this post with the crawl timeline, the channel attribution, and the full reveal of what I planted. Want the methodology once the run is done? Let us know.