The 2026 AI crawler roster: GPTBot, PerplexityBot, ClaudeBot, OAI-AdsBot—which one actually sends traffic?

Four bots crawl your site every day. Only one of them sends visitors back. Here’s the data, the decision matrix, and the robots.txt snippets I use on every audit.

By Alexander Kirsch-Clayton, Founder · May 22, 2026 · 8 min read

Your robots.txt is probably optimized for the wrong era

In every GEO audit I open, the first thing I check is robots.txt. And most German B2B sites are doing one of two things wrong: blocking the bots that send traffic, or letting all four AI crawlers in without understanding which one is actually worth feeding.

Those aren’t equivalent mistakes, but they share a root cause. The robots.txt on a typical mid-market site was written for a world with one crawler that mattered—Googlebot. That world is gone. There are now four AI crawlers worth naming individually, they have wildly different economics, and one of them only appeared in April 2026. If your rules haven’t changed since 2023, they’re answering a question nobody is asking anymore.

This post does two things. It gives you the identity card for each bot, and it pairs that with the one number that should drive your decision: how many pages each crawler takes before it sends a single visitor back.

Meet the 2026 AI crawler roster: four bots, four strategies

OpenAI alone runs four documented crawlers, each with a separate job and a separate robots.txt token: GPTBot for model training, OAI-SearchBot for ChatGPT Search results, ChatGPT-User for live browsing inside a chat, and OAI-AdsBot for validating ad landing pages. Add Perplexity and Anthropic and you have the four names profiled below, the ones that show up most in audit logs.

GPTBot—OpenAI

Token: GPTBot. Purpose: collects content to train OpenAI’s foundation models. This is the volume monster—its share of all crawler requests nearly tripled in a year, from 4.7% in July 2024 to 11.7% in July 2025. Within the AI-only bot category it went from 11.9% to 28.1%. It publishes a gptbot.json IP-range file, so you can verify it.

OAI-AdsBot—OpenAI (April 2026 newcomer)

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot. It visits the landing page behind an ad submitted to ChatGPT to check policy compliance. OpenAI states the data it collects is not used for training. It’s the one bot here you almost certainly haven’t written a rule for yet.

PerplexityBot—Perplexity

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot). Perplexity’s docs are explicit: it is a search-indexing bot, not a training crawler, and allowing it is the documented path to appearing in Perplexity results. Remember that sentence—it’s the whole argument later in this post.

ClaudeBot—Anthropic

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com). Anthropic describes its purpose as collecting web content that could contribute to training. It respects Disallow and Crawl-delay. It is, structurally, a training crawler—and that matters for what you get back from it.

The volume shock: training crawlers now account for more than half of AI bot traffic

You’ve probably seen the headline that ChatGPT crawls 3.6× more than Googlebot. It’s real, but read the fine print before you panic. An Alli AI study of 24 million proxy requests across 69 sites (January–March 2026) found that OpenAI’s ChatGPT-User crawler—the live-browsing one, not GPTBot—made 3.6× more requests than Googlebot in that sample. Across Cloudflare’s whole network the picture is calmer: Googlebot still leads GPTBot on share. So the 3.6× figure describes real-time fetches on a specific platform’s customer base, not GPTBot eating Google.

The bigger structural shift is this: in April 2026, training-dedicated crawlers crossed 50% of all tracked AI bot requests for the first time, hitting 51.5%. Googlebot’s share fell from 38.7% in January 2026 to 31.6% in March—the largest quarterly share loss for any single crawler in Cloudflare Radar’s history.

The metric that actually matters: crawl-to-referral ratio

Crawl volume is the wrong scoreboard. The right one is the crawl-to-referral ratio—how many pages a bot takes per single visitor it sends back. SEOmator’s 2026 GEO data, built on Cloudflare Radar analytics, lines them up cleanly (March 2026):

Googlebot—5:1 (five pages crawled per referral)
PerplexityBot—111:1
GPTBot—1,276:1
ClaudeBot—23,951:1

Sit with those numbers. Among AI-native crawlers, Perplexity returns one referral for every 111 pages it crawls; GPTBot needs 1,276—more than ten times the cost for the same visit. ClaudeBot needs nearly 24,000, because Anthropic has no search-referral mechanism comparable to Perplexity or ChatGPT Search. Claude responses occasionally cite a source, but those citations don’t produce clickable, trackable referrals.
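The comparison is plain division, and a minimal Python sketch makes the relative cost explicit. It uses only the four ratios quoted above; the names PAGES_PER_REFERRAL and relative_cost are illustrative, not part of any tool.

```python
# Crawl-to-referral ratios quoted above (pages crawled per referral, March 2026).
PAGES_PER_REFERRAL = {
    "Googlebot": 5,
    "PerplexityBot": 111,
    "GPTBot": 1_276,
    "ClaudeBot": 23_951,
}


def relative_cost(bot: str, baseline: str = "PerplexityBot") -> float:
    """How many times more pages `bot` crawls per referral than `baseline`."""
    return PAGES_PER_REFERRAL[bot] / PAGES_PER_REFERRAL[baseline]


for name in PAGES_PER_REFERRAL:
    print(f"{name:14s} {relative_cost(name):8.1f}x the crawl cost of PerplexityBot")
```

With PerplexityBot as the baseline, GPTBot comes out at roughly 11.5 times the crawl cost per visit and ClaudeBot at roughly 216 times.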

The most aggressive crawler is not the most valuable one. PerplexityBot is the highest-ROI AI crawler to optimize for in 2026—despite GPTBot dominating raw crawl volume—because it converts roughly one in every 111 crawls into an actual visit.

This reframes the whole exercise. You are not optimizing for crawl frequency. You’re optimizing for citation that links back. On that scale, GPTBot’s enormous volume is a cost, not a win.

OAI-AdsBot: the April 2026 entrant that needs its own rule

Here’s the trap I’m already seeing in audits. OAI-AdsBot appeared in OpenAI’s public crawler docs in April 2026, and people assume their existing GPTBot rule covers it. It does not. A User-agent: GPTBot directive has zero effect on OAI-AdsBot—they’re separate tokens, and you need a separate Disallow to control it.
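You can check this yourself with Python's standard-library robots.txt parser. This is a sketch: Python's parser stands in for how a well-behaved crawler matches tokens, and /pricing is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that only names GPTBot, as many sites do today.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the bare token, not the full user-agent string.
for token in ("GPTBot", "OAI-AdsBot"):
    verdict = "allowed" if parser.can_fetch(token, "/pricing") else "blocked"
    print(f"{token}: {verdict}")
```

GPTBot comes back blocked and OAI-AdsBot comes back allowed, because no group in the file names it and there is no wildcard group to fall back on.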

OAI-AdsBot validates the landing page behind an ad submitted to ChatGPT and checks it against OpenAI’s ad policies. The data isn’t used for training. There’s also a verification gap worth flagging: at the time it was documented, OpenAI had not published an adsbot.json IP-range file, unlike GPTBot’s gptbot.json. That makes it harder to confirm that traffic claiming to be OAI-AdsBot is genuine.

The commercial risk of conflating the two cuts both ways. Assume your GPTBot Disallow also covers ads, and OAI-AdsBot keeps fetching your landing pages under a rule you never chose. Block it by reflex alongside GPTBot, and you may quietly degrade how your landing pages are validated if you ever run ChatGPT ads. Decide on OAI-AdsBot deliberately, on its own line.

Robots.txt decision matrix: allow, disallow, or rate-limit?

Here’s the recommendation I default to, with the reasoning. Adjust for your own posture on training data—that’s a values call, not a technical one.

PerplexityBot—Allow. It’s a search bot with the best referral economics of any AI crawler. Blocking it removes you from Perplexity results for no upside.
OAI-SearchBot—Allow. Surfaces you in ChatGPT Search; a referral channel, not training.
ChatGPT-User—Allow. Live fetches triggered by a real person asking about you in ChatGPT.
GPTBot—Your call, lean disallow. Pure training, 1,276:1 economics. It’s also the most-blocked AI crawler on Cloudflare’s network, appearing in 5.52% of all Disallow rules.
ClaudeBot—Disallow unless you actively want to contribute training data. 23,951:1 means effectively no traffic back. It’s the fastest-growing blocked bot, rising from 9.6% to 10.1% of blocking rules in early 2026.
OAI-AdsBot—Allow only if you run or plan to run ChatGPT ads; otherwise disallow. Either way, give it its own line.

Config 1—allow the referrers, block the training crawlers. This is what I recommend for most B2B sites: stay visible in AI search, opt out of being training fodder.

User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-AdsBot
Disallow: /

Config 2—maximum visibility. You’re comfortable with training in exchange for the widest possible AI surface area.

User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: ClaudeBot
Allow: /

User-agent: OAI-AdsBot
Disallow: /

Config 3—ease the load without going dark. If training crawlers are straining your server, slow them rather than ban them. Note that not every bot honors Crawl-delay—GPTBot in particular ignores it—so treat this as best-effort and pair it with edge rate-limiting.
That is why this config slows ClaudeBot, which respects Crawl-delay, but still disallows GPTBot outright.

User-agent: ClaudeBot
Crawl-delay: 30
Disallow: /search
Disallow: /cart

User-agent: GPTBot
Disallow: /
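Before deploying a config like this, it is worth confirming it parses the way you expect. A minimal sketch with Python's standard-library parser, which reads Crawl-delay as well as Allow and Disallow; /blog/post is a hypothetical path, and the parser only tells you what the file says, not whether a given bot obeys it.

```python
from urllib.robotparser import RobotFileParser

CONFIG_3 = """\
User-agent: ClaudeBot
Crawl-delay: 30
Disallow: /search
Disallow: /cart

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CONFIG_3.splitlines())

# ClaudeBot: slowed down and kept out of two sections, otherwise welcome.
print("ClaudeBot delay:", parser.crawl_delay("ClaudeBot"))
print("ClaudeBot /search:", parser.can_fetch("ClaudeBot", "/search"))
print("ClaudeBot /blog/post:", parser.can_fetch("ClaudeBot", "/blog/post"))

# GPTBot: blocked everywhere, with no delay value of its own.
print("GPTBot /blog/post:", parser.can_fetch("GPTBot", "/blog/post"))
print("GPTBot delay:", parser.crawl_delay("GPTBot"))
```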

Content strategy implication: write for Perplexity first, rank in Google second

If PerplexityBot returns a visitor every 111 crawls and it’s documented as a search bot you simply have to allow, then your citable editorial content has a clear primary target for the back half of 2026. Write the page that answers one specific question completely, structure it so a machine can lift a clean answer, allow PerplexityBot, and you’ve built the highest-ROI AI traffic channel available.

Google still matters—its 5:1 ratio is in a different league because traditional search drives enormous referral volume. This isn’t “abandon SEO.” It’s a sequencing call: the same structural choices that make a page quotable in Perplexity (a direct answer up top, clear headings, real specifics) are the choices that help in Google too. Write for the strict reader first. The lenient one comes along for free.

My 5-point AI crawler health check

This is the sequence I run on every GEO audit engagement. You can do the first four yourself this week.

1. Pull 30 days of server logs and segment by user-agent. Before you change a single rule, find out which of the four bots are actually hitting you and how hard. Most owners are surprised.
2. Read your current robots.txt out loud, token by token. Confirm whether each AI bot is allowed, disallowed, or silently uncovered. “Uncovered” usually means allowed by default, and that includes OAI-AdsBot.
3. Check for the OAI-AdsBot blind spot specifically. If you have a GPTBot rule and assumed it covers ads, fix that now with its own line.
4. Verify the high-volume crawlers against published IP ranges (gptbot.json, Perplexity’s ranges). Spoofed user-agents are common; don’t trust the string alone.
5. Map crawl volume against referral value. If GPTBot and ClaudeBot are your top two by volume but send no traffic, you’re subsidizing training at the cost of crawl budget. Decide if that’s intentional.
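The robots.txt read-through and the OAI-AdsBot blind-spot check can be scripted. The sketch below is one way to do it, not a standard tool: the function names and the SAMPLE file are hypothetical, it only tests access to the site root, and it treats a token as "named" only on an exact match.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this post; extend the tuple for your own audit.
AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "OAI-AdsBot", "PerplexityBot", "ClaudeBot")


def named_agents(robots_txt: str) -> set:
    """Lower-cased user-agent tokens that the file names explicitly."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents


def audit(robots_txt: str) -> dict:
    """Map each AI token to '<access> (<which rule decided it>)'."""
    named = named_agents(robots_txt)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for token in AI_TOKENS:
        if token.lower() in named:
            source = "own rule"
        elif "*" in named:
            source = "wildcard rule"
        else:
            source = "uncovered"
        # Only the site root is tested here; path-level rules need more checks.
        access = "allowed" if parser.can_fetch(token, "/") else "disallowed"
        report[token] = f"{access} ({source})"
    return report


# Hypothetical robots.txt with the blind spot described above.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

for token, verdict in audit(SAMPLE).items():
    print(f"{token:14s} {verdict}")
```

On this sample, GPTBot reports as disallowed by its own rule, PerplexityBot as allowed by its own rule, and the other four, OAI-AdsBot included, as allowed because nothing covers them.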

FAQs

Should I block GPTBot?

For most B2B sites, leaning toward disallow is defensible: GPTBot is pure training with a 1,276:1 crawl-to-referral ratio, and it’s already the most-blocked AI crawler on Cloudflare’s network. Blocking it costs you almost no traffic. The trade-off is opting out of training data contribution, which is a values decision, not a technical one.

What is OAI-AdsBot, and do I need a separate rule?

OAI-AdsBot, documented by OpenAI in April 2026, validates landing pages behind ads submitted to ChatGPT. Yes, you need a separate rule—a GPTBot directive does not control it. They’re distinct user-agent tokens. The data it gathers is not used for training.

Which AI crawler drives the most referral traffic in 2026?

Among AI-native crawlers, PerplexityBot is the most referral-efficient by a wide margin—one referral per 111 pages crawled, versus 1,276 for GPTBot and 23,951 for ClaudeBot. Googlebot still leads overall at 5:1, but it isn’t AI-native. If you’re optimizing specifically for AI search referrals, optimize for Perplexity.

Does allowing AI crawlers hurt my Google ranking?

Allowing search-oriented AI bots like PerplexityBot or OAI-SearchBot has no bearing on Googlebot, which crawls under its own token and ranks on its own signals. The structural work that makes a page citable in AI answers—a clear lead answer, clean headings, real specifics—tends to help in Google too.

How do I verify which bots are actually crawling my site?

Segment your server logs by user-agent over a 30-day window, then verify the high-volume hits against each provider’s published IP-range file (for example gptbot.json). User-agent strings can be spoofed, so the IP check matters. Note that OAI-AdsBot had no published range file when it was documented, which makes it harder to verify than its siblings.
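A minimal sketch of that two-step check, assuming you have already extracted (IP, user-agent) pairs from your logs and loaded each provider's published CIDR blocks yourself. The function names are illustrative, the user-agent strings are simplified, and the IP blocks are documentation-only placeholders, not any provider's real ranges.

```python
import ipaddress
from collections import Counter
from typing import Optional

AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "OAI-AdsBot", "PerplexityBot", "ClaudeBot")


def bot_token(user_agent: str) -> Optional[str]:
    """Return the first AI crawler token found in a user-agent string."""
    lowered = user_agent.lower()
    for token in AI_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def is_in_ranges(ip: str, cidrs: list) -> bool:
    """True if `ip` falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


def segment(hits, published_ranges: dict) -> Counter:
    """Count hits per (bot, status); hits is an iterable of (ip, user_agent)."""
    counts = Counter()
    for ip, user_agent in hits:
        token = bot_token(user_agent)
        if token is None:
            continue  # not one of the AI crawlers we track
        cidrs = published_ranges.get(token)
        if not cidrs:
            status = "no published ranges"
        elif is_in_ranges(ip, cidrs):
            status = "verified"
        else:
            status = "possible spoof"
        counts[(token, status)] += 1
    return counts


# Placeholder data only: reserved documentation IP blocks, simplified strings.
RANGES = {"GPTBot": ["192.0.2.0/24"]}
HITS = [
    ("192.0.2.10", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("198.51.100.7", "Mozilla/5.0 (compatible; GPTBot/1.0)"),
    ("198.51.100.8", "Mozilla/5.0 (compatible; OAI-AdsBot/1.0)"),
    ("203.0.113.5", "Mozilla/5.0 (Windows NT 10.0)"),
]

for (token, status), n in sorted(segment(HITS, RANGES).items()):
    print(f"{token:12s} {status:20s} {n}")
```

The "no published ranges" bucket is where OAI-AdsBot lands for as long as there is no range file to check it against.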

Next step: get your site’s full AI crawler profile

We run exactly this check as part of every GEO audit. What we typically find on German B2B sites: a robots.txt frozen in 2023, GPTBot and ClaudeBot quietly consuming the bulk of crawl budget for zero traffic, PerplexityBot either blocked or never deliberately allowed, and no rule at all for OAI-AdsBot.

Fixing the first three is mostly a 30-minute job once you have the log data. Knowing which trade-offs are right for your business is the part worth getting a second pair of eyes on. If you want your full AI crawler profile—who’s crawling, what they take, what they send back—that’s where our audit starts. Flat rates starting at €99 net.