
robots.txt for AI Crawlers — Complete Setup Guide

Configure your robots.txt to allow ChatGPT, Perplexity, Google AI Overview and other AI crawlers. Copy-paste templates included.

GEOClarity · Updated March 13, 2026 · 10 min read

TL;DR — Key Takeaways

  • robots.txt is the #1 technical GEO fix — if AI crawlers are blocked, your site is completely invisible to ChatGPT, Perplexity, and all AI engines
  • AI crawlers strictly obey robots.txt — unlike Google which may index through links, blocked AI bots will never see your content
  • Allow at minimum: GPTBot, ChatGPT-User, PerplexityBot, Google-Extended, and ClaudeBot — these cover the major AI engines
  • Watch for wildcard blocks — a User-agent: * with Disallow: / blocks everything, including all AI crawlers
  • Changes take effect within 24-48 hours — citation improvements typically follow within 1-2 weeks after re-indexing
  • Combine robots.txt with llms.txt — robots.txt controls access while llms.txt provides AI-specific context about your site structure

Your robots.txt file is the single most impactful GEO fix. If AI crawlers are blocked, your site is invisible to ChatGPT, Perplexity, and every other AI engine. This 5-minute change can unlock AI citations.

Why robots.txt Matters for GEO

Your robots.txt file determines whether AI engines can access your content at all. Unlike Google, which may still index blocked pages through external links, AI crawlers like GPTBot and PerplexityBot strictly obey robots.txt — a single Disallow rule can make your entire site invisible to every AI search engine simultaneously.


AI engines use crawler bots to read your website. These bots check robots.txt before accessing any page. A single Disallow rule can make your entire site invisible to AI search.

Unlike Google, which may still index blocked pages through external links, AI crawlers strictly obey robots.txt. No access means zero citations.

AI Crawler Bots You Must Allow

There are at least 10 major AI crawler bots active in 2026, each serving a different AI engine. The five essential ones to allow are GPTBot and ChatGPT-User (OpenAI), PerplexityBot (Perplexity), Google-Extended (Google AI/Gemini), and ClaudeBot (Anthropic). Missing even one means zero visibility on that platform.

Bot | Engine | User-Agent
GPTBot | ChatGPT (training) | GPTBot
ChatGPT-User | ChatGPT (browsing) | ChatGPT-User
PerplexityBot | Perplexity | PerplexityBot
Google-Extended | Google AI / Gemini | Google-Extended
ClaudeBot | Claude / Anthropic | ClaudeBot
Amazonbot | Alexa / Amazon | Amazonbot
Applebot-Extended | Apple Intelligence | Applebot-Extended
Meta-ExternalAgent | Meta AI | Meta-ExternalAgent
cohere-ai | Cohere | cohere-ai
Bytespider | TikTok AI | Bytespider
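Because the bot list keeps growing, it can help to generate the allow-all rules from a single list instead of hand-editing the file. A minimal Python sketch (the bot names are the user-agent tokens from the table above; the function name and everything else is our own illustrative code):

```python
# AI crawler user-agent tokens to allow (from the table above).
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "Google-Extended",
    "ClaudeBot", "Amazonbot", "Applebot-Extended",
    "Meta-ExternalAgent", "cohere-ai", "Bytespider",
]

def build_robots_txt(bots, sitemap_url=None):
    """Build a robots.txt body with an Allow-all block per bot."""
    blocks = [f"User-agent: {bot}\nAllow: /" for bot in bots]
    if sitemap_url:
        blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"

print(build_robots_txt(AI_BOTS, "https://yourdomain.com/sitemap.xml"))
```

Regenerating the file from one list keeps the rules consistent when a new crawler appears.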

Copy this complete template into your robots.txt file to allow all major AI crawlers. It explicitly grants access to GPTBot, ChatGPT-User, PerplexityBot, Google-Extended, ClaudeBot, and five additional AI bots — ensuring maximum AI search visibility while maintaining standard search engine access.

Copy this directly into your robots.txt file:

## Allow all AI crawlers for GEO
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: Bytespider
Allow: /

## Standard search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

## Sitemap
Sitemap: https://yourdomain.com/sitemap.xml
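Before uploading, you can sanity-check that the rules parse the way you expect with Python's standard-library robots.txt parser (the rules string below is an abridged copy of the template; the test paths are arbitrary examples):

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the template above.
rules = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Both AI crawlers should be allowed to fetch any page.
print(parser.can_fetch("GPTBot", "/blog/some-post"))     # True
print(parser.can_fetch("PerplexityBot", "/products/x"))  # True
```

The same parser is what many Python crawlers use internally, so if it allows a path, well-behaved bots will too.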

How to Check Your Current Setup

Check your current robots.txt by visiting yourdomain.com/robots.txt directly in a browser. The most common problem is a wildcard User-agent: * with Disallow: / — this blocks everything including all AI crawlers. CMS defaults, CDN security rules, and hosting provider configurations frequently create these blocks without your knowledge.

Visit yourdomain.com/robots.txt in your browser and look for any Disallow rules targeting AI bot user-agents. Common problematic blocks:

## REMOVE THESE — they block AI visibility
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /

The wildcard User-agent: * with Disallow: / is the most damaging of these: it blocks every crawler, AI bots included, from your entire site.
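The effect of a wildcard block is easy to demonstrate with Python's standard-library robots.txt parser (a sketch; the paths are arbitrary examples):

```python
from urllib.robotparser import RobotFileParser

# The problematic wildcard block from above.
blocked = RobotFileParser()
blocked.parse("""\
User-agent: *
Disallow: /
""".splitlines())

# Every crawler, AI bots included, is denied everywhere.
print(blocked.can_fetch("GPTBot", "/"))              # False
print(blocked.can_fetch("PerplexityBot", "/blog/"))  # False
```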

Selective Access

You can grant AI crawlers access to public content areas like /blog/ and /products/ while blocking sensitive directories like /admin/, /api/, and /private/. This selective approach maximizes AI visibility for your marketing content while maintaining security for internal tools and customer data.

If you need to protect certain areas while allowing AI access to content:

User-agent: GPTBot
Allow: /blog/
Allow: /products/
Allow: /about/
Disallow: /admin/
Disallow: /api/
Disallow: /private/
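You can verify that the selective rules behave as intended with the same standard-library parser (a sketch mirroring the snippet above):

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules from the snippet above.
rules = """\
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Allow: /about/
Disallow: /admin/
Disallow: /api/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("GPTBot", "/blog/my-post"))  # True: allowed
print(parser.can_fetch("GPTBot", "/admin/login"))   # False: blocked
```

Note that paths matching no rule at all (say, /contact) default to allowed, so list every sensitive directory explicitly.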

Common Mistakes

Four mistakes account for most robots.txt-related AI visibility failures: CMS platforms generating restrictive defaults, CDNs like Cloudflare blocking bot traffic in security settings, forgetting to allow ChatGPT-User alongside GPTBot, and over-aggressive server-level rate limiting that throttles legitimate AI crawl requests.

  • CMS defaults — WordPress and Next.js sometimes generate restrictive robots.txt files
  • CDN blocking — Cloudflare and other CDNs may block bot traffic by default in security settings
  • Forgetting ChatGPT-User — GPTBot handles training data; ChatGPT-User handles live browsing. You need both
  • Rate limiting — Some server configs throttle bot requests too aggressively

How Quickly Do Changes Take Effect?

Expect AI crawlers to detect your robots.txt updates within 24-48 hours, with citation improvements appearing 1-2 weeks later as crawlers re-index your content. This timeline means robots.txt fixes are among the fastest GEO wins — a 5-minute change today can unlock AI citations within two weeks.

AI crawlers typically discover robots.txt changes within 24-48 hours. Citation improvements usually appear within 1-2 weeks after crawlers re-index your content.


How to Test Your robots.txt Configuration

Always test your robots.txt before deploying to production; a misconfigured file can accidentally block all crawlers or expose sensitive directories. Use Google's robots.txt tester in Search Console for syntax validation, then verify manually with curl -I to confirm the correct HTTP status code and Content-Type header.

Google’s robots.txt Tester

Google Search Console includes a robots.txt testing tool that validates syntax and shows which URLs are blocked or allowed. While this only tests Googlebot rules directly, the syntax applies to all crawlers.

Steps to test:

  1. Open Google Search Console for your domain
  2. Navigate to Settings → robots.txt
  3. Paste your robots.txt content
  4. Enter specific URLs to check if they are blocked or allowed
  5. Fix any issues before deploying

Manual Verification

After deploying your robots.txt changes, verify manually:

  1. Direct access test — Visit yourdomain.com/robots.txt in a browser to confirm the file is accessible and contains your intended rules
  2. HTTP response check — Use curl -I yourdomain.com/robots.txt to verify the server returns a 200 status code, not a 404 or 500
  3. Content-Type header — The response should have Content-Type: text/plain. Some servers misconfigure this as HTML, which can cause parsing issues for crawlers
  4. Encoding check — Ensure the file is UTF-8 encoded with no BOM (Byte Order Mark). Some text editors add invisible characters that can break parsing

Monitoring AI Crawler Access

After allowing AI crawlers, monitor your server logs to verify they are actually visiting your site. Look for these user-agent strings in your access logs:

User-Agent String | What It Means
GPTBot/1.0 | OpenAI is crawling for training data
ChatGPT-User | ChatGPT is browsing your page in real-time for a user query
PerplexityBot | Perplexity is indexing your content
Google-Extended | Google is crawling for Gemini and AI Overview
ClaudeBot | Anthropic is indexing for Claude

If you do not see any AI crawler activity within two weeks of updating your robots.txt, check for additional blocking mechanisms such as Cloudflare bot management rules, IP-based blocking, or server-level firewall rules that might be intercepting requests before they reach your robots.txt file.
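Spotting these user-agent strings in an access log is easy to script. A minimal sketch that counts hits per AI bot by substring match (the sample log lines are invented for illustration):

```python
from collections import Counter

AI_BOT_TOKENS = ["GPTBot", "ChatGPT-User", "PerplexityBot",
                 "Google-Extended", "ClaudeBot"]

def count_ai_crawler_hits(log_lines):
    """Count access-log lines per AI bot, matched by user-agent substring."""
    hits = Counter()
    for line in log_lines:
        for token in AI_BOT_TOKENS:
            if token in line:
                hits[token] += 1
    return hits

sample = [
    '1.2.3.4 - - [13/Mar/2026] "GET /blog/ HTTP/1.1" 200 "-" "Mozilla/5.0 ... GPTBot/1.0"',
    '5.6.7.8 - - [13/Mar/2026] "GET /products/ HTTP/1.1" 200 "-" "PerplexityBot/1.0"',
    '9.9.9.9 - - [13/Mar/2026] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (regular browser)"',
]
print(count_ai_crawler_hits(sample))  # GPTBot and PerplexityBot each seen once
```

Run it against your real access log (for example, the lines of /var/log/nginx/access.log) to confirm AI crawlers are actually visiting.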

Combining robots.txt with llms.txt

While robots.txt controls crawler access at the technical level, llms.txt is a newer standard that provides AI-specific guidance about your site. Think of robots.txt as the bouncer at the door and llms.txt as the welcome guide inside.

A complete GEO technical setup includes both files:

# robots.txt — Controls access
User-agent: GPTBot
Allow: /

# llms.txt — Provides context (separate file)
# Site: Your Company Name
# Description: What your site is about
# Key Pages: /product, /blog, /about

Together, these files ensure AI crawlers can both access your content and understand its structure and purpose. For a deeper dive into llms.txt implementation, see our guide on What Is llms.txt and Why Your Site Needs One.

Platform-Specific robots.txt Setup

WordPress, Next.js, Shopify, Squarespace, and Wix each handle robots.txt differently — ranging from full control (WordPress, static sites) to heavily restricted auto-generation (Wix, Shopify). Knowing your platform’s limitations is essential because some require workarounds or plugin-level configurations to allow AI crawlers.

Different platforms handle robots.txt differently, and some make it harder to customize than others.

WordPress: Edit via Yoast SEO plugin → Tools → File Editor, or place a physical robots.txt file in your root directory. Physical files override virtual ones generated by plugins.

Next.js / Astro / Static Sites: Place robots.txt in your public/ directory. It will be served automatically at the root URL.

Shopify: Shopify auto-generates robots.txt and restricts direct editing. Use the robots.txt.liquid theme file to add custom rules. Note that Shopify’s default robots.txt already blocks several paths that you cannot override.

Squarespace: Limited robots.txt control through Settings → SEO → robots.txt. You can add custom directives but cannot fully replace the auto-generated file.

Wix: Wix auto-manages robots.txt with very limited customization. You can add specific directives through the SEO dashboard, but full control requires contacting Wix support or migrating to a platform with more flexibility.

Frequently Asked Questions

Will allowing AI crawlers hurt my SEO?
No. AI crawler access has no impact on Google rankings. These are separate systems. Allowing GPTBot, PerplexityBot, and other AI crawlers only affects your visibility in AI-generated search results.
Should I block any AI crawlers?
Only if you have specific licensing or legal concerns about AI training data usage. For maximum AI visibility and citation potential, allow all major AI crawlers including GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, and Google-Extended.
How quickly do robots.txt changes take effect for AI crawlers?
AI crawlers typically discover robots.txt changes within 24-48 hours. However, citation improvements usually appear within 1-2 weeks after crawlers re-index your content with the new access permissions.
Do I need to restart my server after changing robots.txt?
No. robots.txt is a static file served by your web server. Changes are immediate — crawlers will see the updated rules on their next visit without any server restart required.

GEOClarity

Writing about Generative Engine Optimization, AI search, and the future of content visibility.
