TL;DR — Key Takeaways
- robots.txt is the #1 technical GEO fix — if AI crawlers are blocked, your site is completely invisible to ChatGPT, Perplexity, and all AI engines
- AI crawlers strictly obey robots.txt — unlike Google which may index through links, blocked AI bots will never see your content
- Allow at minimum: GPTBot, ChatGPT-User, PerplexityBot, Google-Extended, and ClaudeBot — these cover the major AI engines
- Watch for wildcard blocks — `User-agent: *` with `Disallow: /` blocks everything, including all AI crawlers
- Changes take effect within 24-48 hours — citation improvements typically follow within 1-2 weeks after re-indexing
- Combine robots.txt with llms.txt — robots.txt controls access while llms.txt provides AI-specific context about your site structure
Your robots.txt file is the single most impactful GEO fix. If AI crawlers are blocked, your site is invisible to ChatGPT, Perplexity, and every other AI engine. This 5-minute change can unlock AI citations.
Why robots.txt Matters for GEO
Your robots.txt file determines whether AI engines can access your content at all. Unlike Google, which may still index blocked pages through external links, AI crawlers like GPTBot and PerplexityBot strictly obey robots.txt — a single Disallow rule can make your entire site invisible to every AI search engine simultaneously.
AI engines use crawler bots to read your website. These bots check robots.txt before accessing any page, and a single `Disallow` rule can make your entire site invisible to AI search.
Unlike Google, which may still index blocked pages through external links, AI crawlers strictly obey robots.txt. No access means zero citations.
AI Crawler Bots You Must Allow
There are at least 10 major AI crawler bots active in 2026, each serving a different AI engine. The five essential ones to allow are GPTBot and ChatGPT-User (OpenAI), PerplexityBot (Perplexity), Google-Extended (Google AI/Gemini), and ClaudeBot (Anthropic). Missing even one means zero visibility on that platform.
| Bot | Engine | User-Agent |
|---|---|---|
| GPTBot | ChatGPT (training) | GPTBot |
| ChatGPT-User | ChatGPT (browsing) | ChatGPT-User |
| PerplexityBot | Perplexity | PerplexityBot |
| Google-Extended | Google AI / Gemini | Google-Extended |
| ClaudeBot | Claude / Anthropic | ClaudeBot |
| Amazonbot | Alexa / Amazon | Amazonbot |
| Applebot-Extended | Apple Intelligence | Applebot-Extended |
| Meta-ExternalAgent | Meta AI | Meta-ExternalAgent |
| cohere-ai | Cohere | cohere-ai |
| Bytespider | TikTok AI | Bytespider |
Recommended robots.txt Template
Copy this complete template into your robots.txt file to allow all major AI crawlers. It explicitly grants access to GPTBot, ChatGPT-User, PerplexityBot, Google-Extended, ClaudeBot, and five additional AI bots — ensuring maximum AI search visibility while maintaining standard search engine access.
Copy this directly into your robots.txt file:
```
# Allow all AI crawlers for GEO
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: Bytespider
Allow: /

# Standard search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Sitemap
Sitemap: https://yourdomain.com/sitemap.xml
```
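If you maintain robots.txt files for several sites, the template above can also be generated programmatically. A minimal Python sketch, assuming the bot list from the table earlier and a placeholder sitemap URL:

```python
# Sketch: build the allow-all robots.txt template from a bot list.
# Bot names mirror the table above; the sitemap URL is a placeholder.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "Google-Extended",
    "ClaudeBot", "Amazonbot", "Applebot-Extended", "Meta-ExternalAgent",
    "cohere-ai", "Bytespider",
]
SEARCH_BOTS = ["Googlebot", "Bingbot"]


def build_robots_txt(sitemap_url):
    """Return a robots.txt string allowing all listed crawlers."""
    lines = ["# Allow all AI crawlers for GEO"]
    for bot in AI_BOTS:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines.append("# Standard search engines")
    for bot in SEARCH_BOTS:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines += ["# Sitemap", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines)


print(build_robots_txt("https://yourdomain.com/sitemap.xml"))
```

Write the output to the file your web server serves at `/robots.txt`; the result is equivalent to copying the template by hand.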
How to Check Your Current Setup
Check your current robots.txt by visiting yourdomain.com/robots.txt directly in a browser. The most common problem is a wildcard User-agent: * with Disallow: / — this blocks everything including all AI crawlers. CMS defaults, CDN security rules, and hosting provider configurations frequently create these blocks without your knowledge.
Visit yourdomain.com/robots.txt in your browser. Look for any Disallow rules targeting AI bot user-agents. Common problematic blocks:
```
# REMOVE THESE — they block AI visibility
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /
```
The wildcard `User-agent: *` with `Disallow: /` blocks everything, including all AI crawlers. This is one of the critical issues we flag in Free GEO Audit Tools for AI Visibility.
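You can check programmatically whether a given robots.txt blocks an AI crawler using Python's standard-library `urllib.robotparser`, which applies the same matching rules well-behaved crawlers use. A quick sketch with hypothetical example rules:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A wildcard disallow blocks every crawler, AI bots included.
blocked = "User-agent: *\nDisallow: /\n"
print(is_allowed(blocked, "GPTBot", "https://example.com/blog/post"))  # False

# An explicit per-bot group restores access for that crawler.
fixed = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n"
print(is_allowed(fixed, "GPTBot", "https://example.com/blog/post"))  # True
```

Run this against your live file (fetch it first) to confirm each AI user-agent on your allow list can actually reach your key pages.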
Selective Access
You can grant AI crawlers access to public content areas like /blog/ and /products/ while blocking sensitive directories like /admin/, /api/, and /private/. This selective approach maximizes AI visibility for your marketing content while maintaining security for internal tools and customer data.
If you need to protect certain areas while allowing AI access to your content:
```
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Allow: /about/
Disallow: /admin/
Disallow: /api/
Disallow: /private/
```
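Before deploying selective rules like these, it is worth confirming they behave as intended. A small sketch using Python's standard-library `urllib.robotparser` against the rules above (paths and domain are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules shown above, as a string.
SELECTIVE_RULES = """\
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Allow: /about/
Disallow: /admin/
Disallow: /api/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SELECTIVE_RULES.splitlines())

# Public content stays visible to the AI crawler...
print(parser.can_fetch("GPTBot", "https://example.com/blog/my-post"))  # True
# ...while sensitive directories stay blocked.
print(parser.can_fetch("GPTBot", "https://example.com/admin/users"))  # False
```

Checking both an allowed and a blocked path catches the most common mistake: a rule ordering or typo that accidentally opens `/admin/` or closes `/blog/`.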
Common Mistakes
Four mistakes account for most robots.txt-related AI visibility failures: CMS platforms generating restrictive defaults, CDNs like Cloudflare blocking bot traffic in security settings, forgetting to allow ChatGPT-User alongside GPTBot, and over-aggressive server-level rate limiting that throttles legitimate AI crawl requests.
- CMS defaults — WordPress and Next.js sometimes generate restrictive robots.txt files
- CDN blocking — Cloudflare and other CDNs may block bot traffic by default in security settings
- Forgetting ChatGPT-User — GPTBot handles training data, ChatGPT-User handles live browsing. You need both
- Rate limiting — Some server configs throttle bot requests too aggressively
How Quickly Do Changes Take Effect?
Expect AI crawlers to detect your robots.txt updates within 24-48 hours, with citation improvements appearing 1-2 weeks later as crawlers re-index your content. This timeline means robots.txt fixes are among the fastest GEO wins — a 5-minute change today can unlock AI citations within two weeks.
AI crawlers typically discover robots.txt changes within 24-48 hours. Citation improvements usually appear within 1-2 weeks after crawlers re-index your content.
FAQ
These are the most frequently asked questions about configuring robots.txt for AI crawlers, covering SEO impact, selective blocking, and deployment logistics.
Will allowing AI crawlers hurt my SEO?
No. AI crawler access has no impact on Google rankings. These are separate systems.
Should I block any AI crawlers?
Only if you have specific licensing or legal concerns. For maximum AI visibility, allow all crawlers.
Do I need to restart my server after changing robots.txt?
No. robots.txt is a static file served by your web server, so updates go live as soon as the file is saved. Crawlers pick up the new rules the next time they fetch the file.
How to Test Your robots.txt Configuration
Always test your robots.txt before deploying to production. Use Google’s robots.txt tester in Search Console for syntax validation, then verify manually with curl -I to confirm proper HTTP status codes and Content-Type headers. A misconfigured file can accidentally block all crawlers or expose sensitive directories.
Testing your robots.txt before deploying changes is critical. A misconfigured file can accidentally block all crawlers or expose sensitive directories.
Google’s robots.txt Tester
Google Search Console includes a robots.txt testing tool that validates syntax and shows which URLs are blocked or allowed. While this only tests Googlebot rules directly, the syntax applies to all crawlers.
Steps to test:
- Open Google Search Console for your domain
- Navigate to Settings → robots.txt
- Paste your robots.txt content
- Enter specific URLs to check if they are blocked or allowed
- Fix any issues before deploying
Manual Verification
After deploying your robots.txt changes, verify manually:
- Direct access test — Visit `yourdomain.com/robots.txt` in a browser to confirm the file is accessible and contains your intended rules
- HTTP response check — Use `curl -I yourdomain.com/robots.txt` to verify the server returns a 200 status code, not a 404 or 500
- Content-Type header — The response should have `Content-Type: text/plain`. Some servers misconfigure this as HTML, which can cause parsing issues for crawlers
- Encoding check — Ensure the file is UTF-8 encoded with no BOM (Byte Order Mark). Some text editors add invisible characters that can break parsing
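The response-level checks above can be scripted. A minimal Python sketch, assuming a standard HTTPS setup; `check_live` is a hypothetical helper you would point at your own domain:

```python
import urllib.request


def validate_robots_response(raw, content_type):
    """Return a list of problems found in a fetched robots.txt response body."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    if "text/plain" not in content_type:
        problems.append(f"unexpected Content-Type: {content_type}")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems


def check_live(domain):
    # Hypothetical helper: fetch https://{domain}/robots.txt and validate it.
    with urllib.request.urlopen(f"https://{domain}/robots.txt") as resp:
        if resp.status != 200:
            return [f"HTTP {resp.status} instead of 200"]
        return validate_robots_response(
            resp.read(), resp.headers.get("Content-Type", "")
        )


# Offline example: a BOM plus an HTML Content-Type yields two problems.
print(validate_robots_response(b"\xef\xbb\xbfUser-agent: *", "text/html"))
```

An empty list from `check_live("yourdomain.com")` means the status, Content-Type, and encoding checks all pass.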
Monitoring AI Crawler Access
After allowing AI crawlers, monitor your server logs to verify they are actually visiting your site. Look for these user-agent strings in your access logs:
| User-Agent String | What It Means |
|---|---|
| `GPTBot/1.0` | OpenAI is crawling for training data |
| `ChatGPT-User` | ChatGPT is browsing your page in real time for a user query |
| `PerplexityBot` | Perplexity is indexing your content |
| `Google-Extended` | Google is crawling for Gemini and AI Overviews |
| `ClaudeBot` | Anthropic is indexing for Claude |
If you do not see any AI crawler activity within two weeks of updating your robots.txt, check for additional blocking mechanisms such as Cloudflare bot management rules, IP-based blocking, or server-level firewall rules that might be intercepting requests before they reach your robots.txt file.
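A simple way to run this check is to count AI-crawler hits in your access log. A Python sketch, assuming a typical combined-format log (the sample lines are illustrative):

```python
from collections import Counter

# User-agent substrings from the table above.
AI_BOT_UAS = [
    "GPTBot", "ChatGPT-User", "PerplexityBot", "Google-Extended", "ClaudeBot",
]


def count_ai_hits(log_lines):
    """Count access-log lines per AI crawler, matched by user-agent substring."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_BOT_UAS:
            if bot in line:
                hits[bot] += 1
    return hits


sample = [
    '1.2.3.4 - - [01/Mar/2026] "GET /blog/post HTTP/1.1" 200 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/Mar/2026] "GET /about HTTP/1.1" 200 "-" "PerplexityBot/1.0"',
]
print(count_ai_hits(sample))
```

In practice you would feed it real lines, e.g. `count_ai_hits(open("/var/log/nginx/access.log"))`; an empty result after two weeks points to blocking upstream of robots.txt.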
Combining robots.txt with llms.txt
While robots.txt controls crawler access at the technical level, llms.txt is a newer standard that provides AI-specific guidance about your site. Think of robots.txt as the bouncer at the door and llms.txt as the welcome guide inside.
A complete GEO technical setup includes both files:
```
# robots.txt — Controls access
User-agent: GPTBot
Allow: /
```

```
# llms.txt — Provides context (separate file)
# Site: Your Company Name
# Description: What your site is about
# Key Pages: /product, /blog, /about
```
Together, these files ensure AI crawlers can both access your content and understand its structure and purpose. For a deeper dive into llms.txt implementation, see our guide on What Is llms.txt and Why Your Site Needs One.
Platform-Specific robots.txt Setup
WordPress, Next.js, Shopify, Squarespace, and Wix each handle robots.txt differently — ranging from full control (WordPress, static sites) to heavily restricted auto-generation (Wix, Shopify). Knowing your platform’s limitations is essential because some require workarounds or plugin-level configurations to allow AI crawlers.
Different platforms handle robots.txt differently, and some make it harder to customize than others.
WordPress: Edit via Yoast SEO plugin → Tools → File Editor, or place a physical robots.txt file in your root directory. Physical files override virtual ones generated by plugins.
Next.js / Astro / Static Sites: Place robots.txt in your public/ directory. It will be served automatically at the root URL.
Shopify: Shopify auto-generates robots.txt and restricts direct editing. Use the robots.txt.liquid theme file to add custom rules. Note that Shopify’s default robots.txt already blocks several paths that you cannot override.
Squarespace: Limited robots.txt control through Settings → SEO → robots.txt. You can add custom directives but cannot fully replace the auto-generated file.
Wix: Wix auto-manages robots.txt with very limited customization. You can add specific directives through the SEO dashboard, but full control requires contacting Wix support or migrating to a platform with more flexibility.