What is robots.txt?

The robots.txt file, served from the root of your domain, tells search engines and AI crawlers which parts of your site they may access and index. AI platforms like ChatGPT, Perplexity, and Claude use web crawlers to discover and index content, so optimizing this file is crucial for AI search visibility. A properly configured robots.txt ensures:
  • ✅ AI crawlers can access your important content
  • ✅ You control what gets indexed
  • ✅ You prevent crawling of sensitive or duplicate content
  • ✅ You optimize crawl budget for high-value pages
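
To see exactly what crawlers are served today, you can fetch your live file directly; a minimal sketch using only the Python standard library, with example.com standing in for your own domain:

import urllib.request

# robots.txt always lives at the root of the host
url = "https://example.com/robots.txt"
with urllib.request.urlopen(url, timeout=10) as resp:
    print(resp.status)                   # expect 200; errors change crawler behavior
    print(resp.read().decode("utf-8"))   # the exact rules crawlers receive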

AI Crawler User Agents

Here are the main AI crawler user agents you should allow:
# OpenAI (ChatGPT)
User-agent: GPTBot
Allow: /

# Anthropic (Claude)
User-agent: ClaudeBot
User-agent: Claude-Web
Allow: /

# Google (Gemini): a robots.txt control token applied to Googlebot-crawled content
User-agent: Google-Extended
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Common Crawl (CCBot; a major source of AI training data)
User-agent: CCBot
Allow: /
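
Once your file is live, you can sanity-check that each of these agents is actually allowed. Python's standard-library urllib.robotparser applies basic Allow/Disallow rules in file order and does not understand * wildcards, so treat it as a rough check; a minimal sketch, again assuming your file is served at example.com:

from urllib import robotparser

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the live file

for agent in AI_AGENTS:
    ok = rp.can_fetch(agent, "https://example.com/blog/")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")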

Best Practices

Ensure AI bots can access your key pages:
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Allow: /docs/
Disallow: /admin/

Prevent AI from indexing private or duplicate content:
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Disallow: /*?
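
When Allow and Disallow rules overlap, as with Allow: / alongside Disallow: /*?, modern crawlers follow RFC 9309: the rule with the longest matching pattern wins, and Allow wins a tie. The sketch below models that matching logic, including the * wildcard and the $ end anchor, so you can reason about what a pattern like /*? actually blocks; this is an illustration of the standard, not any crawler's actual implementation:

import re

def pattern_to_regex(pattern):
    # RFC 9309: '*' matches any character sequence; a trailing '$'
    # anchors the pattern to the end of the URL path
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, path):
    # rules: [("allow" | "disallow", pattern), ...] for one user-agent group.
    # Longest matching pattern wins; "allow" wins a tie; no match = allowed.
    best_len, allowed = -1, True
    for verdict, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and verdict == "allow"):
                best_len, allowed = len(pattern), verdict == "allow"
    return allowed

rules = [("allow", "/"), ("disallow", "/admin/"), ("disallow", "/*?")]
print(is_allowed(rules, "/blog/post"))         # True: only "/" matches
print(is_allowed(rules, "/blog/post?page=2"))  # False: "/*?" is the longest match
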
Help crawlers find all your content:
Sitemap: https://asklantern.com/sitemap.xml
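
A broken Sitemap line silently wastes the hint, so it is worth confirming that the URL you list resolves and returns valid XML. A minimal check, assuming your sitemap lives at the URL above:

import urllib.request
import xml.etree.ElementTree as ET

url = "https://asklantern.com/sitemap.xml"
with urllib.request.urlopen(url, timeout=10) as resp:
    assert resp.status == 200, f"sitemap returned HTTP {resp.status}"
    root = ET.fromstring(resp.read())  # raises ParseError on invalid XML

# count <url> entries regardless of the sitemap namespace
print(len([el for el in root.iter() if el.tag.endswith("url")]), "URLs listed")
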
Direct crawler attention toward your most important pages. robots.txt has no true priority mechanism, but you can pair explicit Allow rules for high-value sections with a Crawl-delay to throttle bots that honor it (Google ignores Crawl-delay entirely):
# High-priority content
User-agent: *
Allow: /blog/
Allow: /products/

# Throttle request rate: seconds between requests, honored by some crawlers only
Crawl-delay: 10

Here’s a complete example optimized for AI search visibility:
# Allow all major AI crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: CCBot
Allow: /
Disallow: /admin/
Disallow: /private/

# General crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Disallow: /*?filter=
Disallow: /*?utm_

# Sitemaps
Sitemap: https://asklantern.com/sitemap.xml
Sitemap: https://asklantern.com/blog-sitemap.xml
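
Before deploying a file like this, you can feed a draft straight into Python's parser and check representative URLs per agent. Note that urllib.robotparser applies rules in file order (first match wins) rather than Google's longest-match, and it ignores * wildcards, so list Disallow lines before a broad Allow when testing this way; a rough pre-deploy check:

from urllib import robotparser

draft = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())  # parse a local draft instead of fetching a URL

for path in ["/blog/ai-visibility", "/admin/settings", "/private/notes"]:
    print(path, "->", "allowed" if rp.can_fetch("GPTBot", path) else "blocked")
# /blog/ai-visibility -> allowed
# /admin/settings -> blocked
# /private/notes -> blocked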

Testing Your robots.txt

1. Check syntax: use the robots.txt report in Google Search Console (the successor to the retired robots.txt Tester) to validate syntax.
2. Test with Lantern: use Lantern's crawler audit tool to see how AI bots interpret your robots.txt.
3. Monitor in dashboard: track which AI crawlers are accessing your site in your Lantern dashboard.

Important: Changes to robots.txt can take days or weeks to fully propagate. Monitor your Lantern dashboard to see when AI platforms recognize the changes.

Common Mistakes to Avoid

  • Blocking AI crawlers: don't accidentally block AI bots with overly restrictive rules.
  • No sitemap reference: always include your sitemap URL so crawlers can discover your content.
  • Blocking important pages: make sure your best content is accessible to AI crawlers.
  • Syntax errors: test your file; a single bad rule (for example, Disallow: / under User-agent: *) can block every crawler.

Next Steps

  • Configure your sitemap: learn how to create an optimized sitemap for AI discovery.