Complete Guide to AI Bot User-Agents
The definitive reference for identifying, tracking, and managing 20+ AI crawlers visiting your website.
Quick Reference
This guide covers all major AI bot user-agents as of October 2025. Bookmark this page as your go-to reference for AI crawler identification.
Understanding User-Agent Strings
Every bot that visits your website identifies itself through a user-agent string. This string tells you what software is accessing your site, allowing you to track, analyze, and control AI crawler access.
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
AI bots typically include their name and a link to documentation in their user-agent string, making them identifiable in server logs and analytics tools.
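As a rough illustration of how the bot name and documentation link can be pulled out of such a string, here is a minimal Python sketch. The `BotInfo` type, the `parse_bot` helper, and the heuristic that bot names end in "bot", "spider", or "crawler" are assumptions made for this example, not part of any official specification.

```python
import re
from typing import NamedTuple, Optional


class BotInfo(NamedTuple):
    name: str
    version: str
    doc_url: Optional[str]


# Product token such as "GPTBot/1.0". Assumption: the name ends in
# "bot", "spider", or "crawler" (a heuristic, not an official rule).
BOT_TOKEN = re.compile(
    r"([A-Za-z][\w-]*(?:bot|spider|crawler))/(\d+(?:\.\d+)*)", re.IGNORECASE
)
# Documentation link, with or without the leading "+".
DOC_URL = re.compile(r"\+?(https?://[^\s;)]+)")


def parse_bot(user_agent: str) -> Optional[BotInfo]:
    """Return the first bot-like token and doc link, or None if no token is found."""
    token = BOT_TOKEN.search(user_agent)
    if token is None:
        return None
    url = DOC_URL.search(user_agent)
    return BotInfo(token.group(1), token.group(2), url.group(1) if url else None)


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.0; +https://openai.com/gptbot)")
print(parse_bot(ua))
# BotInfo(name='GPTBot', version='1.0', doc_url='https://openai.com/gptbot')
```

A regular browser string contains no matching token, so `parse_bot` returns `None` for it.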
Major AI Bot User-Agents
GPTBot
OpenAI • ChatGPT, GPT-4, GPT-3.5
GPTBot/1.0 (+https://openai.com/gptbot)
Purpose: Training ChatGPT and GPT models
Respects llms.txt: ✅ Yes (94% compliance)
Documentation: openai.com/gptbot
Block in robots.txt: User-agent: GPTBot / Disallow: /
Claude-Web
Anthropic • Claude AI
Claude-Web/1.0 (+https://www.anthropic.com/bot)
Purpose: Training Claude language models
Respects llms.txt: ✅ Yes (91% compliance)
Documentation: anthropic.com/bot
Block in robots.txt: User-agent: Claude-Web / Disallow: /
Google-Extended
Google • Gemini, Bard
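Google-Extended (a robots.txt product token; per Google's crawler documentation at https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers it has no separate HTTP user-agent string, so crawling happens under Google's existing user agents and the token does not appear in server logs)
Purpose: Training Gemini and improving AI products (separate from search indexing)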
Respects llms.txt: ✅ Yes (89% compliance)
Documentation: Google Crawlers
Block in robots.txt: User-agent: Google-Extended / Disallow: /
CCBot
Common Crawl • Multiple AI Companies
CCBot/2.0 (https://commoncrawl.org/faq/)
Purpose: Building a web archive used by many AI companies for training
Respects llms.txt: ⚠️ Partial (67% compliance)
Documentation: commoncrawl.org
Block in robots.txt: User-agent: CCBot / Disallow: /
PerplexityBot
Perplexity AI • AI Search Engine
PerplexityBot/1.0 (+https://perplexity.ai/bot)
Purpose: Real-time search and answer generation
Respects llms.txt: ✅ Yes
Documentation: perplexity.ai/bot
Block in robots.txt: User-agent: PerplexityBot / Disallow: /
Bytespider
ByteDance • TikTok AI
Bytespider/1.0 (+https://bytedance.com/)
Purpose: Training AI models for TikTok and ByteDance products
Respects llms.txt: ⚠️ Unknown
Documentation: Limited public documentation
Block in robots.txt: User-agent: Bytespider / Disallow: /
Applebot-Extended
Apple • Apple Intelligence
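Applebot-Extended (a robots.txt control token; Apple's documentation at https://support.apple.com/en-us/119829 states that Applebot-Extended does not crawl webpages itself and only governs how data crawled by Applebot may be used)
Purpose: Training Apple Intelligence and AI features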
Respects llms.txt: ✅ Yes
Documentation: Apple Support
Block in robots.txt: User-agent: Applebot-Extended / Disallow: /
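The per-bot robots.txt rules listed above can be sanity-checked before deployment. The sketch below uses Python's standard `urllib.robotparser` to report which bot tokens a given robots.txt body blocks; the `blocked_bots` helper, the sample rules, and the example.com URL are hypothetical names chosen for illustration. It only checks what the rules say, not whether a given bot actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Sample rules in the same form as the per-bot entries above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_bots(robots_txt: str, bots: list[str],
                 url: str = "https://example.com/") -> dict[str, bool]:
    """Map each bot token to True if the rules disallow it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, url) for bot in bots}


print(blocked_bots(ROBOTS_TXT, ["GPTBot", "CCBot", "PerplexityBot"]))
# {'GPTBot': True, 'CCBot': True, 'PerplexityBot': False}
```

PerplexityBot comes back as not blocked because the sample rules contain no group for it and no `User-agent: *` fallback.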
Additional AI Crawlers
Beyond the major players, numerous other AI bots crawl the web. Here's a comprehensive list:
Search & Answer Engines
YouBot - You.com AI search
Diffbot - Knowledge graph extraction
Omgilibot - Omgili search crawler
FacebookBot - Meta AI training
Research & Academic
anthropic-ai - Anthropic research
cohere-ai - Cohere AI models
AI2Bot - Allen Institute for AI
Scrapy - Research data collection
Commercial AI Services
ImagesiftBot - Image AI training
Amazonbot - Amazon AI services
Kangaroo Bot - AI data collection
Timpibot - AI search indexing
Emerging AI Bots
ChatGPT-User - ChatGPT browsing
ClaudeBot - Claude web access
Grok-bot - X (Twitter) Grok AI
Meta-ExternalAgent - Meta AI crawling
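One possible way to put the list above to work is a small lookup table that maps each token to its category. The sketch below copies the tokens and category names from this guide; the `categorize` helper and the case-insensitive substring matching are assumptions for illustration, and the table would need updating as bots appear or change names.

```python
from typing import Optional

# Tokens and categories as listed in this guide.
BOT_CATEGORIES = {
    "YouBot": "Search & Answer Engines",
    "Diffbot": "Search & Answer Engines",
    "Omgilibot": "Search & Answer Engines",
    "FacebookBot": "Search & Answer Engines",
    "anthropic-ai": "Research & Academic",
    "cohere-ai": "Research & Academic",
    "AI2Bot": "Research & Academic",
    "Scrapy": "Research & Academic",
    "ImagesiftBot": "Commercial AI Services",
    "Amazonbot": "Commercial AI Services",
    "Kangaroo Bot": "Commercial AI Services",
    "Timpibot": "Commercial AI Services",
    "ChatGPT-User": "Emerging AI Bots",
    "ClaudeBot": "Emerging AI Bots",
    "Grok-bot": "Emerging AI Bots",
    "Meta-ExternalAgent": "Emerging AI Bots",
}


def categorize(user_agent: str) -> Optional[tuple[str, str]]:
    """Return (token, category) for the first listed token found, else None."""
    ua = user_agent.lower()
    # Try longer tokens first so a short token cannot shadow a longer one.
    for token in sorted(BOT_CATEGORIES, key=len, reverse=True):
        if token.lower() in ua:
            return token, BOT_CATEGORIES[token]
    return None


print(categorize("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
# ('ClaudeBot', 'Emerging AI Bots')
```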
Detection Methods
Server-Side Detection (Recommended)
The most reliable method is checking user-agent strings in your server logs or application code:
Node.js / Express
const express = require('express');
const app = express();

// Middleware: inspect every request's user-agent for known AI bot tokens
app.use((req, res, next) => {
  const userAgent = req.headers['user-agent'] || '';
  const aiBot = detectAIBot(userAgent);
  if (aiBot) {
    console.log(`AI Bot detected: ${aiBot}`);
    // Log to analytics, apply rate limiting, etc.
  }
  next();
});

// Returns the first matching bot token, or null if none matches
function detectAIBot(userAgent) {
  const aiBots = [
    'GPTBot', 'Claude-Web', 'Google-Extended',
    'CCBot', 'PerplexityBot', 'Bytespider',
    'Applebot-Extended', 'anthropic-ai', 'cohere-ai'
  ];
  for (const bot of aiBots) {
    if (userAgent.includes(bot)) {
      return bot;
    }
  }
  return null;
}
Python / Flask
from flask import Flask, request

app = Flask(__name__)

AI_BOTS = [
    'GPTBot', 'Claude-Web', 'Google-Extended',
    'CCBot', 'PerplexityBot', 'Bytespider'
]

def track_ai_bot(bot):
    # Placeholder: replace with a call to your own analytics system
    pass

@app.before_request
def detect_ai_bot():
    user_agent = request.headers.get('User-Agent', '')
    for bot in AI_BOTS:
        if bot in user_agent:
            # Log detection
            app.logger.info(f'AI Bot detected: {bot}')
            # Add to analytics
            track_ai_bot(bot)
            break
PHP
<?php
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$aiBots = [
    'GPTBot', 'Claude-Web', 'Google-Extended',
    'CCBot', 'PerplexityBot', 'Bytespider'
];
foreach ($aiBots as $bot) {
    if (strpos($userAgent, $bot) !== false) {
        error_log("AI Bot detected: " . $bot);
        // Track in analytics; trackAIBot() is your own helper, not defined here
        if (function_exists('trackAIBot')) {
            trackAIBot($bot);
        }
        break;
    }
}
?>
Analytics Integration
Track AI bot visits in Google Analytics or your analytics platform:
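// Runs in the browser, so it only fires for bots that execute JavaScript.
// botName is the detected bot token (undefined for regular visitors).
const aiBots = ['GPTBot', 'Claude-Web', 'CCBot', 'PerplexityBot', 'Bytespider'];
const botName = aiBots.find((bot) => navigator.userAgent.includes(bot));

// Google Analytics 4
if (botName) {
  gtag('event', 'ai_bot_visit', {
    'bot_name': botName,
    'page_path': window.location.pathname,
    'timestamp': new Date().toISOString()
  });
}

// Custom Analytics (assumes your platform exposes an analytics.track() call)
if (botName) {
  analytics.track('AI Bot Visit', {
    botName: botName,
    userAgent: navigator.userAgent,
    page: window.location.href
  });
}
Blocking Strategies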
Method 1: robots.txt (Simple)
Block the major AI training bots in your robots.txt file:
# Block major AI training bots
User-agent: GPTBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Applebot-Extended
Disallow: /
Method 2: llms.txt (Granular Control)
Use llms.txt for selective policies:
# llms.txt - Selective AI Policy
# Allow blog and docs content
User-agent: *
Allow: /blog/
Allow: /docs/
# Block private areas
Disallow: /admin/
Disallow: /user/
Disallow: /premium/
# Specific rules for GPTBot
User-agent: GPTBot
Allow: /
Disallow: /private/
Crawl-delay: 2
Method 3: Server-Level Blocking
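Block at the server level to enforce the policy even for bots that ignore robots.txt (this matches the declared user-agent only, so a bot that spoofs its user-agent will still get through):
# Apache .htaccess
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|Claude-Web|CCBot) [NC]
RewriteRule .* - [F,L]
# Nginx (inside a server or location block)
if ($http_user_agent ~* (GPTBot|Claude-Web|CCBot)) {
    return 403;
}
Monitoring AI Bot Activity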
Server Log Analysis
Analyze your server logs to see which AI bots are visiting:
# Count AI bot visits in Apache/Nginx logs
grep -E "(GPTBot|Claude-Web|Google-Extended|CCBot)" access.log | wc -l
# Count visits per bot (match the token directly; in the combined log
# format, field $12 is only the first word of the user-agent)
grep -oE "(GPTBot|Claude-Web|Google-Extended|CCBot)" access.log | \
    sort | uniq -c
# Track by date
grep "GPTBot" access.log | awk '{print $4}' | cut -d: -f1 | tr -d '[' | \
    sort | uniq -c
Real-Time Tracking
Use our free AI bot tracker for real-time monitoring. It shows which AI bots visit your site in real time through an invisible tracking widget and tracks 20+ AI crawlers automatically.
Best Practices
✅ DO: Monitor Before Blocking
Track AI bot activity for 2-4 weeks before implementing blocking policies. Understand which bots visit and how often.
✅ DO: Use llms.txt for Granular Control
Implement selective policies that allow public content while protecting sensitive areas.
✅ DO: Document Your Policy
Include comments in your llms.txt explaining your reasoning and contact information.
❌ DON'T: Block Without Understanding Impact
Blocking all AI bots may reduce your visibility in AI-powered search results.
❌ DON'T: Forget to Update
New AI bots emerge regularly. Review and update your policies quarterly.
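The "monitor before blocking" and "review regularly" practices above can be supported with a small log summary. The sketch below counts hits per known bot and also surfaces unrecognized bot-like user agents that may need a policy decision. It assumes the Apache/Nginx combined log format (user-agent as the last quoted field); `summarize`, the sample log lines, and the `ExampleBot` name are hypothetical and used for illustration only.

```python
import re
from collections import Counter

# Tokens taken from this guide; extend the list as your policy evolves.
KNOWN_BOTS = ["GPTBot", "Claude-Web", "CCBot", "PerplexityBot", "Bytespider"]

# Combined log format ends with: "referer" "user-agent"
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
# Heuristic for unlisted crawlers: any word containing bot/spider/crawler.
BOT_LIKE = re.compile(r"[\w-]*(?:bot|spider|crawler)[\w-]*", re.IGNORECASE)


def summarize(log_lines):
    """Return (known, unknown) Counters of bot hits found in log_lines."""
    known, unknown = Counter(), Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match is None:
            continue  # line is not in the expected format
        ua = match.group(1)
        hit = next((b for b in KNOWN_BOTS if b.lower() in ua.lower()), None)
        if hit:
            known[hit] += 1
        else:
            candidate = BOT_LIKE.search(ua)
            if candidate:
                unknown[candidate.group(0)] += 1
    return known, unknown


sample = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [10/Oct/2025:13:56:02 +0000] "GET /blog/ HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [10/Oct/2025:14:01:10 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/0.1 (+https://example.com/bot)"',
    '192.0.2.9 - - [10/Oct/2025:14:05:44 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"',
]
known, unknown = summarize(sample)
print(dict(known))    # {'GPTBot': 2}
print(dict(unknown))  # {'ExampleBot': 1}
```

Running a summary like this over a few weeks of logs gives the per-bot counts needed before choosing a blocking policy, and the second counter is a starting point for a quarterly review.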
Quick Reference Table
| Bot Name | Company | Respects llms.txt | Traffic Level |
|---|---|---|---|
| GPTBot | OpenAI | ✅ 94% | High |
| Claude-Web | Anthropic | ✅ 91% | High |
| Google-Extended | Google | ✅ 89% | High |
| CCBot | Common Crawl | ⚠️ 67% | Very High |
| PerplexityBot | Perplexity | ✅ Yes | Medium |
| Bytespider | ByteDance | ❓ Unknown | High |
