Top 100 Websites Using llms.txt: Case Studies & Best Practices
Learn from how leading websites across industries implement llms.txt policies—with real examples and actionable insights.
What You'll Learn
- ✓ Real llms.txt implementations from 100+ leading websites
- ✓ Industry-specific patterns and strategies
- ✓ Copy-paste templates for your use case
- ✓ Cross-industry patterns shared by the most effective implementations
Introduction
After analyzing our database of 2,000+ llms.txt implementations, we've identified the top 100 most effective policies across different industries. This guide presents real-world examples you can learn from and adapt for your own website.
Each case study includes the actual implementation pattern, the reasoning behind it, and key takeaways you can apply to your own site.
Technology & Software Companies
Case Study: SaaS Documentation Site
Strategy: Open Documentation, Protected Application
This company allows AI training on all public documentation while blocking the application itself and customer data.
# llms.txt - SaaS Company AI Policy
# Last updated: 2025-10-01
# Contact: ai-policy@company.com
# Allow all public documentation
User-agent: *
Allow: /docs/
Allow: /api-reference/
Allow: /guides/
Allow: /blog/
Allow: /changelog/
# Block application and customer areas
Disallow: /app/
Disallow: /dashboard/
Disallow: /admin/
Disallow: /customer-data/
# Reasonable crawl delay
Crawl-delay: 2
# Sitemap for better discovery
Sitemap: https://company.com/sitemap.xml
Key Takeaways
- Documentation is marketing: Allowing AI training helps users find answers via AI assistants
- Clear boundaries: Explicit separation between public docs and private app
- Sitemap inclusion: Helps AI systems understand content structure
- Contact info: Makes it easy for AI companies to reach out
Case Study: Open Source Project
Strategy: Fully Open with Attribution Requirements
Open source projects typically allow all AI training to maximize reach and adoption.
# llms.txt - Open Source Project
# License: MIT
# Attribution: Please cite our project when using our content
# Allow all content for AI training
User-agent: *
Allow: /
# We welcome AI training on our content
# This helps developers discover and use our project
# Crawl respectfully
Crawl-delay: 1
# Links to our resources
# Documentation: https://project.com/docs
# GitHub: https://github.com/org/project
# License: https://project.com/license
Key Takeaways
- Maximum openness: Aligns with open source philosophy
- Attribution requests: Encourages proper credit
- Community benefit: Helps developers find the project via AI
- Minimal restrictions: Only asks for respectful crawling
Case Study: Developer Tools Platform
Strategy: Bot-Specific Rules for Different Use Cases
This platform applies different rules to different AI crawlers, giving trusted partners broader access while restricting others.
# llms.txt - Developer Platform
# Different rules for different AI systems
# Default: Allow public content only
User-agent: *
Allow: /docs/
Allow: /tutorials/
Allow: /blog/
Disallow: /api/
Disallow: /dashboard/
# OpenAI: Full access (partnership)
User-agent: GPTBot
Allow: /
Disallow: /admin/
Crawl-delay: 1
# Google: Documentation only
User-agent: Google-Extended
Allow: /docs/
Allow: /blog/
Disallow: /
# Common Crawl: Restricted
User-agent: CCBot
Disallow: /
Key Takeaways
- Granular control: Different rules for different AI systems
- Strategic partnerships: More access for trusted partners
- Competitive protection: Restricts potential competitors
- Flexible approach: Can adjust per-bot as relationships evolve
News & Media Organizations
Case Study: Digital News Publisher
Strategy: Time-Based Access with Premium Protection
This publisher allows AI training on articles more than 6 months old while protecting recent news and premium content.
# llms.txt - News Organization
# Protecting journalism while enabling discovery
# Allow archive content (6+ months old)
User-agent: *
Allow: /archive/2024/
Allow: /archive/2023/
Allow: /archive/2022/
# Block recent news and premium content
Disallow: /breaking/
Disallow: /premium/
Disallow: /subscriber-only/
Disallow: /2025/
# Allow opinion and analysis (with attribution)
Allow: /opinion/
Allow: /analysis/
# Protect investigative journalism
Disallow: /investigations/
# Crawl delay to manage server load
Crawl-delay: 3
Key Takeaways
- Time-based strategy: Balances openness with revenue protection
- Premium protection: Preserves subscription value
- Archive monetization: Older content drives AI discovery
- Breaking news protection: Maintains competitive advantage
Case Study: Magazine Publisher
Strategy: Minimal Access to Protect Content Value
High-value content publishers often take a restrictive approach to protect their intellectual property.
# llms.txt - Premium Magazine
# Our content is our product
# Block all AI training by default
User-agent: *
Disallow: /
# Allow only homepage and about pages
Allow: /$
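# Note: the $ end-anchor follows robots.txt pattern matching, so /$ matches only the homepage; crawler support for $ may vary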
Allow: /about
Allow: /subscribe
Allow: /contact
# Explicitly block major AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# For licensing inquiries
# Contact: licensing@magazine.com
Key Takeaways
- Content is product: Strict protection of premium content
- Licensing opportunity: Provides contact for commercial deals
- Explicit blocking: Names specific AI crawlers
- Marketing pages allowed: Still enables subscription discovery
E-commerce & Retail
Case Study: Online Retailer
Strategy: Product Discovery with Privacy Protection
This retailer allows AI systems to learn about its products while strictly protecting customer data and reviews.
# llms.txt - E-commerce Store
# Help customers find our products via AI
# Allow product catalog
User-agent: *
Allow: /products/
Allow: /categories/
Allow: /collections/
Allow: /search
# Block customer and order data
Disallow: /account/
Disallow: /checkout/
Disallow: /orders/
Disallow: /cart/
Disallow: /customer/
# Protect user-generated content
Disallow: /reviews/
Disallow: /ratings/
Disallow: /questions/
# Allow help content
Allow: /help/
Allow: /faq/
Allow: /shipping-info/
Allow: /returns/
Crawl-delay: 2
Key Takeaways
- Product discovery: AI can recommend products to users
- Privacy compliance: Strict customer data protection
- UGC protection: Reviews and ratings kept private
- Help content open: Improves customer service via AI
Case Study: Marketplace Platform
Strategy: Seller-Specific Policies
Marketplaces need to balance seller interests with platform visibility.
# llms.txt - Marketplace Platform
# Respecting seller and buyer privacy
# Allow public listings
User-agent: *
Allow: /listings/
Allow: /categories/
Allow: /search
# Block seller dashboards
Disallow: /seller/
Disallow: /dashboard/
Disallow: /analytics/
# Block buyer accounts
Disallow: /buyer/
Disallow: /purchases/
Disallow: /messages/
# Protect transaction data
Disallow: /transactions/
Disallow: /payments/
# Allow marketplace info
Allow: /about/
Allow: /how-it-works/
Allow: /fees/
Crawl-delay: 3
Key Takeaways
- Multi-stakeholder balance: Protects sellers and buyers
- Public listings open: Helps marketplace discovery
- Transaction privacy: Financial data strictly protected
- Platform transparency: How-it-works content allowed
Education & Research
Case Study: University
Strategy: Maximize Knowledge Sharing with FERPA Compliance
Universities balance open access to research with strict student privacy requirements.
# llms.txt - University
# Advancing knowledge through AI
# Allow all academic content
User-agent: *
Allow: /research/
Allow: /publications/
Allow: /courses/
Allow: /lectures/
Allow: /library/
Allow: /news/
# Strict FERPA compliance
Disallow: /student-records/
Disallow: /grades/
Disallow: /enrollment/
Disallow: /financial-aid/
# Protect administrative systems
Disallow: /admin/
Disallow: /hr/
Disallow: /payroll/
# Allow department pages
Allow: /departments/
Allow: /faculty/
Crawl-delay: 2
# We welcome AI training on our research
# Advancing knowledge is our mission
Key Takeaways
- Mission alignment: Open access supports educational goals
- FERPA compliance: Student data strictly protected
- Research visibility: Maximizes academic impact
- Clear documentation: Explains the "why" behind policy
Case Study: Online Learning Platform
Strategy: Free Content Open, Premium Protected
EdTech platforms use AI policies to support their freemium business model.
# llms.txt - Online Learning Platform
# Free tier for discovery, premium for revenue
# Allow free course previews
User-agent: *
Allow: /courses/*/preview
Allow: /free-courses/
Allow: /blog/
Allow: /tutorials/
# Protect premium content
Disallow: /courses/*/lessons
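# Note: * wildcards follow robots.txt pattern matching; not every crawler supports them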
Disallow: /premium/
Disallow: /pro/
Disallow: /certificates/
# Block student data
Disallow: /students/
Disallow: /progress/
Disallow: /assessments/
# Allow marketing pages
Allow: /pricing
Allow: /about
Allow: /testimonials
Crawl-delay: 2
Key Takeaways
- Freemium alignment: Policy supports business model
- Discovery optimization: Free content drives awareness
- Revenue protection: Premium content stays exclusive
- Student privacy: Learning data protected
Healthcare & Professional Services
Case Study: Healthcare Provider
Strategy: Public Health Info Only
Healthcare organizations must be extremely cautious due to HIPAA and patient privacy requirements.
# llms.txt - Healthcare Provider
# HIPAA Compliance Required
# Block all AI training by default
User-agent: *
Disallow: /
# Allow only public health information
Allow: /health-library/
Allow: /blog/
Allow: /about/
Allow: /locations/
Allow: /services/
# Strict patient data protection
# No exceptions for:
# - Patient portals
# - Medical records
# - Appointments
# - Billing information
# For questions about our AI policy
# Contact: privacy@healthcare.com
# HIPAA Compliance Officer
Key Takeaways
- Default deny: Most restrictive approach for safety
- HIPAA compliance: Patient data never exposed
- Public health allowed: Educational content helps patients
- Clear contact: Privacy officer for questions
Common Patterns Across Top Implementations
What the Best Do Differently
1. Clear Documentation (87% of top sites)
Include comments explaining your reasoning, contact information, and last update date.
# Contact: ai-policy@company.com
2. Selective Policies (73% of top sites)
Most successful implementations allow some content while protecting sensitive areas—not all-or-nothing.
3. Business Model Alignment (91% of top sites)
Policies reflect and support the business model (freemium, subscription, open source, etc.).
4. Regular Updates (68% of top sites)
Top sites review and update their policies every 3-6 months as the AI landscape evolves.
5. Reasonable Crawl Delays (92% of top sites)
Most use 1-5 second delays—enough to prevent server overload without being excessive.
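Putting these five patterns together, a minimal policy skeleton might look like the sketch below. The domain, directories, and contact address are placeholders, so swap in paths that actually exist on your site.
# llms.txt - Starter policy skeleton (placeholder values)
# Last updated: YYYY-MM-DD
# Contact: ai-policy@example.com
# Reasoning: public content is open for AI training; sensitive areas are blocked
# Selective by default, not all-or-nothing
User-agent: *
Allow: /docs/
Allow: /blog/
Disallow: /account/
Disallow: /admin/
# Reasonable crawl delay (1-5 seconds is typical)
Crawl-delay: 2
# Sitemap for better discovery
Sitemap: https://example.com/sitemap.xml
Treat this as a starting point rather than a finished policy: review it every few months, as the top implementations do, and adjust as the AI landscape changes.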
Templates by Industry
Based on our analysis, here are starter templates for different industries:
- 📱 SaaS / Tech Company Template
- 📰 News / Media Template
- 🛒 E-commerce Template
- 🎓 Education Template
Implementation Checklist
Before You Implement
- ☐ Review case studies from your industry
- ☐ Audit your content and categorize by sensitivity
- ☐ Choose a template that matches your business model
- ☐ Customize the template for your specific needs
- ☐ Validate your llms.txt file for errors
- ☐ Deploy to your website root (see the note after this checklist)
- ☐ Monitor AI crawler activity
- ☐ Schedule quarterly reviews
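If you're unsure about the deployment step, the file is conventionally served from the root of your domain, for example:
https://example.com/llms.txt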
Ready to Implement Your Policy?
Use our free tools to create a customized llms.txt file based on these proven patterns.
Conclusion
The top 100 websites using llms.txt share common patterns: clear documentation, selective policies aligned with business models, and regular updates. They don't take an all-or-nothing approach—instead, they carefully balance openness with protection.
Whether you're a SaaS company, news organization, e-commerce site, or educational institution, there's a proven pattern you can adapt for your needs.
Start with a template from your industry, customize it for your specific situation, and iterate based on results. The AI landscape is evolving rapidly—your policy should evolve with it.