October 7, 2025 • 11 min read • Case Studies

Top 100 Websites Using llms.txt: Case Studies & Best Practices

Learn from how leading websites across industries implement llms.txt policies—with real examples and actionable insights.

What You'll Learn

  • Real llms.txt implementations from 100+ leading websites
  • Industry-specific patterns and strategies
  • Copy-paste templates for your use case
  • Common patterns from successful implementations

Introduction

After analyzing our database of 2,000+ llms.txt implementations, we've identified the top 100 most effective policies across different industries. This guide presents real-world examples you can learn from and adapt for your own website.

Each case study includes the actual implementation pattern, the reasoning behind it, and key takeaways you can apply to your own site.

Technology & Software Companies

Case Study: SaaS Documentation Site

SaaS • High Traffic

Strategy: Open Documentation, Protected Application

This company allows AI training on all public documentation while blocking the application itself and customer data.

# llms.txt - SaaS Company AI Policy
# Last updated: 2025-10-01
# Contact: ai-policy@company.com

# Allow all public documentation
User-agent: *
Allow: /docs/
Allow: /api-reference/
Allow: /guides/
Allow: /blog/
Allow: /changelog/

# Block application and customer areas
Disallow: /app/
Disallow: /dashboard/
Disallow: /admin/
Disallow: /customer-data/

# Reasonable crawl delay
Crawl-delay: 2

# Sitemap for better discovery
Sitemap: https://company.com/sitemap.xml

Key Takeaways

  • Documentation is marketing: Allowing AI training helps users find answers via AI assistants
  • Clear boundaries: Explicit separation between public docs and private app
  • Sitemap inclusion: Helps AI systems understand content structure
  • Contact info: Makes it easy for AI companies to reach out
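
A policy like this can be sanity-checked by running sample paths through it. The sketch below assumes the file follows robots.txt-style matching, where the longest matching path wins and unmatched paths are allowed (the rule RFC 9309 defines for robots.txt). `parse_rules` and `is_allowed` are hypothetical helper names, and the policy text is a shortened copy of the example above.

```python
POLICY = """\
User-agent: *
Allow: /docs/
Allow: /api-reference/
Disallow: /app/
Disallow: /dashboard/
Disallow: /customer-data/
Crawl-delay: 2
"""


def parse_rules(text):
    """Collect (allowed, path_prefix) pairs from Allow/Disallow lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for rule_allows, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and rule_allows):
                best_len, allowed = len(prefix), rule_allows
    return allowed


rules = parse_rules(POLICY)
for path in ("/docs/quickstart", "/app/settings", "/pricing"):
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under these assumptions `/pricing` comes back as allowed because no rule mentions it, so private areas need an explicit `Disallow` rather than relying on the `Allow` list alone.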

Case Study: Open Source Project

Open Source • Community

Strategy: Fully Open with an Attribution Request

Open source projects typically allow all AI training to maximize reach and adoption.

# llms.txt - Open Source Project
# License: MIT
# Attribution: Please cite our project when using our content

# Allow all content for AI training
User-agent: *
Allow: /

# We welcome AI training on our content
# This helps developers discover and use our project

# Crawl respectfully
Crawl-delay: 1

# Links to our resources
# Documentation: https://project.com/docs
# GitHub: https://github.com/org/project
# License: https://project.com/license

Key Takeaways

  • Maximum openness: Aligns with open source philosophy
  • Attribution requests: Encourages proper credit
  • Community benefit: Helps developers find the project via AI
  • Minimal restrictions: Only asks for respectful crawling

Case Study: Developer Tools Platform

Dev Tools • Selective

Strategy: Bot-Specific Rules for Different Use Cases

This platform sets rules per crawler: a partner gets broad access, while other named crawlers are limited to documentation or blocked entirely.

# llms.txt - Developer Platform
# Different rules for different AI systems

# Default: Allow public content only
User-agent: *
Allow: /docs/
Allow: /tutorials/
Allow: /blog/
Disallow: /api/
Disallow: /dashboard/

# OpenAI: Full access (partnership)
User-agent: GPTBot
Allow: /
Disallow: /admin/
Crawl-delay: 1

# Google: Documentation only
User-agent: Google-Extended
Allow: /docs/
Allow: /blog/
Disallow: /

# Common Crawl: Restricted
User-agent: CCBot
Disallow: /

Key Takeaways

  • Granular control: Different rules for different AI systems
  • Strategic partnerships: More access for trusted partners
  • Competitive protection: Restricts potential competitors
  • Flexible approach: Can adjust per-bot as relationships evolve
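
Per-bot rules depend on how a crawler picks its group. The sketch below assumes robots.txt-style group selection: a crawler uses the group that names it and falls back to `*` only when no group does. `parse_groups` and `is_allowed` are hypothetical helpers, and the policy is a shortened copy of the example above.

```python
POLICY = """\
User-agent: *
Allow: /docs/
Allow: /tutorials/
Disallow: /api/

User-agent: GPTBot
Allow: /
Disallow: /admin/

User-agent: Google-Extended
Allow: /docs/
Disallow: /

User-agent: CCBot
Disallow: /
"""


def parse_groups(text):
    """Map each lowercase user-agent token to its (allowed, prefix) rules."""
    groups, current_agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow") and current_agents:
            in_rules = True
            if value:
                for agent in current_agents:
                    groups[agent].append((key == "allow", value))
    return groups


def is_allowed(groups, agent, path):
    """Use the agent's own group if present, else '*'; longest prefix wins."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for rule_allows, prefix in rules:
        if path.startswith(prefix) and (
            len(prefix) > best_len or (len(prefix) == best_len and rule_allows)
        ):
            best_len, allowed = len(prefix), rule_allows
    return allowed


groups = parse_groups(POLICY)
print(is_allowed(groups, "Google-Extended", "/tutorials/intro"))
print(is_allowed(groups, "GPTBot", "/api/keys"))
```

Under this reading a named group replaces the default group instead of extending it. `GPTBot` can fetch `/api/keys` even though the default group disallows `/api/`, so any restriction that should apply to every crawler has to be repeated in each named group.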

News & Media Organizations

Case Study: Digital News Publisher

News • Time-Based

Strategy: Time-Based Access with Premium Protection

Allows AI training on articles older than 6 months while protecting recent news and premium content.

# llms.txt - News Organization
# Protecting journalism while enabling discovery

# Allow archive content (6+ months old)
User-agent: *
Allow: /archive/2024/
Allow: /archive/2023/
Allow: /archive/2022/

# Block recent news and premium content
Disallow: /breaking/
Disallow: /premium/
Disallow: /subscriber-only/
Disallow: /2025/

# Allow opinion and analysis (with attribution)
Allow: /opinion/
Allow: /analysis/

# Protect investigative journalism
Disallow: /investigations/

# Crawl delay to manage server load
Crawl-delay: 3

Key Takeaways

  • Time-based strategy: Balances openness with revenue protection
  • Premium protection: Preserves subscription value
  • Archive monetization: Older content drives AI discovery
  • Breaking news protection: Maintains competitive advantage
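
A time-based policy has to be regenerated as content ages. The sketch below is one possible way to produce the yearly rules, assuming articles live under `/archive/<year>/` as in the example. The function name, the `first_year` default, and the choice to open a year only once all of it is past the age threshold are assumptions made for illustration.

```python
from datetime import date


def archive_rules(today, min_age_months=6, first_year=2022):
    """Return Allow/Disallow lines for yearly archive folders.

    A year is opened only when its December is strictly older than the
    cutoff month, so every article in that folder meets the age threshold.
    """
    # Months are counted as year * 12 + zero-based month index.
    cutoff = today.year * 12 + (today.month - 1) - min_age_months
    lines = []
    for year in range(first_year, today.year + 1):
        december = year * 12 + 11
        directive = "Allow" if december < cutoff else "Disallow"
        lines.append(f"{directive}: /archive/{year}/")
    return lines


print("\n".join(archive_rules(date(2025, 10, 7))))
```

Year-level folders make the threshold coarse: with these rules an article from early in the current year stays blocked even after it passes six months, which errs on the side of protection.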

Case Study: Magazine Publisher

Publishing • Restrictive

Strategy: Minimal Access to Protect Content Value

High-value content publishers often take a restrictive approach to protect their intellectual property.

# llms.txt - Premium Magazine
# Our content is our product

# Block all AI training by default
User-agent: *
Disallow: /

# Allow only homepage and about pages
Allow: /$
Allow: /about
Allow: /subscribe
Allow: /contact

# Explicitly block major AI crawlers
User-agent: GPTBot
Disallow: /

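# ClaudeBot is the crawler token Anthropic documents; Claude-Web is kept for older listings
User-agent: ClaudeBot
User-agent: Claude-Web
Disallow: /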

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# For licensing inquiries
# Contact: licensing@magazine.com

Key Takeaways

  • Content is product: Strict protection of premium content
  • Licensing opportunity: Provides contact for commercial deals
  • Explicit blocking: Names specific AI crawlers
  • Marketing pages allowed: Crawlers without their own group can still reach the homepage and subscription pages; the named crawlers are blocked from the whole site

E-commerce & Retail

Case Study: Online Retailer

E-commerce • Product-Focused

Strategy: Product Discovery with Privacy Protection

Allows AI to learn about products while strictly protecting customer data and reviews.

# llms.txt - E-commerce Store
# Help customers find our products via AI

# Allow product catalog
User-agent: *
Allow: /products/
Allow: /categories/
Allow: /collections/
Allow: /search

# Block customer and order data
Disallow: /account/
Disallow: /checkout/
Disallow: /orders/
Disallow: /cart/
Disallow: /customer/

# Protect user-generated content
Disallow: /reviews/
Disallow: /ratings/
Disallow: /questions/

# Allow help content
Allow: /help/
Allow: /faq/
Allow: /shipping-info/
Allow: /returns/

Crawl-delay: 2

Key Takeaways

  • Product discovery: AI can recommend products to users
  • Privacy compliance: Strict customer data protection
  • UGC protection: Reviews, ratings, and customer questions are kept out of AI training
  • Help content open: Improves customer service via AI

Case Study: Marketplace Platform

Marketplace • Complex

Strategy: Seller-Specific Policies

Marketplaces need to balance seller interests with platform visibility.

# llms.txt - Marketplace Platform
# Respecting seller and buyer privacy

# Allow public listings
User-agent: *
Allow: /listings/
Allow: /categories/
Allow: /search

# Block seller dashboards
Disallow: /seller/
Disallow: /dashboard/
Disallow: /analytics/

# Block buyer accounts
Disallow: /buyer/
Disallow: /purchases/
Disallow: /messages/

# Protect transaction data
Disallow: /transactions/
Disallow: /payments/

# Allow marketplace info
Allow: /about/
Allow: /how-it-works/
Allow: /fees/

Crawl-delay: 3

Key Takeaways

  • Multi-stakeholder balance: Protects sellers and buyers
  • Public listings open: Helps marketplace discovery
  • Transaction privacy: Financial data strictly protected
  • Platform transparency: How-it-works content allowed

Education & Research

Case Study: University

Education • Open Access

Strategy: Maximize Knowledge Sharing with FERPA Compliance

Universities balance open access to research with strict student privacy requirements.

# llms.txt - University
# Advancing knowledge through AI

# Allow all academic content
User-agent: *
Allow: /research/
Allow: /publications/
Allow: /courses/
Allow: /lectures/
Allow: /library/
Allow: /news/

# Strict FERPA compliance
Disallow: /student-records/
Disallow: /grades/
Disallow: /enrollment/
Disallow: /financial-aid/

# Protect administrative systems
Disallow: /admin/
Disallow: /hr/
Disallow: /payroll/

# Allow department pages
Allow: /departments/
Allow: /faculty/

Crawl-delay: 2

# We welcome AI training on our research
# Advancing knowledge is our mission

Key Takeaways

  • Mission alignment: Open access supports educational goals
  • FERPA compliance: Student data strictly protected
  • Research visibility: Maximizes academic impact
  • Clear documentation: Explains the "why" behind policy

Case Study: Online Learning Platform

EdTech • Freemium

Strategy: Free Content Open, Premium Protected

EdTech platforms use AI policies to support their freemium business model.

# llms.txt - Online Learning Platform
# Free tier for discovery, premium for revenue

# Allow free course previews
User-agent: *
Allow: /courses/*/preview
Allow: /free-courses/
Allow: /blog/
Allow: /tutorials/

# Protect premium content
Disallow: /courses/*/lessons
Disallow: /premium/
Disallow: /pro/
Disallow: /certificates/

# Block student data
Disallow: /students/
Disallow: /progress/
Disallow: /assessments/

# Allow marketing pages
Allow: /pricing
Allow: /about
Allow: /testimonials

Crawl-delay: 2

Key Takeaways

  • Freemium alignment: Policy supports business model
  • Discovery optimization: Free content drives awareness
  • Revenue protection: Premium content stays exclusive
  • Student privacy: Learning data protected
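
The `/courses/*/preview` and `/courses/*/lessons` rules rely on wildcard matching. The sketch below shows one way such patterns could be evaluated, assuming robots.txt-style wildcards where `*` matches any run of characters and a trailing `$` anchors the end of the path. `pattern_to_regex` and `matches` are hypothetical helpers, and the course slug is made up.

```python
import re


def pattern_to_regex(pattern):
    """Translate a rule path: '*' matches anything, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """True when the rule pattern matches the path from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/courses/*/preview", "/courses/python-101/preview"))
print(matches("/courses/*/lessons", "/courses/python-101/lessons/3"))
print(matches("/courses/*/preview", "/courses/python-101/lessons/3"))
```

Because `*` also spans slashes in this sketch, `/courses/*/preview` would match a deeper path such as `/courses/a/b/preview`, so it is worth testing patterns against the real URL layout before deploying.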

Healthcare & Professional Services

Case Study: Healthcare Provider

Healthcare • HIPAA Compliant

Strategy: Public Health Info Only

Healthcare organizations must be extremely cautious due to HIPAA and patient privacy requirements.

# llms.txt - Healthcare Provider
# HIPAA Compliance Required

# Block all AI training by default
User-agent: *
Disallow: /

# Allow only public health information
Allow: /health-library/
Allow: /blog/
Allow: /about/
Allow: /locations/
Allow: /services/

# Strict patient data protection
# No exceptions for:
# - Patient portals
# - Medical records
# - Appointments
# - Billing information

# For questions about our AI policy
# Contact: privacy@healthcare.com
# HIPAA Compliance Officer

Key Takeaways

  • Default deny: Most restrictive approach for safety
  • HIPAA compliance: Patient data never exposed
  • Public health allowed: Educational content helps patients
  • Clear contact: Privacy officer for questions

Common Patterns Across Top Implementations

What the Best Do Differently

1. Clear Documentation (87% of top sites)

Include comments explaining your reasoning, contact information, and last update date.

# Last updated: 2025-10-01
# Contact: ai-policy@company.com

2. Selective Policies (73% of top sites)

Most successful implementations allow some content while protecting sensitive areas—not all-or-nothing.

3. Business Model Alignment (91% of top sites)

Policies reflect and support the business model (freemium, subscription, open source, etc.).

4. Regular Updates (68% of top sites)

Top sites review and update their policies every 3-6 months as the AI landscape evolves.
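
A `# Last updated:` comment can double as a review reminder. The sketch below is a minimal illustration, assuming the date is written as `YYYY-MM-DD` in a comment line as in the examples in this guide. The helper names and the 180-day threshold are assumptions, not part of any llms.txt tooling.

```python
import re
from datetime import date


def last_updated(text):
    """Return the date in a '# Last updated: YYYY-MM-DD' comment, or None."""
    found = re.search(
        r"^#\s*Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$",
        text,
        re.MULTILINE | re.IGNORECASE,
    )
    return date.fromisoformat(found.group(1)) if found else None


def needs_review(text, today, max_age_days=180):
    """True when the file has no update date or the date is too old."""
    updated = last_updated(text)
    return updated is None or (today - updated).days > max_age_days


policy = "# llms.txt - SaaS Company AI Policy\n# Last updated: 2025-10-01\n"
print(needs_review(policy, date(2025, 10, 7)))
print(needs_review(policy, date(2026, 6, 1)))
```

A check like this could run in CI or a scheduled job so that a stale policy surfaces without anyone having to remember the review date.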

5. Reasonable Crawl Delays (92% of top sites)

Most use 1-5 second delays—enough to prevent server overload without being excessive.
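
To see what a given delay means for server load, it helps to turn it into a daily request ceiling. The short calculation below assumes a single crawler that honors `Crawl-delay` and sends at most one request per delay interval; `max_requests_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(crawl_delay_seconds):
    """Upper bound on daily requests from one crawler that honors the delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


for delay in (1, 2, 5):
    print(f"Crawl-delay: {delay} -> at most {max_requests_per_day(delay):,} requests/day")
```

At the 1-5 second range this works out to between 17,280 and 86,400 requests per day per crawler, which can be compared against the size of the site and its normal traffic.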

Templates by Industry

Based on our analysis, here are starter templates for different industries:

  • 📱 SaaS / Tech Company Template
  • 📰 News / Media Template
  • 🛒 E-commerce Template
  • 🎓 Education Template

Implementation Checklist

Before You Implement

  1. Review case studies from your industry
  2. Audit your content and categorize by sensitivity
  3. Choose a template that matches your business model
  4. Customize the template for your specific needs
  5. Validate your llms.txt file for errors
  6. Deploy to your website root
  7. Monitor AI crawler activity
  8. Schedule quarterly reviews
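
The validation step can start with a few mechanical checks. The sketch below is a minimal linter, assuming the file uses only the five directives that appear in this guide's examples (`User-agent`, `Allow`, `Disallow`, `Crawl-delay`, `Sitemap`). The `lint` function and its messages are hypothetical and do not stand in for a full validator.

```python
KNOWN = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def lint(text):
    """Return a list of (line_number, message) problems found in the file."""
    problems, seen_agent = [], False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "expected 'Directive: value'"))
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN:
            problems.append((number, f"unknown directive '{key}'"))
        elif key == "user-agent":
            seen_agent = True
        elif key in ("allow", "disallow"):
            if not seen_agent:
                problems.append((number, f"'{key}' appears before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
        elif key == "crawl-delay" and not value.replace(".", "", 1).isdigit():
            problems.append((number, "Crawl-delay should be a number"))
    return problems


sample = "Disallow: /app/\nUser-agent: *\nAllow: docs/\nCrawl-delay: fast\n"
for number, message in lint(sample):
    print(f"line {number}: {message}")
```

Checks like these catch typos and ordering mistakes; whether the rules express the intended policy still needs a pass with real URLs from the site.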

Conclusion

The top 100 websites using llms.txt share common patterns: clear documentation, selective policies aligned with business models, and regular updates. They don't take an all-or-nothing approach—instead, they carefully balance openness with protection.

Whether you're a SaaS company, news organization, e-commerce site, or educational institution, there's a proven pattern you can adapt for your needs.

Start with a template from your industry, customize it for your specific situation, and iterate based on results. The AI landscape is evolving rapidly—your policy should evolve with it.
