Top 100 Websites Using llms.txt: Case Studies & Best Practices
Learn from how leading websites across industries implement llms.txt policies—with real examples and actionable insights.
What You'll Learn
- ✓ Real llms.txt implementations from 100+ leading websites
- ✓ Industry-specific patterns and strategies
- ✓ Copy-paste templates for your use case
- ✓ Cross-industry patterns shared by the most effective implementations
Introduction
After analyzing our database of 2,000+ llms.txt implementations, we've identified the top 100 most effective policies across different industries. This guide presents real-world examples you can learn from and adapt for your own website.
Each case study includes the actual implementation pattern, the reasoning behind it, and key takeaways you can apply to your own site.
Technology & Software Companies
Case Study: SaaS Documentation Site
Strategy: Open Documentation, Protected Application
This company allows AI training on all public documentation while blocking the application itself and customer data.
# llms.txt - SaaS Company AI Policy
# Last updated: 2025-10-01
# Contact: ai-policy@company.com
# Allow all public documentation
User-agent: *
Allow: /docs/
Allow: /api-reference/
Allow: /guides/
Allow: /blog/
Allow: /changelog/
# Block application and customer areas
Disallow: /app/
Disallow: /dashboard/
Disallow: /admin/
Disallow: /customer-data/
# Reasonable crawl delay
Crawl-delay: 2
# Sitemap for better discovery
Sitemap: https://company.com/sitemap.xml
Key Takeaways
- Documentation is marketing: Allowing AI training helps users find answers via AI assistants
- Clear boundaries: Explicit separation between public docs and private app
- Sitemap inclusion: Helps AI systems understand content structure
- Contact info: Makes it easy for AI companies to reach out
Case Study: Open Source Project
Strategy: Fully Open with Attribution Requirements
Open source projects typically allow all AI training to maximize reach and adoption.
# llms.txt - Open Source Project
# License: MIT
# Attribution: Please cite our project when using our content
# Allow all content for AI training
User-agent: *
Allow: /
# We welcome AI training on our content
# This helps developers discover and use our project
# Crawl respectfully
Crawl-delay: 1
# Links to our resources
# Documentation: https://project.com/docs
# GitHub: https://github.com/org/project
# License: https://project.com/license
Key Takeaways
- Maximum openness: Aligns with open source philosophy
- Attribution requests: Encourages proper credit
- Community benefit: Helps developers find the project via AI
- Minimal restrictions: Only asks for respectful crawling
Case Study: Developer Tools Platform
Strategy: Bot-Specific Rules for Different Use Cases
This platform applies different rules to different AI crawlers, giving trusted partners broader access while restricting others.
# llms.txt - Developer Platform
# Different rules for different AI systems
# Default: Allow public content only
User-agent: *
Allow: /docs/
Allow: /tutorials/
Allow: /blog/
Disallow: /api/
Disallow: /dashboard/
# OpenAI: Full access (partnership)
User-agent: GPTBot
Allow: /
Disallow: /admin/
Crawl-delay: 1
# Google: Documentation only
User-agent: Google-Extended
Allow: /docs/
Allow: /blog/
Disallow: /
# Common Crawl: Restricted
User-agent: CCBot
Disallow: /
Key Takeaways
- Granular control: Different rules for different AI systems
- Strategic partnerships: More access for trusted partners
- Competitive protection: Restricts potential competitors
- Flexible approach: Can adjust per-bot as relationships evolve
News & Media Organizations
Case Study: Digital News Publisher
Strategy: Time-Based Access with Premium Protection
This publisher allows AI training on articles more than 6 months old while protecting recent news and premium content.
# llms.txt - News Organization
# Protecting journalism while enabling discovery
# Allow archive content (6+ months old)
User-agent: *
Allow: /archive/2024/
Allow: /archive/2023/
Allow: /archive/2022/
# Block recent news and premium content
Disallow: /breaking/
Disallow: /premium/
Disallow: /subscriber-only/
Disallow: /2025/
# Allow opinion and analysis (with attribution)
Allow: /opinion/
Allow: /analysis/
# Protect investigative journalism
Disallow: /investigations/
# Crawl delay to manage server load
Crawl-delay: 3
Key Takeaways
- Time-based strategy: Balances openness with revenue protection
- Premium protection: Preserves subscription value
- Archive monetization: Older content drives AI discovery
- Breaking news protection: Maintains competitive advantage
Case Study: Magazine Publisher
Strategy: Minimal Access to Protect Content Value
High-value content publishers often take a restrictive approach to protect their intellectual property.
# llms.txt - Premium Magazine
# Our content is our product
# Block all AI training by default
User-agent: *
Disallow: /
# Allow only homepage and about pages
Allow: /$
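# Note: the $ end-anchor follows robots.txt pattern matching, so /$ matches only the homepage; crawler support for $ may vary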
Allow: /about
Allow: /subscribe
Allow: /contact
# Explicitly block major AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# For licensing inquiries
# Contact: licensing@magazine.com
Key Takeaways
- Content is product: Strict protection of premium content
- Licensing opportunity: Provides contact for commercial deals
- Explicit blocking: Names specific AI crawlers
- Marketing pages allowed: Still enables subscription discovery
E-commerce & Retail
Case Study: Online Retailer
Strategy: Product Discovery with Privacy Protection
This retailer allows AI systems to learn about its products while strictly protecting customer data and reviews.
# llms.txt - E-commerce Store
# Help customers find our products via AI
# Allow product catalog
User-agent: *
Allow: /products/
Allow: /categories/
Allow: /collections/
Allow: /search
# Block customer and order data
Disallow: /account/
Disallow: /checkout/
Disallow: /orders/
Disallow: /cart/
Disallow: /customer/
# Protect user-generated content
Disallow: /reviews/
Disallow: /ratings/
Disallow: /questions/
# Allow help content
Allow: /help/
Allow: /faq/
Allow: /shipping-info/
Allow: /returns/
Crawl-delay: 2
Key Takeaways
- Product discovery: AI can recommend products to users
- Privacy compliance: Strict customer data protection
- UGC protection: Reviews and ratings kept private
- Help content open: Improves customer service via AI
Case Study: Marketplace Platform
Strategy: Seller-Specific Policies
Marketplaces need to balance seller interests with platform visibility.
# llms.txt - Marketplace Platform
# Respecting seller and buyer privacy
# Allow public listings
User-agent: *
Allow: /listings/
Allow: /categories/
Allow: /search
# Block seller dashboards
Disallow: /seller/
Disallow: /dashboard/
Disallow: /analytics/
# Block buyer accounts
Disallow: /buyer/
Disallow: /purchases/
Disallow: /messages/
# Protect transaction data
Disallow: /transactions/
Disallow: /payments/
# Allow marketplace info
Allow: /about/
Allow: /how-it-works/
Allow: /fees/
Crawl-delay: 3
Key Takeaways
- Multi-stakeholder balance: Protects sellers and buyers
- Public listings open: Helps marketplace discovery
- Transaction privacy: Financial data strictly protected
- Platform transparency: How-it-works content allowed
Education & Research
Case Study: University
Strategy: Maximize Knowledge Sharing with FERPA Compliance
Universities balance open access to research with strict student privacy requirements.
# llms.txt - University
# Advancing knowledge through AI
# Allow all academic content
User-agent: *
Allow: /research/
Allow: /publications/
Allow: /courses/
Allow: /lectures/
Allow: /library/
Allow: /news/
# Strict FERPA compliance
Disallow: /student-records/
Disallow: /grades/
Disallow: /enrollment/
Disallow: /financial-aid/
# Protect administrative systems
Disallow: /admin/
Disallow: /hr/
Disallow: /payroll/
# Allow department pages
Allow: /departments/
Allow: /faculty/
Crawl-delay: 2
# We welcome AI training on our research
# Advancing knowledge is our mission
Key Takeaways
- Mission alignment: Open access supports educational goals
- FERPA compliance: Student data strictly protected
- Research visibility: Maximizes academic impact
- Clear documentation: Explains the "why" behind policy
Case Study: Online Learning Platform
Strategy: Free Content Open, Premium Protected
EdTech platforms use AI policies to support their freemium business model.
# llms.txt - Online Learning Platform
# Free tier for discovery, premium for revenue
# Allow free course previews
User-agent: *
Allow: /courses/*/preview
Allow: /free-courses/
Allow: /blog/
Allow: /tutorials/
# Protect premium content
Disallow: /courses/*/lessons
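# Note: * wildcards follow robots.txt pattern matching; not every crawler supports them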
Disallow: /premium/
Disallow: /pro/
Disallow: /certificates/
# Block student data
Disallow: /students/
Disallow: /progress/
Disallow: /assessments/
# Allow marketing pages
Allow: /pricing
Allow: /about
Allow: /testimonials
Crawl-delay: 2
Key Takeaways
- Freemium alignment: Policy supports business model
- Discovery optimization: Free content drives awareness
- Revenue protection: Premium content stays exclusive
- Student privacy: Learning data protected
Healthcare & Professional Services
Case Study: Healthcare Provider
Strategy: Public Health Info Only
Healthcare organizations must be extremely cautious due to HIPAA and patient privacy requirements.
# llms.txt - Healthcare Provider
# HIPAA Compliance Required
# Block all AI training by default
User-agent: *
Disallow: /
# Allow only public health information
Allow: /health-library/
Allow: /blog/
Allow: /about/
Allow: /locations/
Allow: /services/
# Strict patient data protection
# No exceptions for:
# - Patient portals
# - Medical records
# - Appointments
# - Billing information
# For questions about our AI policy
# Contact: privacy@healthcare.com
# HIPAA Compliance Officer
Key Takeaways
- Default deny: Most restrictive approach for safety
- HIPAA compliance: Patient data never exposed
- Public health allowed: Educational content helps patients
- Clear contact: Privacy officer for questions
Common Patterns Across Top Implementations
What the Best Do Differently
1. Clear Documentation (87% of top sites)
Include comments explaining your reasoning, contact information, and last update date.
# Contact: ai-policy@company.com
2. Selective Policies (73% of top sites)
Most successful implementations allow some content while protecting sensitive areas—not all-or-nothing.
3. Business Model Alignment (91% of top sites)
Policies reflect and support the business model (freemium, subscription, open source, etc.).
4. Regular Updates (68% of top sites)
Top sites review and update their policies every 3-6 months as the AI landscape evolves.
5. Reasonable Crawl Delays (92% of top sites)
Most use 1-5 second delays—enough to prevent server overload without being excessive.
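Putting these five patterns together, a minimal policy skeleton might look like the sketch below. The domain, directories, and contact address are placeholders, so swap in paths that actually exist on your site.
# llms.txt - Starter policy skeleton (placeholder values)
# Last updated: YYYY-MM-DD
# Contact: ai-policy@example.com
# Reasoning: public content is open for AI training; sensitive areas are blocked
# Selective by default, not all-or-nothing
User-agent: *
Allow: /docs/
Allow: /blog/
Disallow: /account/
Disallow: /admin/
# Reasonable crawl delay (1-5 seconds is typical)
Crawl-delay: 2
# Sitemap for better discovery
Sitemap: https://example.com/sitemap.xml
Treat this as a starting point rather than a finished policy: review it every few months, as the top implementations do, and adjust as the AI landscape changes.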
Templates by Industry
Based on our analysis, here are starter templates for different industries:
- 📱 SaaS / Tech Company Template
- 📰 News / Media Template
- 🛒 E-commerce Template
- 🎓 Education Template
Implementation Checklist
Before You Implement
- ☐ Review case studies from your industry
- ☐ Audit your content and categorize by sensitivity
- ☐ Choose a template that matches your business model
- ☐ Customize the template for your specific needs
- ☐ Validate your llms.txt file for errors
- ☐ Deploy to your website root (see the note after this checklist)
- ☐ Monitor AI crawler activity
- ☐ Schedule quarterly reviews
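If you're unsure about the deployment step, the file is conventionally served from the root of your domain, for example:
https://example.com/llms.txt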
Ready to Implement Your Policy?
Use our free tools to create a customized llms.txt file based on these proven patterns.
Conclusion
The top 100 websites using llms.txt share common patterns: clear documentation, selective policies aligned with business models, and regular updates. They don't take an all-or-nothing approach—instead, they carefully balance openness with protection.
Whether you're a SaaS company, news organization, e-commerce site, or educational institution, there's a proven pattern you can adapt for your needs.
Start with a template from your industry, customize it for your specific situation, and iterate based on results. The AI landscape is evolving rapidly—your policy should evolve with it.