# llms.txt: Guidance for Large Language Models and AI Agents
# Site: https://itif.org/
# Owner: Information Technology & Innovation Foundation (ITIF)
# Last-Updated: 2025-09-28
# Spec-Note: There is no universal standard for llms.txt yet. This file is an explicit policy
# manifest for AI crawlers, AI search engines, and LLM providers. It complements robots.txt.

###############################################################################
# 0) CONTACT & OWNERSHIP
###############################################################################
Organization: Information Technology & Innovation Foundation (ITIF)
Address: 700 K Street NW, Suite 600, Washington, DC 20001, USA
Contact-Email: mail@itif.org
Contact-Form: https://itif.org/media-contacts/   # General/media inquiries
Policy-Contact: mail@itif.org
Copyright: © ITIF. All rights reserved unless otherwise noted.

###############################################################################
# 1) SCOPE (WHAT THIS FILE APPLIES TO)
###############################################################################
Scope: public web pages, articles, reports, blogs, podcasts, event pages, and metadata
Scope-Exclusions:
  - Paywalled, members-only, or embargoed content (if any)
  - Third-party embeds (e.g., externally hosted media) subject to their own licenses
Hierarchy:
  - robots.txt directives MUST be respected first for crawling/fetching access.
  - This llms.txt governs USE of fetched content by AI systems (training, indexing, summarizing, etc.).
Robots-Txt: https://itif.org/robots.txt
Sitemap: https://itif.org/sitemap/   # Human-readable sitemap page

###############################################################################
# 2) ALLOWED USES (WITH CONDITIONS)
###############################################################################
# AI systems MAY:
Allow:
  - Non-persistent retrieval for on-demand question answering and summarization
  - Snippet generation (<= 240 characters) for search/preview
  - Indexing for retrieval-augmented generation (RAG) with the constraints below
Conditions:
  - Clear Attribution required (see Section 5)
  - Source Linking required (canonical URL)
  - No storage beyond Cache-Policy without separate written permission
  - No transformation that implies ITIF endorsement of generated outputs

###############################################################################
# 3) PROHIBITED USES (WITHOUT EXPLICIT LICENSE)
###############################################################################
Disallow:
  - Model-Training: using content to train, fine-tune, distill, or evaluate ML/LLM models
  - Dataset-Creation: inclusion in datasets or embeddings distributed to third parties
  - Persistent Archiving: storage beyond Cache-Policy without written consent
  - Commercial-Redistribution: republishing substantial portions of content
  - Derivative-Works that could substitute for ITIF publications
If-You-Need-These-Rights: Contact Policy-Contact for licensing.

###############################################################################
# 4) CRAWLER / PRODUCT DECLARATIONS
###############################################################################
# If your AI bot/product is not listed, you still MUST honor this policy.
# The User-Agent names below are examples in common use as of 2025.
[Agents]
# Training-oriented/extended agents MUST treat “Disallow: Model-Training” as binding.
User-Agent: GPTBot
User-Agent: ChatGPT-User
User-Agent: Google-Extended
User-Agent: Applebot-Extended
User-Agent: CCBot
User-Agent: ClaudeBot
User-Agent: anthropic-ai
User-Agent: PerplexityBot
User-Agent: Amazonbot
User-Agent: FacebookBot
User-Agent: Bytespider
User-Agent: YouBot
Compliance: required

###############################################################################
# 5) ATTRIBUTION & LINKING
###############################################################################
Attribution-Required: yes
Attribution-Format:
  - “Source: Information Technology & Innovation Foundation (ITIF), ‘’”
  - Include the canonical URL
Brand-Use: Do not imply endorsement or partnership. Use ITIF name only for factual credit.

###############################################################################
# 6) RATE LIMITS & FETCH RULES
###############################################################################
# These apply to AI crawlers and AI search engines (distinct from normal web browsers).
Max-Requests: 1 request per second (site-wide, burst up to 5 rps for 10 seconds)
Respect-Crawl-Delay: yes
Honor-If-Modified-Since: yes
Do-Not-Bypass: CDN, bot detection, or authentication gates

###############################################################################
# 7) PRIVACY & SENSITIVE DATA
###############################################################################
PII-Handling: Do not extract, aggregate, or republish personal contact info at scale.
Email-Protection: Do not harvest or expose email addresses beyond what appears on-page for human readers; do not surface emails in AI answers unless the page explicitly presents them.
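
# Illustrative sketch for Section 6 (not part of the policy itself): one way a
# crawler might pace its requests and send conditional fetches. It is a minimal
# example under stated assumptions. The names PoliteLimiter and
# conditional_headers are hypothetical, and the limiter enforces only the
# sustained 1 request per second; it deliberately ignores the burst allowance,
# which is the conservative reading of Max-Requests.

```python
import time
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Callable, Dict, Optional


class PoliteLimiter:
    """Spaces requests at least `min_interval` seconds apart (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float = 1.0,  # Section 6: 1 request per second, site-wide
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start of next request

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self.min_interval
        return delay


def conditional_headers(last_fetch: datetime) -> Dict[str, str]:
    """Build an If-Modified-Since header from the previous fetch time (UTC)."""
    stamp = format_datetime(last_fetch.astimezone(timezone.utc), usegmt=True)
    return {"If-Modified-Since": stamp}
```

# Usage idea: call limiter.wait() before every fetch from this site, and pass
# conditional_headers(previous_fetch_time) with the request so that unchanged
# pages can be answered with 304 Not Modified.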
###############################################################################
# 8) CACHE & RETENTION
###############################################################################
Cache-Policy:
  - Transient cache up to 7 days for operational performance
  - Do not create long-term archives, vector DBs, or offline mirrors without license
Refresh-Policy:
  - Revalidate on each significant query topic (e.g., weekly) to avoid stale summaries

###############################################################################
# 9) FAIR USE & EXCEPTIONS
###############################################################################
# This policy is not intended to restrict fair use quotations for commentary, research, or news
# reporting. However, automated large-scale ingestion and synthesis are outside fair use.
Fair-Use: short quotations and citations allowed with attribution (Section 5)

###############################################################################
# 10) DMCA / TAKEDOWN / AUDIT
###############################################################################
Takedown:
  - If ITIF notifies you of prohibited use, you must cease within 72 hours and purge derived data.
Audit:
  - Upon written request, provide a description of how ITIF content is ingested and used.

###############################################################################
# 11) EXAMPLES
###############################################################################
Example-Allowed:
  - “What did ITIF say about transatlantic data flows?” → summarize a single article, cite ITIF, and link to the original page.
Example-Disallowed:
  - Training your general-purpose LLM or building a proprietary embeddings dataset from bulk ITIF content.

###############################################################################
# 12) VERSIONING
###############################################################################
Policy-Version: 1.0 (2025-09-28)
Change-Process: ITIF may update this file without notice. Check back regularly for the latest version.
Jurisdiction: United States (DC)
Contact-For-Licensing: mail@itif.org
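
# Illustrative sketch for Sections 2, 5, and 8 (not part of the policy itself):
# how a client might apply the 240-character snippet limit, the attribution
# format, and the 7-day cache window. It is a minimal example under stated
# assumptions. All function names are hypothetical, and placing the article
# title inside the inner quotes of Attribution-Format is an assumption about
# how that template is meant to be filled.

```python
from datetime import datetime, timedelta

MAX_SNIPPET_CHARS = 240        # Section 2: snippet limit
CACHE_TTL = timedelta(days=7)  # Section 8: transient cache window

SOURCE_NAME = "Information Technology & Innovation Foundation (ITIF)"


def make_snippet(text: str, limit: int = MAX_SNIPPET_CHARS) -> str:
    """Collapse whitespace and truncate to at most `limit` characters."""
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat
    # Reserve one character for the ellipsis so the result never exceeds `limit`.
    return flat[: limit - 1].rstrip() + "…"


def attribution(title: str, canonical_url: str) -> str:
    """Format a credit line (assumed reading of Section 5) plus the canonical URL."""
    return f"Source: {SOURCE_NAME}, ‘{title}’ {canonical_url}"


def cache_expired(fetched_at: datetime, now: datetime) -> bool:
    """True once a cached copy is older than the 7-day transient window."""
    return now - fetched_at > CACHE_TTL
```

# Usage idea: build every preview with make_snippet(), attach attribution() to
# each answer that draws on a page, and drop or revalidate any cached copy for
# which cache_expired() returns True.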