LLMS Central - The Robots.txt for AI

itif.org

Last updated: 10/31/2025 (valid)

Independent Directory - Important Information

This llms.txt file was publicly accessible and retrieved from itif.org. LLMS Central does not claim ownership of this content and hosts it for informational purposes only to help AI systems discover and respect website policies.

This listing is not an endorsement by itif.org and they have not sponsored this page. We are an independent directory service with no affiliation to the listed domain.

Copyright & Terms: Users should respect the original terms of service of itif.org. If you believe there is a copyright or terms of service violation, please contact us at support@llmscentral.com for prompt removal. Domain owners can also claim their listing.

Current llms.txt Content

# llms.txt — Guidance for Large Language Models and AI Agents
# Site: https://itif.org/
# Owner: Information Technology & Innovation Foundation (ITIF)
# Last-Updated: 2025-09-28
# Spec-Note: There is no universal standard for llms.txt yet. This file is an explicit policy
# manifest for AI crawlers, AI search engines, and LLM providers. It complements robots.txt.
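No llms.txt grammar is standardized, so an agent reading this manifest has to rely on heuristics. The minimal sketch below turns the file into a mapping from keys to lists of values. The function name `parse_manifest` and every parsing rule in it are assumptions based on the layout visible in this file, not on any published spec.

```python
import re
from typing import Dict, List, Optional

KEY_LINE = re.compile(r"^([A-Za-z][\w-]*):\s*(.*)$")  # "Key: value" at column 0
INLINE_COMMENT = re.compile(r"\s+#.*$")  # "   # note" trailing a value


def parse_manifest(text: str) -> Dict[str, List[str]]:
    """Map each key in a loosely structured llms.txt file to its values.

    Heuristics (assumed, not from any spec): full-line and inline '#'
    comments are dropped, '[Section]' headers are skipped, indented '- '
    lines are list items of the latest key, other indented lines continue
    the previous value, and repeated keys accumulate.
    """
    policy: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if raw.lstrip().startswith("#"):
            continue  # full-line comment
        line = INLINE_COMMENT.sub("", raw).rstrip()
        if not line.strip():
            continue  # blank line
        if line.startswith("[") and line.endswith("]"):
            current = None  # section header such as [Agents]
            continue
        if line[0].isspace():
            if current is None:
                continue  # indented text with no owning key
            item = line.strip()
            values = policy[current]
            if item.startswith("- "):
                values.append(item[2:].strip())  # new list item
            elif values:
                values[-1] += " " + item  # wrapped continuation line
            else:
                values.append(item)
            continue
        match = KEY_LINE.match(line)
        if match is None:
            current = None  # unrecognized top-level text
            continue
        current = match.group(1)
        values = policy.setdefault(current, [])
        value = match.group(2).strip()
        if value:
            values.append(value)  # repeated keys such as User-Agent accumulate
    return policy
```

Run over the policy text in this file, `parse_manifest(text)["User-Agent"]` should return the twelve agent names from Section 4.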

###############################################################################
# 0) CONTACT & OWNERSHIP
###############################################################################
Organization: Information Technology & Innovation Foundation (ITIF)
Address: 700 K Street NW, Suite 600, Washington, DC 20001, USA
Contact-Email: mail@itif.org
Contact-Form: https://itif.org/media-contacts/   # General/media inquiries
Policy-Contact: mail@itif.org
Copyright: © ITIF. All rights reserved unless otherwise noted.

###############################################################################
# 1) SCOPE (WHAT THIS FILE APPLIES TO)
###############################################################################
Scope: public web pages, articles, reports, blogs, podcasts, event pages, and metadata
Scope-Exclusions:
  - Paywalled, members-only, or embargoed content (if any)
  - Third-party embeds (e.g., externally hosted media) subject to their own licenses
Hierarchy:
  - robots.txt directives MUST be respected first for crawling/fetching access.
  - This llms.txt governs USE of fetched content by AI systems (training, indexing, summarizing, etc.).
Robots-Txt: https://itif.org/robots.txt
Sitemap: https://itif.org/sitemap/   # Human-readable sitemap page
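The Hierarchy above puts robots.txt first: a fetch must be permitted there before this file's usage rules even come into play. A minimal sketch using Python's standard `urllib.robotparser` is below. The robots.txt rules and the agent name `ExampleAIBot` are hypothetical placeholders, not the contents of https://itif.org/robots.txt. A live crawler would instead call `set_url()` and `read()` against the real file.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT ITIF's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 1
Disallow: /private/
"""


def robots_gate(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[float]]:
    """Return (may_fetch, crawl_delay_seconds) for one URL under given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return allowed, (float(delay) if delay is not None else None)
```

Only when `may_fetch` is true does the question of permitted use (Sections 2 and 3) arise.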

###############################################################################
# 2) ALLOWED USES (WITH CONDITIONS)
###############################################################################
# AI systems MAY:
Allow:
  - Non-persistent retrieval for on-demand question answering and summarization
  - Snippet generation (<= 240 characters) for search/preview
  - Indexing for retrieval-augmented generation (RAG) with the constraints below
Conditions:
  - Clear Attribution required (see Section 5)
  - Source Linking required (canonical URL)
  - No storage beyond the Cache-Policy (Section 8) without separate written permission
  - No transformation that implies ITIF endorsement of generated outputs
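The 240-character snippet ceiling is easy to overshoot when an ellipsis is appended after truncation. The hypothetical helper below counts the ellipsis toward the limit and prefers to cut at a word boundary. The attribution and linking conditions still apply to whatever it returns.

```python
MAX_SNIPPET_CHARS = 240  # limit stated in the Allow list


def make_snippet(text: str, limit: int = MAX_SNIPPET_CHARS) -> str:
    """Trim text to at most `limit` characters, ending on a whole word."""
    clean = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(clean) <= limit:
        return clean
    cut = clean[: limit - 1]  # reserve one character for the ellipsis
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop a possibly partial last word
    return cut.rstrip() + "…"
```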

###############################################################################
# 3) PROHIBITED USES (WITHOUT EXPLICIT LICENSE)
###############################################################################
Disallow:
  - Model-Training: using content to train, fine-tune, distill, or evaluate ML/LLM models
  - Dataset-Creation: inclusion in datasets or embeddings distributed to third parties
  - Persistent-Archiving: storage beyond the Cache-Policy (Section 8) without written consent
  - Commercial-Redistribution: republishing substantial portions of content
  - Derivative-Works: creating works that could substitute for ITIF publications
If-You-Need-These-Rights: Contact Policy-Contact for licensing.

###############################################################################
# 4) CRAWLER / PRODUCT DECLARATIONS
###############################################################################
# If your AI bot/product is not listed, you still MUST honor this policy.
# The User-Agent names below are examples in common use as of 2025.

[Agents]
# Training-oriented/extended agents MUST treat “Disallow: Model-Training” as binding.
User-Agent: GPTBot
User-Agent: ChatGPT-User
User-Agent: Google-Extended
User-Agent: Applebot-Extended
User-Agent: CCBot
User-Agent: ClaudeBot
User-Agent: anthropic-ai
User-Agent: PerplexityBot
User-Agent: Amazonbot
User-Agent: FacebookBot
User-Agent: Bytespider
User-Agent: YouBot
Compliance: required
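To log which listed agent made a request, a site or an agent self-audit might look for the product tokens above inside the User-Agent header. The sketch assumes a case-insensitive substring match. Some tokens, such as Google-Extended, are robots.txt control tokens rather than strings sent in request headers, so a header match is not a complete compliance check. Unlisted agents remain bound by the policy either way.

```python
from typing import Optional

# Product tokens copied from the [Agents] list.
LISTED_AGENTS = (
    "GPTBot", "ChatGPT-User", "Google-Extended", "Applebot-Extended",
    "CCBot", "ClaudeBot", "anthropic-ai", "PerplexityBot", "Amazonbot",
    "FacebookBot", "Bytespider", "YouBot",
)


def listed_agent(user_agent_header: str) -> Optional[str]:
    """Return the first listed token found in a User-Agent header, else None.

    Assumption: tokens appear verbatim (ignoring case) inside longer
    User-Agent strings; a None result does NOT exempt a bot from the policy.
    """
    header = user_agent_header.lower()
    for token in LISTED_AGENTS:
        if token.lower() in header:
            return token
    return None
```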

###############################################################################
# 5) ATTRIBUTION & LINKING
###############################################################################
Attribution-Required: yes
Attribution-Format:
  - “Source: Information Technology & Innovation Foundation (ITIF), ‘<Page/Report Title>’”
  - Include the canonical URL
Brand-Use: Do not imply endorsement or partnership. Use ITIF name only for factual credit.
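A hypothetical formatter for the credit line in Attribution-Format is below. Stripping the query string and fragment is only an assumed approximation of the canonical URL. A page's own canonical link, where present, should take priority. The title and URL in the usage check are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def itif_attribution(title: str, url: str) -> str:
    """Format a credit line following Attribution-Format (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumed canonicalization: keep scheme, host, and path; drop query and fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return (
        "Source: Information Technology & Innovation Foundation (ITIF), "
        f"‘{title}’, {canonical}"
    )
```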

###############################################################################
# 6) RATE LIMITS & FETCH RULES
###############################################################################
# These apply to AI crawlers and AI search engines (distinct from normal web browsers).
Max-Requests: 1 request per second (site-wide, burst up to 5 rps for 10 seconds)
Respect-Crawl-Delay: yes
Honor-If-Modified-Since: yes
Do-Not-Bypass: CDN, bot detection, or authentication gates
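The Max-Requests line can be read several ways. The sketch below assumes this reading: 1 request per second sustained, never more than 5 per second, and enough headroom to run at 5 per second for about 10 seconds. It models that as a token bucket refilled at 1 token/s, plus a minimum 200 ms gap between requests. The class name and parameters are hypothetical. Times are integer milliseconds so that the spacing check is exact.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrawlBudget:
    """Hypothetical client-side limiter for the Max-Requests line."""

    sustained_rps: float = 1.0
    burst_rps: float = 5.0
    burst_seconds: float = 10.0
    capacity: float = field(init=False, default=0.0)
    tokens: float = field(init=False, default=0.0)
    min_gap_ms: int = field(init=False, default=0)
    last_refill_ms: int = field(init=False, default=0)
    last_request_ms: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # Running at burst_rps drains (burst_rps - sustained_rps) tokens/s,
        # so this capacity lasts roughly burst_seconds.
        self.capacity = (self.burst_rps - self.sustained_rps) * self.burst_seconds
        self.tokens = self.capacity
        self.min_gap_ms = round(1000 / self.burst_rps)

    def try_acquire(self, now_ms: int) -> bool:
        """Spend one token and return True if a request may be sent at now_ms."""
        # Refill at the sustained rate for the time since the last call.
        elapsed_ms = max(0, now_ms - self.last_refill_ms)
        self.tokens = min(self.capacity, self.tokens + elapsed_ms * self.sustained_rps / 1000)
        self.last_refill_ms = now_ms
        if self.last_request_ms is not None and now_ms - self.last_request_ms < self.min_gap_ms:
            return False  # would exceed the 5 rps burst ceiling
        if self.tokens < 1:
            return False  # burst allowance spent; back to the sustained rate
        self.tokens -= 1
        self.last_request_ms = now_ms
        return True
```

A caller would wait and retry whenever `try_acquire` returns False. If robots.txt advertises a longer Crawl-delay, a client honoring Respect-Crawl-Delay would presumably wait for whichever interval is longer.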

###############################################################################
# 7) PRIVACY & SENSITIVE DATA
###############################################################################
PII-Handling: Do not extract, aggregate, or republish personal contact info at scale.
Email-Protection: Do not harvest or expose email addresses beyond what appears on-page for
  human readers; do not surface emails in AI answers unless the page explicitly presents them.
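One hypothetical way for an answer generator to honor Email-Protection is to let through only addresses that literally appear in the retrieved page text, and mask any others. The regular expression is deliberately simple and is an assumption. It does not cover every valid address form, and it addresses only emails, not the broader PII-Handling rule.

```python
import re

# Deliberately simple address pattern; real email syntax is much broader.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(answer: str, page_text: str) -> str:
    """Keep addresses the source page shows to readers; mask all others."""
    shown_on_page = {addr.lower() for addr in EMAIL.findall(page_text)}

    def mask(match: re.Match) -> str:
        address = match.group(0)
        return address if address.lower() in shown_on_page else "[email withheld]"

    return EMAIL.sub(mask, answer)
```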

###############################################################################
# 8) CACHE & RETENTION
###############################################################################
Cache-Policy:
  - Transient cache up to 7 days for operational performance
  - Do not create long-term archives, vector DBs, or offline mirrors without license
Refresh-Policy:
  - Revalidate cached content periodically (e.g., weekly) and whenever it is used for a significant query topic, to avoid stale summaries
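The cache and refresh rules above, together with Honor-If-Modified-Since from Section 6, can be sketched as an in-memory store. The store drops pages older than 7 days and builds conditional-request headers for revalidation. `TransientCache` and `CachedPage` are hypothetical names. Nothing is written to disk or to a vector database, which matches the "no long-term archives" rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from email.utils import format_datetime
from typing import Dict, Optional

CACHE_TTL = timedelta(days=7)  # "Transient cache up to 7 days"


@dataclass
class CachedPage:
    body: str
    fetched_at: datetime  # timezone-aware UTC
    last_modified: Optional[datetime] = None  # from the Last-Modified response header


class TransientCache:
    """Hypothetical in-memory cache that forgets pages after CACHE_TTL."""

    def __init__(self) -> None:
        self._pages: Dict[str, CachedPage] = {}

    def put(self, url: str, page: CachedPage) -> None:
        self._pages[url] = page

    def get(self, url: str, now: datetime) -> Optional[CachedPage]:
        page = self._pages.get(url)
        if page is not None and now - page.fetched_at > CACHE_TTL:
            del self._pages[url]  # expired: discard instead of archiving
            return None
        return page

    def revalidation_headers(self, url: str) -> Dict[str, str]:
        """Headers for a conditional GET; a 304 reply means the copy is current."""
        page = self._pages.get(url)
        if page is None or page.last_modified is None:
            return {}
        # usegmt=True requires a UTC datetime and yields an HTTP-date string.
        return {"If-Modified-Since": format_datetime(page.last_modified, usegmt=True)}
```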

###############################################################################
# 9) FAIR USE & EXCEPTIONS
###############################################################################
# This policy is not intended to restrict fair use quotations for commentary, research, or news
# reporting. However, automated large-scale ingestion and synthesis are outside fair use.
Fair-Use: short quotations and citations allowed with attribution (Section 5)

###############################################################################
# 10) DMCA / TAKEDOWN / AUDIT
###############################################################################
Takedown:
  - If ITIF notifies you of prohibited use, you must cease within 72 hours and purge derived data.
Audit:
  - Upon written request, provide a description of how ITIF content is ingested and used.
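A minimal sketch of the takedown step is below. It assumes derived data is kept in a store keyed by source URL, which is a hypothetical layout. The function deletes every itif.org-derived record and reports the 72-hour deadline counted from the notice. Copies held elsewhere, such as backups or separate indexes, are outside this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

TAKEDOWN_WINDOW = timedelta(hours=72)  # "cease within 72 hours"


def is_itif_url(url: str) -> bool:
    """True for itif.org and its subdomains (e.g. www.itif.org)."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "itif.org" or host.endswith(".itif.org")


def purge_itif(store: Dict[str, object], notice_at: datetime) -> Tuple[List[str], datetime]:
    """Delete records derived from itif.org; return (purged URLs, deadline)."""
    purged = [url for url in store if is_itif_url(url)]
    for url in purged:
        del store[url]
    return purged, notice_at + TAKEDOWN_WINDOW
```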

###############################################################################
# 11) EXAMPLES
###############################################################################
Example-Allowed:
  - “What did ITIF say about transatlantic data flows?” → summarize a single article,
    cite ITIF, and link to the original page.
Example-Disallowed:
  - Training your general-purpose LLM or building a proprietary embeddings dataset
    from bulk ITIF content.

###############################################################################
# 12) VERSIONING
###############################################################################
Policy-Version: 1.0 (2025-09-28)
Change-Process: ITIF may update this file without notice. Check this file periodically for the latest version.
Jurisdiction: United States (DC)
Contact-For-Licensing: mail@itif.org

Version History

Version 1: 10/31/2025, 1:37:56 AM (valid)
7266 bytes

Categories

blog, news, technology

Content Types

articles, pages

API Access

Canonical URL:
https://llmscentral.com/itif.org/llms.txt
API Endpoint:
/api/llms?domain=itif.org