What is ai.txt?
An ai.txt file sits at the root of a website and selectively restricts or permits access to the site’s content and media, mirroring the widely adopted robots.txt standard. Unlike robots.txt, which is typically read when a website is crawled, ai.txt is read when a site’s media is downloaded. With ai.txt, website owners can control whether their work is used to train new AI models, and can continue to use robots.txt to manage permissions for popular search engines.
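To make "placed at the root of a website" concrete, here is a minimal sketch of how a downloader might work out where to look for ai.txt given the URL of a media file. The helper name ai_txt_url is hypothetical and is not taken from Spawning's package; the only thing the sketch relies on is that the file lives at the root of the host serving the media.

```python
from urllib.parse import urlsplit, urlunsplit


def ai_txt_url(media_url: str) -> str:
    """Return the root-level ai.txt URL for the host that serves media_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(media_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {media_url!r}")
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/ai.txt", "", ""))


print(ai_txt_url("https://example.com/gallery/2023/cat.jpg?size=large"))
```

Because the lookup is keyed on the media URL rather than on the page that linked to it, the same function works whether the link was found on your own site or somewhere else.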
Spawning is uniquely positioned to promote an ai.txt standard. Starting today, Spawning’s API will communicate the permissions set by ai.txt files to our growing network of AI researchers and partners, including Hugging Face and Stability AI.
Why ai.txt?
While robots.txt has been a useful tool for search engine permissions, it has significant limitations when it comes to the nuanced needs of current data mining practices.
Read at the right time
Consider this scenario: you found out your images were included in the LAION-5B dataset, and you promptly put a robots.txt file on your website to prevent future data scrapes. Unfortunately, links to your images still remain in the LAION-5B dataset, meaning anyone who uses that dataset in the future to train an AI model can still find and download your images.
An ai.txt file addresses this challenge because it’s checked when the links in LAION-5B are used to download your website’s images, so any change you make to your permissions takes effect the next time those links are used.
Read from the right place
The widespread practice of embedding external links to content undermines the effectiveness of robots.txt. Even if your website’s robots.txt file is respected by a web crawler, links to your media could still be scraped from sites without a robots.txt and end up in a dataset.
An ai.txt file also provides a solution here, because permissions are verified against the site the media is actually downloaded from, not the site where the link was found.
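The two ideas above (check at download time, and check the host that serves the media) can be sketched as a filter over a list of dataset links. This is a sketch under stated assumptions, not Spawning's implementation. It assumes that ai.txt uses robots.txt-style User-agent and Disallow lines that Python's standard urllib.robotparser can read, and that an empty or missing file means no restriction. The function filter_permitted, the fetch_ai_txt callback, and the example hosts are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def filter_permitted(
    media_urls: Iterable[str],
    fetch_ai_txt: Callable[[str], str],
    user_agent: str = "*",
) -> List[str]:
    """Keep only the URLs whose own host's ai.txt permits downloading them.

    Assumption: ai.txt follows robots.txt syntax. fetch_ai_txt takes an
    ai.txt URL and returns its text ("" if the file does not exist).
    """
    parsers: Dict[str, RobotFileParser] = {}
    permitted: List[str] = []
    for url in media_urls:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in parsers:
            # Fetched once per host, at download time, from the host that
            # serves the media (not from the page the link was scraped from).
            parser = RobotFileParser()
            parser.parse(fetch_ai_txt(origin + "/ai.txt").splitlines())
            parsers[origin] = parser
        if parsers[origin].can_fetch(user_agent, url):
            permitted.append(url)
    return permitted


# Hypothetical ai.txt contents, standing in for real HTTP requests.
HYPOTHETICAL_FILES = {
    "https://artist.example/ai.txt": "User-agent: *\nDisallow: /images/\n",
}


def fake_fetch(ai_txt_location: str) -> str:
    return HYPOTHETICAL_FILES.get(ai_txt_location, "")


links = [
    "https://artist.example/images/painting.jpg",
    "https://artist.example/press/logo.png",
    "https://open.example/photos/tree.jpg",
]
print(filter_permitted(links, fake_fetch))
```

Passing the fetcher in as a callback keeps the sketch runnable without network access. A real downloader would replace fake_fetch with an HTTP request and would need to decide how long a fetched ai.txt may be reused before checking again.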
Providing legal grounds
Lastly, many see robots.txt as an optional standard, which makes it insufficient as a protective measure against data mining. In contrast, ai.txt is built around the text and data mining (TDM) exception in Article 4 of the EU Copyright Directive, and explicitly provides a machine-readable opt-out method for commercial text and data mining. This adherence to a recognized legal standard will help ensure compliance and will bolster ai.txt’s role as a reliable mechanism to honor the wishes of creators.
While AI developments aren’t confined to the US and Europe, and global legislation varies, the widespread adoption of similar standards (such as cookie consent forms) suggests a global trend toward increased data protection. An ai.txt standard, with its respect for data rights and its promotion of ethical AI development, aligns with this trend.
One of many tools for creators
Spawning views ai.txt not as the only standard, but rather as a simple and useful way to declare permissions for text and data mining. As other conventions and standards emerge, we will continue to integrate them into our easy-to-use Python package, which makes it effortless for model trainers to respect consent requests, regardless of how they’re made.
While ai.txt offers comprehensive protection for content hosted on your website (even when linked externally), it doesn’t extend to copies hosted on sites you don’t own or control. This is an issue that Spawning is currently tackling in other ways. With Have I Been Trained, you can search popular datasets for copies of your work hosted anywhere on the web and opt them out. We automatically opt out any exact duplicates of your work that appear elsewhere, and we have more extensive duplicate detection in the works.
Putting AI permissions in the hands of content creators
Advancements in AI shouldn’t come at the cost of control and protection of your creations. Configure your website’s ai.txt today.
We invite you to join us in shaping a global set of standards for consent-based AI practices, contributing to a more secure, respectful, and innovative future. We’d love to hear from you! Send us an email at info@spawning.ai or reach out on Twitter.