The Spawning Guide to Rights Reservations
How to register rights reservations through Spawning wherever your work appears on the web
Spawning’s opt-outs are designed to meet the specifications for rights reservations laid out by the EU in the 2019 directive on Copyright in the Digital Single Market (CDSM) Article 4(3) and in the forthcoming AI Act.
Our goal is to minimize the cost and effort of rights reservations for both rights holders and model trainers. We want to put model trainers in a position to succeed in their efforts to respect rights reservations. We also believe that the onus cannot lie on rights holders to register rights reservations separately for each model trainer. That means consolidating multiple machine-readable methods into a single package.
When a rights holder registers an opt-out through Spawning, they are making a reservation of rights expressing that a piece of media should not be used for AI training. We currently support all forms of media that have a specific URL, including videos, music, images, PDFs and other files. The media URL itself is added to Spawning’s Do Not Train Registry (DNTR), a central repository of rights reservations that are respected by major model trainers and AI groups, including Stability AI and Hugging Face, as well as anyone using Spawning’s Data Diligence package.
You can learn more about the technical and practical constraints that guide our approach in our recent post. There, we look at how and when data scraping happens, and what that means for ensuring model trainers have up-to-date and comprehensive rights reservations at the time of scraping.
If you are a rights holder and interested in registering a rights reservation, read on. We hope this guide answers all your questions. If it doesn’t, let us know, and we’ll do our best to address your specific situation.
Expressing rights reservations with Spawning
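We recommend a 3-step process to get the most thorough coverage:
Step 1: Register your domain through HaveIBeenTrained.com’s domain search.
Step 2: Find and opt out images hosted elsewhere with HaveIBeenTrained.com’s image search.
Step 3: Register any remaining media, wherever it is hosted, with Spawning’s browser extension.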
Do I need to do all three?
How far you want to go in this process is up to you. If you’re a musician, you may want to skip HaveIBeenTrained.com’s image search in step 2. For some rights holders or large image-hosting sites, the domain-level rights reservations in Step 1 may be enough. This is the avenue that Shutterstock took. They added over 444 million media URLs by 1 million creators to Spawning’s DNTR, and 200,000 new images are automatically added every day as they are uploaded to their site. We encourage large rights holder groups to email us to discuss novel solutions for highly specific needs.
What’s the difference between these steps?
Due to how AI training data is collected from the web, we tend to think about rights reservations in two ways: at the domain level and at the level of individual media items. It’s currently necessary to address both cases to express your rights reservations in a way that is visible during data scraping. The steps above give the best coverage, most easily, across both these cases.
Domain-level registration
HaveIBeenTrained.com's domain search makes it as easy as possible to register rights reservations for all of your media files in one go. If your digital portfolio includes hundreds of images, you don’t need to right-click and opt out hundreds of times. Simply verify by email that you are the domain owner. Start by searching for and selecting your domain. You’ll then be prompted to log in or sign up for a Spawning account and claim your domain. Registering your domain also means that any new media files added to your domain are automatically added to the DNTR without additional effort on your part, whether or not they have been included in a training dataset yet.
If you and only you host your content, then you’re done. But what if you don’t host all of your media? This is the situation many rights holders find themselves in. Their media files are found on Reddit, Twitter, or Instagram. Or, someone has copied their image URL and added it to their own blog. Unfortunately, domain-level solutions, while convenient, don’t address these cases in the context of how model trainers currently scrape the web for training data. To be read at the time of scraping, these rights reservations have to be specific to the media items themselves.
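To make the distinction concrete, here is a minimal sketch of how a registry lookup might treat the two kinds of reservation. It is an illustration under stated assumptions, not Spawning's implementation: the in-memory sets, the example domains, the function name `is_reserved`, and the choice to let a registered domain cover its subdomains are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory stand-ins for the registry; not Spawning's data model.
REGISTERED_DOMAINS = {"myportfolio.example"}
REGISTERED_URLS = {"https://images.social.example/u/123/painting.jpg"}


def is_reserved(media_url: str) -> bool:
    """Return True if a domain-level or item-level reservation covers the URL."""
    host = (urlsplit(media_url).hostname or "").lower()
    # Domain-level: the host is a registered domain or (by assumption here)
    # one of its subdomains. New files on that host are covered automatically.
    domain_match = any(
        host == domain or host.endswith("." + domain)
        for domain in REGISTERED_DOMAINS
    )
    # Item-level: this exact media URL was registered on its own.
    return domain_match or media_url in REGISTERED_URLS
```

In this sketch, a new file at `https://myportfolio.example/art/new.png` is covered without any further action, while a copy of the same image at `https://someblog.example/copies/painting.jpg` is not covered until that specific URL is registered.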
Item-specific registration
To locate and register rights reservations for content hosted by others across the web, we recommend starting with HaveIBeenTrained.com's image search because it is the most effective tool out there to find and opt out numerous images at one time. If you’ve already registered an image, either through your domain or a previous experience with Have I Been Trained?, the image won’t appear, so any results you get are images that are not included in the DNTR.
HaveIBeenTrained.com’s image search specifically looks at images that are already included in the LAION-5B dataset. The tool was originally built in 2022 to shed light on the dataset and raise awareness among artists, many of whom did not yet realize their work had already been used for AI training. With the addition of the opt-out feature, Have I Been Trained? became a tool not only for exploration but also for expressing rights reservations. However, because the search is limited to the LAION-5B dataset, its scope is limited as well.
Spawning's browser extension was made to allow for more complete coverage. You can use the browser extension to register any type of media file hosted anywhere, whether or not it is included in the LAION-5B dataset or any other dataset. We recommend doing a sweep of large image-hosting sites where you or others may have posted your work, such as Instagram, Pinterest, or Reddit, and registering rights reservations for your media through the browser extension.
Additional options
The tool suite above should have you covered, regardless of whether you’re registering images, music, or video; regardless of whether your content is new to the web or already in major training datasets; and regardless of whether you have control of the hosting domain or not. But there are still a couple more things you can do, if you feel the need.
Ai.txt is an alternative to registering a domain. If you’re more technically inclined, and it’s easier for you to modify a root file than to verify your domain, you can simply add the ai.txt file to your site. We also have tutorials on deploying to WordPress, Squarespace, Shopify, and other large website platforms to make the process easier. Anyone using the Spawning API to respect rights reservations will read the file and omit the content on your site from training. Model trainers who don’t wish to use the Spawning API can also choose to read and respect the ai.txt file. Additionally, the ai.txt file offers the flexibility to set different permissions for each type of media found on your site, so you could, for example, reserve rights for images but not audio, or vice versa.
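As an illustration of per-media-type permissions, the sketch below parses a small ai.txt and decides whether a given URL may be used for training. It assumes a robots.txt-like syntax with `Allow` and `Disallow` lines keyed on file extension. The sample file contents, the helper names `parse_rules` and `may_train_on`, and the "first matching rule wins, unmatched media is permitted" behavior are assumptions made for this example, not a specification of the format.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical ai.txt: reserve rights for images, permit audio.
SAMPLE_AI_TXT = """\
# Example only; the syntax is assumed to resemble robots.txt
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Allow: *.mp3
"""


def parse_rules(text: str) -> list[tuple[str, str]]:
    """Collect (directive, pattern) pairs, skipping comments and other lines."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() in ("allow", "disallow") and value:
            rules.append((key.lower(), value))
    return rules


def may_train_on(url: str, rules: list[tuple[str, str]]) -> bool:
    """First matching rule wins; media matching no rule is treated as permitted."""
    filename = urlsplit(url).path.rsplit("/", 1)[-1].lower()
    for directive, pattern in rules:
        if fnmatch(filename, pattern.lower()):
            return directive == "allow"
    return True
```

With the sample file, `may_train_on("https://site.example/art/cover.jpg", parse_rules(SAMPLE_AI_TXT))` evaluates to `False`, while the same call for `track.mp3` evaluates to `True`.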
Kudurru is an active defense network. We created Kudurru because opt-outs are treated as voluntary in most jurisdictions. Kudurru puts the control in the hands of rights holders, who can choose to refuse to serve media to requests from IPs that are currently demonstrating a pattern of scraping behavior. While Kudurru is still an option, we hope that it will not be a necessary one as AI trainers begin to comply with the CDSM’s TDM copyright exceptions and the AI Act.
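The general idea of refusing requests from IPs that are currently showing a scraping pattern can be sketched with a simple sliding-window counter. This is not how Kudurru itself detects scrapers, which this guide does not describe. The class name `ScrapeGuard`, the window length, and the request threshold are all hypothetical values chosen for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0  # hypothetical look-back window
MAX_REQUESTS = 20      # hypothetical number of requests tolerated per window


class ScrapeGuard:
    """Refuse media requests from IPs that exceed a request rate."""

    def __init__(self) -> None:
        # Maps each IP to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def should_serve(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have fallen out of the look-back window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        hits.append(now)
        # Serve only while the IP stays at or under the threshold.
        return len(hits) <= MAX_REQUESTS
```

Because old timestamps expire, an IP is refused only while the burst is ongoing. Once it slows down, `should_serve` returns `True` again, which mirrors the "currently demonstrating" wording above.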
Limitations
Enforcement
Spawning cannot force model trainers to respect rights reservations. Individual jurisdictions need to establish and enforce their own requirements for copyright exceptions and limitations covering generative AI model training. AI trainers will respond to these requirements either by complying in order to access specific markets, such as the EU, or by moving to markets with fewer restrictions.
Prior training
Registering a rights reservation today does not undo the previous use of a work for model training. There is promising research aimed at removing concepts from generative AI models. To date, we're not aware of any commercial AI model trainers adopting similar approaches. If they do, we will make our DNTR available to them so they can remove the work of artists who have already opted out.
Terms of Service agreements
Some websites include in their terms of service that posting media on the site grants permission for that media to be used by the website host, including for generative AI model training or for sale to another company. In this case, an image in the DNTR would not be downloaded by Stability AI for model training, but the hosting website might still use the media to train its own model or sell it to someone else for some other purpose.
Benefits of the Do Not Train Registry
We believe that the DNTR, in concert with Spawning’s Data Diligence package for developers, provides the best protection against model training available. The system is designed to put the most up-to-date machine-readable rights reservations in front of model trainers at the time of download, regardless of where the media lives on the web.
Adding your work to the DNTR prevents your work from being used to train specific new models. Stability AI and Hugging Face, as well as anyone using Spawning’s Data Diligence package, have chosen to respect rights reservations made through Spawning’s DNTR. Media in the DNTR were excluded from the training of Stable Diffusion V3, providing a tangible benefit to those who didn’t want their work used for AI training.
Additionally, the rights reservations made in the DNTR are designed to meet the specifications laid out by the EU in the CDSM Article 4(3). This includes machine-readability as well as a number of factors designed to make it easy for AI model trainers to integrate these rights reservations into their workflows and remain in compliance with the EU’s requirements.
The CDSM introduced the EU’s text and data mining (TDM) copyright exceptions, which include a requirement that TDM done outside of research and cultural institutions respect rights holders’ reservations of rights. The EU’s forthcoming AI Act reiterates the validity of these rights reservations and clarifies that they apply to generative AI training and must be respected by anyone who wishes to participate in the EU market, regardless of where the data collection or training is done.
We believe that this will be a compelling reason for generative AI model trainers to adhere to the EU’s requirements as well as influence similar requirements elsewhere. With the EU’s GDPR requirements, for example, what we have seen is that cookie consent forms have become mostly ubiquitous as companies move toward adherence to a single set of requirements.
What about rights reservations not registered through Spawning?
It is necessary to respect all forms of machine-readable rights reservations in order to maintain compliance with the EU’s TDM copyright exceptions and upcoming AI Act. Spawning facilitates this through the Data Diligence package.
When a model trainer uses the Data Diligence package, each piece of media is checked against the DNTR. Items included in the registry are not downloaded, which saves the media host server costs. If a piece of media is not in the DNTR, it will be downloaded; however, the Data Diligence package will then check for other machine-readable forms of rights reservations, for example HTTP headers, and if one is found, the media is excluded from the training data.
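The two-stage flow described above can be sketched as follows. This is an illustrative outline, not the Data Diligence package's API: the `DO_NOT_TRAIN` set stands in for a DNTR lookup, the `fetch` callable is supplied by the caller, and the use of an `X-Robots-Tag` header with `noai` and `noimageai` values is an assumption about which header-based reservations a trainer might check.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical stand-in for a registry lookup; a real pipeline would query the DNTR.
DO_NOT_TRAIN = {"https://host.example/a.jpg"}
# Header directives assumed for illustration.
RESERVATION_DIRECTIVES = {"noai", "noimageai"}

Fetch = Callable[[str], tuple[Mapping[str, str], bytes]]


def header_reserves_rights(headers: Mapping[str, str]) -> bool:
    """Return True if an X-Robots-Tag header carries an AI-related reservation."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            tokens = {token.strip().lower() for token in value.split(",")}
            if tokens & RESERVATION_DIRECTIVES:
                return True
    return False


def collect(urls: Iterable[str], fetch: Fetch) -> list[tuple[str, bytes]]:
    """Return (url, body) pairs that pass both the registry and header checks."""
    kept = []
    for url in urls:
        if url in DO_NOT_TRAIN:
            continue  # stage 1: never requested, so the host serves nothing
        headers, body = fetch(url)
        if header_reserves_rights(headers):
            continue  # stage 2: downloaded, but excluded from training data
        kept.append((url, body))
    return kept
```

Passing `fetch` in as a parameter keeps the sketch free of network calls. The ordering is the point: the registry check happens before any request is made, while header-based reservations can only be read once the media has been requested.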
Stay tuned: a forthcoming post will compare some of these other machine-readable methods in depth.