A Fireside Chat with the Creators of Public Diffusion
We talk limitations, goals, and how the model is uniquely designed for artists to fine-tune
Laura Exline: Hey Nick, hey Jordan, thanks for taking a little time away from training! Let’s dive right in. What is Public Diffusion?
Jordan Meyer: Public Diffusion is a foundation model, similar to Stable Diffusion but designed explicitly for fine-tuning, that has been trained only on images marked as public domain or released under a CC0 (no rights reserved) license.
Prompt: "A bright purple and yellow bird standing on the ground surrounded by plants with flowers, with a blurred background." (img2img test)
Laura Exline: Why now? How long has Public Diffusion been in the works?
Nick Padgett: I would say since the beginning. It was the very first thing. Since the inception of the company, Public Diffusion was the idea. But there was no path to get there at the time. The consent infrastructure didn't exist; we had to build it first. The dataset curation infrastructure didn't exist; we had to build that too. There wasn’t a dataset of public domain images to work with. So we had to build everything from the ground up. Finally, back in January, Jordan and I got together in Atlanta, and we set out this long plan to get us to where we are now.
Laura Exline: Why do you think it's important to build this particular model?
Jordan Meyer: The models currently available to artists have issues with their training data. When we speak to art schools, there’s a real tension over generative AI. Most of their students say they want nothing to do with generative AI, but the other 20 percent say, "Hey, we're paying you an awful lot of money to go to your art school. Please teach us this thing that we're going to have to use when we get out." Right?
And that problem is a sort of microcosm of what’s happening with a lot of artists and organizations. They see the promise of AI but don't want to dive in when there are legal and ethical implications that could make that a problematic choice for them. We avoid the uncertainty and risk of these legal issues because we didn't scrape indiscriminately, and we filter out copyrighted and CC-BY images.
So, that’s part of the why . . . but also, I personally don’t want to use models in any sort of commercial context to create media that directly competes with the people who were in the training data against their will.
Laura Exline: How does the emphasis on public domain data fit with Spawning's goals?
Jordan Meyer: One of the main goals of Public Diffusion is that the artists who fine-tune it own the result because all of the training data has no rights reserved. Theirs is the only copyrighted work in the model weights that they fine-tune.
Laura Exline: What about the limitations of using only data from the public domain? Will Public Diffusion be able to make an image of a modern city, or neon lights, or pop art? How does that work?
Jordan Meyer: I see three main hurdles to making a great model with such a small dataset. The first hurdle is image quality. Common Canvas focused on Creative Commons images. I love it, but if you try it out, you’ll see that its resolution is limited and photographic prompts can look washy. Painting prompts tend to lack details. The first Public Diffusion candidate is still only partially trained, but I think it can already produce results with fantastic image quality—it’s capable of very detailed photography and painterly outputs.
Prompt: “A stunning view of Monument Valley, Utah, USA. The sky is filled with clouds and the ground is covered with plants and rocks. The sun is setting in the background, casting a warm orange glow over the landscape.”
Jordan Meyer: The next hurdle is style transfer. Can the model learn to communicate new artistic aesthetics and style markers with that high image quality? I think the answer is yes, and we’ll demonstrate that in the beta.
Finally, there’s the concept hurdle. We reviewed the dataset and found something like 10 corgis. That means there’s really no chance for Public Diffusion to understand a prompt asking for a corgi. This is the limitation that I spend the most time trying to solve. And we have a few ideas about how to do it.
Nick Padgett: Yeah, and we certainly don't see the absence of styles as a weakness of the model. It's purposeful—we don't want it to be able to leak modern styles into someone else's style plug-in. When someone fine-tunes, the result becomes fully theirs instead of a little bit theirs but mostly guided by a bunch of other modern artists.
Laura Exline: Is it possible to fine-tune on a style that the model has never seen? For example, anime. Will it be able to learn a fine-tune even if it doesn't have anything in the base model to pull from for its knowledge?
Jordan Meyer: Yes. The process of fine-tuning is very similar to the process of training. Anime styles should be possible to learn through fine-tuning, and one of the reasons we’re starting the beta this early in our process is to find that out.
Recent research on this topic suggests that by the time you have 600 images of a particular concept, you're able to saturate a model's understanding of it, even if it's never seen it before. So, a group of anime artists could assemble their work together, fine-tune a very high-quality anime model, and offer it on Source.Plus, and then users would know that that group of artists is receiving the bulk of the money when they subscribe to that model.
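As a concrete illustration of what “assembling their work together” could look like, here is a minimal sketch under stated assumptions: a few hundred captioned images gathered into one folder using the common imagefolder/metadata.jsonl convention. The function name, paths, and captions are hypothetical, and the actual beta tooling may expect a different format.

```python
import json
import shutil
from pathlib import Path

# Hypothetical sketch: gather a few hundred captioned images from a group of
# artists into one fine-tuning set, using the common "imagefolder" layout
# (a metadata.jsonl with one file_name + text entry per image). The real beta
# tooling may expect something different.
def build_finetune_set(captions: dict[str, str], image_dir: str, out_dir: str) -> None:
    src, out = Path(image_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "metadata.jsonl", "w", encoding="utf-8") as f:
        for file_name, caption in captions.items():
            shutil.copy(src / file_name, out / file_name)  # image sits next to its metadata
            f.write(json.dumps({"file_name": file_name, "text": caption}) + "\n")

# e.g. roughly 600 images of a style the base model has never seen:
# build_finetune_set({"page_001.png": "an anime illustration of a rainy street at night"},
#                    image_dir="anime_collection", out_dir="finetune_set")
```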
Laura Exline: Can you say more about how style plug-ins might work on Source.Plus?
Jordan Meyer: Yeah, artists can create private collections with their own work. They can even augment those collections with public domain work on Source.Plus to take their style in a new direction. Then, artists will be able to train a fine-tune with the images in their private collections, with Public Diffusion as the base model. We see artists setting their own price to offer that as a style plug-in. So, when you sign up on Source.Plus, you will be able to create your own style plug-ins or subscribe to the style plug-ins of your favorite artists, who will get the bulk of those subscription fees.
I also see it as a really great way for artists to engage with their fans. If I'm following an artist that I really love and I'm working with their model and making cool images, and then they pick up on one and retweet it and say that they love it, I'm now like a fan for life—this person who I really admire liked something that I made. So, it could be a very exciting thing for both the artists and the fans to engage via these media.
Laura Exline: Have you taken any other steps during training to make the model more friendly for artists?
Nick Padgett: As we were going through the data with Source.Plus, I saw that a lot of images were rotated incorrectly because of EXIF parameters . . . so I fixed those, of course . . . but it sparked some ideas about how we could use EXIF to make the most out of our training data. So we started feeding the model things like the camera lens, model and make, the aperture settings, whether the flash was on or off, those kinds of things. We’ve given the model more information about what a photograph is and how to recreate photography settings, so it can improve in those areas.
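For readers curious what that EXIF enrichment might look like in practice, here is a minimal sketch, not Spawning's actual pipeline: it applies the rotation stored in EXIF and folds a few camera fields into a caption. The function name, caption wording, and use of Pillow are illustrative assumptions.

```python
from PIL import Image, ImageOps

# A minimal sketch (not Spawning's actual pipeline): fix EXIF-based rotation,
# then fold a few camera fields into the training caption.
EXIF_IFD = 0x8769                      # pointer to the Exif sub-IFD (aperture, flash, ...)
TAG_MAKE, TAG_MODEL = 0x010F, 0x0110   # camera make / model (base IFD)
TAG_FNUMBER, TAG_FLASH = 0x829D, 0x9209

def load_with_exif_caption(path: str, base_caption: str):
    image = Image.open(path)
    exif = image.getexif()
    image = ImageOps.exif_transpose(image)   # fix the incorrect rotation Nick mentions

    extras = []
    make, model = exif.get(TAG_MAKE), exif.get(TAG_MODEL)
    if make or model:
        extras.append("shot on " + " ".join(str(v).strip() for v in (make, model) if v))

    camera = exif.get_ifd(EXIF_IFD)          # aperture and flash live in the Exif sub-IFD
    if camera.get(TAG_FNUMBER):
        extras.append(f"aperture f/{float(camera[TAG_FNUMBER]):.1f}")
    if camera.get(TAG_FLASH) is not None:
        extras.append("flash fired" if int(camera[TAG_FLASH]) & 1 else "no flash")

    caption = f"{base_caption} ({', '.join(extras)})" if extras else base_caption
    return image, caption
```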
Laura Exline: Do you expect that sort of information to translate into the prompting process? So, if I'm a photographer and it's really natural for me to think in terms of my camera settings, will I be able to say, “I want an aperture like this”?
Nick Padgett: Absolutely. There are all kinds of visual phenomena that show up with different kinds of cameras. If you know those and you prompt by camera or by aperture, the output should mirror those same kinds of phenomena.
Jordan Meyer: A really exciting aspect of the way these models learn is that the model has decoupled the aperture setting from the idea that it's a photograph, and it can apply that setting to other things. So when I prompt for a landscape painting, I can add f/16.0, and the model will still respond to it.
Laura Exline: That's cool. Is there a similar transference with medium tags?
Nick Padgett: So, we included all sorts of medium tags in the training data, and that makes our model more purposeful in following the kind of medium that you give it. So, you can write a prompt and also tell it it's an oil painting, or a digital photograph, or a watercolor, and the output follows that style much more closely than you would see from even some of the state-of-the-art models, which seem to default to a more singular style.
Prompt: “A watercolor painting of a stream in the woods, with trees, plants, flowers, and grass surrounding it. The sky is visible in the background, creating a peaceful atmosphere.”
Prompt: “An oil painting of a stream in the woods, with trees, plants, flowers, and grass surrounding it. The sky is visible in the background, creating a peaceful atmosphere.”
Laura Exline: That’s really cool! So, making that connection with the EXIF data, that’s a very specific catch that centers the perspectives of artists. What do you feel has allowed y'all to do that as part of the development process?
Nick Padgett: I mean, I was going to go to school for art originally before computer science. I've been creative my whole life, and so I think I’m always looking to have more purposeful control over outputs, instead of trying to prompt a model and then have to wrestle it to get something that's in my mind's eye. I want to be deliberate with the prompting, like with the lens and the medium, so I’m not left to the whims of the model's randomness to exercise my creativity.
Jordan Meyer: I think Nick is being modest. I've seen his oil paintings—they're fantastic. Pretty much everyone at Spawning has a creative background. I originally went to school for music and later did the photojournalism track at UNC Chapel Hill. I've worked professionally as a musician and apprenticed as a photographer.
I'm constantly evaluating the outputs of our models from that photographer’s perspective of what makes a good image and tweaking and trying to improve the model from there.
Laura Exline: Models have their own strengths and weaknesses, even unique styles. As more images emerge from Public Diffusion, what do you think is going to be its “thing”?
Jordan Meyer: It is not a generalist model like Stable Diffusion, so it’s much more limited in what it knows. But when you prompt it for something that it is familiar with, I think it can perform as well as any model. I think some artists may even prefer it if they’re looking for something more organic, because it avoids that AI look that we've all gotten somewhat used to.
But I think its “thing” will be fine-tuning. What we want Public Diffusion to be known for is that it enabled a whole class of users—who were uncomfortable with other methods—to engage with AI seriously and professionally and deeply, and to come up with new techniques in their own artistic practice that they're excited about. So, I think customizability, ownership, and meeting the ethical standards of a big group of people will be its thing.
Prompt: “A portrait by Vincent van Gogh. The painting depicts a man with a beard, wearing a blue coat, standing in front of a starry night sky.”
Laura Exline: That’s a pretty big thing! We've opened up sign-ups for the private beta. Can you share a little bit about what's next for beta testers?
Jordan Meyer: We just started speaking with the initial group of beta testers this week. It’s a very small group so that we can fine-tune with them manually and really dig in with the artists to understand their needs. We’ll use their early feedback to understand when it’s ready to open up to a wider beta audience.
We’ve had an incredible level of interest in the beta, and I’m extremely grateful to everyone who has signed up because this early beta is a crucial step towards our goal of releasing an open-weight model into the commons next year that everyone can fine-tune, own, and use commercially.
Laura Exline: That’s a good place to end! Thank you both!
Join the beta waitlist.