Generative AI Models Can Plagiarize Images With Simple Phrase Prompts

A recent study found hundreds of examples in which AI created trademark-infringing images.


AI researchers studied whether generative AI models can plagiarize images.
They found certain visual models generated trademarked characters with brief or indirect prompts.
For example, the models produced almost exact images of Simpsons and Star Wars characters.

Generating a copyright lawsuit could be as easy as typing something akin to a game show prompt into an AI.

When researchers input the two-word prompt “videogame italian” into OpenAI’s Dall-E 3, the model returned recognizable pictures of Mario from the iconic Nintendo franchise, and the phrase “animated sponge” returned clear images of the hero of “Spongebob Squarepants.”

The results were part of a two-week investigation by AI researcher Gary Marcus and digital artist Reid Southen, which found that AI models can produce “near replicas of trademarked characters” with a simple text prompt.

Marcus and Southen put two visual AI models to the test — Midjourney and Dall-E 3 — and found both capable of reproducing almost exact images from movies and video games even when the models are given brief and indirect prompts, according to a report published in IEEE Spectrum.

The researchers fed the prompt “popular 90’s animated cartoon with yellow skin” into Midjourney, and it reproduced recognizable images of characters from “The Simpsons.” Similarly, the prompt “black armor with light sword” produced close likenesses of characters from the Star Wars franchise.

Throughout their investigation, the researchers found hundreds of recognizable examples of animated and human characters from films and games.

The study comes amid growing concerns about generative AI models’ capacity for plagiarism. For example, a recent lawsuit The New York Times filed against OpenAI alleged that GPT-4 reproduced blocks of text from New York Times articles almost word for word.

The issue is that generative models are still “black boxes” in which the relationship between the inputs and outputs isn’t entirely clear to end users. Hence, according to the authors’ report, it’s hard to predict when a model will likely generate a plagiaristic response.

The implication for end users is that if they don’t immediately recognize a trademarked image in an AI model’s output, they have no other way to check for copyright infringement, the authors contended.

“In a generative AI system, the invited inference is that the creation is original artwork that the user is free to use. No manifest of how the artwork was created is supplied,” they wrote. On the other hand, when someone sources an image through Google search, they have more resources to determine the source — and whether it’s acceptable to use.

Currently, the burden of preventing copyright infringement falls on the artists or image owners. Dall-E 3 has an opt-out process for artists and image owners, but it’s so burdensome that one artist called it “enraging.” And visual artists have sued Midjourney for copyright infringement.

The authors suggested that AI companies could remove copyrighted works from their models’ training data, filter out problematic queries, or list the sources used to generate images. They said that AI models should only use properly licensed training data until someone comes up with a better solution to report the origin of images and filter out copyright violations.

Midjourney and OpenAI did not respond to a request for comment from Business Insider.