The simmering war between the tech and media industries over generative AI just turned serious.
For a technology that raises profound questions about the way things like text, images and music are produced and used, the legal challenges this year have been surprisingly few and far between. Several novelists, journalists and comedians have sued for copyright infringement over claims their work has been used to train large language models, while Getty Images took on Stability AI over use of its picture library and Anthropic was sued over song lyrics.
Yet most major rights owners have held back, hoping to find ways to share in the spoils from the new technology rather than seek to thwart it. In the only two notable agreements between the tech and media worlds so far, AP allowed its archives to be used to train OpenAI’s models, while Axel Springer, owner of Politico, Die Welt and Business Insider, reached a broader deal with the same company earlier this month.
That makes the lawsuit the New York Times has just lodged against OpenAI and Microsoft an ominous sign of what lies in store in 2024. According to the Times, months of negotiation have failed to produce terms that protect the company’s rights and provide fair compensation.
The lawsuits over generative AI carry a strong echo of the early cases that established the legal basis for search engines. Then, the US courts ruled that it was “fair use” to index copyrighted content when this was used to create “transformative” new search services. The short snippets of text and “thumbnail” images displayed in search engines were also found not to be substitutes for the original content, limiting the damage search might have on the media companies’ businesses.
There are some important differences this time. In its lawsuit, the NYT showed how it coaxed OpenAI’s ChatGPT and Microsoft’s AI-powered Bing into producing extensive, verbatim quotes from its reporting.
Also, while search engines were designed to send traffic off to other websites, generative AI services such as ChatGPT answer questions directly, making them a more obvious substitute for the original source material. These greater legal risks should make AI companies hesitate about having their defence of “fair use” tested in front of a jury.
Yet there are also considerations that weigh in the other direction — starting with the fact that the risk of an unpredictable jury verdict cuts both ways. OpenAI will be able to point out that news publishers can easily block it from crawling their content if they don’t want it to be used for training its LLMs. That is something many publishers, including the NYT, have done this year.
In addition, generative AI threatens to commodify many types of information. Once it has trained its models on the content it gets from AP and Axel Springer, OpenAI will have less need of further news archives. This seriously limits the compensation that each publisher will be able to negotiate, as well as the number of bilateral deals the AI companies will be willing to reach.
All this makes a return to the negotiating table before a court showdown the most likely outcome. Generative AI promises to create big new markets for media content: the question, as always, is how the spoils should be shared.
The media companies hope to reap value from the technology directly, training AI models on their archives and summarising their news content to enhance their own services. But judging from the large audience ChatGPT generated in its first months, smart chatbots and other AI-powered services look set to become huge media sites themselves.
Axel Springer stands to make “tens of millions of euros” a year from its OpenAI agreement. For a transformative technology that could upend much of the media business, that may not be much. Even a payment of €40mn would still only add around 1 per cent to Springer’s revenue each year. In return, the news groups risk surrendering their audience to the AI companies. They could also see the value of their brands diluted if ChatGPT and its successors become the new oracles of the internet.
With generative AI still in its infancy, it is impossible to envisage exactly what new services it will lead to, or how valuable these will become. That, more than anything, makes it hard for media companies to agree terms that trade away their future rights. But as generative AI catches on with more internet users, the pressure to reach a deal will only increase.