OpenAI, a leader in the artificial intelligence (AI) industry, is continuously reshaping the technological landscape. However, its journey hasn’t been without challenges. Recently, the company navigated significant events, such as the firing and subsequent reinstatement of its CEO, Sam Altman. These developments highlight the dynamic and sometimes tumultuous nature of the AI field.
A key development in OpenAI’s strategy is its push to form partnerships with news publishers so that it can use their stories to train AI models, as reported by The Information last week. According to the publication, OpenAI is offering between $1 million and $5 million annually to license copyrighted news articles. This is one of the first indications of how much AI companies are willing to pay for licensed material. Apple is reportedly pursuing similar deals with media companies, offering at least $50 million over several years.
These numbers are comparable to some pre-AI licensing agreements. For instance, when Meta introduced the Facebook News tab (later discontinued in Europe), it reportedly proposed up to $3 million a year for news story licenses. However, such offers fall well short of some larger commitments in the industry. Google, for example, announced a $1 billion investment in 2020 to partner with news organizations. Additionally, under new Canadian legislation, Google agreed to pay Canadian publishers $100 million annually for article links.
Traditionally, large language models like those developed by OpenAI have been trained on data gathered from the internet. While some AI developers do not fully disclose their training data sources, information about the datasets or web crawlers they use is often available. The cost of training data varies. Some providers, like LAION, offer open-source datasets for free; Stable Diffusion, for example, was trained on LAION data. AI developers also deploy web crawlers to gather internet data for training, although the collected data must then be vetted, tagged, and cleaned, which adds cost.
Recently, OpenAI’s web crawler, GPTBot, has been blocked by companies like The New York Times and Vox Media, creating a new obstacle to collecting training data. Furthermore, several organizations claim that using their data for training constitutes copyright infringement. The New York Times, among others, has sued OpenAI and Microsoft, alleging that tools like ChatGPT and Microsoft’s Copilot can produce outputs nearly identical to their copyrighted material.
To avoid these disputes, AI companies are increasingly partnering with publishers. For example, Axel Springer, the parent company of Politico and Business Insider, and The Associated Press have both signed agreements with OpenAI to license stories for training models like GPT-4 and to develop news-gathering technology.
OpenAI and Apple are not alone in seeking to collaborate with news organizations. Google demonstrated an AI tool named Genesis, capable of generating news stories, to executives from major publications like The New York Times, The Wall Street Journal, and The Washington Post. Meanwhile, some newsrooms have started experimenting with generative AI tools, with mixed results.
In summary, the evolving relationship between AI companies and news publishers reflects the growing influence and complexity of AI in the media landscape. As OpenAI continues to negotiate these deals, the industry is watching closely to see how such partnerships will shape the future of news and AI development.