AI Firms Resist Compensating Content Creators for Copyrighted Material Used to Train AI Systems

The use of copyrighted content like books, images, and videos to train artificial intelligence (AI) systems has sparked debates around proper compensation for creators. AI companies argue that transforming this data through machine learning (ML) algorithms makes the output distinct enough to not require licensing fees.

However, content creators counter that their work still provides the fundamental data foundation for AI tools to function. Though AI training data undergoes processing, original creative elements may remain woven through the models. As advanced AI proliferates, leveraging vast datasets for development, addressing content rights issues is crucial. Content creators reasonably seek fair payment for their intellectual property utilized by profitable AI services. But formulating precise compensation models remains a complex challenge. Overall, a balanced solution that respects creators’ rights while enabling AI progress requires nuanced negotiations and compromises between stakeholders. With careful cooperation, the tech industry can build an ethical AI future that equitably rewards content contributors.

In light of the above, a few months ago the US Copyright Office called for public comments on new regulations for generative AI’s handling of copyrighted materials. The Verge then gathered detailed responses from leading AI entities such as Meta, Google, Microsoft, Adobe, Hugging Face, StabilityAI, and Anthropic. Apple, diverging slightly, submitted a comment concerning the copyright of AI-generated code.

While each company’s stance has its nuances, a common thread is resistance to paying for the use of copyrighted works to train AI models. The comment period, which opened on August 30th, invited opinions until October 18th on whether AI output can be copyrighted without human input, and on who is responsible when AI is involved in copyright infringement. This discussion arises amid a wave of copyright lawsuits involving parties that range from artists to tech firms.

Below, with credit to The Verge, are selected excerpts from the responses provided by each company.

Meta:

Meta argues that compensating copyright holders for AI training data would be highly impractical at this stage, given the need to identify millions of rights holders when any individual payout would be minimal. However, content creators counter that even small royalties per work would collectively amount to reasonable compensation and that upholding copyright principles merits the administrative effort of licensing AI training datasets.

Google:

For Google, the core copyright issue would not arise if it were possible to train without making copies. The process of “knowledge harvesting,” akin to the court’s analogy in Harper & Row of reading and assimilating the information from a book, is not just non-infringing but actually aligns with the objectives of copyright law. The fact that technological constraints make copying necessary in order to glean ideas and facts from copyrighted materials should not change this outcome.

Microsoft:

Microsoft believes that mandating permission for the use of available works in AI training would hinder AI advancement. Obtaining permission for the vast amount of data needed to create responsible AI models is impractical, even when the works and their owners are identifiable. Moreover, licensing requirements would stifle start-ups and new market entrants that lack the means to secure such permissions. AI development would consequently be confined to a select few companies capable of managing extensive licensing operations, or to developers in nations where using copyrighted works for AI model training does not constitute infringement.

Anthropic:

Anthropic states that sound policy has consistently acknowledged that certain limitations on copyright are necessary to foster creativity and innovation. The company takes the view that the current legal framework, along with ongoing cooperation among all parties involved, can balance the varied interests, thereby facilitating the advantages of AI while mitigating any concerns.

Apple:

Apple believes that when a human developer dictates the expressive aspects of the output, making choices to alter, supplement, refine, or disregard the proposed code, the resulting code from such interactions with the tools should possess adequate human creativity to warrant copyright protection.

Apart from these big players, other companies that are leveraging AI also responded. Adobe referenced the case of Sega v. Accolade, in which the Ninth Circuit deemed the intermediate copying of Sega’s software to be fair use when done to reverse engineer the functional requirements for game compatibility with Sega consoles. This act, Adobe argued, served the public by fostering a broader spectrum of independently designed video games for Sega’s platform, an outcome that aligns with the objectives of the Copyright Act to bolster creative expression.

Along similar lines, Andreessen Horowitz pointed to the immense investments funnelled into AI development over recent years, totalling billions, which were made on the understanding that current copyright law allows for the extraction of statistical facts through necessary copying. The firm stressed that a shift in this policy would upset settled expectations, undercutting the substantial private capital that has been instrumental in positioning the U.S. as a frontrunner in the global AI landscape. Such a reversal could pose risks not only to economic prosperity but also to national security.

Adding to the discussion, Hugging Face posited that training AI on copyrighted material should be considered fair use, as the intention is to create distinctive AI models that do not supplant the original expressive content but are instead capable of producing a diverse array of outputs unrelated to the copyrighted material. The company did, however, concede that more nuanced scenarios could arise that would demand closer examination.

StabilityAI highlighted an international perspective, noting that several jurisdictions, including Singapore, Japan, the European Union, South Korea, Taiwan, Malaysia, and Israel, have amended their copyright laws to establish safeguards akin to fair use for AI training purposes. They referenced the UK, where recommendations from the Government Chief Scientific Advisor suggested the government should facilitate data mining to foster a thriving AI industry while maintaining existing copyright and intellectual property protections on the output of AI.

The AI industry is resisting compensating creators for copyrighted training data, despite growing calls for licensing fees. Major players argue impracticality and fair use, while the US Copyright Office solicits input on new rules.

The stakes are high amid evolving laws worldwide. One thing is guaranteed: copyright violation is going to be a big story regarding AI in the years to come. Watch this space.

Featured image: Credit: Kevin Lockwood (KSL Graphics)