AIPPI 2024: The copyright dilemma: Trained to infringe?

20 October 2024

Copyright holders are demanding the protection of their works against unauthorized training use by generative artificial intelligence (AI) tools. But it’s not a straightforward topic, explained Ellen Keenan-O’Malley, a senior associate with EIP in London. Rights owners must understand whether the use is authorized or a permitted act under fair use.

“In the EU, is the training of AI allowed or not?” asked Elisa Huusko, a partner with Berggren in Helsinki. “Most likely, yes. Companies are able to use sources that do not contain any technical restrictions,” she said. Huusko explained that EU copyright law is dependent on exceptions and limitations, and that there is no exception that would have been drafted and enacted for AI. 

The industry needs to find alternatives, she argued. The EU Artificial Intelligence Act contains nothing about copyright, but considerable time, energy and resources have been put into enacting the act, and it would be very controversial to say that training AI is not allowed while trying to promote AI technologies, said Huusko, who noted that the business and scientific communities want to be allowed to train AI.

Other exceptions should be sought, and data mining is quite similar to training AI, she said, to the point that it has been seen as an alternative choice for allowing AI training. One would need to read data mining exceptions quite liberally, Huusko said, if they were going to use them for training AI, and copyright owners must have an option to decide to opt out. The problem is, for developers training AI, it would be almost impossible to verify if a copyright owner has opted out. “We are going to have a lot of difficulties,” she said.

As the EU doesn’t have a specific exception or limitation for AI but wants to allow generative AI, do companies have consent for its use? Can companies use silent consent? This was the concept that Finland adopted, she said: As people have allowed their content to be published online without any control over who is using it and how it is being used, then this is silent consent. But content that lies behind a paywall could not be used in the same manner, Huusko said. 

In China, internet companies have actively launched large AI models for commercial use, explained Allen Wang, managing partner at Beijing TA Law Firm in Beijing. The scale of the Chinese AI industry is huge, he said, highlighting that a recent AIGC Series (Artificial Intelligence Generated Content) report by research analyst firm Third Bridge said that in 2023, nearly 10% of all content on Douyin, a popular short video platform, may be generated by AI. The same report said that in 2023, the value of AI-generated content was about US$2.04 billion, but would grow to nearly US$103 billion. Chinese government agencies have actively introduced policies to support and promote the development of the AI industry, he said. China is not a country that uses case law, explained Wang, and companies must follow legislation and regulations. 

Explaining the concept of input and output, Warrington Parker, managing partner of the San Francisco office of Crowell & Moring, said that input is all about what the AI was trained on, and output is what the AI is going to be used for. In the United States, on the input side, original works and derivatives of them would be protected, as would the specific way a compilation or collection was put together, but ideas and facts cannot be protected. If you have a compilation such as a phone book, he explained, you can protect how it is organized but you can mine the factual data it contains to train AI.

On the output side, the question is what is being produced. Can it re-create the original? If it can, it probably breaches copyright laws, but “in the style of” is more of an idea, and so cannot be infringing, Parker said.

Of course, there will be cases where copyright will be infringed, and fair use arguments may come into play, Parker said. Parody is fair use, and copying a factual work would be preferable to copying an original work of fiction. Courts would also consider how much of the original work was used, and the effect of the output on the potential market for the original, he said. Does the new product take market share away from the original product, or has it created a new market? Using copyrighted works to train AI algorithms on how words go together would be OK, but using them to train AI to reproduce the original works would not, he said.

Keenan-O’Malley asked the panel whether companies can put technical measures in place to prevent original works from being used for training AI or for data mining. The panel agreed such measures do exist, and that the circumvention of technical measures would be illegal, citing in particular that using anything behind a paywall would be illegal.

She then asked the panel whether AI could learn through web scraping without violating copyright law. Both Wang and Huusko explained that while there may not be laws specifically against web scraping, web scraping would likely still be in breach of copyright laws in their jurisdictions. Parker went further, explaining that the type and impact of web scraping will influence a court’s decision in the United States. If there was an impact on the original market by the thing created by web scraping, such activity would likely be illegal. 

A comment from the floor explained that, often, companies don’t mind their data being used to train AI or for web scraping, but they want to be compensated for it. It’s often the use of original work for free that angers the rights holder. 

Another audience member asked if collective licensing schemes could benefit rights holders. Parker agreed they could, and said that licensing schemes can bring claims of abuse of the license if they feel its use has gone beyond what was agreed.

The final question from the floor concerned opt-outs, asking if there was a particular form of opt-out that was more valid than others. Where and how do you find out if an original work has opted out? That is a big problem, explained Huusko, noting that any kind of opt-out is sufficient, and as more original work becomes easily accessible, it will become increasingly difficult to identify which works have opted out.

Keenan-O’Malley, Huusko, Wang and Parker were speaking on day two of the AIPPI World Congress, currently running in Hangzhou.

– Darren Barton, reporting from Hangzhou

