Singaporean writers reject use of their works to train LLM for gov’t project
28 June 2024
In March 2024, the Singaporean government emailed local writers to ask for permission to use their works as training material for a Large Language Model (LLM) as part of the National Multimodal LLM Programme. The answer was a firm ‘No.’
Spearheaded by the Infocomm Media Development Authority, the US$52 million government initiative, launched in December 2023, is said to be Southeast Asia’s first regional LLM project. It aims to train Singapore’s own LLM using local material, including information on the country’s history and culture. Existing LLMs are trained primarily on data from Western sources; Singapore’s LLM, by contrast, will also be trained in 11 languages spoken in Southeast Asia.
The email stated that the writers’ works would be used for research purposes only. However, it failed to adequately address matters relating to copyright and compensation for the writers.
George Hwang, director at George Hwang in Singapore, said the authors’ reaction is not surprising, as they are simply protective of their works.
“The authors’ concerns seem to be more in the area of responsible use and not just remuneration. They want assurance that the AI will be for the limited purpose of ‘public service towards cultural representation’ only. This can be dealt with in the contracts,” said Hwang.
“What will happen at the end of the day depends on the parties’ relative bargaining position. The authors will need to take a look at the ‘Best Alternative to a Negotiated Outcome.’ If they play hardball, can the Singapore government use their works without their consent? For this, we will need to take a look at the ‘permitted uses’ available,” he explained.
The available permitted uses are government use, computational data analysis (or data mining), and fair use.
Assessing the issue in the context of government use as a permitted use, Hwang stated: “It cannot be used for the service of the government. To fall within this scope, it must be a ‘public act.’ This is used by the government for the service of the government. I do not see the creation of a generative AI system, which has Singaporean or Southeast Asian influences, as a service Singaporeans have voted their government to provide for now.”
If an author’s work is “paywalled” or downloadable only for a price, it may not fall within the computational data analysis provision either. “Lawful access to the material is one of the conditions for this permitted use to apply,” noted Hwang.
As to “fair use,” he said: “With the change of ‘fair dealing’ to ‘fair use’ in our Copyright Act 2021, there is the possibility that our courts may start reading the U.S.’s notion of transformative use into this exception to infringement. This is untrodden territory, and cases are pending in the U.S. Whilst there are no decided cases on AI and fair use in the U.S., the case of Google Books can inform us on what transformative use is.”
- Espie Angelica A. de Leon