Who owns the content in ChatGPT and other tools?
28 April 2023
Since its launch in November 2022, ChatGPT has attracted widespread attention for the way it simulates news articles or, on dating apps, messages with potential dates. The informal nature of the chats might even remind older millennials of conversing online in Internet Relay Chat (IRC) chat rooms, the text-based instant messaging platform they grew up with in the early 2000s. However, ChatGPT, the most recent advancement in the technology known as “large language models,” does not “think” or “communicate” with sentience as humans do.
Even if ChatGPT can explain quantum physics and compose poetry at will, a complete AI takeover isn’t exactly on the horizon yet. Its rise does, however, raise questions about content custody, ownership and attribution, since ChatGPT tends not to include citations to the original sources and IP it uses or synthesizes.
In academia – and in law – there is a general obligation to attribute or cite original sources when using someone else’s work. This obligation is based on the concept that the moral rights of a creator are of integral importance.
He also strongly suggested that users fact-check any citations and attributions ChatGPT provides for accuracy. While copyright in Australia protects original works in a material form, including computer software, there is currently no express legal principle extending that protection to computer-generated works, as they lack an original human author.
However, whether Australian copyright law can protect a computer-generated work is still being carefully considered by policymakers in the country. These issues have caught the attention of legislatures both in Australia and around the world, and there is no doubt that there will be some movement on this soon.
OpenAI, however, has broad rights to use the input and the output, which are collectively called content. Specifically, OpenAI has the right to use the content to provide and maintain its services, comply with applicable law and enforce its policies. There are several other restrictions, including a prohibition on users representing ChatGPT outputs as human-generated.
“Ultimately, I recommend that users act with caution. They should not assume they own the content, or that they would be able to use the content without fear of being subject to infringement claims,” said Chien.
The position is similar in Singapore. As in many other jurisdictions, this is a hot topic in IP law, and there is currently no settled answer. “Most countries, including Singapore, require a human to be the author of a work, which means that it is potentially OpenAI that can claim ownership of the original AI-generated work,” said Aaron Thng, a director at Amica Law in Singapore. “But if a ChatGPT-generated work is considered a derivation of an original copyrighted work, then the ownership of the copyright would likely belong to the original copyright holder.”
Assuming ChatGPT doesn’t reproduce a pre-existing original work identically, the absence of attributions or citations is not likely to be a big problem under copyright law, according to Thng. Copyright law protects only the expression of an idea, not the idea itself. If ChatGPT can express the base idea differently, this is unlikely to be an infringement of copyright. However, the user must provide some attribution to the original work or author under Singapore law if ChatGPT quotes from another original copyrighted work.
Copying and generating the exact same passage
Meanwhile, New Zealand’s IP laws provide some statutory guidance in respect of computer-generated works. Section 5 of the Copyright Act 1994 defines the meaning of author, including the author of a work that is computer-generated. The section confirms that the author of a literary, dramatic, musical or artistic work that is computer-generated is “the person by whom the arrangements necessary for the creation of the work are undertaken.”
However, according to Jonathan Aumonier-Ward, a principal at AJ Park in Wellington, it is not yet clear how this is applied where the output is generated based on a multitude of works that may belong to others.
“The courts will have to examine issues, such as who is making arrangements for the work to be generated. Is it the programmer or the developer? Or is it the user of the program?” he said. “In the case of OpenAI, the work is only generated once the user enters a command. We suspect that clarity may be provided over the medium term by legislative drafting, and this will be a political and economic issue as much as a legal one.”
He added: “If ChatGPT generates the exact same passages for someone else, that is no different from what could happen if two people were independently working on the same problem anyway. It is probably a non-issue, or at least less of an issue than people currently think it might be.”
From New Zealand’s perspective, Section 41 of the Copyright Act 1994 states that “copyright in a work is not infringed by the incidental copying of the work in an artistic work, a sound recording, a film or a communication work.”
“Additionally, the infringing acts usually require copying,” said Aumonier-Ward. “It is not clear that either of the two identical passages is a copy of the other. Theoretically, the first generated passage could become part of the vast quantity of information and data used to generate the second passage, but this is a technical question, not a legal one. I think it is going to be more important for legislation to be clear as to whether content generated by AI can be an ‘original work’, and to determine whether there is copying as part of the black box process here.”
In the Philippines, to be entitled to copyright protection, the work must meet the standard of originality. This concept is better explained by the Supreme Court in the case of Ching Kian Chuan v. Court of Appeals, G.R. No. 130360, 15 August 2001: “A person to be entitled to a copyright must be the original creator of the work. He must have created it by his own skill, [labour] and judgment without directly copying or evasively imitating the work of another.”
“The generation of the exact same passages for someone else gives rise to the issues of originality and independent creation of the passages, which may be difficult to establish,” said Rowanie Nakan, a partner at Cruz Marcelo & Tenefrancia in Manila. “Thus, in the absence of clear evidence of copying, it may be challenging to enforce one’s copyrights with respect to common or similar passages created by the chatbot, even assuming that they are copyright eligible.”
“However, the said provision does not shed light on how outputs may be considered to be original or independently created considering that they are generated in response to similar questions,” said Nakan.
ChatGPT as source?
If the programmers of ChatGPT can be considered authors, they have the right to require that the authorship of any text generated by ChatGPT be attributed to them. In this regard, OpenAI’s Sharing & Publication Policy already requires content co-authored with OpenAI to disclose the role of AI in formulating the content, particularly: “The author generated this text in part with GPT-3, OpenAI’s large-scale language-generation model.”
“It is our view, however, that ChatGPT should not be cited as a source in itself,” said Andrea Alegre, an associate at Cruz Marcelo & Tenefrancia. “After all, the output created by ChatGPT is only an automated, predictive amalgamation of various other resources. ChatGPT is not the source of information itself.”
In academic, legal and other professional writing, materials are generally classified into primary and secondary sources. Primary sources are those understood to be raw, original quantitative or qualitative information that the author has personal knowledge of or direct access to. Meanwhile, secondary sources are those more distant from the origin of the data or information and built on other primary or secondary sources, whether to interpret, analyze, critique or otherwise process those prior sources.
Viewed within this framework, ChatGPT fails to qualify as either a primary or a secondary source: it neither generates new information or thought from raw sources, nor analyzes, interprets or critiques those materials. It is only an automated mechanism for predicting likely responses to queries based on its vast store of information, according to Alegre.
At most, therefore, predictive models such as ChatGPT should only be used as tools for sifting through or processing large quantities of information but cannot be cited as sources in the absence of a deliberate intent, creativity or thought in their generation of information. “Stripped to their essence, these models engage in mere mathematical predictions or guesswork, which, albeit heavily calculated, is still merely guesswork,” she said.
As AI technology, such as ChatGPT, becomes more prevalent in the coming years, users may become reliant on the output it generates for information and advice.
Robert Daniel Arcadio, an associate at Cruz Marcelo & Tenefrancia, advised that users should keep in mind ChatGPT is primarily a language model trained to produce text and does not necessarily give accurate responses.
“For instance, a prompt on the legality of divorce in the Philippines will tell a user that ‘it was recently legalized in 2019’, when in fact, the Philippines is the only country, aside from the Vatican, which outlaws absolute divorce,” he said. “Similarly, ChatGPT sometimes cites sources that do not exist. Identifying what is credible information from what is not thus becomes a challenge for users of such AI technology.”
James Louie Cuevas, an associate at Cruz Marcelo & Tenefrancia, said that currently, ChatGPT provides users with disclaimers on its accuracy and information on its limitations in its General FAQs. ChatGPT also cautions that it is not intended to give advice, and OpenAI indicates the primary uses of its GPT model as copywriting, summarization, parsing text, classification and translation.
“OpenAI can address accuracy concerns further by including the source and author attribution in its output. Not only will this enable users to check the credibility of any given output, but this may also address any copyright issues which may arise from the use of AI-generated texts,” said Cuevas. “Users, on the other hand, should recognize that AI technologies, such as ChatGPT, are simply tools that process and generate responses to specific inputs or stimuli and are not a substitute for original human-written works.”
Tricking the system
The application of generative AI might seem advanced compared to other current technologies, but it will soon become part of our daily lives, much as the internet and the shift from Web 2.0 to 3.0 did. New challenges will inevitably arise from these developments.
“Looking at the surface, most of the responses generated by ChatGPT seem to be well-organized and reliable. However, sometimes it can make a mistake by coming up with a plausible-sounding but incorrect or nonsensical response – the so-called ‘hallucination’,” said Dhiraphol Suwanprateep, a partner at Baker McKenzie in Bangkok.
He added that ChatGPT needs humans to be involved in the training process and to provide feedback for its learning and improvement, as it is only pretrained. It has never interacted with reality and therefore has no grasp of the underlying facts behind the responses it provides.
ChatGPT is also not permitted to create content that violates its content policy, such as content relating to illegal activities, malware generation or financial and investment advice. However, users always find ways to trick the algorithms, said Kritiyanee Buranatrevedhya, an associate at Baker McKenzie in Bangkok.
“By giving ChatGPT criteria and instructions for roleplaying, such as playing the role of someone else, an expert engineer, a white hat hacker or a financial advisor, for example, ChatGPT becomes capable of providing answers prohibited by its content policies in the shoes of the role it is playing, the so-called ‘jailbreaking’,” she explained. “Therefore, there is always a way to trick the system.”
Suwanprateep further said that this challenge requires regulatory developments, whether court precedents or regulations, to provide clarity on the issue. This, he added, is also why Thailand is developing a specific AI law.
There will be many challenges in practice, said Burin Saekow, also a Bangkok-based associate at Baker McKenzie. ChatGPT uses users’ input to train itself, so if a user provides input containing data protected by law, such as trade secrets or personal data, it could result in a data breach and other illegal activity. ChatGPT also now runs on the GPT-4 model, which is claimed to be more accurate, make fewer mistakes and be capable of image analysis. With these developments, new challenges will certainly emerge.
“If the next version of ChatGPT could be adapted to provide citations for the sources of the works in its chat responses, this could reduce the risk of copyright infringement and give users more options to weigh the reliability of the source of the response,” he said.