Laws
  • Article
  • Open Access

23 June 2025

Copyright Implications and Legal Responses to AI Training: A Chinese Perspective

1 Intellectual Property School, East China University of Political Science and Law, Shanghai 201620, China
2 School of Law, University of International Business and Economics, Beijing 100029, China
* Author to whom correspondence should be addressed.

Abstract

The emergence of generative AI presents complex challenges to existing copyright regimes, particularly concerning the large-scale use of copyrighted materials in model training. Legal disputes across jurisdictions highlight the urgent need for a balanced, principle-based framework that protects the rights of creators while fostering innovation. In China, a regulatory approach of “moderate leniency” has emerged, emphasizing control over downstream AI-generated content (AIGC) while adopting a more permissive stance toward upstream training. This model upholds the idea–expression dichotomy, rejecting theories such as “retained expression” or “retained style”, which improperly equate ideas with expressions. A critical legal distinction lies between real-time training, which is ephemeral and economically insignificant, and non-real-time training, which involves data retention and should be assessed under the fair use test. A fair use exception specific to AI training is both timely and justified, provided it ensures equitable sharing of technological benefits and addresses AIGC’s potential substitutive impact on original works. Furthermore, technical processes like format conversion and machine translation do not infringe derivative rights, as they lack human creativity and expressive content. Even when training involves broader use, legitimacy may be established through the principle of technical necessity within the reproduction right framework.

1. Introduction

In recent years, generative artificial intelligence has witnessed explosive growth, with applications such as ChatGPT, Midjourney, Sora, Suno, and DeepSeek rapidly expanding in both reach and functionality. These technologies are reshaping the landscape of content creation, distribution, and consumption and have become key drivers of global technological innovation and industrial transformation. At the core of generative AI lies machine learning, where developers enhance the model’s capacity for learning and generalization through a process known as AI training. AI training, broadly construed, comprises two interrelated phases: data acquisition (including the collection, storage, and preprocessing of raw data) and data input (comprising pre-training, supervised fine-tuning, and reinforcement learning) (Sun and Dredze 2025). While the former constitutes a preparatory stage, the latter reflects the actual implementation of learning protocols and corresponds to what is typically referred to as AI training in the narrow sense. Unless otherwise indicated, this article uses ‘AI training’ in its broader sense; however, in the analysis of specific practices, particularly in Section 3, the term is used in its narrow sense to refer exclusively to the model training process, excluding corpus construction or other preparatory activities.
The performance of generative AI systems depends heavily on the availability of large-scale, high-quality datasets. Unlike other types of AI, generative models must deeply analyze the language, stylistic features, compositional structures, and logical patterns embedded in intellectual creations across the literary, artistic, and scientific domains. Consequently, training datasets often contain copyrighted materials, such as books, artworks, music, and films, raising significant legal concerns regarding potential copyright infringement during the development process (Zurth 2021). These risks can be examined from two interrelated dimensions. The first is the training output: the mathematical model that machine learning algorithms generate by learning from a dataset, typically comprising a large number of parameters and a complex architecture. The second is the training process: the aggregate of activities that the developer or trainer performs in training a model, including the use of copyrighted materials as part of the training dataset. The former involves assessing whether the model has substantially retained the expressive elements of original copyrighted works; the latter centers on whether the acts involved in AI training amount to reproduction or the creation of derivative works under copyright law.
Globally, copyright litigation related to AI training has intensified across sectors such as publishing (Syndicat National de L’édition 2025), journalism (Canadian News Companies 2024), music (Sato 2024), film (Shawn and Mat 2025), and visual arts (Vincent 2023). A pronounced conflict has emerged between copyright holders and generative AI developers (hereinafter referred to as “trainers”), with both sides disputing whether, and how, AI training activities should be regulated under existing copyright frameworks. In China, while there are no binding judicial decisions directly addressing copyright infringement claims involving AI training (Dong and Ren 2024), courts have begun exploring the copyrightability of AI-generated content (AIGC)1 and the potential liability of generative AI service providers2. On the regulatory front, the Interim Measures for the Management of Generative Artificial Intelligence Services (GAPM), the world’s first regulation specifically targeting generative AI, promulgated on 10 July 2023, and effective from 15 August 2023, explicitly enshrine the principle of “respect for intellectual property rights” in the context of AI training. These developments indicate that the interpretation and application of China’s current copyright regime will remain central to resolving future legal disputes in this evolving domain.
By contrast, in other jurisdictions, representative cases such as Tremblay v. OpenAI,3 Silverman v. OpenAI,4 and Robert v. LAION5 have substantively examined whether the AI training process falls within the scope of copyright protection. This interpretive, case-driven approach offers greater practical guidance for both theoretical development and judicial adjudication. In disputes concerning the ‘training output,’ the core issue is whether the process of training and AIGC constitutes unlawful copying of original works. In Andersen v. Stability AI, for instance, the plaintiff argued that the AI system compresses and stores the original images as ‘training images’ and that the output process constitutes a complex collage of preexisting works.6 This line of argument posits that the model essentially functions as a ‘complex collage tool.’7 Infringement claims arising from the training process are typically more intricate and demand a careful distinction between two technical scenarios: ‘real-time training,’ in which copyrighted materials are temporarily loaded into memory for analysis, and ‘non-real-time training,’ in which data copies are fixed more permanently through mechanisms such as cloud storage or local hard drives. Furthermore, the training process often involves ancillary data-processing activities such as format conversion and language translation, adding further layers of legal complexity.
There is, as yet, no international consensus on how to address copyright infringement risks arising from both the training outputs and training processes of generative AI systems. Nevertheless, resolving the ongoing conflict between copyright holders and AI developers is crucial to unlocking the full technological and industrial potential of generative AI. Although the Copyright Law of the People’s Republic of China (the Copyright Law) does not explicitly address the applicability of copyright rights (Article 10 of the Copyright Law)8 or fair use categories (Article 24 of the Copyright Law)9 to AI training activities, developments in legislation and judicial practice reflect a clear policy orientation toward embracing technological advancement. In global academic and professional communities, three main theories have emerged as possible frameworks for resolving copyright risks associated with AI training: the non-expressive use theory, the fair use theory, and the statutory licensing theory.
The non-expressive use theory holds that the acts involved in AI training do not make use of the expressive elements of copyrighted works and therefore fall outside the scope of copyright regulation. Proponents commonly point to the determinacy of the theoretical framework itself, especially its documented benefits for the development of small and medium-sized enterprises (SMEs) (Quang 2021), and to its cost-efficiency in practical implementation (X. Liu 2024). The fair use theory asserts that the acts involved in AI training should or do constitute fair use and therefore do not amount to copyright infringement. Arguments in favor of including AI training within the scope of fair use often invoke doctrines such as the application of the transformative use standard (Sag 2019), the construction of a fair learning principle (Lemley and Casey 2020), adherence to the technology neutrality principle (X. Xu 2024), and a balancing of harms and benefits associated with use (Y. Liu 2024). These approaches are employed to justify both the legitimacy and necessity of recognizing AI training as falling within the ambit of fair use. The statutory licensing theory argues that the acts involved in AI training should be covered by a statutory license and therefore, after payment of compensation, do not constitute copyright infringement (Y. Cai 2024). It should be noted, however, that under current Chinese legislation, no statutory license exists that specifically applies to AI training, and new regulations would be needed to implement this approach. The same is true in most other countries. All three approaches rely on a precise legal delineation between the nature of the ‘training process’ and that of the ‘training output’.
The main goal of this article is to provide a solution to resolve the conflict between the development of artificial intelligence and copyright protection and to further enrich the body of doctrinal scholarship. While the copyright implications of AI training have been the subject of extensive scholarly discussion at the international level, the regulatory developments and practical approaches emerging from China have attracted comparatively little attention. Against this background, this article first adopts a China-focused perspective to examine the copyright risks of AI training, distilling recent developments and trends in domestic legal and policy practice. Second, by engaging with both domestic practice and comparative insights from other jurisdictions, this article analyzes the copyright implications arising from both the training process and training outputs, with particular focus on the legal characterization of acts involving the use of copyrighted works. Finally, this article aims to provide theoretical grounding and judicial guidance for a balanced resolution of this emerging legal challenge. It contends that whether AI training constitutes copyright infringement should turn on whether the relevant use of copyrighted content preserves the original expression protected under copyright law. This standard should apply uniformly to the acquisition, processing, and utilization of copyright-protected material throughout the AI training lifecycle.

2. Recent Developments in Chinese Practice

Since 2023, China has made significant strides in exploring the copyright issues related to AI training within its judicial and legislative practices. While there are no binding rulings or legal provisions that directly define the legal status of AI training, the ongoing developments suggest a clear trend: Chinese authorities have prioritized “post facto regulation” over “preemptive restriction”, signaling a proactive stance in fostering the growth of emerging technologies and industries.

2.1. Legislation: Focusing on Mitigating the Negative Externalities of AIGC

While relevant regulations have been introduced, China’s legislative control over AIGC is clearly more extensive than its approach to AI training, with the latter being treated as subordinate. Since 2022, China has enacted a series of prominent AI-related regulations, including Provisions on the Administration of Algorithm-generated Recommendations for Internet Information Services (ARR), Provisions on the Administration of Deep Synthesis of Internet-Based Information Services (DSR), GAPM, and the Measures for Identification of Artificial Intelligence-Generated Synthetic Contents (AIM). The common feature of these regulations is their focus on mitigating the negative externalities associated with AIGC while leaving the legal nature of AI training deliberately vague, either by not addressing it or by providing only limited discussion (Table 1).
Table 1. China’s latest AI-related regulations.
The GAPM provides the most direct response to copyright risks associated with AI training and is closely linked to the subsequent Basic Security Requirements for Generative Artificial Intelligence Services. Specifically, the GAPM mandates that generative AI service providers must carry out pre-training, optimization, and other data processing activities in compliance with the law, using data and foundational models from legitimate sources. It also requires that these activities must not infringe upon the intellectual property rights of others.10 However, the GAPM does not address whether copyright holders have the right to control AI training, leaving this question unresolved and necessitating an examination of other legal sources.
The Basic Security Requirements for Generative Artificial Intelligence Services, issued by the National Technical Committee 260 on Cybersecurity of the Standardization Administration of China, builds upon the GAPM. Its general provisions state, “This document supports the GAPM and sets forth the basic security requirements that service providers must follow. When service providers fulfill the filing requirements, they must conduct a security assessment in accordance with the provisions of Chapter 9 of this document and submit the assessment report”. Therefore, while the document is not a typical administrative regulation or normative document in China, it still carries normative force.

2.2. Judicial Practice: A Shift from Strict Liability to Moderate Leniency

Although there are currently no binding precedents in Chinese judicial practice that directly address copyright infringement arising from AI training, two recent cases involving substantial similarity between AIGC and copyrighted works have begun to draw a connection between the actions of AI developers (or trainers) and potential copyright liability. While the courts differed in the degree of attribution, both decisions reflect an evolving judicial perspective that links training activities to infringement risks.

2.2.1. GIC Ultraman Case

On 8 February 2024, the Guangzhou Internet Court issued a decision in Shanghai Xinchuanghua Cultural Development Co., Ltd. v. Guangzhou Nianguang Co., Ltd. (GIC Ultraman case), which some media outlets have described as China’s first copyright infringement lawsuit involving an AI platform.11 The plaintiff held the exclusive license to the copyright in the Ultraman series of artistic works within China and was entitled to enforce the rights independently. The defendant operated a website called “tab”, which provided generative AI services to the public via programmable APIs, accompanied by a paid subscription model. The plaintiff discovered that the AI system offered by the defendant could, upon user prompts, generate images that closely resembled those from the Ultraman series. Consequently, the plaintiff initiated legal proceedings, alleging that the defendant had infringed its rights of reproduction, adaptation, and communication to the public through information networks, and seeking an order to halt the infringement together with compensation for damages. The court framed the key issues of the case as whether the defendant’s actions infringed the plaintiff’s rights and, if so, how liability should be assigned.
The plaintiff argued that the defendant’s unauthorized generation of multiple Ultraman images infringed the plaintiff’s right of reproduction in the Ultraman works involved in the case. Some of the infringing images, such as those featuring Ultraman in an illustration style and the fusion of Ultraman with other characters like Sailor Moon and Doraemon, were substantially similar to the plaintiff’s original works, thus infringing the plaintiff’s right of adaptation. Additionally, by generating and providing all the images involved to users, the defendant had violated the plaintiff’s right of online communication of the Ultraman works.
The court ultimately concluded that the defendant was directly liable for infringing the plaintiff’s rights of reproduction and adaptation. It also found the defendant at fault for failing to establish a complaint mechanism, provide risk warnings, or display prominent notices and therefore held the defendant liable for damages. The court ordered the defendant to implement an effective keyword filtering system to prevent the AI model from generating infringing content.
Regarding the right of reproduction, the court held that “the images involved in the case, generated by the tab website and provided by the plaintiff, partially or fully reproduce the original expression of the ‘Ultraman’ artistic character. Therefore, the defendant, without authorization, reproduced the Ultraman works, infringing on the plaintiff’s right of reproduction over the Ultraman works involved”. Concerning the right of adaptation, the court stated, “The generated images involved retain the original expression of the ‘Tiga Ultraman Composite’ work and, based on this original expression, develop new characteristics. The defendant’s actions thus constitute an adaptation of the Ultraman works. Therefore, without permission, the defendant adapted the Ultraman works, infringing on the plaintiff’s right of adaptation”. Regarding the right of online communication, the court commented, “Whether the defendant infringed on the right of reproduction, adaptation, or the right of online communication is a matter of determining which specific copyright rights have been violated, and does not affect the establishment of infringement. In other words, it does not have a substantial impact on the interests of the rights holder or the public. Considering that this case involves a new situation of infringement arising in the context of generative AI, and given that the court has already upheld the plaintiff’s claims of infringement on the right of reproduction and adaptation, the court will not re-evaluate the claim of infringement of the right of online communication as it is already covered under the claims for reproduction and adaptation”. As for the plaintiff’s request for the defendant to remove the Ultraman data from its training dataset, the court did not support this claim, as the defendant did not actually conduct model training.
The GIC Ultraman case sparked significant debate in both academic and industry circles upon its release, with some supporting the court’s decision (Yao 2024) and others questioning its findings of fact and legal application (Juanmao 2024). The main issue arises from the court’s reliance solely on the substantial similarity between the AIGC and the Ultraman series to conclude that the defendant had reproduced and adapted the plaintiff’s copyrighted works without authorization (Luo and Yang 2025). However, the court did not clearly explain how the plaintiff’s rights of reproduction and adaptation were engaged when the defendant had no direct involvement in the AI training process. In other words, it remains unclear whether the content generated by the model was an independent act of the defendant, the plaintiff, or the model itself or whether it was a collaborative result. The court’s decision to find the defendant directly liable for infringement, rather than indirectly liable, reflects a tendency toward strict accountability. However, this perspective fails to explain why using third-party APIs to offer generative AI services constitutes acts of reproduction and adaptation. A more plausible explanation is that the court did not adequately distinguish between scenarios in which the trainer directly provides services and those in which third parties indirectly provide services via APIs. On this reading, reproduction and adaptation should be seen as the result of actions taken by the trainer during the development process. It should be noted that while this line of reasoning may explain the judiciary’s perspective, it does not necessarily justify the outcome.

2.2.2. HIPC Ultraman Case

On 30 December 2024, the Hangzhou Intermediate People’s Court issued its second-instance judgment in the case of Shanghai Character License Administrative Co., Ltd. v. Hangzhou Jellyfish Intelligent Technology Co., Ltd. (HIPC Ultraman case), affirming the first-instance decision.12 This case marks China’s second instance of AI platform-related copyright infringement litigation. In this case, the plaintiff, as the exclusive licensee of the Ultraman series of artistic works in China, had the right to enforce the copyright independently. The defendant operated a platform that utilized the open-source Stable Diffusion model, offering services for image generation, storage, distribution, and model customization under the framework of Stable Diffusion’s text-to-image, image-to-image, and LoRA (Low-Rank Adaptation) technologies. The purpose of LoRA technology is to allow users to upload a corpus of images and train a LoRA overlay model on top of a base model. This overlay model is used to supplement the content that the base model is unable to generate or to enhance specific image features, enabling fine-tuning and personalization of the generated images. This process helps tailor the images to better align with specific artistic styles and creative requirements. The plaintiff discovered that the defendant’s platform not only allowed for the generation of images similar to those in the Ultraman series using a LoRA overlay model but also provided storage and distribution services for the generated content. As a result, the plaintiff initiated a lawsuit, claiming that the defendant had directly and indirectly infringed its right of communication through information networks in the Ultraman series works and had engaged in unfair competition. The plaintiff sought an order to halt the infringement and compensation for damages.
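The overlay mechanism described above can be illustrated with a minimal sketch of low-rank adaptation on a single weight matrix. This is a toy example under stated assumptions, not the defendant platform's or Stable Diffusion's actual code: the names `apply_lora`, `W_BASE`, `A`, and `B` are hypothetical, and the matrices are hand-picked rather than trained. It shows only the general idea that the base weights stay frozen while a small pair of matrices supplies an additive adjustment.

```python
"""Toy illustration of Low-Rank Adaptation (LoRA) on one weight matrix."""
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def apply_lora(base: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return base + (alpha / rank) * (b @ a) without modifying base."""
    rank = len(a)  # number of rows of A is the adapter rank
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as base
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Frozen base weights (4 x 4 = 16 values), standing in for the base model.
W_BASE: Matrix = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Hypothetical rank-1 overlay: B is 4 x 1 and A is 1 x 4 (8 values in total).
B: Matrix = [[0.5], [0.0], [0.0], [0.0]]
A: Matrix = [[0.0, 2.0, 0.0, 0.0]]

# The overlay changes the effective weights; the base matrix is left untouched.
W_ADAPTED = apply_lora(W_BASE, B, A, alpha=1.0)
```

In this sketch the overlay consists only of `A` and `B`, which are far smaller than the base matrix, so it can be stored and shared separately from the base model. Whether such numeric parameters retain any protected expression of the uploaded images is the legal question, which the sketch does not answer.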
The two courts summarized the key issues as “whether direct or indirect infringement occurred”, “whether unfair competition was constituted”, and “if infringement or unfair competition was found, how the defendant should be held liable”. Ultimately, the courts concluded that the defendant should have been aware of the infringement but failed to take necessary action, thus constituting indirect infringement (specifically related to the right of communication via information networks). The courts ruled that the Anti-Unfair Competition Law would not be applied to further evaluate the case. The defendant was ordered to bear liability for damages, remove the infringing images, and eliminate any models capable of generating infringing content. It is noteworthy that when addressing the plaintiff’s request to “delete all materials and data related to Ultraman”, the first-instance court provided clarification on the legal boundaries of using copyrighted works in generative AI development. The court stated that ‘in the absence of evidence showing that the generative AI was intended to use the original expression of a copyrighted work, or that it has affected the normal use of the work or unreasonably harmed the legitimate interests of the copyright holder, such actions may be considered fair use.’
Although the HIPC Ultraman case differs from the GIC Ultraman case in terms of the facts and the plaintiff’s approach to enforcing its rights, it indirectly reflects a shift in Chinese judicial practice from strict liability to a more lenient stance concerning copyright risks in AI training. Specifically, the HIPC Ultraman case focuses more on the substitutive role of AIGC with respect to copyrighted works (particularly its derivative effect in distribution) than on the legal nature of actions taken by parties during the development and application stages. As the plaintiff did not allege that the defendant had engaged in acts of reproduction or adaptation, the court did not address these aspects. However, the court in this case made clear that the defendant provided its services on the basis of an open-source model, thereby avoiding the confusion in the GIC Ultraman case, which conflated direct service provision by the trainer with indirect service provision via third-party interface calls. The court rejected the automatic assumption that AI service providers directly infringe copyright. Based on this, some scholars have further argued for the feasibility of applying the “safe harbor rule” to AI service provision scenarios and for its positive implications for technological and industrial development (Xiong 2025). Additionally, the court differentiated the roles of the trainer, service provider, and user prior to the generation stage and introduced a framework for determining fair use based on the externalities of AIGC. This demonstrates that Chinese judicial practice is continuously evolving, with its reasoning becoming more refined, rational, and closely aligned with legislative logic.

2.2.3. BIC REDnote Case

As in other jurisdictions, China has not yet seen a definitive ruling directly addressing whether AI training constitutes copyright infringement. However, on 20 June 2024, the Beijing Internet Court began hearing a series of cases filed by four illustrators against REDnote (BIC REDnote case), revealing some trial details and providing a valuable window into the claims, demands, and evidence presented by both the plaintiffs and the defendant. The plaintiffs in these cases are illustrators who have long posted their artworks on the REDnote platform. They discovered that users had posted images on the platform that bore clear signs of imitation, with these users claiming that the images were generated using AI drawing software provided by REDnote. This led to the dispute (Dong and Ren 2024). The plaintiffs assert that the defendant, without permission, used their artworks to train an AI model and applied it for commercial purposes, exceeding the boundaries of fair use and constituting copyright infringement. Specifically, the plaintiffs argue that the defendant’s act of scraping their artworks to input them into the AI model infringes their right of reproduction; that the AI software’s functionality, which combines their artworks with other images to create new works, infringes their right of adaptation; and that the defendant’s actions also violate their right to control the use of their works as training material for AI. In defense, the defendant contends that there is no substantial similarity between the plaintiffs’ artworks and the generated content, that the AI training constitutes fair use, and that it is not at fault. The cases remain pending.
In these cases, the dispute between the plaintiffs and the defendant centers on whether the “training result” and the “training process” give rise to copyright infringement, an issue that China’s current copyright legislation does not explicitly address. On the one hand, the question arises as to whether the “training result” retains the copyrighted works and their adapted content as the original data, thereby infringing the rights of reproduction and adaptation through the provision of the service. On the other hand, the question is whether the “training process”, when carried out without the permission of the copyright holder, infringes the rights of reproduction or adaptation or other broadly framed rights. Furthermore, can fair use provide an exemption for either of these two scenarios? Because the judiciary cannot refuse to render a judgment, these issues will have to be clarified through judicial interpretation of China’s current copyright law.
From the perspective of Chinese judicial practice, even if AI training does not fall within the statutory scope of the right of reproduction or the right of adaptation, the residual category of “other rights” under copyright law may still provide a legal basis for rights holders to pursue claims, provided certain conditions are satisfied. The Copyright Law enumerates the rights comprised in copyright in a relatively fragmented manner. In particular, the right of communication does not have a singular overarching provision like the “right of making available to the public” but is instead divided into several specific rights, including the rights of distribution, rental, exhibition, performance, screening, broadcasting, and online communication. As technology and industry have developed, new forms of work dissemination, such as online live streaming, have gradually become more widespread.
However, online live streaming, being a non-interactive transmission method via cable, does not fall under the scope of broadcasting rights (which cover wireless broadcasting, cable retransmission, and public performance broadcasts) as defined in the 2010 version of the Copyright Law, nor does it fall under the right of online communication (which covers interactive communication via wired or wireless means). The act of broadcasting works through online live streaming without authorization thus falls into a gray area between copyright infringement and non-infringement. Given the substantial impact such actions have on the rights holders, Chinese judicial authorities have interpreted the “other rights” provision to include online live streaming, addressing the copyright holders’ concerns.13 In the subsequent 2020 amendments, broadcasting rights were also expanded to cover this issue. It should be noted that the uncertainty surrounding the “other rights” provision and the risks associated with its broad application have sparked some criticism. The question of whether to retain this provision, and under what conditions it should be applied, remains a challenge for both legislators and the judiciary (Li 2022). Under the fair use doctrine, statutory exceptions such as “personal study or research” and “classroom teaching or scientific research” may also be analogically interpreted to exempt training activities (Wan and Li 2023). Whether such a doctrinal approach—relying on the existing copyright framework to regulate novel technological and industrial developments—represents a sustainable legal response remains open to scrutiny and evaluation.

5. Conclusions

The copyright risks associated with AI training are not only a concern for China but also a global challenge, one that arises under the dual pressures of technological innovation and regulatory reasoning. Jurisdictions are actively formulating responses to AI-related copyright challenges in light of their distinct economic, political, cultural, and technological conditions. For instance, the United States has adopted a posture of cautious intervention, asserting that the doctrine of fair use is capable of resolving the majority of legal issues arising from AI training, while residual concerns may be addressed through self-correcting market mechanisms (U.S. Copyright Office 2025).
Similarly, the European Union, through the Copyright in the Single Market Directive, has established a framework of exceptions for text and data mining (TDM), accompanied by an opt-out mechanism. While these provisions offer a potential pathway for reconciling the interests of AI development and copyright protection, their practical application remains subject to considerable uncertainty, particularly in commercial contexts involving AI training (European Union Intellectual Property Office (EUIPO) 2025). The Japanese Copyright Act has likewise introduced a provision permitting non-enjoyment uses, which has substantially lowered the legal barriers to AI training (Agency for Cultural Affairs 2018). Nonetheless, this legislative development has been met with sustained opposition from copyright holders (Deck 2023). The ongoing divergence in regulatory approaches highlights the absence of a broadly accepted consensus on how to reconcile AI training with copyright protection.
In China, the latest AI-related regulations, including ARR, DSR, GAPM, and AIM, establish a governance framework focused on preemptively curbing and subsequently mitigating the negative externalities of AIGC. The ambiguity in the legal treatment of AI training reflects a policy that encourages technological and industrial development, emphasizing inclusive growth. This approach has also been progressively refined and consolidated through relevant judicial practice. Together, these regulations and judicial decisions provide a pragmatically oriented framework for addressing the copyright implications of AI training.
This article argues that the determination of copyright infringement should focus on whether the use of a copyrighted work retains its original expression, and that the acquisition, processing, and utilization of copyrighted works in AI training are subject to this criterion. At the “training output” level, based on the technical principles of AI training, the AI model itself does not serve as a repository for copyrighted works. If the AIGC does not retain the original expression of the copyrighted work, the trainer has used only non-protectable elements such as ideas, styles, or facts and therefore does not commit copyright infringement. Consequently, both the retained expression theory and the retained style theory are untenable.
At the “training process” level, it is critical to distinguish technical scenarios involving the digital storage of copyrighted works. The transient nature and lack of independent economic value inherent in “real-time training” naturally categorize it within the scope of “temporary reproduction”, whereas “non-real-time training”, which involves the creation of a database and the formation of stable copies, requires a fair use review. The homogeneity of derivative rights and reproduction rights implies that algorithm-driven technical processes such as work format conversion and language translation, which lack human creative intent and the generation of original expression, do not breach the boundaries of derivative rights. Even outside technical processing, any derivative actions undertaken by the trainer on copyrighted works can attain legitimacy under the framework of reproduction rights, provided they meet the principle of technical necessity.

Author Contributions

Conceptualization, L.Y. and H.L.; Writing—original draft, L.Y.; Writing—review & editing, L.Y. and H.L.; Visualization, L.Y. and H.L.; Supervision, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Agency for Cultural Affairs. 2018. Overview of the Amendment to the Copyright Act. Available online: https://www.bunka.go.jp/seisaku/chosakuken/hokaisei/h30_hokaisei/pdf/r1406693_02.pdf (accessed on 15 June 2025).
  2. Cai, Yuanzhen. 2024. Copyright Statutory Licensing of Machine Learning: Foundations and Regulations. Intellectual Property 38: 77.
  3. Canadian News Companies. 2024. Challenge OpenAI over Alleged Copyright Breaches. CNBC, November 29. Available online: https://www.cnbc.com/2024/11/29/major-canadian-news-media-companies-launch-legal-action-against-openai.html (accessed on 1 January 2025).
  4. Carlini, Nicolas, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. 2023. Extracting Training Data from Diffusion Models. pp. 5–6. Available online: https://arxiv.org/pdf/2301.13188.pdf (accessed on 5 May 2025).
  5. Cooper, A. Feder, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, et al. 2023. Report of the 1st Workshop on Generative AI and Law. p. 27. Available online: https://arxiv.org/abs/2311.06477 (accessed on 5 May 2025).
  6. Cui, Guobin. 2014. Copyright Law: Cases and Materials. Beijing: Peking University Press, p. 419.
  7. Cyberspace Administration of China. 2022a. ARR, Q&A. January 4. Available online: https://www.cac.gov.cn/2022-01/04/c_1642894606594726.htm (accessed on 23 March 2025).
  8. Cyberspace Administration of China. 2022b. DSR, Q&A. December 12. Available online: https://www.gov.cn/zhengce/2022-12/12/content_5731430.htm (accessed on 23 March 2025).
  9. Cyberspace Administration of China. 2023. Official Q&A on the GAPM. July 15. Available online: https://www.gov.cn/zhengce/202307/content_6892001.htm (accessed on 23 March 2025).
  10. Cyberspace Administration of China. 2025. Q&A on the AIM. March 14. Available online: https://www.cac.gov.cn/2025-03/14/c_1743654685896173.htm (accessed on 23 March 2025).
  11. Deck, Andrew. 2023. This Japanese Manga Artist-Turned-Politician Is Taking on AI Art. Rest of World, February 16. Available online: https://restofworld.org/2023/generative-ai-japanese-politicians-manga/ (accessed on 16 June 2025).
  12. Dong, Wenjia, and Huiying Ren. 2024. Beijing Internet Court Trials First Case of Copyright Infringement Involving AI Painting Model Training. Beijing Internet Court WeChat Official Account. June 20. Available online: https://mp.weixin.qq.com/s/cyskAz1cASBaNIYQpGpGsA (accessed on 1 January 2025).
  13. European Union Intellectual Property Office (EUIPO). 2025. Development of Generative Artificial Intelligence from a Copyright Perspective. May 12. Available online: https://euipo.europa.eu/tunnel-web/secure/webdav/guest/document_library/observatory/documents/reports/2025_GenAI_from_copyright_perspective/2025_GenAI_from_copyright_perspective_FullR_en.pdf (accessed on 15 June 2025).
  14. Fraser, Stephen. 1997. The Copyright Battle: Emerging International Rules and Roadblocks on the Global Information Infrastructure. The John Marshall Journal of Computer & Information Law 15: 777.
  15. General Office of the State Council of the People’s Republic of China. 2023. Notice on the Issuance of the State Council’s 2023 Legislative Work Plan. June 6. Available online: https://www.gov.cn/gongbao/2023/issue_10526/202306/content_6887136.html (accessed on 4 April 2025).
  16. General Office of the State Council of the People’s Republic of China. 2024. Notice on the Issuance of the State Council’s 2024 Legislative Work Plan. May 9. Available online: https://www.gov.cn/zhengce/content/202405/content_6950093.htm (accessed on 4 April 2025).
  17. Guan, Chunyan. 2024. Exploring the Fair Use of Copyright for Generative Artificial Intelligence Training: International Trends, Local Development, and the Construction of Rules. Publishing Research 40: 91.
  18. Huang, Wei, and Leiming Wang, eds. 2021. Introduction and Interpretation of the Copyright Law of the People’s Republic of China. Beijing: China Democratic and Legal Publishing House, pp. 154–55.
  19. Ji, Jessica, Josh Goldstein, and Andrew Lohn. 2023. Controlling Large Language Models: A Primer. Center for Security and Emerging Technology. December. Available online: https://cset.georgetown.edu/wp-content/uploads/CSET-Controlling-Large-Language-Model-Outputs-A-Primer.pdf (accessed on 5 May 2025).
  20. Juanmao. 2024. First Global AIGC Platform Infringement Case Decided: “Ultraman” Defeats AI. March 4. Available online: https://news.qq.com/rain/a/20240304A03D2Y00 (accessed on 5 April 2025).
  21. Lemley, Mark A., and Bryan Casey. 2020. Fair Learning. Texas Law Review 99: 743–79.
  22. Li, Chen. 2022. On the Interpretation of Other Rights which Shall be Enjoyed by the Copyright Owners. Intellectual Property 36: 21.
  23. Lin, Xiuqin. 2021. Reshaping the Fair Use System in Copyright Law in the AI Era. Chinese Journal of Law 43: 176.
  24. Liu, Liang. 2024. In the First Nine Months of the 7th China International Import Expo, over 30,000 Trademark Infringement and Fake Patent Cases Were Handled in China. China News Network, November 6. Available online: https://www.chinanews.com.cn/cj/2024/11-06/10314706.shtml (accessed on 4 April 2025).
  25. Liu, Xiaochun. 2024. Non-Work Use Nature of Generative Artificial Intelligence Data Training and its Legitimization. Legal Forum 39: 67.
  26. Liu, Yu. 2024. An Economic Analysis of Machine Data Utilization Constituting Fair Use under Copyright Law. Intellectual Property 38: 107.
  27. Lohmann, Fred. 2023. Re: Notice of Inquiry and Request for Comment [Docket No. 2023-06]. October 30, pp. 5–6. Available online: https://downloads.regulations.gov/COLC-2023-0006-8906/attachment_1.pdf (accessed on 5 May 2025).
  28. Lu, Haijun. 2017. On the Nature of Idea/Expression Dichotomy. Intellectual Property 31: 25.
  29. Luo, Han, and Ruiqi Yang. 2025. Reflection and Exploration on Identification of Copyright Infringement in Artificial Intelligence Generated Content—A Case Study of the Ultraman. Digital Publishing Research 4: 69.
  30. Mihály, Ficsor. 2009. The Law of Copyright and the Internet. Translated by Guo Shoukang, Wan Yong, and Xiang Jing. Beijing: China Encyclopedia Publishing House, vol. 1, pp. 178–98.
  31. Ministry of Justice of the People’s Republic of China. 2006. The Head of Legislative Affairs Office of the State Council of the People’s Republic of China Responds to the Question from China Government Legal Information Network’s Reporter Regarding the ‘Regulation on the Protection of the Right of Communication to the Public on Information Networks’. December 4. Available online: https://www.moj.gov.cn/pub/sfbgw/zcjd/200612/t20061204_389820.html (accessed on 19 December 2024).
  32. Nimmer, David. 1997. A Tale of Two Treaties-Dateline: Geneva—December 1996. Columbia-VLA Journal of Law & the Arts 22: 15–16.
  33. Quang, Jenny. 2021. Does Training AI Violate Copyright Law? Berkeley Technology Law Journal 36: 1420–29.
  34. Sag, Matthew. 2019. The New Legal Landscape for Text Mining and Machine Learning. Journal of the Copyright Society of the USA 66: 291–319.
  35. Sato, Mia. 2024. Major Record Labels Sue AI Company behind ‘BBL Drizzy’. The Verge, June 24. Available online: https://www.theverge.com/2024/6/24/24184710/riaa-ai-lawsuit-suno-udio-copyright-umg-sony-warner (accessed on 1 January 2025).
  36. Shawn, Chen, and O’Brien Mat. 2025. Disney and Universal Sue AI Firm Midjourney for Copyright Infringement. AP News, June 12. Available online: https://apnews.com/article/disney-universal-midjourney-copyright-lawsuit-722b1b892192e7e1628f7ae5da8cc427 (accessed on 15 June 2025).
  37. Shen, Rengan. 1997. WIPO Introduced Two New Treaties. In Intellectual Property Research. Edited by Zheng Chengsi. Beijing: China Fangzheng Press, vol. 3, p. 9.
  38. Shen, Rengan, and Yingke Zhong. 2003. Introduction to Copyright Law, rev. ed. Beijing: The Commercial Press, pp. 244–46.
  39. Sobel, Benjamin L. W. 2017. Artificial Intelligence’s Fair Use Crisis. Columbia Journal of Law & the Arts 41: 81.
  40. Stability AI. 2023. Response to United States Copyright Office Inquiry into Artificial Intelligence and Copyright. October, p. 13. Available online: https://www.law.berkeley.edu/wp-content/uploads/2023/11/Stability-AI-COLC-2023-0006-8664_attachment_1.pdf (accessed on 5 May 2025).
  41. Statement on AI Training. 2024. Available online: https://www.aitrainingstatement.org/ (accessed on 4 April 2025).
  42. Sun, Kaiser, and Mark Dredze. 2025. Amuro and Char: Analyzing the Relationship Between Pre-Training and Fine-Tuning of Large Language Models. March 18. Available online: https://arxiv.org/pdf/2408.06663 (accessed on 15 June 2025).
  43. Syndicat National de L’édition. 2025. Authors and Publishers Unite in Lawsuit Against Meta to Protect Copyright from Infringement by Generative AI Developers. March 18. Available online: https://www.sne.fr/press-release-authors-and-publishers-unite-in-lawsuit-against-meta-to-protect-copyright-from-infringement-by-generative-ai-developers/ (accessed on 23 March 2025).
  44. Tao, Qian. 2024. Copyright Problems regarding Training of Foundation Models: Clarification of the Theory and Application of Rules. Tribune of Political Science and Law 42: 152.
  45. U.S. Copyright Office. 2023. Notice of Inquiry on Artificial Intelligence & Copyright (Dkt. 2023–2026) Reply Comments of Meta Platforms, Inc. December 6, p. 13. Available online: https://downloads.regulations.gov/COLC-2023-0006-10332/attachment_1.pdf (accessed on 5 May 2025).
  46. U.S. Copyright Office. 2025. Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-Publication Version). May, pp. 47–48. Available online: https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf (accessed on 15 June 2025).
  47. Vincent, James. 2023. Getty Images Sues AI Art Generator Stable Diffusion in the US for Copyright Infringement. The Verge, February 7. Available online: https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-lawsuit-getty-images-stable-diffusion (accessed on 1 January 2025).
  48. Wan, Yong. 2021. The Dilemma and Solution of the Fair Use System under Copyright Law in the Age of Artificial Intelligence. Social Sciences Journal 43: 93.
  49. Wan, Yong, and Yalan Li. 2023. Research on the Interpretation of Fair Use Clause in Response to the Development of Artificial Intelligence Industry. Digital Law 1: 83.
  50. Wang, Qian. 2023. The Qualitative Analysis of Content Generated by Artificial Intelligence in Copyright Law. Tribune of Political Science and Law 41: 24.
  51. World Intellectual Property Organization. 1996. Amendments to Partly Consolidated Text of Draft Treaty No. 1. CRNR/DC/64. Delegation of the People’s Republic of China, December 13.
  52. Xiong, Qi. 2025. Copyright Infringement Liability of Generative Artificial Intelligence Platforms. Global Law Review 47: 23.
  53. Xu, Xiaoben. 2024. Fair Use of Copyright of Artificial Intelligence Model from Technology Neutrality Perspective. Law Review 42: 86.
  54. Yao, Zhiwei. 2024. Determination and Prevention of Copyright Infringement of AI Generated Works: Focusing on the World’s First Generative AI Service Infringement Judgment. Local Legislation Journal 9: 1.
  55. Ye, Jufen, and Qingyuan Sang. 2017. Storing Transcoded Novels on Webpages Constitutes Infringement. People’s Court Daily (Beijing).
  56. Zhang, Jianhua, ed. 2006. Interpretation of Regulation on the Protection of the Right of Communication to the Public on Information Networks. Beijing: China Legal Publishing House, p. 5.
  57. Zhang, Ping. 2024. The Obstacles and Solutions of Copyright System in Artificial Intelligence Content Generation Mechanism. Science of Law 42: 27.
  58. Zhu, Kaixin. 2023. Understanding the Core Copyright Issues in Training Large AI Models. Tencent Research Institute, October 19. Available online: https://mp.weixin.qq.com/s/ab1p8v2QopyCmHXzdkcQmg (accessed on 6 April 2025).
  59. Zurth, Patrick. 2021. Artificial Creativity? A Case Against Copyright Protection for AI-Generated Works. UCLA Journal of Law and Technology 25: 18.
1
See Beijing Internet Court Civil Judgment (2023) Jing 0491 Minchu No. 11279; Changshu People’s Court of Jiangsu Province Civil Judgment (2024) Su 0581 Minchu No. 6697.
2
See Guangzhou Internet Court Civil Judgment (2024) Yue 0192 Minchu No. 113.
3
See Tremblay v. OpenAI, Inc. 716 F.Supp.3d 772 (ND Cal 2024); Tremblay v. OpenAI, Inc. 742 F.Supp.3d 1054 (ND Cal 2024).
4
See Silverman v. OpenAI, Inc. F.Supp.3d (ND Cal 2023).
5
See Robert Kneschke v LAION eV (Landgericht Hamburg, Az. 310 O 227/23, 2024).
6
See Sarah Andersen et al v Stability AI Ltd 700 F Supp 3d 853 (ND Cal 2023).
7
See footnote 6 above.
8
The copyright rights most closely associated with AI training are the right of reproduction and derivative rights. Whether AI training activities constitute exercises of these rights remains legally unsettled and subject to interpretation. It is also noteworthy that Article 10, Item 17, of the Copyright Law contains a catch-all provision referred to as “other rights”, which may be invoked to extend protection to certain uses arising in the context of AI, depending on how such rights are interpreted and applied.
9
Article 24 of the Copyright Law enumerates twelve statutory exceptions constituting fair use and permits additional exceptions to be established by other laws or administrative regulations.
10
See Article 7 of the GAPM.
11
See Guangzhou Internet Court Civil Judgment (2024) Yue 0192 Minchu No. 113.
12
See Hangzhou Intermediate People’s Court Civil Judgment (2024) Zhe 01 Minzhong No. 10332.
13
See Beijing Intellectual Property Court Civil Judgment (2017) Jing 73 Minzhong No. 840.
14
See footnote 6 above.
15
See Order Granting Motion to Dismiss, Kadrey v Meta Platforms, Inc. Case No. 23-cv-03417-VC (TSH) (N.D. Cal. 2023).
16
A report from a workshop held in connection with the International Conference on Machine Learning (ICML) points out that AI models, unlike search engines, cannot directly access their training data; they can only make predictions based on the information encoded in their model weights. The workshop lasted two days, with the first day held as part of ICML.
17
See footnote 6 above.
18
See Guangzhou Internet Court Civil Judgment (2024) Yue 0192 Minchu No. 113; Hangzhou Intermediate People’s Court Civil Judgment (2024) Zhe 01 Minzhong No. 10332.
19
See Andersen v. Stability AI Ltd. 744 F. Supp. 3d 956, 979 (N.D.Cal. 2024).
20
See Jewelry 10, Inc v Elegance Trading Co. No 88 Civ 1320 (PNL) (SDNY 1991).
21
See Dave Grossman Designs, Inc. v Bortin 347 F Supp 1150, 1156–57 (ND Ill 1972).
22
See Nash v CBS, Inc. 899 F 2d 1537, 1540 (7th Cir 1990).
23
See Arnstein v. Porter 154 F.2d 464, 473 (2d Cir. 1946).
24
Light is essentially an electromagnetic wave. When an image of a work is formed on the retina of the human eye, the rod and cone cells on the retina convert the light signals (electromagnetic waves) into electrical signals. These electrical signals are then transmitted to the occipital lobe, allowing the brain to process the image and colors. It can be said that when a person reads, the work is inevitably copied into the brain in an electronic form. This reasoning is undisputed from a scientific perspective. However, from the standpoint of general common sense, categorizing the temporary reproduction that occurs in this process as copyright infringement is undoubtedly absurd.
25
See Public Relations Consultants Association Ltd v Newspaper Licensing Agency Ltd and Others [2013] UKSC 18, para 32.
26
Article 1270, paragraph 2, of the Гражданский кодекс (Civil Code of the Russian Federation, 2024 amendments) states: “As reproduction shall not be deemed a short term recording of a work which is of temporary or accidental nature and is an integral and significant part of a technological process solely intended for the legal use of a work, or is the transfer of a work on an information telecommunication network between third parties by an information broker, provided that such record has no independent economic importance”.
27
Auteurswet van 1912 (zoals gewijzigd tot 1 September 2017) Artikel 13a: Onder de verveelvoudiging van een werk van letterkunde, wetenschap of kunst wordt niet verstaan de tijdelijke reproductie die van voorbijgaande of incidentele aard is, en die een integraal en essentieel onderdeel vormt van een technisch procédé dat wordt toegepast met als enig doel (a) de doorgifte in een netwerk tussen derden door een tussenpersoon of (b) een rechtmatig gebruik van een werk mogelijk te maken, en die geen zelfstandige economische waarde bezit.
28
See Código do Direito de Autor e dos Direitos Conexos (Code of Copyright and Related Rights, 2021 amendments), Artigo 75.º, Âmbito 1.
29
Article 21 The author shall have the exclusive right to reproduce his work. See Japanese Copyright Act (Act No. 48 of 1970, amended up to 19 July 2024).
30
The right of reproduction is the right to reproduce the work in any manner or form, including temporary reproduction insofar as it has independent economic significance. Article 4.2 of European Copyright Code.
31
See Pudong New Area People’s Court of Shanghai Criminal Judgment (2015) Pudong Xing (Zhi) Chu No. 12.
32
See C-5/08 Infopaq International A/S v Danske Dagblades Forening [2009] ECR I-06569, paras 62–65.
33
See Joined Cases C-403/08 and C-429/08 Football Association Premier League Ltd and Others v QC Leisure and Others and Karen Murphy v Media Protection Services Ltd [2011] ECR I-09083, paras. 174–177.
34
The Copyright Law was promulgated in 1990, came into effect in 1991, and has since undergone three amendments, in 2001, 2010, and 2020.
35
See Article 3 of Regulation for the Implementation of the Copyright Law of the People’s Republic of China (2013).
36
See Lee v A.R.T. Co 125 F.3d 580 (7th Cir, 1997).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.