Chinese Buddhist Canon Digitization: A Review and Prospects

Xu Zhang

doi:10.3390/rel17010052

Institute of World Religions, Chinese Academy of Social Sciences, Beijing 100732, China

Religions 2026, 17(1), 52; https://doi.org/10.3390/rel17010052

This article belongs to the Special Issue Shaping Sacred Knowledge: The Transmission and Legacy of the Chinese Buddhist Canon

Abstract

The digitization of the Chinese Buddhist Canon represents a transformative shift in Buddhist textual scholarship, enabling unprecedented access to and analysis of one of East Asia’s most extensive scriptural collections. This review examines the evolution of digital platforms, with a focus on the Chinese Buddhist Electronic Text Association (CBETA) and the SAT Daizōkyō Text Database, which have become foundational resources in the field. It evaluates their respective methodological paradigms—CBETA’s critical edition model and SAT’s interoperable, ecosystem-based approach—while highlighting their shared reliance on the Taishō Tripiṭaka as a base text. The study identifies a persistent “Taishō bottleneck,” wherein the dominance of a single edition obscures the rich textual diversity inherent in the canon’s three major lineages: Central, Southern, and Northern. By surveying newly accessible image databases of key editions such as the Zhaocheng Jin Canon 趙城金藏, Sixi Canon 思溪藏, and Qidan Canon 契丹藏, the paper argues for a paradigm shift toward a multi-lineage collation framework. The integration of artificial intelligence—particularly in OCR, text–image alignment, and semantic analysis—is presented as essential for realizing a “Hybrid Digital Canon.” This model would harmonize genealogical, media, and methodological pluralism, fostering a more nuanced and historically grounded digital philology.

Keywords: Chinese Buddhist canon; digitization; CBETA; SAT; artificial intelligence

1. Introduction

The Chinese Buddhist Canon is a comprehensive collection of Buddhist scriptures. In defining the Chinese Buddhist canon, most earlier scholars have emphasized several key elements: content, arrangement, structure, and authority.1 As summarized by Mr. Fang Guangchang 方廣錩, it is “a compendium of Chinese Buddhist texts and related literature that fundamentally encompasses the Chinese-translated Buddhist scriptures of successive dynasties as its core, organized according to a specific structural framework, and possessing certain distinguishing features” (Fang 2006). The Canon’s evolution, from manuscript versions in the Northern and Southern Dynasties to modern digital editions, reflects both technological progress and changing scholarly paradigms.

Research on the Chinese Buddhist Canon has now entered a new phase. Fang Guangchang’s systematic investigations of Buddhist manuscripts from Dunhuang offer a detailed textual analysis of catalogues related to Tang Dynasty manuscript canons, as well as the formation and evolution of the Thousand-Character Classifiers 千字文帙號 used in the Canon (Fang 2006). His work has substantially clarified the development and genealogical origins of manuscript canon traditions. In parallel, Li Fuhua 李富華 and He Mei 何梅 have conducted meticulous research into the carving processes and catalogues of printed Canons, significantly advancing our understanding of printed versions (Li and He 2003). Particularly influential has been the “three-lineage theory” of printed canons, proposed by Fang Guangchang and Chikusa Masaaki 竺沙雅章, which serves as a guiding framework for identifying different editions of the Canon and clarifying their genealogical relationships (Fang 1991; Chikusa 2000).

Building upon criteria such as catalogue organization, Thousand-Character Classifiers, and physical format, the two scholars categorized printed editions of the Canon into three principal lineages: the Central lineage 中原系, the Southern lineage 南方系, and the Northern lineage 北方系. The Central lineage comprises the Kaibao Canon 開寶藏, the First and Second Carved Editions of the Goryeo Canon 初雕/再雕高麗藏, and the Zhaocheng Jin Canon 趙城金藏. The Southern lineage includes such editions as the Fuzhou Canon 福州藏, the Sixi Canon 思溪藏, and the Qisha Canon 磧砂藏, while the Northern lineage is represented by the Qidan Canon 契丹藏.

This genealogical framework offers valuable guidance for the textual criticism of the Canon and for tracing the evolution of its contents. Nevertheless, the Chinese Buddhist Canon is an extensive and intricate collection, with documented instances of cross-lineage influence. For example, prior to its carving, the Second Carved Edition of the Goryeo Canon—classified within the Central lineage—was collated by the Korean monk Sugi 守其 and others against the Qidan Canon, a text belonging to the Northern lineage. The complex relationships among canons across these lineages, as well as those between printed canons and earlier manuscript traditions, remain areas requiring further inquiry.

Previous methodological approaches to the study of these three canonical lineages have predominantly focused on analyzing catalogue structures, Thousand-Character Classifiers, and physical formats. In terms of scholarly focus, research has often been confined to case-specific analyses of individual sutras. However, given the sheer volume and inherent complexity of the Chinese Buddhist Canon, such isolated investigations are inadequate for capturing the broader characteristics of each lineage. To address these limitations comprehensively, a systematic and comparative textual criticism of canons across different lineages is imperative.

Driven by the advancements in digital humanities since the late twentieth century, systematic collation projects have evolved from theoretical concepts to realities. Digital humanities have fundamentally transformed the study of ancient texts, with research on the Chinese Buddhist Canon being a prominent example. Early foundational efforts, such as the Electronic Buddhist Text Initiative (EBTI) founded in 1993 by Professor Lewis R. Lancaster, established a framework for international collaboration in the digitization of cross-linguistic Buddhist literature. This movement gave rise to several influential digital projects, most notably the Chinese Buddhist Electronic Text Association (CBETA) and the SAT Daizōkyō Text Database. These platforms have not only significantly improved access to the Canon but have also augmented its research utility through features such as full-text search, cross-edition collation, and standardized citation systems.

CBETA, initiated in Taiwan, China, in 1998, has become the most widely used digital database in Chinese Buddhist studies. It builds on the Taishō Shinshū Daizōkyō 大正新修大藏經 (Taishō) but extends its scope to include later commentaries, monastic gazetteers, stone inscriptions, and modern works (Tu 2015). Meanwhile, the SAT database, launched in Japan in 1994, offers a complete digital version of the 85-volume Taishō, along with rich contextual resources such as original images, parallel editions, and links to Japanese and English translations (Muller et al. 2017). Both projects represent significant milestones in the digital reinvention of the Buddhist Canon (Wittern 2017).

Contemporary databases of the Buddhist Canon can be classified into three functional categories, each addressing distinct scholarly objectives:

(1): The first category comprises Image Databases, which provide high-quality digital reproductions of canonical collections through modern photographic and scanning technologies. By granting direct access to primary sources, these databases support essential philological tasks such as producing critical editions and analyzing textual variants. Representative examples include the National Library of China’s databases for the Zhaocheng Jin Canon and Sixi Canon, as well as the Three Editions of Buddhist Canons Database from Zōjōji Temple 增上寺.
(2): Buddhist Canon Full-Text Databases, on the other hand, transcribe the Canon’s content into searchable text. While image databases offer proximity to original sources, full-text databases are favored for daily research due to their efficiency in retrieval and reference. Among these, CBETA stands out as a preeminent example, surpassing the printed Taishō edition in both accessibility and textual accuracy. Moreover, CBETA has established an academic standard with its concise citation format indicating volume, text number, page, column, and line.
(3): The third category is Image–Text Parallel Databases, exemplified by Japan’s SAT Daizōkyō Text Database. These integrate the advantages of both image and full-text repositories by pairing searchable transcriptions with corresponding original images, thereby greatly facilitating textual verification and comparative analysis. A distinctive feature of the SAT database is its interoperability with other major resources, including the Digital Dictionary of Buddhism (DDB), INBUDS, and CiNii, signaling a forward-looking model for the integrated development of Buddhist digital scholarship (Tong 2017).

A significant advancement in the digitization of the Chinese Buddhist Canon in recent years has been the increasing accessibility of image databases and the advent of image–text parallel databases. The digitization and open-access release of high-resolution canonical images have substantially expanded the primary sources available for scholarly collation, providing the raw materials essential for producing critical editions and conducting systematic analyses of textual variants. This proliferation is instrumental in reconstructing more accurate and authoritative versions of Buddhist scriptures, thereby enabling large-scale, systematic collation of the entire Canon. While image–text parallel databases represent a promising direction in digitization, their current functionality remains relatively basic, primarily focused on delivering reliable texts. Future digitization efforts, aided by Artificial Intelligence (AI), could leverage these resources to achieve automated character recognition, punctuation, and computer-assisted collation, establishing a groundwork for comprehensive textual criticism guided by the three-lineage theory.

Platforms such as CBETA and SAT have achieved a great deal, but they are predominantly based on the Taishō, and scholarly critiques have highlighted that edition’s hastily compiled nature and questionable collation notes. Furthermore, certain major digitization initiatives remain constrained by technical or structural limitations and have yet to attain the degree of integration and global scholarly impact achieved by leading international platforms.

This review aims to address these gaps by examining the methodological shift from static full-text databases to dynamic multimodal platforms. It contends that while existing platforms excel in text retrieval, the next phase of digital philology necessitates a paradigm transition from reliance on a single base text, primarily the Taishō, to a multi-base collation framework. By evaluating newly accessible global image databases of canonical editions across different lineages and emerging AI technologies, this paper proposes a theoretical model for a “Hybrid Digital Canon.” Specifically, it investigates the philological imperative of integrating textual traditions from distinct canonical lineages, such as the Northern and Southern lineages, to complement and critically engage with the Taishō system. This approach seeks to overcome current epistemic constraints in textual criticism and foster a more rigorous, diverse digital scholarly environment.

2. Digitization of Texts of the Chinese Buddhist Canon

As early as the 1980s and 1990s, Professor Lewis R. Lancaster of the University of California, Berkeley, recognized the potential of digital humanities technologies in Buddhist studies (Lancaster 2003, 2008). He actively promoted the digitization, standardization, and collaborative development of the Buddhist Canon.

In 1993, he convened representatives from various linguistic traditions engaged in digitizing Buddhist materials and established the Electronic Buddhist Text Initiative (EBTI). This association brought together academic and religious institutions worldwide involved in Buddhist text digitization, forming an independent, open organization dedicated to advancing and coordinating digital transcription efforts across participating institutions (Du 1998).

From 1994 to 2008, EBTI organized a series of international conferences across East Asia that facilitated crucial consensus on technical standards, including character encoding and markup languages.

Under the impetus of EBTI, digital humanities technologies were progressively adopted by Buddhist research institutions worldwide, leading to the systematic digitization of Buddhist scriptures in multiple languages, including Chinese, Sanskrit, Pali, Vietnamese, and Tibetan. Among these efforts, four digitization projects focusing on Chinese Buddhist texts are particularly representative:

(1): The Zen-Knowledgebase Project, developed by the International Research Institute for Zen Buddhism (IRIZ) at Hanazono University in Japan. Directed by Dr. Urs App, then Deputy Director of the Institute, its primary goal was to digitize Zen Buddhist texts and build a comprehensive database. The content encompasses ancient Zen texts, modern research, reference materials, maps, and Zen art.
(2): The Chinese Buddhist Electronic Text Association (CBETA) Project, led by Venerable Huimin 惠敏法師, Christian Wittern, and others. Its primary objective was the digitization of Chinese Buddhist Canons, beginning with the Taishō and later expanding to include other major editions.
(3): The SAT Daizōkyō Text Database, founded by Professor Yasunori Ejima 江島惠教 of Japan. SAT is the abbreviation for the Sanskrit phrase “Saṃgaṇikīkṛtaṃ Taiśotripiṭakaṃ” (Digitized Taishō). Its primary goal was the digitization of the 85 volumes of the Taishō.
(4): The Research Institute of the Tripitaka Koreana (RITK), established in 1993 to ensure the permanent preservation of the 81,340 wooden printing blocks of the Second Carved Edition (1236–1251) of the Goryeo Canon housed at Haeinsa Temple. The institute undertook the comprehensive digitization of both the physical printing blocks and the full-text images of the Goryeo Canon.

Of the four projects mentioned, CBETA, developed by the Chinese Buddhist Electronic Text Association in Taiwan, China, and the SAT database, created by the SAT Daizōkyō Text Database Committee in Japan, have become the most extensively utilized resources in Chinese Buddhist studies. The following section will introduce and compare these two projects.

2.1. The Chinese Buddhist Electronic Text Association (CBETA): Historical Genesis and Methodological Paradigms

The establishment of the CBETA in February 1998 was not merely a technical milestone but the culmination of a complex institutional evolution within the Buddhist community in Taiwan, China. The mid-1990s witnessed a surge of enthusiasm for digitization, with numerous fragmented initiatives driven by distinct entities such as the Computing Center of Academia Sinica, Fo Guang Shan 佛光山, and various university research centers (Fang 1996). However, these early efforts were characterized by a lack of coordination; each participating unit worked separately on electronic transcriptions, leading to duplicated efforts and inconsistent data standards. A notable precursor occurred in 1994, when an initiative convened under the same name—“Chinese Buddhist Electronic Text Association”—attempted to unite secular scholars and religious practitioners. Despite its ambitious vision, this early project stalled, failing to achieve the necessary broad-based consensus to sustain such a monumental task.2

Two structural obstacles fundamentally hindered these pre-1998 efforts. The first was the legal complexity surrounding the Taishō. As the academic standard, the Taishō was indispensable for any serious scholarly database; yet, obtaining copyright permissions from the Japanese publisher, Daizō Shuppan 大藏出版, proved difficult for individual organizations (Hou 2007). The second obstacle was a prevailing internal hesitation within Buddhist communities, rooted in religious ethics. Ven. Huimin 惠敏法師 retrospectively identified a specific apprehension among some Buddhist leaders: the fear that distributing erroneous digital scriptures might create negative karma and incur spiritual blame (Hou and Zhuo 2015). This “karmic anxiety” acted as a significant conservative force, discouraging rapid digitization without rigorous quality assurance mechanisms.

It became evident that overcoming these legal and religious barriers required a centralized, authoritative consortium capable of guaranteeing both legal compliance and textual accuracy. Consequently, the successful formation of CBETA in 1998 represented a strategic consolidation of resources. Under the leadership of Ven. Huimin (Chairman) and Ven. Hengching 恆清法師 (Supervisor), the project secured stable funding from the North America Venerable Yin Shun Foundation 北美印順導師基金會 and the Chung-Hwa Institute of Buddhist Studies 中華佛學研究所. Crucially, the organization enlisted Christian Wittern as a technical consultant, bridging the gap between traditional scholarship and emerging digital standards.

This institutional consolidation allowed CBETA to negotiate successfully for the Taishō rights and, more importantly, to establish a sustainable workflow that prioritized philological rigor over speed (Du 1999a). By alleviating the fear of “disseminating errors” through strict proofreading protocols, CBETA legitimized digital texts within the religious community. The project’s subsequent evolution can be analyzed through three methodological dimensions: foundational standardization, the dynamic critical edition model, and its transformation into a semantic research infrastructure.

2.1.1. The Philology of Encoding: TEI and the “Gaiji” 缺字 Solution

In the burgeoning era of digitization during the late 1990s, CBETA faced a critical decision regarding the underlying architecture of its data. While many contemporary projects opted for visual-centric formats like PDF or HTML for immediate readability, CBETA made the farsighted decision to adopt SGML (later XML) combined with the Text Encoding Initiative (TEI) guidelines. This choice prioritized structural semantics over visual presentation, ensuring that the logical components of the text—such as sutra titles, fascicle divisions, and speakers—were machine-readable and independent of specific display software (Du 1999b). This adherence to international standards proved crucial for the database’s longevity, allowing seamless migration across evolving operating systems over the subsequent decades.

A major epistemological obstacle in digitizing East Asian classical texts was the limitation of character encoding. Standard sets available at the time, such as Big5 or early Unicode, were insufficient to represent the thousands of variant characters (yitizi 異體字) and rare Siddhaṃ scripts essential to Buddhist philology. A simplistic solution adopted by many early projects was to create proprietary fonts or bitmap images for these “missing characters” (Gaiji). However, this approach rendered the characters unsearchable and locked the data into specific font environments. CBETA rejected this visual-only solution in favor of a logical description method.

As detailed in the technical reports, the team implemented a philologically robust solution using Composition Strings (ideographic description sequences). Instead of assigning a temporary code, a rare character would be described by its component parts (e.g., [Component A + Component B]). For example, a variant character not in Unicode would be represented as a computable string describing its radical and phonetic components. This ensured that even if the character could not be displayed, its semantic structure remained searchable and analyzable (Wang 2000).
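The benefit of describing a character by its parts can be illustrated with a minimal sketch. It assumes a simplified bracket notation in which components are joined by `+`, mirroring the schematic example above; the notation, the helper names `parse_composition` and `build_component_index`, and the sample strings are all illustrative assumptions rather than CBETA's actual syntax or data.

```python
import re
from collections import defaultdict

# Hypothetical simplified notation: "[A+B]" describes one rare character
# by its components. This is NOT CBETA's real composition syntax.
COMPOSITION = re.compile(r"\[([^\[\]]+)\]")


def parse_composition(token: str) -> list[str]:
    """Split a bracketed composition string into its component parts."""
    match = COMPOSITION.fullmatch(token)
    if match is None:
        raise ValueError(f"not a composition string: {token!r}")
    return [part.strip() for part in match.group(1).split("+")]


def build_component_index(tokens: list[str]) -> dict[str, set[str]]:
    """Map each component to the composition strings that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        for part in parse_composition(token):
            index[part].add(token)
    return index


# Invented sample data standing in for characters that lack a code point.
rare_characters = ["[金+本]", "[口+奄]", "[金+离]"]
index = build_component_index(rare_characters)

# A search on one component finds every described character containing it,
# even though none of those characters can be typed or displayed directly.
print(sorted(index["金"]))
```

Because the description is plain text, it survives font changes and can be indexed like any other string, which is the property the paragraph above attributes to the logical approach.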

Furthermore, to manage the chaotic landscape of character variants across different canonical editions, CBETA developed a Universal Mapping System (known as M-codes). This system created a mapping table that linked characters across various encoding standards (CCCII, Big5, Unicode) and dictionary references. By treating the character code as a reference pointer rather than a fixed visual glyph, CBETA established a flexible infrastructure that could adapt as the Unicode standard expanded to include Extension B, C, and beyond. This rigorous standardization laid the groundwork for the corpus to serve as a reliable dataset for computational linguistics.

Building on this technical foundation, CBETA’s scope expanded in distinct, strategic phases. The first phase (1998–2003) focused exclusively on digitizing the 85 volumes of the Taishō to create a reliable base text. The second phase (2003–2008) integrated the Shinsan Zokuzōkyō, significantly expanding coverage of Chan literature. The third phase (2008–Present) marked a paradigm shift from a closed “canon” to an open encyclopedia, incorporating diverse materials such as the Jiaxing Canon, Fangshan Stone Sutras, and modern scholarship (Du 2012). Table 1 illustrates the gradual expansion of CBETA’s coverage scope from 1998 to 2022. This expansion reflects a shift in the criteria for canonicity, moving from a focus on Indian translations to a comprehensive library of East Asian Buddhist history.3

Table 1. Buddhist Texts Incorporated by CBETA Over the Years.

2.1.2. Digitizing the Apparatus Criticus: The XML-Based Collation Structure

While the printed Taishō serves as the base text, the CBETA digital version effectively supersedes its print progenitor in both textual quality and critical utility. The original Taishō, compiled in the early 20th century, was a monumental achievement but contained numerous punctuation errors and misprints inherited from the Shukusatsu Canon 縮刷藏. CBETA’s editorial team did not merely transcribe the text; they engaged in a massive proofreading and re-punctuation project. Through a rigorous “human-in-the-loop” process involving hundreds of volunteers and expert editors, thousands of errors were corrected, producing a text that is philologically superior to the original print edition (Chen 2013).

A key methodological innovation in this process was the establishment of the “Three No’s and Three Yes’s” 三不三要 editorial principle. As outlined by the editorial team, this principle balanced fidelity with correction. The “Three No’s” stipulated that: (1) Common variants should not be normalized if the original meaning is clear; (2) The text should not be changed if the meaning is understandable; and (3) Text should not be emended without sufficient evidence. Conversely, the “Three Yes’s” mandated correction when: (1) The text clearly contradicts the base text; (2) There is an obvious semantic error (often using other editions like the Zokuzōkyō as reference); or (3) A better reading exists in a parallel version supported by strong evidence (Bhikkhu et al. 2005).

Beyond textual emendation, a fundamental contribution lies in how CBETA handled the complex collation notes. In the printed Taishō, variant readings from other lineages (such as the Song, Yuan, Ming, or Old Japanese manuscript canons) are presented as static footnotes, which can be difficult to cross-reference systematically. CBETA transformed this static “apparatus criticus” into structured machine-readable data using the hierarchical architecture of XML (Extensible Markup Language). By adopting the TEI guidelines, editors encoded the relationships between the base text and over twenty textual witnesses using specific tags: <app> (apparatus entry), <lem> (lemma/base text), and <rdg> (reading/variant) (Bhikkhu et al. 2005).4

This digital encoding does not merely reproduce the visual layout of the footnotes but logically associates variant characters with their specific coordinates in the base text. For instance, where the Taishō text (<lem>) reads “heart,” the XML data simultaneously records that the Song edition (<rdg wit="Song">) reads “mind.” This structure allows the digital platform to display collation notes interactively and enables researchers to perform systematic searches for specific textual variants across the corpus.
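As a rough illustration of how such an apparatus becomes queryable, the following sketch parses a small TEI-style fragment with Python's standard library. The fragment, its wording, and the witness sigla are invented for the example, and the TEI namespace that real files declare is omitted for brevity; it is not taken from CBETA's data.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the sentence and the witness sigla are invented,
# and real TEI files declare a namespace that is left out here.
FRAGMENT = """
<p>
  All phenomena arise from the
  <app>
    <lem wit="#Taisho">heart</lem>
    <rdg wit="#Song">mind</rdg>
    <rdg wit="#Yuan #Ming">mind</rdg>
  </app>
  alone.
</p>
"""


def collect_variants(xml_text):
    """Return one record per <app>: the lemma plus each witness's reading."""
    root = ET.fromstring(xml_text)
    records = []
    for app in root.iter("app"):
        lem = app.find("lem")
        lemma = (lem.text or "").strip() if lem is not None else ""
        readings = {}
        for rdg in app.findall("rdg"):
            # One <rdg> may list several witnesses separated by spaces.
            for witness in rdg.get("wit", "").split():
                readings[witness.lstrip("#")] = (rdg.text or "").strip()
        records.append({"lemma": lemma, "readings": readings})
    return records


variants = collect_variants(FRAGMENT)
for record in variants:
    print(record["lemma"], record["readings"])
```

Once variants are held as records like these, a question such as "where does the Song witness differ from the base text" reduces to a filter over the list instead of a manual pass through footnotes.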

However, this digitization process introduces new interpretative challenges. The introduction of modern Western-style punctuation (such as quotation marks, question marks, and exclamation points) has significantly lowered the barrier for reading classical Chinese Buddhist texts. Yet, case studies—such as those examining the Diamond Sūtra—demonstrate that this “interpretive punctuation” imposes a specific exegetical layer that may occasionally diverge from the syntactic structure of the original Sanskrit or alternative Chinese interpretations (Chen 2013). Thus, while the CBETA edition offers unprecedented accessibility, it necessitates a critical awareness of the interpretative choices embedded in both its punctuation and its handling of collation data.

2.1.3. Infrastructure for the Future: Integration and Semantic Discovery

Beyond textual accuracy, CBETA has fundamentally altered the citation standard of Buddhist studies. In the pre-digital era, citing a specific line in a Buddhist text was often imprecise due to varying pagination across editions. Recognizing this “citation crisis,” CBETA established a precise, line-based citation standard known as “Line Beginning Information” 行首資訊. This unique identifier string, comprising the canon edition, volume, text number, page, column, and line (e.g., T08n0235_p0748c26), functions much like a persistent identifier, comparable to a digital object identifier (DOI), for every line of the canon.5 This format has been widely adopted as the academic standard in international Chinese Buddhist studies, allowing scholars to cite digital evidence with the same granularity and reliability as printed books (CBETA 2007).
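To make the structure of such an identifier concrete, here is a minimal parsing sketch. The pattern is inferred solely from the example T08n0235_p0748c26, read as canon siglum, volume, text number, page, column, and line; the permitted field widths, the optional letter suffix on the text number, and the names `LineRef` and `parse_line_head` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the single example T08n0235_p0748c26; field widths
# and the optional letter suffix on the text number are assumptions.
LINE_HEAD = re.compile(
    r"(?P<canon>[A-Z]+)(?P<volume>\d+)n(?P<text>\d+[A-Za-z]?)"
    r"_p(?P<page>\d+)(?P<column>[a-z])(?P<line>\d+)"
)


@dataclass(frozen=True)
class LineRef:
    canon: str    # edition siglum, e.g. "T" for the Taishō
    volume: int
    text: str     # kept as a string to preserve leading zeros
    page: int
    column: str
    line: int


def parse_line_head(identifier: str) -> LineRef:
    """Split a line-head identifier into its named components."""
    match = LINE_HEAD.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognised line-head identifier: {identifier!r}")
    return LineRef(
        canon=match["canon"],
        volume=int(match["volume"]),
        text=match["text"],
        page=int(match["page"]),
        column=match["column"],
        line=int(match["line"]),
    )


ref = parse_line_head("T08n0235_p0748c26")
print(ref)
```

Keeping the text number as a string is a deliberate choice in this sketch, since zero-padded numbers are usually what appear in printed references.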

In its maturation phase, the project shifted focus from text transcription to building a comprehensive research platform. A critical component of this platform is the Digital Catalogue project. By creating a structured metadata framework, the catalogue serves as a bibliographic hub that links the Chinese canon with its Sanskrit, Pali, and Tibetan counterparts. This inter-linking allows researchers to navigate across linguistic traditions, facilitating comparative studies that were previously labor-intensive (Du et al. 2008).

This evolution culminated in the transition from a standalone CD-ROM application to the cloud-based CBETA Research Platform in 2016. As Hong (2016, 2018) argues, this shift transforms CBETA from a “data provider” into a “research infrastructure.” The new platform addresses the limitations of local storage for big data analysis and enables advanced functions like Quantitative Analysis. Researchers can now generate real-time character frequency statistics and collocation patterns, which are indispensable tools for stylometric analysis, authorship attribution, and dating of anonymous scriptures.
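The kind of statistics mentioned here can be sketched in a few lines. The example below counts character frequencies and adjacent-character pairs (a crude stand-in for collocation patterns) in a short, well-known passage from the Heart Sūtra; it illustrates the general idea only and makes no claim about how the CBETA Research Platform computes its figures. The function names are hypothetical.

```python
import re
from collections import Counter

# Short sample passage from the Heart Sutra, used purely as test input.
SAMPLE = "色不異空，空不異色；色即是空，空即是色。"


def segments(text):
    """Split on punctuation and whitespace so pairs never span a break."""
    return [seg for seg in re.split(r"\W+", text) if seg]


def char_frequencies(text):
    """Count every character, ignoring punctuation."""
    return Counter(ch for seg in segments(text) for ch in seg)


def bigram_counts(text):
    """Count adjacent character pairs within each punctuation-free segment."""
    counts = Counter()
    for seg in segments(text):
        counts.update(seg[i:i + 2] for i in range(len(seg) - 1))
    return counts


frequencies = char_frequencies(SAMPLE)
bigrams = bigram_counts(SAMPLE)
print(frequencies.most_common(2))
print(bigrams.most_common(2))
```

Profiles of this sort, computed over whole texts rather than one sentence, are the raw material for the stylometric comparisons described above.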

Most recently, the integration of Artificial Intelligence (AI) and Natural Language Processing (NLP) has opened new frontiers. Collaborating with computational linguists, CBETA has applied unsupervised word segmentation algorithms tailored for classical Chinese Buddhist texts. This technology overcomes the lack of word boundaries in the original texts, enabling more accurate keyword searches and linguistic analysis. Furthermore, quotation detection algorithms have been deployed to automatically map inter-textual relationships, identifying where later commentaries quote earlier sutras even when the citations are inexact. These semantic tools allow researchers to trace the evolution of doctrinal concepts across centuries of translation and interpretation, marking the arrival of “Semantic Knowledge Discovery” in Buddhist studies (Bhikkhu et al. 2018).
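A simple way to picture quotation detection is to look for runs of characters that a later text shares with an earlier one. The sketch below uses `difflib.SequenceMatcher` from the Python standard library for this; it is a swapped-in, generic technique chosen for brevity, not the algorithm used by CBETA or its collaborators. The commentary sentence is invented, and the `min_len` threshold of four characters is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def shared_passages(source, target, min_len=4):
    """Return runs of at least min_len characters common to both texts.

    Several shorter runs in sequence can indicate a loose or abridged
    quotation, because the matcher skips over wording that differs.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        source[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_len
    ]


# Sutra passage with punctuation removed, and an invented commentary line
# that quotes it inexactly.
sutra = "色不異空空不異色色即是空空即是色"
commentary = "經云色不異空空不異色者謂色即是空也"

passages = shared_passages(sutra, commentary)
print(passages)
```

Production systems need indexing strategies that scale to an entire canon, which pairwise comparison like this does not, but the output shows the basic signal such tools look for.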

2.2. The SAT Daizōkyō Text Database: From Digital Text to Research Ecosystem

While CBETA focused on producing a definitive critical edition through rigorous XML coding to serve as a standardized base text, the SAT Daizōkyō Text Database (SAT) followed a divergent trajectory. Prioritizing the construction of a web-based “knowledge ecosystem,” SAT evolved from a digitization project into a central hub for Digital Humanities (DH) in East Asian Buddhism. Initiated in 1994 by Professor Yasunori Ejima and subsequently led by Professor Masahiro Shimoda 下田正弘 at the University of Tokyo, the project has fundamentally influenced the methodology of digital Buddhist studies in Japan (Nagasaki 2015b; Muller et al. 2017). Its development can be analyzed through three distinct phases: the shift to a web-centric philosophy, the construction of an interoperable API ecosystem, and the recent integration of AI-driven image–text alignment.

2.2.1. Evolution of Editorial Philosophy and Web Strategy

Unlike CBETA, which initially relied on CD/DVD distribution to ensure accessibility in regions with unstable internet connections, SAT adopted a web-centric strategy relatively early. By 2008, SAT had fully transitioned to a server-side architecture, driven by the realization that the future of humanities resources lay in dynamic online services rather than static local storage. This strategic shift allowed SAT to implement a “consumable and renewable” database model. In this model, the massive corpus of over 100 million characters could be continuously updated—correcting typos, refining punctuation, and adding new metadata—without requiring users to purchase new media or reinstall software. This approach transformed the digital canon from a “product” into a “service” (Nagasaki 2015b).

A significant philological and technical contribution during this phase was SAT’s systematic resolution of the gaiji (missing character) problem. In the early stages of digitization, thousands of rare Buddhist characters (such as variant forms found in Esoteric texts or specific Siddhaṃ scripts) were not present in standard computer encodings. While many projects resorted to using internal image-based mappings or proprietary fonts, SAT took a more sustainable route by actively collaborating with the Ideographic Rapporteur Group (IRG). Through this effort, SAT successfully submitted and standardized over 3000 rare Buddhist characters into the international Unicode standard (Extension blocks). This effort not only solved the display issue for Buddhist texts but also contributed significantly to the global digital infrastructure for East Asian languages, ensuring that Buddhist philological data remains searchable and machine-readable in the long term (Muller et al. 2017).

2.2.2. Interoperability: The API Ecosystem

A defining feature of SAT, distinguishing it from many other textual databases, is its radical emphasis on interoperability. Rather than building a closed “silo” of data, SAT utilized Web APIs (Application Programming Interfaces) to create a dynamic, interconnected network of resources. This architecture transforms the text from a static object of reading into a node within a larger knowledge graph.

As detailed by Nagasaki (2015b), SAT integrated several external databases directly into its reading interface through “mashup” technologies:

Lexicographical Integration: By linking with the Digital Dictionary of Buddhism (DDB), users can highlight terms in the Taishō text and immediately view definitions in a pop-up window. This integration respects the independence of both projects while creating a seamless user experience.

Bibliographic Discovery: SAT is interoperable with the INBUDS article database. When a user reads a specific scripture, the system can automatically query INBUDS to display a list of modern academic papers related to that text or specific terminology, effectively bridging the gap between primary sources and secondary scholarship.

Cross-Linguistic Mapping: Through collaboration with the University of Hamburg, SAT incorporated the Indo-Tibetan Lexical Resource (ITLR). This allows scholars to cross-reference Chinese Buddhist terms with their Sanskrit and Tibetan equivalents, facilitating comparative philological research across linguistic traditions.

Parallel Corpora: SAT also integrates the BDK (Bukkyō Dendō Kyōkai) English Tripiṭaka series. Users can view the Chinese original and the English translation side-by-side, paragraph by paragraph, which is invaluable for translation studies and international education.

Through these integrations, SAT functions not merely as a text repository but as a “platform of platforms,” enabling a fluid research workflow that moves effortlessly from textual reading to lexicographical analysis, bibliographic review, and comparative translation.
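The “mashup” pattern described above can be sketched in a few lines. In this minimal illustration, each external resource is represented by a function that takes a selected term and returns a list of results, and a reading interface merges whatever the providers return. The provider functions below are hypothetical in-memory stand-ins, not the actual DDB, INBUDS, ITLR, or BDK interfaces, whose endpoints and response formats are not described here.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], List[str]]


def lookup_dictionary(term: str) -> List[str]:
    """Hypothetical stand-in for a lexicographical service."""
    glossary = {"大集經": ["Mahāsaṃnipāta Sūtra"]}
    return glossary.get(term, [])


def lookup_bibliography(term: str) -> List[str]:
    """Hypothetical stand-in for a bibliographic service that is offline."""
    raise TimeoutError("bibliography service unavailable")


def build_panel(term: str, providers: Dict[str, Provider]) -> Dict[str, List[str]]:
    """Query every provider independently and collect the results per source."""
    panel = {}
    for name, provider in providers.items():
        try:
            panel[name] = provider(term)
        except Exception:
            # One failing service must not break the reading interface.
            panel[name] = []
    return panel


panel = build_panel(
    "大集經",
    {"dictionary": lookup_dictionary, "bibliography": lookup_bibliography},
)
```

Because each provider is queried independently, the projects remain autonomous: a resource can be added, replaced, or temporarily unavailable without changes to the text database itself.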

2.2.3. The AI Turn: SATed and High-Precision Text–Image Alignment

The most methodologically significant advancement in recent years is the development of SATed (SAT Digital Scholarly Editing), a system that integrates the International Image Interoperability Framework (IIIF) with Artificial Intelligence technologies to solve the “text–image disconnection” problem.

Traditionally, verifying a digital transcription against the original woodblock print was a labor-intensive process. Scholars had to manually flip through high-resolution images or physical books to find the corresponding line of text, a task that becomes impractical for large corpora. To address this, Nagasaki et al. (2023, 2024) introduced a novel workflow combining high-precision OCR with algorithmic alignment:

AI-OCR Generation: The team utilized the NDL Classical OCR (developed by the National Diet Library of Japan), which is trained on vast amounts of pre-modern Japanese and Chinese materials. This AI model generates raw text data from woodblock images of various canonical editions, such as the Zhaocheng Jin Canon, the Sixi Canon, and the Goryeo Canon.

Algorithmic Alignment: Since OCR output inevitably contains errors, it cannot be used directly as a critical edition. Instead, SAT developers employed Python’s difflib library to compare this “noisy” OCR text with the high-quality, proofread text of the Taishō already in the SAT database. By calculating the edit distance and matching sequences, the system automatically establishes coordinate links between the characters in the digital SAT text and their precise locations on the images of other editions.

Visual Philology: This breakthrough allows for a “human-in-the-loop” verification system. Users can click on any line of the digital Taishō text and immediately view the corresponding zone (using IIIF coordinate cropping) in the original high-resolution images of the Fuzhou or Sixi Canons.
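The alignment step can be illustrated with a minimal sketch built on `difflib.SequenceMatcher` from the Python standard library. It is not the SAT implementation: the function name `align_to_image`, the `(x, y, width, height)` box format, and the sample data are assumptions made for the example. The idea is that every character shared by the proofread text and the OCR output inherits the image coordinates of its OCR counterpart, while misrecognized characters are left unlinked.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in image pixels


def align_to_image(reference: str, ocr_chars: List[Tuple[str, Box]]) -> Dict[int, Box]:
    """Map positions in the proofread text to OCR bounding boxes."""
    ocr_text = "".join(ch for ch, _ in ocr_chars)
    # autojunk=False keeps frequent characters from being ignored in long texts.
    matcher = SequenceMatcher(None, reference, ocr_text, autojunk=False)
    links: Dict[int, Box] = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            links[block.a + offset] = ocr_chars[block.b + offset][1]
    return links


# Invented sample: the OCR misreads the fourth character (聞 as 間).
reference = "如是我聞一時佛在"
ocr_chars = [(ch, (100, 40 * i, 36, 36)) for i, ch in enumerate("如是我間一時佛在")]
links = align_to_image(reference, ocr_chars)
```

In this sample, seven of the eight characters receive coordinates and position 3 stays unlinked, so its location would have to be interpolated from its neighbors or confirmed by a human reader. Note that `SequenceMatcher` finds matching blocks rather than a guaranteed minimal edit script, which is usually adequate when the two texts are close.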

This technology has profound implications. It enables scholars to perform large-scale comparative analysis of layout, variant characters, and annotations across different canonical lineages without leaving the digital environment. For instance, determining whether a specific variant reading in the Taishō notes (var. A) is actually supported by the physical evidence in the Sixi Canon now takes seconds rather than minutes. Furthermore, this system allows for the correction of OCR data through crowdsourcing, creating a virtuous cycle where human verification improves the underlying AI models (Nagasaki 2024).
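The IIIF coordinate cropping mentioned above relies on the URL pattern of the IIIF Image API, in which the region of interest is written as `x,y,w,h` inside the image URL. The sketch below merges character boxes into one zone and builds such a URL. The server address and image identifier are placeholders, the helper names are assumptions, and `max` is the full-size keyword of Image API 3.0 (version 2 servers use `full`).

```python
from urllib.parse import quote


def line_box(boxes):
    """Merge character boxes (x, y, w, h) into one box covering the whole line."""
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return (left, top, right - left, bottom - top)


def iiif_region_url(base, identifier, box, size="max"):
    """Build an Image API URL: {base}/{id}/{region}/{size}/{rotation}/{quality}.{format}."""
    x, y, w, h = box
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive width and height")
    return (
        f"{base.rstrip('/')}/{quote(identifier, safe='')}"
        f"/{x},{y},{w},{h}/{size}/0/default.jpg"
    )


zone = line_box([(10, 0, 30, 40), (12, 40, 30, 40)])
url = iiif_region_url("https://example.org/iiif", "sixi/vol1 p3", zone)
```

Because the crop is expressed in the URL, the client never needs to download or store the full page image in order to show a single line.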

To clarify the distinct positions of these two major platforms in the landscape of digital philology, we summarize their technical and functional differences in Table 2.

Table 2. Comparative Analysis of Technical and Functional Features: CBETA vs. SAT.

3. Prospects for the Digitization of the Chinese Buddhist Canon

The digitization of the Chinese Buddhist Canon has achieved significant milestones over the past three decades, fundamentally transforming the landscape of Buddhist studies. Platforms such as CBETA and SAT have revolutionized textual access, enabling scholars to search the vast corpus of scripture with unprecedented efficiency. By establishing rigorous encoding standards (TEI/XML) and creating interconnected research ecosystems, these projects have democratized access to primary sources, allowing researchers globally to engage with the canonical tradition regardless of their physical location. The transition from physical paper to searchable digital text represents a major leap in the history of Buddhist philology, effectively addressing immediate concerns regarding preservation and accessibility.

However, as the field matures, the very success of these first-generation platforms has brought their epistemological boundaries into sharper focus. While they excel at text retrieval, their design reflects the academic preferences of the early 20th century rather than the pluralistic needs of modern philology. The current digital landscape is characterized by a “single-base text” structure, which typically presents one homogenized text as the standard, relegating variant readings to the periphery of footnotes or apparatus. This model, while efficient for general reading, may inadvertently obscure the rich, complex, and often divergent history of the canon’s transmission.

The core challenge facing the next generation of digitization is the accurate representation of the textual lineage diversity of the Chinese Buddhist tradition. The canon is not a monolithic entity but a fluid network of evolving texts that bifurcated and merged across different geographical regions. A scripture circulating in the Northern dynasties often differed significantly—in structure, phrasing, and chapter ordering—from its counterpart in the South. Current digital platforms, by privileging a single “standard” edition, face challenges in capturing this multi-dimensional reality, often compressing distinct historical lineages into a linear narrative.

Therefore, the future of digital philology lies not merely in refining existing databases, but in a fundamental paradigm shift. There is a growing need to move towards a dynamic, multi-version collation environment that treats different canonical lineages as equal, independent nodes within a network. This shift requires leveraging the newly available wealth of global image resources to reconstruct the complex relationships between texts, moving from a model that digitizes a single canon to one that digitizes the entire tradition.

3.1. The “Taishō Bottleneck”: Lineage Bias and Collation Limitations

A significant constraint of current digital platforms is their predominant reliance on the Taishō as the master version. For nearly a century, the Taishō has served as the standard critical edition, and both CBETA and SAT consequently adopted it as their base text. However, philological scholarship in recent decades has increasingly qualified this reliance, suggesting that the Taishō carries limitations inherent to its compilation era.

First, regarding their base texts, the core content of both CBETA and SAT derives from the Japanese-compiled Taishō. Traditionally, this edition has been praised for its refined base texts, rigorous editorial principles, and meticulous collation (Fang 1997). However, recent re-evaluations of the Taishō’s editorial work by scholars from Mainland China, Taiwan, China, and Japan have revealed significant shortcomings. Zhou Bokan 周伯戡 identified punctuation errors in the Taishō that mirror those in the Pinjia Canon 頻伽藏 (a reprint of the Shukusatsu Canon), suggesting the Taishō may have relied on the latter (Zhou 2002). Nagasaki Kiyonori 永崎研宣 pointed out that the Taishō’s base text was not, as officially claimed, the Second Carved Edition of the Goryeo Canon, but rather the Shukusatsu Canon (Nagasaki 2015a). Because the Taishō was compiled in a relatively short period of only ten years, its editors did not conduct comprehensive collation using the Sixi Canon, Puning Canon, or Jiaxing Canon, but instead transcribed the collation notes from the Shukusatsu Canon, which raises doubts about the reliability of those notes. Furthermore, a recent review of the Taishō’s collation quality by a monastic community in China found an error rate of 13.6% in one volume alone when checked against two other editions, including instances of both erroneous and omitted corrections (Fang 2015, p. 20).

Second, the data structure based on the Taishō imposes a hierarchical relationship on textual variants. In CBETA’s XML structure, the Taishō text functions as the immutable root, while readings from other canons are coded as subordinate variants. This makes it computationally difficult to reconstruct a text from a different lineage as a primary object of study. For example, if a scripture in the Zhaocheng Jin Canon has a completely different chapter order or contains paragraphs missing from the Taishō, the current system struggles to display it as a coherent, independent whole.

Third, the physical layout indexing used by systems like SAT is bound to the page-grid of the printed Taishō. This grid cannot easily accommodate texts or sequencing systems from other lineages that do not correspond to the Taishō layout. As a result, non-Taishō traditions are structurally marginalized, and are treated as supplements rather than independent textual systems. This “Taishō Bottleneck” restricts the ability to explore the full diversity of the Buddhist textual heritage.
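The structural constraint described in the second point can be made concrete with a toy data model. This is a deliberately simplified sketch, not CBETA’s actual XML schema: the class names, the siglum `W`, and the segment labels are all hypothetical. In the first model, a witness exists only as substitutions attached to slots of the base text; in the second, a witness stores its own sequence.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BaseTextEdition:
    """Single-base model: variants hang off positions in the base text."""
    base: List[str]
    variants: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def reading(self, siglum: str) -> List[str]:
        # A witness is rebuilt slot by slot, so it inherits the order
        # and the extent of the base text.
        return [
            self.variants.get(i, {}).get(siglum, segment)
            for i, segment in enumerate(self.base)
        ]


@dataclass
class Witness:
    """Multi-lineage model: each witness keeps its own order and extent."""
    siglum: str
    segments: List[str]


edition = BaseTextEdition(base=["A", "B", "C"], variants={1: {"W": "B'"}})
rebuilt = edition.reading("W")

# The same witness with a different order and an extra segment "D"
# can be stored directly, but not derived from the slot model above.
independent = Witness("W", ["C", "A", "B'", "D"])
```

In the first model, any reconstruction has exactly as many segments as the base text and in the same order, so a witness with reordered chapters or additional paragraphs cannot be represented without auxiliary conventions.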

3.2. The Material Turn and the Dialectics of Systematic Collation

To overcome the limitations of the Taishō, the field must embrace the “Material Turn”—a return to primary sources enabled by the proliferation of high-resolution image databases. We are currently witnessing an explosion of digital access to canonical woodblocks that were previously sequestered in rare book libraries. These resources cover the three major genealogical systems of the Chinese canon: the Central Lineage 中原系, the Southern Lineage 南方系, and the Northern Lineage 北方系.

For the Central Lineage, initiated with the Kaibao Canon carved in Sichuan, we now have access to critical witnesses. The Zhaocheng Jin Canon, historically significant for preserving the text of the lost Kaibao Canon, has been digitized by the National Library of China. Simultaneously, the Second Carved Edition of the Goryeo Canon, often considered a standard for this lineage, is fully accessible via the “Archives of Buddhist Culture” in Korea. For the Southern Lineage, which circulated widely in the Jiangnan region, the National Library of China has released images of the Sixi Canon, while the Imperial Household Agency in Japan provides access to the Fuzhou Canon. Regarding the Northern Lineage, which maintains close ties to the manuscript traditions of Chang’an, crucial resources include the scattered scrolls of the Qidan Canon discovered in the Yingxian Wooden Pagoda and the Liao and Jin dynasty stone scriptures among the Fangshan Stone Sutras (Shanxi Sheng Wenwuju 1991; Zhongguo Fojiao Xiehui 2000). Notable examples of publicly released Canon image collections include:

(1): National Library of China’s Zhaocheng Jin Canon, China: In the winter of 1931, Zhu Qinglan 朱慶瀾, Ye Gongchuo 葉恭綽, and others initiated the “Photolithographic Song Canon Association” to photolithographically reproduce the Song-Yuan edition Qisha Canon held at the Kaiyuan Temple 開元寺 and Wolong Temple 臥龍寺 in Shaanxi Province for wider circulation. The monk Fan Cheng 范成 was tasked with investigating ancient Buddhist scriptures held in various temples that could supplement missing fascicles of the Qisha Canon. In the summer of 1932, following local leads, Fan Cheng discovered the Zhaocheng Jin Canon at the Guangsheng Temple 廣勝寺 in Zhaocheng 趙城, Shanxi. The majority of the texts in the Zhonghua Dazangjing, published by Zhonghua Book Company 中華書局, were photolithographically reproduced with the National Library’s Zhaocheng Jin Canon as the base text (Zhonghua Dazangjing Bianjiju 1983). In recent years, as part of the National Library’s ancient text digitization project, the Zhaocheng Jin Canon has been digitized and included in the “Chinese Ancient Books Resource Database.” (Appendix A).
(2): National Library of China’s Sixi Canon, China: Originally acquired by Yang Shoujing from Japan in the late Qing dynasty, the Tian’an Temple edition of the Sixi Canon consisted of over 4600 fascicles but lacked the 600-fascicle Mahāprajñāpāramitā Sūtra.6 After passing through the Songpo Library 松坡圖書館 in Beijing, the collection was integrated into the National Library of China (NLC) in 1950. The NLC later completed the set by repurchasing the missing sutra, and the canon has since been digitized and included in the “Chinese Ancient Books Resource Database.” (Appendix A).
(3): Fuzhou Canon Hybrid Edition held by the Imperial Household Agency 宮內廳, Japan: Produced in the Fuzhou region during the Northern and Southern Song dynasties, the Fuzhou Canon includes the Dongchan Temple edition 東禪寺版 (Chongning Canon 崇寧藏) and Kaiyuan Temple edition 開元寺版 (Pilu Canon 毗盧藏), with surviving sets predominantly hybrid in composition. While no complete versions exist in China today, Japan houses several hybrid sets, including one preserved by the Imperial Household Agency’s Archives and Mausolea Department (Shoryōbu 書陵部). This edition, comprising 1454 texts across 5733 fascicles, was utilized as a collational reference for the Taishō, indicated by the abbreviation “宫” in its collation notes. Full-image access is provided through the “Imperial Household Agency Shoryōbu Collection of Chinese Classics Database.” (Appendix A). More recently, Cultural Relics Press issued a facsimile of the Fuzhou Canon held in the Shoryōbu (Wang 2025).
(4): First and Second Carved Editions of the Goryeo Canon held by Haeinsa Temple 海印寺, Korea: Initially digitized by the Korean Goryeo Canon Research Institute, both the First and Second Carved Editions were temporarily available online before being relocated to Dongguk University’s “Archives of Buddhist Culture” database, where they are currently accessible. (Appendix A).
(5): Yuan Official Canon 元官藏 held by Yunnan Provincial Library, China: The Yuan Official Canon was identified relatively late. As early as the beginning of the last century, Ono Gemmyō 小野玄妙 noted the existence of a Yuan Dynasty Canon distinct from both the Qisha Canon and the Puning Canon, but its name remained uncertain. In 1984, Tong Wei 童瑋, Fang Guangchang 方廣錩, and Jin Zhiliang 金志良, in their article “The Discovery of the Officially Carved Yuan Dynasty Canon,” named the approximately 32-volume Canon found at the Yunnan Provincial Library the Yuan Official Canon (Tong et al. 1984). Subsequently, Li Jining 李際寧, Chi Limei 池麗梅, and others published articles researching the Yuan Official Canon (Li 2010; Chi 2023). The Yuan Official Canon held by the Yunnan Provincial Library is currently accessible via the library’s “Yunnan Ancient Books Digital Library” database. (Appendix A).
(6): Jingshan Canon 徑山藏 held by the University of Tokyo, Japan: This Ming dynasty canon, also known as the Jiaxing Canon 嘉興藏, is notable for including Chan Buddhist discourse records absent in earlier canons. As extant versions vary significantly in content across collections, determining its definitive catalog remains a scholarly priority. The University of Tokyo’s digitized collection provides valuable primary sources for this research. (Appendix A).
(7): Yongle Southern Canon 永樂南藏 and Yongle Northern Canon 永樂北藏 held by Shandong Provincial Library, China: The Yongle Southern Canon, initially carved under Emperor Chengzu, was expanded through three subsequent supplements, totaling 678 cases containing 1618 texts across 6325 fascicles. Shandong Provincial Library’s collection comprises 5200 fascicles in 5104 volumes. The Yongle Northern Canon, representing the third imperial Ming edition, was produced across the Yongle and Wanli reigns, containing 636 main cases and 41 supplementary cases. The library’s Wanli 20th-year (1592) imprint preserves 5386 fascicles. Both canons are accessible through the specialized “Ming Dynasty Buddhist Canon Database.” (Appendix A).
(8): Goryeo, Sixi, and Puning Canon 普寧藏 at Zōjōji Temple, Japan: These three canons served as collation references for the Taishō, with the Second Carved Edition of the Goryeo Canon as base text and the Sixi and Puning Canons as reference editions marked “Song” and “Yuan”, respectively. In November 2023, Zōjōji Temple launched digital access to its collections, comprising 5342 Sixi Canon fascicles, 5228 Puning Canon fascicles, and 1357 bound volumes of the Goryeo Canon. (Appendix A).

The availability of these images allows for a symbiotic relationship between theoretical frameworks and systematic collation. On one hand, the “Three Lineage Theory” provides a necessary roadmap for utilizing these vast resources. By categorizing canons into Central, Southern, and Northern systems, scholars can strategically select representative editions for collation. This theoretical guidance ensures that digitization efforts are philologically representative rather than random.

On the other hand, systematic collation based on these new image resources will reciprocally deepen and refine our understanding of the canon’s history. While the “Three Lineage Theory” offers a concise macro-framework, the reality of textual transmission is far more complex. For instance, within the Central Lineage, there are subtle but significant differences between the Kaibao, Goryeo, and Zhaocheng Jin Canons that have yet to be fully mapped. Furthermore, the interactions between systems are often fluid. Through large-scale, computer-assisted collation of these newly available images, scholars can uncover these micro-histories, moving beyond simple classification to reveal the dynamic, cross-fertilizing nature of the canonical tradition.

3.3. Case Study: Structural Divergence in the Mahāsaṃnipāta Sūtra

The necessity of a multi-lineage approach is vividly illustrated by the Mahāsaṃnipāta Sūtra (Daji jing 大集經, T13n0397), which shows why a single-base textual system is insufficient. Textual study demonstrates that the version familiar to users of CBETA and the Taishō is a 60-fascicle “composite edition” (heben 合本). This structure, originating from the Central Lineage, was compiled by the Sui dynasty monk Sengjiu 僧就, who merged the original translation with other independent sutras to create a comprehensive collection.

The process of Sengjiu’s compilation of the 60-fascicle Mahāsaṃnipāta Sūtra is recorded in detail in A Record of the Three Treasures Throughout Successive Dynasties (歷代三寶紀 T49n2034):

“The Newly Combined Mahāsaṃnipāta Sūtra: the work mentioned above, in 60 fascicles, was newly combined in the 6th year of the Kaihuang era (586) by the śramaṇa Sengjiu of the Zhaoti Monastery. Sengjiu entered the monastic life in his youth and devoted himself to the study of the scriptural treasury. According to the Sanskrit manuscript, this Mahāsaṃnipāta Sūtra contains a total of 100,000 verses. If translated in full, it would comprise about 300 fascicles. … However, as we are growing distant from the Sage [the Buddha], ordinary understanding is gradually dimming; people cannot retain the whole and thus copy and translate according to the portions they receive. Consequently, the Sanskrit manuscripts that arrived were incomplete in their divisions and bundles: abbreviated versions were translated briefly, expanded versions were translated extensively. For this reason, the translations produced by earlier masters—Zhi [Qian] and [Dharm]arakṣa, as well as those by Kumārajīva—resulted in versions of sometimes 27, sometimes 30, sometimes 31 fascicles; the scrolls were not fixed. Sengjiu, while propagating the scripture, often lamented this. When he later saw that [Narendra]yaśas, during the Northern Qi period, had translated the Candragarbha Sūtra in 12 fascicles, and again in the present Kaihuang era the same Yaśas translated the Sūryagarbha Sūtra in 15 fascicles—and since both were expanded old sections of the Mahāsaṃnipāta—he was inwardly delighted and promptly combined them to form 60 fascicles. Although Sengjiu incorporated them, the work was not yet refined. Later, the śramaṇa Hongqing of the Daxingshan Monastery, a man of profound insight and clarity, was commissioned by the Empress to collate and copy the scriptures of two canons. He then corrected the titles and headings in Sengjiu’s combined version, putting it into excellent order.”

《新合大集經》，右一部六十卷。招提寺沙門釋僧就開皇六年新合。就少出家，專寶坊學。依如梵本，此《大集經》凡十萬偈。若具足翻，可三百卷。……然去聖將遠，凡識漸惛，不能總持，隨分撮寫。致來梵本，部夾弗全，略至略翻，廣來廣譯。緣是前哲支、曇所翻及羅什出，或二十七，或復三十，或三十一，卷軸匪定。就既宣揚，每恒嗟歎。及覩耶舍高齊之世出《月藏經》一十二卷。至今開皇復屬耶舍譯《日藏經》一十五卷。既並《大集》廣本舊品，內誠欣躍，即依合之成六十軸。就雖附入，未善精。比有大興善寺沙門洪慶者，識度淵明，奉為皇后檢校抄寫眾經兩藏，遂更正就所合名題，甚為整頓。7

Sengjiu combined the 30-fascicle translation by Dharmakṣema with the Candragarbha Sūtra and Sūryagarbha Sūtra translated by Narendrayaśas to create the new 60-fascicle Mahāsaṃnipāta Sūtra. Sengjiu’s compilation was relatively crude, merely linking the various texts together. Subsequently, the monk Hongqing of Daxingshan Monastery, acting on the Empress’s order to collate the scriptures, corrected the titles of the fascicles and sections, integrating them into a cohesive whole. It is noteworthy that the editorial work of Sengjiu and Hongqing on the Mahāsaṃnipāta Sūtra was not, strictly speaking, scriptural translation but rather a form of scriptural compilation. Their compiled edition lacked an actual Sanskrit manuscript as its basis, relying instead on the prevalent legend of an expansive version of the Mahāyāna sūtra circulating at the time. This differed from the Mid-Imperial Chinese ideal that scriptural translation must be based on a Sanskrit original, leading Tang dynasty bibliographers like Jingtai 靜泰 to question the authority of this 60-fascicle edition.

Historical records and extant editions reveal a completely different situation in other transmission lineages. The Northern lineage is represented by the Qidan Canon. While only scattered volumes of the Qidan Canon survive today, none of them containing the Mahāsaṃnipāta Sūtra, the Korean monk Sugi utilized it as a collation witness when overseeing the carving of the Second Carved Edition of the Goryeo Canon. He recorded substantial collation notes, compiled in the Record of the Collation of the New Carving of the Goryeo Canon (Gaoliguo Xindiao Dazang Jiaozheng Bielu 高麗國新雕大藏校正別錄 K38n1402). In this work, Sugi points out that, unlike the First Carved Edition of the Goryeo Canon, the “Dan Canon” (Qidan Canon) representing the Northern lineage contained a 30-fascicle Mahāsaṃnipāta Sūtra:

“For this sūtra, both the State Edition and the Song Edition comprise 60 fascicles in 17 sections. The Dan Canon contains 30 fascicles in 11 sections. Furthermore, at the beginning of the sūtra, both the State and Song editions include the section titled ‘Yingluo Pin’ (Jewel-Net Section), which is absent in the Dan Canon. The ‘Xukongzang Pin’ (Space-Storage Section) appears in the two editions after the ‘Bukeshuo [Pin]’ (Inexpressible Section), but in the Dan Canon it is placed before the ‘Wuyan Pin’ (Wordless Section). Also, after the ‘Baoji Pin’ (Jeweled Topknot Section), the two editions contain the ‘Wujinyi Pin’ (Inexhaustible Meaning Section) in four fascicles, which the Dan Canon lacks; instead, it has the ‘Rimi Fen’ (Sun-Secret Division) in three fascicles…”

此經《國本》《宋本》皆六十卷，凡十七品。《丹藏》中三十卷，十一品。又經初首《國》《宋》兩本則有《瓔珞品》名，丹藏所無。其《虛空藏品》兩本在《不可說》後，丹藏在《無言品》前。又於《寶髻品》後兩本有《無盡意品》四卷，《丹藏》即無，而有《日密分》三卷……8

Here, the State Edition refers to the First Carved Edition of the Goryeo Canon, the Song Edition to the Kaibao Canon, and the Dan Canon to the Qidan Canon. The First Carved Goryeo Canon and the Kaibao Canon are both 60-fascicle editions with 17 sections; the Qidan Canon edition has 30 fascicles and 11 sections. The First Carved Goryeo and Kaibao Canons begin with the Yingluo Pin 瓔珞品, missing in the Qidan Canon. Also, the Xukongzang Pin 虛空藏品 appears after the Bukeshuo Pin 不可說品 in the two editions but before the Wuyan Pin 無言品 in the Qidan Canon. After the Baoji Pin 寶髻品, the two editions contain the Wujinyi Pin 無盡意品 in four fascicles, while the Qidan Canon omits this but includes the Rimi Fen 日密分 in three fascicles.

No complete set of the Qidan Canon survives today. However, the Liao and Jin dynasty stone sūtras at Fangshan can largely be considered a carved reprint of the Qidan Canon. The Mahāsaṃnipāta Sūtra among the Fangshan Stone Sūtras, as indicated by Sugi’s collation notes, indeed consists of 30 fascicles, and the sequence of its sections aligns with his records.

Beyond the Northern lineage represented by the Qidan Canon and the Fangshan Stone Sūtras, we can examine the Southern lineage editions, namely the Fuzhou Canon and the Sixi Canon. The structure of the sūtra in the Fuzhou and Sixi Canons is consistent with that of the Northern lineage. Given that the 60-fascicle version originates from Sengjiu’s compilation polished by Hongqing, what is the source of the 30-fascicle version found in the Southern and Northern printed canons? We may consult the records in Zhisheng 智升’s Record of Buddhist Scriptures Compiled During the Kaiyuan Period (Kaiyuan Shijiao Lu 開元釋教錄 T55n2154). Zhisheng recorded six different versions of the Mahāsaṃnipāta Sūtra available to him, among which he considered the standard text to be the 30-fascicle edition that became the basis for the Northern and Southern printed canons.9

The standard text recognized by Zhisheng comprises eleven sections. Compared to the twelve-section version recorded by Sengyou 僧祐 in A Collection of Records Concerning the Chinese Tripitaka (出三藏記集 T55n2145), it lacks the first section, Yingluo Pin 瓔珞品, and the twelfth section, Wujinyi Pin 無盡意品, but includes the eleventh section, Rimi Fen 日密分; the Xukongzang Pusa Pin 虛空藏菩薩品 is placed before the Wuyan Pusa Pin 無言菩薩品. The Mahāsaṃnipāta Sūtra entered in the Kaiyuan Shijiao Lu is precisely this 30-fascicle edition.

Based on Zhisheng’s record, Sugi’s collation notes on the Qidan Canon, the Liao-Jin stone sūtras at Fangshan, and the Goryeo, Fuzhou, and Sixi Canons, we can construct the following Table 3 to clearly illustrate the structural differences between the 30-fascicle and 60-fascicle editions of the Mahāsaṃnipāta Sūtra.

Table 3. Section Sequence of the Mahāsaṃnipāta Sūtra.

The newly accessible image resources confirm that both the Northern and Southern transmission lineages preserve the original 30-fascicle translation and did not adopt the 60-fascicle compiled edition. Furthermore, these versions lack the Yingluo Pin contained in the Central lineage version, and the sequence of sections is entirely different.
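The structural differences reported by Sugi can be checked mechanically once each edition is stored as its own ordered list of sections. The sketch below is illustrative only. The two lists are partial: they contain only sections named in Sugi’s note, and a section missing from a list is simply not modeled here, which does not imply that the edition lacks it (only the absence of the Yingluo Pin from the Dan Canon is stated in the note). The placement of the Baoji Pin after the other named sections is an assumption based on the order of the note.

```python
# Partial section sequences, limited to sections named in Sugi's collation note.
CENTRAL_60 = ["Yingluo Pin", "Bukeshuo Pin", "Xukongzang Pin", "Baoji Pin", "Wujinyi Pin"]
NORTHERN_30 = ["Xukongzang Pin", "Wuyan Pin", "Baoji Pin", "Rimi Fen"]


def precedes(sequence, first, second):
    """True if `first` appears before `second` in the edition."""
    return sequence.index(first) < sequence.index(second)


def follows(sequence, section):
    """Return the modeled section that comes right after `section`, if any."""
    position = sequence.index(section)
    return sequence[position + 1] if position + 1 < len(sequence) else None


report = {
    "opens_with_yingluo": (CENTRAL_60[0] == "Yingluo Pin", "Yingluo Pin" in NORTHERN_30),
    "after_baoji": (follows(CENTRAL_60, "Baoji Pin"), follows(NORTHERN_30, "Baoji Pin")),
    "xukongzang_after_bukeshuo_in_central": precedes(CENTRAL_60, "Bukeshuo Pin", "Xukongzang Pin"),
    "xukongzang_before_wuyan_in_northern": precedes(NORTHERN_30, "Xukongzang Pin", "Wuyan Pin"),
}
```

Checks of this kind only become possible when each lineage is modeled with its own sequence; they cannot be expressed if the 30-fascicle text exists solely as notes attached to the 60-fascicle structure.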

Current platforms like CBETA are built upon a framework that treats the Taishō Tripiṭaka (60-fascicle edition) as the standard base text. Within this framework, the 30-fascicle Northern/Southern version can typically only be represented as a “defective” variant that “omits” 30 fascicles. This representation risks obscuring the historical fact that the 30-fascicle text is not a subset of the 60-fascicle text but an independent, authentic textual tradition. The prevailing data model, which forces Northern/Southern lineage texts into the structural grid of the Central lineage text, struggles to adequately reflect their independent structural identity.

3.4. Future Directions: AI-Driven “Hybrid Digital Canon”

The convergence of vast image resources with the structural limitations of current platforms points to a clear evolutionary path for the field. Future initiatives should aim to construct a “Hybrid Digital Canon.” This theoretical model represents a synthesis of genealogical hybridity (where Central, Southern, and Northern lineages coexist), media hybridity (integrating text with IIIF images), and methodological hybridity (combining philology with automation).

To realize this vision, where editions from the Central (Goryeo/Zhaocheng), Southern (Sixi), and Northern (Qidan) lineages coexist as equal nodes, the field must harness the revolutionary breakthroughs in Artificial Intelligence (AI). Since the introduction of the Transformer architecture in 2017 and the subsequent rise of Large Language Models (LLMs), AI has evolved from a supplementary tool into a transformative force capable of reshaping digital philology (Deng et al. 2021).

The foundational step in constructing this Hybrid Canon is converting the massive image repositories of non-Taishō lineages into computable text. As a core component of ancient text digitization, OCR stands out as a pivotal technology (Su et al. 2021). Modern workflows now simulate the human reading process through four key stages, each essential for processing the complex layouts of the Southern and Northern canons.

First, in the image preprocessing stage, degradation issues caused by age—such as noise, physical deformations, and background interference—are addressed through techniques including denoising, shape correction, and background removal to enhance image quality.

Second, layout analysis involves identifying and segmenting the complex structural components of historical texts, such as text columns, annotations, and illustrations. This step is particularly challenging due to intricate layouts common in ancient books, such as mixed large and small characters and multi-column arrangements. In recent years, deep learning-based instance segmentation methods (e.g., U-Net) and graph neural networks have emerged as effective solutions.

The third stage, text recognition, constitutes the core of the process, converting segmented text lines into machine-readable characters. The technology in this area has evolved from rule-based and statistical methods to deep learning models. Current state-of-the-art approaches typically employ end-to-end frameworks combining convolutional and recurrent neural networks with connectionist temporal classification or attention mechanisms, further enhanced by pre-trained language models to improve recognition of variant and rare characters.

Finally, the post-processing and proofreading stage ensures accuracy through confidence estimation, error detection, and human–computer interactive correction. Rich information technologies—such as retaining candidate characters and coordinate positioning—provide essential support for manual verification.
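The four stages above can be illustrated with one small function per stage. This is a teaching sketch in pure Python, not a production pipeline: it substitutes a classical median filter and fixed threshold for learned preprocessing, and a projection profile for the deep-learning layout models mentioned above. Only the decoding step (greedy CTC decoding, which collapses repeated labels and drops blanks) corresponds directly to the CTC-based recognizers described. All function names and data formats are assumptions.

```python
from statistics import median


# Stage 1: preprocessing on a grayscale image stored as a list of rows (0-255).
def median_denoise(img):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


# Stage 2: layout analysis by vertical projection profile.
def split_columns(binary):
    """Return (start_x, end_x) of each inked column, ordered right to left."""
    profile = [sum(row[x] for row in binary) for x in range(len(binary[0]))]
    columns, start = [], None
    for x, ink in enumerate(profile + [0]):  # trailing 0 closes the last run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            columns.append((start, x - 1))
            start = None
    return columns[::-1]  # traditional vertical text reads right to left


# Stage 3: recognition output decoded with greedy CTC decoding.
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Pick the best label per frame, collapse repeats, then drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded, previous = [], blank
    for label in best:
        if label != blank and label != previous:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Stage 4: post-processing that keeps candidates and coordinates for proofreaders.
def review_queue(chars, threshold=0.9):
    """chars: list of (candidates, box); candidates are (char, confidence), best first."""
    queue = []
    for index, (candidates, box) in enumerate(chars):
        best_char, confidence = candidates[0]
        if confidence < threshold:
            alternatives = [c for c, _ in candidates[1:]]
            queue.append((index, best_char, alternatives, box))
    return queue
```

Retaining the alternative candidates and the bounding box in the last stage is what allows a proofreader to jump to the exact image zone and choose among plausible readings instead of retyping the character.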

In the specific context of the Buddhist Canon, the research team led by Jin Lianwen has made groundbreaking progress in addressing the unique challenges of diverse typefaces and variant characters. As detailed in Wang et al. (2022), this team constructed specialized datasets (e.g., the “Goryeo Canon Dataset”) and innovatively employed weakly supervised learning methods with adaptive gating mechanisms. They achieved recognition accuracy exceeding 98% for Buddhist texts using streamlined architectures such as CNN+CTC. Their “Rushi Ancient Text Digitization Platform” has successfully processed key editions including the Sixi Canon and Zhaocheng Jin Canon. Concurrently, the Reiwa Daizōkyō 令和大藏經 project led by the SAT team is establishing a “Collaborative Research Data Infrastructure” to systematically process over 110 million characters using similar AI-OCR technologies. These advances indicate that the technical barrier to digitizing diverse canonical lineages has largely been overcome, laying the material foundation for a Hybrid Digital Canon.

Once the text is digitized, the next challenge for the Hybrid Canon is making these vast, unpunctuated corpora intelligible and interoperable.

To address the readability of the newly digitized Zhonghua Dazangjing 中華大藏經, Zhonghua Book Company has launched the “Buddhist Classics Database,” integrating state-of-the-art AI auto-punctuation. Based on the Transformer model research by Hong et al. (2021), this system employs an end-to-end encoder–decoder architecture with a multi-head attention mechanism. Pre-trained on a billion-character corpus, the model achieves an F1 score of 95.1% for sentence segmentation. This technology allows users to instantly view machine-generated punctuation for the massive Zhonghua Dazangjing, significantly enhancing access to the Central Lineage texts that were previously difficult to navigate.
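For readers unfamiliar with the metric, the following sketch shows one common way of scoring automatic sentence segmentation: the positions of the predicted breaks are compared with those in a human-punctuated reference, and precision, recall, and F1 are computed over the break positions. The evaluation protocol behind the figure cited above is not described here, so this scheme, the function names, and the set of punctuation marks are assumptions.

```python
def boundary_positions(punctuated, marks="，。；：？！、"):
    """Return the set of character offsets after which a break occurs."""
    positions, offset = set(), 0
    for ch in punctuated:
        if ch in marks:
            positions.add(offset)
        else:
            offset += 1
    return positions


def segmentation_f1(gold, predicted):
    """Precision, recall, and F1 of predicted break positions against the gold standard."""
    gold_breaks = boundary_positions(gold)
    predicted_breaks = boundary_positions(predicted)
    hits = len(gold_breaks & predicted_breaks)
    precision = hits / len(predicted_breaks) if predicted_breaks else 0.0
    recall = hits / len(gold_breaks) if gold_breaks else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Reference punctuation versus a prediction with one spurious break.
precision, recall, f1 = segmentation_f1("就少出家，專寶坊學。", "就少出家，專寶，坊學。")
```

In this example both reference breaks are found (recall 1.0), but one of the three predicted breaks is spurious (precision 2/3), which gives an F1 of 0.8. The scheme scores break positions only and ignores which punctuation mark was chosen.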

Complementing this is the work of the Canon Office of Beijing Longquan Temple 龍泉寺, whose “Ancient Texts Hub” collaborates with CBETA to provide AI-powered punctuation services. Furthermore, CBETA has implemented semantic search functions using OpenAI’s API. Unlike traditional keyword searches, this system employs semantic vector comparison to identify relevant passages based on meaning rather than form. This capability is vital for a Hybrid Canon, as it allows researchers to trace doctrinal evolution across the Central, Southern, and Northern traditions even when their phrasing diverges significantly. By integrating these semantic tools, the Hybrid Digital Canon transforms from a static repository into a dynamic environment for knowledge discovery.
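
The principle of semantic vector comparison can be sketched as follows. Passages and the query are each represented by an embedding vector, and passages are ranked by cosine similarity to the query. The vectors here are small hand-made placeholders; in a real system they would come from an embedding model, and nothing below reflects CBETA's actual implementation or any specific API. All names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vector, passage_vectors):
    """Return passage identifiers ordered from most to least similar.

    passage_vectors: dict mapping a passage identifier to its embedding.
    """
    scored = [
        (cosine_similarity(query_vector, vector), passage_id)
        for passage_id, vector in passage_vectors.items()
    ]
    return [passage_id for _, passage_id in sorted(scored, reverse=True)]
```

Because the comparison operates on vectors rather than character strings, two passages with no shared wording can still rank closely if their embeddings point in similar directions.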

4. Conclusions

This review has traced the remarkable trajectory of the Chinese Buddhist Canon’s digitization, from early manuscript and print collations to today’s sophisticated digital ecosystems. Platforms such as CBETA and SAT have revolutionized textual access, standardizing citation systems, enabling large-scale search, and integrating cross-linguistic resources. Yet, as demonstrated, their dependence on the Taishō—while pragmatically necessary in earlier phases of digitization—has imposed significant epistemological constraints. The canon’s historical reality is one of fluid, multi-lineage transmission, with substantive variations across the Central, Southern, and Northern lineages that are often marginalized in current digital representations.

The proliferation of high-resolution image databases for non-Taishō editions now provides the material basis for a more inclusive digital philology. Coupled with advances in AI-driven OCR, text–image alignment, and semantic processing, these resources make it feasible to envision a “Hybrid Digital Canon”—a platform where multiple lineage editions coexist as equal nodes within a networked, interoperable environment. Such a model would not only preserve textual diversity but also actively facilitate comparative, lineage-aware scholarship.

Future efforts must therefore prioritize the development of multi-base collation frameworks, leveraging computational tools to automate the alignment and analysis of variant editions while maintaining rigorous philological standards. By transcending the “Taishō bottleneck,” digital scholarship can more accurately reflect the historical dynamism of the Buddhist textual tradition and support a new generation of research into its formation, transmission, and transformation. The journey toward a fully integrated, intelligent, and pluralistic digital canon remains an ongoing scholarly imperative—one that promises to deepen our understanding of Buddhist literature and its place in the digital humanities.

Funding

This research was funded by the National Social Science Fund Youth Project “A Study of Buddhist Canons and Catalogues During the Tang-Song Dynasties” (Project No. 21CZJ013), and the Chinese Academy of Social Sciences “Endangered and Cold Discipline” Support Initiative (Project No. 25JXLM05).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

API	Application Programming Interface
CBETA	Chinese Buddhist Electronic Text Association
CiNii	Citation Information by National Institute of Informatics, a database suite operated by the National Institute of Informatics (NII)
DDB	Digital Dictionary of Buddhism
DOI	Digital Object Identifier
EBTI	Electronic Buddhist Text Initiative
IRIZ	International Research Institute for Zen Buddhism
INBUDS	Indian and Buddhist Studies Treatise Database
LLM	Large Language Model
NLC	National Library of China
NLP	Natural Language Processing
OCR	Optical Character Recognition
RITK	Research Institute of the Tripitaka Koreana
SAT	Saṃgaṇikīkṛtaṃ Taiśo Tripiṭakaṃ/The SAT Daizōkyō Text Database
TEI	Text Encoding Initiative

Appendix A

Table A1. Buddhist Canon Image Databases.

Database	Website	Canon
Chinese Ancient Books Resource Database	https://www.nlc.cn/pcab/zy/zhgj_zyk/ (Last accessed: 28 October 2025)	Zhaocheng Jin Canon; Sixi Canon
Imperial Household Agency Shoryōbu Collection of Chinese Classics Database	https://db2.sido.keio.ac.jp/kanseki (Last accessed: 28 October 2025)	Fuzhou Canon
Archives of Buddhist Culture Database	https://kabc.dongguk.edu/ (Last accessed: 28 October 2025)	First and Second Carved Editions of the Goryeo Canon
Yunnan Ancient Books Digital Library	http://msq.ynlib.cn/ (Last accessed: 28 October 2025)	Yuan Official Canon
“University of Tokyo General Library Wanli Edition Buddhist Canon (Jiaxing Canon) Electronic Edition” Database	https://dzkimgs.l.u-tokyo.ac.jp/kkz/ (Last accessed: 28 October 2025)	Jiaxing Canon
Ming Dynasty Buddhist Canon Database	http://58.59.15.37:9500/#/home (Last accessed: 28 October 2025)	Yongle Northern Canon and Yongle Southern Canon
“Three Editions of Buddhist Sacred Canons Stored at Zōjōji” Database	https://jodoshuzensho.jp/zojoji_sandaizo/ (Last accessed: 28 October 2025)	Goryeo Canon, Sixi Canon, and Puning Canon

Notes

1	For instance, Kanichi Ogawa 小川貫一 argued that the Buddhist canon is not merely a collection or series of Buddhist texts, but carries implications of “specific organization and intentional content” (Daizōkai 1964, pp. 5–6). Li Fuhua and He Mei described the Chinese Buddhist canon as “a comprehensive collection of Chinese Buddhist scriptures—including works composed by Chinese monks—that have been identified, systematized, and organized according to a certain sequence by Buddhist bibliographers since the Sui and Tang dynasties,” one that exhibits “rich content, rigorous sequence, and refined structure” (Li and He 2003, p. 14). K. R. Norman noted of a canon that “it is a collection of scriptures (oral or written), which gives a certain authority to those texts included in it” (Norman 1997, p. 131).
2	Prior to CBETA’s official formation, a notable attempt was made in 1994 by Yang Guoping, convened under the same name (“Chinese Buddhist Electronic Text Association”), involving institutions like Fo Guang Shan and Academia Sinica. Additionally, lay practitioner Shen Jiazhen established the “Buddhist Computer Information Merit Association” in the US. However, these initiatives stalled due to difficulties in achieving consensus regarding copyright and editorial responsibility. See Fang (1996) and CBETA (2007).
3	Examining the inclusion catalogues 入藏錄 from the Sui and Tang dynasties reveals that manuscript canons of that period primarily contained the Tripitaka (Sūtra, Vinaya, Śāstra) translated from Sanskrit and Central Asian languages (Zhang 2024). By the 18th year of the Kaiyuan era (730 CE), Zhi Sheng’s 智升 Kaiyuan Shijiao Lu 開元釋教錄 shows that works written by Chinese monks began to be gradually included in the canon, though their number remained very limited (approx. 40 works, 368 fascicles), mainly comprising Buddhist catalogues and historical/biographical texts. Notably, commentaries by eminent Tang dynasty masters were excluded, a situation incongruent with the flourishing state of Tang Buddhism. By the Song-Yuan printed canons and the Ming Jingshan Canon, works by local monks, Chan discourse records, and transmission of the lamp records 燈錄 were increasingly incorporated. In modern times, the Japanese-compiled Taishō and Zokuzōkyō significantly increased the inclusion of ancient Chinese exegetical literature. CBETA continues this trend of expanding inclusion criteria, incorporating stone inscription rubbings, monastic gazetteers, extra-canonical texts, and works by modern masters, marking a further transformation of the Buddhist canon from sacred scripture towards a comprehensive encyclopedia.
4	For a list of canons referenced by CBETA, see CBETA Canon Codes: https://cbetaonline.cn/doc/zh/02-02_id.php (accessed on 28 October 2025).
5	For CBETA citation examples, see CBETA Copy & Cite: https://cbetaonline.cn/doc/zh/04-aca_quote.php (accessed on 28 October 2025).
6	For details on the Sixi Canon held at the NLC, see Li (2007).
7	CBETA 2025.R3, T49, no. 2034, p. 103a9-b10 (CBETA 2025).
8	CBETA 2025.R3, K38, no. 1402, p. 513c15-21 (CBETA 2025).
9	Zhisheng described this 30-fascicle edition as follows: “Upon examining the scripture text, it differs from [Seng]You’s Record: First, the Tuoluoni Zizai Wang Pusa Pin (Dhāraṇī-Sovereign King Bodhisattva Section) (some scripture texts split this into the Yingluo Pin; this is incorrect. This is one continuous passage and should not be divided into two. The later Da’ai Jing corresponds to this section.); Second, the Baonü Pin (Jewel Maid Section); Third, the Buxuan Pusa Pin (Unwavering Bodhisattva Section); Fourth, the Haihui Pusa Pin (Oceanic Wisdom Bodhisattva Section); Fifth, the Xukongcang Pusa Pin (Space-Storage Bodhisattva Section); Sixth, the Wuyan Pusa Pin (Wordless Bodhisattva Section); Seventh, the Bukeshuo Pusa Pin (Inexpressible Bodhisattva Section); Eighth, the Baochuang Fen (Jewel-Banner Division); Ninth, the Xukongmu Fen (Space-Eye Division); Tenth, the Baoji Pusa Pin (Jeweled Topknot Bodhisattva Section); Eleventh, the Rimi Fen (Sun-Secret Division); (Upon checking various catalogues, the fascicle count for this Mahāsaṃnipāta Sūtra is not fixed: some say 29, 30, 31, 32, or 40 fascicles. Nowadays, the Mahāsaṃnipāta is mostly in 30 fascicles. The text of the Rimi Fen is incomplete, lacking about one fascicle. The 31-fascicle version should be complete, but it has not yet been obtained.); … From the Tuoluoni Zizai Wang Pin to the Rimi Fen, there are eleven divisions in total. The Rizang Jing and the Rimi Fen are different translations of the same original and also constitute the eleventh division. (Both the Rimi Fen and the Rizang Jing begin by stating: ‘Having explained the Xukongmu and the ānāpānasmṛti nectar-gate, this sūtra is then taught.’) Furthermore, since the Rimi Fen is taught after the Xukongmu Fen, by principle it should not be separated by the Baoji Pin. In the present scripture text, this section appears separated; the reason is unclear. 
Also, although the Rimi Fen and the Rizang Jing share the same original, the text of the Rimi Fen is extremely abbreviated, and its latter part is further missing, amounting to roughly less than a fascicle.”; 今檢經本與祐《記》不同：第一《陀羅尼自在王菩薩品》，（亦有經本分為《瓔珞品》者，不然。此是一段，不合分二，後《大哀經》即是此品。）第二《寶女品》，第三《不眩菩薩品》，第四《海慧菩薩品》，第五《虛空藏菩薩品》，第六《無言菩薩品》，第七《不可說菩薩品》，第八《寶幢分》，第九《虛空目分》，第十《寶髻菩薩品》，第十一《日密分》（尋檢群錄，此《大集經》卷無定準，或云二十九，或云三十，或三十一，或三十二，或四十卷。今時《大集》，多分三十。其《日密分》文不具足，合少一卷。其三十一卷者，文應備具，今尋求未獲。）……今從《陀羅尼自在王品》至《日密分》，總十一分，其《日藏經》與《日密分》同本異譯，亦是第十一分，（《日密》《日藏》初俱云：“說《虛空目》安那般那甘露門已，次說此經。”）又《日密分》既於《虛空目》後說，准義不合，隔《寶髻品》。今經本中，有此品隔，未詳所以。又《日密》《日藏》雖是同本，其《日密分》又文極撮略，後文復闕，可少卷餘。(CBETA 2025.R3, T55, no. 2154, p. 588a14-b1 (CBETA 2025)).

References

Bhikkhu, Huimin 釋惠敏, Zhengmin Du 杜正民, Bangxin Zhou 周邦信, and Zhipan Wang 王志攀. 2005. Techniques for Collating Multiple Text Versions in the Digitization of Classical Texts: The CBETA Taishō Buddhist Canon as an Example. Chung-Hwa Buddhist Journal 中華佛學學報 18: 299–325.
Bhikkhu, Huimin 釋惠敏, Zhenzhou Hung 洪振洲, and Jiayu Xu 許佳瑜. 2018. Hanji yuyi lianjie de tantao yu yingyong yanjiu: Yi fodian shuwei ziyuan wei li: CBETA shuwei yanjiu pingtai 漢籍語意鏈結的探討與應用研究—以佛典數位資源為例：CBETA數位研究平台 [Exploration and Applied Research on Semantic Linking of Chinese Classics: Using Buddhist Digital Resources as an Example: The CBETA Digital Research Platform]. Paper presented at Digital Archives and Digital Humanities International Conference (9th), Taiwan, China, December 18–21; Taipei: Taiwan Digital Humanities Association, p. 163.
CBETA. 2007. Hui xiang Shen Jiazhen jushi 回向沈家楨居士 [Dedication to Layman C. T. Shen]. Zhonghua dianzi fodian xiehui xinwen dianzi bao 中華電子佛典協會新聞電子報. 103. Available online: https://archive.cbeta.org/data/news/200711/index.htm (accessed on 28 October 2025).
CBETA. 2025. 2025.R3. Taishō Shinshū Daizōkyō 大正新脩大藏經, vols. T49, T55. Goryeo Canon 高麗大藏經, vols. K38. Available online: https://www.cbeta.org (accessed on 28 October 2025).
Chen, Shu-Fen 陳淑芬. 2013. Jingang jing biaodian yanjiu: Yi Dazhengzang yu CBETA Jiumoluoshi yi ben wei li 《金剛經》標點研究：以《大正藏》與CBETA鳩摩羅什譯本為例 [A Study on the Punctuation of the Diamond Sūtra: Based on the Kumārajīva Translation in the Taishō Canon and CBETA]. Yuan Kuang Journal of Buddhist Studies 圓光佛學學報 22: 33–88.
Chi, Limei 池麗梅. 2023. Gen Kanban Daizōkyō Kenkyū (1) Genzon Tekisuto no Gaikan 元官版大蔵経研究（一）現存テキストの概観 [A Study of the Official Buddhist Canon of the Yuan Dynasty (I): Existing Texts in China and Japan]. Kokusai Bukkyōgaku Daigakuin Daigaku Kenkyū Kiyō 國際仏教學大學院大學研究紀要 26: 63–81.
Chikusa, Masaaki 竺沙雅章. 2000. Sō Gen ban Daizōkyō no keifu 宋元版大藏經の系譜 [A Genealogy of the Song–Yuan Editions of the Buddhist Canon]. In Sō Gen Bukkyō bunkashi kenkyū 宋元佛教文化史研究 [Studies in the Cultural History of Song–Yuan Buddhism]. Tokyo: Kyūko Shoin, pp. 271–362.
Daizōkai 大蔵會, ed. 1964. Daizōkyō: Seiritsu to hensen 大蔵経: 成立と変遷 [The Buddhist Canon: Formation and Transformation]. Nagoya: Hyakkaen 百華苑.
Deng, Sanhong 鄧三鴻, Haotian Hu 胡昊天, Hao Wang 王昊, and Dongbo Wang 王東波. 2021. Guwen zidong chuli yanjiu xianzhuang yu xinshidai fazhan qushi zhanwang 古文自動處理研究現狀與新時代發展趨勢展望 [The Current State and New Era Development Trends of Classical Chinese Automated Processing Research]. Keji Qingbao Yanjiu 科技情報研究 3: 1–20.
Du, Zhengmin 杜正民. 1998. Dangdai guoji fodian dianzihua xiankuang: Dianzi fodian tuijin xieyihui (EBTI) jianjie 當代國際佛典電子化現況：電子佛典推進協議會（EBTI）簡介 [The Current State of International Buddhist Text Electronization: An Introduction to the Electronic Buddhist Text Initiative (EBTI)]. Fojiao Tushuguan Xun 佛教圖書館訊 15: 28–39.
Du, Zhengmin 杜正民. 1999a. Hanwen dianzi Dazangjing de zhizuo yuanqi yu zuoye liucheng—yi “Zhonghua Dianzi Fodian Xiehui” wei li 漢文電子大藏經的製作緣起與作業流程——以“中華電子佛典協會”為例 [The Genesis and Workflow of the Chinese Electronic Buddhist Canon: Taking the Chinese Buddhist Electronic Text Association as an Example]. Foxue Zhongxin Xuebao 佛學中心學報 4: 347–69.
Du, Zhengmin 杜正民. 1999b. Yi CBETA wei li: Tan daliang wenxian zhi jianli: Hanwen Zangjing dianzihua zuoye jianshuo 以CBETA為例—談大量文獻之建立—漢文藏經電子化作業簡說 [Using CBETA as an Example: On the Establishment of Large-Scale Texts—A Brief Introduction to the Electronization of the Chinese Buddhist Canon]. Computing Center Newsletter, Academia Sinica 中央研究院計算中心通訊 15: 117–22.
Du, Zhengmin 杜正民. 2012. Foxue shuwei ziyuan de jianzhi yu kaizhan 佛學數位資源的建置與開展 [The Construction and Development of Buddhist Digital Resources]. Dharma Drum Journal of Buddhist Studies 法鼓佛學學報 10: 147–210.
Du, Zhengmin 杜正民, Jiaming Li 李家名, Haiwen Zhou 周海文, Baolian Zheng 鄭寶蓮, and Xinyan Lin 林心雁. 2008. Zangjing yu Fojiao gongjushu de shuweihua bianzuan: Yi CBETA dianzi fodian yu shuwei jinglu jihua wei li 藏經與佛教工具書的數位化編纂：以CBETA電子佛典與數位經錄計畫為例 [The Digital Compilation of Buddhist Canons and Reference Works: Using the CBETA Electronic Buddhist Canon and Digital Catalog Project as Examples]. Fojiao Tushuguan Guankan 佛教圖書館館刊 47: 22–35.
Fang, Guangchang 方廣錩. 1991. Fojiao Dazangjing shi: Ba–shi shiji 佛教大藏經史：八——十世紀 [History of the Buddhist Canon: 8th–10th Centuries]. Beijing: Zhongguo Shehui Kexue Chubanshe 中國社會科學出版社.
Fang, Guangchang 方廣錩. 1996. Haiwai Dazangjing bianji ji dianziban Dazangjing de qingkuang 海外大藏經編輯及電子版大藏經的情況 [Overseas Buddhist Canon Compilation and the Status of Digitized Canons]. Fayin 法音 5: 3–11.
Fang, Guangchang 方廣錩. 1997. Dazheng xinxiu Dazangjing pingshu 《大正新修大藏經》評述 [A Review of the Taishō Shinshū Daizōkyō]. In Wensi: Jinling kejingchu 130 zhounian jinian zhuanji 聞思：金陵刻經處130周年紀念專輯 [Wensi: Commemorative Collection for the 130th Anniversary of Jinling Scriptural Press]. Edited by Nanjing Jinling Kejingchu 南京金陵刻經處. Beijing: Huawen chubanshe 华文出版社, pp. 230–53.
Fang, Guangchang 方廣錩. 2006. Zhongguo xieben dazangjing yanjiu 中國寫本大藏經研究 [Studies on the Dazangjing in Chinese Manuscripts]. Shanghai: Shanghai Guji Chubanshe 上海古籍出版社, pp. 9–10.
Fang, Guangchang 方廣錩. 2015. Guji shuzihua shiye zhong de Dazhengzang yu fodian zhengli 古籍數位化視野中的《大正藏》與佛典整理 [The Taishō Canon and Buddhist Text Collation from the Perspective of Ancient Text Digitization]. Shanghai Shifan Daxue Xuebao (Zhexue Shehui Kexue Ban) 上海師範大學學報 (哲學社會科學版) 4: 17–25.
Hong, Tao 洪濤, Ruixue Cheng 程瑞雪, Sixi Liu 劉思汐, and Kaiqi Fang 方凱齊. 2021. Yizhong jiyu Transformer moxing de guji zidong biaodian jishu 一種基於 Transformer模型的古籍自動標點技術 [An Automatic Punctuation Technique for Classical Chinese Based on Transformer Model]. Shuzi Renwen 數字人文 2: 111–22.
Hong, Zhenzhou 洪振洲. 2016. You ziliaoku dao shuwei yanjiu pingtai: Tan fodian wenxian shuwei yanjiu gongju zhi fazhan yu yanbian 由資料庫到數位研究平台－談佛典文獻數位研究工具之發展與演變 [From Database to Digital Research Platform: On the Development and Evolution of Digital Research Tools for Buddhist Literature]. Hanxue Yanjiu Tongxun 漢學研究通訊 35: 1–14.
Hong, Zhenzhou 洪振洲. 2018. Shuweishidai Hanyi fodian zhi yanjiu liqi: CBETA shuwei yanjiu pingtai 數位時代漢譯佛典之研究利器－CBETA數位研究平臺 [A Powerful Tool for Studying Chinese Buddhist Scriptures in the Digital Age: The CBETA Digital Research Platform]. Journal of Digital Archives and Digital Humanities 數位典藏與數位人文 1: 149–74.
Hou, Kunhong 侯坤宏. 2007. Xingtan nalu: Hengqing fashi fangtan lu 杏壇衲履：恒清法師訪談錄 [Cassock in the Scholarly Grove: An Interview with Ven. Hengching]. Taipei: Guoshiguan 國史館, pp. 203–8.
Hou, Kunhong 侯坤宏, and Zunhong Zhuo 卓遵宏. 2015. Liushi gan’en ji: Huimin fashi fangtan lu (zengding ban) 六十感恩紀：惠敏法師訪談錄（增訂版） [Sixty Years of Gratitude: An Interview with Ven. Huimin (Expanded Edition)]. Taipei: Fagu Wenhua 法鼓文化, pp. 301–16.
Huang, Qiangang 黃乾綱, Jiaming Li 李家名, and Fayuan Shi 釋法源. 2018. Yi shuwei fojing wei jichu, yanfa guwenxian yanjiu suoxu zixun jishu de jingyan yu chengguo 以數位佛經為基礎，研發古文獻研究所需資訊技術的經驗與成果 [Experience and Results of Developing Information Technologies for Classical Text Research Based on Digital Buddhist Sutras]. Renwen yu Shehui Kexue Jianxun 人文與社會科學簡訊 19: 124–31.
Lancaster, Lewis R. 2003. Buddhism and the Digital Age. Hsi Lai Journal of Humanistic Buddhism 4: 79–86.
Lancaster, Lewis R. 2008. Digital Buddhist Texts and Buddhist Universities. The Journal of the International Association of Buddhist Universities 1: 133–54.
Li, Fuhua 李富華, and Mei He 何梅. 2003. Hanwen Fojiao Dazangjing yanjiu 漢文佛教大藏經研究 [Research on the Chinese Buddhist Canon]. Beijing: Zongjiao Wenhua Chubanshe 宗教文化出版社.
Li, Jining 李際寧. 2007. Guotu xin shoucang Sixi ban Da bore boluomi jing de jingguo ji qi wenwu banben jiazhi 國圖新入藏思溪版《大般若波羅蜜經》的經過及其文物版本價值 [The Process and Value of the National Library’s Newly Acquired Sixi Canon of the Mahāprajñāpāramitā Sūtra]. In Fojiao Dazangjing yanjiu lungao 佛教大藏經研究論稿 [Studies on the Buddhist Canons]. Beijing: Zongjiao Wenhua Chubanshe 宗教文化出版社, pp. 171–84.
Li, Jining 李際寧. 2010. Guanyu jinnian faxian de Yuan guanzang 關於近年發現的《元官藏》 [On the Recently Discovered Yuan Official Canon]. Zangwai Fojiao wenxian 藏外佛教文獻 13: 352–88.
Muller, A. Charles, Masahiro Shimoda, and Kiyonori Nagasaki. 2017. The SAT Taishō Text Database: A Brief History. In Reinventing the Tripitaka: Transformation of the Buddhist Canon in Modern East Asia. Edited by Jiang Wu and Greg Wilkinson. Lanham: Lexington Books, pp. 175–85.
Nagasaki, Kiyonori 永崎研宣. 2015a. Bukkyō bunken no tame no kōzō-teki na dejitaru tekisuto no kijutsu to katsuyō 仏教文献のための構造的なデジタルテクストの記述と活用 [Structured Digital Text Description and Utilization for Buddhist Literature]. Indogaku Bukkyōgaku Kenkyū 印度學佛教學研究 63: 1088–94.
Nagasaki, Kiyonori 永崎研宣. 2015b. SAT Daizōkyō tekisuto dētabēsu: Jinbun-gaku ni okeru ōpundēta no katsuyō ni mukete SAT大蔵経テキストデータベース: 人文学におけるオープンデータの活用に向けて [The SAT Daizōkyō Text Database: Towards the Use of Open Data in the Humanities]. Jōhō Kanri 情報管理 58: 422–37.
Nagasaki, Kiyonori 永崎研宣. 2024. Butten kenkyū to tekisuto kōzōka 仏典研究とテキスト構造化 [Buddhist Text Research and Text Structuring]. Indogaku Bukkyōgaku Kenkyū 印度學佛教學研究 72: 725–30.
Nagasaki, Kiyonori 永崎研宣, Ikki Ohmukai 大向一輝, and Masahiro Shimoda 下田正弘. 2023. OCR no kōseidoka o fumeta dejitaru gakujutsu henshūban no shintenkai OCRの高精度化を踏まえたデジタル学術編集版の新展開 [New Developments in Digital Scholarly Editions Based on Advanced OCR Accuracy]. Jinmon Kon 2023 Ronbunshū じんもんこん2023論文集 2023: 177–82.
Nagasaki, Kiyonori 永崎研宣, Jun Homma 本間淳, and Masahiro Shimoda 下田正弘. 2024. TEI koten-seki byūwā no kaihatsu: Higashi Ajia koten-seki ōpun saiensisu no jitsugen ni mukete TEI 古典籍ビューワの開発: 東アジア古典籍オープンサイエンスの実現に向けて [Development of a TEI Classical Text Viewer: Towards Realizing Open Science for East Asian Classical Texts]. Jinmon Kon 2024 Ronbunshū じんもんこん2024論文集 2024: 59–66.
Norman, Kenneth Roy. 1997. A Philological Approach to Buddhism: The Bukkyō Dendō Kyōkai Lectures 1994. London: School of Oriental and African Studies.
Shanxi Sheng Wenwuju 山西省文物局, and Zhongguo Lishi Bowuguan 中國歷史博物館. 1991. Yingxian muta Liao dai mizang 應縣木塔遼代秘蔵 [Liao Dynasty Hidden Treasures from the Wooden Pagoda at Ying County]. Beijing: Wenwu chubanshe 文物出版社.
Su, Qi 蘇祺, Renfen Hu 胡韌奮, Yuchen Zhu 諸雨辰, Chengxi Yan 嚴承希, and Jun Wang 王軍. 2021. Guji shuzihua guanjian jishu pingshu 古籍數字化關鍵技術評述 [A Review of Key Technologies in Ancient Book Digitization]. Shuzi Renwen Yanjiu 數字人文研究 1: 83–88.
Tong, Ran 通然. 2017. Hanyu Fojiao wenxian dianzihua de xianzhuang yu zhanwang 漢語佛教文獻電子化的現狀與展望 [The Current Situation and Prospects of the Electronization of Chinese Buddhist Documents]. Foxue yanjiu 佛學研究 1: 267–75.
Tong, Wei 童瑋, Guangchang Fang 方廣錩, and Zhiliang Jin 金志良. 1984. Yuandai guanke Dazangjing de faxian 元代官刻大藏經的發現 [The Discovery of the Officially Published Buddhist Canon in the Yuan Dynasty]. Wenwu 文物 12: 82–86.
Tu, Aming. 2015. Appendix 2. The Creation of the CBETA Chinese Electronic Tripitaka Collection in Taiwan. In Spreading Buddha’s Word in East Asia. Edited by Jiang Wu and Lucille Chia. New York: Columbia University Press, pp. 321–36.
Wang, Jun 王軍, Chenglin Liu 劉成林, Lianwen Jin 金連文, Yongge Liu 劉永革, Chixuan Zhang 張弛宜, Yinfei Wang 王胤斐, Hui Zhu 朱慧, Jingwen Han 韓靜雯, and Xuan Xu 徐璇. 2022. Xilie bitan zhi si: Zhineng shidai guji OCR jishu 系列筆談之四：智能時代古籍OCR技術 [The Fourth of a Series of Written Discussions: Ancient Book OCR Technology in the Age of AI]. Shuzi renwen 數字人文, 95–125.
Wang, Song 王頌, ed. 2025. Songban Fuzhou zang: Dongchan Dengjue Chanyuan yu Kaiyuansi ban hebi 宋版福州藏：東禪等覺禪院與開元寺版合璧 [The Song Dynasty Fuzhou Canon: A Combination of the Dongchan Dengjue Chan Monastery and Kaiyuan Monastery Editions]. Beijing: Zhongguo shudian chubanshe 中國書店出版社.
Wang, Zhipan 王志攀. 2000. CBETA dianzi fodian quezi shiwu CBETA電子佛典缺字實務 [Practical Handling of Missing Characters in CBETA Electronic Buddhist Texts]. Buddhist Library Newsletter 佛教圖書館館訊 24: 31–40.
Wittern, Christian. 2017. The Digital Tripitaka and the Modern World. In Reinventing the Tripitaka: Transformation of the Buddhist Canon in Modern East Asia. Edited by Jiang Wu and Greg Wilkinson. Lanham: Lexington Books, pp. 155–73.
Zhang, Xu 張旭. 2024. Zhongjing mulu yu Sui Tang shiqi de huangjia guancang 《眾經目錄》與隋唐時期的皇家官藏 [Zhongjing mulu and the Royal Official Collection of Buddhist Scriptures in the Sui and Tang Dynasties]. Shijie Zongjiao Wenhua 世界宗教文化 6: 172–79.
Zhongguo Fojiao Xiehui 中國佛教協會, and Zhongguo Fojiao Tushu Wenwuguan 中國佛教圖書文物館, eds. 2000. Fangshan shijing 房山石經 [Fangshan Stone Scriptures]. Beijing: Huaxia chubanshe 華夏出版社.
Zhonghua Dazangjing Bianjiju 中華大藏經編輯局. 1983. Zhonghua Dazangjing 中華大藏經 [Chinese Buddhist Canon]. Beijing: Zhonghua shuju 中華書局.
Zhou, Bokan 周伯戡. 2002. Ping CBETA dianzi Dazhengzang 評 CBETA 電子大正藏 [Evaluating the CBETA Electronic Taishō Canon]. Foxue Yanjiu Zhongxin Xuebao 佛學研究中心學報 7: 379–90.