Article

Analysis of Hotspots and Trends in Digitalization Research of Chinese Archives Based on Bibliometrics

1
School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China
2
Key Research Base of Beijing Municipal Cultural Heritage Bureau, Beijing Information Science and Technology University, Beijing 100192, China
3
Beijing Key Lab of Green Development Decision Based on Big Data, Beijing 100192, China
4
Beijing World Urban Circular Economy System (Industry) Collaborative Innovation Center, Beijing 100192, China
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(9), 7679; https://doi.org/10.3390/su15097679
Submission received: 28 March 2023 / Revised: 21 April 2023 / Accepted: 29 April 2023 / Published: 7 May 2023

Abstract

This paper aims to reveal the current status of research development, research hotspots and trends, and possible future research directions in the field of archival digitization research in China. This paper focuses on the field of archival digitization research in China, using 1267 relevant literature articles retrieved from the China National Knowledge Infrastructure (CNKI) database between 1995 and 2022 as the research object. The study employs a bibliometric analysis and knowledge graph analysis and visualizes and analyzes its research content, authors, and institutions using software such as CiteSpace and VOSviewer. The study summarizes the research hotspots and development trends in the field of archival digitization. The results indicate that interdisciplinary research is a clear trend in current archival digitization research. Institutional construction, management mode, management technology, and archival form are this field’s four main thematic directions. Finally, based on the current research hotspots in China’s archival digitization, the future research and development direction of the field is discussed.

1. Introduction

Traditional archival management faces increasingly severe challenges in terms of archive resource storage [1], utilization [2], and security assurance [3]. In recent years, the development and application of scanning technology [4], OCR technology [5], big data technology [6], cloud computing [7], artificial intelligence [8], and other technologies have driven the digital management of archives into a new stage of “smart” management. Advances in information technology (IT) have enabled the deep mining of archival information, linked the various elements of archival information management, supported the sharing of archival information resources [9,10], and helped meet the needs of people in the information age for the filing, long-term preservation, development, and utilization of archival information [2]. In general, both the digitization of archives and research on archival digitization began later in China than in other countries, owing to differences in computer penetration and IT development [11,12]. However, China is a country with a long history and rich culture, and its archives are of great value in the study of history, culture, and society. China is currently undergoing a digital transformation, which presents both opportunities and challenges. As technology continues to advance, archival digitization has become an important component of this digital transformation. Research on archival digitization within the Chinese context can provide valuable insights and experiences that can contribute to the overall success of China’s digital transformation. With the development of Internet technology, China has realized the importance of archives digitization and has long been undertaking a large-scale digitization project [13].
The significance of conducting archival digitization research in the era of big data has gradually become apparent, making it a hot topic and frontier subject in current archival studies [14,15], and the field has accumulated a body of experience and results. Therefore, systematically analyzing the current research hotspots and trends in China’s archival digitization research under the background of “smart +” can provide reference and guidance for theoretical research and practical work on archival digitization.
The purpose of this paper is to quantitatively examine and visually analyze representative figures, publishing institutions, research hotspots, and their evolution in the field of archival digitization research in China, and to propose reflections on existing research and possible future research directions, so that scholars can understand the research content of archival digitization in China more intuitively and use it as a reference for future research. Using the journal literature on the subject of “archival digitization” in the China National Knowledge Infrastructure database as the data source, we use BICOMB, Python, and Excel to identify the hotspot areas of Chinese archival digitization research through word frequency analysis; we identify the frontier and development trend of the research by detecting changes in high-frequency words; and we analyze the external characteristics of the literature, such as authors and issuing institutions, to understand the current development of the field. We also use the information visualization software CiteSpace and VOSviewer, together with Python, to draw a line graph of the annual publication volume, a collaboration network of representative authors, a keyword co-occurrence network, a keyword clustering knowledge graph, and a keyword emergence (burst) graph, which together visualize the representative authors, research hotspots, and frontiers of Chinese archival digitization. Finally, we point out possible future research directions in the field of Chinese archival digitization in the context of related studies and the current era.

2. Research Methods and Data Collection

2.1. Research Methods

2.1.1. Bibliometrics

This study used bibliometric methods to analyze the development trend of the digitalization of Chinese archives. Bibliometrics is a quantitative analysis method that takes scientific and technological literature and its various external characteristics as research objects. It employs mathematical and statistical methods to describe, evaluate, and predict the current state and development trends of science and technology. Its main feature is that it outputs quantified information [16]. Bibliometric analysis can be enhanced with scientific maps representing the relationships among different actors (authors, institutions, countries, etc.) [17]. VOSviewer offers strong visualization capabilities and can load and export information from many sources. CiteSpace, developed at Drexel University (USA), supports the analysis and visualization of trends and patterns in a research area [18,19]; its main goal is to facilitate the analysis of emerging trends in a knowledge domain. Bicomb is built on a mature, widely used database language. It can quickly read bibliographic information from a literature database, accurately extract and classify fields, store them, and compute statistics. In addition, it can generate the co-occurrence matrix of bibliographic data to provide comprehensive and accurate basic data for further research [20]. This study takes part of the literature on archival digitization research included in the CNKI database as the research object. It uses BICOMB to count keyword frequencies, publications per author, and publications per institution; CiteSpace to construct the author collaboration network, the keyword co-occurrence clustering map, and the keyword emergence (burst) map; and VOSviewer to construct the keyword co-occurrence map, in order to analyze the current situation and development trend of archival digitization research in China.

2.1.2. Knowledge Map

A knowledge graph is a graphical representation of the development and structural relationships of real-world scientific knowledge. It is used to “apply mathematical, statistical, graphical, and information science methods in a combined way to discover, describe, analyze, and ultimately display the interrelationships between textual content in literature. This visualization method is used to display the development history, research status, research hotspots, and frontiers of disciplines” [8]. Figure 1 below shows the thought process and flow of constructing a knowledge graph of Chinese digitalization of archives research. Firstly, literature was searched according to the research topic (digitalization of archives) and other conditions; after relevant literature was retrieved, manual reading and checking confirmed if they met the criteria, and manual screening was conducted to export the required literature citation format, which was then stored. Next, keywords were extracted, and their frequency was counted based on the research timeframe. Finally, a keyword co-occurrence network was constructed for the extracted keywords, and software such as CiteSpace and VOSviewer was used to generate a knowledge graph. Keywords were clustered, and a co-occurring cluster knowledge graph was generated.

2.2. Data Collection

The research data were obtained from the China National Knowledge Infrastructure database (CNKI; https://www.cnki.net/). CNKI was selected because it is the largest mainland Chinese journal full-text database in the world and covers almost all kinds of research disciplines. Advanced search criteria were set, with the search topic set as “档案数字化”. The search time range is set from January 1995 to September 2022. The earliest literature on the subject in the databases dates back to 1995. It is for this reason that we set the time frame of this review. Furthermore, taking into account the potential impact and the quality of the relevant studies in this field, the review of Chinese publications was restricted to journals listed in the China Social Sciences Citation Index (CSSCI), Chinese Core Journals, SCI, EI and CSCD (Chinese Science Citation Database). In this step, only core journals were considered to maintain academic authority; articles published in general journals were excluded. It should be noted that the journals on CNKI are divided into core journals and general journals, among which core journals are formally rated by Chinese research institutions with academic authority. Therefore, articles published in such journals have more academic reference value [21]. A total of 1869 Chinese language articles were retrieved; however, some literature was irrelevant to the digitalization of archives. To ensure the accuracy of literature selection, manual judgment and screening were conducted, resulting in the removal of articles without authors and those unrelated to the research content, as well as notices, announcements, newspapers, conferences, news updates, and government documents. Ultimately, 1267 articles were obtained for analysis.

3. Results Analysis

3.1. Overall Trend Analysis of Publication Volume

The number of publications on digital archiving in China before October 2022 was collected and plotted to show the annual variation in publication volume (Figure 2). Figure 2 shows that research on digital archiving first appeared in the mid-1990s, with yearly publication counts first rising and then falling. Based on the changes in annual publication volume, the research on digital archiving in China can be roughly divided into three stages: 1995–2004, 2005–2014, and 2015–2022. During the initial stage (1995–2004), the annual publication count was relatively low and always below the average of 47 articles per year, but showed an upward trend each year. The period from 2005–2014 was characterized by rapid growth, during which the total number of publications showed an overall upward trend, with 2014, 2013, and 2011 having the highest publication counts. The final stage (2015–2022) was marked by a decline in publication volume, with a gradual decrease in the number of publications each year.

3.2. Author Analysis

Thanks to the efforts of many scholars, China has made significant progress in the field of digital archiving. To further understand the contributions made by Chinese scholars in this area, we conducted a statistical analysis of the publication volume of authors in the field of digital archiving in China from 1995–2022 and identified the most representative authors, their affiliations, and research directions (see Table 1). The analysis revealed that: (1) the leading authors in terms of publication volume were Zhaoyu Zhang, Xianjie Bian, Yali Luo, and Weihong Lin, but their affiliations were relatively scattered, indicating that no core research institutions have yet been formed in the field; (2) in terms of the research directions of representative scholars, there is an obvious interdisciplinary trend in current digital archiving research, which mainly focuses on new fields that fuse computer software and applications, archives and museums, and library and information science, among other disciplines.
To more intuitively display the collaborative relationships among authors, this study utilized CiteSpace (6.2.1, Copyright (c) 2023 Chaomei Chen) to construct and illustrate a network of author collaborations from 1995 to 2022 (see Figure 3).
The core group of authors in the field of digital archival research was defined using Price’s Law, with the following formula: $N_{\min} = 0.749 \times \sqrt{N_{\max}}$, where $N_{\min}$ represents the minimum number of papers published by a core author and $N_{\max}$ represents the maximum number of papers published by a single author in the field. An author who has published at least $N_{\min}$ papers in the field is considered a core author [22]. In the field of digitalization research of Chinese archives, Zhaoyu Zhang has the highest number of published papers, with 10 papers. With $N_{\max} = 10$, the formula gives $0.749 \times \sqrt{10} \approx 2.37$, which rounds up to 3, so authors who have published three or more papers are considered core authors; a total of 43 individuals meet this criterion. When drawing the network diagram of author collaborations, the names of the core authors are displayed. As shown in the figure, the author collaboration network can be interpreted as the collaborative relationships between scholars who are representative of the research on the digitalization of archives. The network consists of 1363 nodes and 576 edges, that is, 1363 authors and the collaboration links among them. The number of edges is less than the number of nodes, and there are many isolated points in the network, which indicates that the collaborations among the authors are relatively loose. The size of each node represents the number of articles published by the author: the larger the node, the more articles the author has published. The thickness of each edge represents the strength of co-occurrence, with thicker edges indicating more frequent and closer collaborations among the authors [23]. It can be seen that a main collaboration network has formed, represented by scholars such as Weidong Zhang, Ping Wang, and Hongying Zhao.
Due to differences in research fields, other networks also exist within the overall collaboration network, such as the sub-networks formed by Liu Yong and Yanping Wu, or Yongsheng Chen and Sixin Xue. Research shows that inter-group collaboration contributes to publication of high-quality papers [24]. Therefore, scholars in the field of archival digitization should increase communication and strengthen collaboration to produce more valuable research.
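The Price’s Law threshold described in this section can be reproduced with a few lines of Python. This is a minimal sketch rather than the authors’ actual procedure: the function names are illustrative, the author names and paper counts in the example are hypothetical, and rounding the threshold up to the next whole paper is an assumption that matches the value of 3 reported in the text.

```python
import math


def price_core_threshold(n_max: int) -> int:
    """Minimum paper count for a core author: 0.749 * sqrt(N_max), rounded up."""
    return math.ceil(0.749 * math.sqrt(n_max))


def core_authors(paper_counts: dict) -> list:
    """Return the authors whose paper count reaches the Price's Law threshold."""
    threshold = price_core_threshold(max(paper_counts.values()))
    return sorted(name for name, n in paper_counts.items() if n >= threshold)


# N_max = 10 is the figure reported in the text.
threshold = price_core_threshold(10)

# Hypothetical counts, for illustration only.
example_counts = {"Author A": 10, "Author B": 3, "Author C": 2}
example_core = core_authors(example_counts)

print(threshold)
print(example_core)
```

With the reported maximum of 10 papers, the unrounded value is about 2.37, so any author with three or more papers passes the threshold.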

3.3. Analysis of Research Institutions

We counted how often each publishing institution appeared across the 1267 articles. Three institutions published 10 or more articles, and 16 institutions published five or more. In archival digitization research, both universities and archives play an extremely important role. Within colleges and universities, the schools of management and history account for the largest share of publishing institutions. Table 2 lists representative research institutions. The top five are the Information Resource Management School of Renmin University of China, the Zhejiang Provincial Archives Bureau, the National Archives Administration, the Second Historical Archives of China, and the School of Management of Jilin University.

3.4. Research Content Analysis

3.4.1. Frequency Keyword Count

To conduct an in-depth analysis of the research paradigm, hot topics, and evolutionary trends in the field of digital archiving, we used the Bicomb software to extract the keywords from each article. We then processed these keywords by merging semantically consistent ones, such as “digital archives”, “archival digital construction”, “library collection digitalization”, and “digitization”, which were merged into “digital archiving”. We then used the high-frequency keywords as a sample to analyze the research themes and content at each stage, and for each of the three stages we counted the keywords with a frequency of three or more together with their cumulative percentages (see Table 3). The number of high-frequency keywords rose from 27 in the first stage to 239 in the second and then fell to 136 in the third, which remains well above the first-stage level and indicates that the research content related to digital archiving has broadened considerably since the initial stage.
To further analyze and compare the changes in research content at different stages, we conducted a word frequency count and sorted the keywords. We then selected the top 15 keywords by frequency for display and analysis. Some high-frequency keywords are shown in Table 4, where the rightmost column displays the high-frequency keywords and their frequencies aggregated across the three stages from 1995 to 2022.
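The keyword processing described above (merging synonymous keywords, counting frequencies, keeping those with a frequency of three or more, and listing the top 15) can be sketched as follows. This is an illustrative approximation of what was done in Bicomb, not its implementation: the synonym table contains only the four variants named in the text, the sample records are invented, and counting each keyword once per article is an assumption.

```python
from collections import Counter

# Variants the text says were merged into "digital archiving".
SYNONYMS = {
    "digital archives": "digital archiving",
    "archival digital construction": "digital archiving",
    "library collection digitalization": "digital archiving",
    "digitization": "digital archiving",
}


def normalize(keyword: str) -> str:
    """Lower-case a keyword and map known variants to one canonical form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def high_frequency_keywords(records, min_freq=3, top_n=15):
    """Count keywords once per article, then keep the most frequent ones."""
    counts = Counter()
    for keywords in records:
        counts.update({normalize(k) for k in keywords})
    frequent = [(k, n) for k, n in counts.most_common() if n >= min_freq]
    return frequent[:top_n]


# Invented keyword lists, one per article, for illustration only.
sample_records = [
    ["Digital archives", "scanner"],
    ["digitization", "OCR"],
    ["archival digital construction", "scanner"],
    ["library collection digitalization", "scanner"],
]

top_keywords = high_frequency_keywords(sample_records)
print(top_keywords)
```

Running the same function separately on the records of each stage (1995–2004, 2005–2014, 2015–2022) would give per-stage lists comparable in form to those in Tables 3 and 4.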
To facilitate the comparison of characteristics across the three stages, we sorted the high-frequency keywords in each stage and created a Venn diagram. From Figure 4, it can be seen that “archive digitization”, “digital archive”, “archive”, and “archives” are the common high-frequency keywords across all three stages. Excluding the common keywords, the first and second stages share the keywords “archive information resources”, “electronic documents”, and “archival information”; the second and third stages share the keywords “digitization of paper archives”, “archive management”, and “archival work”; and the first and third stages share no further keywords, indicating a significant shift in research focus over time. Compared to the first stage (1995–2004), the second stage (2005–2014) introduced new keywords such as “digitization of paper archives”, “archive management”, and “archival work”. Compared to the second stage, the third stage (2015–2022) introduced new keywords such as “archive industry”, “digital work”, and “digital transformation”. Overall, archives, electronic files, digitization of paper archives, and archival management have consistently been the focus of research in the field of digital archiving, as evidenced by the high-frequency keywords across the three stages from 1995–2022.
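The Venn-diagram comparison amounts to set intersections and differences over the per-stage keyword lists. The sketch below uses only the keywords named in this paragraph, not the full top-15 lists from Table 4, so the sets are a partial, illustrative reconstruction; the variable names are assumptions.

```python
# Keywords named in the text; the full per-stage lists are in Table 4.
common = {"archive digitization", "digital archive", "archive", "archives"}
first_and_second = {"archive information resources", "electronic documents",
                    "archival information"}
second_and_third = {"digitization of paper archives", "archive management",
                    "archival work"}
third_only = {"archive industry", "digital work", "digital transformation"}

stage1 = common | first_and_second
stage2 = common | first_and_second | second_and_third
stage3 = common | second_and_third | third_only

# Regions of the three-set Venn diagram.
shared_by_all = stage1 & stage2 & stage3
only_1_and_2 = (stage1 & stage2) - stage3
only_2_and_3 = (stage2 & stage3) - stage1
only_1_and_3 = (stage1 & stage3) - stage2

print(sorted(shared_by_all))
print(sorted(only_1_and_2))
print(sorted(only_2_and_3))
print(sorted(only_1_and_3))  # empty: stages 1 and 3 share nothing else
```

The empty last region corresponds to the observation that the first and third stages have no shared keywords beyond the four common ones.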
Based on the keywords and research hotspots of each stage, the three stages of Chinese digital archiving research can be named the Physical-to-Digital Conversion Phase, the Digital Resource Conversion Phase, and the Management Professionalization Phase. In the Physical-to-Digital Conversion Phase, the academic community focused on using tools or technologies such as digital cameras, scanners, and OCR to convert physical archives into electronic files, preprocess these files, archive them, and store them in databases. In the Digital Resource Conversion Phase, integrating existing archival resources, conducting big data analysis and deep mining, and fully utilizing the latent value of archival data to enhance archival management and public services were the key focus areas of research in the field of digital archiving. In the Management Professionalization Phase, with the widespread application of emerging information technologies, the archival work environment, objects, and contents underwent profound changes, necessitating a comprehensive review of the national and social environment of the archival system. This has accelerated the transformation and innovation of archival work concepts, models, and methods, which is also the focus of academic attention.

3.4.2. Co-Occurrence Analysis of High-Frequency Keywords

Keywords represent the essence of a research paper. In this study, VOSviewer (1.6.18, Copyright (c) 2009-2022 Nees Jan van Eck and Ludo Waltman) was used to extract the keywords from the 1267 core papers to further investigate the latest developments in digital archival research. To prevent an overly large network structure, the minimum link strength of the network was set to 5. The resulting co-occurrence graph of the keywords is shown in Figure 5, where each node represents a keyword (research topic) and each edge represents a direct relationship (co-occurrence) between keywords. Larger nodes indicate a higher frequency of occurrence of the corresponding keyword, while thicker edges indicate a closer relationship between nodes. The dense web of links in the figure shows that the high-frequency keywords frequently appear together rather than forming isolated topics.
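The idea behind the co-occurrence network and its minimum link strength can be illustrated in plain Python. This is a conceptual sketch, not VOSviewer’s implementation: link strength is assumed here to be the number of articles in which two keywords appear together, and the sample records are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(records, min_strength=5):
    """Count keyword pairs per article and keep pairs at or above min_strength."""
    weights = Counter()
    for keywords in records:
        # Sort so that each unordered pair is always stored the same way.
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_strength}


# Invented keyword lists, one per article, for illustration only.
sample_records = (
    [["archive", "digitization", "scanner"]] * 5
    + [["archive", "OCR"]] * 2
)

edges = cooccurrence_edges(sample_records, min_strength=5)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

In this example the three pairs that co-occur in five articles are kept, while the pair that co-occurs only twice is dropped, which is how a threshold of 5 thins out a large network.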

3.4.3. Clustering Analysis

To further explore the hot topics in digital archival research, this study used CiteSpace to generate a keyword clustering map, as shown in Figure 6. The specific operational process involved setting parameters in CiteSpace. Firstly, the time range was set from 1995 to 2022, with yearly intervals for time slicing. Secondly, network pruning was conducted using the Pathfinder algorithm, then Pruning Sliced Network, and finally Pruning The Merged Network, to prevent an overly large network structure. Finally, the LLR algorithm (log-likelihood ratio) was used to generate the co-occurrence clustering map of keywords shown in Figure 6. In the map, S = 0.9604 (silhouette coefficient) and Q = 0.872 (modularity). A Q value greater than 0.3 (empirical value) indicates that the category structure is significant, while an S value greater than 0.7 means that the clustering is efficient and convincing [23]. As CiteSpace cannot automatically determine the optimal number of categories, manual reading was applied to determine that 10 categories gave the best result, with large differences between different themes and high similarity among keywords within the same theme. The software was therefore set to generate 10 major clusters, which display the top hot topics in digital archival research. Generally, the order of the clusters reflects the popularity of the corresponding topic. From the figure, these topics are: #0 Electronic files; #1 Archives Department; #2 Retention Period; #3 Digitalization; #4 Archives Bureau; #5 Collection Optimization; #6 Archives Management; #7 Outsourcing; #8 Archives; #9 Historical Archives. For ease of analysis, this study extracted information on each cluster, which is presented in Table 5.
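To make the modularity value Q more concrete, the following sketch computes the standard Newman modularity for a small unweighted graph with a given partition. It is an assumption that CiteSpace’s reported Q follows this standard definition; the toy graph and cluster labels are invented, and only the 0.3 threshold comes from the text.

```python
def modularity(edges, community):
    """Newman modularity: sum over clusters of L_c / m - (d_c / (2 * m)) ** 2.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to its cluster label.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends in the same cluster
    degree = {}    # total degree of the nodes in each cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity(toy_edges, toy_clusters)
print(round(q, 3), q > 0.3)
```

A partition that groups densely linked keywords together and leaves few links between groups pushes Q upward, which is why values above 0.3 are read as a significant cluster structure.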
The above clusters were generated automatically by CiteSpace on the basis of its algorithms and have a certain reference value. To further analyze and refine the research themes related to archival digitization, this study conducted manual screening on top of the software clustering. After extensive literature reading and investigation, four themes were identified: Institutional construction; Management mode; Management technology; and Archival form. Among them, institutional construction responds to the demand for archival digitization; management modes indicate the direction of archival digitization; management technologies provide the impetus for archival digitization; and archival forms bear the source of archival digitization. The relationships between these four themes are shown in Figure 7, and each theme is described in detail below.
1. Institutional Construction
Institutional construction responds to the demand for archival digitization. The goals of improving the quality of archival resource construction, the level of archival utilization services, the effectiveness of archival governance, and the degree of modernization in the process of archival institution construction correspond to the demands of various sectors of society for archival digitization. In recent years, research has focused on the construction of archival institutions such as smart government archives, smart universities, smart libraries, smart archives, and smart museums under the background of big data. Among them, in the construction of smart government, Yi Zhao conducted research on the development of the archival industry in Shanghai, proposing that the digital transformation of the archival industry should be promoted from four aspects: integration, connectivity, innovation, and leadership [25]. In terms of the construction of smart universities, Xianmei Lang pointed out that the digitization reform of university scientific research archives is a trend, and traditional university digital scientific research archives should be improved through approaches such as enriching digital resources, ensuring information security, building legal norms, and enhancing service capabilities [26]. In terms of the construction of smart libraries, museums, and archives, Xuefang Zhu analyzed the management mechanism of the convergence of library, museum, and archive services from the perspective of foreign practices and awareness, and finally proposed suggestions for the digital service integration mode for China’s libraries, museums, and archives [27].
2. Management Mode
Archival management mode indicates the direction of archival digitization. The “open, intelligent, secure, and efficient” archival management mode provides direction for future archival digitization development and catalyzes the transformation of traditional archival management modes. Currently, the informatization of management entities (institutions and personnel) is underway, and archival workers are becoming information resource managers, data workers, and information workers, who undertake responsibilities such as information organization, information analysis and mining, and the construction and improvement of management systems. The systematization of management systems involves various types of electronic file archiving management systems, digital archival resource backup management systems, digital archival room application system operation and security management systems, agency archival resource backup management systems, talent allocation and funding guarantee systems, and departmental responsibility and reward and punishment systems. The intellectualization of management methods involves research and trials that combine machine learning, knowledge graphs, data correlation, semantic networks, and other technologies for intelligent search, mining, personalized recommendation, and book publishing in China’s libraries, intelligence agencies, and other industries [28]. The integration of management systems involves the application of technologies such as big data, cloud computing, artificial intelligence, blockchain, and digital tracing, which can integrate all business systems and provide more convenient and intelligent information services more quickly, thereby maximizing the efficiency of information resource utilization [29].
3. Management Technology
Management technologies provide the impetus for archival digitization. The research and application of scientific and technological advances, such as big data, cloud computing, and artificial intelligence, provide the most fundamental and lasting driving force for the development of archival digitization and are ushering human society into the “intelligent +” era of human–machine collaboration, cross-border integration, co-creation, and sharing. “Intelligence + archives” means applying more powerful advanced technologies to archival management work, to optimize the business links of archival collection, collation, storage, retrieval, and utilization and to continuously improve the intelligence level of archival management [30]. Currently, China’s practice of intelligent archival management mainly focuses on two aspects. The first is the establishment of intelligent archival warehouses, including intelligent dense archival shelving with physical archive management as the core, and intelligent control of the archival warehouse environment and security [31,32]. The second is the intelligent integration of archival information content using digital technology, including intelligent detection of archival digitization results [33], intelligent conversion of archival information [6], and automated control of some business links of electronic file archiving and collation [34], thus improving the efficiency and level of archival management.
4. Archival Form
Archival form carries the source of archival digitization. Archival resources are the source of archival digitization, and digitized archival resources are the foundation for all archival digitization applications. In the process of digitization, from the perspective of the existing forms of archives, archives have changed from physical to electronic, to digital and, finally, to data [29]. Among them, physical archives are archives that can be identified with the naked eye or with the help of magnifying tools directly from the carrier. Electronic archives are electronic files with preservation value that are archived and saved [35], corresponding to physical archives of the traditional era. Digital archives include electronic archives and digitized copies of traditional carrier archives. The difference between electronic archives and digital archives is that the digitized copy of a traditional carrier archive belongs to a digital archive [35], but it does not belong to an electronic archive. Data is the value form of new archives. Narrowly defined data is expressed in digital form, and broadly defined data refers to all facts recorded by symbols. Scholars have analyzed the connotations of “digitization” and “datafication”. Among them [36], Ziqing Zhou believes that datafication focuses on the analysis of information content, the excavation of the intrinsic value of information, the control of the development laws of information, and the prediction of informed decision-making. Zhao Yue believes that datafication is inherent in digitization, but different from digitization. The datafication of archives is the continuation and improvement of digitization, and the significance of datafication is to transform the way of using archives, from “reading” to “analysis” [14]. In summary, datafication is the expansion and deepening of digitization, and archival datafication is a higher form of archival digitization.

4. Possible Future Research Directions

The Burst Detection algorithm in CiteSpace can reveal the cutting-edge areas and development directions of archival digitization research [37]. The burst keyword statistical graph displays the burst periods and intensities of keywords during the research period, as shown in Figure 8. Keywords with burst intensity greater than 4 include “archive”, “photo archive”, “university”, “archive bureau”, and “archival work”. Among them, “archive” has the highest burst intensity of 4.8, with a burst period from 1996 to 2005, indicating that “archive” became an explosive topic during this period. In terms of the duration of influence, “scanner” has the longest impact time, at 12 years. Looking at the three stages separately, the burst keywords for the first stage are “scanner”, “archive”, “photo archive”, and “archival document”; the burst keywords for the second stage are “paper-based archive”, “historical archive”, “scanning”, “university”, “archive bureau”, “outsourcing”, “electronic archive”, and “outsourced work”; and the burst keywords for the third stage are “big data”, “archival profession”, and “archive work”. Among them, “big data” and “archive work” are likely to remain the focus of research in the field of archival digitization for the next period.
Based on the analysis of research content and trends in China’s archives digitization field over the past 28 years, especially the past five years, together with the connotation of archives digitization and the burst keyword chart, this study concludes that big data, archives work, and digital archives will become the focus of research on archives digitization in the new era. Big data technology is an important supporting technology for the construction of digital archives, and the construction of digital archives in turn promotes the development of archives work.
1. Big Data
Big data refers to the technical architecture and processes for data collection, processing, and value extraction that apply when the data sample is large enough; that is, when data volume is large and data types are diverse [38]. Applying big data technology can enable virtual storage of archives, reduce archive management costs, make archive retrieval more convenient, facilitate electronic document management, promote professional archives management, and support the long-term preservation of archives [39]. The value of big data lies in its use, but usability problems in archival data, such as low consistency, poor timeliness, insufficient correlation, and poor accuracy, are a pain point of archival data management. As a result, archival resources mostly remain at the level of simple organization and retrieval [14], while development and mining at the content level have not yet attracted sufficient attention. Improving data usability through mechanism construction, technology optimization, and resource allocation has become a topic of general significance that is worthy of in-depth investigation [40]. This requires archive managers to keep up with the times and put science and technology first, continually learning the “full data mode” technology of the big data era, mastering cloud computing technology, and keeping pace with technological development. How to use big data technology to manage the huge amount of archival information and make archives more intelligent has therefore become a key academic concern.
2. Archives Work
Managing and developing archive information resources according to scientific principles and methods in order to serve all sectors of society is the core task and goal of archives digitization. The Chinese archives industry comprises the organizational management of the industry; archives of various levels and types; the archives rooms of government agencies, organizations, enterprises, and institutions at all levels; professional archives education; archival science and technology research; archival publicity and publishing; and international communication and cooperation in the archives field. General Secretary Xi Jinping emphasized, in an important instruction, the need to “strengthen the Party’s leadership over archives work”. Provinces and cities have launched the digital transformation of the archives industry to promote innovative development. Because the theoretical and practical exploration of archival datafication is still at a preliminary stage, uncertainties about the real demand for it, its policy support, its standards and specifications, and its technical aspects make it difficult to promote. The most important difficulties lie in policy and motivation on the one hand and talent and technology on the other. In terms of policy, there is no clear national policy direction on archival datafication, so many local archival institutions lack the initiative, enthusiasm, and motivation to explore it [14]. In terms of talent, problems include an unreasonable structure of the archives workforce, low professional skills, and weak information literacy and information mining ability [41,42].
How to carry out an effective, technology-driven digital transformation of the archives industry in the context of urban digital transformation is worth in-depth study in the foreseeable future.
3. Digital Archives
Digital archives are archival information integration and management systems that use modern information technology to collect, process, store, and manage digital archive information from archives of all levels and types. They are meant to meet the growing demand for archive information resource management and utilization in the information age and to provide public archive information services and resource utilization through various network platforms. In the context of the smart city, the construction of smart archives is gaining greater attention. Smart archives are a higher stage of development of digital archives [43]. They overcome the shortcomings of digital archives, whose management objects are relatively isolated, whose management mode still relies on manual work, and which have not yet achieved intelligent collection and deep mining of archival resources. In smart archives, the management objects are comprehensive and diversified, and real-time sensing, aggregation, and mining of various kinds of data enable efficient aggregation and value-added use of resources and improve the functions of archives in all respects [44]. As a new form of archives in the future, smart archives are a new concept and practice for the transformation, innovation, and sustainable development of archives [45]. Therefore, the construction of digital archives has again become an important issue in the study of archival digitization.

5. Conclusions

The objective of this study has been to quantitatively examine and visually analyze representative figures, publishing institutions, research hotspots, and their evolution in the field of archival digitization research in China in the period between 1995 and 2022. For this, bibliometric analysis has been carried out on 1267 research articles in the CNKI database.
Analyzing the research directions of representative scholars and publishing institutions shows a clear trend of interdisciplinary research in current digital archives research, mainly focused on cross-disciplinary fields such as archives and museums, computer software and applications, and library and information science with digital libraries. However, cooperation between authors and institutions is relatively loose, and a core research institution has not yet formed. According to changes in annual publication volume, China’s digital archives research can be roughly divided into three stages, named after the keywords and research hotspots of each stage: the Physical-to-Digital Conversion Phase (1995–2004), the Digital Resource Conversion Phase (2005–2014), and the Management Professionalization Phase (2015–2022).
In terms of research content, major research areas, such as archives, electronic documents, digitization of paper archives, and archival management, have always maintained high attention. There are four main thematic directions in China’s digital archives research field: Institutional construction; Management mode; Management technology; and Archival form. Among them, archival institutional construction responds to the demand for archival digitization; management modes indicate the direction of archival digitization; management technologies provide the impetus for archival digitization; and archival forms bear the source of archival digitization.
Based on the analysis of research content in the digital archives field, the connotation of archival digitization, and the burst keyword chart, this study concludes that big data, archival work, and digital archives will become important research directions in future digital archives research. Big data technology is an important supporting technology for the construction of digital archives, while the construction of digital archives promotes the development of archival work.

Author Contributions

Conceptualization, L.Q. and Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, L.Q., Y.Z. and J.Z.; formal analysis, L.Q. and Y.Z.; investigation, L.Q. and Y.Z.; resources, L.Q. and Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, L.Q., J.Z. and Y.Z.; visualization, Y.Z.; supervision, L.Q. and J.Z.; project administration, L.Q.; funding acquisition, J.Z. and L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Project of Cultivation for Young Top-notch Talents of Beijing Municipal Institutions (No. BPHR202203235) and the Program for Promoting the Connotative Development of BISTU (No. 5026010961).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available at https://drive.google.com/drive/folders/1wfBS-krJSTVf8yXwYOxt4y6jM7nogTCV?usp=share_link (accessed on 26 March 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Barata, K. Archives in the Digital Age. J. Soc. Arch. 2004, 25, 63–70.
  2. Sun, Y.; Qin, F. Thoughts on Digital Management of Engineering Archives. Ind. Eng. Innov. Manag. 2022, 5, 58–62.
  3. Meng, T.; Hui, L. Application of Information Technology in Digital Archives Management. In Proceedings of the 2017 International Conference on Education and E-Learning, ACM, Bangkok, Thailand, 2 November 2017; pp. 81–83.
  4. Noev, N.; Todorov, T.; Bogdanova, G. Digitization and 3D Scanning of Historical Artifacts. Digit. Present. Preserv. Cult. Sci. Herit. 2013, 3, 133–138.
  5. Bukhari, S.S.; Kadi, A.; Jouneh, M.A.; Mir, F.M.; Dengel, A. AnyOCR: An Open-Source OCR System for Historical Archives. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; Volume 1, pp. 305–310.
  6. Cheng, Z. An Analysis of the Application of Artificial Intelligence Technology in Archive Digitization Work. China Arch. 2021, 570, 64–65.
  7. Sun, Y. Construction and Research of Digital Archives Cloud Platform Based on Big Data Management. J. Phys. Conf. Ser. 2021, 1881, 042094.
  8. Ming, X.; Xiaohua, Q.; Jie, H.; Guojun, L.; Zhaohui, F. Comparison of Software Tools for Mapping Knowledge Domain. Libr. J. 2013, 32, 61–69.
  9. Kumar, A.N.; Miga, M.I.; Pheiffer, T.S.; Chambless, L.B.; Thompson, R.C.; Dawant, B.M. Persistent and Automatic Intraoperative 3D Digitization of Surfaces under Dynamic Magnifications of an Operating Microscope. Med. Image Anal. 2015, 19, 30–45.
  10. Müller, L.; Tipold, A.; Ehlers, J.P.; Schaper, E. Digitalisierung der Lehre?–Begleitende Bedarfsanalyse zur Implementierung von Vorlesungsaufzeichnungen in der tiermedizinischen Ausbildung. Tierärztl. Prax. Ausg. K Kleintiere Heimtiere 2019, 47, 164–174.
  11. Li, P.; Li, J. Operations of Museum Digitization: Case Studies Comparing China and the U.S. J. Supply Chain Oper. Manag. 2019, 17, 56–95.
  12. Tian, Y.; Zhang, L.; Wang, X. Knowledge Network and Visualization Analysis of Image Archive Digitization Topic Research. In Proceedings of the 2021 the 5th International Conference on Virtual and Augmented Reality Simulations, Association for Computing Machinery, New York, NY, USA, 11 December 2021; pp. 67–72.
  13. Chen, Y.; Su, H. The Value and Problems of Digital Preservation for Historical Documents in China. Proc. Doc. Acad. 2017, 4, 14.
  14. Yue, Z. Prospects for Archives Datafication in Big Data Era: Significance and Dilemma. Arch. Sci. Study 2019, 52–60.
  15. Jiang, S.; Zihan, L. Research on Innovation of Archives Management Methods in the “Intelligence +” Era. Arch. Sci. Study 2021, 179, 54–59.
  16. Liang, Z.; Xianxue, M. The Comparative Study on Bibliometric Method and Content Analysis Method. Libr. Work Study 2013, 208, 64–66.
  17. Moral-Muñoz, J.A.; Herrera-Viedma, E.; Santisteban-Espejo, A.; Cobo, M.J. Software Tools for Conducting Bibliometric Analysis in Science: An up-to-Date Review. Prof. Inf. 2020, 29.
  18. Chen, C. CiteSpace II: Detecting and Visualizing Emerging Trends and Transient Patterns in Scientific Literature. J. Am. Soc. Inf. Sci. Technol. 2006, 57, 359–377.
  19. Chen, C. Science Mapping: A Systematic Review of the Literature. J. Data Inf. Sci. 2017, 2, 1–40.
  20. Zeng, S.; Yang, H. A Bibliometric and Visualization Analysis of Knowledge Mapping in Digital Economy Research, 1992–2022. Sustainability 2023, 15, 6565.
  21. Li, J.; Hu, Z.; Pan, L. Analysis of School Support: Systematic Literature Review of Core Chinese- and English-Language Journals Published in 2000–2021. Front. Psychol. 2022, 13, 933695.
  22. Lin, L.; Tao, S.; Chunhua, C. The Knowledge Map and Hotspot Tendency of Team Conflict Management Based on Bibliometric Analysis from 1996 to 2020. Chin. J. Manag. 2021, 18, 148–158.
  23. Chen, Y.; Chen, C.; Hu, Z.; Wang, X. Principles and Applications of Analyzing a Citation Space; Science Press: Beijing, China, 2014; ISBN 978-7-03-041752-7.
  24. Zhiguo, S.; An, A.; Dehu, Y.; Qing, H. Bibliometric Analysis and Prospect of Research on Old Residential Community Renovation Based on CiteSpace. Urban Dev. Stud. 2021, 28, 5–10.
  25. Yi, Z. Research on the Development of Archives Undertaking Under the Background of City Digital Transformation in Shanghai. Arch. Sci. Study 2022, 73–78.
  26. Xianmei, L.; Tao, Y. The Driving Forces and Implementation Methods of Digital Reform of Scientific Research Archives in Universities. Zhejiang Arch. 2021, 58–59.
  27. Xuefang, Z. Research on Digital Service Convergence of Libraries-Museums-Archives Based on National Conditions. Inf. Stud. Appl. 2021, 44, 1–7+52.
  28. Meijing, Y.; Yongheng, Z.; Jia, L.; Hui, W. Analysis of Artificial Intelligence Research Evolution of Library and Information Field in China. Sci. Technol. Manag. Res. 2020, 40, 155–161.
  29. Haibin, D. Reasons and Connotations of the Foundamental Change of the Archives Informatization Revolution. Arch. Manag. 2022, 256, 5–13.
  30. Hang, C.; Jing, W. An Initial Exploration of the Trend of Archives Management Transformation in the Perspective of “Intelligence+”. China Arch. 2019, 553, 72–73.
  31. Jin, Z.; Li, C.; Jianzhou, W.; Ying, Z.; Ping, S. Practice of Construction of an Integrated Control System for Archive Warehouses. China Arch. 2019, 547, 64–65.
  32. Xiaofeng, W. A Feasibility Study on the Intelligent Management Robots for Archive Warehouses. Arch. Constr. 2019, 363, 55–56+59.
  33. Yuanfei, S.; Lei, J.; Zhaoli, Y.; Hongwie, Y.; Zhongxiu, Y. A Study and Practice of Intelligent Quality Inspection Solution for Digital Achievements. China Arch. 2018, 536, 62–64.
  34. Jiangxia, L. A Study on Quality Control of Analog Audio-visual Archives in Digitization. Arch. Sci. Study 2018, 160, 101–106.
  35. WanMei, Z.; Zhen, Z.; QiongHui, R. Concept Analysis of Electronic Records, Archival Electronic Records and Digital Archives. Shanxi Arch. 2019, 246, 10–18.
  36. Ziqing, Z. Data-Based Survival And Development of Archives in the Age of Big Data. Zhejiang Arch. 2022, 493, 45–48.
  37. Jie, H.; Xiaobin, J.; Hanbing, L.; Rui, X.; Mingxuan, N. Research on the Development of Human Geography in Nanjing University at the Past 40 Years Based on Bibliometric Analysis. Mod. Urban Res. 2021, 2–10.
  38. Jinhong, W.; Fei, Z.; Xiufang, J. Big Data: Opportunities, Challenges and Strategies of Enterprise Competitive Intelligence. J. Intell. 2013, 32, 5–9.
  39. Xianhong, H. Analysis on the Management and Optimization of University Library Archives Under the Background of Big Data. Libr. Work Study 2017, 262, 105–107.
  40. Bo, J.; Feng, Z.; Peng, Y. Research of Archival Data: Review and Domain. Inf. Sci. 2021, 39, 187–193.
  41. Zhen, H.; Wen, Y.; Sihui, T.; Wenming, X. New Normality and Practical Development of Education of Archival Science in the Era of Big Data. Arch. Sci. Study 2016, 148, 117–123.
  42. Shuangshuang, M.; Tongzhu, X. Digital Transformation of Archives Work under the Background of Digital China Construction: Connotation, Dilemma and Approach. Arch. Sci. Study 2022, 189, 115–121.
  43. Lin, Z. Analysis on the Causes of the Rise of Smart Archives in my country. Archives 2016, 268, 56–59.
  44. Li, N.; Jiayong, P. Constructing Service-oriented Smart Archives in China. Arch. Sci. Study 2018, 161, 89–96.
  45. Jiali, M.; Gang, X. The Fusion of Wisdom and Wisdom: The Development Vision of Smart Archives. J. Southwest Minzu Univ. Soc. Sci. 2019, 40, 227–231.
Figure 1. The construction process of knowledge graph for Chinese archives digitalization research.
Figure 2. Publication volume statistics from 1995 to 2022.
Figure 3. Co-authorship network map for Chinese archives digitalization.
Figure 4. Venn diagram of keywords in three stages.
Figure 5. Co-occurrence network map of keywords in Chinese archives digitalization research.
Figure 6. Clustered co-occurrence network map of keywords in Chinese archives digitalization research.
Figure 7. Theme relationship in Chinese archives digitalization research.
Figure 8. Keyword emergent analysis in Chinese archives digitalization field.
Table 1. Partial representation of scholars and their publication volume statistics.

| Ranking | Author | Frequency | Percentage/% | Institution | Research Direction |
| --- | --- | --- | --- | --- | --- |
| 1 | Zhaoyu Zhang | 10 | 0.789 | Yancheng Teachers University | Archives and Museums; Administrative Law and Local Governance; Computer Software and Computer Applications |
| 2 | Xianjie Bian | 7 | 0.553 | Yancheng Teachers University | Archives and Museums; Computer Software and Computer Applications; Library and Information Science with Digital Libraries |
| 3 | Yali Luo | 6 | 0.474 | Zhejiang University | Archives and Museums; Higher Education; Civil and Commercial Law |
| 3 | Weihong Lin | 6 | 0.474 | Zhejiang Provincial Archives | Archives and Museums; Computer Software and Computer Applications; Administration and National Governance |
| 5 | Xueguang Li | 5 | 0.395 | Changchun City Archives | Archives and Museums; Mathematics; Computer Software and Computer Applications |
| 5 | Hua Lin | 5 | 0.395 | Yunnan University | Archives and Museums; Modern Chinese History; Administration and National Governance |
| 5 | Ping Wang | 5 | 0.395 | China People’s Public Security University | Library and Information Science with Digital Libraries; Archives and Museums; Computer Software and Computer Applications |
| 5 | Yangxin Li | 5 | 0.395 | Sun Yat-Sen University | Archives and Museums; Administrative Law and Local Governance; Research on Health Policies and Regulations |
| 5 | Senlin Lu | 5 | 0.395 | Jiangxi Science and Technology Normal University | Archives and Museums; Higher Education; Tourism |
| 5 | Hongying Zhao | 5 | 0.395 | Jilin University | Archives and Museums; Library and Information Science with Digital Libraries; Computer Software and Computer Applications |
| 5 | Zhongying Yang | 5 | 0.395 | Beijing Municipal Archives Bureau | Archives and Museums; Computer Software and Computer Applications |
Table 2. Research institutions and frequency statistics.

| Ranking | Institution Name | Frequency | Percentage/% |
| --- | --- | --- | --- |
| 1 | School of Information Resource Management, Renmin University of China | 23 | 1.7437 |
| 2 | Zhejiang Provincial Archives | 13 | 0.9856 |
| 3 | National Archives Administration of China | 11 | 0.8340 |
| 4 | The Second Historical Archives of China | 9 | 0.6823 |
| 5 | School of Management, Jilin University | 7 | 0.5307 |
| 6 | Department of Library, Information and Archives, Shanghai University | 6 | 0.4549 |
| 7 | School of Public Administration, Yunnan University | 6 | 0.4549 |
| 8 | The First Historical Archives of China | 6 | 0.4549 |
| 9 | School of Public Administration, Sichuan University | 6 | 0.4549 |
| 10 | Liaoning Provincial Archives | 6 | 0.4549 |
Table 3. Number and proportion of keywords with a frequency greater than 2.

| Phase | 1995–2004 | 2005–2014 | 2015–2022 | 1995–2022 |
| --- | --- | --- | --- | --- |
| Quantity | 27 | 239 | 136 | 288 |
| Cumulative Percentage (%) | 41.291 | 60.352 | 48.166 | 62.213 |
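The two rows of Table 3 can be derived from a keyword frequency table. The following is a minimal Python sketch, assuming that “cumulative percentage” means the share of all keyword occurrences accounted for by keywords whose frequency exceeds the threshold; the paper does not spell out this definition. The function name and the toy counts are hypothetical and are not the study's data.

```python
from collections import Counter


def high_frequency_summary(freqs, threshold=2):
    """Count keywords with frequency > threshold and their share of all occurrences.

    Returns a tuple (quantity, cumulative_percentage).
    """
    total = sum(freqs.values())
    kept = [n for n in freqs.values() if n > threshold]
    share = 100.0 * sum(kept) / total if total else 0.0
    return len(kept), round(share, 3)


# Toy counts for illustration only (not from the CNKI sample).
toy = Counter({"archive digitization": 6, "scanner": 3, "database": 2, "photo archive": 1})
print(high_frequency_summary(toy))  # (2, 75.0)
```

Running the same function once per phase and once for the whole period would yield one column of the table each, under the stated assumption about the percentage.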
Table 4. Partial high-frequency keywords.

| Ranking | 1995–2004 | F | 2005–2014 | F | 2015–2022 | F | 1995–2022 | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Archive digitization | 89 | Archive digitization | 619 | Archive digitization | 218 | Archive digitization | 873 |
| 2 | Archive | 17 | Archive | 69 | Archive | 33 | Digital archive | 94 |
| 3 | Digital archive | 12 | Archives | 59 | Archive work | 30 | Archive | 87 |
| 4 | Archives | 10 | Electronic files | 53 | Archive management | 29 | Archives | 81 |
| 5 | Archive information resources | 8 | Digitization of paper records | 52 | Digital archive | 28 | Electronic files | 55 |
| 6 | Photo archive | 6 | Archives bureau | 51 | Archive industry | 22 | Digitization of paper archives | 50 |
| 7 | Electronic files | 6 | Paper archives | 45 | Digitization of paper archives | 22 | Paper archives | 45 |
| 8 | Database | 5 | Digital archive | 42 | Archives | 22 | Archive digital management | 43 |
| 9 | Archive information | 5 | Archive management | 35 | Digital work | 15 | Archive information resources | 41 |
| 10 | Digital management | 5 | Archive digital management | 35 | Digital processing | 14 | Archive management | 39 |
| 11 | Archive database | 5 | Original archives | 32 | Digital archive resources | 14 | Archive information | 36 |
| 12 | Electronic file archiving | 4 | Archive information | 31 | Electronic archives | 14 | Catalog database | 31 |
| 13 | Scanner | 4 | Archive work | 29 | Digital transformation | 13 | Archives bureau | 30 |
| 14 | Archive department | 4 | Catalog database | 29 | Digital archives | 12 | Digital processing | 30 |
| 15 | Archive management software | 3 | Archive information resources | 28 | University | 12 | Digital archives | 30 |
We use the symbol F to denote the frequency of occurrence of a given keyword in each of the four phases.
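Comparing keywords across stages, as in the Venn diagram of Figure 4, amounts to set operations on each phase's keyword list. The sketch below is a minimal Python illustration that uses only the top 15 keywords per phase transcribed from Table 4, so its results describe this partial list and not the full keyword sets behind Figure 4. The variable and function names are assumptions, and keywords are compared as exact strings (so “Electronic files” and “Electronic archives” count as different).

```python
# Top-15 keywords per phase, transcribed from Table 4.
phase_keywords = {
    "1995-2004": {
        "Archive digitization", "Archive", "Digital archive", "Archives",
        "Archive information resources", "Photo archive", "Electronic files",
        "Database", "Archive information", "Digital management",
        "Archive database", "Electronic file archiving", "Scanner",
        "Archive department", "Archive management software",
    },
    "2005-2014": {
        "Archive digitization", "Archive", "Archives", "Electronic files",
        "Digitization of paper records", "Archives bureau", "Paper archives",
        "Digital archive", "Archive management", "Archive digital management",
        "Original archives", "Archive information", "Archive work",
        "Catalog database", "Archive information resources",
    },
    "2015-2022": {
        "Archive digitization", "Archive", "Archive work", "Archive management",
        "Digital archive", "Archive industry", "Digitization of paper archives",
        "Archives", "Digital work", "Digital processing",
        "Digital archive resources", "Electronic archives",
        "Digital transformation", "Digital archives", "University",
    },
}


def shared_by_all(groups):
    """Keywords present in every phase (the centre of the Venn diagram)."""
    sets = list(groups.values())
    return set.intersection(*sets) if sets else set()


def unique_to(groups, name):
    """Keywords that appear in one phase and in no other phase."""
    others = set().union(*(s for k, s in groups.items() if k != name))
    return groups[name] - others


print(sorted(shared_by_all(phase_keywords)))
# ['Archive', 'Archive digitization', 'Archives', 'Digital archive']
print(sorted(unique_to(phase_keywords, "2015-2022")))
```

Because matching is by exact string, near-synonyms produced by translation are treated as distinct keywords; merging them first would change the overlaps.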
Table 5. Clustering information table of keywords in archives digitalization research.

| Type | Cluster Code | Cluster Name | Main Nodes |
| --- | --- | --- | --- |
| Institutional construction | #1 | Archives department | University, Grassroots archives, Educational administration departments, Teaching archives, Archive work, Archive utilization, Professional ethics |
| Institutional construction | #4 | Archives bureau | District archives, County-level archives, Sichuan province, Lishui city, Public archives, Livelihood archives, Medical insurance archives, Promoting development |
| Management mode | #6 | Archives management | Business processes, Integration, Digitization, Electronic school affairs, Quality control, Professional growth, Archive data, Exploration design |
| Management mode | #2 | Retention period | Filing scope, Work standards, Archivists, Storage formats, Cold storage rooms, Protective materials, Nitrogen-filled packaging, Plastic bags |
| Management mode | #5 | Collection optimization | Optimization, Collection of archives, Value appraisal, Preservation value, Preservation and utilization, Archive management, Paperless, Read-only CD |
| Management mode | #7 | Outsourcing | Service outsourcing, Insourcing, Control, Security risks, Measures, Data backup, Benefits, Crowdsourcing, Outsourcing parties |
| Management technology | #3 | Digitization | Artificial intelligence, Big databases, Experimental design, Data conversion, Technical conditions, Development practice, Development and utilization, Shared platforms |
| Archival form | #8 | Archives | Multiple carriers, Storage, Agricultural archives, Books, Storage, Multimedia, Hydrological yearbook, Classification system |
| Archival form | #0 | Electronic files | Digital age, Document preservation, Information portal, Information retrieval, Information storage, Inevitable trend, Cloud technology, Offsite backup |
| Archival form | #9 | Historical archives | Ancient books, Protection strategies, Preprocessing, Security control, Certificate value, Principles, People-oriented, Information archives |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, Y.; Zhang, J.; Qi, L. Analysis of Hotspots and Trends in Digitalization Research of Chinese Archives Based on Bibliometrics. Sustainability 2023, 15, 7679. https://doi.org/10.3390/su15097679
