Impact of the Coronavirus Pandemic on Science and Society: Insights from Temporal Bibliometric Networks

A global event such as the COVID-19 crisis presents new, often unexpected responses that are fascinating to investigate from both scientific and social standpoints. Despite several documented similarities, the coronavirus pandemic is clearly distinct from the 1918 flu pandemic in terms of our exponentially increased, almost instantaneous ability to access and share information, offering an unprecedented opportunity to visualise the rippling effects of global events across space and time. Personal devices provide “big data” on people’s movement, the environment and economic trends, while access to the unprecedented flurry of scientific publications and media posts provides a measure of the response of the educated world to the crisis. Most bibliometric (co-authorship, co-citation, or bibliographic coupling) analyses ignore the time dimension, but COVID-19 has made it possible to perform a detailed temporal investigation into the pandemic. Here, we report a comprehensive network analysis based on more than 20,000 published documents on viral epidemics, authored by over 75,000 individuals from 140 nations during the past year of the crisis. In contrast to the 1918 flu pandemic, access to published data over the past two decades enabled a comparison of publishing trends between the ongoing COVID-19 pandemic and the 2003 SARS epidemic, to study changes in thematic foci and societal pressures dictating research over the course of a crisis.


Introduction
Coronavirus and the significance of 'Big Data'
Unlike the 1918 flu pandemic, COVID-19 has revealed how the ubiquitous use of networked personal devices, automated sensors and the internet can greatly impact the ability of a society to cope and survive. Data is constantly being collected and documented from an estimated 10 billion mobile phones, over 2000 satellites and more than 25 billion digital sensors, to monitor and quantify shifts in social and economic activities in response to the pandemic. Such 'big data' is helping steer scientific research towards addressing the crisis and returning to normalcy, and strongly impacts the state's inherent capacity to make informed policy decisions based on social trends and scientific evidence [1]. The world has witnessed an unprecedented flurry of scientific publications during the past year of the ongoing coronavirus disease-2019 (COVID-19) pandemic, which has infected more than 120 million people on the planet, killing over 2 million as of March 2021 [2].

Temporal conceptualisation of published data
Extraction and analysis of knowledge from the scholarly corpus can add valuable insights and enable synthesis of existing research findings while delineating new directions for future research [3]. Rigorous bibliometric methods can identify coherent clusters in existing research that can serve as reference points and identify knowledge gaps that remain to be addressed [4]. In this regard, visualization and conceptualization of a complex co-citation corpus as networks enables derivation of biologically significant inferences from systematic analysis of detailed conceptual relationships [5]. Very recently, we developed a new decision support system based on recursive partitioning of bibliometric evidence, to simplify exploratory literature review, enabling rational design of research objectives for scholars, as well as development of comprehensive grant proposals that address gaps in research [6]. In this work, we use this method taking into account the time dimension (on a quarterly basis), to gain a near-real time glimpse into how the pandemic is impacting scientific research in different ways across spatial scales.

Bibliometric parameters
The basic parameters used to plot bibliometric networks include number of documents, number of sources (journals, books etc.) in which the documents have been published, number of Keywords Plus, number of authors, publication period, and Collaboration Index. Keywords Plus by Clarivate Analytics' Web of Science includes recurring phrases from all the titles in a document's reference list [7]. Collaboration Index is calculated as the number of authors of multi-author documents divided by the number of multi-author documents. It provides a quantitative metric to measure research collaboration [8].
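As a minimal illustration of the Collaboration Index defined above, the following Python sketch computes it from a list of documents, each given as a list of author names. The function name, the input format and the toy records are hypothetical, and counting each author only once across all multi-author documents is an assumption; the analysis in this work relied on the Bibliometrix package, not on this code.

```python
def collaboration_index(documents):
    """Authors of multi-author documents divided by the number of such documents.

    `documents` is a list of author-name lists (hypothetical input format).
    Assumption: each author is counted once, however many documents they co-wrote.
    """
    multi_author_docs = [authors for authors in documents if len(set(authors)) > 1]
    if not multi_author_docs:
        # The index is undefined without multi-author documents; 0.0 is a placeholder.
        return 0.0
    distinct_authors = {name for authors in multi_author_docs for name in authors}
    return len(distinct_authors) / len(multi_author_docs)


# Toy collection: two multi-author documents and one single-author document.
toy_documents = [["Li", "Kumar"], ["Li", "Silva", "Okafor"], ["Meyer"]]
toy_ci = collaboration_index(toy_documents)  # 4 distinct authors / 2 documents
```

If authors were instead counted once per appearance, the same toy collection would give 5 / 2, so the counting convention should be stated whenever values are compared across studies.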
A useful tool to analyze the contribution of sources in a collection is Bradford's law. The law categorizes the sources contributing to the research in a particular field into 'zones'. The top sources in the list are categorized as 'core sources' or 'Zone 1' sources, which are the most frequently cited in that field. Zone 2 and Zone 3 contain less frequently cited sources. The number of sources in successive zones then follows the proportion 1 : n : n², and so on [9]. Another such law is Lotka's law, which is used to measure author productivity and contribution to the research in a field. It is a modified inverse square law that can be used to estimate how many authors will publish any fixed number of documents in a field [10].
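Both laws can be illustrated with a short Python sketch. It assumes that Bradford zones are formed by ranking sources by their counts and cutting the ranked list into zones holding roughly equal shares of the total, and that Lotka's law takes the classical inverse-square form (exponent 2). The function names, the cut-off rule and the toy counts are illustrative assumptions, not the procedure implemented in Bibliometrix.

```python
def bradford_zones(source_counts, n_zones=3):
    """Split sources, ranked by descending count, into zones of roughly equal total count."""
    zones = [[] for _ in range(n_zones)]
    total = sum(source_counts.values())
    if total == 0:
        return zones
    # Rank by count (highest first); ties are broken alphabetically for reproducibility.
    ranked = sorted(source_counts.items(), key=lambda item: (-item[1], item[0]))
    cumulative = 0
    for source, count in ranked:
        # The zone is set by the share of the total already covered by higher-ranked sources.
        zone = min(cumulative * n_zones // total, n_zones - 1)
        zones[zone].append(source)
        cumulative += count
    return zones


def lotka_expected_share(n_documents, exponent=2.0, max_documents=10_000):
    """Expected fraction of authors with exactly `n_documents` under f(x) = C / x**exponent."""
    normaliser = sum(1.0 / k**exponent for k in range(1, max_documents + 1))
    return (1.0 / n_documents**exponent) / normaliser


# Toy counts: 1 core source, 3 mid-ranked sources and 9 minor sources (ratio 1 : 3 : 9).
toy_counts = {"A": 27, "B1": 9, "B2": 9, "B3": 9}
toy_counts.update({f"C{i}": 3 for i in range(1, 10)})
zone_sizes = [len(zone) for zone in bradford_zones(toy_counts)]  # [1, 3, 9]

share_one_document = lotka_expected_share(1)    # about 0.61
share_five_documents = lotka_expected_share(5)  # about 0.024
```

Under the inverse-square assumption, roughly 61% of authors are expected to contribute a single document and about 2.4% to contribute five, which gives a reference curve against which observed author productivity can be compared.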
The diversity of research themes within a subject area can be analyzed using co-occurrence networks plotted for Keywords Plus. Similarly, collaboration networks for countries and institutes reveal trends in research collaboration. Another parameter used to quantify international collaboration (in addition to the Collaboration Index) is the Multiple Country Publication Ratio (MCP Ratio). An MCP is identified as a publication where at least one author is from a country different from that of the other authors. The MCP Ratio is then calculated as the number of MCPs for a country divided by the total number of publications the country has contributed to the collection [8].
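The MCP Ratio defined above can be sketched in a few lines of Python. The input format (one list of author countries per publication) and the function name are hypothetical. Crediting a publication to every country that appears among its authors is also an assumption; tools that credit only the corresponding author's country would produce different values.

```python
from collections import Counter


def mcp_ratios(publications):
    """Return {country: MCP Ratio} for a list of per-publication author-country lists."""
    totals = Counter()  # publications each country contributed to
    mcps = Counter()    # of those, publications with authors from more than one country
    for author_countries in publications:
        countries = set(author_countries)
        is_mcp = len(countries) > 1
        for country in countries:
            totals[country] += 1
            if is_mcp:
                mcps[country] += 1
    return {country: mcps[country] / totals[country] for country in totals}


# Toy collection of four publications.
toy_publications = [
    ["China", "China"],     # single-country publication
    ["China", "USA"],       # MCP
    ["USA", "UK", "USA"],   # MCP
    ["China"],              # single-country publication
]
toy_ratios = mcp_ratios(toy_publications)
```

In this toy collection China contributes to three publications, one of which is an MCP, giving a ratio of 1/3, while the USA and the UK appear only in MCPs and therefore have a ratio of 1.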

Harnessing the Data Revolution
In summary, this work contributes to harnessing the data revolution that is a unique feature of the current crisis unlike the 1918 flu pandemic. This paper develops a conceptual framework integrating the three dimensions of time, space and scientific evidence, to enable a reassessment of the nature, dynamics and nuances of bibliometric networks based on published data. Unfortunately, we also find that the extraordinary amount of data available today has little impact on the policy process at local or global scales. These insights should be a wake-up call to harness the data revolution more responsibly and carefully, in order to achieve a new normal that can be more resilient, safer and sustainable.

Materials and Methods
Data was collected using the Web of Science Core Collection search tool. The search terms used were: 'SARS', 'coronavirus', 'SARS-CoV-2' and 'COVID-19'. All data from the year 2001 onwards was downloaded on 17 April 2020. The data was organized into three groups (Groups A, B and C; as used in the Results, Group A covers the first six months of the COVID-19 pandemic, Group B the 20-year data, and Group C the first six months of the SARS epidemic).
Another round of data collection was undertaken in January 2021. This time, the search term used was 'COVID-19' alone, since 'SARS-CoV-2' keyword matches were found to overlap with those of 'COVID-19'. All data from 1 January 2020 to 31 December 2020 was downloaded and organized into four temporal sections as follows:
1. Q1: all data from January to March 2020;
2. Q2: all data from April to June 2020;
3. Q3: all data from July to September 2020;
4. Q4: all data from October to December 2020.
The data from the four quarters was used to compare publishing trends over the course of the year of the pandemic. All data was analyzed using the Biblioshiny tool in the R Bibliometrix package [8,11]. Numerical data for various bibliometric parameters of quarters Q1-Q4 was also analyzed and plotted in Microsoft Excel.
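The quarterly split described above can be expressed as a short Python sketch. It assumes that each record carries a publication date; the record format (title, date), the function names and the toy records are hypothetical and do not reflect the Web of Science export format or the Biblioshiny workflow.

```python
from collections import defaultdict
from datetime import date


def quarter_label(publication_date):
    """Map a date to 'Q1'..'Q4' (January to March is Q1, and so on)."""
    return f"Q{(publication_date.month - 1) // 3 + 1}"


def group_by_quarter(records, year=2020):
    """Group (title, publication_date) records of the given year by quarter."""
    groups = defaultdict(list)
    for title, publication_date in records:
        if publication_date.year == year:  # records outside the study year are skipped
            groups[quarter_label(publication_date)].append(title)
    return dict(groups)


# Toy records; titles and dates are made up for illustration.
toy_records = [
    ("Paper 1", date(2020, 1, 15)),
    ("Paper 2", date(2020, 3, 31)),
    ("Paper 3", date(2020, 4, 1)),
    ("Paper 4", date(2020, 12, 31)),
    ("Paper 5", date(2021, 1, 2)),
]
toy_quarters = group_by_quarter(toy_records)
```

Records dated outside 2020, such as the last toy record, are dropped, mirroring the 1 January to 31 December 2020 window used for the quarterly collections.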

Annual scientific production for 20-year data
The annual scientific production curve for Group B data (

General publishing trends
The complete bibliometric data and information collected for each of Groups A, B, and C is shown in Table 1. The number of documents published during the first six months of the coronavirus pandemic (Group A) was 5.4 times the number published in the first six months of the SARS epidemic (Group C), a significant rise even after normalising for the background publication levels before each outbreak (about 50 papers in 2001; 200 in 2019). This is despite the two datasets having the same baseline collaboration index for authors, a metric considered better than traditional metrics like the h-index because it accounts for collaboration, which can have a strong bearing on the estimated individual scientific impact [12].
However, Table 1 also reveals that the number of publication sources during COVID-19 was almost three times the number of sources that published data on the SARS epidemic, while the number of authors publishing their work in the first six months of the COVID-19 pandemic was about six times higher than for SARS (Group C), suggesting that (a) significantly more authors contributed to the surge of COVID-19 publications, and (b) Group A journals contributed to their collection more often.
Interestingly, Table 1 reveals a much greater thematic focus in the relevant publications in 2020, as evident from the lower number of keywords in Group A compared to the Group C SARS data. In retrospect, this pattern is specific to the global crisis of 2020, with a worldwide surge in research aligning with diverse aspects of the pandemic.
A country-wise comparison of the number of documents contributed to each group revealed one overall trend: more countries were involved in publishing at the start of the coronavirus pandemic than at the start of the SARS epidemic. Several African, Eastern European, and South American countries started publishing early on during the coronavirus pandemic. This was not seen during the SARS epidemic, and merits a detailed investigation of the 2020 publication patterns, as attempted in the subsequent section.
Figure 2 provides a glimpse into the quarterly bibliographic data over the course of 2020, grouped into quarters Q1 to Q4 as described in the Methods. Publishing activity rose steadily during the first three quarters of 2020, as can be seen in this figure. This included the overall number of documents, which may have impacted the number of sources of these documents, core journals for each quarter (as per Bradford's law), keywords, and authors, all of which were observed to rise steadily from Q1 to Q3 and then decrease slightly during the last quarter, Q4. The highest overall COVID-19 related publishing activity was seen during the third quarter, Q3.
In contrast, the collaboration index was highest during Q1, when the pandemic was still very new across the globe. This pattern suggests that at the beginning of the pandemic, a large number of authors came together to collaborate in order to address the crisis, but with time and progressive recognition of the severity of the crisis, these partnerships gave way to more focussed [...] The collaboration index dropped to its lowest during Q2, after which it rose steadily until Q4, providing evidence that the second quarter of the pandemic year witnessed a change in the collaborative tendencies of authors. This metric does not fully take into account the importance of a paper to its scientific community, but rather the relative contributions of its co-authors. This is important, since neglecting co-author information can inhibit quantification of an individual researcher's achievements and give others undue credit. Publication credit assignment is increasingly being seen as a major criterion in the development of academic assessment or peer review systems [13]. It should be acknowledged here that other metrics exist that attempt to measure collaboration credit, but the theoretical justification of the collaboration index results in a convincing assessment of a researcher's involvement [12].
However, the rise in the number of keywords with a simultaneous decrease in the collaboration index prompts a rigorous assessment of individual keywords, as attempted in the next sections of this paper, offering a more efficient bibliometric analysis.
Figure 3 reveals authorship trends for COVID-19 (Group A) and SARS 2003 (Group C). More than half (58.3%) of all authors in 2020 contributed only one document to the collection, while 1% of all authors contributed five documents, a pattern quite close to the expected (theoretical) Lotka curve (dotted line). In Group C, however, a more skewed curve was visible, with more than 80% of authors contributing only one document to the collection and 0.2% of authors contributing five documents.

Authorship trends and the need for gender normalization
An incredibly powerful measure of the pandemic's impact on working women in science is lost in the collections, since article metadata do not capture gender metrics. We tried to manually scan all 75,608 author names in our collections, but naming conventions depend heavily on demographics, history and geographical region. We are now working towards building a strategy to identify gender from first names and contextual information.
A steadily increasing percentage of authors were seen contributing to the collections from Q1 to Q3, but the trend reversed in Q4, a likely fallout of the holiday season, when most universities worldwide had appealed to academics to actively take vacations and breaks. In all four quarters of 2020, about 80% of authors published single papers in the respective collections, following Lotka curves, and we decided to investigate the range of research themes across the four quarters, as described in the next section.
Figure 4 depicts co-occurrence networks mapping the top keywords in Group A and Group C. Clear patterns emerge from the clusters in both SARS and COVID-19 that reveal how distinctly the scientific community addressed the outbreaks, strongly governed by public sentiment and responses. At the turn of the millennium, most SARS-related research involved viral infections in murine, equine, porcine or human models; the use of gene/protein sequences was also emerging, but in complete isolation from other research clusters in the network. In contrast, research into COVID-19 advanced into diverse avenues like management, mortality, epidemiology and human transmission, for all kinds of respiratory syndromes, focussing on diverse geo-specific outbreaks. The strong overlap between all clusters reveals the close connections between researchers and an interdisciplinary outlook towards the pandemic. An assessment of keyword co-occurrence maps for all four quarters of 2020 is depicted in Figure 5.
In the first quarter of 2020 (Q1), the focus of research was on the biology of the disease (red), its pathogenesis (green) and treatment (blue), while work on 'mortality', 'management', and 'strategies' was beginning to emerge (small orange cluster). In Q2, the primary clusters from Q1 spread out to include more aspects that addressed infection and pathogenesis. Research into the societal impact of the pandemic (stress, risk, children, care, management and therapy) started to appear on the periphery of the map (blue cluster). During Q3, this blue cluster integrated with the red and green clusters to occupy a more central position in the network, revealing the increasing impact of societal concerns on pandemic-associated research. Keywords like 'depression' emphasize the extent to which the public was impacted by the pandemic, while keywords like 'modelling' reveal the focus on 'big data' driven machine learning initiatives. By Q4, the secondary impact (blue) cluster had taken an authoritative position on the map, with keywords like 'impact' becoming most prominent along with the emergence of aspects of mental health, well-being, anxiety, and performance.
Figure 6 depicts the institutional collaboration networks for both the COVID-19 and SARS groups, both clustered by a unified approach in which a weighted variant of modularity-based community detection was used to identify the institutions that have the maximum number of collaborating authors. The patterns are dramatically different, with Group A (COVID-19) showing three fairly well-connected clusters, in contrast to Group C (SARS) with eight small clusters that are entirely isolated from each other, suggesting very little collaboration between the major research groups steering the investigations. The largest cluster in Group A (red) represents primarily Chinese institutes, but this cluster also includes the University of Melbourne (collaborating with Peking University) and the University of Oxford (collaborating with Tsinghua University).
This cluster links to a much smaller international cluster (blue) via the University of Oxford and Peking Union Medical College Hospital. The onset of SARS, on the other hand, resulted in the creation of distinct regional clusters, with the largest (red) set of institutions based in Hong Kong. The country collaboration networks for the two groups showed very similar patterns, with Group A having authors from several countries led by a strong collaboration between the United States of America (USA) and China, but also having authors from Denmark, Pakistan, Ghana and Canada, among others. Even the smallest collaboration clusters in COVID-19 reflected diverse regional representation, e.g. Japan with Honduras, Nepal and Colombia. Meanwhile, the publications at the onset of SARS came from a total of merely five economically powerful, developed countries, which were, surprisingly, further divided into two isolated clusters (UK-Australia and USA-Taiwan-Canada). These patterns further reiterate the extent to which COVID-19 has bridged scientific inequality, enabling new, more resilient researcher networks worldwide. Greater access to data, sharing of critical technology, and local insights have enabled researchers to better understand how the pandemic is impacting societies in different ways in different places. This, in turn, has allowed the scientific community during COVID-19 (unlike SARS) to evaluate and bring out the best possible interventions to address the problems at various levels and improve the resilience of society at large.

Trends in collaboration across institutional and regional scales
MCP Ratio is a metric to quantify a country's international collaboration. We used this metric in the temporal quarterly COVID-19 networks, and the patterns are depicted in Figure 7. Switzerland was observed to have the maximum rise in MCP Ratio over time, followed by Iran. Several other countries, namely Turkey, Korea, Spain and Japan, started from a value of zero or near-zero and showed large increases over the temporal quarterly networks, further emphasizing the role of data access and strong connections between nation states. In contrast, France showed a sustained decrease in MCP Ratio after the first quarter, suggesting that it was highly collaborative when the pandemic broke, but that its international collaborations were soon brought to a near standstill, presumably owing to the severe societal disruption and nationwide impact of the pandemic, with some of the world's highest mortalities during the first and second quarters of 2020. The MCP Ratio remained largely constant for most other countries, including Canada, Germany, and the UK.
More insights into these trends can be gained from Figure 8, which provides a detailed breakdown of the country collaboration networks over the four temporal networks. In the first quarter (Q1), China, the USA, and the UK dominated the map, while the second quarter (Q2) witnessed the emergence of a strongly interconnected cluster of European countries (blue), as well as a peripheral cluster (green) of Latin American countries. The earlier (Q1, red) cluster moved to a less prominent space in the Q2 temporal network, despite these countries forging new collaborations with developing countries. In Q3, the clusters became even more interconnected, with less polarization around the USA, China, and the UK. Some of the Latin American countries merged with the European cluster (blue). Many new developing countries joined the red cluster, while the other clusters developed new links and overlaps.
In the last quarter (Q4), the Latin American cluster reappears (green), while links between countries become fewer than in Q3. Each cluster in the Q4 temporal network remains highly interconnected, but the connections between clusters are distinctly fewer.
Institutes were ranked on the basis of the number of documents they contributed to the collection. Seven institutes appeared consistently in the list for each quarter, with the Huazhong University of Science and Technology at the top, followed by Wuhan University (located in the epicenter of the pandemic) at Rank 2 from Q1 to Q3. In the last quarter, Q4, Wuhan University was overtaken by the University of Toronto, which had featured in third place from Q1 to Q3. The quarterly COVID-19 collaboration networks for institutes across the year 2020 showed similar patterns. Q1 had extremely well-defined clusters with clearly discernible inter-cluster collaboration links, including links to lesser-known Chinese institutes that first started publishing in this collection, apart from a few South Korean institutions. In Q2, Harvard Medical School appeared on the network and immediately took a large and central position, bringing with it an entirely new cluster of American institutes with limited ties to the Chinese cluster, along with the total disappearance of South Korean institutes. All clusters became more interconnected in Q3, with the Chinese cluster being pushed to a peripheral position. In Q4, there was a greater number of well-defined clusters (European, British, American), with fewer inter-cluster links.

Discussion
A temporal bibliometric analysis of coronavirus-related research as presented in this work, offers a near-real time glimpse into how the pandemic is impacting societies in different ways in different places. This work also brings out the benefits of extremely advanced technical capacity as well as the extraordinary amount of data available in 2020, and its impact on the policy process. The temporal bibliometric networks shown here helped identify several interesting trends in academic publishing during two major epidemics.
We noted clear distinctions at each scale when COVID-19 and SARS were compared (Group A and Group C). We also observed relatively lucid and comprehensible trends in each case across the four temporal networks (Collections Q1 to Q4). The annual scientific production for coronavirus-related research peaked during the SARS and MERS epidemics, and then again during the current COVID-19 pandemic.
On comparing data from the first six months of the SARS epidemic to data from the first six months of the COVID-19 pandemic, we found some predictable and some surprising patterns. The scientific world published about five times more at the start of the pandemic than at the start of the epidemic. Bradford's law curves showed us very different core journals lists for the pandemic and for the epidemic, indicating the dynamic nature of research even under the same umbrella of coronavirus research.
Lotka's law curves showed us that authors researching COVID-19 in 2020 were more 'dedicated' to publishing than those researching SARS back in 2003. In 2003, most authors published only once on SARS. But in 2020, authors researching COVID-19 were much more likely to publish more than once.
Keyword co-occurrence networks showed us that initial COVID-19 research was more interdisciplinary than initial SARS research. COVID-19 researchers started publishing on the biology, disease mechanism and epidemiology early on. These themes co-occurred frequently. However, SARS researchers mostly published on the biology of the disease in 2003.
The country production maps showed us that many more developing countries participated in the initial surge of COVID-19 research as compared to when SARS first appeared. These countries also collaborated much more with each other and with developed countries. Likewise, several institutes across the world also collaborated on coronavirus research from early on. This was not the case during the SARS outbreak, where mostly developed countries and a few institutes published most of the research.
Over the course of the pandemic year itself, we saw a steadily increasing interest in publishing COVID-19 research until September 2020. There was a minor dip in interest from September to December 2020. However, research collaborations still increased steadily from earlier in the year.
Several journals saw a boom in COVID-19 research at the beginning of the year, so much so that they occupied a core position in the first quarter. However, after March 2020, most of those core quarter 1 journals disappeared from the collection altogether, for the rest of the year.
The BMJ produced a large number of documents throughout the year. However, its impact measured by h-index remained consistently below that of other journals that were publishing about half the number of papers, such as Lancet and Journal of Medical Virology. The much-discussed hydroxychloroquine paper was retracted by Lancet in June 2020. The h-index for Lancet fell rapidly after June 2020, reducing the h-index difference among these top journals.
Over the year, we saw research interest diversifying from disease biology to its secondary impact on people's mental health and wellbeing. By the end of the year, impact-related research had become common. Green spaces took on new importance across the world at this time of crisis, especially in urban areas, as evident from funding agency priorities and thematic maps, which reinforces the need for (and health benefits of) accessible public parks and forested areas. These benefits of green spaces can be factored into post-COVID urban planning policies.
While most countries saw a general decrease in how often their research was cited, this was not the case for Switzerland, India, and Iran in the second quarter. These countries received an increased number of citations in the transition from the first to the second quarter of the year. It is also worth noting that Switzerland increased its international collaboration massively over the year. Canada, Germany, and the United Kingdom had high international collaboration levels through the entire year.
From the first to the third quarter of the year, authors appeared to become more dedicated and published an increasing number of times. This changed after the third quarter, when the trend went back to the way it was in January.
Chinese institutes published the largest number of documents through the year. Although many Chinese institutes stopped publishing by the end of the year, Huazhong University of Science and Technology and Wuhan University remained the top contributors worldwide until the very end of the year. Until March, most of the publications were coming from China and South Korea. However, the Western world picked up quickly from March onwards and most non-Western institutes were sidelined.

Conclusions
In comparison to 2003, researchers in 2020 published a lot more in response to a disease outbreak. Newer journals published COVID-19 related research, and many of these had not published significant coronavirus-related research in the previous two decades. This research was more interdisciplinary and saw much greater international collaboration.
Over the course of the pandemic, interest in publishing increased rapidly until September 2020. This interest seemed to reduce by the end of the year. However, researchers continued to value collaboration until the end of the year. At the beginning of the year, most publications came from the epicenter of the pandemic, China. However, by the end of the year, although global collaborations increased, most of them were among researchers in the Western world. Research from China and the developing world became less significant as it became clearer that the pandemic was a global concern.
One starkly missing feature in this analysis is societal inequality, which cannot be measurably proxied from scientific publications, although we tried to assess it by identifying economically backward regions with relatively lesser-known institutions and authors in our collections. We noted that people in the highest-income areas (economically advanced countries) had significantly more central locations in several temporal networks compared to those in low-income countries. But these trends appeared to be diminishing by Q4, and the future may hold surprises that we are currently in the process of predicting. For instance, it has been noted that reduced economic activity and travel during the pandemic have reduced air pollution and deaths from traffic accidents and crashes, but the published corpus does not yet allow us to quantify this.
A second feature missing from the current analysis is the gender ratios, as emphasized in the text already. It has been predicted that it may take about two decades before the number of women on scientific papers is equal to the number of men. We undertook a manual inspection for trends between 2003 SARS and the current 2020 pandemic, and found strong skews that mask huge amounts of variation and merit a more dedicated analysis of the collections, currently underway in our laboratory. We are trying to identify numbers of women authors in each collection, their rates of publishing, the extent to which women are outnumbered by men across subject areas, and more.
In summary, the trends observed in this work already offer ample scope for another comprehensive analysis of the same collections, with new questions and outlooks. At the same time, the current analysis has provided valuable insight into how academia responds to a global calamity, and how societal impact and public responses steer research. Looking forward, we note that recovery is possible, but more importantly, resilience is needed. We may learn useful lessons on the real-world importance of ensuring diversity, accessibility, and quality in scientific thought. The analysis of quarterly temporal networks during the pandemic also emphasized the need to include the time dimension in such investigations, and how often we miss out on handles that enable us to be better prepared for recurrent stressors. We also reiterate that it is our collective responsibility to use the pandemic-associated 'big data' and the exponentially increasing wealth of new information for a better world.
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to large file sizes.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.