National Scientific Funding for Interdisciplinary Research: A Comparison Study of Infectious Diseases in the US and EU

Abstract: Infectious diseases have been continuously and increasingly threatening human health and welfare due to a variety of factors such as globalisation, environmental and demographic changes, and emerging pathogens. In order to establish an interdisciplinary approach for coordinating R&D via funding, it is imperative to discover research trends in the field. In this paper, we apply machine learning methodologies and network analyses to understand how the European Union (EU) and the United States (US) have invested their funding in infectious diseases research utilising an interdisciplinary approach. The purpose of this paper is to use public R&D project data to grasp the research trends of epidemic diseases in the US and EU through scientometric analysis.


Introduction
Infectious diseases have been continuously and increasingly threatening human health and welfare due to a variety of factors such as globalisation, environmental and demographic changes, and emerging pathogens [1][2][3]. According to a study by the World Bank [4], the estimated economic losses from six major outbreaks between 1997 and 2009, i.e., Nipah Virus (Malaysia), West Nile Fever (USA), SARS (Asia, Canada, other), HPAI (Asia, Europe), BSE (US, UK), and Rift Valley Fever (Tanzania, Kenya, Somalia), totalled USD 80 billion. If these outbreaks had been forestalled, losses averaging USD 6.7 billion per year might have been avoided. The scale of such losses is illustrated by the recent Ebola virus outbreak of 2013-2015, which resulted in around 28,600 suspected cases, 11,300 confirmed deaths and an estimated financial loss of USD 600 million worldwide [5]. Not only have infectious diseases directly and indirectly affected public health systems, but they have also influenced a variety of environmental (e.g., phenology) and economic (e.g., transportation industry) sectors [5,6]. Thus, a variety of researchers (e.g., molecular biologists, biochemists, epidemiologists) must collaborate closely to guarantee the transfer and application of scientific and technological outcomes to the field, thereby accomplishing long-term control of infectious diseases [1,7]. That is, the only solution would be an internationally coordinated, interdisciplinary approach [8][9][10][11][12]. Such an argument can be found in much medical research [13,14]. Also, diverse health-related databases have been used to better understand health care management across countries with different health systems [15], or to find better directions for approaches to treatment [16,17].
In order to establish an interdisciplinary approach for coordinating R&D via funding, it is necessary to recognise the current research trends in the field [17,18]. Previous research on infectious diseases has focused on scientific activity (i.e., publications) [1,19]. However, quantitative analyses based on publications or patents are inherently retrospective [20]. Namely, it is inappropriate to use publications or patents to establish future-oriented strategies. As a consequence, prominent scholars have emphasised the utilisation of funding data as an alternative [20,21]. Although research based on funding has been conducted, its focus was not the investigation of an interdisciplinary approach, but the distribution of funding for infectious diseases [22]. Thus, the first requirement was to research infectious diseases with both an interdisciplinary perspective and information from funding databases.
Moreover, 2015 was the year in which the most research of this kind was conducted. Following the Ebola outbreak, the World Health Organization (WHO), as a control centre of globally proactive and coordinated research and development (R&D) efforts, has pursued its core responsibility of averting and minimising the loss of life and economic resources stemming from an outbreak within its member states [9]. Thus, the second requirement is for researchers and policymakers to understand how the global challenge of infectious diseases has been dealt with since 2015.
To the best of our knowledge, no study has met both requirements. Therefore, in this study, we aimed to provide funding information on the interdisciplinary approach to infectious diseases since 2015. Many studies have indicated that the US and EU have critical roles in the scientific and technological advancement of infectious diseases research [19,22]. In particular, as two leading national scientific funding organisations or programmes related to the health domain, the National Institutes of Health (NIH) of the US and the framework programmes for research and innovation (i.e., Horizon 2020) funded by the EU are significant. The programmes of these organisations have emphasised fostering interdisciplinary studies among scientific disciplines such as life sciences, technical sciences and social sciences in order to improve their societies, because societal challenges such as health and wellbeing are relevant to multiple disciplines [21,23].
In this paper, we apply machine learning methodologies and network analyses to understand how the EU and US have invested their funding to address infectious diseases research using an interdisciplinary approach. Our research addresses the following questions:

1. What interdisciplinary fields of research on infectious diseases have the US and EU invested in since 2015?
2. What are the disciplinary ranges in infectious diseases-related research fields in the US and EU?
3. How do the US and the EU differ in their interdisciplinary research approaches to infectious diseases?
The remainder of this paper consists of four sections. Following this general introduction, the materials and methods section describes the framework and methodology. The results section presents the comparative results of the research profiling and machine learning analyses. The discussion and conclusion section reviews our research, identifies research limitations and indicates promising research opportunities to pursue in the future.

Data Collection
The data used in this study are global R&D project records collected from the R&D databases provided by the US and the EU. A total of 5934 R&D projects related to infectious diseases from 2014 to 2017 were collected from each database. The query set used to collect the data was the following: ((infectious OR contagious OR communicable) AND (disease *)). The data sources for each country and the number of records are shown in Figure 1. STAR METRICS is the U.S. federal government's database of R&D projects, designed to create a data repository and the tools needed to assess the impact of federal R&D investments. This database is funded by the NIH (National Institutes of Health), the NSF (National Science Foundation) and the U.S. Department of Agriculture and the Environment Department under the auspices of the Office of Science and Technology Policy (OSTP). Federal RePORTER is an initiative of STAR METRICS, which creates a database of R&D projects from federal agencies and makes it available to the public. CORDIS is a major public database and portal site that provides the most extensive coverage of all major research projects and is financially supported by the European Commission. CORDIS's websites and repositories include all public information (project facts, publishable reports and outputs), communication and development assistance (news, events, success stories, magazines, etc.), open access publications and links to external sources held by the European Commission.
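The query set above can be read as a simple boolean filter over project text. The following is a minimal sketch of that reading, not the search logic of Federal RePORTER or CORDIS: the function names, the `title` and `abstract` keys, and the interpretation of `disease *` as a prefix wildcard are all assumptions made for illustration.

```python
import re

# Hypothetical re-implementation of the boolean query; the databases' own
# search engines may tokenise and match differently.
ADJECTIVE = re.compile(r"\b(infectious|contagious|communicable)\b", re.IGNORECASE)
DISEASE = re.compile(r"\bdisease\w*", re.IGNORECASE)  # "disease *" read as a prefix wildcard


def matches_query(text):
    """Return True if the text satisfies (adjective) AND (disease*)."""
    return bool(ADJECTIVE.search(text)) and bool(DISEASE.search(text))


def filter_projects(projects):
    """Keep projects whose title or abstract matches the query.

    Each project is assumed to be a dict with optional 'title' and 'abstract' keys.
    """
    return [p for p in projects
            if matches_query(f"{p.get('title', '')} {p.get('abstract', '')}")]
```

A project is retained only when both parts of the conjunction are present somewhere in its title or abstract, which mirrors the AND between the two bracketed groups in the query.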

Data Preprocessing
Although the XML data provided by each R&D database had the same content, they could not be compared until the field names of the databases were standardised. Accordingly, fields that contained the same content but were named differently in each database were unified under a single integrated field name by utilising the metadata index provided by each database. Examples of unified field names are shown in Figure 2. The optimised database structure was completed by mapping the US and EU database fields onto the unified field structure, and a new structured global R&D project database was completed when the actual field contents were parsed and filled into the corresponding fields.
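The field unification described above amounts to renaming source-specific fields to a shared schema. The sketch below illustrates the idea under stated assumptions: every field name in `FIELD_MAP` and the unified names themselves are hypothetical placeholders, since the actual mapping is given only in Figure 2 and in each database's metadata index.

```python
# Hypothetical source field names; the real ones are defined in each
# database's metadata index (Figure 2 in the paper shows examples).
FIELD_MAP = {
    "US": {"PROJECT_TITLE": "title", "ABSTRACT": "abstract",
           "FY_TOTAL_COST": "budget", "ORGANIZATION_NAME": "organisation"},
    "EU": {"title": "title", "objective": "abstract",
           "totalCost": "budget", "coordinator": "organisation"},
}


def unify_record(record, source):
    """Rename a raw record's fields to the unified schema and tag its source."""
    mapping = FIELD_MAP[source]
    unified = {new: record[old] for old, new in mapping.items() if old in record}
    unified["source"] = source
    return unified
```

Once both sources share the same field names, US and EU records can be stored in one table and processed by the same downstream steps.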

Also, in order to understand the convergence of the R&D area, a consistent classification system for each project was needed. As a result, five classification codes were assigned to each project by using the ASJC codes (All Science Journal Classification Codes) of Scopus. The allocation of ASJC codes to each project was done by calculating similarity through machine learning.
In the first step of the machine learning process, the author keywords of approximately one million Scopus articles and the ASJC codes assigned to each article were set as the features and labels, respectively. After that, based on the similarity calculated from the learned results, the five ASJC codes most relevant to the title and abstract of each R&D project were assigned to that project. A conceptual diagram of this process is shown in Figure 3.
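The text does not name the model or the similarity measure, so the following is only an illustrative sketch of similarity-based code assignment. It assumes a bag-of-words profile per ASJC code built from author keywords, and cosine similarity against a project's title and abstract; the function names and the choice of cosine similarity are assumptions, not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(articles):
    """Build one bag-of-words profile per ASJC code.

    `articles` is an iterable of (author_keywords, asjc_codes) pairs.
    """
    centroids = defaultdict(Counter)
    for keywords, codes in articles:
        tokens = [t for kw in keywords for t in tokenize(kw)]
        for code in codes:
            centroids[code].update(tokens)
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_codes(title, abstract, centroids, k=5):
    """Return up to k ASJC codes most similar to a project's title and abstract."""
    doc = Counter(tokenize(f"{title} {abstract}"))
    ranked = sorted(centroids, key=lambda c: cosine(doc, centroids[c]), reverse=True)
    return ranked[:k]
```

With `k=5` this returns five codes per project, matching the number of classification codes assigned in the study; a production system would likely add term weighting and a far larger training set.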

Cooccurrence Matrix
As a way to identify interdisciplinary R&D areas, we chose to examine groups of projects that were assigned ASJC codes from several different fields at the same time. As part of this approach, a cooccurrence matrix, as used in text mining, was constructed first. Cooccurrence in text mining refers to words appearing together in a sentence, paragraph or text. The ASJC codes that appear together in a particular project group are related; the more often they appear together, the greater their relevance. When multiple ASJC codes appear simultaneously in a project group, a network structure among ASJC codes is formed, and clusters composed of them are built, thereby enabling network analyses to be performed. The cooccurrence matrix was constructed and analysed by using the Vantage Point® system (Search Tech, Inc., Atlanta, GA, USA, Version 7.1).
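The counting step behind a cooccurrence matrix can be sketched in a few lines. This is an illustrative stand-in for what the commercial tool computes, with a hypothetical function name; it assumes each project is represented simply by the collection of ASJC codes assigned to it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(projects):
    """Count how often each unordered pair of ASJC codes shares a project.

    `projects` is an iterable of code collections, one per project.
    Pairs are stored as sorted tuples, so (a, b) and (b, a) are the same key.
    """
    pairs = Counter()
    for codes in projects:
        for a, b in combinations(sorted(set(codes)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Because each project carries five codes, a single project contributes up to ten code pairs, and pairs with high counts become the strong edges of the resulting network.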

Clustering and Network Visualisation
As mentioned, when the association among ASJC codes is identified through the cooccurrence matrix, it has a network structure. By visualising this network structure, we can directly grasp the relationship between ASJC codes. The VOSViewer (Leiden University, Leiden, The Netherlands, Version 1.16.11) software was used as a network structure visualisation tool. The VOSViewer system calculates the similarity between each component and visualises the network structure in the form of a cluster map or a topographic map. A mathematical model and algorithm of the VOSViewer's clustering and mapping can be found in the research of Van Eck and Waltman [24].
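As a rough illustration of the similarity calculation mentioned above, the sketch below normalises raw cooccurrence counts by the product of each item's total cooccurrences, a form often called association strength. That this matches the exact normalisation applied by the software in this study is an assumption; the authoritative definition is in Van Eck and Waltman [24].

```python
from collections import Counter


def association_strength(pairs):
    """Normalise pair counts c_ij by the product of each item's total cooccurrences.

    `pairs` maps (item_a, item_b) tuples to raw cooccurrence counts.
    """
    totals = Counter()
    for (a, b), c in pairs.items():
        totals[a] += c
        totals[b] += c
    return {(a, b): c / (totals[a] * totals[b]) for (a, b), c in pairs.items()}
```

The normalisation keeps very frequent codes from dominating the map merely because they cooccur with everything.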
Initially, the constructed clusters could be divided into more detailed sub-clusters. Therefore, in order to derive more detailed interdisciplinary R&D areas from the larger clusters, the ASJC codes belonging to each cluster were extracted again, and each large cluster was divided into several sub-clusters using the two types of software mentioned earlier.

Defining an Interdisciplinary R&D Area
The definition of an interdisciplinary R&D area can only be established by looking directly at the R&D projects that make up the actual clusters or sub-clusters. Therefore, we first ascertained the approximate R&D area by examining the ASJC codes constituting each sub-cluster. After that, the titles and abstracts of the projects in the sub-cluster were checked, and the research field of each sub-cluster was defined.
In addition, the budgets allocated to each sub-cluster and the research institutes involved were identified. In order to compare countries, the US and EU projects were analysed individually by the same process, and through a comparison between the US and the EU, we sought to derive implications of the commonalities and differences between them in interdisciplinary R&D areas related to infectious diseases.

Interdisciplinary Research Areas on Infectious Diseases Funded by the US
As shown in Figure 4, the interdisciplinary research areas on infectious diseases that were funded by the US were divided into 5 clusters (categories); some of the clusters were then further categorised into 2 to 4 sub-clusters. After reviewing the research descriptions of the funded projects in each sub-cluster, we named each cluster to capture the comprehensive meaning of its main research subjects, as follows. The cluster of "Public health for HIV-vulnerable group/Respiratory health for children/HR for infectious diseases (Cluster 1)" is made up of "HIV (sub-cluster 1-1)", "health assessment of children (sub-cluster 1-2)", and "research scholars and educational programs (sub-cluster 1-3)". The cluster of "Diagnosis and treatment of infectious diseases by using advanced technology (Cluster 2)" is composed of "information technology based infectious disease diagnosis (sub-cluster 2-1)", "molecular biology based diagnosis and treatment (sub-cluster 2-2)", "wastewater treatment for preventing infectious diseases (sub-cluster 2-3)", and "eye disease stemming from infectious diseases (sub-cluster 2-4)". The cluster of "Biological studies on the mechanism of inflammatory diseases caused by infectious diseases and development of therapies for them (Cluster 3)" consists of "inflammation treatments caused by infectious diseases (sub-cluster 3-1)", "sleeping sickness (sub-cluster 3-2)", and "viral diseases in human and animals (sub-cluster 3-3)". The cluster of "Strengthening research capacity for epidemiology (Cluster 4)" is formed by "small animal models for infectious diseases (sub-cluster 4-1)" and "epidemiology and health system (sub-cluster 4-2)". The final cluster is "Clinical trials on vaccines and products to help treat and prevent infectious diseases (Cluster 5)". In the following subsections, each cluster is investigated in detail. Comprehensive information for each cluster of the US is listed in Table A1 of Appendix A.

Public Health for HIV-Vulnerable Group/Respiratory Health for Children/HR for Infectious Diseases (Cluster 1)
Public health for HIV-vulnerable group/Respiratory health for children/HR for infectious diseases (Cluster 1) contained 46 projects costing USD 9,270,970 and spanning 21 disciplines (see Figure 5 and Table 1).
Second, health assessments of children who were exposed at an early age to infectious disease, environmental pollutants, chemical exposure, and endocrine disruptors (sub-cluster 1-2) comprised 2 projects worth USD 3,530,344. Health (2739) played a fundamental role and adopted four heterogeneous research areas such as Epidemiology (2713)


Diagnosis and Treatment of Infectious Diseases by Using Advanced Technology (Cluster 2)
Overall, the cluster of the diagnosis and treatment of infectious diseases by using advanced technology (Cluster 2) comprised 31 projects totalling USD 18,305,522 and incorporating 38 disciplines (see Figure 6 and Table 2).
First, based on Health Informatics (2718), information technology-based infectious disease diagnosis (sub-cluster 2-1) drew on 11 heterogeneous disciplines such as Health Information Management (3605), Computer Networks and Communications (1705), Hardware and Architecture (1708) and Biomedical Engineering (2204). Nineteen projects spent USD 5,532,582. In particular, the University of Pittsburgh spent USD 2,330,227 from 2014 to 2019 on the project "MIDAS (Modelling of Infectious Disease Agent Study) Informatics Services Group ISG", and the University of Utah is estimated to spend USD 747,704 from 2017 to 2021 on the project "IOBIO Web-based Interactive Tools for Real-time Analysis in Genomic Big Data".
Fourth, eye diseases stemming from infectious diseases (i.e., trachoma) comprise Management (1408), Management Science and Operations Research (1408), Engineering (miscellaneous) (2201), Cardiology and Cardiovascular Medicine (2705) and Oral Surgery (3504). A typical example is the project "Forecasting Trachoma Control", conducted by the University of California San Francisco with a budget of USD 399,420 from 2016 to 2019.

Biological Studies on the Mechanism of Inflammatory Diseases Caused by Infectious Diseases and the Development of Therapies for Them (Cluster 3)
Biological studies on the mechanism of inflammatory diseases caused by infectious diseases and the development of therapies for them (Cluster 3) contained 131 projects worth USD 37,588,973 and spanned 35 disciplines (see Figure 7 and Table 3).
Second, with regard to sleeping sickness (i.e., Trypanosoma brucei), a total of four projects were selected, and USD 416,250 worth of funding was allocated. At least four heterogeneous fields, i.e., Insect Science (1109)

Strengthening Research Capacity for Epidemiology (Cluster 4)
Overall, strengthening research capacity for epidemiology (Cluster 4) comprised 121 projects worth USD 113,924,725 and spanned 19 disciplines (see Figure 8 and Table 4).


Clinical Trials on Vaccines and Products to Help Treat and Prevent Infectious Diseases (Cluster 5)
Regarding clinical trials of vaccines and products to help treat and prevent infectious diseases, two projects totalling USD 80,156 were conducted across the following disciplines: Critical Care and Intensive Care Medicine (2706), Industrial relations (1410) and Applied Mathematics (2604) (see Figure 9 and Table 5).

Fields of Interdisciplinary Research on Infectious Diseases Funded by the EU
As shown in Figure 10, the interdisciplinary research areas on infectious diseases that were funded by the EU were divided into 4 clusters (categories). Comprehensive information for each cluster of the EU is listed in Table A2 of Appendix A.

Detection and Profiling for Pathogens (Cluster 1)
Five projects on the detection and profiling for pathogens were allocated USD 9,244,467. They principally studied Medical Laboratory Technology (3607) along with 15 different disciplines, including specialities such as Virology (2406), Biochemistry, medical (2704), Clinical Biochemistry (1308), Immunology and Microbiology (miscellaneous) (2401) (see Table 6).
Table 7). With regard to age-related diseases, King's College London has conducted research under the title of "10/66 Ten Years on Monitoring and Improving Health Expectancy by Targeting
For instance, the project "Leveraging Pharmaceutical Sciences and Structural Biology Training to Develop 21st Century Vaccines", from the University of Strathclyde, was carried out with a budget of USD 1,243,357 from 2016 to 2020, comprising various disciplines such as Biomaterials

Comparison between the US and EU
The US and EU shared the same interest in public health policy related to infectious diseases, as shown by the results of Public health for HIV-vulnerable group/Respiratory health for children/HR for infectious diseases (Cluster 1) of the US and Public health for the prevention and treatment of infectious diseases (Cluster 2) of the EU. However, the target groups of their interdisciplinary research differed. The US focused on vulnerable members of society, while the EU studied the benefits of vaccination among ordinary citizens. Furthermore, while the subject of epidemic prevention and policy research in the US is clear and detailed, the majority of the EU's projects concerned developing infrastructure from the viewpoint of the overall social and economic structure.
The results of Diagnosis and treatment of infectious diseases by using advanced technology (Cluster 2) of the US and Public health for the prevention and treatment of infectious diseases (Cluster 2) of the EU show that both have invested heavily in research and development on epidemic prevention and treatment. In particular, research on the diagnosis of infectious diseases has been actively conducted by using advanced technologies such as ICT, data, and platforms. Unlike the EU, which focused its immunological studies on infectious diseases, the US concentrated its immunological studies mainly on inflammatory diseases.
A closer look at Biological studies on the mechanism of inflammatory diseases caused by infectious diseases and the development of therapies for them (Cluster 3) of the US and Immunological studies on infectious diseases (Cluster 3) of the EU suggests that the common biological and immunological approaches to the pathogenesis and treatment of epidemics are likely to be subdivided according to the pathogen and the affected organ. Likewise, an examination of Clinical trials on vaccines and products to help treat and prevent infectious diseases (Cluster 5) of the US and Development of vaccine and vaccine-related products (Cluster 4) of the EU shows that the US shares the EU's goal of developing, managing, and evaluating vaccine-related products.
An investigation of Strengthening research capacity for epidemiology (Cluster 4) of the US shows that many projects have invested in developing human resources in epidemiology, supporting research laboratories and performing the role of government health departments.

Discussion and Conclusions
The aim of this study was to understand, from an interdisciplinary perspective, the trends since the mid-2010s in research on infectious diseases that globally endanger human health and well-being, thereby deducing strategic directions for governmental R&D. In particular, two leading scientific funding organisations or programmes that played dominant roles in infectious diseases were investigated. According to the results, the US has invested in five interdisciplinary research areas (13 sub-clusters) of infectious diseases with a total budget of USD 179,170,346 and with 333 projects, which contained 118 heterogeneous disciplines. On the other hand, the EU has funded 19 projects, which were worth about USD 87,876,000 and spanned 54 different disciplines. In summary, the US has invested significantly more in interdisciplinary research on infectious diseases than the EU since the mid-2010s. Although it is hard to directly compare the research fields of the two in terms of discipline, four characteristics that they had in common in terms of interdisciplinary research on infectious diseases were identified as follows: public health policy of infectious diseases, epidemic prevention and treatment, segmentation of biological and immunological approaches to the pathogenesis and treatment of epidemics, and vaccine development.
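The budget comparison can be checked with simple arithmetic. The short sketch below sums the five US cluster budgets reported in the results sections, which reproduces the stated US total of USD 179,170,346, and then derives the US-to-EU ratio and each cluster's share. The variable names are illustrative, and the EU figure is the approximate total quoted in this section.

```python
# Cluster budgets in USD, as reported in the US results sections.
us_clusters = {1: 9_270_970, 2: 18_305_522, 3: 37_588_973, 4: 113_924_725, 5: 80_156}
us_total = sum(us_clusters.values())
eu_total = 87_876_000  # approximate EU total reported in the discussion

print(f"US total: USD {us_total:,}")
print(f"US/EU budget ratio: {us_total / eu_total:.2f}")
for cluster, budget in us_clusters.items():
    print(f"Cluster {cluster}: {budget / us_total:.1%} of US funding")
```

On these figures the US budget is roughly twice the EU budget, and Cluster 4 alone accounts for well over half of the US total.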
The EU concentrated heavily on public health for infectious diseases and on the prevention, diagnosis and treatment of infectious diseases (see Cluster 2 of the EU (USD 62.4 million) and Cluster 1 of the US (USD 9.3 million) or Cluster 2 of the US (USD 18.3 million)). It is presumed that the Horizon 2020 programmes of the EU encouraged strong involvement of the EU member nations to support mechanisms for contributing to the establishment of a research ecosystem of wider reach and benefit in the EU [17].
Regarding the segmentation of biological and immunological studies, the US (Cluster 3, USD 37,588,973) invested more heavily than the EU (Cluster 3, USD 11,357,222) via a variety of projects. In particular, the US has adopted more diverse knowledge stemming from multiple disciplines including Biochemistry, Genetics and Molecular Biology (miscellaneous) (1301), Cancer Research (1306), Cell Biology (1307), Cellular and Molecular Neuroscience (2804), Animal Science and Zoology (1103), and Applied Microbiology and Biotechnology (2402). In contrast, the EU has focused on vaccine development more than the US (see Cluster 4 of the EU (USD 4.9 million) and Cluster 5 of the US (USD 0.08 million)). As a result, research areas from the EU have included more distinctive disciplines such as Agronomy and Crop Science (1102), Horticulture (1108), Biochemistry, Genetics and Molecular Biology (miscellaneous) (1301), Global and Planetary Change (2306) and so on. The University of Strathclyde in the UK has taken the lead in interdisciplinary research on vaccine development for infectious diseases since 2016. This allows other infectious disease-related research organisations to recognise its specialised disciplines, which may offer a hint for coordinating joint research.
The US has reinforced its capacity for epidemiology through significant investment (Cluster 4 of the US (USD 113,924,725)). Therefore, it is imperative that nations or organisations outside of the US review the research results from organisations in the US before considering launching a research programme.

Conflicts of Interest:
The authors declare no conflict of interest.