Long-Term Collaboration Network Based on ClinicalTrials.gov Database in the Pharmaceutical Industry

Abstract: Increasing costs, risks, and productivity problems in the pharmaceutical industry are important recent issues in the biomedical field. Open innovation has been proposed as a solution to these issues. However, little statistical analysis of collaboration in the pharmaceutical industry has been conducted so far. Moreover, few studies have analyzed the clinical trials database, even though it is the information source with the widest coverage for the pharmaceutical industry. The purpose of this study is to test clinical trials information as a probe for observing the status of the collaboration network and open innovation in the pharmaceutical industry. This study applied social network analysis to clinical trials data from 1980 to 2016 in ClinicalTrials.gov. Data were divided into four time periods (1980s, 1990s, 2000s, and 2010s), and a collaboration network was constructed for each time period. The characteristics of each network were investigated. The agencies participating in the clinical trials were classified as universities, national institutes, companies, or others, and the major players in the collaboration networks were identified. This study showed some phenomena related to the pharmaceutical industry that could provide clues to policymakers about open innovation. If follow-up studies are conducted, the utilization of the clinical trials database could be further expanded, which is expected to help open innovation in the pharmaceutical industry.


Introduction
Due to social issues such as improvements in quality of life and global population aging, the need for new drug development is growing rapidly, and many pharmaceutical companies and research institutes are making efforts to develop new drugs. However, the cost of R&D investment in the pharmaceutical industry is increasing, and productivity is stagnant. For this reason, the demand for innovation is increasing. A recent study estimated that the cost of new drug development increased from 800 million United States dollars (USD) (in 2000 dollars) to 2.87 billion USD (in 2013 dollars) (DiMasi et al., 2016) [1]. Healthcare expenditure in the US also increased from 27.4 billion USD in 1960 to 2.9 trillion USD in 2013 (Rho and Kim, 2017) [2]. A key part of the cost increase is clinical trial studies; in particular, the cost of performing phase II and phase III trials is growing rapidly (Orloff et al., 2009) [3]. Clinical trials are not only cost intensive; they are also time-consuming in new drug development. According to Kaitin (2010) [4], it takes 7.2 years on average from the start of clinical testing to regulatory approval in the case of FDA-approved new drugs. In an extreme case, the drug is supplied to the market more than nine years after new drug-candidate development. This means that clinical trials could be a threat to the management of pharmaceutical companies, and it could remove opportunities [...]
[...] database is one of the information sources with the widest coverage for new drug development. Second, the clinical trials database includes information on the development of biosimilar drugs, which are replication medicines. The importance of biosimilar drugs is increasing in the pharmaceutical industry. Original drugs are protected by patents on discovery and development, and the patent holders have exclusive rights.
A replication medicine is basically a medicine that has the same effect as the existing drug, but available at a lower price, so the academic importance is low. The market importance of biosimilar drugs, on the contrary, is increasing. Recently, biosimilar drugs have exceeded the market share of original drugs. Even drug makers of original drugs foster the development of biosimilar drugs. This means that, without the biosimilar drugs, the current activity of the pharmaceutical industry cannot be understood properly. Third, clinical trial study information can provide insight for improving the productivity of new drug development. Since clinical trial studies, especially phase II and phase III, are the largest parts of the cost increase in new drug development, these stages are where the greatest effort needs to be made in order to improve productivity and reduce risk. Fourth, the clinical trials database is 'an underutilized source' of information (Glass et al., 2015) [18]. For the reasons described above, the clinical trials information is a very important source of information, with a large amount of data accumulated: there are data from over 230,000 clinical trials from 195 countries in ClinicalTrials.gov. Nevertheless, it has not been adequately analyzed and examined. There are some previous studies, such as the correlation analysis between ClinicalTrials.gov and publications in biomedical journals, and an analysis on the characteristics of trends in the clinical trials database.
There have been few studies on collaborations in clinical trials, despite their importance.
Social network analysis (SNA) is used in this study to examine the interorganizational collaborations that can be found in clinical trial studies. SNA was originally developed to study the relationships between people, and is most frequently used in disciplines such as sociology and anthropology. Due to the development of information communication technology, SNA is now applied to complex social systems and natural phenomena. SNA itself is forming a transdisciplinary field covering economics, organization studies, business management, public health, information science, biology, complexity, and so on (Knoke and Yang, 2008) [19]. SNA has been applied to many cases to analyze the relationships between researchers and institutions, and find out the important players within the network (Newman (2004) [20], Kretschmer (1994) [21], Wang et al. (2012) [22], Liu et al. (2017) [23], Almgren and Lee (2016) [24]). Interorganizational collaboration is a key to understanding open innovation, and thus, many studies have been carried out to understand open innovation and improve productivity (Powell et al. (1996) [10], Fritsch and Kauffeld-Monz (2010) [25]).
In a nutshell, an open innovation strategy is needed to address productivity problems in the pharmaceutical industry, where R&D costs and risks are increasing. In order to look for an open innovation strategy, it is first necessary to understand past and present collaborative relations in the pharmaceutical industry. Therefore, the research question of this study is as follows: Does the clinical trial database help us explore interorganizational collaboration and understand open innovation in the pharmaceutical industry? Information on participating organizations was extracted from the clinical trials database, and SNA was applied to it. Analyzing about 40 years of clinical trials information is expected to help understand the past and current collaboration status, and eventually the open innovation trend in the pharmaceutical industry.
Sustainability 2018, 10, 322
Glass et al. (2015) [18] argued that ClinicalTrials.gov is underutilized, despite being an important source of information for understanding clinical trials as a whole. Research on ClinicalTrials.gov data has been limited to investigations of registration trends, trial attributes, publications, and the taxonomy of ClinicalTrials.gov. In particular, ClinicalTrials.gov contains a wealth of data on partnerships and clinical trial data from over 195 countries. Glass et al. derived yearly statistical data for ClinicalTrials.gov, and noted that incomplete site information did not exceed 3%, and hence that ClinicalTrials.gov data could be a valuable research source. They also presented various research topics for in-depth studies and explained the need for future studies. Califf et al. (2012) [26] examined the characteristics of 96,346 interventional clinical studies registered in ClinicalTrials.gov between 2004 and 2010.
Various topics, such as primary purposes, intervention types, regions, anticipated enrollments, and so on, were examined for 2004-2007 and 2007-2010. They also studied clinical trial attributes by therapeutic areas, i.e., oncology, cardiovascular, and mental health. They concluded that the clinical trials were dominated by small trials, and contained a significant heterogeneity in methodological approaches. Roumiantseva et al. (2013) [27] studied clinical trial design characteristics according to sponsorship in ClinicalTrials.gov. They analyzed 108,315 trials registered in ClinicalTrials.gov as of 11 June 2011, and compared interventions and medical conditions by sponsor type. They found that industry-sponsored studies differed systematically from government-sponsored ones in study type, choice of interventions, and study conditions. Thiers et al. (2008) [28] showed globalization trends by statistically classifying countries and regions where clinical trials were conducted. As expected, most clinical trials were conducted in wealthy countries. However, in recent years, clinical trials in emerging regions were also on the rise. This study also showed the global status of clinical trials. If a pharmaceutical company plans to enter the global market with new drugs, it will need to grasp the global network.

Literature Review
Apart from the previously mentioned studies, there are research examples that applied SNA to observe the innovation status of the firms. For example, Powell et al. (1996) [10] collected interorganizational agreements data in 1990-1994 from the Bioscan database, and analyzed the collaboration relationship between dedicated biofirms (mostly U.S. firms). They found that the density of the biotech firm network increased, and claimed that the companies tended to enlarge the interorganizational agreement in order to obtain more specific information, resources, and products. Another example is Fritsch and Kauffeld-Monz (2010) [25]. They analyzed the network structure of 300 firms and research organizations in 16 German regions. Data were gathered by postal questionnaires in 2004 asking the names of the most important partner firms and research organizations. They argued that strong ties were more effective in conveying knowledge and information than the weak ones.
Based on the above review of previous studies, it is reasonable to consider the clinical trials database and SNA an appropriate combination of information source and analytical methodology for examining past and current interorganizational collaboration and open innovation in the pharmaceutical industry.

Data and Method
In this study, the ClinicalTrials.gov database was used to analyze the collaboration network in the pharmaceutical industry. To the best of our knowledge, this study is the first long-term analysis of a clinical trials database. The whole dataset of ClinicalTrials.gov, containing approximately 230,000 trial studies as of January 2017, was downloaded from its Downloading Contents for Analysis page (https://clinicaltrials.gov/ct2/resources/download). SNA was applied to the sponsor-collaborator relationship data, which were also acquired from ClinicalTrials.gov (Figure 1). A collaboration network was constructed for each decade, and network properties and their trends were examined as a time-series analysis. The characteristics of the collaboration network of clinical trials were analyzed, and the major players of the network in each decade were identified. The characteristics of the four types of sponsors of clinical trials (national institutes, universities, companies, and others) were examined. The findings, such as the differences in their roles in clinical trials, could provide some implications for the understanding of open innovation in the pharmaceutical industry.
Table 1 shows the basic parameters of the dataset. The number of participants in each time period increased from 181 (1980s) to 28,227 (2010s). From the 1990s onward, the number of participating agencies exceeded 1000, so node reduction was required to carry out the network analysis. In this study, the top 77% of agencies were selected based on the number of clinical trials per agency, and the number of nodes in the collaboration network for each decade increased from 222 (1990s) to 1467 (2000s) to 3087 (2010s). The number of edges (collaborative relationships) in each network also increased drastically, from 563 (1980s) to 40,577 (2010s). Figure 2 shows the trends in the number of clinical trials and the number of participating agencies. From Figure 2, it can be seen that the number of clinical trials has increased sharply since the 2000s, and that the number of participating agencies is not as high as the number of clinical trials. This shows that as the number of agencies participating in clinical trials increases, the number of clinical trials in which each agency participates also increases.
Figure 4 shows the degree distributions of the clinical trial collaboration networks in the four time periods. The degree is the number of links attached to a node (an agency, in this case), and it can be interpreted as the number of collaborators of an agency. As with the number of clinical trials, the degree increases in more recent time periods. From the 1990s to the 2000s in particular, the degree distribution increases noticeably. Though the high-degree regions in the 2000s and 2010s overlap each other, the degree in the 2010s is higher than in the 2000s as a whole. Figures 3 and 4 illustrate collaboration trends in clinical trials. By the 2000s, agencies had increased both their number of clinical trials and their number of collaborators; clinical trials expanded both in quantity and in cooperation. However, by the beginning of the 2010s, the number of clinical trials was stagnant, while the number of collaborators was steadily increasing. This is believed to be a correction period. It is difficult to keep increasing the number of clinical trials because of rising clinical trial costs and productivity issues. Instead, agencies seek to improve productivity by forming more partnerships. This is a clue that companies and agencies are looking for opportunities for open innovation in the pharmaceutical industry.
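The network construction described in this section can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the agency names in the toy data, and the helper names (`decade_label`, `build_networks`, `degrees`) are all hypothetical, and the sketch assumes that each trial contributes one undirected link between its sponsor and each of its collaborators.

```python
from collections import defaultdict

# Hypothetical toy records: (start year, sponsor, list of collaborators).
# These are illustrative values, not data from ClinicalTrials.gov.
TRIALS = [
    (1985, "NCI", ["Mayo Clinic"]),
    (1994, "NCI", ["Mayo Clinic", "Duke University"]),
    (2003, "Pfizer", ["Duke University"]),
    (2004, "NCI", ["Pfizer", "Duke University"]),
    (2012, "Pfizer", []),
]


def decade_label(year):
    """Map a year to its decade label, e.g. 1994 -> '1990s'."""
    return f"{year // 10 * 10}s"


def build_networks(trials):
    """Return {decade: set of undirected sponsor-collaborator edges}."""
    networks = defaultdict(set)
    for year, sponsor, collaborators in trials:
        for partner in collaborators:
            if partner != sponsor:  # ignore self-links
                networks[decade_label(year)].add(frozenset((sponsor, partner)))
    return networks


def degrees(edges):
    """Degree of an agency = number of distinct collaborators."""
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {agency: len(partners) for agency, partners in neighbours.items()}
```

Calling `degrees(build_networks(TRIALS)["2000s"])` gives the per-agency collaborator counts for one decade; a degree distribution is then simply a histogram of those values.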

Major Players on Clinical Trial Networks
The top 20 agencies in each time period, with respect to the number of clinical trials and degrees, were extracted. Table 2 shows statistics on the top 20 agencies with respect to the number of trials. According to Table 2, (global) companies have led clinical trials since the 2000s (10 companies in the 2010s and 13 in the 2000s), while national institutes played that role in the 1980s and the 1990s (11 national institutes in the 1990s and 12 in the 1980s). Table 3
Table 4 shows statistics on the top 20 agencies with respect to degree. Degree means the number of collaborators, so agencies with high degrees can be interpreted as intermediators in the collaboration networks. According to Table 4, universities are the main players, in contrast to the result in Table 2. In the 2010s, 12 universities ranked in the top 20 by degree, twice as many as in the 2000s (six). Universities such as Johns Hopkins University, the University of California, San Francisco, Duke University, the University of Pittsburgh, and Columbia University have been major players for 20 years (in the 2000s and 2010s), as shown in Table 5. The Washington University School of Medicine, Stanford University, the University of Washington, the University of Michigan, the University of Pennsylvania, and Emory University emerged in the 2010s. Meanwhile, only two national institutes and three companies are ranked in the top 20 list.
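The contrast between leaders (ranked by number of trials) and intermediators (ranked by degree) can be sketched as two rankings over the same agencies, each tallied by agency type. The agencies, counts, and helper names below are hypothetical placeholders chosen only to show the mechanics; they are not values from Tables 2 to 5.

```python
from collections import Counter

# Hypothetical toy values, not the paper's data.
AGENCY_TYPE = {
    "Company A": "company",
    "Company B": "company",
    "Institute C": "national institute",
    "University D": "university",
    "University E": "university",
}
TRIAL_COUNT = {"Company A": 90, "Company B": 80, "Institute C": 70,
               "University D": 20, "University E": 15}
DEGREE = {"Company A": 10, "Company B": 8, "Institute C": 30,
          "University D": 40, "University E": 35}


def top_n(scores, n):
    """Names of the n highest-scoring agencies (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


def count_by_type(agencies, agency_type):
    """Tally how many of the given agencies fall into each agency type."""
    return Counter(agency_type[name] for name in agencies)


leaders = count_by_type(top_n(TRIAL_COUNT, 3), AGENCY_TYPE)
intermediators = count_by_type(top_n(DEGREE, 3), AGENCY_TYPE)
```

In this toy example the top three by trial count are mostly companies, while the top three by degree are mostly universities, which mirrors the kind of pattern the tables report.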

Institute and National Cancer Institute among the national institutes; GlaxoSmithKline, Pfizer, Merck Sharp & Dohme Corp (New Jersey, USA) among the companies; and the Massachusetts General Hospital, Mayo Clinic, and M.D. Anderson Cancer Center among the others.

Maps of Clinical Trial Networks
The maps of collaboration networks were constructed using VOSviewer version 1.6.5, a software tool for constructing and visualizing bibliometric networks that was developed by the Center for Science and Technology Studies (CWTS), Leiden University. Figures 5 and 6 show the networks of the 2000s and 2010s, respectively, with a different color for each cluster (a group of agencies). The original numbers of agencies are 1467 and 3087 in the 2000s and the 2010s, respectively. When the collaboration networks are created with the original data, isolated groups with a small number of nodes (agencies) exist. After removing these isolated groups, the networks were regenerated with the remaining 1378 and 2857 agencies in the 2000s and 2010s, respectively, to create maps of the largest components and display their clusters. There are 32 groups in the 2000s and 30 groups in the 2010s. This means that the average size of the collaborative groups more than doubled, from 43 agencies in the 2000s to 95 in the 2010s, which implies that partnerships have increased among agencies participating in clinical trials.
As shown in Figure 5, the map of the 2000s has a large cluster component (or main cluster group) with the National Cancer Institute as its hub, and a separate group of Taiwanese institutions located at the top-left corner of the network. In the 2010s, as shown in Figure 6, two more separate groups were formed in addition to the Taiwanese group (a): one of Chinese agencies (b) and one of South Korean agencies (c). The separation from the main cluster group can be understood as a matter of closeness. In other words, the members of a separated group have a closer relationship with members of the same group than with those in other groups. Therefore, Taiwanese agencies are closer to, and have a more active relationship with, each other than with foreign agencies. The Chinese and South Korean groups were close to the large cluster component of the network in the 2000s, but they form separate groups in the 2010s, as does the Taiwanese group. This could be an indication of a recent shift from international to national cooperation. It can be interpreted as an effort by China and Korea to expand their influence on the East Asian pharmaceutical market and to seek a niche market by leading clinical trials for the East Asian region. Table 6 shows several major agencies in the Taiwanese, Chinese, and South Korean groups, selected by degree: the National Taiwan University Hospital and Chang Gung Memorial Hospital in the Taiwanese group; Fudan University and Sun Yat-sen University in the Chinese group; and the Seoul National University Hospital, Asan Medical Center, Inje University, and Samsung Medical Center in the Korean group.
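Two steps in this section lend themselves to a short sketch: dropping small isolated groups so that only the largest connected component is mapped, and the average cluster size arithmetic. The code below is a plain breadth-first search on a hypothetical toy edge list, with hypothetical helper names; it does not reproduce VOSviewer's clustering, which is a separate algorithm. The average sizes use the agency and cluster counts reported above.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def average_cluster_size(n_agencies, n_clusters):
    """Mean number of agencies per cluster."""
    return n_agencies / n_clusters


# Hypothetical toy network: one main component and one isolated pair.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
largest = connected_components(EDGES)[0]

# Counts reported in the text for the largest components.
avg_2000s = average_cluster_size(1378, 32)  # about 43 agencies per cluster
avg_2010s = average_cluster_size(2857, 30)  # about 95 agencies per cluster
```

The ratio of the two averages is a little above 2, consistent with the statement that the average group size more than doubled.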

Discussion
This study aimed to examine whether the clinical trials database, which has not been utilized frequently, could be a probe for understanding the pharmaceutical industry. More specifically, the aim was to investigate whether the clinical trials database is an appropriate information source for exploring partnerships in the pharmaceutical industry. Three exploratory analyses were performed to obtain a basic understanding of the clinical trials database: the distributions of the number of clinical trials and degrees, the major players of the clinical trial network, and maps of the clinical trial networks. This study contributes to the understanding of the macroscopic collaboration relationships that can be constructed from the database, covering about 40 years of practice. First, it provided clues to understanding changes in clinical trial collaborative relationships by decade. The characteristics of each decade, including the major agencies acting as leaders or intermediators, were identified. The maps of the clinical trials collaboration networks helped in understanding the overall structure of global clinical trials and its macroscopic changes.
This study differs from previous ones on the clinical trials database. Studies such as Glass et al. (2015) [18], Califf et al. (2012) [26], and Thiers et al. (2008) [28] used statistical analyses to understand the clinical trials database. Roumiantseva et al. (2013) [27] were interested in the types of sponsors, rather than their collaborative relationships. The main difference between this study and existing ones is that the collaborative relationships in the pharmaceutical industry were examined using the information on participants in the clinical trials database.
The results of this study show that the clinical trials database is meaningful as an information source that shows collaborations for new drug developments, so the clinical trials database could be useful as a probe for exploring open innovation in the pharmaceutical industry.
This study mainly focuses on understanding the long-term clinical trials database, and hence the strategies of specific collaborating groups were not identified, which is a limitation of the study. An overview of the change in the macroscopic strategy according to the long-term clinical collaboration network is the main implication of the study.
An analysis of the strategies of specific agencies or collaborating groups (clusters) is one of the research topics for future studies. The analysis of the moves of specific countries and emerging countries on the collaboration network is another research topic to be conducted for the study on the country-specific strategies. As was found in this study, Taiwan, China, and South Korea formed separate groups in the global collaboration network. Identification of the processes through which these countries formed separate groups as a part of a global network could help understand their strategies. The same idea can be applied to understanding the strategies of the leading global pharmaceutical firms as well. The findings obtained in this study could provide global and comprehensive references to the research topics in the future.

Conclusions
More than ever, there is a growing demand for healthcare and new drug development. The amount of R&D expenditure in the pharmaceutical industry is on the rise, while the productivity of new drug development is stagnant. The industry is concerned, as these issues are getting more significant. Especially, the cost and time for the clinical trial phase in the new drug development process are so large that they are becoming social issues. Much of the literature refers to open innovation as a solution to this productivity problem.
This study tests the clinical trials database as a probe for observing open innovation in the pharmaceutical industry. It analyzed clinical trials data from the ClinicalTrials.gov database in order to observe the characteristics of partnerships, and examined whether the clinical trials data were valuable for understanding the current state of open innovation in the pharmaceutical industry. All of the data from ClinicalTrials.gov (1980-2016) were divided into four time periods (1980s, 1990s, 2000s, and 2010s) and analyzed. This study adopted social network analysis (SNA) to construct the collaboration network of the clinical trials for each time period. The analysis was composed of three parts: the distributions of the number of clinical trials and degrees, the major players of the clinical trial network, and maps of the clinical trial networks.
This study of the long-term collaboration network, based on the ClinicalTrials.gov database, offers several insights into the relationships among pharmaceutical companies, research institutes, and universities, and into their mechanisms. First, the number of clinical trials per agency has stagnated since the 2000s, but the number of collaborators continues to grow. This can be interpreted as an increase in external cooperation to look for opportunities for open innovation without expanding the number of clinical trials, given the cost and risk burden of clinical trials.
Second, the main leaders carrying out a large number of clinical trials are different from the intermediators establishing many partnerships in the clinical trial collaboration network. Some of the big agencies, such as the National Cancer Institute (Bethesda, MD, USA) and the National Heart, Lung, and Blood Institute (Bethesda, MD, USA) among national institutes, the University of California, San Francisco among universities, and GlaxoSmithKline (Brentford, UK), Pfizer (New York, NY, USA), and Merck Sharp & Dohme Corp. (Whitehouse Station, NJ, USA) among companies, perform the roles of leader and intermediator at the same time. However, many of the top 20 agencies by number of clinical trials in the 2010s were companies, while many of those ranked by degree in the 2010s were universities. The interpretation is that the big global pharmaceutical companies act as the main leaders, and the universities act as the intermediators, in the clinical trial collaboration network.
Third, the maps of the clinical trials collaboration network in the 2000s and in the 2010s show a large cluster component and a few separate clusters. The large cluster component contains most institutions, except for some Taiwanese, Chinese, and Korean institutions. In the 2000s, the Taiwanese group appeared as a separate group in the network. In the 2010s, the Chinese, Korean, and Taiwanese groups all appeared as separate groups. This phenomenon may be a clue that the strategic moves of Taiwan, China, and South Korea differ from those of other countries.
The results of this study show that the clinical trials database from ClinicalTrials.gov could be used as a probe for understanding and examining the pharmaceutical industry. The results help in understanding the collaborative relationships in clinical trials in the pharmaceutical industry. They provide some information about how the clinical trial collaboration network has changed, and who the major players are. The study also suggests some specific follow-up research topics, such as the collaborative strategies of clinical trials in each country. Research that analyzes the clinical trial collaboration of companies or countries will help in understanding their detailed strategies. If follow-up studies are conducted, the utilization of the clinical trials database will be further expanded, which is expected to help open innovation in the pharmaceutical industry.