A Bibliometric Analysis of Crowdsourcing in the Field of Public Health

Because of its low cost and open-call nature, crowdsourcing has been widely adopted in many fields, particularly to support surveys, data processing, and the monitoring of public health. The objective of the current study is to analyze the applications, hotspots, and emerging trends of crowdsourcing in the field of public health. Using CiteSpace to visualize scientific maps, this study analyzes the time scope, countries and institutions, authors, published journals, keywords, co-cited references, and citation clusters. The results show that the United States is the country with the most publications regarding crowdsourcing applications for public health. Howe and Brabham are the two leading authors in this field. Further, most of the articles published in this field are found in medical and comprehensive journals. The use of crowdsourcing in public health is increasing and diversifying. The results of this study will enable and support the analysis of the specific role of crowdsourcing in the public health ecosystem.


Introduction
Crowdsourcing was first defined by Jeff Howe in 2006 as the act of organizations outsourcing their tasks to a large, undefined group of people [1]. Brabham described crowdsourcing as an online, distributed problem-solving and production model [2]. Crowdsourcing allows access to a large pool of public volunteers, saves time in collecting data, reduces costs, and accelerates innovation [3,4]. With increasing globalization and continuing internationalization, the flattening effect of Globalization 3.0 has created an environment that encourages the growth of crowdsourcing [5]. Crowdsourcing can extend the innovation activities of enterprises into a vast network space; through the Internet, it also explores, utilizes, and integrates the innovation resources and collective wisdom of society as a whole [6][7][8][9]. From innovative design to hygiene, positioning services, and new product development, crowdsourcing is quietly subverting business models and traditional social structures [10,11].
The Internet has greatly reduced the cost of information transfer and lowered the barriers to participation. In this context, the concept of Health 2.0 has been proposed alongside the practice of crowdsourcing in the field of health communication [12]. Every health sector can benefit from crowdsourced tasks that facilitate research [3]. There is no doubt that crowdsourcing creates a great opportunity in health, hygiene, and medical research. As pointed out by Swan, crowdsourced health studies are the union of three contemporary tendencies, namely "citizen science," crowdsourcing, and Medicine 2.0 [13]. Medicine 2.0, or Health 2.0, refers to individuals actively participating in their own health care, especially by using Web 2.0 technology [13]. Terry argues that, like Internet 2.0, Health 2.0 promotes self-creation, sharing, community concepts, and user self-empowerment, all of which coincide with the aims of crowdsourcing [14]. Most previous studies have only conducted experiments to explore the function of crowdsourcing in a specific area of public health, such as improving public awareness of sexual health, obtaining solutions to medical problems, and collecting medical big data. Few systematic reviews or overviews of crowdsourcing for public health have been published. Among them, some studies extracted data from reports via PubMed, EMBASE, and Google, but none has used the Web of Science ("WOS") database. The systematic review by Ranard et al. illustrated the scope of crowdsourcing in health and medical research, but covered only 21 studies [19]. The narrative review by Swan explained the adoption of crowdsourcing in health research studies up to 2011 [13]. Moreover, no scholars have carried out a bibliometric analysis of the application of crowdsourcing in public health or related topics.
This study uses a bibliometric approach to analyze the adoption of crowdsourcing in the field of public health and presents research hotspots, evolution history and emerging trends. Using co-citation analysis and co-occurrence analysis, we look at the Web of Science ("WOS") publication data related to crowdsourcing and public health from 2006 to 2019.
This study may contribute to existing research in three respects. First, this study is the first bibliometric analysis of crowdsourcing applications for public health using CiteSpace software. We use bibliometric analysis to provide new insight that previous studies did not offer comprehensively. Second, we offer a better understanding of the emerging trends in the adoption of crowdsourcing in the field of public health by summarizing the role of crowdsourcing in public health. Third, the analytical framework and the results of this study will provide a research basis and directions for future bibliometric analysis of the public health ecosystem. As this study explores the general application of crowdsourcing in public health, it could serve as a sound foundation for future research that focuses on the specific roles of crowdsourcing in concrete tasks, e.g., surveying, data processing, and monitoring.
The rest of this study is organized as follows. Section 2 introduces the materials and methods. Section 3 conducts the time and space scope analysis of crowdsourcing research in the field of public health. Knowledge domains and emerging trends of crowdsourcing's application in public health are presented in Section 4. Section 5 draws the main conclusions of this study and points out future research directions and limitations.

Introduction to Bibliometrics
In 1955, Garfield proposed an approach for searching scientific literature with citation indexes and, since then, citation analysis has gradually become an important research method in the field of scientific metrology [28]. Pritchard suggested a proper name for this subject, bibliometrics, which combines mathematics and statistical methodology [29]. Norton applied bibliometrics as a tool to measure texts and information [30]. In recent years, bibliometric analysis has been widely used in interdisciplinary research to identify the development of hidden or emerging subjects [31][32][33]. From an objective and quantitative perspective, bibliometric analysis is the typical approach that uses citation relationships to generate effective material for scientists [34,35]; hence, it can reflect the hotspots, evolution, and emerging trends within a specific field [29,36,37].

Data Source and Search Strategy
We retrieved data from the core collection database of Web of Science (WOS), limiting the search to the Science Citation Index (SCI), Social Science Citation Index (SSCI), Conference Proceedings Citation Index-Science (CPCI-S), and Conference Proceedings Citation Index-Social Science & Humanities (CPCI-SSH) [38,39]. This study analyzes publications from 2006 to 2019, since the term "crowdsourcing" was proposed by Howe in 2006 [1]. The search rule was: TS = ((crowd$sourcing) AND (health OR hygiene* OR public near/2 health)). The search scope included articles, proceedings papers, and reviews. After refinement, 308 documents were retrieved. The search was conducted in April 2019, and a summary of the search details is shown in Table 1.
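As a rough illustration of what the search rule selects, the following Python sketch approximates it with regular expressions applied to a record's topic text (title, abstract, and keywords). It is only an approximation under stated assumptions: the function name `matches_search_rule` is hypothetical, the `$` wildcard is modeled as zero or one non-space character, `*` as any trailing word characters, and the database's own tokenization and lemmatization are not reproduced.

```python
import re

# Approximation of TS = ((crowd$sourcing) AND (health OR hygiene* OR public near/2 health)).
# "$" is modeled as zero or one non-space character, "*" as any run of word characters.
CROWDSOURCING = re.compile(r"\bcrowd\S?sourcing\b", re.IGNORECASE)
HEALTH = re.compile(r"\bhealth\b", re.IGNORECASE)
HYGIENE = re.compile(r"\bhygiene\w*", re.IGNORECASE)


def matches_search_rule(topic_text):
    """Return True if the text satisfies the approximated search rule."""
    has_crowd = CROWDSOURCING.search(topic_text) is not None
    # Any text in which "public" occurs near "health" already contains
    # "health", so the proximity clause needs no separate check here.
    has_health = (HEALTH.search(topic_text) is not None
                  or HYGIENE.search(topic_text) is not None)
    return has_crowd and has_health


if __name__ == "__main__":
    print(matches_search_rule("Crowd-sourcing hygiene data"))        # True
    print(matches_search_rule("Crowdsourcing urban traffic data"))   # False
```

A sketch like this can be useful for screening locally stored records, but the counts reported in this study come from the database search itself.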

Analysis Tools
Two tools were used in the analysis, namely (A) CiteSpace V and (B) Excel 2016. (A) CiteSpace V software was used to conduct visualization and knowledge graph analysis in this study. CiteSpace V is a Java-based visual analysis tool developed by Professor Chaomei Chen (Drexel University, Dalian University of Technology, Changjiang Scholar), which supports co-citation analysis, keyword co-occurrence analysis, and collaboration analysis of institutions and authors [40,41]. Compared with other visualization software, CiteSpace V offers more convenient data processing, better visualization, and easier interpretation. Therefore, it can meet the requirements of literature co-citation and keyword co-occurrence analysis of large samples. (B) Microsoft Excel 2016 was used to count the annual number of publications on crowdsourcing research.
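The annual counting step was done in Excel in this study; for readers who prefer a scripted equivalent, a minimal Python sketch is given below. It assumes the records were exported in the tagged plain-text format, in which each record carries a line beginning with the field tag PY (publication year). The function name `publications_per_year` and the sample records are hypothetical.

```python
from collections import Counter


def publications_per_year(export_text):
    """Count records per publication year in a tagged plain-text export.

    Assumes each record contains one line starting with the field tag "PY ".
    """
    counts = Counter()
    for line in export_text.splitlines():
        if line.startswith("PY "):
            year = line[3:].strip()
            if year.isdigit():
                counts[int(year)] += 1
    # Return the counts ordered by year for easy tabulation or plotting.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = "PT J\nTI Example A\nPY 2015\nER\n\nPT J\nTI Example B\nPY 2018\nER\n"
    print(publications_per_year(sample))
```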

Parameter Settings of CiteSpace
CiteSpace V, version 5.2.R 2.3.26.2018 for 64-bit Windows, was used in this study. For time slicing, the time span was 2006 to 2019 and the length of each slice was set to 1 year. For text processing, we chose all term sources, including title, abstract, author keywords, and Keywords Plus [36]. In addition, we selected Pathfinder to prune the merged network because it simplifies the network and highlights its important structural features [42][43][44][45].
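To make the idea of Pathfinder pruning concrete, the sketch below implements one common variant, PFNET with r = infinity and q = n - 1, in which a link is kept only if no alternative path connects its endpoints with a smaller maximum link distance. This is an illustration under assumptions, not a description of CiteSpace's internal implementation: the parameter choice, the use of distances rather than similarities as link weights, and the function name `pathfinder_prune` are all assumptions.

```python
import math


def pathfinder_prune(edges):
    """Prune an undirected weighted network, PFNET(r=infinity, q=n-1) variant.

    edges: dict mapping (node_a, node_b) -> distance (smaller = more related).
    Returns the subset of edges that survive pruning.
    """
    nodes = sorted({node for pair in edges for node in pair})
    direct = {a: {b: math.inf for b in nodes} for a in nodes}
    for a in nodes:
        direct[a][a] = 0.0
    for (a, b), d in edges.items():
        direct[a][b] = direct[b][a] = min(direct[a][b], d)

    # Minimax path distance: the smallest achievable "largest link" between
    # two nodes over all paths, via a Floyd-Warshall style update.
    minimax = {a: dict(row) for a, row in direct.items()}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k

    # Keep a link only if no indirect path beats it.
    return {pair: d for pair, d in edges.items()
            if d <= minimax[pair[0]][pair[1]]}


if __name__ == "__main__":
    # The weak direct link A-C is dropped because A-B-C is a better path.
    print(pathfinder_prune({("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3}))
```

Links of equal distance are all retained, so the pruned network is the union of the minimum spanning trees rather than a single tree.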

Analysis of Countries and Institutions
CiteSpace V uses annual rings to display the number of papers published by each country, together with cooperation and centrality. The size of the annual rings represents the number of publications, and the outermost purple ring shows the centrality [41]. We selected country and institution as network nodes in turn, with Top 30 as the selection criterion, and the network was pruned with Pathfinder before visualization.
The map in Figure 3 shows that the 308 publications were contributed by 24 countries/regions. The country with the most publications concerning crowdsourcing applied in public health is the United States (197), followed by England (34), China (21), and the Netherlands (13). From the perspective of betweenness centrality, the top two countries/regions were England (centrality = 0.65) and the USA (centrality = 0.55), indicating that they cooperate directly or indirectly with many countries in the collaboration network, such as Canada, the Netherlands, and China. In recent years, researchers in China have published 21 papers. Although the research started late (the first publication was in 2015), the number of papers has increased year by year. In Table 2, we list the top ten core research institutions that contributed to crowdsourcing research in public health. The USA is the country with the largest academic output, and its main research institutions are the University of North Carolina (11), University of Pennsylvania (9), UC San Francisco (9), Columbia University (8).
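Betweenness centrality measures how often a node lies on the shortest paths between other nodes, which is why countries with high values act as bridges in the collaboration network. The following Python sketch computes normalized betweenness centrality for a small unweighted, undirected network using Brandes' algorithm. The function name and the toy network are hypothetical, and the sketch is not meant to reproduce the values reported by CiteSpace.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an unweighted, undirected graph.

    adjacency: dict mapping node -> list of neighbors (must be symmetric).
    """
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in nodes}
        n_paths = {v: 0 for v in nodes}
        depth = {v: -1 for v in nodes}
        n_paths[source] = 1
        depth[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies from the farthest nodes inward.
        dependency = {v: 0.0 for v in nodes}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Each unordered pair is counted twice, and there are (n-1)(n-2)/2 pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}


if __name__ == "__main__":
    # Hypothetical star network: every path between leaves passes the hub.
    star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
    print(betweenness_centrality(star))
```

In the star network the hub receives the maximum normalized value and the leaves receive zero, which matches the intuition of a broker node.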

Analysis of Authors and Co-Authors
The author collaboration network of crowdsourcing research is shown in Figure 4. The significance of Anonymous (63), Howe (40), Brabham (35), Ranard (32), and Swan (27) can be clearly appreciated. According to the figure, Anonymous has produced the most papers on crowdsourcing applied in public health. We therefore list in Table 3 the top 10 authors and co-authors with the most articles among a total of 298 authors. The Hirsch index, or h-index, and the number of citations are included for each scientist. Looking at Figure 3 and Table 3, we find that Cooper (centrality = 0.21) and Gao (centrality = 0.2) are two authors with a high degree of cooperation in this field. Tucker (10), Tang (8), and Zhang (7) are the three most productive authors. In terms of co-citations, Mason has the highest number of citations. However, Howe is the core co-author on crowdsourcing applications for public health.
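For reference, the h-index mentioned above is the largest number h such that an author has h papers with at least h citations each. A minimal Python sketch of the computation follows; the function name and the citation counts in the example are illustrative and are not taken from Table 3.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Illustrative counts: four papers have at least four citations each.
    print(h_index([10, 8, 5, 4, 3]))  # 4
```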

Analysis of Co-Cited Journals
Table 4 lists the 11 leading journals among all 502 journals in research on crowdsourcing applied in public health. The many papers published in medical journals indicate that crowdsourcing has been used as an advanced tool in the public health research field. In addition, 'Nature' and 'Science' are top-level comprehensive journals that present research frontiers and hotspots and build a knowledge base across medicine, biology, economics, management, and psychology for public health research.
When identifying core journals in a research field, the number of publications is not the only indicator; centrality and citation frequency are equally important. Figure 5 shows a visualization of the journal co-citation analysis. CLIN PSYCHOL SCI (0.13), BMC PUBLIC HEALTH (0.12), and AM J PUBLIC HEALTH (0.11) are the three medical journals with the highest centrality. These journals offer academic platforms for medical researchers, which indicates that applying crowdsourcing in public health has received widespread attention.

Analysis of Co-Occurring Keywords
Keywords reflect the core and focus of a paper. The top 10 keywords on crowdsourcing applications in public health are listed in Table 5. The keywords show that research focuses on depression (centrality = 1.13), smoking (centrality = 1.01), and children (centrality = 0.91), and that these topics are studied through the Internet, social media, and online communities. This indicates that most crowdsourcing applications connect to public health via the Internet and online platforms.
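Keyword co-occurrence analysis rests on a simple count: two keywords co-occur when they are assigned to the same paper. The Python sketch below shows that counting step under the assumption that each record's keywords are available as a list; the function name and the sample records are hypothetical, and the sketch does not reproduce CiteSpace's term extraction or network construction.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count keyword frequencies and pairwise co-occurrences.

    records: iterable of keyword lists, one list per paper.
    Returns (frequency, pairs), where pairs is keyed by sorted keyword tuples.
    """
    frequency = Counter()
    pairs = Counter()
    for keywords in records:
        # Normalize case and drop duplicates within a single paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        frequency.update(unique)
        pairs.update(combinations(unique, 2))
    return frequency, pairs


if __name__ == "__main__":
    # Made-up keyword lists, for illustration only.
    sample = [["Crowdsourcing", "Internet", "Depression"],
              ["crowdsourcing", "internet"],
              ["Smoking", "Crowdsourcing"]]
    frequency, pairs = keyword_cooccurrence(sample)
    print(frequency.most_common(2))
    print(pairs.most_common(2))
```

The pair counts form the weighted edges of the co-occurrence network, on which measures such as centrality can then be computed.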

Analysis of Co-Cited References
Based on co-cited references, CiteSpace V can generate a research evolution graph that displays the development of crowdsourcing applications in public health in a time-zone view. We selected cited references as the network nodes and adopted the Pathfinder algorithm to prune the network [46]. In this way, we can identify key research results on this topic, as seen in Figure 6. These scholars and their research results have played important roles in promoting the development of crowdsourcing research in the field of public health. Table 6 shows the top 10 key references in crowdsourcing applications for public health. The publication "Crowdsourcing as a model for problem solving: Leveraging the collective intelligence of online communities for public good," published in 2008 by Brabham, presents the motivation of participants to take part in crowdsourcing activities and explores the potential of crowdsourcing applications for public sensors [2]. The most cited article is "Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?" by Buhrmester (2011), published in a journal of the Association for Psychological Science, which describes and evaluates the potential contributions of MTurk (a crowdsourcing platform) to psychology and other social sciences [47]. These two highly cited articles form the foundation of online crowdsourcing and its applications for public health. The key research references reflect the development of crowdsourcing as applied to public health, and the main findings are as follows.
In 2011, Behrend used crowdsourcing to collect survey data for behavioral research. The authors show that crowdsourcing is an efficient and appropriate alternative to university participant pools [48]. This paper, which has a high degree of centrality, has become the foundation for crowdsourcing applied in social surveys.
In 2012, Swan pointed out that contemporary public health faces challenges, including rising costs, worsening outcomes, 'diabesity' epidemics, and an expected shortage of doctors. Crowdsourcing will be an effective tool to collect health big data and to realize the vision of preventive medicine by 2050 [49].
In 2014, Brabham argued that crowdsourcing has the potential to be a method for improving public health science. Crowdsourcing offers several effective ways toward improving health behaviors through participant engagement online, such as the knowledge discovery and management, the distributed human intelligence projects, the broadcast search and the peer-vetted creative production methods [16]. Meanwhile, Ranard created a taxonomy to characterize past applications of crowdsourcing in the field of health and medical research. The application of crowdsourcing can improve the quality, cost, and speed of research programs [19].
In 2015, Zhang used crowdsourcing contests to shape a "bottom-up" approach and described two creative contributory contests (CCC) to enhance sexual health campaigns [50]. This study may be useful for other groups expanding community engagement in sexual health science.
In 2016, Chandler summarized the data quality of MTurk (an online crowdsourcing platform), with an emphasis on results related to clinical psychological research. MTurk is a fast and cost-effective way to collect nonprobability samples that are more diverse than those commonly used by psychologists [51].
In 2018, Créquit mapped the diverse adoptions of crowdsourcing in the field of health to assess the health areas that are using crowdsourcing and the crowdsourcing projects used. The application of crowdsourcing is growing in health promotion, exploration, and care. However, the definition of crowdsourcing logistics and crowdsourcing participants' characteristics is often lacking in research reports [3].
Overall, crowdsourcing has been widely used to collect data, conduct surveys, solve problems, and carry out monitoring. Since 2012, crowdsourcing has gained considerable attention in the field of public health. Many scholars have begun to study the functions of crowdsourcing in personal care, biological agents, sexual health communication, and related areas. Furthermore, Amazon MTurk is the most used platform, and data processing is the main type of crowdsourcing in this field. Crowdsourcing contests are an efficient and responsible way for scientists to conduct medical experiments and to improve awareness of public health. In addition, we find several research gaps, such as the ethical problems of applying crowdsourcing in public health, the characteristics of participants in the crowdsourcing process, and the effectiveness of crowdsourcing without the Internet.
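The co-citation relation that underlies the analysis in this section can be sketched briefly: two references are co-cited whenever they appear together in the reference list of the same citing paper. The Python example below counts co-citations and normalizes them with a cosine measure, which is one common choice for link strength; whether the same normalization was used in this study is an assumption, as are the function name and the placeholder reference labels.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cocitation_strengths(reference_lists):
    """Cosine-normalized co-citation strength for each pair of references.

    reference_lists: iterable of reference lists, one per citing paper.
    strength(i, j) = cocited(i, j) / sqrt(cited(i) * cited(j))
    """
    cited = Counter()
    cocited = Counter()
    for refs in reference_lists:
        unique = sorted(set(refs))
        cited.update(unique)
        cocited.update(combinations(unique, 2))
    return {pair: count / sqrt(cited[pair[0]] * cited[pair[1]])
            for pair, count in cocited.items()}


if __name__ == "__main__":
    # Placeholder labels R1..R3 stand for cited references.
    citing_papers = [["R1", "R2"], ["R1", "R2", "R3"], ["R1"]]
    for pair, strength in sorted(cocitation_strengths(citing_papers).items()):
        print(pair, round(strength, 3))
```

The normalization keeps very frequently cited references from dominating the network purely because of their citation volume.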

Analysis of Evolutionary Path
Title words and keywords reflect the research focus of publications. Therefore, through the co-occurrence analysis of title words and keywords, we can explore research hotspots and frontiers in the scientific field. The clusters of keywords and references (extracting title terms) are shown in Figure 7. As we can see, the keywords and the references are each divided into nine clusters.
Given these results, we can summarize the publication types, application technology, research domains, and distribution. First, the main publication type is the review, which means that many scholars conduct systematic reviews to describe the applications of crowdsourcing in public health. Second, Amazon MTurk is the most used platform. In recent years, social media and mobile applications have become new tools for offering health advice and collecting information. Third, with the development of global health, crowdsourcing applications have spread to many regions, especially China and the United States. Further, the potential uses of crowdsourcing for public health include surveillance or monitoring, information extraction, and design. Finally, research subjects include stroke, non-smokers, clinical trials, diabetic retinopathy, and health anxiety; however, participants' characteristics are poorly reported.

Conclusions
From a multi-dimensional, temporal, and dynamic perspective, we use CiteSpace V to analyze the development trends and hotspots of crowdsourcing as applied to public health. This study has shown that crowdsourcing is a relevant topic, particularly in recent years and in the field of public health. To our knowledge, this paper is the first bibliometric analysis of crowdsourcing applications for public health using CiteSpace software. The results of this study contribute to our current understanding in several respects.
First, with regard to the scope of time and space, the amount of crowdsourcing literature has grown sharply, and the application of crowdsourcing has spread to many domains, especially public health and health communication. At present, the United States occupies a leading position in crowdsourcing research for public health, followed by England and China. Scholars from the Guangdong Provincial Center for Disease Control and Prevention cooperate with American universities to apply crowdsourcing in sexual health communication and HIV prevention. They are the first research team in China to have made outstanding contributions to the application of crowdsourcing in the field of public health.
Second, according to the analysis of co-authors and journal types, Howe and Brabham are the two core authors in this field. The World Health Organization has also pointed out the effectiveness of crowdsourcing in some reports. In addition, the results of research on crowdsourcing applications for public health are mainly published in medical and comprehensive journals. Medical journals include the 'Journal of Medical Internet Research', 'Lancet', and 'American Journal of Public Health'; comprehensive journals include 'Nature', 'Science', and 'PLoS One'.
Third, the co-citation analysis shows that the use of crowdsourcing in public health is increasing, particularly in preventive medicine, mental health, and personalized prevention. The specific applications of crowdsourcing include data processing, surveying, surveillance, and problem-solving. Further, the crowdsourcing model not only promotes the application of health big data and the construction of intelligent health platforms but also advances the move of social medical crowdsourcing online.
Finally, crowdsourcing research in this field focuses on four knowledge domains, namely crowdsourcing as a medium for Internet public health communication, the application of crowdsourcing in the field of prevention and treatment, the role of crowdsourcing in the public health care ecosystem, and applied research on crowdsourcing competitions in infectious diseases and epidemiology. In addition, there are several future research directions to discuss. First, other databases can be used for bibliometric analysis, such as Google Scholar, which contains citations from sources other than the WOS. Second, the WHO emphasizes the importance of crowdsourcing competitions for improving public health. Hence, future studies can evaluate the role of crowdsourcing competitions/contests in public health development. Third, future research can consider ethical concerns, because personal data and diagnostic results may be shared in the process of crowdsourcing.
Our study has some limitations. On the one hand, we did not search the gray literature, so unpublished studies were not identified. On the other hand, we only consider online crowdsourcing and exclude research on crowdsourcing conducted without the Internet. Therefore, we may underestimate the number of studies adopting crowdsourcing in public health.

Conflicts of Interest:
The authors declare no conflict of interest.