The Identification of Scientific Communities and Their Approach to Worldwide Malaria Research

To promote the coordination needed to eradicate malaria, it is essential to establish a pattern that detects the strengths and weaknesses of the working groups publishing on the disease. Given the complexity of the scientific network of groups and institutions studying malaria, a mathematical algorithm is needed to reveal the real structure of research on the disease worldwide. In this work, articles with the word “malaria” in the title or author keywords, gathered from the Elsevier Scopus database, were analyzed. Graphs were created by means of specific software. The analysis of the data allowed us to establish different scientific communities, two of which were clearly distinct: one formed by groups concerned with vector transmission and control, and another focused on the drug resistance of the parasite. Eradicating malaria through basic, applied, and operational research is an ambitious goal of the international institutions and the scientific community. Combining efforts and establishing a worldwide scientific network that allows an effective interconnection (exchange) of knowledge, infrastructure technology, collaborators, financial resources, and datasets will contribute more effectively to ending the disease.


Introduction
Malaria is one of the infectious diseases of greatest concern for global health [1,2]. According to the latest report of the World Health Organization, in 2016 malaria caused almost half a million deaths and more than 200 million estimated cases, with 90% of these cases occurring in African countries. Malaria is a mosquito-borne disease caused by a parasite of the genus Plasmodium, with six species that affect humans: Plasmodium falciparum, which causes the most deaths; Plasmodium vivax, the most widely spread except in sub-Saharan Africa; Plasmodium malariae; Plasmodium knowlesi; Plasmodium ovale curtisi; and Plasmodium ovale wallikeri [3-6].
Plasmodium has a complex life cycle, developing partially in mosquitoes of the genus Anopheles [7]. In humans, the parasite has a multiplicative exoerythrocytic stage in hepatic cells and an intraerythrocytic stage. The cycle begins with the bite of an infected female Anopheles mosquito, which inoculates sporozoites into the vertebrate host. These migrate to hepatic cells, where they multiply and transform into exoerythrocytic merozoites that invade red blood cells, generating erythrocytic merozoites that can invade new red blood cells and generate a new batch of merozoites.

Materials and Methods
The Elsevier Scopus database was used to obtain information about published works on malaria. A complete search was performed using the search query: (TITLE-ABS-AUTHKEY (malaria)). The search range was from 1900 to 2017. It should be noted that a different search query can give different results. To facilitate the analysis of the data, the network analysis tool Gephi was used. The data extraction was carried out automatically with specific software called Research Network Bot (ResNetBot) [36]. This software builds a graph in which each publication on malaria is represented by a node, and a connection between two nodes represents a citation of one article by the other. With these data, a second graph is constructed in which the nodes represent the researchers and a connection between two nodes represents a collaboration in at least one publication.
Data obtained through ResNetBot were refined with the OpenRefine software (OR) (formerly Google Refine) (Google, Mountain View, CA, USA) and organized into spreadsheets to facilitate their management. OR is needed because authors often write slightly different keywords that identify the same concept, and it provides the mechanisms to find and merge written variations of the same term, e.g., "New York", "New-York", and "new york". The keywords in our research were refined in this way.
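The keyword unification described above can be illustrated with a minimal key-collision sketch. This is not OpenRefine's implementation: the function names `fingerprint` and `merge_variants` are hypothetical, and the normalization rule (lowercase, punctuation treated as a space, tokens sorted) is a simplifying assumption chosen so that the "New York" variants collapse to one key.

```python
import re
from collections import Counter, defaultdict


def fingerprint(keyword: str) -> str:
    """Collapse case, punctuation, and token order into one comparison key."""
    # Lowercase, then turn every run of non-word characters into one space.
    cleaned = re.sub(r"[\W_]+", " ", keyword.lower()).strip()
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def merge_variants(keywords):
    """Map each raw keyword to the most frequent spelling sharing its key."""
    groups = defaultdict(Counter)
    for kw in keywords:
        groups[fingerprint(kw)][kw] += 1
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return {kw: canonical[fingerprint(kw)] for kw in keywords}


# Hypothetical keyword list, used only to illustrate the merge.
raw = ["New York", "New-York", "new york", "New York",
       "drug resistance", "Drug-resistance"]
merged = merge_variants(raw)
for variant, chosen in merged.items():
    print(f"{variant!r} -> {chosen!r}")
```

Choosing the most frequent spelling as the canonical form is one possible convention; in practice a curator would usually review each proposed merge before accepting it.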

Evolution of Scientific Output
The search yielded 85,370 results, whose growth is represented in Figure 2. As can be observed, the data show two trends. The first extends throughout the twentieth century and is exponential, while the second runs from the year 2000 to the present and is linear. In the first period, the fit to the trend line is very good, with a coefficient of determination R² greater than 0.9. The only data point that departs from this trend line is the number of articles published in 1946, which is 429. This value is abnormally high; in fact, it is not consistently exceeded until the 1980s. The explanation lies in two milestones of great importance in the fight against malaria that took place in that year. On the one hand, chloroquine was recognized and established as an effective and safe antimalarial agent [37]. On the other hand, the CDC (Communicable Disease Center) was created from the Office of Malaria Control in War Areas (MCWA), whose main objective was to fight diseases such as malaria in the South of the United States during the Second World War.
Currently, the CDC is the main public health agency of the United States, and its mission is to "collaborate to create the expertise, information, and tools that people and communities need to protect their health, through health promotion, prevention of disease, injury, and disability, and preparedness for new health threats." Although the initial objective of the CDC was the control and elimination of malaria, its current role is aimed at prevention and surveillance, because malaria has been considered eradicated in the USA since 1951.
The second period that can be observed in Figure 2 fits a linear trend line with slope m = 159. That is, since 2000, the number of articles published per year has grown by more than 150 on average. This highlights the enormous current interest in malaria research.
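The two trend fits described above can be sketched with ordinary least squares. This is an illustration only: the helper names `linear_fit` and `exponential_fit` are hypothetical, the yearly counts are synthetic (not the Scopus data), and the exponential fit is assumed to be done by regressing the logarithm of the counts on the year, so its R² is measured in log space.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1.0 - ss_res / ss_tot


def exponential_fit(xs, ys):
    """Fit y = a*exp(k*x) by regressing ln(y) on x; returns (a, k, r_squared)."""
    k, ln_a, r2 = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), k, r2


# Synthetic counts (assumption): exponential growth for 1900-1990 and a
# straight line for 2000-2017 whose slope mirrors the reported m = 159.
years_20th = list(range(1900, 2000, 10))
counts_20th = [10 * math.exp(0.05 * (y - 1900)) for y in years_20th]
years_21st = list(range(2000, 2018))
counts_21st = [2000 + 159 * (y - 2000) for y in years_21st]

a, k, r2_exp = exponential_fit([y - 1900 for y in years_20th], counts_20th)
m, b, r2_lin = linear_fit(years_21st, counts_21st)
print(f"exponential: a={a:.2f}, k={k:.3f}, R2={r2_exp:.3f}")
print(f"linear: m={m:.1f} articles/year, R2={r2_lin:.3f}")
```

With real data, an outlier such as the 1946 count would lower R² for the first period, which is why it is worth inspecting residuals rather than relying on the coefficient alone.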

Authors and Countries in Malaria Research
In the 85,370 articles on malaria published in the period studied, 148,876 authors appear, with a total of 2,211,628 collaborations among them. Table 1 lists the 20 most important authors. To establish this ranking, the eigenvector centrality, or eigencentrality, has been taken into account. In graph theory, this measure is used to highlight the influence of a node in a network. Other values that could be used to measure the importance of an author are the H-index [38], the number of published articles, or the number of citations received by these articles, but the possibility that an author has worked in different areas or themes could mask the result. By contrast, a high eigenvector centrality is directly related to the weight an author has within the network: each author is identified with a node, and a node scores highly when it is connected to other nodes that also have a high score. Figure 3A shows the 455 authors represented by nodes with an eigencentrality value greater than 0.2, and their connections with the rest of the nodes (authors). Figure 3B shows the 158 authors with an eigencentrality value greater than 0.4, and Figure 3C shows the 31 authors with an eigencentrality value greater than 0.6. The very high density of relationships between authors is most evident for eigencentrality values above 0.2. The size of each node is proportional to its eigencentrality, and the color represents the country of the author's institutional affiliation.
Figure 3 attempts to represent the relationships between authors from different countries, as well as their relevance. The color coding gives a general idea of which countries prevail over others, and of the relevance of the main authors from one country compared with authors from other countries. Due to the large number of nodes (authors), it is not possible to indicate their names.
As can be observed in Table 1, the first 20 authors in malaria research belong to ten different countries. These can be considered in two groups. On the one hand, the African (Kenya, Mali, and Burkina Faso) and Asian (Thailand, Viet Nam, and Laos) countries, of great importance for the number of estimated cases and deaths produced by malaria; and, on the other hand, the European countries (United Kingdom and Netherlands), the USA, and Australia, of great importance for the number of articles published. Figure 4 shows the countries with the highest number of cases estimated by the World Health Organization in the period 2010-2016 (Figure 4A), those with the highest number of estimated deaths in that same period (Figure 4B), and those that published the most articles in 1900-2017 (Figure 4C). Burkina Faso and Mali are among the five countries with the highest number of deaths, while the United States and the United Kingdom are the two countries whose institutions publish the most articles on malaria. For this reason, it is not surprising that the main researchers working in these countries are those with the highest eigencentrality.
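The ranking by eigencentrality described in this section can be sketched on a toy co-authorship graph. This is not the Gephi or ResNetBot implementation: the function names, the author labels, and the paper list are hypothetical, and the centrality is computed by plain power iteration, rescaled so that the top author scores 1.0, which is an assumption made to match the 0.2, 0.4, and 0.6 thresholds used in Figure 3.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Undirected graph: authors are nodes, an edge links any two co-authors.

    Authors who never share a paper with anyone are not added as nodes.
    """
    adj = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration; scores are rescaled so the top node has 1.0."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores.
        # Keeping the own score (an identity shift) leaves the dominant
        # eigenvector unchanged and avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in adj) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical author lists, one per paper.
papers = [["A", "B", "C"], ["A", "D"], ["A", "E"], ["B", "C"]]
scores = eigenvector_centrality(coauthor_graph(papers))

# Keep only the authors above a threshold, as in the panels of Figure 3.
prominent = sorted(author for author, s in scores.items() if s > 0.6)
print(prominent)
```

An author such as "A", who collaborates with well-connected colleagues, scores higher than one with the same number of links to peripheral authors, which is the property that distinguishes this measure from a simple count of collaborations.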

Communities Detection
From the 85,370 published articles, 714,979 citations were counted (citations to nodes on topics other than malaria were not included). In addition, 149,143 author-defined keywords appear in these articles. Using genetic algorithms, both data sets allow us to establish a series of thematic scientific communities (SC), the most frequent of which are represented in Figure 5. Figure 5 encodes the relative size of each community with respect to the others, as well as the proximity of certain communities to each other, according to how close their nodes are in the graph. Keywords 1 to 5 are ordered according to their frequency of appearance in each SC, after eliminating "malaria" as a keyword, since it appears in first place in all detected communities and its presence does not contribute anything to the analysis. In Figure 5, the 11 main SC are highlighted using different colors.
Communities #1 and #2 had the highest proportion of published articles, and each is focused on one of the two main fields of action that currently exist against malaria: the fight against mosquitoes and drug resistance. It has been demonstrated that the most effective way to reduce severe malaria and the number of deaths caused by the disease in endemic regions is the use of insecticide-treated bed nets (ITNs) [39,40]. The use of ITNs reduces both the number of mosquitoes and their lifespan, producing a double protective effect in the affected regions. The only class of insecticides allowed in ITNs is the pyrethroids, owing to their low toxicity to humans and their slow decomposition. Because ITNs depend on this single class, it is essential to find new systems that prevent the transmission of malaria, as well as new insecticides that repel mosquitoes from areas of human concentration.
The second major concern of the international scientific community in the fight against malaria is drug resistance in Plasmodium. Over time, as new antimalarial drugs have been introduced, the parasite has become resistant to them. This happened first with chloroquine and other similar 4-aminoquinolines, then with sulfadoxine-pyrimethamine. More recently, parasites resistant to artemisinin-derived drugs have also been discovered [41]. The long-term success of antimalarial therapy will therefore depend to a large extent on controlling the ways in which the parasite develops drug resistance. It is a priority to approach this problem from different perspectives: from the pharmacological point of view, with the use of drug combinations with different formulations, and from the strategic point of view, in terms of health infrastructure.
SC #3, #4, #5, and #6 are focused on the study of Apicomplexa, severe malaria, diagnosis, and vaccines, respectively. Of these, research on a vaccine that ends the disease arouses the greatest interest [42]. There is currently no effective malaria vaccine, but there are three lines of action targeting key points in the life cycle of the malaria parasite: the anti-infection approach (pre-erythrocytic vaccines), blood-stage vaccines, and transmission-blocking vaccines, which interrupt the spread of infection. It is likely that the definitive solution will come from a combination of several of these approaches. At the moment, one pre-erythrocytic candidate, the RTS,S vaccine [43], is going to be administered in a pilot implementation.
The last five communities of Table 2 (SC #7, #8, #9, #10, and #11) involve topics such as pregnancy and HIV, Plasmodium vivax, mosquitoes and immunity, travel and drugs, and glucose-6-phosphate dehydrogenase. Of these, research on P. vivax, a highly prevalent parasite in most affected areas except Africa, is becoming very important. Severe cases of malaria associated exclusively with P. vivax are increasingly detected, and a large number of mixed infections of P. falciparum and P. vivax are diagnosed in patients living in areas where both species are prevalent. For the rest, it should be borne in mind that pregnant women and their unborn children are particularly vulnerable to the disease, that thousands of cases of malaria are detected every year in travelers in countries where the disease has been eliminated, and that there is concern about the use of primaquine in people with a deficiency in glucose-6-phosphate dehydrogenase (G6PD), as it can cause severe hemolysis. All these aspects are the focus of a large number of research teams throughout the world.
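The keyword ranking used to characterize each community earlier in this section (the five most frequent keywords per SC, with "malaria" removed) can be sketched as follows. The article identifiers, keyword lists, and community assignment are hypothetical, and the community labels are taken as given rather than detected, since the detection algorithm itself is not reproduced here.

```python
from collections import Counter


def top_keywords(articles, community_of, n=5, ignore=("malaria",)):
    """Rank author keywords by how many articles in each community use them."""
    ignored = {word.lower() for word in ignore}
    counts = {}
    for article_id, keywords in articles.items():
        counter = counts.setdefault(community_of[article_id], Counter())
        # Count each keyword once per article, ignoring case.
        counter.update({k.lower() for k in keywords} - ignored)
    return {c: [k for k, _ in counter.most_common(n)] for c, counter in counts.items()}


# Hypothetical articles with their author keywords.
articles = {
    "p1": ["malaria", "Anopheles", "insecticide"],
    "p2": ["Malaria", "anopheles", "bed nets"],
    "p3": ["malaria", "chloroquine", "drug resistance"],
    "p4": ["malaria", "drug resistance", "artemisinin"],
}
# Hypothetical community assignment for each article.
community_of = {"p1": 1, "p2": 1, "p3": 2, "p4": 2}

ranking = top_keywords(articles, community_of, n=2)
for community, keywords in ranking.items():
    print(community, keywords)
```

Removing the search term before ranking matters because a keyword shared by every community carries no information about what separates them.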

Findings
Eradication of malaria is on the global health agenda. Now is therefore the time to ask ourselves where malaria stands and how much we know about the human disease, the parasite, and the mosquito vectors [44,45], and to take the correct directions for malaria control and elimination.
After more than a century of research on malaria, the number of articles published on it is enormous. There are almost 100,000 records, with different approaches and an accumulated body of knowledge that requires an analysis of the data based on algorithms that facilitate its interpretation.
After analyzing about 150,000 authors and the more than two million relationships established between them, we have been able to establish a parameter that measures the role played by each author in the complex network of researchers who have published on malaria.
A second aspect that we have considered is the set of keywords with which the authors delimit and define the scope of their research. The analysis of the 150,000 keywords that appear in the articles on malaria has allowed us to detect, through the use of an iterative algorithm, up to 11 communities. The analysis of these communities allows us to draw a map of worldwide malaria research. Thus, we have seen that most research efforts are focused on drug resistance and control of the mosquito vector species. Also of great importance, especially in recent years, is research focused on vaccine development or on parasites other than Plasmodium falciparum, such as P. vivax.

Limitations
In a data analysis as large as the one carried out here for all published works on malaria, an additional problem is the density of the data representation. Restricting the representation to nodes with an eigenvector centrality greater than 0.2 was found to be appropriate.

Future Work
After eradicating malaria from Europe, North America, the Caribbean, and some countries of South-Central America and Asia, the time has come to consider removing it from the countries where malaria is at present a true public health problem, such as those of sub-Saharan Africa. For this, however, the combination of all the tools that science currently has at its disposal will be necessary. The collaborative networks between countries and leading scientists will have to be strengthened, so that the confluence of their knowledge brings an end to malaria. No one thinks that a simple approach will eliminate malaria in Africa or other areas. It will not be just the fight against the mosquito or the administration of a vaccine. It will be the union of all efforts in the search for a global strategy, including at the political, financial, educational, and social levels.

Conclusions
These results allow us to have a global view of the state of world malaria research. This is the first time that a study has examined the relationships between the work of the most important groups in this field. For the first time, the lines of research of greatest interest to the international scientific community have been drawn, and the multilateral relations that are taking place between the different scientific communities involved in this research have been established. Furthermore, this study shows that each scientist, each institution, and each country plays a unique role within the network of which it forms a part in the fight to eradicate malaria.