Publishing Patterns in BRIC Countries: A Network Analysis

How similar are the publishing patterns of Brazil, Russia, India and China (the BRIC countries) to those of other countries? We addressed this question by using networks as a tool to analyze the structure of similarities and disparities between countries. We analyzed the number of publications from 2006 to 2015 reported by SCImago Journal and Country Rank. With this information, we created a network in order to find the countries closest to the BRIC ones, and also to find communities of similar countries, which facilitates data analysis. We found that Brazil, China and Russia are not that close to the core cluster of countries that are more diversified. In contrast, India is closer to a community of countries that are more diverse in terms of publishing patterns. Furthermore, we found that, for different network topologies, Brazil acts as a bridge that connects developing countries, and that Russia follows patterns that tend to isolate it from most other countries.


Introduction
Brazil, Russia, India and China, the BRIC countries, represent growing economies in different regions of the planet [1]. These countries tend to lead their regions because of their high levels of population density and their growing, emerging economies. BRIC countries have improved their higher education systems (see for example [2]) and are improving national policies to compete in the research arena.
It is possible to analyze how good countries are at creating new knowledge by analyzing their research output [3][4][5][6]. Usually these studies cover the impact of countries in terms of numbers and indicators of publications and citations. Our approach was quite different. We placed the patterns of publishing at the center of our analysis and used them to create a network of similarities between countries.
We also made the distinction between production (number of papers published) and performance or success (i.e., number of citations acquired). Although both aspects are related to research activity, they are expressions of different capacities and can be analyzed separately. In this article, we are interested in patterns of production and not in patterns of performance, mainly because production shows how capable countries are (including the institutions and the people working there) of creating new knowledge in different areas of science. Success or performance, on the other hand, concerns how that knowledge is used in the scientific community (see for example [6]).
In this article, we characterize the publishing patterns of countries to create a network of similarities that allows us to visualize and understand how similar/dissimilar the countries are.
In particular, we analyze where the BRIC countries are located in this landscape, and we also find the closest countries to each BRIC country.
Network visualizations have proven to be powerful tools for analyzing complex phenomena [7], and their application to the analysis of research in general has been increasing during the last decade, even though they have been used for almost half a century (see, for example, a pioneering co-citation network analysis by Henry Small [8]). In relation to the development of research activity, networks have been applied to analyze citation patterns [9][10][11], co-authorship/collaboration [12,13], and research policy [14]. Within these types of network-based analyses, those oriented toward understanding countries have focused on patterns of collaboration, mainly as projections of co-authorship networks, as for example in [15][16][17].
Regarding the specific analysis of research production in BRIC countries, a few studies have been published. Kumar and Asheulova analyzed the research output of BRIC countries by applying statistical methods to the data in Scopus [18]. Yang et al. [19] analyzed the structure of disciplines in BRIC countries, comparing them to those of the G7 countries. The latter study used information from the database of the Science Citation Index of Web of Science (WoS). The authors assigned each paper to the country of the corresponding author; in other words, one paper to one country. The disciplines were the ones defined by the Journal Citation Reports (JCR) from Thomson Reuters. The authors concluded that BRIC countries are mainly focused on physics, chemistry, mathematics and some areas of engineering. Our study differs from the previous one, first in the data (we used SCImago), and also in the classification and the method used to find similarities between countries.
The rest of this article is organized as follows: in Section 2, we detail the data and methods used; in Section 3, we present our core results; finally, in Section 4 we discuss and conclude our findings.

Data and Methods
To analyze the publishing patterns of countries, we gathered information from SCImago [20]. This site is based on the Scopus database and presents information on the number of citable documents per country in each category. Categories are defined according to the Scopus classification of science. This classification includes 27 main areas divided into 308 subcategories. We chose SCImago for three main reasons: first, it includes the whole spectrum of research (Sciences, Social Sciences and Arts-Humanities), in contrast with dedicated databases such as PubMed or DBLP; second, SCImago is public, that is to say, users do not need to pay for a subscription to access the site and the data, which is particularly useful for institutions and people (certainly in BRIC countries) that would like to replicate, validate, or contrast our results; finally, SCImago includes a large number of journals, in contrast with Web of Science®, which is limited in the quantity of journals and also biased in language, since most of the journals indexed are written in English (see for example [21] for a discussion of this bias). We claim that the information published on SCImago gives a better signal of the scientific production of BRIC countries (most of them non-Anglophone) than, for instance, Web of Science®.
In SCImago, each journal may be assigned to one or more categories of science. In addition, each paper is assigned in full to every country that corresponds to the affiliations of its authors. This assignment increases the number of authorships for each country. We will refer to this number as the value of authorships. We queried information for all available countries from 2006 to 2015. Accordingly, our resulting dataset comprises 224 countries, 10 years, and 308 categories. Even though SCImago includes information since 1996, we chose this time interval because China, one of the BRIC countries, experienced a surge in research activity starting in 2006 [22,23]. Furthermore, as we wanted the most recent information available, we included data up to the year 2015. The information was analyzed during the first week of June 2016.
To handle the data, we created a matrix A of countries and categories. We aggregated the data over the whole interval using the average over the ten years. Thus, each entry A_ij of the matrix A contains the average value of authorships of a given country i in the category of science j in the interval 2006-2015.
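As an illustration of this aggregation step, the following minimal Python sketch averages yearly values of authorships per country and category. The record layout `(country, category, year, value)`, the function name `average_authorships`, and the toy numbers are assumptions made for this example; they do not reproduce the SCImago export format. The sketch also assumes that a year with no record counts as zero production.

```python
from collections import defaultdict


def average_authorships(records, n_years):
    """Build A[country][category] as the mean yearly value of authorships.

    `records` is an iterable of (country, category, year, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for country, category, _year, value in records:
        totals[country][category] += value
    # Assumption: divide by the full interval length, so a year without
    # a record contributes zero to the average.
    return {
        country: {cat: total / n_years for cat, total in cats.items()}
        for country, cats in totals.items()
    }


# Toy data (invented values, two years instead of ten).
records = [
    ("X", "Agronomy", 2006, 100), ("X", "Agronomy", 2007, 120),
    ("X", "Optics", 2006, 10), ("X", "Optics", 2007, 30),
]
A = average_authorships(records, n_years=2)
```

In the setting described in the text, `n_years` would be 10 and each country would carry up to 308 categories.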
Figure 1a presents a visualization of this matrix and Figure 1b shows the share or relative production of each category inside the country.For visualization purposes, matrices in Figure 1a,b are transposed and they show only the top 30 countries ranked (from left to right) by the values of authorships and the top 50 categories.It must be noted that both China and India are ranked in the top 10 countries, while Brazil and Russia are ranked in the top 15; this fact shows that all the BRIC countries are important contributors in the global scene of science.
In order to identify the publishing patterns of countries, we needed to find the relative importance (abundance) of each category of science within each country. We characterized the pattern of a country as a vector V whose features correspond to the "share" of each category in that country. These values are contained in a "share" matrix denoted by S (see Figure 1b), where each entry S_ij of S is computed using the following equation:

$$S_{ij} = \frac{A_{ij}}{\sum_{k} A_{ik}}$$

Each vector V_i that characterizes the publishing patterns of country i corresponds to a row in S. Then, we used these vectors to find similarities among countries and centrality measures for each BRIC country. Figure 1b presents a visualization of these vectors for the top 30 countries and for the top 50 categories. We can also note the differences between matrices in Figure 1a,b, where the latter shows fewer differences among countries, and the former shows publishing patterns (behavior in publishing) for each country.
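A minimal Python sketch of the share computation follows. It assumes that the share of a category is the country's value in that category divided by the country's total over all categories; the function name `share_matrix` and the toy matrix are invented for illustration.

```python
def share_matrix(A):
    """Normalize each row of A (countries x categories) so that it sums to 1."""
    S = []
    for row in A:
        total = sum(row)
        # A country with no production gets an all-zero pattern (assumption).
        S.append([value / total if total else 0.0 for value in row])
    return S


# Two hypothetical countries with very different volumes
# but the same distribution across three categories.
A = [
    [50.0, 30.0, 20.0],
    [5.0, 3.0, 2.0],
]
S = share_matrix(A)
```

Both rows of `S` describe the same publishing pattern even though the first country publishes ten times more, which is the property that separates patterns from volume.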
We computed the similarity using the cosine similarity function. The cosine similarity ∅_cc' between countries c and c' is calculated as follows:

$$\varnothing_{cc'} = \frac{V_c \cdot V_{c'}}{\lVert V_c \rVert \, \lVert V_{c'} \rVert}$$

where the numerator computes the dot product (scalar product) between the vectors of the two countries and the denominator computes the product of their norms. The resulting similarity matrix ∅ is symmetric.
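The cosine similarity between two pattern vectors can be sketched in a few lines of Python. The function names and the toy vectors below are assumptions made for illustration; returning 0 for an all-zero vector is a convention chosen here, not something stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty patterns (assumption)
    return dot / (norm_u * norm_v)


def similarity_matrix(S):
    """Pairwise cosine similarity between the rows of S."""
    n = len(S)
    return [[cosine_similarity(S[i], S[j]) for j in range(n)] for i in range(n)]


# Hypothetical share vectors for three countries.
S = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.0, 0.1, 0.9],
]
sim = similarity_matrix(S)
```

Because the share vectors contain no negative entries, every similarity lies between 0 and 1, and the matrix is symmetric by construction.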
To visualize the similarities between countries, we also created a network (see Figure 2) that gives the "big picture" of the landscape of countries when they are clustered by similarities in publishing patterns. To detect communities we applied the Infomap algorithm [24], which uses random walks to generate sequences of jumps from one node to another, and then applies Huffman codes to detect the patterns of jumps that are repeated most frequently, which are finally defined as communities. Huffman's algorithm, also known as Huffman coding, optimizes the number of digits needed to code (name) each node by assigning more digits to less frequent nodes. Even though other algorithms that can detect communities in a network were available, we preferred the Infomap algorithm because it is considered one of the most powerful methods, and it is not limited to particular networks, as is the case of the generalized Louvain method [25], another widely used algorithm that is oriented toward large networks.
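To make the Huffman coding idea concrete, the sketch below computes only the code lengths that a Huffman code would assign to nodes with given visit frequencies. It is not an implementation of Infomap; the function name `huffman_code_lengths`, the node labels, and the frequencies are invented for illustration.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over `freqs`."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap items: (total frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical visit frequencies of four nodes during a random walk.
visits = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(visits)
# lengths == {"a": 1, "b": 2, "c": 3, "d": 3}
```

The most visited node receives the shortest code and the least visited nodes the longest ones, which is the behavior the paragraph describes.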
For visualization purposes, we filtered out links of the network below a threshold (0.895) to keep only the strongest links. It must be noted that the community structure (and the visualization) is highly dependent on this parameter. A low value (more links) produces a very small number of communities (1 or 2), while a high value (fewer links) produces clearer communities but also many connected components and isolated nodes. Consequently, we searched for a threshold value, with three decimal digits, small enough to place each BRIC country in a different community, since we want to highlight the similarities and differences among these four countries.
Figure 2. Network visualization of similarities between 111 countries according to publishing patterns. Similarity was computed using the cosine similarity function. Links below 0.895 were eliminated, as were isolated countries. Colors were assigned according to communities automatically detected by the Infomap algorithm [24]. Nodes corresponding to BRIC countries are highlighted with squares. Country codes are defined according to the standard ISO 3166-1 alpha-3 codes. A list of communities and their corresponding countries has been included in Appendix A.
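The effect of the threshold on the network can be sketched as follows. This is a minimal Python illustration, not the procedure actually used: the function names, the node labels, and the toy similarity values are assumptions, and links are kept when their similarity is greater than or equal to the threshold.

```python
def filter_links(sim, names, threshold):
    """Keep links with similarity >= threshold; nodes left without links are dropped."""
    adj = {}
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                adj.setdefault(names[i], {})[names[j]] = sim[i][j]
                adj.setdefault(names[j], {})[names[i]] = sim[i][j]
    return adj


def connected_components(adj):
    """Return the connected components of an undirected adjacency dict."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Invented similarity matrix for four hypothetical countries.
names = ["AAA", "BBB", "CCC", "DDD"]
sim = [
    [1.00, 0.95, 0.80, 0.50],
    [0.95, 1.00, 0.90, 0.40],
    [0.80, 0.90, 1.00, 0.30],
    [0.50, 0.40, 0.30, 1.00],
]
network = filter_links(sim, names, threshold=0.895)
components = connected_components(network)
stricter = filter_links(sim, names, threshold=0.93)
```

With the lower threshold, three nodes form a single component and `DDD` is dropped as isolated; with the stricter one, only the `AAA`-`BBB` link survives, which mirrors the trade-off described in the text.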

Communities in the Network of Countries are Close to Represent Regions
By analyzing Figure 2 and the table in Appendix A, we observe that the communities detected by the Infomap algorithm [24] correlate with regions and/or level of development. For example, the central and highly connected cluster (red community) is composed mainly of developed countries such as the United States, Germany, France, or Japan. In contrast, we find isolated clusters of developing countries, like the African countries (orange community) or some Latin American countries such as Ecuador and Bolivia (light green community).
The biggest connected component of the network includes six additional communities besides the red one. We note that in the biggest connected component the red community plays a central role, while, for instance, the communities colored aqua, violet, and tomato are in the periphery.
All four BRIC countries belong to different communities (as anticipated), and they are part of the biggest connected component.

India Is the Most Connected Country of BRIC Countries
While India belongs to a community with a high level of connection to the central community, Brazil and China are located in communities with few strong links to other communities. Furthermore, Russia is not well linked within its community, having poor levels of connection and being far from the central community.
We can support these statements by looking at the centrality measures (Table 1) for each BRIC country in the network. Here, India (rank 36) is the most connected BRIC country when the weighted degree is used as a proxy for connectivity. In the same ranking, China, Brazil, and Russia are in positions 59, 73, and 90, respectively. Countries of the red community (the most connected) are at the top of this list.
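As a sketch of how a ranking by weighted degree can be obtained, the following Python fragment sums the weights of the links attached to each node and ranks the nodes in decreasing order. The adjacency dictionary, its labels, and its weights are invented; breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
def weighted_degree(adj):
    """Sum of link weights attached to each node."""
    return {node: sum(neighbors.values()) for node, neighbors in adj.items()}


def rank_by(scores):
    """Rank nodes from highest to lowest score (1 = highest); ties broken by name."""
    ordered = sorted(scores, key=lambda node: (-scores[node], node))
    return {node: position for position, node in enumerate(ordered, start=1)}


# Hypothetical filtered network: node -> {neighbor: similarity}.
adj = {
    "a": {"b": 0.90, "c": 0.95},
    "b": {"a": 0.90},
    "c": {"a": 0.95, "d": 0.90},
    "d": {"c": 0.90},
}
degree = weighted_degree(adj)
rank = rank_by(degree)
```

Here `a` and `c` share the highest weighted degree and `b` and `d` the lowest, so the alphabetical tie-break decides their relative positions.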

Brazil Connects Communities
If we analyze the network of the strongest similarities in Figure 2, we can find that Brazil connects its community to other communities. Brazil connects its community (violet) mainly with the red community. The case of China and India is quite different, since the countries of the latter (yellow community, which is composed mainly of Middle Eastern countries such as Saudi Arabia or Iran) do not depend on India to connect with other communities. This is verified by the ranking and values of betweenness centrality in Table 1. While Brazil has a high value (rank 19), China and India occupy positions 36 and 67, respectively. Finally, in this topology, Russia is not needed for any country to connect with others.
It must be noted that the previous analysis was performed on the visualization and network topology presented in Figure 2. Observations can be different if the threshold used to delete weak links changes. For this reason, in further sections we also present a quantitative analysis of the fully connected matrix and of other threshold values, and we show how these measures can differ (see Sections 3.5 and 3.6).
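The bridging role discussed in this section is what betweenness centrality measures. The sketch below implements Brandes' algorithm for an undirected, unweighted graph in plain Python. Treating the filtered network as unweighted is an assumption of this example, since the text does not state whether link weights were used; the toy graph and its labels are invented.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: number of shortest paths passing through each node."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = {v: 0 for v in adj}       # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        # Accumulate dependencies from the farthest nodes back to the source.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {v: value / 2 for v, value in centrality.items()}


# Two hypothetical groups {A, B} and {C, D} joined only through node X.
graph = {
    "A": ["B", "X"], "B": ["A", "X"],
    "X": ["A", "B", "C", "D"],
    "C": ["X", "D"], "D": ["X", "C"],
}
bc = betweenness(graph)
```

In this toy graph every shortest path between the two groups passes through `X`, so `X` has a betweenness of 4 (one for each cross-group pair) and all other nodes have 0.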

Dominant Areas Per Community
We also approached the analysis from the opposite direction, that is, we examined what production looks like in particular fields of science inside each community. We do this by averaging the countries' production in each field over all the countries inside each community. Table 2 summarizes the dominant scientific fields for the communities that host a BRIC country. By doing this, we can analyze which areas generate these (dis)similarities. A detailed version of this table is available in Appendix B, which includes the mean values and all the results for each community. We can observe that the communities of India and China are more oriented toward technological areas, such as electronic and mechanical engineering, while Russia's community is mainly focused on physics-oriented fields. On the other hand, in Brazil's community, countries participate mainly (at least five of ten categories) in agricultural or animal sciences.
Looking more closely, we can also observe that the resulting patterns are due not only to the areas in which countries publish, but also to the number of papers produced in each field. This can be noted by analyzing the detailed information included in Appendix B, where we see, for instance, that the community of leading countries has values above 1200 papers on average in each top-ranked area, while developing communities publish below 100 papers on average in top-ranked areas.
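The per-community averaging described in this section can be sketched as follows. The data layout (a dictionary of countries mapping fields to average production), the function name `dominant_fields`, and all numbers are assumptions made for illustration; a field missing for a country is counted as zero.

```python
def dominant_fields(A, communities, top_k=10):
    """For each community, rank fields by mean production over its member countries."""
    result = {}
    for label, members in communities.items():
        fields = {field for country in members for field in A[country]}
        means = {
            field: sum(A[country].get(field, 0.0) for country in members) / len(members)
            for field in fields
        }
        ranked = sorted(means, key=lambda field: (-means[field], field))
        result[label] = [(field, means[field]) for field in ranked[:top_k]]
    return result


# Invented production values for two hypothetical countries in one community.
A = {
    "x": {"Physics": 100.0, "Agronomy": 20.0},
    "y": {"Physics": 50.0, "Agronomy": 40.0, "Optics": 30.0},
}
communities = {"c1": ["x", "y"]}
top = dominant_fields(A, communities, top_k=2)
# top == {"c1": [("Physics", 75.0), ("Agronomy", 30.0)]}
```

Because the averages are taken over raw production rather than shares, large producers weigh more in the result, which is consistent with the observation that volume also shapes the patterns.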

Analysis of Similar Countries to BRIC Countries
By looking at the (fully connected) similarity matrix ∅, we can find which countries are most similar to the BRIC countries. In Table 3 we show the top 10 most similar countries for each BRIC country. If we only analyze links greater than 0.9, we note that China maintains strong similarities mainly with Asian countries. Russia only has strong similarities with two countries of the former Soviet Union. The case of Brazil is quite similar to that of Russia, since Brazil only has strong similarities with four Latin American countries.
India, on the other hand, is quite different. India has strong similarities with all the countries in its top 10 ranking. Those countries are mainly European and three of them are part of the G7. When we extend this analysis to the full top 10 list, we note that China again has similarities mainly with Asian countries like South Korea, Singapore, or Japan. Something similar occurs in the case of Brazil, where 5 of the 10 most similar countries are Latin American. In the case of Russia, 9 of the 10 countries are from Europe and at least three of them are former Soviet Union countries. Russia is a special case in which the similarities with other countries tend to be low. Finally, we can also observe that among BRIC countries, only India is in the top 10 list of China, and there are no other strong similarities between BRIC countries.
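Extracting the most similar countries from a similarity matrix amounts to sorting one row. The Python sketch below uses an invented matrix and hypothetical labels; the function name `most_similar` and the optional `min_similarity` filter are assumptions introduced for this example.

```python
def most_similar(sim, names, target, k=10, min_similarity=0.0):
    """Return up to k (country, similarity) pairs closest to `target`."""
    i = names.index(target)
    others = [
        (names[j], sim[i][j])
        for j in range(len(names))
        if j != i and sim[i][j] >= min_similarity
    ]
    others.sort(key=lambda pair: (-pair[1], pair[0]))
    return others[:k]


# Invented similarity matrix for four hypothetical countries.
names = ["AAA", "BBB", "CCC", "DDD"]
sim = [
    [1.00, 0.95, 0.80, 0.50],
    [0.95, 1.00, 0.90, 0.40],
    [0.80, 0.90, 1.00, 0.30],
    [0.50, 0.40, 0.30, 1.00],
]
closest = most_similar(sim, names, "BBB", k=2)
strong = most_similar(sim, names, "AAA", min_similarity=0.9)
```

The `min_similarity` argument plays the role of the 0.9 cut used in the text to separate strong similarities from the rest of the top 10 list.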

Centrality Measures for BRIC Countries
As we mentioned previously, the selection of a threshold to delete weak links produces different results in the topology of the network. In our previous analysis we chose this value with the intention of producing a meaningful community structure to analyze BRIC countries based on strong links. However, it is necessary to provide more information about the variations of the network for different threshold values. In the current section, the threshold is not oriented toward the community structure but toward the distribution of similarities. We also wanted to analyze how the characteristics of BRIC countries (as nodes of the network) change with these variations.
In order to define different values for the threshold, we looked at the distribution of similarities of the fully connected network (see Figure 3) and then we computed the deciles of the distribution. We used these 9 values (excluding the minimum and the maximum) to create 9 different networks and to analyze the values of degree centrality (weighted degree), betweenness centrality, and closeness centrality for each node, and in particular for each BRIC country.
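The nine decile thresholds can be obtained with the Python standard library, as in the sketch below. The toy similarity matrix is invented, and the interpolation rule is the default of `statistics.quantiles`, which is an assumption because the text does not say how the deciles were interpolated.

```python
import statistics


def decile_thresholds(sim):
    """Return the 9 decile cut points of the pairwise similarities."""
    n = len(sim)
    # Each undirected link is counted once (upper triangle, no diagonal).
    values = [sim[i][j] for i in range(n) for j in range(i + 1, n)]
    return statistics.quantiles(values, n=10)


def count_links(sim, threshold):
    """Number of links that survive a given threshold."""
    n = len(sim)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sim[i][j] >= threshold)


# Invented symmetric matrix for six hypothetical countries.
size = 6
sim = [[1 - abs(i - j) / 10 for j in range(size)] for i in range(size)]
thresholds = decile_thresholds(sim)
surviving = [count_links(sim, t) for t in thresholds]
```

Each threshold defines one network, and `surviving` shows how the number of links shrinks as the threshold moves from the first to the ninth decile.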
Publications 2016, 4, 20
Table 4 summarizes the rank occupied by each BRIC country (see a detailed table in Appendix C) for 10 different values of the threshold: 9 values from the deciles of the distribution of similarities, and 1 additional value corresponding to the filter used to create the community structure in Figure 2.
Regarding centrality proxied by degree, the low level of centrality (position in the ranking, see the first block in Table 4) is constant for Russia, which occupies the last position among the four BRIC countries across the 10 variations of the threshold. The case of Brazil is interesting, as it is a central country until the eighth value (D8), after which its ranking starts decreasing. This can be explained by the fact that the strong connectivity of Brazil is produced with countries that have weak connections, and these countries tend to disappear from the network once the threshold is high, as in the ninth and tenth values. In these two final networks, India and China improve their positions.

Figure 1. (a) Visualization of matrix A; (b) share or relative production of each category inside each country. For visualization purposes, the matrices are transposed and show only the top 30 countries, ranked from left to right by values of authorships, and the top 50 categories.

Figure 3. Histogram of values of similarities for the fully connected network.

Table 1. Ranking of centrality values for countries. Measures of centrality: weighted degree (Deg.), betweenness centrality (Bet.) and closeness centrality. After position 46 some countries are omitted in order to present all the values for BRIC countries. Total countries in the database: 111.

Table 2. Ranking of the top 10 areas for each community. We considered the average value of production inside the community for each area. Details of values and data on the other communities are presented in Appendix B.