Growth in Biological Sub-Fields: Patterns, Predictability and Sustainability

Biologists are producing ever-increasing quantities of papers, which raises the question of whether current rates of increase in scientific output are sustainable in the long term. I studied this issue using publication data from the Web of Science (1991–2010) for 18 biological sub-fields. In the majority of cases, an exponential regression of the number of papers published each year on publication year explains more variation than a linear one. Exponential growth in publication numbers is clearly not sustainable. About 75% of the variation in publication growth among biological sub-fields over the two studied decades can be predicted from publication data for the first six years. Currently trendy fields such as structural biology, neuroscience and biomaterials cannot be expected to continue growing at the current pace, because within a few decades they would produce more papers than the whole of biology combined. Synthetic and systems biology are problematic from the point of view of knowledge dissemination, because in these fields more than 80% of existing papers have been published over the last five years. The evidence presented here casts doubt on how sustainable the recent increase in scientific publications can be in the long term.


Introduction
The increase in the production rate of scientific output has long been recognized [1][2][3][4][5][6]. Plausible causes of this trend include the growing number of scientists worldwide [7], the consequently larger number of discoveries worth communicating to peers, the public and posterity [8], and administrative pressure to publish, e.g. in order to obtain or keep postdoctoral and tenured positions at academic institutions [9]. The ever-increasing number of scientific publications results in increasing specialization, because it is simply impossible for scientists and laypeople alike to keep up with the whole of their own main field of interest, let alone follow what is happening in neighboring and distant disciplines [10].
Publication growth trends have been documented in various fields, from neuroscience [11] to bioinformatics [12] and plant biotechnology [13]. For example, global research output on Geographic Information Systems (GIS) tripled between 1997 and 2006 [14]. In medical informatics, the number of published articles grew by an average of 12% per year over the period 1987-2006 [15]. Research on photosynthesis, by contrast, was shown to have reached a plateau at the beginning of the 1990s, after a period of rapid growth between the 1950s and the 1980s [16].
These results, together with the recent (i) economic crisis, (ii) adoption of new electronic technologies and (iii) shift towards publishing and reading on the internet, call for a comparative study across fields. Is the increase in research output continuing unabated to the present day? Or are we instead approaching a paper peak, as has been predicted for finite resources such as oil, coal and natural gas? How sustainable are current rates of increase in scientific output? Taking biology and some of its sub-fields as a case study, this paper investigates the recent (1991-2010) trend in the number of publications indexed in Web of Science (WOS).
Biology, the science of life, is a key scientific area for achieving sustainability, i.e. a human use of resources that preserves the environment so that human needs can also be met by future generations. Whilst many other scientific disciplines are also essential for sustainability (e.g., the social sciences, including economics; climate science; information and energy technology), biologists have much to contribute to sustainable development, because ecosystems throughout the planet have evolved efficient ways to recycle resources, thus making the biosphere sustainable. Although not all biological publications are directly relevant to current efforts to improve the sustainability of modern human civilization, investigating trends in biological publications is important from a sustainability point of view. This is because current rates of production of biological publications contribute to information overload, which may make it harder for key advances towards sustainability to reach their intended audience (e.g., students, policy makers, journalists, the public, other scientists).

Materials and Methods
A search was carried out in WOS (all citation databases: Science Citation Index Expanded, Social Sciences Citation Index, Arts & Humanities Citation Index, Conference Proceedings Citation Index-Science, Conference Proceedings Citation Index-Social Science & Humanities, and Index Chemicus) in January 2012. WOS does not yet include many new journals, so the results are conservative (i.e. they give a lower bound for the observed publication growth patterns) [17]. The year 1991 was chosen as the starting point for the study because abstracts are searchable in WOS from that year onwards [18].
Whilst not all papers relevant to these sub-fields are retrieved by the keyword search strings, the papers retrieved using these generic keywords should be representative of each sub-field, thus allowing comparisons among sub-fields and over time within each sub-field [19,20]. The methodology used (retrieving papers through keywords in the Topic field) also allows the analysis to include novel sub-fields that are not yet indexed as a Subject Category in Web of Science (structural biology, bioengineering, bioinformatics, biomaterials, biomechanics, systems biology and synthetic biology). Some papers may have been retrieved for more than one sub-field, but this is inevitable given the multidisciplinarity of today's biology and is unlikely to affect the broad results of the study.
For each sub-field, the number of papers published each year from 1991 to 2010 was recorded. The year 2011 was left out of the analyses because some of its papers may not yet have been indexed in WOS. Using SAS 9.1, linear and exponential regressions were carried out to explain the variation in the number of new papers per year as a function of publication year for each sub-field. The ratios of the number of papers published in 2010 vs. 1991, 1996 vs. 1991 and 2010 vs. 2006 were calculated. Similarly, the proportion of papers published in the last three and five years out of all papers published over the whole study period (1991-2010) was computed, as well as the change from 1991 to 2010 in the proportional representation of the investigated sub-fields in biology as a whole.
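The regressions above were run in SAS 9.1, and the exact SAS procedure is not stated. The following is only a minimal Python sketch of one way to compare the two models, assuming that the exponential model is fitted as a straight line to the logarithm of the yearly counts and that both r² values are then computed on the original count scale so that they are directly comparable. The function name `compare_fits` is hypothetical.

```python
import numpy as np

def compare_fits(years, counts):
    """Return (r2_linear, r2_exponential) for yearly paper counts vs. year.

    Assumption: the exponential model counts = exp(c + d * year) is fitted by
    ordinary least squares on log(counts); both r2 values are computed on the
    original count scale. All counts must be positive for the log fit.
    """
    x = np.asarray(years, dtype=float)
    x = x - x.min()                      # centre years for numerical stability
    y = np.asarray(counts, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)

    # Linear model: y = a + b * x (polyfit returns [slope, intercept])
    b_lin, a_lin = np.polyfit(x, y, 1)
    pred_lin = a_lin + b_lin * x

    # Exponential model, fitted as a line on the log scale
    d_exp, c_exp = np.polyfit(x, np.log(y), 1)
    pred_exp = np.exp(c_exp + d_exp * x)

    r2_lin = 1 - np.sum((y - pred_lin) ** 2) / ss_tot
    r2_exp = 1 - np.sum((y - pred_exp) ** 2) / ss_tot
    return r2_lin, r2_exp
```

For example, `compare_fits(range(1991, 2011), counts)` on hypothetical counts growing by 10% per year returns an exponential r² close to 1 and a lower linear r². Fitting on the log scale is a common choice, but a nonlinear least-squares fit on the raw counts would be an equally valid alternative and could give slightly different r² values.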

Results
The number of papers published per year increased markedly in all investigated sub-fields (Figure 1). The greatest increase in terms of the slope of the linear regression (i.e. the average additional number of papers published per year) was observed for genetics, the largest of the investigated sub-fields. The shallowest slopes were observed for the niche areas of structural and developmental biology, as well as for bioengineering (Figure 1). The ratio of 2010 to 1991 publications was greatest for structural biology (23 times), neuroscience (17) and biomaterials (16). The lowest values of this ratio were observed for molecular biology (2.3) and biochemistry (1.8) (Table 1).
The proportion of papers published over the last five years relative to the whole 1991-2010 corpus was highest for synthetic biology (94%), systems biology (84%) and bioinformatics (65%). The same pattern was obtained using the proportion of papers published over the last three years only. Here too, the lowest proportions were observed for molecular biology and biochemistry (about 30% of 1991-2010 papers published over the last five years, and 20% over the last three years; Table 1).
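To make the proportion of recent papers concrete, the following minimal sketch computes it from yearly counts and compares it with the value expected under a hypothetical constant annual growth factor g. For counts growing as c·g^i over n years, the last k years hold g^(n-k)(g^k - 1)/(g^n - 1) of all papers, and with no growth at all the share is simply k/n, i.e. 25% for five out of twenty years. Both function names and the growth factors are illustrative assumptions, not values from the study.

```python
def share_of_last_years(counts_by_year, k):
    """Fraction of all papers in the period that were published in the final k years.

    counts_by_year: dict mapping year -> number of papers (hypothetical input).
    """
    years = sorted(counts_by_year)
    total = sum(counts_by_year[y] for y in years)
    recent = sum(counts_by_year[y] for y in years[-k:])
    return recent / total

def share_under_constant_growth(g, n=20, k=5):
    """Closed-form share of the last k of n years when counts grow by factor g per year."""
    if g == 1:
        return k / n  # flat output: each year contributes equally
    # Geometric series: last k terms divided by all n terms
    return g ** (n - k) * (g ** k - 1) / (g ** n - 1)
```

Under this simple model, the roughly 30% observed for the slowest-growing sub-fields is only a little above the 25% floor implied by zero growth, whereas shares above 80% require very rapid recent growth.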
The sub-fields whose share of all retrieved biological papers increased most from 1991 to 2010 were bioinformatics (+6%), biomaterials (+5%) and structural biology (+4%) (Figure 2; Table 1). The greatest decreases in share were observed for biochemistry (-10%), genetics (-7%) and molecular biology (-5%) (Figure 2; Table 1). The r² of an exponential regression of the number of papers per year on publication year was higher than the r² of a linear regression in 13 of the 18 investigated sub-fields and for biology as a whole (Table 1). However, for biochemistry, genetics and bioengineering, the additional proportion of variance explained by the exponential regression compared to the linear one was negligible. Conversely, a linear regression explained more variation than an exponential one for structural, developmental and systems biology, as well as for molecular biology (Figure 1).

Table 1. Summary statistics of the increase in number of papers indexed in
Publication data from the first six years explained about 75% of the variation in publication growth among sub-fields over the two studied decades (Figure 3b). Plotting the ratio of 2010 to 1991 publications against the ratio of 1995 to 1991 publications showed that structural biology, neuroscience, immunology and biomaterials have produced more papers than could have been predicted at the beginning of the 1990s, whereas the opposite was the case for molecular biology, microbiology and developmental biology (Figure 3a). The greatest increase in published papers over the last five years (2010 vs. 2006) was observed for synthetic biology (about six times), immunology (about three times) and systems biology (about two times), whereas the lowest increase was found for developmental biology and molecular biology (both 1.1 times).
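The analysis behind Figure 3b can be sketched as follows, assuming that the "variation explained" is the squared Pearson correlation across sub-fields between an early ratio, (1991 + x) vs. 1991, and the full-period ratio, 2010 vs. 1991; about 75% would then correspond to r ≈ 0.87. This is a minimal illustration, not the author's SAS code, and the function name is hypothetical. As in Figure 3, sub-fields with no papers in 1991 are excluded because their ratios are undefined.

```python
import numpy as np

def early_vs_full_correlation(counts, start=1991, end=2010):
    """counts: dict mapping sub-field name -> dict {year: papers} (hypothetical input).

    For each x = 1 .. (end - start), returns the Pearson r across sub-fields
    between the ratio (start + x)/start and the ratio end/start.
    """
    # Ratios are undefined for sub-fields with no papers in the first year
    fields = [f for f in counts if counts[f].get(start, 0) > 0]
    full = np.array([counts[f][end] / counts[f][start] for f in fields])
    result = {}
    for x in range(1, end - start + 1):
        early = np.array([counts[f][start + x] / counts[f][start] for f in fields])
        result[x] = np.corrcoef(early, full)[0, 1]
    return result
```

By construction, r reaches 1 at x = 19, when the early and full ratios coincide; squaring `result[x]` gives the proportion of among-field variation explained by the first x years.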

Patterns
There is evidence of an inexorable increase in the number of papers published per year over the last two decades in the investigated biological sub-fields, as well as in biology as a whole (where about five times as many publications were indexed in 2010 as in 1991). Such an increase may be a good omen for the many newly founded biological journals, which can expect a steady flow of submissions in the coming years, provided that the creativity and financial resources of biologists worldwide are not curbed by economic crises and/or natural resource shortages. Despite warnings of a peak in finite natural resources such as fossil fuels and rare materials [21,22], the production of biological knowledge (as recorded in the WOS database) does not seem to be reaching a plateau for the time being. Writing scientific papers might be less constrained by the availability of material resources than other human endeavors such as traveling, shopping and reproducing [23].

Sustainability
Nonetheless, continuing to expand the number of biological publications at the same rate in the near future cannot be considered sustainable. For the majority of the investigated biological sub-fields, an exponential regression explained more variation in the number of indexed papers per year as a function of publication year than a linear regression. This result is important because, if continued over a further twenty years, an exponential increase would diverge considerably from a linear increase that starts from the same number of papers at year 1 and crosses it again at year 20 (Figure 4) [24].
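The divergence shown in Figure 4 can be reproduced with a few lines of arithmetic. The sketch below, with hypothetical parameter names, builds a linear and an exponential series that both equal y1 at year 1 and five times y1 at year 20, then evaluates them beyond year 20.

```python
def linear_and_exponential(t, y1=1.0, factor=5.0, t_cross=20):
    """Values at year t of a linear and an exponential curve that are equal at
    t = 1 (value y1) and again at t = t_cross (value factor * y1)."""
    # Linear: constant absolute increase per year
    slope = (factor - 1) * y1 / (t_cross - 1)
    linear = y1 + slope * (t - 1)
    # Exponential: constant relative increase per year
    growth = factor ** (1 / (t_cross - 1))
    exponential = y1 * growth ** (t - 1)
    return linear, exponential
```

Between years 1 and 20 the exponential curve lies below the linear one, but by year 40 it reaches about 27 times its starting value, against about 9 times for the linear curve, i.e. roughly three times as many items.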

Predictability
The data show that about three quarters of the variation in publication growth among biological sub-fields over the studied period (1991-2010) could have been predicted from the publication growth of the same sub-fields over the first six years. Nonetheless, such a prediction would have missed the later emergence of bioinformatics, synthetic and systems biology. These findings stress the need to monitor the number of new publications per year in various scientific fields, and may be useful for research funders, graduate students, postdocs, established researchers and policy makers, who may all benefit from knowing in advance which sub-fields are likely to expand more than others. However, extrapolating these trends into the future is problematic. For example, if biomaterials and neuroscience publications were to continue growing at the same rate as between 1991 and 2010, from 2050 onwards they would be producing more papers than biology as a whole. Similarly, if the trend observed between 2006 and 2010 continued, immunology and synthetic biology would already be producing more papers than biology as a whole by 2030.
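The extrapolations above follow from simple compound-growth arithmetic. A minimal sketch, assuming constant annual growth factors derived from observed ratios (e.g. g = ratio^(1/19) for a 2010 vs. 1991 ratio), finds the first year in which a sub-field would match or overtake the total. The function name and the example figures are hypothetical and are not the counts used in the study.

```python
import math

def crossover_year(sub_now, total_now, sub_growth, total_growth, year_now=2010):
    """First year in which a sub-field growing by a constant annual factor
    sub_growth would reach or exceed a total growing by total_growth.

    Solves sub_now * sub_growth**t >= total_now * total_growth**t for integer t.
    Returns None if the sub-field never catches up.
    """
    if sub_now >= total_now:
        return year_now
    if sub_growth <= total_growth:
        return None
    t = math.log(total_now / sub_now) / math.log(sub_growth / total_growth)
    return year_now + math.ceil(t)
```

For instance, with a hypothetical sub-field of 1,000 papers whose output grew 16-fold over 19 years, inside a total of 20,000 papers that grew 5-fold, `crossover_year(1000, 20000, 16 ** (1 / 19), 5 ** (1 / 19))` returns a year around 2059. The result is very sensitive to the assumed growth factors, which is one reason why such extrapolations are problematic.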

Conclusions
Overall, there is a generally consistent pattern of unrelenting expansion in the number of biological papers published over the last two decades. Given that exponential growth describes the observed pattern better than linear growth for the majority of investigated sub-fields, the observed trends are unlikely to be sustainable. In the long term, logistic growth is more likely, owing to both external constraints and the ramification of fields into new sub-disciplines [25].
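To illustrate the difference between the two growth regimes, the sketch below contrasts exponential growth N(t) = N0·e^(rt) with the standard logistic solution N(t) = K / (1 + ((K - N0)/N0)·e^(-rt)), in which the carrying capacity K stands in for whatever external constraints cap output. The values of N0, r and K are hypothetical and were not estimated from the WOS data.

```python
import math

def exponential(t, n0, r):
    """Unbounded growth: N(t) = n0 * exp(r * t)."""
    return n0 * math.exp(r * t)

def logistic(t, n0, r, capacity):
    """Logistic growth: close to exponential while N is much smaller than the
    capacity, then levelling off towards the capacity."""
    return capacity / (1 + (capacity - n0) / n0 * math.exp(-r * t))
```

With n0 = 100, r = 0.1 and capacity = 10000, the two curves differ by less than 1% at t = 5, but by t = 100 the logistic curve has levelled off just below 10,000 while the exponential one exceeds two million.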
However, when the proportion of papers published in the last three to five years of a 20-year period exceeds 30-40%, it becomes very difficult for students and researchers to keep up to date with the latest developments in a field. Reliable predictions of differential growth among the investigated biological sub-fields appear to have been possible using data from the first six years of the studied period, but novel technological advances and fields not yet on the horizon cannot be anticipated with this bibliographic methodology and should thus be identified in other ways (e.g., horizon scanning exercises).
Although the number of biological publications is increasing year after year, there is no evidence that publishing peer-reviewed papers is becoming any easier, as suggested by a reported worsening of the file-drawer problem in natural, medical and social science databases [26]. A worsening publication bias towards positive results is a worrying trend because it may mislead meta-analyses.
The reported increase in published papers over the last two decades has made the recent launch of many new scientific journals possible, although it may also be partly a consequence of these new journals [27]. This trend towards more papers has consequences for scientists, the public and policy makers in terms of the availability of new results (favored by open access publishing policies) and information overload [28,29], but it also has potential environmental consequences.
Even if some publications are moving to online-only publication [30], electronic publication (followed by printing by a proportion of individual researchers) still has an environmental footprint, about which we largely lack reliable information. For example, there is a lack of data on the carbon emissions of the average print and electronic paper in biology, considering not just the writing and production processes but also the research, travel and infrastructure behind the reported results [31][32][33][34]. Economies of scale may well operate when scientists and research institutions produce more papers per unit of time or energy invested, but the overall environmental impact of the increasing scale of scientific output should not be underestimated [35,36]. Moreover, in many cases electronic publication is happening on top of print publication. It is time for scientific publishers, editors and researchers to start considering ways to improve the ratio between the marginal benefit of communicating the average additional discovery and the marginal cost of the associated pollutant emissions. Innovative technologies to reduce pollution from publishing research papers may be rendered ineffective if the growth of scientific publications continues at the same rate.
To reduce the growth rate of scientific publications without blocking innovation and the communication of breakthroughs, non-authoritarian solutions are needed [37,38]. These may range from the key role of mentors (e.g., supervisors and senior colleagues could set an example for young researchers by being the first to avoid excessive publication) to incentives (short-listing and career promotion based on the quality rather than the number of publications) [39,40]. Other solutions include:
• editorial discouragement of slicing results into least publishable units, and encouragement of well-rounded papers [41];
• making it mandatory for all scientists to dedicate some time each year to teaching and public presentations [42,43];
• limiting the number of publications that can be included in support of grant applications (as done e.g. by the current European Research Council starting grant scheme);
• making careers less precarious for young researchers (scientists on short-term contracts might tend to publish more papers than those on long-term contracts, so as to increase their chances of getting a new position).

Figure 1. Increase in publications per year indexed in Web of Science (WOS) (1991-2010) in 18 biological sub-fields. For all linear and exponential regression models, n = 20 (with the exception of systems biology (n = 15) and synthetic biology (n = 9)) and p < 0.001.
Web of Science for various biology sub-fields between 1991 and 2010: sum of retrieved papers (sum); whether an exponential (exp) or a linear (lin) model explains more variation in the number of papers per year as a function of publication year; ratio of papers published in 2010 vs. 1991 (2010-1991); change in the sub-field proportion of papers out of all investigated sub-fields in 2010 vs. 1991 (prop); ratio of papers published in 2010 vs. 2006 (2010-2006); and proportion of papers published over the last five and three years out of the sum of papers retrieved over the whole study period.

Figure 2. Pie chart of number of papers indexed in Web of Science in (a) 1991 (total publications retrieved = 6523) and (b) 2010 (total = 38122) for the investigated biological sub-fields.

Figure 3. (a) Correlation of the ratio of 2010 vs. 1991 publications with the ratio of 1995 vs. 1991 publications for the investigated biological sub-fields (with exclusion of bioinformatics, systems and synthetic biology, with no retrieved publications in 1991) and for biology as a whole, (b) increase in the correlation coefficient between ratio of 2010 vs. 1991 publications and the ratio of (1991 + x) vs. 1991 publications as a function of x (number of years since 1991).

Figure 4. Divergence of hypothetical linear and exponential increases starting at the same level at year 1 and crossing at year = 20 with five times more items than at year 1.