Predicting the Evolution of Physics Research from a Complex Network Perspective

The advancement of science, as outlined by Popper and Kuhn, is largely qualitative, but with bibliometric data, it is possible and desirable to develop a quantitative picture of scientific progress. Furthermore, it is also important to allocate finite resources to research topics that have the growth potential to accelerate the process from scientific breakthroughs to technological innovations. In this paper, we address this problem of quantitative knowledge evolution by analyzing the APS data sets from 1981 to 2010. We build the bibliographic coupling and co-citation networks, use the Louvain method to detect topical clusters (TCs) in each year, measure the similarity of TCs in consecutive years, and visualize the results as alluvial diagrams. Having the predictive features describing a given TC and its known evolution in the next year, we can train a machine learning model to predict future changes of TCs, i.e., their continuing, dissolving, merging, and splitting. We found the number of papers from certain journals, the degree, closeness, and betweenness to be the most predictive features. Additionally, betweenness increased significantly for merging events and decreased significantly for splitting events. Our results represent the first step from a descriptive understanding of the science of science (SciSci), towards one that is ultimately prescriptive.


Introduction
While merging events are highly correlated with interconnections between communities, the correlation between splitting events and the internal structure of communities are much more complex; besides, the predictions of forming, dissolving, growing, shrinking were not considered at all.
Given the recent successes in the area of machine learning and artificial intelligence to a variety of prediction problems 14,15 , as well as having developed and validated a general framework to predict social group evolution in Saganowski et al. 16 , we decided to utilize machine learning techniques to fill the gap in predicting scientific knowledge events [17][18][19] .The overall idea behind the Group Evolution Prediction (GEP) method is to build a classification model trained with historical observations in order to predict the future group changes based on their current characteristics, such as size, density, average degree of nodes, etc.A single historical observation consists of a set of features describing the group at a given point in time, and an event type that this group just experienced.The profile of the group may reflect its structure (e.g.density), dynamics (e.g.average age of its member articles) or context (e.g. the journals which the articles-group members-come from).In total, we used over 100 features, some of which were already known to the literature, whereas the others focusing on the dynamics and context are the new, unique features proposed in this paper.Indeed, when we rank the most valuable features contributing to successful prediction of knowledge evolution events, the new features are among the best ones.In order to be able to perform prediction of future group changes, we have to track and learn the model on the historical cases.For that purpose, the group changes from the past (historical evolution) need to be defined and discovered using the methods successfully applied to the Social Network Analysis field, e.g. the GED method 20 , Tajeuna et al. method 21 or other 22 .Most of the methods consider the similarity between the groups in the consecutive time windows as a major factor to match similar groups and further to identify the evolution event type between them.In our work, we apply the GED method, which facilitates both the group quantity (the number of common members) and the group quality (the importance of common members), in order to match related groups.This allows us to enrich the co-citation evolution network with information about member relations, which is depicted in the Social Position measure 23 .
In this study, we extract groups-topical clusters (TCs)-from the bibliographic coupling networks (BCNs) and independently from the co-citation networks (CNs) for the period 1981-2010.Next, the GED method is utilized to label four types of evolution events (changes of TCs): continuing, dissolving, merging and splitting.Then, we use an auto-adaptive mechanism to find the most predictive machine learning model together with its parameters for each network.Additionally, two scenarios were considered for each network: when the number of events of each kind is imbalanced (the original case) and balanced by equally sampling.In general, the prediction quality was satisfactory good for all event types, with F-measures substantially exceeding 0.5.Such values are significantly greater than the baseline F-measures as of 0.14-0.21for both networks.The feature ranking tells us that the most informative features are context-based like the number of PRE, PRB, and RMP papers belonging to the group, and the structural features like the degree, closeness, and betweenness.While looking more carefully at the betweenness of papers from two merging TCs, we find the significantly higher betweenness for papers that are linked across these two TCs than those connected inside the TCs.No such enhancement in betweenness was found for continuing TCs, while a significant decrease in average betweenness was found for splitting TCs.In summary, our findings suggest that evolutionary events in the landscape of physics research can be predicted accurately using various machine learning models, and understanding this predictive power in terms of important features is a worthwhile future research direction.

Physics research evolution for 1981-2010
We begin with studying how scientific knowledge evolved in terms of communities of research papers, and how these communities changed over time.There are several studies on evolution of knowledge within the set of whole journals 2 , which is considered as the analysis on the macroscopic level.Also some research has been carried out for the collection of papers, usually involving some subjective criterion provided by the authors, e.g.only papers cited at least 100 times 1 .As a result, they focus only on a small subset-the most prominent, frequently cited papers, which do not represent the whole diverse domain knowledge.This kind of analysis is considered as microscopic.In our approach, we assume that the most informative way is to analyse neither the entire journal, nor the most cited papers, but whole communities of closely related papers.These communities emerge naturally since they share the same citation patterns.The analysis at such level provides better balance between high and low granularity.We call this kind of analysis as mesoscopic, because it is in-between the macroscopic scale of journals and the microscopic scale of individual papers.However, if we perform community detection directly on the citation network, we might end up with communities consisting of both old and recent papers simultaneously.In such case, it is difficult to interpret how scientific knowledge has evolved from the past to the present.We should be able to explain that such and such communities represent scientific knowledge from an earlier year, whereas the other communities correspond to scientific knowledge from another consecutive year.This enable us to compare them and to distill a picture of how scientific knowledge has evolved from past to present.It requires, however, to construct the networks from research papers that are published in a given year (bibliographic coupling), or papers that are cited in a given year (co-citation).The bibliographic coupling network (BCN) reflects the relation between present publications while the co-citation network (CN) represent the relation between papers which have strong influence on recent publications.In this way, we can detect communities over the years, and study how they evolve year by year, see the Methods section for details on BCN and CN.
After building BCN and CN, the Louvain method was used to extract the community structures.By checking the Physics and Astronomy Classification Scheme (PACS) numbers of the papers in these communities, we have shown that the BCN communities are meaningful and reflect the real structure of the scientific communities 3 .We found indeed that papers in the same community are really focus on the closely related topics.For the CN communities, this validation is tricky because of two problems: (i) the old Physics Review papers have no PACS numbers, and (ii) PACS was revised several times, so the same numbers in different versions can potentially refer to different topics, or the same topics are referred to by different numbers in different versions.Nevertheless, systematic validation seems to be impossible although a quick check on some CN communities after 2010 suggests that CN community structure also reliably reflects the actual scientific community.We refer to these validated units of knowledge evolution as topical clusters (TCs) in this paper.
In Figure 1, we provide the alluvial diagram that depicts the evolution of TCs within the BCNs for the period between 1981 to 2010.The equivalent alluvial diagram for the CNs is shown in Figure S2 in Supplementary Information (SI).In both alluvial diagrams, we visualized the sequences of TCs, their inheritance relations, which can be intimacy indices (for the BCN communities), fraction of common members or inclusion measures (for the CN communities), and the evolution processes they undergo, see the Methods section for more details.The events (changes) that we can discern from the alluvial diagram (shown in Figure 1) are analogous to those recognized in social group evolution 13 .They represent forming, dissolving, growing, shrinking, merging and splitting.We found in Liu et al. that the prediction of such events is hard, since the correlation between them is nonlinear and complex.This challenge is addressed in the following section by tapping into the power of machine learning.

Event labelling
The GED method takes into account the size and the similarity between groups (TCs) in the consecutive time frames in order to label groups change (assign event type).There are four events considered in this work: • continuing-a research field is said to be continuing when the problems identified and solutions obtained from one year to another are of an incremental nature.It is likely correspond to the repeated hypothesis testing picture of the progress of science proposed by Karl Popper 24 .Therefore, in the CN, this would appear as a group of papers that are repeatedly together cited year by year.In the BCN, this shows up as groups of articles from successive years sharing more or less the same reference list.
• dissolving-a research field is thought to disappear in the following year if the problems are solved or abandoned, and no new significant work is done after this.For the CN, we will find a group of papers that are cited up to a given year, but receives very few new citations afterwards.In the BCN, no new relevant papers are published in the field, hence, the reference chain terminates.
• splitting--a research field splits in the following year, when the community of scientists who used to work on the same problems, start to form two or more sub-communities, which are more and more distant from one another.In terms of the CN, we will find a group of papers that are almost always cited together up till a given year, breaking up into smaller and disjoint groups of papers that are cited together in the next year.In the BCN, we will find the transition between new papers citing a group of older papers to new papers citing only a part of this reference group.
• merging--multiple research fields are considered to have merged in the following year when the previously disjoint communities of scientists found mutual interest in each other's field so that they solve the problems in their own domain using methods from another domain.In the CN, we find previously distinct groups of papers that are cited together by papers published after a given year.In the BCN, newly published papers will form a group commonly citing several previously disjoint groups of older papers.
The GED method has two main parameters (alpha and beta), which are the levels of inclusion that groups in the consecutive years have to cross in order to be considered as matching groups.We have applied the GED method with the wide range of these parameters from 5% to 100%.The characteristics of the considered networks required us to set the alpha and beta thresholds to very low values, i.e. 30% for the BCN, and 10% for the CN, see SI for more details.In total, we have obtained 479 various events for the BCN, and 492 events for the CN, which are our observations and the labels in the prediction part of our study.In both networks, the events distribution was imbalanced with the continuing event dominating over all other types, see Figure 2A1 and Figure 2B1.

Future events prediction
The machine learning approach to prediction requires dividing the data into two parts: the training data set and test data set.The training data is used to learn classifier, which can then label events in the test data.The labelled values are compared with the event labels and the prediction performance is calculated.More than 450 observations were used to train the classifiers.Each observation contained 77 features (preselected from the initial 100) divided into three categories: microscopic features (related to nodes in the group, e.g.node degree), mesoscopic features (related to the entire group, e.g. the group size), and macroscopic features (related to the whole network, e.g.network density).Mesoscopic features calculated for individual nodes are commonly aggregated for all nodes from the group, e.g.average node degree or betweenness in the group.See the SI for the complete list of features used.To automatically select the best classification algorithm (model) as well as its hyper-parameter settings to maximize the prediction performance, the Auto-WEKA software package 25 was utilized.For each network, we ran the Auto-WEKA for 48 hours, which allowed us to validate nearly 20,000 configurations per network.The metric being maximized was the F-measure, commonly used for multi-class classification.The overall classification quality was calculated as the average F-measure for all event types, treating them as equally important.
The predicted output variable (event labels) had an imbalanced distribution.Commonly, classifiers tend to focus on the dominant event type (class), which is very well predicted, but at the expense of the minority event types.For the imbalanced BCN data set, the best performance was achieved with the Attribute Selected Classifier (with the SMO as base classifier), which performs feature selection 26 .The percentage of the correctly classified instances was 80.6%, while the average F-measure was only 0.50 due to classifier focusing on continuing, which was the most frequently occurring event type, see Figure 2A.For this event, the F-measure value was equal to 0.89, and only 7 events out of 352 were incorrectly classified.The worst classified was the splitting event, whose F-measure was only as of 0.11.Most of the splitting events were incorrectly classified as continuing (31 out of 33 events).The second worst was merging, with F-measure 0.35.Again, the majority of the merging events were wrongly classified as continuing events: 38 out of 56.Interestingly, the splitting and merging events were never cross-classified mistakenly.For the imbalanced CN data set, the best performance was achieved with a lazy classifier, which uses locally weighted learning 27 .The percentage of the correctly classified instances was 73.3%, while the average F-measure was only 0.53, again due to the classifier concentrating on the dominating continuing event type, see Figure 2B.The F-measure value for the continuing event was only 0.83, however, as many as 50 continuing events (out of 337) were wrongly classified as dissolving.Alike to BCN, many splitting and merging events were incorrectly classified as continuing: 17 out of 22 events, and 24 out of 46 events, with F-measure equal to 0.30 and 0.42, respectively.
By balancing the imbalanced training data sets (i.e. by equally sampling them), we force the classifiers to pay more attention to the features rather than to the number of occurrences of the particular majority event type.As a result of balancing data sets, the previously minor event types (dissolving, merging, and splitting) were predicted much better, but with a significant drop in performance of the continuing event classification.More importantly, by balancing the data sets we increased the overall prediction quality by over 20%.For the balanced BCN data set, the best performance was achieved by means of the boosting-based classifier AdaBoost with the Bayes Net as the base model.The percentage of the correctly classified instances was 62.0% and the average F-measure was 0.61.The biggest sources of errors were merging events, which were wrongly classified as continuing and dissolving, as well as continuing wrongly classified as splitting.The best classified event was dissolving (only 4 mistakes in 27 classifications, the overall score 0.79) followed by the splitting event (6 mistakes in 27 classifications, overall F-measure 0.70).For the balanced CN data set, the Attribute Selected Classifier (with the PART as base classifier) provided the best results-the percentage of the correctly classified instances was 69.32%, while the average F-measure was 0.69 28 .The dissolving, merging, and splitting events were classified very well with the F-measure values equal to 0.79, 0.82, and 0.75 respectively.Most of the continuing events were wrongly classified as splitting (13 out of 22), which resulted in lower F-measure value 0.40.
What is interesting for us to note is that the prediction results for the CN being slightly better than for the BCN.A possible explanation is that for the CN we used a richer similarity measure containing users importance information.Thus the event tracking and therefore the ground truth could be more accurate.Overall, the prediction quality expressed by the average F-measure was very good for the imbalanced as well as for the balanced data sets, as the baseline results obtained with the ZeroR classifier were much worse: F-measure 0.21 for both, BCN and CN, imbalanced data sets, 0.18 for the balanced BCN and 0.14 for the balanced CN.For each data set different classifier turned out to be the best, however most models were wrapped with the boosting or meta classifiers.

Predictive feature ranking
The feature selection technique is used in machine learning to find the most informative features, to avoid classifier overfitting, to eliminate (or at least to reduce) the noise in the data as well as to provide some explanations about phenomena 29 .By repeating the feature selection 1000 times, we obtained 1000 sets of selected features.Next, we calculated how many times each feature has been selected, thus, receiving the ranking of the most often selected features.For the BCN, the context-based  The context-based feature, i.e. the number of papers published in Review of Modern Physics, was the most often selected for the CN data set.It is followed by closeness-and degree-based features in the ranking, see Figure 3B.For both networks macroscopic features were ranked rather low, which suggests that the general network profile is not very important, perhaps because of the smooth changes in the entire network.Surprisingly, the dynamic features, e.g.related to the average age of references (for BCN) and age of articles (for CN) did not show informative value and were ranked very low for both networks.The rankings were validated in the additional two years of data available (2010-2012).The prediction was performed twice: (i) using all features, and (ii) using the top 10 ranked features only.Selecting only the top 10 features, boosted the quality of the prediction by 11% for the CN, and by 2% for the BCN, which underlines the necessity of the feature selection process.

Changes to the Betweenness Distributions Associated with Merging and Splitting Events in BCN
Having the list of best predictive features, Figure 3, we can analyse some of them more in-depth to look for early warning signals.Basically, we believe that scientific knowledge evolves slowly, and this slow evolution drives the evolution of citation patterns.Therefore, there must be specific changes in citation patterns that precede merging and splitting events.Besides the number of PRE papers in a TC, the sum network betweenness is also a strongly predictive feature, see Figure 3A.This suggests that we should look at the betweenness of papers in the BCN more carefully.The betweenness of the node denotes what percentage of shortest paths between all pairs of nodes in the network passes a given node.Values of nodes' betweenness can be aggregated (sum, average, max, min) for all nodes in the TC, as what we list in Table S1.However, in this section we only focus on the distribution of original node betweenness.Naively, when we consider the part of the BCN adjacency matrix corresponding to two TCs that ultimately merged, we expect to find few links between TCs at first.But as the number of links between TCs increase over time, the modularity-maximizing Louvain method will eventually merge the two TCs into a single TC.This is shown schematically in Figure 4, where in general betweenness will increase on average with time as the two TCs merge.In reality, there are always links between TCs, and the numbers and strengths of these links fluctuate over time.To develop a more quantitative description of the merging events outlined in Figure 1, as well as splitting and continuing events, we focus on five events going from 1999 to 2000, shown in Table 1.   1.The five evolution events from 1999 to 2000 in the BCN alluvial diagram Figure 1 that we will study quantitatively.The naming convention for TC is that four digits before '.' is the year of TC, two digits after '.' is the position of the TC in the diagram, starting with 00 for the bottom TC, the one just above bottom is 01 and so on.In the left panel, we highlight the related TCs.
1999.01 + 1999.02→ 2000.03 Let us consider the part of the BCN associated with the TCs.For example, for 1999.01 and 1999.02,we can see from Figure 5(A) that connections within 1999.01 and 1999.02 are very dense, but there are also some links between the two TCs.In fact, we find 164 out of 1849 papers in 1999.01 with non-zero bibliographic coupling to 144 papers in 1999.02(344 papers).
The natural question we then ask is: are the betweenness of the 164 papers in 1999.01 that are coupled to 1999.02 larger, equal, or smaller than the betweeness of the rest 1685 papers in 1999.01 not coupled to 1999.02?Alternatively, if we think of the 164 papers as randomly sampled from the 1849 papers in 1999.01,are we sampling the 164 betweenness in an unbiased fashion?To distinguish the different parts of the TC, we call all papers in 1999.01 which have coupling with papers in 1999.02 as 1999.01a,and the rest of papers as 1999.01b.For more detail analysis, we will divide 1999.01a and 1999.01binto 1999.01aα,1999.01aβ, 1999.01bα, 1999.01bβ .1999.01aα consist of 17 papers in 1999.01a that do not have references in common with papers in 1999.01b,1999.01aβconsist of 147 papers in 1999.01a that have references in common with papers in 1999.01b,1999.01bαare 907 papers in 1999.01b that have references in common with papers in 1999.01a and 1999.01bβrepresents 778 papers in 1999.01b that do not have references in common with papers in 1999.01a.
In Table 2, we show the 25th, 50th and 75th percentiles of the papers in these smaller groups, compared to those of 1849 papers in 1999.01 and 344 papers in 1999.02.As we can see, the 25th, 50th, 75th percentile betweenness in the connecting parts (1999.01aand 1999.02a)are all higher than the 25th, 50th, 75th percentile betweenness in the non-connecting parts (1999.01band 1999.02b).More importantly, these percentile betweenness are higher than the 25th, 50th, 75th percentile betweenness of the TCs 1999.01 and 1999.02themselves.To test how significant these quartiles are in 1999.01a,we randomly sampled 164 betweenness values from 1999.01 10 6 times, and measured the quartiles of these samples.When we draw random samples from a TC, the 25th percentile, the 50th percentile, and the 75th percentile, depends on the size of the TC.There is more variability in these quartiles in smaller samples than they are in larger samples.Therefore, in the test for statistical significance, the observed quartile has to be tested against different null model quartiles for samples of different sizes.To do this, we draw samples with a range of sizes from the same set of betweenness, and for a given quartile (25%, 50%, or 75%), fit the minimum quartile value against sample size to a cubic spline, and the maximum quartile value against sample size to a different cubic spline.With these two cubic splines, we can then check whether the observed quartile value for a sample of size n is more than or less than the null model minimum or maximum using cubic spline interpolation.From the histograms shown in Figure 6(A), we see that the betweenness quartiles of 1999.01a are statistically larger than random samples of the same size from 1999.01, at the level of p < 10 −6 , which means the papers in 1999.01ahave significantly larger betweenness than other papers in 1999.01.
We also checked the statistical significance of the larger betweenness values of 1999.02a,against 10 6 random samples of the same length (144) from 1999.02.From Figure 6(B), we can derive that the quartiles of 1999.02a are only a little larger than the tails of the quartile histograms of the random samples, but their statistical significance is still at the level of p < 10 −6 .
1999.01 → 2000.02+ 2000.03 When a TC splits into two in the next year, we expect the links between two parts a and b in the TC to have thinned out to the point that the modularity Q of the whole is lower than the modularities Q a and Q b of the two parts.However, in general, we would not know how to separate the TC into the two parts a and b.Fortunately, for the 1999.01→ 2000.02+ 2000.03splitting event, we also know the part 1999.01a,which merged with 1999.02a,became 2000.03.Therefore, we might naively expect 1999.01b to be the part that split from 1999.01 to become 2000.02.If we test the quartiles of 1999.01b,against random samples of the same size from 1999.01, we find the histograms shown in Figure 6(C).As we can see, the betweenness quartiles of 1999.01b are quite a bit lower than the typical values in 1999.01,but this difference is statistically not as significant as the quartiles of 1999.01a.Thinking about this problem more deeply, we realized that while papers in 1999.01bhave no references in common with 1999.02,some of them do share common references with 1999.01a.Let us call these sets of papers 1999.01aα(papers do not have references in common with papers in 1999.01b),1999.01aβ(papers have references in common with papers in 1999.01b),1999.01bα(papershave references in common with papers in 1999.01a), and 1999.01bβ(papers that do not have references in common with papers in 1999.01a).In Figure 6(D), we learn from the histograms that the betweenness quartiles of 1999.01bα are indistinguishable with random samples of the same size from 1999.01.On the other hand, from the histograms in Figure 6(E), we find out that while the lower betweenness quartile of 1999.01bβ is indistinguishable with the random samples of the same size from 1999.01, its median and upper quartile are both on the low sides of the random sample distributions.This suggests a split of 1999.01 to (1999.01a+ 1999.01bα)+ 1999.01bβ .
Just to be safe, we also checked the betweenness quartiles of 1999.01aα and 1999.01aβ, against random samples of the same sizes from 1999.01.As we can see from Figure 6(F) and (G), the lower quartiles and medians are lower than those obtained from random samples, but the upper quartiles are decidedly higher.However, the difference between 1999.01aα and 1999.01aβ is not as obvious as difference between 1999.01bα and 1999.01bβ, one possible reason is the smaller sample size (17, 147 vs. 907, 778).Again, these results are consistent with the picture that the rise in betweenness in 1999.01a is driving  Although a small part split off from each of 1999.11 and 1999.12, the main event associated with the two TCs was a symmetric merging.Looking again into the relevant parts of the BCN, we found 299 out of 1014 papers in 1999.11coupled to 347 out of 988 papers in 1999.12, and we call them 1999.11aand 1999.12a,respectively.As we can see from the histograms in Figure 6(H) and (J), the betweenness quartiles in 1999.11a and 1999.12aare significantly higher than one would expect from random samples of 1999.11 and 1999.12.Simultaneously, the betweenness quartiles in 1999.11b and 1999.12bare significantly lower than in random samples of 1999.11 and 1999.12(see Figure 6(I) and (K)).Therefore, what we are seeing here might be the early warning signals of merging, as well as that of the asymmetric splitting.
1999.04 → 2000.06 and 1999.13→ 2000.16 So far we have learnt that a decrease in betweenness within a TC signals a possible split, whereas an increase in betweenness of the part of the TC coupled to another TC signals a merger between the two TCs.For this story to be consistent, we must not see these signals in the continuing events 1999.04 → 2000.06 and 1999.13→ 2000.16.However, if we go through the full BCN, we find that 370 out of 389 papers in 1999.04 and 308 out of 319 papers in 1999.13 are coupled to papers outside of these TCs, which suggests the possibility of merging or splitting.However, as we can conclude from Table 3, while the lower betweenness quartiles of the coupling parts of 1999.04 and 1999.13 with other TCs may be significantly larger than those of random samples of the two TCs, the highest betweenness quartiles are never significantly larger.Therefore, at the same level of confidence that we have set for the precursors of merging between 1999.01 and 1999.02,as well as between 1999.11 and 1999.12,we have to say that there is no significant precursors for 1999.04 and 1999.13 to merge with other TCs.What about splitting then?A TC is likely to split into two if at least one of two parts has reduced betweenness.We see in Table 3 that betweenness in the coupling parts of 1999.04 and 1999.13 are not significantly lower than those of random samples.Therefore, we look at the non-coupling part, i.e. papers in 1999.04 and 1999.13 which have no references in common with papers in other TCs, but they may have common references with papers in the same TCs.We call these non-coupling parts 1999.04band 1999.13b,respectively (the bottom row in Table 3).Only the top betweenness quartile of 1999.04bfalls below that of random samples from 1999.04 in Table 3.Therefore, the early warning for a splitting event in the next year is not strong enough.For 1999.13b, on the other hand, all three betweenness quartiles fall below that of random samples from 1999.13, even after we have accounted for the small size of 1999.13b.This suggests that the probabiiity of a splitting event next year is high, Table 3.The distributions of betweennesses of papers in 1999.04 and 1999.13 that share common references with the other TCs in 1999 (1999.00 to 1999.15).Four columns below '1999.04'and '1999.13'denote: the first column shows how many papers have common references with the other TCs, while the second, third, and fourth column show the lower, median, and upper quartile values of betweennesses of these papers, respectively.For example, there are 25 papers in 1999.04 that share common references with papers in 1999.03, and the betweennesses of these papers have a lower quartile value of 1.6 × 10 −5 , a median value of 4.3 × 10 −4 , and an upper quartile value of 8.1 × 10 −4 .Similarly, there are 254 papers in 1999.13 that share common references with papers in 1999.10, and the betweennesses of these papers have a lower quartile value of 3.6 × 10 −5 , a median value of 8.8 × 10 −5 , and an upper quartile value of 2.7 × 10 −4 .The bottom row 'b' represent 1999.04b and 1999.13brespectively, which are papers in 1999.04 and 1999.13 have no references in common with papers in other TCs.A betweenness value in red means that it is larger than the maximum of the corresponding quartile distribution of 10 6 random samples, and a betweenness value in blue denotes it is smaller than the minimum of the corresponding 10 6 random samples.but 1999.13 continued on to 2000.16, which thereafter continued to 2001 without merging or splitting.This might be because additional conditions, like the size of TC being large, must be satisfied before a splitting can occur. 12/24

Discussion
During the past two decades, researchers have made a lot of efforts to understand the system of science.Many problems are solved, however the understanding of interactions between different fields is still limited.Investigating the temporal network (BCN, CN) and their community structures, we are able to measure and quantify the complex interaction between different fields particularly in physics over time.Naturally, we would like to have a predictive power based on this picture.However, the correlation between network structure and evolution events is nonlinear and complex.Therefore we turn to machine learning techniques, which have shown a great power to solve predictive problems that are hardly to be solved using traditional statistical methods.To our knowledge, this is the first study that utilizes both machine learning and network science approaches to predict the future of science at the community level.
To be able to identify changes in TCs we needed to define time windows used for network creation and community detection.The natural choice for bibliographical data was the usage of single years, since the publishing process may last many months.Obviously, another granularity may be considered like multiple years, e.g. 2 or 5 years.In our approach, i.e. both for BCN and CN, every citation has the same importance.However, there are some other concepts like fractional counting of citations 30 .It assumes that the impact of each citation is proportionate to the number of references in the citing document.Additionally, it can be differentiated depending on e.g. the quality of the journal.For the CN we have calculated the similarity between groups in the consecutive time windows in two ways: (i) using the plain relative overlap measure, and (ii) using the inclusion measure based on Social Position.The idea was to enrich evolution data with the structural information occurring between the nodes.It turned out that both measures provided the similar labelling, but the evolution tracking with the Social Position information produced slightly better initial prediction.Therefore, the study was continued only for the inclusion measure,see SI for more information.
We decided to analyse more in-depth only on one feature describing structural profile of TCs, namely node betweenness.It was primary caused by the limited amount of resources and complexity of analyses.The entire process required much human assistance and could not have been easily automated.In our experiments, we utilized the raw, imbalanced or artificially flattened-balanced data sets.However, depending on the study purpose, we can bias some classes we are more interested in e.g.split.It can be achieved either by means of appropriate balancing-sampling for the learning set, or reformulating the problem into the binary question-is split expected (true) or not (false).As of now, the betweenness analysis is still limited to several case studies, in future a more rigorous framework will be desired.The idea of analysing science by discovery of knowledge changes is general and can be applied to all bibliographical data containing citations.We focus solely on APS journals, however, also papers indexed by PubMed, Web of Science or Google Scholar may be studied.

Methods
The entire analytical process consists of several steps that are primary defined by the Group Evolution Prediction (GEP) framework.First, the bibliographic coupling network (BCN) and co-citation network (CN) are extracted from the references placed in the papers from a given time window, see Figure 7, and this is carried out separately for each period.As a result, we get a time series of BCNs/CNs.Next, paper groups called topical clusters (TCs) are extracted using the Louvain clustering methods, independently for each BCN/CN in the time series.Each group is described by the set of predictive features.Having TCs for consecutive periods, we were able to identify changes in TC evolution using the Group Evolution Discovery (GED) method that appropriately labels the TC changes, see below.
Independently, the features ranking and its validation were performed to find the most valuable TC measures.Based on this ranking, a structural measure node betweenness was selected for the more in-depth studies as the early signal for splitting or merging.

GEP method
The Group Evolution Prediction (GEP) method is the first generic approach for the prediction of the evolution of groups 16 , in our case groups correspond to TCs.The GEP process consists of six main steps: (1) time window definition, (2) temporal network creation, (3) group detection, (4) group evolution tracking, (5) evolution chain identification and feature calculation, and (6) classification using machine learning techniques.Thanks to its adaptable character, we were able to apply it to the BCN and CN differently.For the group (TC) detection in both networks, we applied the Louvain method 31 .The group evolution tracking was performed with the GED method (see below), but we used different similarity measures for each network BCN and CN, see below.The set of features describing the group at a given time windows was adjusted to our networks, as some of the features defined in the GEP method were not applicable in our case.We also introduced some new, dedicated measures appropriate for bibliographical data, see SI for the complete list.Finally, we applied the Auto-WEKA tool to find the best predictive model and its parameters from the wide range of all possible solutions.The commonly known average F-measure was used as a prediction performance measure.

Bibliographic coupling network (BCN) and co-citation network (CN)
In the BCN and CN, nodes represent papers and undirected but weighted edges denote the bibliographic coupling strengths and co-citation strengths, respectively.That is, if two papers share w common references, the BCN edge between them would have a weight of w.For example, papers 1 and 2 in Figure 7 share three citations: A, B, and C, whereas papers 3 and 4 commonly cite only one paper-E.On the other hand, if two papers are cited together by w papers, the edge between them in the CN receives weight w .Papers A and B are cited together by two other papers: 1 and 2, but papers B and C by three, i.e. additionally by paper 3.Both BCN and CN are temporal networks, in which the nodes are all papers published within a specific time window (BCN) or papers cited within a given time window (CN).We assume that the reasonable time window for bibliographical data is one year to facilitate the analysis of changes in scientific knowledge, i.e. changes in topical clusters year by year.For the BCN, only the giant component, which in most cases occupies 99% of the whole BCN, will be considered for the TC detection and evolution analysis.For the CN, we do not use all papers cited in the given time window because most of them are cited only a small number of times, and thus they have little influence on the broader knowledge evolution.Therefore, we rank all available N papers p 1 , p 2 , . . ., p N in the descending order by the number of times they are cited in this time window (year): Next, we choose the top n papers p 1 , p 2 , . . ., p n , that totally gathered 1 4 of all citations, i.e. such that n < N is the smallest integer to satisfy ∑ n i=1 f i ≥ 1 4 ∑ N j=1 f j .The data we used in this paper is the APS data set, consisting of about half a million publications between 1893 to 2013 and six million citation relations among them 32 .

Intimacy indices
To analyse the evolution of TCs, we need to match them from the consecutive years.The set of cited papers to large extent overlaps year by year, so for the CN, we can use the regular approach proposed together with the GED method, see below and Brodka et al. 20 .For BCN, however, there is no overlap at all between papers published in the successive years because every paper can be published only once and in only one year.Even if we do not have the corresponding papers in TCs from two BCNs, i.e. two years, the papers' references overlap each another.Therefore, we can measure the similarity of their reference pools to reflect their inheritance.For that purpose, we introduced the forward intimacy index and backward intimacy index in Liu et al. 3 .The idea behind intimacy indices is that the references related to a particular topic change gradually.The forward intimacy

Correlation between overlap measure and inclusion measure
For BCN, we use the forward and backward intimacy indices to measure the closeness between TCs in consecutive time windows (years).For CN, we considered two types of measure: (i) a simple overlap measure of two groups (the relative fraction of common members), and (ii) an overlap of two groups enriched with the information about the importance of the common members.The latter is suggested by the GED method authors, who named their similarity measure the inclusion measure.One way to evaluate the importance of TC members is to use node centrality measures to rank them within the group.In our work, we are using the Social Position measure 23 (as suggested in the GED method), an idea based on the PageRank algorithm 33 .Saganowski et al. 34 found that using a richer similarity measure allows us to track group evolution more reliably.To better understand the difference between the simple overlap measure and the inclusion measure we compared values obtained with both measures in Figure S1.It turned out that the inclusion measure is on average 20% lower than the simple overlap measure, and the corresponding values, i.e. 30% for the simple overlap and 10% for the inclusion measure, produce roughly the same number of the evolution events.However, the more complex version of the similarity measure (i.e. the inclusion measure), provided slightly better initial prediction results.Therefore, we finally utilized the inclusion measure in our calculations for CN.1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008

The list of features used in the study
As we mentioned in Future events prediction, each observation contained 77 features (preselected from the initial 100).The full list of 100 features are showed in Table S1.Many features in this list are proposed for directed social network, therefore are inappropriate for our undirected BCN and CN.The symbol + indicates this feature was used in BCN prediction, while the symbol * indicates this feature was used in CN prediction.

Features group Feature name Feature description
Members/microscopic sum group degree in The sum of indegree 35 of nodes belonging to the community calculated within the community.Indegree is a node measure defining the number of connections directed to the node avg group degree in The average value of indegree of nodes belonging to the community calculated within the community min group degree in The minimum value of indegree of nodes belonging to the community calculated within the community max group degree in The maximum value of indegree of nodes belonging to the community calculated within the community sum group degree out The sum of outdegree 35 of nodes belonging to the community calculated within the community.Outdegree is a node measure determining the number of connections outgoing from the node avg group degree out The average value of outdegree of nodes belonging to the community calculated within the community min group degree out The minimum value of outdegree of nodes belonging to the community calculated within the community max group degree out The maximum value of outdegree of nodes belonging to the community calculated within the community sum group degree total+ * The sum of total degree of nodes belonging to the community calculated within the community.Total degree is the sum of indegree and outdegree avg group degree total+ * The average value of total degree of nodes belonging to the community calculated within the community min group degree total+ * The minimum value of total degree of nodes belonging to the community calculated within the community max group degree total+ * The maximum value of total degree of nodes belonging to the community calculated within the community sum group betweenness+ * The sum of betweenness 35 of nodes belonging to the community calculated within the community.Betweenness is a node measure describing the number of the shortest paths from all nodes to all others that pass through that node avg group betweenness+ * The average value of betweenness of nodes belonging to the community calculated within the community min group betweenness+ * The minimum value of betweenness of nodes belonging to the community calculated within the community max group betweenness+ * The maximum value of betweenness of nodes belonging to the community calculated within the community Continued on next page  The sum of indegree of nodes belonging to the community calculated within the network avg network degree in The average value of indegree of nodes belonging to the community calculated within the network min network degree in The minimum value of indegree of nodes belonging to the community calculated within the network max network degree in The maximum value of indegree of nodes belonging to the community calculated within the network sum network degree out The sum of outdegree of nodes belonging to the community calculated within the network avg network degree out The average value of outdegree of nodes belonging to the community calculated within the network min network degree out The minimum value of outdegree of nodes belonging to the community calculated within the network max network degree out The maximum value of outdegree of nodes belonging to the community calculated within the network sum network degree total+ * The sum of total degree of nodes belonging to the community calculated within the network Continued on next page  The average value of total degree of nodes belonging to the community calculated within the network min network degree total+ * The minimum value of total degree of nodes belonging to the community calculated within the network max network degree total+ * The maximum value of total degree of nodes belonging to the community calculated within the network sum network betweenness + * The sum of betweenness of nodes belonging to the community calculated within the network avg network betweenness+ * The average value of betweenness of nodes belonging to the community calculated within the network min network betweenness+ * The minimum value of betweenness of nodes belonging to the community calculated within the network max network betweenness+ * The maximum value of betweenness of nodes belonging to the community calculated within the network sum network closeness+ * The sum of closeness of nodes belonging to the community calculated within the network avg network closeness+ * The average value of closeness of nodes belonging to the community calculated within the network min network closeness+ * The minimum value of closeness of nodes belonging to the community calculated within the network max network closeness+ * The maximum value of closeness of nodes belonging to the community calculated within the network sum network eigenvector+ * The sum of eigenvector of nodes belonging to the community calculated within the network avg network eigenvector+ * The average value of eigenvector of nodes belonging to the community calculated within the network min network eigenvector+ * The minimum value of eigenvector of nodes belonging to the community calculated within the network max network eigenvector+ * The maximum value of eigenvector of nodes belonging to the community calculated within the network avg group coefficient 38

+ *
The average of the local clustering coefficients of all the nodes in the community avg network coefficient 38

+ *
The average of the local clustering coefficients of all the nodes in the network Group/mesoscopic group size+ * The number of nodes in the group group density 38

+ *
The number of connections between nodes in the group in relation to all possible connections between them group cohesion 39

+ *
The vertex connectivity of the community group coefficient global 38

+ *
The ratio of the triangles and the connected triples in the community group reciprocity 40 A fraction of edges that are reciprocated within the community group leadership 35 + * A measure describing centralization in the community (the largest value is for a star network) neighborhood out The number of nodes outside the community that have incoming connection from the nodes inside the community divided by the number of nodes in the community neighborhood in The number of nodes outside the community that have outgoing connection to the nodes inside the community divided by the number of nodes in the community Continued on next page  The number of nodes outside the community that are connected to the nodes inside the community divided by the number of nodes in the community group adhesion 39

+ *
The minimum number of edges needed to be removed to obtain a community which is not strongly connected alpha 20 The GED inclusion measure of group G i from time window T n in group G j from T n+1 beta 20 The GED inclusion measure of group G j from time window T n+1 in group G i from T n network ratio size+ * The ratio of group size to network size network ratio density+ * The ratio of group density to network density network ratio cohesion+ * The ratio of group cohesion to network cohesion network ratio coefficient global+ * The  The minimum number of edges needed to be removed to obtain a graph which is not strongly connected Table S1.List of all features used in the study.Features proposed in this study are shown in bold.

Figure 1 .
Figure 1.The alluvial diagram of APS papers from 1981 to 2010 for the BCNs.Each block in a column represents a TC and the height of the block is proportional to the number of papers in the TC.For clarity reason only TCs comprising more than 100 papers are shown.TCs in successive years are connected by streams whose widths at the left and right ends are proportional to the forward and backward intimacy indices.The colours inside a TC represent the relative contributions from different journals.

Figure 2 .
Figure 2. The prediction quality of classification results.The F-measure values for the imbalanced BCN (A) and CN (B) data sets, as well as the balanced BCN (C) and CN (D) data sets.The distribution of classes in the training sets are provided for each data set: A1, B1, C1, D1, respectively.For the imbalanced data sets, the classifier focused on the dominating continuing event.Balancing the data sets increased the overall prediction quality by over 20%.

Figure 3 .
Figure 3. Feature ranking.The most frequently selected features in 1000 iterations for the BCN (A) and CN (B) data sets.The context-based features (number of papers published in a given journal) turned out to be the most informative, followed by the microscopic structural measures, especially closeness, degree and betweenness.

Figure 4 .
Figure 4. Part of the BCN adjacency matrix for two TCs (red boxes) that ultimately merged.(A) No links between the two TCs at first.(B) Few links between the two TCs.(C) More links between the two TCs.(D) Many links between the two TCs, leading to their identification as a single merged TC (big red box) by the Louvain method.

Figure 5 .
Figure 5. (A)The adjacency matrix of the BCN associated with the TCs 1999.01 (top dense block) and 1999.02(bottom dense block).(B)The adjacency matrix of the BCN associated with the TCs 1999.11(top dense block) and 1999.12(bottom dense block).

Figure 7 .
Figure 7.The process of building a Bibliographical Coupling Network (BCN) and Co-citation Network (CN) from the citation bipartite network for a given period-year t.Both BCN and CN are undirected and weighted; the weights denote the number of shared citations (BCN) or co-citing papers (CN).Separate topical clusters are extracted for BCN (C 1 , C 2 ) and CN (C 3 , C 4 ).Nodes with numbers are papers from a given period being considered and nodes with letters are their references.

Figure S2 .
Figure S2.The alluvial diagram of APS papers' references (CNs) from 1981 to 2010.Each block in a column represents a TC extracted from the CN.The height of the block is proportional to the number of papers in the TC.For clarity, only TCs comprising more than 1% of all nodes in CNs are shown.TCs in successive years are connected by streams whose widths at the left and right ends are proportional to the relative overlap percentage.The colours inside a TC represent the relative contributions from different journals.

Table 2 .
The 25th, 50th and 75th percentiles of the betweenness of 1849 papers in 1999.01, the 164 papers in 1999.01a, the 17 papers in 1999.01aα, the 147 papers in 1999.01aβ, the 1685 papers in 1999.01b, the 907 papers in 1999.01bα, the 778 papers in 1999.01bβ ; the 344 papers in 1999.02, the 144 papers in 1999.02a, and the 200 papers in 1999.02b; the 1014 papers in 1999.11, the 299 papers in 1999.11a, the 715 papers in 1999.11b and the 988 papers in 1999.12, the 347 papers in 1999.12a, the 641 papers in 1999.12b.
the merging with 1999.02a,while the fall in betweenness in 1999.01bβ is driving a splitting inside 1999.01.1999.11+1999.12→2000.15 The scatter plots for simple overlap measure and inclusion measure for CNs between 1981 to 2010.The left panel (A) is for the alpha parameter, i.e. how the groups in t are close to groups in t + 1.The right panel (B) is for beta parameter, i.e. how the groups in t + 1 are close to groups in t.The sizes of circles are proportional to the number of instances.The red dash lines are y = x for reference only.Like the bibliographic coupling network (BCN), the co-citation network (CN) can also be visualized in the form of the alluvial diagram.The groups in a CN represent the papers from the past that are coherent and related to a certain topic that stimulates the present research lines.

Table S1
ratio of group coefficient global to network coefficient global network ratio coefficient average+ * The ratio of group clustering coefficient to network clustering coefficient network ratio reciprocity The ratio of group reciprocity to network reciprocity network ratio leadership+ * The ratio of group leadership to network leadership network ratio eccentricity+ * The ratio of avg group eccentricity to network avg eccentricity network ratio adhesion+ * The ratio of group adhesion to network adhesion phys rev * The number of articles belonging to the group that were published in the Physical Review journal phys rev a+ * The number of articles belonging to the group that were published in the Physical Review A journal phys rev b+ * The number of articles belonging to the group that were published in the Physical Review B journal phys rev c+ * The number of articles belonging to the group that were published in the Physical Review C journal phys rev d+ * The number of articles belonging to the group that were published in the Physical Review D journal phys rev e+ * The number of articles belonging to the group that were published in the Physical Review E journal phys rev lett+ * The number of articles belonging to the group that were published in the Physical Review Letters journal phys rev stab+ * The number of articles belonging to the group that were published in the Physical Review STAB journal phys rev stper+ The number of articles belonging to the group that were published in the Physical Review STPER journal physics * The number of articles belonging to the group that were published in the Physics journal Continued on next page

Table S1 -
continued from previous page Features group Feature name Feature description rev mod phys+ * The number of articles belonging to the group that were published in the Review of Modern Physics journal sum group age+ * The sum of age of articles belonging to the group.In the co-reference network the age of an article is the average age of the articles it references to.In the co-citation network the age of an article is the age of the articles being cited.avg group age+ * The average age of articles belonging to the group min group age+ * The minimum age of articles belonging to the group max group age+ * The maximum age of articles belonging to the group network ratio avg group age+ * The ratio of avg group age to the average age of all articles in the network time window+ * The number of time window from which the community instance was obtained