Interdisciplinarity metric based on the co-citation network

Quantifying the interdisciplinarity of research is a relevant problem in evaluative bibliometrics. The concept of interdisciplinarity is ambiguous and multidimensional. Thus, different measures of interdisciplinarity have been proposed in the literature. However, few studies have proposed interdisciplinarity metrics that do not rely on previously defined classification sets, and none uses the co-citation network for this purpose. In this study we propose an interdisciplinarity metric based on the co-citation network. This is a way to define the publication's field without resorting to pre-defined classification sets. We present a characterization of a publication's field and then use this definition to propose a new metric of the degree of interdisciplinarity for publications (papers) and journals as units of analysis. The proposed measure has an aggregative property that makes it scalable from an individual paper to a set of papers (journal) simply by adding the numerators and denominators of the proportions that define the indicator. Moreover, the aggregated value of two or more units lies strictly between the minimum and the maximum of the individual values.


Introduction
There is no consensus in the literature about the definition of interdisciplinary research (IDR) [1,2]. As a consequence, numerous indicators try to measure only one of its dimensions. The concept of interdisciplinarity is related to academic disciplines as a synthesis of theories and methods. However, there is considerable ambiguity about the concept of discipline and its delimitation [3]. Historically, disciplines have been associated with the organization of teaching at universities. Nowadays, however, the concept has become more general and also includes the creation of new knowledge [4]. Focusing on knowledge creation, Sugimoto and Weingart [3] claim that IDR can be analysed along three dimensions: publications, people, and ideas. These three dimensions can be measured through information from publications in multidisciplinary bibliographic databases.
In scientometric approaches, most measures of interdisciplinarity are based on disciplinary delineations derived from the indexing and classification of publications and/or their journals, mainly following the publication perspective suggested by Sugimoto and Weingart [3]. However, with no conceptual consensus and many dimensions, these IDR measures based on scientometric techniques have been interpreted in different ways [5,6]. Moreover, the choice of different classification sets and methodologies produces inconsistent and sometimes contradictory results [7]. Therefore, current measurements of interdisciplinarity should be interpreted with caution in evaluative studies and science policies [8].
As indicated, numerous metrics have been proposed for measuring interdisciplinarity, but only a few of them have used the citation network, and none the co-citation network. The aim of the present study is to define the unit's field without resorting to pre-defined classification systems. For this purpose we use the co-citation network. To define the field of a focal publication $i$, all publications co-cited with $i$ are identified. A publication $j$ is co-cited with $i$ if there is a third publication in which $i$ and $j$ are both cited. The publications co-cited with $i$ are used to define the field of publication $i$.
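The definition of co-citation given above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cocited_with`, the dictionary layout, and the toy data are assumptions introduced here, not part of the study.

```python
from collections import Counter


def cocited_with(focal, reference_lists):
    """Count, for each paper, how many citing papers cite it together with `focal`.

    reference_lists maps a citing paper to the list of papers it cites.
    """
    counts = Counter()
    for refs in reference_lists.values():
        if focal in refs:
            # Every other reference of this citing paper is co-cited with focal.
            counts.update(set(refs) - {focal})
    return dict(counts)


# Hypothetical toy data (not from the study): citing paper -> reference list.
reference_lists = {
    "A": ["i", "x", "y"],
    "B": ["i", "y"],
    "C": ["x", "z"],  # does not cite i, so it contributes nothing
}

field_of_i = cocited_with("i", reference_lists)
print(sorted(field_of_i.items()))  # [('x', 1), ('y', 2)]
```

In this toy example the field of the focal paper is formed by papers x and y, and the count attached to each one is the number of citing papers in which it appears together with the focal paper.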
We then propose a new measure for the degree of interdisciplinarity and analyse its properties. The aggregative property makes it scalable from an individual paper to a set of papers (journal) simply by adding the numerators and denominators of the proportions that define the metric. Moreover, the aggregated value of the metric for two or more publications lies strictly between the minimum and the maximum values of the metric for the individual publications.

Interdisciplinary metrics based on the citation network
We focus our overview on interdisciplinarity metrics based on the publication dimension on networks (i.e., publications and their citation links) and on studies of their inconsistency and lack of robustness. The nodes in a citation network are a set of papers together with the papers they cite, and an edge between two nodes represents a citation link (see Figure 1, left).
To measure the degree of interdisciplinarity of journals, Leydesdorff [9] proposes the betweenness centrality (BC) index. BC measures how often a node is located on the shortest path between two other nodes in a network [10]. If a journal or a subject category (SC) lies between other journals or SCs, its publications function as a communication channel for the others and can be considered interdisciplinary [11]. Recently, Leydesdorff et al. [12] modified the Rao-Stirling diversity and found that the new indicator correlates with BC significantly more strongly than Rao-Stirling diversity does.
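For reference, the usual graph-theoretic definition of betweenness centrality can be written as follows. The symbols $\sigma_{st}$ and $\sigma_{st}(v)$ are notation assumed here for illustration and are not taken from the cited works.

```latex
% Betweenness centrality of node v:
%   \sigma_{st}    = number of shortest paths between nodes s and t
%   \sigma_{st}(v) = number of those shortest paths that pass through v
BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

A node with a high value lies on many shortest paths between other nodes, which is the sense in which a journal or SC acts as a communication channel.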
Rafols et al. [2] propose a cluster coefficient (CC) for the degree of interdisciplinarity of an SC. They identify the proportion of references among SCs, and then weight it by the percentage of publications that each SC has over the total number of publications. However, these measures of IDR are inconsistent and non-robust. They may be problematic when used in practice because the measured IDR depends strongly on the chosen measure [8]. Furthermore, the choice of data and methodology can produce seriously inconsistent results [7]. Therefore, interdisciplinarity metrics should be used cautiously in evaluative studies and science policies [8].
The inconsistency and lack of robustness of the IDR metrics based on the citation network have motivated us to propose a new methodology based on the co-citation network in the following section.

Characterization of a publication's field through the co-citation network: An interdisciplinarity metric
Figure 1 illustrates a simple example of the citation and co-citation network of a paper $i$ (the focal paper). On the left-hand side is the citation network, where $k_i$ denotes the degree of paper $i$ or, in other words, the number of citations that paper $i$ receives. The degree $k_j$, with $j = 1, 2, 3$, is defined identically for the other cited papers. The parameter $r_j$, with $j = 1, 2, 3$, indicates the number of references of the $j$-th citing publication. On the right-hand side is the co-citation network, defined as the projection of the citation network on the set of cited papers. The parameter $q$ indicates the number of co-citations of each paper. We present a new metric, based on the relationship between the degree of paper $i$ in the citation network ($k_i$) and in the co-citation network ($q_i$), which is

$$q_i = \sum_{j=1}^{k_i} (r_j - 1) - XM_i. \tag{1}$$

Equation (1) shows that the degree of paper $i$ in the co-citation network is equal to the sum of all references cited by the citing papers, excluding paper $i$ itself, minus the duplicated links $XM_i$. A duplicated link is produced when two citing papers include both paper $i$ and another paper $j$ in their reference lists. In this case, the connection between $i$ and $j$ is duplicated. This is represented in Figure 1 by the X-motif between citing papers 1, 2 and cited papers 2, $i$. Thus, $XM_i$ represents the number of non-redundant X-motifs in the citation network of paper $i$. (An X-motif is redundant when it is implied by two others: if citing papers $c_1, c_2$ and citing papers $c_2, c_3$ each form an X-motif with the cited papers $i, j$, then $c_1, c_3$ automatically form an X-motif with $i, j$ as well. Redundant X-motifs are excluded from the calculation of $XM_i$.) For example, $XM_i = 1$ in Figure 1, and therefore $q_i = 2 + 1 + 1 - 1 = 3$.

From (1), we have the following equation,

$$XM_i = \sum_{j=1}^{k_i} (r_j - 1) - q_i. \tag{2}$$

Then, we can define a new metric for paper $i$, starting from the following quotient,

$$p_i = \frac{XM_i}{\sum_{j=1}^{k_i} (r_j - 1)}. \tag{3}$$

This parameter takes values in $[0, 1)$ and indicates the proportion of duplicated co-citations of paper $i$ (alternatively, $1 - p_i$ indicates the proportion of non-duplicated co-citations). In order to compare measures among different papers, we need a normalized metric. To do this, we calculate the maximum value that this parameter can reach, which is attained when all citing papers share the same reference list:

$$\max p_i = \frac{k_i - 1}{k_i}, \tag{4}$$

which is lower than one. Then, the normalized metric is

$$XM\text{-metric}_i = \frac{p_i}{\max p_i} = \frac{k_i}{k_i - 1} \cdot \frac{XM_i}{\sum_{j=1}^{k_i} (r_j - 1)}. \tag{5}$$

Now, the metric has values in $[0, 1]$. An $XM\text{-metric}_i$ value close to 1 indicates that the citing papers recurrently refer to this paper within a group formed by the same papers. Thus, the $XM\text{-metric}_i$ value is an indicator of the insertion of the paper in a specific research area, defined here by those papers which are usually cited jointly. On the contrary, high values of $1 - XM\text{-metric}_i$ indicate that paper $i$ is not usually cited together with the same papers, which reveals that this paper cannot be placed in a specific research area and is an interdisciplinary paper instead. Therefore, we define the Interdisciplinary Research Index as

$$IDRI_i = 1 - XM\text{-metric}_i. \tag{6}$$

To illustrate the metric, we apply it to paper $i$ in the citation network of Figure 1. We observe that there is one X-motif ($XM_i = 1$) and three citing papers ($k_i = 3$). We have that $p_i = 1/4 = 0.25$. Therefore, 25% of the co-cited papers are duplicated. Normalizing, we have that $XM\text{-metric}_i = 0.25 / (2/3) = 0.375$ and $IDRI_i = 1 - 0.375 = 0.625$. Thus, paper $i$ is 37.5% representative of the field where it is inserted and has 62.5% interdisciplinarity.

The XM-metric can also be used to define specific research areas, formed by those papers/journals that have a high XM-metric and are cited together.
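The single-paper calculation can be reproduced with a short Python sketch. The function name `xm_metric` and the input layout are assumptions made here for illustration. The three reference lists are inferred from the degrees reported for Figure 1 rather than read from the figure itself, and the normalization uses the maximum $(k_i - 1)/k_i$.

```python
from collections import Counter


def xm_metric(focal, citing_reference_lists):
    """Return k, q, XM, the normalized XM-metric and IDRI for `focal`.

    citing_reference_lists: one reference list per paper citing `focal`.
    """
    k = len(citing_reference_lists)
    cocitations = Counter()
    for refs in citing_reference_lists:
        cocitations.update(set(refs) - {focal})
    total = sum(cocitations.values())  # sum over citing papers of (r_j - 1)
    if k < 2 or total == 0:
        raise ValueError("need at least two citing papers and one co-cited paper")
    q = len(cocitations)  # degree of focal in the co-citation network
    xm = total - q        # duplicated links, i.e. non-redundant X-motifs
    metric = (xm / total) * k / (k - 1)  # quotient divided by (k - 1) / k
    return {"k": k, "q": q, "XM": xm, "XM_metric": metric, "IDRI": 1 - metric}


# Assumed reference lists for the three citing papers of Figure 1.
figure_1 = [["1", "2", "i"], ["2", "i"], ["i", "3"]]
print(xm_metric("i", figure_1))
# {'k': 3, 'q': 3, 'XM': 1, 'XM_metric': 0.375, 'IDRI': 0.625}
```

Counting duplicated links as the total number of co-citations minus the number of distinct co-cited papers is one way of excluding redundant X-motifs automatically, since each co-cited paper contributes its number of joint citations minus one.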

𝑋𝑀-metric for two papers jointly
Now we present an extension of the metric above to the case of two papers jointly. Consider two papers $i$ and $j$. Each paper has its own citation and co-citation network. Figure 2 presents a simple example of this case. The citation network of paper $i$ includes two X-motifs ($XM_i = 2$), while the citation network of paper $j$ includes only one X-motif ($XM_j = 1$). Both papers have one X-motif in common, the one formed by citing papers 2, 4 and cited papers $i$, $j$. The XM-metric for the two papers jointly is built by simple aggregation of the number of X-motifs from the two papers. Thus, given the degrees of papers $i$ and $j$ in the citation network ($k_i$ and $k_j$, respectively) and the numbers of non-redundant X-motifs for papers $i$ and $j$ ($XM_i$ and $XM_j$, respectively), we define

$$p_{i,j} = \frac{XM_i + XM_j}{\sum_{l=1}^{k_i} (r_l^{(i)} - 1) + \sum_{l=1}^{k_j} (r_l^{(j)} - 1)}, \tag{7}$$

where $r_l^{(i)}$ is the number of references of the $l$-th paper citing $i$. This is the mediant of the single-paper quotients $p_i = XM_i / \sum_{l=1}^{k_i} (r_l^{(i)} - 1)$ and $p_j = XM_j / \sum_{l=1}^{k_j} (r_l^{(j)} - 1)$. In general, given non-negative real numbers $a, b, c, d$, with $b, d > 0$, the mediant of the two fractions $a/b$ and $c/d$ is $(a + c)/(b + d)$. An important property is the "mediant inequality", which states that the mediant lies strictly between the two fractions. Formally, if $a/b < c/d$ and $a, b, c, d > 0$, then

$$\frac{a}{b} < \frac{a + c}{b + d} < \frac{c}{d}. \tag{8}$$

This property follows from the two relations

$$\frac{a + c}{b + d} - \frac{a}{b} = \frac{bc - ad}{b(b + d)} > 0, \qquad \frac{c}{d} - \frac{a + c}{b + d} = \frac{bc - ad}{d(b + d)} > 0.$$

Therefore, $p_{i,j}$ in (7) lies between the two fractions and, when $p_i \neq p_j$, coincides with their arithmetic mean if and only if the denominators of the two fractions are identical.

The parameter $p_{i,j}$ is again defined in the interval $[0, 1)$. Its maximum value is the mediant of the maximum values of $p_i$ and $p_j$, that is, of $(k_i - 1)/k_i$ and $(k_j - 1)/k_j$:

$$\max p_{i,j} = \frac{(k_i - 1) + (k_j - 1)}{k_i + k_j}. \tag{9}$$

So, we define

$$XM\text{-metric}_{i,j} = \frac{p_{i,j}}{\max p_{i,j}}. \tag{10}$$

This metric satisfies a useful property: the XM-metric for $\{i, j\}$ is expected to lie between the XM-metrics for $i$ and $j$. If this were not so, it can be proved after some calculations that a condition, referred to as (11), would necessarily hold. In practice, (11) is not expected to be fulfilled, since in co-citation networks where preferential attachment dominates [13], the relative number of X-motifs tends to zero as $k$ increases. This means that, for large enough $k$, the fraction $XM / \sum (r - 1)$ is a decreasing function of $k$, which contradicts (11). Therefore, it is expected that the aggregated XM-metric defined in (10) lies between the two XM-metrics for papers $i$ and $j$.

We apply the metric to papers $i$, $j$ in the citation network of Figure 2. Using the XM-metric for single papers in (5), we have that $XM\text{-metric}_i = 0.6$ and $XM\text{-metric}_j \simeq 0.27$. The metric for the two papers $i$ and $j$ jointly is $XM\text{-metric}_{i,j} = 0.42$. Thus, the set of papers $\{i, j\}$ is 42% representative of the research area where it is inserted and has 58% interdisciplinarity.
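The mediant aggregation for two papers can be checked with exact fractions, as in the sketch below. Figure 2 is not reproduced here, so the citation counts ($k_i = 3$, $k_j = 4$) and the reference sums (5 for each paper) are assumptions, chosen only because they are consistent with the reported values 0.6, 0.27 and 0.42. The helper `mediant` is likewise a name introduced for illustration.

```python
from fractions import Fraction


def mediant(num_a, den_a, num_b, den_b):
    """Mediant of num_a/den_a and num_b/den_b: add numerators and denominators."""
    return Fraction(num_a + num_b, den_a + den_b)


# Assumed inputs per paper: XM, sum of (r - 1) over citing papers, and k.
xm_i, refs_i, k_i = 2, 5, 3
xm_j, refs_j, k_j = 1, 5, 4

p_i, p_j = Fraction(xm_i, refs_i), Fraction(xm_j, refs_j)
p_ij = mediant(xm_i, refs_i, xm_j, refs_j)      # 3/10
p_max_ij = mediant(k_i - 1, k_i, k_j - 1, k_j)  # 5/7

metric_i = p_i / Fraction(k_i - 1, k_i)         # 3/5
metric_j = p_j / Fraction(k_j - 1, k_j)         # 4/15
metric_ij = p_ij / p_max_ij                     # 21/50

assert min(p_i, p_j) < p_ij < max(p_i, p_j)     # mediant inequality
print(float(metric_i), round(float(metric_j), 2), float(metric_ij))
# 0.6 0.27 0.42
```

The numerators and denominators are passed separately because the mediant depends on how each fraction is written, not only on its value.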

𝑋𝑀-metric for 𝑛 papers jointly
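The generalization of the XM-metric to $n$ papers ($n > 2$) is direct by applying the generalized mediant to $n$ fractions. Thus, given the set $\{i_1, i_2, \ldots, i_n\}$ of $n$ papers, the XM-metric for these papers jointly is defined as

$$XM\text{-metric}_{i_1, i_2, \ldots, i_n} = \frac{\sum_{m=1}^{n} XM_{i_m}}{\sum_{m=1}^{n} \sum_{l=1}^{k_{i_m}} (r_l^{(i_m)} - 1)} \cdot \frac{\sum_{m=1}^{n} k_{i_m}}{\sum_{m=1}^{n} (k_{i_m} - 1)}, \tag{12}$$

where $r_l^{(i_m)}$ is the number of references of paper $l$, which is one of the citing papers of $i_m$. Using the same argument as above and proceeding by induction, it is expected that $XM\text{-metric}_{i_1, i_2, \ldots, i_n}$ lies between the two extreme values of the XM-metric for the $n$ papers.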
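A minimal sketch of the aggregation for $n$ papers is given below. It assumes that each paper is summarized by a triple (number of non-redundant X-motifs, sum of $r - 1$ over its citing papers, number of citing papers). The function name `aggregated_xm_metric` and the three triples are hypothetical and serve only to illustrate the calculation.

```python
def aggregated_xm_metric(units):
    """XM-metric of a set of papers by adding numerators and denominators.

    units: list of (xm, ref_sum, k) triples, one per paper, where ref_sum is
    the sum of (r - 1) over the k citing papers and every k is at least 2.
    """
    xm_total = sum(xm for xm, _, _ in units)
    ref_total = sum(ref_sum for _, ref_sum, _ in units)
    k_total = sum(k for _, _, k in units)
    p = xm_total / ref_total                  # aggregated proportion
    p_max = (k_total - len(units)) / k_total  # mediant of the (k - 1) / k maxima
    return p / p_max


# Hypothetical triples for three papers (illustrative values only).
papers = [(2, 5, 3), (1, 5, 4), (1, 4, 3)]
singles = [aggregated_xm_metric([unit]) for unit in papers]
joint = aggregated_xm_metric(papers)
print(min(singles) <= joint <= max(singles))  # True
```

The final check holds for these illustrative values. It is not a proof of the general property, which the text states as an expectation rather than a guarantee.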

Empirical application
In this case study, 30 research articles published in 2018 are considered. The Scopus database is used as the source of citations. To generate the co-citation network, all citations received by each of the 30 papers (up to the date of this application, March 20, 2020) are considered.
These 30 papers are the most cited in five scientific journals (six papers for each of the analyzed journals). The journals considered belong to the Library and Information Sciences subject category, and they were chosen to cover different sizes, according to the number of papers published in 2018, and different impact factors. This explains why one of the papers has been cited 68 times so far while another has been cited only 2 times. The aim is to show the application of the interdisciplinarity metric regardless of the number of citations available to generate the co-citation network. Notice that even with only two citations, it is already possible to generate the co-citation network.
The metadata for the identification of each paper (authors, journal, volume, number, and pages) are shown in Table 1. This table also includes the number of citations received, the number of nodes in the co-citation network, and the interdisciplinarity metric as a percentage (IDRI × 100%) to facilitate its interpretation. As can be seen from the results obtained, the discipline considered is highly interdisciplinary, with values in the range from 71.6% to 100%. In 17 of the 30 cases, interdisciplinarity surpasses 90%. Half of the papers analyzed have an index higher than 91.5% (median) and the average is 90.5%.
As extreme cases in the range of variation, we comment on two papers. The paper by Shepherd et al. (2018) has received three citations to date, and the only reference common to these three citing documents is the paper itself. This means that it has 100% variety in the co-citation network and its interdisciplinarity index is 1. On the contrary, the paper by Boyack et al. (2018) has received 41 citations to date, and 71.6% of the references of these citing papers were found to be mismatched documents, representing an interdisciplinarity index of 0.716.
In this case study, citations have not been limited to calendar years, since the year of publication is quite recent in relation to the measurement of impact. The citation window could have been limited to the year 2019, but in that situation the number of citations would be lower. However, when analyzing authors instead of specific papers, this methodology allows all the papers of the same author to be added, even from different years. If the unit of analysis were the journal itself, it would also be possible to add all the documents published in a specific year.

Conclusions
The relevance of interdisciplinary research is well known. Many studies support its ability to solve complex problems and generate scientific developments and innovations [2,14]. As a consequence, funding agencies in many countries are considering the promotion of interdisciplinary research as a priority [15].
However, there is a lack of consistency and validity in the interdisciplinarity measures in the literature. The degree of interdisciplinarity varies with the selection of the metric, the source of the data, and the classification system used. Hence, different methodologies will produce different degrees of interdisciplinarity [8]. This generates a problem in research evaluation and science policy.
In this study we have proposed the co-citation network as a way to redefine the unit's field without resorting to a pre-defined classification system. The proposed new measure for the degree of interdisciplinarity is scalable from an individual unit (paper) to a set of units (journal) simply by adding the numerators and denominators of the proportions. Moreover, the aggregated value of two or more units lies strictly between the minimum and the maximum of the individual values. This important property of aggregability means that the new interdisciplinarity measure can also be applied at the meso (research groups, research centres) and macro levels (regions and countries). Note that, as this metric is defined as a percentage, it is a relative value, so the indicator does not depend on the size of the unit of analysis [16].
An important application of this methodology could be the quantification of the scientific impact of publications. This problem is relevant when comparing the impact of publications from different scientific fields, which requires metrics that normalize for the different citation habits between fields [17]. In citing-side normalization, each citation is weighted by the citation density of the citing field [18,19]. In cited-side normalization, as in the case of the Relative Citation Ratio (RCR), the articles co-cited with the focal article are used to generate the reference set that represents the field of the focal article [20]. However, the RCR has been criticized [21]. In this sense, we think our metric could be used in a new methodology for cited-side normalization.

Figure 1. Citation (left) and co-citation (right) network of paper $i$. The square boxes indicate the papers citing $i$, while the circles show the papers co-cited with $i$. The parameter $k$ indicates the number of citations of each paper (cited paper 1 has $k_1 = 1$, and similarly $k_2 = 2$, $k_i = 3$, $k_3 = 1$) and the parameter $r$ denotes the number of references of each citing paper (citing paper 1 has $r_1 = 3$ references, $r_2 = 2$, $r_3 = 2$). The parameter $q$ shows the number of co-citations of each paper ($q_1 = 2$, $q_2 = 2$, $q_i = 3$, $q_3 = 1$).

Figure 2. Citation network of papers $i$ and $j$.

Table 1. Case study with 30 research articles from five journals, with different sizes and impact factors, in the Library and Information Sciences subject category (source of citations: Scopus).