Reconstructing Nonparametric Productivity Networks

Network models provide a general representation of inter-connected system dynamics. This ability to connect systems has led to a proliferation of network models for economic productivity analysis, primarily estimated non-parametrically using Data Envelopment Analysis (DEA). While network DEA models can be used to measure system performance, they lack a statistical framework for inference, due in part to the complex structure of network processes. We fill this gap by developing a general framework to infer the network structure in a Bayesian sense, in order to better understand the underlying relationships driving system performance. Our approach draws on recent advances in information science, machine learning and statistical inference from the physics of complex systems to estimate unobserved network linkages. To illustrate, we apply our framework to analyze the production of knowledge, via own and cross-disciplinary research, for a world-country panel of bibliometric data. We find significant interactions between related disciplinary research output, both in terms of quantity and quality. In the context of research productivity, our results on cross-disciplinary linkages could be used to better target research funding across disciplines and institutions. More generally, our framework for inferring the underlying network production technology could be applied to both public and private settings which entail spillovers, including intra- and inter-firm managerial decisions and public agency coordination. This framework also provides a systematic approach to model selection when the underlying network structure is unknown.


Introduction
Economic production often results from complex systems of inter-connected production processes, forming a unified network production technology. Data Envelopment Analysis (DEA) methods have long been used to estimate production technologies and measure relative performance. Network DEA (NDEA) models provide a generalization to assess the performance of complex systems, in which the expansion of networks in economics provides impetus for the search for new and more general models of the production process. Our application extends Daraio et al. [33] to estimate scientific knowledge productivity, offering a more general network model that accounts for the complexity of research production.
The paper unfolds as follows. In the next section, we illustrate the main features of the economic model. Section 3 presents the axioms of the underlying DEA models and their connection to those of the general NDEA model. Included is a schematic of the general types of structures of NDEA models. This section also introduces the Georgesçu-Roegen [22] flow and funds model (GRFF) and its connection with our NDEA model. Section 4 illustrates the connection between the statistical approach proposed and the GRFF model. Section 5 introduces the knowledge production network that we estimate, including a schematic of the possible cross-disciplinary links that our statistical second stage estimates can reveal. Next is a description and summary statistics of the data, followed by the outlines of the alternative parsimonious NDEA models we estimate in the first stage. A formal statement of the NDEA problem objective and constraints follows. Descriptive analyses of the first stage productivity models are included, followed by the main results of the application of the second stage to our knowledge production. The final section provides a discussion of our approach and results. We include two technical appendices: Appendix A contains an introduction to the Ising spin glass model while Appendix B provides additional technical details on the Pseudo-likelihood approach. We also include a more detailed summary of our data in Appendix C.

The Economic Model
The axiomatic production theory behind this paper and described in Section 3, is found in [1,34] and [35]. Färe and Grosskopf [3] introduce the concept of NDEA and extend the axioms to a network setting. Section 3.2 highlights the correspondence of the axiomatics of NDEA with the representation of the production process with flows and funds a la Georgesçu-Roegen. This correspondence yields a new, more general framework for modeling production processes by integrating the production process, information theoretic approaches to econometrics, machine learning and statistical inference from the physics of complex systems.
As described by Prieto and Zofio [4], NDEA within an input-output model (for an introduction and a deep overview see [36]) allows us to gauge potential productivity gains by comparing technologies corresponding to different "economies". Such models represent a network where different sectoral nodes use primary inputs (endowments) to produce intermediate input and outputs (according to sectoral technologies). In graph theory terms, in an input-output model, each sector (industry) is represented by a node and each flow of intermediate inputs and outputs is represented by a link. Hence, it is possible to optimize primary input allocation, intermediate production and final production using NDEA. This framework allows us to model the different sub-technologies corresponding to alternative production processes, to assess efficient resource allocation among them, and determine potential output gains that could be realized by reducing inefficiencies. In this setting, we use statistical inference to estimate the chains/path connections within and between nodes in order to reveal the underlying structure of the input-output system (see [36], p. 675).

Basic Axioms
NDEA is widely used in economics and operations research to assess efficiency and productivity in complex technologies or systems. As of 23 November 2020 Google Scholar identifies 173,000 articles identified under 'Network DEA' between 1996 and 2020. It is an extension of Data Envelopment Analysis (DEA) which was introduced by Charnes, Cooper and Rhodes [37]. As its name suggests, DEA envelops input-output data to identify the best practice 'frontier of technology' in the sample data. Individual data points are compared to that best practice frontier to determine relative performance.
A well-known issue associated with DEA is the curse of dimensionality: adding more inputs and outputs to the model requires more data to discriminate performance. DEA as an estimator has a slow rate of convergence, making statistical inference difficult. This difficulty is inherited by and compounded in NDEA. NDEA has the additional issue of inference concerning the structure of the more complicated model structure, which connects multiple subtechnologies. This can be structured in many ways, i.e., there are many potential models. This motivates our research question: is there a statistical way to infer the structure of the network model? And can we use that as guidance in model selection when we are forced to specify fairly parsimonious network models due to the lack of data and curse of dimensionality. Our contribution is the derivation and application of just such a statistical approach to structure inference and model selection for NDEA.
NDEA models are a generalization of the basic DEA or activity analysis models of technology and efficiency; they are often referred to as looking inside the black box technology assumed in more reduced-form DEA efficiency models. NDEA can be used to model production processes when the choice of inputs/outputs in one period affects what can be produced in subsequent periods. In addition, NDEA can be used to model production processes where intermediate products are produced in one stage of production and are then used to produce final outputs in another stage. They are also useful when production entails spillovers that can enhance (as in our knowledge production application) or detract from production by other producers/DMUs.
Following [1], we show that the axiomatic underpinnings are similar to those of the standard DEA-estimated technology. Notationally, for inputs, x ∈ G + , and outputs, y ∈ M + , we define the graph of technology, or production set, which relates inputs to outputs: Depending on the problem at hand, we can model the technology equivalently in terms of the input set: or the output set, For estimation, activity analysis (DEA) models generally employ a set of linear constraints on the inputs and outputs to construct the so-called piecewise linear frontier of the technology set, whether GR, L(y) or P(x), in accordance with the basic axioms of production theory (listed below and following [35]).
If these assumptions are satisfied, then following [35], Färe and Grosskopf [1] show that the basic activity analysis (DEA) technology, here specified as an output set: satisfies the axiom set below, which provides a minimal set consistent with neoclassical production theory. We note that the λ variables are so-called intensity variables which serve to 'construct' the piecewise linear frontier of the technology/output set. A typical DEA application using the constraints in P(x) above (1), seeks to maximize outputs for each DMU, subject to the input and output constraints based on the entire data set. This yields an efficiency score for each DMU, where a value of unity signals best practice performance. The basic production axioms include: The graph is a closed set.
These are minimal axioms consistent with neoclassical production theory. A.1 allows for inactivity, A.2-A3S describe feasible constraints on inputs and outputs, imposed through the respective inequalities imposed in the input and output constraints in P(x). A.4 requires that DMUs cannot produce unlimited output with given inputs, and A.5 requires that the graph technology set contain its boundary, which then serves to identify best practice.
In addition, it is often convenient to assume convexity of the input and output sets. The general result is that if each subtechnology in the network satisfies the Kemeny et al. conditions [38], then the network satisfies the axioms above. Similarly, if each subtechnology exhibits constant returns to scale, then the network also exhibits constant returns. We note that this holds for directed networks.

What Makes a Network?
We introduce the general structure of the network model with a figure first introduced by [40], which illustrates several types of networks. See also [41]. The box in the figure represents the basic DEA models with exogenous inputs x o entering the 'black box' producing outputs y 4 exiting the black box technology. Ignoring the interior of the box would be consistent with the DEA technology described by the linear constraints in (1). The network model allows specification of multiple processes or subtechnologies. Here we assume that there are three sub-technologies P 1 , P 2 , P 3 organized as in Figure 1, where outputs from P 1 and P 2 enter P 3 as inputs.
We extend this model to include a source, 'o', which distributes inputs to the network and a sink, '4', which collects the network outputs. The notation identifies the source of the variable with a subscript and the destination with a superscript. So x i o , i = 1, 2, 3 means that input x o is distributed to the three subtechnologies, and we have Similar notation is used for outputs, so y 3 2 means that outputs from P 2 are inputs into P 3 . The final output is with the appropriate choice of dimension of the output vectors. This schematic includes the possibility of parallel subtechnologies or processes such as P 1 and P 2 , as well as sequential sub-technologies which could be linked through time providing a basis for a dynamic network or supply chain, echoing the earlier work of Georgesçu-Roegen. A formal mathematical statement of the network problem we solve in our illustration is deferred to Section 5.

Connection with Georgesçu-Roegen's Flows and Funds Model
The network model analyses the joint actions of different activities within a process. Our theoretical framework allows us to analyze and represent production processes much such as the Georgesçu-Roegen Flows and Funds model (hereafter GRFF model) as subtechnologies which are connected to form the broader network via a maximum entropy condition. In this section, we show how the GRFF Model bridges the axiomatic of NDEA and estimation techniques based on complex systems.
NDEA models use the structure of networks to model production processes. Georgesçu-Roegen in the 1970s proposed a production model based on "organized elementary process" which can be in line or in parallel. We observe here that this production element is implicitly used in NDEA models. The "organized elementary process" of the GRFF model is the main ingredient or kernel of the axiomatics of NDEA introduced in Section 3 and of the transformation processes modelled in the NDEA literature. We think we are the first to point out the correspondence illustrated in Figure 2. Figure 2 contains three panels. The North-West panel shows the elementary unit of the Georgescu-Roegen production model, the so-called "organized elementary process" which can be of parallel or in-line production. The North-East panel illustrates the Network DEA models that we presented in the previous section, characterized by both parallel (P 1 and P 2 ) and in-line production processes (such as P 1 and P 3 or P 2 and P 3 ). The South panel shows two examples of processes modeled in NDEA models. The model on the left illustrates a two-stage production process in parallel while the model on the right shows a four-stage production process in line. As mentioned above, GRFF's "organized elementary process" is implied in the North-East and South panels and for this reason, in Figure 2, we reported in dashed form the arrows from the North-West panel towards the other two panels. In this way, we highlight how GRFF's "organized elementary process" is implicitly contained in the other two panels.
The schematic representation of NDEA and the possibility of including both parallel and sequential sub-technologies can be linked then to the GRFF model. Generally (see e.g., [42]), the model takes into account the actual characteristics of production elements and processes, such as, indivisibility, complementarity, tacitness and heterogeneity of productive knowledge. In the GRFF model, a flow is an input or an output that enters or exits from a process (for example, energy, water, software, loom, computer, etc.). A fund provides its services to several processes that occur over time (for example, worker, software, land, loom, computer, etc.). A distinction is made between the agents of production processes and the services that they provide. Activities consist of different operations which require the performance of one or more elementary tasks. An elementary task is an operation which, by definition, is not further divisible (for instance, loading or unloading an intermediate product or cutting a piece of fabric). The GRFF model can be implemented both at the microeconomic level, considering individual case studies, and at the macroeconomic level, analysing a set of production units in different sectors of activity. The GRFF model allows the analytical representation of the organization of production processes including the organization and time dimension of production processes. The formulation of the network production technology presented in Section 3 is an implementation of the GRFF model. The GRFF model may also be connected to the neo-Schumpeterian interpretative framework of production of new processes by means of creation and diffusion of knowledge [43], in which there is an interplay between capabilities, transactions and scale and scope to explain the boundary and the competitiveness of the analyzed units [44].

Maximum Entropy and Georgesçu-Roegen
The principle of maximum entropy serves as the foundation of the theory of inference [45], providing the statistical mechanics to reconstruct probabilistic information from incomplete data. Physical systems evolve spontaneously and possess stability characteristics at equilibrium, which is characterized by the value of maximum entropy. The key to the application of the principle (see [45]) is associating to a probability density function (pdf) an entropy function that measures the dispersion or uncertainty with which the occurrence of possible events are expected. This allows us to introduce constraints, based on our knowledge of the system, that can be treated with the formalism of Lagrange multipliers (see Section 4.2 for more details).
Generally, entropy may be interpreted as: (i) a measure of disorder in a system, (ii) a measure of our ignorance of a system, and (iii) an indicator of the irreversible changes in a system [46]. The Austrian school posits the economy as a complex system that is the outcome of uncoordinated individual behavior [47,48]. In such systems, equilibria do not always refer to a "stationary state" but instead are related to the concept of attractors. An attractor is a deterministic sequence of states which are cyclically visited by the system. As such, it becomes impossible to fully understand macro processes by examining individual behavior. Although representatives of the Austrian School had a skeptical regard to the use of mathematical tools in economics [49], their ideas can be expressed through the lense of statistical thermodynamics or the theory of information (see also [50]).
In chapter VI of his 1971 book [22], Georgesçu-Roegen describes the introduction of statistical mechanics and highlights the connection of economic processes with the second law of thermodynamics, i.e., the entropy law.
We draw the connections here between statistics, economic productivity, and physics of complex systems. As is well known, the correlation between two variables can be influenced by other confounding factors, and does not imply a direct causal effect of one variable on another. On the other hand, interactions or interdependencies refer to strict relationships between variables, allowing us to describe the impact of the variation of one variable on another. Economic productivity models can be used to analyze these interdependencies in a production process, as well as the interconnections between different economic sectors. Physics investigates the interactions between particles in order to analyze direct and reciprocal effects.
We finally highlight the fact that in information theory the maximum entropy problem can be reformulated as a minimum cost of coding, which is actually a function defined as the opposite of the entropy [51]. We aim to derive the level and the structure of interactions between disciplinary research productivities. We can think of this as an inverse problem, as inference of the underlying network is drawn from observational data [25]. Importantly, Georgesçu-Roegen [22] provides the theoretical support for the unknown model parameters, and justification of the assumptions underlying our statistical model.

Maximum Entropy Estimates
We define our variables as the vectors s i = (s Here and in the following bold style marks a vector quantity. Subscripts, e.g., i, indicate disciplines, whereas the superscript index γ = 1, .., D refers to a given Country. The observable s i depends on the observation time t, thus the set of data {s(t)} = {s 1 (t), ...s i (t), ..., s N (t)} can be defined, where t = 1, ..., T identifies a given realization of the generic set of variables or configuration {s}. To simplify the notation, where unnecessary, the index t will be omitted. The element of our variable, s (γ) i , in the case of our application to knowledge production (see Section 5.1), is related to productivity in the discipline i of the Country γ as follows is the productivity of country γ in disciplinary subject category i at time t. We let π i (t) represent the world-country average of productivity in subject i, so that s i = 0 and s 2 i = 1 N . We can use this formulation to account for the recent trend of increasing worldwide scientific productivity, considering deviations from the world-country average productivity, ∆  Shannon's [51] theorem states that entropy (S), defined in statistical mechanics, is a measure of the 'amount of uncertainty' related to a given discrete probability distribution p(s). Accordingly, S[p] is given by where K is a positive constant and p({s}) is the pdf (probability density function) of the configuration {s}. This quantity is positive, additive for independent sources of uncertainty and it agrees with the intuitive notion that a uniform (or broad) distribution represents more uncertainty than does a sharply peaked distribution. It is immediate to verify the latter observation in the one-dimensional case by considering Equation (4) and taking into account the property of the discrete distribution of probability, In making inference on the basis of incomplete information we must use that probability which maximizes the 'amount of uncertainty' or entropy subject to whatever is known [45]. This yields an unbiased assignment, avoiding arbitrary assumption of information which by hypothesis we do not have [45]. For a set of variables {s} the so-called empirical expected value of a given function of {s} is defined as the average of the function over the observed realization (the mean) of {s}, in the present case the average over time. Since some empirical expectation values can be measured, formally this means that p({s}) can be found as the solution of a constrained optimization problem, i.e., maximizing the entropy of the distribution subject to conditions that enforce the expected values to coincide with the empirical ones. We will refer to the quantities whose averages are constrained as 'features' of the system. For simplicity, we choose to observe the lower order statistics of the data which can bring information about the underlying network of interactions between variables, i.e., pairwise correlations. The features of the system we are considering are thus the two-variable combinations s i · s j , i, j = 1, ..., N, i = j. The optimization problem reduces to with the constraints where '< >' is the true average over the distribution p. The symbol '·' denotes the Hadamard product of two vectors. The first constraint accounts for the correct normalization of the pdf, whereas the second one arises from the required equality between true and empirical average of the above-defined features. Generally, if certain interdependencies are known to exist between the elements of the matrix S[p], constraints can be imposed to account for these interdependencies. However, instead of imposing a priori interdependency constraints we chose to infer them instead of assuming their existence. Solving Equation (5) with the constraints (6) leads to The chosen parameters, J ij , are symmetric, i.e., J ij = J ji (There is a link between the assumption of equilibrium underlying a Boltzmann-Gibbs distribution, and symmetry of the pairwise interactions. Symmetric couplings lead to a steady state described by the Boltzmann-Gibbs distribution while asymmetric couplings lead to a non-equilibrium state [53]. We can assign to the system a particular dynamics, which leads it to a given steady state distribution. Recent developments achieved for dynamical inverse Ising model [54,55] could represent an interesting extension of the present work, which is left for future research). The constant Z can be determined by exploiting the constraint The parameters J ij are determined by requiring the second constraint to be fulfilled. Asymmetric J ij can always be re-conducted to symmetric J ij , which give rise to the same values of pdf if this latter is the Gibbs distribution in Equation (7). We observe that the pdf defined in Equation (7) coincides with the Maxwell-Boltzmann probability distribution function at a given fixed temperature, related to an Ising model with spin s i , interaction parameters J ij and zero magnetic field, described by the Hamiltonian The quantity Z in Equations (7) and (8), constant with respect to {s} but dependent on the set of parameters {J} with generic element J ij , is called in this context the partition function.
The connection between maximum entropy and maximum likelihood is indeed well known (see e.g., [56]) and the pdf in Equation (7) which satisfies the constraint on the parameters J ij given in Equation (6) can properly be derived by searching the maximum of the so-called Likelihood function within the class of models of the Boltzmann distribution related to an Ising model with zero magnetic field. The Likelihood function is defined in the context of Bayesian inference (see e.g., [26]). By assuming that (i) each realization of the set {s} is drawn independently, (ii) the data have been generated by a (known) model, which depends on the set of (unknown) pairwise parameters {J}, one aims to find the optimal values of {J}, i.e., the ones which maximize the conditional probability p({J}|{s}). From the Bayes theorem [26] it follows that The probability p({J}|{s}) is called posterior, p({J}) prior, p({s}) evidence and p({s}|{J}) Likelihood.
If the prior is the uniform distribution, as we assume here, the most probable a posteriori set of variables is, as a consequence of Equation (10), the one which maximizes the Likelihood function. Under the further assumption that the Likelihood function belongs to the class of Boltzmann distribution functions, the so-called Log-Likelihood function can be defined as If one assumes that the system can be described by an Ising-like, pairwise interacting model (see Appendix A for an introduction and additional details) with zero external field, the Hamiltonian H is the one defined in Equation (9). We observe that in the definition of the Log-Likelihood function in Equation (11) the hypothesis of independency of the realizations of the configuration {s} at different times has been exploited, see point (i) above. Thus, the optimization problem reduces to choosing the set of parameters {J}, which maximize the pdf in Equation (11). A quick calculation of the first and second derivatives of Equation (11) with respect to the parameters J ij shows that the set of parameters which maximizes Equation (11) should indeed maximize Equation (5) with the constraint in Equation (6).
The Ising model has been widely applied in different fields, such as modelling the behaviour of magnets in statistical physics [57], image processing and spatial statistics [58][59][60], modelling of neural networks [61] and social networks [62]. It is, however, worth noting that by exploiting Shannon's theorem the Ising model does not arise from specific hypotheses about the underlying network but instead is the least-structured model consistent with the measured pairwise correlations. In Appendix A, we outline how the Ising spin glass model is introduced in the physics of complex systems. Table 1 describes the main components of our model and the correspondence between the Ising spin model from statistical physics and economic productivity analysis.

Statistical Physics Productivity Analysis
Generalized Multicomponent spin model Disciplinary productivity, defined in with arbitrary interactions network DEA in an input-output framework node variable or multicomponent spin: deviation from world-country average productivity Pairwise node interactions or couplings: J i,j pairwise interdependencies between productivity of different disciplines Hamiltonian: The J ij in physics measure a direct and reciprocal (mutual) effect (the interaction) of one entity on another entity (and vice versa). The concept of interaction in physics can find its correspondence in the interdependency in Input-Output economic analysis. The latter means the existence of a mutual influence between sectors (disciplines).
The coupling parameters J ij generate the configurations of the system that may be characterized by the correlations between the spin variables, the so-called overlap measures (see also [63]), defined as follows: where t = 1, ..., T is time. As it is well known, a correlation measures the association between two variables. It shows a tendency of one variable to change with some regularity when the other changes, but this tendency may be moderated (influenced) by other factors, and depends on the whole configuration, including indirect effects. Correlation does not mean a direct effect or relation. On the other hand, interactions or interdependencies (J ij ) refer to strict relationships between variables which allow us to describe the impact of the variation of one variable on another. Assuming this model to make inference permits us to consider correlations beyond the interdependencies among the units of analysis. As we will see in the application (see Section 5), the productivity of two disciplines may be correlated because they tend to be associated in their variation, but they may not interact. Here we impose J ij ≥ 0 without loss of generality. The J ij represent the interaction strength between i and j.
The higher the value of J ij the stronger is the interaction between i and j.

Maximum-Likelihood and Pseudo-Likelihood Estimates
Though the likelihood function definition has deep roots in information theory, Bayesian inference and statistical mechanics, as discussed in the Section above, the realization of an optimization algorithm able to draw the optimal {J} is hindered by the general intractability of computing Z({J}) and its gradient [18,26]. Hence, in place of maximizing the likelihood function, we may define and maximize a different objective function, the so-called pseudo-likelihood. It is possible to show [18,64] that the estimation of the parameters obtained by a pseudo-likelihood maximization is consistent with the maximization of the likelihood function, that is the two functions are maximized by the same set of parameters. This statement becomes exact in the case of infinite sampling [18]. We do not discuss in detail here how these results are achieved. We discuss only the guidelines, redirecting for details elsewhere (see e.g., [65] and the references cited in this sub-section). By its very establishment, the pseudo-likelihood function permits to solve the optimization problem avoiding the troubles related to the computation of Z({J}). It has indeed the advantage to be maximized in polynomial time. The pseudo-likelihood function is based on the so-called local conditional likelihood functions, p(s i |{s \i }) at each node of the network, s i , i = 1, ..., N. The symbol {s \i } means a set of variables s j with i = j. The local conditional probability (single variable pseudo-likelihood) at the i-th node is The local Hamiltonian H i (s i |{s \i }) = −s i · [ 1 2 ∑ 1,N i =j J i,j s j ] and the local partition function is The gradient of the log-pseudo-likelihood function with respect to the parameter J ij can be easily calculated, obtaining where '< > i,{J} ' states for ensemble average calculated for the pdf p(s i |{s \i }) with parameters set {J}. It is possible to rephrase it, obtaining where '< > {J} ' states for ensemble average calculated with the pdf in Equation (7) or Equation (8) with set of parameters {J}. The gradient of the log likelihood function is By comparing Equations (15) and (16)  Specifics on computation of log-pseudo-likelihood function and its gradient with respect to J ij , and details on optimization algorithm are reported in Appendix B.
The methods employed and the codes written to implement the related algorithms have been tested on the Ising model with known coupling coefficients on random graphs, thus guaranteeing the right convergence of the inference procedure and the proper reconstruction of the interaction network.

The Knowledge Production Network
We point out that our proposed inferential approach used to recover the broader network structure can be applied to different network models developed for diverse fields of applications. To illustrate its potential, we apply the method to the field of knowledge production. Taking into account the data available for the empirical analysis, we will work with a network that has the form shown in Figure 3.
In the context of the GRFF model we assume that accumulated knowledge from previous periods is a fund variable and new knowledge produced in the form of publications in the current period is a flow. Two fund sources arise from accumulated knowledge: knowledge that a DMU itself produced in previous periods, z, and spillover knowledge accruing from the knowledge (publications) that other DMUs produced in previous periods, Y. In turn, the flow of new knowledge produced by a DMU in the current period becomes part of the fund of own accumulated knowledge it can draw upon in a subsequent period and that new knowledge spills over and becomes a fund available to other DMUs in subsequent periods. Figure 3. An illustration of the network model. Each disciplinary productivity π i , which is a node in this network, includes the country-level set of disciplinary productivity π γ i with γ = 1, ..., D. Figure 3 illustrates the structure of the network that we will reconstruct using our new semi parametric approach in Section 5. This network models the interdependencies existing between disciplinary productivity/efficiency, π i , where disciplines i = 1, ..., N are the Scopus subject categories. The network nodes π i comprise the respective productivity measures, π 1 i , ..., π γ i , ..., π D i , for each country γ = 1, ..., D. We consider the main 53 world countries according to their scientific production. The productivity/efficiency of country γ in discipline i, π γ i (omitting the time t from the notation for an easier reading), is computed in a DEA setup through the Shephard output distance function for each country relative to the discipline-time specific technology (P γ,t i that will be introduced in detail in Section 5) as The reciprocal of the Shephard output distance function, 1/π γ i , measures the proportional expansion of observed outputs that could be achieved if the DMU were to become efficient. Figure 3 shows a network of disciplinary productivities π i , each of them composed by country-level productivity π 1 i , ..., π γ i , ..., π D i . For instance, the productivity of Chemistry π Chem is composed by the country-level productivity π Arg Chem , ..., π γ Chem , ..., π USA Chem , where 'Arg' stands for Argentina.
The disciplinary interdependencies or interactions are represented by the pathways, J ij . Importantly, these pathways are generally unknown, and must be inferred, although if knowledge of the interactions is known, constraints can be incorporated that account for those interactions. Example disciplinary interdependencies include the use of new computational methods from computer science by those in mathematics or the natural sciences; advances in neuroscience by those in medicine and nursing; new findings in environmental science by those in earth sciences and agriculture. Indeed, as we will see in the application, the present study illustrates an interdependency pathway between physics and the social sciences. Although it is theoretically possible to include the inputs/outputs of all disciplines in the technology, by doing this we would encounter two problems. First, assuming the homogeneity of all disciplines in their knowledge production, which is clearly not the case, and second, the curse of dimensionality would likely render all DMUs to be on the efficient frontier. Therefore, we first estimate productivity/efficiency for specific disciplines within a specific country in an NDEA model. Then, we use those productivity/efficiency estimates to infer the generally unknown connections between disciplines using the method described in Section 4.2.

Data and Descriptive Analysis
Our world-country bibliometric data were extracted from the Scopus database, for 16 disciplinary subject categories from 1996 to 2012. Data problems in bibliometric studies are well known. A common way to reduce them is to analyze macro-level bibliometric data. Comparative analysis is more reliable when the unit of analysis is more aggregated because in a larger sample size, micro random errors mutually compensate [66]. The potential for changes in coverage from inclusion or exclusion of journals to disproportionately affect smaller countries with fewer publications presents another issue of concern. This may lead to unreliable values when a country only has a small number of scholarly outputs [67]. To avoid this problem, we consider the 53 most productive countries (in terms of scientific productivity), which account for more than 95% of the world scientific production in the considered period. Luwel [68] and Aksnes et al. [69] report the main issues related to the integration of bibliometric data with other inputs data, in particular R&D expenditures. Methodological problems in measuring productivity at the macro level are mainly due to a lack of standardization in the measurement of resources and outcomes across countries. Moreover, the methodologies for collecting input and output data have been developed largely independently from each other. We attempt to bypass these issues of standardization and measurement by working with simpler quantity data on inputs and outputs for this paper. We restrict research inputs to the number of publishing authors, N A, and outputs to the number of publications, P, as well as the number of highly cited publications (top 10%), HCP. These indicators are the most known and commonly used indicators for the assessment of the contributions scholars make in their research publications to the advancement of scholarly knowledge [70]. Bibliometrics and quantitative studies of science are heavily related to these indicators [71]. The data on N A come from Elsevier Bibliometric Research Project (EBRP). In the elaborations carried out to estimate the interactions (J ij ) and infer the network structure, to increase the number of available data we transformed yearly data into weekly data by means of a linear interpolation. This leaves 833 observations of weekly total publications and highly cited publications for the 1996-2012 panel. Table 2 gives an overview of all the indicators available for the present study. The list of the Scopus 27 subject categories is reported in Table 3.   Table 3 reports in bold the 16 subject categories considered in the analysis and the variables available for each discipline. We excluded social sciences and humanities disciplines whose coverage of scholarly outputs in indexed journals is much lower than the other subject categories considered. Table 4 presents descriptive statistics for the selected disciplines, namely: BIOC, COMP, ENGI, MEDI and PHYS. Appendix C reports descriptive statistics on the other disciplines that will be analysed in the paper.

Models for Estimating Knowledge Production
Our network models borrow from previous work on knowledge production ( [6,30]), while the production variable choices are modeled after Georgesçu-Roegen, where we include both flow variables and funds variables. We think of knowledge production in a production axiomatic framework, in which we include author count as a flow type input variable, and cumulated previous own publications as a fund or knowledge stock, which produces a flow of publication outputs.
What makes it a network is the fact that the cumulated publications of a discipline in one country are available to other countries in the same discipline, and previous cumulated publications from other countries in that discipline also are available to the discipline in the 'home' country as a fund-type input variable. This specification proxies the public good/externality nature of publications as well as their role in contributing to the stock of knowledge. Table 5 outlines the examined basic (static) and network productivity models, which we separate into two categories: Quantity (1) and Quality (2). The first category represents 'quantity of knowledge,' using raw publication counts (P) to measure final knowledge output. Within this category, the basic model (1.1) uses the stock of previous publications within each country and the number of authors (N A) as the knowledge inputs. The corresponding network model (1.2) also includes the stock of previous publications from other countries as a knowledge spillover input. The second category represents 'quality of knowledge,' using the number of highly cited publications (HCP) in place of raw publication counts, for both previous publication inputs and current publication final outputs. While citations are widely used to indicate publication quality, we note that there are several other potential alternatives, including SCImago journal rank (SJR) (Lee et al., 2016), peer reviews [72], and restricting to top-tier journals [73]. Quality models 2.1 and 2.2 serve as the quality-adjusted versions of quantity models 1.1 and 1.2, using the HCP approach to distinguish quality. We add a simple productivity model given by the ratio of publications to authors (P/N A) as a baseline.  Figure 4 illustrates the network for the hypothetical case of two countries, γ and γ , both for discipline i in period t. Their flow inputs are the author counts denoted by x and their final outputs are denoted by y. The previous publication fund variables that provide the network connection are denoted as Y and the own previous fund variables are denoted by z.
We will augment their model by including disciplines i = 1, . . . , N, here N = 16. Denote flow input as x γ,t (here a scalar, but possibly a vector). Fund variables include own country cumulated past publications denoted as z γ,t i , and other country cumulated previous publications as Y γ,t i . Final output is own country current period publications denoted as y γ,t i . Again, following [6], we define the own country fund variable z as the sum of the previous 3 periods' publications for that country, which we denote as z τ=1 ∑ D γ =γ y γ ,t−τ to represent spillover knowledge from other countries' previous publications.
We can now state the formal problem we are estimating to solve for efficiency in the NDEA model. As explained in Section 5.1, we employ the Shephard output distance function as a performance measure for each country, i.e., it is the explicit objective function. The distance function for country γ in discipline i, π γ i , is defined as This objective function scales observed country output to the frontier of the output set P γ,t (.) and takes a value of unity for a country on the frontier and a value less than one for a country that is below the frontier. Using DEA (see Section 3.1), the reference technology (output set) for period t, serves as the constraints in our problem: i are intensity variables that form the best-practice frontier technology from the observed inputs and outputs. Equation (20) is the constraint with respect to own country current output (publications), Equation (21) is the input flow constraint, Equation (22) is a fund constraint with respect to own cumulated previous publications and Equation (23) is a fund constraint with respect to other country cumulated publications-the spillover constraint. We solve this problem for each discipline in each country using linear programming. These solutions are the data for the second stage which estimates the correlations Q and the parameters J, as described below.
As described in Section 4.2, the statistical analysis of the knowledge production we propose here is developed in the framework of information theory, where through the well-known Shannon theorem, the entropy is introduced and defined. We can measure the so-called pairwise correlation functions or overlaps from the collected data. Assuming that we only get this information, by following the line of the Shannon theorem it is possible to find the class of the probability distribution (which will depend on some adjustable parameters, the set J), which maximizes the entropy with the constraint that the pairwise correlations obtained from such probability distribution match the measured ones. In this way the probability distribution (see Equation (12) with the Hamiltonian of the so-called Ising model (see Equation (13)), and also Appendix A), is directly obtained by maximizing the entropy introduced in the Shannon theorem with the aforementioned constraint, without any further unverifiable (thus arbitrary) hypothesis. In this respect, the entropy introduced in the Shannon theorem represents a measure of the 'amount of uncertainty'. For example, limiting to the case of a discrete probability distribution, if any constraint is superimposed (i.e., any a priori knowledge is available), the entropy function introduced by Shannon is maximum for a uniform distribution. This agrees with our intuitive notion that a broad distribution represents more uncertainty than a sharply peaked one. Figure 3 is a sketch of an Ising model whose variables are defined in the knowledge production framework. It is thus linked to the entropy defined from information theory because the adoption of the Ising model is a direct consequence of the maximum entropy principle, as discussed above.

Results
We estimate our two models for each of the 53 countries and 16 disciplines, reported in Table 3. To summarize the results, we present the annual output share-weighted geometric means of the productivity values by discipline, for the alternate versions of Models 1 and 2: Total publications (basic and network, 1.1 and 1.2) and highly cited publications (basic and network, 2.1 and 2.2). For comparison with our model estimates we include the simple ratio of publications to authors (P/N A). The rank order correlations of P/NA and estimated production efficiency are reported in Table 6 for selected disciplines: Computer Science (COMP), Engineering (ENGI), Medicine (MEDI) and Physics (PHYS). Not surprisingly, the lowest correlations occur for the basic DEA quantity model (1.1) and the NDEA quality model (2.2). In contrast, the two quantity models (1.1 and 1.2) and the two quality models (2.1 and 2.2) tend to have the highest correlations in each of the disciplines. We are interested in studying the interdependencies across disciplines as well as within disciplines at a macro or country level. The interactions between knowledge production efficiency may differ by type and magnitude, as well as their ultimate effects on the associated disciplines and countries. Our NDEA estimates provide within disciplinary connection across countries, but not across disciplines.
For instance, collaboration between researchers in two disciplines, say physics and health can mean that gains in efficiency in one discipline are reinforced by gains in efficiency in another discipline. Interdisciplinary research requires that the researchers in each discipline to learn the vocabulary, terminology, the notation, the ideas, and so forth of other disciplines. Those disciplines and researchers who can overcome the transaction costs of learning the vocabulary, terminology, etc. can expand this more general/interdisciplinary knowledge. Our method identifies those disciplines where such gains are possible. In contrast, if interdisciplinary transaction costs are too high, as might occur when researchers come from two very different disciplines, then gains in efficiency in one discipline may serve to lower efficiency in another discipline.
There may also be cases in which two or more disciplines with different levels of production efficiency interact such that the more productive disciplines slow down while the less productive disciplines increase their productivity. For this reason, analyzing the correlations or indirect connections (the overlap measures Q ij introduced in Equation (12)) between (disciplinary) efficiency levels can be interesting. However, going further and estimating their interdependencies (J ij ) can provide more useful information to analyze the way in which scientific knowledge is produced and organized worldwide. This information could be useful for policy makers who determine which disciplines or topics to prioritize and how to distribute research funds among disciplines. Figure 5 shows the estimated interdependency parameters (J ij ) -left panels-and the inferred networks -right panels for the three production efficiency models. Top panels refer to the simple productivity model (P/N A), middle panels show the Model 1.2 (network quantity model) and the bottom panels report Model 2.2 (network quality model) results. In the left panels of Figure 5, the darker squares indicate higher J ij and the NW to SE diagonal comprises all white squares since J ii = 0 by hypothesis. The reconstructed networks reported in the right panels of Figure 5 are derived from the estimated J ij obtained by the maximization of the pseudo-likelihood function. The J ij are the edges. The diameter of the node for discipline i is proportional to the number of interactions J ij . The thickness of the edge depends on the intensity of the related interaction. Figure 6 shows the calculated overlap measures (Q ij ) and the estimated interdependencies (J ij ) for the different productivity models. The methodology introduced in Section 4 hence allowed us to empirically infer the network structure existing among disciplinary productivity going beyond the simple overlap (or correlations) measures (Q ij ).  Table 5 for model specifications.
For instance, we can analyze the correlations and interdependencies between the productivity of disciplines CHEM with IMMU and MATE. In the basic quantity model ( Figure 6-top panel) CHEM and IMMU and CHEM and MATE show the same overlap measure Q ij = 0.16, meaning that their production efficiency tend to be positively associated. On the other hand, their respective interdependencies are different. In fact, J ij between CHEM and IMMU is zero, while the interdependency between CHEM and MATE is 0.970, meaning that the productivities of CHEM and MATE present a high level of interdependency (mutual interaction).wae Similarly, the correlations (indirect connection) between PHYS with IMMU and MATE are respectively 0.10 and 0.12, while their interdependencies are respectively 0 and 0.49. This means that PHYS interacts with MATE, but not with IMMU, although the respective production efficiencies are correlated.
Inspecting the middle and the bottom panels of Figure 6 we note that the interaction (J ij ) between PHYS and MATE is 0.79 in the quantity model (1.2) and 0.52 in the quality model (2.2) while the respective correlations are 0.15 and 0.1.
We can conduct a comparative qualitative analysis between the three different models to explain the results and how the technique works. We solved an indirect problem to estimate the J ij values for each dataset and each model separately, so the inferred network structure is the optimal structure for each dataset.
For instance, the values of J AGRI,PH AR in the three panels of Figure 6 are 0.000 (top), 0.471 (middle) and 0.475 (bottom). These values indicate there is no interdependency between AGRI and PHAR in the simple productivity model (P/N A) as the score is zero, while the interdependencies between AGRI and PHAR are much stronger in the quantity and quality models in which the knowledge production assumes as inputs N A, own previous publications and other previous publications and as output own current publications (Mod. 1.2) while Model 2.2 considers the same inputs/output than model 1.2 but uses the number of Highly Cited Publications (HCP). Similarly, J MATH,PH AR is 0.013 according to the simple productivity model, showing a weak interaction among the two simple productivities estimated by the number of publications per author, while there is no interaction between the productivities of the two disciplines if we measure them according to Model 1.2 and 2.2 (see Table 5) as the J values for MATH and PHAR in middle and bottom panels are zero.
We can also compare the simple productivity model (P/N A) with the quantity models (1.1 and 1.2) to consider any possible effects posed by the curse of dimensionality, given our sample size and model dimensions. The overall results are quite similar: the only differences are due to the DEA modelling and so we should prefer the results of the Model 1.2 which accounts for the Georgesçu-Roegen's fund's modelling of the knowledge production.
We then compared the quantity vs. quality results for the network models (1.2 and 2.2), to see whether the obtained estimates were consistent. We found for instance that the interaction between Physics (PHYS) and Computer Science (COMP) in the quantity model (1.2) is 0.033 while their interaction in the quality model (2.2) increases up to 0.172. In contrast, the interaction between Physics and Chemistry that is 0.186 in the quantity model goes down to zero in the quality model.
As we may expect, the interactions between CHEM and CENG are quite high in all the three models (0.466 in the simple productivity model, 0.785 in Mod. 1.2 and 0.896 in Mod 2.2); other expected results are the high interactions of COMP with ENGI and MATH because these disciplines share the same community. A striking result is the interactions we observe between PHYS and MEDI which is quite high in the Quality model (Mod. 2.2, with a value of 0.303) but absent in the Quantity model (Mod. 1.2). A policy implication of this result could be made in the discussion about supporting Societal Challenges that are focused on Medical Sciences and the important role that Physics could play in this context.
The results commented in this section show the usefulness of the analyses carried out in the previous section to shed deeper and new light on the interactions among disciplinary productivity that we would not have been able to derive if we had reduced the analysis to the simple productivity model (P/N A). Differently from existing bibliometric literature [74,75] we consider not only the outputs of scientific production, but also their efficiency in knowledge production. In addition, differently from existing efficiency literature [6] we estimate the interdependencies among disciplinary efficiency thanks to the application of the inferential approach proposed in Section 4.

Discussion and Conclusions
Network Data Envelopment Analysis (NDEA) models are often used to measure producer performance when separate production technologies are linked across divisions, time, and spillovers can occur between different producers. In theory, NDEA models can accommodate complex, multi-product technologies with inputs used to produce intermediate products and final outputs. However, in practice, the degree of complexity a researcher can introduce is limited by the number of available observations used to construct the technology in what is known as the curse of dimensionality. It is also not obvious how to test for the appropriate structure of the network. In this paper we address these issues with a new theoretical method based on Shannon's entropy that allows us to infer wider linkages between various producers without having to specify those links within the NDEA model. Our method specifies a NDEA technology and provides nonparametric estimates of producer performance relative to that assumed technology. Then, in a second stage, we employ a semiparametric Bayesian framework that allows us to estimate, rather than assume, the network structure. This second stage exploits advances in the physics of complex systems, machine learning and econometrics of information and reveals additional linkages in the network-in our case-allowing us to infer connections between knowledge disciplines.
While we consider our main contribution to be providing an inferential method to identify structure in NDEA models, we consider our application to knowledge production to be of interest in its own right. The economics of science [76] reminds us that researchers do research for different reasons, including their interest in "puzzle solving", reputation based on the priority of their discovery, awards and recognition for their achievements, and also through publications which can play a key role in funding and promotion. Research is also a public good, generating knowledge spillovers that can be difficult to capture and quantify. Like any public good, this can lead to underprovision in the market.
The economics of science tells us that the production of scientific research involves multiple inputs, including knowledge, time, materials and equipment. Some inputs are embedded in people (knowledge and time in particular), and most of these inputs are expensive. As observed by Stephan [76], incentives and cost matter for science and economics, particularly for shaping the most efficient mix of resource allocation across disciplines.
In this paper, we combine concepts from economic production theory, Bayesian statistics, and the physics of complex systems to infer the cross-disciplinary and cross-country interactions of research activities. Such understanding is key to achieving the efficient mix of resources for research. The NDEA estimates can be used to derive correlations between disciplines and those same NDEA estimate can be used in a second stage to infer interdependencies between disciplines. For instance, controlling for the quality of publications, math and medicine exhibit positive association, but zero interdependency. In other cases, such as physics and math, there are relatively low correlations of productive efficiencies between the two disciplines, but a high interdependency. We find non-trivial interactions in many cases, which seems promising for future work in this area. Our framework and results could be of particular interest to policy-makers and agencies tasked with prioritizing research funding areas. For instance, the relatively high interaction between physics and medicine might suggest the need to include topics from the physical sciences in new funding for medical research.
This approach could also be applied to other knowledge network systems beyond academic research, such as innovation and technology advance for industrial processes. For instance, a similar framework could be used to estimate the interdependencies of structural industrial profiles, in the form of industrial value added, and structural innovative/technological profiles, based on patents. Inference of the underlying network topology could be used to target research and development funds within the firm, and target public investment decisions.
Summing up more generally, the proposed statistical inferential framework may be applied in a variety of productivity network problems, to infer the underlying structure of the network.
The framework developed here is based on complex systems behavior modeling and estimation. There are many possible extensions, left for further studies, including the implementation of out-of-equilibrium time-dependent Ising model. Funding: The financial support of the European Union H2020 Project RISIS 2 (Grant agreement N. 824091) and of Sapienza University of Rome (through the Sapienza Awards no. PH11715C8239C105 and no. RM11916B8853C925) is gratefully acknowledged.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

DEA
Data Envelopment Analysis NDEA Network Data Envelopment Analysis

Appendix A. The Ising Spin Glass Model
The Ising spin glass model is made up of a lattice, where each node of the lattice is associated with a vector variable s i at the i-th site in the D-dimensional space, that represents the spin of a particle. See Figure A1 for an illustration. Figure A1. Illustration of an Ising Model. J ij > 0 correspond to ferromagnetic couplings; J ij < 0 correspond to anti-ferromagnetic couplings.
Using the relevant Ising and thermodynamic terminology, we can develop a simple spin model to describe the stationary states of our system. For a given couple i and j the 'energy' unit is given by whereas the Hamiltonian is The parameters of the system are the intensity of the external magnetic field h i , the pairwise interactions J ij , β = (k B T) −1 . The Hamiltonian in (A2) describes an Ising model, originally introduced to study the behavior of ferromagnetic systems. The Ising model, when used in the study of different systems, e.g., productivity network, can thus also account for a site-independent weight, β, and external biases, h i . For the sake of simplicity we fix here β = 1 and h i = 0, ∀i ∈ (1, N). J ii = 0, J ij = J ji . The total energy of the system is E =< H >. If the system is in equilibrium at a given 'temperature,' T, then the energy distribution of the units follows the Boltzmann law, given by where Z is the partition function introduced in Section 4.2. The component e −E/k B T is known as the Boltzmann factor and k B is Boltzmann's constant.
The model assumes ergodicity of the system, i.e., the time average of functions on our spin variables equals the average of these same functions over their probability distributions. This hypothesis is commonly assumed for processes that involve human systems, including time series econometrics.

Appendix B. Pseudo-Likelihood Approach to Inverse Inference Problem
We report in the following specifics of the objective function used in the optimization algorithm to infere the set of parameters {J}, i.e., the Pseudo-Log-Likelihood function, and of its gradient. The fact that the gradient of the Log-Pseudo-Likelihood function can be calculated exactly makes the computational solution of the inference problem faster and more reliable. By relying on Equations (13) and (14)   The gradient of the Log-Pseudo-Likelihood function is given in Equation (15). To calculate the gradient of the Pseudo-Log-Likelihood with respect to the set of parameters J ij we thus need to calculate the quantity < s i · s j > i,{J} . It is By rephrasing the expression of Z i reported in Equation (A5) we obtain . Inserting this latter expression in Equation (A7), we get and finally (the proportionality constant for Z i ({J}) and < s i · s j > i,{J} is the same) To deal with a lower number of parameters in place of maximizing the Log-Pseudo-Likelihood function, given by the sum of the single-node Log-Pseudo-Likelihood functions (Equation (14)), each single-node Pseudo-Log-Likelihood function is maximized. Since the couplings in the Ising model should be symmetric the final estimate of the J ij parameter is obtained by taking the average (J ij + J ji )/2.