Economies of Scope between Research and Teaching in European Universities

The estimation of economies of scope between research and teaching has been the object of a large literature in economics of education and efficiency analysis, with parametric and nonparametric specifications. The paper contributes to the literature by building a pan-European dataset that integrates official statistics on higher education at country level with bibliometric indicators. The dataset allows a breakdown by scientific and educational field, accounting for the heterogeneity among disciplines. We applied a technique which has not been used for the efficiency estimation of economies of scope in higher education, namely seemingly unrelated regression (SUR) applied to separate input–output equations describing the production of education and research. We found confirmation for economies of scope in some fields and with some specifications, or no relation between the equations. In no case did we find diseconomies of scope between teaching and research.


Introduction
Universities can be considered multi-input multi-output institutions, which combine heterogeneous inputs and deliver several outputs according to their institutional mission [1,2]. An important research question is what the relations are between different types of outputs. Traditionally, the outputs considered are placed under the umbrella of the Humboldtian missions of universities, namely teaching and research. The question becomes whether universities, given their inputs, can jointly optimize the outputs of teaching and research, or rather experience tensions and tradeoffs between these outputs. The analytical framework that can be used to address this issue is the theory of multiproduct organizations, and particularly the notion of economies of scope. Organizations operating under economies of scope are more efficient when they produce several outputs jointly than they would be if they produced the same outputs in separate production units.
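The notion of economies of scope can be stated compactly in the cost-function form that is standard in the theory of multiproduct organizations. The notation below is illustrative and is not taken from the paper: $C(\cdot)$ is a hypothetical cost function, and $y_T$ and $y_R$ denote teaching and research output. Note that this paper works with production functions rather than cost functions, so the expression serves only as a conceptual reference.

```latex
% Economies of scope: joint production is cheaper than separate production.
C(y_T, y_R) < C(y_T, 0) + C(0, y_R)

% Degree of economies of scope (positive under economies of scope,
% negative under diseconomies of scope).
SC = \frac{C(y_T, 0) + C(0, y_R) - C(y_T, y_R)}{C(y_T, y_R)}
```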
One reason for the large interest in the topic is that there are strong, perhaps destabilizing, implications from the results. If the empirical findings support the existence of economies of scope between research and teaching, then there is no room for putting the traditional Humboldtian university model under discussion. If, on the contrary, the two activities show rival or antagonistic efficiency conditions, then it would be advisable to consider different institutional models, with a partial or complete separation between the two activities. (The issue of relations between research and third mission is the object of a huge literature, which we cannot review here. Much less explored is the relation between teaching and third mission. See [3] for a pioneering paper on the impact of third-party funding to research on quality of education in engineering).
We contribute to the literature on economies of scope in universities in three ways. First, we used a large dataset covering all European countries and considering the complete census of higher education institutions (HEIs) based on ETER data. ETER (European Tertiary Education Register) is the census of higher education institutions validated by National Statistical Authorities (NSAs) that publishes individual institution-level microdata. It has been noted that most of the existing studies are based on country-level data [4][5][6]. A few recent studies take a cross-country perspective. Given the institutional embeddedness of higher education, it is important to validate results with larger samples. As noted recently by Agasisti and Johnes, "further work on international data should be a priority for future research" [7] (p. 80). Thus, while the issue of economies of scope is not new in the literature, our study is based on much larger evidence than is usually provided. (For the sake of completeness, some empirical research has also been devoted to examining economies of scope between different types of education, namely on-campus face-to-face and distance off-campus education. See [8] on conceptual issues and [9] on empirical estimation of economies of scope.) Second, we offer a breakdown by discipline by disaggregating HEI-level data into broad disciplines using the field of education (FoE) classification, focusing on science (FoE05), computer science (FoE06), engineering (FoE07), agriculture (FoE08), and medicine (FoE09), and by re-integrating data on scientific publications using a correspondence between educational fields and subject categories of journals. This is another advancement with respect to the state of the art and adds generality to the empirical findings. The choice of these disciplines is dictated by limited data availability.
It is well known that bibliometric data based on indexed journals do not cover appropriately social sciences and humanities (SSH). Since we have publication data based on Scopus, our results should be interpreted as showing economies of scope in the covered disciplines only. To the best of our knowledge, no empirical analysis on economies of scope in SSH disciplines is available, due to the lack of comparable data (for example due to the large use of books and the adoption of national languages in large part of SSH).
Third, we experiment with a methodology that has not been widely adopted in efficiency analysis in higher education, specifically Seemingly Unrelated Regressions (SUREG), to estimate economies of scope between education and research through the specification of Translog production processes. The SUREG approach is complementary to those more frequently used for the empirical estimation of economies of scope [10].
It is worth noting that in our model we focused exclusively on the relation between research and education. We must recognize that universities do not limit themselves to these traditional missions, but add the so-called third mission, which is the commercialization of research and the active engagement in socially relevant activities. There is still considerable uncertainty on the indicators of third mission that might be used in a cross-country analysis. The most comparable indicators, such as patents and trademarks, are heavily biased when referred to universities. Other indicators of public engagement are simply missing. Therefore, while recognizing the importance of this issue, we focus only on the first two missions. Indeed, there are no published studies that put together, in a cross-country perspective, indicators of education, research, and third mission. This is clearly a promising area for further research.
The remainder of this paper is structured as follows. Section 2 discusses the background literature, Section 3 illustrates the data, Section 4 discusses the methodology, and Section 5 examines the main findings. Additional empirical results are reported as Supplementary Materials. Finally, Section 6 discusses the findings in the light of the policy debate about the future of universities in Europe.

Economies of Scope between Teaching and Research: Background Literature
A large amount of literature has examined economies of scope in the framework of efficiency analysis, using multiproduct cost functions or data envelopment analysis [1,11,12].
The empirical findings on economies of scope between teaching and research are contradictory. The pioneering studies of [13,14] found economies of scope between teaching and research, while [15] confirmed this finding but only up to a certain size. Other studies confirmed these early findings [16][17][18][19][20][21]. Using latent class and random parameter techniques to estimate multiproduct cost functions for US higher education institutions, Agasisti and Johnes [7] found unexploited economies of scale.
On the contrary, substantial diseconomies of scope have been found by [22][23][24][25][26]. Summarizing several studies for the case of UK, Papadimitriou and Johnes report that the authors "consistently find global diseconomies of scope for the typical university" [5].
Interestingly, Olivares and Wetzel [25] disaggregate the findings and offer an articulated set of implications, depending on the size and nature of universities. Large universities do not suffer from diseconomies of scope, meaning that they have found an institutional equilibrium between the various missions. On the contrary, small- and medium-sized Universities of Applied Sciences (i.e., non-PhD granting institutions) are subject to strong diseconomies between teaching and research and between teaching in different disciplines. For them the implication is a need to specialize, by focusing on few disciplines and by concentrating the research activities into a focused set of fields and priorities. On the other hand, Johnes et al. [27] show that economies of scope depend on the choice of estimating technique. Overall, the existing literature does not offer a uniform answer to the issue of economies or diseconomies of scope. Data limitations (i.e., analysis of samples at country and/or discipline level) contribute to this state of the art.
Taken together, the empirical studies offer mixed findings, but do not lead to the conclusion that research and teaching are systematically subject to diseconomies of scope. Diseconomies of scope seem to depend on specific factors, such as the size of the institution (for example, smaller institutions, particularly colleges or universities of applied sciences, may suffer more) or the institutional configuration of the national system (for example, universities in the UK may suffer more, given the degree of competition for research funding). This literature is not explicitly oriented towards the derivation of detailed policy implications. Its implications are somewhat indirect and general: the studies recommend that policy makers consider the large heterogeneity among universities, and they offer a methodology for adjusting inputs (mainly academic and nonacademic staff and funding) to maximize the joint outputs depending on specific conditions (e.g., the size of the institution or the subject mix). Some studies offer indirect support to public policies by measuring the average efficiency of universities over time, trying to attribute the improvements to changes in policies. It is safe to state that these studies do not put into question the institutional design of Humboldtian universities, in which research and teaching are produced together.
This orientation is in contrast with a stream of studies that use economic theory to examine the issue of optimal configuration of universities based on a principal-agent theoretical framework (see [6][28][29][30][31]). Here there is not an empirical estimation of economies or diseconomies of scope. Rather, the theoretical framework is one in which universities must produce two outputs, subject to different production functions, and subject to staff time or to budget constraints. The two outputs are defined in terms of volume but also of quality. In particular, the quality of research is distributed in a skewed way, so that it is increasingly difficult to compete in research at top level. In turn, universities receive streams of funding from various sources for their outputs and the level of funding may depend, according to various formulas, on the volume and quality of the output. The issue is framed as a multitasking optimization problem. Given information asymmetries and theoretical assumptions on production conditions, these studies derive efficiency conditions for various configurations of universities. Universities differ among themselves in terms of the allocation of their resources (e.g., time budget of staff) to research and teaching, but also in terms of the quality of the output [32].
Interestingly, a common result from these theoretical studies is that under certain institutional conditions universities cannot optimize jointly the production of both research and teaching. In other words, they suffer systematically from diseconomies of scope. This literature therefore explicitly calls into question the Humboldtian model, at least on theoretical grounds.
With this literature in the background, we need more robust evidence on the issue of economies of scope between research and teaching. Existing studies, while technically at the state of the art, use data at national and/or disciplinary level. There is room for building up a broader empirical base for analysis.
We contribute to this debate by examining the issue of economies or diseconomies of scope in a multidisciplinary and cross-country context. We aim to achieve more generality with respect to the previous literature in which single disciplines (e.g., economics) and moreover single countries have been covered.

Data
We exploit, for the first time, microdata on universities across all European countries. Detailed microdata are available for the following five fields of education (FoE): 511 universities active in science (FoE05), 323 in computer science (FoE06), 416 in engineering (FoE07), 325 in agriculture (FoE08), and 337 in medicine (FoE09). For each of these areas we combined disaggregated data on academic staff by FoE with data on publications in the subject categories that have a correspondence with the discipline. The choice of these disciplines was dictated by limited data availability. As noted in the Introduction, Scopus-based bibliometric data do not adequately cover social sciences and humanities (SSH), so our results should be interpreted as showing economies of scope in the covered disciplines only.
We constructed the dataset in the following steps. First, we started with the census of European universities made available by ETER (European Tertiary Education Register), which includes microdata on all institutions validated by National Statistical Authorities (https://www.eter-project.com, accessed on 8 November 2021). These data cover the number of undergraduate and postgraduate students and degrees, academic staff, and a number of structural characteristics such as georeferencing, foundation year, and legal status. Second, for each institution, we collected data on publications by accessing the Global Research Benchmarking System (GRBS) dataset provided by the United Nations University International Institute for Software Technology (UNU-IIST), based on Scopus publications. The dataset covers 251 subject categories in science and technology fields. The GRBS dataset is illustrated in detail in [33] and has been used in several published papers in recent years [34][35][36][37][38]. These papers describe the correspondence between FoE (data on academic staff and students) and FoS (data on publications). The GRBS data used for this paper refer to the cumulative publications and citations in the years 2008-2011 [33]. The GRBS dataset includes variables based on SNIP (source normalized impact per paper), an indicator that overcomes some of the limitations of the impact factor. This indicator is used to compute the number of publications in, and citations from, the top 10% and 25% of journals in the distribution of SNIP. Table 1 offers a summary of variables used for our specification.
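The top 10% and top 25% SNIP counts described above can be illustrated with a minimal sketch. It is not the GRBS procedure, whose exact rules are documented in [33]; it assumes a single subject category, a simple rank cut-off, and inclusion of journals tied at the threshold. The function names and the journal data are hypothetical.

```python
def snip_threshold(snip_values, top_percent):
    """SNIP value of the last journal inside the top `top_percent` of a ranking.

    Assumption: the top group holds floor(n * top_percent / 100) journals,
    with a minimum of one journal.
    """
    ranked = sorted(snip_values, reverse=True)
    k = max(1, len(ranked) * top_percent // 100)
    return ranked[k - 1]


def count_top_publications(pubs_by_journal, snip_by_journal, top_percent):
    """Count one institution's publications in top-SNIP journals.

    Journals tied with the threshold value are included (an assumption).
    Journals without a SNIP value are ignored.
    """
    threshold = snip_threshold(list(snip_by_journal.values()), top_percent)
    return sum(
        n for journal, n in pubs_by_journal.items()
        if journal in snip_by_journal and snip_by_journal[journal] >= threshold
    )


# Hypothetical subject category with ten journals and their SNIP values.
snip = {"J1": 3.0, "J2": 2.5, "J3": 2.0, "J4": 1.8, "J5": 1.5,
        "J6": 1.2, "J7": 1.0, "J8": 0.8, "J9": 0.5, "J10": 0.3}
# Hypothetical publication counts of one university in that category.
pubs = {"J1": 4, "J2": 6, "J3": 10, "J10": 7}

top10 = count_top_publications(pubs, snip, 10)
top25 = count_top_publications(pubs, snip, 25)
print(top10, top25)
```

The same counting logic would apply to citations received from top journals, with citation counts in place of publication counts.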
In this paper we innovated in two complementary directions, with a novel dataset and a novel technique. The European-level dataset gives more generality to our findings, while offering reliable comparability given the nature of the variables utilized. The novel technique has a well-known track record in many empirical fields, but it has never been applied to the issue at hand.

Input Variables
There are three main categories of inputs considered in studies of economies of scope: staff, students, and expenditure. In this paper we made use of the first two inputs, given that the data on expenditure in ETER are not available for our period in all countries and a specific harmonization effort is still underway. In light of the importance of this variable, we plan to replicate the exercise as soon as the harmonization exercise is completed. (There are several definitions for expenditures: total or general expenditure at university level [39,40], total departmental expenditure [41], total research income [42,43] or research grants [44,45]. An important limitation is the lack of measurement for research infrastructures, or research capital stock [4,40,46].)
[Table 1 (fragment): Number of sub-sub-subjects where the HEI is in the first decile of the world rank of institutions with the highest share of publications in source titles that are within the top 25% of that subject area, based on the SNIP value (except the one considered). P_TOP251DEC: percentage share of sub-sub-subjects in the first decile top 25% SNIP over the total number of sub-subjects where the HEI has publications in GRBS (except the one considered).]
The number of staff at university level is the most used input variable in efficiency studies (see, among many others, [43][46][47][48][49]). In some cases, a distinction is drawn between academic and non-academic staff, including only the former category [50] or including both categories [51]. We used the total number of academic staff in curricula classified under the Field of Education (FoE) classification by UNESCO. Since our research data (publications and citations) come from indexed journals (Scopus), we limited the analysis to STEM fields. We could not use non-academic staff due to the lack of a breakdown by FoE in ETER data.

Students
A second category of inputs is subject to more discussion, i.e., students. Are students to be considered inputs or outputs of the university production process?
When students are considered inputs, the argument is that human capital increases during studies [26]. In other words, students enter as an input, while graduates, or degrees, are the output. Consistent with this view, the total number of students has been used as an input, either weighted by the quality of freshmen [52] or unweighted [40,41]. We followed this literature by considering the total number of students enrolled into curricula classified in FoE 6-9 as inputs to the production process. The definition of students covers the ISCED categories 5-7, that is, we included undergraduate students but not postgraduate students (ISCED 8).

Output Variables
The outputs used in efficiency models reflect the missions of universities, i.e., teaching, research, and third mission. In practice, however, output indicators of third mission are rarely available, so that most studies only include teaching and research.

Teaching Output: Number of Graduates
The total number of graduates at undergraduate level is the most widely used teaching output variable [40][47][53][54][55]. The number of graduates may be unweighted [56] or weighted by degree classification [41,52]. Very few studies in efficiency analysis include measures of quality of education alongside the count of graduates [46,57]. Measures of quality might be used to build up weights of quantitative measures or to correct measures [58].

Research Output: Number of Publications and Citations
Publications are largely used as indicators of research output [15][59][60][61]. In most cases the selected indicator, depending on the objective of the analysis, is the count of publications, or the number of publications per capita. While the count of publications is an indicator of volume of scientific output, its quality is proxied by the count of citations they receive in the scientific literature.
We exploited a recently developed dataset that disaggregated Scopus data on publications by subject categories and universities, covering all European countries (see details in 30). From this dataset we used the count of publications and citations, first at the aggregate level, then by journal ranking. The ranking of journals was based on the SNIP indicator. We then included the count of publications in, and of citations from, the top 10% SNIP and the top 25% SNIP journals. These variables were used in separate models and reported as Supplementary Materials.
Combining these variables, we obtained a rich analysis of the overall production of research (count of publications), of its average quality or impact in the scientific community (count of citations), of its ability to be competitive in top journals (count of publications in top 10% and 25% journals), and of its excellence in being recognized by scholars in their most influential articles published in top journals (count of citations from top journals). This rich array of variables follows the recommendation of the literature to use systematically several indicators of research activity and has been used in a preliminary exploration of the dataset in the field of medicine [38].
It is worth noting that the use of publications and citations from indexed journals (in our case, Scopus) is subject to controversy. The main criticisms refer to the narrow perspective taken by these bibliometric indicators, specifically with respect to the evaluation of individual research performance and the adoption of metrics for the recruitment and career advancements of researchers. On the contrary, these indicators are usually accepted at aggregate level, that is, university or country level.

A SUR Approach to the Estimation of Economies of Scope
Economies of scope are estimated using a variety of specifications. One of the most used specifications is the transcendental logarithmic function (the so-called Translog function) introduced by [62], given its flexibility. To estimate economies of scope, the sign and significance of the coefficients of the interaction variables are considered.
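As an illustration of what a Translog specification with two inputs involves, the sketch below builds the regressors for one observation: the logs of the inputs, their squares, and their interaction. It is a minimal sketch under stated assumptions: the function name is hypothetical, the input values are invented, and the squared terms are not scaled by 1/2 (a convention that only rescales the second-order coefficients).

```python
import math


def translog_row(staff, students):
    """Translog regressors for one university-field observation.

    Returned order: constant, ln(staff), ln(students), ln(staff)^2,
    ln(students)^2, ln(staff) * ln(students).
    """
    if staff <= 0 or students <= 0:
        raise ValueError("inputs must be strictly positive to take logs")
    log_staff = math.log(staff)
    log_students = math.log(students)
    return [
        1.0,
        log_staff,
        log_students,
        log_staff ** 2,
        log_students ** 2,
        log_staff * log_students,  # interaction term
    ]


# Hypothetical observations: (academic staff, enrolled students).
observations = [(120, 1500), (45, 600), (300, 5200)]
design = [translog_row(staff, students) for staff, students in observations]
```

The last column is the interaction term whose sign and significance are inspected after estimation.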
With the aim of modeling the multiple-output process, we explored in this paper the possibility of referring to the seemingly unrelated regression (SUR) models firstly introduced by [63] which allowed us to account for the existence of contemporaneous correlations between teaching and research activities as well as (if needed) for different sets of explanatory variables in each equation, describing the two production processes (teaching and research, respectively).
The SUR approach provided us with several advantages compared to estimating separate equations using data for teaching and research separately. First, the SUR method estimates the parameters of all equations simultaneously, so that each equation also considers the information provided by the other equations. In our study, the SUR models enabled us to account simultaneously for the role of the same inputs (for example, teaching staff) in "producing" both teaching and research outputs. Second, the correlation among equations could be due to unobservable university (subject-area) specific attributes that influence the generation of both teaching and research outputs [64]. In this perspective, it is possible to test hypotheses across teaching and research processes, which cannot be carried out if these processes are estimated separately. Third, from an econometric perspective, estimating all the parameters jointly through a system approach yields a gain in degrees of freedom and more precise estimates [10].
Concerning the two university processes studied, we first modeled the number of graduates, representing teaching output, as a function of the number of academic staff and of the number of students enrolled in each specific subject area involved in our research. As for the research process, we built six separate models of research output, considering in turn total citations (TOT_CIT), total publications (TOT_PUB), and top 10% and top 25% publications and citations (PUB10_F, PUB25_F, CIT10_F, CIT25_F), each of which is considered a function of two inputs, i.e., the number of academic staff and the number of students in the specific area studied. The number of students as an input to the research process was justified by the need to keep the set of inputs equal between the two equations; furthermore, the sign of the coefficient for students in the two equations offered insights on the impact of student load.
Concerning the estimation strategy, we implemented the estimates through a stepwise procedure, by firstly estimating models in reduced form (i.e., by defining a "pure" production function with two inputs and one output in each equation) and secondly by estimating "extended" models that include a variety of control factors.
Without loss of generality, we take as the elementary statistical unit of analysis the university $i$ ($i = 1, \ldots, n_j$) within each subject area $j$ ($j = 1, \ldots, 5$, with $j = 1$ for FoE 05 science, $j = 2$ for FoE 06 computer science, $j = 3$ for FoE 07 engineering, $j = 4$ for FoE 08 agriculture, and $j = 5$ for FoE 09 medicine). We therefore specified the "pure" production functions within the SUR framework as Equation (1), where $t = 1, 2$ indexes the equations (with $t = 1$ referring to the teaching process and $t = 2$ to the research process). Accordingly, $y_{ij1}$ refers to the teaching output, expressed by the number of graduates observed in university $i$ in subject area $j$, while $y_{ij2}$ refers to the research output observed for university $i$ in subject area $j$.
Focusing on the two selected inputs, in each equation $X_{1ijt}$ and $X_{2ijt}$ refer to the two inputs identified in the process. The first input $X_{1ijt}$ is the number of teaching staff of university $i$ operating in subject area $j$, with $\beta_{1jt}$ the related parameter to be estimated, which captures the role played by input 1 in subject area $j$ according to the $t$-th equation. The second input $X_{2ijt}$ is the number of students enrolled at university $i$ in subject area $j$, with $\beta_{2jt}$ the related parameter to be estimated, which captures the role played by input 2 in subject area $j$ according to the $t$-th equation. Finally, the error term $u_{ijt}$, defined for each university and subject area, explicitly allows for contemporaneous correlation across the two equations.
For the complete "extended" models, for each university we added to Equation (1) a $C$-dimensional vector of variables $Z$ representing specific university characteristics, such as the overall size of the university, its year of establishment, the intensity of international collaboration, and the level of specialization of scientific output. As a result, the specification in Equation (2) is defined, in which, in addition to the parameters and variables (inputs and input cross-products) already defined for Equation (1), each variable $z_{cijt}$ ($c = 1, \ldots, C$) included in both equations describes a specific university characteristic, while $\gamma_{cijt}$ is the related unknown parameter to be estimated.
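The displayed forms of Equations (1) and (2) are not reproduced in the text above. A Translog specification consistent with the surrounding definitions could be written as follows. This is a hedged reconstruction rather than the authors' exact notation: the intercept $\beta_{0jt}$, the second-order coefficients $\beta_{11jt}$, $\beta_{22jt}$, $\beta_{12jt}$, the covariance symbol $\sigma_{j,12}$, and the absence of a 1/2 scaling on the squared terms are all assumptions. The symbols $y_{ijt}$, $X_{1ijt}$, $X_{2ijt}$, $\beta_{1jt}$, $\beta_{2jt}$, $u_{ijt}$, $z_{cijt}$ and $\gamma_{cijt}$ follow the text.

```latex
% Sketch of Equation (1): "pure" Translog production function, t = 1, 2.
\ln y_{ijt} = \beta_{0jt}
  + \beta_{1jt} \ln X_{1ijt} + \beta_{2jt} \ln X_{2ijt}
  + \beta_{11jt} (\ln X_{1ijt})^{2} + \beta_{22jt} (\ln X_{2ijt})^{2}
  + \beta_{12jt} \ln X_{1ijt} \ln X_{2ijt}
  + u_{ijt}

% Contemporaneous correlation between the teaching and research errors
% of the same university, with no correlation across universities.
E[u_{ij1} u_{ij2}] = \sigma_{j,12}, \qquad
E[u_{ijt} u_{i'jt'}] = 0 \quad \text{for } i \neq i'

% Sketch of Equation (2): "extended" model with C control variables.
\ln y_{ijt} = \beta_{0jt}
  + \beta_{1jt} \ln X_{1ijt} + \beta_{2jt} \ln X_{2ijt}
  + \beta_{11jt} (\ln X_{1ijt})^{2} + \beta_{22jt} (\ln X_{2ijt})^{2}
  + \beta_{12jt} \ln X_{1ijt} \ln X_{2ijt}
  + \sum_{c=1}^{C} \gamma_{cijt} z_{cijt}
  + u_{ijt}
```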
As specified above, each two-equation model explicitly allows for contemporaneous correlation between the error terms of the teaching and research production processes. The SUR estimator accounts for interrelations among the single equations through a weighting matrix based on the covariance matrix of the error terms. In this perspective, the SUR model can be considered an application of the generalized least squares (GLS) approach in which the unknown residual covariance matrix is estimated from the data [65]. Moreover, compared with multivariate regression models, SUR models also allow "prior" information to be introduced by excluding some specific explanatory variables from each equation, and therefore from each process studied.
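The GLS logic described above can be sketched as a two-step feasible GLS procedure for a two-equation system. This is a minimal illustration using NumPy, not the estimation code used for the paper: the function name `sur_fgls`, the synthetic data, and the choice of dividing the residual cross-products by n are assumptions.

```python
import numpy as np


def sur_fgls(X1, y1, X2, y2):
    """Two-step feasible GLS for a two-equation SUR system.

    X1, X2: (n, k1) and (n, k2) design matrices; y1, y2: length-n outputs.
    Returns the two coefficient vectors, the estimated 2x2 error covariance
    matrix, and the covariance matrix of the stacked coefficients.
    """
    n = len(y1)
    # Step 1: equation-by-equation OLS to obtain residuals.
    b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
    # Estimated contemporaneous covariance of the two error terms.
    sigma = resid.T @ resid / n
    # Step 2: GLS on the stacked system (equation 1 rows, then equation 2).
    k1, k2 = X1.shape[1], X2.shape[1]
    X = np.zeros((2 * n, k1 + k2))
    X[:n, :k1] = X1
    X[n:, k1:] = X2
    y = np.concatenate([y1, y2])
    # The stacked errors have covariance sigma (x) I_n; weight by its inverse.
    W = np.kron(np.linalg.inv(sigma), np.eye(n))
    XtW = X.T @ W
    cov = np.linalg.inv(XtW @ X)
    beta = cov @ (XtW @ y)
    return beta[:k1], beta[k1:], sigma, cov


# Synthetic data only (not the paper's data): correlated errors, and a
# different regressor set in each equation.
rng = np.random.default_rng(0)
n = 100
x, z = rng.normal(size=n), rng.normal(size=n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
X_teach = np.column_stack([np.ones(n), x])
X_res = np.column_stack([np.ones(n), x, z])
y_teach = X_teach @ np.array([1.0, 2.0]) + errors[:, 0]
y_res = X_res @ np.array([-1.0, 0.5, 1.5]) + errors[:, 1]
b_teach, b_res, sigma_hat, cov_hat = sur_fgls(X_teach, y_teach, X_res, y_res)
```

When both equations contain exactly the same regressors, the SUR coefficient estimates coincide with equation-by-equation OLS; the system approach then mainly serves to estimate the cross-equation covariance and to test hypotheses across equations.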
Lastly, we referred to the Breusch-Pagan (BP) test [66] to assess whether the errors across the two equations were contemporaneously correlated. When the null hypothesis of independence among the residuals of the equations is rejected, SUR models provide more efficient estimates than separate models estimated by classical ordinary least squares (OLS). The gain in efficiency [65] is greater as the linear dependence among the error terms of the different equations increases [67], as well as when the sample size is large and there are high levels of multicollinearity among covariates [68].
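For a two-equation system, the Breusch-Pagan test of independence reduces to a Lagrange multiplier statistic equal to the number of observations times the squared correlation between the residuals of the two equations, compared against a chi-square distribution with one degree of freedom. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the residuals are assumed to come from OLS fits with an intercept, so that they have zero mean.

```python
import math

import numpy as np


def bp_independence_test(resid1, resid2):
    """Breusch-Pagan LM test of independence between two equations.

    Returns (lm, p_value), where lm = n * r12^2 and r12 is the correlation
    between the two residual series (assumed to have zero mean).
    """
    r1 = np.asarray(resid1, dtype=float)
    r2 = np.asarray(resid2, dtype=float)
    n = r1.size
    r12 = (r1 @ r2) / math.sqrt((r1 @ r1) * (r2 @ r2))
    lm = n * r12 ** 2
    # Survival function of a chi-square with 1 degree of freedom:
    # P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical residuals from the teaching and research equations.
teaching_resid = [0.4, -0.3, 0.1, -0.5, 0.3]
research_resid = [0.5, -0.2, 0.0, -0.6, 0.3]
lm_stat, p_val = bp_independence_test(teaching_resid, research_resid)
```

A small p-value leads to rejecting independence, in which case joint estimation is preferred to separate OLS.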

Control Variables
We included in the extended models several control variables.

Size at University Level
According to the literature, there is an interaction between economies of scope and the size of the operations of universities. This interaction has been repeatedly found in the literature. Olivares and Wetzel [25] show that diseconomies of scope are more likely to be found in small- and medium-sized universities than in large ones, a finding supported also in [9] and [5,25]. The notion behind these results is that large institutions can better exploit economies of scope due to a larger portfolio and greater variety and diversity of programs. In other words, it is likely that the economies of scope change their intensity according to the size of teaching and/or research. We therefore included two variables that represent the size of the university. The first (size) is the total number of undergraduate students (ISCED 5-8) in all fields, at university level, taken from ETER. This is a measure of the size of the umbrella institution under which the individual disciplines operate.

Total Publications
A second indicator of size is given by the count of the total number of publications in all fields covered by Scopus in the subject categories corresponding to the FoE 6-9, or STEM disciplines. This indicator captures the total volume of scientific activity at university level. Different from the above variable (total number of students in all fields), this variable is only available for the STEM disciplines, so it underestimates the overall size of publications at university level. This control variable has been introduced only in the models where we modeled as research outputs the top 10% and top 25% of publications and citations, respectively (findings in Supplementary Materials).

International Co-Authorship
In a related paper, we used a variable from Scopus describing the percentage of total publications at university level co-authored with foreign authors (IC). It is positively correlated to individual research productivity in a multilevel framework [38].

Governance of University
By governance, we mean here the legal status of universities, which can be classified as public, private, and private-government-dependent. There are few studies comparing public and private universities with respect to the research orientation, intensity, or productivity.

Generalist vs. Specialist Model of University
Generalist universities cover the entire spectrum of disciplines, while specialist universities focus on one or a few fields, most frequently in applied fields such as engineering (technical universities, or polytechnics), medicine (medical schools) in STEM, as well as business or law in SSH. The distinction between generalist and specialist may have an impact on economies of scope in research, between different disciplines.

Results Organization and Presentation
For each modelled research output, we first reported the model estimates in the "pure" specification described by Equation (1), interpreting the coefficients of the inputs, their squared terms, and the interaction between inputs [69]. Second, we presented the "extended" version of the function with control variables, as described by Equation (2). In each table we changed the research output while holding constant the teaching output, represented by the number of graduates.
The "extended" version of the production function introduces factors that cannot be considered proper inputs but might influence the production process, and that in some cases are not under the full control of the university. Including them is a relevant step: it helps to explain efficiency and to identify "external" factors that create inefficiency, and both contribute to a better understanding, and hence improvement, of university performance [70].
Moreover, to assess whether the teaching and research processes are independent of each other, we used the BP test. From a methodological perspective, if the equations are independent (i.e., there is no significant correlation of the residuals across the two production processes), no efficiency is gained by using SUR rather than running separate OLS estimates, since the two processes can be modeled independently. On the other hand, correlation between the residuals indicates dependence between the two processes, in which case a simultaneous estimate is needed to model the two outputs (teaching and research) correctly.
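As an illustration of how the two specifications differ, the following minimal Python sketch builds the regressors of a translog function with two inputs, with and without control variables. It is not the code used for the estimates reported here: the function name `translog_design`, the variable names, and the toy numbers are hypothetical, and the sketch assumes that the squared terms enter without the factor 1/2 used in some translog formulations.

```python
import numpy as np

def translog_design(staff, students, controls=None):
    """Build translog regressors for two inputs (illustrative names only)."""
    ln_staff = np.log(np.asarray(staff, dtype=float))
    ln_stud = np.log(np.asarray(students, dtype=float))
    columns = [
        np.ones_like(ln_staff),  # intercept
        ln_staff,                # first-order term, academic staff
        ln_stud,                 # first-order term, students
        ln_staff ** 2,           # squared term, academic staff
        ln_stud ** 2,            # squared term, students
        ln_staff * ln_stud,      # interaction between the two inputs
    ]
    X = np.column_stack(columns)
    if controls is not None:
        # "Extended" specification: append control variables as extra columns.
        X = np.column_stack([X, np.asarray(controls, dtype=float)])
    return X

# Hypothetical data for four universities.
staff = [120.0, 450.0, 80.0, 900.0]
students = [2500.0, 9000.0, 1200.0, 21000.0]
int_coauth = [0.35, 0.48, 0.22, 0.55]   # hypothetical share of co-authored papers
generalist = [1.0, 1.0, 0.0, 1.0]       # hypothetical generalist dummy
controls = np.column_stack([int_coauth, generalist])

X_pure = translog_design(staff, students)            # "pure" specification
X_ext = translog_design(staff, students, controls)   # "extended" specification
print(X_pure.shape, X_ext.shape)
```

The same design matrix can be used for both the education and the research equation, with only the dependent variable changing.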
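The logic of the test can be sketched in a few lines of Python. This is an illustration on simulated data, not the estimation code of the paper. It assumes that BP denotes the Breusch-Pagan Lagrange multiplier test of independence, which for two equations reduces to n times the squared correlation of the equation-by-equation OLS residuals, compared with a chi-square distribution with one degree of freedom. All variable names and coefficients are hypothetical.

```python
import math
import numpy as np

def ols_residuals(X, y):
    """Residuals from an ordinary least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def bp_independence_test(resid_a, resid_b):
    """LM test of independence between two equations (assumed form)."""
    n = len(resid_a)
    r = np.corrcoef(resid_a, resid_b)[0, 1]
    lm = n * r ** 2
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return r, lm, p_value

# Simulated data: two equations whose errors share a common component.
rng = np.random.default_rng(0)
n = 500
ln_staff = rng.normal(size=n)
ln_students = rng.normal(size=n)
X = np.column_stack([np.ones(n), ln_staff, ln_students])

shared = rng.normal(size=n)
err_teaching = 0.6 * shared + 0.8 * rng.normal(size=n)
err_research = 0.6 * shared + 0.8 * rng.normal(size=n)

ln_graduates = X @ np.array([1.0, 0.3, 0.6]) + err_teaching
ln_publications = X @ np.array([0.5, 0.7, 0.1]) + err_research

r, lm, p_value = bp_independence_test(
    ols_residuals(X, ln_graduates),
    ols_residuals(X, ln_publications),
)
print(f"residual correlation={r:.3f}, LM={lm:.2f}, p-value={p_value:.4g}")
```

In this sketch a positive and significant residual correlation corresponds to what the text reads as economies of scope, and a negative and significant one would correspond to diseconomies of scope.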

Estimation Results: Total Publication (TOT_PUB)
Tables 2 and 3 report the estimates for the reduced and extended Translog production functions, as described by Equations (1) and (2), respectively. It is important to note that in all tables reported throughout the paper (Tables 2-5), p-values in bold refer to statistically significant coefficients (at the 0.01, 0.05 or 0.10 level).
In Table 2, in the education equation, the signs of the input coefficients are, as expected, positive for academic staff (although significant only for F05 science) and for students in the area (significant in three cases). In the research equation, academic staff enters negatively in two cases, in one of which the squared term is positive, suggesting a U-shaped relation. The interaction term is not significant in most cases, and in the three cases where it is significant its sign is not stable (two positive, one negative).
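To make the U-shaped reading explicit, consider a generic translog form with academic staff $x_1$ and students $x_2$ as inputs. The notation below is an assumption made for illustration and may differ from Equation (1), for instance if the squared terms carry a factor 1/2.

```latex
\ln y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2
      + \beta_{11} (\ln x_1)^2 + \beta_{22} (\ln x_2)^2
      + \beta_{12} \ln x_1 \ln x_2 + \varepsilon

% Output elasticity with respect to academic staff
\frac{\partial \ln y}{\partial \ln x_1}
  = \beta_1 + 2 \beta_{11} \ln x_1 + \beta_{12} \ln x_2

% Turning point (elasticity equal to zero), defined when \beta_{11} \neq 0
\ln x_1^{*} = - \frac{\beta_1 + \beta_{12} \ln x_2}{2 \beta_{11}}
```

With $\beta_1 < 0$ and $\beta_{11} > 0$, the elasticity is negative below $x_1^{*}$ and positive above it, which is the convex, U-shaped pattern described in the text.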
The BP test is significant in three areas: F05 science, F06 computer science, and F07 engineering. In all three cases we find positive coefficients. Here we find confirmation, with a novel econometric method, of the existence of economies of scope, or positive interdependence between the equation describing the production of graduates and the equation describing the publication of research.
In Table 3, the same models are estimated in the extended version, including control variables. Adding the control variables changes the pattern of findings remarkably. First, in no estimate did academic staff enter with a positive and significant coefficient. The only significant coefficients, in the TOT_PUB equation (F05 science and F06 computer science), are negative; in the former case the squared term has a positive coefficient, suggesting a convex, U-shaped relation.
At the same time, the coefficients of number of students become unstable, with three positive and three negative significant coefficients.
Among the control variables, the overall size of the university enters negatively in three cases (one positive), supporting the notion that economies of scale are not widespread at the institutional level. The specialization model is negatively associated with production, meaning that generalist universities are at an advantage. Age and governance (private vs. public) do not have any impact.
The single most important variable to explain the level of publication in all fields is the level of international cooperation at university level. Being affiliated with a university with strong international cooperation, as witnessed by international co-authorship, is positively associated with the volume of publications. Finally, the BP test is significant in only one case, F06 computer science.

Estimation Results: Total Citations (TOT_CIT)
In this sub-section we examine the same education equation (output: number of graduates) but change the research equation, taking as output the total number of citations received. This model focuses on the average quality of research, rather than on the volume of production. In Table 4 the signs of the input coefficients are as expected, with two exceptions. The coefficients for the number of academic staff are positive (two cases); those for the number of students are positive in four cases but negative in the research equation (total citations) in F08 agriculture and F09 medicine. Here it seems that the larger the number of students in the area, the lower the quality of research as measured by total citations.
The pattern of BP test results mirrors that of the reduced model for total publications, with positive and significant coefficients in F05 science, F06 computer science, and F07 engineering.
When adding the control variables, however, several new results were found (Table 5). The number of students remained positively associated with educational output but entered negatively for research production in F05 science and F09 medicine. Academic staff was not significant for educational production, while it was negatively associated with total citations in F05 science and F06 computer science.
Among the control variables we found a similar pattern to the one found for total publications: size and specialization had a negative or non-significant association, while the level of international cooperation was the single most important explanatory factor for scientific production in all fields (from F05 to F09) and for educational production in three cases. Age and legal status (private vs. public), as in the models above, did not play any role. Finally, there was only one case of significant BP test, suggesting economies of scope and confirming the findings of the previous model.
Summing up, although we found few cases confirming economies of scope, the analysis presented above never yielded evidence of diseconomies of scope between research and teaching.

Discussion, Limitations and Conclusions
Our results contribute in several ways to the current debate in higher education. We find that they confirm the validity of the European university model in several directions.
First, they show that generalist universities, covering a large spectrum of disciplines, are not at a disadvantage with respect to specialist universities (mainly in engineering, medicine, and business).
Second, our results confirm that combining research and education on a large scale, following the Humboldtian model, does not damage the research mission. This issue is at the core of a heated debate [6,28–30] in which some authors have advocated the creation of research-intensive institutions separate from universities with large student enrolments. Another proposal has been the institutional separation of government funding into performance-based funding based exclusively on research and general support based on the number of students. Other authors [35,36] have discussed the relative advantages of the US and European models of scientific excellence: in the US, top scientists are relatively concentrated in the Ivy League and in a few top state universities, while in most European countries (perhaps with the UK and Switzerland as exceptions) they are more dispersed. Consequently, in top US universities all departments compete at the world level, while in European universities there is more internal heterogeneity with respect to scientific excellence. Our results confirm that research excellence in Europe is not concentrated in a small number of institutions but is spread across many institutions. They suggest a possible interpretation: European universities must find a balance between research performance and education on a large scale. There are relatively few world-class research-intensive universities (such as Oxford and Cambridge in the UK, or ETH and EPFL in Switzerland), a fact which is reflected in the relative weakness of European universities in global university rankings.
What do our results add to this debate? One of the (often implicit) assumptions behind the suggestion that research-intensive universities should be somewhat "insulated" from mass education is that research and education are on average substitutes for each other. In other words, the assumption is that there are on average diseconomies of scope. We find that this assumption is not supported by the data: there is no evidence of diseconomies of scope.
This does not mean that the debate on US vs. Europe should be closed here. There might be other advantages of a concentrated model, perhaps in the international visibility or the ability to attract foreign graduate students. But the debate cannot be based on wrong assumptions of diseconomies of scope between education and research.
However, there are several limitations to this study. First, we relied on a definition of academic staff which is officially provided by National Statistical Authorities (NSAs). These data have been validated within the ETER project, but experience in comparing data across countries with detailed knowledge of subtle distinctions is still limited. There is no uniform standard for the classification of academic staff. In other words, we cannot exclude the presence of measurement errors. By contrast, data on students and graduates are considered reliable (they are managed administratively) and comparable (they follow ISCED classifications, which are followed by all NSAs).
Second, we only considered academic staff and students as inputs. As stated above, data on funding of universities are available only for a subset of countries and few years. Further work is needed to standardize the definitions of funding across European countries and to experiment with full models in which human and financial inputs are considered jointly.
This paper offers contributions to the methodology and literature on the economics of production of higher education and research. With respect to methodology, we adopted a set of techniques and tools (SUR and BP test) that allowed us to estimate the interdependence between equations describing the production of separate outputs by the same combination of inputs. This is a complementary approach that enriches the toolbox of the literature.
With respect to empirical results, we excluded the presence of diseconomies of scope between teaching and research in all STEM disciplines, both for the production of all research outputs and for the production of excellent research. The arguments about the inefficiency of keeping research and teaching together in the same institutions are not empirically grounded.
We found only one important exception, that is, diseconomies of scope in the field of Medicine, when the output is measured in terms of excellent production of research (top 10% or 25% journals).