How Efficiently Do Elite US Universities Produce Highly Cited Papers?

While output and impact assessments were initially at the forefront of institutional research evaluations, efficiency measurements have become popular in recent years. Research efficiency is measured by indicators that relate research output to input. Considering research input in research evaluation is an obvious step, since the output depends on the input. The present study is based on a comprehensive dataset with input and output data for 50 US universities. As input, we used research expenses, and as output the number of highly-cited papers. We employed Data Envelopment Analysis (DEA), Free Disposal Hull (FDH) and two more robust models: the order-m and order-α approaches. The results of the DEA and FDH analyses show that Harvard University and Boston College can be called especially efficient compared to the other universities. While the strength of Harvard University lies in its high output of highly-cited papers, the strength of Boston College is its small input. In the order-α and order-m frameworks, Harvard University remains efficient, but Boston College becomes super-efficient. We produced university rankings based on adjusted efficiency scores (subsequent to regression analyses), in which single covariates (e.g., the disciplinary profile) are held constant.


Introduction
For several years, the science system has been characterised by the transition from academic science to post-academic science. "Bureaucratization" is the term used to describe most of the processes connected with post-academic science: "The transition from academic to post-academic science is signaled by the appearance of words such as management, contract, regulation, accountability, training, employment, etc. which previously had no place in scientific life. This vocabulary did not originate inside science, but was imported from the more 'modern' culture which emerged over several centuries in Western societies – a culture characterized by Weber as essentially 'bureaucratic'" ([1], p. 82). As an important part of universities' commitments to accountability (towards the government), research evaluation has assumed a steadily growing importance in the science system. While academic science has, since its beginnings, been characterised by the use of the peer-review system to assess single outcomes of science (e.g., manuscripts, [2]), post-academic science is characterised by the use of quantitative methods of research evaluation. According to Wilsdon, et al. [3], there are currently "three broad approaches to the assessment of research: a metrics-based model; peer review; and a mixed model, combining these two approaches. Choosing between these remains contentious" (p. 59).
Typical metrics are publications and citations [4]. For example, the government of Mexico follows a metrics-based model by allocating funds to higher education institutions on the basis of several indicators [5].
A special characteristic of research evaluation in the area of post-academic science is the emergence of university rankings. Here, metrics are used to rank the universities in a country or worldwide [6]. University rankings have some obvious advantages. For example, they offer a quick, simple, and easy way of comparing universities (worldwide). The groups most interested in the rankings are students, the public and governments [3]. However, many critiques have been published in recent years (see, e.g., [7]) that focus on the methods and the arbitrary weightings used to combine different metrics. Daraio, et al. [8] listed four points summarizing the main criticisms of rankings: mono-dimensionality, statistical robustness, dependence on university size and subject mix, and lack of consideration of the input-output structure.
In this scientometric study, we take up the last point, "lack of consideration of the input-output structure", and put forward a possible approach for considering the input-output structure in institutional evaluation for discussion (in scientometrics). Since positions in rankings depend on certain context factors [9,10], rankings should offer information not only on the output, but also on the relation of input to output. Moed and Halevi [11] define input indicators as follows: "indicators that measure the human, physical, and financial commitments devoted to research. Typical examples are the number of (academic) staff employed or revenues such as competitive, project funding for research" (p. 1990).
If metrics are used that relate output to input (e.g., the number of papers per full-time equivalent researcher), research efficiency is measured. This study is intended to explore approaches for measuring the efficiency of universities. It follows on from a recent discussion in the Journal of Informetrics, which started with Abramo and D'Angelo's [12] doubts about the validity of established bibliometric indicators and the comments that ensued. Instead of these indicators, Abramo and D'Angelo argue in favor of measuring scientific efficiency. For example, they proposed the Fractional Scientific Strength (FSS) indicator, a composite indicator that considers the total salary of the research staff and the total number of publications weighted by citation impact (when used at the university level).

Conceptual Framework
This study follows the call of Bornmann and Haunschild [13] and Waltman, et al. [14], who proposed in comments on the paper by Abramo and D'Angelo [12] that scientometricians should explore methods and available data for measuring the efficiency of research. We do this both by using a unique data set and by applying approaches rarely used in academic efficiency analysis. The data set comprises information on the top 50 US universities in the Times Higher Education (THE) Ranking 2015. Input is defined by annual research expenses. The output is the number of publications among the 1% most frequently cited publications in a specific field and given year (P top 1%). The focus on these top publications follows from our focus on elite universities, represented by the 50 best-ranked US universities in the THE Ranking. Whereas the input we used is standard in the literature, our output variable has (to the best of our knowledge) never been used before, although it is very suitable for studying the efficiency of top institutions. Other output variables, such as the total number of publications, the number of graduates or third-party funding, could equally have been considered. However, we had three reasons to focus on highly-cited papers: (1) Reputable universities should be evaluated with indicators that focus on excellent research. (2) Gralka, et al. [15] and other studies have shown that conclusions are very similar if (top-cited) publications, third-party funding or other indicators are used in the efficiency analysis. (3) We wanted to trace out the 'pure' research effect in the efficiency analysis; including, for instance, teaching-oriented variables would divert from this goal.
University rankings identify top universities from around the world using various indicators. We initially intended to undertake a global university analysis by including not only US universities, but also top universities from other countries. This would require an international database with comparable input and output data. Whereas such data are available on the publication output side (e.g., in the Scopus database from Elsevier), they are not available on the input side.
We focus, therefore, in this study on US universities for which comparable data are available in a national database.
The most frequently used tool in the academic efficiency literature is Data Envelopment Analysis (DEA) and variations of this non-parametric approach. The DEA yields an institutional efficiency score between 0 and 1, where 1 means efficient. However, these non-parametric approaches have several shortcomings. There is no well-defined data-generating process, and a deterministic approach is assumed: "Any deviation from the frontier is associated with inefficiency, and it is not possible to take into consideration casual elements or external noise which might have affected the results" [16]. The most serious drawback of the DEA in its simplest form is that it is extremely vulnerable to outliers and measurement errors.¹ Thus, we further employ the Free Disposal Hull (FDH), which is less prone to outliers, and apply partial frontier analysis (PFA), which nests FDH and DEA. Specifically, we employ the order-m [17] and order-α [18] approaches. Here, the sensitivity to outliers and measurement errors is reduced by allowing for super-efficient universities with efficiency scores larger than 1. To this end, sub-samples of the data are used and resampling techniques are employed. Using four different approaches allows us to check the robustness of our conclusions. Finally, we calculate efficiency scores adjusted for institutional background and research focus.
There is plenty of literature examining the efficiency of (higher) education institutions. Early examples are Lindsay [19] and Bessent, et al. [20]. Worthington [21] and, more recently, Rhaiem [22] as well as De Witte and López-Torres [23] provide comprehensive surveys of the literature. PFA has rarely been used in efficiency analyses of higher education institutions. Bonaccorsi, et al. [24,25] applied the order-m approach to study 45 universities in Italy and 261 universities across four European countries, respectively. De Witte, et al. [26] used DEA and PFA to study the performance of 155 professors working at a Business & Administration department of a Brussels university college. Bruffaerts, et al. [27] used FDH and PFA to study the efficiency of 124 US universities. The authors tried to explain which factors drove the efficiency scores; however, they do not provide scores for individual universities. Gnewuch and Wohlrabe [28] used partial frontier analysis to identify super-efficient economics departments. Several studies in the literature have investigated efficiency aspects of the US higher education system [29][30][31][32][33][34].
The paper is organized as follows. It starts by explaining the four statistical approaches used in this study for calculating the efficiency scores of the universities. The paper subsequently describes the data set and provides some descriptive statistics. In the first step, we calculate efficiency scores for the universities. In the second step, we calculate adjusted efficiency scores, which are adjusted to the different profiles of the universities (e.g., their disciplinary profiles). After presenting our results, we discuss the implications of our analysis.

(Partial) Academic Production Frontier Analysis
The main goal of efficiency measurement is to calculate an efficiency score for each unit (here: each university). There are two main concepts: (1) input-orientated efficiency, where the output is held constant and the inputs are adjusted accordingly; (2) output-orientated efficiency, where the output is maximized for a given input. These concepts differ in the direction in which the distance of a university from the efficiency frontier is measured. In this paper, we use input-efficiency and variable returns to scale (VRS). Regarding the former choice, output-efficiency could also be considered, as US universities may control both their inputs (the acquired budget) and their outputs. In our estimation framework, we cannot test the nature of economies of scale. The partial frontier approaches used in this paper assume constant returns to scale. Furthermore, the results do not provide evidence on how the production process for top-cited publications works.

¹ There are also parametric approaches (e.g., stochastic frontier analysis, SFA), which have several disadvantages too. One disadvantage is that they rely on distributional assumptions; a specific functional form is required. Moreover, the potential endogeneity of inputs cannot be accounted for.
We start this section by describing two full production frontier approaches for eliciting academic efficiency scores: the most commonly used DEA and the less well-known FDH approach. We subsequently outline two PFA approaches: order-m and order-α. Both techniques are generalizations of the FDH approach, as they nest it, and both allow for super-efficient universities, i.e., universities with efficiency scores larger than 1. In Section 3.1.4, we illustrate the four approaches with a simple example.
We denote the input and output of a university i by $x_i$ and $y_i$, respectively. We consider N universities. The corresponding efficiency score is given by $e_i$.
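For reference, the basic input-oriented DEA under variable returns to scale is commonly written as the linear program below. This textbook formulation, and the notation $\hat{e}^{DEA}_i$, are assumptions about how the authors state it: the score is the smallest factor θ by which university i's input can be scaled down while a convex combination of all universities (weights $\lambda_j$ summing to 1) still produces at least its output.

```latex
\hat{e}^{DEA}_i = \min_{\theta,\lambda} \left\{ \theta \;\middle|\;
  y_i \le \sum_{j=1}^{N} \lambda_j y_j,\;
  \theta x_i \ge \sum_{j=1}^{N} \lambda_j x_j,\;
  \sum_{j=1}^{N} \lambda_j = 1,\;
  \lambda_j \ge 0 \;\; \forall j \right\}
```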
In the DEA, λ is a vector of weights with which the peer universities are combined into a reference unit for university i; the weights are chosen to maximize productivity. In this paper, we focus on the basic version of the DEA. To address outliers, sampling and measurement issues, we rely on the partial frontier analysis introduced below. For (robust) extensions of the DEA, we refer to Bogetoft and Otto [36] and Wilson and Clemson [37].
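With only one input (research expenses) and one output (P top 1%), as in this study, the DEA linear program has just two constraints besides non-negativity (the output constraint and the sum of the weights), so an optimal solution mixes at most two peer universities. The Python sketch below exploits this to compute input-oriented VRS DEA scores by enumeration instead of calling an LP solver. The function name `dea_vrs_input` and the toy data are hypothetical; with several inputs or outputs, a proper LP solver would be needed.

```python
from itertools import combinations


def dea_vrs_input(x, y):
    """Input-oriented DEA scores under variable returns to scale.

    Assumes exactly one input x[i] and one output y[i] per university.
    The DEA linear program then has two constraints (output and
    sum(lambda) = 1), so an optimal solution mixes at most two peers;
    we enumerate single peers and pairs instead of calling an LP solver.
    """
    n = len(x)
    scores = []
    for i in range(n):
        best_input = x[i]  # university i can always reproduce itself
        # Single peers producing at least as much output as i.
        for j in range(n):
            if y[j] >= y[i]:
                best_input = min(best_input, x[j])
        # Pairs whose convex combination produces exactly y[i].
        for j, k in combinations(range(n), 2):
            lo, hi = (j, k) if y[j] <= y[k] else (k, j)
            if y[lo] < y[i] < y[hi]:
                w = (y[hi] - y[i]) / (y[hi] - y[lo])  # weight on the lower peer
                best_input = min(best_input, w * x[lo] + (1 - w) * x[hi])
        scores.append(best_input / x[i])  # theta = reference input / own input
    return scores


# Hypothetical toy data: research expenses (x) and highly cited papers (y).
x = [2, 4, 6, 3, 5]
y = [1, 3, 4, 1, 2]
print(dea_vrs_input(x, y))
```

In this toy example, the fifth university (input 5, output 2) scores 0.6, because an equal-weight mix of the first two universities produces the same output with an input of 3.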
In the FDH approach, we compare each university i with every other university in the data set (j = 1, …, N). The set of peer universities that satisfy the condition $y_{lj} \ge y_{li}\ \forall l$ (i.e., that produce at least as much of every output l as university i) is denoted by $B_i$. Among the peer universities, the one that exhibits the minimum input serves as the reference for i, and $\hat{e}^{FDH}_i$ is calculated as the relative input use:

$\hat{e}^{FDH}_i = \min_{j \in B_i} \left( x_j / x_i \right)$

Universities that exhibit the minimum input for their output level serve as references; for these universities, the efficiency score $\hat{e}^{FDH}_i$ is 1. The FDH approach was introduced by Deprins, et al. [38].
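The FDH rule translates almost literally into code. The Python sketch below builds the peer set $B_i$ and returns the minimal relative input use; the function name `fdh_input` and the toy data are hypothetical, and a single output is assumed, as in this study. Because FDH compares university i only with observed universities, not with convex combinations of them, its scores are never lower than the corresponding DEA scores under variable returns to scale.

```python
def fdh_input(x, y):
    """Input-oriented FDH scores for one input x[i] and one output y[i]."""
    scores = []
    for i in range(len(x)):
        # B_i: universities producing at least as much output as i (includes i).
        peers = [j for j in range(len(x)) if y[j] >= y[i]]
        # Relative input use of the least input-intensive peer.
        scores.append(min(x[j] for j in peers) / x[i])
    return scores


x = [2, 4, 6, 3, 5]  # hypothetical research expenses
y = [1, 3, 4, 1, 2]  # hypothetical counts of highly cited papers
print(fdh_input(x, y))
```

Here the fifth toy university (input 5, output 2) scores 0.8, since its best observed peer (input 4, output 3) uses four fifths of its input.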

Order-m Efficiency
In the case of order-m efficiency, the partial aspect enters by abandoning the assumption that universities are benchmarked against the best-performing universities in the entire sample. Instead, the best performance within a sample of m peers is considered. Daraio and Simar [39] proposed the following four-step procedure:

1. Draw from $B_i$ a random sample of m peer universities with replacement.
2. Calculate a pseudo-FDH efficiency score ($\hat{e}^{FDH,d}_{mi}$) using the artificially drawn data.
3. Repeat steps 1 and 2 D times (d = 1, …, D).
4. Calculate the order-m efficiency as the average of the pseudo-FDH scores: $\hat{e}^{OM}_{mi} = \frac{1}{D} \sum_{d=1}^{D} \hat{e}^{FDH,d}_{mi}$.

Publications 2019, 7, 4 5 of 15

A potential result of this procedure is that the order-m efficiency scores exceed the value of 1. This is due to the resampling: in each replication d, university i may or may not be used for its own comparison. Therefore, this procedure allows for super-efficient universities (with $\hat{e}^{OM}_{mi} > 1$) located beyond the estimated production-possibility frontier. Two parameters need to be determined beforehand: m and D. The choice of D only affects accuracy: the higher D is, the more accurate the results are, at the cost of longer computation time. The choice of m is more critical: the smaller m is, the larger is the share of super-efficient universities. For m → ∞, the approach converges to the FDH results.

Order-α Efficiency
The order-α approach generalizes the FDH in a different way. Instead of searching for the minimum input-output relationship among the available peer universities (the benchmark), order-α uses the (100 − α)th percentile:

$\hat{e}^{OA}_{\alpha i} = P_{(100-\alpha)} \left( \left\{ x_j / x_i : j \in B_i \right\} \right)$

When α = 100, the approach replicates the FDH results. For α < 100, some universities may be classified as super-efficient. Like m in the approach explained in Section 3.1.2, α can be considered a tuning parameter: the smaller α is, the larger is the share of super-efficient universities.
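The order-α score replaces the minimum in the FDH rule with a percentile. The sketch below uses the nearest-rank definition of the percentile, which is our assumption (software packages may interpolate differently); the function name `order_alpha_input` and the toy data are hypothetical. With α = 100 it returns the FDH scores, and smaller α values move the benchmark up the distribution of relative input use, so scores above 1 become possible.

```python
import math


def order_alpha_input(x, y, alpha):
    """Order-alpha input efficiency (one input, one output), 0 < alpha <= 100."""
    scores = []
    for i in range(len(x)):
        # Relative input use x_j / x_i for every peer j in B_i (includes i).
        ratios = sorted(x[j] / x[i] for j in range(len(x)) if y[j] >= y[i])
        # (100 - alpha)th percentile, nearest-rank definition (an assumption);
        # alpha = 100 gives rank 0, i.e., the minimum, which is the FDH score.
        rank = math.ceil((100 - alpha) * len(ratios) / 100)
        scores.append(ratios[max(rank, 1) - 1])
    return scores


x = [2, 4, 6, 3, 5]  # hypothetical research expenses
y = [1, 3, 4, 1, 2]  # hypothetical counts of highly cited papers
print(order_alpha_input(x, y, alpha=100))  # identical to FDH
print(order_alpha_input(x, y, alpha=50))   # benchmark at the median peer
```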

A Simple Example for Explaining the Approaches
To illustrate the full and partial frontier approaches, we sketch them in Figure 1, which plots input-output combinations for various universities. The results of the DEA are given by the straight line. Universities A, B and E define the academic production frontier. These universities have an efficiency score of 1, i.e., an optimal input-output combination. The other universities, to the right of or below the frontier, are considered inefficient. In the case of the FDH, the outer hull is drawn more tightly around the data by also considering universities that are not on the DEA frontier. In Figure 1, universities C and D are now also efficient. Since the frontier has shifted towards the right, the efficiency scores of all other universities slightly increase: their distance to the frontier is smaller than for the DEA. Applying the partial frontier approaches, order-m or order-α, yields a different picture. Only universities C and D are efficient, with a score of 1; universities A, B and E are considered super-efficient, with scores larger than 1. Contrary to what the figure might suggest, the two approaches do not necessarily yield the same results.

Regression Analyses and Adjusted Efficiency Scores
We performed regression analyses to produce adjusted efficiency scores for the universities. Since the universities have different profiles, the scores from the regression analyses are adjusted for these differences. Thus, the focus of the regression analyses is not on explaining the variance of the scores (as done, e.g., by Agasisti and Wolszczak-Derlacz [40]). We used Stata [41] to compute the regression analyses.
The efficiency scores from the four approaches (explained above) are the dependent variables in the models. Four indicators reflecting the disciplinary profile of the university are included as independent variables. We expect the disciplinary profile to be related to the efficiency of a university: the results of Bornmann, et al. [42] show that the field-normalized citation impact of universities depends on their disciplinary profile. For each university, we retrieved the number of publications in four broad disciplines and the multidisciplinary field from the SCImago Institutions Ranking.² For each institution, the percentages of publications belonging to the four disciplines were calculated and included (mean-centered) in the regression model. As a further independent variable, we included a binary indicator of whether the institution is a public (0) or private (1) university. Private universities tend to be elite research institutions. The SCImago Institutions Ranking, which we used to reflect the profiles of the universities in the regression analyses, provides no indicators beyond these two. We used the cluster option in Stata to take into account that the universities are located in different US states. California hosts the largest number of universities in the sample (10). The different regulations and financial opportunities in the states probably lead to correlated efficiency scores for universities within one state. The cluster option corrects the standard errors for the fact that there are up to 10 universities in each state. Although the point estimates of the coefficients are the same as in the regression model without this option, the standard errors are typically larger [43].
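To make the cluster option concrete, the sketch below computes OLS coefficients together with cluster-robust standard errors, using the sandwich estimator with the small-sample factor G/(G − 1) · (N − 1)/(N − K) that Stata applies for `regress` with `vce(cluster ...)`. The function name `ols_cluster`, the use of NumPy and the toy data (eight universities in four states) are assumptions for illustration. As the text notes, the point estimates coincide with plain OLS; only the standard errors change.

```python
import numpy as np


def ols_cluster(y, X, groups):
    """OLS with an intercept and cluster-robust (CR1) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y  # point estimates are plain OLS
    resid = y - X @ beta
    groups = np.asarray(groups)
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = groups == g
        score = X[in_g].T @ resid[in_g]  # summed scores within one cluster (state)
        meat += np.outer(score, score)
    # Small-sample factor used by Stata's regress with vce(cluster).
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Hypothetical data: one covariate, eight universities in four states.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2]
states = ["CA", "CA", "NY", "NY", "MA", "MA", "TX", "TX"]
beta, se = ols_cluster(y, x, states)
print(beta, se)
```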

Data
For our case study, we gathered input and output data for the 50 best-performing US universities as listed in the THE Ranking 2015. As input, we used research expenses. The data source is the National Center for Education Statistics (NCES).³ The NCES gathers data from universities by applying uniform data definitions. This ensures the comparability of inputs across universities, which is an important requirement of efficiency studies [44,45]. The expenses are self-reported by the universities. The category includes institutes and research centers, as well as individual and project research. Information technology expenses related to research activities are also included if the institution reports separate budgets and expenses for information technology resources. Universities are asked to report actual or allocated costs for the operation and maintenance of plant, interest and depreciation. The data refer to the academic year, which starts on 1 July and ends on 30 June. As we needed information for three calendar years (the output data refer to the calendar years 2011, 2012 and 2013), we transformed the data. For example, we obtained the input data for 2013 by taking the mean of the data from the academic years 2012/13 and 2013/14.⁴ This approach might introduce unknown biases, since it assumes that the expenses are spent evenly across the year. Thus, we cannot guarantee that the research expenses correctly represent the production process of a university. Potential measurement errors are a further reason to employ PFA. In the best case, biases cancel out across the sample.
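The conversion from academic to calendar years can be written as a one-line rule: calendar year t covers the second half of academic year (t − 1)/t and the first half of academic year t/(t + 1), so, under the stated assumption that expenses are spent evenly, its expenses are the mean of the two. In the sketch below, academic years are keyed by their starting year (e.g., 2013 for 2013/14); this keying convention, the function name `calendar_year_expenses` and the numbers are hypothetical.

```python
def calendar_year_expenses(academic):
    """Convert academic-year expenses to calendar-year expenses.

    `academic` maps the starting year of an academic year (1 July to
    30 June) to its expenses, e.g. 2013 -> expenses for 2013/14.
    Calendar year t covers January-June of (t-1)/t and July-December of
    t/(t+1); assuming even spending, it is the mean of the two.
    """
    return {
        t: (academic[t - 1] + academic[t]) / 2
        for t in sorted(academic)
        if t - 1 in academic
    }


# Hypothetical expenses for academic years 2010/11 to 2013/14.
academic = {2010: 100.0, 2011: 110.0, 2012: 120.0, 2013: 130.0}
print(calendar_year_expenses(academic))
```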
As we focus on the best US universities, we use as output the number of papers that belong to the 1% most frequently cited papers in the corresponding fields and publication years (P top 1%). The use of this indicator ensures that the citation impact of all papers is standardized with respect to the year and subject area of publication. The typical output variables in efficiency analysis are students, graduates and funding; publications are used rather seldom [46,47]. The data were obtained from the SCImago Institutions Ranking, which is based on Scopus data.⁵ The output data refer to the publication period from 2011 to 2013, with a citation window from publication until the end of 2015. We did not use data later than 2013, since it is standard in bibliometrics to use a citation window of at least 3 years [48]. In Section 4, we focus on the results for 2013. The two other publication years (2011 and 2012) allowed us to examine the stability of the results.
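To make the output indicator concrete, the sketch below counts, for each institution, the papers that fall in the top 1% of citations within their field and publication year. It is a simplified illustration only: the field assignment, tie handling at the threshold, the counting of multi-institution papers and the rounding of the 1% share (here at least one paper per field-year group) are our assumptions and not a description of the SCImago procedure. The function name `p_top1` is hypothetical.

```python
from collections import Counter, defaultdict


def p_top1(papers):
    """Count papers per institution in the top 1% of their field and year.

    `papers` is an iterable of (institution, field, year, citations).
    Simplifications (assumptions): each paper has one institution and one
    field, the top 1% is the len // 100 most cited papers (at least one),
    and ties at the threshold are broken arbitrarily.
    """
    groups = defaultdict(list)
    for institution, field, year, citations in papers:
        groups[(field, year)].append((citations, institution))
    counts = Counter()
    for members in groups.values():
        members.sort(reverse=True)  # most cited first
        k = max(1, len(members) // 100)
        for _, institution in members[:k]:
            counts[institution] += 1
    return counts


# Hypothetical field "A", year 2013: 200 papers with 0..199 citations.
papers = [("X" if c >= 198 else "Y", "A", 2013, c) for c in range(200)]
print(p_top1(papers))
```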
Table 1 shows the descriptive statistics for both the input and the output from 2011 to 2013. The dataset is fairly heterogeneous, as the difference between the minimum and maximum indicates. Furthermore, the standard deviation is quite large compared with the mean. The distributions of the variables are not strongly skewed, as mean and median are very close together. Over time, research expenses increase, whereas the average P top 1% peaked in 2012 and dropped considerably in 2013. The correlation coefficients between research expenses and P top 1% are relatively constant over time: all coefficients are about 0.6, implying a moderate positive relationship.

Results
Following the methods outlined in Section 3.1, we estimated four efficiency scores for each university and year in our data set and obtained the corresponding efficiency rankings for 2011 to 2013.
In contrast to DEA and FDH, which require no tuning parameters, PFA requires the specification of parameters that influence the number of super-efficient universities. The order-α approach requires α, which determines the percentile of the set of peer universities used as the benchmark. Order-m requires m, the number of peer universities randomly drawn from the initial set of universities. Unless we set m = 50 or α = 100, where the partial frontier approaches converge to FDH, we find super-efficient universities by construction. Figure 2 shows the number of super-efficient universities for different values of m and α, using data for the year 2013. The number of super-efficient universities clearly decreases as m or α increases. For m, the number is quite stable beyond 35. We opted for m = 40 and α = 95, which yield 10 and 7 super-efficient universities, respectively.

⁴ There are a few exceptions (n = 7) where the academic year differs slightly across years. We adjusted the figures accordingly.

Results for 2013
Table 2 reports the efficiency scores with their corresponding rankings based on the data from 2013. The universities are sorted by their ranking positions in the THE Ranking 2015. In 2013, 48 universities are not efficient according to the DEA results, because Harvard University and Boston College dominate the estimated academic efficiency frontier: Harvard University due to its very high output values and Boston College due to its small input values relative to its outputs. Using the FDH approach, the number of inefficient universities drops to 38. In the order-α framework, seven universities have a score larger than 1; with the order-m approach, this number increases to 10. In the majority of cases, the order-α approach yields higher scores than the order-m approach.
In both the order-α and the order-m frameworks, Harvard University remains efficient with a score of 1.00 and is not classified as super-efficient. However, Boston College is super-efficient, with the highest score in both the order-α and the order-m model.
Based on a stochastic frontier analysis, Agasisti and Johnes [33] also reported efficiency scores for various US universities, but with the numbers of bachelor's and postgraduate degrees on the output side. Similar to our results, the authors found Harvard University at the top, but Boston College is not among their 20 best universities.

Table 3 shows the coefficients for the correlation between the ranking positions of the universities in the THE Ranking 2015 and the results of the efficiency analyses. The ranking positions from the efficiency analyses correlate with the THE ranking positions only at a (very) low level, compared with the correlations among the results of the different efficiency analyses. The results of the four efficiency approaches are highly correlated, implying that they lead to similar conclusions.

Table 4 shows the Spearman rank correlation coefficients across time (2011, 2012 and 2013) for each approach of the efficiency analysis. They are all above 0.8, suggesting that the results are quite stable over the observed time period.

The results of the regression analyses are shown in Table 5. The dependent variables are the efficiency scores from Table 2 (results from the DEA, FDH, order-α and order-m approaches). We estimated linear regressions because the residuals were approximately normally distributed (as tested with the sktest command in Stata). The coefficients for all disciplines indicate that a lower share of publications in a discipline is associated with higher efficiency scores. Thus, efficiency decreases when a university conducts expensive research: a high share of paper output, especially in the physical and health sciences (which have the largest coefficients), is related to lower efficiency scores. Furthermore, the results in Table 5 indicate that private universities are more efficient than public universities. Many coefficients in the models are not statistically significant, which might be a result of the small number of universities in the study.

Subsequent to the regression models, we calculated efficiency scores for every university that are adjusted for the influence of the independent variables. Thus, the scores are adjusted to the different institutional and field-specific profiles of the universities. It is worth noting that the adjusted scores are not predicted values, but institutional values obtained by adding the residuals from the regression analyses to the mean of the initial efficiency scores.
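A minimal sketch of this adjustment, assuming ordinary least squares with an intercept (the function name `adjusted_scores`, the use of NumPy and the toy numbers are hypothetical): the residual of each university is added to the sample mean of the unadjusted scores, so the adjusted scores keep the original mean while the part of the variation explained by the covariates is removed.

```python
import numpy as np


def adjusted_scores(e, covariates):
    """Mean efficiency score plus the OLS residual of each university."""
    e = np.asarray(e, dtype=float)
    X = np.column_stack([np.ones(len(e)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    residuals = e - X @ beta  # part of the score not explained by covariates
    return e.mean() + residuals


# Hypothetical scores for four universities; covariate = private (1) or public (0).
scores = [0.9, 0.6, 1.0, 0.7]
private = [1, 0, 1, 0]
print(adjusted_scores(scores, private))
```

In the toy example, the private universities' advantage is removed: each university is then compared only with others of the same type, and the adjusted scores still average 0.8.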
The adjusted ranking positions (based on the adjusted scores) are listed in Table 6 alongside the initial ranking positions. Although both sets of ranking positions are (highly) correlated (DEA: r_s = 0.71, FDH: r_s = 0.82, order-α: r_s = 0.64, order-m: r_s = 0.79), there are substantial rank changes for some universities. For example, Harvard University ranks first in the FDH, but if its score is adjusted for the independent variables in the regression model, the score decreases, leading to the 17th ranking position.

Stability of the Results over Time
Table 7 shows the Spearman rank correlation coefficients across time (2011, 2012 and 2013) for each approach of the efficiency analysis (adjusted scores). The coefficients are above or around 0.8, which indicates that the results are (more or less) stable over the publication years considered.
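The Spearman rank correlation used for these stability checks is the Pearson correlation of the ranks, with tied values receiving their average rank. A dependency-free Python sketch follows; the function names are hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` gives the same coefficient.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's r_s as the Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5


# Hypothetical efficiency scores of five universities in two years.
scores_2012 = [0.91, 0.75, 1.00, 0.64, 0.80]
scores_2013 = [0.88, 0.70, 1.00, 0.66, 0.85]
print(spearman(scores_2012, scores_2013))
```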

Discussion
Research evaluation is the backbone of modern science. The emergence of the modern science system is closely related to the introduction of the peer review process in assessments of research results [2]. Whereas output and impact assessments were initially at the forefront of assessments, efficiency measurements have become popular in recent years [22]. According to Moed and Halevi [11], research efficiency or productivity is measured by indicators that relate research output to input. Considering research input in research evaluation is an obvious step, since the output should be directly related to the input. The output is also determined by the context in which research is undertaken [22,49]. In this study, we went one step further: we not only related input to output for universities, but also calculated adjusted efficiency scores, which consider the different institutional and field-specific profiles of the universities. For example, input-output relations are plausibly shaped by the disciplinary profiles of the universities.
The present study is based on a comprehensive dataset with input and output data for 50 US universities. As input, we used research expenses, and as output the number of (highly-cited) papers. The results of the DEA and FDH analyses show that Harvard University and Boston College can be called especially efficient compared with many other universities. Similar results can be found in other efficiency studies that include US institutions. Whereas the strength of Harvard University is its high output of (highly-cited) papers, the strength of Boston College is its small input. In the order-α and order-m frameworks, Harvard University remains efficient, but Boston College becomes super-efficient. Although Harvard University is well known as one of the best universities in the world, the correlations between the universities' positions in the THE Ranking 2015 and the results of our efficiency analyses are relatively low. Thus, taking inputs into account changes the picture of institutional performance.
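To make the contrast between full-frontier (DEA, FDH) and partial-frontier (order-m) scores concrete, the sketch below computes input-oriented scores for one input (research expenses) and one output (highly-cited papers). It is a simplified illustration, not the paper's implementation: the input orientation is assumed, DEA is shown under constant returns to scale (also an assumption), and order-m is estimated by Monte Carlo resampling. The order-α approach, which replaces the expected minimum with a conditional quantile, is not sketched. All function names are hypothetical.

```python
import numpy as np


def dea_crs(x, y):
    """Single-input/single-output DEA under constant returns to scale
    (assumed): output/input ratio relative to the best ratio."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    ratio = y / x
    return ratio / ratio.max()


def fdh_input(x, y):
    """Input-oriented FDH: smallest input ratio x_j / x_i among units j that
    produce at least as much output as unit i (unit i itself included)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.array([np.min(x[y >= y[i]] / x[i]) for i in range(len(x))])


def order_m_input(x, y, m=10, draws=2000, seed=0):
    """Monte Carlo order-m input efficiency: average over `draws` of the
    smallest input ratio among m peers drawn with replacement from units
    with y_j >= y_i. Scores above 1 indicate super-efficiency."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    scores = np.empty(len(x))
    for i in range(len(x)):
        peers = x[y >= y[i]]  # inputs of units producing at least y_i
        sample = rng.choice(peers, size=(draws, m), replace=True)
        scores[i] = np.mean(sample.min(axis=1) / x[i])
    return scores
```

Because order-m compares a university with only m randomly drawn peers, a university with a very small input can obtain a score above 1 (super-efficient), whereas its FDH score is capped at 1. As m grows, the order-m scores converge to the FDH scores.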
Besides the university rankings based on the different statistical approaches to efficiency analysis, we produced rankings using adjusted efficiency scores (subsequent to regression analyses). In these adjusted rankings, for example, Harvard University's ranking position fell. Regression analyses have been used in many other efficiency studies, but commonly to explain differences in efficiency scores [22] rather than to generate adjusted scores for rankings. Adjusted rankings open up new possibilities for institutional performance measurement, as demonstrated by Bornmann, et al. [9], who produced a covariate-adjusted ranking of research institutions worldwide in which single covariates are held constant. For example, users of the ranking produced by Bornmann, et al. [9] can identify institutions that perform very well (in terms of highly cited papers) despite a difficult financial situation in their countries.
What are the limitations of the current study? Although we tried to implement an advanced design for the efficiency analyses, the study is affected by several limitations that should be addressed in future studies.
The first limitation is related to the number of indicators used: we included only one input indicator and one output indicator. One important reason for this restriction is this study's focus on efficiency in research. However, many more indicators could be included in future studies (if the focus is broader and not limited to excellent research, as in this study). The efficiency study of Bruffaerts, et al. [27], which also focuses on US universities, additionally included the number of PhD degrees as an input indicator, as well as several environmental variables (e.g., university size and teaching load). In an overview of efficiency studies, Rhaiem [22] grouped possible output indicators for research efficiency analyses into three categories: research outputs, research productivity indices, and research quality indicators. The categories for possible input indicators are: "Firstly, human capital category refers to academic staff and non-academic staff; secondly, physical capital category refers to productive capital (building spaces, laboratories, etc.); thirdly, research funds category encompasses budget funds and research income; fourthly, operating budget refers to income and current expenditures; fifthly, stock of cumulative knowledge regroups three sub-categories: knowledge embedded in human resources, knowledge embedded in machinery and equipment, and public involvement in R&D; sixthly, agglomeration effects category refers to regional effect and entrepreneurial environment" (p. 595).
The second limitation concerns the quality of the input data [14]. "Salary and investment financial structures differ hugely between countries, and salary levels differ hugely between functions, organizations and countries. To paraphrase Belgian surrealism: a salary is not a salary, while a research investment is not a research investment. Comparability (and hence validity) of the underlying data themselves not only is a challenge, it is a problem" [50]. We tried to tackle this problem by using data for all universities from one source: NCES. However, the comparability of the data for the different universities may remain a problem. Thus, Waltman, et al. [14] recommend that "scientometricians should investigate more deeply what types of input data are needed to construct meaningful productivity indicators, and they should explore possible ways of obtaining this data" (p. 673) in future studies.
The third limitation concerns the use of efficiency studies in research evaluation practice in general. The results of the study by Aagaard and Schneider [51] highlight many difficulties in explaining research performance (output and impact) as a linear function of input indicators. Bornmann and Haunschild [13] see efficiency in research as diametrically opposed to creativity and faulty incrementalism, which are basic elements of every (successful) research process. According to Ziman [1], "the post-academic drive to 'rationalize' the research process may damp down its creativity. Bureaucratic 'modernism' presumes that research can be directed by policy. But policy prejudice against 'thinking the unthinkable' aborts the emergence of the unimaginable" (p. 330).

Figure 1. Graphical exposition of full and partial frontier efficiency analysis.

5 See http://www.scimagoir.com. We preferred Scopus over Web of Science data because the coverage of the Scopus database is much broader.

Figure 2. Number of super-efficient universities for different values of m and α.

Table 1. Descriptive statistics over time.
Notes. Descriptive statistics for the input and output are reported. Research expenses are given in millions of US dollars.

Table 2. Efficiency scores and the corresponding rankings based on different approaches for measuring efficiency in 2013 (sorted by THE Ranking).

Table 4. Spearman rank correlations across time for each approach of efficiency analysis.

Table 5. Beta coefficients and t statistics of the regression models with the 2013 efficiency scores as the dependent variable.

Table 6. Initial efficiency rank positions and adjusted rank positions in 2013 (sorted by adjusted DEA scores).

Table 7. Spearman rank correlations for the adjusted scores from each approach across time.