Comparing Groups of Decision-Making Units in Efﬁciency Based on Semiparametric Regression

: We consider a stochastic frontier model in which a deviation of output from the production frontier consists of two components, a one-sided technical inefﬁciency and a two-sided random noise. In such a situation, we develop a semiparametric regression-based test and compare the technical efﬁciencies of the different decision-making unit groups, assuming that the production frontier function is the same for all the groups. Our test performs better than the previously proposed ones for the same purpose in numerical studies, and also has the theoretical advantage of working under more general assumptions. To illustrate our method, we apply the proposed test to Program for International Student Assessment (PISA) 2015 data and investigate whether an efﬁciency difference exists between male and female student groups at a speciﬁc age in terms of learning time and achievement in mathematics. formal analysis, H. Noh and S. J. investigation, H. Noh and S. J. writing–original preparation, H. Noh and S. J. Yang; writing–review and editing, H. Noh and S. J. Yang. All have and agree to the published version of the


Introduction
Efficiency comparison between groups is currently used in various fields such as banking, insurance, sports, and R&D investment evaluation. Numerous empirical studies frequently analyze group efficiency using so-called Data Envelopment Analysis (DEA). DEA is a body of techniques for measuring relative efficiency by comparing it with the possible frontiers of decision-making units (DMUs) with multiple inputs and outputs. Here, the term DMU is used to collectively refer to all the units in which the production activity takes place. In the DEA framework, the DMU efficiency scores of each group can be obtained after specifying some assumptions appropriate to the situation, and then the comparison of the efficiency distributions of the groups is made on the basis of their obtained scores. For example, Golany and Storberg [1] and Lee et al. [2] applied non-parametric tests, such as the Mann-Whitney (MW) and Kruskal-Wallis tests, to the efficiency scores. Cummins et al. [3] introduced a dummy variable to indicate the groups, and then regressed the efficiency scores on the dummy variable. Simar and Zelenyuk [4] adapted the test developed in Li [5] to the DEA context and applied it to the obtained scores, to test the equality of efficiency distributions. O'Donnell et al. [6] used the concept of a meta-frontier to compare the technical efficiencies of firms that may be classified into different groups.
However, this stream of research under the DEA framework has a limitation in that it does not consider the noise factor in the production process. DEA typically assumes that the inefficiency of the DMU is the only cause of its production not reaching its maximum output, but obviously there are many uncontrolled factors which need to be considered as the cause. From this recognition, Aigner and Chu [7] and Meeusen and van den Broeck [8] first proposed the stochastic frontier model (SFM), Nowadays, the stochastic frontier model is used in a large literature of studies of production. Hence, we feel the need to develop a method and compare the efficiency difference between groups under SFM framework. One pioneering work in this direction is Banker et al. [9]. They developed five DEA-based hypothesis tests to compare the efficiency of groups under SFM. Although the paper referred above is an important development toward group efficiency comparison under SFM, their tests need to improve further.
First, their rather strong assumptions might limit the applicability of the proposed methods. For their parametric tests, they assumed the equality of both noise variance and inefficiency variance across groups. Second, their theoretical justification of the proposed methods needs to be checked. As regards their ordinary least squares (OLS) test of the mean difference in inefficiency, they provided its asymptotic normality as theoretical basis, but to our knowledge, such asymptotic normality is difficult to obtain because of the slow convergence rate of the DEA estimator when the number of input variables is greater than or equal to 2. The same comment is given in Section 3.2 of Simar and Wilson [10] on a similar type of asymptotic normality result as proof of Proposition 1 in Banker and Natarajan [11]. Finally, because they used the DEA methods for SFM, the tests they developed were based not directly on inefficiency itself, but on the inefficiency contaminated by positive measurement error due to noise. This indirect approach can lower the performance of their tests.
This observation has motivated us to develop a theoretically sound tool for comparison of group inefficiencies in the presence of noise. We develop such a methodology using a semi-parametric regression technique instead of DEA methods. The newly developed test performs better than the tests of Banker et al. [9] in numerical studies. It also has the theoretical advantage of working under more general assumptions compared to Banker et al. [9].
The rest of this paper is organized as follows. Section 2 describes our proposed test for group inefficiency comparison. We then perform some simulation studies and compare our test with the tests proposed by Banker et al. [9] in Section 3. We illustrate our method by applying the proposed test to Program for International Student Assessment (PISA) 2015 data and investigate whether an efficiency difference exists between male and female student groups at a specific age in terms of learning time and achievement in mathematics in Section 4. Section 5 provides some discussion and future research topics.

Group Efficiency Comparison under SFM
Assume that we have observations on n DMUs, where each observation consists of a vector of p inputs X i = (X 1,i , . . . , X p,i ) and the corresponding output Y i . We consider the case where n DMUs can be divided into two distinct groups with n l observations (n = n 1 + n 2 ). We assume the following stochastic frontier model for two groups of DMUs: where ε l,i = V l,i − U l,i , V l,i is a random noise term of the lth group with E(V l,i |X i ) = 0, and U l,i is an inefficiency term of the lth group with U l,i ≥ 0 for l = 1, 2. We assume that the same production technology is applied to both DMU groups. Hence, the production frontier function φ(·) is the same throughout the groups, as in Banker et al. [9]. Under this model, we need to estimate the difference E(U 1 ) − E(U 2 ) and test the hypothesis to know which DMU group is more efficient. A novelty of our approach in developing the test is to implement the test without imposing any parametric assumption on the frontier function φ(·), and with minimal assumptions on inefficiency and random noise. Banker et al. [9] also implemented the test without any parametric assumption on φ(·), but with additional restrictive parametric assumptions on noise and inefficiency. In the following sections, we first review the work of Banker et al. [9] and then explain the development of our semiparametric regression-based test.

The Previous Work
To apply the DEA methods to SFM, Banker et al. [9] assumed that the random noise variables V 1,i , V 2,i are bounded above by V max , that is, V 1,i , V 2,i ≤ V max . Under this assumption, they transformed model (1) as SinceŨ l,i = (V max − V 1,i ) + U l,i ≥ 0, they considered the translated production functionφ(·) = φ(·) + V max as a new production function, andŨ l,i as the inefficiency of the DEA framework. The new inefficiencyŨ l i can be estimated asφ(X i ) − Y i afterφ(·) is estimated using the conventional DEA methods. After estimatingŨ l,i using DEA methods, they used it for group efficiency comparison. This approach is advantageous in that we use the strength of the existing well-developed DEA techniques. However, the approach has one disadvantage in that the tests developed are based not on inefficiency (U l,i ) itself, but on the inefficiency contaminated by the positive measurement error (V max − V l,i ) due to random noise. Additionally, the distributional property of the inefficiency estimated using DEA methods is generally hard to derive or quite complicated, making it very difficult to develop a statistical test theory based on estimated inefficiency (estimate ofŨ l,i ). Hence, we are motivated to develop a test for (2) directly based on inefficiency U l,i . We will explain this in the following section.

The Proposed Test
This section introduces our approach to testing the hypothesis in (2). Unlike Banker et al. [9], we do not require that neither the noise variance nor inefficiency should be equal across groups. Moreover, we allow for distributional difference in the composite error ε and input vector X i from the production environmental factors of each group. Specifically, the variance of V l and mean of inefficiency U l can differ by the group as well as conditional distribution of X i , given group l.
First, model (1) can be written as two nonparametric mean regression models as follows: where . If a dummy variable is defined for groups letting T i = 0 for i ∈ {1, . . . , n 1 } and T i = 1 for i ∈ {n 1 + 1, . . . , n 1 + n 2 }, the two models (4) and (5) can be integrated into a single partial linear semiparametric regression model as follows: where Using this model (6), we can test hypothesis (2) by testing hypothesis Note that (6) is a heteroscedastic partial linear model. Liang [12] and Ma et al. [13] studied model (6) when X i is univariate. By extending the theory from there to the case where X i is multivariate, we can test hypothesis (7). In Appendix A, we prove the asymptotic normality of the kernel-based profile estimator of β 0 based on a local linear model smoother when X i is multivariate, and provide the necessary assumptions for it. As with the estimator in Liang [12], the kernel-based profile estimator of β 0 when X i is multivariate is given aŝ where T = (T 1 , . . . , T n ) , Y = (Y 1 , . . . , Y n ) , and S the smoother matrix for estimating the vector (E(·|X 1 ), . . . , E(·|X n )) . If we choose local linear regression as the smoothing method, the smoothing matrix S = [s X 1 · · · s X n ] will be a collection of row vectors, each of which is the smoother vector where } for some kernel function K and bandwidth vector h = (h 1 , . . . , h p ) ; and As regards the estimation of σ 2 , we can first directly estimate variance σ 2 using the estimatesε * We can also estimate it using the sandwich covariance estimate based on (8), Matrix Var (Y|T, X 1 , . . . , X n ) is diagonal, with the ith diagonal element equal to where Since the frontier function is generally (coordinatewise) non-decreasing with respect to the input variables, one might consider it necessary to impose such a monotonicity on φ * (·). However, from Theorem 2.1 in Huang [14], such imposition will not decrease the asymptotic variance ofβ 0 ; that is, it shows no theoretical improvement in performance. We therefore choose to develop the test without the monotonicity assumption for simplicity.
Note that our test directly estimates the mean difference in inefficiency E(U 1 ) − E(U 2 ) using the semiparametric regression technique. Thus, the proposed test can work under assumptions that are more general than those in Banker et al. [9]. Additionally, we do not have to assume that noise has a finite upper support bound (V max ). However, the tests in Banker et al. [9] need such assumptions because they estimateŨ l,i = (V max − V l,i ) + U l,i and use it as a surrogate estimate of U l,i . However, V max − V l,i may hamper the tests and degrade their performance.

Numerical Studies
In this section, we compare the performance of our test with those of Banker et al. [9]. We consider single and multiple input cases and use sandwich formulas to estimate the variance in estimators.

Single Input Case
We first consider a single input case using the following model: l,u ), and V l,i follow the truncated normal distribution with mean 0 and variance σ 2 l,v , which lies within (−6σ l,v , 6σ l,v ), l = 1, 2. Here, N + stands for a normal distribution limited to the domain [0, ∞). As for V l,i , we try two cases to reflect both the equal and unequal error variances between groups. We set σ 1,v = σ 2,v = 1 for the equal error variance case and σ 1,v = √ 2, σ 2,v = 1 for the unequal variance case. To evaluate the type I error rate and power, we again consider two cases based on whether a mean difference (β 0 ) exists or does not exist between group inefficiencies: Here, the type I error rate implies the rate of supporting group difference in mean inefficiency when there is no difference and the power means the rate of supporting group difference in mean inefficiency when there are really inefficiency differences between groups.
We consider three sample sizes, n = 100, 200, and 400; the proportion of each group is approximately 50% and number of replications is 1000. For a comparison, we report the type I error and power of the following five tests with significance level α = 0.05: our proposed test (PT), the OLS test, the T-test, the Mann-Whitney (MW) test, the Kolmogorov-Smirnov (KS) test, and the F-test. The last five tests are from Banker et al. [9]. We used a plug-in principle (see Ruppert et al. [15]) to find the bandwidth for our PT.
The test results are depicted in Table 1. Four of these tests, that is, except the KS test and the F-test, seem to respect the significance level in both the equal and unequal variance cases. However, the KS test obviously shows a larger type I error rate than expected for unequal error variances and the F-test seems to be a conservative test, which gives much smaller type I error probabilities than expected. As regards the power, our PT performs best, with the largest power among all the tests. In unequal variance cases where n = 200, 400, the KS test has larger power than our PT. However, the KS test is not reliable since it tends to reject the null hypothesis too easily in those cases. Finally, all tests tend to show higher power with larger sample sizes.

Multiple Input Case
We next consider a multiple input case with p = 3. All the components in the simulation, except for the frontier function φ, are the same as in the single input case. As regards the frontier function, we consider two scenarios: the production function has an additive form, and the production function does not have an additive form. The additive assumption on the production function is used in Ferrara and Vidoli [16], but it may not be satisfied in some cases. However, in case of multiple covariates, the practical applicability of our proposed method may become worse since it requires multivariate smoothing and therefore suffers from the well-known "curse of dimensionality" problem as the dimension of the covariates becomes higher. In this case, additive modeling can be a meaningful alternative. For this, we try to estimate the difference in group efficiencies and test whether it is zero with an alternative estimating strategy, where we employ a backfitting procedure, which is a well-known estimating approach under the additive assumption. See Appendix B for details of the alternative method. Considering these two scenarios (additive and non-additive production functions), we examine how our PT(n) and its alternative based on the additive assumption, PT(a), behave depending on the validity of the assumption. The model considered here is where X j,i , j = 1, 2, 3, are generated from U(0, 1) independently. For the first scenario, we set φ(x) = (30x 1 − 9x 2 1 ) + (5 + 2 arctan(10(x 2 − 0.5))) + (4 √ x 3 ); this has an additive structure. For the second scenario, we consider φ(x) = (4 Note that both these production functions are concave. For PT(n), we select bandwidths by a generalized cross validation method (see Hastie and Tibshirani [17]), and for PT(a), we adopt a plug-in principle, as in the single input case. The performances of PT(n) and PT(a) as well as the other five tests are reported in Tables 2 and 3. From Table 2, our PT(n) outperforms the competitors overall, especially in the unequal variances case. Note that its type I error rates do not deviate much from 0.05, which means that the type I error rate is under control as desired; those of other competitors such as the OLS, T and KS tests tend to be a bit smaller than this level in case of equal variances, and considerably larger in case of unequal variances. The MW test seems to respect the level like PT(n) but PT(n) turns to be more powerful than MW. Note that PT(a) shows good results in terms of type I error rate, with comparable power to the MW test. Its power is lower than that of PT(n), but this is natural since the true production function is not additive. The F-test seems to be anticonservative leading to too high of a type I error probability, especially in the unequal variances case. However, in the equal variance case, the F-test shows shows very good performance in terms of type I error rate and power in large samples, as reported in Banker et al. [9]. From Table 3, our proposed two tests outperform their competitors when the additive assumption is true. Note that under the additive assumption, both PT(a) and PT(n) correctly specify the model. From our simulation, PT(a) slightly outperforms PT(n), since their type I error rates are close to 0.05 and the power of PT(a) is larger than that of PT(n). In case of equal variances, the type I error rates of the five competitors tend to be below 0.05, but when the variances are unequal, their power becomes much lower than our proposed tests although overall they show satisfactory type I errors. The only exception is the F-test. It shows the largest power in the unequal variances case but such merit is dimmed by considerably larger type I errors than other tests. Table 2. Type I error and power of the multiple input case with equal and unequal error variances when the true production function is not additive.  Table 3. Type I error and power of the multiple input case with equal and unequal error variances when the true production function is additive.

Application to PISA 2015 Data
In this section, we applied our PT to PISA 2015 data and test the efficiency difference between male and female student groups at a specific age in terms of learning time (X i ) and achievement (Y i ) in mathematics. The data can be downloaded from http://www.oecd.org/pisa/data/. PISA is a worldwide study to evaluate educational systems by measuring the scholastic performance of 15-year-old school students in mathematics, science, and reading. We considered the regional averages of the students' learning time and achievement in mathematics based on test results of the 2015 version as production data. Out of the 103 regions in the data, two regions, Nova Scotia in Canada and Chile, were excluded from our analysis in view of their outlier characteristics in efficiency analysis.
Usually, international large-scale assessments data include measurement errors at the individual as well as group level. Therefore, we considered the following stochastic frontier model for such data: In this model, we assumed that there would be no gender difference in learning ability from a biological point of view and use the same production frontier for both gender group. It means that all the socio-economic characteristics of differentiation between the gender groups were in the random error terms and not introduced in the frontier function. Table 4 shows summary statistics of each student group data. We applied the six tests in Section 3 to the data and calculated the p-values for the following hypothesis testing: From Samuelsson and Samuelsson [18], it is known that male students are often more involved in mathematics classes than female students. Additionally, since women are more involved in domestic chores than men and for men time is often made free by their families and relatives for the learning activity, male students are likely to be in an environment where they can focus more on studying than female students. Hence, we expected the results of the test to indicate that the effectiveness of male students was greater than that of female students in average. Table 5 gives the test results. At a significance level of around α = 0.05, our PT, the MW test, and the KS test (with p-value slightly higher that α = 0.05) supported the hypothesis that on average male students are more efficient in mathematics than female students. However, the OLS test, the T-test and the F-test reported no significant difference in learning efficiency between the two groups. The reason for this could be the somewhat restrictive assumptions for test validity. Thus, the three tests seem to face the risk of unreasonable results if the assumptions are not satisfied in practice, but our PT does not seem to suffer from this problem.

Discussion and Conclusions
In this study, we developed tests with sound statistical theory for group efficiency comparison under SFM with considerably better performance than the previous tests proposed in numerical simulations. However, there is still room for improvement in our methods.
First, since we perform full nonparametric modeling for the frontier function φ(·), which can be multivariate, our test might suffer from the "curse of dimensionality" and require high-order kernels for implementation with four or more input variables. In such a situation, we can consider an alternative test with spirit as in our test when the frontier function φ(X) has an additive structure, that is, φ(X) = ∑ p j=1 φ j (X j ), or could be well-approximated by it. Second, we only deal with one output case, which limits practical applications. Our methods should be extended to cover the case of multi-output production frontiers, which DEA methods cover.
Third, we assume the same production frontier for both group, which is a clear limitation in practice since such situation is not frequently observed. If it is important to assume separate production frontier functions for different groups, one can use the meta-frontier approach. O'Donnell et al. [6] proposed a meta-frontier approach to compare the group technical efficiencies under stochastic frontier framework. The proposed method has the advantage that it can be used without assuming a common frontier function. However, the use of their method sometimes can be restricted by their assumption that the frontier production function is log-linear.
Finally, if one is interested in estimating the mean inefficiency of each group, we refer to Noh and Van Keiligom [19], which is a recent work along that direction. Theorem A1. Under the above assumptions, where φ * = (φ * (X 1 ), . . . , φ * (X n )) and ε * = (ε * 1 , . . . , ε * n ) . It suffices to show that To prove these, we first give the following fact. sup where ξ(x) = E(R|X = x) andξ(x) is its local linear estimator. That is,ξ(x) = s x R with R = (R 1 , . . . , R n ) when R j is a response variable. (A3) can be shown from the standard theory of kernel smoothing. Note that (I − S)T = (T − E(T|X)) + (E(T|X) − ST).
The second term of the right-hand side of the above equation is o p (1) from (A3). This proves (A1). Next, we write Since To treat A 1,n , we note that and denoteφ * = S(Y − β 0 T). Then, Let G denote a class of functions satisfying |g(x) − g(y)| ≤ x − y for x, y ∈ [0, 1] p . Then, n a 0 (φ * (·) − φ * (·)) belongs to G with probability tending to 1 from (A3) and (A4), where a 0 < max{2a, (1 − ap − 2a)/2}. We can show that the δ-entropy of G for the supremum norm satisfies for some constant K. Here, we consider an empirical process  (1). Note that the exponential tail condition, required to apply the empirical process technique, is automatically satisfied in our case since T is a binary variable.
Liu [24]. For the test, we use the profile least square estimator of β 0 in Wei and Liu [24]. However, Wei and Liu [24] only showed the asymptotic distribution of the estimator of the parametric component vector (in our case, the estimator of β = (µ, β 0 ) ) under the homoscedasticity assumption of the error (Theorem 2.1 of their paper), and so we extended their result to the heteroscedasticity case.
To introduce the profile least square estimator of β 0 using the method of Wei and Liu [24], we define some notations. Let where S k is the smoothing matrix for local linear regression with respect to the jth (j ∈ {1, . . . , p}) covariate vector X j = (X j,1 , . . . , X j,n ) with kernel function K(·) and bandwidth h j , S * j = (I n − 1 n 1 n /n)S j , and 1 n = (1, . . . , 1) with length n. Additionally, we define the additive smoother matrix W j as W j = E j S −1 T C, where E j is a partitioned matrix of dimension n × np with n × n identity matrix as the jth "block" and zeros elsewhere. Then, the profile least squares estimator of β = (µ, β 0 ) is obtained as the estimator of the coefficient vector β of a synthetic linear regression model where W M = ∑ p j=1 W j ,Ỹ = (Ỹ 1 , . . . ,Ỹ n ) = W M Y andT des = (T des,1 , . . . ,T des,n ) = W M T des . Additionally, since W M 1 n = (0, . . . , 0) , we know that the linear model (A9) becomes where J = 1 n 1 n /n. Using the results to prove Theorem 2.1 in Wei and Liu [24], we show below that under some regularity conditions, estimatorβ 0 is asymptotically normal with mean zero and variance . Once we have a consistent estimate of σ 2 add , we can test (7) for a given significance level α. As with the case of single input variable, we can directly estimate the variance σ 2 β 0 using estimatesε * i andÊ(T|X j,i ). Here,Ê(T|X j,i ) can be obtained as the ith element of S j (T 1 , . . . , T n ) . Alternatively, we can estimate the variance via the sandwich formula estimate based on (A11) following similar steps in Section 2.2. Now, we can show the asymptotic property of the profile least square estimator of β 0 . For this, we first list the relevant assumptions.

3.
The density functions of X j (j ∈ {1, . . . , p}) are continuous, and bounded away from zero and infinity on their supports, which are bounded. 4.
V 1 , V 2 , U 1 and U 2 have finite second moments.
where µ φ * = E(∑ p j=1 φ * (X j,1 )). Then, we have from Lemma B.6 in [25]. Note that this is true as long as the conditional variance of the ε * given covariates exists. This is guaranteed by assumption 4. Wei and Liu (2012) used a similar fact under the homoscedastic error assumption. Then, with a derivation similar to that in Lemma 6.3 of Wei and Liu (2012), we can show that (A15) converges to zero in probability and (A16) can be written as: to complete the proof.