Robust Bayesian Inference in Stochastic Frontier Models

We use the concept of coarsened posteriors to provide robust Bayesian inference via coarsening in order to robustify posteriors arising from stochastic frontier models. These posteriors arise from tempered versions of the likelihood when at most a pre-speciﬁed amount of data is used, and are robust to changes in the model. Speciﬁcally, we examine robustness to changes in the distribution of the composed error in SFM. Moreover, coarsening is a form of regularization, reduces overﬁtting and makes inferences less sensitive to model choice. The new techniques are illustrated using artiﬁcial data as well as in a substantive application to large U.S banks.


Introduction
The stochastic frontier model (SFM) is a standard tool in estimation of efficiency from observed data. Robustness of SFM has not been examined thoroughly in the literature although many alternative distributional assumptions have been proposed for the error components of the model. 1 Bayesian analysis of SFM is widely used due to the convenience allowed by Markov Chain Monte Carlo in dealing with latent inefficiencies which are present in the model, particularly under alternative distributional assumptions. Feng et al. (2019) proposed a semiparametric model for stochastic frontier models: Specifically, the one-sided error term is approximated by a log-transformed Rosenblatt-Parzen kernel density estimator. In a Monte Carlo study they find that that the kernel-based semiparametric model performs better than the commonly-used exponential stochastic frontier model. Their study also indicates that the kernel model shows similar performance to a non-parametric model.
Our motivation in this paper is to take into account the previous literature with an eye towards making the posterior (and, therefore, statistical inferences) more robust to misspecification. For example, standard inference in stochastic frontier models does not take into account not only outliers but, perhaps more importantly, deviations of the assumed distributions of two-sided and one-sided error terms from their actual counterparts. To the extent that the actual distributions are unknown (see, for example, Feng et al., 2019 for details) misspecification is quite likely so, in practice, statistical inferences are likely to be misleading.
A robust posterior, in the general case, has been proposed by Miller and Dunson (2019). Specifically, rather than conditioning on the observed data assumed to be generated by the model, we condition on the event that the model generates data that are distributionally close to the observed data. This technique allows to examine robustness to changes in the distribution of the composed error in SFM. Additionally, coarsening is a form of regularization, reduces overfitting and makes inferences less sensitive to model choice. When we are interested in estimation of efficiency, returns to scale, productivity growth etc., this is clearly a desirable goal.

Model
Suppose we have observed data x ∈ X ⊆ R n and the "ideal" data is X * -ideal in the sense that it is a random sample from true data generating process (DGP). We focus on the case of i.i.d data whose distribution has density p(x i |θ), where θ ∈ Θ ⊆ R M is a parameter and Θ is the parameter space. The usual posterior p(θ|x) may not be robust if there are outliers and / or we are not certain that the density p(x i |θ) is the true one.
A robust posterior is defined by Miller and Dunson (2019) in the i.i.d case as follows: p(θ|d(P X * ,P x ) < r), wherê P x = n −1 n i=1 δ xi is the empirical distribution of x and similarly forP X * , for some discrepancy measure D(·, ·) and a given r > 0. Therefore, we condition on the event that the empirical distribution of actual data is close to the empirical distribution of data generated by the model, using a certain discrepancy function between probability measures.
If P o and P θ for θ ∈ Θ have densities p o and p θ , respectively, the Kullback-Leibler divergence is defined as: where λ is the Lebesgue measure. Miller and Dunson (2019) have derived the following approximation to the posterior under the assumption that r has an exponential distribution with parameter α: where ζ = α α+n and p(θ) is the prior. The approximation is accurate when n α or α n. The main objective of such "coarsened" posteriors is robustness to small changes in the shape of the distribution of the data, i.e. the data generating process. The approximation avoids altogether computation of D(P X ,P x ) or the term Θ p o log p o which is independent of θ.
The interpretation of the coarsened posterior is that it adjusts the sample size from n to nζ, so effectively we have a smaller sample size.
In this paper we consider the production SFM 2 : .., n. Let σ 2 = σ 2 v + σ 2 u and λ = σu σv and define the parameter vector θ = [β , σ v , σ u ] . The augmented posterior is: The marginal density of y i is p(y i |x i , θ) = ∞ 0 p(y i , u i |x i , θ)du i and is available in closed form: where Φ(·) is the standard normal distribution function; see Kumbhakar and Lovell (2000, p. 78).
Suppose we have a prior p(θ) and y = [y i ; i = 1, ..., n], X = [x i ; i = 1, ..., n] denote the data. A coarsened posterior that uses at most ζn out of n observations is: Similarly, we can define the coarsened augmented posterior: Algorithm 1 MCMC draws from posterior conditional distributions 1. Draw regression parameters from where b = (X X) −1 X (y + u).
2. Draw the scale parameter σ 2 v as follows: 3. Draw the scale parameter σ 2 u : 4. Draw technical inefficiencies: where u = [u 1 , ..., u n ] . For the SFM it becomes: With a nearly flat prior of the form 3 : the posterior becomes: Inferences can be implemented using Gibbs sampling with data augmentation based on drawing random numbers from the posterior conditional distributions summarized in Table 1.
We implement the Gibbs sampler using 15,000 iterations, the first 5,000 of which are discarded to mitigate possible start up effects. In all computations we set n = 0 and q = 0.1. In the neighborhood of these values we did not notice sensitivity 3 The prior for σv with n = 0 has been proposed by Fernandez et al. (1997).
of posteriors. 4 It should be mentioned that the coarsened posterior will have better mixing properties corresponding to ζ = 1 (which already mixes well) as the likelihood is tempered. Convergence was diagnosed successfully using Geweke's (1992) diagnostics.
Suppose u (s) i , s = 1, ..., S denotes Gibbs draws for technical inefficiencies (i.e. a sample from the posterior), and set Then an estimate of firm-specific efficiency is: Clearly, such measures depend on the amount of coarsening (ζ) and, therefore, there is the possibility of sensitive dependence on alternative model specifications.
Sometimes, we may have prior information on the regression parameters, β, summarized in the form: where the prior mean is β o and the prior covariance matrix is given by In this case, we have to modify (11) as follows:

Illustration
We give an illustration using two models. Model I is: .., n, where β 1 = −1, β 2 = β 3 = 1 2 , with n = 1000 observations, when v i ∼ i.i.d N (0, 0.1 2 ) and, independently, u i ∼ i.i.dN + (0, 0.5 2 ). Here, x i1 and x i2 are generated from standard normal distributions. Model II has 900 observations generated from the first model, but the last 100 are generated as 0.1 2 ). Therefore, in the second model, part of the data is is generated from a process without systematic part and lower signal-to-noise ratio.
In the left panel of Figure 1, we report the true density of efficiency scores along with estimates of the density for ζ = 1, 0.95, 0.98 and 0.9. The densities corresponding to different values of ζ are sample densities of posterior mean scores across the sample. In the right panel of Figure 1, we report 100 random efficiency scores corresponding to posteriors with different values of ζ.
From the right panel of Figure 1, the differences between efficiency scores are not so marked as in Model II whose results are reported in the upper panels of Figure 2. In the lower panel we report marginal posterior densities of the parameters ζ=0.95 ζ=0.9 ζ=0.8 β 2 and β 3 . Efficiency scores are markedly different compared to ζ = 1 and as ζ decreases to 0.8 the efficiency distributions move to the left. The marginal posteriors of β 2 and β 3 also seem to be different although they are not far from the posterior mean when ζ = 1.

Empirical application
We use the banking data of Malikov, Kumbhakar and Tsionas (2015) to estimate a translog cost frontier with five input prices, five outputs, a bad output (non-performing loans) and equity included as quasi-fixed input. The data is an unbalanced panel with 2,397 bank-year observations for 285 large, relatively homogeneous US banks (2001:I-2010:IV). We refer the reader to Malikov et al. (2015) for further details on the data. Our results are reported in Figures 3 and 4. As ζ increases, the distribution of efficiency scores shifts to the left, efficiency scores remain, however, highly correlated (upper right panel) and the posterior densities of both σ and λ shift to the right. Therefore, changes in distributional assumptions are likely to increase the variance of the error term but also the signal-to-noise ratio λ from 1.2 to 1.8, on the average. From the results in Figure 4, posterior densities of output cost elasticity (e cy ), elasticity with respect to non-performing loans (e cb ), technical change (e ct ) and elasticity with respect to quasi-fixed equity (e c,eq ) remain robust as ζ changes, although this is less so for output cost elasticity and, therefore, returns to scale.
Finally, to address the question of selecting ζ, Miller and Dunson (2019) propose a measure of fit and a measure of model complexity. A measure of fit is given by the average log-likelihood and the complexity measure is given by the coarsened posterior. Instead, one case use the log marginal likelihood: where p ζ (y|X, θ) is the tempered likelihood. Since this is an identity, we can use θ = θ ζ , the posterior mean, and the denominator can be approximated with a multivariate normal distribution: p ζ (θ|y, X) is the posterior covariance matrix, and M = dim(θ). This is the Laplace approximation to log marginal likelihood (LML), see DiCiccio et al. (1997). We report the values of LML for ten values of ζ in Figure 5.
Since LML stabilizes at ζ = 0.55 one would be safe to average efficiency distributions over the different values of ζ shown in Figure 3. This is valid as alternative models (corresponding to different values of ζ) have approximately the same posterior model probability given by: p ζ (y, X) = M ζ (y,X) ζ M ζ (y,X) . Moreover, from the upper right panel of Figure 4 these efficiency scores are highly correlated and slight changes in the data generating process affect only their location. Averaged efficiency densities and densities of output cost elasticity are presented in Figure 6.
The results indicate that efficiency ranges from 60% to slightly less than 100% with a median near 85%. As output cost elasticity averages 1.2, there seem to exist decreasing returns to scale in U.S large banks (the returns to scale measure   Geweke's (1992) convergence diagnostic which follows (asymptotically in the number of MCMC draws) a standard normal distribution. RNE denotes "relative numerical efficiency" which is equal to one under i.i.d sampling from the posterior. Moreover, acf(50) denotes autocorrelation of MCMC draws at lag 50. More specifically, "GCD" is the maximum absolute value across parameters β or u. We use the same strategy for RNE and acf(50).
is 1/e cy ) with constant returns enjoyed by a sizable number of banks, and increasing returns being only exceptional. Finally, as model uncertainty increases (corresponding to lower values of ζ), e cy increases slightly and its posterior shifts to the right (upper left panel of Figure 4).
Finally, in Table 1 we provide diagnostic measures to ensure that our MCMC draws provide access to the true posterior.

Concluding remarks
We have proposed the concept of coarsened posteriors to robustify inferences from SFM. In an application to U.S banks, we find that economies of scale and technical change can be estimated in a relatively robust way, including elasticities with respect to equity and non-performing loans but efficiency inferences are less robust to the amount of data we use to robustify the posterior. This causes some concern as small changes in the distributional assumptions may change efficiency scores considerably. Fortunately, efficiency scores remain highly correlated across the robustness parameter (ζ), at least in this application. In terms of future research, it would be interesting to examine the performance of robust posteriors in more complicated SFM and reconcile differences that arise from different approaches to modeling inefficiency. Moreover, an interesting avenue for future research would be based on recent work by Guo et al. (2018) who examined whether a parametric production frontier function is suitable in the analysis. Guo et al. (2018) developed two test statistics based on local smoothing and an empirical process, respectively and suggested also residual-based wild bootstrap versions of these two test statistics. As coarsening provides more robust results it is likely that the procedures in Guo et al. (2018) would tend to be in favor of the parametric specification although one has to resolve the issue of applying the Guo et al. (2018) procedures in a Bayesian context.