
Robust Bayesian Inference in Stochastic Frontier Models

Lancaster University Management School, Lancaster University, Lancaster LA1 4YX, UK
J. Risk Financial Manag. 2019, 12(4), 183; https://doi.org/10.3390/jrfm12040183
Received: 4 November 2019 / Revised: 2 December 2019 / Accepted: 2 December 2019 / Published: 4 December 2019
(This article belongs to the Special Issue Advances in Econometric Analysis and Its Applications)

Abstract

We use the concept of coarsened posteriors to robustify Bayesian inference in stochastic frontier models. Coarsened posteriors arise from tempered versions of the likelihood in which at most a pre-specified amount of data is used, and they are robust to changes in the model. Specifically, we examine robustness to changes in the distribution of the composed error in the stochastic frontier model (SFM). Moreover, coarsening is a form of regularization: it reduces overfitting and makes inferences less sensitive to model choice. The new techniques are illustrated using artificial data as well as in a substantive application to large U.S. banks.
Keywords: productivity and efficiency; Bayesian analysis; robustness; stochastic frontier models

1. Introduction

The stochastic frontier model (SFM) is a standard tool in the estimation of efficiency from observed data. The robustness of the SFM has not been examined thoroughly in the literature, although many alternative distributional assumptions have been proposed for the error components of the model.1 Bayesian analysis of the SFM is widely used due to the convenience afforded by Markov chain Monte Carlo in dealing with the latent inefficiencies present in the model, particularly under alternative distributional assumptions. Feng et al. (2019) proposed a semiparametric stochastic frontier model in which the one-sided error term is approximated by a log-transformed Rosenblatt–Parzen kernel density estimator. In a Monte Carlo study, they found that the kernel-based semiparametric model performs better than the commonly used exponential stochastic frontier model. Their study also indicates that the kernel model performs similarly to a non-parametric model.
Our motivation in this paper is to take the previous literature into account with an eye towards making the posterior (and, therefore, statistical inferences) more robust to misspecification. For example, standard inference in stochastic frontier models accounts neither for outliers nor, perhaps more importantly, for deviations of the assumed distributions of the two-sided and one-sided error terms from their actual counterparts. To the extent that the actual distributions are unknown (see, for example, Feng et al. 2019 for details), misspecification is quite likely, so, in practice, statistical inferences are likely to be misleading.
A robust posterior, in the general case, has been proposed by Miller and Dunson (2019). Specifically, rather than conditioning on the observed data assumed to be generated by the model, we condition on the event that the model generates data that are distributionally close to the observed data. This technique allows us to examine robustness to changes in the distribution of the composed error in the SFM. Additionally, coarsening is a form of regularization: it reduces overfitting and makes inferences less sensitive to model choice. When we are interested in the estimation of efficiency, returns to scale, productivity growth, etc., this is clearly a desirable goal.

2. Model

Suppose we have observed data $x \in \mathcal{X}^n$ and "ideal" data $X^*$, ideal in the sense that it is a random sample from the true data generating process (DGP). We focus on the case of i.i.d. data whose distribution has density $p(x_i \mid \theta)$, where $\theta \in \Theta \subseteq \mathbb{R}^M$ is a parameter and $\Theta$ is the parameter space. The usual posterior $p(\theta \mid x)$ may not be robust if there are outliers and/or we are not certain that the density $p(x_i \mid \theta)$ is the true one.
A robust posterior is defined by Miller and Dunson (2019) in the i.i.d. case as $p(\theta \mid d(\hat{P}_{X^*}, \hat{P}_x) < r)$, where $\hat{P}_x = n^{-1} \sum_{i=1}^n \delta_{x_i}$ is the empirical distribution of $x$ (and similarly for $\hat{P}_{X^*}$), for some discrepancy measure $d(\cdot, \cdot)$ and a given $r > 0$. Therefore, we condition on the event that the empirical distribution of the actual data is close to the empirical distribution of data generated by the model, using a certain discrepancy function between probability measures.
If $P_o$ and $P_\theta$ for $\theta \in \Theta$ have densities $p_o$ and $p_\theta$, respectively, the Kullback–Leibler divergence is defined as:
$$D(P_\theta, P_o) = \int p_o(\chi) \log \frac{p_o(\chi)}{p_\theta(\chi)} \, \lambda(d\chi),$$
where $\lambda$ is the Lebesgue measure. Miller and Dunson (2019) have derived the following approximation to the posterior under the assumption that $r$ has an exponential distribution with parameter $\alpha$:
$$p(\theta \mid d(\hat{P}_{X^*}, \hat{P}_x) < r) \;\overset{\sim}{\propto}\; p(\theta) \prod_{i=1}^n p(x_i \mid \theta)^{\zeta} \equiv p_\zeta(\theta \mid x),$$
where $\zeta = \frac{\alpha}{\alpha + n}$ and $p(\theta)$ is the prior. The approximation is accurate when $n \gg \alpha$ or $\alpha \gg n$. The main objective of such "coarsened" posteriors is robustness to small changes in the shape of the distribution of the data, i.e., the data generating process. The approximation avoids altogether the computation of $d(\hat{P}_{X^*}, \hat{P}_x)$ as well as the term $\int p_o \log p_o$, which is independent of $\theta$. The interpretation of the coarsened posterior is that it adjusts the sample size from $n$ to $\zeta n$, so effectively we have a smaller sample size.
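As a minimal, self-contained illustration of the tempering above (a conjugate normal-mean example of our own, not from the paper), the sketch below shows that raising each likelihood factor to the power $\zeta = \alpha/(\alpha+n)$ amounts to replacing the sample size $n$ by $\zeta n$ in the standard conjugate update, which widens the posterior:

```python
import numpy as np

def coarsened_posterior_normal_mean(x, sigma=1.0, alpha=50.0,
                                    prior_mean=0.0, prior_var=100.0):
    """Coarsened posterior for the mean of N(mu, sigma^2), sigma known.

    Tempering the likelihood by zeta = alpha/(alpha + n) is equivalent
    to replacing n by zeta*n in the usual conjugate normal update.
    """
    n = len(x)
    zeta = alpha / (alpha + n)           # tempering exponent
    n_eff = zeta * n                     # effective sample size
    post_prec = 1.0 / prior_var + n_eff / sigma**2
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_var
                            + n_eff * np.mean(x) / sigma**2)
    return post_mean, post_var, zeta
```

For instance, with $n = 1000$ and $\alpha = 50$, $\zeta = 50/1050 \approx 0.048$, so the coarsened posterior is considerably more dispersed (more cautious) than the standard posterior, which corresponds to $\zeta = 1$.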
In this paper we consider the production SFM2:
$$y_i = x_i' \beta + v_i - u_i, \quad i = 1, \ldots, n,$$
where $x_i \in \mathbb{R}^k$ is a vector of regressors, $v_i \overset{\text{iid}}{\sim} N(0, \sigma_v^2)$, $u_i \overset{\text{iid}}{\sim} N_+(0, \sigma_u^2)$, and $(v_i, u_i) \perp x_i$, $i = 1, \ldots, n$. Let $\sigma^2 = \sigma_v^2 + \sigma_u^2$ and $\lambda = \sigma_u / \sigma_v$, and define the parameter vector $\theta = [\beta', \sigma_v, \sigma_u]'$. The augmented posterior is:
$$p(y_i, u_i \mid x_i, \theta) \propto \sigma_v^{-1} \sigma_u^{-1} \exp\left\{ -\frac{1}{2\sigma_v^2} (y_i + u_i - x_i'\beta)^2 - \frac{1}{2\sigma_u^2} u_i^2 \right\}, \quad i = 1, \ldots, n.$$
The marginal density of $y_i$ is $p(y_i \mid x_i, \theta) = \int_0^\infty p(y_i, u_i \mid x_i, \theta) \, du_i$ and is available in closed form:
$$p(y_i \mid x_i, \theta) \propto \sigma^{-1} \exp\left\{ -\frac{(y_i - x_i'\beta)^2}{2\sigma^2} \right\} \Phi\left( -\frac{\lambda}{\sigma} (y_i - x_i'\beta) \right), \quad i = 1, \ldots, n,$$
where $\Phi(\cdot)$ is the standard normal distribution function; see Kumbhakar and Lovell (2000, p. 78).
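For computations it is convenient to have the closed-form marginal density in code. The sketch below (our own, using SciPy, with the normalizing constant $2/\sigma$ included) evaluates the normal/half-normal log-likelihood; the tempered likelihood that enters the coarsened posterior is simply $\zeta$ times this sum:

```python
import numpy as np
from scipy.stats import norm

def sfm_loglik(beta, sigma_v, sigma_u, y, X):
    """Normal/half-normal production SFM log-likelihood:
    f(eps_i) = (2/sigma) * phi(eps_i/sigma) * Phi(-lambda*eps_i/sigma),
    with eps_i = y_i - x_i'beta, sigma^2 = sigma_v^2 + sigma_u^2,
    and lambda = sigma_u / sigma_v (Kumbhakar and Lovell 2000)."""
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    eps = y - X @ beta                    # composed error v - u
    return np.sum(np.log(2.0) - np.log(sigma)
                  + norm.logpdf(eps / sigma)
                  + norm.logcdf(-lam * eps / sigma))
```

Working with `logpdf` and `logcdf` rather than the densities themselves avoids underflow when $\lambda \varepsilon_i / \sigma$ is large.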
Suppose we have a prior $p(\theta)$, and let $y = [y_i; i = 1, \ldots, n]$ and $X = [x_i'; i = 1, \ldots, n]$ denote the data. A coarsened posterior that uses at most $\zeta n$ out of $n$ observations is:
$$p_\zeta(\theta \mid y, X) \propto p(\theta) \prod_{i=1}^n p(y_i \mid x_i, \theta)^{\zeta}.$$
Similarly, we can define the coarsened augmented posterior:
$$p_\zeta(\theta, u \mid y, X) \propto p(\theta) \prod_{i=1}^n p(y_i, u_i \mid x_i, \theta)^{\zeta},$$
where $u = [u_1, \ldots, u_n]'$. For the SFM it becomes:
$$p_\zeta(\theta, u \mid y, X) \propto p(\theta)\, \sigma_v^{-\zeta n} \sigma_u^{-\zeta n} \exp\left\{ -\frac{\zeta}{2\sigma_v^2} \sum_{i=1}^n (y_i + u_i - x_i'\beta)^2 - \frac{\zeta}{2\sigma_u^2} \sum_{i=1}^n u_i^2 \right\}.$$
With a nearly flat prior of the form3:
$$p(\sigma_v) \propto \sigma_v^{-(\underline{n}+1)} \exp\left\{ -\frac{\underline{q}}{2\sigma_v^2} \right\}, \quad p(\sigma_u) \propto \sigma_u^{-1}, \quad p(\beta) \propto \text{const.},$$
the posterior becomes:
$$p_\zeta(\theta, u \mid y, X) \propto \sigma_v^{-(\zeta n + \underline{n} + 1)} \sigma_u^{-(\zeta n + 1)} \exp\left\{ -\frac{\zeta}{2\sigma_v^2} \left[ \underline{q} + \sum_{i=1}^n (y_i + u_i - x_i'\beta)^2 \right] - \frac{\zeta}{2\sigma_u^2} \sum_{i=1}^n u_i^2 \right\}.$$
Inferences can be implemented using Gibbs sampling with data augmentation, based on drawing random numbers from the posterior conditional distributions summarized below.
1. Draw the regression parameters from
$$\beta \mid \sigma_v, \sigma_u, u, y, X \sim N_k\!\left( b, \; \frac{\sigma_v^2}{\zeta} (X'X)^{-1} \right),$$
where $b = (X'X)^{-1} X'(y + u)$.
2. Draw the scale parameter $\sigma_v^2$ as follows:
$$\frac{\underline{q} + (y - X\beta + u)'(y - X\beta + u)}{\sigma_v^2 / \zeta} \,\Big|\, \beta, \sigma_u, u, y, X \sim \chi^2(\underline{n} + \zeta n).$$
3. Draw the scale parameter $\sigma_u^2$:
$$\frac{u'u}{\sigma_u^2 / \zeta} \,\Big|\, \beta, \sigma_v, u, y, X \sim \chi^2(\zeta n).$$
4. Draw the technical inefficiencies:
$$u_i \mid \beta, \sigma_v, \sigma_u, y, X \sim N_+(\hat{u}_i, \sigma_*^2), \quad i = 1, \ldots, n,$$
where $\hat{u}_i = -\frac{\sigma_u^2 (y_i - x_i'\beta)}{\sigma^2}$ and $\sigma_*^2 = \frac{\sigma_v^2 \sigma_u^2}{\zeta \sigma^2}$.
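The four sampling steps above can be sketched as follows (a minimal implementation of our own, not the paper's code, assuming the nearly flat prior; the truncated-normal draw in step 4 uses inverse-CDF sampling):

```python
import numpy as np
from scipy.stats import norm

def gibbs_sfm(y, X, zeta=1.0, n_iter=2000, burn=1000,
              q0=0.1, n0=0.0, seed=1):
    """Gibbs sampler with data augmentation for the coarsened
    normal/half-normal SFM posterior, following steps 1-4 above."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    u = np.abs(rng.standard_normal(n))   # initial inefficiencies
    sv2 = su2 = 1.0
    keep_beta, keep_u = [], []
    for it in range(n_iter):
        # Step 1: beta | . ~ N_k(b, (sigma_v^2/zeta) (X'X)^{-1})
        b = XtX_inv @ (X.T @ (y + u))
        beta = b + np.sqrt(sv2 / zeta) * (chol @ rng.standard_normal(k))
        # Step 2: sigma_v^2 | . from a scaled inverse chi-square
        resid = y - X @ beta + u
        sv2 = zeta * (q0 + resid @ resid) / rng.chisquare(n0 + zeta * n)
        # Step 3: sigma_u^2 | . from a scaled inverse chi-square
        su2 = zeta * (u @ u) / rng.chisquare(zeta * n)
        # Step 4: u_i | . ~ N_+(u_hat_i, s2_star), truncated below at 0
        s2 = sv2 + su2
        u_hat = -su2 * (y - X @ beta) / s2
        s_star = np.sqrt(sv2 * su2 / (zeta * s2))
        p_lo = norm.cdf(-u_hat / s_star)         # P(draw < 0)
        p = p_lo + (1.0 - p_lo) * rng.uniform(size=n)
        u = u_hat + s_star * norm.ppf(np.clip(p, 1e-12, 1 - 1e-12))
        u = np.maximum(u, 0.0)                   # guard against underflow
        if it >= burn:
            keep_beta.append(beta)
            keep_u.append(u.copy())
    return np.asarray(keep_beta), np.asarray(keep_u)
```

Because $\zeta$ only rescales the likelihood, the same loop serves both the standard ($\zeta = 1$) and coarsened posteriors.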
We implement the Gibbs sampler using 15,000 iterations, the first 5000 of which are discarded to mitigate possible start-up effects. In all computations, we set $\underline{n} = 0$ and $\underline{q} = 0.1$; in the neighborhood of these values, we did not notice any sensitivity of the posteriors.4 It should be mentioned that the coarsened posterior will have better mixing properties than the posterior corresponding to $\zeta = 1$ (which already mixes well), as the likelihood is tempered. Convergence was diagnosed successfully using Geweke's (1992) diagnostics.
Suppose $\{ u_i^{(s)}, s = 1, \ldots, S \}$ denotes the Gibbs draws for technical inefficiencies (i.e., a sample from the posterior), and set $R_i^{(s)} = \exp(-u_i^{(s)})$. Then an estimate of firm-specific efficiency is:
$$\hat{R}_i = S^{-1} \sum_{s=1}^S R_i^{(s)}, \quad i = 1, \ldots, n.$$
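Given the retained draws, the estimator above is a one-liner (variable names are ours; `u_draws` is assumed to stack the $S$ retained draws row-wise):

```python
import numpy as np

def efficiency_scores(u_draws):
    """Posterior-mean efficiencies R_hat_i = S^{-1} sum_s exp(-u_i^(s)),
    where u_draws has shape (S, n): one row per retained Gibbs draw."""
    return np.exp(-u_draws).mean(axis=0)
```

Note that averaging $\exp(-u_i^{(s)})$ over draws is not the same as $\exp(-\bar{u}_i)$; by Jensen's inequality the latter would overstate efficiency.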
Clearly, such measures depend on the amount of coarsening ( ζ ) and, therefore, there is the possibility of sensitive dependence on alternative model specifications.
Sometimes, we may have prior information on the regression parameters $\beta$, summarized in the form:
$$\beta \mid \sigma_v \sim N_k\!\left( \beta_o, \; \frac{\sigma_v^2}{\zeta} V_o \right),$$
where the prior mean $\beta_o$ and the prior covariance scale matrix $V_o$ are known. In this case, the conditional posterior of $\beta$ is modified as follows:
$$\beta \mid \sigma_v, \sigma_u, u, y, X \sim N_k(\bar{b}, \; \bar{V}),$$
where $\bar{b} = (X'X + V_o^{-1})^{-1} [X'(y + u) + V_o^{-1} \beta_o]$ and $\bar{V} = \frac{\sigma_v^2}{\zeta} (X'X + V_o^{-1})^{-1}$.

3. Illustration

We give an illustration using two models. Model I is $y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + v_i - u_i$, $i = 1, \ldots, n$, where $\beta_1 = 1$, $\beta_2 = \beta_3 = \tfrac{1}{2}$, with $n = 1000$ observations, $v_i \overset{\text{iid}}{\sim} N(0, 0.1^2)$ and, independently, $u_i \overset{\text{iid}}{\sim} N_+(0, 0.5^2)$. Here, $x_{i1}$ and $x_{i2}$ are generated from standard normal distributions. Model II has its first 900 observations generated from Model I, but the last 100 are generated as $y_i = v_i - u_i$, where $v_i \overset{\text{iid}}{\sim} N(0, 0.1^2)$ and $u_i \overset{\text{iid}}{\sim} N_+(0, 0.1^2)$. Therefore, in the second model, part of the data is generated from a process without a systematic part and with a lower signal-to-noise ratio.
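The two data generating processes can be reproduced as follows (a sketch of our own; the seed and variable names are assumptions, not from the paper):

```python
import numpy as np

def simulate_model(n=1000, contaminated=False, seed=42):
    """Generate artificial data for Model I; with contaminated=True,
    the last 100 observations follow Model II's no-frontier process."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n),
                         rng.standard_normal(n),
                         rng.standard_normal(n)])
    beta = np.array([1.0, 0.5, 0.5])
    v = rng.normal(0.0, 0.1, size=n)
    u = np.abs(rng.normal(0.0, 0.5, size=n))   # half-normal N_+(0, 0.5^2)
    y = X @ beta + v - u
    if contaminated:
        m = 100                                # last 100: no systematic part
        v2 = rng.normal(0.0, 0.1, size=m)
        u2 = np.abs(rng.normal(0.0, 0.1, size=m))
        y[-m:] = v2 - u2
    return y, X
```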
In the left panel of Figure 1, we report the true density of efficiency scores along with estimates of the density for ζ = 1 , 0.95, 0.98 and 0.9. The densities corresponding to different values of ζ are sample densities of posterior mean scores across the sample. In the right panel of Figure 1, we report 100 random efficiency scores corresponding to posteriors with different values of ζ .
From the right panel of Figure 1, the differences between efficiency scores are not as marked as in Model II, whose results are reported in the upper panels of Figure 2. In the lower panels, we report marginal posterior densities of the parameters $\beta_2$ and $\beta_3$. Efficiency scores are markedly different compared to $\zeta = 1$, and as $\zeta$ decreases to 0.8 the efficiency distributions move to the left. The marginal posteriors of $\beta_2$ and $\beta_3$ also seem to differ, although they are not far from the posterior mean obtained when $\zeta = 1$.

4. Empirical Application

We use the banking data of Malikov et al. (2016) to estimate a translog cost frontier with five input prices, five outputs, a bad output (non-performing loans), and equity included as a quasi-fixed input. The data are an unbalanced panel with 2397 bank-year observations for 285 large, relatively homogeneous U.S. banks (2001:I–2010:IV). We refer the reader to Malikov et al. (2016) for further details on the data. Our results are reported in Figure 3 and Figure 4. As $\zeta$ increases, the distribution of efficiency scores shifts to the left; efficiency scores remain highly correlated, however (upper right panel), and the posterior densities of both $\sigma$ and $\lambda$ shift to the right. Therefore, changes in distributional assumptions are likely to increase the variance of the error term but also the signal-to-noise ratio $\lambda$, from 1.2 to 1.8 on average. From the results in Figure 4, the posterior densities of the output cost elasticity ($e_{cy}$), the elasticity with respect to non-performing loans ($e_{cb}$), technical change ($e_{ct}$) and the elasticity with respect to quasi-fixed equity ($e_{c,eq}$) remain robust as $\zeta$ changes, although this is less so for the output cost elasticity and, therefore, returns to scale.
Finally, to address the question of selecting $\zeta$, Miller and Dunson (2019) propose a measure of fit and a measure of model complexity, the fit measure being the average log-likelihood and the complexity measure being based on the coarsened posterior. Instead, one can use the marginal likelihood identity:
$$\ell(y, X) = \frac{p(\theta)\, p_\zeta(y \mid X, \theta)}{p_\zeta(\theta \mid y, X)}, \quad \forall\, \theta \in \Theta,$$
where $p_\zeta(y \mid X, \theta)$ is the tempered likelihood. Since this is an identity, we can evaluate it at $\theta = \bar{\theta}_\zeta$, the posterior mean, and the denominator can be approximated with a multivariate normal distribution: $p_\zeta(\bar{\theta}_\zeta \mid y, X) \simeq (2\pi)^{-M/2} |V_\zeta|^{-1/2}$, where $V_\zeta = S^{-1} \sum_{s=1}^S (\theta_\zeta^{(s)} - \bar{\theta}_\zeta)(\theta_\zeta^{(s)} - \bar{\theta}_\zeta)'$ is the posterior covariance matrix and $M = \dim(\theta)$. This is the Laplace approximation to the log marginal likelihood (LML); see DiCiccio et al. (1997). We report the values of the LML for ten values of $\zeta$ in Figure 5.
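The Laplace approximation above can be computed directly from posterior draws. The sketch below is our own; `log_prior` and `tempered_loglik` are caller-supplied functions (hypothetical names), evaluated at the posterior mean:

```python
import numpy as np

def laplace_lml(theta_draws, log_prior, tempered_loglik):
    """Laplace approximation to the log marginal likelihood:
    log p(theta_bar) + log p_zeta(y | X, theta_bar)
      - log p_zeta(theta_bar | y, X),
    with the posterior ordinate at its mean approximated by the
    multivariate normal value (2*pi)^{-M/2} |V_zeta|^{-1/2}.

    theta_draws: (S, M) array of retained posterior draws.
    """
    theta_bar = theta_draws.mean(axis=0)
    diffs = theta_draws - theta_bar
    V = diffs.T @ diffs / theta_draws.shape[0]   # posterior covariance
    M = theta_draws.shape[1]
    _, logdet = np.linalg.slogdet(V)
    log_post_ordinate = -0.5 * M * np.log(2 * np.pi) - 0.5 * logdet
    return log_prior(theta_bar) + tempered_loglik(theta_bar) - log_post_ordinate
```

Running this for a grid of $\zeta$ values (reusing the corresponding Gibbs output for each) produces the LML curve of Figure 5.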
Since the LML stabilizes at $\zeta = 0.55$, one can safely average the efficiency distributions over the different values of $\zeta$ shown in Figure 3. This is valid as the alternative models (corresponding to different values of $\zeta$) have approximately the same posterior model probability, given by $p_\zeta(y, X) = \frac{\ell_\zeta(y, X)}{\sum_{\zeta'} \ell_{\zeta'}(y, X)}$. Moreover, from the upper right panel of Figure 4, these efficiency scores are highly correlated, and slight changes in the data generating process affect only their location. Averaged efficiency densities and densities of the output cost elasticity are presented in Figure 6.
The results indicate that efficiency ranges from 60% to slightly less than 100%, with a median near 85%. As the output cost elasticity averages 1.2, there seem to be decreasing returns to scale in large U.S. banks (the returns-to-scale measure is $1/e_{cy}$), with constant returns enjoyed by a sizable number of banks and increasing returns being only exceptional. Finally, as model uncertainty increases (corresponding to lower values of $\zeta$), $e_{cy}$ increases slightly and its posterior shifts to the right (upper left panel of Figure 4).
Finally, in Table 1 we provide diagnostic measures to verify that our MCMC draws are representative of the true posterior.

5. Concluding Remarks

We have proposed the use of coarsened posteriors to robustify inferences from the SFM. In an application to U.S. banks, we find that economies of scale and technical change can be estimated in a relatively robust way, including the elasticities with respect to equity and non-performing loans, but efficiency inferences are less robust to the amount of data we use to robustify the posterior. This causes some concern, as small changes in the distributional assumptions may change efficiency scores considerably. Fortunately, efficiency scores remain highly correlated across the robustness parameter ($\zeta$), at least in this application. In terms of future research, it would be interesting to examine the performance of robust posteriors in more complicated SFMs and to reconcile differences that arise from different approaches to modeling inefficiency. Moreover, an interesting avenue for future research would build on recent work by Guo et al. (2018), who examined whether a parametric production frontier function is suitable in the analysis. Guo et al. (2018) developed two test statistics, based on local smoothing and an empirical process, respectively, and also suggested residual-based wild bootstrap versions of these two statistics. As coarsening provides more robust results, it is likely that the procedures in Guo et al. (2018) would tend to favor the parametric specification, although one has to resolve the issue of applying these procedures in a Bayesian context.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. DiCiccio, Thomas J., Robert E. Kass, Adrian Raftery, and Larry Wasserman. 1997. Computing Bayes factors by combining simulation and asymptotic approximations. Journal of the American Statistical Association 92: 903–15. [Google Scholar] [CrossRef]
  2. Feng, Guohua, Chuan Wang, and Xibin Zhang. 2019. Estimation of Inefficiency in Stochastic Frontier Models: A Bayesian Kernel Approach. Journal of Productivity Analysis 51: 1–19. [Google Scholar] [CrossRef]
  3. Fernández, Carmen, Jacek Osiewalski, and Mark F. J. Steel. 1997. On the use of panel data in stochastic frontier models with improper priors. Journal of Econometrics 79: 169–93. [Google Scholar] [CrossRef]
  4. Geweke, John F. 1992. Evaluating the Accuracy of Sampling Based Approaches to the Calculation of Posterior Moments. Bayesian Statistics 4. Edited by J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith. Minneapolis: Federal Reserve Bank of Minneapolis, pp. 169–93. [Google Scholar]
  5. Greene, William H. 1999. Frontier Production Functions. In Handbook of Applied Econometrics. Volume II: Microeconomics. Edited by M. Hashem Pesaran and Peter Schmidt. Oxford: Blackwell, chp. 3. pp. 75–153. [Google Scholar]
  6. Guo, Xu, Gaorong Li, Michael John McAleer, and Wing-Keung Wong. 2018. Specification Testing of Production in a Stochastic Frontier Model. Sustainability 10: 3082. [Google Scholar] [CrossRef]
  7. Kumbhakar, Subal C., and C. A. Knox Lovell. 2000. Stochastic Frontier Analysis. Cambridge: Cambridge University Press. [Google Scholar]
  8. Malikov, Emir, Subal C. Kumbhakar, and Mike G. Tsionas. 2016. A cost system approach to the stochastic directional technology distance function with undesirable outputs: The case of U.S. banks in 2001–2010. Journal of Applied Econometrics 31: 1407–29. [Google Scholar] [CrossRef]
  9. Miller, Jeffrey W., and David B. Dunson. 2019. Robust Bayesian inference via coarsening. Journal of the American Statistical Association 114: 1113–25. [Google Scholar] [CrossRef]
  10. Parmeter, Christopher F., and Subal C. Kumbhakar. 2014. Efficiency Analysis: A Primer on Recent Advances. Foundations and Trends in Econometrics 7: 191–385. [Google Scholar] [CrossRef]
1. For excellent surveys, see Greene (1999) and Parmeter and Kumbhakar (2014).
2. The cost frontier is obtained by taking $-y_i$ and $-x_i$. Moreover, the extension to the case of panel data is straightforward.
3. The prior for $\sigma_v$ with $\underline{n} = 0$ has been proposed by Fernández et al. (1997).
4. The neighborhood is defined as $\underline{n} + \nu$ and $\underline{q} + s$, where $\nu \in [10^{-4}, 3]$ and $s \in [10^{-4}, 1]$ follow uniform distributions.
Figure 1. Artificial data, Model I.
Figure 2. Artificial data, Model II.
Figure 3. U.S Banking data, efficiency scores, σ and λ .
Figure 4. U.S Banking data, elasticities.
Figure 5. Normalized log marginal likelihood. Note: Log marginal likelihood is normalized so that its value at ζ = 0.5 is zero.
Figure 6. Robust posteriors of efficiency and output cost elasticity.
Table 1. MCMC convergence diagnostics.
Parameters    GCD      RNE      acf(50)
$\beta$       1.619    0.582    0.392
$\sigma_v$    1.303    0.618    0.403
$\sigma_u$    1.202    0.588    0.380
$u$           1.719    0.449    0.344
Notes: GCD is Geweke's (1992) convergence diagnostic, which follows (asymptotically in the number of MCMC draws) a standard normal distribution. RNE denotes "relative numerical efficiency", which equals one under i.i.d. sampling from the posterior. Moreover, acf(50) denotes the autocorrelation of the MCMC draws at lag 50. More specifically, "GCD" is the maximum absolute value across the parameters $\beta$ or the elements of $u$; we use the same strategy for RNE and acf(50).