Bayesian Approach to Disentangling Technical and Environmental Productivity

This paper models the firm's production process as a system of simultaneous technologies for desirable and undesirable outputs. Desirable outputs are produced by transforming inputs via the conventional transformation function, whereas (consistent with the material balance condition) undesirable outputs are by-produced via the so-called "residual generation technology". By separating the production of undesirable outputs from that of desirable outputs, not only do we ensure that undesirable outputs are not modeled as inputs and thus satisfy costly disposability, but we are also able to differentiate between the traditional (desirable-output-oriented) technical productivity and the undesirable-output-oriented environmental, or so-called "green", productivity. To measure the latter, we derive a Solow-type Divisia environmental productivity index which, unlike conventional productivity indices, allows crediting the ceteris paribus reduction in undesirable outputs. Our index also provides a meaningful way to decompose environmental productivity into environmental technological and efficiency changes.


Introduction
The by-production of undesirable, or so-called "bad", outputs is an inherent attribute of many production processes. Electric power generation is a classical example of such a process, where the production of electricity (desirable output) is accompanied by the emission of pollutant gases (undesirable outputs). It is therefore imperative to account for undesirable outputs when estimating the productivity growth for such processes (e.g., see Atkinson and Dorfman, 2005;Atkinson and Tsionas, 2015).
The estimation of productivity (and, potentially, its components) naturally requires the estimation of the firm's production process, the modeling of which in the presence of undesirable outputs is however not a clear-cut issue. A standard approach is to condition the conventional transformation (production) function on undesirable outputs (e.g., Reinhard et al., 1999Reinhard et al., , 2000Hailu and Veeman, 2001) which, effectively, treats these outputs as inputs. Such a treatment of undesirable outputs has since been heavily criticized due the implied strong disposability of undesirable outputs (Färe et al., 2005) and the violation of the "material balance condition" (Murty et al., 2012). A popular alternative approach to tackling undesirable outputs is to specify a (single) directional output distance function (Chung et al., 1997;Färe et al., 2005) which accommodates both the expansion in desirable outputs and a simultaneous contraction in undesirable outputs. Feng and Serletis (2013) have recently proposed a primal Divisia productivity index based on such a directional output distance function.
Both the directional output distance function and the productivity index based on the latter allow the identification of a "composite" measure of inefficiency and productivity (respectively) only. Specifically, when modeling the production technology via a (single) directional distance function (e.g., Färe et al., 2005), the inefficiency is defined over the entire vector of outputs, both desirable and undesirable. This produces a single measure of inefficiency which is a weighted combination of the technical and environmental inefficiencies, where the "weighting" is done on the basis of the prespecified directional vector. Similarly, the directional-output-distance-functionbased productivity index identifies the "composite" productivity growth only. Thus, modeling undesirable outputs via the standard directional functions precludes researchers from disentangling the technical inefficiency/productivity, conventionally oriented along desirable outputs, from the environmental, or so-called "green", inefficiency/productivity, oriented along undesirable outputs. 1 Both can be of great interest from a policy perspective.
In this paper, we follow a different path to modeling the production process with undesirable outputs in the spirit of Fernández et al. (2002Fernández et al. ( , 2005, Forsund (2009) and Murty et al. (2012). Specifically, we model the firm's production process as a system of separate simultaneous production technologies for desirable and undesirable outputs. In this setup, desirable outputs are produced by transforming inputs via the conventional transformation function satisfying all standard assumptions. Consistent with the material balance condition, the by-production of undesirable outputs is however treated as the so-called "residual generation technology". The above setup explicitly recognizes that the generation of undesirable outputs is not the intended production but rather the by-production process. By separating the generation of undesirable outputs from that of desirable outputs, we ensure that the former are not modeled as inputs as well as take into account their "costly disposability" (see Murty et al., 2012).
The by-production system approach that we employ in this paper permits us to not only distinguish between technical efficiency and (undesirable-output-specific) environmental efficiencies but to also differentiate between traditional technical productivity and environmental ("green") productivity. Specifically, we derive a Solow (1957) type primal (Divisia) environmental productivity index which, unlike a conventional desirable-output-oriented productivity index, is defined as the contraction rate in undesirable outputs unexplained by the contraction in desirable outputs. This allows us to credit the ceteris paribus reduction in undesirable outputs. Our environmental productivity index also provides a meaningful way to decompose productivity into environmental technological change and environmental efficiency change.
We apply our system approach as well as the environmental productivity index to study the efficiency and productivity trends among coal-fired electric power generating plants in the U.S. during the 1985-1995 period. The production of (desirable) electric power by these utilities is accompanied by the (undesirable) emission of SO 2 and NO x gases. We estimate the model subject to theoretical regularity conditions using (numerically) efficient Bayesian MCMC technique, where we also allow for unobserved plant-specific heterogeneity in addition to time-varying inefficiencies. We impose monotonicity and curvature regularity restrictions (at every data point) in order to ensure that our results are economically meaningful, as emphasized by Barnett et al. (1991) and Barnett (2002). Among many things, we find that electric utilities in our sample tend to suffer from higher levels of environmental inefficiency in the emission of SO 2 than in the emission of NO x gases. We also document a significant divergence between the electric-poweroriented technical productivity and the emission-oriented environmental productivity. Specifically, we find that, while the pooled posterior mean estimate of (annual) productivity growth is negative for electric power generation (-0.13%), it is non-negligibly positive for the SO 2 and NO x emissions: 2 2.25% and 3.31% per annum, respectively. The cumulative eleven-year growth is 23.26% for the SO 2 -oriented environmental productivity, 37.98% for the NO x -oriented environmental producitivity and a mere 5.33% for the electric-power-oriented technical productivity.
The rest of the paper proceeds as follows. Section 2 describes the by-production system approach to modeling production technology in the presence of undesirable outputs as well as provides the derivation of the environmental productivity index. We explain our econometric strategy in Section 4. Section 5 presents the results, and Section 6 concludes.

The By-Production Model
Building on Fernández et al. (2002Fernández et al. ( , 2005, the undesirable-output-generating production system (T) with J inputs X ∈ R J + , M desirable outputs Y ∈ R M + and P undesirable outputs B ∈ R P + can be formalized as the intersection of the primary technology used in the production of desirable outputs (T 0 ) and P individual undesirable-output residual generation technologies (T p , p = 1, . . . , P ), i.e., 3 (2.1) Consider the case of J = 3 inputs, M = 1 desirable and P = 2 undesirable outputs (as in our empirical application). Allowing for technical inefficiency in the production of a desirable output and environmental inefficiency in the by-production of undesirable outputs, we rewrite system T in terms of the stochastic transformation function and two separate (environmental) residual generation functions for each undesirable output, i.e., where θ ≤ 1 and λ p ≤ 1 are technical and environmental efficiencies, respectively; and (v 0 , v p ) are the white noise terms. The transformation function F (·) is assumed to satisfy all standard properties such as continuity, positive (negative) monotonicity in Y (X), linear homogeneity in Y and concavity in X and Y . Similarly, the residual generation function H p (·) is continuous, positively (negatively) monotonic in B p (Y ), linearly homogeneous in B p and convex in Y and B p . Thus, the production system (2.2) permits the identification of both the technical and environmental efficiencies: θ and λ p (p = 1, . . . , P ). The latter is feasible due to the separability of the primary desirable-output generating production technology (2.2a) from the undesirable-output residual generating technologies (2.2b), which is motivated by the by-production approach satisfying the material balance condition. For instance, one would generally be unable to disentangle technical and environmental efficiencies (the way the above system approach allows us to) if following a popular alternative to the estimation of production processes in the presence of undesirable outputs based on the directional distance function (Chung et al., 1997).
Specifically, when modeling the production technology via a (single) directional distance function (e.g., Färe et al., 2005), the inefficiency is defined over the entire vector of outputs, both desirable and undesirable, using an a priori specified directional vector. The latter precludes researchers from disentangling the technical inefficiency conventionally oriented along desirable outputs from the environmental inefficiency oriented along undesirable outputs. The directional distance function rather produces a "composite" measure of inefficiency which is a weighted combination of the two, where the "weighting" is done on the basis of the prespecified direction. Further, unlike a system in (2.2), the directional distance function yields an additive, not a proportional, measure of inefficiency.

Technical and Environmental Productivity
The production system T that we consider in this paper permits us to not only distinguish between technical efficiency θ (conventionally defined over the desirable output) and undesirable-outputspecific environmental efficiencies {λ p ; p = 1, . . . , P } but to also differentiate between traditional technical productivity and environmental, or the so-called "green", productivity.
Letting time t enter the transformation and residual generation functions F (·) and H p (·) explicitly and making use of their linear homogeneity properties, system (2.2) can be rewritten in the log form as where, for convenience, we define f (·) and u p,t def = − ln λ p,t ≥ 0 (p = 1, 2) are technical and environmental inefficiencies, respectively. Total differentiation of (2.3) with respect to t yields where we have made use of ∂v 0,t /∂t = ∂v p,t /∂t = 0 since (v 0 , v p ) are the i.i.d. white noise. After some rearranging, from (2.4a) we get the following Solow (1957) type (Divisia) technical productivity index: along with the similarly defined environmental productivity index from (2.4b): The negative monotonicity of F (·) and H p (·) in inputs and desirable outputs, respectively, imply that ∂ ln f (X t , t)/∂ ln X j,t ≥ 0 and ∂ ln h p (Y t , t)/∂ ln Y t ≥ 0. 4 Unlike T P G which is conventionally defined as the expansion rate in a desirable output unexplained by the growth in inputs, the environmental productivity index EP G is defined as the contraction rate in an undesirable output unexplained by the contraction in desirable outputs. 5 This allows crediting the ceteris paribus reduction in undesirable outputs.
Further, equations (2.5) and (2.6) provide a meaningful way to decompose respective productivity indices into technical/technological change and efficiency change. The conventional technical productivity index T P G equals the sum of the technical change T T C = ∂ ln f (X t , t)/∂t, which measures the temporal shift in the production frontier, and technical efficiency change T EC = −∂u 0,t /∂t, which measures the movement toward (away from) the frontier. Similarly, the B p -oriented environmental productivity index EP G p is decomposed into similarly interpreted environmental technological change ET C p = −∂ ln h p (Y t , t)/∂t and environmental efficiency change EEC p = −∂u p,t /∂t. Note the conceptual difference between the definition of a "technological progress" for desirable outputs and that for undesirable outputs. Namely, for a desirable output Y the technological progress corresponds to the case of T T C > 0, i.e., an outward shift in the production frontier over time, whereas for an undesirable output B p the technological progress corresponds to ET C p < 0, i.e., an inward shift in the residual generating frontier over time. Thus, the residual generating frontier H p (·) (p = 1, . . . , P ) is defined as the minimum quantity of undesirable output generated when producing a given quantity of desirable outputs subject to the material balance condition.
We emphasize that the primary advantage of employing a system approach to model the production process with undesirable outputs, which we consider in this paper, is the opportunity to disentangle technical and environmental productivities. For instance, as in the case of inefficiency, one generally cannot do that when using the productivity index based on the directional distance function (Feng and Serletis, 2013).

Data
The data we use come from Pasurka (2006) and Murty et al. (2012). A balanced panel consists of 92 coal-fired electric power generating plants operating in the U.S. over the period from 1985 to 1995. We focus on coal-fired plants only in order to minimize heterogeneity among units. More specifically, we focus on utilities of which at least 95% of total fuel consumption (measured in Btu) come from coal. We also exclude utilities whose consumption of fuels other than coal, oil and natural gas exceeds 10 −4 % of total fuel consumption.
The specification of outputs and inputs is as follows. The desirable output is the net electric power generation Y , measured in kWh. The two undesirable outputs are (i) the SO 2 (sulfur dioxide) gas emissions B 1 and (ii) the NO x (nitrogen oxides) gas emissions B 2 , both measured in short-tons. The three inputs to the production are (i) the real stock of physical capital X 1 , constructed from historical cost of plant data and deflated to constant dollars using the Handy-Whitman Index, (ii) labor X 2 , measured in the number of employees, and (iii) energy X 3 , i.e., the heat content of coal, oil and natural gas consumption, measured in Btu.
The data on the cost of plants and equipment (used in the construction of the capital stock) and the number of employees come from the U.S. Federal Energy Regulatory Commission Form 1 survey. The data on fuel consumption, net power generation and pollutant gas emissions come from the U.S. Department of Energy Form EIA-767 survey. For more details on the data, see Pasurka (2006).
complemented by the (environmental) residual generation technologies for undesirable outputs where a lower-case variable denotes the log of its upper-case counterpart, and D it denotes the time dummy. For greater flexibility, we also allow for unobserved firm-specific heterogeneity which we model via "true" random effects {(α 0,i , γ 0,i , δ 0,i ); i = 1, . . . , n}. The presence of these random effects (in addition to inefficiencies) captures additional technological heterogeneity among firms.
Since y it appears on the right-hand side of equations for undesirable outputs b 1,it and b 2,it , it is imperative that all three equations in (4.1)-(4.2) be estimated as a system (of simultaneous equations) in order to control for the endogeneity of outputs. We estimate this production system subject to symmetry (α hj = α jh ) as well as monotonicity and curvature restrictions. In this paper, we thus concur with Barnett et al. (1991) and Barnett (2002) on the importance of maintaining the latter theoretical regularity conditions when modeling technology (especially, if allowing for inefficiency) in order to ensure that the results are economically meaningful.
Specifically, the monotonicity conditions are: The curvature is imposed using restrictions on the eigenvalues of the Hessian matrices in levels (see O'Donnell and Coelli, 2005). We employ the following stochastic specification for system (4.1)-(4 .2): where N + denotes the (multivariate) half-normal distribution; 6 Σ and Σ u are the covariance matrices; Z it = I 3 ⊗ D where I κ is an identity matrix of dimension κ and D = [D i1 , . . . , D iT ] ; and τ = vec {τ kt ; k = 0, 1, 2; t = 1, . . . , T } is a set of 3T unknown parameters. The location parameters of each inefficiency term u k,it (k = 0, 1, 2) is given by T t=1 τ kt D it . Thus, for greater flexibility in modeling time effects, we allow inefficiency to be time-varying (i.e., a function of the time dummies). The error components (v it , u it ) are assumed to be orthogonal as well as independent of x j,it (j = 1, . . . , J). Further, the random effects (α 0,i , γ 0,i , δ 0,i ) are assumed to be identically, independently distributed from the error components (v it , u it ) as well as independent of x j,it (j = 1, . . . , J): where Ω = diag{σ 2 α , σ 2 γ , σ 2 δ }.

Priors
For the parameters in system (4.1)-(4.2), which we collectively denote by ϑ, we assume a noninformative prior that imposes the regularity restrictions so that p(ϑ) ∝ I(ϑ ∈ R), where R denotes the set of acceptable parameters. For scale parameters σ 2 where N = 1 and Q = 10 −4 . For τ , we assume a proper but relatively non-informative prior of the form τ ∼ N (0, cI 3T ) with c = 10 4 . For Σ and Σ u , we assume proper but relatively non-informative priors in the Wishart family. The results are not sensitive to c, N or Q unless c becomes approximately less than 0.1, in which case it approaches the domain of a dogmatic prior.
One may inquire if it would be possible to select objective priors such as in the case of Jeffreys' prior. One way to proceed with objective priors would be along the lines of Berger and Mortera (1999) and Mulder et al. (2010). For instance, the use of a constrained posterior prior along the lines of Berger and Mortera (1999) is an option. The Jeffreys' prior cannot be obtained analytically but can be computed using numerical or analytic derivatives. This computation is certainly heavy. Furthermore, the Jeffreys' prior is not used as much in the present literature, and the emphasis is rather placed on the so-called intrinsic Bayes factor (see Berger and Pericchi, 1996). We leave the issue for future research, but we do not expect much change since our results were not sensitive to important aspects of the prior.
The first term in the third line of (4.6) owes to the constraint u it ≥ 0. Specifically, our stochastic assumptions about u it imply the density which requires the evaluation of a tri-variate normal integral that can be performed using standard numerical algorithms. Before proceeding with MCMC methods for inference, note that the posterior is given by While the multivariate integration can be performed in the closed form with respect to inefficiencies u, the induced nonlinearity however precludes analytical integration with respect to random effects α. We are not aware of any efficient MCMC scheme that draws these random effects as a block from the posterior, especially when n is relatively large. The posterior conditional distribution of latent inefficiencies is where V = Σ −1 + Σ −1 u −1 . Draws from the above conditional distribution can be easily obtained.
The same is true for the posterior conditional distribution of the random effects if we write r it from (4.7) as r it ≡ α i − R it , where R it is defined in an obvious way. We can then draw the random effects as a block for observation i as follows where If it were not for the regularity constraints and the non-standard form of the posterior conditional distribution of τ (due do the term involving the multivariate normal integral), we could easily derive the posterior conditional distribution of parameters of interest ϑ.
Collecting data for all observations over i = 1, . . . , n and t = 1, . . . , T , we rewrite our model in an obvious matrix notation:  where we assume ζ = ζ 0 , ζ 1 , ζ 2 ∼ N (0, Σ u ) . (4.14) We rewrite the system of equations (4.13) compactly as where Y is an nT × (2 × 3) vector of "data" appearing on the left-hand side of equalities in (4.13), System (4.15) takes the form of a multivariate regression model with (4.16) The GLS estimator of Θ is given by with the corresponding covariance matrix: We note that the above approximation however ignores that ϑ included in needs to satisfy ϑ ∈ R (the regularity conditions).
Let us define a multivariate normal distribution centered at of which the covariance is h × cov{ } for some constant h > 0. We denote this distribution by f N ,κ ( ; , h × cov{ }), where κ is the dimensionality of , i.e., the number of parameters in the extended system (4.13). We use the GLS quantities to form a proposal density for generating candidate parameter draws as we describe below.
Next, we describe how to realize draws from the conditional posterior distributions of σ, Σ and Σ u . Except for Σ u , σ and Σ can be drawn from standard statistical distributions. Specifically, for the elements of σ we have: Here, θ −k denotes all elements of the entire parameter vector θ including all latent variables except the indicated subscripted parameter k ∈ {α, γ, δ}.
Our priors are conditionally conjugate, i.e., where N is a scalar prior parameter and A, A u are prior matrices. In our empirical work we take N = 10 and A = A u = 10 −3 × I 3 .
Clearly, Σ −1 belongs to the Wishart family. The same would have been true for Σ u if it were not for the second line of (4.22) which involves the Cholesky factor of this matrix. Therefore, we use the Wishart distribution to draw a candidate matrix and we retain the candidate with probability where C (c) u denotes the candidate draw and C (s) u is the existing sth draw (s = 1, ..., S).

Imposition of Restrictions
Imposing restrictions is not trivial in our application. Since the restrictions depend on the data, we adopt the following strategy. We draw from the proposal density described in the previous subsection subject to the constraints ϑ ∈ R using a special form of rejection to improve the efficiency of "naive rejection" which would keep drawing parameters until all constraints are satisfied. Specifically, we first use acceptance at a limited number of points to facilitate acceptance and then we keep drawing from the proposal distribution until all regularity constraints hold at all data points.
We first impose the restrictions at the means of variables (normalized to zero) and then at points ±r around the mean. We choose r = {0.5, 1, 2, 3}, and the restrictions hold without much trouble in the positive direction. In the negative direction, the restrictions are first tested for r = −0.1, −0.2, ..., −2 and then tested at the remaining points. This yields considerable improvement in the efficiency (i.e., timing) of acceptance rates from a density which we describe next.
Based on a current draw (s) such that ϑ (s) ∈ R, a new candidate (c) ∼ N ( , h×cov{ })× I(ϑ (c) ∈ R) is generated until it satisfies the regularity restrictions. The candidate is accepted and we set (s+1) = (c) with the Metropolis-Hastings probability otherwise we repeat the current draw, that is (s+1) = (s) ; s = 1, . . . , S. We adjust the scaling constant h so that the acceptance rate of the Metropolis-Hastings algorithm is between 20% and 30%. The Metropolis-Hastings algorithm also takes care of nonlinearity of the posterior in τ . We generate the covariance matrix Σ and scale parameters σ from their respective posterior conditional distributions which are all in standard form (inverted Wishart and inverted Gamma). The latter however is not the case for Σ u (i.e., this matrix cannot be drawn using an inverted Wishart). We therefore take an extra Metropolis-Hastings step to accommodate the presence of the Cholesky factor of Σ u (i.e., C u ) in the posterior inside the multivariate normal integral. Acceptance rates using a simple Metropolis-Hastings step were quite high (over 90%), and simple scaling has brought it down to the range of 20-25%.
Our MCMC uses S preliminary or transient passes until we obtain convergence using Geweke's (1992) relative numerical efficiency (RNE) diagnostic. Once convergence is achieved, we take another 100,000 passes. We do not use thinning. Instead, we report posterior standard deviations based on Newey-West HAC covariance matrices using 10 lags. For details, see Table 1.

Improving Performance of MCMC
We can explicitly integrate u out of (4.6) to obtain a kernel posterior of the following form: (4.25) Further, we can also derive the closed-form conditional posterior of random effects p(α|Ξ, θ −α ). We can achieve a significant improvement of MCMC performance by recognizing that the random effects α can be explicitly integrated out of the posterior, when the parameters ϑ are drawn. Specifically, similar to (4.13), we consider the following system which we can rewrite in compact notation as Collecting all (time) observations for a given firm i together, we obtain: where Y i , X i and V i are defined in an obvious way. Clearly: . . . , n , (4.29) where J T is a T × T matrix of which all elements are equal to one. Therefore, we can redefine the GLS quantities that are used to obtain a good proposal distribution for ϑ using the following as the covariance matrix V, i.e., (4.30) Using the modified proposal density, we effectively marginalize out the random firm-specific effects from the posterior and thus can draw latent inefficiencies u it marginally on these effects hoping to reduce overall autocorrelation in MCMC due to the correlation between α i and u it . 7 This requires a trivial modification in the way we draw latent inefficiencies. Since model (4.26) may be written as after collecting all (time) observations for a given firm i, in obvious notation, we have where X i and V i are naturally redefined to account for a change of sign in the first equation of (4.31) in order to accommodate a uniform sign in front of inefficiencies U i . Also recall that V i and its stochastic properties have been defined before. Now we can draw 3T × 1 inefficiencies U i as a block, after we couple system (4.32) with the following specification: we can draw latent inefficiencies, marginally on random effects, using the following multivariate truncated normal distributions, i.e., the first two moments of which are We have found that drawing blocks of latent inefficiencies marginally on random effects α i (and conditionally on various covariance matrices and (ϑ, τ )) results in vast improvements in terms of computational efficiency. Table 1 summarizes our computational experience with the data.

Random Effects
Based on our discussion above, we note that given the way the variance parameters σ 2 enter the covariance matrix Ω, in principle, there is no problem in treating the random effects α i as jointly normally distributed, i.e., α i ∼ N (0, Ω), independently over i = 1, ..., n as well as independent from all other random variables and regressors in the model. All our derivations, including the conditional posterior distribution of α i , hold true. The only difference is that, in a more general setting (when random effects are allowed to be correlated across equations) one has to draw Ω from its conditional posterior distribution as a general positive definite matrix, whereas, when the random effects are a priori independent, the problem boils down to drawing variances σ 2 only. NOTES: The random effects (α 0,i , γ 0,i , δ 0,i ) are for (y, b 1 , b 2 ), respectively. Standard deviations are in parenthesis and are computed using a Newey-West HAC correction with 10 lags.
A general case of correlated random effects has an empirical implication of firm-specific effects in the production function being correlated with those in residual generation functions for undesirable outputs. The latter possibility is testable and is of interest on its own. Should Ω be found not to be diagonal, one should naturally focus on the sign of the correlation between the random effects.
Given a proper prior p (Ω) on the different elements of Ω and the marginal posterior p (Ω|Ξ), the Verdinelli and Wasserman (1995) approach to computing the Bayes factor in favor of diagonality is given by which, in the general case, involves testing k (k − 1) /2 zero restrictions, where k is the dimension of Ω (in our case, k = 3). By "Ω = diagonal", we mean the zero restrictions While the denominator of BF diag is easy to compute, the numerator is computed in a standard fashion as where θ

(s)
−Ω is the sth (of S) MCMC draw of all parameters θ except those in Ω. Note that θ (s) −Ω does include the diagonal elements of Ω in this computation. It remains to show how draws from the conditional posterior distribution may be realized. Our prior is conditionally conjugate and has the following form: The conditional posterior distribution is given by where A Ω = n i=1 R i R i and R i def = α i − r it as before. We define the baseline prior using N o = 10 and A o = c × I 3 with c = 10 −3 . Clearly, Ω belongs to the Wishart family.
The Bayes factor BF diag using the baseline prior is 2.402 × 10 −3 with the corresponding range of (1.015 × 10 −5 ; 0.0893), which suggests that the diagonality of Ω can be definitely rejected. In order to compute the range of BF diag , we generate 1,000 alternative priors and implement the approximated MCMC using the sampling-iterative-resampling (SIR) algorithm which re-weights the original MCMC sample without recomputing MCMC samples for a new prior. For each one of the SIR re-weighted samples, we implement the Verdinelli-Wasserman (1992) approach, and the  Table 2 reports the posterior means and standard deviations of the correlation coefficients between random effects α i derived from Ω. We find that unobserved firm-specific effects are all positively correlated.

Results
Before proceeding to the discussion of technical and environmental inefficiencies as well as productivity and its components, we first focus on elasticities of the production process by electric utilities in our data sample. Table 3 reports the summary of posterior estimates of these elasticities, including input elasticities of the primary production function (i.e., electric power generation) as well as elasticities of SO 2 and NO x emissions (undesirable outputs) with respect to the net generated electric power (desirable output). 8 In particular, the reported input elasticity estimates imply a posterior mean estimate of the returns to scale, defined as the sum of input elasticities, of 0.90, which suggests that, on average, electric utilities operated at decreasing returns to scale during our sample period.

Figure 1: Kernel Densities of Technical and Environmental Inefficiency Estimates
The estimates of ∂b p,it /∂y it (p = 1, 2) are of particular interest since they capture the cost of expanding the production of electric power in terms of the associated increase in the generation of the SO 2 and NO x emissions. It is intuitive to interpret these estimates as "shadow prices" (in the elasticity form) of the power generation. The posterior mean estimates of the two shadow prices are 1.09 and 1.13. The latter implies that, on average, an increase in the net power generation by 1% requires a simultaneous increase in the SO 2 and NO x emissions by at least 1.09% and 1.13%, respectively. Note that emissions may increase by even more if the firm is not on the residual generating frontier, i.e., environmentally inefficient.
We next proceed to the discussion of technical and environmental inefficiencies exhibited by the utilities in our sample. Figure 1 plots kernel densities of the posterior estimates of the three types of inefficiency. In order to construct the figure, we use a Gaussian kernel with the cross-validated bandwidth parameters. We find apparent differences between the distributions of technical and environmental inefficiencies. Specifically, while technical inefficiency is relatively symmetrically distributed around its mean of 0.09, the distribution of the NO x -oriented environmental inefficiency is noticeably skewed to the right and the distribution of the SO 2 -oriented environmental inefficiency exhibits apparent bi-modality. There may be many reasons for such a stark difference between the levels of technical and environmental inefficiencies across utilities. One plausible explanation is that technical inefficiency may also be capturing declines in the desirable output due to unforeseen fluctuations in the demand for electric power. Since inputs often cannot be immediately adjusted/reallocated and electric power is not easily storable, electric plants may be forced to under-utilize their facilities and labor, which our model would detect and classify as technical underperformance (inefficiency) relative to the frontier. However, such a demand uncertainty would not apply to the by-production of undesirable SO 2 and NO x gases given the exact physical relationship between the power generation and the associated emission of pollutant gases. The latter is also capable of at least partly explaining why environmental inefficiency (in the emission of both SO 2 and NO x gases) appears to be relatively more stable over time unlike the electric-power-oriented technical inefficiency, which we discuss in more detail later in the paper.
Further, we find that electric utilities tend to suffer from higher levels of inefficiency in the emission of SO 2 than in the emission of NO x gases. These differences may be reflective of varying degree of strictness of environmental regulations (or the degree of their enforceability) for different pollutants across states. For instance, loose regulations for the SO 2 emissions could potentially explain why the SO 2 -oriented environmental inefficiency is considerably higher on average and is more dispersedly distributed than the NO x -oriented inefficiency. Lastly, we document little correlation between the two environmental inefficiencies as well as between the environmental inefficiencies and the technical inefficiency. Table 4 reports such Spearman rank correlation coefficients for the posterior inefficiency estimates. For instance, there appears to be virtually no relationship between the electric-power-oriented technical inefficiency and the SO 2 -oriented environmental inefficiency exhibited by utilities. We next proceed to the posterior estimates of the productivity growth components -technological change and efficiency change -as defined in (2.5) and (2.6). From Table 3, we see that the levels of both environmental efficiencies appear to be stable over time: posterior mean estimates of the SO 2 EEC and the NO x EEC are virtually zero. The left panel of Figure 2, which depicts box-and-whiskers plots of the distributions of technical and environmental efficiency change estimates for each year in our sample, confirms the relative stability of environmental efficiency levels (see the left panel of the figure). The electric-power-oriented (i.e., desirable-output-oriented) technical efficiency is however less stable over the course of the years. The mean estimates of T EC are predominantly positive across utilities in 1986, 1988 and 1995, whereas a significant decrease in efficiency is documented for 1987 and 1994. However, the mean posterior estimate of T EC pooled over the entire sample is a mere 0.29% (also see Table 3).
We document several regularities in the estimates of the technological change. First, the technical change T T C is primarily negative and is close to zero in the production of electric power (a desirable output). The posterior mean estimate of annual T T C is -0.42%. This finding is in sharp contrast with what we observe for the environmental technological change ET C. The posterior mean estimates of ET C for the SO 2 and NO x emissions (undesirable outputs) are staggering 2.25% and 3.31% per annum, respectively. Second, we find that technological change is fairly stable across the years in all dimensions, be it the intended production of electric power or undesirable by-production of emission gases. The right panel of Figure 2 confirms this observation: the distributions of T T C and the two ET C do not change much over the course of the years.
The discussed differences in the temporal dynamics between technical and environmental efficiency change and technological change result in dramatic differences in the measures of productivity across desirable and undesirable outputs. We find that, while the pooled posterior mean estimate of (annual) productivity growth is negative for electric power generation (-0.13%), it is substantially positive for the SO 2 and NO x emissions: annual 2.25% and 3.31%, respectively. In other words, keeping input quantities constant, the net electric power generation, on average, fell by 0.13% per year during our sample period. Utilities however did a significantly better job in terms of cutting the emission of SO 2 and NO x gases for any fixed quantity of the net electric power generated: on average, emissions fell by respective 2.25% and 3.31% per year ceteris paribus. This disconnect between technical and environmental productivities of electric plants in our sample is also confirmed by virtually zero rank correlation coefficients between T P G and EP G for the SO 2 and NO x emissions (see Table 4). Figure 3 vividly illustrates the differences between technical and environmental productivities. It plots the productivity indices that are normalized to 100 in the year 1985 and are constructed using the (respective)-output-weighted average annual productivity growth rates (over all utilities in the sample). The figure shows that the industry enjoyed stable positive rates of EP G during the 1985-1995 period, whereas T P G was more sporadic and included periods of both positive and negative growth. The cumulative eleven-year growth is 23.26% for the SO 2 -oriented EP G, 37.98% for the NO x -oriented EP G and a mere 5.33% for the electric-power-oriented T P G.

Conclusion
The prevalent approaches to modeling the firm's production process in the presence of undesirable outputs either treat these outputs as inputs, which questionably implies their strong disposability as well as violates the "material balance condition", or employ the directional distance function that, despite a sound theoretical foundation, allows the identification of a "composite" measure of inefficiency only. Similarly, the directional-output-distance-function-based productivity index identifies only the composite productivity which is a weighted combination of the technical and environmental productivities. This precludes researchers from disentangling the technical inefficiency/productivity, conventionally defined over desirable outputs, from the environmental ("green") inefficiency/productivity, defined over undesirable outputs.
In this paper, we follow a different path to modeling the production process with undesirable outputs in the spirit of Fernández et al. (2002Fernández et al. ( , 2005, Forsund (2009) and Murty et al. (2012). Specifically, we model the productive operations of the firm as a system of simultaneous production technologies for desirable and undesirable outputs. In this setup, desirable outputs are produced by transforming inputs via the conventional transformation function satisfying all standard assumptions. Consistent with the material balance condition, the by-production of undesirable outputs is however treated as the so-called "residual generation technology". The above setup explicitly recognizes that the generation of undesirable outputs is not the intended production but rather the by-production process. By separating the generation of undesirable outputs from that of desirable outputs, we ensure that the former are not modeled as inputs as well as take into account their "costly disposability".
The by-production system approach that we employ in this paper permits us to not only distinguish between technical efficiency and (undesirable-output-specific) environmental efficiencies but to also differentiate between traditional technical productivity and environmental productivity. Specifically, we derive a Solow (1957) type primal (Divisia) environmental productivity index which, unlike a conventional desirable-output-oriented productivity index, is defined as the contraction rate in undesirable outputs unexplained by the contraction in desirable outputs. This allows us to credit the ceteris paribus reduction in undesirable outputs. Our environmental productivity index also provides a meaningful way to decompose productivity into environmental technological change and environmental efficiency change.