Bayesian Approach to Disentangling Technical and Environmental Productivity

Emir Malikov; Subal C. Kumbhakar; Efthymios G. Tsionas

doi:10.3390/econometrics3020443

¹ Department of Economics, St. Lawrence University, Canton, NY 13617, USA

² Department of Economics, State University of New York at Binghamton, Binghamton, NY 13902, USA

³ Norwegian Agricultural Economics Research Institute, NO-0030 Oslo, Norway

⁴ Department of Economics, Lancaster University Management School, Lancaster LA1 4YX, UK

Econometrics 2015, 3(2), 443-465; https://doi.org/10.3390/econometrics3020443

Abstract

This paper models the firm’s production process as a system of simultaneous technologies for desirable and undesirable outputs. Desirable outputs are produced by transforming inputs via the conventional transformation function, whereas (consistent with the material balance condition) undesirable outputs are by-produced via the so-called “residual generation technology”. By separating the production of undesirable outputs from that of desirable outputs, not only do we ensure that undesirable outputs are not modeled as inputs and thus satisfy costly disposability, but we are also able to differentiate between the traditional (desirable-output-oriented) technical productivity and the undesirable-output-oriented environmental, or so-called “green”, productivity. To measure the latter, we derive a Solow-type Divisia environmental productivity index which, unlike conventional productivity indices, allows crediting the ceteris paribus reduction in undesirable outputs. Our index also provides a meaningful way to decompose environmental productivity into environmental technological and efficiency changes.

Keywords:

bad output; by-production; efficiency; MCMC; productivity

JEL classifications:

C11; C30; C43; D24

1. Introduction

The by-production of undesirable, or so-called “bad”, outputs is an inherent attribute of many production processes. Electric power generation is a classical example of such a process, where the production of electricity (desirable output) is accompanied by the emission of pollutant gases (undesirable outputs). It is therefore imperative to account for undesirable outputs when estimating the productivity growth for such processes (e.g., see [1,2]).

The estimation of productivity (and, potentially, its components) naturally requires the estimation of the firm’s production process, the modeling of which is, however, not clear-cut in the presence of undesirable outputs. A standard approach is to condition the conventional transformation (production) function on undesirable outputs (e.g., [3,4,5]), which effectively treats these outputs as inputs. Such a treatment of undesirable outputs has since been heavily criticized due to the implied strong disposability of undesirable outputs [6] and the violation of the “material balance condition” [7]. A popular alternative approach to tackling undesirable outputs is to specify a (single) directional output distance function [6,8], which accommodates both the expansion in desirable outputs and a simultaneous contraction in undesirable outputs. Feng and Serletis [9] have recently proposed a primal Divisia productivity index based on such a directional output distance function.

Both the directional output distance function and the productivity index based on the latter allow the identification of a “composite” measure of inefficiency and productivity (respectively) only. Specifically, when modeling the production technology via a (single) directional distance function (e.g., [6]), the inefficiency is defined over the entire vector of outputs, both desirable and undesirable. This produces a single measure of inefficiency which is a weighted combination of the technical and environmental inefficiencies, where the “weighting” is done on the basis of the prespecified directional vector. Similarly, the directional-output-distance-function-based productivity index identifies the “composite” productivity growth only. Thus, modeling undesirable outputs via the standard directional functions precludes researchers from disentangling the technical inefficiency/productivity, conventionally oriented along desirable outputs, from the environmental, or so-called “green”, inefficiency/productivity, oriented along undesirable outputs.1 Both can be of great interest from a policy perspective.

In this paper, we follow a different path to modeling the production process with undesirable outputs in the spirit of Fernández et al. [11,12], Forsund [13] and Murty et al. [7]. Specifically, we model the firm’s production process as a system of separate simultaneous production technologies for desirable and undesirable outputs. In this setup, desirable outputs are produced by transforming inputs via the conventional transformation function satisfying all standard assumptions. Consistent with the material balance condition, the by-production of undesirable outputs is, however, treated as the so-called “residual generation technology”. The above setup explicitly recognizes that the generation of undesirable outputs is not the intended production but rather the by-production process. By separating the generation of undesirable outputs from that of desirable outputs, we ensure that the former are not modeled as inputs and also take into account their “costly disposability” (see [7]).

The by-production system approach that we employ in this paper permits us to not only distinguish between technical efficiency and (undesirable-output-specific) environmental efficiencies but to also differentiate between traditional technical productivity and environmental (“green”) productivity. Specifically, we derive a Solow [14] type primal (Divisia) environmental productivity index which, unlike a conventional desirable-output-oriented productivity index, is defined as the contraction rate in undesirable outputs unexplained by the contraction in desirable outputs. This allows us to credit the ceteris paribus reduction in undesirable outputs. Our environmental productivity index also provides a meaningful way to decompose productivity into environmental technological change and environmental efficiency change.

We apply our system approach as well as the environmental productivity index to study the efficiency and productivity trends among coal-fired electric power generating plants in the U.S. during the 1985–1995 period. The production of (desirable) electric power by these utilities is accompanied by the (undesirable) emission of SO₂ and NO_x gases.

We estimate the model subject to theoretical regularity conditions using a (numerically) efficient Bayesian MCMC technique, where we also allow for unobserved plant-specific heterogeneity in addition to time-varying inefficiencies. We impose monotonicity and curvature regularity restrictions (at every data point) in order to ensure that our results are economically meaningful, as emphasized by Barnett et al. [15] and Barnett [16]. Among other things, we find that electric utilities in our sample tend to suffer from higher levels of environmental inefficiency in the emission of SO₂ than in the emission of NO_x gases. We also document a significant divergence between the electric-power-oriented technical productivity and the emission-oriented environmental productivity. Specifically, we find that, while the pooled posterior mean estimate of (annual) productivity growth is negative for electric power generation (–0.13%), it is non-negligibly positive for the SO₂ and NO_x emissions:2 2.25% and 3.31% per annum, respectively. The cumulative eleven-year growth is 23.26% for the SO₂-oriented environmental productivity, 37.98% for the NO_x-oriented environmental productivity and a mere 5.33% for the electric-power-oriented technical productivity.

The rest of the paper proceeds as follows. Section 2 describes the by-production system approach to modeling production technology in the presence of undesirable outputs as well as provides the derivation of the environmental productivity index. Data are discussed in Section 3. We explain our econometric strategy in Section 4. Section 5 presents the results, and Section 6 concludes.

2. The By-Production Model

Building on Fernández et al. [11,12], the undesirable-output-generating production system (𝕋) with J inputs

X \in R_{+}^{J}

, M desirable outputs

Y \in R_{+}^{M}

and P undesirable outputs

B \in R_{+}^{P}

can be formalized as the intersection of the primary technology used in the production of desirable outputs (

T_{0}

) and P individual undesirable-output residual generation technologies (

T_{p}, p = 1, \dots, P

), i.e.,3

\begin{aligned} \mathbb{T} & = T_{0} \cap T_{1} \cap \dots \cap T_{p} \cap \dots \cap T_{P}, \text{ where} \\ T_{0} & \overset{def}{=} \{(X, Y) : X \text{ can produce } Y\} \\ T_{p} & \overset{def}{=} \{(B_{p}, Y) : Y \text{ generates } B_{p}\} \quad \forall p = 1, \dots, P . \end{aligned}

(1)

Consider the case of

J = 3

inputs,

M = 1

desirable and

P = 2

undesirable outputs (as in our empirical application). Allowing for technical inefficiency in the production of a desirable output and environmental inefficiency in the by-production of undesirable outputs, we rewrite system 𝕋 in terms of the stochastic transformation function and two separate (environmental) residual generation functions for each undesirable output, i.e.,

\begin{matrix} F (X, θ^{- 1} Y) & = \exp \{v_{0}\} \end{matrix}

(2a)

\begin{matrix} H_{p} (Y, λ_{p} B_{p}) & = \exp \{v_{p}\} \quad \forall p = 1, 2, \end{matrix}

(2b)

where

θ \leq 1

and

λ_{p} \leq 1

are technical and environmental efficiencies, respectively; and

(v_{0}, v_{p})

are the white noise terms. The transformation function

F (\cdot)

is assumed to satisfy all standard properties such as continuity, positive (negative) monotonicity in Y (X), linear homogeneity in Y and concavity in X and Y. Similarly, the residual generation function

H_{p} (\cdot)

is continuous, positively (negatively) monotonic in

B_{p}

(Y), linearly homogeneous in

B_{p}

and convex in Y and

B_{p}

.

Thus, the production system (2) permits the identification of both the technical and environmental efficiencies: θ and

λ_{p} (p = 1, \dots, P)

. The latter is feasible due to the separability of the primary desirable-output generating production technology (2a) from the undesirable-output residual generating technologies (2b), which is motivated by the by-production approach satisfying the material balance condition. For instance, one would generally be unable to disentangle technical and environmental efficiencies (the way the above system approach allows us to) if following a popular alternative to the estimation of production processes in the presence of undesirable outputs based on the directional distance function [8].

Specifically, when modeling the production technology via a (single) directional distance function (e.g., [6]), the inefficiency is defined over the entire vector of outputs, both desirable and undesirable, using an a priori specified directional vector. The latter precludes researchers from disentangling the technical inefficiency conventionally oriented along desirable outputs from the environmental inefficiency oriented along undesirable outputs. The directional distance function rather produces a “composite” measure of inefficiency which is a weighted combination of the two, where the “weighting” is done on the basis of the prespecified direction. Further, unlike a system in (2), the directional distance function yields an additive, not a proportional, measure of inefficiency.

Technical and Environmental Productivity

The production system 𝕋 that we consider in this paper permits us to not only distinguish between technical efficiency θ (conventionally defined over the desirable output) and undesirable-output-specific environmental efficiencies

{λ_{p}; p = 1, \dots, P}

but to also differentiate between traditional technical productivity and environmental, or the so-called “green”, productivity.

Letting time t enter the transformation and residual generation functions

F (\cdot)

and

H_{p} (\cdot)

explicitly and making use of their linear homogeneity properties, system (2) can be rewritten in the log form as

\begin{matrix} ln Y_{t} & = ln f (X_{t}, t) - u_{0, t} + v_{0, t} \end{matrix}

(3a)

\begin{matrix} ln B_{p, t} & = ln h_{p} (Y_{t}, t) + u_{p, t} + v_{p, t} \forall p = 1, 2, \end{matrix}

(3b)

where, for convenience, we define

f (\cdot) \overset{def}{=} {[F (\cdot, 1)]}^{- 1}

and

h_{p} (\cdot) \overset{def}{=} {[H_{p} (\cdot, 1)]}^{- 1}

; and

u_{0, t} \overset{def}{=} - ln θ_{t} \geq 0

and

u_{p, t} \overset{def}{=} - ln λ_{p, t} \geq 0 (p = 1, 2)

are technical and environmental inefficiencies, respectively.
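For completeness, the step from system (2) to system (3) can be written out as follows. This is a sketch that relies only on the linear homogeneity of F in Y and of H_p in B_p stated above, for the single-output case M = 1 and with time subscripts suppressed for brevity.

```latex
\begin{aligned}
F(X, \theta^{-1} Y) &= \theta^{-1} Y \, F(X, 1) = \exp\{v_0\}
  &&\Rightarrow\quad
  \ln Y = -\ln F(X, 1) + \ln \theta + v_0 = \ln f(X) - u_0 + v_0, \\
H_p(Y, \lambda_p B_p) &= \lambda_p B_p \, H_p(Y, 1) = \exp\{v_p\}
  &&\Rightarrow\quad
  \ln B_p = -\ln H_p(Y, 1) - \ln \lambda_p + v_p = \ln h_p(Y) + u_p + v_p .
\end{aligned}
```

The opposite signs on the inefficiency terms follow directly: technical inefficiency lowers the desirable output relative to its frontier, whereas environmental inefficiency raises the undesirable output relative to its frontier.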

Total differentiation of (3) with respect to t yields

\begin{matrix} \frac{d ln Y_{t}}{d t} & = \sum_{j = 1}^{J} \frac{\partial ln f (X_{t}, t)}{\partial ln X_{j, t}} \frac{\partial ln X_{j, t}}{\partial t} + \frac{\partial ln f (X_{t}, t)}{\partial t} - \frac{\partial u_{0, t}}{\partial t} \end{matrix}

(4a)

\begin{matrix} \frac{d ln B_{p, t}}{d t} & = \frac{\partial ln h_{p} (Y_{t}, t)}{\partial ln Y_{t}} \frac{\partial ln Y_{t}}{\partial t} + \frac{\partial ln h_{p} (Y_{t}, t)}{\partial t} + \frac{\partial u_{p, t}}{\partial t} \quad \forall p = 1, 2, \end{matrix}

(4b)

where we have made use of

\partial v_{0, t} / \partial t = \partial v_{p, t} / \partial t = 0

since

(v_{0}, v_{p})

are the i.i.d. white noise. After some rearranging, from (4a) we get the following Solow [14] type (Divisia) technical productivity index:

T P G \overset{def}{=} \frac{d \ln Y_{t}}{d t} - \sum_{j = 1}^{J} \frac{\partial \ln f (X_{t}, t)}{\partial \ln X_{j, t}} \frac{\partial \ln X_{j, t}}{\partial t} = \underbrace{\frac{\partial \ln f (X_{t}, t)}{\partial t}}_{T T C} \underbrace{- \frac{\partial u_{0, t}}{\partial t}}_{T E C},

(5)

along with the similarly defined environmental productivity index from (4b):

E P G_{p} \overset{def}{=} - \left(\frac{d \ln B_{p, t}}{d t} - \frac{\partial \ln h_{p} (Y_{t}, t)}{\partial \ln Y_{t}} \frac{\partial \ln Y_{t}}{\partial t}\right) = \underbrace{- \frac{\partial \ln h_{p} (Y_{t}, t)}{\partial t}}_{E T C_{p}} \underbrace{- \frac{\partial u_{p, t}}{\partial t}}_{E E C_{p}} \quad \forall p = 1, 2 .

(6)

The negative monotonicity of

F (\cdot)

and

H_{p} (\cdot)

in inputs and desirable outputs, respectively, implies that

\partial ln f (X_{t}, t) / \partial ln X_{j, t} \geq 0

and

\partial ln h_{p} (Y_{t}, t) / \partial ln Y_{t} \geq 0

.4

Unlike

T P G

which is conventionally defined as the expansion rate in a desirable output unexplained by the growth in inputs, the environmental productivity index

E P G

is defined as the contraction rate in an undesirable output unexplained by the contraction in desirable outputs.5 This allows crediting the ceteris paribus reduction in undesirable outputs.

Further, Equations (5) and (6) provide a meaningful way to decompose respective productivity indices into technical/technological change and efficiency change. The conventional technical productivity index

T P G

equals the sum of the technical change

T T C = \partial ln f (X_{t}, t) / \partial t

, which measures the temporal shift in the production frontier, and technical efficiency change

T E C = - \partial u_{0, t} / \partial t

, which measures the movement toward (away from) the frontier. Similarly, the

B_{p}

-oriented environmental productivity index

E P G_{p}

is decomposed into similarly interpreted environmental technological change

E T C_{p} = - \partial ln h_{p} (Y_{t}, t) / \partial t

and environmental efficiency change

E E C_{p} = - \partial u_{p, t} / \partial t

.
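As an illustration of Equations (5) and (6), the following minimal Python sketch computes the two indices from growth rates and elasticities. The function names and all numerical values are hypothetical and chosen for illustration only; they are not estimates from this paper.

```python
def technical_productivity_growth(dln_y, input_elasticities, dln_x):
    """TPG as in Eq. (5): output growth not explained by input growth."""
    explained = sum(e * g for e, g in zip(input_elasticities, dln_x))
    return dln_y - explained


def environmental_productivity_growth(dln_b, output_elasticity, dln_y):
    """EPG_p as in Eq. (6): contraction in B_p not explained by the change in Y."""
    return -(dln_b - output_elasticity * dln_y)


# Hypothetical annual log-changes and elasticities (illustrative values only).
tpg = technical_productivity_growth(
    dln_y=0.02,
    input_elasticities=[0.3, 0.1, 0.6],  # d ln f / d ln X_j, assumed
    dln_x=[0.01, 0.00, 0.02],            # growth in capital, labor, energy
)
epg = environmental_productivity_growth(
    dln_b=-0.01,            # emissions fall by 1%
    output_elasticity=0.9,  # d ln h_p / d ln Y, assumed
    dln_y=0.02,
)
```

In this made-up example emissions fall while output grows, so the environmental index credits both the observed reduction and the emissions growth that the output expansion alone would have implied.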

Note the conceptual difference between the definition of a “technological progress” for desirable outputs and that for undesirable outputs. Namely, for a desirable output Y the technological progress corresponds to the case of

T T C > 0

, i.e., an outward shift in the production frontier over time, whereas for an undesirable output

B_{p}

the technological progress corresponds to

E T C_{p} = - \partial ln h_{p} (Y_{t}, t) / \partial t > 0

, i.e., an inward shift in the residual generating frontier over time. Thus, the residual generating frontier

H_{p} (\cdot) (p = 1, \dots, P)

is defined as the minimum quantity of undesirable output generated when producing a given quantity of desirable outputs subject to the material balance condition.

We emphasize that the primary advantage of employing a system approach to model the production process with undesirable outputs, which we consider in this paper, is the opportunity to disentangle technical and environmental productivities. For instance, as in the case of inefficiency, one generally cannot do that when using the productivity index based on the directional distance function [9].

3. Data

The data we use come from Pasurka [17] and Murty et al. [7]. The balanced panel consists of 92 coal-fired electric power generating plants operating in the U.S. over the period from 1985 to 1995. We focus on coal-fired plants only in order to minimize heterogeneity among units. More specifically, we focus on utilities for which at least 95% of total fuel consumption (measured in Btu) comes from coal. We also exclude utilities whose consumption of fuels other than coal, oil and natural gas exceeds

10^{- 4} %

of total fuel consumption.

The specification of outputs and inputs is as follows. The desirable output is the net electric power generation Y, measured in kWh. The two undesirable outputs are (i) the SO₂ (sulfur dioxide) gas emissions

B_{1}

and (ii) the NO_x (nitrogen oxides) gas emissions

B_{2}

, both measured in short-tons. The three inputs to the production are (i) the real stock of physical capital

X_{1}

, constructed from historical cost of plant data and deflated to constant dollars using the Handy-Whitman Index; (ii) labor

X_{2}

, measured in the number of employees; and (iii) energy

X_{3}

, i.e., the heat content of coal, oil and natural gas consumption, measured in Btu.

The data on the cost of plants and equipment (used in the construction of the capital stock) and the number of employees come from the U.S. Federal Energy Regulatory Commission Form 1 survey. The data on fuel consumption, net power generation and pollutant gas emissions come from the U.S. Department of Energy Form EIA-767 survey. For more details on the data, see Pasurka [17].

4. Econometric Strategy

Under the assumption of the translog functional form of

ln f (\cdot)

and

ln h_{p} (\cdot)

, from system (3) we get the following system consisting of the production function for a desirable output

Y_{i t}

:

\begin{aligned} y_{i t} = {} & α_{0, i} + \sum_{j = 1}^{J} α_{j} x_{j, i t} + \frac{1}{2} \sum_{h = 1}^{J} \sum_{j = 1}^{J} α_{h j} x_{h, i t} x_{j, i t} + \sum_{t^{'} = 1}^{T} β_{t^{'}} D_{i t^{'}} \\ & + \sum_{t^{'} = 1}^{T} \sum_{j = 1}^{J} β_{t^{'} j} D_{i t^{'}} x_{j, i t} + v_{0, i t} - u_{0, i t}, \quad i = 1, \dots, n; \; t = 1, \dots, T, \end{aligned}

(7)

complemented by the (environmental) residual generation technologies for undesirable outputs

(B_{1, i t}, B_{2, i t})

:

\begin{matrix} b_{1, i t} & = γ_{0, i} + γ_{1} y_{i t} + \frac{1}{2} γ_{11} y_{i t}^{2} + \sum_{t^{'} = 1}^{T} φ_{t^{'}} D_{i t^{'}} + \sum_{t^{'} = 1}^{T} φ_{t^{'} 1} D_{i t^{'}} y_{i t} + v_{1, i t} + u_{1, i t} \end{matrix}

(8a)

\begin{matrix} b_{2, i t} & = δ_{0, i} + δ_{1} y_{i t} + \frac{1}{2} δ_{11} y_{i t}^{2} + \sum_{t^{'} = 1}^{T} ψ_{t^{'}} D_{i t^{'}} + \sum_{t^{'} = 1}^{T} ψ_{t^{'} 1} D_{i t^{'}} y_{i t} + v_{2, i t} + u_{2, i t}, \end{matrix}

(8b)

where a lower-case variable denotes the log of its upper-case counterpart, and

D_{i t}

denotes the time dummy. For greater flexibility, we also allow for unobserved firm-specific heterogeneity which we model via “true” random effects

{(α_{0, i}, γ_{0, i}, δ_{0, i}); i = 1, \dots, n}

. The presence of these random effects (in addition to inefficiencies) captures additional technological heterogeneity among firms.

Since

y_{i t}

appears on the right-hand side of equations for undesirable outputs

b_{1, i t}

and

b_{2, i t}

, it is imperative that all three equations in (7)–(8) be estimated as a system (of simultaneous equations) in order to control for the endogeneity of outputs. We estimate this production system subject to symmetry (

α_{h j} = α_{j h}

) as well as monotonicity and curvature restrictions. In this paper, we thus concur with Barnett et al. [15] and Barnett [16] on the importance of maintaining the latter theoretical regularity conditions when modeling technology (especially if allowing for inefficiency) in order to ensure that the results are economically meaningful.

Specifically, the monotonicity conditions are:

\begin{matrix} \frac{\partial y_{i t}}{\partial x_{j, i t}} & = α_{j} + \sum_{h = 1}^{J} α_{h j} x_{h, i t} + \sum_{t^{'} = 1}^{T} β_{t^{'} j} D_{i t^{'}} \geq 0 \quad \forall j = 1, \dots, J \\ \frac{\partial b_{1, i t}}{\partial y_{i t}} & = γ_{1} + γ_{11} y_{i t} + \sum_{t^{'} = 1}^{T} φ_{t^{'} 1} D_{i t^{'}} \geq 0 \\ \frac{\partial b_{2, i t}}{\partial y_{i t}} & = δ_{1} + δ_{11} y_{i t} + \sum_{t^{'} = 1}^{T} ψ_{t^{'} 1} D_{i t^{'}} \geq 0 . \end{matrix}

(9)
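The monotonicity conditions in (9) are linear in the parameters and must hold at every observation. The sketch below illustrates such a check for one residual generation equation; the function names, the parameter values and the tiny data set are hypothetical, and the time dummies are represented by indexing a list of period-specific slope shifts.

```python
def emission_elasticity(gamma1, gamma11, phi_t1, y, period):
    """Elasticity of b_1 with respect to y for an observation in `period`.

    phi_t1[period] plays the role of the time-dummy slope shift: only the
    dummy for the observation's own period is switched on.
    """
    return gamma1 + gamma11 * y + phi_t1[period]


def monotonicity_holds(gamma1, gamma11, phi_t1, observations):
    """True if the elasticity is non-negative at every (y, period) pair."""
    return all(
        emission_elasticity(gamma1, gamma11, phi_t1, y, period) >= 0.0
        for y, period in observations
    )


# Hypothetical mean-normalized log outputs for two periods.
observations = [(-0.5, 0), (0.0, 1), (0.8, 1)]
phi_t1 = [0.0, -0.1]

ok = monotonicity_holds(0.9, 0.2, phi_t1, observations)
violated = not monotonicity_holds(0.9, 2.0, phi_t1, observations)
```

A parameter draw for which the check fails at any single observation lies outside the admissible set and would be discarded.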

The curvature is imposed using restrictions on the eigenvalues of the Hessian matrices in levels (see [18]). We employ the following stochastic specification for system (7)–(8):

\begin{aligned} v_{i t} = {[v_{0, i t}, v_{1, i t}, v_{2, i t}]}^{'} & \sim i.i.d. \; 𝒩 (0, Σ) \\ u_{i t} = {[u_{0, i t}, u_{1, i t}, u_{2, i t}]}^{'} & \sim 𝒩_{+} (Z_{i t} τ, Σ_{u}), \end{aligned}

(10)

where

𝒩_{+}

denotes the (multivariate) half-normal distribution;6 Σ and

Σ_{u}

are the covariance matrices;

Z_{i t} = I_{3} \otimes D

where

I_{κ}

is an identity matrix of dimension κ and

D = {[D_{i 1}, \dots, D_{i T}]}^{'}

; and

τ = vec \{τ_{k t}; k = 0, 1, 2; t = 1, \dots, T\}

is a set of

3 T

unknown parameters. The location parameter of each inefficiency term

u_{k, i t} (k = 0, 1, 2)

is given by

\sum_{t = 1}^{T} τ_{k t} D_{i t}

. Thus, for greater flexibility in modeling time effects, we allow inefficiency to be time-varying (i.e., a function of the time dummies). The error components

(v_{i t}, u_{i t})

are assumed to be orthogonal as well as independent of

x_{j, i t} (j = 1, \dots, J)

. Further, the random effects

(α_{0, i}, γ_{0, i}, δ_{0, i})

are assumed to be independently and identically distributed, independent of the error components

(v_{i t}, u_{i t})

as well as independent of

x_{j, i t} (j = 1, \dots, J)

:

{[α_{0, i}, γ_{0, i}, δ_{0, i}]}^{'} \sim i.i.d. 𝒩 (0, Ω),

(11)

where

Ω = diag {σ_{α}^{2}, σ_{γ}^{2}, σ_{δ}^{2}}

.

4.1. Priors

For the parameters in system (7)–(8), which we collectively denote by ϑ, we assume a non-informative prior that imposes the regularity restrictions so that

p (ϑ) \propto I (ϑ \in ℛ)

, where

ℛ

denotes the set of acceptable parameters. For scale parameters

(σ_{α}^{2}, σ_{γ}^{2}, σ_{δ}^{2})

, we assume

p (σ_{k}) \propto σ_{k}^{- (N^{'} + 1)} \exp \{- Q^{'} / (2 σ_{k}^{2})\}

\forall k \in {α, γ, δ}

, where

N^{'} = 1

and

Q^{'} = 10^{- 4}

. For τ, we assume a proper but relatively non-informative prior of the form

τ \sim 𝒩 (0, c I_{3 T})

with

c = 10^{4}

. For Σ and

Σ_{u}

, we assume proper but relatively non-informative priors in the Wishart family. The results are not sensitive to c,

N^{'}

or

Q^{'}

unless c becomes approximately less than 0.1, in which case it approaches the domain of a dogmatic prior.

One may inquire if it would be possible to select objective priors such as in the case of Jeffreys’ prior. One way to proceed with objective priors would be along the lines of Berger and Mortera [20] and Mulder et al. [21]. For instance, the use of a constrained posterior prior along the lines of Berger and Mortera [20] is an option. The Jeffreys’ prior cannot be obtained analytically but can be computed using numerical or analytic derivatives. This computation is certainly heavy. Furthermore, the Jeffreys’ prior is not used as much in the present literature, and the emphasis is rather placed on the so-called intrinsic Bayes factor (see [22]). We leave the issue for future research, but we do not expect much change since our results were not sensitive to important aspects of the prior.

4.2. Posterior Distribution

For convenience, we let

σ^{2} \overset{def}{=} (σ_{α}^{2}, σ_{γ}^{2}, σ_{δ}^{2})

and

α_{i} \overset{def}{=} (α_{0, i}, γ_{0, i}, δ_{0, i})

. The kernel posterior distribution of all parameters denoted by

θ \in R^{d}

(a superset of ϑ), if conditioned on the latent inefficiencies and random effects, is given by

\begin{aligned} p (θ | Ξ, α, u) \propto {} & {| Σ |}^{- n T / 2} \exp \{- \frac{1}{2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} {(r_{i t} - u_{i t})}^{'} Σ^{- 1} (r_{i t} - u_{i t})\} \times \\ & \exp \{- \frac{1}{2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} {(u_{i t} - Z_{i t} τ)}^{'} Σ_{u}^{- 1} (u_{i t} - Z_{i t} τ)\} \times \\ & \prod_{i = 1}^{n} \prod_{t = 1}^{T} Φ_{3}^{- 1} (C_{u} (Z_{i t} τ)) \times p (ϑ, τ, Σ, Σ_{u}, σ), \end{aligned}

(12)

where

α = {α_{i}; i = 1, \dots, n}

and

u = \{u_{i t}; i = 1, \dots, n; t = 1, \dots, T\}

. Also:

r_{i t} = [\begin{matrix} - y_{i t} + α_{0, i} + Ξ_{0, i t} ϑ_{0} \\ b_{1, i t} - γ_{0, i} - Ξ_{1, i t} ϑ_{1} \\ b_{2, i t} - δ_{0, i} - Ξ_{1, i t} ϑ_{2} \end{matrix}],

(13)

where

Ξ_{0, i t}

and

Ξ_{1, i t}

denote regressors in (7) and (8), respectively (some of which are endogenous);

{ϑ_{k}; k = 0, 1, 2}

denotes vectors of parameters in the three equations of the system and

ϑ = {[ϑ_{0}^{'}, ϑ_{1}^{'}, ϑ_{2}^{'}]}^{'}

; and

Ξ = {Ξ_{0, i t}, Ξ_{1, i t}, y_{i t}, b_{1, i t}, b_{2, i t}; i = 1, \dots, n; t = 1, \dots, T}

denotes the entire available data. Further,

Σ_{u}^{- 1} = C_{u}^{'} C_{u}

(via the Cholesky decomposition) and

Φ_{k} (w)

denotes the k-variate normal probability integral evaluated at some vector

w \in R^{k}

.

The first term in the third line of (12) owes to the constraint

u_{i t} \geq 0

. Specifically, our stochastic assumptions about

u_{i t}

imply the density

\begin{aligned} p (u_{i t} | Z_{i t}, τ, Σ_{u}) = {} & {(2 π)}^{- 3 / 2} {| Σ_{u} |}^{- 1 / 2} Φ_{3}^{- 1} (C_{u} (Z_{i t} τ)) \times \\ & \exp \{- \frac{1}{2} {(u_{i t} - Z_{i t} τ)}^{'} Σ_{u}^{- 1} (u_{i t} - Z_{i t} τ)\}, \end{aligned}

(14)

which requires the evaluation of a tri-variate normal integral that can be performed using standard numerical algorithms. Before proceeding with MCMC methods for inference, note that the posterior is given by

p (θ | Ξ) \propto \int_{R^{n}} \int_{R_{+}^{n T}} p (θ | Ξ, α, u) d u d α .

(15)

While the multivariate integration can be performed in the closed form with respect to inefficiencies u, the induced nonlinearity however precludes analytical integration with respect to random effects α. We are not aware of any efficient MCMC scheme that draws these random effects as a block from the posterior, especially when n is relatively large.

The posterior conditional distribution of latent inefficiencies is

u_{i t} | \cdot \sim 𝒩_{+} (V (Σ^{- 1} r_{i t} + Σ_{u}^{- 1} Z_{i t} τ), V),

(16)

where

V = {(Σ^{- 1} + Σ_{u}^{- 1})}^{- 1}

. Draws from the above conditional distribution can be easily obtained. The same is true for the posterior conditional distribution of the random effects if we write

r_{i t}

from (13) as

r_{i t} \equiv α_{i} - R_{i t}

, where

R_{i t}

is defined in an obvious way. We can then draw the random effects as a block for observation i as follows

α_{i} | \cdot \sim 𝒩 ({\bar{α}}_{i}, V_{α}),

(17)

where

\begin{matrix} {\bar{α}}_{i} & = {(T Σ^{- 1} + Ω^{- 1})}^{- 1} \times T Σ^{- 1} {\bar{R}}_{i}, \quad {\bar{R}}_{i} = T^{- 1} \sum_{t = 1}^{T} R_{i t} \\ V_{α} & = {(T Σ^{- 1} + Ω^{- 1})}^{- 1} . \end{matrix}

(18)

If it were not for the regularity constraints and the non-standard form of the posterior conditional distribution of τ (due to the term involving the multivariate normal integral), we could easily derive the posterior conditional distribution of parameters of interest ϑ.

Collecting data for all observations over

i = 1, \dots, n

and

t = 1, \dots, T

, we rewrite our model in an obvious matrix notation:

[\begin{matrix} y + u_{0} & = & Ξ_{0} ϑ_{0} + v_{0} \\ b_{1} - u_{1} & = & Ξ_{1} ϑ_{1} + v_{1} \\ b_{2} - u_{2} & = & Ξ_{1} ϑ_{2} + v_{2} \\ u_{0} & = & Z τ_{0} + ζ_{0} \\ u_{1} & = & Z τ_{1} + ζ_{1} \\ u_{2} & = & Z τ_{2} + ζ_{2} \end{matrix}],

(19)

where we assume

ζ = {[ζ_{0}^{'}, ζ_{1}^{'}, ζ_{2}^{'}]}^{'} \sim 𝒩 (0, Σ_{u}) .

(20)

We rewrite the system of equations (19) compactly as

Y = X ϖ + E,

(21)

where 𝕐 is an

n T \times (2 \times 3)

vector of “data” appearing on the left-hand side of equalities in (19),

X = diag \{Ξ_{0}, Ξ_{1}, Ξ_{1}, Z, Z, Z\}

,

E = {[v_{0}^{'}, v_{1}^{'}, v_{2}^{'}, ζ_{0}^{'}, ζ_{1}^{'}, ζ_{2}^{'}]}^{'}

and

ϖ = {[ϑ^{'}, τ^{'}]}^{'}

is a conformable vector of parameters.

System (21) takes the form of a multivariate regression model with

cov {E} = V = Φ \otimes I_{n T}, where Φ = [\begin{matrix} Σ & 0 \\ 0 & Σ_{u} \end{matrix}] .

(22)

The GLS estimator of ϖ is given by

\hat{ϖ} = {(X^{'} V^{- 1} X)}^{- 1} X^{'} V^{- 1} Y,

(23)

with the corresponding covariance matrix:

cov {\hat{ϖ}} = {(X^{'} V^{- 1} X)}^{- 1} .

(24)

We note that the above approximation however ignores that ϑ included in ϖ needs to satisfy

ϑ \in ℛ

(the regularity conditions).

Let us define a multivariate normal distribution centered at

\hat{ϖ}

of which the covariance is

h \times cov {\hat{ϖ}}

for some constant

h > 0

. We denote this distribution by

f_{𝒩, κ} (ϖ; \hat{ϖ}, h \times cov {\hat{ϖ}})

, where κ is the dimensionality of ϖ, i.e., the number of parameters in the extended system (19). We use the GLS quantities to form a proposal density for generating candidate parameter draws as we describe below.

Next, we describe how to realize draws from the conditional posterior distributions of σ, Σ and

Σ_{u}

. Unlike

Σ_{u}

, both σ and Σ can be drawn from standard statistical distributions. Specifically, for the elements of σ we have:

\begin{matrix} \frac{Q^{'} + Q_{α}}{σ_{α}^{2}} | θ_{- α}, Ξ & \sim χ_{n + N^{'}}^{2} \\ \frac{Q^{'} + Q_{γ}}{σ_{γ}^{2}} | θ_{- γ}, Ξ & \sim χ_{n + N^{'}}^{2} \\ \frac{Q^{'} + Q_{δ}}{σ_{δ}^{2}} | θ_{- δ}, Ξ & \sim χ_{n + N^{'}}^{2}, \end{matrix}

(25)

where

Q_{α} = \sum_{i = 1}^{n} α_{0, i}^{2}

,

Q_{γ} = \sum_{i = 1}^{n} γ_{0, i}^{2}

and

Q_{δ} = \sum_{i = 1}^{n} δ_{0, i}^{2}

. Here,

θ_{- k}

denotes all elements of the entire parameter vector θ including all latent variables except the indicated subscripted parameter

k \in {α, γ, δ}

.
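The conditional draws in (25) amount to dividing Q' + Q_k by a chi-square variate with n + N' degrees of freedom. A minimal sketch using only the Python standard library follows; the function name is hypothetical, the chi-square variate is generated as a gamma variate with shape (n + N')/2 and scale 2, and the random effects passed in are made-up values.

```python
import random


def draw_scale_variance(effects, n_prime=1, q_prime=1e-4, rng=random):
    """One draw of sigma_k^2 given the current random effects, as in (25).

    (Q' + Q_k) / sigma_k^2 ~ chi-square(n + N'), so sigma_k^2 is obtained by
    dividing Q' + Q_k by a chi-square draw.
    """
    q_k = sum(a * a for a in effects)          # Q_k = sum of squared effects
    dof = len(effects) + n_prime               # n + N'
    chi2 = rng.gammavariate(dof / 2.0, 2.0)    # chi-square(dof) as Gamma(dof/2, scale 2)
    return (q_prime + q_k) / chi2


# Hypothetical current values of the random effects for three plants.
rng = random.Random(1)
sigma2_alpha = draw_scale_variance([0.10, -0.20, 0.05], rng=rng)
```

The defaults N' = 1 and Q' = 10^{-4} mirror the prior settings stated in Section 4.1.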

Our priors are conditionally conjugate, i.e.,

\begin{matrix} p (Σ) & \propto | Σ^{- 1} |^{\bar{N} - (3 + 1) / 2} \exp \{- \frac{1}{2} tr (\bar{A} Σ^{- 1})\} \\ p (Σ_{u}) & \propto | Σ_{u}^{- 1} |^{\bar{N} - (3 + 1) / 2} \exp \{- \frac{1}{2} tr ({\bar{A}}_{u} Σ_{u}^{- 1})\}, \end{matrix}

(26)

where

\bar{N}

is a scalar prior parameter and

\bar{A}, {\bar{A}}_{u}

are prior matrices. In our empirical work we take

\bar{N} = 10

and

\bar{A} = {\bar{A}}_{u} = 10^{- 3} \times I_{3}

.

The posterior conditional of Σ is

p (Σ^{- 1} | θ_{- Σ}, Ξ) \propto {| Σ^{- 1} |}^{\bar{N} + n T - (3 + 1) / 2} \exp \{- \frac{1}{2} tr [(\bar{A} + A) Σ^{- 1}]\},

(27)

where

A = (Y_{k} - X_{k} ϑ_{k}) {(Y_{k^{'}} - X_{k^{'}} ϑ_{k^{'}})}^{'}

for

k, k^{'} = 0, 1, 2

.

The conditional posterior of

Σ_{u}

is

\begin{aligned} p (Σ_{u}^{- 1} | θ_{- Σ_{u}}, Ξ) \propto {} & | Σ_{u}^{- 1} |^{\bar{N} + n T - (3 + 1) / 2} \exp \{- \frac{1}{2} tr [({\bar{A}}_{u} + A_{u}) Σ_{u}^{- 1}]\} \times \\ & \prod_{i = 1}^{n} \prod_{t = 1}^{T} Φ_{3}^{- 1} (C_{u} (Z_{i t} τ)), \end{aligned}

(28)

where

A_{u} = (u_{k} - D τ_{k}) {(u_{k^{'}} - D τ_{k^{'}})}^{'}

for

k, k^{'} = 0, 1, 2

.

Clearly,

Σ^{- 1}

belongs to the Wishart family. The same would have been true for

Σ_{u}

if it were not for the second line of (28) which involves the Cholesky factor of this matrix. Therefore, we use the Wishart distribution to draw a candidate matrix and we retain the candidate with probability

min \{1, \frac{\prod_{i = 1}^{n} \prod_{t = 1}^{T} Φ_{3}^{- 1} (C_{u}^{(c)} (Z_{i t} τ))}{\prod_{i = 1}^{n} \prod_{t = 1}^{T} Φ_{3}^{- 1} (C_{u}^{(s)} (Z_{i t} τ))}\},

(29)

where

C_{u}^{(c)}

denotes the candidate draw and

C_{u}^{(s)}

is the existing sth draw (

s = 1, . . ., S

).
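The acceptance step in (29) is most safely evaluated in logarithms, because each product runs over all firm-year observations. The sketch below takes as given the values of the trivariate normal integral at every observation under the candidate and the current Cholesky factor; how those integrals are computed is outside the sketch, and the numbers used are toy values.

```python
import math
import random

def accept_candidate(phi_cand, phi_curr, rng):
    """Accept a Wishart candidate with probability (29), computed in logs.

    phi_cand, phi_curr: values of Phi_3(.) at every (i, t) under the candidate
    and the current Cholesky factor; each enters (29) as 1 / Phi_3.
    """
    log_ratio = (sum(-math.log(p) for p in phi_cand)
                 - sum(-math.log(p) for p in phi_curr))
    accept_prob = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    return rng.random() < accept_prob, accept_prob

rng = random.Random(1)
phi_curr = [0.60, 0.55, 0.70]   # toy normal-integral values, current draw
phi_cand = [0.58, 0.50, 0.69]   # toy normal-integral values, candidate draw
accepted, prob = accept_candidate(phi_cand, phi_curr, rng)
print(accepted, prob)
```

Because the candidate is generated from the Wishart part of (28), only the remaining product of normal integrals appears in the ratio.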

4.3. Imposition of Restrictions

Imposing restrictions is not trivial in our application. Since the restrictions depend on the data, we adopt the following strategy. We draw from the proposal density described in the previous subsection subject to the constraints

ϑ \in ℛ

using a special form of rejection that improves on the efficiency of “naive rejection”, which would keep drawing parameters until all constraints are satisfied. Specifically, we first check the constraints at a limited number of points, which screens out most inadmissible candidates cheaply, and then keep drawing from the proposal distribution until all regularity constraints hold at all data points.

We first impose the restrictions at the means of variables (normalized to zero) and then at points

\pm r

around the mean. We choose

r = {0.5, 1, 2, 3}

, and the restrictions hold without much trouble in the positive direction. In the negative direction, the restrictions are first tested for

r = - 0.1, - 0.2, . . ., - 2

and then tested at the remaining points. This staged testing considerably reduces the time needed to obtain an admissible draw from the proposal density, which we describe next.
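The staged testing of restrictions can be illustrated with a small Python sketch. It uses a hypothetical toy restriction (a linear function that must stay positive at every data point) in place of the regularity conditions of the model; the function names, the proposal and the data are assumptions made for illustration only.

```python
import random

def draw_until_regular(propose, is_regular, screen_points, data_points,
                       max_tries=10000):
    """Rejection sampling with a cheap screening stage.

    A candidate is first checked at a few screening points; only candidates
    that pass are checked at every data point.
    """
    full_checks = 0
    for _ in range(max_tries):
        cand = propose()
        if not all(is_regular(cand, x) for x in screen_points):
            continue                      # rejected without a full pass
        full_checks += 1
        if all(is_regular(cand, x) for x in data_points):
            return cand, full_checks
    raise RuntimeError("no regular draw found")

rng = random.Random(0)
data = [rng.uniform(-3.0, 3.0) for _ in range(2000)]
# Screening grid: the mean (0), positive points, then a fine negative grid.
screen = [0.0, 0.5, 1.0, 2.0, 3.0] + [-0.1 * j for j in range(1, 21)]

def propose():
    return (rng.gauss(1.0, 0.5), rng.gauss(0.0, 0.3))

def is_regular(b, x):
    # Toy restriction: b0 + b1 * x must be positive at the point x.
    return b[0] + b[1] * x > 0.0

theta, n_full = draw_until_regular(propose, is_regular, screen, data)
print(theta, n_full)
```

Most inadmissible candidates fail at one of the few screening points, so the expensive pass over all data points is run only for promising draws.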

Based on a current draw

ϖ^{(s)}

such that

ϑ^{(s)} \in ℛ

, a new candidate

ϖ^{(c)} \sim 𝒩 (\hat{ϖ}, h \times cov {\hat{ϖ}}) \times I (ϑ^{(c)} \in ℛ)

is generated until it satisfies the regularity restrictions. The candidate is accepted and we set

ϖ^{(s + 1)} = ϖ^{(c)}

with the Metropolis-Hastings probability

min \{1, \frac{p (ϖ^{(c)} | α, u, Ξ) / f_{𝒩, κ} (ϖ^{(c)}; \hat{ϖ}, h \times cov {\hat{ϖ}})}{p (ϖ^{(s)} | α, u, Ξ) / f_{𝒩, κ} (ϖ^{(s)}; \hat{ϖ}, h \times cov {\hat{ϖ}})}\},

(30)

otherwise we repeat the current draw, that is

ϖ^{(s + 1)} = ϖ^{(s)}; s = 1, \dots, S

. We adjust the scaling constant h so that the acceptance rate of the Metropolis-Hastings algorithm is between 20% and 30%. The Metropolis-Hastings algorithm also takes care of nonlinearity of the posterior in τ.
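To make the role of the ratio in (30) concrete, the following sketch runs an independence-chain Metropolis-Hastings sampler on a one-dimensional toy target, a normal kernel restricted to the positive half-line, which stands in for a posterior subject to regularity restrictions. It is a simplified illustration under stated assumptions: inadmissible candidates are simply rejected rather than redrawn, the proposal is an untruncated normal, and all names and values are hypothetical.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def mh_independence(log_post, mean, var, h, n_draws, rng):
    """Independence-chain Metropolis-Hastings with proposal N(mean, h * var)."""
    def log_w(x):
        # log of posterior kernel over proposal density, i.e. log(p / f)
        return log_post(x) - log_normal_pdf(x, mean, h * var)

    x, accepted, draws = mean, 0, []
    for _ in range(n_draws):
        cand = rng.gauss(mean, math.sqrt(h * var))
        u = 1.0 - rng.random()                      # uniform on (0, 1]
        if math.log(u) < log_w(cand) - log_w(x):    # accept with probability (30)
            x, accepted = cand, accepted + 1
        draws.append(x)                             # a rejected step repeats x
    return draws, accepted / n_draws

def log_post(x):
    # Toy target: N(1, 1) kernel on x > 0; zero density elsewhere.
    return -0.5 * (x - 1.0) ** 2 if x > 0.0 else -math.inf

rng = random.Random(0)
for h in (1.0, 4.0, 16.0):
    draws, rate = mh_independence(log_post, mean=1.0, var=1.0, h=h,
                                  n_draws=20000, rng=rng)
    print(h, rate, sum(draws) / len(draws))
```

Running the loop over several values of h shows how the scaling constant changes the acceptance rate, which is the quantity tuned in the text.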

We generate the covariance matrix Σ and scale parameters σ from their respective posterior conditional distributions which are all in standard form (inverted Wishart and inverted Gamma). The latter however is not the case for

Σ_{u}

(i.e., this matrix cannot be drawn using an inverted Wishart). We therefore take an extra Metropolis-Hastings step to accommodate the presence of the Cholesky factor of

Σ_{u}

(i.e.,

C_{u}

) in the posterior inside the multivariate normal integral. Acceptance rates using a simple Metropolis-Hastings step were quite high (over 90%), and simple scaling brought them down to the range of 20%–25%.

Our MCMC uses

S^{'}

preliminary or transient passes until we obtain convergence using Geweke’s [23] relative numerical efficiency (RNE) diagnostic. Once convergence is achieved, we take another 100,000 passes. We do not use thinning. Instead, we report posterior standard deviations based on Newey-West HAC covariance matrices using 10 lags. For details, see Table 1.
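The numerical standard error and relative numerical efficiency reported in Table 1 can be approximated along the following lines. The sketch uses a Newey-West long-run variance with Bartlett weights and 10 lags, and defines RNE as the ratio of the variance under independent sampling to the long-run variance; this is a simplified stand-in for Geweke's spectral estimator, and the simulated chains are assumptions for illustration.

```python
import math
import random

def nse_rne(chain, lags=10):
    """Numerical standard error (Newey-West) and RNE of a chain mean."""
    s = len(chain)
    mean = sum(chain) / s
    dev = [x - mean for x in chain]
    # Sample autocovariances at lags 0..lags.
    gamma = [sum(dev[t] * dev[t - k] for t in range(k, s)) / s
             for k in range(lags + 1)]
    # Bartlett-weighted long-run variance.
    long_run_var = gamma[0] + 2.0 * sum(
        (1.0 - k / (lags + 1.0)) * gamma[k] for k in range(1, lags + 1))
    nse = math.sqrt(long_run_var / s)
    rne = gamma[0] / long_run_var        # close to 1 for an uncorrelated chain
    return nse, rne

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ar1 = [0.0]
for _ in range(19999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))   # autocorrelated chain
nse_iid, rne_iid = nse_rne(iid)
nse_ar1, rne_ar1 = nse_rne(ar1)
print(rne_iid, rne_ar1)
```

A strongly autocorrelated chain yields an RNE well below one, which is the pattern that distinguishes columns (1) and (2) of Table 1.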

Table 1. Computational Experience with the Data.

	(1)	(2)
median RNE	0.113	0.615
median NSE	0.0010	0.0012
draws to convergence	150,000	70,000
median ACF at lag 50	0.977	0.312

NOTES: Column (1) reports the full MCMC scheme, in which random effects and inefficiencies are drawn through regular Gibbs sampling. In column (2), the random effects are marginalized out when drawing inefficiencies. RNE: relative numerical efficiency; NSE: numerical standard error; ACF: autocorrelation function.

4.4. Improving Performance of MCMC

We can explicitly integrate u out of (12) to obtain a kernel posterior of the following form:

p (θ | Ξ, α) \propto \int_{R_{+}^{n T}} p (θ | Ξ, α, u) d u .

(31)

Further, we can also derive the closed-form conditional posterior of random effects

p (α | Ξ, θ_{- α})

.

We can achieve a significant improvement of MCMC performance by recognizing that the random effects α can be explicitly integrated out of the posterior, when the parameters ϑ are drawn. Specifically, similar to (19), we consider the following system

\begin{matrix} y_{i t} + u_{0, i t} & = α_{0, i} + Ξ_{0, i t} ϑ_{0} + v_{0, i t} \\ b_{1, i t} - u_{1, i t} & = γ_{0, i} + Ξ_{1, i t} ϑ_{1} + v_{1, i t} \\ b_{2, i t} - u_{2, i t} & = δ_{0, i} + Ξ_{1, i t} ϑ_{2} + v_{2, i t}, \end{matrix}

(32)

which we can rewrite in compact notation as

Y_{i t} = α_{i} + X_{i t} ϑ + v_{i t},

(33)

where

X_{i t} = diag {Ξ_{0, i t}, Ξ_{1, i t}, Ξ_{2, i t}}

. Collecting all (time) observations for a given firm i together, we obtain:

Y_{i} = X_{i} ϑ + V_{i},

(34)

where

Y_{i}

,

X_{i}

and

V_{i}

are defined in an obvious way. Clearly:

V_{i} \sim 𝒩 (0, Ω \otimes J_{T} + Σ \otimes I_{T}) \forall i = 1, \dots, n,

(35)

where

J_{T}

is a

T \times T

matrix of which all elements are equal to one.

Therefore, we can redefine the GLS quantities that are used to obtain a good proposal distribution for ϑ using the following as the covariance matrix V, i.e.,

V = (Ω \otimes J_{T} + Σ \otimes I_{T}) \otimes I_{n} .

(36)
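The Kronecker structure in (35) and (36) is straightforward to assemble numerically. The sketch below assumes NumPy and uses small toy values for the two 3 × 3 covariance matrices, the number of periods and the number of firms; the function name is hypothetical.

```python
import numpy as np

def firm_covariance(omega, sigma, t_periods):
    """Covariance of V_i in (35): Omega (x) J_T + Sigma (x) I_T."""
    j_t = np.ones((t_periods, t_periods))           # T x T matrix of ones
    return np.kron(omega, j_t) + np.kron(sigma, np.eye(t_periods))

omega = np.diag([0.20, 0.10, 0.05])                 # toy random-effect variances
sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
h_i = firm_covariance(omega, sigma, t_periods=4)
n_firms = 3
v_full = np.kron(h_i, np.eye(n_firms))              # ordering as written in (36)
print(h_i.shape, v_full.shape)
```

The random effect contributes the same covariance to every pair of periods within an equation, while the idiosyncratic error contributes only contemporaneously.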

Using the modified proposal density, we effectively marginalize out the random firm-specific effects from the posterior and thus can draw latent inefficiencies

u_{i t}

marginally on these effects hoping to reduce overall autocorrelation in MCMC due to the correlation between

α_{i}

and

u_{i t}

.⁷ This requires a trivial modification in the way we draw latent inefficiencies.

Since model (32) may be written as

\begin{matrix} - y_{i t} & = - α_{0, i} - Ξ_{0, i t} ϑ_{0} - v_{0, i t} + u_{0, i t} \\ b_{1, i t} & = γ_{0, i} + Ξ_{1, i t} ϑ_{1} + v_{1, i t} + u_{1, i t} \\ b_{2, i t} & = δ_{0, i} + Ξ_{1, i t} ϑ_{2} + v_{2, i t} + u_{2, i t}, \end{matrix}

(37)

after collecting all (time) observations for a given firm i, in obvious notation, we have

Ψ_{i} = X_{i} ϑ + V_{i} + U_{i},

(38)

where

X_{i}

and

V_{i}

are naturally redefined to account for a change of sign in the first equation of (37) in order to accommodate a uniform sign in front of inefficiencies

U_{i}

. Also recall that

V_{i}

and its stochastic properties have been defined before. Now we can draw

3 T \times 1

inefficiencies

U_{i}

as a block, after we couple system (38) with the following specification:

U_{i} = (I_{T} \otimes Z_{i}) τ + ζ_{i}

(39)

subject to

U_{i} \geq 0

and

ζ_{i} \sim 𝒩 (0, Σ_{u} \otimes I_{T}); i = 1, \dots, n

.

Since

cov {V_{i}, ζ_{i}} = [\begin{matrix} Ω \otimes J_{T} + Σ \otimes I_{T} & 0 \\ 0 & Σ_{u} \otimes I_{T} \end{matrix}] \overset{def}{=} [\begin{matrix} H & 0 \\ 0 & M \end{matrix}],

(40)

we can draw latent inefficiencies, marginally on random effects, using the following multivariate truncated normal distributions, i.e.,

U_{i} \sim 𝒩_{+} ({\tilde{U}}_{i}, W),

(41)

whose location vector and scale matrix (i.e., the mean and covariance of the underlying untruncated normal) are

\begin{matrix} {\tilde{U}}_{i} & = {(H^{- 1} + M^{- 1})}^{- 1} \{H^{- 1} (Ψ_{i} - X_{i} ϑ) + M^{- 1} (I_{T} \otimes Z_{i}) τ\} \\ W & = {(H^{- 1} + M^{- 1})}^{- 1} . \end{matrix}

(42)
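The quantities in (42) combine two sources of information about the inefficiencies by precision weighting. The sketch below computes them with NumPy for toy inputs; the vectors standing in for the regression residual and for the inefficiency mean are hypothetical, and the final step of sampling from the multivariate normal truncated to the positive orthant (for example, by cycling through univariate truncated normals) is not shown.

```python
import numpy as np

def inefficiency_moments(resid, prior_mean, h_mat, m_mat):
    """Location and scale in (42) of the untruncated normal for U_i."""
    h_inv = np.linalg.inv(h_mat)
    m_inv = np.linalg.inv(m_mat)
    w = np.linalg.inv(h_inv + m_inv)                     # scale matrix W
    u_tilde = w @ (h_inv @ resid + m_inv @ prior_mean)   # location vector
    return u_tilde, w

t_periods = 2
omega = 0.1 * np.eye(3)       # toy covariance of random effects
sigma = np.eye(3)             # toy covariance of noise
sigma_u = 0.5 * np.eye(3)     # toy covariance of inefficiency errors
h_mat = (np.kron(omega, np.ones((t_periods, t_periods)))
         + np.kron(sigma, np.eye(t_periods)))
m_mat = np.kron(sigma_u, np.eye(t_periods))
resid = np.full(3 * t_periods, 0.4)        # stands in for Psi_i - X_i * theta
prior_mean = np.full(3 * t_periods, 0.1)   # stands in for (I_T (x) Z_i) * tau
u_tilde, w = inefficiency_moments(resid, prior_mean, h_mat, m_mat)
print(np.round(u_tilde, 4))
```

The location vector lies between the two inputs, closer to whichever has the larger precision.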

We have found that drawing blocks of latent inefficiencies marginally on random effects

α_{i}

(and conditionally on various covariance matrices and

(ϑ, τ)

) results in vast improvements in terms of computational efficiency. Table 1 summarizes our computational experience with the data.

4.5. Random Effects

Based on our discussion above, we note that given the way the variance parameters

σ^{2}

enter the covariance matrix Ω, in principle, there is no problem in treating the random effects

α_{i}

as jointly normally distributed, i.e.,

α_{i} \sim 𝒩 (0, Ω)

, independently over

i = 1, . . ., n

as well as independent from all other random variables and regressors in the model. All our derivations, including the conditional posterior distribution of

α_{i}

, hold true. The only difference is that, in a more general setting (when random effects are allowed to be correlated across equations) one has to draw Ω from its conditional posterior distribution as a general positive definite matrix, whereas, when the random effects are a priori independent, the problem boils down to drawing variances

σ^{2}

only.

A general case of correlated random effects has an empirical implication of firm-specific effects in the production function being correlated with those in residual generation functions for undesirable outputs. The latter possibility is testable and is of interest on its own. Should Ω be found not to be diagonal, one should naturally focus on the sign of the correlation between the random effects.

Given a proper prior

p (Ω)

on the different elements of Ω and the marginal posterior

p (Ω | Ξ)

, the Verdinelli and Wasserman [25] approach to computing the Bayes factor in favor of diagonality is given by

BF_{diag} = \frac{p (Ω = diagonal | Ξ)}{p (Ω = diagonal)},

(43)

which, in the general case, involves testing

k (k - 1) / 2

zero restrictions, where k is the dimension of Ω (in our case,

k = 3

). By “

Ω = diagonal

”, we mean the zero restrictions

Ω_{i j} = 0, i > j, i, j = 1, . . ., 3

, where

Ω \equiv [Ω_{i j}]

.

While the denominator of

BF_{diag}

is easy to compute, the numerator is computed in a standard fashion as

p (Ω = diagonal | Ξ) = S^{- 1} \sum_{s = 1}^{S} p (Ω = diagonal | θ_{- Ω}^{(s)}, Ξ)

(44)

where

θ_{- Ω}^{(s)}

is the sth (of S) MCMC draw of all parameters θ except those in Ω. Note that

θ_{- Ω}^{(s)}

does include the diagonal elements of Ω in this computation.
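Numerically, the ratio in (43) is best formed in logarithms, since posterior ordinates evaluated at individual draws can be extremely small. The sketch below averages per-draw ordinates with a log-sum-exp step and subtracts the log prior ordinate; the input values are toy numbers, and how each ordinate is evaluated for the model at hand is left outside the sketch.

```python
import math

def log_bayes_factor(log_post_ordinates, log_prior_ordinate):
    """Log of (43): log mean of per-draw posterior ordinates minus log prior ordinate."""
    m = max(log_post_ordinates)
    mean_scaled = (sum(math.exp(v - m) for v in log_post_ordinates)
                   / len(log_post_ordinates))
    return m + math.log(mean_scaled) - log_prior_ordinate

# Toy inputs: log ordinates at the diagonality restriction for S = 4 draws.
log_ordinates = [-12.0, -10.5, -11.2, -13.1]
log_bf = log_bayes_factor(log_ordinates, log_prior_ordinate=-4.0)
print(log_bf, math.exp(log_bf))
```

A value of the Bayes factor well below one counts as evidence against the restriction.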

It remains to show how draws from the conditional posterior distribution may be realized. Our prior is conditionally conjugate and has the following form:

p (Ω^{- 1}) \propto {|Ω^{- 1}|}^{{\bar{N}}_{o} - (3 + 1) / 2} exp \{- \frac{1}{2} tr {A_{o} Ω^{- 1}}\} .

(45)

The conditional posterior distribution is given by

p (Ω^{- 1} | θ_{- Ω}, Ξ) \propto {|Ω^{- 1}|}^{{\bar{N}}_{o} + n T - (3 + 1) / 2} exp \{- \frac{1}{2} tr {A_{o} + A_{Ω}} Ω^{- 1}\},

(46)

where

A_{Ω} = \sum_{i = 1}^{n} R_{i} R_{i}^{'}

and

R_{i} \overset{def}{=} α_{i} - r_{i t}

as before. We define the baseline prior using

{\bar{N}}_{o} = 10

and

A_{o} = c \times I_{3}

with

c = 10^{- 3}

. Clearly, the conditional posterior of

Ω^{- 1}

belongs to the Wishart family.

The Bayes factor

BF_{diag}

using the baseline prior is

2.402 \times 10^{- 3}

with the corresponding range of

(1.015 \times 10^{- 5}; 0.0893)

, which suggests that the diagonality of Ω can be definitely rejected. In order to compute the range of

BF_{diag}

, we generate 1000 alternative priors and approximate the MCMC output under each of them using the sampling-importance-resampling (SIR) algorithm, which re-weights the original MCMC sample without rerunning MCMC for the new prior. For each one of the SIR re-weighted samples, we implement the Verdinelli and Wasserman [25] approach, and the range is taken as the 95% confidence interval of the computed approximate Bayes factors in favor of diagonality. The 1000 alternative priors are generated by uniformly varying

{\bar{N}}_{o}

in the interval

[1; 100]

and c in the interval

[10^{- 7}; 10]

without restricting them to integer values.
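The re-weighting idea behind this prior sensitivity exercise can be sketched in a few lines. In the toy example below, draws obtained under one prior are re-weighted by the ratio of an alternative prior to the original prior and then resampled with replacement; the scalar parameter, both priors and all names are assumptions chosen so that the result is easy to verify.

```python
import math
import random

def sir_resample(draws, log_prior_old, log_prior_new, rng):
    """Re-weight draws by new prior / old prior and resample with replacement."""
    log_w = [log_prior_new(d) - log_prior_old(d) for d in draws]
    m = max(log_w)
    weights = [math.exp(v - m) for v in log_w]       # stabilised weights
    return rng.choices(draws, weights=weights, k=len(draws))

def log_old(x):
    return -0.5 * (x / 10.0) ** 2                    # diffuse N(0, 10^2) prior

def log_new(x):
    return -0.5 * (x / 0.5) ** 2                     # tighter N(0, 0.5^2) prior

rng = random.Random(0)
# Toy posterior sample under the old prior: N(1, 0.5^2).
draws = [rng.gauss(1.0, 0.5) for _ in range(20000)]
resampled = sir_resample(draws, log_old, log_new, rng)
print(sum(draws) / len(draws), sum(resampled) / len(resampled))
```

The likelihood cancels from the importance weights, which is why no new MCMC run is needed for each alternative prior.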

Table 2 reports the posterior means and standard deviations of the correlation coefficients between random effects

α_{i}

derived from Ω. We find that unobserved firm-specific effects are all positively correlated.

Table 2. Correlation between Random Effects.

	$γ_{0, i}$	$δ_{0, i}$
$α_{0, i}$	0.831 (0.0011)	0.630 (0.0130)
$γ_{0, i}$		0.601 (0.0102)

NOTES: The random effects

(α_{0, i}, γ_{0, i}, δ_{0, i})

are for

(y, b_{1}, b_{2})

, respectively. Standard deviations are in parentheses and are computed using a Newey-West HAC correction with 10 lags.

5. Results

Before proceeding to the discussion of technical and environmental inefficiencies as well as productivity and its components, we first focus on elasticities of the production process by electric utilities in our data sample. Table 3 reports the summary of posterior estimates of these elasticities, including input elasticities of the primary production function (i.e., electric power generation) as well as elasticities of SO₂ and NO_x emissions (undesirable outputs) with respect to the net generated electric power (desirable output).⁸ In particular, the reported input elasticity estimates imply a posterior mean estimate of the returns to scale, defined as the sum of input elasticities, of 0.90, which suggests that, on average, electric utilities operated at decreasing returns to scale during our sample period, although the 95% credible interval for the returns to scale in Table 3 includes unity.

The estimates of

\partial b_{p, i t} / \partial y_{i t} (p = 1, 2)

are of particular interest since they capture the cost of expanding the production of electric power in terms of the associated increase in SO₂ and NO_x emissions. It is intuitive to interpret these estimates as “shadow prices” (in the elasticity form) of the power generation. The posterior mean estimates of the two shadow prices are 1.09 and 1.13. These estimates imply that, on average, a 1% increase in net power generation requires a simultaneous increase in SO₂ and NO_x emissions of at least 1.09% and 1.13%, respectively. Note that emissions may increase by even more if the firm is not on the residual generating frontier, i.e., is environmentally inefficient.

We next proceed to the discussion of technical and environmental inefficiencies exhibited by the utilities in our sample. Figure 1 plots kernel densities of the posterior estimates of the three types of inefficiency. In order to construct the figure, we use a Gaussian kernel with the cross-validated bandwidth parameters. We find apparent differences between the distributions of technical and environmental inefficiencies. Specifically, while technical inefficiency is relatively symmetrically distributed around its mean of 0.09, the distribution of the NO_x-oriented environmental inefficiency is noticeably skewed to the right and the distribution of the SO₂-oriented environmental inefficiency exhibits apparent bi-modality. There may be many reasons for such a stark difference between the levels of technical and environmental inefficiencies across utilities. One plausible explanation is that technical inefficiency may also be capturing declines in the desirable output due to unforeseen fluctuations in the demand for electric power. Since inputs often cannot be immediately adjusted/reallocated and electric power is not easily storable, electric plants may be forced to under-utilize their facilities and labor, which our model would detect and classify as technical underperformance (inefficiency) relative to the frontier. However, such a demand uncertainty would not apply to the by-production of undesirable SO₂ and NO_x gases given the exact physical relationship between the power generation and the associated emission of pollutant gases. The latter is also capable of at least partly explaining why environmental inefficiency (in the emission of both SO₂ and NO_x gases) appears to be relatively more stable over time unlike the electric-power-oriented technical inefficiency, which we discuss in more detail later in the paper.

Table 3. Summary of Posterior Estimates.

	Mean	Median	St.Dev.	95% Credible Interval
Elasticity
Capital Elasticity	0.2985	0.2984	0.0505	(0.1992; 0.3959)
Labor Elasticity	0.4032	0.4043	0.0482	(0.3076; 0.4935)
Energy Elasticity	0.2002	0.1998	0.0103	(0.1801; 0.2205)
RTS	0.9018	0.9032	0.0726	(0.7608; 1.0370)
SO₂ Shadow Price	1.0873	1.0664	0.1437	(0.8524; 1.4334)
NO_x Shadow Price	1.1275	1.1163	0.1161	(0.9366; 1.3988)
Inefficiency
Tech. Ineff.	0.0905	0.0915	0.0257	(0.0361; 0.1390)
SO₂ Env. Ineff.	0.0870	0.0875	0.0351	(0.0254; 0.1504)
NO_x Env. Ineff.	0.0458	0.0438	0.0156	(0.0186; 0.0798)
Efficiency Change
TEC	0.0029	0.0022	0.0308	(–0.0575; 0.0647)
SO₂ EEC	–0.0000	–0.0004	0.0099	(–0.0133; 0.0207)
NO_x EEC	0.0000	–0.0004	0.0038	(–0.0050; 0.0107)
Technological Change
TTC	–0.0042	–0.0027	0.0104	(–0.0272; 0.0117)
SO₂ ETC	0.0225	0.0224	0.0110	(0.0024; 0.0446)
NO_x ETC	0.0331	0.0332	0.0052	(0.0229; 0.0431)
Productivity Growth
TPG	–0.0013	0.0009	0.0323	(–0.0663; 0.0626)
SO₂ EPG	0.0225	0.0220	0.0149	(–0.0021; 0.0495)
NO_x EPG	0.0331	0.0330	0.0064	(0.0211; 0.0453)

Figure 1. Kernel Densities of Technical and Environmental Inefficiency Estimates.

Further, we find that electric utilities tend to suffer from higher levels of inefficiency in the emission of SO₂ than in the emission of NO_x gases. These differences may reflect varying degrees of strictness of environmental regulations (or of their enforceability) for different pollutants across states. For instance, loose regulations for SO₂ emissions could potentially explain why the SO₂-oriented environmental inefficiency is considerably higher on average and more widely dispersed than the NO_x-oriented inefficiency. Lastly, we document little correlation between the two environmental inefficiencies as well as between the environmental inefficiencies and the technical inefficiency. Table 4 reports the Spearman rank correlation coefficients for the posterior inefficiency estimates. For instance, there appears to be virtually no relationship between the electric-power-oriented technical inefficiency and the SO₂-oriented environmental inefficiency exhibited by utilities (a rank correlation of 0.080).

Table 4. Rank Correlation Coefficients.

Inefficiency				Efficiency Change
Tech. Ineff	1.000			TEC	1.000
SO₂ Env. Ineff	0.080	1.000		SO₂ EEC	–0.059	1.000
NO_x Env. Ineff	0.112	0.227	1.000	NO_x EEC	0.080	0.139	1.000
Technological Change				Productivity Growth
TTC	1.000			TPG	1.000
SO₂ ETC	–0.001	1.000		SO₂ EPG	–0.024	1.000
NO_x ETC	–0.020	0.045	1.000	NO_x EPG	–0.049	0.065	1.000
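For reference, a Spearman rank correlation of the kind reported in Table 4 is the Pearson correlation of the ranks of two series. The sketch below implements it with the standard library only; the two short series are made-up numbers, not estimates from the paper.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

tech = [0.09, 0.12, 0.05, 0.10, 0.07]     # toy technical inefficiencies
env = [0.08, 0.03, 0.15, 0.09, 0.11]      # toy environmental inefficiencies
print(spearman(tech, env))
```

Because only ranks enter the calculation, the measure is unaffected by monotone transformations of the inefficiency estimates.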

We next proceed to the posterior estimates of the productivity growth components (technological change and efficiency change) as defined in (5) and (6). From Table 3, we see that the levels of both environmental efficiencies appear to be stable over time: posterior mean estimates of the SO₂

EEC

and the NO_x

EEC

are virtually zero. The left panel of Figure 2, which depicts box-and-whiskers plots of the distributions of technical and environmental efficiency change estimates for each year in our sample, confirms the relative stability of environmental efficiency levels. The electric-power-oriented (i.e., desirable-output-oriented) technical efficiency is, however, less stable over the years. The mean estimates of

TEC

are predominantly positive across utilities in 1986, 1988 and 1995, whereas a significant decrease in efficiency is documented for 1987 and 1994. However, the mean posterior estimate of

TEC

pooled over the entire sample is a mere 0.29% (also see Table 3).

We document several regularities in the estimates of the technological change. First, the technical change

TTC

is primarily negative and is close to zero in the production of electric power (a desirable output). The posterior mean estimate of annual

TTC

is –0.42%. This finding is in sharp contrast with what we observe for the environmental technological change

ETC

. The posterior mean estimates of

ETC

for the SO₂ and NO_x emissions (undesirable outputs) are a staggering 2.25% and 3.31% per annum, respectively. Second, we find that technological change is fairly stable across the years in all dimensions, be it the intended production of electric power or the undesirable by-production of emission gases. The right panel of Figure 2 confirms this observation: the distributions of

TTC

and the two

ETC

do not change much over the course of the years.

The discussed differences in the temporal dynamics between technical and environmental efficiency change and technological change result in dramatic differences in the measures of productivity across desirable and undesirable outputs. We find that, while the pooled posterior mean estimate of (annual) productivity growth is negative for electric power generation (–0.13%), it is substantially positive for the SO₂ and NO_x emissions: 2.25% and 3.31% per year, respectively. In other words, keeping input quantities constant, the net electric power generation, on average, fell by 0.13% per year during our sample period. Utilities, however, did a significantly better job in terms of cutting the emission of SO₂ and NO_x gases for any fixed quantity of the net electric power generated: on average, emissions fell by 2.25% and 3.31% per year, respectively, ceteris paribus. This disconnect between technical and environmental productivities of electric plants in our sample is also confirmed by virtually zero rank correlation coefficients between

TPG

and

EPG

for the SO₂ and NO_x emissions (see Table 4).
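As a quick arithmetic check, the posterior means in Table 3 can be compared with an additive decomposition of productivity growth into efficiency change and technological change. The additive form is our reading of the decomposition referred to in the text; the snippet below only verifies that the reported means are consistent with it up to rounding.

```python
# Posterior means from Table 3:
# (efficiency change, technological change, productivity growth).
table3 = {
    "electric power (TEC, TTC, TPG)": (0.0029, -0.0042, -0.0013),
    "SO2 (EEC, ETC, EPG)": (-0.0000, 0.0225, 0.0225),
    "NOx (EEC, ETC, EPG)": (0.0000, 0.0331, 0.0331),
}

def decomposition_gap(ec, tc, pg):
    """Difference between reported growth and the sum of its two components."""
    return pg - (ec + tc)

for name, (ec, tc, pg) in table3.items():
    print(f"{name}: EC + TC = {ec + tc:+.4f}, PG = {pg:+.4f}")

gaps = {name: decomposition_gap(*vals) for name, vals in table3.items()}
```

For both pollutants, essentially all of the environmental productivity growth is accounted for by technological change, since the mean efficiency change is zero to four decimal places.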

Figure 3 vividly illustrates the differences between technical and environmental productivities. It plots the productivity indices that are normalized to 100 in the year 1985 and are constructed using the respective output-weighted average annual productivity growth rates (over all utilities in the sample). The figure shows that the industry enjoyed stable positive rates of

EPG

during the 1985–1995 period, whereas

TPG

was more sporadic and included periods of both positive and negative growth. The cumulative eleven-year growth is 23.26% for the SO₂-oriented

EPG

, 37.98% for the NO_x-oriented

EPG

and a mere 5.33% for the electric-power-oriented

TPG

.
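The construction of an index of this kind can be sketched as follows: firm-level growth rates are averaged with output weights in each year and then chained from a base value of 100. The discrete compounding rule, the function names and the two-firm toy data are assumptions made for illustration; the paper does not spell out the chaining formula, so the numbers below are not meant to reproduce Figure 3.

```python
def weighted_growth(firm_growth, firm_output):
    """Output-weighted average of firm-level growth rates in one year."""
    total = sum(firm_output)
    return sum(g * w for g, w in zip(firm_growth, firm_output)) / total

def productivity_index(growth_rates, base=100.0):
    """Chain annual growth rates into an index equal to `base` in the first year."""
    index = [base]
    for g in growth_rates:
        index.append(index[-1] * (1.0 + g))
    return index

# Toy data: two utilities, three annual growth observations after the base year.
growth_by_year = [[0.02, 0.04], [0.01, 0.03], [-0.01, 0.02]]
output_by_year = [[300.0, 100.0], [310.0, 105.0], [305.0, 110.0]]
annual = [weighted_growth(g, w)
          for g, w in zip(growth_by_year, output_by_year)]
index = productivity_index(annual)
print([round(v, 2) for v in index])
```

With output weights, large utilities dominate the index, which is one reason a cumulative figure can differ in sign from the unweighted pooled mean growth rate.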

Figure 2. Technical and Environmental Efficiency Change and Technological Change.

Figure 3. Technical and Environmental PG Indices.

6. Conclusions

The prevalent approaches to modeling the firm’s production process in the presence of undesirable outputs either treat these outputs as inputs, which questionably implies their strong disposability as well as violates the “material balance condition”, or employ the directional distance function that, despite a sound theoretical foundation, allows the identification of a “composite” measure of inefficiency only. Similarly, the directional-output-distance-function-based productivity index identifies only the composite productivity which is a weighted combination of the technical and environmental productivities. This precludes researchers from disentangling the technical inefficiency/productivity, conventionally defined over desirable outputs, from the environmental (“green”) inefficiency/productivity, defined over undesirable outputs.

In this paper, we follow a different path to modeling the production process with undesirable outputs in the spirit of Fernández et al. [11,12], Forsund [13] and Murty et al. [7]. Specifically, we model the productive operations of the firm as a system of simultaneous production technologies for desirable and undesirable outputs. In this setup, desirable outputs are produced by transforming inputs via the conventional transformation function satisfying all standard assumptions. Consistent with the material balance condition, the by-production of undesirable outputs is however treated as the so-called “residual generation technology”. The above setup explicitly recognizes that the generation of undesirable outputs is not the intended production but rather the by-production process. By separating the generation of undesirable outputs from that of desirable outputs, we ensure that the former are not modeled as inputs and take into account their “costly disposability”.

The by-production system approach that we employ in this paper permits us to not only distinguish between technical efficiency and (undesirable-output-specific) environmental efficiencies but to also differentiate between traditional technical productivity and environmental productivity. Specifically, we derive a Solow [14] type primal (Divisia) environmental productivity index which, unlike a conventional desirable-output-oriented productivity index, is defined as the contraction rate in undesirable outputs unexplained by the contraction in desirable outputs. This allows us to credit the ceteris paribus reduction in undesirable outputs. Our environmental productivity index also provides a meaningful way to decompose productivity into environmental technological change and environmental efficiency change.

Acknowledgments

We would like to thank two anonymous referees for many insightful comments and suggestions that helped improve this article. Any remaining errors are our own.

Author Contributions

All authors contributed equally to the project.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. S.E. Atkinson, and J.H. Dorfman. “Bayesian measurement of productivity and efficiency in the presence of undesirable outputs: Crediting electric utilities for reducing air pollution.” J. Econ. 126 (2005): 445–468.
2. S.E. Atkinson, and E.G. Tsionas. “Directional distance functions: Optimal endogenous directions.” J. Econ., 2015, in press.
3. S. Reinhard, C.A.K. Lovell, and G. Thijssen. “Econometric estimation of technical and environmental efficiency: An application to Dutch dairy farms.” Am. J. Agric. Econ. 81 (1999): 44–60.
4. S. Reinhard, C.A.K. Lovell, and G. Thijssen. “Environmental efficiency with multiple environmentally detrimental variables; estimated with SFA and DEA.” Eur. J. Oper. Res. 121 (2000): 287–303.
5. A. Hailu, and T.S. Veeman. “Non-parametric productivity analysis with undesirable outputs: An application to the Canadian pulp and paper industry.” Am. J. Agric. Econ. 83 (2001): 605–616.
6. R. Färe, S. Grosskopf, D.W. Noh, and W. Weber. “Characteristics of a polluting technology: Theory and practice.” J. Econ. 126 (2005): 469–492.
7. S. Murty, R.R. Russell, and S.B. Levkoff. “On modeling pollution-generating technologies.” J. Environ. Econ. Manag. 64 (2012): 117–135.
8. Y. Chung, R. Färe, and S. Grosskopf. “Productivity and undesirable outputs: A directional distance function approach.” J. Environ. Manag. 51 (1997): 229–240.
9. G. Feng, and A. Serletis. “Undesirable outputs and a primal Divisia productivity index based on the directional output distance function.” J. Econ. 183 (2014): 135–146.
10. R. Färe, and S. Grosskopf. “Directional distance functions and slacks-based measures of efficiency.” Eur. J. Oper. Res. 200 (2010): 320–322.
11. C. Fernández, G. Koop, and M.F.J. Steel. “Multiple-output production with undesirable outputs: An application to Nitrogen Surplus in Agriculture.” J. Am. Stat. Assoc. 97 (2002): 432–442.
12. C. Fernández, G. Koop, and M.F.J. Steel. “Alternative efficiency measures for multiple-output production.” J. Econ. 126 (2005): 411–444.
13. F. Forsund. “Good modelling of bad outputs: Pollution and multiple-output production.” Int. Rev. Environ. Resour. Econ. 3 (2009): 1–38.
14. R.M. Solow. “Technical change and the aggregate production function.” Rev. Econ. Stat. 39 (1957): 312–320.
15. W.A. Barnett, J. Geweke, and M. Wolfe. “Seminonparametric Bayesian estimation of the Asymptotically Ideal Production Model.” J. Econ. 49 (1991): 5–50.
16. W.A. Barnett. “Tastes and technology: Curvature is not sufficient for regularity.” J. Econ. 108 (2002): 199–202.
17. C.A. Pasurka. “Decomposing electric power plant emissions within a joint production framework.” Energy Econ. 28 (2006): 26–43.
18. C.J. O’Donnell, and T.J. Coelli. “A Bayesian approach to imposing curvature on distance functions.” J. Econ. 126 (2005): 493–523.
19. G. Koop, and M. Steel. “Bayesian analysis of stochastic frontier models.” In A Companion to Theoretical Econometrics. Edited by B. Baltagi. Malden, MA, USA: Blackwell Publishing Ltd., 2001.
20. J.O. Berger, and J. Mortera. “Default Bayes factors for nonnested hypothesis testing.” J. Am. Stat. Assoc. 94 (1999): 542–554.
21. J. Mulder, H. Hoijtink, and I. Klugkist. “Equality and inequality constrained multivariate linear models: Objective model selection using constrained posterior priors.” J. Stat. Plan. Inference 140 (2010): 887–906.
22. J.O. Berger, and L.R. Pericchi. “The intrinsic Bayes factor for model selection and prediction.” J. Am. Stat. Assoc. 91 (1996): 109–122.
23. J. Geweke. “Evaluating the accuracy of sampling-based approaches to calculating posterior moments.” In Bayesian Statistics 4. Edited by J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith. Oxford, UK: Clarendon Press, 1992.
24. E.G. Tsionas, and S.C. Kumbhakar. “Firm heterogeneity, persistent and transient technical inefficiency: A generalized true random-effects model.” J. Appl. Econ. 29 (2014): 110–132.
25. I. Verdinelli, and L. Wasserman. “Computing Bayes factors by using a generalization of the Savage-Dickey density ratio.” J. Am. Stat. Assoc. 90 (1995): 614–618.

¹Färe and Grosskopf [10] have recently proposed the slacks-based directional distance function which allows inefficiency to be input- and output-specific. The estimation of such slacks-based inefficiencies however is feasible under the deterministic treatment of the production technology only. In this paper, we focus on the econometric estimation of stochastic production technologies that accommodate random disturbances.
²Implying a ceteris paribus contraction in these emissions.
³We differ from Fernández et al. [11,12] by formulating separate residual generation technologies for each undesirable output. The latter allows us to gauge $B_{p}$ -specific “green” productivity.
⁴Recall that $f (\cdot) = {[F (\cdot, 1)]}^{- 1}$ and $h_{p} (\cdot) = {[H_{p} (\cdot, 1)]}^{- 1}$ . Hence, negative monotonicity of $F (\cdot)$ and $H_{p} (\cdot)$ implies positive monotonicity of $f (\cdot)$ and $h_{p} (\cdot)$ .
⁵Recall that the quantity of undesirable outputs goes down as desirable outputs decrease due to the complementarity of the two types of outputs.
⁶For a similar stochastic formulation, see, e.g., Koop and Steel [19].
⁷For alternative ways to reduce this correlation in multiple random-effect models, see Tsionas and Kumbhakar [24].
⁸We also reestimate our model with no theoretical regularity constraints imposed. Consistent with one’s expectations, the unconstrained metrics generally have larger credible intervals. However, since the unconstrained estimates violate regularity conditions dictated by economic theory and thus have no meaningful economic interpretation, we do not report them here.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license ( http://creativecommons.org/licenses/by/4.0/).
