
Improved Average Estimation in Seemingly Unrelated Regressions

Department of Economics, University of California, Riverside, CA 92521, USA
*
Author to whom correspondence should be addressed.
Econometrics 2020, 8(2), 15; https://doi.org/10.3390/econometrics8020015
Received: 24 February 2020 / Revised: 16 April 2020 / Accepted: 17 April 2020 / Published: 27 April 2020
(This article belongs to the Special Issue Bayesian and Frequentist Model Averaging)

Abstract

In this paper, we propose an efficient weighted average estimator in Seemingly Unrelated Regressions. This average estimator shrinks a generalized least squares (GLS) estimator towards a restricted GLS estimator, where the restrictions represent possible parameter homogeneity specifications. The shrinkage weight is inversely proportional to a weighted quadratic loss function. The approximate bias and second moment matrix of the average estimator using the large-sample approximations are provided. We give the conditions under which the average estimator dominates the GLS estimator on the basis of their mean squared errors. We illustrate our estimator by applying it to a cost system for United States (U.S.) commercial banks over the period from 2000 to 2018. Our results indicate that, on average, most of the banks have been operating under increasing returns to scale. We find that over recent years, scale economies are a plausible reason for the growth in the average size of banks, and the tendency toward increasing scale is likely to continue.
Keywords: Stein-type shrinkage estimator; asymptotic approximations; SUR; GLS

1. Introduction

Seemingly unrelated regressions (SUR) were introduced by Zellner (1962) and are among the econometric developments most widely used in applied work. The popularity of the model stems from its relative ease of estimation, its applicability to a large class of modeling and testing problems, and the availability of data representing a sample of cross-section units observed over several time periods. Zellner (1962) proposed a generalized least squares (GLS) estimator for the coefficients of a set of SUR and established that it yields, at least asymptotically, more efficient estimators than those obtained by single-equation least squares. See the surveys by Srivastava and Dwivedi (1979) and Fiebig (2001) and the book by Srivastava and Giles (1987) for a concise coverage of the literature in this area.
Shrinkage estimation in SUR was first introduced by Zellner and Vandaele (1975), who extend the results of James and Stein (1961) and Sclove (1968) to multivariate regression equations and present a technique for constructing an estimator whose risk is smaller than that of the GLS estimator. However, the resulting estimator depends on some unknown matrices and is not practical. Srivastava (1973) investigates the properties of the estimator when consistent estimators are substituted for these unknown matrices. Maddala (1991) reviewed the shrinkage estimators and showed that these estimators appear to perform better than both pooled and single-equation least squares estimators (see also Maddala and Hu 1996; Maddala et al. 1997; Choi and Li 2000). Maddala et al. (2001) show the superior properties of shrinkage estimators among single-equation estimators and various averaging estimators in a heterogeneous panel data model under an error homoscedasticity framework. In univariate equation models, Hansen (2016) recently introduced shrinkage for general parametric models by shrinking maximum likelihood estimators (MLE) toward a restricted MLE. Using a local-to-zero asymptotic framework, Hansen (2016) shows that the shrinkage estimator dominates the MLE in terms of asymptotic risk when the shrinkage dimension exceeds two. Wang et al. (2019) propose a Mallows pooling averaging estimator for heterogeneous panel data models and conclude that the pooling averaging estimator is preferred when the panel is heterogeneous and the signal-to-noise ratio is moderate or large. For more on averaging estimators, see Ullah and Wang (2013).
In the analysis of SUR, a question that practitioners often face is whether to assume parameter homogeneity or parameter heterogeneity. On one hand, the parameter heterogeneity assumption results in consistent estimators, and ignoring heterogeneity when it is present causes misleading estimates (see, for example, Robertson and Symons 1992; Pesaran and Smith 1995; Su and Chen 2013; Durlauf et al. 2001; Browning and Carro 2007). On the other hand, the parameter homogeneity assumption yields higher efficiency, possibly at the cost of estimation bias and inconsistency, and it is supported by an increasing number of studies because of the better forecast performance of estimators under this assumption (see, for example, Maddala 1991; Maddala and Hu 1996; Baltagi and Griffin 1984; Baltagi et al. 2000; Hoogstrate et al. 2000). This question and the results of the research cited above illustrate the typical bias-variance trade-off that needs to be considered in choosing the restrictions. While efficiency is important, robustness is also critical, since researchers prefer as few ad hoc restrictions as possible. In the present setting, the efficient estimator relies on the more stringent condition of homogeneity and is therefore less robust to parameter heterogeneity. This efficiency-robustness trade-off (bias-variance trade-off) therefore calls for thorough examination. A natural approach is to consider a pre-test estimator, but it has been shown to be unable to resolve the efficiency-robustness issue (Leeb and Pötscher 2005).
A more useful approach, considered here, is an averaging estimator (Stein-type shrinkage), that is, a weighted average of the robust and efficient (the unrestricted GLS and the restricted GLS) estimators. The weight is inversely related to a weighted quadratic loss function, which measures the weighted distance between the unrestricted and the restricted GLS estimators. The first and second moments of our proposed average estimator are derived using the large-sample approximations of Nagar (1959). Furthermore, we establish dominance properties in terms of the mean squared error (MSE) of the estimators, which ensure that the proposed estimator is robust against arbitrary deviations from the restrictions. This is an advantage of our method relative to the “local asymptotic” argument that some previous studies rely on (see, for example, Hansen 2016), which mainly covers very small violations of the restrictions. The discussion in this area has generally been theoretical, but here we also address an important empirical question and show the advantages of the averaging estimator over the estimators considered previously. Specifically, we apply our estimator to estimate the cost efficiency of United States (U.S.) commercial banks using a cost system method over the period from 2000 to 2018. Since bank size is an important factor of the production environment, we follow the literature and use it to partition bank technologies. However, these partitions are user-specified, and the estimates can be misleading because of false parameter heterogeneity assumptions. Therefore, we use the average estimator introduced in the paper to estimate the cost efficiency, as it optimally balances the trade-off between the bias and the variance efficiency of the restricted and the unrestricted GLS estimators.
We find that, on average, the majority of banks have been operating under increasing returns to scale over the sample period. We also find more signs of cost efficiency for Large banks (banks with assets of more than $500 million) and Small banks (banks with assets of less than $100 million) relative to Medium banks (banks with assets between $100 million and $500 million). This finding is important for gauging the costs and benefits of any policy intervention to control the size of banks.
The paper is organized as follows. Section 2 describes the model and the assumptions. In Section 3, we introduce the estimators. We give the bias, MSE matrix and the risk of the average estimator using the large-sample approximations in Section 4. Section 5 reports some Monte Carlo simulations to evaluate the accuracy of the approximations. Results from our empirical example are presented in Section 6. Finally, Section 7 contains some concluding remarks, and proofs are given in Appendix A.

2. The Model and Notation

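Consider the following $m$ seemingly unrelated linear regressions
$$y_i = X_i \beta_i + u_i, \qquad i = 1, 2, \dots, m,$$
where $y_i = (y_{i1}, y_{i2}, \dots, y_{iT})'$ is a $T \times 1$ vector of observations on the dependent variable $y_{it}$, with $T$ being the number of observations, $X_i$ is a $T \times k$ matrix of observations on the $k$ regressors including the intercept (that is, $x_{it,1} = 1$)$^{1}$, $\beta_i$ is a $k \times 1$ vector of unknown coefficients and $u_i = (u_{i1}, u_{i2}, \dots, u_{iT})'$ is a $T \times 1$ vector of disturbances, for $i = 1, 2, \dots, m$. It is convenient to stack the $m$ equations above in the following form:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix},$$
or compactly as
$$\underset{mT \times 1}{y} = \underset{mT \times mk}{X}\, \underset{mk \times 1}{\beta} + \underset{mT \times 1}{u}. \tag{3}$$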
We assume,
Assumption 1.
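The $mT \times 1$ vector of disturbances, $u$, has a zero conditional mean
$$E(u \mid X_1, X_2, \dots, X_m) = 0.$$
Assumption 2.
The disturbances are uncorrelated across observations but correlated across equations,
$$E(u_i u_j' \mid X_1, X_2, \dots, X_m) = \sigma_{ij} I_T,$$
or
$$E(u u' \mid X_1, X_2, \dots, X_m) = \Omega = \begin{pmatrix} \sigma_{11} I_T & \sigma_{12} I_T & \cdots & \sigma_{1m} I_T \\ \sigma_{21} I_T & \sigma_{22} I_T & \cdots & \sigma_{2m} I_T \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} I_T & \sigma_{m2} I_T & \cdots & \sigma_{mm} I_T \end{pmatrix} = \Sigma \otimes I_T,$$
where $I_T$ is the $T \times T$ identity matrix,
$$\underset{m \times m}{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm} \end{pmatrix},$$
and we assume $\Omega$ is positive definite.
Assumption 3.
The disturbances are normally distributed with mean zero and variance-covariance matrix $\Omega$.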
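As a concrete illustration of the stacked form and of Assumption 2, the following NumPy sketch builds the block-diagonal regressor matrix and the covariance $\Omega = \Sigma \otimes I_T$. The helper names (`stack_sur`, `sur_covariance`) and the numerical values of $\Sigma$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stack_sur(X_list):
    """Block-diagonal (mT x mk) regressor matrix built from m blocks of size T x k."""
    m = len(X_list)
    T, k = X_list[0].shape
    X = np.zeros((m * T, m * k))
    for i, Xi in enumerate(X_list):
        X[i * T:(i + 1) * T, i * k:(i + 1) * k] = Xi
    return X

def sur_covariance(Sigma, T):
    """Omega = Sigma kron I_T: correlation across equations, none across observations."""
    return np.kron(Sigma, np.eye(T))

rng = np.random.default_rng(0)
m, T, k = 3, 4, 2  # tiny sizes, chosen only so the matrices are easy to inspect
X_list = [np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
          for _ in range(m)]
X = stack_sur(X_list)

# Illustrative cross-equation covariance matrix (an assumption for this sketch).
Sigma = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.25, 0.25],
                  [0.5, 0.25, 1.25]])
Omega = sur_covariance(Sigma, T)

print(X.shape, Omega.shape)
```

Each $T \times T$ block of `Omega` is $\sigma_{ij} I_T$, which is exactly the structure stated in Assumption 2.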
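We define some notation below, which will be used in the following sections. Let
$$\underset{mT \times mT}{Q} = \Omega^{-1} (I_{mT} - \Psi),$$
where
$$\underset{mT \times mT}{\Psi} = X (X' \Omega^{-1} X)^{-1} X' \Omega^{-1}.$$
If we partition $Q$ into $T \times T$ sub-matrices as below
$$Q = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1m} \\ Q_{21} & Q_{22} & \cdots & Q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \cdots & Q_{mm} \end{pmatrix},$$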
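then we define
$$\Pi = \begin{pmatrix} Q_{11}' & Q_{12}' & \cdots & Q_{1m}' \\ Q_{21}' & Q_{22}' & \cdots & Q_{2m}' \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1}' & Q_{m2}' & \cdots & Q_{mm}' \end{pmatrix}.$$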
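Also, we define
$$\underset{T \times T}{\Phi} = \sum_{i=1}^{m} \Psi_{ii},$$
where $\Psi_{ii}$ is the $i$th diagonal $T \times T$ sub-matrix of $\Psi$, which is partitioned as below
$$\Psi = \begin{pmatrix} \Psi_{11} & \Psi_{12} & \cdots & \Psi_{1m} \\ \Psi_{21} & \Psi_{22} & \cdots & \Psi_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \Psi_{m1} & \Psi_{m2} & \cdots & \Psi_{mm} \end{pmatrix}.$$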

3. Estimators

Our goal is to estimate the vector of slope parameters, $\beta$, in Equation (3). We consider three estimators of the slope parameters. The first estimator is the Zellner (1962) GLS estimator (the unrestricted GLS estimator), which is the standard estimator in SUR. The second estimator is a restricted GLS estimator that ignores the heterogeneity of the slope parameters and estimates a pooled model. The third estimator, called the average estimator, is a weighted average of the restricted and the unrestricted GLS estimators, where the weight on the restricted estimator is inversely proportional to a weighted quadratic loss function.

3.1. Unrestricted Estimator

The typical estimator of the slope parameters in SUR is a feasible GLS estimator defined as
$$\hat{\beta} = (X' \hat{\Omega}^{-1} X)^{-1} X' \hat{\Omega}^{-1} y = \beta + (X' \hat{\Omega}^{-1} X)^{-1} X' \hat{\Omega}^{-1} u,$$
where $\hat{\Omega}$ is an estimator of $\Omega$, which can be calculated as
$$\hat{\Omega} = \hat{\Sigma} \otimes I_T,$$
where $\hat{\Sigma}$ is an estimator of $\Sigma$, such that its $(i, j)$th element, $s_{ij}$, estimates $\sigma_{ij}$, using a single-equation estimator of $\beta_i$, defined as $\breve{\beta}_i = (X_i' X_i)^{-1} X_i' y_i$, for $i = 1, 2, \dots, m$. Hence, $s_{ij}$ is equal to
$$s_{ij} = (y_i - X_i \breve{\beta}_i)' (y_j - X_j \breve{\beta}_j) / T = u_i' M_i M_j u_j / T,$$
where $M_i = I_T - X_i (X_i' X_i)^{-1} X_i'$ is an idempotent projection matrix.
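The feasible GLS steps described above can be sketched in NumPy as follows. This is a minimal illustration under the assumption of a balanced system (the same $T$ and $k$ in every equation); the function name `sur_fgls` and the simulated data are hypothetical and serve only to exercise the code.

```python
import numpy as np

def sur_fgls(y_list, X_list):
    """Feasible GLS for a balanced SUR system.

    Returns beta_hat (length mk), Sigma_hat (m x m) and G = X' Omega_hat^{-1} X.
    """
    m = len(X_list)
    T, k = X_list[0].shape

    # Step 1: single-equation OLS residuals y_i - X_i * beta_breve_i.
    resid = np.empty((T, m))
    for i, (yi, Xi) in enumerate(zip(y_list, X_list)):
        b_ols, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid[:, i] = yi - Xi @ b_ols

    # Step 2: s_ij = residual cross products divided by T.
    Sigma_hat = resid.T @ resid / T

    # Step 3: stack the system and apply GLS with Omega_hat = Sigma_hat kron I_T.
    X = np.zeros((m * T, m * k))
    for i, Xi in enumerate(X_list):
        X[i * T:(i + 1) * T, i * k:(i + 1) * k] = Xi
    y = np.concatenate(y_list)
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    G = X.T @ Omega_inv @ X
    beta_hat = np.linalg.solve(G, X.T @ Omega_inv @ y)
    return beta_hat, Sigma_hat, G

# Simulated example (all values are assumptions made for this sketch).
rng = np.random.default_rng(42)
m, T, k = 3, 200, 2
beta_true = np.array([1.0, 2.0, 1.0, 2.5, 1.0, 3.0])
u1 = rng.standard_normal(T)
X_list, y_list = [], []
for i in range(m):
    Xi = np.column_stack([np.ones(T), rng.standard_normal(T)])
    ui = u1 if i == 0 else 0.5 * u1 + rng.standard_normal(T)
    X_list.append(Xi)
    y_list.append(Xi @ beta_true[i * k:(i + 1) * k] + ui)

beta_hat, Sigma_hat, G = sur_fgls(y_list, X_list)
print(np.round(beta_hat, 2))
```

Forming the full $mT \times mT$ matrix with `np.kron` is convenient for small examples; for large $T$ one would exploit the Kronecker structure instead of materialising it.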

3.2. Restricted Estimator

The restricted estimator is defined under the parameter homogeneity assumption across equations, which can be written as
$$\beta_1 = \beta_2 = \dots = \beta_m = \bar{\beta},$$
where $\bar{\beta}$ is a weighted average of the slope parameters, $\beta_i$’s, defined as
$$\bar{\beta} = (J X' \Omega^{-1} X J')^{-1} J X' \Omega^{-1} X \beta,$$
in which $J = (I_k, I_k, \dots, I_k)$ is a $k \times mk$ matrix, where $I_k$ denotes the $k \times k$ identity matrix.
Equivalently, the parameter homogeneity assumption can be formulated through a restriction matrix as
$$\begin{pmatrix} \beta_1 - \bar{\beta} \\ \beta_2 - \bar{\beta} \\ \vdots \\ \beta_m - \bar{\beta} \end{pmatrix} = \beta - J' \bar{\beta} = \left( I_{mk} - J' (J X' \Omega^{-1} X J')^{-1} J X' \Omega^{-1} X \right) \beta = R \beta = 0,$$
where $R = I_{mk} - J' (J X' \Omega^{-1} X J')^{-1} J X' \Omega^{-1} X$ is an idempotent matrix.
Hence, we can derive the restricted estimator from the following minimization
$$\min_{\beta} \; (y - X \beta)' \Omega^{-1} (y - X \beta), \quad \text{subject to } R \beta = 0.$$
The solution to the above minimization can be formulated as a feasible restricted GLS estimator as below
$$\tilde{\beta} = \hat{\beta} - (X' \hat{\Omega}^{-1} X)^{-1} \hat{R}' \left[ \hat{R} (X' \hat{\Omega}^{-1} X)^{-1} \hat{R}' \right]^{-1} \hat{R} \hat{\beta} = (I_{mk} - \hat{R}) \hat{\beta} = J' (J X' \hat{\Omega}^{-1} X J')^{-1} J X' \hat{\Omega}^{-1} X \hat{\beta},$$
where$^{2}$ $\hat{R} = I_{mk} - J' (J X' \hat{\Omega}^{-1} X J')^{-1} J X' \hat{\Omega}^{-1} X$ is an estimate of $R$.
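To make the restriction matrix concrete, the sketch below computes $\hat{R}$ and the restricted estimator $\tilde{\beta} = (I_{mk} - \hat{R})\hat{\beta}$ for a given matrix $G$ standing in for $X'\hat{\Omega}^{-1}X$. The function names and the randomly generated inputs are assumptions for illustration; in practice `G` and `beta_hat` would come from the unrestricted GLS step.

```python
import numpy as np

def restriction_matrix(G, m, k):
    """R = I_mk - J'(J G J')^{-1} J G, with J = (I_k, ..., I_k) of size k x mk."""
    J = np.tile(np.eye(k), (1, m))
    return np.eye(m * k) - J.T @ np.linalg.solve(J @ G @ J.T, J @ G)

def restricted_gls(beta_hat, G, m, k):
    """Restricted estimator (I_mk - R) beta_hat: the pooled vector repeated m times."""
    R = restriction_matrix(G, m, k)
    return beta_hat - R @ beta_hat

# Synthetic inputs (assumptions): any symmetric positive definite G works here.
rng = np.random.default_rng(1)
m, k = 3, 2
Z = rng.standard_normal((20, m * k))
G = Z.T @ Z
beta_hat = rng.standard_normal(m * k)

R = restriction_matrix(G, m, k)
beta_tilde = restricted_gls(beta_hat, G, m, k)
print(beta_tilde.reshape(m, k))  # every row is the same pooled coefficient vector
```

The printed rows coincide, which is the homogeneity restriction $\beta_1 = \dots = \beta_m$ imposed on the estimate, and `R @ R` reproduces `R`, in line with the idempotency noted above.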

3.3. Average Estimator

We define the average estimator as below
$$\hat{\beta}_A = \left( 1 - \frac{\tau}{D} \right) \hat{\beta} + \frac{\tau}{D} \tilde{\beta},$$
where $D$ is a weighted quadratic loss function defined as
$$D = (\hat{\beta} - \tilde{\beta})' W (\hat{\beta} - \tilde{\beta}),$$
with $W$ an arbitrary symmetric positive definite weight matrix with elements of order $O(T)$, and $\tau$ is a positive characterizing parameter. We defer the discussion of the optimal choice of this parameter to the next section.$^{3}$
The idea behind the average estimator defined above is that when the difference between the restricted and the unrestricted GLS estimators is small (D is small), the average estimator gives a higher weight to the restricted GLS estimator, as it is the most efficient estimator. However, when the difference between the restricted and the unrestricted GLS estimators is substantial, the bias of the restricted GLS estimator, resulting from ignoring the parameter heterogeneity, could be more than its variance efficiency gain, so the average estimator assigns a higher weight to the unrestricted GLS estimator.
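The weighting scheme can be illustrated with a small numerical sketch. The function name `average_estimator`, the input vectors, and the handling of the degenerate case $D = 0$ are assumptions made here for illustration; no truncation of the weight is applied, in keeping with the definition above.

```python
import numpy as np

def average_estimator(beta_hat, beta_tilde, W, tau):
    """Return ((1 - tau/D) * beta_hat + (tau/D) * beta_tilde, D)."""
    diff = beta_hat - beta_tilde
    D = float(diff @ W @ diff)  # weighted quadratic loss
    if D == 0.0:
        # Assumption of this sketch: when both estimators coincide, return either.
        return beta_hat.copy(), D
    weight = tau / D            # weight placed on the restricted estimator
    return (1.0 - weight) * beta_hat + weight * beta_tilde, D

# Toy numbers (assumptions), chosen so the arithmetic is easy to follow by hand.
beta_hat = np.array([1.0, 2.0, 1.5, 2.5])
beta_tilde = np.array([1.25, 2.25, 1.25, 2.25])
W = 10.0 * np.eye(4)

beta_avg, D = average_estimator(beta_hat, beta_tilde, W, tau=1.0)
print(D, beta_avg)
```

Here each element of the difference is $\pm 0.25$, so $D = 10 \times 4 \times 0.0625 = 2.5$ and the restricted estimator receives weight $\tau / D = 0.4$. A larger distance between the two estimators would shrink this weight toward zero.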

4. Large-Sample Approximate Bias and MSE

We employ the large-sample approximation method developed by Nagar (1959) to analyze the bias, mean squared error matrix (MSEM) and risk of the average estimator.
Theorem 1.
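Under Assumptions 1–3, the bias of the average estimator up to order $O(T^{-1})$ is
$$\mathrm{Bias}(\hat{\beta}_A) = E(\hat{\beta}_A - \beta) = -\frac{\tau}{\phi} R \beta,$$
and the MSEM of the average estimator up to order $O(T^{-2})$ is
$$\mathrm{MSEM}(\hat{\beta}_A) = E[(\hat{\beta}_A - \beta)(\hat{\beta}_A - \beta)'] = \mathrm{MSEM}(\hat{\beta}) + \frac{\tau^2}{\phi^2} R \beta \beta' R' - \frac{2\tau}{\phi} R (X' \Omega^{-1} X)^{-1} R' + \frac{2\tau}{\phi^2} \left[ R \beta \beta' R' W R (X' \Omega^{-1} X)^{-1} R' + R (X' \Omega^{-1} X)^{-1} R' W R \beta \beta' R' \right],$$
and for the symmetric positive definite weight matrix $W$ of order $O(T)$, the risk of the average estimator up to order $O(T^{-1})$ is
$$\mathrm{Risk}(\hat{\beta}_A) = E[(\hat{\beta}_A - \beta)' W (\hat{\beta}_A - \beta)] = \mathrm{Risk}(\hat{\beta}) + \frac{\tau}{\phi} \left[ \tau - 2 \left[ \mathrm{tr}(P) - \frac{2 \phi_p}{\phi} \right] \right] \le \mathrm{Risk}(\hat{\beta}) + \frac{\tau}{\phi} \left[ \tau - 2 \left[ \mathrm{tr}(P) - 2 \lambda_{max}(P) \right] \right], \tag{20}$$
where $\phi = \beta' R' W R \beta = O(T)$, $\phi_p = \beta' R' W^{1/2} P W^{1/2} R \beta = O(T)$, $\lambda_{max}(\cdot)$ denotes the maximum eigenvalue, and $P = W^{1/2} R (X' \Omega^{-1} X)^{-1} R' W^{1/2}$.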
Proof. 
Appendix A.  ☐
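For readers who want to evaluate the risk expression numerically, the following sketch computes $\phi$, $\phi_p$, $\mathrm{tr}(P)$ and the approximate change in risk relative to the unrestricted GLS estimator, $\frac{\tau}{\phi}\left[\tau - 2\left(\mathrm{tr}(P) - 2\phi_p/\phi\right)\right]$. The function names and the synthetic inputs are assumptions; the matrix `G` stands in for $X'\Omega^{-1}X$, and the identities $\mathrm{tr}(P) = \mathrm{tr}(W R G^{-1} R')$ and $\phi_p = \beta' R' W R G^{-1} R' W R \beta$ are used to avoid forming $W^{1/2}$.

```python
import numpy as np

def restriction_matrix(G, m, k):
    """R = I_mk - J'(J G J')^{-1} J G, with J = (I_k, ..., I_k) of size k x mk."""
    J = np.tile(np.eye(k), (1, m))
    return np.eye(m * k) - J.T @ np.linalg.solve(J @ G @ J.T, J @ G)

def approx_risk_change(tau, beta, G, R, W):
    """Approximate Risk(beta_A) - Risk(beta_hat), plus phi, phi_p and tr(P)."""
    Rb = R @ beta
    V = R @ np.linalg.solve(G, R.T)      # R G^{-1} R'
    phi = float(Rb @ W @ Rb)
    phi_p = float(Rb @ W @ V @ W @ Rb)
    tr_P = float(np.trace(W @ V))
    change = (tau / phi) * (tau - 2.0 * (tr_P - 2.0 * phi_p / phi))
    return change, phi, phi_p, tr_P

# Synthetic example (assumptions): m = 3 equations, k = 2 coefficients each.
rng = np.random.default_rng(2)
m, k = 3, 2
Z = rng.standard_normal((30, m * k))
G = Z.T @ Z
R = restriction_matrix(G, m, k)
beta = np.array([1.0, 1.0, 1.5, 0.5, 2.0, 0.0])  # heterogeneous slopes

change, phi, phi_p, tr_P = approx_risk_change(tau=2.0, beta=beta, G=G, R=R, W=G)
print(change, tr_P)
```

With the choice `W = G` used in this example, `tr_P` equals $(m-1)k = 4$ and `phi_p` equals `phi`, so the computed change is negative, meaning the approximation favours the average estimator for this value of $\tau$.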
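We note that,
$$\mathrm{MSEM}(\hat{\beta}) = \left( 1 + \frac{m}{T} \right) (X' \Omega^{-1} X)^{-1} - \frac{1}{T} (X' \Omega^{-1} X)^{-1} H (X' \Omega^{-1} X)^{-1} + o(T^{-2}),$$
see Srivastava (1970) for a proof, hence
$$\mathrm{Risk}(\hat{\beta}) = \left( 1 + \frac{m}{T} \right) \mathrm{tr}[W (X' \Omega^{-1} X)^{-1}] - \frac{1}{T} \mathrm{tr}[W (X' \Omega^{-1} X)^{-1} H (X' \Omega^{-1} X)^{-1}] + o(T^{-1}),$$
where $H = X' (\Sigma^{-1} \otimes \Phi) X - X' \Pi X$.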
From Theorem 1, it follows that the average estimator dominates the unrestricted GLS estimator in terms of having a smaller risk when the second term on the right-hand side of Equation (20) is negative, which will hold when
$$0 < \tau < 2 \left[ \mathrm{tr}(P) - \frac{2 \phi_p}{\phi} \right], \tag{23}$$
given $\mathrm{tr}(P) > 2 \phi_p / \phi$. As the upper bound of the condition above depends on the slope parameters, one could replace it with an infimum value. Let $d$ be defined as
$$d = \frac{\mathrm{tr}(P)}{\lambda_{max}(P)},$$
which lies in the range $d \in [0, (m-1)k]$, as $P$ is a non-zero positive semi-definite matrix. Therefore, when $d > 2$, an infimum value for the upper bound is $2 [\mathrm{tr}(P) - 2 \lambda_{max}(P)]$.$^{4}$ Hence, given $d > 2$, a sufficient condition for (23) that does not depend on the slope parameters can be written as
$$0 < \tau \le 2 \left[ \mathrm{tr}(P) - 2 \lambda_{max}(P) \right]. \tag{25}$$
In other words, when $d > 2$ and $\tau$ satisfies the condition in Equation (25), the risk of the average estimator is less than the risk of the unrestricted GLS estimator up to the order of interest. In addition, as the choice of the characterizing parameter is user-specified, its optimal value, $\tau_{opt}$, that minimizes the upper bound of the risk of the average estimator (the last term in Equation (20)), up to order $O(T^{-1})$, provided $d > 2$, is
$$\tau_{opt} = \mathrm{tr}(P) - 2 \lambda_{max}(P).$$
Since the optimal $\tau$ depends on the unknown value of $\Omega$, one could substitute it with its estimated value, and use an estimate of $\tau_{opt}$, as below
$$\hat{\tau}_{opt} = \mathrm{tr}(\hat{P}) - 2 \lambda_{max}(\hat{P}),$$
where $\hat{P} = W^{1/2} \hat{R} (X' \hat{\Omega}^{-1} X)^{-1} \hat{R}' W^{1/2}$ is an estimate of $P$.
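A minimal sketch of how $\hat{\tau}_{opt}$ and $d$ might be computed is given below. The function names, the symmetric square root via an eigendecomposition, and the synthetic matrix `G` (standing in for $X'\hat{\Omega}^{-1}X$) are assumptions of this illustration.

```python
import numpy as np

def restriction_matrix(G, m, k):
    """R = I_mk - J'(J G J')^{-1} J G, with J = (I_k, ..., I_k) of size k x mk."""
    J = np.tile(np.eye(k), (1, m))
    return np.eye(m * k) - J.T @ np.linalg.solve(J @ G @ J.T, J @ G)

def sym_sqrt(W):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.sqrt(vals)) @ vecs.T

def tau_opt_hat(G, R, W):
    """Return (tr(P) - 2 * lambda_max(P), d) with P = W^{1/2} R G^{-1} R' W^{1/2}."""
    W_half = sym_sqrt(W)
    P = W_half @ R @ np.linalg.solve(G, R.T) @ W_half
    P = 0.5 * (P + P.T)                  # remove rounding asymmetry
    eigvals = np.linalg.eigvalsh(P)      # ascending order
    tr_P, lam_max = eigvals.sum(), eigvals[-1]
    return tr_P - 2.0 * lam_max, tr_P / lam_max

# Synthetic inputs (assumptions for this sketch).
rng = np.random.default_rng(3)
m, k, T = 3, 2, 50
Z = rng.standard_normal((T, m * k))
G = Z.T @ Z
R = restriction_matrix(G, m, k)

tau_F, d_F = tau_opt_hat(G, R, W=G)                   # forecast-error weighting
tau_I, d_I = tau_opt_hat(G, R, W=T * np.eye(m * k))   # MSE weighting
print(tau_F, d_F, tau_I, d_I)
```

A practical check before using the result is that `d` exceeds 2; otherwise the bound in Equation (25) is not positive and the dominance condition cannot be met.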
Corollary 1.
Under Assumptions 1–3, when $d > 2$, then up to order $O(T^{-1})$ we have
$$\mathrm{Risk}(\hat{\hat{\beta}}_A) \le \mathrm{Risk}(\hat{\beta}) - \frac{\tau_{opt}^2}{\phi} < \mathrm{Risk}(\hat{\beta}),$$
where $\hat{\hat{\beta}}_A$ is the average estimator with $\hat{\tau}_{opt}$.
Proof. 
Appendix A.  ☐
Two possible choices of $W$ are $T I_{mk}$ and $X' \Omega^{-1} X$. With the former, the risk gives the mean squared error; with the latter, it gives the (in-sample) mean squared forecast error (MSFE).
Corollary 2.
Under Assumptions 1–3, when $0 < \tau < 2 [(m-1)k - 2]$, then up to order $O(T^{-1})$, we have
$$\mathrm{MSFE}(\hat{\beta}_A) = \mathrm{MSFE}(\hat{\beta}) + \frac{\tau}{\phi} \left[ \tau - 2 [(m-1)k - 2] \right] < \mathrm{MSFE}(\hat{\beta}).$$
The optimal value of $\tau$ that minimizes the MSFE of the average estimator, provided $(m-1)k > 2$, is
$$\tau_{opt,F} = (m-1)k - 2,$$
and the associated optimal MSFE of the average estimator up to order $O(T^{-1})$ is
$$\mathrm{MSFE}_{opt}(\hat{\beta}_A) = \mathrm{MSFE}(\hat{\beta}) - \frac{[(m-1)k - 2]^2}{\phi},$$
where
$$\mathrm{MSFE}(\hat{\beta}) = \left( 1 + \frac{m}{T} \right) mk - \frac{1}{T} \mathrm{tr}[H (X' \Omega^{-1} X)^{-1}].$$
Proof. 
Appendix A. ☐
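The following lines sketch why the risk expression simplifies so sharply in the MSFE case. They are offered as a reading aid and are not the Appendix A proof. The shorthand $G \equiv X'\Omega^{-1}X$ and the function $g(\tau)$ are introduced here only for this sketch, which relies on the identities $R^2 = R$ and $R G^{-1} = G^{-1} R'$ that follow from the definition of $R$.

```latex
\begin{align*}
\text{With } W = G:\quad
  R G^{-1} R' &= R\,(R G^{-1}) = R G^{-1}, \\
  P &= G^{1/2} R G^{-1} R' G^{1/2} = G^{1/2} R G^{-1} G^{1/2}, \\
  P^2 &= G^{1/2} R\,R\,G^{-1} G^{1/2} = P, \\
  \operatorname{tr}(P) &= \operatorname{tr}(R) = mk - k = (m-1)k,
  \qquad \lambda_{max}(P) = 1, \\
  \phi_p &= \beta' R' G^{1/2} P\, G^{1/2} R \beta = \beta' R' G R \beta = \phi. \\[4pt]
\text{Hence}\quad
  g(\tau) &\equiv \frac{\tau}{\phi}\Big[\tau - 2\big[(m-1)k - 2\big]\Big], \\
  g'(\tau) &= \frac{2}{\phi}\Big[\tau - \big[(m-1)k - 2\big]\Big] = 0
  \;\Longrightarrow\; \tau_{opt,F} = (m-1)k - 2, \\
  g(\tau_{opt,F}) &= -\frac{\big[(m-1)k - 2\big]^2}{\phi}.
\end{align*}
```

Because $P$ is idempotent in this case, the bound based on $\lambda_{max}(P)$ holds with equality, which is why Corollary 2 is stated as an equality rather than an upper bound.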

5. Monte Carlo Simulation

The results below are the simulation results of the model of Section 2, where $x_{it,1} = 1$ and the remaining regressors are independently generated from the standard normal distribution. The sample size is fixed at $T = 100$, with $m = 3, 6$ and $k = 3, 5$, leading to four combinations of $m$ and $k$. $u_{1t}$ is generated as $\mathrm{IID}\; N(0, 1)$, while $u_{it} = c\, u_{1t} + v_{it}$ for $i = 2, \dots, m$, where $v_{it} \sim \mathrm{IID}\; N(0, 1)$ and $c = 0.5$. We consider two DGPs for generating the $\beta_i$’s. The first one is under complete heterogeneity in coefficients, where we assume that
$$\text{DGP 1}: \quad \beta_i = \bar{\beta} + (i \times \delta) / m, \qquad i = 1, 2, \dots, m,$$
with $\bar{\beta} = (1, 1, \dots, 1)'$, and the second DGP is under weak heterogeneity, where we assume that
$$\text{DGP 2}: \quad \beta_{i1}, \beta_{i2} = \begin{cases} 1 + (i \times \delta) / m, & \text{if } i = 1, \dots, [m/2], \\ 1.2, & \text{if } i = [m/2] + 1, \dots, m, \end{cases} \qquad \beta_{il} = 2, \quad l \in \{3, \dots, k\},$$
where $[m/2]$ denotes the largest integer value that is smaller than $m/2$, and $\delta$ takes values on a 10-point grid on $[0, 1]$.
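The data-generating step for DGP 1 can be sketched as follows. The function name `simulate_dgp1` and its signature are assumptions, and the term $(i \times \delta)/m$ is read here as a scalar added to every element of $\bar{\beta}$, which is one interpretation of the formula above.

```python
import numpy as np

def simulate_dgp1(T, m, k, delta, c=0.5, rng=None):
    """Simulate one SUR sample under DGP 1.

    Returns (y_list, X_list, beta), where beta stacks the m coefficient vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    u1 = rng.standard_normal(T)                  # disturbance of the first equation
    y_list, X_list, betas = [], [], []
    for i in range(1, m + 1):
        # Intercept plus k - 1 standard normal regressors.
        Xi = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
        # u_it = c * u_1t + v_it for i >= 2 induces cross-equation correlation.
        ui = u1 if i == 1 else c * u1 + rng.standard_normal(T)
        beta_i = np.ones(k) + i * delta / m      # heterogeneity grows with delta
        y_list.append(Xi @ beta_i + ui)
        X_list.append(Xi)
        betas.append(beta_i)
    return y_list, X_list, np.concatenate(betas)

y_list, X_list, beta = simulate_dgp1(T=100, m=3, k=3, delta=1.0,
                                     rng=np.random.default_rng(0))
print(beta.reshape(3, 3))
```

Setting `delta=0.0` yields identical coefficient vectors across equations, the case in which the homogeneity restriction holds exactly.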
The results of 1000 Monte Carlo simulations are given in Figure 1, Figure 2, Figure 3 and Figure 4, where the vertical axes measure the relative mean squared error (RMSE) of the unrestricted GLS estimator, the restricted GLS estimator and the average estimator, each relative to the unrestricted GLS estimator. Hence, the RMSE of the unrestricted GLS estimator is equal to one. The horizontal axes measure the degree of parameter heterogeneity, $\delta$, which is set between zero and one with a grid step of 0.1.
The Monte Carlo results support our theoretical findings of the previous section. The figures show that the RMSE of the average estimator is below that of the unrestricted estimator over the whole range of parameter heterogeneity. This shows the superiority of the average estimator relative to the unrestricted GLS estimator.
In DGP1, the completely heterogeneous SUR, the RMSE of the average estimator is smaller than that of the restricted GLS estimator except for very small values of parameter heterogeneity ($\delta$). This is expected because, as $\delta$ takes higher values, the bias of the restricted GLS estimator increases, which results in a higher MSE. In DGP2, where the SUR is characterized by some degree of homogeneity, the RMSE of the restricted GLS estimator remains smaller than that of the unrestricted GLS estimator for larger values of $\delta$ relative to DGP1. In this case, the unrestricted GLS estimator can be inferior to the restricted GLS estimator even in the presence of weak heterogeneity. This is because, although the unrestricted GLS estimator is unbiased, it is inefficient, especially with small sample sizes and a large number of regressors. In contrast, the restricted GLS estimator makes proper use of cross-equation variation and thus provides more accurate results.
In general, we find that the average estimator performs robustly well in SUR with various degrees of heterogeneity. When there is a strong heterogeneity, the average estimator prevails. When there is a relatively weak heterogeneity, the average estimator tends to gain more from the efficiency of the restricted GLS estimator by assigning a high weight to this estimator and thus still remains one of the best choices.

6. Application: Returns to Scale in US Banking Industry

In this section, we apply the average estimator studied in the previous sections to regressions of the cost system for U.S. commercial banks. We are interested in estimating the returns to scale (RTS) for these banks over recent years.
Over the past few decades, the number of U.S. commercial banks fell by almost 70%, from 14,391 in 1984 to 4773 in 2018. Over the same period, the average asset value of U.S. banks (adjusted for inflation), which is also a measure of bank size, increased about tenfold, from 140 million dollars in 1984 to 1400 million dollars in 2018 (see Figure 5). In support of this expansion in bank size, bank executives and analysts claim that changes in regulation (such as the permission of interstate branching and the combination of banks) and technological and financial innovations (such as communication technologies and the securitization and sale of bank loans) over this period have reduced the cost of production for larger banks and encouraged banks to grow larger and/or merge.
On the other hand, critics contend that the decrease in the number of operating banks and the presence of banks with large assets not only affect market competition but also result in agency problems and in government policies that disproportionately benefit large banks. In particular, the financial crisis of 2007 focused attention on large financial institutions considered “too-big-to-fail”. Together, these concerns have drawn the attention of policy makers to regulatory limits on bank size. However, any policy intervention needs to consider the potential efficiency benefits of operating at a large scale. Therefore, estimation of scale economies and RTS is essential for analyzing the costs and benefits of any policy intervention to control the size of banks.
The estimation of scale economies and RTS for the U.S. banking industry has stimulated a substantial body of studies. Older empirical studies that used data from the 1980s and 1990s did not find scale economies in the banking industry except for very small banks. But recent research that used data from the 2000s, and more modern methods for estimating banking models, has found considerably more evidence of scale economies in banking. These studies include (Hughes et al. 1996, 2000, 2001; Berger and Mester 1997; Hughes and Mester 1998, 2013; Wheelock and Wilson 2001; Feng and Serletis 2009). Most of the studies in the literature partition banks into groups based on their asset size and estimate each group independently of the other groups. However, there is no reason to believe that these categories reflect the banks’ underlying technology, and it is hard to believe that the groups are not affected by some unknown factors that could induce correlations between groups. Therefore, there are two issues that researchers need to consider carefully. First, estimating cost efficiency using all observations as one group has the advantage of a smaller variance, but it ignores the potential heterogeneity bias due to differences in technology. Second, partitioning banks into different groups and using single-equation estimators is inefficient compared to pooled estimators, which ignore heterogeneity. Hence, there is a trade-off between bias and variance efficiency between these two estimators. As the average estimator introduced in the previous sections optimally balances bias and variance efficiency, we recommend using this estimator in the estimation of the returns to scale for the banking industry to obtain robust and efficient estimates.

6.1. The Model

We follow the so-called “intermediation approach” framework of Sealey and Lindley (1977), which is broadly employed in the literature. According to this approach, a bank’s balance sheet is assumed to capture the essential structure of a bank’s core business. Inputs are considered to be liabilities (core deposits and purchased funds), physical capital and labor. The inputs are used to produce the bank’s outputs, which are its assets other than physical capital (including loans and trading securities).
With regard to variable specification, we define five inputs and five outputs, which are the ones used in the literature. We define the following output quantities: consumer loans ($y_1$), real estate loans ($y_2$), loans to business and other institutions ($y_3$), federal funds sold and securities purchased under agreements to resell ($y_4$), and other assets ($y_5$). The input variables are: labor quantities ($x_1$), premises and fixed assets ($x_2$), purchased funds ($x_3$), interest-bearing transaction accounts ($x_4$), and non-transaction accounts ($x_5$). For each input $x_j$, its price $w_j$ is obtained by dividing its total expenses by the corresponding input quantity.
For modeling the cost of the banking industry, we consider a translog cost function and normalize it, so that the homogeneity (in input prices) property is automatically satisfied. We allow for individual (fixed) effects by adding intercepts in each regression, to control for specific group characteristics, heterogeneity in skills and so on. Hence, the cost equation for each group $i = 1, 2, \dots, m$, is considered as
$$\begin{aligned} \ln(C_{it} / w_{5,it}) = {} & \beta_{0,i} + \sum_{n=1}^{4} \beta_{n,i} \ln(w_{n,it} / w_{5,it}) + \sum_{l=1}^{5} \gamma_{l,i} \ln(y_{l,it}) + \frac{1}{2} \sum_{n=1}^{5} \sum_{l=1}^{5} \gamma_{nl,i} \ln(y_{n,it}) \ln(y_{l,it}) \\ & + \frac{1}{2} \sum_{n=1}^{4} \sum_{l=1}^{4} \eta_{nl,i} \ln(w_{n,it} / w_{5,it}) \ln(w_{l,it} / w_{5,it}) + \sum_{l=1}^{4} \sum_{n=1}^{5} \delta_{ln} \ln(w_{l,it} / w_{5,it}) \ln(y_{n,it}) + u_{it}, \qquad t = 1, 2, \dots, T_i, \end{aligned} \tag{30}$$
where $T_i$ is the number of observations in group $i$ (the number of banks operating within group $i$), and $C_{it}$ is the total cost of bank $t$ in group $i$, defined as
$$C_{it} = w_{1,it} x_{1,it} + w_{2,it} x_{2,it} + w_{3,it} x_{3,it} + w_{4,it} x_{4,it} + w_{5,it} x_{5,it}, \qquad t = 1, 2, \dots, T_i, \text{ and } i = 1, \dots, m.$$
The cost function is symmetric, which requires imposing the following restrictions on the parameters:
$$\eta_{jq,i} = \eta_{qj,i}, \qquad \gamma_{nl,i} = \gamma_{ln,i}.$$
RTS is defined as the inverse of the sum of cost elasticities. If we define the cost elasticity with respect to output $j$ of bank $t$ in group $i$ as $E_{cy_j,it} = \partial \ln(C_{it}) / \partial \ln(y_{j,it})$ and the sum of cost elasticities as $E_{cy,it} = \sum_{j=1}^{5} \partial \ln(C_{it}) / \partial \ln(y_{j,it})$, then the RTS of bank $t$ in group $i$ is defined as $RTS_{it} = 1 / E_{cy,it}$. Also, the RTS for group $i$ is a $T_i \times 1$ vector defined as
$$RTS_i = \begin{pmatrix} RTS_{i1} \\ RTS_{i2} \\ \vdots \\ RTS_{iT_i} \end{pmatrix}, \tag{33}$$
which is used for calculating the mean, quartiles, and deciles of RTS for group $i$ with $T_i$ banks, see Section 6.3.
A bank with $RTS > 1$ has increasing returns to scale: for a one percent increase in all outputs, cost increases by less than one percent, so the bank is operating below its efficient scale size ($RTS = 1$). Conversely, a bank with $RTS < 1$ has decreasing returns to scale and is operating above its efficient scale size.

6.2. The Data

The data we use are obtained from the Reports of Income and Condition (Call Reports)$^{5}$ over the period from 2000 to 2018. We omit observations where negative values for assets, equity, outputs, and prices are reported. A summary of the data for years 2000 and 2018 is given in Table 1.$^{6}$
Following Feng and Serletis (2009) and others in the literature, we classify the banks into three groups, mainly based on the standard asset size categories used by the Federal Financial Institutions Examination Council (FFIEC). Banks with over $500 million in total assets are classified as Large banks, banks with assets between $100 million and $500 million are classified as Medium banks, and banks with under $100 million in assets are classified as Small banks. In order to have consistent partitions over time, the asset size caps in each year are adjusted upward by the growth in the CPI. Table 2 presents the number and share of banks in each group with their corresponding asset ranges for years 2000 and 2018.
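The grouping rule can be written as a small function. This is only a sketch: the function name `classify_bank`, the `cpi_ratio` argument (price level in the current year relative to the base year), and the treatment of banks exactly on a threshold are assumptions, since the text does not spell out those details.

```python
def classify_bank(total_assets_musd, cpi_ratio=1.0):
    """Assign a bank to a size group from its total assets (in millions of dollars).

    cpi_ratio scales the base-year caps of 100 and 500 million upward with inflation.
    """
    small_cap = 100.0 * cpi_ratio
    large_cap = 500.0 * cpi_ratio
    if total_assets_musd > large_cap:
        return "Large"
    if total_assets_musd >= small_cap:   # boundary handling is an assumption
        return "Medium"
    return "Small"

print(classify_bank(600.0))                  # base-year caps
print(classify_bank(600.0, cpi_ratio=1.5))   # caps scaled to 150 and 750 million
```

With the inflation adjustment, a bank of a given nominal size can move to a lower group in later years, which is what keeps the partitions comparable over time.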

6.3. Estimation

We estimate the model in Equation (30) using the average estimation method developed in the previous sections, for each year separately. Our SUR in each year consists of three cost equations representing the Large, Medium, and Small bank groups, and the observations for each regression are the data on the banks operating in each group. Since the sample size differs across groups, we face a SUR with an unequal number of observations, and to estimate the variance-covariance matrix ($\Omega$), we consider the following procedures recommended in the literature (see Schmidt 1977; Baltagi et al. 1989):
  • Ignore the extra observations in estimating $\Omega$;
  • Use the extra observations to estimate variances. This procedure has the disadvantage that it may produce estimates of $\Omega$ that are not positive definite;
  • Use the extra observations to estimate variances, and modify the estimates of covariances using the method of Srivastava and Zaatar (1973);
  • Use all observations in estimation, following the method of Hocking and Smith (1968).
It is known in the literature that the results of the above procedures are much the same. Likewise, we find that the procedures above generate similar results, so we only report the results of method 3.
After estimating Equation (30), we obtain the sum of cost elasticities for each bank by
$$E_{cy,it} = \sum_{j=1}^{5} \frac{\partial \ln(C_{it})}{\partial \ln(y_{j,it})} = \sum_{j=1}^{5} \gamma_{j,i} + \sum_{j=1}^{5} \sum_{n=1}^{5} \gamma_{nj,i} \ln(y_{n,it}) + \sum_{j=1}^{5} \sum_{n=1}^{4} \delta_{nj} \ln(w_{n,it} / w_{5,it}),$$
where $E_{cy,it}$ is the sum of cost elasticities of bank $t$ in group $i$, and the parameters are replaced with their estimated values. Then, the RTS is calculated following Equation (33). We also obtain the RTS using the unrestricted GLS estimator and the restricted GLS estimator.
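The elasticity sum and the implied RTS can be computed for all banks in a group with a few matrix products, as in the sketch below. The function name `returns_to_scale`, the array layout, and every coefficient value are hypothetical and chosen only to make the arithmetic transparent; they are not estimates from the paper.

```python
import numpy as np

def returns_to_scale(gamma, Gamma, Delta, ln_y, ln_w_rel):
    """RTS for each bank in one group from translog coefficients.

    gamma    : (5,)   first-order output coefficients gamma_j
    Gamma    : (5, 5) symmetric second-order output coefficients gamma_nj
    Delta    : (4, 5) price-output interaction coefficients delta_nj
    ln_y     : (n_banks, 5) log output quantities
    ln_w_rel : (n_banks, 4) log input prices relative to w_5
    """
    # Column j holds the cost elasticity with respect to output j for each bank.
    elasticities = gamma + ln_y @ Gamma + ln_w_rel @ Delta
    E_cy = elasticities.sum(axis=1)      # sum of cost elasticities per bank
    return 1.0 / E_cy

# Hypothetical coefficients and data for two banks.
gamma = np.full(5, 0.18)
Gamma = np.full((5, 5), 0.01)
Delta = np.zeros((4, 5))
ln_y = np.array([[0.0] * 5,
                 [1.0] * 5])
ln_w_rel = np.zeros((2, 4))

rts = returns_to_scale(gamma, Gamma, Delta, ln_y, ln_w_rel)
print(np.round(rts, 3))
```

In this toy example the first bank has an elasticity sum of 0.9 and the second of 1.15, so the first exhibits increasing and the second decreasing returns to scale. The mean, quartiles and deciles of the resulting vector can then be obtained with `np.mean` and `np.quantile`.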
The results for years 2000, 2009 and 2018$^{7}$ are reported in Table 3, which presents the extreme deciles, quartiles and means of our estimated RTS using the three estimation methods. The patterns of the mean, extreme deciles and quartiles for all years are plotted in Figure 6. The results based on the restricted estimator over the most recent years suggest increasing RTS at each decile. However, these results are not economically reasonable. On the other hand, using the unrestricted estimator, we find evidence of decreasing RTS for almost 50% of Small and Medium banks over the most recent years. These contradictory results show the importance of the average estimator, which is designed to address this model uncertainty.
Comparing the RTS of banks using the average estimator over the sample period shows that, on average, the majority of banks have increasing returns to scale. In the most recent years, the results exhibit more signs of cost efficiency for Large and Small banks, such that all Large banks have increasing RTS and fewer than 25% of Small banks have exhausted their cost efficiency. However, we find that more than 50% of Medium banks are operating under decreasing returns to scale near the end of the sample. The results are consistent with some recent studies (e.g., Feng and Serletis 2009; Hughes and Mester 2013; Wheelock and Wilson 2001; Henderson et al. 2015; Malikov et al. 2015), although we are not aware of any study covering 2011 to 2018.

7. Conclusions

In this paper, we introduce an averaging estimator for a SUR model. The estimator is a weighted average of an unrestricted GLS estimator, which is the Zellner (1962) estimator, and a restricted GLS estimator. The weight is inversely related to a quadratic loss function that measures the weighted distance between the unrestricted and the restricted GLS estimators. We derive the bias, MSE matrix, and risk of the average estimator using the large-sample approximations of Nagar (1959). The conditions under which the average estimator is superior in terms of weighted mean squared error are given for any user-specified symmetric positive definite weight matrix, and are not limited to the case where the weight is the inverse of the variance-covariance matrix of the unrestricted GLS estimator.
We also provide Monte Carlo results that support our theoretical claims. Finally, as our estimator is motivated by economic theory, we use U.S. commercial banking data and estimate a cost system for the banking industry to show how our estimator can be used in applied work. We also estimate the cost system using single-equation least squares and a pooled estimator, and compare them with our proposed average estimator. We find that the cost system yields more reliable estimation results with our average estimator than with the other estimators. We also find that, on average, the majority of banks operated under increasing returns to scale over the sample period from 2000 to 2018.
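The averaging rule summarized in this section can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the function name and inputs are hypothetical, `tau` is taken as a user-supplied constant (the paper's optimal choice is not reproduced here), and the positive-part rule follows the description in footnote 3.

```python
import numpy as np

def average_estimator(beta_u, beta_r, W, tau, positive_part=True):
    """Shrink the unrestricted estimate beta_u towards the restricted beta_r."""
    d = beta_u - beta_r
    D = float(d @ W @ d)           # weighted quadratic distance between the two
    if D == 0.0:                   # estimates coincide, nothing to average
        return beta_u.copy()
    weight = tau / D               # weight placed on the restricted estimator
    if positive_part:
        weight = max(weight, 0.0)  # footnote 3: zero weight when tau / D < 0
    return beta_u - weight * d

# Hypothetical two-parameter example with W equal to the identity matrix.
beta_u = np.array([1.0, 2.0])
beta_r = np.array([1.5, 1.5])
beta_a = average_estimator(beta_u, beta_r, np.eye(2), tau=0.25)
```

Because the weight is `tau / D`, the estimator moves further towards the restricted estimate when the two estimates are close in the `W` metric, and stays near the unrestricted estimate when they are far apart.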

Author Contributions

A.M., A.U. contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors are thankful to the guest-editors and two referees for helpful and constructive comments on the subject matter of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Lemma A1.
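Under Assumptions 1–3, we have the following:
$$\hat{\Omega}^{-1} = \underbrace{\Omega^{-1}}_{O_p(1)} - \underbrace{\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-1/2})} + \underbrace{\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-1})} - \underbrace{\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-3/2})} + O_p(T^{-2}), \tag{A.1}$$
$$(X'\hat{\Omega}^{-1}X)^{-1} = \underbrace{(X'\Omega^{-1}X)^{-1}}_{O_p(T^{-1})} + \underbrace{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Delta\Omega^{-1}X(X'\Omega^{-1}X)^{-1}}_{O_p(T^{-3/2})} + O_p(T^{-2}), \tag{A.2}$$
$$X'\hat{\Omega}^{-1}u = \underbrace{X'\Omega^{-1}u}_{O_p(T^{1/2})} - \underbrace{X'\Omega^{-1}\Delta\Omega^{-1}u}_{O_p(1)} + \underbrace{X'\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}u}_{O_p(T^{-1/2})} + O_p(T^{-1}), \tag{A.3}$$
$$\hat{R} = R + R_{-1/2} + O_p(T^{-1}), \tag{A.4}$$
where $\Delta = \hat{\Omega} - \Omega$, and $R_{-1/2} = J(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}\Delta\Theta X = O_p(T^{-1/2})$, with the suffix showing the order of magnitude in probability, and $\Theta = \Omega^{-1} - \Omega^{-1}XJ(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}$.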
Proof. 
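It can be easily verified that, under Assumptions 1–3, $\Delta = O_p(T^{-1/2})$. Employing this condition, and using the standard geometric expansion for the inverse of a matrix (see footnote 8), for large $T$ we have
$$\begin{aligned}
\hat{\Omega}^{-1} &= (\Omega + \Delta)^{-1} = \Omega^{-1}\left[I_{mT} + \Delta\Omega^{-1}\right]^{-1} \\
&= \Omega^{-1}\left[I_{mT} - \Delta\Omega^{-1} + \Delta\Omega^{-1}\Delta\Omega^{-1} - \Delta\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1} + \cdots\right] \\
&= \underbrace{\Omega^{-1}}_{O_p(1)} - \underbrace{\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-1/2})} + \underbrace{\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-1})} - \underbrace{\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}}_{O_p(T^{-3/2})} + O_p(T^{-2}),
\end{aligned}$$
which gives the result in Equation (A.1). Now, by using Equation (A.1), we have
$$\begin{aligned}
(X'\hat{\Omega}^{-1}X)^{-1} &= \left[X'\Omega^{-1}X - X'\Omega^{-1}\Delta\Omega^{-1}X + \cdots\right]^{-1} \\
&= (X'\Omega^{-1}X)^{-1}\left[I_{mk} - X'\Omega^{-1}\Delta\Omega^{-1}X(X'\Omega^{-1}X)^{-1} + \cdots\right]^{-1} \\
&= \underbrace{(X'\Omega^{-1}X)^{-1}}_{O_p(T^{-1})} + \underbrace{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Delta\Omega^{-1}X(X'\Omega^{-1}X)^{-1}}_{O_p(T^{-3/2})} + O_p(T^{-2}),
\end{aligned}$$
and also
$$X'\hat{\Omega}^{-1}u = \underbrace{X'\Omega^{-1}u}_{O_p(T^{1/2})} - \underbrace{X'\Omega^{-1}\Delta\Omega^{-1}u}_{O_p(1)} + \underbrace{X'\Omega^{-1}\Delta\Omega^{-1}\Delta\Omega^{-1}u}_{O_p(T^{-1/2})} + O_p(T^{-1}).$$
By using the above results, we have
$$\begin{aligned}
\hat{R} &= I_{mk} - J(J'X'\hat{\Omega}^{-1}XJ)^{-1}J'X'\hat{\Omega}^{-1}X \\
&= I_{mk} - J\Big[\underbrace{(J'X'\Omega^{-1}XJ)^{-1}}_{O_p(T^{-1})} + \underbrace{(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}\Delta\Omega^{-1}XJ(J'X'\Omega^{-1}XJ)^{-1}}_{O_p(T^{-3/2})} + O_p(T^{-2})\Big] \\
&\quad \times J'\Big[\underbrace{X'\Omega^{-1}X}_{O_p(T)} - \underbrace{X'\Omega^{-1}\Delta\Omega^{-1}X}_{O_p(T^{1/2})} + O_p(1)\Big] \\
&= R + J(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}\Delta\left[\Omega^{-1} - \Omega^{-1}XJ(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}\right]X + O_p(T^{-1}) \\
&= R + R_{-1/2} + O_p(T^{-1}).
\end{aligned}$$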
 ☐
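As an informal numerical check of the geometric expansion used in this proof, the sketch below compares truncated expansions of $(\Omega + \Delta)^{-1}$ with the exact inverse. The matrices are simulated and purely hypothetical; they are not the paper's $\Omega$ or $\Delta$, and the perturbation is simply chosen small enough for the series to converge.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.normal(size=(n, n))
Omega = B @ B.T + n * np.eye(n)   # symmetric positive definite stand-in for Omega
S = rng.normal(size=(n, n))
Delta = 0.05 * (S + S.T)          # small symmetric perturbation

Omega_inv = np.linalg.inv(Omega)
exact = np.linalg.inv(Omega + Delta)

def truncated_inverse(order):
    """Omega^{-1} times the sum of (-Delta Omega^{-1})^j for j = 0..order."""
    term = np.eye(n)
    total = np.eye(n)
    for _ in range(order):
        term = term @ (-Delta @ Omega_inv)
        total = total + term
    return Omega_inv @ total

# Frobenius-norm error of the expansion truncated after 0, 1, 2 and 3 terms in Delta.
errors = [np.linalg.norm(truncated_inverse(k) - exact) for k in range(4)]
```

Each additional term in $\Delta$ shrinks the approximation error, which mirrors the successive orders $O_p(T^{-1/2})$, $O_p(T^{-1})$, $O_p(T^{-3/2})$ in Equation (A.1).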
Proof of Theorem 1.
Using the results of Lemma A1 in Equation (10), we have
$$\hat{\beta} - \beta = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}u = A_{-1/2} + A_{-1} + O_p(T^{-3/2}), \tag{A.5}$$
where $A_{-1/2}$ and $A_{-1}$ are defined below, and the suffixes show the order of magnitude in probability,
$$A_{-1/2} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u = O_p(T^{-1/2}), \qquad A_{-1} = -(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Delta Q u = O_p(T^{-1}),$$
and $Q = \Omega^{-1} - \Omega^{-1}X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$.
Using Equation (A.5) in Equation (17), we have
$$\begin{aligned}
\frac{1}{D} &= \left[(\hat{\beta} - \tilde{\beta})'W(\hat{\beta} - \tilde{\beta})\right]^{-1} = \left[\hat{\beta}'\hat{R}'W\hat{R}\hat{\beta}\right]^{-1} \\
&= \left[(\beta + A_{-1/2} + O_p(T^{-1}))'[R + R_{-1/2} + O_p(T^{-1})]'W[R + R_{-1/2} + O_p(T^{-1})](\beta + A_{-1/2} + O_p(T^{-1}))\right]^{-1} \\
&= \left[\phi + 2\beta'R'WRA_{-1/2} + 2\beta'R'WR_{-1/2}\beta + O_p(1)\right]^{-1} \\
&= \frac{1}{\phi}\left[1 + \frac{2}{\phi}\beta'R'WRA_{-1/2} + \frac{2}{\phi}\beta'R'WR_{-1/2}\beta + O_p(T^{-1})\right]^{-1} \\
&= \frac{1}{\phi}\left[1 - \frac{2}{\phi}\beta'R'WRA_{-1/2} - \frac{2}{\phi}\beta'R'WR_{-1/2}\beta\right] + O_p(T^{-2}),
\end{aligned} \tag{A.6}$$
where $\phi = \beta'R'WR\beta = O(T)$, and the last equality holds by the standard geometric expansion. Use has also been made of Equations (A.1)–(A.4). The terms of order $O_p(T^{-2})$ and smaller are dropped, because they do not enter the calculation of the bias and MSE of the average estimator up to the orders of interest.
Employing Equations (A.4) and (A.6) in Equation (16), we obtain
$$\hat{\beta}_A - \beta = (\hat{\beta} - \beta) - \frac{\tau}{\phi}\left[1 - \frac{2}{\phi}\beta'R'WRA_{-1/2} - \frac{2}{\phi}\beta'R'WR_{-1/2}\beta\right]\left[R\beta + RA_{-1/2} + R_{-1/2}\beta\right] + O_p(T^{-2}). \tag{A.7}$$
The bias of the average estimator, using the above equation up to order $O(T^{-1})$, is
$$E(\hat{\beta}_A - \beta) = E(\hat{\beta} - \beta) - \frac{\tau}{\phi}R\beta + o_p(T^{-1}) = -\frac{\tau}{\phi}R\beta + o_p(T^{-1}), \tag{A.8}$$
where use has been made of
$$E(\hat{\beta} - \beta) = E(A_{-1/2}) + E(A_{-1}) + o_p(T^{-1}) = 0 + o_p(T^{-1}), \tag{A.9}$$
because both $A_{-1/2}$ and $A_{-1}$ are odd functions of the error term, which has a normal distribution.
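The MSE matrix up to order $O(T^{-2})$ is
$$E\left[(\hat{\beta}_A - \beta)(\hat{\beta}_A - \beta)'\right] = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' + \Xi_1 - \Xi_2 - \Xi_2' + \Xi_3 + \Xi_3' - \Xi_4 - \Xi_4'\right] + o_p(T^{-2}), \tag{A.10}$$
where $\Xi_1$–$\Xi_4$ are defined as
$$\begin{aligned}
\Xi_1 &= \frac{\tau^2}{\phi^2}R\beta\beta'R', \\
\Xi_2 &= \frac{\tau}{\phi}R\beta(\hat{\beta} - \beta)', \\
\Xi_3 &= \frac{2\tau}{\phi^2}\left(\beta'R'WRA_{-1/2} + \beta'R'WR_{-1/2}\beta\right)R\beta(\hat{\beta} - \beta)', \\
\Xi_4 &= \frac{\tau}{\phi}\left(RA_{-1/2} + R_{-1/2}\beta\right)(\hat{\beta} - \beta)'.
\end{aligned}$$
Now, we obtain the expectations of $\Xi_1$–$\Xi_4$: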
$$E(\Xi_1) = \frac{\tau^2}{\phi^2}R\beta\beta'R', \tag{A.11}$$
$$E(\Xi_2) = \frac{\tau}{\phi}R\beta E(\hat{\beta} - \beta)' = 0 + o_p(T^{-2}), \tag{A.12}$$
$$\begin{aligned}
E(\Xi_3) &= \frac{2\tau}{\phi^2}R\beta\left[\beta'R'WR\,E(A_{-1/2}A_{-1/2}') + E(\beta'R'WR_{-1/2}\beta A_{-1/2}')\right] + o_p(T^{-2}) \\
&= \frac{2\tau}{\phi^2}R\beta\beta'R'WR(X'\Omega^{-1}X)^{-1}R' + o_p(T^{-2}),
\end{aligned} \tag{A.13}$$
where the last equality holds by noting that the second term on the right-hand side of the first equality is an odd function of normally distributed errors, and by using the following two equations:
$$E(A_{-1/2}A_{-1/2}') = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(uu')\Omega^{-1}X(X'\Omega^{-1}X)^{-1} = (X'\Omega^{-1}X)^{-1}, \tag{A.14}$$
$$\begin{aligned}
R(X'\Omega^{-1}X)^{-1}R' &= \left[I_{mk} - J(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}X\right](X'\Omega^{-1}X)^{-1}\left[I_{mk} - J(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}X\right]' \\
&= (X'\Omega^{-1}X)^{-1} - J(J'X'\Omega^{-1}XJ)^{-1}J' \\
&= (X'\Omega^{-1}X)^{-1}R' = R(X'\Omega^{-1}X)^{-1}.
\end{aligned} \tag{A.15}$$
Similarly, we have
$$E(\Xi_4) = \frac{\tau}{\phi}\left[R\,E(A_{-1/2}A_{-1/2}') + E(R_{-1/2}\beta A_{-1/2}')\right] + o_p(T^{-2}) = \frac{\tau}{\phi}R(X'\Omega^{-1}X)^{-1}R' + o_p(T^{-2}). \tag{A.16}$$
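Employing the results of Equations (A.11)–(A.16) in Equation (A.10), we obtain the MSE matrix of the average estimator up to order $O(T^{-2})$ as
$$\mathrm{MSEM}(\hat{\beta}_A) = \mathrm{MSEM}(\hat{\beta}) + \frac{\tau^2}{\phi^2}R\beta\beta'R' - \frac{2\tau}{\phi}R(X'\Omega^{-1}X)^{-1}R' + \frac{2\tau}{\phi^2}\left[R\beta\beta'R'WR(X'\Omega^{-1}X)^{-1}R' + R(X'\Omega^{-1}X)^{-1}R'WR\beta\beta'R'\right], \tag{A.17}$$
hence the risk of the average estimator up to order $O(T^{-1})$ can be written as
$$\begin{aligned}
\mathrm{Risk}(\hat{\beta}_A) &= E\left[(\hat{\beta}_A - \beta)'W(\hat{\beta}_A - \beta)\right] = \mathrm{tr}\left[W\,E\left[(\hat{\beta}_A - \beta)(\hat{\beta}_A - \beta)'\right]\right] = \mathrm{tr}\left[W\,\mathrm{MSEM}(\hat{\beta}_A)\right] \\
&= \mathrm{Risk}(\hat{\beta}) + \frac{\tau^2}{\phi} - \frac{2\tau}{\phi}\mathrm{tr}\left[WR(X'\Omega^{-1}X)^{-1}R'\right] + \frac{4\tau}{\phi^2}\beta'R'WR(X'\Omega^{-1}X)^{-1}R'WR\beta \\
&\leq \mathrm{Risk}(\hat{\beta}) + \frac{\tau^2}{\phi} - \frac{2\tau}{\phi}\mathrm{tr}\left[WR(X'\Omega^{-1}X)^{-1}R'\right] + \frac{4\tau}{\phi}\lambda_{max}\left[W^{1/2}R(X'\Omega^{-1}X)^{-1}R'W^{1/2}\right],
\end{aligned} \tag{A.18}$$
where the last inequality holds because $W^{1/2}R(X'\Omega^{-1}X)^{-1}R'W^{1/2}$ is symmetric, and therefore
$$\beta'R'WR(X'\Omega^{-1}X)^{-1}R'WR\beta \leq \beta'R'WR\beta\,\lambda_{max}\left(W^{1/2}R(X'\Omega^{-1}X)^{-1}R'W^{1/2}\right), \tag{A.19}$$
see Abadir and Magnus (2005, pp. 181–82). ☐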
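The identity in Equation (A.15) and the eigenvalue bound in Equation (A.19) can be checked numerically. The sketch below is only an illustration under assumptions: a random positive definite matrix `A` stands in for $X'\Omega^{-1}X$, the restriction matrix `J` is a hypothetical homogeneity restriction of the form $1_m \otimes I_k$, and `W` and `beta` are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 3, 2
mk = m * k

C = rng.normal(size=(mk, mk))
A = C @ C.T + mk * np.eye(mk)               # stand-in for X' Omega^{-1} X
J = np.kron(np.ones((m, 1)), np.eye(k))     # hypothetical restriction beta = J gamma
G = np.linalg.inv(J.T @ A @ J)
R = np.eye(mk) - J @ G @ J.T @ A
A_inv = np.linalg.inv(A)

# Equation (A.15): three equivalent forms of R A^{-1} R'.
M = R @ A_inv @ R.T
identity_holds = (
    np.allclose(M, A_inv - J @ G @ J.T)
    and np.allclose(M, A_inv @ R.T)
    and np.allclose(M, R @ A_inv)
)

# Equation (A.19): quadratic form bounded by the largest eigenvalue.
Dm = rng.normal(size=(mk, mk))
W = Dm @ Dm.T + np.eye(mk)                  # symmetric positive definite weight
vals, vecs = np.linalg.eigh(W)
W_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
P = W_half @ M @ W_half
lam_max = np.linalg.eigvalsh((P + P.T) / 2)[-1]   # eigvalsh returns ascending order

beta = rng.normal(size=mk)
v = R @ beta
lhs = v @ W @ M @ W @ v
rhs = (v @ W @ v) * lam_max
```

With this choice of `J`, `np.trace(R)` equals `(m - 1) * k`, which is the trace used in the proof of Corollary 2.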
Proof of Corollary 1.
Note that, using Lemma A1, we have
$$\hat{\Omega} - \Omega = O_p(T^{-1/2}),$$
$$\hat{R} - R = O_p(T^{-1/2}),$$
and
$$\hat{P} = P + P_{-1/2} + O_p(T^{-1}),$$
where
$$P_{-1/2} = W^{1/2}R_{-1/2}(X'\Omega^{-1}X)^{-1}R'W^{1/2} + W^{1/2}R(X'\Omega^{-1}X)^{-1}R_{-1/2}'W^{1/2} + W^{1/2}R(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Delta\Omega^{-1}X(X'\Omega^{-1}X)^{-1}R'W^{1/2}.$$
Therefore, it is easy to see that $\hat{\tau}_{opt} - \tau = O_p(T^{-1/2})$, hence the consistency of $\hat{\tau}$ follows. Further, when $\tau$ is replaced by $\hat{\tau}_{opt}$ in Equation (A.10), the results in Equations (A.11), (A.13) and (A.16) remain unchanged, and for Equation (A.12) we have
$$E(\hat{\Xi}_2) = E\left[\frac{\hat{\tau}}{\phi}R\beta(\hat{\beta} - \beta)'\right] \geq E\left[\frac{\tau_{opt}}{\phi}R\beta(\hat{\beta} - \beta)'\right] + \frac{1}{\phi}E\left[\left[\mathrm{tr}(P_{-1/2}) - 2\lambda_{max}(P_{-1/2})\right]R\beta(\hat{\beta} - \beta)'\right] + o_p(T^{-2}) = 0 + o_p(T^{-2}),$$
where the last equality holds by using Equation (A.9) and noting that the terms on the right-hand side of the inequality are odd functions of the error term, which has a normal distribution. The inequality holds because $P$ and $P_{-1/2}$ are symmetric, so we have
$$\lambda_{max}(\hat{P}) \leq \lambda_{max}(P) + \lambda_{max}(P_{-1/2}) + o_p(T^{-1/2}),$$
see Abadir and Magnus (2005, p. 344).
Therefore, the result in Equation (27) follows. ☐
Proof of Corollary 2.
Using the results in the first equality of Equation (A.18) when $W = X'\Omega^{-1}X$, up to order $O(T^{-1})$, we have
$$\mathrm{MSFE}(\hat{\beta}_A) = \mathrm{MSFE}(\hat{\beta}) + \frac{\tau}{\phi}\left[\tau - 2\left[(m-1)k - 2\right]\right],$$
where use has been made of Equation (A.15), and noting that $\mathrm{tr}(R) = (m-1)k$.  ☐
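The intermediate steps behind this expression can be written out as follows. This is a reader's sketch of the algebra, using only Equations (A.15) and (A.18) together with the fact that $R$ is idempotent (which follows from its definition, since $J(J'X'\Omega^{-1}XJ)^{-1}J'X'\Omega^{-1}X$ is idempotent); the shorthand $A = X'\Omega^{-1}X$ is introduced here for brevity.

```latex
% Shorthand (introduced for this sketch): A = X'\Omega^{-1}X, and W = A.
\begin{aligned}
\mathrm{tr}\left[W R A^{-1} R'\right]
  &= \mathrm{tr}\left[A A^{-1} R'\right] = \mathrm{tr}(R) = (m-1)k,
  && \text{by (A.15): } R A^{-1} R' = A^{-1} R', \\
\beta' R' W R A^{-1} R' W R \beta
  &= \beta' R' A \left(A^{-1} R'\right) A R \beta
   = \beta' R' R' A R \beta
   = \beta' R' A R \beta = \phi,
  && \text{since } R'R' = R', \\
\frac{\tau^2}{\phi} - \frac{2\tau}{\phi}(m-1)k + \frac{4\tau}{\phi^2}\,\phi
  &= \frac{\tau}{\phi}\left[\tau - 2\left[(m-1)k - 2\right]\right].
\end{aligned}
```

Since $\phi > 0$, the correction term is negative whenever $0 < \tau < 2[(m-1)k - 2]$.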

References

  1. Abadir, Karim M., and Jan R. Magnus. 2005. Matrix Algebra. New York: Cambridge University Press.
  2. Baltagi, Badi H., and James M. Griffin. 1984. Short and long run effects in pooled models. International Economic Review 25: 631–45.
  3. Baltagi, Badi H., Susan Garvin, and Stephen Kerman. 1989. Further Monte Carlo Evidence on Seemingly Unrelated Regressions with Unequal Number of Observations. Annales D’Economie et de Statistique 14: 103–15.
  4. Baltagi, Badi H., James M. Griffin, and Weiwen Xiong. 2000. To pool or not to pool: Homogeneous versus heterogeneous estimations applied to cigarette demand. The Review of Economics and Statistics 82: 117–26.
  5. Berger, Allen N., and Loretta J. Mester. 1997. Inside the black box: What explains differences in the efficiencies of financial institutions? Journal of Banking and Finance 21: 895–947.
  6. Browning, Martin, and Jesus Carro. 2007. Heterogeneity and microeconometrics modeling. In Advances in Economics and Econometrics. Edited by Richard Blundell, Whitney Newey and Torsten Persson. Cambridge: Cambridge University Press, vol. 3, pp. 47–74.
  7. Choi, Hak, and Hongyi Li. 2000. Economic development and growth convergence in China. Journal of International Trade and Economic Development 9: 37–54.
  8. Durlauf, Steven N., Andros Kourtellos, and Artur Minkin. 2001. The local Solow growth model. European Economic Review 45: 928–40.
  9. Feng, Guohua, and Apostolos Serletis. 2009. Efficiency and productivity of the US banking industry, 1998–2005: Evidence from the Fourier cost function satisfying global regularity conditions. Journal of Applied Econometrics 24: 105–38.
  10. Fiebig, Denzil G. 2001. Seemingly Unrelated Regression. In A Companion to Theoretical Econometrics. Edited by Badi H. Baltagi. Oxford: Blackwell, chp. 5.
  11. Hansen, Bruce E. 2016. Efficient Shrinkage in Parametric Models. Journal of Econometrics 190: 115–32.
  12. Henderson, Daniel J., Subal C. Kumbhakar, Qi Li, and Christopher F. Parmeter. 2015. Smooth coefficient estimation of a seemingly unrelated regression. Journal of Econometrics 189: 148–62.
  13. Hocking, R. R., and Wm. B. Smith. 1968. Estimation of Parameters in the Multivariate Normal Distribution with Missing Observations. Journal of the American Statistical Association 63: 154–73.
  14. Hoogstrate, André J., Franz C. Palm, and Gerald A. Pfann. 2000. Pooling in dynamic panel-data models: An application to forecasting GDP growth rates. Journal of Business and Economic Statistics 18: 274–83.
  15. Hughes, Joseph P., and Loretta J. Mester. 1998. Bank capitalization and cost: Evidence of scale economies in risk management and signaling. The Review of Economics and Statistics 80: 314–25.
  16. Hughes, Joseph P., and Loretta J. Mester. 2013. Who said large banks don’t experience scale economies? Evidence from a risk-return-driven cost function. Journal of Financial Intermediation 22: 559–85.
  17. Hughes, Joseph P., William Lang, Loretta J. Mester, and Choon-Geol Moon. 1996. Efficient banking under interstate branching. Journal of Money, Credit, and Banking 28: 1045–71.
  18. Hughes, Joseph P., William Lang, Loretta J. Mester, and Choon-Geol Moon. 2000. Recovering risky technologies using the almost ideal demand system: An application to US banking. Journal of Financial Services Research 18: 5–27.
  19. Hughes, Joseph P., Loretta J. Mester, and Choon-Geol Moon. 2001. Are scale economies in banking elusive or illusive?: Evidence obtained by incorporating capital structure and risk-taking into models of bank production. Journal of Banking and Finance 25: 2169–208.
  20. James, W., and Charles Stein. 1961. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1: 361–80.
  21. Leeb, Hannes, and Benedikt M. Pötscher. 2005. Model Selection and Inference: Facts and Fiction. Econometric Theory 21: 21–59.
  22. Maddala, G. S., and Wanhong Hu. 1996. The pooling problem. In The Econometrics of Panel Data. Edited by Laszlo Matyas and Patrick Sevestre. Advanced Studies in Theoretical and Applied Econometrics. Dordrecht: Springer, vol. 33, pp. 307–22.
  23. Maddala, G. S., Robert P. Trost, Hongyi Li, and Frederick Joutz. 1997. Estimation of short run and long run elasticities of energy demand from panel data using shrinkage estimators. Journal of Business and Economic Statistics 15: 90–100.
  24. Maddala, G. S., Hongyi Li, and Virendra K. Srivastava. 2001. A Comparative Study of Different Shrinkage Estimators for Panel Data Models. Annals of Economics and Finance 2: 1–30.
  25. Maddala, G. S. 1991. To pool or not to pool: That is the question. Journal of Quantitative Economics 7: 255–64.
  26. Malikov, Emir, Diego Restrepo-Tobon, and Subal C. Kumbhakar. 2015. Estimation of banking technology under credit uncertainty. Empirical Economics 49: 185–211.
  27. Nagar, Anirudh L. 1959. The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations. Econometrica 27: 575–95.
  28. Pesaran, M. Hashem, and Ron Smith. 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79–113.
  29. Robertson, Donald, and J. Symons. 1992. Some strange properties of panel data estimators. Journal of Applied Econometrics 7: 175–89.
  30. Schmidt, Peter. 1977. Estimation of Seemingly Unrelated Regressions with Unequal Numbers of Observations. Journal of Econometrics 5: 365–77.
  31. Sclove, Stanley L. 1968. Improved estimators for coefficients in linear regression. Journal of the American Statistical Association 63: 596–606.
  32. Sealey, Calvin W., and James T. Lindley. 1977. Inputs, outputs, and a theory of production and cost at depository financial institutions. Journal of Finance 32: 1251–66.
  33. Srivastava, Virendra K., and Tryambkeshwar D. Dwivedi. 1979. Estimation of seemingly unrelated regression equations: A brief survey. Journal of Econometrics 10: 15–32.
  34. Srivastava, Virendra K., and David E. A. Giles. 1987. Seemingly Unrelated Regression Models: Estimation and Inference. New York: Marcel Dekker.
  35. Srivastava, J. N., and M. K. Zaatar. 1973. A Monte Carlo Comparison of Four Estimators of Dispersion Matrix of a Bivariate Normal Population, Using Incomplete Data. Journal of the American Statistical Association 68: 180–83.
  36. Srivastava, Virendra K. 1970. The efficiency of estimating seemingly unrelated regression equations. Annals of the Institute of Statistical Mathematics 22: 483–93.
  37. Srivastava, Virendra K. 1973. The Efficiency of an Improved Method of Estimating Seemingly Unrelated Regression Equations. Journal of Econometrics 1: 341–50.
  38. Su, Liangjun, and Qihui Chen. 2013. Testing homogeneity in panel data models with interactive fixed effects. Econometric Theory 29: 1079–135.
  39. Ullah, Aman, and Huansha Wang. 2013. Parametric and Nonparametric Frequentist Model Selection and Model Averaging. Econometrics 1: 157–79.
  40. Wang, Wendun, Xinyu Zhang, and Richard Paap. 2019. To Pool or Not to Pool: What is a Good Strategy? Journal of Applied Econometrics 34: 724–45.
  41. Wheelock, David C., and Paul W. Wilson. 2001. New evidence on returns to scale and product mix among US commercial banks. Journal of Monetary Economics 47: 653–74.
  42. Zellner, Arnold, and Walter Vandaele. 1975. Bayes-Stein estimators for k-means, regression and simultaneous equation models. In Studies in Bayesian Econometrics and Statistics. Edited by Stephen E. Fienberg and Arnold Zellner. Amsterdam: North-Holland Publishing Company, pp. 627–53.
  43. Zellner, Arnold. 1962. An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias. Journal of the American Statistical Association 57: 348–68.
1. Note that we do not assume that the $X_i$'s are the same, nor do we assume they are different across equations. In other words, our model supports complete heterogeneity, partial heterogeneity, and complete homogeneity of regressors.
2. The second equality holds by using Equation (A.15).
3. The weight can be replaced by a positive part weight, that is, when $\tau / D < 0$, we assign a zero weight to the restricted estimator. It is easy to verify that the MSE of the positive part is smaller. However, it will not change the approximations, so for simplicity we do not impose it at this stage. Nevertheless, the Monte Carlo and the application results are reported using the positive part weights.
4. See Equation (A.19) in Appendix A.
5. The data for 2000–2010 are downloaded from the Federal Reserve Bank of Chicago website, and the data for 2011–2018 are downloaded from the FFIEC Central Data Repository’s Public Data Distribution website.
6. The summary of the data for other years is not reported to save space, but it is available upon request.
7. We only report the results for these three years to save space. However, the results for other years are available upon request.
8. $(I + A)^{-1} = I - A + A^2 - A^3 + \cdots$.
Figure 1. RMSE of Unrestricted, Restricted and Average Estimators, for T = 100, m = 3, k = 3.
Figure 2. RMSE of Unrestricted, Restricted and Average Estimators, for T = 100, m = 3, k = 5.
Figure 3. RMSE of Unrestricted, Restricted and Average Estimators, for T = 100, m = 6, k = 3.
Figure 4. RMSE of Unrestricted, Restricted and Average Estimators, for T = 100, m = 6, k = 5.
Figure 5. United States Commercial Banks, Number and Average Assets (source: Federal Reserve Bank of St. Louis FRED database).
Figure 6. RTS Estimates Using The Restricted and The Average Estimator over period 2000–2018.
Table 1. Summary Statistics.

| Variable | Min (2000) | Min (2018) | Max (2000) | Max (2018) |
|---|---|---|---|---|
| $C$ | 136.49 | 151.78 | 18,369,359.04 | 26,757,643.39 |
| $w_1$ | 7,015.14 | 15,321.43 | 163,358.47 | 261,240.11 |
| $w_2$ | 6.75 | 9.61 | 12,375.00 | 43,469.79 |
| $w_3$ | 3.20 | 0.02 | 206.53 | 48.82 |
| $w_4$ | 0.26 | 0.02 | 265.36 | 153.96 |
| $w_5$ | 0.46 | 0.06 | 49.21 | 16.82 |
| $y_1$ | 1.25 | 1.00 | 42,638,250.00 | 173,922,000.00 |
| $y_2$ | 0.25 | 170.50 | 168,465,250.00 | 686,161,250.00 |
| $y_3$ | 45.75 | 125.50 | 178,056,500.00 | 457,517,750.00 |
| $y_4$ | 89.50 | 0.25 | 144,188,250.00 | 703,099,250.00 |
| $y_5$ | 39.00 | 42.50 | 86,346,000.00 | 704,384,250.00 |

| Variable | Mean (2000) | Mean (2018) | STD (2000) | STD (2018) |
|---|---|---|---|---|
| $C$ | 24,454.49 | 43,803.47 | 331,091.51 | 643,872.50 |
| $w_1$ | 26,175.32 | 49,386.16 | 7,084.42 | 15,332.39 |
| $w_2$ | 230.31 | 265.78 | 351.89 | 920.70 |
| $w_3$ | 33.22 | 3.51 | 6.38 | 2.75 |
| $w_4$ | 14.79 | 2.43 | 9.01 | 3.87 |
| $w_5$ | 22.87 | 3.46 | 4.79 | 1.90 |
| $y_1$ | 57,014.23 | 268,717.11 | 754,295.15 | 4,722,816.78 |
| $y_2$ | 175,843.57 | 881,915.93 | 2,961,928.59 | 15,632,429.24 |
| $y_3$ | 206,836.22 | 929,043.01 | 2,423,963.16 | 10,526,805.47 |
| $y_4$ | 192,820.92 | 862,701.61 | 2,985,314.66 | 16,392,980.08 |
| $y_5$ | 85,019.93 | 842,793.39 | 1,554,035.20 | 16,868,329.90 |

Note: All variables except $w_1$–$w_5$ are measured in thousands of dollars.
Table 2. Banks Asset Size Classes.

| Year | Bank Groups | Asset Size (in Millions of Dollars of the Year) | Number of Banks | Share of Banks (%) |
|---|---|---|---|---|
| 2000 | Large Banks | Assets ≥ 500 | 739 | 8.9 |
| 2000 | Medium Banks | 100 ≤ Assets < 500 | 2946 | 35.5 |
| 2000 | Small Banks | Assets < 100 | 4620 | 55.6 |
| 2018 | Large Banks | Assets ≥ 729 | 988 | 19.8 |
| 2018 | Medium Banks | 146 ≤ Assets < 729 | 2298 | 46.1 |
| 2018 | Small Banks | Assets < 146 | 1704 | 34.1 |
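Table 3. Summary of estimates of returns to scale.

| Year | Bank Size | Estimator | $D_{10}$ | $Q_{25}$ | $Q_{50}$ | $Q_{75}$ | $D_{90}$ | Mean |
|---|---|---|---|---|---|---|---|---|
| 2000 | | Restricted | 0.979 | 0.988 | 0.997 | 1.008 | 1.019 | 0.998 |
| 2000 | Large Banks | Unrestricted | 0.971 | 0.985 | 0.998 | 1.009 | 1.021 | 0.996 |
| 2000 | Large Banks | Average Estimator | 0.980 | 0.989 | 0.998 | 1.007 | 1.018 | 0.998 |
| 2000 | Medium Banks | Unrestricted | 0.969 | 0.992 | 1.017 | 1.043 | 1.071 | 1.019 |
| 2000 | Medium Banks | Average Estimator | 0.995 | 1.002 | 1.011 | 1.020 | 1.028 | 1.011 |
| 2000 | Small Banks | Unrestricted | 0.997 | 1.014 | 1.034 | 1.055 | 1.075 | 1.035 |
| 2000 | Small Banks | Average Estimator | 1.006 | 1.014 | 1.024 | 1.034 | 1.044 | 1.025 |
| 2009 | | Restricted | 1.007 | 1.016 | 1.026 | 1.038 | 1.051 | 1.028 |
| 2009 | Large Banks | Unrestricted | 1.001 | 1.025 | 1.053 | 1.080 | 1.108 | 1.055 |
| 2009 | Large Banks | Average Estimator | 1.004 | 1.026 | 1.051 | 1.075 | 1.101 | 1.052 |
| 2009 | Medium Banks | Unrestricted | 0.956 | 0.990 | 1.032 | 1.079 | 1.125 | 1.038 |
| 2009 | Medium Banks | Average Estimator | 0.961 | 0.993 | 1.031 | 1.076 | 1.118 | 1.037 |
| 2009 | Small Banks | Unrestricted | 0.979 | 1.013 | 1.049 | 1.085 | 1.124 | 1.050 |
| 2009 | Small Banks | Average Estimator | 0.983 | 1.015 | 1.048 | 1.082 | 1.118 | 1.049 |
| 2018 | | Restricted | 1.009 | 1.021 | 1.034 | 1.049 | 1.066 | 1.037 |
| 2018 | Large Banks | Unrestricted | 1.009 | 1.052 | 1.100 | 1.140 | 1.184 | 1.098 |
| 2018 | Large Banks | Average Estimator | 1.010 | 1.052 | 1.094 | 1.132 | 1.172 | 1.093 |
| 2018 | Medium Banks | Unrestricted | 0.915 | 0.950 | 0.988 | 1.030 | 1.070 | 0.989 |
| 2018 | Medium Banks | Average Estimator | 0.919 | 0.953 | 0.989 | 1.029 | 1.068 | 0.990 |
| 2018 | Small Banks | Unrestricted | 0.960 | 0.997 | 1.042 | 1.104 | 1.166 | 1.054 |
| 2018 | Small Banks | Average Estimator | 0.962 | 0.998 | 1.040 | 1.098 | 1.158 | 1.051 |

Note: Decile, quartile and mean estimates of RTS for the restricted, unrestricted, and average estimators. $D_{10}$, $D_{90}$, $Q_{25}$, $Q_{50}$, and $Q_{75}$ are the lower decile, the upper decile, the lower quartile, the median and the upper quartile, respectively.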