1. Introduction
Functional form selection can be difficult in applied work, especially when economic theory offers little guidance. The Fourier flexible form of Gallant (1981, 1982) has been a preferred solution in many cases, as it allows for global approximation of the underlying theoretical counterpart. However, there is high potential for approximation error resulting from boundary and edge effects when approximating non-periodic data, as is often the case in economic applications. The purpose of this article is to develop a new functional form, based on the existing Fourier flexible form, that addresses this issue while maintaining the desirable characteristics of the original.
From the early 1970s, Diewert-flexible forms such as the generalized Leontief, the quadratic (normalized, square-rooted and symmetric) and the translog have dominated applied parametric analysis. These functional forms have many desirable properties, placing no restrictions on derived measures that are functions of their first and second derivatives (Creel 1997). However, they are based on second-order Taylor series approximations, which are local in nature. Unless the true function subject to approximation happens to be in the same family as the approximating function, least squares will not consistently estimate the true value of the function in a global sense (White 1980). This limitation was addressed by Gallant's (1981, 1982) Fourier flexible form. Based on the composition of a truncated Fourier series expansion of orthogonal polynomials and a second-order Taylor series expansion in logarithms, the Fourier flexible form can provide an arbitrarily close, global approximation to an unknown function (Gallant 1982).
Despite its reduction in specification error, the Fourier flexible form is still subject to approximation error (Mitchell and Onvural 1996). The approximation error has two sources: the use of trigonometric regression terms to estimate a non-periodic function, and Fourier approximation with a finite order of approximation (i.e., the Gibbs phenomenon). Because of finite sample sizes, virtually all applications in economics require a truncated Fourier series (a finite order of approximation), and most applications involve non-periodic relationships, so both sources of approximation error are frequently present.
To solve this problem, we replace the second-order logarithmic expansion with a second-order expansion in the Box-Cox polynomial. As shown by Eubank and Speckman (1990), including a low-order polynomial in a trigonometric regression (such as a Fourier series approximation) can dramatically reduce approximation error. The exact low-order polynomial that minimizes the approximation error depends on the data generating process, and simulation results from Eubank and Speckman (1990) demonstrate that the precise order can vary widely. By using the Box-Cox function in place of the logarithm, the ideal low-order polynomial that reduces approximation error is revealed by the true data generating process.
In introducing the Box-Cox modification, we also allow for a much wider range of nested testing possibilities. With the exception of the likelihood dominance test of Skolrud and Shumway (2013), the Fourier flexible form has only been tested against its nested alternatives, the translog and the Cobb-Douglas. With the new functional form, robust nested testing is possible for a wide variety of popular functional forms.
Simulation evidence demonstrates that the new functional form, referred to as the Box-Cox Fourier flexible form, significantly reduces boundary approximation bias relative to the original Fourier flexible form. Likelihood ratio tests demonstrate its superiority over both the original form and the translog in terms of overall fit. The ability of the new functional form to reduce the bias is tested for multiple data generating processes, orders of Fourier series approximation and error distributions.
2. Materials and Methods
The Fourier flexible form is preferable to its Diewert-flexible competitors because it can represent an unknown function as closely as desired in terms of the Sobolev norm, a global measure of distance between two functions as well as their derivatives. Parameter estimates based on Diewert-flexible forms are subject to the maintained hypothesis that the true unknown function is derived from the same family of functions used in the approximation (White 1980; Gallant 1981). The fact that the Fourier flexible form is not subject to this maintained hypothesis makes it an extremely desirable alternative, as we can never know whether the maintained hypothesis is in fact valid. These facts are well known and clearly documented in the literature for a variety of cases. The important statistical properties are proven in Gallant (1981, 1982), Elbadawi et al. (1983) and Eastwood and Gallant (1991).
In this article, we take the desirable statistical properties of the original Fourier flexible form as given and provide only a brief description as it pertains to our suggested modification. Interested readers should consult the articles mentioned above for more detail on the statistical properties of the Fourier flexible form; less technical summaries are available in Creel (1997), Mitchell and Onvural (1996) and Ivaldi et al. (1996).
For ease of exposition, we present the Fourier flexible form as a cost function, similar to Gallant (1982). Let $C(p, q)$ be the cost function resulting from a perfectly competitive firm's cost minimization problem, where $p$ is a vector of $n$ input prices and $q$ is a vector of $m$ output quantities. The Fourier flexible form of the cost function is given by:

$$\ln C = u_0 + b'x + \frac{1}{2}x'\Gamma x + \sum_{\alpha=1}^{A}\left[u_{0\alpha} + 2\sum_{j=1}^{J}\left(u_{j\alpha}\cos(j k_\alpha' x) - v_{j\alpha}\sin(j k_\alpha' x)\right)\right] + \varepsilon, \quad (1)$$

where $x$ is an $(n+m)$-dimensional vector of scaled input prices and output quantities, $\theta = (u_0, b, \Gamma, \{u_{j\alpha}\}, \{v_{j\alpha}\})$ is a vector of parameters, $k_\alpha$ is an integer-valued $(n+m)$-dimensional multi-index vector, $A$ is the number of $k_\alpha$ vectors, $J$ is the order of the trigonometric expansion and $\varepsilon$ is a stochastic disturbance term with mean 0 and variance $\sigma^2$. Note that the data vector $x$ must be scaled to fit in the interval $[0, 2\pi]$ in order to be expanded in the Fourier series approximation. In Gallant's (1982) formulation, the $x$ vector is composed of logarithmic scaled input prices and output quantities.
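To make the scaling step concrete, the following minimal sketch (our illustration; the scaling constants are not taken from the article) affinely maps a data series into the required interval before the trigonometric terms are formed:

```python
import numpy as np

def scale_to_interval(x, eps=1e-6, upper=2 * np.pi):
    """Affinely map a 1-D array into [eps, upper - eps].

    The small eps keeps observations strictly inside the interval so
    that no data point sits exactly on the periodic boundary.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return eps + (x - lo) * (upper - 2 * eps) / (hi - lo)

prices = np.array([1.5, 3.0, 7.2, 12.4])
scaled = scale_to_interval(prices)
```

Because the map is affine and increasing, the ordering and relative spacing of the observations are preserved; only the location and scale change.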
The Fourier flexible form comprises a second-order polynomial appended to a Fourier series. While the Fourier series component is sufficient for close approximation in the Sobolev norm, the addition of the polynomial serves to (1) limit the number of required Fourier series terms, (2) allow for nested testing of the translog and Cobb-Douglas functional forms and (3) decrease approximation error at the boundaries of the domain (Gallant 1981, 1982; Gallant and Souza 1991). In the next section, we argue that a generalization of this polynomial allows for further reductions in approximation error and a wider range of nested tests of functional form.
Consider the following modification of the Fourier flexible form presented in Equation (1):

$$C^{(\lambda)} = u_0 + b' x^{(\mu)} + \frac{1}{2} x^{(\mu)\prime} \Gamma x^{(\mu)} + \sum_{\alpha=1}^{A}\left[u_{0\alpha} + 2\sum_{j=1}^{J}\left(u_{j\alpha}\cos(j k_\alpha' x) - v_{j\alpha}\sin(j k_\alpha' x)\right)\right] + \varepsilon, \quad (2)$$

where $x$ is an $(n+m)$ vector of scaled input prices and output quantities and $C^{(\lambda)}$ and $x^{(\mu)}$ are Box-Cox transformations, defined as:

$$C^{(\lambda)} = \begin{cases} \dfrac{C^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \ln C, & \lambda = 0, \end{cases} \qquad x_i^{(\mu)} = \begin{cases} \dfrac{x_i^{\mu} - 1}{\mu}, & \mu \neq 0, \\[4pt] \ln x_i, & \mu = 0, \end{cases} \quad (3)$$

and all other variables are as previously defined. In previous uses of the Fourier flexible form, $C$ and $x$ would be expressed in logarithms; this approach relaxes that restriction. Note, however, that as $\lambda, \mu \rightarrow 0$, $C^{(\lambda)}$ and $x^{(\mu)}$ become $\ln C$ and $\ln x$, respectively, and the Box-Cox Fourier flexible form becomes the Fourier flexible form.
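The logarithmic limit of the Box-Cox transformation can be verified numerically; the following is a generic sketch of the transformation, not code from the article:

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, ln(x) for lam = 0."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(x)
    return (x ** lam - 1.0) / lam

x = np.array([0.5, 1.0, 2.0, 4.0])
# lam = 1 gives a shifted linear transform (x - 1); as lam -> 0 the
# transform approaches the natural logarithm, recovering the TLF's
# logarithmic expansion as a special case.
linear_case = box_cox(x, 1.0)
near_log = box_cox(x, 1e-6)
```

This nesting behavior is exactly what makes the likelihood-ratio comparisons against the logarithmic (TLF) specification possible.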
There are two important benefits of the extra flexibility afforded by the Box-Cox polynomial expansion. The most important is the reduction in approximation error. In their analysis of the improvement gained by appending low-order polynomials to trigonometric expansions (truncated Fourier series), Eubank and Speckman (1990) demonstrate that the boundary approximation can be improved dramatically, but that the improvement depends on the order of the polynomial. By choosing the Box-Cox parameters $\lambda$ and $\mu$ parametrically, our new form can adjust to reduce the boundary approximation error. The second benefit concerns the nested testing of popular functional form alternatives. By choosing a second-order polynomial expansion in logarithms, the original construction (Gallant 1982) nests the popular translog functional form (Christensen et al. 1975), which in turn nests the Cobb-Douglas functional form. This choice allowed researchers to make a robust, nested comparison to two of the most popular functional form alternatives. Many studies using the Fourier flexible form in an empirical setting make the comparison to the translog, and an extensive search did not find any study that failed to reject the translog based on a statistical test. Our modification continues to allow this important nested test, as well as nested testing of several more functional forms through appropriate restrictions on $\lambda$ and $\mu$. These include the TLF, translog, generalized Box-Cox, linear, normalized quadratic, generalized Leontief, modified resistance, non-homothetic CES (Applebaum 1979), logarithmic and Cobb-Douglas. The parametric restrictions required to produce these functional forms are listed in Table 1.
To obtain these benefits, we have introduced two nonlinear parameters, which require a more complicated estimation technique than the Fourier flexible form. Fortunately, the Box-Cox polynomial has seen frequent use in economics, so appropriate estimation procedures are well developed. Our estimation technique relies on existing procedures, with only minor modifications to allow for estimation of the Fourier series parameters. In the following exposition, we refer to the Fourier flexible form of Gallant (1982) as the TLF, to reflect its nesting of the translog, and to the Box-Cox Fourier flexible form as the BCF, to reflect its nesting of the Box-Cox polynomial.
One of the advantages of the TLF is that, despite its complexity, it is still linear in parameters, so ordinary least squares can be used with no inherent complications. The BCF has two nonlinear parameters, $\lambda$ and $\mu$. Using maximum likelihood estimation, these parameters (and the remaining parameters in the model) can be recovered with relative ease.
First, we collect the right-hand-side variables into the matrix $X$, such that $X$ includes the second-order expansion in $x^{(\mu)}$ and the $2A$ trigonometric terms. The number of trigonometric terms to include is governed by several factors, including the sample size, the order of approximation of the Fourier series component and the number of input prices and output quantities. We discuss these factors and the selection of the multi-indices $k_\alpha$ in Appendix A.
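Assembling the regressor matrix can be sketched as follows; the multi-indices used here are hypothetical placeholders for illustration, not the ones implied by the selection rules in Appendix A:

```python
import numpy as np

def build_regressors(x_scaled, x_boxcox, multi_indices, order=2):
    """Assemble the regressor matrix: an intercept, linear and quadratic
    terms in the Box-Cox-transformed data, plus cos/sin terms for each
    multi-index k and each frequency j = 1..order of the scaled data."""
    n_obs, n_var = x_boxcox.shape
    cols = [np.ones(n_obs)]
    # second-order polynomial expansion in the transformed variables
    for i in range(n_var):
        cols.append(x_boxcox[:, i])
    for i in range(n_var):
        for h in range(i, n_var):
            cols.append(x_boxcox[:, i] * x_boxcox[:, h])
    # trigonometric terms in the scaled (untransformed) variables
    for k in multi_indices:
        phase = x_scaled @ np.asarray(k, dtype=float)
        for j in range(1, order + 1):
            cols.append(np.cos(j * phase))
            cols.append(np.sin(j * phase))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
xs = rng.uniform(0.1, 2 * np.pi - 0.1, size=(50, 3))
X = build_regressors(xs, np.log(xs), multi_indices=[(1, 0, 0), (0, 1, -1)])
```

With 3 variables and 2 multi-indices at order 2, this produces 1 + 3 + 6 + 8 = 18 columns; the actual column count in an application depends on the multi-index selection rules.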
We can write the BCF compactly as $C^{(\lambda)} = X\theta + \varepsilon$. If we expect the error $\varepsilon$ to be normally distributed, we can write the likelihood function as:

$$L(\theta, \sigma^2, \lambda, \mu) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\left(C^{(\lambda)} - X\theta\right)'\left(C^{(\lambda)} - X\theta\right)\right\}\prod_{i=1}^{n} C_i^{\lambda - 1}. \quad (4)$$

The log-likelihood is given by:

$$\ln L(\theta, \sigma^2, \lambda, \mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(C^{(\lambda)} - X\theta\right)'\left(C^{(\lambda)} - X\theta\right) + (\lambda - 1)\sum_{i=1}^{n}\ln C_i. \quad (5)$$

Instead of choosing all of the parameters $\theta$, $\sigma^2$, $\lambda$ and $\mu$ simultaneously, we can "concentrate out" most of them and greatly simplify the optimization process. Concentrating out a parameter from an objective function involves replacing the parameter with its optimal solution from the system of first-order conditions; this embeds the optimal choice of the parameter in the objective function itself. The first-order conditions from choosing $\theta$ and $\sigma^2$ to maximize (5) can be solved to yield the usual estimators, $\hat{\theta} = (X'X)^{-1}X'C^{(\lambda)}$ and $\hat{\sigma}^2 = e'e/n$, where $e$ is the vector of residuals, $e = C^{(\lambda)} - X\hat{\theta}$.
Before both $\theta$ and $\sigma^2$ are concentrated out, we first concentrate out just $\sigma^2$, so that the information matrix can be derived to yield standard errors for the estimated parameters. Denote the log-likelihood with $\sigma^2$ concentrated out as $\ln L_c$:

$$\ln L_c(\theta, \lambda, \mu) = -\frac{n}{2}\left[\ln(2\pi) + 1\right] - \frac{n}{2}\ln\left(\frac{e'e}{n}\right) + (\lambda - 1)\sum_{i=1}^{n}\ln C_i, \qquad e = C^{(\lambda)} - X\theta. \quad (6)$$
The first-order conditions from the maximization of (6) are as follows:

$$\ell_{\theta} = \frac{n}{e'e}X'e = 0, \qquad \ell_{\lambda} = -\frac{n}{e'e}e'e_{\lambda} + \iota'\ln C = 0, \qquad \ell_{\mu} = -\frac{n}{e'e}e'e_{\mu} = 0, \quad (7)$$

where subscripts indicate partial derivatives with respect to the subscripted argument (so $e_{\lambda} = \partial e/\partial\lambda$ and $e_{\mu} = \partial e/\partial\mu$) and $\iota$ is a vector of ones. The Hessian matrix, derived from the system of first-order conditions, is given by:

$$H = \begin{bmatrix} \ell_{\theta\theta} & \ell_{\theta\lambda} & \ell_{\theta\mu} \\ \ell_{\lambda\theta} & \ell_{\lambda\lambda} & \ell_{\lambda\mu} \\ \ell_{\mu\theta} & \ell_{\mu\lambda} & \ell_{\mu\mu} \end{bmatrix}. \quad (8)$$
The inverse of the negative of the Hessian matrix in (8), evaluated at the parameter values that maximize (6), is the estimated covariance matrix of the parameter estimates. As noted in Spitzer (1982), using $\hat{\sigma}^2(X'X)^{-1}$ as the covariance matrix for $\hat{\theta}$ instead of the corresponding block of $-H^{-1}$ would lead to an underestimation of the standard errors, due to the neglected variance from the estimation of $\lambda$ and $\mu$.
The estimation problem can be further simplified by concentrating out $\theta$ from the log-likelihood function. Denote this concentrated likelihood function by $\ln L_{cc}$:

$$\ln L_{cc}(\lambda, \mu) = -\frac{n}{2}\left[\ln(2\pi) + 1\right] - \frac{n}{2}\ln\left(\frac{\hat{e}'\hat{e}}{n}\right) + (\lambda - 1)\sum_{i=1}^{n}\ln C_i, \qquad \hat{e} = C^{(\lambda)} - X\hat{\theta}. \quad (9)$$

Now the problem has been reduced to a choice of just two parameters rather than the entire parameter set. Denote the values of $\lambda$ and $\mu$ that maximize (9) as $\hat{\lambda}$ and $\hat{\mu}$, respectively. Then the optimal $\hat{\theta}$ is given by $\hat{\theta} = (X'X)^{-1}X'C^{(\hat{\lambda})}$. This method is equivalent to solving a series of least squares problems for varying values of $\lambda$ and $\mu$. If the assumption of a normal distribution is deemed too restrictive for a given application, alternative estimation strategies can be implemented, such as nonlinear least squares or a grid search for the optimal values of $\lambda$ and $\mu$ followed by ordinary least squares.
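The two-parameter concentrated problem can be sketched as follows. This is a deliberately simplified illustration with linear-only regressors and a coarse grid (the article's regressor matrix also contains quadratic and trigonometric terms); the data generating process is our own toy example:

```python
import numpy as np

def box_cox(x, lam):
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def concentrated_loglik(lam, mu, y, Z):
    """Log-likelihood with theta and sigma^2 concentrated out:
    OLS fit of y^(lam) on [1, Z^(mu)] plus the Box-Cox Jacobian term."""
    n = len(y)
    yt = box_cox(y, lam)
    X = np.column_stack([np.ones(n), box_cox(Z, mu)])
    theta, *_ = np.linalg.lstsq(X, yt, rcond=None)
    e = yt - X @ theta
    sigma2 = e @ e / n
    return (-0.5 * n * (np.log(2 * np.pi) + 1)
            - 0.5 * n * np.log(sigma2)
            + (lam - 1) * np.log(y).sum())

# toy log-linear data, then a grid search over (lam, mu)
rng = np.random.default_rng(1)
Z = rng.uniform(0.5, 3.0, size=(200, 2))
y = np.exp(0.3 + 0.5 * np.log(Z[:, 0]) + 0.4 * np.log(Z[:, 1])
           + 0.05 * rng.standard_normal(200))
grid = np.linspace(-1.0, 1.0, 21)
ll_best, lam_hat, mu_hat = max((concentrated_loglik(l, m, y, Z), l, m)
                               for l in grid for m in grid)
```

At the grid optimum, a final OLS regression of $y^{(\hat{\lambda})}$ on the regressors evaluated at $\hat{\mu}$ recovers $\hat{\theta}$, mirroring the series-of-least-squares interpretation above.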
To assess the ability of the BCF to mitigate approximation error relative to the TLF, we turn to Monte Carlo simulations. We generate input price and output quantity data using fixed parameter values, and total cost is derived from several data generating processes. We impose symmetry and linear homogeneity on the generated cost function through price normalization and proper multi-index selection (discussed in Appendix A). Concavity is not imposed a priori, both to maintain second-order flexibility and to avoid confounding the simulation results. As robustness checks, we conduct simulations with errors from two different distributions.
To prevent the BCF from having an a priori advantage over the TLF in estimation, we choose data generating processes that are not nested in the generalized Box-Cox (and consequently not nested in the BCF): the generalized Cobb-Douglas function (Fuss et al. 1978), the resistance function (Heady and Dillon 1961) and the generalized quadratic function (Denny 1974).
We consider two distributions for the generation of the error term $\varepsilon$: standard normal and Laplace. Errors from the standard normal distribution serve as a baseline comparison and are consistent with the likelihood function developed in the previous section. The Laplace distribution is used as a comparison because of its high kurtosis (thick tails), which we expect will exacerbate boundary approximation issues; for this reason, we expect the fit of the TLF relative to the BCF to be even worse in the case of Laplace errors. To estimate parameters of the BCF when the errors are distributed Laplace, we use nonlinear least squares, with estimates from the corresponding maximum likelihood estimation as starting values.
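Drawing Laplace errors on the same scale as the normal baseline requires rescaling, since a Laplace(0, b) variate has variance $2b^2$; setting $b = 1/\sqrt{2}$ gives unit variance. A small sketch of our own (using NumPy's `laplace` generator) also checks the heavy-tail property discussed below:

```python
import numpy as np

rng = np.random.default_rng(42)
# Laplace(0, b) has variance 2 * b**2, so b = 1/sqrt(2) gives unit variance.
b = 1.0 / np.sqrt(2.0)
errors = rng.laplace(loc=0.0, scale=b, size=100_000)

sample_var = errors.var()
# The Laplace distribution's kurtosis is 6 (vs. 3 for the normal),
# i.e. heavier tails, which concentrate more mass near the boundaries.
m2 = (errors ** 2).mean()
m4 = (errors ** 4).mean()
sample_kurtosis = m4 / m2 ** 2
```

The sample kurtosis should land near 6, well above the normal benchmark of 3.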
Additionally, we use second, third and fourth-order Fourier series approximations for comparison. As the order of approximation increases and both the TLF and BCF improve in overall fit, we hypothesize that the relative advantage of the BCF over the TLF will diminish. This is due to the reduction in approximation error resulting from the Gibbs phenomenon, a "ringing" effect that subsides as the order of approximation increases (Eubank and Speckman 1990; Jerri 1998). Due to the semi-nonparametric estimation, the sample size must change along with the order of approximation. To determine the appropriate sample size to use in the data generating process, we use the rule proposed by Eastwood and Gallant (1991), which selects a total number of parameters equal to the sample size raised to the two-thirds power. Combined with the rules governing multi-index selection discussed in Appendix A, we can derive the appropriate sample size for each order. Using these rules, second, third and fourth-order Fourier approximations require sample sizes of 333, 1348 and 5405, respectively.
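The Eastwood and Gallant (1991) rule can be inverted: if a specification has $p$ parameters, the implied sample size is $n \approx p^{3/2}$. In the sketch below, the parameter counts 48, 122 and 308 are our own back-calculations consistent with the reported sample sizes, not figures quoted from the article:

```python
def sample_size_for(n_params):
    """Invert the Eastwood-Gallant rule p = n**(2/3) to n = p**(3/2)."""
    return round(n_params ** 1.5)

# hypothetical parameter counts for increasing orders of approximation
sizes = [sample_size_for(p) for p in (48, 122, 308)]
```

These counts reproduce the sample sizes of 333, 1348 and 5405 used in the simulations.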
As one instance of economic relevance, we design a simulation to estimate returns to scale at each observation of the generated data. We compare estimates of economies of scale from the BCF and the TLF to the true economies of scale values and then conduct hypothesis tests to determine the frequency with which each form correctly rejects increasing, constant or decreasing returns to scale at several observations in the sample data. We use the multi-product economies of scale measure of Baumol et al. (1982),

$$S(q) = \frac{C(q)}{\sum_{j=1}^{m} q_j C_j(q)},$$

where $C_j$ denotes the derivative of the cost function with respect to output $j$.
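The Baumol et al. (1982) measure is straightforward to evaluate numerically; the toy cost function below is a hypothetical example of our own, not one of the article's data generating processes:

```python
import numpy as np

def scale_economies(cost_fn, q, h=1e-6):
    """Multi-product economies of scale S = C(q) / sum_j q_j * dC/dq_j,
    with derivatives taken by central finite differences.
    S > 1 indicates increasing returns; S < 1, decreasing returns."""
    q = np.asarray(q, dtype=float)
    C = cost_fn(q)
    denom = 0.0
    for j in range(len(q)):
        step = np.zeros_like(q)
        step[j] = h
        dC = (cost_fn(q + step) - cost_fn(q - step)) / (2 * h)
        denom += q[j] * dC
    return C / denom

def toy_cost(q):
    # C(q) = (q1 + q2)**0.8 is homogeneous of degree 0.8,
    # so S = 1/0.8 = 1.25 everywhere (increasing returns).
    return (q[0] + q[1]) ** 0.8

S = scale_economies(toy_cost, [2.0, 3.0])
```

In the simulations, `cost_fn` would be replaced by the estimated BCF or TLF cost function, with derivatives taken analytically rather than by finite differences.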
3. Results
We first compare the fit of the BCF to the simulated data using likelihood ratio tests, which we report in Table 2. Hypothesis tests are conducted against the null of the TLF and the translog (TL) for each data generating process and for three increasing orders of Fourier series approximation. With a second-order approximation, the TL is rejected in favor of the BCF at the 1% level of significance for all data generating processes, while the TLF is rejected in favor of the BCF at the 5% level for one data generating process (resistance) and at the 1% level for the remaining two. As the order of approximation in the Fourier series increases, the TLF is still rejected in favor of the BCF, albeit at a lower level of significance for two of the three data generating processes. The TL is rejected in favor of the BCF at the 1% level of significance in eight of the nine scenarios and at the 5% level in the final scenario.
Table 3 shows the percentage reduction in cost estimation bias from using the BCF rather than the TLF for each data generating process and for second, third and fourth-order Fourier series approximations when the errors are standard normal. Estimation bias is measured as the absolute value of the difference between the true cost function and the estimated function. Bias reductions are split into two columns for each data generating process/order of approximation combination: the average percentage bias reduction over the bottom 10% of sorted observations and the average over the top 10%. When the data are generated from the generalized Cobb-Douglas function, the second-order BCF reduces the approximation bias by over 27% compared to the TLF over the bottom 10% of observations and by 24% over the top 10%. When the resistance function is used as the data generating process, the gain from using the BCF is smaller, with bias reductions between 15% and 17%. The results for the generalized quadratic are just the opposite, with bias reductions of greater than 35% from using the BCF. We suspect that this is due to the complexity of the data generating processes: compared to the generalized Cobb-Douglas, the resistance function is simpler (fewer parameters) and the generalized quadratic is more complex (more parameters, highly nonlinear).
When the order of approximation is increased in the BCF and TLF, the relative advantage of the BCF shrinks. For a third-order approximation, the average percentage bias reduction ranges from 7% to 25%; for a fourth-order approximation, it ranges from 5% to 19%. The ordering of bias reductions across data generating processes remains consistent as the approximation order increases.
Figure 1 provides a graphical representation of the percentage deviation from the true cost function for the BCF and TLF estimates using the generalized Cobb-Douglas data generating process, a second-order Fourier series approximation and normally distributed random errors. The horizontal axis shows observations sorted in order of increasing total cost. The percentage deviation of the BCF remains low throughout the entire domain, while the TLF shows very large deviations at the boundaries of the domain. For the smallest observation, the TLF has a bias of about 50% while the BCF bias is only about 8%, a reduction of roughly five-sixths. For the largest observation, the effect is even greater, a reduction of at least nine-tenths.
Figure 2 demonstrates the estimation bias for the resistance function data generating process and Figure 3 shows the bias for the generalized quadratic. Both figures provide visual confirmation of the results in Table 3: the BCF bias is a small fraction of the TLF bias for the smallest and largest observations, and the TLF bias is higher when the generalized quadratic data generating process is used and lower when the resistance function is used.
Figure 4 and Figure 5 demonstrate the difference in bias when the approximation order of the Fourier series increases to three and four, respectively. Note that in each figure, the sample size has increased to accommodate the semi-nonparametric estimation. Compared to Figure 1, Figure 4 shows a decrease in TLF bias at the domain boundaries. In Figure 5, the TLF bias is smaller still as the order of approximation increases to four.
In Table 4, we conduct the same simulations and analysis as reported in Table 3 for the case of Laplace distributed random errors (mean = 0, standard deviation = 1). We use the Laplace distribution because of its high kurtosis, which is equal to six (the kurtosis of the standard normal is three). With higher kurtosis, a greater proportion of observations is concentrated in the tails of the data, and Table 4 demonstrates the extent to which this exacerbates the boundary issue. For a second-order Fourier series approximation using the generalized Cobb-Douglas data generating process, the percentage reduction in bias increases to a range of 31% to 34%, up from 24% to 27% under normally distributed errors. With the most complex data generating process considered, the generalized quadratic, the reduction in bias from using the BCF is nearly double that under normally distributed errors. The relationships revealed by Table 3 continue in Table 4: bias reduction improves with more complex data generating processes and diminishes as the order of approximation increases.
Figure 6 provides a graphical representation of the BCF and TLF bias for a generalized Cobb-Douglas data generating process and second-order Fourier approximation with Laplace distributed errors. The difference in approximation error compared to the normally distributed error case shown in Figure 1 is striking, with the TLF deviating from the true cost function by as much as 70% for the smallest and largest observations.
We note that in each simulation considered, the difference in bias between the BCF and TLF over the middle 80% of the data is small, with the BCF outperforming the TLF in the majority of cases by less than 1%. These results are documented in Table 5. In two cases (case 1: third-order approximation, generalized quadratic data generating process, normally distributed errors; case 2: fourth-order approximation, generalized Cobb-Douglas data generating process, Laplace distributed errors), the TLF slightly outperforms the BCF, but by less than 0.1% in either case. This means that when statistics evaluated at the data means are the only ones of interest, the BCF provides only a minimal advantage in reducing bias. However, when estimates involving the top and bottom deciles of the domain are important, our simulation results suggest that the BCF will provide a dramatic reduction in approximation bias over the TLF.
Finally, we report results from a scenario where the approximation bias of the TLF can lead to a misinterpretation with important economic implications. In Table 6, we present economies of scale estimates by data generating process, estimated by the TLF and BCF. Estimates are split into two columns: the bottom 10% and top 10% columns give average economies of scale estimates for the smallest 10% and largest 10% of observations, respectively. Bootstrap standard errors are shown in parentheses below each estimate. The true average economies of scale estimate is shown at the bottom of each column. In this simulation, the data and parameters are generated such that average economies of scale over the bottom 10% of observations are 1.10 (increasing returns to scale) and over the top 10% are 0.90 (decreasing returns to scale).
We conduct hypothesis tests to determine whether estimates from the TLF and BCF properly reject constant returns to scale in favor of increasing returns over the bottom 10% of observations, and whether they properly reject constant returns in favor of decreasing returns over the top 10%. The TLF properly rejects constant returns in favor of increasing returns at the 10% level of significance for two of the data generating processes (generalized Cobb-Douglas and resistance), and it properly rejects constant returns in favor of decreasing returns at the 5% level for one data generating process (resistance). Estimates from the BCF properly reject the null hypothesis for each data generating process at either the 5% (two specifications) or the 1% (four specifications) level of significance. Importantly, this means that for a 1% level hypothesis test, the TLF would incorrectly fail to reject constant returns to scale in all cases, while the BCF would fail to reject in only two. If we were concerned with issues of firm consolidation, using estimates from the TLF would lead to the false conclusion that the largest 10% of firms had not yet reached a scale at which average costs begin to increase.
4. Discussion
With simulation evidence, we have demonstrated that the Fourier flexible form (TLF) of Gallant (1981, 1982) suffers from serious approximation bias at the boundaries of the data. We propose a new functional form, the Box-Cox Fourier (BCF), which modifies the leading second-order polynomial of Gallant's original function, reduces approximation error at the data boundaries and allows for nested testing of a wide range of common functional forms. The new functional form adds a layer of complexity by requiring the estimation of two nonlinear parameters, but we introduce an estimation strategy that reduces the computational burden.
Simulation evidence indicates that the BCF has a substantial advantage over the TLF in mitigating approximation error at the data boundaries. For both the smallest and largest observations, the approximation bias from the BCF is a small fraction of that from the TLF. The magnitude of the advantage depends on the complexity of the unknown data generating process, the order of the Fourier series approximation and the error distribution. As the data generating process increases in complexity, the advantage of the BCF increases. In cases where the sample size is large enough to afford higher orders of Fourier series approximation, the BCF's advantage diminishes but is often still substantial. Finally, when the data have high kurtosis (thick tails), the advantage of the BCF is especially apparent.
Depending on the true data generating process, there may be cases where lower-dimensioned functional forms can be appropriately used. With the TLF, only translog and Cobb-Douglas alternatives can be tested as nested hypotheses. The generalized nature of the BCF allows for nested testing of a much wider set of functional forms, leading to a higher probability that the most suitable lower-dimensioned form for estimation is identified.
The BCF will be most useful in situations where derived measures near the boundaries of the data are of particular importance. As an example, we consider the case where economies of scale are estimated for the smallest and largest deciles of firms in the data to determine if an optimal firm size has been reached within the data set. Our simulations show that in some cases, the TLF will incorrectly identify decreasing vs. constant or increasing vs. constant returns to scale, resulting in misleading economic implications. With its minimal computational cost, very large reduction in approximation bias in the boundaries of the data and ability to test a wide range of alternative functional forms, our research demonstrates that the BCF should be a leading candidate for initial functional form selection.