The Environmental Kuznets Curve: A Semiparametric Approach with Cross-Sectional Dependence

Abstract: This paper proposes a new approach to examine the relationship between CO2 emissions and economic development. In particular, we test the Environmental Kuznets Curve (EKC) hypothesis for a panel of 24 OECD countries and 32 non-OECD countries by developing a more flexible estimation technique that accounts simultaneously for functional form misspecification, cross-sectional dependence, and heterogeneous relationships among variables. We propose a new nonparametric estimator that extends the well-known Common Correlated Effects (CCE) approach from a fully parametric framework to a semiparametric panel data model. Our results corroborate that the nature and validity of the income-pollution relationship based on the EKC hypothesis depend on the model assumptions about the functional form specification. For all the countries analyzed, the proposed semiparametric estimator leads to non-monotonically increasing or decreasing relationships for CO2 emissions, depending on the level of economic development of the country.


Introduction
The growing interest of citizens and governments in environmental degradation, generated mainly by economic growth and the excessive use of natural resources, has led economies around the world to take measures to mitigate the effects of global warming and climate change, in line with the Kyoto Protocol and the 2015 Paris Agreement on Climate Change Mitigation. However, despite these efforts, the burning of fossil fuels (i.e., coal, oil and gas) to produce the energy needed for economic development continues to contribute significantly to CO2 emissions. This situation has led policymakers to focus their efforts on promoting policies that aim to achieve sustainable economic growth, that is, economic growth that is compatible with the environment. Since the relationship between the economy and the environment is highly complex and controversial, many of the empirical studies in the literature are based on the so-called Environmental Kuznets Curve (EKC), which is still subject to intense debate.
The EKC is based on the concept of the Kuznets curve, initially proposed by Kuznets (1955), which describes an inverted U-shaped relationship between income inequality and income per capita. This hypothesis argues that income inequality initially increases as per capita income rises and then begins to decrease beyond a certain threshold. In the 1990s, the concept of the Kuznets curve was applied for the first time to environmental quality, in an attempt to corroborate whether the relationship between income per capita and environmental degradation follows a similar inverted U-shaped pattern (see Grossman and Krueger 1993, 1995 and Stern 1998). In this framework, the EKC postulates that at low income levels rising income is associated with a growing deterioration of the environment, but beyond a certain threshold of income per capita the relationship between the two variables becomes negative. This hypothesis is supported by the argument that higher levels of development imply a change in the economic structure in favor of industry and services, where production processes turn to more efficient and environmentally friendly technologies that help to preserve natural resources and significantly reduce environmental deterioration. Similarly, the EKC describes the development of an economy over time. In a first stage, the economy is based on the agricultural sector, with a strong impact on the quality of the environment. In a second stage, industry develops and, although it generates a higher level of wealth, it causes great damage to environmental quality. After a threshold point, the economy sustains its growth through more efficient and cleaner technologies, mainly in the services sector. Thus, the EKC hypothesis emphasizes that economic growth is a precondition for reducing environmental pollution (see Beckerman 1992).
Since the pioneering work of Grossman and Krueger (1991), several empirical studies have emerged in the literature with the aim of corroborating the EKC hypothesis in the income-pollution relationship. However, so far there is no consensus on this relationship (see Dinda (2004); Galeotti (2007) and Zervas (2013a, 2013b) for extensive reviews of this issue). On the one hand, Selden and Song (1994), Grossman and Krueger (1995), List and Gallet (1999) and Stern and Common (2001), among others, point out that the intensity of pollutant emissions initially increases with income per capita in the early stages of economic development, but eventually falls as income per capita rises above a certain threshold, at least in the case of the developed countries. On the other hand, Harbaugh et al. (2002) and Effiong and Oriabije (2018) highlight that there is no evidence that this relationship is robust for a number of pollutants.
A possible reason for this lack of consensus may be the presence of several specification errors in the models used to corroborate the EKC hypothesis. Firstly, it is common in this literature to assume ex-ante specific functional forms (i.e., quadratic or cubic polynomials) that may not fully capture the complexity of the EKC relationship. In the last two decades, nonparametric and semiparametric models have been gaining popularity as a way to deal with this problem (see Harbaugh et al. 2002; Strobl 2005; and Azomahou et al. 2006, among others). Secondly, most of these studies completely ignore the unobservable heterogeneity of countries or regions due to economic, social, political, structural, and biophysical differences that can have varying effects on environmental quality (see Dinda 2004 or Purcel 2020 and the references therein, among others). Hence, assuming homogeneity of parameters (i.e., assuming that the income-pollution relationship is the same for all countries) can lead to misleading inferences. Thirdly, most studies with longitudinal data use standard panel data techniques that crucially depend on the assumption of independence between individuals. However, the growing process of globalization and the importance of economic and social interconnections between economic agents make it necessary to control for unobserved cross-sectional dependence (see Apergis et al. 2017 and Martinez-Zarzoso and Bengochea-Morancho 2004, for example).
Given the lack of consensus about the income-pollution relationship based on the results obtained with fully parametric estimation techniques, this article revisits the EKC hypothesis and tries to overcome the three potential misspecification problems discussed above: (i) functional form misspecification; (ii) cross-sectional dependence arising from common factors; (iii) heterogeneous relationships among variables. More precisely, this article develops a new nonparametric estimation procedure that makes it possible to detect the true shape of the income-pollution relationship, allowing for heterogeneous relationships among countries and controlling for the possible cross-sectional dependence related to the presence of a finite number of unobservable common factors (see Bai 2009 and Pesaran 2006, among others). To the best of our knowledge, this is the first paper that tries to solve the three misspecification problems simultaneously, with the aim of modelling the income-pollution relationship more efficiently and arriving at more suitable policy prescriptions. Finally, in order to fully characterize the Kuznets curve, we carry out an empirical study with both OECD and non-OECD countries. In this way, we are able to draw conclusions for both developed countries and those that are still developing.
The paper is organized as follows. In Section 2, we present a review of the literature on EKCs, focusing mainly on issues related to econometric specifications. In Section 3, the model specification and the data description are presented. In Section 4, we present the main results and in Section 5 we discuss the main empirical findings. Finally, in Section 6 we offer concluding remarks. In the Appendices, we present details about the econometric methodology of the semiparametric model that we propose and the necessary assumptions to obtain the main asymptotic properties of the proposed estimators.

Literature Overview on the EKC
After the seminal work of Grossman and Krueger (1991, 1995), Shafik and Bandyopadhyay (1992), and Panayotou (1993) on the EKC, the relationship between pollution and economic growth has been intensively analyzed over the past two decades from both theoretical and empirical points of view.
Despite the enormous number of studies devoted to this topic, agreement on the EKC hypothesis is still far from being reached. Evidence of an EKC has been found for several indicators, but these findings are not unanimously accepted in the literature, as can be seen in the case of carbon dioxide emissions. For this particular environmental measure, an EKC was corroborated in the studies of Holtz-Eakin and Selden (1995), Galeotti and Lanza (1999), Heil and Selden (2001), Cole (2004), Galeotti et al. (2006), and Parajuli et al. (2019), among others, in contradiction with the results obtained in Shafik and Bandyopadhyay (1992), Galeotti and Lanza (2001), York et al. (2003), and Nutakor et al. (2020), for example. Other researchers, such as Sengupta (1996) and Martinez-Zarzoso and Bengochea-Morancho (2004), find a cubic N-shaped relationship, whereas Barra and Zotti (2018) point out that preliminary evidence validates the Kuznets hypothesis, although this relationship turned out to be misleading once the issue of (non)-stationarity was taken into account.
Most of these empirical studies have relied on parametric specifications (i.e., quadratic or cubic specifications). Although most of them resort to specific panel data techniques to deal with the presence of unobservable individual heterogeneity, the risk of misspecification remains considerably high for two reasons. Firstly, most of the above studies crucially depend on the assumption of independence between individuals. Hence, the presence of cross-sectional dependence may invalidate most of their conclusions. Secondly, fully parametric models are subject to restrictive conditions on the functional form. When the assumptions of the empirical model and its parameters are inconsistent with the real data generating process, the model is misspecified and can lead to misleading inference and conclusions. In order to avoid this problem, nonparametric and semiparametric regression models have become very popular in the recent literature, since they do not require a specific functional form to be imposed in order to investigate the existence of an EKC in the relationship between pollution and economic growth.
Among the papers that use nonparametric estimation techniques is that of Harbaugh et al. (2002), who use a nonparametric pooled regression to examine the robustness of the evidence for the existence of an inverted U-shaped relationship between national income and pollution for a panel of countries. They find that the results are highly sensitive both to slight variations in the data and to reasonable permutations of the econometric specifications. Hence, they conclude that there is little empirical support for an inverted U-shaped relationship between pollution and national income in these data. The work in Bertinelli and Strobl (2005) proposes a more flexible semiparametric specification to overcome the above problems and finds that a linear relationship between economic growth and environmental degradation cannot be rejected for a panel of 122 countries over the period . Azomahou et al. (2006) examine the empirical relationship between CO2 emissions per capita and GDP per capita during the period 1960-1996 using a panel of 100 countries. In order to avoid any ad hoc choice of a parametric functional form, they consider a fully nonparametric model. Further, in order to avoid any bias related to the presence of unobserved individual heterogeneity, they apply a first-differencing transformation and estimate the resulting regression using the marginal integration technique. In this framework they obtain a monotonically increasing relationship between CO2 emissions and economic development, so the existence of an EKC for CO2 emissions is clearly contradicted in this more flexible specification. Bertinelli et al. (2012) investigate the relationship between CO2 emissions per capita and GDP per capita using kernel regression estimation for a set of countries individually. They find that the relationship between output and pollution differs between countries. For some developed countries the relationship has been heterogeneous after 1960, whereas for almost all the developing countries it was always upward sloping. Chen and Chen (2015) analyze the nonlinear relationship between industrial pollution and the level of economic development using nonparametric methods to test and verify the EKC for carbon dioxide in China. Using standard nonparametric techniques, they verify the EKC pattern in the relationship between industrial carbon dioxide emissions and the level of economic development. Kalaitsidakis et al. (2018) examine the relationship between total factor productivity growth and emissions using a semiparametric smooth coefficient model that allows the output elasticity of emissions to be estimated directly, using a panel of 17 countries for the period 1981-1998. They show that there exists a monotonically increasing relationship between emissions and total factor productivity growth. Further, they find that the effect of CO2 emissions varies depending on a country's emissions level.
Although the literature on the relationship between pollution and economic growth is extensive, there is still no consensus on the EKC hypothesis, even when more flexible nonparametric or semiparametric specifications are used. Further, most of the above studies ignore the presence of cross-sectional dependence. In the following section, we present an alternative estimation technique that enables us to overcome all these difficulties simultaneously.

Model Specification
Following the standard approach in the literature, we use the Stochastic Impacts by Regression on Population, Affluence and Technology (STIRPAT) model proposed in Dietz and Rosa (1997) as the reference analytical framework for evaluating the anthropogenic forces behind environmental change. Hence, the environmental impacts (I) can be understood as a multiplicative function of population size (P), affluence described by per capita economic activity (A), and the level of technology per unit of consumption and production (T), whose model specification is given by
$$I_i = a P_i^{b} A_i^{c} T_i^{d} \varepsilon_i \qquad (1)$$
where $(a, b, c, d)$ are the parameters of the model, $\varepsilon_i$ denotes the idiosyncratic error term, and the subscript $i$ indexes the observational units (i.e., countries, regions) in cross-section data.
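As a brief worked step, and assuming the standard multiplicative STIRPAT form $I_i = a P_i^{b} A_i^{c} T_i^{d} \varepsilon_i$, taking natural logarithms turns the model into an additive one whose exponents can be read as elasticities:

```latex
\ln I_i = \ln a + b \ln P_i + c \ln A_i + d \ln T_i + \ln \varepsilon_i ,
\qquad
b = \frac{\partial \ln I_i}{\partial \ln P_i}, \quad
c = \frac{\partial \ln I_i}{\partial \ln A_i}, \quad
d = \frac{\partial \ln I_i}{\partial \ln T_i}.
```

Under this form, a 1% increase in population is associated with an approximate $b$% change in the environmental impact, holding affluence and technology fixed.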
Taking the natural logarithm of (1) and adding a quadratic term in the affluence (A) variable, in line with the EKC hypothesis, to capture the possible existence of an inverted U-shaped relationship, the resulting panel data model is specified as follows:
$$\ln E_{it} = \alpha_i d_t + \beta_1 \ln gdpc_{it} + \beta_2 (\ln gdpc_{it})^2 + \beta_3 \ln pop_{it} + \beta_4 \ln enit_{it} + \varepsilon_{it} \qquad (2)$$
where $E_{it}$ is a measure of environmental quality of country $i$ at time $t$; pop denotes the population size; gdpc is GDP per capita; and enit denotes technology, which is proxied by energy intensity to capture the damaging effect of technology on the environment. $\alpha_i$ represents a country-specific effect that is constant over time, and $d_t$ is a time-specific effect that accounts for time-varying stochastic shocks that are common to all countries (including deterministic components such as intercepts or seasonal dummies). All the variables in Equation (2) are expressed in natural logarithms, so the estimated coefficients are interpreted as elasticities. Further, the above setup is sufficiently general and renders a variety of panel data models as special cases. For example, if $d_t = 1$, the familiar fixed or random effects model is obtained. The sign and statistical significance of the slope parameters of the income variable (gdpc) are highly informative. On the one hand, if $\beta_1 > 0$ and $\beta_2 = 0$, then the income-pollution relationship is monotonically increasing (or decreasing if $\beta_1 < 0$ and $\beta_2 = 0$). On the other hand, if $\beta_1 > 0$ and $\beta_2 < 0$, then an inverted U-shaped curve is observed for that relationship, with the turning point (in terms of log income) given by $E^{*} = -\beta_1 / (2\beta_2)$. In this framework, standard panel data techniques can be used to obtain consistent estimates of the slope parameters. Nevertheless, as discussed previously, the empirical model (2) exhibits several weaknesses. As noted in Yatchew (1998), it is quite common that economic theory does not provide enough information regarding the functional form linking the dependent variable and the covariates of the model. Therefore, assuming an ex-ante specific functional form and ignoring possible parameter heterogeneity among countries can lead to misleading inference. Furthermore, the regression model (2) completely ignores potential cross-sectional dependence.
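The sign conditions on the income coefficients and the implied turning point can be illustrated with a short sketch. The function name `classify_income_pollution` and the coefficient values are hypothetical and chosen only for illustration; the code assumes a log-quadratic income term with coefficients $\beta_1$ on ln(gdpc) and $\beta_2$ on its square, and converts the turning point from log income back to income levels.

```python
import math

def classify_income_pollution(beta1, beta2, tol=1e-12):
    """Return (shape, turning_income) implied by a log-quadratic income term.

    beta1 and beta2 are the coefficients on ln(gdpc) and ln(gdpc)^2.
    turning_income is expressed in the units of gdpc, or None when the
    relationship has no turning point.
    """
    if abs(beta2) <= tol:
        if beta1 > 0:
            return "monotonically increasing", None
        if beta1 < 0:
            return "monotonically decreasing", None
        return "flat", None
    # Vertex of the quadratic in log income, mapped back to income levels.
    turning_income = math.exp(-beta1 / (2.0 * beta2))
    if beta1 > 0 and beta2 < 0:
        return "inverted U", turning_income
    # Sign combinations not discussed in the text are left unclassified.
    return "other", turning_income

# Hypothetical coefficients, chosen only for illustration.
shape, income = classify_income_pollution(beta1=2.0, beta2=-0.1)
print(shape, round(income, 2))
```

With these illustrative values the vertex lies at a log income of 10, so the turning point in levels is exp(10).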
In order to avoid the possible functional form misspecification problem in the above parametric framework, we propose an alternative approach that relaxes the functional form assumptions and allows the data generating process to determine the true shape of the income-pollution relationship. Furthermore, with the aim of controlling for potential cross-sectional dependence, we assume that it is due to the presence of a finite number of unobservable common factors (see Pesaran 2006 and Bai 2009, for example). Therefore, the semiparametric heterogeneous panel data model with cross-sectional dependence that we specify is as follows:
$$\ln E_{it} = \alpha_i d_t + \beta_i^{\top} x_{it} + m_i(z_{it}) + \gamma_i^{\top} f_t + \varepsilon_{it} \qquad (3)$$
where $x_{it}$ collects the population and energy intensity variables (in logarithms), $z_{it} = lgdpc_{it}$ with lgdpc = ln(gdpc), $m_i(\cdot)$ is an unknown smooth function to be estimated, $f_t \equiv (f_{1t}, f_{2t}, \ldots, f_{rt})^{\top}$ is an $r \times 1$ vector of unobserved common effects, $\gamma_i \equiv (\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{ir})^{\top}$ is the associated vector of factor loadings, and $\varepsilon_{it}$ are the individual-specific (idiosyncratic) errors, assumed to be distributed independently of $(d_t, x_{it}, z_{it})$. This specification allows both the income-pollution relationship and the slope parameters to differ among individuals. In order to obtain consistent estimates for model (3), we propose a new estimation procedure that combines nonparametric techniques with the Common Correlated Effects (CCE) approach proposed in Pesaran (2006). In Appendix A, we present the proposed estimation procedure in detail.

Data Source
To investigate the empirical relationship between wealth and pollution, we use panel data sets consisting of 24 OECD countries and 32 non-OECD countries for the period 1980 to 2016. Countries with insufficient data on CO2 emissions are dropped from the database. The detailed list of countries used for the estimation is given in Appendix C.
The data used in this study come from two main sources. Environmental degradation is captured using the CO2 emissions obtained from the International Energy Statistics of the U.S. Energy Information Administration (EIA). The CO2 emissions (metric tons per capita) include those from the burning of fossil fuels and cement manufacturing, but exclude emissions from land use such as deforestation.
The data for all other variables (population, affluence, and technology) are obtained from the World Development Indicators (WDI) of the World Bank. Taking the study in Effiong and Oriabije (2018) as a benchmark, population is measured as total population, while affluence, which captures economic prosperity, is measured as real GDP per capita (constant 2015 US dollars). Technology, in turn, is measured using energy intensity. In this literature, energy intensity is usually expressed as total primary energy consumption per dollar of GDP (1000 Btu per year-2015 US dollars).

Results
To begin our empirical analysis of the EKC in the OECD and non-OECD countries, we first examine whether our panel variables exhibit cross-sectional dependence (CSD) using the CSD test proposed by Pesaran (2004), which follows a N(0, 1) distribution under the null hypothesis of cross-sectional independence. The results of this test, which are collected in Table 1, indicate that the null hypothesis of cross-sectional independence is rejected at the 1% level of significance. Therefore, all the series in the database used in this study exhibit CSD. In light of this result, we also implement the Pesaran (2007) panel unit root test, which is robust to the presence of CSD. The results are reported in Table 2 and indicate that all variables have a unit root in their levels and are stationary in their first differences. Hence, these tests indicate the presence of CSD and non-stationarity of the variables for both OECD and non-OECD countries. Finally, we carry out the Friedman (1937), Frees (1995) and Pesaran (2004) tests for cross-sectional dependence. The results are reported in Table 3 and in all cases they reject the null hypothesis of cross-sectional independence. Therefore, the presence of CSD is corroborated for this study, and ignoring it can seriously damage our results. In order to account for CSD, we estimate the following fully parametric model with heterogeneous slope coefficients:
$$co2_{it} = \alpha_i d_t + \beta_{1i}\, gdpc_{it} + \beta_{2i}\, gdpc_{it}^{2} + \beta_{3i}\, pop_{it} + \beta_{4i}\, enit_{it} + \gamma_i^{\top} f_t + \varepsilon_{it} \qquad (4)$$
where we are working with the natural logarithms of co2, gdpc, pop, and enit. Therefore, the income level at which the turning point occurs, which is needed to corroborate the EKC hypothesis, can be calculated using the following expression: $gdpc_i^{*} = e^{-\beta_{1i} / (2\beta_{2i})}$. Tables 4 and 5 report the results of the fully parametric models with heterogeneous slopes using the Common Correlated Effects (CCE) estimator proposed in Pesaran (2006) for OECD and non-OECD countries, respectively.
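The cross-sectional dependence test that opens this section can be sketched in a few lines. The sketch below assumes a balanced panel and computes the Pesaran (2004) CD statistic as the scaled sum of pairwise correlations across units, together with a two-sided p-value from the N(0, 1) reference distribution. The function name `pesaran_cd` and the simulated data are illustrative only and do not reproduce the results in Table 1.

```python
import math
import numpy as np

def pesaran_cd(panel):
    """CD statistic for a balanced panel stored as an (N units, T periods) array."""
    x = np.asarray(panel, dtype=float)
    n_units, n_periods = x.shape
    corr = np.corrcoef(x)                            # rows are units
    pairwise = corr[np.triu_indices(n_units, k=1)]   # rho_ij for i < j
    scale = math.sqrt(2.0 * n_periods / (n_units * (n_units - 1)))
    cd = scale * float(pairwise.sum())
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))    # two-sided N(0, 1) p-value
    return cd, p_value

# Simulated data: 10 units observed over 37 periods sharing one common shock.
rng = np.random.default_rng(0)
common_shock = rng.standard_normal(37)
panel = rng.standard_normal((10, 37)) + 2.0 * common_shock
cd, p_value = pesaran_cd(panel)
print(f"CD = {cd:.2f}, p-value = {p_value:.4f}")
```

Because every simulated series loads on the same shock, the pairwise correlations are large and positive, and the statistic falls far in the right tail of the standard normal distribution.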
As discussed previously, each country has its own economic structure, so the relationship between CO2 emissions and income is expected to differ across countries. One of the advantages of the CCE estimator is that it facilitates testing the EKC hypothesis for individual countries, so we are able to identify country-specific determinants of environmental degradation.
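The idea behind country-specific CCE estimation can be sketched as follows. This is a minimal illustration of the Pesaran (2006) approach rather than the code used for Tables 4 and 5: each country's regression is augmented with an intercept and the cross-sectional averages of the dependent variable and the regressors, which act as proxies for the unobserved common factors, and is then estimated by least squares. The function name `cce_country_slopes` and the simulated panel are assumptions made for the example.

```python
import numpy as np

def cce_country_slopes(y, X):
    """Country-specific CCE slopes.

    y: (N, T) dependent variable; X: (N, T, K) regressors.
    Returns an (N, K) array with one row of slope estimates per country.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n_units, n_periods, n_regs = X.shape
    # Cross-sectional averages proxy the unobserved common factors.
    proxies = np.column_stack([np.ones(n_periods), y.mean(axis=0), X.mean(axis=0)])
    slopes = np.empty((n_units, n_regs))
    for i in range(n_units):
        design = np.column_stack([X[i], proxies])
        coef, *_ = np.linalg.lstsq(design, y[i], rcond=None)
        slopes[i] = coef[:n_regs]
    return slopes

# Simulated panel with one common factor that affects both y and x.
rng = np.random.default_rng(1)
n_units, n_periods = 30, 40
factor = rng.standard_normal(n_periods)
loadings_y = rng.uniform(0.5, 1.5, n_units)
loadings_x = rng.uniform(0.5, 1.5, n_units)
true_slopes = rng.uniform(0.8, 1.2, n_units)
x = loadings_x[:, None] * factor + rng.standard_normal((n_units, n_periods))
y = (true_slopes[:, None] * x + loadings_y[:, None] * factor
     + 0.1 * rng.standard_normal((n_units, n_periods)))
slopes = cce_country_slopes(y, x[:, :, None])
print("mean estimated slope:", slopes.mean(), "mean true slope:", true_slopes.mean())
```

Averaging the rows of `slopes` gives a mean-group summary, while the individual rows are what allow the EKC hypothesis to be examined country by country.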
However, in this paper we are not willing to impose restrictive conditions on the functional form, so we propose to estimate the following alternative semiparametric regression model with heterogeneous slope coefficients and CSD:
$$co2_{it} = \alpha_i d_t + m_i(gdpc_{it}) + \beta_{3i}\, pop_{it} + \beta_{4i}\, enit_{it} + \gamma_i^{\top} f_t + \varepsilon_{it}$$
We estimate this regression model using the nonparametric estimator proposed in Appendix A, which enables us to combine nonparametric techniques with controls for CSD.
Unlike the above fully parametric models, which yield a single coefficient estimate for each population parameter, one of the advantages of nonparametric and semiparametric techniques is that they provide a regression plot that describes the true shape of the relationship between the dependent variable and the explanatory variable of interest while holding the other regressors at a fixed point, such as their means. Tables 6 and 7 report the results for the slope coefficients of OECD and non-OECD countries, respectively, while Figures 1 and 2 present the nonparametric estimates of the relationship between income and CO2 emissions for the OECD countries and Figures 3 and 4 for the non-OECD countries.
Note: ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.
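To illustrate how a nonparametric regression curve of this kind is traced out over a grid of income values, the following sketch implements a generic local-linear kernel smoother with a Gaussian kernel. It is not the estimator developed in Appendix A, which additionally handles the common factors and the parametric regressors; the function name `local_linear`, the bandwidth, and the noise-free inverted U-shaped data are assumptions made only for the example.

```python
import numpy as np

def local_linear(z, y, grid, bandwidth):
    """Local-linear estimate of E[y | z] at each point of grid (Gaussian kernel)."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(grid))
    for k, z0 in enumerate(grid):
        u = (z - z0) / bandwidth
        root_w = np.sqrt(np.exp(-0.5 * u ** 2))      # square roots of kernel weights
        design = np.column_stack([np.ones_like(z), z - z0])
        # Weighted least squares: the local intercept is the fitted value at z0.
        coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
        fitted[k] = coef[0]
    return fitted

# Illustrative data: an inverted U in (log) income peaking at z = 1.
z = np.linspace(0.0, 2.0, 101)
y = -(z - 1.0) ** 2
grid = np.linspace(0.2, 1.8, 33)
curve = local_linear(z, y, grid, bandwidth=0.2)
print("estimated peak at z =", grid[np.argmax(curve)])
```

Plotting `curve` against `grid` gives the type of regression plot described above, from which increasing, decreasing, inverted U-shaped, or N-shaped patterns can be read without imposing a polynomial form.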

Discussion
Analyzing the empirical estimates in Tables 4 and 5, we find that the coefficient on the energy intensity variable is positive and statistically significant in most of the OECD countries. This implies that higher consumption of fossil fuels in the production process will increase CO2 emissions, which in turn will put further pressure on environmental quality. By contrast, this variable is not statistically significant in most of the non-OECD countries. For the population variable we do not reach a definite conclusion. On the one hand, there are some OECD countries, such as Canada, Denmark or Finland, that exhibit a negative and significant effect, which implies that a larger population reduces the pressure on environmental quality. On the other hand, there are some non-OECD countries, such as Brazil, Costa Rica or Guatemala, among others, that show a positive and significant effect, which means that a larger population exacerbates the pressure on environmental quality.
Focusing now on the EKC hypothesis, the coefficient on GDP is expected to be positive because of a scale effect, whereas the coefficient on squared GDP is expected to be negative, since emissions fall via composition effects in the later stages of development. When we analyze the results of Tables 4 and 5, we find evidence that supports the EKC hypothesis in only six of the countries analyzed. Specifically, the EKC hypothesis is supported in Finland, Korea, and New Zealand (in the OECD database) and in Argentina, Ecuador, and Guatemala (in the non-OECD database). We find a peak turning point for Finland occurring at USD 30,001. Korea has a higher peak of USD 51,328, while New Zealand has a turning point at USD 32,728. Argentina, in turn, exhibits a turning point at USD 1139, Ecuador has a peak of USD 1964, and Guatemala has a turning point of USD 466.
These findings are generally comparable to those of other empirical studies that find an EKC in the OECD. In particular, Galeotti (2007), Apergis et al. (2017), Churchill et al. (2018), and Dijkgraaf and Volleberg (2005), among others, find evidence supporting an inverted U-shaped EKC relationship for some OECD countries, but not others. Our failure to find an EKC for Italy is consistent with Apergis (2016) and Churchill et al. (2018), but differs from Shabaz et al. (2017). Furthermore, Shabaz et al. (2017) find evidence of an EKC for Germany, which we do not. Other authors, such as Churchill et al. (2018), examine the possibility of an N-shaped EKC by including the cubic term of per capita income. In this case, they find that there is a second turning point (i.e., emissions begin to rise again when rich countries reach a second income tipping point) in Australia, Canada and Japan, whereas an inverted N-shaped EKC is found for Denmark.
Despite the results in Table 4, where only Finland, Korea, and New Zealand corroborate the EKC hypothesis, completely different conclusions are obtained when the plots in Figures 1 and 2 are analyzed. In the 24 OECD countries analyzed, the presence of a nonlinear relationship between CO2 emissions and economic development is clearly corroborated. Further, there is evidence supporting the validity of the EKC hypothesis in the majority of these OECD countries. However, for Denmark, France, Ireland, Italy, Turkey, and the United Kingdom the EKC hypothesis is rejected. In these particular cases our findings are consistent with the scale, technological and composition effects of economic growth (see Dinda 2004). These countries are characterized by a large agricultural sector, which has only a marginal impact, or none at all, on the environment. However, globalization has allowed them to move from agriculture to industry very quickly, and this is probably the source of the exponential growth observed in CO2 emissions as income increases. Furthermore, for Japan, South Korea, Luxembourg, Norway and Spain an N-shaped EKC is found.
Regarding the non-OECD countries, the nonlinear relationship between pollution and income is also corroborated. Likewise, countries can be grouped into two categories. On the one hand, there are the countries that offer evidence in favor of the EKC hypothesis, such as Argentina, Costa Rica, Ecuador, El Salvador, Honduras, Hong Kong, Jamaica, Malaysia, Nicaragua, Nigeria, Pakistan, and Peru. On the other hand, there is another group of countries that exhibit exponential growth in emissions as income per capita increases, such as The Bahamas, Brazil, Cyprus, the Dominican Republic, Gabon, Guatemala, Israel, Nepal, and Paraguay. This result suggests that these countries are still at an intermediate stage of development in which the agricultural sector remains dominant and the industrial sector is less sophisticated, so that economic growth will normally have a scale effect on the environment. Further, an N-shaped EKC is found for China, Colombia, Panama, South Africa, and Uruguay.
Analyzing the estimates obtained for the slope coefficients of the semiparametric model, we find that the significance of the population and energy intensity variables has increased considerably for both OECD and non-OECD countries (see Tables 6 and 7) compared to the results for the fully parametric models (see Tables 4 and 5). Furthermore, in most of the countries the effect of energy intensity is positive, so we can conclude that higher consumption of fossil fuels in the production process will increase CO2 emissions, with a negative impact on environmental quality. The effect of the population variable, in turn, is more ambiguous and depends on the particularities of each country.
In summary, the empirical results show that the nature and validity of the income-pollution relationship based on the EKC hypothesis depend on the model assumptions about the functional form specification, corroborating the results obtained in Effiong and Oriabije (2018). The fully parametric estimate, which avoids the possible bias related to CSD, only offers evidence of an inverted U-shaped EKC in six of the 50 analyzed countries. By contrast, the semiparametric analysis with CSD, thanks to its flexibility in the functional form specification, provides a more definite shape of the income-pollution relationship: a non-monotonically increasing or decreasing relationship for CO2 emissions in all the countries, depending on the level of economic development of the country. This inconsistency reiterates the econometric caveats in the literature surrounding ex-ante restrictions on the functional form specification and robustness issues. Finally, allowing for heterogeneous unknown smooth functions and slope parameters enables us to characterize the individual behaviour of each country directly. This result is very useful in this literature given the heterogeneity that characterizes the economic structure of the countries.
The conclusions of this study can help policy makers decide where to focus their efforts to achieve sustainable economic growth. Our results suggest that not only poor countries, but also richer countries, face environmental pollution. This implies that economic development is not a sufficient condition for reducing CO2 emissions. In this situation, all countries, especially developed countries because of their substantial resources (financial, technological, etc.), should make an effort to reduce these emissions in order to limit global warming.

Conclusions
CO2 emissions play an important role in global warming and much effort is being put into reducing such emissions all over the world. Consequently, studying the relationship between CO2 emissions and economic development is important in order to be able to correctly advise policy makers on how best to make economic growth compatible with the environment. With this aim, a huge international literature has emerged that tries to test the validity of the EKC hypothesis, although there is still a great lack of consensus about the income-pollution relationship.
A possible reason for this lack of consensus may be the presence of several specification errors in the models used to corroborate the EKC hypothesis. In this paper, we contribute to the literature by revisiting the EKC hypothesis through an alternative, more flexible estimation technique that accounts simultaneously for functional form misspecification, cross-sectional dependence, and heterogeneous relationships among variables. The empirical results show that the nature and validity of the income-pollution relationship based on the EKC hypothesis crucially depend on the model assumptions about the functional form specification.
The semiparametric estimators proposed in this paper provide a more definite shape of the income-pollution relationship, which is non-monotonically increasing or decreasing for the OECD and non-OECD countries, depending on the level of economic development of the country. By contrast, the fully parametric estimate only offers evidence of an inverted U-shaped EKC in six of the 50 analyzed countries. This inconsistency reiterates the econometric caveats in the literature surrounding ex-ante restrictions on the functional form specification and robustness issues.
Author Contributions: Conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, A.S.; writing-original draft preparation, writing-review and editing, supervision, I.D. All authors have read and agreed to the published version of the manuscript.

Funding:
The authors gratefully acknowledge partial financial support for this work from the Programa Estatal de Generación de Conocimiento y Fortalecimiento Científico y Tecnológico del Sistema de I+D+i y del Programa Estatal de I+D+i Orientada a los Retos de la Sociedad/Spanish Ministry of Science and Innovation (Ref. PID2019-105986GB-C22).

Acknowledgments:
The authors are thankful for the constructive suggestions provided by the members of the evaluation board of the final project of the Economics Degree of the University of Cantabria, which improved the quality of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Econometric Estimation. Semiparametric Approach
A semiparametric model specifies the conditional mean of the dependent variable as two separate components, one parametric and one nonparametric. These types of models are very attractive from an empirical point of view because of their flexibility in balancing precision and robustness. On the one hand, they allow us to incorporate prior information from economic theory or past experience while maintaining flexibility in the specification of the model. On the other hand, although the nonparametric part converges at a slower rate, the estimators obtained for the parametric part exhibit the same statistical properties as if the whole model were fully parametric. This is the so-called $\sqrt{N}$-consistency property for cross-sectional models (see Robinson (1988) and Speckman (1988), for example). Finally, semiparametric models provide a solution to the well-known "curse of dimensionality" of fully nonparametric models. Several reviews on this topic exist, and we suggest that the interested reader consult Ai and Li (2008), Henderson and Parmeter (2015), Parmeter and Racine (2019), Rodriguez-Poo and Soberon (2017), and Su and Ullah (2011), among others.
A semiparametric panel data model with heterogeneous slopes, heterogeneous unknown functions, and cross-sectional dependence is given by

$$y_{it} = \alpha_i' d_t + x_{it}'\beta_i + m_i(z_{it}) + \gamma_i' f_t + \varepsilon_{it}, \qquad \text{(A1)}$$

where $y_{it}$ denotes the dependent variable (i.e., the environmental quality measure in Equation (5)), $x_{it}$ and $z_{it}$ are $p \times 1$ and $q \times 1$ vectors of the explanatory variables of interest (i.e., in the case of Equation (5), $x_{it} = (\ln(\mathrm{pop}_{it}), \ln(\mathrm{enit}_{it}))'$ and $z_{it} = \ln(\mathrm{gdpc}_{it})$), respectively, $m_i(\cdot)$ is an unknown smooth function, and $\beta_i$ is a $p \times 1$ vector of unknown population parameters. The aim is to obtain consistent estimators of $m_i(\cdot)$ and $\beta_i$, where $d_t \equiv (d_{1t}, d_{2t}, \ldots, d_{Nt})'$ is an $N \times 1$ vector of observed common effects (including deterministic regressors such as intercepts or seasonal dummies), $\alpha_i$ is an $N \times 1$ vector of unknown parameters, $f_t \equiv (f_{1t}, f_{2t}, \ldots, f_{rt})'$ is an $r \times 1$ vector of unobserved common factors, $\gamma_i \equiv (\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{ir})'$ is the corresponding vector of factor loadings, and $\varepsilon_{it}$ are the individual-specific (idiosyncratic) errors, assumed to be distributed independently of $(d_t, x_{it}, z_{it})$. In general, however, the unobserved factors $f_t$ could be correlated with $(d_t, x_{it}, z_{it})$, and to allow for such a possibility, we adopt the following fairly general models for the individual-specific regressors,

$$x_{it} = A_i' d_t + \Gamma_{1i}' f_t + v_{1it}, \qquad \text{(A2)}$$

$$z_{it} = B_i' d_t + \Gamma_{2i}' f_t + v_{2it}, \qquad \text{(A3)}$$

where $A_i$, $B_i$, $\Gamma_{1i}$, and $\Gamma_{2i}$ are $N \times p$, $N \times q$, $r \times p$, and $r \times q$ factor loading matrices with fixed components, and $v_{1it}$ and $v_{2it}$ are the specific components of $x_{it}$ and $z_{it}$, respectively, distributed independently of the common effects and across $i$, but assumed to follow general covariance stationary processes.
With the aim of obtaining consistent estimators for $\beta_i$ and $m_i(\cdot)$, in the following we show how, with modifications, the Common Correlated Effects (CCE) approach proposed in Pesaran (2006) can be applied to a semiparametric regression model. Let $0_p$ and $0_q$ be zero matrices of dimension $p \times p$ and $q \times q$, respectively, and let $I_p$ and $I_q$ be identity matrices of dimension $p \times p$ and $q \times q$. If we combine (A1)-(A3) and rearrange terms, we can write where $C_i$ and $D_i$ are matrices of dimension $N \times (1 + p + q)$ and $r \times (1 + p + q)$, respectively, of the form In order to show that using suitable proxies for the unobserved factors is enough to avoid having to use initial estimates of $\beta_i$, we take the cross-sectional sample averages of (A4) obtaining where , $v_{1At}$, and $v_{2At}$ are the cross-sectional averages of $C_{it}$, $D_{it}$, $v_{1it}$, and $v_{2it}$, respectively, and Following Pesaran (2006), we can premultiply both sides of (A5) by $\Gamma$ and solve for $f_t$, provided that $\mathrm{rank}(\Gamma) = r \leq (1 + p + q)$ for sufficiently large $N$. As → 0, and $m_{At} \overset{p}{\to} 0$ for each $t$ under weak conditions. It follows, The result of this last line suggests that we can use $\lambda_t \equiv (y_{At}, x_{At}', z_{At}', d_t')'$ as observable proxies of the unobservable factors, $f_t$. Therefore, we can conclude that the CCE approach proposed in Pesaran (2006) for fully parametric models can indeed be applied in a semiparametric setting with slight changes.
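As an illustration of the proxy construction described above, the following minimal Python sketch computes equally weighted cross-sectional averages of the dependent variable and the regressors and stacks them with the observed common effects. The function name `cce_proxies`, the array layout, the equal weights, and the simulated one-factor data are assumptions made only for this example; they are not taken from the paper.

```python
import numpy as np

def cce_proxies(y, x, z, d=None):
    """Stack cross-sectional averages into a (T, 1 + p + q [+ k]) proxy matrix.

    y: (N, T) array, x: (N, T, p) array, z: (N, T, q) array,
    d: optional (T, k) array of observed common effects (hypothetical layout).
    """
    y_bar = y.mean(axis=0)[:, None]   # (T, 1) average of y over units
    x_bar = x.mean(axis=0)            # (T, p) average of x over units
    z_bar = z.mean(axis=0)            # (T, q) average of z over units
    parts = [y_bar, x_bar, z_bar]
    if d is not None:
        parts.append(d)
    return np.hstack(parts)

# Simulated panel with ONE unobserved common factor (illustrative values only).
rng = np.random.default_rng(0)
N, T, p, q = 200, 50, 2, 1
f = rng.normal(size=T)                                  # unobserved factor
load_x = 1.0 + 0.2 * rng.normal(size=(N, 1, p))         # heterogeneous loadings
load_z = 1.0 + 0.2 * rng.normal(size=(N, 1, q))
x = load_x * f[None, :, None] + rng.normal(size=(N, T, p))
z = load_z * f[None, :, None] + rng.normal(size=(N, T, q))
y = x.sum(axis=2) + np.sin(z[:, :, 0]) + f[None, :] + rng.normal(size=(N, T))
d = np.ones((T, 1))                                     # intercept as common effect

lam = cce_proxies(y, x, z, d)

# Check how well the observable proxies span the unobserved factor.
coef, *_ = np.linalg.lstsq(lam, f, rcond=None)
resid = f - lam @ coef
r2 = 1.0 - resid.var() / f.var()
print(lam.shape, round(r2, 3))
```

In this simulated design the idiosyncratic components average out across units, so a regression of the factor on the proxies gives an R-squared close to one, which is the intuition behind using the averages in place of $f_t$.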
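In this situation, we can estimate $\beta_i$ and $m_i(\cdot)$ by augmenting the semiparametric regression of $y_{it}$ on $x_{it}$ and $z_{it}$ with $\lambda_t$, obtaining the following regression model

$$y_{it} = x_{it}'\beta_i + \lambda_t'\delta_i + m_i(z_{it}) + \varepsilon_{it} + o_p(1), \qquad \text{(A9)}$$

where $\delta_i$ is the vector of coefficients on the proxies and $o_p(1)$ captures possible approximation errors of the proxies. In addition, $\lambda_t \equiv (y_{At}, x_{At}', z_{At}', d_t')'$ is an $\ell \times 1$ vector of proxies, where $\ell = 1 + p + q + N$.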
In order to get a $\sqrt{T}$-consistent estimator of $\beta_i$, we follow Robinson (1988) to eliminate the unknown function $m_i(\cdot)$. Taking conditional expectations of (A9) yields

$$E(y_{it} \mid z_{it}) = E(x_{it} \mid z_{it})'\beta_i + E(\lambda_t \mid z_{it})'\delta_i + m_i(z_{it}), \qquad \text{(A10)}$$

and subtracting (A10) from (A9) yields

$$y_{it} - E(y_{it} \mid z_{it}) = [x_{it} - E(x_{it} \mid z_{it})]'\beta_i + [\lambda_t - E(\lambda_t \mid z_{it})]'\delta_i + \varepsilon_{it} + o_p(1). \qquad \text{(A11)}$$

These conditional expectations are unknown, so obtaining feasible estimators for $\beta_i$ requires estimating them first. With this aim, Robinson (1988) proposes to use (higher-order) Nadaraya-Watson kernel estimators. Later, Linton (1995) and Hamilton and Truong (1997), among others, pointed out that partial regression methods can be improved further by using local linear smoothers (see Fan and Gijbels (1996) for a deeper discussion of the desirable properties of these estimators). In light of these results, we propose to use local linear smoothers to estimate these conditional expectations. Let . For a given point $z \in \mathbb{R}^q$ and for $z_{it}$ in a neighbourhood of $z$, we propose to minimize the following weighted local linear least-squares (LLLS) problem for $\mu_1$, where $K_a(z_{it} - z) = a^{-q} K((z_{it} - z)/a)$ is a product kernel function such that $K(u) = \prod_{l=1}^{q} k(u_l)$, $u_l$ is the $l$th component of $u$, and $a$ is a positive bandwidth term. Of course, a general diagonal or non-diagonal bandwidth matrix could be employed, but for the sake of simplicity, a single scalar bandwidth is used.
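The partialling-out step described above can be sketched in Python for a single unit with a scalar $z_{it}$ ($q = 1$). This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `local_linear` and `partial_out_beta`, the Gaussian kernel, the fixed bandwidth `a = 0.3`, and the simulated data are all hypothetical choices for illustration. In the paper's setting the regressor matrix would also contain the proxies $\lambda_t$.

```python
import numpy as np

def local_linear(z, w, z0, a):
    """Local linear estimate of E[w | z = z0] with a Gaussian kernel (assumed).

    z: (T,) array, w: (T,) or (T, k) array, a: scalar bandwidth.
    Kernel normalising constants cancel in the weighted least-squares solution.
    """
    u = (z - z0) / a
    k = np.exp(-0.5 * u**2)                          # kernel weights
    Z = np.column_stack([np.ones_like(z), z - z0])   # local design matrix (T, 2)
    ZK = Z.T * k                                     # (2, T), columns weighted
    coef = np.linalg.solve(ZK @ Z, ZK @ w)
    return coef[0]                                   # intercept = fitted value at z0

def partial_out_beta(y, X, z, a):
    """Robinson-type estimator: regress smoothed-out y on smoothed-out X."""
    Ey = np.array([local_linear(z, y, z0, a) for z0 in z])   # (T,)
    EX = np.array([local_linear(z, X, z0, a) for z0 in z])   # (T, k)
    beta, *_ = np.linalg.lstsq(X - EX, y - Ey, rcond=None)
    return beta

# Simulated single-unit data (illustrative values only).
rng = np.random.default_rng(1)
T = 400
z = rng.uniform(-2.0, 2.0, size=T)
X = np.column_stack([z**2, np.cos(z)]) + rng.normal(size=(T, 2))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + np.sin(z) + 0.1 * rng.normal(size=T)

beta_hat = partial_out_beta(y, X, z, a=0.3)
print(beta_hat)
```

Because the same smoother is applied to `y` and to every column of `X`, the unknown function of `z` is removed from both sides up to smoothing error, and the remaining regression is an ordinary least-squares problem.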
Using the resulting estimators for these conditional expectations in (A11) and writing the resulting expression in vector form yields where $Y_{i\cdot} \equiv (Y_{i1}, \ldots, Y_{iT})'$ and $\varepsilon^*_{i\cdot} \equiv (\varepsilon^*_{i1}, \ldots, \varepsilon^*_{iT})'$ are $T$-dimensional vectors, $X_{i\cdot} \equiv (X_{i1}, \ldots, X_{iT})'$ and $\Lambda \equiv (\lambda_1, \ldots, \lambda_T)'$ are matrices of dimension $T \times d$ and $T \times \ell$, respectively, and $I_T$ is the $T \times T$ identity matrix. Further, assuming that $Z_{z_i}' K_a(z_{it}) Z_{z_i}$ is invertible, $S_i$ is a $T \times T$ smoothing matrix associated with individual $i$ of the form where $Z_{z_i}$ is a $T \times (1 + q)$ matrix, $e_1$ is a $(1 + q) \times 1$ vector having 1 in the first entry and all other entries 0, and $K_a(z) = \mathrm{diag}(K_a(z_{i1} - z), \ldots, K_a(z_{iT} - z))$ is a $T \times T$ diagonal matrix. Note that $\varepsilon^*_{it}$ is the new error term, which consists of three elements: (i) the original error term, (ii) the approximation error of the proxies, and (iii) the approximation error of the Taylor expansion.
By the formula for partitioned regression, the estimator of $\beta_i$ in (A13) is given by Following a similar reasoning, the estimator of $\delta_i$ in (A13) is given by where $M_{\tilde X_i} = I_T - \tilde X_{i\cdot}(\tilde X_{i\cdot}' \tilde X_{i\cdot})^{-1} \tilde X_{i\cdot}'$. Focusing now on the nonparametric estimation of the unknown smooth function $m_i(\cdot)$, we use the above estimators, so the corresponding weighted local linear least-squares problem to minimize is of the following form where $K_h(\cdot)$ is a product kernel defined as in (A12) and $h$ is the new bandwidth term. Then, assuming that $Z_{z_i}' K_h(z_i) Z_{z_i}$ is invertible, the resulting CCE nonparametric estimator of $m_i(\cdot)$ is given by where Under the conditions in Appendix B, one can show that the semiparametric CCE estimator $\hat\beta_i$ is consistent and asymptotically normal as $N$ and $T$ tend to infinity. More precisely, following a similar proof scheme to that in Pesaran (2006), the following result is obtained.
Theorem A1. Consider the panel data model (A1), and suppose that $\|\beta_i\| < C$, $\|\Gamma_{1i}\| < C$, $\|\Gamma_{2i}\| < C$, Assumptions A1-A3 and A4-A7 hold, $(N, T) \to \infty$ (in no particular order), and the rank condition (A7) is satisfied. Then, $\hat\beta_i$ and $\hat\delta_i$ are consistent estimators of $\beta_i$ and $\delta_i$, respectively. If it is further assumed that $\sqrt{T}/N \to 0$ as $(N, T) \to \infty$, then and $\Omega_i = E(\varepsilon_{i\cdot} \varepsilon_{i\cdot}')$ are covariance matrices. Furthermore, $M_G = I_T - G(G'G)^{-1}G'$ for $G = (\tilde D, \tilde F)$, where $\tilde D$ and $\tilde F$ are matrices whose $t$th elements are $\tilde d_t = d_t - E(d_t \mid z_{it} = z) f_{z_{it}}(z)$ and $\tilde f_t = f_t - E(f_t \mid z_{it} = z) f_{z_{it}}(z)$.
Similarly, under the conditions in Appendix B, one can show that the nonparametric CCE estimator $\hat m_i(z; h)$ is consistent and asymptotically normal as $N$ and $T$ tend to infinity.
Theorem A2. Consider the panel data model (A1), and suppose that Assumptions A1-A9 hold. Given the $\sqrt{T}$-consistency of $\hat\beta_i$ and $\hat\delta_i$, as $T \to \infty$, where $\sigma_i^2 = E(\varepsilon_{it}^2)$ and $H_{m_i}(\cdot)$ is the Hessian matrix of $m_i(\cdot)$.
This theorem is proved following a similar proof scheme to that in Musolesi et al. (2020), so the proof is omitted here. A detailed proof can be provided upon request.
Finally, note that the estimates of the variances in the above theorems can be used to construct standard errors for $\hat\beta_i$ or confidence bands for $\hat m_i(\cdot)$. We use a standard multivariate kernel density estimator with an Epanechnikov kernel and Silverman's rule of thumb to choose the bandwidth.
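For readers who want to see the bandwidth choice concretely, the following Python sketch implements a univariate kernel density estimator with an Epanechnikov kernel and a rule-of-thumb bandwidth. It is a sketch under assumptions: the paper does not state which variant of Silverman's rule is used, so the common form $0.9 \min(\hat\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$ (derived for a Gaussian reference) is assumed here, the estimator is univariate rather than multivariate, and the function names and simulated sample are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def silverman_bandwidth(z):
    """Rule-of-thumb bandwidth (assumed variant): 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    sd = z.std(ddof=1)
    q75, q25 = np.percentile(z, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * n ** (-0.2)

def kde_epanechnikov(z, grid, h):
    """Kernel density estimate of z evaluated at each grid point."""
    u = (grid[:, None] - z[None, :]) / h     # (G, n) scaled distances
    return epanechnikov(u).mean(axis=1) / h

# Illustrative sample (simulated, not the paper's data).
rng = np.random.default_rng(2)
sample = rng.normal(size=500)
h = silverman_bandwidth(sample)
grid = np.linspace(-6.0, 6.0, 1201)
dens = kde_epanechnikov(sample, grid, h)
total = dens.sum() * (grid[1] - grid[0])     # Riemann sum, should be close to 1
print(round(h, 3), round(total, 3))
```

The density estimate is the ingredient needed for plug-in variance terms that involve $f_{z_{it}}(z)$; a multivariate version would replace the scalar kernel with the product kernel used elsewhere in this appendix.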

Appendix B. Assumptions
In order to derive the asymptotic distribution of $\hat\beta_i$ and $\hat m_i(\cdot)$ obtained in Appendix A, the following notation is used. Let $\tilde X_{i\cdot} = X_{i\cdot} - B_X(z)$ and $\tilde\Lambda = \Lambda - B_\Lambda(z)$, where $B_X(z) = E(X_{i\cdot} \mid z_{it} = z) f_{z_{it}}(z)$ and $B_\Lambda(z) = E(\Lambda \mid z_{it} = z) f_{z_{it}}(z)$. Furthermore, the following conditions are required.
Assumption A1. (Common Effects). The $(N + r) \times 1$ vector of common effects $(d_t', f_t')'$ is covariance stationary with absolutely summable autocovariances, distributed independently of the individual-specific errors $\varepsilon_{it'}$, $v_{1it'}$, and $v_{2it'}$ for all $i$, $t$, and $t'$.
Assumption A3. (Identification of $\beta_i$). For each $i$, $\Psi_{iA} = T^{-1} \tilde X_{i\cdot}' M_\Lambda \tilde X_{i\cdot}$ and $\Psi_{iG} = T^{-1} \tilde X_{i\cdot}' M_G \tilde X_{i\cdot}$ are nonsingular $p \times p$ matrices and have finite second-order moments for all $i$. Furthermore,
Assumption A4. (Density function). The density of $z_{it}$ satisfies $0 < f_{z_{it}}(\cdot) < \infty$ and is twice continuously differentiable in all its arguments with bounded second-order derivatives at any point of its support.

Assumption A5. (Smoothness condition). Let $\mathcal{Z} \subseteq \mathbb{R}^q$ be the support of $z_{it}$. The unknown functions $E(\lambda_t \mid z_{it})$, $E(x_{it} \mid z_{it})$, and $E(y_{it} \mid z_{it})$ are bounded and twice continuously differentiable at $z$ in the interior of $\mathcal{Z}$, with bounded second-order derivatives.