Using Quantile Regression to Analyze the Relationship between Socioeconomic Indicators and Carbon Dioxide Emissions in G20 Countries

Abstract: Numerous studies have addressed the impacts of social development and economic growth on the environment. This paper studies the inclusive impact of social and economic factors on the environment by analyzing the association between carbon dioxide (CO2) emissions and two socioeconomic indicators, namely the Human Development Index (HDI) and the Legatum Prosperity Index (LPI), under the Environmental Kuznets Curve (EKC) framework. To this end, we developed a two-stage methodology. First, a multivariate model was constructed that accurately explains CO2 emissions by selecting the appropriate set of control variables based on model quality statistics. The control variables include GDP per capita, urbanization, fossil fuel consumption, and trade openness. Then, quantile regression was used to empirically analyze the inclusive relationship between CO2 emissions and the socioeconomic indicators, which revealed several results. First, decreasing CO2 emissions was coupled with inclusive socioeconomic development: both LPI and HDI had a negative marginal relationship with CO2 emissions at quantiles from 0.2 to 1. Second, the EKC hypothesis was valid for G20 countries during the study period, with an inflection point around quantile 0.15. Third, fossil fuel consumption had a significant positive relation with CO2 emissions, whereas urbanization and trade openness had a negative relation during the study period. Finally, this study empirically indicates that effective policies and policy coordination on broad social, living, and economic dimensions can lead to reductions in CO2 emissions while preserving inclusive growth.


Introduction
Climate change has become a global challenge for all of Earth's inhabitants [1,2]. In 2015, the United Nations Framework Convention on Climate Change (UNFCCC) responded to the risks of climate change by adopting the Paris Climate Change Agreement [3]. This agreement aims to "limit global warming to well below 2 degrees Celsius, preferably to 1.5 degrees Celsius, compared to pre-industrial levels". Many studies have shown that human activities such as burning fossil fuels are the main causes of rising temperatures, which contribute to global warming and accelerate changes in the climate system [4-6].
Understanding the causes of global CO2 emissions has attracted the attention of researchers and policy makers alike. This study addresses the inclusive relationship between socioeconomic indicators and CO2 emissions. The aim is to understand the collective impact of multiple social and economic factors on the environment. A well-defined socioeconomic indicator is the Legatum Prosperity Index (LPI), which provides a methodology to [...]
Other studies examined different variables but applied similar concepts or tools. Wang et al. [23] studied the influence of trade openness on CO2 emissions and found a positive relation, since trade openness encourages more production, which increases pollution. In another case study [24], the authors concluded that trade openness facilitates clean production using advanced technology, which improves the environment. Chen et al. [25] studied the influence of population density on CO2 emissions and found a positive relation for developing countries and a negative relation for developed countries. Lv and Li [26] examined the effect of financial development on CO2 emissions and found a positive correlation between the two. However, their results also indicate a significantly negative spatial correlation between CO2 emissions among neighboring countries, as well as a significant negative spillover effect of financial development on CO2 emissions. These findings suggest that a country with high financial development could improve a neighboring country's environmental performance. Angel et al. [27] built on the assumption that urban growth is associated with decreasing GHG emissions and studied urban expansion and densification in 200 cities around the world from 1990 to 2014. They concluded that a trade-off between densifying and expanding cities should be targeted based on population increase and economic growth. On the contrary, other studies support the assumption that urbanization increases carbon emissions [28,29].
In this paper, we focus on the influence of inclusive socioeconomic development on CO2 emissions in the G20 countries. These countries vary widely in economic and social dimensions and together account for around 90% of world GDP, 80% of global trade, 60% of the world population, and about half of the global land area [30]. Unlike many previous studies that focused solely on economic growth, we adopt two inclusive indicators that cover pivotal aspects of economic and social well-being: LPI and HDI. To study their association with CO2 emissions, we first built a multivariate model that adequately explains CO2 emissions using relevant control variables. To this end, we considered GDP per capita, fossil fuel consumption, urbanization, and trade openness, whose associations with CO2 have been well studied [4,13,24,31,32]. Then, two models were constructed with LPI and HDI to analyze their relationship with CO2 emissions using quantile regression.
Our study contributes to the literature as follows: (i) This is the first study that investigates the relationship between inclusive socioeconomic indicators and CO2 emissions. (ii) We modelled per capita CO2 emissions using two socioeconomic indicators and four well-studied control variables, i.e., per capita GDP, fossil fuel consumption, urbanization, and trade openness. (iii) We empirically tested the EKC hypothesis. (iv) We used the most recent data for all G20 countries, which broadly represent the world and include a period after the Paris Climate Change Agreement [3]. The remainder of this paper is organized as follows. Section 2 presents the materials and methods, Section 3 provides results and discussion, and Section 4 concludes the study with policy implications.

Materials and Methods
This paper used an empirical approach to analyze data from 2000 to 2019 to understand the overall relationship between CO2 emissions and socioeconomic indicators represented by LPI and HDI. Figure 1 shows the main steps of our study. First, a set of well-studied drivers of CO2 emissions was considered as candidate model control variables. To this end, data for gross domestic product (GDP), fossil fuel consumption (FC), urbanization (URB), trade openness (TR), and population density (PD) were used. The outcome of the variable selection method is a subset of the original set of variables, selected based on goodness of explanation and statistical significance. Then, two multivariate models were constructed with LPI and HDI to analyze their association with CO2 emissions using quantile regression.

Data and Variable Selection
This study used annual data for G20 countries for eight variables, as shown in Table 1. These variables include CO2, LPI, and HDI, which are the main focus of this study. The other variables (GDP, FC, TR, URB, and PD) were considered as candidate control variables for our model. Due to the complexity and nonlinearity of CO2 emissions, a model that includes an adequate set of control variables is needed to study the relationship with the indicators LPI and HDI. That is, a set of N explanatory variables $X_n$ is related to CO2 as follows:

$$\ln \mathrm{CO}_{2t} = \alpha_0 + \sum_{n=1}^{N} \alpha_n \ln X_{nt} \qquad (1)$$

where ln is the natural logarithm; $\mathrm{CO}_{2t}$ is the per capita CO2 emissions at year t; $X_{nt}$ is the n-th explanatory variable at year t; N is the number of explanatory variables; and $\alpha_0, \ldots, \alpha_N$ are the model parameters. The aim is to select the subset of control variables that best explains the dependent variable using quantile regression (explained in Section 2.3). For each possible subset of the N explanatory variables (there are $2^N - 1$ non-empty subsets), the model parameters are computed at the 50% quantile (the median), along with the adjusted R-squared, the mean absolute error (MAE), and statistical significance. These three statistical metrics reflect the model quality and are used for selecting the control variables in the models for analyzing the relationship between CO2 emissions and the socioeconomic indicators.
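The exhaustive search over candidate control variables can be sketched as follows. This is only an illustration under stated assumptions: the paper names the three quality metrics but not how they are combined, so the ranking rule here (discard subsets with any insignificant coefficient, then prefer higher adjusted R-squared, then lower MAE) is hypothetical, as are the function names and the demo scores.

```python
from itertools import combinations


def nonempty_subsets(variables):
    """Return every non-empty subset of `variables` as a tuple."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets


def select_best(scores):
    """Pick one subset from `scores`, a dict mapping
    subset -> (adjusted_r2, mae, all_significant).

    Assumed rule (not stated in the paper): drop subsets with any
    insignificant coefficient, then maximise adjusted R-squared and
    break ties by the lowest MAE.
    """
    eligible = {s: v for s, v in scores.items() if v[2]}
    return max(eligible, key=lambda s: (eligible[s][0], -eligible[s][1]))


candidates = ["GDP", "FC", "URB", "TR", "PD"]
subsets = nonempty_subsets(candidates)
print(len(subsets))  # 2**5 - 1 = 31 candidate models

# Made-up scores for three subsets only; these are NOT the paper's results.
demo_scores = {
    ("GDP", "FC"): (0.80, 0.050, True),
    ("GDP", "FC", "URB"): (0.85, 0.040, True),
    ("GDP", "FC", "URB", "PD"): (0.86, 0.039, False),
}
best = select_best(demo_scores)
print(best)
```

In a real run, each subset would be scored by fitting the median regression on the data rather than by a hand-written dictionary.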

Empirical Model
To study the inclusive relationship between the socioeconomic indicators and CO2 emissions, two empirical models were considered for LPI and HDI as follows:

$$\ln \mathrm{CO}_{2it} = \beta_0 + \beta_1 \ln \mathrm{LPI}_{it} + \beta_2 \ln \mathrm{GDP}_{it} + \beta_3 (\ln \mathrm{GDP}_{it})^2 + \sum_{j} \beta_{3+j} \ln X_{jit} \qquad (2)$$

$$\ln \mathrm{CO}_{2it} = \delta_0 + \delta_1 \ln \mathrm{HDI}_{it} + \delta_2 \ln \mathrm{GDP}_{it} + \delta_3 (\ln \mathrm{GDP}_{it})^2 + \sum_{j} \delta_{3+j} \ln X_{jit} \qquad (3)$$

where ln is the natural logarithm; $\mathrm{CO}_{2it}$ is per capita CO2 emissions with i ∈ {G20 countries} and t ∈ {2000, ..., 2019}; LPI and HDI are the socioeconomic indicators; GDP is per capita GDP; and $X_j \in \{FC, URB, TR, PD\}$ are the control variables selected according to Equation (1). All G20 countries were covered in this study with the exception of the EU (which consisted of 28 countries during the study period, four of which were included in the study, namely France, Germany, Italy, and the UK). The parameters $\beta_0, \ldots, \beta_3$ and $\delta_0, \ldots, \delta_3$ are the elasticity estimates of the explanatory variables. The quadratic term of per capita GDP was introduced in the models to empirically test the EKC hypothesis. When $\beta_2$ is positive and $\beta_3$ is negative, an inverted U-shape pattern is detected [36]. If $\beta_2$ is negative and $\beta_3$ is positive, a U-shape pattern is detected. When both $\beta_2$ and $\beta_3$ are positive (negative), a monotonically increasing (decreasing) pattern holds. The same reading applies to $\delta_2$ and $\delta_3$. For the coefficients of LPI and HDI ($\beta_1$, $\delta_1$), positive (negative) values mean more (less) CO2 emissions.
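The sign rules for the EKC test can be written down as a small helper. This is a minimal sketch, not the authors' code: the function names are hypothetical, the coefficient values in the usage lines are made up, and the turning-point formula assumes the quadratic term enters as the square of ln GDP, so that the marginal effect beta2 + 2·beta3·ln GDP vanishes at ln GDP = −beta2 / (2·beta3).

```python
import math


def classify_ekc(beta2, beta3):
    """Map the signs of the GDP and squared-GDP coefficients to a curve shape."""
    if beta2 > 0 and beta3 < 0:
        return "inverted U-shape"
    if beta2 < 0 and beta3 > 0:
        return "U-shape"
    if beta2 > 0 and beta3 > 0:
        return "monotonically increasing"
    if beta2 < 0 and beta3 < 0:
        return "monotonically decreasing"
    return "undetermined"  # at least one coefficient is exactly zero


def turning_point_gdp(beta2, beta3):
    """Per capita GDP at which the marginal effect of ln(GDP) is zero.

    Assumes the model term beta2*ln(GDP) + beta3*ln(GDP)**2 (an assumption
    about the specification) and a non-zero beta3.
    """
    return math.exp(-beta2 / (2.0 * beta3))


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
shape = classify_ekc(2.0, -0.1)
gdp_star = turning_point_gdp(2.0, -0.1)
print(shape, gdp_star)
```

With these illustrative values the turning point sits at ln GDP = 10, that is, at a per capita GDP of exp(10).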

Quantile Regression
Quantile regression (QR), introduced by Koenker and Bassett [37], is a common approach in econometrics for parameter estimation and analysis of models. In contrast to the ordinary least squares (OLS) method, which estimates the conditional mean of the dependent variable, QR estimates its conditional quantiles, with the median (q = 0.5) as a special case. Suppose y and x are the dependent and independent variables, respectively. Assuming linearity in the conditional relation y|x leads to the following:

$$Q_q(y_i \mid x_i) = x_i^{\top} \beta_q \qquad (4)$$

where $\beta_q$ is the coefficient vector of the q-th quantile, q ∈ (0, 1). $\beta_q$ can be estimated by minimizing the following weighted sum of absolute differences:

$$\hat{\beta}_q = \arg\min_{\beta_q} \left[ \sum_{i:\, y_i \ge x_i^{\top}\beta_q} q \, \lvert y_i - x_i^{\top}\beta_q \rvert + \sum_{i:\, y_i < x_i^{\top}\beta_q} (1 - q) \, \lvert y_i - x_i^{\top}\beta_q \rvert \right] \qquad (5)$$

The parameters in Equation (5) can be evaluated using linear programming [38]. By varying q gradually from 0 to 1 and solving for $\beta_q$, a plot is obtained for each explanatory variable that shows how its relation with the dependent variable changes across quantiles. The QR estimation is more robust to outliers and wide variations in data than the OLS estimation. More importantly, when the dependent variable does not follow the normal distribution, which is the case for most environmental and economic data, OLS estimation becomes unreliable, whereas QR estimation can detect heterogeneous relations with the dependent variable [39].
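The robustness claim can be illustrated on a toy data set with a single regressor. The sketch below is not the linear-programming solver cited in the text. It swaps in a brute-force search that relies on a known property of the quantile regression objective: with an intercept and one slope, some optimal line passes through at least two observations, so checking every pair of points is exact for tiny samples. The data and function names are made up for illustration.

```python
from itertools import combinations


def pinball_loss(residual, q):
    """Asymmetric absolute loss: q*|r| if r >= 0, else (1 - q)*|r|."""
    return q * residual if residual >= 0 else (q - 1.0) * residual


def fit_quantile_line(x, y, q):
    """Return (intercept, slope) minimising the total pinball loss.

    Brute force over all lines through two observations with distinct x;
    only practical for very small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(
            pinball_loss(yk - (intercept + slope * xk), q)
            for xk, yk in zip(x, y)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def fit_ols_line(x, y):
    """Closed-form least-squares (intercept, slope) for one regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Four points on the line y = x plus one large outlier (made-up data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 100.0]

median_intercept, median_slope = fit_quantile_line(x, y, 0.5)
ols_intercept, ols_slope = fit_ols_line(x, y)
print("median regression slope:", median_slope)
print("OLS slope:", ols_slope)
```

On this sample the median regression recovers the line through the four regular points, while the least-squares slope is pulled far upward by the single outlier. For real panel data a dedicated solver would be used instead of this enumeration.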

Data Statistics
The statistical descriptors of the variables in this study are shown in Table 2. The skewness and kurtosis descriptors were computed based on third-order and fourth-order central moments, respectively. The skewness reflects the asymmetry of the distribution, whereas the kurtosis describes its steepness. The Jarque-Bera test [40] scores indicate that the null hypothesis of normality is rejected for all variables, which means they do not follow the normal distribution. Therefore, OLS is not accurate for parameter estimation when modeling these variables.
Sustainability 2021, 13, 7011
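The descriptors above can be computed from central moments as in the following sketch. It assumes the non-excess definition of kurtosis (a normal distribution has kurtosis 3); the paper does not say which convention Table 2 uses. The p-value uses the asymptotic chi-square distribution with two degrees of freedom, whose survival function is exp(−JB/2), so it is only approximate for small samples. The function name and sample values are illustrative.

```python
import math


def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis; equals 3 for a normal law
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return skewness, kurtosis, jb, p_value


# Made-up sample, symmetric around its mean, so the skewness is zero.
skew, kurt, jb, p = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew, kurt, jb, p)
```

A small p-value leads to rejecting normality. The toy sample here is far too short for the asymptotic approximation to be reliable and only demonstrates the arithmetic.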

Control Variables Selection
In this study, we empirically selected the control variables of the regression models as described in Section 2.1. The aim was to select, from a group of explanatory variables (GDP, URB, FC, TR, and PD), a subset whose variables are all statistically significant and that yields the highest adjusted R-squared value and the lowest MAE. To this end, the regression parameters of Equation (1) were computed for $2^5 - 1 = 31$ subsets of variables, as depicted in Figure 2. According to the figure, GDP, URB, FC, and TR returned the highest adjusted R-squared value of 0.89 and the lowest MAE of 0.0301, with all variables statistically significant. Therefore, they were used as control variables in models 1 and 2 of Equations (2) and (3).

Empirical Analysis
The empirical analysis was carried out by estimating the model parameters of the explanatory variables using quantile regression and then interpreting their relations with CO2 emissions. In this study, LPI and HDI were employed to represent socioeconomic indicators. Based on the control variables selected in Section 3.2, Equations (2) and (3) can be rewritten as follows:

Model 1: $$\ln \mathrm{CO}_{2it} = \beta_0 + \beta_1 \ln \mathrm{LPI}_{it} + \beta_2 \ln \mathrm{GDP}_{it} + \beta_3 (\ln \mathrm{GDP}_{it})^2 + \sum_{j=1}^{3} \beta_{3+j} \ln X_{jit}$$

Model 2: $$\ln \mathrm{CO}_{2it} = \delta_0 + \delta_1 \ln \mathrm{HDI}_{it} + \delta_2 \ln \mathrm{GDP}_{it} + \delta_3 (\ln \mathrm{GDP}_{it})^2 + \sum_{j=1}^{3} \delta_{3+j} \ln X_{jit}$$

with $X_j \in \{FC, URB, TR\}$.

Figures 3 and 4 show the quantile regression plots for LPI and HDI, respectively, with the quantiles of per capita CO2 emissions on the horizontal axes and the coefficient estimate of each explanatory variable on the vertical axes. In both figures, q was changed gradually from 0.05 to 0.95 in steps of 0.05. The vertical axes in the plots depict the estimates of the parameters $\beta_1, \ldots, \beta_6$ in model 1 and $\delta_1, \ldots, \delta_6$ in model 2. For comparison purposes, the red horizontal line in each plot represents the coefficient estimated by OLS, and the dotted lines indicate its 95% confidence interval. The curve represents the coefficient estimated using QR, and the shaded area shows the corresponding 95% confidence interval. Tables 3 and 4 show the quantile regression results for the explanatory variables at different q values for LPI and HDI, respectively. Due to data availability, LPI was analyzed for the period from 2007 to 2019, whereas HDI was analyzed from 2000 to 2019. In the following, we analyze the influence of the explanatory variables on CO2 emissions for both LPI and HDI. The values of the intercept coefficients are included in Tables 3 and 4, but their plots are omitted due to space limitations. Intercept coefficients are useful for prediction applications but are not important for analyzing the marginal effects of the explanatory variables.

Note (Tables 3 and 4): The numbers in parentheses are standard errors. ***, **, and * stand for the significance levels of 1%, 5%, and 10%, respectively.

Discussion
This subsection discusses the results of this paper and compares them with similar works in the literature. For both LPI and HDI, there was a negative marginal relationship with CO2 emissions at quantiles larger than 0.2, with this relation becoming more significant as the quantile increases. This means that improved socioeconomic development was related to reductions in CO2 emissions in G20 countries, except for those below the 0.2 quantile. For per capita GDP, the effect on CO2 emissions was positive for all quantiles. To empirically validate the EKC hypothesis, we analyzed the coefficient of the squared per capita GDP, which was negative for quantiles larger than 0.15. This indicates that a persistent increase in per capita GDP can lead to reductions in per capita CO2 emissions. We conclude that the EKC hypothesis was applicable to the G20 during the period of study. This conclusion is consistent with the findings of [18-20,41] and contrary to those of [21,22]. An interesting finding is the link between the turning points of the EKC curve, LPI, and HDI in relation to CO2 emissions. As illustrated in Figure 5, at low quantiles (interval A), GDP, squared GDP, LPI, and HDI all had positive relations with CO2 emissions. In interval B, by contrast, squared GDP, LPI, and HDI had negative relations. This evidence supports the assumption that at early stages of development, improved socioeconomic development is associated with an increase in CO2 emissions. However, as economic and social development matures, increasing GDP is associated with reductions in CO2 emissions.
For the control variables, the empirical results in this paper show that increasing urbanization in G20 countries is associated with reductions in CO2 emissions, which agrees with the results on G20 data by Chen et al. [25] and disagrees with the findings based on data from the USA [31] and Bangladesh [42]. We hypothesize that different patterns of urbanization, depending on social and economic circumstances, have distinct impacts on the environment. In addition, there is consensus in the research community that increased fossil fuel consumption is coupled with increased CO2 emissions [31,43], which this study confirms as well. Finally, this study revealed that increased trade openness is associated with reductions in CO2 emissions, which agrees with the work by Chen et al. [25] on data from G20 countries and by Dogan and Turkekul [31] on the USA, and disagrees with another work on 64 countries [44]. Our interpretation of this variation is that the impact of trade openness on CO2 emissions depends on the nature of the goods and services being imported and exported and on their impact on the environment. Economies that import goods and services whose production releases high CO2 emissions can reduce their own emissions compared with producing them locally. On the other hand, opening trade between countries allows market access and stimulates increased production, with more waste and pollution from the burning of fossil fuels.

Main Conclusions and Policy Implications
This study investigated the inclusive relationship between socioeconomic indicators and CO2 emissions in G20 countries using two socioeconomic indicators: LPI for the period 2007 to 2019 and HDI for the period 2000 to 2019, which were chosen based upon the reliability and availability of data. Quantile regression was employed to analyze the heterogeneous effects of the explanatory variables, which included four control variables besides the socioeconomic variables. We empirically selected the control variables based on their regression quality and introduced them in two empirical models for LPI and HDI. The selected control variables were fossil fuel consumption, gross domestic product, trade openness, and urbanization. The results of both models revealed an association between increased socioeconomic development and reduced CO2 emissions. The EKC hypothesis was validated for the G20 countries based upon the empirical models used in our study. Fossil fuel consumption was significantly related to increased CO2 emissions. In addition, reductions in CO2 emissions were related to increasing trade openness and urbanization. Policy makers are encouraged to look into the factors that contribute to socioeconomic development. In particular, our results suggest a negative relationship between socioeconomic development and CO2 emissions, which could help policy makers adopt the necessary measures to combat climate change while becoming more prosperous.
The socioeconomic relationship with CO2 emissions described in this paper provides policy makers with a more inclusive overview of how to counter CO2 emissions. Unlike the majority of previous works, there is no emphasis on a specific factor that drives CO2 emissions. We approached the issue of CO2 emissions and sustainable development from a relatively new perspective. Policy makers ought to understand that although socioeconomic development could lead to more CO2 emissions, the relationship follows an inverted U-shape, as described in the EKC hypothesis. This means that as a country becomes more developed, it should experience lower CO2 emissions. This could be a result of advances in technology, education, infrastructure, and other dimensions of socioeconomic development.
This study has examined the issue of CO2 emissions at a macro level. Future research needs to go beyond the scope of this study and may cover other environmental factors that could hinder the efforts to combat climate change. There is an urgent need to investigate the issue of climate change from different perspectives, as they provide the academic and practice communities with the knowledge needed to understand the issue holistically. Policy makers in various economic and social fields are encouraged to coordinate their policies to balance achieving prosperity for their communities with the environmental implications of those policies. A suggested future research direction is to examine CO2 emissions for each G20 country and perform a comprehensive and unified comparison.