Testing Nonlinearity with Rényi and Tsallis Mutual Information with an Application in the EKC Hypothesis

The nature of dependence between random variables has been the subject of many statistical problems for over a century. Yet even today, there is a great deal of research on this topic, especially focusing on the analysis of nonlinearity. Shannon mutual information has been considered the most comprehensive measure for evaluating total dependence, and several methods have been suggested for discerning the linear and nonlinear components of dependence between two variables. In this study, we propose employing the Rényi and Tsallis mutual information measures for measuring total dependence because of their parametric nature. We first use a residual analysis to remove the linear dependence between the variables, and we then compare the Rényi and Tsallis mutual information measures of the original data with those of the data lacking the linear component to determine the degree of nonlinearity. A comparison with the values of the Shannon mutual information measure is also provided. Finally, we apply our method to the environmental Kuznets curve (EKC) and demonstrate the validity of the EKC hypothesis for East Asian and Asia-Pacific countries.


Introduction
An analysis of the dependence between two or more random variables can be traced back to the late 19th century, beginning with the works of mathematicians such as Gauss and Laplace. Later, Galton created the concept of correlation, which enabled Pearson to derive the correlation coefficient that has been extensively used in all kinds of statistical analyses since then [1]. When the dependence is linear or approximately linear, the correlation coefficient is the most effective indicator of the relationship between the random variables. It also provides a simple interpretation of the direction of the relation, whether positive or negative. When the dependence departs from linearity, the linear correlation coefficient is of no use, and various methods have been proposed for evaluating nonlinearity. One of these measures is Spearman's correlation coefficient, which is nonparametric and uses ranked values to assess monotonic nonlinearity between two random variables [2]. Another measure of nonlinear dependence is the correlation ratio, which expresses the relationship between random variables as a single-valued function. In the case of nonlinear relationships, the value of the correlation ratio is greater than the correlation coefficient, and therefore, the difference between the correlation ratio and the correlation coefficient indicates the degree of nonlinearity of the dependence [3]. Polynomial regression has also been used for modeling nonlinear dependence in various phenomena. Although nonparametric regression models have been used more often, polynomial regression is still being deployed for modeling dependence in some areas of application, such as biomechanics [4], cosmology [5], climatization [6], and chemistry [7].
As more, and more complex, data have been produced through technological development, the need for analyzing these data has given rise to a new field, called functional data analysis, which also includes functional regression. Functional regression models assume functional relationships between responses and predictors, and for polynomial models, these relationships are in polynomial form rather than linear [8].
Shannon entropy plays a central role in information theory as a measure of information, choice, and uncertainty. Conditional entropy can also be used as a measure of missing information [9]. Conditional entropy and mutual information do not assume any underlying distribution and reflect the stochastic relationship between random variables as a whole, whether linear or nonlinear [10]. These properties have made mutual information a good choice for analyzing dependencies. Hence, mutual information is extensively used for dependency analysis, especially in finance [11][12][13] and in genetics [14][15][16]. Although mutual information is an effective method for determining the dependency between random variables, it does not provide any information on the nature of the dependence as being linear or nonlinear. Very few attempts have been made to investigate the nature of the dependence by extracting the linear component of Shannon mutual information, notable exceptions being [1,17].
The environmental Kuznets curve (EKC) hypothesis states that there is an inverted U-shaped relationship between per capita gross domestic product (GDP) and measures of environmental degradation [18]. Because carbon dioxide (CO 2 ) is the major contributor to greenhouse gas emissions, it is accepted as the main cause of environmental degradation; hence, the same relationship is assumed between GDP and CO 2 . The EKC is thus an indication of the "stages of economic growth" that economies pass through as they make the transition from agriculturally based to industrial and then to postindustrial service-based economies. In a way, the EKC provides a visual representation of the stages of economic growth, as seen in Figure 1 (Panayotou 1993). There are various methods in the literature for testing the EKC. Some studies have used panel data, while others have used time series data [19]. Panayotou [20], who first suggested the term EKC, used cross-sectional data and empirically tested the relation between environmental degradation and economic development for the late 1980s. He discovered quadratic patterns in a sample of developing and developed countries. Antle and Heidebrink [21] found turning points for the EKC curve by using cross-sectional data. Vasilev [22] also studied the EKC with cross-sectional data.
Although determining the exact shape of the Kuznets curve is important, demonstrating its nonlinearity helps support the EKC hypothesis. We aim to determine nonlinearity by deploying mutual information, with an application to the EKC. The Rényi and Tsallis mutual information measures are used in determining the nonlinearity of the EKC, and the results are compared with those of Shannon. If the EKC hypothesis is confirmed, it can be concluded that the "grow and pollute now, clean later" strategy implied by the hypothesis has enormous environmental costs, so alternative strategies should be developed for growth.
The structure of our study is as follows: Section 2 describes the tests for nonlinearity on the basis of mutual information. Section 3 starts with the application by conducting a cross-sectional analysis using ordinary least squares (OLS) and then adds the application of nonlinearity tests. Finally, Section 4 concludes.

Mutual Information
Relative entropy is a special case of statistical divergence. It is a measure of the inefficiency of assuming that the probability distribution is q when the true distribution is p [23]. The Shannon, Rényi, and Tsallis relative entropies for the discrete case are defined as follows:

D S (p‖q) = Σ x p(x) log[p(x)/q(x)],

D R,α (p‖q) = [1/(α − 1)] log Σ x p(x) α q(x) 1−α , α > 0, α ≠ 1,

D T,α (p‖q) = [1/(α − 1)] [Σ x p(x) α q(x) 1−α − 1], α > 0, α ≠ 1.

The bivariate extensions are obtained by replacing p(x) and q(x) with the joint functions p X,Y (x, y) and q X,Y (x, y) and summing over all pairs (x, y). To check the independence of the variables, the null and alternative hypotheses can be stated as follows:

H 0 : p X,Y (x, y) = q X,Y (x, y) for all (x, y) ∈ R 2 ,
H a : p X,Y (x, y) ≠ q X,Y (x, y) for some (x, y),

where q X,Y (x, y) = p X (x)·p Y (y) for all (x, y) ∈ R 2 . Mutual information can be seen as the divergence of the joint probability function from the product of the two marginal probability distributions; in other words, mutual information is derived as a special case of divergence or relative entropy. Three alternative formulations of mutual information are due to Shannon, Rényi, and Tsallis. Shannon mutual information (or the Kullback-Leibler divergence of p X,Y (x, y) from p X (x)p Y (y)) is defined as follows:

M S (X, Y) = Σ x,y p X,Y (x, y) log[p X,Y (x, y)/(p X (x)p Y (y))].

Mutual information formulated this way is also called cross entropy. The Rényi order-α divergence of p X,Y (x, y) from p X (x)p Y (y) (or Rényi mutual information) is given as follows:

M R,α (X, Y) = [1/(α − 1)] log Σ x,y p X,Y (x, y) α [p X (x)p Y (y)] 1−α .

The Tsallis order-α divergence of p X,Y (x, y) from p X (x)p Y (y) (or Tsallis mutual information) is given as follows:

M T,α (X, Y) = [1/(α − 1)] [Σ x,y p X,Y (x, y) α [p X (x)p Y (y)] 1−α − 1].

In the case of independence, the Rényi and Tsallis mutual information measures are 0, just like Shannon mutual information. As α → 1, the Rényi and Tsallis mutual information measures approach Shannon mutual information [24]. The mutual information of two variables reflects the reduction in the variability of one variable obtained by knowing the other. Mutual information becomes 0 if and only if the random variables are independent. It should also be emphasized that mutual information measures general dependence, whereas the correlation coefficient measures only linear dependence [15].
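As a concrete illustration, the three mutual information measures above can be computed directly from a discrete joint distribution. The sketch below (NumPy) follows the definitions given here; the two example distributions are hypothetical, not taken from the paper's data.

```python
import numpy as np

def shannon_mi(pxy):
    """Shannon mutual information (nats) of a discrete joint distribution."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

def renyi_mi(pxy, alpha):
    """Renyi order-alpha mutual information (alpha > 0, alpha != 1)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    s = np.sum(pxy[mask] ** alpha * (px * py)[mask] ** (1.0 - alpha))
    return float(np.log(s) / (alpha - 1.0))

def tsallis_mi(pxy, alpha):
    """Tsallis order-alpha mutual information (alpha > 0, alpha != 1)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    s = np.sum(pxy[mask] ** alpha * (px * py)[mask] ** (1.0 - alpha))
    return float((s - 1.0) / (alpha - 1.0))

# Hypothetical example distributions:
p_indep = np.outer([0.5, 0.5], [0.3, 0.7])   # joint equals product of marginals
p_dep = np.array([[0.4, 0.1], [0.1, 0.4]])   # dependent joint distribution
```

All three measures vanish for p_indep, are positive for p_dep, and the Rényi and Tsallis values approach the Shannon value as α → 1, matching the properties stated above.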

Testing Linearity by Using Mutual Information
The application of the Shannon mutual information measure on the problem of detecting nonlinearity was suggested by Tanaka, Okamoto, and Naito [17] and by Smith [1].
This method utilizes the residuals obtained from an ordinary linear regression model. Note that a linear regression model that fits the data well is a good indicator of a linear relation between the variables, so the residuals obtained from a linear model,

ξ i = Y i − (b 0 + b 1 X i ),

are considered to include no linear dependence on the independent variable. Next, the mutual information between the residuals and the observed values of the independent variable is calculated. The mutual information between the independent and dependent variables, M(X,Y), can be computed, as can the mutual information between the independent variable and the residuals obtained from the linear regression, M(X,ξ). Note that the latter statistic reflects the nonlinear dependence between the original variables. If the mutual information between the independent variable and the residuals does not differ much from the mutual information between the dependent and independent variables, then the relation is nonlinear. By comparing M(X,ξ) with M(X,Y), we can evaluate the degree of nonlinearity in the dependence [1,17].
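A minimal sketch of this residual procedure, assuming a histogram-based Shannon estimator and synthetic data (the variable names, bin count, and simulated models are illustrative, not the paper's):

```python
import numpy as np

def mi_from_samples(x, y, bins=10):
    """Estimate Shannon mutual information (nats) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

def linear_residuals(x, y):
    """Residuals from an OLS fit of y on x; they carry no linear dependence on x."""
    b1, b0 = np.polyfit(x, y, 1)
    return y - (b0 + b1 * x)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 2000)
y_lin = 2 * x + rng.normal(0, 0.5, 2000)    # linear dependence
y_cur = x ** 2 + rng.normal(0, 0.5, 2000)   # curvilinear dependence

# Linear case: removing the linear fit destroys almost all dependence.
m_xy_lin = mi_from_samples(x, y_lin)
m_xr_lin = mi_from_samples(x, linear_residuals(x, y_lin))

# Curvilinear case: M(X, xi) stays close to M(X, Y).
m_xy_cur = mi_from_samples(x, y_cur)
m_xr_cur = mi_from_samples(x, linear_residuals(x, y_cur))
```

In the linear case M(X,ξ) collapses toward zero, while in the curvilinear case it remains close to M(X,Y), which is exactly the comparison the method exploits.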
We suggest that nonlinearity can be detected better by the Rényi and Tsallis mutual information measures because of their parametric nature.
In particular, because the Tsallis mutual information measure is calculated on the basis of powers of α, the larger the α value, the larger the Tsallis mutual information becomes, so the raw difference between these two mutual information measures cannot be interpreted directly. Therefore, we suggest a new measure that still leads to the same result, as seen in Equation (13):

λ = [M(X,Y) − M(X,ξ)]/M(X,Y). (13)

The letters S, R, and T in the index indicate the Shannon, Rényi, and Tsallis mutual information measures, respectively. As M(X,ξ) and M(X,Y) become closer to each other, λ converges to zero, implying nonlinearity. This hypothesis is tested by using two simulated data sets, one of which represents a linear relationship while the other reflects curvilinearity. The number of simulated pairs of X and Y values is 1000. The simulated data representing the linear and the curvilinear relationships are modeled by Equations (14) and (15). Various α values between 0 and 5 are selected randomly from a uniform distribution for assessing the effect of α on the nonlinearity measures. Table 1 provides 50 α values randomly generated from this uniform distribution and the corresponding λ values for the Rényi and Tsallis measures. Because λ values close to 1 indicate a linear relationship, λ T , λ S , and λ R support the linearity hypothesis. It can be observed that λ T detects linearity more strongly than does λ R for α > 1; conversely, λ R captures linearity better for α < 1. The mean and the standard deviation of each mutual information measure are also presented in Table 1 for checking the variability of each measure against various α values. The standard deviation values for λ T are lower than those for λ R , pointing out the consistency of the Tsallis measure in the case of linearity.
On the other hand, nonlinearity is captured by λ T better than by λ R for α < 1 and vice versa for α > 1. When the standard deviations are considered, λ R is more stable in determining nonlinearity.
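The behavior of λ across α can be sketched with a histogram-based Rényi estimator. Here we assume the ratio form λ = 1 − M(X,ξ)/M(X,Y) (our reading of Equation (13)), and the simulated linear and curvilinear models are again illustrative stand-ins:

```python
import numpy as np

def joint_hist(x, y, bins=10):
    """Joint probability table from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return counts / counts.sum()

def renyi_mi(pxy, alpha):
    """Renyi order-alpha mutual information (alpha > 0, alpha != 1)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    s = np.sum(pxy[mask] ** alpha * (px * py)[mask] ** (1.0 - alpha))
    return float(np.log(s) / (alpha - 1.0))

def lam(x, y, alpha):
    """lambda = 1 - M(X, xi)/M(X, Y): near 1 suggests linearity, near 0 nonlinearity."""
    b1, b0 = np.polyfit(x, y, 1)
    xi = y - (b0 + b1 * x)
    m_xy = renyi_mi(joint_hist(x, y), alpha)
    m_xr = renyi_mi(joint_hist(x, xi), alpha)
    return 1.0 - m_xr / m_xy

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 2000)
y_lin = 2 * x + rng.normal(0, 0.5, 2000)
y_cur = x ** 2 + rng.normal(0, 0.5, 2000)

# Sweep a few alpha values on either side of 1.
lam_lin = [lam(x, y_lin, a) for a in (0.5, 2.0, 4.0)]
lam_cur = [lam(x, y_cur, a) for a in (0.5, 2.0, 4.0)]
```

For the linear series λ stays near 1 across α, while for the curvilinear series it stays near 0, mirroring the pattern reported in Table 1.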
Changing the scale parameter α of the mutual information measures naturally changes their sensitivity, and by plotting λ values against the scale parameter α, the change in sensitivity can be displayed graphically. The λ R and λ T values for different α values are shown in the graphs. As can be seen from Figure 2, for α > 1, λ T is more successful and stable than λ R for a linear relationship. In addition, λ T consistently takes values close to 1, whereas λ R gets smaller as α increases. As seen in Figure 3, in the curvilinear relationship, λ T starts to grow after α = 1.4, while λ R takes values close to zero for all values of α; even so, λ T reaches a maximum value of only 0.09101. Both mutual information measures can thus be used as criteria for nonlinearity, but λ R indicates nonlinearity more consistently. Because there is no logarithmic function in the Tsallis mutual information formula, when α takes a value greater than 1, Tsallis mutual information weights deviations from linearity less heavily than Rényi mutual information does. This makes λ T less sensitive, and therefore more unresponsive, to nonlinearity than λ S and λ R ; for the same reason, λ T represents linearity better than λ R in a linear relationship. An important general property of Rényi entropy is that, for a given probability distribution, it is a monotonically decreasing function of α, where α is an arbitrary real number other than 1. Therefore, as can be seen in Figure 2, increasing α further will not provide additional information, so α values are limited to 5.

Method for Bin-Size Selection
Mutual information depends mainly on both the bin size and the sample size; thus, a natural question arises about the optimal choice of one parameter given the value of the other. Here, we use the Freedman-Diaconis rule for finding the optimal number of bins. According to this rule, the optimal bin width is calculated on the basis of the interquartile range (IQR = Q 3 − Q 1 ) and the number of data points n:

Δ bin = 2 × IQR(X) × n −1/3 . (16)

Freedman and Diaconis use the IQR of the data instead of the standard deviation; therefore, this method is described as more robust than some of the other methods.
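The rule above can be sketched as follows; NumPy exposes the same estimator via np.histogram_bin_edges(x, bins='fd'), and the helper names below are ours:

```python
import numpy as np

def fd_bin_width(x):
    """Freedman-Diaconis bin width: 2 * IQR(X) * n^(-1/3)."""
    q1, q3 = np.percentile(x, [25, 75])
    return 2.0 * (q3 - q1) * len(x) ** (-1.0 / 3.0)

def fd_bin_count(x):
    """Number of bins implied by the Freedman-Diaconis width."""
    return int(np.ceil((np.max(x) - np.min(x)) / fd_bin_width(x)))

# Toy data: for x = 0, 1, ..., 999 the IQR is 499.5 and n = 1000,
# so the rule gives a width of 2 * 499.5 / 10 = 99.9, i.e. about 10 bins.
x = np.arange(1000.0)
```

Because the width is driven by the IQR rather than the standard deviation, a few extreme observations widen the bins far less than they would under variance-based rules.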
The Freedman-Diaconis rule takes into account the asymmetry of the data and sets the bin size to be proportional to the IQR [25].

East Asia and Asia-Pacific Countries (1971-2016)


Model
To test the EKC hypothesis, a simple linear regression model is applied. Using the ordinary least squares procedure, we find a quadratic relationship ("inverted U-hypothesis") between CO 2 emissions (metric tons per capita) and GDP per capita (current USD) in a time series of East Asia and Asia-Pacific countries (excluding high-income countries) over a 46-year period.
East Asia and Asia-Pacific countries were classified initially as low income (LIC) in the 1990s and then as lower middle income (LMC) in 2010. In fact, the highest growth rate of CO 2 emissions (5.6% over 1990-2008) was observed in the East Asia and Asia-Pacific region, where the highest GDP growth rates (7.2% over 1990-2000 and 9.4% over 2000-2010) were achieved.
We first examine the residual diagrams from the linear regression model to determine whether there are serious deviations from the assumptions. In Figure 4a, nonlinearity is apparent, whereas in Figure 4b, the deviation from the normality assumption can be seen. According to a quick visual check of the residuals in Figure 4a, a quadratic model seems more appropriate. In Table 2, the results of the quadratic models are given. The scatter diagram of the CO 2 and GDP variables is shown in Figure 5.

To test the appropriateness of a simple linear regression function, the null and alternative hypotheses are given as follows:

H 0 : E{Y} = β 0 + β 1 X, H a : E{Y} ≠ β 0 + β 1 X.

The general linear test statistic for the simple regression model is as follows:

F* = [(SSE(R) − SSE(F))/(df R − df F )]/[SSE(F)/df F ].

When we look at the results shown in Table 3, F* > F(0.05; 3, 41) = 2.833, so we reject the null hypothesis H 0 . This means that the linear regression function does not provide a good fit for the data. The dependence measures are r 2 = 0.91 and η 2 YX = 0.96. A nonzero value of η 2 YX − r 2 is associated with a departure from linearity; the calculated value of this difference is η 2 YX − r 2 = 0.05. To test the significance of this difference, the hypotheses are given as follows: H 0 : the relationship between X and Y is linear; H a : the relationship between X and Y is not linear. The test statistic is

F = [(η 2 YX − r 2 )/(c − 2)]/[(1 − η 2 YX )/(n − c)],

where c is the number of distinct X levels. This value of F also indicates a significant departure from linearity.
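The general linear test can be sketched as follows. The reduced model is the simple linear fit; we assume a quartic full model, which reproduces the 3 and 41 degrees of freedom for n = 46 (the degree choice is our reading, not stated explicitly in the text). The gdp and co2 series are synthetic stand-ins, not the actual data:

```python
import numpy as np

def general_linear_F(x, y, reduced_deg=1, full_deg=4):
    """F* = ((SSE_R - SSE_F)/(df_R - df_F)) / (SSE_F/df_F) for nested polynomial fits."""
    n = len(x)
    def sse(deg):
        resid = y - np.polyval(np.polyfit(x, y, deg), x)
        return float(resid @ resid)
    sse_r, sse_f = sse(reduced_deg), sse(full_deg)
    df_r, df_f = n - (reduced_deg + 1), n - (full_deg + 1)
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

rng = np.random.default_rng(3)
gdp = np.linspace(1.0, 10.0, 46)                     # 46 "years", as in the EKC sample
co2 = -(gdp - 6.0) ** 2 + rng.normal(0, 0.5, 46)     # inverted U plus noise
co2_lin = 2.0 * gdp + rng.normal(0, 0.5, 46)         # purely linear control

F_star = general_linear_F(gdp, co2)       # compare with F(0.05; 3, 41) = 2.833
F_control = general_linear_F(gdp, co2_lin)
```

An inverted-U series produces an F* far above the 2.833 critical value, rejecting the linear fit, while a genuinely linear series does not.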

Testing Linearity on the Basis of Shannon, Rényi, and Tsallis Mutual Information Measures
The method of Tanaka, Okamoto, and Naito [17] and Smith [1] is based on comparing the Shannon mutual information of the original data series with that of new series obtained by removing the linear dependence from the original ones.
Entropy and mutual information calculations are based on a contingency table.

A possible reason for the EKC pattern may lie in the fact that in poor countries, most of the output is produced in the agricultural sector, so CO 2 emissions are lower in these countries than in other countries. In middle-income countries, pollution begins to increase; as a country grows further, it tends to switch to cleaner technologies.
Here, on the basis of the Freedman-Diaconis rule, the optimal number of bins is calculated and presented in Table 4. To detect nonlinearity by using the Shannon, Rényi, and Tsallis mutual information measures, the results for different values of α are reported in Table 5. To evaluate the degree of nonlinearity included in the dependence, the two mutual information measures M(X,Y) and M(X,ξ) were compared. When M(X,ξ) is close to M(X,Y), the dependence is interpreted to be nonlinear, so the proposed λ S , λ R , and λ T measures are considered as criteria of nonlinearity.
As seen in Table 5, λ S , λ R , and λ T are close to zero, so the relationship is nonlinear. As can be checked against the simulation results in Table 1, λ T for α < 1 and λ R for α > 1 reveal the curvature more successfully, and the results obtained from the EKC data also support this. In sum, the λ S , λ R , and λ T values near zero indicate a curvilinear relationship, which supports the EKC hypothesis. The relationship between λ and α can be seen in Figure 6.

Conclusions
The environmental Kuznets curve (EKC) hypothesizes that the relationship between environmental quality and real output has an inverted U shape. Using the ordinary least squares estimation procedure, we have found a quadratic relationship between CO 2 emissions and GDP in a time series of East Asia and Asia-Pacific countries (excluding high-income countries) over a period of 46 years. One technique to check the EKC hypothesis utilizes an F test, by which we have concluded that the linear model does not provide a good fit for the data. As a second technique, comparing the linear coefficient of determination with the correlation ratio may be useful. Again, for the EKC data, the difference between these two association measures was found to be significant, indicating curvilinearity. Alternatively, the difference between two dependence measures based on mutual information can be used. Although Shannon mutual information has been used more often in the literature, we have suggested that the Rényi and Tsallis mutual information measures capture the nature of the relation between the variables better because of their parametric flexibility.

In this study, the mutual information between dependent and independent variables (M(X,Y)) was found first. Secondly, by using a simple linear regression model, the residuals (ξ) were calculated. Then, the mutual information between the independent variable and the residuals (M(X,ξ)) was obtained. Finally, by comparing these two mutual information measures, the degree of nonlinearity included in the dependence was determined. We also proposed a measure of nonlinearity, λ, and demonstrated that the Rényi and Tsallis mutual information measures determined nonlinearity better for certain ranges of α values compared with the Shannon mutual information measure.
Applications of all these measures to the CO 2 emissions and GDP data underlined curvilinearity, and hence, the pattern presumed by the EKC hypothesis was realistic. These results suggest that the "grow and pollute now, clean later" strategy wastes a great deal of resources and has enormous environmental costs. Therefore, countries should seek alternative growth strategies.