A Regression Analysis of the Carbon Footprint of Megacities

Urbanization and climate change are two major issues that humanity faces in the 21st century. Megacities are large urban agglomerations with more than 10 million inhabitants that emerged in the 20th century. The world’s top 100 economies include many North and South American megacities, such as New York, Los Angeles, Mexico City, Sao Paulo and Buenos Aires; European cities such as London and Paris; and Asian cities such as Tokyo, Osaka, Seoul, Beijing and Mumbai. This paper addresses a dearth of megacity energy metabolism models in the literature. Cross-sectional data for 36 global megacities were collected from many literature and Internet sources. Variables included megacity name, country and region; population; area; population density; (per capita) GDP; income inequality measures; (per capita) energy consumption; household electricity prices; (per capita) carbon and ecological footprint; degree days; average urban heat island intensity; and temperature and precipitation. A descriptive comparison of the characteristics of megacities was followed by ordinary least squares with heteroskedasticity-robust standard errors that were used to estimate four alternative multiple regression models. The per-capita carbon footprint of megacities was positively associated with the megacity GDP per capita, and the megacity ecological footprint; and negatively associated with country income inequality, a low-income country dummy, the country household electricity price, and the megacity annual precipitation. Targeted policies are needed, but more policy autonomy should be left to megacities. Collecting longitudinal data for megacities is very challenging but should be a next step to overcome misspecification and bias issues that plague cross-sectional approaches.


Introduction
Cities have been the "hotspots of human economic activity" [1] since the onset of agriculture 10,000 years ago [2]. The industrial revolution brought factory jobs near cities and motivated migration from rural to urban areas [3]. This was a momentous demographic change [4] that altered the rural and urban landscape [5]. While cities occupy only 2% of the earth's area, they account for about 70% of energy consumption [3] and produce about 60% of global greenhouse gas emissions [6].
Understanding the factors that drive the carbon emissions of megacities, i.e., the largest cities in the world, is an important task in the fight against climate change. Yet, such an empirical analysis for a complete list of global megacities is lacking from the published literature.
This work follows from the review of Tasios, Koumenou and Paravantis [7], who presented a qualitative overview of the contribution of megacities to climate change by reviewing trends and literature findings related to growth and expansion over time; energy consumption and carbon dioxide emissions; ecological footprint; urban sprawl and transport governance; and the Urban Heat Island (UHI) and heat mortality. Although they did not carry out any statistical analysis of empirical data, in their suggestions for further work, they mentioned forthcoming research that would carry out such an analysis for a complete list of global megacities.
This empirical research is the promised continuation of their study, assessing the contribution of megacities to climate change via their carbon footprint. Multiple regression is carried out on cross sectional data for 36 global megacities with population greater than 10 million inhabitants, associating their carbon footprint with socioeconomic factors (including income inequality) and geographic factors (especially precipitation).
To the knowledge of the authors of this work, such an empirical analysis has not been published in the research literature, so the results of this research fill a gap. Although cross sectional data lack a time component, the findings of this work are significant and interesting. They provide a basis for further research with longitudinal data, although collecting such data for megacities will be a difficult task. If the factors driving the carbon footprint of megacities are understood, targeted policies may be undertaken by local authorities that will help in the fight against climate change.
As for the rest of this paper, Section 2 presents the literature review culminating in the research question. Section 3 describes the data and outlines the methodology. Section 4 presents the results of the analysis. Section 5 discusses some of these results. Finally, Section 6 contains the conclusions, lists the limitations of this work, and presents recommendations for further research.

Literature Review
From 1950 until 1980, the world experienced intense industrialization and urbanization. Cities were rebuilt, new cities were established, and industrial production grew [8]. Urban population, which equaled 30% of the total population in 1950, surpassed 50% around 2007 and was around 55% in 2018 [9]. The process of urbanization is illustrated in Figure 1.
In 1950, about 60% of the population lived in urban areas with fewer than 300,000 inhabitants, while just 17% resided in urban areas with 1 to 5 million inhabitants. By 2015, about 42% of the population lived in urban areas with fewer than 300,000 inhabitants, having dropped by 17.5 percentage points since 1950. At the same time, the population living in urban areas with 1-5 million and 10 million or more inhabitants increased to 22% and 12% respectively, with the latter presenting the highest increase over time (8.5 percentage points). As can be seen from the steady slope of the corresponding line in Figure 1, the population of cities inhabited by 300 thousand to one million remained a constant percentage of the total.
The massive migration from rural areas to urban centers resulted in an increasing number of very large cities, usually with converging metropolitan areas and a population over 10 million people, referred to as megacities [1,10]. In 2018, over half a billion people lived in 36 megacities around the globe. To meet their needs and sustain their development, megacities are in constant demand of more food, water, fuel and energy, which in turn causes increasing emissions, refuse and wastewater disposal [11].
Urbanization intensifies environmental impacts such as traffic air pollution; morbidity and mortality especially during heatwaves [12]; and social problems (e.g., crime) that have a bigger impact on vulnerable social groups such as the elderly, who live on limited financial resources, and cannot relocate easily. In particular, megacities have a de facto front runner position in addressing climate change [13]. According to the 2010 World Energy Outlook [14], urban areas were responsible for 71% of global energy-related carbon emissions. The Urban Heat Island (UHI) is one of the most documented urban aspects of climate change [15] and can be used as an indicator of urbanization [16]. The UHI Intensity equals the maximum temperature difference between the urban area and its rural environment [17]. Most documented cases have reported UHI intensity varying from 1 to 15 °C [15,18,19,20].
Carbon emissions may be measured by the carbon footprint. Carbon footprint equals the total greenhouse gas emissions, direct and indirect. Several factors are thought to influence the carbon footprint. Little work has been published on the factors that determine the carbon footprint of global megacities. This task is important because of the preeminent role of megacities in climate change.
Motivated by a dearth of available data on the carbon emissions of metropolitan areas, Sovacool and Brown [21] carried out a preliminary assessment of the carbon footprint of 12 megacities: four of the most densely populated cities in the world (Sao Paulo, New Delhi, Manila, and Singapore); five of the most populated cities in the world (Beijing, Tokyo, New York, Jakarta, and Seoul); and three of the cities with the largest land area (Los Angeles, London, and Mexico City). They aimed to calculate the carbon footprint of these megacities, ascertain whether it was lower or higher than the corresponding national average, compare the megacities to one another, and present implications for public policy. They defined a metropolitan area according to its political boundaries, and accounted for emissions from personal and mass transportation, buildings and industry, agriculture and forestry, and wastes, ignoring activities such as heavy freight, air transportation, and marine bunkering of oil. They measured direct emissions and those called responsible emissions, i.e., emissions from products that were produced in a metropolitan area but consumed elsewhere. They ignored deemed emissions, i.e., the opposite of responsible, and logistic emissions, i.e., those related to products passing through the metropolitan area. Sovacool and Brown presented no intermediate indicator values nor any computational details on how the carbon footprint values of these 12 metropolitan areas were calculated, but discussed policy suggestions on population density, transport modes, electricity supply, and tradeoffs. Los Angeles had the biggest metropolitan footprint per capita (3.68 metric tons), followed by Singapore, New York, Mexico City, Tokyo, and Seoul; New Delhi had the smallest (0.70 metric tons). 
The 12 metropolitan areas that were examined are a small sample of the megacities in existence globally, and the authors pointed out that their emission estimates may vary in quality and coverage because of their reliance on various literature sources.
Motivated by the need to understand the drivers behind the carbon footprint, Minx et al. [22] applied a hierarchical three-step (local, regional, and national level) hybrid method, linking global supply chains to local activities and lifestyles for 434 urban and rural municipalities in the UK. Their method integrated consumption and geodemographic data from multiple sources, prioritizing the most robust information. More detailed local and regional data sources were used when possible. Carbon footprint estimates were compared with extended territorial carbon emission accounts. Both the highest and lowest carbon footprint values were found in urban areas. Generalized additive models (GAMs), a generalization of ordinary least squares, were estimated. Variables used included income per capita, household size, per capita car ownership, and the proportion of highly educated people. It was found that the carbon footprint of cities was mainly determined by sociodemographic, and less so by infrastructural and geographic factors. Carbon emissions were found to increase with growing income, education, car ownership, and decreasing household size. The impact of population density was small, but statistically significant. Heating degree days were insignificant for the carbon footprint as a whole.
In their review, Wiedenhofer et al. [23] examined the role of urban form in the relationship of household consumption, time use, and carbon footprints. The authors stressed the importance of land use, infrastructure services, and equity, further to population density, which is often cited in this respect. Income was found to have overriding relevance and was a major predictor of the carbon footprint of households, while income inequality has a substantial impact on its distribution. Household size is a key factor in reducing carbon footprints due to sharing appliances and living spaces and thus achieving household economies of scale. Education, gender, and age had mixed small effects, depending on the country and socioeconomic group. Urban form and population density are considered major factors enabling lower energy use and emissions, an effect described as urban economies of scale. Population density may interact with income, e.g., in the UK, income is a main predictor of carbon footprint at lower population densities. All in all, shorter travel distances, higher shares of public transport, and more cycling and walking offer potential for the mitigation of urban emissions.
Motivated by the lack of understanding of how carbon footprints are distributed among cities and how they vary by type of urban settlements, Moran et al. [24] calculated the carbon footprint of 13,000 cities worldwide, including all global megacities. They adopted a top-down approach, using gridded population and income data to disaggregate existing carbon footprint values in four steps. Their model used urban versus rural consumption patterns and purchasing power as the main predictors of the carbon footprint per capita. Population and population density values were used to identify cities, which was a nontrivial task especially for contiguous conurbations such as Tokyo/Yokohama. Using various assumptions, their model managed to allocate all carbon emissions in a manner that was subjected to sensitivity analysis. It was found that a relatively small number of urban areas account for a disproportionate share of the global carbon footprint. Corroborating other literature findings, it was found that the top 10% of income earners are responsible for at least 38% of global greenhouse gas emissions. Urban areas with high carbon footprint were found even in countries with low total and per capita emissions, such as Dhaka (Bangladesh), Cairo (Egypt), and Lima (Peru). The largest urban clusters had carbon footprint in excess of their direct emissions, underscoring the need to account for indirect carbon emissions derived from food, paper use, transportation, waste disposal, etc. Moran et al. argue that state and local authorities may benefit by understanding the distribution and drivers of carbon footprint, and low-carbon programs will be more effective if they consider local consumer income and consumption patterns.
Bargaoui, Liouane and Nouri [25] used the STIRPAT (Stochastic Impacts by Regression on Population, Affluence and Technology) approach to study the carbon dioxide emissions of 211 countries for the period 1980-2010. Independent variables included: population; Gross Domestic Product (GDP) per capita; industrial activity and/or energy efficiency; and Kyoto protocol ratification. They allowed for country- and year-specific effects to capture country heterogeneity and time effects. Alternative static and dynamic models were estimated, with corrections to remove any endogeneity bias caused by the lagged carbon dioxide emission values. Significant effects were found for population, economic growth, urbanization, and Kyoto protocol ratification. Urbanization impacted carbon dioxide emissions differently according to income levels. GDP per capita exerted a significant positive effect on carbon dioxide emissions, except for low-income countries.
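For readers unfamiliar with STIRPAT, the model is commonly written in the general form below, which becomes linear in its coefficients after taking logarithms. This is the standard formulation, given here only as a sketch; the symbols (I for environmental impact, P for population, A for affluence, T for technology, and the observation index i) are notational assumptions and do not reproduce the exact specification estimated in [25].

```latex
I_i = a \, P_i^{b} \, A_i^{c} \, T_i^{d} \, e_i
\quad\Longrightarrow\quad
\ln I_i = \ln a + b \ln P_i + c \ln A_i + d \ln T_i + \ln e_i
```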
Noorpoor and Kudahi [26] used the STIRPAT approach to model the carbon dioxide emissions of the power sector of Iran. Data were collected from various sources; the carbon dioxide emissions corresponding to the energy consumption of the residential, industrial, public, agricultural, trade, and lighting sectors were estimated using the emission factors of the Intergovernmental Panel on Climate Change (IPCC) guidelines. Grid losses and the energy consumption of power plants were also taken into consideration. The following independent variables were used, all in logarithm (log) form: population; GDP per capita; electricity intensity (in kWh/US$); and electricity generation from natural gas, heavy oil, gas oil, and the sum of hydropower, renewable energy, and nuclear energy. Population and GDP per capita (or economic level) played a role in other similar studies reviewed by Noorpoor and Kudahi. A regression model was estimated on short time series (11 years) data by partial least squares and had an impressively high coefficient of determination (R² = 0.999, with little other fit information presented). It was found that population, GDP per capita, electricity intensity, and the consumption of fossil fuels for electricity generation influenced carbon dioxide emissions positively. Electricity generation by hydroelectric, renewable and nuclear energy was the only variable that was associated with fewer carbon dioxide emissions.
Wang et al. [27] aimed to develop a reliable statistical estimation methodology for STIRPAT models, by addressing heterogeneity and non-normality in the data. Data were collected from 76 Chinese cities for an 11-year period by resorting to the yearbooks of the cities. Significant disparities in the carbon emissions per capita were noted among the Chinese cities. Their dependent variable was defined as the logarithm of carbon emissions per capita. The following independent variables were used, also log transformed: GDP per capita; squared GDP per capita (to identify the existence of an Environmental Kuznets Curve (EKC) relationship, if its coefficient were negative); average annual resident population; percent of urban in resident population; share of coal in energy consumption; and share of value added in secondary industry. An Asymmetric Laplace Distribution Mixture Model was estimated and compared to other methods including Ordinary Least Squares. The results indicated the existence of EKC for Chinese cities and proposed that more policy autonomy be left to them. The authors pointed out that a bottom-up approach, involving local authorities in the determination of their most pressing environmental issues and the development of their own individualized plans for mitigating emissions, is compatible with the approach of the 2015 United Nations Paris Climate Change Conference.
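The role of the squared income term can be made concrete with a small sketch. Assuming a quadratic-in-logs specification ln E = b0 + b1 ln y + b2 (ln y)^2 (an assumed functional form for illustration, not necessarily the one estimated in [27]), an inverted U requires b2 < 0, and emissions per capita peak at the income level y* = exp(-b1 / (2 b2)). The function name and the coefficient values below are hypothetical.

```python
import math


def ekc_turning_point(b1, b2):
    """Income level at which emissions peak in the assumed model
    ln E = b0 + b1 * ln(y) + b2 * ln(y)**2.

    Setting the derivative with respect to ln(y) to zero gives
    ln(y*) = -b1 / (2 * b2), which is a maximum only when b2 < 0.
    """
    if b2 >= 0:
        raise ValueError("an inverted-U (EKC) shape requires b2 < 0")
    return math.exp(-b1 / (2.0 * b2))


# Hypothetical coefficients, chosen only for illustration.
peak_income = ekc_turning_point(b1=2.0, b2=-0.1)
print(f"Emissions peak at an income of about {peak_income:,.0f} (same units as y)")
```

A turning point that lies far outside the observed income range would suggest that the estimated curve is effectively monotonic over the sample, even if the squared term is negative.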
The ecological footprint is related to the carbon footprint of cities. Baabou et al. [28] computed the ecological footprint of 19 coastal Mediterranean cities. In their analysis, Baabou et al. examined how the ecological footprint varied by consumption category (food, transport, goods, housing, gross fixed capital formation, services, and government), land use type, and time. They found that the ecological footprint per capita of most cities exceeded the corresponding value of their country. To get a sense for the elasticities of the various categories, an Ordinary Least Squares regression model of the ecological footprint was estimated on only 17 observations for the five independent variables of food, housing, goods, services, and transportation. All variables were in log-transformed values. It was found that differences among the ecological footprint of cities are likely driven by socioeconomic factors like disposable income, infrastructure, and cultural habits.
Motivated by the insufficient understanding of the potential for urban mitigation of climate change, Creutzig et al. [29] analyzed 274 cities of all sizes worldwide based on three global datasets. Multiple linear regression models of greenhouse gas emissions per capita, final energy per capita, and urban transport energy per capita were estimated via the backward elimination procedure, with GDP per capita, population density, heating degree days, and fuel price as independent variables. It was found that GDP per capita and heating degree days were positively associated, while population density and fuel price were negatively associated with the dependent variables. All data were in natural log form, which, as the authors noted, allows convenient interpretation of the coefficients that are independent of units. Cooling degree days, household size, urbanization rate, and a commerce center index were independent variables that were not selected. The authors of this research prefer the estimation of a single best model formulation (developed on strong theoretical reasoning) to automatic atheoretical approaches such as backward elimination. Furthermore, three-level threshold regression was used to split cities based on economic activity, population density, gasoline prices, and heating degree days, uncovering eight city typologies. Energy use was found to vary with increasing GDP per capita especially for values up to 10,000 US$; this increase slowed down above 30,000 US$. It was found that economic activity, transport costs, geographic factors, and urban form explained 37% of urban direct energy use, and 88% of urban transport energy use. It was concluded that effective urban planning and transport policies must vary by city type. Higher gasoline prices for affluent cities in developed countries, and higher population densities and compact urban form for cities in developing countries, can limit energy use and mitigate the carbon emissions of cities.
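The unit independence of coefficients in a log-log model, noted above, follows from a short derivation. The sketch below uses a single regressor for brevity; the symbols y, x, the coefficients, and the rescaling constant k are notational assumptions, not taken from [29].

```latex
\ln y = \beta_0 + \beta_1 \ln x + \varepsilon
\quad\Longrightarrow\quad
\beta_1 = \frac{\partial \ln y}{\partial \ln x} = \frac{\partial y / y}{\partial x / x},
\qquad
\beta_1 \ln(kx) = \beta_1 \ln k + \beta_1 \ln x
```

The slope is therefore an elasticity (the approximate percent change in y for a 1% change in x), and expressing x in different units (multiplying it by a constant k) shifts only the intercept.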
Regarding the methodologies that may be used to analyze carbon footprint, Hailemariam, Dzhumashev and Shahbaz [30] investigated the association of carbon emissions with income inequality and economic growth. Income inequality is a slow-moving process, so it was pointed out that the lack of reliable historical cross-country data makes controlling for unobserved common factors difficult because modern panel data estimation techniques require large samples over a lengthy period of time. Furthermore, it was argued that cross-sectional analyses may yield biased and inconsistent estimates. It was also reported that most previous studies do not account for endogeneity (caused by simultaneity, omission of relevant variables, and measurement errors), heterogeneity, and cross-sectional dependence.
To capture income inequality, Hailemariam, Dzhumashev and Shahbaz [30] used the Gini index and had cross-country annual data spanning the period from 1945 to 2010. They argue that their study also captured the Veblen effect, i.e., the emulative tendency of the wealthy to consume expensive items as a means of confirming their status. Their findings revealed that an increase in the top income inequality is associated with an increase in carbon emissions. The effect of income on emissions was conditional on the level of economic development, playing a negative role on carbon emissions when economic development is high. These findings were consistent with the Environmental Kuznets Curve (EKC) hypothesis, according to which environmental quality deteriorates with per capita income up to a point (as economic growth takes precedence over a clean environment); after that, environmental quality improves, behaving like a superior good (as the wealthy, having solved their economic problems, now develop a preference for a cleaner environment to complement their quality of life).
In comparing their results to the published literature, Hailemariam, Dzhumashev and Shahbaz [30] reviewed inconclusive and conflicting findings on the relationship of income and income inequality with environmental quality, even including no significant effects of the Gini index on emissions. They attributed this lack of agreement to model misspecifications and lack of comparability of data over space and time in previous studies. In interpreting their literature findings, some fine points should be taken into consideration, e.g., the existence of richer households does not necessarily imply greater inequality.
In a couple of the few empirical works similar to this research, with statistical analysis of the energy metabolism of global megacities, Kennedy et al. [6,31] reported on 27 megacities with population greater than 10 million in 2010. They sought to study the urban metabolism and identify associated biophysical characteristics that may be used to compare megacities. Among other goals, Kennedy et al. quantified the energy flows for dominant forms of consumption, and analyzed factors correlated with them. The 27 megacities examined consumed 6.7% of the global energy consumption, 9.3% of global electricity, and 9.9% of global gasoline. The majority of megacities are in developing regions of the world, particularly Asia, with hot climate and low heating requirements. Although these megacities included some of the wealthiest cities in the world, they are characterized by extreme poverty levels, and socio-spatial fragmentation.
In a macroscale analysis, Kennedy et al. [31] reported that urban density is significantly related with transportation energy consumption (if a wide range of densities is examined). They also reported that the per capita use of heating and industrial fuels is significantly correlated with heating degree days. In terms of statistical analysis, Kennedy et al. [6,31] estimated stepwise linear regression models with only one or two independent variables, including the 10-year population growth; GDP; 10-year GDP growth rate; 10-year growth rate of electricity; 10-year growth rate of transportation; urbanized area per person; residential and total gross floor area; and heating degree days. Reporting that little previous research had explored differences in electricity use between global cities, Kennedy et al. found the electricity use per capita in megacities to be significantly correlated with the urbanized area per capita. Electricity use per capita increased for lower density cities. It was hypothesized that lower density megacities have greater building floor space per capita, leading to higher electricity consumption for lighting and other building uses; that turned out to be a less significant factor in a microscale analysis of a couple of megacity subareas. The GDP per capita was also significantly correlated with the per capita electricity use, but it was dropped out of the stepwise regression analysis because of less explanatory power. In their microscale analysis, Kennedy et al. (2015a) focused on the association of building floor area with electricity use in subareas of London and Buenos Aires. Their findings reflected spatial variation in wealth and possible spatial tradeoffs between living space and disutility of travel.
Unfortunately, Kennedy et al. [31] did not estimate a single multiple linear regression model with as many independent variables (guided by theory and the literature) as allowed by the available cases. Furthermore, some of the figures of Kennedy et al. (including those in the supplementary material) show linear relationships between variables that were characterized by heteroscedasticity and the presence of outliers, e.g., GDP per capita versus area per person [31]; commercial/industrial electricity use versus commercial/industrial floor space [31]; and transportation fuel use per capita versus GDP per capita [6]. Such variables should have been log transformed to remove skewness before they were analyzed by regression.
Although the resource flows and the wastes produced by megacities have global environmental impacts, their quantification is rarely undertaken, a gap that hampers the development of policy [6]. Although the work of Kennedy et al. [6,31] was pioneering, it underscored the need for more complete and rigorous statistical analysis of the energy metabolism and carbon emissions of megacities. Unfortunately, as is the experience of the authors of this research, collecting building data for megacities is a challenging task.
The previous paragraphs looked at how urbanization has created megacities; presented a short overview of the environmental impacts of megacities, including their role in global climate change; noted that direct and indirect carbon emissions may be represented by the carbon footprint; presented empirical works addressing the factors that determine carbon emissions including STIRPAT approaches; and examined the few works that carried out statistical analysis of related megacity data. Little research has been directed at modeling the carbon footprint of all global megacities and the factors that determine it. This study addresses this literature gap, by developing a multiple regression model of the carbon footprint of 36 global megacities with cross sectional data that include geographic, socioeconomic, energy, and environmental factors. The fact that the data on which the models were estimated are not longitudinal presents certain inherent limitations. Nevertheless, this study is a solid first step in determining the driving factors of the carbon footprint of global megacities and a useful template for comparing megacities and considering mitigation policies.

Materials and Methods
Multiple regression analysis was used. Collecting longitudinal data for megacities would be quite tedious because time series data on global megacity variables are not readily available. So, this research assembled cross-sectional data on megacities. Minitab (Minitab Ltd., Coventry, UK) was used for graphing, and the freeware Gretl (Gnu Regression, Econometrics and Time-series Library, http://gretl.sourceforge.net/) econometric package for model estimation [32].
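To illustrate what ordinary least squares with heteroskedasticity-robust standard errors involves, the following is a minimal pure-Python sketch for the simplified case of a single regressor. It is not the Gretl implementation used in this research: the function name, the choice of the HC1 small-sample correction, and the data are all assumptions made for illustration.

```python
import math


def ols_simple_robust(x, y):
    """OLS of y on a constant and one regressor x.

    Returns (intercept, slope, robust_se_slope), where the standard error
    uses White's sandwich estimator with the HC1 correction n / (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich variance of the slope: sum((x_i - x_bar)^2 * e_i^2) / Sxx^2.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_slope = math.sqrt(meat / sxx ** 2 * n / (n - 2))
    return intercept, slope, se_slope


# Hypothetical values standing in for a logged regressor and a logged response.
ln_x = [6.3, 7.7, 9.2, 10.4, 11.1, 11.4]
ln_y = [-0.9, -0.2, 0.6, 1.1, 1.9, 1.6]
a, b, se_b = ols_simple_robust(ln_x, ln_y)
print(f"slope = {b:.3f}, robust SE = {se_b:.3f}")
```

Unlike the classical formula, the robust standard error weights each squared residual by the squared deviation of its own regressor value, so it remains valid when the error variance differs across observations.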
Data were collected from numerous literature publications and online sources that are shown in the rightmost column of Table 1. Most data were available for 2018, with some data being available for the two previous years. Carbon footprint was chosen as the dependent variable, with data from Moran et al. [24]. Population data of the correct year were used to compute various indexes. Unfortunately, most energy data were only available for megacities up to 2011; these energy data were investigated but not used in the final regression model. Variable definitions, units and some descriptive statistics are shown in Table 1.
The number of nonmissing and missing cases per variable is shown in the fourth column of Table 1. Of the 27 variables, 18 (67%) had no missing cases. Of the rest, four (15%) had nine missing cases. No effort to impute missing values was made. No variable was ignored because of missing values. As it turned out, the final selection of independent variables in the regression models rendered 32 nonmissing cases.
Variables that exhibited positive skewness were transformed with natural logarithms (with LN prefixed to the variable name). Such log transformations are common in econometric analysis (as well as in studies reviewed herein). When used carefully [33], they can make data conform more closely to the normal distribution and help avoid nonlinear correlations. They can also allow the estimation of regression models on data that would violate the assumptions of ordinary least squares if they were not log transformed. Finally, log transformed data may be easier to interpret.
The 36 megacities examined are shown in Table 2. Although London, Seoul, and Tehran had fewer than 10 million people, they were included in the list. Of the 36 megacities, 22 were in Asia; 6 in South America; Europe and Africa had 3 megacities each; and North America had 2. Of the 36 megacities, 6 were in China; 5 were in India; Brazil, Japan, Pakistan, and the United States had 2 megacities each; the other countries had one megacity each.
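The effect of the log transformation on positively skewed variables, described in this section, can be illustrated with a short sketch. The helper names and the sample values are hypothetical and are not drawn from the megacity dataset; skewness is computed here as the moment coefficient m3 / m2^1.5, which is one of several common definitions.

```python
import math


def sample_skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural log of each value; only defined for strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical right-skewed variable with one very large observation.
raw = [1, 2, 2, 3, 4, 5, 8, 40]
logged = log_transform(raw)
print(f"skewness before: {sample_skewness(raw):.2f}, after: {sample_skewness(logged):.2f}")
```

In this made-up example the log transformation pulls in the long right tail and reduces the skewness, although it does not remove it entirely.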
The 36 megacities had a total population of 562.066 million, which represented 7.4% of the global population of 2018 (7.592 billion, according to the latest World Bank data available at https://data.worldbank.org/indicator/SP.POP.TOTL). The 22 megacities that were located in Asia had a total population of 368.553 million, accounting for almost two thirds (64.5%) of the total megacity population. Tokyo (Japan) was the most populated megacity, housing 37.5 million people, followed by Chongqing (China, 29.9 million), New Delhi (India, 28.5 million), Shanghai (China, 25.5 million), Sao Paulo (Brazil, 21.5 million), and Cairo (Egypt, 20 million). Seoul (South Korea), London (United Kingdom), and Tehran (Iran) were the smallest megacities with population below 10 million.
The average population density was highest for African and Asian megacities. Dhaka was the most densely populated megacity with 53,578 inhabitants per km². Karachi, Lahore, Kinshasa, Mumbai and Bogota had population densities decreasing from 33.9 to 26.6 thousand inhabitants per km². Next there was a group of 11 megacities with population densities between 19.2 and 10.2 thousand inhabitants per km². The rest of the megacities had population densities lower than 10 thousand inhabitants per km². Megacities with large area and relatively low population were characterized by urban sprawl.
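The share and density figures quoted above rest on simple ratios, sketched below. The two population totals are those reported in the text; the function names are hypothetical, and the population and area passed to the density function are made-up values used only to show the calculation.

```python
def share_percent(part, total):
    """Percentage of `total` represented by `part` (same units for both)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def population_density(inhabitants, area_km2):
    """Inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return inhabitants / area_km2


# Totals reported in the text, both in millions of people.
megacity_population = 562.066
world_population_2018 = 7592.0
print(f"{share_percent(megacity_population, world_population_2018):.1f}% of world population")

# Hypothetical city: 12 million inhabitants on 600 km2.
print(f"{population_density(12_000_000, 600):,.0f} inhabitants per km2")
```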
New York had the highest GDP (1751 billion US$), with Tokyo and Los Angeles being next (976 and 941.06 billion US$ respectively); Paris, London, Shanghai and Beijing were next, with GDPs decreasing from 681 to 513 billion US$. The other megacities had GDPs smaller than 410 billion US$, with Lagos and Kinshasa having the smallest GDP values (33.679 and 6.477 billion US$ respectively). Of the 36 megacities, 5 were in countries that were characterized as low income (Bangladesh, Democratic Republic of Congo, Nigeria, Pakistan) by the previously cited G²LM|LIC Programme.
A similar picture was given by the GDP per capita, which on average was much higher for North American and European megacities. New York had the highest GDP per capita, 93,613 US$. Los Angeles and London had GDP per capita values around 75 thousand US$. Paris was next, with 63,120 US$. Seoul and Shenzhen had GDP per capita values equal to 39,044 and 33,929 US$ respectively. All other megacities had per capita GDPs below 28 thousand US$. Lagos and Kinshasa had the lowest per capita GDP values (2321 and 535 US$ respectively). The Gross National Income (GNI) in Purchasing Power Parities (PPP) per capita was also available, but only on a per country basis, and gave information similar to the GDP.
Data on three indicators were collected to measure income inequality: (1) the Gini income inequality index, (2) the inequality-adjusted Human Development Index (HDI), and (3) the Palma ratio. The Gini income inequality index ranges from zero to one, with bigger values corresponding to greater income inequality. In this research, the Gini index was only available at a country level, varying from 0.299 to 0.51 with a mean value of 0.43. The Palma ratio is another measure of inequality that equals the share of gross national income (GNI) held by the richest 10% of the population divided by the share held by the poorest 40%. Bigger values of the Palma ratio correspond to greater income inequality. In this research, the Palma ratio was also available only at a country level and varied from 1.2 (for Japan) to 3.1 (for Colombia) and 4 (for Brazil), with a mean value of 1.828. The inequality-adjusted HDI varied from 0.316 (Kinshasa) to 0.882 (Tokyo and Osaka), with higher values indicating more human development adjusted for the human development cost of inequality.
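The two income-based inequality measures can be sketched as follows. The decile shares and incomes are hypothetical and the function names are illustrative. The study itself used published country-level values, so nothing here reproduces its data.

```python
def palma_ratio(decile_shares):
    """Palma ratio from ten income-decile shares, poorest decile first:
    share of the richest 10% divided by the share of the poorest 40%."""
    if len(decile_shares) != 10:
        raise ValueError("expected exactly ten decile shares")
    return decile_shares[-1] / sum(decile_shares[:4])

def gini_index(incomes):
    """Gini index (0 = perfect equality) from individual incomes,
    using the rank-weighted formula on the sorted values."""
    xs = sorted(incomes)
    n = len(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * sum(xs)) - (n + 1) / n

# Hypothetical decile shares in percent (they sum to 100).
shares = [2, 3, 4, 5, 6, 8, 10, 13, 19, 30]
print("Palma ratio:", round(palma_ratio(shares), 3))

# Hypothetical incomes for a small population.
print("Gini (equal incomes):  ", gini_index([10, 10, 10, 10]))
print("Gini (one earner only):", gini_index([0, 0, 0, 1]))
```

The Palma ratio ignores the middle 50% of the distribution by construction, which is why it reacts to changes at the income extremes that leave the Gini index almost unchanged.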
Megacity energy-related variables had the most missing values. Total energy consumption per capita was highest for North American followed by European megacities. Total megacity energy consumption per capita was highest in New York City, Moscow and Tehran, and lowest in Mumbai and Kolkata. Megacity electricity consumption per capita, higher for North American megacities, was highest in Los Angeles, with New York and Osaka next; it was lowest in Dhaka and Lagos. Megacity transportation energy consumption per capita, also higher for North American megacities, was highest in New York, and lowest in Mumbai, Kolkata, Cairo and Dhaka.
Electricity prices for households were highest in Japan (estimated at 0.285 US$/kWh for Osaka and Tokyo). The US and the UK were next (0.261 US$/kWh for New York, Los Angeles, and London). Paris, Lima, and Manila households were charged 0.212, 0.197 and 0.183 US$/kWh correspondingly. Bogota, Rio de Janeiro, and Sao Paulo were charged 0.143 US$/kWh. Households in other megacities were charged 0.122 US$/kWh or less, with Tehran households charged the lowest rate at 0.004 US$/kWh.
Average annual temperature, average annual precipitation and average UHI intensity (based on published values) were also available. Average temperature was lowest in European, and highest in African megacities. Chennai (28.5 °C) and Bangkok (28 °C) were the warmest, and London (10.3 °C) and Moscow (4 °C) the coldest cities. Average annual precipitation was over 2000 mm in Mumbai and Dhaka, and below 10 mm in Cairo and Lima. The average UHI intensity was highest in Tehran (12 °C) and lowest in Lima (0.08 °C).
Finally, degree days were calculated as the sum of the positive differences between (a) a base temperature of 16 °C and the daily average outdoor temperature for the winter months, and (b) the daily average outdoor temperature and the base temperature of 16 °C [47]. Those were highest for Tianjin (30.2 days), Beijing (29.4 days), and Moscow (25.3 days); and lowest for Lima (5.3 days), Bogota (5.1 days), and Mexico City (4.6 days).
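The degree-day definition above can be sketched in a few lines. Only the 16 °C base comes from the text. The temperatures are hypothetical, and the choice of which days to pass to each function (for example, winter months for component (a)) is left to the caller as an assumption. Any scaling needed to arrive at the values reported in the paper is not reproduced here.

```python
BASE_TEMP_C = 16.0  # base temperature stated in the text

def heating_degree_days(daily_mean_temps_c, base=BASE_TEMP_C):
    """Component (a): sum of positive differences base - temperature."""
    return sum(max(base - t, 0.0) for t in daily_mean_temps_c)

def cooling_degree_days(daily_mean_temps_c, base=BASE_TEMP_C):
    """Component (b): sum of positive differences temperature - base."""
    return sum(max(t - base, 0.0) for t in daily_mean_temps_c)

# Hypothetical daily mean temperatures in degrees Celsius.
temps = [10.0, 14.0, 16.0, 18.0, 25.0]
print("heating degree days:", heating_degree_days(temps))
print("cooling degree days:", cooling_degree_days(temps))
```

Days exactly at the base temperature contribute to neither component, and each day contributes to at most one of them.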
Turning to variable associations, Pearson correlation coefficients are shown in Table 3. Carbon footprint per capita was strongly and positively correlated with the country ecological footprint per capita (0.906), the GDP per capita (0.834), the electricity consumption per capita (0.756), the transportation energy consumption per capita (0.755), the energy consumption per capita (0.728), the GDP (0.723), and the megacity area (0.715). The other associations were consulted when selecting the independent variables for the regression model, to help avoid multicollinearity.
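Screening candidate regressors with pairwise Pearson correlations, as described above, might look like the following sketch. The variable names, the values, and the 0.8 threshold are all hypothetical choices made for illustration, not the study's data or criterion.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.8):
    """Return pairs of candidate regressors whose |r| meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical candidate regressors for six cities.
candidates = {
    "ln_gdp_pc": [7.0, 8.1, 9.0, 9.8, 10.5, 11.2],
    "ln_electricity_pc": [5.1, 6.0, 6.8, 7.9, 8.3, 9.1],
    "precipitation_mm": [900, 300, 1500, 200, 1200, 700],
}
print(collinear_pairs(candidates))
```

A flagged pair suggests that only one of the two variables should enter the model, which matches the later observation that the energy consumption variables carried much the same information as GDP per capita.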
Turning to the log-transformed data, the logarithm of the megacity carbon footprint showed a relatively strong negative association with the logarithm of the megacity population density (R = −0.706). The logarithm of the megacity carbon footprint per capita showed a relatively strong positive association with the logarithm of the area of a megacity (R = 0.634), underscoring that urban sprawl favors per capita carbon emissions (e.g., as people have to commute more).
The logarithm of carbon footprint per capita showed a reasonably strong positive association with the logarithm of GDP (R = 0.693) as well as the logarithm of the GDP per capita of a megacity (R = 0.78). The logarithm of the carbon footprint per capita showed a weak negative association with the Gini income inequality index (R = −0.134); a strong positive association with the inequality-adjusted HDI index (R = 0.82); and a very weak negative association with the Palma ratio (R = −0.135).
The logarithm of the carbon footprint per capita and the logarithm of the energy consumption of a megacity showed a moderate positive association (R = 0.583), with some dispersion and outlying observations at lower values. A similar association was present between the logarithm of the carbon footprint per capita and the logarithm of the energy consumption per capita (R = 0.611). A stronger positive association was present between the logarithm of the carbon footprint per capita and the logarithm of the electricity consumption of a megacity (R = 0.693), and an even stronger positive association with the logarithm of the electricity consumption per capita (R = 0.781). The logarithm of the carbon footprint per capita showed a strong positive association with the energy consumption of transportation (R = 0.723), and a stronger positive association with the energy consumption of transportation per capita (R = 0.782). It is worth noting that the logarithms of the megacity total energy, electricity and transportation consumptions per capita were linearly associated with the logarithm of the GDP per capita, indicating that much of the information contained in the energy consumption variables was also present in the GDP. The logarithm of carbon footprint per capita showed a rather strong association with the logarithm of the ecological footprint of a megacity (R = 0.75). The per capita carbon footprint was very weakly associated with annual average temperatures, degree days and UHI intensity, but showed a moderate negative association with the average annual precipitation (R = −0.428).
Guided by the literature findings and based on the descriptive trends and the associations, regression modeling was considered, with the logarithm of the megacity carbon footprint per capita being the dependent variable. The following groups of factors were considered, with the corresponding metrics being candidates for independent variables, log-transformed when appropriate:
The logarithm of the population density was selected to represent the impact of the size of a megacity on the carbon footprint per capita. It was thought that population density would be a better proxy of urban sprawl than population and area. It was decided that both the logarithm of megacity GDP per capita and the low-income dummy be included, so that some differentiation in the effect of income levels on carbon footprint per capita be accounted for. For income inequality, the logarithm of the Palma ratio was chosen because it performed better than both the Gini index and the inequality-adjusted HDI. The Palma ratio also provided complementary information on the impact of different income levels on the per-capita carbon footprint. Of the energy consumption variables, only household electricity prices had complete information for all cases and fitted the data very well. Both the per-capita ecological footprint (available for countries) and the megacity ecological footprint (computed as the product of the per-capita ecological footprint and the megacity population) were considered as independent variables. It was thought that the latter would also act as a proxy of the size of a megacity. Of the climate-related variables, the average annual precipitation was, interestingly, found to perform the best. The incorporation of interaction and power terms was investigated but found to add no value to interpretation.
The following model formulation was proposed, with signs indicating a priori expectations, log signifying natural logarithms, and variable names as shown in Table 1:

Results
This section presents the results of the statistical analysis. A multiple linear regression model was estimated, associating the logarithm of a megacity carbon footprint per capita with the logarithm of the megacity GDP per capita, the logarithm of the country Palma ratio, a dummy variable accounting for low-income countries, the logarithm of the country electricity price for households, the logarithm of the country ecological footprint per capita, and the average annual precipitation for a megacity.
Regarding the available degrees of freedom, there were 32 complete cases. Following the rule of thumb of 5 to 10 cases per independent variable, these 32 cases allowed for 3 to 6 variables (32/10 = 3.2, 32/5 = 6.4), although this recommendation may be too strict [48].
Population density was insignificant in all estimated models, despite a negative association between carbon dioxide emissions and population density that has been reported in the literature [49]. So, population density was dropped from further consideration, thinking that some of the urban sprawl effects would be picked up by the ecological footprint.
Four alternative multiple regression models with 4 to 6 independent variables were estimated, the best of which is shown in Table 4. There were no changes in the signs of the coefficients of the independent variables among the models, nor any major changes in their values, suggesting that the formulations were stable. All models were estimated by ordinary least squares (OLS) with heteroskedasticity-robust standard errors (HC1 variant). Residual plots looked good, normality tests failed to reject the null hypothesis of normality, and Breusch-Pagan tests (more appropriate for smaller samples than White's test) failed to reject the null hypothesis of no heteroskedasticity. The coefficient of determination R² equaled 0.926, with the p-value of the corresponding F-test below 0.001, indicating a very good fit (especially for cross-sectional data). Akaike, Schwarz and Hannan-Quinn information criteria (with the latter two enforcing stricter penalties on loss of degrees of freedom) confirmed that the model shown in Table 4 had the best fit. Given that all Variance Inflation Factors (VIF) were less than five, multicollinearity among the independent variables did not present a significant problem.
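For readers unfamiliar with the HC1 correction, the following is a minimal sketch of OLS with HC1 standard errors written in plain Python. It is not the software used in the study. The helper names and the toy data are assumptions, and in practice a statistics package would normally be used instead. HC1 multiplies the basic sandwich variance (HC0) by n/(n − k), where n is the number of cases and k the number of estimated coefficients.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_hc1(X, y):
    """OLS coefficients and HC1 standard errors.
    X holds one row per case, with a leading 1 for the intercept."""
    n, k = len(X), len(X[0])
    xt = transpose(X)
    xtx_inv = invert(matmul(xt, X))
    xty = [sum(a * b for a, b in zip(col, y)) for col in xt]
    beta = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # Diagonal of (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1, built case by case.
    var = [0.0] * k
    for row, e in zip(X, resid):
        h = [sum(a * x for a, x in zip(inv_row, row)) for inv_row in xtx_inv]
        for j in range(k):
            var[j] += (e * h[j]) ** 2
    se = [math.sqrt(n / (n - k) * v) for v in var]  # HC1 small-sample scaling
    return beta, se

# Toy data: one regressor plus intercept (illustrative only).
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
beta, se = ols_hc1(X, y)
print("coefficients:", [round(b, 3) for b in beta])
print("HC1 std. errors:", [round(s, 3) for s in se])
```

The coefficient estimates are identical to ordinary OLS. Only the standard errors, and therefore the t-tests, change under the robust correction.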
The association of residuals with each independent variable was investigated graphically for possible endogeneity. The best model shown in Table 4 was also theoretically more appealing, including only per capita variables. Thus, it was confirmed as the preferred formulation and may be written as: log(megacity carbon footprint per capita) = −1.807 + 0.247 × log(megacity GDP per capita) − 0.623 × log(country Palma ratio) − 0.826 × low-income country dummy − 0.223 × log(household electricity price for country) + 0.829 × log(country ecological footprint per capita) − 0.000228 × megacity average annual precipitation.
Now follows a discussion of the estimated regression coefficients of the model, starting with their signs and statistical significance. The intercept has no substantive interpretation here and is not discussed. All slope coefficient signs were as expected. As indicated by the t-test p-values, the coefficients of the logarithm of the Palma ratio, the low-income dummy, the logarithm of the household electricity price, and the logarithm of the per-capita ecological footprint were significant at a 99.6% confidence level or better. The coefficient of average annual precipitation was significant at a 98.2% confidence level. Finally, the coefficient of the logarithm of the megacity GDP per capita was significant at a 93.1% confidence level, but the variable was retained because it was theoretically important.
Next, the interpretation of the values of the coefficients is discussed. The megacity GDP per capita was positively associated with the per-capita carbon footprint. Since this was a log-log association, if the megacity GDP per capita increased by 1%, the megacity carbon footprint per capita would increase by approximately 0.247%. As an example, an increase of the GDP per capita from 16 …
Income inequality as expressed by the country Palma ratio was negatively associated with the per-capita carbon footprint. Since this was also a log-log association, if the country Palma ratio increased by 1%, the megacity carbon footprint per capita would decrease by approximately 0.623%. As an example, an increase of the Palma ratio from 1.7 (China) to 1.9 (US), which would constitute an increase of 100 × (1.9 − 1.7)/1.7 = 11.76%, would decrease the per-capita carbon footprint of Beijing (4.2 tons) by 4.2 × (0.623/100) × 11.76 = 0.308 tons or 308 kg.
Furthermore, low-income countries were associated with a smaller per-capita carbon footprint of megacities. Since this was a log-level association, the model predicted that the per-capita carbon footprint of megacities in low-income countries would be lower by 100 × 0.826 = 82.6% under the usual first-order approximation. Because the coefficient is large, the exact effect implied by it, 100 × (1 − exp(−0.826)) ≈ 56.2%, is smaller but still substantial. A discussion of the effects of income and income inequality follows in the next section.
Log-log associations held for the next two independent variables (only available at a country level). The logarithm of the household electricity prices was negatively associated with the logarithm of the per-capita carbon footprint of megacities, which meant that if the household electricity price increased by 1%, the per-capita carbon footprint would decrease by 0.223%. The logarithm of the country ecological footprint per capita was positively associated with the logarithm of the per-capita carbon footprint of megacities, which meant that if the per capita ecological footprint of a country increased by 1%, the per capita carbon footprint of a megacity would increase by 0.829%.
Finally, the average annual precipitation of megacities was negatively associated with the logarithm of the per-capita carbon footprint of megacities. Since this was a log-level association, if the average annual precipitation increased by 1 mm, the per-capita carbon footprint of megacities would decrease by 100 × 0.000228 = 0.0228%. This represented a small, but statistically significant and perhaps surprising effect that is discussed in the next section. The average annual precipitation was related to the average annual temperature, but it performed much better in the models, appearing to be a better proxy for climate. It was also a convenient variable to include in the models because it was largely independent of the other independent variables.
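The coefficient interpretations in this section can be checked with a short sketch. The coefficient values are those quoted in the paragraphs above. The function names are illustrative, and the exact log-level formula, 100 × (exp(β × Δx) − 1), is the standard semi-log result rather than something taken from the paper.

```python
import math

def loglog_pct_change(beta, pct_change_x):
    """First-order approximation for a log-log model:
    % change in y = beta * % change in x."""
    return beta * pct_change_x

def loglevel_pct_change(beta, delta_x, exact=True):
    """% change in y for a change delta_x in an untransformed regressor.
    exact=False gives the 100 * beta * delta_x shortcut."""
    if exact:
        return 100.0 * (math.exp(beta * delta_x) - 1.0)
    return 100.0 * beta * delta_x

# Palma ratio example: 1.7 -> 1.9 applied to a 4.2 ton footprint.
palma_pct = 100.0 * (1.9 - 1.7) / 1.7
palma_effect_tons = 4.2 * loglog_pct_change(-0.623, palma_pct) / 100.0

# Low-income dummy: shortcut versus exact effect of switching 0 -> 1.
dummy_approx = loglevel_pct_change(-0.826, 1, exact=False)
dummy_exact = loglevel_pct_change(-0.826, 1)

# Precipitation: effect of 1 mm and of 100 mm more rain per year.
rain_per_mm = loglevel_pct_change(-0.000228, 1)
rain_per_100mm = loglevel_pct_change(-0.000228, 100)

print(f"Palma example: {palma_effect_tons:.3f} tons")
print(f"Low-income dummy: {dummy_approx:.1f}% (shortcut), {dummy_exact:.1f}% (exact)")
print(f"Precipitation: {rain_per_mm:.4f}% per mm, {rain_per_100mm:.2f}% per 100 mm")
```

The shortcut and the exact formula agree closely for small coefficients such as the precipitation term, but diverge for a coefficient as large as that of the low-income dummy.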

Discussion
This section discusses the effects of income, income inequality, and precipitation. It also considers mitigation policies for cities.
Starting with income, the squared logarithm of the GDP per capita was added to the best model of the previous section (estimation not shown) to test for a possible environmental Kuznets curve (EKC) effect. The addition of that term improved the model fit marginally, but the signs of the log GDP and squared log GDP terms did not indicate an EKC effect.
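The sign check behind such an EKC test can be sketched as follows. With a quadratic in log GDP per capita, an inverted-U relationship requires a positive linear coefficient and a negative squared coefficient, and the curve then peaks where log GDP per capita equals −b1/(2 × b2). The coefficients below are hypothetical, since the estimates of this auxiliary model were not reported.

```python
import math

def ekc_turning_point(b1, b2):
    """Given coefficients of log(GDP pc) and its square, return the GDP per
    capita at which the fitted curve peaks, or None when the signs do not
    describe an inverted U."""
    if b1 > 0 and b2 < 0:
        return math.exp(-b1 / (2.0 * b2))
    return None

# Hypothetical coefficient pairs (not estimates from the study).
print(ekc_turning_point(2.0, -0.1))   # inverted U: a turning point exists
print(ekc_turning_point(0.2, 0.01))   # both positive: no EKC pattern
```

Even when the signs are right, a turning point far outside the observed income range would offer little practical support for an EKC interpretation.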
Jorgenson, Schor and Huang [50] shed light on the findings of this research regarding income and income inequality. They investigated the relationship between carbon dioxide emissions and income inequality for US states, departing from most other studies, which take nations as their unit of analysis. They mentioned a variety of pathways through which income inequality may affect emissions. One pathway is based on a political economy explanation, arguing that the wealthy cause more emissions through their ownership of companies as well as political influence used to prevent more environmental protection. A second pathway relates to the marginal propensity to emit, which amounts to the consumption of carbon-intensive goods varying with the level of income in various nonlinear, conflicting, and parallel ways. It was stressed that this second pathway does not correspond to a single hypothesis. Finally, a third pathway relates to the Veblen effect, as the wealthy are subjected to more intense consumption competition and longer hours of work, which increases energy consumption and leads to more emissions. The first and the third pathway especially help explain the positive association of the level of income with the carbon footprint found in this study.
Turning to the effect of income inequality, although the pathways presented by Jorgenson, Schor and Huang [50] supported the possibility that more inequality leads to more carbon emissions, they also indicated that their relationship may be complex and convoluted. Among other considerations, greater income equality may be associated with more emissions because of the existence of more people who are in the middle-class and have more carbon-intensive lifestyles. Jorgenson, Schor and Huang used two measures of income inequality, the Gini index and the income share of the top 10% and found state emissions to be positively associated with the second. The effect of the Gini index was insignificant, which was inconsistent with the marginal propensity to emit approach, i.e., the poor increasing their emissions as they reduce inequality by moving to the middle class. The insignificance of the Gini index agrees with the results of this study. Furthermore, it was not the Palma ratio that was found to be positively associated with carbon emissions, but the income share of the top 10%, so the results of this study are not directly comparable to Jorgenson, Schor and Huang.
The negative association of the carbon footprint with the Palma ratio does not agree with Liu, Zhang and Liu [51], although they investigated the carbon emissions not of individuals, but households, and not globally, but in China. Although they mainly used the Gini index to proxy income inequality, they cautioned against its lack of sensitivity to changes in high- and low-income levels. To this effect, they also tested the income share of the top 10% and the Palma ratio. Their analysis based on nationwide panel data confirmed the significant positive effect of income inequality on household carbon emissions.
Hailemariam, Dzhumashev and Shahbaz [30] also found top income inequality to be positively associated with carbon dioxide emissions. Yet, they also found a negative association between the Gini index and carbon dioxide emissions, that may be explained by the marginal propensity to emit that is applicable to low-and middle-income levels, where the Gini index is considered relevant. As it was argued, it is not just the income, but also its distribution that determines carbon emissions.
A plausible explanation of the negative association between the Palma ratio and the carbon footprint per capita is given by the work of López, Arce and Serrano [52], who examined extreme inequality in Spanish households and its effect on environmental sustainability from 2006 to 2013. They aimed to model how changes in household consumption affect changes in the household carbon footprint, taking into account both the Gini index and the Palma ratio. They pointed out that the Palma ratio is useful for evaluating the impacts of inequality at the extremes of the income distribution, something that the Gini index fails to capture adequately. An important innovation of their study was that they used different inequality indexes for domestic and imported consumption. In separate regression models for household consumption groups, they found the domestic Palma ratio to be negatively and significantly associated with the carbon footprint of households. Although their results partially reflect substitution between domestic and imported consumption, they show that inequality at the income extremes captured by the Palma ratio may be negatively associated with the carbon footprint of households.
A further study that supports the negative association of income inequality with carbon emissions was that of Ravallion, Heil and Jalan [53]. Pooled and fixed effects regression models of carbon emissions were estimated in that study, with independent variables including a time trend, log population, log GDP, and squared log GDP, also as interaction terms with the Gini index. It was found that economic growth generally comes with higher emissions. It was also found that higher inequality both within and between countries was associated with lower emissions at given average incomes. A nonlinear relationship between carbon emissions and average income among countries was established. The authors concluded that a tradeoff was present between mitigation of climate change on the one hand, and economic growth and social equity on the other. This tradeoff would be ameliorated only when sufficiently high growth and/or low inequality was present.
The negative effect of the average annual precipitation on the carbon footprint of the megacities was an interesting finding. It is worth mentioning that the inclusion of temperature and the absolute latitude (being analogous to the distance from the equator and possibly proxying for climate) was attempted, but both added little to the best model, giving statistically insignificant results. Of the geographic variables tested, precipitation clearly performed best in the carbon footprint model.
Looking at specific cities, those with high precipitation included Mumbai (2431 mm), Dhaka (2148 mm), Manila (1970 mm), Shenzhen (1867 mm), and Jakarta (1855 mm). Cities with low precipitation included Lima (16 mm), Cairo (20 mm), Seoul (135 mm), Karachi (210 mm), and Tehran (230 mm). Cities that fit the linear association of precipitation with the logarithm of carbon footprint well included Beijing, Bogota, Buenos Aires, Dhaka, Istanbul, Jakarta, Kolkata, London, Los Angeles, Manila, Mexico City, Moscow, Osaka, Paris, Sao Paulo, Seoul, Tehran, and Tokyo. Cities with high precipitation and a low carbon footprint included Dhaka, Jakarta, Kolkata, Manila, and Mumbai. Cities with low average precipitation and a high carbon footprint included Istanbul, London, Los Angeles, Moscow, Paris, Seoul, Tehran, and Tianjin. Karachi, Lahore, and Lima were unusual observations with low average precipitation and a low carbon footprint. New York was unusual in that it had a carbon footprint that was too high relative to its average precipitation. Lagos was also unusual in that it had a carbon footprint that was too low relative to its average precipitation. All in all, cities had diverse characteristics that did not reveal a particular pattern that could explain the association of precipitation with the carbon footprint by revealing an overlooked factor that was left out of the model.
It would be reasonable for precipitation to have emerged as a significant predictor if carbon footprint reflected the balance rather than the emissions of carbon. Carbon footprint values are estimated using carbon emission factor intensities [54], and rainfall may act as a carbon sink [55] by helping carbon emissions be reabsorbed by the soil. Furthermore, rain combined with humidity favors plant growth, and plants absorb carbon more efficiently when plentiful water and high humidity are available. Nevertheless, as long as the values of carbon footprint reflect solely carbon emissions and not the balance of carbon dioxide (directly or indirectly), this consideration is irrelevant.
Nordhaus [56] provided some insight as to the effect of precipitation, by investigating the relationship of macroeconomics and time-invariant geographic factors such as climate. These are statistically exogenous in that they affect but are not affected by socioeconomic factors (at least on a decadal time scale). Nordhaus pointed out that for most countries the averages of geographic variables cover such a wide area that they are meaningless. When metrics of economic activity are measured at an aggregate country level, effects of factors such as the climate are averaged out. Historically, society has moved from climate-sensitive farming into climate-insensitive manufacturing and services, so most productivity studies have ignored the role of climate as a determining factor or focused on geographic factors without intrinsic economic significance such as the distance from the equator (possibly acting as a proxy for climate). In fact, noneconomic factors such as the climate may affect economic growth more than economic policy and institutions.
Nordhaus's analysis focused on the intensity of economic activity per unit area rather than per capita and developed detailed gridded data, allowing him to define the concept of the Gross Cell Product (GCP) as the total net production of market goods and services in a region. Spatial disaggregation is usually done at the national, state, and provincial political subdivision level. Nordhaus used the lowest level available. Scaling was done to convert published data from political to geographical boundaries. Calculations indicated significant gains in accuracy from disaggregation, although for many low-income countries such as Nigeria, there were no regional economic data.
Nordhaus estimated a multiple regression with the logarithm of output per km² as the dependent variable. Independent variables included country effects (for 72 countries); mean annual and other temperature measures; mean annual and other precipitation measures; elevation measures; distance from coast (less than 50, 100 and 200 km); and variables for 27 soil types. Endogenous geographic variables such as coastal density and proximity to markets were omitted. Nordhaus reported that all independent variables were highly significant without showing coefficient signs and values. It was concluded that the density of economic activity is very strongly related to geographic conditions, especially temperature, precipitation, and coastal proximity. These findings provide justification for the fact that precipitation had an effect on the carbon footprint in this analysis, its negative impact perhaps representing the effect of transportation or air conditioning, both of which might be discouraged by increased precipitation.
Turning to policies, in a recent publication reviewing the consumption-based carbon footprint (CBCF) literature, Ottelin et al. [57] considered different urban types, and stressed the importance of including even the indirect global environmental pressures of cities in policy discussions. Carbon trading among cities, according to which net importer cities would require importing companies to purchase carbon credits from net exporter cities and use the funds to decarbonize production, may be more efficient and economical than focusing on national-scale studies.
Ottelin et al. [57] observed that when comparing the averages of the absolute CBCF without controlling for any background variables, urban areas tend to have higher footprints. The higher the level of urbanization, the higher the consumption-based emissions. This result seemed to hold regardless of the level of development of the country. The typical emission profiles tend to be similar regardless of the level of development: urban dwellers have more indirect, and rural dwellers have more direct emissions. Controlling for relevant socioeconomic background variables, such as income and household size, is important in order to estimate the per capita emissions correctly.
Ottelin et al. [57] noted that the policy recommendations of the CBCF literature lack consensus. Some authors support urban density policies (which was also tried, but remained an insignificant variable in this research) while others question their effectiveness irrespective of the geographic location or spatial scale of a study. Other authors debate sustainable urban and transport planning. Finally, others highlight that urbanization is an important driver of CBCF, particularly in developing economies. The missing consensus on urban density policies has motivated some authors to suggest tailored policies for urban, suburban, and rural areas (based on empirical findings). The existing literature allows the justification of various policy recommendations.
Ottelin et al. [57] mention suggestions for large-scale carbon sinks (like large parks or forests) in or near cities. Negative emission technologies such as carbon capture and storage are also mentioned in the literature. Local renewable energy production is encouraged. Furthermore, information campaigns, policy guidelines, and carbon footprint calculators for citizens are promoted as tools to change consumption behavior towards more sustainable lifestyles. The most obvious policy is direct advice for consumers and households regarding sustainable consumption and lifestyles. Tailored top-down policies allow differentiation between population segments, for example public transportation in dense urban cores and electric vehicles and solar panels in suburban areas.
These points constitute good advice for megacities. As Kennedy et al. [6,31] pointed out, megacities may be seen as innovation centers, presenting the opportunity of achieving high levels of resource efficiency that will help reduce the global environmental burden.

Conclusions
This research addressed an omission in the literature by modeling the per-capita carbon footprint of global megacities. The megacity GDP per capita and the ecological footprint per capita impacted the per capita carbon footprint of megacities positively. Megacities in low-income countries, and income inequality (expressed by the Palma ratio) were associated with a smaller per capita carbon footprint. Higher household electricity prices also translated to a smaller per-capita carbon footprint. Finally, megacities with more rainfall had smaller per-capita carbon footprints. To our knowledge, no other published work has estimated multiple regression models for the carbon footprint (or carbon emissions) of megacities globally.
Taking the carbon footprint into consideration when considering alternative policies has the advantage of accounting for both direct and indirect carbon emissions. Although policy recommendations in the literature lack a consensus, such policies could include carbon trading among megacities, the incorporation of large parks in megacities, the use of renewable energy production (e.g., with solar panels), and sustainable transportation measures (e.g., favoring electric mobility and encouraging more cycling and walking). Such policies should also be tailored to the urban realities of megacities. The results of this study suggest that the electricity price could be an effective policy tool in mitigating the carbon footprint, although the effect of economic policies addressing growth and income inequality is debatable. Geopolitically, megacities are areas of high global risk [6] and are emerging as important actors in climate policy [58]. Targeted policies are needed, but more policy autonomy should be left to megacities [27]. This work provides initial guidance.
Despite the efforts of the authors, a multiple regression model estimated on cross-sectional data is likely to suffer from misspecification and other biases. Ideally, all the independent variables would be determined exogenously; in practice, other variables (that are hard to measure for megacities) are likely to affect the carbon footprint through their impact on some of the independent variables, and so act as confounders. If endogeneity is suspected and instrumental variable data are available, two-stage least squares could be employed to estimate better models. Further to this, the results of this research should be investigated and confirmed by collecting and analyzing longitudinal data, which is a non-trivial task for megacities and will likely require delving into yearbooks written in various languages.
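How two-stage least squares addresses endogeneity can be illustrated with a minimal sketch for one endogenous regressor and one instrument. The data are constructed so that a hidden confounder u biases plain OLS while the instrument z is unrelated to u. All names and numbers are hypothetical, and no such instrument was used in this study.

```python
def simple_ols(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on instrument z. Stage 2: regress y on fitted x."""
    a1, b1 = simple_ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    return simple_ols(x_hat, y)

# Constructed example: true model y = 1 + 2*x + 3*u, with x = z + u.
z = [0, 1, 2, 3, 4, 5]          # instrument, uncorrelated with u
u = [1, -1, -1, 1, 0, 0]        # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]
y = [1 + 2 * xi + 3 * ui for xi, ui in zip(x, u)]

_, ols_slope = simple_ols(x, y)
iv_intercept, iv_slope = two_stage_least_squares(z, x, y)
print("OLS slope (biased):", round(ols_slope, 3))
print("2SLS slope:", round(iv_slope, 3), "intercept:", round(iv_intercept, 3))
```

The second-stage point estimates recover the true coefficients here, but standard errors taken naively from the second stage would be incorrect and need a dedicated correction. Finding a credible instrument for megacity data is the harder practical problem.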