Interannual Variation in Night-Time Light Radiance Predicts Changes in National Electricity Consumption Conditional on Income-Level and Region

Using remotely sensed Suomi National Polar-orbiting Partnership (NPP)-VIIRS (Visible Infrared Imaging Radiometer Suite) night-time light (NTL) imagery between 2012 and 2016 and electricity consumption data from the IEA World Energy Balance database, we assemble a five-year panel dataset to evaluate if and to what extent NTL data are able to capture interannual changes in electricity consumption within different countries worldwide. We analyze the strength of the relationship both across World Bank income categories and between regional clusters, and we evaluate the heterogeneity of the link for different sectors of consumption. Our results show that interannual variation in night-time light radiance is an effective proxy for predicting within-country changes in power consumption across all sectors, but only in lower-middle income countries. The result is robust to different econometric specifications. We discuss the key reasons behind this finding. At the regional level, Sub-Saharan Africa, the Middle East and North Africa, Latin America and the Caribbean, and East Asia and the Pacific yield significant results, while changes in Europe, North America and South Asia are not successfully predicted by NTL. The methodological steps designed to process the raw data and the findings of the analysis contribute to improving the design and application of predictive models for electricity consumption based on NTL at different spatio-temporal scales.


Introduction
Over the last 20 years, the release of open-access night-time light (NTL) data with worldwide coverage has proven useful for estimating multiple aspects of human development at both a global [1] and a local scale [2]. Different generations of data have been published by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA), including the Defense Meteorological Satellite Program-Operational Linescan System (DMSP-OLS), the Visible Infrared Imaging Radiometer Suite-day-night band (VIIRS-DNB) [3], and the upcoming Black Marble [4] products, with continuing improvements in data quality and in spatial and temporal resolution. Extensive research has been carried out on NTL data in a large range of applications, including as a proxy for electricity-related indicators (e.g., electricity demand [5][6][7][8][9][10], power supply reliability and outages [11,12], household electrification [13][14][15][16][17], visualisation of electricity demand peaks [18]) and socio-economic development metrics (including economic growth [19], poverty detection [20], inequality [21] and greenhouse gas emissions [22]).
At the same time, recent years have witnessed a growing demand for reliable data to support decision-makers. The greatest roadblocks are found in developing countries, where data availability is affected by a broad spectrum of issues, including financial, technical, or infrastructure constraints on collecting and maintaining up-to-date data, as well as data quality issues. This is of particular relevance for tracking progress towards the United Nations' Sustainable Development Goals. In response to such challenges, novel approaches that complement field data collection, which is resource-intensive and often subject to errors, are gaining significance and international support (e.g., using remote sensing for measuring agricultural yield [23] or monitoring deforestation [24]).
In this paper we present a number of exploratory findings with a view to using and validating NTL-based tools for monitoring electricity consumption at different spatial and temporal scales (as already explored in a number of recent contributions [8,25,26,27]). The need to carry out our exploratory analysis stems from a strong caveat characterizing approaches based on remotely sensed data, namely that they should be used in parallel with, and not as a substitute for, more precise methodologies, as satellite output data may include some long-term, unaccounted-for bias which could significantly affect the results. For this reason, continuous validation of those data is strongly suggested, so as to account for the potential evolution of the energy system and for data issues in the estimation model.
Specifically, we evaluate if and to what extent an increase in per-capita gross national income (GNI) leads to a significant variation in the relationship between NTL and electricity consumption in its different components. Our hypothesis is that at higher levels of economic development, electricity consumption reaches a threshold where the lighting component becomes a marginal share of the total final consumption. This is particularly important given the forecasted growing importance of electricity in the global energy mix [28]. We also analyze the strength of the relationship across regional clusters. We run log-log regression analyses, implementing pooled-OLS (ordinary least squares), country fixed-effects, and country-year demeaning specifications. This is done by subsetting observations for each income category when testing the income hypothesis, and by creating interaction terms when assessing the NTL-power consumption relationship across regional clusters.
The remainder of the paper is structured as follows: In Sections 2 and 3 we present the materials collected and the methods designed and applied, respectively, including a discussion of the data and of the related challenges which had to be tackled. Section 4 presents the log-log regression results by income category and region for total final consumption (TFC) of electricity. In Section 5, the key findings and limitations of the study are discussed, and future research prospects are highlighted. Appendix A includes regression results for further power consumption sectors and additional figures.

Materials
Remotely sensed NTL data is derived from Suomi National Polar-orbiting Partnership (NPP)-VIIRS (Visible Infrared Imaging Radiometer Suite) monthly composites [3]. Images have been retrieved for the period between April 2012 and December 2016. Each raster file has a resolution of 15 arc-seconds, corresponding to 450 m at nadir. The data is partially pre-processed at the source, i.e., it comes with a two-step correction to be cloud-free and lunar illuminance-free. In particular, the day-night band (DNB) sensor collects data multiple times a day, both during the day and the night, and aggregates such images into daily snapshots, which are then averaged into monthly composites at the pixel level, while days with cloud-cover disturbances are discarded. Each pixel contains information on the mean detected radiance (expressed in μW·cm⁻²·sr⁻¹ units). As discussed in the literature [29][30][31], irrespective of these corrections, the raw data is affected by a range of issues which need to be tackled by an appropriate processing procedure. These are discussed in the Methods section.
Electricity consumption data has been collected from the IEA World Energy Balance database [32], which represents the most reliable, standardized, and continuous available time series of energy data worldwide. For this study, final electricity consumption has been considered by analyzing: (1) total final consumption (shown in Figure 1), (2) residential consumption, and (3) commercial and public services consumption. The data is available at annual resolution, and the most recent data with worldwide coverage is for the year 2016.
For this reason, and because the electricity data only partially overlaps with the night-time light data, the analysis is limited to the 2012-2016 period. Other sources provide more up-to-date information but for a limited number of countries (mainly OECD countries), and they are therefore less useful for the global scope of this research. The total number of countries considered amounts to 109, including 13 in Central and Eastern Europe, 11 in East Asia and the Pacific, 24 in Europe, 18 in Latin America and the Caribbean, 16 in the Middle East and North Africa, two in North America, five in South Asia, and 20 in Sub-Saharan Africa. A caveat stems from the classification of the different types of power consumption: while the IEA aims at providing standard-quality data, there might still be differences in the accounting and categorization of power consumption between countries. Finally, per-capita GNI, calculated with the World Bank Atlas method, is acquired for each year and each country from the World Bank Data Portal (indicator NY.GNI.PCAP.CD). According to the World Bank (2018), the Atlas conversion factor for any year is the average of a country's exchange rate for that year and its exchange rates for the two preceding years, adjusted for the difference between the rate of inflation in the country and international inflation; the objective of the adjustment is to reduce any changes to the exchange rate caused by inflation. Based on the thresholds reported for each year [33], we create a categorical variable with income categories, resulting in clusters of low-income (on average, n = 11), lower-middle income (n = 30), upper-middle income (n = 30), and high-income (n = 38) countries.
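The income classification step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the label strings, and the example cut-offs are assumptions. The cut-offs shown are only meant to resemble the World Bank thresholds applied to 2016 GNI, and should be replaced with the year-specific values reported in [33].

```python
from bisect import bisect_left

# Category labels, ordered from the lowest to the highest income group.
LABELS = ("low", "lower-middle", "upper-middle", "high")

# ASSUMPTION: illustrative upper bounds (current US$, Atlas method) for the
# first three categories; use the official thresholds for each year instead.
EXAMPLE_UPPER_BOUNDS = (1005, 3955, 12235)


def income_category(gni_per_capita, upper_bounds=EXAMPLE_UPPER_BOUNDS):
    """Return the income category for a per-capita GNI value.

    A value equal to an upper bound still belongs to the lower category.
    """
    return LABELS[bisect_left(upper_bounds, gni_per_capita)]
```

Because thresholds are revised every year, a country can change category within the 2012-2016 panel, which is consistent with the category sizes being reported "on average".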

Methods
The NTL data processing has been implemented within the Google Earth Engine interface [34], a cloud-computing JavaScript console and interface for processing remotely sensed data (refer to the Source code section for the repository hosting the script, which allows results to be reproduced and updated and parameters to be altered). As discussed in [29], due to calibration issues in each monthly tile, a number of cells contain very small, sometimes negative values (which make no sense in radiance terms) that are identified as sensor and calibration noise rather than actual radiance. At the same time, the radiance data is affected by the albedo of the land surface, which can lead the observed radiance to fluctuate abnormally across seasons due to vegetation phenology, snow, and ice cover [18,35]. As shown in Table 1, removing pixels with very small radiance values, which cannot be identified as stable anthropogenic light, leads to very significant drops in the light metrics. This is particularly relevant in snow-covered countries. To cope with both issues, after an empirical assessment of cell values across the world, cells with a radiance value < 0.25 have been set to zero. In the same fashion, disproportionately large pixel values (several orders of magnitude larger) are occasionally observed in a number of oil- and gas-producing countries, mainly owing to flaring activity not being filtered out by the processing algorithm (refer to [36]). In fact, observing Landsat or Sentinel satellite imagery in the proximity of such pixels reveals the presence of extraction sites. Table 2 compares the median sum of light in the top flaring countries (according to data from the World Bank's Global Gas Flaring Reduction Partnership [37]) with and without the correction adopted in this study, i.e., setting all pixels with a radiance value > 300 to 0. It shows that in some countries (e.g., Algeria, Nigeria, Iraq and Iran) such pixels contribute a very large share of the total detected radiance, and are thus prone to bias the analysis. Interestingly, the same procedure does not drastically change the value in two control oil- and gas-producing countries where substantially less flaring is reported (the United Kingdom and India). Subsequently, the raster files have been reprojected to a world cylindrical equal-area (Lambert) projection (SR-ORG:8287) to enable a consistent computation, since the original files come in the standard EPSG:4326 projection, where the pixel resolution is 450 m (the linear equivalent of 15 arc-seconds) at nadir but changes as one moves away from the equator. The median annual value at each pixel has been considered, and the sum of the radiance within each country has then been calculated. The median has been favoured over the mean to discard months that are anomalous in terms of detected radiance. The resulting figure has been divided by the national population and is denoted $R^{pc}_{ct}$. The annual values of $R^{pc}_{ct}$ for each country are represented in Figure 2.
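The per-pixel cleaning and aggregation steps above can be illustrated with a small sketch. It is not the Google Earth Engine script referred to in the text: the function names and the list-based data layout are assumptions made for illustration, and the reprojection to an equal-area grid is omitted. Only the two thresholds (0.25 and 300) and the median-then-sum-then-divide order are taken from the description.

```python
from statistics import median

NOISE_FLOOR = 0.25     # from the text: radiance below this is set to zero
FLARE_CEILING = 300.0  # from the text: radiance above this is set to zero


def clean_pixel(radiance):
    """Zero out values treated as sensor noise or as gas flaring."""
    if radiance < NOISE_FLOOR or radiance > FLARE_CEILING:
        return 0.0
    return radiance


def per_capita_radiance(monthly_images, population):
    """Annual per-capita sum of light for one country.

    monthly_images: list of monthly images, each a list of pixel radiances
    in the same pixel order (hypothetical layout for this sketch).
    """
    n_pixels = len(monthly_images[0])
    total = 0.0
    for p in range(n_pixels):
        cleaned = [clean_pixel(image[p]) for image in monthly_images]
        # The median across months discards anomalous months at this pixel.
        total += median(cleaned)
    return total / population
```

In this sketch, a pixel that exceeds the flaring ceiling in most months contributes nothing to the annual sum, while a pixel that is anomalous in a single month keeps its typical value.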
The yearly country-level figures have then been joined with the IEA electricity consumption database and with per-capita GNI measures. To assess the predictive power of NTL over actual consumption levels, log-log regressions have been performed using the following general specification:

$$\log(Electr_{ct}) = \beta_1 \log(NL_{ct}) + \mu_c + \rho_t + \epsilon_{ct} \quad (1)$$

where $\log(Electr_{ct})$ is the natural logarithm of electricity consumption for the given category, $\log(NL_{ct})$ is the natural logarithm of NTL, and $\beta_1$ is the corresponding estimated regression coefficient. Given the log-log specification, the $\beta_1$ regression coefficient is to be interpreted as the predicted % change in electricity consumption in response to a 1% change in NTL intensity. $\mu_c$ represents the country fixed-effects (controlling for the mean level of NTL radiance in that country, allowing the regression coefficient to be interpreted as a within-estimator), $\rho_t$ are the year fixed-effects (controlling for world-wide year events, e.g., particularly snowy years affecting the albedo, or changes in the satellite calibration), $\epsilon_{ct}$ is the stochastic error term, $t$ indexes years and $c$ indexes countries. Three comparative specifications have been run: including only the NTL coefficient, adding country fixed-effects, and also adding year fixed-effects. The main purpose of the analysis is to investigate to what extent the yearly per-capita average radiance is capable of predicting interannual variations across different sectoral per-capita electricity consumption figures. Observations have initially been grouped into the income categories defined by the World Bank (low-income, lower-middle income, upper-middle income, and high-income economies).
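The specification with country and year fixed-effects can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the authors' estimation code: it assumes a balanced panel (every country observed in every year), a single regressor, and a hypothetical tuple layout for the input. Under those assumptions, subtracting country means and year means and adding back the grand mean gives the same slope as a regression with country and year dummies.

```python
import math
from collections import defaultdict


def two_way_within_slope(rows):
    """Estimate beta_1 with country and year fixed-effects.

    rows: iterable of (country, year, ntl, electricity) tuples forming a
    balanced panel (hypothetical layout for this sketch).
    """
    logs = [(c, t, math.log(ntl), math.log(el)) for c, t, ntl, el in rows]

    def group_means(key_idx, val_idx):
        sums, counts = defaultdict(float), defaultdict(int)
        for rec in logs:
            sums[rec[key_idx]] += rec[val_idx]
            counts[rec[key_idx]] += 1
        return {k: sums[k] / counts[k] for k in sums}

    n = len(logs)
    x_by_c, x_by_t = group_means(0, 2), group_means(1, 2)
    y_by_c, y_by_t = group_means(0, 3), group_means(1, 3)
    x_bar = sum(r[2] for r in logs) / n
    y_bar = sum(r[3] for r in logs) / n

    num = den = 0.0
    for c, t, x, y in logs:
        # Two-way demeaning removes the country and year effects.
        xd = x - x_by_c[c] - x_by_t[t] + x_bar
        yd = y - y_by_c[c] - y_by_t[t] + y_bar
        num += xd * yd
        den += xd * xd
    return num / den
```

With unbalanced panels or additional regressors, simple two-way demeaning is no longer exact, and a dedicated panel regression routine would be the safer choice.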
The exercise has then been repeated, but in this case the figures have been clustered by macro-region, where the world has been split into seven macro-areas, including: Central and Eastern Europe, the Middle East and North Africa, Sub-Saharan Africa, South Asia, East Asia and the Pacific, and the Americas. In this case, interaction terms between the categorical variables for each region and the log(NTL) variable have been introduced in the specification, as in Equation (2), to assess the heterogeneity in the effect across regional clusters.
$$\log(Electr_{ct}) = \beta_1 \log(NL_{ct}) + \phi_2 \left[\log(NL_{ct}) \times Region_r\right] + \phi_3 Region_r + \mu_c + \rho_t + \epsilon_{ct} \quad (2)$$

Here, $\beta_1$ is the usual coefficient measuring the main effect of log(NTL); $\phi_3$ is a vector of coefficients representing the main effect of each region (regions are introduced as levels of a categorical variable, with $r$ identifying each region), while $\phi_2$ is the vector of coefficients quantifying the interaction effects between log(NTL) and each region, and thus representing the coefficients of interest for our research question.
Along with the numerical analysis, plots and maps of the results have been produced to highlight the main findings visually.

Subset Regressions by Income Category
Here we report the results of log-log regressions (with heteroskedasticity-robust standard errors) run to assess the average explanatory power of NTL over the changes in actual consumption levels within World Bank-defined income categories. Differentiating the log-log specification yields $\frac{\delta Y}{Y} = \beta_1 \frac{\delta X}{X}$, so that the coefficient $\beta_1$ measures the ceteris paribus predicted % change in the response variable (here, electricity consumption) for a 1% change in the explanatory variable of interest, i.e., the sum of NTL radiance. This is what in economic terms is referred to as an elasticity. In both cases, the reported regressions are those run with TFC as the response variable. Appendix A reports results for the two other sub-categories of power consumption considered (commercial and public, and residential). Figure 3 shows the scatterplot and linear fit for the log-log specification by income category, where each observation across the 2012-2016 period is reported for each country. Longitudinal regression results for TFC are reported in Tables 3-6. Column 1 refers to a pooled OLS log-log specification, column 2 adds country fixed-effects (which can be interpreted as a demeaning factor, i.e., a set of dummy variables controlling for the mean electricity consumption of each country), and column 3 also includes year fixed-effects. Regression results show that, when linking TFC and the sum of NTL radiance in a pooled OLS regression, the relationship is positive and highly significant across income categories, with estimates in the 0.67-1.53 range. The highest coefficient is found for low-income countries (Table 3), and the lowest for high-income countries (Table 6), highlighting a general gradual decoupling of total final electricity consumption from the lighting component across income categories. Nonetheless, when switching to fixed-effects specifications and therefore looking at the within-country interannual variations, the high statistical significance of the NTL coefficient persists only for lower-middle income countries (Table 4) and, limited to the country fixed-effects only specification, for upper-middle income countries (Table 5). While a detailed discussion of this finding is offered in the Discussion section, we believe it likely reflects the fact that low-income countries might be affected by issues of data quality and low interannual heterogeneity in the five-year period for which the data is available, while in upper-middle and high-income countries a decoupling in the relationship between light radiance and actual power consumption is expected (as partially confirmed by the pooled OLS results).
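To make the elasticity interpretation concrete, the short sketch below converts a log-log coefficient into a predicted relative change in consumption. The function name is an assumption introduced for illustration. It uses the exact relation implied by a log-log model, which for small changes is close to the "1% change in NTL gives a β% change in consumption" reading used in the text.

```python
def predicted_relative_change(beta, ntl_relative_change):
    """Predicted relative change in consumption for a log-log elasticity.

    beta: estimated elasticity (coefficient on log NTL).
    ntl_relative_change: e.g. 0.01 for a 1% increase in NTL radiance.
    """
    # log(Y1 / Y0) = beta * log(X1 / X0)  =>  Y1 / Y0 = (X1 / X0) ** beta
    return (1.0 + ntl_relative_change) ** beta - 1.0
```

For a small change such as 1%, the result is approximately `beta * 0.01`; for larger changes the linear approximation overstates the effect when the elasticity is below 1, which is why the exact form is used here.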

It is necessary to remark that in regression specifications (1) (pooled log-log), the coefficients on the log of NTL show the ability to predict TFC from NTL, no matter the year or the country under analysis. Therefore, the estimators reflect the slopes of the lines of Figure 3. On the other hand, for specifications (2) (country fixed-effects, where the country-specific mean value across years is subtracted from each observation), the regression coefficients show the average ability to predict TFC from NTL within each country in that income category, i.e., they do not explain the overall heterogeneity in TFC, but only the changes from year to year within each country (the coefficient is thus a within-estimator). Since in the five-year period under examination we generally observe little variation within each country (at least in relative terms, i.e., compared to the large amount of variation between countries), adding country fixed-effects dramatically increases the fraction of total variance explained. Nonetheless, there is still some residual standard error to be explained, and we test the capacity of NTL to address precisely that residual (small relative to the between-country heterogeneity, but here considered in absolute terms). The fact that variables are not dropped due to potential multicollinearity is reassuring in this sense. Figure 4 shows the analysis of specifications (2) with scatterplots and linear fits for (A) lower-middle income and (B) high-income countries. In particular, it shows how the within-country variation in TFC for a number of lower-middle income countries (but not all of them) is predicted effectively by NTL, while for high-income countries the relationships appear flat, which is consistent with the significance and insignificance, respectively, of the regression coefficients of specifications (2). Finally, when considering specifications (3) (country and year fixed-effects), on top of country fixed-effects we also add year dummies to control for those factors which are common to all countries in each year but change from year to year, i.e., mostly data quality due to the calibration of the satellite. Concerning regression diagnostics, it must be noted that the $R^2$ coefficient being close to 1 in the fixed-effects regressions is caused by the limited amount of heterogeneity in the within-country data for the change in the TFC of electricity between 2012 and 2016. Adding fixed-effects to the specification makes the coefficient for the log of NTL fall substantially and the $R^2$ go close to 1. The residual standard error metric shows that there is indeed a fraction of unexplained heterogeneity (the $R^2$ should not be taken as a reference in this specific setting). The high statistical significance of the log(NTL) coefficient even under the fixed-effects specification for lower-middle income countries is the most interesting result, because it shows that NTL is capable of capturing those small changes in the level of consumption within countries between one year and the next. The F statistic, useful for comparing specifications as it sheds light on the significance of multiple coefficients at the same time, underpins our results. For instance, for low-income countries it shows that switching from the country fixed-effects to the country and year fixed-effects specification has a positive impact on the capacity of the model to explain the response variable, and thus that adding year fixed-effects explains more variance than country fixed-effects alone.
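The remark on the $R^2$ can be illustrated numerically. The sketch below is a toy illustration under stated assumptions (a hypothetical dictionary layout, country fixed-effects only, one regressor), not a reproduction of the paper's tables. It computes two quantities from the same within regression: the $R^2$ on demeaned data, and the overall $R^2$ of the equivalent regression with country dummies, which shares the same residuals but is measured against the total variance, including the between-country part.

```python
def r_squared_comparison(panel):
    """Compare within R^2 and overall (dummy-variable) R^2.

    panel: dict mapping country -> list of (log_ntl, log_tfc) pairs
    (hypothetical layout for this sketch).
    Returns (within_r2, overall_r2).
    """
    xd, yd, all_y = [], [], []
    for obs in panel.values():
        x_mean = sum(x for x, _ in obs) / len(obs)
        y_mean = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            xd.append(x - x_mean)  # within-country deviations
            yd.append(y - y_mean)
            all_y.append(y)

    beta = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    ssr = sum((b - beta * a) ** 2 for a, b in zip(xd, yd))

    sst_within = sum(b * b for b in yd)
    grand_mean = sum(all_y) / len(all_y)
    sst_total = sum((y - grand_mean) ** 2 for y in all_y)
    return 1.0 - ssr / sst_within, 1.0 - ssr / sst_total
```

When countries differ widely in level but change little from year to year, the second value approaches 1 regardless of how well NTL tracks the year-to-year changes, which is why the residual standard error and coefficient significance are the more informative diagnostics in this setting.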
Figure 5 shows the distribution of the shares of commercial and public electricity consumption (COMMPUB) and residential electricity consumption (RESIDENT) over TFC for the four income categories considered. It shows how, as income increases, the spread between the two categories tends to close, with COMMPUB becoming dominant or comparable to RESIDENT in high-income countries.
The graph supports the understanding of our results, as it sheds some light on the varying significance of lighting across income categories. Detected NTL radiance stems from public street and road lighting, and from private buildings. Satellite data has been found to be suitable for detecting both public [38,39] and indoor light [40], and can therefore potentially provide useful information on both components.
However, it has to be remembered that differences exist across countries in the procedures to collect and elaborate statistical data.For this reason, although the data are grouped in a common framework, some potential minor differences remain in the sector classifications of energy consumption.Thus, we believe that the use of total final consumption is a better indicator across multiple countries.
For the same reason, to draw detailed conclusions on the type of lights that are actually captured would require a more detailed analysis at country or local scale.

Regional Clustering
To test whether the TFC-NTL relationship differs significantly across regions of the world (e.g., due to cultural, infrastructure and development clustering), here (Table 7) we report the results of the regression specifications implementing interaction terms between each macro-region and NTL intensity. Similarly to income-category clustering, Figure 6 shows the scatterplot and linear fit for the log-log specification by region, where the values for each country are averaged across the 2012-2016 period.
Results show that once fixed-effects are added, the main effect of log(NTL) becomes insignificant, but the interaction effects between log(NTL) and region remain positive and highly significant in Sub-Saharan Africa, the Middle East and North Africa, Latin America and the Caribbean, and East Asia and the Pacific, with elasticities in the 0.34-0.75 range. In particular, the highest effect is found for East Asia and the Pacific and the lowest for the Middle East and North Africa. Conversely, interaction coefficients for North America, South Asia and Europe are insignificant, i.e., changes in these regions are not successfully predicted by NTL.
Comparing results with the regressions for the other consumption categories found in Appendix A reveals substantially similar patterns in terms of significance. The key discrepancy is that while lower-middle income countries exhibit highly significant coefficients also for the residential (0.21) and commercial and public (0.34) sectors, the RESIDENT and COMMPUB coefficients for upper-middle income countries are not significant even under the country fixed-effects only specification. On the other hand, we observe a coefficient significant at the 10% level for the country fixed-effects specification for commercial and public consumption in high-income countries, although the observed coefficient (0.01) is particularly low and thus not deemed a robust result. At the regional level, for residential consumption, the same interaction-term coefficients that were statistically significant for TFC are also significant for the RESIDENT variable. However, in this case only two regions exhibit very high significance, namely East Asia and the Pacific (0.59) and Sub-Saharan Africa (0.36). Conversely, with regard to commercial and public consumption, only in East Asia and the Pacific is NTL found to be a significant predictor of interannual, within-country consumption level changes, with a coefficient of 1.17 significant at the 5% level.

Discussion and Conclusions
We have assessed the capacity of the interannual variation in VIIRS night-time light radiance to predict changes in national electricity consumption during the 2012-2016 period. Our results for TFC have highlighted that the approach is successful in lower-middle income countries, for which a 0.23 coefficient is found in the country-year demeaning specification, and only partially in upper-middle income countries (country fixed-effects coefficient of 0.13, only significant at the 10% level), but not across other income categories. We believe that low-income countries might be affected by (i) issues of data quality and (ii) little interannual heterogeneity in the five-year period for which the data is available, while in upper-middle and high-income countries a decoupling in the relationship between light radiance and actual power consumption is the hypothesized reason for the insignificance of the regression coefficient. This hypothesis finds support in the pooled OLS specifications, which look into overall changes and not into the within-country variation. Here, the magnitude of the regression coefficient declines as one moves up across income categories. This is in line with the initial hypothesis. In a similar fashion, we have found that the macro-regions where the relation does not hold are those with the countries with the highest consumption and per-capita income levels in the world, namely North America and Europe, with the additional exception of South Asia (including India, Bangladesh, Pakistan, Sri Lanka, and Nepal). Conversely, the approach seems particularly suitable for East Asia and Latin America, for which the highest and most significant coefficients are found.

Research Prospects
One of the key objectives of this paper has been to carry out an exploratory analysis of the capacity of NTL to predict the change in power consumption within a country. This is particularly interesting with a view to applying the methodology at finer spatio-temporal scales (e.g., in a number of developing countries at a monthly scale) and building an ad hoc model which, once appropriately calibrated on historical data, can produce continuous estimates of energy consumption as new data is published. The upcoming release of a new generation of high-resolution, monthly NTL data products, the VIIRS-based VNP46 Black Marble [4], could help mitigate a substantial part of the issues in the raw data discussed in the Methods section of this paper. Furthermore, validation with region- or province-level consumption data, in tight cooperation with local power supply utilities, could render the model even more effective and insightful for the specific location under examination. At the same time, the national-scale, yearly estimation approach could also be improved through the development of better algorithms to clean the data and remove detected light that is not of electric origin or is distorted by other factors, such as flaring, calibration noise, and albedo and land cover effects.

Table A3. Regression for COMMPUB-Higher-middle income countries.


Figure 2. Average annual sum of light per capita, worldwide.

Figure 4. NTL vs. TFC for (A) lower-middle income and (B) high-income countries (annual values between 2012 and 2016). Each color represents a separate country.

Figure 5. Share of electricity consumption sectors over TFC, by income category. COMMPUB: commercial and public electricity consumption; RESIDENT: residential electricity consumption.

Table 1. Median sum of light in selected countries with and without floor correction.

Table 2. Mean sum of light with and without correction in top flaring countries and in control countries.

Table 3. Low-income countries regression for TFC.

Table 4. Lower-middle income countries regression for TFC.

Table 5. Upper-middle income countries regression for TFC.

Table 6. High-income countries regression for TFC.

Table 7. Region clustering regression for TFC.

Table A6. Regression for RESIDENT-Low-income countries.

Table A7. Regression for RESIDENT-Lower-middle income countries.

Table A8. Regression for RESIDENT-Higher-middle income countries.

Table A9. Regression for RESIDENT-High-income countries.