Can Nighttime Light Data Be Used to Estimate Electric Power Consumption? New Evidence from Causal-Effect Inference

Abstract: Nighttime light data are often used to estimate socioeconomic indicators, such as energy consumption, GDP, and population. However, whether there is a causal relationship between them needs further study. In this paper, we propose a causal-effect inference method to test whether nighttime light data are suitable for estimating socioeconomic indicators. Data on electric power consumption and nighttime light intensity in 77 countries were used for the empirical research. The main conclusions are as follows: First, nighttime light data are more appropriate for estimating electric power consumption in developing countries, such as China, India, and others. Second, more latent factors need to be added into the model when estimating the power consumption of developed countries using nighttime light data. Third, the light spillover effect is relatively strong in the contiguous regions between developed countries and developing countries, such as Spain, Turkey, and others, so nighttime light data are not suitable for estimating socioeconomic indicators there. Finally, we suggest that more attention should be paid in the future to the intrinsic logical relationship between nighttime light data and socioeconomic indicators.


Introduction
In the social sciences, we often rely on secondary data released by governments or research institutions. For smaller administrative units, such data are often not available. Moreover, the data are often not timely, and their time span is often large [1]. Croft discovered that nighttime light data can be used as an indicator of human activities [2]. More recently, scientists have begun to use remote sensing data to estimate data on social activities. Among these sources, the nighttime light data released by the U.S. National Oceanic and Atmospheric Administration (NOAA) are the most widely used. The data were collected by the Operational Linescan System (OLS) flown by the U.S. Air Force Defense Meteorological Satellite Program (DMSP) from 1992 to 2012. In 2013, Visible Infrared Imaging Radiometer Suite (VIIRS) data products appeared as a replacement for DMSP/OLS, with higher resolution and grayscale, effectively avoiding pixel mutation and saturation, and eliminating the interference of stray light, lightning, moonlight, and cloud cover [3].
Nighttime light data are an important marker of human activity, and they are also the most direct spatial signature of the urbanization of human society. Multidimensional and multiscale study of global and regional nighttime light data helps us understand the connection between global environmental change and the human environment. In recent years, nighttime light data have been increasingly used to aid the rapid assessment and comprehensive spatial analysis of key elements of urbanization, such as urban land cover, population estimates, and economic activity at global or regional scales [4]. Because of their global extent, standardized production, and the relative ease with which they can be accessed, DMSP nighttime light data have been widely used as a proxy for economic and social indicators that are otherwise difficult to measure [5]. The logic is that urban processes are highly correlated with each other [6]; if one process or activity can be measured well, it can be used to make reasonable estimates of others. In the existing research, the estimation methods are linear regression, logistic regression, and power regression. These models can achieve a very high coefficient of determination or correlation coefficient, in some cases as high as 99%. In econometrics, however, such a high fit is also a warning sign of spurious regression. Therefore, we want to know, from the perspective of econometrics, whether there is a causal relationship between nighttime light data and these social indicators.
In this paper, we propose a causal-effect inference method to test whether nighttime light data are suitable for estimating socioeconomic indicators. We select electric power consumption as the research object and use econometric methods to test the relationship between nighttime light data and electric power consumption at the country level. First, in Section 2 we summarize the research results of estimating electric power consumption using light data. Then we introduce data resources and research objects, including countries and time span. The testing process is described in Section 3. The empirical results are shown in Section 4. In Section 5, we discuss the empirical results. Finally, the conclusions and suggestions are made in Section 6.
Nighttime light intensity data were first used to estimate population, GDP, and electric power consumption [4,36,37]. A regression model should be tested against information criteria, such as AIC and BIC. However, existing papers pay little attention to any criterion other than R-squared: they base their predictions on a high adjusted R-squared, which they treat as the most important criterion. It is safe to interpret night lights as an indication of anthropogenic activity specifically attributable to the use of electric lighting. In earlier years, due to the lack of statistical data, some scholars used light data to estimate energy consumption in poor countries and regions [4,12,13]. With the improvement of statistical data at the national level, scholars began to use light data to estimate the energy consumption of city-level or smaller units, mainly electric power consumption [8,14,20,22]. According to the existing literature, the relationship between nighttime light intensity and electric power consumption is the earliest and most studied. Therefore, we choose electric power consumption as the research object of this paper. The research process proposed in this paper can be used to test other objects, such as GDP, metal stocks, carbon emissions, and so on.

Method
We propose a causal-effect inference method that combines three kinds of econometric and statistical analysis: correlation analysis, cluster analysis, and causality analysis. First, we conducted correlation analyses on the time and space dimensions. Based on the results of the spatial correlation analysis, we then carried out a cluster analysis of the samples. Finally, we selected three kinds of panel data analysis methods: a panel unit root test [37–40], a panel cointegration test [40,41], and a panel causality test [42–44]. The test process is shown in Figure 1 and contains seven steps in total. The tests in this paper were mainly run in R and ArcGIS.
Step 1: We first collected data on socioeconomic indicators from published statistical databases and downloaded nighttime light data from the NOAA website. The nighttime light data were then pre-processed to match the spatial scale of the socioeconomic indicators. For example, if the socioeconomic data are national annual data, the light data should also be aggregated to the national annual level (Equation (1)).

Step 2: We calculated the correlation coefficient and tested spatial dependence between the two series. First, we calculated the annual correlation coefficient by using annual data from countries (Equation (2)). Then we used annual data from each country to calculate the country's own correlation coefficient (Equation (3)). Then, the correlation results were analyzed.

$$R_t = \frac{\mathrm{Cov}(X_t, Y_t)}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(Y_t)}} \qquad (2)$$

$$R_i = \frac{\mathrm{Cov}(X_i, Y_i)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(Y_i)}} \qquad (3)$$

where, in Equations (2) and (3), Cov and Var represent the covariance and the variance, respectively; $R_t$ denotes the correlation coefficient between X and Y at time t; $X_t$ denotes the data set vector of variable X at time t; $Y_t$ denotes the data set vector of variable Y at time t; $R_i$ denotes the correlation coefficient of the i-th country; $X_i$ denotes the data set vector of the i-th country's variable X; and $Y_i$ denotes the data set vector of the i-th country's variable Y.
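The two correlation calculations in Step 2 can be illustrated with a minimal Python sketch. It assumes the panel is held as a dictionary mapping each country to a list of annual values; the function names and the toy numbers are hypothetical and only show the difference between the cross-country coefficient for one year and the over-time coefficient for one country.

```python
import math

def pearson(x, y):
    """Pearson correlation: Cov(x, y) / sqrt(Var(x) * Var(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # The 1/n factors of Cov and Var cancel, so raw sums are enough.
    return cov / math.sqrt(vx * vy)

def yearly_correlations(light, power):
    """One coefficient per year, computed across countries."""
    countries = sorted(light)
    n_years = len(light[countries[0]])
    return [
        pearson([light[c][t] for c in countries],
                [power[c][t] for c in countries])
        for t in range(n_years)
    ]

def country_correlations(light, power):
    """One coefficient per country, computed across years."""
    return {c: pearson(light[c], power[c]) for c in light}

# Toy panel (made-up values): three countries, three years.
light = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 5, 9]}
power = {"A": [2, 4, 6], "B": [9, 7, 5], "C": [1, 1, 2]}

by_year = yearly_correlations(light, power)
by_country = country_correlations(light, power)
print(by_year)
print(by_country)
```

In the toy panel, country A has a perfectly positive relationship over time and country B a perfectly negative one, which mirrors the kind of sign difference between countries that the spatial analysis later examines.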
Step 3: We first tested the spatial dependence of the correlation coefficients using spatial statistical methods [45]. The method for spatial dependence testing in this paper was GW statistics [46]. This step was optional. If the coefficients were independent in the spatial dimension, we jumped to Step 4. If there was spatial dependence, we needed to classify the samples. There are many methods of spatial clustering analysis, and the choice depended on the results of the correlation analysis. In this paper, we conducted a cluster analysis of the spatial dimension using a natural breaks classification method based on the GW correlation coefficient [46,47]. The number of clusters was determined by the algorithm itself. In all subsequent steps, we examined the whole sample and each cluster separately.
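To make the idea of a natural breaks classification concrete, the following Python sketch splits a one-dimensional list of correlation coefficients into contiguous classes that minimize the within-class sum of squared deviations. It is only an illustration under stated assumptions: the number of classes `k` is passed in by hand (the paper lets the algorithm decide), the function name is hypothetical, and the input values are made up rather than taken from the GW results.

```python
def natural_breaks(values, k):
    """Split values into k contiguous classes (after sorting) that
    minimize the total within-class sum of squared deviations."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums let us get the squared deviation of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the classes.
    groups, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return groups

# Made-up coefficients with three visually obvious clusters.
coefficients = [0.97, 0.95, 0.10, 0.12, -0.50, -0.55]
groups = natural_breaks(coefficients, 3)
print(groups)
```

The dynamic program is cubic in the worst case, which is acceptable for a sample of a few dozen countries; a production analysis would more likely use the classification built into a GIS package.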
Step 4: Stationarity or cointegration is necessary for a causality test [44]. In testing non-stationary panel data, Levin and Lin found that the limit distribution of these estimators is Gaussian. These results were also applied to heterogeneous panel data, and an early version of the panel unit root test was established. Later, after improvements by Levin et al., the Levin, Lin and Chu (LLC) method of testing for a panel unit root was put forward [39]. Levin et al. pointed out that this method allows for different intercepts and time trends, heteroscedasticity, and high-order serial correlation, and is suitable for a panel unit root test for medium dimensions (time series between 25 and 250, cross-sectional numbers between 10 and 250). Im et al. also proposed the Im, Pesaran and Shin (IPS) method to test the unit root of the panel [40], but Breitung found that the IPS method was very sensitive to the specification of the deterministic trend, and proposed the Breitung method to test the unit root of the panel [48]. Maddala and Wu also proposed the ADF-Fisher and PP-Fisher panel unit root tests [49]. Which test method to choose therefore depends on the panel data structure. After completing the stationarity test, the next step is chosen according to Table 2. When both X and Y were stationary, we went directly to Step 6. When both X and Y were non-stationary, we proceeded to Step 5. When only one of X and Y was stationary, we concluded that there was no causal relationship between them.

Table 2. Step selection rules.

| | Y stationary | Y non-stationary |
|---|---|---|
| X stationary | Step 6 | End |
| X non-stationary | End | Step 5 |

Step 5: If there is a cointegration relationship between two non-stationary series, a causality test can still be carried out between them. Since Pedroni proposed the panel cointegration test [40], there have been plentiful research results using it. At present, Pedroni's, Kao's, and Johansen's tests are the main methods for balanced panel data. In this paper, we chose three methods. If there was a cointegration relationship between X and Y, we proceeded to Step 6. If there was no cointegration relationship between X and Y, we concluded that there was no causal relationship between them.
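The routing logic of Steps 4 and 5 is simple enough to state as a small Python sketch. The function names are hypothetical, and the stationarity and cointegration flags are assumed to come from the panel tests described above.

```python
def next_step_after_stationarity(x_stationary: bool, y_stationary: bool) -> str:
    """Apply the step selection rules of Table 2."""
    if x_stationary and y_stationary:
        return "Step 6"   # both stationary: go straight to the causality test
    if not x_stationary and not y_stationary:
        return "Step 5"   # both non-stationary: test for cointegration first
    return "End"          # mixed case: no causal relationship is inferred

def next_step_after_cointegration(cointegrated: bool) -> str:
    """Step 5 outcome: only cointegrated series move on to the causality test."""
    return "Step 6" if cointegrated else "End"

# Example: both series non-stationary, then found to be cointegrated.
first = next_step_after_stationarity(False, False)
second = next_step_after_cointegration(True)
print(first, "->", second)
```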
Step 6: How to decide causality between two related variables is a difficult and important issue for economists [44]. Granger pointed out that the past and present may cause the future, but the future cannot cause the past [50]. Because the data in this paper are panel data, a panel Granger causality test method was adopted. Dumitrescu and Hurlin proposed a test of Granger noncausality for heterogeneous panel data models [43]. Lopez and Weber implemented the procedure proposed by Dumitrescu and Hurlin for detecting Granger causality in panel data sets [42]. In this paper, we chose this method to test causality between electric power consumption and nighttime light data. The null hypothesis and alternative hypothesis are as follows:
Null hypothesis: There is no causal relationship between X and Y.
Alternative hypothesis: There is a causal relationship between X and Y.
The test equations are shown as Equations (4) and (5). For simplicity, the individual effects $\alpha_i$ are assumed to be fixed in the time dimension. Both individual processes, $nightlight_{i,t}$ and $socioeconomic_{i,t}$, are given and observable. We assume that the lag orders k are identical for all cross-section units of the panel and that the panel is balanced. In addition, we allow the autoregressive parameters $\gamma$ to differ across cross-section units, following Dumitrescu and Hurlin [43].

Step 7: By analyzing the empirical results, we made a judgment on the relationship between X and Y. When the judgment is that there is no causal relationship, the reasons need further analysis. At the same time, the results of this study provide some direction and reference for other studies.
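Step 6 relies on the Dumitrescu and Hurlin framework [43]. As a sketch, the standard textbook form of that specification is written out below for one direction of the test, with x and y as placeholder names for the two series; swapping x and y gives the other direction. The symbols K, β, ε, W, and Z̄ follow the usual presentation of the test and are assumptions here, not notation taken from this paper.

```latex
% Heterogeneous panel model for unit i = 1..N and period t = 1..T,
% with K lags common to all units (placeholder notation).
y_{i,t} = \alpha_i
        + \sum_{k=1}^{K} \gamma_i^{(k)} \, y_{i,t-k}
        + \sum_{k=1}^{K} \beta_i^{(k)} \, x_{i,t-k}
        + \varepsilon_{i,t}

% Null hypothesis: x does not Granger-cause y in any unit.
H_0 : \beta_i^{(1)} = \dots = \beta_i^{(K)} = 0 \quad \text{for all } i

% Alternative: causality exists in at least some units.
H_1 : \beta_i \neq 0 \quad \text{for some } i

% Average of the individual Wald statistics W_{i,T},
% and its standardized version, asymptotically N(0,1).
\bar{W} = \frac{1}{N} \sum_{i=1}^{N} W_{i,T},
\qquad
\bar{Z} = \sqrt{\frac{N}{2K}} \, \bigl( \bar{W} - K \bigr)
```

Because the null is rejected when only some units show causality, a rejection for a group of countries should be read as evidence of causality somewhere in that group, not in every member.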

Data
There have been some studies on the relationship between energy consumption and nighttime light data at the national, provincial, and city levels [4,8,16–18,21]. Because there is a strong correlation between nighttime light and electric power consumption, these studies mainly use light data to estimate energy consumption. Because we want to evaluate the causal relationship between them, we must use statistical data on electric power consumption. However, such data are often available only at the national level, and statistical objects tend to be large-scale administrative units, such as countries and provinces. Reliable statistics serve as the foundation and starting point for social science research of any country [1]. Considering the availability of data, we chose the national level as the test object. Based on the data released by BP [36], we finally selected 77 countries as samples. These 77 countries are the world's major consumers of electric power, accounting for more than 90% of the world's total consumption [36].
Nighttime light data are available on the NOAA website. In this paper, we use DMSP stable light data, obtained by averaging the annual visible-band gray values after eliminating the impact of accidental noise, such as clouds and fires. Sensors degrade as satellites age, so old satellite sensors are replaced every few years. The nighttime light data used in this paper, covering 1992 to 2012, were therefore extracted from different satellites. Because old and new sensors overlap during replacement, some years have data collected by two satellites at the same time. Because of different sensor settings, the data collected by different satellites in the same year are not always comparable. Such problems make it impossible for us to use unprocessed light data directly. The nighttime light data correction method proposed by Elvidge et al. was used to correct the light data [3]. Since the electric power consumption data are annual, the annual nighttime light sum was used for the light data (Equation (1)).
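As an illustration of aggregating light data to the national annual level, the sketch below sums pixel values inside a country mask for one annual composite. It assumes the annual light sum is the plain sum of pixel digital number (DN) values within the country boundary, which is one common definition and may differ in detail from the paper's Equation (1); the function name, grid, and mask are hypothetical.

```python
def sum_of_lights(dn_grid, country_mask):
    """Sum the DN values of all pixels flagged as inside the country.

    dn_grid      -- 2-D list of pixel values for one annual composite
    country_mask -- 2-D list of booleans with the same shape
    """
    if len(dn_grid) != len(country_mask):
        raise ValueError("grid and mask must have the same number of rows")
    total = 0
    for dn_row, mask_row in zip(dn_grid, country_mask):
        if len(dn_row) != len(mask_row):
            raise ValueError("grid and mask rows must have the same length")
        for dn, inside in zip(dn_row, mask_row):
            if inside:
                total += dn
    return total

# Toy 2 x 3 composite (DMSP stable-light DN values range from 0 to 63).
grid = [[0, 5, 63],
        [10, 0, 7]]
mask = [[True, True, False],
        [False, True, True]]
annual_sum = sum_of_lights(grid, mask)
print(annual_sum)
```

Repeating this for each country and each corrected annual composite yields the country-by-year light series that is paired with the electric power consumption data.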

Correlation Analysis
We used annual data from 77 countries to calculate the correlation between global nighttime light intensity and electric power consumption. In the global results, there was indeed a high correlation between them. In the time dimension, the correlation coefficient between nighttime light intensity and electric power consumption showed a downward trend (Figure 2), falling from 0.98 in 1992 to 0.88 in 2012. From 1992 to 2005, the correlation decreased only slightly, with increases in 1997 and 1999. After 2005, the correlation between global nighttime light intensity and electric power consumption decreased faster.
In the spatial dimension, the correlation between them also had spatial heterogeneity. In Southeast Asia, Africa, the Middle East, and South America, the relationship between them was positive. However, in North America and Europe, the relationship between them was negative. The correlation coefficients of developing countries were generally positive and high; China had the highest, at 0.9915. But those of developed countries were generally low or negative; Canada had the largest negative correlation, at −0.7919; Nordic countries were close to 0; and some European countries near Africa had higher correlation coefficients, such as Spain and Greece (Table 3). Using the spatial classification method [45], we divided the whole sample into seven groups. The results are shown in Figure 3.
Note: * denotes rejection of the null hypothesis at the 10% significance level; ** denotes rejection of the null hypothesis at the 5% significance level; *** denotes rejection of the null hypothesis at the 1% significance level.
The mean correlation coefficient of the whole sample was 0.4290. The group means were 0.9658 for Group 1, 0.9028 for Group 2, 0.8252 for Group 3, 0.6945 for Group 4, 0.2589 for Group 5, −0.1186 for Group 6, and −0.5432 for Group 7 (Table 4). The highest was China and the lowest was Canada. The standard deviations of groups with a larger mean were smaller, such as Groups 1 and 2; those of groups with smaller means were larger, such as Groups 5 and 7.

Stationarity Test
Three methods (the HT, LLC, and ADF tests) were used in this study. If the cross-sectional dimension of panel data is large and the time dimension is small, the panel is called a short panel; in the opposite case, it is called a long panel. The whole sample group is a short panel, which is tested by the HT and the ADF methods. Groups 1 to 7 are all long panels, which are tested by the LLC and the ADF methods. The results are shown in Table 5. Following the majority principle, a series is judged to be stationary when most of the test results indicate stationarity, and non-stationary otherwise. In the nighttime light intensity series, the whole sample group, Group 4, Group 5, Group 6, and Group 7 were stationary; Group 1, Group 2, and Group 3 were non-stationary. In the electric power consumption series, Group 3, Group 5, Group 6, and Group 7 were stationary; the whole sample group, Group 1, Group 2, and Group 4 were non-stationary.
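The majority principle can be sketched as a vote over the p-values of the individual unit root tests. This is an illustration only: the significance level of 0.05, the treatment of ties as non-stationary, and the function name are assumptions that the text does not specify, and the p-values are made up. Each of the tests used here has a unit root as its null hypothesis, so a rejection counts as a vote for stationarity.

```python
def is_stationary(p_values, alpha=0.05):
    """Majority vote over unit root tests.

    A test votes 'stationary' when it rejects the unit root null
    (p < alpha). Ties count as non-stationary (an assumption).
    """
    if not p_values:
        raise ValueError("at least one test result is required")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections * 2 > len(p_values)

# Made-up p-values for three test specifications on one series.
verdict = is_stationary([0.01, 0.20, 0.04])
print(verdict)
```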
Based on the stationarity test results, Groups 5, 6, and 7 could be used directly for the causality test, while Groups 1 and 2 first needed the cointegration test (Table 6). The whole sample group, Group 3, and Group 4 did not go on to the next step, because only one of the two series was stationary in each; under the step selection rules, we concluded that there was no causal relationship between nighttime light intensity and electric power consumption in these groups.

Cointegration Test
In this study, we used three methods to test cointegration: the Kao test, the Pedroni test, and the Westerlund test. The results are shown in Table 7. Comparing the three panel cointegration test results, we found a cointegration relationship between nighttime light intensity and electric power consumption in Group 1 and Group 2. These two groups could therefore be used for panel causality analysis in the next step.

Causality Test
For the whole sample, Group 3, and Group 4, a panel Granger causality test could not be performed, because they did not meet the basic conditions of the causality test. Therefore, we could not judge whether there was homogeneous Granger causality in these groups. In Group 2 and Group 7, electric power consumption was the Granger cause of nighttime light intensity, but nighttime light intensity was not the Granger cause of electric power consumption. In Groups 1, 5, and 6, causality ran in both directions: electric power consumption was the Granger cause of nighttime light intensity, and nighttime light intensity was also the Granger cause of electric power consumption (Table 8).

Discussion
According to the correlation results, there was spatial heterogeneity in the relationship between power consumption and nighttime light data. Developing countries, such as China and Vietnam, generally had a high and positive correlation; developed countries generally had a low or even negative correlation. According to the causal analysis, there was no causal relationship in the world as a whole, but there was local causality. In developing countries and regions with a strong correlation, such as Asia, South America, and Africa, the results showed that there was causality between power consumption and nighttime light intensity; in developed countries, there was no causality between power consumption and nighttime light intensity. The main reasons are the following:

• First, a country's electric power consumption structure affects the estimation results; the electricity consumption structure of developed countries is more complex, and that of developing countries is simpler.
• Second, the power supply structure of developed countries also has a greater impact; the power supply channels of developed countries are diversified and the proportion of renewable energy is higher, such as in the Nordic region, while developing countries generally still use hydroelectric or coal-fired power generation (Figure 4).
• Third, there is a spatial spillover effect of nighttime light: near Africa and the Middle East, some European countries have a strong positive correlation, such as Portugal, Spain, Turkey, and Ireland, while other European countries have a weak or negative correlation (Figure 3).

In general, we can draw some conclusions. First, nighttime light data can be used to estimate power consumption intensity in some areas. The estimates for developing regions, such as Southeast Asia and the Middle East, are more accurate. Using nighttime light data to directly estimate the economic and social indicators of developed countries produces a large error. Second, in the contiguous regions between Africa, the Middle East, and Europe, such as Spain and Turkey, the nighttime light spillover effect is relatively strong, so nighttime light data are not suitable for estimating socioeconomic indicators there. Third, for developed countries, it is necessary to introduce more variables, such as urban population, spatial climate conditions, and so on, to estimate economic and social indicators using nighttime light data.

Conclusions
In order to evaluate the applicability of nighttime light data in estimating socioeconomic indicators, we propose a causal-effect inference method to test the relationship between nighttime light data and socioeconomic indicators. According to the method, we found evidence to support applications of nighttime light data in the estimation of socioeconomic indicators. At the same time, we find that some conclusions are of great significance to the application of light data. The main conclusions are as follows:

• Causal inference is necessary before estimating socioeconomic indicators from nighttime light data.
• Spatial heterogeneity exists in the applicability of nighttime light data in estimating socioeconomic indicators.
• Nighttime light data are more suitable for estimating electric power consumption in developing countries, such as China, India, and so on.
• For developed countries, it is necessary to add more latent variables, such as urban population, spatial climate conditions, and so on, to estimate electric power consumption using nighttime light data.
• In geographically contiguous regions, such as the regions between Africa, the Middle East, and Europe, the nighttime light spillover effect is relatively strong, so nighttime light data need to be corrected.

Nighttime light data are publicly accessible. In academic research and practice, economists, geographers, and ecologists regard them as an important tool and use them widely. This study suggests that more attention should be paid to the difference between correlative and causal relationships in future practice: correlation between nighttime light data and economic and social indicators does not imply causality. When using nighttime light data in practice, causal inference is a necessary step. The method proposed in this paper can be used to test other research objects, such as metal stocks, population, and GDP. It is also applicable when statistical data for smaller spatial units, such as the province level, the city level, or smaller units, can be obtained. Future research should pay more attention to the intrinsic logical relationship between nighttime light data and socioeconomic indicators, and in particular to their relationship in the spatial dimension. In order to estimate socioeconomic indicators more accurately using light data, it is necessary to deal with the spatial dependence of nighttime light data.