Statistical Correlation between Monthly Electric Power Consumption and VIIRS Nighttime Light

Abstract: The nighttime light (NTL) imagery acquired from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) makes it feasible to investigate socioeconomic activities at a monthly scale, in contrast to the annual studies based on nighttime light data acquired from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS). This paper is the first attempt to discuss the quantitative correlation between monthly composite VIIRS DNB NTL data and monthly statistical data of electric power consumption (EPC), using 14 provinces of southern China as the study area. Two types of regression (linear regression and polynomial regression) and nine kinds of NTL data with different treatments are employed and compared in the experiments. The study demonstrates that: (1) polynomial regressions are more reliable, with an average R square of 0.8816, than linear regressions, with an average R square of 0.8727; (2) regressions between EPC and denoised NTL with a threshold of 0.3 nW/(cm²·sr) steadily exhibit the strongest reliability among the nine kinds of processed NTL data. In addition, polynomial regressions between EPC and denoised NTL with a threshold of 0.3 nW/(cm²·sr) are constructed for the 12 months; their average R square and mean absolute relative error are 0.8906 and 16.02%, respectively. These optimal regression equations can be used to estimate the monthly EPC of each province, produce thematic maps of EPC, and analyze their spatial distribution characteristics.

Electric power consumption (EPC) is a basic index for measuring regional energy consumption. It not only reflects economic performance objectively, but also reveals changes in industrial structure and the level of energy consumption. Obtaining accurate and timely EPC data is of great practical significance for optimizing the allocation of power resources and monitoring economic performance.
Two kinds of remotely sensed NTL data, the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) and the Visible Infrared Imaging Radiometer Suite Day/Night Band
Figure 1. Study area. Fourteen provinces of southern China were selected as study cases considering the spatial and temporal coverage of monthly VIIRS DNB data.

Nighttime Light Data
The monthly cloud-free composites of VIIRS NTL images collected from December 2012 to January 2019 were used in this study. These images were retrieved from the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (https://ngdc.noaa.gov/eog/viirs/index.html, last accessed on 1 April 2019). These data have not been filtered to screen out lights from aurora, fires, boats, and other temporary lights. Only two years of yearly composites (2016 and 2017) had been released on the website. The VIIRS images provide gridded average values of anthropogenic NTL radiance (in units of nW/(cm²·sr) hereafter) with a spatial resolution of 15 arc-seconds (~500 m at the equator).
The NTL data of June 2018 were not available online and were therefore represented by the average of the May and July 2018 data. For convenience, the downloaded NTL data, together with the estimated NTL data of June 2018, are referred to as original NTL (NTL0) hereafter.

Auxiliary Data
Monthly EPC data of the 14 provinces in the study area from January 2013 to December 2018 were acquired from the statistical website of each provincial government. EPC included industrial and household electricity consumption, which could reflect the social and economic status.
The vector data of the provincial administrative regions of the study area were acquired from the website of the Database of Global Administrative Areas (GADM, https://gadm.org/) and were used for regional aggregation of the NTL data. The projection and coordinate system of the vector data were consistent with those of the VIIRS DNB data.

Methods
Four main procedures were undertaken to find the optimal regression between NTL and EPC: firstly, gap filling of the downloaded NTL data; secondly, denoising of the gap-filled NTL; thirdly, spatial filtering of the denoised NTL; fourthly, regression between NTL and EPC for each month and evaluation of the regression (Figure 2).
ISPRS Int. J. Geo-Inf. 2020, 9, 32

Gap Filling of NTL Data
NTL data with nearly complete spatial coverage of the study area in all months were selected for the experiment. However, there were still no-value areas in the northernmost part of the study area in June of every year. These no-value pixels were replaced by the average of the same pixels in May and July of the same year [36]. In addition, due to various factors, pixels with values less than or equal to 0 nW/(cm²·sr) appeared sporadically in the images of all months. They were replaced by the average values of the same pixels in the preceding and following months, based on the assumption that night lighting changes gradually between adjacent months. The NTL data of June 2014 before and after gap filling are shown as an example in Figures 3 and 4, respectively. After gap filling, the coverage and availability of the NTL data were significantly improved. Nevertheless, a small number of pixels equal to or less than 0 nW/(cm²·sr) remained in the image; these were handled in the subsequent noise reduction process.
The NTL data after gap filling of the pixels less than or equal to 0 nW/(cm²·sr) are called NTLg hereafter.
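The gap-filling rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: images are represented as nested lists, `None` is used as a hypothetical no-value marker, and the function name `fill_gaps` is invented here. A pixel is left unchanged when either neighbouring month has no value, which is one possible reading of why some non-positive pixels remained after gap filling.

```python
def fill_gaps(prev_month, this_month, next_month):
    """Replace no-value or non-positive pixels with the mean of the same
    pixel in the preceding and following months (when both are available)."""
    filled = []
    for prev_row, row, next_row in zip(prev_month, this_month, next_month):
        new_row = []
        for before, value, after in zip(prev_row, row, next_row):
            needs_fill = value is None or value <= 0
            can_fill = before is not None and after is not None
            if needs_fill and can_fill:
                value = (before + after) / 2
            new_row.append(value)
        filled.append(new_row)
    return filled


# Hypothetical 2 x 2 radiance grids in nW/(cm^2·sr); None marks "no value".
may = [[1.0, 2.0], [0.5, 4.0]]
june = [[None, 2.2], [-0.1, 0.0]]
july = [[3.0, 2.4], [1.5, None]]

june_filled = fill_gaps(may, june, july)
print(june_filled)
```

In this toy example the lower-right pixel stays at 0 because July has no value there; such leftovers would be dealt with in the denoising step.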

Denoising of NTL
Several kinds of processing were implemented on the NTLg data, including denoising, average filtering, median filtering, and mid-value filtering.
Background noise exists in VIIRS DNB data and should be treated. Li et al. derived denoised NTL data by multiplying the NPP-VIIRS imagery by a mask generated from all positive-value pixels of the DMSP-OLS imagery in 2010 [15]. Ma et al. proposed a simple and feasible denoising method that takes the mean radiance value of lake pixel samples as the denoising threshold, which equaled 0.3 nW/(cm²·sr) [37]. Using the method proposed by Ma et al., the NTLg data were denoised by setting pixels with values lower than 0.3 nW/(cm²·sr) to 0 nW/(cm²·sr). The result is called denoised NTL with threshold of 0.3 (or NTL1) hereafter.
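The thresholding step can be illustrated as follows. This is a minimal sketch that assumes the image is a nested list of radiance values with no missing pixels; the function name `denoise` is hypothetical. Only the 0.3 nW/(cm²·sr) threshold comes from the text.

```python
THRESHOLD = 0.3  # nW/(cm^2·sr), the lake-based threshold of Ma et al. cited above


def denoise(ntl_g, threshold=THRESHOLD):
    """Set every pixel below the threshold to 0; keep all other pixels."""
    return [[0.0 if value < threshold else value for value in row]
            for row in ntl_g]


# Hypothetical gap-filled grid: noise, a value exactly at the threshold,
# a leftover negative pixel, and a lit pixel.
ntl_g = [[0.1, 0.3], [-0.2, 5.0]]
ntl_1 = denoise(ntl_g)
print(ntl_1)
```

Because the comparison is strict, a pixel exactly equal to the threshold is kept, which matches the wording "lower than 0.3".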

Spatial Filtering
There may be a few pixels with abnormally high values in the NTL1 data, due to gas flares, fires, oilfields, volcanoes, etc. In order to reduce the potential influence of these abnormally high values, average filtering, median filtering, and mid-value filtering were each applied to NTL1.
Average filtering means that the pixel value is reset to the average value of the n × n adjacent pixels. The results of 3 × 3 and 5 × 5 average filtering are called NTL2 and NTL3 hereafter, respectively.
Median filtering means that the pixel value is reset to the median value of the n × n adjacent pixels. The results of 3 × 3 and 5 × 5 median filtering are called NTL4 and NTL5 hereafter, respectively.
Mid-value filtering means that the pixel value is reset to the average of the maximum and minimum values of the n × n adjacent pixels. The results of 3 × 3 and 5 × 5 mid-value filtering are called NTL6 and NTL7 hereafter, respectively.
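The three filters differ only in how the values of the window are reduced to a single number, which the following sketch makes explicit. It is an illustration, not the authors' code: the helper names are hypothetical, and the treatment of image borders (the window is simply clipped at the edges) is an assumption, since the text does not specify it.

```python
from statistics import mean, median


def window_values(image, row, col, size):
    """Collect the values in the size x size window centred on (row, col).
    Assumption: the window is clipped where it extends past the image edge."""
    half = size // 2
    rows = range(max(0, row - half), min(len(image), row + half + 1))
    cols = range(max(0, col - half), min(len(image[0]), col + half + 1))
    return [image[r][c] for r in rows for c in cols]


def mid_value(values):
    """Average of the maximum and minimum value of the window."""
    return (max(values) + min(values)) / 2


def spatial_filter(image, size, reducer):
    """Reset every pixel to reducer(values of its size x size window)."""
    return [[reducer(window_values(image, r, c, size))
             for c in range(len(image[0]))]
            for r in range(len(image))]


# Hypothetical NTL1 patch with one abnormally bright pixel in the centre.
ntl_1 = [[1.0, 1.0, 1.0],
         [1.0, 10.0, 1.0],
         [1.0, 1.0, 1.0]]

ntl_2 = spatial_filter(ntl_1, 3, mean)       # 3 x 3 average filtering
ntl_4 = spatial_filter(ntl_1, 3, median)     # 3 x 3 median filtering
ntl_6 = spatial_filter(ntl_1, 3, mid_value)  # 3 x 3 mid-value filtering
print(ntl_2[1][1], ntl_4[1][1], ntl_6[1][1])
```

On this patch the median filter removes the bright pixel entirely, the average filter spreads it over its neighbours, and the mid-value filter is dominated by it.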

Regression and Evaluation
The sum of NTL of each provincial region was calculated for each kind of NTL data (NTL0 to NTL7 and NTLg) by accumulating the values of all pixels in each region for each month.
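A provincial sum of this kind can be sketched as a zonal sum. The sketch assumes that the provincial polygons have already been rasterized onto the NTL grid as a second nested list of province identifiers (with `None` outside the study area); that preprocessing step, the identifiers "A" and "B", and the function name are all hypothetical.

```python
from collections import defaultdict


def provincial_sums(ntl, province_ids):
    """Accumulate pixel values per province identifier."""
    totals = defaultdict(float)
    for ntl_row, id_row in zip(ntl, province_ids):
        for value, province in zip(ntl_row, id_row):
            if province is not None:  # skip pixels outside the study area
                totals[province] += value
    return dict(totals)


# Hypothetical 2 x 2 grid and matching province mask.
ntl = [[1.0, 2.0], [0.0, 4.0]]
province_ids = [["A", "A"], ["B", None]]

sums = provincial_sums(ntl, province_ids)
print(sums)
```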
Two common regression models, linear regression and polynomial regression, were fitted between each sum of NTL and the EPC data. R-squared (R square), mean absolute relative error (MARE), maximum relative error (MRE), and root mean squared error (RMSE) were used to assess the reliability of each regression. In the definitions of these metrics, y_i represents the statistical EPC data of the ith sample, ŷ_i represents the calculated EPC data of the ith sample, and m denotes the sample size of each month, which equals 84 in this study. R square and RMSE were used to evaluate the quality of regression: the higher the R square and the lower the RMSE, the stronger the regression. MARE and MRE were used to describe the estimation error of the models; they served only as reference parameters because the maximum R square, minimum RMSE, and minimum MARE do not necessarily occur at the same time.
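The four metrics can be computed as in the sketch below. The formulas used are the conventional textbook definitions and are an assumption here, since the equations themselves are not reproduced in this text: R square is taken as one minus the ratio of residual to total sum of squares, relative errors are expressed in percent of the statistical value, and RMSE divides by the sample size m. The function name and the sample numbers are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, estimated):
    """Return R square, MARE (%), MRE (%) and RMSE for paired samples.

    observed  -- statistical EPC values y_i
    estimated -- calculated EPC values y_hat_i
    """
    m = len(observed)
    mean_observed = sum(observed) / m
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    rel_errors = [100 * abs(y - y_hat) / y
                  for y, y_hat in zip(observed, estimated)]
    return {
        "r_square": 1 - ss_res / ss_tot,
        "mare": sum(rel_errors) / m,  # mean absolute relative error
        "mre": max(rel_errors),       # maximum relative error
        "rmse": sqrt(ss_res / m),
    }


# Hypothetical EPC values (arbitrary units), not data from the study.
metrics = regression_metrics([100, 200, 300], [110, 190, 300])
print(metrics)
```

For one month of this study the two lists would each hold m = 84 values (14 provinces over 6 years).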

Overall Analysis of Regression
Two types of regression between monthly EPC and nine kinds of monthly NTL data with different treatments were performed. A total of 216 regression equations were obtained for the 12 months. To apply the results reliably in the future, it was essential to decide which type of regression and which kind of NTL data performed best.
As mentioned above, R square, MARE, MRE, and RMSE were employed to describe the quality of each regression equation. In order to compare the stability of these regressions over the 12 months of a year, the regression parameters of each regression were averaged over the 12 months, giving a total of 18 groups of average values (shown in Table 1). According to the average values in Table 1, all 18 regression formulas achieved promising results: no R square was lower than 0.8459, and the mean R square was 0.8772. The linear regression between NTL0 and EPC was the least reliable one overall, with R square, MARE, MRE, and RMSE of 0.8459, 20.70, 100.64, and 486632.44, respectively. Likewise, the polynomial regression between NTL0 and EPC was the least reliable of the nine polynomial regressions, with R square, MARE, MRE, and RMSE of 0.8607, 17.18, 92.30, and 462995.95, respectively. In other words, for both linear and polynomial regression, using processed NTL data was consistently more reliable than using the original NTL data. These comparisons demonstrate the necessity of processing NTL data appropriately before using it to estimate EPC, which may improve the reliability of estimation.
As shown in Table 1, polynomial regression was more reliable than linear regression for every kind of NTL data. The mean values of R square, MARE, MRE, and RMSE of the nine linear regressions were 0.8727, 18.96, 86.95, and 438650.23, respectively, whereas those of the nine polynomial regressions were 0.8816, 16.38, 82.55, and 423048.08, respectively. Notably, the mean MARE of the polynomial regressions was 13.60% lower, in relative terms, than that of the linear regressions. Therefore, compared with linear regression, polynomial regression yields more precise estimates of monthly EPC based on NTL data.
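The summary figures quoted above are mutually consistent, as a short check shows. The sketch uses only the rounded mean values reported in the text, so the results agree with the published figures up to rounding; the variable names are illustrative.

```python
# Mean values over the nine linear and nine polynomial regressions (from the text).
linear = {"r_square": 0.8727, "mare": 18.96}
polynomial = {"r_square": 0.8816, "mare": 16.38}

# Both groups contain nine regressions, so the mean over all 18 regressions
# is the plain average of the two group means.
overall_r_square = (linear["r_square"] + polynomial["r_square"]) / 2

# Relative reduction of MARE when moving from linear to polynomial regression.
mare_reduction = 100 * (linear["mare"] - polynomial["mare"]) / linear["mare"]

print(f"overall R square: {overall_r_square:.5f}")
print(f"relative MARE reduction: {mare_reduction:.2f}%")
```

The relative reduction comes out at about 13.6%, which is the sense in which the figure of 13.60% should be read: it is a ratio of the two MARE values, not the MARE of the polynomial regressions itself.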
Among the nine kinds of NTL data used to build regression models, NTL1 steadily exhibited the strongest reliability in both types of regression: the mean R square of the regressions between NTL1 and EPC was the highest for linear regression and for polynomial regression alike. By contrast, the three kinds of further processing (average filtering, median filtering, and mid-value filtering) applied to the NTL1 data failed to improve the reliability of regression.
Based on the above analysis, the following sections focus on the polynomial regressions between NTL1 and EPC.

Analysis of Monthly Regression
Taking NTL1 as the independent variable and EPC as the dependent variable, a polynomial regression was built for each of the 12 months; the results are shown in Figure 5.
In each plot, the regression curve visibly reflected the distribution trend of the scattered points. The vast majority of the points were close to the fitted curves, with low relative errors. Even in the polynomial regressions with relatively low R square (Figure 5e,j), only a few points were relatively far from the regression curves, with comparatively higher relative errors.
Polynomial regression equations between NTL1 and EPC for the 12 months, together with the corresponding R square, MARE, MRE, and RMSE, are listed in Table 2. Among the 12 monthly regressions, the R square of 5 months (Jan, Mar, Jul, Aug, and Dec) was higher than 0.9, with MARE lower than 16%. In addition, the R square of 3 months (Apr, May, and Oct) was between 0.82 and 0.85, with MARE between 19% and 20%. The MARE describes the overall reliability of estimation. By contrast, the MRE usually reflects the estimation results of a very few abnormal samples, so it does not vary consistently with R square in either the same or the opposite direction.
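A monthly polynomial regression of EPC on the provincial NTL sum can be sketched as an ordinary least-squares fit. Several things here are assumptions rather than facts from the text: the polynomial order is not stated, so a second-order polynomial is used for illustration; the function names are hypothetical; and the sample values are synthetic. The fit solves the normal equations directly, which is adequate for small, well-scaled inputs; real NTL sums are large numbers and would need to be rescaled first, or fitted with a library routine such as `numpy.polyfit`.

```python
def fit_polynomial(x, y, degree=2):
    """Least-squares polynomial fit; returns [c0, c1, ..., c_degree]
    so that y is approximated by c0 + c1*x + ... + c_degree*x**degree."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x_value):
    """Evaluate the fitted polynomial at x_value."""
    return sum(c * x_value ** k for k, c in enumerate(coeffs))


# Synthetic, rescaled samples generated from y = 2 + 3x + 0.5x^2.
ntl_sum = [1.0, 2.0, 3.0, 4.0, 5.0]
epc = [5.5, 10.0, 15.5, 22.0, 29.5]

coeffs = fit_polynomial(ntl_sum, epc, degree=2)
print(coeffs, predict(coeffs, 6.0))
```

Setting `degree=1` in the same function gives the linear regression used for comparison in the previous section.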

Discussion
Many researchers have endeavored to estimate EPC from NTL images because the consumption of electricity is often accompanied by the emission of light, for example from home lights, business lights, and street lamps. However, not all electricity consumption produces light; air conditioners, water heaters, and electric fans, for example, do not. Although these appliances do not directly produce light, they are closely related to human activities: where such appliances are in use, there are human activities, accompanied by household lights, commercial lights, street lamps, and so on. In addition, some light is produced without electricity from the grid, by gasoline or other materials, as in the case of car lights and fireworks.
From the perspective of time, EPC statistics cover the total consumption over a whole period, whereas NTL data record only the light above a certain brightness at a particular moment and carry no information about most other times. Therefore, it is theoretically impossible to calculate the annual or monthly EPC exactly from NTL data. We can only estimate EPC values within a given time period based on composite data of NTL values at multiple moments. Besides the accuracy of the NTL data, the accuracy of estimation may be affected by the industrial structure, energy consumption structure, population structure, and other factors of different regions.
The overpass time of SNPP is around 01:30 local solar time, which is not the peak lighting time of the day. Visual interpretation of VIIRS DNB images shows that there is still plenty of lighting after midnight, which probably lasts until dawn. Considering that reliable results have been obtained in a large number of previous studies based on these data, such lighting information can reasonably reflect socioeconomic activities.
Environmental surface variables may affect nighttime brightness. Levin found that albedo and snow cover exert obvious positive impacts on VIIRS DNB nighttime brightness [38]. The accuracy of estimating socioeconomic activities using VIIRS data may be enhanced if the magnitude of this impact can be reasonably estimated and a corresponding calibration is applied to the VIIRS data.
The probable impacts of satellite observation angles were not covered in this study. Li et al. investigated the variation of viewing angles of SNPP satellite and quantified the viewing angle effects on the artificial light radiance [39]. The VIIRS DNB data will be able to describe socioeconomic activities more accurately if they are improved by removing the angular effects.
Despite the above problems, there is a close positive correlation between EPC and NTL data, which reflect the social and economic activities of human beings on the surface of the earth to a large extent. NTL data enable rapid monitoring of social and economic activities over long time series and large areas.
DMSP/OLS data were the most widely used NTL data in EPC estimation, owing to their long time series (1992-2013). Despite their advantages, VIIRS DNB data were used less often in EPC estimation because of their short time series. Previous studies have shown that annual EPC can be estimated with higher accuracy using VIIRS DNB data than using DMSP/OLS data. In addition to annual data, NOAA has released monthly composite VIIRS DNB data from April 2012 to the present. However, no study on estimating monthly EPC using monthly composite VIIRS DNB data had been reported. We conducted regression analysis between monthly EPC and the corresponding monthly composite VIIRS DNB data and obtained satisfactory results, which demonstrates the feasibility of estimating monthly EPC using monthly composite VIIRS DNB data. In addition, NOAA has begun to release daily VIIRS DNB data, which will provide an additional data option for future research.
Linear regression models have often been employed in estimating EPC based on NTL data. For each month, we compared the polynomial regression model with the linear regression model and found that the accuracy of EPC estimation using the polynomial regression model was higher than that of the linear regression model. We also conducted exponential regression and logarithmic regression between EPC and NTL in the experiment, but the R square values were much lower than those of linear regression and polynomial regression.
The method of reducing background noise in NTL data proposed by Ma et al. was employed in this paper, because it was easy to understand and conduct. In spite of noise reduction, there might still be other sources of at-sensor nighttime radiance that remain uncorrected in the dataset, such as atmospheric backscatter and diffuse radiation [40].
The purpose of conducting three kinds of spatial filtering was to reduce the potential influence of abnormally high pixel values. Filtering windows of 3 × 3 and 5 × 5 were chosen because they are widely used and have low computational complexity. However, according to the regression results, the relationships between EPC and spatially filtered NTL data did not improve. This might be due to two reasons: (1) spatial filtering of a small number of outliers had little effect on the total NTL value of a province; (2) a large number of pixels in urban and suburban areas were smoothed, which might have resulted in some information loss.
Although we have obtained EPC estimation models based on VIIRS NTL data on a monthly basis, these models are built on statistical analysis, and it is difficult to explain the physical meaning of each model parameter. This is an inherent limitation of statistical analysis. However, a statistical model is still of practical value and significance until a physical model is effectively established.
In this paper, monthly regression models are established with sample data from 14 provinces in southern China. The parameters of these models may not be appropriate elsewhere, because the statistical standards of electric power consumption differ between regions. However, it is feasible to establish monthly regression models for other regions using the steps and data described in this paper.

Conclusions
This paper investigated the relationship between EPC and NTL data on a monthly scale, using monthly VIIRS DNB NTL composite data from January 2013 to December 2018 and the corresponding monthly statistical EPC data of 14 provinces in southern China. Two kinds of regression were compared for the purpose of obtaining more reliable regression results. Furthermore, nine kinds of NTL data with different treatments were involved in building regression formulas: original NTL (NTL0), gap-filled NTL (NTLg), denoised NTL with threshold of 0.3 (NTL1), 3 × 3 average filtered NTL (NTL2), 5 × 5 average filtered NTL (NTL3), 3 × 3 median filtered NTL (NTL4), 5 × 5 median filtered NTL (NTL5), 3 × 3 mid-value filtered NTL (NTL6), and 5 × 5 mid-value filtered NTL (NTL7). The conclusions are drawn as follows: (1) High reliability was achieved in all 18 regression formulas (two types of regression between EPC and nine kinds of NTL data), with no R square lower than 0.8459 and a mean R square of 0.8772. (2) Polynomial regressions were more reliable than linear regressions, with an average R square of 0.8816 compared with 0.8727. (3) Regressions between denoised NTL with threshold of 0.3 (NTL1) and EPC steadily exhibited the strongest reliability among the nine kinds of NTL data in both types of regression model. (4) The three kinds of treatment (average filtering, median filtering, and mid-value filtering) applied to the NTL1 data did not effectively improve the reliability of the regressions, so these kinds of data processing are not recommended in estimating EPC based on NTL data.
For the 12 months of polynomial regressions between NTL 1 and EPC, the average value of R square was 0.8906, and the average value of MARE was 16.02%. For nearly 90% of the 1008 estimations (84 per month, 12 months), the absolute relative errors between estimated EPC and statistical values were less than 30%, which indicated high estimation accuracy in most cases.
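The share of estimates whose absolute relative error stays below a limit, as summarized above, can be computed as in the following sketch. The function name and the four sample values are hypothetical and serve only to show the calculation; the sample count of 1008 follows from the 14 provinces, 6 years, and 12 months stated in the text.

```python
def share_within(observed, estimated, limit_percent=30.0):
    """Percentage of samples whose absolute relative error is below the limit."""
    errors = [100 * abs(y - y_hat) / y for y, y_hat in zip(observed, estimated)]
    return 100 * sum(error < limit_percent for error in errors) / len(errors)


# 14 provinces x 6 years give 84 samples per month, 1008 over 12 months.
samples_per_month = 14 * 6
total_samples = samples_per_month * 12

# Hypothetical statistical and estimated EPC values (arbitrary units).
share = share_within([100, 100, 100, 100], [110, 125, 140, 95])
print(total_samples, share)
```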