3.1. Modelling of TNL–EPC
The commonly used models for estimating electricity consumption include the linear, exponential, logarithmic, power function, and polynomial models.
Table 1 presents details of the general form of the five models used in this work. Using these five regression models, with TNL as the independent variable x and EPC as the dependent variable y, the goodness-of-fit R² was used to assess the model fitting effect.
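The comparison of the five candidate models can be illustrated with a minimal sketch. The functional forms below (linear, logarithmic, exponential, power, and a second-degree polynomial) are assumed common textbook forms and may differ from those in Table 1. Fitting the exponential and power models by log-linearisation is also an assumption, since the fitting procedure is not stated here. The data are synthetic and all function names are hypothetical.

```python
import math

def ols(features, y):
    """Least-squares coefficients for y ~ sum_j beta_j * features[j],
    obtained by solving the normal equations with Gauss-Jordan elimination."""
    k = len(features)
    # Augmented normal-equation matrix [X'X | X'y].
    a = [
        [sum(p * q for p, q in zip(fi, fj)) for fj in features]
        + [sum(p * q for p, q in zip(fi, y))]
        for fi in features
    ]
    for c in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, evaluated on the original (untransformed) scale."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def fit_five_models(x, y):
    """Return {model name: R^2}. x (TNL) and y (EPC) must be strictly positive."""
    ones = [1.0] * len(x)
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    preds = {}
    b0, b1 = ols([ones, x], y)            # y = b0 + b1 * x
    preds["linear"] = [b0 + b1 * v for v in x]
    b0, b1 = ols([ones, lx], y)           # y = b0 + b1 * ln(x)
    preds["logarithmic"] = [b0 + b1 * v for v in lx]
    b0, b1 = ols([ones, x], ly)           # ln(y) = b0 + b1 * x (assumed log-linear fit)
    preds["exponential"] = [math.exp(b0 + b1 * v) for v in x]
    b0, b1 = ols([ones, lx], ly)          # ln(y) = b0 + b1 * ln(x) (assumed log-linear fit)
    preds["power"] = [math.exp(b0 + b1 * v) for v in lx]
    b0, b1, b2 = ols([ones, x, [v * v for v in x]], y)  # assumed degree 2
    preds["polynomial"] = [b0 + b1 * v + b2 * v * v for v in x]
    return {name: r_squared(y, p) for name, p in preds.items()}

# Synthetic data that follow an exact power law, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * v ** 0.8 for v in x]
scores = fit_five_models(x, y)
best = max(scores, key=scores.get)
print(best, scores)
```

Because a log-linearised fit minimises error on the logarithmic scale, its R² on the original scale can differ from that of a direct nonlinear least-squares fit; which of the two was used in this work is not specified.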
In the construction of energy consumption models, R² is often chosen as the main criterion for model evaluation [19,23]. In this work, R² was used to quantify the proportion of variance in electric power consumption (EPC) that was explained by the total nighttime light (TNL) as an independent variable, thereby serving as a direct indicator of the model’s explanatory capacity [27]. This approach aligns with our objective of identifying models that most effectively capture the relationship between TNL and EPC by maximising the explanatory capacity of TNL on EPC.
Although R² summarises the model’s explanatory power, it can increase artificially as predictors are added, even if they contribute little to the model’s explanatory capacity. To avoid such spurious increases, the models in this work were kept simple (e.g., univariate power functions), which helps to ensure robust model performance. This is critical in spatiotemporal studies, where the complexity of the data (e.g., nonlinear relationships and heterogeneity) calls for flexible models, and that flexibility has to be balanced against overfitting.
R² is a dimensionless coefficient with values ranging from 0 to 1. An R² close to 1 indicates that the model predictions are close to the observed values, suggesting a better fit.
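For reference, the standard definition of the coefficient of determination is given below. It is assumed, not stated in this section, that this is the form used here. The symbols are introduced for illustration: y_i is the observed EPC of sample i, \hat{y}_i is the model estimate, \bar{y} is the mean of the observed EPC, and n is the number of samples.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, the 0 to 1 range is guaranteed for least-squares fits of models that are linear in their parameters and include an intercept. For other fits, the value can fall below 0 when the model performs worse than the mean of the observations.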
To obtain the optimal fitting model applicable to the Chinese region, electricity consumption estimation models were compared across different spatial, temporal, and fitting dimensions, as follows:
(1) Annual TNL–EPC spatial regression modelling, both graded and ungraded.
The TNL of each province in a given year was taken as the independent variable, while the annual electric power consumption (EPC) of the corresponding province was used as the dependent variable. During the fitting process, it was found that the samples from Heilongjiang, Tibet, and Xinjiang deviated significantly from the overall data trend: these regions had high TNL but low EPC. This suggests that they contain non-representative light sources that do not correspond to typical electricity consumption patterns, probably because the high-intensity pixels in these areas had not been completely removed. The main reasons for the anomalies in these regions are as follows:
First, the anomalies in Xinjiang can be mainly attributed to extensive oil and gas extraction activities. Industrial operations such as drilling sites and flaring produce high-intensity lighting that significantly deviates from typical urban electricity consumption patterns. These industrial light sources produce abnormally high pixel values that distort the TNL–EPC relationship, thereby compromising model accuracy.
Second, the outliers in Tibet are related to the sparse population and low level of urbanisation. Lighting in the region is dominated by non-urban sources such as military infrastructure and aurora borealis. These sources of light are largely uncorrelated with social demand for electricity, resulting in a unique pattern that deviates from the general TNL–EPC relationship.
Finally, the anomalies in Heilongjiang are caused by frequent wildfires and agricultural burning activities, especially during the dry season. Heilongjiang has one of the highest rates of agricultural fires in China, especially in spring and autumn. These activities, crop straw burning in particular, produce transient high-brightness pixels that do not conform to typical urban lighting patterns and significantly increase the nighttime light intensity, which distorts the TNL–EPC relationship.
The provinces with significant outliers (Xinjiang, Tibet, and Heilongjiang) were excluded from the regression analysis. This step minimised bias due to non-representative light sources, such as industrial activities, wildfires, and non-urban infrastructure. With reference to previous studies, the generalisability of the model was ensured by masking these non-urban light sources, which, in turn, improves the applicability of the model in different regions [20]. In the electricity consumption estimation model, R² rose from around 0.5 to nearly 0.9 after excluding Xinjiang, and to above 0.9 after excluding Tibet and Heilongjiang. Therefore, after excluding the 4 provinces with missing data and the 3 provinces with anomalous data, 27 provinces remained, and the five models were fitted by spatial regression year by year. The model with the largest R² in each year was then selected as the optimal model for that year.
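The effect of the exclusion step on R² can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this work: only a linear fit is shown, all values are synthetic, the province labels P1 to P6 are hypothetical, and the function name is invented for the example. Only the three excluded province names come from the text.

```python
# Provinces named in the text as anomalous in the annual spatial regression.
EXCLUDED = {"Xinjiang", "Tibet", "Heilongjiang"}

def linear_r2(x, y):
    """R^2 of a simple linear least-squares fit with intercept
    (equal to the squared Pearson correlation of x and y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Synthetic (TNL, EPC) pairs in arbitrary units. "Xinjiang" is given a
# high TNL and a low EPC to mimic the anomaly described above.
samples = {
    "P1": (1.0, 1.1), "P2": (2.0, 1.9), "P3": (3.0, 3.2),
    "P4": (4.0, 3.9), "P5": (5.0, 5.1), "P6": (6.0, 6.0),
    "Xinjiang": (9.0, 2.0),
}

# R^2 using every province.
r2_before = linear_r2(*zip(*samples.values()))

# R^2 after dropping the anomalous provinces.
kept = {name: v for name, v in samples.items() if name not in EXCLUDED}
r2_after = linear_r2(*zip(*kept.values()))

print(f"R^2 before exclusion: {r2_before:.3f}")
print(f"R^2 after exclusion:  {r2_after:.3f}")
```

In a year-by-year analysis, the same filter would be applied to each year's samples before the five models are fitted and the one with the largest R² is selected.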
On this basis, considering the differences in the levels of economic development of different regions, the light density of each province was calculated. The 27 provinces were divided into three levels according to their light density, and electricity consumption prediction models were constructed separately for each level to explore whether hierarchical fitting would improve the accuracy of the model. As an example, the three-level grading results for 2012 are presented in Table A1.
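The grading step can be sketched as below. Two assumptions are made that this section does not confirm: light density is taken to be TNL divided by provincial area, and the three levels are taken to be equal-sized groups ranked by density (9 provinces each for 27 provinces). The actual thresholds behind Table A1 may differ. The unit names, values, and function name are hypothetical.

```python
def grade_by_light_density(tnl, area, n_levels=3):
    """Assign each unit a level from 1 (highest light density) to n_levels.

    Assumptions: density = TNL / area, and levels are equal-sized groups
    of the density ranking; neither is confirmed by the source text.
    """
    density = {name: tnl[name] / area[name] for name in tnl}
    ranked = sorted(density, key=density.get, reverse=True)
    group_size = -(-len(ranked) // n_levels)  # ceiling division
    return {name: i // group_size + 1 for i, name in enumerate(ranked)}

# Synthetic values in arbitrary units, for illustration only.
tnl = {"A": 60.0, "B": 50.0, "C": 40.0, "D": 30.0, "E": 20.0, "F": 10.0}
area = {name: 10.0 for name in tnl}

levels = grade_by_light_density(tnl, area)
for level in (1, 2, 3):
    members = sorted(name for name, lv in levels.items() if lv == level)
    print(level, members)
```

A separate regression would then be fitted to the provinces in each level, and the resulting R² values compared with those of the ungraded fit.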
(2) Monthly TNL–EPC spatial regression modelling, both graded and ungraded.
The TNL in December 2020, which had higher data quality and completeness, was selected as the independent variable. The corresponding monthly electric power consumption (EPC) of each province was used as the dependent variable. During the fitting process, it was found that Shandong, Jilin, and Heilongjiang deviated significantly from the overall data trend. This deviation may be due to residual noise from non-urban light sources that was not completely removed, similar to the problem encountered in the annual regression model described above. As with Heilongjiang, the anomaly in Jilin was mainly due to agricultural activities, particularly crop burning during the dry season, which introduces transient high-brightness pixels that distort the TNL–EPC relationship. Shandong, on the other hand, is an important industrial base in China (e.g., chemicals, iron and steel, and manufacturing). Its industrial facilities may continue to operate at night, generating high-intensity, steady lighting signals that deviate from typical residential electricity consumption patterns. Such lighting inflates the TNL beyond what the reported EPC can match, owing to differences in the way industrial electricity consumption is counted (e.g., captive power plants are not fully included in grid data). In addition, the nighttime lights of logistics operations in port cities, such as Qingdao and Yantai, may be misclassified as residential electricity consumption, further inflating the TNL relative to the EPC, as documented in previous studies [19]. Therefore, after removing the 4 provinces with missing data and the 3 provinces with data anomalies, regression fitting was performed on the remaining 27 provinces, and the model with the highest R² was selected as the best-fitting model for December 2020. As in the graded spatial modelling of the annual data, the provinces were then graded, electricity consumption prediction models were constructed separately for each level, and the fitting effects before and after grading were compared.
(3) Annual TNL–EPC temporal regression modelling at the national, provincial, and prefectural scales.
Annual TNL data for the nation, provinces, and major prefectures from 2012 to 2023 were used as the independent variable. The corresponding electric power consumption (EPC) of each administrative unit (national, provincial, and major prefectural levels) was used as the dependent variable. A total of 12 annual sample points per administrative unit were used for regression fitting, and the model with the highest R² was selected as the best-fitting model in the temporal dimension. Because modelling every city in China was impractical, prefectural-scale electricity consumption models were developed for the provincial capital cities only. After excluding the areas with missing data, 30 provinces and 18 cities were finally modelled.
(4) Monthly TNL–EPC temporal regression modelling at the national, provincial, and prefectural scales.
Monthly TNL data for the nation, provinces, and major prefectures in 2020 were used as the independent variable. The corresponding monthly EPC of each administrative unit was used as the dependent variable. After excluding months with missing data, a total of seven monthly sample points per administrative unit were used for regression fitting. Because electricity consumption data for prefecture-level cities were difficult to collect, 30 provinces and 9 cities were finally modelled.