Article

A Machine Learning Framework for Predicting Regional Energy Consumption from Satellite-Derived Nighttime Light Imagery

by Monica Borunda 1,*, Jessica Gallegos 2, José Alberto Hernández-Aguilar 3, Guadalupe Lopez Lopez 4, Victor M. Alvarado 4, Gerardo Ruiz-Chavarría 2 and O. A. Jaramillo 5

1 SECIHTI, Centro Nacional de Investigación y Desarrollo Tecnológico, Tecnológico Nacional de México, Cuernavaca 62490, Mexico
2 Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
3 Facultad de Contaduría, Administración e Informática, Universidad Autónoma del Estado de Morelos, Cuernavaca 62200, Mexico
4 Centro Nacional de Investigación y Desarrollo Tecnológico, Tecnológico Nacional de México (TecNM/CENIDET), Cuernavaca 62490, Mexico
5 Instituto de Energías Renovables, Universidad Nacional Autónoma de México, Temixco 62580, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 449; https://doi.org/10.3390/app16010449
Submission received: 4 December 2025 / Revised: 25 December 2025 / Accepted: 30 December 2025 / Published: 31 December 2025
(This article belongs to the Section Energy Science and Technology)

Featured Application

The proposed framework offers a viable, transparent, and reproducible alternative for characterizing energy consumption dynamics in regions where conventional statistical data is scarce, outdated, or published at scales that are too broad to reveal local realities.

Abstract

Reliable estimates of regional energy consumption are essential to planning sustainable development and achieving decarbonization; however, this information is still not available for several regions worldwide. In this work, we propose a methodological framework that uses satellite-derived Nighttime Light (NTL) imagery and machine learning to predict regional electricity consumption one year ahead. The methodology follows three stages: First, a Random Forest regression model is used to identify the relationship between NTL data and regional energy consumption. Second, NTL values for the year ahead are forecasted using NTL values from previous years. Lastly, the relationship obtained in the first stage is applied to the predicted NTL values to estimate regional energy consumption for the year ahead. Mexico is used as a case study to apply and validate this methodology, reproducing spatial consumption patterns with high correlation to official data ($R^2 > 0.85$), which supports the validity of the proposal. The proposed methodology demonstrates how energy demand can be estimated, even in areas of scarce information, providing a transparent and replicable approach for energy monitoring in data-limited regions.

1. Introduction

In many countries, obtaining accurate information about how energy is consumed at the regional level remains complicated. Data are often incomplete, outdated, or published at scales that are too broad to reveal local realities [1,2], and they are not always publicly accessible. Mexico is a good example of this situation: most official statistics are aggregated at the state or national levels [3], which makes it difficult to analyze small-scale dynamics or to design public policies tailored to specific regions. The contrast between rural and urban areas is also significant. Limited access to electricity in rural zones reflects deep economic inequalities and constrains development [4]. When detailed data are unavailable, finding complementary ways to approximate regional energy demand becomes a practical necessity.
One promising alternative comes from satellite observations of the Earth at night. The term Nighttime Light refers to the visible light emissions from human settlements and natural reflections of moonlight from the Earth’s surface at night, captured via NASA satellites in orbit. The light emitted by human settlements is captured with the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor as Nighttime Light (NTL) imagery [5,6,7], which mirrors patterns of activity, electrification, and economic intensity. Because these data are continuous, global, and open to the public, they have been used to explore social and energy-related processes in regions with limited ground information. Still, most previous works have focused on spatial correlations, rather than developing predictive or transferable methodologies that could be applied in data-scarce contexts.
In this study, we propose a three-stage framework that combines remote sensing and machine learning to bridge this gap, inspired by previous works that combine NTL with machine learning for power estimation [8,9]. In the first stage, a Random Forest model is used to characterize the relationship between NTL and electricity consumption. In the second stage, a Random Forest model trained with yearly NTL data is used to forecast annual light intensity. In the third stage, a simple Linear Regression is applied to estimate municipal-level electricity consumption from the predicted NTL values, which differs from other approaches that use Gradient Boosting Regression Trees [10]. Using Mexico as a case study, the framework demonstrates how satellite-derived data can be used to build reproducible and spatially explicit maps of regional energy consumption, even where official records are incomplete or missing. The importance of such a framework is supported by systematic reviews of published works on NTL and electricity consumption, which point to the need for more transferable predictive methodologies [11]. In the next subsection, the state of the art for energy consumption modeling using Nighttime Light analysis or similar remote sensing data is presented.

1.1. State of the Art

Satellite imagery has been combined with machine learning to predict poverty [12], energy consumption [13,14], power delivery, and power loss after catastrophic events such as hurricanes [15]. In this section, a literature review on using Nighttime Light to predict energy consumption (EC) is provided. For this purpose, we searched with the keywords “forecasting energy consumption” + “machine learning” + “nighttime light data” for the years 2021–2025 and organized the main results of the resulting papers into the following five thematic subjects:

1.1.1. Regional and Global Estimation of Electricity Consumption (EC/EPC)

A significant pillar of the state of the art involves using Nighttime Light (NTL) data to map energy consumption across diverse scales, where statistical data is often sparse. Guo et al. (2021) [16] demonstrated this by calibrating EC at a 100 m resolution for 45 Chinese cities. On a global scale, Hu et al. (2022) [17] generated a 1 km resolution dataset spanning 28 years, achieving a Mean Absolute Relative Error (MARE) of less than 20% in 92.6% of the countries analyzed. To assist with Sustainable Development Goal 7, Gao et al. (2022) [18] used a month-specific substitution method in Cambodia, achieving an average relative error (RE) of 5.47% and an adjusted $R^2$ of 0.967. Additionally, Gallegos et al. (2024) [19] successfully estimated EC for Mexican municipalities, identifying NTL as the most critical predictive parameter, surpassing both temperature and GDP.

1.1.2. Performance Benchmarking of Machine and Deep Learning Models

The sources highlight a rigorous comparison of algorithms to identify the most precise predictors for specific socio-economic tasks. Guo et al. (2021) [16] and Gallegos et al. (2024) [19] both identified Random Forest (RF) as the optimal model for electricity consumption; Gallegos et al. reported an $R^2$ of 0.881 and a Mean Absolute Percentage Error (MAPE) of 3.4% for RF. For high-frequency temporal forecasting in India, Darshini et al. (2025) [13] found that the Bidirectional LSTM was superior for single-step forecasts ($R^2$ of 0.94 and RMSE of 2.83 MU). For global urban monitoring, Chakraborty et al. (2023) [20] employed an ensemble forecast (FCNN, CNN, and LSTM), achieving an average recall of 87.83% and a precision of 87.05%.

1.1.3. Technical Innovations in Sensor Integration and Data Quality

Recent research addresses the challenge of reconciling different satellite generations and environmental noise. Hu et al. (2022) [17] introduced a locally adaptive method to integrate DMSP-OLS and NPP-VIIRS data, using built-up area density (BUAD) for saturation correction and yielding a global MARE of 2.962%. Guo et al. (2021) [16] found that the Luojia1-01 sensor provides higher precision than NPP-VIIRS because its 22:30 local overpass time aligns better with peak human activity. Furthermore, Chen et al. (2025) [21] demonstrated that SDGSAT-1 imagery (40 m resolution) provides more robust results for monthly predictions than lower-resolution datasets. To mitigate heavy cloud cover in low-latitude regions, Gao et al. (2022) [18] selected the month-specific substitution method using December imagery as the most reliable annual proxy.

1.1.4. Capturing Temporal Dynamics and Environmental Interactions

Advancements have shifted from annual snapshots to capturing intra-annual and near-real-time variations. Chen et al. (2025) [21] developed a four-parameter monthly predictive model incorporating NTL and temperature interactions, achieving an MARE of 7.96% during training. Darshini et al. (2025) [13] utilized k-means clustering to group Indian states by consumption patterns, finding that the ConvLSTM model produced forecasts with MAPE values below 10% for these clusters. Similarly, Chakraborty et al. (2023) [20] used multi-step-ahead forecasting to detect urban infrastructure changes, such as disasters or conflicts, based on deviations from a city-specific baseline.

1.1.5. Mapping Socio-Economic Livelihoods and Poverty

NTL data is increasingly combined with daytime imagery to map broader economic metrics in data-poor regions. Jean et al. (2023) [22] introduced a multi-step transfer learning approach using CNNs to predict household consumption and asset wealth in Africa; their models explained 37% to 55% of consumption variation and 55% to 75% of asset wealth variation. Notably, for clusters below the poverty line, this model increased explanatory power by an average of 81.2% compared to using Nighttime Light alone. Complementarily, Guo et al. (2021) [16] found that, while RF was optimal for EC, Geographically Weighted Regression (GWR) was the best model for mapping GDP and population.
As the preceding review shows, research on using satellite technology to estimate or predict electric power consumption at regional and smaller scales is underway, but there is still a long way to go. The proposed framework aims to predict the EPC at a regional level, with municipal granularity, for the upcoming year, using annual mosaics of nighttime satellite imagery for the region of interest.
This work is organized as follows: In Section 2, the three-stage methodology of the proposed framework is described; in Section 2.1, the conceptual framework on which this methodology is based is outlined; in Section 2.2, we discuss data acquisition and its subsequent processing; the process for modeling the relationship between the EPC and the NTL with data is outlined in Section 2.3; the process for predicting Nighttime Light images is detailed in Section 2.4; in Section 2.5 and Section 2.6, we describe the prediction of EPC and its evaluation and validation, respectively; finally, in Section 2.7, the implementation of the proposed framework is portrayed. The results of the proposed methodology, applied to Mexico as a case study, are presented in Section 3, following the three main stages described in Section 2, as well as all performed steps. Data, including NTL images, EPC, and other relevant information, as well as their processing, are described in Section 3.1 and Section 3.2; then, the data-driven modeling of the EPC-NTL relationship, the forecasting of the NTL for a future year, and the subsequent procedure for the forecasted EPC, followed by the corresponding evaluation and validation are shown in detail in Section 3.3, Section 3.4, Section 3.5 and Section 3.6, respectively; the case of 2024 is presented in Section 3.7, and it is treated as a bonus year, since there is a lack of data with which to perform the validation. Finally, the discussion of the work is presented in Section 4, and its conclusions are summarized in Section 5.

2. Materials and Methods

Understanding how satellite-based light observations relate to electricity consumption requires a framework that joins spatial, temporal, and statistical reasoning in a practical way. In this work, we developed a data-driven methodology for estimating and forecasting regional electric power consumption (EPC) from satellite-derived Nighttime Light (NTL) imagery. Comparable approaches have been reported in recent studies that link remote sensing and machine learning to estimate or forecast energy use from satellite data [9,10,14]. The proposed framework integrates remote sensing and machine learning techniques through the following three main sequential stages:
  • Data-driven modeling of the empirical relationship between EPC and NTL;
  • Forecasting of NTL intensity using multi-temporal data;
  • Estimation of future energy consumption from predicted NTL values.
Each stage involves steps that are described in detail in the following subsections, as shown in Figure 1. The proposed approach establishes a consistent sequence of analytical stages that integrate spatiotemporal modeling and quantitative validation. Furthermore, it is adaptable to different geographic contexts and temporal resolutions.
Both the data-driven modeling and the forecasting processes employ a Random Forest Regressor, selected for its robustness against nonlinearity, multicollinearity, and noise in high-dimensional geospatial datasets [23]. The Random Forest Regressor is an ensemble algorithm that builds multiple Decision Trees using bootstrap samples of the data and random feature subsets at each split. Each tree produces a prediction, and the final output is the average of the individual results. This approach reduces variance, mitigates overfitting, and allows the model to capture nonlinear relationships and interactions that are difficult to represent with a single tree. The selection was motivated by the need for a model that balances nonlinear modeling capability with interpretability and robustness, particularly in the context of annual satellite-derived NTL data and regional-scale energy applications.
Prior work has shown that ensemble-based models perform well when linking NTL with electricity use or with related energy indicators [8,9]. Moreover, Random Forest was found to perform best among the algorithms compared in [19], which also included Linear Regression, Decision Trees, and Neural Networks.
The resulting approach is interpretable, reproducible, and transferable to regions with limited energy data. The following subsections outline the conceptual structure, data preparation, and modeling sequence that define the proposed three-stage methodology.
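The averaging behaviour of the Random Forest Regressor described above can be checked with a short sketch. The data are synthetic, and the number of trees is an arbitrary illustrative choice, not a setting reported in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic nonlinear signal (illustrative only, not NTL or EPC data)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, 300)

# Each tree is fitted on a bootstrap sample of the training data
rf = RandomForestRegressor(n_estimators=50, bootstrap=True, random_state=0)
rf.fit(X, y)

# The forest output is the mean of the individual tree predictions
X_new = np.array([[2.5], [7.0]])
per_tree = np.array([tree.predict(X_new) for tree in rf.estimators_])
forest_pred = rf.predict(X_new)
```

Comparing `per_tree.mean(axis=0)` with `forest_pred` shows that the ensemble prediction is the plain average over trees, which is the mechanism behind the variance reduction mentioned above.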

2.1. Conceptual Framework

Regional electric power consumption (EPC) depends on many factors; for instance, it increases in regions with higher per capita GDP, higher urbanization rates, and higher high-technology exports, and decreases in areas with higher temperatures and more agglomerated human activities. Remotely sensed Nighttime Light (NTL) nevertheless encodes valuable information about EPC. Indeed, a relationship between NTL and EPC was proposed in [24], as follows:
$$EPC = k \times GPC^{\alpha} \times e^{\beta\, CV} \times e^{\gamma T} \times NTL^{b}, \qquad (1)$$
where $GPC$ is the per capita GDP; $CV$ is the variation coefficient of NTL within a country, computed as the standard deviation divided by the mean value of NTL; $T$ is the yearly average temperature of a country; and $k$, $\alpha$, $\beta$, $\gamma$, and $b$ are model parameters that need to be fitted with the region’s data. The above general formulation captures the combined effects of economic activity, spatial light heterogeneity, and climatic conditions on electricity use.
A linear expression can be derived by applying the natural logarithm to Equation (1), as follows:
$$\ln EPC = \ln k + \alpha \ln GPC + \beta\, CV + \gamma T + b \ln NTL, \qquad (2)$$
which makes the parameters easier to estimate.
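As a minimal sketch of why the logarithmic form is convenient, the parameters of Equation (2) can be recovered by ordinary least squares. The data below are synthetic, and the parameter values and variable ranges are illustrative assumptions rather than results of this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (illustrative ranges only, not real data)
gpc = rng.uniform(1e3, 3e4, n)      # per capita GDP
cv = rng.uniform(0.1, 2.0, n)       # coefficient of variation of NTL
temp = rng.uniform(10.0, 30.0, n)   # yearly average temperature
ntl = rng.uniform(1e2, 1e5, n)      # summed NTL radiance

# Assumed parameters, used only to generate noise-free synthetic EPC values
k, alpha, beta, gamma, b = 2.0, 0.3, -0.1, -0.02, 0.9
epc = k * gpc**alpha * np.exp(beta * cv) * np.exp(gamma * temp) * ntl**b

# In log space, ln EPC is linear in ln GPC, CV, T, and ln NTL
X = np.column_stack([np.ones(n), np.log(gpc), cv, temp, np.log(ntl)])
coef, *_ = np.linalg.lstsq(X, np.log(epc), rcond=None)

ln_k, alpha_hat, beta_hat, gamma_hat, b_hat = coef
k_hat = np.exp(ln_k)
```

Because the synthetic data contain no noise, the fitted values match the assumed parameters up to numerical precision; with real data, the fit would carry estimation error.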

2.2. Data and Preprocessing

The following two main datasets are required in this methodology:
  • Nighttime Light data;
  • Electricity consumption data.
The Nighttime Light data was collected from NASA’s “Black Marble” project [25,26], which offers a set of satellite-derived Nighttime Light data that provides global measurements of nocturnal visible and near-infrared light, representative of human activity on the planet at small, medium, and large time scales. The data are acquired by the Day/Night Band (DNB) of the Visible Infrared Imaging Radiometer Suite (VIIRS), on board the Suomi National Polar-orbiting Partnership (Suomi-NPP) satellite, which provides ultra-sensitive data and imagery in low-light conditions [27]. The VIIRS Black Marble product applies algorithms to remove clouds and to correct for atmospheric, terrain, lunar BRDF, thermal, and straylight effects, as well as background noise. In particular, the “VIIRS/NPP Lunar BRDF-Adjusted Nighttime Light Yearly L3 Global 15 arc second Linear Lat Lon Grid”, also known as VNP46A4, delivers annual composites generated from daily NTL radiance data corrected for atmospheric effects and for the lunar bidirectional reflectance distribution function (BRDF) [28], with 500 m spatial resolution.
On the other hand, electricity consumption data can be obtained from national and governmental statistical sources or open utility registries in each country, or from international, open datasets, such as those of the International Energy Agency (IEA) [29], the World Bank’s Sustainable Energy for All (SE4ALL) initiative [30], and the UN Data Energy Statistics Database [31]. These sources support the consistency and replicability of the proposed framework.

2.3. Data-Driven Modeling of the EPC-NTL Relationship

The first stage consists of establishing the data-driven relationship between electricity consumption and satellite-observed Nighttime Light intensity, as seen in Equation (2). The coefficients can be estimated with data from the region of interest using different techniques; for instance, Linear Regression, Decision Trees, Random Forest, and Neural Networks were used to estimate the parameters for the case study of Mexico in [19], finding that Random Forests performed the best and that, in that case study, the electrical power consumption mainly depends on NTL, as expressed in Equation (3):
$$\ln EPC = c + b \ln NTL, \qquad (3)$$
where $c = \ln k$ and $b$ are model parameters that depend on the region’s data. Thus, in any country, the general expression of energy consumption can be reduced to include the dominant variables for each case. A linear fit is transparent, easy to diagnose, and good enough when the signal is strong; similar calibrations appear in studies that translate radiance into local energy estimates [10]. This reduced model preserves interpretability and maintains strong predictive performance when NTL is the primary explanatory signal.
Therefore, the Random Forest Regressor can be trained using a single year of NTL data in the region or country, with the corresponding municipal-level energy consumption values as the target variable. This nonlinear regression captures local variations and interactions that may not be fully represented by Equation (2) alone, while remaining consistent with its functional structure.
Municipal NTL values are computed as the sum of all pixel intensities within each area. If a pixel overlaps multiple regions, its contribution is weighted by the covered area. In this way, the Random Forest model identifies the functional dependence of EPC on NTL, and, optionally, on $GPC$, $CV$, or $T$, when available, providing a calibrated mapping function for the region of study.
This step ensures that the predictive relationship between Nighttime Light emissions and electricity consumption is established using observed data, before any temporal extrapolation. In the next subsection, we describe the forecasting of Nighttime Light intensity.
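The calibration stage can be sketched as follows, assuming a municipal table with one summed NTL value and one EPC value per municipality. The data are synthetic, and the hyperparameters (`n_estimators`, `min_samples_leaf`) and the train/test split are illustrative assumptions, not the settings used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic municipal table (hypothetical values, not CFE or Black Marble data)
n_mun = 500
ntl_sum = rng.lognormal(mean=8.0, sigma=1.5, size=n_mun)  # summed radiance per municipality
epc = np.exp(1.0 + 0.9 * np.log(ntl_sum) + rng.normal(0.0, 0.2, n_mun))

# Work in log space, following the form ln EPC = c + b ln NTL
X = np.log(ntl_sum).reshape(-1, 1)
y = np.log(epc)

# Hold out part of the municipalities to check the calibration
idx = rng.permutation(n_mun)
train, test = idx[:400], idx[400:]

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
rf.fit(X[train], y[train])
r2_rf = rf.score(X[test], y[test])  # coefficient of determination on held-out data

# Linear fit of the reduced model for comparison: slope b and intercept c
b_hat, c_hat = np.polyfit(X[train, 0], y[train], deg=1)
```

Fitting both models on the same table makes it easy to compare the flexibility of the Random Forest with the transparency of the two-parameter linear form.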

2.4. NTL Forecasting

The second stage consists of predicting the spatial distribution of NTL for future years based on past observations. Rather than predicting the NTL values for the entire image at once, with the proposed approach, we can estimate NTL values for each pixel, capturing fine-grained temporal patterns and providing precise predictions at each location.
The temporal evolution of NTL intensity can be modeled using a Random Forest Regressor. In this case, the model is trained to predict the NTL intensity for the year $t+1$, based on the NTL observations from the five preceding years, as follows:
$$NTL_{t+1} = f\left(NTL_{t}, NTL_{t-1}, NTL_{t-2}, NTL_{t-3}, NTL_{t-4}\right), \qquad (4)$$
where $f$ represents a nonlinear and nonparametric mapping, learned from the data through the ensemble of Decision Trees. Each tree explores different combinations of temporal features, capturing the temporal dependencies in the NTL signal. Aggregating the predictions of multiple trees yields a robust and interpretable forecast of the NTL intensity.
Thus, the NTL value of each pixel can be forecasted using the values from the five previous consecutive years; this pixel-level learning approach captures both temporal and spatial dynamics of human activity. First, the grids needed to cover the entire area of interest, either a region or a country, must be selected. Each mosaic or grid is stored as a 2400 × 2400 matrix, with each pixel corresponding to a 500 m × 500 m area on the surface; these matrices contain numerical values indicating the intensity of Nighttime Light captured at each pixel. Subsequently, all the individual matrices are merged into a single dataset, from which the model learns fine-grained temporal patterns and provides predictions for each location.
In this way, the yearly composites of NTL for the following year, $NTL_{pred}$, are predicted. The Random Forest model was selected for its robustness in handling large datasets, its ability to capture intricate patterns in the data, and its proven effectiveness in regression tasks with high-dimensional input features.
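The pixel-level forecasting step above can be sketched as a lagged-feature regression. The helper `make_lagged_samples`, the small 20 × 20 window, and the synthetic growth pattern are hypothetical stand-ins for the real annual mosaics, and the number of trees is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged_samples(stack, n_lags=5):
    """Turn a (years, rows, cols) stack into per-pixel (lags -> next year) samples."""
    n_years = stack.shape[0]
    X_parts, y_parts = [], []
    for target in range(n_lags, n_years):
        # Features ordered from the most recent to the oldest preceding year
        lags = stack[target - n_lags:target][::-1]
        X_parts.append(lags.reshape(n_lags, -1).T)
        y_parts.append(stack[target].ravel())
    return np.vstack(X_parts), np.concatenate(y_parts)

rng = np.random.default_rng(0)

# Synthetic stand-in for 7 annual mosaics of a small 20 x 20 window
base = rng.gamma(shape=2.0, scale=5.0, size=(20, 20))
growth = 1.0 + 0.03 * np.arange(7)[:, None, None]
stack = base * growth + rng.normal(0.0, 0.1, size=(7, 20, 20))

# Train on the first six years: five lag years predict the sixth
X, y = make_lagged_samples(stack[:6])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# Forecast the seventh year from the five years that precede it
X_next = stack[1:6][::-1].reshape(5, -1).T
ntl_pred = rf.predict(X_next).reshape(20, 20)
```

Here the seventh year is held back, so `ntl_pred` can be compared against `stack[6]` in the same way that a forecast is validated against the measured composite.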
The model performance can be validated by comparing estimated values with measured NTL data. Three metrics, commonly used in regression tasks, were calculated [32]. The Mean Absolute Error (MAE) represents the average magnitude of the errors between predicted and real values, without considering their direction, i.e., whether they are overestimations or underestimations; it is calculated as follows:
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|x_i - \hat{x}_i\right|, \qquad (5)$$
where $x_i$ and $\hat{x}_i$ are the real and predicted values of the variable, respectively, and $N$ is the number of values. The MAE is easy to interpret, as it is expressed in the same units as the target variable, making it useful for understanding the average error magnitude in the context of NTL values.
The Mean Absolute Percentage Error (MAPE) expresses the absolute difference between the real and predicted values as a percentage of the real value, averaged over the $N$ values, as follows:
$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{x_i - \hat{x}_i}{x_i}\right| \times 100\%. \qquad (6)$$
The MAPE represents the average percentage error of the model across all test data.
The Coefficient of Determination ($R^2$) quantifies the proportion of the variance in the dependent variable that can be predicted from the independent variable, calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2}{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}, \qquad (7)$$
where $\bar{x}$ is the average of the real values.
Additionally, the Root Mean Square Error (RMSE) was employed to measure the typical difference between the real and predicted values, as follows:
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2}{N}}. \qquad (8)$$
Lower values indicate better predictions, and, because the errors are squared, larger errors are penalized more severely.
The above metrics are used to measure the accuracy of the estimation. After verifying the accuracy of the model for the first year, further verification of the consecutive years should be performed to ensure consistent predictive accuracy. The predicted values of NTL can be mapped onto a Nighttime Light image for visualization.
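The four metrics above translate directly into a few lines of NumPy. The sample values are made up for illustration; with real NTL data, pixels whose real value is zero would need to be masked before computing the MAPE, since the ratio is undefined there.

```python
import numpy as np

def mae(x, x_hat):
    """Mean Absolute Error, in the units of the target variable."""
    return np.mean(np.abs(x - x_hat))

def mape(x, x_hat):
    """Mean Absolute Percentage Error; undefined where a real value is zero."""
    return np.mean(np.abs((x - x_hat) / x)) * 100.0

def r2(x, x_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((x - x_hat) ** 2)
    ss_tot = np.sum((x - np.mean(x)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(x, x_hat):
    """Root Mean Square Error; squaring penalizes large errors more."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

# Made-up real and predicted values
x = np.array([10.0, 20.0, 30.0, 40.0])
x_hat = np.array([12.0, 18.0, 33.0, 39.0])

print(mae(x, x_hat))   # 2.0
print(mape(x, x_hat))  # 10.625
print(r2(x, x_hat))    # approximately 0.964
print(rmse(x, x_hat))  # approximately 2.1213
```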
To predict energy consumption at a regional level, it is first necessary to obtain NTL values for each spatial unit, which are areas smaller than the full extent of the satellite image. Once the numerical NTL brightness values are retrieved for each pixel in the image matrix, a function is implemented to assign latitude and longitude coordinates to every pixel, in order to determine which pixels belong to each spatial unit, thus enabling the aggregation of the total NTL values per region. In order to accomplish this, a dataset containing the geometry of each administrative boundary is required; then, the total NTL is obtained by summing the NTL values of all pixels located within its area. If pixels overlap multiple regions, their value is proportionally distributed according to the area covered by each region.
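A minimal sketch of this aggregation step follows, assuming the fraction of each pixel that falls inside each spatial unit has already been computed from the boundary geometries. The functions `pixel_centers` and `aggregate_ntl`, the `coverage` array, and the tile origin are hypothetical; only the 15 arc second pixel size comes from the product description.

```python
import numpy as np

def pixel_centers(n_rows, n_cols, lat_top, lon_left, pixel_deg):
    """Latitude and longitude of pixel centers on a regular lat-lon grid."""
    lats = lat_top - (np.arange(n_rows) + 0.5) * pixel_deg
    lons = lon_left + (np.arange(n_cols) + 0.5) * pixel_deg
    return lats, lons

def aggregate_ntl(ntl, coverage):
    """Sum NTL per unit; coverage[u, i, j] is the fraction of pixel (i, j) inside unit u."""
    return np.tensordot(coverage, ntl, axes=([1, 2], [0, 1]))

# Toy 2 x 3 raster and two hypothetical units; the middle column is split 50/50
ntl = np.array([[1.0, 2.0, 3.0],
                [4.0, 6.0, 5.0]])
coverage = np.zeros((2, 2, 3))
coverage[0, :, 0] = 1.0   # left column belongs entirely to unit 0
coverage[1, :, 2] = 1.0   # right column belongs entirely to unit 1
coverage[:, :, 1] = 0.5   # middle column is shared equally

totals = aggregate_ntl(ntl, coverage)
print(totals)  # [ 9. 12.]

# Pixel centers for a hypothetical tile origin at 20 N, 100 W
lats, lons = pixel_centers(2, 3, lat_top=20.0, lon_left=-100.0, pixel_deg=15 / 3600)
```

Because the coverage fractions of each pixel sum to one, the unit totals add up to the total radiance of the raster, so no light is lost or double-counted in the split.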
Once the predicted NTL is obtained, the next step is to calculate the EPC, as outlined in the following subsection.

2.5. Prediction of EPC

Once $NTL_{pred}$ is obtained, it is integrated with the EPC–NTL relationship obtained in Section 2.3 to forecast electricity consumption for year $t+1$. Each predicted NTL pixel is assigned to its administrative region.
Applying the simplified regression equation yields the following:
$$EPC_{t+1} = e^{\left(c + b \ln NTL_{t+1}\right)}. \qquad (9)$$
In this way, a spatially explicit map of projected electricity consumption is obtained and can be aggregated at different spatial levels (municipal, regional, or national), depending on the need.
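The simplified regression equation above amounts to a one-line mapping from predicted NTL to EPC. In this sketch, the function `predict_epc`, the coefficients, and the NTL values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def predict_epc(ntl_pred, c, b):
    """EPC for year t+1 as exp(c + b * ln NTL) for predicted NTL values."""
    ntl_pred = np.asarray(ntl_pred, dtype=float)
    return np.exp(c + b * np.log(ntl_pred))

# Hypothetical coefficients and predicted municipal NTL sums (illustrative only)
c, b = 1.0, 0.9
ntl_next = np.array([1.0e2, 1.0e3, 1.0e4])
epc_next = predict_epc(ntl_next, c, b)
```

The same result can be written as a power law, `np.exp(c) * ntl_next**b`, which makes explicit that `b` acts as the elasticity of EPC with respect to NTL. Units with zero predicted NTL would need separate handling, since the logarithm is undefined there.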
Both NTL and EPC predictions are treated at the municipal scale; this choice preserves urban–rural contrasts that would vanish at coarser aggregations and ensures that the outputs are directly usable for planning.

2.6. Model Evaluation and Validation

The predictive performance of the EPC prediction model is assessed using MAE, MAPE, and $R^2$, calculated as in Equations (5) to (7). Lower MAE and MAPE values and an $R^2$ closer to 1 indicate higher accuracy; on the other hand, larger MAE and MAPE values and an $R^2$ closer to 0 indicate poor performance.

2.7. Implementation

All procedures can be implemented in Python 3.10 using Jupyter Notebooks within the Google Colab environment. For the case study shown in the next section, we used an Intel Xeon CPU and 12 GB of RAM, together with the Scikit-learn machine learning library [33].
The final product of the proposed framework is a set of maps of the predicted annual electricity consumption across the region of interest. The methodology separates the forecasting of NTL from the mapping of the EPC, which reduces overfitting and makes the methodology easier to apply to places where ground data are scarce or dated. The approach thus merges the predictive capacity of data-driven methods with the information encoded in satellite remote sensing, offering an alternative for energy monitoring in such areas.
In the next section, we apply the proposed framework methodology, along with all its steps, to the case study of Mexico. The aim is to predict the EPC for a specific year by utilizing Nighttime Light satellite images, as well as energy and economic indicators from previous years.

3. Results

The methodology described above brings together the main elements needed to connect satellite data with regional electricity use. In simple terms, we first predict how Nighttime Light may behave over the next year, then apply that information to the EPC-NTL relationship to estimate electricity consumption for the same period. In this section, the focus is on working through the case study of Mexico, following all the steps and their respective results for each of the three stages described in Section 2. In the following subsections, we outline the steps for each stage, including the results obtained at each stage, beginning with the model’s input and the data collection.

3.1. Data and Preprocessing

In this subsection, we describe the gathering of a dataset combining satellite observations with municipal-level electricity consumption, gross domestic product, and temperature variables. The idea was to bring together sources that capture both the spatial distribution of human activity and the annual evolution of energy use across the country. The following subsections are used to describe the origin, characteristics, and purpose of each dataset used in the analysis.

3.1.1. Nighttime Light (NTL) Satellite Images

We gathered a series of Nighttime Light imagery using the VNP46A4 product; this dataset provides annual global mosaics at 500 m resolution, with effects of lunar illumination, viewing geometry, atmospheric conditions, and terrain already corrected.
Due to its extensive territory, Mexico is not covered by a single VIIRS tile. Following the procedure used in earlier analyses, the 12 grids required each year to fully cover the country were downloaded and merged programmatically to produce a continuous national mosaic for each year in the study period. Each grid is a 2400 × 2400-pixel matrix in which every pixel represents a 500 m × 500 m surface element.
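The merging of tiles into a national mosaic can be sketched with NumPy. The 3 × 4 arrangement of the 12 tiles and the function `merge_tiles` are assumptions made for illustration; the actual layout depends on the grid indices of the tiles that cover the country, and small 4 × 4 arrays stand in for the 2400 × 2400 matrices.

```python
import numpy as np

def merge_tiles(tiles):
    """Merge a row-major grid of equally sized tiles into one mosaic."""
    return np.block(tiles)

# Hypothetical 3 x 4 layout of 12 tiles; each tile is filled with an id for clarity
tile_shape = (4, 4)
tiles = [
    [np.full(tile_shape, 10 * row + col, dtype=float) for col in range(4)]
    for row in range(3)
]

mosaic = merge_tiles(tiles)
print(mosaic.shape)  # (12, 16)
```

With full-size tiles, the same call would produce a 7200 × 9600 array under the assumed layout, so memory use is worth checking before stacking several years.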
We used images from the latest radiometrically stable segment of the Black Marble archive, covering the period from 2018 to 2024, in order to capture gradual changes in Nighttime Light. The mosaics reflect Mexico’s strong spatial contrasts, corresponding to bright metropolitan corridors in the Valley of Mexico, Monterrey, Guadalajara, and Puebla, mid-size urban centers, and extensive rural regions with limited artificial illumination.

3.1.2. EPC Data

Mexico is a federal republic, divided into 32 states and 2469 municipalities. With roughly 77 municipalities per state on average, this structure captures marked differences between urban and rural areas, making the municipal scale suitable for linking nighttime radiance with electricity use. The electricity consumption data used in this work comes from the annual municipal records reported by the Mexican electric utility, Comisión Federal de Electricidad (CFE) [34], a source that compiles total electricity use in each municipality for each year, combining residential demand with commercial-, industrial-, agricultural-, and service-related consumption. The format is not overly complex, but it is detailed enough to describe how electricity use is distributed across the country.

3.1.3. Other Variables

In addition to Nighttime Light and electricity consumption, three auxiliary variables were compiled to represent the broader socio-economic and climatic context of Mexican municipalities: average annual temperature (T), per capita gross domestic product (GPC), and the coefficient of variation of NTL (CV). These variables correspond to the general formulation on which this framework is based, namely Equation (1) in Section 2.1.
In this case study, only the 2018 average annual temperature data was required; it was collected from the “temperature_daily–maximum_annual-mean” product of the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate (ERA5) [35], a product of the Copernicus Climate Data Store [36]. The data are provided as monthly values in kelvin, and only those corresponding to pixels over Mexico were selected and arranged into a single data matrix. Thereafter, a function was constructed to map each matrix index to its corresponding latitude and longitude coordinates. The mean yearly temperature for each municipality was obtained by identifying the pixels belonging to each municipality and averaging their respective values.
Per capita gross domestic product (GDP) was obtained from the Sistema Nacional de Información Municipal (SNIM) [37], a site that provides political and socio-demographic information for most of the municipalities and states of Mexico. The GDP is measured in Purchasing Power Parity (PPP) dollars. We used the 2015 GDP data because the information available for other years was insufficient; however, as we explain in Section 3.3, the GDP does not have a significant impact on energy consumption. Therefore, we do not anticipate that this choice substantially alters the results of our models.

3.2. Data Processing

All datasets were processed to create a unified structure in which every variable—satellite radiance, electricity consumption, and the auxiliary indicators—could be compared at the municipal level. The first step involved preparing the Nighttime Light mosaics. For each year, the twelve VIIRS tiles covering Mexico were merged into a single raster and assigned geographic coordinates using Python libraries including Rasterio, NumPy, and Pandas. Pixel values represent the observed NTL intensity for each year. Municipal boundaries were integrated through GeoPandas to aggregate pixel-level values into administrative units for calibration and mapping. The pixel values were georeferenced and assigned to the corresponding municipality, and temporal identifiers were added to construct a spatiotemporal table. If a pixel covered more than one municipality, its radiance was distributed proportionally by the shared area.
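The area-proportional treatment of pixels that straddle a boundary can be sketched as follows. This is only an illustration: the pixel values, the municipality labels, and the choice to sum (rather than average) radiance per municipality are assumptions, and `aggregate_radiance` is a hypothetical helper name.

```python
from collections import defaultdict


def aggregate_radiance(pixels):
    """Sum pixel radiance per municipality.

    Each pixel is (radiance, shares), where shares maps a municipality
    label to the fraction of the pixel area lying inside it. A pixel
    shared by several municipalities is split in proportion to area.
    """
    totals = defaultdict(float)
    for radiance, shares in pixels:
        total_share = sum(shares.values())
        for municipality, share in shares.items():
            totals[municipality] += radiance * share / total_share
    return dict(totals)


# Hypothetical pixels: the second one overlaps municipalities A and B.
pixels = [
    (10.0, {"A": 1.0}),
    (8.0, {"A": 0.25, "B": 0.75}),
    (4.0, {"B": 1.0}),
]

municipal_ntl = aggregate_radiance(pixels)
```

With real data, the area shares would be computed from the intersection of each pixel footprint with the municipal polygons, for example with GeoPandas overlay operations.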
The remaining variables were prepared in a similar way. Temperature values from ERA5 were clipped to the country and matched to municipalities based on their coordinates. Per capita GDP was added using municipal identifiers, and the coefficient of variation of NTL was computed from the full distribution of radiance values inside each municipal boundary. After checking for inconsistencies or missing records, everything was combined into one table, where each municipality had the full set of variables; this processed dataset is the one used in the modeling steps that follow.
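As an illustration of the coefficient of variation mentioned above, the following sketch computes the ratio of the standard deviation to the mean of the radiance values inside one municipality. The use of the population standard deviation and the zero-mean convention are assumptions; the text does not state which variant was applied.

```python
import statistics


def coefficient_of_variation(radiances):
    """Return std / mean of the radiance values of one municipality.

    Uses the population standard deviation (an assumption). A municipality
    whose mean radiance is zero is given CV = 0.0, a convention chosen
    here only to avoid division by zero.
    """
    mean = statistics.fmean(radiances)
    if mean == 0:
        return 0.0
    return statistics.pstdev(radiances) / mean


# Hypothetical radiance values for two municipalities.
cv_uniform = coefficient_of_variation([5.0, 5.0, 5.0, 5.0])
cv_mixed = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

A uniformly lit municipality yields a CV of zero, while a municipality mixing dark and bright pixels yields a larger value.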

3.3. Data-Driven Modeling of the EPC-NTL Relationship

To model the EPC-NTL relationship, given in Equation (1), the municipal data collected and described in the previous subsections was used.
The earlier study [20] indicated that, among the variables of the initial model, EPC depends primarily on NTL: EPC grows almost linearly with NTL, whereas its dependence on temperature, GDP, and CV is much weaker, such that the relation between the EPC and NTL can be simplified to Equation (3). Figure 2 shows the dependence of the EPC on each of the variables separately, and Figure 3 shows that the NTL dependence dominates the others; thus, even though GDP data from 2015 were used, this does not significantly change the model’s results.
These results indicate that, at the municipal and annual scales, Nighttime Light intensity acts as an integrated proxy for multiple socio-economic drivers of electricity consumption, effectively subsuming the explanatory contributions of temperature and GDP. Moreover, the results explain why auxiliary variables add limited marginal predictive power once NTL is included.
The authors used Linear Regression, Decision Trees, Random Forests, and a simple Neural Network to determine which modeling approach best recovered the linear relationship given in Equation (2). For this purpose, 85% of the 2018 data, corresponding to 85% of the municipalities, was selected with the train–test split function of scikit-learn [32] and used as the training dataset, while the remaining 15% was used as the test set. The models were implemented in the Python programming language. Linear Regression uses gradient descent to minimize the differences between predictions and true values. The “DecisionTreeRegressor” model from the scikit-learn library was used with a maximum of 40 leaf nodes, the value found to limit overfitting without degrading performance. The “RandomForestRegressor” model from the same library was tuned to its best-performing configuration, with the following hyperparameters: a maximum depth of 7, 100 estimators, and 2 maximum features. The Neural Network, built with TensorFlow’s Keras API, consists of three dense layers. The first layer, with 128 neurons and a ReLU activation function, was initialized with a normal distribution and takes an input of 4 features; the second layer has 64 neurons and the same activation function and initialization settings; the third layer is a single neuron with a linear activation function. The Adam optimizer and the MAE loss function were used to compile the model. Table 1 shows the performance of the four algorithms. Although all of them reproduced the basic trend, the Random Forest Regressor achieved the best performance, with the R2 closest to 1 and the lowest MAPE [20].
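The Random Forest configuration and the 85/15 split described above could be set up roughly as in the sketch below. The data are synthetic stand-ins generated only so that the example runs; the feature order, the random seeds, and the synthetic coefficients are assumptions and do not reproduce the municipal dataset or the reported scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the municipal table with 4 features
# (log NTL, temperature, per capita GDP, CV); values are illustrative.
n_municipalities = 400
log_ntl = rng.uniform(0.0, 8.0, n_municipalities)
X = np.column_stack([
    log_ntl,
    rng.normal(295.0, 5.0, n_municipalities),
    rng.normal(9.0, 1.0, n_municipalities),
    rng.uniform(0.0, 2.0, n_municipalities),
])
# Synthetic target dominated by the NTL feature plus noise.
y = 9.0 + 0.9 * log_ntl + rng.normal(0.0, 0.3, n_municipalities)

# 85% of the rows for training, 15% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

# Hyperparameters quoted in the text: depth 7, 100 trees, 2 features per split.
model = RandomForestRegressor(
    n_estimators=100, max_depth=7, max_features=2, random_state=0
)
model.fit(X_train, y_train)
r2_test = r2_score(y_test, model.predict(X_test))
```

After fitting, `model.feature_importances_` gives one way to inspect how strongly each feature contributes, which is the kind of comparison summarized in the figures referenced in this section.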
Thus, for the following steps, the Random Forest Regressor is used with the optimal hyperparameters shown in Table 2, which were obtained with a grid search for the best results on the test set.
The linear relationship between the logarithms of EPC and NTL, given in Equation (3), is shown in Figure 4. The y-intercept c and slope b were determined with the curve_fit function from the scipy.optimize library, yielding c = 9.213 ± 0.065 and b = 0.874 ± 0.008.
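To make the log-log fit concrete, the sketch below recovers an intercept and slope from noise-free synthetic data generated with the central values quoted above. Two assumptions are made: natural logarithms are used (the base is not stated in this section), and a closed-form least-squares fit replaces curve_fit so that the example needs only the standard library. The helper name `fit_log_log` is hypothetical.

```python
import math


def fit_log_log(ntl, epc):
    """Least-squares fit of log(EPC) = c + b * log(NTL); returns (c, b)."""
    x = [math.log(v) for v in ntl]
    y = [math.log(v) for v in epc]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    c = mean_y - b * mean_x
    return c, b


# Noise-free synthetic data built from the reported central values,
# used only to check that the fit recovers them.
true_c, true_b = 9.213, 0.874
ntl = [1.0, 5.0, 20.0, 100.0, 500.0]
epc = [math.exp(true_c) * v ** true_b for v in ntl]

c_hat, b_hat = fit_log_log(ntl, epc)
```

With real, noisy data, curve_fit additionally returns a covariance matrix from which uncertainties such as those reported for c and b can be derived.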
The performance metrics indicate an R2 value of 0.850 on the training dataset and 0.856 on the test dataset. Additionally, the MAPEs are 3.9% for the training set and 3.8% for the test set, reflecting low error and consistent performance across both datasets.
A possible limitation of the model is the heterogeneity of municipalities in Mexico: some are heavily urban and very densely populated, while others are completely rural. To test how the model performs in these different settings, Figure 5 quantifies the error in the model predictions for 2018 as a function of the population density of the municipalities. The error is quantified using an “Error Factor”, which is simply the real EPC value divided by the predicted one, such that a result of 1 indicates a perfect prediction. As can be seen in the figure, municipalities with high population density have more consistently good predictions, while those with low population densities are usually overestimated, and sometimes underestimated.
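The Error Factor defined above is a simple ratio, and its reading can be sketched as follows. The tolerance band and the function names are hypothetical and serve only to label the three cases.

```python
def error_factor(actual_epc, predicted_epc):
    """Real EPC divided by predicted EPC; 1 means a perfect prediction."""
    return actual_epc / predicted_epc


def classify(factor, tolerance=0.05):
    """Label a prediction from its Error Factor.

    A factor above 1 means the real value exceeds the prediction
    (underestimation); a factor below 1 means overestimation. The
    tolerance band around 1 is a hypothetical choice.
    """
    if factor > 1 + tolerance:
        return "underestimated"
    if factor < 1 - tolerance:
        return "overestimated"
    return "close"


# Hypothetical (actual, predicted) pairs in arbitrary units.
labels = [classify(error_factor(a, p))
          for a, p in [(120.0, 100.0), (80.0, 100.0), (100.0, 100.0)]]
```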
Low-density municipalities in Mexico are typically characterized by higher shares of agricultural activity and other daytime-dominated economic sectors, which generate electricity consumption that is weakly correlated with nighttime illumination. As a result, NTL-based approaches tend to underestimate EPC in these areas, whereas predictions are more accurate in densely populated and highly illuminated urban municipalities.
Therefore, this analysis shows that, for this case study, NTL data alone are sufficient to estimate EPC, and that calculating electric power consumption for each municipality is straightforward. This concludes the first stage of the proposed methodology and leads to the next stage, NTL forecasting, which is described in the following subsection.

3.4. NTL Forecasting

Figure 6 presents the single-image map of Mexico, created from the twelve grids downloaded from NASA Earth Data [5] for the years from 2014 to 2018.
The Random Forest regression function was employed to model the temporal relationships between consecutive years of NTL data, given in Equation (4). For this purpose, the NTL brightness values from 2014 to 2018 were used as independent variables, with each year representing one feature, to predict the target variable, which is the NTL value of the following year (2019). A train–test split was performed using scikit-learn, allocating 85% of the data to training and 15% to testing. We used the model to predict the future NTL value of each pixel separately, enabling highly detailed spatial predictions. The predicted NTL values were then mapped back onto the whole image. Figure 7a shows the real NTL map for 2019, and Figure 7b shows the predicted one. At first glance, the two look quite similar: the bright areas appear where we expect them, and the general spatial pattern is maintained. Of course, a visual comparison only tells part of the story, so later, we include numerical error measures to evaluate how close the prediction really is.
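The pixel-wise setup described above amounts to reshaping a stack of yearly rasters into a table with one row per pixel and one column per year, and then reshaping the predictions back into an image. The sketch below shows only this bookkeeping with NumPy on a toy stack; the predictor used here is a persistence stand-in (last observed year), not the Random Forest, and the helper names are hypothetical.

```python
import numpy as np


def stack_to_table(rasters):
    """Turn a (years, rows, cols) stack into a (pixels, years) table."""
    years, rows, cols = rasters.shape
    return rasters.reshape(years, rows * cols).T


def table_to_image(values, shape):
    """Map one value per pixel back onto a (rows, cols) image."""
    return values.reshape(shape)


# Hypothetical 5-year stack of 2x3 rasters (illustrative values only).
rasters = np.arange(30, dtype=float).reshape(5, 2, 3)

# One row per pixel, one column (feature) per year.
features = stack_to_table(rasters)

# Stand-in predictor: repeat the last observed year for every pixel.
# A fitted regressor's predict(features) would take its place.
predicted = features[:, -1]
predicted_map = table_to_image(predicted, rasters.shape[1:])
```

Any regressor exposing fit and predict, such as scikit-learn's RandomForestRegressor, could consume `features` directly, with the following year's flattened raster as the target.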
This initial period served to train the Random Forest model. We repeated the same routine for the following years, constantly shifting the five-year window used as input. To obtain the estimate for 2020, we took the NTL images from 2015 to 2019; for 2021, we used 2016 to 2020; for 2022, the 2017–2021 period; and for 2023, the 2018–2022 sequence.
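The shifting input window can be written down explicitly. The following sketch lists, for each target year, the years whose NTL images serve as input; the function name `input_years` and the `window` parameter are hypothetical.

```python
def input_years(target_year, window=5):
    """Return the consecutive years used as input to estimate target_year."""
    return list(range(target_year - window, target_year))


# Input years for each target year from 2019 to 2023 with a five-year window.
schedule = {target: input_years(target) for target in range(2019, 2024)}

# A shorter window, as an example of changing the window length.
three_year_inputs = input_years(2019, window=3)
```

The same helper extends to other targets, for example `input_years(2024)` gives the 2019 to 2023 sequence.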
For 2019, which we used as the first real test of the model, the Random Forest performed quite well. The errors were relatively low (MAE around 4.8 and a MAPE just above 9%), and the fit to the data was strong, with an R2 close to 0.93; these numbers give us a reasonable level of confidence before looking at the estimates for the following years. Figure 8 shows the comparison of the estimated NTL values with the actual values downloaded for the year 2019. The graph shows that most data points align closely with the identity line, indicating that, on average, the estimated values closely match the actual values; this alignment highlights the model’s ability to accurately replicate real NTL patterns for the initial validation year.
After training the model with data from 2014 to 2018 to estimate 2019, the strong performance observed for that year confirmed that the Random Forest model was well-trained and ready to be applied to subsequent years. Therefore, the same trained model was used to estimate NTL values for 2020–2023. We followed this approach because, for these years, the actual NTL data are available on the EarthData website [5], which enables an additional validation process, allowing us to assess how well the model performed and to analyze the predicted values in greater detail. Figure 9 presents the NTL images estimated with the model for these years.
Figure 10 presents graphs comparing the estimated NTL values with the actual values downloaded for the years 2020 to 2023. In all four graphs, we observe that most data points also align closely with the identity line, indicating that, on average, the estimated values closely match the actual values; this alignment reflects the model’s ability to accurately replicate real NTL patterns for the analyzed years.
Finally, Table 3 summarizes the error values for all the years we estimated (2020–2023), using the same three metrics: MAE, MAPE, and R2. For reference, the 2019 results are also shown, so everything can be compared in one place. The years 2020 and 2021 exhibit the highest prediction errors; these years correspond to a period of pronounced deviations in observed NTL values, which are reflected in increased MAE and MAPE and a reduced R2. In contrast, prediction accuracy improved for 2022 and 2023. The reduced performance observed for 2020–2021 coincides with the COVID-19 pandemic, which suggests that the forecasting error is driven by exogenous structural shocks rather than model instability and highlights the sensitivity of NTL-based predictors to abrupt socio-economic disruptions.
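For reference, the three metrics can be computed as in the sketch below, which uses their standard textbook definitions. The toy values are hypothetical, and the sketch assumes that no actual value is zero, since MAPE is undefined in that case.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

scores = (mae(actual, predicted), mape(actual, predicted), r2(actual, predicted))
```

For pixel-level NTL data, dark pixels with a value of zero would need to be masked or handled separately before computing MAPE.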
Once the model’s predictions were evaluated against the actual measured values, using the metrics described earlier, the model was deemed suitable for predicting subsequent years. This approach allows the model to learn how luminosity changes over time and to predict the next year’s pixel-level values from patterns identified in previous years. The implementation follows the same pixel-based framework described earlier, ensuring consistency across spatial units (municipalities in our case) and facilitating the generation of yearly estimations.
The municipal NTL maps for Mexico are generated from the previously obtained dataset. Since the spatial correspondence between pixel brightness and municipal boundaries had already been established, producing the maps involved visualizing the aggregated NTL values per municipality. Figure 11 shows the resulting nighttime luminosity distribution maps, generated in Python using matplotlib, across the country for each year.
To evaluate the robustness of the proposed NTL forecasting stage with respect to the temporal context, a sensitivity analysis was conducted by varying the length of the historical input window while keeping the model structure and hyperparameters the same. Three configurations (3-, 5-, and 7-year windows) were tested, using 2019 as a common validation year. Table 4 summarizes the corresponding performance metrics.
The five-year input window was selected because it provides a good balance between capturing recent changes in Nighttime Light patterns and maintaining stable model performance. Shorter windows are more affected by year-to-year fluctuations, due to the limited amount of training data, while longer windows include older observations that may no longer reflect current lighting conditions, leading to slightly reduced forecasting accuracy.
With stage 2 of the proposed methodology now complete, in the following subsection, the final stage of the methodology for predicting EPC for the case study is detailed.

3.5. Prediction of EPC

Using the c and b parameters calculated in Section 3.3, together with the predicted NTL obtained in Section 3.4, the EPC for a given year can be predicted by applying Equation (9). In this way, the electric power consumption of each municipality was estimated for 2019 to 2023. Finally, by integrating the geographic coordinate data, the predicted EPCs for all municipalities in Mexico for these years are shown in Figure 12.
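A minimal sketch of this last step is given below. It assumes that the prediction equation is the inversion of the log-log relationship of Section 3.3 with natural logarithms, that is, EPC = exp(c + b ln NTL); the logarithm base, the units of the result, and the function name `predict_epc` are assumptions not confirmed by the text.

```python
import math

C = 9.213  # intercept reported in Section 3.3
B = 0.874  # slope reported in Section 3.3


def predict_epc(ntl):
    """Estimate EPC from a positive municipal NTL value.

    Assumes ln(EPC) = C + B * ln(NTL); the log base is an assumption.
    """
    if ntl <= 0:
        raise ValueError("NTL must be positive to take its logarithm")
    return math.exp(C + B * math.log(ntl))


# Hypothetical predicted NTL values for two municipalities.
epc_low = predict_epc(1.0)
epc_high = predict_epc(2.0)
scaling = epc_high / epc_low
```

Because the slope is below 1, doubling NTL in this form raises the estimated EPC by a factor of 2 to the power 0.874, slightly less than double.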

3.6. Model Evaluation and Validation

To verify the model’s performance beyond the training year, the predicted electrical energy consumption for 2022 was compared with the actual municipal-level data available for that year [37]. Figure 13 plots the predicted EC values against the actual ones, using the Pandas library, to show the accuracy of the prediction. The close fit of most points to the reference line in the figure indicates a strong correspondence between predicted and observed values. In addition to the visual comparison, the MAE, MAPE, and R2 were computed to evaluate the model’s accuracy further, yielding MAE = 0.67, MAPE = 4.32%, and R2 = 0.8113. At the municipal level, this level of accuracy is sufficient to preserve relative consumption patterns, which is the primary requirement for planning and policy-oriented applications in data-scarce contexts.
It is important to note that the year 2022 was selected for this verification because it is the most recent year for which official EC data at the municipal level are publicly available.

3.7. 2024, a Bonus Year

As a bonus year, electric power consumption was predicted for 2024 at the municipal level. Although no official data are available to validate these predictions, generating these estimates is precisely one of the main goals of this study: to provide a practical method for generating electric power consumption data when recent records are missing or delayed.
Following the proposed methodology, historical data from 2019 to 2023 is used to predict the NTL values for 2024, as shown in Figure 14. Given that 2024 is the only year without actual data for comparison, the prediction depends entirely on the model’s ability to generalize patterns from past trends. Despite this challenge, the strong performance demonstrated in previous years supports the reliability of this forecasting approach.
To obtain the municipal electric power consumption estimates for 2024, we first predicted the NTL values for that year and then aggregated them to the municipal level, as shown in Figure 15a. Subsequently, the complete map of municipal electric power consumption for 2024, shown in Figure 15b, was produced by applying the established relationship between NTL intensity and energy use.

4. Discussion

The resulting maps are suitable for identifying high-demand regions, growth trends, and energy inequities. The proposed framework addresses the challenge of predicting EPC where economic information is lacking by leveraging NTL from satellites and an AI technique known as Random Forest.
The results presented in Section 3.3 help explain the limited predictive contributions of auxiliary variables in the proposed framework. Although temperature and GDP per capita are known drivers of electricity demand at finer temporal scales, their relationships with annual municipal EPC are highly dispersed, as shown in Figure 2 and Figure 3; this behavior is largely attributable to spatial and temporal aggregation effects, as well as to the use of temporally static auxiliary datasets. Consequently, their limited contributions reflect data resolution constraints, rather than their intrinsic irrelevance.
The strong performance of the simplified two-variable model (NTL–EPC) in the Mexican case can be explained by the integrative nature of Nighttime Light intensity. At the annual and municipal scales, NTL effectively aggregates the combined effects of population density, economic activity, urban infrastructure, and electrification levels. As shown in the feature importance analysis (Figure 2 and Figure 3), NTL dominates the explanatory power of the model, while temperature, GDP per capita, and CV contribute only marginally. In this context, adding additional variables introduces limited new information and may even amplify noise due to spatial aggregation and temporal mismatch in auxiliary datasets. Consequently, a parsimonious model, centered on NTL, captures the dominant signal governing EPC in Mexico, achieving high predictive performance without unnecessary model complexity.
Recent studies have successfully applied ensemble methods, such as Gradient Boosting Regression Trees (GBRT) and deep learning architectures, including CNN–LSTM models, to Nighttime Light–based energy analysis; however, the selection of Random Forest (RF) in this study was driven by its robustness, interpretability, and reproducibility. The proposed framework operates on annual NTL composites and municipal-level tabular data, a setting in which RF has demonstrated strong performance while remaining resilient to multicollinearity, moderate sample sizes, and heterogeneous feature distributions. In contrast, CNN–LSTM architectures typically require dense, high-frequency temporal sequences and large training datasets to fully exploit their spatiotemporal learning capabilities, conditions that are not always met in regional NTL-based energy studies. Moreover, RF enables explicit assessment of variable importance, which is critical for transparent interpretation and policy-relevant applications. For these reasons, RF was selected as the core algorithm, providing a balanced trade-off between predictive accuracy and methodological transparency.
The reduced performance observed for 2020–2021, as shown in Table 3, coincides with the COVID-19 pandemic, which caused abrupt and non-stationary changes in economic activity and electricity use across Mexico. The associated decline in nighttime illumination led to systematic overestimation by the model, which relies on historical temporal patterns. Importantly, the recovery of predictive accuracy in 2022–2023 indicates that the framework does not collapse under anomalous conditions and is able to adapt once the system stabilizes, supporting its robustness for real-world applications.
Hence, as shown, RF is a machine learning method suitable for predicting EPC from NTL. This is consistent with Guo (2021) [16], who demonstrated that the RF model was optimal for estimating EC and urban built-up area (B-A), compared to alternatives such as Geographically Weighted Regression (GWR), Linear Regression, and Neural Networks, attributing RF’s success to its efficiency in handling complex, nonlinear relationships and multicollinearity. Prior work also confirmed the superior performance of ensemble-based models, such as RF, over Linear Regression, Decision Trees, and Neural Networks for linking NTL to electricity use, as demonstrated by Gallegos et al. (2024) [19]. Table 5 presents recent related work; the first column lists the author(s) and year, the second column lists the ML technique(s) tested (the optimal model is in bold), and the third column lists the relevant metrics.
According to Table 5, Random Forest (RF) was identified as one of the optimal traditional models for electric power consumption (EPC) due to its ability to handle complex nonlinearities without prior probability assumptions.
In the proposed framework, RF provided metrics competitive with those reported in the literature. The RF model trained for this framework on five years of NTL observations demonstrated strong predictive performance for the following year. For the initial test year (2019), the NTL forecast achieved an R2 of 0.93 and a MAPE of 9.17%, aligning well with the performance of sophisticated temporal models used in other recent studies, such as the monthly prediction work in the Yangtze River Delta, which reported a MAPE of 10.38% according to Chen et al. (2025) [21].
The work of Darshini et al. (2025) [13] favors LSTM variants, particularly Bidirectional LSTM, for multi-step forecasting because they effectively capture long-term temporal dependencies, obtaining an R2 of 0.94, which is very close to the best result reported in this research, an R2 of 0.93.
Locally Adaptive Methods (R2: 0.997 average), reported by Hu et al. (2022) [17], are used for global-scale mapping, choosing the best model for each specific country/district, and are the most effective methods through which to account for geographic heterogeneity.
The methodology of this framework provides a transparent, replicable approach that uses globally available, corrected annual NTL composites (VNP46A4), making it particularly valuable for regions struggling with data scarcity, as in Cambodia [18]. Therefore, the proposed framework presents a practical, data-driven alternative for energy monitoring in areas where conventional statistical records are incomplete or missing.

Limitations

The presented framework is innovative, but its dependence on annual NTL composites limits temporal granularity, making it inadequate for capturing seasonal or monthly variations in electricity demand.
The main limitation of the framework is that NTL data struggles to capture economic activities that lack nighttime illumination, such as agriculture and some secondary industries, a common problem in related research (e.g., Guo, 2021 [16]).
Estimating EC and its spatial distribution can be conducted only on an annual basis; the annual NTL composite limits temporal granularity, but it remains a promising approach for estimating electricity demand, as demonstrated by Gao et al. (2022) [18].
GDP is underrepresented in some regions and does not correspond to NTL activity, and the time windows of NTL satellite images vary, sometimes covering short periods or non-representative hours of electricity consumption.
Another key limitation noted was the lack of publicly available data, which the authors of this research aimed to address by demonstrating the utility of readily available variables. For instance, the most recent information for EC by sector and municipality available in Mexico is from 2022 [37]. Despite government public information repositories being online, there is no publicly available and easily accessible database that breaks down electricity consumption by municipality in Mexico.
The spatial resolution of Nighttime Light products is an additional factor influencing EPC estimation. The VNP46A4 Black Marble product used in this study (500 m resolution) represents a compromise between spatial detail and signal stability, which is appropriate for municipal-level analysis in Mexico. At this resolution, individual pixels may include mixed land uses, particularly in small or rural municipalities; however, aggregating NTL values over entire administrative units helps mitigate pixel-level noise. Higher-resolution NTL products, such as SDGSAT-1, can potentially reduce mixed-pixel effects and improve spatial detail in small or heterogeneous municipalities, but they may also increase sensitivity to local noise and require more complex preprocessing. Conversely, coarser-resolution products (e.g., DMSP-OLS) tend to smooth spatial variability, which may reduce accuracy in densely populated urban areas with fine-grained lighting patterns; these trade-offs indicate that the optimal NTL resolution depends on the spatial scale of analysis and data availability. In future work, researchers could explore multi-resolution or cross-product approaches to further assess the impact of spatial resolution on EPC estimation.
Future extensions of the framework could integrate auxiliary datasets, such as land use/land-cover maps or daytime satellite imagery, in order to better represent the electricity consumption associated with agricultural or daytime-only activities in low-density municipalities.
Although the proposed framework is demonstrated using Mexico as a case study, its applicability to other regions depends on several structural and data-related factors. In regions with low electrification rates, Nighttime Light signals may be sparse, and electricity consumption may be dominated by a limited number of users, reducing the correlation between NTL and EPC. In such cases, aggregating NTL over larger spatial units or incorporating electrification indicators may be necessary.
In regions with high industrial electricity consumption, where energy use is weakly associated with nighttime illumination, using NTL-based approaches may underestimate EPC unless complemented with sector-specific auxiliary data. Additionally, in areas with frequent cloud cover, NTL data quality may be degraded, requiring longer temporal input windows, quality-filtered composites, or alternative satellite products.
Based on these considerations, the best practices for adapting the framework to new regions include the following: (i) selecting the spatial scale of analysis according to electrification density and settlement patterns; (ii) adjusting the temporal input window to balance data availability and robustness; (iii) incorporating auxiliary variables only when they are available at compatible spatial and temporal resolutions; and (iv) validating predictions using partial or indirect EPC proxies when official data are scarce. The above guidelines aim to facilitate the practical application of the framework in data-limited contexts beyond the Mexican case study.

5. Conclusions

The findings obtained with the proposed framework establish the importance of Nighttime Light (NTL) satellite imagery for estimating electric power consumption. The strong linear relationship between the logarithm of EC and the logarithm of NTL means that even a simple model accurately captures energy consumption.
The framework successfully trained an ML model to estimate municipal electric power consumption in Mexico for 2018. The Random Forest model proved effective in predicting EPC from NTL.
Variables commonly assumed to have strong influences, such as temperature (T) and per capita gross domestic product (GDP), demonstrated limited predictive power in the municipal context.
The generalization capability of the framework was confirmed through a validation for 2022, conducted at the municipal level, that used real EPC data.
The results of this study emphasize that employing a framework with ML techniques supported by accessible NTL data provides a viable alternative for characterizing electric power consumption, which is critical for planning, addressing energy inequality, and ensuring sustainable development.
Future research will focus on improving the performance of the prediction algorithm and testing the framework in other world regions. We will utilize Convolutional Neural Networks, particularly CNN-LSTM, as reported by Darshini et al. (2025) [13]. Another research direction is to incorporate recent economic indicators into the framework and test their performance.

Author Contributions

Conceptualization, M.B.; methodology, M.B. and J.G.; software, J.G., J.A.H.-A. and G.R.-C.; validation, M.B., J.G. and J.A.H.-A.; formal analysis, M.B.; investigation, G.L.L. and G.R.-C.; resources, M.B., J.G., G.L.L., V.M.A. and O.A.J.; data curation, J.G., V.M.A. and O.A.J.; writing—original draft preparation, M.B. and J.A.H.-A.; writing—review and editing, M.B., J.A.H.-A., G.L.L., V.M.A., G.R.-C. and O.A.J.; visualization, M.B.; supervision, M.B.; project administration, M.B.; funding acquisition, M.B., J.A.H.-A., G.L.L., V.M.A., G.R.-C. and O.A.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Codes and data used in this work are available at https://github.com/moniborunda/Energy-consumption-with-satellite-images/ (accessed on 20 December 2025).

Acknowledgments

M.B. thanks SECIHTI for her “Investigadoras e Investigadores por México” research position with I.D. 71557 and CENIDET-TECNM for the hospitality and support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BPNN: Back Propagation Neural Network
BUAD: Built-Up Area Density
CFE: Comisión Federal de Electricidad (Mexican Electric Utility)
CNN: Convolutional Neural Network
CV: Coefficient of Variation of NTL
DMSP: Defense Meteorological Satellite Program
OLS: Operational Linescan System
EC: Energy Consumption
ECMWF: European Centre for Medium-Range Weather Forecasts
EPC: Electric Power Consumption (or Electricity Consumption)
ERA5: ECMWF Atmospheric Reanalysis of the Global Climate
FCNNS: Fully Connected Neural Networks
GDC: Per capita Gross Domestic Product
GDP: Gross Domestic Product
GISA: Global Impervious Surface Area
GWR: Geographically Weighted Regression
IEA: International Energy Agency
INEGI: Instituto Nacional de Estadística y Geografía
LR: Linear Regression
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MARE: Mean Absolute Relative Error
ML: Machine Learning
NTL: Nighttime Light or Nighttime Light Satellite Imagery
NPP: Suomi National Polar-Orbiting Partnership (also Suomi-NPP)
P: Precision
PPP: Purchasing Power Parity
R: Recall
R2: Coefficient of Determination (or Pearson’s coefficient)
RF: Random Forest
RMSE: Root Mean Square Error
SE4ALL: Sustainable Energy For All
SDG7: Sustainable Development Goal 7 (United Nations Sustainable Development Goal 7)
SNIM: Sistema Nacional de Información Municipal
SOL: Sum of Lights
SVM: Support Vector Machine
T: Temperature
VIIRS: Visible Infrared Imaging Radiometer Suite

References

  1. IEA World Energy Outlook 2023. IEA Publications. Available online: https://www.iea.org/reports/world-energy-outlook-2023 (accessed on 16 October 2025).
  2. Bhattacharyya, S.C. Energy Access and Development. In The Handbook of Global Energy Policy; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2013; pp. 227–243. [Google Scholar]
  3. Balance Nacional de Energía, SENER. Available online: https://base.energia.gob.mx/BNE/BalanceNacionalDeEnerg%C3%ADa2022.pdf (accessed on 16 October 2025).
  4. Zhong, L.; Lin, Y.; Yang, P.; Liu, S.; He, Y.; Xie, Z.; Yu, P. Quantifying the inequality of urban electric power consumption and its evolutionary drivers in countries along the belt and road: Insights from satellite perspective. Energy 2024, 312, 133425. [Google Scholar] [CrossRef]
  5. NASA. Nighttime Lights. Available online: https://www.earthdata.nasa.gov/topics/human-dimensions/nighttime-lights (accessed on 16 October 2025).
  6. Román, M.O.; Wang, Z.; Sun, Q.; Kalb, V.; Miller, S.D.; Molthan, A.; Schultz, L.; Bell, J.; Stokes, E.C.; Pandey, B.; et al. NASA’s Black Marble nighttime lights product suite. Remote Sens. Environ. 2018, 210, 113–143.
  7. Levin, N.; Kyba, C.C.M.; Zhang, Q.; de Miguel, A.S.; Román, M.O.; Li, X.; Portnov, B.A.; Molthan, A.L.; Jechow, A.; Miller, S.D.; et al. Remote sensing of night lights: A review and an outlook for the future. Remote Sens. Environ. 2020, 237, 111443.
  8. Fonseca Flores, A.; Oro Boff, V.M.; Freitas Silveira Netto, C.; Brei, V.; Limongi, R. Using Nightlight Satellite Imagery to Predict Energy Consumption in Multiple Spatial-Temporal Aggregations with Machine Learning. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4599953 (accessed on 20 December 2025).
  9. Lu, W.; Zhang, D.; He, C.; Zhang, X. Modeling the spatiotemporal dynamics of electric power consumption in China from 2000 to 2020 based on multisource remote sensing data and machine learning. Energy 2024, 308, 132971.
  10. Guo, X.; Wang, Y. Estimation of regional electricity consumption using national polar-orbiting partnership’s visible infrared imaging radiometer suite night-time light data with gradient boosting regression trees. Remote Sens. 2024, 16, 3841.
  11. Bhattarai, D.; Lucieer, A.; Lovell, H.; Aryal, J. Remote sensing of night-time lights and electricity consumption: A systematic literature review and meta-analysis. Geogr. Compass 2023, 17, e12684.
  12. Jean, N.; Burke, M.; Xie, M.; Alampay Davis, W.M.; Lobell, D.B.; Ermon, S. Combining Satellite Imagery and Machine Learning to Predict Poverty. Science 2016, 353, 790–794.
  13. Darshini, R.; Kumar, A.; Vensuslaus, M.A.; Rishikeshan, C.A.; Victor, J.T.J. Forecasting Electricity Consumption of India through Nighttime Satellite Imagery. PLoS ONE 2025, 20, e0327031.
  14. Cheng, L.; Feng, R.; Wang, L.; Yan, J.; Liang, D. An assessment of electric power consumption using random forest and transferable deep model with multi-source data. Remote Sens. 2022, 14, 1469.
  15. Montoya-Rincon, J.P.; Azad, S.; Pokhrel, R.; Ghandehari, M.; Jensen, M.P.; Gonzalez, J.E. On the Use of Satellite Nightlights for Power Outages Prediction. IEEE Access 2022, 10, 16729–16739.
  16. Guo, B.; Bian, Y.; Zhang, D.; Su, Y.; Wang, X.; Zhang, B.; Wang, Y.; Chen, Q.; Wu, Y.; Luo, P. Estimating Socio-Economic Parameters via Machine Learning Methods Using Luojia1-01 Nighttime Light Remotely Sensed Images at Multiple Scales of China in 2018. IEEE Access 2021, 9, 34352–34365.
  17. Hu, T.; Wang, T.; Yan, Q.; Chen, T.; Jin, S.; Hu, J. Modeling the Spatiotemporal Dynamics of Global Electric Power Consumption (1992–2019) by Utilizing Consistent Nighttime Light Data from DMSP-OLS and NPP-VIIRS. Appl. Energy 2022, 322, 119473.
  18. Gao, X.; Wu, M.; Gao, J.; Han, L.; Niu, Z.; Chen, F. Modelling Electricity Consumption in Cambodia Based on Remote Sensing Night-Light Images. Appl. Sci. 2022, 12, 3971.
  19. Gallegos, J.; Borunda, M.; Garduno, R.; García-Beltrán, C.D. Spatial Intelligent Estimation of Energy Consumption. In Advances in Soft Computing. MICAI 2024; Lecture Notes in Computer Science, Martínez-Villaseñor, L., Ochoa-Ruíz, G., Eds.; Springer: Cham, Switzerland, 2024; Volume 15247.
  20. Chakraborty, S.; Stokes, E.C. Adaptive Modeling of Satellite-Derived Nighttime Lights Time-Series for Tracking Urban Change Processes Using Machine Learning. Remote Sens. Environ. 2023, 298, 113818.
  21. Chen, S.; Yan, D.; Li, C.; Chen, J.; Yan, J.; Zhang, Z. Monthly Urban Electricity Power Consumption Prediction Using Nighttime Light Remote Sensing: A Case Study of the Yangtze River Delta Urban Agglomeration. Remote Sens. 2025, 17, 2478.
  22. Hall, O.; Dompae, F.; Wahab, I.; Dzanku, F.M. A review of machine learning and satellite imagery for poverty prediction: Implications for development research and applications. J. Int. Dev. 2023, 35, 1753–1768.
  23. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  24. Xie, Y.; Weng, Q. World energy consumption pattern as revealed by DMSP-OLS nighttime light imagery. GIScience Remote Sens. 2016, 53, 265–282.
  25. NASA’s Black Marble. Available online: https://blackmarble.gsfc.nasa.gov/ (accessed on 17 November 2025).
  26. Wang, Z.; Román, M.O.; Sun, Q.; Kalb, V.; MacManus, K.; Ryan, R.E. NASA’s Black Marble Product Suite: Validation Strategy. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8197–8200.
  27. VIIRS Land Visible Infrared Imaging Radiometer Suite. Available online: https://viirsland.gsfc.nasa.gov/Products/NASA/BlackMarble.html (accessed on 17 November 2025).
  28. VNP46A4—VIIRS/NPP Lunar BRDF-Adjusted Nighttime Lights Yearly L3 Global 15 ARC Second Linear Lat Lon Grid. Available online: https://gkhub.earthobservations.org/packages/qp5v1-f6606 (accessed on 13 February 2025).
  29. International Energy Agency. Available online: https://www.iea.org/data-and-statistics (accessed on 15 July 2025).
  30. World Bank Group. Available online: https://data.worldbank.org/indicator/EG.USE.ELEC.KH.PC (accessed on 15 July 2025).
  31. UNdata. Available online: https://data.un.org/ (accessed on 15 July 2025).
  32. Klyuev, R.V.; Morgoev, I.D.; Morgoeva, A.D.; Gavrina, O.A.; Martyushev, N.V.; Efremenkov, E.A.; Mengxu, A. Methods of forecasting electric energy consumption: A literature review. Energies 2022, 15, 8919.
  33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  34. Datos abiertos, Gobierno de México. Usuarios y Consumo de Electricidad por Municipio. Available online: https://datos.gob.mx/busca/dataset/usuarios-y-consumo-de-electricidad-por-municipio-a-partir-de-2018 (accessed on 24 May 2025).
  35. ECMWF Reanalysis v5 (ERA 5). Available online: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5 (accessed on 8 May 2025).
  36. Copernicus Climate Data Store. ERA 5. Available online: https://cds.climate.copernicus.eu/#!/home (accessed on 8 May 2025).
  37. Instituto Nacional para el Federalismo y el Desarrollo Municipal. Sistema Nacional de Información Municipal. Available online: http://www.snim.rami.gob.mx (accessed on 8 May 2025).
Figure 1. Main methodology for regional energy consumption forecasting.
Figure 2. Dependence of EC on: (a) NTL, (b) T, (c) GDP, and (d) CV.
Figure 3. Comparison of dependence of EC on NTL, T, and CV.
Figure 4. Relationship between the natural logarithm of EC and the natural logarithm of NTL for municipalities in 2018, along with the Linear Regression fit derived from this data.
Figure 5. Error factor in municipality predictions of EPC for 2018 as a function of population density.
Figure 6. NTL images of Mexico for the years (a) 2014, (b) 2015, (c) 2016, (d) 2017, and (e) 2018. The axis units are in pixels.
Figure 7. Comparison of NTL images for 2019. Image (a) corresponds to the actual NTL satellite image, and image (b) corresponds to the forecasted NTL image. The axis units are in pixels.
Figure 8. Comparison of the estimated NTL values versus the actual values for the year 2019. The axis units are in NTL units.
Figure 9. Estimated NTL images of Mexico for the years (a) 2020, (b) 2021, (c) 2022, and (d) 2023. The axis units are in pixels.
Figure 10. Comparison of estimated NTL values versus actual values for (a) 2020, (b) 2021, (c) 2022, and (d) 2023. The axis units are in NTL units.
Figure 11. Map of predicted NTL by municipality in Mexico for the years (a) 2019, (b) 2020, (c) 2021, (d) 2022, and (e) 2023. The maps show the total NTL for each municipality in Mexico, obtained by summing the values of all pixels within each municipality.
Figure 12. Energy Consumption in kWh for each municipal entity for the years (a) 2019, (b) 2020, (c) 2021, (d) 2022, and (e) 2023.
Figure 13. Comparison of predicted and actual electric power consumption values in [kWh] for each municipal entity for the year 2022.
Figure 14. Predicted NTL image of Mexico for the year 2024. The axis units are in pixels.
Figure 15. Maps showing predicted values of (a) NTL and (b) EPC by municipality in Mexico for the year 2024.
Table 1. Performance of the tested algorithms for estimating the energy consumption relationship.

| Algorithm | MAE (ln NTL Units) | MAPE (%) | R2 | RMSE (ln NTL Units) |
|---|---|---|---|---|
| Linear Regression | 0.600 | 3.8 | 0.862 | 0.800 |
| Decision Trees | 0.570 | 3.6 | 0.873 | 0.760 |
| Random Forest | 0.540 | 3.4 | 0.881 | 0.720 |
| Neural Network | 0.550 | 3.5 | 0.869 | 0.740 |
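The error metrics reported in Table 1 (MAE, MAPE, RMSE, and R2) follow standard definitions, which can be sketched as below. This is only an illustration: the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (%), RMSE and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n
    # Mean Absolute Percentage Error: undefined if any true value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    # Root Mean Square Error: penalizes large errors more strongly than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 - residual SS / total SS.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Hypothetical values, for illustration only.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 11.0, 15.0, 15.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because MAE and RMSE carry the units of the target variable while MAPE and R2 are dimensionless, the four values are not directly comparable with one another, only across models evaluated on the same target.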
Table 2. Optimal hyperparameters for the Random Forest model.

| Hyperparameter | Optimal Value | Function/Purpose |
|---|---|---|
| max_depth (Maximum Depth) | 7 | Limits the maximum depth of each individual Decision Tree to prevent overfitting by controlling complexity. |
| n_estimators (Number of Trees) | 100 | Determines the number of independent Decision Trees in the forest to increase the robustness and stability of predictions. |
| max_features (Maximum Features) | 2 | Sets the maximum number of independent variables (features) considered for splitting a node, introducing randomness to improve diversity. |
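As a rough illustration of how the hyperparameters in Table 2 map onto scikit-learn [33], the sketch below configures a `RandomForestRegressor` with those three values. Everything else is an assumption made for the example: the synthetic data, the three feature names, the coefficients, the train/test split, and the random seeds are hypothetical and do not reproduce the study's dataset or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): three hypothetical predictors and a
# log-scale target loosely driven by the first one.
rng = np.random.default_rng(0)
n_samples = 500
ln_ntl = rng.uniform(2.0, 12.0, n_samples)        # hypothetical ln(NTL)
temperature = rng.uniform(10.0, 30.0, n_samples)  # hypothetical T
ln_gdp = rng.uniform(5.0, 10.0, n_samples)        # hypothetical ln(GDP)
X = np.column_stack([ln_ntl, temperature, ln_gdp])
ln_ec = (1.1 * ln_ntl + 0.02 * temperature + 0.1 * ln_gdp
         + rng.normal(0.0, 0.3, n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, ln_ec, test_size=0.2, random_state=0
)

# Hyperparameter values taken from Table 2; random_state is added here only
# to make the sketch reproducible.
model = RandomForestRegressor(
    n_estimators=100,  # number of trees in the forest
    max_depth=7,       # cap on the depth of each tree
    max_features=2,    # features considered at each split
    random_state=0,
)
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))
print(f"R2 on held-out synthetic data: {r2:.3f}")
```

Note that `max_features=2` only introduces randomness when the model has more than two input features; with exactly two, every split would consider all of them.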
Table 3. MAE, MAPE, and R2 values for the predicted NTL images from 2019 to 2023.

| Year | MAE (NTL Units) | MAPE (%) | R2 |
|---|---|---|---|
| 2019 | 4.77 | 9.17 | 0.93 |
| 2020 | 17.72 | 34.08 | 0.85 |
| 2021 | 17.64 | 33.92 | 0.84 |
| 2022 | 15.95 | 28.70 | 0.86 |
| 2023 | 14.88 | 27.40 | 0.85 |
Table 4. Sensitivity analysis of NTL forecasting performance with respect to the temporal input window length using 2019 as the validation year.

| Input Window Length | Training Period | MAE (NTL Units) | R2 |
|---|---|---|---|
| 3-year | 2016–2018 | 5.6 | 0.89 |
| 5-year | 2014–2018 | 4.8 | 0.93 |
| 7-year | 2012–2018 | 5.2 | 0.91 |
Table 5. Comparison with related works.

| Author(s) and Year | ML Technique(s) | Values of Metrics (Optimal Model) |
|---|---|---|
| Guo et al. (2021) [16] | Random Forest (RF), GWR, LR, BPNN, SVM | R2: 0.91, RMSE: 10.42, MAE: 5.48 (RF at city scale) |
| Hu et al. (2022) [17] | Locally adaptive selection (Linear, Exponential, Logarithmic, 2nd-order Polynomial) | R2: 0.997 (Average), MARE: 2.962% (Global) |
| Gao et al. (2022) [18] | Exponential, Linear, and Power functions | Adj-R2: 0.967, Mean RE: 5.47% |
| Gallegos et al. (2024) [19] | Random Forest, Neural Network, Decision Tree, Linear Regression | R2: 0.881, MAPE: 3.4% (Random Forest) |
| Darshini et al. (2025) [13] | Bidirectional LSTM, Stacked LSTM, CNN-LSTM, ConvLSTM | R2: 0.94, RMSE: 2.83 MU (Bidirectional LSTM) |
| Chen et al. (2025) [21] | Five-parameter model (integrating NTL and temperature interaction) | MARE: 7.13% (Annual prediction), RMSE: 361,064 |
| This research | Random Forest (RF) | Best result R2: 0.93, MAPE: 9.17% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Borunda, M.; Gallegos, J.; Hernández-Aguilar, J.A.; Lopez Lopez, G.; Alvarado, V.M.; Ruiz-Chavarría, G.; Jaramillo, O.A. A Machine Learning Framework for Predicting Regional Energy Consumption from Satellite-Derived Nighttime Light Imagery. Appl. Sci. 2026, 16, 449. https://doi.org/10.3390/app16010449