Article

Machine Learning for Sustainable Urban Energy Planning: A Comparative Model Analysis

1 Thomas Jefferson High School for Science and Technology, 6560 Braddock Road, Alexandria, VA 22312, USA
2 Geographic Data Science Lab, Department of Geography and Planning, University of Liverpool, Chatham Street, Liverpool L69 7ZT, UK
3 Department of Computational and Data Sciences, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
* Author to whom correspondence should be addressed.
Energies 2026, 19(1), 176; https://doi.org/10.3390/en19010176
Submission received: 4 November 2025 / Revised: 24 December 2025 / Accepted: 26 December 2025 / Published: 29 December 2025
(This article belongs to the Special Issue Environmental Sustainability and Energy Economy: 2nd Edition)

Abstract

Accurate short-term forecasting of urban electricity demand is essential for operational planning and climate-resilient energy management. This study evaluates four forecasting models, namely, Prophet, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Temporal Convolutional Networks (TCN), across 15 U.S. cities representing diverse climatic regimes. Model performance is assessed at 1, 6, 12, and 24 h horizons using MAE, RMSE, MAPE, and R2 within a unified, climate-aware evaluation framework. Results show that Prophet consistently outperforms deep learning models at longer horizons (12–24 h), achieving MAE reductions of approximately 70–90% relative to LSTM and GRU across all climatic clusters, while maintaining R2 values above 0.95 even in highly variable climates. At short horizons (1–6 h), LSTM and GRU perform competitively in climatically stable cities, reducing MAE by up to 15–25% compared with Prophet, but their accuracy deteriorates rapidly as forecast horizons increase. TCN exhibits intermediate performance, outperforming recurrent models in selected short-horizon cases but showing reduced robustness under high climate variability. Statistical testing indicates that model performance varies significantly across cities within climatically heterogeneous clusters (p < 0.05), highlighting the influence of climatic variability on forecasting reliability. Overall, the results demonstrate that model effectiveness is strongly context-dependent, providing quantitative guidance for climate-aware model selection in urban energy systems.

1. Introduction

Forecasting energy demand in urban areas is critical for managing resources, reducing operational costs, and achieving sustainability goals. On average, cities contribute over 75% of countries’ Gross Domestic Product (GDP) and are the main engines of economic growth. However, this economic activity is accompanied by significant environmental costs, as cities also consume approximately 75% of global primary energy, while emitting between 50% and 60% of the world’s total greenhouse gases [1]. This disproportionate energy consumption highlights the challenges cities encounter in driving economic development while maintaining sustainability and energy efficiency. The increasing density of urban environments, hosting 55% of the world population today with an expected increase to 68% by 2050 [2], further increases the pressure placed on city infrastructure to deliver reliable and efficient energy services [3]. This demand is not limited to basic infrastructure but extends to buildings, which account for significant energy consumption (36% of global consumption [4]) throughout their life cycle, from raw material extraction to daily operations such as lighting, air conditioning, and cleaning. As cities strive to meet the rising energy demand, ensuring an uninterrupted and efficient energy supply becomes an urgent priority to support continued urban growth and development [1]. This urgency was emphasised at the recent United Nations Conference of the Parties meeting (COP28), where nearly 200 countries committed to addressing these challenges [5].
Accurate energy demand prediction not only supports efficient resource allocation but also plays a vital role in strategic planning for infrastructure development, sustainability initiatives, and energy policy formulation [6,7]. Governments and energy providers rely on demand forecasts to make informed decisions on energy production, distribution, and storage, all of which have significant cost implications [8]. Errors in energy demand forecasting can lead to energy shortages, driving up energy prices and potentially exacerbating fuel poverty [9,10]. Additionally, reliable energy forecasts enable policymakers to design more effective sustainability initiatives, such as integrating renewable energy sources into the grid or planning future smart city developments where energy efficiency is paramount [11,12,13]. For instance, cities that seek to achieve net zero carbon targets (more than 700 [14]) must align their energy infrastructure investments with accurate predictions of future energy needs to ensure both economic feasibility and environmental responsibility.
The advent of machine learning and deep learning models has revolutionised the field of energy prediction, offering novel techniques that outperform traditional forecasting methods. Conventional models like ARIMA (autoregressive integrated moving average) or multiple linear regression (MLR), while effective in simple, stable environments, often struggle to capture the complex, non-linear, and interdependent factors influencing dynamic systems [15,16], such as urban energy consumption. Machine learning models, on the other hand, can analyse large datasets, detect hidden patterns, and adapt to changes in consumption trends, external variables, and environmental factors [17,18], making them suitable for complex environments such as urban energy systems [19]. These models excel in handling the high dimensionality and diversity of urban energy data, which often includes temporal, spatial, and socio-economic variables [20,21]. Further, as cities become more interconnected through smart grid technologies, the ability to process vast amounts of real-time data has become a critical advantage of these advanced models, enabling a more responsive and adaptable approach to energy management [17,22]. Consequently, machine learning approaches provide more robust and adaptive solutions, better suited to the complexity of such real-world scenarios.
Despite this methodological progress across multiple subfields, including building-level forecasting, ensemble learning, deep learning, and spatio-temporal modelling, a clear research gap remains. Most studies focus on individual buildings, single cities, or geographically narrow regions [23,24,25], making it difficult to determine whether reported performance gains persist across broader urban systems. As a result, it remains unclear how forecasting models behave across diverse cities characterised by markedly different climatic regimes. Although weather variables are commonly used as predictors, the role of climatic variability itself as a driver of forecasting uncertainty is rarely examined explicitly. Furthermore, few studies adopt climate-aware or regime-based analytical frameworks prior to model comparison, limiting the interpretability and transferability of results. In addition, most evaluations focus on short-term horizons, typically one to several hours ahead, providing limited insight into how model performance degrades at medium- and long-term horizons.
Several studies illustrate these constraints in practice. Many focus on specific building attributes [26,27,28] or particular building types [29,30,31], while others concentrate on single industrial sectors [32,33]. Temporal scope is another limitation: much of the literature relies on short, single-year datasets [34,35,36], or aggregates demand to daily or monthly intervals that obscure finer-scale variability [37,38]. Even studies using high-resolution hourly data often omit contextual features such as day-of-week or seasonal markers [39,40], reducing their ability to capture diurnal and seasonal demand patterns. These examples demonstrate how data and methodological constraints in prior work hinder the development of forecasting models that can generalise reliably across cities with diverse climatic and temporal demand regimes. This synthesis directly motivates a comparative model-evaluation problem rather than the development of yet another forecasting architecture. Specifically, it remains unclear under what climatic conditions and forecast horizons advanced deep learning models genuinely outperform simpler statistical approaches, and whether such advantages are robust across heterogeneous urban environments.
To address these challenges, this study undertakes a systematic comparison of four advanced time-series forecasting models: Prophet, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Temporal Convolutional Network (TCN). These models are selected as representative members of distinct modelling families (statistical decomposition and deep learning sequence modelling), allowing performance differences to be interpreted in relation to both climatic regime and forecast horizon. By integrating climatic variability through a coefficient-of-variation-based clustering framework, this study explicitly links model performance to environmental regime.
The aim of this study is therefore to evaluate, in a climate-aware and multi-horizon framework, the performance of Prophet, LSTM, GRU, and TCN across fifteen U.S. cities with diverse climatic and geographic conditions. By doing so, the study identifies context-specific strengths and limitations of each model and provides empirical guidance for climate-dependent model selection. The findings are intended to inform the design of more efficient and resilient urban energy systems, contributing to sustainable energy planning in the context of climate change and rapid urbanisation.
The key contributions of this study are summarised as follows:
  • A systematic, multi-city benchmarking of Prophet, LSTM, GRU, and TCN across fifteen U.S. cities with diverse climatic and geographic conditions.
  • A climate-aware evaluation framework based on clustering cities using the coefficient of variation of key meteorological variables.
  • A multi-horizon performance assessment (1, 6, 12, and 24 h) using MAE, MSE, RMSE, MAPE, and R2.
  • Quantitative evidence that Prophet consistently outperforms deep learning models at longer forecast horizons across all climatic clusters.
  • Empirical demonstration that urban climate variability is a key determinant of forecasting reliability.

2. Background

Accurate energy consumption predictions are essential for advancing urban sustainability, particularly in meeting net-zero targets. Machine learning has become pivotal, enabling models that optimise energy use, reduce emissions, and support resilient infrastructure design. These predictions facilitate targeted interventions, guide energy-efficient retrofits, and align urban planning with long-term sustainability goals for green cities. Early studies on energy prediction relied primarily on statistical and time-series methods, such as MLR and ARIMA models [41,42,43]. These models were widely adopted due to their simplicity and ease of interpretation, performing adequately in stable, predictable environments where energy demand followed regular seasonal or daily patterns. However, the limitations of traditional models became evident as cities grew more complex, with demand fluctuations driven by diverse, non-linear influences such as climate variability, socio-economic shifts, and technological changes [3,44,45,46]. As a result, these methods struggled to capture the multifaceted dynamics of urban energy use, particularly under irregular consumption patterns or rapidly changing external conditions.
The shortcomings of traditional models have paved the way for machine learning approaches, which offer flexibility and improved predictive power in handling high-dimensional, non-linear data. Early applications of machine learning in energy forecasting utilised algorithms such as Support Vector Machines (SVMs) and Decision Trees (DTs), which proved more capable of handling complex data patterns compared to traditional models [17]. DTs, and in particular random forests, have been shown to outperform other machine learning algorithms. Fan et al. [47], for example, compared the performance of random forest (RF), regression tree (RT), and support vector regression (SVR, a variant of SVM) in predicting energy consumption for two buildings in Florida. The results of that study showed that the prediction performance of RF, measured by a performance index, was 14–25% and 5–5.5% better than RT and SVR, respectively. In a related work, Candanedo et al. [48] applied RF to forecast short-term electricity consumption in a Belgian household. The authors evaluated the prediction performance of RF against three other models: MLR, SVR, and Gradient Boosting Machines (GBM). The results showed that RF and GBM performed better than MLR and SVR. Similarly, Ahmad et al. [49] used RF to forecast the hourly Heating, Ventilation, and Air Conditioning (HVAC) electricity consumption of a hotel in Madrid, Spain, and reported performance comparable to an ANN-based feed-forward network (91% and 95% for RF and ANN, respectively). While these studies confirm the strong short-term performance of RF-based models, their analyses are primarily restricted to individual buildings or single urban locations, limiting the generalisability of findings and offering limited insight into long-horizon urban-scale forecasting.
Ensemble models have also gained popularity for their ability to combine the strengths of multiple learners. Priyadarshini et al. [50] employed an ensemble learning approach combining DT, RF, and gradient boosting methods to forecast energy consumption in smart buildings. This study highlighted the benefits of using ensemble techniques to improve predictive accuracy and robustness by leveraging the strengths of multiple algorithms. Divina et al. [51] likewise combined several weaker learners, namely a regression tree, RF, and an ANN, into an ensemble model that outperformed each individual model. Similarly, Khashei et al. [52] demonstrated the effectiveness of hybrid models, integrating neural networks with ARIMA, for short-term energy forecasting. Their research showed that hybrid models could effectively capture both linear and non-linear relationships in energy consumption data, leading to better performance than traditional univariate models. Although ensemble and hybrid methods often yield higher accuracy, they introduce substantial model complexity, increased computational cost, and reduced interpretability, which limit their scalability for large, multi-city applications.
More recently, there has been increasing exploration of deep learning models. For example, Alizadegan et al. [53] compared LSTM and Bidirectional LSTM (Bi-LSTM) to the traditional forecasting methods ARIMA and Seasonal ARIMA (SARIMA) for energy consumption prediction. That work showed that both LSTM models, in particular Bi-LSTM, outperformed the traditional models. Paterakis et al. [54] also compared the results of an MLP-enriched deep learning model with several popular machine learning models, including logistic regression (LR), MLR, different RTs, SVRs, and Gaussian models, to predict aggregated energy demand and showed that the former outperformed all other models. Hrnjica and Mehr [55] demonstrated the effectiveness of LSTM networks in forecasting energy demand for smart grids in Nicosia, Northern Cyprus. Their study indicated that LSTM networks could capture complex temporal dependencies within energy consumption data, addressing inefficiencies observed when compared to a standard decomposition-based time-series model. Other work by [56] further used an LSTM framework within a deep learning model to forecast electricity demand, capturing both short-term fluctuations and long-term trends. Their approach demonstrated superior performance against several benchmark models, indicating the LSTM’s capability in handling complex, non-linear relationships in time-series data. Naji et al. [57] also explored the use of Extreme Learning Machines (ELM) for estimating building energy consumption. The results of that study showed improved accuracy and efficiency over traditional ANN approaches. The ELM’s ability to handle diverse input variables and provide rapid training made it a promising option for real-time applications. Despite their high short-term accuracy, deep learning models are data-intensive, computationally expensive, and often exhibit performance degradation at longer forecast horizons. Moreover, most existing deep learning studies remain limited to individual buildings, single cities, or homogeneous climatic regions, restricting their transferability across heterogeneous urban environments.
Several recent studies have investigated hybrid deep learning methods to further enhance prediction accuracy. Su et al. [58] introduced a hybrid model combining wavelet transform and enhanced deep Recurrent Neural Networks (RNNs) for forecasting natural gas demand. Their approach used both simulated and real datasets, demonstrating superior performance over conventional methods. Karijadi and Chou [59] combined RF and LSTM for building energy prediction, showing that this approach performed better than traditional ML approaches including LR, SVR, ANN, and RF on its own. Similarly, del Real et al. [60] combined Convolutional Neural Networks (CNNs) and ANNs to forecast energy demand in France. Their findings revealed that the hybrid model significantly improved prediction metrics such as mean absolute error (MAE) and mean absolute percentage error (MAPE) compared to using ANNs alone, underscoring the advantage of integrating the feature extraction capabilities of CNNs with the regression strength of ANNs. While these hybrid approaches enhance accuracy, they further increase model complexity and black-box behaviour, hindering interpretability and large-scale operational deployment.
Spatio-temporal learning has also contributed to recent advances. Kim and Cho [61] proposed a CNN-LSTM neural network model to forecast residential energy consumption, capturing both spatial and temporal dynamics in household power consumption data. Their model outperformed several conventional techniques, achieving high accuracy even in the presence of irregular consumption patterns. Amasyali and El-Gohary [62] further emphasised the role of deep learning in energy prediction by using a multi-layer Deep Neural Network (DNN) to model cooling energy consumption based on outdoor weather conditions. Their simulation-based approach across various climates demonstrated the robustness of deep learning methods compared to traditional models such as SVM and RF. However, these approaches require dense spatial sensing and high-dimensional inputs, which are not consistently available at the urban-system scale.
In addition to these methodological limitations, broader multi-region studies highlight further gaps. For example, Cawthorne et al. [63] analysed electricity demand across multiple U.S. states but lacked the spatial and temporal granularity needed to assess model performance under heterogeneous climatic and urban conditions. Such studies often rely on aggregated signals that obscure local demand variability driven by differences in urban form, building stock, and infrastructure. Consequently, empirical evidence remains limited on how forecasting accuracy changes systematically across climatically diverse urban environments.
Forecast horizons represent another major source of variation in the existing literature. Most statistical and machine learning studies focus predominantly on short-term predictions, typically one to several hours ahead because short horizons exhibit stronger temporal autocorrelation and are easier to model [47,48,61]. Medium- and long-term horizons, such as 6-, 12-, or 24 h forecasts, are examined far less frequently, despite their relevance for operational planning, load balancing, and energy market coordination. Deep learning models often achieve strong performance at short horizons but tend to suffer degradation as horizons lengthen [55,56,64]. Conversely, simpler statistical models may remain competitive, or even superior, at longer horizons due to their explicit modelling of trend and seasonality. The limited attention to medium- and long-term horizons leaves open the question of how different models deteriorate over time and whether certain architectures display greater climate-dependent robustness.
Table 1 summarises a representative selection of studies across statistical, machine learning, hybrid, deep learning, and spatio-temporal approaches, highlighting their methodological assumptions, typical forecasting horizons, and principal advantages and limitations. This synthesis demonstrates that most prior work concentrates on building-scale or single-city forecasting, relies heavily on short forecasting horizons, and seldom evaluates how climatic variability influences model performance. Very few studies compare statistical and deep learning models under a consistent, climate-aware, multi-city framework. As a result, existing findings remain difficult to generalise across regions with differing climatic, economic, and infrastructural characteristics.
Daily and seasonal temperature fluctuations, industrial activity, urban form, and technology adoption all shape energy use patterns [49,76,77,78,79]. Cities with temperate climates tend to exhibit relatively stable consumption, while those with extreme temperature regimes show pronounced heating and cooling loads. Economic composition, building stock, and grid infrastructure further influence demand dynamics [80]. These spatial and climatic differences imply that a forecasting model that performs well in one city may fail to generalise elsewhere unless climatic regime characteristics are explicitly incorporated into the modelling framework.
This study addresses these gaps by systematically evaluating four forecasting models: Prophet, LSTM, GRU, and TCNs, across fifteen U.S. cities with diverse climatic profiles. It employs a climate-aware clustering approach based on the coefficient of variation of key meteorological variables to examine how climatic regimes influence model performance. The study also evaluates each model across multiple forecasting horizons (1, 6, 12, and 24 h) using standard error metrics. By integrating climatic variability into a multi-city, multi-horizon design, this work provides a rigorous assessment of model robustness and identifies the conditions under which simpler models outperform deep learning architectures. The findings offer empirically grounded guidance for climate-dependent model selection and support the development of forecasting systems that are more resilient, scalable, and adaptable to diverse urban contexts.

3. Materials and Methods

3.1. Data

In this research, two data sets were used. The first dataset comprised energy demand data obtained from the United States (US) Energy Information Administration (EIA) [81]. Specifically, hourly electric grid data in megawatts (MW) were sourced from the EIA-930 platform, which serves as a centralised repository for high-voltage bulk electric power grid information across the contiguous US. These data were collected from electricity balancing authorities and provide detailed hourly operational statistics for subregions. These regions vary in size, ranging from individual cities to large areas that overlap multiple states. In this study, the focus was on energy demand at the city level, so the data were further filtered to align with city-level geographic boundaries. This resulted in a set of 15 cities of interest (see Figure 1): Baltimore (Maryland), Chicago (Illinois), El Paso (Texas), Houston (Texas), Los Angeles (California), New York City (New York), Omaha (Nebraska), Philadelphia (Pennsylvania), Portland (Oregon), San Antonio (Texas), San Diego (California), Seattle (Washington), Tallahassee (Florida), Tampa (Florida), and Tucson (Arizona). With the exception of Houston and San Antonio, where data collection began on 27 May 2019 due to data availability, the start and end dates for the cities were 1 July 2015 and 1 July 2018, respectively. A short description of each city is provided in Table A1 in Appendix A for comparison.
The second dataset comprised meteorological data obtained from Meteosat [82], a weather and climate database that provides detailed weather data for thousands of weather stations and places worldwide. Specifically, for each study location, data on temperature, wind speed, pressure, relative humidity, and dew point were collected. These variables were selected due to their established influence on energy consumption patterns, as demonstrated in previous studies [83,84,85,86]. The meteorological data were collected at an hourly resolution to align temporally with the energy demand dataset.
To further understand the energy consumption patterns across cities, Figure 2 displays box plots showing the distribution of energy demand in the various cities at hourly intervals. These plots provide insights into variations in consumption, the presence of outliers, and factors influencing demand. To improve comparability between plots and ensure a clearer representation of the data, extreme outliers with values above 60,000 MW were removed. This adjustment was necessary as these outliers significantly skewed the visualisation. In total, three extreme outliers were removed, two from Chicago and one from Omaha, allowing for a more accurate comparison of the distributions of energy demand between cities.
The figure reveals that some cities, such as Chicago, Houston, Los Angeles, and New York City, exhibit significantly higher energy demand with wider interquartile ranges, indicating substantial fluctuations in consumption. Despite the removal of extreme outliers, several cities still show a considerable number of data points beyond the upper whiskers, particularly in Houston, Chicago, and New York City. These persistent outliers suggest that these cities experience periodic spikes in energy usage, likely driven by industrial activity, seasonal variations, or population density effects, among others. One possible key contributor to these fluctuations is the Urban Heat Island (UHI) effect, which elevates temperatures in densely built environments due to the prevalence of impervious surfaces and reduced vegetation cover [87,88]. UHI intensifies cooling demand during warmer months, especially in large metropolitan areas like Houston and New York City, where extensive infrastructure and anthropogenic heat exacerbate urban temperatures [89,90]. These elevated temperatures place additional strain on HVAC systems, thereby increasing overall energy consumption and amplifying demand variability [91].
Further, the distribution of energy demand in many cities appears to be right-skewed, as indicated by medians positioned closer to the lower quartile and extended upper whiskers. This skewness suggests that while most observations cluster around lower values, occasional extreme peaks drive up overall demand. The high median energy demand observed in Houston, New York City and Chicago reinforces the idea that these cities require significantly more energy, possibly due to their larger populations, extensive infrastructure, or higher economic activity.
In contrast, cities such as Seattle, Tallahassee and El Paso exhibit relatively stable energy consumption, as evidenced by their compact interquartile ranges and the limited presence of outliers. This stability may reflect consistent usage patterns, possibly due to milder climates or lower fluctuations in industrial and commercial activity. Differences in energy demand between cities can be attributed to various factors, such as climate, population density, and economic activity [92,93,94]. Cities in extreme climates may require more energy for heating or cooling, while those with high industrial and commercial activity generally exhibit greater fluctuations in demand. In addition, cities with well-implemented energy efficiency measures may demonstrate a more stable demand over time.

3.2. Preprocessing

To ensure transparency, reproducibility, and consistency across cities and models, a unified preprocessing pipeline was applied throughout the study. Missing observations in both energy demand and meteorological variables were handled using listwise deletion. Missing values accounted for a small proportion of the data (mean 2.2%, standard deviation 2.2%) and were sparsely distributed in time; therefore, deletion was preferred over interpolation to avoid introducing artificial temporal structure into the time-series. Outliers were identified using the interquartile range (IQR) method, whereby observations lying below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were removed. This procedure was applied independently for each city to account for differences in demand scale and variability, and was intended solely to mitigate the influence of extreme values that could disproportionately affect model training and evaluation.
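For illustration, a minimal Python sketch of this per-city cleaning step is given below; the column name demand_mw and the choice to apply the IQR rule to the demand series are assumptions made for the example rather than details taken from the study.

import pandas as pd

def clean_city_series(df: pd.DataFrame, target: str = "demand_mw") -> pd.DataFrame:
    """Listwise deletion followed by IQR-based outlier removal for one city."""
    # Listwise deletion: drop any row with a missing demand or weather value.
    df = df.dropna()
    # IQR rule: remove observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
    q1, q3 = df[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return df[(df[target] >= lower) & (df[target] <= upper)]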
Feature engineering was central to capturing both temporal dependencies and weather-driven variability in energy demand. Time-based indicators were derived from the timestamp, including hour of day, day of week, month of year, and a binary weekend indicator, enabling the models to learn diurnal, weekly, and seasonal consumption patterns. Additional predictors were constructed to capture short-term temporal dependencies and climatic effects. Lagged demand features (Lag 1 to Lag 24) were introduced to model hourly autocorrelation, while rolling statistics of temperature (24 h mean and standard deviation) were computed to represent intra-day variability and smoothed climatic trends. An interaction term between temperature and relative humidity was also included to capture their combined, non-linear influence on energy consumption. The final feature set combined these engineered variables with the meteorological predictors described in Section 3.1.
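A possible pandas implementation of this feature set is sketched below, assuming an hourly DatetimeIndex and illustrative column names (demand_mw, temperature, humidity) that are not taken from the original data dictionary.

import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Calendar indicators for diurnal, weekly, and seasonal patterns.
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)
    # Hourly autocorrelation: lagged demand from 1 to 24 hours.
    for lag in range(1, 25):
        out[f"demand_lag_{lag}"] = out["demand_mw"].shift(lag)
    # Intra-day climatic variability: 24 h rolling statistics of temperature.
    out["temp_roll_mean_24h"] = out["temperature"].rolling(24).mean()
    out["temp_roll_std_24h"] = out["temperature"].rolling(24).std()
    # Combined, non-linear weather effect.
    out["temp_x_humidity"] = out["temperature"] * out["humidity"]
    # Drop rows made incomplete by the lag and rolling windows.
    return out.dropna()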
All input features were normalised using Min–Max scaling to the range [0, 1]. Scaling parameters were computed exclusively from the training set for each city and subsequently applied to the corresponding validation and test sets to prevent information leakage. This normalisation ensured numerical stability during neural network training and maintained comparability across heterogeneous variables such as energy demand, temperature, and humidity [95]. The processed data were then structured into fixed-length input sequences paired with output targets corresponding to forecast horizons of 1, 6, 12, and 24 h ahead. This sequential arrangement preserved the chronological order of observations, allowing the models to learn both short-term dependencies and longer-term temporal trends driven by climatic and temporal variability.
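The scaling and windowing logic can be summarised by the sketch below; the exact indexing of the horizon target is an assumption, since the paper does not state whether multi-step targets are predicted directly or recursively.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_and_window(train, val, test, seq_len=24, horizon=1, target_col=0):
    # Fit Min-Max scaling on the training split only to avoid information leakage.
    scaler = MinMaxScaler(feature_range=(0, 1)).fit(train)
    train_s, val_s, test_s = (scaler.transform(a) for a in (train, val, test))

    def make_windows(arr):
        X, y = [], []
        for t in range(seq_len, len(arr) - horizon + 1):
            X.append(arr[t - seq_len:t])                # previous 24 hourly rows
            y.append(arr[t + horizon - 1, target_col])  # demand `horizon` steps ahead
        return np.array(X), np.array(y)

    return scaler, make_windows(train_s), make_windows(val_s), make_windows(test_s)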

3.3. Model Development

This research evaluates four time-series forecasting methods: Prophet, together with three deep learning approaches (TCN, LSTM, and GRU). These models were selected to provide a balanced and methodologically diverse benchmark representing three major modelling paradigms used in energy forecasting: (i) interpretable statistical decomposition (Prophet), (ii) recurrent neural networks for sequential modelling (LSTM and GRU), and (iii) convolutional sequence architectures (TCN). Together, these models enable a systematic comparison between classical statistical forecasting and state-of-the-art deep learning under a unified multi-city, multi-horizon framework. This design aligns with recent reviews emphasising the need for comparative evaluations across model classes rather than within a single family [15,16,17].
Importantly, this selection was made to ensure conceptual breadth while avoiding redundancy. Models such as Random Forest (RF), XGBoost, or hybrid CNN–LSTM architectures, although widely used in energy forecasting [47,48,61], either (a) do not provide temporal sequence modelling in their native form and require additional feature engineering to capture autocorrelation structures; (b) introduce substantial model complexity that exceeds the aims of this comparative study, particularly hybrid deep learning models that combine convolutional and recurrent components [60,61]; or (c) overlap functionally with the deep learning models already selected (e.g., CNN–LSTM hybrids share representational characteristics with TCN and LSTM). The chosen model set therefore captures the dominant modelling families (statistical, recurrent, and convolutional) without over-expanding the design space, ensuring a focused yet comprehensive comparison consistent with prior methodological recommendations [18,19].
Prophet is an additive time-series model developed by Meta (formerly Facebook), designed to handle datasets with strong seasonal trends and missing values. It decomposes time-series data into trend, seasonality, and holiday components, making it robust for scenarios with stable or interpretable patterns [96]. Prophet is included as a transparent statistical baseline that captures trend and seasonality with minimal tuning, enabling clear interpretability and low computational cost. Studies have shown that Prophet performs competitively with ARIMA and, in some cases, with deep learning methods in energy demand forecasting [97,98]. Prophet was implemented using its default structure, with adjustments to seasonalities and changepoints.
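A minimal sketch of this Prophet baseline is shown below; the input frame (city_df), the seasonality flags, and the changepoint prior are illustrative assumptions, since the study reports only that Prophet's default structure was used with adjustments to seasonalities and changepoints.

import pandas as pd
from prophet import Prophet

# Prophet expects a two-column frame: 'ds' (timestamp) and 'y' (hourly demand).
history = city_df.reset_index().rename(columns={"timestamp": "ds", "demand_mw": "y"})

model = Prophet(
    daily_seasonality=True,        # diurnal demand cycle
    weekly_seasonality=True,       # weekday/weekend pattern
    yearly_seasonality=True,       # seasonal heating/cooling trend
    changepoint_prior_scale=0.05,  # trend flexibility; adjusted per city in practice
)
model.fit(history)

future = model.make_future_dataframe(periods=24, freq="h")  # extend 24 h ahead
forecast = model.predict(future)[["ds", "yhat"]]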
LSTM and GRU models capture temporal dependencies in time-series data, making them suitable for energy demand prediction. These models were implemented using the TensorFlow Keras library. Both architectures included a recurrent layer with 50 units, followed by a dense layer to produce the predicted energy demand. A dropout regularisation of 20% was applied to prevent overfitting. Sequences of length 24 were prepared for both the feature set and the target variable to model temporal dependencies. The models were trained using the Adam optimiser with a learning rate of 0.001 in line with previous related work [99,100]. Training was conducted for 30 epochs with a batch size of 32, using mean squared error (MSE) as the loss function. Early stopping was applied with a patience of three epochs to restore the best weights, and 10% of the training data was used for validation. The model with the lowest validation loss was saved. LSTM was selected for its ability to model long-range temporal dependencies via gated memory mechanisms, while GRU was included to assess whether a reduced-parameter recurrent architecture with faster convergence can deliver comparable forecasting accuracy under identical data and training conditions. Research has shown that LSTMs and GRUs outperform traditional models in urban energy demand forecasting by accounting for non-linear interactions between weather, socio-economic factors, and historical consumption data [101,102].
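The recurrent architectures described above can be reproduced with a short Keras sketch; placing dropout as a separate layer (rather than inside the recurrent cell) is an assumption made for illustration.

import tensorflow as tf

def build_recurrent_model(seq_len: int, n_features: int, cell: str = "lstm") -> tf.keras.Model:
    # One 50-unit recurrent layer, 20% dropout, and a dense output head.
    layer = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_features)),
        layer(50),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

early_stop = tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)
# model.fit(X_train, y_train, epochs=30, batch_size=32,
#           validation_split=0.1, callbacks=[early_stop])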
TCN was implemented using the Keras library and adapted for sequential data using dilated convolutions to capture long-range temporal dependencies. The model architecture consisted of multiple convolutional layers, with dilation factors increasing in each layer to capture dependencies over various time scales. Unlike recurrent architectures, TCNs model long temporal sequences using causal and dilated convolutions without relying on internal memory states, while supporting full parallelisation during training. TCNs are particularly efficient due to their ability to be parallelised, enabling faster training while maintaining performance. This makes them suitable for complex urban energy systems with high temporal granularity data [103,104]. TCN was included to provide a non-recurrent deep learning benchmark for direct comparison with LSTM and GRU under identical forecasting horizons. Previous studies have demonstrated TCNs’ effectiveness in modelling dynamic energy consumption in environments with irregular fluctuations [105].
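As a non-authoritative sketch of such an architecture, the stack below uses causal, dilated Conv1D layers whose dilation doubles per layer; the filter count, kernel size, and number of layers are assumptions, chosen only so the receptive field covers the 24 h input window.

import tensorflow as tf

def build_tcn(seq_len: int, n_features: int, filters: int = 64,
              kernel_size: int = 3, dilations=(1, 2, 4, 8)) -> tf.keras.Model:
    inputs = tf.keras.layers.Input(shape=(seq_len, n_features))
    x = inputs
    for d in dilations:
        # Causal padding prevents future time steps from leaking into the prediction.
        x = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                                   dilation_rate=d, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model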
To ensure fair comparison across deep learning architectures, all models were trained using a consistent set of hyperparameters rather than exhaustive hyperparameter optimisation. The selected dropout rate (20%), batch size (32), learning rate (0.001), input sequence length (24), and Adam optimiser follow common practice in urban- and building-scale energy forecasting [99,100,101,102,106]. This strategy ensures that observed performance differences primarily reflect architectural characteristics rather than tuning advantages, consistent with methodological guidance on transparent and reproducible model comparison [15,18]. Consequently, variations in predictive performance across cities are interpreted as reflecting differences in climatic regimes and demand dynamics, rather than suboptimal model calibration. While city-specific hyperparameter optimisation may improve absolute accuracy for individual locations, it would undermine the study’s objective of conducting a transparent, reproducible, and architecture-focused comparison. Future work could extend this framework by exploring adaptive or climate-aware hyperparameter tuning strategies to further enhance local forecasting performance.
To preserve the temporal structure of the data, each city’s dataset was split into training (80%), validation (10%), and test (10%) sets, following established practice in time-series forecasting [107,108]. Test sets consisted exclusively of future observations at 1 h, 6 h, 12 h, and 24 h forecasting horizons, enabling evaluation across multiple temporal scales. Model robustness was further assessed using five-fold rolling-origin time-series cross-validation, where each validation fold occurred chronologically after its corresponding training window. This validation strategy reflects realistic forecasting conditions and avoids information leakage associated with randomised resampling, which is inappropriate for sequential data. Together, these procedures ensure temporal consistency and statistical validity in model performance evaluation.
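A sketch of this evaluation protocol is given below, assuming a chronologically ordered array per city and using scikit-learn's TimeSeriesSplit as one possible implementation of rolling-origin validation.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n = len(city_array)                      # chronologically ordered observations for one city
train_end, val_end = int(0.8 * n), int(0.9 * n)
train = city_array[:train_end]
val = city_array[train_end:val_end]
test = city_array[val_end:]              # strictly future observations for all horizons

# Five-fold rolling-origin cross-validation within the training period:
# each validation fold occurs chronologically after its training window.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(train):
    assert train_idx.max() < val_idx.min()   # no information leakage
    # fit the model on train[train_idx] and score it on train[val_idx] here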

3.4. Model Evaluation

Forecasting performance was evaluated using multiple metrics to provide a comprehensive assessment of model accuracy across different cities and horizons. The following standard metrics were considered [109]:
  • Mean Absolute Error (MAE): Measures the average magnitude of the errors in the predictions, providing a direct interpretation of the typical deviation between predicted and observed values.
  • Mean Squared Error (MSE): Calculates the average of the squared differences between predicted and observed values, giving higher weight to larger errors.
  • Root Mean Squared Error (RMSE): The square root of MSE, which expresses the error in the same units as the original observations and allows easier interpretation.
  • Coefficient of Determination (R2) Represents the proportion of variance in the observed data explained by the model. Values closer to 1 indicate better explanatory power.
  • Mean Absolute Percentage Error (MAPE): Measures the average percentage deviation of predicted values from actual observations, providing an interpretable scale-independent metric for comparing performance across cities with varying energy demand levels.
The average performance across the five cross-validation folds was calculated for each model, horizon, and city. This allowed for a detailed comparison of forecasting accuracy across multiple metrics and facilitated identification of the most effective models for specific cities and city clusters (see Section 3.5).
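For completeness, a small NumPy sketch of these metrics as they are conventionally defined (the study does not publish its evaluation code, so this is an assumed implementation):

import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                   # Mean Absolute Error
    mse = np.mean(err ** 2)                      # Mean Squared Error
    rmse = np.sqrt(mse)                          # Root Mean Squared Error
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Coefficient of Determination
    mape = np.mean(np.abs(err / y_true)) * 100   # MAPE (%), assumes no zero observations
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}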

3.5. Performance Across Cities

To group cities with similar weather patterns, the K-means clustering algorithm [110] was applied using only the raw weather variables. To determine the optimal number of clusters, a range of k values (2 to 10) were tested. The Elbow Method was used by plotting the within-cluster sum of squares (WCSS) against k, with the optimal number of clusters identified at the point where additional clusters resulted in minimal reduction in WCSS. The Silhouette Score was also calculated for each k to validate cluster cohesion and separation, confirming the choice of four clusters. This clustering allowed cities with similar climatic conditions to be grouped together, providing a basis for subsequent comparisons of model performance across clusters.
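The cluster-selection procedure can be sketched as follows; standardising the weather variables before K-means and the random seed are assumptions made for the example, not settings reported in the study.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# city_weather: one row per city, one column per raw weather variable (assumed layout).
X = StandardScaler().fit_transform(city_weather)

wcss, silhouettes = {}, {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss[k] = km.inertia_                           # within-cluster sum of squares (Elbow Method)
    silhouettes[k] = silhouette_score(X, km.labels_)

labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)  # chosen k = 4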

3.6. Significance Testing

To further examine whether the observed patterns in model performance are consistent across cities within the same cluster, the Kruskal–Wallis test [111] was applied to the average model performance values for each city, model, and forecast horizon. This is a non-parametric statistical method used to assess whether there are significant differences in distributions between two or more independent groups. In this context, it evaluates whether model performance differs significantly among cities within a single cluster. The null hypothesis states that the average model performance across cities within a cluster is equal, while the alternative hypothesis posits that at least one city exhibits a significantly different average performance. A p-value below 0.05 indicates a statistically significant difference in model performance between cities.
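In code, this test reduces to a single call to scipy.stats.kruskal; the scores_by_city mapping below is a hypothetical structure holding each city's average performance values.

from scipy.stats import kruskal

# One sample of average performance values per city in the cluster.
samples = [scores_by_city[c] for c in cluster_cities]
h_stat, p_value = kruskal(*samples)
significant = p_value < 0.05   # reject equal average performance across cities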

3.7. Clustering of Climate Variables

To better understand energy demand fluctuations across cities, as previously discussed for Figure 2, the coefficient of variation (CoV) was computed for each climate variable in each city. CoV is a useful statistical measure that expresses the ratio of the standard deviation to the mean of a variable, providing insights into the relative variability of each climate variable. A higher CoV indicates greater variability in the climate conditions, while a lower CoV suggests more consistent climate patterns. For this analysis, CoV was calculated using hourly climate data (1 h horizon) rather than aggregated data at longer intervals (6 h, 12 h, or 24 h). This approach was chosen because short-term fluctuations in climate variables such as temperature, humidity, and precipitation can have the most immediate impact on energy demand, particularly for heating, cooling, and other climate-sensitive systems. Using longer horizons would also smooth out these high-frequency variations, potentially underestimating the influence of climate variability on energy consumption. Moreover, since the energy demand data used in this study are available at hourly resolution, using the 1 h CoV ensures consistency and comparability between climate inputs and energy responses, allowing for a more precise analysis of the relationship between climate variability and model performance.
The clustering was based exclusively on the coefficient of variation (CoV) of meteorological variables, specifically temperature, dew point, relative humidity, precipitation, wind speed, and surface pressure (see Table A3 in Appendix C). These variables were selected due to their well-established influence on heating and cooling demand and their role as exogenous drivers of short-term energy consumption. Energy-demand variability was deliberately not included in the clustering process. Including demand-derived metrics would risk circularity, as the resulting clusters would then be partially defined by the target variable used to evaluate model performance. By clustering cities solely on exogenous climatic variability, the analysis isolates climate regime effects and enables a clearer assessment of how forecasting accuracy responds to environmental heterogeneity rather than endogenous demand patterns.
The computed CoV values were then subjected to hierarchical clustering to group cities based on the similarity of their climate variability. This clustering helps identify patterns and associations between climate conditions and energy demand fluctuations, highlighting cities that experience similar levels of climatic instability or consistency. The resulting clusters were compared to the energy demand patterns observed in the box plots in Figure 2, enabling a deeper understanding of how climate variability influences energy consumption and the performance of different forecasting models.
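A compact sketch of the CoV computation and the subsequent hierarchical clustering is given below; the Ward linkage and the cut into three clusters are assumptions consistent with the three groups reported in Section 4.4, not settings stated in the methods.

import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# hourly_weather: hourly records with a 'city' column and one column per climate variable.
cov = hourly_weather.groupby("city").agg(lambda s: s.std() / s.mean())  # CoV per city and variable

# Hierarchical clustering of the per-city CoV profiles.
Z = linkage(cov.values, method="ward")
cov["cluster"] = fcluster(Z, t=3, criterion="maxclust")  # three climate-variability groups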

4. Results

This section presents the results of the model performance analysis, offering insights into the effectiveness of different forecasting models. Performance is examined both at the individual city level (Section 4.1) and within clusters based on shared climatic conditions (Section 4.2), providing an overview of how each model performs under varying conditions. In addition, Section 4.3 evaluates whether the observed differences in model performance across cities within each cluster are statistically significant, helping to assess the extent to which local conditions influence forecasting accuracy. Finally, the results of the climate-variability (coefficient of variation) analysis are presented in Section 4.4, and Section 4.5 examines its relationship with forecasting performance.

4.1. Model Performance for Individual Cities

Table A2 in Appendix B provides performance of the four forecasting methods, Prophet, GRU, LSTM, and TCN, across the studied cities and forecast horizons. Overall, Prophet demonstrates consistently low errors and high R2 values, indicating strong and reliable forecasting accuracy compared with deep learning models. In cities such as Baltimore, Chicago, and New York City, Prophet achieves particularly low MAE and MAPE values across all horizons, including 6 h, 12 h, and 24 h forecasts, whereas GRU, LSTM, and TCN exhibit steadily increasing errors as the horizon extends. Deep learning models generally perform well for short-term horizons of 1–6 h, but their accuracy declines significantly at 12 and 24 h horizons, while Prophet maintains robust performance, reflecting the effectiveness of its decomposition-based approach in capturing both short- and long-term temporal trends.
Smaller or mid-sized cities, including Tallahassee, Omaha, El Paso, and Tucson, show more variability in model performance across horizons. In Tallahassee, Prophet consistently outperforms deep learning models, particularly at longer horizons; for example, at the 24 h horizon, Prophet achieves an RMSE of 28.34 and R2 of 0.9941, whereas the GRU model reaches an RMSE of 198.67 and R2 of 0.8625. In Omaha, LSTM and TCN occasionally produce negative R2 values for 1- and 24 h forecasts, indicating poor predictive capability. The poor performance of the GRU model in cities such as Omaha may be in part linked to high short-term climate variability and irregular load dynamics. Omaha belongs to a cluster characterised by pronounced temperature fluctuations (Section 4.4), which introduce non-stationary demand patterns that challenge recurrent architectures relying on short temporal dependencies. Previous studies show that GRU and LSTM models are sensitive to demand volatility and weak temporal regularity, leading to rapid error accumulation in heterogeneous urban settings [101,112,113]. In contrast, Prophet maintains stable accuracy across all horizons. Other mid-sized cities show moderate errors for GRU, LSTM, and TCN that increase with horizon length, whereas Prophet remains comparatively stable, although minor variations in short-horizon R2 values, particularly at 6 h, are observed in cities such as San Diego, reflecting local temporal fluctuations.
In larger urban centres, including Houston, Los Angeles, and New York City, GRU and LSTM provide reasonable short-term forecasts, with 6 h MAE and RMSE values comparable to Prophet, but their performance declines noticeably at 12 and 24 h horizons. TCN generally follows similar trends but occasionally produces slightly higher errors at intermediate horizons of 12 h, suggesting some limitations in capturing complex temporal dependencies over extended periods. Prophet consistently delivers more accurate forecasts across all metrics and horizons, demonstrating robustness to both temporal scale and urban variability. Overall, Prophet emerges as the most reliable model for multi-horizon forecasting at the city level, while GRU, LSTM, and TCN seem to be better suited for short-term predictions, especially in highly volatile or large-scale urban datasets.

4.2. Model Performance for City Clusters

The results for the city clusters are shown in Table 2. Information for Cluster 4 is not included in the table, as it contains only one city (Houston); therefore, the cluster-level results would be identical to those of that individual city. Cluster 1, comprising Baltimore, New York City, Philadelphia, and San Antonio, shows the strongest overall predictive performance across all models. The deep learning models (GRU, LSTM, and TCN) achieve low error metrics (MAE and RMSE) and high R2 values for shorter horizons, particularly at 1 h, where R2 exceeds 0.97. However, as the forecast horizon increases to 24 h, their accuracy gradually declines, with RMSE roughly doubling and R2 values decreasing to around 0.82. Prophet, by contrast, maintains high stability, with extremely low error values and R2 consistently above 0.99 across all horizons.
Cluster 2 includes Chicago and Los Angeles, and presents more challenging temporal dynamics, reflected in higher errors and lower R2 compared to Cluster 1. The neural models perform well at shorter horizons, with R2 values above 0.9 for 1 h and 6 h, but their performance declines markedly beyond 12 h. Prophet again outperforms all deep learning models, maintaining lower error levels and higher R2 values (around 0.97–0.98) across all horizons.
Cluster 3, which contains El Paso, Omaha, Portland, San Diego, Seattle, Tallahassee, Tampa, and Tucson, shows the weakest predictive performance overall. The deep learning models produce substantially higher MAE and RMSE values, and R2 values frequently fall below 0.8, indicating a lower degree of predictability. This cluster likely exhibits greater temporal and spatial variability, making it more difficult for neural networks to learn consistent patterns. Prophet, however, remains relatively robust, with lower errors and R2 values consistently around 0.9 or higher, indicating that it captures the dominant trends even in less predictable contexts.
Across all clusters, a clear pattern emerges: as the forecast horizon increases, error metrics (MAE, MSE, RMSE, and MAPE) increase correspondingly, while R2 values decline. This trend confirms that model accuracy decreases as predictions extend further into the future. Among the models, Prophet consistently delivers the lowest error values and highest R2 across all clusters and horizons, followed by GRU and LSTM, which perform comparably but degrade more rapidly with longer prediction intervals. TCN performs similarly to GRU and LSTM in some clusters but exhibits greater variability across horizons and locations. These results indicate that Prophet offers the most robust and generalisable performance across diverse urban contexts. GRU and LSTM remain competitive for short-term forecasting tasks, particularly in clusters with stable temporal dynamics, while TCN appears less consistent across clusters. The performance differences across clusters highlight the influence of local urban characteristics on predictive accuracy, suggesting that cities with more regular temporal behaviour—such as those in Cluster 1—are inherently easier to model, whereas more heterogeneous clusters, such as Cluster 3, present greater forecasting challenges.

4.3. Within-Cluster Differences in Model Performance

The preceding analysis in Section 4.2 highlighted distinct differences in predictive performance between clusters, with Cluster 1 showing the highest overall accuracy and stability, Cluster 2 exhibiting moderate performance, and Cluster 3 demonstrating greater variability and weaker predictability. To further examine whether these observed patterns also reflect differences in performance within clusters, that is, between individual cities belonging to the same group, the Kruskal–Wallis test [111] was applied to the average model performance values for each city across all models and forecast horizons. These results are presented in Table 3. Similar to Section 4.2, Cluster 4 was excluded from this analysis as it contained only a single city (Houston), making the Kruskal–Wallis test inapplicable since it requires at least two groups for comparison.
Consistent with the earlier findings in Table 2, no statistically significant differences were observed for Cluster 1 across any metric or forecast horizon (all p > 0.138). This supports the conclusion that the models generalised well across cities such as Baltimore, New York City, Philadelphia, and San Antonio, capturing broadly similar temporal patterns despite local differences. Cluster 2 also showed relatively good performance across cities, with most p-values exceeding 0.14. However, significant differences in the coefficient of determination (R2) were identified at the 1 h (H = 5.33, p = 0.021) and 6 h (H = 4.08, p = 0.043) horizons, indicating some variation in the explanatory power of the models across cities such as Chicago and Los Angeles, particularly for short-term forecasts.
In contrast, Cluster 3 exhibited a higher degree of heterogeneity in model performance, highlighting the weaker predictability observed earlier. Statistically significant differences were identified for several metrics at the 1 h and 6 h horizons; for example, MAE (H = 15.01, p = 0.020), RMSE (H = 14.36, p = 0.026), and R2 (H = 17.22, p = 0.008). These findings suggest that cities within Cluster 3, which include El Paso, Portland, Seattle, and Tampa, display more diverse temporal behaviours, making it difficult for the models to generalise consistently. As forecast horizons lengthened, these differences diminished, with all p-values exceeding 0.05 at 12 and 24 h, implying convergence in long-term predictive patterns across cities. These results reinforce the descriptive analysis presented in Section 4.2. Model performance was consistent across cities in Clusters 1 and 2, indicating that urban areas with similar temporal structures support reliable model generalisation. In contrast, greater intra-cluster variability in Cluster 3 highlights the influence of urban heterogeneity on predictive accuracy, particularly at shorter forecast horizons.

4.4. Clustering Analysis of Climate Variables

Building upon Section 4.2, which examined model performance across city clusters based on energy demand characteristics, this section explores whether similar grouping patterns emerge when clustering cities by their climate variability (CV). The objective is to determine if climatic conditions help explain the observed differences in model performance. The clustering of CV results revealed three distinct groups of cities, as shown in Figure 3, with detailed CV values provided in Table A3 in Appendix C. Cluster 1 includes cities with moderate to high levels of temperature and humidity variability, such as Houston, Los Angeles, San Antonio, San Diego, Tallahassee, and Tampa. These cities typically experience warmer climates with substantial temperature variation, resulting in significant energy demands for cooling during hot periods. In contrast, Cluster 2, comprising Baltimore, Chicago, Omaha, Philadelphia, Portland, and Seattle, features more temperate climates with moderate temperature and humidity fluctuations. These cities tend to display more stable energy demand patterns with predictable seasonal variations driven by heating and cooling requirements. Lastly, Cluster 3, consisting of El Paso, New York City, and Tucson, exhibits the highest levels of temperature fluctuation and humidity variability, leading to more pronounced and erratic energy demand patterns driven by extreme weather events.
When comparing these climate-based clusters with the energy demand characteristics shown in Figure 2, a clear relationship emerges. Cities in clusters with greater climatic variability (Cluster 1 and Cluster 3) tend to exhibit higher energy demand fluctuations. These cities experience stronger seasonal shifts in temperature and humidity, directly influencing heating and cooling requirements, which in turn produce larger spikes in energy consumption. Conversely, cities in Cluster 2, with relatively stable climates, demonstrate more consistent energy usage patterns, as their demand primarily varies in line with predictable seasonal temperature changes. These findings highlight the critical role of climate variability in shaping urban energy demand: cities with higher climatic instability tend to require more adaptive energy systems, while those with more moderate climates can rely on steadier and more predictable consumption trends.
To maintain consistency across the comparative analyses, model performance is evaluated only at the 1 h forecast horizon. This ensures that the relationships between climate variability and forecasting accuracy are not confounded by horizon-dependent performance differences. Short-term (1 h) forecasts also provide the clearest view of how local climatic fluctuations influence immediate energy demand, making them the most suitable basis for this comparison. Linking the CV findings to the model performance results presented earlier, cities with higher climate variability (Clusters 1 and 3) generally correspond to lower model accuracy across all approaches, particularly for the neural models (TCN, GRU, and LSTM). This is consistent with the performance patterns described earlier, where models struggled in regions characterised by pronounced temperature and humidity fluctuations. The higher variability introduces uncertainty and short-term irregularities in energy demand, making it more difficult for models to capture stable temporal dependencies. Prophet, which performed best overall in the earlier analyses, also exhibited a modest decline in accuracy in these high-variability cities. Because Prophet decomposes each time series into additive trend and seasonal components, it performs best when the underlying patterns are regular and cyclical; in cities where abrupt or irregular climate-driven changes occur, these assumptions are partially violated, leading to increased forecast errors. Even under such conditions, however, Prophet generally remained more robust than the deep learning models, which are more sensitive to data noise and non-stationarity.
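To make this decomposition-based behaviour concrete, the sketch below fits Prophet to a synthetic hourly demand series with temperature as an extra regressor and issues a one-hour-ahead forecast; the synthetic data, seasonality settings, and use of a temperature regressor are illustrative assumptions rather than the configuration adopted in this study.

import numpy as np
import pandas as pd
from prophet import Prophet

# Synthetic hourly demand with a daily cycle and a temperature effect (placeholder data).
hours = pd.date_range("2024-01-01", periods=24 * 60, freq="h")
rng = np.random.default_rng(0)
temp = 10 + 8 * np.sin(2 * np.pi * hours.hour / 24) + rng.normal(0, 1, len(hours))
demand = 500 + 40 * np.sin(2 * np.pi * hours.hour / 24) + 5 * temp + rng.normal(0, 10, len(hours))
df = pd.DataFrame({"ds": hours, "y": demand, "temperature": temp})

m = Prophet(daily_seasonality=True, weekly_seasonality=True)
m.add_regressor("temperature")                        # illustrative extra regressor
m.fit(df)

future = m.make_future_dataframe(periods=1, freq="h")
future["temperature"] = np.append(temp, temp[-1])     # carry the last observed temperature forward
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(1))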
In contrast, cities within Cluster 2, that is, those with more stable climatic conditions, display superior model performance across all forecasting approaches. The predictability of their energy demand patterns allows both simpler models such as Prophet and more complex architectures such as GRU and LSTM to achieve higher accuracy. Cities such as Portland, Seattle, and Philadelphia, all within Cluster 2, consistently demonstrated stronger performance metrics at the 1 h forecast horizon. This reinforces the interpretation that stable climate conditions contribute significantly to improved forecasting outcomes.
Overall, these results indicate that climatic variability exerts a substantial influence on model performance. Cities with greater variability in temperature and humidity, such as those in Clusters 1 and 3, present greater forecasting challenges, while those with more stable climatic conditions, such as in Cluster 2, enable more reliable predictions. This linkage between environmental stability and forecasting accuracy underscores the importance of incorporating climate variability considerations into energy demand modelling.

4.5. Linking Model Characteristics to Forecasting Performance

The observed performance differences across models can be directly linked to their underlying architectural properties. Prophet consistently outperforms deep learning models at longer forecast horizons (12–24 h), particularly across climatically diverse city clusters. This behaviour reflects Prophet’s explicit decomposition of time series into trend and seasonal components, which provides structural stability when temporal autocorrelation weakens at longer horizons. In contrast, LSTM and GRU models demonstrate strong performance at short horizons (1–6 h), especially in clusters with relatively stable climate conditions. Their gated recurrent architectures are well suited to learning short-term temporal dependencies and non-linear interactions between weather variables and recent demand history. However, as the forecast horizon increases, error accumulation and the absence of explicit long-term trend constraints lead to rapid performance degradation, particularly in cities with high climatic variability.
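For illustration, a minimal gated recurrent forecaster of the kind discussed here can be written as follows; the layer sizes, the 24 h input window, and the training settings are assumptions for the sketch rather than the architectures evaluated in this study.

import tensorflow as tf

n_lags, n_features = 24, 5   # previous 24 h of demand plus four weather variables (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_lags, n_features)),
    tf.keras.layers.GRU(64),                  # swap for tf.keras.layers.LSTM(64) to compare architectures
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                 # next-hour demand
])
model.compile(optimizer="adam", loss="mae")
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, batch_size=64)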
The TCN exhibits intermediate behaviour. Its use of dilated causal convolutions enables efficient learning of longer temporal contexts than recurrent models, which explains its competitive short-horizon performance in some cities. Nevertheless, the lack of explicit seasonality modelling and its reliance on fixed receptive fields limit robustness under highly non-stationary demand regimes, resulting in less consistent performance across clusters.
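The dilated causal convolutions underlying the TCN can be sketched in the same framework; the simplified stack below omits the residual blocks of a full TCN, and the dilation rates and filter counts are assumptions.

import tensorflow as tf

n_lags, n_features = 24, 5
inputs = tf.keras.layers.Input(shape=(n_lags, n_features))
x = inputs
for dilation in (1, 2, 4, 8):                 # receptive field grows exponentially with depth
    x = tf.keras.layers.Conv1D(filters=32, kernel_size=3, padding="causal",
                               dilation_rate=dilation, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)         # next-hour demand
tcn = tf.keras.Model(inputs, outputs)
tcn.compile(optimizer="adam", loss="mae")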
Across all models, performance deterioration with increasing forecast horizon is more pronounced in clusters characterised by high climate variability, indicating that architectural sensitivity to non-stationarity plays a central role in forecasting reliability. These results suggest that models with explicit trend and seasonality representations are more robust to climatic heterogeneity, while deep learning architectures are better suited to short-term, stable demand regimes.

5. Discussion

The results presented in the previous section reveal systematic differences in forecasting performance across models, cities, and climatic contexts. This section interprets these findings by examining the factors that may contribute to the observed variations in model behaviour, including city-specific demand characteristics, climatic variability, and model architecture. Particular attention is given to short-term (1 h) forecasting, where performance differences are most pronounced and operational relevance is highest. Beyond interpreting current results, the discussion situates these findings within emerging challenges in urban energy planning, including climate non-stationarity, increasing system coupling, and the growing reliance on AI-driven decision support in complex urban energy systems.

5.1. Model Performance and City-Specific Challenges

The variability in model performance across cities emphasises the complexity of forecasting energy demand in heterogeneous urban environments. Cities with stable and predictable demand patterns, such as those in Cluster 1, show that specific models can excel when conditions are regular. In contrast, cities with more erratic or highly variable energy demand, often linked to pronounced climate variability, pose greater challenges for all models. The deep learning models (GRU, LSTM, and TCN) perform competitively at short horizons but degrade at longer ones, while Prophet demonstrates consistent robustness even in less predictable contexts. These patterns underscore the importance of tailoring model selection to city-specific characteristics, including climate, urban form, and underlying socio-economic activity, rather than relying on a one-size-fits-all approach.
The failure of GRU in specific cities such as Omaha highlights an important interaction between model architecture and urban demand characteristics. Cities with low aggregate demand, weak seasonality, and irregular load dynamics provide limited sequential signal for gated recurrent models to exploit. In such contexts, model performance is constrained not primarily by climate volatility or data quality but by the absence of stable temporal dependencies. This reinforces that deep recurrent architectures are not universally suitable for all urban settings and may perform poorly in cities where energy demand is dominated by stochastic or locally driven behaviour rather than consistent climatic forcing. The finding also has forward-looking implications: as electrification, decentralised generation, and behavioural heterogeneity increase, more cities may exhibit demand patterns that challenge conventional deep learning assumptions, underscoring the need for adaptable and context-aware forecasting strategies.

5.2. Cluster Analysis and the Role of Climatic Variables

Clustering cities based on climate variability provides further insights into the relationship between environmental conditions and model performance. Cities with more moderate and stable climate conditions, such as those in Cluster 2, exhibit more consistent energy demand patterns and generally higher forecasting accuracy across models. Conversely, cities with greater temperature and humidity fluctuations, as seen in Clusters 1 and 3, experience more erratic demand patterns, which challenge both deep learning models and Prophet. These findings highlight the critical role of climate variability in shaping short-term energy demand and forecasting performance. Accounting for these differences is essential when designing models, as cities with high climatic instability may benefit from adaptive forecasting approaches that can respond dynamically to rapid changes in environmental conditions.
From a future-planning perspective, this sensitivity to climate variability represents a critical challenge. As climate change accelerates, historical climate–demand relationships may become increasingly unreliable, undermining forecasting approaches trained on past observations. Urban energy planning will therefore require forecasting systems that explicitly account for non-stationary climate conditions and extreme events rather than relying on historical averages alone.

5.3. Weather Data Uncertainty and Model Robustness

Although meteorological variables are key predictors of urban energy demand, they are subject to measurement error, spatial representativeness issues, and short-term variability that introduce noise into the forecasting process. Hourly weather observations may not fully capture microclimatic effects within large urban areas, and small errors in temperature or humidity can propagate into demand predictions, particularly at short forecasting horizons. The results suggest that model sensitivity to such uncertainty varies by architecture. Prophet exhibits greater robustness under noisy climatic conditions due to its explicit decomposition into trend and seasonal components, which dampens the influence of short-term fluctuations. In contrast, deep learning models (LSTM, GRU, and TCN), which rely more heavily on recent input sequences, appear more sensitive to high-frequency variability in weather inputs. This sensitivity is particularly evident in city clusters characterised by high climatic variability, where deep learning performance deteriorates more rapidly with increasing forecast horizon. These findings indicate that uncertainty in meteorological predictors interacts with model structure and should be considered when selecting forecasting approaches for climate-volatile urban environments. Further, this sensitivity becomes increasingly problematic as urban energy systems integrate weather-dependent technologies such as heat pumps, distributed renewables, and flexible demand, where forecasting errors may propagate across coupled systems and markets.
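One simple way to probe this sensitivity, sketched below, is to perturb a single weather feature with Gaussian noise and compare forecast errors before and after; the noise level, the perturbed feature index, and the fitted model are placeholders rather than elements of the present analysis.

import numpy as np
from sklearn.metrics import mean_absolute_error

def mae_under_noise(model, X_test, y_test, feature_idx=0, sigma=0.5, seed=0):
    """Compare MAE on clean inputs vs. inputs whose chosen weather feature is perturbed."""
    rng = np.random.default_rng(seed)
    X_noisy = X_test.copy()                   # X_test: array of shape (n_samples, n_lags, n_features)
    X_noisy[..., feature_idx] += rng.normal(0.0, sigma, size=X_noisy[..., feature_idx].shape)
    clean_mae = mean_absolute_error(y_test, model.predict(X_test))
    noisy_mae = mean_absolute_error(y_test, model.predict(X_noisy))
    return clean_mae, noisy_mae

# baseline_mae, perturbed_mae = mae_under_noise(fitted_model, X_test, y_test, sigma=0.5)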

5.4. Implications of Forecast Horizons

The analysis focuses on short-term (1 h) forecasting, where accuracy is highest and operational relevance is greatest. At longer horizons, especially beyond 12–24 h, performance deteriorates for most models as errors accumulate and temporal dependencies weaken. This suggests that near-real-time forecasting currently offers the most reliable support for operational energy management. However, future urban energy planning increasingly requires coordination across multiple temporal scales, from real-time grid operation to medium- and long-term infrastructure investment. Bridging short-term forecasting with longer-term planning models remains a key challenge, particularly as cities move towards integrated, multi-energy systems that require coordinated operation across electricity, storage, and emerging energy carriers such as hydrogen.
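One mechanism behind such horizon-dependent degradation is recursive forecasting, in which a one-step model is rolled forward and its own predictions are fed back as inputs; the sketch below illustrates this under the assumption of a Keras-style one-step model whose first input column holds demand.

import numpy as np

def recursive_forecast(model, last_window, horizon=24):
    """Roll a one-step-ahead model forward, feeding its predictions back as inputs."""
    window = last_window.copy()               # shape: (n_lags, n_features); demand assumed in column 0
    preds = []
    for _ in range(horizon):
        y_hat = float(model.predict(window[np.newaxis, ...], verbose=0)[0, 0])
        preds.append(y_hat)
        next_step = window[-1].copy()         # naively carry forward the latest weather values
        next_step[0] = y_hat                  # replace the demand entry with the prediction
        window = np.vstack([window[1:], next_step])
    return np.array(preds)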

5.5. Broader Implications for Future Urban Energy Systems

Beyond model comparison, the findings highlight structural challenges for future urban energy systems. Increasing electrification, sector coupling, and the integration of new energy carriers are expected to significantly alter urban demand dynamics. In such settings, forecasting accuracy alone is insufficient; models must also be interpretable, transferable across cities, and robust to structural change. Recent research on AI-driven urban retrofit and integrated electricity–hydrogen markets [114] also emphasises that forecasting uncertainty, model scalability, and cross-sector coordination are major barriers to real-world deployment. While this study focuses on electricity demand, the results underscore a shared challenge: future urban energy planning will depend on forecasting frameworks capable of supporting complex, coupled systems under climate and demand uncertainty.

5.6. Methodological and Data Limitations

Despite the strengths of the proposed climate-aware, multi-city evaluation framework, several methodological limitations should be acknowledged. First, a uniform set of hyperparameters was intentionally applied across all deep learning models to ensure a transparent and architecture-focused comparison. While this enhances reproducibility and fairness, it may not yield optimal performance for every city, particularly given the wide variation in climatic regimes and demand volatility. In addition, although rolling-origin time-series cross-validation was employed to improve robustness, the 80–10–10 train–validation–test split represents a pragmatic rather than theoretically optimal choice. Alternative window lengths or adaptive splitting strategies could influence performance estimates, especially in cities with shorter or more volatile time series.
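For clarity, an expanding-window (rolling-origin) evaluation of the kind referred to above can be sketched as follows; the fold count, step size, and initial training fraction are illustrative assumptions rather than the exact protocol adopted here.

import numpy as np

def rolling_origin_splits(n_samples, initial_train=0.8, n_folds=5, step=24):
    """Yield expanding-window (train, test) index arrays for an hourly series."""
    start = int(n_samples * initial_train)
    for k in range(n_folds):
        train_end = start + k * step
        test_end = min(train_end + step, n_samples)
        if train_end >= n_samples:
            break
        yield np.arange(0, train_end), np.arange(train_end, test_end)

# for train_idx, test_idx in rolling_origin_splits(len(y_series)):
#     fit the model on y_series[train_idx] and evaluate on y_series[test_idx]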
Second, the analysis relies on historical energy demand and a limited set of meteorological variables at hourly resolution. Other potentially important drivers, such as detailed building characteristics, socio-economic indicators, behavioural factors, or real-time pricing signals, were not included due to data availability constraints. Moreover, the study focuses on short- to medium-term forecasting horizons (up to 24 h), which are most relevant for operational planning but may not generalise to longer-term forecasting tasks. These limitations point to clear directions for future work, including climate-aware hyperparameter optimisation, richer multi-source data integration, and the extension of the framework to longer horizons and alternative validation strategies.

5.7. Policy Implications for Regional Planning and Energy Management

The findings provide actionable insights for policymakers seeking to optimise urban energy systems. By identifying clusters of cities with similar climatic and temporal characteristics, governments and utilities can develop context-specific forecasting strategies and allocate resources more efficiently. Cities with stable energy demand can benefit from standardised forecasting protocols, while cities facing higher variability may require investment in adaptive infrastructures, such as smart grids, energy storage, and real-time monitoring systems. More broadly, integrating data-driven forecasting into regional and long-term planning frameworks will be essential as cities transition towards low-carbon, highly interconnected energy systems. Aligning short-term operational forecasting with strategic planning can support resilience, reduce systemic risk, and enable more effective responses to climate variability and future energy transitions.

6. Conclusions

This study examined the performance of multiple forecasting models for urban energy demand across a diverse set of cities and climatic contexts. The results demonstrate that forecasting accuracy varies substantially across both cities and models, underscoring the importance of selecting approaches that are appropriate for local demand characteristics and climatic conditions. Models with explicit representations of trend and seasonality, such as Prophet, consistently exhibit robust performance across forecast horizons and urban contexts, particularly in cities with stable and predictable demand patterns. In contrast, deep learning architectures, including GRU, LSTM, and TCN, tend to perform more effectively at short forecasting horizons, especially in environments with relatively stable temporal dynamics, but their performance degrades more rapidly as forecast horizons increase or climatic variability intensifies.
The analysis further highlights the role of climatic variability in shaping forecasting performance. Cities characterised by pronounced temperature and humidity fluctuations generally exhibit higher prediction errors across all models, reflecting the challenges posed by non-stationary and irregular demand patterns. These findings indicate that environmental conditions are a key determinant of forecasting reliability and should be explicitly considered when developing and deploying urban energy forecasting frameworks. As climate change continues to increase the frequency and intensity of extreme weather events, the ability of models to remain robust under evolving climatic regimes will become increasingly important.
Beyond methodological insights, the results have practical implications for urban planners, policymakers, and energy system operators. Forecasting strategies tailored to local climatic and demand characteristics can support more reliable short-term operational planning, improve resource allocation, and enhance the resilience of urban energy systems. The cluster-based perspective adopted in this study further suggests opportunities for regional coordination, whereby cities with similar climatic and temporal characteristics may benefit from shared forecasting strategies and decision-support tools.
Several directions for future research emerge directly from these findings. First, hybrid modelling approaches that combine the structural interpretability of decomposition-based models (e.g., Prophet) with the adaptive learning capacity of deep neural networks could offer improved performance across both stable and highly variable urban contexts. Second, transfer learning within climatically similar city clusters represents a promising strategy for improving cross-city generalisation, particularly for cities with limited historical data or high demand volatility. Third, deeper or more expressive neural architectures may be beneficial for large metropolitan areas with complex and high-volume demand patterns, whereas smaller or less regular cities may benefit more from simpler or regularised models that prioritise robustness over representational complexity.
In addition, while a uniform set of hyperparameters was intentionally adopted in this study to enable transparent and architecture-focused comparisons, future work could explore adaptive strategies such as regime-specific hyperparameter optimisation or Bayesian optimisation tailored to climatic and demand regimes. Such approaches would help disentangle intrinsic model capability from tuning effects while improving local forecasting accuracy. Further extensions could also incorporate richer data sources, including building characteristics, socio-economic indicators, and real-time behavioural signals, as well as examine longer forecasting horizons relevant to strategic planning.
Overall, this study demonstrates that effective urban energy forecasting requires context-sensitive model selection informed by climatic variability, demand dynamics, and forecasting horizon. By integrating hybrid modelling strategies, transfer learning, and adaptive optimisation into future forecasting frameworks, urban energy systems can be supported with more accurate, generalisable, and resilient predictive tools.

Author Contributions

Conceptualization, R.K., S.S., A.D., R.M., O.G. and A.C.; Methodology, A.T., R.K., S.S., A.D. and R.M.; Formal analysis, A.T., R.K., S.S., A.D. and R.M.; Investigation, A.T., R.K., S.S., A.D. and R.M.; Resources, R.M.; Data curation, A.T., R.K., S.S., A.D. and R.M.; Writing—original draft, A.T., R.K., S.S., A.D., R.M., O.G. and A.C.; Writing—review & editing, A.T., R.K., S.S., A.D., R.M., O.G. and A.C.; Visualization, A.T., R.K., S.S., A.D., R.M. and O.G.; Supervision, R.M., O.G. and A.C.; Project administration, A.T., R.K., S.S., A.D. and R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

These data were derived from the following resources available in the public domain: [Energy data—U.S. Energy Information Administration—https://www.eia.gov/electricity/gridmonitor/dashboard/electric_overview/US48/US48; Weather data—Meteostat database—https://meteostat.net/en/], accessed on 20 August 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Geographic Differences in Cities

Table A1. Geographic location, climate characteristics, and key geographic features of the cities included in the analysis.
City | Location | Climate | Key Geographic Features
Baltimore | East Coast, along the Patapsco River | Temperate climate, mild winters, hot humid summers | Waterfront, historical port city
Chicago | Shore of Lake Michigan | Humid continental, cold winters, hot summers | Dense urban core, major economic and cultural hub
El Paso | West Texas, near the US-Mexico border | Hot desert, very hot summers, mild winters | Desert region, cultural mix due to border proximity
Houston | Gulf Coast | Humid subtropical, hot summers, mild winters | Sprawling urban landscape, oil and gas industry
Los Angeles | Pacific Coast | Mediterranean, mild wet winters, hot dry summers | Coastal, sprawling urban environment, diverse population
New York City | East Coast, on the Hudson River | Humid subtropical, cold winters, hot summers | Dense urban fabric, major global financial and cultural center
Omaha | Great Plains | Continental, cold winters, hot summers | Mid-sized city, agricultural and transportation history
Philadelphia | East Coast | Humid subtropical, hot humid summers, cold winters | Historical city, birthplace of American independence
Portland | Pacific Northwest | Temperate oceanic, wet mild winters, dry warm summers | Surrounded by natural landscapes, green spaces, environmental awareness
San Antonio | Central Texas | Hot semi-arid, hot summers, mild winters | Historical sites (e.g., Alamo), growing tech and military presence
San Diego | Pacific Coast | Mediterranean, mild wet winters, warm dry summers | Beaches, military presence, biotechnology sector
Seattle | Pacific Northwest | Temperate oceanic, wet mild winters, cool summers | Tech hub, surrounded by water, mountains, and forests
Tallahassee | Florida Panhandle | Humid subtropical, hot humid summers, mild winters | Mix of urban and rural landscapes, state capital
Tampa | Gulf Coast | Humid subtropical, hot humid summers, mild winters | Gulf Coast location, vibrant tourism sector
Tucson | Sonoran Desert | Hot desert, very hot summers, mild winters | Desert landscapes, surrounded by mountains

Appendix B. City Performance

Table A2. Forecasting performance by city, model, forecasting horizon, and evaluation metrics.
City | Model | Horizon (h) | MAE | MSE | RMSE | R2 | MAPE
BaltimoreGRU188.478614,690.1486120.73990.97452.5452
BaltimoreGRU6171.547660,469.7216244.48700.89534.7924
BaltimoreGRU12216.456992,818.8963303.34550.84035.9993
BaltimoreGRU24254.4262122,873.0359349.51690.78837.1577
BaltimoreLSTM1101.109117,898.7000133.61330.96882.9368
BaltimoreLSTM6184.928565,262.6987254.86490.88695.2260
BaltimoreLSTM12223.310795,815.1164308.75760.83466.2756
BaltimoreLSTM24265.6068129,445.3420358.81360.77697.5113
BaltimoreProphet116.8647511.204822.60980.99880.5033
BaltimoreProphet636.96732267.737547.62080.99641.2099
BaltimoreProphet1259.22294514.792167.19220.98771.4903
BaltimoreProphet2412.0533195.711113.98970.99940.3363
BaltimoreTCN197.237117,544.9835131.50250.96992.7376
BaltimoreTCN6183.830267,873.9032259.51520.88205.1792
BaltimoreTCN12221.146497,361.6977311.07280.83116.1941
BaltimoreTCN24257.9645126,072.5177354.40560.78237.2184
ChicagoGRU1411.9057710,790.8828730.90770.87463.8183
ChicagoGRU6522.7851950,437.3204900.77560.82234.7208
ChicagoGRU12646.09381,216,845.00491046.82830.76565.8427
ChicagoGRU24750.67691,492,227.21401176.01570.70776.8169
ChicagoLSTM1396.2622790,257.1815762.40750.86133.6013
ChicagoLSTM6541.97601,073,265.1010950.02620.80154.8691
ChicagoLSTM12652.48301,213,519.56751049.74660.76495.9291
ChicagoLSTM24744.11711,549,057.40261198.67110.69656.6916
ChicagoProphet1216.432765,043.4487255.03620.96671.9727
ChicagoProphet6302.1496102,524.5160320.19450.89433.2633
ChicagoProphet12380.7706240,835.4790490.74990.94562.8188
ChicagoProphet24230.1452125,196.5957353.83130.97091.7634
ChicagoTCN1489.43701,185,090.8360911.19780.79574.5444
ChicagoTCN6654.37321,545,219.81341124.28020.71855.9340
ChicagoTCN12768.88091,622,442.78011214.00350.68736.9259
ChicagoTCN24815.40841,911,625.91071320.15340.63227.3344
El PasoGRU156.23618237.574688.13320.90265.7367
El PasoGRU664.378910,226.545899.42900.88106.5026
El PasoGRU1266.977511,318.2860104.37060.86766.7730
El PasoGRU2477.581414,502.6749118.98320.83207.6801
El PasoLSTM169.678810,765.3048101.90330.87657.3847
El PasoLSTM676.005112,432.8817110.66810.85847.9864
El PasoLSTM1281.771214,279.8511118.61870.83818.4867
El PasoLSTM2489.080217,219.6905130.11180.80349.1766
El PasoProphet116.6351571.111023.89790.98301.7502
El PasoProphet632.16781579.710739.74560.91503.5299
El PasoProphet1247.08592997.897154.75310.98274.1459
El PasoProphet2420.7198639.942525.29710.98452.1519
El PasoTCN159.84269764.633994.90260.88476.1376
El PasoTCN673.193713,165.6902113.14040.84727.5571
El PasoTCN1282.660215,549.2825122.43730.82068.5019
El PasoTCN2493.923819,376.3633137.86360.77599.5898
HoustonGRU1331.5139219,891.2710462.47590.97342.5519
HoustonGRU6607.8856743,561.5799858.88040.90694.6952
HoustonGRU12721.18681,040,957.42661014.70680.86805.6419
HoustonGRU24813.95391,287,635.62391127.81760.83786.4065
HoustonLSTM1338.0176205,037.3653451.40080.97362.6931
HoustonLSTM6608.9744719,818.4786843.17740.90754.7661
HoustonLSTM12756.84251,076,929.77181032.19270.86285.9916
HoustonLSTM24798.26941,223,494.52381099.80970.84326.3466
HoustonProphet174.93497922.925089.01080.99820.6068
HoustonProphet678.287110,074.6494100.37260.99580.6570
HoustonProphet12135.530326,278.3719162.10610.99700.9045
HoustonProphet2437.44701966.771944.34830.99960.2860
HoustonTCN1287.3305167,616.7009402.53360.97812.2747
HoustonTCN6565.2385689,801.6394819.63670.91044.4444
HoustonTCN12697.4020987,720.1023984.43060.87235.5042
HoustonTCN24777.93441,208,719.12281089.37100.84436.2027
San DiegoGRU1287.3305167,616.7009402.53360.97812.2747
San DiegoGRU6565.2385689,801.6394819.63670.91044.4444
San DiegoGRU12697.4020987,720.1023984.43060.87235.5042
San DiegoGRU24777.93441,208,719.12281089.37100.84436.2027
San DiegoLSTM193.879717,313.6341130.85290.92534.9417
San DiegoLSTM6131.467735,510.5746187.39000.84746.9734
San DiegoLSTM12134.666136,960.2088191.84740.84077.2015
San DiegoLSTM24140.170240,967.1217202.03920.82407.3749
San DiegoProphet155.73996751.629682.16830.72322.5565
San DiegoProphet659.99334967.777570.48250.39513.0492
San DiegoProphet1257.748310,554.9469102.73730.81883.4295
San DiegoProphet2429.77971196.970734.59730.97061.3249
San DiegoTCN188.216715,510.0581123.72400.93334.5919
San DiegoTCN6121.001631,945.1841178.13730.86256.3700
San DiegoTCN12124.156133,359.8352182.30830.85606.5187
San DiegoTCN24139.353741,248.7149202.64460.82267.3410
SeattleGRU142.19338403.624887.56210.82673.9526
SeattleGRU645.71748989.063191.03840.81424.2440
SeattleGRU1252.937810,224.375198.03630.78734.9264
SeattleGRU2458.115711,280.7365103.85910.76395.4098
SeattleLSTM147.39689126.383192.86460.80974.4371
SeattleLSTM652.01639795.662696.56820.79524.9043
SeattleLSTM1258.193310,880.3089101.76370.77225.5820
SeattleLSTM2460.753211,770.3553106.83500.75215.6413
SeattleProphet144.59262633.279351.31550.94733.4631
SeattleProphet628.11441166.060334.14760.97402.4748
SeattleProphet1220.5678697.211226.40480.97592.4150
SeattleProphet2434.49942026.549645.01720.94323.0491
SeattleTCN145.45019343.507493.77210.80534.2203
SeattleTCN652.041110,508.251199.88410.78074.8083
SeattleTCN1257.778111,694.4309106.54820.75385.3545
SeattleTCN2463.506612,805.7487111.78560.72935.8945
TallahasseeGRU113.7545358.801618.81870.94634.4417
TallahasseeGRU621.0256874.350029.40650.87046.7393
TallahasseeGRU1226.24691329.925336.20820.80498.5227
TallahasseeGRU2428.94371669.461640.60300.75199.4233
TallahasseeLSTM113.7034348.010918.59080.94864.5107
TallahasseeLSTM621.4450887.308629.61490.87036.8613
TallahasseeLSTM1225.92731246.108335.07590.81588.6454
TallahasseeLSTM2429.69221682.576540.82750.75009.7838
TallahasseeProphet17.1543110.542010.51390.92382.4172
TallahasseeProphet66.378248.17386.94070.95902.4493
TallahasseeProphet127.013377.73838.81690.97611.8121
TallahasseeProphet245.018647.45596.88880.98101.4878
TallahasseeTCN112.6412311.071917.58580.95444.1127
TallahasseeTCN620.9350876.251929.48540.87146.7059
TallahasseeTCN1225.25571242.693735.11300.81688.3671
TallahasseeTCN2428.96471659.184740.51090.75499.4831
TucsonGRU191.636815,922.7464124.58060.94536.8071
TucsonGRU6133.985335,488.8612184.87310.88209.8593
TucsonGRU12144.352440,552.5880196.89970.867210.7675
TucsonGRU24154.225046,671.1455210.88560.848711.4220
TucsonLSTM195.371617,112.9011129.17120.94107.3249
TucsonLSTM6131.331534,284.5924181.13810.88699.7986
TucsonLSTM12137.948137,677.9076189.37170.876310.3284
TucsonLSTM24154.923047,436.7154211.57730.848311.7839
TucsonProphet135.02891912.260643.72940.97742.3130
TucsonProphet642.99492898.221053.83510.89572.8304
TucsonProphet1270.54906553.784380.95540.98454.8086
TucsonProphet2422.6890743.469827.26660.99281.4463
TucsonTCN192.028116,159.7710125.24350.94296.9088
TucsonTCN6136.574637,891.2820189.23690.876910.2769
TucsonTCN12138.444839,060.6096192.51740.873310.2012
TucsonTCN24154.445147,055.4719211.30720.844211.6041
PhiladelphiaGRU1107.662520,894.6098143.93000.97442.4104
PhiladelphiaGRU6169.904960,706.0674246.15110.92603.7399
PhiladelphiaGRU12220.8261100,301.2897316.08800.87794.8669
PhiladelphiaGRU24272.3402147,989.8930384.26910.81976.0290
PhiladelphiaLSTM1108.477120,616.1893143.16240.97502.4505
PhiladelphiaLSTM6181.064364,005.3140252.51250.92214.0411
PhiladelphiaLSTM12229.6147102,740.1984319.84490.87515.1070
PhiladelphiaLSTM24277.3656147,255.3639382.56790.82126.1326
PhiladelphiaProphet132.01781649.995940.62010.99490.7985
PhiladelphiaProphet658.39724684.280468.44180.98591.5030
PhiladelphiaProphet1248.09363141.358556.04780.99230.9416
PhiladelphiaProphet2415.7171457.451721.38810.99880.3667
PhiladelphiaTCN186.980014,196.7247118.89970.98271.9425
PhiladelphiaTCN6168.997958,352.8073241.17980.92903.7529
PhiladelphiaTCN12216.741694,044.5152306.31220.88554.7832
PhiladelphiaTCN24261.9680134,265.0427365.74850.83675.7712
San AntonioGRU1203.713174,011.9930271.96860.98042.6793
San AntonioGRU6358.7291260,185.1058509.94130.93114.7560
San AntonioGRU12463.8332434,166.9088658.35100.88476.1599
San AntonioGRU24539.2059579,395.5928760.49610.84707.1010
San AntonioLSTM1243.6161102,469.5116319.48900.97313.2746
San AntonioLSTM6411.2620324,088.5075567.90700.91525.4366
San AntonioLSTM12493.8181462,247.9877679.35950.87776.6202
San AntonioLSTM24546.5051586,885.8201765.80460.84457.3336
San AntonioProphet156.38157466.827886.41080.99740.8132
San AntonioProphet657.22076580.808481.12220.99740.8355
San AntonioProphet1251.22215164.367671.86350.99810.5644
San AntonioProphet2466.48825960.017677.20110.99810.8230
San AntonioTCN1201.478073,713.4719270.80270.98032.6872
San AntonioTCN6383.0537296,180.6270543.99360.92115.1432
San AntonioTCN12463.7172446,544.7699668.14360.88156.2405
San AntonioTCN24527.9993567,517.8416753.12190.84867.1410
PortlandGRU162.368612,343.0811105.59420.93632.5682
PortlandGRU690.411220,637.1185140.90870.89123.6696
PortlandGRU12104.785925,089.9987155.60000.86774.3240
PortlandGRU24116.449031,548.7418175.99690.83214.6856
PortlandLSTM177.557117,319.8427125.70550.90983.2161
PortlandLSTM696.663122,733.9227148.04220.87983.9064
PortlandLSTM12107.136226,553.7377160.34890.85934.3873
PortlandLSTM24127.645136,781.5590189.83500.80425.2175
PortlandProphet133.86101595.391739.94240.96921.5143
PortlandProphet655.62034078.246263.86110.96132.5806
PortlandProphet1242.29222496.876849.96880.96291.5736
PortlandProphet2446.94834574.773867.63710.88422.0022
PortlandTCN186.521721,652.2050139.99180.88833.5602
PortlandTCN6111.878730,203.0069169.44010.84224.5172
PortlandTCN12133.798038,914.5441190.74580.79775.6347
PortlandTCN24133.229139,629.4233196.51950.79035.4446
Los AngelesGRU1284.3712154,206.1317388.33360.97552.4735
Los AngelesGRU6416.7433355,133.6235593.50110.94383.5483
Los AngelesGRU12476.3850440,366.5809661.01750.93064.1065
Los AngelesGRU24506.8041527,929.6993722.76850.91764.3072
Los AngelesLSTM1316.6008178,070.7432419.50360.97122.8143
Los AngelesLSTM6428.6626356,605.6034595.23300.94373.7140
Los AngelesLSTM12467.6323428,820.1693652.01720.93254.0293
Los AngelesLSTM24519.7834549,017.1206737.08160.91414.4519
Los AngelesProphet191.506212,684.2843112.62450.99110.8561
Los AngelesProphet6235.511477,605.6645278.57790.83572.2544
Los AngelesProphet12196.864645,301.8665212.84240.99121.6746
Los AngelesProphet2495.298224,106.6388155.26310.98710.9529
Los AngelesTCN1279.7668138,702.0973370.61710.97712.4671
Los AngelesTCN6427.8595376,380.2749609.49560.94063.6784
Los AngelesTCN12442.1358409,148.5964638.34420.93563.7473
Los AngelesTCN24485.9583500,411.1493705.45690.92164.1022
New York CityGRU1156.499478,352.6758239.22550.95282.6875
New York CityGRU6219.9790127,878.0057327.30270.91823.7550
New York CityGRU12282.8775185,053.9870407.23250.87704.8511
New York CityGRU24339.1639250,319.9588481.43270.82915.8022
New York CityLSTM1158.497145,486.2243208.81160.96772.8190
New York CityLSTM6227.8100101,044.1728309.77140.92963.9662
New York CityLSTM12300.4532175,351.2818410.14780.87685.2054
New York CityLSTM24357.7402247,874.9332489.73870.82376.2788
New York CityProphet135.93582109.529545.92960.99600.6409
New York CityProphet670.85898465.329592.00720.97081.3635
New York CityProphet1256.86915176.195771.94580.99360.8446
New York CityProphet2438.59042356.418448.54300.99680.6857
New York CityTCN1114.665529,430.5004162.75150.98032.0017
New York CityTCN6219.3952134,029.3786332.40960.91483.7460
New York CityTCN12274.5975207,097.1027420.14420.86624.6773
New York CityTCN24316.4043224,683.7379459.25510.84465.4101
OmahaGRU123.2177876.277128.7292−0.01902.4699
OmahaGRU624.0404947.423829.7480−1.10422.5632
OmahaGRU1224.82481017.605030.9657−1.22972.6464
OmahaGRU2424.4959965.936430.2452−1.16282.6053
OmahaLSTM1203.8283811,324.9597601.64540.055614.2714
OmahaLSTM6221.6740823,106.6493615.1799−1.019715.6610
OmahaLSTM12245.3685832,964.0885625.9563−1.083717.3301
OmahaLSTM24210.6698814,033.9343607.21060.040514.4560
OmahaProphet163.686413,795.3933117.45380.70533.7328
OmahaProphet645.35709021.777194.98300.66153.2103
OmahaProphet1247.59345372.003773.29400.92852.6613
OmahaProphet2441.30963839.093561.96040.93882.4913
OmahaTCN1170.79021,291,635.6565664.75870.216011.7918
OmahaTCN6171.10891,392,047.5211689.65810.159111.9326
OmahaTCN12204.58541,312,614.1613682.79860.147914.2650
OmahaTCN24217.29041,118,488.3941663.44580.033915.4355
TampaGRU181.230411,844.6583108.18440.97183.3570
TampaGRU6132.225434,859.3461185.74290.91805.2682
TampaGRU12156.554147,501.9739217.03200.88836.3090
TampaGRU24178.994162,426.8019248.04830.85357.3254
TampaLSTM190.719514,509.2779120.13920.96563.7922
TampaLSTM6148.040642,019.8347204.55570.90115.9564
TampaLSTM12173.130855,454.0895234.62910.86947.1790
TampaLSTM24186.941966,433.4134256.64850.84417.7130
TampaProphet115.0019371.021519.26190.99720.7445
TampaProphet625.2020822.615328.68130.98571.2935
TampaProphet1239.67063546.244059.55030.98581.3388
TampaProphet2422.9832704.930526.55050.99681.0227
TampaTCN168.74559001.887194.34700.97842.8069
TampaTCN6119.164428,724.9426169.26770.93284.7538
TampaTCN12140.584938,396.9949195.77960.91005.6751
TampaTCN24152.967646,308.3699214.98410.89156.2396

Appendix C. Hierarchical Clustering of Climate Variable Variability

Table A3. Variability measures of climate variables for each city used in the hierarchical clustering analysis. The final column reports the cluster assignment corresponding to Figure 3.
City | Temperature | Dew Point | Relative Humidity | Precipitation | Wind Speed | Pressure | Cluster
Houston | 0.348 | 0.417 | 0.271 | 8.955 | 0.600 | 0.005 | 1
Los Angeles | 0.280 | 0.390 | 0.311 | 9.190 | 1.254 | 0.004 | 1
San Antonio | 0.384 | 0.434 | 0.338 | 8.521 | 0.645 | 0.006 | 1
San Diego | 0.243 | 0.369 | 0.228 | 9.300 | 0.741 | 0.004 | 1
Tallahassee | 0.363 | 0.419 | 0.269 | 8.813 | 0.816 | 0.005 | 1
Tampa | 0.239 | 0.307 | 0.234 | 8.295 | 0.669 | 0.004 | 1
Baltimore | 0.551 | 0.559 | 0.298 | 6.757 | 0.612 | 0.007 | 2
Chicago | 0.631 | 0.607 | 0.265 | 6.465 | 0.516 | 0.007 | 2
Omaha | 0.597 | 0.570 | 0.281 | 8.123 | 0.604 | 0.008 | 2
Philadelphia | 0.575 | 0.562 | 0.313 | 6.796 | 0.604 | 0.007 | 2
Portland | 0.550 | 0.563 | 0.265 | 3.815 | 0.741 | 0.007 | 2
Seattle | 0.537 | 0.546 | 0.239 | 4.200 | 0.789 | 0.007 | 2
El Paso | 0.470 | 0.603 | 0.585 | 16.364 | 0.719 | 0.006 | 3
New York City | 0.575 | 0.573 | 0.276 | 19.025 | 0.626 | 0.008 | 3
Tucson | 0.440 | 0.704 | 0.620 | 12.605 | 0.629 | 0.005 | 3

References

  1. UN-Habitat. Urban Energy. Available online: https://unhabitat.org/topic/urban-energy (accessed on 1 October 2025).
  2. UN-HABITAT. Envisaging the Future of Cities; UN-HABITAT: Nairobi, Kenya, 2022. [Google Scholar]
  3. Madlener, R.; Sunak, Y. Impacts of urbanization on urban structures and energy demand: What can we learn for urban energy planning and urbanization management? Sustain. Cities Soc. 2011, 1, 45–53. [Google Scholar] [CrossRef]
  4. UN Environment Programme. Global Status Report for Buildings and Construction; Global Alliance for Buildings and Construction: Paris, France, 2020. [Google Scholar]
  5. International Energy Agency. Energy Efficiency 2024; International Energy Agency: Paris, France, 2024. [Google Scholar]
  6. Manohar, V.J.; Murthy, G.; Royal, N.P.; Binu, B.; Patil, T. Comprehensive Analysis of Energy Demand Prediction Using Advanced Machine Learning Techniques. In Proceedings of the E3S Web of Conferences, Casablanca, Morocco, 4–5 December 2025; EDP Sciences: Les Ulis, France, 2025; Volume 616, p. 02027. [Google Scholar]
  7. Durmus Senyapar, H.N.; Aksoz, A. Empowering sustainability: A consumer-centric analysis based on advanced electricity consumption predictions. Sustainability 2024, 16, 2958. [Google Scholar] [CrossRef]
  8. Joskow, P.L. Creating a smarter US electricity grid. J. Econ. Perspect. 2012, 26, 29–48. [Google Scholar] [CrossRef]
  9. Walker, A.; Cox, E.; Loughhead, J.; Roberts, J. Counting the Cost: The Economic and Social Costs of Electricity Shortfalls in the UK-A Report for the Council for Science and Technology 2014. Available online: https://www.raeng.org.uk/media/2s2pgeeg/single-pages-counting-the-cost-report.pdf (accessed on 9 September 2025).
  10. Ward, C.; Robinson, C.; Singleton, A.; Rowe, F. Spatial-Temporal Dynamics of Gas Consumption in England and Wales: Assessing the Residential Sector Using Sequence Analysis. Appl. Spat. Anal. Policy 2024, 17, 1273–1300. [Google Scholar] [CrossRef]
  11. Bayindir, R.; Irmak, E.; Colak, I.; Bektas, A. Development of a real time energy monitoring platform. Int. J. Electr. Power Energy Syst. 2011, 33, 137–146. [Google Scholar] [CrossRef]
  12. Bayindir, R.; Hossain, E.; Kabalci, E.; Billah, K.M.M. Investigation on North American microgrid facility. Int. J. Renew. Energy Res. 2015, 5, 558–574. [Google Scholar] [CrossRef]
  13. Li, P.; Zhang, J.S. A new hybrid method for China’s energy supply security forecasting based on ARIMA and XGBoost. Energies 2018, 11, 1687. [Google Scholar] [CrossRef]
  14. C40. 700+ Cities in 53 Countries Now Committed to Halve Emissions by 2030 and Reach Net Zero by 2050. 2021. Available online: https://www.c40.org/news/cities-committed-race-to-zero/ (accessed on 9 September 2025).
  15. Hou, H.; Liu, C.; Wang, Q.; Wu, X.; Tang, J.; Shi, Y.; Xie, C. Review of load forecasting based on artificial intelligence methodologies, models, and challenges. Electr. Power Syst. Res. 2022, 210, 108067. [Google Scholar] [CrossRef]
  16. Fan, C.; Sun, Y.; Zhao, Y.; Song, M.; Wang, J. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45. [Google Scholar] [CrossRef]
  17. Ahmad, T.; Madonski, R.; Zhang, D.; Huang, C.; Mujeeb, A. Data-driven probabilistic machine learning in sustainable smart energy/smart energy systems: Key developments, challenges, and future research opportunities in the context of smart grid paradigm. Renew. Sustain. Energy Rev. 2022, 160, 112128. [Google Scholar] [CrossRef]
  18. Himeur, Y.; Ghanem, K.; Alsalemi, A.; Bensaali, F.; Amira, A. Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends and new perspectives. Appl. Energy 2021, 287, 116601. [Google Scholar] [CrossRef]
  19. Kapp, S.; Choi, J.K.; Hong, T. Predicting industrial building energy consumption with statistical and machine-learning models informed by physical system parameters. Renew. Sustain. Energy Rev. 2023, 172, 113045. [Google Scholar] [CrossRef]
  20. Li, F.; Yigitcanlar, T.; Nepal, M.; Thanh, K.N.; Dur, F. A novel urban heat vulnerability analysis: Integrating machine learning and remote sensing for enhanced insights. Remote Sens. 2024, 16, 3032. [Google Scholar] [CrossRef]
  21. Bansal, C.; Jain, A.; Barwaria, P.; Choudhary, A.; Singh, A.; Gupta, A.; Seth, A. Temporal prediction of socio-economic indicators using satellite imagery. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 73–81. [Google Scholar]
  22. Kiasari, M.; Ghaffari, M.; Aly, H.H. A Comprehensive Review of the Current Status of Smart Grid Technologies for Renewable Energies Integration and Future Trends: The Role of Machine Learning and Energy Storage Systems. Energies 2024, 17, 4128. [Google Scholar] [CrossRef]
  23. Wang, Z.; Wang, Y.; Zeng, R.; Srinivasan, R.S.; Ahrentzen, S. Random Forest based hourly building energy prediction. Energy Build. 2018, 171, 11–25. [Google Scholar] [CrossRef]
  24. Panigrahi, R.; Patne, N.R.; Pemmada, S.; Manchalwar, A.D. Regression model-based hourly aggregated electricity demand prediction. Energy Rep. 2022, 8, 16–24. [Google Scholar] [CrossRef]
  25. Li, C.; Ding, Z.; Zhao, D.; Yi, J.; Zhang, G. Building energy consumption prediction: An extreme deep learning approach. Energies 2017, 10, 1525. [Google Scholar] [CrossRef]
  26. Al Tarhuni, B.; Naji, A.; Brodrick, P.G.; Hallinan, K.P.; Brecha, R.J.; Yao, Z. Large scale residential energy efficiency prioritization enabled by machine learning. Energy Effic. 2019, 12, 2055–2078. [Google Scholar] [CrossRef]
  27. Dong, Q.; Xing, K.; Zhang, H. Artificial neural network for assessment of energy consumption and cost for cross laminated timber office building in severe cold regions. Sustainability 2017, 10, 84. [Google Scholar] [CrossRef]
  28. Kontokosta, C.E.; Tull, C. A data-driven predictive model of city-scale energy use in buildings. Appl. Energy 2017, 197, 303–317. [Google Scholar] [CrossRef]
  29. Li, X.; Ding, L.; Lv, J.; Xu, G.; Li, J. A novel hybrid approach of KPCA and SVM for building cooling load prediction. In Proceedings of the 2010 Third International Conference on Knowledge Discovery and Data Mining, Online, 9–10 January 2010; IEEE: New York, NY, USA, 2010; pp. 522–526. [Google Scholar]
  30. Lv, J.; Li, X.; Ding, L.; Jiang, L. Applying principal component analysis and weighted support vector machine in building cooling load forecasting. In Proceedings of the 2010 International Conference on Computer and Communication Technologies in Agriculture Engineering, Nanchang, China, 22–25 October 2010; IEEE: New York, NY, USA, 2010; Volume 1, pp. 434–437. [Google Scholar]
  31. Li, Q.; Meng, Q.; Cai, J.; Yoshino, H.; Mochida, A. Applying support vector machine to predict hourly cooling load in the building. Appl. Energy 2009, 86, 2249–2256. [Google Scholar] [CrossRef]
  32. VE, S.; Shin, C.; Cho, Y. Efficient energy consumption prediction model for a data analytic-enabled industry building in a smart city. Build. Res. Inf. 2021, 49, 127–143. [Google Scholar] [CrossRef]
  33. Walther, J.; Weigold, M. A systematic review on predicting and forecasting the electrical energy consumption in the manufacturing industry. Energies 2021, 14, 968. [Google Scholar] [CrossRef]
  34. Li, X.; Deng, Y.; Ding, L.; Jiang, L. Building cooling load forecasting using fuzzy support vector machine and fuzzy C-mean clustering. In Proceedings of the 2010 International Conference on Computer and Communication Technologies in Agriculture Engineering, Nanchang, China, 22–25 October 2010; IEEE: New York, NY, USA, 2010; Volume 1, pp. 438–441. [Google Scholar]
  35. Solomon, D.M.; Winter, R.L.; Boulanger, A.G.; Anderson, R.N.; Wu, L.L. Forecasting Energy Demand in Large Commercial Buildings Using Support Vector Machine Regression; Columbia University Computer Science Technical Reports, CUCS-040-11; Department of Computer Science, Columbia University: New York, NY, USA, 2011. [Google Scholar]
  36. Edwards, R.E.; New, J.; Parker, L.E. Predicting future hourly residential electrical consumption: A machine learning case study. Energy Build. 2012, 49, 591–603. [Google Scholar] [CrossRef]
  37. Lam, J.C.; Wan, K.K.; Wong, S.; Lam, T.N. Principal component analysis and long-term building energy simulation correlation. Energy Convers. Manag. 2010, 51, 135–139. [Google Scholar] [CrossRef]
  38. Farzana, S.; Liu, M.; Baldwin, A.; Hossain, M.U. Multi-model prediction and simulation of residential building energy in urban areas of Chongqing, South West China. Energy Build. 2014, 81, 161–169. [Google Scholar] [CrossRef]
  39. Dagnely, P.; Ruette, T.; Tourwé, T.; Tsiporkova, E.; Verhelst, C. Predicting hourly energy consumption. Can you beat an autoregressive model. In Proceedings of the 24th Annual Machine Learning Conference of Belgium and The Netherlands, Benelearn, Delft, The Netherlands, 18–20 November 2015; Volume 19. [Google Scholar]
  40. Ben-Nakhi, A.E.; Mahmoud, M.A. Cooling load prediction for buildings using general regression neural networks. Energy Convers. Manag. 2004, 45, 2127–2141. [Google Scholar] [CrossRef]
  41. Fumo, N.; Biswas, M.R. Regression analysis for prediction of residential energy consumption. Renew. Sustain. Energy Rev. 2015, 47, 332–343. [Google Scholar] [CrossRef]
  42. Ozturk, S.; Ozturk, F. Forecasting energy consumption of Turkey by Arima model. J. Asian Sci. Res. 2018, 8, 52. [Google Scholar] [CrossRef]
  43. Ediger, V.Ş.; Akar, S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007, 35, 1701–1708. [Google Scholar] [CrossRef]
  44. Mathumitha, R.; Rathika, P.; Manimala, K. Intelligent deep learning techniques for energy consumption forecasting in smart buildings: A review. Artif. Intell. Rev. 2024, 57, 35. [Google Scholar] [CrossRef]
  45. Ahmad, M. Seasonal decomposition of electricity consumption data. Rev. Integr. Bus. Econ. Res. 2017, 6, 271. [Google Scholar]
  46. Mbasso, W.F.; Molu, R.J.J.; Harrison, A.; Pushkarna, M.; Kemdoum, F.N.; Donfack, E.F.; Jangir, P.; Tiako, P.; Tuka, M.B. Hybrid modeling approach for precise estimation of energy production and consumption based on temperature variations. Sci. Rep. 2024, 14, 24422. [Google Scholar] [CrossRef]
  47. Fan, G.F.; Zhang, L.Z.; Yu, M.; Hong, W.C.; Dong, S.Q. Applications of random forest in multivariable response surface for short-term load forecasting. Int. J. Electr. Power Energy Syst. 2022, 139, 108073. [Google Scholar] [CrossRef]
  48. Candanedo, L.M.; Feldheim, V.; Deramaix, D. Data driven prediction models of energy use of appliances in a low-energy house. Energy Build. 2017, 140, 81–97. [Google Scholar] [CrossRef]
  49. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs. Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89. [Google Scholar] [CrossRef]
  50. Priyadarshini, I.; Sahu, S.; Kumar, R.; Taniar, D. A machine-learning ensemble model for predicting energy consumption in smart homes. Internet Things 2022, 20, 100636. [Google Scholar] [CrossRef]
  51. Divina, F.; Gilson, A.; Goméz-Vela, F.; García Torres, M.; Torres, J.F. Stacking ensemble learning for short-term electricity consumption forecasting. Energies 2018, 11, 949. [Google Scholar] [CrossRef]
  52. Khashei, M.; Bijari, M. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. 2011, 11, 2664–2675. [Google Scholar] [CrossRef]
  53. Alizadegan, H.; Rashidi Malki, B.; Radmehr, A.; Karimi, H.; Ilani, M.A. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit. 2025, 43, 281–301. [Google Scholar] [CrossRef]
  54. Paterakis, N.G.; Mocanu, E.; Gibescu, M.; Stappers, B.; van Alst, W. Deep learning versus traditional machine learning methods for aggregated energy demand prediction. In Proceedings of the 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Torino, Italy, 26–29 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–6. [Google Scholar]
  55. Hrnjica, B.; Mehr, A.D. Energy demand forecasting using deep learning. In Smart Cities Performability, Cognition, & Security; Springer International Publishing: Cham, Switzerland, 2020; pp. 71–104. [Google Scholar]
  56. Bedi, J.; Toshniwal, D. Deep learning framework to forecast electricity demand. Appl. Energy 2019, 238, 1312–1326. [Google Scholar] [CrossRef]
  57. Naji, S.; Keivani, A.; Shamshirband, S.; Alengaram, U.J.; Jumaat, M.Z.; Mansor, Z.; Lee, M. Estimating building energy consumption using extreme learning machine method. Energy 2016, 97, 506–516. [Google Scholar] [CrossRef]
  58. Su, H.; Zio, E.; Zhang, J.; Xu, M.; Li, X.; Zhang, Z. A hybrid hourly natural gas demand forecasting method based on the integration of wavelet transform and enhanced Deep-RNN model. Energy 2019, 178, 585–597. [Google Scholar] [CrossRef]
  59. Karijadi, I.; Chou, S.Y. A hybrid RF-LSTM based on CEEMDAN for improving the accuracy of building energy consumption prediction. Energy Build. 2022, 259, 111908. [Google Scholar] [CrossRef]
  60. del Real, A.J.; Dorado, F.; Durán, J. Energy demand forecasting using deep learning: Applications for the French grid. Energies 2020, 13, 2242. [Google Scholar] [CrossRef]
  61. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  62. Amasyali, K.; El-Gohary, N. Deep learning for building energy consumption prediction. In Proceedings of the 6th CSCE-CRC International Construction Specialty Conference, Vancouver, BC, Canada, 31 May–3 June 2017; Volume 31. [Google Scholar]
  63. Cawthorne, D.; de Queiroz, A.R.; Eshraghi, H.; Sankarasubramanian, A.; DeCarolis, J.F. The role of temperature variability on seasonal electricity demand in the Southern US. Front. Sustain. Cities 2021, 3, 644789. [Google Scholar] [CrossRef]
  64. Ullah, F.U.M.; Ullah, A.; Khan, N.; Lee, M.Y.; Rho, S.; Baik, S.W. Deep Learning-Assisted Short-Term Power Load Forecasting Using Deep Convolutional LSTM and Stacked GRU. Complexity 2022, 2022, 2993184. [Google Scholar] [CrossRef]
  65. Alduailij, M.A.; Petri, I.; Rana, O.; Alduailij, M.A.; Aldawood, A.S. Forecasting peak energy demand for smart buildings. J. Supercomput. 2021, 77, 6356–6380. [Google Scholar] [CrossRef]
  66. Nepal, B.; Yamaha, M.; Yokoe, A.; Yamaji, T. Electricity load forecasting using clustering and ARIMA model for energy management in buildings. Jpn. Archit. Rev. 2020, 3, 62–76. [Google Scholar] [CrossRef]
  67. Prakash, C.; Dhyani, B.; Chauhan, A.; Sayal, A. ARIMA based forecasting of solar and hydro energy consumption with implications for grid stability and renewable policy. Discov. Sustain. 2025, 6, 663. [Google Scholar] [CrossRef]
  68. Magalhães, B.; Bento, P.; Pombo, J.; Calado, M.d.R.; Mariano, S. Short-term load forecasting based on optimized random forest and optimal feature selection. Energies 2024, 17, 1926. [Google Scholar] [CrossRef]
  69. Deng, S.; Dong, X.; Tao, L.; Wang, J.; He, Y.; Yue, D. Multi-type load forecasting model based on random forest and density clustering with the influence of noise and load patterns. Energy 2024, 307, 132635. [Google Scholar] [CrossRef]
  70. Goudarzi, S.; Anisi, M.H.; Kama, N.; Doctor, F.; Soleymani, S.A.; Sangaiah, A.K. Predictive modelling of building energy consumption based on a hybrid nature-inspired optimization algorithm. Energy Build. 2019, 196, 83–93. [Google Scholar] [CrossRef]
  71. Izudin, N.E.M.; Sokkalingam, R.; Daud, H.; Mardesci, H.; Husin, A. Forecasting electricity consumption in Malaysia by hybrid ARIMA-ANN. In Proceedings of the 6th International Conference on Fundamental and Applied Sciences: ICFAS 2020, Sarawak, Malaysia, 14–16 July 2020; Springer: Berlin/Heidelberg, Germany, 2022; pp. 749–760. [Google Scholar]
  72. Souhe, F.G.Y.; Mbey, C.F.; Boum, A.T.; Ele, P.; Kakeu, V.J.F. A hybrid model for forecasting the consumption of electrical energy in a smart grid. J. Eng. 2022, 2022, 629–643. [Google Scholar] [CrossRef]
  73. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical load forecasting using LSTM, GRU, and RNN algorithms. Energies 2023, 16, 2283. [Google Scholar] [CrossRef]
  74. Khan, Z.A.; Hussain, T.; Ullah, A.; Rho, S.; Lee, M.; Baik, S.W. Towards efficient electricity forecasting in residential and commercial buildings: A novel hybrid CNN with a LSTM-AE based framework. Sensors 2020, 20, 1399. [Google Scholar] [CrossRef] [PubMed]
  75. Wang, C.; Li, X.; Shi, Y.; Jiang, W.; Song, Q.; Li, X. Load forecasting method based on CNN and extended LSTM. Energy Rep. 2024, 12, 2452–2461. [Google Scholar] [CrossRef]
  76. Carréon, J.R.; Worrell, E. Urban energy systems within the transition to sustainable development. A research agenda for urban metabolism. Resour. Conserv. Recycl. 2018, 132, 258–266. [Google Scholar] [CrossRef]
  77. Yu, C.; Fu, C.; Xu, P. Energy shock, industrial transformation and macroeconomic fluctuations. Int. Rev. Financ. Anal. 2024, 92, 103069. [Google Scholar] [CrossRef]
  78. Wang, D.; Li, J.; Wang, Y.; Wan, K.; Song, X.; Liu, Y. Comparing the vulnerability of different coal industrial symbiosis networks under economic fluctuations. J. Clean. Prod. 2017, 149, 636–652. [Google Scholar] [CrossRef]
  79. Staffell, I.; Pfenninger, S. The increasing impact of weather on electricity supply and demand. Energy 2018, 145, 65–78. [Google Scholar] [CrossRef]
  80. López-Moreno, H.; Núñez-Peiró, M.; Sánchez-Guevara, C.; Neila, J. On the identification of Homogeneous Urban Zones for the residential buildings’ energy evaluation. Build. Environ. 2022, 207, 108451. [Google Scholar] [CrossRef]
  81. U.S. Energy Information Administration. Real-Time Operating Grid-U.S. Energy Information Administration (EIA). Available online: https://toolkit.climate.gov/tool/us-electric-system-real-time-operating-grid (accessed on 9 September 2025).
  82. Meteostat. Available online: https://meteostat.net/en/ (accessed on 9 September 2025).
  83. Fan, C.; Xiao, F.; Wang, S. Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques. Appl. Energy 2014, 127, 1–10. [Google Scholar] [CrossRef]
  84. Ashraf, J.; Azam, R.; Rifa, A.A.; Rana, M.J. Multiple machine learning models for predicting annual energy consumption and demand of office buildings in subtropical monsoon climate. Asian J. Civ. Eng. 2025, 26, 293–309. [Google Scholar] [CrossRef]
  85. Dubey, A.K.; Kumar, A.; García-Díaz, V.; Sharma, A.K.; Kanhaiya, K. Study and analysis of SARIMA and LSTM in forecasting time series data. Sustain. Energy Technol. Assessments 2021, 47, 101474. [Google Scholar] [CrossRef]
  86. Gao, Y.; Ruan, Y. Interpretable deep learning model for building energy consumption prediction based on attention mechanism. Energy Build. 2021, 252, 111379. [Google Scholar] [CrossRef]
  87. Oke, T.R. The energetic basis of the urban heat island. Q. J. R. Meteorol. Soc. 1982, 108, 1–24. [Google Scholar] [CrossRef]
  88. Santamouris, M. Analyzing the heat island magnitude and characteristics in one hundred Asian and Australian cities and regions. Sci. Total Environ. 2015, 512, 582–598. [Google Scholar] [CrossRef] [PubMed]
  89. Taha, H. Urban climates and heat islands: Albedo, evapotranspiration, and anthropogenic heat. Energy Build. 1997, 25, 99–103. [Google Scholar] [CrossRef]
  90. Santamouris, M. Cooling the cities–A review of reflective and green roof mitigation technologies to fight heat island and improve comfort in urban environments. Sol. Energy 2014, 103, 682–703. [Google Scholar] [CrossRef]
  91. Yang, J.; Bou-Zeid, E. Should cities embrace their heat islands as shields from extreme cold? J. Appl. Meteorol. Climatol. 2019, 58, 1307–1321. [Google Scholar] [CrossRef]
  92. Dodman, D. Urban density and climate change. In Analytical Review of the Interaction Between Urban Growth Trends and Environmental Changes; United Nations Population Fund (UNFPA): New York, NY, USA, 2009. [Google Scholar]
  93. Steemers, K. Energy and the city: Density, buildings and transport. Energy Build. 2003, 35, 3–14. [Google Scholar] [CrossRef]
  94. Poggi, F.; Amado, M. The spatial dimension of energy consumption in cities. Energy Policy 2024, 187, 114023. [Google Scholar] [CrossRef]
  95. Capra, M.; Bussolino, B.; Marchisio, A.; Masera, G.; Martina, M.; Shafique, M. Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead. IEEE Access 2020, 8, 225134–225180. [Google Scholar] [CrossRef]
  96. Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45. [Google Scholar] [CrossRef]
  97. Cheng, L.; Chengbo, Y.; Tao, L. Prophet-Based Medium and Long-Term Electricity Load Forecasting Research. J. Phys. Conf. Ser. 2022, 2356, 012002. [Google Scholar] [CrossRef]
  98. Arslan, S. A hybrid forecasting model using LSTM and Prophet for energy consumption with decomposition of time series data. PeerJ Comput. Sci. 2022, 8, e1001. [Google Scholar] [CrossRef]
  99. Zhou, C.; Fang, Z.; Xu, X.; Zhang, X.; Ding, Y.; Jiang, X. Using long short-term memory networks to predict energy consumption of air-conditioning systems. Sustain. Cities Soc. 2020, 55, 102000. [Google Scholar] [CrossRef]
  100. Munem, M.; Bashar, T.R.; Roni, M.H.; Shahriar, M.; Shawkat, T.B.; Rahaman, H. Electric power load forecasting based on multivariate LSTM neural network using Bayesian optimization. In Proceedings of the 2020 IEEE Electric Power and Energy Conference, Virtual, 7–8 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  101. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851. [Google Scholar] [CrossRef]
  102. Salem, F.M. Gated RNN: The Gated Recurrent Unit (GRU) RNN. In Recurrent Neural Networks: From Simple to Gated Architectures; Springer: Berlin/Heidelberg, Germany, 2022; pp. 85–100. [Google Scholar]
  103. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165. [Google Scholar]
  104. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
  105. Shaikh, A.K.; Nazir, A.; Khalique, N.; Shah, A.S.; Adhikari, N. A new approach to seasonal energy consumption forecasting using temporal convolutional networks. Results Eng. 2023, 19, 101296. [Google Scholar] [CrossRef]
  106. Wang, X.; Wu, Y.; Zou, W.; Zhao, X. Hybrid Time Series Forecasting for Real-Time Electricity Market Demand Using ARIMA-LSTM and Scalable Cloud-Native Architecture. Informatica 2025, 49. [Google Scholar] [CrossRef]
  107. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018. [Google Scholar]
  108. Bergmeir, C.; Hyndman, R.J.; Koo, B. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 2018, 120, 70–83. [Google Scholar] [CrossRef]
  109. Shan, R.; Jia, X.; Su, X.; Xu, Q.; Ning, H.; Zhang, J. AI-driven multi-objective optimization and decision-making for urban building energy retrofit: Advances, challenges, and systematic review. Appl. Sci. 2025, 15, 8944. [Google Scholar] [CrossRef]
  110. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; Volume 1, pp. 281–297. [Google Scholar]
  111. Conover, W.J. Practical Nonparametric Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1999; Volume 350. [Google Scholar]
  112. Himeur, Y.; Alsalemi, A.; Bensaali, F.; Amira, A.; Al-Kababji, A. Recent trends of smart nonintrusive load monitoring in buildings: A review, open challenges, and future directions. Int. J. Intell. Syst. 2022, 37, 7124–7179. [Google Scholar] [CrossRef]
  113. Bedi, J.; Toshniwal, D. Empirical mode decomposition based deep learning for electricity demand forecasting. IEEE Access 2018, 6, 49144–49156. [Google Scholar] [CrossRef]
  114. Mochi, P. Perspective on challenges and opportunities in integrated electricity-hydrogen market. Energy Sustain. Dev. 2025, 87, 101728. [Google Scholar] [CrossRef]
Figure 1. Cities used in this study.
Figure 2. Energy demand across cities.
Figure 3. Hierarchical clustering of cities based on climate variables. Colours indicate the clusters identified at the selected linkage distance, with cities sharing the same colour belonging to the same cluster.
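As a point of reference for Figure 3, the short Python sketch below illustrates how cities could be grouped by agglomerative hierarchical clustering of standardised climate variables. The city names, feature values, Ward linkage, and three-cluster cut are illustrative assumptions for this example, not the study's exact configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler

# Toy climate matrix: one row per city, columns = mean temperature (°C),
# mean relative humidity (%), heating degree days, cooling degree days.
# Values are placeholders, not the study's data.
cities = ["City A", "City B", "City C", "City D", "City E"]
climate = np.array([
    [24.1, 62.0,  600, 2600],
    [11.3, 71.0, 3100,  900],
    [12.0, 68.0, 2900, 1000],
    [18.5, 35.0, 1500, 1700],
    [23.2, 74.0,  700, 2400],
])

X = StandardScaler().fit_transform(climate)        # put variables on a common scale
Z = linkage(X, method="ward")                      # agglomerative hierarchical clustering
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into three clusters
print(dict(zip(cities, labels)))                   # cluster label per city
```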
Table 1. Representative energy forecasting studies across methodological categories.

Study Type | Methods | Key Assumptions | Forecast Horizon | Advantages (+) and Limitations (–)
Statistical models (e.g., [41,43,65,66,67]) | MLR, ARIMA, SARIMA | Linear relationships; stationarity | Daily–monthly | + Interpretable; low computational cost. – Performs poorly under non-linearity and high variability
ML models (e.g., [47,48,68,69]) | RF, SVR, DT | Feature-driven relationships | Hourly–daily | + Strong non-linear modelling; moderate complexity. – Limited temporal memory; typically building-scale
Hybrid models (e.g., [51,52,70,71,72]) | ANN–ARIMA; ensemble learners; ARIMA–SVR–PSO; ARIMA–ANN; SVR–FA–ANFIS | Combined linear & non-linear structure | Short-term (hourly) | + Improved accuracy and robustness. – High complexity; reduced interpretability
Deep learning (e.g., [55,56,64,73,74]) | LSTM, GRU, CNN, Bi-LSTM, RNN, CNN–LSTM–AE, LSTM–GRU | Require large training datasets | 1–6 h (typically) | + Excellent short-term accuracy; captures non-linear temporal dynamics. – Data-intensive; weaker long-horizon stability
Spatio-temporal DL (e.g., [61,62,75]) | CNN–LSTM, DNN | Dense spatial sensing; spatial–temporal dependency | Short-term | + Captures spatial relationships and temporal patterns. – Limited scalability; high data and compute demand
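To make the model families summarised in Table 1 concrete, the following minimal sketch shows how a Prophet model and a small LSTM network might be set up for hourly demand data. The hyperparameters (default seasonalities, a 24-hour lookback, 64 hidden units) are illustrative assumptions, not the tuned settings used in this study.

```python
import pandas as pd
from prophet import Prophet
from tensorflow import keras


def fit_prophet(history: pd.DataFrame) -> Prophet:
    """Fit Prophet on a frame with columns 'ds' (timestamp) and 'y' (hourly demand)."""
    m = Prophet(daily_seasonality=True, weekly_seasonality=True, yearly_seasonality=True)
    m.fit(history)
    return m


def build_lstm(lookback: int = 24) -> keras.Model:
    """Small LSTM that maps the previous `lookback` hourly loads to the next hour."""
    model = keras.Sequential([
        keras.layers.Input(shape=(lookback, 1)),
        keras.layers.LSTM(64),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```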
Table 2. Model performance metrics by cluster, model type, and forecasting horizon. Each horizon (1, 6, 12, and 24 h) reflects short- to long-term forecasting accuracy across different city clusters.

Cluster | Model | Horizon (h) | MAE | MSE | RMSE | R2 | MAPE
Cluster 1 | GRU | 1 | 139.09 | 46,987.36 | 193.97 | 0.971 | 2.58
Cluster 1 | GRU | 6 | 230.04 | 127,309.73 | 331.97 | 0.918 | 4.26
Cluster 1 | GRU | 12 | 295.99 | 203,085.27 | 421.25 | 0.870 | 5.47
Cluster 1 | GRU | 24 | 351.28 | 275,144.62 | 493.93 | 0.821 | 6.52
Cluster 1 | LSTM | 1 | 152.93 | 46,617.66 | 201.27 | 0.971 | 2.87
Cluster 1 | LSTM | 6 | 251.27 | 138,600.17 | 346.26 | 0.913 | 4.67
Cluster 1 | LSTM | 12 | 311.80 | 209,038.65 | 429.53 | 0.866 | 5.80
Cluster 1 | LSTM | 24 | 361.80 | 277,865.37 | 499.23 | 0.817 | 6.81
Cluster 1 | Prophet | 1 | 35.30 | 2934.39 | 48.89 | 0.997 | 0.69
Cluster 1 | Prophet | 6 | 55.86 | 5499.54 | 72.30 | 0.988 | 1.23
Cluster 1 | Prophet | 12 | 53.85 | 4499.18 | 66.76 | 0.993 | 0.96
Cluster 1 | Prophet | 24 | 33.21 | 2242.40 | 40.28 | 0.998 | 0.55
Cluster 1 | TCN | 1 | 125.09 | 33,721.42 | 170.99 | 0.978 | 2.34
Cluster 1 | TCN | 6 | 238.82 | 139,109.18 | 344.28 | 0.912 | 4.46
Cluster 1 | TCN | 12 | 294.05 | 211,262.02 | 426.42 | 0.866 | 5.47
Cluster 1 | TCN | 24 | 341.08 | 263,134.79 | 483.13 | 0.828 | 6.39
Cluster 2 | GRU | 1 | 348.14 | 432,498.51 | 559.62 | 0.925 | 3.15
Cluster 2 | GRU | 6 | 469.76 | 652,785.47 | 747.14 | 0.883 | 4.14
Cluster 2 | GRU | 12 | 561.24 | 828,605.79 | 853.92 | 0.848 | 4.98
Cluster 2 | GRU | 24 | 628.74 | 1,010,078.46 | 949.39 | 0.813 | 5.56
Cluster 2 | LSTM | 1 | 356.43 | 484,163.96 | 590.96 | 0.916 | 3.21
Cluster 2 | LSTM | 6 | 485.32 | 714,935.35 | 772.63 | 0.873 | 4.29
Cluster 2 | LSTM | 12 | 560.06 | 821,169.87 | 850.88 | 0.849 | 4.98
Cluster 2 | LSTM | 24 | 631.95 | 1,049,037.26 | 967.88 | 0.805 | 5.57
Cluster 2 | Prophet | 1 | 153.97 | 38,863.87 | 183.83 | 0.979 | 1.41
Cluster 2 | Prophet | 6 | 268.83 | 90,065.09 | 299.39 | 0.865 | 2.76
Cluster 2 | Prophet | 12 | 288.82 | 143,068.67 | 351.80 | 0.968 | 2.25
Cluster 2 | Prophet | 24 | 162.72 | 74,651.62 | 254.55 | 0.979 | 1.36
Cluster 2 | TCN | 1 | 384.60 | 661,896.47 | 640.91 | 0.886 | 3.51
Cluster 2 | TCN | 6 | 541.12 | 960,800.04 | 866.89 | 0.830 | 4.81
Cluster 2 | TCN | 12 | 605.51 | 1,015,795.69 | 926.17 | 0.811 | 5.34
Cluster 2 | TCN | 24 | 650.68 | 1,206,018.53 | 1012.81 | 0.777 | 5.72
Cluster 3 | GRU | 1 | 80.90 | 29,954.39 | 119.94 | 0.792 | 3.54
Cluster 3 | GRU | 6 | 134.72 | 109,476.50 | 199.42 | 0.740 | 4.78
Cluster 3 | GRU | 12 | 161.39 | 154,886.04 | 232.38 | 0.694 | 5.57
Cluster 3 | GRU | 24 | 180.36 | 190,159.07 | 258.16 | 0.674 | 6.19
Cluster 3 | LSTM | 1 | 85.25 | 125,815.35 | 170.24 | 0.784 | 6.08
Cluster 3 | LSTM | 6 | 106.76 | 135,212.41 | 198.86 | 0.733 | 7.46
Cluster 3 | LSTM | 12 | 118.03 | 139,762.63 | 209.75 | 0.702 | 8.40
Cluster 3 | LSTM | 24 | 120.71 | 141,269.81 | 219.07 | 0.688 | 8.48
Cluster 3 | Prophet | 1 | 33.81 | 3689.77 | 49.22 | 0.893 | 2.31
Cluster 3 | Prophet | 6 | 36.12 | 3097.77 | 48.41 | 0.836 | 2.66
Cluster 3 | Prophet | 12 | 37.42 | 3677.56 | 53.65 | 0.947 | 2.48
Cluster 3 | Prophet | 24 | 28.75 | 1861.39 | 38.28 | 0.957 | 1.93
Cluster 3 | TCN | 1 | 76.03 | 193,888.43 | 175.58 | 0.809 | 5.32
Cluster 3 | TCN | 6 | 95.62 | 215,352.98 | 207.00 | 0.757 | 6.66
Cluster 3 | TCN | 12 | 109.83 | 207,395.99 | 216.53 | 0.729 | 7.76
Cluster 3 | TCN | 24 | 118.46 | 182,788.03 | 223.97 | 0.685 | 8.49
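The error metrics reported in Table 2 follow their standard definitions; the sketch below shows one way to compute them for a single city, model, and horizon. Aggregating per-city scores within a cluster (e.g., by simple averaging) is an assumption of this illustration, not necessarily the study's exact procedure.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """Standard point-forecast error metrics: MAE, MSE, RMSE, R2, and MAPE (in %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs(err / y_true))  # assumes demand is strictly positive
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}
```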
Table 3. Kruskal–Wallis test results for model performance across cities within clusters and time horizons.

Cluster | Horizon (h) | Metric | H-Statistic | p-Value
Cluster 1 | 1 | MAE | 5.051 | 0.168
Cluster 1 | 1 | MAPE | 2.051 | 0.562
Cluster 1 | 1 | MSE | 4.699 | 0.195
Cluster 1 | 1 | R2 | 1.541 | 0.673
Cluster 1 | 1 | RMSE | 5.051 | 0.168
Cluster 1 | 6 | MAE | 4.390 | 0.222
Cluster 1 | 6 | MAPE | 2.846 | 0.416
Cluster 1 | 6 | MSE | 4.853 | 0.183
Cluster 1 | 6 | R2 | 2.801 | 0.423
Cluster 1 | 6 | RMSE | 4.853 | 0.183
Cluster 1 | 12 | MAE | 3.949 | 0.267
Cluster 1 | 12 | MAPE | 3.596 | 0.309
Cluster 1 | 12 | MSE | 4.853 | 0.183
Cluster 1 | 12 | R2 | 4.257 | 0.235
Cluster 1 | 12 | RMSE | 4.853 | 0.183
Cluster 1 | 24 | MAE | 5.316 | 0.150
Cluster 1 | 24 | MAPE | 3.596 | 0.309
Cluster 1 | 24 | MSE | 5.515 | 0.138
Cluster 1 | 24 | R2 | 3.265 | 0.353
Cluster 1 | 24 | RMSE | 5.515 | 0.138
Cluster 2 | 1 | MAE | 2.083 | 0.149
Cluster 2 | 1 | MAPE | 2.083 | 0.149
Cluster 2 | 1 | MSE | 2.083 | 0.149
Cluster 2 | 1 | R2 | 5.333 | 0.021
Cluster 2 | 1 | RMSE | 2.083 | 0.149
Cluster 2 | 6 | MAE | 2.083 | 0.149
Cluster 2 | 6 | MAPE | 2.083 | 0.149
Cluster 2 | 6 | MSE | 2.083 | 0.149
Cluster 2 | 6 | R2 | 4.083 | 0.043
Cluster 2 | 6 | RMSE | 2.083 | 0.149
Cluster 2 | 12 | MAE | 2.083 | 0.149
Cluster 2 | 12 | MAPE | 2.083 | 0.149
Cluster 2 | 12 | MSE | 2.083 | 0.149
Cluster 2 | 12 | R2 | 2.083 | 0.149
Cluster 2 | 12 | RMSE | 2.083 | 0.149
Cluster 2 | 24 | MAE | 2.083 | 0.149
Cluster 2 | 24 | MAPE | 2.083 | 0.149
Cluster 2 | 24 | MSE | 2.083 | 0.149
Cluster 2 | 24 | R2 | 2.083 | 0.149
Cluster 2 | 24 | RMSE | 2.083 | 0.149
Cluster 3 | 1 | MAE | 15.007 | 0.020
Cluster 3 | 1 | MAPE | 6.931 | 0.327
Cluster 3 | 1 | MSE | 14.268 | 0.027
Cluster 3 | 1 | R2 | 17.224 | 0.008
Cluster 3 | 1 | RMSE | 14.357 | 0.026
Cluster 3 | 6 | MAE | 14.860 | 0.021
Cluster 3 | 6 | MAPE | 4.759 | 0.575
Cluster 3 | 6 | MSE | 11.837 | 0.066
Cluster 3 | 6 | R2 | 15.103 | 0.019
Cluster 3 | 6 | RMSE | 12.000 | 0.062
Cluster 3 | 12 | MAE | 11.948 | 0.063
Cluster 3 | 12 | MAPE | 5.298 | 0.506
Cluster 3 | 12 | MSE | 11.793 | 0.067
Cluster 3 | 12 | R2 | 8.823 | 0.184
Cluster 3 | 12 | RMSE | 12.429 | 0.053
Cluster 3 | 24 | MAE | 9.591 | 0.143
Cluster 3 | 24 | MAPE | 4.869 | 0.561
Cluster 3 | 24 | MSE | 8.328 | 0.215
Cluster 3 | 24 | R2 | 8.852 | 0.182
Cluster 3 | 24 | RMSE | 8.328 | 0.215
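Each statistic in Table 3 compares a given metric across the cities of one cluster at one horizon. A minimal sketch of such a test is shown below; the long-format results table (one row per city, model, and horizon with the metric value) is an assumed layout for illustration, not the authors' exact data structure.

```python
import pandas as pd
from scipy.stats import kruskal


def kruskal_across_cities(results: pd.DataFrame, cluster: str, horizon: int, metric: str):
    """H-statistic and p-value for differences in `metric` across cities in one cluster/horizon."""
    sub = results[(results["cluster"] == cluster) & (results["horizon"] == horizon)]
    groups = [g[metric].to_numpy() for _, g in sub.groupby("city")]  # one group per city
    h_stat, p_value = kruskal(*groups)
    return h_stat, p_value


# Hypothetical usage, assuming a frame with columns cluster, city, model, horizon, MAE, ...:
# h, p = kruskal_across_cities(results, "Cluster 3", 1, "MAE")
```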