Systematic Review

Artificial Intelligence Approaches for Energy Consumption and Generation Forecasting, Anomaly Detection, and Public Decision-Making: A Systematic Review

by David Velasco Ayuso *, Jesús Ángel Román Gallego and Carolina Zato Domínguez
Department of Computer Science and Automation, University of Salamanca, 37008 Salamanca, Spain
* Author to whom correspondence should be addressed.
Energies 2026, 19(10), 2347; https://doi.org/10.3390/en19102347
Submission received: 22 March 2026 / Revised: 3 May 2026 / Accepted: 10 May 2026 / Published: 13 May 2026

Abstract

The large-scale integration of variable renewable energy sources introduces critical challenges of intermittency and uncertainty, yet consumption forecasting, generation forecasting, and anomaly detection are typically addressed in isolation, neglecting the bidirectional feedback between consumption patterns, generation mix, and public decision-making. This PRISMA 2020-compliant systematic review compared statistical, machine learning, and deep learning models for energy forecasting and machine learning and deep learning models for anomaly detection. Searches in Google Scholar and Scopus used seven targeted strings, restricted to peer-reviewed empirical studies (2022–2026; 2023–2026 for anomaly detection), indexed in Q1–Q3 JCR journals, excluding theoretical and non-benchmarked works. A six-item risk of bias questionnaire—with a threshold of four points—guided inclusion, yielding 60 articles. Addressing the first research question (RQ1) on comparative model performance, hybrid deep learning architectures optimized with bio-inspired metaheuristics achieved the highest forecasting accuracy (R² up to 0.9984), with metaheuristic optimization acting as a cost-reducing factor; statistical models remained competitive for long-horizon forecasting, while large-language-model-based approaches addressed data scarcity through few-shot learning. Addressing the second research question (RQ2) on smart grid optimization, predictive techniques reduce forecasting errors, enabling real-time load adjustment and Demand Response, though a systematic asymmetry constrains their potential: consumption studies integrate socio-economic variables, whereas generation studies rely on meteorological inputs.
Addressing the third research question (RQ3) on infrastructure security, supervised and unsupervised approaches detect anomalous operational states and support fault diagnosis, yet remain constrained by scarce labeled fault data and limited cross-regional validation; generative models such as GANs and diffusion models partially address this limitation by enabling Sim2Real strategies and realistic digital twin construction. Evidence is strongest for hybrid forecasting; certainty is lower for anomaly detection given reliance on experimental surrogates. No single paradigm achieves universal superiority. The primary finding is the consistent absence of integrated frameworks jointly modeling consumption, generation, anomaly detection, and public decision-making across the reviewed literature. This result reflects a structural limitation of the current state of the art, rather than a forward-looking research agenda. This study was funded by the ENIA International Chair on Trustworthy Artificial Intelligence European Recovery Plan; the protocol was not pre-registered.

1. Introduction

The consolidation of variable renewable energy (VRE) is currently posing significant challenges to the energy transition process. The large-scale integration of these sources introduces critical issues of intermittency and uncertainty that increase system vulnerability and generate unexpected market dynamics. Extreme climatic events, intensified by climate change, exacerbate these operational challenges: heat waves, droughts, hurricanes, and storms can damage infrastructure, disrupt supply, and cause unexpected peaks in energy demand [1]. The deployment of advanced measurement infrastructures has shown that increasingly complex consumption profiles raise the variability of electricity demand, thereby requiring new Demand Response management strategies [2]. To maintain system stability and flexibility under such variability, Energy Storage Systems (ESS) and Virtual Energy Plants (VEP) have emerged, enabling Distributed Energy Sources (DESs) to be decentralized, aggregated, and coordinated to optimize energy dispatch and dynamic market participation [3].
Regarding existing research gaps, isolated models present several limitations. Traditional statistical models such as ARIMA or linear regression assume linearity and stationarity, which limits their ability to capture high volatility and non-linear patterns [2,4]. Deep learning models, by contrast, require large data volumes, high computational resources, and sensitive hyperparameter tuning processes [2]. In many cases, the incorporation of exogenous variables introduces the curse of dimensionality and data redundancy due to inherent multicollinearity [5]. Although these challenges have been partially addressed, there remains a lack of systematic integration of robust techniques to filter heteroscedastic noise within Advanced Metering Infrastructure (AMI), where data availability is often limited [2].
One of the most significant and underexplored research gaps lies in the fact that generation and consumption forecasting models continue to operate in isolation [2]. This separation overlooks the emergence of a bidirectional feedback loop, in which the increasing penetration of renewable energy sources reshapes consumption patterns—for instance, through electric vehicles or demand response policies—while fluctuations in demand simultaneously affect generation strategies. Furthermore, the scarcity of historical data in newly deployed wind and solar farms limits the applicability of deep neural networks and constrains the use of transfer learning techniques [6,7]. This review addresses the operational and economic challenges associated with renewable energy intermittency, including increased balancing costs, inefficient dispatch, and grid instability. The proliferation of smart grids requires advanced solutions for dynamic Demand Response management. Moreover, current literature does not adequately address the bidirectional feedback between consumption, generation, and policy decision-making within a unified framework [3]. In addition, limited and low-quality data hinder accurate anomaly detection, making it a complex and context-dependent problem [8,9]. Consequently, synthesizing the fragmented body of existing research is essential to systematically characterize the current state of the art and identify recurring structural limitations across studies. Therefore, to address these challenges, the main objectives of this study are (1) to evaluate and compare the performance of advanced forecasting approaches, including statistical, machine learning (ML), and deep learning (DL) models; (2) to analyze the contribution of predictive techniques to Smart Grid management optimization; and (3) to identify computational strategies for mitigating vulnerability to extreme events and complex consumption patterns.
Several prior systematic reviews have addressed related topics, and this work is explicitly differentiated from three of the most frequently cited. Mosavi et al. [10] conducted a broad taxonomy of machine learning models across diverse energy system types, classified by modelling technique, energy type, and application area; however, their review did not follow a PRISMA-compliant protocol, did not address anomaly detection or public decision-making, and predates the emergence of hybrid metaheuristic-optimized architectures, LLM-based forecasters, and neuromorphic computing paradigms. Wang et al. [11] reviewed deep learning architectures exclusively for renewable energy generation forecasting—covering deterministic and probabilistic methods based on deep belief networks, stacked autoencoders, and recurrent neural networks—leaving energy consumption forecasting and anomaly detection entirely outside their scope. Voyant et al. [12] restricted their analysis to solar irradiance forecasting using classical machine learning methods such as ANN, SVM, and regression trees, without addressing wind energy, consumption patterns, or fault detection. In contrast, the present review integrates all three domains—energy consumption forecasting, generation forecasting, and anomaly detection—within a unified PRISMA 2020 framework, explicitly models the bidirectional feedback between these domains and public decision-making, and incorporates paradigms absent from those prior reviews, including LLM-based approaches and neuromorphic architectures. This broader and more integrated scope constitutes the principal methodological contribution of the present work with respect to those foundational surveys. Although the core corpus of this review comprises twelve primary studies, the potential risk of a limited sample size is mitigated through the systematic application of a structured snowballing strategy, incorporating forty-eight secondary references. 
This expansion enables the integration of a broader methodological and empirical context, capturing algorithmic antecedents, baseline comparisons, and complementary validation scenarios across diverse datasets and geographical settings. As a result, the combined corpus of sixty studies provides a sufficiently representative and heterogeneous evidence base, reducing the likelihood of sample bias and strengthening the robustness, generalizability, and interpretability of the synthesized conclusions in accordance with PRISMA 2020 recommendations. The methodology adopted in this work follows the PRISMA 2020 framework. By systematically consulting databases, applying eligibility criteria, and evaluating the selected studies, this approach aims to provide a structured and reproducible synthesis of the existing literature. The remainder of this paper is structured as follows. Section 2 reviews the state of the art. Section 3 describes the PRISMA-based methodology. Section 4 presents the study extraction and selection process. Section 5 discusses the results and main findings. Section 6 identifies the main research gaps and future directions. Finally, Section 7 synthesizes the main findings and consolidates the identified structural research gaps.

2. State of the Art

The reviewed literature encompasses statistical, machine learning, and deep learning approaches for energy generation and consumption forecasting, as well as anomaly detection in intelligent critical infrastructures.
Classical and statistical approaches remain widely used. For instance, Ref. [4] evaluated models such as SARIMA/X, FB Prophet, Holt–Winters, and TBATS to predict monthly energy demand in Brazil, using combined governmental data sources (EPE, IPEA, ABVE, and INMET) along with economic, industrial, and climatic exogenous variables. Similarly, Ref. [2] compared ARIMA and SARIMA with nonlinear methods for hourly short-term forecasting based on AMI data from the KT platform in South Korea, incorporating energy consumption and a public holiday indicator as exogenous variables.
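To make the classical family concrete, the following is a minimal additive Holt–Winters (triple exponential smoothing) sketch of the kind of model benchmarked in [4]. The function name, smoothing constants, and initialization scheme are illustrative choices, not taken from the reviewed studies.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.05, gamma=0.1, horizon=1):
    """Minimal additive Holt-Winters forecaster (illustrative sketch).

    y: observed series, m: season length (e.g., 12 for monthly data).
    Returns `horizon` out-of-sample forecasts.
    """
    assert len(y) >= 2 * m, "need at least two full seasons to initialize"
    # Initialize level, trend, and seasonal indices from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    # Recursive smoothing updates over the remaining observations.
    for t in range(m, len(y)):
        s = season[t % m]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    # h-step-ahead forecast: extrapolated trend plus the matching seasonal index.
    return [level + (h + 1) * trend + season[(len(y) + h) % m]
            for h in range(horizon)]
```

In practice, the reviewed studies rely on tuned library implementations (e.g., statsmodels) rather than hand-rolled smoothers; the sketch only conveys the recursive structure that makes such models cheap, transparent, and competitive at long horizons.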
Machine Learning (ML) and Deep Learning (DL) models are also extensively used. In this context, Ref. [13] addressed hourly solar radiation forecasting in Palestine and Jordan using Spiking Neural Networks (SNNs), as well as models such as LSTM, MLP, CNN, and ARIMAX, based on the National Solar Radiation Database (NSRDB) and enriched with detailed climatic exogenous variables. In a similar vein, Ref. [14] applied ensemble learning techniques combining LSTM and LightGBM to predict solar and wind energy generation at intervals ranging from 10 min to 1 h, using meteorological datasets from the Middle East. Likewise, Ref. [15] employed Residual Neural Networks (ResNets) to predict daily solar irradiance in the Arabian Peninsula, using data from the MODIS satellite, including Aerosol Optical Depth (AOD), Single Scattering Albedo (SSA), and additional meteorological variables. Beyond the domains addressed in this review, ML-based forecasting techniques have also been successfully applied to other energy sources such as geothermal systems [16], indicating that similar modeling strategies transfer consistently across multiple energy domains.
The reviewed studies incorporate short-term forecasting models for Demand Response optimization in AMI infrastructures [2] and energy storage system scheduling to address grid congestion [17]. However, although some studies have explored the bidirectional relationship between power demand and generation mix in long-term macroeconomic planning [3], a consistent limitation across studies is the absence of holistic short-term frameworks capable of jointly modeling decentralized components under anomalous climatic conditions.
The literature shows a transition from classical models toward more complex and hybrid approaches to address their documented limitations. For example, Ref. [18] predicted Turkish net electricity consumption using XGBoost and CatBoost optimized with metaheuristics on a national dataset incorporating variables such as gross income, production, population size, and import/export ratios. Similarly, Ref. [19] optimized LSTM using the Butterfly Optimization Algorithm (BOA), employing datasets such as IHEPC (France) and AEP, enriched with meteorological variables. Alternatively, Ref. [5] combined dimensionality reduction techniques (PCA and ICA) with Random Forest, SVR, Linear Regression, ANN, and LSTM using a synthetic dataset that includes climatic, demographic, and calendar-related variables.
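The metaheuristic tuning layer used in studies such as [18,19] wraps a model's validation error inside a bio-inspired optimizer (the Butterfly Optimization Algorithm in [19]). A minimal particle swarm optimizer illustrates the mechanics; everything below, including the function name, the swarm constants, and the quadratic stand-in objective used in the test, is an illustrative sketch rather than any study's actual implementation.

```python
import random

def pso(objective, bounds, n_particles=20, iters=80,
        w=0.7, c1=1.4, c2=1.4, seed=0):
    """Minimal particle swarm optimizer (illustrative sketch).

    objective: function mapping a position list to a scalar loss.
    bounds: list of (low, high) tuples, one per search dimension.
    Returns the best position found and its loss.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull (own best) + social pull (swarm best).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the hybrid pipelines reviewed here, `objective` would train or validate a forecaster (e.g., an LSTM or gradient-boosted ensemble) for a candidate hyperparameter vector and return its validation error, which is what makes metaheuristic optimization a cost-reducing factor relative to exhaustive grid search.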
Another relevant approach is presented in [20], where a hybrid Autoencoder–LSTM (AE-LSTM) model was developed using one year of real-time measurements from a 100 MW solar power plant, incorporating daily energy generation, maximum grid-connected output, and irradiance data. Furthermore, Ref. [7] introduced a multimodal LLM-based approach for ultra-short-term wind energy forecasting (15 min horizon), using wind farm datasets from Inner Mongolia, Yunnan, and Gansu (China), enriched with Numerical Weather Prediction (NWP) data.
Regarding anomaly and fault detection in energy infrastructures, the analysis indicates that solutions are highly context-dependent. An unsupervised approach is proposed by [8], which applies Denoising Diffusion Probabilistic Models (DDPM) to detect anomalies in nuclear power plants using the synthetic Fuqing Unit 2 Full-Scale Simulator (FU-FS) dataset, consisting of over 2215 monitoring variables under simulated incident conditions. Among supervised approaches, Ref. [9] developed a BO-CNN-LSTM model for fault diagnosis in hydraulic turbines, using a manually labeled acoustic signal dataset covering normal operation, sediment-related anomalies, and physical impact disturbances.
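Far simpler than the DDPM and BO-CNN-LSTM pipelines above, a rolling z-score threshold conveys the core principle shared by unsupervised detectors: score each point against a model of recent normal behaviour and flag large deviations. The window size and threshold below are illustrative defaults, not values from the reviewed studies.

```python
import math

def rolling_zscore_anomalies(xs, window=10, k=3.0):
    """Flag indices whose value deviates from the mean of the preceding
    `window` points by more than `k` standard deviations (illustrative
    sketch; real detectors model multivariate, nonstationary signals)."""
    flagged = []
    for i in range(window, len(xs)):
        ref = xs[i - window:i]
        mean = sum(ref) / window
        std = math.sqrt(sum((x - mean) ** 2 for x in ref) / window)
        if std > 0 and abs(xs[i] - mean) > k * std:
            flagged.append(i)
    return flagged
```

The deep generative and supervised approaches in [8,9] replace the rolling mean with learned reconstructions or classifiers precisely because simple thresholds of this kind break down under the context-dependent, multivariate conditions the review highlights.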
This systematic review synthesizes existing approaches for energy generation and consumption forecasting, as well as anomaly detection in smart grids. The comparative analysis reveals that most studies address these problems in isolation, with limited consideration of bidirectional interactions between generation, consumption, and anomalous conditions, thereby defining a structured gap in integrated modeling frameworks.

3. Methodology

To conduct this systematic review, the PRISMA 2020 methodology was followed in its original form, as specified by [21]; however, the protocol was not pre-registered in a public registry.

3.1. Definition of Research Questions

The first step consists of defining the research questions that guide the systematic search, synthesis, and selection of the analyzed studies. These are formulated to structure the systematic synthesis and comparative evaluation of the reviewed literature.
  • RQ1—What is the comparative performance of statistical, Machine Learning, and Deep Learning models for energy consumption and generation forecasting in smart grid environments?
  • RQ2—How do predictive forecasting techniques contribute to the optimization of Smart Grid management, particularly in Demand Response and operational decision-making?
  • RQ3—How do predictive techniques contribute to the security and monitoring of energy infrastructures, particularly through anomaly and fault detection?

3.2. Systematic Literature Search

Based on the scope defined by the research questions, a rigorous compilation of articles was conducted. The search phase was carried out using relevant academic databases and scholarly search engines, such as Google Scholar and Scopus. The search terms were related to artificial intelligence approaches for energy consumption and generation forecasting, as well as anomaly detection. Multiple systematic search strings were utilized to cover the three topics previously mentioned:
The first query explores energy consumption forecasting using machine learning and deep learning techniques across different temporal horizons, whereas the second focuses on efficiency-oriented approaches, enabling a direct comparison between both perspectives. Query 1 was conducted in October 2025 (earliest retrieved study dated 12 October 2025), and Query 2 was conducted between October and December 2025 (earliest retrieved study dated 29 October 2025).
The third filter explores general energy generation forecasting without restricting the AI technique. The fourth focuses specifically on renewable energy sources, such as photovoltaic and wind energy, across different temporal horizons. The fifth targets efficiency comparisons between classical deep learning and neuromorphic approaches in short-term forecasting. Together, these filters define three complementary research directions. Query 3 was conducted between October and November 2025 (earliest retrieved study dated 19 October 2025), Query 4 was conducted in January 2026 (earliest retrieved study dated 4 January 2026), and Query 5 was conducted in February 2026 (earliest retrieved study dated 8 February 2026).
Through this query, both anomaly and fault detection are analyzed in renewable and non-renewable energy systems using artificial intelligence approaches. Query 6 was conducted in December 2025 (earliest retrieved study dated 17 December 2025).
Through this query, purely statistical approaches such as SARIMA, Holt–Winters, and Kalman filters are evaluated within empirical benchmarking frameworks applied to both energy consumption and generation forecasting across different temporal horizons. Query 7 was conducted in January 2026 (earliest retrieved study dated 4 January 2026, latest dated 10 January 2026).
Table 1 provides a structured and reproducible overview of all seven search strings. The limitations applied to the filters include (1) temporal restrictions, constrained between 2022 and 2026, with a stricter window (2023–2026) for anomaly detection studies; (2) accessibility and validation, including only fully available and peer-reviewed texts; (3) indexing quality, restricting the selection to articles published in high-impact journals (Q1, Q2, or Q3); (4) methodological approach, including only empirical studies based on machine learning, deep learning, or statistical techniques; (5) domain-specific constraints, requiring practical relevance in anomaly detection and excluding purely theoretical or isolated works, as well as studies lacking benchmarking against baseline models; and (6) language restriction, including only articles written in English. All seven search strings, together with the filters and limits described above, were applied consistently across both Google Scholar and Scopus, without database-specific modifications that could bias the retrieval process. The systematic search was conducted between October 2025 and February 2026.

3.3. Study Selection

The study selection process is based on the definition of both inclusion and exclusion criteria, which were used to identify the main articles of this systematic review. The core of the analysis consists of twelve primary studies. For each of these, four additional secondary articles were selected using broader inclusion and exclusion criteria through the snowballing technique. This approach enables structured contextualization and comparative synthesis of the research lines associated with each core study. The inclusion and exclusion criteria applied to the twelve core articles are detailed in Table 2, and those applied to secondary references in Table 3.
The screening process was conducted by a single reviewer (the first author) in two stages: title and abstract screening, followed by full-text eligibility assessment. Inclusion and exclusion decisions were applied systematically based on predefined criteria established prior to the search phase. Although the screening was performed by a single reviewer, inclusion and exclusion decisions were discussed and validated collaboratively among all authors to ensure methodological consistency and reduce potential selection bias. Duplicate records were identified and removed after database retrieval. References were managed using Zotero, and documentation was stored in Google Drive. Artificial-intelligence-based tools were used solely to prioritise the reading order of candidate articles; all screening, inclusion, and exclusion decisions were performed manually.

3.4. Study Grouping and Synthesis Strategy

Following the study selection process, the included articles were systematically grouped to enable structured qualitative synthesis. Studies were first categorized according to the target domain: energy consumption forecasting, energy generation forecasting, and anomaly detection in power systems. Within each domain, studies were further organized based on the model category (statistical, machine learning, deep learning, hybrid, LLM-based, or neuromorphic).
To operationalize this grouping process, key study characteristics (including target domain, model category, and evaluation metrics) were first tabulated and systematically compared against the predefined synthesis groups, ensuring consistent and transparent assignment of each study prior to qualitative synthesis.
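The tabulation step described above amounts to a keyed grouping of extracted study records. A minimal sketch follows; the field names (`domain`, `category`, `id`) are an illustrative representation of the extraction schema, not the actual data items used in the review.

```python
from collections import defaultdict

def group_studies(studies):
    """Group extracted study records by (target domain, model category)
    to form the predefined synthesis groups (illustrative sketch)."""
    groups = defaultdict(list)
    for s in studies:
        groups[(s["domain"], s["category"])].append(s["id"])
    return dict(groups)
```

Each resulting group then contains only studies addressing comparable tasks, which is the precondition for the metric-level comparisons described below.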
Comparative analysis was conducted only among studies addressing comparable tasks and reporting compatible evaluation metrics.
Given the substantial heterogeneity across studies in terms of datasets, temporal resolutions, evaluation protocols, and reported metrics, a quantitative meta-analysis was not considered appropriate. Instead, a structured qualitative synthesis combining narrative analysis and tabular comparison was adopted to ensure methodological consistency and meaningful cross-study interpretation.

3.5. Exploration of Heterogeneity

No formal statistical methods (e.g., subgroup analysis or meta-regression) were applied to explore heterogeneity among study results. However, heterogeneity was qualitatively examined by comparing studies across predefined dimensions, including target domain, model category, dataset characteristics, temporal resolution, and evaluation metrics. This approach enabled the identification of systematic differences in model performance and applicability across heterogeneous study settings.

3.6. Sensitivity Analysis

No formal sensitivity analyses were conducted. However, robustness of the synthesis was ensured through the application of predefined inclusion and exclusion criteria, a minimum quality threshold based on the risk of bias assessment, and consistent selection of representative results from each study. This approach reduces the likelihood that individual studies disproportionately influence the overall conclusions. The absence of formal sensitivity analysis constitutes a recognized limitation of this review, partially mitigated by the systematic and transparent application of the predefined methodological framework described above.

3.7. Reporting Bias Assessment

To mitigate the risk of bias arising from missing or selectively reported results, several measures were adopted. The application of a strict quality threshold—requiring inclusion in Q1–Q3 JCR-ranked journals for core studies—ensures that the included corpus meets a minimum standard of peer-reviewed methodological transparency, substantially reducing the probability of incorporating studies with undisclosed selective reporting. For secondary references, preprint sources were occasionally consulted to avoid excluding relevant emerging work prior to formal publication. Additionally, the extraction of limitations, weaknesses, and negative results from each included study was conducted systematically for every core article, as reflected in Table 6, ensuring that the synthesis reflects both the strengths and the documented shortcomings of the analyzed approaches without selectively emphasizing favorable outcomes.

3.8. Certainty of Evidence Assessment

Given the absence of a standardized framework for assessing certainty of evidence in artificial intelligence applications within energy systems, confidence in the body of evidence was evaluated through a combination of complementary approaches.
First, higher certainty was assigned to conclusions supported by multiple independent studies reporting convergent findings across sufficiently large and diverse datasets, distinct geographical contexts, and varying temporal resolutions. The frequency and prominence of exceptions were also considered: isolated discrepancies attributable to context-specific factors reduced confidence in the affected conclusions without invalidating them, whereas systematic divergences across studies of the same domain were explicitly flagged as limitations of the synthesis.
Second, confidence was further modulated by examining the consistency of results across studies addressing comparable tasks within the same domain. When multiple studies reported convergent findings using different architectures or datasets, the certainty of the corresponding conclusion was considered higher. Conversely, when discrepancies were observed across studies within the same domain—particularly those attributable to geographical context, dataset origin, or data availability constraints such as the scarcity of real-world fault data in anomaly detection—the confidence assigned to the affected conclusions was treated with greater caution and explicitly acknowledged as a limitation during the synthesis.
This combined approach ensured that the certainty of evidence reflected not only the methodological quality of individual studies, but also the coherence and contextual comparability of the evidence base as a whole.

3.9. Study Quality and Risk of Bias Evaluation

To ensure methodological rigor and compliance with PRISMA 2020 recommendations, a structured assessment framework was designed to evaluate both the methodological quality and the risk of bias of each study during the full-text eligibility phase. Given the absence of standardized tools specifically tailored to artificial intelligence applications in energy forecasting and anomaly detection, an ad hoc questionnaire was developed, grounded in the research gaps identified in the introduction and aligned with the predefined inclusion criteria.
The proposed instrument follows a hybrid approach, simultaneously capturing dimensions of methodological quality and potential sources of bias at the study level. The questionnaire used is as follows:
  • Q1: Does the article directly address electric energy consumption forecasting, energy generation prediction, or anomaly detection in power systems? (to mitigate selection and relevance bias by ensuring strict alignment with the predefined research domains)
  • Q2: In the case of studies predicting energy consumption or generation, do they utilize real-world time-series datasets (such as AMI, SCADA, or meteorological data) to train and validate their models? (to reduce data validity bias by prioritizing empirical evidence derived from real operational environments)
  • Q3: In the case of studies detecting anomalies and faults, do they utilize synthetic datasets generated coherently based on plausible values and realistic distributions? (to mitigate data realism bias by ensuring that simulated data adequately represent real-world fault conditions)
  • Q4: In the case of studies not based on purely statistical approaches, do they address specific forecasting limitations by proposing advanced frameworks (e.g., metaheuristic hyperparameter tuning, dimensionality reduction techniques such as PCA/ICA, or energy-efficient architectures)? (to reduce methodological bias by ensuring that model complexity is justified and relevant limitations are explicitly addressed)
  • Q5: In the case of studies based on purely statistical approaches, do they provide a detailed benchmark against more complex solutions, justifying their advantages and limitations? (to mitigate methodological bias by requiring comparative validation against state-of-the-art approaches)
  • Q6: Does the proposed method demonstrate clear operational applicability to ensure grid reliability, infrastructure safety, or optimal energy management in real-world scenarios? (to reduce external validity bias by ensuring practical relevance and real-world applicability of the proposed methods)
Each question was evaluated using a binary scoring scheme, assigning one point for each affirmative response ("Yes") and zero otherwise, resulting in a maximum score of six points per study. Questions not applicable to a given study type—for instance, Q2 for anomaly detection studies or Q5 for non-statistical approaches—were scored as affirmative, on the grounds that the absence of a particular bias dimension does not constitute a methodological weakness. This scoring system enabled a direct operationalization of the risk-of-bias assessment.
Based on this framework, studies were classified into two categories:
  • Low risk of bias (high methodological quality): 4–6 points
  • High risk of bias (low methodological quality): 0–3 points
A minimum threshold of four points was required for an article to be included as a core study in this systematic review. The risk of bias assessment was also considered during the interpretation of results, particularly when comparing studies with heterogeneous data sources, validation strategies, and levels of operational applicability.
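The scoring rule above reduces to a short function. The dictionary encoding below (True for "Yes", False for "No", None for a not-applicable item scored as affirmative) is an illustrative representation of the protocol, not a published assessment tool.

```python
def risk_of_bias_score(answers):
    """Score the six-item questionnaire (illustrative sketch).

    answers maps "Q1".."Q6" to True ("Yes"), False ("No"), or None
    (not applicable, scored as affirmative per the review protocol).
    """
    score = sum(1 for q in ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6")
                if answers[q] in (True, None))
    label = "low risk of bias" if score >= 4 else "high risk of bias"
    return score, label
```

For example, an anomaly detection study (Q2 not applicable) answering "Yes" to Q1, Q3, and Q4, with Q5 not applicable and "No" to Q6, scores five points and is classified as low risk of bias, above the four-point inclusion threshold.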
The evaluation was conducted at the study level through full-text analysis. The initial assessment was performed by a single reviewer (the first author), following a structured and predefined protocol. To mitigate potential subjectivity and confirmation bias, the evaluation outcomes were subsequently reviewed and discussed collaboratively among all authors until consensus was reached.
Artificial-intelligence-based tools were used exclusively during the preliminary screening phase to assist in prioritizing the reading order of candidate articles. These tools did not influence the quality assessment, risk of bias evaluation, data extraction, or inclusion decisions.
This hybrid quality and risk-of-bias assessment framework ensures alignment with PRISMA 2020 requirements while providing a domain-specific, reproducible, and transparent evaluation strategy adapted to the methodological characteristics of artificial intelligence research in energy systems.

3.10. Data Items (Outcomes)

The outcomes for which data were sought were defined according to the nature of the predictive task addressed in each study, distinguishing between regression-based forecasting problems (energy consumption and generation) and classification-based anomaly detection tasks. These evaluation metrics constitute the effect measures used for the synthesis and comparison of the included studies.
For energy consumption and generation forecasting studies, which involve continuous time-series prediction, the primary outcomes correspond to standard regression performance metrics, including Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Additional commonly reported metrics, such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE), were also systematically collected. When available, complementary metrics—such as Mean Bias Error (MBE), normalized RMSE (nRMSE), Willmott Index (WI), Mean Percentage Error (MPE), and directional accuracy measures (e.g., A10, PCD)—as well as computational or energy efficiency indicators, were extracted to provide a more comprehensive evaluation of model performance.
For anomaly detection studies, which are formulated as classification problems, the primary outcomes include F1-score, Precision, and Recall. Secondary outcomes such as Accuracy, Area Under the ROC Curve (AUC), and False Positive Rate (FPR) were also collected when reported.
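For reference, the primary outcome metrics collected in this review follow their standard textbook definitions; the sketch below implements them in a few lines, with synthetic example values.

```python
import math


def mae(y, yhat):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r2(y, yhat):
    """Coefficient of determination (1 - SS_res / SS_tot)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def precision_recall_f1(tp, fp, fn):
    """Classification outcomes from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Synthetic forecast and anomaly-detector confusion counts:
y, yhat = [2.0, 3.0, 4.0, 5.0], [2.5, 2.5, 4.5, 5.0]
print(mae(y, yhat))  # 0.375
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
```

These definitions are what make cross-study comparison possible: two studies reporting RMSE on the same series and horizon are directly comparable, whereas studies reporting incompatible or non-standard metrics were excluded under the predefined criteria.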
All outcomes compatible with each domain were sought for every included study. In the case of forecasting models, no restriction was imposed on the temporal prediction horizon; instead, all reported horizons (e.g., short-term, long-term, and multi-step predictions) were systematically extracted and documented to enable comparative analysis of performance degradation across time scales.
When multiple evaluation metrics or experimental configurations were reported within a single study, the extraction process prioritized the most representative and methodologically consistent results, ensuring fair and transparent comparison across studies. This typically involved selecting results associated with the main proposed model and standard evaluation settings while avoiding selective reporting of only the most favorable outcomes.
Studies that did not report sufficient or comparable quantitative evaluation metrics, or that failed to enable meaningful cross-study comparison, were excluded during the selection process in accordance with the predefined inclusion and exclusion criteria.
To mitigate reporting bias, all predefined outcomes, including negative results and reported limitations, were systematically extracted and documented for each study, ensuring that the synthesis reflects both the strengths and weaknesses of the analyzed approaches without selective outcome reporting.
To ensure consistency in the synthesis and presentation of results, several data preparation procedures were applied. In line with the outcome selection criteria described above, when studies reported multiple evaluation metrics or experimental configurations, the results associated with the main proposed model under standard evaluation settings were retained for the synthesis.
In cases of missing or incomplete reporting of summary statistics, no imputation or estimation was performed, in accordance with the predefined no-inference policy. Such instances were explicitly documented as “not reported” to avoid introducing interpretative bias.
Additionally, no numerical transformations or unit conversions were required, as the extracted evaluation metrics (e.g., MAE, RMSE, R², F1-score) are standardized across the literature. When necessary, results were qualitatively harmonized to ensure comparability across studies with different temporal horizons or dataset characteristics without altering the original reported values.
It should be noted that, owing to the computational and predictive nature of the included studies—focused on machine learning algorithms and time-series forecasting rather than clinical or social interventions—the conventional reporting of summary statistics per intervention group and effect size estimates with confidence intervals is not applicable. The included studies do not compare intervention and control groups; instead, individual study results were extracted and reported using the standard predictive performance and algorithmic error metrics employed in this research domain, as detailed above. When available, measures of statistical precision—including confidence intervals, p-values from paired significance tests, and model confidence set assessments—were also systematically extracted and are reported alongside the primary performance metrics in Table 6. Where no such measures were reported by the original authors, this absence was explicitly documented.

3.11. Data Presentation and Tabulation

To ensure a structured and transparent presentation of the synthesized evidence, the extracted data were organized into both tabular formats and narrative summaries.
Two complementary tables were constructed to support the synthesis. Table 5 summarizes the methodological characteristics of each study, including energy domain, dataset origin, temporal resolution, model architecture, model category, and optimization strategy. Table 6 presents the evaluation metrics, main results, and reported limitations, enabling direct cross-study comparison of performance and methodological trade-offs.
In addition, a PRISMA 2020 flow diagram (Figure 1) was used to visually represent the study selection process. No advanced graphical transformations or statistical visualizations were applied, as the synthesis follows a qualitative comparative approach based on heterogeneous study designs.

3.12. Data Items (Other Variables)

In addition to the predefined outcomes, a comprehensive set of non-outcome variables was systematically extracted to characterize the methodological, contextual, and operational properties of each included study.
In line with PRISMA 2020 recommendations, study-level characteristics were adapted to the engineering domain to reflect the operational environment and underlying energy infrastructure. Accordingly, variables describing the application context were collected, including system type (e.g., smart grids, photovoltaic plants, wind farms, hydro-turbines, or nuclear power plants), energy source, and geographic location. These variables enable contextual interpretation of model performance across heterogeneous deployment scenarios.
The extracted variables were organized into two main categories to ensure methodological clarity and consistency with the comparative tables presented in this study. First, algorithmic and methodological variables were collected, including model architecture, model category (statistical, machine learning, deep learning, hybrid, LLM-based, or neuromorphic), optimization strategies, and the target energy domain. These variables are summarized in Table 5. Second, data and contextual variables were extracted, including dataset origin (real-world, simulated, or experimental), dataset size, temporal resolution, geographic scope, and input features or exogenous variables. These characteristics are detailed in Table 10 and Table 11.
No data were collected regarding funding sources of the included studies, as these were not considered methodologically relevant for evaluating predictive performance, model behavior, or comparative validity within the scope of this review.
A strict no-inference policy was applied throughout the data extraction process. Only explicitly reported information in the full-text articles was recorded. Missing, incomplete, or unclear data were not estimated, reconstructed, or inferred from figures or secondary descriptions; instead, such cases were systematically documented as “not reported” (or “—” in the corresponding tables).
In cases of ambiguity, particularly regarding dataset descriptions, temporal horizons, or metric definitions, information was recorded exactly as presented by the original authors to avoid interpretative bias. However, studies exhibiting methodological ambiguity that prevented fair quantitative comparison—such as inconsistent evaluation protocols or non-standardized metrics—were excluded during the eligibility phase in accordance with the predefined exclusion criteria.
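The no-inference policy above can be illustrated with a minimal extraction routine; the field names and the example record are hypothetical.

```python
# Sketch of the no-inference extraction policy: only explicitly reported
# values are recorded, and missing fields are marked rather than estimated.
# Field names and the example record are illustrative.

NOT_REPORTED = "—"  # marker used in the comparative tables


def extract(reported, fields):
    """Record each predefined field verbatim; never impute missing values."""
    return {f: reported.get(f, NOT_REPORTED) for f in fields}


# Hypothetical study reporting only some of the predefined variables:
fields = ["dataset_origin", "temporal_resolution", "RMSE", "R2"]
reported = {"dataset_origin": "real-world (AMI)", "RMSE": 0.042}
row = extract(reported, fields)
print(row["temporal_resolution"])  # —
```

The design choice is deliberate: a visible "not reported" marker preserves the distinction between absent evidence and negative evidence, which an imputed value would erase.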
This structured extraction strategy ensured consistency between the narrative synthesis and the tabulated data, enabling transparent and reproducible comparison of methodological approaches, data characteristics, and operational contexts across all included studies.

4. Study Extraction and Selection

Once the methodological criteria had been established, the research questions were addressed through structured data synthesis based on in-depth reading and comparative analysis of the selected studies. To ensure a reproducible, empirical, and objective review process, a set of variables was predefined to standardize the selection and analysis criteria.

4.1. Variables to Be Extracted

The comparative analysis of the selected studies reveals several recurring methodological dimensions across the literature. Firstly, dataset characteristics (V1) highlight the heterogeneous operational environments in which forecasting and anomaly detection models are evaluated. These datasets vary significantly in geographical scope, temporal resolution, and data origin, ranging from real-world measurements obtained through infrastructures such as Advanced Metering Infrastructure (AMI) and SCADA systems to controlled simulated environments designed to reproduce rare disturbances or critical failure scenarios.
Secondly, the application domain addressed by the studies (V2) is heterogeneous. Some contributions focus on electricity consumption forecasting, while others address renewable energy generation prediction—particularly photovoltaic and wind systems—or anomaly and fault detection in power infrastructures. This diversity reflects the multifaceted nature of smart grid management, in which forecasting accuracy, operational planning, and infrastructure monitoring are treated as interconnected analytical dimensions.
Another relevant dimension is the integration of exogenous variables (V3), which aim to capture the complex dynamics of energy systems. Many studies incorporate meteorological factors such as temperature, wind speed, or solar irradiance, while others extend the modeling framework by including economic, demographic, or industrial indicators.
To further characterize the role of exogenous variables across domains, Figure 2 summarizes their distribution in the reviewed core studies. The figure reveals a clear asymmetry: while consumption forecasting integrates a heterogeneous combination of meteorological, socioeconomic, and mixed variables, generation forecasting relies predominantly on meteorological inputs, and anomaly detection is mainly driven by technical or intrinsic system variables.
Consequently, the predictive models employed (V4) vary considerably across the literature. Traditional statistical models—such as SARIMA or Holt–Winters—coexist with more advanced Machine Learning and Deep Learning architectures, including XGBoost, LSTM networks, Residual Neural Networks (ResNets), and diffusion-based probabilistic models, reflecting the methodological diversity observed across the reviewed studies.
Despite these differences, a substantial proportion of the reviewed studies report competitive predictive performance (V5), typically evaluated using standard metrics such as MAPE, RMSE, or the coefficient of determination (R²). However, notable disparities arise when computational efficiency and model complexity are considered (V6), highlighting potential limitations for large-scale deployment or real-time applications in operational grid environments. To address these constraints, several studies incorporate advanced optimization strategies—such as bio-inspired metaheuristics—or explore alternative architectures such as Spiking Neural Networks (SNNs), which are used to improve predictive performance and computational efficiency through hyperparameter optimization and data representation.

4.2. Data Extraction and Selection Diagram

A diagram illustrating the process described below is shown in Figure 1. Following the systematic search conducted across Google Scholar and Scopus, a total of 50 articles were pre-selected for detailed full-text review. This set was derived from the seven search filters by applying a standardized screening rule during the title and abstract review stage: up to five articles were selected from Google Scholar and up to three from Scopus per filter. This initial pool reflects a deliberate and reproducible retrieval strategy: a consistent per-filter limit across both databases ensured that selection was driven by relevance ranking rather than database volume, yielding a methodologically homogeneous corpus. The subsequent snowballing expansion to 60 articles further suggests that the core pool was sufficiently representative, as the additional 48 secondary references were captured through citation tracing rather than through gaps in the primary search. These studies were published in high-impact JCR-indexed journals across major scientific publishers, primarily Elsevier, IEEE, Springer Nature, and MDPI.
During the initial deduplication stage, two articles were removed due to explicit duplication between the Google Scholar and Scopus databases, specifically within the consumption and statistical forecasting filters. After removing these duplicated entries, a total of 48 unique manuscripts remained for the rigorous application of the inclusion and exclusion criteria through full-text analysis.
The application of the predefined inclusion and exclusion criteria resulted in the exclusion of 20 articles. These studies were primarily discarded because they evaluated predictive models in isolation, lacked rigorous empirical comparative analysis against baseline algorithms, or failed to validate their frameworks using real-world operational datasets such as AMI, SCADA, or domain-specific meteorological records. Consequently, a refined subset of 28 articles demonstrating stronger methodological consistency and empirical grounding was retained for the subsequent quality evaluation stage.
Subsequently, the six-item study quality evaluation questionnaire defined in the previous section was applied to these 28 articles. This evaluation framework assessed the methodological rigor, operational applicability, and algorithmic contribution of each study. Based on the established minimum acceptance threshold of four out of six points, a total of 10 articles were excluded. Although these studies satisfied several foundational requirements, they generally failed to incorporate advanced optimization frameworks—such as bio-inspired metaheuristic algorithms or dimensionality reduction techniques—or did not demonstrate a clear operational contribution to infrastructure monitoring or grid stability. As a result, a refined pool of 18 articles proceeded to the final selection stage.
The final inclusion phase was conducted to consolidate a balanced and methodologically robust study set. From the 18 articles that successfully passed the quality evaluation stage, a comparative thematic assessment was performed to ensure a coherent distribution across the predefined research domains. In line with the methodological objective of retaining approximately two representative studies per filter, six additional articles were excluded during this final screening stage. Although these works satisfied the minimum quality threshold, they were ultimately excluded because other studies presented greater algorithmic novelty, stronger benchmarking procedures, or closer alignment with the analytical focus of this review. Consequently, the definitive corpus of this systematic review consists of exactly 12 core articles. One of these—[15]—was not identified through the primary database search filters but was retrieved during the snowballing phase of [14] and subsequently incorporated as a core study on the grounds of its direct methodological relevance to the energy generation forecasting domain.
Once the definitive core set of 12 main articles had been consolidated and their relevant information extracted, a rigorous snowballing technique was applied to each of them. Specifically, four highly relevant secondary references were selected from the state-of-the-art citations of each core study in order to provide methodological background, algorithmic foundations, and baseline comparisons. Through this process, an additional 48 secondary studies were incorporated, resulting in a final theoretical and methodological framework composed of 60 articles for the complete systematic review.
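As a compact consistency check, the selection funnel described above can be reproduced arithmetically; the counts are those reported in this section.

```python
# Selection funnel of this review, step by step (counts as reported above).
retrieved = 50                       # pre-selected after title/abstract screening
after_dedup = retrieved - 2          # Google Scholar / Scopus duplicates removed
after_criteria = after_dedup - 20    # inclusion/exclusion criteria (full text)
after_quality = after_criteria - 10  # six-item questionnaire, threshold of 4/6
core = after_quality - 6             # final thematic balancing stage
secondary = core * 4                 # snowballing: four references per core study
total = core + secondary

print(after_dedup, after_criteria, after_quality, core, total)
# 48 28 18 12 60
```

Each subtraction corresponds to one stage of the PRISMA flow diagram in Figure 1, and the final sum matches the 60-article corpus of the review.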
The synthesized findings remain dependent on the specific datasets employed and the algorithms implemented in each analyzed study.
Table 4 presents the studies that were evaluated through full-text analysis but ultimately excluded from the core corpus during the eligibility, quality assessment, or final thematic selection stages. For each study, the corresponding search string and the primary reason for exclusion are provided to ensure transparency and reproducibility of the selection process. Studies excluded during the earlier title and abstract screening stage are not included in this table. Records of exclusions for search strings 1 and 2 were not systematically documented during the review process and therefore cannot be reported.

4.3. Table for Systematic Literature Review

The comparative summary of the twelve core articles selected through the PRISMA methodology is presented in Table 5 and Table 6. Table 5 synthesizes the methodological profile of each study, including the energy domain, data sources, temporal resolution, proposed architecture, model category, and optimization strategy. Table 6 complements this information with the evaluation metrics, main results, key contributions, and limitations reported by the authors.

4.4. Explanation of the Comparison of Tables

Table 5 and Table 6 provide a structured comparison of the twelve final articles that successfully completed the stages of the aforementioned PRISMA process. The following synthesis compares the core findings, methodological approaches, and limitations reported in the selected studies.
Starting with the datasets used in the articles, their sizes vary widely across studies. For example, the dataset employed by [9] contains 280 samples, while those used by [2,19] contain 316.7 M and 2.07 M samples, respectively. The most common data partition strategy is an 80/20 split for training and testing sets, although [20] additionally incorporates the last 20% as an explicit validation set. These findings indicate a recurrent methodological limitation, namely the limited use of explicit validation sets under real operational conditions. However, one of the inclusion criteria applied—which favors real datasets over synthetic ones—partially mitigates the risk of limited real-world applicability. The most commonly used preprocessing techniques include min–max normalization [2,13,19] and standardization [2,15]. Other relevant preprocessing approaches include PCA/ICA and Random Forest-based feature selection, which provide a well-defined framework for data transformation and enrichment.
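The recurrent preprocessing steps mentioned above (a chronological 80/20 split, min–max normalization, and standardization) can be sketched as follows; the load series is synthetic and the function names are illustrative.

```python
# Sketch of common preprocessing steps in the reviewed core studies.
# The load series is synthetic; function names are illustrative.


def train_test_split(series, train_ratio=0.8):
    """Chronological split, as is standard for time-series forecasting."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def min_max_normalize(series):
    """Map values into [0, 1] using the series minimum and maximum."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]


def standardize(series):
    """Zero-mean, unit-variance scaling."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5
    return [(x - mean) / std for x in series]


# Synthetic hourly load series (arbitrary units):
load = [310.0, 305.5, 298.2, 320.8, 315.1, 301.7, 299.9, 312.4, 308.0, 317.6]
train, test = train_test_split(load)  # 8 training samples, 2 test samples
scaled = min_max_normalize(train)     # training values mapped into [0, 1]
```

Note that the split is chronological rather than random, and that in practice scaling statistics should be fitted on the training partition only (as done here for `scaled`) to avoid information leakage into the test set.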
Regarding the proposed architectures, hybrid models are widespread, with six of the twelve core articles combining multiple components, such as Autoencoder-LSTM, BO-CNN-LSTM, or XGBoost-SSA. Alternative approaches are also represented, including LLM-based models [7], SNNs [13] for energy efficiency, and classical statistical models, which remain competitive for long-horizon forecasting with monthly granularity [4]. Many high-performing architectures incorporate explicit optimization techniques. The most commonly used methods include RAdam and AdamW in the deep learning domain, BOA [19], SSA [18], and BO [9] for metaheuristic optimization, whose computational complexity and scalability constitute recurrent limitations in the reviewed literature [48], and LoRA for efficient LLM fine-tuning [7].
Most of the core articles share a prediction horizon ranging from one hour to one week, whereas anomaly detection studies are not constrained by a specific temporal horizon. This permits partial comparison across studies using common evaluation metrics. The most widely used metrics for forecasting are MAE, RMSE, and R², while anomaly detection studies primarily rely on F1-score, Precision, and Recall. Notably, the authors of [5] base their conclusions on statistical significance rather than isolated numerical improvements, highlighting the importance of generalization. Overall, the highest R² values are reported by [15,18]. Furthermore, the most data-efficient approach is presented by [7], which achieves effective training using only 10% of the full dataset. Finally, one of the most distinctive and energy-efficient solutions is proposed by [13], achieving improvements of approximately 7–9× in efficiency while maintaining competitive predictive performance.
Finally, several cross-cutting limitations emerge from the comparison of the twelve core studies. One of the most recurrent issues is the scarcity of real-world data for anomalous or extreme operational conditions, particularly evident in [8,9], where experimental and simulated datasets limit validation under real infrastructure stress. Similarly, Refs. [15,18] lack geographical transferability, as their architectures were not validated across different climatic or socio-economic regions. Another important concern is the dependency on specialized hardware—the advantages of neuromorphic approaches in [13] diminish on conventional GPUs, while the LLM-based approach of [7] requires substantial computational resources, limiting scalability. Furthermore, only [4,18] partially incorporate socio-economic exogenous variables. Lastly, despite the recurring problem of data scarcity, transfer learning appears only marginally in the reviewed literature, being explicitly considered solely in [8,9].
To complement the tabulated comparison and capture the structural patterns emerging from heterogeneous experimental settings, an ordinal synthesis is introduced. Figure 3 provides a unified abstraction of the reviewed studies by jointly positioning them along two key dimensions: predictive precision and computational efficiency.
The visualization reveals a clear concentration of hybrid and deep learning architectures in the high-precision region, confirming their dominant role in achieving state-of-the-art performance. However, this gain is consistently associated with increased computational cost and reduced real-time applicability, particularly for LLM-based and optimization-intensive models. In contrast, statistical and lightweight machine learning approaches occupy lower-cost regions while maintaining competitive performance, especially in constrained environments and long-horizon forecasting scenarios.
Importantly, no single modeling paradigm consistently dominates across all evaluation dimensions. Instead, each approach provides value under specific operational conditions, reflecting a clear trade-off between predictive precision and computational efficiency. This behavior is also observed in anomaly detection studies, which follow different evaluation criteria and problem formulations compared to forecasting tasks. Consequently, the reviewed studies should be interpreted as complementary solutions rather than hierarchically superior alternatives within energy systems.

5. Results and Discussion

This section presents the results of the systematic review on artificial intelligence approaches to energy consumption and generation forecasting, anomaly detection, and their implications for public decision-making. The analysis encompasses several key aspects, including the impact of the reviewed literature, the indexing coverage of the selected core studies, the characteristics of the datasets employed in each approach, and a critical discussion of the reviewed methodologies.

5.1. Risk-of-Bias Assessment Results

Table 7 presents the results of the study-level risk-of-bias assessment applied to the twelve core articles, evaluated using the six-item hybrid quality questionnaire described in the previous section. All included studies surpassed the minimum acceptance threshold of four points, confirming their methodological adequacy for inclusion in this systematic review.
The assessment reveals three distinct risk profiles across the included corpus. Six studies attained the maximum score of six points [4,8,9,15,18,20], combining real-world or experimentally grounded datasets, advanced modeling frameworks, and demonstrable operational applicability. Five studies scored five points [2,7,13,14,19], each incurring a single methodological limitation: [2] did not propose an advanced modeling framework beyond the comparative evaluation of established algorithms, yielding Q4 = 0, whereas [7,13,14,19] lack explicit evidence of operational deployment in real grid environments, yielding Q6 = 0. The sole borderline case is [5], which attained four points and relies on a simulated Kaggle dataset without demonstrable real-world operational validation, introducing both a data validity risk (Q2 = 0) and an external validity risk (Q6 = 0); its inclusion was nonetheless retained on the grounds that the PCA/ICA dimensionality reduction framework and the cross-model benchmarking structure satisfy the remaining criteria and provide methodologically relevant evidence for the comparative synthesis. Across all twelve studies, the most recurrent sources of bias are geographical concentration—with datasets drawn predominantly from Turkey, the Arabian Peninsula, and China—temporal data scarcity in single-year deployments [20], and the controlled laboratory conditions inherent to anomaly detection studies [8,9], where real-world fault data remain scarce and experimental surrogates are unavoidable. These limitations were systematically considered during the synthesis and are explicitly reflected in the conclusions of this review.
Concerning the risk of bias due to missing results, it could not be ascertained whether any included study collected additional evaluation metrics beyond those reported in the original publications. A limited number of potentially relevant studies were inaccessible due to paywall restrictions, which may have introduced a retrieval bias, favoring institutionally accessible publications. Furthermore, the restriction to Q1–Q3 JCR-ranked journals, while ensuring methodological rigor, may have systematically excluded studies reporting negative or inconclusive findings published in lower-ranked venues. Within the anomaly detection domain, the limited availability of eligible core studies may partially reflect an underrepresentation of empirical work in the published literature, although the conclusions derived from the available evidence remain consistent and methodologically substantiated. These factors collectively constitute a recognized limitation of this review.
Concerning the certainty of evidence, confidence in the synthesized findings was assessed separately for each outcome domain based on the convergence, independence, and contextual diversity of the supporting studies. For energy forecasting outcomes (MAE, RMSE, R², MAPE), the reviewed evidence supports the superior performance of hybrid deep learning architectures with high certainty, as convergent results were independently reported across geographically and methodologically diverse studies [14,18,19,20]; however, the competitiveness of statistical models for long-horizon forecasting rests on moderate certainty, relying primarily on [4] and its secondary references. For anomaly detection outcomes (F1-score, Precision, Recall, AUC), certainty is comparatively lower, as only two core studies address this domain [8,9], both dependent on experimental or simulated datasets, which constrains the generalizability of reported performance to real-world operational environments.
Building upon the qualitative assessment of methodological quality and risk of bias, Table 8 presents a structured cross-study comparison of the performance metrics reported by the selected core articles, providing a descriptive quantitative overview to support the comparative synthesis.
From a real-world deployment perspective, advanced AI models face several practical constraints that limit their direct integration into operational smart grid environments. First, scalability remains a critical challenge, as hybrid deep learning and transformer-based architectures entail high computational requirements, which may hinder their applicability in large-scale, real-time grid operations. Second, data availability remains uneven across domains: while forecasting tasks benefit from extensive historical consumption, generation, and meteorological datasets, anomaly detection is constrained by the structural scarcity of labeled fault data and the resulting reliance on simulated or experimental environments. Although generative approaches such as GANs and diffusion models partially mitigate this limitation through realistic data augmentation, the generalization of these models to real-world operational conditions remains limited. Finally, although predictive performance remains the primary evaluation criterion in the reviewed literature, interpretability should be considered a key requirement for deployment in high-stakes operational and policy contexts, where transparent and explainable outputs are necessary to support decision-making. Therefore, the practical deployment of AI in smart grids depends not only on predictive accuracy, but also on computational feasibility, data availability, and model transparency.

5.2. Analysis of Impact

After analyzing the articles extracted during the systematic literature review process, the impact of the selected studies meeting the inclusion and exclusion criteria was assessed and is presented in Table 9. Citation counts and journal impact factors are used as complementary indicators to assess the influence of the selected core studies.
As shown in Table 9, the core article published in the journal with the highest impact factor is the work by [7], which presents a multimodal approach based on LLMs for ultra-short-term wind energy forecasting. This method effectively integrates textual prompts and temporal numerical data through a Semantic Augmenter, achieving strong performance compared to traditional deep learning benchmarks, supporting its relevance within the reviewed corpus of energy forecasting studies.
Beyond individual examples, Table 9 reveals a heterogeneous impact distribution across the selected core studies. High-impact journals are predominantly associated with recent deep learning and hybrid approaches, particularly in energy forecasting and anomaly detection domains. For instance, works published in Applied Energy and Energy [7,9,18] combine high journal impact factors with strong citation performance, reflecting their relevance within the research community.
In contrast, several recent contributions published in high-quartile journals show relatively low citation counts due to their recency [5,8,13], indicating that citation-based impact should be interpreted in conjunction with publication year. Earlier works, such as [19], accumulate higher citation counts despite being published in lower-impact venues, highlighting the temporal bias inherent in citation metrics.
Overall, the analysis suggests that methodological relevance, journal prestige, and citation counts are not always directly aligned and that recent advances in deep learning, multi-modal modeling, and anomaly detection are gaining visibility but have not yet reached citation maturity within the evaluated time window.

5.3. Analysis of Dataset Characteristics

With the aim of compiling the main datasets employed in each study, two comparative tables are presented. Table 10 shows the original dataset sources, while Table 11 provides an in-depth analysis highlighting their core features, thereby facilitating their comparative interpretation within this review.
Time granularity, sample size, and feature composition are particularly relevant for cross-study comparison; these characteristics are summarized in Table 11.
The dataset analysis presented in Table 10 and Table 11 reveals a geographically diverse corpus spanning three continents, with a notable concentration in the Middle East and China. Eight out of twelve datasets correspond to real-world measurements, while the two anomaly detection studies [8,9] rely on experimental and simulated data, reinforcing the cross-cutting limitation of scarce real-world data for anomalous conditions. A substantial disparity in scale—from 280 samples [9] to over 316 million entries [2]—and in temporal resolution further reflects the heterogeneous operational environments of the domain. Additionally, consumption forecasting studies tend to incorporate socio-economic exogenous variables, whereas generation forecasting studies rely almost exclusively on meteorological parameters.

5.4. In-Depth Analysis of Research Approaches

This section provides a comprehensive examination of the different research lines analyzed throughout the systematic review, comparing the main strengths, limitations, and application contexts of the reviewed approaches for energy forecasting, anomaly detection, and public decision-making in smart grid environments.

5.4.1. Critical Cross-Comparison of Methodological Approaches

Regarding the reporting of individual study results, it should be noted that the computational and engineering nature of the included studies precludes the presentation of summary statistics per intervention group and effect estimates with confidence intervals. The included studies do not compare experimental and control groups; instead, individual study outcomes are reported through standard predictive performance metrics (MAE, RMSE, R 2 , MAPE) for forecasting studies and classification metrics (F1-score, Precision, Recall, AUC) for anomaly detection studies, as systematically compiled in Table 6. Nevertheless, five partial exceptions approach the notion of statistical precision: Ref. [5] employs paired t-tests with reported p-values ( p < 0.05 ) to validate the statistical significance of PCA over ICA; Ref. [4] reports execution times with a 98% confidence margin and utilizes 95% confidence intervals for ACF/PACF parameter estimation; and Ref. [20] applies the Model Confidence Set (MCS) method with associated p-values to assess whether competing models exhibit statistically indistinguishable performance. Additionally, Ref. [18] presents error variability distributions via box plots reporting median, dispersion, and outliers across all hybrid models, and Ref. [7] provides daily RMSE distributions via box plots showing median, quartiles, and maximum errors across datasets. To ensure compliance with PRISMA 2020 Item 19, all available precision and statistical significance indicators have been systematically extracted and documented in the Reported Precision column of Table 6. For the remaining seven core studies, no measures of precision were reported by the original authors; this predominant absence constitutes a methodological limitation inherited from the primary literature and should be taken into account when interpreting the comparative evidence synthesized in this review.
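For illustration, the paired-comparison logic applied in [5] can be sketched in a few lines of standard-library Python; the error arrays below are invented for the example, and a large-sample normal approximation replaces the exact Student's t distribution:

```python
import math
from statistics import NormalDist, mean, stdev

def paired_t(errors_a, errors_b):
    """Paired t statistic on per-sample errors of two competing models,
    with a two-sided p-value via the large-sample normal approximation."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # normal approximation for large n
    return t, p

# Illustrative per-sample absolute errors: model B is systematically
# more accurate than model A on the same evaluation folds.
err_a = [0.30, 0.28, 0.35, 0.31, 0.29, 0.33, 0.32, 0.30, 0.34, 0.28]
err_b = [0.21, 0.20, 0.24, 0.22, 0.19, 0.23, 0.22, 0.21, 0.25, 0.20]
t, p = paired_t(err_a, err_b)
print(t, p < 0.05)
```

Because the comparison is paired on identical evaluation samples, between-sample variance cancels out, which is what allows a small fold count to yield a statistically significant difference.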
Firstly, the performance of predictive algorithms is analyzed in terms of accuracy, computational efficiency, and scalability. Hybrid Deep Learning architectures consistently achieve the highest accuracy—with R 2 values of up to 0.9984 [18]—yet the computational demands of the underlying deep learning components limit real-time deployment in several operational settings, although metaheuristic optimization partially mitigates this burden. In parallel with these developments, recent Non-Intrusive Load Monitoring (NILM) studies have explored multi-task learning formulations that jointly model appliance state detection and power estimation. In this context, expert-based architectures such as SAMNet [49] leverage cross-task dependencies through attention-based feature learning, while mixture-of-experts (MoE) approaches, such as the heterogeneous multi-gate framework in [50], introduce specialized subnetworks to capture diverse load patterns, particularly under low-frequency sampling conditions. These studies indicate a shift from single-task formulations toward modular and specialized learning for modeling heterogeneous appliance behaviors. In contrast, statistical models [4] offer interpretability and faster processing but struggle with non-linear patterns. LLM-based [7] and neuromorphic [13] paradigms emerge as complementary solutions addressing data scarcity and energy efficiency, respectively. These findings directly address RQ1, confirming that no single model category universally outperforms the others; instead, the optimal choice depends on temporal resolution, data availability, and operational context.
Secondly, the role of preprocessing and exogenous variable integration is assessed. A critical asymmetry is observed: consumption studies incorporate socio-economic indicators [4,18], whereas generation studies rely almost exclusively on meteorological parameters. This divergence, combined with the effectiveness of dimensionality reduction techniques such as PCA [5], indicates that the joint integration of both variable types remains underrepresented in bidirectional modeling between consumption and generation—directly addressing RQ2 and supporting more informed Demand Response strategies and operational decision-making.
From a policy perspective, the practical value of AI-based forecasting models does not lie solely in predictive accuracy, but in how performance metrics can be operationalized into decision-making processes. In this context, horizon-dependent error metrics such as RMSE provide differentiated signals for action: low short-term error enables reliable real-time control, whereas higher long-term error reflects structural uncertainty that limits predictive confidence. Complementarily, relative error metrics such as MAPE offer an interpretable measure of relative performance across heterogeneous consumption and generation contexts. Furthermore, uncertainty quantification becomes essential for assessing model reliability, as higher predictive variance directly translates into increased operational risk within smart grid environments.
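These horizon-dependent signals can be made concrete with a minimal standard-library sketch; the load observations and the two forecast series are illustrative values, not data from any reviewed study:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no zero observations."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative hourly load (MW): the short-horizon forecast tracks closely,
# while the long-horizon forecast drifts, producing the differentiated
# error signals discussed above.
obs        = [100.0, 105.0, 110.0, 108.0]
short_term = [101.0, 104.0, 111.0, 107.0]
long_term  = [ 95.0, 112.0, 102.0, 115.0]

print(rmse(obs, short_term), rmse(obs, long_term))
print(mape(obs, short_term), mape(obs, long_term))
```

In this toy case the short-horizon forecast yields an RMSE of 1 MW and a sub-1% MAPE, supporting real-time control, whereas the drifting long-horizon forecast produces errors several times larger on both metrics, signalling the structural uncertainty that limits predictive confidence.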
Finally, the suitability of datasets and their implications for anomaly detection are examined. The anomaly detection domain faces a critical limitation, as both [8,9] rely on experimental and simulated data due to the scarcity of labeled real-world fault data. Combined with the lack of geographical cross-validation in several forecasting studies, this limitation addresses RQ3 by showing that model generalization across diverse operational contexts remains insufficiently demonstrated in the current literature, with direct implications for infrastructure security and public energy decision-making.

5.4.2. Hardware Dependency of Neuromorphic Computing Benefits

The energy efficiency gains reported for Spiking Neural Network (SNN) architectures are contingent on execution over dedicated neuromorphic hardware. Ref. [13] demonstrated that SNNs achieve approximately 7–9× improvements in power efficiency relative to conventional ANNs; however, the authors explicitly acknowledge that these gains diminish substantially when the same models are deployed on standard GPU or CPU infrastructure, in which the sparse, event-driven computation that underpins neuromorphic efficiency cannot be natively exploited. This hardware dependency constitutes a critical barrier to the operational adoption of SNNs in smart grid environments, where neuromorphic processors remain scarce and economically prohibitive at scale.
The reviewed evidence indicates that the practical advantage of SNNs over optimized deep learning pipelines remains restricted to specialized deployments in which dedicated neuromorphic hardware is available. A consistent limitation across studies is that spike-based computational sparsity is not equivalently preserved on conventional GPU or CPU infrastructure, which constrains the broader operational applicability of these models in current energy forecasting systems.

5.4.3. Integrated Narrative Synthesis of Core and Secondary Literature

Table 12 consolidates the forty-eight secondary references incorporated through snowballing, organized by domain, main model, and best reported performance metric, providing a structured reference point for the domain-specific synthesis that follows.
A narrative synthesis was conducted to integrate the evidence provided by the twelve core studies with the forty-eight secondary references obtained through snowballing. Representative references are included to preserve traceability between the narrative synthesis and the underlying PRISMA corpus. Rather than treating both groups separately, this synthesis identifies cross-temporal and cross-method patterns across the three research domains.

5.4.4. Energy Consumption Forecasting

Across energy consumption forecasting, a clear methodological transition is observed from statistical baselines toward hybrid deep learning architectures. The secondary literature anticipates this shift by introducing increasingly complex modeling paradigms, while core studies [2,4,5,18,19] provide empirical consolidation through benchmarking and real-world evaluation. In this context, both core and associated secondary studies show that statistical approaches such as ARIMA and SARIMA remain dominant due to their computational efficiency under linear seasonality [51,57,63,66], but consistently fail under non-linear consumption dynamics [17,58].
This limitation motivates the adoption of neural and sequential architectures, with ANN and LSTM models outperforming classical baselines in complex temporal settings [17,19,59,61,62]. Hybrid approaches further improve performance by combining linear and non-linear modeling capabilities [53,55,60], while metaheuristic optimization enhances convergence and predictive accuracy [18,19,54].
Recent works highlight limitations in feature representation and uncertainty handling. Dimensionality reduction techniques such as PCA and RPCA show mixed effectiveness depending on noise characteristics and climate conditions [56,64,67], while probabilistic ensemble methods are required to address non-stationarity and uncertainty [68]. Additionally, high-dimensional feature modeling through regularized machine learning approaches significantly outperforms manually curated feature sets [65].
A consistent structural limitation across both core and secondary literature is the inability of current models to capture bidirectional feedback between demand evolution and system-level transformations, particularly under renewable integration and electrification scenarios [3].
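The baseline behaviour discussed above, namely that seasonality-aware statistical forecasts excel when demand follows regular cycles, can be illustrated with a standard-library sketch; the synthetic weekly-cycle series and the seasonal-naive forecast (the simplest seasonal special case of the SARIMA family) are illustrative assumptions:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between observed and forecast values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic daily consumption with a weekly cycle (period = 7) plus a mild trend.
period = 7
series = [100 + 10 * math.sin(2 * math.pi * t / period) + 0.1 * t for t in range(28)]

# One-step-ahead forecasts over the last two weeks:
test_idx = range(14, 28)
persistence    = [series[t - 1] for t in test_idx]       # naive: repeat yesterday
seasonal_naive = [series[t - period] for t in test_idx]  # repeat same weekday last week

actual = [series[t] for t in test_idx]
print(mae(actual, persistence), mae(actual, seasonal_naive))
```

On this linearly trending, strictly periodic series the seasonal-naive forecast errs only by the weekly trend increment (0.7 units), far below the persistence error; once consumption dynamics become non-linear or the cycle breaks, this advantage vanishes, which is the failure mode the neural approaches address.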

5.4.5. Energy Generation Forecasting

Energy generation forecasting exhibits a rapid progression toward increasingly complex deep learning architectures. Core studies [7,13,14,15,20] and the secondary literature establish CNN–LSTM hybrids as the dominant baseline due to their ability to jointly model spatial and temporal dependencies [69,74,76].
Performance improvements are increasingly driven by architectural sophistication. Attention mechanisms and feature-weighting strategies enhance forecasting accuracy [80], while multi-kernel convolutional designs capture multi-scale temporal dependencies [72]. Separate-stream feature extraction strategies and modal decomposition techniques improve representation quality and reduce non-stationarity prior to learning [73,79], and non-linear ensemble strategies improve model combination [78].
Sequential and encoder–decoder architectures consistently outperform traditional baselines [81,84], while autoencoder-based approaches improve latent representation [75]. Classical deep learning architectures such as ResNet also show strong performance across time-series tasks [82,83,92,93].
More recent advances indicate a transition toward multi-modal and foundation-model-based approaches. Transformer-based models improve multi-step forecasting [77], while transfer learning and large language model adaptation enhance generalization under limited data [6,52,70]. Multi-modal fusion approaches further extend predictive capabilities [71].
Despite these advances, important limitations remain. High model complexity introduces scalability constraints, and neuromorphic approaches, while energy-efficient, remain constrained by encoding strategies and hardware availability [13].

5.4.6. Anomaly Detection

The anomaly detection domain exhibits the most pronounced methodological evolution, transitioning from statistical approaches to advanced generative models. The secondary literature introduces increasingly complex detection paradigms, while core studies [8,9] provide empirical validation under application-specific constraints. In this context, early approaches rely on linear and rule-based methods, which are computationally efficient but insufficient for high-dimensional and non-stationary data [88,91].
Deep learning approaches improve detection performance by modeling temporal dependencies and latent structures. LSTM-based and VAE models enable anomaly detection through reconstruction-based strategies but suffer from instability and sensitivity to noise [87,89]. CNN-based methods achieve high accuracy in spatial domains but fail to capture temporal dependencies [85].
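The reconstruction-based strategy can be reduced to a standard-library sketch in which a centred moving average stands in for a learned autoencoder or VAE reconstructor; the sensor trace and the injected fault are illustrative:

```python
from statistics import mean, stdev

def detect_anomalies(signal, window=2, k=3.0):
    """Flag indices whose reconstruction error exceeds mean + k * std.
    A centred moving average of neighbouring samples stands in for the
    learned reconstructor used by autoencoder/VAE-based detectors."""
    recon = []
    for i in range(len(signal)):
        lo, hi = max(0, i - window), min(len(signal), i + window + 1)
        neighbors = [signal[j] for j in range(lo, hi) if j != i]
        recon.append(mean(neighbors))
    errors = [abs(s - r) for s, r in zip(signal, recon)]
    threshold = mean(errors) + k * stdev(errors)
    return [i for i, e in enumerate(errors) if e > threshold]

# Smooth sensor trace with one injected fault spike at index 10.
signal = [20.0 + 0.1 * i for i in range(20)]
signal[10] = 35.0
print(detect_anomalies(signal))  # → [10]
```

The key property mirrored here is that no labeled fault data are needed: the threshold is calibrated from the reconstruction errors themselves, which is precisely why this family of methods suits domains where real failures are scarce.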
Generative models further enhance detection capabilities. GAN-based approaches outperform classical baselines but introduce training instability [90]. Diffusion-based models represent a recent advancement, showing promising performance and improved robustness in high-dimensional settings [8,86].
Despite these advances, a fundamental limitation persists: the scarcity of labeled fault data. Real-world failures are rare and difficult to capture, leading to reliance on synthetic data and motivating Sim2Real strategies and generative data augmentation approaches [87,91].

5.4.7. Cross-Temporal and Cross-Method Analysis

A cross-temporal analysis suggests a general three-stage evolution in forecasting-related domains, transitioning from statistical and classical machine learning models (2021–2022) to hybrid deep learning architectures (2023–2024), and more recently to emerging paradigms such as generative models, neuromorphic computing, and foundation models (2025–2026). However, this progression is not uniform across all domains, particularly in anomaly detection, where methodological choices remain strongly conditioned by data availability and application context.
The synthesis shows that secondary literature acts as a leading indicator of methodological innovation, introducing emerging architectures and data paradigms [52,70], while core studies provide empirical consolidation through benchmarking and application-oriented validation, often under real-world or realistically simulated conditions [7,8].
This synthesis consistently reveals a structural fragmentation across the three domains. While methodological advances have significantly improved predictive performance within each domain, the reviewed literature does not provide integrated frameworks capable of jointly modeling consumption, generation, anomaly detection, and decision-making. This limitation is consistently observed across both core and secondary studies and reflects a fundamental characteristic of the current state of the art.

5.4.8. Interpretation in the Context of the Reviewed Evidence

The three principal conclusions of this review are examined below in light of the forty-eight secondary references incorporated through snowballing.
Regarding the first conclusion—that hybrid deep learning architectures optimized with bio-inspired metaheuristics achieve the highest forecasting accuracy at substantial computational cost—the secondary literature confirms the accuracy claim while introducing an important nuance. MLP-PSOGWO architectures attain R 2 values of 0.998 [53], and multi-sequence LSTM with GA/PSO tuning consistently improves multi-step forecasting [60], corroborating the synergy between deep learning and metaheuristic optimization. However, the evidence also indicates that metaheuristics frequently reduce computational cost: LSTM-BOA reduced prediction time by 25–30 % [19], XGBoost-SSA demonstrated superior convergence speed [18], and GA/PSO tuning is presented as a strategy to bypass exhaustive grid searches [60]. The computational burden therefore stems primarily from the deep learning component itself, whereas metaheuristic optimization often acts as a cost-reducing factor.
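The evaluation-budget argument can be illustrated with a standard-library sketch in which a simple explore-then-exploit stochastic search, standing in for GA/PSO tuning, is compared against an exhaustive grid on a toy two-hyperparameter loss surface; the surface and all parameters are illustrative:

```python
import random

# Toy validation-loss surface over two normalized hyperparameters
# (e.g., learning-rate exponent and hidden width); minimum near (0.3, 0.7).
def loss(x, y):
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def grid_search(steps=20):
    evals, best = 0, float("inf")
    for i in range(steps):
        for j in range(steps):
            evals += 1
            best = min(best, loss(i / (steps - 1), j / (steps - 1)))
    return best, evals

def stochastic_search(budget=60, seed=0):
    """Random exploration followed by local refinement around the best
    point found: a deliberately simple stand-in for GA/PSO tuning."""
    rng = random.Random(seed)
    best, best_xy = float("inf"), (0.5, 0.5)
    for k in range(budget):
        if k < budget // 2:                      # exploration phase
            x, y = rng.random(), rng.random()
        else:                                    # exploitation around the best
            x = best_xy[0] + rng.gauss(0, 0.05)
            y = best_xy[1] + rng.gauss(0, 0.05)
        current = loss(x, y)
        if current < best:
            best, best_xy = current, (x, y)
    return best, budget

g_best, g_evals = grid_search()
s_best, s_evals = stochastic_search()
print(g_evals, s_evals)
```

The grid spends 400 loss evaluations, while the guided search reaches a comparable optimum in 60; since each evaluation corresponds to one full model training run in hyperparameter tuning, this is the mechanism by which metaheuristics reduce rather than inflate computational cost.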
Regarding the second conclusion—that consumption studies integrate socio-economic variables while generation studies rely on meteorological inputs—the secondary evidence broadly supports this asymmetry [3,6,55,78]. However, the dichotomy is not absolute: several consumption studies combine economic indicators with weather variables to capture seasonal demand fluctuations [58,65], suggesting that the asymmetry is most pronounced in the selection of primary predictive drivers.
Regarding the third conclusion—that anomaly detection remains constrained by scarce labeled real-world fault data—the secondary references strongly reinforce this limitation, as actual failures are catastrophic and actively prevented [87,91]. The field has responded by shifting toward unsupervised generative models—GANs [90], VAEs [87], and DDPMs [86]—that detect anomalies through reconstruction error without labeled fault data, reinforcing the need for transfer learning and data augmentation to bridge the gap between simulated validation and operational deployment.

5.4.9. Sim2Real Roadmap: Generative Models and Digital Twins for Anomaly Detection

The persistent reliance on experimental and simulated fault data in anomaly detection studies [8,9] highlights the need for a structured Sim2Real transition strategy. Generative Adversarial Networks and Denoising Diffusion Probabilistic Models have demonstrated the capacity to synthesize realistic fault scenarios [86,90], making them natural candidates for constructing digital twin environments that closely replicate operational conditions in photovoltaic plants, wind turbines, and nuclear facilities.
A viable roadmap comprises three sequential stages: (1) generative augmentation, in which GANs or diffusion models are trained on available experimental data to produce diverse and physically plausible fault signatures; (2) digital twin validation, in which augmented datasets are used to calibrate high-fidelity simulation environments against real SCADA and sensor records; and (3) cross-domain transfer, in which models trained within the digital twin are progressively adapted to real infrastructure through domain adaptation techniques. This pipeline would directly address the labeled data scarcity that currently constrains supervised and unsupervised anomaly detection alike, strengthening both model generalizability and infrastructure security in smart grid deployments.
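Stage (1) of this roadmap can be sketched in standard-library Python, using amplitude scaling plus additive jitter of measured fault signatures as a deliberately simple stand-in for GAN or diffusion sampling; the signatures and parameters are illustrative:

```python
import random

def augment_faults(signatures, n_synthetic, sigma=0.05,
                   scale_range=(0.9, 1.1), seed=42):
    """Produce plausible variants of measured fault signatures by amplitude
    scaling plus additive Gaussian jitter: a simple, deterministic stand-in
    for GAN/diffusion sampling in stage (1) of the Sim2Real roadmap."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(signatures)
        scale = rng.uniform(*scale_range)
        synthetic.append([scale * v + rng.gauss(0, sigma) for v in base])
    return synthetic

# Two illustrative measured fault signatures (normalized sensor traces).
measured = [
    [0.0, 0.2, 0.9, 1.0, 0.4, 0.1],
    [0.0, 0.1, 0.5, 1.0, 0.8, 0.2],
]
augmented = augment_faults(measured, n_synthetic=50)
print(len(augmented), len(augmented[0]))
```

A production pipeline would replace this perturbation step with a trained generative model and constrain the outputs against physical plausibility checks before feeding them into the digital twin calibration of stage (2).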

5.5. Limitations of the Review Process and Implications for Practice

Several process limitations should be considered. Screening was conducted by a single reviewer without formal inter-rater reliability, although all decisions were collaboratively validated. The standardized retrieval rule of up to five articles from Google Scholar and up to three from Scopus per search string, while ensuring reproducibility, may have excluded relevant studies ranking below these thresholds. In addition, the structured snowballing procedure applied to the core studies reveals the presence of terminological variability across related research works, particularly in the use of alternative expressions and domain-specific formulations that are not always captured by predefined keyword-based search strings. While the original search strategy was designed to ensure methodological consistency and reproducibility across databases, this observation highlights a potential limitation of keyword-based retrieval approaches in interdisciplinary domains. Systematically incorporating such terminological patterns into expanded search strategies constitutes a methodological refinement that can further enhance retrieval coverage, without affecting the validity of the current core corpus, which remains grounded in a rigorously defined PRISMA 2020-compliant selection process. The protocol was not pre-registered, and exclusion records for Search Strings 1 and 2 were not documented. The restriction to Q1–Q3 JCR journals may have introduced publication bias toward positive results. No formal sensitivity analysis or quantitative heterogeneity assessment was conducted, although these constraints were partially mitigated by the predefined quality threshold.
To address the identified gap regarding the fragmented treatment of energy consumption, generation, anomaly detection, and policy processes in the existing literature, Figure 4 proposes an integrated conceptual framework. Rather than representing a fully implemented modeling approach, this framework synthesizes the current state of the art by explicitly structuring the interactions between these domains within a closed-loop perspective. In this formulation, consumption and generation forecasting define the expected system behavior, while real-time operational data capture the observed system state. Anomaly detection operates at the interface between both layers, identifying deviations, instability patterns, and infrastructure failures arising from load variability and renewable intermittency. These signals are then propagated to the public energy decision-making and planning layer, where they inform actions such as demand response activation, infrastructure investment, and regulatory adjustment. In turn, these decisions modify both consumption dynamics and generation planning, closing the feedback loop and reinforcing the need for integrated, system-level AI-driven analysis.
This integrated perspective provides a structured basis for bridging the gap between predictive modeling and actionable policy design, reinforcing the value of the review as a synthesis of the current state of the art while identifying a coherent pathway for future integration.
Regarding implications for practice and policy, the superiority of hybrid architectures for short-term forecasting suggests that operational smart grid deployment should prioritize metaheuristic-optimized deep learning pipelines where real-time accuracy is critical. The asymmetry in exogenous variable integration implies that energy agencies should promote standardized data collection jointly capturing meteorological, socio-economic, and demographic indicators. For anomaly detection, the persistent dependence on simulated fault data underscores the need for regulatory frameworks incentivizing the systematic collection and anonymized sharing of real-world incident records; in parallel, Sim2Real strategies based on generative models represent a complementary technical pathway to reduce this dependency.
These implications are not only driven by technical constraints but also by the broader regulatory framework currently shaping modern energy systems.
From a policy perspective, the transition toward carbon neutrality introduces additional constraints that directly influence the design requirements of AI-based energy models. The increasing penetration of renewable energy sources, driven by decarbonization targets, amplifies system intermittency and uncertainty, thereby reinforcing the need for high-resolution forecasting, adaptive demand response strategies, and robust anomaly detection mechanisms. In this context, AI models must not only optimize predictive performance but also support system-level objectives aligned with carbon reduction, such as improved integration of variable renewable generation, enhanced grid flexibility, and more efficient energy management. Consequently, the design of AI architectures in smart grids is increasingly conditioned by policy-driven requirements, where decarbonization goals act as an external driver shaping data needs, modeling approaches, and operational priorities.

6. Research Gaps and Future Directions

Despite the rapid advancement of AI-based approaches for energy forecasting and anomaly detection, several structural research gaps remain unresolved. A key limitation lies in the limited transferability of existing models across domains, as most approaches remain highly task-specific and dependent on narrowly defined datasets and operational conditions. This restricts their applicability across heterogeneous environments, including variations in geographical regions, temporal resolutions, and energy system configurations. In this context, recent advances in foundation models and Large Language Models (LLMs) suggest a potential shift toward more flexible and generalizable learning frameworks, capable of integrating heterogeneous data sources and capturing complex dependencies across energy systems. Rather than converging toward a single universal model, future research should focus on developing adaptable and modular architectures that enable cross-domain generalization and efficient knowledge transfer. This transition is particularly relevant in emerging applications such as electric vehicle (EV) charging demand forecasting, where temporal variability and user behavior introduce additional modeling complexity.
From an implementation perspective, the lack of standardized design and evaluation frameworks for emerging paradigms such as Spiking Neural Networks (SNNs) constrains their practical deployment. Although SNNs demonstrate significant potential in terms of energy efficiency, their performance remains highly dependent on encoding schemes, training protocols, and hardware configurations. Establishing standardized benchmarks and reproducible evaluation pipelines is therefore essential to facilitate their integration into real-world smart grid environments. More broadly, future research should prioritize the development of scalable and interoperable AI systems that enable the integration of forecasting, anomaly detection, and decision-making layers within unified operational frameworks, supporting both real-time grid management and long-term infrastructure planning.

7. Conclusions

This systematic review synthesizes the current state of artificial intelligence approaches for energy consumption and generation forecasting, anomaly detection, and their implications for public decision-making in smart grid environments. Through the rigorous application of the PRISMA 2020 methodology, twelve core articles and forty-eight secondary references were selected, analyzed, and cross-compared across three interconnected research domains.
Overall, the synthesis of the reviewed literature reveals three consistent findings across the analyzed domains. First, hybrid deep learning architectures optimized with metaheuristic strategies achieve the highest predictive accuracy, although no single paradigm demonstrates universal superiority. Second, predictive techniques significantly contribute to smart grid optimization, particularly in demand response, yet their effectiveness is constrained by a systematic asymmetry in the integration of exogenous variables between consumption and generation models. Third, anomaly detection approaches remain fundamentally limited by the scarcity of labeled real-world fault data, with emerging generative models partially mitigating this constraint. Collectively, these findings confirm that the primary limitation of the current state of the art lies not in individual model performance, but in the absence of integrated frameworks jointly modeling consumption, generation, anomaly detection, and public decision-making.
Regarding the first objective—to evaluate and compare the efficiency of advanced forecasting models—the results reveal that hybrid architectures combining Deep Learning with bio-inspired metaheuristic optimization consistently achieve the highest predictive accuracy, with R 2 values reaching up to 0.9984, although the computational burden stems primarily from the deep learning components themselves, with metaheuristic optimization frequently acting as a cost-reducing factor. Nevertheless, purely statistical models such as SARIMA and FB Prophet remain competitive for long-horizon monthly forecasting, where interpretability and computational efficiency are prioritized. Furthermore, LLM-based and neuromorphic paradigms emerge as two disruptive directions, addressing data scarcity through few-shot learning and computational sustainability through a seven-to-nine-fold improvement in energy efficiency, respectively. In direct response to RQ1, no single algorithmic paradigm achieves universal dominance; rather, optimal model selection is conditioned by temporal resolution, data availability, and operational context.
Concerning the second objective—to analyze the impact of predictive techniques on Smart Grid management—a critical asymmetry is identified in the integration of exogenous variables: consumption forecasting studies tend to incorporate socio-economic indicators such as GDP or population, whereas generation forecasting studies rely almost exclusively on meteorological parameters. This divergence, combined with the effectiveness of dimensionality reduction techniques and the bidirectional feedback between demand and generation mix identified through the snowball contextual analysis, supports the conclusion for RQ2 that integrating both variable types constitutes a key and largely unexplored direction for improving Demand Response strategies, operational decision-making, and sustainable energy policy formulation.
With respect to the third objective—to identify computational strategies for mitigating infrastructure vulnerability—the anomaly detection domain presents a critical limitation. Both supervised and unsupervised core studies rely on experimental and simulated datasets due to the scarcity of labeled real-world fault data in operational power plants. The contextual analysis further indicates that diffusion-based probabilistic models represent a highly effective paradigm for unsupervised safety-critical anomaly detection, while Bayesian-optimized CNN-LSTM hybrids provide a strong supervised alternative. In response to RQ3, the lack of geographical cross-validation and real-world anomalous data highlights the need for transfer learning frameworks and domain adaptation strategies to ensure model generalization and infrastructure security across diverse operational contexts.
Several future research directions are identified. First, the development of integrated frameworks capable of simultaneously modeling the bidirectional feedback loop between consumption, generation, and public decision-making remains the most significant unaddressed research gap. Second, the exploration of domain-specific foundation models—trained on energy time-series rather than general-purpose language corpora—could enhance the few-shot learning capabilities demonstrated by LLM-based approaches. Third, the systematic incorporation of transfer learning and data augmentation techniques is necessary to address the scarcity of real-world anomalous data in critical energy infrastructure, with Sim2Real pipelines based on generative models emerging as a promising direction to bridge the gap between simulated validation and operational deployment. Finally, the large-scale deployment of neuromorphic computing architectures on dedicated hardware could help reconcile the trade-off between predictive accuracy and computational sustainability, which currently constrains the operational applicability of advanced Deep Learning models in real-time smart grid environments.

Author Contributions

Conceptualization, D.V.A.; methodology, J.Á.R.G.; writing—original draft preparation, D.V.A.; writing—review and editing, C.Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research is part of the International Chair on Trustworthy Artificial Intelligence and Demographic Challenge within the National Strategy for Artificial Intelligence (ENIA), in the framework of the European Recovery, Transformation and Resilience Plan (Ref. TSI-100933-2023-0001). This project is funded by the Secretary of State for Digitalization and Artificial Intelligence and by the European Union (Next Generation). The APC was funded by the University of Salamanca.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article. The working documentation of the screening process, including article-by-article selection records and filter-level breakdowns, is available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the institutional support provided by the University of Salamanca and the International Chair on Trustworthy Artificial Intelligence and Demographic Challenge.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACF – Autocorrelation Function
ADF – Augmented Dickey–Fuller Test
AE – Autoencoder
AMI – Advanced Metering Infrastructure
ANN – Artificial Neural Network
AOD – Aerosol Optical Depth
ARIMA – Autoregressive Integrated Moving Average
ARIMAX – ARIMA with Exogenous Variables
AUC – Area Under the Curve
BO – Bayesian Optimization
BOA – Butterfly Optimization Algorithm
CNN – Convolutional Neural Network
DDPM – Denoising Diffusion Probabilistic Model
DL – Deep Learning
ESS – Energy Storage System
EV – Electric Vehicle
FPR – False Positive Rate
GA – Genetic Algorithm
GAN – Generative Adversarial Network
GHI – Global Horizontal Irradiance
GRU – Gated Recurrent Unit
GS – Google Scholar
GWO – Grey Wolf Optimizer
HVAC – Heating, Ventilation, and Air Conditioning
ICA – Independent Component Analysis
JCR – Journal Citation Reports
LIME – Local Interpretable Model-agnostic Explanations
LLM – Large Language Model
LoRA – Low-Rank Adaptation
LSTM – Long Short-Term Memory
MAE – Mean Absolute Error
MAPE – Mean Absolute Percentage Error
ML – Machine Learning
MLP – Multilayer Perceptron
MSE – Mean Squared Error
NILM – Non-Intrusive Load Monitoring
NSRDB – National Solar Radiation Database
NWP – Numerical Weather Prediction
PACF – Partial Autocorrelation Function
PCA – Principal Component Analysis
PRISMA – Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PSO – Particle Swarm Optimization
RF – Random Forest
RMSE – Root Mean Square Error
RPCA – Robust Principal Component Analysis
SARIMA – Seasonal Autoregressive Integrated Moving Average
SARIMAX – SARIMA with Exogenous Variables
SC – Scopus
SCADA – Supervisory Control and Data Acquisition
SHAP – SHapley Additive exPlanations
SNN – Spiking Neural Network
SSA – Single Scattering Albedo / Sparrow Search Algorithm
SVM – Support Vector Machine
SVR – Support Vector Regression
TBATS – Trigonometric Seasonality, Box–Cox Transformation, ARMA Errors, Trend and Seasonal Components
VAE – Variational Autoencoder
VEP – Virtual Energy Plant
VRE – Variable Renewable Energy

References

  1. Ejuh Che, E.; Roland Abeng, K.; Iweh, C.D.; Tsekouras, G.J.; Fopah-Lele, A. The Impact of Integrating Variable Renewable Energy Sources into Grid-Connected Power Systems: Challenges, Mitigation Strategies, and Prospects. Energies 2025, 18, 689. [Google Scholar] [CrossRef]
  2. Park, M.J.; Yang, H.S. Comparative Study of Time Series Analysis Algorithms Suitable for Short-Term Forecasting in Implementing Demand Response Based on AMI. Sensors 2024, 24, 7205. [Google Scholar] [CrossRef] [PubMed]
  3. Jin, H.; Guo, J.; Tang, L.; Du, P. Long-term electricity demand forecasting under low-carbon energy transition: Based on the bidirectional feedback between power demand and generation mix. Energy 2024, 286, 129435. [Google Scholar] [CrossRef]
  4. Serrano, A.L.M.; Rodrigues, G.A.P.; Martins, P.H.d.S.; Saiki, G.M.; Filho, G.P.R.; Gonçalves, V.P.; Albuquerque, R.D.O. Statistical Comparison of Time Series Models for Forecasting Brazilian Monthly Energy Demand Using Economic, Industrial, and Climatic Exogenous Variables. Appl. Sci. 2024, 14, 5846. [Google Scholar] [CrossRef]
  5. Saeedi, N.; Baharvand, D.; Shirini, K.; Gharehveran, S.S. Prediction of electrical energy consumption using principal component analysis and independent components analysis. J. Supercomput. 2025, 81, 1072. [Google Scholar] [CrossRef]
  6. Peng, X.; Yang, Z.; Li, Y.; Wang, B.; Che, J. Short-term wind power prediction based on stacked denoised auto-encoder deep learning and multi-level transfer learning. Wind Energy 2023, 26, 1066–1081. [Google Scholar] [CrossRef]
  7. Fan, H.; Li, M.; Zhang, Z.; Cheng, L.; Ye, Y.; Liu, W.; Liu, D. M2WLLM: Multi-modal multi-task ultra-short-term wind power prediction algorithm based on large language model. Inf. Fusion 2026, 126, 103541. [Google Scholar] [CrossRef]
  8. Liu, S.; Zhu, Z.; Zhao, X.; Wang, Y.; Sun, X.; Yu, L. Unsupervised anomaly detection for Nuclear Power Plants based on Denoising Diffusion Probabilistic Models. Prog. Nucl. Energy 2025, 178, 105521. [Google Scholar] [CrossRef]
  9. Dao, F.; Zeng, Y.; Qian, J. Fault diagnosis of hydro-turbine via the incorporation of bayesian algorithm optimized CNN-LSTM neural network. Energy 2024, 290, 130326. [Google Scholar] [CrossRef]
  10. Mosavi, A.; Salimi, M.; Faizollahzadeh Ardabili, S.; Rabczuk, T.; Shamshirband, S.; Varkonyi-Koczy, A.R. State of the Art of Machine Learning Models in Energy Systems, a Systematic Review. Energies 2019, 12, 1301. [Google Scholar] [CrossRef]
  11. Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198, 111799. [Google Scholar] [CrossRef]
  12. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy 2017, 105, 569–582. [Google Scholar] [CrossRef]
  13. Ayasi, B.; Vázquez, I.X.; Saleh, M.; Garcia-Vico, A.M.; Carmona, C.J. Application of spiking neural networks and traditional artificial neural networks for solar radiation forecasting in photovoltaic systems in Arab countries. Neural Comput. Appl. 2025, 37, 9095–9127. [Google Scholar] [CrossRef]
  14. Abdul Baseer, M.; Almunif, A.; Alsaduni, I.; Tazeen, N. Electrical Power Generation Forecasting from Renewable Energy Systems Using Artificial Intelligence Techniques. Energies 2023, 16, 6414. [Google Scholar] [CrossRef]
  15. Alwadei, S.; Farahat, A.; Ahmed, M.; Kambezidis, H.D. Prediction of Solar Irradiance over the Arabian Peninsula: Satellite Data, Radiative Transfer Model, and Machine Learning Integration Approach. Appl. Sci. 2022, 12, 717. [Google Scholar] [CrossRef]
  16. Salhein, K.; Kobus, C.J.; Zohdy, M. Forecasting Installation Capacity for the Top 10 Countries Utilizing Geothermal Energy by 2030. Thermo 2022, 2, 334–351. [Google Scholar] [CrossRef]
  17. Tarmanini, C.; Sarma, N.; Gezegin, C.; Ozgonenel, O. Short term load forecasting based on ARIMA and ANN approaches. Energy Rep. 2023, 9, 550–557. [Google Scholar] [CrossRef]
  18. Li, X.; Wang, Z.; Yang, C.; Bozkurt, A. An advanced framework for net electricity consumption prediction: Incorporating novel machine learning models and optimization algorithms. Energy 2024, 296, 131259. [Google Scholar] [CrossRef]
  19. Hora, S.K.; Poongodan, R.; de Prado, R.P.; Wozniak, M.; Divakarachari, P.B. Long Short-Term Memory Network-Based Metaheuristic for Effective Electric Energy Consumption Prediction. Appl. Sci. 2021, 11, 11263. [Google Scholar] [CrossRef]
  20. Zafar, A.; Che, Y.; Ahmed, M.; Sarfraz, M.; Ahmad, A.; Alibakhshikenari, M. Enhancing Power Generation Forecasting in Smart Grids Using Hybrid Autoencoder Long Short-Term Memory Machine Learning Model. IEEE Access 2023, 11, 118521–118537. [Google Scholar] [CrossRef]
  21. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. Declaración PRISMA 2020: Una guía actualizada para la publicación de revisiones sistemáticas. Rev. Esp. Cardiol. 2021, 74, 790–799. [Google Scholar] [CrossRef]
  22. Ramos, P.V.B.; Villela, S.M.; Silva, W.N.; Dias, B.H. Residential energy consumption forecasting using deep learning models. Appl. Energy 2023, 350, 121705. [Google Scholar] [CrossRef]
  23. Nooruldeen, O.; Baker, M.R.; Aleesa, A.M.; Ghareeb, A.; Shaker, E.H. Strategies for predictive power: Machine learning models in city-scale load forecasting. e-Prime-Adv. Electr. Eng. Electron. Energy 2023, 6, 100392. [Google Scholar] [CrossRef]
  24. Sundaram, K.; Sri Preethaa, K.R.; Natarajan, Y.; Muthuramalingam, A.; Ali, A.A.Y. Advancing building energy efficiency: A deep learning approach to early-stage prediction of residential electric consumption. Energy Rep. 2024, 12, 1281–1292. [Google Scholar] [CrossRef]
  25. Devanathan, B.; Jnana Varshitha, K.; Pavan Kumar, L.; Lakshmanan, S.A.; Krishna Prakash, N. Explainable AI Framework Using XGBoost With SHAP and LIME for Multi-Scale Household Energy Forecasting. IEEE Access 2025, 13, 149750–149764. [Google Scholar] [CrossRef]
  26. Moon, Y.; Lee, Y.; Hwang, Y.; Jeong, J. Long Short-Term Memory Autoencoder and Extreme Gradient Boosting-Based Factory Energy Management Framework for Power Consumption Forecasting. Energies 2024, 17, 3666. [Google Scholar] [CrossRef]
  27. Wang, H.; Chen, H.; Yu, F.; Xu, Z.; Peng, M. Anomaly detection and reconstruction of sensors in nuclear power plants based on principal component analysis and improved deep neural networks. Prog. Nucl. Energy 2026, 191, 106088. [Google Scholar] [CrossRef]
  28. Caixeta, B.M.; Guimaraes, J.V.S.A.; Santos, M.C.; Silva, M.C.; Nicolau, A.S.; Schirru, R.; Candeias, D.S.M.; Frazão, M.G.; Castro, J.M. Optimizing deep neural networks for nuclear power plant temperature estimation: A study on feature importance and outlier detection. Prog. Nucl. Energy 2026, 191, 106039. [Google Scholar] [CrossRef]
  29. Elbordany, A.A.; Kandil, M.M.; Youness, H.A.; Abdelaal, H.M. An efficient AI algorithm for fault diagnosis in nuclear power plants based on machine deep learning techniques. Prog. Nucl. Energy 2025, 180, 105580. [Google Scholar] [CrossRef]
  30. Xing, Y.; Wang, J.; Cui, S.; Liu, X.; Song, M. Interpretable Bayesian-optimized Autoencoder for fault detection and diagnosis in nuclear power plants. Prog. Nucl. Energy 2026, 190, 105982. [Google Scholar] [CrossRef]
  31. Su, B.; Zhou, Z.; Chen, H. PVEL-AD: A Large-Scale Open-World Dataset for Photovoltaic Cell Anomaly Detection. IEEE Trans. Ind. Inform. 2023, 19, 404–413. [Google Scholar] [CrossRef]
  32. Khan, P.W.; Yeun, C.Y.; Byun, Y.C. Fault detection of wind turbines using SCADA data and genetic algorithm-based ensemble learning. Eng. Fail. Anal. 2023, 148, 107209. [Google Scholar] [CrossRef]
  33. Leite Coelho da Silva, F.; da Silva Cordeiro, J.; da Costa, K.; Saboya, N.; Canas Rodrigues, P.; López-Gonzales, J.L. Time series forecasting via integrating a filtering method: An application to electricity consumption. Comput. Stat. 2025, 40, 5023–5042. [Google Scholar] [CrossRef]
  34. Alizadegan, H.; Rashidi Malki, B.; Radmehr, A.; Karimi, H.; Ilani, M.A. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit. 2025, 43, 281–301. [Google Scholar] [CrossRef]
  35. Hossain, M.L.; Shams, S.M.N.; Ullah, S.M. Time-series and deep learning approaches for renewable energy forecasting in Dhaka: A comparative study of ARIMA, SARIMA, and LSTM models. Discov. Sustain. 2025, 6, 775. [Google Scholar] [CrossRef]
  36. Moustati, I.; Gherabi, N.; Saadi, M. Time-Series Forecasting Models for Smart Meters Data: An Empirical Comparison and Analysis. J. Eur. Des. Syst. Autom. 2024, 57, 1419–1427. [Google Scholar] [CrossRef]
  37. Qureshi, S.; Shaikh, F.; Kumar, L.; Ali, F.; Awais, M.; Gürel, A.E. Short-term forecasting of wind power generation using artificial intelligence. Environ. Chall. 2023, 11, 100722. [Google Scholar] [CrossRef]
  38. Faruque, M.O.; Hossain, M.A.; Islam, M.R.; Alam, S.M.M.; Karmaker, A.K. Very short-term wind power forecasting using hybrid deep learning model with optimization algorithm. Clean. Energy Syst. 2024, 9, 100129. [Google Scholar] [CrossRef]
  39. Xiong, B.; Lou, L.; Meng, X.; Wang, X.; Ma, H.; Wang, Z. Short-term wind power forecasting based on Attention Mechanism and Deep Learning. Electr. Power Syst. Res. 2022, 206, 107776. [Google Scholar] [CrossRef]
  40. Ye, L.; Dai, B.; Pei, M.; Lu, P.; Zhao, J.; Chen, M.; Wang, B. Combined Approach for Short-Term Wind Power Forecasting Based on Wave Division and Seq2Seq Model Using Deep Learning. IEEE Trans. Ind. Appl. 2022, 58, 2586–2596. [Google Scholar] [CrossRef]
  41. Michalakopoulos, V.; Zakynthinos, A.; Sarmas, E.; Marinakis, V.; Askounis, D. Hybrid short-term wind power forecasting model using theoretical power curves and temporal fusion transformers. Renew. Energy 2026, 256, 124008. [Google Scholar] [CrossRef]
  42. Elshewey, A.M.; Jamjoom, M.M.; Alkhammash, E.H. An enhanced CNN with ResNet50 and LSTM deep learning forecasting model for climate change decision making. Sci. Rep. 2025, 15, 14372. [Google Scholar] [CrossRef]
  43. Ahmed, U.; Mahmood, A.; Khan, A.R.; Kuhlmann, L.; Alimgeer, K.S.; Razzaq, S.; Aziz, I.; Hammad, A. Parallel boosting neural network with mutual information for day-ahead solar irradiance forecasting. Sci. Rep. 2025, 15, 11642. [Google Scholar] [CrossRef]
  44. Ye, A.; Xu, D.; Li, Y.; Du, J.; Wu, Z.; Tang, J. Neuromorphic energy economics: Toward biologically inspired and sustainable power market design. Front. Comput. Neurosci. 2025, 19, 1597038. [Google Scholar] [CrossRef]
  45. Satti, S.; Dharmaraj, G.I. Power Quality Optimization in PV Grid Systems Using Hippopotamus-Driven MPPT and SyBel Inverter Control. Electronics 2025, 14, 4790. [Google Scholar] [CrossRef]
  46. Travieso-González, C.M.; Cabrera-Quintero, F.; Piñán-Roescher, A.; Celada-Bernal, S. A Review and Evaluation of the State of Art in Image-Based Solar Energy Forecasting: The Methodology and Technology Used. Appl. Sci. 2024, 14, 5605. [Google Scholar] [CrossRef]
  47. Alharbi, F.; Alwadie, A. Neuromorphic computing-based model for short-term forecasting of Global Horizontal Irradiance in Saudi Arabia. IEEE Access 2024, 12, 137642–137655. [Google Scholar] [CrossRef]
  48. Pérez-Delgado, M.L.; Román-Gallego, J.Á. Computational complexity of swarm-based algorithms: A detailed analysis. AIMS Math. 2025, 10, 15539–15587. [Google Scholar] [CrossRef]
  49. Liu, Y.; Qiu, J.; Ma, J. SAMNet: Toward Latency-Free Non-Intrusive Load Monitoring via Multi-Task Deep Learning. IEEE Trans. Smart Grid 2022, 13, 2412–2424. [Google Scholar] [CrossRef]
  50. Liang, Z.; Yung Chung, C.; Yang, H.; Liang, J.; Zhang, W.; Dong, H.; Zhu, J. A Heterogeneous Multiple-Experts Approach to Low-Frequency Nonintrusive Load Monitoring. IEEE Trans. Smart Grid 2026, 17, 746–765. [Google Scholar] [CrossRef]
  51. Khan, A.M.; Osinska, M. Comparing forecasting accuracy of selected grey and time series models based on energy consumption in Brazil and India. Expert Syst. Appl. 2023, 212, 118840. [Google Scholar] [CrossRef]
  52. Germán-Morales, M.; Rivera-Rivas, A.; del Jesus Díaz, M.; Carmona, C. Transfer Learning with Foundational Models for Time Series Forecasting Using Low-Rank Adaptations. Inf. Fusion 2025, 123, 103247. [Google Scholar] [CrossRef]
  53. Afzal, S.; Ziapour, B.M.; Shokri, A.; Shakibi, H.; Sobhani, B. Building energy consumption prediction using multilayer perceptron neural network-assisted models; comparison of different optimization algorithms. Energy 2023, 282, 128446. [Google Scholar] [CrossRef]
  54. da Silva, F.L.C.; da Costa, K.; Canas Rodrigues, P.; Salas, R.; López-Gonzales, J.L. Statistical and Artificial Neural Networks Models for Electricity Consumption Forecasting in the Brazilian Industrial Sector. Energies 2022, 15, 588. [Google Scholar] [CrossRef]
  55. Ding, S.; Hipel, K.W.; Dang, Y.g. Forecasting China’s electricity consumption using a new grey prediction model. Energy 2018, 149, 314–328. [Google Scholar] [CrossRef]
  56. Ismail, N.; Abdullah, S. Principal component regression with artificial neural network to improve prediction of electricity demand. Int. Arab. J. Inf. Technol. 2016, 13, 196–202. [Google Scholar]
  57. Kien, D.T.; Huong, P.D.; Minh, N.D. Application of Sarima Model in Load Forecasting in Hanoi City. Int. J. Energy Econ. Policy 2023, 13, 164–170. [Google Scholar] [CrossRef]
  58. Kumar Dubey, A.; Kumar, A.; García-Díaz, V.; Kumar Sharma, A.; Kanhaiya, K. Study and analysis of SARIMA and LSTM in forecasting time series data. Sustain. Energy Technol. Assess. 2021, 47, 101474. [Google Scholar] [CrossRef]
  59. Abbasimehr, H.; Paki, R. Improving time series forecasting using LSTM and attention models. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 673–691. [Google Scholar] [CrossRef]
  60. Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M.A. Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting. Energies 2020, 13, 391. [Google Scholar] [CrossRef]
  61. Le, T.; Vo, M.T.; Vo, B.; Hwang, E.; Rho, S.; Baik, S.W. Improving electric energy consumption prediction using CNN and Bi-LSTM. Appl. Sci. 2019, 9, 4237. [Google Scholar] [CrossRef]
  62. Khan, N.; Haq, I.U.; Khan, S.U.; Rho, S.; Lee, M.Y.; Baik, S.W. DB-Net: A novel dilated CNN based multi-step forecasting model for power consumption in integrated local energy systems. Int. J. Electr. Power Energy Syst. 2021, 133, 107023. [Google Scholar] [CrossRef]
  63. Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N. A comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy 2022, 168, 113097. [Google Scholar] [CrossRef]
  64. Parhizkar, T.; Rafieipour, E.; Parhizkar, A. Evaluation and improvement of energy consumption prediction models using principal component analysis based feature reduction. J. Clean. Prod. 2021, 279, 123866. [Google Scholar] [CrossRef]
  65. Albuquerque, P.C.; Cajueiro, D.O.; Rossi, M.D. Machine learning models for forecasting power electricity consumption using a high dimensional dataset. Expert Syst. Appl. 2022, 187, 115917. [Google Scholar] [CrossRef]
  66. Burg, L.; Gürses-Tran, G.; Madlener, R.; Monti, A. Comparative analysis of load forecasting models for varying time horizons and load aggregation levels. Energies 2021, 14, 7128. [Google Scholar] [CrossRef]
  67. Nyangon, J.; Akintunde, R. Principal component analysis of day-ahead electricity price forecasting in CAISO and its implications for highly integrated renewable energy markets. Wiley Interdiscip. Rev. Energy Environ. 2024, 13, e504. [Google Scholar] [CrossRef]
  68. Berrisch, J.; Ziel, F. Multivariate probabilistic CRPS learning with an application to day-ahead electricity prices. Int. J. Forecast. 2024, 40, 1568–1586. [Google Scholar] [CrossRef]
  69. Chen, Y.; Bhutta, M.S.; Abubakar, M.; Xiao, D.; Almasoudi, F.M.; Naeem, H.; Faheem, M. Evaluation of Machine Learning Models for Smart Grid Parameters: Performance Analysis of ARIMA and Bi-LSTM. Sustainability 2023, 15, 8555. [Google Scholar] [CrossRef]
  70. Lai, Z.; Wu, T.; Fei, X.; Ling, Q. BERT4ST: Fine-tuning pre-trained large language model for wind power forecasting. Energy Convers. Manag. 2024, 307, 118331. [Google Scholar] [CrossRef]
  71. Fan, H.; Zhang, X.; Mei, S.; Chen, K.; Chen, X. M2gsnet: Multi-modal multi-task graph spatiotemporal network for ultra-short-term wind farm cluster power prediction. Appl. Sci. 2020, 10, 7915. [Google Scholar] [CrossRef]
  72. Ren, X.; Zhang, F.; Zhu, H.; Liu, Y. Quad-kernel deep convolutional neural network for intra-hour photovoltaic power forecasting. Appl. Energy 2022, 323, 119682. [Google Scholar] [CrossRef]
  73. Gao, H.; Qiu, S.; Fang, J.; Ma, N.; Wang, J.; Cheng, K.; Wang, H.; Zhu, Y.; Hu, D.; Liu, H.; et al. Short-Term Prediction of PV Power Based on Combined Modal Decomposition and NARX-LSTM-LightGBM. Sustainability 2023, 15, 8266. [Google Scholar] [CrossRef]
  74. Wang, K.; Qi, X. A comparison of day-ahead photovoltaic power forecasting models based on deep learning neural network. Appl. Energy 2019, 251, 113315. [Google Scholar] [CrossRef]
  75. Zhang, Y.; Qin, C.; Srivastava, A.K.; Jin, C.; Sharma, R.K. Data-Driven Day-Ahead PV Estimation Using Autoencoder-LSTM and Persistence Model. IEEE Trans. Ind. Appl. 2020, 56, 7185–7192. [Google Scholar] [CrossRef]
  76. Mukhtar, M.; Oluwasanmi, A. Development and Comparison of Two Novel Hybrid Neural Network Models for Hourly Solar Radiation Prediction. Appl. Sci. 2022, 12, 1435. [Google Scholar] [CrossRef]
  77. Wu, H.; Meng, K.; Fan, D.; Zhang, Z.; Liu, Q. Multistep short-term wind speed forecasting using transformer. Energy 2022, 261, 125231. [Google Scholar] [CrossRef]
  78. Chen, J.; Zeng, G.Q.; Zhou, W.; Du, W.; Lu, K.D. Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. Energy Convers. Manag. 2018, 165, 681–695. [Google Scholar] [CrossRef]
  79. Xiong, B.; Meng, X.; Wang, R. Combined Model for Short-term Wind Power Prediction Based on Deep Neural Network and Long Short-Term Memory. J. Phys. Conf. Ser. 2021, 1757, 012095. [Google Scholar] [CrossRef]
  80. Lai, C.S.; Zhong, C. A deep learning based hybrid method for hourly solar radiation forecasting. Expert Syst. Appl. 2021, 177, 114941. [Google Scholar] [CrossRef]
  81. Machalek, D.; Tuttle, J.; Andersson, K.; Powell, K.M. Dynamic energy system modeling using hybrid physics-based and machine learning encoder–decoder models. Energy AI 2022, 9, 100172. [Google Scholar] [CrossRef]
  82. Alzahrani, A.; Shamsi, P.; Dagli, C.; Ferdowsi, M. Solar Irradiance Forecasting Using Deep Neural Networks. Procedia Comput. Sci. 2017, 114, 304–313. [Google Scholar] [CrossRef]
  83. Benali, L.; Notton, G.; Fouilloy, A.; Voyant, C.; Dizene, R. Solar radiation forecasting using artificial neural network and random forest methods: Application to normal beam, horizontal diffuse and global components. Renew. Energy 2019, 132, 871–884. [Google Scholar] [CrossRef]
  84. Mukhoty, B.P.; Maurya, V.; Shukla, S.K. Sequence to sequence deep learning models for solar irradiation forecasting. In 2019 IEEE Milan PowerTech; IEEE: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
  85. Kellil, N.; Aissat, A.; Mellit, A. Fault diagnosis of photovoltaic modules using deep neural networks and infrared images under Algerian climatic conditions. Energy 2023, 263, 125902. [Google Scholar] [CrossRef]
  86. Yong, S.; Linzi, Z. Robust deep auto-encoding network for real-time anomaly detection at nuclear power plants. Process Saf. Environ. Prot. 2022, 163, 438–452. [Google Scholar] [CrossRef]
  87. Kim, H.; Arigi, A.M.; Kim, J. Development of a diagnostic algorithm for abnormal situations using long short-term memory and variational autoencoder. Ann. Nucl. Energy 2021, 153, 108077. [Google Scholar] [CrossRef]
  88. Zhu, G.N.; Ma, J.; Hu, J. A fuzzy rough number extended AHP and VIKOR for failure mode and effects analysis under uncertainty. Adv. Eng. Inform. 2022, 51, 101454. [Google Scholar] [CrossRef]
  89. Du, Z.; Liang, X.; Chen, S.; Zhu, X.; Chen, K.; Jin, X. Knowledge-infused deep learning diagnosis model with self-assessment for smart management in HVAC systems. Energy 2023, 263, 125969. [Google Scholar] [CrossRef]
  90. Plakias, S.; Boutalis, Y.S. Exploiting the generative adversarial framework for one-class multi-dimensional fault detection. Neurocomputing 2019, 332, 396–405. [Google Scholar] [CrossRef]
  91. Chen, S.; Peng, M.; Xiong, H.; Wu, S. An anomaly detection method based on Lasso. Clust. Comput. 2019, 22, 5407–5419. [Google Scholar] [CrossRef]
  92. Wang, Z.; Yan, W.; Oates, T. Time Series Classification from Scratch with Deep Neural Networks. In 2017 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2017; pp. 1578–1585. [Google Scholar] [CrossRef]
  93. Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef]
Figure 1. PRISMA 2020 flow diagram illustrating the study selection process.
Figure 2. Distribution of exogenous variable types across domains. Consumption forecasting integrates a broader mix of meteorological, socioeconomic, and hybrid variables, whereas generation forecasting relies predominantly on meteorological inputs. Anomaly detection studies are primarily based on technical or intrinsic system variables.
Figure 3. Ordinal positioning of core studies by predictive precision and computational efficiency. Points are categorized by task domain and scaled by real-time applicability. Hybrid and deep learning models cluster in high-precision regions with higher computational cost, while statistical and lightweight approaches occupy lower-cost regions with competitive performance.
Figure 4. Conceptual framework integrating energy consumption forecasting, generation forecasting, anomaly detection, and public energy decision-making and planning within a unified smart grid perspective. The framework highlights the bidirectional feedback between consumption and generation, the influence of load variability and renewable intermittency on system stability, and the role of anomaly detection in identifying infrastructure vulnerabilities. AI techniques are represented as a transversal layer supporting policy-relevant decision-making processes, including demand response, infrastructure investment, and regulatory strategies.
Table 1. Seven systematic search strings with execution dates, databases, temporal filters, and applied limits.
SS | Search String | Execution Date | DB | Filters and Limits
SS1 | “electric energy consumption forecasting” AND (“machine learning” OR “deep learning”) AND (“short-term” OR “long-term”) | October 2025 (earliest: 12 October 2025) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical ML/DL/statistical; English
SS2 | (“energy consumption forecasting” OR “electricity consumption prediction”) AND (“machine learning” OR “deep learning”) AND (“PCA” OR “principal component analysis” OR “feature weighting” OR “feature importance”) AND (“efficiency” OR “training time” OR “performance”) | October–December 2025 (earliest: 29 October 2025) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical ML/DL/statistical; English
SS3 | (“energy generation forecasting” OR “electric energy generation forecasting” OR “power generation forecasting” OR “power output prediction”) AND (“artificial intelligence” OR “machine learning” OR “deep learning” OR “AI”) | October–November 2025 (earliest: 19 October 2025) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical ML/DL/statistical; English
SS4 | (“power generation forecasting” OR “energy generation forecasting” OR “solar power prediction” OR “photovoltaic forecasting” OR “wind power forecasting”) AND (“machine learning” OR “deep learning” OR “artificial intelligence”) AND (“short-term” OR “long-term” OR “multi-step”) | January 2026 (earliest: 4 January 2026) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical ML/DL/statistical; renewable sources; English
SS5 | (“solar radiation forecasting” OR “solar irradiance forecasting” OR “photovoltaic forecasting” OR “PV forecasting”) AND (“spiking neural network” OR “SNN” OR “neuromorphic computing”) AND (“artificial neural network” OR “ANN” OR “deep learning”) AND (“comparison” OR “energy efficiency” OR “computational efficiency” OR “power consumption”) AND (“short-term” OR “hourly”) | February 2026 (earliest: 8 February 2026) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical; explicit SNN vs. ANN efficiency comparison required; English
SS6 | (“photovoltaic” OR “wind turbine” OR “power plant”) AND (“fault detection” OR “anomaly detection”) AND (“deep learning” OR “machine learning”) | December 2025 (earliest: 17 December 2025) | GS, SC | 2023–2026; peer-reviewed; Q1–Q3 JCR; empirical; operational applicability required; no purely theoretical works; English
SS7 | (“energy consumption forecasting” OR “power generation forecasting”) AND (“SARIMA” OR “Holt-Winters” OR “Kalman filter”) AND (“comparison” OR “benchmark” OR “baseline”) AND (“short-term” OR “long-term”) | January 2026 (earliest: 4 January 2026; latest: 10 January 2026) | GS, SC | 2022–2026; peer-reviewed; Q1–Q3 JCR; empirical; benchmarking against ML/DL required; English
SS = Search String; DB = Database; GS = Google Scholar; SC = Scopus.
Table 2. Inclusion and exclusion criteria applied to the twelve core articles (IC1–IC8, EC1–EC7).
ID | Type | Description
IC1 | Inclusion | All studies must be directly focused on one of the three core research domains: electric energy consumption forecasting, energy generation prediction, or anomaly and fault detection in power systems.
IC2 | Inclusion | All studies must move beyond isolated algorithms by proposing, implementing, or benchmarking advanced hybrid models, ensemble methods, statistical approaches, or Large Language Models (LLMs) in order to clarify and contrast the limitations of different model types.
IC3 | Inclusion | Studies focused on consumption or generation forecasting must utilize real-world time-series datasets (such as AMI, SCADA, or meteorological data) to train, validate, or test their proposed models.
IC4 | Inclusion | Studies focused on predictive models must address the complex hyperparameter tuning process by incorporating advanced optimization techniques, such as bio-inspired metaheuristics (e.g., BOA, SSA, GWO) or Bayesian Optimization algorithms.
IC5 | Inclusion | Studies aiming to achieve a certain level of efficiency must address data redundancy, multicollinearity, and the “curse of dimensionality” by integrating robust data refinement frameworks, explicitly utilizing dimensionality reduction techniques (e.g., PCA, ICA) or advanced feature selection mechanisms.
IC6 | Inclusion | Studies proposing neuromorphic computing architectures (e.g., SNNs) as a solution to the high computational requirements of Deep Learning models must include an explicit analysis of energy efficiency or computational cost, benchmarking them against traditional ANNs.
IC7 | Inclusion | Studies within the anomaly detection domain must diagnose equipment faults or identify anomalous states in energy components with direct operational applicability to infrastructure safety and stability, using either supervised or unsupervised learning approaches.
IC8 | Inclusion | Studies focused on statistical predictive models must clearly, quantitatively, and empirically justify their advantages and disadvantages compared to more modern and sophisticated architectures.
EC1 | Exclusion | Studies focusing on generic time-series forecasting or unrelated domains without a direct and explicit application to electric energy consumption, generation, or power system anomaly detection.
EC2 | Exclusion | Predictive studies that evaluate their proposed architectures in isolation, lacking rigorous empirical comparative analysis (benchmarking) against established baseline models, classical statistical methods, or state-of-the-art algorithms.
EC3 | Exclusion | Studies on consumption or generation forecasting that rely exclusively on synthetic or purely simulated data and fail to validate their frameworks using real-world operational datasets (such as AMI, SCADA, or meteorological records).
EC4 | Exclusion | Studies within the anomaly detection field that present purely theoretical approaches without clear and demonstrable operational applicability for monitoring components, diagnosing faults, or ensuring data quality in real power plants.
EC5 | Exclusion | Studies exploring neuromorphic architectures (e.g., SNNs) that omit a direct quantitative assessment of their energy efficiency or computational footprint compared to conventional ANNs.
EC6 | Exclusion | Purely qualitative research, systematic reviews, bibliometric analyses, or surveys that do not propose, implement, or quantitatively evaluate a mathematical, statistical, or algorithmic model.
EC7 | Exclusion | Studies relying solely on traditional statistical methods without justifying their application through comparison with more modern machine learning or deep learning architectures.
Note: For the purposes of this review, “real-world data” are defined as time-series datasets obtained from operational energy systems and measurement infrastructures, including AMI, SCADA systems, and operational meteorological records, directly acquired from physical processes under real deployment conditions, excluding purely simulated or artificially generated data. For anomaly detection studies, an exception is made allowing synthetic datasets generated under realistic conditions and reflecting plausible operational distributions, due to the limited availability of labeled real-world fault data.
Table 3. Inclusion and exclusion criteria applied to secondary references derived from snowballing (SIC1–SIC4, SEC1–SEC3).
ID | Type | Description
SIC1 | Inclusion | Studies must be published prior to or in the same year as the core article citing them. The strict 2022–2026 timeframe applied to primary studies is relaxed to include foundational models, baselines, or comparative datasets that directly influenced the development of the core article, thereby preserving the historical and temporal relevance of the algorithms.
SIC2 | Inclusion | Studies must strictly comply with the foundational inclusion criteria (IC1 and IC2). Additionally, compliance with the remaining primary inclusion criteria (e.g., hyperparameter tuning, dimensionality reduction, or hybrid architectures) is positively valued.
SIC3 | Inclusion | Studies must demonstrate strong contextual relevance to the core article, providing algorithmic background, foundational mathematical frameworks, or prior state-of-the-art architectures.
SIC4 | Inclusion | Studies must facilitate synthesis and cross-comparison, enabling direct methodological or quantitative comparisons with the core articles or other secondary references.
SEC1 | Exclusion | Outdated studies published significantly prior to the core article’s timeframe are excluded. Older literature is only retained if it represents a direct methodological predecessor, a closely related comparative study within the energy domain, or a specific baseline application that the core article explicitly uses to benchmark its performance or justify its research gap.
SEC2 | Exclusion | Studies referenced by the core article solely for general contextualization or peripheral topics that diverge from the three main domains of this review (electric energy consumption, power generation, or anomaly detection in time-series data).
SEC3 | Exclusion | Purely qualitative literature reviews, conceptual surveys, or theoretical papers that do not provide empirical quantitative results, baseline comparisons, or specific hyperparameter configurations necessary to contextualize the methodological evolution of the core article.
Table 4. Studies evaluated at full-text eligibility stage but excluded from the core corpus, with reasons for exclusion.
SS | Excluded Study | Reason for Exclusion
SS3 | [22] Residential energy consumption forecasting using deep learning models | Benchmarking limited to comparison among deep learning architectures (RNN, LSTM, GRU, Transformer) without rigorous evaluation against established statistical baselines (EC2); quality score below the minimum four-point threshold.
SS3 | [23] Strategies for predictive power: Machine learning models in city-scale load forecasting | Published in a journal not indexed in Q1–Q3 JCR rankings; failed to meet the indexing quality criterion.
SS3 | [24] Advancing building energy efficiency: A deep learning approach to early-stage prediction of residential electric consumption | Evaluated the proposed model against internal variants without rigorous benchmarking against established statistical or state-of-the-art baselines (EC2); did not incorporate metaheuristic optimization (IC4).
SS3 | [25] Explainable AI Framework Using XGBoost With SHAP and LIME for Multi-Scale Household Energy Forecasting | Primary contribution focused on model explainability (SHAP/LIME) rather than on advancing predictive accuracy through hybrid architectures (IC2), metaheuristic optimization (IC4), or dimensionality reduction (IC5).
SS3 | [26] Long Short-Term Memory Autoencoder and Extreme Gradient Boosting-Based Factory Energy Management Framework for Power Consumption Forecasting | Addressed factory-level consumption forecasting rather than energy generation prediction; did not incorporate metaheuristic optimization (IC4) or explicit dimensionality reduction (IC5).
SS4 | [27] Anomaly detection and reconstruction of sensors in nuclear power plants based on PCA and improved deep neural networks | Primary focus on sensor signal reconstruction rather than direct fault detection or classification (IC1/IC7); insufficient alignment with the anomaly detection scope of this review.
SS4 | [28] Optimizing deep neural networks for nuclear power plant temperature estimation: A study on feature importance and outlier detection | Primary objective centered on temperature estimation rather than anomaly or fault detection (IC1/IC7); outlier detection treated as a secondary byproduct without dedicated benchmarking.
SS4 | [29] An efficient AI algorithm for fault diagnosis in nuclear power plants based on machine deep learning techniques | Met the minimum quality threshold but excluded during the final thematic selection stage; standard ML/DL classifiers without advanced architectural contributions offered insufficient novelty relative to [8,9].
SS4 | [30] Interpretable Bayesian-optimized Autoencoder for fault detection and diagnosis with application in nuclear power plants | Met the minimum quality threshold but excluded during the final thematic selection stage; primary contribution focused on interpretability, which falls outside the core methodological focus of this review, offering insufficient novelty to justify inclusion alongside studies with greater architectural or algorithmic contributions.
SS4 | [31] PVEL-AD: A Large-Scale Open-World Dataset for Photovoltaic Cell Anomaly Detection | Dataset contribution rather than a predictive or diagnostic model; did not propose or benchmark an algorithmic solution (EC6).
SS4 | [32] Fault detection of wind turbines using SCADA data and genetic algorithm-based ensemble learning | Excluded during the final thematic selection stage; supervised ensemble approach superseded by the included core studies, which address unsupervised detection paradigms more aligned with the focus on unlabeled fault scenarios.
SS5 | [33] Time series forecasting via integrating a filtering method | Met the minimum quality threshold but excluded during the final thematic selection stage in favour of studies with stronger alignment with the energy domain focus and research gaps identified in this review.
SS5 | [34] Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction | Comparative benchmark without proposing hybrid architectures (IC2) or incorporating metaheuristic optimization (IC4) or dimensionality reduction techniques (IC5); quality score below the minimum four-point threshold.
SS5 | [35] Time-series and deep learning approaches for renewable energy forecasting in Dhaka | Not indexed in Q1–Q3 JCR-ranked journals; failed to meet the indexing quality criterion.
SS5 | [25] Explainable AI Framework Using XGBoost With SHAP and LIME | Primary contribution focused on model explainability rather than statistical benchmarking; thematic misalignment with the comparative scope of this search string (IC8).
SS5 | [36] Time-Series Forecasting Models for Smart Meters Data | Not indexed in Q1–Q3 JCR-ranked journals; failed to meet the indexing quality criterion.
SS6 | [37] Short-term forecasting of wind power generation using artificial intelligence | Did not propose hybrid architectures (IC2) or incorporate metaheuristic optimization (IC4) or dimensionality reduction techniques (IC5); quality score below the minimum four-point threshold.
SS6 | [38] Very short-term wind power forecasting for real-time operation using hybrid deep learning model with optimization algorithm | Excluded during the final thematic selection stage; superseded by studies with greater algorithmic novelty and stronger benchmarking procedures.
SS6 | [39] Short-term wind power forecasting based on Attention Mechanism and Deep Learning | Excluded during the final thematic selection stage; superseded by studies with greater algorithmic novelty and closer alignment with the identified research gaps.
SS6 | [40] Combined approach for short-term wind power forecasting based on wave division and Seq2Seq model using deep learning | Excluded during the final thematic selection stage in favour of studies presenting greater novelty.
SS6 | [41] Hybrid short-term wind power forecasting model using theoretical power curves and temporal fusion transformers | Excluded during the final thematic selection stage in favour of studies presenting greater novelty.
SS6 | [42] An enhanced CNN with ResNet50 and LSTM deep learning forecasting model for climate change decision making | Insufficient direct applicability to power generation forecasting; primary focus on climate change modelling (EC1).
SS7 | [43] Parallel boosting neural network with mutual information | Addressed irradiance forecasting but did not employ SNN architectures nor include a comparison with ANN models within a neuromorphic computing framework (EC1); thematic misalignment with the neuromorphic scope of this search string.
SS7 | [44] Neuromorphic energy economics: toward biologically inspired and sustainable power market design | Conceptual contribution without empirical benchmarking of SNN against ANN for forecasting tasks; did not propose or evaluate a quantitative predictive model (EC6).
SS7 | [45] Power quality optimization in PV grid systems using hippopotamus-driven MPPT | Oriented towards power quality optimisation rather than irradiance forecasting (EC1); no neuromorphic or SNN component present.
SS7 | [46] A review and evaluation of the state of the art in image-based solar energy forecasting | Systematic review that does not propose or quantitatively evaluate an algorithmic model (EC6).
SS7 | [47] Neuromorphic computing-based model for short-term forecasting of Global Horizontal Irradiance in Saudi Arabia | Did not establish a systematic quantitative comparison between ANN and SNN architectures, nor provide a quantitative energy efficiency analysis (IC6/EC5).
SS = Search String. Exclusion records for Search Strings 1 and 2 were not systematically documented and cannot be reported. Ref. [15] was identified through snowballing rather than direct database retrieval and is therefore not associated with any search string.
Table 5. Comparative summary of the selected core studies—methodological profile.
Reference | Energy Domain & Source | Dataset Origin | Time Resolution | Main Model | Model Category | Optimization Method
[13] | Generation Forecasting—Photovoltaic | Real-world | Hourly | Spiking Neural Network | SNN | RAdam, SLAYER 2.0, Bootstrap
[2] | Consumption Forecasting—Smart grid (AMI) | Real-world | Hourly | ARIMA, SARIMA, LSTM, SVM | Comparative | Adam
[14] | Generation Forecasting—Wind turbine, Photovoltaic | Real-world | 10 min to hourly | LSTM, LightGBM, Sequenced-GRU | Hybrid | RMSProp
[15] | Generation Forecasting—Photovoltaic | Hybrid | Daily | Residual Network | Deep Learning | Adam
[4] | Consumption Forecasting—National grid | Real-world | Monthly | SARIMAX, FB Prophet, Holt–Winters, TBATS | Statistical | ADF, ACF/PACF
[18] | Consumption Forecasting—National grid | Real-world | Not reported | XGBoost-SSA | Hybrid | Sparrow Search Algorithm
[20] | Generation Forecasting—Photovoltaic | Real-world | Daily | Autoencoder-LSTM | Hybrid | Not reported
[9] | Anomaly Detection—Hydro-turbine | Experimental | 48 kHz | BO-CNN-LSTM | Hybrid | Bayesian Optimization
[19] | Consumption Forecasting—Residential | Real-world | 1 min to 10 min | LSTM-BOA | Hybrid | Butterfly Optimization Algorithm
[7] | Generation Forecasting—Wind turbine | Real-world | 15 min | M2WLLM (GPT-2 based) | LLM-based | Low-Rank Adaptation
[5] | Consumption Forecasting—Facility | Real-world | Not reported | PCA/ICA + RF, SVR, LR, ANN, LSTM | Hybrid | Not reported
[8] | Anomaly Detection—Nuclear power plant | Simulated | Not reported | DDPM | Deep Learning | AdamW
Table 6. Comparative summary of the selected core studies—evaluation results, reported precision, and limitations. “Not reported” in the Reported Precision column indicates that the original study did not provide confidence intervals, p-values, or equivalent measures of statistical precision.
Reference | Evaluation Metrics | Main Results | Reported Precision | Limitations
[13] | MAE, RMSE, nRMSE, Power Efficiency | SNN achieved performance comparable to ANNs while being approximately 9 times more power efficient on neuromorphic hardware. | Not reported | Performance depends heavily on the encoding schema; efficiency gains diminish on non-neuromorphic hardware such as GPUs.
[2] | MSE, MAE, RMSE | SVM proved superior in handling nonlinear patterns and limited datasets compared to LSTM and statistical models. | Not reported | LSTM requires large datasets to perform adequately; the small sample size of 10 households limits the generalizability of the results.
[14] | MAE, MAPE, RMSE, R² | The stacking ensemble methodology outperformed all individual base models, achieving an optimal R² = 0.9821. | Not reported | LightGBM is sensitive to oversampling; prediction precision drops as the forecasting time horizon increases.
[15] | MAE, MSE, R² | ResNet achieved R² = 0.99, accurately predicting irradiance patterns despite dust storm variability. | Not reported | Cloud effects on albedo and diurnal albedo variations were not considered in the model.
[4] | MAPE, MPE, RMSE, nRMSE | FB Prophet achieved 0.71% MAPE; SARIMA was the most adequate model after residual autocorrelation tests. | 98% confidence margin on execution times; 95% CI for ACF/PACF parameter estimation | Limited exogenous variables were incorporated; the study was restricted to demand data without demographic indicators.
[18] | RMSE, MAPE, R², MBE, A10, PCD | XGBoost-SSA achieved the highest R² = 0.9984 and the lowest testing errors among all optimizer combinations. | Error variability distribution via box plots (median, dispersion, and outliers) | Transferability to other geographical regions remains unvalidated; the temporal data span is limited.
[20] | MAE, MSE, RMSE | AE-LSTM achieved the lowest RMSE of 0.136, outperforming standard LSTM and Bi-LSTM architectures. | Model Confidence Set (MCS) with reported p-values | Standard LSTM models struggle to capture long-term temporal dependencies in solar generation data.
[9] | Accuracy, Precision, Recall, F1-score | Achieved classification accuracies up to 98.4%, improving by up to 9% over unoptimized baseline models. | Not reported | Background noise degrades performance; scarce real-world fault data limits practical validation.
[19] | MAPE, MAE, RMSE, MSE | Obtained a minimum MAPE between 0.05 and 0.09 while reducing prediction time by 25–30%. | Not reported | Not reported by the authors.
[7] | MAE, RMSE | Outperformed all deep learning benchmarks; demonstrated strong few-shot learning capability with only 10% of training data. | Daily RMSE distribution via box plots (median, quartiles, and maximum errors) | Relies on general pre-trained language models; the authors suggest training domain-specific base models for further improvement.
[5] | R², RMSE, Willmott Index, MAE, MAPE | PCA outperformed ICA as a dimensionality reduction technique; Random Forest and LSTM combined with PCA achieved the highest accuracy. | Paired t-tests, p < 0.05 | SVR struggles with sequential dependencies; the study would benefit from broader macroeconomic indicators.
[8] | Precision, Recall, F1-score, AUC, FPR | DDPM F1-score was 21.4% higher than the Autoencoder baseline; robustness remained above 0.935 under noisy conditions. | Not reported | Not reported by the authors.
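As the Reported Precision column shows, several core studies omit interval estimates altogether. Purely as an illustration (the function name, error values, and settings below are our own assumptions, not drawn from any reviewed study), a percentile-bootstrap confidence interval is one simple way such a column could be populated for an error metric like RMSE:

```python
# Illustrative sketch (not from any core study): percentile-bootstrap 95% CI
# for RMSE, computed from hypothetical per-sample forecast errors.
import math
import random

def bootstrap_rmse_ci(errors, n_boot=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI for RMSE from raw per-sample errors."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = []
    for _ in range(n_boot):
        # Resample the errors with replacement and recompute RMSE each time.
        sample = [rng.choice(errors) for _ in errors]
        stats.append(math.sqrt(sum(e * e for e in sample) / len(sample)))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Invented per-sample errors for demonstration only.
errors = [0.12, -0.08, 0.20, -0.15, 0.05, 0.11, -0.09, 0.18, -0.02, 0.07]
lo, hi = bootstrap_rmse_ci(errors)
```

Reporting such an interval alongside the point estimate would make cross-study comparisons like those in Table 8 considerably more informative.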
Table 7. Study-level risk-of-bias assessment based on the six-item methodological quality questionnaire. Q3 = 1 (N/A) for non-anomaly-detection studies; Q4 = 1 (N/A) for purely statistical studies; Q5 = 1 (N/A) for non-statistical studies. Inclusion threshold: ≥4 points. Q1: domain relevance; Q2: real-world dataset; Q3: realistic anomaly dataset; Q4: advanced framework; Q5: statistical benchmark; Q6: operational applicability.
Reference | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Total | Risk
[18] XGBoost-SSA for net electricity consumption prediction | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
[13] SNNs for solar radiation forecasting in Arab countries | 1 | 1 | 1 | 1 | 1 | 0 | 5 | Low
[14] Ensemble learning for wind and solar power generation | 1 | 1 | 1 | 1 | 1 | 0 | 5 | Low
[2] Comparative study of ARIMA, SARIMA, LSTM and SVM on AMI data | 1 | 1 | 1 | 0 | 1 | 1 | 5 | Low
[4] Statistical models for monthly energy demand forecasting in Brazil | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
[20] AE-LSTM hybrid for solar power plant generation forecasting | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
[9] BO-CNN-LSTM for fault diagnosis in hydraulic turbines | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
[19] LSTM-BOA for electric energy consumption prediction | 1 | 1 | 1 | 1 | 1 | 0 | 5 | Low
[7] Multimodal LLM for ultra-short-term wind energy forecasting | 1 | 1 | 1 | 1 | 1 | 0 | 5 | Low
[5] PCA/ICA combined with RF, SVR, ANN and LSTM for facility energy consumption | 1 | 0 | 1 | 1 | 1 | 0 | 4 | Low
[8] DDPM for unsupervised anomaly detection in nuclear power plants | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
[15] ResNet for solar irradiance prediction over the Arabian Peninsula | 1 | 1 | 1 | 1 | 1 | 1 | 6 | Low
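The scoring rule summarized in the Table 7 caption is simple enough to express as code. The sketch below is purely illustrative (the field names and the `assess` helper are our own invention, not the authors' tooling): each of the six items contributes one binary point, the three conditional items are scored 1 where they do not apply, and a study enters the corpus when its total reaches the four-point threshold.

```python
# Illustrative sketch of the six-item risk-of-bias scoring from Table 7.
# Field names are hypothetical; the threshold (>= 4 of 6) is from the review.
QUESTIONS = [
    "Q1_domain_relevance",
    "Q2_real_world_dataset",
    "Q3_realistic_anomaly_dataset",   # scored 1 (N/A) for non-anomaly-detection studies
    "Q4_advanced_framework",          # scored 1 (N/A) for purely statistical studies
    "Q5_statistical_benchmark",       # scored 1 (N/A) for non-statistical studies
    "Q6_operational_applicability",
]
INCLUSION_THRESHOLD = 4

def assess(answers: dict) -> tuple:
    """Sum the six binary items and classify the study for inclusion."""
    total = sum(int(answers.get(q, 0)) for q in QUESTIONS)
    verdict = "include" if total >= INCLUSION_THRESHOLD else "exclude"
    return total, verdict

# Example: the profile reported for reference [5] in Table 7 (total 4, low risk).
total, verdict = assess({
    "Q1_domain_relevance": 1,
    "Q2_real_world_dataset": 0,
    "Q3_realistic_anomaly_dataset": 1,
    "Q4_advanced_framework": 1,
    "Q5_statistical_benchmark": 1,
    "Q6_operational_applicability": 0,
})
```

Under this rule the lowest-scoring retained study, reference [5], sits exactly at the threshold, which is consistent with its row in Table 7.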
Table 8. Comparative benchmark of performance metrics across the selected core studies.
Reference/Domain | Category | Model/Approach | Metric | Best Value | Notes
Consumption Forecasting
[4] Consumption | Statistical | SARIMA/SARIMAX | MAPE | 1.28–1.38% | Monthly demand, Brazil
[2] Consumption | ML | SVM | RMSE | 0.120 | Best model on AMI data
[19] Consumption | DL | LSTM-BOA | MAPE | 0.09% | Metaheuristic-optimized LSTM
[18] Consumption | Hybrid | XGBoost-SSA | R² | 0.9984 | Highest accuracy reported
[5] Consumption | Hybrid | PCA + RF / LSTM | R² | 0.79 | Dimensionality reduction applied
Generation Forecasting
[13] Generation | SNN | SNN-CNN | nRMSE | 0.022 | Comparable accuracy, higher efficiency
[15] Generation | DL | ResNet | R² | 0.99 | Solar irradiance, Arabian Peninsula
[14] Generation | Hybrid | LSTM + LightGBM | R² | 0.9821 | Stacking ensemble, wind & PV
[20] Generation | Hybrid | AE-LSTM | MAE | 0.0565 | Autoencoder reduces LSTM error
[7] Generation | LLM | M2WLLM (GPT-2) | MAE | 3.03 | Few-shot, 10% training data
Anomaly Detection
[8] Anomaly | DL | DDPM | F1 | 0.971 | Best unsupervised detection
[9] Anomaly | Hybrid | BO-CNN-LSTM | Accuracy | 98.4% | Best supervised detection
DL = Deep Learning; ML = Machine Learning; LLM = Large Language Model; SNN = Spiking Neural Network. Best values correspond to the top-performing configuration reported in each core study. Forecasting (regression) and anomaly detection (classification) metrics are not directly comparable across domains.
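For readers comparing the regression entries above, the three recurring forecasting metrics (MAPE, RMSE, R²) can be reproduced on a toy series as follows. This is a minimal sketch with invented numbers, not code or data from any core study:

```python
# Illustrative definitions of the regression metrics that recur in Table 8.
import math

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    """Mean absolute percentage error (in %); assumes no zero actual values."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y = [10.0, 12.0, 11.0, 13.0, 12.5]     # invented "actual" load values
yhat = [10.2, 11.8, 11.1, 13.3, 12.4]  # invented forecasts
```

Note that MAPE is scale-free while RMSE and MAE inherit the units of the target, which is one reason the "Best Value" column above cannot be compared directly across studies that forecast different quantities.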
Table 9. Comparative table of the impact of the selected core articles on AI-based energy forecasting and anomaly detection.
Reference | Journal | Impact Factor | Quartile | Citations | Indexing
[13] | Neural Computing and Applications | 6.517 | Q1 | 6 | Scopus
[2] | Sensors | 4.620 | Q1 | 2 | SCIE, Scopus
[14] | Energies | 4.000 | Q2 | 12 | SCIE, Scopus
[15] | Applied Sciences | 3.145 | Q2 | 17 | SCIE, Scopus
[4] | Applied Sciences | 3.145 | Q2 | 16 | SCIE, Scopus
[18] | Energy | 10.951 | Q1 | 63 | SCIE, Scopus
[20] | IET Smart Grid | 3.370 | Q1 | 11 | Scopus
[9] | Energy | 10.951 | Q1 | 231 | SCIE, Scopus
[19] | Applied Sciences | 3.145 | Q2 | 199 | SCIE, Scopus
[7] | Applied Energy | 12.811 | Q1 | 5 | SCIE, Scopus
[5] | The Journal of Supercomputing | 4.199 | Q1 | 2 | SCIE, Scopus
[8] | Annals of Nuclear Energy | 2.653 | Q1 | 4 | SCIE, Scopus
Table 10. Dataset sources employed by the selected core studies.
Dataset | Year | Geographic Origin | Reference
National Solar Radiation Database (NSRDB) | 2017–2019 | Palestine, Jordan | [13]
KT AMI Platform (Time-Specific AMI AI Training Data) | 2021–2022 | South Korea | [2]
IEA/KAPSARC (Wind & Solar Energy Data) | 2019, 2022 | Middle East | [14]
MODIS/Terra satellite & RTM simulations | 2015–2018 | Arabian Peninsula | [15]
EPE, IPEA, ABVE, INMET | 2004–2023 | Brazil | [4]
Turkish Electricity Transmission Company | 1990–2010 | Turkey | [18]
100 MW Solar Power Plant (real-time) | Not reported | Not reported | [20]
Hydro-turbine fault experiment bench | Not reported | China | [9]
IHEPC/AEP | 2006–2010 | France | [19]
Wind farm datasets (NWP-enriched) | 2020 | Inner Mongolia, Yunnan, Gansu (China) | [7]
Energy Consumption Prediction (Kaggle) | Not reported | Not reported | [5]
Fuqing Unit 2 Full-Scale Simulator (FU-FS) | Not reported | China | [8]
Table 11. In-depth analysis of the dataset characteristics employed by the selected core studies.
Dataset | Samples | Time Res. | Features/Exogenous Variables | Type
NSRDB | 26,280 | Hourly | GHI, air temperature, dew point, relative humidity, wind direction, wind speed | Real-world
KT AMI Platform | 316.7 M | Hourly | Consumer number, district code, date/time, consumer type, holiday indicator, energy consumed | Real-world
IEA/KAPSARC | Not reported | 10 min to hourly | Solar irradiance, air temperature, relative humidity, atmospheric pressure, rainfall, wind speed/direction | Real-world
MODIS/RTM | 183,960/year | Daily | Aerosol Optical Depth (AOD), Single Scattering Albedo (SSA), pressure, temperature, wind, ozone, water vapor | Hybrid
EPE, IPEA, ABVE, INMET | 239 | Monthly | Energy consumption, industrial (vehicles, oil, fertilizers), climatic (temperature, precipitation, radiation, wind), economic (GDP, NCPI, ENCPI) | Real-world
Turkish Elec. Transmission Co. | 420 | Monthly | Gross income, population, hourly load, immediate load, import, export, gross production, transmitted energy, net electricity consumption | Real-world
100 MW Solar Plant | 365 | Daily | Daily power generation (kWh), max grid-connected generation (MW), irradiance (MJ·m⁻²) | Real-world
Hydro-turbine bench | 280 | High-freq. | Acoustic signal amplitude (m/s²) under normal, sediment (0.73 kg/m³, 1.4 kg/m³), and impact conditions | Experimental
IHEPC/AEP | 2.07 M | 1 min/10 min | Voltage, global intensity, active/reactive power, sub-metering (IHEPC); visibility, humidity, temperatures, pressure, wind speed (AEP) | Real-world
Wind farm (NWP) | 17,546 | 15 min | Historical wind power output, NWP wind speed, air pressure, temperature, textual prompts | Real-world
Kaggle Energy Consumption | Not reported | Not reported | Temperature, humidity, square meters, occupancy, HVAC usage, lighting, renewable energy, day of week, holiday | Simulated
FU-FS Simulator | 7920 | Not reported | 2215 low-sensor monitoring points (reduced to 20–200 top features) | Simulated
Table 12. Summary of representative secondary references obtained through snowballing, organized by domain, main model, and best reported performance metric.
Reference | Domain | Main Model | Best Metric/Value
Energy Consumption Forecasting
[3] Long-term electricity demand forecasting under low-carbon energy transition | Consumption | SDs + PGMP | MAPE = 1.8%
[51] Comparing forecasting accuracy of grey and time series models in Brazil and India | Consumption | ONGBM/NGBM-PSO | MAPE = 0.53%
[52] Transfer learning with foundational models for time series forecasting using low-rank adaptations | Consumption | LLIAM | SMAPE = 0.094
[53] Building energy consumption prediction using MLP neural network-assisted models | Consumption | MLP-PSOGWO | R² = 0.998
[17] Short-term load forecasting based on ARIMA and ANN approaches | Consumption | ANN | MAPE = 1.80%
[54] Statistical and ANN models for electricity consumption forecasting | Consumption | MLP | MAPE = 2.32%
[55] Forecasting China’s electricity consumption using a new grey prediction model | Consumption | Rolling NOGM(1,1) | MAPE = 2.86%
[56] Principal component regression with ANN to improve prediction of electricity demand | Consumption | PCR-BPNN | MAPE ≈ 5%
[57] Application of SARIMA model in load forecasting in Hanoi City | Consumption | SARIMA | R² = 0.90
[58] Study and analysis of SARIMA and LSTM in forecasting time series data | Consumption | LSTM | RMSE = 0.23
[59] Improving time series forecasting using LSTM and attention models | Time-series | ATT-LSTM | SMAPE = 1.39
[60] Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting | Consumption | LSTM-GA | CV (RMSE) = 0.621
[61] Improving electric energy consumption prediction using CNN and Bi-LSTM | Consumption | CNN + Bi-LSTM | MAPE = 11.66
[62] DB-Net: A dilated CNN multi-step forecasting model for power consumption in local energy systems | Consumption | DB-Net | MSE = 0.0029
[63] Comparative assessment of SARIMA, LSTM and Fb Prophet for monthly energy demand forecasting in India | Consumption | FB Prophet | MAPE = 3.01%
[64] Evaluation and improvement of energy consumption prediction models using PCA | Consumption | PCA + RF | R² = 0.99
[65] Machine learning models for forecasting power electricity consumption using a high dimensional dataset | Consumption | Random Forest | MAPE = 2.17
[66] Comparative analysis of load forecasting models for varying time horizons and aggregation levels | Consumption | Random Forest | MSE = 0.003
[67] PCA of day-ahead electricity price forecasting in CAISO | Price forecast. | PCA + Lin. Reg. | RMSE = 5.83
[68] Multivariate probabilistic CRPS learning for day-ahead electricity prices | Price forecast. | CRPS BOA | CRPS = 1.28
Energy Generation Forecasting
[69] Evaluation of ML models for smart grid parameters: ARIMA and Bi-LSTM | Generation (PV) | Bi-LSTM | MAE = 0.012
[70] BERT4ST: Fine-tuning pre-trained LLM for wind power forecasting | Generation (wind) | BERT4ST | MAE = 1.20
[71] M2GSNet: Multi-modal graph spatiotemporal network for wind farm cluster power prediction | Generation (wind) | M2GSNet | RMSE = 4.40%
[72] Quad-kernel deep CNN for intra-hour photovoltaic power forecasting | Generation (PV) | QK-CNN | R² = 0.98
[73] Short-term prediction of PV power based on combined modal decomposition | Generation (PV) | CD-NARX-LSTM | RMSE = 0.399 kW
[74] A comparison of day-ahead photovoltaic power forecasting models | Generation (PV) | CNN + LSTM | MAPE = 0.022
[75] Data-driven day-ahead PV estimation using autoencoder-LSTM and persistence model | Generation (PV) | AE-LSTM | nRMSE = 10.45%
[76] Development and comparison of hybrid neural network models for hourly solar radiation forecasting | Generation (solar) | CNN-ANN | r = 0.993
[77] Multistep short-term wind speed forecasting using transformer | Generation (wind) | EEMD-Transformer | MAE = 0.167
[78] Wind speed forecasting using nonlinear-learning ensemble of deep learning models | Generation (wind) | EnsemLSTM | MAPE = 5.42%
[79] Combined model for short-term wind power prediction | Generation (wind) | DNN-LSTM | MAE = 0.039
[6] Short-term wind power prediction based on stacked denoised auto-encoder and transfer learning | Generation (wind) | SDAE + TL | NRMSE = 11.56%
[80] A deep learning based hybrid method for hourly solar radiation forecasting | Generation (solar) | Clustering + FADF | RMSE = 112.60 W/m²
[81] Dynamic energy system modeling using hybrid physics-based and ML encoder-decoder models | Generation | Physics ED (GRU) | Norm. MSE = 0.024
[82] Solar irradiance forecasting using deep neural networks | Generation (PV) | DRNN + LSTM | RMSE = 0.086
[83] Solar radiation forecasting using artificial neural network and random forest methods | Generation (solar) | Random Forest | nRMSE = 19.65%
[84] Sequence to sequence deep learning models for solar irradiation forecasting | Generation (solar) | LSTM encoder-decoder | MAE = 30.3 W/m²
Anomaly Detection
[85] Fault diagnosis of photovoltaic modules using deep neural networks and infrared images | Anomaly det. (PV) | VGG-16 | Accuracy = 99.91%
[86] Robust deep auto-encoding network for real-time anomaly detection at nuclear power plants | Anomaly det. (nuclear) | MVCGED | F1 = 0.999
[87] Development of a diagnostic algorithm for abnormal situations using LSTM and variational autoencoder | Anomaly det. (nuclear) | LSTM-VAE | Accuracy = 98.44%
[88] A fuzzy rough number extended AHP and VIKOR for failure mode and effects analysis | Fault evaluation | FR-MCGDM | Spearman ρ = 0.972
[89] Knowledge-infused deep learning diagnosis model with self-assessment for HVAC systems | Anomaly det. (HVAC) | KINN | Accuracy = 0.897
[90] Exploiting the generative adversarial framework for one-class multi-dimensional fault detection | Anomaly det. | GAN | g-mean = 88.1%
[91] An anomaly detection method based on Lasso | Anomaly det. | Lasso-LARS-SCAD | Accuracy = 95.49%
Time-Series Classification (Methodological Antecedents)
[92] Time series classification from scratch with deep neural networks: a strong baseline | Time-series classif. | FCN/ResNet | MPCE = 0.0219
[93] Deep learning for time series classification: a review | Time-series classif. | ResNet | Ranked 1st (85 datasets)
Ayuso, D.V.; Román Gallego, J.Á.; Domínguez, C.Z. Artificial Intelligence Approaches for Energy Consumption and Generation Forecasting, Anomaly Detection, and Public Decision-Making: A Systematic Review. Energies 2026, 19, 2347. https://doi.org/10.3390/en19102347
