This Special Issue features five studies, each addressing distinct forecasting challenges within power systems. We analyze these contributions from two perspectives: the specific forecasting problems they tackle and the methodologies they employ to provide solutions.
2.1. Data and Forecasting Problems
The forecasting problem analyzed in the study by Di Grande et al. focuses on predicting hydropower generation from a 1.1 MW hydroelectric plant integrated into a Water Distribution System (WDS). Unlike traditional hydroelectric plants that rely on river flows or reservoir storage, this plant generates electricity by utilizing excess hydraulic pressure within a municipal water supply network. The integration of hydropower within a WDS introduces unique forecasting challenges, as energy production is influenced not only by natural hydrological factors such as rainfall, snowmelt, and seasonal variations but also by municipal water demand patterns, which fluctuate on both daily and annual scales.
The dataset used in this study consists of real-world operational data collected from the Alcantara 1 Hydroelectric Plant in Taormina, Sicily, Italy. It covers the period from January 2019 to November 2023, with 5 min interval measurements, resulting in a high-resolution time series dataset. The dataset includes 129 observations for the two-week aggregation format and 59 observations for the monthly aggregation format, highlighting the challenges associated with limited sample sizes in long-term forecasting.
The primary objective of this study was to develop univariate time series forecasts for hydropower output, predicting future energy generation using only historical hydropower generation data. However, the dataset contained incomplete records due to missing values resulting from plant malfunctions and maintenance activities. To address this, additional hydropower-related variables were used to reconstruct a more reliable estimation of normal hydropower generation, including inflow (water volume passing through the turbine per second), net head (water elevation difference before and after the turbine), and hydropower efficiency (system performance ratio).
To ensure data quality, the dataset underwent an extensive preprocessing phase. Outlier detection was performed using boxplot analysis to identify and remove anomalies. Additionally, the study employed imputation techniques to fill the approximately 28% of values that were missing as a result of sensor malfunctions and data acquisition failures. Two different imputation approaches were tested: linear interpolation, which replaces missing values using adjacent data points, and Seasonal-Trend decomposition using LOESS (STL), a more advanced method that decomposes the time series into seasonal, trend, and residual components before reconstructing missing values. The STL method consistently improved forecasting accuracy, as it effectively captured seasonal variations in hydropower generation.
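The simpler of the two imputation approaches can be illustrated with a short sketch. It assumes missing readings are encoded as `None`; the function name `interpolate_linear` and the sample values are hypothetical and are not taken from the study. STL-based imputation is not shown.

```python
def interpolate_linear(values):
    """Fill interior runs of None by linear interpolation between neighbours.

    Leading or trailing gaps have only one known neighbour, so they stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over consecutive pairs of known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            step = (filled[right] - filled[left]) * (i - left) / gap
            filled[i] = filled[left] + step
    return filled


# Hypothetical generation readings with two gaps.
series = [10.0, None, None, 16.0, 18.0, None, 14.0]
print(interpolate_linear(series))
# [10.0, 12.0, 14.0, 16.0, 18.0, 16.0, 14.0]
```

Because each gap is bridged by a straight line, this approach cannot reproduce a seasonal pattern inside a long gap, which is one plausible reason a decomposition-based method performs better on strongly seasonal series.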
The main goal of the paper by Di Grande et al. was to develop and compare different ML models for one- and two-step-ahead predictions of hydropower output. The forecasting horizons included monthly and two-week predictions, with an additional two-step-ahead forecasting model trained on two-week data to predict hydropower output for the next two time steps, effectively serving as an alternative monthly forecast.
The forecasting problem addressed in the study by Delgado et al. focuses on predicting photovoltaic (PV) power generation using ML models. The research aims to mitigate the variability caused by weather conditions and improve the planning and control of power generation. The study leverages a Hidden Markov Model (HMM) for data preprocessing and a Long Short-Term Memory (LSTM) model to predict PV power output based on meteorological and system operation data.
The dataset used in this research comes from the DKA Solar Center and the Ambient Weather Network. The DKA Solar dataset includes measurements collected from PV farms in Australia between 2013 and 2020, totaling 1,281,324 measurements recorded at 5 min intervals. This dataset contains 13 numerical variables. The Ambient Weather dataset, used to test model robustness, contains 56,340 measurements taken at 5 min intervals from February 2022 to September 2022 in Puerto Rico. It includes 18 meteorological variables; data gaps caused by electrical interruptions were filled using the forward-fill method.
The input variables selected for the forecasting models include active power, temperature, humidity, wind speed, horizontal irradiance, diffuse irradiance, and wind direction. Active power serves as the target variable. A correlation analysis using Pearson’s coefficient was performed to determine the most influential factors. Global horizontal irradiance showed the highest correlation with PV power output (0.96), followed by diffuse irradiance (0.55). These variables were prioritized in the model training phase.
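For reference, Pearson's coefficient can be computed as in the following minimal sketch. The function name and the toy irradiance and power values are illustrative assumptions and do not come from the DKA Solar dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: irradiance in W/m^2 and active power in kW.
irradiance = [0.0, 200.0, 400.0, 600.0, 800.0]
power = [0.0, 1.1, 1.9, 3.2, 3.9]
print(round(pearson(irradiance, power), 3))
```

A value close to 1 indicates a nearly linear positive relationship, which is the pattern reported for global horizontal irradiance.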
The dataset underwent a comprehensive preprocessing pipeline before model training. Outlier detection and removal were performed using a Gaussian HMM, which classified measurements as outliers, inliers, or constants. The difference between consecutive values was calculated to identify anomalies, and observations exceeding a predefined threshold were flagged as outliers. These outliers were eliminated to create a cleaner dataset. Additionally, normalization using min-max scaling was applied to transform the input features into values ranging between 0 and 1. This step ensured that all variables were on the same scale, preventing any single feature from dominating the model.
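Two of the steps above, thresholding consecutive differences and min-max scaling, can be sketched as follows. This is a simplified stand-in and not the Gaussian HMM classifier used in the study; the function names, the threshold, and the sample values are assumptions made for illustration.

```python
def flag_jumps(values, threshold):
    """Indices whose absolute change from the previous value exceeds threshold."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > threshold
    ]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical power readings with one spike at index 2.
power = [2.0, 2.1, 9.5, 2.2, 2.3]
print(flag_jumps(power, threshold=5.0))   # [2, 3]
print(min_max_scale([2.0, 4.0, 6.0]))     # [0.0, 0.5, 1.0]
```

Note that a plain difference threshold flags both the jump into the spike and the return from it, which is one reason a state-based classifier may be preferable for separating outliers from inliers.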
The goal of this study is to develop a highly accurate predictive model capable of forecasting PV power output with minimal error. The proposed deep learning models aim to enhance both short- and long-term forecasting accuracy, enabling better grid integration, ramp rate control, and energy management decisions.
The forecasting horizon is set at five minutes ahead, making it an ultra-short-term forecasting problem. This specific horizon was chosen to capture the rapid fluctuations in solar irradiance caused by cloud dynamics. Such frequent updates are crucial for real-time PV system control and grid stability. Additionally, the study compares single-input single-output (SISO) and multiple-input single-output (MISO) LSTM models, evaluating their performance in different configurations.
The forecasting problem addressed in the paper by Wang et al. focuses on predicting daily peak and valley electric loads using an SSA-LSTM-RF model. This type of forecasting is critical for power system operations, as it helps optimize electricity generation, distribution, and grid stability by anticipating periods of high and low demand. Unlike traditional load forecasting, which estimates overall energy consumption, peak and valley forecasting identifies the maximum and minimum load values within a given day, along with their corresponding occurrence times.
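To make the forecasting targets concrete, the sketch below extracts the peak and valley values and their occurrence times from one day of load readings. It assumes 96 readings at 15 min resolution starting at midnight; the function name and the synthetic load profile are hypothetical.

```python
def daily_peak_valley(load, minutes_per_step=15):
    """Return (value, 'HH:MM') pairs for the daily maximum and minimum load."""
    peak_idx = max(range(len(load)), key=load.__getitem__)
    valley_idx = min(range(len(load)), key=load.__getitem__)

    def clock(idx):
        # Convert a step index into a time of day, counting from midnight.
        total = idx * minutes_per_step
        return f"{total // 60:02d}:{total % 60:02d}"

    return {
        "peak": (load[peak_idx], clock(peak_idx)),
        "valley": (load[valley_idx], clock(valley_idx)),
    }


# Synthetic day: flat load with a dip at 04:00 and a peak at 19:00.
load = [50.0] * 96
load[16] = 30.0
load[76] = 90.0
print(daily_peak_valley(load))
# {'peak': (90.0, '19:00'), 'valley': (30.0, '04:00')}
```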
The dataset used in the study consists of regional power grid data collected at 15 min intervals, industrial daily load data, and meteorological data. The load data provides information on historical electricity consumption patterns, while meteorological variables such as temperature, humidity, precipitation, wind speed, and solar radiation influence power demand variations. The data spans a period of two years, covering a diverse range of seasonal and economic conditions. The dataset was preprocessed to handle missing values, detect and remove outliers, and normalize variables to ensure consistency across different scales. The Pearson correlation coefficient and random forest feature selection methods were used to identify the most relevant predictors, ensuring that only the most significant variables were included in the model.
The forecasting goal is to provide medium-term predictions for peak and valley electric loads, aiming to predict both the magnitude of the daily maximum and minimum loads and the time at which they occur over a forecast horizon of three months. The study also examines the impact of different seasonal, economic, and operational factors on the variability of load patterns. The inclusion of both load-based trend factors (such as historical peak and valley values) and external meteorological influences enhances the robustness of the forecasting approach.
The study by David et al. examines the challenge of forecasting solar irradiance by utilizing sky camera imagery to generate probabilistic binary predictions for ultra-short-term intervals spanning from 1 to 30 min. Unlike conventional solar forecasting models that estimate continuous irradiance values, this study proposes a binary approach, predicting whether solar irradiance will exceed a predefined threshold. This method is particularly useful for concentrated solar power (CSP) plants, where operators need to anticipate cloud cover conditions to adjust system operations efficiently.
The data used in this study consists of images from a sky camera combined with meteorological measurements from pyranometers and a pyrheliometer. The sky camera captures hemispheric images at one-minute intervals, providing a high-resolution dataset of cloud formations and atmospheric conditions. The pyranometers measure global and diffuse irradiance, while the pyrheliometer records direct normal irradiance (DNI). The dataset covers the years 2010 and 2011, comprising approximately 270,000 image-based predictions. The 2010 data was used for model training, while the 2011 data served as the test set, ensuring an independent evaluation of the forecasting approach.
The key variables in this forecasting problem include cloud presence, solar irradiance levels, and meteorological factors derived from sky images. The primary forecasting target is the probability that DNI will exceed 400 W/m², which is a critical threshold for CSP plant operation. Cloud presence is determined using RGB and HSV color space processing, followed by motion estimation techniques that track cloud displacement over consecutive frames. A cloud motion vector model calculates cloud trajectories, enabling deterministic forecasts of solar conditions.
Preprocessing steps involved outlier detection and removal, image segmentation, and feature extraction. The sky camera images were divided into 23 sectors, each analyzed separately to account for localized cloud movement. Cloud motion was determined using maximum cross-correlation methods, ensuring accurate tracking of dynamic weather patterns. The deterministic forecasts were then converted into probabilistic forecasts through a post-processing step, using three different models: the logit model, probit model, and random forest classifier. These models transformed the deterministic binary outputs into probability distributions, providing uncertainty quantification crucial for decision-making in CSP operations.
The goal of this forecasting approach is to provide more reliable and actionable information than traditional deterministic methods. By incorporating probability levels, the model enables CSP operators to assess risk levels associated with cloud cover, allowing for more adaptive energy management strategies. The forecasting horizon ranges from 1 to 30 min, capturing rapid fluctuations in solar irradiance caused by transient cloud movements.
The case study was conducted at the CIESOL research center at the University of Almería, Spain, which experiences a Mediterranean climate with significant maritime aerosol presence. The facility is equipped with state-of-the-art solar measurement instruments, including sky imagers and high-precision irradiance sensors. The location’s frequent cloud cover variability makes it an ideal testing ground for evaluating the effectiveness of probabilistic forecasting models.
Overall, this study introduces a novel probabilistic solar forecasting approach that enhances traditional deterministic models by incorporating uncertainty quantification and binary event prediction. The combination of sky camera imagery, advanced cloud motion tracking, and probabilistic modeling offers a more robust and adaptive solution for solar power management, particularly for CSP applications where rapid operational adjustments are required.
Lara-Cerecedo et al. present a real-world, data-driven PV power forecasting problem that involves managing large-scale, high-resolution time-series data. Accurate forecasting requires advanced AI-based techniques capable of capturing complex, non-linear dependencies between meteorological conditions and PV output. The forecasting model must effectively handle uncertainties in weather patterns while maintaining high accuracy and computational efficiency over an extended prediction period.
The study focuses on predicting electricity generation from a 60 kW PV system using intelligent models. The system consists of 240 monocrystalline silicon modules, each with a nominal power of 250 W, arranged into five sections. Continuous data collection from meteorological and electrical sensors, including a pyranometer for solar irradiance measurements and an anemometer for wind speed monitoring, provides the foundation for the predictive model.
The dataset used in this study is extensive, comprising 225,400 records per variable, spanning 26 months. Data is recorded at 5 min intervals, resulting in a high-resolution time series dataset. The large volume and fine granularity of this dataset enable the model to capture both short-term fluctuations and long-term seasonal trends, enhancing its robustness and predictive capability.
The study incorporates four meteorological variables as inputs to predict the electrical power output of the PV system: global horizontal solar radiation, the primary driver of PV generation, which determines the amount of energy absorbed by the panels; module temperature, which influences PV cell efficiency, as higher temperatures can cause performance losses; ambient temperature, which provides additional context for thermal effects on system efficiency; and wind speed, which contributes to heat dissipation from the PV modules and thereby indirectly affects their efficiency.
Preprocessing steps include data cleaning, normalization, and correlation analysis. Missing values were addressed using interpolation techniques to maintain data continuity. A Pearson and Spearman correlation analysis was conducted to evaluate the relationships between input variables and PV power output, confirming that solar radiation is the most influential factor in determining energy generation.
The goal of this study is to develop a highly accurate predictive model that forecasts PV power output based on historical meteorological and operational data. These predictions are intended to support energy planning, grid integration, and economic analysis of PV systems, ensuring more efficient resource management.
Unlike many existing studies that focus on short-term forecasting (hours to days ahead), this research aims to predict PV generation over an extended horizon of eight months. This long-term forecasting capability provides valuable insights for seasonal energy planning, addressing fluctuations in solar availability across different times of the year.
2.2. Forecasting Models
The authors of the study by Di Grande et al. investigate multiple ML-based forecasting models for hydropower generation prediction, analyzing their architectures, optimization strategies, training processes, and overall performance. The models considered include random forest (RF), Temporal Convolutional Network (TCN), and Neural Basis Expansion Analysis for Time Series (NBEATS), with Seasonal Autoregressive Integrated Moving Average (SARIMA) serving as a baseline statistical model.
The RF model is an ensemble learning algorithm that constructs multiple decision trees, each trained on a random subset of the data. The final prediction is obtained by aggregating individual tree outputs, typically through averaging. RF is highly effective in capturing non-linearity and is resistant to overfitting, making it particularly suitable for datasets with limited observations. In this study, RF was applied to both monthly and two-week forecasting tasks, demonstrating strong performance in long-term prediction scenarios.
The TCN is a deep learning model optimized for sequential data processing. Unlike traditional recurrent architectures such as LSTM networks, TCN employs causal convolutions, allowing for parallelization and efficient long-term dependency modeling. The network consists of multiple convolutional layers, utilizing dilated convolutions to expand the receptive field while keeping computational complexity manageable. Weight normalization and dropout regularization were applied to enhance the model’s generalization capabilities.
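The role of causal and dilated convolutions can be illustrated with a single filter. The sketch below is not a full TCN: it omits residual blocks, weight normalization, and dropout. The receptive-field helper assumes one convolution per dilation level, and all names and values are illustrative.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], with zeros before t = 0.

    Only current and past samples are used, so no future value leaks into y[t].
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            src = t - j * dilation
            if src >= 0:
                acc += w * x[src]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past steps seen by a stack with one causal conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(causal_dilated_conv(x, weights=[1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))  # 31
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth while the number of weights grows only linearly, which is the efficiency argument made above.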
The NBEATS model is a fully connected deep learning architecture specifically designed for time-series forecasting. Instead of relying on recurrence, it uses a stacked architecture with trend and seasonality decomposition, enabling it to capture both short-term fluctuations and long-term trends. The model incorporates residual connections and supports multi-horizon forecasting, making it well-suited for hydropower prediction. However, due to the relatively small dataset size, NBEATS exhibited limited effectiveness compared to RF and TCN in this study.
The SARIMA model, used as a benchmark, is a statistical forecasting technique that integrates autoregressive (AR), differencing (I), and moving average (MA) components, along with a seasonal adjustment component. While SARIMA effectively models linear trends and periodic patterns, it struggles with complex non-linear dependencies, making it less competitive than ML-based models.
To improve predictive accuracy, each ML model underwent hyperparameter optimization using Optuna, an automated optimization framework. The RF model was tuned across eight hyperparameters, including the number of estimators, maximum depth, and minimum samples per split, balancing accuracy and computational efficiency. The TCN model was optimized for seven hyperparameters, such as kernel size, number of layers, dropout rate, and dilation factor, while NBEATS was fine-tuned for five hyperparameters, including the number of stacks, blocks, and layers, to enhance generalization. SARIMA’s parameters were manually optimized based on the Akaike Information Criterion and Bayesian Information Criterion.
All models were trained using Walk-Forward Validation (WFV), a time-series-specific cross-validation approach that preserves the temporal order of observations. In WFV, models are iteratively trained on past data and evaluated on the next time step(s), ensuring that future data is never leaked into training.
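The splitting logic behind WFV can be sketched as follows. The sketch assumes an expanding training window; the text does not say whether the study used an expanding or a sliding window. The function name and parameters are hypothetical.

```python
def walk_forward_splits(n_obs, initial_train, horizon=1):
    """Yield (train_indices, test_indices) pairs in temporal order.

    The training window expands by `horizon` observations after each fold,
    and every test index is strictly later than every training index.
    """
    start = initial_train
    while start + horizon <= n_obs:
        yield list(range(start)), list(range(start, start + horizon))
        start += horizon


for train, test in walk_forward_splits(n_obs=6, initial_train=3):
    print(train, "->", test)
# [0, 1, 2] -> [3]
# [0, 1, 2, 3] -> [4]
# [0, 1, 2, 3, 4] -> [5]
```

Setting `horizon=2` would produce folds that are each evaluated on the next two time steps, mirroring a two-step-ahead setup.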
The models were trained using the Darts Python library, a specialized toolkit for time-series forecasting. The RF model achieved the highest accuracy for monthly forecasting, with SMAPE ≈ 8.0%, outperforming both TCN and NBEATS. SARIMA performed worse than RF but surpassed NBEATS, confirming its suitability for simpler time-series patterns. For two-week forecasting, TCN exhibited the lowest error, achieving SMAPE ≈ 4.9%, outperforming both RF and NBEATS. While SARIMA performed reasonably well, it was slightly less accurate than TCN. For two-step-ahead forecasting, an RF model trained on two-week data yielded SMAPE ≈ 6.8%, outperforming the direct monthly forecasting model. This result underscores the advantage of higher-frequency data in long-term forecasting, suggesting that aggregating shorter time intervals can improve predictive accuracy for monthly hydropower forecasting.
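For readers unfamiliar with the metric, SMAPE can be computed as in the sketch below. Several variants of SMAPE exist; this one divides by the mean of the absolute actual and forecast values, and the study may use a different definition. The sample numbers are illustrative only.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # When both values are zero the error is zero by convention here.
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


print(round(smape([100.0, 200.0], [110.0, 180.0]), 2))  # 10.03
```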
The forecasting model developed in the study by Delgado et al. is based on an LSTM network, optimized for short-term PV power prediction. The architecture is designed to capture temporal dependencies in PV power generation while mitigating data noise and outliers through an HMM preprocessing step. The model is implemented in two configurations: SISO LSTM, which takes only active power as an input and predicts the next time step’s power output, and MISO LSTM, which incorporates multiple meteorological variables such as temperature, humidity, wind speed, horizontal and diffuse irradiance, and wind direction.
Each LSTM cell is governed by three gates: a forget gate that determines which past information should be discarded, an input gate that updates the cell state with new information, and an output gate that generates the hidden state passed to the next time step. The MISO LSTM model includes three hidden layers, a dense output layer, and dropout layers for regularization, and uses the ReLU activation function and the Adam optimizer. The hidden layers allow the model to learn hierarchical feature representations, and the dropout layers help prevent overfitting.
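The interaction of the three gates can be sketched for a single scalar LSTM cell. This uses the textbook formulation with sigmoid gates and tanh activations, which is an assumption: the study reports ReLU activations and multi-unit layers. The parameter names in the dictionary are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; returns (hidden state, cell state)."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])        # what to discard
    input_gate = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # how much to write
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])   # new information
    output = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])        # what to expose
    c = forget * c_prev + input_gate * candidate
    h = output * math.tanh(c)
    return h, c


# Illustrative parameters; a trained network would learn these values.
params = {
    "wf": 0.5, "uf": 0.1, "bf": 0.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wg": 0.9, "ug": 0.3, "bg": 0.0,
    "wo": 0.4, "uo": 0.1, "bo": 0.0,
}
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.9]:  # a short normalized input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The additive update of the cell state `c` is what allows information to persist across many time steps.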
The optimization strategy employs the Adam optimizer, which provides efficient gradient updates and an adaptive learning rate. Hyperparameter tuning was performed using Optuna, which searched for the best combination of the number of neurons per layer, batch size, dropout rate, learning rate decay schedule, and number of training epochs. The SISO model was trained for ten epochs, whereas the MISO model, due to its larger feature space, was trained for 200 epochs.
The model was trained using WFV, a time-series-specific cross-validation technique that ensures the model learns only from past data, avoiding data leakage. The dataset was divided into 40% for training, 30% for validation, and 30% for testing. The primary training data came from the DKA Solar dataset, while the Ambient Weather dataset was used for additional testing and robustness evaluation. The data underwent preprocessing, including HMM-based outlier detection and min-max normalization, ensuring that input values remained within a uniform range.
Performance evaluation was conducted using MSE, RMSE, and MAE. The SISO model produced satisfactory results with an MSE of 3.05 × 10⁻³ kW, but the MISO LSTM significantly improved performance, achieving an MSE of 2.17 × 10⁻⁷ kW. This demonstrated the advantage of incorporating multiple input features. For one-step-ahead forecasting, the MISO model outperformed baseline methods such as Support Vector Machine (SVM), Radiation Classification Coordinate LSTM (RCC-LSTM), and Echo State Network Convolutional Neural Network (ESNCNN).
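The three error metrics follow their standard definitions, sketched below with illustrative values that are not taken from the study.

```python
import math


def mse(actual, forecast):
    """Mean squared error; its unit is the square of the target's unit."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, expressed in the target's own unit."""
    return math.sqrt(mse(actual, forecast))


def mae(actual, forecast):
    """Mean absolute error, expressed in the target's own unit."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(mse(actual, forecast), rmse(actual, forecast), mae(actual, forecast))
```

Because MSE squares the errors, it penalizes occasional large deviations more heavily than MAE does.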
In conclusion, the LSTM-based forecasting model provides highly accurate short-term PV power predictions, with MISO LSTM outperforming traditional statistical and ML models. The combination of HMM-based outlier detection, Optuna hyperparameter tuning, and deep learning optimization significantly enhances prediction accuracy, making the model a robust tool for real-time grid integration and PV power planning.
The forecasting model developed in the study by Wang et al. integrates the Sparrow Search Algorithm (SSA), LSTM, and RF to predict daily peak and valley electric loads. The architecture is designed to capture non-linear dependencies, temporal patterns, and external meteorological influences while ensuring optimized parameter selection and robust classification.
The model architecture consists of three primary components. The LSTM network serves as the core of the forecasting process, leveraging its ability to retain long-term dependencies in sequential data while addressing the vanishing gradient problem commonly associated with traditional recurrent neural networks. To enhance accuracy and prevent overfitting, the SSA is employed to fine-tune LSTM’s hyperparameters, including the number of hidden units, learning rate, and iteration count. The optimized LSTM outputs predictions for daily peak and valley load values. These predicted values are then passed into an RF classifier, which determines the precise time of occurrence for peak and valley loads, ensuring a more structured and interpretable forecasting framework.
The optimization process relies on SSA, an advanced swarm intelligence algorithm that mimics the foraging and anti-predation behavior of sparrows. SSA effectively improves the LSTM model’s performance by refining hyperparameter selection, thus avoiding local minima and enhancing convergence speed. Unlike conventional grid search or particle swarm optimization (PSO), SSA demonstrates superior adaptability in balancing exploration and exploitation, leading to better predictive accuracy.
The training process involves processing 15 min interval load data along with meteorological variables such as temperature, humidity, and precipitation. Data preprocessing includes outlier detection using standard deviation filtering, missing value imputation with linear interpolation, and feature selection using Pearson correlation and RF-based importance ranking. Time series features, including historical peak and valley values, seasonal patterns, and economic activity indicators, are incorporated to improve the model’s ability to capture load variations.
The LSTM model is trained with the SSA optimized parameters, after which the predicted daily peak and valley values are used as inputs to the RF model. The RF classifier then determines the time of occurrence for peak and valley loads, enhancing the interpretability of the forecast.
Performance evaluation is conducted using standard forecasting metrics, including MAPE, RMSE, and R². The results indicate that the SSA-LSTM-RF model outperforms alternative forecasting approaches such as RF-PSO-LSTM and traditional regression models. The proposed model achieves a lower RMSE and higher R², demonstrating improved accuracy in predicting both load magnitudes and peak-valley times. Over a forecast horizon of three months, the SSA-LSTM-RF approach exhibits superior generalization, maintaining high forecasting precision even as the time step increases.
The forecasting model presented in the paper by David et al. is designed to predict solar irradiance using sky camera imagery, focusing on probabilistic binary forecasts for ultra-short-term horizons ranging from 1 to 30 min. The model architecture consists of two main stages: a deterministic cloud detection and motion estimation system based on sequential sky images, followed by a probabilistic forecasting approach using statistical and ML models.
The deterministic stage involves identifying cloud presence and movement from a sequence of sky images captured at one-minute intervals. This is achieved through image processing techniques that analyze color space transformations (RGB and HSV) and radiometric data to segment clouds that attenuate direct normal irradiance (DNI) below a predefined threshold. A cloud motion vector is then computed using cross-correlation methods to track cloud displacement over time, allowing the system to generate initial deterministic forecasts of future cloud cover conditions.
To refine these deterministic forecasts, the study employs three different probabilistic modeling approaches: logit regression, probit regression, and a non-parametric RF model. The logit and probit models belong to the family of generalized linear models, where the probability of a clear sky event is estimated as a function of explanatory variables. Logit regression applies a logistic transformation to map input variables to probability values, while the probit model uses a cumulative normal distribution function for probability estimation. In contrast, the RF model comprises an ensemble of decision trees trained on historical data, enabling it to capture complex, non-linear relationships between meteorological variables and cloud movement patterns.
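The difference between the logit and probit models lies only in the function that maps the linear predictor to a probability. The sketch below shows both link functions; the coefficients and the single explanatory variable are hypothetical and do not reflect the fitted models in the study.

```python
import math


def logit_probability(score):
    """Logistic function applied to a linear predictor."""
    return 1.0 / (1.0 + math.exp(-score))


def probit_probability(score):
    """Standard normal cumulative distribution applied to a linear predictor."""
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


# Hypothetical linear predictor: intercept plus one coefficient times a
# deterministic binary forecast (1 = clear sky expected, 0 = cloud expected).
intercept, coefficient = -1.0, 2.5
for deterministic_forecast in (0, 1):
    score = intercept + coefficient * deterministic_forecast
    print(deterministic_forecast,
          round(logit_probability(score), 3),
          round(probit_probability(score), 3))
```

Both functions return 0.5 at a score of zero; the probit curve approaches 0 and 1 faster in the tails for the same score.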
The optimization of the models was conducted using cross-validation techniques, where hyperparameters such as the number of decision trees in RF and the regularization parameters for logit and probit models were fine-tuned. The dataset from 2010 was used for model training, while the 2011 data served as the test set to ensure an independent evaluation of forecasting performance. The best-performing probabilistic model was selected based on a combination of reliability, resolution, and overall predictive accuracy.
Training was performed using sequences of sky images and corresponding solar irradiance measurements, with input features including cloud motion vectors, solar zenith angle, historical clear sky indices, and past irradiance levels. A post-processing step was applied to the deterministic forecasts, converting them into probabilistic predictions that express the likelihood of exceeding the DNI threshold at different forecasting horizons.
Performance evaluation was conducted using multiple verification metrics. The deterministic forecasts were assessed using traditional accuracy measures, while the probabilistic forecasts were evaluated based on reliability diagrams, relative operating characteristic (ROC) curves, and the Brier Score, which quantifies the accuracy of probability predictions. The results demonstrated that the RF model significantly outperformed both the logit and probit models in terms of reliability and overall skill. The RF approach provided the highest accuracy, with improvements of up to 11.6 percentage points for short-term forecasts compared to the baseline deterministic model.
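The Brier Score mentioned above is the mean squared difference between forecast probabilities and binary outcomes. A minimal sketch with made-up probabilities and outcomes:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0 is a perfect forecast, and a constant forecast of 0.5
    scores 0.25 regardless of the outcomes.
    """
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical forecasts that DNI exceeds the threshold, with observed events.
probabilities = [0.9, 0.2, 0.6]
outcomes = [1, 0, 0]
print(round(brier_score(probabilities, outcomes), 4))  # 0.1367
```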
The Adaptive Neuro-Fuzzy Inference System (ANFIS) optimized with particle swarm optimization (PSO) proposed in the paper by Lara-Cerecedo et al. is a hybrid forecasting model designed to improve the accuracy of PV power generation predictions. ANFIS integrates the strengths of artificial neural networks (ANNs) and fuzzy logic inference systems (FISs), while PSO optimizes ANFIS parameters to enhance predictive performance. Unlike purely statistical models such as ARIMA or Holt–Winters, ANFIS-PSO effectively captures non-linear dependencies between meteorological factors and PV output while remaining computationally efficient compared to deep learning techniques.
ANFIS functions as a five-layer neuro-fuzzy network in which each layer plays a distinct role in the learning and inference process. The fuzzification layer converts crisp numerical input variables into fuzzy membership values, assigning each input a degree of belonging to a specific fuzzy set. The rule layer defines fuzzy rules based on the combination of membership function values, with each rule representing a logical condition linking inputs to the output. The normalization layer normalizes rule strengths so that they sum to 1. The defuzzification layer maps fuzzy rule outputs to crisp numerical values using a weighted sum. Finally, the output layer produces the final prediction by aggregating the results from the defuzzification layer.
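The five layers can be traced in a small forward pass. The sketch assumes a first-order Sugeno-type system with Gaussian membership functions and product rule strengths, which is a common ANFIS configuration but is not confirmed by the text for this study. All parameter values and names are illustrative.

```python
import math
from itertools import product


def gaussian_mf(x, center, sigma):
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, mf_params, consequents):
    """Forward pass of a first-order Sugeno ANFIS; returns (output, weights)."""
    # Layer 1 (fuzzification): membership degree of each input in each fuzzy set.
    memberships = [
        [gaussian_mf(x, c, s) for (c, s) in sets]
        for x, sets in zip(inputs, mf_params)
    ]
    # Layer 2 (rules): one rule per combination of fuzzy sets, product strength.
    strengths = [math.prod(combo) for combo in product(*memberships)]
    # Layer 3 (normalization): strengths rescaled to sum to 1.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Layer 4 (defuzzification): each rule's linear consequent, weighted.
    rule_outputs = [
        w * (sum(a * x for a, x in zip(coef[:-1], inputs)) + coef[-1])
        for w, coef in zip(normalized, consequents)
    ]
    # Layer 5 (output): sum of the weighted rule outputs.
    return sum(rule_outputs), normalized


# Two inputs, two fuzzy sets each ("low" and "high"), hence four rules.
mf_params = [[(0.0, 1.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 1.0)]]
# Each consequent is [a1, a2, b] for the rule output a1*x1 + a2*x2 + b.
consequents = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 3.0], [0.0, 0.0, 4.0]]
output, weights = anfis_forward([0.5, 0.5], mf_params, consequents)
print(output, weights)
```

With both inputs midway between the two set centers, all four rules fire equally, so the output is the plain average of the four rule constants.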
ANFIS requires extensive parameter tuning to optimize fuzzy membership function shapes, rule weights, and consequent parameters. To address this, PSO is integrated to improve generalization and accuracy. PSO is a swarm intelligence-based optimization algorithm inspired by the collective behavior of flocks of birds or schools of fish. Each particle in the swarm represents a candidate solution, defined as a vector of ANFIS parameters. Particles are initialized randomly, and their positions evolve dynamically through iterative updates. Over time, particles converge toward the optimal parameter set that minimizes the error function.
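The particle update described above can be sketched with a basic global-best PSO. In the study each particle would encode ANFIS parameters and the objective would be a forecasting error; here a simple quadratic stands in for that objective. The function name, the inertia and acceleration coefficients, and the bounds are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=100,
                 bounds=(-5.0, 5.0), inertia=0.7, cognitive=1.5,
                 social=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_cost = [objective(p) for p in positions]
    best = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = personal_best[best][:], personal_cost[best]

    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[k][d] = (
                    inertia * velocities[k][d]
                    + cognitive * r1 * (personal_best[k][d] - positions[k][d])
                    + social * r2 * (global_best[d] - positions[k][d])
                )
                moved = positions[k][d] + velocities[k][d]
                positions[k][d] = min(hi, max(lo, moved))
            cost = objective(positions[k])
            if cost < personal_cost[k]:
                personal_best[k], personal_cost[k] = positions[k][:], cost
                if cost < global_cost:
                    global_best, global_cost = positions[k][:], cost
    return global_best, global_cost


# Stand-in error surface with its minimum at (1, -2).
def error_surface(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best_params, best_cost = pso_minimize(error_surface, dim=2)
print(best_params, best_cost)
```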
The ANFIS-PSO model is trained using historical PV generation data, along with key meteorological inputs, including solar radiation, module temperature, ambient temperature, and wind speed. Its performance is evaluated using standard statistical error metrics such as RMSE, RMSPE, MAE, and MAPE. The results show that the optimized ANFIS-PSO model significantly outperforms standard ANFIS, with a 58% and 62% reduction in RMSE and MAPE, respectively, demonstrating superior predictive accuracy and improved robustness in PV power forecasting.