1. Introduction
The development of renewable energy sources (RES) in the European Union (EU) plays a crucial role in the transition to a sustainable and low-emission economy. The EU is striving to significantly reduce greenhouse gas emissions and achieve carbon neutrality by 2050, which requires a substantial increase in the share of renewable energy in the energy mix. Key areas include the development of solar and wind energy, which have immense potential to meet Europe’s growing energy needs with a much lower environmental impact than fossil fuels. Moreover, the introduction of RES helps reduce economic dependence on imported fossil fuels, which is strategically important for the region’s energy security.
In addition, investments in renewable energy sources create new jobs and help stimulate economic development, especially in non-urbanized (remote and rural) areas. Technological innovations in RES not only improve their efficiency and reliability but also make them more competitive with traditional energy sources. Support from the European Commission and national governments in creating favorable legislation plays a vital role in accelerating the integration of RES into the energy system. This allows for more effective use of the RES potential and minimizes financial losses associated with generation constraints and energy system imbalances. Thus, the development of RES in the EU is an important step towards a sustainable and environmentally friendly future.
One of the problems, in addition to large financial losses for the state (using Ukraine as an example), is that renewable energy producers are held responsible for imbalances, i.e., deviations of actual electricity production from the forecast. Unlike traditional energy sources (e.g., coal or nuclear power plants), electricity production from RES is difficult to forecast because of the variability of weather conditions. Modern forecasting systems predict hourly production from RES with an average error of around 10–20%. Adopting a more decentralized and transparent approach to organized trade and regional balancing could eliminate inefficiencies in the centralized energy market model and improve its functionality [1].
An analysis of the Electricity Market Act [2] highlights the importance of accurate forecasting of RES generation for reducing losses for power plant owners and ensuring the reliability of the grid. Therefore, more accurate forecasting of electricity production from RES, for both large power plants (solar and wind farms) and small private generation units (prosumers), is becoming a pressing issue. More accurate forecasting will allow grid operators to use the potential of RES efficiently without resorting to generation limits. It will also reduce the need to keep outdated coal-fired power plants in “hot” reserve and increase energy efficiency in networks through more precise planning of power flows from various RES.
The development of RES and the introduction of cheaper, sustainable and environmentally friendly alternatives to traditional energy sources give countries affected by fossil fuel shortages an opportunity to reduce their energy dependency [3].
Current research on photovoltaic (PV) energy generation forecasting emphasizes the use of meteorological data to improve forecast accuracy [4,5]. One forecasting methodology for PV sources is based on the design of experiments (DOE) method [6], which can significantly reduce the number of experimental runs while ensuring high decision-making accuracy and reliability. A review of the application of machine learning (ML) and deep learning (DL) methods to PV power forecasting shows that, while ML methods have been widely studied, the use of DL methods for this purpose remains limited [7].
A systematic review of current trends in PV power forecasting technologies highlights the need to develop highly accurate forecasting models with minimal dependence on weather conditions [8]. The application of ML and artificial intelligence (AI) methods in this area can improve forecast accuracy. The work [9] presents a method for short-term PV power forecasting based on recurrent neural networks (RNN). The model shows high forecast accuracy at 15 min and 30 min horizons, indicating the stability and reliability of the proposed method for real PV power plants.
The work [10] analyzes various methods for forecasting PV power, comparing them in terms of prediction method, time horizon, measurement error and computational cost. Artificial neural network (ANN) and support vector machine (SVM) methods are the most effective for complex nonlinear prediction problems. Beyond using ANN alone to forecast electricity generation by PV plants, hybrid models combining ANN with statistical methods have been proposed to forecast solar radiation intensity [11]. These hybrid methods have shown better performance than traditional methods, especially on sunny days.
A study applying ANN to the forecasting of electricity imbalances showed that the long short-term memory (LSTM) model had the lowest error values of the compared methods, indicating the high effectiveness of the proposed approach [12]. Unlike traditional RNN, LSTM networks can retain information over long intervals thanks to special mechanisms such as memory cells and control gates.
A deep learning ANN architecture was proposed for predicting the amount of electricity supplied by renewable energy producers, together with the upper and lower limits of the forecast interval; its characteristic feature is the use of autoencoder blocks with shortcut connections [13]. This reduced the average day-ahead forecast error to 4.46% and the maximum error to 12.81%.
At the same time, a significant part of the research is devoted to linking meteorological parameters with electricity production by solar power plants. For example, a study of the relationship between meteorological variables and the output power of a PV plant [14] found a strong correlation between temperature, solar radiation intensity and installation efficiency. Dimensionality reduction techniques, such as feature selection and principal component analysis, can significantly reduce computation time while maintaining high model accuracy. Additionally, an innovative approach to PV power forecasting based on satellite images, described in [15], uses a model that accounts for the nonlinear movement of clouds, allowing more accurate prediction of their trajectories. The work [16] uses an ANN to predict generation from various weather parameters. However, when forecasting electricity generation from meteorological data, it is crucial to consider the possibility of missing data. For example, studies have demonstrated that hybrid learning methods robust to missing data can maintain high forecasting accuracy even with substantial amounts of incomplete data [17].
In addition to meteorological parameters, the characteristics of ground station locations have also been proposed as forecasting inputs. For instance, an approach based on a broad learning system and copula theory was proposed to improve the accuracy of distributed generation forecasting [18]. This approach accounts more precisely for the spatio-temporal dependencies between installations, i.e., the interdependencies between the outputs of individual PV plants. The paper [19] emphasizes the importance of considering the geographical distribution of installations and the combination of different installation types. The work [20] proposes using a natural simulation model to improve an automatic PV generation forecasting system, which helps increase PV generation volumes in power systems and improves the management of distributed generation.
In addition to the technical aspects of forecasting PV plant operation, research is also being conducted on the integration of PV plants into the power grids of various countries. The results presented in [21] showed that Germany and Spain have the highest forecast accuracy. Power forecasts enable grid operators and system designers to plan plant operation optimally and to manage energy supply and demand. A study of the forecasting system in the German energy market indicates that improved forecasts of fundamental variables, such as electricity demand and solar and wind energy production, contributed to a 13.5% increase in revenue from market selection for energy sales [22]. The study [23] discusses the importance of PV generation forecasting for successful grid integration and participation in energy markets.
Many of the analyzed studies compare different models for forecasting the operation of PV power plants. For example, an ensemble forecasting method based on energy consumption data was used for planning operations in the electricity market [24]; future steps may involve integrating additional data, such as weather conditions, to improve the models. A comparison of LSTM and gated recurrent unit (GRU) neural network models for predicting PV power generation showed that GRU outperforms LSTM when long-term training data are used [25]. The study [26] proposed a method for forecasting PV power in distribution networks with high PV penetration, based on a small set of representative monitoring locations.
The analysis of the publications shows that the quality of PV generation forecasting depends on the accuracy of meteorological data and on the use of advanced ML and DL methods. Future research should focus on developing highly accurate predictive models that integrate additional data, such as weather conditions, to improve forecast accuracy. Hourly power generation forecasting using neural networks already achieves high accuracy, so further studies may focus on reducing the computational resources and training time that neural networks require.
For example, in [27] the average forecast error ranges from −0.97% to 4.91% when actually measured solar radiation intensity is used, and from −3.86% to 5.12% when calculated values and weather forecasts are used. The wider range in the second case results from inaccuracies in weather forecasts and in calculations of solar radiation intensity. Such an ANN requires only 1000–2000 data samples, which reduces computational requirements and accelerates training. To improve forecasting accuracy, it is necessary to determine the optimal set of ANN input data (primarily meteorological data) for high-quality forecasting of solar power plant operation, without overloading the model with unnecessary data that complicate training.
3. Research Results
In the first stage of the analysis, correlation coefficients were determined between the daily vectors of hourly values of individual meteorological parameters, e.g., temperature [T1, T2, …, T24], and the generated active power [P1, P2, …, P24]. The results of the calculations, averaged over the year, are shown in Table 1. In addition to the Pearson correlation coefficients, Spearman’s rank correlation coefficients were calculated for all meteorological parameters (Table 1). Spearman’s correlation between active power and the meteorological parameters was higher than Pearson’s, suggesting a nonlinear, monotonic relationship.
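The per-day correlation procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: each parameter is stored as a (days × 24) array of hourly values, Spearman’s coefficient is computed as Pearson’s coefficient of average ranks (the usual tie-aware definition, which matters because night-time generation produces many tied zeros), and days on which a vector is constant are skipped. The function names are hypothetical, not taken from the study.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values receive the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson coefficient; NaN when either vector is constant."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return np.nan if denom == 0 else float(np.sum(a * b) / denom)


def mean_daily_correlations(param, power):
    """Average per-day Pearson and Spearman coefficients.

    param, power: arrays of shape (days, 24) with hourly values.
    Days with a constant vector yield NaN and are ignored by nanmean.
    """
    r_p, r_s = [], []
    for x_day, p_day in zip(np.asarray(param), np.asarray(power)):
        r_p.append(pearson(x_day, p_day))
        r_s.append(pearson(average_ranks(x_day), average_ranks(p_day)))
    return float(np.nanmean(r_p)), float(np.nanmean(r_s))
```

For a monotonic but nonlinear dependence, for example power growing with the cube of a parameter, this sketch returns a Spearman coefficient of 1 and a Pearson coefficient below 1, which is the pattern reported above.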
As can be seen from this table, the highest correlation with active power generation is shown by the parameters related to solar radiation intensity (about 0.92), time of day and panel temperature (0.82 each), followed by wind speed (0.56) and air temperature (0.54). Example measurement results illustrating the daily relationship between changes in various meteorological parameters and the active power generated by a solar power plant are shown in Figures 4–7.
Since solar power generation depends on the time of day, a modified time scale that better reflects the physical nature of the process was introduced when assessing the correlation between generation and time of day. The time scale was transformed so that values increasing from 0 to 12 correspond to the morning rise in solar intensity, and values decreasing from 12 back to 0 correspond to its decline in the afternoon. With conventional clock time, generation first rises and then falls, so a linear correlation coefficient largely misses this dependence; the folded scale makes the relationship between time of day and generation level approximately linear. This representation reflects the nature of solar power generation as a function of time of day more accurately, which in turn can improve the accuracy of forecasting and modeling in photovoltaic power systems.
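A minimal sketch of the folded time scale, assuming it is symmetric about 12:00 clock time (the source does not state whether local solar noon was used; the `solar_noon` argument is a hypothetical parameter for that adjustment). The example also shows why the transformation matters: for an idealized, invented bell-shaped generation profile, the Pearson coefficient with clock time is close to zero, whereas with the folded scale it is high.

```python
import numpy as np


def folded_time(hours, solar_noon=12.0):
    """Map clock time to a 0 -> 12 -> 0 scale centred on solar noon."""
    hours = np.asarray(hours, dtype=float)
    return np.clip(12.0 - np.abs(hours - solar_noon), 0.0, 12.0)


# Idealized clear-sky profile: zero at night, peak at 12:00 (illustrative only).
hours = np.arange(24, dtype=float)
power = np.maximum(0.0, np.sin(np.pi * (hours - 6.0) / 12.0))

r_clock = np.corrcoef(hours, power)[0, 1]
r_folded = np.corrcoef(folded_time(hours), power)[0, 1]
print(f"clock time: r = {r_clock:.2f}, folded time: r = {r_folded:.2f}")
```

For this synthetic profile the coefficients come out at roughly 0.06 with clock time and 0.91 with the folded scale.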
The high correlation coefficient between time of day and generation (0.82) can be attributed to the cyclical and predictable nature of daily changes in meteorological parameters. Time of day strongly influences factors such as solar radiation intensity and temperature, which in turn significantly affect the level of power generation by the solar power plant.
Panel temperature also plays an important role, with a correlation coefficient of r = 0.82. Panel temperature directly affects the efficiency of converting solar energy into electricity. Its high positive correlation with generation arises because the panels are heated by the same solar radiation that drives generation; at the same time, a higher panel temperature reduces the conversion efficiency and thus the power generated by the solar power plant (Figure 4).
Air temperature, with a correlation coefficient of 0.54, has a moderate relationship with generation. Air temperature alone does not directly determine power generation by the solar power plant; it is an intermediate parameter that depends on solar radiation intensity.
Figure 5 shows the variation of air temperature and generated active power: maximum active power generation occurs at 11:00–12:00, whereas the maximum air temperature occurs at 14:00–15:00. This lag is due to the thermal inertia of the air, which continues to warm up after solar radiation intensity has peaked.
Wind speed also shows a noticeable correlation, with a coefficient of 0.56. Wind can produce several effects, such as cooling the panel surface or changing heat exchange with the environment, which affect the studied parameters. However, the mechanism by which this parameter affects active power generation requires further research. The changes in active power generation and wind speed are shown in Figure 6.
The negative correlation coefficients of some meteorological parameters (e.g., r = −0.48 for air humidity) also have a clear explanation.
Figure 7 shows the daily variation of active power generation and air humidity: the humidity level decreases as generation increases. This parameter is not directly related to generation, but it indirectly reflects the level of solar radiation and the associated rise in air temperature, which reduce the humidity level.
The correlation coefficients given in Table 1 do not properly reflect the real impact of meteorological parameters on electricity production, because the correlation of active power generation with solar radiation intensity dominates. Meteorological parameters such as air temperature, panel temperature, air humidity and wind speed depend directly on solar radiation intensity. Consequently, part of the correlation of panel temperature or air temperature with generation is in fact inherited from the correlation between active power generation and solar radiation.
To remove the influence of solar radiation intensity from the correlations of the other meteorological parameters, the correlation of these parameters with the unit active power generation P* (the active power generated per unit of current solar radiation intensity, in kW/(W/m²)) was used as a more objective assessment of the correlation relationship. The unit active power generation is defined by the formula:
$$P^{*} = \frac{P}{G}, \tag{8}$$
where P is the generated active power (kW), and G is the solar radiation intensity (W/m²).
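As a sketch, P* can be computed directly from the monitored series of P and G. The ratio P/G is undefined at night, when G is zero, so this illustration returns NaN whenever G falls below a threshold; the threshold `g_min` and its default of 20 W/m² are assumptions for illustration, not values from the study.

```python
import numpy as np


def unit_active_power(p_kw, g_wm2, g_min=20.0):
    """Unit active power generation P* = P / G in kW/(W/m^2).

    Hours with irradiance below g_min (an assumed cut-off) are set to NaN
    because the ratio is undefined or numerically unstable there.
    """
    p = np.asarray(p_kw, dtype=float)
    g = np.asarray(g_wm2, dtype=float)
    p_star = np.full(p.shape, np.nan)
    valid = g >= g_min
    p_star[valid] = p[valid] / g[valid]
    return p_star


# Night hour, morning hour and midday hour of a hypothetical plant.
print(unit_active_power([0.0, 50.0, 400.0], [0.0, 100.0, 800.0]))
```

The morning and midday hours both give P* = 0.5 kW/(W/m²), although P itself differs eightfold, which illustrates why P* stays nearly constant between sunrise and sunset.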
Figure 8 shows a combined graph of the solar power plant’s power generation, solar radiation and unit active power generation. This graph confirms the close dependence of generation on solar radiation, which results in their high correlation coefficient (Table 1). At the same time, the level of unit active power generation (P*) remains practically constant while the power plant operates (from sunrise to sunset), since generation and solar radiation intensity change similarly during the day.
Using this indicator (P*) yields more realistic correlation coefficients between meteorological parameters and electricity production. Table 2 shows the correlation coefficients of meteorological parameters with the unit active power generation (P*).
The correlation coefficients shown in Table 2 are lower than those in Table 1 and better reflect the relationship between the selected meteorological parameters and unit active power generation. The highest correlation coefficients for the studied period are for time of day (0.71), panel temperature (0.57), wind speed (0.42) and air temperature (0.35). Other parameters require more detailed study, such as the intensity and amount of precipitation (r = 0) and the degree of soiling of the photovoltaic panels (r = −0.14); in this study, their influence on energy generation by the solar power plant is practically absent.
The daily profile of unit active power generation P*(t) is not constant, although it varies much less than generation P(t) (Figure 8). This is because the variation of P*(t) is influenced by virtually all meteorological parameters except solar radiation. The stronger the correlation between these meteorological parameters and power generation, the greater the variation in unit active power generation.
To evaluate how the correlation coefficients of the input parameters affect the quality of forecasting solar power plant (SPP) generation, an artificial neural network was created; its structure is shown in Figure 9. The type, structure and parameters of the ANN were selected based on the experience of previous studies [27].
Neural network parameters:
- Type: feedforward network;
- Number of neurons in the hidden layer: 30;
- Hidden layer activation function: hyperbolic tangent sigmoid;
- Output layer activation function: linear;
- Learning algorithm: Levenberg–Marquardt;
- Performance evaluation function: mean squared error;
- Maximum number of learning epochs: 1000;
- Minimum gradient to stop learning: 1 × 10⁻⁵.
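The configuration listed above can be approximated in plain NumPy. The sketch below is an illustration under assumptions, not the authors’ implementation: one hidden layer of 30 tanh neurons (the hyperbolic tangent sigmoid), a linear output neuron, and a basic Levenberg–Marquardt loop that minimizes the mean squared error and stops after 1000 epochs or when the gradient norm drops below 1 × 10⁻⁵. The damping factor `mu` is divided by 10 after a successful step and multiplied by 10 otherwise; its initial value, these factors, the weight initialization and the lower bound on `mu` are assumed values.

```python
import numpy as np


def _unpack(theta, n_in, n_hidden):
    """Split the flat parameter vector into layer weights and biases."""
    k = n_hidden * n_in
    W1 = theta[:k].reshape(n_hidden, n_in)
    b1 = theta[k:k + n_hidden]
    w2 = theta[k + n_hidden:k + 2 * n_hidden]
    b2 = theta[-1]
    return W1, b1, w2, b2


def predict(theta, X, n_hidden=30):
    """Feedforward pass: tanh hidden layer, linear output neuron."""
    W1, b1, w2, b2 = _unpack(theta, X.shape[1], n_hidden)
    return np.tanh(X @ W1.T + b1) @ w2 + b2


def _jacobian(theta, X, n_hidden):
    """Derivatives of the network output with respect to every parameter."""
    W1, b1, w2, _ = _unpack(theta, X.shape[1], n_hidden)
    H = np.tanh(X @ W1.T + b1)                  # (samples, n_hidden)
    D = (1.0 - H ** 2) * w2                     # d output / d hidden pre-activation
    J_W1 = (D[:, :, None] * X[:, None, :]).reshape(len(X), -1)
    return np.hstack([J_W1, D, H, np.ones((len(X), 1))])


def train_lm(X, y, n_hidden=30, max_epochs=1000, min_grad=1e-5,
             mu=1e-3, mu_dec=0.1, mu_inc=10.0, mu_max=1e10, seed=0):
    """Levenberg-Marquardt training that minimizes the mean squared error."""
    rng = np.random.default_rng(seed)
    n_params = n_hidden * X.shape[1] + 2 * n_hidden + 1
    theta = rng.normal(0.0, 0.5, n_params)
    mse = np.mean((predict(theta, X, n_hidden) - y) ** 2)
    for _ in range(max_epochs):
        r = predict(theta, X, n_hidden) - y     # residuals
        J = _jacobian(theta, X, n_hidden)
        grad = J.T @ r                          # gradient of 0.5 * sum(r**2)
        if np.linalg.norm(grad) < min_grad:
            break
        JTJ = J.T @ J
        while mu <= mu_max:
            step = np.linalg.solve(JTJ + mu * np.eye(n_params), -grad)
            new_mse = np.mean((predict(theta + step, X, n_hidden) - y) ** 2)
            if new_mse < mse:                   # accept step, trust the model more
                theta, mse = theta + step, new_mse
                mu = max(mu * mu_dec, 1e-12)    # assumed floor for numerical safety
                break
            mu *= mu_inc                        # reject step, move toward gradient descent
        else:
            break                               # mu exceeded mu_max: stop training
    return theta, mse
```

In practice the inputs would be the min–max-normalized parameters chosen by the correlation analysis and the target the normalized active power; any data used to try the sketch here would be synthetic.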
The work takes into account that, in real conditions, meteorological monitoring data may be incomplete or contain gaps and anomalous values. To ensure correct operation of the neural network in such cases, anomalous values are detected with the interquartile range (IQR) method, which divides the data sample into quartiles:
- Q1 (first quartile): the value below which 25% of the sample lies;
- Q3 (third quartile): the value below which 75% of the sample lies;
- IQR = Q3 − Q1: the interquartile range.
A value is considered anomalous if it is less than Q1 − 1.5·IQR or greater than Q3 + 1.5·IQR.
In the next step, each such outlier is replaced by the average of the neighbouring points (rolling average). In addition, the generalization ability of the neural network itself reduces the model’s sensitivity to noise and random fluctuations in the data.
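The outlier-handling step can be sketched as follows, assuming the IQR bounds are computed per parameter over the whole series, missing values (NaN) are treated the same way as outliers, and the rolling average uses the valid points immediately before and after each flagged value. The function name and the `half_window=1` setting are assumptions; the source does not give the window size.

```python
import numpy as np


def replace_outliers_iqr(x, k=1.5, half_window=1):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (and NaNs) and replace
    each with the mean of the valid neighbours within half_window points."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):
        flagged = np.isnan(x) | (x < q1 - k * iqr) | (x > q3 + k * iqr)
    cleaned = x.copy()
    for i in np.flatnonzero(flagged):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        neighbours = x[lo:hi][~flagged[lo:hi]]
        if neighbours.size:                # otherwise leave the value for later handling
            cleaned[i] = neighbours.mean()
    return cleaned, flagged


series = [1.0, 2.0, 3.0, 4.0, 100.0, 5.0, 6.0]
cleaned, flagged = replace_outliers_iqr(series)
print(cleaned)   # the spike at index 4 becomes (4 + 5) / 2 = 4.5
```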
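Next, min–max normalization of the input parameters and of the active power generated by the solar power plant is performed. This eliminates discrepancies in the data scales. Because min–max scaling is itself sensitive to extreme values, it is applied after the outliers (anomalous values), which can be caused by measurement errors or unstable weather conditions, have been replaced:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x$ is the original value of a parameter, $x_{\min}$ and $x_{\max}$ are its minimum and maximum values in the data sample, and $x'$ is the normalized value in the range [0, 1].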
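A sketch of the normalization step, assuming the minimum and maximum are taken from the training sample and reused for new data, and that network outputs are mapped back to kilowatts with the inverse transform. The function names are hypothetical.

```python
import numpy as np


def minmax_fit(x):
    """Return the (min, max) of a training series."""
    x = np.asarray(x, dtype=float)
    lo, hi = float(np.min(x)), float(np.max(x))
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi


def minmax_transform(x, lo, hi):
    """Scale values to [0, 1] using training-sample bounds."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def minmax_inverse(z, lo, hi):
    """Map normalized values (e.g. network outputs) back to original units."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo


lo, hi = minmax_fit([0.0, 250.0, 500.0])          # e.g. active power in kW
print(minmax_transform([0.0, 250.0, 500.0], lo, hi))
print(minmax_inverse([0.5], lo, hi))
```

New values outside the training range map outside [0, 1]; reusing the training bounds keeps the inverse transform consistent for forecasts.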
Today, the Mean Absolute Percentage Error (MAPE) generally prevails in assessing the accuracy of forecasts of electricity production and consumption. MAPE is based on the relative error of each value, expressed as the ratio of the absolute error to the actual value [31]:
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{P_i - P'_i}{P_i} \right|,$$
where $n$ is the size of the data sample (24 h), $P_i$ is the real value, and $P'_i$ is the predicted value.
The MAPE metric has a significant drawback when actual values are close to zero. When $P_i$ approaches zero, the denominator becomes very small, which sharply increases the relative error. This distorts the overall estimate, as even small absolute deviations can produce very high relative errors. As a result, MAPE is unsuitable for analyzing data that contain low or zero values, such as PV generation around sunrise and sunset. To avoid this drawback, we propose normalizing the error by dividing the MAE (Mean Absolute Error) by the average actual power [31]:
$$nMAE = \frac{\frac{1}{n}\sum_{i=1}^{n} \left| P_i - P'_i \right|}{\frac{1}{n}\sum_{i=1}^{n} P_i} \cdot 100\%.$$
This approach keeps the estimate stable, since the normalization is performed relative to the average level of the actual values rather than to each individual value. Thus, large relative errors near zero actual values do not dominate the overall estimate, which makes the metric more stable and better suited to forecasting electricity production and consumption.
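The difference between the two metrics can be illustrated with a small numerical sketch. The arrays are invented for illustration: a forecast that is off by only 0.1 kW in an hour with almost no generation (0.01 kW) and exact in the other hours.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, %. Undefined when an actual value is 0."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((a - p) / a))


def nmae(actual, predicted):
    """MAE normalized by the mean actual value, %."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(a - p)) / np.mean(a)


actual = [0.01, 10.0, 20.0]       # kW; the first value is a near-zero hour
predicted = [0.11, 10.0, 20.0]
print(f"MAPE = {mape(actual, predicted):.1f}%")   # about 333.3%
print(f"nMAE = {nmae(actual, predicted):.2f}%")   # about 0.33%
```

A single 0.1 kW miss in the near-zero hour dominates MAPE, whereas nMAE weighs it the same as a 0.1 kW miss at midday.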
Table 3 shows the forecast errors (%), calculated using Formula (10), for different sets of input data. The last two columns present the average forecast error over 14 days and the improvement in the forecast achieved by using the unit active power generation.
The results show that taking into account meteorological parameters with a high correlation coefficient (|r| ≥ 0.3), such as panel temperature, wind speed or humidity, reduces the error in forecasting electricity generation. Moreover, adding the unit active power generation to any set of meteorological parameters reduces the error further (Figures 10 and 11).
This is because the unit active power generation indirectly accounts for most of these parameters, reflecting the integral influence of environmental conditions. The effect is especially pronounced when the unit active power generation is added to a small set of meteorological data (Table 3, items 1–5). When it is added to a large data set (Table 3, item 6), the error decreases to a much lesser extent (Figures 12 and 13).
This is because the unit active power generation already reflects the impact of these meteorological parameters on electricity generation. The remaining reduction in error, from 1.53% to 1.35%, arises because the unit active power generation also captures the influence of other meteorological parameters that were not included in the forecast.
The unit active power generation therefore acts as an integral indicator of environmental conditions. This makes it a valuable input, because it accounts for hidden dependencies and implicit influences that are difficult to incorporate into the model in any other way.
Thus, to achieve an optimal balance between model complexity and forecast accuracy, it is recommended to use a set of parameters that includes insolation, time and unit active power generation. Considering other meteorological parameters may be justified only under specific conditions or to improve forecasts over short-term intervals.
The proposed forecasting method accounts for the influence of solar radiation through the unit active power generation (P*), which allows the model to adapt to changes in the input meteorological conditions. However, geographical location and seasonal fluctuations can affect forecast accuracy. Slight deviations in forecast accuracy are possible, especially when weather conditions deteriorate sharply, which is difficult to predict from standard meteorological data. To minimize the influence of these factors, the study used the following:
- Adaptation of the model to local climatic features by retraining on regional samples;
- Additional meteorological parameters (temperature, humidity, cloudiness) to increase accuracy;
- Continuous retraining of the ANN to ensure adaptability.
4. Conclusions
The study of correlation relationships between meteorological parameters and power generation by photovoltaic power plants confirms the significant role of meteorological factors, such as air and photovoltaic panel temperature and wind speed, as well as of the time of day, in the electricity generation process. The primary parameter influencing electricity generation by an SPP is solar radiation intensity. Nevertheless, adding meteorological parameters can make forecasts of electricity production more accurate.
Currently, traditional forecasting models often focus solely on solar radiation, ignoring meteorological variables that also have a significant impact on electricity generation by solar power plants. The aim of estimating correlation coefficients between active power generation and meteorological parameters is to determine whether and to what extent these parameters should be included in forecasting models for solar power generation. The outcome of this analysis should be an optimal set of parameters used to build predictive models.
Including meteorological data in a model for forecasting SPP generation is justified only when the meteorological parameters have a moderate or high correlation with the output value, i.e., correlation coefficients with an absolute value above 0.3–0.4. This reduces the volume of input data by excluding factors that do not significantly affect forecast accuracy, which is especially important for simplifying models and minimizing computational costs when training neural networks.
Analysis of the results showed that the correlations of meteorological parameters such as panel temperature or air temperature with power generation contain a component that stems from the dependence of both these parameters and power generation on solar radiation intensity. To isolate the influence of meteorological parameters from that of solar radiation intensity, the article proposes a new indicator P*, defined by Formula (8), which expresses the active power generated per unit of solar radiation intensity. Using P* instead of the generated active power as a more objective basis for the correlation analysis removes the dominant correlation with solar radiation intensity and focuses attention on the parameters that directly (time) and indirectly (air temperature, panel temperature, air humidity) affect electricity production by photovoltaic panels.
It should be emphasized that this indicator is relatively easy to determine for a specific solar power plant, because both the generated active power and the solar radiation intensity are continuously monitored at its location. Using the unit active power generation to forecast electricity production significantly increases forecast accuracy, as illustrated by the calculation results in Table 3.
The results of this study are important because they allow for an optimal approach to building neural network models for forecasting electricity production from photovoltaic panels. They help avoid the use of unnecessary data, prevent computational overload during calculations and improve the quality and accuracy of forecasting. This is achieved by using input parameters that have a strong correlation with the target values and clearly influence them.