Methods for Integrating Extraterrestrial Radiation into Neural Network Models for Day-Ahead PV Generation Forecasting

Abstract: Variability, intermittency, and limited controllability are inherent characteristics of photovoltaic (PV) generation that result in inaccurate solutions to scheduling problems and instability of the power grid. As the penetration level of PV generation increases, it becomes more important to mitigate these problems by improving forecasting accuracy. One way to improve forecasting performance is to include a seasonal component. Thus, this study proposes using information on extraterrestrial radiation (ETR), which is the solar radiation outside of the atmosphere, in neural network models for day-ahead PV generation forecasting. Specifically, five methods for integrating the ETR into the neural network models are presented: (1) division preprocessing, (2) multiplication preprocessing, (3) replacement of existing input, (4) inclusion as additional input, and (5) inclusion as an intermediate target. The methods were tested on two datasets from Australia using four neural network models: a multilayer perceptron (MLP) and three recurrent neural network (RNN)-based models, namely vanilla RNN, long short-term memory (LSTM), and gated recurrent unit (GRU). It was found that, among the integration methods, including the ETR as the intermediate target improved the mean squared error by 4.1% on average and by up to 12.28% in RNN-based models. These results verify that the integration of ETR into PV forecasting models based on neural networks can improve the forecasting performance.


Introduction
Recently, the use of renewable energy sources (RES) for reducing greenhouse gases and consequent sustainable development has been considered inevitable. The integration of RES has been successful to some extent, as various countries have actively implemented policies for wide integration, such as feed-in-tariffs and renewable portfolio standards, and the levelized cost of energy (LCOE) of RES has decreased through technological developments. The International Renewable Energy Agency (IRENA) reported that photovoltaic (PV) power generation tripled from 250 TWh in 2015 to 720 TWh in 2019 [1]. Additionally, it was projected via an International Energy Agency (IEA) sustainable development scenario that approximately 3268 TWh will be produced by PV generation by 2030 [2].
However, such an increase in PV generation causes instability in power systems due to its variability, intermittency, and limited controllability. Specifically, from the perspective of transmission system operators (TSOs), the unfavorable characteristics of RES, including PV generation, can exacerbate the imbalance between power supply and demand [3] and make power system planning and operation more difficult [4]. To address such problems, TSOs need to secure a significant amount of flexible resources, which in turn increases the electricity bills of customers. The threat that RES pose to power systems persists because they have unlimited access to the grid. For example, electricity is frequently settled at a negative price in Germany, where 32.5% of gross power production is generated by wind and solar. The demand of the power system minus the generation of RES is called net demand; because of their unlimited access to the grid, RES generation cannot be adjusted and is therefore treated much like ordinary demand. Negative prices arise because a sudden surge in supply from RES sharply reduces net demand and puts the power system in a condition of oversupply, in which inflexible generators with high ramp-up and ramp-down costs keep generating even at negative electricity prices. Therefore, a negative electricity price indicates a severe condition of a power system, because its occurrence means the power system cannot remain in balance using only flexible resources [5]. Furthermore, from the perspective of distribution system operators (DSOs), a high penetration of PV generation results in the need for investment in various alternative resources to achieve stable power system operation by addressing problems such as voltage fluctuation, increased network losses, and feeder overloading in the distribution network [6][7][8].
Increasing the forecasting accuracy of PV generation is one of the simplest and most economical solutions to such problems because it helps balance-responsible parties (BRPs) reduce the penalty caused by the imbalance between the scheduled and actual PV generation. Forecasts over different horizons serve different purposes. Ultra short-term forecasting is used in techniques such as power smoothing and real-time electricity dispatch. Short-term forecasting is useful for power scheduling tasks, such as unit commitment and economic power dispatch, and can also be used in PV-integrated energy management systems [9]. In contrast, medium-term and long-term forecasting are used in power system planning, where network optimization is performed and investment decisions are made [10].
PV forecasting can be classified, according to its methodology, into physical methods, statistical approaches, and machine learning approaches [9]. In physical methods, the radiant energy reaching the Earth's surface is determined using a physical model of the atmosphere, which affects the solar radiation. In Dolara et al. [11], the PV output was forecasted using an irradiance model considering the transmittance of the atmosphere and an air-glass optical model. In statistical approaches, PV generation is forecasted through the statistical analysis of input variables. For instance, multi-period PV generation was forecasted using autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models in Colak et al. [12], and an autoregressive moving average with exogenous inputs (ARMAX) model was applied to statistical PV forecasting by including weather forecasting data in Li et al. [13]. In the last category, machine learning approaches, a forecasting model is trained by updating the model parameters based on existing data. Machine learning has wide applications, such as in dynamical systems with memory [14] and health monitoring [15]. Various machine learning models have been applied to forecasting PV generation, such as a support vector machine (SVM) [16], a Bayesian neural network [17], and recurrent neural network (RNN)-based models such as long short-term memory (LSTM) [18] and the gated recurrent unit (GRU) [19].
There have been attempts to separate the components of a forecasting target in time series forecasting. In Keles et al. [20], the spot prices in the electricity market were forecasted by separating them into deterministic and stochastic signals. In Zhu et al. [21], Zang et al. [22], and Li et al. [23], time series data were decomposed using wavelet decomposition, and then the decomposed signals were used to train a neural network. Similarly, time series data of PV generation can be regarded as having seasonal and temporal components. The seasonal component results from changes in the altitude of the Sun due to the Earth's orbit and rotation, while various stochastic factors, such as weather conditions, are the source of the temporal component.
On the other hand, there have been studies in which PV generation data of similar dates during a year are combined to forecast the output. As the generation at similar dates has the seasonal component in common, their combination leads to more accurate forecasting of the seasonal component. Thus, a forecasting method using the seasonal component is likely to show good forecasting performance. PV generation data of adjacent days were used in the convolutional neural network (CNN) [22] and the time correlation method [24]. In Li et al. [25], the seasonal component was forecast using the weighted sum of the two outputs from the CNN with inputs of PV generation at adjacent days and from the LSTM with inputs of total PV generation over one day. The forecasting error can also be reduced by indirectly considering seasonal changes through the regularizing effect of an ensemble model. In Wen et al. [26], an ensemble of a back-propagation neural network, radial basis function neural network, extreme learning machine, and Elman neural network was used for forecasting the seasonal component of PV generation, and in Gigoni et al. [27], the combination of the grey-box model, neural network, k-nearest neighbor, quantile random forest, and support vector regression was the best in terms of forecasting accuracy. Table 1 summarizes prior studies on PV generation forecasting by method.

Energies 2021, 14, 2601

Table 1. Summary of existing PV generation forecasting methods.

Method | Reference | Highlights
Physical | [11] | Models the physical state and dynamics of the atmosphere.
Statistical | [12,13] | Generates forecasting data with appropriate statistical assumptions.

In contrast to the approaches used in the previous studies, the seasonal component can be determined more accurately by direct geometric modeling, based on the fact that solar radiation changes periodically according to the Earth's rotation and orbit. Thus, in this study, we propose methods that calculate the seasonal component using the angle of incidence and the solar constant to improve forecasting accuracy, and those methods are verified using four different neural network models: a multilayer perceptron (MLP) and three RNN-based models, namely vanilla RNN, LSTM, and GRU. The contributions of this paper are as follows.

• This is the first study that explicitly introduces extraterrestrial radiation, that is, the solar radiation outside the atmosphere, into various neural network models;
• Methods for integrating the extraterrestrial radiation into neural network models for PV generation forecasting are presented;
• To verify the effectiveness of the proposed methods, the methods are applied to four neural network models: MLP and three RNN-based models, namely vanilla RNN, LSTM, and GRU. Then, their forecasting performances are examined and compared.

Seasonal Changes in Extraterrestrial Radiation
Extraterrestrial radiation (ETR) refers to the power per unit area of sunlight irradiated at the distance between the Sun and the Earth in space. The ETR is radiation that does not consider the effects of the atmosphere and acts as an upper limit of the energy from the Sun reaching the Earth. The ETR has a seasonal characteristic because the Earth's orbit and rotation are seasonal. The parameters for modeling the seasonality of the ETR include the change in the distance between the Sun and Earth, the change in the solar declination due to the Earth's orbit, and the circumferential motion of the Sun due to the Earth's rotation.
Since the Earth's orbit is not circular but elliptical, the distance between the Sun and the Earth changes according to the orbit, which causes corresponding changes in radiation. In addition, since the Earth orbits the Sun with its rotation axis tilted by 23.5 degrees, the declination of the Sun varies with time, which is associated with the amount of sunlight throughout the year. The Earth's rotation also affects the radiation throughout the day, that is, there is a large amount of radiation during the day in contrast to the absence of radiation during the night. In this study, the ETR was geometrically determined as a function of time based on the diurnal motion model. Specifically, the solar constant was used to determine the maximum of daily ETR, and then it was adjusted according to the angle of incidence of the Sun with respect to the PV panel.

Solar Constant
The solar constant, $G_{sc}$, is defined as the power per unit area irradiated from the Sun at the average distance between the Sun and Earth. It can be measured from a satellite to eliminate the atmospheric effect, and its value is taken from [28] as Equation (1). The radiation on a surface perpendicular to the sunlight on the $n$-th day of a year, denoted as $G_{on}$, is calculated as [29]:

$$G_{on} = G_{sc} \left(1 + 0.033 \cos \frac{360 n}{365}\right) \qquad (2)$$

where the argument of the cosine is in degrees.
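As a minimal sketch of Equation (2), the function below evaluates the normal-incidence extraterrestrial radiation for a given day of the year, assuming the cosine argument 360n/365 in degrees. The function name and the default solar constant of 1367 W/m² are assumptions made for illustration (a commonly quoted value); the value actually used in [28] should be substituted.

```python
import math

# Assumed solar constant in W/m^2 (illustrative value, not taken from the text).
G_SC = 1367.0


def normal_extraterrestrial_radiation(day_of_year: int, g_sc: float = G_SC) -> float:
    """Radiation on a surface normal to the Sun's rays outside the atmosphere.

    Implements G_on = G_sc * (1 + 0.033 * cos(360 * n / 365)), angle in degrees.
    """
    angle_deg = 360.0 * day_of_year / 365.0
    return g_sc * (1.0 + 0.033 * math.cos(math.radians(angle_deg)))
```

The correction term is largest around the start of the year and smallest around mid-year, so the value varies by roughly ±3.3% around the solar constant.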

Angle of Incidence
The component of the ETR that is incident on the PV panel is obtained by considering the angle of incidence (θ), that is, the angle between the incoming radiation and the normal to the PV panel. The angle of incidence can be derived as a function of time by geometric modeling. The necessary variables include not only the declination, hour angle, latitude, and longitude but also the tilt angle and azimuth angle, which are the angles defining the geometric configuration of the PV panels.

Declination
As illustrated in Figure 1, the declination (δ) is defined as the angle between the line connecting the Sun and the Earth and the plane of the Equator in the equatorial coordinate system. It is positive when the Sun is north of the Equator. The declination on the $n$-th day of a year can be approximately determined as given in [28].
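A widely used approximation of this kind is Cooper's equation. It is given here as an assumption about the form intended in [28], with $n$ the day of the year and δ in degrees:

```latex
\delta = 23.45^{\circ} \, \sin\!\left( 360^{\circ} \cdot \frac{284 + n}{365} \right)
```

For example, at $n = 81$ (around the March equinox) the argument equals 360° and δ is 0°, while near $n = 172$ (around the June solstice) δ approaches +23.45°.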

Hour Angle
The Sun's position in the celestial coordinate system can be expressed as a function of time within a day by considering the circumferential motion of the Sun due to the Earth's rotation. However, to accurately integrate the time zone into the geometrical model, the local time needs to be converted into solar time, which is determined based on the position of the Sun, following the rule given in [28]. In this rule, $L_{st}$ and $L_{loc}$ are the longitudes of the Local Standard Time Meridian and of the specific location, respectively, and $E$ is a correction term that is also calculated as in [28]. The solar time is then further converted into the hour angle (ω). The hour angle is conceptually illustrated in Figure 2.
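The conversion from local standard time to solar time and then to the hour angle can be sketched as follows. The formulas are the ones commonly given in solar engineering texts and are assumed, not confirmed, to match those of [28]; the function names are hypothetical. Longitudes are taken as positive east in this sketch, so the longitude correction is written as 4·(L_loc − L_st) minutes, which is equivalent to 4·(L_st − L_loc) when longitudes are measured positive west.

```python
import math


def equation_of_time_minutes(day_of_year: int) -> float:
    """Correction term E in minutes (commonly used series approximation)."""
    b = math.radians((day_of_year - 1) * 360.0 / 365.0)
    return 229.2 * (
        0.000075
        + 0.001868 * math.cos(b)
        - 0.032077 * math.sin(b)
        - 0.014615 * math.cos(2.0 * b)
        - 0.04089 * math.sin(2.0 * b)
    )


def solar_time_hours(standard_time_hours: float, day_of_year: int,
                     lon_std_east: float, lon_loc_east: float) -> float:
    """Convert local standard time (hours) to solar time (hours).

    Longitudes are in degrees, positive east (an assumption of this sketch).
    Each degree of longitude corresponds to 4 minutes of time.
    """
    correction_min = 4.0 * (lon_loc_east - lon_std_east) + equation_of_time_minutes(day_of_year)
    return standard_time_hours + correction_min / 60.0


def hour_angle_deg(solar_time: float) -> float:
    """Hour angle: 15 degrees per hour, negative before solar noon."""
    return 15.0 * (solar_time - 12.0)
```

A site east of its standard meridian reaches solar noon earlier on the clock, so its solar time is ahead of the standard time by 4 minutes per degree before the correction term E is added.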

Tilt and Azimuth of PV Panels
The PV panels are installed with a slope to maximize the amount of power generated by minimizing the angle of incidence. The parameters defining the geometrical configuration of a PV panel consist of tilt (β) and azimuth (γ), which are shown in Figure 3. The tilt is the angle between the horizontal plane and the PV panel. The azimuth is the angle between the Meridian and the projection of the normal vector of the PV panel onto the horizontal plane; it is positive toward the west and negative toward the east [28].

Calculation of the Angle of Incidence
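The angle of incidence (θ) for determining the ETR can be calculated by using the parameters explained in Subsections 2.2.1, 2.2.2, and 2.2.3 (declination, hour angle, and panel tilt and azimuth), as follows [28]:

$$\theta = \cos^{-1}\left( \sin\delta \sin\phi \cos\beta - \sin\delta \cos\phi \sin\beta \cos\gamma + \cos\delta \cos\phi \cos\beta \cos\omega + \cos\delta \sin\phi \sin\beta \cos\gamma \cos\omega + \cos\delta \sin\beta \sin\gamma \sin\omega \right) \qquad (8)$$

where $\phi$ is the latitude.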
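The five-term expression in Equation (8) can be checked numerically with a short sketch such as the one below. The function and argument names are hypothetical; all angles are in degrees, and the cosine is clamped to [-1, 1] before the inverse cosine to guard against rounding error.

```python
import math


def cos_incidence(decl_deg: float, lat_deg: float, tilt_deg: float,
                  azimuth_deg: float, hour_angle_deg: float) -> float:
    """Cosine of the angle of incidence on a tilted panel (all inputs in degrees)."""
    d, p, b, g, w = (math.radians(v) for v in
                     (decl_deg, lat_deg, tilt_deg, azimuth_deg, hour_angle_deg))
    return (
        math.sin(d) * math.sin(p) * math.cos(b)
        - math.sin(d) * math.cos(p) * math.sin(b) * math.cos(g)
        + math.cos(d) * math.cos(p) * math.cos(b) * math.cos(w)
        + math.cos(d) * math.sin(p) * math.sin(b) * math.cos(g) * math.cos(w)
        + math.cos(d) * math.sin(b) * math.sin(g) * math.sin(w)
    )


def incidence_angle_deg(decl_deg: float, lat_deg: float, tilt_deg: float,
                        azimuth_deg: float, hour_angle_deg: float) -> float:
    """Angle of incidence in degrees, with the cosine clamped to [-1, 1]."""
    c = cos_incidence(decl_deg, lat_deg, tilt_deg, azimuth_deg, hour_angle_deg)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))
```

Two sanity checks follow directly from the formula: for a horizontal panel (β = 0) at solar noon with δ = 0, the angle of incidence equals the latitude; and a panel at latitude 30° tilted by 30° with γ = 0 faces the Sun directly under the same conditions.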

Calculation of the Extraterrestrial Radiation
Once the solar constant and the angle of incidence at a place of interest are calculated according to (2) and (8), the value of ETR, denoted as $G_{ext}$, can be simply calculated as:

$$G_{ext} = G_{on} \cos\theta \qquad (9)$$

For instance, the process of calculating the ETR was applied to a place in Yulara, Australia. The resulting values of the ETR for a tilted surface on the first day of each month throughout a year are shown in Figure 4. The values of ETR had the same bell shape as typical clear-sky PV generation over a day. The place is located in the Southern Hemisphere and, thus, the Sun's altitude and the corresponding value of $\cos\theta$ are the highest in December and January. Accordingly, the ETR is the greatest in those months. The final radiation on the PV panel had the same bell-shaped pattern, but with more fluctuations depending on weather conditions. Consequently, the ETR can be interpreted as a piece of clean information on radiation from the Sun with the noise from the atmosphere removed.
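The bell-shaped daily profile and the December maximum can be reproduced with a simplified, self-contained sketch. It makes several assumptions that are not taken from the text: a horizontal surface (β = 0) instead of the tilted panel, a solar constant of 1367 W/m², Cooper's approximation for the declination, a hypothetical site at latitude 25° S, and clipping of negative values of cos θ to zero so that the ETR is zero at night.

```python
import math

G_SC = 1367.0  # assumed solar constant in W/m^2 (illustrative value)


def etr_horizontal(day_of_year: int, lat_deg: float, hour_angle_deg: float,
                   g_sc: float = G_SC) -> float:
    """ETR on a horizontal surface: G_ext = G_on * cos(theta), clipped at zero."""
    g_on = g_sc * (1.0 + 0.033 * math.cos(math.radians(360.0 * day_of_year / 365.0)))
    # Cooper's approximation of the declination (assumed form).
    decl = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    lat = math.radians(lat_deg)
    w = math.radians(hour_angle_deg)
    # For a horizontal surface the incidence formula reduces to two terms.
    cos_theta = math.sin(decl) * math.sin(lat) + math.cos(decl) * math.cos(lat) * math.cos(w)
    return g_on * max(0.0, cos_theta)


def daily_profile(day_of_year: int, lat_deg: float) -> list:
    """Hourly ETR values for solar times 0 h to 23 h."""
    return [etr_horizontal(day_of_year, lat_deg, 15.0 * (h - 12)) for h in range(24)]


if __name__ == "__main__":
    december = daily_profile(355, -25.0)  # hypothetical Southern Hemisphere site
    june = daily_profile(172, -25.0)
    print("noon ETR, December:", round(december[12], 1))
    print("noon ETR, June:", round(june[12], 1))
```

In this sketch the profile peaks at solar noon, is zero at night, and is higher in December than in June for a Southern Hemisphere latitude, which matches the qualitative behavior described for Figure 4.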

Forecasting Models
In this section, the traditional persistence model and representative neural network models, which are used for comparison in the case study later, are briefly described.

Persistence Model
In the persistence model, the PV outputs of the previous day are used as the forecasted PV generation. Although the persistence model is simple, it shows satisfactory forecasting performance, particularly when the weather conditions do not change significantly. The persistence model is used as the reference when comparing forecasting accuracy in the case study.
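A minimal sketch of the persistence model is shown below. The function names and the default of 24 hourly steps per day are assumptions for illustration; the mean squared error helper is included only to show how the reference error could be computed.

```python
def persistence_forecast(history, steps_per_day=24):
    """Return the most recent full day of observations as the next-day forecast."""
    if len(history) < steps_per_day:
        raise ValueError("need at least one full day of history")
    return list(history[-steps_per_day:])


def mean_squared_error(actual, forecast):
    """Average of the squared differences between actual and forecasted values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
```

When two consecutive days have nearly identical weather, the forecast and the actual profile almost coincide and the error is small, which is why the model serves as a reasonable baseline.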

Multilayer Perceptron
The multilayer perceptron (MLP) is a model in which nodes that imitate a human neural network are stacked in series and in parallel. Each node receives input data, multiplies them by its weights, applies its activation function to the intermediate value, and finally generates its output. The wider and deeper the network is, the more complex the functions that can be modeled.
A learning process is performed by computing the gradient of the loss function and updating the weights of a node using the back-propagation algorithm. If the MLP is deep with many layers, a vanishing gradient problem can occur, which means the gradient is no longer passed to the previous layers. Some activation functions, such as the rectified linear unit (ReLU), can mitigate the vanishing gradient problem. The MLP can be regarded as more effective and flexible than traditional regression models because there is no assumption on the form of the target function to be modeled.
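The forward computation of a node and of stacked layers can be illustrated with a small pure-Python sketch. The layer sizes, weights, and function names below are hypothetical, and training by back-propagation is deliberately left out.

```python
def relu(x):
    """Rectified linear unit activation."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[j][i] connects input i to node j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum plus bias
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(inputs, layers):
    """Apply a list of (weights, biases, activation) layers in order."""
    out = list(inputs)
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


if __name__ == "__main__":
    layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer with two nodes
        ([[1.0, 1.0]], [0.1], None),                    # linear output node
    ]
    print(mlp_forward([2.0, 3.0], layers))
```

In the usage example the first hidden node receives a negative weighted sum and is switched off by the ReLU, so only the second hidden node contributes to the output.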

Recurrent Neural Network
Unlike the MLP, the recurrent neural network (RNN) has a memory function that is implemented by sequentially feeding back the outputs or states of the previous time steps to the current input. Because the outputs of the previous time steps carry information about the past, the RNN is suitable for dealing with time series data. The learning process of the RNN is performed by the technique of back-propagation through time, which unfolds the RNN with respect to time and applies the same back-propagation algorithm as the MLP. However, the weights of the RNN have the risk of divergence, particularly when handling a long sequence of data, because the same weight parameters are updated repetitively during training.
To address the divergence problem, gate structures have been proposed for the RNN; the resulting models are called gated RNNs and include long short-term memory (LSTM) and the gated recurrent unit (GRU). The gates in a gated RNN determine which information is retained or discarded in the hidden state. The nodes constituting the gates make these decisions. The nodes apply the sigmoid function as their activation function to their intermediate outputs, which are then multiplied with the hidden state. As the sigmoid function is bounded in (0, 1), only a portion of the hidden state is preserved by the multiplication, namely the portion that the gate considers important. The parameters of the gate nodes are added to those of the simple RNN, and they are also updated in the learning process so that the gates make effective decisions. It has been empirically verified that gated RNNs have better characteristics in terms of convergence and accuracy for forecasting tasks with long sequences. Recently, gated RNNs have been widely used for time series forecasting [29,30].

Vanilla RNN
Vanilla RNN is a basic RNN structure that stores historical information in a hidden state. The hidden state at time $t$ ($h_t$) is determined by applying the hyperbolic tangent function to the weighted sum of the current input and the hidden state at the previous time as follows:

$$h_t = \tanh\left(W_{h,x} x_t + W_{h,h} h_{t-1} + b_h\right) \qquad (10)$$

where $W_{h,x}$ and $W_{h,h}$ are the weights; $b_h$ is the bias; $x_t$ is the input at time $t$; and $h_t$ is the output at time $t$.

Long Short-Term Memory
Figure 5 shows the structure of a cell in the LSTM. The cell has input, forget, and output gates that determine the output through the update equations (11)-(14), in which $W_{i,x}$, $W_{i,h}$, $W_{f,x}$, $W_{f,h}$, $W_{o,x}$, $W_{o,h}$, $W_{g,x}$, and $W_{g,h}$ are the weights; $b_i$, $b_f$, $b_o$, and $b_g$ are the biases; $x_t$ and $h_t$ are the input and output at time $t$; $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates at time $t$, respectively; $g_t$ is the candidate; and $\sigma(\cdot)$ is the sigmoid function. The function of the three gates of the LSTM cell is implemented as element-wise multiplication, denoted as an x inside a circle in Figure 6. Specifically, the forget gate eliminates unnecessary information from the cell state ($c_{t-1}$) of the previous time; the input gate extracts important information from the input; and the output gate generates the selective output from the sigmoid function values ($o_t$) for the hidden state ($h_{t-1}$) of the previous time.
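For reference, the standard LSTM formulation written with the symbols defined above is sketched below. It is an assumption that the paper's Equations (11)-(14) take this form; the ordering of the gate equations and the last two lines (cell-state and hidden-state updates, with $\odot$ denoting element-wise multiplication) follow the usual textbook presentation and are not taken from the source.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{i,x} x_t + W_{i,h} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{f,x} x_t + W_{f,h} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{o,x} x_t + W_{o,h} h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_{g,x} x_t + W_{g,h} h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Because each gate value lies in (0, 1), the products $f_t \odot c_{t-1}$ and $i_t \odot g_t$ keep only a fraction of the previous cell state and of the candidate, respectively.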

Gated Recurrent Unit
Unlike the LSTM, the cell of the GRU has two gates, that is, a reset gate and an update gate (Figure 6). In the update equations associated with the gates, $W_{z,x}$, $W_{z,h}$, $W_{r,x}$, $W_{r,h}$, $W_{h,x}$, and $W_{h,h}$ are the weights; $b_z$, $b_r$, and $b_h$ are the biases; $x_t$ and $h_t$ are the input and output at time $t$; and $z_t$ and $r_t$ are the update and reset gates at time $t$, respectively.
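A standard GRU formulation using the symbols listed above is sketched below. It is an assumption that the paper's update equations take this form; the candidate state $\tilde{h}_t$ is a symbol introduced here for illustration, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_{z,x} x_t + W_{z,h} h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_{r,x} x_t + W_{r,h} h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_{h,x} x_t + W_{h,h} \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

In this form the reset gate controls how much of the previous hidden state enters the candidate, and the update gate interpolates between the previous hidden state and the candidate.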

Forecasting Methods with Extraterrestrial Radiation
The ETR has rarely been considered in forecasting applications using data-driven machine learning models. One of the most useful properties of the ETR in a forecasting problem is that it accurately captures the seasonal effect on the target variable to be forecasted. Thus, the forecasting accuracy is expected to improve if the ETR is effectively integrated into existing forecasting methods. In this section, we propose methods to integrate the ETR into a data-driven forecasting method.

Forecasting Framework of the Base Method
In this subsection, the base day-ahead hourly forecasting framework, which does not consider the ETR, is presented as shown in Figure 7. The procedure is as follows:
1. The imputer imputes day-ahead generation and weather data;
2. The imputed data are arranged in the timeframe of a past interval (15 min) and present interval (1 h);
3. Sequenced data are split for training and testing;
4. Split data are scaled using the MinMaxScaler;
5. A forecasting model is trained and tested;
6. Forecasted results are saved and errors are calculated.
The simple imputer fills in omitted data using nearby data. The past interval, which is the interval of the model's input, was set to 15 min. The present interval, which is the interval of the model's output, was set to 1 h because most day-ahead electricity markets receive bids and offers in hourly units and unit commitment is solved on an hourly basis. The lower bound of the scaling interval of the MinMaxScaler was set to 0.1 to prevent the distortion of data resulting from dividing by a small number.
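The imputation, sequencing, splitting, and scaling steps above can be sketched in pure Python as follows. All function names are hypothetical, and several details are assumptions not stated in the text: missing values are filled from the nearest earlier value (or the nearest later one at the start), sequencing pairs each day with the following day, the split is chronological with an 80/20 ratio, and the upper bound of the scaling interval is 1.0. The manual scaler mimics min-max scaling onto [0.1, 1.0] and is fitted on the training data only.

```python
def impute_nearby(values):
    """Fill None entries with the nearest earlier value, else the nearest later one."""
    filled = list(values)
    last = None
    for i, v in enumerate(filled):          # forward pass
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):  # backward pass for leading gaps
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def make_day_pairs(series, steps_per_day):
    """Arrange a series into (previous day, next day) input/target pairs."""
    days = [series[i:i + steps_per_day]
            for i in range(0, len(series) - steps_per_day + 1, steps_per_day)]
    return [(days[k], days[k + 1]) for k in range(len(days) - 1)]


def split_train_test(samples, train_ratio=0.8):
    """Chronological split: the earliest samples are used for training."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def fit_min_max(train_values, low=0.1, high=1.0):
    """Return a function mapping [min, max] of the training data onto [low, high]."""
    v_min, v_max = min(train_values), max(train_values)
    span = (v_max - v_min) or 1.0  # avoid division by zero for constant data

    def scale(v):
        return low + (v - v_min) * (high - low) / span

    return scale
```

Fitting the scaler after the split, as in step 4, keeps information from the test period out of the scaling parameters.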

Forecasting Framework of Proposed Methods
In this subsection, the methods combining the ETR with a data-driven neural network model are described. The framework of the proposed methods is presented in Figure 7. Specifically, five integration methods were developed and are presented: (1) division preprocessing, (2) multiplication preprocessing, (3) replacement of existing input, (4) inclusion as additional input, and (5) inclusion as an intermediate target, which are denoted in order as M1, M2, M3, M4, and M5. In the proposed forecasting framework, the ETR is calculated for the given timestamps using the latitude and longitude of the site and the tilt angle and azimuth angle of the panel. Compared with the base method, M1 and M2 perform an additional function in the adjustment/re-adjustment block, which is drawn with a dashed line in Figure 7. Similarly, M3, M4, and M5 perform the function of the input filtering block, which is drawn with a dashed-double dotted line in Figure 7, to modify their input for their method-specific implementation.

Division Preprocessing
The ETR is attenuated by the atmosphere in proportion to its total magnitude. The ratio of the solar radiation reaching the ground to the ETR is called the clearness index in meteorological terms. Even if the atmospheric conditions are the same at 9:00 a.m. and 12:00 p.m., the absolute attenuated amount is larger at 12:00 p.m. than at 9:00 a.m. because of the difference in ETR. To accurately reflect the atmospheric conditions in a model, PV generation needs to be adjusted so that the model receives the same signal when meteorological circumstances are the same. Dividing the radiation by the ETR is one way of transforming the radiation into a signal affected only by meteorological conditions. Since PV generation is strongly correlated with the radiation, with R² = 0.99 [9], dividing the PV generation by the ETR is expected to have the same effect as dividing the radiation. Thus, method M1, which normalizes PV generation using the ETR, is proposed. The framework for M1 is shown in Figure 7, and it is implemented by dividing the PV generation by the ETR in the adjustment block and multiplying by the ETR in the re-adjustment block.
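The adjustment and re-adjustment blocks of M1 can be sketched as a pair of inverse operations. The function names are hypothetical, and the sketch assumes that both series have already been scaled so that the ETR has a lower bound of 0.1; the `floor` argument is an extra guard added here so that the division never uses a smaller denominator.

```python
def adjust_division(pv, etr, floor=0.1):
    """M1 adjustment: divide each PV value by the ETR of the same timestamp."""
    return [p / max(e, floor) for p, e in zip(pv, etr)]


def readjust_division(forecast, etr, floor=0.1):
    """M1 re-adjustment: multiply the model output back by the ETR."""
    return [f * max(e, floor) for f, e in zip(forecast, etr)]


if __name__ == "__main__":
    # Same atmospheric attenuation (50%) at a low-ETR and a high-ETR timestamp.
    etr = [0.6, 0.9]
    pv = [0.3, 0.45]
    print(adjust_division(pv, etr))  # both timestamps give the same signal
```

In the usage example the two timestamps have different absolute PV values but the same ratio to the ETR, so the model receives an identical signal for identical atmospheric conditions.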

Multiplication Preprocessing
Data-driven neural network models are trained by back-propagating an error. The mean squared error (MSE), which averages the squared differences between the predicted outputs and the targets, is generally used for training a model and is one of the performance indexes for evaluation. Since the MSE is calculated by squaring absolute differences, the outputs around noon, where the ETR is large, significantly affect the total MSE because of their magnitude. For example, if the model predicts poorly around noon and well in the morning, the MSE will be larger than in the opposite situation. If the training objective is to minimize the MSE, which is the most likely situation in real applications, it can be assumed desirable to train more accurately in timeslots with a high ETR. Multiplying PV generation by the ETR makes the squared error in the timeslots with a high ETR large during training, resulting in a strong back-propagation signal to the input variables in those timeslots to minimize the modified MSE. For this reason, method M2, which multiplies PV generation by the ETR, is proposed. The framework for M2 is also shown in Figure 7, and it is implemented by multiplying the PV generation by the ETR in the adjustment block and dividing by the ETR in the re-adjustment block.
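The weighting effect of M2 on the squared error can be illustrated with a short sketch. The function names and the numbers are hypothetical; as before, the ETR is assumed to be scaled with a lower bound of 0.1, and `floor` guards the division in the re-adjustment step.

```python
def adjust_multiplication(pv, etr):
    """M2 adjustment: multiply each PV value by the ETR of the same timestamp."""
    return [p * e for p, e in zip(pv, etr)]


def readjust_multiplication(forecast, etr, floor=0.1):
    """M2 re-adjustment: divide the model output by the ETR."""
    return [f / max(e, floor) for f, e in zip(forecast, etr)]


def squared_errors(actual, forecast):
    """Per-timestamp squared error."""
    return [(a - f) ** 2 for a, f in zip(actual, forecast)]


if __name__ == "__main__":
    etr = [0.2, 1.0]       # morning and noon
    actual = [0.5, 0.5]
    forecast = [0.4, 0.4]  # same absolute error at both timestamps
    errors = squared_errors(adjust_multiplication(actual, etr),
                            adjust_multiplication(forecast, etr))
    print(errors[1] / errors[0])  # relative weight of the noon error
```

With an ETR five times larger at noon than in the morning, the same absolute error contributes about 25 times more to the modified squared error at noon, which is the emphasis the method aims for.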

Replacement of Existing Input
Most forecasting frameworks use historical PV generation as one of their inputs. In these frameworks, the model receives only limited data, such as the data for one or two days prior, as input to forecast generation at a specific time, unless long periods of historical data are supplied directly, as in sequence-to-sequence LSTM, or indirectly, as in prior studies. In other words, the model preserves historical data only in the form of model parameters and has limited access to historical generation as input for forecasting. As meteorological conditions vary from day to day, today's PV generation can differ significantly from day-ahead generation because of this stochasticity. Performance can be expected to improve if the model receives a clean signal with the stochasticity removed. The ETR is the solar radiation unaffected by meteorological conditions and can be regarded as an aggregate of historical data, since summing all historical generation cancels out the stochasticity. Therefore, method M3, which replaces day-ahead generation with the ETR, is proposed. The framework for M3 is also shown in Figure 7; it is implemented by inserting the ETR and weather data through an input filtering block into a sequencing block.

Inclusion as Additional Input
The ETR is the solar radiation without stochasticity and is an upper bound of PV power generation. Day-ahead PV generation carries information about the day-ahead meteorological conditions, which can be correlated with today's conditions. The day-ahead data are especially valuable when similar weather conditions persist for several days. Both the ETR and day-ahead PV generation are therefore meaningful information for forecasting, and method M4, which includes both as inputs, is proposed. The framework for M4 is also shown in Figure 7; it is implemented by inserting day-ahead PV generation, ETR, and weather data through an input filtering block into a sequencing block.
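The difference between the base method, M3, and M4 lies only in which features pass through the input filtering block. The sketch below shows one possible way to assemble the inputs for a single day. The function name, the array shapes, and the reading of the day-ahead generation input as the generation profile of the preceding day (`prev_pv`) are assumptions, not details confirmed by the study.

```python
import numpy as np

def build_inputs(method, prev_pv, etr, weather):
    """Stack per-timeslot input features for one day.

    prev_pv, etr: arrays of shape (T,); weather: array of shape (T, W).
    Returns an array of shape (T, F), where F depends on the method.
    """
    prev_pv = np.asarray(prev_pv, dtype=float)[:, None]
    etr = np.asarray(etr, dtype=float)[:, None]
    weather = np.asarray(weather, dtype=float)
    if method == "base":    # historical generation + weather
        parts = [prev_pv, weather]
    elif method == "M3":    # ETR replaces historical generation
        parts = [etr, weather]
    elif method == "M4":    # ETR added alongside historical generation
        parts = [prev_pv, etr, weather]
    else:
        raise ValueError(f"unknown method: {method}")
    return np.concatenate(parts, axis=1)

# Hypothetical day with 24 timeslots and 3 weather features.
T, W = 24, 3
prev_pv, etr, weather = np.zeros(T), np.ones(T), np.zeros((T, W))
x_base = build_inputs("base", prev_pv, etr, weather)
x_m3 = build_inputs("M3", prev_pv, etr, weather)
x_m4 = build_inputs("M4", prev_pv, etr, weather)
```

M3 keeps the input dimension of the base method, whereas M4 adds one feature per timeslot.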

Inclusion as an Intermediate Target
Instead of inserting the raw ETR directly, it would be more effective for the ETR to be scaled by the daily averaged clearness index considering the weather. Therefore, a method that utilizes an MLP model trained separately to predict the clearness index is proposed.
This MLP model is named the clearness model, as it predicts the clearness index. The training framework for this model is shown in Figure 8 and is as follows:
1. The clearness model takes weather data as input and outputs a clearness index, which is a one-dimensional scalar;
2. ETR is multiplied by the output of the clearness model;
3. MSELoss between the adjusted ETR and the target value is calculated;
4. MSELoss is backpropagated to train the clearness model.
The framework for M5 is also shown in Figure 7, and it is implemented by inserting day-ahead PV generation, ETR multiplied by the output of the trained clearness model, and weather data through an input filtering block to a sequencing block.
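The four training steps of the clearness model can be sketched in PyTorch as shown below. This is an illustrative sketch under stated assumptions, not the configuration used in the study: the data are synthetic, the ETR and target are assumed to be scaled to [0, 1], the target is assumed to be PV generation, and the hidden size, sigmoid output, learning rate, and number of iterations are arbitrary choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data (hypothetical shapes and values).
n_days, n_slots, n_weather = 64, 24, 4
weather = torch.rand(n_days, n_weather)        # daily weather features
etr = torch.rand(n_days, n_slots)              # ETR per timeslot, scaled to [0, 1]
true_k = torch.sigmoid(weather @ torch.tensor([1.0, -2.0, 0.5, 0.0]))
target = true_k.unsqueeze(1) * etr             # stand-in for the training target

# Clearness model: weather -> one scalar per day (sigmoid bound is an assumption).
clearness_model = nn.Sequential(
    nn.Linear(n_weather, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(clearness_model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()
    k = clearness_model(weather)           # step 1: shape (n_days, 1)
    adjusted_etr = k * etr                 # step 2: broadcast over timeslots
    loss = loss_fn(adjusted_etr, target)   # step 3: MSELoss against the target
    loss.backward()                        # step 4: backpropagate
    optimizer.step()
    losses.append(loss.item())

# After training, the adjusted ETR serves as the additional input in M5.
with torch.no_grad():
    m5_feature = clearness_model(weather) * etr
```

Because the clearness model is trained separately, its output can be computed once and then passed to the forecasting model as an ordinary input feature.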

Dataset
In the case study, datasets provided by the Desert Knowledge Australia Solar Centre (DKASC) were used [31,32]. DKASC operates solar power plants in Alice Springs and Yulara in Australia, and their information is listed in Table 2. The datasets provided by DKASC contain both meteorological information and PV power generation data for the two regions. The specific features in the dataset are listed in Table 3. The wind direction feature was excluded because it increased the forecasting error. The global horizontal radiation and diffuse horizontal radiation features were not chosen either, because they strongly correlate with PV generation [9]. The pyranometer measurement was also excluded, as it measures radiation in a specific frequency band. The data used in the case study cover the period from May 2016 to January 2021. Of these, the data from May 2016 to January 2020 were used for training, and the remaining year of data, from February 2020 to January 2021, was used for testing.
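The chronological split described above could be expressed with pandas as in the following sketch. The data frame, the `pv_power` column name, the hourly sampling interval, and the exact start day of 1 May 2016 are hypothetical placeholders; only the month boundaries come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame standing in for a DKASC export.
index = pd.date_range("2016-05-01", "2021-01-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"pv_power": np.zeros(len(index))}, index=index)

# Label-based slicing on a DatetimeIndex includes both endpoints.
train = df.loc["2016-05-01":"2020-01-31"]   # May 2016 to January 2020
test = df.loc["2020-02-01":"2021-01-31"]    # February 2020 to January 2021
```

Splitting by date rather than at random keeps the test year strictly after the training period.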

Models and Performance Index
The effectiveness of the proposed methods was verified and compared using four representative neural network models: MLP, Vanilla RNN, LSTM, and GRU.
To evaluate the forecasting performance, two indexes were used: the mean squared error (MSE) and the mean absolute error (MAE). The two indexes are defined as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|$$
where $n$ is the number of samples, $x_i$ is the predictors, $y_i$ is the target output, and $f$ is the output forecasted by a model.
To determine the hyperparameters of the models, 10-fold cross-validation was conducted by randomly sampling 10 subsets of equal size from the training set and using each subset in turn for validation. The resulting hyperparameters of the models are listed in Table 4. The Adam optimizer was used in the training. The number of epochs was set to 150 because the average of the 10-fold validation errors saturated and oscillated within a small range after 60 to 90 epochs and increased after more than 200 epochs, as shown in Figure 9. Figure 9 also shows the average 10-fold training error of the same model, which steadily declined as the number of epochs increased. This means that the model had enough capacity to fit the training data, which supports the chosen size of the hidden states. The difference in scale between the validation error and the training error arose because MinMaxScaler was applied during training. Learning rate schedulers were not used in this study, because they can make the model converge to a poor local minimum, which leads to a higher MSE.
The RNN-based models were configured to be bidirectional, as shown in Figure 10, because it is more effective to consider future data and predicted outputs for prediction, especially in day-ahead forecasting. A simple two-layer MLP model was applied to the output of the RNN-based models to reduce the dimension of the hidden states to one.
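The two performance indexes and the random 10-fold split can be written compactly with NumPy. The sketch below is illustrative: the function names, the sample count, and the seed are assumptions, and the folds are built as a random partition of the training indices, which is one plausible reading of the procedure described above.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and forecasts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error between targets and forecasts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def random_folds(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k near-equal subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

folds = random_folds(n_samples=1000, k=10)
splits = []
for i, val_idx in enumerate(folds):
    # Each fold serves once for validation; the rest is used for fitting.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    splits.append((train_idx, val_idx))
    # A model would be fitted on train_idx and scored with mse/mae on val_idx.
```

Averaging the validation scores over the 10 splits gives the kind of curve that can be used to choose the number of epochs.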

Results
The proposed methods were applied to the four neural network models described in Section 5.2 for the two datasets. The results are listed in Tables 5 and 6. Each method was trained and tested 20 times for each combination of the model and the dataset to examine its effect on average.
As a result, M5 was the most effective in the majority of the dataset-model combinations. Table 7 shows the improvements M5 achieved relative to the base method in RNN-based models. M5 reduced the MSE by 4.1% on average and, at most, by 12.28% in the RNN-based models. These results imply that integrating ETR into neural network models can improve model performance without additional investment in data collection.
The fact that M5 had consistently higher performance than the other methods across different datasets and RNN-based models means that M5 can be expected to be effective when one forecasts PV generation without verifying which method would be the best.
We analyzed the results of each method, from which important lessons were derived as follows. First, for M1, two points are notable: (1) Comparing the performance of M1 with the persistence model on each dataset to consider the relative error, the performance on the BP Solar dataset was worse than that on the Desert Gardens dataset. (2) Even though M5 was superior most of the time, M1 performed best when the MLP was used on the BP Solar dataset. From point (1), it can be derived that the performance of each method varies considerably between datasets, even when the datasets are not vastly different, as their geographical locations are similar. From point (2), it can be derived that the performance of each method varies across different models. The reason that M1 featured a large error in one of the datasets is inferred as follows. When normalized, the PV generation values in timeslots with a small ETR and a large ETR become closer to each other. Gradient signals with similar magnitudes are then backpropagated in each timeslot during training, so that the two timeslots are treated equally even though their ETRs are different.
During testing, the outputs of the model are multiplied by ETR, which amplifies the error. If the model is trained inappropriately, forecasts in the timeslots with a large ETR can be less accurate than those in the timeslots with a small ETR, resulting in a higher MSE than the base method. However, as shown in point (2), M1 can be the best method depending on the forecasting configuration. Therefore, rather than adopting the method that performed best on a different dataset, it is important to verify the competitiveness of each method by validation and to choose the best method for the dataset and model in use.
M2 had better performance than M1 on average. Nevertheless, it was not consistently superior to the base method, implying that adjusting the gradient signal can also affect the result adversely.
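The error amplification noted for M1 can be made concrete with a small numeric example. The values below are hypothetical and chosen only to show the mechanism: the same error in the normalized space becomes a much larger squared error after re-adjustment in a high-ETR timeslot.

```python
import numpy as np

etr = np.array([200.0, 1000.0])       # hypothetical morning and noon ETR
norm_error = np.array([0.05, 0.05])   # identical error in normalized space

abs_error = norm_error * etr          # error after multiplying back by ETR
sq_error = abs_error ** 2             # contribution of each slot to the MSE
ratio = sq_error[1] / sq_error[0]     # noon contribution relative to morning
```

With a fivefold difference in ETR, the noon slot contributes 25 times more to the MSE than the morning slot, even though training treated the two errors as equal.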
M3, which replaces the historical generation with ETR, performed poorly in general. In particular, the MSE of M3 was 150% higher than that of the base method when the Desert Gardens dataset and RNN-based models were used. This result has two implications: (1) Assuming that ETR serves as a good baseline for prediction, the inaccurate forecasts of M3 indicate that the meteorological data were insufficient to predict the attenuation ratio of the atmosphere. (2) Comparing the situations with and without day-ahead generation, the superior performance of the methods that include day-ahead generation implies that day-ahead PV forecasting benefits from using day-ahead generation as a baseline for the prediction. In other words, the neural network-based models improve on the persistence model by considering meteorological information.
The results of M4 and M5 can be analyzed from the same perspective. M4 and M5, which take day-ahead PV generation as one of the inputs, keep day-ahead generation as a baseline and gained additional improvement from having one more feature than the base method. M4, which includes raw ETR as one of the inputs, showed a decent improvement on the Desert Gardens dataset but performed worse on the BP Solar dataset. In this experiment, adjusting the ETR using meteorological information, as in M5, was preferable most of the time.
Although M5 had a general advantage in our experiments, there are various circumstances in which PV generation is forecasted in a day-ahead manner in real world applications, and the effectiveness of each method varies depending on models and datasets. Therefore, it should be noted that for a given model and dataset, various methods should be evaluated by validation to effectively integrate ETR into a forecasting model.

Conclusions
This study presents a simple and effective method to improve the forecasting accuracy of PV generation for mitigating the problems caused by the inherent characteristics of PV, such as variability, intermittency, and limited controllability. This study focused on the ETR strongly associated with the seasonal component of PV as a means to improve forecasting performance. We selected neural network models as the basic forecasting method. Then, we composed five methods to integrate the ETR into them and examined the effect in terms of forecasting performance. The specific integration methods were (1) division preprocessing, (2) multiplication preprocessing, (3) replacement of existing input, (4) inclusion as additional input, and (5) inclusion as an intermediate target.
The methods were tested on MLP, Vanilla RNN, LSTM, and GRU using the two PV datasets. The results show that combining the ETR with existing models can achieve a meaningful improvement in forecasting performance and present a new approach to considering seasonal changes in PV generation. Among the methods, including the ETR as an intermediate target (i.e., M5) showed relatively better results than the other integration methods. However, the combination of M5 with Vanilla RNN was the best for one dataset, whereas the combination of M5 with LSTM was the best for the other. Thus, no single neural network model combined with the ETR showed absolute superiority, as is often the case in comparisons between AI methods.
This study was limited to a few selected neural network models, even though they are known as the methods that effectively deal with time series data. Thus, the effectiveness of the proposed integration methods of the ETR can be further examined in other neural network models. It is also necessary that the proposed methods be applied to other PV datasets in extended studies. Then, the relative superiority of M5 can be further evaluated and more elaborate advice for combining the ETR with neural network-based models can be given.