Forecasting of Photovoltaic Power by Means of Non-Linear Auto-Regressive Exogenous Artificial Neural Network and Time Series Analysis

Abstract: In this research paper, a nonlinear autoregressive with exogenous input (NARX) model based on neural networks and time series analysis is proposed for the one-month forecast of the power produced by photovoltaic modules (PVM). The PVM is a monocrystalline module with a rated production of 175 watts that is installed at Heliopolis University, Belbeis city, Egypt. The NARX model is considered powerful enough to emulate a nonlinear dynamic state-space model. It is extensively applied to a variety of problems and is particularly important in complex process control. Moreover, the NARX method is selected because of its quick learning and convergence times, as well as its high accuracy, and is distinguished by advantageous dynamics and interference resistance. The neural network (NN) is trained and optimized with three algorithms, the Levenberg-Marquardt Algorithm (NARX-LMA), the Bayesian Regularization Algorithm (NARX-BRA) and the Scaled Conjugate Gradient Algorithm (NARX-SCGA), to attain the best performance. The forecasted results using the NARX method based on the three algorithms are compared with experimentally measured data. The NARX-LMA, NARX-BRA and NARX-SCGA models are validated using statistical criteria. In general, weather conditions have a significant impact on the execution and quality of the results.


Background
Recently, forecasting renewable energy sources or solar radiation has become a major challenge for numerous researchers [1]. Forecasting photovoltaic temperature and power output and optimizing the power injected into the grid play vital roles in the smart grid and in the stability of the network. Moreover, supplying the network or an isolated system with sufficient electricity is a key requirement for a photovoltaic generator. It is important to analyze and study the performance of the power injected by the photovoltaic system to ensure the stability of the network. A photovoltaic panel has a non-linear and complex electrical equivalent circuit, which imposes serious instability in the generated power. Photovoltaic systems are inherently variable and unstable, and they depend strongly on several factors, such as meteorological parameters, shading conditions, dust, sand and soiling deposition, wind speed and orientation and cable losses [2]. Any problems occurring on the photovoltaic panels, such as hot-spots, shading and dust, can be transferred to the grid, affecting the produced power output as well as the inverter and converter in use. These instabilities and disturbances can be transferred to the network and influence its stability. In order to inject stable power and ensure the stability of the network, forecasting of a photovoltaic system is necessary for monitoring the consumption of the power system [3]. In this context, a number of studies have proposed various strategies and methods to forecast several parameters in a photovoltaic system. The selected technique depends mainly on different parameters and available data, which depend on weather conditions [4,5]. Forecasting is classified by horizon into (i) very short-term, (ii) short-term, (iii) medium-term and (iv) long-term forecasting.
The existing forecasting methods and techniques can be classified into physical methods, statistical methods and hybrid methods. The physical methods are mostly dependent on meteorological variables that influence electricity generation. These methods are mainly based on mathematical equations, which use several parameters to transform solar radiation into electricity. The above models can be simple and depend on sunlight, or they can be complex and include further parameters such as ambient temperature, soiling, cable losses and humidity [6]. The statistical modeling techniques are supported by data that have been tested for repeatability and reproduced at periodic intervals [7]. They are based on previous experimental data that have been analyzed by applying time-series data and measured parameters correlated with meteorological information based on the assumption that historical data will reappear in the future [8]. Hybrid methods are a combination of the aforementioned methods [9]. The objectives are to combine several techniques to overcome the limitations of a simple technique and to improve the outcomes' effectiveness.

Literature Review
In the research literature, numerous papers predicting the generated output power of photovoltaic systems have been published, applying different techniques and methods. An artificial neural network (ANN) calibration methodology has been used to forecast the day-ahead of photovoltaic power in Milan, Italy [10]. The approach was used to determine the optimum network settings in terms of layer number, neurons and trials. The results were validated by using different statistical indexes, particularly the normalized mean absolute error (NMAE%), with low values. Another study used a Radial Belief Neural Network (RBNN) to forecast power generated by large photovoltaic plants in India [11]. The performance of the RBNN was compared with deep and machine learning techniques in terms of the different statistical indexes. The results showed that the RBNN generates low errors compared with the other techniques. The Internet of Things (IoT) was implemented to collect different natural factors in the environment to forecast the generated energy from a photovoltaic module using ANN [12]. The technique reduced the mean square error (MSE) while improving the forecasting effectiveness. The investigation, on the other hand, failed to offer the necessary data for appropriate model training. This resulted in the inability to train ANNs adequately due to the lack of some crucial features. A combination of the WRF-solar model with multi-layered urban canopy and building energy models was utilized to provide an integrated physical approach. A fuzzy c-means optimized by Whale Optimization Algorithm (WOA) and Least Squares Support Vector Machine (LSSVM) were used to forecast the day ahead for a photovoltaic power station with 2.2 MW capacity [13]. 
The results showed that the forecasting accuracy using WOA-LSSVM outperformed LSSVM, long short-term memory (LSTM), and Particle Swarm Optimization-Backpropagation (PSO-BP) in terms of root mean square error (RMSE) under different climatic conditions. A hybrid model combining Wavelet Transform (WT), PSO and support vector machines (SVMs) was implemented to forecast short-term photovoltaic power [14]. The results using the WT-PSO-SVM outperformed seven different models, namely a Back Propagation Neural Network (BPNN), Hybrid Genetic Algorithm and Neural Network (HPNN), SVM, Hybrid Genetic Algorithm (HGS), hybrid PSO-SVM and Hybrid Hilbert-Huang Transform (HHT). On the other hand, a Moth Flame Optimization Algorithm (MFOA) was proposed to optimize the SVM parameters in order to forecast the photovoltaic power output [15]. This methodology led to enhanced forecasting accuracy, according to the conclusions. These investigators primarily used classical optimization approaches to achieve the forecasting performance of SVM by optimizing the parameter values. The optimized genetic algorithm has been proposed to optimize Bidirectional Long Short-Term memory (BiLSTM) to forecast multiple photovoltaic power outputs [16]. The results demonstrate the importance of adjacent photovoltaic system power series, and the suggested model performs adequately in ultra-short-term forecasting.
Electronics 2021, 10, 1953
The acquisition of photovoltaic power data has stationary points, and to overcome this problem, a statistical method has been developed for the short-term (0-6 h) forecasting of the power output of a photovoltaic system [17]. The paper presented a forecasting methodology that uses existing data from geographically distributed photovoltaic installations to forecast the power production of a given plant by exploiting spatial and temporal correlations. These approaches were investigated by analyzing the relationship between historical performance and the associated meteorological data. The method used a network of sensors made up of geographically scattered power plants. The used input was derived from the prepared data and not based on the global solar irradiation data, and the Numerical Weather Predictions (NWP) were not taken into consideration. In relation to state-of-the-art forecasting methodologies, the Normalized Root Mean Square Error (nRMSE) can be reduced by 20% or more. Again, for short-term forecasting of the power output of a photovoltaic system, a new probabilistic method has been proposed based on a competitive ensemble of diverse base predictors [18]. The author selected three different probabilistic methods and trained them as base predictors in order to construct an ensemble of the predictive distribution with the best precision and accuracy criteria. These methods have been implemented in three different steps to form a probabilistic multi-model ensemble forecast. In terms of production forecast dependability, the Multi-Objective (MO) optimization technique outperformed the Single-Objective (SO) optimization technique by a wide margin, with minimal Continuous Ranked Probability Score (CRPS) losses. A simple forecasting model was compared with more complicated and sophisticated models over 32 photovoltaic plants of different sizes and technology over a period of one year [19]. 
The collected data were classified into three categories, namely meteorological data, measured data and computed data. The data were trained hourly, and the results were evaluated using different performance indices; the normalized Mean Absolute Error (nMAE) was the main index compared with other algorithms. The accuracy and quality of the results were evaluated using the Grey Box (GB) model, Quantile Random Forest (QRF) and an ensemble of methods. The results were improved by 5% in terms of nMAE. In another paper, the output power of a photovoltaic system was forecasted using a hybrid Deep Learning (DL) system based on Convolutional Neural Networks (CNN) and Long Short-Term Memory Recurrent Neural Networks (LSTM) [20]. The CNN model was used to find nonlinear patterns and persistent frameworks in prior output power data, allowing for a more accurate photovoltaic power forecast. The LSTM was also used to estimate the photovoltaic power of the following time step by modeling the dynamic variations in the previous photovoltaic data. The suggested approach was thoroughly tested on data from a photovoltaic system installed in Limburg, Belgium, and numerical results show that it can deliver good photovoltaic system predictive accuracy.
Recently, the NARX technique based on NNs and time-series analysis has been widely used in several papers to solve nonlinear and complex systems. This technique has already proven its immense potential through its capability to forecast complicated input-output correlations, and it has become a significant aspect of the modern approach to forecasting [21]. To reduce the computational costs associated with the linearity in the variable's mode, the NARX model is sometimes described as a continuous combination of complex functions [22]. NARX structures are well-suited to learning approaches, and their composition is complicated by the fact that the number of terms is rapidly increasing. The NARX methodology is generally recognized as a powerful modeling and evaluation technique that converges slightly quicker and generalizes better than most other NN methodologies. NARX models were created to handle time-dependent operating conditions using real-time data from equipment activity. Because the validity of the results is simple to compute and the prediction results are simple to interpret, NARX has been widely used to handle a range of data-driven modeling difficulties. Generally, the purpose of the NARX method is to predict the next value of the power output time-series y(t) from previous values of the same series and previous values of the exogenous time-series x(t) [23].

Contributions
The aim of this paper is to consider the application of the NARX model combined with NNs and time-series to predict the power of photovoltaic modules for one month.
The PVM system is installed on the rooftop of Heliopolis University on the eastern edge of the southern Nile delta in Egypt. It is located on the east side of the Nile, with coordinates given as Latitude = 30.420 and Longitude = 31.565. The data used are collected from the monocrystalline modules with 175 W peak solar power fabricated by SunModule SW linked to a converter (Sunny Island) [24]. The above PVM is investigated in [25], which has indicated that the monocrystalline PVM is the best equipment for a hot climate after studying the influence of the experimental module on power generation. The proposed method uses solar radiation, hours and temperature as the exogenous variables. The NARX paradigm is assumed to be capable of simulating a highly nonlinear state-space model. It is also widely used to handle a range of issues, and it is particularly important in sophisticated control applications. The NARX model is also chosen because of its quick training and convergence times, as well as its exceptional accuracy and attractive dynamics and interference tolerance. In this paper, three algorithms are used to train and improve the NNs: NARX-LMA, NARX-BRA and NARX-SCGA. In order to evaluate the quality of the results, the forecasts are compared with the experimental measurements. For further accuracy, statistical measures are generated to examine the efficacy and productivity of the NARX-LMA, NARX-BRA and NARX-SCGA models. The results show that NARX-BRA outperforms NARX-LMA and NARX-SCGA in terms of Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Residual Sum of Squares (RSSE), Root Mean Squared Error (RMSE), Auto-Correlation Function (ACF) and R². Finally, the results reveal that the NARX-BRA models are suitable for performing the one-month time-series composite index forecasting.
This paper is organized as follows: in Section 2, we provide a description and a presentation of the photovoltaic system, weather characteristics and the used methodologies. Section 3 deals with the forecasting performance metrics. Section 4 illustrates the simulation, presents a comparison of the results and provides an assessment of the prediction accuracy measures, as well as the discussion, and Section 5 concludes this study.

Materials and Methods
This section presents the implementation of the NARX model to forecast the power output of the PVM system. The PVM is installed in Heliopolis University, Belbeis city, Egypt. The system is installed to supply electricity to the university, which is connected to the grid. A weather station collects various data daily, such as solar radiation, ambient temperature and the power output of the photovoltaic modules. The output power is forecasted using the NARX model based on neural network time-series and the NARX-LMA, NARX-BRA and NARX-SCGA optimized training algorithms.

Photovoltaic Module and Weather Station
The photovoltaic modules are SunModule SW monocrystalline modules with 175 W peak solar power, located on the Nile's southern delta, with coordinates La = 30.420 and Lo = 31.565. The modules are fabricated by SolarWorld, with an innovative module structure. They are made up of 72 series-connected cells with 125 × 125 mm dimensions and a generation capacity of 4.5 kWp. The electrical characteristics of the monocrystalline photovoltaic modules at STC are P_max = 175 Wp, V_mpp = 35.7 V, I_mpp = 4.9 A, V_oc = 44.4 V and I_sc = 5.4 A [26]. The PVM is coupled to an SMA inverter and is inclined at a predefined angle to collect the maximum power throughout the day. Figure 1 illustrates the photovoltaic modules on the roof-top of the university [27]. In this investigation, the monocrystalline PV (Figure 1) was chosen for the following reasons: (1) it is the most commonly used device in Egypt; (2) it has the most reasonable cost among its competitors; (3) according to manufacturer reports, it has the best heat resistance.
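As a quick consistency check on the datasheet values quoted above, the short sketch below multiplies the maximum-power-point voltage and current and compares the product with the rated peak power. The fill factor computed at the end is a standard photovoltaic figure of merit that the paper itself does not report; it and the function names are added here purely for illustration.

```python
# STC datasheet values quoted in the text for the 175 W module.
V_MPP = 35.7   # voltage at maximum power point, V
I_MPP = 4.9    # current at maximum power point, A
V_OC = 44.4    # open-circuit voltage, V
I_SC = 5.4     # short-circuit current, A
P_MAX = 175.0  # rated peak power, Wp


def mpp_power(v_mpp: float, i_mpp: float) -> float:
    """Power at the maximum power point, P = V_mpp * I_mpp."""
    return v_mpp * i_mpp


def fill_factor(p_max: float, v_oc: float, i_sc: float) -> float:
    """Fill factor FF = P_max / (V_oc * I_sc); not reported in the paper."""
    return p_max / (v_oc * i_sc)


p_mpp = mpp_power(V_MPP, I_MPP)
ff = fill_factor(P_MAX, V_OC, I_SC)
print(f"V_mpp * I_mpp = {p_mpp:.2f} W (rated {P_MAX:.0f} Wp)")
print(f"fill factor   = {ff:.3f}")
```

The product of V_mpp and I_mpp comes out within a fraction of a watt of the rated 175 Wp, so the quoted electrical characteristics are mutually consistent.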
Numerous papers have analyzed the performance of the 175 W SunModule SW monocrystalline cell. The thermal effects are reported in [28] by using different microinverter placements. It indicates that, at lower temperatures, the performance is approximately reduced by 0.65%, while the AC/power is enhanced by 0.9%. The electrical characteristics and performance analyses are provided in [29]. The supervision and control of the used PVM linked to two stages and connected to a DC/DC converter at the string level and the main inverter are presented in [30]. Another publication discusses and presents the experimental analysis and investigation of the single-phase Z-source inverter attached to the used photovoltaic modules (175 W SunModule SW monocrystalline) [31].

NARX Neural Network Model
The NARX neural network is an important class of nonlinear recurrent dynamic ANNs, that is, computational networks of linked nodes inspired by a simplification of the human neural system. Every node contains an artificial neuron that takes one or more inputs, accumulates them and passes the sum through a nonlinear activation function to generate an output [32][33][34][35]. In the Feedforward Neural Network (FNN) model, the information moves in one direction, with nodes structured in layers. Meanwhile, in a Recurrent Neural Network (RNN) architecture, such as NARX, the information moves both forward and backward, providing connections between neurons situated in the same or previous layers [36]. This structure relates the present value of a time-series to preceding values of the same series, as well as the present and previous values of an exogenous series. This typically includes simulating the inputs and outputs of complex processes, constructed by two tapped delay lines, one over the inputs and the other over the outputs. Likewise, the NARX methodology takes time-series data and is based on the linear autoregressive exogenous (ARX) model. The NARX technique is an important discrete-time form of a nonlinear system.
The time-series y(t) can be presented as the weighted sum of n independent power outputs of a photovoltaic system, P_1(t), P_2(t), ..., P_n(t), plus a deviation error e(t) [22,37].
When a NN multilayer perceptron approximates the function f, the new value of y(t) depends directly on the previous output values and on the previous values of the independent exogenous inputs.
Here, d_i(t), i = 1, 2, ..., 31, denotes the day to which each power value belongs. Nonlinear multidimensional equations can be approximated by a linear combination of nonlinear functions, and the function f is modeled in this way. The input u(t) and output y(t) of the nonlinear model are related through the constant coefficients a_1, a_2, ..., a_na and b_1, b_2, ..., b_nb, where na is the system order and nb is the input order.
Generally, the function f is nonlinear, and the future values of y(t) are extrapolated on prior output and input u(t) values. The desired forecasted values of the power output generated by the photovoltaic system are regressed by direct feedback and connected with the output and input of the multi-layer NN. Figure 2 gives a description of the input and output in the NARX model's NN framework [38].
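To make the dependence of y(t) on past outputs and past exogenous inputs concrete, the following minimal sketch builds the lagged regressor matrix that a NARX network would receive. The function name `build_narx_regressors` and the mapping of `n_a` and `n_b` to the orders na and nb are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def build_narx_regressors(y, u, n_a, n_b):
    """Return (X, target) where each row of X is
    [y(t-1), ..., y(t-n_a), u(t-1), ..., u(t-n_b)] and target is y(t).

    y : 1-D output series (e.g. PV power).
    u : exogenous series, shape (T,) or (T, channels).
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]                      # treat a single series as one channel
    start = max(n_a, n_b)                   # first t with a full history
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = y[t - n_a:t][::-1]         # y(t-1), ..., y(t-n_a)
        past_u = u[t - n_b:t][::-1].ravel() # u(t-1), ..., u(t-n_b), all channels
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)


# Toy series (hypothetical values) with two output lags and one input lag.
y_demo = np.arange(6.0)
u_demo = 10.0 * np.arange(6.0)
X_demo, target_demo = build_narx_regressors(y_demo, u_demo, n_a=2, n_b=1)
print(X_demo.shape, X_demo[0], target_demo[0])
```

In a setting like the one described here, the columns of `u` would hold the exogenous variables (for example solar radiation, hour and temperature), and the rows of `X` would be fed to the network that approximates f.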
The nonlinear function f is a multilayer perceptron whose nodes are structured in two layers; the first layer is named H and performs the output function, and Z represents the second layer, which refers to the value of a node's activation. In complete NNs, the activation function and system parameters have always been associated one to one and aggregated. The value of the state variables is estimated at the next step. The weight and number of each node are external inputs in the NARX architecture model, and each node has its own activation function. The real values of the weights a_ij, b_i and c_i are fixed, and the output layer y(t) is made up of a single linear node.
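A minimal sketch of the forward pass through such a network is given below, assuming a tanh activation in the hidden layer (the usual default in NARX toolboxes, but an assumption here) and a single linear output node. The argument names `W_hidden`, `b_hidden`, `w_out` and `b_out` are hypothetical and only loosely correspond to the weights a_ij, b_i and c_i mentioned in the text.

```python
import numpy as np


def narx_forward(x, W_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by one linear output node.

    x        : regressor vector of lagged outputs and lagged exogenous inputs.
    W_hidden : (n_hidden, n_inputs) hidden-layer weights.
    b_hidden : (n_hidden,) hidden-layer biases.
    w_out    : (n_hidden,) output-node weights.
    b_out    : scalar output bias.
    """
    hidden = np.tanh(W_hidden @ x + b_hidden)  # nonlinear hidden activations
    return float(w_out @ hidden + b_out)       # linear output y(t)


# Random weights for illustration only; 10 hidden neurons is one of the
# network sizes compared in the paper.
rng = np.random.default_rng(0)
n_inputs, n_hidden = 5, 10
x_demo = rng.normal(size=n_inputs)
y_hat = narx_forward(
    x_demo,
    rng.normal(size=(n_hidden, n_inputs)),
    rng.normal(size=n_hidden),
    rng.normal(size=n_hidden),
    0.0,
)
print(y_hat)
```

During training, the lagged outputs in `x` are normally the measured values; for multi-step forecasting they are replaced by the network's own previous predictions.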

Levenberg-Marquardt Training Algorithm
The LMA is an adaptable and flexible methodology that interpolates between the Gauss-Newton algorithm (GNA) and the Gradient-Descent Method (GDM) [39] to find the minimum of a multidimensional function [40]. As evidenced by many applications, the LMA is more robust than the GNA, and it may find a solution even when starting far from the final minimum [41]. When the initial guess is far from the optimal point, the LMA behaves like the GDM and updates the model variables in the direction opposite to the gradient of the objective function. The quantity minimized is a sum of squares of nonlinear real-valued functions [42]. If the current estimate is close to the optimum, the LMA behaves like the GNA by supposing that the objective function is locally quadratic [43].
With a sum of squares as the objective function, the Hessian matrix approximated from the Jacobian, and a linear approximation of the function f, the LMA update can be formulated as [44]:
$$x_{k+1} = x_k - \left(J^{T}J + \mu I\right)^{-1} J^{T}\varepsilon$$
where J is the Jacobian matrix, J^T J approximates the Hessian matrix, x_k is the current weight vector, x_{k+1} is the updated weight vector, µ is the scalar damping parameter, I is the identity matrix, and ε is the residual error vector.
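The sketch below illustrates a Levenberg-Marquardt iteration of the standard form x_{k+1} = x_k − (JᵀJ + µI)⁻¹Jᵀε on a small curve-fitting problem. The toy data, the function names and the rule of dividing or multiplying µ by 10 are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def lm_step(x, residual_fn, jacobian_fn, mu):
    """One update x_new = x - (J^T J + mu I)^{-1} J^T eps."""
    eps = residual_fn(x)                   # residual error vector
    J = jacobian_fn(x)                     # Jacobian of the residuals w.r.t. x
    A = J.T @ J + mu * np.eye(x.size)      # damped Gauss-Newton Hessian approximation
    return x - np.linalg.solve(A, J.T @ eps)


def fit_lm(x0, residual_fn, jacobian_fn, mu=1e-2, n_iter=50):
    """Repeat LM steps, adapting mu after each accepted or rejected step."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(residual_fn(x) ** 2)
    for _ in range(n_iter):
        candidate = lm_step(x, residual_fn, jacobian_fn, mu)
        new_cost = np.sum(residual_fn(candidate) ** 2)
        if new_cost < cost:    # accept: small mu behaves like Gauss-Newton
            x, cost, mu = candidate, new_cost, mu / 10.0
        else:                  # reject: large mu behaves like gradient descent
            mu *= 10.0
    return x


# Toy problem (hypothetical data): recover a and b in y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
y_obs = 2.0 * np.exp(-1.5 * t)


def residuals(p):
    return p[0] * np.exp(p[1] * t) - y_obs


def jacobian(p):
    return np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])


p_hat = fit_lm([1.0, 0.0], residuals, jacobian)
print(p_hat)
```

A large µ makes the step short and close to the negative gradient direction, while a small µ recovers the Gauss-Newton step, which is the interpolation between the two methods described above.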

Bayesian Regularization Training Algorithm
The BRA is recognized to be one of the best methods for handling NN learning problems since it can randomly select regularization coefficients and combines the high-convergence qualities of traditional BP with prior Bayesian statistics [45]. Regularization is a training method that significantly enhances the recognition accuracy by incorporating a constraint term into the NN's training criterion in order to ensure that the network's weights are properly configured [46]. The training criterion with the supplementary term combines E_D and E_w, the sum of squared network errors and the sum of squared network weights, respectively, weighted by the regularization parameters α and β [47]. The values of α and β are determined from γ, the effective number of weights, and H, the Hessian matrix of the objective function.
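As an illustration, the sketch below evaluates one commonly used formulation of Bayesian regularization, with objective F = βE_D + αE_w and the re-estimation rules γ = N − 2α·tr(H⁻¹), α = γ/(2E_w) and β = (n − γ)/(2E_D), where N is the number of weights and n the number of data points. This formulation, the Gauss-Newton approximation of H and all names in the code are assumptions; the authors' notation, in particular the roles of α and β, may differ.

```python
import numpy as np


def bayesian_reg_update(J, errors, weights, alpha, beta):
    """Evaluate F = beta*E_D + alpha*E_w and re-estimate alpha and beta.

    J       : (n_data, n_weights) Jacobian of the errors w.r.t. the weights.
    errors  : (n_data,) network errors.
    weights : (n_weights,) network weights.
    """
    n_data, n_weights = errors.size, weights.size
    E_D = float(np.sum(errors ** 2))       # sum of squared network errors
    E_w = float(np.sum(weights ** 2))      # sum of squared network weights
    objective = beta * E_D + alpha * E_w
    # Gauss-Newton approximation of the Hessian of the objective (assumed).
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(n_weights)
    gamma = n_weights - 2.0 * alpha * np.trace(np.linalg.inv(H))
    new_alpha = gamma / (2.0 * E_w)
    new_beta = (n_data - gamma) / (2.0 * E_D)
    return objective, gamma, new_alpha, new_beta


# Tiny hand-checkable example (hypothetical numbers).
F, gamma, a_new, b_new = bayesian_reg_update(
    J=np.eye(2),
    errors=np.array([1.0, 1.0]),
    weights=np.array([1.0, 1.0]),
    alpha=1.0,
    beta=1.0,
)
print(F, gamma, a_new, b_new)
```

In this formulation γ lies between 0 and the total number of weights and indicates how many of them are effectively used by the network to reduce the error.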

Scaled Conjugate Gradient Training Algorithm
The SCGA belongs to a family of fundamental approaches for minimizing smooth functions, particularly when the dimension is large, based on conjugate directions [48]. In SCGA, the gradient is used only to define the first error descent direction, denoted D(0), and each successive direction is then chosen to be conjugate to the preceding ones, i.e., along such a direction the gradient varies only in magnitude rather than direction [49]. This was created to eliminate the time-consuming process of searching for boundaries. After this, a line search mechanism is utilized to reduce the error function. In MATLAB, 'trainscg' is a network training function that uses the Scaled Conjugate Gradient (SCG) methodology to adjust weights and biases [50]. It can train any network whose weight, net-input and transfer functions have derivative functions. This algorithm can be summarized in five steps, supposing that the starting point for the local optimization is w(0) [51,52]:
Step 1: Set k = 0 and compute the error value at w(0) and the corresponding gradient vector [53].
Step 2: If k mod d = 0, then D(k) = −g(k); otherwise, the new direction combines −g(k) with the previous search direction through a coefficient λ. The coefficient λ determines how much of the previous search direction is included in the present one, and it can be computed with a variety of different formulations. In our investigation, we used the one proposed by Polak and Ribiere, which, according to [54], is more reliable for nonquadratic error functions.
Step 3: To find a step size ϕ, execute a line search, starting at w(k) and moving along the direction D(k), that brings the error close enough to its minimum along that direction.
Step 4: Update the approximated minimum of E by moving from w(k) along D(k) with the step size ϕ.
Step 5: If the termination criteria are not met, move to Step 1.
It is noticeable that the line search adopted in the proposed algorithm has a considerable impact on the algorithm's general performance.
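The five steps can be sketched as a plain nonlinear conjugate gradient loop with the Polak-Ribiere coefficient and a backtracking line search. This is a simplified illustration under stated assumptions and not MATLAB's 'trainscg', which follows a scaled variant; the restart period, the clipping of λ at zero, the Armijo constant and the toy quadratic error function are all choices made here for the example.

```python
import numpy as np


def conjugate_gradient_pr(error_fn, grad_fn, w0, restart_every=None,
                          max_iter=500, tol=1e-10):
    """Minimize error_fn with Polak-Ribiere conjugate directions."""
    w = np.asarray(w0, dtype=float)
    restart_every = restart_every or w.size   # assumed restart period
    g = grad_fn(w)                            # Step 1: gradient at w(0)
    D = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:           # Step 5: termination test
            break
        # Step 2: periodic restart, or restart if D is not a descent direction.
        if k % restart_every == 0 or g @ D >= 0.0:
            D = -g
        # Step 3: backtracking line search for the step size phi.
        phi, E0, slope = 1.0, error_fn(w), g @ D
        while error_fn(w + phi * D) > E0 + 0.25 * phi * slope and phi > 1e-12:
            phi *= 0.5
        # Step 4: move to the approximate minimum along D.
        w_new = w + phi * D
        g_new = grad_fn(w_new)
        # Polak-Ribiere coefficient, clipped at zero as a common safeguard.
        lam = max(0.0, g_new @ (g_new - g) / (g @ g))
        D = -g_new + lam * D                  # direction for the next iteration
        w, g = w_new, g_new
    return w


# Toy quadratic error function (hypothetical): E(w) = 0.5 w^T A w - b^T w.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_hat = conjugate_gradient_pr(
    lambda w: 0.5 * w @ A @ w - b @ w,
    lambda w: A @ w - b,
    w0=[0.0, 0.0],
)
print(w_hat)
```

For this quadratic the minimizer is the solution of A w = b, which gives a simple way to check the result.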

Forecasting Performance Metrics
To evaluate and verify the efficiency and correctness of the proposed forecasting models, experimental data from the PVM system at Heliopolis University are used. Various statistical criteria are adopted to measure the forecast accuracy [55]. Such criteria are applied in a wide range of scientific disciplines to evaluate the suitability of a forecasting model [56]. They are typically determined from the difference between the experimental power P_exp and the forecast power P_for at the t-th time sample. The models are evaluated by comparing these statistical criteria to assess the quality and effectiveness of the results.
The Individual Error (IE) is commonly used and defined as the difference between the experimental and forecasted power.
Starting from the IE, the other error indexes adopted for the assessment can be derived: the Relative Error (RE), the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), the Residual Sum of Squares (RSSE), the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R²). In these indexes, N is the number of measured powers. The R² was used to examine how well the predicted values explain the observed values [57]. The autocorrelation function (ACF) indicates how the correlation between values of a signal varies when their separation changes. For the error signal, the ACF can be presented as:

$$\mathrm{ACF} = \frac{\mathrm{Cov}(e_t, e_{t+k})}{\sqrt{\mathrm{Var}(e_t)\,\mathrm{Var}(e_{t+k})}}$$
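The following sketch computes these indexes with their usual textbook definitions, taking the individual error as experimental minus forecast power. The exact normalizations used by the authors (for instance whether MAPE is expressed in percent, or which variant of R² is used) are not shown in the text, so the formulas, function names and sample values below are assumptions.

```python
import numpy as np


def forecast_metrics(p_exp, p_for):
    """Textbook error indexes between experimental and forecast power."""
    p_exp = np.asarray(p_exp, dtype=float)
    p_for = np.asarray(p_for, dtype=float)
    ie = p_exp - p_for                          # individual error per sample
    n = ie.size
    rss = float(np.sum(ie ** 2))                # residual sum of squares
    mse = rss / n
    ss_tot = float(np.sum((p_exp - p_exp.mean()) ** 2))
    return {
        "MAE": float(np.mean(np.abs(ie))),
        # MAPE in percent; requires nonzero experimental values.
        "MAPE": float(100.0 * np.mean(np.abs(ie / p_exp))),
        "RSS": rss,
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": 1.0 - rss / ss_tot,
    }


def error_acf(errors, k):
    """Correlation between e_t and e_{t+k} for a lag k >= 1."""
    e = np.asarray(errors, dtype=float)
    return float(np.corrcoef(e[:-k], e[k:])[0, 1])


# Hypothetical sample values, for illustration only.
metrics = forecast_metrics([100.0, 120.0, 80.0, 150.0],
                           [110.0, 115.0, 85.0, 140.0])
acf_lag1 = error_acf([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], k=1)
print(metrics)
print(acf_lag1)
```

Because photovoltaic power is zero at night, MAPE is normally evaluated on daytime samples only, otherwise the division by the experimental power is undefined.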

Results
In this section, the forecasting models used are implemented by using measured data from the 175 W peak photovoltaic modules, installed at the roof-top of the Heliopolis University for Sustainable Development, Belbeis city, Egypt. The PVMs are monocrystalline, with 72 cells linked in series. The used exogenous daily experimental data are measured during the daytime and heavily depend on the weather conditions. Figure 3a shows a comparison of the specific inverter total yield by kWh of the injected power by the PVM for January, February and March of 2018.

(a) (b) Figure 3. Illustration of (a) specific inverter total yield and (b) soiling in the PVM.
The system experiences several natural problems, such as rain or soiling, which can affect the energy produced by the photovoltaic modules. For example, Figure 3b presents the soiling in the PVM due to wind, rain and sand. The fluctuation in power production illustrated in Figure 3a is due to the soiling caused by wind, rain and sand, because the system is installed in the desert. The Köppen-Geiger classification places the site in Egypt's hot desert climate, with warm and long summers; winters are damp, sandy and cold, and the atmosphere is mostly clear. Throughout the year, the temperature regularly ranges between 10 and 36 °C, with temperatures rarely falling below 7 °C or rising over 39 °C [58]. To illustrate and forecast the power output of the PVM, we considered daily measured data for one month. The experimental data are measured every hour during August, and the first 450 sampling points are used as the input data for the NARX model.
The performance, quality and accuracy of the proposed models are compared using 10 and 6 hidden neurons with 1, 2 and 3 delays, where one day is used as the time delay unit. The configuration of the NARX approach used to forecast the next day's energy P(t + 1), given the previous d days of P(t) and a different series x(t), is illustrated in Figure 4 [38].
The system experiences several natural problems, such as raining or soiling, which can increase the generated power energy production from the photovoltaic modules. For example, Figure 3b presents the soiling in the PVM due to wind, rain and sand. The fluctuation in power production illustrated in Figure 3a is due to the soiling caused by wind, rain and sand, because the system is installed in the desert. Köppen-Geiger categorizes the site as being in Egypt's hot desert, with warm and long summers; winters are damp, sandy and cold, and the atmosphere is mostly clear. Throughout the year, the temperature regularly ranges between 10 and 36 • C, with temperatures rarely falling below 7 • C or rising over 39 • C [58]. To illustrate and forecast the power output of the PVM, we considered daily measured data for one month. The experimental data are measured every hour during August and the former 450 sampling points considered are the input data for the NARX model.
The performance, quality and accuracy of the proposed models are compared using 10 and 6 numbers of input hidden neurons with 1, 2 and 3 delayers for the one-day pass as time delays. In order to forecast the next day's energy P(t) given previous at d day of P(t + 1) and a different series x(t), the configuration of the NARX approach is illustrated in Figure 4 [38].

Results
In this section, the forecasting models are implemented using measured data from the 175 W peak photovoltaic modules installed on the rooftop of the Heliopolis University for Sustainable Development, Belbeis city, Egypt. The PVMs are monocrystalline, with 72 cells connected in series. The exogenous daily experimental data are measured during the daytime and depend heavily on the weather conditions. Figure 3a compares the specific inverter total yield, in kWh, of the power injected by the PVM for January, February and March of 2018.
Figure 3. Illustration of (a) specific inverter total yield and (b) soiling in the PVM.
The system experiences several natural disturbances, such as rain and soiling, which can affect the energy production of the photovoltaic modules. For example, Figure 3b presents the soiling in the PVM due to wind, rain and sand. The fluctuation in power production illustrated in Figure 3a is due to this soiling, because the system is installed in the desert. The Köppen-Geiger classification places the site in Egypt's hot desert climate, with long, warm summers; winters are damp, sandy and cold, and the sky is mostly clear. Throughout the year, the temperature regularly ranges between 10 and 36 °C, rarely falling below 7 °C or rising over 39 °C [58]. To illustrate and forecast the power output of the PVM, we considered data measured daily over one month. The experimental data are recorded every hour during August, and the first 450 sampling points are used as the input data for the NARX model.
The performance, quality and accuracy of the proposed models are compared using 10 and 6 hidden neurons with 1, 2 and 3 one-day time delays. To forecast the next day's energy P(t + 1) given the d previous daily values of P(t) and a second series x(t), the configuration of the NARX approach is illustrated in Figure 4 [38]. Generally, the NARX model is based on the following three stages:
1. Training, in which the network is trained and its deviation is corrected;
2. Validation, which is used to check the adaptation of the network and to halt training when the generalization of the network stops improving;
3. Testing, which does not influence training and provides an independent analysis of the network output.
The NARX model is implemented, designed and trained in open-loop form, in which the measured past outputs, rather than fed-back estimates, are supplied as inputs so that the network learns to produce the correct current outputs. Table 1 highlights the forecasting effectiveness of the output power for the three training algorithms. The R² index is calculated to assess the agreement of the predicted results with the experimental results. R² quantifies the correlation between predicted and measured values in the interval [0, 1]: a value of zero indicates no correlation, whereas a value of one indicates an excellent fit. Generally, a low MSE and a high R² imply good training, with forecasted values close to the experimental values.
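The open-loop configuration and the three-stage division described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the chronological 70/15/15 split and the synthetic series are assumptions made for illustration only. The sketch only builds the lagged regressors that a NARX network would receive; any of the three training algorithms could then be applied to them.

```python
import numpy as np

def make_narx_regressors(y, x, delays):
    """Build open-loop (series-parallel) NARX inputs.

    Each row holds the previous `delays` measured outputs followed by the
    previous `delays` exogenous values; the target is the current output.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.ndim != 1 or y.shape != x.shape:
        raise ValueError("y and x must be 1-D arrays of equal length")
    if not 1 <= delays < len(y):
        raise ValueError("delays must be between 1 and len(y) - 1")
    rows = []
    for t in range(delays, len(y)):
        past_y = y[t - delays:t][::-1]  # y(t-1), ..., y(t-d), measured values
        past_x = x[t - delays:t][::-1]  # x(t-1), ..., x(t-d), exogenous values
        rows.append(np.concatenate([past_y, past_x]))
    return np.array(rows), y[delays:]

def chronological_split(n_samples, train=0.70, val=0.15):
    """Index arrays for training, validation and testing (assumed ratios)."""
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Synthetic stand-ins for the 450 measured samples (not the paper's data).
rng = np.random.default_rng(0)
power = rng.random(450)    # hypothetical PVM power series
weather = rng.random(450)  # hypothetical exogenous series
features, targets = make_narx_regressors(power, weather, delays=3)
train_idx, val_idx, test_idx = chronological_split(len(targets))
```

With 3 delays, each regressor row has six entries (three past power values and three past exogenous values), and the first three samples are consumed as history.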
From Table 1, the lowest MSE and the highest R² are obtained for the NARX-BRA. Figure 5 compares the performance of NARX-LMA, NARX-BRA and NARX-SCGA in terms of MSE.
Figure 6 depicts a regression plot of the measured against the forecasted values. The best value, R² = 0.9927, which is close to 1, is obtained for NARX-BRA. Figure 6 illustrates the measured power output of the PVM compared with the forecasted power using NARX-LMA, NARX-BRA and NARX-SCGA for the seven and six days. The performance and accuracy of the employed NARX models are evaluated with 10 and 6 hidden layers and with 1 and 3 delays. Figure 7 presents the results obtained using the three optimization training algorithms, which are close to the measured power output of the PVM.
From Figure 8, all the evaluated results are similar, and it is difficult to determine which training algorithm is more effective and faster. In order to compare the NARX-LMA, NARX-BRA and NARX-SCGA more precisely, the IE and RE with 6 hidden layers and 3 time delays are presented and compared in Figure 8. The effectiveness of the evaluated optimization training algorithms is determined using statistical criteria. The IE indicates the systematic error, whereas the RE reflects the measurement precision. The IE quantifies the inaccuracy as the difference between the experimentally measured power and the power forecasted by each of the three optimization training algorithms.
The RE is an accuracy indicator, defined as the ratio of a measurement's IE to the measured value. In other words, this error is expressed relative to the size of the quantity being measured. Figure 9 compares the IE and RE of the three selected techniques, indicating that the NARX-LMA and NARX-BRA present low values of IE and RE. For more precision, another statistical index must be calculated to determine the best optimization training algorithm.
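The two error definitions above can be written as a short Python sketch. The function name and the sample values are hypothetical, and the handling of zero measured power (reported as NaN) is an assumption, since the text does not state how such samples are treated. The IE is kept signed here; its magnitude, as plotted in Figure 9a, can be obtained with np.abs.

```python
import numpy as np

def individual_and_relative_error(measured, forecast):
    """Return (IE, RE) for paired measured and forecasted power values.

    IE is the difference measured - forecast; RE is IE divided by the
    measured value. Samples with zero measured power get RE = NaN
    (an assumption made for this sketch).
    """
    measured = np.asarray(measured, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if measured.shape != forecast.shape:
        raise ValueError("measured and forecast must have the same shape")
    ie = measured - forecast
    re = np.full_like(ie, np.nan)
    nonzero = measured != 0
    re[nonzero] = ie[nonzero] / measured[nonzero]
    return ie, re

# Hypothetical values in watts, chosen only to exercise the function.
ie, re = individual_and_relative_error([100.0, 50.0, 0.0], [90.0, 55.0, 1.0])
```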
Figure 9. Illustration of (a) individual absolute error and (b) the relative error.
The autocorrelation function (ACF) describes how the correlation between any two values of a signal varies as their separation in time varies. It is a time-domain measure of stochastic process memory that provides no information about the frequency content of the process. Figure 10 depicts the ACF test, which shows that the forecasted output power of the PVM using NARX-LMA, NARX-BRA and NARX-SCGA passes the test, as the values lie between −1 and 1.
Table 2 presents a comparison between different performance metrics for the three training optimization algorithms. The aim is to determine the best NARX model for forecasting the power output of the PVM. The lowest values of the statistical error indexes indicate the best forecast. The NARX-BRA showed the lowest values of IAE, MAE, MAPE, SSE and RMSE and the highest R² compared with NARX-LMA and NARX-SCGA. It can be concluded that the NARX-BRA is the most suitable of the three for forecasting the power output produced by the PVM.
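As a hedged illustration of the ACF check and of the indexes compared in Table 2, the following Python sketch computes a normalized sample autocorrelation and a set of summary metrics. The exact formulas used in this study are not given in the text, so the definitions below are assumptions: the biased ACF estimator, MAPE expressed in percent, and R² taken as the coefficient of determination (which, unlike a squared correlation, can become negative for a poor fit).

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Normalized sample ACF for lags 0..max_lag (biased estimator)."""
    s = np.asarray(series, dtype=float)
    centered = s - s.mean()
    denom = np.sum(centered ** 2)
    if denom == 0:
        raise ValueError("series is constant; the ACF is undefined")
    return np.array([
        np.sum(centered[lag:] * centered[:len(s) - lag]) / denom
        for lag in range(max_lag + 1)
    ])

def forecast_metrics(measured, forecast):
    """MAE, MAPE (%), SSE, RMSE and R^2; assumes no zero measured values."""
    m = np.asarray(measured, dtype=float)
    f = np.asarray(forecast, dtype=float)
    err = m - f
    sse = float(np.sum(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / m))),
        "SSE": sse,
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - sse / np.sum((m - m.mean()) ** 2)),
    }

# Hypothetical values, chosen only to exercise the functions.
acf = autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=2)
metrics = forecast_metrics([2.0, 4.0], [1.0, 5.0])
```

With this normalization, the lag-0 value is always 1 and every other lag lies between −1 and 1, which is the range referred to in the ACF test above.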

Conclusions
This paper presents the one-month forecasting of the power generated by photovoltaic modules installed in a hot region of Egypt in the Nile delta. The monocrystalline photovoltaic modules are connected to the grid of Heliopolis University. A NARX approach is presented and used to overcome several problems that can occur during forecasting. It was selected for its forecasting ability in various fields, and it combines a neural network with time-series analysis. The NARX model was chosen for its flexibility and its ability to train a nonlinear model with an input-output relationship using time-series data.
The neural network of the NARX was trained using three algorithms, namely the Levenberg-Marquardt Algorithm, Bayesian Regularization Algorithm and Scaled Conjugate Gradient Algorithm, to attain the best performance using experimental data collected daily during the hot month of August 2018. The forecasted results obtained using the three training algorithms were compared with the experimental power data. In order to strengthen the validity of the method, various statistical indexes were calculated to compare the accuracy, quality and performance of the results. In this paper, the NARX-BRA showed the lowest errors, with MAE = 0.0030, MAPE = 7.8334 × 10⁻⁶, RSSE = 8.9073 × 10⁻⁶ and RMSE = 1.981 × 10⁻², and the highest R² = 0.99271. NARX-BRA therefore outperformed the NARX-LMA and NARX-SCGA. The NARX model can forecast the power of a photovoltaic system under different conditions, such as ambient temperature, wind speed and solar radiation, in humid and hot regions. The effectiveness and correctness of the results are, in general, highly influenced by the environment and the input data.
Future research may address the forecasting of short-term photovoltaic module temperature and solar radiation using NARX and other optimization training algorithms. This method can also be applied for the forecasting of hybrid systems in terms of predicting the self-consumption of a photovoltaic/battery/supercapacitor system.