Solar Radiation Forecasting by Pearson Correlation Using LSTM Neural Network and ANFIS Method: Application in the West-Central Jordan

Abstract: Solar energy is one of the most important renewable energies, with many advantages over other sources. Many parameters affect the electricity generation from solar plants. This paper aims to study the influence of these parameters on predicting solar radiation and electric energy produced in the Salt-Jordan region (Middle East) using long short-term memory (LSTM) and Adaptive Network-based Fuzzy Inference System (ANFIS) models. The data relating to 24 meteorological parameters for nearly the past five years were downloaded from the MeteoBleu database. The results show that the influence of parameters on solar radiation varies according to the season. The forecasting using ANFIS provides better results when the parameter correlation with solar radiation is high (i.e., Pearson Correlation Coefficient PCC between 0.95 and 1). In comparison, the LSTM neural network shows better results when correlation is low (PCC in the range 0.5–0.8). The obtained RMSE varies from 0.04 to 0.8 depending on the season and used parameters; new meteorological parameters influencing solar radiation are also investigated.


Introduction
Solar irradiation is the total quantity of electromagnetic radiation emitted by the sun over a frequency range. Solar energy is one of the most abundant and adaptable renewable energy sources; it can be used directly or indirectly. Among all the non-conventional energy sources, solar energy is the greatest option since it is both cost-effective and ecologically friendly [1][2][3][4]. The increasingly intensive use of renewable energy sources to produce clean electricity will reduce dependence on fossil fuels, allowing a strong reduction in carbon emissions [5][6][7].
According to the Renewable Energy Policy Network for the Twenty-First Century, solar energy will reach a total production of 8000 GW in 2050 [8][9][10][11]. Solar irradiation is variable and intermittent, leading to significant output-power variability; this limitation represents a serious challenge for the generated photovoltaic (PV) energy that must be continuously fed into the grid [4,12,13]. Solar and wind sources are particularly suitable for mega-project investments in this field. Before starting the design of any renewable energy production plant, the factors that influence solar energy production must be studied. The parameters influencing solar radiation are well known and addressed in each region worldwide. Some researchers found that an efficient way to study the relationship between solar radiation and environmental parameters is the application of Machine Learning (ML) or Deep Learning (DL) techniques, providing an accurate prediction of the energy that will be produced. Therefore, different databases worldwide have been collected and generated to help researchers identify the best conditions and methods for forecasting solar radiation. The location, weather specifications, ML algorithm, and the number of selected parameters all play an important role in forecasting operations. In other words, the forecasting procedure can differ from place to place, depending on which parameters are used and their total number. For this reason, project lenders and designers must rely on accurate and dependable forecasting models to protect their investments.
Jordan primarily depends on imported crude oil for electricity generation and is therefore heavily dependent on imports to meet its energy needs. The local energy produced in 2020 (by natural gas, crude oil, and renewable energy) reached around 610 ktoe (thousand tons of oil equivalent), representing only 6% of the energy needed to meet demands. The cost of imported energy is as much as 19% of GDP (Gross Domestic Product). The shortage in electrical energy is covered by importing energy from Egypt through a submarine cable in the seasons of high demand for electrical energy. Located within the Sunbelt, Jordan is blessed with abundant solar energy. The intensity of direct solar irradiance in Jordan is one of the highest globally, with an annual daily average of 4-8 kWh/m², which corresponds to 1400-2300 kWh/m² per year. The sun shines on more than 300 days per year.
According to the national strategy, the Jordanian government plans to boost electricity generation capacity from renewable sources to 3.22 GW by 2025. The peak load reached 4010 MW in 2022. The production of electrical energy grew at an average rate of about 12% during the last 20 years. The Government of Jordan is keen to expand solar energy usage domestically for small- and large-scale solar power generation. It is taking important steps to make solar energy a major contributor to overall energy needs, such as legislation, the compulsory energy efficiency code, and incentives for investing in the solar energy sector. However, compared to dispatchable plants, the variability of renewable sources, including solar energy, creates a major reliability problem for any electrical power system. Two of the most difficult aspects of integrating renewables into the Jordanian power system are unpredictability and intermittent electricity delivery. Therefore, solar power forecasting is very important for stable electric grid operation and optimal dispatch. Machine learning techniques are employed to overcome these problems and successfully integrate the produced electrical power into the national grid. Various local and global parameters influence solar energy production. The availability of reliable solar energy information becomes essential to allow the design and construction of energy production plants with high yields in economic terms. Solar irradiation forecasting is investigated and evaluated in this research work, with particular attention to the capability of new environmental parameters to influence solar irradiation levels. The studies reported in the literature, at the international level and in Jordan, focused on a few meteorological parameters such as solar radiation, wind, and air temperature. Moreover, the influence of these parameters was not studied per season, and researchers did not investigate them in the western region of Jordan.
Other parameters that can influence solar radiation are analyzed in this article: specifically, evapotranspiration, humidity, and pressure. The obtained results showed that these parameters have a remarkable influence on solar radiation depending on the season. Therefore, this paper aims to study twenty-four meteorological parameters related to solar radiation (inputs to the ML or DL algorithms) instead of a few parameters as in other research works. To the best of our knowledge, this complete set of parameters (listed in Table 1) has been tested in the west-central area of Jordan for the first time. The data, relating to a time interval of almost five years, were downloaded from the international database Meteobleu and analyzed using the Long Short-Term Memory (LSTM) and ANFIS methods to build the solar radiation forecasting model. In more detail, some parameters usually not used in the literature because they are considered not useful for solar radiation forecasting (such as the direct and diffuse short-wave radiation, evapotranspiration, vapor pressure deficit at 2 m, relative humidity, sunshine duration, and soil temperature) have been taken into consideration in this research work for the first time, demonstrating their effects and contribution.
Table 1. The twenty-four parameters employed in our study for solar radiation forecasting.

Number  Parameter
1   Solar radiation (sum of direct and diffuse short-wave radiation) (W/m²)
2   Direct short-wave radiation
3   Diffuse short-wave radiation
4   Temperature (2 m above ground)
5   Vapor pressure deficit (VPD) at 2 m
6   Relative humidity (2 m above ground)
7   Growing degree days (2 m); estimates plants' growth and development, depending on the temperature variation
8   Sunshine duration
9   Soil temperature (0-10 cm under the ground level)
10  Total cloud cover (percent)
11  Low cloud cover (percent)
12  Geopotential (height 500 mb); represents the average air temperature in the vertical column
13  Evapotranspiration; represents the sum of evaporation from the land surface plus transpiration from plants
14  Soil moisture (0-10 cm under the ground level)
15  Wind speed (10 m above ground)
16  Total precipitation amount (mm/m²)
17  Medium cloud cover (percent)
18  Snowfall amount (cm/m²)
19  Wind direction (80 m above ground)
20  High cloud cover (percent)
21  Wind gust (10 m above ground)
22  Wind speed (80 m above ground)
23  Convective available potential energy CAPE (180 mb); measures the air parcel's potential energy per kilogram of the air mass. A high CAPE value means that the atmosphere is unstable and would produce a strong updraft.
24  Wind direction (10 m above ground)

These parameters have been studied case by case, starting with the historical data of the solar radiation itself (as input) to predict the solar radiation values. Afterwards, by analyzing each parameter and determining its influence, they have been divided into two groups: the first contains parameters already investigated in the literature around the world (namely, solar radiation, air temperature, wind speed and direction, cloud cover, humidity, rain and snow precipitation, soil temperature, pressure, and sunshine duration). The second group contains only the parameters already used in Jordan for solar radiation forecasting (i.e., solar radiation, wind, and air temperature).

Related Work
In the literature, significant performance differences from the different ML and DL applications in solar radiation forecasting were reported. Castangia et al. [14] used five machine learning models based on Feedforward, Echo State, 1D-Convolutional, LSTM neural networks, and Random Forest (RF) method. They used in their study six parameters: the cloud cover, air temperature, relative humidity, dew point, wind bearing, and sunshine duration. The Root Mean Square Error (RMSE) equals 6.60% for hourly forecasting. However, since some significant parameters are not considered in the study, this could impact the prediction accuracy negatively. In [15], the convolutional LSTM method is used to forecast solar irradiance on several locations simultaneously; the obtained RMSE is less On the other hand, the DL applications have been successfully applied in the last few years. For example, the hybrid DL model proposed by Yan et al. [16] was applied for one-year forecasting solar radiation, comparing the obtained RMSE values for different time intervals related to the four seasons. However, the performed study considered only short time intervals to generate solar irradiance predictions.
In [17], the Support Vector Machines (SVM) and Random Forest (RF) models were applied to forecast solar irradiance using weather parameters, such as temperature, humidity, rainfall and wind speed. In this work, the authors utilize SVM and RF models to predict individual PV generator output and compare their performances. Convolutional Neural Network (CNN) and LSTM have been used to forecast solar power, utilizing input datasets related to temperature, wind speed, humidity, ground temperature; the obtained RMSE equals 0.0987 [18]. In Poolla et al. [19], the local and global meteorological data (air temperature and wind speed), in the time interval from December 2017 to May 2018, were employed for the forecasting process based on an auto-regressive time-series (ARTS) model, achieving an accuracy of 80%. However, in this research work, the air temperature and wind speed are the only weather variables considered in the prediction process.
Han et al. [20] considered air pressure, zenith angles, temperature, and humidity parameters. The cross-correlation coefficient between the predicted and measured solar radiation values was 0.947. Wang et al. studied day-ahead photovoltaic power forecasting by utilizing convolutional LSTM networks [21]. The recorded data, collected continuously for two years, referred to five parameters used as inputs to the LSTM neural network: temperature, pressure, humidity, wind speed, and wind direction. As a result of the power forecasting, the RMSE showed a very acceptable value of 0.0865. However, the efficacy of the proposed models in long-term forecasting was not investigated. A forecasting model based on Artificial Neural Network (ANN) has been proposed in [22]; it used the temperature, dew point, relative humidity, and wind speed as inputs. A Mean Absolute Percentage Error (MAPE) of 0.53% was achieved for 14 days of prediction. In the same domain of forecasting solar irradiation, the LSTM is widely used. In [23], four deep learning algorithms were independently trained to predict the solar irradiance of Johannesburg city: LSTM, CNN, Convolutional LSTM, and CNN-LSTM hybrid models. Based on the obtained results, the Convolutional LSTM provided the best performance with a normalized RMSE of 1.62% (corresponding to an RMSE of 7.18).
In [24], the authors used an artificial neural network, CNN, bidirectional and stacked LSTM to predict solar irradiance values. The used parameters are humidity, station and ambient temperature, station altitude, sea level pressure, absolute pressure and wind speed. A dataset from September 2019 to February 2020 was used to obtain a Mean Absolute Error (MAE) of 41.738; the authors concluded that stacked LSTM is the best model for predicting solar irradiance. In [25], the data over four years were used to forecast temperature, precipitation, and wind speed parameters; the obtained MAE value was equal to 0.708. The authors evaluated the forecasting performance of a stacked bidirectional LSTM (SB-LSTM) approach for both day-ahead and week ahead load. They recommended three approaches to further improve the performance of SB-LSTM: increasing the size of the processed dataset, allowing capturing of variations not included in the limited available dataset, and implementing other architectures.
Some researchers found that analyzing moving clouds with image processing algorithms is very useful for predicting the future position of the clouds and sun occlusion [26]. The researchers proposed a framework to forecast the solar irradiance changes by combining image processing and machine learning. CNNs were used for processing whole sky images with 6-month datasets; the obtained RMSE was equal to 6.11. Alvarez et al. [27] used SVM, Linear Regression (LR) and Neural Network models (NNM) to forecast solar energy by using data collected every hour from 1 January 2020 to 5 June 2020 in Aguascalientes (Mexico). The authors used weather parameters for the forecasting, such as wind velocity and direction, temperature, pressure, humidity, sunrise time, and sunset time. Different machine learning approaches were used; the multi-layer perceptron (MLP) algorithm gave the best outcome with an MSE equal to 0.2222. In [28], the backpropagation (BP) neural network is used to construct an effective forecasting model of solar radiation. As inputs to the developed model, the authors used weather parameters, specifically the rainfall, air humidity, and clear-air index; the obtained RMSE equals 0.4708.
In the literature, few studies were performed in Middle Eastern regions, and only some of them in Jordan. Alomari et al. [29] used ANN to forecast solar radiation in central Jordan using a dataset from 15 May 2015 to 30 September 2017. Al-Sbou et al. [30] used ANN to forecast solar radiation in southern Jordan, obtaining an MSE of 0.00237. Furthermore, Shboul et al. [31] used ANN to forecast solar radiation in southern, central, and northern Jordan. The dataset, from 1 January 1999 to 30 September 2019, was related to wind, air temperature and solar radiation. The obtained MAPE values did not exceed 3%.
In our work, the proposed prediction process uses two learning models, LSTM and ANFIS, to accurately forecast solar radiation in west-central Jordan based on meteorological data for the last five years. LSTM can memorize the data sequence and contains a set of modules where the data streams are captured and stored. In contrast, ANFIS is a hybrid model that uses numerical and linguistic knowledge; its advantages include adaptability, nonlinearity, and rapid learning. The two models are compared with each other. Principal Component Analysis (PCA) is used to filter the input signals. New parameters not previously addressed in other studies in the literature are considered; it is concluded that some of these parameters, such as direct short-wave radiation, diffuse short-wave radiation, and evapotranspiration, greatly influence solar radiation.
This research study is organized as follows: in the introduction, a brief analysis of the different machine learning classifiers proposed in the literature to predict solar radiation was provided, indicating the used parameters and obtained forecast results. Section 2 describes the methods used in this study: data standardization, Principal Component Analysis (PCA) for noise filtering, Pearson Correlation Coefficient (PCC) for feature selection, and the application of the LSTM network and ANFIS for the prediction process. Section 3 is devoted to the obtained results related to the solar radiation forecasting for five years, taking seasonal variation into account. Section 4 provides a comparison of results obtained using ANFIS and LSTM. Finally, Section 5 presents the work's conclusions.

Materials and Methods
In this research work, the data for nearly the past five years (i.e., from 1 January 2017 until 22 August 2021) were downloaded from the international database Meteobleu (https://www.meteoblue.com/en/historyplus (accessed on 2 February 2022)). This website provides meteorological data, with hourly updates relating to the twenty-four parameters shown in Table 1, starting from 1985 for worldwide locations. We have analyzed Jordan's west-central region data to forecast solar radiation as accurately as possible; the final dataset comprises 40,676 samples for each parameter. According to the Pareto principle, the dataset is split into train and test subsets with an 80:20 ratio; in other words, the learning model uses 80% of the dataset for training, while the remaining 20% (test subset) is used for the solar radiation prediction.
Figure 1 presents the flow chart of the proposed prediction process, showing its main phases. The data are pre-processed by applying a standardization technique; afterwards, a Principal Component Analysis (PCA) noise filter is used. The most significant input variables are then selected using the Pearson Correlation Coefficient (PCC) to predict solar radiation, while the remaining variables are removed from the learning set. Once the selected training data are prepared, the deep learning LSTM and machine learning ANFIS models can be trained to predict solar radiation. Finally, the trained models are evaluated by calculating the root-mean-square error (RMSE) and compared according to their prediction performance for the total period of 5 years and for each season (summer, autumn, winter, and spring). We conducted two different sets of experiments related to meteorological parameters influencing solar radiation, the first one with the ANFIS model, the second one with LSTM; then, the outcomes were evaluated based on the obtained RMSE values.
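As a minimal illustration of the 80:20 split described above, the following Python sketch divides a sequence of samples into train and test subsets. The function name `chronological_split` is hypothetical, and keeping the samples in time order (oldest data for training) is an assumption; the text does not state how the split was drawn.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling.

    Assumption: the first 80% of the hourly records are used for training
    and the most recent 20% for testing.
    """
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Stand-in for the 40,676 hourly samples of one parameter (indices only).
samples = list(range(40676))
train, test = chronological_split(samples)
print(len(train), len(test))
```

With 40,676 samples this gives 32,540 training and 8,136 test samples.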

Data Standardization
Data standardization aims to ensure the application of a common measurement scale to improve data quality. The standardization formula is given below [32,33]:
$$ z = \frac{x - \mu}{\sigma} $$
where x is a raw sample, z its standardized value, µ the average value, and σ the standard deviation of the dataset distribution.
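The standardization step, which subtracts the mean µ and divides by the standard deviation σ, can be sketched in a few lines of Python. This is only an illustration: the helper name `standardize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return the z-scores (x - mu) / sigma of a sequence of numbers."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in values]


# Hypothetical hourly temperatures in degrees Celsius (illustrative values).
temperature = [12.0, 14.0, 16.0, 18.0, 20.0]
z = standardize(temperature)
print(z)
```

After this transformation each parameter has zero mean and unit variance, so parameters with very different units (W/m², percent, mm/m²) become comparable.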

Principal Component Analysis (PCA) for Noise Filtering
Karl Pearson posited principal component analysis (PCA) for the first time in 1901. It has gained specific applications in many areas, such as chemometrics, image processing, sociology, and economics. PCA is commonly used for data clustering, filtering, extraction and classification, outlier detection, data compression, and for minimizing the correlation between variables [34][35][36]. PCA is a data compression approach that reduces the data dimensionality. PCA collects the covariance matrix's eigenvectors and eigenvalues to construct the data's uncorrelated principal components. Principal components are conceived of as new axes, the orientations of which reveal the most significant variance in the source data. The eigenvectors of the covariance matrix define the directions of the principal components. The eigenvalues, on the other hand, are the coefficients or weights of the principal components that reflect the amount of variation carried by each component [37,38]. Shaker et al. [39] investigated how data mining can be employed to estimate the aggregated solar power generation from a large set of solar power generation plants without continuously measuring the output of every single site by using only the measured values from a small number of representative sites. The obtained results showed that the proposed framework is capable of estimating solar generation with good accuracy; the combination of linear regression and the proposed hybrid k-means + PCA dimension reduction method gave the best results, demonstrating that PCA is very fast and computationally efficient.
Another characteristic of PCA is that the loading factors are sorted by their contribution to the variance of the original data, which means that the first loading factor captures the largest share of the total variance. The second factor then accounts for the second-largest portion of the total variance, and so on. The contribution of the final loading factors is minimal, and they typically model noise. As a result, these factors may be discarded to suppress most of the noise and eliminate the variables' redundancy. Once the loading factors are determined, they are used in connection with the original data to compute the raw data scores. Score vectors are the projections of the original data onto the loading vectors. Because the scores in each data set are orthogonal, uncorrelated, and span the entire data range, the pseudoinverse of the score matrix can be computed. Thus, the raw data can be represented based on the definition of the loading vectors and scores as:
$$ X = T P^{T} $$
where X is the raw data matrix; the T rows are the score vectors, and P columns are the loading vectors. Singular value decomposition (SVD) [40], eigenvector decomposition [41], nonlinear iterative partial least squares (NIPALS) [42], the covariance method [12], the expectation-maximization method [13], and successive average orthogonalization (SAO) can all be used to generate the PCA model [35]. Figure 2 shows the effect of applying the PCA-based noise filter on a set of observations representing the temperature data; from Figure 2, it is clear that the noise variance of the signal is reduced.
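The idea of keeping the leading loading vectors and discarding the last ones can be illustrated with a short NumPy sketch based on the SVD. It is not the authors' implementation: the function name `pca_denoise`, the synthetic three-channel signal, and the choice of one retained component are all assumptions made for the example.

```python
import numpy as np


def pca_denoise(X, n_components):
    """Reconstruct X (samples x features) from its leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre each feature
    # SVD of the centred data: Xc = U @ diag(s) @ Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                # loading vectors as columns
    T = Xc @ P                             # scores: projections onto the loadings
    return T @ P.T + mean                  # low-rank reconstruction


# Synthetic example: one underlying signal seen on three noisy channels.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
clean = np.outer(np.sin(t), [1.0, 0.5, -0.8])
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = pca_denoise(noisy, n_components=1)

print("MSE before:", np.mean((noisy - clean) ** 2))
print("MSE after: ", np.mean((denoised - clean) ** 2))
```

Retaining all components reproduces the input exactly, so the amount of filtering is controlled only by how many trailing components are dropped.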
For comparison, we added another denoising tool, wavelet signal denoising, and compared the results of the two techniques. In Figure 3, wavelet signal denoising is used; it decomposes the signal into different scales, significantly improving the denoising step in the prediction process. The denoising performance of each method (PCA and wavelet signal denoising) was assessed using the Signal-to-Noise Ratio (SNR). To calculate the SNR, the residual noise is determined as follows:
$$ n = x - \hat{x} $$
where x is the original signal and \hat{x} the denoised signal. The SNR value, i.e., the ratio of the signal power to the residual noise power, is then converted to decibels (dB) using the equation:
$$ \mathrm{SNR_{dB}} = 10 \log_{10}(\mathrm{SNR}) $$
It is found that SNR_dB is equal to 47.48 and 32.25 for the PCA denoising and wavelet signal denoising methods, respectively. Therefore, denoising the temperature signal is more effective using the PCA filter.
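A possible way to compute such an SNR figure is sketched below in Python. The function name `snr_db` is hypothetical, and two choices are assumptions not confirmed by the text: the residual noise is taken as the difference between the original and the denoised signal, and the power of the denoised signal is used as the signal power.

```python
import math


def snr_db(original, denoised):
    """SNR in dB, treating (original - denoised) as the residual noise."""
    noise = [o - d for o, d in zip(original, denoised)]
    p_signal = sum(d * d for d in denoised) / len(denoised)  # mean signal power
    p_noise = sum(n * n for n in noise) / len(noise)         # mean noise power
    if p_noise == 0:
        return math.inf  # nothing was removed, so the ratio is unbounded
    return 10.0 * math.log10(p_signal / p_noise)


# Illustrative values only: the filter removes 10% of the amplitude.
original = [1.0, -1.0, 1.0, -1.0]
denoised = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(original, denoised), 2))
```

A higher value means that less power remains in the residual relative to the retained signal, which is the sense in which the two filters are compared.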

Feature Selection
The Pearson correlation coefficient (PCC) measures the linear correlation between each input feature and the solar radiation. It represents the ratio between the covariance of two features and the product of their standard deviations [43,44]:
$$ \rho_{x,y} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} $$
where cov is the covariance, σ_x the standard deviation of one input feature, and σ_y the standard deviation of the solar radiation feature (output). This technique allows determining the correlation of each meteorological measurement with solar radiation [14]; these correlations can differ with the season. The prediction accuracy can be statistically evaluated using PCC as a metric; a larger PCC intuitively reflects a higher linear correlation between the predicted and true values [33,45]. Figure 4 shows the PCC values ranking for all meteorological parameters with respect to solar radiation. The threshold value equal to 0.5 (red line in Figure 4) between the selected and discarded parameters was determined during the learning phase of the model. It is found that the significant parameters, with PCC values greater than or equal to 0.
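The PCC-based selection can be sketched as follows in Python. The feature names and values are hypothetical and chosen only so that the three cases (strong positive, strong negative, no correlation) are visible; comparing the signed PCC against the 0.5 threshold is an assumption, since a variant based on the absolute value would also retain strongly negatively correlated parameters.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient: cov(x, y) / (sigma_x * sigma_y)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel between numerator and denominator


def select_features(features, target, threshold=0.5):
    """Keep the features whose PCC with the target reaches the threshold."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    return {name: r for name, r in scores.items() if r >= threshold}


# Hypothetical toy data (not taken from the Meteobleu dataset).
solar_radiation = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "temperature": [2.0, 4.0, 6.0, 8.0, 10.0],      # PCC = 1
    "relative_humidity": [5.0, 4.0, 3.0, 2.0, 1.0],  # PCC = -1
    "wind_direction": [1.0, -1.0, 0.0, -1.0, 1.0],   # PCC = 0
}
selected = select_features(features, solar_radiation)
print(selected)
```

Because the correlations vary with the season, the same selection would be repeated on each seasonal subset of the data.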

Evaluation Measures
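The most common forecasting indices are the root-mean-square error (RMSE), the mean squared error (MSE) and the mean absolute error (MAE); they are used to evaluate the performance of the solar radiation prediction. These error indices between the predicted values X and the actual values Y in the test dataset are calculated as follows:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2} $$
$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2 $$
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|X_i - Y_i\right| $$
where n is the number of samples in the test dataset.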
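The three error indices follow directly from their standard definitions; a small Python sketch with illustrative numbers (not results from this study) is given here.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equally long sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root-mean-square error: square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Illustrative values: errors are 1, 0 and -2.
predicted = [2.0, 4.0, 6.0]
actual = [1.0, 4.0, 8.0]
print(mse(predicted, actual), rmse(predicted, actual), mae(predicted, actual))
```

Note that the RMSE depends on the scale of the data, so values computed on standardized signals are not directly comparable with values reported in physical units.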

Long Short-Term Memory (LSTM) Network
LSTM is a recurrent neural network that retains information over longer sequences. Like other recurrent networks, it has a chain structure of repeating modules, and the module is repeated whenever a new input arrives; in the LSTM, however, the module contains four neural network layers that interact in a particular way. The data transfer process is the same as in standard recurrent neural networks, while the way information is propagated is different. As information passes through, the cell decides which information needs to be processed further and which is to be discarded. The main operation consists of cells and gates (Figure 5); the gates contain sigmoid activations, whose outputs range from zero to one and help to forget or retain information. If the data is multiplied by one, the value remains the same; if the data is multiplied by zero, the value becomes zero and disappears.

There are three types of gates [14,46–50]:

- Forget gate: its function is to decide whether to keep or forget the information. The previous hidden state and the current input pass through the sigmoid function; values closer to one are kept, while values closer to zero are discarded:

$$f_t = \sigma\left(W_f h_{t-1} + U_f x_t\right)$$

where $f_t$ is the forget gate vector, $x_t$ is the input vector, $h_{t-1}$ is the output of the previous block, $W$ and $U$ are the weight matrices of the hidden state and the input, respectively, for each gate, and $\sigma$ is the sigmoid activation function.

- Input gate: it updates the cell state. The current input and the previous hidden state go through the sigmoid function, whose output between zero and one decides which values are updated. Likewise, to regulate the network, the data also pass through the tanh function (Equation (9)); $i_t$ is the input gate vector and $\tilde{C}_t$ the new candidate memory:

$$i_t = \sigma\left(W_i h_{t-1} + U_i x_t\right)$$

$$\tilde{C}_t = \tanh\left(W_c h_{t-1} + U_c x_t\right) \tag{9}$$

The cell state vector aggregates the two components (old memory via the forget gate and new memory via the input gate):

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where $C_{t-1}$ is the memory from the previous block, $C_t$ is the memory of the current block, and the "*" operator is the Hadamard product.

- Output gate: the next hidden state is set in the output gate. The sigmoid output is multiplied by the tanh of the cell state; the result of this multiplication decides which information the hidden state $h_t$ should carry. This hidden state is used for the prediction. Afterwards, the new hidden state and the new cell state move on to the next step:

$$o_t = \sigma\left(W_o h_{t-1} + U_o x_t\right)$$

$$h_t = o_t * \tanh\left(C_t\right)$$

where $o_t$ is the output gate vector and $h_t$ the current block output. Table 2 lists the training hyper-parameters used for the LSTM neural network.

Future Internet 2022, 14, 79
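The gate mechanism described above can be sketched in a few lines of Python. This is only an illustrative single-unit cell, not the network trained in this study: all weights are scalars instead of matrices, the bias terms `b_*` and the numerical values are assumptions introduced for the example, and the names `lstm_step` and `params` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (all quantities are scalars).

    p holds the weights: W_* act on the previous hidden state, U_* on the
    input, and b_* are bias terms (an assumption, not named in the text).
    """
    # Forget gate: close to 1 keeps the old memory, close to 0 drops it.
    f_t = sigmoid(p["W_f"] * h_prev + p["U_f"] * x_t + p["b_f"])
    # Input gate: decides how much of the candidate memory is written.
    i_t = sigmoid(p["W_i"] * h_prev + p["U_i"] * x_t + p["b_i"])
    # Candidate memory, regulated by tanh into (-1, 1).
    c_tilde = math.tanh(p["W_c"] * h_prev + p["U_c"] * x_t + p["b_c"])
    # New cell state: old memory via the forget gate plus new memory via the input gate.
    c_t = f_t * c_prev + i_t * c_tilde
    # Output gate and new hidden state.
    o_t = sigmoid(p["W_o"] * h_prev + p["U_o"] * x_t + p["b_o"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Illustrative (made-up) parameters and a short input sequence.
params = {name: 0.5 for name in
          ("W_f", "U_f", "W_i", "U_i", "W_c", "U_c", "W_o", "U_o")}
params.update({"b_f": 0.0, "b_i": 0.0, "b_c": 0.0, "b_o": 0.0})

h, c = 0.0, 0.0
for x in [0.2, 0.6, 0.9]:
    h, c = lstm_step(x, h, c, params)
```

In a real model the scalars become weight matrices and the products become matrix-vector products followed by element-wise (Hadamard) multiplications, but the order of operations is the same.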

Adaptive Neuro-Fuzzy Inference System (ANFIS)
Neuro-fuzzy systems combine the advantages of two complementary techniques (fuzzy systems and neural networks). Fuzzy systems provide a good representation of knowledge. The integration of neural networks within these systems improves their performance because of the learning capacity of neural networks.
ANFIS is an optimization method for Takagi and Sugeno's-type fuzzy inference systems, based on the use of multilayer networks. ANFIS uses least squares estimation (LSE) combined with gradient descent backpropagation to model a training data set (Figure 6) [51][52][53][54].

Figure 6. ANFIS structure for one input variable.
The rule base contains one fuzzy if-then rule of Takagi and Sugeno's type (with $x$ the input and $f$ the output data).

Rule: If $x$ is $A_1$, then $f_1 = p_1 x + r_1$

where $f_i$ is the fuzzy inference according to the desired output and $\{a_i, c_i\}$ is the parameter set of the membership function. ANFIS uses a 5-layer MLP (multilayer perceptron), described as follows:

Layer 1: Generating degree of membership: The first layer of an ANFIS-type architecture comprises as many neurons as there are fuzzy subsets in the inference system represented. Each neuron calculates the degree of truth of a particular fuzzy subset through its transfer function. The activation function of neuron $i$ of the first layer is

$$O_{1,i} = \mu_{A_i}(x)$$

where $x$ is the input to neuron $i$ and $A_i$ is a fuzzy subset corresponding to $x$. $O_{1,i}$ is the membership function of $A_i$ and indicates the degree to which a given $x$ satisfies the quantifier $A_i$. We choose $\mu_{A_i}(x)$ to be in the Gaussian form.

Layer 2: Fuzzy intersection: The outputs of this layer are the weights $w_i$ of the rules; they are obtained by a simple multiplication of the inputs in each cell. The neurons receive as input the truth degrees of the different fuzzy subsets making up the premise and are responsible for calculating the truth degree of the rule. The activation functions used for these neurons depend on the operators present in the rules (AND or OR).

Layer 3: Normalization: This layer normalizes the weights of the rules. It calculates the ratio between the weight $w_i$ of each rule and the sum of the weights of all the rules:

$$\bar{w}_i = \frac{w_i}{\sum_j w_j}$$

Layer 4: Defuzzification: Each node $i$ in this layer is calculated as reported in Equation (17); $\bar{w}_i$ are the outputs of layer 3, whereas $p_i$ and $r_i$ are the consequent parameters of the output function $p_i x + r_i$:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + r_i\right) \tag{17}$$

Layer 5: The output layer: The cell computes the sum of all the input signals and therefore returns, at the output, the approximate value of the desired function:

$$O_5 = \sum_i \bar{w}_i f_i$$
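The five layers can be traced with a short Python sketch of a forward pass for one input variable. It is a minimal illustration under stated assumptions, not the trained model of this study: the exact Gaussian form `exp(-((x - c) / a) ** 2)`, the use of two rules, and all parameter values are assumptions chosen for the example, and the names `gaussian_mf` and `anfis_forward` are hypothetical. Training of the parameters (LSE plus gradient descent) is not shown.

```python
import math


def gaussian_mf(x, a, c):
    """Gaussian membership degree with width a and centre c (assumed form)."""
    return math.exp(-((x - c) / a) ** 2)


def anfis_forward(x, premise, consequent):
    """Forward pass of a one-input Takagi-Sugeno ANFIS.

    premise:    list of (a_i, c_i) membership parameters, one pair per rule
    consequent: list of (p_i, r_i) parameters of f_i = p_i * x + r_i
    """
    # Layer 1: degree of membership of x in each fuzzy subset A_i.
    mu = [gaussian_mf(x, a, c) for a, c in premise]
    # Layer 2: rule weights; with a single input the product is the degree itself.
    w = mu
    # Layer 3: normalisation of the rule weights.
    # (The sum is zero only if every membership degree underflows to 0.0.)
    total = sum(w)
    w_bar = [w_i / total for w_i in w]
    # Layer 4: each normalised weight times its rule output f_i.
    layer4 = [wb * (p * x + r) for wb, (p, r) in zip(w_bar, consequent)]
    # Layer 5: sum of all incoming signals.
    return sum(layer4)


# Illustrative (made-up) parameters for two rules.
premise = [(1.0, 0.0), (1.0, 2.0)]      # (a_i, c_i)
consequent = [(1.0, 0.0), (0.0, 5.0)]   # (p_i, r_i)

y_mid = anfis_forward(1.0, premise, consequent)   # x halfway between the centres
y_low = anfis_forward(0.0, premise, consequent)   # x at the first centre
```

At `x = 1.0` both rules fire equally, so the output is the plain average of the two rule outputs; at `x = 0.0` the first rule dominates and the output stays close to `f_1(0)`.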

Results
In this study, LSTM and ANFIS learning models are used to predict the amount of solar radiation available in the west region of Jordan. Twenty-four meteorological parameters are considered in the prediction process, selected for five different scenarios (i.e., referring to the five-year time interval or specific seasons, autumn, summer, spring and winter). The results show that the degree of influence of these parameters depends on seasonal variation. The forecast RMSE related to the five-year dataset is calculated for the different scenarios. The meteorological parameters are ranked based on PCC values using LSTM and ANFIS and properly selected for the different seasons.
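The PCC-based ranking and selection step can be illustrated with the following Python sketch. It is not the authors' code: the series are small synthetic values invented for the example, the function names are hypothetical, and ranking by the absolute value of the PCC is an assumption (the text does not say whether signed or absolute values were used). Only the 0.5 threshold is taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series.

    Constant series (zero variance) are not handled in this sketch.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def select_parameters(features, target, threshold=0.5):
    """Rank parameters by |PCC| with the target and keep those >= threshold."""
    scores = {name: abs(pearson(values, target))
              for name, values in features.items()}
    kept = [(name, r) for name, r in scores.items() if r >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Synthetic example data (not taken from the MeteoBleu dataset).
solar_radiation = [0, 100, 300, 500, 300, 100]
features = {
    "temperature": [10, 14, 20, 26, 22, 15],
    "relative_humidity": [70, 75, 50, 40, 60, 55],
    "wind_speed": [3, 1, 4, 1, 5, 2],
}

ranked = select_parameters(features, solar_radiation)
for name, r in ranked:
    print(f"{name}: |PCC| = {r:.2f}")
```

With these synthetic series, temperature and relative humidity pass the threshold while wind speed is discarded, mirroring the way low-PCC parameters are excluded from the learning phase.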
As a first result of the research work, Figure 7 depicts the solar radiation forecasted from the temperature parameter in the summer season using the LSTM model. The forecasted solar radiation follows the temperature variation, with a prediction error that depends on that variation. The obtained RMSE value is 0.14, indicating that the model forecasts the solar radiation accurately.

The RMSE values are lower for meteorological parameters with greater PCC values (close to 1) relative to solar radiation (i.e., direct short-wave radiation, diffuse short-wave radiation, and temperature). The pink area (below the red line marking the 0.5 PCC threshold) indicates the set of parameters with PCC values lower than 0.5 (not significant for the forecasting process), which have therefore been excluded from the learning phase [55]. The solar radiation forecast by the ANFIS model gives a better result than the LSTM one when the PCC is high (>0.95); instead, for 0.5 ≤ PCC < 0.95, the LSTM provides better results. It can be concluded that the proposed methodology gives good results, from an RMSE of 0.12 (ANFIS model) for direct short-wave radiation (PCC = 0.98) up to an RMSE of 0.32 for the temperature parameter (PCC = 0.80) provided by the LSTM model.

Figure 9 shows the meteorological parameters' ranking for the 2020 summer season. The selected parameters with PCC ≥ 0.5 are, in order of decreasing PCC: the direct and diffuse short-wave radiation, temperature, sunshine duration, growing degree days at 2 m elevation corrected, vapor pressure deficit at 2 m, and relative humidity. As explained above, the other parameters, with a PCC value less than 0.5, were discarded and not used in the forecasting models.

Figure 10 shows the meteorological parameters' ranking for the 2020 autumn season. The selected parameters with PCC ≥ 0.5 are, in order of decreasing PCC: the solar radiation, direct and diffuse short-wave radiation, temperature, vapor pressure deficit at 2 m, growing degree days at 2 m elevation corrected, relative humidity, sunshine duration and evapotranspiration.

Figure 11 shows the meteorological parameters' ranking for the 2020 winter season (employed dataset from 22 December 2020 to 20 March 2021). The parameters with PCC ≥ 0.5 (significant for the forecasting process) are the solar radiation, direct and diffuse short-wave radiation, temperature, evapotranspiration, vapor pressure deficit, relative humidity, growing degree days at 2 m elevation corrected, and sunshine duration.

Figure 12 shows the meteorological parameters' ranking for the 2021 spring season. The parameters significant for the forecasting process (i.e., with PCC ≥ 0.5) are, in order of decreasing PCC: the solar radiation, direct and diffuse short-wave radiation, temperature, vapor pressure deficit at 2 m, growing degree days at 2 m elevation corrected, and relative humidity at 2 m.

Discussion
In this research work, twenty-four meteorological parameters have been processed to investigate their influence on solar radiation in west-central Jordan. We used the PCC to select the most significant parameters (i.e., those with a PCC value ≥ 0.5 with respect to solar radiation) and thus facilitate solar radiation prediction. ANFIS and LSTM were then used to forecast the solar radiation and to calculate the prediction RMSE for each selected parameter. Finally, the selected parameters were analyzed to study how their influence on solar radiation changes with the seasons.

Figure 8 shows the first attempt to study solar radiation forecasting in west-central Jordan, based on a five-year database of twenty-four meteorological parameters (from 1 January 2017 to 22 August 2021). Only parameters with a PCC value greater than 0.5 (listed in the previous section) were selected for the forecasting process by LSTM and ANFIS, to obtain low and acceptable RMSE values. The parameters that strongly correlate with the solar radiation (PCC in the range 0.98–1) are the solar radiation itself, direct short-wave radiation, and diffuse short-wave radiation. For the solar radiation, ANFIS gives an RMSE of 0.04 and LSTM an RMSE of 0.07. For parameters with a PCC between 0.5 and 0.8, the LSTM method performs better, providing lower and acceptable RMSE values; for example, for the temperature parameter (PCC = 0.8), LSTM gives an RMSE of 0.35, while ANFIS gives a much higher value of 0.6. The sunshine duration, soil temperature and cloud cover have a low influence on solar radiation (PCC ≤ 0.5) because Jordan has little rain and cloud cover.
As for the 2020 summer forecast (Figure 9), the parameters that strongly correlate with solar radiation (PCC between 0.98 and 1) are the solar radiation itself, direct and diffuse short-wave radiation, and temperature. Other parameters have an average correlation with solar radiation (PCC between 0.5 and 0.8), such as the sunshine duration, growing degree days at 2 m elevation, vapor pressure deficit, and relative humidity. In particular, the sunshine duration has a remarkable influence in summer, stronger than that of the other parameters, whereas it has no noticeable influence in the other seasons.
As for the 2020 autumn season, the parameters with PCC ≥ 0.5 selected for LSTM and ANFIS analysis are shown in Figure 10. In more detail, the parameters that strongly correlate with the solar radiation (PCC between 0.98 and 1) are the solar radiation itself and the direct and diffuse short-wave radiation. Other parameters with significant correlation (PCC between 0.5 and 0.8) are temperature, vapor pressure deficit, growing degree days at 2 m elevation, relative humidity, sunshine duration, and evapotranspiration. Notably, temperature and sunshine duration have less influence in autumn (PCC of 0.8 and 0.55, respectively) than in summer (PCC of 0.98 and 0.7, respectively). In contrast, evapotranspiration has more influence in autumn than in summer and significantly influences solar radiation.
As for the 2020 winter season, the parameters with the highest correlation (PCC in the range 0.95–1) are the solar radiation and the direct and diffuse short-wave radiation. Other parameters with average correlation (PCC between 0.5 and 0.8) are, in order of decreasing PCC, the temperature, evapotranspiration, vapor pressure deficit at 2 m, relative humidity, growing degree days at 2 m elevation, and sunshine duration (Figure 11). Notably, the temperature has a lower influence (PCC = 0.8) than in the summer season (PCC = 0.98). In contrast, the evapotranspiration parameter has a greater influence on solar radiation forecasting (PCC = 0.8) than in summer (PCC = 0.45).
As for the 2021 spring season, the parameters that strongly correlate with the solar radiation (PCC between 0.95 and 1) are the solar radiation itself, direct short-wave radiation, and diffuse short-wave radiation, while the other parameters have PCC values between 0.5 and 0.8 (Figure 12). In this season, the temperature has less influence than in summer, with a PCC of 0.8 compared to 0.98. Evapotranspiration and sunshine duration were not selected for the forecasting process because their PCC is less than 0.5 (0.48 for both parameters in spring, compared to 0.80 for evapotranspiration in winter and 0.7 for sunshine duration in summer).
Based on the previous analysis of solar radiation forecasting in the different seasons, the significant parameters have been grouped into two classes depending on the determined PCC value (Table 4). From the results reported in Section 3 (Figures 8–12), when first-class parameters (i.e., likely linear correlation) are used as inputs to the solar radiation forecasting process, the ANFIS model gives lower RMSE values than LSTM. Instead, for second-class parameters (PCC from 0.5 to 0.8, i.e., unlikely linear correlation), the LSTM performs better than the ANFIS method, based on the RMSE values of the forecasts performed in all four seasons.

This work provides the following contributions. Firstly, the parameters of solar radiation, direct short-wave radiation, diffuse short-wave radiation, and temperature always have a very high degree of influence on solar radiation forecasting, based on the results obtained with both the complete five-year dataset and the seasonal ones. Secondly, evapotranspiration, sunshine duration and humidity showed a remarkable influence in west-central Jordan; instead, other parameters such as cloud cover, snowfall amount, wind speed, and total precipitation amount have no influence on the solar radiation prediction in Jordan. Due to Jordan's geographical location, with relatively high values of daily solar irradiance, the average sunshine duration is approximately 300 days a year, with average daily sunshine of 9.07 h.

The ranking of the PCC values of the different parameters with respect to solar radiation can change with the season. For example, for the temperature, the PCC values with respect to solar radiation are 0.99 in summer, 0.85 in spring, 0.83 in autumn, and 0.81 in winter; that is, the temperature correlates more strongly with solar radiation in summer than in the other seasons. Accordingly, the RMSE of the LSTM forecast is 0.14 in summer and rises to 0.5 in winter, as the PCC between the temperature and solar radiation decreases.

Table 5 shows the five criteria researchers must consider in order to build a reliable solar radiation forecasting model; namely, test location, time duration of the study, employed input parameters, machine learning models, and evaluation criteria. The experimental data were collected worldwide; the last three studies were conducted in Jordan [29][30][31]. In the published research articles, the time length of the analyzed data ranged from 5 months to 20 years. The employed parameters varied in number and type.
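The observation that ANFIS suits highly correlated parameters while LSTM suits moderately correlated ones can be written as a simple decision rule. The Python sketch below is only an illustration of that rule, using the thresholds reported in the Results section (PCC > 0.95 for ANFIS, 0.5 ≤ PCC < 0.95 for LSTM, PCC < 0.5 excluded); the function name `choose_model`, the use of the absolute PCC, and the assignment of the boundary value 0.95 to ANFIS are assumptions.

```python
def choose_model(pcc):
    """Suggest a forecasting model for one input parameter from its PCC.

    Returns None when the parameter is below the 0.5 selection threshold.
    """
    r = abs(pcc)          # assumption: the strength of the correlation is what matters
    if r < 0.5:
        return None       # not significant, excluded from the learning phase
    if r >= 0.95:
        return "ANFIS"    # first-class parameters (high correlation)
    return "LSTM"         # second-class parameters (moderate correlation)


# PCC values quoted in the discussion for the five-year dataset.
print(choose_model(0.98))   # direct short-wave radiation
print(choose_model(0.80))   # temperature
```

Such a rule is only a summary of the RMSE comparison in this study; for another site or dataset the thresholds would have to be re-derived from the data.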
In the presented research work, twenty-four parameters have been involved for the first time, a significantly higher number than in the scientific literature to date. The achieved RMSE ranges from 0.04 to 0.8, which is very competitive compared to other experimental results obtained in the same region. As for the research works carried out in North America and Hawaii [15,16,19], the analysis time length ranged from 6 to 20 months, with RMSE values of 6.11 and 0.086, respectively. This means that the short-term analysis (only six months) gives less accurate predictions than the longer ones (up to 20 months). In these studies, the DL LSTM method was employed in [15] and [16]; up to five forecasting parameters were studied, including wind, clouds, longitude, and latitude. Compared to these published results, our research work presents some advantages, such as the longer time length (5 years), significantly lower RMSE values (in the range 0.04–0.8), and a particularly high number of studied parameters (up to 24), some of which were analyzed for the first time, to our knowledge.
In [56], the authors proposed a new short-term load forecasting model that integrates different machine learning methods, such as support vector regression (SVR), grey catastrophe, and RF modeling. The developed model, which focuses on characteristics of the electric load sequence such as stability and flexibility, can help systems to balance power supply and demand, avoid possible catastrophes, allocate resources rationally, and capture trends in power system loads. In studies conducted in East-Asian countries with a time length from 6 months to 3 years, the obtained RMSE values varied from 0.086 to 1.39 [17,18,21,22]. The number of forecasting parameters was five, including the dew point and wind speed. The best RMSE (i.e., 0.086) was achieved using the LSTM model and a three-year analysis. A six-month short-term study for solar irradiation forecasting was carried out through ML methods in Mexico [27]. The best MSE values, obtained by processing data on six ambient parameters, were achieved by the MLP (multi-layer perceptron) and RF (random forest) algorithms, respectively 0.222 and 0. In the study presented in [25], conducted in Scotland over a four-year period, the MAE was 0.525 for day-ahead and 0.708 for week-ahead forecasting, using only three parameters as inputs to the stacked bidirectional LSTM neural network. In Al-Sbou et al. [30], the minimum MSE value obtained for the solar radiation prediction was 0.00237.
Compared to these reported performances, in this research work we obtained better results, as shown in Table 5.
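Since the studies compared above report different error metrics (RMSE, MSE and MAE), it may help to recall how the three are computed and related. The Python sketch below uses made-up observed and predicted values purely for illustration; the function names are hypothetical and the numbers are not results of this study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example values (not taken from the study).
observed = [0.20, 0.50, 0.90, 0.40]
predicted = [0.25, 0.45, 0.80, 0.50]

print("MSE :", mse(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("MAE :", mae(observed, predicted))
```

Because the RMSE is the square root of the MSE, an RMSE range of 0.04 to 0.8 corresponds to an MSE range of 0.0016 to 0.64; the MAE never exceeds the RMSE on the same data, so values of different metrics should not be compared directly.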

Conclusions
This work presents two learning models, LSTM and ANFIS, to forecast solar radiation in west-central Jordan. The proposed ML models process meteorological data for the last five years, downloaded from the MeteoBleu site (Table 1). Many new parameters, not previously studied in the literature, were considered for solar radiation forecasting in our study. The PCC is used to identify the parameters most strongly correlated with solar radiation, to facilitate the training process with LSTM and ANFIS.
An important result of the proposed work is that parameters not previously investigated in other studies, such as direct short-wave radiation, diffuse short-wave radiation, temperature, sunshine duration and evapotranspiration, greatly influence solar radiation. According to the obtained results, the influence of these parameters differs depending on the season. Also, our study indicates that the LSTM is the best model for solar radiation forecasting when the PCC is not high (i.e., in the range 0.5–0.8). In contrast, the ANFIS model gives lower RMSE values for first-class parameters, which have a high correlation with the solar radiation (i.e., PCC values between 0.95 and 1) (Table 4). In total, 24 meteorological parameters have been analyzed, a very large set; the results showed that the influence of each parameter varies significantly according to the season. We believe this is an important result not yet reported in the literature. Summarizing the experimental results reported in Section 3 for the LSTM and ANFIS models, we obtained RMSE values in the range 0.04–0.8, MSE in the range 0.0016–0.64 and MAE between 0.034 and 0.86, very competitive values compared to the existing literature, as reported in the comparative Table 5. We believe the model can be improved by building a local weather station that provides meteorological records every 10–15 min. In addition, we plan to apply the methodology to data from another region, the city of El Kerak in the south of Jordan, whose meteorological features differ from those of west-central Jordan.

Conflicts of Interest:
The authors declare no conflict of interest.