Near Real-Time Global Solar Radiation Forecasting at Multiple Time-Step Horizons Using the Long Short-Term Memory Network

This paper aims to develop a long short-term memory (LSTM) network modelling strategy based on deep learning principles, tailored for very short-term, near-real-time global solar radiation (GSR) forecasting. To build the prescribed LSTM model, the partial autocorrelation function is applied to the high resolution, 1 min scaled solar radiation dataset to generate statistically significant lagged predictor variables describing the antecedent behaviour of GSR. The LSTM algorithm is adopted to capture the short- and the long-term dependencies within the GSR data series patterns to accurately predict the future GSR at 1, 5, 10, 15, and 30 min forecasting horizons. This objective model is benchmarked at a solar energy resource rich study site (Bac-Ninh, Vietnam) against the competing counterpart methods employing other deep learning, a statistical model, a single hidden layer and a machine learning-based model. The LSTM model generates satisfactory predictions at multiple-time step horizons, achieving a correlation coefficient exceeding 0.90, outperforming all of the counterparts. In accordance with robust statistical metrics and visual analysis of all tested data, the study ascertains the practicality of the proposed LSTM approach to generate reliable GSR forecasts. The Diebold–Mariano statistic test also shows LSTM outperforms the counterparts in most cases. The study confirms the practical utility of LSTM in renewable energy studies, and broadly in energy-monitoring devices tailored for other energy variables (e.g., hydro and wind energy).


Introduction
Conventional energy sources (e.g., fossil fuels) have been a primary energy resource for many decades [1][2][3]; however, these resources are gradually being replaced by various renewable resources as a pivotal response to the future energy crisis caused by their depleting nature and to the environmental damage caused by greenhouse gas emissions through the burning of carbon-positive fuel [4,5]. Following the global trends in energy exploration [6][7][8] and the recommendations of the United Nations Sustainable Development Goal, which advocates a dire need for cleaner, affordable and accessible energy in all nations and regions, Vietnam has recently commenced capacity development for solar energy resources. With its geographical location close to the solar energy belt, Vietnam can harvest this energy from freely available sunlight, theoretically providing 60-100 GWh·year⁻¹ of solar concentrated power and 0.8-1.2 GWh·year⁻¹ as photovoltaic power [9]. These figures support a continuous growth of solar energy, which will meet the increasing consumer power demand. As such, it is important to develop modern technologies for energy management systems that purposely support real-time energy integration in a power grid or a distribution system [10]. An accurate near-real-time forecasting tool, especially tailored for solar energy management and proportional dispatching to and from a grid system, is therefore a necessary scientific tool for the uptake of solar energy into a real grid system [11].
Solar energy forecasting is typically based on consumer usage to provide greater stability and energy regulation, reserve management and dispatching, scheduling and unit commitments [11]. For each of the consumer's usages, the forecasting timescales can vary from a long-term forecast (e.g., a monthly forecast) [12] to a mid-term forecast (e.g., a day-ahead forecast) [13,14] and to a short-term forecast (e.g., an hour-ahead) period [15,16]. However, studies on very short-term or near-real-time forecasts are relatively scarce; thus, the present research work aims to fulfil this need.
There are many approaches in solar radiation forecasting, divided roughly into data-driven (or artificial intelligence) and physical (atmospheric dynamic) models [17]. Many existing studies, however, reveal limitations in forecasting techniques (e.g., computational resources to calibrate a huge volume of data, thus encountering unexpected errors) and challenges arising from complexity of predictor variables (e.g., intermittent and chaotic properties of consumer demands, meteorological and geographical data) [11,18]. To overcome these issues, the present research work is focused on developing a new modelling strategy for near-real-time solar radiation forecasting by implementing the latest deep learning techniques.
The construction of solar radiation-forecasting models in general, and of global solar radiation (GSR) models in particular, has been intensively explored. With the recent advances of computational data science, machine learning-based forecasting models typically provide distinct advantages over physical models [17,19,20] and time-series models [21][22][23][24][25][26][27][28][29][30][31][32][33][34][35]. Models based on machine learning and neural networks have evolved over recent decades. However, common weaknesses have been reported, such as bias when extending data volume, and overfitting [11]. Deep learning techniques, the latest advancement of machine learning, can solve the above issues but have not been fully explored. On the other hand, solar forecasting is relatively new in Vietnam, although there were previous studies of solar radiation [36][37][38] and solar potential mapping [39] in recent decades. Studies implementing machine learning methods for solar forecasting can be found in other Asian countries [40][41][42][43]. However, these techniques have not been applied in the context of Vietnam, although a recent study with a similar approach was undertaken in Australia [44]. To the best of the authors' knowledge, the present study is the first to explore the predictive power of a deep learning method, the long short-term memory (LSTM) network model, for minute-ahead solar energy forecasting, particularly in the context of Vietnam where the prospect for solar powered energy systems is relatively high.
This study adopts long short-term memory (LSTM) networks, a branch of deep neural networks, which have shown an excellent ability to handle predictive issues and have been extensively employed in image recognition, automatic speech recognition and natural language processing [45][46][47][48][49][50][51][52]. LSTM is believed to overcome limitations of conventional data-driven models in capturing short- and long-term dependency between a target (e.g., future solar radiation) and corresponding historical variables, as well as big data issues. In addition, due to its ability to remove redundant information and thereby resolve vanishing gradient issues, LSTM is appropriate to represent the learning data over different temporal domains [17]. Hence, LSTM has been studied in solar forecasting during the past five years [15,17,44][53][54][55][56][57][58][59][60][61][62][63]. For instance, the first study regarding LSTM [64] demonstrated its forecasting skills for one-day ahead utilizing remote-sensing data under various topographical conditions with the best root mean square error (RMSE) ~24% and mean absolute error (MAE) ~17%. In [15,65], LSTM performed well for one-day-ahead forecasting under multiple seasonal and weather conditions. Generally, although forecasting methods show their predictive skills in different contexts, optimizing the forecasting methods remains an important problem of interest. Similarly, developing an optimal LSTM model for solar energy forecasting is also under consideration. Most recent papers have focused on data pre-processing techniques to optimize predictive results [59,60,66,67]; hence, there is a lack of papers dedicated to optimizing the LSTM technique itself. Moreover, a huge dataset volume was discussed in the context of traditional time-series methods, which pointed out that performance efficiency decreased as data volume increased.
Thus, the present work aims to extensively explore optimization and performance assessment of the LSTM forecasting technique in the near-real-time case by also considering multiple performance metrics (i.e., relative prediction errors, including Willmott's index, the Nash-Sutcliffe index, and the Legates and McCabe index) adopted for multiple forecast horizons ranging from 1 min to 30 min periods.
In terms of model performance evaluation, error-based measures offer a higher level of assessment skill than the correlation coefficient (r), which only represents the relationship between observed and predicted values [68]; nevertheless, it is not entirely sound to apply RMSE and MAE alone [44,69], especially when evaluating deep learning methods. Therefore, it is reasonable to apply multiple metrics in model performance evaluation so that the specific weaknesses of each are compensated [70]. For this reason, applying multiple evaluation metrics to assess the predictive performance of the LSTM method in near-real-time forecasts is a novelty in this paper.
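As an illustration of why several metrics are reported together, the following minimal Python sketch computes r, RMSE, MAE, Willmott's index, the Nash-Sutcliffe index and the Legates and McCabe index using their commonly cited textbook definitions. The function name `evaluate` and the four sample values are hypothetical and are not taken from the study's data.

```python
import math

def evaluate(obs, pred):
    """Return several agreement metrics between observed and predicted values."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))   # sum of squared errors
    sae = sum(abs(p - o) for o, p in zip(obs, pred))     # sum of absolute errors
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    # Denominator of Willmott's index of agreement
    denom_w = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return {
        "r": cov / math.sqrt(var_o * var_p),                  # correlation coefficient
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        "WI": 1 - sse / denom_w,                              # Willmott's index
        "NSE": 1 - sse / var_o,                               # Nash-Sutcliffe index
        "LM": 1 - sae / sum(abs(o - o_bar) for o in obs),     # Legates and McCabe index
    }

# Hypothetical observed and predicted values, for illustration only
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
scores = evaluate(obs, pred)
```

In this toy case r, Willmott's index and the Nash-Sutcliffe index are all above 0.99, whereas the Legates and McCabe index, which is based on absolute rather than squared deviations, is 0.9; the metrics therefore respond differently to the same errors.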
Moreover, due to geographical location and weather conditions, in some circumstances (e.g., Vietnam), it is difficult and costly to obtain meteorological variables at a near-real-time horizon (i.e., minute interval). To address this issue, the present work employs only the historical global solar radiation (GSR) time series, an approach that is rarely seen in the literature.
Since the model's accuracy is expected to decrease as the forecast horizon lengthens, the timescale of the forecasts first encompasses the next minute of the GSR data in advance, to verify the persistence skill of the LSTM model. Therefore, to address the gaps in knowledge and also to advocate the need for sustainable real-time energy management, the novelty of this paper is to firstly develop a near-real-time solar forecasting model based on the LSTM algorithm. The paper also aims to emulate the LSTM model at multiple forecast horizons (i.e., 1, 5, 10, 15, 30 min) to ensure it is validated over a much longer period.
To perform this, a time series of the GSR data measured at the minute interval at a selected location (Bac-Ninh, Vietnam) is obtained. To demonstrate the advantages of the LSTM model in terms of near-real-time solar forecasting, this paper also compares LSTM performance against that of the traditional forecasting method, autoregressive integrated moving average (ARIMA), the well-known machine learning methods of multilayer perceptron (MLP) network and support vector regression (SVR), and a deep learning method, deep neural network (DNN). As representatives of traditional forecast modelling, ARIMA and SVR are chosen due to the non-stationary properties of the collected data [71]. Meanwhile, the MLP and DNN models are representative of neural network algorithms, which have been widely employed in recent decades [72,73]. To explore the predictive skill of the proposed method, the minute interval data is evaluated at multiple time horizons: 1 min (1M), 5 min (5M), 10 min (10M), 15 min (15M) and 30 min (30M) forecast.
The main contributions of this study are as follows.
1. Development and optimization of a near-real-time GSR forecasting method by implementing the LSTM algorithm for 1 minute using lagged combinations of the aggregated GSR data as the predictor variables.
2. Evaluation of the performance of the proposed model against benchmarked models (DNN, MLP, ARIMA, SVR) by a range of model evaluation metrics.
3. Implementation of the proposed models for multi-minute ahead (e.g., 5M, 10M, 15M, 30M) and evaluation of the performance of LSTM over multiple forecast horizons.
To reach these objectives, this paper is organized as follows: Section 2 reviews previous literature. Section 3 presents a theoretical overview of the objective models. In Section 4, the dataset considered is introduced and explained, detailing model tuning and benchmark algorithms. Section 5 presents model performance metrics. A discussion of empirical results is available in Section 6 before the paper concludes in Section 7.

Related Work
In terms of solar irradiance forecasting, there is no one-size-fits-all modelling approach; in particular, the forecast horizon determines the suitability of alternative models (e.g., to support decision-making in operational management). Previous research has studied short-term models which forecast solar irradiance from 5 min to a few hours ahead. The focus of this paper is minute-ahead forecasting (e.g., 1 min, 5 min, 10 min, 15 min and 30 min). A minute horizon is established in the literature and meaningful from an economic perspective since a rise in accuracy of solar energy forecasts may facilitate major cost savings [74]. The main purpose of minute-ahead forecasts is to maintain operational security [11].
In the following, previous studies with comparable forecast horizons from the past decade are discussed. Specifically, a review of solar irradiance forecasts using machine learning algorithms with forecast horizon of 30 min and below are given. The review [75][76][77] is based on a comprehensive study of several solar energy forecasting methods. A design of solar irradiance forecasts offers several research views and these complicate cross-study comparisons. Although studies often employ a specific spatio-temporal data in a unique context of weather characteristics, there is no guarantee that a method can be successful in all places and with all different time horizons. To depict a review of minute solar energy forecasting, the following table classifies and summarizes the previous related studies in terms of forecast horizon, corresponding data, and the employed forecasting methods.
As shown in Table 1, several methods have been applied to different data sets with different spatio-temporal scales and time resolution, ranging from 1 to 7.5 min resolution. In addition, several forecast horizons have been tested, from a minute timescale to few-minutes-ahead forecasting. Moreover, Table 1 reveals few studies involving an evaluation of models across several forecast horizons and in a big data context. An exception is [58], in which the authors devise a model for multiple forecast horizons with training sets greater than 100,000 points. In terms of forecasting technique, these studies show that machine learning (ML) has good potential in very short-term solar energy forecasting. However, a limitation of ML algorithms is the insufficient learning models for high dimensional datasets [85], which directly influences the precision and accuracy of the forecasting model through over-fitting and extrapolation [86]. By considering that the GSR time series often have long-term and short-term dependency in the low-frequency approximate parts, the long short-term memory (LSTM) network, a special type of recurrent neural network (RNN), is employed to predict the decomposed low-frequency sub-layer in this study.

Multiple conclusions emerge from Table 1. Firstly, the potential of LSTM has not been fully explored, especially in analysing high dimensional data at 1 min intervals. A similar conclusion can be found in [11]. Secondly, no study considers the efficiency of forecasting models for more than one forecast horizon. This might be due to the shortage of data in real time horizons. Finally, no study considers the ability of LSTM in dealing with data efficiency (e.g., data volume). Therefore, a restriction of prior studies is that they solely explain the GSR behaviour and forecasting efficiency in a specific context.
In this paper, the aim is to overcome these issues. Firstly, the forecasting ability across multiple forecast horizons is demonstrated by employing GSR aggregated from 1 min interval data for each horizon. This technique can resolve the shortage of available datasets, which facilitates a broader demonstration of the forecasting model efficiency across multiple forecast horizons. In addition, the bias toward data efficiency is addressed by employing different partition proportions in training and testing sets. An appropriate data proportion in forecasting models is thus carefully chosen. Finally, a high number of data points (over 100,000) is used for validation to test model efficiency in terms of big data.
As the objective technique in this study, LSTM is employed to prove its potential in real-time solar radiation forecasting, as well as dealing with high dimensional real-time GSR. As shown in Table 1, this approach has not yet been fully explored and previous studies on LSTM-based solar energy forecasting faced a risk of over-tuning [87]. Overcoming the limitation of the available dataset and facilitating multiple forecast horizons, the approach implemented in this study allows the mitigation of this risk using appropriate hyperparameter testing to optimize the LSTM model. With respect to benchmark methods, another deep learning technique (i.e., DNN), two machine learning techniques (i.e., MLP, SVR) and a statistical technique (i.e., ARIMA) are developed for comparison.

Objective Predictive Model: Long Short-Term Memory (LSTM) Network
The LSTM algorithm, used recently in solar radiation modelling, is a branch of the deep recurrent neural network (RNN) (Figure 1a), which is itself an advancement of the machine learning method known as the feed-forward neural network (FFNN) (Figure 1b) [88]. Both models apply the idea of the human brain neuron network in which each neuron (blue colour) is an information processing unit. The improvement of RNN over FFNN is feedback loops (in red colour), which are units with memory. These units can remember, re-incorporate and update information from patterns learnt from previous steps; thus, RNN can learn progressively, rather than randomly, as is the case with FFNN. The previous state of the neuron, that is, the parameters of the previous time step, can be re-incorporated and taken into account when updating the memory. However, this property of RNN causes the vanishing gradient problem that prevents RNNs learning from deep sequences of broad contexts [89]. The long short-term memory (LSTM) algorithm was introduced by Hochreiter and Schmidhuber [90] in 1997 to address the vanishing gradient issue. Figure 2 illustrates the internal structure of LSTM with the innovative memory blocks called cells, owing to which LSTM outperforms RNNs. From Figure 2, the transmission stage is between the previous hidden layer, the cell state and the next hidden layer. The cell state is the main chain of data flow, which allows the data to flow forward essentially unchanged. However, in this cell, there are specific gates, which allow some linear transformations to occur. The main utility of the LSTM model applied in real-time modelling contexts is its capability to learn long-term dependencies among the consecutive events on a relative timestamp through incorporating self-connected "gates" in the hidden units.
In the context of GSR, especially at multiple forecasting horizons in this study, this model is likely to capture more accurately the real-time dependence of the historical, the current and the future GSR values, to finally create a more representative modelling framework. The gates enable LSTM units to read, write and remove information in the memory. Thus, they allow LSTM to remember the relevant data patterns while removing the irrelevant data, hence, sustaining a constant and relatively low error level, unlike the ARIMA (and other time-series) statistical model that uses its error to propagate into the future timescale, and potentially induces the inherent inaccuracies in the testing phase.
In terms of solar energy forecasting and applications in real time, the LSTM model is expected to exploit the temporal and spatial dependence of antecedent GSR data, while utilizing the contextual information. Consequently, in recent years, the LSTM has been implemented in many fields, including solar energy prediction [55,58], although the present study is the first of its kind to develop and apply this model for Vietnam, and, in particular, at multiple forecasting horizons.

Computational Aspects of LSTM Network Model
To gain an in-depth understanding of LSTM, Figure 2 illustrates a single localized LSTM cell in the first layer of a network at the timestep $t$. Symbols $\otimes$ and $\oplus$ represent point-wise scalar multiplication and summation, respectively. The coloured arrows show the direction of input to the system. $\sigma$ is an activation function, which is set to ReLU (rectified linear units) in this experiment.
These units are known as the Input Gate, Update Gate, Output Gate, and Forget Gate, and they represent the output value at the separate gates [92]. The gates normally receive as an input the same LSTM unit's output obtained at the previous time step ($h_{t-1}$). These gates also receive the input data related to the current time step ($x_t$) in order to emulate the future value of GSR at any given timestep.
With the same structure as RNN, a novel Forget Gate function enables inappropriate information to be removed and forgotten, which resolves the gradient vanishing issues of the RNN algorithm when applied to a large dataset context.
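To make the role of the gates concrete, the following minimal Python sketch performs one forward step of a textbook LSTM cell. It is an illustration under stated assumptions rather than the implementation used in this study: the gates use a sigmoid activation and the candidate and output use tanh (the experiment described here sets ReLU as the activation), and the names `lstm_step`, `gate`, `affine` and the layout of `params` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]

def gate(params, name, x_t, h_prev, act):
    """Apply activation `act` to the affine map stored under `name`."""
    W, U, b = params[name]
    return [act(z) for z in affine(W, x_t, U, h_prev, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step: returns the new hidden state and cell state."""
    f = gate(params, "forget", x_t, h_prev, sigmoid)        # share of old cell state kept
    i = gate(params, "input", x_t, h_prev, sigmoid)         # share of candidate written
    c_tilde = gate(params, "candidate", x_t, h_prev, math.tanh)
    o = gate(params, "output", x_t, h_prev, sigmoid)        # share of cell state exposed
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Toy cell with one input, one hidden unit and all-zero weights (illustrative only)
zero = ([[0.0]], [[0.0]], [0.0])
params = {"forget": zero, "input": zero, "candidate": zero, "output": zero}
h_t, c_t = lstm_step([1.0], [0.0], [1.0], params)
```

With all weights set to zero every gate evaluates to 0.5, so the cell keeps half of its previous state; trained weights would instead let the Forget Gate decide, input by input, how much of the stored GSR history to retain.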
Firstly, based on the last hidden state ($h_{t-1}$) and the new input $x_t$, LSTM selects the information to be retained from the cell state, as represented by the Forget Gate ($f$):

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (1)$$

Secondly, the next step is to determine the information that is stored in the cell state. There is a new candidate $\tilde{c}_t$, which is generated by $x_t$ and $h_{t-1}$ through a tanh layer. This new candidate is then scaled by the Input Gate $i$:

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad (2)$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (3)$$

Then, the cell state $C_t$ is updated by combining the previous cell state $C_{t-1}$ and the candidate $\tilde{c}_t$, in which the contribution of the former is determined by the Forget Gate ($f$) and that of the latter by the Input Gate ($i$), as in Equation (4):

$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{c}_t \quad (4)$$

The above three kinds of gates are not static; their values are recomputed at every time step from the recent state information ($h_{t-1}$) and the current input ($x_t$).

Finally, in the output process, there are two steps. The Output Gate is a further gate which is responsible for deciding the appropriate parts of the cell state to be outputted. The cell state $C_t$ is activated by the tanh function, and is then filtered through multiplication by $o_t$. The multiplication result is the desired output $h_t$:

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (5)$$

$$h_t = o_t \otimes \tanh(C_t) \quad (6)$$

where $W$, $U$ and $b$ (with subscripts $f$, $i$, $c$ and $o$) are the input weights, recurrent weights and biases of the respective gates.

Benchmark Model: Autoregressive Integrated Moving Average (ARIMA)
This study also adopts the autoregressive integrated moving average (ARIMA) model to further validate the efficacy of the LSTM network model. The ARIMA model was popularized by the work of Box and Jenkins [93]. ARIMA analyses a set of (univariate) predictor data partitioned into a subset of input/target to validate the LSTM and other models. Using its own time-lagged information and the respective model errors, ARIMA can identify the intermittent and chaotic patterns of original GSR time-series data, which is an effective alternative when other methods (e.g., LSTM or others) are not available.
An ARIMA model includes three parameters (p, d, q), with p as the number of autoregressive terms, d as the number of non-seasonal differences and q as the number of lagged errors. The ARIMA process generally involves model identification, estimation and forecasting, and is defined as follows:

$$\phi_p(B)\,(1 - B)^d\, Y_t = c + \theta_q(B)\,\varepsilon_t \quad (7)$$

in which $\phi_p$ is the autoregressive parameter of order p; $B$ is the backshift operator; $Y_t$ is the original predictor dataset; $c$ is a constant value; $\theta_q$ is the moving average parameter of order q; $\varepsilon_t$ is the error term; and d is the differencing order used for the regular or non-seasonal part of the series.
In the identification of an ARIMA model, the differencing parameter (d) is analysed by autocorrelation and partial autocorrelation to decide whether differencing should be performed on a non-stationary dataset. Furthermore, the p and q terms are identified for the model by maximum likelihood estimation, which determines the parameters maximising the probability of the data. There are various terms (e.g., log likelihood, Akaike's information criterion (AIC), Bayesian information criterion (BIC), r, RMSE) to determine the optimal combination of (p, d, q). Expressions of AIC and BIC are as follows:

$$\mathrm{AIC} = -2L + 2(p + q + k + 1) \quad (8)$$

$$\mathrm{BIC} = \mathrm{AIC} + [\log(n) - 2](p + q + k + 1) \quad (9)$$

where L is the log likelihood of the data; k = 1 if c ≠ 0 and k = 0 if c = 0, with c the constant term of the model; and n is the sample size. The last term in brackets is the number of parameters (including the variance of the residual). A detailed description of the ARIMA model can be found elsewhere and further applications of this method can be found in other works [31,94,95]. Generally, the ARIMA model assumes a scenario where there is no change in consecutive periodical measurements or the readings used to construct a model. Given that previous studies have employed an ARIMA model for GSR forecasting, this technique is also employed in this study in the interest of its ability to capture historical patterns from the present time-series data.
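The information criteria described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the log likelihood of each fitted candidate is already available from an ARIMA fitting routine; the function names and the log-likelihood values in the example are hypothetical placeholders, not results from this study.

```python
import math

def n_params(p, q, has_constant):
    """Number of estimated parameters, including the residual variance."""
    k = 1 if has_constant else 0
    return p + q + k + 1

def aic(log_lik, p, q, has_constant):
    return -2.0 * log_lik + 2.0 * n_params(p, q, has_constant)

def bic(log_lik, p, q, has_constant, n):
    return aic(log_lik, p, q, has_constant) + (math.log(n) - 2.0) * n_params(p, q, has_constant)

def best_order(candidates):
    """Return the (p, d, q) order with the lowest AIC.

    candidates maps (p, d, q) -> (log_lik, has_constant).
    """
    return min(candidates, key=lambda order: aic(candidates[order][0], order[0], order[2],
                                                 candidates[order][1]))

# Placeholder log likelihoods for three candidate orders sharing the same d
candidates = {
    (1, 1, 0): (-520.0, False),
    (1, 1, 1): (-512.0, False),
    (2, 1, 2): (-511.5, False),
}
chosen = best_order(candidates)
```

The candidates deliberately share the same differencing order d, because information criteria computed on differently differenced series are not directly comparable.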

Benchmark Model: Support Vector Regression (SVR)
Support vector regression (SVR) is a regression version of the support vector machine (SVM) model that is commonly applied in solar energy forecasting [96,97]. SVR transforms the original feature space into a high-dimensional one using kernel functions (e.g., Gaussian, linear) and fits a hyperplane in that space to effectively separate the data [98]. Herein, SVR is implemented in Python 3.6 using the scikit-learn (sklearn) library, and is optimized using a grid search procedure.

Benchmark Model: Deep Neural Network (DNN)
The deep neural network is a machine learning method that has been advanced based on artificial neural networks (ANN), and is capable of handling complex inputs and learning procedures [45]. Similar applications of DNN in solar energy forecasting can be found in [17,99]. Herein, a Python deep learning library is applied to the solar radiation data. Like other neural network methods, the employed model comprised one input layer, one output layer and multiple hidden layers. Various structures of the deep neural network were analysed to determine an optimal training model.

Benchmark Model: Multilayer Perceptron Network (MLP)
The multilayer perceptron network (MLP) is the most common type of feed-forward network [100]. MLP has three layers: an input layer, an output layer and a hidden layer. In this paper, MLP is implemented by the Python environment version 3.6 using the deep learning library. As for LSTM and DNN, various structures of MLP were analysed to determine an optimal training model.

Study Region
The data utilized to build and evaluate the proposed LSTM network model comprised the minute interval time-series of global solar radiation (GSR), acquired from World Bank repositories for the period from September 2017 to June 2019. The Vietnam government aims to develop large solar plants near its capital city that can reduce load emissions and avoid a downwind situation. The chosen location, the Bac-Ninh region (Figure 3), is a small city with about 100,000 people. The city is located in the north of Vietnam, not far from the capital, at latitude 21.2013° N and longitude 106.0629° E, and an elevation of 60 m above sea level. The Bac-Ninh site has a subtropical dry winter climate characterized by hot and humid summers with frequent tropical downpours of short duration, and warm and frequently dry winters [101]. This province is undergoing revitalization in terms of more sustainable future solar energy systems, which are partly funded by the World Bank. The province also meets the criteria set for the selection of the present study location for the future installation of solar measurement stations, i.e., it is solar-rich with terrain either flat or characterised by low obstructions, homogeneous landscape and land-usage (clearly represented by satellite pixels for validation), without large water bodies, mountains, dirt roads, industrial pollution, open-pit mining operations, or a danger of flooding [102,103]. Thus, the development of a solar forecasting model at multiple forecast horizons, especially in Bac-Ninh, is a justified research endeavour to support the United Nations Sustainable Development Goal #7 related to the access to affordable renewable energies for all populations.

Data Preparation
This section details the activities related to data preparation, including the construction of multiple time-scale datasets using the original 1 min measurements, the handling of missing data, and the input of those data structures into the LSTM network. Notably, GSR measurements were performed continuously, 24 h a day, at equidistant time intervals of 1 min. Only the data from 06:00 to 18:00 were used for designing the predictive model as these times represent a period of meaningful daylight hours.
With the aim of constructing a framework for a near-real-time prediction model, the raw 1 min time-series data were firstly used to generate the 5-, 10-, 15- and 30-min-ahead time-series data, which were then used as the target variable. Details of those data are presented in Table 2. With respect to the missing data, it is noted that missing values represent only 0.15% of the time-series data, and were due to equipment faults or site closures in the measured period. We imputed those missing values with the mean value of the whole period [95,104]. Clearly, more powerful techniques (e.g., step-wise linear regression fit, Kalman filters) could be considered and might facilitate better imputation, but given the relatively low percentage of missing data, these may not be required in this study. Moreover, these data are employed for the comparison of LSTM with the other models. Consequently, the imputation method should not influence the relative performance of the alternative forecasting methods.
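The two preparation steps described above, mean imputation and the construction of coarser series from 1 min data, can be sketched as follows. This is a minimal illustration: the use of non-overlapping block averages for the aggregation is an assumption (the text does not state the aggregation rule), the function names are hypothetical, and the sample values are invented for demonstration.

```python
def impute_with_mean(series):
    """Replace missing values (None) with the mean of all observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def aggregate(series, step):
    """Average consecutive non-overlapping blocks of `step` one-minute values.

    Assumption: block means are used; an incomplete trailing block is dropped.
    """
    usable = len(series) - len(series) % step
    return [sum(series[k:k + step]) / step for k in range(0, usable, step)]

# Invented 1 min values with one missing reading, for illustration only
gsr_1min = [0.0, 10.0, None, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
filled = impute_with_mean(gsr_1min)
gsr_5min = aggregate(filled, 5)
```

The same `aggregate` call with a step of 10, 15 or 30 would yield the remaining horizons from the same imputed 1 min series.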
To prepare a suitable number of inputs for each time-scale horizon (based on the historical behaviour of short-term solar radiation measurements), the autocorrelation function and the partial autocorrelation function (PACF) were employed. The detailed procedures can be found in [94]. Explicitly, the PACF computes a time-series regression against its n-timescale lagged values by removing the dependency on intermediate elements and identifying those patterns potentially prevalent in the future GSR data that are correlated to the antecedent GSR data. This procedure aims to develop forecast models that consider the role of memory (i.e., antecedent GSR) in forecasting the current GSR value, and possibly, considers several other atmospheric factors that could potentially influence ground level GSR. Consequently, the input vector, called the n-lagged set of inputs and deduced from the PACF method, was then used as the LSTM model's input to predict the GSR as the target. Figure 4 shows the PACF plot of the GSR time series with lagged inputs as predictor variables for the LSTM model applied at 1M, 5M, 10M, 15M, and 30M forecast horizons. The primary scope of this study is to design, for the first time in the present study region, an LSTM model that has the capability to forecast near-real-time GSR using minute interval data, applied for multi-step forecasting horizons. To expand the practical scope of the modelling techniques, the developed model was applied at 1 min (1M), 5 min (5M), 10 min (10M), 15 min (15M) and half-hourly (30M) forecasting horizons, to enable LSTM to generate a more granular interval GSR, as required in real-life decisions, for example, through constant monitoring of solar energy resources. Hence, the primary task is to construct a matrix of training and testing data that can reliably be applied to the proposed LSTM model.
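The lag-selection idea can be sketched with a small, dependency-free PACF routine. This is an illustration only: the partial autocorrelations are obtained with the Durbin-Levinson recursion, the significance bound of 1.96/√n is a common convention assumed here rather than taken from the study, and the series is a synthetic first-order autoregressive process, not GSR data.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

def significant_lags(x, max_lag):
    """Lags whose partial autocorrelation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]

# Synthetic AR(1) series with coefficient 0.8 (not real GSR data)
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.8 * x[-1] + random.gauss(0.0, 1.0))
values = pacf(x, 5)
lags = significant_lags(x, 5)
```

For this synthetic series the partial autocorrelation is close to 0.8 at lag 1 and close to zero beyond it, so lag 1 would be retained as a predictor; for the GSR series the retained lags are those read from Figure 4.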
The normalization of modelled data was accomplished by statistical rules to overcome the numerical difficulties caused by the data features, patterns and fluctuations using the conventional methods of feature scaling [105]. Normalization is applied to the n-lagged inputs [9] to bring them into the range [0, 1] by the following formula:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (10)$$

After normalization, a major task was to determine the training data, used to construct the predictive model, and the testing data, so as to achieve the highest performance. Researchers use different divisions between testing and training sets, which generally vary with the problem, and there is no 'rule of thumb' for data divisions. In [58], the authors employed about 75% of inputs for training and the remainder for the testing set, while in [44] the partition proportion for training and testing sets was approximately 80:20. Here, the normalized data are divided into the training (80%) and testing (20%) sets (Table 3a). Noticeably, the number of data points is significantly higher than in any of the previous relevant papers [15,64].
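The normalization, the construction of the lagged input matrix and the 80:20 partition can be sketched together as follows. This is a minimal illustration under stated assumptions: the split is taken chronologically (earliest 80% for training), scaling uses the minimum and maximum of the whole series, and the function names and toy series are hypothetical.

```python
def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def make_lagged(series, n_lags, horizon=1):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y

def chronological_split(X, y, train_fraction=0.8):
    """Keep the earliest samples for training and the latest for testing."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Toy series of 12 values (0, 10, ..., 110), for illustration only
series = [float(v) for v in range(0, 120, 10)]
scaled = min_max_scale(series)
X, y = make_lagged(scaled, n_lags=2)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

A chronological split is used in the sketch because shuffling a time series before splitting would let the model see values that lie in the future of some training samples.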

LSTM Model Implementation
Prior to developing the proposed LSTM-based solar radiation forecasting model, the historical GSR data were pre-processed at multiple forecasting time horizons. The proposed model-based LSTM was developed under the Python environment on an Intel Core i5 and 16 GB RAM computer.
The development and validation of the proposed method, as shown in Figure 5, is presented in the following steps:
Step 1: Construct the data matrix that is used as the input to the first layer. The statistically significant lags were calculated from the original GSR time series data using the partial autocorrelation function (PACF). In addition, to examine the sensitivity of the model to the data partition, different proportions for dividing the training and testing sets were also tested (Table 3b). The 80:20 partition yielded the highest LSTM performance and was therefore adopted in this study.
Step 2: After incorporating the significant lagged inputs as the input predictors, the LSTM was implemented using the Keras deep learning package in Python [106]. The input layer of the trained LSTM network had four timesteps; the number of hidden neurons was set to 80; and the output layer, with a linear activation function, had one neuron. In addition to these fixed values, the LSTM model was run with different combinations of hyperparameters (epoch, drop rate and batch size), which were selected through a grid search.
Step 3: To select the optimal model for each case, each hyperparameter was varied in turn. Then, based on the evaluation metrics (r-value, RMSE) recorded in the training phase, the LSTM model with the highest r-value and the lowest RMSE was selected as the optimal model. Table 4b presents the experimental results in the training phase with the optimal models highlighted in red; only the experiments for 1M are shown. After conducting all experiments, the summarized results of the optimal model for all forecasting time horizons (1M, 5M, 10M, 15M and 30M) are shown in Table 4c.
For an LSTM network model, the computational cost is an important aspect of the learning process [107], and it is directly influenced by the dataset size in the training phase [107] and the respective hyperparameters [108]. The hyperparameters of the objective (i.e., LSTM) model are chosen through a grid search, which can be relatively time-consuming: for each LSTM model, the search took about 11-12 h. However, once the optimal hyperparameters had been determined prior to running the primary LSTM model, the computational time of the model was reduced to a much lower value (<15 min).
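The grid search in Steps 2 and 3 can be sketched as below. This is a hedged illustration only: `train_and_score` is a hypothetical callback that trains one LSTM configuration and returns its training-phase (r, RMSE), and ranking by the highest r with the lowest RMSE as a tie-breaker is an assumed way of combining the two criteria.

```python
from itertools import product

def grid_search(train_and_score, epochs, drop_rates, batch_sizes):
    """Evaluate every (epoch, drop rate, batch size) combination.

    train_and_score(n_epochs, drop_rate, batch_size) is a hypothetical
    user-supplied function returning the tuple (r, rmse).
    """
    results = []
    for n_epochs, drop_rate, batch_size in product(epochs, drop_rates, batch_sizes):
        r, rmse = train_and_score(n_epochs, drop_rate, batch_size)
        results.append({
            "epochs": n_epochs,
            "drop_rate": drop_rate,
            "batch_size": batch_size,
            "r": r,
            "rmse": rmse,
        })
    # Assumed ranking: highest r first, lowest RMSE as the tie-breaker
    best = max(results, key=lambda res: (res["r"], -res["rmse"]))
    return best, results
```

Because every combination requires a full training run, the cost grows with the product of the grid sizes, which is consistent with the multi-hour search times reported above.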
Generally, a hyperparameter is a parameter whose value is set before the learning process commences. There are two types of hyperparameters, namely, model hyperparameters (e.g., the size of the neural network and the number of input layers in FFNNs) and algorithm hyperparameters (e.g., dropout and batch size). Model hyperparameters cannot be inferred during the training process since they belong to the model selection task. Algorithm hyperparameters, in principle, can increase the speed and quality of the learning process. Therefore, determining the most appropriate hyperparameters is essential for the success of a deep learning model such as the LSTM model adopted in this study. Depending on the model type, strategies for choosing hyperparameters may vary. While some hyperparameters are model-specific, the following common hyperparameters can be used in any deep learning model and were adopted in this study:


Epoch defines the number of times that the learning algorithm will work through the entire training dataset. The number of epochs is usually in the hundreds or thousands, allowing the learning algorithm to run until the error of the learning model is minimized. In this study, the number of epochs is set to a maximum of 2000 (Table 4).
Batch size defines the number of data points that are propagated through the network at one time. The batch size can be seen as a for-loop iterating over one or more data points. At the end of each batch, the predicted values are compared to the actual values and the errors are calculated. From these errors, the update algorithm is used to improve the model. To determine whether a greater batch size provides better performance for a given data length, the batch size is set as in Table 3.
Dropout is a regularization layer that blocks a random set of cell units in one iteration of LSTM training. Since the network is prone to over-fitting during training, the dropout layer creates blocked units, which removes connections in the network. Therefore, it possibly decreases the number of free data points to be predicted and the complexity of the network. The dropout rate is set between 0 and 1. In this study, this parameter was tested at two values, 0.1 and 0.2, to determine whether a greater dropout rate improves LSTM performance (Table 4a).
Least absolute deviations and least square error (L1 and L2 regularization): In addition to dropout, the L1 and L2 regularization parameter is also used such that the L1 and L2 penalization parameter decreases the sum of absolute differences and the sum of squared differences between observed and forecasted values. In principle, adding a regularization term to the loss will facilitate a better network mapping (by penalizing large values of parameters, which minimizes the amount of nonlinearity of GSR values).
Activation function: With the exception of the output layer, all the layers within a network typically use the same activation function, known as the rectified linear unit (ReLU).
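The hyperparameters listed above map onto a Keras model roughly as in the following sketch. It uses the values stated in the text (four timesteps, 80 hidden neurons, a linear single-neuron output, ReLU elsewhere, dropout of 0.1 or 0.2), while the optimizer (`adam`), the loss (`mse`), the placement of the dropout layer, and the use of a weight penalty (`L1L2`) for the L1/L2 terms are assumptions that the text does not specify. The function name `build_lstm` is hypothetical.

```python
def build_lstm(n_lags=4, units=80, drop_rate=0.1, l1=0.0, l2=0.0):
    """Return a compiled Keras LSTM; imports are local so the sketch can be
    read and inspected without TensorFlow installed."""
    from tensorflow import keras

    model = keras.Sequential([
        # Each sample is a sequence of n_lags lagged GSR values, one feature each
        keras.Input(shape=(n_lags, 1)),
        keras.layers.LSTM(
            units,
            activation="relu",
            # Assumed: L1/L2 penalty applied to the input kernel weights
            kernel_regularizer=keras.regularizers.L1L2(l1=l1, l2=l2),
        ),
        # Assumed placement of the dropout layer, after the LSTM layer
        keras.layers.Dropout(drop_rate),
        # Single output neuron with a linear activation, as stated in the text
        keras.layers.Dense(1, activation="linear"),
    ])
    # Assumed optimizer and loss; MAE is tracked as an evaluation metric
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

Training would then call `model.fit(X, y, epochs=2000, batch_size=...)` with `X` reshaped to `(samples, n_lags, 1)`, where the batch size is one of the grid values.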

Benchmark Models Implementations
To comprehensively evaluate the optimal LSTM forecasting model, four other popular forecasting models based on the ARIMA, DNN, MLP, and SVR algorithms were developed under the Python environment, version 3.6, on an Intel Core i5 computer. For the purpose of brevity, only the results at the 1 min (1M) forecasting horizon are shown here; the results obtained at the other forecasting horizons led to relatively similar deductions. Finally, following the previous steps, the optimal models based on the LSTM versus the counterpart models are shown in Table 4c for a diverse range of forecasting horizons.

Model Performance Criteria
Several methods have previously been adopted to evaluate model performance [109]. In the present work, a popular set of statistical metrics (e.g., bias, mean square error, linear correlation coefficient) is employed to assess the model performance since each individual metric has its own strengths and weaknesses [110]. For instance, due to the standardization of the observed and forecasted means and variances, the robustness of Pearson's correlation coefficient (r), which equals 1 for a perfect model, may have limited meaning [70,95]. Moreover, while the root mean square error (RMSE) is relevant for high values, the mean absolute error (MAE) assesses all deviations from the observed data in the same manner, regardless of sign [111]. RMSE and MAE are recommended together to address each other's weaknesses and to express accuracy in absolute units [111]. Model performance measures can be dominated by peaks and higher magnitudes, which may cause large errors, and can be insensitive to small magnitudes. To solve this problem, efficiency indexes such as the Willmott index (WI) and the Nash-Sutcliffe efficiency (NSE) [112] are introduced, with the advantage of overcoming this insensitivity and the over-dominance of large errors over small errors [113,114]. Nevertheless, NSE can be relatively high even for poorly fitted models and vice versa; hence, it can confuse performance evaluation [115]. Therefore, WI is used in combination with NSE [112].
However, a certain degree of insufficiency still occurs with WI, which can be improved by the Legates and McCabe index (LM) [116]. Since different forecasting horizons can lead to differences in data distribution, the relative root mean square error (RRMSE) [117] and the mean absolute percentage error (MAPE) [118] are also computed, as they too are benchmarks for evaluating a "good" model. A model's precision level is excellent if RRMSE < 10%, good if 10% < RRMSE < 20%, fair if 20% < RRMSE < 30%, and poor if RRMSE > 30% [117]. Therefore, to properly assess model performance, several statistical score metrics are used in this paper, namely Pearson's correlation coefficient (r) [119], root mean square error (RMSE) [120], MAE, RRMSE, MAPE, WI, LM and NSE, which are well-known metrics employed elsewhere (e.g., [79]).
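For reference, the metrics named above can be computed as in the sketch below. The formulas are the commonly used textbook definitions and are assumed, not confirmed, to match the paper's own equations; `forecast_metrics` and `rrmse_rating` are hypothetical names, and the rating thresholds follow the RRMSE classes quoted in the text.

```python
import numpy as np

def forecast_metrics(obs, fc):
    """Standard definitions of r, RMSE, MAE, RRMSE, MAPE, WI, NSE and LM."""
    obs = np.asarray(obs, dtype=float)
    fc = np.asarray(fc, dtype=float)
    err = fc - obs
    obs_mean = obs.mean()
    dev = obs - obs_mean
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "r": np.corrcoef(obs, fc)[0, 1],
        "RMSE": rmse,
        "MAE": np.mean(np.abs(err)),
        "RRMSE": 100.0 * rmse / obs_mean,            # percent of the observed mean
        "MAPE": 100.0 * np.mean(np.abs(err / obs)),  # unstable for obs near zero
        "WI": 1.0 - np.sum(err ** 2)
              / np.sum((np.abs(fc - obs_mean) + np.abs(dev)) ** 2),
        "NSE": 1.0 - np.sum(err ** 2) / np.sum(dev ** 2),
        "LM": 1.0 - np.sum(np.abs(err)) / np.sum(np.abs(dev)),
    }

def rrmse_rating(rrmse):
    """Precision classes for RRMSE (in percent) as quoted in the text."""
    if rrmse < 10:
        return "excellent"
    if rrmse < 20:
        return "good"
    if rrmse < 30:
        return "fair"
    return "poor"
```

As a small worked case, observations of 100, 200, 300 and 400 with every forecast 10 units too high give RMSE = MAE = 10, RRMSE = 4% ("excellent"), NSE = 0.992 and LM = 0.9.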

Statistical Significance Testing
Based on performance metrics alone, it is difficult to conclude whether the results are decisive or due to chance. A factually good competing model may be rejected because the differences in performance metrics might have arisen stochastically. Consequently, from a statistical perspective, a significant difference in forecasting performance cannot be judged solely by traditional performance metrics. Therefore, this paper employed a statistical evaluation method, the Diebold-Mariano (DM) test [122], which offers a quantitative means of comparing the forecast accuracy of forecasting models. The DM test is applicable to nonquadratic loss functions, multi-period forecasts and forecast errors that are non-Gaussian and nonzero-mean. Details of the DM test can be found in [122] and some applications of the DM test can be found in [123,124]. With the help of the DM test, the interference of stochastic sample differences can be revealed, such that the better forecasting model can be identified statistically. To determine whether one forecasting model is better than another, the hypothesis of equal accuracy is tested first. The null hypothesis (H0) states that the observed differences between the performances of two forecasting models are not significant. The alternative hypothesis (H1) states that the observed differences between the performances of two forecasting models are significant. Since the DM statistic converges to a standard normal distribution, the null hypothesis can be rejected at the 5% level of significance if |DM| > 1.96; otherwise, the null hypothesis (H0) cannot be rejected, i.e., the two forecasts are not statistically different. By comparing LSTM to each counterpart in turn, it can be concluded whether the LSTM model is superior to its counterparts.
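The decision rule above can be illustrated with the following sketch of the DM statistic. It assumes a power loss on the forecast errors (squared error by default) and estimates the long-run variance from the autocovariances of the loss differential up to lag h − 1, which is the usual textbook form; the function name `dm_test` and its arguments are hypothetical, and the paper's exact loss function is not stated.

```python
import math
import numpy as np

def dm_test(e1, e2, h=1, power=2):
    """Diebold-Mariano statistic and two-sided p-value (normal approximation).

    e1, e2: forecast errors of model 1 and model 2 on the same test samples.
    h: forecast horizon in steps. A negative DM means model 1 has lower loss.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = np.abs(e1) ** power - np.abs(e2) ** power  # loss differential
    n = d.size
    d_mean = d.mean()
    dc = d - d_mean
    # Long-run variance: gamma_0 + 2 * sum of autocovariances up to lag h-1
    lrv = np.dot(dc, dc) / n
    for k in range(1, h):
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / n
    dm = d_mean / math.sqrt(lrv / n)
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value
```

With `dm, p = dm_test(err_lstm, err_other)`, the null hypothesis of equal accuracy would be rejected at the 5% level when `abs(dm) > 1.96`; for h > 1 the variance estimate is not guaranteed to be positive in small samples.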

Results and Discussion
In this section, the experimental results and the overall performance assessment at different forecasting horizons are presented. For each modelling experiment, five GSR forecasting models, comprising the proposed LSTM model and the counterpart models (i.e., ARIMA, DNN, MLP, and SVR), are employed. To demonstrate the merits of the LSTM model over the counterpart models in terms of their near-real-time solar forecasting capability, a range of model evaluation metrics for the testing phase, as described by Equations (9)-(16), is presented in Tables 5-7.
In all of the modelling experiments, the proposed LSTM model achieves better results than the counterpart models across the multiple time horizons, attaining the highest Pearson's correlation coefficient (r), the lowest root mean square error (RMSE), and the lowest mean absolute error (MAE). In particular, the LSTM model at the 1M forecast horizon outperforms all of the other developed models with the statistics r = 0.9920 and RMSE = 40.9125. Figure 6 illustrates scatterplots of the observed and the forecasted GSR values of the developed models for the 1M horizon. In each panel, the coefficient of determination ($R^2$) and a linear fit equation ($GSR_{OBS} = m \cdot GSR_{FOR} + c$) are shown to demonstrate the coherence between forecasted and observed GSR [104]. Here the constants m (gradient) and c (intercept on the y-axis), together with $R^2$, are used to outline the models' overall accuracy. Note that $R^2$ and m values close to 1.00 and a c value close to 0 should be attained for a perfect forecasting model. Evidently, the LSTM model achieves a better degree of agreement than the corresponding counterpart models. Moreover, to demonstrate the LSTM model's performance in predicting the GSR data in the testing phase, Figure 6 also shows a time-series plot for all of the study cases, in which the forecasted values of LSTM (in red) appear to be closer to the observed GSR values (in blue) than those of the counterpart models.
To further explore the precision of the proposed LSTM model, Table 6 presents the metrics evaluating the forecasting errors (i.e., RRMSE, LM and MAPE). The proposed LSTM model outperforms the counterpart models in all of the study cases in terms of the lowest RRMSE and MAPE and the highest value of the LM index. Evidently, the LM values produced by LSTM for all of the forecasting horizons exceed those of all of the counterpart models. For instance, LM for the 1M model is 0.9204, whereas those of the counterpart models (i.e., MLP, DNN, ARIMA, and SVR) are 0.8739, 0.9062, 0.8825, and 0.8842, respectively. While it is argued that RRMSE is limited in the context of a dataset with the same variance, in our case, the RRMSE value clearly shows which model is better in terms of producing fewer and relatively low-magnitude errors [125]. Thus, LSTM performs better than the counterpart models as it generated an RRMSE that is lower than that of the comparative models.
Meanwhile, in terms of the MAPE value, the results of the proposed LSTM over the multiple time horizons yield values of 16%, 48%, 100%, 86% and 116%, respectively, implying that the LSTM does not perform particularly well on this metric. However, a disadvantage of MAPE, which could explain this result, is that it generates very large (in the limit, infinite) percentage errors for near-zero observed values, and this effect can be quite pronounced if the observed GSR values are less than 1 [126]. This is a reasonable explanation for the low performance in terms of MAPE since GSR time-series data, particularly at the very short-term timescales used in this study, are expected to intermittently contain numerous near-zero values in the morning, as observed elsewhere [125]. Figure 7 illustrates the boxplots for the case of the 1M model that depict the different forecasting skills in terms of the absolute prediction error (i.e., forecasted minus observed GSR values). The lower and upper lines of the box denote the first and third quartile values (25th and 75th percentiles), respectively, and the median value (50th percentile) is represented by the central line. Additionally, two whiskers are drawn from the first and third quartiles to the smallest and largest non-outliers, respectively. In agreement with the earlier results, the boxplot provides further evidence that the errors of the proposed LSTM model have a much smaller spread, with correspondingly smaller quartile and median values, compared to the MLP, DNN, ARIMA, and SVR models. Lastly, to consolidate the findings presented so far that demonstrate the efficacy of the LSTM model, a Taylor diagram, in which the angular location of each model is given by the inverse cosine of its correlation coefficient, is presented in Figure 8 to show the model closest to the observed data in the testing period.
The correlation coefficient (r), on the angular axis, and the standard deviation, on the radial axis, are used simultaneously to judge the closest fitting model. For all of the different timescales, the LSTM generates the highest value of r, with the forecasted results being closest to the observed data compared to the other comparative approaches.
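The position of a model on a Taylor diagram follows directly from two statistics, as the sketch below illustrates. The function name `taylor_coordinates` is hypothetical; the centred RMS difference uses the law-of-cosines identity on which the diagram is based.

```python
import numpy as np

def taylor_coordinates(obs, fc):
    """Return (radius, angle, centred RMS difference) for one model.

    radius: standard deviation of the forecasts
    angle:  arccos(r) in radians, where r is the correlation with observations
    """
    obs = np.asarray(obs, dtype=float)
    fc = np.asarray(fc, dtype=float)
    # Clip guards against rounding slightly outside [-1, 1]
    r = float(np.clip(np.corrcoef(obs, fc)[0, 1], -1.0, 1.0))
    sd_obs, sd_fc = float(np.std(obs)), float(np.std(fc))
    # Law of cosines: E'^2 = sd_fc^2 + sd_obs^2 - 2 * sd_fc * sd_obs * r
    crmsd = float(np.sqrt(max(sd_fc**2 + sd_obs**2 - 2.0 * sd_fc * sd_obs * r, 0.0)))
    return sd_fc, float(np.arccos(r)), crmsd
```

A model that reproduces the observations exactly lands on the reference point (angle 0, radius equal to the observed standard deviation), so the distance from that point summarizes how closely each model matches the observed data.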
In addition, the forecasting performance of the five models is compared by the DM test (Section 5). The forecasting comparison of every pair of models is summarized in Table 8. The null hypothesis means that the two forecasts have the same accuracy, otherwise, the two forecasts have different levels of accuracy in the alternative hypothesis. The statistically significant better performance of LSTM over the counterparts is indicated as 'yes'. From Table 8, the conclusions of comparison of the LSTM model and the counterparts (i.e., DNN, MLP, ARIMA and SVR) can be drawn as follows.
Firstly, since the absolute value of the DM statistic is greater than 1.96 in most cases, the null hypothesis is rejected at the 5% level of significance; that is to say, the observed differences are significant and the forecasting accuracy of the LSTM models is better than that of the counterparts. The exceptions are the comparison of LSTM vs. DNN at the 1M forecast horizon and that of LSTM vs. SVR at the 15M forecast horizon, with corresponding absolute DM statistics of 0.272 and 0.268, respectively, which are less than 1.96. This shows that the differences in performance between LSTM and DNN, and between LSTM and SVR, are not significant in these two cases and might be due to stochastic interference. Overall, these results indicate that the LSTM models are significantly more accurate than the counterpart models in most cases. In addition, the p-value at a 5% level of significance is less than 0.05, which means all models are statistically significant.
In summary, by an evaluation of forecasting based on performance metrics and the DM test, the LSTM model was demonstrated to outperform the comparative models. Thus, it is found to be a versatile solar forecasting tool, especially over short-term, multiple timescale horizons.

Further Discussion, Limitations and Future Scope
In addition to the performance of the developed LSTM model as evaluated by several statistical metrics and visual model analysis, the proposed model is further evaluated by comparing the results of this study with those of previous studies. In one such study, an LSTM model was developed for 1-hourly day-ahead solar irradiance forecasting on the island of Santiago, Cape Verde. The study employed weather variables (i.e., temperature, dew point, humidity, visibility, wind speed, weather type) for 30 months (March 2011 to August 2012). In concurrence with the present study ( Wm , respectively).
Another relevant comparative study employed LSTM to estimate hourly and daily GSR in Atlanta, New York, and Hawaii using hourly meteorological data and cloud type information from 2013 to 2017 as the training and testing population. The proposed LSTM demonstrated excellent forecasting performance for hourly forecasts under all weather conditions (i.e., mixed days and cloudy days). The mean absolute percentage error (MAPE) of LSTM at the measured locations (i.e., Atlanta, New York, Hawaii) on cloudy days was 14.9%, 20.1% and 18.1%, respectively. All r-values of LSTM were above 0.85, outperforming the comparative models (i.e., ARIMA, SVR, ANN, CNN, and RNN), with one exceptional case where the r-value of RNN was higher than that of LSTM (0.91 and 0.90, respectively). Overall, however, LSTM showed outstanding performance. The study of Ghimire et al. [44] designed a hybridized framework that integrated a convolutional neural network with LSTM for half-hourly GSR forecasting in Australia; their model was superior, with over 70% of predictive errors lying within ±10 W m⁻². The results from the last two studies are the only closely comparable results available from solar forecasting studies employing LSTM. In terms of statistical score metrics, the LSTM-based model in this study performed better by a noticeable margin, with outstanding r, RMSE and MAE (Table 5) at all forecasting horizons (i.e., 1M, 5M, 10M, 15M, and 30M). Moreover, the two compared studies focused solely on a specific forecasting horizon, whereas this study presented LSTM performance over multiple time horizons, in which the r-value was over 0.9 in all cases.
In terms of optimization of the LSTM model, the number of epochs used in the training stage can be set to various values for a given dataset. In this study, the maximum number of epochs (Table 4a) was set to 2000 in every case. However, training stopped once the evaluation metric (MAE) no longer improved on the validation set; in other words, the number of epochs actually run did not reach 2000 (Table 4b). Setting the maximum number of epochs to a high value therefore gives the LSTM the opportunity to reach its optimized state, while the number of epochs actually run varies from case to case in practice and does not by itself determine the performance in the training phase. Moreover, it is noticeable that the LSTM performance improved when the drop rate and batch size increased (Table 4b). This aspect is also a novelty of this study.
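The stopping behaviour described above can be illustrated with a small sketch of patience-based early stopping. The `patience` parameter and the function name are assumptions introduced for illustration; the text states only that training stopped when the validation MAE no longer improved.

```python
def early_stopping_epoch(val_mae, patience=10, max_epochs=2000):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation MAE.

    Training stops once the validation MAE has failed to improve for
    `patience` consecutive epochs (an assumed rule), or at max_epochs.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    history = list(val_mae)[:max_epochs]
    for epoch, value in enumerate(history, start=1):
        if value < best:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(history), best_epoch
```

For example, with validation MAE values of 5, 4, 3, 3.5, 3.2, 3.1 and 3.3 and a patience of 3, training stops at epoch 6 and the best model is the one from epoch 3, well short of the epoch ceiling.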
To summarize, the newly developed model-based LSTM can be considered to be superior for near-real-time solar forecasting modelling and future solar energy management to the compared previous machine learning methods (i.e., MLP, SVR), deep learning method (i.e., DNN) and timeseries method (i.e., ARIMA).
This study supports the significant merits of a deep learning technique to attain better precision in near-real-time solar forecasting. Further, it also demonstrates the ability of models based on LSTM architecture in different forecasting horizons that can assist power generation companies in energy management. Since the r-value in very short-term horizons (i.e., 1M, 5M, 10M, 15M, 30M) is quite high, this model can be applied elsewhere with similar climatic conditions to Vietnam. However, the scope of this study could be further improved as it is restricted in terms of the forecasting horizon.
Further studies can examine LSTM's ability at longer-term forecasting horizons, such as the medium term (i.e., hourly, daily) and the long term (i.e., weekly, monthly, yearly), to support specific application purposes. Moreover, further studies can also apply feature extraction and feature selection to develop a hybrid LSTM model. Since this study focuses on GSR from ground-based measurements, further studies can apply LSTM in the context of multiple weather variables or satellite-derived variables under multiple weather conditions so that the LSTM capability can be thoroughly explored.

Conclusions
For the first time in this study region, this paper developed and demonstrated an LSTM-based forecasting model for near-real-time horizons using only global solar radiation time series, which is also an alternative approach for circumstances where there is a lack of available predictor variables. The model was evaluated over multiple time horizons utilizing antecedent lagged global solar radiation (GSR) data from 2017 to 2019 in Vietnam. Moreover, several types of evaluation metrics were employed to assess the performance of the forecasting models, from which it was shown that the LSTM model yielded the most accurate results.
The LSTM models were optimized by the combination of hyperparameters (Table 3a) and were then compared to the optimized counterpart models. Evidently, the performance of the LSTM models was better in all cases, and the LSTM model was found to be superior to its counterparts at the 1 min horizon (Tables 5 and 6), as evidenced by its low forecasting errors (e.g., RMSE = 40.9125). In addition, the DM test was employed to provide an evaluation framework for the different models and a strict criterion for evaluating the forecasting accuracy. The superior performance of LSTM over the counterpart models was supported by most absolute DM statistic values being greater than 1.96 at a 5% level of significance. The only two exceptions were those of LSTM vs. DNN at the 1M forecast horizon (|DM| = 0.272 < 1.96) and LSTM vs. SVR at the 15M forecast horizon (|DM| = 0.268 < 1.96) at a 5% level of significance. Moreover, the p-values at a 5% level of significance were less than 0.05, which means all models were statistically significant.
In short, this study provides a baseline investigation that is relevant to other potential models to be used in near-real-time solar forecasting in future studies. Examples include the hybridization of LSTM with other methods, such as a convolutional neural network for feature mapping, a wrapper-based feature selection method employing several atmospheric predictor variables [42], other deep learning methods (e.g., deep neural networks), and data decomposition methods such as wavelets and ensemble mode decompositions [10,127]. While these methods can potentially improve the proposed LSTM model, the present study, as a first investigation of the near-real-time forecasting of GSR, has set a clear foundation for adopting these techniques in the context of solar radiation modelling in Vietnam. Nonetheless, the findings of this study ascertain that the standalone LSTM model could adequately capture the nonlinear dynamics of global solar radiation time-series data. The LSTM-based model can be employed in longer time horizon solar forecasting (i.e., long term, medium term, and short term). Furthermore, the government and electricity generation companies in Vietnam can use this model prior to generating solar energy to derive an optimal production strategy.
Acknowledgments: The paper utilized minute-level solar radiation data from the World Bank, which are duly acknowledged. We also would like to thank Barbara Harmes (Language Centre, Open Access College, University of Southern Queensland, Australia) for providing help in proof-reading this paper. Finally, we thank both reviewers for their constructive comments that improved the clarity of the final paper.

Conflicts of Interest:
The authors declare no conflict of interest.