Literature Review on Integrating Generalized Space-Time Autoregressive Integrated Moving Average (GSTARIMA) and Deep Neural Networks in Machine Learning for Climate Forecasting

Abstract: The issue of climate change holds immense significance, affecting various aspects of life, including the environment, the interaction between soil conditions and the atmosphere, and agriculture. Over the past few decades, a range of spatio-temporal and Deep Neural Network (DNN) techniques have been proposed within the field of Machine Learning (ML) for climate forecasting using spatial and temporal data. The forecasting model considered in this paper is highly complex, particularly due to the presence of nonlinear data in the residual modeling of the Generalized Space-Time Autoregressive Integrated Moving Average (GSTARIMA) model, which represents nonstationary data with time and location dependencies. This model effectively captures trend and seasonal data with time and location dependencies. DNNs, in turn, have proven reliable for modeling nonlinear data that pose challenges for spatio-temporal approaches. This research presents a comprehensive overview of the integrated approach between the GSTARIMA model and DNNs, following the six-stage Data Analytics Lifecycle methodology, with a focus on previous works conducted between 2013 and 2022. The review shows that the GSTARIMA-DNN integration model is a promising tool for forecasting the climate of a specific region in the future. Although spatio-temporal and DNN approaches have been widely employed for predicting the climate and its impact on human life due to their computational efficiency and ability to handle complex problems, the proposed method is expected to be universally applicable for integrating these models, which encompass location and time dependencies. Furthermore, it was found that the GSTARIMA-DNN method, incorporating multivariate variables, multiple locations, and multiple hidden layers, is suitable for short-term climate forecasting. Finally, this paper presents several future directions and recommendations for further research.


Introduction
Climate is the long-term average weather condition in a specific area or zone, and it is determined by the climatic system of that region, such as the atmosphere. A spatio-temporal model can observe and analyze various aspects based on location and time, and it can be integrated with other research fields for examining climate phenomena and developing decision-making strategies. This model focuses on the sequence of events that can be observed and identified according to their location and time. In the spatio-temporal model, the Box-Jenkins spatial model and time series are combined [1]. One commonly studied model in this context is the Space-Time Autoregressive (STAR) model, which incorporates a spatial lag operator, representing the effect of nearest neighbors on a particular spatial location through weights. It was the first spatio-temporal model to be introduced, but it is only applicable to homogeneous locations, assuming the same parameters for each area [2]. To address this limitation, the Generalized Space-Time Autoregressive (GSTAR) model was developed as a natural extension of the STAR model. It allows autoregressive parameters to vary across locations, making it suitable for analyzing heterogeneous sample-site characteristics [3]. Furthermore, the Generalized Space-Time Autoregressive Integrated Moving Average (GSTARIMA) model is specifically designed for analyzing nonstationary spatio-temporal data [4].
Climate and air pollution are closely related, as climate change can significantly impact air quality. Previous research, such as that focused on the GSTARMA model, has aimed at improving predictions of pollutant datasets within spatio-temporal frameworks [5]. These predictions are crucial for understanding the economic and societal impacts of air pollution [6]. Furthermore, the influence of rainfall, as a climate factor, can be estimated using GSTAR to plan rice planting seasons, which vary across different sites. GSTARIMA is employed to analyze the distribution of yields and determine pricing based on the transfer function [7][8][9]. These applications extend beyond agriculture, encompassing various sectors [10]. Moreover, unobserved locations can be predicted using GSTAR-Kriging [11,12]. When modeling temperature and forecasting nonlinear time-series data with spatial dependency, the STARMA-GARCH hybrid model outperforms STARMA in terms of modeling efficiency and forecasting accuracy [13]. Lastly, the effect of exogenous climatic factors on disease prediction during seasons with spatio-temporal variation can be examined using the Seasonal Difference Space-Time Autoregressive Integrated Moving Average (SD-STARIMA) approach. This approach is specifically designed to analyze spatio-temporal series data exhibiting seasonal distribution features [14], and can even generate a statistical spatio-temporal model [15] for forecasting photovoltaic plant electricity output in the near future [16].
In previous research, Deep Neural Networks (DNNs) have been used as part of Deep Learning (DL), which is a widely recognized approach for integrating climatological data. The supervised Convolutional Neural Network (CNN) algorithm technique has been proven to be highly effective in handling complex climate and environmental data, particularly nonlinear data; an example is the CNN forecast system for the prediction of the El Niño model [17]. This has achieved a training accuracy of up to 94% [18], making it particularly suitable for datasets containing multiple time series of high-dimensional climate variables and multidimensional spatial series with undetermined sequences [19]. The accurate prediction of climate variables is crucial for social and economic activities, and the CNN-LSTM hybrid model outperforms traditional Machine Learning (ML) approaches in predicting rainfall for the next three hours [20]. This hybrid model has also proven to be effective in analyzing air pollution using Deep Learning [21,22] and in large storms such as typhoons [23]. Additionally, wind speed and meteorological conditions can be predicted using the multilayer perceptron technique [24,25]. The characteristics of big data, including large volumes, various variables, and rapid data growth, significantly impact predictions in regional tourism management. ML, particularly the random forest algorithm, plays a vital role in increasing tourist visits by providing accurate predictions [26]. Meteorological forecasting, which includes enormous and complicated datasets, is intrinsically related to mitigating environmental consequences on daily activities [27].
The integration of the Space-Time and Neural Network (NN) models as part of ML is an emerging trend. These two models are being developed both independently and in hybrid forms to analyze climate data. To forecast air quality and pollution, GSTAR has been integrated with the NN model [28], and the Generalized Regression Neural Network (GRNN) model has been employed to predict solar radiation, enabling the identification of outlier data [29]. Nonlinear patterns can be generated and effectively addressed using spatio-temporal techniques in conjunction with NNs [30]. When downscaled weather and climate data simulations of summer monsoon rainfall are conducted, a deep convolutional architecture with data-type dependencies is selected for forecasting, as no single optimum architecture exists for all variables [31]. DNNs have a significant impact on the predictive analysis of nonlinear data because cluster techniques provide relatively accurate results [32]. Figure 1 shows the rapid growth of research on spatio-temporal models and DL in the context of big climate data. This literature review is organized into five sections, with the first providing an overview of its purpose. In the second section, information from previous research on spatio-temporal modeling and DL in the context of climate is systematically identified. The third focuses on traditional mathematical and statistical modeling methods. The fourth explores the potential of integrating spatio-temporal and DNN models with big data concepts for future climate forecasting. Lastly, the fifth section provides a conclusion by analyzing the developed model and its ramifications.

Literature Review and Information Analysis
This research focused on the implementation of the GSTARIMA model with ML using a DNN. Although there is considerable research on GSTARIMA and NNs and their implementation, it remains incomplete: the integration of GSTARIMA-DNN using an ML approach has not been extensively explored. To highlight the novelty of this research, a literature search was conducted using search engines such as Scopus, Web of Science, EBSCO, Dimensions, and other academic research sources (Other Sources). The search followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach.
PRISMA serves as an evidence-based minimum reporting standard for systematic reviews and meta-analyses. It is also beneficial for peer reviewers and editors [33] when critically assessing published systematic reviews. In addition, a bibliometric approach was employed; despite its limited exploration, the method holds significant theoretical and empirical value [34]. Bibliometrics involves studying the scientific literature [35], utilizing mathematics and statistics as connection tools. It acted as a supporting instrument for parameterizing and evaluating scientific outputs [36]. The approach incorporated the latest developments in scientific disciplines by reviewing related experts, cited publications, journals, and countries. It enabled mapping papers based on scientific knowledge; analyzing authors, journals, institutions, articles, and countries using keyword searches and citation counts; identifying novel research topics [37]; and conducting comprehensive literature reviews [38]. Additionally, it served as a medium for carrying out complex literature review methods in scientific fields [39].
The purpose of this research is to compile previous findings on forecasting with spatio-temporal and DNN models applied to climate data. It aims to cover topics such as spatial models for stationary and nonstationary data, methods for estimating location and time parameters using various approaches, supervised ML algorithms, and the potential for integrating spatio-temporal and DNN models in climate forecasting. The findings will contribute to future climate forecasting efforts, with a focus on multivariate time-series big data and the use of different DNN algorithms to enhance the accuracy and timeliness of predictions. This paper provides an overview of published works on integrating the Generalized Space-Time ARIMA model with DNN forecasting, and the examination of climatic big data employs the Data Analytics Lifecycle to gain insights. To guide the analysis, the following three research questions (RQ) have been developed:
• RQ1: How does the integration of the GSTARIMA-DNN model using the ML technique work?
• RQ2: How does the integration of the GSTARIMA-DNN model utilizing ML contribute to climate data forecasting?
• RQ3: How does the GSTARIMA-DNN model compare to the GSTARIMA model in forecasting climate data?
The bibliographic survey in this research involved searching for the published literature that supported spatio-temporal modeling, namely GSTAR, and research on DNNs in ML.The search focused on peer-reviewed journals published in international languages.The results were saved in various formats (CSV, BIB, RIS, CIW), with a dedicated format reader.
The determination of keyword A in Table 1 of the main paper corresponded to the research topic of spatio-temporal and NN models. This was supported by keywords in the paper and aligned with the research title on the Space-Time model [3,4,40]. Keyword B referred to the concept of a DNN using the MLP and CNN algorithms for analyzing time-series data in climate research [41][42][43]. Keyword C encompassed the concept of the Data Analytics Lifecycle methodology, the use of climate variables that impacted the environment, and the derived theory of the backpropagation algorithm [44][45][46]. A literature search was conducted to identify papers that included the main keywords, such as "Spatio Temporal", "Generalized Space-Time Autoregressive", "Machine Learning", "Multivariate Time Series", "Data Analytics Lifecycle", and "Deep Learning". These keywords were obtained from several research papers discussing the basic concepts and their application to the research object. The search engine was used to find papers based on keywords and applied inclusion restrictions such as year, language, and article type. All the papers obtained were sourced from the specified database portals.

Dataset Analysis
Figure 2 shows the systematic literature process using the PRISMA flow diagram, which consisted of three stages, namely Identification, Screening, and Included. In the Identification stage, articles were obtained from four search databases using code D, as well as from other academic research sources that supported this investigation. The analysis using the PRISMA method began with determining keywords related to the topic of this proposed research. The keywords were generated from the keywords of several papers, which are the main studies described above. After obtaining the keywords shown in Table 1, in stage 2, the search (process code D) was based on the four databases and papers from academic references (Table 1). In stage 3, 575 papers were obtained through an inclusion search strategy that limited the publication years to 2013-2022, the paper types to articles and proceedings, and the language to English. Stage 4 removed duplicate papers from the four search databases using reference management software; we used Mendeley to analyze them. After removing 265 duplicated articles, the screening stage led to 354 articles. From these, 168 articles were excluded because they did not match the title and supporting abstract. In the final stage, we selected 47 papers on appropriate topics as research references by conducting content analysis to find gaps in the research that we would conduct.
At the Included stage, 47 articles were obtained and analyzed for this research. Meanwhile, in the case of GSTAR research, there were six occurrences in 2018; CNN had thirteen, and Meteorological Data had a total of four in 2021 publications. The visualization revealed that GSTAR research still had gaps and distant clusters from DL and CNN, which formed the basis for the analysis conducted in this research. Despite the availability of datasets, utilizing GSTARIMA for spatio-temporal modeling to evaluate data based on location and time remained infrequent, particularly when integrated with DNNs.

Theoretical Background
Space-Time Autoregressive (STAR)
Time-series models that incorporated univariate and multivariate periods could be observed in the ARIMA and Vector Autoregressive (VAR) models. On the other hand, the Space-Time model combined elements of location and time in a multivariate time series. It was first introduced by Pfeifer and Deutsch [2] and is known as STAR. The STAR model assumes the same parameters for all locations and was used for homogeneous locations. The STAR(p; λ_1, . . ., λ_p) model was defined using a spatial lag operator to express the effect of the closest locations at a particular spatial lag through weights. This could be formulated as follows:

$$Z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \phi_{kl} W^{(l)} Z_{t-k} + e_t \tag{1}$$

where:
$\lambda_k$: the spatial order of the autoregressive term of order k;
$Z_t$: vector of size (n × 1) at time t;
$Z_{t-k}$: vector of size (n × 1) at time (t − k);
$\phi_{kl}$: the STAR parameter at time lag k and spatial lag l;
$W^{(l)}$: weight matrix of size (n × n) at spatial lag l (with l = 1, 2, . . .), with the weights selected such that $w_{ii} = 0$ and $\sum_{j \neq i} w_{ij} = 1$;
$e_t$: error vector of size (n × 1) at time t, assuming $e_t \overset{iid}{\sim} N(0, \sigma^2 I)$.
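To make the STAR recursion above concrete, the following minimal sketch (not from the reviewed papers; the weight matrix and coefficients are illustrative assumptions) simulates a STAR(1;1) process for n = 3 locations:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative row-normalized spatial weight matrix (w_ii = 0, rows sum to 1).
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

phi10 = 0.4   # phi_{k=1, l=0}: own-location autoregressive effect
phi11 = 0.3   # phi_{k=1, l=1}: first-order spatial-lag effect

n, T = 3, 200
Z = np.zeros((T, n))
for t in range(1, T):
    e_t = rng.normal(0.0, 1.0, size=n)                    # e_t ~ iid N(0, I)
    Z[t] = phi10 * Z[t - 1] + phi11 * W @ Z[t - 1] + e_t  # STAR(1;1) recursion

print(Z[-5:])  # last five simulated observations across the three locations
```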

Generalized Space-Time Autoregressive (GSTAR)
GSTAR, originally developed by Borovkova [3] as a natural generalization of the STAR model, allows autoregressive parameters to vary per location. The GSTAR(p; λ_1, . . ., λ_p) model applied to heterogeneous sample-site characteristics and was formulated when the differencing order and the Moving Average order were both 0. The formula differs from STAR in the phi parameter: in STAR it is a scalar, while in GSTAR it is a diagonal matrix:

$$Z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \Phi_{kl} W^{(l)} Z_{t-k} + e_t \tag{2}$$

where $\Phi_{kl} = \mathrm{diag}(\phi_{kl}^{1}, \phi_{kl}^{2}, \ldots, \phi_{kl}^{n})$ is the diagonal matrix of size (n × n) for the AR parameters at time lag k and spatial lag l. The expansion of GSTAR by adding a Moving Average (MA) element, with a differencing order of 0, produced GSTARMA(p λ_p, q v_q):

$$Z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \Phi_{kl} W^{(l)} Z_{t-k} - \sum_{k=1}^{q} \sum_{l=0}^{v_k} \Theta_{kl} W^{(l)} e_{t-k} + e_t \tag{3}$$

If $Z_t$ was an observation vector that was not stationary and the differencing process was applied to make $\nabla Z_t = (1 - B)^d Z_t$ stationary, the GSTARIMA(p λ_p, d, q v_q) model could be defined as:

$$\nabla Z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \Phi_{kl} W^{(l)} \nabla Z_{t-k} - \sum_{k=1}^{q} \sum_{l=0}^{v_k} \Theta_{kl} W^{(l)} e_{t-k} + e_t \tag{4}$$

where:
$\nabla Z_t$: observation vector $\nabla Z_t = [\nabla Z_{1,t}, \nabla Z_{2,t}, \ldots, \nabla Z_{n,t}]^T$ at time t = 1, 2, . . ., T, of size (n × 1);
$\Theta_{kl}$: the diagonal matrix of size (n × n) for the MA parameters at time lag k and spatial lag l, $\mathrm{diag}(\theta_{kl}^{1}, \theta_{kl}^{2}, \ldots, \theta_{kl}^{n})$;
p: autoregressive (AR) vector order;
q: Moving Average (MA) vector order;
$\lambda_p$: the spatial order of the pth autoregressive term;
$v_q$: the spatial order of the qth moving average term.
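As a minimal illustration of these definitions (an assumption-laden sketch rather than code from the reviewed papers), the following applies first-order differencing (d = 1) and computes a one-step GSTAR-style prediction with location-specific diagonal AR parameter matrices; the MA terms are omitted for brevity and all coefficient values are assumptions:

```python
import numpy as np

# Illustrative data: T observations at n = 3 locations.
rng = np.random.default_rng(0)
Z = np.cumsum(rng.normal(size=(200, 3)), axis=0)   # nonstationary (random walk)

# d = 1 differencing: grad_Z[t] = (1 - B) Z[t] = Z[t] - Z[t-1]
grad_Z = np.diff(Z, axis=0)

W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])                    # row-normalized weights, w_ii = 0

# Location-specific parameters: diagonal matrices, unlike STAR's scalars.
Phi10 = np.diag([0.40, 0.35, 0.30])   # AR, time lag 1, spatial lag 0
Phi11 = np.diag([0.20, 0.25, 0.15])   # AR, time lag 1, spatial lag 1

# One-step prediction of the differenced series (MA terms omitted).
pred_diff = Phi10 @ grad_Z[-1] + Phi11 @ (W @ grad_Z[-1])

# Integrate back: Z_hat[T+1] = Z[T] + predicted difference.
Z_hat_next = Z[-1] + pred_diff
print(Z_hat_next)
```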

Machine Learning (ML)
ML is a decision-making technology adopted through Artificial Intelligence (AI), which is used in all areas of life for basic research and practical applications. The ML approach could be defined as the exploration of computational methods to test the validity of new knowledge and discover novel ways to organize existing knowledge [47].
In addition, ML offers various techniques for problem-solving, particularly in predicting the future. In this technology era, it plays crucial roles in real-time applications, such as business analytics, education, pharmaceuticals/molecular biology, manufacturing, crime detection, financial support, and marketing. The technique is predominantly employed for tackling complex problems, regardless of whether they involve structured or unstructured data. AI facilitates ML in learning from past information by adapting to and extracting valuable insights from large datasets (big data). The incorporation of ML features in data analysis [48] is very important for model development. This technique has permeated various software-based sectors and applications.
In ML, supervised learning is a potent method for classifying labeled/tagged data using learning algorithms, such as regression techniques (Linear, Logistic, Polynomial) and classification techniques (Random Forest, Support Vector Machine, Linear Discriminant Analysis, K-Nearest Neighbor, NN Classifier), and even ensemble algorithms in the learning model [49]. This paradigm focused on qualitative data used in forecasting, considering the labels assigned to the data attributes. Supervised learning-based classification methods were utilized to build the best predictive model.
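As a brief, hypothetical illustration of the supervised paradigm (the synthetic dataset and settings are placeholders, not an experiment from the reviewed papers), the following trains a random forest classifier on labeled data with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic labeled dataset standing in for tagged climate observations.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: fit on labeled training data, evaluate on held-out data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```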
On the other hand, unsupervised learning categorized and clustered unlabeled data based on similarity. Through this technique, it became possible to discover hidden structures and patterns [50]. This type of learning aided diverse clustering applications, encompassing numerical and categorical data distribution as well as Dimensionality Reduction (Wavelet Transform, PCA Method).
Semi-supervised learning combined supervised and unsupervised approaches and, in general, addressed problems on a large scale [51]. Reinforcement learning, by contrast, constituted a third paradigm of the ML technique, which differed from the other learning methods. It could shape decisions based on past experiences even when data were lacking. Learning techniques within this framework were based on sequential decision-making [52].

Multilayer Perceptron (MLP)
The perceptron, which is the basic concept of the NN model, enabled the construction of more complex artificial neuron hierarchies. Figure 4 shows the architecture of the Multilayer Perceptron (MLP). The network had three input signals ($x \in \mathbb{R}^3$), referred to as the input layer. Input variables were not mapped to neurons but represented real number values. In addition, the network had two output neurons arranged in the output layer. The input and output layers were visible because they were directly connected to the model. All layers in between were called hidden layers and could contain an arbitrary number of neurons. In Figure 4, there are two hidden layers, each with four neurons. A layer with all neurons connected to the preceding ones could be referred to as fully connected. This network topology, denoted as {3, 4, 4, 2}, referred to a feedforward NN, since the information flowed from the input to the output layer without loops. To mathematically describe NNs, a proper notation and indexing scheme was required.
Each layer l (l = 1, 2, . . ., L) consists of $m_l$ neurons indexed by j (j = 1, 2, . . ., $m_l$), and the bias convention $y_0^{(l)} = 1$ is applied for each layer. With this definition, the summed output at layer l is calculated as follows:

$$v_j^{(l)} = \sum_{i=0}^{m_{l-1}} w_{ji}^{(l)} y_i^{(l-1)}, \qquad y_j^{(l)} = f\left(v_j^{(l)}\right) \tag{5}$$

The vector notation for the weights in layer l is obtained as:

$$\mathbf{w}^{(l)} = \left(w_{10}^{(l)}, w_{11}^{(l)}, \ldots, w_{m_l m_{l-1}}^{(l)}\right)^{T} \tag{6}$$

Also, the vector containing all the weights in their entirety is calculated as:

$$\mathbf{w} = \left(\mathbf{w}^{(1)}, \mathbf{w}^{(2)}, \ldots, \mathbf{w}^{(L)}\right)^{T} \tag{7}$$

Equation (6) states that $w_{10}^{(l)}$ is the weight in the lth hidden layer connecting unit 0 of the previous layer (the bias) to neuron 1, $w_{11}^{(l)}$ is the weight connecting unit 1 of the previous layer to neuron 1, and so on up to the weight connecting unit $m_{l-1}$ of the previous layer to neuron $m_l$. Equation (7) stacks the weight vectors of the layers (1), (2), . . ., (L).
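The forward pass implied by these equations can be sketched in a few lines of numpy (a minimal illustration of the {3, 4, 4, 2} topology of Figure 4 under the bias convention $y_0 = 1$; the tanh activation and random weights are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(x, weights, f=np.tanh):
    """Forward pass: v^(l) = W^(l) [1, y^(l-1)], then y^(l) = f(v^(l))."""
    y = x
    for l, W in enumerate(weights):
        y_aug = np.concatenate(([1.0], y))   # prepend bias unit y_0 = 1
        v = W @ y_aug                        # summed inputs v_j^(l), Eq. (5)
        # identity activation on the final (output) layer for regression
        y = f(v) if l < len(weights) - 1 else v
    return y

# Topology {3, 4, 4, 2}: weight matrices of shape (m_l, m_{l-1} + 1).
topology = [3, 4, 4, 2]
weights = [rng.normal(0, 0.5, size=(m, m_prev + 1))
           for m_prev, m in zip(topology[:-1], topology[1:])]

x = np.array([0.2, -0.7, 1.5])   # three input signals
print(forward(x, weights))       # two output values
```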

Convolutional Neural Network (CNN)
A CNN refers to a specialized type of MLP widely used for computer vision tasks, such as image classification, and for time-series analysis. The network is effective in quantitative data analysis, particularly when dealing with numerical data for training and testing predictions of historical data. A CNN functions by sliding a window over the data matrix. In this review, convolution layers with multiple filters were applied, followed by an activation function for classification; the stages are shown in Figure 5. The CNN architecture consisted of three layers, namely convolution, pooling, and fully connected. To obtain the highest level of accuracy, the existing parameters were tested. In essence, the convolution layer modified the weights of the filtered neuron layer to produce output. The defined filters moved horizontally and vertically across each input vector matrix, generating feature maps. Padding with a value of 0 was applied to maintain the maximum length of the input. Equation (8) presents the process in the convolution layer:

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right) \tag{8}$$

where $x_j^l$ is the dependent variable of the jth convolution, $*$ is a convolution operator that multiplies the input by the kernel, $x_i^{l-1}$ is the ith input vector, $k_{ij}^l$ is the kernel received from j through filtration combined with the ith feature map, and $b_j^l$ is related to the jth input filter. M represents the output result of a feature, while the function f(x) serves as the activation function used in the CNN; it includes the Rectified Linear Unit (ReLU) in the convolutional layer and the sigmoid function in the output classification [54].
The pooling layer was an integral part of the CNN, serving to reduce the input dimensionality by decreasing the number of parameters from the convolution layer. The pooling method selected the maximum value from each vector as a feature using max pooling. The equation for this layer was as follows:

$$x_j^l = f\left(\psi_j^l \, \mathrm{down}\left(x_j^{l-1}\right) + b_j^l\right) \tag{9}$$

where down(·) represents the pooling operation that selects the maximum value from each vector as a feature of the input, f is the activation function in the input processing, $\psi_j^l$ represents the multiplicative coefficient applied to the pooled input, and $b_j^l$ corresponds to the jth input bias from the convolution filter results. M indicates the output value containing the classification features required to achieve the desired model output.
The CNN included a fully connected layer, with output $x_j^l$ as shown in Equation (11):

$$x_j^l = f\left(\sum_{i} w_{ij}^l x_i^{l-1} + b_j^l\right) \tag{11}$$

The kernel contained in the convolution layer equation is replaced by the multiplication of the weights $w_{ij}^l$, which represents the process of obtaining the jth output value associated with the ith input variable. Nonlinear activation functions were necessary for this layer to produce the desired output as predictive classes. This layer was the final process for generating classification classes from the CNN [55]. The use of established CNN architectures aimed to facilitate development and avoid trial and error by leveraging well-tested models, such as LeNet-5, utilized for image detection [56], or VGGNet, introduced by Karen Simonyan and Andrew Zisserman from the University of Oxford in 2014.
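To tie Equations (8)-(11) together, the following compact sketch (illustrative assumptions: a 1D input series, a single filter, and a pooling window of 2) runs the convolution → ReLU → max-pooling → fully connected sequence in numpy:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(v):
    return np.maximum(0.0, v)

x = rng.normal(size=16)          # input vector (e.g., a residual time series)
kernel = np.array([0.25, 0.5, 0.25])
bias = 0.1

# Convolution layer (Eq. 8): slide the kernel over x, then apply ReLU.
feat = relu(np.convolve(x, kernel, mode="valid") + bias)

# Max-pooling layer (Eq. 9): keep the maximum of each window of 2.
pooled = feat[: len(feat) // 2 * 2].reshape(-1, 2).max(axis=1)

# Fully connected layer (Eq. 11): weights replace the kernel, sigmoid output.
w = rng.normal(size=pooled.shape)
b = 0.0
output = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))   # sigmoid classification
print(output)
```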

Data Analytics Lifecycle
The Data Analytics Lifecycle was designed to address the challenges of big data and Data Science, including large data volumes, diverse data structures, and rapid data growth. This lifecycle consisted of six stages, which could occur simultaneously in certain cases. In most conditions, the analysis could progress both forward and backward, allowing for an iterative approach that accommodated new information as it became available [57]. This enabled problem-solving and moving through the process iteratively, also facilitating the operationalization of research goals. The Data Analytics Lifecycle established best practices for the analytical process, spanning from discovery to the completion of the research work.
The overview of the Data Analytics Lifecycle, spanning six stages, is presented in Figure 6. At Stage 4 (Model Building), for example, the research was directed towards developing datasets for training and testing purposes and producing output models, with consideration given to whether the existing device supported running the model efficiently, such as fast hardware and parallel processing capabilities. The Data Analytics Lifecycle could be repeated from steps 1 to 5 if further improvements were required. The evaluation of the modeling process, from steps 6 to 1, was indicated by dotted lines, highlighting the possibility of revisiting certain stages if the modeling results did not meet the desired criteria.
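The iterative forward-and-backward flow of the lifecycle can be sketched schematically as follows (the stage names follow the standard six-phase lifecycle in [57], while the placeholder stage work and the MAPE-based acceptance check are assumptions for illustration):

```python
STAGES = ["Discovery", "Data Preparation", "Model Planning",
          "Model Building", "Communicate Results", "Operationalize"]

def run_stage(stage, results, attempt):
    # Placeholder work: pretend the model error improves on each iteration.
    if stage == "Model Building":
        results["mape"] = 25.0 / attempt      # hypothetical MAPE metric
    return results

def run_lifecycle(max_iterations=5, target_mape=10.0):
    """Iterate the six stages; loop back (steps 6 -> 1) until criteria are met."""
    results = {}
    for attempt in range(1, max_iterations + 1):
        for stage in STAGES:                  # forward pass through stages 1-6
            results = run_stage(stage, results, attempt)
        if results.get("mape", float("inf")) < target_mape:
            return results                    # desired criteria met
        # otherwise revisit earlier stages (the dotted-line evaluation path)
    return results

print(run_lifecycle())   # -> {'mape': 8.33...} after the third iteration
```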

Results
A total of 47 appropriate research studies were selected for additional investigation based on the selection criteria. Table 2 shows the distribution of spatio-temporal and DL methods used in the selected papers. The most widely applied models were DNNs and spatio-temporal models, the latter being location-based models offering flexibility in multivariate time-series modeling. The traditional DNN model was specifically developed to solve nonlinear time-series data problems. Hybrid combinations such as ConvLSTM, CNN-BiLSTM, ConvGRU, and CNN-SVM were special variants of DL capable of learning long-term dependencies as well as handling nonlinear and nonstationary data problems. The hybrid model had the key advantage of achieving maximum prediction accuracy with minimal error. The CNN and LSTM models had different working approaches in completing the modeling process, with the CNN excelling in speed and parallelization, while the LSTM operated sequentially.
Regarding the data used, satellite data predominated in the analysis; these could be obtained from various sources such as NASA, ECMWF, NOAA, ENSO, and Himawari-8. Satellite data proved to be highly effective, specifically in areas without manual measurement sensors. Converting satellite images into data matrices enabled complex analysis and minimized missing values at measurement locations. Furthermore, spatio-temporal and DNN models were employed in several articles across various domains, as shown in Table 2. For example, Jiao et al. (2022) [58] utilized three models (CNN, LSTM, GSINN) to forecast solar datasets for 17 locations; although all models performed well, GSINN yielded the highest performance analysis value. Andayani et al. (2018) [9] employed a spatio-temporal model to compare rainfall at three different locations using GSTARIMA-X and GSTAR; the results showed that GSTARIMA-X exhibited a small error value, highlighting the significant influence of the Moving Average on the modeling. Furthermore, Cui et al. (2019) [87] applied a spatio-temporal hybrid model with ML to forecast soil moisture for detecting the growing season on the Tibetan plateau, achieving a satisfactory R² value. Traditional models such as STAR, GSTAR, and GSTARIMA, developed from scratch, also played a promising role in modeling location and time dependencies with stationary and nonstationary data. On the other hand, DL, as a subset of ML, was widely used and reliable for achieving high forecasting accuracy. All the application models employed appropriate methods to select data, although there were variations across different models and contexts. Kumar et al. (2022) [13] noted that, due to the geographical dependency of time-series data and their nonlinear features, temperature modeling and forecasting is challenging. The authors designed a hybrid model of monthly maximum temperatures and temperature ranges called the Space-Time Autoregressive Moving Average Generalized Autoregressive Conditional Heteroscedasticity (STARMA-GARCH) model. The fitted STARMA model residual was checked for nonlinear behavior, making GARCH modeling necessary. To capture the dynamics of monthly maximum temperatures and temperature ranges, the STARMA-GARCH hybrid model was applied.

Gaps in the Literature
The analysis results in this research revealed an interesting area for further investigation. The GSTARIMA model produced residuals, which were an integral part of the proposed model. The DNN process, whether using an MLP or a CNN, played a crucial role in minimizing residual values. In general, traditional spatio-temporal prediction models [8,11,89,90], or hybrids with NNs, tended to use variables from a limited number of locations, or the NN modeling employed a single layer [78,87,88].
The reviewed models used diverse climate data sources, prioritizing the integration of DL models without considering location parameters [20,23,[58][59][60]79]. The performance analysis of DL integration yielded quite good results based on empirical simulation. However, climatic conditions in a particular location can vary due to the influence of other locations. Previous research did not extensively explore this gap, such as the influence of temperature, humidity, wind speed, solar radiation, and soil surface humidity on rainfall patterns across different locations. Climate analyses often used single-variable constraints, such as predicting air pollution, rainfall, and wind speed separately [10,65,71,74,81]. Although this approach enhanced model accuracy, addressing other complex parameters is necessary to provide different solutions during general climate forecasting for adjacent locations that exhibit a significant correlation.
In the equatorial regions, the spatio-temporal hybrid model and NN employed stationary data, emphasizing the GSTAR model and an NN with a single hidden layer [30]. Computation time and the analysis of specific variables played a notable role. The integration of the spatio-temporal model with a DNN required a fast computation time, considering the hidden layers and selecting the appropriate regression activation function to improve accuracy compared to previous research. Fine-tuning the model was essential to mitigate trial and error during the integration process and optimize research time. For example, in the CNN model, the GSTARIMA residual input adopted the architecture of the output layer, which consisted of a fully connected layer. A nonlinear activation function was necessary to generate prediction classes as output. While this layer served as the final step in classifying CNN outputs [55], the use of the architecture was based on traditional, well-tested models, such as LeNet-5, used in image detection [56], or VGGNet, introduced by Karen Simonyan and Andrew Zisserman from the University of Oxford in 2014. This architecture was primarily employed in image processing and could hypothetically enhance the accuracy of the regression DNN.

Integration of GSTARIMA with DNN for Forecasting
Based on the gap analysis and previous research reviews of existing models, GSTARIMA was integrated with a DNN in a conceptual model, utilizing multiple layers and complex algorithms. Figure 7 shows the framework diagram for GSTARIMA model planning as part of the model building stage. As stated in the introduction, to answer the first research question, we explored how to integrate the two different models, GSTARIMA and DNN. The process began by inputting pre-processed climate data comprising eight variables: rainfall, temperature, humidity, air pressure, wind speed, solar radiation, soil surface moisture, and root moisture.
Descriptive statistics were then obtained to examine, for each variable in the prepared data, the overall average, standard deviation, maximum and minimum values, and the correlation between variables. The next stage involved testing data stationarity using the Autocorrelation and Partial Autocorrelation functions. If the test results indicated nonstationarity in the average, a first-order differencing process was carried out in the identification process. On the other hand, if the data were stationary, the spatial weight matrix calculation was conducted using a distance weighting matrix. By plotting the Spatial Autocorrelation and Spatial Partial Autocorrelation functions, the data were analyzed to observe autocorrelation and partial autocorrelation relationships, accounting for spatial dependence between locations. For parameter estimation of the GSTARIMA model, maximum likelihood was employed to obtain parameters for each location variable. Subsequently, the GSTARIMA model went through a diagnostic test to ensure no correlation existed among the residuals. The final stage of GSTARIMA modeling yielded residuals, which served as input for the DNN process in determining its architecture.
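Two of these preparation steps can be illustrated with a small sketch (the station coordinates are assumptions, and the actual study may use different tooling): a stationarity check via the Augmented Dickey-Fuller test, followed by construction of an inverse-distance spatial weight matrix with $w_{ii} = 0$ and rows summing to 1:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
z = np.cumsum(rng.normal(size=300))          # one location's series (nonstationary)

# Stationarity check: difference once if the ADF test cannot reject a unit root.
p_value = adfuller(z)[1]
if p_value > 0.05:
    z = np.diff(z)                           # first-order differencing (d = 1)
print("ADF p-value after differencing:", adfuller(z)[1])

# Inverse-distance spatial weight matrix for assumed station coordinates.
coords = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W = np.where(dist > 0, 1.0 / np.where(dist > 0, dist, 1.0), 0.0)  # w_ii = 0
W = W / W.sum(axis=1, keepdims=True)          # row-normalize: sum_j w_ij = 1
print(W)
```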
Figure 7 shows the process of obtaining the value of $e_t$ for the GSTARIMA model using Equation (4). The error vector, symbolized by $e_t$, was assumed to be independently and identically distributed (iid) following the multivariate normal distribution $N_M(0, \sigma^2 I_M)$, while the matrix element $W^{(l)}$ denoted the spatial weight at lag l. The values $\hat{\varepsilon}^*_t$ could be obtained for N locations and Z variables, generating the GSTARIMA residual.
Following residual modeling, the forecasting process employed a DNN. The design of the DNN architecture significantly influenced the performance of the model analysis. Configuring the hidden layers, the number of neurons, and updating the weights through backpropagation greatly affected the MSE and MAPE values. The residual of the GSTARIMA model, representing nonlinear data, served as the input for the DNN. The residual input comprised a multivariate vector of climate variables. The process utilized the DNN algorithm with a multilayer architecture, such as an MLP or a CNN. The value $\hat{Z}_{i,t}$ represented a combination of the GSTARIMA result $\hat{Z}^*_{i,t-1}$ and the DNN calculation of the nonlinear residual value $\hat{\varepsilon}^*_{i,t}$. Selecting an appropriate activation function for DNN regression modeling greatly aided in fine-tuning the integration model. The GSTARIMA-DNN integration model enabled short-term forecasting for the next year and long-term forecasting for 5 to 10 years. The interpretation of the integration model was visualized using geospatial thematic maps, such as choropleth, heat [91], or dot density maps, generating evaluative insights for climate forecasting. The end of the modeling process was the interpretation of the forecasts, which generates knowledge for the short and long term.
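The combination step can be sketched as follows (a minimal illustration of the idea that the integrated forecast adds a DNN estimate of the nonlinear residual to the GSTARIMA prediction; the synthetic residual series, lag-feature construction, and MLPRegressor settings are assumptions rather than the papers' specification):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

# Assumed inputs: GSTARIMA in-sample predictions and their residuals
# for one location (length T), e.g. eps[t] = z[t] - z_hat_gstarima[t].
T = 300
z_hat_gstarima = np.sin(np.linspace(0, 20, T))
eps = 0.3 * np.tanh(rng.normal(size=T)).cumsum() / np.arange(1, T + 1)

# Build lagged residual features: predict eps[t] from eps[t-3..t-1].
lags = 3
X = np.column_stack([eps[i:T - lags + i] for i in range(lags)])
y = eps[lags:]

# Multilayer DNN on the nonlinear residuals (two hidden layers).
mlp = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
mlp.fit(X, y)
eps_hat = mlp.predict(X)

# Integrated forecast: GSTARIMA component plus DNN residual component.
z_hat_integrated = z_hat_gstarima[lags:] + eps_hat
print(z_hat_integrated[-3:])
```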
Furthermore, the integration model obtained was applied to climate data with the abovementioned eight parameters. The hypothesis was based on each residual climate variable obtained from the GSTARIMA results. Nonlinear data were processed using the MLP or CNN algorithms separately. The results were returned to the initial GSTARIMA model and compared against the model without a DNN to obtain the Mean Absolute Percentage Error (MAPE) of the two models. The main element of this integration model is the residual, which is assumed to be independent and identically distributed (iid) and normal with constant variance. Although the residuals obtained from the GSTARIMA model are not linear, Deep Neural Networks can handle non-constant (nonlinear) data. If the residual is not linear, it is assumed that the DNN algorithm (we use the MLP and CNN) stabilizes the residual due to the involvement of the Moving Average (MA) element, with further validation using MAPE values.

Conclusions
This paper proposed a systematic review integrating the spatio-temporal (GSTARIMA) model and DNNs for forecasting climate datasets. The review shows that hybrid and traditional models utilizing spatio-temporal and DL techniques have achieved high accuracy in performance analysis. Incorporating spatially sourced data from satellites and temporal data facilitates the development of intricate models for representing climate and environmental phenomena across multiple regions of the world. Notably, DNNs, representing ML, have shown promising outcomes and can provide reliable climate forecasting using multivariate time-series data. Through this research, the proposal is to integrate the GSTARIMA model with a DNN, capitalizing on the respective strengths of each model. GSTARIMA reliably handles nonstationary data characterized by location dependence and seasonal patterns, while its complexity is complemented by a DNN, which adeptly captures nonlinear data patterns from the GSTARIMA residuals. This integration is achieved by utilizing algorithms such as the Multilayer Perceptron or CNN, which encompass multiple hidden layers and diverse activation functions. As a part of ML, spatio-temporal DNNs are anticipated to assume a critical role in the future of statistical and mathematical climate forecasting.

Figure 1. Cumulative research topics regarding spatio-temporal, neural network, and machine learning models on climate data from 2013 to 2022.


Figure 2. Systematic literature review selection used in this research.
Figure 3 shows the data analysis findings, divided into 4 clusters comprising 25 items. The search keywords related to DL had 11 occurrences (cases) in 2021.

Figure 4. Architecture of the multilayer perceptron [53]. The weight $w_{ji}^{(l)}$ indicates that the first index j is a neuron in layer l, while the second index i denotes the output neuron in the previous layer l − 1. The first layer l = 0 corresponds to the input values $y_i^{(0)} = x_i$. The weight $w_{j0}^{(l)}$ captures the bias value, ensuring a constant bias input $y_0^{(l-1)} = 1$.

Figure 6. Overview of the Data Analytics Lifecycle methodology of the GSTARIMA-DNN model for climate dataset forecasting. The standard selection of the architecture (Multilayer Perceptron or CNN) determined accuracy and error values. The training and testing process of the model optimized accuracy and minimized errors by updating the GSTARIMA-DNN model integration architecture, including the hidden layers and the number of neurons, and adjusting the activation function used to determine the weights for the next layer. This ensured that the obtained results aligned with the initial target. The stage of communicating the results involved the interpretation of the forecasting results as short- and long-term knowledge.

Figure 7. Framework diagram of the GSTARIMA-DNN model integration process for forecasting.


Table 2. Selected publications of spatio-temporal and DNN models based on the ML approach for climate and environmental datasets.