A Review of Neural Networks for Air Temperature Forecasting

Abstract: The accurate forecast of air temperature plays an important role in water resources management, land–atmosphere interaction, and agriculture. However, it is difficult to accurately predict air temperature due to its non-linear and chaotic nature. Several deep learning techniques have been proposed over the last few decades to forecast air temperature. This study provides a comprehensive review of artificial neural network (ANN)-based approaches (such as recurrent neural network (RNN), long short-term memory (LSTM), etc.), which were used to forecast air temperature. The focus is on the works during 2005–2020. The review shows that the neural network models can be employed as promising tools to forecast air temperature. Although the ANN-based approaches have been utilized widely to predict air temperature due to their fast computing speed and ability to deal with complex problems, no consensus yet exists on the best existing method. Additionally, it is found that the ANN methods are mainly viable for short-term air temperature forecasting. Finally, some future directions and recommendations are presented.


Introduction
Global warming has recently drawn scientists' attention since it is correlated with the rise in air temperature. Increasing air temperature leads to changes in climatic conditions, such as sea-level rise and more frequent extreme events, ultimately negatively impacting human lives [1]. Air temperature is a state variable of the atmosphere and affects atmospheric and land surface processes [2–4]. Forecasting air temperature is an important part of weather prediction because it is used to protect human lives and properties. People may suffer potential health problems when the air temperature is not in a suitable range [5,6]. Extreme changes in air temperature may cause damage to plants and animals. The accurate forecast of air temperature is essential due to its significant effect on various sectors, such as industry, energy, and agriculture [7,8]. Reliable air temperature predictions increase the accuracy of energy consumption forecasts [9]. Air temperature is also one of the key factors in predicting other hydrometeorological variables, such as streamflow [10], evapotranspiration [11], and solar radiation [12]. Therefore, finding an appropriate approach for the prediction of air temperature is vital and may help mitigate the consequences of global warming and climate change. Furthermore, the accurate prediction of air temperature plays an important role in establishing a plan for human activities, energy policy, and business development [13].
Recently, models based on artificial neural networks (ANNs) have attracted scientists' attention in various disciplines, such as meteorology, water resources, and hydrology, because of their capability in capturing nonlinear relationships between inputs and outputs. Various ANN-based approaches have performed successfully in many hydrologic problems, such as flood [14], rainfall [15], water quality [16], and air temperature [17] predictions. Inspired by biological nervous systems, ANNs are powerful tools for modeling nonlinear relations between dependent and independent variables. Generalization is one of the capabilities of ANNs, allowing them to predict patterns that were not provided to them during training. As a result, ANN forecasting models can often outperform physical and statistical approaches. They are also readily available as toolboxes in commonly used programming environments (e.g., MATLAB and Python).
Different types of ANNs (e.g., multi-layer perceptron (MLP), recurrent neural network (RNN), long short-term memory (LSTM), convolutional neural network (CNN), etc.) have been utilized to forecast air temperature [18]. Each type has its unique structure to learn the air temperature patterns and forecast them. However, accurate air temperature forecasting has remained a major challenge (especially when the forecast time horizon increases) for many decades due to the chaotic and complex nature of air temperature data.
This paper provides a review of neural network (NN) models for air temperature forecasting. We focused on the recent studies during the last 15 years. This review paper also identifies new research problems arising from the published literature. To the best of our knowledge, this is the first review paper on the application of neural network-based techniques in predicting air temperature. In total, 26 studies that used different kinds of neural networks, such as MLP, generalized feed forward neural network (GFFNN), modular neural network (MNN), RNN, and LSTM, to predict air temperature are discussed. The review of neural network methodologies and their performance will encourage researchers to utilize these techniques to forecast air temperature.

ANN Inputs
This work focuses on the widely used neural network approaches (e.g., MLP, RNN, and LSTM) in air temperature prediction. Different studies have used various input variables, as these can significantly impact the performance of models. In a number of studies (e.g., Chattopadhyay et al. [19], Ustaoglu et al. [20]), air temperature was predicted based on the historical air temperature data by accounting for time lags (the so-called univariate model). Another common approach is to use other relevant climatic variables (e.g., rainfall, air humidity, wind speed, air pressure, etc.) as inputs to forecast air temperature (the so-called multivariate model) [21,22]. Therefore, the ANN models can be categorized into two groups: the first group uses only the historical air temperature measurements as inputs, and the second group employs air temperature and other relevant hydrologic variables.
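The univariate setup described above can be illustrated with a minimal sketch that turns a temperature series into lagged input/target pairs. The function name `make_lagged_samples` and its parameters are hypothetical and are not taken from any of the reviewed studies.

```python
def make_lagged_samples(series, n_lags, horizon=1):
    """Build (inputs, target) pairs for a univariate forecasting model.

    Each input holds the n_lags most recent values up to time t, and the
    target is the value observed `horizon` steps after t.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive integers")
    inputs, targets = [], []
    # t is the index of the most recent observation available to the model.
    for t in range(n_lags - 1, len(series) - horizon):
        inputs.append(list(series[t - n_lags + 1 : t + 1]))
        targets.append(series[t + horizon])
    return inputs, targets


# Illustrative values only (not data from any reviewed study).
daily_temps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_lagged_samples(daily_temps, n_lags=3, horizon=1)
```

A multivariate model would extend each input row with lagged values of the additional variables (e.g., humidity or wind speed) in the same way.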

Multilayer Perceptron (MLP)
The MLP is a feed-forward ANN, which has been used widely for air temperature prediction [19,26]. The MLP is composed of an input layer, one or more hidden layers, and an output layer [27]. The basic processing elements of the MLP are neurons (or nodes), which are interconnected by adaptable weights (Figure 1). Each neuron receives input signals from the outputs of other neurons. The output of a neuron is a function of the weighted input, bias, and activation function [28]:
$$y = f\left(\sum_{i} w_i x_i + b\right)$$
where $y$ is the output of the neuron, $x_i$ is the ith input to the neuron, $w_i$ is the connection weight of the ith input, $b$ is the bias, and $f$ is the activation function.
During the training process, all weights and biases are adjusted by a learning algorithm to minimize the forecasting error of the network. Then, the validation process is employed to evaluate the performance of the neural network [17].
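The neuron computation described above (a weighted sum of the inputs plus a bias, passed through an activation function) can be sketched as follows. The hyperbolic tangent default and the function names are assumptions made for illustration only.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Output of one neuron: activation(sum_i w_i * x_i + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def dense_layer(inputs, weight_rows, biases, activation=math.tanh):
    """Outputs of a fully connected layer: one neuron per weight row."""
    return [
        neuron_output(inputs, row, b, activation)
        for row, b in zip(weight_rows, biases)
    ]
```

Stacking `dense_layer` calls, with the output of one layer used as the input of the next, gives the forward pass of an MLP. Training (adjusting the weights and biases) is not shown.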

Recurrent Neural Network (RNN)
RNN is a class of ANNs developed for processing sequential data [29]. Unlike the traditional ANNs, RNN has recurrent layers in which neurons are connected (Figure 2). Hence, information from a neuron is transferred to the neurons in the same and next layers. As seen in Figure 2, RNN also has a hidden state to recall some sequence data. RNN computes new states by applying its activation functions to prior states and new inputs recursively. The hidden state value ($h_t$) at a time step t can be obtained via:
$$h_t = f\left(w_x x_t + u_h h_{t-1} + b\right)$$
where $x_t$, $h_{t-1}$, $w_x$, and $u_h$ are the input at time t, the hidden state of the previous step (t − 1), the weight for the input, and the weight for the previous state value, respectively. Additionally, $b$ is the bias and $f$ is the activation function applied to the hidden state of the current time.
RNN is convenient for processing time series as it is able to model the temporal dynamics in the sequence of data through feedback connections, which transmit information from the previous input to the next one. However, a shallow or simple RNN often encounters the vanishing gradient problem [30]. Therefore, it cannot model long-term temporal patterns, which weakens the network. In recent years, the vanishing gradient problem in RNN has been largely mitigated by the long short-term memory (LSTM) neural network, at a greater computational cost.
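The recursive state update described above can be sketched for the scalar case (one input and one hidden unit). The tanh activation, the zero initial state, and the function names are assumptions for illustration.

```python
import math


def rnn_step(x_t, h_prev, w_x, u_h, b, activation=math.tanh):
    """One recurrent update: h_t = f(w_x * x_t + u_h * h_prev + b)."""
    return activation(w_x * x_t + u_h * h_prev + b)


def run_rnn(sequence, w_x, u_h, b, h0=0.0):
    """Apply rnn_step along a sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, u_h, b)
        states.append(h)
    return states
```

Because each state depends on the previous one, the influence of early inputs on later states passes through repeated multiplications by `u_h` and the activation derivative, which is the mechanism behind the vanishing gradient problem mentioned above.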

Long Short-Term Memory (LSTM)
LSTM was first presented by Hochreiter [31]. LSTM is a class of RNN, which was developed for learning long-term dependencies. Each neuron in LSTM is a memory cell, which includes three gates (input gate, forget gate, and output gate) to control the flow of information between different time steps (Figure 3). Unlike conventional ANNs, the LSTM cells generate two separate values by a series of activations and operations. One value is the cell state ($c_t$), which carries information and stores memory in the long term, and the other is the output of the hidden layer ($s_t$). In a conventional RNN, as the input sequence grows longer, the gradients with respect to the earliest inputs vanish and approach zero. The LSTM can alleviate this problem by using internal gates that can add, edit, or remove information in the cell. The readers are referred to Tran et al. [32] for a detailed description of LSTM.
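As a rough illustration of how the three gates update the cell state and the hidden output, the following sketch implements one step of a scalar LSTM cell using the commonly used gate equations. The parameter layout (a dictionary of `(input weight, recurrent weight, bias)` triples) and all names are hypothetical. The formulation in Tran et al. [32] may differ in detail.

```python
import math


def sigmoid(z):
    """Logistic function, used for the gates (values between 0 and 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, s_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps "forget", "input", "output", and "candidate" to
    (input weight, recurrent weight, bias) triples.
    Returns the new hidden output s_t and the new cell state c_t.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x_t + u * s_prev + b

    forget_gate = sigmoid(pre_activation("forget"))   # how much old memory to keep
    input_gate = sigmoid(pre_activation("input"))     # how much new information to add
    output_gate = sigmoid(pre_activation("output"))   # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))

    c_t = forget_gate * c_prev + input_gate * candidate
    s_t = output_gate * math.tanh(c_t)
    return s_t, c_t
```

The additive update of `c_t` is what lets information persist over many time steps when the forget gate stays close to one.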

Related Work
Herein, we provide a summary of studies that adopted neural network models to forecast air temperature from a few minutes to several months ahead (see Table 1). The focus is on reviewing the papers published during the last 15 years (2005-2020). The reviewed studies are categorized based on their inputs into univariate and multivariate models.

Univariate Models
Ustaoglu et al. [20] employed three distinct ANNs, namely feed-forward back propagation (FFBP), radial basis function (RBF), and generalized regression neural network (GRNN), to forecast daily mean, maximum, and minimum air temperature in Turkey. The models used daily air temperature measurements of the previous seven days to forecast 1-day-ahead air temperature. Using the coefficient of determination (R²), root mean square error (RMSE), and index of agreement (IA) statistical metrics, they showed that all the utilized neural network methods produced satisfactory results. Additionally, air temperature predictions from the ANN models were compared to those of the multiple linear regression (MLR) approach. The ANN methods were found to be slightly superior to the MLR models.
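For reference, the three evaluation metrics named above can be computed as in the sketch below. It assumes that R² is computed as one minus the ratio of residual to total sum of squares and that IA follows Willmott's index of agreement. Individual studies may define these metrics slightly differently (e.g., R² as the squared correlation coefficient).

```python
import math


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (1 indicates perfect agreement)."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pred))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - numerator / denominator
```

RMSE is expressed in the units of the forecast variable (here °C), whereas R² and IA are dimensionless.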
Chattopadhyay et al. [19] applied three types of ANNs (multilayer perceptron (MLP), generalized feed forward neural network (GFFNN), and modular neural network (MNN)) to predict monthly maximum air temperature across the northeast of India. The periodicity of 12 months was found in the monthly maximum air temperature time series, and therefore a multiplicative model was used to deseasonalize the data. Additionally, the increasing trend in time series was identified by both the Mann-Kendall non-parametric and parametric tests. A trend equation was fitted to remove the trend from the deseasonalized time series. Consequently, the monthly maximum air temperature time series was found to be stationary. This allowed the networks to perform more efficiently. In their study, maximum air temperature values in a number of previous months (ranging from 2 to 4) were used as inputs to the neural networks. It was found that the MNN model using air temperature measurements in the previous four months performed better than MLP and GFFNN.
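To make the deseasonalization step concrete, the sketch below applies a simple multiplicative seasonal adjustment with a 12-month period. It assumes that each seasonal index is the ratio of a calendar month's mean to the overall mean and that all values are positive. The exact procedure used by Chattopadhyay et al. [19] is not given here and may differ.

```python
def seasonal_indices(series, period=12):
    """Ratio of each phase's mean (e.g., each calendar month) to the overall mean."""
    if len(series) < period:
        raise ValueError("series must cover at least one full period")
    overall_mean = sum(series) / len(series)
    indices = []
    for phase in range(period):
        values = series[phase::period]  # all observations sharing this phase
        indices.append((sum(values) / len(values)) / overall_mean)
    return indices


def deseasonalize(series, period=12):
    """Divide each value by the seasonal index of its phase (multiplicative model)."""
    indices = seasonal_indices(series, period)
    return [value / indices[t % period] for t, value in enumerate(series)]
```

A trend fitted to the deseasonalized series can then be subtracted to obtain an approximately stationary series, as described above.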
Abhishek et al. [33] investigated the feasibility of the feed-forward neural network (FFNN) for predicting daily maximum air temperature in Canada from 1999-2009. The input data consisted of daily maximum air temperature measurements in the past 10 years. Different transfer functions, number of hidden layers, and neurons were tested to evaluate the performance of neural networks. Finally, the results showed that the ANN with 5 hidden layers, 10 neurons per layer, and a tan-sigmoid transfer function generated the best maximum air temperature predictions.
Kumar et al. [34] used FFNN to forecast weekly mean air temperatures in India. Air temperature data in the previous six weeks were used in various ANN architectures to predict 1-week-ahead air temperature. The predictive ability of different configurations was assessed by computing the R² and RMSE metrics. Finally, a two-hidden-layer model with five neurons in each layer was found to produce the best results.
Optimizing hyperparameters of ANNs improves their ability to forecast hydrologic variables [49,50]. Tran et al. [32] employed a genetic algorithm (GA) to optimize hyperparameters of conventional multilayer ANN, RNN, and LSTM models. The hybrid models were used to forecast maximum air temperature at the Cheongju station in South Korea. Air temperature observations in the last seven days were used as inputs to forecast 1- to 15-day-ahead maximum air temperature. The results showed that the hybrid GA-LSTM had a better performance than the other models for long-term air temperature forecasting.
In another effort, Tran and Lee [35] applied the traditional multilayer ANN models to predict 1-day-ahead maximum air temperature at 55 stations in South Korea. They tried various numbers of parameters (i.e., the total number of weights and biases) by using different numbers of neurons and hidden layers. It was found that the ANN model with 5 hidden layers and a total of 49 weights and biases generated the smallest error at 52 stations in South Korea.
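The total number of weights and biases of a fully connected feed-forward network follows directly from its layer sizes. The sketch below shows the count. The function name and the example layer sizes are illustrative and are not the configurations tested by Tran and Lee [35].

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs=1):
    """Total weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # weights between consecutive layers
        total += fan_out           # one bias per neuron in the receiving layer
    return total


# Hypothetical example: 7 lagged inputs, one hidden layer of 5 neurons, 1 output.
n_params = count_parameters(7, [5], 1)
```

Holding this total fixed while varying the depth is one way to compare shallow and deep architectures of similar capacity.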
Other studies used more complex deep learning architectures for air temperature forecasting. For example, Zhang et al. [23] forecasted daily average air temperature for 4 days ahead by a convolutional recurrent neural network (CRNN), which combined convolutional neural networks (CNNs) with recurrent neural networks (RNNs). They utilized daily air temperature data over China from 1952 to 2018 to train the CRNN. The results demonstrated that their model could predict air temperature successfully based on the previous air temperature data.
Li et al. [36] employed a stacked long short-term memory network (stacked LSTM) to predict half-hourly air temperature from its historical observations. The proposed LSTM model had three hidden layers with 20, 10, and 4 memory cells, respectively. The fully connected layer and output layer had four neurons and one neuron, respectively. Finally, the LSTM model was compared with the deep neural network (DNN) and random forest (RF) approaches under different sliding windows. It was observed that the stacked LSTM network was superior to the DNN and RF methods.
Afzali et al. [37] developed two different types of neural networks (namely, FFNN and Elman neural network) to predict 1-day-ahead mean, minimum, and maximum air temperature in the city of Kerman (Iran) from the corresponding values in the last 15 days. The results showed that both neural networks provided satisfactory air temperature predictions. Additionally, the Elman network generated better forecasts.
De and Debnath [38] employed the FFNN model to forecast the air temperature of the monsoon months (June, July, and August) in India for 1901-2003. In their study, the monthly mean air temperatures in December, January, February, March, April, and May were used as inputs.

Multivariate Models
Smith et al. [22] used ANN models to forecast hourly air temperature for 1-12 steps ahead. The inputs consisted of air temperature, relative humidity, wind speed, solar radiation, and rainfall measurements in the previous 24 h. The data from 2001 to 2005 in the southern and central regions of Georgia were used to train and test the networks. The models used a linear input layer, and three equally sized parallel "slabs" using the Gaussian, Gaussian complement, and hyperbolic tangent activation functions in the hidden layer. The number of hidden nodes varied from 2 to 75 nodes. The results showed that the model with 40 nodes in the hidden layer produced the most accurate predictions.
Smith et al. [39] forecasted air temperature for 1-12 h ahead by the Ward-style ANN model. Hourly air temperature, wind speed, relative humidity, solar radiation, and rainfall observations as well as their hourly rate of change in the last 24 h were used as inputs. The data were recorded by the Georgia Automated Environmental Monitoring Network (AEMN) during 1997-2005. The temperature prediction models had a single hidden layer with 120 nodes that were distributed equally among the three slabs. The MAE of the evaluation set (2004-2005) ranged from 0.516 °C for the 1-h horizon to 1.873 °C for the 12-h horizon prediction. Additionally, two ensemble techniques (parallel and series aggregations) were investigated and found to be infeasible for air temperature prediction.
Altan Dombayci and Gölcü [17] employed the MLP neural network with Levenberg-Marquardt (LM) feed-forward backpropagation algorithms to predict daily mean air temperature in Turkey for one day ahead. The model was trained and tested by the data in 2003-2005 and 2006, respectively. The inputs of the network were the month of the year, the day of the month, and mean temperature of the previous day. The number of hidden neurons was varied from 3 to 30, and the network with 6 hidden neurons produced the best result.
Many studies utilized deep learning networks and geographical information to predict air temperature. Bilgili and Sahin [28] used three geographical variables (latitude, longitude, and altitude) and the number of months (1, 2, ..., 12) as the inputs of the ANN model to predict monthly air temperature and rainfall in Turkey. The data from 76 weather stations between 1975 and 2006 were used to train and test the model. They showed that the ANN approach can predict monthly temperature and rainfall fairly well using the geographical variables and number of months.
Kisi and Shiri [40] used the number of months (1-12) and geographical information (latitude, longitude, and altitude) in ANN and the adaptive neuro-fuzzy inference system (ANFIS) to predict monthly average air temperature at 30 sites in Iran. The performance of the two models was compared using the RMSE, MAE, and coefficient of determination (R²) metrics. The results showed that the performance of ANN was better than that of ANFIS at most stations.
Kisi and Sanikhani [41] fed the geographical variables (latitude, longitude, and altitude) along with the month of the year (1-12) into the feed-forward network (FFN), ANFIS, support vector regression (SVR), and gene expression programming (GEP) models to predict monthly mean air temperatures at 50 stations in Iran. The data of 30, 10, and 10 stations were selected for training, validating, and testing the models, respectively. SVR had the best performance, followed by ANFIS and FFN.
Sahin [42] fed the urban heat island (UHI) effect, number of months (1-12), altitude, latitude, longitude, and monthly mean land surface temperatures of 20 cities in Turkey into a three-layer FFN to predict monthly mean air temperature. The monthly data from 1995 to 2004 were used to train the FFN model, while the data of 2005 were used to test it. In this study, the number of hidden neurons was increased from 1 to 50 to find the optimal neural network. In the test period, the RMSE of monthly mean air temperature predictions at the 20 investigated cities ranged from 0.705 to 2.600 K.
Salcedo-Sanz et al. [26] compared the performance of SVR and MLP for predicting monthly mean air temperature at 10 sites in Australia and New Zealand. Air temperature from the previous month, two dummy variables $d_1 = \sin(2\pi n/12)$ and $d_2 = \cos(2\pi n/12)$ (where n = 0, 1, ..., 11 depending on the month of the year), Southern Oscillation Index (SOI), Indian Ocean Dipole (IOD), and Pacific Decadal Oscillation (PDO) were used as inputs [51,52]. The results showed that SVR was able to provide more accurate predictions than MLP.
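The two periodic dummy variables described above encode the month of the year on a circle, so that December and January are treated as neighbors. A minimal sketch follows. The function name is hypothetical.

```python
import math


def month_cyclic_features(n, period=12):
    """Return (sin(2*pi*n/period), cos(2*pi*n/period)) for month index n = 0..period-1."""
    angle = 2.0 * math.pi * n / period
    return math.sin(angle), math.cos(angle)


# Encoded values for all twelve months (n = 0 for the first month of the year).
encoded_months = [month_cyclic_features(n) for n in range(12)]
```

In contrast to feeding the month number (1-12) directly, this encoding avoids the artificial jump between month 12 and month 1.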
Akram and El [43] applied a deep LSTM network to forecast air temperature, humidity, and wind speed for 24 (or 72) h ahead in 9 cities of Morocco using the 24 (or 72) previous hourly values of air temperature, humidity, and wind speed as inputs. The model had a fully connected hidden layer (with 100 neurons) between two LSTM layers. The results showed that the proposed LSTM model could predict weather variables with high accuracy.
Jallal et al. [44] used an MLP model to predict air temperature in Morocco for 30 min ahead from the three previous half-hourly air temperature and global solar radiation measurements. They changed the number of hidden layers (from 1 to 5) and neurons (from 1 to 15) as well as activation functions (radial basis activation function, logistic sigmoid function, and hyperbolic tangent function) to find the best configuration. It was found that a two-hidden-layer network that used the hyperbolic tangent function with 5 and 8 hidden nodes in the first and second layers, respectively, generated the best predictions with an MSE of 0.272 °C and an R² of 0.997.
Park et al. [45] applied an LSTM model to forecast air temperature at three locations in South Korea. Wind speed, air temperature, and humidity were employed as inputs. The LSTM model with four layers could predict air temperature accurately for both short (6, 12, and 24 h ahead) and long (14 days in advance) periods. The results showed that the LSTM approach outperformed the deep neural network (DNN).
Huang et al. [46] utilized the RNN model to forecast daily maximum and minimum air temperature at 14 sites in Guangxi, China. Based on the climatology and persistence (CLIPER) method [53], the average, maximum, and minimum air temperature, and precipitation in the previous days, as well as a total of 50 CLIPER predictors were selected for temperature prediction. The performance of the RNN model was compared with the stepwise regression method. It was found that the accuracy of RNN was higher than that of the stepwise regression method.
Sundaram et al. [47] compared the performance of three machine learning models, namely support vector machine (SVM), MLP, and RNN, for daily air temperature prediction. Different meteorological variables, such as air temperature, atmospheric pressure, relative humidity, wind direction, total cloud cover, horizontal visibility, and dew point temperature, were used as inputs to these models. The RMSE of air temperature forecasts from RNN was 1.41 °C, which was lower than the RMSEs of 3.1 °C and 6.67 °C from MLP and SVM, respectively.
Roy [48] explored three deep neural networks namely, MLP, LSTM, and hybrid CNN-LSTM, to forecast the air temperature for 1-10 days ahead. The past seven days of wind speed, precipitation, snow depth, and mean, maximum, and minimum temperature were used as inputs. The results indicated that the hybrid CNN-LSTM model outperformed the other models.
Kreuzer et al. [24] used the convolutional long short-term memory (ConvLSTM) method to forecast air temperature up to 24 h in advance at five weather stations in Germany during 2009-2018. They compared the performance of ConvLSTM with those of the seasonal autoregressive integrated moving average (SARIMA), seasonal naive approach, and univariate and multivariate LSTMs. Hourly air temperature, relative humidity, cloud coverage, precipitation, wind speed and direction, month of year, hour of day, sea-level air pressure, and the difference between the air pressure at the station and at sea level were used as inputs in the multivariate LSTM and ConvLSTM. They showed that the seasonal naive approach had the worst performance for most of the prediction horizons. While the SARIMA and univariate LSTM networks performed well for the first two- to three-hour forecasts, the ConvLSTM and multivariate LSTM showed a better performance for longer forecast horizons. At the stations with large variations of air temperature during the day, ConvLSTM outperformed the other methods.
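The seasonal naive approach used as a baseline above can be sketched in a few lines: each forecast simply repeats the value observed one season earlier. A season length of 24 (one day of hourly data) is assumed here for illustration. The function name is hypothetical.

```python
def seasonal_naive_forecast(history, horizon, season_length=24):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    # Step h (0-based) repeats the value at the same phase of the last season.
    return [last_season[h % season_length] for h in range(horizon)]
```

Such a baseline is inexpensive to compute and gives a lower bound that any trained network should be expected to beat.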
Lee et al. [25] employed three neural network models (namely, MLP, LSTM, and CNN) to forecast the average, minimum, and maximum air temperatures for the next day in three regions of South Korea. As inputs, they tried both hourly and daily values from the previous 30 days of air temperature, precipitation, humidity, vapor pressure, dew point temperature, atmospheric pressure, sea-level pressure, hours of sunshine, solar radiation, cloud cover, ground surface temperature, and wind speed and direction. Hourly input data provided better information for daily air temperature forecasting than daily input data. Overall, the CNN with hourly input data showed better performance than the MLP and LSTM.

Discussion
This study reviewed the recent (2005-2020) articles that utilized ANN methodologies to forecast air temperature. For this purpose, 26 publications were chosen, categorized according to their input variables, and finally discussed. As described in Section 4, neural network approaches have been applied extensively in the context of air temperature forecasting. The summary of the reviewed papers is provided in Table 1. As can be seen, different types of neural network approaches, such as MLP, FFBP, GRNN, RBF, CRNN, RNN, and LSTM, were used for forecasting air temperature. Some studies in Table 1 also compared the performance of neural network techniques with those of other machine learning methods, such as SVM, GEP, and RF [36,41]. They stated that the ANN approaches often provide more accurate air temperature forecasts. Additionally, only a few studies used deep learning methods, such as RNN and LSTM, although they are highly promising.
A variety of meteorological and geographical variables have been used as inputs in the neural network approaches. They include air temperature, wind speed and direction, air pressure, precipitation, solar radiation, relative humidity, cloudiness, latitude, longitude, and altitude [24,25,28]. Among them, air temperature, relative humidity, precipitation, and wind speed are found to be the common inputs for air temperature predictions. While various meteorological variables have been fed into different types of NN approaches as inputs, the geographical inputs (i.e., latitude, longitude, altitude) have been used only in simple NN techniques (e.g., MLP and FFNN) rather than complex ones (e.g., RNN and LSTM). However, it should be noted that choosing the best input variables for a particular NN approach is difficult due to the complexity of the problem and limited number of studies.
Moreover, it is found that neural network methods are mainly applied to short-term air temperature forecasting. Only a few studies were dedicated to the medium- and long-term forecasting of air temperature, and these mainly utilized the RNN and LSTM models due to their capabilities in capturing the temporal trends of air temperature time series [32]. RNN and LSTM are known as efficient methods for long-term forecasting of hydrologic variables [54,55]. However, only eight studies forecasted air temperature via RNN and LSTM. The accuracy of these models varies mainly with the input variables and network structure. Using ancillary data (e.g., rainfall, air pressure, and humidity) in the deep learning methods improves air temperature predictions.
The literature shows that the performance of NN models is dependent on the network configuration, such as the number of hidden neurons and layers [21,22,45]. Since there is no rule for choosing the optimum number of hidden neurons and layers to avoid underfitting and overfitting of the network, they were mostly determined by trial and error [20,44]. These optimal numbers could be found by search algorithms, such as GA [32]. In general, increasing the number of hidden layers and neurons allows the neural networks to learn complicated processes more robustly, ultimately enhancing their forecasting abilities. However, a number of studies showed that adding hidden layers and neurons did not always increase the accuracy of the network [21,44]. Based on the literature, it is still difficult to pick the best methodology for air temperature forecasting. As can be seen in Table 1, only a few studies take advantage of optimization techniques, such as GA, to tune the hyperparameters of neural networks for a more accurate air temperature prediction. Hybrid models can improve the accuracy of air temperature predictions [56]. However, coupling the ANN models with optimization algorithms and developing hybrid approaches have not yet been studied sufficiently. Therefore, the effectiveness of these methods in predicting hydrologic variables should be investigated thoroughly, and air temperature forecasting in particular could benefit from them.

Conclusions and Future Research Work
In this paper, we conducted a comprehensive review of studies that forecasted air temperature via neural networks. The review showed that air temperature could be forecasted successfully by various types of artificial neural networks (ANNs).
According to the reviewed studies, MLP and, to a lesser extent, RBF, GRNN, and Ward-style ANN models were used to predict air temperature. It is noteworthy that the selection of input variables highly affects the robustness of ANNs. The historical air temperature and other micrometeorological variables were used as inputs in ANNs. Additionally, the number of hidden neurons plays an important role in the accuracy of predictions. Selection of the number of hidden neurons is mostly performed by trial and error.
Overall, the neural network models have been shown to be promising and can provide reliable air temperature forecasts. It is anticipated that neural networks play an important role in the future of air temperature prediction. The information presented in this review paper helps us understand the current state of air temperature predictions.
The following directions can be considered for future works:
• The combination of neural networks with many optimization algorithms (e.g., particle swarm optimization (PSO), harmony search, genetic programming, etc.) has not been applied to air temperature forecasting. The meta-learning approaches can be utilized in the future to forecast air temperature more accurately. They can be combined with neural network models to strengthen the model robustness since the heuristic algorithm can optimize the hyperparameters of ANNs.
• The effect of relevant meteorological (e.g., maximum, minimum, and mean temperature, rainfall, and relative humidity) and geographical (e.g., latitude, longitude, and elevation) variables should be analyzed to improve the accuracy of air temperature prediction. Thus, feature selection techniques, such as recursive feature elimination, random forest, and correlation coefficient, should be employed to select the useful input variables for air temperature forecasting.
• The performance of ANN-based models should be compared with that of other approaches, such as support vector machines (SVMs), the autoregressive moving average (ARMA) model, and extreme learning machines, to determine the best approach to forecast air temperature over different hydrologic conditions and time horizons.
• The long-term air temperature prediction has an important role in human lives and other sectors, such as energy consumption and agriculture. Hence, it should be investigated more deeply in future studies via the RNN and LSTM models. Their performance should also be compared with other medium- or long-range models, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) model and global weather forecast models [57].
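As one simple instance of the correlation-based input selection mentioned in the directions above, the sketch below ranks candidate input variables by the absolute value of their Pearson correlation with the target temperature. The variable names and numbers are made up for illustration, and a linear correlation screen is only one of several possible selection criteria.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rank_inputs_by_correlation(candidates, target):
    """Sort candidate inputs by |r| with the target, strongest first."""
    scores = {name: abs(pearson_r(values, target)) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (not from any reviewed study).
target_temperature = [1.0, 2.0, 3.0, 4.0]
candidate_inputs = {
    "relative_humidity": [4.0, 3.0, 2.0, 1.0],
    "wind_speed": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_inputs_by_correlation(candidate_inputs, target_temperature)
```

Because this screen only captures linear dependence, nonlinear relationships would require other techniques, such as the recursive feature elimination or random forest methods named above.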