Article

Temperature Prediction of Chinese Cities Based on GCN-BiLSTM

1 School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2 Smart Health Big Data Analysis and Location Services Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
3 School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
4 Key Lab. of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210023, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11833; https://doi.org/10.3390/app122211833
Submission received: 19 October 2022 / Revised: 15 November 2022 / Accepted: 18 November 2022 / Published: 21 November 2022

Abstract

Temperature is an important meteorological factor and is affected by both local and surrounding meteorological conditions. To address the significant prediction error and insufficient extraction of spatial features in current temperature prediction research, this research proposes a temperature prediction model based on the Graph Convolutional Network (GCN) and Bidirectional Long Short-Term Memory (BiLSTM) and studies the influence of temperature time-series characteristics, urban spatial location, and other meteorological factors on temperature change in the study area. Multiple meteorological influencing factors and temperature time-series characteristics are used as inputs instead of a single temperature time series, and time-sliding windows are used to extend the time dimension of the input data. Meanwhile, considering the influence of meteorological factors in the surrounding area on the temperature change in the study area, we use GCN to extract the urban geospatial location features. The experimental results demonstrate that our model outperforms other models and has the smallest root mean squared error (RMSE) and mean absolute error (MAE) in the following 14-day and multi-region temperature forecasts. It also achieves higher accuracy than the baseline models in areas with stable temperature fluctuations and small temperature differences.

1. Introduction

Meteorological changes play an essential role in various human activities and are closely related to the healthy lives of human beings and the economic development of society [1]. However, the process of meteorological changes is complex and elusive. Therefore, meteorologists have been exploring accurate and timely meteorological forecasting methods for a long time [2]. Different industries and fields are increasing demands on the accuracy and timeliness of meteorological forecasts [3]. Therefore, it is of great significance to study temperature prediction for human production and life.
Early temperature prediction research mainly relied on meteorological methods [4]. These methods predict the temperature for a future period from meteorological knowledge and experience, with reference to existing weather conditions, and do not analyze meteorological data. Because traditional meteorological methods rely on the subjective judgment of observers, their prediction accuracy is usually low [5]. To predict weather changes more accurately, mathematical statistics methods [4] have been employed to analyze historical weather information. These methods fit a mathematical function to the temperature trend in historical temperature data to obtain a prediction formula. For example, Bogdanovs et al. [6] used the Kalman filter to correct temperature prediction errors, which improved the accuracy of short-term temperature prediction. However, mathematical statistics methods only analyze temperature itself and ignore the impact of other meteorological factors on the results. Temperature is often affected by other meteorological factors, such as precipitation and atmospheric pressure, and these factors are correlated with one another, so multiple factors need to be taken into account when making temperature predictions. Compared to mathematical statistics methods, machine learning-based approaches are able to explore the connections between temperature and other meteorological factors with higher accuracy. For example, Perez-Vega et al. [7] used a support vector machine (SVM) with different kernel functions to predict the temperature with an average mean square error (MSE) of 0.09 °C. However, temperature data are time series data with prominent non-stationary fluctuation characteristics. Machine learning methods are good at exploring connections between different data but do not analyze the time dependence of temperature data.
With the rapid development of deep learning (DL) methods, deep neural networks have received increasing attention because they capture dynamic features well. Among them, prediction methods based on the Long Short-Term Memory (LSTM) model, which learns the temporal features in temperature data very well, have achieved relatively high accuracy [8]. However, LSTM only focuses on past temporal features and cannot completely learn the overall temporal features of a series. Moreover, in addition to temporal dependence, spatial dependence also has a significant impact on temperature prediction. It is not enough to consider the meteorological factors and temporal dependence of the target area because the meteorological conditions in the surrounding area also have an impact on the target area.
To address the above issues, in this research, we consider the relationships between temperature and various meteorological factors, and the spatial-temporal features of meteorological data. We combine Bidirectional Long Short-Term Memory (BiLSTM) [6] containing historical and future time-series characteristics with a Graph Convolutional Network (GCN), which can extract spatial meteorological characteristics to construct a GCN-BiLSTM temperature prediction model.
The main contributions of this research are as follows:
(1) We consider the spatial-temporal features of meteorological data and integrate the GCN-BiLSTM model to propose a novel temperature prediction method that improves the accuracy of temperature prediction from the perspective of temporal and spatial distribution.
(2) The spatial relationship between the monitoring stations is used to learn the features of the meteorological data from the surrounding stations and to explore the correlation between the meteorological factors of the surrounding stations and the temperature of the target stations.
(3) The research results show that the average RMSE of this prediction method is 2.671 °C and the average MAE is 3.024 °C for the future 14-day temperature prediction; for the multi-area temperature prediction, the average RMSE is 1.906 °C, and the average MAE is 2.583 °C. The model’s performance in this study is significantly better than other baseline models, and it has higher accuracy in cities with stable temperature fluctuations and small temperature differences.
The rest of this paper is organized as follows. In Section 2, we systematically introduce related works on modeling approaches for meteorological forecasting. Section 3.1 describes the data source and the data preprocessing workflow. Section 3.2 details our proposed temperature prediction model, GCN-BiLSTM. In Section 4, extensive experiments are designed and conducted to verify the effectiveness of the GCN-BiLSTM model. We describe the experiments of multi-day and multi-regional temperature prediction, including the data description, baseline methods, evaluation metrics, and experimental settings. For further illustration, we present the experimental results of the baseline models and compare their performance using common assessment criteria in Section 4.4. Finally, in Section 5, we summarize and discuss our research work.

2. Related Works

Traditional temperature prediction methods can be classified into two categories: mathematical statistics-based and machine learning-based. Common mathematical statistics methods include the Kalman filter [9] and regression analysis [10]. These methods are simple in principle and fast in calculation, but they only analyze a single variable and are highly dependent on current temperature data, so they do not perform well in long-term temperature predictions. More importantly, they do not consider the influence of other meteorological factors on temperature, and their prediction accuracy is not high. Machine learning-based methods are able to explore the correlation features of each meteorological factor to achieve excellent predictive performance. Common machine learning methods include SVM [11,12], Genetic Algorithms (GA) [13], Artificial Neural Networks (ANN) [14], etc. Radhika et al. [12] used SVM to predict atmospheric temperature and compared the results with a multilayer perceptron (MLP); the results demonstrated that SVM outperformed MLP. Abbot et al. [14] used the ANN model to predict temperature in six local regions, including the Swiss Alps, Canadian Rockies, and Tasmania, with good performance. Venkadesh et al. [13] applied GA to determine the optimal duration and resolution of the input variables and set them as inputs for the ANN model. The improved method improved the MAE of temperature prediction for the next 1, 2, 8, and 12 h compared to the traditional GA method. In these methods, the current inputs are mapped to prediction results through mathematical functions or algorithms, and timing dependence is ignored. Temperature prediction is a typical time series problem that depends strongly on temporal order. Therefore, the prediction accuracy of these methods cannot be further improved when faced with a long-term temperature prediction task.
These traditional methods of temperature prediction are not good at handling large amounts of data.
In recent years, DL methods have been widely used in the prediction of future weather and related quantities, such as PM2.5, wind speed, and rainfall. In particular, LSTM performs better than traditional methods when processing time series data. Feng et al. [15] leveraged the LSTM to analyze the multi-year surface meteorological data of the city and predicted the future temperature with a precision of 86%. Liang et al. [8] used the LSTM method to predict soil temperature. Considering air quality data and meteorological data, Wu et al. [16] proposed an improved LSTM method to predict PM2.5 concentrations using big data. However, the above research only analyzed the data of the target region without considering the influence factors of multiple regions. More and more researchers are focusing on the effects of spatial-temporal dependence in DL models. The meteorological change of a place is affected not only by its own geographical and temporal factors but also by its surroundings. Therefore, it is necessary to improve prediction accuracy by analyzing the observation data of the areas surrounding the region. For example, Qiao et al. [17] used a 3-Dimensional Convolutional Neural Network (3D CNN) to capture the spatial correlations among Sea Surface Temperature (SST) field data composed of multiple observation points in a selected sea area for SST prediction. Jeong et al. [18] used BiLSTM to process time series observations and CNN to process RDASP image data to obtain spatial features, and combined both features to predict the temperature in the next 14 days. In other research fields, Cao et al. [19] proposed an end-to-end model called ITRCN, which converts interactive network traffic into images. The model used CNN to capture the interaction function of traffic and GRU to extract temporal features. The results show that the ITRCN method outperforms the conventional GRU and CNN methods by an improvement of 14.3% and 13.0% in RMSE, respectively.
The above approaches employed CNN to model spatial dependence and made great progress in some prediction tasks. However, CNN is applicable to Euclidean space and is mostly used to process raster data. It has limitations in dealing with geographically distributed networks with complex topology and therefore cannot capture their spatial dependence accurately. Recently, GCN, which was inspired by CNN and can obtain local features from graph structures, has received a lot of attention. For instance, Zhao et al. [20] used GCN to capture the topology of urban road networks to obtain spatial dependence in an urban traffic prediction task. The experiments showed that the prediction has good performance. In this research, we mainly considered the spatial distribution characteristics of the observation stations in terms of geographical location. The GCN network is employed to analyze the geospatial relations of the target area and its multi-order neighbors to capture spatial features between nodes. We summarize the advantages and disadvantages of existing temperature prediction methods, as shown in Table 1.
Considering the temporal features of multidimensional meteorological data and the spatial features of urban areas, this research takes advantage of the multi-meteorological element model by considering various meteorological influences such as temperature, wind speed, and precipitation, and sets up a time-sliding window. Simultaneously, considering the meteorological spatial features of the surrounding sites and learning the relationships of meteorological elements between different regions, we integrate GCN and BiLSTM to build a temperature prediction model combining the bidirectional propagation characteristics of BiLSTM and the spatial data features extraction of GCN. This model will enhance the network capacity and model complexity to improve the accuracy of temperature prediction further.

3. Materials and Methods

3.1. Data Source and Preprocessing

3.1.1. Data Source

The sample data set for this research is the Daily Value data set of China’s Surface Climate Data (V3.0) [21], covering the period between 1 May 2010 and 31 December 2019 and including 2071 primary meteorological stations on the Chinese mainland. It was obtained from the National Meteorological Information Center and the China Meteorological Data Service Center. Some of the original data from Heilongjiang Station (District Station No.: 50136) are shown in Table 2.
The datasets include the station number, location, and other station information of 2071 stations, as well as daily updated values of 17 related meteorological element values. The data elements include station number, latitude, longitude, elevation, date, average temperature, maximum temperature, minimum temperature, average relative humidity, minimum relative humidity, average wind speed, maximum wind speed, sunshine duration, precipitation, small evaporation, large evaporation, average pressure, maximum pressure, minimum pressure, average surface temperature, maximum surface temperature, and minimum surface temperature. The units of the original data are described as follows: temperature (°C), humidity (1%), wind speed (0.1 m/s), sunshine (0.1 h), precipitation (0.1 mm), evaporation (0.1 mm), air pressure (0.1 hPa), latitude (d, m), longitude (d, m), and elevation (0.1 m).

3.1.2. Data Preprocessing

The data of the Chinese Meteorological Stations were obtained from the ground meteorological stations. Usually, the data of ground weather stations are considered reliable, but errors, such as missing data and some data exceeding the actual value, are inevitable. To ensure the accuracy and rationality of the experimental data, it is necessary to preprocess the site’s original meteorological data.
Preprocessing starts with data cleaning: meteorological factors with severely missing values, such as evaporation, are removed; missing data are filled in; and abnormal values are replaced by the average of the 14 days spanning the week before and the week after them. Second, the data are normalized by Min-Max standardization and converted into dimensionless values. Then, based on the gray correlation analysis (GRA) method, attributes with a correlation degree greater than 0.88 are selected. These include average temperature, average humidity, sunshine duration, precipitation, average wind speed, maximum temperature, average pressure, and date. Finally, the data are reconstructed using a time-sliding window. In this study, we predict the average temperature and set a time-sliding window to reconstruct the meteorological time series. For example, with a time-sliding window of 2, the meteorological factors of days t-1 and t-2 are input into the model, and the output is the temperature of day t. To build these inputs, the original data are shifted backward by 1 and 2 rows, respectively, and the shifted meteorological factors together form the new inputs. Figure 1 illustrates the workflow of data preprocessing.
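The Min-Max standardization and time-sliding-window reconstruction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `min_max_scale` and `sliding_window` and the toy values are hypothetical, and the window inputs are concatenated in chronological order.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] (Min-Max standardization)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def sliding_window(rows, target_index, window):
    """Build (inputs, target) pairs from daily feature rows.

    Each input concatenates the feature rows of days t-window .. t-1,
    and the target is the value in column `target_index` on day t.
    """
    samples = []
    for t in range(window, len(rows)):
        flat = [v for day in rows[t - window:t] for v in day]
        samples.append((flat, rows[t][target_index]))
    return samples


# Hypothetical toy data: 5 days x 2 factors (average temperature, humidity).
temps = [10.0, 12.0, 14.0, 16.0, 20.0]
humid = [50.0, 55.0, 60.0, 65.0, 70.0]
scaled = list(zip(min_max_scale(temps), min_max_scale(humid)))

# Window of 2, as in the example in the text: days t-2 and t-1 predict day t.
samples = sliding_window(scaled, target_index=0, window=2)
print(len(samples))   # 3
print(samples[0])     # ([0.0, 0.0, 0.2, 0.25], 0.4)
```

With a window of 2 and two factors, each sample has 2 × 2 = 4 inputs; the same construction with a window of 30 and 8 attributes gives 240 inputs.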

3.2. Methods

3.2.1. GCN Model

GCN is an important branch of Graph Neural Networks (GNN) that extends Convolutional Neural Networks (CNN), which are commonly used in image processing, to graph data processing [22,23]. GCN is essentially a spatial feature extractor that extracts the influence of other nodes on the target nodes by constructing adjacency and degree matrices and combining them with the feature matrix of the graph network. A GCN processes graph data with $N$ nodes whose features form an $N \times D$ matrix $X$, and the relationships between the nodes form an $N \times N$ adjacency matrix $A$. Taking $X$ and $A$ as inputs, the propagation between layers is described in Equation (1).
$$H^{(l+1)} = \sigma\left( \check{D}^{-\frac{1}{2}} \check{A} \check{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right) \tag{1}$$
where $\check{A} = A + I$ and $I$ is the identity matrix. $\check{D}$ is the degree matrix of $\check{A}$, calculated by the formula $\check{D}_{ii} = \sum_{j} \check{A}_{ij}$. $H^{(l)}$ is the feature matrix of layer $l$, and for the input layer, $H^{(0)} = X$. $\sigma$ is the nonlinear activation function, and $W^{(l)}$ is the weight matrix of layer $l$. $l$ is the layer index; in this research, two propagation layers are used.
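As a rough illustration of the propagation rule in Equation (1), the sketch below applies two GCN layers to a three-node path graph in plain Python. The choice of ReLU as the activation, the identity weight matrix, and the toy graph are assumptions made for the example; the paper does not specify them.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a 0/1 adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # D_ii = sum_j (A + I)_ij
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(adj_norm, h, w):
    """One propagation step: H' = ReLU(A_norm H W). ReLU is an assumed activation."""
    return relu(matmul(matmul(adj_norm, h), w))


# Toy path graph 0 - 1 - 2 with one feature per node (hypothetical values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]   # only node 0 carries a signal
w = [[1.0]]                 # identity weight, so only the graph mixing is visible

a_norm = normalized_adjacency(adj)
h1 = gcn_layer(a_norm, x, w)   # after one layer, node 2 has not seen node 0
h2 = gcn_layer(a_norm, h1, w)  # after two layers, node 2 receives node 0's signal
print(h1[2][0], h2[2][0])
```

The example shows why two layers matter: information from node 0 reaches its second-order neighbor (node 2) only after the second propagation step.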
In our study, the GCN network model is built by taking Jiangsu Province as an example. According to whether the 13 prefecture-level cities are directly connected geographically, we determine whether there is an adjacency relationship between the two cities.
For convenient descriptions, city names are replaced by numbers. The numbers correspond to the cities shown in Table 3. As shown in Figure 2a, Nanjing is adjacent to Zhenjiang and Changzhou, and not adjacent to Nantong. The adjacency network of each city in Jiangsu Province is extracted and depicted, as shown in Figure 2b.
According to Figure 2b, an adjacency relationship matrix is established based on whether the 13 cities in Jiangsu Province are adjacent in terms of geographical location. Among them, 1 means that two cities are adjacent, which will be taken into account when updating the meteorological factor data in each round; 0 means that two cities are not adjacent or are the same, which will not be taken into account when updating the data in each round. As a result, an adjacency matrix A can be constructed, as shown in Table 4.
The degree matrix D is constructed as shown in Table 5:
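The construction of the adjacency and degree matrices can be sketched as below. The example uses only a four-city subset and only the adjacency facts stated in the text (Nanjing is adjacent to Zhenjiang and Changzhou, and not to Nantong); it is not a reproduction of Table 4 or Table 5, and computing the degree from $A$ without self-loops is an assumption.

```python
# Illustrative subset only; the full 13-city matrices are given in Tables 4 and 5.
cities = ["Nanjing", "Zhenjiang", "Changzhou", "Nantong"]
edges = [("Nanjing", "Zhenjiang"), ("Nanjing", "Changzhou")]


def build_adjacency(nodes, edge_list):
    """Symmetric 0/1 matrix: 1 if two cities are adjacent, 0 otherwise (0 on the diagonal)."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for a, b in edge_list:
        adj[index[a]][index[b]] = 1
        adj[index[b]][index[a]] = 1
    return adj


def degree_matrix(adj):
    """Diagonal matrix whose i-th entry is the number of neighbors of node i."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else 0 for j in range(n)] for i in range(n)]


adjacency = build_adjacency(cities, edges)
degree = degree_matrix(adjacency)
for name, row in zip(cities, adjacency):
    print(name, row)
```

In this subset Nantong has degree 0 only because its neighbors are left out of the example.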
According to the data provided by the “Table of Observation Data for Basic Meteorological Elements of China’s Surface Meteorological Stations” [21], the observation station data of 13 prefecture-level cities of Jiangsu Province were selected in our experiments. The station number, station name, latitude, longitude, altitude of the pressure sensor, and altitude of the observation site are shown in Table 6.
For meteorological data with a graph structure, the spatial-temporal correlation in the meteorological data is extracted using graph convolution, that is, the correlations between the meteorological data of the surrounding sites and the temperature data of the target sites. As shown in Figure 3, with node 1 as the starting point of the GCN convolution operation, the meteorological information of adjacent nodes 2, 3, and 4, which are directly connected to node 1, can be extracted using first-order ChebNet graph convolution. Then, after the updated data are processed by the activation function, we perform the first-order ChebNet graph convolution again to extract the meteorological information of nodes 5 and 6, which are directly connected to nodes 2, 3, and 4. Nodes 2, 3, and 4 can be considered as first-order adjacent nodes centered on node 1, and nodes 5 and 6 can be regarded as second-order adjacent nodes centered on node 1.
Using the second-order ChebNet graph convolution, the convolution kernel can directly extract information about the surrounding first-order neighboring nodes 2, 3, 4, and second-order neighboring nodes 5, 6 centered on node 1. The detailed implementation process of the second-order ChebNet graph convolution is shown in Figure 4.
Generally, a high-order convolution can be achieved by stacking multiple low-order convolutions. For example, a second-order convolution can be obtained by stacking two first-order convolutions. When the order is 0, each node can only extract its own meteorological features and cannot extract the meteorological features of surrounding nodes. However, a higher order is not always better. When the order is too large, the complexity of the network and the computation time increase markedly. Taking the graph network of Jiangsu Province as an example, with node 1 as the starting point, the meteorological information of every node in the province can be extracted by stacking four first-order convolutions. In this research, it is not necessary to extract the features of all nodes in the research area. According to the experimental verification, the first-order GCN graph convolution operation is performed twice in this research.
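The effect of stacking first-order convolutions can be illustrated by tracking which nodes contribute to node 1 after a given number of layers. The six-node graph below follows the description of Figure 3 (node 1 linked to nodes 2, 3, and 4), but the exact links of nodes 5 and 6 are not given in the text, so the edges (2, 5), (3, 5), and (4, 6) are assumptions.

```python
# Hypothetical edge list; only the links from node 1 are stated in the text.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 6)]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def receptive_field(graph, start, layers):
    """Nodes whose features can reach `start` after `layers` stacked first-order convolutions."""
    reached = {start}
    for _ in range(layers):
        reached = reached | {nbr for node in reached for nbr in graph[node]}
    return reached


print(sorted(receptive_field(graph, 1, 0)))  # [1]
print(sorted(receptive_field(graph, 1, 1)))  # [1, 2, 3, 4]
print(sorted(receptive_field(graph, 1, 2)))  # [1, 2, 3, 4, 5, 6]
```

With zero layers a node sees only itself, one layer adds the first-order neighbors, and two layers add the second-order neighbors, which matches the two-layer setting used in this research.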

3.2.2. BiLSTM Model

LSTM solves the long-term dependency problem of RNN and performs well in processing time series with a long time span. However, the transmission of the unit state in LSTM is unidirectional from front to back [24], so the LSTM model can only learn the meteorological features of past moments and cannot learn the meteorological features of future moments. BiLSTM means bidirectional LSTM, which means that the signal propagates backward and forward in time [25]. The BiLSTM can learn the features of future meteorological information while using past meteorological information and apply recursion and feedback to it. Its prediction results are more accurate than unidirectional LSTM [26].
The LSTM network unit comprises a cell, an input gate, an output gate, and a forget gate. The state unit is employed to coordinate the operation of the whole network. The calculation formula of the LSTM network unit implemented by gating is as follows:
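$$f_t = \sigma\left( W_f x_t + U_f h_{t-1} + b_f \right) \tag{2}$$
$$i_t = \sigma\left( W_i x_t + U_i h_{t-1} + b_i \right) \tag{3}$$
$$\tilde{C}_t = \tanh\left( W_c x_t + U_c h_{t-1} + b_c \right) \tag{4}$$
$$C_t = C_{t-1} \odot f_t + \tilde{C}_t \odot i_t \tag{5}$$
$$o_t = \sigma\left( W_o x_t + U_o h_{t-1} + b_o \right) \tag{6}$$
$$h_t = o_t \odot \tanh\left( C_t \right) \tag{7}$$
Here, $f_t$ is the forget gate, $i_t$ is the input gate, and $o_t$ is the output gate. $C_t$ is the internal state at time $t$. $W$, $U$, and $b$ are the model parameters.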
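To make the gating equations concrete, the following sketch evaluates one LSTM step for a scalar input and a scalar hidden state. Reducing the matrices $W$, $U$ and vectors $b$ to scalars, as well as the parameter layout `params[gate] = (W, U, b)`, are simplifications assumed for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step with scalar input and state.

    `params` maps each gate name ("f", "i", "c", "o") to a (W, U, b) tuple.
    """
    def affine(gate):
        w, u, b = params[gate]
        return w * x_t + u * h_prev + b

    f_t = sigmoid(affine("f"))              # forget gate
    i_t = sigmoid(affine("i"))              # input gate
    c_tilde = math.tanh(affine("c"))        # candidate state
    c_t = c_prev * f_t + c_tilde * i_t      # new internal state
    o_t = sigmoid(affine("o"))              # output gate
    h_t = o_t * math.tanh(c_t)              # new hidden state
    return h_t, c_t


# With all parameters set to zero, every gate equals 0.5 and the candidate is 0,
# so the cell simply halves the previous internal state.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "c", "o")}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

The zero-parameter case is chosen because it can be checked by hand: $C_t = 2.0 \times 0.5 = 1.0$ and $h_t = 0.5 \tanh(1.0)$.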
The structure of the BiLSTM network is shown in Figure 5. The BiLSTM network is a bidirectional structure that makes the input flow in both directions to preserve future and past information. Therefore, BiLSTM-based models can better mine the association features of time series data.
Suppose that $\overrightarrow{h}_t$ is the hidden layer state of the forward LSTM network at time $t$, and its calculation formula is as described in Equation (8).
$$\overrightarrow{h}_t = \mathrm{LSTM}\left( x_t, \overrightarrow{h}_{t-1} \right) \tag{8}$$
where $x_t$ is the input at time $t$ and $\overrightarrow{h}_{t-1}$ is the hidden layer state of the forward LSTM network at time $t-1$. $\overleftarrow{h}_t$ is the hidden layer state of the backward LSTM network at time $t$, and its calculation formula is as shown in Equation (9):
$$\overleftarrow{h}_t = \mathrm{LSTM}\left( x_t, \overleftarrow{h}_{t+1} \right) \tag{9}$$
where $x_t$ is the input at time $t$, and $\overleftarrow{h}_{t+1}$ is the hidden layer state of the backward LSTM network at time $t+1$, the step it processes immediately before time $t$. The output of the BiLSTM network combines the two hidden layer states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ to form the whole hidden state $h_t$ of the network.
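The bidirectional structure can be sketched independently of the cell type: one recurrent pass runs forward in time, another runs backward, and the two states at each time step are combined. In the sketch below, `toy_step` is a deliberately simple stand-in for the LSTM unit (not an LSTM), and pairing the two states as a tuple is an assumed way of combining them.

```python
def run_direction(xs, step, reverse=False):
    """Run a recurrent cell over xs and return hidden states aligned with xs."""
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    states = [None] * len(xs)
    h = 0.0  # initial hidden state
    for t in order:
        h = step(xs[t], h)
        states[t] = h
    return states


def bidirectional(xs, step_forward, step_backward):
    """Pair the forward and backward hidden states at every time step."""
    forward = run_direction(xs, step_forward)
    backward = run_direction(xs, step_backward, reverse=True)
    return list(zip(forward, backward))


def toy_step(x_t, h_prev):
    """Stand-in recurrent cell (NOT an LSTM): a leaky running sum."""
    return x_t + 0.5 * h_prev


outputs = bidirectional([1.0, 2.0, 4.0], toy_step, toy_step)
print(outputs)  # [(1.0, 3.0), (2.5, 4.0), (5.25, 4.0)]
```

At the first time step the forward state has seen only the first input, while the backward state already summarizes the whole future of the sequence, which is the property the BiLSTM exploits.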

3.2.3. GCN-BiLSTM Model

After the original meteorological time series data are preprocessed (data cleaning, standardization, and so on), the data set is divided into training, validation, and test sets. The initial data first pass through the two-layer GCN network, which extracts features of the meteorological information from the surrounding area, so that the resulting information contains spatial features. The output of the GCN is then fed into the BiLSTM network. Finally, the output $\{y_1, y_2, \ldots, y_n\}$ is obtained after dropout and 6 fully connected layers. When the input data are the training set, the output values $\{y_1, y_2, \ldots, y_n\}$ and the actual temperatures $\{T_1, T_2, \ldots, T_n\}$ are used to calculate the mean squared error loss, and the model is continuously optimized by the Adam optimizer. When the input data are the test set, no loss calculation and model optimization are performed, and the output values are the predicted temperature values of the GCN-BiLSTM model. The detailed workflow of the GCN-BiLSTM model is shown in Figure 6.

4. Results and Discussion

4.1. Experimental Data

The experimental data in this research are the daily weather data of 2071 stations in the Chinese mainland for a total of 3531 days between 1 May 2010 and 31 December 2019. After the gray correlation analysis, eight attributes with correlations greater than 0.88 were selected, including average temperature, average humidity, sunshine duration, precipitation, average wind speed, maximum temperature, average pressure, and date. After data processing by the time-sliding window, 3521 samples are retained for each site. The data sets are divided into the training set, validation set, and test set according to the ratio of 8:1:1. To analyze the performance of each model, the data sets are re-divided according to the different experiments, and the model is retrained to perform prediction experiments.
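The day count and the 8:1:1 split can be checked with a short calculation. The sketch below is illustrative only: the helper `split_8_1_1` is hypothetical, and the rounding rule (remainder assigned to the test set) is an assumption, since the paper does not state how fractional sample counts are handled.

```python
from datetime import date

# Number of days between the first and last observation dates.
n_days = (date(2019, 12, 31) - date(2010, 5, 1)).days
print(n_days)  # 3531


def split_8_1_1(n_samples):
    """Return (train, validation, test) sizes for an 8:1:1 split."""
    n_train = n_samples * 8 // 10
    n_val = n_samples // 10
    n_test = n_samples - n_train - n_val  # remainder goes to the test set
    return n_train, n_val, n_test


sizes = split_8_1_1(3521)
print(sizes)  # (2816, 352, 353)
```

For the 3521 samples per site, this rounding gives 2816 training, 352 validation, and 353 test samples.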

4.2. Baseline Models

In this research, to evaluate the performance of the proposed GCN-BiLSTM model, we select three typical models for controlled experiments. Autoregressive Integrated Moving Average (ARIMA) [27] belongs to the classical mathematical statistics method, which is good at dealing with temperature data with prominent non-stationary fluctuation characteristics. LSTM [15] is a commonly used model in time series processing research. It can learn the temporal dependence of temperature from historical data and has good performance in handling data with time series features. The Deep Feedforward Network (DFN) [28] is a DL method that can learn the relationships between meteorological elements and temperature to perform accurate predictions.
ARIMA: This model converts a nonstationary time series into a stationary time series, and then regresses the lag values of the dependent variables, the present values, and lag values of the random errors to deal with time series forecasting problems.
LSTM: The model explores the dependencies between sequence elements by transferring internal states. The gating mechanism is applied to solve the defect of the RNN gradient update.
DFN: DFN consists of an input layer, an output layer, and multiple hidden layers. There is no backward feedback in the network, and the signal propagates unidirectionally from the input layer to the output layer.

4.3. Evaluation Metrics

In this research, the root mean squared error (RMSE) and mean absolute error (MAE) are employed as evaluation metrics to measure model performance and analyze prediction accuracy. The RMSE and MAE values can be considered the absolute Celsius error between the true and predicted values. The calculation formula is described as follows.
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{10}$$
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{11}$$
Here, $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and $N$ is the sample number of the test set. Smaller RMSE and MAE indicate higher prediction accuracy of the model; conversely, higher evaluation metrics indicate lower prediction accuracy.
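The two evaluation metrics can be computed directly from their definitions, as in the minimal sketch below. The temperature values are made up for illustration and are not taken from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical temperatures in degrees Celsius; the errors are -1, 0, 3, 0.
actual = [20.0, 22.0, 25.0, 19.0]
predicted = [21.0, 22.0, 22.0, 19.0]

rmse_value = rmse(actual, predicted)  # sqrt((1 + 0 + 9 + 0) / 4) = sqrt(2.5)
mae_value = mae(actual, predicted)    # (1 + 0 + 3 + 0) / 4 = 1.0
print(rmse_value, mae_value)
```

Because the errors are squared before averaging, RMSE weights the single 3 °C miss more heavily than MAE does.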

4.4. Results

4.4.1. 14-Day Temperature Prediction Experiment

To evaluate the performance of the model for short-term temperature prediction and the differences in prediction performance among the models, temperature prediction experiments for the next 14 days are set up as follows. We took Nanjing city as the research area and set the length of the time-sliding window to 30. The number of input nodes is 30 × 8, a total of 240, and the number of output forecast days is 14. We use the training and validation datasets to train GCN-BiLSTM, DFN, and LSTM, respectively. Then, the three baseline models and GCN-BiLSTM are employed to forecast 14-day temperatures at 10 randomly selected time points. The RMSE and MAE of each model are calculated by averaging the results of the 10 experiments.
The experimental results of temperature prediction for the next 14 days are shown in Table 7.

4.4.2. Multi-Regional Temperature Prediction Experiment

Temperature prediction experiments are conducted in several cities to evaluate the performance differences of each model for temperature prediction in different regions. Harbin, Beijing, Jinan, Qingdao, Nanjing, Hangzhou, Guangzhou, and Shenzhen are selected as the experimental cities. The length of the time-sliding window is set to 30. The number of input nodes is 30 × 8, a total of 240, and the number of output forecast days is 1.
The RMSE and MAE of the next-day temperature prediction in the 8 cities are shown in Table 8 and Table 9, respectively.

4.5. Discussion

From the 14-day temperature prediction experiments, it can be seen in Table 7 that the average RMSE and MAE of the GCN-BiLSTM model for the temperature forecast for the next 14 days reach 2.671 °C and 3.024 °C, respectively. Compared to the LSTM, DFN, and ARIMA models, the RMSE value decreased by 23.45%, 38.16%, and 46.48%, and the MAE value decreased by 32.61%, 34.15%, and 47.35%, respectively.
The RMSE and MAE variations of the four models are shown in Figure 7 and Figure 8. It can be seen that the GCN-BiLSTM model achieves the best prediction performance for different predictions in different experimental regions. Specifically, as the number of forecast days increases, both the RMSE and MAE errors of these models appear to have an increasing trend. The ARIMA model outperforms the DFN and LSTM methods in terms of RMSE for short forecast days. However, due to its limitation of single attribute prediction, ARIMA performs poorly in longer time-step prediction. As shown in Figure 7, the ARIMA error exceeds the DFN, with the highest error from day 5, and the error tends to increase significantly afterward. This indicates that the DFN model and the GCN-BiLSTM model, which integrate multiple influencing factors, have better performance than the ARIMA model, which only considers single factors. After the 7th day, the RMSE and MAE of the DFN model continue to get higher. This is because the DFN model does not analyze the time-series features of temperature data and performs poorly in long-term temperature prediction. From the 7th day to the 14th day, the LSTM model and the GCN-BiLSTM model, which take into account the time series features, show better performance in long-term temperature prediction. Their RMSE and MAE are relatively lower, and the change in errors is smaller. Further, compared with the LSTM model, which only considers temporal features, the average RMSE and MAE of GCN-BiLSTM decrease by 23.45% and 32.61%, respectively, in the long-short term predictions. This demonstrates that the GCN-BiLSTM model, considering spatial-temporal features, has better prediction performance. This is because it captures the influence of meteorological factors of the surrounding area on the target area from the perspective of urban spatial distribution. 
Meanwhile, the information containing spatial features is also modeled along the temporal dimension, yielding a DL model that can capture spatial-temporal features. Overall, the GCN-BiLSTM model learns temporal and spatial features better and has the best performance in both short- and long-term temperature forecasting.
From the multi-regional temperature prediction experiments, as shown in Table 8, the average RMSE of the GCN-BiLSTM model is 1.906 °C, which is 26.35%, 33.79%, and 38.73% lower than that of the LSTM, ARIMA, and DFN models, respectively. Similarly, in terms of absolute error, the average MAE of the GCN-BiLSTM model is 2.583 °C, which is 15.09%, 26.87%, and 38.18% lower than that of LSTM, DFN, and ARIMA, respectively, as shown in Table 9.
The RMSE and MAE of the different models in the different cities are shown in Figure 9 and Figure 10. As can be seen in Figure 9, the GCN-BiLSTM model outperforms ARIMA, LSTM, and DFN in temperature prediction in all regions. As in the above analysis, the ARIMA model outperforms the DFN model in terms of RMSE in all research cities. For MAE, however, the result is the opposite. The LSTM model still performs better than the ARIMA model and the DFN model. Overall, the prediction accuracy of the four models follows a similar trend, but the GCN-BiLSTM model is more adaptive in each region, because it can capture the spatial-temporal features of the temperature data in the current region. Specifically, the RMSE values of the GCN-BiLSTM model in Hangzhou, Qingdao, Guangzhou, and Shenzhen are lower than the average of the eight cities, while those in Harbin, Beijing, Jinan, and Nanjing are higher than the average. The GCN-BiLSTM model performs best in Guangzhou, with an RMSE of 1.485 °C, which is 22.09% lower than the average. It performs worst in Jinan, with an RMSE of 2.337 °C, which is 22.61% higher than the average. As shown in Figure 10, in terms of MAE, the GCN-BiLSTM model performs best for temperature prediction in Guangzhou. In Jinan and Harbin, the MAE is high, although the GCN-BiLSTM model still has a lower MAE than the baseline models there.
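The comparison against the eight-city average can be reproduced with a few lines of arithmetic. The sketch below uses the GCN-BiLSTM RMSE column of the multi-regional results (Table 8). It assumes that the average is rounded to three decimals before the deviations are computed, as in the table; the variable names are illustrative only.

```python
# GCN-BiLSTM RMSE (°C) per city, copied from the multi-regional RMSE table (Table 8).
gcn_rmse = {
    "Harbin": 2.294, "Beijing": 2.035, "Jinan": 2.337, "Qingdao": 1.848,
    "Nanjing": 2.141, "Hangzhou": 1.601, "Guangzhou": 1.485, "Shenzhen": 1.510,
}

# Eight-city average, rounded to three decimals as reported in the table (assumption).
mean_rmse = round(sum(gcn_rmse.values()) / len(gcn_rmse), 3)

# Signed deviation from the average in percent: negative means below average.
deviation = {
    city: round((value - mean_rmse) / mean_rmse * 100.0, 2)
    for city, value in gcn_rmse.items()
}

best_city = min(gcn_rmse, key=gcn_rmse.get)
worst_city = max(gcn_rmse, key=gcn_rmse.get)
below_average = sorted(city for city, pct in deviation.items() if pct < 0)

print(f"average RMSE: {mean_rmse:.3f} °C")
print(f"best:  {best_city} ({deviation[best_city]:+.2f}%)")
print(f"worst: {worst_city} ({deviation[worst_city]:+.2f}%)")
print("below average:", ", ".join(below_average))
```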
Figure 11 depicts the temperature trends in the four cities of Harbin, Jinan, Hangzhou, and Guangzhou during the five-year period from 1 May 2014 to 1 May 2019. As shown in Figure 11, the temperature difference in Guangzhou is small throughout the year, about 24 °C to 26 °C, due to its low latitude, its proximity to the South China Sea, and its subtropical maritime climate. Therefore, the prediction accuracy in Guangzhou is better than in other cities. In Harbin and Jinan, however, the temperature fluctuates strongly throughout the year, with a temperature difference of about 35 °C, so the prediction accuracy of the model there is lower than in cities with slighter fluctuations.
To further explore the spatial correlation of GCN-BiLSTM, the RMSE and MAE of the temperature prediction results for each city in Jiangsu Province are spatially visualized in Figure 12, using the average RMSE and MAE of the 14-day temperature prediction results. Cities located at the edge of the study area with fewer neighbor nodes (counting first-order and second-order nodes), such as Suqian and Xuzhou, have higher prediction errors. Cities with more neighbor nodes, such as Yangzhou and Taizhou, have smaller prediction errors. In our model, GCN is employed to extract the spatial features of the city topological graph. The more neighbor nodes the target region has, the more features are available for extracting spatial correlation, and the more accurate the prediction will be. However, when the number of neighbor nodes exceeds a certain threshold, the prediction does not improve significantly, as in Yancheng. We consider that too many neighbor nodes introduce additional noisy data that affects feature extraction.
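To make the notion of first-order and second-order neighbor nodes concrete, the following sketch counts them for the cities named above, using the adjacency matrix of Jiangsu Province (Table 4) and the city numbering of Table 3. It only counts neighbors; the helper `neighbor_counts` is hypothetical, and the threshold mentioned in the text is not specified in the source, so none is assumed here.

```python
# City order follows the serial numbers in Table 3 (1 = Suqian, ..., 13 = Suzhou).
CITIES = [
    "Suqian", "Xuzhou", "Lianyungang", "Huai'an", "Yancheng", "Yangzhou",
    "Nanjing", "Zhenjiang", "Taizhou", "Nantong", "Changzhou", "Wuxi", "Suzhou",
]

# Rows of the adjacency matrix in Table 4, one string of 13 digits per city.
ADJ_ROWS = [
    "0111000000000", "1010000000000", "1101100000000", "1010110000000",
    "0011010011000", "0001101110000", "0000010100100", "0000011010100",
    "0000110101110", "0000100010001", "0000001110011", "0000000010101",
    "0000000001110",
]
ADJ = [[int(ch) for ch in row] for row in ADJ_ROWS]


def neighbor_counts(adj, i):
    """Return (first-order, second-order) neighbor counts of node i.

    Second-order neighbors are nodes reachable in exactly two hops that are
    neither node i itself nor one of its first-order neighbors.
    """
    first = {j for j, linked in enumerate(adj[i]) if linked}
    two_hop = {k for j in first for k, linked in enumerate(adj[j]) if linked}
    second = two_hop - first - {i}
    return len(first), len(second)


counts = {city: neighbor_counts(ADJ, i) for i, city in enumerate(CITIES)}

for city in ("Suqian", "Xuzhou", "Yangzhou", "Taizhou", "Yancheng"):
    first, second = counts[city]
    print(f"{city}: first-order={first}, second-order={second}, total={first + second}")
```

Under this counting, the edge cities Suqian and Xuzhou reach the fewest nodes within two hops, while Yancheng reaches every other city in the province.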
In summary, the GCN-BiLSTM model shows the best performance in temperature prediction among the above four models. The model has significant spatial performance differences and is influenced by the meteorological factors of the surrounding areas. The GCN-BiLSTM model has higher accuracy in cities with stable temperature fluctuations and small temperature differences than in cities with sharp temperature fluctuations and large temperature differences.

5. Conclusions

It is difficult to forecast future weather changes because of the nonlinear characteristics and spatial-temporal heterogeneity of meteorological time-series data. For accurate and effective temperature prediction, GCN and BiLSTM were integrated to construct the GCN-BiLSTM temperature prediction model in this research. Then, the influence of multiple meteorological elements and time-sliding windows on the forecast results was analyzed and compared among the four models.
From this analysis, and from the forecast performance of each temperature forecast model on meteorological time series from different time and space perspectives, this research draws the following conclusions.
(1) Multidimensional meteorological features were used to replace single meteorological features in the GCN-BiLSTM model. A sliding time window was added to extend the dimension of the model input data. This enables the model to learn how the multidimensional meteorological elements and the temporal characteristics of historical meteorological data relate to future temperatures, which improves prediction accuracy.
(2) Compared to the baseline models, the prediction performance of the GCN-BiLSTM model was significantly improved, benefiting from the time-series and spatial features of the meteorological data. The GCN-BiLSTM model had the lowest RMSE and MAE values and can predict future temperature with average errors of about ±2 °C.
(3) From the perspective of forecast time, the prediction accuracy of each model decreases as the number of prediction days increases. Among them, the error curve of the GCN-BiLSTM-based prediction is relatively smooth. From the perspective of space, the prediction performance of the GCN-BiLSTM model is influenced by the meteorological factors of neighboring regions. The prediction results of each model are better in areas with smaller annual temperature differences than in areas with larger annual temperature differences.
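The sliding time window mentioned in conclusion (1) can be sketched as follows. This is a minimal illustration, not the preprocessing code of this research: the function name `sliding_windows`, the window length, the forecast horizon, and the convention that temperature is the first feature of each day are all assumptions, and the toy values are for demonstration only.

```python
def sliding_windows(series, window, horizon):
    """Split a multivariate daily series into (input, target) samples.

    series  : list of per-day feature tuples; temperature is assumed at index 0
    window  : number of past days fed to the model (assumed hyperparameter)
    horizon : number of future days whose temperature is predicted (assumed)
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        future = series[start + window:start + window + horizon]
        inputs.append(past)                       # shape: window x n_features
        targets.append([day[0] for day in future])  # future temperatures only
    return inputs, targets


# Toy daily records: (temperature in °C, relative humidity in %).
daily = [(5.1, 38), (1.3, 36), (2.3, 34), (2.7, 33), (2.2, 71), (4.9, 75)]

X, y = sliding_windows(daily, window=3, horizon=2)

print(len(X), "samples")
print("first input :", X[0])
print("first target:", y[0])
```

Each sample pairs a block of consecutive multi-feature days with the temperatures of the days that follow, which is how the window adds a time dimension to the model input.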
In the future, we plan to enhance this research in the following ways. First, this research compares model performance for temperature prediction in different regions and analyzes the spatial performance differences of the models. Most of China’s areas are located in the temperate monsoon climate zone, the subtropical monsoon climate zone, and the temperate continental climate zone, with noticeable seasonal changes. In future research, the performance differences of the model in different seasons will be considered, and controlled experiments can be conducted and analyzed on seasonal or monthly scales. Second, only adjacency is considered as a spatial feature for constructing the experimental graph network in this research, so the spatial feature relationship is relatively simple. Furthermore, in this research, the station temperature is used to represent the temperature of the city where the station is located, which should be further refined at a smaller scale. For example, interpolation from points to surfaces could generate large amounts of raster data to describe the spatial feature relationship of meteorological time series.

Author Contributions

Conceptualization, L.M.; methodology, L.M.; software, D.Y.; validation, Y.P. and Y.Z.; formal analysis, Y.Z.; investigation, Y.Z.; resources, L.M.; data curation, D.Y. and Y.Z.; writing—original draft preparation, D.Y. and Y.Z.; writing—review and editing, L.M. and Y.P.; visualization, D.Y. and Y.Z.; supervision, Y.P. and L.M.; project administration, Y.P. and L.M.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Opening Foundation of Ministry of Education of Key Lab of Virtual Geographic Environment (Grant No. 2020VGE02) and the Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications (Grant No. NY220166).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available at http://data.cma.cn/ (accessed on 23 September 2022). All other data presented in the study are available on reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dyurgerov, M. Mountain and subpolar glaciers show an increase in sensitivity to climate warming and intensification of the water cycle. J. Hydrol. 2003, 282, 164–176.
  2. WMO. Reducing and managing risks of disasters in a changing climate. WMO Bull. 2013, 62, 23–31.
  3. He, Y.; Zhang, Z.; Theakstone, W.H.; Chen, T.; Yao, T.; Pang, H. Changing features of the climate and glaciers in China’s monsoonal temperate glacier region. J. Geophys. Res.-Atmos. 2003, 108.
  4. Tao, S.Y.; Zhao, S.X.; Zhou, X.P.; Ji, L.R.; Sun, S.Q.; Gao, S.T.; Zhang, Q.Y. Advances in weather science and weather forecasting. Atmos. Sci. 2003, 27, 451–467.
  5. Gustavo, Z.; Nuria, P.G.; Francisco, A.; Valdir, A.S.; Charlei, A. A Short Critical History on the Development of Meteorology and Climatology. Climate 2017, 5, 23.
  6. Bogdanovs, N.; Bistrovs, V.; Petersons, E.; Ipatovs, A.; Belinskis, R. Weather prediction algorithm based on historical data using Kalman filter. In Proceedings of the 2018 Advances in Wireless and Optical Communications (RTUWO), Riga, Latvia, 15–16 November 2018; pp. 94–99.
  7. Pérez-Vega, A.; Travieso, M.C.; Hernández-Travieso, G.J.; Alonso, B.J.; Dutta, K.M.; Singh, A. Forecast of temperature using support vector machines. In Proceedings of the 2016 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India, 29–30 April 2016; IEEE: New York, NY, USA, 2016; pp. 388–392.
  8. Liang, S.; Wang, D.; Wu, J.; Wang, R.; Wang, R. Method of Bidirectional LSTM Modelling for the Atmospheric Temperature. Intell. Autom. Soft Comput. 2021, 30, 701–714.
  9. Federico, C.; Massimiliano, B. Wind speed and wind energy forecast through Kalman filtering of Numerical Weather Prediction model output. Appl. Energy 2012, 99, 154–166.
  10. Wang, H.; Huang, J.; Zhou, H.; Zhao, L.; Yuan, Y. An integrated variational mode decomposition and ARIMA model to forecast air temperature. Sustainability 2019, 11, 4018.
  11. Kisi, O.; Cimen, M. Precipitation forecasting by using wavelet-support vector machine conjunction model. Eng. Appl. Artif. Intell. 2012, 25, 783–792.
  12. Radhika, Y.; Shashi, M. Atmospheric temperature prediction using support vector machines. Int. J. Comput. Theory Eng. 2009, 1, 55.
  13. Venkadesh, S.; Hoogenboom, G.; Potter, W.; McClendon, R. A genetic algorithm to refine input data selection for air temperature prediction using artificial neural networks. Appl. Soft Comput. 2013, 13, 2253–2260.
  14. Abbot, J.; Marohasy, J. The application of machine learning for evaluating anthropogenic versus natural climate change. GeoResJ 2017, 14, 36–46.
  15. Feng, H.C.; Xu, D.G. LSTM-based weather temperature prediction. Digit. Des. 2018, 7, 52.
  16. Wu, Y.; Wang, X. PM2.5 concentration prediction using improved LSTM in big data environment. Fresenius Environ. Bull. 2020, 29, 10098–10108.
  17. Qiao, B.; Wu, Z.; Tang, Z.; Wu, G. Sea surface temperature prediction approach based on 3D CNN and LSTM with attention mechanism. In Proceedings of the 2022 24th International Conference on Advanced Communication Technology (ICACT), PyeongChang Kwangwoon_Do, Republic of Korea, 13–16 February 2022; IEEE: New York, NY, USA, 2022; pp. 342–347.
  18. Jeong, S.; Park, I.; Kim, H.; Song, C.; Kim, H. Temperature prediction based on bidirectional long short-term memory and convolutional neural network combining observed and numerical forecast data. Sensors 2021, 21, 941.
  19. Cao, X.; Zhong, Y.; Zhou, Y.; Wang, J.; Zhu, C.; Zhang, W. Interactive temporal recurrent convolution network for traffic prediction in data centers. IEEE Access 2017, 6, 5276–5289.
  20. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
  21. Meteorological Information Center. Daily Meteorological Dataset of Basic Meteorological Elements of China National Surface Weather Station (V3.0) (1951–2010); National Tibetan Plateau Data Center: Qinghai, China, 2019; Available online: http://data.cma.cn/ (accessed on 19 November 2022).
  22. Zhu, H.; Lin, Y.; Liu, Z.; Fu, J.; Chua, T.-S.; Sun, M. Graph Neural Networks with Generated Parameters for Relation Extraction. arXiv 2019, arXiv:1902.00756.
  23. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
  24. Shahani, N.M.; Kamran, M.; Zheng, X.; Liu, C. Predictive modeling of drilling rate index using machine learning approaches: LSTM, simple RNN, and RFA. Pet. Sci. Technol. 2022, 40, 534–555.
  25. Kiperwasser, E.; Goldberg, Y. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Trans. Assoc. Comput. Linguist. 2016, 4, 313–327.
  26. Li, L.; Yang, Y.; Yuan, Z.; Chen, Z. Spatial-temporal approach for traffic status analysis and prediction based on Bi-LSTM structure. Mod. Phys. Lett. B 2021, 35, 2150481.
  27. Jian, L.; Zhao, Y.; Zhu, Y.-P.; Zhang, M.-B.; Bertolatti, D. An application of ARIMA model to predict submicron particle concentrations from meteorological factors at a busy roadside in Hangzhou, China. Sci. Total Environ. 2012, 426, 336–345.
  28. Luc, S.; Frederic-Victor, D. Modelling progressive failure in fractured rock masses using a 3D discrete element method. Int. J. Rock Mech. Min. Sci. 2012, 52, 18–30.
Figure 1. Data preprocessing process.
Figure 2. The adjacent relationship between cities in Jiangsu Province.
Figure 3. First-order GCN implementation process.
Figure 4. Second-order GCN implementation process.
Figure 5. The network structure of BiLSTM.
Figure 6. The workflow of the GCN-BiLSTM model.
Figure 7. Comparison of RMSE of temperature prediction for the next 14 days.
Figure 8. Comparison of MAE of temperature prediction for the next 14 days.
Figure 9. Comparison of RMSE of multiregional temperature prediction experiments.
Figure 10. Comparison of MAE in multiregional temperature prediction experiments.
Figure 11. Temperature trend map of various regions.
Figure 12. Spatial distribution of temperature prediction errors in Jiangsu Province.
Table 1. Comparison of temperature prediction methods.
Methods | Advantages | Disadvantages
Mathematical statistics (e.g., ARIMA) | Simple principle; fast calculation speed. | Single factor analysis; poor long-term predictive performance.
Machine learning methods (e.g., SVM) | Expertise in uncovering correlations between the factors. | Lack of analysis of time series.
Time series methods (e.g., LSTM) | Simple modeling; analysis of time series features. | Lack of analysis of spatial feature; low prediction accuracy when there are few time samples.
Spatial-temporal feature methods (e.g., CNN-BiLSTM) | Consider spatial-temporal characteristics; good at processing raster data and regular structure. | Poor performance in handling irregular graph structures.
Table 2. Part of the original daily value data of Heilongjiang station of China’s surface climate data.
District Station | Date | Temperature (°C) | Relative Humidity (%) | Precipitation (0.1 mm) | Air Pressure (0.1 hPa) | Wind Speed (0.1 m/s)
50136 | 1 May 2014 | 5.1 | 38 | 0 | 96,340 | 3.5
50136 | 2 May 2014 | 1.3 | 36 | 0 | 96,610 | 1.4
50136 | 3 May 2014 | 2.3 | 34 | 0 | 96,600 | 2.2
50136 | 4 May 2014 | 2.7 | 33 | 0 | 96,660 | 3
50136 | 5 May 2014 | 2.2 | 71 | 1.8 | 96,250 | 2.7
50136 | 6 May 2014 | 4.9 | 75 | 0 | 96,090 | 1.3
Table 3. Serial number of the city.
Number | City | Number | City
1 | Suqian | 8 | Zhenjiang
2 | Xuzhou | 9 | Taizhou
3 | Lianyungang | 10 | Nantong
4 | Huai’an | 11 | Changzhou
5 | Yancheng | 12 | Wuxi
6 | Yangzhou | 13 | Suzhou
7 | Nanjing | |
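Table 4. Adjacency matrix of cities in Jiangsu Province.
Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0
6 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0
8 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0
9 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0
10 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
11 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0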
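Table 5. Degree matrix of the cities in Jiangsu Province.
Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3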
Table 6. Observation data of basic meteorological elements at ground meteorological stations in Jiangsu Province.
Province | District Station Number | Station Name | Latitude (d m) | Longitude (d m) | Altitude of the Barometric Sensor (m) | Altitude of the Observation Site (m)
Jiangsu | 58238 | Nanjing | 3156 | 11,854 | 36.4 | 35.2
Jiangsu | 58354 | Wuxi | 3137 | 12,021 | 4.1 | 3.2
Jiangsu | 58027 | Xuzhou | 3417 | 11,709 | 42.0 | 41.2
Jiangsu | 58342 | Changzhou | 3143 | 11,933 | 7.7 | 6.9
Jiangsu | 58349 | Suzhou | 3125 | 12,034 | 8.9 | 8.0
Jiangsu | 58259 | Nantong | 3205 | 12,059 | 4.6 | 4.8
Jiangsu | 58044 | Lianyungang | 3432 | 11,914 | 4.7 | 4.7
Jiangsu | 58141 | Huai’an | 3338 | 11,856 | 13.7 | 12.5
Jiangsu | 58154 | Yancheng | 3326 | 12,012 | 6.8 | 2.5
Jiangsu | 58245 | Yangzhou | 3225 | 11,925 | 14.7 | 9.9
Jiangsu | 58341 | Zhenjiang | 3159 | 11,936 | 10.2 | 6.6
Jiangsu | 58246 | Taizhou | 3233 | 12,000 | 3.4 | 2.6
Jiangsu | 58131 | Suqian | 3358 | 11,813 | 26.0 | 25.0
Table 7. Experimental results of temperature prediction for the next 14 days (RMSE and MAE in °C).
Forecast Days | RMSE GCN-BiLSTM | RMSE DFN | RMSE LSTM | RMSE ARIMA | MAE GCN-BiLSTM | MAE DFN | MAE LSTM | MAE ARIMA
1 | 1.534 | 2.933 | 2.245 | 2.124 | 2.414 | 3.445 | 2.982 | 4.122
2 | 1.459 | 3.238 | 2.357 | 2.245 | 2.358 | 3.562 | 3.152 | 4.258
3 | 1.745 | 3.596 | 2.458 | 2.458 | 2.574 | 3.421 | 2.863 | 4.022
4 | 1.862 | 3.895 | 2.534 | 3.518 | 2.682 | 3.832 | 3.450 | 4.249
5 | 2.273 | 4.156 | 2.898 | 4.056 | 2.799 | 3.751 | 3.715 | 4.348
6 | 2.479 | 3.780 | 3.487 | 4.585 | 3.171 | 4.141 | 3.887 | 4.985
7 | 2.635 | 4.286 | 3.911 | 5.287 | 2.854 | 4.358 | 3.956 | 5.681
8 | 2.956 | 4.342 | 3.678 | 4.928 | 2.963 | 4.655 | 3.658 | 5.852
9 | 2.831 | 4.756 | 3.756 | 5.580 | 3.192 | 4.855 | 4.254 | 6.258
10 | 3.149 | 4.856 | 3.895 | 5.787 | 3.487 | 4.956 | 4.058 | 6.574
11 | 3.455 | 4.560 | 4.355 | 6.456 | 3.223 | 5.412 | 4.535 | 7.152
12 | 3.624 | 4.981 | 4.259 | 6.854 | 3.594 | 5.758 | 4.914 | 6.895
13 | 3.368 | 5.415 | 4.560 | 7.758 | 3.261 | 6.182 | 5.112 | 7.556
14 | 4.023 | 5.676 | 4.457 | 8.236 | 3.758 | 5.962 | 5.457 | 8.457
Average | 2.671 | 4.319 | 3.489 | 4.991 | 3.024 | 4.592 | 4.010 | 5.744
Table 8. Comparison of RMSE for multiregional temperature prediction.
City | GCN-BiLSTM | DFN | LSTM | ARIMA
Harbin | 2.294 | 3.546 | 2.849 | 2.868
Beijing | 2.035 | 3.324 | 2.751 | 3.146
Jinan | 2.337 | 3.660 | 2.955 | 2.886
Qingdao | 1.848 | 2.915 | 2.467 | 2.748
Nanjing | 2.141 | 3.223 | 2.711 | 2.755
Hangzhou | 1.601 | 2.882 | 2.445 | 2.668
Guangzhou | 1.485 | 2.664 | 2.212 | 2.524
Shenzhen | 1.510 | 2.672 | 2.313 | 2.329
Average | 1.906 | 3.111 | 2.588 | 2.741
Table 9. Comparison of MAE for multiregional temperature prediction.
City | GCN-BiLSTM | DFN | LSTM | ARIMA
Harbin | 2.880 | 3.857 | 3.462 | 4.402
Beijing | 2.765 | 3.714 | 3.351 | 4.246
Jinan | 2.886 | 3.970 | 3.551 | 4.583
Qingdao | 2.372 | 3.446 | 2.821 | 4.057
Nanjing | 2.612 | 3.582 | 2.928 | 4.356
Hangzhou | 2.487 | 3.233 | 2.785 | 4.012
Guangzhou | 2.345 | 3.120 | 2.694 | 3.924
Shenzhen | 2.318 | 3.335 | 2.743 | 3.846
Average | 2.583 | 3.532 | 3.042 | 4.178
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Miao, L.; Yu, D.; Pang, Y.; Zhai, Y. Temperature Prediction of Chinese Cities Based on GCN-BiLSTM. Appl. Sci. 2022, 12, 11833. https://doi.org/10.3390/app122211833
