STFM: Accurate Spatio-Temporal Fusion Model for Weather Forecasting
Abstract
1. Introduction
2. Related Work
2.1. Spatiotemporal Weather Prediction
2.2. Spatiotemporal Dependency Modeling
3. Data and Methods
3.1. ERA5 Data of Qinghai Province, China
3.2. Methodology
3.2.1. Problem Definition
3.2.2. Overall Framework
- Spatiotemporal Encoder. The encoder transforms raw data into feature representations that carry the spatiotemporal and contextual information deep networks need to follow dynamic changes in the data. Traditional encoder architectures [42,43,44] often rely solely on convolutional neural networks (CNNs) to capture spatiotemporal features. CNNs recognize local patterns, but they keep no memory of previous inputs and cannot propagate state information along a sequence, so CNN-only encoders may capture temporal dependencies inadequately. To address this limitation, we propose an encoder structure that places a bidirectional long short-term memory network (Bi-LSTM) before the traditional two-dimensional convolution. This change matters most for meteorological forecasting in regions with complex climate characteristics. In Qinghai Province, the monsoon system introduces significant seasonal variability, resulting in intricate spatial patterns of precipitation and temperature. The Bi-LSTM captures the cumulative effects of these seasonal changes over time, and the convolutional layers that follow extract spatial features. Compared with a CNN-only encoder, this design better captures the spatiotemporal correlations within the climate system and thereby improves forecast accuracy. The input data have a batch size B, a sequence length T, a channel dimension C, and an image height H and width W. We first reshape the input and feed it into a Bi-LSTM [45] for temporal encoding, which captures temporal patterns and trends at each position in the data.
The Bi-LSTM’s output is then reshaped and passed to two-dimensional convolution layers to extract spatial features, yielding a more comprehensive spatiotemporal representation. The underlying idea is to fuse the temporal feature extraction of the Bi-LSTM with the spatial feature extraction of two-dimensional convolution, which strengthens the encoder’s spatiotemporal modeling. In summary, the encoder applies a reshape, the Bi-LSTM, a second reshape, and the two-dimensional convolution layers, in that order.
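The reshaping around the Bi-LSTM can be illustrated with a minimal sketch. The exact target shapes did not survive in the text, so the layout used here, one sequence of length T with C features for each of the B·H·W grid positions, is an assumption chosen to match "temporal patterns and trends at each position". The helper names `to_pixel_sequences` and `to_frames` are hypothetical, and the Bi-LSTM itself is omitted.

```python
from itertools import product

def to_pixel_sequences(x):
    """Regroup a nested list shaped (B, T, C, H, W) into B*H*W sequences of shape (T, C).

    Assumed layout: one time series per grid position, ready for a recurrent layer.
    """
    B, T, C = len(x), len(x[0]), len(x[0][0])
    H, W = len(x[0][0][0]), len(x[0][0][0][0])
    seqs = []
    for b, h, w in product(range(B), range(H), range(W)):
        seqs.append([[x[b][t][c][h][w] for c in range(C)] for t in range(T)])
    return seqs

def to_frames(seqs, B, H, W):
    """Inverse regrouping: B*H*W sequences of shape (T, C) back to (B, T, C, H, W).

    C is read from the sequences, so it may differ from the input channel count
    (a bidirectional layer typically doubles the feature size).
    """
    T, C = len(seqs[0]), len(seqs[0][0])
    return [[[[[seqs[(b * H + h) * W + w][t][c] for w in range(W)]
               for h in range(H)]
              for c in range(C)]
             for t in range(T)]
            for b in range(B)]

# Toy input with B=1, T=3, C=2, H=2, W=2; each value encodes its own (t, c, h, w) index.
x = [[[[[1000 * t + 100 * c + 10 * h + w for w in range(2)]
        for h in range(2)]
       for c in range(2)]
      for t in range(3)]]

seqs = to_pixel_sequences(x)      # temporal encoding would run on these sequences
restored = to_frames(seqs, 1, 2, 2)  # frames for the two-dimensional convolutions
```

In a tensor library the same regrouping would normally be a permute followed by a reshape; the nested-list version only makes the index mapping explicit.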
- Spatiotemporal Fusion Unit. To further improve the accuracy of meteorological forecasts, we propose a spatiotemporal fusion module. As shown in Figure 3, it consists of a spatial attention unit and a temporal convolution unit. The spatial attention unit captures the spatial correlation of meteorological data both globally and locally. The temporal convolution unit then captures temporal continuity and patterns of change through one-dimensional convolution, while gaining a degree of spatial awareness. Combining the two allows the model to represent the spatiotemporal relationships in meteorological data comprehensively, which improves its prediction performance.
- Spatial Attention Unit. We represent the encoded meteorological data as a multi-dimensional tensor with batch size B. Within the spatial attention unit (Figure 3), we compress the temporal and channel dimensions of the feature into one. This transformation lets us decompose spatial attention into a static part, Global Static Attention (GSA), which focuses on global information, and a dynamic part, Channel Dynamic Attention (CDA), which emphasizes channel interactions (Figure 3). Inspired by the spatial attention mechanism and by the gradual changes in meteorological elements, we use global average pooling and global minimum pooling to capture spatial features. Global average pooling aggregates data across the entire time axis to preserve trend information, while global minimum pooling extracts local minima, which are crucial for detecting subtle changes without losing significant details. This dual pooling strategy yields a comprehensive feature representation. Static attention alone, however, does not sufficiently capture the temporal dynamics of spatial information. We therefore augment it with dynamic spatial attention, which lets spatial information from different time steps interact and integrate through a squeeze-and-excitation mechanism [46]. This modification enhances the model’s ability to learn dynamic spatial patterns over time. In summary, our spatial attention unit (SAU) combines global and dynamic spatial attention mechanisms to model the dynamic evolution of meteorological data more accurately. Complex climate systems often exhibit pronounced spatiotemporal dynamics, including large-scale precipitation variations due to monsoons, localized effects of mountainous terrain on wind patterns, and interactions among multiple climate systems. Our model addresses these challenges through GSA and CDA.
GSA captures broad-scale climate trends, such as long-term seasonal changes and large-scale airflow patterns. CDA identifies key dynamic climate events within time series data, such as frontal activity and cyclone trajectories. The dual-pooling strategy retains comprehensive climate trend information through global average pooling, while global minimum pooling enables the model to detect local extreme events and subtle changes, which are critical for identifying phenomena such as heavy rainfall and drought. Dynamic spatial attention further enhances the model’s ability to adapt its focus on spatial features over time, thereby capturing the spatiotemporal evolution of meteorological elements, particularly in regions with complex terrain such as Qinghai Province. By integrating global and dynamic attention mechanisms, our model represents meteorological data more thoroughly across spatiotemporal scales. This approach improves prediction accuracy, especially for complex climate phenomena, and enhances the model’s generalization capability. The formulation of the SAU uses the following components: R, the processed (reshaped) representation; average pooling and minimum pooling; the fusion of the two pooling results by a convolution; an activation function; a squeeze-and-excitation module composed of fully connected layers; and the resulting global static attention and channel dynamic attention. Here ⊗ denotes the Kronecker product and ⊙ the Hadamard product.
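The following sketch shows one way the two attention branches could fit together on a single sample whose time and channel axes have been merged into one axis of length N. It is an illustration under stated assumptions, not the paper's formulation: the weighted sum standing in for the fusing convolution, the ReLU and sigmoid activations, and all weight values are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def global_static_attention(x, w_avg=0.5, w_min=0.5):
    """Per-pixel gate built from mean- and min-pooling over the merged axis.

    x has shape (N, H, W). The weighted sum is a stand-in for the fusing
    convolution (assumption); the result has shape (H, W).
    """
    n, H, W = len(x), len(x[0]), len(x[0][0])
    gate = []
    for h in range(H):
        row = []
        for w in range(W):
            column = [x[k][h][w] for k in range(n)]
            avg, mn = sum(column) / n, min(column)
            row.append(sigmoid(w_avg * avg + w_min * mn))
        gate.append(row)
    return gate

def channel_dynamic_attention(x, w1, w2):
    """Squeeze-and-excitation style gate: spatial mean per channel, then two dense layers."""
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    hidden = [max(0.0, sum(wi * s for wi, s in zip(row, squeezed))) for row in w1]
    return [sigmoid(sum(wi * hv for wi, hv in zip(row, hidden))) for row in w2]

def spatial_attention(x, w1, w2):
    """Combine both gates and apply them to x element-wise."""
    gsa = global_static_attention(x)
    cda = channel_dynamic_attention(x, w1, w2)
    # Outer product of the channel gate (N) and the pixel gate (H, W),
    # followed by an element-wise product with the features.
    return [[[x[k][h][w] * cda[k] * gsa[h][w] for w in range(len(x[0][0]))]
             for h in range(len(x[0]))]
            for k in range(len(x))]

# Toy features with N=2 merged channels on a 2x2 grid; weights are illustrative only.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1 = [[0.3, -0.2]]      # N -> 1 hidden unit
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> N channel gates
out = spatial_attention(x, w1, w2)
```

Because both gates lie strictly between 0 and 1, the output keeps the shape of the input and rescales each value, which is the behaviour expected of a multiplicative attention mask.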
- Temporal Convolution Unit. The temporal convolution unit uses one-dimensional convolution. The kernel slides across time steps, perceiving a local window at each position and capturing short-term changes in meteorological data. To also capture long-term dependencies, we use a two-layer structure: a standard one-dimensional convolution layer followed by a dilated convolution layer, which expands the receptive field. This structure encodes temporal information effectively, and residual connections enrich the temporal feature representation. To account for the spatiotemporal relationships in the data, we reshape the data tensor so that the spatial and temporal dimensions are merged. This helps the temporal convolution unit fuse spatial and temporal information more effectively. After processing through the temporal convolutional network (TCN) [47], we reshape the data back to the layout expected by the decoder module, which keeps data transfer between modules consistent. In summary, combining the temporal convolution unit with the spatial attention unit lets the spatiotemporal fusion module account comprehensively for the spatiotemporal relationships in meteorological data, thereby improving forecasting accuracy. The analysis above is theoretical; the next section verifies experimentally the improvement in forecasting performance that these enhancements achieve.
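The two-layer structure of the temporal convolution unit can be sketched on a single scalar time series. Kernel sizes, the dilation rate of 2, the ReLU between the layers, and zero padding are all assumptions made for illustration; the text only specifies a standard convolution, a dilated convolution, and a residual connection.

```python
def conv1d(seq, kernel, dilation=1):
    """Same-length 1-D convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - pad + j * dilation
            if 0 <= idx < len(seq):
                acc += weight * seq[idx]
        out.append(acc)
    return out

def temporal_block(seq, k1, k2, dilation=2):
    """Standard convolution, then dilated convolution, then a residual connection."""
    h = conv1d(seq, k1)                      # local, short-term patterns
    h = [max(0.0, v) for v in h]             # ReLU (assumed activation)
    h = conv1d(h, k2, dilation=dilation)     # wider temporal context
    return [a + b for a, b in zip(seq, h)]   # residual connection

def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked 1-D convolutions with stride 1."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

# An impulse in the middle of the series shows how far one time step spreads.
impulse = [0.0] * 11
impulse[5] = 1.0
response = temporal_block(impulse, [1, 1, 1], [1, 1, 1])
touched = sum(1 for v in response if v != 0.0)
```

With two kernels of size 3, the dilated second layer widens the receptive field from 5 to 7 time steps without adding parameters, which is the reason for using dilation in the second layer.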
4. Results
4.1. Experiment Setup
4.2. Evaluation Metrics
4.3. Analysis and Evaluation of Comparative Experimental Results
- Temperature. Table 1 and Table 2 present the quantitative comparison of the STFM with existing models for the 850 hPa temperature (t_850) and 2 m temperature (t2m) variables. For these variables, the STFM reduces the MSE by 8.5% and 7.5%, respectively, compared to TAU, the best-performing baseline. Figure 4 and Figure 5 show the qualitative results for t_850 and t2m. After the sixth frame (six hours into the future), the predictions of TAU begin to deviate significantly from the true values, whereas the STFM maintains a relatively small deviation. Between the tenth and twelfth frames (ten to twelve hours into the future), the TAU predictions in some areas are seriously inconsistent with the true values, while the STFM markedly reduces the area of such inconsistencies. This indicates that the STFM captures seasonal variations better than TAU: its prediction bias is smaller both six hours ahead and ten to twelve hours ahead. At the longest lead times considered, the STFM also maintains prediction consistency, reflecting its adaptability to dynamic changes within the climate system. We attribute the STFM’s efficacy in capturing long-term dependencies and spatiotemporal dynamics to its dilated convolutional layers and dynamic spatial attention mechanisms, which contribute to a lower mean squared error (MSE) and reduced prediction bias, especially under complex climate conditions.
- Wind speed component. Table 1 presents the quantitative comparison of the STFM with existing models for the 10 m wind speed u-component (u_10) and v-component (v_10). For both variables, the STFM shows about a 5.7% reduction in MSE compared to TAU, the best-performing baseline. Figure 6 and Figure 7 display the qualitative results for u_10 and v_10. The error maps show that, over the next 12 h, the STFM’s prediction deviations do not grow significantly with lead time, whereas TAU begins to show serious deviations in multiple areas after the eighth frame (eight hours into the future). This indicates that the STFM adapts well to dynamic changes within the climate system. The deviation of TAU beyond the eighth frame may be attributed to its inadequate capture of seasonal variations in wind speed and of local wind patterns. The STFM’s stability and accuracy in short- to medium-term forecasts show that it copes with fluctuations in wind speed and with complex climate conditions. This capability improves both the precision of wind speed predictions and responsiveness to climate change, which underlines its practical value.
- Geopotential and Relative Humidity. Table 2 presents the quantitative comparison of the STFM with existing models for the 500 hPa geopotential (Geopotential_500) and 500 hPa relative humidity (Relative_humidity_500) variables. For Geopotential_500, the STFM reduces the MSE by about 3% compared to the best-performing baseline, UniFormer. For Relative_humidity_500, it achieves a reduction of about 1% compared to the best-performing baseline, SimVP v2. As shown in Figure 8 and Figure 9, although the improvement is modest, the error maps of the STFM are visibly sparser than those of UniFormer and SimVP v2. The STFM’s advantage in predicting the 500 hPa geopotential indicates a superior ability to capture atmospheric circulation characteristics, and the lower MSE for relative humidity reflects better prediction of the humidity distribution. While the gains are modest, the increased stability and accuracy of the STFM contribute to a better understanding of middle-atmosphere characteristics.
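The percentage reductions quoted in this section can be recomputed from the MSE columns of Table 1 and Table 2. The short check below assumes the usual definition of relative reduction, (baseline − model) / baseline; the function name `mse_reduction_percent` is illustrative.

```python
def mse_reduction_percent(baseline_mse, model_mse):
    """Relative MSE reduction of the model with respect to the baseline, in percent."""
    return 100.0 * (baseline_mse - model_mse) / baseline_mse

# variable: (best baseline, baseline MSE, STFM MSE), copied from Tables 1 and 2
results = {
    "t850": ("TAU", 0.5737, 0.5244),
    "t2m": ("TAU", 0.5685, 0.5255),
    "u10": ("TAU", 0.8781, 0.8276),
    "v10": ("TAU", 0.7563, 0.7126),
    "Geopotential_500": ("UniFormer", 4164, 4027),
    "Relative_humidity_500": ("SimVP v2", 117.9, 116.5),
}

reductions = {var: mse_reduction_percent(b, m) for var, (_, b, m) in results.items()}

for var, pct in reductions.items():
    print(f"{var}: {pct:.2f}% lower MSE than {results[var][0]}")
```

Under this definition the recomputed values agree with the reported 8.5%, 7.5%, about 5.7%, about 3%, and 1% to within rounding.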
4.4. Analysis and Evaluation of Ablation Experiment Results
- Temporal Encoder. The results indicate that for the three variables Geopotential_500, Relative_humidity_500, and u_10, the MSE of STFM-NTE increased by 9.7%, 1.8%, and 0.77%, respectively, compared to STFM. The encoder module is crucial for feature extraction in the entire model, and the encoder of STFM-NTE uses only convolution operations for spatial feature extraction, so it is less effective than the spatiotemporal encoder used in STFM. Therefore, adding temporal encoding to the encoder enhances the prediction accuracy of the model on these variables.
- Spatiotemporal Fusion Unit. The results show that STFM-NSTFU has a markedly higher MSE than STFM for each variable. For instance, the MSE for Geopotential_500 increased by 12%, and the MSE for Relative_humidity_500 and v_10 increased by 5%. Because STFM-NSTFU lacks a spatiotemporal fusion unit, it cannot use high-dimensional spatiotemporal information as guidance, which leads to a considerable drop in prediction accuracy.
- Temporal Convolutional Unit. STFM-NTE-NSAU, which lacks the temporal encoding and spatial attention modules, shows a significantly higher prediction loss than STFM on each meteorological variable. This demonstrates the crucial role of the temporal convolutional module.
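The ablation percentages can be checked against the MSE columns of the ablation tables in the same way. A relative increase can be normalised by the full model's MSE or by the ablated model's MSE, and the sketch below prints both. Which denominator the reported figures use is not stated in the text; the comparison is inferred from the tabulated values only.

```python
def percent_change(delta, reference):
    """Change expressed as a percentage of the chosen reference value."""
    return 100.0 * delta / reference

# MSE values copied from the ablation tables.
stfm = {"Geopotential_500": 4027, "Relative_humidity_500": 116.5,
        "u10": 0.8276, "v10": 0.7126}
ablated = {
    "STFM-NTE": {"Geopotential_500": 4463, "Relative_humidity_500": 118.7, "u10": 0.8341},
    "STFM-NSTFU": {"Geopotential_500": 4524, "Relative_humidity_500": 122.4, "v10": 0.7488},
}

increase = {}
for variant, scores in ablated.items():
    for var, mse in scores.items():
        delta = mse - stfm[var]
        vs_full = percent_change(delta, stfm[var])   # normalised by the full STFM
        vs_ablated = percent_change(delta, mse)      # normalised by the ablated variant
        increase[(variant, var)] = (vs_full, vs_ablated)
        print(f"{variant} {var}: +{vs_full:.2f}% of STFM MSE, "
              f"+{vs_ablated:.2f}% of {variant} MSE")
```

For Geopotential_500, the STFM-NTE increase is about 10.8% of the STFM MSE and about 9.8% of the STFM-NTE MSE, so the reported 9.7% is closer to the second normalisation, while the 12% reported for STFM-NSTFU matches the first. Readers comparing the two ablations may want to fix one denominator.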
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Joslyn, S.; Savelli, S. Communicating forecast uncertainty: Public perception of weather forecast uncertainty. Meteorol. Appl. 2010, 17, 180–195.
2. Scher, S.; Messori, G. Predicting weather forecast uncertainty with machine learning. Q. J. R. Meteorol. Soc. 2018, 144, 2830–2841.
3. Lorenc, A.C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 1986, 112, 1177–1194.
4. Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55.
5. Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. A 2021, 379, 20200097.
6. Salman, A.G.; Kanigoro, B.; Heryadi, Y. Weather forecasting using deep learning techniques. In Proceedings of the 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Depok, Indonesia, 10–11 October 2015; pp. 281–285.
7. Ren, X.; Li, X.; Ren, K.; Song, J.; Xu, Z.; Deng, K.; Wang, X. Deep learning-based weather prediction: A survey. Big Data Res. 2021, 23, 100178.
8. Buizza, R.; Tribbia, J.; Molteni, F.; Palmer, T. Computation of optimal unstable structures for a numerical weather prediction model. Tellus A 1993, 45, 388–407.
9. Rodwell, M.; Palmer, T. Using numerical weather prediction to assess climate models. Q. J. R. Meteorol. Soc. A J. Atmos. Sci. Appl. Meteorol. Phys. Oceanogr. 2007, 133, 129–146.
10. Onyango, A.O.; Ongoma, V. Estimation of mean monthly global solar radiation using sunshine hours for Nairobi City, Kenya. J. Renew. Sustain. Energy 2015, 7, 053105.
11. Liang, H.; Sun, X.; Sun, Y.; Gao, Y. Text feature extraction based on deep learning: A review. EURASIP J. Wirel. Commun. Netw. 2017, 2017, 211.
12. Shi, X.; Yeung, D.Y. Machine learning for spatiotemporal sequence forecasting: A survey. arXiv 2018, arXiv:1808.06865.
13. Cao, S.; Wu, L.; Wu, J.; Wu, D.; Li, Q. A spatio-temporal sequence-to-sequence network for traffic flow prediction. Inf. Sci. 2022, 610, 185–203.
14. Kim, T.; Yue, Y.; Taylor, S.; Matthews, I. A decision tree framework for spatiotemporal sequence prediction. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 577–586.
15. Rodrigues, E.R.; Oliveira, I.; Cunha, R.; Netto, M. DeepDownscale: A deep learning strategy for high-resolution weather forecast. In Proceedings of the 2018 IEEE 14th International Conference on e-Science (e-Science), Amsterdam, The Netherlands, 29 October–1 November 2018; pp. 415–422.
16. Hewage, P.; Trovati, M.; Pereira, E.; Behera, A. Deep learning-based effective fine-grained weather forecasting model. Pattern Anal. Appl. 2021, 24, 343–366.
17. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28.
18. Shi, E.; Li, Q.; Gu, D.; Zhao, Z. A method of weather radar echo extrapolation based on convolutional neural networks. In Proceedings of the MultiMedia Modeling: 24th International Conference, MMM 2018, Bangkok, Thailand, 5–7 February 2018; Proceedings, Part I 24. Springer: Berlin/Heidelberg, Germany, 2018; pp. 16–28.
19. Akbari Asanjan, A.; Yang, T.; Hsu, K.; Sorooshian, S.; Lin, J.; Peng, Q. Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos. 2018, 123, 12–543.
20. Wang, Y.; Jiang, L.; Yang, M.H.; Li, L.J.; Long, M.; Fei-Fei, L. Eidetic 3D LSTM: A model for video prediction and beyond. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
21. Oprea, S.; Martinez-Gonzalez, P.; Garcia-Garcia, A.; Castro-Vargas, J.A.; Orts-Escolano, S.; Garcia-Rodriguez, J.; Argyros, A. A review on deep learning techniques for video prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2806–2826.
22. Jhuang, H.; Gall, J.; Zuffi, S.; Schmid, C.; Black, M.J. Towards understanding action recognition. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 3192–3199.
23. Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600.
24. Wang, Y.; Wu, H.; Zhang, J.; Gao, Z.; Wang, J.; Philip, S.Y.; Long, M. Predrnn: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2208–2225.
25. Tan, C.; Gao, Z.; Li, S.; Li, S.Z. Simvp: Towards simple yet powerful spatiotemporal predictive learning. arXiv 2022, arXiv:2211.12509.
26. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
27. Kang, S.M.; Wildes, R.P. Review of action recognition and detection methods. arXiv 2016, arXiv:1610.06906.
28. Lee, J.; Lee, J.; Lee, S.; Yoon, S. Mutual suppression network for video prediction using disentangled features. arXiv 2018, arXiv:1804.04810.
29. Gehring, J.; Auli, M.; Grangier, D.; Dauphin, Y.N. A convolutional encoder model for neural machine translation. arXiv 2016, arXiv:1611.02344.
30. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558.
31. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
32. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Adv. Neural Inf. Process. Syst. 2017, 30.
33. Wang, Y.; Gao, Z.; Long, M.; Wang, J.; Philip, S.Y. Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5123–5132.
34. Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9154–9162.
35. Chang, Z.; Zhang, X.; Wang, S.; Ma, S.; Ye, Y.; Xinguang, X.; Gao, W. Mau: A motion-aware unit for video prediction and beyond. Adv. Neural Inf. Process. Syst. 2021, 34, 26950–26962.
36. Wang, Q.; Atkinson, P.M. Spatio-temporal fusion for daily Sentinel-2 images. Remote Sens. Environ. 2018, 204, 31–42.
37. Kuettel, D.; Breitenstein, M.D.; Van Gool, L.; Ferrari, V. What’s going on? Discovering spatio-temporal dependencies in dynamic scenes. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1951–1958.
38. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
39. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5668–5675.
40. Lin, H.; Bai, R.; Jia, W.; Yang, X.; You, Y. Preserving dynamic attention for long-term spatial-temporal prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 36–46.
41. Jwaid, T.; Meyer, H.D.; Ismail, A.H.; Baets, B.D. Curved splicing of copulas. Inf. Sci. 2021, 556, 95–110.
42. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; proceedings, part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
43. Zou, X.; Li, K.; Xing, J.; Tao, P.; Cui, Y. PMAA: A progressive multi-scale attention autoencoder model for high-performance cloud removal from multi-temporal satellite imagery. arXiv 2023, arXiv:2303.16565.
44. Li, K.; Yang, R.; Hu, X. An efficient encoder-decoder architecture with top-down attention for speech separation. arXiv 2022, arXiv:2209.15200.
45. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
46. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
47. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165.









Table 1. Comparison of the STFM with existing models on t2m, u10, and v10.

| Method | t2m MSE↓ | t2m MAE↓ | t2m ACC↑ | u10 MSE↓ | u10 MAE↓ | u10 ACC↑ | v10 MSE↓ | v10 MAE↓ | v10 ACC↑ |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM | 0.8518 | 0.6078 | 0.9929 | 1.052 | 0.6725 | 0.9 | 0.8844 | 0.6354 | 0.8603 |
| E3D-LSTM | 0.8451 | 0.5969 | 0.993 | 1.002 | 0.6544 | 0.9044 | 0.8995 | 0.6372 | 0.8579 |
| PredRNN v2 | 0.8169 | 0.588 | 0.9933 | 1.213 | 0.711 | 0.883 | 0.9234 | 0.6422 | 0.8536 |
| UniFormer | 0.669 | 0.5294 | 0.9944 | 0.9465 | 0.6309 | 0.9096 | 0.8261 | 0.6046 | 0.8692 |
| SimVP v2 | 1.005 | 0.678 | 0.9916 | 1.028 | 0.6846 | 0.903 | 0.8583 | 0.6313 | 0.8635 |
| TAU | 0.5685 | 0.4775 | 0.9953 | 0.8781 | 0.5975 | 0.9167 | 0.7563 | 0.5682 | 0.8816 |
| STFM (Ours) | 0.5255 | 0.4572 | 0.9956 | 0.8276 | 0.5833 | 0.9217 | 0.7126 | 0.5539 | 0.888 |

Table 2. Comparison of the STFM with existing models on t850, Geopotential_500, and Relative_humidity_500.

| Method | t850 MSE↓ | t850 MAE↓ | t850 ACC↑ | Geopotential_500 MSE↓ | Geopotential_500 MAE↓ | Geopotential_500 ACC↑ | Relative_humidity_500 MSE↓ | Relative_humidity_500 MAE↓ | Relative_humidity_500 ACC↑ |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM | 0.8499 | 0.6069 | 0.9929 | 4973 | 51.26 | 0.9968 | 158.2 | 8.72 | 0.8874 |
| E3D-LSTM | 0.8277 | 0.5932 | 0.9932 | 6101 | 57.49 | 0.996 | 138.9 | 8.22 | 0.9014 |
| PredRNN v2 | 0.9106 | 0.6152 | 0.9925 | 4224 | 47.39 | 0.9972 | 164.8 | 8.94 | 0.8825 |
| UniFormer | 0.6752 | 0.5317 | 0.9943 | 4164 | 47.88 | 0.9973 | 141.2 | 8.1 | 0.8995 |
| SimVP v2 | 0.9336 | 0.6509 | 0.9922 | 4881 | 52.16 | 0.9968 | 117.9 | 7.44 | 0.9163 |
| TAU | 0.5737 | 0.4781 | 0.9953 | 4658 | 50.37 | 0.9969 | 125.1 | 7.59 | 0.9111 |
| STFM (Ours) | 0.5244 | 0.4546 | 0.9956 | 4027 | 46.9 | 0.9974 | 116.5 | 7.37 | 0.9177 |

Ablation results on t2m, u10, and v10.

| Method | t2m MSE↓ | t2m MAE↓ | t2m ACC↑ | u10 MSE↓ | u10 MAE↓ | u10 ACC↑ | v10 MSE↓ | v10 MAE↓ | v10 ACC↑ |
|---|---|---|---|---|---|---|---|---|---|
| STFM | 0.5255 | 0.4572 | 0.9956 | 0.8276 | 0.5833 | 0.9217 | 0.7126 | 0.5539 | 0.888 |
| STFM-NTE | 0.5169 | 0.4588 | 0.9957 | 0.8341 | 0.5805 | 0.9208 | 0.7126 | 0.5598 | 0.8886 |
| STFM-NSTFU | 0.5458 | 0.4624 | 0.9954 | 0.8636 | 0.5939 | 0.9179 | 0.7488 | 0.567 | 0.8822 |
| STFM-NTE-NSAU | 0.5616 | 0.4701 | 0.9953 | 0.8566 | 0.5877 | 0.9185 | 0.7424 | 0.5666 | 0.8839 |
| STFM-NTE-NTCN | 0.5339 | 0.4683 | 0.9955 | 0.8402 | 0.5878 | 0.92 | 0.7229 | 0.5692 | 0.8865 |

Ablation results on t850, Geopotential_500, and Relative_humidity_500.

| Method | t850 MSE↓ | t850 MAE↓ | t850 ACC↑ | Geopotential_500 MSE↓ | Geopotential_500 MAE↓ | Geopotential_500 ACC↑ | Relative_humidity_500 MSE↓ | Relative_humidity_500 MAE↓ | Relative_humidity_500 ACC↑ |
|---|---|---|---|---|---|---|---|---|---|
| STFM | 0.5244 | 0.4546 | 0.9956 | 4027 | 46.9 | 0.9974 | 116.5 | 7.37 | 0.9177 |
| STFM-NTE | 0.5176 | 0.4567 | 0.9956 | 4463 | 49.58 | 0.997 | 118.7 | 7.43 | 0.9156 |
| STFM-NSTFU | 0.5428 | 0.4622 | 0.9955 | 4524 | 49.99 | 0.997 | 122.4 | 7.58 | 0.913 |
| STFM-NTE-NSAU | 0.5534 | 0.4685 | 0.9954 | 4745 | 50.67 | 0.9968 | 124 | 7.53 | 0.9124 |
| STFM-NTE-NTCN | 0.5339 | 0.4667 | 0.9955 | 4226 | 48.04 | 0.9972 | 119.5 | 7.4 | 0.9154 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, J.; Wu, L.; Zhang, T.; Huang, J.; Wang, X.; Tian, F. STFM: Accurate Spatio-Temporal Fusion Model for Weather Forecasting. Atmosphere 2024, 15, 1176. https://doi.org/10.3390/atmos15101176