# Deep Tower Networks for Efficient Temperature Forecasting from Multiple Data Sources


## Abstract

Compared with the forecasts from `yr.no`, the tower network has an overall 11% smaller root mean squared forecasting error. Among the core architectures, the tower network demonstrates competitive performance and proves to be more robust than the CNN and convLSTM models.

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data

Forecasts are also obtained from `yr.no`, a weather forecasting website and app hosted by the Norwegian government-owned national broadcasting corporation (NRK) and the Norwegian Meteorological Institute. The `yr.no` forecasts are based on the aforementioned NWP model data, but have gone through various forms of post-processing, and thus represent the state of the art in weather forecasting.

#### 2.2. Models

## 3. Results

Forecasts from `yr.no` are available only from 19 February 2018, so for comparisons with `yr.no`, the period from 19 February 2018 to 31 December 2018 is used in the evaluation.

#### 3.1. Comparison 1: Weather Forecasting Baselines

The tower network forecasts are compared with those from `yr.no`, which are the best available post-processed versions of the same data.

`yr.no` is included in this comparison since these are the data that the average person will see when they check the weather forecast on their phone. The weather forecasts on `yr.no` are optimized with respect to criteria other than simply RMSE, and thus it would be an oversimplification to say that the multimodal tower network is better. However, the multimodal tower network performs better in this test, which is a strong indication of its quality and potential as a post-processing technique.
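The comparison above rests on the root mean squared error averaged over the grid. As a point of reference, a minimal sketch of how a gridded RMSE can be computed; the arrays and values below are hypothetical, not taken from the paper:

```python
import numpy as np

def rmse(forecast, observed):
    """Root mean squared error over all grid points."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

# Hypothetical 2 x 2 temperature grids (degrees Celsius) for one hour.
pred = [[1.0, 2.0], [3.0, 4.0]]
obs = [[1.0, 2.0], [3.0, 2.0]]
print(rmse(pred, obs))  # errors (0, 0, 0, 2) -> sqrt(4/4) = 1.0
```

In practice the mean would run over all forecast hours and all days in the evaluation period as well as the spatial grid.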

#### 3.2. Comparison 2: Deep Learning Approaches

`yr.no` for every hour, approaching, but still nowhere near, persistence in hour 1.
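Persistence is the standard baseline in which the most recent observed field is simply carried forward to every forecast hour. A minimal sketch, using a hypothetical 2 × 2 temperature grid:

```python
import numpy as np

def persistence_forecast(last_obs, horizon):
    """Persistence baseline: repeat the latest observed field
    unchanged for every forecast hour."""
    return np.repeat(last_obs[np.newaxis, ...], horizon, axis=0)

grid = np.array([[2.0, 3.0], [4.0, 5.0]])  # hypothetical 2 x 2 field
fc = persistence_forecast(grid, horizon=6)
print(fc.shape)  # (6, 2, 2)
```

Because temperature changes slowly from one hour to the next, persistence is very hard to beat at hour 1 and degrades at longer lead times.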

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 1.** (**a**) The geographical area used in this work, shown as an empty, black square centered around Oslo on a map of Scandinavia. (**b**) The topography of the area used. The Oslo fjord inlet can be seen towards the bottom of the image.

**Figure 2.** Network architecture for the normal and the modified tower network. The observational data have the dimensions $40\times 40\times 12$, and the NWP data $40\times 40\times 6$. The specific parameters used in this work are listed in Table 1.

**Figure 3.** One sample of observation data. Historical observations are observations from the input times, shown here in yellow. The NWP data are forecast data valid in the output times, here shown in orange.

**Figure 4.** Root mean squared error of the tested models and meteorological baselines averaged over the spatial grid.

**Figure 5.** Example of temperature forecasts from the different models with the ground truth for reference. Each row corresponds to a model, and each column to an hour.

**Figure 6.** Network architecture for the CNN. The input data are made up of historical observations, NWP forecasts and auxiliary inputs such as land area fraction and altitude. The inputs are stacked, with the resulting dimensions being $40\times 40\times 24$.

**Figure 7.** Network architecture for the convolutional LSTM. The input data are made up of historical observations from the previous day, historical observations from the 6 h prior to the predicted times, and NWP forecasts. These three inputs are treated like channels, such that the resulting dimensions of the input data are $6\times 40\times 40\times 3$.

**Figure 8.** Network architecture for the tower network with primary input made up of stacked observational and NWP data ($40\times 40\times 18$), and auxiliary input made up of land area fraction and altitude from the observations and NWP data set, as well as sine and cosine values corresponding to day of the year mapped to values between 0 and $2\pi$ ($40\times 40\times 6$). The construction of a tower remains the same as what is shown in Figure 2.
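The sine and cosine features in Figure 8 are a standard cyclical encoding of the day of the year. A minimal sketch of one possible mapping; the exact offset and leap-year handling are not specified in the text, so the details below are assumptions:

```python
import math

def day_of_year_features(day, days_in_year=365):
    """Map a 1-based day of the year to an angle in [0, 2*pi) and
    return the corresponding (sine, cosine) pair."""
    angle = 2.0 * math.pi * (day - 1) / days_in_year
    return math.sin(angle), math.cos(angle)

# Day 1 and day 365 land next to each other on the unit circle, so
# the turn of the year does not create an artificial discontinuity.
print(day_of_year_features(1))  # (0.0, 1.0)
```

Encoding the day as a raw integer would put 31 December and 1 January at opposite ends of the feature range; the circular encoding keeps neighbouring calendar days close in feature space.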

**Figure 9.** Boxplot showing the root mean squared error of the 15 realizations of each model, averaged over the spatial grid.

**Figure 10.** Training time per epoch for the 15 realizations of CNN, convLSTM and tower network in minutes.

**Figure 11.** Violin plot of the memory (in GB) utilized when training the models. Each violin represents the memory usage of 15 realizations of a model. From the left, there is CNN in red, convLSTM in blue and the tower network in green.

**Figure 12.** Moving average with a window of 5 of the normalized validation error of each model as a function of number of epochs. Values are averages for the 15 realizations of each model type.

**Figure 13.** Comparison of the RMSE of the best realization of each neural network and meteorological baselines, all averaged over the spatial grid.

**Figure 14.** Permutation feature importance of the best CNN, i.e., the error resulting from the permutation of one parameter, leaving the remaining parameters untouched.

**Figure 16.** Permutation feature importance of the best tower network, i.e., the error resulting from the permutation of one parameter, leaving the remaining parameters untouched.
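Figures 14–16 use permutation feature importance: one input feature is shuffled while all others are left untouched, and the resulting growth in error is recorded. A generic sketch of the procedure on tabular data; this is not the authors' implementation, and `model_fn` and `metric_fn` are hypothetical placeholders:

```python
import numpy as np

def permutation_importance(model_fn, X, y, metric_fn, n_repeats=5, seed=0):
    """Shuffle one feature column at a time and record how much the
    error grows relative to the unpermuted baseline."""
    rng = np.random.default_rng(seed)
    baseline = metric_fn(y, model_fn(X))
    importances = []
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy feature j only
            scores.append(metric_fn(y, model_fn(Xp)))
        importances.append(float(np.mean(scores) - baseline))
    return baseline, importances

# Toy model that only looks at feature 0: permuting the unused
# feature 1 leaves the error unchanged.
rmse = lambda y, p: float(np.sqrt(np.mean((y - p) ** 2)))
X = np.column_stack([np.linspace(0.0, 1.0, 100), np.ones(100)])
y = X[:, 0].copy()
base, imp = permutation_importance(lambda X: X[:, 0], X, y, rmse)
```

A large increase in error after permuting a feature indicates that the model relies heavily on it; an increase near zero indicates the feature is effectively ignored.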

| | | Filters 1 | Filters 2 | Kernel Size 1 | Kernel Size 2 | Strides |
|---|---|---|---|---|---|---|
| Tower 1 | Basic block 1 | 64 | 32 | 8 | 3 | 1, 1 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1, 1 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1, 1 |
| | Transposed 2D | 64 | - | 4 | - | 2, 2 |
| Tower 2 | Basic block 1 | 64 | 32 | 8 | 3 | 1, 2 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1, 2 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1, 1 |
| | Transposed 2D | 64 | - | 4 | - | 2, 8 |
| Tower 3 | Basic block 1 | 64 | 32 | 8 | 3 | 1, 4 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1, 4 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1, 1 |
| | Transposed 2D | 64 | - | 4 | - | 2, 20 |
| Tower 4 | Basic block 1 | 64 | 32 | 8 | 3 | 1, 1 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1, 1 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1, 1 |
| | Transposed 2D | 64 | - | 4 | - | 2, 2 |
| Final convolutional layer | | 6 | - | 8 | - | - |

| | | Filters 1 | Filters 2 | Kernel Size 1 | Kernel Size 2 | Stride |
|---|---|---|---|---|---|---|
| Tower 1 | Basic block 1 | 64 | 32 | 8 | 3 | 1 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1 |
| | Transposed 2D | 64 | - | 4 | - | 2 |
| Tower 2 | Basic block 1 | 64 | 32 | 8 | 3 | 2 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 2 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1 |
| | Transposed 2D | 64 | - | 4 | - | 8 |
| Tower 3 | Basic block 1 | 64 | 32 | 8 | 3 | 4 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 4 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1 |
| | Transposed 2D | 64 | - | 4 | - | 20 |
| Tower 4 | Basic block 1 | 64 | 32 | 8 | 3 | 1 |
| | Basic block 2 | 64 | 32 | 3 | 3 | 1 |
| | Residual blocks | 64 | 32 | 3 | 3 | 1 |
| | Transposed 2D | 64 | - | 4 | - | 2 |
| Final convolutional layer | | 6 | - | 8 | - | - |
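The strides in the tables above determine how strongly each tower downsamples the $40\times 40$ grid before its transposed convolution scales the field back up. Standard convolution arithmetic makes the bookkeeping explicit; the padding values below are illustrative assumptions, not taken from the paper:

```python
def conv_out(n, kernel, stride, pad=0):
    """Spatial size (per dimension) after a 2D convolution."""
    return (n + 2 * pad - kernel) // stride + 1

def tconv_out(n, kernel, stride, pad=0):
    """Spatial size (per dimension) after a transposed 2D convolution."""
    return (n - 1) * stride - 2 * pad + kernel

# A stride-2 convolution roughly halves a 40 x 40 grid, and a
# stride-2 transposed convolution with kernel 4 restores it:
print(conv_out(40, 3, 2, pad=1))   # 20
print(tconv_out(20, 4, 2, pad=1))  # 40
```

Towers with larger strides see a coarser, wider-context version of the field, and their larger transposed-convolution strides bring each branch back to a common output resolution.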

| | Min | Median | Max |
|---|---|---|---|
| CNN | 1.01 | 1.64 | 2.91 |
| convLSTM | 34.8 | 45.7 | 64.9 |
| Tower network | 16.3 | 21.9 | 32.0 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Eide, S.S.; Riegler, M.A.; Hammer, H.L.; Bremnes, J.B.
Deep Tower Networks for Efficient Temperature Forecasting from Multiple Data Sources. *Sensors* **2022**, *22*, 2802.
https://doi.org/10.3390/s22072802
