Proceeding Paper

Comparative Analysis of Temperature Prediction Models: Simple Models vs. Deep Learning Models †

Department of Computer Science, Metropolitan College, Boston University, Boston, MA 02215, USA
*
Author to whom correspondence should be addressed.
Presented at the 11th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 16–18 July 2025.
Comput. Sci. Math. Forum 2025, 11(1), 6; https://doi.org/10.3390/cmsf2025011006
Published: 30 July 2025

Abstract

Accurate and computationally efficient temperature prediction models have important applications in meteorological science, agriculture, energy, and electricity. This study compares the performance of simple models and deep learning models in temperature prediction and explores whether simple models can replace deep learning models in specific scenarios to save computing resources. Based on 37 years of daily temperature time series data from 10 cities (1987–2024), the Simple Moving Average (SMA), Seasonal Average Method with Lookback Years (SAM-Lookback), and Long Short-Term Memory (LSTM) models are fitted to evaluate the accuracy of simple models and deep learning models in temperature prediction. The performance of the models is compared by calculating the RMSE and Percentage Error for each city. The results show that LSTM achieves higher accuracy in most cities, although SMA performs comparably well, while SAM-Lookback is relatively weak.

1. Introduction

At present, mainstream weather forecasting models mainly use numerical weather prediction models (NWP) [1]. Such models simulate the weather evolution process based on physical equations such as atmospheric dynamics and thermodynamics, and are widely used for global and regional weather forecasting. For example, GFS (Global Forecast System), ECMWF (European Centre for Medium-Range Weather Forecasts), and WRF (Weather Research and Forecasting Model) all belong to this category [2].
However, deep learning models are gradually changing the field of weather forecasting and becoming an important development direction for the future [3]. For example, Google DeepMind’s GraphCast, based on graph neural networks (GNNs), has surpassed traditional NWP methods in short-term forecasting accuracy and demonstrated strong data-driven forecasting capabilities. For short-term weather forecasts (such as those from a few hours to a few days), LSTM’s prediction accuracy in local areas is particularly outstanding [4,5,6,7]. However, deep learning models (such as RNNs and LSTM) consume a lot of computing resources and usually need to store a large number of intermediate states and gradients, resulting in much higher memory and computing costs than SMA and SAM-Lookback, which increases the difficulty of deployment and maintenance [8,9].
In contrast, SMA and SAM-Lookback have shorter training times, fewer computing requirements, and do not require complex optimization processes. They are simpler to implement and are especially suitable for short-term trend analysis [10]. In certain scenarios, such models still have sufficient prediction accuracy, especially when temperature changes are relatively stable [11], and can effectively remove noise or extreme values [12]. Therefore, SMA and SAM-Lookback are still feasible prediction methods in cities with relatively stable climates, providing a computing resource-friendly alternative [13,14].
Therefore, this paper compares the performance of the three models in a temperature prediction task to explore whether the simple model can achieve the same accuracy as the deep learning model in specific situations. If the simple model also performs well, it will help promote the application of hybrid modeling and provide a more efficient and low-cost prediction solution for scenarios with limited computing resources.

2. Datasets of Historical Weather

This study used the daily average temperature data of ten cities in the United States collected by the author, in degrees Fahrenheit, covering a time span of 37 years from 1987 to 2024. These cities are geographically diverse, including Boston, New York, and Miami on the east coast, San Francisco, Los Angeles, and Portland on the west coast, and Minneapolis, Dallas, Phoenix, and Kansas City in the central region. The data was preprocessed, with missing values filled with the average temperature of the two days before and after and outliers removed to ensure the quality and reliability of the data.
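The gap-filling step described above can be sketched as follows. This is a minimal illustration under stated assumptions (the function name and the single-day-gap handling are ours, not the authors' code):

```python
import numpy as np

def fill_missing(temps):
    """Fill single-day gaps with the mean of the adjacent days.

    `temps` is a 1-D sequence of daily average temperatures with NaN for
    missing days; a missing value is replaced by the average of the day
    before and the day after, mirroring the preprocessing described above.
    """
    temps = np.asarray(temps, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(temps)):
        if 0 < i < len(temps) - 1:
            temps[i] = (temps[i - 1] + temps[i + 1]) / 2.0
    return temps
```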

3. Model and Evaluation Method

This study used three prediction models to fit temperature data, including two simple models—Simple Moving Average (SMA) and Seasonal Average Method with Lookback Years (SAM-Lookback)—and a deep learning model: Long Short-Term Memory (LSTM) [15]. These models combine statistical methods with machine learning techniques to effectively analyze and predict time series data. Each method has specific application scenarios, advantages, and limitations, as described below. In addition, this paper uses the root mean square error (RMSE) and Percentage Error to evaluate the prediction accuracy of each model to provide a more intuitive comparison [16].

3.1. Simple Moving Average (SMA)

First, SMA is a basic time series forecasting method that smooths data fluctuations by calculating the average temperature within a fixed window w to capture the overall trend [12]. At the same time, SMA uses historical seasonal patterns to optimize the forecast of periodic temperature trends, which is applicable to the agricultural and renewable energy fields to help formulate planting plans or optimize energy scheduling [11].
For a time series $T = \{T_1, T_2, \ldots, T_n\}$, the SMA on day $t$ is defined as
$$\mathrm{SMA}_t = \frac{1}{w} \sum_{i=t-w+1}^{t} T_i, \quad \text{where } w \le t \le n.$$
In other words, it is an average of the previous w days. SMA is simple to calculate and easy to implement. It is suitable for areas with relatively stable temperature changes and has low requirements for computing resources, making it an ideal choice for short-term forecasts. However, SMA cannot capture long-term dependencies, is slow to respond to sudden weather changes, and is sensitive to window size, w, making it difficult to adapt to complex climate patterns. Larger windows may cause over-smoothing of data and mask key trends, while smaller windows may retain too much noise and affect forecast accuracy. Therefore, SMA is more suitable for temperature forecasting in scenarios with limited computing resources or under stable climate conditions but has limited effect in long-term forecasts or when the weather fluctuates drastically. In this study, we tried different window sizes w from 1 to 5 to evaluate the forecast accuracy of SMA under different parameter settings and determine its optimal window size.
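As a rough sketch of the procedure above (function names and alignment conventions are illustrative, not the authors' implementation), an SMA forecast and the window search over $w = 1, \ldots, 5$ could look like:

```python
import numpy as np

def sma_forecast(temps, w):
    """Forecast each day as the mean of the previous w days.

    Returns the predictions together with the aligned actual values
    (the first w days have no forecast).
    """
    temps = np.asarray(temps, dtype=float)
    # Moving average of each length-w window; drop the last window,
    # which has no following day to predict.
    preds = np.convolve(temps, np.ones(w) / w, mode="valid")[:-1]
    return preds, temps[w:]

def best_window(temps, windows=range(1, 6)):
    """Pick the window w in 1..5 that minimises the RMSE, as in the study."""
    def rmse(w):
        preds, actual = sma_forecast(temps, w)
        return np.sqrt(np.mean((actual - preds) ** 2))
    return min(windows, key=rmse)
```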

3.2. Seasonal Average Method with Lookback Years (SAM-Lookback)

SAM-Lookback is a method for weather forecasting by looking back at the average temperature at the same time point in multiple years, aiming to capture seasonal and cyclical temperature changes [17]. The method predicts the temperature on a specific date d by averaging the temperature records of the same calendar day in k backward years:
$$\hat{T}_d = \frac{1}{k} \sum_{j=1}^{k} T_{d,j}, \quad \text{where } k \ge 1,$$
and $T_{d,j}$ is the temperature on calendar day $d$, $j$ years back.
For example, to predict the temperature for June 1 of this year, we compute the average temperature for June 1 in the previous k years.
The advantage of the SAM-Lookback model is that it is suitable for areas with obvious seasonality, especially agricultural areas where weather patterns are crucial in crop planning. By integrating historical data, the method is able to capture regular cyclical trends and is very suitable for long-term seasonal forecasting. However, the method relies on a sufficient amount of historical data and its performance may deteriorate in areas with irregular temperature patterns or drastic climate changes. In reality, SAM-Lookback is widely used in agriculture and renewable energy to predict seasonal temperature changes and optimize production scheduling.
To evaluate its effectiveness, this study varied k from 1 to 5 years, ignoring February 29 in leap years. The best lookback period k was then determined based on forecast accuracy.
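A minimal sketch of this method, assuming the data are organised as one array of daily temperatures per year with February 29 already dropped (the data layout and names are ours, not the authors'):

```python
import numpy as np

def sam_lookback(history, k):
    """Predict each calendar day of the most recent year as the mean of
    the same calendar day over the previous k years.

    `history` maps year -> sequence of daily temperatures (one value per
    calendar day, Feb 29 removed).
    """
    years = sorted(history)
    past = np.stack([history[y] for y in years[-1 - k:-1]])  # the k prior years
    return past.mean(axis=0)  # \hat{T}_d for each day d of the target year
```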

3.3. Long Short-Term Memory (LSTM)

Temperature prediction is a typical time series modeling task. Traditional statistical methods (such as ARIMA and exponential smoothing) have limitations in dealing with complex nonlinear relationships and long-term dependencies, while recurrent neural networks (RNNs) based on deep learning perform better in capturing dynamic patterns in time series data [7].
Based on the RNN structure, LSTM effectively alleviates the long-term dependency problem by introducing memory units and gating mechanisms, making it particularly suitable for complex data sets with nonlinear time relationships [6]. The advantages of LSTM include the ability to learn long-term time dependencies, being suitable for data with obvious time series patterns, and being more adaptable than traditional statistical methods. However, LSTM also has problems such as high computational overhead, long training time, and sensitivity to hyperparameters [18,19]. It may overfit or make unstable predictions when data is insufficient or trends change drastically.
This study used the LSTM model to fit historical temperature data to explore the time-dependent patterns of temperature and improve the prediction accuracy. By training the LSTM network, we hoped that the model could capture long-term trends and short-term fluctuations in temperature, thereby achieving a reasonable estimate of future temperature.
For a given time step t, the LSTM cell updates are defined as follows:
$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), && \text{(forget gate)}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), && \text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), && \text{(candidate state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, && \text{(cell state)}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t), && \text{(hidden state)}
\end{aligned}
$$
where
- $x_t$: input vector at time $t$;
- $h_t$: hidden state;
- $C_t$: cell state;
- $f_t, i_t, o_t$: forget, input, and output gate activations;
- $W_f, W_i, W_C, W_o$: weight matrices;
- $b_f, b_i, b_C, b_o$: biases;
- $\sigma$: sigmoid activation function;
- $\tanh$: hyperbolic tangent activation function;
- $\odot$: element-wise multiplication.
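The LSTM cell update described above can be sketched as a single NumPy forward step. This is a didactic illustration of the gate equations, not the training code used in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following the gate equations above.

    W and b hold the weights/biases of the forget ("f"), input ("i"),
    candidate ("C"), and output ("o") transforms; each W[g] multiplies
    the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state
    return h_t, c_t
```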
In this study, $T_{\mathrm{AVG}}$ (daily average temperature) was used as the input feature and was normalized to stabilize the learning process. The LSTM model consists of one LSTM layer with 50 units followed by a fully connected layer. The Adam optimizer was used with its default learning rate of 0.001. A time step of $l = 10$ days was used as input to capture the temporal dependency of the temperature data. The dataset was divided into a training set (80%) and a test set (20%).
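The input preparation described above (min-max normalization, length-10 windows, 80/20 split) can be sketched as follows; the function name and the choice of min-max scaling are our assumptions, since the paper only says the data were normalized:

```python
import numpy as np

def make_windows(temps, l=10, train_frac=0.8):
    """Normalize to [0, 1], build sliding windows of length l, and split
    chronologically into training and test sets."""
    temps = np.asarray(temps, dtype=float)
    scaled = (temps - temps.min()) / (temps.max() - temps.min())
    # Each sample is l consecutive days; the target is the following day.
    X = np.stack([scaled[i:i + l] for i in range(len(scaled) - l)])
    y = scaled[l:]
    split = int(train_frac * len(X))
    return (X[:split], y[:split]), (X[split:], y[split:])
```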

3.4. Evaluation Metric: RMSE

The RMSE (root mean square error) is a commonly used indicator to measure the difference between a predicted value and the actual value. It imposes a higher penalty on larger errors and can therefore effectively reflect the performance of a model in overall prediction.
The root mean squared error (RMSE) was employed as the performance evaluation metric for all three methods. The RMSE quantifies the prediction error and is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(T_i - \hat{T}_i\right)^2},$$
where T i represents the observed temperature, T ^ i is the predicted temperature, and n is the total number of observations.
The RMSE provides a consistent and interpretable measure of prediction accuracy, enabling a fair comparison of the SMA, SAM-Lookback, and LSTM methods. However, the RMSE is sensitive to outliers and can distort the overall evaluation, especially in temperature prediction, where extreme errors significantly affect the result. To address this, Percentage Error emphasizes the relative accuracy of a model and avoids the excessive influence that extreme values exert on the RMSE. Combining the two metrics reflects model performance more comprehensively.
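As a minimal sketch (the function name is illustrative), the RMSE defined above can be computed as:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between observed and predicted temperatures."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```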

3.5. Evaluation Metric: Percentage Error

In temperature prediction, Percent Error provides a relative error measure that directly shows the size of the prediction error relative to the actual value, expressed as a percentage. This makes it easier to understand than the RMSE, especially when the size of the prediction error is related to the scale of the actual temperature value. Compared with the RMSE, which focuses on the absolute value of the error, Percent Error can better reflect the relative importance of the error.
In this study, we set three thresholds for the Percentage Error evaluation: 5%, 8%, and 10%. For each day, if the absolute prediction error is less than the threshold fraction of that day's true value, the day is marked "+", indicating a relatively accurate prediction; if the error exceeds the threshold, it is marked "−", indicating a large prediction error. The overall prediction accuracy of a model is the proportion of days marked "+" among all days. The calculation formula is
$$\text{Percentage Error} = \frac{|T_i - \hat{T}_i|}{T_i} \times 100\%,$$
where $T_i$ represents the observed temperature and $\hat{T}_i$ is the predicted temperature.
The label is given by comparing the Percentage Error of each day with the set threshold. Percentage Error provides an intuitive measure of the relative prediction accuracy, expressed as a percentage of the actual value, allowing for a clearer understanding of the magnitude of errors in the predicted temperature values.
The combination of these two evaluation indicators can provide a more comprehensive understanding of the performance of the model.
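The threshold-based "+"/"−" labeling described above can be sketched as follows (a minimal illustration with names of our choosing):

```python
import numpy as np

def threshold_accuracy(actual, predicted, threshold=0.05):
    """Fraction of days whose relative error |T - T_hat| / T falls below
    the threshold, i.e., the share of days marked "+" in the scheme above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_err = np.abs(actual - predicted) / np.abs(actual)
    return float(np.mean(pct_err < threshold))
```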

4. Results

This section introduces the fitting results of the daily average temperature data of ten cities over the past 37 years using the SMA, SAM-Lookback, and LSTM models and presents these results using two evaluation indicators: the RMSE and Percentage Error.

4.1. Simple Moving Average (SMA)

4.1.1. RMSE for SMA

The optimal window size was found by adjusting the window size w of SMA, ranging from 1 to 5. Table 1 summarizes the RMSE corresponding to the optimal window size for each city, and the cities are ranked from small to large according to the RMSE.
From the results in Table 1, we can see that the optimal window size w of the SMA method is 2 in all cities, which means that in all experiments, using a 2-day window to calculate the sliding average could effectively minimize the RMSE. In addition, the RMSE values in the table show that the RMSEs of LA and SF are relatively low, at 1.4191 and 1.5204, respectively, while the RMSEs of Kansas and Dallas are relatively high, at 3.3290 and 3.2564, respectively. This shows that in some cities, the SMA method could provide more accurate predictions, while in other cities, the RMSE is high, which may mean that the method had poor prediction accuracy for temperature data.
It is worth noting that although the RMSE of different cities varies greatly, the optimal window size is 2 in all cities, indicating that the SMA method was relatively consistent in the choice of window size on these data sets. This result also reflects the stability of the SMA method in time series prediction, especially when adjusting the window size; its performance was not significantly affected by the window setting being too large or too small.

4.1.2. Percentage Error for SMA

Table 2 summarizes the temperature prediction accuracy of SMA at thresholds of 5%, 8%, and 10%, with the window size w ranging from 1 to 5.
According to Table 2, we can see that the optimal window size w * for all cities under different thresholds is always 1, which means that a smaller window size provides the best prediction performance. This is because a smaller window can better capture the dynamic characteristics of recent temperature changes, thereby avoiding over-smoothing past data. For time series with seasonality and volatility such as temperature data, a smaller window can respond to changes more quickly and improve the accuracy of short-term predictions.
For the 5% threshold, Phoenix, SF, and LA show high accuracy, with prediction accuracy of more than 70% in these cities, especially LA, which has an accuracy of 78.79%. However, the performance for Minneapolis, Boston, New York, and Kansas is not very good, with accuracy rates less than 50%. When the threshold is increased to 8% or even 10%, the accuracy of all cities increases. At the 8% threshold, except for Minneapolis, which has an accuracy of 48.51%, all other cities have an accuracy of more than 50%. Similarly, at the 10% threshold, except for Minneapolis, which has an accuracy of 56.00%, all other cities have an accuracy of more than 60%, especially Phoenix, SF, and LA, whose accuracy is as high as about 95%.
Through the above RMSE and Percentage Error evaluation results of the SMA model, we can see that cities with a smaller RMSE or higher accuracy in Percentage Error, such as LA, SF, Miami, and Phoenix, are usually located in warm or tropical climates, with stable temperature changes and weak seasonality. The climate characteristics of these cities enabled the SMA method to effectively capture temperature trends, thereby reducing prediction errors. The sliding average method is suitable for these cities because their temperature fluctuations are small and suitable for prediction using simpler models. According to the data, the lowest temperatures in these cities over the past 37 years were between 30 and 40 degrees Fahrenheit, while the highest temperatures were between 90 and 100 degrees Fahrenheit.
For cities with a larger RMSE or lower accuracy in Percentage Error, such as Kansas, Dallas, Minneapolis, and Boston, their highest temperatures are also between 90 and 100 degrees Fahrenheit, but the lowest temperatures are around 0 degrees Fahrenheit, except that for Minneapolis, which is below −20 degrees Fahrenheit. Therefore, these cities have more drastic temperature changes and significant seasonal fluctuations. Since the climate characteristics of these cities are more complex, the SMA method makes it difficult to accurately fit large fluctuations and abnormal weather, resulting in large prediction errors. Therefore, cities with extreme climate change or strong seasonal changes are less adaptable to the SMA method and show higher RMSE values.
Through Figure 1, we can observe the trend of the accuracy of the 10 cities changing with the threshold when the optimal window size of SMA is 1. First, the accuracy of all cities increases with the increase in threshold, among which Los Angeles, San Francisco, and Phoenix have significantly higher accuracy than other cities, while Portland’s accuracy shows a significant improvement. However, the accuracy of Miami does not increase significantly with the increase in threshold, which may be related to its climate characteristics and the stability of temperature changes. Miami has a tropical climate with relatively stable temperature and small changes, so the model could capture the temperature trend more accurately during the prediction process. This means that as the threshold increases, the error tolerance range increases. Although larger deviations can be tolerated, the prediction accuracy of the model is already high due to the small temperature change itself, and the impact of further increasing the threshold on the results becomes limited. In other words, the prediction accuracy of Miami may be close to the limit of SMA. Within a more tolerant error range, the model is very stable, so even if the threshold is increased, the accuracy will not change significantly. This situation is more common in cities with relatively stable climates, because the temperature itself fluctuates less and the model’s prediction error is also lower.

4.2. Seasonal Average Method with Lookback Years (SAM-Lookback)

4.2.1. RMSE for SAM-Lookback

The Seasonal Average Method was evaluated by varying the number of lookback years k from 1 to 5. Table 3 summarizes the RMSE and the optimal lookback years for each city. Similarly, the cities are ranked from small to large according to the RMSE value.
The results show that the optimal number of lookback years for all cities is 5 years; that is, when $k = 5$, the RMSE values of all cities reach their minimum. Among these cities, Miami, Los Angeles, and San Francisco have the lowest RMSEs (4.2938, 4.3686, and 4.3956, respectively), while Dallas, Kansas, and Minneapolis perform worst, with RMSEs of 8.1630, 9.4926, and 9.7471, respectively. This result is roughly consistent with the cities' performance under the SMA model. It can be seen that the SAM-Lookback model is also better suited to cities with relatively stable seasonal temperature patterns, while for inland cities or areas with large climate fluctuations, its predictions are less effective.

4.2.2. Percentage Error for SAM-Lookback

Table 4 summarizes the temperature prediction accuracy of SAM-Lookback at thresholds of 5%, 8%, and 10%, with the lookback years k ranging from 1 to 5.
The results show that the best lookback years differ across cities, mainly because of differences in climate characteristics and temperature change patterns. Nevertheless, at a low threshold such as 5%, the best lookback years for most cities are concentrated at 3 and 4 years; at a moderate threshold of 8%, the best lookback period for most cities is 3 years; and at a high threshold of 10%, the best lookback years concentrate at 2 years. This is because as the threshold increases, the SAM-Lookback model's tolerance for error increases, which affects the selection of the best lookback period. Under a lower threshold, the model requires more lookback years (3 or 4) to accurately capture the long-term trend of temperature change; as the threshold increases, the model's tolerance grows and the best lookback period shrinks to 3 years. At the highest threshold, error tolerance increases further, and the model tends toward fewer lookback years (2) to better adapt to the prediction task and reduce computational complexity.
Similarly, under the Percentage Error metric, the performance of each city with the SAM-Lookback model is roughly the same as with the SMA model. Los Angeles, San Francisco, and Phoenix all performed well under the three thresholds; in particular, at the 10% threshold, the accuracy of these three cities reached about 70% or higher. However, the performance for Minneapolis, Kansas, and Boston was still weak, with accuracy not exceeding 50% even at the 10% threshold.
As can be seen from Figure 2, the trends and rankings of the accuracy of the SAM-Lookback model and the SMA model in the 10 cities are very similar. Specifically, although there is a gap in the accuracy of the two models themselves, their performance trends under different thresholds are basically the same.

4.3. Long Short-Term Memory (LSTM)

4.3.1. RMSE for LSTM

Table 5 summarizes the RMSE for each city for the LSTM model. Similarly, the cities are ranked from small to large according to the RMSE value.
The results show that under the LSTM model, the ranking of the RMSE among cities does not differ much from that under the SMA and SAM-Lookback models. Los Angeles, San Francisco, and Miami continue to perform well, with RMSEs below 3 (2.6113, 2.9035, and 2.9975, respectively). However, the RMSEs for Minneapolis, Kansas, and Dallas remain unsatisfactory (6.0942, 6.5876, and 8.3308, respectively).

4.3.2. Percentage Error for LSTM

Table 6 summarizes the temperature prediction accuracy of LSTM at thresholds of 5%, 8%, and 10%.
As can be seen from Table 6, the performance of the 10 cities in Percentage Error under the LSTM model is similar to their performance in RMSE. The best performers are Miami and Los Angeles: at the 5% threshold, their prediction accuracy exceeds 80% (86.91% and 80.89%, respectively). As the threshold increases to 8% or 10%, their accuracy rises above 90%. Other cities with accuracy above 90% include Phoenix and San Francisco. In contrast, Minneapolis performs poorly in Percentage Error; its prediction accuracy does not exceed 60% even when the threshold is raised to 10%.
From Figure 3, we can more intuitively see the change in the prediction accuracy of each city as the threshold increases. San Francisco’s accuracy performance is not outstanding at a threshold of 5%. However, when the threshold is increased, the accuracy jumps sharply to more than 90%. This shows that LSTM has a strong ability to capture the long-term trend of San Francisco data, but it may be affected by short-term fluctuations at a low threshold, resulting in large errors. As the threshold increases, the error tolerance of the model increases, and the advantage of LSTM in overall trend prediction is reflected. For Miami, Los Angeles, and Phoenix, the prediction accuracy does not change much when the threshold increases from 8% to 10%. This shows that LSTM’s learning of the temperature patterns of these cities is relatively stable, and the errors are mainly concentrated within 8%. The model captures its trends more accurately, and the higher threshold does not significantly affect the prediction performance, indicating that its error distribution is more concentrated and the prediction results are more consistent.

5. Comparison of the Accuracy of the Three Models

In this section, we will compare the RMSE and Percentage Error performance of 10 cities under the simple models, SMA and SAM-Lookback, and the deep learning model, LSTM. By analyzing the prediction results of different models, we explore which model performed better in temperature prediction and whether there were significant differences in its performance for different cities.

5.1. Comparison of Model Accuracy Using RMSE

As can be seen in Figure 4, among all cities, the SMA model had the smallest RMSE, while SAM-Lookback had the largest. In terms of RMSE, although SMA as a simple model cannot capture long-term trends, it performed stably in a low-noise environment, and so achieved the smallest RMSE in all cities. In contrast, SAM-Lookback used a longer historical window for prediction, which introduced noise or lag effects and made it difficult for the model to adapt to short-term changes; it therefore had the largest RMSE in all cities, indicating that long time windows are not always advantageous. As a deep learning model, LSTM should in theory have performed better on this nonlinear temperature prediction task, but its RMSE was not lower than SMA's. This shows that for simple temperature prediction problems, SMA as a simple statistical method is accurate enough and can effectively capture the trends in the data, whereas complex models such as LSTM may not outperform SMA when the data features are relatively simple or change little.

5.2. Comparison of Model Accuracy Using Percentage Error

As can be seen from Figure 5, Figure 6 and Figure 7, the LSTM model had the best overall performance under the three thresholds in Percentage Error, especially in Miami, where the prediction accuracy of LSTM was far ahead of that of the SMA and SAM-Lookback models. This performance was mainly due to the fact that LSTM could effectively capture long-term dependencies and complex nonlinear patterns in the data. However, the prediction accuracy of the SMA model was similar to that of LSTM in most cities. Except for in Miami, SMA’s performance was only slightly inferior, still showing excellent results. This shows that although SMA cannot handle long-term dependencies, it is accurate enough for these simple temperature data with little fluctuation. In contrast, the performance of the SAM-Lookback model under different thresholds was always the worst, failing to reach the level of LSTM and SMA.
From the perspective of temperature prediction accuracy, we analyzed in which cities the excellent simple model SMA had more accurate results. Similarly, we can conclude from Figure 5, Figure 6 and Figure 7 that Los Angeles, San Francisco, and Phoenix had the best accuracy at all thresholds; even at the most stringent 5% threshold, they all reached more than 70%. And at the 8% and 10% thresholds, they were all above 90%. Similarly, Portland’s accuracy was also above 70% and 80% at these two thresholds.
The common feature of Los Angeles, San Francisco, and Portland is that they are located on the west coast of the United States and have mild marine climates: Los Angeles and San Francisco have Mediterranean climates, while Portland lies in the temperate marine climate zone. In addition, all three are moderated by Pacific maritime air, which keeps temperature changes in these cities relatively stable. Especially in summer, this maritime buffering effectively reduces extreme temperature fluctuations, resulting in a smaller temperature range. The ocean air currents regulate the temperature, and the mild climate makes temperature in these cities relatively easy to predict, which simplified the models' task and improved prediction accuracy. Phoenix is located in the desert of the southwestern United States and has a typically arid climate, with hot, dry summers and mild winters; its overall temperature changes are relatively stable, with regular seasonal fluctuations. Therefore, the SMA model could also perform well for Phoenix.

6. Conclusions

This study explored the short-term temperature prediction capabilities of simple models under specific conditions. The results showed that the SMA model performed particularly well in areas with relatively stable temperature changes under RMSE and Percentage Error evaluation, fully demonstrating its ability to reduce noise and identify short-term trends. However, in cities or regions with large temperature fluctuations, SMA’s reliance on a fixed window size limits its adaptability to sudden or rapid temperature changes, which is consistent with the results of some studies [20]. This limitation further highlights the need for more flexible methods to improve prediction accuracy in areas with drastic climate fluctuations.
The performance of another simple model, SAM-Lookback, was not good. From the data results of this study, although the model has the ability to use longer historical data for prediction, SAM-Lookback failed to show a relatively accurate short-term temperature prediction capability in any city or region, whether in the evaluation of RMSE or Percent Error. This may be because the longer lookback window introduces more noise or lag effects, making it difficult for the model to adapt to short-term temperature changes.
In this study, LSTM performed worse than SMA in the RMSE evaluation of temperature prediction, but it performed similarly to SMA in the Percentage Error evaluation. Due to its high complexity, LSTM failed to show a clear advantage in short-term prediction. In addition, data scale, hyperparameter settings, and local trend characteristics of temperature data may also limit the performance of LSTM. However, in larger-scale meteorological data with significant nonlinear patterns and long-term dependencies, LSTM still has strong application potential. It has been widely used in global climate forecasting, extreme weather warning, agricultural meteorological analysis, and other fields. It can effectively model complex temperature change trends and provide more adaptable prediction capabilities than traditional statistical methods. Especially under the synergy of multivariate meteorological factors, LSTM still has broad application prospects in the future.
In summary, LSTM's strength lies in its powerful sequence-modeling capacity, which should pay off on more complex weather analysis tasks such as extreme-weather forecasting and multivariate meteorological modeling. SMA, by contrast, achieves good results in low-noise environments precisely because of its simplicity. Based on this study, the SMA model is better suited to areas with stable temperature patterns, and it fits scenarios with modest accuracy requirements and limited computing, maintenance, and deployment capacity: regions with sparse historical records and low sampling frequency, such as remote villages or parts of developing countries; school- and community-level short-term forecasts used to plan outdoor activities without building complex meteorological models; or small agricultural weather stations scheduling irrigation, sowing, and fertilization.
Finally, future work could verify whether SMA retains its accuracy advantage in cities with marine or arid climates. Expanding the dataset to include more cities of these two climate types, along with representative cities from other climate types, would allow a fuller assessment of the SMA model's performance under different climatic conditions.

Author Contributions

Conceptualization, E.P.; methodology, E.P.; software, Z.W.; validation, W.Z., E.P., and Z.W.; formal analysis, W.Z., E.P., and Z.W.; investigation, W.Z.; resources, E.P.; data curation, W.Z.; writing—original draft preparation, Z.W.; writing—review and editing, Z.W. and W.Z.; visualization, Z.W.; supervision, E.P.; project administration, E.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was conducted without any external funding.

Data Availability Statement

The data supporting the reported results in this study are available on GitHub at https://github.com/WZibo/Research_Weather_Datasets (accessed on 29 July 2025).

Acknowledgments

All aspects of this study, including the design, data collection, analysis, and interpretation, were carried out using the resources available within the authors’ institution.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Al-Yahyai, S.; Charabi, Y.; Gastli, A. Review of the use of numerical weather prediction (NWP) models for wind energy assessment. Renew. Sustain. Energy Rev. 2010, 14, 3192–3198.
2. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538.
3. Bochenek, B.; Ustrnul, Z. Machine learning in weather prediction and climate analyses—Applications and perspectives. Atmosphere 2022, 13, 180.
4. Abdulla, N.; Demirci, M.; Ozdemir, S. Design and evaluation of adaptive deep learning models for weather forecasting. Eng. Appl. Artif. Intell. 2022, 116, 105440.
5. Hasan, M.M.; Hasan, M.J.; Rahman, P.B. Comparison of RNN-LSTM, TFDF and stacking model approach for weather forecasting in Bangladesh using historical data from 1963 to 2022. PLoS ONE 2024, 19, e0310446.
6. Poornima, S.; Pushpalatha, M. Prediction of rainfall using intensified LSTM based recurrent neural network with weighted linear units. Atmosphere 2019, 10, 668.
7. Suleman, M.A.R.; Shridevi, S. Short-term weather forecasting using spatial feature attention based LSTM model. IEEE Access 2022, 10, 82456–82468.
8. Shiri, F.M.; Perumal, T.; Mustapha, N.; Mohamed, R. A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. arXiv 2023, arXiv:2305.17473.
9. DiPietro, R.; Hager, G.D. Deep learning: RNNs and LSTM. In Handbook of Medical Image Computing and Computer-Assisted Intervention; Elsevier: Amsterdam, The Netherlands, 2020; pp. 503–519.
10. Stanulov, A.; Yassine, S. A comparative analysis of machine learning algorithms for the purpose of predicting Norwegian air passenger traffic. Int. J. Math. Stat. Comput. Sci. 2024, 2, 28–43.
11. Dwivedi, D.N.; Patil, G. Climate change: Prediction of solar radiation using advanced machine learning techniques. In Visualization Techniques for Climate Change with Machine Learning and Artificial Intelligence; Elsevier: Amsterdam, The Netherlands, 2023; pp. 335–358.
12. Meenal, R.; Michael, P.A.; Pamela, D.; Rajasekaran, E. Weather prediction using random forest machine learning model. Indones. J. Electr. Eng. Comput. Sci. 2021, 22, 1208–1215.
13. Asi, I.M. Laboratory comparison study for the use of stone matrix asphalt in hot weather climates. Constr. Build. Mater. 2006, 20, 982–989.
14. Rusdiana, S.; Yuni, S.M.; Khairunnisa, D. Comparison of rainfall forecasting in simple moving average (SMA) and weighted moving average (WMA) methods (case study at village of Gampong Blang Bintang, Big Aceh District-Sumatera-Indonesia). J. Res. Math. Trends Technol. 2020, 2, 21–27.
15. Hossain, M.S.; Mahmood, H. Short-term photovoltaic power forecasting using an LSTM neural network and synthetic weather forecast. IEEE Access 2020, 8, 172524–172533.
16. Wang, Y.; Zhang, N.; Chen, X. A short-term residential load forecasting model based on LSTM recurrent neural network considering weather features. Energies 2021, 14, 2737.
17. Kalekar, P.S. Time Series Forecasting Using Holt-Winters Exponential Smoothing; Kanwal Rekhi School of Information Technology: Powai, Mumbai, India, 2004; pp. 1–13.
18. Nakisa, B.; Rastgoo, M.N.; Rakotonirainy, A.; Maire, F.; Chandran, V. Long short-term memory hyperparameter optimization for a neural network-based emotion recognition framework. IEEE Access 2018, 6, 49325–49338.
19. Liao, L.; Li, H.; Shang, W.; Ma, L. An empirical study of the impact of hyperparameter tuning and model optimization on the performance properties of deep neural networks. ACM Trans. Softw. Eng. Methodol. 2022, 31, 53.
20. Markovics, D.; Mayer, M.J. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew. Sustain. Energy Rev. 2022, 161, 112364.
Figure 1. Temperature prediction accuracy for 10 cities at different thresholds when using the SMA method, with the best window size of 1.
Figure 2. Temperature prediction accuracy of 10 cities at different thresholds using the SAM-Lookback method, with the best window size of 3.
Figure 3. Temperature prediction accuracy of 10 cities at different thresholds using the LSTM method.
Figure 4. Comparison of SMA, SAM-Lookback, and LSTM (RMSE).
Figure 5. Comparison of SMA, SAM-Lookback, and LSTM (Percentage Error threshold = 5%).
Figure 6. Comparison of SMA, SAM-Lookback, and LSTM (Percentage Error threshold = 8%).
Figure 7. Comparison of SMA, SAM-Lookback, and LSTM (Percentage Error threshold = 10%).
Table 1. RMSE for SMA.

City        | RMSE   | Optimal Window Size
LA.         | 1.4191 | 2
SF.         | 1.5204 | 2
Miami       | 1.5585 | 2
Phoenix     | 1.7225 | 2
Portland    | 1.9175 | 2
New York    | 2.7109 | 2
Boston      | 3.1351 | 2
Minneapolis | 3.2248 | 2
Dallas      | 3.2564 | 2
Kansas      | 3.3290 | 2

Optimal window size w = 2 consistently minimized RMSE across all cities.
Table 2. Percentage Error for SMA.

City        | W = 1  | W = 2  | W = 3  | W = 4  | W = 5  | W*

Threshold = 5%
Boston      | 36.92% | 31.52% | 29.79% | 28.73% | 28.83% | 1
New York    | 43.80% | 39.86% | 38.05% | 37.19% | 36.42% | 1
LA.         | 78.79% | 74.78% | 72.05% | 70.65% | 69.42% | 1
SF.         | 71.05% | 67.53% | 65.26% | 64.19% | 63.06% | 1
Dallas      | 50.82% | 46.50% | 43.43% | 42.08% | 40.96% | 1
Kansas      | 41.12% | 35.46% | 32.36% | 31.11% | 30.25% | 1
Phoenix     | 75.06% | 67.93% | 63.03% | 59.99% | 58.02% | 1
Minneapolis | 34.02% | 29.97% | 27.46% | 26.50% | 25.58% | 1
Portland    | 54.00% | 49.02% | 46.25% | 44.56% | 43.44% | 1
Miami       | 59.45% | 57.27% | 56.04% | 55.20% | 54.47% | 1

Threshold = 8%
Boston      | 52.55% | 47.36% | 45.60% | 44.38% | 43.84% | 1
New York    | 60.38% | 56.50% | 54.82% | 53.93% | 53.41% | 1
LA.         | 91.80% | 88.70% | 86.87% | 85.88% | 84.93% | 1
SF.         | 88.66% | 85.68% | 84.10% | 82.99% | 82.20% | 1
Dallas      | 65.71% | 60.84% | 58.21% | 56.52% | 55.38% | 1
Kansas      | 56.31% | 50.75% | 48.04% | 45.74% | 45.38% | 1
Phoenix     | 89.82% | 84.77% | 81.14% | 78.95% | 77.10% | 1
Minneapolis | 48.51% | 44.27% | 41.36% | 39.58% | 38.43% | 1
Portland    | 75.34% | 70.20% | 66.27% | 64.52% | 63.16% | 1
Miami       | 65.58% | 63.86% | 62.98% | 62.56% | 62.45% | 1

Threshold = 10%
Boston      | 61.16% | 56.50% | 55.01% | 53.54% | 52.86% | 1
New York    | 68.60% | 65.09% | 63.45% | 62.95% | 62.35% | 1
LA.         | 95.65% | 93.69% | 92.10% | 91.34% | 90.71% | 1
SF.         | 94.28% | 92.49% | 90.98% | 90.16% | 89.52% | 1
Dallas      | 72.86% | 67.95% | 65.19% | 63.67% | 62.43% | 1
Kansas      | 64.07% | 58.77% | 56.19% | 54.34% | 53.54% | 1
Phoenix     | 94.42% | 90.88% | 88.07% | 86.40% | 85.00% | 1
Minneapolis | 56.00% | 51.92% | 48.68% | 47.07% | 45.96% | 1
Portland    | 84.01% | 79.73% | 76.95% | 75.24% | 73.87% | 1
Miami       | 67.53% | 66.33% | 65.60% | 65.27% | 65.05% | 1

W* represents the window size with the highest accuracy.
Table 3. RMSE for SAM-Lookback.

City        | RMSE   | Optimal Lookback Years
Miami       | 4.2938 | 5
LA.         | 4.3686 | 5
SF.         | 4.3956 | 5
Portland    | 5.7318 | 5
Phoenix     | 5.9209 | 5
New York    | 6.7722 | 5
Boston      | 7.6514 | 5
Dallas      | 8.1630 | 5
Kansas      | 9.4926 | 5
Minneapolis | 9.7471 | 5

RMSE: root mean squared error. k = 5 lookback years minimizes RMSE across all cities.
Table 4. Percentage Error for SAM-Lookback.

City        | k = 1  | k = 2  | k = 3  | k = 4  | k = 5  | K*

Threshold = 5%
Boston      | 20.39% | 22.54% | 22.98% | 22.45% | 21.86% | 3
New York    | 25.94% | 28.16% | 28.17% | 28.45% | 27.39% | 4
LA.         | 46.56% | 50.30% | 50.46% | 50.50% | 49.62% | 4
SF.         | 43.12% | 45.34% | 45.96% | 45.82% | 45.02% | 3
Dallas      | 28.17% | 30.58% | 31.16% | 30.92% | 30.03% | 3
Kansas      | 20.12% | 20.28% | 20.01% | 19.87% | 18.98% | 2
Phoenix     | 37.43% | 41.09% | 41.93% | 42.24% | 41.29% | 4
Minneapolis | 17.58% | 19.28% | 19.29% | 19.31% | 19.19% | 4
Portland    | 29.28% | 32.10% | 32.57% | 32.36% | 31.76% | 3
Miami       | 42.12% | 43.14% | 42.33% | 41.38% | 39.97% | 2

Threshold = 8%
Boston      | 31.81% | 34.74% | 34.57% | 34.39% | 33.24% | 2
New York    | 39.18% | 41.95% | 41.99% | 41.98% | 40.79% | 3
LA.         | 65.44% | 68.37% | 68.78% | 68.59% | 67.30% | 3
SF.         | 62.62% | 65.07% | 65.27% | 65.08% | 63.91% | 3
Dallas      | 41.09% | 44.28% | 44.67% | 43.65% | 42.85% | 3
Kansas      | 30.73% | 31.29% | 31.43% | 30.60% | 28.71% | 3
Phoenix     | 55.74% | 59.55% | 60.74% | 60.49% | 59.61% | 3
Minneapolis | 27.36% | 29.01% | 29.60% | 29.42% | 29.08% | 3
Portland    | 44.29% | 48.78% | 48.51% | 48.50% | 48.13% | 2
Miami       | 52.81% | 53.15% | 52.15% | 50.39% | 48.32% | 2

Threshold = 10%
Boston      | 38.97% | 41.87% | 41.54% | 41.00% | 40.15% | 2
New York    | 46.89% | 49.24% | 49.65% | 49.11% | 47.50% | 3
LA.         | 74.32% | 76.60% | 77.00% | 76.10% | 74.19% | 3
SF.         | 71.54% | 74.66% | 74.45% | 73.41% | 72.12% | 2
Dallas      | 48.40% | 51.99% | 51.45% | 50.19% | 48.97% | 2
Kansas      | 37.16% | 38.15% | 37.94% | 36.31% | 34.35% | 2
Phoenix     | 65.50% | 68.83% | 69.68% | 69.38% | 68.21% | 3
Minneapolis | 33.06% | 34.96% | 35.58% | 35.33% | 34.75% | 3
Portland    | 53.24% | 58.38% | 58.14% | 57.90% | 57.19% | 2
Miami       | 56.92% | 56.97% | 55.37% | 53.58% | 51.38% | 2

K* represents the lookback years with the highest accuracy.
Table 5. RMSE for LSTM.

City        | RMSE
LA.         | 2.6113
SF.         | 2.9035
Miami       | 2.9975
Phoenix     | 3.2422
Portland    | 3.6760
New York    | 5.1143
Boston      | 5.7204
Minneapolis | 6.0942
Kansas      | 6.5876
Dallas      | 8.3308

The root mean squared error achieved by the LSTM method for each city.
Table 6. Percentage Error for LSTM.

City        | Threshold = 5% | Threshold = 8% | Threshold = 10%
Kansas      | 43.82%         | 59.54%         | 63.92%
LA.         | 80.89%         | 93.70%         | 97.04%
Minneapolis | 32.76%         | 49.89%         | 58.02%
Phoenix     | 77.67%         | 90.41%         | 95.10%
Portland    | 54.65%         | 76.61%         | 86.09%
SF.         | 71.82%         | 91.65%         | 95.72%
Boston      | 39.50%         | 56.69%         | 65.60%
New York    | 41.96%         | 62.56%         | 69.88%
Dallas      | 46.55%         | 69.94%         | 76.24%
Miami       | 86.91%         | 94.82%         | 97.16%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Wang, Z.; Zhang, W.; Pinsky, E. Comparative Analysis of Temperature Prediction Models: Simple Models vs. Deep Learning Models. Comput. Sci. Math. Forum 2025, 11, 6. https://doi.org/10.3390/cmsf2025011006
