Review
Peer-Review Record

A Survey of Machine Learning Methods for Time Series Prediction

Appl. Sci. 2025, 15(11), 5957; https://doi.org/10.3390/app15115957
by Timothy Hall * and Khaled Rasheed
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 16 April 2025 / Revised: 16 May 2025 / Accepted: 20 May 2025 / Published: 26 May 2025
(This article belongs to the Special Issue Advances and Applications of Complex Data Analysis and Computing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents a thorough and well-organized survey of recent machine learning approaches to time series forecasting, with a particular focus on comparing tree-based models (TBML) and deep learning (DL) methods under identical experimental conditions. This focus represents a significant and novel contribution, as prior surveys often fail to provide fair comparisons due to inconsistent datasets and evaluation procedures across studies. The methodology of including only studies that directly compare TBML and DL models using the same dataset is sound and sets this work apart from prior surveys.

Recommendations for Improvements:

1-) Overly Descriptive Sections: Sections 3 and 4 tend to be too explanatory in nature. These could be condensed, or references to prior technical reviews could be used to streamline the content.

2-) Lack of Deeper Critical Discussion: While the quantitative comparisons are excellent, the paper would benefit from a deeper critical analysis of why certain models perform better in specific domains or under certain data characteristics.  For instance, in Section 5.3.2 (Task-Specific Model Performance Analysis), the authors show that TBML models outperform DL models in transportation and anomaly detection, whereas DL models excel in environmental and financial tasks—but they do not explore the underlying data characteristics (e.g., seasonality, feature dimensionality, or noise levels) that might explain these differences. In Section 5.3.3 (Impact of Dataset Size on Model Performance), TBML models surprisingly outperform DL models in large datasets, yet there is no critical analysis of potential causes like overfitting, architectural scaling limits, or hyperparameter tuning. Furthermore, Section 5.3.5 (Impact of Research Focus) highlights bias in studies that favor a particular model class, but again, fails to critically discuss how methodological setups or evaluation practices contribute to this bias. Including such domain-informed explanations would elevate the paper from a descriptive review to a more analytically grounded contribution.

3-) Limited Representation of Advanced DL Models: Although attention-based models such as Transformers are mentioned, the dataset includes only a few related studies. The authors could emphasize that this area is still evolving and suggest future work. 

4-) Insufficient Discussion of Overfitting and Generalization: The discussion could be expanded to include how issues such as overfitting and model generalization are handled, particularly in DL models trained on small datasets.

5-) While the conclusions include some practical advice, it remains general. A table or decision framework linking data characteristics (e.g., size, domain, temporal resolution) with recommended model classes would add practical value.

Author Response

Comments 1: Overly Descriptive Sections: Sections 3 and 4 tend to be too explanatory in nature. These could be condensed, or references to prior technical reviews could be used to streamline the content.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have removed unnecessary content in these sections. Section 3.1, page 3: The most widely used library for RF implementation in the studies reviewed in this paper is Scikit-learn. Section 3.2, page 4: GBDTs are suitable for both classification tasks, using loss functions such as cross-entropy, and regression tasks, using loss functions like mean squared error. Key hyperparameters to tune in GBDT include tree depth and learning rate, which are crucial for balancing model complexity and reducing overfitting. Section 3.2.1, page 4: XGBoost also supports advanced features such as built-in cross-validation and highly scalable parallel processing, making it ideal for large-scale datasets. Section 4.1, page 5: Unlike other architectures, FFNNs do not contain loops or cycles within their structure. Additionally, technical reviews are referred to on lines 121, 133, 146, 172-174, 191-192, 208, and 216.
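As an aside for readers, the GBDT hyperparameters quoted in the removed passage (tree depth and learning rate) can be illustrated with a minimal from-scratch gradient-boosting sketch on depth-1 trees (stumps); the data and function names below are illustrative only and do not come from the surveyed implementations:

```python
def fit_stump(x, residuals):
    """Fit a depth-1 tree: find the split on x minimizing squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gbdt_fit_predict(x, y, n_trees=20, learning_rate=0.3):
    """Boosting loop: each stump fits the current residuals, and its
    contribution is shrunk by the learning rate before being added."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean of the targets
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return pred

x = [1, 2, 3, 4, 5, 6]              # toy "time index" feature
y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]  # toy targets with a level shift
fitted = gbdt_fit_predict(x, y)
```

A smaller learning rate requires more trees but typically generalizes better, which is exactly the complexity/overfitting trade-off the removed sentence described.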

 

Comments 2: Lack of Deeper Critical Discussion: While the quantitative comparisons are excellent, the paper would benefit from a deeper critical analysis of why certain models perform better in specific domains or under certain data characteristics.  For instance, in Section 5.3.2 (Task-Specific Model Performance Analysis), the authors show that TBML models outperform DL models in transportation and anomaly detection, whereas DL models excel in environmental and financial tasks—but they do not explore the underlying data characteristics (e.g., seasonality, feature dimensionality, or noise levels) that might explain these differences. In Section 5.3.3 (Impact of Dataset Size on Model Performance), TBML models surprisingly outperform DL models in large datasets, yet there is no critical analysis of potential causes like overfitting, architectural scaling limits, or hyperparameter tuning. Furthermore, Section 5.3.5 (Impact of Research Focus) highlights bias in studies that favor a particular model class, but again, fails to critically discuss how methodological setups or evaluation practices contribute to this bias. Including such domain-informed explanations would elevate the paper from a descriptive review to a more analytically grounded contribution.

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have added a deeper critical analysis of Task-Specific Model Performance in section 5.4, page 23, line 633: However, there are several explanations that can account for the performance differential between models.

TBML models excel on tabular data that is noisy, incomplete, or categorical, making them especially suitable for the domains of utilities, transportation, urban mobility, and anomaly detection, which often involve structured, tabular datasets with many sparse features. TBML models are especially well equipped for anomaly detection because their tree-splitting criteria capture feature interactions without much explicit feature engineering. Similarly, SPTB algorithms excel in transportation and miscellaneous applications because they are optimized for speed and do not require extensive feature engineering, making them suitable for a wide array of miscellaneous applications. DL models outperform in environmental/meteorological predictions and mechanical health monitoring due to their ability to effectively capture unstructured and spatial-temporal patterns in data. Additionally, DL models perform better in financial/market trend forecasting, as these models are better able to handle lagged effects and temporal dependencies with less feature engineering than TBML models. Specifically, RNN models excel in a wide range of tasks, including Water and Air Quality, Environmental and Meteorological Predictions, Structural and Mechanical Health Monitoring, Stock Market/Finance/Market Trends, and Healthcare/Biomedical Predictions, due to their unique ability to remember long-term dependencies. Patterns in these domains can span days, weeks, and even months, and RNN models like LSTM are able to retain this information through memory gates without manual feature engineering. We have also added a deeper critical analysis of dataset size in section 5.4, page 24, line 666: DL models, and specifically RNN models, are able to perform better on smaller datasets than TBML models because they can learn latent representations of sequential inputs even from small amounts of data, provided the sequential signal is strong.
TBML models lack an inductive bias for sequence learning and thus may struggle on small datasets if not given appropriately engineered sequential features. Conversely, TBML models, specifically SPTB models, outperform DL models on larger datasets because these algorithms are much more computationally efficient at handling large amounts of data while remaining robust to noise, missing values, and irrelevant features, allowing them to quickly capture complex patterns. DL models are comparatively computationally expensive, harder to optimize, and require careful preprocessing for noisy data. When DL models are not given adequate training resources or carefully optimized, they consistently underperform relative to their TBML counterparts. Additionally, we have added a short sentence that provides a brief deeper critical analysis of temporal resolution in section 5.4, page 25, line 690: This may indicate that other aspects of the data composition or domain-specific tasks are more important and impactful for model performance, leaving weak to nonexistent trends across temporal resolution differences. Finally, we have added some critical analysis about why certain models perform better depending on the bias of the paper in section 5.4, page 25, line 702: A potential explanation for this disparity is that the accuracy of DL models is more dependent on the regularization techniques used and hyperparameter values chosen for testing than that of TBML models. When researchers are biased towards DL methods, they may spend more time on this aspect of model development than on TBML model development, causing DL methods to perform better. If researchers do not invest time in this area of model development, then TBML models are more likely to outperform DL methods.
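The "appropriately engineered sequential features" referred to above can be made concrete with a short sketch of lag-feature construction, the standard way a univariate series is flattened into the tabular form TBML models require; the series and names are illustrative:

```python
def make_lag_features(series, n_lags):
    """Return (X, y) where each row of X holds the n_lags previous values
    and y holds the value to predict at the next step."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # e.g. [y[t-3], y[t-2], y[t-1]]
        y.append(series[t])
    return X, y

series = [10, 12, 13, 15, 14, 16, 18]
X, y = make_lag_features(series, n_lags=3)
# First supervised example: features [10, 12, 13] predict target 15.
```

Without such a transformation a tree model sees the series as unordered rows, which is precisely the missing sequence-learning inductive bias described above.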

 

Comments 3:  Limited Representation of Advanced DL Models: Although attention-based models such as Transformers are mentioned, the dataset includes only a few related studies. The authors could emphasize that this area is still evolving and suggest future work. 

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have included a Future Work section that emphasizes the importance of further exploration of Transformers on page 30, line 919: One of the areas that was briefly touched upon in this paper and that requires future research is Transformer-based architectures. Initial research indicates that Transformers excel at long-range dependency modelling and thus perform well in the reviewed papers [28,33,70,74,77], showcasing performance on par with the best-performing models in the literature. Because only 5 research papers on Transformers are included in this study, future research should investigate the validity of this finding by exploring more recent papers in which Transformer models are applied to time series tasks. One of the biggest challenges facing attention-based architectures is their significant computational cost. One of the most promising directions for addressing this is pre-trained Transformer models [92]. These models can be pre-trained on large collections of unrelated time series data, enabling improved performance when training on datasets across domains.
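Both points raised here, the long-range dependency modelling and the significant computational cost, stem from the attention operation at the core of Transformers. The toy pure-Python sketch below (illustrative vectors only) shows why the cost is quadratic in sequence length: every query is scored against every timestep's key:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention for one head, written out explicitly."""
    d = len(keys[0])
    out = []
    for q in queries:
        # one score per timestep -> O(len(queries) * len(keys)) work overall
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                     # subtract max for stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]  # softmax over all timesteps
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One query attending over a 3-step sequence of 2-d keys and 1-d values.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
v = [[1.0], [100.0], [1.0]]
result = attention(q, k, v)
```

Because the softmax spans the whole sequence, step 1 can directly influence step 3 regardless of distance, which is the long-range modelling advantage; the same all-pairs scoring is what makes long sequences expensive.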

 

Comments 4: Insufficient Discussion of Overfitting and Generalization: The discussion could be expanded to include how issues such as overfitting and model generalization are handled, particularly in DL models trained on small datasets.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have added a discussion of how issues such as overfitting and model generalization are handled, particularly in DL models trained on small datasets, with evidence from the reviewed studies to back it up. In Section 5.4, page 24, line 677 we include: It is worth pointing out that these generalizations do not hold true in all circumstances. Study [15] shows that DL models such as LSTM struggle with overfitting and generalization, especially on small datasets, due to the nature of the model architecture (there is a larger number of weights and deviation terms to learn). The easiest way to overcome this limitation is to expand the dataset, as shown in study [16]. If no other data is available, then DL models may fall short of TBML methods. Study [67] demonstrates that gradient-boosted TBML techniques show superior generalization ability and thus lead to higher prediction accuracy in this instance.
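One common mechanism for handling the overfitting discussed here is early stopping on a validation set; the sketch below uses a made-up validation-loss trajectory rather than results from the cited studies:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch whose checkpoint should be kept: the best epoch,
    found once validation loss fails to improve for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch  # stop training, roll back to best checkpoint
    return best_epoch

# Validation loss improves, then rises as the model starts overfitting.
losses = [0.9, 0.7, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
stop = early_stop_epoch(losses)
```

A real training loop would compute each validation loss per epoch and save the model weights at the best epoch; this sketch only isolates the stopping rule itself.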

 

Comments 5: While the conclusions include some practical advice, it remains general. A table or decision framework linking data characteristics (e.g., size, domain, temporal resolution) with recommended model classes would add practical value

Response 5: Thank you for pointing this out. We agree with this comment. Therefore, we have included the following table in the conclusion on page 29:

Dataset Size | Best Performing Model Class | Best Performing Model Subclass
Small (0–2173) | TBML/DL | RNN
Small/Medium (2173–7800) | DL | SPTB/RNN
Medium (7800–35712) | TBML | RNN
Medium/Large (35712) | TBML | SPTB/RNN
Large (206573–11275200) | TBML | SPTB

Task Category | Best Performing Model Class | Best Performing Model Subclass
Energy and Utilities | TBML | SPTB
Environmental and Meteorological | DL | RNN
Agriculture and Crop Management | TBML | SPTB
Water and Air Quality | TBML | RNN
Transportation and Urban Mobility | TBML | SPTB
Structural and Mechanical Health Monitoring | DL | RNN
Stock Market, Finance, and Market Trends | DL | RNN
Healthcare and Biomedical Predictions | TBML | RNN
Anomaly Detection | TBML | SPTB/RNN
Other | TBML | SPTB

Time Interval | Best Performing Model Class | Best Performing Model Subclass
1 min | TBML/DL | RNN
5, 10 min | DL | RNN
15, 30 min | DL | RNN
1, 4 hour | TBML | SPTB
1 day | TBML/DL | RNN
1 week, 8 day, 15 day, 16 day | TBML | SPTB
1 month | DL | RNN

Computational Efficiency | Best Performing Model Class | Best Performing Model Subclass
Training Time | TBML | SPTB
Inference Time | TBML | SPTB

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper provides a comprehensive and systematic survey of machine learning methods for time series prediction, covering both tree-based and deep learning approaches. The inclusion of recent forecasting competitions (M5 and M6) adds practical relevance to the study. The work can be published if the authors respond to the following comments:

  1. Some figures suffer from poor quality. For example, Figure 1 is excessively large, and Figure 4 has low resolution.
  2. As discussed in Section 5.3.7.1, RMSE, MAE, R², and MAPE are the most commonly used metrics for regression tasks. Recently, a paper proposed a Weighted Quality Evaluation (WQE) method combining these four metrics, which may be worth mentioning, especially for comprehensive evaluation. If you find it interesting, please refer to “Zhou, Y., He, X., Montillet, J., Wang, S., Hu, S., Sun, X., Huang, J., & Ma, X. (2025). An improved ICEEMDAN-MPA-GRU model for GNSS height time series prediction with weighted quality evaluation index. GPS Solutions, Doi:10.1007/s10291-025-01867-z”
  3. Some sections (such as Section 5.3) are overly detailed. It is recommended to streamline the content to enhance conciseness and clarity.
  4. The transition between sections could be improved. At times, the paper reads more like a collection of independent analyses rather than a cohesive narrative.
  5. While the paper covers many models comprehensively, the discussion on Transformer-based models (attention mechanisms) is relatively brief. Given their emerging importance, it would strengthen the paper to expand this part slightly.
  6. The conclusions could be made more actionable by adding a summary table that highlights the best-performing models for each task type (e.g., energy, finance) and provides clearer, scenario-based recommendations regarding model selection under different conditions, such as small vs. large datasets or short- vs. long-term forecasting horizons.

 

Comments for author File: Comments.pdf

Author Response

Comments 1: Some figures suffer from poor quality. For example, Figure 1 is excessively large, and Figure 4 has low resolution.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have updated the size of Figure 1 and increased the resolution of Figure 4.

 

Comments 2: As discussed in Section 5.3.7.1, RMSE, MAE, R², and MAPE are the most commonly used metrics for regression tasks. Recently, a paper proposed a Weighted Quality Evaluation (WQE) method combining these four metrics, which may be worth mentioning, especially for comprehensive evaluation. If you find it interesting, please refer to “Zhou, Y., He, X., Montillet, J., Wang, S., Hu, S., Sun, X., Huang, J., & Ma, X. (2025). An improved ICEEMDAN-MPA-GRU model for GNSS height time series prediction with weighted quality evaluation index. GPS Solutions, Doi:10.1007/s10291-025-01867-z”

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have included this study in section 5.4, page 25, line 717: Additionally, researchers may want to consider using the novel weighted quality evaluation index (WQE) proposed by [89], which combines the four most popular regression metrics (RMSE, MAE, MAPE, and R²) into a single unified evaluation standard that more holistically captures model performance.
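The exact WQE formulation is defined in [89]; the sketch below is only a generic illustration of combining the four metrics into one lower-is-better score. The equal weights and the (1 - R²) flip are illustrative assumptions, not the published scheme:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE, and R² (assumes y_true has no zeros)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, mape, r2

def weighted_quality(y_true, y_pred, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four metrics into one score; lower is better."""
    rmse, mae, mape, r2 = regression_metrics(y_true, y_pred)
    # RMSE, MAE, MAPE are already lower-is-better; 1 - R² flips R² to match.
    parts = (rmse, mae, mape, 1 - r2)
    return sum(w * p for w, p in zip(weights, parts))

y_true = [3.0, 5.0, 4.0, 6.0]
y_pred = [2.8, 5.1, 4.2, 5.9]
score = weighted_quality(y_true, y_pred)
```

A perfect prediction yields a score of zero under this convention; any practical use would need the weighting and normalization actually specified in the cited paper.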

 

Comments 3: Some sections (such as Section 5.3) are overly detailed. It is recommended to streamline the content to enhance conciseness and clarity.

Response 3: Thank you for pointing this out. While we believe all of the information in these sections is important, and readers who do not need every detail can skim them, we found that we could remove the following content from Section 5.3.1 on page 8: These results underscore the complexity of the performance landscape and highlight that it is insufficient to simply state that TBML models outperform DL models in time series prediction tasks. Instead, a nuanced understanding of individual subclasses helps to draw more meaningful conclusions.

 

Comments 4: The transition between sections could be improved. At times, the paper reads more like a collection of independent analyses rather than a cohesive narrative.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have added transitions where the flow between sections was vague. Between Sections 4 and 5, page 6, line 229: With the foundational TBML and DL architectures established, the following section outlines the methodology used to conduct the analysis presented in this survey. Section 5.3.2, page 11, line 340: While overall performance metrics provide a high-level comparison of the models, they do not fully capture how each architecture performs on specific tasks. Accordingly, the next section explores task-specific analysis of model performance. Section 5.3.3, page 14, line 372: Beyond examining how model performance varies across different tasks, it is equally important to consider how external variables, such as dataset size, contribute to these performance differences. Section 5.3.4, page 15, line 406: Beyond the quantity of data, this study also investigates the temporal granularity at which data is collected to explore the implications for model performance. Section 5.3.5, line 431: While intrinsic data characteristics undoubtedly influence model performance, it is also important to study how research priorities and biases shape model performance. Section 5.3.6, page 19, line 485: In order to assess the impact of data characteristics and computational efficiency differences between models, it is important to analyze how model performance is quantified in the first place. Section 5.3.8, page 22, line 561: Having examined the performance of conventional model architectures, this study now shifts focus to hybrid models, which integrate multiple ML and DL architectures and are a common approach to improving predictive performance in time series tasks.

 

Comments 5: While the paper covers many models comprehensively, the discussion on Transformer-based models (attention mechanisms) is relatively brief. Given their emerging importance, it would strengthen the paper to expand this part slightly.

Response 5: Thank you for pointing this out. We agree with this comment. Therefore, we have included another section at the end of the paper, in the future works section, page 30, line 919, highlighting these models' emerging importance: One of the areas that was briefly touched upon in this paper and that requires future research is Transformer-based architectures. Initial research indicates that Transformers excel at long-range dependency modelling and thus perform well in the reviewed papers [28,33,70,74,77], showcasing performance on par with the best-performing models in the literature. Because only 5 research papers on Transformers are included in this study, future research should investigate the validity of this finding by exploring more recent papers in which Transformer models are applied to time series tasks. One of the biggest challenges facing attention-based architectures is their significant computational cost. One of the most promising directions for addressing this is pre-trained Transformer models [93]. These models can be pre-trained on large collections of unrelated time series data, enabling improved performance when training on datasets across domains.

 

 

Comments 6: The conclusions could be made more actionable by adding a summary table that highlights the best-performing models for each task type (e.g., energy, finance) and provides clearer, scenario-based recommendations regarding model selection under different conditions, such as small vs. large datasets or short- vs. long-term forecasting horizons.

Response 6: Thank you for pointing this out. We agree with this comment. Therefore, we have included a summary table like the one described in the conclusion, page 29:

 

Dataset Size | Best Performing Model Class | Best Performing Model Subclass
Small (0–2173) | TBML/DL | RNN
Small/Medium (2173–7800) | DL | SPTB/RNN
Medium (7800–35712) | TBML | RNN
Medium/Large (35712) | TBML | SPTB/RNN
Large (206573–11275200) | TBML | SPTB

Task Category | Best Performing Model Class | Best Performing Model Subclass
Energy and Utilities | TBML | SPTB
Environmental and Meteorological | DL | RNN
Agriculture and Crop Management | TBML | SPTB
Water and Air Quality | TBML | RNN
Transportation and Urban Mobility | TBML | SPTB
Structural and Mechanical Health Monitoring | DL | RNN
Stock Market, Finance, and Market Trends | DL | RNN
Healthcare and Biomedical Predictions | TBML | RNN
Anomaly Detection | TBML | SPTB/RNN
Other | TBML | SPTB

Time Interval | Best Performing Model Class | Best Performing Model Subclass
1 min | TBML/DL | RNN
5, 10 min | DL | RNN
15, 30 min | DL | RNN
1, 4 hour | TBML | SPTB
1 day | TBML/DL | RNN
1 week, 8 day, 15 day, 16 day | TBML | SPTB
1 month | DL | RNN

Computational Efficiency | Best Performing Model Class | Best Performing Model Subclass
Training Time | TBML | SPTB
Inference Time | TBML | SPTB

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript is very detailed and informative (over 30 pages); however, its length and density could discourage some readers. While the depth is valuable, an executive summary or additional visual summaries (e.g., key findings tables) would significantly enhance readability and accessibility.

Although several figures are included, the addition of summary tables contrasting TBML and DL methods across major aspects—such as accuracy, training efficiency, scalability, and computational cost—would further improve clarity.

Attention-based models (Transformers) are briefly mentioned, but the discussion remains limited. Given their growing importance in time series forecasting, a dedicated subsection exploring their opportunities and challenges would strengthen the manuscript.

The survey primarily focuses on methods developed between 2017 and 2024. Including a forward-looking discussion on emerging trends—such as Diffusion Models or Neural Ordinary Differential Equations (Neural ODEs)—would make the study more comprehensive and future-oriented.

In addition, recent studies have explored the integration of physical laws into traditional machine learning models. Has the author considered including a discussion on Physics-Informed Machine Learning (PIML) approaches? Leveraging physical information can significantly enhance the interpretability and generalization ability of purely data-driven models, which is a crucial direction worth noting. Such as: Computer Methods in Applied Mechanics and Engineering(2025), 433, 117474. Computers and Geotechnics(2024), 169, 106174.

Finally, careful proofreading is recommended to address occasional minor typographical errors and to improve the clarity of figure captions.

 

Author Response

Comments 1: The manuscript is very detailed and informative (over 30 pages); however, its length and density could discourage some readers. While the depth is valuable, an executive summary or additional visual summaries (e.g., key findings tables) would significantly enhance readability and accessibility

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have included a key findings table in the conclusion, page 29:

 

Dataset Size | Best Performing Model Class | Best Performing Model Subclass
Small (0–2173) | TBML/DL | RNN
Small/Medium (2173–7800) | DL | SPTB/RNN
Medium (7800–35712) | TBML | RNN
Medium/Large (35712) | TBML | SPTB/RNN
Large (206573–11275200) | TBML | SPTB

Task Category | Best Performing Model Class | Best Performing Model Subclass
Energy and Utilities | TBML | SPTB
Environmental and Meteorological | DL | RNN
Agriculture and Crop Management | TBML | SPTB
Water and Air Quality | TBML | RNN
Transportation and Urban Mobility | TBML | SPTB
Structural and Mechanical Health Monitoring | DL | RNN
Stock Market, Finance, and Market Trends | DL | RNN
Healthcare and Biomedical Predictions | TBML | RNN
Anomaly Detection | TBML | SPTB/RNN
Other | TBML | SPTB

Time Interval | Best Performing Model Class | Best Performing Model Subclass
1 min | TBML/DL | RNN
5, 10 min | DL | RNN
15, 30 min | DL | RNN
1, 4 hour | TBML | SPTB
1 day | TBML/DL | RNN
1 week, 8 day, 15 day, 16 day | TBML | SPTB
1 month | DL | RNN

Computational Efficiency | Best Performing Model Class | Best Performing Model Subclass
Training Time | TBML | SPTB
Inference Time | TBML | SPTB

 

Comments 2: Although several figures are included, the addition of summary tables contrasting TBML and DL methods across major aspects—such as accuracy, training efficiency, scalability, and computational cost—would further improve clarity.

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have included the large summary table shown above (and on page 29) at the end of the paper, which compares the best-performing model classes for TBML and DL methods across data characteristics. Additionally, we have included a table in Section 5.3.6, page 19, that improves clarity by providing a visual summary of the training efficiency/computational cost of model development:

 

Metric | TBML Training Advantage (%)
Study [4] | 4,010.33
Study [20] | 181.81
Study [29] | -22.55
Study [67] | 1,251.81
Study [43] | 142.66
Study [45] | 7,196.53
Study [74] | 905,140
Study [51] | 235,559.39
Study [55] | 10,145.98
Study [66] | 100,140
Median | 5,603.43
Mean | 126,934.94

 

Comments 3: Attention-based models (Transformers) are briefly mentioned, but the discussion remains limited. Given their growing importance in time series forecasting, a dedicated subsection exploring their opportunities and challenges would strengthen the manuscript.

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have included a dedicated section on Transformer models in the future works section that highlights the opportunities, challenges, and potential avenues for future development, on page 30, starting on line 919: One of the areas that was briefly touched upon in this paper and that requires future research is Transformer-based architectures. Initial research indicates that Transformers excel at long-range dependency modelling and thus perform well in the reviewed papers [28,33,70,74,77], showcasing performance on par with the best-performing models in the literature. Because only 5 research papers on Transformers are included in this study, future research should investigate the validity of this finding by exploring more recent papers in which Transformer models are applied to time series tasks. One of the biggest challenges facing attention-based architectures is their significant computational cost. One of the most promising directions for addressing this is pre-trained Transformer models [93]. These models can be pre-trained on large collections of unrelated time series data, enabling improved performance when training on datasets across domains.

Comments 4: The survey primarily focuses on methods developed between 2017 and 2024. Including a forward-looking discussion on emerging trends—such as Diffusion Models or Neural Ordinary Differential Equations (Neural ODEs)—would make the study more comprehensive and future-oriented.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have included a discussion of these emerging trends in the future works section to make the paper more future-oriented, on page 30, line 931: Other areas that future researchers could focus on include Diffusion Models and Neural Ordinary Differential Equation models. Diffusion Models have shown success in text, image, and video applications and have recently started to be applied to time series forecasting use cases (for a comprehensive review of Diffusion Models for time series applications, see [94]). Future research should focus on combating the exceptionally high computational cost associated with Diffusion Models applied to time series applications while maintaining high accuracy. Similarly, Neural Ordinary Differential Equations offer a modeling framework that provides a principled approach to forecasting continuous time series data by combining neural networks with the mathematics of differential equations (for a comprehensive review of Neural Ordinary Differential Equations for time series applications, see [95]). Neural Ordinary Differential Equations involve solving differential equations at training and inference time, which can be computationally very expensive. Future research in this field should focus on reducing these computational demands by developing more efficient solvers to increase the practicality of this DL architecture for time series forecasting applications.

Comments 5: In addition, recent studies have explored the integration of physical laws into traditional machine learning models. Has the author considered including a discussion on Physics-Informed Machine Learning (PIML) approaches? Leveraging physical information can significantly enhance the interpretability and generalization ability of purely data-driven models, which is a crucial direction worth noting. Such as: Computer Methods in Applied Mechanics and Engineering(2025), 433, 117474. Computers and Geotechnics(2024), 169, 106174.

Response 5: Thank you for pointing this out. Although this area of research looks extremely interesting, after looking over these papers we have decided that they fall outside the scope of the research conducted in this paper and are not directly relevant to the topics presented here. Therefore, after considering including a discussion on Physics-Informed Machine Learning approaches, we have decided not to include this in our paper.

Comments 6: Finally, careful proofreading is recommended to address occasional minor typographical errors and to improve the clarity of figure captions.

Response 6: Thank you for pointing this out. We agree with this comment. Therefore, we have made several adjustments to the paper, including caption updates, which include but are not limited to the following lines: 100-101, 114, 127, 162-163, 249-250, 382-383, and 392-393.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors
  1. Please refrain from concluding a section or any subsection level with a figure, table, chart, etc. Instead, could you conclude with a paragraph of text? Example: Figure 1, after which you have 3.2.3. Insert a lead-in text between Figure 1 and 3.2.3 to introduce 3.2.3. Do that from the top to the bottom of the paper.
  2. Please don't stack figures one on top of the other. For instance, Figure 4 follows Figure 3. Insert text describing Figure 4 between Figures 3 and 4. 
  3. Figures must be referenced in the text, preferably before the figure, so that the reference and surrounding text describe the picture. Figure 4's description is a paragraph below the figure. This layout is not reader-friendly. Please correct these mistakes from the top to the bottom of the paper. 
  4. Your paper is missing a mandatory "Future research directions" section, as mandated by https://www.mdpi.com/about/article_types.
  5. There are gaps in your paper that you should work on. Here are some suggestions for additional ML techniques/topics with time series to add to the paper (I'm only using Xplore as one of the sources; you could do those searches in other databases easily):

    SVM for time series forecasting
    https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=SVM%20support%20vector%20machine%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year

    elm extreme learning machine time series
    https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=elm%20extreme%20learning%20machine%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year

    ltsm
    https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=ltsm%20long%20short-term%20memory%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2024_Year

    DTW
    https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=dtw%20dynamic%20time%20warping%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year


Comments on the Quality of English Language
  1. Please fix English spelling errors in your paper.

Author Response

Comments 1: Please refrain from concluding a section or any subsection level with a figure, table, chart, etc. Instead, could you conclude with a paragraph of text? Example: Figure 1, after which you have 3.2.3. Insert a lead-in text between Figure 1 and 3.2.3 to introduce 3.2.3. Do that from the top to the bottom of the paper.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have made sure that none of the sections or subsections end with a figure, table, or chart.

Comments 2: Please don't stack figures one on top of the other. For instance, Figure 4 follows Figure 3. Insert text describing Figure 4 between Figures 3 and 4. 

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have made sure that none of the figures in the paper are stacked on top of each other.

Comments 3: Figures must be referenced in the text, preferably before the figure, so that the reference and surrounding text describe the picture. Figure 4's description is a paragraph below the figure. This layout is not reader-friendly. Please correct these mistakes from the top to the bottom of the paper. 

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we have moved the figures to follow as closely from where they are mentioned in the text as possible.

Comments 4: Your paper is missing a mandatory "Future research directions" section, as mandated by https://www.mdpi.com/about/article_types.

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we have included this section at the end of our paper as Section 8 (page 30, line 917):

8. Future Work

There are several areas of future work that researchers could focus on to expand the insights of this paper, especially in areas of growing importance in the field of DL. One of these areas, briefly touched upon in this paper and warranting future research, is Transformer-based architectures. Initial research indicates that transformers excel at long-range dependency modeling and thus perform well in the reviewed papers [28,33,70,74,77], showcasing performance on par with the best-performing models in the literature. Because only five research papers are included in this study, future research should investigate the validity of this finding by exploring more recent papers where transformer models are applied to time series applications. One of the biggest challenges facing attention-based architectures is their significant computational cost. An interesting avenue for addressing this is pre-trained transformer models [93]. These models can be pre-trained on large collections of unrelated time series data, enabling improved performance when training on datasets across domains.
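The attention mechanism at the core of these architectures can be illustrated with a minimal sketch. The following example (a toy illustration, not the method of any surveyed paper) computes scaled dot-product attention over a window of time steps; the random matrices stand in for learned query, key, and value projections of an input series:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# A window of T=8 time steps embedded in d=16 dimensions; the random
# arrays are stand-ins for learned projections of the observed series.
rng = np.random.default_rng(1)
T, d = 8, 16
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (8, 16) (8, 8)
```

Each row of the weight matrix sums to one, so every output time step is a convex combination of all value vectors; this all-pairs interaction is both the source of the long-range modeling ability noted above and of the quadratic computational cost in the window length.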

Other areas that future researchers could focus on include Diffusion Models and Neural Ordinary Differential Equation Models. Diffusion Models have shown success in text, image, and video applications and have recently started to be applied to time series forecasting use cases (for a comprehensive review of Diffusion Models for time series applications, see [94]). Future research should focus on combating the exceptionally high computational cost associated with Diffusion Models applied to time series applications while maintaining high accuracy. Similarly, Neural Ordinary Differential Equations offer a modeling framework that provides a principled approach to forecasting continuous time series data by combining neural networks with the mathematics of differential equations (for a comprehensive review of Neural Ordinary Differential Equations for time series applications, see [95]). Neural Ordinary Differential Equations involve solving differential equations at training and inference time, which can be computationally very expensive. Future research in this field should focus on reducing these computational demands by developing more efficient solvers to increase the practicality of this DL architecture for time series forecasting applications.
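The Neural ODE idea described above can be sketched in a few lines. The example below is a toy illustration under stated assumptions (an untrained one-hidden-layer vector field and a fixed-step Euler solver); real implementations learn the parameters by backpropagating through an adaptive solver, which is exactly the cost the passage refers to:

```python
import numpy as np

def mlp_vector_field(h, W1, b1, W2, b2):
    """Toy neural vector field dh/dt = f(h): one hidden tanh layer."""
    return np.tanh(h @ W1 + b1) @ W2 + b2

def odeint_euler(h0, f, t0, t1, steps):
    """Fixed-step Euler integration of dh/dt = f(h) from t0 to t1."""
    h, dt = h0, (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h)
    return h

rng = np.random.default_rng(0)
dim, hidden = 4, 16
W1, b1 = 0.1 * rng.standard_normal((dim, hidden)), np.zeros(hidden)
W2, b2 = 0.1 * rng.standard_normal((hidden, dim)), np.zeros(dim)
f = lambda h: mlp_vector_field(h, W1, b1, W2, b2)

h0 = rng.standard_normal(dim)                   # latent state at the last observed time
h1 = odeint_euler(h0, f, 0.0, 1.0, steps=100)   # state one time unit ahead
print(h1.shape)  # (4,)
```

Note that forecasting one step ahead already requires many evaluations of the vector field (here 100 Euler steps); more accurate adaptive solvers can require far more, which motivates the call above for more efficient solvers.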

Comments 5: There are gaps in your paper that you should work on. Here are some suggestions for additional ML techniques/topics with time series to add to the paper (I'm only using Xplore as one of the sources; you could do those searches in other databases easily):

SVM for time series forecasting
https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=SVM%20support%20vector%20machine%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year

elm extreme learning machine time series
https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=elm%20extreme%20learning%20machine%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year

ltsm
https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=ltsm%20long%20short-term%20memory%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2024_Year

DTW
https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=dtw%20dynamic%20time%20warping%20time%20series&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&ranges=2020_2025_Year

Response 5: Thank you for pointing this out. The purpose of this paper was to focus only on the best-performing methods in time series analysis. Because we did not encounter any surveyed papers or top-performing time series competition entrants that used extreme learning machines or dynamic time warping in their methodology, we do not feel it is appropriate to include these models in our paper, especially considering that these methods have been around for many years. Additionally, we have already included extensive discussion of LSTM models in time series forecasting, starting on line 203, Section 4.2, page 6, as well as in many more references throughout the paper. Similarly, SVMs are analyzed in the research, but their lackluster results mean that they are not the focus of the paper (see line 245 in Section 5.1, page 7). We did acknowledge, in the future work section, some other, newer ML techniques that we did not come across in our research but that future researchers could focus on, including Diffusion Models and Neural Ordinary Differential Equation Models.

Author Response File: Author Response.pdf

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

Happy with the changes made, this paper is ready for publishing.
