Scale-Fusion Transformer: A Medium-to-Long-Term Forecasting Model for Parking Space Availability
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors

The paper presents an SFFormer model for medium-to-long-term prediction of available parking spaces. The study addresses a significant gap in the literature by focusing on multi-scale temporal feature integration and cross-scale dependency parsing. The methodological innovations, including the scale fusion module, adaptive data compression, and Transformer-based architecture, are well-motivated and rigorously evaluated.
- The adaptive data compression mechanism is mentioned but not sufficiently detailed. How does the compression ratio affect model performance? Are there trade-offs between compression and prediction accuracy?
- The details of the ablation experiment were not presented clearly. For example, in Table 4, how is the impact of removing the AAP and pl assessed? Please give some description of the settings with and without AAP and pl.
- In line 231, "Error! Reference source not found."
- In lines 101 and 203: AAP is defined in line 101 and does not need to be defined again in line 203. The same applies to MSE; please check the full paper and revise all such repeated definitions.
- The reference formatting is inconsistent and lacks uniformity; some author names are abbreviated while others are not, such as in [18], [21], [28], and [27]. Please use consistent formatting.
- The term "long-short cycle" is used inconsistently, "long and short cycles" in line 453, and long-short cycle in line 98.
- Figure resolution is low, and some text in the figures is not sufficiently legible, such as in Fig. 4 and Fig. 8.
- From line 354 to line 360, there is no need for bold type.
Author Response
1. The adaptive data compression mechanism is mentioned but not sufficiently detailed. How does the compression ratio affect model performance? Are there trade-offs between compression and prediction accuracy?
Thank you for your question. The "adaptive data compression mechanism" mentioned in our paper is primarily implemented through our designed dual-branch patch embedding module. Below, we provide further details on this mechanism.
Traditional Transformer models face a computational complexity of O(L²) for the self-attention mechanism when processing long-sequence time-series data, where L is the sequence length. When the input sequence is long (and contains substantial redundant information), we introduce a compression mechanism. This mechanism maps a longer time series of length L into a shorter, fixed-length representation of length L′, which is then used by the backbone network for prediction. The compression ratio can be viewed as an internal metric during model operation and is defined as L/L′.
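As a rough illustration of how patch-based embedding shortens the sequence seen by the attention layers, the sketch below builds a dual-branch patch embedding in PyTorch. The module name, patch lengths, and non-overlapping stride are our own placeholder choices for exposition, not the exact implementation used in SFFormer.

```python
import torch
import torch.nn as nn

class DualPatchEmbedding(nn.Module):
    """Hypothetical dual-branch patch embedding: a short-patch branch (ps)
    keeps local detail, while a long-patch branch (pl) compresses the series
    into far fewer tokens."""
    def __init__(self, d_model: int, ps: int = 16, pl: int = 96):
        super().__init__()
        self.ps, self.pl = ps, pl
        self.proj_s = nn.Linear(ps, d_model)  # one token per short patch
        self.proj_l = nn.Linear(pl, d_model)  # one token per long patch

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len) univariate occupancy series
        xs = x.unfold(-1, self.ps, self.ps)   # (batch, seq_len // ps, ps), non-overlapping
        xl = x.unfold(-1, self.pl, self.pl)   # (batch, seq_len // pl, pl)
        return self.proj_s(xs), self.proj_l(xl)

x = torch.randn(8, 288)                       # e.g. 288 input time steps
tok_short, tok_long = DualPatchEmbedding(d_model=64)(x)
print(tok_short.shape, tok_long.shape)        # torch.Size([8, 18, 64]) torch.Size([8, 3, 64])
# Self-attention now operates over 18 or 3 tokens instead of 288 raw steps,
# which is the compression effect (ratio roughly L / L') described above.
```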
Regarding the question of whether there is a trade-off between compression and prediction accuracy, we conducted additional experiments to systematically evaluate performance under different compression ratios. The experimental results show that moderate compression (e.g., pl=96, compression ratio ≈ 1.833) does not significantly degrade model performance and may even slightly improve it in some cases. For instance, under the setting ps=16, pl=96, the MAE and MSE values for most sequence lengths remain competitive compared to the baseline (pl=48). Higher compression ratios (e.g., pl=144 or pl=288) generally maintain or even enhance performance on shorter sequences (e.g., 24–96), indicating that the adaptive compression mechanism effectively preserves key information while reducing data volume. However, when compression is too aggressive, a clear trade-off between performance and compression level emerges, especially on longer sequences. For example, with pl=288 (compression ratio ≈ 3.5) and a sequence length of 720, both MAE and MSE increase significantly, suggesting that excessive compression leads to a decline in prediction accuracy.
| Prediction length | ps=16, pl=48 (MAE / MSE) | ps=16, pl=96 (MAE / MSE) | ps=16, pl=144 (MAE / MSE) | ps=16, pl=288 (MAE / MSE) |
|---|---|---|---|---|
| 24 | 0.0278 / 0.1239 | 0.0320 / 0.1307 | 0.0292 / 0.1255 | 0.0276 / 0.1242 |
| 48 | 0.0400 / 0.1480 | 0.0456 / 0.1494 | 0.0530 / 0.1670 | 0.0419 / 0.1535 |
| 96 | 0.0597 / 0.1858 | 0.0676 / 0.2045 | 0.0709 / 0.2022 | 0.0542 / 0.1725 |
| 144 | 0.0657 / 0.1868 | 0.1092 / 0.2475 | 0.0628 / 0.1828 | 0.0539 / 0.1784 |
| 216 | 0.0906 / 0.2117 | 0.1097 / 0.2440 | 0.0733 / 0.1949 | 0.0637 / 0.1921 |
| 288 | 0.0924 / 0.2166 | 0.0777 / 0.2054 | 0.0828 / 0.2110 | 0.0723 / 0.2014 |
| 576 | 0.1076 / 0.2438 | 0.1004 / 0.2310 | 0.0969 / 0.2298 | 0.0915 / 0.2231 |
| 720 | 0.1244 / 0.2537 | 0.1174 / 0.2506 | 0.1119 / 0.2450 | 0.1413 / 0.2719 |
| Compression ratio | 1 | 1.833 | 2.5 | 3.5 |
Overall, our adaptive compression mechanism achieves a good balance between task requirements and performance, significantly reducing computational and storage costs while maintaining high prediction accuracy.
2. The details of the ablation experiment were not presented clearly. For example, in Table 4, how is the impact of removing the AAP and pl assessed? Please give some description of the settings with and without AAP and pl.
Thank you for your feedback. We sincerely apologize for the lack of clarity in the description of the ablation experiments. In Table 5 (originally Table 4), we conducted ablation studies on key components of the model to verify their respective contributions. Here, we clarify the meanings of the abbreviations:
pl refers to the long-patch branch in our dual-branch structure, which uses larger patch sizes to capture long-term trends and low-frequency information in the time series.
ps refers to the short-patch branch, which uses smaller patch sizes to capture local, high-frequency detailed variations.
AAP stands for the Adaptive Approximation Pooling module, responsible for integrating the features from both the pl and ps branches.
Below, we elaborate in detail on the impact of removing each of these key modules.
In our proposed SFFormer model, the input data is processed through two parallel branches to capture long-term global trends and short-term local details, respectively. These multi-scale features are then integrated via the Adaptive Approximation Pooling (AAP) module to ultimately make predictions. The full model achieves strong results across various evaluation metrics, demonstrating the effectiveness of the collaborative work of multiple components.
Removing the long branch means the model can only capture short-term details. While it remains responsive to high-frequency fluctuations, it lacks the ability to perceive overall trends, leading to a significant decline in prediction performance, especially on data with strong trends or periodicity. Conversely, if only the long branch is retained, the model can grasp the global direction but tends to smooth out short-term abrupt changes, resulting in insufficient responsiveness to local variations and performance that is still inferior to the full model. If the AAP module is simply replaced with a basic fusion method, the model, although better than the single-branch structures, still underperforms compared to the full model due to its inability to dynamically adjust fusion weights based on data characteristics. Therefore, as shown in Table 5 (originally Table 4), removing the AAP module leads to a performance drop, which directly proves that intelligently and adaptively integrating multi-scale features significantly enhances the model's final prediction accuracy.
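To make the contrast between a basic fusion method and adaptive fusion concrete, the following sketch compares plain concatenation with a learned gated fusion. Since the internal design of the AAP module is not spelled out here, the gate below is only a generic stand-in for adaptive fusion, and it assumes the two branch outputs have already been aligned to the same shape.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic stand-in for adaptive fusion: learns data-dependent weights
    for mixing long-branch and short-branch features."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, h_long: torch.Tensor, h_short: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([h_long, h_short], dim=-1))  # per-feature weights in (0, 1)
        return w * h_long + (1.0 - w) * h_short

h_long = torch.randn(8, 12, 64)   # long-branch features (already aligned in length)
h_short = torch.randn(8, 12, 64)  # short-branch features
fused_adaptive = GatedFusion(64)(h_long, h_short)     # adaptive, data-dependent mix
fused_naive = torch.cat([h_long, h_short], dim=-1)    # fixed-weight baseline ("without AAP" style)
```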
Furthermore, we have revised the relevant section in the original manuscript (Lines 513–534 on Page 15) as follows:
"To validate the effectiveness of each key component in our model, we conducted comprehensive ablation studies on the C1 dataset, as summarized in Table 5 (originally Table 4). The results demonstrate that the SFFormer model, which integrates both long- and short-cycle mechanisms, significantly outperforms the single-scale baseline in multi-step prediction tasks for parking lot C1, achieving an average performance improvement of 20.89%. This clearly underscores the superiority of the proposed approach.
Specifically, in the configuration without pl, the model loses its ability to capture global temporal patterns, resulting in a noticeable degradation in performance. Conversely, when ps is removed, which is responsible for capturing fine-grained local variations and short-term fluctuations, the model retains some capacity to perceive long-term trends but exhibits a clear decline in prediction accuracy due to the lack of detailed modeling. Furthermore, in the setting without AAP, where we replace the AAP module with a simple feature concatenation while retaining the dual-branch structure, the model still benefits from multi-scale features but is limited by the absence of an adaptive fusion mechanism, leading to suboptimal performance.
The complete SFFormer model effectively captures multi-scale features through the pl and ps branches and intelligently integrates them via the AAP module, achieving the best performance across all evaluation metrics. These results not only quantitatively affirm the advantage of SFFormer over the single-scale model but also validate the efficacy of long-short cycle fusion in improving prediction accuracy and stability. Overall, the ablation study strongly demonstrates the necessity and effectiveness of the proposed multi-scale architecture and adaptive fusion strategy."
Once again, we would like to thank you for your valuable comments, which have greatly helped us improve the clarity and rigor of our paper. We are very grateful for your insightful suggestions.
3. In line 231, "Error! Reference source not found."
Thank you very much for identifying this significant error. This was an oversight during our document editing process. We have located and corrected the citation in question, replacing "Error! Reference source not found." with the proper literature reference.
4. In lines 101 and 203: AAP is defined in line 101 and does not need to be defined again in line 203. The same applies to MSE; please check the full paper and revise all such repeated definitions.
We would like to sincerely thank you for identifying this critical issue. You are absolutely correct that the repeated definition of terms undermines the fluency and professionalism of the manuscript. We have carefully reviewed the entire paper to ensure that all abbreviations—including AAP and MSE—are defined only at their first occurrence, and all redundant subsequent definitions have been removed.
5. The reference formatting is inconsistent and lacks uniformity; some author names are abbreviated while others are not, such as in [18], [21], [28], and [27]. Please use consistent formatting.
Thank you for your correction. We have revised and standardized the format of the entire reference list, specifically addressing the issue of inconsistencies in author name abbreviations you pointed out. Currently, all reference entries comply with the unified and standardized format required by the journal.
6. The term "long-short cycle" is used inconsistently, "long and short cycles" in line 453, and long-short cycle in line 98.
This is an excellent reminder—terminology consistency is crucial to the clarity of the manuscript. We have reviewed the entire manuscript and standardized the term to "long-short period" to ensure consistency in expression.
7. Figure resolution is low, and some text in the figures is not sufficiently legible, such as in Fig. 4 and Fig. 8.
We sincerely apologize for the poor quality of the figures. In accordance with your suggestion, we have redrawn Figure 4, Figure 9 (previously labeled Figure 8), and all other figures in the manuscript, ensuring they all have high resolution. Additionally, we have significantly increased the font size of all text elements in the figures—such as legends and axis labels—to ensure they are clearly readable.
8. From line 354 to line 360, there is no need for bold type.
We appreciate your attentiveness. We agree that the use of bold formatting here is unnecessary. We have removed the bold formatting from lines 384–397 on Page 11 and conducted a thorough check of the entire manuscript to ensure the standardized use of formatting.
Once again, we sincerely thank you for the valuable time and effort you have invested in enhancing the quality of our paper. We believe that with these revisions, the standardization and readability of the manuscript have been significantly improved.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors

The paper is about an advanced model for forecasting parking space availability. This topic is highly relevant, and the model developed uses up-to-date methods.
The reviewer has some remarks and questions related to the applicability of the model in practice. As the paper writes, “(parking predictions) value also lies in supporting low-carbon city development cutting energy waste and carbon emissions from unnecessary driving, offering scientific basis for urban planning”. I think the model has very loose contact with these, otherwise important, high-level goals. I recommend being more modest in defining the impact of the model.
The paper writes about short, medium- and long-term predictions. Lines 109-110 mention “diverse prediction tasks (24–720 hours/1–30 days), enabling applications from real-time parking guidance to long-term infrastructure planning.” I think infrastructure planning needs a much longer horizon, measured in years. Please respond to this issue. Maybe long-term is not the best word in this context.
I am missing the mention of any potential feedback from the parking authority based on the actual demand for the operation of the facilities under consideration. High demand during certain hours of the day/week may result in time-dependent parking fee system, which would complicate the forecasting model. Dynamic on-street or on-line in-vehicle parking guidance systems would also influence parking patterns. Please elaborate on this matter.
Author Response
- As the paper writes, “(parking predictions) value also lies in supporting low-carbon city development cutting energy waste and carbon emissions from unnecessary driving, offering scientific basis for urban planning”. I think the model has very loose contact with these, otherwise important, high-level goals. I recommend being more modest in defining the impact of the model.
We would like to express our sincere gratitude for your insightful and constructive comment. We fully agree with your observation that the original descriptions in our paper regarding the model’s impact on macro-level goals such as "low-carbon urban development" and "urban planning" indeed lack sufficient connection to the model’s direct functions and are overly broad. Your suggestion has prompted us to re-examine the positioning of the model’s value, and we have revised the relevant content in lines 35–38 on Page 2 of the paper (marked in red). You advised us to "be more modest in defining the impact of the model" — we have adopted this core suggestion and revised the original description to: "In the long term, the primary value of this model lies in optimizing the utilization of parking resources and enhancing urban traffic efficiency. By providing accurate predictions of parking availability, it helps reduce time spent searching for parking and supports better-informed urban management decisions."
We believe this new description is more accurate and appropriate for the following reasons: First, it links the model’s value directly to two core and verifiable objectives — "optimizing parking resource utilization" and "enhancing urban traffic efficiency." Second, the revised content clearly identifies the path to achieving these values, namely "by reducing parking search time." Additionally, we have replaced the overly macro term "urban planning" with "urban management decisions," which better aligns with the scope of operational and management-level decisions that our model’s prediction horizon can support.
In conclusion, we are deeply grateful for your guidance, which has greatly improved the rigor and practical relevance of our paper. We have updated this description in the revised manuscript to more accurately reflect the actual contributions of our work.
- The paper writes about short, medium- and long-term predictions. Lines 109-110 mention “diverse prediction tasks (24–720 hours/1–30 days), enabling applications from real-time parking guidance to long-term infrastructure planning.” I think infrastructure planning needs a much longer horizon, measured in years. Please respond to this issue. Maybe long-term is not the best word in this context.
We would like to thank you for your insightful concern regarding the applicability of "long-term forecasting" in our paper. Your comment has prompted us to re-examine and more precisely describe our model's capabilities and application scenarios. We fully agree that it is inappropriate to link our forecasting results to "long-term infrastructure planning", and we have corrected this statement in the revised manuscript (the red-highlighted part in Line 118 on Page 3). Specifically, we have revised "infrastructure planning" to "long-term parking facility planning". This revision retains the temporal attribute of long-term planning while accurately defining the object of planning as parking facilities, aligning it closely with parking scenarios and making the logic of the application scenario more rigorous and its focus clearer.
In the field of time series forecasting literature, it is a common practice to classify forecasting horizons into short-term, medium-term, and long-term; however, these terms are relative. The "long-term forecasting" we define (e.g., 720 hours) is contextualized within the technical framework of time series forecasting models, relative to the typical 24-hour or 48-hour forecasting horizons. It demonstrates the model’s ability to capture periodic and trend patterns over a longer time span.
We acknowledge that "long-term" is a relative concept in the technical context of time series forecasting. Our model is designed to predict the next H time steps (time slices). In our main experiments, the range of H is 24 to 720. Forecasting for 24–720 time steps is conducted based on the data’s sampling interval; when the data uses a sampling interval of 1 hour or 1 day, the forecasting time range can be extended. However, due to limitations in our time series data domain, we have not collected longer time series data to conduct relevant experiments.
In summary, we have addressed your concern by correcting the inappropriate statement. We believe these revisions have made the conclusions of our paper more rigorous and have more clearly defined the value and applicable boundaries of our model. Once again, we sincerely thank you for your valuable comment, which has greatly helped us enhance the quality of our paper.
- I am missing the mention of any potential feedback from the parking authority based on the actual demand for the operation of the facilities under consideration. High demand during certain hours of the day/week may result in a time-dependent parking fee system, which would complicate the forecasting model. Dynamic on-street or on-line in-vehicle parking guidance systems would also influence parking patterns. Please elaborate on this matter.
You have raised a critical question regarding the robustness of the model in real-world, dynamic environments. We acknowledge that this is a core challenge that all predictive models must address when moving from theory to practice.
Our current model, SFFormer, learns the inherent temporal patterns (such as periodicity, trends, and abrupt changes) of parking space occupancy based on historical data. As evident from its architectural design and training process, it is essentially an autoregressive model that primarily relies on the historical information of the time series itself. It does not explicitly take dynamic external factors as inputs, such as real-time price fluctuations, traffic control information, or real-time recommendations from guidance systems.
To sincerely address your concern and enhance the rigor and completeness of our paper, we have made substantial revisions to Section 5 ("Discussion") of the manuscript (Lines 594–598 and 607–612 on Page 17).
In the third paragraph of the Discussion section, we now explicitly present this issue as a core limitation. We have added the following content: "These limitations stem primarily from its reliance on learning patterns from historical time-series data alone. For example, the model does not account for real-world feedback loops; interventions like time-dependent parking fees or dynamic guidance systems can actively alter parking patterns, introducing complexities that our current auto-regressive model does not capture." We have incorporated key concepts you mentioned—including "feedback loops," "time-dependent parking fees," and "dynamic guidance systems"—to directly address your concern. Meanwhile, we have clarified the nature of the current model (an autoregressive model) from a technical perspective, explaining the fundamental reason for this limitation.
In the fourth paragraph of the Discussion section, we have translated this challenge into a specific and feasible direction for future research. We have added the following statement: "To address the aforementioned limitation, a crucial future direction is to extend our framework into a multivariate model that incorporates external event variables or an integrated anomaly detection module, thereby improving resilience against unexpected disruptions. This would involve multi-source data fusion to incorporate dynamic external variables—such as real-time pricing and traffic guidance data—as explicit inputs, enhancing the model's real-world applicability." This passage not only acknowledges the problem but, more importantly, proposes a clear solution: extending our model framework into a multivariate model to enable these dynamic external factors to be used as inputs to the model.
Reviewer 3 Report
Comments and Suggestions for Authors

This article is in the field of artificial intelligence, specifically machine learning and deep learning. It focuses on the problem of time series forecasting, specifically medium- and long-term parking space availability forecasting in intelligent transportation systems. The article includes experimental research, which I personally consider a significant strength. It can be argued that the article contributes new knowledge by proposing the Scale-Fusion Transformer model, which addresses the problems of medium- and long-term parking space availability forecasting. This is also the clearly defined goal of the study. The research methods used are appropriate and well-described. The data analysis is accurate and logical. The article includes visualizations that demonstrate the seasonality and variability of parking data. The experimental results are presented in both tables and graphs. Conclusions are based on the presented results. The authors conclude that their SFFormer model outperforms other models in most forecasting tasks. The literature review demonstrates a good understanding of the current state of research. The article is generally clearly written and well-organized, and follows a standard, well-structured approach. I have just a few comments:
The authors present MSE/MAE values, but they do not state whether the differences between the models are statistically significant (significance tests are recommended). This limits the evidential value of their superiority over other approaches.
In my opinion, the SFFormer model assumes that the data have cyclical, repetitive patterns. There is no analysis of whether the time series is stationary or whether the model can cope with sudden and unexpected situations (e.g., large events). Robustness to unusual events has not been tested – this is at least worth commenting on.
It would be worthwhile to standardize the font in the graphs and improve the readability of the figures; for example, Figure 4 is terrible.
The text also needs editorial refinement; there are incomprehensible and unjustified boldface, character clusters, and typos.
Author Response
- The authors present MSE/MAE values, but they do not state whether the differences between the models are statistically significant (significance tests are recommended). This limits the evidential value of their superiority over other approaches.
Thank you very much for your valuable and constructive feedback. We fully agree that verifying the statistical significance of performance differences between models is a key step in rigorously demonstrating the superiority of models, and your suggestion has effectively helped us further improve the reliability of our research conclusions.
Regarding the issue you pointed out, namely "failing to explain the statistical significance of differences between models", we conducted strict statistical significance tests (paired t-tests) to evaluate the performance differences between the base models and the enhanced versions integrated with our proposed components. The experimental results show that for the PatchTST model, integrating our components significantly improves the performance from 0.7511 to 0.7735 (p=0.0115); for the Timer model, the performance after enhancement also significantly increases from 0.7278 to 0.7538 (p=0.0421). These results provide strong statistical evidence that the observed performance improvement is not random but a statistically significant improvement brought by our proposed components.
| Model | Before Enhancement (Base) | After Enhancement | p-value |
|---|---|---|---|
| PatchTST | 0.7511 | 0.7735 | 0.0115 |
| Timer | 0.7278 | 0.7538 | 0.0421 |
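For reference, a paired t-test of this kind can be run with scipy; the sketch below uses placeholder per-run scores rather than the actual experimental values reported above.

```python
from scipy import stats

# Paired scores from matched settings (e.g. the same prediction horizons / seeds);
# the numbers below are placeholders, not the values reported in the table.
base_scores = [0.742, 0.755, 0.748, 0.760, 0.751, 0.750]
enhanced_scores = [0.768, 0.779, 0.770, 0.781, 0.772, 0.771]

t_stat, p_value = stats.ttest_rel(enhanced_scores, base_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A p-value below 0.05 indicates the paired improvement is unlikely to be random.
```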
Meanwhile, our experimental design provides evidence for the superiority of the model from multiple dimensions. In the code, we set a fixed random seed to ensure the reproducibility of experiments, and we verified the stability of performance through repeated runs; SFFormer shows good performance across most prediction lengths. In terms of evaluation metrics, we calculated both MSE and MAE and found that SFFormer leads in both metrics for most tasks. In addition, Table 2, Table 3, and Figure 9 in the paper show that SFFormer maintains good performance over a wide range of hyperparameter combinations, and this hyperparameter robustness reduces the possibility of overfitting to specific parameter settings.
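A minimal sketch of the seed-fixing step mentioned above, assuming a PyTorch training setup; the seed value itself is arbitrary.

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 2024) -> None:
    """Fix all relevant random number generators so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines

set_seed()
```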
In summary, both the p-values from the statistical significance tests and the reproducibility, multi-metric consistency, and hyperparameter robustness built into the experimental design support the statistical significance and reliability of SFFormer's performance advantages. Thank you again for your professional guidance in helping us improve our research work.
- In my opinion, the SFFormer model assumes that the data have cyclical, repetitive patterns. There is no analysis of whether the time series is stationary or whether the model can cope with sudden and unexpected situations (e.g., large events). Robustness to unusual events has not been tested – this is at least worth commenting on.
Thank you very much for your profound and valuable comments. We fully agree that analyzing the stationarity of time series and evaluating the model's robustness under unexpected events are crucial for a comprehensive understanding of the performance and applicable scope of our proposed SFFormer model. Following your suggestions, we have conducted supplementary experiments and in-depth analyses, and our responses are as follows:
(1) Regarding the analysis of time series stationarity, we have adopted your suggestion and conducted strict stationarity tests on the original parking datasets used in the experiments. The results are presented in Figure 6 of the newly added Section 4.1.1 in the revised manuscript (Page 9, Lines 349–377).
| Dataset | ADF statistic | ADF p-value | KPSS statistic | KPSS p-value | Stationarity |
|---|---|---|---|---|---|
| C1 | -8.4060 | 2.17e-13 | 0.2361 | 0.100000 | Stationary |
| C2 | -8.5443 | 9.60e-14 | 0.1078 | 0.100000 | Stationary |
| C3 | -8.8923 | 1.23e-14 | 1.3949 | 0.010000 | Non-Stationary |
| C4 | -12.5428 | 2.29e-23 | 0.0376 | 0.100000 | Stationary |
| C5 | -10.3025 | 3.34e-18 | 0.2107 | 0.100000 | Stationary |
| C6 | -9.0999 | 3.63e-15 | 0.1317 | 0.100000 | Stationary |
| C7 | -9.5234 | 3.02e-16 | 0.1146 | 0.100000 | Stationary |
| C8 | -8.4773 | 1.42e-13 | 0.4448 | 0.057859 | Stationary |
The analysis results show that for the stationarity of the time series in each dataset, we jointly verify it via the ADF test (null hypothesis: existence of a unit root, i.e., non-stationary) and the KPSS test (null hypothesis: stationary). In the ADF test, the p-values of all datasets are far less than 0.05, strongly rejecting the null hypothesis of “non-stationary”. In the KPSS test, except for C3 (with a p-value of 0.01 < 0.05, thus determined as Non-Stationary), the p-values of other datasets (e.g., C1 returns a p-value of 0.1, and the actual p-value is larger as the test statistic exceeds the valid range of the lookup table) are all greater than 0.05, so the null hypothesis of “stationary” cannot be rejected. The results of the two tests mutually confirm that, except for C3, the time series of other datasets all satisfy stationarity; moreover, some returned values in the KPSS test only reflect the lower bound due to the lookup table range limitation, which actually supports the stationarity conclusion more, making the overall judgment reliable.
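The two tests above are available in statsmodels; a minimal sketch of how such a check can be reproduced is given below, with the file path and column name as hypothetical placeholders for one parking-occupancy series.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

# Hypothetical file path and column name for one parking-occupancy series.
series = pd.read_csv("parking_c1.csv")["available_spaces"].dropna()

adf_stat, adf_p, *_ = adfuller(series, autolag="AIC")
kpss_stat, kpss_p, *_ = kpss(series, regression="c", nlags="auto")

print(f"ADF:  statistic={adf_stat:.4f}, p-value={adf_p:.2e}")   # p < 0.05 rejects "non-stationary"
print(f"KPSS: statistic={kpss_stat:.4f}, p-value={kpss_p:.4f}")  # p > 0.05 keeps "stationary"
# Note: statsmodels clips the KPSS p-value to the [0.01, 0.1] lookup-table range,
# which is why several table entries above are reported as exactly 0.1 or 0.01.
```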
It should be emphasized that our SFFormer model is designed to handle non-stationary data. In our model code, there is a core forecast function, and we implement the key step of instance normalization both before feeding data into the encoder and after generating the final prediction:
- a) At the input end, each independent time series sample undergoes normalization. This process first calculates and subtracts the series’ mean to eliminate the overall trend, and then calculates and divides by the series’ standard deviation to unify the fluctuation amplitude. Eventually, each input series is converted into a standard form with a mean of zero and a standard deviation of one, enabling the model to focus more on learning scale-invariant general patterns in the data.
- b) At the output end, the opposite inverse operation is performed to restore predicted values to the original data scale. The standardized predictions generated by the model are first multiplied by the previously saved standard deviation to restore their fluctuation amplitude, and then added to the previously saved mean to restore their baseline level. Through this denormalization step, the model’s output is converted from the standardized space back to original values with actual physical meanings.
This mechanism, which independently calculates the mean and standard deviation for each input sample for normalization and restores the original scale after prediction, enables the model to effectively handle non-stationarity caused by trends and scale shifts. It makes the model focus on the morphological changes of series rather than their absolute values, thus intrinsically adapting to non-stationary time series without relying on external preprocessing operations such as differencing. Therefore, our model design matches the data characteristics. We have added this stationarity analysis and the discussion on how the model addresses non-stationarity to lines 246–270 on page 7 of the paper.
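A minimal sketch of this per-sample normalize/denormalize pattern (RevIN-style instance normalization) is shown below; the function and the dummy backbone are illustrative stand-ins, not the actual SFFormer forecast function.

```python
import torch

def forecast_with_instance_norm(backbone, x_enc: torch.Tensor) -> torch.Tensor:
    """Normalize each sample, predict in the normalized space, then denormalize."""
    # (a) per-sample statistics over the time axis
    mean = x_enc.mean(dim=1, keepdim=True)        # (batch, 1, n_vars)
    std = x_enc.std(dim=1, keepdim=True) + 1e-5   # small epsilon avoids division by zero
    y_norm = backbone((x_enc - mean) / std)       # prediction on zero-mean, unit-std input
    # (b) inverse transform back to the original scale
    return y_norm * std + mean

# Usage with a dummy backbone that simply repeats the last normalized observation:
dummy_backbone = lambda z: z[:, -1:, :].repeat(1, 24, 1)
x_enc = torch.randn(4, 96, 1) * 50 + 300          # e.g. raw occupancy counts
y_hat = forecast_with_instance_norm(dummy_backbone, x_enc)
print(y_hat.shape)                                # torch.Size([4, 24, 1])
```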
(2) Regarding the robustness analysis of the model against sudden events, we fully agree with your view that evaluating the model's performance under sudden and aperiodic events is a crucial part of measuring its robustness. As you have pointed out, the core advantage of the SFFormer model lies in capturing periodic and repetitive patterns in the data. This is reflected in the Pool_PatchEmbedding module in our code, which decomposes and learns the time series through patches of two different scales, aiming to identify and utilize the inherent multi-scale periodicity in the data. Because sudden events are very sparse in the real datasets, we designed a simulation experiment for sudden events. The experimental results are shown in the following table:
| Prediction length (Parking lot C1) | SFFormer (MSE) | PatchTST (MSE) | iTransformer (MSE) | Autoformer (MSE) |
|---|---|---|---|---|
| 24 | 0.9669 | 0.9967 | 0.9947 | 0.9660 |
| 48 | 0.9534 | 0.9806 | 0.9683 | 0.9538 |
| 96 | 0.9526 | 0.9837 | 0.9630 | 0.9564 |
| 144 | 0.9503 | 0.9792 | 0.9589 | 0.9524 |
| 216 | 0.9513 | 0.9794 | 0.9583 | 0.9557 |
| 288 | 0.9423 | 0.9720 | 0.9475 | 0.9447 |
| 576 | 0.9211 | 0.9520 | 0.9236 | 0.9276 |
| 720 | 0.9102 | 0.9453 | 0.9122 | 0.9136 |
| Optimal counts | 7 | 0 | 0 | 1 |
As can be seen from the table above, SFFormer achieved the lowest MSE at 7 of the 8 prediction horizons (48, 96, 144, 216, 288, 576, and 720 steps; at 24 steps Autoformer is marginally better), which is significantly more optimal results than PatchTST (0 times), iTransformer (0 times), and Autoformer (1 time). These results indicate that SFFormer maintains excellent performance and robustness when facing sudden abnormal fluctuations such as large-scale events; even in the presence of data disturbances caused by unexpected events, it can capture the underlying temporal patterns and achieve accurate predictions.
We consider your comment to be highly valuable, as it has helped us more clearly define the model’s scope of application and future improvement directions. We will add the content "that incorporates external event variables or an integrated anomaly detection module, thereby improving resilience against unexpected disruptions" to the "Discussion" section of the paper (lines 608-610 on Page 17), providing comments on the model’s limitations and future improvement directions. This clarifies the applicable boundaries of the SFFormer model and offers useful insights for subsequent research.
We would like to sincerely thank you again for your profound insights. Your comments have greatly enhanced the depth of our research and the rigor of the paper. We have carefully revised the manuscript based on the above analysis.
- It would be worthwhile to standardize the font in the graphs and improve the readability of the figures; for example, Figure 4 is terrible.
We sincerely apologize for the poor quality of the figures and the suboptimal reading experience it has caused you. Your observation is completely valid, as the clarity and professionalism of figures are of crucial importance for academic papers.
The core purpose of Figure 4 (heatmap) is to visually demonstrate the multi-scale temporal patterns present in the data, including intra-day cycles, weekly cycles, and abnormal event points. The heatmap we presented is intended to show the pattern differences across different dates throughout the entire month; we chose this visualization precisely because it is a powerful tool for presenting high-dimensional temporal patterns of this type. We fully acknowledge your criticism: the current version of Figure 4 indeed has shortcomings in font size, color contrast, and layout, which hinder the effective communication of information. In the final version, we assure you that all figures—especially Figure 4—will undergo thorough reformatting and enhancement.
- The text also needs editorial refinement; there are incomprehensible and unjustified boldface, character clusters, and typos.
We sincerely appreciate you pointing out these detailed issues regarding text and formatting. This indeed reflects an oversight in our proofreading process prior to the final submission, and we apologize for this. The "unjustified boldface" you mentioned may have resulted from formatting errors during the paper writing process, while the character clusters and typos are unacceptable clerical mistakes that should not have occurred—we thank you for your careful observation of these issues. We will conduct a thorough language and formatting proofreading of the entire manuscript: we will ensure that the use of all formatting elements (such as boldface and italics) complies with academic writing standards and serves a clear purpose; carefully check and correct all typos, garbled characters, and grammatically incoherent sentences; and guarantee consistent use of terminology and symbols throughout the text.
We will make sure that the final manuscript features fluent language, standardized formatting, and precise expression, fully meeting the requirements of academic publication.
We sincerely thank you again for the valuable time you have invested and the profound insights you have provided. Your comments have not only helped us improve the quality of this paper but also offered important guidance for our future research. We believe that the above explanations demonstrate the rigor and innovation of our work, and we look forward to presenting a more polished version of the manuscript to you in the revised draft.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors

Can be accepted.
Reviewer 2 Report
Comments and Suggestions for Authors

My former concerns have been adequately addressed.
Reviewer 3 Report
Comments and Suggestions for Authors

The authors responded in detail to all my comments; I have no reason not to accept this text, and I recommend the article for publication.