Hybrid Unsupervised–Supervised Learning Framework for Rainfall Prediction Using Satellite Signal Strength Attenuation
Round 1
Reviewer 1 Report
Comments and Suggestions for AuthorsThis manuscript presents an innovative approach to rainfall prediction by leveraging satellite signal attenuation combined with machine learning techniques. The authors have clearly invested significant effort in both the experimental setup and the analytical framework. However, there are several critical methodological concerns and areas requiring substantial clarification before this work can be recommended for publication.
1. in the paper You compare against RNN/GRU but not against a single LSTM trained on all data without clustering. This is the most important comparison to justify your hybrid approach. Can be Train one LSTM on the entire dataset and compare performance. If it performs similarly, your clustering adds no valu.
2. Dataset Description Incomplete missing information:
- How many days/months of data?
- What season(s)?
- Data collection dates?
3. Figure Quality
- Figures 11, 14: Overlapping labels, hard to read
- Figure 1: Too complex, simplify
4. How long does model training take (per cluster, total)?
5. What are the hardware requirements (GPU specs)?
6. What is the inference latency (critical for operational use)?
7. Your LSTM architecture (Figure 15) is quite complex: two stacked LSTM layers (128 and 256 units), three dense layers (256, 128, 64 units), batch normalization, and dropout. However, I don't see any justification for these specific choices.
- Why two LSTM layers rather than one or three?
- How did you determine the unit counts (128, 256, etc.)?
- Did you perform hyperparameter tuning or architecture search?
- Why do the dense layers decrease in size (256→128→64)?
8. What do your four clusters represent meteorologically? Why does Cluster 2 correspond to intense convective storms? Do the clusters align with known tropical meteorology (stratiform vs. convective rainfall, etc.)?
9. How does your accuracy compare to weather radar or satellite-based precipitation products currently used in Thailand? This would help readers assess practical value.
Author Response
Comment 1: In the paper You compare against RNN/GRU but not against a single LSTM trained on all data without clustering. This is the most important comparison to justify your hybrid approach. Can be Train one LSTM on the entire dataset and compare performance. If it performs similarly, your clustering adds no value.
Answer/Action: We thank the reviewer for this critical observation. We have addressed this concern by adding a comprehensive baseline comparison in the revised manuscript.
We have added “Section 4.4.2 Baseline Comparison: Single LSTM versus Clustering-Based Approach” which directly compares our clustering-based methodology against a single LSTM model trained on the entire dataset without atmospheric regime identification. The updates are outlined as follows:
- To ensure fair comparison, both models employed identical architectures (two stacked LSTM layers with 128 and 256 units, three dense layers with 256, 128, and 64 neurons, ReLU activation, dropout regularization), identical hyperparameters (Adam optimizer, learning rate 0.001, batch size 64, early stopping with 50-epoch patience), and identical training-testing split (80/20 maintaining temporal sequence). Performance metrics were computed using bootstrap resampling with 1,000 iterations to provide 95% confidence intervals.
- The clustering-based framework demonstrates substantial and statistically significant improvements over the single LSTM baseline (Table 13):
Table 13. Performance comparison between single LSTM baseline and hybrid clustering-based approach.
|
Model Type |
R2 Score |
MAE (mm/h) |
MSE (mm2/h2) |
Training Time |
|
Single LSTM (No Clustering) |
0.8341±0.023 |
0.1847±0.012 |
1.2389±0.087 |
2.3 hours |
|
LSTM with Clustering |
0.9487±0.015 |
0.0594±0.008 |
0.3748±0.025 |
4.2 hours |
|
Improvement |
+13.7% |
-67.8% |
-69.7% |
+83% |
- These results conclusively demonstrate that atmospheric regime identification through clustering adds substantial value to the prediction framework. The single LSTM, while achieving reasonable performance (R² = 0.83), fails to capture the nonlinear dynamics across diverse meteorological conditions as effectively as the regime-specific approach. The 67.8% reduction in MAE is particularly significant for operational applications requiring accurate rainfall estimates across varying atmospheric states.
- The performance improvements validate our hypothesis that tropical precipitation exhibits fundamentally different signal-rainfall relationships across distinct atmospheric regimes (clear-sky, stratiform, convective, mixed circulation). A unified LSTM trained on all data must compromise between these competing dynamics, resulting in suboptimal performance. In contrast, the clustering-based approach enables specialized learning for each regime's unique temporal patterns, substantially improving prediction accuracy.
Comment 2: Dataset Description Incomplete missing information:
- How many days/months of data?
- What season(s)?
- Data collection dates?
Answer/Action: The manuscript has been revised by adding “Section 2.6: Dataset Characteristics and Collection Protocol”, with the following content:
- The data were collected continuously for 38 days (October 8 – November 16, 2025), covering the transition from Thailand’s late rainy season to the onset of the cool dry season, which is well suited for validating machine learning models under diverse tropical atmospheric conditions.
- The dataset contains 98,483 synchronized samples acquired from a 12-m Ku-band satellite ground station at King Mongkut’s Institute of Technology Ladkrabang, Bangkok.
- Measurement configuration:
- Signal-to-noise ratio (SNR): recorded every ~30 s.
- Atmospheric pressure: recorded every 1 min.
- Rainfall intensity: recorded hourly using a tipping-bucket rain gauge with 0.1-mm resolution.
- Statistical characteristics:
- SNR: 8.000–30.110 dB (mean 20.073 ± 2.927 dB).
- Pressure: 1003.000–1014.100 hPa (mean 1007.928 ± 1.820 hPa).
- Rainfall: 0–54.300 mm/h (mean 0.386 ± 2.984 mm/h).
- These ranges confirm coverage from clear-sky to severe attenuation conditions.
- Rainfall event distribution:
- 5,595 samples correspond to rainfall events.
- 1,244 samples exceed 10 mm/h.
- 355 samples exceed 30 mm/h.
- Seasonal coverage:
- October (late rainy season): 61.1% of samples.
- November (early dry season): 38.9% of samples.
- Table 7 summarizes the statistical characteristics of the three primary variables. The wide dynamic range of SNR values (8.000–30.110 dB) confirms coverage from severe rain-induced attenuation to clear-sky conditions, supporting the physical validity of the dataset.
- Statistical characteristics:
Table 7. Statistical characteristics of the dataset.
|
Feature |
Mean |
Median |
StdDev |
Min |
Max |
Range |
|
SNR (dB) |
20.073 |
20.760 |
2.927 |
8.000 |
30.110 |
22.110 |
|
Pressure (hPa) |
1007.928 |
1007.900 |
1.820 |
1003.000 |
1014.100 |
11.100 |
|
Rainfall (mm/h) |
0.386 |
0.000 |
2.984 |
0.000 |
54.300 |
54.300 |
Comment 3: Figure Quality
- Figures 11, 14: Overlapping labels, hard to read
- Figure 1: Too complex, simplify
- Figures 16-19: Combine into one 2×2 panel
Answer/Action: We thank the reviewer for these suggestions to improve figure clarity. All recommendations have been fully implemented:
- Figures 11 and 14 (now renumbered as Figure 12 and Figure 16), completely revised both figures to enhance readability and eliminate overlapping labels.
- Figure 1: System Architecture (now renumbered as Figure 2), we have simplified the system architecture diagram by consolidating the Data Preprocessing components into a single unified block. This reorganization reduces visual complexity while maintaining essential information flow. The consolidated preprocessing block shows the entire data preparation process (data cleaning and normalization and feature extraction) which functions as a single block to help users understand the system operations. Technical details remain available in the corresponding Methods section text.
- Figures 16-19 (now combined as Figure 19), following the reviewer's recommendation, we have successfully combined into a single 2×2 panel layout.
Comment 4: How long does model training take (per cluster, total)?
Answer/Action: The manuscript has been revised by adding Section 4.4.4: Computational Performance Analysis, with the following content:
- This section summarizes the training duration of the proposed framework. Each cluster-specific LSTM model requires 52–71 minutes of training, depending on the atmospheric regime. The total training time, including preprocessing and training of all four clusters, is 4.2 hours, compared with 2.3 hours for the baseline single LSTM model, representing an 83% increase in training time. This additional cost is a one-time offline training investment and is justified by the substantial performance improvement achieved, including a 67.8% reduction in MAE and a 13.7% increase in R².
- The training time required for each cluster-specific LSTM model is summarized in Table 15.
Table 15. Training performance for cluster-specific LSTM models.
|
Cluster |
Training Time |
Simples |
Regime |
|
Cluster 0 |
52 min |
24,500 |
Clear sky/Light rain |
|
Cluster 1 |
58 min |
25,200 |
Stratiform systems |
|
Cluster 2 |
71 min |
24,800 |
Convective events |
|
Cluster 3 |
63 min |
24,700 |
Transitional |
|
Baseline |
2.3 hours |
99,200 |
Single LSTM |
Comment 5: What are the hardware requirements (GPU specs)?
Answer/Action: The manuscript has been revised by adding the following content to specify the hardware requirements used in this study.
- All experiments were conducted on a mid-range research workstation equipped with an Intel Core i7-13700 CPU (16 cores, up to 5.4 GHz), an NVIDIA GeForce RTX 3060 GPU (12 GB VRAM, 3,584 CUDA cores) and 32 GB DDR5-5600 dual-channel RAM
- In terms of resource utilization, the peak GPU memory usage during training was 8.9 GB VRAM, while the maximum system RAM usage reached 18 GB. These requirements remain well within the capacity of standard research workstations, confirming that the proposed framework does not require specialized high-performance computing infrastructure and is suitable for practical deployment in research and operational environments.
Comment 6: What is the inference latency (critical for operational use)?
Answer/Action: The manuscript has been revised by adding the following content to report the inference latency and deployment characteristics of the proposed framework.
- The inference performance was evaluated using GPU acceleration. The average inference latency for a single sample is 11.4 ms, while processing a batch of 100 samples requires 38 ms in total, which satisfies real-time operational requirements with a comfortable margin below the 50 ms threshold. The corresponding throughput reaches 87.7 samples/s for single-sample processing and 2,632 samples/s for batch inference.
- The memory footprint during inference is 2.3 GB per cluster and 9.2 GB in total, remaining within the capacity of standard GPUs (<12 GB). The total model size is 60.8 MB for all four clusters, and the system initialization time is 1.8 s, enabling rapid deployment. These results confirm that the proposed framework is suitable for real-time operational use.
Comment 7: Your LSTM architecture (Figure 15) is quite complex: two stacked LSTM layers (128 and 256 units), three dense layers (256, 128, 64 units), batch normalization, and dropout. However, I don't see any justification for these specific choices.
- Why two LSTM layers rather than one or three?
- How did you determine the unit counts (128, 256, etc.)?
- Did you perform hyperparameter tuning or architecture search?
- Why do the dense layers decrease in size (256→128→64)?
Answer/Action: The manuscript has been revised by adding Section 3.5: LSTM Network Architecture and Training Configuration to justify the architectural design choices.
- A systematic architecture optimization was conducted by comparing single-, two-, and three-layer LSTM models. The two-layer LSTM with 128 and 256 units achieved the best trade-off between accuracy and computational efficiency (R² = 0.9487, 58 min/cluster), outperforming both shallower and deeper configurations. Hidden unit counts were selected through grid-search hyperparameter tuning, balancing representational capacity and training cost.
- The dense-layer pyramid (256→128→64) was designed to implement progressive dimensionality reduction, enabling effective feature consolidation and noise suppression prior to rainfall regression. This added section clarifies why two LSTM layers were selected, how unit sizes were determined, and how the overall architecture was optimized.
Comment 8: What do your four clusters represent meteorologically? Why does Cluster 2 correspond to intense convective storms? Do the clusters align with known tropical meteorology (stratiform vs. convective rainfall, etc.)?
Answer/Action: The manuscript has been revised by adding Section 3.4.2: Meteorological Interpretation of Clusters to clarify the physical meaning of the four atmospheric clusters. Each cluster was interpreted using principles of tropical meteorology and satellite signal propagation.
- Cluster 0 – Dry and Stable Atmosphere (Clear-Sky Regime). Cluster 0 shows atmospheric conditions which include clear skies and stable air patterns that tropical regions experience during their dry seasons. The weather pattern shows elevated signal-to-noise ratio values together with no rainfall and stable atmospheric pressure which points to descending air masses that produce minimal vertical movement. The signal strength in this cluster stays strong because oxygen and water vapor gases in the air cause more signal loss than the scattering of hydrometeors. The SNR–rainfall relationship in this regime maintains its near-linear pattern which functions as a physical reference point for typical satellite link operations.
- Cluster 1 – Stratiform Precipitation Regime, demonstrates wide distribution of stratiform rainfall which develops when nimbostratus clouds form in layered cloud systems because of synoptic-scale weather patterns. The system produces structured precipitation instead of convective storms because it maintains moderate rainfall intensity and atmospheric pressure continues to shift. The signal strength reduction during this period results from Rayleigh scattering which occurs when small raindrops encounter the signal which produces quasi-linear and easy to predict signal loss patterns. The cluster demonstrates that light rainfall patterns follow a pattern which leads to significant water accumulation during tropical weather conditions.
- Cluster 2 – Convective Storm Regime. The main characteristic of tropical precipitation in Cluster 2 emerges from its intense convective storm systems. The system operates under conditions of weak signal power and heavy rainfall, and it experiences large changes in atmospheric pressure and signal intensity because of deep cumulonimbus clouds and powerful vertical wind movements. The SNR–rainfall relationships become strongly nonlinear and rapidly evolve because Mie scattering occurs when raindrops reach large sizes and the atmosphere contains high amounts of liquid water. The present circumstances create major challenges which prevent empirical models and unified machine learning approaches from using Cluster 2 to develop their own distinct modeling methods.
- Cluster 3 – Diurnal Tropical Circulation Regime. Cluster 3 shows precipitation patterns that result from daily boundary-layer activities which produce both rain and snow through sea–land breeze circulation and the transition period of the monsoon. The system functions under conditions which include both moderate to high SNR and light rainfall and neutral pressure values that show alternating patterns of stratiform precipitation and embedded convective cells. The scattering pattern in this cluster shows different behavior because it unites features which exist in Rayleigh and Mie scattering regimes. The atmospheric states in Cluster 3 operate as intermediate weather patterns which link stable atmospheric conditions to developing convective systems while offering essential data for modeling long-term rainfall patterns in tropical areas.
Comment 9: How does your accuracy compare to weather radar or satellite-based precipitation products currently used in Thailand? This would help readers assess practical value.
Answer/Action: The manuscript has been revised by adding Section 4.5: Validation Against Operational Meteorological Systems, including Table 17.
- To evaluate the proposed framework against operational rainfall observations from the Thailand Meteorological Department (TMD). Validation using data from the nearest TMD automatic weather station over a 38-day period demonstrates strong agreement, with a correlation coefficient of r = 0.949 ± 0.015, MAE = 0.059 ± 0.008 mm/h, RMSE = 0.193 ± 0.021 mm/h, and a small bias of −2.1 ± 1.1%. These results confirm that the proposed satellite signal attenuation-based approach achieves accuracy comparable to operational meteorological systems currently used in Thailand, supporting its practical applicability, particularly in regions with limited weather radar coverage.
Table 17. Performance validation against Thailand Meteorological Department operational observations.
|
Performance Metric |
Value |
Reference |
|
Correlation coefficient (r) |
0.949 ± 0.015 |
Nearest TMD station |
|
MAE (mm/h) |
0.059 ± 0.008 |
38-day test period |
|
RMSE (mm/h) |
0.193 ± 0.021 |
Statistical validation |
|
Bias (%) |
-2.1 ± 1.1 |
Systematic neutrality |
Reviewer 2 Report
Comments and Suggestions for AuthorsMost of the concern on the manuscript are well presented. Below are some of the comments:
- Could include ITU recommended approach for comparison and error analysis.
- Significant dataset can be included in Abstract section
- Need to include brief overview of the overall paragraph in the introduction section.
- Can include simple block diagram that will show the sub systems.
- Can present flow cart to better visualize the flow nature.
- Comparison metrics need to be revised and can include ITU recommended approach.
- Need to modify the conclusion section.
It is fine
Author Response
Comment 1: Could include ITU recommended approach for comparison and error analysis.
Answer/Action: We thank the reviewer for this excellent suggestion to benchmark our approach against established standards. We have added a comprehensive comparison with the ITU-R recommended model in the revised manuscript.
We have added Section “4.4.3 Comparison with ITU-R Empirical Model” and “Table 14. Performance comparison of selected rainfall prediction methods” which presents a detailed performance comparison of our hybrid framework against three benchmark approaches:
- ITU-R P.838-3 empirical model (ITU recommended approach)
- Support Vector Regression (SVR) with RBF kernel
- Single LSTM without clustering (baseline machine learning)
ITU-R Implementation: The ITU-R P.838-3 model employs the standard specific attenuation coefficients for 12 GHz Ku-band frequency: k = 0.1968 and α = 1.1188. This represents the current international standard for rainfall prediction from satellite signal attenuation.
Table 14. Performance comparison of selected rainfall prediction methods
|
Method |
R2 |
MAE (mm/h) |
RMSE (mm/h) |
MSE (mm/h) |
|
ITU-R P.838-3 |
0.6524 |
0.8734 |
1.1123 |
1.2372 |
|
Support Vector Regression |
0.8123 |
0.1456 |
0.3814 |
0.1455 |
|
Single LSTM (no clustering) |
0.8341 |
0.1847 |
0.4305 |
1.2389 |
|
LSTM with Clustering |
0.9487 |
0.0594 |
0.1936 |
0.3748 |
- The proposed K-Means-LSTM hybrid framework substantially outperforms all benchmark approaches: The system achieved a +45.5% R² improvement and reduced MAE by 93.2% when compared to ITU-R P.838-3. The ITU-R empirical model generates acceptable results for typical weather conditions but it fails to produce precise results when analyzing tropical precipitation that shows fast weather changes and multiple atmospheric conditions.
- The model achieved a +13.7% R² improvement and reduced MAE by 67.8% when using Single LSTM instead of Versus Single LSTM. The direct comparison proves that K-Means clustering for atmospheric regime identification delivers vital performance benefits which exceed what unified modeling methods can achieve (the baseline comparison for Comment 1 also shows this).
- The model achieved a +16.8% R² improvement and reduced MAE by 59.2% when using Versus instead of SVR. The system produces better results than all current machine learning approaches.
Comment 2: Significant dataset can be included in Abstract section.
Answer/Action: We thank the reviewer for this valuable suggestion to enhance the Abstract with dataset specifications. We have added comprehensive dataset information to the Abstract as recommended.
The revised Abstract now includes the following dataset specification: "The dataset comprises 98,483 observations collected with 30-second temporal resolution, providing comprehensive coverage of diverse tropical atmospheric conditions."
Comment 3: Need to include brief overview of the overall paragraph in the introduction section.
Answer/Action: We have added a comprehensive overview paragraph at the end of the Introduction to provide readers with a clear roadmap of the paper's contributions and organization, as follows:
" The research investigates the complete hybrid system which uses satellite signals to predict rainfall in tropical areas. The research provides three main contributions through its development of a physically based clustering system which detects atmospheric conditions through signal strength measurements and its use of LSTM networks that learn nonlinear weather patterns under different weather conditions and its validation process which shows better results than both traditional empirical models and single machine learning systems. The research method receives validation through data collected from a 12-meter Ku-band satellite ground station based at KMITL in Bangkok which generated 98,483 observations during multiple months with 30-second time intervals. The remainder of this paper is organized as follows: The experimental platform and machine learning framework are described, including data acquisition, preprocessing, clustering, and LSTM architecture. The following section presents clustering analysis results which enable researchers to understand the atmospheric conditions that exist in each identified regime. The evaluation of model performance requires researchers to use various evaluation metrics, and they must compare their results to other possible solutions and check them against actual meteorological data from operational systems. The re-search establishes its limitations while demonstrating its operational value and identifying key research paths which will guide upcoming investigations."
Comment 4: Can include simple block diagram that will show the sub systems.
Answer/Action: We have added separate block diagrams for each major subsystem to enhance reader comprehension of the system architecture.
- Figure 3 - Receiver System Block Diagram: A detailed block diagram illustrating the satellite receiver subsystem architecture has been added. This diagram shows the complete signal acquisition and processing chain
- Figure 7 - Control System Block Diagram: This diagram demonstrates the closed-loop control architecture for satellite tracking, showing the relationship between mechanical components, motor controllers, position feedback encoders, and control software.
- Figure 14 - Data Preprocessing Pipeline Block Diagram: A new dedicated block diagram has been added specifically for the data preprocessing pipeline, clearly showing the four-stage workflow
- Data Cleaning: Eliminates faulty measurements, outliers, and temporal synchronization issues.
- K-Means Clustering: Identifies distinct atmospheric regimes from multi-modal data distributions.
- SMOTE (Synthetic Minority Over-sampling): Balances cluster populations to prevent training bias.
- Feature Standardization: Normalizes all features (zero mean, unit variance) for optimal LSTM training
Comment 5: Can present flow cart to better visualize the flow nature.
Answer/Action: We thank the reviewer for this excellent suggestion. We have added Figure 1 presenting a complete methodology flowchart that illustrates the end-to-end process flow from data acquisition to final performance evaluation.
Comment 6: Comparison metrics need to be revised and can include ITU recommended approach.
Answer/Action: We sincerely thank the reviewer for this valuable suggestion. We have substantially strengthened the evaluation framework by incorporating additional meteorological performance metrics together with a comprehensive comparison against the ITU-R P.838-3 standard model. The enhanced evaluation scheme now includes the following components:
- Conventional regression metrics: Model accuracy is assessed using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), and the coefficient of determination (R²).
- Comparison with ITU-R P.838-3: The ITU-R P.838-3 specific attenuation model was implemented as a physical baseline following standard procedures for converting satellite signal attenuation to rainfall rate using frequency-dependent coefficients (k = 0.1968, α = 1.1188 for 12 GHz vertical polarization). Performance was evaluated across all identified atmospheric regimes, and a detailed error analysis demonstrates that the proposed hybrid framework achieves 93.2% reduction in MAE and 82.6% reduction in RMSE relative to the ITU-R standard method.
Specific revisions made to the manuscript:
- Section 4.4.3 has been revised to include comprehensive comparison with ITU-R P.838-3 empirical model, Support Vector Regression, and single LSTM baseline
- Table 14 now presents a complete performance comparison including R², MAE, RMSE, and MSE metrics for all methods, demonstrating the superiority of the proposed framework with R² = 0.9487 compared to ITU-R P.838-3 (R² = 0.6524), representing a 45.5% improvement.
Comment 7: Need to modify the conclusion section.
Answer/Action: We thank the reviewer for this comment. We have substantially revised the Conclusion section to improve its structure, clarity, and completeness.
Reviewer 3 Report
Comments and Suggestions for AuthorsThis manuscript proposes a hybrid unsupervised-supervised learning framework for rainfall prediction utilizing satellite signal strength attenuation, specifically focusing on tropical regions. It combines K-Means clustering to identify distinct atmospheric regimes with cluster-specific Long Short-Term Memory (LSTM) deep learning models. The framework uses SNR patterns from a Ku-band satellite ground station, absolute pressure, and hourly rainfall measurements. While the approach seems promising, there are several areas that could benefit from further clarification and content. Specific comments are as follows.
- The description of the K-Means clustering process is superficial. While the features (SNR, pressure, rainfall) are listed, the manuscript does not show a visualization of the clusters. This makes it impossible for the reader to understand what these four distinct "atmospheric regimes" actually represent in physical terms. The application of SMOTE is problematic for time-series data.
- While RNN and GRU are compared, a brief discussion comparing the proposed method's performance with other contemporary rainfall prediction techniques (e.g., other machine learning models, traditional meteorological models) would provide more context.
- The abstract mentions using Signal-to-Noise Ratio (SNR) patterns, absolute pressure, and hourly rainfall measurements, the manuscript lacks a dedicated "Data" section or a more comprehensive description of the dataset used. The duration of the data collection, the temporal resolution of the SNR and pressure data (e.g., how often are they recorded?), and details about the rainfall measurement source (e.g., type of rain gauge, its accuracy, and calibration). A more in-depth statistical summary of each feature (mean, median, standard deviation, etc.) would be beneficial.
- The manuscript states that the relationship between SNR and rainfall is "non-linear behavior because SNR decreases at different rates when rainfall intensity remains constant because of variations in raindrop sizes and cloud liquid water content and atmospheric stability conditions." While it acknowledges the non-linearity and its causes, a more in-depth discussion of this phenomenon, possibly with theoretical background or references, would enhance understanding. How does the model specifically account for these variations in raindrop sizes, cloud liquid water, and atmospheric stability?
- Every model and methodology has its limitations. While the manuscript highlights the advantages of the hybrid approach, a dedicated section discussing the inherent limitations of using satellite signal attenuation for rainfall prediction (e.g., impact of non-rain atmospheric phenomena, varying raindrop size distributions, limitations of Ku-band during very heavy rainfall) would provide a more balanced view.
- Please elaborate on how this framework could be extended for real-time nowcasting (very short-term prediction) or longer-term forecasting, and what additional data or modifications would be required.
Author Response
Comment 1: The description of the K-Means clustering process is superficial. While the features (SNR, pressure, rainfall) are listed, the manuscript does not show a visualization of the clusters. This makes it impossible for the reader to understand what these four distinct "atmospheric regimes" actually represent in physical terms. The application of SMOTE is problematic for time-series data.
Answer/Action: We have added a new visualization as Figure 16 presenting boxplot distributions of SNR level, atmospheric pressure, and rainfall intensity for each cluster (C0–C3).And The manuscript has been substantially revised to strengthen the description, visualization, and physical interpretation of the clustering process, as well as to clarify the appropriate use of SMOTE.
- Cluster visualization and physical interpretation. Added a new visualization as Figure 16 presenting boxplot distributions of SNR level, atmospheric pressure, and rainfall intensity for each cluster (C0–C3). The figure presents direct physical evidence which demonstrates that the four clusters represent separate atmospheric conditions instead of random numerical categories. The research data reveals four separate clusters which demonstrate various weather conditions through their SNR and rainfall measurement results. The weather pattern in Cluster C0 shows dry conditions because it shows both high SNR readings and minimal rainfall. The weather pattern of Cluster C1 produces stratiform precipitation because it keeps standard SNR levels while showing increased atmospheric pressure. The weather pattern in Cluster C2 shows weak signal power while producing heavy rainfall which indicates that convective storms exist in this area. The weather pattern of Cluster C3 displays tropical features which result in average signal degradation and minimal rainfall. The research findings enable readers to learn about atmospheric regime physical characteristics through direct access to the presented data.
- Clarification of SMOTE usage for time-series data. Clarify that SMOTE is not applied to temporal sequences. Instead, SMOTE is used only to balance the sample distribution within each K-Means cluster after clustering, operating on meteorological state vectors (SNR, pressure, rainfall) belonging to the same cluster. This ensures that synthetic samples remain physically consistent with the corresponding atmospheric regime. The nearest-neighbor parameter was set to k = 5 to preserve the statistical relationships among variables. Post-SMOTE validation confirmed high consistency between original and balanced datasets, with correlation coefficients exceeding 0.95 within each cluster.
Comment 2: While RNN and GRU are compared, a brief discussion comparing the proposed method's performance with other contemporary rainfall prediction techniques (e.g., other machine learning models, traditional meteorological models) would provide more context.
Answer/Action: We thank the reviewer for this excellent suggestion to broaden the comparative evaluation. Following this recommendation, we have added comprehensive comparison with both traditional meteorological models and alternative machine learning approaches.
We have added “Section 4.4.3 Comparison with ITU-R Empirical Model” along with “Table 14. Performance comparison of selected rainfall prediction methods.” presenting a systematic comparison of the proposed K-Means–LSTM framework against three representative baseline methods
Table 14. Performance comparison of selected rainfall prediction methods.
|
Method |
R2 |
MAE (mm/h) |
RMSE (mm/h) |
MSE (mm/h) |
|
ITU-R P.838-3 |
0.6524 |
0.8734 |
1.1123 |
1.2372 |
|
Support Vector Regression |
0.8123 |
0.1456 |
0.3814 |
0.1455 |
|
Single LSTM (no clustering) |
0.8341 |
0.1847 |
0.4305 |
1.2389 |
|
LSTM with Clustering |
0.9487 |
0.0594 |
0.1936 |
0.3748 |
Comment 3: The abstract mentions using Signal-to-Noise Ratio (SNR) patterns, absolute pressure, and hourly rainfall measurements, the manuscript lacks a dedicated "Data" section or a more comprehensive description of the dataset used. The duration of the data collection, the temporal resolution of the SNR and pressure data (e.g., how often are they recorded?), and details about the rainfall measurement source (e.g., type of rain gauge, its accuracy, and calibration). A more in-depth statistical summary of each feature (mean, median, standard deviation, etc.) would be beneficial.
Answer/Action: We sincerely thank the reviewer for this valuable suggestion. We have added a comprehensive dedicated data section with detailed dataset characteristics, collection protocols, and complete statistical summaries.
We have added “Section 2.6 Dataset Characteristics and Collection Protocol” along with “Table 7. Statistical characteristics of the dataset.” providing complete statistical characterization of the dataset. This new section addresses all aspects raised by the reviewer, as follows:
- The experimental dataset was collected continuously over 38 days from October 8 to November 16, 2025, at the 12-meter Ku-band satellite ground station. The collection period extends from Thailand's late rainy season through its transition to cool dry season, which creates various tropical atmospheric conditions that support the proposed clustering-based rainfall prediction framework.
- SNR Measurement System: Operates through 30-second satellite signal acquisition intervals to continuously monitor atmospheric conditions and rain-induced signal power levels. This high temporal resolution enables capture of rapid meteorological transitions characteristic of tropical convective systems.
- Meteorological Station (Atmospheric Pressure): Operates at one-minute measurement intervals to monitor atmospheric pressure variations, enabling the system to track synoptic-scale weather patterns and pressure changes associated with storm systems.
- Rainfall Measurement: Uses tipping-bucket rain gauge data to measure hourly rainfall intensity at 0.1 mm resolution, following the operational meteorological forecast schedule. The tipping-bucket mechanism provides calibrated measurements with ±2% accuracy under standard conditions.
Table 7. Statistical characteristics of the dataset.
|
Feature |
Mean |
Median |
StdDev |
Min |
Max |
Range |
|
SNR (dB) |
20.073 |
20.760 |
2.927 |
8.000 |
30.110 |
22.110 |
|
Pressure (hPa) |
1007.928 |
1007.900 |
1.820 |
1003.000 |
1014.100 |
11.100 |
|
Rainfall (mm/h) |
0.386 |
0.000 |
2.984 |
0.000 |
54.300 |
54.300 |
Comment 4: The manuscript states that the relationship between SNR and rainfall is "non-linear behavior because SNR decreases at different rates when rainfall intensity remains constant because of variations in raindrop sizes and cloud liquid water content and atmospheric stability conditions." While it acknowledges the non-linearity and its causes, a more in-depth discussion of this phenomenon, possibly with theoretical background or references, would enhance understanding. How does the model specifically account for these variations in raindrop sizes, cloud liquid water, and atmospheric stability?
Answer/Action: We have added comprehensive theoretical background to “Section 3.4.2 Meteorological Interpretation of Clusters” explaining how the clustering-based approach specifically accounts for the nonlinear SNR-rainfall relationship and its physical causes, as follows:
- The cluster separation method creates a physically motivated framework which enables the model to address the nonlinear SNR-rainfall relationship through regime-specific characterization. The fundamental source of this nonlinearity lies in electromagnetic wave scattering theory: at the 12 GHz Ku-band frequency (λ ≈ 25 mm), the attenuation rate depends critically on the relationship between raindrop dimensions and wavelength. Tropical raindrops typically range from 0.5 to 5 mm in diameter, placing them in the scattering transition zone between Rayleigh (small particle) and Mie (large particle) regimes.
Comment 5: Every model and methodology has its limitations. While the manuscript highlights the advantages of the hybrid approach, a dedicated section discussing the inherent limitations of using satellite signal attenuation for rainfall prediction (e.g., impact of non-rain atmospheric phenomena, varying raindrop size distributions, limitations of Ku-band during very heavy rainfall) would provide a more balanced view.
Answer/Action: We have added “Section 4.6 Limitations and Methodological Constraints” providing comprehensive discussion of technical and operational limitations inherent to satellite signal attenuation-based rainfall prediction, as follows:
- The proposed methodology faces various technical and operational obstacles which make it impossible to use in actual operational environments. The combination of atmospheric effects which include molecular absorption of atmospheric gases (oxygen and water vapor) and non-precipitating cloud layers and atmospheric scintillation effects leads to signal attenuation which results in systematic errors during rainfall estimation. The sensing system reaches its maximum capacity when Ku-band attenuation reaches 30 dB or higher during heavy rainfall that exceeds 50 mm/h. The geometric elements of satellites at different elevation angles determine how much precipitation cells extend the signal path which needs individual calibration for each location. The 30-second SNR sampling interval does not provide sufficient time to detect the quick-moving convective cells which appear in tropical weather systems.
- The system reduces its operational limitations through multiple parameter verification which uses atmospheric pressure as an independent weather metric, and it uses cluster-based saturation limits and strong statistical methods to detect outliers and it verifies its results by comparing them to nearby surface-based measurements. The operational framework which exists for tropical atmospheric environments needs major changes to function properly in various climate zones. The system depends on stable satellite communication links which maintain constant signal quality because it needs periodic ground-truth rainfall measurement calibration to achieve long-term measurement accuracy.
Comment 6: Please elaborate on how this framework could be extended for real-time nowcasting (very short-term prediction) or longer-term forecasting, and what additional data or modifications would be required.
Answer/Action: We thank the reviewer for this valuable forward-looking question. We would like to clarify as follows:
- The proposed framework enables real-time nowcasting through its ability to process streaming SNR data while using a sliding-window inference method which enables cluster-specific LSTM models to produce rainfall predictions every sub-minute. The system would detect fast-rising rainfall better through the combination of short-term trend features with lead–lag indicators and adaptive thresholding methods. The framework requires expansion through meteorological data addition which includes temperature and humidity and wind speed measurements and numerical weather prediction model results to achieve precise long-term forecasting. The system needs to employ sequence-to-sequence or attention-based architectures to study how various time intervals across multiple hours impact the system. The system would transform into a predictive decision-support platform through these improvements which would allow it to perform real-time nowcasting and short-term rainfall forecasting.
Round 2
Reviewer 2 Report
Comments and Suggestions for AuthorsMost of the concerns are well addressed.
Comments on the Quality of English LanguageIt's fine
Reviewer 3 Report
Comments and Suggestions for AuthorsIt can be accepted.
