Analyzing the Impact of High-Frequency Noise on Hydrological Runoff Modeling: A Frequency-Based Framework for Data Uncertainty Assessment

Liu, Tianxu; Ouyang, Wenyu; Adnan, Muhammad; Zhang, Chi

doi:10.3390/w18020195

Open Access Article


¹ School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China

² School of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China

* Author to whom correspondence should be addressed.

Water 2026, 18(2), 195; https://doi.org/10.3390/w18020195

Submission received: 26 October 2025 / Revised: 18 December 2025 / Accepted: 22 December 2025 / Published: 12 January 2026

(This article belongs to the Section Water Resources Management, Policy and Governance)


Abstract

The performance of deep learning-based hydrological forecasting is highly sensitive to input quality, yet existing studies lack a systematic framework to evaluate the impact of high-frequency noise based on hydrological characteristics. To address this, we propose a frequency-based framework to assess the robustness of LSTM runoff prediction models. We define three hydrologically meaningful noise types (long-term trend, short-term event, and transient interference) and employ a synthetic noise injection strategy on the CAMELS dataset. Furthermore, we introduce an adaptive exponentially weighted moving average (AEWMA) algorithm that dynamically adjusts smoothing based on local signal variability. Results from dual-domain evaluation (time and frequency) indicate that model accuracy deteriorates significantly when high-frequency noise exceeds 30% of the total signal energy. Moderate adaptive smoothing (e.g., $(\alpha_{high}, \alpha_{low}) = (0.9, 0.6)$) effectively preserves hydrological signals while mitigating performance loss, whereas aggressive smoothing suppresses meaningful variations. This study underscores the necessity of noise-type-specific preprocessing and suggests spectral energy ratios as quantitative thresholds for adaptive data quality control in hydrological modeling workflows.

Keywords:

hydrological forecasting; noise injection; data uncertainty; adaptive denoising; frequency-domain analysis

1. Introduction

Accurate hydrological forecasting is fundamental to water resource management and disaster mitigation [1]. In recent years, data-driven approaches, particularly those utilizing Long Short-Term Memory (LSTM) networks, have become essential tools for simulating rainfall-runoff dynamics due to their powerful capability in capturing nonlinear features [1,2]. However, the performance of these deep learning models is intrinsically sensitive to the quality of input data [3,4]. In operational settings, hydrological time series are inevitably compromised by various forms of measurement uncertainty [5], such as slow sensor drift, abrupt spikes caused by environmental interference, and intermittent errors during data transmission [6,7,8,9,10]. These disturbances distort the temporal patterns learned by the model, inducing overfitting to noise and, consequently, severely undermining the model’s generalization ability and predictive accuracy [11]. Therefore, the identification, quantification, and mitigation of these specific noise patterns constitute a critical prerequisite for building robust hydrological forecasting systems [12].

Despite the ubiquitous nature of data uncertainty, existing noise mitigation strategies still exhibit significant limitations [13,14]. Conventional signal processing techniques, such as moving averages [15] and wavelet transforms [16], have been applied to de-noise hydrological series [17]; yet, they typically rely on static parameters or subjective assumptions about signal smoothness, thereby lacking the necessary adaptability to handle the non-stationary fluctuations inherent in runoff data [18,19]. More critically, mainstream noise classification systems (e.g., borrowing “white noise” or “red noise” concepts from signal processing) often fail to establish a clear linkage with physical hydrological processes [20]. For instance, pure spectral classification can identify long-range dependence but cannot distinguish between a slow baseline shift caused by sensor drift and a seasonal change resulting from natural climate trends [21,22]. These two phenomena have vastly different implications for model calibration and prediction [23]. This discrepancy between statistical noise definitions and physical hydrological characteristics represents a core research gap in the current literature.

To address these challenges, this study proposes a systematic experimental framework based on frequency domain analysis, designed to quantify the impact of high-frequency noise on the robustness of LSTM models. We first define a hydrology-specific noise typology that delineates disturbances into three categories based on their temporal scale and physical origin: (1) Long-term Trend Noise (simulating sensor drift [8]); (2) Short-term Event Noise (simulating influences like irrigation or reservoir operations [10]); and (3) Transient Interference Noise (simulating data spikes or transmission errors [11]). Based on this classification, we devise a synthetic noise injection strategy to systematically evaluate model vulnerability under various uncertainty scenarios within a controlled environment.

Furthermore, to tackle the challenge of non-stationary data, we introduce an Adaptive Exponentially Weighted Moving Average (AEWMA) denoising algorithm [24,25]. This algorithm utilizes a dynamic threshold mechanism based on the Interquartile Range (IQR) [26,27], enabling real-time adjustment of the smoothing coefficient ($\alpha_{t}$) according to local signal volatility. This adaptive mechanism is engineered to balance the suppression of high-frequency noise with the preservation of critical hydrological signals, such as flood peaks, effectively overcoming the limitations of conventional static filters in protecting signal integrity.

Finally, we establish a dual-domain evaluation system integrating both the time domain and the frequency domain. The overall architecture is shown in Figure 1. Beyond employing traditional time-domain metrics [28,29] (e.g., Nash-Sutcliffe Efficiency (NSE) and Root Mean Square Error (RMSE)), we incorporate Power Spectral Density [30] (PSD) analysis to quantify how noise alters the spectral energy distribution of the runoff signal. By calculating the energy ratio contribution of the high-frequency components [31,32], we aim to determine a quantitative spectral energy ratio threshold that signals a significant degradation in model performance. This study aims to provide a systematic and practical workflow for high-frequency noise impact assessment and adaptive data preprocessing, thereby substantially enhancing the reliability of LSTM-based hydrological models under real-world, uncertain data conditions.

The main contributions of this study are summarized as follows:

We propose a three-category noise typology specific to hydrological characteristics and a frequency domain evaluation framework, addressing the lack of a systematic, physically relevant noise assessment system in existing research.
We develop the AEWMA adaptive denoising algorithm, which achieves an effective balance between noise suppression and hydrological signal preservation through an IQR-based dynamic adjustment mechanism.
Through dual-domain (time and frequency) analysis, we quantify the vulnerability of LSTM models to different noise types and for the first time propose quantitative thresholds based on spectral energy ratios, providing guidance for data quality control in operational hydrological forecasting systems.

2. Materials and Methods

2.1. Dataset and Variable Selection

This study utilizes the CAMELS dataset (Catchment Attributes and Meteorology for Large-sample Studies) [33], which provides daily meteorological forcings, static catchment attributes, and streamflow observations for 671 basins across the contiguous United States. It is important to note that these basins were specifically selected to represent relatively undisturbed headwater catchments with minimal anthropogenic influence, rather than constituting a random sample of all US watersheds [33]. The dataset underwent comprehensive quality control procedures for both meteorological inputs and streamflow observations, resulting in higher data reliability compared to many operational monitoring networks [33]. While this careful selection enhances data quality for modeling purposes, it also introduces a limitation: findings may not directly transfer to heavily managed or urbanized watersheds where human interventions dominate hydrological processes. Nevertheless, the standardized structure, consistent processing methodology, and well-documented characteristics of the CAMELS dataset make it particularly suitable for our controlled investigation of noise impacts on hydrological modeling. The focus on relatively pristine basins strengthens our experimental design by minimizing confounding factors from human alterations when isolating the effects of measurement noise on model performance.

We selected six daily meteorological variables and fourteen static catchment attributes from the CAMELS dataset as model inputs. All observed streamflow values were converted into depth units (mm/day) to normalize for basin area. The complete list of variables is provided in Table 1.
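The depth conversion can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes raw discharge is given in cubic feet per second (the unit of USGS gauge records) and basin area in km², and the helper name `cfs_to_mm_per_day` is hypothetical.

```python
import numpy as np

CFS_TO_M3S = 0.028316846592  # 1 cubic foot = 0.028316846592 m^3 (exact)
SECONDS_PER_DAY = 86400


def cfs_to_mm_per_day(q_cfs, area_km2):
    """Convert discharge (ft^3/s) to runoff depth (mm/day) over a basin.

    Hypothetical helper: assumes q_cfs in cubic feet per second and
    area_km2 in square kilometres.
    """
    q_m3_per_day = np.asarray(q_cfs, dtype=float) * CFS_TO_M3S * SECONDS_PER_DAY
    area_m2 = area_km2 * 1e6
    # Volume divided by area gives a depth in metres; x1000 converts to mm
    return q_m3_per_day / area_m2 * 1000.0


# Example: 100 ft^3/s draining a 500 km^2 basin
print(round(float(cfs_to_mm_per_day(100.0, 500.0)), 4))  # 0.4893
```

Expressing flow as a depth removes the dominant effect of basin size, so that basins of very different areas share a comparable target scale.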

2.2. Noise Typology Based on Temporal and Physical Characteristics

Existing hydrological noise classification lacks a systematic framework tailored to runoff characteristics. Current approaches typically treat all deviations as generic measurement errors without considering their hydrological origins [34,35]. While signal processing offers well-established noise classifications (e.g., white, red, or pink noise) [20], these spectral frameworks are fundamentally mismatched with hydrological processes that operate across distinct temporal scales and physical mechanisms [36]. Drawing from fundamental research distinguishing measurement artifacts from system dynamics [34,37,38], we propose a novel three-class typology specifically designed for streamflow time series. Our framework organizes noise according to hydrological forecasting time scales (sub-daily, daily, seasonal) while explicitly accounting for physical origins and model performance impacts [39,40]. This process-oriented approach transcends purely statistical characterizations, providing practical guidance for targeted noise identification and mitigation in rainfall-runoff modeling. The typical characteristics and classifications of these three types of noise are conceptually represented in Figure 2.

Long-term trend noise: This type of noise spans time scales of several weeks or longer and is typically induced by gradual sensor drift or variations in ambient temperature and humidity [8,9,41]. It slowly distorts the baseline trend of runoff records and interferes with the model’s ability to capture long-term hydrological dynamics, potentially leading to systematic overestimation or underestimation of runoff trends [20,42]. Removing this noise enables the model to more accurately track long-term runoff trends, reduce systematic bias, and improve long-range predictive accuracy.
Short-term event noise: This noise spans hours to several days and typically arises from transient anthropogenic disturbances such as reservoir operations, agricultural irrigation, or other land-use interventions [10,43]. Electromagnetic interference also poses a known challenge in environmental monitoring systems. These disturbances significantly affect runoff by altering land surface properties, modifying evapotranspiration patterns, or directly withdrawing water. During critical hydrological events, such as flood formation or peak flow periods, this noise can introduce spurious fluctuations and reduce the model’s sensitivity to sudden changes, potentially delaying flood peak predictions. Eliminating this type of noise can enhance the model’s accuracy in predicting short-term hydrological processes, including rainfall-runoff responses and flood peak timing.
Transient interference noise: Occurring over time scales of seconds to minutes, this noise typically results from abrupt sensor failures such as data transmission losses or electromagnetic pulses [44,45,46]. Frequent occurrences of this noise can lead to model overfitting, as the model may memorize local noise artifacts instead of learning global hydrological patterns, thereby compromising generalization performance. Removing such transient noise improves model robustness by reducing overfitting to local anomalies and promoting the learning of broader hydrological principles.

2.3. Synthetic Noise Injection Strategy

To investigate the effects of noise on hydrological models and develop more robust forecasting systems, synthetic noise simulation and injection have become essential experimental tools in hydrological research [35]. By artificially introducing random perturbations into observed runoff or forcing data, we emulate the types of noise typically encountered during data acquisition, transmission, or preprocessing. These include sensor malfunctions, environmental variability, and recording errors, which are pervasive in automated monitoring networks [47].

The goal of synthetic noise injection is to evaluate model robustness and stability under varying degrees of data uncertainty, and to provide a controlled framework for testing denoising techniques. Given the difficulty of quantitatively identifying the exact composition and distribution of real-world noise [34], we adopt a composite simulation strategy. This strategy is designed not merely as data augmentation, but to emulate specific physical error mechanisms inherent to hydrological instrumentation [48]. Specifically, the multiplicative noise component captures the signal-dependent heteroscedasticity observed in stage-discharge rating curves, where uncertainty scales with flow magnitude [14,49]. Furthermore, the periodic injection condition simulates intermittent sensor faults [47].

Each synthetic noise instance consists of two components: a gross error and a small-scale random perturbation. The gross error magnitude is controlled by a scaling parameter $p$, its occurrence frequency by the injection period $L$ (one gross error every $L$ time steps), and the variability of the small error by the noise variance $\sigma^{2}$. By tuning these parameters, we can generate diverse noise profiles and composite environments [48,49], enabling a systematic analysis of their effects on model performance and data preprocessing. Rather than reproducing specific physical processes, the synthetic noise framework is designed to abstract representative temporal and spectral patterns of real-world disturbances, enabling controlled robustness evaluation under diverse uncertainty scenarios. Future work will validate the realism of these synthetic configurations using observed noise signatures in hydrological datasets.

The noise generation process is formulated as follows:

$$\mathrm{noise}[i] = \begin{cases} (r[i] - 0.5) \cdot p \cdot \mathrm{data}[i] + e[i], & \text{if } i \bmod L = 0 \\ e[i], & \text{if } i \bmod L \neq 0 \end{cases} \tag{1}$$

where $r[i] \in (0, 1)$ is a uniform random variable; $p$ denotes the scaling factor for gross error, which typically ranges from 0.1 to 1.0 to reflect high-flow extrapolation uncertainty [48]; $\mathrm{data}[i]$ is the original signal value; and $e[i]$ is the Gaussian error component drawn from a normal distribution $N(0, \sigma^{2})$, representing background aleatoric uncertainty. The periodic term $i \bmod L = 0$ explicitly models the sparsity of maintenance-induced outliers or intermittent transmission failures [47].
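A minimal NumPy sketch of Eq. (1) follows. The function name `generate_noise` is hypothetical, and two details are assumptions not stated in the text: the noisy series is formed additively as data[i] + noise[i], and indexing starts at i = 0, so the first sample receives a gross error. NumPy's `uniform` draws from [0, 1) rather than the open interval (0, 1), which makes no practical difference here.

```python
import numpy as np


def generate_noise(data, p, L, sigma, rng=None):
    """Composite noise following Eq. (1) (hypothetical helper).

    data  : 1-D array of original signal values, data[i]
    p     : gross-error scaling factor
    L     : injection period; a gross error is added where i mod L == 0
    sigma : standard deviation of the Gaussian component e[i] ~ N(0, sigma^2)
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    n = data.size
    noise = rng.normal(0.0, sigma, size=n)      # background term e[i]
    gross = np.arange(n) % L == 0               # positions with i mod L == 0
    r = rng.uniform(0.0, 1.0, size=n)           # r[i], uniform on [0, 1)
    noise[gross] += (r[gross] - 0.5) * p * data[gross]
    return noise


# Illustration on a placeholder runoff series (not CAMELS data)
rng = np.random.default_rng(42)
q = np.abs(rng.normal(2.0, 1.0, size=365))
noisy_q = q + generate_noise(q, p=0.5, L=10, sigma=0.05, rng=rng)  # assumed additive
```

Setting `sigma` to 0 isolates the gross-error term, whose magnitude is bounded by 0.5·p·|data[i]|; raising `p` or lowering `L` shifts the profile toward the high-intensity scenarios.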

2.4. Adaptive Denoising Design: From EWMA to AEWMA

In large-sample hydrological studies such as CAMELS, denoising methods must balance signal preservation with cross-basin generalizability. We evaluated several established techniques, including the Wavelet Transform (WT) [50], Fourier-based filtering [51], and Kalman filtering, but found the frequency-domain approaches ill-suited for operational forecasting. The Discrete Wavelet Transform (DWT) [50,52], for example, despite its multi-resolution capability, suffers from boundary artifacts and requires basin-specific tuning of the wavelet type and decomposition level, hindering scalability across heterogeneous catchments [53].

Consequently, we focused on causal, time-domain filters. A simple moving average was first tested as a baseline but proved inadequate: its fixed window conflates high-frequency noise with rapid hydrological dynamics [15]. The Exponentially Weighted Moving Average (EWMA) [24] mitigates this by emphasizing recent observations; the EWMA formula is as follows:

$$y_{t} = \begin{cases} X_{0}, & t = 0 \\ \alpha X_{t} + (1 - \alpha)\, y_{t-1}, & t > 0 \end{cases} \tag{2}$$

However, its static smoothing parameter $\alpha$ cannot adapt to the non-stationary volatility inherent in runoff time series.
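To make the limitation of a static $\alpha$ concrete, the sketch below implements Eq. (2) and applies it to an isolated one-day spike. It is an illustrative example with a hypothetical `ewma` helper, not the authors' implementation.

```python
import numpy as np


def ewma(x, alpha):
    """Static EWMA of Eq. (2): y_0 = X_0, y_t = alpha*X_t + (1 - alpha)*y_{t-1}."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, x.size):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y


# Isolated one-day spike of height 10 on a zero baseline
x = np.zeros(8)
x[3] = 10.0
light = ewma(x, 0.9)   # little smoothing: the spike largely passes through
heavy = ewma(x, 0.3)   # strong smoothing: the spike is damped but smeared forward
print(np.round(light, 2))
print(np.round(heavy, 2))
```

With $\alpha = 0.9$ the filter passes 90% of the spike on the day it occurs; with $\alpha = 0.3$ it keeps only 30% but spreads the remainder over the following days (2.1 on the next day, then 1.47, and so on). A single fixed $\alpha$ therefore either lets noise through or delays and flattens genuine rapid responses.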

To overcome this limitation, we adopt an Adaptive EWMA (AEWMA) [25], which dynamically modulates the smoothing coefficient $\alpha_{t}$ based on local signal variability. This design retains the computational efficiency and interpretability of time-domain filtering while introducing the responsiveness typically afforded only by more complex, frequency-based methods, enabling robust preprocessing across diverse hydro-climatic regimes without per-catchment calibration.

The core of AEWMA lies in its adaptive smoothing coefficient $\alpha_{t}$, which varies in response to the local fluctuation level of the input signal $X_{t}$. To quantify local fluctuations, we define a robust threshold based on the interquartile range (IQR) of the runoff time series. Let $Q_{1}$ and $Q_{3}$ denote the first and third quartiles, and $\mathrm{IQR} = Q_{3} - Q_{1}$ [26,27]. The dynamic threshold is:

$$\mathrm{Threshold} = Q_{3} + 1.5 \cdot \mathrm{IQR} \tag{3}$$

The IQR-based threshold is more robust to outliers than variance-based thresholds, as it relies on quantiles rather than second-order moments.
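The robustness claim can be checked numerically. The sketch compares the threshold of Eq. (3) with a variance-based alternative (mean plus three standard deviations, chosen here only for comparison) on a series before and after a single gross outlier is appended. Function names are hypothetical; `np.percentile` uses linear interpolation by default, and other quartile conventions give slightly different values.

```python
import numpy as np


def iqr_threshold(x, k=1.5):
    """Upper threshold of Eq. (3): Q3 + k * IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    return q3 + k * (q3 - q1)


def sigma_threshold(x, k=3.0):
    """Variance-based alternative, for comparison only: mean + k * std."""
    return np.mean(x) + k * np.std(x)


base = np.arange(1.0, 101.0)           # 1, 2, ..., 100
contaminated = np.append(base, 1e4)    # append one gross outlier

for name, f in [("IQR", iqr_threshold), ("mean+3sd", sigma_threshold)]:
    print(name, round(float(f(base)), 2), round(float(f(contaminated)), 2))
```

Appending one outlier moves the IQR threshold only from 149.5 to 151.0, whereas the variance-based threshold jumps from about 137 to about 3106, because squared deviations let a single extreme value dominate the second moment.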

If the runoff at time $t$, denoted $X_{t}$, exceeds this threshold, it is treated as a significant fluctuation and assigned a smaller smoothing coefficient $\alpha_{low}$ (e.g., 0.3) to enhance noise suppression. Otherwise, a higher coefficient $\alpha_{high}$ (e.g., 0.9) is used to better preserve long-term trends. This dynamic adjustment mechanism allows AEWMA to strike a balance between noise reduction and signal preservation across different hydrological conditions. A conceptual schematic is shown in Figure 3.

The final smoothed value $y_{t}$ is computed as a weighted sum of current and historical inputs, where the weights decay exponentially based on the adaptive coefficient $\alpha_{t}$:

$$y_{t} = \begin{cases} X_{0}, & t = 0 \\ \alpha_{t} X_{t} + (1 - \alpha_{t})\, y_{t-1}, & t > 0 \end{cases} \tag{4}$$
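Eqs. (3) and (4) can be combined into a short sketch. Two points are assumptions rather than details given in the text: the quartiles are computed once over the whole input series, and $\alpha_{t}$ switches on a strict `X_t > threshold` comparison. The function name `aewma` and its defaults are illustrative.

```python
import numpy as np


def aewma(x, alpha_high=0.9, alpha_low=0.6, k=1.5):
    """Adaptive EWMA sketch following Eqs. (3)-(4).

    Samples above the IQR threshold (Q3 + k*IQR, computed here once over
    the whole series) use alpha_low (stronger smoothing); all other
    samples use alpha_high (lighter smoothing).
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    threshold = q3 + k * (q3 - q1)
    alpha_t = np.where(x > threshold, alpha_low, alpha_high)

    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, x.size):
        y[t] = alpha_t[t] * x[t] + (1.0 - alpha_t[t]) * y[t - 1]
    return y, alpha_t


x = np.ones(20)
x[10] = 8.0                                   # isolated spike
y, alpha_t = aewma(x, alpha_high=0.9, alpha_low=0.6)
print(round(y[10], 2), round(y[11], 2))       # 5.2 1.42
```

In the example, the flat series has $Q_{1} = Q_{3} = 1$, so the threshold is 1 and only the spike is assigned $\alpha_{low}$. Because whole-series quartiles use future values, a strictly causal deployment would instead derive the threshold from past data only (for instance, the training period); whether the original implementation does so is not stated here.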

The motivation for adopting an adaptive strategy stems from the significant heterogeneity in hydrological conditions across different basins. Factors such as basin size, topography, land cover, and human interference cause runoff time series to exhibit vastly different levels of variability and noise characteristics [23,54]. In such cases, denoising methods with fixed smoothing parameters often fail to generalize well across regions. AEWMA, by contrast, dynamically adjusts its smoothing behavior based on the local fluctuation intensity of each time step, enabling it to better match the intrinsic properties of diverse hydrological signals.

Across the CAMELS dataset, AEWMA consistently demonstrated effective noise suppression, both under frequent flood-season fluctuations and more stable non-flood conditions. Its key advantage lies in adaptively adjusting to the diverse hydrological patterns across multiple basins, in contrast to fixed-parameter filters, which often fail to generalize across heterogeneous signals. This method aligns well with intuitive expectations and enhances data quality without introducing strong distortions.

Based on a grid search across eight smoothing coefficients uniformly sampled from [0.2, 0.9], we identified two effective parameter configurations: $(\alpha_{high}, \alpha_{low}) = (0.9, 0.6)$ and $(0.6, 0.3)$.
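The grid itself is easy to reconstruct: eight values uniformly spaced on [0.2, 0.9] correspond to a step of 0.1, so both reported configurations lie on it. The sketch below enumerates the grid; restricting candidate pairs to $\alpha_{high} > \alpha_{low}$ is an assumption made for illustration, since the text does not state how pairs were formed or scored.

```python
import itertools

import numpy as np

# Eight coefficients uniformly spaced on [0.2, 0.9]; rounding removes
# floating-point residue such as 0.6000000000000001
grid = np.round(np.linspace(0.2, 0.9, 8), 2)

# Assumed candidate pairs (alpha_high, alpha_low) with alpha_high > alpha_low;
# combinations() on the ascending grid yields (smaller, larger) tuples
pairs = [(hi, lo) for lo, hi in itertools.combinations(grid, 2)]

print(len(pairs))                                # 28
print((0.9, 0.6) in pairs, (0.6, 0.3) in pairs)  # True True
```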

To approximate real-world disturbances of varying severity, we designed several synthetic noise injection schemes representing mild, intermittent, and high-intensity interference scenarios. From these, four representative noise parameter configurations were selected to simulate distinct scenarios: mild environmental noise, intermittent disturbances, a compound noise environment, and an extreme noise stress test. Combining these four noise scenarios with the two AEWMA parameter configurations yields eight experimental configurations, enabling a comprehensive analysis of AEWMA’s robustness across various noise intensities and temporal patterns. Descriptions of each scenario and the corresponding denoising strategies are summarized in Table 2. This experimental matrix enables controlled comparisons between denoising intensities and noise levels, providing a systematic platform for evaluating model robustness under uncertain data conditions.
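The experimental matrix can be written down explicitly. The sketch below builds run labels in the NE/AE notation used in Section 3; mapping NE1 to NE4 onto the mild, intermittent, compound, and extreme scenarios in that order is an assumption based on the ordering in the text.

```python
import itertools

# Assumed ordering: NE1 = mild environmental noise ... NE4 = extreme stress test
noise_scenarios = ["NE1", "NE2", "NE3", "NE4"]
# AEWMA configurations as (alpha_high, alpha_low)
aewma_configs = {"AE96": (0.9, 0.6), "AE63": (0.6, 0.3)}

denoised_runs = [n + a for n, a in itertools.product(noise_scenarios, aewma_configs)]
all_runs = ["raw"] + noise_scenarios + denoised_runs

print(denoised_runs[:2])                   # ['NE1AE96', 'NE1AE63']
print(len(denoised_runs), len(all_runs))   # 8 13
```

Under this reading, the eight denoised runs, the four noisy-but-undenoised runs, and the clean baseline give thirteen training datasets per basin.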

2.5. LSTM Model Configuration and Experimental Control

To rigorously evaluate the proposed denoising strategy, we adopted a controlled experimental design. Rather than optimizing the model architecture for individual basins, we employed a unified Long Short-Term Memory (LSTM) network structure based on established hydrological benchmarks [55].

This standardization isolates the impact of data quality from model variance, so that performance differences can be attributed to the noise injection and AEWMA denoising rather than to architectural choices. Data preprocessing involved Min-Max normalization to accelerate convergence and a strict temporal split (Training: 1985–1995; Validation: 1995–2000; Testing: 2000–2010). Crucially, synthetic noise and denoising were applied only to the training and validation sets, while the test set remained pristine to provide an unbiased evaluation of generalization capability. An input sequence length of 100 days was selected to balance the capture of seasonal hydrological memory with the computational efficiency required for the extensive noise scenario matrix. All detailed hyperparameters and training configurations are listed in Table 3.
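The preprocessing steps can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: normalization statistics are fitted on the training period only (so no test information leaks into training), each 100-day window predicts the runoff on its last day (a sequence-to-one setup), and all helper names and the random placeholder data are hypothetical.

```python
import numpy as np


def minmax_fit(train):
    """Per-feature min and max from the training period only."""
    return train.min(axis=0), train.max(axis=0)


def minmax_apply(x, lo, hi):
    """Scale with training statistics; constant features map to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against a zero range
    return (x - lo) / span


def make_windows(x, y, seq_len=100):
    """Sequence-to-one samples: inputs x[i:i+seq_len] -> target y[i+seq_len-1]."""
    n = x.shape[0] - seq_len + 1
    xs = np.stack([x[i:i + seq_len] for i in range(n)])
    ys = y[seq_len - 1:]
    return xs, ys


rng = np.random.default_rng(0)
forcing = rng.random((400, 6))   # placeholder: 400 days x 6 meteorological forcings
runoff = rng.random(400)         # placeholder runoff (mm/day)

train, test = forcing[:300], forcing[300:]
lo, hi = minmax_fit(train)
train_n = minmax_apply(train, lo, hi)
test_n = minmax_apply(test, lo, hi)   # may fall slightly outside [0, 1]

xs, ys = make_windows(train_n, runoff[:300], seq_len=100)
print(xs.shape, ys.shape)  # (201, 100, 6) (201,)
```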

2.6. Evaluation Metrics

To evaluate the performance of the LSTM models under various noise conditions, we adopt two commonly used hydrological metrics: Nash–Sutcliffe Efficiency (NSE) [28] and Root Mean Square Error (RMSE).

NSE is widely applied in hydrological modeling to assess how well the predicted values match the observed runoff. It is defined as:

$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{n} (y_{t} - \hat{y}_{t})^{2}}{\sum_{t=1}^{n} (y_{t} - \bar{y})^{2}} \tag{5}$$

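where $y_{t}$ denotes the observed runoff at time $t$, $\hat{y}_{t}$ is the predicted value, $\bar{y}$ is the mean of the observed series, and $n$ is the number of time steps. Higher NSE values indicate better predictive performance, with a perfect score of 1; an NSE of 0 means the model is no more accurate than the observed mean.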

RMSE is used to quantify the average magnitude of prediction errors and is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_{t} - \hat{y}_{t})^{2}} \tag{6}$$

Lower RMSE values suggest better model accuracy.
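Both metrics are straightforward to compute; a minimal NumPy version of Eqs. (5) and (6) follows (function names are illustrative). The example also shows a useful reference point: predicting the observed mean at every step gives NSE = 0, so positive values indicate skill beyond that benchmark.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency, Eq. (5)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, sim):
    """Root Mean Square Error, Eq. (6), in the units of the data (e.g., mm/day)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((obs - sim) ** 2))


obs = np.array([1.0, 2.0, 3.0, 4.0])
print(nse(obs, obs))                       # 1.0  (perfect fit)
print(nse(obs, np.full(4, obs.mean())))    # 0.0  (mean-only benchmark)
print(rmse(obs, obs + 0.5))                # 0.5  (constant 0.5 bias)
```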

In addition to these time-domain metrics, we also employ frequency-domain analysis to assess the spectral characteristics of runoff series and their response to noise and denoising. Specifically, we use Power Spectral Density (PSD) [29,30] to quantify the distribution of signal energy across different frequency bands. This analysis provides crucial insight into the temporal scale of hydrological dynamics, helping to distinguish between meaningful low-frequency trends and high-frequency noise.

Through PSD analysis, we aim to determine which frequency components are most affected by synthetic noise and how denoising methods restore signal integrity. This is particularly important in hydrological modeling, where both event-driven variability and long-term trends coexist [39,40]. In this study, PSD is estimated using the Welch method, which partitions the time series into overlapping segments and averages their periodograms to reduce variance and improve spectral estimation stability.

The PSD at each frequency $f$ is computed as:

$$\mathrm{PSD}(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi f n / N} \right|^{2} \tag{7}$$

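where $x(n)$ is the runoff signal of length $N$, and $\mathrm{PSD}(f)$ indicates the power contribution of frequency $f$. In the Welch estimate, Eq. (7) is evaluated on each windowed segment and the resulting periodograms are averaged.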
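As a check on Eq. (7), the sketch below evaluates it with NumPy's FFT, treating $f$ as the integer frequency index $k = 0, \dots, N-1$ (physical frequency $k/N$ cycles per sample), and verifies the discrete form of Parseval's theorem used in the energy calculation that follows. The helper name is hypothetical.

```python
import numpy as np


def periodogram_eq7(x):
    """Eq. (7) for every index k = 0..N-1: |sum_n x(n) exp(-j 2 pi k n / N)|^2 / N."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.fft(x)) ** 2 / x.size


rng = np.random.default_rng(1)
x = rng.normal(size=512)
x = x - x.mean()              # mean-removed, so total energy equals the variance
psd = periodogram_eq7(x)

# Discrete Parseval: summing Eq. (7) over all N indices recovers sum(x^2),
# and the average over indices equals the (population) variance.
print(np.isclose(psd.sum(), np.sum(x ** 2)), np.isclose(psd.mean(), x.var()))  # True True
```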

To explicitly quantify the contribution of specific frequency components (e.g., high-frequency noise vs. low-frequency trends) to the overall signal variability, we calculate the spectral energy. Based on Parseval’s theorem [31], the integral of the PSD over the frequency domain is equivalent to the variance (total energy) of the mean-removed time series. Therefore, the spectral energy $E_{band}$ contained within a specific frequency range $[f_{1}, f_{2}]$ is defined as:

$$E_{band} = \int_{f_{1}}^{f_{2}} \mathrm{PSD}(f)\, df \tag{8}$$

Consequently, the Energy Contribution Ratio (ECR), denoted as $\eta_{band}$, represents the percentage of the total signal variance governed by that specific frequency band:

$$\eta_{band} = \frac{E_{band}}{E_{total}} = \frac{\int_{f_{1}}^{f_{2}} \mathrm{PSD}(f)\, df}{\int_{0}^{f_{N}} \mathrm{PSD}(f)\, df} \times 100\% \tag{9}$$

where $f_{N}$ represents the Nyquist frequency [32]. This metric allows us to intuitively visualize the “energy occupancy” of different components, providing a rigorous standard to verify whether the denoising process effectively suppresses the energy in high-frequency noise bands while preserving the energy of hydrological drivers.
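Eqs. (8) and (9) can be approximated from a Welch estimate. The sketch below uses `scipy.signal.welch` (Hann window, 50% overlap, and constant detrending by default) and replaces the integrals with sums over frequency bins. Band edges follow the low/mid/high split used in Section 3.4, expressed in cycles per sample with fs = 1 (one sample per day); because Welch frequencies stop at the Nyquist frequency fs/2 = 0.5, the high band is taken as everything from 0.05 upward. The segment length `nperseg=256` and the helper name are assumptions.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band edges in cycles per sample (fs = 1 sample/day);
# the open upper edge of "high" covers everything up to Nyquist.
BANDS = {"low": (0.0, 0.01), "mid": (0.01, 0.05), "high": (0.05, np.inf)}


def band_energy_ratios(x, fs=1.0, nperseg=256, bands=BANDS):
    """Welch PSD total energy and ECR (percent) per band, Eqs. (8)-(9)."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg)   # one-sided density estimate
    df = f[1] - f[0]
    total = np.sum(pxx) * df                    # bin sum ~ variance (Parseval)
    ratios = {}
    for name, (f1, f2) in bands.items():
        mask = (f >= f1) & (f < f2)             # half-open band [f1, f2)
        ratios[name] = 100.0 * np.sum(pxx[mask]) * df / total
    return total, ratios


rng = np.random.default_rng(7)
white = rng.normal(size=8192)   # flat spectrum
red = np.cumsum(white)          # random walk: energy piles up at low frequencies

total_w, r_white = band_energy_ratios(white)
_, r_red = band_energy_ratios(red)
print({k: round(float(v), 1) for k, v in r_white.items()})
print({k: round(float(v), 1) for k, v in r_red.items()})
```

A white-noise series spreads its energy evenly, so roughly 90% falls in the high band under these edges, whereas a random walk concentrates energy at low frequencies. Applying the same function to noisy and denoised runoff series would yield per-band ratios of the kind compared in Section 3.4.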

3. Results and Discussion

3.1. Noise Injection and Denoising Results

The results of the noise injection experiments confirm that the four designed scenarios effectively simulate environmental disturbances of varying intensities in hydrological data. Meanwhile, the denoising experiments clearly illustrate how varying smoothing intensities affect the recovery of underlying signal features.

Under mild disturbance scenarios, the AEWMA-based denoising method effectively restores both the trend and short-term fluctuations of the original time series. However, as the noise intensity increases, even adaptive methods struggle to fully recover extreme anomalies. Nevertheless, the overall denoising performance remains robust across most cases.

Figure 4 presents the denoising results for basin 01013500 (from May 2000 to November 2000) as a representative example. Due to manuscript length constraints, over 20,000 detailed basin-specific experimental figures are not presented here. Interested readers are directed to the Data Availability Statement for the complete access information.

3.2. LSTM Results Under Noise and Denoising

In this section, we evaluate the performance of the LSTM model under varying noise intensities and corresponding denoising strategies. The four noise scenarios, denoted NE1 to NE4, represent increasing levels of disturbance from mild to severe. For each scenario, two AEWMA-based smoothing strategies were applied: a light smoothing scheme (AE96, $(\alpha_{high}, \alpha_{low}) = (0.9, 0.6)$) and a stronger smoothing scheme (AE63, $(\alpha_{high}, \alpha_{low}) = (0.6, 0.3)$).

Table 4 summarizes the resulting model performance in terms of Nash–Sutcliffe Efficiency (NSE) and Root Mean Square Error (RMSE), offering a comprehensive view of how different denoising intensities affect predictive accuracy under noisy conditions.

Figure 5 presents the LSTM model performance under varying noise intensities and denoising strategies. As noise levels increase from NE1 to NE4, model accuracy declines notably, especially in terms of median NSE, highlighting the vulnerability of LSTM to noise contamination.

Across all configurations, the model trained on raw (clean) data achieves the highest performance, with a median NSE of 0.6785. In contrast, noise injection without denoising leads to a marked drop in performance. For example, in the most severe scenario (NE4), the median NSE drops to 0.6410 and further to 0.5920 under strong denoising (NE4AE63), indicating that over-smoothing may sometimes harm prediction accuracy.

Nevertheless, appropriate denoising substantially mitigates noise impact. Lighter AEWMA smoothing (e.g., NE2AE96 with median NSE = 0.6770) often recovers performance close to the clean baseline, particularly under moderate noise conditions. This suggests that moderate smoothing is effective in balancing trend preservation and noise suppression.

Overall, the results demonstrate that denoising plays a crucial role in preserving model robustness under noisy inputs. The effectiveness of each smoothing strategy depends on the noise intensity: light denoising suffices under mild to moderate disturbances, while stronger noise scenarios require more aggressive, but carefully tuned, denoising interventions.

3.3. Distributional Characteristics of Prediction Errors Under Noise and Denoising

To analyze the impact of different processing methods on data distribution characteristics, we used basin 01013500 as a representative example. We performed a violin plot analysis on the data processed by each method. This analysis aimed to reveal changes in data distribution and its structural characteristics after the introduction of noise. Due to space constraints, additional experimental figures are not included in the manuscript. Readers may contact the corresponding author for access to the full set of visualizations.

Figure 6 illustrates the distribution of LSTM prediction errors for basin 01013500 using violin plots across the four noise scenarios (NE1–NE4), each under raw, noisy, and denoised conditions.

In the clean data (baseline), the prediction errors are tightly distributed with few outliers, indicating stable and accurate model performance. As noise intensity increases, the error distributions widen noticeably, with a sharp rise in outliers and interquartile range (IQR), especially in NE3 and NE4, reflecting a clear degradation of model robustness.

AEWMA smoothing (AE96 and AE63) markedly reduces the dispersion of prediction errors. AE96 achieves a better balance between variance reduction and signal preservation under moderate noise (e.g., NE2), whereas AE63 performs best under extreme noise (NE4) by aggressively suppressing high-frequency fluctuations.

Overall, the violin-plot analysis confirms that noise injection increases the dispersion and instability of model outputs, while denoising restores prediction stability; the smoothing intensity must be calibrated to the noise severity.

3.4. Frequency-Domain Results via PSD Analysis

To investigate how noise injection and denoising strategies affect the spectral characteristics of streamflow signals, we first computed the average Power Spectral Density (PSD) across all 671 CAMELS basins. The resulting spectral energy distribution is summarized over three frequency bands: low (0–0.01 Hz), mid (0.01–0.05 Hz), and high (0.05–1 Hz). Frequency is expressed in a unified Hz-based form for interpretability, which remains directly applicable to time series at other temporal resolutions without unit conversion. The band boundaries were chosen primarily on the basis of the data characteristics and the observed cumulative energy distribution; the detailed calculations, charts, and explanations are provided in Appendix A. The original series is shown in Figure 7, and the twelve denoising experiments are shown in Figure 8.

Across all 671 CAMELS basins, the averaged spectral energy distribution reveals that approximately 30% of the total signal energy resides in the high-frequency range (Figure 7). As noise intensity increases, this high-frequency portion becomes progressively contaminated, leading to inflated energy contributions that do not correspond to meaningful hydrological variability. Denoising mitigates part of this distortion: AEWMA63 strongly suppresses high-frequency components, removing more than half of the high-frequency energy on average, whereas AEWMA96 provides more moderate attenuation (Figure 8). Comparing spectral patterns with modeling performance (e.g., NSE) indicates that aggressive denoising such as AEWMA63 removes not only noise but also essential short-term information. AEWMA96 achieves a better balance: it successfully counteracts low-level noise injection but remains less effective under extreme high-frequency perturbations.

Due to space limitations, we present representative examples from two contrasting basins: a humid basin (01013500) and an arid basin (12010000). Results for the remaining basins are available from the repository listed in the Data Availability Statement.

The spectral attenuation patterns of these two basins across the three frequency bands (Figure 9, Figure 10 and Figure 11) illustrate how effective the denoising strategies are under different hydrological conditions.

In the low-frequency range (Figure 9), the original data and AEWMA96-denoised series maintain dominant energy concentrations, capturing long-term hydrological trends. However, the noisy groups show diffused spectral energy and mild distortions, particularly in NE3 and NE4. This indicates that high-intensity noise corrupts low-frequency components that are critical for long-memory processes in runoff modeling.

In the mid-frequency band (Figure 10), the effect of AEWMA becomes more pronounced. AEWMA96 retains moderate energy and preserves key event-scale fluctuations, while AEWMA63 shows evident spectral suppression. Such suppression corresponds to smoother time series, which may improve robustness under light noise but risks information loss under strong filtering.

In the high-frequency domain (Figure 11), all noisy datasets exhibit elevated energy, reflecting the artificial injection of high-frequency disturbances. AEWMA96 effectively dampens this spike while preserving some short-term dynamics. In contrast, AEWMA63 nearly eliminates high-frequency components, leading to a flattened spectrum. This excessive attenuation contributes to the loss of peak flow detail and timing, as observed in model predictions.

Overall, the PSD analysis confirms that light denoising (AEWMA96) balances noise suppression with signal preservation, enhancing model robustness. Over-smoothing (AEWMA63), while reducing noise, may degrade modeling accuracy by distorting crucial temporal features, particularly at mid-to-high frequencies. This is consistent with the lower NSE of the heavily smoothed groups (1-2, 2-2, 3-2, and 4-2) relative to their AEWMA96 counterparts in Table 4.
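
Why AEWMA63 flattens the mid- and high-frequency spectrum while AEWMA96 largely does not can be seen from the frequency response of a plain EWMA. The sketch below assumes the recursion s_t = α·x_t + (1 - α)·s_(t-1), with α weighting the current observation (consistent with the Table 2 note that a lower α strengthens smoothing), and treats α as fixed, which approximates the adaptive scheme only within stretches where α does not switch. The power gain is |H(f)|² = α² / (1 - 2(1 - α)cos(2πf) + (1 - α)²), with f in cycles per time step; `ewma_power_gain` is an illustrative name.

```python
import numpy as np


def ewma_power_gain(alpha, f):
    """|H(f)|^2 of s_t = alpha*x_t + (1-alpha)*s_{t-1}; f in cycles per time step."""
    f = np.asarray(f, dtype=float)
    beta = 1.0 - alpha
    return alpha ** 2 / (1.0 - 2.0 * beta * np.cos(2.0 * np.pi * f) + beta ** 2)


# Band edges from Section 3.4 plus the Nyquist frequency of daily data (0.5 cycles/day).
freqs = np.array([0.0, 0.01, 0.05, 0.25, 0.5])
for alpha in (0.9, 0.6, 0.3):   # alpha values appearing in the AE96 and AE63 settings
    gains = ewma_power_gain(alpha, freqs)
    print(alpha, np.round(gains, 3))
```

At 0.05 cycles day⁻¹ the power gain is roughly 0.99 for α = 0.9, 0.90 for α = 0.6 and 0.57 for α = 0.3, falling to about 0.67, 0.18 and 0.03 at the Nyquist frequency. Settings that use α = 0.3 therefore remove most high-frequency power, whether it is noise or genuine short-term dynamics.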

3.5. Discussion

This study highlights the critical influence of noise—particularly high-frequency components—on the performance of hydrological prediction models and offers a theoretical framework for its quantification and mitigation in practical applications. While denoising techniques can partially restore model accuracy, the accurate estimation of noise intensity and its frequency composition remains central to improving predictive robustness.

High-frequency noise and model degradation. Our frequency-domain analysis demonstrates that high-frequency noise substantially deteriorates model accuracy, especially when its energy dominates the signal. This effect is particularly pronounced in scenarios where data quality is compromised due to unstable measurements or sensor malfunctions. In real-world hydrological systems, such short-term fluctuations may stem from instrumentation errors, environmental disturbances, or transmission anomalies, all of which distort the physical meaning of the underlying signals. Therefore, establishing a method to quantify the proportion of high-frequency noise in hydrological time series is essential. Once quantified, this information can be used to define threshold levels in water resource management systems: when the high-frequency noise surpasses a given threshold, corrective actions such as denoising or data recalibration should be initiated to ensure modeling reliability.
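
A threshold check of this kind could be sketched as follows. It is a hypothetical illustration rather than a procedure from this study: the 0.05 cycles day⁻¹ cutoff follows the band definition in Section 3.4, the 0.30 default mirrors the approximate energy share reported in this paper, and `high_frequency_ratio` and `screen_stations` are illustrative names. Because the raw CAMELS series already carry roughly 30% of their energy in the high-frequency band on average, a practical threshold would need calibration per region or station.

```python
import numpy as np


def high_frequency_ratio(x, cutoff=0.05, fs=1.0):
    """Fraction of one-sided spectral energy at or above `cutoff` (cycles per day if fs=1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    energy = np.abs(np.fft.rfft(x)) ** 2
    energy[1:] *= 2.0                 # account for mirrored negative frequencies
    if x.size % 2 == 0:
        energy[-1] /= 2.0             # the Nyquist bin is not mirrored
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = energy.sum()
    return float(energy[freqs >= cutoff].sum() / total) if total > 0 else 0.0


def screen_stations(series_by_station, threshold=0.30, cutoff=0.05):
    """Map station id -> (high-frequency ratio, flagged for denoising or recalibration)."""
    report = {}
    for station, x in series_by_station.items():
        ratio = high_frequency_ratio(x, cutoff=cutoff)
        report[station] = (ratio, ratio > threshold)
    return report


rng = np.random.default_rng(1)
t = np.arange(3650)
clean = 5.0 + np.sin(2 * np.pi * t / 365.0)           # smooth seasonal signal
noisy = clean + rng.normal(scale=1.0, size=t.size)    # same signal with white noise

for station, (ratio, flagged) in screen_stations({"A": clean, "B": noisy}).items():
    print(station, round(ratio, 3), "flag" if flagged else "ok")
```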

Implications for data preprocessing and decision making. In operational hydrological forecasting, data originate from diverse monitoring stations, subject to regional variation in equipment quality and environmental conditions. Consequently, the extent of noise contamination is spatially heterogeneous. Quantifying the spectral characteristics of noise—especially its high-frequency components—can provide a standardized basis for preprocessing strategies. For example, data with a high proportion of high-frequency energy may require stronger denoising, while cleaner datasets might be preserved to avoid signal distortion. Such tailored strategies can significantly improve model training and reduce uncertainty in downstream predictions.

Regional heterogeneity and adaptive strategies. The effectiveness of noise mitigation varies with local infrastructure. Well-maintained stations typically yield higher-quality data and are less sensitive to noise, whereas remote or resource-limited regions are more prone to data degradation. As a result, developing region-specific noise thresholds and preprocessing strategies based on quantification results is essential for generalizing the proposed framework. Future work should focus on defining spectral-based standards for data quality and integrating them into adaptive model pipelines.

Extreme events and trade-offs in denoising. Beyond general prediction accuracy, the impact of noise on the forecasting of extreme events (e.g., floods or droughts) warrants special attention. High-frequency noise may introduce false anomalies that trigger false alarms or obscure critical signals, resulting in delayed or incorrect emergency responses. However, excessive denoising can also erase legitimate extremes, weakening the model’s sensitivity to rare but important hydrological events. Thus, striking a balance between noise suppression and signal preservation is key to maintaining model robustness in disaster forecasting scenarios.
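
The trade-off can be made concrete with a short worked example. Assuming the standard EWMA recursion, with α weighting the current observation as in the Table 2 note, consider a one-day peak P on an otherwise constant baseflow b, with the smoother starting at b. The numbers are illustrative only and are not taken from the experiments.

```latex
\begin{aligned}
s_t &= \alpha x_t + (1-\alpha)\, s_{t-1}, \qquad s_{t-1} = b,\; x_t = P,\; x_{t+1} = b \\
s_t - b &= \alpha\,(P - b), \qquad s_{t+1} - b = \alpha(1-\alpha)\,(P - b) \\
b = 1,\ P = 10:&\quad \alpha = 0.9 \Rightarrow (s_t,\ s_{t+1}) = (9.1,\ 1.81) \\
&\quad \alpha = 0.6 \Rightarrow (s_t,\ s_{t+1}) = (6.4,\ 3.16) \\
&\quad \alpha = 0.3 \Rightarrow (s_t,\ s_{t+1}) = (3.7,\ 2.89)
\end{aligned}
```

With α = 0.3 (the high-variability coefficient of the AE63 setting) the smoothed series peaks at only 3.7 and stays elevated at 2.89 on the following day, so the peak is both flattened and smeared in time.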

Limitations of dataset and ground truth. Although this study employed the widely recognized CAMELS dataset as a benchmark, we must acknowledge its inherent limitations in terms of applicability. The CAMELS dataset provides multi-decade daily records for 671 basins across the contiguous United States (this study uses the 25 years from 1985 to 2010) and has undergone rigorous manual screening to ensure high data quality; this screening inevitably excludes watersheds that are heavily influenced by human activities or suffer from poor measurement reliability. As a result, the “background noise” level within CAMELS is likely lower than that encountered in many real-world operational settings. Consequently, the conclusions drawn in this study may represent a conservative estimate of the potential benefits of noise mitigation. In data environments with lower measurement quality, the proposed method may yield even greater improvements.

Furthermore, this study relied on synthetic noise injection to emulate data imperfections rather than directly using observational data containing genuine noise. This choice stems from the well-known “Ground Truth paradox” in hydrology: since streamflow records are derived from stage–discharge rating curves, they are inherently affected by channel geometry changes, measurement errors, and extrapolation uncertainties. Under such circumstances, truly noise-free streamflow data cannot be obtained. Synthetic noise injection therefore provides a scientifically valid and reproducible means of isolating the effects of noise and systematically evaluating model robustness under controlled conditions.

Future research should extend this experimental framework to more diverse datasets, including basins experiencing stronger anthropogenic disturbances or regions with uneven and lower-quality hydrological observations. Such expansions will help assess the generalizability and practical value of the proposed denoising strategy in non-ideal, real-world data environments.

Methodological evolution and parameter optimization. In the experimental design of this study, the division of high-, mid-, and low-frequency bands was primarily based on empirical rules. Similarly, the denoising thresholds, noise injection intensities, and quartile-based outlier screening procedures followed common practices in mathematical signal processing, combined with intuitive understanding of hydrological and physical processes. Although these parameter choices performed well under the current experimental setting, fixed heuristic parameters may lack flexibility when applied to basins with heterogeneous hydrological response characteristics, such as varying runoff concentration times or distinct flow generation mechanisms. For instance, the traditional Tukey fence method may misclassify flood peaks as outliers when dealing with highly skewed hydrological distributions.
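
The Tukey-fence issue can be illustrated on synthetic, right-skewed daily flows. In the sketch below the flows are lognormal (an assumption standing in for real streamflow) and contain no injected errors, the fences use the conventional k = 1.5, and applying the fences to log-transformed flows is shown as one common mitigation; it is not the screening procedure used in this study. The function name `tukey_outlier_mask` is illustrative.

```python
import numpy as np


def tukey_outlier_mask(x, k=1.5):
    """True where x lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


rng = np.random.default_rng(2)
flows = rng.lognormal(mean=0.0, sigma=1.0, size=3650)   # skewed, strictly positive "flows"

raw_mask = tukey_outlier_mask(flows)           # every flagged value here is a genuine draw
log_mask = tukey_outlier_mask(np.log(flows))   # same fences applied on the log scale

print(f"flagged on raw scale: {raw_mask.mean():.1%}")
print(f"flagged on log scale: {log_mask.mean():.1%}")
print("largest flow flagged on raw scale:", bool(raw_mask[np.argmax(flows)]))
```

For a lognormal with σ = 1, roughly 8% of error-free values are expected to exceed the raw-scale upper fence, including the largest "flood" values, whereas on the log scale the expected share is under 1%.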

Future work should consider the introduction of adaptive parameter optimization mechanisms. Approaches such as metaheuristic search algorithms or Bayesian optimization could be employed to automatically infer optimal frequency partition thresholds and denoising parameters based on the intrinsic hydrological characteristics of each watershed. Such strategies would enable a transition from static, empirically driven preprocessing toward dynamic, data-driven adaptive preprocessing.

Outlook on architectural advances. This study adopted the classical LSTM as the core predictive model, primarily because it serves as a benchmark architecture in hydrological modeling with strong interpretability and community support. This choice allowed us to focus on evaluating the impact of data quality without introducing additional variability from model structures. However, deep learning architectures are evolving rapidly. Transformer models, equipped with global attention mechanisms, have shown strong potential in long-sequence modeling, while emerging state-space models such as Mamba can capture long-range dependencies with linear computational complexity. These architectures may exhibit noise-sensitivity characteristics that differ substantially from RNN-based models.

Due to the scope and time constraints of this study, we did not systematically evaluate the proposed noise quantification and data-processing framework on Transformer or Mamba models. Future work should therefore include cross-architecture robustness auditing to examine the stability and performance gains of this framework across different sequence modeling paradigms. This will help explore the synergy between data-centric AI and advanced model architectures, paving the way toward next-generation, highly robust hydrological forecasting systems.

4. Conclusions

This study systematically investigates the impact of synthetic high-frequency noise on hydrological modeling, particularly during the preprocessing stage of rainfall–runoff data. By injecting controlled noise into runoff series and analyzing their effects on LSTM model performance, we demonstrate that increased high-frequency components (especially above 0.1 Hz) significantly distort the spectral structure of the original data, leading to a marked degradation in model accuracy.

The findings highlight that hydrological signals are inherently low-frequency dominant, and the presence of excessive high-frequency fluctuations—often originating from sensor errors or short-term disturbances—can obscure meaningful patterns in the data. Spectral analysis reveals that when high-frequency energy exceeds approximately 30% of the total signal energy, model performance indicators such as NSE deteriorate substantially. This underscores the need for frequency-aware preprocessing. While appropriate denoising can restore essential low-frequency features and improve robustness, excessive smoothing may suppress valid dynamics and weaken rainfall–runoff coupling.

In conclusion, this work demonstrates the critical role of frequency-domain characteristics in hydrological data quality and model performance. We recommend incorporating power spectral density (PSD)-based evaluation into hydrological data quality control workflows. The proportion of high-frequency energy can serve as a quantitative threshold to guide the decision of whether and how aggressively denoising should be applied. Future research should focus on building adaptive noise discrimination and preprocessing frameworks that dynamically adjust strategies based on frequency energy profiles and observational contexts, thereby enhancing the resilience and reliability of hydrological forecasting systems.

Author Contributions

T.L.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing—Original Draft, Visualization. W.O.: Conceptualization, Methodology, Investigation, Writing—Review and Editing, Supervision, Project Administration, Funding Acquisition. M.A.: Writing—Review and Editing. C.Z.: Resources, Supervision, Project Administration, Funding Acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Doctoral Start-up Foundation of Liaoning Province (Grant No. 2023-BSBA-075), and a provincial horizontal project entitled “Procurement Project (Package 1) of the 2025 Heilongjiang Provincial Small Watershed Flash Flood Disaster ‘Four-Prevention’ Capability Construction Program”.

Data Availability Statement

Data in this study are accessible from public resources. The meteorological data, catchment attributes, and streamflow observations are derived from the CAMELS dataset (https://ral.ucar.edu/solutions/products/camels, accessed on 4 December 2025). Due to space limitations, not all experimental visualizations are included in the manuscript; the complete set is available on Mendeley Data (https://doi.org/10.17632/zzwrgn7mm6.1, updated on 4 December 2025).

Acknowledgments

The authors thank the editors and reviewers for their insightful comments, which significantly improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

We performed cumulative spectral energy analysis based on the integrated power spectral density (PSD) for both the original streamflow series and all experimental datasets. As an illustrative example, Figure A1 presents the mean cumulative energy distribution under the NE4 noise condition; the corresponding results for the other three noise levels are highly similar and are therefore not shown to avoid redundancy. The cumulative energy curves are computed and visualized following Equations (7)–(9).

Figure A1. Average cumulative energy distribution for NE4 data across all basins. Vertical dashed lines denote reference frequencies at 0.01, 0.05, and 0.1 cycles day⁻¹.

Several characteristics of Figure A1 motivate the empirical division of the frequency domain into low-, mid-, and high-frequency bands used in the main analysis. First, the cumulative energy curve is distinctly concave, indicating that streamflow variability is dominated by low-frequency components (i.e., long-term and seasonal variations). Second, because the CAMELS streamflow data are recorded at daily resolution, the physical frequency unit is cycles day⁻¹, and by the Nyquist sampling theorem the maximum resolvable frequency is 0.5 cycles day⁻¹. For consistency with analyses of data at potentially different temporal resolutions (e.g., hourly data), we nonetheless plot a unified range of 0 to 1.0 cycles day⁻¹; the region above 0.5 cycles day⁻¹ carries no physically resolvable information for daily data. Although the axis is in cycles day⁻¹, the main text refers to it in a Hz-based form to link conceptually with high-frequency analyses in other literature. Third, clear stratification between the original series (Raw) and the processed series (Noise, AE96, AE63) appears around 0.01 and 0.05 cycles day⁻¹; at these frequencies the effects of noise injection and subsequent denoising on the spectral energy distribution become distinctly differentiated. We therefore adopt 0.01 cycles day⁻¹ and 0.05 cycles day⁻¹ as practical boundaries separating the low-, mid-, and high-frequency bands.

Compared with the noise-injected series, the AEWMA-denoised series exhibit distinct spectral behavior, especially in the high-frequency components. The AE63 setting (a stronger denoising intensity than AE96) substantially suppresses high-frequency components. Because most of the high-frequency energy is removed, the remaining energy is proportionally more concentrated in the lower frequency bands. The AE63 curve therefore shows the steepest growth and the most pronounced concavity in the low-frequency range, indicating that after AE63 smoothing the vast majority of the cumulative energy lies at the lowest resolvable frequencies. Conversely, the AE96 setting (lower denoising intensity) retains relatively more high-frequency variability, placing its curve between the AE63 and the Raw/Noise curves.

References

Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022.
Guo, J.; Liu, Y.; Zou, Q.; Ye, L.; Zhu, S.; Zhang, H. Study on optimization and combination strategy of multiple daily runoff prediction models coupled with physical mechanism and LSTM. J. Hydrol. 2023, 624, 129969.
Chen, S.; Feng, Y.; Li, H.; Ma, D.; Mao, Q.; Zhao, Y.; Liu, J. Enhancing runoff predictions in data-sparse regions through hybrid deep learning and hydrologic modeling. Sci. Rep. 2024, 14, 26450.
Li, H.; Zhang, C.; Chu, W.; Shen, D.; Li, R. A process-driven deep learning hydrological model for daily rainfall-runoff simulation. J. Hydrol. 2024, 637, 131434.
Akinsoji, A.H.; Adelodun, B.; Adeyi, Q.; Salau, R.A.; Odey, G.; Choi, K.S. Integrating machine learning models with comprehensive data strategies and optimization techniques to enhance flood prediction accuracy: A review. Water Resour. Manag. 2024, 38, 4735–4761.
Renard, B.; Kavetski, D.; Kuczera, G.; Thyer, M.; Franks, S.W. Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resour. Res. 2010, 46, 1–22.
Shin, Y.; Na, K.Y.; Kim, S.E.; Kyung, E.J.; Choi, H.G.; Jeong, J. LSTM-Autoencoder Based Detection of Time-Series Noise Signals for Water Supply and Sewer Pipe Leakages. Water 2024, 16, 2631.
Goebel, K.; Yan, W. Correcting sensor drift and intermittency faults with data fusion and automated learning. IEEE Syst. J. 2008, 2, 189–197.
Helm, I.; Jalukse, L.; Leito, I. Measurement uncertainty estimation in amperometric sensors: A tutorial review. Sensors 2010, 10, 4430–4455.
Neupane, R.P.; Kumar, S. Estimating the effects of potential climate and land use changes on hydrologic processes of a large agriculture dominated watershed. J. Hydrol. 2015, 529, 418–429.
Alías, F.; Alsina-Pagès, R.M. Review of wireless acoustic sensor networks for environmental noise monitoring in smart cities. J. Sensors 2019, 2019, 7634860.
Jiang, Y.; Xu, Z.; Xiong, L. Runoff variation and response to precipitation on multi-spatial and temporal scales in the southern Tibetan Plateau. J. Hydrol. Reg. Stud. 2022, 42, 101157.
Moges, E.; Demissie, Y.; Larsen, L.; Yassin, F. Review: Sources of hydrological model uncertainties and advances in their analysis. Water 2021, 13, 28.
McMillan, H.K.; Westerberg, I.K.; Krueger, T. Hydrological data uncertainty and its implications. Wiley Interdiscip. Rev. Water 2018, 5, e1319.
Hansun, S. A new approach of moving average method in time series analysis. In Proceedings of the 2013 Conference on New Media Studies (CoNMedia), Tangerang, Indonesia, 27–28 November 2013; IEEE: New York, NY, USA, 2013; pp. 1–4.
Pang, J.; Luo, W.; Yao, Z.; Chen, J.; Dong, C.; Lin, K. Water quality prediction in urban waterways based on wavelet packet Denoising and LSTM. Water Resour. Manag. 2024, 38, 2399–2420.
Shao, P.; Feng, J.; Lu, J.; Tang, Z. Data-driven and knowledge-guided denoising diffusion probabilistic model for runoff uncertainty prediction. J. Hydrol. 2024, 638, 131556.
Bai, L.; Chen, Z.; Xu, J.; Li, W. Multi-scale response of runoff to climate fluctuation in the headwater region of Kaidu River in Xinjiang of China. Theor. Appl. Climatol. 2016, 125, 703–712.
Xu, D.M.; Liao, A.D.; Wang, W.; Tian, W.C.; Zang, H.F. Improved monthly runoff time series prediction using the CABES-LSTM mixture model based on CEEMDAN-VMD decomposition. J. Hydroinform. 2024, 26, 255–283.
Bunde, A. The different types of noise and how they effect data analysis. Chem. Ing. Tech. 2023, 95, 1758–1767.
Du, Y.; Bao, A.; Zhang, T.; Ding, W. Quantifying the impacts of climate change and human activities on seasonal runoff in the Yongding River basin. Ecol. Indic. 2023, 154, 110839.
Tarasova, L.; Basso, S.; Zink, M.; Merz, R. Exploring controls on rainfall-runoff events: 1. Time series-based event separation and temporal dynamics of event runoff response in Germany. Water Resour. Res. 2018, 54, 7711–7732.
Sang, Y.F.; Wang, D.; Wu, J.C.; Zhu, Q.P.; Wang, L. The relation between periods’ identification and noises in hydrologic series data. J. Hydrol. 2009, 368, 165–177.
Li, B.; Zheng, T.; Wang, R.; Liu, J.; Guo, J.; Tan, X.; Xiao, T.; Zhu, J.; Wang, J.; Cai, X. Predictor-corrector enhanced transformers with exponential moving average coefficient learning. Adv. Neural Inf. Process. Syst. 2024, 37, 20358–20382.
Haq, A.; Gulzar, R.; Khoo, M.B.C. An efficient adaptive EWMA control chart for monitoring the process mean. Qual. Reliab. Eng. Int. 2018, 34, 563–571.
Hodge, V.; Austin, J. A survey of outlier detection methodologies. Artif. Intell. Rev. 2004, 22, 85–126.
Wan, X.; Wang, W.; Liu, J.; Tong, T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. 2014, 14, 135.
Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
Schaefli, B.; Gupta, H.V. Do Nash values have value? Hydrol. Process. 2007, 21, 2075–2080.
Welch, P. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 1967, 15, 70–73.
Hassanzadeh, M.; Shahrrava, B. Linear version of Parseval’s theorem. IEEE Access 2022, 10, 27230–27241.
Por, E.; Van Kooten, M.; Sarkovic, V. Nyquist–Shannon sampling theorem. Leiden Univ. 2019, 1, 1–2.
Addor, N.; Newman, A.J.; Mizukami, N.; Clark, M.P. The CAMELS data set: Catchment attributes and meteorology for large-sample studies. Hydrol. Earth Syst. Sci. 2017, 21, 5293–5313.
Elshorbagy, A.; Simonovic, S.P.; Panu, U.S. Noise reduction in chaotic hydrologic time series: Facts and doubts. J. Hydrol. 2002, 256, 147–165.
Nourani, V.; Partoviyan, A. Hybrid denoising-jittering data pre-processing approach to enhance multi-step-ahead rainfall–runoff modeling. Stoch. Environ. Res. Risk Assess. 2018, 32, 545–562.
Kim, G.I.; Chung, K. Extraction of Features for Time Series Classification Using Noise Injection. Sensors 2024, 24, 6402.
Groos, J.; Ritter, J. Time domain classification and quantification of seismic noise in an urban environment. Geophys. J. Int. 2009, 179, 1213–1231.
Mahmood, M.Q.; Wang, X.; Aziz, F.; Pang, T. Evaluating the sustainability of groundwater abstraction in small watersheds using time series analysis. Groundw. Sustain. Dev. 2024, 26, 101288.
Chow, V.T.; Kareliotis, S.J. Analysis of stochastic hydrologic systems. Water Resour. Res. 1970, 6, 1569–1582.
Yevjevich, V. Stochastic models in hydrology. Stoch. Hydrol. Hydraul. 1987, 1, 17–36.
Khatri, P.; Gupta, K.K.; Gupta, R.K. Drift compensation of commercial water quality sensors using machine learning to extend the calibration lifetime. J. Ambient Intell. Humaniz. Comput. 2021, 12, 3091–3099.
Wang, Y.; Zhang, L.; Qi, X.; Yang, X.; Tan, Q. A Baseline Drift-Elimination Algorithm for Strain Measurement-System Signals Based on the Transformer Model. Appl. Sci. 2024, 14, 4447.
Mitra, N.A.; Usama, I.S.; Nasrin, A. Role of impoundment and irrigation in intensive agriculture watersheds. J. Hydrol. 2025, 662, 134075.
Demeyer, S.; Kristoffersen, S.K.; Le Pichon, A.; Larsonnier, F.; Fischer, N. Contribution to uncertainty propagation associated with on-site calibration of infrasound monitoring systems. Remote Sens. 2023, 15, 1892.
Maurya, P.K.; Christensen, F.E.; Kass, M.A.; Pedersen, J.B.; Frederiksen, R.R.; Foged, N.; Christiansen, A.V.; Auken, E. Technical note: Efficient imaging of hydrological units below lakes and fjords with a floating, transient electromagnetic (FloaTEM) system. Hydrol. Earth Syst. Sci. Discuss. 2021, 26, 2813–2827.
Lin, T.; Yao, X.; Yu, S.; Zhang, Y. Electromagnetic noise suppression of magnetic resonance sounding combined with data acquisition and multi-frame spectral subtraction in the frequency domain. Electronics 2020, 9, 1254.
Leigh, C.; Alsibai, O.; Hyndman, R.J.; Kanaarachchi, S.; King, O.C.; McGree, J.M. A framework for automated anomaly detection in high frequency water-quality data from in situ sensors. Sci. Total Environ. 2019, 664, 885–898.
Coxon, G.; Freer, J.; Westerberg, I.K.; Wagener, T.; Woods, R.; Smith, P.J. A novel framework for discharge uncertainty quantification applied to 500 UK gauging stations. Water Resour. Res. 2015, 51, 5531–5546.
McMillan, H.; Jackson, B.; Clark, M.; Kavetski, D.; Woods, R. Rainfall uncertainty in hydrological modelling: An evaluation of multiplicative error models. J. Hydrol. 2011, 400, 83–94.
Abbaszadeh, P. Improving hydrological process modeling using optimized threshold-based wavelet de-noising technique. Water Resour. Manag. 2016, 30, 1701–1721.
Xie, Y.; Huang, Q.; Chang, J.; Liu, S.; Wang, Y. Period analysis of hydrologic series through moving-window correlation analysis method. J. Hydrol. 2016, 538, 278–292.
Xu, D.M.; Li, Z.; Wang, W.C.; Hong, Y.H.; Gu, M.; Hu, X.X.; Wang, J. WaveTransTimesNet: An enhanced deep learning monthly runoff prediction model based on wavelet transform and transformer architecture. Stoch. Environ. Res. Risk Assess. 2025, 39, 883–910.
Quilty, J.; Adamowski, J. Addressing the incorrect usage of wavelet-based hydrological and water resources forecasting models for real-world applications with best practices and a new forecasting framework. J. Hydrol. 2018, 563, 336–353.
Rhif, M.; Ben Abbes, A.; Farah, I.R.; Martínez, B.; Sang, Y. Wavelet transform application for/in non-stationary time-series analysis: A review. Appl. Sci. 2019, 9, 1345.
Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrol. Earth Syst. Sci. 2019, 23, 5089–5110.

Figure 1. Architecture of the Proposed Frequency-Based Framework.

Figure 2. Conceptual Diagram of Three Hydrologically Meaningful Noise Types.

Figure 3. Mechanism Diagram of the Adaptive EWMA Denoising Algorithm.

Figure 4. Rainfall–runoff results for basin 01013500 under different NE settings (part 1). The first row shows the legend, followed by the NE1–NE4 results.

Figure 5. Comparison of median NSE and mean NSE across four experimental settings: (a) NE, (b) NE+AE96, (c) NE+AE63. Each subfigure shows the predictive performance over all basins, highlighting the effect of adaptive EWMA (AEWMA) denoising on model accuracy.

Figure 6. Violin plot showing the distribution and maximum values of LSTM prediction errors for basin 01013500 across different noise scenarios (NE1–NE4). This figure highlights the variability and extreme values of prediction errors for comparison purposes.

Figure 7. Average Signal Energy Contribution of Raw Streamflow Data.

Figure 8. Average Signal Energy Contribution under Noise Injection and Adaptive Denoising Scenarios.

Figure 9. Power Spectral Density (PSD) Analysis in the Low-Frequency Band (0–0.01 Hz) for Basins 01013500 and 12010000 Under Various Noise Conditions.

Figure 10. Power Spectral Density (PSD) Analysis in the Mid-Frequency Band (0.01–0.05 Hz) for Basins 01013500 and 12010000 Under Various Noise Conditions.

Figure 11. Power Spectral Density (PSD) Analysis in the High-Frequency Band (0.05–1.0 Hz) for Basins 01013500 and 12010000 Under Various Noise Conditions.

Table 1. List of Meteorological and Catchment Attributes from the CAMELS Dataset.

Meteorological Forcings (6)	Catchment Attributes (14)
Precipitation (prcp)	Mean annual precipitation (p_mean)
Shortwave radiation (srad)	Precipitation seasonality (p_seasonality)
Maximum temperature (tmax)	Fraction of precipitation as snow (frac_snow)
Minimum temperature (tmin)	Aridity index (aridity)
Vapor pressure (vp)	Geological porosity (geol_porosity)
Day length (dayl)	Geological permeability (geol_permeability)
	Soil depth (soil_depth_statsgo)
	Soil porosity (soil_porosity)
	Soil conductivity (soil_conductivity)
	Mean elevation (elev_mean)
	Mean slope (slope_mean)
	Catchment area (area_gages2)
	Forest fraction (frac_forest)
	Maximum leaf area index (lai_max)

Table 2. Configurations of Noise Injection and Adaptive Denoising Experiments.

Group	Experiment Label	p	L	Var	$α_{high}$	$α_{low}$	Scenario Description
Scenario 1: Mild Environmental Noise
1-1	NE1AE96	0.4	10	0.05	0.9	0.6	Light random noise with mild smoothing
1-2	NE1AE63	0.4	10	0.05	0.6	0.3	Potential over-smoothing under aggressive filtering
Scenario 2: Intermittent Disturbances
2-1	NE2AE96	0.4	20	0.1	0.9	0.6	Infrequent events with moderate denoising
2-2	NE2AE63	0.4	20	0.1	0.6	0.3	Strong smoothing may suppress true short-term signals
Scenario 3: Compound Noise Environment
3-1	NE3AE96	0.8	20	0.05	0.9	0.6	Mixture of drift and small-scale noise
3-2	NE3AE63	0.8	20	0.05	0.6	0.3	Robustness testing under compound perturbations
Scenario 4: Extreme Noise Stress Test
4-1	NE4AE96	0.8	10	0.1	0.9	0.6	Threshold test under strong high-frequency contamination
4-2	NE4AE63	0.8	10	0.1	0.6	0.3	Extreme case with intensive smoothing

Note: p denotes the gross error amplitude coefficient; L is the disturbance cycle length (in days); Var indicates the variance of the added Gaussian error. $α_{high}$ and $α_{low}$ are the AEWMA smoothing coefficients for low- and high-variability regions, respectively. A higher α retains more of the signal trend, while a lower α enhances noise suppression.
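
To make the role of the two coefficients concrete, the following is a hypothetical sketch of an adaptive EWMA. The switching rule is an assumption for illustration, not the study's algorithm: local variability is taken as the trailing 7-day mean absolute day-to-day change, and points above the median of that measure are treated as high-variability and smoothed with $α_{low}$, all others with $α_{high}$. The recursion assumes α weights the current observation, consistent with the note above; `adaptive_ewma` is an illustrative name.

```python
import numpy as np


def adaptive_ewma(x, alpha_high=0.9, alpha_low=0.6, window=7, quantile=0.5):
    """Hypothetical AEWMA: alpha_high in calm stretches, alpha_low in high-variability ones."""
    x = np.asarray(x, dtype=float)
    n = x.size
    step = np.abs(np.diff(x, prepend=x[0]))                     # day-to-day change (0 on day 0)
    trailing = np.convolve(step, np.ones(window) / window)[:n]  # causal trailing-window mean
    high_var = trailing > np.quantile(trailing, quantile)       # assumed variability rule
    alphas = np.where(high_var, alpha_low, alpha_high)
    s = np.empty(n)
    s[0] = x[0]
    for t in range(1, n):
        s[t] = alphas[t] * x[t] + (1.0 - alphas[t]) * s[t - 1]
    return s, alphas


rng = np.random.default_rng(4)
t = np.arange(1000)
x = 2.0 + np.sin(2 * np.pi * t / 365.0) + rng.normal(scale=0.3, size=t.size)

s96, a96 = adaptive_ewma(x, alpha_high=0.9, alpha_low=0.6)   # AE96-style setting
s63, a63 = adaptive_ewma(x, alpha_high=0.6, alpha_low=0.3)   # AE63-style setting
for name, s in [("raw", x), ("AE96", s96), ("AE63", s63)]:
    print(name, round(float(np.std(np.diff(s))), 3))
```

Because both settings share the same variability mask here, the AE63-style run applies a smaller α at every step and always produces the smoother output, mirroring the over-smoothing risk noted for groups 1-2 to 4-2.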

Table 3. Configuration of the Long Short-Term Memory (LSTM) Model.

Category	Parameter
Model Architecture
First LSTM Layer	1 layer with 128 hidden units
Dropout Layer	30% dropout rate
Fully Connected Layer	16 neurons, ReLU activation
Output Layer	1 neuron, linear activation
Training Settings
Loss Function	Mean Squared Error (MSE)
Optimizer	Adam optimizer, learning rate = 0.001
Batch Size	512
Epochs	100
Learning Rate Scheduler	factor = 0.1, patience = 10, min_lr = 1 × 10⁻⁶
Early Stopping	Patience = 10, min_delta = 0.001
Data Configuration
Sequence Length (input window)	100 days
Input Features	6 meteorological variables + 14 static attributes
Training Period	40%: 1 October 1985 to 1 October 1995
Validation Period	20%: 1 October 1995 to 1 October 2000
Test Period	40%: 1 October 2000 to 1 October 2010
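The following NumPy sketch implements the layer stack in Table 3 as a forward pass only. A single LSTM layer with 128 hidden units produces a final hidden state. That state passes through 30% dropout, which is active only in training mode, then through a 16-unit ReLU dense layer, and finally into one linear output neuron. The weight initialization, the gate ordering, and the class name `LSTMRunoffSketch` are assumptions. Training with Adam and the MSE loss, the learning-rate scheduler, and early stopping would normally be handled by a deep-learning framework, so they are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTMRunoffSketch:
    """LSTM(128) -> dropout(0.3) -> dense(16, ReLU) -> dense(1, linear)."""

    def __init__(self, n_features=20, hidden=128, fc_units=16, dropout=0.3, seed=0):
        rng = np.random.default_rng(seed)
        k = 1.0 / np.sqrt(hidden)
        # Gate weights stacked as [input, forget, candidate, output]
        self.W = rng.uniform(-k, k, (n_features, 4 * hidden))
        self.U = rng.uniform(-k, k, (hidden, 4 * hidden))
        self.b = np.zeros(4 * hidden)
        self.W_fc = rng.uniform(-k, k, (hidden, fc_units))
        self.b_fc = np.zeros(fc_units)
        k_out = 1.0 / np.sqrt(fc_units)
        self.W_out = rng.uniform(-k_out, k_out, (fc_units, 1))
        self.b_out = np.zeros(1)
        self.hidden = hidden
        self.dropout = dropout

    def forward(self, x, training=False, rng=None):
        """x: (batch, seq_len, n_features) -> (batch, 1) runoff estimate."""
        batch, seq_len, _ = x.shape
        h = np.zeros((batch, self.hidden))
        c = np.zeros((batch, self.hidden))
        for t in range(seq_len):
            z = x[:, t, :] @ self.W + h @ self.U + self.b
            i, f, g, o = np.split(z, 4, axis=1)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
            h = sigmoid(o) * np.tanh(c)                   # hidden state
        if training:
            # Inverted dropout: zero ~30% of units and rescale the rest
            if rng is None:
                rng = np.random.default_rng()
            keep = rng.random(h.shape) >= self.dropout
            h = h * keep / (1.0 - self.dropout)
        a = np.maximum(0.0, h @ self.W_fc + self.b_fc)  # ReLU dense layer
        return a @ self.W_out + self.b_out              # linear output


model = LSTMRunoffSketch()
# 4 windows, 100 days each, 20 input features (6 meteorological + 14 static)
batch = np.random.default_rng(1).normal(size=(4, 100, 20))
print(model.forward(batch).shape)  # (4, 1)
```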

Table 4. Model Performance Metrics Under Various Noise and Denoising Scenarios.

Group	Experiment Label	Mean NSE	Median NSE	Mean RMSE	Median RMSE
Raw Data	-	0.3506	0.6785	1.3627	1.1526
Noise 1	NE1	0.1300	0.6410	1.4510	1.2380
Group 1-1	NE1AE96	0.4010	0.6760	1.3750	1.1910
Group 1-2	NE1AE63	0.2970	0.5990	1.5380	1.2640
Noise 2	NE2	0.2050	0.6440	1.4470	1.2600
Group 2-1	NE2AE96	0.4560	0.6770	1.3740	1.1660
Group 2-2	NE2AE63	0.3860	0.6220	1.4950	1.2510
Noise 3	NE3	0.1730	0.6510	1.4400	1.2400
Group 3-1	NE3AE96	0.4090	0.6690	1.3770	1.1900
Group 3-2	NE3AE63	0.3230	0.6110	1.5110	1.2720
Noise 4	NE4	0.1190	0.6410	1.4520	1.2300
Group 4-1	NE4AE96	0.3850	0.6700	1.3760	1.1780
Group 4-2	NE4AE63	0.2990	0.5920	1.5340	1.2920
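Both metrics in Table 4 are computed per basin and then summarized across basins. The helpers below use the standard definitions of NSE and RMSE, and their function names are illustrative. NSE is bounded above by 1 but unbounded below. As a result, a few poorly simulated basins can pull the mean far below the median. This is one plausible reason the raw-data mean NSE (0.3506) sits well below the median (0.6785).

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root mean squared error, in the units of the runoff series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def summarize(per_basin_scores):
    """Mean and median of a metric across basins, as reported in Table 4."""
    scores = np.asarray(per_basin_scores, dtype=float)
    return float(scores.mean()), float(np.median(scores))


obs = np.array([1.0, 2.0, 3.0, 4.0])
# A perfect fit scores 1; predicting the observed mean everywhere scores 0
print(nse(obs, obs), nse(obs, np.full(4, obs.mean())))
# One badly simulated basin pulls the mean well below the median
print(summarize([0.8, 0.7, -3.0]))
```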
