Article

Online Monitoring of Heavy Metals in Groundwater: A Case Study of Dynamic Behavior, Monitoring Optimization and Early Warning Performance

1 State Key Laboratory of Soil Pollution Control and Safety, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
2 MEE Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
3 Guangdong Ecological and Environmental Monitoring Center, Guangzhou 518049, China
4 Zhejiang Environment Technology Co., Ltd., Hangzhou 310020, China
* Authors to whom correspondence should be addressed.
Hydrology 2026, 13(2), 57; https://doi.org/10.3390/hydrology13020057
Submission received: 23 December 2025 / Revised: 20 January 2026 / Accepted: 27 January 2026 / Published: 2 February 2026

Abstract

Groundwater heavy metal contamination (GHMC) has drawn significant attention in China over recent decades due to industrialization. However, effective monitoring and early warning remain global challenges because of the limited understanding of heavy metal behavior in groundwater. This study conducts a detailed comparative analysis of heavy metals and conventional indicators using a long-term, high-frequency online monitoring program. Groundwater online monitoring is an automated system for real-time, continuous collection, and transmission of indicators via sensors and IoT platforms. Conventional indicators refer to the priority parameters used to assess basic water quality, hydrological characteristics and health risks in routine monitoring. Nineteen heavy metals and ten conventional indicators were monitored simultaneously, generating approximately 1.6 million data points over three years. The time series data show that online monitoring effectively captures abnormal changes in heavy metal levels. Abnormal heavy metal fluctuations appear as sharp, isolated spikes lasting at least several hours, while conventional indicators exhibit high-amplitude variations lasting over 30 h—indicating that heavy metal changes are harder to detect in a timely manner. Long-term comparisons also reveal low consistency between heavy metals and conventional indicators, supporting the need for independent heavy metal monitoring. In contrast, strong consistency among heavy metals suggests opportunities to streamline monitoring by selecting representative elements. Monitoring frequency optimization shows that daily measurement is sufficient for heavy metals, which is slightly more frequent than the typical three-day interval for most conventional indicators. Long-term data enable reliable early warnings for both indicator types, with predictions closely matching field observations. However, heavy metal alerts are shorter and less frequent than those for conventional indicators. 
Integrating both types into a unified early warning system enhances its comprehensiveness, accuracy and timeliness. This study provides a solid scientific foundation for efficient GHMC monitoring and early warning in areas influenced by industrial activities.

1. Introduction

Groundwater is an important natural resource and an integral part of the global water cycle and water environment, and it plays an important role in the sustainable development of human society. Globally, approximately two billion people depend on groundwater as their principal source of drinking water. Moreover, it meets around one-third of industrial water requirements and accounts for nearly 40% of global agricultural water consumption [1]. However, with the development of industrial and agricultural activities, heavy metal contamination of groundwater has gradually worsened, leading to a substantial decrease in groundwater quality [2,3,4,5]. Exposure to heavy metals, including cadmium and chromium, has been consistently associated with a significantly elevated risk of various cancers in humans, particularly lung and prostate cancer [6,7]. In China, groundwater is notably affected by heavy metal contamination, particularly in regions where industrial parks are concentrated. This issue has attracted substantial attention and in-depth research aimed at pollution risk control and management [8,9,10,11].
The key to preventing and controlling heavy metal contamination in groundwater lies in timely detection and in understanding the state of heavy metals in groundwater systems. Currently, the majority of groundwater heavy metal investigations rely on manual sampling at a frequency of 2–4 times per year, which cannot promptly detect pollution events or provide early warning of them. High-frequency online monitoring can reveal short-term trends and fluctuations, and has been progressively applied to detect temporal changes in heavy metal pollution in aquifers [12,13,14,15]. Online monitoring integrated with sophisticated data analysis techniques has greatly enhanced groundwater environmental management and protection [16,17,18]. Online monitoring uses sensors, wireless communication, and IoT platforms to continuously and automatically collect, transmit, and analyze key parameters such as water level and water quality in real time.
However, effective monitoring and early warning for GHMC remain an international challenge because the behavior of heavy metals in groundwater systems is poorly understood. Efficient monitoring and early warning for groundwater environments rely on establishing scientifically sound monitoring strategies. Although extensive research has been conducted on the transport and early warning of groundwater pollution, there is still a lack of in-depth understanding of the response characteristics of heavy metal indicators in groundwater [18,19,20]. A series of scientific issues remain to be resolved, including the timeliness of online monitoring of heavy metals, the monitoring frequency, and early warning mechanisms. Publications addressing these topics are relatively scarce and currently insufficient to provide scientific support for the efficient and precise regulation and early warning of heavy metal contamination in groundwater.
This research systematically investigates the challenges of establishing an effective online monitoring and early warning system for GHMC by analyzing the long-term, high-frequency behaviors of heavy metals and conventional indicators. First, the study assesses the efficacy of online heavy metal monitoring. Subsequently, it identifies the effective conventional indicators, refines the heavy metal indicators for online monitoring, and explores the feasibility of substituting heavy metal indicators with conventional indicators. Thereafter, the study optimizes and compares the online monitoring frequencies of these two types of indicators. Finally, based on long-term online data, the early warning performance of both heavy metals and conventional indicators is analyzed. The study concludes with a presentation of the findings and discussions.

2. Site Background

2.1. Study Area

The study area is located at a chemical industrial park with 16 enterprises in operation, among which 10 enterprises involved in the pharmaceutical, electroplating, and textile industries significantly pollute groundwater due to inappropriate development practices and high pollutant emissions, as shown in Figure 1. Groundwater pollution accidents have occurred frequently in the industrial park in recent decades and have led to wide social and public attention. Previous studies indicated that quite a few groundwater indicators exceed the national standards, including heavy metals such as antimony and nickel. Efforts of groundwater pollution control have been intensively performed at the industrial park area, which indicates that online monitoring and early warning are needed for efficient groundwater pollution control and management.
The aquifer and aquiclude within the study area can be classified into three distinct strata: the upper layer consists of a 1–3 m thick artificial fill of silt with gravel, followed by a layer of sandy clay around 2–4 m thick, which is the main pore water aquifer in the area, and the lower layer is a set of muddy clay above 8 m thick, which constitutes the bottom aquiclude. The aquifer is primarily recharged by atmospheric precipitation, which has an average annual rainfall of 1649.6 mm. Groundwater then moves laterally over a short distance and discharges by seepage into nearby surface rivers or drainage channels and evaporates. The groundwater table in the upper aquifer is typically found at a depth of 1.5 to 2 m below the surface, with an annual fluctuation range of approximately 1 m. Contaminated groundwater infiltrates quickly into the surface water bodies, potentially leading to significant adverse social and environmental impact.
Two online monitoring wells have been established in the shallow pore water aquifer to monitor the groundwater changes with the online monitoring system adopted in this study (Figure 1). A total of 10 conventional indicators are monitored in both of the two wells, including water level, water temperature, pH, conductivity, turbidity, dissolved oxygen, ammonia nitrogen, chemical oxygen demand (COD), total phosphorus (TP), and total nitrogen (TN). Among them, water level, water temperature, conductivity, and turbidity are classified as physical indicators of water quality, whereas pH, dissolved oxygen, ammonia nitrogen, chemical oxygen demand (COD), total phosphorus (TP), and total nitrogen (TN) are considered chemical indicators. Additionally, nineteen heavy metal indicators have been monitored for the purpose of comparison studies, including copper, mercury, cadmium, arsenic, lead, nickel, zinc, manganese, iron, silver, beryllium, selenium, boron, molybdenum, barium, cobalt, thallium, antimony, and aluminum. Among these elements, arsenic, boron, and antimony are classified as metalloids, while selenium is categorized as a non-metal. Nevertheless, in the context of groundwater monitoring in China, all are commonly incorporated into the regulatory framework for heavy metal pollution control. Heavy metal indicators in groundwater refer to monitoring parameters that characterize the concentrations of toxic, non-biodegradable, and bioaccumulative heavy metal elements and their compounds. These indicators serve as a critical basis for assessing the extent of heavy metal contamination in groundwater and its associated human health risks.
Among all the indicators of this study, only electrical conductivity and dissolved oxygen are not included in the groundwater quality standard (GB/T 14848-2017), while the remaining indicators are all present.
Data for each indicator are collected at hourly intervals.

2.2. Data Collection

Data collection at the wells took place from 1 October 2021 to 10 October 2024. Measurements were taken hourly to meet the requirements of real-time monitoring of the variations in the heavy metal and conventional indicators. A total of 1,538,856 data points were collected, with specific indicators listed in Table 1.
Collected data were normalized using the Z-score method. Z-score (standard score) is a statistical measurement that quantifies the distance of a given data point from the mean of the dataset, expressed in units of standard deviations. The formula is as follows:
$$Z = \frac{X - \mu}{\sigma}$$
Specifically, X denotes a data point, μ denotes the population mean, and σ denotes the standard deviation.
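The Z-score normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `z_scores` and the sample readings are hypothetical, and the population standard deviation is assumed because the text refers to the population mean.

```python
import statistics


def z_scores(values):
    """Standardize a series: Z = (X - mean) / standard deviation."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation, matching "population mean" in the text.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series has no defined Z-score")
    return [(x - mu) / sigma for x in values]


# Hypothetical hourly concentrations (ug/L); the last reading is a spike.
readings = [2.0, 2.0, 2.0, 2.0, 12.0]
z = z_scores(readings)
print(z)
```

After standardization, indicators with different units share a common scale, which is what allows heavy metal and conventional indicator curves to be plotted and compared together.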

3. Method

To ensure the accuracy of the analysis outcomes, the original data necessitates essential pre-processing steps, encompassing data cleaning and correction, before commencing the formal analysis. Outlier data points induced by equipment malfunctions are eliminated through the machine identification approach to avoid distorting the statistical characteristics. To tackle data gaps arising from power outages, linear interpolation is employed to preserve the integrity of the time series. After these pre-processing procedures, the cleaned data set is subjected to systematic visual verification to identify and rectify any remaining anomalies. These comprehensive pre-processing measures substantially enhance the data quality and lay a solid foundation for subsequent time series analysis and modeling tasks.
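As an illustration of the gap-filling step, the sketch below linearly interpolates missing hourly readings. It is a simplified example under stated assumptions: missing values are represented as `None`, the function name `fill_gaps_linear` is hypothetical, and gaps at the very start or end of a series are left unfilled because no second anchor point exists.

```python
def fill_gaps_linear(series):
    """Linearly interpolate interior runs of None between known readings."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Slope between the two known readings that bracket the gap.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical hourly pH readings with two outage gaps.
hourly_ph = [7.0, None, None, 7.6, 7.5, None, 7.1]
print(fill_gaps_linear(hourly_ph))
```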

3.1. Online Monitoring and Related Correlation Analysis Methods

The data sources for this study are entirely derived from online monitoring. The online monitoring wells consist of an online automatic analysis instrument as the core component, integrated with modern sensor technology, automatic measurement techniques, automatic control systems, computer application technologies, specialized system management and analytical software, and communication networks. This platform is primarily composed of seven components: online monitoring instruments, a central control unit, a safety monitoring system, a container monitoring platform, an automatic sampling system, a monitoring station power supply system, and groundwater monitoring wells. This online heavy metal analysis system is based on an ICP-MS analyzer and integrates sample extraction, digestion, and dedicated software. It uses ICP-MS for elemental analysis and enables automated pre-treatment, real-time quality control, qualitative and quantitative detection, and remote operation, allowing accurate, efficient monitoring of heavy metals in water while meeting national and industry standards. The detection limits of this system are as follows: beryllium ≤ 0.16 μg/L, boron ≤ 5.0 μg/L, manganese ≤ 0.48 μg/L, iron ≤ 3.28 μg/L, cobalt ≤ 0.12 μg/L, copper ≤ 0.32 μg/L, zinc ≤ 2.68 μg/L, arsenic ≤ 0.48 μg/L, selenium ≤ 1.64 μg/L, molybdenum ≤ 0.24 μg/L, cadmium ≤ 0.20 μg/L, antimony ≤ 0.60 μg/L, mercury ≤ 0.05 μg/L, thallium ≤ 0.08 μg/L, lead ≤ 0.36 μg/L, nickel ≤ 0.24 μg/L, barium ≤ 0.80 μg/L, silver ≤ 0.08 μg/L, and aluminum ≤ 3.28 μg/L.
This study employs the Normalized Interquartile Range (NIQR) [21] to evaluate the variability of groundwater data series for the indicators through standardized quartile comparisons, thereby enabling comparability across different conditions. The NIQR is calculated as NIQR = 0.7413 × IQR, where IQR (Interquartile Range) is defined as the difference between the third quartile (Q3) and the first quartile (Q1). This scaling process eliminates unit differences and enhances robustness against outliers.
Simultaneously, the Spearman rank correlation coefficient [22] is applied to refine indicator selection by eliminating highly correlated data series. As a non-parametric statistical method, Spearman’s correlation assesses monotonic relationships based on the ranks of variables and assigns average ranks to tied values, without requiring assumptions of data normality or linearity.
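The two statistics described above can be sketched with the Python standard library. The quartile convention is an assumption (the source does not state one; the "inclusive" method of `statistics.quantiles` is used here), and the helper names and sample series are hypothetical. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles ties as the text describes.

```python
import math
import statistics


def niqr(values):
    """Normalized interquartile range: 0.7413 x (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 0.7413 * (q3 - q1)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = statistics.fmean(ra), statistics.fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)


# Hypothetical standardized series for two metals and one conventional indicator.
ni = [1.0, 2.0, 3.0, 4.0, 5.0]
co = [1.0, 4.0, 9.0, 16.0, 25.0]   # rises monotonically with ni
cod = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls monotonically as ni rises
print(niqr(ni), spearman(ni, co), spearman(ni, cod))
```

Because the coefficient depends only on ranks, the nonlinear but monotonic pair above is treated as perfectly correlated, which is the property that makes it suitable for screening redundant indicators.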

3.2. Optimization of Monitoring Frequency

The downsampling approach was used to analyze and optimize the online monitoring frequency, with time intervals systematically varied from 1 h to 240 h based on the original hourly data series.
The objective was to determine at which intervals critical information would be lost, thereby enabling the identification of an optimized frequency range and corresponding time period. This study adopts the direct decimation method for downsampling. The direct decimation method, also referred to as decimation, is a fundamental technique in signal processing that reduces data resolution by selectively retaining samples at regular intervals [23]. Mathematically, given a decimation factor M, the downsampled signal y(n) can be derived from the original sequence x(n) as follows:
$$y(n) = x(nM), \quad n \in \mathbb{Z}$$
Three quantitative evaluation metrics were used to assess discrepancies between the downsampled and original data sequences: the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), and the coefficient of determination (R2). The evaluated sampling intervals spanned from 2 to 240 h. At each interval, the corresponding data sequence was systematically compared with the original hourly data sequence to quantify discrepancies introduced by reduced sampling frequency.
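Direct decimation amounts to keeping every M-th sample. The short sketch below uses a hypothetical 48 h series to show how a one-hour spike can vanish entirely once the series is decimated to a daily interval; the function name and the data are illustrative only.

```python
def decimate(x, m):
    """Direct decimation: y(n) = x(nM), keeping every M-th sample."""
    if m < 1:
        raise ValueError("decimation factor M must be a positive integer")
    return x[::m]


hourly = [0.0] * 48
hourly[5] = 9.0                  # hypothetical one-hour spike at hour 5
daily = decimate(hourly, 24)     # M = 24 keeps hours 0 and 24 only
print(max(hourly), max(daily))   # the spike is absent from the decimated series
```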
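$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - y_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - y_i \right)^2}$$
$$R^2 = 1 - \frac{\mathrm{MSE}}{\mathrm{Var}(Y)}$$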
Here n is the total number of observations, Yi is the actual value observed for the i-th data point, and yi is the value generated for the i-th position through sparse interpolation of the original sequence. MSE is the mean squared error (the square of the RMSE), Ȳ denotes the average of all observed values, and Var(Y) represents the variance of these observed values about Ȳ.
The three metrics were computed independently, and corresponding scatter plots were constructed. The optimal interval was determined from the dispersion patterns in these scatter plots: if a sudden surge in dispersion is detected at a particular sampling interval, that interval is designated as the maximum allowable optimization interval. Each metric yields its own interval range. The intersection of these three ranges, which is the range acceptable under all three metrics, is chosen as the final optimized monitoring frequency.
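The comparison between a downsampled sequence and the original hourly sequence can be sketched as follows. This is a simplified illustration under assumptions: the retained samples are linearly interpolated back to hourly resolution (the source says only "sparse interpolation"), values after the last retained sample are held constant, and the function names and synthetic series are hypothetical.

```python
import math


def reconstruct(x, m):
    """Keep every m-th sample, then linearly interpolate back to full length."""
    last = ((len(x) - 1) // m) * m          # index of the last retained sample
    y = []
    for i in range(len(x)):
        if i >= last:
            y.append(x[last])               # hold the last retained value
            continue
        left = (i // m) * m
        w = (i - left) / m
        y.append((1 - w) * x[left] + w * x[left + m])
    return y


def error_metrics(actual, estimate):
    """Return (MAE, RMSE, R2) with R2 = 1 - MSE / Var(Y)."""
    n = len(actual)
    residuals = [a - e for a, e in zip(actual, estimate)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean = sum(actual) / n
    var = sum((a - mean) ** 2 for a in actual) / n   # must be nonzero
    return mae, math.sqrt(mse), 1 - mse / var


# Hypothetical 10-day hourly series: flat baseline with one 6 h spike.
hourly = [1.0] * 240
for hour in range(100, 106):
    hourly[hour] = 5.0

for interval in (2, 12, 24, 48):
    mae, rmse, r2 = error_metrics(hourly, reconstruct(hourly, interval))
    print(f"{interval:>3} h  MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Plotting these metrics against the sampling interval and looking for the point where dispersion jumps is the visual step the text describes; that step is not automated here.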

3.3. Early Warning

High-frequency water quality monitoring generates large-scale, high-dimensional datasets that often exhibit non-Gaussian and complex distributions. Traditional anomaly detection methods, which rely on strict distributional assumptions or suffer from high computational costs, are often inadequate for such data. To address these challenges, this study employs the Isolation Forest (iForest) algorithm to perform the modeling and early warning. As an unsupervised, ensemble-based method, iForest does not require a priori assumptions about data distribution and is computationally efficient with large datasets. These attributes make iForest a robust and highly suitable choice for the timely and accurate identification of anomalies in water quality data [24,25].
Isolation Forest isolates anomalies by drawing multiple subsamples of size $\psi$ and, for each subsample, recursively building isolation trees (iTrees) by randomly selecting a feature $j$ and split value $p$ until a maximum depth $h_{\max}$ or a leaf with one instance is reached. The path length of a sample $x$ in a tree is denoted $h(x)$ and is normalized by
$$c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \quad H(i) \approx \ln(i) + \gamma$$
to account for subsample size, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. With the average path length $E[h(x)]$ over all trees, the anomaly score is
$$s(x, \psi) = 2^{-\frac{E[h(x)]}{c(\psi)}}$$
where scores near 1 indicate anomalies, and samples with $s(x, \psi) > \tau$ (e.g., $\tau = 0.5$) are flagged as outliers.
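To make the scoring formulas concrete, the sketch below implements a minimal one-dimensional isolation forest in plain Python. It is an illustration under assumptions, not the implementation used in the study: each indicator is treated as a single feature (so no feature selection is needed), the depth limit is taken as ceil(log2 ψ), leaves holding several points add c(size) to the path length, and the default ψ = 256, the seed, and the sample data are hypothetical choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(psi):
    """Normalizing constant c(psi) = 2H(psi - 1) - 2(psi - 1)/psi."""
    if psi <= 1:
        return 0.0
    if psi == 2:
        return 1.0
    return 2.0 * (math.log(psi - 1) + EULER_GAMMA) - 2.0 * (psi - 1) / psi


def build_tree(values, depth, max_depth, rng):
    """Recursively split on a random value until a stopping rule is met."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}
    p = rng.uniform(min(values), max(values))
    return {
        "split": p,
        "left": build_tree([v for v in values if v < p], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= p], depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """h(x): edges traversed, plus c(size) for an unresolved leaf."""
    depth = 0
    while "split" in node:
        node = node["left"] if x < node["split"] else node["right"]
        depth += 1
    return depth + c(node["size"])


def anomaly_scores(data, points, n_trees=100, psi=256, seed=0):
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)) for each x in points."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [
        2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / c(psi))
        for x in points
    ]


# Hypothetical standardized series: small oscillation plus one large spike.
series = [0.1 * math.sin(i / 5) for i in range(300)] + [8.0]
spike_score, baseline_score = anomaly_scores(series, [8.0, 0.0])
print(spike_score, baseline_score)
```

The spike is separated from the rest of the data after very few random splits, so its average path length is short and its score is pushed toward 1, while baseline values need many more splits and score lower.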
To evaluate model performance and mitigate overfitting, the dataset was partitioned into a training set (70%) and a testing set (30%). To address the inherent class imbalance where anomalies are rare, Stratified Sampling was employed. This technique ensures that the proportion of anomalies in both the training and testing sets mirrors that of the original dataset, which is crucial for unbiased model evaluation. Furthermore, optimal hyperparameters for the Isolation Forest model, such as contamination, were tuned on the training set using a Grid Search algorithm with 5-fold cross-validation to maximize performance. The model’s final performance was assessed on the independent test set using three key metrics: Precision, Recall, and F1-Score.
The evaluation metrics are defined by the following equations:
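$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$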
These metrics are derived from four fundamental quantities of the confusion matrix:
True Positives (TP): Anomalies correctly identified as anomalies.
False Positives (FP): Normal instances incorrectly identified as anomalies.
True Negatives (TN): Normal instances correctly identified as normal.
False Negatives (FN): Anomalies incorrectly identified as normal.
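The metrics defined above follow directly from the four confusion-matrix counts. The sketch below computes them for the anomaly class from two label lists; the function names and labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating 1 as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, fp, tn, fn


def precision_recall_f1(y_true, y_pred):
    """Precision, Recall and F1-Score for the anomaly class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 3 true anomalies, 4 flagged by the model.
labeled = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(precision_recall_f1(labeled, flagged))
```

In this toy case two of the four alerts are false positives, so Precision falls to 0.5 while Recall stays at 2/3, the same pattern of over-flagging that lowers Precision relative to Recall.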
A dedicated Isolation Forest (iForest) model was constructed for each monitored indicator. For all models, the number of base estimators (n_estimators) was set to 100. This value is widely adopted as an effective trade-off between performance and computational cost, enhancing model stability by reducing the variance of ensemble predictions. The contamination parameter, which represents the expected proportion of anomalies, was tuned individually for each indicator’s model via the Grid Search process [26].
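The role of the contamination parameter can be illustrated with a simplified search. In this sketch, contamination is assumed to act as the fraction of highest-scoring samples that are flagged, and the candidate value with the best F1-Score is kept. The 5-fold cross-validation and stratified splitting described above are deliberately omitted, and the scores, labels, grid, and function names are all hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic-mean form."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def flag_top_fraction(scores, contamination):
    """Flag the highest-scoring fraction of samples as anomalies."""
    k = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


def tune_contamination(scores, y_true, grid):
    """Pick the grid value whose flags give the highest F1-Score."""
    return max(grid, key=lambda frac: f1_score(y_true, flag_top_fraction(scores, frac)))


# Hypothetical anomaly scores for ten samples, two of which are true anomalies.
scores = [0.42, 0.45, 0.91, 0.44, 0.47, 0.88, 0.43, 0.46, 0.41, 0.48]
labels = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
best = tune_contamination(scores, labels, grid=[0.1, 0.2, 0.3])
print(best)
```

A value that is too small misses true anomalies, and one that is too large adds false positives, which is why the parameter is tuned separately for each indicator.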

4. Results

4.1. Behaviors of the Long-Term Data Series

Figure 2 shows the time series of 10 monitored heavy metals at Well 1 and Well 2, respectively. The dataset was normalized using Z-score standardization to ensure comparability across variables. Heavy metal concentrations remain at a low, normal level for most of the three monitored years. However, abnormal changes far beyond the normal values can be clearly observed, indicating that groundwater has been contaminated by heavy metals. These abnormal changes last from several hours to over a thousand hours before gradually returning to normal values. Because some anomalies last only several hours, a sampling interval longer than several hours might miss them.
Notably, the abnormal changes in multiple heavy metals are strongly synchronized, suggesting that monitoring one or a few heavy metals may be sufficient to reflect these changes and support early warning.
The monitored data of conventional indicators in Figure 3 were also standardized using the Z-score method for normalization. Frequent abnormal changes were observed in the long-term curves of most conventional indicators. These abnormal changes persisted for a duration ranging from 30 to over a thousand hours. For these indicators, both evident increasing and decreasing trends in concentrations were detected, suggesting that the groundwater environment has been influenced by the input and dilution of contaminants. Similarly, for several indicators such as Turb., WL, NH3, and COD, the abnormal changes exhibited synchronicity over multiple time intervals. The synchronous concentration changes in multiple indicators enhance the reliability of detecting groundwater contamination, thus validating the effectiveness of online monitoring and early warning systems for groundwater contamination.
Figure 4 compares the long-term curves of two heavy metals, Al and Pb, with those of three conventional indicators, NH3, COD and pH. In general, the abnormal changes in heavy metals are shorter in duration and less frequent than those of the conventional indicators, with the exception of NH3. The curves of heavy metals and NH3 show isolated spikes, with abnormalities lasting several hours at minimum. In contrast, the curves of the other conventional indicators exhibit high-amplitude abnormalities lasting more than 48 h at minimum, suggesting that it is more difficult to capture the changes in heavy metals in a timely manner than those in the conventional indicators. Moreover, the comparisons of the data series reveal poor consistency between the dynamic changes in heavy metals and conventional indicators, indicating the necessity of independent monitoring for heavy metals in addition to the conventional indicators.
The Normalized Interquartile Range (NIQR) is calculated to analyze the degree of variation in the data series for all monitored indicators, as shown in Figure 5. Apart from the conventional indicators Turb. and TN at Well 1, the heavy metals Hg, Ag, and Mn demonstrate higher variability at both wells, highlighting their sensitivity to external influences and making them effective indicators for monitoring and early warning. The data series variability of conventional indicators is generally lower than that of heavy metals. Heavy metal indicators tend to fluctuate more intensely and abruptly, which may explain the disparity in variability between these two categories. These findings suggest that heavy metals are more sensitive to changes in the groundwater environment than the conventional indicators.

4.2. Substitutability of Heavy Metal Monitoring

The majority of the long-term behaviors of the monitored data demonstrate the efficacy of online monitoring in detecting the abnormal variations in groundwater. Nevertheless, concurrently monitoring an excessive number of indicators is not cost-effective in the context of field applications. Thus, optimizing and reducing heavy metal indicators and assessing whether conventional indicators can replace them is a worthwhile scientific endeavor.
A comparison of the long-term data series of the 19 monitored heavy metal indicators shows that multiple heavy metals have highly synchronized abnormal changes, indicating that these indicators can be reduced to a few representative ones for cost-effective groundwater monitoring. Figure 6 shows that Ni, Al, Co, and Mo are highly consistent in their abnormal fluctuations during the monitored periods. However, no such consistency is found between any heavy metal and any conventional indicator. Therefore, the online monitoring of heavy metals cannot be substituted by conventional indicators.
The substitutability can be further clarified using the Spearman correlation coefficient. Figure 7 presents a heat map of the correlation coefficients among the 19 heavy metals and 10 conventional indicators at Well 1. It can be seen that Sb, Ba, and Mo displayed strong correlations with several other heavy metals. Moreover, the TP, EC, and DO manifested significant correlations among the conventional indicators. Nevertheless, the correlations between the heavy metals and the conventional indicators were relatively weak. Therefore, both the heavy metals and the conventional indicators can be optimized to a limited number of indicators for online monitoring. However, the online monitoring of the heavy metals cannot be substituted by conventional indicators.

4.3. Optimization of Online Monitoring Frequency

The frequency of online monitoring is another issue that needs to be addressed. The optimized frequency can be obtained by using time-domain downsampling to compare data series of different frequencies with the original hourly data series. The optimal frequency is the lowest frequency whose data series still shows little signal loss relative to the original hourly data series. Figure 8 presents a comparison between the original hourly data sequence and the downsampled data sequences at sampling intervals of 1 h, 2 h, 12 h, 24 h, 48 h, 72 h, 120 h, and 240 h. As the sampling interval increases, some of the monitored anomalies are lost in both the aluminum and chemical oxygen demand (COD) data sequences.
Three error metrics were employed to evaluate the extent of information loss between two data series: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R2). Figure 9 shows the MAE, RMSE and R2 obtained by comparing the original Co data series at Well 1 with downsampled data series at sampling intervals from 2 to 240 h, increased in steps of one hour. The interval of 20–40 h is identified as the optimal range for all three metrics. Within this range, the calculated RMSE and MAE values are relatively low, while R2 achieves its peak value, indicating effective predictive performance with acceptable error levels. Sampling intervals beyond 40 h lead to significant errors due to information loss compared to the original hourly data sequence. Similar calculations are performed for pH at Well 1, and the results are shown in Figure 10. The optimal sampling interval for pH at Well 1 lies between 20 and 30 h.
The optimal monitoring intervals for all monitored heavy metals and conventional indicators were calculated, with the results presented in Table 2 and Table 3, respectively. As shown, the optimal intervals for most heavy metals and conventional indicators fall within the ranges of 25–80 h and 60–100 h, respectively. However, a few conventional indicators, such as pH, NH3, and Turb., exhibit shorter intervals similar to those of heavy metals. Both ranges are substantially longer than the initial 1 h interval, so adopting them is expected to greatly improve the cost-effectiveness of online monitoring applications. In this study, we propose that heavy metal indicators be monitored at least once per day, whereas most conventional indicators can be monitored once every three days. For pH, NH3, and Turb., daily monitoring is recommended.

4.4. Performance of Early Warning

The trained Isolation Forest (iForest) models were used to test early warning performance on the long-term online data series. The models’ predictions were quantitatively evaluated against the manually labeled ground truth, using Precision, Recall, and F1-Score calculated for the anomaly class. The performance of the individually trained iForest model for each monitored indicator, along with its best-tuned contamination hyperparameter, is listed in Table 4.
As shown in Table 4, the models exhibit varied performance across different indicators. The model demonstrated excellent performance on the three heavy metal indicators: Ni, Zn, and Mn. In contrast, the model’s performance on COD and Water Level was lower. A common characteristic for both indicators was a low Precision paired with a relatively high Recall.
For heavy metals of Ni, Zn, and Mn, Figure 11 shows a stable baseline, with anomalies appearing as clear, isolated spikes. The predicted anomalies (red markers) align almost perfectly with the monitored anomalies (blue markers), with no significant false positives or false negatives. This high-precision visual match provides an intuitive validation of their outstanding performance metrics in Table 4 with F1-Scores ranging from 0.89 to 0.96.
For Chemical Oxygen Demand (COD) and Water Level indicators, Figure 11 exhibits high-frequency, high-amplitude fluctuations. The plots show that the predicted red markers successfully cover the vast majority of the blue monitored markers. However, a substantial number of red markers also appear on fluctuation peaks where no blue markers are present. This phenomenon perfectly explains the “high Recall, low Precision” combination listed in Table 4. In its effort to capture all true anomalies (high Recall of 0.87 and 0.80), the model was compelled to classify many normal, albeit sharp, fluctuations as anomalous, leading to a large number of false positives (low Precision of 0.48 and 0.45).
Figure 12 shows the weekly anomaly warning heatmap for two conventional indicators of COD and Water Level and three heavy metals of Ni, Zn, and Mn. It can be seen that all monitored indicators exhibit periodic or sporadic anomalous fluctuations over different periods, with COD and Water Level anomalies appearing denser, while heavy metal anomalies present as more distinct peaks.
Both conventional indicators of COD and Water Level, and heavy metal indicators of Ni, Zn, and Mn, provide valuable early warning signals through their anomaly rate fluctuations. Anomalies occurring individually or concurrently in these indicators can serve as critical bases for identifying potential environmental risks, predicting pollution incidents, or detecting system functional abnormalities. Therefore, integrating both categories of indicators into a comprehensive early warning system can significantly enhance the system’s comprehensiveness, accuracy, and timeliness.
Anomalous fluctuations in COD and Water Level often appear more continuous, which may reflect broader, persistent changes in the groundwater. In contrast, anomalous fluctuations in heavy metals usually manifest as more discrete, sudden spikes. These anomalies are more scattered and independent over time, suggesting specific, and sometimes infrequent, discharge events, and they are more likely to indicate direct impacts from specific industrial discharges, accidental spills, or other sudden pollution sources.
This comparison suggests that COD and Water Level provide warnings about the background state and macro-pressures in the groundwater system, while heavy metals offer warnings specific to acute discharge events of particular pollutants. In practical applications, integrating both in a multi-indicator early warning system can provide a more holistic risk assessment.

5. Conclusions and Discussion

This study demonstrates that the online monitoring system can reliably detect abnormal changes in heavy metals in groundwater. Long-term heavy metal data exhibit characteristics distinct from those of conventional indicators, as evidenced by their different abnormal change trends, correlations, and variability. Heavy metals and conventional indicators are therefore not interchangeable for online groundwater monitoring. While conventional indicators cannot replace heavy metal indicators, they may serve as supplementary verification for anomalies under certain extreme conditions, facilitating timely intervention measures such as early warnings and traceability. The online monitoring indicators, prediction models, and early warning parameters developed in this study are specific to the study area’s pollution characteristics. In regions with different groundwater pollution sources or hydrogeological conditions, key indicators and dominant heavy metals/metalloids may differ, requiring local adjustments to monitoring and modeling approaches. Downsampling analysis of the long-term data indicates that the online monitoring frequency for heavy metal indicators can be optimized to once per day while still detecting discrete, sudden peak anomalies, which are typically shorter in duration than the more continuous and persistent abnormal fluctuations observed in most conventional indicators. The trained iForest models show that the online monitoring data can be used for early warning and can capture pollution events. Integrating both categories of indicators into a comprehensive early-warning system can improve the system’s comprehensiveness, accuracy, and timeliness.
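The downsampling comparison mentioned above can be illustrated with a minimal sketch that computes RMSE, MAE, and R² between an original series and its downsampled version. It assumes that a downsampled series is compared with the original by holding each retained sample until the next one; the reconstruction actually used in the study is not specified here. The function names and the synthetic series are illustrative and are not study data.

```python
import math


def downsample_hold(series, step):
    """Keep every `step`-th sample and hold it until the next retained one."""
    return [series[(i // step) * step] for i in range(len(series))]


def error_metrics(original, reconstructed):
    """Return (RMSE, MAE, R^2) of the reconstruction against the original."""
    n = len(original)
    residuals = [o - r for o, r in zip(original, reconstructed)]
    ss_res = sum(e * e for e in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in residuals) / n
    mean = sum(original) / n
    ss_tot = sum((o - mean) ** 2 for o in original)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mae, r2


# Synthetic hourly series: flat baseline with one 36-hour excursion
series = [1.0] * 240
for h in range(100, 136):
    series[h] = 10.0

for step in (1, 6, 24, 72):
    rmse, mae, r2 = error_metrics(series, downsample_hold(series, step))
    print(f"every {step:2d} h: RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```

In this toy case the error grows as the sampling interval lengthens, and the excursion is missed entirely once the interval exceeds its duration. An optimized interval would be the longest one for which the metrics remain acceptably close to those of the original series.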
The study primarily focuses on validating the effectiveness of heavy metal online monitoring and early warning based on long-term, high-frequency monitoring data series. It reveals the response differences between heavy metals and conventional indicators under pollution conditions and analyzes the performance characteristics of online monitoring indicators, monitoring frequency, and early warning performance. The research findings can provide data and scientific support for online monitoring and early warning in groundwater.
However, groundwater environmental issues are highly site-specific, so the optimized monitoring indicators and frequencies derived from this study are also site-specific. Some quantitative results will vary between groundwater environments, such as karst aquifers or heterogeneous fracture zones. Groundwater pollution in the study area depends on local environmental conditions and the dominant industrial activities, which together shape the composition, concentration, and spatiotemporal distribution of pollutants. Additionally, this study does not examine hydrogeochemical evolution processes that might also affect the long-term monitoring data series. The early warning analysis used only the iForest model to demonstrate the effectiveness of online monitoring and early warning, which does not imply that the online data are suitable for all early warning methods and models. These limitations warrant further investigation in future studies. The data analysis framework may also be extended to other domains, which suggests potential for broader scientific and practical applications.

Author Contributions

Conceptualization, S.Y.; Methodology, S.Y., Y.D. and P.H.; Software, P.H.; Validation, Y.L.; Formal analysis, S.Y., Y.D. and Y.L.; Investigation, Y.D.; Resources, S.Y., X.Z. and Y.S.; Data curation, Y.D., Y.L. and Y.S.; Writing—original draft, Y.D. and P.H.; Writing—review & editing, S.Y., X.Z. and Y.S.; Funding acquisition, S.Y. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [41931292] and the start-up funds for scientific research of high-level talents in Shenzhen [01296126].

Data Availability Statement

Data will be made available on request.

Acknowledgments

This work is supported by the State Key Laboratory of Soil Pollution Control and Safety, School of Environmental Science and Engineering, Southern University of Science and Technology, the MEE Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, and Guangdong Provincial Key Laboratory.

Conflicts of Interest

Yi Shen was employed by the Zhejiang Environment Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Gleeson, T.; Cuthbert, M.; Ferguson, G.; Perrone, D. Global Groundwater Sustainability, Resources, and Systems in the Anthropocene. Annu. Rev. Earth Planet. Sci. 2020, 48, 431–463. [Google Scholar] [CrossRef]
  2. Gwira, H.A.; Osae, R.; Abasiya, C.; Peasah, M.Y.; Owusu, F.; Loh, S.K.; Kojo, A.; Aidoo, P.; Agyare, E.A. Hydrogeochemistry and human health risk assessment of heavy metal pollution of groundwater in Tarkwa, a mining community in Ghana. Environ. Adv. 2024, 17, 100565. [Google Scholar] [CrossRef]
  3. Sheng, D.; Meng, X.; Wen, X.; Wu, J.; Yu, H.; Wu, M. Contamination characteristics, source identification, and source-specific health risks of heavy metal(loid)s in groundwater of an arid oasis region in Northwest China. Sci. Total Environ. 2022, 841, 156733. [Google Scholar] [CrossRef]
  4. Meng, Z.; Bai, X.; Tang, X. Short−Term Assessment of Heavy Metals in Surface Water from Xiaohe River Irrigation Area, China: Levels, Sources and Distribution. Water 2022, 14, 1273. [Google Scholar] [CrossRef]
  5. Zhao, D.; Wu, Q.; Zeng, Y.; Zhang, J.; Mei, A.; Zhang, X.; Gao, S.; Wang, H.; Liu, H.; Zhang, Y. Contamination and human health risk assessment of heavy metal(loid)s in topsoil and groundwater around mining and dressing factories in Chifeng, North China. Int. J. Coal Sci. Technol. 2023, 10, 8. [Google Scholar] [CrossRef]
  6. Abadi, H.T.; Alemayehu, T.; Berhe, B.A. Heavy metal’s pollution health risk assessment and source appraisal of groundwater and surface water in Irob catchment, Tigray, Northern Ethiopia. Appl. Water Sci. 2024, 14, 201. [Google Scholar] [CrossRef]
  7. Zakir, H.M.; Sharmin, S.; Akter, A.; Rahman, S. Assessment of health risk of heavy metals and water quality indices for irrigation and drinking suitability of waters: A case study of Jamalpur Sadar area, Bangladesh. Environ. Adv. 2020, 12, 100005. [Google Scholar] [CrossRef]
  8. Liu, Y.; Yu, H.; Sun, Y.; Chen, J. Novel assessment method of heavy metal pollution in surface water: A case study of Yangping River in Lingbao City, China. Environ. Eng. Res. 2017, 22, 31–39. [Google Scholar] [CrossRef]
  9. Zhao, Y.-P.; Wu, R.; Cui, J.-L.; Gan, S.-C.; Pan, J.-C.; Guo, P.-R. Improvement of water quality in the Pearl River Estuary, China: A long-term (2008–2017) case study of temporal-spatial variation, source identification and ecological risk of heavy metals in surface water of Guangzhou. Environ. Sci. Pollut. Res. Int. 2020, 27, 21084–21097. [Google Scholar] [CrossRef] [PubMed]
  10. Qiao, J.; Zhu, Y.; Jia, X.; Shao, M.; Niu, X.; Liu, J. Distributions of arsenic and other heavy metals, and health risk assessments for groundwater in the Guanzhong Plain region of China. Environ. Res. 2020, 181, 108957. [Google Scholar] [CrossRef]
  11. Blaen, P.J.; Khamis, K.; Lloyd, C.; Comer-Warner, S.; Ciocca, F.; Thomas, R.M.; MacKenzie, A.R.; Krause, S. High-frequency monitoring of catchment nutrient exports reveals highly variable storm event responses and dynamic source zone activation. J. Geophys. Res.-Biogeosci. 2017, 122, 2265–2281. [Google Scholar] [CrossRef]
  12. Yaroshenko, I.; Kirsanov, D.; Marjanovic, M.; Lieberzeit, P.A.; Korostynska, O.; Mason, A.; Frau, I.; Legin, A. Real-Time Water Quality Monitoring with Chemical Sensors. Sensors 2020, 20, 3432. [Google Scholar] [CrossRef]
  13. Yang, G.X.; Moyer, D.L. Estimation of nonlinear water-quality trends in high-frequency monitoring data. Sci. Total Environ. 2020, 715, 136686. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Li, W.; Wen, W.; Zhuang, F.; Yu, T.; Zhang, L.; Zhuang, Y. Universal high-frequency monitoring methods of river water quality in China based on machine learning. Sci. Total Environ. 2024, 947, 174641. [Google Scholar] [CrossRef]
  15. Alferes, J.; Tik, S.; Copp, J.; Vanrolleghem, P.A. Advanced monitoring of water systems using in situ measurement stations: Data validation and fault detection. Water Sci. Technol. 2013, 68, 1022–1030. [Google Scholar] [CrossRef] [PubMed]
  16. Wongsasuluk, P.; Chotpantarat, S.; Siriwong, W.; Robson, M. Using hair and fingernails in binary logistic regression for bio-monitoring of heavy metals/metalloid in groundwater in intensively agricultural areas, Thailand. Environ. Res. 2018, 162, 106–118. [Google Scholar] [CrossRef] [PubMed]
  17. e Silva, G.M.; Campos, D.F.; Brasil, J.A.T.; Tremblay, M.; Mendiondo, E.M.; Ghiglieno, F. Advances in Technological Research for Online and In Situ Water Quality Monitoring—A Review. Sustainability 2022, 14, 5059. [Google Scholar] [CrossRef]
  18. Barkat, A.; Bouaicha, F.; Ziad, S.; Mester, T.; Sajtos, Z.; Balla, D.; Makhloufi, I.; Szabó, G. The Integrated Use of Heavy-Metal Pollution Indices and the Assessment of Metallic Health Risks in the Phreatic Groundwater Aquifer—The Case of the Oued Souf Valley in Algeria. Hydrology 2023, 10, 201. [Google Scholar] [CrossRef]
  19. Sanad, H.; Moussadek, R.; Dakak, H.; Zouahri, A.; Lhaj, M.O.; Mouhir, L. Ecological and Health Risk Assessment of Heavy Metals in Groundwater within an Agricultural Ecosystem Using GIS and Multivariate Statistical Analysis (MSA): A Case Study of the Mnasra Region, Gharb Plain, Morocco. Water 2024, 16, 2417. [Google Scholar] [CrossRef]
  20. Rahman, S.M.; Masum, H.; Ishtiaque, A.; Sabiha, S.; Abul, H. Water quality index and health risk assessment for heavy metals in groundwater of Kashiani and Kotalipara upazila, Gopalganj, Bangladesh. Appl. Water Sci. 2024, 14, 106. [Google Scholar] [CrossRef]
  21. Kojima, I.; Kakita, K. Comparative Study of Robustness of Statistical Methods for Laboratory Proficiency Testing. Anal. Sci. 2014, 30, 1165–1168. [Google Scholar] [CrossRef][Green Version]
  22. Jiang, J.; Zhang, X.; Yuan, Z. Feature selection for classification with Spearman’s rank correlation coefficient-based self-information in divergence-based fuzzy rough sets. Expert Syst. Appl. 2024, 249, 123633. [Google Scholar] [CrossRef]
  23. Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Pearson: Boston, MA, USA, 2019. [Google Scholar]
  24. Liang, J.L.; Liang, Q.; Wu, Z.Q.; Chen, H.; Zhang, S.; Jiang, F. A Novel Unsupervised Deep Transfer Learning Method With Isolation Forest for Machine Fault Diagnosis. IEEE Trans. Ind. Inform. 2024, 20, 235–246. [Google Scholar] [CrossRef]
  25. Yin, H.; Wu, Q.; Yinc, S.; Dong, S.; Daief, Z.; Soltanian, M.R. Predicting mine water inrush accidents based on water level anomalies of borehole groups using long short-term memory and isolation forest. J. Hydrol. 2023, 616, 128813. [Google Scholar] [CrossRef]
  26. Zhang, Z.; Jilili, A.; Jiang, F. Heavy metal contamination, sources, and pollution assessment of surface water in the Tianshan Mountains of China. Environ. Monit. Assess. 2015, 187, 33. [Google Scholar] [CrossRef]
Figure 1. Location of the study area and distribution of online monitoring wells.
Figure 2. Normalized long-term data series for online monitored heavy metals at Well 1 (left) and Well 2 (right). Synergy in the abnormal changes among different heavy metals is marked by blocks.
Figure 3. Normalized long-term data series for online monitored conventional indicators at Well 1 (left) and Well 2 (right). Synergy in the abnormal changes among different heavy metals is marked by blocks.
Figure 4. Comparison of the long-term data series for the heavy metals Al and Pb and the conventional indicators NH3, COD, and Water Level at Well 1 (left) and Well 2 (right). Synergy in the abnormal changes among different heavy metals is marked by blocks.
Figure 5. The NIQR calculated for the 3 years of data series for all monitored indicators.
Figure 6. Highly synergistic abnormal changes in concentrations of cobalt, aluminum, nickel, and molybdenum at Well 1 (left) and Well 2 (right). Synergy in the abnormal changes among different heavy metals is marked by blocks.
Figure 7. Spearman correlation coefficients for the data of 10 monitored conventional indicators and 19 heavy metals at Well 1.
Figure 8. Comparison of the data sequences at different downsampling intervals for Al (left) and COD (right) at Well 1. Time-domain information is progressively lost from the data sequence as the downsampling interval increases.
Figure 9. Calculated RMSE, MAE, and R² between the original data series and different downsampling frequencies for Co at Well 1.
Figure 10. Calculated RMSE, MAE, and R² between the original data series and different downsampling frequencies for pH at Well 1.
Figure 11. The anomaly detection results on the independent test set for five indicators, comparing the model-predicted anomalies (marked in red) with the manually labeled monitored anomalies (marked in blue).
Figure 12. Time series and weekly anomaly rate heatmap. This chart shows the weekly anomaly rate of each monitoring indicator over time, with colors ranging from light pink (low anomaly rate) to deep red (high anomaly rate).
Table 1. Statistical description of data obtained for the monitored indicators at the two wells.

| Well/Depth | Data Category | Monitored Indicator | Interpolated Data | Unit |
|---|---|---|---|---|
| Well 1/15 m; Well 2/15 m | Conventional indicators | COD | 26,531 | mg/L |
| | | TN | | mg/L |
| | | pH | | dimensionless |
| | | TP | | mg/L |
| | | Ammoniacal Nitrogen (NH3) | | mg/L |
| | | Water Level (H) | | m |
| | | Water Temp. (T) | | °C |
| | | Turbidity (Turb.) | | NTU |
| | | DO | | mg/L |
| | | EC | | μS/cm |
| | Heavy metals | Copper (Cu) | | μg/L |
| | | Mercury (Hg) | | μg/L |
| | | Cadmium (Cd) | | μg/L |
| | | Arsenic (As) | | μg/L |
| | | Lead (Pb) | | μg/L |
| | | Nickel (Ni) | | μg/L |
| | | Zinc (Zn) | | μg/L |
| | | Manganese (Mn) | | μg/L |
| | | Iron (Fe) | | μg/L |
| | | Silver (Ag) | | μg/L |
| | | Beryllium (Be) | | μg/L |
| | | Selenium (Se) | | μg/L |
| | | Boron (B) | | μg/L |
| | | Molybdenum (Mo) | | μg/L |
| | | Barium (Ba) | | μg/L |
| | | Cobalt (Co) | | μg/L |
| | | Thallium (Tl) | | μg/L |
| | | Antimony (Sb) | | μg/L |
| | | Aluminum (Al) | | μg/L |

Note: The missing water level data were filled using linear interpolation, with the total data volume increasing from 26,470 to 26,531. No other variables exhibited missing values.
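The gap filling described in the note (61 water level values added, from 26,470 to 26,531) can be illustrated with a minimal sketch of linear interpolation. It assumes evenly spaced samples with missing values marked as `None`; the function name `fill_linear` and the example values are illustrative and are not study data.

```python
def fill_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading or trailing gaps are left as None because they have only one
    known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Illustrative water levels in metres with three missing readings
levels = [2.0, None, None, 2.6, 2.5, None, 2.7]
print(fill_linear(levels))
```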
Table 2. The optimized monitoring frequency of heavy metals (unit: hour).

| Indicator | Optimized Frequency Range (Well 1) | Optimized Frequency Range (Well 2) | Final Optimization of Frequency Range (Adopting the Shortest Duration) |
|---|---|---|---|
| Cu | 35–65 h | 40–70 h | 35 h |
| Hg | 35–65 h | 30–70 h | 35 h |
| Cd | 45–80 h | 45–80 h | 45 h |
| As | 25–65 h | 30–50 h | 25 h |
| Pb | 30–65 h | 35–65 h | 30 h |
| Ni | 35–55 h | 25–70 h | 25 h |
| Zn | 40–60 h | 30–60 h | 40 h |
| Mn | 40–80 h | 35–80 h | 40 h |
| Fe | 30–60 h | 30–60 h | 30 h |
| Ag | 45–75 h | 30–70 h | 45 h |
| Be | 35–70 h | 35–80 h | 35 h |
| Se | 30–60 h | 25–50 h | 30 h |
| B | 25–60 h | 25–60 h | 25 h |
| Mo | 35–70 h | 35–60 h | 35 h |
| Ba | 30–60 h | 25–55 h | 30 h |
| Co | 20–40 h | 20–40 h | 40 h |
| Tl | 45–80 h | 35–60 h | 35 h |
| Sb | 30–70 h | 40–80 h | 30 h |
| Al | 25–60 h | 40–80 h | 25 h |
Table 3. The optimized monitoring frequency of conventional indicators (unit: hour).

| Indicator | Optimized Frequency Range (Well 1) | Optimized Frequency Range (Well 2) | Final Optimization of Frequency Range (Adopting the Shortest Duration) |
|---|---|---|---|
| COD | 70–80 h | 60–70 h | 60 h |
| TP | 60–70 h | 60–70 h | 60 h |
| Water Level | 120–150 h | 90–130 h | 90 h |
| Water Temp. | 110–130 h | 90–100 h | 90 h |
| DO | 40–50 h | 40–50 h | 40 h |
| TN | 120–160 h | 24–48 h | 240 h |
| EC | 72–120 h | 48–72 h | 48 h |
| pH | 20–30 h | 30–48 h | 20 h |
| Turb. | 30–40 h | 24–48 h | 24 h |
| NH3 | 24–40 h | 24–50 h | 24 h |
Table 4. Parameters estimated and evaluation value for the iForest models.

| Indicator | Contamination (Best Tuned) | Precision | Recall | F1-Score |
|---|---|---|---|---|
| COD | 0.4525 | 0.48 | 0.87 | 0.62 |
| Water Level | 0.2716 | 0.45 | 0.80 | 0.57 |
| Ni | 0.0244 | 0.93 | 0.98 | 0.96 |
| Zn | 0.069 | 0.93 | 0.94 | 0.94 |
| Mn | 0.0524 | 0.89 | 0.89 | 0.89 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Yi, S.; Deng, Y.; Huang, P.; Liu, Y.; Zhang, X.; Shen, Y. Online Monitoring of Heavy Metals in Groundwater: A Case Study of Dynamic Behavior, Monitoring Optimization and Early Warning Performance. Hydrology 2026, 13, 57. https://doi.org/10.3390/hydrology13020057