Article

Application of the Self-Organizing Map (SOM) Algorithm to Identify Hydrochemical Characteristics and Genetic Mechanism of High-Nitrate Groundwater in Baoding Area, North China Plain

1
Hebei Provincial Institute of Geological Environment Monitoring, Shijiazhuang 050021, China
2
Hebei Key Laboratory of Geological Resources and Environment Monitoring and Protection, Shijiazhuang 050021, China
3
Geological Environment Monitoring Institute, Jiangxi Geological Survey and Exploration Institute, Nanchang 330006, China
4
Faculty of Environmental Science and Engineering, Kunming University of Science and Technology, Kunming 650500, China
5
Chinese Academy of Geological Sciences, Beijing 100037, China
*
Author to whom correspondence should be addressed.
Water 2025, 17(24), 3517; https://doi.org/10.3390/w17243517
Submission received: 19 October 2025 / Revised: 6 December 2025 / Accepted: 10 December 2025 / Published: 12 December 2025
(This article belongs to the Section Hydrogeology)

Abstract

Nitrate pollution is a pervasive environmental issue for groundwater systems worldwide and is particularly pronounced in the agricultural heartlands of the North China Plain. This study combines hydrochemical analysis, the Self-Organizing Map (SOM) algorithm, and Human Health Risk Assessment (HHRA) to identify the hydrochemical characteristics and genetic mechanisms of high-NO3 groundwater, based on 91 shallow groundwater samples. The SOM analysis identified three distinct hydrochemical clusters. Cluster 1, of the HCO3-Ca and HCO3-Mg types, is severely contaminated, showing the highest NO3, Ca2+, and TDS. In contrast, the majority of samples fell into Cluster 3, which is characterized by the lowest ion concentrations and an HCO3-Ca type. Cluster 2, characterized by HCO3-Ca/Mg, exhibits an intermediate chemical signature with elevated Na+, Mg2+, and HCO3. Nitrate concentrations varied widely, with 30.43% of collected samples exceeding the anthropogenic pollution threshold. Agricultural activities are identified as the primary nitrate source, with domestic sewage as a secondary contributor. The Human Health Risk Assessment further reveals that long-term exposure poses non-carcinogenic health risks, particularly for children, who are found to be the most vulnerable group. This study provides a hydrogeochemical perspective on nitrogen pollution in shallow groundwater and offers scientific support for sustainable groundwater management in typical agricultural regions worldwide.

1. Introduction

Groundwater serves as a vital freshwater resource, essential for drinking water supply, agricultural irrigation, and industrial processes [1]. With rapid socioeconomic growth and urban expansion, anthropogenic activities have exerted increasing pressure on both groundwater quantity and quality [2,3,4], giving rise to a range of hydro-environmental issues such as groundwater resources depletion, land subsidence, and nitrate contamination [5,6,7]. Groundwater nitrate pollution has emerged as a widespread environmental problem globally. Elevated nitrate concentrations pose considerable threats to ecosystem integrity and public health. Long-term consumption of nitrate-contaminated groundwater is associated with severe health issues, including methemoglobinemia and gastrointestinal disorders [8,9]. Therefore, elucidating the factors controlling nitrate enrichment is critical for developing scientifically sound land-use strategies, implementing effective pollution mitigation measures, and safeguarding groundwater quality.
Groundwater nitrate contamination is strongly linked to intensive agricultural and economic development [10,11]. Due to agricultural over-fertilization and infiltration of untreated wastewater, the Baoding area of the central North China Plain faces severe groundwater nitrate pollution [12]. These anthropogenic inputs have caused a continuous increase in groundwater nitrate concentrations. Furthermore, the regional hydrology is characterized by converging surface and subsurface flows that drain eastward into Baiyangdian Lake, a vital wetland system in the region. While previous studies in the Baoding area have largely addressed general water quality assessment and groundwater recharge–runoff–discharge processes, limited attention has been given to elucidating the specific genetic mechanisms of high-nitrate concentrations and conducting associated Human Health Risk Assessment [13,14]. Therefore, identifying the genetic mechanisms of nitrate pollution is essential for guiding sustainable development and utilization of regional groundwater resources, as well as for protecting ecological security.
Hydrogeochemical indicators serve as important technical tools for addressing scientific challenges in the fields of hydrogeology and groundwater pollution. Cluster analysis is a commonly used method for investigating the hydrochemical characteristics of groundwater [15,16]. However, traditional algorithms have significant limitations in high-dimensional data processing, making it difficult to accurately analyze the core characteristics of data. Currently, the Self-Organizing Map (SOM) method, a type of machine learning algorithm, has been widely applied in water quality modeling due to its superior performance in handling non-linear and high-dimensional hydrochemical data [17,18], and relevant studies provide valuable references for this research.
This study investigates the factors controlling groundwater nitrate distribution in Baoding City, a representative agricultural area in the North China Plain. Based on hydrochemical data from 91 shallow groundwater samples, the SOM algorithm is employed to identify the key hydrochemical processes controlling the spatial variability of elevated nitrate concentrations. The specific objectives are as follows: (1) delineate the spatial patterns of high-nitrate groundwater in the study area; (2) elucidate the hydrochemical mechanisms influencing nitrate enrichment; and (3) assess the associated human health risks posed by nitrate contamination. The findings are expected to provide scientific support for sustainable groundwater management and protection strategies.

2. Geological Setting

The study area is located in the central part of the North China Plain (Figure 1). Tectonically, it belongs to the piedmont sloping plain along the northern section of the eastern flank of the Taihang Mountains, with the terrain generally sloping eastward and northeastward. The region is primarily covered by Quaternary deposits, which gradually thicken from west to east and transition from a single layer to multiple layers, with a thickness of 200~600 m. Lithologically, the piedmont plain consists mainly of sand–gravel and pebble layers, whereas the central–eastern plain is dominated by interbedded sand and cohesive soil.
The study area is characterized by a continental monsoon climate. The multi-year average temperature and annual mean precipitation are 13.4 °C and 498.9 mm, respectively. Over 50% of the annual precipitation is concentrated in July and August. Nine rivers are distributed within the study area, including the Baigou River, Southern Juma River, Tang River, Pu River, Cao River, Ping River, Fu River, Xiaoyi River, and Zhulong River. However, the construction of upstream reservoirs has led to perennial drying of most river channels in the area. As a result, surface water resources are limited, rendering groundwater an indispensable source for sustaining local water supply and socioeconomic activities [13].
The groundwater in the study area is mainly Quaternary unconsolidated porous water. Generally, the aquifers are divided into four aquifer groups, designated as Groups I, II, III, and IV from shallow to deep. Aquifer Groups I and II have good hydraulic connectivity and are regarded as a single aquifer system, i.e., shallow aquifers, with a burial depth of 20~212 m. Aquifer Groups III and IV belong to deep aquifers. Groundwater recharge in the study area mainly includes atmospheric precipitation, agricultural irrigation infiltration, and surface water seepage. The primary discharge pathway is artificial pumping. At the regional scale, the general groundwater flow direction is from northwest to southeast. The hydraulic connectivity between shallow and deep aquifers is weak, and the research focus of groundwater nitrate in this study is mainly on shallow groundwater.

3. Samples and Method

3.1. Samples

A total of 91 groundwater samples were collected in Baoding City in November 2023. This sampling timeframe was intentionally selected to minimize the impact of seasonal variations on hydrochemical test data, thereby enhancing the rigor of the data analysis. All samples were stored in sterile polyethylene bottles. Prior to sampling, the sampling bottles were rinsed three times with the target water body. During sampling, the bottles were filled completely without leaving any air bubbles. After sampling, all sample bottles were sealed with parafilm, thereby ensuring the authenticity of the samples and the reliability of subsequent testing. To maintain the stability of ion components in the samples, pretreatment was conducted simultaneously during the sampling process. For the cation analysis, concentrated nitric acid was added dropwise to adjust the pH to < 2.
The sample analysis included on-site testing and laboratory testing. On-site testing parameters comprised pH and total dissolved solids (TDS), which were measured using an SX-620 pH meter (Shanghai Sanxin Instrument Co., Ltd., Shanghai, China) and an SX-650 conductivity/TDS meter (Shanghai Sanxin Instrument Co., Ltd., Shanghai, China), respectively. Laboratory testing focused on the determination of cation and anion concentrations in groundwater. Major cations (Na+, K+, Ca2+, Mg2+) were analyzed using an inductively coupled plasma optical emission spectrometer (ICP-OES; Perkin-Elmer Optima 5300DV, Waltham, Massachusetts, USA). Major anions (Cl, SO42−, NO3) were determined via ion chromatography (ICS-2500, Dionex, Sunnyvale, California, USA). The HCO3 concentration was measured through acid–base titration using 0.02 mol/L H2SO4.
The reliability of the laboratory-derived hydrochemical results was validated using the charge balance error (CBE), as defined in Equation (1). Results are considered reliable if the absolute value of the CBE is below 10% and questionable otherwise [17,19]. CBE values were calculated for all collected groundwater samples and range from 0.96% to 3.85%, well within this limit. The data therefore meet the accuracy requirements for the subsequent hydrochemical analysis.
$$\mathrm{CBE}(\%) = \frac{\sum \mathrm{cations} - \sum \mathrm{anions}}{\sum \mathrm{cations} + \sum \mathrm{anions}} \times 100\% \qquad (1)$$
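As an illustration of Equation (1), the following minimal Python sketch computes the CBE for one sample. It assumes that concentrations are reported in mg/L and converted to meq/L with standard equivalent weights; the function names and the dictionary layout are hypothetical and are not taken from the study.

```python
# Equivalent weights in g/eq (molar mass divided by the ionic charge).
# These are standard values; the sample layout below is an assumption.
EQ_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.305 / 2,
    "Cl": 35.45, "SO4": 96.06 / 2, "NO3": 62.00, "HCO3": 61.02,
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "NO3", "HCO3")


def to_meq(sample_mg_l):
    """Convert a {ion: mg/L} mapping to {ion: meq/L}."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def charge_balance_error(sample_mg_l):
    """Return the CBE in percent: (cations - anions) / (cations + anions) * 100."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return (cations - anions) / (cations + anions) * 100.0
```

A sample would pass the reliability check described above when `abs(charge_balance_error(sample)) < 10`.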

3.2. Self-Organizing Map

Traditional multivariate methods, such as K-means clustering and principal component analysis, are unable to directly identify the intrinsic structure of high-dimensional hydrochemical data. They tend to overlook the spatial relationships between clusters and lack the ability to capture the hierarchical structure of data, making it difficult to identify the dominant factors influencing hydrochemical characteristics. The SOM algorithm, a machine learning algorithm, provides a robust solution to these challenges [20].
The SOM algorithm, developed by Kohonen [21], is a clustering technique that facilitates the analysis of complexity and variability in hydrochemical data. A key advantage of the SOM algorithm is its ability to project and visualize high-dimensional data in a low-dimensional space while retaining the topological structure of the original data [15]. Within the clustering matrix, a smaller topological distance between neurons corresponds to a higher degree of similarity between them. If groundwater samples fall into the same neuron, they share analogous hydrochemical characteristics. Differences and correlations in hydrogeochemical properties among samples can be visualized via the weight values of variables and the distribution patterns of samples within the SOM matrix [16,18].
The learning process of the SOM algorithm comprises two core stages: self-organizing training and feature map generation. The computational workflow of the SOM can be divided into three specific steps: (1) the number of neurons is computed using the formula $5\sqrt{n}$, where n denotes the total number of groundwater samples [17]; (2) during the partitioning of the neural matrix, the minimum Quantization Error (QE) and Topographic Error (TE) are used to identify the optimal number of neurons [20]; and (3) the minimum Davies–Bouldin Index (DBI) is applied to determine the optimal number of clusters [15].
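To make steps (1) and (2) concrete, the sketch below evaluates the sizing heuristic and a quantization error in Python. For n = 91 the heuristic gives about 47.7 neurons, which is close to the 49 neurons of a 7 × 7 map. Rounding to a square grid and the helper names are assumptions made for illustration, and the topographic error is not computed here.

```python
import math

import numpy as np


def heuristic_neuron_count(n_samples):
    """Sizing heuristic quoted in the text: 5 * sqrt(n)."""
    return 5.0 * math.sqrt(n_samples)


def nearest_square_grid(n_neurons):
    """Round the neuron count to a square grid (an illustrative assumption)."""
    side = round(math.sqrt(n_neurons))
    return side, side


def quantization_error(data, codebook):
    """Mean distance between each sample and its best-matching neuron.

    data: array of shape (n_samples, n_features)
    codebook: array of shape (n_neurons, n_features)
    """
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return float(distances.min(axis=1).mean())
```

Candidate grid sizes could then be compared by training a map for each and keeping the one with the lowest error values.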
In the present study, a total of 91 groundwater samples were analyzed by the SOM approach. Ten hydrogeochemical parameters were incorporated into the SOM computations, including K+, Ca2+, Na+, Mg2+, HCO3, SO42−, Cl, NO3, pH, and TDS. The SOM algorithm was implemented using MATLAB software R2022a [21].
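For readers who want to reproduce the general idea outside MATLAB, a minimal NumPy sketch of a sequential SOM is given below. It is not the implementation used in this study: the z-score standardization, the linear decay of the learning rate and neighbourhood radius, and all parameter defaults are assumptions chosen for illustration.

```python
import numpy as np


def zscore(data):
    """Standardize each column so variables with different units are comparable."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def train_som(data, rows=7, cols=7, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM with online updates; returns the codebook."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Grid coordinates of every neuron, used for the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize weights from randomly chosen samples.
    weights = data[rng.integers(0, n_samples, rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks to 0.5
        x = data[rng.integers(0, n_samples)]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    return weights


def best_matching_units(data, weights):
    """Index of the closest neuron for every sample."""
    distances = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return distances.argmin(axis=1)
```

With a 91 × 10 matrix of the ten parameters listed above, `train_som(zscore(matrix))` would return a 49 × 10 codebook, and samples sharing a best-matching unit would be interpreted as hydrochemically similar.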

3.3. Human Health Risk Assessment

Human Health Risk Assessment (HHRA) serves as a methodological framework for establishing the causal nexus between groundwater contamination and adverse human health outcomes, thereby facilitating the quantitative characterization of potential exposure risks [15]. This assessment framework fully accounts for populations across different age groups and genders, encompassing children, adult females, and adult males. Among its key analytical steps, the hazard quotients for oral exposure (HQOral) and dermal exposure (HQDermal) can be derived using the following equations [12]:
$$HQ_{Oral} = \frac{I_{Oral}}{RfD_{Oral}}, \quad I_{Oral} = \frac{c_i \times IR \times EF \times ED}{BW \times AT}$$
$$HQ_{Dermal} = \frac{I_{Dermal}}{RfD_{Oral} \times ABS_{gi}}, \quad I_{Dermal} = \frac{c_i \times K_{sp} \times S_a \times T \times EV \times CF \times EF \times ED}{BW \times AT}$$
The empirical parameters involved in the above equations are presented in Table 1 [12]. For groundwater-related health risk assessment, the total non-carcinogenic health risk associated with groundwater utilization may be calculated using the following equation [12]:
$$HI_{Total} = \sum_{i=1}^{n} HI_i, \quad HI_i = HQ_{Oral} + HQ_{Dermal}$$
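The hazard calculations above can be sketched in Python as follows. The sketch handles a single contaminant (nitrate) and treats RfD_Oral × ABS_gi as the dermal reference dose, which is an interpretation of the dermal equation. The values in `PLACEHOLDER_PARAMS` are round numbers chosen only to make the arithmetic easy to follow; they are not the Table 1 parameters, and the function names are hypothetical.

```python
def oral_intake(c, ir, ef, ed, bw, at):
    """I_Oral = (c * IR * EF * ED) / (BW * AT)."""
    return (c * ir * ef * ed) / (bw * at)


def dermal_intake(c, k_sp, s_a, t, ev, cf, ef, ed, bw, at):
    """I_Dermal = (c * K_sp * S_a * T * EV * CF * EF * ED) / (BW * AT)."""
    return (c * k_sp * s_a * t * ev * cf * ef * ed) / (bw * at)


def hazard_index(c, p):
    """HI = HQ_Oral + HQ_Dermal for one contaminant at concentration c."""
    hq_oral = oral_intake(c, p["IR"], p["EF"], p["ED"], p["BW"], p["AT"]) / p["RfD_oral"]
    i_dermal = dermal_intake(
        c, p["K_sp"], p["S_a"], p["T"], p["EV"], p["CF"],
        p["EF"], p["ED"], p["BW"], p["AT"],
    )
    hq_dermal = i_dermal / (p["RfD_oral"] * p["ABS_gi"])
    return hq_oral + hq_dermal


# Placeholder values for illustration only (NOT the study's Table 1).
PLACEHOLDER_PARAMS = {
    "IR": 2.0, "EF": 365.0, "ED": 10.0, "BW": 50.0, "AT": 3650.0,
    "RfD_oral": 1.6, "ABS_gi": 0.5,
    "K_sp": 0.001, "S_a": 10000.0, "T": 0.5, "EV": 1.0, "CF": 0.001,
}
```

With these placeholders and c = 10, the oral intake is 0.4, HQ_Oral is 0.25, HQ_Dermal is 0.00125, and the hazard index is 0.25125, which shows how strongly the oral pathway dominates under such settings.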

4. Results

4.1. SOM Clustering Results

The ten different variables of 91 groundwater samples are selected to conduct the SOM analysis, including K+, Ca2+, Na+, Mg2+, HCO3, SO42−, Cl, NO3, pH, and TDS. An SOM with 7 × 7 topology is developed for sample classification. As presented in Figure 2, yellow and blue colors correspond to the high and low values of neuron parameters, respectively.
Color gradient variations across the ten variables offer an effective approach to analyzing hydrochemical processes. By comparing color gradients of different hydrochemical indicators, four distinct relationships are readily identifiable. Firstly, Na+ and Cl exhibit analogous color gradients, decreasing from the upper-right corner to the lower-left corner (Figure 2a), suggesting that they share a common source [15]. During water–rock interactions, halite dissolution serves as their primary source. Secondly, Na+ and Mg2+ both show a decreasing trend from the upper-right to the lower-left corner of the map (Figure 2a), indicating that they are regulated by the same geochemical process. This process is mainly driven by the weathering and hydrolysis of silicate minerals (e.g., albite) [17]. It may also be associated with the retention of Mg2+ released via dolomite dissolution and the subsequent combination of this retained Mg2+ with exogenous Na+ [17]. Thirdly, the color gradients of Na+ vs. SO42− and Ca2+ vs. HCO3 display obvious differences, suggesting the potential occurrence of cation exchange or carbonate precipitation. Fourthly, when Ca2+ and SO42− share similar color gradients, this indicates the occurrence of gypsum dissolution or precipitation. This further implies an additional source for these two ions.
As presented in Table 2, the calculated Davies–Bouldin Index (DBI) values indicate that all groundwater samples collected from the study area are effectively classified into three distinct clusters, with significant differences in physicochemical parameters between clusters. Specifically, six samples (6.59% of the total) are classified into Cluster 1, which is characterized by high Ca2+ concentrations. This cluster also exhibits the highest TDS among the three clusters, and its NO3 concentration exceeds the permissible limits. Thus, Cluster 1 is categorized as “high NO3 and Ca2+ concentrations with the highest TDS”. Cluster 2 comprises 21 samples (23.08% of the total samples) and is defined by “high Na+, Mg2+, and HCO3 concentrations”. The largest group, Cluster 3, is characterized by low Ca2+, low HCO3, and low TDS, including 64 samples (70.33% of the total samples). It also has the lowest NO3 concentration among the three clusters, leading to its classification as “low concentrations of Ca2+, HCO3, and NO3 with low TDS”.
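For reference, the Davies–Bouldin Index used to select the number of clusters can be computed as in the NumPy sketch below. It follows the standard definition (average, over clusters, of the worst ratio of within-cluster scatter to between-centroid distance); whether the study applied it to the SOM neuron weights or to the samples directly is not stated, so the input here is simply a generic point matrix with labels.

```python
import numpy as np


def davies_bouldin_index(points, labels):
    """Standard Davies-Bouldin Index; lower values mean better-separated clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # Mean distance of each cluster's members to their centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == k] - centroids[i], axis=1).mean()
        for i, k in enumerate(ids)
    ])
    worst_ratios = []
    for i in range(len(ids)):
        worst = max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        )
        worst_ratios.append(worst)
    return float(np.mean(worst_ratios))
```

Evaluating this function for partitions with different cluster counts and keeping the minimum mirrors the selection rule described in Section 3.2.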

4.2. Hydrochemical Results

The statistical summary of groundwater physicochemical parameters is provided in Table 3, which serves as a key indicator of the overall hydrochemical characteristics. To offer a comprehensive depiction of physicochemical characteristics across the study area, Table 3 and Figures 3 and 4 present an integrated analysis combining statistical and visual representations. Cluster 1 exhibits a narrow pH range from 7.0 to 7.3, averaging 7.2. The mean value of TDS is 719 mg/L, ranging from 527 mg/L to 1172 mg/L. Notably, this cluster shows the highest NO3 concentration among all clusters, with concentrations varying from 9.7 mg/L to 68.0 mg/L and a mean value of 34.8 mg/L. The hydrochemical types are predominantly HCO3-Ca and HCO3-Mg. In Cluster 2, pH values range from 7.1 to 7.7, with an average of 7.3. TDS varies between 457 mg/L and 1128 mg/L, with an average value of 609 mg/L. As in Cluster 1, the hydrochemical types in this cluster are HCO3-Ca and HCO3-Mg. Cluster 3 is distinguished by pH values between 7.2 and 8.1, averaging 7.5. It has the lowest TDS levels among all clusters, ranging from 219 mg/L to 487 mg/L with a mean of 351 mg/L. The dominant hydrochemical type is HCO3-Ca. Correspondingly, this cluster also exhibits the lowest NO3 concentrations, ranging from below the detection limit to 20.0 mg/L, with an average of 6.2 mg/L.
Figure 4 displays the Pearson correlation coefficients among various physicochemical parameters measured in groundwater samples. Conducting a quantitative analysis of inter-ionic correlations aids in deciphering key hydrogeochemical processes, tracing ion origins, and understanding solute transport mechanisms within groundwater systems. When two ions show a significant positive correlation in concentration, it often implies a shared source or co-migration behavior. Conversely, a negative correlation typically reflects competitive interactions or inverse geochemical pathways.
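A correlation matrix of this kind can be produced with a few lines of Python. The sketch below is illustrative only; the `pearson_matrix` helper and the input layout (a mapping from parameter name to a list of concentrations) are assumptions, and no values from the study are used.

```python
import numpy as np


def pearson_matrix(table):
    """Return parameter names and their Pearson correlation matrix.

    table: mapping such as {"Na": [...], "Cl": [...]} with equal-length columns.
    """
    names = list(table)
    data = np.column_stack([np.asarray(table[name], dtype=float) for name in names])
    # rowvar=False treats each column as one variable.
    return names, np.corrcoef(data, rowvar=False)
```

The entry in row i and column j of the returned matrix is the coefficient r between the i-th and j-th parameters, with values near +1 or −1 indicating strong positive or negative linear association.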

5. Discussion

5.1. Hydrochemical Driving Factors

Ionic ratios are widely employed as an effective method for identifying hydrochemical processes [27,28]. To distinguish the dominant hydrochemical process controlling hydrochemical characteristics, the Gibbs diagram is used to evaluate the effect of evaporation, water–rock interaction, and atmospheric precipitation on hydrochemical characteristics. As shown in Figure 5, the Na+/(Na+ + Ca2+) ratios of groundwater samples in Cluster 1 range from 0.10 to 0.36, while their Cl/(Cl + HCO3) ratios are 0.12~0.39. For Cluster 2 and Cluster 3, the Na+/(Na+ + Ca2+) ratios are 0.10~0.65 and 0.06~0.79, respectively, and the Cl/(Cl + HCO3) ratios are 0.07~0.59 and 0.03~0.41, respectively. All groundwater samples are predominantly clustered in the central region of the Gibbs diagram, which indicates that water–rock interaction is the primary process controlling the hydrochemical characteristics of groundwater.
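The two Gibbs ratios quoted above can be computed per sample as in the short sketch below. It assumes concentrations in mg/L, as is common practice for Gibbs diagrams; the text does not state the units used, and the function name is hypothetical.

```python
def gibbs_ratios(na, ca, cl, hco3):
    """Return (Na/(Na + Ca), Cl/(Cl + HCO3)) for one sample (mg/L assumed)."""
    cation_ratio = na / (na + ca)
    anion_ratio = cl / (cl + hco3)
    return cation_ratio, anion_ratio
```

Each ratio is then plotted against TDS on a logarithmic axis, and samples falling in the mid-TDS band with ratios below about 0.5 are conventionally read as controlled by water–rock interaction.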
To further elucidate the processes of water–rock interaction, end-member plotting is used to identify the specific rock types involved in the hydrochemical processes [29,30], namely evaporites, silicate rocks, and carbonate rocks. The diagnostic binary diagrams (Mg2+/Na+ vs. Ca2+/Na+ and HCO3/Na+ vs. Ca2+/Na+) are analyzed to identify the dominant rock types controlling groundwater hydrochemical characteristics. As presented in Figure 6, all samples in the three clusters are distributed within the transitional zone between silicate rocks and carbonate rocks. Specifically, the Cluster 2 samples are closer to the silicate rock-dominated area, while the Cluster 3 samples are nearer to the carbonate rock-dominated area. This distribution pattern indicates that silicate weathering and carbonate dissolution constitute the primary sources of ionic constituents in the groundwater.
The ionic ratios and Pearson’s correlation analysis are used to identify the hydrochemical processes [31,32]. As shown in Figure 7a, most samples are distributed close to the 1:1 line for Na+/Cl, and a moderate positive correlation is observed between Na+ and Cl (r = 0.51), implying a common origin dominated by halite dissolution. Between Ca2+ and SO42−, a moderate correlation (r = 0.38) is identified. With most samples exhibiting Ca2+/SO42− ratios above 2 (Figure 7b), it can be inferred that carbonate weathering contributes supplementary Ca2+ to the groundwater. The combined relationship of Ca2+ + Mg2+ versus HCO3 + SO42− is used to assess the influence of dolomite, calcite, and gypsum dissolution. In Figure 7c, the majority of samples plot near the 1:1 line, indicating that the dissolution of these minerals predominantly regulates Ca2+ and Mg2+ levels. Further examination of HCO3 against Ca2+ + Mg2+ (Figure 7d) indicates that most data points plot below the 1:2 equiline, suggesting the release of HCO3, Ca2+, and Mg2+ through carbonate weathering, while gypsum dissolution may contribute to excess Ca2+. Additionally, HCO3 correlates moderately with Ca2+ (r = 0.49) and strongly with Mg2+ (r = 0.86), supporting that carbonate minerals (e.g., calcite and dolomite) serve as the principal sources of HCO3, Ca2+, and Mg2+ in the groundwater system.
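Ratios such as those discussed for Figure 7 are usually evaluated on an equivalent basis. The sketch below assumes input concentrations in mg/L and converts them to meq/L with standard equivalent weights; the choice of meq/L and the function names are assumptions, since the text does not state the units behind the plotted ratios.

```python
# Standard equivalent weights in g/eq (molar mass divided by ionic charge).
EQ_WEIGHT = {
    "Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2,
    "Cl": 35.45, "SO4": 96.06 / 2, "HCO3": 61.02,
}


def meq(ion, mg_l):
    """Convert one concentration from mg/L to meq/L."""
    return mg_l / EQ_WEIGHT[ion]


def diagnostic_ratios(sample_mg_l):
    """Return equivalent ratios for a {ion: mg/L} sample."""
    e = {ion: meq(ion, conc) for ion, conc in sample_mg_l.items()}
    return {
        "Na/Cl": e["Na"] / e["Cl"],
        "Ca/SO4": e["Ca"] / e["SO4"],
        "(Ca+Mg)/(HCO3+SO4)": (e["Ca"] + e["Mg"]) / (e["HCO3"] + e["SO4"]),
    }
```

A value of "(Ca+Mg)/(HCO3+SO4)" close to 1 corresponds to a sample plotting on the 1:1 line of a diagram like Figure 7c.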

5.2. Identifying Nitrate Sources

In natural environments, NO3 concentrations typically remain below 10 mg/L, a threshold widely recognized as an indicator of anthropogenic pollution [33,34]. In the study area, the NO3 concentration of collected groundwater samples ranged from 0.02 mg/L to 68.0 mg/L. Specifically, 83.3% of samples in Cluster 1 (five of six), 52.4% in Cluster 2, and 17.2% in Cluster 3 exceeded this 10 mg/L benchmark. This widespread exceedance indicates that groundwater quality across the study area is significantly impacted by anthropogenic activities. Accordingly, a source apportionment analysis is conducted for the high-nitrate groundwater samples to elucidate the mechanisms underlying nitrate enrichment in groundwater.
Based on the molar ratios of Cl/Na+ vs. NO3/Na+ and Cl vs. NO3/Cl, the sources of NO3 can be effectively identified by comparing these ratios with the ranges of known end-members [35,36], such as those from agricultural activities, industrial production, and domestic sewage. As shown in Figure 8a, most samples in Cluster 1, Cluster 2, and Cluster 3 are characterized by high NO3/Na+ ratios and low Cl/Na+ ratios. This is attributed to the application of nitrogen fertilizers in agricultural activities within the study area, which introduces a large amount of NO3. Thus, this proportional characteristic indicates that agricultural activities are the main source of NO3. In Figure 8a, a few samples from the three clusters are distributed between the agricultural and sewage end-members. This placement is due to the concurrent increase in NO3 and Cl from sewage, as Cl primarily originates from human waste and household detergents. Therefore, the NO3 in the study area likely comes from a mixture of agricultural and domestic sewage sources.
As shown in Figure 8b, the agricultural activity end-member exhibits the characteristics of high NO3/Cl ratios and low Cl concentrations, which is due to the increase in NO3 caused by fertilization and the relatively low content of Cl. The domestic sewage and fecal source end-member shows the characteristics of low NO3/Cl ratios and high Cl concentrations, as domestic sewage and feces contain a large amount of Cl but relatively little NO3. The soil nitrogen source end-member presents the characteristics of medium NO3/Cl ratios and medium Cl concentrations: the mineralization of soil organic nitrogen produces NO3, while Cl is mainly from the background, so the ratio falls between the agricultural end-member and the sewage end-member. The rainfall end-member exhibits the characteristics of low Cl concentrations and medium NO3/Cl ratios; however, no samples from the study area are distributed in this region, indicating that rainfall is not the main source of NO3 in the study area. As shown in Figure 8b, most samples are distributed in the area between the agricultural activity end-member and the sewage end-member, indicating that NO3 pollution in the groundwater of the study area is the result of multi-source mixing. In conclusion, high NO3 pollution in the groundwater of the study area is mainly dominated by agricultural activities and is also affected by the superimposition of domestic sewage.
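The molar ratios used in Figure 8 can be derived from mg/L concentrations as in the following sketch. The molar masses are standard values; the function name and output keys are hypothetical, and the end-member ranges themselves are not encoded because they are not given in the text.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"NO3": 62.00, "Cl": 35.45, "Na": 22.99}


def nitrate_source_ratios(no3_mg_l, cl_mg_l, na_mg_l):
    """Return the molar ratios and the Cl concentration (mmol/L) for one sample."""
    no3 = no3_mg_l / MOLAR_MASS["NO3"]
    cl = cl_mg_l / MOLAR_MASS["Cl"]
    na = na_mg_l / MOLAR_MASS["Na"]
    return {
        "NO3/Na": no3 / na,
        "Cl/Na": cl / na,
        "NO3/Cl": no3 / cl,
        "Cl_mmol_per_L": cl,
    }
```

A sample with a high "NO3/Cl" value and low "Cl_mmol_per_L" would plot toward the agricultural end-member, while a low "NO3/Cl" value with high Cl would plot toward the sewage end-member.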

5.3. HHRA Results

Groundwater resources in the study area serve as an important water source for human livelihood and agricultural production. Quantitative assessment of the potential health risks associated with groundwater is a prerequisite for its development and utilization. HHRA is a widely used evaluation method that establishes a non-linear relationship between human health conditions and potential health risks posed by major contaminants. This study quantitatively estimates the potential health risks of groundwater with high-nitrate concentrations in the study area.
The total hazard index (HITotal) for children, adult females, and adult males is shown in Figure 9. Generally, health risks based on the hazard quotient (HQ) values are classified into three categories as follows: negligible health risk when HQ < 1, moderate health risk when HQ ranges between 1 and 4, and high health risk when HQ > 4. For children, HITotal values range from 0.21 to 79.63, with a mean value of 11.17. In the children group, 72% of samples are categorized as high health risk, 21% as moderate health risk, and 8% as negligible health risk. For adult females, HITotal values vary from 0.12 to 46.55, with a mean of 6.53. In this group, 62% of samples indicated high health risk, 28% moderate health risk, and 10% negligible health risk. For adult males, HITotal values range from 0.09 to 34.17, with an average of 4.79. Among the adult males, 12%, 38%, and 50% of samples are classified as negligible, moderate, and high health risk, respectively. The HHRA results indicate that the non-carcinogenic health risks posed by high-nitrate groundwater decrease in the order of children > adult females > adult males. Consistent with previous studies, children exhibited greater susceptibility than adults, which can be attributed to their lower body weight and immature physiological systems [11].
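The three-level classification described above can be expressed as a small helper. This is a minimal sketch: the treatment of the boundary values (exactly 1 or exactly 4 counted as moderate) is an assumption, and the function names are hypothetical.

```python
def classify_risk(value):
    """Map a hazard value to the three categories used in the text."""
    if value < 1:
        return "negligible"
    if value <= 4:
        return "moderate"
    return "high"


def risk_shares(values):
    """Percentage of samples in each risk category."""
    counts = {"negligible": 0, "moderate": 0, "high": 0}
    for value in values:
        counts[classify_risk(value)] += 1
    return {name: 100.0 * count / len(values) for name, count in counts.items()}
```

Applying `risk_shares` to the per-sample HITotal values of each population group would yield percentages of the kind reported for children, adult females, and adult males.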

6. Conclusions

The primary objective of this research is to investigate the hydrochemical characteristics and genetic mechanisms of high-nitrate groundwater in the shallow aquifer of the Baoding area, North China Plain. Based on the analytical results from 91 groundwater samples, the study combines hydrochemical analysis, the SOM algorithm, and Human Health Risk Assessment to characterize the groundwater and to quantify the associated human health risks. The key conclusions are as follows:
(1)
The SOM analysis effectively categorized 91 samples into three distinct clusters with markedly different physicochemical characteristics. Cluster 1 (6.59% of samples) is characterized by the highest concentrations of NO3 (exceeding permissible limits), Ca2+, and TDS, suggesting a high degree of mineralization and nitrate pollution. In contrast, Cluster 3 (70.33% of samples) represents the largest group and exhibits the lowest levels of NO3, Ca2+, HCO3, and TDS. Cluster 2 (23.08% of samples) presents an intermediate signature with elevated Na+, Mg2+, and HCO3. The statistical data and supporting figures collectively affirm these hydrochemical characteristics, which are predominantly HCO3-Ca and HCO3-Mg types for Clusters 1 and 2, and HCO3-Ca for Cluster 3.
(2)
The NO3 concentrations in the 91 shallow groundwater samples range from below the detection limit to 68.0 mg/L, with a mean value of 9.59 mg/L. Notably, 30.43% of the samples surpassed the anthropogenic pollution threshold of 10 mg/L, indicating significant anthropogenic influence. Molar ratio analyses further reveal that agricultural activities are the primary source of NO3, supplemented by contributions from domestic sewage.
(3)
Utilizing high-nitrate groundwater for long-term consumption presents a potential health hazard. The assessment reveals a discernible pattern in non-carcinogenic risk: children are at the highest risk, followed by adult females and then adult males. The elevated risk for children is directly linked to their lower body weight and heightened sensitivity of their developing metabolic processes.

Author Contributions

Methodology, J.G.; Software, J.G. and J.Z. (Jianqing Zhao); Validation, J.G. and Y.Y.; Formal Analysis, J.Z. (Jianqing Zhao) and Y.Y.; Investigation, J.Z. (Jun Zheng), Z.W., and S.L.; Data Curation, Y.Y. and Z.W.; Writing—Original Draft, J.G.; Writing—Review and Editing, S.Z. and J.Z. (Jianqing Zhao); Project Administration, S.Z.; Funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Deep Earth Probe and Mineral Resources Exploration National Science and Technology Major Project (2024ZD1004103), the Chinese Academy of Geological Sciences Basal Research Fund (No. JKY202406 and No. JKYZD202401), and the Open Research Fund Project of Hebei Key Laboratory of Geological Resources and Environment Monitoring and Protection (JCYKT202404).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. (The data are not publicly available due to privacy restrictions.)

Acknowledgments

We are grateful to the editorial team for their guidance throughout the review process. We deeply appreciate the anonymous reviewers for their insightful comments and valuable feedback, which significantly improved the quality of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Machiwal, D.; Jha, M.K.; Singh, V.P.; Mohan, C. Assessment and mapping of groundwater vulnerability to pollution: Current status and challenges. Earth-Sci. Rev. 2018, 185, 901–927.
  2. Chen, L.; Huang, F.; Zhang, C.; Zhang, J.; Liu, F.; Guan, X. Effects of norfloxacin on nitrate reduction and dynamic denitrifying enzymes activities in groundwater. Environ. Pollut. 2021, 273, 116492.
  3. Dong, Y.; Zhang, X.; Yi, L. Hypoxia exerts greater impacts on shallow groundwater nitrogen cycling than seawater mixture in coastal zone. Environ. Sci. Pollut. Res. 2024, 31, 43812–43821.
  4. Koh, E.-H.; Lee, K.-K.; Moon, S.; Koh, H.-J. Site-specific management of nitrate contamination based on groundwater flow system characterization in two agricultural areas of Jeju volcanic island, South Korea. J. Environ. Manag. 2025, 391, 126359.
  5. Di Lorenzo, T.; Fiasca, B.; Di Camillo Tabilio, A.; Murolo, A.; Di Cicco, M.; Galassi, D.M.P. The weighted Groundwater Health Index (wGHI) by Korbel and Hose (2017) in European groundwater bodies in nitrate vulnerable zones. Ecol. Indic. 2020, 116, 106525.
  6. Jia, H.; Qian, H. Groundwater nitrate response to hydrogeological conditions and socioeconomic load in an agriculture dominated area. Sci. Rep. 2025, 15, 1315.
  7. Páez-Osuna, F.; Álvarez-Borrego, S.; Ruiz-Fernández, A.C.; García-Hernández, J.; Jara-Marini, M.E.; Bergés-Tiznado, M.E.; Piñón-Gimate, A.; Alonso-Rodríguez, R.; Soto-Jiménez, M.F.; Frías-Espericueta, M.G.; et al. Environmental status of the Gulf of California: A pollution review. Earth-Sci. Rev. 2017, 166, 181–205.
  8. Jin, L.; Ye, H.; Shi, Y.; Li, L.; Liu, R.; Cai, Y.; Li, J.; Li, F.; Jin, Z. Using PCA-APCS-MLR model and SIAR model combined with multiple isotopes to quantify the nitrate sources in groundwater of Zhuji, East China. Appl. Geochem. 2022, 143, 105354.
  9. Khebudkar, A.; Sohoni, M. Estimation of Nitrate Concentration in Groundwater Source Using Zonal Nitrate Balance Method in Male Village of Western Maharashtra. In Geoenvironmental Engineering; Agnihotri, A.K., Reddy, K.R., Bansal, A., Eds.; Springer Nature: Singapore, 2024; pp. 281–298.
  10. Hossain, M. A simple and effective approach to investigate the dominant contaminant sources and accuracy in water quality estimation through Monte Carlo simulation, Gaussian Mixture Models (GMMs), and GIS machine learning methods. Ecol. Indic. 2025, 171, 113188.
  11. Zhang, S.; Liu, K.; Wang, L.; Zhu, W.; Deng, Y.; Yu, C. Identifying the Hydrochemical Characteristics and Genetic Mechanism of Medium-Low Temperature Fluoride-Enriched Geothermal Groundwater in the Hongjiang—Qianshan Fault of Jiangxi Province. Rock Miner. Anal. 2024, 43, 568–581. (In Chinese)
  12. Xiao, Y.; Liu, K.; Zhang, Y.; Yang, H.; Wang, S.; Qi, Z.; Hao, Q.; Wang, L.; Luo, Y.; Yin, S. Numerical investigation of groundwater flow systems and their evolution due to climate change in the arid Golmud river watershed on the Tibetan Plateau. Front. Earth Sci. 2022, 10, 943075.
  13. Hou, J.; Cao, M.; Liu, P. Development and utilization of geothermal energy in China: Current practices and future strategies. Renew. Energy 2018, 125, 401–412.
  14. Wang, L.Y.; Liu, K.; Wan, L.; Zhang, Y.Y.; Zhang, S.C.; Jia, W.H.; Yue, X.R.; Bu, G.Y. Hydrochemical characteristics and genetic mechanisms of mid-low temperature geothermal fluids in the eastern segment of the Xinquan—Wentang Fault Zone, Jiangxi Province. Acta Geosci. Sin. 2025; in press. (In Chinese)
  15. Qu, S.; Shi, Z.; Liang, X.; Wang, G.; Han, J. Multiple factors control groundwater chemistry and quality of multi-layer groundwater system in Northwest China coalfield—Using self-organizing maps (SOM). J. Geochem. Explor. 2021, 227, 106795.
  16. Wang, C.; Liao, F.; Wang, G.; Qu, S.; Mao, H.; Bai, Y. Hydrogeochemical evolution induced by long-term mining activities in a multi-aquifer system in the mining area. Sci. Total Environ. 2023, 854, 158806.
  17. Lin, C.K.; Du, R.H.; Guo, F. Implication of self-organizing map, stable isotopes combined with MixSIAR model for accurate nitrogen control in a well-protected reservoir. Environ. Res. 2024, 248, 118335.
  18. Qu, S.; Liang, X.; Liao, F.; Mao, H.; Xiao, B.; Duan, L.; Shi, Z.; Wang, G.; Yu, R. Geochemical fingerprint and spatial pattern of mine water quality in the Shaanxi-Inner Mongolia Coal Mine Base, Northwest China. Sci. Total Environ. 2023, 854, 158812.
  19. Dong, Y.; Zhu, S.; Li, J.; Liu, W.; Li, Z.; Sun, Z.; Liu, C. Hydrochemical characteristics and source identification of nitrate in surface water and shallow groundwater in the Poyang Lake Basin, China. Environ. Earth Sci. 2025, 84, 271.
  20. Mao, H.; Wang, G.; Rao, Z.; Liao, F.; Shi, Z.; Huang, X.; Chen, X.; Yang, Y. Deciphering spatial pattern of groundwater chemistry and nitrogen pollution in Poyang Lake Basin (eastern China) using self-organizing map and multivariate statistics. J. Clean. Prod. 2021, 329, 129697.
  21. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59–69.
  22. Chinese Ministry of Health. Control Criteria for Endemic Fluorosis Areas; National Disease Control and Prevention Administration: Beijing, China, 2011.
  23. USEPA. Risk Assessment Guidance for Superfund, Volume 1, Human Health Evaluation Manual. Part E (Supplemental Guidance for Dermal Risk Assessment); EPA/540/R/99/005; Office of Superfund Remediation and Technology Innovation: Washington, DC, USA, 2004; Volume 1.
  24. USEPA. Drinking Water Standards and Health Advisories; Office of Water, U.S. Environmental Protection Agency: Washington, DC, USA, 2012.
  25. USGS. National Water Summary 1990–1991 Stream Water Quality; USGS Water Supply; United States Geological Survey (USGS): Reston, VA, USA, 1993; p. 59.
  26. WHO. WHO Guidelines for Drinking-Water Quality, 4th ed.; World Health Organization: Geneva, Switzerland, 2011.
  27. Deng, Y.; Ye, X.; Du, X. Predictive modeling and analysis of key drivers of groundwater nitrate pollution based on machine learning. J. Hydrol. 2023, 624, 129934.
  28. Sridhar, C.N.; Subramani, T.; Kumar, G.R.S.; Soundaranayaki, K. Nitrate pollution index and age wise health risk appraisal for the Pambar River basin in south India. Environ. Geochem. Health 2025, 47, 198.
  29. Fenton, O.; Richards, K.G.; Kirwan, L.; Khalil, M.I.; Healy, M.G. Factors affecting nitrate distribution in shallow groundwater under a beef farm in South Eastern Ireland. J. Environ. Manag. 2009, 90, 3135–3146.
  30. Shi, J.; Yang, Y.; Shi, M.; Lu, Y.; Chen, W. Hydrogeochemical and machine learning evidences for release and attenuation mechanisms of chromium contamination in a partially PRB remediation of shallow groundwater. Environ. Pollut. 2025, 385, 127109.
  31. Ding, H.; Gao, H.; Zhu, M.; Yu, M.; Sun, Y.; Zheng, M.; Su, J.; Xi, B. Spectral and molecular insights into the characteristics of dissolved organic matter in nitrate-contaminated groundwater. Environ. Pollut. 2024, 355, 124202.
  32. Wang, Y.Y.; Cao, W.G.; Pan, D.; Wang, S.; Ren, Y.; Li, Z.Y. Distribution and origin of high arsenic and fluoride in groundwater of the North Henan Plain. Rock Miner. Anal. 2022, 41, 1095–1109. (In Chinese)
  33. Kaandorp, V.P.; Broers, H.P.; van der Velde, Y.; Rozemeijer, J.; de Louw, P.G.B. Time lags of nitrate, chloride, and tritium in streams assessed by dynamic groundwater flow tracking in a lowland landscape. Hydrol. Earth Syst. Sci. 2021, 25, 3691–3711.
  34. Nakagawa, K.; Amano, H.; Shinkai, F.; Wakasa, A.; Berndtsson, R. Integrated approach to investigate groundwater nitrate nitrogen pollution and remediation simulation in Shimabara Peninsula, Nagasaki, Japan. Environ. Earth Sci. 2025, 84, 256.
  35. Chen, A.; Du, Y.; Wang, Z.; Sun, X.; Xu, R.; Xiong, Y.; Yang, L.; Liu, J.; Gan, Y. Source identification of nitrate in groundwater of an agro-pastoral ecotone in a semi-arid zone, northern China: Coupled evidences from MixSIAR model and DOM fluorescence. Appl. Geochem. 2024, 175, 106197.
  36. Nawaz, A.; Alfio, M.R.; Fiorese, G.D.; Balacco, G. Identification of potential causes of nitrate pollution in three Apulian aquifers (Southern Italy). Sustain. Water Resour. Manag. 2025, 11, 41.
Figure 1. The location of the study area and sampling sites.
Figure 2. (a) Component planes showing the weight vector values of the 10 input variables across all SOM neurons. Each neuron is color-coded by the magnitude of its weight, with blue denoting low parameter values and yellow denoting high values, so the pattern of each variable across the map can be read directly. (b) The clustering pattern of groundwater within the study area; red solid lines mark the cluster boundaries. Darker colors typically indicate greater dissimilarity between neighboring neurons and brighter colors indicate higher similarity, which helps in identifying natural cluster boundaries.
Figure 3. Piper diagram of collected samples.
Figure 4. Pearson correlation coefficient diagram of hydrochemical parameters. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
Figure 5. Gibbs diagram.
Figure 6. Bivariate plots of molar ratios: (a) Ca2+/Na+ vs. Mg2+/Na+ and (b) Ca2+/Na+ vs. HCO3−/Na+.
Figure 7. Ionic ratios: (a) Na+ vs. Cl−, (b) Ca2+ vs. SO42−, (c) Ca2+ + Mg2+ vs. HCO3− + SO42−, and (d) Ca2+ + Mg2+ vs. HCO3−.
Figure 8. Nitrate source identification by (a) Cl−/Na+ vs. NO3−/Na+ and (b) Cl− vs. NO3−/Cl−.
Figure 9. Box plot of HQ values for groundwater with high NO3− concentrations.
Table 1. The empirical parameters of Human Health Risk Assessment.

| Parameter | Unit | Children | Female | Male |
|---|---|---|---|---|
| Oral reference dose for NO3− (RfDoral) | mg/(kg × day) | 0.04 | 0.04 | 0.04 |
| Gastrointestinal absorption factor (ABSgi) | - | 1 | 1 | 1 |
| Drinking rate (IR) | L/day | 0.7 | 1.5 | 1.5 |
| Exposure frequency (EF) | days/year | 365 | 365 | 365 |
| Exposure duration (ED) | years | 6 | 30 | 30 |
| Average body weight (BW) | kg | 15 | 55 | 75 |
| Averaging time (AT) | days | 2190 | 10,950 | 10,950 |
| Skin permeability (Ksp) | cm/h | 0.001 | 0.001 | 0.001 |
| Contact duration (T) | h/d | 0.4 | 0.4 | 0.4 |
| Exposure frequency of daily dermal contact (EV) | - | 1 | 1 | 1 |
| Unit conversion factor (CF) | L/cm3 | 0.001 | 0.001 | 0.001 |
| Skin surface area (Sa) | - | 6597.01 | 15,475.85 | 18,742.36 |
| Average body height (H) | cm | 99.4 | 153.4 | 165.3 |

Note: The empirical parameters are taken from [22,23,24,25,26].
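The parameters in Table 1 feed the hazard quotient (HQ) calculation. A minimal Python sketch follows, assuming the standard USEPA chronic-daily-intake formulation for oral and dermal exposure; the equations actually used in the study are given in its methods section and may differ in detail. The `Receptor` dataclass, the function names, and the choice of 34.8 mg/L (the Cluster 1 average NO3− in Table 3) as the example concentration are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receptor:
    ir: float  # drinking rate, L/day
    ed: float  # exposure duration, years
    bw: float  # average body weight, kg
    at: float  # averaging time, days
    sa: float  # skin surface area (Table 1 value; cm^2 is an assumption)


# Group-specific values copied from Table 1.
RECEPTORS = {
    "children": Receptor(ir=0.7, ed=6, bw=15, at=2190, sa=6597.01),
    "female": Receptor(ir=1.5, ed=30, bw=55, at=10950, sa=15475.85),
    "male": Receptor(ir=1.5, ed=30, bw=75, at=10950, sa=18742.36),
}

# Values shared by all groups in Table 1.
RFD_ORAL = 0.04  # mg/(kg x day)
ABS_GI = 1.0     # gastrointestinal absorption factor
EF = 365         # exposure frequency, days/year
KSP = 0.001      # skin permeability, cm/h
T = 0.4          # contact duration, h/day
EV = 1           # daily dermal contact events
CF = 0.001       # unit conversion factor, L/cm^3


def hq_oral(conc_mg_l: float, r: Receptor) -> float:
    """Oral HQ: chronic daily intake divided by the oral reference dose."""
    cdi = conc_mg_l * r.ir * EF * r.ed / (r.bw * r.at)
    return cdi / RFD_ORAL


def hq_dermal(conc_mg_l: float, r: Receptor) -> float:
    """Dermal HQ: absorbed dose divided by (RfD_oral x ABS_gi)."""
    dose = conc_mg_l * KSP * r.sa * T * EV * EF * r.ed * CF / (r.bw * r.at)
    return dose / (RFD_ORAL * ABS_GI)


if __name__ == "__main__":
    conc = 34.8  # mg/L, Cluster 1 average NO3- from Table 3
    for group, receptor in RECEPTORS.items():
        total = hq_oral(conc, receptor) + hq_dermal(conc, receptor)
        print(f"{group}: HQ_total = {total:.2f}")
```

Because EF × ED equals AT for every group in Table 1, the oral term reduces to C × IR / (BW × RfDoral), so under this formulation the group with the highest intake per unit body weight (children) receives the highest HQ at any given concentration.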
Table 2. Davies–Bouldin indices (DBIs) for different numbers of clusters.

| Clusters | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| DBI | NaN * | 0.9898 | 0.8418 | 0.8946 | 0.9126 |

Note: * null value (the DBI is undefined for a single cluster).
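Lower DBI values indicate more compact and better separated clusters, so the minimum in Table 2 (0.8418 at three clusters) is consistent with the three-cluster solution reported in the abstract. The sketch below is a minimal pure-Python implementation of the standard Davies–Bouldin definition applied to toy two-dimensional points; the function name and the toy data are illustrative assumptions and are not the study's SOM output. It also shows why a single cluster gives a null value: there is no second cluster to compare against.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (sequences of floats) and cluster labels.

    Returns NaN when fewer than two clusters are present. Assumes that
    cluster centroids are distinct (otherwise a division by zero occurs).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        return math.nan

    centroids, scatter = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        centroids[label] = centroid
        # Mean distance of the members to their own centroid.
        scatter[label] = sum(math.dist(m, centroid) for m in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two well-separated squares of four points each.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
GOOD_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the visible structure
MIXED_LABELS = [0, 1, 0, 1, 0, 1, 0, 1]  # ignores the visible structure

# Choosing the cluster count with the lowest DBI from the Table 2 values.
TABLE2_DBI = {2: 0.9898, 3: 0.8418, 4: 0.8946, 5: 0.9126}
BEST_K = min(TABLE2_DBI, key=TABLE2_DBI.get)

if __name__ == "__main__":
    print(davies_bouldin(POINTS, GOOD_LABELS))
    print(davies_bouldin(POINTS, MIXED_LABELS))
    print(BEST_K)
```

In practice a library routine such as scikit-learn's `davies_bouldin_score` would normally be used; the hand-written version is shown only to make the definition explicit.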
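Table 3. Summary statistics of physicochemical parameters for the SOM clusters.

| Cluster | Statistic | pH | TDS (mg/L) | K+ (mg/L) | Ca2+ (mg/L) | Na+ (mg/L) | Mg2+ (mg/L) | HCO3− (mg/L) | SO42− (mg/L) | Cl− (mg/L) | NO3− (mg/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster 1 | Max | 7.3 | 1172 | 5.9 | 218.0 | 140.0 | 88.3 | 559.0 | 197.0 | 203.0 | 68.0 |
| Cluster 1 | Min | 7.0 | 527 | 0.6 | 116.0 | 14.7 | 30.7 | 337.0 | 61.4 | 27.5 | 9.7 |
| Cluster 1 | Ave | 7.2 | 719 | 3.2 | 149.0 | 52.5 | 58.3 | 412.3 | 116.5 | 98.9 | 34.8 |
| Cluster 1 | SD | 0.1 | 220 | 2.1 | 35.5 | 42.4 | 20.0 | 74.0 | 56.1 | 61.4 | 21.3 |
| Cluster 2 | Max | 7.7 | 1128 | 1.8 | 159.0 | 162.0 | 88.0 | 567.0 | 135.0 | 358.0 | 36.8 |
| Cluster 2 | Min | 7.1 | 457 | 0.2 | 40.7 | 13.9 | 47.6 | 349.0 | 15.2 | 19.2 | 2.1 |
| Cluster 2 | Ave | 7.3 | 609 | 0.7 | 95.9 | 60.0 | 65.5 | 469.2 | 62.5 | 77.7 | 12.6 |
| Cluster 2 | SD | 0.2 | 151 | 0.4 | 30.2 | 33.9 | 14.0 | 51.1 | 34.7 | 67.5 | 8.9 |
| Cluster 3 | Max | 8.1 | 487 | 6.6 | 132.0 | 99.2 | 52.5 | 475.0 | 103.0 | 94.8 | 20.0 |
| Cluster 3 | Min | 7.2 | 219 | 0.2 | 23.3 | 4.4 | 9.6 | 161.0 | 4.6 | 5.7 | 0.0 |
| Cluster 3 | Ave | 7.5 | 351 | 1.4 | 80.4 | 18.2 | 29.7 | 312.7 | 33.6 | 25.3 | 6.2 |
| Cluster 3 | SD | 0.2 | 64 | 1.0 | 19.5 | 12.7 | 9.7 | 62.0 | 24.0 | 15.2 | 4.8 |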
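The ionic-ratio diagnostics in Figures 6 and 8 work on molar rather than mass concentrations (Figure 6 states this explicitly; it is assumed here for Figure 8). As a rough illustration, the Python sketch below converts the cluster-average Na+, Cl−, and NO3− concentrations from Table 3 from mg/L to mmol/L and forms the ratios used on the Figure 8 axes. Ratios of cluster averages only approximate the per-sample ratios plotted in the figures, and all function and variable names are hypothetical.

```python
# Molar masses in g/mol (standard atomic weights; NO3 = N + 3 O).
MOLAR_MASS = {"Na": 22.990, "Cl": 35.453, "NO3": 62.004}

# Cluster-average concentrations in mg/L, copied from Table 3.
CLUSTER_AVG_MG_L = {
    "Cluster 1": {"Na": 52.5, "Cl": 98.9, "NO3": 34.8},
    "Cluster 2": {"Na": 60.0, "Cl": 77.7, "NO3": 12.6},
    "Cluster 3": {"Na": 18.2, "Cl": 25.3, "NO3": 6.2},
}


def to_mmol_per_l(mg_per_l: float, ion: str) -> float:
    """Convert a mass concentration (mg/L) to a molar one (mmol/L)."""
    return mg_per_l / MOLAR_MASS[ion]


def nitrate_ratios(conc_mg_l: dict) -> dict:
    """Molar ratios corresponding to the axes of Figure 8."""
    na = to_mmol_per_l(conc_mg_l["Na"], "Na")
    cl = to_mmol_per_l(conc_mg_l["Cl"], "Cl")
    no3 = to_mmol_per_l(conc_mg_l["NO3"], "NO3")
    return {"Cl/Na": cl / na, "NO3/Na": no3 / na, "NO3/Cl": no3 / cl}


RATIOS = {name: nitrate_ratios(c) for name, c in CLUSTER_AVG_MG_L.items()}

if __name__ == "__main__":
    for name, ratios in RATIOS.items():
        formatted = ", ".join(f"{k} = {v:.3f}" for k, v in ratios.items())
        print(f"{name}: {formatted}")
```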
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gao, J.; Zhao, J.; Yang, Y.; Zheng, J.; Wang, Z.; Liu, S.; Zhang, S. Application of the Self-Organizing Map (SOM) Algorithm to Identify Hydrochemical Characteristics and Genetic Mechanism of High-Nitrate Groundwater in Baoding Area, North China Plain. Water 2025, 17, 3517. https://doi.org/10.3390/w17243517
