Extraction of Major Groundwater Ions from Total Dissolved Solids and Mineralization Using Artificial Neural Networks: A Case Study of the Aflou Syncline Region, Algeria

Stamboul, Mohammed Elamin; Habib, Azzaz; Hamimed, Abderrahmane; Zakhrouf, Mousaab; Chung, Il-Moon; Kim, Sungwon

doi:10.3390/hydrology12050103


by Mohammed Elamin Stamboul ¹,², Azzaz Habib ³, Abderrahmane Hamimed ², Mousaab Zakhrouf ⁴, Il-Moon Chung ⁵ and Sungwon Kim ⁶,*

¹ Department of Natural and Life Sciences, Faculty of Science, University Center of Aflou El Cherif Bouchoucha, Aflou 03001, Algeria

² Biological Systems and Geomatics Laboratory, Faculty of Natural and Life Sciences, University Mustapha Stambouli of Mascara, Mascara 29000, Algeria

³ Laboratory of Water Science and Technology, Department of Hydraulics, Faculty of Science and Technology, University Mustapha Stambouli of Mascara, Mascara 29000, Algeria

⁴ Department of Hydraulics, Faculty of Technology, University of Tlemcen, Tlemcen 13000, Algeria

⁵ Department of Hydro Science and Engineering Research, Korea Institute of Civil Engineering and Building Technology, Goyang 10223, Republic of Korea

⁶ Department of Railroad Construction and Safety Engineering, Dongyang University, Yeongju 36040, Republic of Korea

* Author to whom correspondence should be addressed.

Hydrology 2025, 12(5), 103; https://doi.org/10.3390/hydrology12050103

Submission received: 19 March 2025 / Revised: 21 April 2025 / Accepted: 22 April 2025 / Published: 25 April 2025

(This article belongs to the Section Hydrological and Hydrodynamic Processes and Modelling)


Abstract

Growing global water demand, driven by population growth and agricultural development, has led to widespread overexploitation of groundwater, particularly in semi-arid regions. Conventional hydrochemical monitoring is still constrained by limited access to laboratories and high costs. This study aims to predict the major groundwater ions (Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻, Cl⁻, K⁺, HCO₃⁻, and NO₃⁻) from two field-measurable parameters, total dissolved solids (TDS) and mineralization (MIN), in the Aflou syncline region, Algeria. A multilayer perceptron (MLP) trained with Levenberg–Marquardt backpropagation (LMBP) gave the highest predictive accuracy for SO₄²⁻, Mg²⁺, Na⁺, Ca²⁺, and Cl⁻, with R² = (0.842, 0.980, 0.759, 0.945, 0.895), RMSE = (53.660, 12.840, 14.960, 36.460, 30.530) mg/L, and NSE = (0.840, 0.978, 0.754, 0.941, 0.892), respectively, in the testing phase. In contrast, predictive accuracy for the remaining ions, K⁺, HCO₃⁻, and NO₃⁻, was poor, with R² = (0.045, 0.366, 0.004), RMSE = (6.480, 41.720, 40.460) mg/L, and NSE = (0.003, 0.361, −0.933), respectively. The LMBP-MLP model was further validated at adjacent locations with similar geology (Aflou, Madna, and Ain Madhi), where it showed promising results, performing comparably to the original study area.

Keywords:

groundwater ions; artificial neural networks; groundwater overexploitation; semi-arid regions; hydrochemical monitoring

1. Introduction

Global water demand is expected to rise sharply by 2050 owing to population growth, economic growth, and changing consumption patterns. Estimates suggest that as many as 6 billion people could face water shortages if demand increases by 20 to 30 percent above current levels [1]. Agriculture, which accounts for 70% of global freshwater withdrawals, will intensify competition for water, especially in dry areas [2,3]. Algeria clearly illustrates these challenges: groundwater, a major source of irrigation water in its arid regions, is declining in quality because of salinization and agricultural pollution [4,5]. Groundwater quality monitoring is therefore essential for sustainable water resource management [6,7]. Tracking the extent of degradation requires extensive sampling campaigns and comprehensive hydrochemical analyses covering the major ions, such as Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, Cl⁻, and SO₄²⁻ [8,9]. Such monitoring demands substantial resources for ongoing sampling and analysis, which underscores the need for innovative approaches, such as artificial neural networks, to streamline water quality assessment [10].

Artificial intelligence (AI) has emerged as an innovative tool for simplifying water quality assessment. In an early application, [11] used artificial neural networks (ANNs) to predict the drinking water quality index (WQI) of Baghdad and identified pH and chloride as the main controlling factors (R² = 0.973). Subsequent work by [12] optimized the ANN architecture, showing that a simpler MLP-4-5-4 model was more accurate (R² = 0.989) than a deeper network. Building on these foundations, ref. [13] predicted nitrate concentrations by combining land use data with pH, conductivity, and temperature, highlighting the adaptability of ANNs to multivariable systems. More recently, ref. [9] developed an ANN that predicts ion concentrations (Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, Cl⁻, and SO₄²⁻) directly from electrical conductivity (EC), achieving high accuracy within the EC range used for training. These developments reflect a broader trend in ANN-based environmental modeling, in which hybrid approaches combining physically based and data-driven models are gaining popularity.

These methodological advances have been applied to regional challenges. Ref. [14] compared radial basis function neural networks (RBF-NNs) and probabilistic neural networks (PNNs) in Iraq’s Alnekheeb Basin and found that the PNN performed better in assessing irrigation suitability based on salinity and the sodium adsorption ratio. Similarly, ref. [15] used ANNs to predict groundwater salinity in Spain’s Campo de Cartagena, outperforming conventional regression models and enabling irrigation strategies tailored to salinity-sensitive crops. Furthermore, ref. [16] demonstrated the applicability of ANNs to stressed aquifers, achieving highly accurate TDS predictions (R² = 0.984) in the Babylonian region of Iraq. However, gaps remain in applying these techniques to regions with complex evaporite geology, such as the semi-arid regions of North Africa.

This research focuses on applying ANN techniques in the Aflou syncline region of Algeria, a region with distinct geological and climatic features that depends on groundwater stored in sandstone strata influenced by Aptian gypsum and Triassic evaporites [17]. Here, the growing number of wells and the intensive exploitation of groundwater have accelerated evaporation and dissolution, increasing the risk of salinization [4]. The novelty of this research lies in predicting the major groundwater ions (SO₄²⁻, Mg²⁺, Na⁺, Ca²⁺, Cl⁻, K⁺, HCO₃⁻, and NO₃⁻) with ANNs trained using various learning algorithms, based on only two field-measured parameters: total dissolved solids (TDS) and mineralization (MIN). Model performance was evaluated using statistical measures of accuracy and visual comparisons. Section 2 describes the materials, including the study area and data collection. Section 3 explains the artificial neural networks, optimization algorithms, model development, and measures of accuracy. The results and discussion are presented in Section 4, and the main conclusions are presented in Section 5.

2. Materials

2.1. Study Area and Data Collection

The Aflou syncline region lies north of Djebel Amour in the central Saharan Atlas, about 300 km southwest of Algiers, at 1400 m above sea level (Figure 1). Centered at approximately 34.11° N and 2.10° E, it occupies a mountainous area that forms a natural barrier between the Saharan Atlas and the Saharan Plateau. This high terrain accentuates climatic contrasts, shielding the region from Mediterranean influences and producing a semi-arid climate with relatively cool temperatures and limited rainfall [17]. Geologically, the area belongs to the Saharan Atlas fold belt and is composed of Mesozoic sediments ranging in age from Triassic to Cretaceous. These sediments record alternating marine and continental deposition and include limestone, limestone-rich beds, and sandstone-dominated strata.

Groundwater in this area generally flows along a southwest-northeast axis, and the piezometric level ranges from 1440 to 1320 m. The piezometric level drops sharply, suggesting that the sandstone syncline channels groundwater flow within the aquifer. Groundwater flows along both flanks of the syncline, and a cone of depression associated with pumping wells is observed in the Aflou area. The hydraulic gradient increases from 0.0005 upstream to more than 0.003 downstream, reflecting spatial variations in aquifer permeability [18].
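As a rough consistency check on these gradients, the mean hydraulic gradient can be estimated from the total head drop. This assumes (our assumption, not the authors') that the 120 m drop in piezometric level is spread over the roughly 80 km length of the aquifer given in the next paragraph, and it ignores the local effects of channelization and the cone of depression:

```latex
\bar{i} \approx \frac{\Delta h}{L} = \frac{1440\,\mathrm{m} - 1320\,\mathrm{m}}{80\,000\,\mathrm{m}} = \frac{120\,\mathrm{m}}{80\,000\,\mathrm{m}} = 1.5 \times 10^{-3}
```

This mean value falls between the reported upstream (0.0005) and downstream (more than 0.003) gradients, as expected for a gradient that steepens along the flow path.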

In arid and semi-arid regions, understanding geochemical processes is essential for analyzing the water quality of an aquifer system. This aquifer lies on a slope oriented SSW-NNE and is over 80 km long and 10 km wide. To limit the multiple sources of error that could affect the results, the analysts retained only wells shallower than 45 m, which make up the majority, and excluded sampling points with ion balance errors exceeding 5%. In total, 153 groundwater samples were collected from wells distributed throughout the study area and analyzed at the Hydrology Laboratory of the National Office of Water Resources (NAWR). These data form the basis for modeling the relationships among total dissolved solids (TDS), mineralization (MIN), and major ion concentrations using ANNs. For this purpose, the dataset was split into three subsets: training (75%), validation (15%), and test (10%). The training subset was used to adjust the model parameters, the validation subset was used to fine-tune hyperparameters and mitigate overfitting, and the test subset was used to assess the model’s generalization capability on unseen data.
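The screening and partitioning steps described above can be sketched in Python as follows. This is a minimal illustration assuming NumPy, per-sample well depths in meters, and per-sample ion balance errors in percent. The function name `screen_and_split`, the random seed, and the use of a single random shuffle are our assumptions; the paper does not state how samples were assigned to each subset.

```python
import numpy as np

def screen_and_split(depth_m, cb_percent, fractions=(0.75, 0.15, 0.10),
                     max_depth=45.0, max_cb=5.0, seed=0):
    """Return train/validation/test index arrays after screening samples.

    Keeps wells shallower than max_depth (m) whose absolute ion balance
    error is at most max_cb (%), then randomly partitions the survivors.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    cb_percent = np.asarray(cb_percent, dtype=float)
    keep = np.flatnonzero((depth_m < max_depth) & (np.abs(cb_percent) <= max_cb))
    idx = np.random.default_rng(seed).permutation(keep)
    n = idx.size
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # the remainder, about 10% of the samples
    return train, val, test
```

With 153 samples that all pass the screening, this split yields 115, 23, and 15 samples for training, validation, and testing.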

2.2. Mineralization

Mineralization (MIN), a term from French hydrochemical nomenclature, refers to the process by which water acquires dissolved minerals as it moves through the Earth’s crust and interacts with rocks, sediments, and other geological structures. As a measured quantity, MIN is the inorganic fraction of the total dissolved solids (TDS) in a water sample: the residue that remains after evaporation at 180 °C followed by ignition at 550 °C. It represents the non-volatile, thermally stable dissolved solids, composed primarily of inorganic salts and minerals. These consist mainly of cations (positive ions such as calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), and potassium (K⁺)) and anions (negative ions such as chloride (Cl⁻), sulfate (SO₄²⁻), carbonate (CO₃²⁻), bicarbonate (HCO₃⁻), and nitrate (NO₃⁻)). MIN can be measured directly by a laboratory procedure or estimated indirectly by the conversion factor method.

2.2.1. Laboratory Procedure

The laboratory procedure requires a drying oven (180 °C); a muffle furnace (550 °C); an analytical balance (0.1 mg precision); a platinum or porcelain crucible; a 0.45 µm membrane filter; beakers and pipettes; deionized/distilled water; and a desiccator. The procedure is as follows: (1) transfer a known volume of the filtered sample into a pre-weighed crucible; (2) evaporate and dry it in an oven at 180 °C until completely dry; and (3) cool the crucible in a desiccator and weigh it. The mass of the residue divided by the sample volume gives the TDS (in mg/L).

MIN is then determined by ignition, as follows: (1) take the crucible containing the TDS residue; (2) place it in a muffle furnace at 550 °C for at least 1 h; and (3) cool it in a desiccator and weigh it again. The remaining mass gives the MIN in mg/L, i.e., (weight after ignition − empty crucible weight)/(sample volume in liters), with weights in mg. Throughout the procedure, crucibles should always be cooled in a desiccator to avoid moisture absorption, and all weights must be recorded with high precision (±0.1 mg).
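Both gravimetric calculations reduce to the same arithmetic and differ only in which weighing is used. The following is a minimal sketch, assuming masses recorded in grams and the sample volume in milliliters; the function names and the example weights are hypothetical.

```python
def residue_mg_per_l(weight_with_residue_g, empty_crucible_g, sample_volume_ml):
    """Residue concentration in mg/L from crucible weighings (g) and sample volume (mL)."""
    residue_mg = (weight_with_residue_g - empty_crucible_g) * 1000.0
    return residue_mg / (sample_volume_ml / 1000.0)

def tds_and_min(empty_g, after_180c_g, after_550c_g, volume_ml):
    """TDS from the 180 °C residue and MIN from the 550 °C (ignited) residue, in mg/L."""
    tds = residue_mg_per_l(after_180c_g, empty_g, volume_ml)
    mineral = residue_mg_per_l(after_550c_g, empty_g, volume_ml)
    if mineral > tds:
        raise ValueError("ignited residue heavier than dried residue; check weighings")
    return tds, mineral
```

For example, a 50 mL sample with an empty crucible of 30.0000 g, 30.0650 g after drying at 180 °C, and 30.0520 g after ignition at 550 °C gives TDS = 1300 mg/L and MIN = 1040 mg/L (to rounding). The 260 mg/L difference is the fraction lost on ignition.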

2.2.2. The Conversion Factor Method

The conversion factor method is one of the simplest and fastest field methods for estimating the mineralization of water. It relies on measuring the electrical conductivity (EC) of the water and applying a conversion factor previously calculated from chemical analysis data of similar water samples from the same geographical area or geological layer. The procedure for estimating MIN with the conversion factor method is as follows:

Step 1. Measure the EC. The EC of the water is measured with a reliable, properly calibrated meter. EC is expressed in microsiemens per centimeter (µS/cm) and reflects the water’s ability to conduct electricity, which is directly related to the concentration of dissolved ions.

Step 2. Apply the previously calculated conversion factor. The conversion factor, calculated from earlier chemical analyses of water samples from the same region or geological layer, is a constant derived from the relationship between EC and MIN in that area. It reflects the relative distribution of mineral ions such as Ca²⁺, Mg²⁺, Na⁺, K⁺, Cl⁻, SO₄²⁻, HCO₃⁻, and NO₃⁻ in the water.

Step 3. Calculate MIN. The measured EC is multiplied by the conversion factor to obtain MIN in mg/L (ppm), i.e., MIN = EC (µS/cm) × conversion factor.
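Steps 2 and 3 can be sketched as follows. The paper does not specify how the conversion factor is derived from earlier analyses. This sketch assumes (our choice) a least-squares slope through the origin of paired laboratory MIN and field EC values, f = Σ(EC·MIN)/Σ(EC²). It also includes a relative-error helper for the periodic laboratory verification recommended below. NumPy is assumed, and the function names and example values are hypothetical.

```python
import numpy as np

def fit_conversion_factor(ec_us_cm, min_mg_l):
    """Step 2 (assumed method): least-squares slope through the origin of MIN vs EC."""
    ec = np.asarray(ec_us_cm, dtype=float)
    mineral = np.asarray(min_mg_l, dtype=float)
    return float(ec @ mineral / (ec @ ec))

def min_from_ec(ec_us_cm, factor):
    """Step 3: field estimate of MIN (mg/L) = EC (µS/cm) x conversion factor."""
    return np.asarray(ec_us_cm, dtype=float) * factor

def relative_error(lab_min, field_min):
    """Relative deviation of field estimates from laboratory MIN values."""
    lab = np.asarray(lab_min, dtype=float)
    return (np.asarray(field_min, dtype=float) - lab) / lab
```

For instance, historical pairs EC = (1000, 2000, 3000) µS/cm and MIN = (700, 1400, 2100) mg/L give a factor of 0.7, so a new reading of 1500 µS/cm yields an estimated MIN of 1050 mg/L.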

Note that the accuracy of the conversion factor method depends heavily on the accuracy of the conversion factor obtained from previous chemical data, so the factor should be representative of the area where MIN is being measured. It is also advisable to verify the estimates periodically by performing laboratory analyses on some samples.

The conversion factor method has two advantages: it is quick and effective in the field, allowing data to be collected from many locations, and it does not require complex laboratory analyses. Its disadvantages are that it depends heavily on the accuracy of the conversion factor, which may not be representative of all conditions, and that organic pollution or non-mineral components may affect the results.

The conversion factor method is therefore an effective, field-friendly way to estimate MIN, especially where laboratory access is difficult. To ensure accurate results, however, the conversion factor must be derived from precise data, and the results must be verified periodically.

3. Methodology

3.1. Artificial Neural Networks and Optimization Algorithms

A multilayer perceptron (MLP), also known as a fully connected feedforward neural network (FCNN), is a fundamental deep learning architecture in which every neuron in one layer is connected to all neurons in the next layer. This connectivity allows the network to learn complex nonlinear relationships in the data [19].

Training an MLP means optimizing its weights and biases to minimize a loss function. This is usually done with backpropagation, a gradient-based optimization algorithm [20]. Backpropagation applies the chain rule to compute the gradient of the loss function with respect to each weight, allowing the network to adjust its parameters iteratively [21]. However, standard backpropagation can converge slowly or become unstable, which has led to the development of more advanced optimization algorithms.
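The training concept above can be illustrated with a minimal NumPy sketch of a one-hidden-layer MLP with the configuration described in Section 3.4 (sigmoid hidden layer, linear output layer). The sketch computes the gradient of a mean squared error loss by backpropagation. It updates the weights with plain gradient descent, not the Levenberg–Marquardt algorithm adopted in the study. All function names and the initialization scheme are our assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (initialization scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0.0, 0.5, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (1, n_hidden)), "b2": np.zeros(1)}

def forward(p, X):
    """X has shape (n_samples, n_inputs); returns hidden activations and predictions."""
    H = sigmoid(X @ p["W1"].T + p["b1"])        # sigmoid hidden layer
    y_hat = (H @ p["W2"].T + p["b2"]).ravel()    # linear output layer
    return H, y_hat

def loss_and_grads(p, X, y):
    """Mean squared error and its gradients via the chain rule (backpropagation)."""
    H, y_hat = forward(p, X)
    e = y_hat - y
    d_out = 2.0 * e / X.shape[0]                             # dLoss/dy_hat
    grads = {"W2": d_out[None, :] @ H, "b2": np.array([d_out.sum()])}
    d_hid = np.outer(d_out, p["W2"][0]) * H * (1.0 - H)     # back through the sigmoid
    grads["W1"] = d_hid.T @ X
    grads["b1"] = d_hid.sum(axis=0)
    return float(np.mean(e ** 2)), grads

def train_gd(p, X, y, lr=0.001, epochs=1000):
    """Plain gradient descent on all weights and biases (the study used Levenberg-Marquardt)."""
    for _ in range(epochs):
        _, g = loss_and_grads(p, X, y)
        for k in p:
            p[k] -= lr * g[k]
    return p
```

A finite-difference check of a few gradient entries is a quick way to confirm that a hand-written backward pass like this one is correct.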

These algorithms include the following:

(1) Levenberg–Marquardt (trainlm), which combines gradient descent and the Gauss–Newton method for fast convergence but requires significant memory;

(2) conjugate gradient with Polak–Ribière updates (traincgp), which is memory efficient and suitable for large networks;

(3) gradient descent with momentum and adaptive learning rate (traingdx), which uses momentum to accelerate convergence and adapts the learning rate dynamically;

(4) one-step secant (trainoss), which approximates the Hessian matrix to reduce computational complexity;

(5) BFGS quasi-Newton (trainbfg), a second-order method that approximates the inverse Hessian for faster convergence;

(6) conjugate gradient with Powell–Beale restarts (traincgb), which periodically resets the search direction to avoid stagnation;

(7) gradient descent with adaptive learning rate (traingda), which adjusts the learning rate based on the behavior of the gradient;

(8) resilient backpropagation (trainrp), which updates weights based on the sign of the gradient rather than its magnitude, making it robust to vanishing gradients; and

(9) conjugate gradient with Fletcher–Reeves updates (traincgf), another memory-efficient conjugate gradient variant [22,23,24,25,26].
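The Levenberg–Marquardt idea in item (1) can be sketched as a damped Gauss–Newton iteration. Each step solves (JᵀJ + μI)Δw = −Jᵀe, where J is the Jacobian of the residual vector e with respect to the parameters w. A small μ gives a Gauss–Newton step, and a large μ gives a short gradient-descent step. The minimal sketch below assumes NumPy, uses a forward-difference Jacobian, and fits a hypothetical two-parameter curve rather than one of the paper's networks. The damping constants (μ = 0.001, decrease factor 0.1, increase factor 10, cap 10¹⁰) are our assumption, chosen to resemble the default settings of MATLAB's trainlm; verify them against the toolbox documentation before relying on them.

```python
import numpy as np

def numerical_jacobian(residual, w, h=1e-7):
    """Forward-difference Jacobian of the residual vector with respect to w."""
    r0 = residual(w)
    J = np.empty((r0.size, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        J[:, j] = (residual(w + step) - r0) / h
    return J

def levenberg_marquardt(residual, w0, mu=1e-3, mu_dec=0.1, mu_inc=10.0,
                        mu_max=1e10, max_iter=200, tol=1e-12):
    """Minimize sum(residual(w)**2) with damped Gauss-Newton (LM) steps."""
    w = np.asarray(w0, dtype=float).copy()
    e = residual(w)
    sse = e @ e
    for _ in range(max_iter):
        J = numerical_jacobian(residual, w)
        A, g = J.T @ J, J.T @ e
        while True:
            # Solve (J^T J + mu I) dw = -J^T e for the damped step.
            dw = np.linalg.solve(A + mu * np.eye(w.size), -g)
            e_new = residual(w + dw)
            sse_new = e_new @ e_new
            if sse_new < sse:        # accept: lean towards Gauss-Newton
                w, e, sse = w + dw, e_new, sse_new
                mu *= mu_dec
                break
            mu *= mu_inc             # reject: lean towards gradient descent
            if mu > mu_max:
                return w
        if np.linalg.norm(dw) < tol:
            break
    return w
```

Starting from w = (1, 1), the iteration recovers the parameters a = 2 and b = 0.5 of the noise-free curve y = a(1 − e^(−bx)) sampled on 0 ≤ x ≤ 10.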

These optimization algorithms are implemented in various machine learning frameworks, such as MATLAB’s Neural Network Toolbox (R2021a). They are chosen according to problem requirements, including network size, data complexity, and computational constraints. For example, the Levenberg–Marquardt (trainlm) algorithm is often used for small-to-medium-sized networks because of its speed. In contrast, conjugate gradient with Polak–Ribière updates (traincgp) and conjugate gradient with Powell–Beale restarts (traincgb) are preferred for large networks because of their memory efficiency. The choice of algorithm also depends on the characteristics of the loss surface: BFGS quasi-Newton (trainbfg), a second-order method, is effective on smooth, convex surfaces, whereas gradient descent with momentum and adaptive learning rate (traingdx), a first-order method, is more versatile on non-convex surfaces [24,27,28].

3.2. Model Development

In this research, Levenberg–Marquardt backpropagation multilayer perceptrons (LMBP-MLPs) were trained to predict the concentrations of the major ions (Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻, Cl⁻, K⁺, HCO₃⁻, and NO₃⁻) in groundwater. The inputs were measured MIN and TDS, together with the predicted concentrations of some ions (Mg²⁺, Na⁺, and SO₄²⁻).

Selecting appropriate hyperparameters played a critical role in model development. These hyperparameters included the number of neurons in the hidden layer, the activation function, the training function, and the learning rate.

Principal component analysis (PCA), which was used for data preprocessing in the LMBP-MLP models, extracts the most important features from large datasets while retaining most of the information in the original data [29]. The input features for each LMBP-MLP model were selected using a heatmap of Pearson correlation coefficients, ensuring that the most relevant variables were used to predict each ion (see Figure 2). The Pearson correlation coefficient quantifies the linear relationship between two variables and is commonly used to assess their association [30]. It ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation.

Figure 2 shows that MIN and TDS were strongly correlated with most ions, especially SO₄²⁻, Mg²⁺, Na⁺, Ca²⁺, and Cl⁻, with correlation coefficients exceeding 0.85. This suggests that these ions originate primarily from two sources: evaporite deposits, such as gypsum and saltpeter, in the Triassic and Aptian formations; and limestone layers embedded within the Barremian–Aptian sandstone formations, which constitute the most important aquifer system in the region. In contrast, the remaining ions showed weak correlations, with coefficients below 0.53. In particular, the low correlation values of NO₃⁻ and HCO₃⁻ reflect their different origins. Nitrates (NO₃⁻) come mainly from agricultural fertilizers, whereas bicarbonates (HCO₃⁻) are produced by the dissolution of calcite, the main matrix mineral of the Barremian–Aptian sandstone. Potassium (K⁺), derived from the dissolution of the rare mineral sylvite, is present in low concentrations and contributes little to MIN and TDS.
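The Pearson-based screening described above can be sketched as follows, assuming NumPy. `pearson_r` implements the standard definition of the coefficient. `select_features` keeps candidate inputs whose absolute correlation with a target ion exceeds a threshold. The 0.85 default echoes the value quoted above, but the function names and the single-threshold rule are an illustrative assumption rather than the authors' exact selection procedure.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

def select_features(candidates, target, threshold=0.85):
    """Names of candidate inputs whose |r| with the target exceeds the threshold."""
    return [name for name, values in candidates.items()
            if abs(pearson_r(values, target)) > threshold]
```

Applied to a table of measured TDS, MIN, and predicted ions, such a filter would reproduce a first pass at the kind of input lists given in Table 1.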

The heatmap also shows very strong correlations of SO₄²⁻ with MIN and TDS (PCC > 0.95). These are followed by Na⁺ and Mg²⁺, both of which are strongly correlated with MIN and TDS (PCC > 0.92). Na⁺ and Mg²⁺ are also strongly correlated with SO₄²⁻ (PCC > 0.86). Ca²⁺ and Cl⁻ show notable correlations with MIN and TDS (PCC > 0.85), and a moderate correlation was found between SO₄²⁻ and Cl⁻ (PCC > 0.71). In contrast, the remaining ions (K⁺, HCO₃⁻, and NO₃⁻) did not show significant correlations with the other elements studied. Based on this analysis, the input variables for each ANN model were selected as shown in Table 1.

3.3. Measures of Accuracy

Three statistical measures of accuracy were used to evaluate the developed models: the coefficient of determination (R²) (Equation (1)), the root mean square error (RMSE) (Equation (2)), and the Nash–Sutcliffe efficiency (NSE) (Equation (3)). R² quantifies the proportion of variance in the measured values that is explained by the predictions, with values closer to one indicating a better fit [7,31]. RMSE provides a direct measure of predictive accuracy, expressing the typical magnitude of the difference between predicted and measured values in the units of the variable (here, mg/L) [32,33]. NSE is a dimensionless coefficient of efficiency that compares the squared prediction errors with the variance of the measured data [34,35]. NSE = 1 indicates a perfect fit, NSE = 0 indicates that the model is no more accurate than the mean of the measured data, and negative values indicate that it is less accurate.

R^{2} = \frac{{\left[\sum_{i = 1}^{n} (Z_{i^{*}} - \bar{Z_{i^{*}}}) (Z_{i} - \bar{Z_{i}})\right]}^{2}}{\sum_{i = 1}^{n} {(Z_{i^{*}} - \bar{Z_{i^{*}}})}^{2} \sum_{i = 1}^{n} {(Z_{i} - \bar{Z_{i}})}^{2}}

(1)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Z_{i} - Z_{i^{*}})}^{2}}

(2)

N S E = 1 - \frac{\sum_{i = 1}^{n} {(Z_{i} - Z_{i^{*}})}^{2}}{\sum_{i = 1}^{n} {(Z_{i} - \bar{Z_{i}})}^{2}}

(3)

where Z_{i^{*}} = the predicted values, Z_{i} = the measured values, \bar{Z_{i}} = the mean of the measured values, \bar{Z_{i^{*}}} = the mean of the predicted values, and n = the number of data points.
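Equations (1)–(3) can be computed with a few lines of NumPy. This is a minimal sketch. The function names are ours, `measured` and `predicted` correspond to Z_{i} and Z_{i^{*}}, and R² is implemented in its squared-correlation form.

```python
import numpy as np

def _as_arrays(measured, predicted):
    return np.asarray(measured, dtype=float), np.asarray(predicted, dtype=float)

def r_squared(measured, predicted):
    """Equation (1): squared Pearson correlation of predicted vs measured values."""
    z, zp = _as_arrays(measured, predicted)
    cov = np.sum((zp - zp.mean()) * (z - z.mean()))
    return float(cov ** 2 / (np.sum((zp - zp.mean()) ** 2) * np.sum((z - z.mean()) ** 2)))

def rmse(measured, predicted):
    """Equation (2): root mean square error, in the units of the data (mg/L)."""
    z, zp = _as_arrays(measured, predicted)
    return float(np.sqrt(np.mean((z - zp) ** 2)))

def nse(measured, predicted):
    """Equation (3): Nash-Sutcliffe efficiency (1 = perfect, < 0 = worse than the mean)."""
    z, zp = _as_arrays(measured, predicted)
    return float(1.0 - np.sum((z - zp) ** 2) / np.sum((z - z.mean()) ** 2))
```

The following example illustrates why NSE complements R². Predictions offset by a constant 1 mg/L from measurements of (1, 2, 3, 4) mg/L give R² = 1 and RMSE = 1 mg/L but NSE = 0.2. This is because R² ignores systematic bias, whereas NSE penalizes it.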

In addition, the ion (charge) balance, a chemical index, was used to assess whether the predicted ion concentrations remain chemically consistent. This check is especially important for applications in water quality and environmental chemistry. Together, these measures provide a comprehensive assessment of the models’ predictive accuracy and reliability.

Ionic equilibrium is commonly evaluated with the charge balance (CB) index (Equation (4)), an important metric of the chemical consistency of a solution, especially in water quality research [36]. The sums of cations and anions are obtained from Equations (5) and (6) by dividing each ion concentration (mg/L) by its equivalent weight. The resulting CB value is then interpreted against thresholds that depend on the analysis context.

C B = \frac{(\sum C - \sum A)}{(\sum C + \sum A)} \times 100

(4)

\sum C = \frac{{M g}^{2 +}}{12.15} + \frac{{C a}^{2 +}}{20.04} + \frac{K^{+}}{39.10} + \frac{{N a}^{+}}{22.99}

(5)

\sum A = \frac{{C l}^{-}}{35.45} + \frac{{S O}_{4}^{2 -}}{48.03} + \frac{{H C O}_{3}^{-}}{61.02} + \frac{{N O}_{3}^{-}}{62.00}

(6)

where ΣC = sum of cations (meq/L) and ΣA = sum of anions (meq/L).

For instance, |CB| < 5% indicates good ionic balance and a high degree of chemical consistency, 5% ≤ |CB| ≤ 8% indicates moderate balance, and |CB| > 8% indicates poor balance, pointing to potential problems with the chemical data. These thresholds can vary with the study’s requirements. In other cases, |CB| < 6% may be considered good, 6% ≤ |CB| ≤ 12% moderate, and |CB| > 12% poor. More lenient assessments may use |CB| < 10% (good), 10% ≤ |CB| ≤ 20% (moderate), and |CB| > 20% (poor). These ranges help classify the reliability of ion balances and thereby support the accuracy and validity of chemical data in environmental or analytical studies [37].
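Equations (4)–(6) and the first set of thresholds can be sketched in Python as follows. Concentrations are in mg/L and are converted to meq/L by dividing by standard equivalent weights; for K⁺ we use 39.10, the atomic mass of potassium. The dictionary keys and function names are our own. The thresholds are parameters so that the stricter or more lenient schemes above can be substituted.

```python
# Equivalent weights (g/eq): molar mass divided by ionic charge.
EQUIVALENT_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02, "NO3": 62.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3", "NO3")

def to_meq(ions_mg_l, names):
    """Sum of the named ions converted from mg/L to meq/L (missing ions count as 0)."""
    return sum(ions_mg_l.get(n, 0.0) / EQUIVALENT_WEIGHT[n] for n in names)

def charge_balance(ions_mg_l):
    """Equation (4): CB (%) = 100 (sum C - sum A) / (sum C + sum A)."""
    c, a = to_meq(ions_mg_l, CATIONS), to_meq(ions_mg_l, ANIONS)
    return 100.0 * (c - a) / (c + a)

def classify_cb(cb, good=5.0, moderate=8.0):
    """Good if |CB| < good, Moderate if good <= |CB| <= moderate, otherwise Poor."""
    magnitude = abs(cb)
    if magnitude < good:
        return "Good"
    if magnitude <= moderate:
        return "Moderate"
    return "Poor"
```

For example, consider a sample with 40.08 mg/L Ca²⁺ and 22.99 mg/L Na⁺ (3.0 meq/L of cations) against 70.90 mg/L Cl⁻ and 30.51 mg/L HCO₃⁻ (2.5 meq/L of anions). It gives CB ≈ 9.1%, which is "Poor" under the 5/8 thresholds but "Good" under the lenient 10/20 scheme.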

3.4. Hyperparameters Selection

All developed models use a sigmoid activation function in the hidden layer and a linear transfer function in the output layer. The learning rate was set to 0.001 to ensure stable and efficient training. To select the most suitable training algorithm, a network with two hidden neurons was trained for the first three variables listed in Table 1 (SO₄²⁻, Na⁺, and Mg²⁺), and the algorithm with the best performance was adopted. The evaluated training algorithms were trainlm, traincgp, traingdx, trainoss, trainbfg, traincgb, traingda, trainrp, and traincgf; their performance is shown in Figure 3. The Levenberg–Marquardt algorithm (trainlm) outperformed the other algorithms and was therefore adopted for this study.

The number of neurons in the hidden layer strongly affects the accuracy of the trained model. Too many neurons can lead to overfitting, in which the model memorizes noise instead of learning meaningful patterns. Too few neurons limit the model’s capacity to capture complex relationships and can lead to underfitting [38,39,40].

To optimize each model, the number of hidden neurons was determined by trial and error over a range of 1 to 30. Figure 4 shows how the number of hidden neurons affected model accuracy in the validation phase. Based on these results, the optimal number of hidden neurons was as follows:

- five for LMBP-MLP1, LMBP-MLP2, and LMBP-MLP3;
- three for LMBP-MLP4;
- nine for LMBP-MLP5 and LMBP-MLP7;
- ten for LMBP-MLP6; and
- eighteen for LMBP-MLP8.
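The trial-and-error search over 1 to 30 hidden neurons amounts to a simple loop. In this sketch, `validation_rmse` is a hypothetical user-supplied callable that trains a network with `n` hidden neurons from a given random seed and returns its validation-subset RMSE. The paper does not state which validation metric drove the choice, so RMSE is our assumption. Scores are averaged over the supplied seeds, and ties are resolved in favor of the smaller network.

```python
def select_hidden_neurons(validation_rmse, candidates=range(1, 31), seeds=(0,)):
    """Trial-and-error search over hidden-layer sizes.

    validation_rmse: hypothetical callable (n, seed) -> validation RMSE of a
    network trained with n hidden neurons. Returns the best n and all scores.
    """
    scores = {}
    for n in candidates:
        runs = [validation_rmse(n, seed) for seed in seeds]
        scores[n] = sum(runs) / len(runs)   # average over random restarts
    best = min(scores, key=scores.get)      # first (smallest) n wins ties
    return best, scores
```

Because network training starts from random weights, averaging the score over several seeds per neuron count makes the choice less sensitive to initialization.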

Figure 5 illustrates the flowchart of the modeling process:

- First, SO₄²⁻ concentrations were predicted from measured TDS and MIN.
- Next, Na⁺ and Mg²⁺ concentrations were predicted from measured TDS and MIN and the predicted SO₄²⁻.
- Ca²⁺ concentrations were then predicted from measured TDS and MIN together with the predicted SO₄²⁻, Na⁺, and Mg²⁺.
- Similarly, Cl⁻ and K⁺ concentrations were predicted from measured TDS and MIN together with the predicted SO₄²⁻ and Na⁺.
- Finally, NO₃⁻ and HCO₃⁻ concentrations were predicted from measured TDS and MIN together with the predicted Mg²⁺.
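The cascade in Figure 5 can be expressed as a small dependency table, with each model run only once the predictions it needs are available. In this sketch, `models` is a hypothetical mapping from ion name to any trained predictor callable that takes (TDS, MIN, followed by the predicted inputs in the listed order). The dictionary keys and the function name are our own.

```python
# Predicted ions each model needs, in addition to measured TDS and MIN.
CASCADE = {
    "SO4": [],
    "Na": ["SO4"],
    "Mg": ["SO4"],
    "Ca": ["SO4", "Na", "Mg"],
    "Cl": ["SO4", "Na"],
    "K": ["SO4", "Na"],
    "NO3": ["Mg"],
    "HCO3": ["Mg"],
}

def predict_cascade(models, tds, mineral, cascade=CASCADE):
    """Run each ion model once all of its predicted inputs are available."""
    predicted = {}
    pending = dict(cascade)
    while pending:
        ready = [ion for ion, deps in pending.items()
                 if all(d in predicted for d in deps)]
        if not ready:
            raise ValueError("circular dependency in cascade")
        for ion in ready:
            inputs = [tds, mineral] + [predicted[d] for d in pending[ion]]
            predicted[ion] = models[ion](*inputs)
            del pending[ion]
    return predicted
```

Because the later models consume predicted rather than measured ions, errors in the first-stage predictions can propagate to the downstream models.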

4. Results and Discussion

Table 2 presents the performance of the developed ANN models for all major ions in the Aflou syncline region. Performance is reported in terms of the coefficient of determination (R²), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE) for the training, validation, and test subsets and for the full dataset. Predictive accuracy varied greatly among the models.

Models LMBP-MLP1 to LMBP-MLP5 performed well, with high R² and NSE values and relatively low RMSE values across all subsets. This implies that they can accurately predict the concentrations of SO₄²⁻, Mg²⁺, Na⁺, Ca²⁺, and Cl⁻ in the Aflou syncline region. In contrast, LMBP-MLP6, LMBP-MLP7, and LMBP-MLP8 performed poorly. Their low R² and NSE values and high RMSE values indicate that they had difficulty capturing the variability of K⁺, HCO₃⁻, and NO₃⁻. Among all of the developed models, Table 2 shows that LMBP-MLP1 produced the strongest agreement between predicted and measured values (SO₄²⁻). By contrast, LMBP-MLP6, LMBP-MLP7, and LMBP-MLP8 did not yield a meaningful relationship between predicted and measured values (K⁺, HCO₃⁻, and NO₃⁻) on any of the accuracy measures across the datasets.

Figure 6 compares the predicted and measured values of all major ions in the testing phase using line plots and scatter plots. Line plots show how values change across an ordered sequence, such as time or, here, the sample index. Scatter plots show the relationship between two numeric variables, in this case measured and predicted concentrations [41]. Together, they provide a comparative assessment of measured and predicted ion concentrations and insight into the hydrochemical dynamics of the region.

For sulfate (SO₄²⁻), magnesium (Mg²⁺), and sodium (Na⁺), the line plots of LMBP-MLP1, LMBP-MLP2, and LMBP-MLP3 over all of the data showed strong visual agreement between observed and predicted values, highlighting the reliability of these models. The scatter plots confirmed this agreement, with high coefficients of determination (R² = 0.936 for SO₄²⁻, 0.924 for Mg²⁺, and 0.916 for Na⁺).

For calcium (Ca²⁺) and chloride (Cl⁻), the models were only partly effective. The Ca²⁺ line plot broadly followed the measured trend, but deviations for samples 70–80 showed a limited ability to capture regional differences. The Ca²⁺ scatter plot (R² = 0.892) confirmed this, suggesting that high-concentration outliers reduced accuracy at the upper end of the range. Similarly, the Cl⁻ line plot tracked the trend adequately but underestimated the sharp peak at samples 90–100, as shown by the outliers clustered above the regression line in the scatter plot (R² = 0.872). These discrepancies may arise from extreme values or from incomplete representation of location-specific geochemical interactions. They highlight the need for targeted fine-tuning to improve predictions for these ions.

In contrast to the strong results for SO₄²⁻, Mg²⁺, and Na⁺, LMBP-MLP6, LMBP-MLP7, and LMBP-MLP8 performed poorly for potassium (K⁺), bicarbonate (HCO₃⁻), and nitrate (NO₃⁻). The K⁺ scatter plot (R² = 0.441) showed minimal agreement with the observed data and failed to reproduce its variability. The HCO₃⁻ scatter plot (R² = 0.330) showed a nearly random dispersion, indicating a fundamental limitation in either the selected input variables or the underlying mechanistic assumptions. The NO₃⁻ scatter plot (R² = 0.523) showed a marginal improvement. Even so, LMBP-MLP8 systematically underestimated the highest concentrations, possibly because of anthropogenic or biogeochemical influences not captured by the model inputs.

The line plots and scatter plots (Figure 6) show that, over the full dataset, LMBP-MLP1 achieved the highest R² between predicted and measured values (SO₄²⁻), whereas LMBP-MLP6, LMBP-MLP7, and LMBP-MLP8 gave the lowest R² values (K⁺, HCO₃⁻, and NO₃⁻).

Figure 7 presents the sorted charge balance (CB) values for the 153 water samples from the Aflou syncline region, computed from the predicted ion concentrations. The charge balance serves as an indicator of data quality and of the predictive accuracy of the ions; values closer to 0 indicate a better match between positive and negative charges. Samples are classified into three groups, “Good” (green), “Moderate” (yellow), and “Poor” (red), according to the magnitude of their charge imbalance. Most samples (84%) were classified as “Good”, suggesting overall reliable data and satisfactory performance of the developed models. However, 11% of the samples fell into the “Moderate” category, indicating potential problems such as measurement errors, unaccounted-for ions, or sample degradation. A small fraction (5%) was classified as “Poor”, indicating serious errors that require reevaluation.
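The charge balance underlying Figure 7 can be illustrated with a short calculation. This is a minimal Python sketch under stated assumptions: it uses the conventional definition CB% = 100 × (Σcations − Σanions) / (Σcations + Σanions), with each concentration converted from mg/L to meq/L by dividing by a rounded equivalent weight (molar mass divided by ionic charge). The class limits (|CB| ≤ 5% “Good”, ≤ 10% “Moderate”, otherwise “Poor”) are not stated in the text; they are an assumption inferred from the evaluations listed in Tables 3–5. The function names `charge_balance` and `classify` are hypothetical.

```python
# Rounded equivalent weights in mg/meq (molar mass divided by |ionic charge|).
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,        # cations
    "HCO3": 61.02, "SO4": 48.03, "Cl": 35.45, "NO3": 62.00,   # anions
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "SO4", "Cl", "NO3")


def charge_balance(mg_per_l: dict[str, float]) -> float:
    """Signed charge balance error (%) from concentrations in mg/L."""
    meq = {ion: mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def classify(cb_percent: float) -> str:
    """Assumed class limits, inferred from the evaluations in Tables 3-5."""
    error = abs(cb_percent)
    if error <= 5.0:
        return "Good"
    if error <= 10.0:
        return "Moderate"
    return "Poor"


# Third Aflou sample of Table 3 (mg/L); the table lists CB = 7.45 %, "Moderate".
aflou_3 = {"SO4": 906, "NO3": 15, "HCO3": 273, "Cl": 210,
           "Ca": 292, "Mg": 146, "Na": 168, "K": 14}
cb = charge_balance(aflou_3)
print(f"CB = {cb:.2f} % -> {classify(cb)}")  # CB = 7.45 % -> Moderate
```

Under these assumptions the sketch also reproduces other tabulated magnitudes to within rounding, for example 3.54% for the second Aflou sample and 15.28% for the sixth Ain Madhi sample. The tables appear to report |CB|, since the signed value for the second Aflou sample comes out negative.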

Testing the Developed Model in Adjacent Locations

To evaluate the generalization ability of the developed models, their performance was tested using 20 water samples collected from three locations adjacent to the main research area. These locations share the same geological structure as the main research area but show some petrological differences. The selected locations are Madna (six samples); Aflou (four samples), situated southwest of the Aflou syncline; and Ain Madhi (ten samples), situated further south. The predictive accuracy of the developed models was assessed with the coefficient of determination (R²), and the results are presented in Figure 8.

Our results show marked differences in model performance across ions and locations, indicating that both ion-specific behavior and location-specific characteristics affect the predictions. The applied models achieved high prediction accuracy for SO₄²⁻, Mg²⁺, and Na⁺, with consistently high R² values at all three locations, indicating that the relationships between these ions and the feature variables were stable.

For Ca²⁺ and Cl⁻, the applied models performed well in Aflou and Madna but yielded poor accuracy in Ain Madhi, suggesting that location-specific hydrogeochemical factors influence these ion concentrations. In contrast, the applied models performed poorly for NO₃⁻ and K⁺, with R² values close to 0 at three locations. This is likely due to external influences such as agricultural activities (NO₃⁻) and local mineral dissolution (K⁺).

Predictions of HCO₃⁻ varied considerably, with moderate performance at Madna but low accuracy at Aflou and Ain Madhi, suggesting that the hydrogeochemical controls on HCO₃⁻ differ between locations. These performance differences can be attributed to differences in rock composition, groundwater flow dynamics, and local environmental factors affecting ion concentrations.

The geology of Ain Madhi may exhibit more pronounced variations than that of Aflou and Madna, producing inconsistencies that the applied models cannot fully capture. In addition, location-specific geochemical processes, anthropogenic influences (e.g., fertilizer use affecting NO₃⁻), and varying mineral dissolution rates (e.g., sylvite dissolution for K⁺) may contribute to the observed discrepancies.

Overall, the applied models showed strong predictive ability for SO₄²⁻, Mg²⁺, and Na⁺, whereas their performance was weaker and less consistent for Ca²⁺, Cl⁻, HCO₃⁻, NO₃⁻, and K⁺, especially at Ain Madhi. These results highlight the need for additional location-specific calibration to improve accuracy for particular ions and to account for local hydrogeochemical variation.

To estimate the charge balance (CB) at the adjacent locations of Aflou, Madna, and Ain Madhi, the authors used the predicted concentrations only for ions with high predictive performance in Figure 8 (R² > 0.600) and substituted the measured concentrations for ions with lower performance.
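The selection rule just described can be expressed as a small helper. In this hedged sketch, `hybrid_composition` (a hypothetical name) keeps the ANN prediction for an ion only when its R² exceeds 0.600 and otherwise falls back to the laboratory value. The R² values and concentrations in the example are placeholders for illustration, not values read from Figure 8 or from the field data.

```python
def hybrid_composition(predicted: dict[str, float],
                       measured: dict[str, float],
                       r2: dict[str, float],
                       threshold: float = 0.600) -> dict[str, float]:
    """Prefer the ANN prediction when its R² exceeds the threshold;
    otherwise use the laboratory measurement for that ion."""
    return {
        ion: predicted[ion] if r2.get(ion, 0.0) > threshold else measured[ion]
        for ion in measured
    }


# Placeholder values (not taken from Figure 8 or the field data).
r2_by_ion = {"SO4": 0.91, "Mg": 0.88, "Cl": 0.42}
predicted = {"SO4": 430.0, "Mg": 80.0, "Cl": 120.0}
measured = {"SO4": 410.0, "Mg": 85.0, "Cl": 145.0}
print(hybrid_composition(predicted, measured, r2_by_ion))
# {'SO4': 430.0, 'Mg': 80.0, 'Cl': 145.0}
```

The comparison is strict (`>`), matching the stated criterion R² > 0.600; ions without an R² entry default to their measured value, and the mixed composition can then be passed to a charge balance calculation.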

The set of predicted ions used in the ionic balance calculation varied by location. In Aflou, the selected ions were SO₄²⁻, Cl⁻, Ca²⁺, Mg²⁺, Na⁺, and K⁺. In Ain Madhi, only SO₄²⁻, Ca²⁺, Mg²⁺, and Na⁺ satisfied the selection criterion, while in Madna the selected ions were SO₄²⁻, HCO₃⁻, Cl⁻, Ca²⁺, Mg²⁺, and Na⁺. The resulting ionic balances and their evaluation are presented in Table 3, Table 4 and Table 5.

Each table (Table 3, Table 4 and Table 5) lists the predicted and measured ion concentrations, the calculated charge balance, and the resulting evaluation of each sample. In Aflou, 75% of the samples were rated “Good”, indicating reliable data and accurate ion estimates, 25% were rated “Moderate”, and none were rated “Poor”. In contrast, Ain Madhi showed a wider range of sample quality: according to Table 4, 60% of its samples were rated “Good”, 10% “Moderate”, and 30% “Poor”. This suggests potential data quality issues specific to Ain Madhi, which could arise from sample contamination or measurement errors. Madna, like Aflou, had mostly “Good” ratings (66.67%), with 33.33% “Moderate” samples, suggesting generally reliable data with localized discrepancies. The variability in charge balance across all locations highlights the value of ion balance analysis as a tool for assessing data quality and validating ion estimates. The presence of “Poor” and “Moderate” samples points to the need for further investigation to identify and correct potential problems and to ensure the accuracy and reliability of the hydrochemical data.

5. Conclusions

In Algeria, groundwater remains critical for irrigation, but groundwater quality in locations such as the Aflou syncline is increasingly compromised by salinity and agricultural contamination, threatening agricultural sustainability. Conventional monitoring relies on expensive sampling campaigns and laboratory analyses, which underscores the urgent need for innovative and cost-effective solutions to ensure water security. Artificial intelligence (AI), especially artificial neural networks (ANNs), has emerged as a powerful tool in the field of hydrochemistry.

In this study, the authors introduced a novel algorithm to predict eight major ions (Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, SO₄²⁻, Cl⁻, and NO₃⁻) using only two readily measured parameters (i.e., total dissolved solids (TDS) and mineralization (MIN)). The Levenberg–Marquardt backpropagation multilayer perceptron (LMBP-MLP) model, with an ion-specific customized architecture, achieved robust predictive accuracy for SO₄²⁻, Mg²⁺, Na⁺, Ca²⁺, and Cl⁻ (R² and NSE ≥ 0.87 over the full dataset), suggesting its usefulness for near-real-time monitoring. However, predictions for K⁺, HCO₃⁻, and NO₃⁻ were less reliable (R² ≤ 0.53 over the full dataset), most likely because of complex environmental interactions and low concentrations that limit statistical significance. In the testing phase, LMBP-MLP2 (R² = 0.980, RMSE = 12.840 mg/L, and NSE = 0.978) provided the best accuracy (Mg²⁺), whereas LMBP-MLP6, LMBP-MLP7, and LMBP-MLP8 gave the worst predictions (K⁺, HCO₃⁻, and NO₃⁻). A substantially larger set of measured ion data could markedly improve the testing-phase accuracy for these poorly predicted ions. Finally, the charge balance (CB) analysis indicated an acceptable ionic balance (“Good” or “Moderate”) in 95% of the samples, whereas the remaining 5% were rated “Poor” and require improvement.

Spatial tests at three locations (i.e., Aflou, Madna, and Ain Madhi) showed consistent accuracy for SO₄²⁻, Mg²⁺, and Na⁺; moderate performance for Ca²⁺ and Cl⁻; variable results for K⁺ and HCO₃⁻; and generally poor prediction of NO₃⁻. These results highlight the model’s adaptability to locations such as Aflou and Madna, while also emphasizing the need for more geographically diverse data to improve generalization. Despite these limitations, the algorithm is a useful step toward better water resource management in salinity-affected areas. Because TDS and MIN can be measured directly in the field, the important ions (Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻, and Cl⁻) can be estimated early, offering three key benefits: cost savings, adaptability, and efficiency.

The main limitation of this study is that only a few ANN models, optimized with various learning algorithms, were used to predict major ion concentrations in groundwater from a restricted number of samples. Consequently, the suggested model cannot yet be regarded as a general solution for predicting major ion concentrations in groundwater. Further studies could address this by combining artificial neural networks, evolutionary optimization approaches, and data preprocessing tools to achieve the best possible predictions. In addition, overfitting during training could be mitigated by using more high-quality data that cover the full range of values, including the maximum and minimum [38,40]. The K-fold cross-validation method [39], which is often applied during training, can also help detect overfitting and select model configurations that generalize across different data subsets.
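As an illustration of the K-fold cross-validation mentioned above, the following minimal sketch partitions sample indices into k folds so that each sample is used for validation exactly once. The choice k = 5 and the use of the 153 Aflou syncline samples are illustrative assumptions, `k_fold_indices` is a hypothetical helper, and fitting and scoring an LMBP-MLP on each split is left out. scikit-learn's `KFold` class provides the same kind of splitting in an existing library.

```python
import random
from collections.abc import Iterator


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0
                   ) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(153, k=5)):
    # A model would be trained on train_idx and scored on val_idx here.
    print(f"fold {fold}: {len(train_idx)} training, {len(val_idx)} validation")
```

Averaging the validation scores over the folds gives a less optimistic estimate of generalization than a single split, which is what makes overfitting visible.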

Author Contributions

Conceptualization, M.E.S., A.H. (Abderrahmane Hamimed) and M.Z.; methodology, M.Z.; validation, M.E.S., A.H. (Abderrahmane Hamimed) and S.K.; formal analysis, A.H. (Azzaz Habib), A.H. (Abderrahmane Hamimed) and M.Z.; investigation, A.H. (Azzaz Habib), I.-M.C. and S.K.; data curation, M.E.S. and A.H. (Abderrahmane Hamimed); writing—original draft preparation, M.E.S., A.H. (Azzaz Habib), A.H. (Abderrahmane Hamimed), M.Z., I.-M.C. and S.K.; writing—review and editing, M.Z., I.-M.C. and S.K.; visualization, M.E.S., A.H. (Azzaz Habib) and A.H. (Abderrahmane Hamimed); supervision, S.K.; funding acquisition, I.-M.C. All authors have read and agreed to the published version of the manuscript.

Funding

Research for this paper was carried out under the KICT Research Program (Project No. 20250108-001, Development of IWRM-Korea Technical Convergence Platform Based on Digital New Deal) funded by the Ministry of Science and ICT. This work was also supported by the Korea Environmental Industry & Technology Institute’s Drought Response Water Management Innovation Technology Development Project funded by the Ministry of Environment (2020361002).

Data Availability Statement

The data presented in this study will be made available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TDS	Total dissolved solids
MIN	Mineralization
MLP	Multilayer perceptron
LMBP	Levenberg–Marquardt backpropagation
AI	Artificial intelligence
WQI	Water quality index
ANN	Artificial neural networks
EC	Electrical conductivity
RBF-NN	Radial basis function neural networks
PNN	Probabilistic neural networks
FCNN	Feedforward connected neural networks
R²	Coefficient of determination
RMSE	Root mean square error
NSE	Nash–Sutcliffe efficiency
CB	Charge balance
LMBP-MLP	Levenberg–Marquardt backpropagation multilayer perceptron

References

UNESCO. The United Nations World Water Development Report 2018-Nature-Based Solutions for Water; UN: New York, NY, USA, 2019.
Boretti, A.; Rosa, L. Reassessing the projections of the world water development report. NPJ Clean Water 2019, 2, 15.
Canton, H. Food and agriculture organization of the United Nations—FAO. In The Europa Directory of International Organizations 2021; Routledge: Oxfordshire, UK, 2021; pp. 297–305.
Hamed, Y.; Hadji, R.; Redhaounia, B.; Zighmi, K.; Bâali, F.; El Gayar, A. Climate impact on surface and groundwater in North Africa: A global synthesis of findings and recommendations. Euro-Mediterr. J. Environ. Integr. 2018, 3, 25.
Bioud, I.; Semar, A.; Laribi, A.; Douaibia, S.; Chabaca, M.N. Assessment of groundwater quality and its suitability for irrigation: The case of Souf Valley phreatic aquifer. Alger. J. Environ. Sci. Technol. 2023, 9, 1429–1441.
Shiri, N.; Shiri, J.; Yaseen, Z.M.; Kim, S.; Chung, I.M.; Nourani, V.; Zounemat-Kermani, M. Development of artificial intelligence models for well groundwater quality simulation: Different modeling scenarios. PLoS ONE 2021, 16, e0251510.
Alizamir, M.; Ahmed, K.O.; Kim, S.; Heddam, S.; Gorgij, A.D.; Chang, S.W. Development of a robust daily soil temperature estimation in semi-arid continental climate using meteorological predictors based on computational intelligent paradigms. PLoS ONE 2023, 18, e0293751.
Lopes, M.B.S. The 2017 World Health Organization classification of tumors of the pituitary gland: A summary. Acta Neuropathol. 2017, 134, 521–535.
Khadra, F.W.; El Sibai, R.; Khadra, W.M. Deriving groundwater major ions from electrical conductivity using artificial neural networks supported by analytical hydrochemical solutions. Groundw. Sustain. Dev. 2024, 24, 101056.
Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr, A.D.; et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 2022, 489, 271–308.
Khudair, B.H.; Jasim, M.M.; Alsaqqar, A.S. Artificial neural network model for the prediction of groundwater quality. Civ. Eng. J. 2018, 4, 2959–2970.
Setshedi, K.J.; Mutingwende, N.; Ngqwala, N.P. The use of artificial neural networks to predict the physicochemical characteristics of water quality in three district municipalities, eastern cape province, South Africa. Int. J. Environ. Res. Public Health 2021, 18, 5248.
Stylianoudaki, C.; Trichakis, I.; Karatzas, G.P. Modeling groundwater nitrate contamination using artificial neural networks. Water 2022, 14, 1173.
Allawi, M.F.; Al-Ani, Y.; Jalal, A.D.; Ismael, Z.M.; Sherif, M.; El-Shafie, A. Groundwater quality parameters prediction based on data-driven models. Eng. Appl. Comput. Fluid Mech. 2024, 18, 2364749.
Mateo, L.F.; Más-López, M.I.; García-del-Toro, E.M.; García-Salgado, S.; Quijano, M.Á. Artificial Neural Networks to Predict Electrical Conductivity of Groundwater for Irrigation Management: Case of Campo de Cartagena (Murcia, Spain). Agronomy 2024, 14, 524.
Al-Sulttani, A.O.; Ali, S.K.; Abdulhameed, A.A.; Jassim, D.T. Artificial Neural Network Assessment of Groundwater Quality for Agricultural Use in Babylon City: An Evaluation of Salinity and Ionic Composition. Int. J. Des. Nat. Ecodyn. 2024, 19, 329–336.
Sekkoum, M.; Safa, A.; Stamboul, M. Groundwater hydrochemistry of Aflou syncline, Central Saharan Atlas of Algeria. Desalin. Water Treat. 2020, 190, 424–439.
Cerlini, P.B.; Silvestri, L.; Meniconi, S.; Brunone, B. Simulation of the water table elevation in shallow unconfined aquifers by means of the ERA5 soil moisture dataset: The Umbria region case study. Earth Interact. 2021, 25, 15–32.
Kim, S.; Cho, J.S.; Park, J.K. Hydrological analysis using the neural networks in the parallel reservoir groups, South Korea. In World Water & Environmental Resources Congress; American Society of Civil Engineers: Reston, VA, USA, 2003.
Kim, S.; Seo, Y.; Lee, C.J. Modeling of rainfall by combining neural computation and wavelet technique. Procedia Eng. 2016, 154, 1231–1236.
Zakhrouf, M.; Bouchelkia, H.; Stamboul, M.; Kim, S.; Heddam, S. Time series forecasting of river flow using an integrated approach of wavelet multi-resolution analysis and evolutionary data-driven models. A Case Study: Sebaou River (Algeria). Phys. Geogr. 2018, 39, 506–522.
Hagan, M.T.; Demuth, H.B.; Beale, M. Neural Network Design; PWS Publishing Co., Ltd.: Worcester, UK, 1997.
Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice-Hall Inc.: Upper Saddle River, NJ, USA, 1999.
Kim, S.; Lee, S. Forecasting of flood stage using neural networks in the Nakdong river, South Korea. In Watershed Management and Operations Management; American Society of Civil Engineers: Reston, VA, USA, 2000.
Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
Zakhrouf, M.; Bouchelkia, H.; Stamboul, M.; Kim, S.; Singh, V.P. Implementation on the evolutionary machine learning approaches for streamflow forecasting: Case study in the Seybous River, Algeria. J. Korea Water Resour. Assoc. 2020, 53, 395–408.
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999.
Elmeddahi, Y.; Ragab, R. Prediction of the groundwater quality index through machine learning in Western Middle Cheliff plain in North Algeria. Acta Geophys. 2022, 70, 1797–1814.
Ahlgren, P.; Jarneving, B.; Rousseau, R. Requirements for a cocitation similarity measure, with special reference to Pearson’s correlation coefficient. JASIST 2003, 54, 550–560.
Kim, S.; Seo, Y.; Malik, A.; Kim, S.; Heddam, S.; Yaseen, Z.M.; Kisi, O.; Singh, V.P. Quantification of river total phosphorus using integrative artificial intelligence models. Ecol. Indic. 2023, 153, 110437.
Seo, Y.; Kim, S.; Singh, V.P. Physical interpretation of river stage forecasting using soft computing and optimization algorithms. In Harmony Search Algorithm: Proceedings of the 2nd International Conference on Harmony Search Algorithm (ICHSA2015); Springer: Berlin/Heidelberg, Germany, 2016; pp. 259–266.
Alizamir, M.; Gholampour, A.; Kim, S.; Keshtegar, B.; Jung, W.T. Designing a reliable machine learning system for accurately estimating the ultimate condition of FRP-confined concrete. Sci. Rep. 2024, 14, 20466.
Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
Kim, S.; Singh, V.P.; Lee, C.J.; Seo, Y. Modeling the physical dynamics of daily dew point temperature using soft computing techniques. KSCE J. Civ. Eng. 2015, 19, 1930–1940.
Reed, M.H. Calculation of multicomponent chemical equilibria and reaction processes in systems involving minerals, gases and an aqueous phase. Geochim. Cosmochim. Acta 1982, 46, 513–528.
Stuyfzand, P.J. Hydrogeochemical (HGC 2.1), for Storage, Management, Control, Correction and Interpretation of Water Quality Data in Excel® Spread Sheet; KWR-Rapport B111698-002; KWR: Nieuwegein, The Netherlands, 2012.
Kim, S.; Kim, H.S. Uncertainty reduction of the flood stage forecasting using neural networks model. JAWRA J. Am. Water Resour. Assoc. 2008, 44, 148–165.
Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146.
Gu, Y.; Wylie, B.K.; Boyte, S.P.; Picotte, J.; Howard, D.M.; Smith, K.; Nelson, K.J. An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data. Remote Sens. 2016, 8, 943.
Kisi, O.; Alizamir, M.; Trajkovic, S.; Shiri, J.; Kim, S. Solar radiation estimation in Mediterranean climate by weather variables using a novel Bayesian model averaging and machine learning methods. Neural Process. Lett. 2020, 52, 2297–2318.

Figure 1. Geographic map of study area.

Figure 2. Heatmap of statistical associations based on Pearson correlation coefficient (PCC).

Figure 3. The performance of training algorithms.

Figure 4. Effect of difference in number of hidden neurons on model accuracy in validation phase (a) LMBP-MLP1 (SO₄²⁻), (b) LMBP-MLP2 (Mg²⁺), (c) LMBP-MLP3 (Na⁺), (d) LMBP-MLP4 (Ca²⁺), (e) LMBP-MLP5 (Cl⁻), (f) LMBP-MLP6 (K⁺), (g) LMBP-MLP7 (HCO₃⁻), and (h) LMBP-MLP8 (NO₃⁻).

Figure 5. Flowchart of modeling process.

Figure 6. Comparison between predicted and measured values of all major ions for all datasets. (a) LMBP-MLP1 (SO₄²⁻), (b) LMBP-MLP2 (Mg²⁺), (c) LMBP-MLP3 (Na⁺), (d) LMBP-MLP4 (Ca²⁺), (e) LMBP-MLP5 (Cl⁻), (f) LMBP-MLP6 (K⁺), (g) LMBP-MLP7 (HCO₃⁻), and (h) LMBP-MLP8 (NO₃⁻).

Figure 7. The sorted charge balance (CB) values of all of the samples utilizing the predicted ions.

Figure 8. Comparison of developed models’ performance in adjacent locations (Aflou, Madna, and Ain Madhi).

Table 1. Features and output variables for developed ANN models of all major ions.

ANN Model	Features	Output
LMBP-MLP1	TDS, MIN	SO₄²⁻
LMBP-MLP2	TDS, MIN, SO₄²⁻	Mg²⁺
LMBP-MLP3	TDS, MIN, SO₄²⁻	Na⁺
LMBP-MLP4	TDS, MIN, SO₄²⁻, Na⁺, Mg²⁺	Ca²⁺
LMBP-MLP5	TDS, MIN, SO₄²⁻, Na⁺	Cl⁻
LMBP-MLP6	TDS, MIN, SO₄²⁻, Na⁺	K⁺
LMBP-MLP7	TDS, MIN, Mg²⁺	HCO₃⁻
LMBP-MLP8	TDS, MIN, Mg²⁺	NO₃⁻
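Table 1 implies that the models form a cascade: with only TDS and MIN available in the field, the predicted SO₄²⁻ has to be fed to the Mg²⁺ and Na⁺ models, whose predictions in turn feed the Ca²⁺, Cl⁻, K⁺, HCO₃⁻, and NO₃⁻ models. The sketch below assumes each trained network is available as a Python callable that takes its inputs in the Table 1 order, and derives a valid evaluation order with the standard-library `graphlib.TopologicalSorter`. The names `FEATURES`, `prediction_order`, and `predict_all` are hypothetical, and whether the authors chained predictions in exactly this way at inference time is an assumption.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Inputs of each LMBP-MLP model, as listed in Table 1.
FEATURES: dict[str, list[str]] = {
    "SO4": ["TDS", "MIN"],                     # LMBP-MLP1
    "Mg": ["TDS", "MIN", "SO4"],               # LMBP-MLP2
    "Na": ["TDS", "MIN", "SO4"],               # LMBP-MLP3
    "Ca": ["TDS", "MIN", "SO4", "Na", "Mg"],   # LMBP-MLP4
    "Cl": ["TDS", "MIN", "SO4", "Na"],         # LMBP-MLP5
    "K": ["TDS", "MIN", "SO4", "Na"],          # LMBP-MLP6
    "HCO3": ["TDS", "MIN", "Mg"],              # LMBP-MLP7
    "NO3": ["TDS", "MIN", "Mg"],               # LMBP-MLP8
}


def prediction_order(features: dict[str, list[str]]) -> list[str]:
    """Order the ion models so that every input exists before it is used."""
    graph = {ion: set(inputs) for ion, inputs in features.items()}
    order = TopologicalSorter(graph).static_order()
    # TDS and MIN appear as source nodes; keep only ions predicted by a model.
    return [node for node in order if node in features]


def predict_all(tds: float, mineralization: float,
                models: dict[str, Callable[[list[float]], float]],
                features: dict[str, list[str]] = FEATURES) -> dict[str, float]:
    """Run the cascade from the two field measurements to all eight ions."""
    values = {"TDS": tds, "MIN": mineralization}
    for ion in prediction_order(features):
        inputs = [values[name] for name in features[ion]]
        values[ion] = models[ion](inputs)
    return {ion: values[ion] for ion in features}


# Stand-in "models" that just sum their inputs, to show the data flow only.
stand_in = {ion: sum for ion in FEATURES}
print(predict_all(1.0, 1.0, stand_in))
# {'SO4': 2.0, 'Mg': 4.0, 'Na': 4.0, 'Ca': 12.0, 'Cl': 8.0, 'K': 8.0, 'HCO3': 6.0, 'NO3': 6.0}
```

In practice each entry of `models` would be a trained LMBP-MLP network; a consequence of the cascade is that errors in the upstream SO₄²⁻, Mg²⁺, and Na⁺ predictions propagate to the downstream ions.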

Table 2. Results of developed ANN models for all major ions.

ANN Model	Output	Training R²	Training RMSE (mg/L)	Training NSE	Validation R²	Validation RMSE (mg/L)	Validation NSE	Test R²	Test RMSE (mg/L)	Test NSE	All R²	All RMSE (mg/L)	All NSE
LMBP-MLP1	SO₄²⁻	0.923	65.730	0.920	0.964	56.970	0.962	0.842	53.660	0.840	0.936	63.368	0.930
LMBP-MLP2	Mg²⁺	0.921	14.890	0.918	0.943	11.800	0.936	0.980	12.840	0.978	0.924	14.274	0.910
LMBP-MLP3	Na⁺	0.916	20.230	0.915	0.927	17.270	0.926	0.759	14.960	0.754	0.916	19.346	0.910
LMBP-MLP4	Ca²⁺	0.867	21.990	0.864	0.887	23.510	0.878	0.945	36.460	0.941	0.892	24.034	0.889
LMBP-MLP5	Cl⁻	0.865	44.640	0.857	0.902	43.600	0.898	0.895	30.530	0.892	0.872	43.296	0.870
LMBP-MLP6	K⁺	0.533	2.990	0.535	0.601	2.850	0.531	0.045	6.480	0.003	0.441	3.482	0.440
LMBP-MLP7	HCO₃⁻	0.300	64.250	0.301	0.630	37.760	0.540	0.366	41.720	0.361	0.330	59.029	0.320
LMBP-MLP8	NO₃⁻	0.325	43.400	0.325	0.865	40.870	0.823	0.004	40.460	−0.933	0.523	41.886	0.510
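The three scores in Table 2 can be computed as sketched below. The section does not restate their formulas, so this sketch assumes that R² is the squared Pearson correlation between observed and predicted values and that NSE follows Nash and Sutcliffe, NSE = 1 − Σ(O − P)² / Σ(O − Ō)². The R² assumption is consistent with the NO₃⁻ testing row (R² = 0.004 but NSE = −0.933), since a squared correlation cannot be negative while NSE can. The toy example shows how a perfectly correlated but biased prediction yields R² = 1 and yet a negative NSE.

```python
import math


def pearson_r2(obs: list[float], pred: list[float]) -> float:
    """Squared Pearson correlation coefficient (assumed definition of R²)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def rmse(obs: list[float], pred: list[float]) -> float:
    """Root mean square error, in the units of the data (mg/L here)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs: list[float], pred: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of obs."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Toy data: predictions perfectly correlated with observations but offset by +2.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 4.0, 5.0, 6.0]
print(f"R2={pearson_r2(observed, predicted):.3f} "
      f"RMSE={rmse(observed, predicted):.3f} NSE={nse(observed, predicted):.3f}")
# R2=1.000 RMSE=2.000 NSE=-2.200
```

Reporting NSE alongside R² therefore guards against systematic bias that a correlation-based score alone would not reveal.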

Table 3. The values of charge balance (CB) and the evaluation of samples (Aflou).

Location	SO₄²⁻ Pred.	NO₃⁻ Meas.	HCO₃⁻ Meas.	Cl⁻ Pred.	Ca²⁺ Pred.	Mg²⁺ Pred.	Na⁺ Pred.	K⁺ Pred.	CB %	Evaluation
Aflou	105	5	240	45	88	23	23	7	0.03	Good
	410	30	326	135	152	68	82	7	3.54	Good
	906	15	273	210	292	146	168	14	7.45	Moderate
	393	14	239	190	153	61	86	7	3.23	Good

Table 4. The values of charge balance (CB) and the evaluation of samples (Ain Madhi).

Location	SO₄²⁻ Pred.	NO₃⁻ Meas.	HCO₃⁻ Meas.	Cl⁻ Meas.	Ca²⁺ Pred.	Mg²⁺ Pred.	Na⁺ Pred.	K⁺ Meas.	CB %	Evaluation
Ain Madhi	434	9	237	145	206	83	102	5	11.60	Poor
	426	10	232	145	206	80	104	5	11.90	Poor
	123	13	212	70	95	27	28	2	0.07	Good
	124	16	185	93	95	27	28	2	1.57	Good
	1677	4	237	400	177	298	270	15	4.87	Good
	1227	10	217	370	576	173	247	12	15.28	Poor
	291	13	247	240	187	38	140	6	4.51	Good
	281	2	241	205	183	38	131	6	7.40	Moderate
	352	5	162	220	163	50	104	15	2.65	Good
	257	34	144	155	128	41	58	6	0.77	Good

Table 5. The values of charge balance (CB) and the evaluation of samples (Madna).

Location	SO₄²⁻ Pred.	NO₃⁻ Meas.	HCO₃⁻ Pred.	Cl⁻ Meas.	Ca²⁺ Pred.	Mg²⁺ Pred.	Na⁺ Pred.	K⁺ Meas.	CB %	Evaluation
Madna	888	7	230	354	199	142	221	12	1.28	Good
	903	84	281	250	236	148	209	14	2.44	Good
	631	54	263	257	185	106	155	12	1.11	Good
	437	17	247	198	213	82	107	7	7.77	Moderate
	629	71	266	250	181	104	158	14	1.64	Good
	65	3	167	29	73	16	14	4	6.72	Moderate

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Stamboul, M.E.; Habib, A.; Hamimed, A.; Zakhrouf, M.; Chung, I.-M.; Kim, S. Extraction of Major Groundwater Ions from Total Dissolved Solids and Mineralization Using Artificial Neural Networks: A Case Study of the Aflou Syncline Region, Algeria. Hydrology 2025, 12, 103. https://doi.org/10.3390/hydrology12050103
