A Novel Hybrid Model for Developing Groundwater Potentiality Model Using High Resolution Digital Elevation Model (DEM) Derived Factors

Abstract: The present work aims to build a novel hybrid model by combining six fuzzy operator feature selection-based techniques with logistic regression (LR) for producing groundwater potential models (GPMs) utilising high resolution DEM-derived parameters in Saudi Arabia's Bisha area. The current work focuses exclusively on the influence of DEM-derived parameters on GPM modelling, without considering other variables. The six hybrid models based on fuzzy feature selection are AND, OR, GAMMA 0.75, GAMMA 0.8, GAMMA 0.85, and GAMMA 0.9. The GPMs were validated by using empirical and binormal receiver operating characteristic (ROC) curves. A random forest (RF)-based sensitivity analysis was performed in order to examine the influence of the GPM parameters. The six hybrid models and the novel hybrid model predicted 1835–2149 km² as very high and 3235–4585 km² as high groundwater potential regions. Among the six fuzzy-based models, the AND model (ROCe-AUC: 0.81; ROCb-AUC: 0.804) outperformed the others based on the ROC's area under the curve (AUC). A novel hybrid model was constructed by combining the six GPMs (treated as variables) with the LR model. The AUC of ROCe and ROCb revealed that the novel hybrid model outperformed the fuzzy-based GPMs (ROCe: 0.866; ROCb: 0.892). With DEM-derived parameters, the present work will help to improve the effectiveness of GPMs for developing sustainable groundwater management plans.


Introduction
A large portion of the population around the globe faces a shortage of clean drinking water despite living on the blue Earth, mostly due to (1) the fact that only 2.5 percent of the water is drinkable; (2) the increased intake of green water (about 70% presently); and (3) the paucity of data on the possible sources of groundwater at a micro-spatial level [1]. Irrigation, domestic, municipal, and industrial sectors have all increased their dependence on groundwater because it is easily available and is a healthy water source [2,3]. Agricultural intensity in developing countries such as India and Bangladesh has been gradually growing in order to satisfy increasing demand for food crops. Because of the region's monsoon climate, rainfall occurs only during the monsoon season, which lasts from June to September. During non-monsoon seasons, farmers rely on groundwater for cultivation [4,5]. According to the Central Groundwater Board (CGWB, 2014), 89% of overall groundwater is harvested for irrigation, and dependence has risen unexpectedly in the urban sector [6–8]. India's yearly renewable groundwater potential is estimated to be about 433 billion cubic metres (BCM), with 399 BCM of water available according to recent estimates [9,10], but the demand is several times higher than the supply. In many regions of the nation, such a scenario generally results in a decline of the groundwater table. Furthermore, India's groundwater generation rate is approximately 50% [9–13]. The Central Groundwater Board (CGWB, 2014) has compiled a list of all of the blocks in the Malda district of West Bengal, India, where groundwater production is found to be semi-critical to critical. In the midst of this crisis, it is critical to look for potential groundwater zones in order to raise water supply and conserve water [14–16].
The identification of potential groundwater zones and the calculation of groundwater availability are essential but complex because many conditioning variables are directly and indirectly involved [17,18]. One of the best methods for this is scientific aquifer mapping, which is almost nonexistent in most developing countries [19,20]. Geographic information systems (GIS) and remote sensing have ushered in a new era in this area, allowing multi-parametric research [14,16,21–23]. The choice of conditioning variables and the use of an efficient integration approach are crucial to effective modelling [10,24–28]. Table 1 shows that groundwater potentiality conditioning factors such as soil texture, groundwater level, annual rainfall, Normalized Difference Vegetation Index (NDVI), geology, land use land cover, elevation, slope, aspect, curvature, topographic wetness index (TWI), Terrain Ruggedness Index (TRI), stream power index (SPI), distance to river, and others have been widely used [16,21,22,29–32]. Elevation, slope, and rainfall have been identified as paramount parameters in the plains, while geology, lineament, and the other factors listed above have been identified as paramount variables in mountainous areas [24,33–36]. It is worth noting that not all of the variables are necessary for all spatial units. Drainage density, for example, may be an effective parameter in flood plains, but it may not be so in mountainous areas with several first and second order ephemeral streams. As a consequence, caution must be exercised when selecting parameters for modelling the research unit's spatial characteristics [37]. Groundwater potential modelling relies on accurate data and applicable models [38]. Groundwater potentiality has been studied by using a number of methods, including physical, heuristic, and mathematical approaches [39].
Physically based methods assess groundwater potential by analyzing topographical structure and geological conditions [40,41]. Since they require very precise topographic details, these methods are typically used for small areas. Heuristic approaches rely on expert judgement to determine the probability of groundwater potentiality zones. According to Regmi et al. [42], such methods are strongly based on professional knowledge and, in general, achieve moderate precision. Groundwater potential modelling has favoured statistically based models such as statistical index (SI) [43], logistic regression (LR) [15,43], evidential belief function (EBF) [44], probability-frequency ratio (FR) [45,46], certainty factors (CF) [47–49], weight of evidence (WoE) [50,51], and index of entropy (IoE) [38,52] over the two groups mentioned above. These approaches are more objective and quantitative, since they are based on existing groundwater occurrence locations and contributing factors. Standard statistical methods, on the other hand, are limited in their ability to predict the dynamic and nonlinear interactions between groundwater and the conditioning variables [53]. Machine learning is therefore taken into account, since no single approach or methodology is universally suitable for all fields and study areas.
A vast number of groundwater-related data are becoming more readily available thanks to the rapid advancement of remote sensing technology. The majority of research used Big Data to model groundwater potentiality by using machine learning techniques since these approaches are capable of analysing the dynamic connection between groundwater potential and influencing parameters [54]. According to Jordan and Mitchell [55], machine learning is an artificial intelligence branch that employs computational algorithms to analyse and predict data by learning from training data. According to a review of the literature, artificial neural networks [56,57], neuro-fuzzy [58], decision trees [59,60], and support vector machines [44,47,61,62] have all been used to assess groundwater potential. While it is obvious that machine learning algorithms enhance prediction accuracy of regional groundwater availability, the generalisation efficiency of single classifiers also needs to be improved [63]. However, until now, groundwater researchers have been unable to agree on an appropriate model for assessing groundwater potentiality [62]. As a result, a number of ensemble methods have recently gained popularity in geohazard susceptibility and potentiality mapping [47,64,65].
The fundamental difference between the above-mentioned techniques is whether they treat data objectively or subjectively. The choice of technique is based on two factors: the goal and scope of the analysis, and the availability and quality of the data. Because the two kinds of technique are rarely employed together, much of the available data is underutilized; moreover, knowledge-driven methods are prone to bias when assigning weights to the parameters.
However, our goal was to create a hybrid model that combines the two methods in a holistic manner by employing a data-driven method characterized by its simplicity and easily interpreted weights, one that can derive weights for all parameters without bias. Therefore, in the present study, we have used the information gain ratio technique, which can produce weights for all parameters without bias. The use of a multi-model approach and hybrid modelling to research susceptibility, vulnerability, risks, potentiality, and other topics is a relatively new development [37,66–69].
In recent years, experimental hybrid approaches for groundwater potentiality research have been considered since there is a need for modern predictive approaches and techniques to be explored in order to acquire more scientific history for drawing fair conclusions [46,56,70]. For groundwater potentiality modelling, a variety of hybrid approaches have been successfully used, which were created by integrating statistical techniques with machine learning approaches, such as ANN-fuzzy logic [71], rough set-SVM [72], the adaptive neuro-fuzzy inference system (ANFIS), stepwise weighted assessment ratio analysis (SWARA) technique [73], EBF-fuzzy logic [74], and ANFIS combined with a frequency ratio [65].
Therefore, in the present study, we have used the information gain ratio technique for assigning weights to the parameters, and then different fuzzy logic operators have been used for integrating the fuzzified weighted parameters. In this manner, the hybrid fuzzy models have been developed. Furthermore, a novel hybrid model has been generated by integrating all fuzzy-based hybrid models with logistic regression. A key capability of hybrid methods in groundwater potentiality research is the reduction in modelling error, which can yield robust groundwater potentiality models. LR has been widely applied in the literature to predict landslides, floods, groundwater potential, and other natural hazards, and previous studies report that it performs well for such predictions. In the present study, however, LR has been applied differently in order to obtain a highly accurate GPM: it is applied to the already generated hybrid GPMs, which are treated as predictors in place of the groundwater potential conditioning parameters. The rationale is to reduce prediction error. No model reproduces real-world conditions perfectly, so some errors occur during the modelling process. To reduce these errors, many researchers have combined several models into ensemble models, although ensemble models cannot be claimed to predict groundwater potential without error. Therefore, in the present study, we used fuzzy hybrid models and then combined all of them through the LR model. Consequently, some errors of the individual models can be reduced by combining all models with the training datasets. Thus, a novel hybrid ensemble model for groundwater potentiality prediction has been developed.
This capability may boost the popularity of this method and aid researchers in future groundwater potentiality research.
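The stacking idea described above, in which the six fuzzy GPM outputs act as predictors for LR, can be sketched as follows. This is a minimal illustration under stated assumptions: the membership values in `X`, the learning rate, the epoch count, and the helper names `fit_logistic` and `predict` are hypothetical, and plain batch gradient descent stands in for whichever LR solver was actually used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Probability that a location belongs to the groundwater class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical values of the six fuzzy GPMs (AND, OR, GAMMA 0.75, 0.8, 0.85, 0.9)
# sampled at four training points; y = 1 for well points, 0 for non-well points.
X = [[0.80, 0.90, 0.85, 0.86, 0.87, 0.88],
     [0.70, 0.95, 0.80, 0.82, 0.84, 0.86],
     [0.20, 0.40, 0.30, 0.31, 0.32, 0.33],
     [0.10, 0.50, 0.20, 0.22, 0.25, 0.28]]
y = [1, 1, 0, 0]

w, b = fit_logistic(X, y)
for row, target in zip(X, y):
    print(target, round(predict(w, b, row), 3))
```

Applying `predict` to the six GPM values of every raster cell would give the combined potentiality surface; the fitted weights indicate how much each fuzzy GPM contributes to it.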
Furthermore, previous comparable research paid little attention to the sensitivity study of thematic layers. After constructing hybrid models, thematic layers in this analysis were subjected to sensitivity tests. In order to increase the accuracy of the model's predictions, the most influential thematic layers were calculated by using multiple machine learning-based sensitivity analyses. This methodology is used to reduce uncertainties in other RS/GIS-based models, such as soil erosion susceptibility [75,76], landslide vulnerability [77], and soil property estimation [15,78]. The RF based sensitivity analyses have been used in the research study to classify the significant thematic layers produced by the model. Furthermore, both parametric and non-parametric ROC curves were used to evaluate the model's performance. Very few studies have applied both the parametric and non-parametric ROC curves for validation. As a result, this research study will have a major impact on the sustainable management of groundwater.
Some research gaps have been identified based on the existing literature. For modelling groundwater potentiality, there is no universally accepted precise and accurate technique. As a result, new methods for predicting robust groundwater potential models must be created, validated, and applied. Validation of produced models is also lacking.
Therefore, in order to address these research gaps, the study's principal goals are the following: (1) carefully investigate the topographic data supplied by the DEM by computing fourteen distinct DEM-derived conditioning factors (CFs); (2) develop a novel hybrid algorithm-based GPM by integrating several fuzzy operator feature selection-based hybrid models with logistic regression; (3) perform sensitivity analysis using RF; and (4) validate the groundwater potentiality models by using parametric and non-parametric ROC curves. This research study would assist planners, regulators, lawmakers, and municipal governments in managing groundwater properly. The following are the contributions of this work:

• General: The study adds to the robustness of expertise by designing and applying methods to a previously unexplored GPM and sensitivity analysis field.
• Regional: Improved understanding of groundwater potentiality mapping in the Bisha watershed of Saudi Arabia. The findings of this study would provide a solid foundation for earth scientists, elected officials, and partners in enhancing land management and catastrophe management.
• Methodical: An LR-based hybrid model is proposed that treats six fuzzy hybrid models as inputs for groundwater potential mapping. An RF-based sensitivity model was developed for evaluating the influence of the parameters.
The remainder of this work is structured in the following manner. The materials and methods are addressed in Section 2. Sections 3 and 4 contain the results and discussion. Section 5 presents the concluding remarks.

Study Area
The Bisha watershed, with an area of 21,260 km², is located in Saudi Arabia's southwestern region and shares a border with Yemen. The Bisha watershed boundaries are found north of the equator, between 17°59′27.588″ N and 20°49′13.958″ N, and east of the Greenwich meridian, between 41°49′50.825″ E and 43°11′20.254″ E (Figure 1). It has a diverse landscape, with highlands, high mountains (between 2000 and 3000 m MSL), plateaus, and Wadiyan. It also includes a large area of the desert to the north and east, stretching all the way to Bisha and Tathlith. The elevation varies between 950 and 2980 m, with a mean of 1655 m. As per the geological settings [79], it is made up of Jurassic-Cretaceous sedimentary rocks (limestone, sandstone, and shale) and Precambrian granite igneous rocks. The region's climate varies significantly according to geography and season. The climate varies from semi-arid in the southern regions to arid in the north. Three meteorological stations operated by Saudi Arabia's Presidency of Meteorology and Environment (PME) serve the study region, namely Abha, Khamis Mushyet, and Bisha. The average temperature in the last 30 years, according to data from the three stations, has ranged between 12 °C and 44 °C. The south-west monsoon brings variable rainfall to the highlands of this region [80]. The precipitation results from orographic convection over the scarp in the south-western region of Saudi Arabia, especially during the spring and summer, and is spread out over 2–4 months (March–June), whereas rainfall for the remainder of the period is insignificant [81]. The annual average rainfall is 245 mm. Rainfall exceeding 200 mm per annum is limited to a 20–30 km wide crest zone. Consequently, eastward and northward Wadi flow decreases rapidly downstream, and deposition is greater than erosion near the eastern edge of the plateau.
The rugged landscape of the watershed has aided in biodiversity conservation in the region. The Afromontane, which is part of the watershed, has a corresponding phytogeographic area [82]. The watershed's highland is surrounded by woodlands and Juniperus procera, which are home to many endemic and rare flora and fauna [83].

Materials
The groundwater potentiality models for this study were prepared by using fourteen groundwater conditioning parameters. All of these parameters have been extracted from high resolution ALOS PALSAR DEM (spatial resolution: 12.5 m). The DEM has been collected from Earth data of NASA (https://asf.alaska.edu/ (accessed on 7 August 2021)).

Groundwater Potentiality Inventory
For groundwater potentiality (GWP) mapping, several researchers have utilized the positions of springs, wells, and qanats for the inventory. Well points were taken into account for GWP in this study. The study region's inventory map includes 50 well points collected from various sources and detailed site inspection. Non-groundwater data comparable to the groundwater data utilized for GWP modelling must also be prepared [84]. An equivalent number of non-groundwater points (50 points) was selected on the basis of the field survey, which gives 100 points in total. All groundwater and non-groundwater points were randomly divided into a training (calibration) dataset of 80% (80 points) and a testing dataset of 20% (20 points). Model calibration is performed with the training data, while model validation is performed with the testing data [22]. Similarly, inventory maps for other areas have been developed.
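A random 80/20 partition of the inventory of this kind can be sketched as below. The point identifiers, the seed, and the function name `split_inventory` are hypothetical, and the split is stratified by class (an assumption; the text only states that the separation was random) so that wells and non-wells stay balanced in both parts.

```python
import random

def split_inventory(points, labels, train_fraction=0.8, seed=42):
    """Randomly split inventory points into training and testing sets,
    keeping the well / non-well balance in each part (stratified split)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += [(points[i], cls) for i in idx[:cut]]
        test += [(points[i], cls) for i in idx[cut:]]
    return train, test

# Hypothetical inventory: 50 well points (label 1) and 50 non-well points (label 0).
points = [f"P{i:03d}" for i in range(100)]
labels = [1] * 50 + [0] * 50

train, test = split_inventory(points, labels)
print(len(train), len(test))  # 80 20
```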

Methods for Preparing Groundwater Potentiality Conditioning Factors
Because it requires multiple variables related to topography and hydrology in a geospatial layout, the architecture of the spatial groundwater potentiality model is typically very complex. As a result, identifying the variables that affect groundwater potentiality is critical, and scientifically selected criteria support the accuracy of groundwater potentiality maps. Considering the extensive literature review, data availability, and technical setup for GWP modelling in the current study area, fourteen groundwater potentiality influencing parameters were selected: elevation, aspect, TWI, SPI, STI, TRI, TPI, slope, profile curvature, plan curvature, convergence index, topographic features, flow accumulation, and flow direction. All contributing variables have a 12.5 m spatial resolution. In the present study, we used R studio version 4.1.1, ArcGIS version 10.5, QGIS version 3.2, WEKA 3.9, GRASS GIS version 7.4.1, and SAGA GIS version 7.8.2 for machine learning, graphical work, map preparation, and generation of the parameters from the DEM.

Elevation
Elevation is one of the most important elements in GWP modelling [19]. Groundwater potentiality is inversely related to elevation: Chen et al. (2020) reported that the probability of groundwater occurrence decreases as elevation increases, and vice versa. Much of the study area is characterized by relatively flat, low-lying terrain; consequently, groundwater potentiality is to be expected in the region (Figure 2a).

Slope
The slope, which influences the speed of running water, is also a significant element influencing the GWP [19,85]. It regulates the amount of water that accumulates in a given region and, thus, plays an important role in groundwater recharge. The study area is characterized by gentle slopes and flat areas, which favour groundwater recharge. Therefore, most of the study area has high potential for groundwater (Figure 2b).

LS Factor
The LS factor combines the slope length (L) and steepness (S) of the terrain, which affect the quantity of groundwater storage (Figure 2c). LS is computed with the equation given in [86], where fa denotes flow accumulation and θ denotes the slope in degrees.
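The LS equation itself is not reproduced in the text. As an illustration only, the sketch below uses a form that is widely applied in GIS practice and is consistent with the two symbols defined above, LS = (fa × cell size / 22.13)^0.4 × (sin θ / 0.0896)^1.3. The constants, the exponents, and the function name `ls_factor` are assumptions and may differ from the expression in [86]; the 12.5 m cell size is taken from the DEM resolution stated earlier.

```python
import math

def ls_factor(flow_acc, slope_deg, cell_size=12.5):
    """LS from flow accumulation (in cells) and slope (in degrees).

    Assumed form (not confirmed by the source):
    LS = (fa * cell_size / 22.13)**0.4 * (sin(theta) / 0.0896)**1.3
    """
    theta = math.radians(slope_deg)
    length_term = (flow_acc * cell_size / 22.13) ** 0.4
    steepness_term = (math.sin(theta) / 0.0896) ** 1.3
    return length_term * steepness_term

# Steeper slopes and larger contributing areas both raise LS.
for fa, slope in [(10, 5), (10, 20), (500, 20)]:
    print(fa, slope, round(ls_factor(fa, slope), 2))
```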

TRI
The TRI is one of the most important conditioning variables for GWP modelling [19,20,23]. It is used to measure landscape heterogeneity, since it represents the average difference in elevation between a central pixel and its neighbouring cells, which affects drainage. The greater the TRI value, the greater the elevation difference between neighbouring cells. Groundwater is mainly found where elevation differences are small, so low TRI values indicate greater groundwater potential. In this study, the TRI value ranges from 0 to 29 (Figure 2d).
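A minimal sketch of the definition given above, the average elevation difference between a cell and its eight neighbours, is shown below for an interior cell of a small grid. The function name `tri` and the sample grids are hypothetical; note that GIS packages often implement other TRI variants (for example, based on squared differences), so values from the software used in the study may differ.

```python
def tri(dem, r, c):
    """Mean absolute elevation difference between interior cell (r, c)
    and its 8 neighbours."""
    centre = dem[r][c]
    diffs = [abs(dem[r + dr][c + dc] - centre)
             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    return sum(diffs) / len(diffs)

peak = [[10, 10, 10],
        [10, 12, 10],
        [10, 10, 10]]
flat = [[5, 5, 5]] * 3

print(tri(peak, 1, 1))  # 2.0
print(tri(flat, 1, 1))  # 0.0
```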

Curvature, Profile, and Plan Curvature
Because curvature influences the distribution of groundwater, it is used in GWP modelling. It measures the change rate at which a profile line is changed by the angle of inclination of the tangential plane [87]. Ginesta Torcivia et al. [88] and Talukdar et al. [89] reported that the curvature differentiates the convergent and divergent runoff areas. Costache and Tien Bui [90] reported that areas with negative values are linked to the process of runoff convergence. Curvature influences the flow of water over a surface and largely governs water retention and groundwater recharge. Therefore, the part of the study area along the river has high potential for groundwater recharge during the rainy season (Figure 2e,f).
The hydrology of the surface and subsurface is affected by curvature. Profile curvature is measured in the direction of the greatest slope. A negative value of profile curvature implies that the water flow on the surface has slowed, a positive value denotes that the water flow has increased, and zero shows that the surface is flat. Plan curvature, on the other hand, is measured perpendicular to the direction of the greatest slope. It depicts how water flowing over the earth's surface converges and diverges. Negative values reflect a concave surface, which leads water flow to converge. Positive values, on the other hand, show that the surface is convex, which causes water flow to diverge.

Aspect
Another factor, aspect, has an effect on the flow paths of groundwater potentiality as well as soil moisture content [91]. It impacts the duration of sunshine, which has an effect on infiltration rates and snowmelt. As a consequence, it has an indirect effect on the GWP (Figure 2g).

Topographic Position Index
TPI is a metric of the slope position of a topographic feature. It compares the elevation of a central point with the average elevation of its surroundings. Positive and negative TPI values denote sites that are higher or lower than the typical surrounding region, respectively, whereas a zero TPI value denotes a flat area or a continuous slope. The following equation was used to compute the TPI:
$$\mathrm{TPI} = M_0 - \frac{1}{n}\sum_{i=1}^{n} M_i \qquad (2)$$
where $M_0$ denotes the altitude of the middle point, $M_i$ denotes the altitude of the $i$-th grid cell in the neighbourhood, and $n$ indicates the total number of pixels in the neighbourhood region of the DEM raster.
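The comparison of a central elevation with the mean of its surroundings can be sketched for a 3 × 3 window as follows. The function name `tpi` and the sample grids are hypothetical, and the neighbourhood here excludes the central cell (an assumption; the window size and shape used in the study are not stated).

```python
def tpi(dem, r, c):
    """Elevation of interior cell (r, c) minus the mean of its 8 neighbours."""
    neighbours = [dem[r + dr][c + dc]
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]
    return dem[r][c] - sum(neighbours) / len(neighbours)

ridge = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
valley = [[9, 9, 9], [9, 1, 9], [9, 9, 9]]
plane = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # constant slope

print(tpi(ridge, 1, 1))   # 8.0  (higher than surroundings)
print(tpi(valley, 1, 1))  # -8.0 (lower than surroundings)
print(tpi(plane, 1, 1))   # 0.0  (continuous slope)
```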

Convergence Index
Convergence index is a smaller-scale measure of the concavity or convexity of the ground. Concavities (e.g., valleys) are represented by negative convergence, whereas convex features (e.g., ridges) are represented by positive convergence. This parameter was computed with the System for Automated Geoscientific Analyses (SAGA). Convergence had minimum, maximum, and mean values of −100, 100, and −0.586, respectively (Figure 3b).

Topographic Wetness Index
The topographic wetness index (TWI) is frequently used to describe how topography affects the location and extent of saturated source areas [92]. It depicts how topography influences runoff generation and the volume of flow that accumulates in a basin. This index depicts the volume of water stored in the area at each pixel scale [93] and is measured using Equation (3):
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (3)$$
where $A_s$ is the entire upslope catchment area draining through a point and $\beta$ is the slope angle. In general, high TWI values are strongly associated with high GWP [94]. In the study area, TWI ranges from 4.1 to 19 (see Figure 3c).

Stream Power Index
Another variable that influences the groundwater potentiality is SPI, which measures the stream's erosive strength [95]. Equation (4) is used to calculate the SPI.
In Equation (4), $A_s$ indicates the catchment area, while $\beta$ denotes the slope. The SPI in the study area ranges from 0 to >12.6 (Figure 3d).
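Both indices depend only on the catchment area $A_s$ and the slope $\beta$. The sketch below assumes the commonly used definitions TWI = ln(A_s / tan β) and SPI = A_s · tan β; the SPI form in particular is an assumption, since Equation (4) is not reproduced here and some studies apply a logarithm to it. The function names and the `min_slope_deg` floor, which avoids division by zero on flat cells, are hypothetical.

```python
import math

def twi(a_s, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, ln(A_s / tan(beta)).
    Slopes below min_slope_deg are raised to it to avoid dividing by zero."""
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(a_s / math.tan(beta))

def spi(a_s, slope_deg):
    """Stream power index in the assumed form A_s * tan(beta)."""
    return a_s * math.tan(math.radians(slope_deg))

# Same catchment area, increasing slope: wetness falls, stream power rises.
for slope in (2, 10, 30):
    print(slope, round(twi(100.0, slope), 2), round(spi(100.0, slope), 2))
```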

Flow Direction
After the depressions in the DEM have been corrected, a flow direction is assigned to each cell of the grid so that flow across the DEM surface is continuous [96]. Regardless of the technique employed, the flow direction of a cell may be described as the direction in which water and sediment would flow out of that cell. The D-8 technique assigns the flow direction of each cell to whichever of its eight adjacent or diagonal neighbours lies in the direction of the steepest downhill slope. Given the flow direction and the resolution of a cell, the distance travelled by the flow may be calculated as the Euclidean distance between cell centres, which depends on cell size (1 cell size for N-S and E-W moves; 1.414 cell sizes for diagonal moves) [96]. As a result, the total flow length at each particular cell in the watershed may be computed. The flow direction was determined using the SRTM digital elevation model, and it is used as a crucial input component for determining the flow length.
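The D-8 rule described above can be sketched for a single interior cell as follows. The function name `d8_direction`, the offset encoding of directions, and the sample grid are hypothetical; GIS packages usually encode the eight directions as integer codes instead.

```python
import math

# Offsets to the eight neighbours as (row step, column step).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def d8_direction(dem, r, c, cell_size=12.5):
    """Return the (dr, dc) offset of the steepest downhill neighbour of an
    interior cell, or None if no neighbour is lower (pit or flat cell)."""
    best, best_slope = None, 0.0
    for dr, dc in NEIGHBOURS:
        # Distance between cell centres: 1 cell size, or 1.414 on diagonals.
        distance = cell_size * math.hypot(dr, dc)
        slope = (dem[r][c] - dem[r + dr][c + dc]) / distance
        if slope > best_slope:
            best, best_slope = (dr, dc), slope
    return best

dem = [[9, 9, 9],
       [9, 5, 4],
       [9, 9, 1]]
print(d8_direction(dem, 1, 1))  # (1, 1): the diagonal drop is the steepest
```

Dividing each elevation drop by the centre-to-centre distance is what makes a diagonal neighbour win only when its drop is large enough to offset the longer path.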

Flow Accumulation
Flow accumulation, or upslope contributing area, is the upslope area that drains to a given point. The flow accumulation computed from the DEM can be used as a proxy for ridgelines, residual soil sites, colluvium concentration, moisture availability, and drainage flow lines [97]. Flow accumulation influences how slope materials and rainwater are spread, as well as where water and slope material flows tend to accumulate. The former is associated with divergent slopes, while the latter is associated with convergent slopes, and both are potential locations for colluvium and saturation, as well as recharge and seepage zones.

Topographic Features
Topographic features are critical for GWP modelling because they affect the hydrological characteristics of the research region both directly and indirectly [98,99]. First, the ALOS PALSAR DEM of the study area was prepared in ArcGIS 10.5. From the DEM, we extracted topographic features in the SAGA GIS and QGIS software.

Method for Groundwater Potentiality Conditioning Variables Using Multicollinearity Test
In groundwater potentiality mapping, the multicollinearity test is a crucial step. Multicollinearity refers to the presence of a linear relationship between several or all of a regression model's independent parameters [84]. An exact linear relationship between variables can cause a division by zero in the regression calculation, and a near-linear relationship leads to division by a very small number, which makes the computations unstable and can skew the findings.
Multicollinearity arises when strongly interrelated independent variables are used in a logistic regression model. It indicates that one variable may be predicted linearly from the others with a high degree of accuracy. Multicollinearity does not reduce the model's overall predictive ability or dependability; it only affects the estimates for individual predictors [100].
Variance inflation factors (VIFs), pairwise scatter plots, and the eigenvalues of the correlation matrix are some of the approaches that may be used to detect multicollinearity. The VIF is a statistic that quantifies the severity of multicollinearity by using a least squares regression. In the present study, the VIF was computed for each groundwater conditioning parameter.
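The VIF of predictor j equals 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the others by least squares; equivalently, it is the j-th diagonal element of the inverse correlation matrix. The sketch below uses that second form with the standard library only. The function names and the sample values are hypothetical, and the threshold at which a factor would be discarded is not stated in the text (5 or 10 are commonly quoted rules of thumb).

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centred = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [sum(v * v for v in col) ** 0.5 for col in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.
    Fails (division by zero) if predictors are perfectly collinear."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical samples of three factors; the second nearly duplicates the first.
slope = [2.0, 5.0, 9.0, 14.0, 20.0, 27.0]
tri = [1.1, 2.4, 4.6, 7.2, 9.9, 13.8]
aspect = [310.0, 45.0, 180.0, 90.0, 270.0, 135.0]
print([round(v, 1) for v in vif([slope, tri, aspect])])
```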

Proposing Fuzzy Logic-Information Gain Ratio Weighting Based Hybrid Models for Groundwater Potentiality Mapping
In order to improve the precision of groundwater potentiality models, hybrid fuzzy logic (FL) models have been generated by integrating fuzzy operators with the information gain ratio. This was accomplished in a series of steps. The thirteen topographic and hydrologic parameters were used to construct groundwater potential models by using the fuzzy logic model and the information gain ratio. The details of the fuzzy logic and information gain ratio-based hybrid models are as follows.
It is critical to evaluate the importance of each groundwater potentiality influencing parameter before training and validating the model [101]. Based on mathematical properties and interactions with groundwater, the value of each collected parameter has been quantified. The information gain ratio (InGR) technique [99] was used to identify the influential parameters for groundwater potentiality prediction. An InGR score is assigned to each factor to quantify its relevance: the higher the InGR score, the more important the factor. The InGR model was selected for use in this analysis due to its consistency and efficacy, and it is measured using Equation (5), where the feature $x$ belongs to the training data $Z$, which is divided into subsets $Z_i$ ($i = 1, 2, 3, \ldots, n$).
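As an illustration of how such a score can be obtained, the sketch below computes the standard gain ratio: the information gain of a feature divided by its split information, where the subsets $Z_i$ are the groups of training points that share a feature class. This is the textbook definition and is assumed, not confirmed, to match Equation (5). The function names and toy data are hypothetical, and continuous raster factors would first need to be binned into classes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_classes, labels):
    """Information gain of a feature divided by its split information."""
    n = len(labels)
    subsets = {}
    for f, y in zip(feature_classes, labels):
        subsets.setdefault(f, []).append(y)
    info_gain = entropy(labels) - sum(len(s) / n * entropy(s)
                                      for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n)
                      for s in subsets.values())
    return info_gain / split_info if split_info > 0 else 0.0

wells = [1, 1, 0, 0]  # 1 = groundwater point, 0 = non-groundwater point
print(gain_ratio(["low", "low", "high", "high"], wells))  # 1.0: fully informative
print(gain_ratio(["a", "b", "a", "b"], wells))            # 0.0: uninformative
```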
Using the information gain ratio and ground truth data, the weights for all parameters have been derived. Then, the weights were assigned to the parameters in the raster calculator of the ArcGIS 10.5 software.
Zadeh was the first to present fuzzy set theory [102]. It allows for the mathematical treatment of non-discrete natural phenomena [103]. According to this theory, elements ($x$) have varying degrees of membership ($f(x)$) in the range (0, 1) [104]. A fuzzy set may be described as $A = \{(x, f(x)) \mid x \in R\}$, where $A$ denotes a fuzzy set, $x$ is an element of the universal set $R$, and $f(x)$ denotes the fuzzy membership function. The membership value in a crisp set is either 1 or 0, whereas a fuzzy set has continuous membership in the range (0, 1). Groundwater potential mapping necessitates the establishment of a fuzzy membership function for the causal components. A membership function can be provided quantitatively by using mathematical equations depending on the data type (ordered or categorical) and its relationship with the dependent variables. In the present research, the thirteen parameters have different data types and natures; therefore, different types of membership function have been used. The MS-Small function has been used to fuzzify layers such as elevation, slope, aspect, topographic features, flow direction, LS factor, and TRI. It determines membership from the mean and standard deviation of the input data, with small values indicating high membership. The MS-Large function, on the other hand, was used to fuzzify layers such as profile curvature, plan curvature, flow accumulation, flow direction, SPI, TPI, and TRI. It determines membership from the mean and standard deviation of the input data, with large values indicating high membership. Thus, we transformed the weighted data into simplified, normalized, and unidirectional fuzzified layers. The fuzzy operation is the next stage in the fuzzy logic method. Important fuzzy operators include fuzzy OR, fuzzy AND, fuzzy algebraic sum, fuzzy algebraic product, and fuzzy gamma operator [105].
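The two mean-and-standard-deviation membership functions mentioned above can be sketched as follows. The formulas assume the MS Small and MS Large forms documented for the ArcGIS Fuzzy Membership tool, with both multipliers `a` and `b` set to 1; the multipliers actually used in the study are not reported, and the function names are hypothetical.

```python
def ms_small(x, mean, std, a=1.0, b=1.0):
    """Membership that is high for small inputs (assumed ArcGIS MS Small form):
    1 below a*mean, then b*std / (x - a*mean + b*std)."""
    if x <= a * mean:
        return 1.0
    return (b * std) / (x - a * mean + b * std)

def ms_large(x, mean, std, a=1.0, b=1.0):
    """Membership that is high for large inputs (assumed ArcGIS MS Large form):
    0 below a*mean, then 1 - b*std / (x - a*mean + b*std)."""
    if x <= a * mean:
        return 0.0
    return 1.0 - (b * std) / (x - a * mean + b * std)

# Hypothetical layer statistics: mean 10, standard deviation 2.
for value in (8, 10, 12, 20):
    print(value, round(ms_small(value, 10, 2), 3), round(ms_large(value, 10, 2), 3))
```

With these forms, a value one standard deviation above the mean receives a membership of 0.5 under both functions when `a` and `b` are 1.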
Only one of the contributing fuzzy sets has an influence on the resulting value in fuzzy OR and fuzzy AND. The fuzzy algebraic sum and fuzzy algebraic product operators, respectively, render the output set greater or equal to the maximum value and smaller or equal to the minimum value across all fuzzy sets [106]. The fuzzy gamma (γ) operator produces values that fall between the fuzzy algebraic product and fuzzy algebraic sum. The γ value ranges from 0 (no compensation) to 1 (full compensation). The degree of compensation between two extreme confidence levels is used to determine the optimal γ value.
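The behaviour of the five operators can be checked with a short sketch using their standard definitions: minimum for AND, maximum for OR, the product of memberships, one minus the product of complements for the algebraic sum, and sum^γ × product^(1−γ) for gamma. The function names and the sample membership values are hypothetical.

```python
import math

def fuzzy_and(m):
    return min(m)

def fuzzy_or(m):
    return max(m)

def fuzzy_product(m):
    return math.prod(m)

def fuzzy_sum(m):
    return 1.0 - math.prod(1.0 - v for v in m)

def fuzzy_gamma(m, gamma):
    """Compromise between algebraic product (gamma = 0) and sum (gamma = 1)."""
    return fuzzy_sum(m) ** gamma * fuzzy_product(m) ** (1.0 - gamma)

# Hypothetical fuzzified values of three layers at one raster cell.
cell = [0.2, 0.5, 0.8]
print("AND", fuzzy_and(cell))  # AND 0.2
print("OR", fuzzy_or(cell))    # OR 0.8
print("product", round(fuzzy_product(cell), 3))
print("sum", round(fuzzy_sum(cell), 3))
for g in (0.75, 0.8, 0.85, 0.9):
    print("gamma", g, round(fuzzy_gamma(cell, g), 3))
```

Raising γ moves the result from the product towards the algebraic sum, which is why the four GAMMA models give progressively higher potentiality values for the same cell.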
In GPM investigations, the use of an appropriate fuzzy operator for data integration is essential for obtaining the best results. The type of geographic data to be integrated determines the fuzzy operator to use. All of the 13 criteria considered in this analysis included equal amounts of data. Depending on the nature of the spatial data, data integration can be performed by using a combination of fuzzy operators or several distinct fuzzy operators. We employed all operators to integrate the fuzzy crisp layers in this investigation. In the integration formulas, $f_{Ele}$, $f_{Slope}$, $f_{Aspect}$, $f_{TWI}$, $f_{TRI}$, and $f_{SPI}$ are the fuzzy crisp layers of elevation, slope, aspect, topographic wetness index, topographic ruggedness index, and stream power index, respectively. Moreover, $R_i$ represents the fuzzy membership function of the ith map, i = 1, 2, ..., n. The GPMs were created by applying the fuzzy gamma operator to the results of Equations (6), (7), and (10). Six distinct GPMs were prepared by using the following six operators: AND, OR, GAMMA0.75, GAMMA0.8, GAMMA0.85, and GAMMA0.9. GPM maps are raster data that are ordered and continuous, with each grid cell quantitatively depicting the degree of groundwater potentiality. The GPM values produced by the fuzzy operators range from 0 to 1. By using the Jenks natural breaks classification (ESRI 2012), these GPM maps were divided into five categories: very low, low, moderate, high, and very high potentiality. GPM maps were created accordingly.

Validation of the Models
The ROC curve is a graphical representation of the sensitivity (true positive rate, TPR) on the y-axis against 1 − specificity (false positive rate, FPR) on the x-axis for different cut-off points of test data. For convenience, it is usually shown as a square box, with both axes ranging from 0 to 1. The AUC is a useful summary metric of sensitivity and specificity that may be used to examine a diagnostic test's intrinsic validity. AUC = 1 indicates that the diagnostic test is completely accurate in distinguishing between groundwater and non-groundwater [24]. This means that sensitivity and specificity are both equal to 1, and both false positive and false negative errors are zero. In practice, this is quite unlikely to occur. The closer the AUC is to one, the better the test performance. The diagonal from (0, 0) to (1, 1) divides the square into two equal pieces, each with an area of 0.5. When the ROC lies on this line, the test has a 50/50 probability of accurately distinguishing between groundwater and non-groundwater. The minimal meaningful AUC value is 0.5 rather than 0, since AUC = 0 indicates that the test wrongly categorised all groundwater subjects as negative and all non-groundwater subjects as positive. When the test findings are reversed, area = 0 becomes area = 1; therefore, a completely incorrect test can be converted into a perfectly accurate one.
The area under the ROC curve can be calculated by using either non-parametric or parametric techniques, and the choice between them is left to the user. The parametric technique has been frequently used to validate prediction models. Both parametric and non-parametric techniques were utilised in this research.

Non-Parametric
The non-parametric technique does not require any assumption about the distribution of test values, and the resulting area under the ROC curve is referred to as empirical. It employs the trapezoidal rule: the ROC points obtained at each recorded value of the continuous test are joined by straight lines, and vertical lines are dropped from these points to the x-axis to determine the area. This creates a number of trapezoids, each of whose areas can be simply computed and totalled.
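The trapezoidal procedure described above can be sketched as follows. The function names and the example scores are hypothetical, labels are assumed to be 1 for groundwater and 0 for non-groundwater locations, and tied scores are assumed to be processed together so that they contribute a single ROC point.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the cut-off step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Handle all observations sharing the same score in one step.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def empirical_auc(scores, labels):
    """Sum the areas of the trapezoids between consecutive ROC points."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Illustrative GPM values at six validation points (not data from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(empirical_auc(scores, labels))
```

The result equals the fraction of groundwater and non-groundwater pairs in which the groundwater location received the higher score, which is a convenient way to check the calculation by hand.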

Parametric
Parametric techniques are employed when the statistical distribution of diagnostic test results in positive and negative subjects is known. A binormal distribution is often employed for this purpose, which is appropriate when test values in both positive and negative subjects follow a normal distribution. If the data are really binormal, or if a transformation such as log, square, or Box-Cox renders the data binormal, the necessary parameters may be simply calculated by using the means and variances of test scores in positive and negative subjects. An AUC of >70% would be considered satisfactory model performance in this situation [107].
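Under the binormal assumption, the AUC has the closed form Φ((μ₁ − μ₀) / √(σ₁² + σ₀²)), where Φ is the standard normal distribution function and the subscripts 1 and 0 denote positive and negative subjects. The sketch below estimates the means and variances directly from the samples; this moment-based estimate is an assumption made for illustration, since the paper does not state how its binormal parameters were fitted, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def binormal_auc(positive_scores, negative_scores):
    """Binormal AUC from the sample means and variances of the two groups."""
    mean_diff = fmean(positive_scores) - fmean(negative_scores)
    pooled = variance(positive_scores) + variance(negative_scores)
    return NormalDist().cdf(mean_diff / sqrt(pooled))


# Illustrative GPM values at groundwater and non-groundwater points.
groundwater = [0.70, 0.80, 0.90, 0.65, 0.85]
non_groundwater = [0.30, 0.50, 0.45, 0.60, 0.40]
print(binormal_auc(groundwater, non_groundwater))
```

Because the estimate depends only on two means and two variances, it yields a smooth curve even for small validation samples, whereas the empirical estimate changes in steps.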

Sensitivity Analysis
Random forest (RF) offers two distinct importance metrics for ranking and selecting variables: mean decrease accuracy (MDA) and mean decrease Gini (MDG). MDA evaluates the significance of a variable by measuring the change in prediction accuracy when the values of that variable are randomly permuted relative to the original data [100]. MDG is the total of all Gini impurity reductions caused by a particular variable (when that variable is used to generate a split in the random forest), normalised by the number of trees.
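The two ingredients behind these metrics can be sketched in a few lines: the Gini impurity reduction of a single split, and the accuracy drop after permuting one variable. This is only an illustration of the building blocks, not a random forest; a real forest accumulates the Gini reductions over every split and tree and measures the permutation effect on out-of-bag samples. The function names, the stand-in predictor, and the data are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity reduction obtained by splitting one variable at a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


def accuracy(predict, rows, labels):
    """Share of rows whose prediction matches the label."""
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_drop(predict, rows, labels, col, seed=0):
    """Accuracy lost when the values of column `col` are shuffled."""
    rng = random.Random(seed)
    shuffled = [row[col] for row in rows]
    rng.shuffle(shuffled)
    permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Illustrative rows: [factor_a, factor_b]; the predictor only uses factor_a.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
labels = [0, 0, 1, 1]
predict = lambda row: int(row[0] > 2.5)
print(gini_decrease([r[0] for r in rows], labels, 2.5))
print(permutation_drop(predict, rows, labels, col=1))
```

A variable that the model never uses, such as the second column here, shows no accuracy drop after permutation, which is the behaviour that makes MDA usable for ranking conditioning factors.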

Proposing LR-Based Novel Hybrid Model for Groundwater Potentiality Mapping
In order to improve the precision of groundwater potentiality models, LR was paired with the previously developed hybrid models. This was accomplished by a series of steps. In the first stage, the information gain ratio-based weighting technique and six fuzzy logic operators were integrated in order to construct groundwater potential models. In the second stage, a statistical model (LR) was integrated with these hybrid models in order to obtain higher accuracy. The six newly developed models were used as the input parameters of the LR model. The testing datasets (20% of the total datasets), which were held out exclusively for validation purposes, were used to extract values from the six hybrid models. The aim of using validation datasets and ensemble models was to determine how well the newly generated hybrid models predicted groundwater potentiality. Another significant advantage of using validation datasets for second-step modelling is that they can be used for any modelling purpose. As a result, they can provide outstanding ground truth evidence for obtaining detailed information about the outputs of the newly developed hybrid models. The extracted validation data were then used to combine LR with the hybrid groundwater potentiality models. In the next stage, the six hybrid models were weighted by using the weights derived from the LR model. Then, the weighted parameters were integrated in a raster calculator; thus, an LR-based novel hybrid model was constructed. In the last stage, the hybrid model was validated by using the ROC curve in order to observe how accurately it predicts the groundwater potential zones. Some details of the LR are as follows.

Logistic Regression
Logistic regression (LR), one of the most widely used statistical techniques, relates the probability of an event to a number of potentially predictive variables. In the case of groundwater potentiality prediction, the expected occurrence has been defined as groundwater or non-groundwater, and the purpose of LR is to determine the best-fitting model for evaluating the relationship between a set of conditioning variables and the presence or absence of groundwater [108]. In addition, as LR uses its convergence criterion to maximise the likelihood function [109], its predictive success in classification problems is considered to be exceptional [110].
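To make the combination step concrete, the sketch below evaluates a fitted LR model on the outputs of six hybrid models for one cell, which corresponds to the raster-calculator step in which the weighted models are summed and passed through the logistic function. The intercept and coefficients are hypothetical placeholders chosen for illustration; they are not the values estimated in this study.

```python
from math import exp


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def lr_hybrid_score(model_scores, coefficients, intercept):
    """Probability of groundwater from a weighted sum of hybrid model outputs."""
    z = intercept + sum(b * s for b, s in zip(coefficients, model_scores))
    return logistic(z)


# Hypothetical LR coefficients for AND, OR, GAMMA0.75, GAMMA0.8, GAMMA0.85, GAMMA0.9.
coefficients = [1.2, 0.4, 0.8, 0.6, 0.5, 0.7]
intercept = -2.0

# Illustrative outputs of the six hybrid models at one raster cell.
cell_scores = [0.35, 0.90, 0.62, 0.66, 0.70, 0.74]
print(lr_hybrid_score(cell_scores, coefficients, intercept))
```

Applying the same function to every cell produces a continuous surface between 0 and 1 that can be classified in the same way as the individual fuzzy models.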

Multicollinearity Analysis
The variance inflation factor (VIF) and tolerance (TOL) were utilised to determine multicollinearity between independent variables in this study. A multicollinearity problem is indicated by a VIF score of >10 and a tolerance value of <0.2. According to the results of the multicollinearity test, the VIF and tolerance values of all factors are less than 10 and more than 0.2, respectively. As a result, there is no concern with multicollinearity among the independent variables, and all of the conditioning variables were taken into account while creating the GPMs.
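As an illustration of how these two diagnostics are related, the sketch below computes the VIF of each variable as the corresponding diagonal element of the inverse correlation matrix and the tolerance as its reciprocal. This is one standard way of obtaining the values and is assumed here for compactness; the paper does not state which software or routine was used, and the function names and data are hypothetical.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(columns):
    """Return a (VIF, TOL) pair for every variable in `columns`."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [(inverse[j][j], 1.0 / inverse[j][j]) for j in range(len(columns))]


# Illustrative values of two conditioning factors at five locations.
factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
factor_b = [2.0, 1.0, 4.0, 3.0, 5.0]
for vif, tol in vif_and_tolerance([factor_a, factor_b]):
    print(round(vif, 3), round(tol, 3))
```

For two variables the result reduces to VIF = 1 / (1 − r²), so a correlation of 0.8 gives a VIF of about 2.78 and a tolerance of 0.36, well inside the thresholds quoted above.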

Proposing Feature Selection Based Hybrid GW Potentiality Models
In the present study, a number of steps were followed in order to propose fuzzy logic models integrated with a feature selection approach. The initial set of variables has varying levels of predictive power. Including all factors in the study, however, may diminish the predictive power of the generated models. As a result, the predictive capacity of the factors used in the study should be assessed, and factors that have a negative impact on the accuracy of the created models should be eliminated. For these purposes, data from 14 CFs were extracted on the basis of the training datasets. The weights for all parameters were then calculated by using the information gain ratio (IG) method. This procedure aids in the development of more accurate prediction models. Based on the IG findings, no parameters have negative predictive power; hence, no parameters were removed from the modelling of GWP. Since the IG value indicates predictive power, we used these values as weights for all factors in our analysis. The parameters were then assigned these weights, so that the original parameters were converted into weighted parameters. The following are the obtained IG values for all CFs:

Following the generation of the weighted parameters, the parameters were converted into fuzzy crisp layers by using the fuzzy membership function. The membership functions that were selected for the different variables are specified in the Materials and Methods Section. They were chosen based on the data types and their contribution to groundwater potentiality modelling. As a result, the fuzzy membership function also served as a weighting mechanism.
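The information gain ratio used for weighting can be sketched as follows for a factor that has already been divided into classes. The standard definition is assumed (information gain divided by the split information of the factor); how the continuous DEM-derived factors were discretised in this study is not stated, so the class labels and data below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(factor_classes, labels):
    """Information gain of a classified factor divided by its split information."""
    n = len(labels)
    groups = {}
    for cls, label in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    return info_gain / split_info if split_info > 0 else 0.0


# Illustrative training points: slope class and groundwater presence (1) or absence (0).
slope_class = ["low", "low", "low", "high", "high", "high"]
groundwater = [1, 1, 0, 0, 0, 1]
print(gain_ratio(slope_class, groundwater))
```

A factor whose classes separate groundwater from non-groundwater points perfectly reaches a ratio of 1, while a factor that carries no information about the labels scores 0, which is the property that allows the values to be used as weights.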
Following the transformation into fuzzy crisp layers, the fuzzy operators AND, OR, GAMMA0.75, GAMMA0.8, GAMMA0.85, and GAMMA0.9 were used to integrate the layers. As a result, hybrid fuzzy models for groundwater potentiality were created. The resulting fuzzy models, which range from 0 to 1 in value (with 1 indicating high potentiality and 0 indicating low potentiality), were then classified by using the natural break algorithm into five groundwater potentiality classes: very high, high, moderate, low, and very low. Figure 4 presents the groundwater potentiality models constructed by using the advanced hybrid algorithms AND, OR, GAMMA0.75, GAMMA0.8, GAMMA0.85, and GAMMA0.9. The potential groundwater zone runs in a northwest-southeast direction, parallel to the drainage direction of the catchment. The south and southeast are dominated by zones with high groundwater potential, whereas the north and northwest are dominated by zones with low groundwater potential. Across the six hybrid models, around 1850-2149 km² and 3644-4585 km² of the total basin area are found to have 'very high' and 'high' potentiality for groundwater, respectively (Table 1). In general, all of the models identified the river catchment area as having considerable potential for groundwater harvesting. However, since the predicted areas vary among the models, it is critical to identify the best representative model.

Validation of the Models
The AUC of the empirical and binormal ROC curves was used to validate the generated GPMs by utilising the collected GPS points. The resulting AUCs under the respective ROC curves (empirical and binormal) are 0.81 and 0.805 for AND; 0.81 and 0.80 for OR; 0.801 and 0.798 for GAMMA0.75; 0.785 and 0.79 for GAMMA0.8; 0.77 and 0.785 for GAMMA0.85; and 0.792 and 0.796 for GAMMA0.9 (see Figure 5a-f). Based on both ROC curves, AND appeared as the best model, followed by OR, GAMMA0.75, GAMMA0.9, GAMMA0.8, and GAMMA0.85. In particular, as per the binormal ROC curve, which has widely been applied for natural hazards, AND appeared as the best model (AUCb: 0.805), followed by OR (AUCb: 0.8), GAMMA0.75 (AUCb: 0.798), GAMMA0.9 (AUCb: 0.796), GAMMA0.8 (AUCb: 0.79), and GAMMA0.85 (AUCb: 0.785). Since all models achieved AUC values near 0.8, the results can be considered satisfactory. However, many researchers have obtained higher AUC values. Due to the absence of other climatic, land use, hydrologic, and geological data, the generated hybrid models achieved slightly lower accuracy. In addition, the application of advanced machine learning algorithms can also produce highly accurate models. In order to improve these models further, LR has been recommended for utilisation. As a result, the absence of other parameters can be overlooked.

Sensitivity Analysis
The development of advanced hybrid algorithms for mapping groundwater potential zones can only show the probable region in which a significant quantity of commercially usable groundwater is likely to occur, based on the complex mathematical relationship between historical groundwater trends and their triggering variables. None of these models, however, indicates the role of individual variables in the declining trend of groundwater potential in an area.
How will management plans be developed and implemented if the effect of such conditions on groundwater potentiality cannot be determined? Identifying variables related to groundwater potential zones may aid in reducing the frequency of groundwater decline in a given region. Therefore, it is crucial to figure out which variables have the most influence. In order to estimate the relevance of each conditioning variable in the RF modelling process, the mean decrease in Gini (MDG) and mean decrease in accuracy (MDA) have been employed [111]. The findings (based on MDG and MDA) show that all factors contributed to the GWP modelling, but the most important ones were convergence index, topographic features, flow accumulation, elevation, LS factor, flow direction, profile curvature, and slope (Figure 6). TPI, aspect, and TRI were the least important of the 14 variables included in the hybrid models (Figure 6).

Development of LR-Based Hybrid Model and Its Validation
This study utilised six hybrid algorithms for mapping groundwater potential zones. Furthermore, a sensitivity model based on RF was used to describe the sensitive parameters of these groundwater potential models. According to the ROC AUC values, the AND, OR, GAMMA0.75, GAMMA0.8, GAMMA0.85, and GAMMA0.9 models all performed well in predicting potential groundwater zones. Nonetheless, this study aimed to improve the precision and reliability of groundwater potential mapping by integrating a statistical model, LR, with the previously created groundwater potential models. We used all six groundwater potential models that had already been developed. By integrating the six groundwater potential maps with the statistical model, the over- and underestimation errors of the individual models can be reduced. The LR model's characteristics are as follows. The Chi-square significance in the Hosmer-Lemeshow test was greater than 0.05, indicating an acceptable goodness of fit of the equation. According to the Cox and Snell R² and Nagelkerke R² values (Cox and Snell R²: 0.171; Nagelkerke R²: 0.233), the independent variables could explain the dependent factor. The LR was calculated as given in Equation (12).
The performance of the LR-based novel hybrid ensemble model is shown in Figure 7. The groundwater output was divided into five subclasses (from very high to very low) by using the natural break algorithm. The very high groundwater potential zone was estimated by the novel hybrid model to be 1635.92 km², followed by high (3235.88 km²), moderate (5233.84 km²), low (6001.85 km²), and very low (5165.05 km²). Then, we validated the novel hybrid groundwater potential model with the ROC curve by using the testing dataset. The new model's AUC values (AUCe: 0.866, AUCb: 0.892) indicate that it is highly accurate and informative in estimating potential groundwater zones (Figure 8). It also outperformed the six hybrid models, implying that combining an advanced hybrid model with a statistical model can improve model accuracy even further.

Discussion
In areas with limited surface water, particularly in arid and semi-arid climates, groundwater supplies are critical. When these regions' industrial and agricultural water needs are high, aquifers can be depleted at rates that surpass recharge rates, placing them in danger of irreversible harm. As a result, precise estimates of the spatial dimensions of groundwater in a watershed are critical. Assessments such as these can help with water management planning and usage strategies. By understanding the geomorphological and hydrological conditioning variables connected with subsurface storage, groundwater mapping may be enhanced and made more cost-effective. In order to map groundwater potential, several approaches have been devised, each with its own set of benefits and drawbacks.
To map groundwater potential, this study created a unique state-of-the-art method that combined statistical, machine-learning, and feature selection approaches with RS and GIS tools. To assist scientific judgments for issues with a variety of management criteria, feature selection techniques, statistical models, and machine-learning algorithms (such as information gain ratio, fuzzy logic, and LR, respectively) have been utilised. According to the results of the six advanced hybrid models and the novel hybrid model, the very high groundwater potential zone encompasses an area of 1635-2149 km². The groundwater potential models were evaluated by using empirical and binormal ROC curves. Based on both ROC curves, AND (AUCe: 0.81; AUCb: 0.804) was the best representative model for groundwater potentiality modelling, followed by OR, GAMMA0.75, GAMMA0.9, GAMMA0.8, and GAMMA0.85. The novel hybrid model, however, achieved higher accuracy (AUCe: 0.866; AUCb: 0.892) and outperformed the six hybrid models. The integration of all hybrid models through the LR model has reduced the misclassified area. Therefore, the novel hybrid model performed better and can be used for other areas, although higher accuracy can be achieved if all kinds of parameters are integrated. According to our findings, GW managers in the study area should concentrate on expanding GW exploitation and agricultural operations in the areas close to rivers. Since they receive more natural recharge, these locations have a larger GW potential.
Convergence index, topographic feature, flow accumulation, altitude, LS, and slope degree all contributed significantly to the GW potential investigation. Convergence affects the flow speed on the slopes, which in turn affects erosion and sedimentation and, indirectly, the infiltration rate and GW potential. TWI is a hydrological conditioning variable that relates to the likelihood of water accumulation in different areas of the basin and, as a result, influences the basin's GW potential. The fourth most critical factor in modelling was altitude. Higher elevations have steeper slopes and increased flow velocity, resulting in higher drainage density, whereas lower altitudes have the opposite scenario. Given the large topographical variance in the studied region (947 to 2992 m), this factor contributed significantly to the GW potential in our study. SPI, LS, and slope degree, among other significant conditioning factors (CFs), have an influence on flow speed and infiltration rate and are, thus, listed as top contributors in this study. In estimating GW potential, Rahmati et al. [112] demonstrated the relevance of the two categorical conditioning factors of land use and lithology. Furthermore, they stated that TWI was the least important CF, which contradicts the findings of this study. According to Mousavi et al. [113], the top contributing CFs are TWI, height, distance from rivers, river density, distance from faults, and fault density. Yousefi et al. [114] presented a work that stressed the relevance of TWI and altitude variables in groundwater potential mapping, which is consistent with the findings of our study. Despite the fact that the two critical CFs connected to the fault layer were not included in this investigation, the algorithms produced high-accuracy GPMs. This demonstrates that substituting factors that are simpler to calculate, such as DEM-derived CFs, for the key components in GW modelling may nevertheless achieve excellent accuracy.
The LR-based novel hybrid model outperformed the fuzzy-based hybrid models in groundwater potentiality modelling. Based on these findings, the paper recommends utilising hybrid models and ensemble models for multi-parametric spatial prediction. To the authors' knowledge, there has been no previous study that used these models in this study area; however, the LR-based hybrid model has demonstrated good performance in other environmental fields such as livelihood risk prediction [115,116], groundwater salinity [117], stream-flow prediction [118], piping erosion [119], and flash flood hazard assessment [120,121]. It can also be noted that state-of-the-art machine learning models outperform older approaches in most cases [121]. Ensemble models, in particular, frequently outperformed single models [122]. For groundwater potential mapping, Mosavi et al. [123] assessed four ensemble models, i.e., Boosted generalized additive model (GamBoost), adaptive Boosting classification trees (AdaBoost), Bagged classification and regression trees (Bagged CART), and random forest (RF), and found that the Bagging models (i.e., RF and Bagged CART) had a higher performance than the Boosting models (i.e., AdaBoost and GamBoost). This indicates that the performance of ensemble models depends on the ensemble strategy. Therefore, based on the previous literature, it can be stated that the generated hybrid models could be reliable and used for management strategies. This study also suggests that a few more hydrogeological and meteorological factors should be included in the models in order to improve the accuracy of the results. The study area is known for its water scarcity due to damming across the river and other human issues. Such information might be useful in developing long-term water harvesting and agricultural plans.
Water bodies have been recognised as a good conditioning factor for groundwater potentiality; therefore, rapid reclamation of water sources should be avoided at all costs. According to this research, land cover and canopy density are also significant conditioning factors. Forest loss and devastation, on the other hand, are irrefutable realities. As a result, maintaining forest cover will help groundwater recharge. A scientific study of groundwater at distinct prospective zones is required in order to make a more accurate suggestion on the amount of water that may be collected from each potential zone.
The results obtained are exclusive to this study, and they may differ in other investigations due to the fact that the input data for modelling vary from one location to another. As a result, possible models must be considered, with the best predictive capability-based model being chosen to aid in the identification of high groundwater potential regions in order to alleviate drought. Moreover, the novel technique can be useful in areas where there is a shortage of precise and high-resolution data, such as land use, lithology, soil, hydrogeology, and fault-related variables such as fault density and distance from faults.

Conclusions
The goal of this research study was to introduce a novel approach for determining GW potential by using a limited amount of high-resolution input data. For GPM, the suggested system combined six weighted fuzzy operator models, built with feature selection, with logistic regression. According to the results, the novel hybrid model outperformed the six fuzzy-based hybrid models and provided trustworthy GPMs. The novel hybrid model's higher accuracy might be due to its capacity to provide more general outputs and deal with overfitting. All of the models projected the same regions as having very high GW potential. The reliability of the technique is demonstrated by the GPMs' consistency. The major goal of this work was to establish precise GPMs in the study region by using solely DEM-derived variables. The relevance of the novel technique in the absence of other major CFs was demonstrated by the algorithms' high AUC values. The results indicated that convergence, TWI, altitude, SPI, and LS all have a significant role in the algorithms' performance. As a result, in the absence of other key CFs including land use, lithology, soil, and fault-related CFs, the novel hybrid algorithms successfully extracted connections between DEM-derived variables and GW potential. Since only a small number of datasets covering the DEM and groundwater locations are needed, the suggested methodology may be utilised for large-scale GW potential mapping at the national and continental levels. It should be noted that the established technique is suggested for assessing GW potential through topographically driven groundwater locations. This technique may provide water sector managers and GW experts with the knowledge they need to establish appropriate water resource planning. Scholars may focus future studies on the DEM's spatial resolution and other DEM-derived variables in order to enhance the technique and, as a result, the modelling outcomes.
The absence of information on groundwater productivity features such as transmissivity and specific capacity further hampered our investigation. In future research, when these data are available, it is suggested that the relationship between these parameters be explored. Despite the drawbacks, the groundwater potential maps produced in this work can assist water resource managers and policymakers in the disciplines of watershed and aquifer management in preserving the best possible use of this vital freshwater resource.
The superior models developed in this work might be useful to water resource managers in identifying susceptible areas and in developing and enforcing appropriate groundwater management regulations. Hybrid machine learning approaches and deep learning are strongly recommended for future study in order to discover an ideal model with a greater level of adaptivity, accuracy, and generalisation ability.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

GPM: Groundwater potential model
LR: Logistic regression
DEM: Digital elevation model
ROC: Receiver operating characteristic
ROCe: Empirical receiver operating characteristic
ROCb: Binormal receiver operating characteristic
CGWB: Central Groundwater Board
BCM: Billion cubic metres
GIS: Geographic information system
NDVI: Normalized Difference Vegetation Index
TWI: Topographic wetness index
TRI: Terrain Ruggedness Index
SPI: Stream power index
EBF: Evidential belief function
SI: Statistical index
WoE: Weight of evidence
ANN: Artificial Neural network