Advancing Landslide Susceptibility Mapping in the Medea Region Using a Hybrid Metaheuristic ANFIS Approach

: Landslides pose signiﬁcant risks to human lives and infrastructure. The Medea region in Algeria is particularly susceptible to these destructive events, which result in substantial economic losses. Despite this vulnerability, a comprehensive landslide map for this region is lacking. This study aims to develop a novel hybrid metaheuristic model for the spatial prediction of landslide susceptibility in Medea, combining the Adaptive Neuro-Fuzzy Inference System (ANFIS) with four novel optimization algorithms (Genetic Algorithm—GA, Particle Swarm Optimization—PSO, Harris Hawks Optimization—HHO, and Salp Swarm Algorithm—SSA). The modeling phase was initiated by using a database comprising 160 landslide occurrences derived from Google Earth imagery; ﬁeld surveys; and eight conditioning factors (lithology, slope, elevation, distance to stream, land cover, precipitation, slope aspect, and distance to road). Afterward, the Gamma Test (GT) method was used to optimize the selection of input variables. Subsequently, the optimal inputs were modeled using hybrid metaheuristic ANFIS techniques and their performance evaluated using four relevant statistical indicators. The comparative assessment demonstrated the superior predictive capabilities of the ANFIS-HHO model compared to the other models. These results facilitated the creation of an accurate susceptibility map, aiding land use managers and decision-makers in eﬀectively mitigating landslide hazards in the study region and other similar ones across the world.


Introduction
Landslides are a multifarious phenomenon, representing a substantial global hazard and endangering human lives, infrastructure, and transportation networks [1,2].Annually, slope failures incur substantial economic losses, both directly and indirectly, potentially accounting for losses of millions of dollars [3,4].Over the past two decades, researchers have increasingly recognized a worldwide landslide susceptibility map as a valuable tool for identifying areas vulnerable to landslide risk in urban or rural se ings [3].Such maps are particularly essential in the Medea region, where the specific geological, climatic, and physiographic conditions make it prone to landslide occurrences.Moreover, human activities indirectly contribute to landslide risks due to significant population growth, prompting the government to expand the road network and urbanize the region, thereby placing more infrastructures in landslide-prone areas.The construction of such an infrastructure typically involves excavation and tunneling activities, disrupting the natural equilibrium of soils and potentially inducing slope instability [1,4].The landslide that occurred in January 2014 within the Jebel El Ouahch Tunnel in Constantine stands as a prominent example of such cases, directly linked to tunneling activities [1].As a consequence, the tunnel was closed, leading to a modification in the alignment of the A1 highway corridor [1].Hence, a landslide susceptibility map is a crucial tool in landslide hazard management, providing valuable information for local governments in order to develop master plans and derive solutions aimed at mitigating the catastrophic consequences of landslides.Such maps facilitate the development of appropriate planning and decisionmaking tools to address landslide risks effectively [3,5,6].
Since the 17th century, numerous approaches have been used to develop methodologies for predicting landslide susceptibility and producing corresponding maps [7,8].Such maps serve as valuable tools for identifying regions vulnerable to landslide hazards.In this context, a variety of approaches have been employed for landslide susceptibility mapping.These approaches can be categorized into three main types: qualitative, quantitative, and semi-quantitative methods [9,10].Qualitative methods are characterized as straightforward approaches that primarily rely on direct field measurements and the expertise and experience of experts.On the other hand, quantitative methods are regarded as rigorous and objective approaches that use statistical and mathematical techniques to analyze data [10].Qualitative and quantitative methods are used to mitigate the subjectivity inherent to landslide susceptibility assessments by integrating geotechnical and statistical models.Furthermore, new hybrid methods have been introduced in the literature.These methods originated from the aforementioned approaches by merging qualitative and quantitative methods to assess the importance of the input parameters in generating landslide hazard maps.These hybrid methods are commonly called semi-quantitative methods [11].The ease of use and effectiveness of the three aforementioned methods have rendered them popular and valuable, owing to their straightforward representation of the dependent variable (i.e., landslide susceptibility) and independent variables (i.e., its drivers) [12,13].To the best of the authors' knowledge, landslide inventories and heuristic methods stand out as the most frequently employed qualitative approaches [14].The primary drawback of qualitative methods lies in their subjectivity, stemming from the experiential ranking of landslide predisposing factors based on the expertise of individuals [14].
Similarly, quantitative methods can be categorized into several subclasses, including statistical, deterministic, and machine learning approaches [10].Deterministic methods, also called geotechnical methods, have seen extensive applications in the literature.These approaches rely on geotechnical parameters determined on-site, coupled with the engineering principles of slope instability, typically expressed through a safety factor.However, deterministic methods tend to overlook climatic and anthropogenic factors [14].The primary limitation of deterministic methods is their reliance on comprehensive geotechnical and hydrological data, which can be challenging to gather for large areas [14].Moreover, these methods are generally applicable only to mapping small areas [15].Typically, statistical methods aim to predict the relationship between historical landslides and their drivers through bivariate or multivariate techniques [10].Logistic regression, weight of evidence, and analytical hierarchy process methods are among the statistical approaches commonly employed for modeling landslide susceptibility.However, a key criticism of statistical methods is their requirement for the drivers to follow a normal distribution, which may not always be true under real world conditions.Additionally, these methods inherently assume linearity [16] and rely on simplified assumptions, such as linear behavior or production heuristics, which can limit the effectiveness of statistical methods in modeling complex nonlinear phenomena [17,18].
However, certain limitations have been mentioned in the literature with respect to traditional machine learning models.The process of selecting the optimal model can be time-consuming, and these models are prone to issues such as overfi ing and underfitting.Moreover, the convergence of algorithms during the training phase relies heavily on the initial complex values, leading to slow training speeds [16].Recent studies have primarily focused on enhancing traditional machine learning methods through the use of hybrid metaheuristic algorithms [67,68].These approaches serve as a prevalent solution to address the challenges encountered by machine learning algorithms.The primary advantage of integrating metaheuristic algorithms with machine learning methods is the enhancement of convergence during the learning phase toward the optimal solution.However, despite this benefit, hybrid machine learning methods have received relatively li le a ention in the context of landslide susceptibility mapping.
Building upon the background information provided, the current research aims to develop a novel advanced hybrid machine learning model designed to assess landslide susceptibility effectively.This model is subsequently integrated into a geographical information system (GIS), resulting in the generation of an accurate map highlighting landslide-prone areas.The proposed map serves as a valuable tool for decision-makers and land use managers in the Medea area, which is known to be highly vulnerable to landslides, helping them to mitigate landslide hazards, and has the potential to be used in other regions of the world.The study significantly contributes to introducing novel hybrid metaheuristic machine learning methods, combining Genetic Algorithm, Particle Swarm Optimization, Harris Hawks Optimization, and Salp Swarm Algorithm with ANFIS, to enhance landslide susceptibility mapping accuracy.Furthermore, an advanced approach named K cross-validation has been employed to ensure that the models are be er generalized and less prone to overfi ing and underfi ing.Additionally, the research fills in a significant gap in the literature by focusing on the understudied region of Medea and providing valuable insights into landslide susceptibility in this vulnerable area.

Case Study
The study area is located in the central portion of the Tellian Atlas and characterized by high altitude and rugged terrain enclosing some fairly fertile plains that gradually fade into the borders of the high steppe plains, forming a series of rolling hills.Such a strategic position has made Medea a major transit zone and a link between the Tell and the Sahara, on one hand, and between the Eastern and Western High Plateaus, on the other.The province of Medea is located 88 km west of the capital, Algiers (see Figure 1).It covers an area of 8775.65 km² and is bounded by longitudes of 2°07′54″ and 3°42′13″ east and latitudes of 35°25′28″ and 36°30′02″ north.The total population of the province is estimated at 861,204 inhabitants (2011), with a density of 99 inhabitants per km².The province of Medea, consisting of 64 municipalities, has important thermal springs and tourist sites.This se lement has a total agricultural area of around 773,541 hectares and adequately watered terrain.Additionally, there is pastoral activity practiced over an area of pasture of more than 200,000 hectares located in the southern zone of the province.Medea has a semi-continental Mediterranean climate: cold and humid in winter, temperate in spring, and hot and dry in summer.
From a geological standpoint, previous studies have demonstrated that the region of Medea exhibits heterogeneous characteristics, which is the reason for selecting it as the study area.Towards the northern part of the Wilaya (region), sedimentary terrains from the Senonian, Cenomanian, and Albian periods prevail.This area is notably rugged, featuring mountains cloaked in forests, where Chiffa shales and cargneule gypsums are commonly found.Rich copper deposits are also present in this region, contributing to its raw materials, particularly evident in the Mouzaia Mines (currently known as Tamezguida) located in the northwest.Additionally, a scree of sandstone formations of Upper Miocene can be found in Djebel Nador.Moving towards the southwestern part of the province, marls from the Cretaceous period and eruptive rocks like basaltic tuffs dominate the landscape.Recent alluvial deposits are observed along the wadis, while gravelly alluvium characterizes the plateaus of Ben Slimane.In the southern region of the province, gypsum clays, red clays, and sandstones are prevalent.Published studies have identified lithostratigraphic formations, including post-nappe Neogene deposits, as well as Quaternary terrains that unconformably overlay the Albian and Cenomanian formations.
Past landslide locations provide valuable insights into the spatial pa erns of landslide occurrences, aiding in landslide susceptibility zoning.They offer crucial information about past landslide behaviors and their relationship to causative factors.Therefore, creating a comprehensive landslide inventory is a critical step in a landslide susceptibility assessment.Many researchers generate landslide inventories using high-resolution remote sensing data or aerial photograph interpretation [69][70][71].Given the frequent activation of landslides during the rainy season in this area, Google Earth imagery was used to capture these rainfall-triggered landslides [69][70][71].
In this study, the landslide inventory map was prepared digitizing Google Earth imagery, with the locations subsequently field-verified.The data encompass events that occurred between 2009 and 2023, resulting in a total of 160 delineated landslides converted into raster format, And 80% of these landslides were assigned to the training dataset, while the remaining 20% were to the validation one (see Figure 2).Fieldwork and analysis revealed that most landslides are concentrated in the northern part of the Medea Wilaya.This can be explained by two primary factors.First, torrential rainfall, particularly during tropical storms, acts as a key trigger in the north.Second, the region's northern terrain features dense slopes, further contributing to landslide susceptibility.These slopes become susceptible to activation due to soil saturation, which reduces the geo-mechanical parameters of soil.

Overview of the Methodology
To present the most appropriate model for assessing landslide susceptibility and developing the target map, the methodology followed these steps: analyze the database and select the optimal input variables.4. Modeling with Hybrid Machine Learning Methods: The selected optimal inputs were modeled using novel hybrid machine learning methods: ANFIS-GA, ANFIS-PSO, ANFIS-HHO, and ANFIS-SSA.Landslide locations were divided into 128 landslide training sites (80%) and 32 landslide validation sites (20%). 5. Integrating metaheuristic algorithms with ANFIS: Metaheuristic algorithms (GA, PSO, HHO, and SSA) were used in conjunction with ANFIS to enhance the accuracy and reliability of landslide susceptibility assessments.This hybrid approach enabled us to effectively capture the complex relationships between inputs and target (landslide susceptibility), resulting in a more precise and robust susceptibility map of the study area.6. Selection of the Most Appropriate Model: Various statistical performance indicators such as sensitivity, specificity, precision, accuracy, and the Pearson correlation coefficient (R) were used to determine the most suitable model for predicting landslide susceptibility among the proposed models.7. Evaluation of Predictive Capability: The predictive capability of the optimal model was assessed using K-fold cross-validation, with K = 5 to address underfitting and overfitting issues.This approach helps ensure that the models are better generalized and less prone to overfitting and underfitting.
8. Integration into ArcGIS and Target Map Preparation: The most appropriate model was integrated into ArcGIS, and the optimal input maps were divided into 30 × 30 m cells using the fishnet tool, which is available in ArcToolbox.The ANFIS parameters of the best model were applied to analyze the optimal input layers and provide landslide susceptibility classes for each cell, resulting in the preparation of the landslide susceptibility map for the study area.
The methodology used to identify the most suitable model and create the target map is thoroughly explained in Figure 3.

Thematic Map Layers
Initially, Google Earth images were used to identify potential landslide areas.Subsequently, a series of field trips were conducted to validate the presence of landslides and assess their characteristics, including size, shape, and movement types; conduct site diagnostics; and ascertain the activity status (active, dormant, etc.) of failed slopes.A total of 160 training sites were identified, digitized, and rasterized in a GIS with a grid size of 30 × 30 m.This grid size was determined by the dimensions of the digital elevation model (DEM) obtained from the Shu le Radar Topography Mission (SRTM) database, which was used to generate the slope, aspect, and elevation maps.The SRTM 30 digital elevation model is widely used for various geospatial analyses due to its effectiveness and signifi-cance, as well-documented in the literature [72].A hydrographic network map was created on the DEM using the hydrology toolbox in ArcGIS, and this map was used to create the distance to rivers using the "near" tool available in ArcToolbox.Furthermore, the land cover map was created from the Sentinel-2 satellite imagery, 2022, with a 10 × 10 m grid, established by the Environmental Systems Research Institute, which has been classified into six classes.
The lithology reflects the physical and mechanical properties of geological formations covering the surface.The lithological map was constructed based on the geological map service of Algeria at a scale of 1:50,000.The precipitation map is classified based on data from the National Agency for Hydraulic Resources of Algeria at a scale of 1:500,000.Finally, the road map was extracted from the open-source Google Maps Road data available on Google Earth, and this map was used to create the distances to roads using the "near" tool available in ArcToolbox.Elevation, slope, and aspect had the same resolution as the DEM file, whereas lithology, land cover, and precipitation maps were converted into raster format files with the same resolution (30 × 30 m).Similarly, the distance to streams and distance to roads were converted into raster format files with a 30 m × 30 m resolution.
It is important to note that machine learning modeling can yield the optimal results when a comprehensive set of input variables is considered.However, it is crucial to include only those inputs that have a significant influence on estimating the target value [15].For this study, the selected effective drivers fall into three main classes: geological, geomorphic, and vegetation factors, which serve as inputs in the modeling process.Eight triggering landslide parameters were chosen based on a review of the landslide literature, expert knowledge, and the specific characteristics of the study area [1,8,14,15].These parameters include lithology, elevation, slope, land cover, distance to stream, precipitation, slope aspect, and distance to road.Subsequently, the Gamma Test method was employed for the optimal input selection (further explained in Section 2.6).Maps depicting landslide causative factors in the study area are presented in Figures 4-11.


Lithology (Figure 4) refers to the diverse geological formations present in the study area, characterized by their heterogeneity from a geotechnical perspective.It is recognized as one of the primary landslide-triggering drivers.Landslide susceptibility is significantly influenced by lithology, which directly controls the geo-mechanical properties of soil and rock.For example, formations with weaker mechanical properties, such as clay, marl, or highly weathered rocks, are more prone to failure, leading to a higher risk of landslides.Conversely, soils with high shear strength parameters like granite or basalt exhibit a greater resistance to slope failure [1,14,15]. Precipitation (Figure 5) plays a critical role in landslide susceptibility by saturating slopes, increasing the pore water pressure, inducing hydraulic erosion, triggering preexisting instability, and interacting with other drivers [73]. Elevation (Figure 6) is also considered an important driver of landslide occurrence.Often, elevation is indirectly linked to other drivers such as slope, erosion, precipitation, soil thickness, and land use.Additionally, it helps to classify the local terrain morphology by identifying areas of high and low elevation [1,14,15]. Slope (Figure 7) is recognized as the most important driver of landslides [74].At the local scale, it affects the pore pressure levels.On a larger scale, it regulates the regional hydraulic continuity and is considered the primary driver for GIS-based mapping.The slope angle typically correlates with slope movement and relates closely to the driving forces; higher slope angles indicate greater susceptibility to landslides [1,14,15]. Land cover (Figure 8) is another influential driver of landslide susceptibility, in which vegetation plays a significant role in reducing soil erosion.Varying vegetation in a region is a key factor that notably influences slope stability.Vegetation contributes to enhancing the mechanical soil parameters through root strengthening, which can decrease the occurrence of landslides [1,14,15].


Distance to stream (Figure 9) refers to the proximity of a site to a stream and is also a crucial driver of slope stability.Streams can adversely affect site stability by eroding the slope or saturating the lower part of the slope, leading to an increase in the water levels and a decrease in the soil mechanical parameters [1,14,15]. Distance to the road (Figure 10) has been identified as a significant driver of landslide occurrence by previous studies.Road and highway construction, particularly in mountain areas, where they are built alongside slopes, can disrupt the natural equilibrium of soils, potentially leading to slope instabilities.Conversely, a greater distance from roads reduces the load on both the topography and the toe of the slope, thereby decreasing the likelihood of landslides.Consequently, the road network can be considered a key factor in landslide occurrence [1,14,15]. Slope aspect (Figure 11) is considered by several researchers as an indispensable driver of landslide susceptibility modeling, because it is related to the primary rainfall direction, wind influence, and exposure of slopes to sunlight [1,14,15].The aspect data for the study area are summarized in Table 1, showing the surface area and percentage for each directional category.The data indicate that south-facing slopes (S) dominate the study area, accounting for 19.19% of the total surface area, followed by north-facing (N) and east-facing (E) slopes, representing 15.93% and 12.16% of the area, respectively.Northwest-facing (NW) slopes have the least coverage (8.74%).

ANFIS
ANFIS adopts a hybrid approach by combining an adaptive Artificial Neural Network (ANN) with a Fuzzy Inference System (FIS).Introduced by Jang in 1993 [75], ANFIS has consistently demonstrated its performance in addressing nonlinear problems across various studies.Through the training of input-output data, the ANFIS mechanism identifies the optimal parameters for membership functions (MFs).The ANFIS architecture, illustrated in Figure 12, consists of five layers.In the initial layer, known as fuzzification, the inputs in each node (j) undergo transformation into fuzzy membership functions through an activation function µ.This function can take various forms, such as triangular, trapezoidal, sigmoidal, Gaussian, etc., as described by Equations ( 1) and (2).= µ Aj (x) for j = 1, 2 = µ Bj − 2 (y) for j = 3, 4 where x and y are the inputs, is the membership function, and Aj and Bj are the membership values of µ.In the current study, the Gaussian membership function presented in Equation ( 3) is used: The parameters Cj and σj, which are the mean and standard deviation of the Gaussian curve, respectively, serve as the premise parameters for the membership functions.In layer 2, the weights (wk) for these membership functions are computed using Equation (4), which determines the firing strength of the rules.
In layer 4, known as the defuzzification layer, the output for each node j is evaluated using Equation ( 6), and the consequent parameters (p, q, and r) of the firing strengths f are calculated.
= j = j (pj x + qj y + rj) for j = 1,…, 4 The final layer calculates the overall output in a single node by summing up all the incoming signals, as described by Equation (7).

Genetic Algorithms (GAs)
Devised by John Holland in 1970 [76], Genetic Algorithms are inspired by natural selection and genetic processes.Population renewal depends on the success of the fi est individuals within the species.Initially populated with encoded points, GAs employ three operators (crossover, mutation for space exploration with potential solutions, and selection) to guide the population toward the optimal solution to a problem.The decision to terminate the Genetic Algorithm's execution depends on various criteria tailored to specific problem types.Commonly employed criteria include the convergence of adaptation mean within a population, available time for a single GA execution, and maximum number of generations or evaluation functions in GA testing.The random generation of a population chain may explore a limited solution domain, increasing the likelihood of identifying the optimal solution while risking the loss of proximity to the optimum during the search.Consequently, GA execution must be repeated multiple times with different sets of random starting points to enhance the probability of detecting the optimal solution.

Particle Swarm Optimization (PSO)
Introduced by Kennedy and Eberhart in 1995 [77], the Particle Swarm Optimization (PSO) algorithm is inspired by the social behavior and dynamic interactions observed in nature, such as those among insects, birds, and fish.It integrates individual experiences with social interactions, aiming to obtain an optimal solution.This is achieved by considering the best position encountered by a particle along with that of its neighbors.A predefined fitness function assesses the performance of each particle, quantifying its effectiveness in addressing the optimization problem.
The updated equations for Particle Swarm Optimization (PSO) are as follows: . .
where t is the current iteration, I the ith solution, is the position of ith solution in the h iteration, c1 and c2 are the acceleration coefficients, r1 and r2 are random values, pbest is the best solution that the ith particle has obtained so far, gbest is the best position of the total swarm, and is the velocity of ith solution in the h + 1 iteration.

Harris Hawks Optimization (HHO)
Crafted by Heidari et al. [78], Harris Hawks Optimization (HHO) stands as an innovative algorithm rooted in swarm intelligence.It has been extensively utilized in recent years for tackling intricate nonlinear optimization problems; the HHO mechanism emulates the hunting actions of hawks seeking their prey.The HHO process unfolds in two primary phases: exploration and exploitation.
In the exploration phase, a set of the initial population of Harris Hawks {X1, X2,…, Xn} is created randomly to track and detect the prey (rabbit) within the feasible space.According to Equation (10), both hawks and prey have the same chance q.
where X(t) and X(t + 1) are the positions of the Harris Hawk at iterations t and t + 1, respectively.Xrand is a randomly selected Harris Hawk among all available individuals; Xrabbit(t) is the position of the prey q; r1, r2, r3, and r4 are random values varying between 0 and 1; and LB and UB are the lower and upper bounds, respectively.Equation (11) gives the mean position Xm(t) of the total number N of Harris Hawks.
During the hunting process, the escaping energy E of the prey decreases, and the transition from the exploration to the exploitation phase can be expressed by Equation (12).
where E0 is the initial energy of the prey, ranging between −1 and 1, and T is the maximum iteration.
Exploitation is the second phase, which aims to refine the locally found solution.The prey, once a acked, tries to escape.Depending on the Hawks' chasing behavior, four scenarios can occur: Soft Besiege, Hard Besiege, Soft Besiege with Progressive Rapid Dive, and Hard Besiege with Progressive Rapid Dive.

Salp Swarm Algorithm (SSA)
The Salp Swarm Algorithm (SSA), introduced by Mirjalili in 2017 [79], mimics the behavior of salps during navigation and food source exploration in seas and oceans.In the SSA, an initial population is randomly generated and split into two groups: leaders and followers.Leader salps, positioned at the top of the salp chain, direct the swarm towards the designated food location (the target), while the other salps follow the leaders during the aggregation phase.
The position of the leading salp, denoted as xj 1 , is updated according to Equation ( 13).
Fj represents the best food solution in the jth dimension, ubj and lbj represent the upper and lower bounds in the jth dimension, respectively, c2 and c3 are random numbers ranged between 0 and 1, and c1 is a parameter that varies with the following expression (Equation ( 14)): where t and T represent the current iteration and the maximum number of iterations, respectively.Finally, the position of the followers xj i is updated according to Equation ( 15).
where i ≥ 2 and xj i represent the position of the ith follower at the jth dimension.

Gamma Test
The Gamma Test (GT) is an iterative mathematical method for assessing the variance of noise, or the Mean Squared Error (MSE), without overfi ing the model.It is particularly useful for evaluating the nonlinear correlation between two random variables, the dependent variable X and the independent variable Y.The GT method was initially utilized by Koncar [80] and Stefánsson et al. [81], and since then, numerous studies have been conducted to further develop it [82].Here, only key information about the GT is provided; for a deeper understanding of the method, it is recommended to read the cited references.
By adhering to the essential conditions outlined in previous studies, the bias of regression between γ(k) and δ(k) can be computed, where 1 ≤ k ≤ p, which is a crucial parameter for assessing the variance of noise, denoted by Γ. γ(k) and δ(k) are computed as follows: where xN[i,k] is the kth nearest neighbor of xi, yN[i,k] the corresponding target, and p the number of input variables.To compute Γ, a least squares fit line must be determined for the p points (δ(k),γ(k)).Then, the bias of the regression line will be simply valued, which characterizes the Gamma statistics parameter, Γ.Previous research has shown that Γ provides advantageous results for choosing the optimal input parameters.A small value of Γ indicates that the input set yields a be er fit.Furthermore, another very important parameter to evaluate the estimation of the input set is Vratio, computed as follows: where (y) is the output variance.In summary, in the present study, the optimal input combination is identified based on the lowest values of Γ and Vratio.

Statistical Indicators
The estimation accuracy of the suggested models was evaluated using various statistical indicators and graphical approaches.The statistical indicators employed in this study include sensitivity, specificity, precision, accuracy, and Pearson correlation coefficient (R), listed in Table 2

Database Compilation
Compiling the database is a critical step in assessing the relationship between landslide occurrence and its causal factors.Typically, landslide occurrences tend to be more frequent under certain conditions, as observed in previous incidents.Therefore, identifying the distribution of past landslides is crucial for a comprehensive study.In this research, Google Earth images were used to locate landslide sites using the Historical Imagery tool, which enables the visualization of changes over time in the study area map.This approach is valuable for monitoring suspicious sites over time and identifying potential landslide areas.Following this, field surveys were conducted to validate the potentiality; assess the sizes and shapes of landslides; determine the movement types; conduct site diagnoses; and characterize the activity level (active, dormant, etc.) of failed slopes.
To assess the performance of hybrid machine learning methods, various thematic layers representing several factors influencing landslides were prepared.The selection of drivers depended on data availability, information gathered from field surveys, and specific characteristics of the study area.These factors were digitized, organized, and rasterized using Geographic Information System (GIS) software, specifically ArcGIS 10.8, to integrate them into the proposed model and generate the final map.

Correlation between Inputs and Target
The statistical relationship between landslide susceptibility and input parameters was examined using SPSS software version 20.0.Table 4 displays the correlation matrix, providing a descriptive overview of the data distribution, the Pearson correlation coefficient (R) and its significance regarding landslide susceptibility, and other inputs.The results revealed significance levels below 0.05 for X1, X2, X3, X7, and X8, indicating statistical significance in these correlations.According to Smith's classification (1986) [83], landslide susceptibility exhibits a consistent correlation with the input parameters, except for X4, X5, and X6, which demonstrate poor correlation.This suggests a complex nonlinear relationship that necessitates advanced machine learning techniques for accurate modeling.(2-tailed) 0.000 0.000 0.000 0.732 0.597 0.567 0.000 0.000 .

Optimal Input Selection Using the Gamma Test (GT)
In this section, the impact of each input on landslide susceptibility was evaluated by constructing eight different combinations of the input factors (X1, X2, X3, X4, X5, X6, X7, and X8), as outlined in Table 5.The first combination includes all eight parameters (re-ferred to as the initial set).Similarly, the second combination consists of seven input factors (All-X1), excluding the lithology parameter; the seventh combination includes all inputs except precipitation (X7) and so forth for the remaining combinations, as detailed in Table 5.The results of the GT analysis reveal that factors X1, X3, X5, X7, and X8 have a significant influence on the output.These five input factors were selected based on the highest value of the gamma statistic (Γ) and Vratio.The findings clearly indicate that the combination of lithology, slope, distance to stream, precipitation, and slope aspect exhibited the lowest values of gamma statistics.Consequently, this set was identified as the optimal combination of input variables for modeling landslide susceptibility, as determined by the Gamma Test method.

Landslide Susceptibility Classification through the Hybrid Metaheuristic Method
To determine the optimal machine learning model, the study employed a two-step approach: first, selecting influential input parameters based on the literature recommendations, and second, identifying the best machine learning methods.Initially, eight factors were chosen, and the Gamma Test method was then applied to identify the optimal inputs.Subsequently, five statistical measures were used to assess and compare the performances of various models during both the training and validation phases.The results, including sensitivity, specificity, precision, accuracy, and Pearson correlation coefficient (R), are presented in Table 6.

Evaluating the Best-Fi ed Model Using the K-Fold Cross-Validation Approach
The evaluation of the predictive capability of the optimal ANFIS-HHO model involved the effective use of a five-fold cross-validation approach.Notably, previous studies focusing on predicting landslide susceptibility often assessed their models based on a single split, which limited the verification of their models' ability to address overfi ing and underfi ing issues.Figure 13 illustrates the performance measures of the optimal ANFIS-HHO model using five-fold cross-validation with validation data for each split.
The results clearly demonstrate the efficacy of the ANFIS-HHO model, with correlation coefficients ranging between 0.972 and 0.994 for validation data across the five splits.This substantiates the predictive capability of the optimal ANFIS-HHO model to not only learn from existing data but also generalize well to novel validation data, effectively overcoming the overfi ing and underfi ing challenges.

Landslide Susceptibility Mapping
The landslide modeling process was conducted using four hybrid metaheuristic machine learning methods based on the training dataset.The performance of each model was assessed using five statistical indicators, revealing ANFIS-HHO as the most suitable model.To generate landslide susceptibility maps, susceptibility indices were computed for all pixels in the study area using a 30 × 30 m grid size.The Fishnet Tool in ArcToolbox facilitated this step.Subsequently, the ANFIS-HHO model was integrated into ArcGIS software to classify the susceptibility indices of each pixel based on the optimal input layers.
Figure 14 illustrates the resulting landslide susceptibility maps, showing three susceptibility classes: low, moderate, and high.The distribution of susceptibility classes reveals that 48.39% of the study area exhibits low susceptibility to landslides, while 22.31% and 29.29% have moderate and high susceptibilities, respectively.
The map illustrates an increasing susceptibility from plateau surfaces to streams, primarily influenced by slope angle.Plateau surfaces demonstrate low susceptibility due to factors such as lithological characteristics (e.g., hard rock), distance from streams, low precipitation, and gentle slopes.Moderate susceptibility is observed in ravines and convex slope ruptures delineating plateaus, indicative of lithological alteration zones (passage from a hard layer to soft layer).Conversely, high susceptibility zones are characterized by soft soil lithology, sparse vegetation cover, steep slopes, high precipitation, and proximity to streams.

Comparison between Our Model and the Models Proposed by the Literature
To assess the effectiveness of the proposed ANFIS-HHO model, a comparative study was conducted involving several empirical models from the literature predicting landslide susceptibility, as outlined in Table 7.The comparison was based on classification accuracy, sensitivity, and specificity, crucial indicators for evaluating prediction accuracy, where values close to 100 represent the best model.The results of the comparative study revealed that our proposed ANFIS-HHO model outperformed the others, demonstrating the highest classification accuracy, sensitivity, and specificity with values of 99.21, 100, and 98.734, respectively.
Following our model, the random forest model proposed by Dou et al. [52] ranked second, providing acceptable accuracy.The performance hierarchy of the machine learning models in our study was as follows: Dou et al. [52], Benbouras [8], Kavzoglu et al. [39], Tien Bui et al. [26], Aghdam et al. [59], Dao et al. [52], and Yeon et al. [50].The effectiveness of our suggested model is a ributed to the metaheuristic hybrid machine learning method, which automates the training process and achieves a be er performance and optimum results in a short period of time.

Significance of the Results
In our current research, we aimed to significantly contribute to the landslide research community by enhancing the performance of landslide susceptibility models.To the authors' knowledge, the quality of these models heavily relies on the chosen method, and the current study focuses on exploring the effectiveness of novel hybrid metaheuristic machine learning methods.Furthermore, despite Medea Wilaya being highly vulnerable to landslides, the existing literature lacks a comprehensive landslide map.To address these gaps, the efficacy of four meta-heuristic algorithms combined with the Adaptive Neuro-Fuzzy Inference System (ANFIS) method was examined for a landslide susceptibility assessment.These algorithms include Genetic Algorithm (ANFIS-GA), Particle Swarm Optimization (ANFIS-PSO), Harris Hawks Optimization (ANFIS-HHO), and Salp Swarm Algorithm (ANFIS-SSA).It is worth noting that the use of hybrid metaheuristic machine learning methods in a landslide assessment is relatively rare, representing a premiere for the study area.
Our findings highlight that the ANFIS-HHO model emerged as the most suitable model, exhibiting higher values of sensitivity (100/100), specificity (98.734/100), precision (98/100), accuracy (99.22/100), and Pearson correlation coefficient (R) (99.21/99.97)during both the training and validation phases compared to the other models.Furthermore, we evaluated the newly developed model using the K-fold cross-validation method, demonstrating its ability to generate new data without overfi ing or underfi ing and its superior precision compared to the other proposed empirical models in the literature.
Our results hold significant importance for landslide research and hazard assessment.By demonstrating the effectiveness of hybrid metaheuristic machine learning methods in improving landslide susceptibility models, we provide valuable insights for researchers and practitioners.The identification of the ANFIS-HHO model as the most suitable for landslide prediction underscores the potential of these advanced techniques in addressing complex geological phenomena.The model is based on an optimal hybrid metaheuristic ANFIS-HHO model assessed by the K-fold cross-validation approach.This approach ensures that our model's performance is evaluated using different subsets of the dataset, minimizing the risk of overfi ing and enhancing its generalization.This combination shows a rigorous optimization for accurate predictions, effectively overcoming the overfi ing and underfi ing challenges.It includes essential input parameters for landslide susceptibility, such as lithology, slope, elevation, distance to stream, land cover, precipitation, slope aspect, and distance to road.Moreover, the integration of our improved ANFIS-HHO model into GIS software facilitates the generation of accurate landslide susceptibility maps, enabling decision-makers and land use managers to implement effective risk management strategies.

Inner Validation of the Results
We believe that our proposed methodology, which combines hybrid machine learning techniques with GIS tools, offers a straightforward approach that can be replicated in other regions facing similar challenges.Historically, by the beginning of the 20th century, microzoning maps and machine learning methods gained widespread usage in northern countries, aiding decision-makers and land use managers in various applications [84].Today, the imperative to leverage these tools for developing up-to-date microzoning maps extends to several countries in the south, reflecting their significance in addressing contemporary challenges [84].In this context, landslide susceptibility maps hold considerable importance [85], serving as critical resources for informed decision-making and risk management.

External Validation of the Results
The results of the current study demonstrate a significant enhancement in the performance of the landslide model through the utilization of hybrid machine learning methods.Compared to traditional methods, the metaheuristic hybrid HHO-ANFIS method yielded highly significant results.Furthermore, ANFIS-HHO outperformed the proposed models in the literature, showing a 9.22% improvement over the ANFIS and DNN proposed by Aghdam et al. [59] and Dao et al. [52], respectively, 4.81% improvement over SVM proposed by Kavzoglu et al. [39], 4.14% improvement over PSOGSA-ANN proposed by Benbouras [8], and 3.63% improvement over Random Forest proposed by Dou et al. [52].These findings are consistent with expectations, as hybrid machine learning techniques, whether employed for prediction or classification tasks, can mitigate bias and variance while averting issues such as overfi ing and underfi ing, thus enhancing the predictive capability of traditional methods.

Importance of the Results
The significance of our results lies in their potential to inform landslide risk management strategies and land use planning efforts in landslide-prone regions.By accurately assessing the landslide susceptibility and generating detailed susceptibility maps, our methodology empowers decision-makers to implement proactive measures to mitigate the impact of landslides on human lives and infrastructures.Furthermore, our study highlights the efficacy of hybrid machine learning techniques in enhancing traditional landslide modeling approaches, paving the way for future research and practical applications in similar contexts worldwide.

Study Limitations and Future Directions
The main advantage of hybrid metaheuristic machine learning methods, in contrast to traditional approaches, lies in their ability to automate training processes and achieve superior performance and optimal results within shorter timeframes.Moreover, these methods possess the capability to amalgamate various algorithms, harnessing the strengths of each to create highly adaptable methodologies compared to conventional machine learning techniques.However, it is crucial to acknowledge several limitations associated with hybrid metaheuristic machine learning methods.The most important one is the relatively small sample size used in this study, which may impact the precision of the landslide susceptibility map.This limitation could hinder the model's capacity to generalize novel conditions or scenarios not accounted for during the training phase.Additionally, researchers often rely on extensive and diverse datasets compiled from various sources to bolster learning outcomes.Thus, future studies could benefit from incorporating data gathered across multiple countries to enrich the learning process and enhance the model performance.Furthermore, the implementation of proposed models may pose other challenges in future research endeavors.Results are typically presented in complex matrices computed using transfer functions, which may prove cumbersome to utilize in subsequent cases, particularly considering the necessity to integrate the model with external programs such as ArcGIS, as demonstrated in this study.

Conclusions
The conclusions drawn from this study highlight significant contributions aimed at exploring the efficacy of new unused advanced hybrid machine learning methods in generating a reliable model for effectively assessing landslide susceptibility in Medea Wilaya, which is known as a highly vulnerable area to landslides.To achieve this objective, historical landslide locations were identified using Google Earth images, and multiple field surveys were performed.The Gamma Test method was then employed for optimal input selection, revealing that lithology, slope, distance to stream, precipitation, and slope aspect constitute the optimal input set.Following this, four meta-heuristic algorithms, namely ANFIS-GA, ANFIS-PSO, ANFIS-HHO, and ANFIS-SSA, were combined with the ANFIS method and applied to model the selected optimal input set.The accuracy of the proposed models was assessed using five statistical indicators.Based on that, the comparative assessments highlighted the superior accuracy of the ANFIS-HHO model, exhibiting the best performance in terms of sensitivity (100/100), specificity (98.734/100), precision (98/100), accuracy (99.22/100), and Pearson correlation coefficient (R) (99.21/99.97)during both the training and validation phases compared to other models.Additionally, the predictive capability of the ANFIS-HHO model was evaluated using a five-fold cross-validation with K = 5, which yielded consistently high correlation coefficients (0.972 to 0.994) across validation data splits, indicating the absence of overfi ing or underfi ing issues.Our proposed model is based on an optimal hybrid metaheuristic ANFIS-HHO model assessed by the K-fold cross-validation approach and indicates rigorous optimization for accurate predictions, effectively overcoming the overfi ing and underfi ing challenges.
Comparative analyses with the proposed models in the literature confirmed the significant improvement of our proposed ANFIS-HHO model.Finally, the proposed model was integrated into GIS software to produce an accurate map depicting landslide-prone areas.This map can serve as a valuable tool for decision-makers and land use managers in mitigating landslide hazards in the Medea region.
Theoretically, our study contributes to expanding our knowledge by providing an in-depth insight into the application of advanced machine learning techniques in landslide susceptibility assessments.By investigating the efficacy of hybrid metaheuristic algorithms combined with the Adaptive Neuro-Fuzzy Inference System (ANFIS) method, we contribute to the body of literature on landslide modeling methodologies.
Methodologically, our study presents a novel approach to landslide susceptibility modeling by integrating multiple advanced techniques and methodologies.The utilization of hybrid metaheuristic algorithms represents an innovative methodological advancement, offering a robust framework for improving the predictive accuracy and reliability.Moreover, our research demonstrates the feasibility of integrating machine learning models with Geographic Information Systems (GIS) software for practical applications in hazard mapping and risk management.This methodological innovation has implications beyond landslide research and can be adapted to other geospatial modeling applications, contributing to interdisciplinary approaches in geological-geotechnical risks.
Finally, the current research has the potential to inform decision-making processes and improve disaster preparedness efforts in landslide-prone regions.Moreover, the methodologies developed in this study can serve as a foundation for future research endeavors in the broader field of natural hazard assessment and mitigation.

Figure 1 .
Figure 1.Geographical location of the study area.

Figure 2 .
Figure 2. The landslide inventory map of the study area.

1 .
Detection of Historical Landslide Locations and Identification of Conditioning Factors: Using Google Earth images and multiple field surveys, 160 landslide sites were identified in the study area.Additionally, eight drivers were selected based on the literature, expert knowledge, and characteristics of the area.2. Construction of Database: A database was constructed containing logical scales of landslide conditioning factors and target values.3. Optimal Input Variable Selection: The Gamma Test technique (GT) was employed to

Figure 3 .
Figure 3. Flowchart of the key steps of the research methodology for mapping areas susceptible to landslide.

Figure 4 .
Figure 4. Lithological map of the study area (source: geological map service of Algeria at the scale of 1:50,000).

Figure 5 .
Figure 5. Precipitation map of the study area (source: National Agency for Hydraulic Resources of Algeria at the scale of 1:500,000).

Figure 6 .
Figure 6.Elevation map of the study area (source: DEM generated from the SRTM database).

Figure 7 .
Figure 7. Slope map of the study area (source: DEM generated from the SRTM database).

Figure 8 .
Figure 8. Land cover map of the study area (source: Sentinel-2 satellite imagery).

Figure 9 .
Figure 9. Hydrographic network map (source: DEM generated from the SRTM database).

Figure 10 .
Figure 10.Roads network map of the study area (source: Google Maps Road available on Google Earth).

Figure 11 .
Figure 11.Slope aspect map of the study area (DEM generated from the SRTM database).

Table 3 .
Parameters used and their subdivision into classes.

Figure 13 .
Figure 13.Performance measures of the ANFIS-HHO model using the K-fold cross-validation with K = 5.

Figure 14 .
Figure 14.Landslide susceptibility map for the study area based on the ANFIS-HHO model.
. The optimal model is determined based on the highest values of these statistical indicators, where  FN: False Negative, for incorrectly predicted no-event values.

Table 2 .
Statistical indicators of the model fit quality used in the current study.

Table 4 .
Matrix of the correlations between the meteorological parameters (**: correlation is significant at the 0.01 level; *: correlation is significant at the 0.05 level).

Table 5 .
Optimal input variable nomination using GT.

Table 6 .
Performances of the proposed models throughout the training and validation phases.

Table 7 .
A comparison between our HHO-ANFIS model and some of the proposed empirical models in the literature.