A Novel Estimation of the Composite Hazard of Landslides and Flash Floods Utilizing an Artificial Intelligence Approach

Landslides and flash floods are significant natural hazards that pose substantial risks to human settlements and the environment, and understanding their interconnection is vital. This research investigates landslide and flood hazards in two selected basins in the Yamaguchi and Shimane prefectures, Japan. The study used ten environmental variables together with inventory points representing landslide, non-landslide, flooded, and non-flooded areas. Using a machine-learning approach, namely a LASSO regression model, we generated a Landslide Hazard Map (LHM), a Flood Hazard Map (FHM), and a Composite Hazard Map (CHM). The FHM identified flood-prone low-lying areas in the northwest and southeast, while the central and northwest regions exhibited higher landslide susceptibility. Both the LHM and FHM were classified into five hazard levels. Landslide hazard was predominantly high to moderate, with high-risk areas constituting 38.8% of the study region. Conversely, flood hazard was mostly low to moderate, with high- and very-high-risk areas covering 10.49% of the entire study area. Integrating the LHM and FHM into the CHM highlighted high-risk regions, underscoring the importance of tailored mitigation strategies. Model accuracy was assessed with the Receiver Operating Characteristic (ROC) curve method, from which Area Under the Curve (AUC) values were determined. The LHM and FHM achieved AUC values of 99.36% and 99.06%, respectively, indicating strong model performance. The novelty of this study is the generation of an integrated representation of both landslide and flood hazards. Finally, the resulting hazard maps can support policymaking to address vulnerabilities to landslides and floods.


Introduction
Landslides and floods are among the most destructive natural hazards worldwide, causing loss of life, property damage, and economic disruption [1,2]. According to the definition provided by [3], landslides are downward movements of debris, rock, or earth material driven by gravity, occurring when the driving force exceeds the resisting force as natural soil or rock slopes become destabilized. This destabilization is induced by a combination of natural and anthropogenic factors, including improper land-use practices, loose sediment, intense and prolonged rainfall, highly weathered and fractured rocks, gully and riverbank erosion, seismic activity, the interface between superficial soil and rock layers, and unplanned urban expansion [4,5].
Similarly, flash floods, characterized as sudden and swift inundations occurring within minutes or hours of intense rainfall, represent a distinct hazard often linked to thunderstorms or tropical cyclones [6,7]. Moreover, much of the infrastructure located in regions prone to significant flooding can be damaged owing to inadequate investigations and a lack of proactive mitigation measures, as indicated by Huang et al. [8]. For example, Rusyda et al. [9] confirmed 57 debris flow locations and 81 landslide occurrences, including 16 slope failure locations, during a field survey of the Tsuwano and Nasyohi river reaches in the Takatsu River basin, Yamaguchi prefecture, Japan. Thus, a comprehensive understanding of the spatial patterns of both hazards is imperative for strengthening resilience and preparedness and for minimizing further devastation, as emphasized by Wahba et al. [10].
Mapping and evaluating landslide and flash flood hazards is a pivotal step in identifying and assessing the vulnerability of a given area. Because landslides arise from diverse factors such as intense precipitation, seismic activity, slope instability, altitude, aspect, curvature, drainage density, land use, and lithology, delineating the regions susceptible to them requires a comprehensive analysis [11]. The resulting landslide hazard maps pinpoint areas at risk, facilitating the implementation of targeted mitigation strategies [12]. Similarly, flash flood hazard mapping identifies locations prone to sudden inundation, typically triggered by heavy rainfall in combination with the geomorphological variables mentioned above [13]. Such mapping aids in establishing effective warning systems and evacuation plans and also supports land use planning and development decisions, contributing to a holistic approach to managing the risks associated with these natural hazards [14].
Diverse methodologies exist for landslide and flash flood hazard mapping, including traditional techniques such as geological mapping, field surveys, hydrological modeling, hydraulic modeling, and statistical analysis [15,16]. For instance, Khan et al. [17] used a Geographic Information System (GIS), Remote Sensing (RS), and hydraulic modeling to assess flood hazards in Abha city, Saudi Arabia, under two scenarios, with and without a dam. Despite their long history of use, these methods are often time-consuming, resource-intensive, and prone to inaccuracies, especially in regions with intricate terrain or limited data availability [18]. In response to these challenges, machine learning (ML), a subset of artificial intelligence (AI), has emerged as a promising tool capable of learning from data and making predictions [19].
ML's aptitude for capturing intricate patterns and relationships in historical landslide and flash flood data makes it well suited for hazard mapping, where it may outperform conventional methodologies when those relationships are complex [20,21]. Recent efforts have leveraged ML algorithms to create hazard maps, using their ability to learn patterns from past incidents to predict the severity of these hazards in new areas [22][23][24][25]. It is crucial, however, to underscore that while ML holds significant promise, it is not a panacea for hazard mapping. Rather, it should be used together with traditional methods, such as geological mapping and field surveys, to ensure accurate and reliable hazard maps [25]. Therefore, continued research is needed to refine and evaluate ML algorithms tailored specifically to landslide and flash flood hazard mapping.
The application of ML to landslide and flash flood hazard mapping has attracted increasing attention, with several studies assessing the efficacy of different ML algorithms in this domain. A notable investigation by Daviran et al. [26] focused on the Darjeeling District of India and compared the performance of four ML algorithms for landslide hazard mapping: Random Forest (RF), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naive Bayes classifiers. The RF algorithm exhibited the highest performance, followed by the ANN, while the SVM and Naive Bayes classifiers performed comparatively poorly. Likewise, Jones et al. [27] used logistic regression to develop four landslide susceptibility models based on three typhoon-triggered landslide inventories between 2009 and 2019.
In a parallel study [28] of the Pearl River Basin in China, the performance of two ML algorithms, Random Forests (RFs) and Gradient Boosting Machines (GBMs), was compared for flash flood hazard mapping, and the GBM algorithm outperformed the RF algorithm. From another perspective, urbanization can intensify the likelihood of floods. This is especially evident in regions undergoing urban development, where reduced infiltration rates make them particularly vulnerable to flooding [29]. In addition, Wagenaar et al. [30] investigated flood damage using multiple variables and supervised learning approaches, including regression trees, bagging regression trees, Random Forest, and a Bayesian network.
Collectively, these studies suggest that ML algorithms hold promise for developing accurate hazard maps for landslides and flash floods. However, the effectiveness of an ML algorithm depends on the characteristics of the data and the objectives of the study. The variability in performance across algorithms underscores the importance of carefully selecting and customizing ML approaches for the hazard-mapping task at hand.
Among the variety of ML algorithms used for landslide and flood hazard mapping, Lasso regression has rarely been employed for flood hazard mapping [31,32]. Moreover, flood and landslide hazard maps have not previously been generated for the Takatsu River basin and Nishikigawa River basin.
Thus, the novelty of this study lies in the utilization of Lasso regression to map landslide and flood hazards and the presentation of a combined hazard map for this zone, consolidating information on both landslide and flood hazards.In addition, the outcomes of this study are anticipated to provide valuable insights into the efficacy of ML, specifically Lasso regression, for hazard mapping in complex terrains with multifaceted factors.The results stand to contribute not only to the advancement of accurate and efficient hazard mapping but also to the overarching goal of mitigating the risk of disasters in the region.

Study Area
The investigated region includes the Takatsu River basin and Nishikigawa River basin in the Chugoku region of Japan (see Figure 1). This region, in western Japan, is known for its challenging mountainous terrain and frequent heavy rainfall. The Takatsu River basin occupies the eastern-western part of the Chugoku region and covers 1220 km², while the Nishikigawa River basin, located in another part of the Chugoku region, spans 780 km². Both basins have mountainous terrain and receive substantial precipitation.
The Takatsu River basin is inhabited by approximately 100,000 residents, and its predominant economic activities are agriculture and manufacturing. Similarly, the Nishikigawa River basin has a population of around 50,000, with agriculture and tourism as its primary industries. Both river basins face inherent threats of landslides in their mountainous regions and floods in downstream areas: landslides are particularly frequent in elevated terrain, whereas floods commonly affect lower-lying areas.
In recent years, several significant landslides and floods have occurred in these basins, causing considerable damage to property and infrastructure. The heightened vulnerability of these regions to landslides and flash floods, along with the cascading effects on natural ecosystems and transportation networks throughout the broader Chugoku region, underscores the pressing need for a thorough assessment. Such an evaluation is indispensable for formulating robust mitigation strategies that reduce the detrimental impacts of these natural disasters.

Inventory Map
The flood inventory map plays a pivotal role in assessing flood hazards by identifying areas at risk of flooding [33,34], and its effectiveness increases with the precision with which flooded regions are delineated [35]. Similarly, the accuracy of landslide hazard predictions depends on the availability of comprehensive data on sliding and non-sliding locations. Moreover, landslide masses predominantly consist of a Quaternary sedimentary layer, and the primary mechanism of movement involves traction and cohesive sliding [36].
In this study, we examined a sample of 301 locations within the study area: 55 sliding points, 64 non-sliding points, 93 flooded points, and 89 non-flooded points. The spatial distribution of these points is depicted in Figure 2. The sliding and non-sliding points were split, allocating 70% for training and 30% for validating the model, as per [37]. The same partitioning ratios were applied to the flooded and non-flooded points for training and validation.
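The 70/30 partition can be sketched as a stratified split, in which each class is shuffled and divided separately so that the training and validation sets keep roughly the original class ratio. This is a minimal NumPy sketch under stated assumptions: the source does not say whether its split was stratified, and the function name `stratified_split` and the random seed are hypothetical.

```python
import numpy as np


def stratified_split(labels, train_frac=0.7, seed=42):
    """Split sample indices into training and validation sets class by class.

    Each class (e.g. 1 = sliding, 0 = non-sliding) is shuffled and split
    separately, so both subsets keep roughly the original class ratio.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)                     # in-place random permutation
        n_train = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:])
    return np.sort(np.array(train_idx)), np.sort(np.array(val_idx))


# Class counts taken from the inventory: 55 sliding and 64 non-sliding points
landslide_labels = np.array([1] * 55 + [0] * 64)
train_idx, val_idx = stratified_split(landslide_labels)
```

The same call with 93 flooded and 89 non-flooded labels would produce the flood partition.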

Landslide and flooded points were randomly selected from areas that had experienced landslides and floods, respectively. According to [38], non-sliding points can be generated using a physically based susceptibility model (PISA-m). In this study, however, non-landslide and non-flooded points were randomly selected away from the mapped landslide and flooded areas, at a distance of at least 5 km from them [26]. The flood data from the hazard map portal site "https://disaportal.gsi.go.jp/ (accessed on 15 August 2023)" and the landslide data from the digital archive for landslide distribution maps "https://dil-opac.bosai.go.jp/publication/nied_tech_note/landslidemap/gis.html (accessed on 15 August 2023)" were used to select these points.
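Selecting non-event points at least 5 km from mapped events can be illustrated with simple rejection sampling: draw a random location and keep it only if its distance to every event point meets the threshold. In this sketch the coordinates are assumed to be projected in metres, and the sampling window, the function `sample_non_event_points`, and its parameters are hypothetical. The study's actual GIS workflow is not described in detail.

```python
import numpy as np


def sample_non_event_points(event_xy, bounds, n_points, min_dist_m=5000.0,
                            seed=0, max_tries=100_000):
    """Draw random points lying at least `min_dist_m` from every event point.

    event_xy: (k, 2) array of projected x, y coordinates in metres (assumed).
    bounds:   (xmin, ymin, xmax, ymax) of a hypothetical sampling window.
    """
    rng = np.random.default_rng(seed)
    xmin, ymin, xmax, ymax = bounds
    accepted = []
    for _ in range(max_tries):
        candidate = rng.uniform([xmin, ymin], [xmax, ymax])
        # Euclidean distance from the candidate to every mapped event point
        dists = np.hypot(*(event_xy - candidate).T)
        if dists.min() >= min_dist_m:
            accepted.append(candidate)
            if len(accepted) == n_points:
                break
    return np.array(accepted)


events = np.array([[10_000.0, 10_000.0], [30_000.0, 25_000.0]])
non_events = sample_non_event_points(events, (0.0, 0.0, 50_000.0, 50_000.0),
                                     n_points=20)
```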
Furthermore, using landslide, non-landslide, flooded, and non-flooded data together in machine-learning approaches holds significant potential. Such data allow models to be trained to forecast hazard occurrences accurately and to assess their impacts on various systems. For example, machine-learning models can anticipate the timing and spatial extent of flood events by incorporating data from both flooded and non-flooded points, thereby improving predictive accuracy and contributing to more effective risk management strategies, as noted by Bentivoglio et al. [20].

Methodology
The research can be divided into four fundamental phases: preprocessing, the derivation of environmental factors, the training of machine-learning models, and model validation. The first step, preprocessing, uses ArcMap 10.8.2 software to process the Digital Elevation Model (DEM). DEMs enable the automated extraction of channel networks and the quantitative delineation of the geomorphic attributes of drainage basins [39]. This process is pivotal for determining flow direction, a critical element in computing potential streamlines and basins. Following this, the environmental factors are derived and mapped: elevation, slope, lithology, aspect, plan curvature, profile curvature, land cover, surface roughness, road density, and stream density.
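Flow direction is commonly derived from a DEM with the D8 method, which routes each cell toward the steepest downslope of its eight neighbours. The sketch below assumes D8 with ArcGIS-style power-of-two direction codes; the study does not state which algorithm its ArcMap processing used. It also ignores edge outflow, flat areas, and sinks, which production tools handle.

```python
import numpy as np

# Assumed ArcGIS-style D8 codes: (row offset, column offset) -> direction code.
# Rows increase southward and columns increase eastward.
D8_CODES = {(0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
            (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128}


def d8_flow_direction(dem, cellsize=1.0):
    """Return a D8 flow-direction grid; 0 marks cells with no lower neighbour."""
    rows, cols = dem.shape
    fdir = np.zeros((rows, cols), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            best_drop, best_code = 0.0, 0
            for (dr, dc), code in D8_CODES.items():
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols):
                    continue  # neighbour lies outside the grid
                # Diagonal neighbours are sqrt(2) cell sizes away
                dist = cellsize * (np.sqrt(2.0) if dr and dc else 1.0)
                drop = (dem[r, c] - dem[rr, cc]) / dist
                if drop > best_drop:
                    best_drop, best_code = drop, code
            fdir[r, c] = best_code
    return fdir


dem = np.array([[3.0, 2.0, 1.0],
                [4.0, 3.0, 2.0],
                [5.0, 4.0, 3.0]])
fdir = d8_flow_direction(dem, cellsize=30.0)
```

In this toy grid the centre cell drains northeast (code 128), because the distance-weighted drop to the diagonal neighbour exceeds the drops to the north and east.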
The landslide, non-landslide, flooded, and non-flooded data points are then combined with these environmental factors. The dataset is subsequently partitioned, with 70% allocated to training the machine-learning model and the remaining 30% reserved for assessing model performance; these training and validation ratios have been used by numerous researchers, e.g., [19,40,41]. This study used the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The LASSO method is grounded in shrinkage estimation and has been used extensively in applied statistics [30,42]. According to Pan et al. [43] and Xu et al. [44], the benefits of LASSO include: (1) it can provide greater prediction accuracy than other regression models; (2) its regularization improves model interpretability; and (3) it reduces model complexity. In addition, it can effectively address multicollinearity and facilitates variable selection. The LASSO model was implemented with the sklearn.linear_model module of scikit-learn in Python 3.9.13. After training, the landslide and flood models generate the Landslide Hazard Map (LHM) and the Flood Hazard Map (FHM), respectively, using the ten environmental factors listed above. These hazard maps are then integrated into the Composite Hazard Map (CHM), which serves as a key reference for highlighting both types of hazard.
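The text does not specify how the continuous LASSO outputs were classified into five hazard levels or how the LHM and FHM were merged into the CHM. As a hedged illustration only, the sketch below classifies scores with quantile breaks and combines the two maps by keeping the higher class in each cell. Both rules and both function names are assumptions; a GIS workflow might instead use natural breaks or a weighted overlay.

```python
import numpy as np


def classify_five_levels(score):
    """Map continuous hazard scores to classes 1 (very low) to 5 (very high).

    Quantile breaks are an illustrative assumption, not the study's scheme.
    """
    breaks = np.quantile(score, [0.2, 0.4, 0.6, 0.8])
    return np.digitize(score, breaks) + 1  # digitize yields 0..4


def composite_hazard(lhm_class, fhm_class):
    """Hypothetical combination rule: keep the higher of the two classes per cell."""
    return np.maximum(lhm_class, fhm_class)


# Toy scores standing in for model predictions on ten raster cells
lhm_score = np.linspace(0.0, 1.0, 10)
fhm_score = lhm_score[::-1]
lhm_cls = classify_five_levels(lhm_score)
fhm_cls = classify_five_levels(fhm_score)
chm = composite_hazard(lhm_cls, fhm_cls)
```

Taking the maximum keeps a cell flagged whenever either hazard is severe, which matches the CHM's stated purpose of emphasizing high-risk regions; a summed or weighted index would instead reward cells exposed to both hazards.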
Finally, to gauge model accuracy, the area under the Receiver Operating Characteristic (ROC) curve is computed. To evaluate the precision of the LASSO regression model, we employed the ROC-Area Under Curve (ROC-AUC) technique, a well-recognized approach in machine learning for assessing performance and resolving criteria-selection and interpretive challenges [45]. The ROC curve was constructed from the testing data withheld during training, together with the corresponding predicted values.
ROC curves plot the True Positive Rate (TPR), also called sensitivity, against the False Positive Rate (FPR), equal to (1 - specificity), on the y and x axes, respectively. The TPR measures the proportion of actual positive instances that the model correctly identifies [45], whereas the FPR measures the proportion of negative instances (non-events) that are erroneously classified as positive. In essence, the FPR reflects the model's tendency to predict a positive outcome when the true outcome is negative [46].
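The ROC construction described above can be sketched directly. The validation samples are sorted by predicted score, the threshold is swept from high to low while TPR and FPR accumulate, and the trapezoidal rule gives the AUC. This NumPy version is an illustrative sketch, not the study's code; the function names and toy values are hypothetical, and scikit-learn's `roc_curve` and `roc_auc_score` compute the same quantities.

```python
import numpy as np


def roc_curve_points(y_true, scores):
    """Return FPR and TPR arrays obtained by sweeping the decision threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest scores first
    s_sorted, y_sorted = scores[order], y_true[order]
    # Keep only the last position of each distinct score so ties form one step
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), len(s_sorted) - 1]
    tps = np.cumsum(y_sorted == 1)[last]  # true positives above each threshold
    fps = np.cumsum(y_sorted == 0)[last]  # false positives above each threshold
    # Prepend (0, 0): a threshold above every score predicts no positives
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def roc_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


y_val = [1, 1, 0, 1, 0, 0]                  # 1 = event, 0 = non-event
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]     # model outputs for those samples
fpr, tpr = roc_curve_points(y_val, scores)
auc = roc_auc(fpr, tpr)
```

Here 8 of the 9 event/non-event pairs are ranked correctly, so the AUC equals 8/9.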
Additionally, a residual analysis is conducted, and performance metrics such as R-squared, mean absolute error (MAE), and mean squared error (MSE) are calculated. These metrics serve as the basis for assessing and comparing model performance. Figure 3 illustrates the framework of this methodology. The residual distribution is described by Equation (1).
Here, µ denotes the mean of the distribution and σ its standard deviation; the mean represents the central tendency of the distribution, whereas the standard deviation controls its spread or variability. The square of the standard deviation, σ², is the variance.
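Assuming Equation (1) is the normal probability density, f(x) = exp(-(x - µ)²/(2σ²)) / (σ√(2π)), the residual analysis can be sketched in three steps. First, compute the residuals (actual minus predicted). Next, estimate µ and σ from them. Finally, evaluate the fitted normal curve. The function names and sample values are hypothetical, and using the sample standard deviation (ddof=1) is a choice, not something the source specifies.

```python
import numpy as np


def residual_normal_fit(y_true, y_pred):
    """Return residuals and the mean and standard deviation of a fitted normal curve."""
    residuals = np.asarray(y_true, float) - np.asarray(y_pred, float)
    mu = residuals.mean()
    sigma = residuals.std(ddof=1)  # sample standard deviation
    return residuals, mu, sigma


def normal_pdf(x, mu, sigma):
    """Normal density exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    x = np.asarray(x, float)
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))


y_true = [1, 0, 1, 1, 0]
y_pred = [0.9, 0.2, 0.7, 1.1, -0.1]
residuals, mu, sigma = residual_normal_fit(y_true, y_pred)
density_at_mean = normal_pdf(mu, mu, sigma)
```

Overlaying `normal_pdf` on a histogram of `residuals` is a quick visual check of whether the residuals are approximately normal and centred near zero.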

Conditioning Factors
The present analysis identified ten causative factors. The topographic, DEM-derived factors are elevation, aspect, slope, profile curvature, plan curvature, surface roughness, and stream density. The conditioning factors are described in Figures 4 and 5. Although there is no universally accepted standard for identifying the factors responsible for inducing floods [47], the interplay among diverse topographic and environmental elements contributes significantly to flood risk evaluation. Additionally, an anthropogenic factor (road density), a geological factor (lithology), and a satellite-derived factor (land use and land cover) were considered. Land elevations ranged from mean sea level (MSL) to approximately 1344 m above MSL, and elevation exerts a significant influence on both flood and landslide hazard maps. Elevation maps generated from the DEM in ArcGIS 10.8.2 revealed that lower elevations correlated with higher flooding probabilities, while elevated areas exhibited an increased likelihood of landslides. Elevation is a highly influential determinant of climatic attributes, as noted by [48], and this variable was chosen to capture the topographic attributes of the basin.
Aspect was divided into nine classes to investigate the exposures statistically linked to landslide occurrences: flat (−1), north (337.5-22.5), northeast (22.5-67.5), east (67.5-112.5), southeast (112.5-157.5), south (157.5-202.5), southwest (202.5-247.5), west (247.5-292.5), and northwest (292.5-337.5). This variable influences climatic parameters, including precipitation direction and sunlight intensity, and thereby the frequency of natural events on the Earth's surface, as highlighted by [29]. It was chosen deliberately to capture the orientation of slopes within the region. Slope, a determinant of flood probability and surface water flow, ranged from 0 to 61.06. The degree of slope is significant for floods because it directly influences the flow rate; Kourgialas and Karatzas [49] observed an inverse relationship between flood occurrence and slope angle. Slope was also selected because it reflects the magnitude of topographic variation.
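The nine aspect classes listed above can be assigned with a short function. The sketch assumes aspect values in degrees clockwise from north, with flat cells coded as −1 (the convention of the ArcGIS Aspect tool). The function name is hypothetical, and each sector is taken to include its lower bound (so 22.5 falls in northeast), which is a choice the source does not specify.

```python
def aspect_class(aspect_deg):
    """Return the aspect class for an aspect value in degrees (-1 marks flat cells)."""
    if aspect_deg < 0:
        return "flat"
    names = ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"]
    # Shift by 22.5 degrees so each 45-degree sector starts at a multiple of 45;
    # the modulo wraps 337.5-360 back into the north sector.
    return names[int(((aspect_deg + 22.5) % 360.0) // 45.0)]


classes = [aspect_class(a) for a in (-1, 0, 90, 200, 300, 350)]
```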
In addition, ground curvature, divided into profile curvature (vertical) and plan curvature (horizontal), plays a pivotal role in erosion processes and surface runoff. Profile curvature ranged from −10.285 to 12.206 and plan curvature from −12.25 to 10.98.
Surface roughness is a topographic parameter frequently used to identify and characterize surface features, including diverse vegetation types [50] and various geomorphological characteristics [51]. It is measured as the standard deviation of slope angles, indicating the variability of slope across the terrain. In the study area, surface roughness ranged from 0.111 to 0.889, reflecting diverse patterns of surface response. Stream density emerged as a crucial factor, with higher densities near rivers indicating increased vulnerability to flooding and landslides. Road density is also a significant determinant of flood probability, because the spatial arrangement of roads affects the hydrological response of a catchment to rainfall events. Road density correlates directly with catchment land use, particularly with respect to water infiltration, and through its network configuration it influences drainage efficiency within a catchment, including the time of concentration, as elucidated by [52]. Likewise, geological considerations, involving the aggregation of lithotypes into hydrogeological classes, were deemed essential for a comprehensive susceptibility analysis. The lithotypes were grouped into the following hydrogeological classes: clays and loams, which cover roughly equal areas, and clay loams, which cover nearly two-thirds of the basins. Furthermore, land use/land cover (LULC) data served as a key factor in identifying flood-prone areas [53]. Roads and residential areas were identified as contributors to flood occurrence because they increase runoff peaks. The LULC map, generated from data obtained from the JAXA website and then processed in ArcGIS, contains 12 classes, including water body, urban, agricultural land, grassland, and bare land. Figures 4 and 5 depict the delineated causative parameters.
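Surface roughness defined as the standard deviation of slope angles can be sketched with a moving window over a slope grid. The 3 x 3 window, reflective edge padding, central-difference slope, and function names are assumptions. The study's reported range (0.111 to 0.889) suggests a normalized index, which this sketch does not reproduce.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def slope_degrees(dem, cellsize):
    """Slope in degrees from central-difference elevation gradients."""
    dz_dy, dz_dx = np.gradient(dem, cellsize)  # derivatives along rows, columns
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def surface_roughness(slope, window=3):
    """Standard deviation of slope within a moving window (edges padded by reflection)."""
    pad = window // 2
    padded = np.pad(slope, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))  # (rows, cols, w, w)
    return windows.std(axis=(-2, -1))


dem = np.add.outer(np.arange(5.0), np.arange(5.0)) * 10.0  # tilted plane
slope = slope_degrees(dem, cellsize=30.0)
roughness = surface_roughness(slope)
```

A uniformly tilted plane has constant slope, so its roughness is zero everywhere; irregular terrain yields larger values.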
Although rainfall is an important factor for both landslide and flood hazards, it was excluded from the analysis because the study area covers a limited spatial extent with an almost uniform, intense rainfall pattern. As shown in Figure 6, the monthly precipitation at the centroids of the two basins in 2022 and 2023 was nearly identical. These data were downloaded from "https://power.larc.nasa.gov/data-access-viewer/ (accessed on 12 August 2023)" and visualized to confirm this.

Machine Learning and Performance Metrics
In this investigation, a machine-learning approach was employed to forecast the risk of both landslide and flood occurrences. The research began with the compilation of data on areas affected by flooding and landslides, as well as non-affected areas for both phenomena, within the study region. Relevant environmental features pertaining to the studied hazards were then extracted. The combined environmental features and inventory data were partitioned into training and validation sets, and a suitable machine-learning model was chosen. The model was trained on the training data and then validated on the validation dataset. Finally, the trained model was used to predict the likelihood of both landslide and flood hazards across the entire study region. Figure 7 sketches the schematic diagram of the machine-learning process used in this research.
The LASSO regression estimates the model coefficients by minimizing the residual sum of squares:
$$\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \qquad (2)$$
subject to the constraint:
$$\sum_{j=1}^{p}\lvert\beta_j\rvert \le t \qquad (3)$$
where $\beta_0$ is the y-intercept or bias term, $\beta_j$ represents the coefficients for the input features $x_j$, $p$ is the number of input features, $x_{ij}$ is the value of the j-th input feature for the i-th of $n$ observations, $y_i$ is the corresponding observed response, and $t$ is the maximum allowed sum of the absolute values of the coefficients. Applying LASSO requires specifying a parameter α, the regularization parameter that determines the strength of the imposed penalty: in the equivalent penalized form of the problem, α multiplies the sum of the absolute coefficients, so a larger α corresponds to a smaller $t$ and stronger shrinkage. To explore the effects of different penalty strengths, this research assessed α values of 0, 0.1, 0.5, 1, and 10, where α = 0 imposes no penalty.
In addition, LASSO automatically identifies the key independent predictor variables needed to classify the response of the dependent variable, because the L1 penalty shrinks the coefficients of less informative predictors to exactly zero [27].
An evaluation using the Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R²) showed that an α value of 0.1 offered the highest predictive accuracy, so this value was chosen for the LASSO model. The LASSO model was implemented using the scikit-learn library in Python.
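The α selection can be sketched with scikit-learn, the library named in the study. The data here are synthetic stand-ins for the ten conditioning factors, so the numbers are not the study's results. For α = 0 the sketch switches to `LinearRegression`, since scikit-learn advises against fitting `Lasso` with α = 0. The `nonzero` count also illustrates variable selection: larger α drives more coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the ten conditioning factors (NOT the study's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Binary target (1 = event, 0 = non-event) driven by two of the ten factors
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200) > 0).astype(float)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for alpha in [0, 0.1, 0.5, 1, 10]:
    # alpha = 0 means no penalty, i.e. ordinary least squares
    model = LinearRegression() if alpha == 0 else Lasso(alpha=alpha)
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    results[alpha] = {
        "MSE": mean_squared_error(y_val, pred),
        "MAE": mean_absolute_error(y_val, pred),
        "R2": r2_score(y_val, pred),
        "nonzero": int(np.count_nonzero(model.coef_)),
    }

# Illustrative selection rule: the alpha with the lowest validation MSE
best_alpha = min(results, key=lambda a: results[a]["MSE"])
```

Selecting on a single validation split is the simplest choice; cross-validation, for example scikit-learn's `LassoCV`, would give a more stable estimate of the best α.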

Model Performance
Various methods can be used to evaluate the performance of machine-learning models. In this study, the Mean Absolute Error (MAE), the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE), and R-squared (R²) were used to appraise the effectiveness of both the classifier and regression models. The mathematical expressions for these metrics are given in Equations (4), (5), (6), and (7), respectively.
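$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} \lvert p_i - \sigma_i \rvert \qquad (4)$$
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m} (p_i - \sigma_i)^2 \qquad (5)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (p_i - \sigma_i)^2} \qquad (6)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{m} (\sigma_i - p_i)^2}{\sum_{i=1}^{m} (\sigma_i - z)^2} \qquad (7)$$
where $z = \frac{1}{m}\sum_{i=1}^{m} \sigma_i$, p_i is the predicted value, σ_i is the actual value, z is the mean of the actual values, and m is the total number of data points.
The MAE is the average absolute discrepancy between the predicted and actual values. Each error contributes in proportion to its magnitude. As a result, the MAE is less sensitive to occasional large errors than the MSE and gives a direct measure of the typical size of the prediction error.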
In contrast, the MSE is the average of the squared differences between the predicted and actual values. Squaring gives greater weight to larger errors, which makes the MSE useful for assessing models that must predict extreme values precisely.
The RMSE is the square root of the MSE and expresses the error in the same units as the target variable. This makes the error magnitude easier to interpret on the original scale of the data.
Lastly, R² quantifies the fraction of the variance in the target variable that is explained by the model. It typically ranges from 0 to 1, with higher values indicating a better fit of the model to the data. On held-out data it can become negative when the model performs worse than simply predicting the mean of the actual values.
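The four metrics can be computed with the short NumPy sketch below. It uses the standard definitions and the symbols p_i, σ_i, z, and m defined above. The function name `regression_metrics` is hypothetical, and its results should agree with scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score`.

```python
import numpy as np


def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE and R^2 with p_i = predicted values and sigma_i = actual values."""
    sigma = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - sigma
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    z = sigma.mean()  # mean of the actual values
    r2 = 1.0 - np.sum((sigma - p) ** 2) / np.sum((sigma - z) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Small worked example: only the last prediction is off, by one unit.
m = regression_metrics([1, 2, 3], [1, 2, 4])
print(m)
```

In this example one error of size 1 over three points gives MAE = MSE = 1/3. The residual sum of squares (1) is half the total sum of squares (2), so R² = 0.5.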
These performance metrics were computed for the models in this study and are reported in Table 1.

The Generation of Hazard Maps
The creation of landslide hazard and flood hazard maps is a fundamental aspect of geospatial analysis and disaster management [55,56]. These maps are indispensable tools for assessing and mitigating the risks associated with natural disasters. In the present investigation, the LASSO regression model was employed to generate Landslide Hazard Maps (LHM) and Flood Hazard Maps (FHM), which are presented in Figure 8. Both maps show the highest hazard levels predominantly in the northwestern quadrant, with a notable convergence of both hazards along the central axis connecting the southern and northwestern regions. Furthermore, the landslide hazard extends over a larger geographical area than the flood hazard.
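A fitted regression model can be turned into a hazard surface by applying it to every pixel of a raster stack of the environmental factors. The sketch below assumes the factor rasters are already co-registered and loaded as a `(bands, rows, cols)` NumPy array; reading them from GIS files is omitted. The function name `predict_hazard_raster`, the mask convention, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso


def predict_hazard_raster(model, stack, nodata_mask=None):
    """Apply a fitted regressor to every pixel of a (bands, rows, cols) factor stack."""
    bands, rows, cols = stack.shape
    # One row per pixel (row-major order), one column per environmental factor.
    pixels = stack.reshape(bands, rows * cols).T
    hazard = model.predict(pixels).reshape(rows, cols)
    if nodata_mask is not None:
        # Pixels outside the study area are set to NaN.
        hazard = np.where(nodata_mask, np.nan, hazard)
    return hazard


# Hypothetical example: 10 factors, 200 labelled points (1 = hazard, 0 = no hazard).
rng = np.random.default_rng(0)
X_points = rng.standard_normal((200, 10))
y_points = (X_points[:, 0] > 0).astype(float)
model = Lasso(alpha=0.1).fit(X_points, y_points)

stack = rng.standard_normal((10, 50, 60))  # stand-in for the ten factor rasters
mask = np.zeros((50, 60), dtype=bool)
mask[0, :] = True  # pretend the first raster row lies outside the basins
hazard_map = predict_hazard_raster(model, stack, mask)
print(hazard_map.shape)
```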
Water 2023, 15, x FOR PEER REVIEW 14 of 22
From another perspective, the low-lying regions in the northwestern and southeastern sectors are susceptible to flood hazards, a pattern consistent with earlier studies [57,58]. Meanwhile, the central and northwestern regions of the study area, where lower slope values prevail, show a heightened susceptibility to flood hazards. This is corroborated by [59], which reports an increased likelihood of inundation as the terrain gradient decreases. Both hazard maps were classified into five hazard degrees, from "very low" to "very high", using the equal interval tool in ArcMap (see Figure 9). This classification has important implications for disaster risk reduction, public safety, and effective resource allocation. Areas more prone to flood hazard were associated with lower elevations, gentler slopes, and higher stream density, as also concluded by Janizadeh et al.
[21]. Likewise, the higher landslide hazard degrees covered a greater area than the flood hazard. They were characterized by low to moderate elevations and slopes and by moderate to high drainage density, as found by Wubalem and Meten [4]. In essence, classifying hazard degrees for floods and landslides not only provides a scientific basis for disaster preparedness and response but also helps communities make informed decisions about land use and development. This ultimately contributes to a safer and more resilient environment. It also underscores the significance of such classification in disaster-risk management and the importance of ongoing research and monitoring to refine and improve these classification systems.
From a statistical perspective, the residuals of both the landslide and flood hazard predictions were first estimated. Q-Q plots were then generated using the stats.probplot function in the Python SciPy library; Figure 10 depicts the Q-Q plots of the residuals of the landslide and flood hazard predictions. In addition, a Shapiro-Wilk test for normality was performed, yielding p-values of 0.115 for the landslide residuals and 0.332 for the flood hazard residuals. Following the recommendation of [60], a p-value greater than 0.05 was taken to indicate that the residuals do not depart significantly from a normal distribution. Table 2 presents various statistical measures computed for the predictions.
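The residual check described above can be sketched with `scipy.stats.probplot` and `scipy.stats.shapiro`. The residual sign convention (actual minus predicted), the helper name `residual_normality`, and the synthetic data are assumptions. Passing `plot=matplotlib.pyplot` to `probplot` would draw the Q-Q plot; plotting is omitted here to keep the sketch dependency-free.

```python
import numpy as np
from scipy import stats


def residual_normality(actual, predicted):
    """Return Q-Q plot coordinates, the fitted line and the Shapiro-Wilk p-value of the residuals."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    # osm: theoretical normal quantiles; osr: ordered residuals; plus a least-squares fit line.
    (osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
    _, p_value = stats.shapiro(residuals)
    return osm, osr, (slope, intercept, r), p_value


# Hypothetical predictions with Gaussian errors.
rng = np.random.default_rng(2)
actual = rng.normal(size=300)
predicted = actual + rng.normal(0.0, 0.2, size=300)

osm, osr, fit, p = residual_normality(actual, predicted)
print(f"Shapiro-Wilk p-value: {p:.3f}")
print("normality not rejected at 0.05" if p > 0.05 else "normality rejected at 0.05")
```

A p-value above 0.05 means only that the test found no significant departure from normality; it does not prove the residuals are normal.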

Model Validation
The area under the receiver operating characteristic curve (AUC) was assessed for the machine-learning model under consideration. The AUC values for the LHM and FHM are 99.36% and 99.06%, respectively (Figure 11). These results indicate that the machine-learning approach, and the LASSO regression model in particular, predicts the generated hazard maps with high accuracy. This performance can be attributed to the model's stability and to its ability to accommodate the various environmental factors together with the sliding, non-sliding, flooded, and non-flooded points. In addition, Monte Carlo cross-validation was conducted with 20 iterations, with the samples redrawn in each iteration. The estimated AUC values ranged from 92.5% to 99.69% for the landslide prediction and from 97.59% to 100% for the flood prediction.
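The Monte Carlo cross-validation can be sketched as repeated random train/validation splits, recording the validation AUC each time. The paper does not report the split ratio, so the stratified 70/30 split is an assumption. The other assumptions are that the continuous LASSO output (α = 0.1) serves as the ROC ranking score, the helper name `monte_carlo_auc`, and the synthetic data.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit


def monte_carlo_auc(X, y, n_iter=20, test_size=0.3, alpha=0.1, seed=0):
    """Repeat random stratified train/validation splits and return the validation AUC of each."""
    splitter = StratifiedShuffleSplit(n_splits=n_iter, test_size=test_size, random_state=seed)
    aucs = []
    for train_idx, test_idx in splitter.split(X, y):
        model = Lasso(alpha=alpha).fit(X[train_idx], y[train_idx])
        # The continuous LASSO output ranks validation points for the ROC curve.
        scores = model.predict(X[test_idx])
        aucs.append(roc_auc_score(y[test_idx], scores))
    return np.array(aucs)


# Hypothetical inventory: 1 = landslide (or flooded) point, 0 = non-hazard point.
rng = np.random.default_rng(3)
X = rng.standard_normal((400, 10))
y = (X[:, 0] + 0.5 * X[:, 2] + 0.5 * rng.standard_normal(400) > 0).astype(int)

aucs = monte_carlo_auc(X, y)
print(f"AUC range over {aucs.size} iterations: {aucs.min():.4f} to {aucs.max():.4f}")
```

Stratified splitting keeps both classes in every validation set, which `roc_auc_score` requires.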

Composite Hazard Map (CHM)
Integrating landslide and flood hazard maps into a single comprehensive map is important for enhancing disaster preparedness, mitigation, and response. Such integration provides a holistic understanding of natural hazards and allows a more accurate assessment of areas prone to multiple threats. This in turn enables more effective land-use planning and infrastructure development. The approach not only optimizes resource allocation but also facilitates coordinated emergency response strategies. Furthermore, it helps identify potential interactions and cascading effects between landslides and floods, enabling better-informed decision-making for risk reduction and climate resilience. In short, a unified map offers a powerful tool for addressing the complex challenges posed by these concurrent hazards and promotes more resilient and safer communities.
To generate the CHM, each hazard map was reclassified into values from one to five. The average of the two reclassified hazard values was then computed for each pixel using the Map Algebra tool in ArcMap. The composite hazard map was generated by classifying these averaged values in equal steps, as shown in Figure 12. The CHM places particular emphasis on areas with a significant level of hazard. It highlights the "very high" and "high" risk categories, which are predominantly situated in the northwest, southeast, and southwest regions, with sporadic areas along the central axis extending from the northwest to the southeast. Conversely, the lowest hazard zones are projected to lie in the northeast and middle-west portions of the area, which are characterized by higher elevations and moderate slopes. The "very low" hazard class covers approximately 340 km², a significant portion of the overall study area. Furthermore, much of the area in the "low" hazard category lies predominantly in the southern and northwestern regions, covering approximately 565 km². Notably, this class contains specific zones that are particularly suitable for human habitation, especially where the terrain slope is minimal.
Several countermeasures can mitigate the impact of flooding and landslide disasters in the more hazard-prone areas. These include surface water and groundwater drainage and restraining works such as detention dams, culverts, conveyance channels, drainage wells, anchor and pile works, earth removal, and buttress-fill work, as noted by Mansour et al. [14], Bandara et al. [61], and Higaki et al. [62].
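The reclassify-average-reclassify procedure for the CHM, together with the equal-interval classification of the individual maps, can be sketched in NumPy as shown below. This only approximates ArcMap's equal interval and Map Algebra tools; it is not their implementation. It assumes rasters without NoData cells and takes interval limits from each raster's own minimum and maximum. The function names `equal_interval` and `composite_hazard` are hypothetical.

```python
import numpy as np


def equal_interval(values, n_classes=5):
    """Assign classes 1..n_classes using equal-width intervals between the min and max value."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_classes + 1)
    # Digitizing against the inner edges yields 0..n_classes-1; the maximum lands in the top class.
    return np.digitize(values, edges[1:-1]) + 1


def composite_hazard(lhm, fhm, n_classes=5):
    """Reclassify both hazard maps to 1..5, average them per pixel, then reclassify the mean."""
    mean_class = (equal_interval(lhm, n_classes) + equal_interval(fhm, n_classes)) / 2.0
    return equal_interval(mean_class, n_classes)


# Tiny hypothetical example with three pixels.
lhm = np.array([0.0, 1.0, 0.5])  # landslide hazard scores
fhm = np.array([0.0, 1.0, 0.0])  # flood hazard scores
print(equal_interval(lhm))       # landslide classes
print(composite_hazard(lhm, fhm))
```

Here the landslide classes are [1, 5, 3] and the flood classes [1, 5, 1], giving per-pixel means of [1, 5, 2]. These are then reclassified on equal steps between 1 and 5.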

Hazard Proportions
The proportion of the study area in each hazard class was also calculated, as shown in Figure 13. The figure summarizes the landslide, flood, and composite hazard proportions across the risk categories within the study area. The majority of the study area is characterized by either high or moderate landslide hazard, which together account for approximately three-quarters of the region. This indicates that at least moderate landslide susceptibility is widespread. The two highest classes (high and very high) together comprise 38.8% of the study area, indicating localized areas with significantly heightened risk.
Turning to the flood hazard assessment, the data show a strikingly different pattern. Most of the study area falls into the low and moderate flood hazard categories, which together constitute 87.48% of the region. This suggests that a large portion of the study area is exposed to relatively low flood risk, which can be beneficial for land development and urban planning. Conversely, the high and very high flood hazard classes together represent only 10.49% of the study area. Although this suggests a lower overall flood risk, the potential severity of flood consequences must still be considered, even in areas categorized as having low or moderate flood hazard.
Comparing the two hazards, the primary hazard concern in the study area is landslides, with 38.8% of the region at high to very high landslide hazard levels. Flood hazard, by contrast, is spread more diffusely at lower levels and reaches high to very high levels over only about a tenth (10.49%) of the study area. This underscores the importance of a multifaceted approach to disaster-risk management and preparedness, one that addresses both landslide and flood hazards according to their respective spatial distributions and potential impacts.
Overall, the analysis reveals distinct patterns within the study area. Landslide hazards are concentrated in localized high-risk zones, whereas flood hazards show a more widespread but generally lower distribution. These findings underscore the importance of tailored risk-mitigation strategies and comprehensive disaster preparedness efforts that account for the varying spatial characteristics and potential consequences of these natural hazards.
In terms of the CHM, the largest share of the land area falls within the "Low" hazard category, which comprises more than a quarter of the total area. This suggests that a substantial portion of the region faces relatively minimal combined susceptibility to landslide and flood events, which can be beneficial for urban planning and development.
The "Moderate" hazard category, which encompasses 23.45% of the land area, represents regions with a medium level of risk. These areas warrant heightened attention in disaster preparedness and mitigation efforts, as they may experience significant impacts from landslide and flood events.
Conversely, the "High" hazard category, comprising approximately a fifth of the total area, indicates regions with a relatively elevated risk of both landslides and floods.It is essential for local authorities and stakeholders to prioritize these areas for risk-reduction measures and adopt stringent building and land-use regulations.
Lastly, the "Very Low" and "Very High" hazard categories constitute 16.78% and 12.33% of the land area, respectively.While "Very Low" regions have minimal risk, "Very High" regions represent areas with the highest susceptibility to both hazards.These "Very High" regions demand immediate and comprehensive risk-reduction strategies and necessitate close monitoring and preparedness efforts to safeguard lives and property.
Ultimately, these ratios underscore the varying degrees of landslide and flood hazard within the study region, consistent with the relative differences between hazard degrees reported by Luu et al. [11]. They also emphasize the importance of tailored disaster-management strategies and land-use planning based on these hazard classifications. Careful consideration of these proportions can help policymakers and local authorities allocate resources effectively, implement appropriate mitigation measures, and enhance community resilience to these natural hazards.
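The class proportions reported in this section can be obtained by counting the pixels of a classified raster in each class. The sketch below assumes integer classes 1 to 5 and an optional, hypothetical constant pixel area for converting counts to km². The function name `class_proportions` and the example values are illustrative only.

```python
import numpy as np


def class_proportions(classes, n_classes=5, pixel_area_km2=None):
    """Percentage of classified pixels (classes 1..n_classes) per class, optionally with areas."""
    classes = np.asarray(classes, dtype=int).ravel()
    # bincount index k counts pixels of class k; index 0 is unused.
    counts = np.bincount(classes, minlength=n_classes + 1)[1:n_classes + 1]
    percent = 100.0 * counts / counts.sum()
    areas = None if pixel_area_km2 is None else counts * pixel_area_km2
    return percent, areas


# Hypothetical 2 x 2 classified raster with 0.01 km^2 pixels.
percent, areas = class_proportions([[1, 1], [2, 5]], pixel_area_km2=0.01)
print(percent)  # share of the area in classes 1..5
print(areas)    # area per class in km^2
```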

Conclusions
Landslides and floods are significant natural perils that pose substantial risks to communities and the environment. Understanding their inter-relationship is crucial because it advances our knowledge of these dangers and pinpoints geographical regions where they might occur together. In this study, ten environmental variables were used together with collected spatial data on sliding, non-sliding, flooded, and non-flooded points. These variables were incorporated into the LASSO regression model to generate Landslide Hazard Maps (LHM), Flood Hazard Maps (FHM), and Composite Hazard Maps (CHM).
The FHM indicated that lower-elevation regions in the northwestern and southeastern parts are susceptible to flooding, whereas the LHM showed an increased susceptibility to landslides in the central and northwestern areas of the examined basins. Both the LHM and FHM were categorized into five risk levels, from "very low" to "very high". A significant portion of the region faces moderate to high landslide risk, covering roughly three-quarters of the territory, while areas with high and very high landslide risk account for 38.8% of the surveyed region. Regarding flood hazard, most of the surveyed basins are classified as having low to moderate hazard levels (87.48%), and high and very high flood hazard zones constitute only 10.49% of the surveyed area.
Moreover, the CHM highlights the regions classified as "very high" and "high" risk, which are predominantly situated in the northwest, southeast, and southwest. Conversely, the northeast and middle-west territories exhibit lower hazard levels, associated with their higher elevations and moderate slopes.

Figure 1. Location of the study area.

Figure 6. Monthly precipitation at the centroids of the Nishikigawa River and Takatsu River basins for 2022 and 2023.

Figure 7. Schematic diagram of the ML process.

3.2.1. Least Absolute Shrinkage and Selection Operator (LASSO)
LASSO is a linear regression method that performs variable selection and thereby reduces the number of factors retained in the final model, as described by Hastie et al. [54]. The LASSO model is expressed in Equations (2) and (3):
$$\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \qquad (2)$$

Figure 10. Q-Q plots of the residual distributions for (a) landslide and (b) flood hazards. The red line is the 1:1 line; blue dots are residual values.

Figure 11. Receiver operating characteristic (ROC) curves for both models (the green dotted line is the 1:1 line).

Figure 13. Proportion of the study area in each hazard class.
