Data Biases in Geohazard AI: Investigating Landslide Class Distribution Effects on Active Learning and Self-Optimizing
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data
2.3. Predictor Variables
2.4. Active Learning
2.5. Experiment Design
3. Results
3.1. Mean and Standard Deviation of AUROCs and pAUROCs
3.2. Self-Optimizing Ability
3.3. Classification Mapping
4. Discussion
4.1. Impact of Class Proportions in Selecting Sampling Strategies
4.2. Limitations and Outlook
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
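Characteristic | Study Area 1 | Study Area 2 | Study Area 3 |
---|---|---|---|
Number of landslides | 3564 | 5380 | 5473 |
Ratio of landslides to non-landslides | 1:1 | 1:12 | 1:30 |
Landslide type | co-seismic landslides | co-seismic landslides | co-seismic landslides |
Landslide process | shallow debris slides | shallow debris slides | shallow debris slides |
Size (km²) | 173 | 623 | 1216 |
Geological units | sedimentary and volcanic rocks | sedimentary and volcanic rocks | sedimentary and volcanic rocks |
Triggering mechanism | earthquake | earthquake | earthquake |
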
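
The landslide to non-landslide ratios in the table above can be converted into class proportions, which makes the degree of imbalance in each study area easier to compare. The following is a minimal sketch and not code from the paper: the function name `landslide_fraction` and the dictionary `ratios` are illustrative assumptions, and the ratios are used as reported (they are presumably rounded).

```python
from fractions import Fraction


def landslide_fraction(landslides: int, non_landslides: int) -> Fraction:
    """Share of landslide samples implied by a landslides:non-landslides ratio."""
    return Fraction(landslides, landslides + non_landslides)


# Ratios as reported in the table (landslides : non-landslides).
ratios = {
    "Study Area 1": (1, 1),
    "Study Area 2": (1, 12),
    "Study Area 3": (1, 30),
}

# Exact landslide share per study area, e.g. 1:12 -> 1/13.
fractions_by_area = {area: landslide_fraction(*r) for area, r in ratios.items()}

for area, frac in fractions_by_area.items():
    print(f"{area}: {float(frac):.1%} landslide samples")
```

Under these assumptions, a 1:12 ratio corresponds to roughly 7.7% landslide samples and a 1:30 ratio to roughly 3.2%, compared with 50% in the balanced Study Area 1.
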
Predictor Variable | Study Area 1: Landslides, Median (IQR) | Study Area 1: Non-Landslides, Median (IQR) | Study Area 2: Landslides, Median (IQR) | Study Area 2: Non-Landslides, Median (IQR) | Study Area 3: Landslides, Median (IQR) | Study Area 3: Non-Landslides, Median (IQR) |
---|---|---|---|---|---|---|
Slope angle (°, slope) | 18.71 (10.86) | 14.41 (13.94) | 18.50 (11.42) | 10.26 (16.05) | 18.51 (11.42) | 9.88 (17.54) |
Plan curvature (radians per 100 m, plancurv) | −0.00013 (0.01251) | 0.00097 (0.01674) | −0.00007 (0.01244) | 0.00110 (0.01989) | −0.00007 (0.01242) | 0.00105 (0.02024) |
Profile curvature (radians per 100 m, profcurv) | −0.00033 (0.00455) | 0.00003 (0.00414) | −0.00029 (0.00448) | 0.0000 (0.00336) | −0.00029 (0.00448) | 0.0000 (0.00295) |
Upslope contributing area (log10 m², log.carea) | 2.735 (0.663) | 2.87 (0.578) | 2.719 (0.651) | 2.874 (0.618) | 2.72 (0.65) | 2.91 (0.67) |
Elevation (m, dem) | 140.8 (69.2) | 156.6 (121.3) | 138.8 (73.8) | 117.7 (115.5) | 138.7 (73.99) | 117.75 (139.1) |
TWI | 5.74 (2.08) | 6.38 (2.81) | 5.72 (2.06) | 7.07 (5.12) | 5.72 (2.05) | 7.38 (6.76) |
Catchment slope angle (cslope) | 19.60 (7.35) | 13.75 (9.89) | 19.45 (8.03) | 10.85 (13.17) | 19.46 (8.04) | 10.95 (14.81) |
NDVI | −0.32 (0.28) | −0.08 (0.11) | −0.29 (0.28) | −0.04 (0.17) | −0.29 (0.28) | −0.03 (0.21) |
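
One way to read the predictor table is to compare the class medians of a single variable across the three study areas. The sketch below does this for slope angle only. It is a descriptive illustration and not an analysis from the paper: the names `median_slope` and `median_gap` are assumptions, and the values are copied from the table.

```python
# Median slope angle in degrees, copied from the table:
# (landslides, non-landslides) for each study area.
median_slope = {
    "Study Area 1": (18.71, 14.41),
    "Study Area 2": (18.50, 10.26),
    "Study Area 3": (18.51, 9.88),
}


def median_gap(landslide: float, non_landslide: float) -> float:
    """Difference between class medians, rounded to the table's two decimals."""
    return round(landslide - non_landslide, 2)


slope_gaps = {area: median_gap(*m) for area, m in median_slope.items()}

for area, gap in slope_gaps.items():
    print(f"{area}: landslide median slope exceeds non-landslide median by {gap:.2f} degrees")
```

With the tabulated values, the gap grows from 4.30° in Study Area 1 to 8.24° and 8.63° in Study Areas 2 and 3. The landslide median stays near 18.5° while the non-landslide median drops as the study area expands.
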
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Miao, J.; Wang, Z.; Ma, T.; Wang, Z.; Gao, G. Data Biases in Geohazard AI: Investigating Landslide Class Distribution Effects on Active Learning and Self-Optimizing. Remote Sens. 2025, 17, 2211. https://doi.org/10.3390/rs17132211