Digital Mapping of Soil pH and Driving Factor Analysis Based on Environmental Variable Screening
Abstract
1. Introduction
2. Materials and Methods
2.1. General Situation
2.2. Sample Collection and Pretreatment
2.3. Obtaining Environmental Covariates
2.4. Research Method
2.4.1. Random Forest
2.4.2. SHAP Interpretation Model
2.4.3. Boruta Algorithm
2.4.4. Recursive Feature Elimination
2.4.5. Particle Swarm Optimization
2.5. Model Accuracy Evaluation
3. Results and Discussion
3.1. Basic Statistics of Soil pH Content
3.2. Explanation of the Driving Force of Spatial Heterogeneity in Soil pH
3.2.1. Pearson Correlation Analysis
3.2.2. Importance Assessment of Environmental Variables Based on the RF Model
3.2.3. SHAP Interpretation Model Results
3.3. Optimization of Variable Combinations Based on Different Environmental Variable Screening Methods
3.4. Cross-Validation Results
3.5. Spatial Distribution of Soil pH
4. Conclusions
- (1) The RF models built on variables screened by Boruta, Recursive Feature Elimination (RFE), and Particle Swarm Optimization (PSO) all predict soil pH more accurately than the RF model trained on all variables. Screening environmental variables before fitting a machine learning model is therefore worthwhile, because it improves model accuracy. Prediction accuracy for the four models ranks as PSO-RF > RFE-RF > Boruta-RF > RF.
- (2) The Pearson correlation analysis, the RF importance assessment, and the SHAP interpretation model all identify CNBL, DEM, T_m, E_m, LST_m, and H_m as key factors affecting soil pH. These variables reflect terrain variation, microclimatic conditions, and hydrological processes, which together shape the spatial distribution of pH.
- (3) The four models produce broadly consistent spatial patterns of soil pH, and the soils are acidic overall. Soil pH is relatively high in the eastern and northern regions and lower in the central, western, and southern regions. This pattern is closely related to terrain undulation and hydrological conditions. The undulating terrain of the eastern and northern regions drains well and accumulates alkaline substances, so soil pH there is closer to neutral. The central, western, and southern regions contain more water bodies; the excess water favors the accumulation of acidic substances, which lowers soil pH.
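
To make the SHAP attribution in conclusion (2) concrete, the following minimal sketch computes exact Shapley values for one sample by enumerating all feature subsets. It is an illustration under stated assumptions, not the study's workflow: `toy_ph_model` and its coefficients are hypothetical, "absent" features are replaced by a single baseline point rather than averaged over background data as SHAP explainers typically do, and only three of the covariates are used.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to `baseline`.

    Features outside a coalition take their baseline value (a
    simplification of the expectation used by SHAP explainers).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_ph_model(z):
    """Hypothetical pH model; the coefficients are invented for illustration."""
    dem, t_m, h_m = z
    return 6.0 + 0.002 * dem - 0.05 * t_m + 0.00002 * dem * h_m


x = [300.0, 17.0, 80.0]         # hypothetical sample: DEM, T_m, H_m
baseline = [100.0, 18.0, 75.0]  # hypothetical reference point
phi = shapley_values(toy_ph_model, x, baseline)
for name, contribution in zip(["DEM", "T_m", "H_m"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the property that lets per-variable attributions be read as shares of a predicted pH value.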
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Soil Forming Factor | Input Variables | Spatial Resolution |
---|---|---|
Topographic | Analytical Hillshading (AH), Aspect (ASP), Closed Depressions (CD), Convergence Index (CI), Channel Network Base Level (CNBL), Channel Network Distance (CND), Elevation (DEM), Coefficient of Variation of Elevation (ECV), LS-Factor (LS), Mass Balance Index (MBI), Multi-Resolution Ridge Top Flatness (MRRTF), Multi-Resolution Valley Bottom Flatness (MRVBF), Plan Curvature (PLC), Profile Curvature (PRC), Relative Slope Position (RSP), Surface Cutting Depth (SCD), Slope (SLP), Total Catchment Area (TCA), Topographic Position Index (TPI), Terrain Ruggedness Index (TRI), Topographic Wetness Index (TWI), Terrain Undulation (TU), Valley Depth (VD), Wind Exposition Index (WEI) | 12.5 m |
Biological | Bare Soil Index (BSI), Enhanced Vegetation Index (EVI), Global Environment Monitoring Index (GEMI), Green Normalized Difference Vegetation Index (GNDVI), Modified Normalized Difference Water Index (MNDWI), Modified Soil-Adjusted Vegetation Index (MSAVI), Normalized Difference Moisture Index (NDMI), Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Net Primary Production (NPP), Soil-Adjusted Vegetation Index (SAVI), Simple Ratio (SR), Visible Atmospherically Resistant Index (VARI) | 10 m |
Soil texture | Sand content (Sand), Silt content (Silt), Clay content (Clay) | 900 m |
Climate | Evaporation (E_m), Humidity mean (H_m), Land surface temperature mean (LST_m), Precipitation mean (P_m), Temperature mean (T_m) | 1000 m |
Land use (LU) | | Vector data |
Soil type (ST) | | Vector data |
Dataset | Max | Min | Mean | SD | CV (%) |
---|---|---|---|---|---|
All samples | 8.50 | 4.02 | 5.79 | 0.81 | 13.90 |
Training set | 8.50 | 4.02 | 5.80 | 0.80 | 13.77 |
Validation set | 8.40 | 4.02 | 5.76 | 0.83 | 14.41 |
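
The CV column is consistent with the usual definition of the coefficient of variation, the standard deviation expressed as a percentage of the mean. As a worked check on the validation-set row, assuming the tabulated SD and mean are themselves rounded values:

```latex
\mathrm{CV} = \frac{\mathrm{SD}}{\bar{x}} \times 100\%
            = \frac{0.83}{5.76} \times 100\% \approx 14.41\%
```

Recomputing the other two rows from the rounded SD and mean gives slightly different values from those tabulated, presumably because the published CVs were calculated from unrounded statistics.
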
Variable Selection Method | Variable Combination Results |
---|---|
Boruta | CNBL, DEM, P_m, T_m, LST_m, H_m, E_m, WEI, NDWI, MRVBF, GNDVI, GEMI, VD, SAVI, NDVI, NPP, Sand, MNDWI, MSAVI, NDMI, RSP, ECV, BSI, Clay, ST, MRRTF, Silt |
RFE | CNBL, P_m, LST_m, E_m |
PSO | CNBL, DEM, T_m, LST_m, P_m, H_m, E_m |
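
The three selected subsets can be compared directly with set operations. The short sketch below uses only the variable names listed in the table (texture abbreviations written as in the covariate table) and shows that the subsets are nested, with a shared core of climate and channel-network variables. The variable names `boruta`, `rfe`, and `pso` are illustrative.

```python
# Variable subsets copied from the table above.
boruta = {
    "CNBL", "DEM", "P_m", "T_m", "LST_m", "H_m", "E_m", "WEI", "NDWI",
    "MRVBF", "GNDVI", "GEMI", "VD", "SAVI", "NDVI", "NPP", "Sand", "MNDWI",
    "MSAVI", "NDMI", "RSP", "ECV", "BSI", "Clay", "ST", "MRRTF", "Silt",
}
rfe = {"CNBL", "P_m", "LST_m", "E_m"}
pso = {"CNBL", "DEM", "T_m", "LST_m", "P_m", "H_m", "E_m"}

# Subset sizes for each screening method.
print({"Boruta": len(boruta), "RFE": len(rfe), "PSO": len(pso)})

# Variables retained by all three methods.
core = boruta & rfe & pso
print("Shared core:", sorted(core))

# Check whether the subsets are nested: RFE within PSO within Boruta.
print("Nested:", rfe <= pso <= boruta)

# Variables that PSO keeps in addition to the RFE subset.
print("PSO only:", sorted(pso - rfe))
```

Read this way, RFE keeps the smallest subset, PSO adds DEM, T_m, and H_m to it, and Boruta retains 27 variables that include all of the PSO subset.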
Method | MAE | RMSE | R² | LCCC |
---|---|---|---|---|
RF | 0.753 | 0.888 | 0.233 | 0.237 |
Boruta-RF | 0.580 | 0.670 | 0.293 | 0.327 |
RFE-RF | 0.565 | 0.662 | 0.317 | 0.479 |
PSO-RF | 0.496 | 0.641 | 0.413 | 0.508 |
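
For readers who want to reproduce the four accuracy measures in the table above, here is a minimal sketch in plain Python. It rests on assumptions: R² is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares), and LCCC as Lin's concordance correlation coefficient with population (1/n) variances. The paper's exact definitions may differ, and the function name and sample values are hypothetical.

```python
from math import sqrt


def accuracy_metrics(observed, predicted):
    """Return MAE, RMSE, R2 and Lin's CCC for paired observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)

    # Assumed definition: coefficient of determination.
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    # Lin's concordance correlation coefficient (population moments).
    var_o = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    lccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "LCCC": lccc}


# Hypothetical observed and predicted pH values, for illustration only.
metrics = accuracy_metrics([5.0, 6.0, 7.0], [5.5, 6.0, 6.5])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike R², LCCC penalizes both scatter and systematic offset from the 1:1 line, so reporting the two side by side separates correlation from agreement.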
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, H.; Liu, Y.; Liu, Y.; Tong, Z.; Ren, Z.; Xie, Y. Digital Mapping of Soil pH and Driving Factor Analysis Based on Environmental Variable Screening. Sustainability 2025, 17, 3173. https://doi.org/10.3390/su17073173