Article

Applicability of Visible–Near-Infrared Spectroscopy to Predicting Water Retention in Japanese Forest Soils

1 Forestry and Forest Products Research Institute, Tsukuba 305-8687, Japan
2 Kansai Research Center, Forestry and Forest Products Research Institute, Kyoto 612-0855, Japan
* Author to whom correspondence should be addressed.
Forests 2025, 16(7), 1182; https://doi.org/10.3390/f16071182
Submission received: 7 June 2025 / Revised: 7 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025
(This article belongs to the Section Forest Soil)

Abstract

This study assessed the applicability of visible–near-infrared (vis-NIR) spectroscopy to predicting the water retention characteristics of forest soils in Japan, which vary widely owing to the presence of volcanic ash. Soil samples were collected from 34 sites, and the volumetric water content was measured at eight levels of matric suction. Spectral data were processed by using the second derivative of the absorbance, and regression models were developed by using explainable boosting machine (EBM), which is an interpretable machine learning method. Although the prediction accuracy was limited owing to the small sample size and soil heterogeneity, EBM performed better under saturated conditions (R2 = 0.30), which suggests that vis-NIR spectroscopy can capture water-related features, especially under wet conditions. Importance analysis consistently selected wavelengths that were associated with organic matter and hydrated clay minerals. The important wavelengths clearly shifted from free-water bands in wet soils to mineral-related absorption bands in dry soils. These findings highlight the potential of coupling vis-NIR spectroscopy with interpretable models like EBM for estimating the hydraulic properties of forest soils. Improved accuracy is expected with larger datasets and stratified models by soil type, which can facilitate more efficient soil monitoring in forests.

1. Introduction

Climate change is expected to increase the frequency of extreme weather events such as heavy rainfall and drought [1]. Predicting the effects of such events on forest environments and ecosystems requires understanding the soil water dynamics, which are governed by hydraulic properties such as the hydraulic conductivity and water retention. However, the traditional methods for measuring hydraulic properties, particularly water retention, are costly and time-consuming. Visible–near-infrared (vis-NIR) spectroscopy has been gaining wider application for measuring various physical and chemical properties of soils and their principal components owing to its simplicity and rapid processing with various regression methods [2,3,4]. Janik et al. analyzed the infrared bands to accurately predict chemical properties of soils such as the carbon content and cation exchange capacity [5]. Shepherd and Walsh used vis-NIR spectroscopy to show that the physical properties of soils are correlated with their clay content and chemical composition [2]. Minasny et al. used vis-NIR spectroscopy to demonstrate that physical properties of soil, such as the texture, clay content, and air-dry moisture content, are correlated with the specific surface area and solid composition [6].
Various studies have shown that vis–NIR spectra can be used to predict soil water retention at different matric suctions [7,8] as well as model parameters in the van Genuchten equation [9]. Moreira de Melo and Pedrollo combined vis-NIR spectroscopy with deep learning methods to predict the field capacity and wilting point of soils or even the full retention curve with high accuracy (R2 > 0.8) [10]. However, most of these studies relied on black-box algorithms such as partial least-squares regression (PLS), random forest, or neural networks, which makes it difficult to interpret how individual wavelengths contribute to the predicted water retention [3,11,12]. In particular, PLS compresses the original spectral variables into latent components, which obscures the predictive importance of individual wavelengths [11]. Neural networks can capture complex nonlinear relationships but often function as black-box models with limited transparency. While some studies have highlighted the importance of spectral regions around 1400, 1920, and 2200 nm, few have quantitatively evaluated the contribution of each wavelength [7,9]. To address this issue, post-hoc interpretability tools such as SHapley Additive exPlanations (SHAP) [13] and Local Interpretable Model-agnostic Explanations (LIME) [14] have been applied to attribute predictions to individual features after the model has been trained. However, these tools provide retrospective interpretations and remain disconnected from the model structure, which limits their scientific applicability in understanding the underlying mechanisms between soil properties and vis–NIR spectra [15,16].
In contrast, the explainable boosting machine (EBM) [17] is an inherently interpretable machine learning algorithm that combines the performance of gradient boosting with the transparency of an additive model structure. Thus, EBM can model the nonlinear effects of individual variables through smooth and additive functions while selectively incorporating only the most relevant pairwise interactions [18]. Furthermore, EBM retains the original spectral variables, allowing for direct visualization of the influence of each wavelength on the response variable. Despite its high interpretability, EBM has demonstrated a predictive performance comparable to that of modern machine learning models using black-box algorithms, including deep learning [19]. Thus, EBM can be used to visualize the effects of individual features and their pairwise interactions, which allows identification of the wavelengths that contribute most to the predicted water retention at various matric suctions and addresses a major limitation of previous approaches.
Despite its wide use in agricultural contexts, vis–NIR spectroscopy has rarely been applied to predicting the water retention of forest soils. Studies on upland field soils have suggested that the prediction accuracy is higher for dry regions than for wetter regions, which is likely due to the contribution of clay minerals to residual water retention [20,21]. Conforti et al. [22] used vis-NIR spectroscopy to predict the clay content and associated properties of Italian forest soils with high accuracy (R2 = 0.80). However, forest topsoils are generally characterized by having a higher soil organic carbon (SOC) content than upland field soils, which reduces the bulk density [23]. Especially in Japan, the SOC content and bulk density of forest soils vary widely because the parent materials are often derived from volcanic ash, which strongly affects the saturated water content. A recent meta-analysis by Chinilina et al. [24] demonstrated that SOC can be predicted from vis–NIR spectra with reasonably good accuracy across a wide range of soil types and geographic regions, reporting a median R2 of 0.67 and a median RPD of 1.99. When focusing specifically on forest soils, Liu et al. [25] reported a much higher prediction accuracy (R2 = 0.96), indicating the strong potential of vis–NIR spectroscopy for SOC estimation in such environments.
Previous studies have shown that vis-NIR spectroscopy can be used to accurately determine the SOC content and that the SOC content has a strong correlation with the bulk density. Thus, this study investigated the applicability of vis-NIR spectroscopy to predicting the water retention of Japanese forest soils based on the hypothesis that the saturated water content of forest soils can be predicted from the SOC content. Vis-NIR spectroscopy was applied to 151 forest soil samples collected from 34 soil profiles in Japan, and EBM was applied to clarify which spectral regions are most relevant to the volumetric water content at a given matric suction.

2. Materials and Methods

2.1. Study Area

Figure 1 shows the location of soil sampling sites. Japan is one of the most volcanically active countries in the world. While the geological basement consists mainly of tuff and sedimentary rocks, the surface soils are widely influenced by the addition and deposition of volcanic ash. According to the Fundamental Soil Classification Chart of Japan (Japanese Society of Pedology [26]), Brown Forest Soils (Cambisols in the World Reference Base for Soil Resources (WRB); IUSS Working Group WRB [27]) account for 33.2% of the land area, followed by Andosols (Andosols in WRB [27]) (30.3%), Fluvic Soils (Fluvisols in WRB [27]) (13.7%), Red-Yellow Soils (Acrisols, Lixisols, Alisols, or Luvisols in WRB [27]) (7.6%), and Regosols (Regosols and Lithosols in WRB [27]) (6.9%). The 34 sampling sites used in this study are distributed across the Japanese archipelago, covering a wide range of climatic and geological conditions. In central Japan (e.g., sites 1–5, 14, 19–21, 33, 34), volcanic ash–derived Andosols (Andosols in WRB [27]) are dominant [28], while western Japan (sites 7, 24–32) features forest soils developed from a variety of parent materials [28]. Sites in southern Kyushu (sites 8, 15) are strongly affected by volcanic activity [28], and the subtropical Ryukyu Islands (sites 9–13, 16–18) are characterized by highly weathered Red-Yellow Soils (Acrisols, Lixisols, Alisols, or Luvisols in WRB [27]) [28].

2.2. Soil Samples

Undisturbed soil samples (n = 151) were collected from 34 soil profiles in Japan. Table 1 presents the details on the soil samples. The sampling depth depended on the soil profile and thus differed between sampling sites. Soil samples were collected from the middle part of each horizon by using 100 cm³ (φ50 × H51 mm) or 400 cm³ (φ113 × H40 mm) stainless-steel core samplers. In the laboratory, the collected soil samples were saturated by capillary rise with the water level maintained at approximately 1 cm from the bottom of the cylinder for several days. The mass of each saturated soil sample was then measured and used to determine the volumetric water content at saturation (θs). The volumetric water content was then measured by the pressure plate method at eight matric suctions: pF 1.0 (=0.98 kPa), pF 1.4 (=2.5 kPa), pF 1.7 (=4.9 kPa), pF 2.0 (=9.8 kPa), pF 2.4 (=25 kPa), pF 2.7 (=49 kPa), pF 3.0 (=98 kPa), and pF 3.2 (=147 kPa). Hereafter, the volumetric water content (θ) at a given matric suction (pF i) is referred to as θi (e.g., θ1.0 corresponds to the volumetric water content at pF 1.0). The soil samples were then air-dried at room temperature for 2 weeks, after which they were passed through a sieve with a 2-mm mesh size to remove tree roots and gravel. The soil samples were then ground for vis-NIR spectroscopy.
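The pF values and their kPa equivalents listed above can be checked with a short conversion. The following is a minimal sketch, assuming the usual definition of pF as the base-10 logarithm of matric suction expressed in centimetres of water and the conventional factor of 0.0980665 kPa per cm of water; the function names are illustrative and do not come from the study.

```python
import math

# Conventional pressure of a 1 cm water column, in kPa (assumed constant).
KPA_PER_CM_H2O = 0.0980665

# The eight suction levels listed in Section 2.2.
PF_LEVELS = [1.0, 1.4, 1.7, 2.0, 2.4, 2.7, 3.0, 3.2]


def pf_to_kpa(pf: float) -> float:
    """Convert pF (log10 of suction in cm of water) to suction in kPa."""
    return (10.0 ** pf) * KPA_PER_CM_H2O


def kpa_to_pf(kpa: float) -> float:
    """Convert suction in kPa back to pF."""
    return math.log10(kpa / KPA_PER_CM_H2O)


if __name__ == "__main__":
    for pf in PF_LEVELS:
        print(f"pF {pf:.1f} -> {pf_to_kpa(pf):.2f} kPa")
```

Under these assumptions the first seven levels reproduce the rounded kPa values in the text. The last level is the exception: 147 kPa corresponds to a water column of about 1500 cm, which is pF ≈ 3.18 rather than exactly 3.2, so the label pF 3.2 appears to be a rounded value.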

2.3. Spectroscopic Analysis

The absorbance spectra of the soil samples were measured in the visible and near-infrared regions (400–2500 nm) at a spectral resolution of 2 nm (XDS NIR Analyzer, FOSS NIR Systems, Inc., Laurel, MD, USA). Measurements were repeated three times, and the mean absorbance was taken to predict the water retention. The Savitzky–Golay method was employed, where the absorbance spectra were smoothed by using a quadratic polynomial and a window size of five points, after which the second derivative was taken [29]. All regression models were developed by using absorbance spectra acquired from the air-dried state of each soil sample following pF measurements. The volumetric water contents at a given matric suction (θi) were grouped based on the soil profiles to avoid splitting data collected from different depths of the same soil profile into training and test datasets. The θi data were then split 8:2 into training and test datasets according to the groups. A regression model was developed from the training dataset and was then optimized using the leave-one-group-out cross-validation method.
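The two preprocessing steps described above, the Savitzky–Golay second derivative and the profile-wise data split, can be sketched without external dependencies. This is an illustrative sketch and not the authors' code: it assumes equally spaced wavelengths (2 nm), uses the closed-form weights (2, −1, −2, −1, 2)/7 that a five-point quadratic fit yields for the second derivative, and assumes that the 8:2 ratio refers to the share of profiles rather than of samples. The function names are hypothetical.

```python
import random


def savgol_second_derivative(y, spacing=2.0):
    """Second derivative from a 5-point quadratic Savitzky-Golay fit.

    For five equally spaced points, the least-squares quadratic gives the
    second-derivative weights (2, -1, -2, -1, 2) / 7, scaled by 1 / spacing**2.
    The two points at each end are returned as None because the window
    does not fit there.
    """
    weights = (2.0, -1.0, -2.0, -1.0, 2.0)
    out = [None] * len(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(weights, window)) / (7.0 * spacing ** 2)
    return out


def grouped_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that every group stays entirely in one subset.

    `groups` holds one label per sample (here: the soil profile it came from).
    ASSUMPTION: test_fraction is applied to the number of groups.
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(test_fraction * len(unique)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx
```

In practice the same operations are available in common libraries, for example `scipy.signal.savgol_filter` (with `window_length=5`, `polyorder=2`, `deriv=2`, `delta=2.0`) and scikit-learn's `GroupShuffleSplit` and `LeaveOneGroupOut`; whether the study used these particular calls is not stated in the text.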

2.4. EBM

In this study, EBM was used to predict the water retention of soil samples from their vis-NIR spectra owing to its balance of accuracy with feature-level transparency, which facilitates both robust prediction and scientific insights into the relationships between spectral data and hydraulic properties. In EBM, the conditional expectation $E[Y]$ is defined as an additive function of the predictors $X$:
$$E[Y] = \beta_0 + \sum_{j=1}^{p} f_j(X_j) + \sum_{(i,j) \in I} f_{i,j}(X_i, X_j)$$
where $\beta_0$, $f_j(X_j)$, and $f_{i,j}(X_i, X_j)$ denote the intercept, a univariate shape function, and an optional pairwise interaction term, respectively, for the selected variable pairs $(i, j) \in I$. Unlike the standard multiple linear regression, where the response is modeled as a linear combination of the predictors, each component function $f_j(X_j)$ is a nonparametric and potentially nonlinear function learned from the data. This allows EBM to flexibly capture complex and smooth relationships between individual predictors and the outcome without assuming linearity or monotonicity. For example, in spectroscopic analysis, the effect of a specific wavelength may exhibit threshold-like, saturating, or oscillatory patterns that cannot be adequately represented by a linear coefficient. These learned shape functions can be directly visualized and interpreted to provide insights into the role of each variable in the prediction. These functions are learned via cyclic gradient boosting on residuals using shallow bagged trees, typically with a maximum depth of two, to preserve interpretability and reduce variance [17,18]. The model is trained in a stage-wise fashion by minimizing a loss function (e.g., squared error or logistic loss) using additive updates. At each boosting iteration, the residuals are computed from the current ensemble prediction, and the next shape function is fitted to those residuals. By constraining the complexity of each base learner (e.g., via max depth and learning rate) and applying early stopping or feature dropout, EBM balances predictive performance and generalization. The interaction terms are only introduced when they contribute substantially to the reduction of loss based on the validation performance [19].
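The cyclic, residual-based fitting of the shape functions can be illustrated with a deliberately simplified toy model. The sketch below is not the implementation in the interpret package: it assumes squared-error loss, equal-width binning, one-split trees, no bagging, no early stopping, and no pairwise interaction terms, and all function names are hypothetical. It only shows how visiting one feature at a time and fitting a small update to the current residuals builds up one lookup-table shape function per feature.

```python
def fit_toy_additive_model(X, y, n_bins=8, rounds=200, learning_rate=0.1):
    """Fit E[Y] = b0 + sum_j f_j(X_j) by cyclic boosting with one-split trees."""
    n, p = len(X), len(X[0])
    lows = [min(row[j] for row in X) for j in range(p)]
    highs = [max(row[j] for row in X) for j in range(p)]

    def bin_of(value, j):
        # Equal-width binning; values outside the training range are clipped.
        if highs[j] == lows[j]:
            return 0
        b = int((value - lows[j]) / (highs[j] - lows[j]) * n_bins)
        return min(max(b, 0), n_bins - 1)

    bins = [[bin_of(row[j], j) for j in range(p)] for row in X]
    intercept = sum(y) / n
    shapes = [[0.0] * n_bins for _ in range(p)]  # f_j stored as a lookup table
    pred = [intercept] * n

    for _ in range(rounds):
        for j in range(p):  # visit the features in a fixed cycle
            resid = [y[i] - pred[i] for i in range(n)]
            best = None
            for cut in range(1, n_bins):  # candidate split positions
                left = [resid[i] for i in range(n) if bins[i][j] < cut]
                right = [resid[i] for i in range(n) if bins[i][j] >= cut]
                if not left or not right:
                    continue
                mean_l = sum(left) / len(left)
                mean_r = sum(right) / len(right)
                # Reduction in squared error achieved by this split.
                gain = len(left) * mean_l ** 2 + len(right) * mean_r ** 2
                if best is None or gain > best[0]:
                    best = (gain, cut, mean_l, mean_r)
            if best is None:
                continue
            _, cut, mean_l, mean_r = best
            # Add a shrunken copy of the split to feature j's shape function.
            for b in range(n_bins):
                shapes[j][b] += learning_rate * (mean_l if b < cut else mean_r)
            for i in range(n):
                step = mean_l if bins[i][j] < cut else mean_r
                pred[i] += learning_rate * step
    return {"intercept": intercept, "shapes": shapes, "bin_of": bin_of}


def contributions(model, row):
    """Per-feature contributions f_j(x_j) for one sample."""
    return [model["shapes"][j][model["bin_of"](v, j)] for j, v in enumerate(row)]


def predict(model, X):
    """Intercept plus the sum of the per-feature contributions."""
    return [model["intercept"] + sum(contributions(model, row)) for row in X]


def term_importance(model, X):
    """Mean absolute contribution of each feature over the rows of X."""
    totals = [0.0] * len(X[0])
    for row in X:
        for j, c in enumerate(contributions(model, row)):
            totals[j] += abs(c)
    return [t / len(X) for t in totals]
```

Because every prediction is a plain sum of per-feature lookups, each `shapes[j]` can be plotted against its feature to see the learned effect, which is the property the section relies on. Summarizing a term by its mean absolute contribution is one common choice of importance measure and is used here as an assumption.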
All models were trained on a desktop computer (Lenovo, Morrisville, NC, USA) with an Intel® Core™ i7-14700KF CPU (3.40 GHz, 20 cores/28 threads) and 32 GB RAM. Parallel computation was enabled using the “n_jobs = -1” setting. Due to the sequential feature-wise training structure of EBM, the total training time for all models (corresponding to each θi) was approximately two days. Despite the relatively small dataset (n = 151), the training process was computationally intensive. Nevertheless, the high level of interpretability offered by EBM justified the additional computational cost in this study.

2.5. Performance Evaluation

The model was applied to the test dataset, and the performance was evaluated in terms of the coefficient of determination (R2), root mean squared error (RMSE), and ratio of performance to deviation (RPD):
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \overline{y_i})^2}$$
$$RMSE = \sqrt{\frac{1}{n} \sum (y_i - \hat{y}_i)^2}$$
$$RPD = \frac{S.E.}{RMSE}$$
where $y_i$, $\hat{y}_i$, $\overline{y_i}$, and $n$ are the observed θi, predicted θi, mean θi, and number of samples, respectively. S.E. is the standard error of the means of θi in the test dataset. The model with the lowest RMSE and the highest RPD and R2 was considered to perform the best. According to Riley et al. [30], the minimum required sample size for developing a multivariable prediction model depends on factors such as the number of predictors, the expected R2, and the outcome variance. Based on their framework and given the 1050 explanatory variables used in this study, approximately 3360 samples would be needed to achieve a target R2 greater than 0.8 [30]. As only 151 samples were used in this analysis, the theoretically expected R2 was no more than 0.157, according to Riley et al. [30]. All model development and performance evaluation processes were performed in Python (version 3.11.7, Python Software Foundation, Wilmington, DE, USA) [31] by using the interpret package (version 0.6.10, InterpretML, Redmond, WA, USA) [19] and the scikit-learn package (version 1.2.2, INRIA, Paris, France) [32]. Statistical analysis was performed in R (version 4.4.0, R Foundation for Statistical Computing, Vienna, Austria) [33] by using the multcomp package (version 1.4.26, Torsten Hothorn et al., Zurich, Switzerland) [34].
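The three evaluation metrics can be computed directly from their definitions. The following is a minimal sketch with hypothetical function names. One point is an assumption: the text writes the RPD numerator as S.E., whereas this sketch follows the common convention of using the sample standard deviation (n − 1 denominator) of the observed values.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


def rpd(observed, predicted):
    """Ratio of performance to deviation.

    ASSUMPTION: the numerator is the sample standard deviation of the
    observed values, not the S.E. written in the text.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd / rmse(observed, predicted)
```

With this convention, R2 and RPD are tied together by RPD² = (n / (n − 1)) / (1 − R2), so a model that predicts no better than the mean of the observations has R2 ≈ 0 and RPD ≈ 1, and a model that predicts worse than the mean has a negative R2.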

3. Results

3.1. Measurements

Figure 2a shows that the volumetric water content gradually decreased as the matric suction increased (n = 151). The largest number of outliers was observed at pF 1.7 (i.e., θ1.7). No significant differences (at the 5% significance level) were observed between θ1.4 and θ1.7, θ1.7 and θ2.0, θ2.0 and θ2.4, θ2.4 and θ2.7, θ2.4 and θ3.0, θ2.7 and θ3.0, θ2.7 and θ3.2, and θ3.0 and θ3.2. Figure 2b shows the differences in volumetric water content at each matric suction as influenced by parent material. The degree of variability in water content differed notably among the parent materials. For instance, soils derived from dacite exhibited the highest variability across all matric suctions (e.g., S.D. ≈ 0.15 at θ2.0), likely reflecting heterogeneity in texture or organic matter content. In contrast, granite-derived soils showed consistently low variability (e.g., S.D. ≈ 0.02 at θ2.0), suggesting more uniform physical properties. Soils developed from andesite and conglomerate exhibited intermediate levels of variability, whereas those derived from granodiorite showed relatively stable water retention characteristics even under dry conditions.
Figure 3 shows the absorbance spectra of the soil samples and their second derivative. As the soil moisture content increased, the peak positions remained constant, although the reflectance decreased [35]. The absorbance spectra (Figure 3a) had several characteristic shoulders and peaks. The shoulder near 1100 nm is commonly associated with the first and second overtones of O–H stretching and water vapor absorption [36]. Strong absorption peaks below 600 nm may be attributable to organic matter, as the absorbance near 570–700 nm has previously been linked to chromophoric organic compounds [37]. Broad and overlapping peaks at 600–750 nm suggest potential contributions from both organic matter and other chromophores. This feature is more clearly observed in the second derivative (Figure 3b), where small but consistent variations suggest superimposed peaks in this range. This interpretation aligns with previous studies reporting that multiple weak absorbance features related to organic components and iron oxides can overlap in this spectral window. In addition, relatively strong peaks were observed near 1900 nm, and weak peaks were observed around 1400 and 2200 nm. The increase in absorption at 2250–2500 nm suggests contributions from various clay minerals, particularly those associated with O–H and H2O overtones [3]. Specifically, the peaks near 1400 and 1900 nm are typically related to bound water vibrations, while those near 2200–2500 nm correspond to metal–OH bonds involving elements such as Al, Fe, and Mg [3].

3.2. Prediction Performance

Figure 4 presents the volumetric water content at each matric suction (θi) predicted by EBM. The expected R2 was 0.157 based on the sample size in this study [30]. The model outperformed the expected R2 at θs (i.e., saturation conditions), performed comparably at θ1.0 and θ3.2, and underperformed at other θi. The best model performance was obtained at θs (R2 = 0.30, RPD = 1.22). This indicates that the hypothesis (i.e., the saturated water content of forest soils can be predicted from vis-NIR spectra) is likely valid under the conditions tested. In other words, despite the very limited sample size used in this study, the fact that the R2 exceeded the theoretically expected maximum of only 0.157 by nearly a factor of two provides indicative support for the plausibility of the hypothesis. The worst model performance was observed at θ2.0 and θ2.4, where the R2 values were negative. The model performances at θ1.4 and θ3.0 were inadequate because the R2 values were lower than the expected R2. The RPD varied little across matric suctions (mean 1.07 ± 0.07) even though R2 ranged from −0.03 to 0.30. The RMSE between θ1.4 and θ3.0 was approximately 0.07, which is within approximately 15% of the predicted values, despite the inadequate model performance. The model performance tended to decrease toward θ3.0 but improved again at θ3.2, which suggests that the prediction accuracy may improve further at matric suctions higher than pF 3.2 (e.g., pF 4.2), although such suctions were not measured in this study. Many previous studies [20,21] have found that the prediction accuracy of their models improved when applied to predicting the volumetric water content in dry regions rather than in wet regions. Blaschek et al. were able to predict the volumetric water content at the permanent wilting point (pF 4.2 = −1500 kPa) more accurately than at the field capacity (pF 1.0 = −0.98 kPa) [20].
Figure 5 presents the importance of each wavelength for predicting the volumetric water content at a given matric suction. Table 2 lists the ten most important wavelengths and their values for a given matric suction. Clear peaks in importance were observed in specific spectral regions. Notably, the wavelengths of 1690 and 1592 nm were consistently selected at all nine moisture conditions (saturation and the eight matric suctions), indicating their importance to the model. These wavelengths correspond to the absorption features of C–H and O–H bonds associated with organic matter and clay minerals, which are both known to influence the water retention capacity of a soil [38,39]. Additionally, the wavelength of 930 nm, which is close to the third overtone of water, and the band at 1768–1824 nm, which is associated with the first overtone of free water, were frequently selected. This suggests that both free and bound water features were captured by the model [3,40]. A key observation is the systematic shift in selected wavelengths across the pF spectrum. Under wetter conditions (i.e., θs to θ2.0), the model emphasized shorter wavelengths such as 930 and 1768 nm, which are directly related to free water absorption. Under drier conditions (i.e., θ2.4 to θ3.2), the model emphasized longer wavelengths such as 2104, 2344, and 2362 nm, which are associated with metal–OH bonds in hydrated minerals [39]. This transition indicates that the models fitted for different moisture states drew on different spectral information, using both direct water-related absorption bands and indirect structural indicators. In addition to the frequency of their selection, the magnitude of their importance underscored the relevance of specific wavelengths. For instance, the wavelengths of 1690 and 1592 nm were not only selected consistently across matric suctions but also exhibited high absolute importance values, which demonstrates their strong contribution to the model. These wavelengths likely serve as robust indicators of the intrinsic soil properties governing water retention. In contrast, other wavelengths were selected less frequently or had a lower magnitude of importance, which suggests that they may function as auxiliary predictors under specific conditions. Overall, the combination of consistent selection and high importance values for key wavelengths supports the conclusion that EBM successfully identified spectrally and physically meaningful features and offers both interpretability and reliability despite the limited sample size.
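The bookkeeping behind this kind of summary (ranking wavelengths within each moisture condition and then counting how often each one appears among the top entries) can be sketched as follows. The function names are hypothetical, and the importance values in the example are invented purely to exercise the code; they are not values from Table 2.

```python
from collections import Counter


def top_k(importance, k=10):
    """Return the k wavelengths with the largest importance, highest first."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def selection_frequency(importance_by_condition, k=10):
    """Count in how many conditions each wavelength ranks within the top k."""
    counts = Counter()
    for importance in importance_by_condition.values():
        counts.update(top_k(importance, k))
    return counts


if __name__ == "__main__":
    # Made-up importances (wavelength in nm -> importance), for illustration only.
    example = {
        "theta_s": {1690: 0.9, 1592: 0.8, 930: 0.7, 2344: 0.1},
        "theta_3.2": {1690: 0.9, 1592: 0.7, 930: 0.2, 2344: 0.6},
    }
    for wavelength, n_selected in selection_frequency(example, k=3).most_common():
        print(wavelength, n_selected)
```

A wavelength whose count equals the number of conditions was selected everywhere, while one that appears only at the wet or only at the dry end contributes to the shift described above.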

4. Discussion

The prediction performance of the model at saturation exceeded the theoretical expectation despite the small sample size of the dataset. Notably, the model consistently identified wavelengths known to be associated with properties that govern soil water retention, such as organic matter and clay content. These findings suggest that the regression framework and the model's ability to extract meaningful spectral information were appropriate and that the primary limitations can be attributed to insufficient statistical power resulting from the small sample size. As mentioned in Section 2.5, approximately 3360 samples would theoretically be required to achieve an R2 greater than 0.8 [30]; however, other studies have reported considerably higher prediction accuracy than this study with far fewer samples than that. For instance, Roth et al. [41] achieved R2 ≈ 0.6 using 230 samples, and Lalitha et al. [42] reported R2 ≈ 0.7 with 558 samples. Although these studies targeted different soil properties and environmental conditions, they nonetheless demonstrate that moderate increases in sample size can substantially improve model performance. Thus, even without reaching the theoretical threshold, increasing the sample size in future studies is expected to make a major contribution to improving prediction accuracy.
It is also important to consider the influence of soil heterogeneity. The dataset consisted of 151 samples collected from 34 profiles, encompassing a wide range of parent materials (Table 1), including volcanic ash, granite, schist, and sedimentary rocks. This heterogeneity introduces considerable spectral variability, which can reduce model performance, as previously noted by Stenberg et al. [3] and Pittaki-Chrysodonta et al. [37]. While large sample sizes may help mitigate this effect, under data-scarce conditions, such variability can act as a confounding factor. Stratifying models by soil type or incorporating ancillary information (e.g., texture, mineralogy) may enhance prediction accuracy, as demonstrated by Baumann et al. [21]. Similarly, Oberholzer et al. [43] reported improved Vis–NIR model performance (RPD = 1.14–5.27) when soils were stratified by carbonate content, supporting the utility of targeted modeling strategies in heterogeneous environments.
Despite these limitations, the EBM model successfully captured spectrally and physically meaningful relationships related to soil water retention. As shown in Figure 5 and Table 2, the model consistently selected wavelengths associated with organic matter (e.g., 1592 nm, 1690 nm) and hydrated clay minerals (e.g., 2104 nm, 2362 nm). The shift in important wavelengths from shorter (free water) to longer (bound water and mineral-associated) bands with increasing matric suction demonstrates the model’s sensitivity to moisture conditions. The interpretability of EBM is especially valuable in this context, as it allows for transparent evaluation of feature relevance, unlike traditional black-box models such as neural networks or PLS. Recent developments in interpretable machine learning, such as explainable gradient boosting frameworks (e.g., Jeong et al. [44]), have emphasized the importance of transparent modeling in environmental sciences. These models maintain high predictive accuracy while providing direct insight into the roles of individual predictors, similar to the advantages demonstrated by EBM in this study. These findings support the utility of combining vis–NIR spectroscopy with explainable machine learning techniques in future soil hydraulic studies, particularly if applied to larger, stratified datasets for enhanced generalizability.

5. Conclusions

This study investigated the applicability of vis-NIR spectroscopy to predicting the water retention characteristics of Japanese forest soils, which often contain volcanic ash. While the overall prediction accuracy remained low because of the limited sample size and heterogeneous soil types, EBM effectively identified key spectral regions associated with water retention. The prediction performance was particularly good under saturated conditions, achieving R2 = 0.30, RMSE = 0.067, and RPD = 1.22, which exceeded the theoretical expectation of R2 = 0.157 given the small sample size. Although the regression accuracy under other matric suctions was generally lower, the RMSE values remained within approximately 15% of the predicted values. Important wavelengths varied with the matric suction as clay- and moisture-related bands became more prominent under drier conditions. Although the regression accuracy did not reach practical levels, the interpretability of EBM facilitated meaningful insights into the relationships between soil properties and spectral characteristics. These findings highlight both the current limitations and future potential of vis-NIR spectroscopy for predicting the hydraulic properties of forest soils. Improved performance is expected by using larger datasets and stratifying the model by soil type.

Author Contributions

Conceptualization, R.S. and M.K.; methodology, R.S.; investigation, R.S., T.T. and M.K.; writing—original draft preparation, R.S.; writing—review and editing, T.T., M.K. and A.I.; visualization, R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
EBM: Explainable boosting machine
LIME: Local interpretable model-agnostic explanations
PLS: Partial least-squares
RMSE: Root mean squared error
RPD: Ratio of performance to deviation
SHAP: Shapley additive explanations
SOC: Soil organic carbon
Vis-NIR: Visible–near-infrared spectroscopy
WRB: World Reference Base for Soil Resources

References

  1. IPCC. Summary for Policymakers. In Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II, and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Core Writing Team, Lee, H., Romero, J., Eds.; IPCC: Geneva, Switzerland, 2023; pp. 1–34. Available online: https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_SPM.pdf (accessed on 5 July 2025).
  2. Shepherd, K.D.; Walsh, M.G. Infrared spectroscopy—Enabling an evidence-based diagnostic surveillance approach to agricultural and environmental management in developing countries. J. Near Infrared Spec. 2007, 15, 1–19.
  3. Stenberg, B.; Viscarra Rossel, R.A.; Mouazen, A.M.; Wetterlind, J. Chapter five—Visible and near infrared spectroscopy in soil science. Adv. Agron. 2010, 107, 163–215.
  4. Mohamed, E.S.; Saleh, A.M.; Belal, A.B.; Gad, A. Application of near-infrared reflectance for quantitative assessment of soil properties. Egypt. J. Remote Sens. Space Sci. 2018, 21, 1–14.
  5. Janik, L.J.; Merry, R.H.; Skjemstad, J.O. Can mid infrared diffuse reflectance analysis replace soil extractions? Aust. J. Exp. Agric. 1998, 38, 681–696.
  6. Minasny, B.; McBratney, A.B.; Tranter, G.; Murphy, B.W. Using soil knowledge for the evaluation of mid-infrared diffuse reflectance spectroscopy for predicting soil physical and mechanical properties. Eur. J. Soil. Sci. 2008, 59, 960–971.
  7. Vestergaard, R.-J.; Vasava, H.B.; Aspinall, D.; Chen, S.; Gillespie, A.; Adamchuk, V.; Biswas, A. Evaluation of optimized preprocessing and modeling algorithms for prediction of soil properties using VIS–NIR spectroscopy. Sensors 2021, 21, 6745.
  8. Norouzi, S.; Sadeghi, M.; Tuller, M.; Ebrahimian, H.; Liaghat, A.; Jones, S.B. A novel laboratory method for the retrieval of the soil water retention curve from shortwave infrared reflectance. J. Hydrol. 2023, 626, 130284.
  9. Fouad, Y.; Soltani, I.; Cudennec, C.; Michot, D. Using near-infrared spectroscopy to estimate soil water retention curves with the van Genuchten model. Geoderma 2025, 454, 117175.
  10. De Melo, T.M.; Pedrollo, O.C. Artificial neural networks for estimating soil water retention curve using fitted and measured data. Appl. Environ. Soil. Sci. 2015, 2015, 535216.
  11. Rossel, R.A.V.; Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46–54.
  12. Wadoux, A.M.J.-C. Interpretable spectroscopic modelling of soil with machine learning. Eur. J. Soil. Sci. 2023, 74, e13370.
  13. Lundberg, S.M.; Lee, S.I. A Unified approach to interpreting model predictions. In Proceedings of the 2017 Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  14. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; IEEE: San Francisco, CA, USA, 2016; pp. 1135–1144.
  15. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  16. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2022. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 5 July 2025).
  17. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘15), Sydney, Australia, 10–13 August 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1721–1730.
  18. Lou, Y.; Caruana, R.; Gehrke, J.; Hooker, G. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; ACM: New York, NY, USA, 2013; pp. 623–631. [Google Scholar]
  19. Microsoft InterpretML Project. InterpretML: A Unified Framework for Machine Learning Interpretability. Available online: https://interpret.ml/docs/ebm.html (accessed on 3 July 2025).
  20. Blaschek, M.; Roudier, P.; Poggio, M.; Hedley, C.B. Prediction of soil available water-holding capacity from visible near-infrared reflectance spectra. Sci. Rep. 2019, 9, 12833. [Google Scholar] [CrossRef] [PubMed]
  21. Baumann, P.; Lee, J.; Behrens, T.; Biswas, A.; Six, J.; McLachlan, G.; Viscarra Rossel, R.A. Modelling soil water retention and water-holding capacity with visible–near-infrared spectra and machine learning. Eur. J. Soil Sci. 2022, 73, 13220. [Google Scholar] [CrossRef]
  22. Conforti, M.; Castrignanò, A.; Robustelli, G.; Scarciglia, F.; Stelluti, M.; Buttafuoco, G. Laboratory-based Vis–NIR spectroscopy and partial least square regression with spatially correlated errors for predicting spatial variation of soil organic matter content. CATENA 2015, 124, 60–67. [Google Scholar] [CrossRef]
  23. Nanko, K.; Ugawa, S.; Hashimoto, S.; Imaya, A.; Kobayashi, M.; Sakai, H.; Ishizuka, S.; Miura, S.; Tanaka, N.; Takahashi, M. A Pedotransfer function for estimating bulk density of forest soil in japan affected by volcanic ash. Geoderma 2014, 213, 36–45. [Google Scholar] [CrossRef]
  24. Chinilin, A.V.; Vindeker, G.V.; Savin, I.Y. Vis-NIR Spectroscopy for Soil Organic Carbon Assessment: A Meta-Analysis. Eurasian Soil Sc. 2023, 56, 1605–1617. [Google Scholar] [CrossRef]
  25. Liu, S.; Shen, H.; Chen, S.; Zhao, X.; Biswas, A.; Jia, X.; Shi, Z.; Fang, J. Estimating forest soil organic carbon content using vis-nir spectroscopy: Implications for large-scale soil carbon spectroscopic assessment. Geoderma 2019, 348, 37–44. [Google Scholar] [CrossRef]
  26. Soil Classification System of Japan. The Fifth Committee for Soil Classification and Nomenclature of Japanese Society of Pedology; The Japanese Society of Pedology: Sendai, Japan, 2017. (In Japanese) [Google Scholar]
  27. IUSS Working Group WRB. World Reference Base for Soil Resources: International Soil Classification System for Naming Soils and Creating Legends for Soil Maps, 4th ed.; International Union of Soil Sciences: Vienna, Austria, 2022. [Google Scholar]
  28. Imaya, A.; Noguchi, A.; Watanabe, A.; Ito, C.; Takakai, F.; Shinmachi, F.; Uno, F.; Fujisawa, H.; Miura, H.; Kubotera, H.; et al. The Soils of, Japan; Hatano, R., Shinjo, H., Takata, Y., Eds.; Springer: Singapore, 2021. [Google Scholar]
  29. Savitzky, A.; Golay, M.J.E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639. [Google Scholar] [CrossRef]
  30. Riley, R.D.; Snell, K.I.E.; Ensor, J.; Burke, D.L.; Harrell, F.E.; Moons, K.G.M.; Collins, G.S. Minimum sample size for developing a multivariable prediction model: Part I—Continuous outcomes. Stat. Med. 2019, 38, 1262–1275. [Google Scholar] [CrossRef] [PubMed]
  31. Python Software Foundation Python Language Reference, Version 3.12.3. 2024. Available online: https://www.python.org (accessed on 5 July 2025).
  32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Bertrand, T.; Grisel, O. Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  33. Core Team. R: A Language and Environment for Statistical Computing, Version 4.4.0. R Foundation for Statistical Computing. 2024. Available online: https://www.R-project.org/ (accessed on 5 July 2025).
  34. Hothorn, T.; Bretz, F.; Westfall, P. Simultaneous inference in general parametric models. Biom. J. 2008, 50, 346–363. [Google Scholar] [CrossRef] [PubMed]
  35. McGuirk, H.A.; Cairns, A. Relationships between Soil Moisture and Visible–NIR Soil Reflectance: A Review Presenting New Analyses and Data to Fill the Gaps. Geotechnics 2024, 4, 78–108. [Google Scholar] [CrossRef]
  36. Weyer, L.G.; Lo, S.-C. Spectra—structure correlations in the near-infrared. In Handbook of Vibrational Spectroscopy; Chalmers, J.M., Griffiths, P.R., Eds.; Wiley: New York, NY, USA, 2006; pp. 1817–1837. [Google Scholar]
  37. Pittaki-Chrysodonta, Z.; Moldrup, P.; Knadel, M.; Iversen, B.V.; Hermansen, C.; Greve, M.H.; de Jonge, L.W. Predicting the campbell soil water retention function: Comparing visible–near-infrared spectroscopy with classical pedotransfer function. Vadose Zone J. 2018, 17, 1–12. [Google Scholar] [CrossRef]
  38. Ben-Dor, E. The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400–2500 Nm) during a controlled decomposition process. Remote Sens. Environ. 1997, 61, 1–15. [Google Scholar] [CrossRef]
  39. Demattê, J.A.M.; Campos, R.C.; Alves, M.C.; Fiorio, P.R.; Nanni, M.R. Visible–NIR reflectance: A new approach on soil evaluation. Geoderma 2004, 121, 95–112. [Google Scholar] [CrossRef]
  40. Viscarra Rossel, R.A.; McBratney, A.B. Laboratory evaluation of a proximal sensing technique for simultaneous measurement of soil clay and water content. Geoderma 1998, 85, 19–39. [Google Scholar] [CrossRef]
  41. Roth, K.; Sanderson, M.; Koestel, J.; Jarvis, N. Rapid estimation of soil water retention curves using visible–near infrared spectroscopy. J. Hydrol. 2021, 603, 127195. [Google Scholar] [CrossRef]
  42. Lalitha, M.; Sarkar, D.; Bhowmik, A.; Chakraborty, S.; Das, T.; Kundu, D. Field-scale prediction of soil moisture retention parameters using VNIR–SWIR spectroscopy and machine learning. Geoderma 2022, 417, 115816. [Google Scholar]
  43. Oberholzer, S.; Summerauer, L.; Steffens, M.; Ifejika Speranza, C. Best performances of visible–near-infrared models in soils with little carbonate—A field study in Switzerland. Soil 2024, 10, 231–249. [Google Scholar] [CrossRef]
  44. Jeong, S.; Kim, Y.-K.; Hur, S.H.; Bang, H.; Kim, H.; Chung, H. Explainable extreme gradient boosting as a machine learning tool for discrimination of the geographical origin of chili peppers using laser ablation-inductively coupled plasma mass spectrometry, X-ray fluorescence, and near-infrared spectroscopy. J. Agric. Food Res. 2024, 18, 101446. [Google Scholar] [CrossRef]
Figure 1. Locations of soil sampling sites.
Figure 2. Boxplots of the volumetric water content: (a) volumetric water content at a given matric suction (n = 151), and (b) volumetric water content by parent material at a given matric suction (n = 151). In (a), the orange lines and white circles correspond to the mean volumetric water content at a given matric suction and to outliers, respectively. Boxplots labeled with the same letter did not differ significantly (p < 0.05).
Figure 3. Absorbance spectra of the soil samples (n = 151): (a) spectra smoothed with a quadratic polynomial and (b) second-derivative spectra (note: positive and negative values are reversed in (b)).
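The preprocessing named in the Figure 3 caption (quadratic-polynomial smoothing followed by a second derivative, in the manner of Savitzky and Golay [29]) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `savgol_second_derivative`, the default half-window of 5 points, and the uniform wavelength spacing are assumptions, because the window width used in the study is not given here. In practice, `scipy.signal.savgol_filter` with `deriv=2` offers the same kind of filter.

```python
def savgol_second_derivative(values, half_window=5, spacing=1.0):
    """Second derivative from a local quadratic least-squares fit.

    values      : absorbance readings at uniformly spaced wavelengths
    half_window : points on each side of the centre (assumed value)
    spacing     : wavelength step between readings (assumed uniform)
    Returns the derivative at interior points only (edges are dropped).
    """
    m = half_window
    offsets = range(-m, m + 1)
    s0 = 2 * m + 1                     # number of points in the window
    s2 = sum(k ** 2 for k in offsets)  # sum of squared offsets
    s4 = sum(k ** 4 for k in offsets)  # sum of fourth-power offsets
    denom = s0 * s4 - s2 ** 2
    # For y = a0 + a1*k + a2*k^2 fitted by least squares on a symmetric
    # window, a2 = sum(y_k * (s0*k^2 - s2)) / denom, and d2y/dk2 = 2*a2.
    weights = [2.0 * (s0 * k ** 2 - s2) / denom for k in offsets]
    result = []
    for i in range(m, len(values) - m):
        window = values[i - m:i + m + 1]
        second = sum(w * v for w, v in zip(weights, window))
        result.append(second / spacing ** 2)  # rescale from index to wavelength units
    return result
```

A wider window smooths more strongly but blurs narrow absorption features, so the half-window would normally be tuned to the spectral resolution of the instrument.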
Figure 4. Scatter plots of the volumetric water content (θi) at a given matric suction (pF i) predicted by EBM. The vertical and horizontal axes represent the predicted and observed volumetric water contents, respectively. The red dots and black crosses correspond to predictions using the test and training datasets, respectively. The gray dotted lines show the 1:1 line. The model performance in terms of R2, RMSE, and RPD for the test dataset is also shown.
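The three scores reported in Figure 4 can be computed as in the sketch below. The definitions are the commonly used ones and are an assumption here, since the article's exact formulas are not shown in this part of the text: R2 is taken as one minus the ratio of residual to total sum of squares, RMSE as the root mean squared error, and RPD as the sample standard deviation of the observations divided by the RMSE. The function name `regression_metrics` is hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and RPD for paired observed/predicted values."""
    n = len(observed)
    mean_obs = statistics.fmean(observed)
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        # Assumed RPD definition: sample SD (n - 1 denominator) over RMSE.
        "rpd": statistics.stdev(observed) / rmse,
    }
```

Under these definitions, R2 can be negative on a test set when the model predicts worse than the mean of the observations.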
Figure 5. Importance of each wavelength for predicting the volumetric water content (θ) at a given matric suction (pF i). Red dots represent the ten most important wavelengths for each θi.
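Table 1. Details of soil samples.

| Sampling number | Latitude (N) | Longitude (E) | Horizon | Sampling depth [cm] | Parent material | Sample volume [cm³] | Bulk density [g cm⁻³] |
|---|---|---|---|---|---|---|---|
| 1 | 36.1845 | 140.218 | A1 | 10 | Volcanic ash | 400 | 0.89 |
| | | | A2 | 30 | | 400 | 0.83 |
| | | | B1 | 54 | | 400 | 0.70 |
| | | | B2 | 85 | | 400 | 0.66 |
| 2 | 36.183 | 140.217 | * | 10 | Volcanic ash | 400 | 0.61 |
| | | | * | 40 | | 400 | 0.70 |
| | | | * | 10 | | 400 | 0.67 |
| | | | * | 80 | | 400 | 0.55 |
| | | | * | 40 | | 400 | 0.64 |
| | | | * | 80 | | 400 | 0.68 |
| 3 | 36.183 | 140.217 | * | 10 | Volcanic ash | 400 | 0.52 |
| | | | * | 40 | | 400 | 0.65 |
| | | | * | 80 | | 400 | 0.67 |
| 4 | 34.121 | 134.044 | A | 5 | Sandstone and mudstone | 400 | 0.48 |
| | | | BA | 12 | | 400 | 0.70 |
| | | | B1 | 24 | | 400 | 1.01 |
| 5 | 34.792 | 135.841 | AC | 5 | Granite | 400 | 0.84 |
| | | | C1 | 20 | | 400 | 0.87 |
| | | | C2 | 35 | | 400 | 0.83 |
| | | | C3 | 52 | | 400 | 0.87 |
| | | | R | 72 | | 400 | 0.94 |
| 6 | 37.938 | 139.359 | A | 2 | Dacite | 400 | 0.79 |
| | | | B | 13 | | 400 | 1.10 |
| | | | C1 | 42 | | 400 | 1.22 |
| | | | C2 | 78 | | 400 | 1.31 |
| | | | C3 | 107 | | 400 | 1.25 |
| 7 | 33.139 | 130.709 | A | 3 | Schist | 400 | 0.59 |
| | | | B1 | 13 | | 400 | 0.72 |
| | | | B2 | 35 | | 400 | 0.83 |
| | | | 2A | 55 | | 400 | 0.88 |
| | | | 2C | 73 | | 400 | 1.16 |
| 8 | 33.139 | 130.709 | HA-A | 10 | Schist | 400 | 0.57 |
| | | | B1 | 30 | | 400 | 0.99 |
| | | | B2 | 48 | | 400 | 1.01 |
| | | | B3 | 72 | | 400 | 1.25 |
| | | | BL | 90 | | 400 | 0.93 |
| 9 | 26.819 | 128.299 | HA-AB | 3 | Mudstone | 400 | 0.90 |
| | | | B1 | 15 | | 400 | 1.39 |
| | | | B2 | 35 | | 400 | 1.28 |
| | | | BC | 58 | | 400 | 1.12 |
| 10 | 26.809 | 128.294 | A | 5 | Mudstone | 400 | 0.76 |
| | | | B1 | 20 | | 400 | 0.94 |
| | | | B2 | 45 | | 400 | 1.00 |
| 11 | 26.8100 | 128.274 | A | 4 | Mudstone | 400 | 0.27 |
| | | | B1 | 23 | | 400 | 0.91 |
| | | | B2 | 50 | | 400 | 0.98 |
| 12 | 26.826 | 128.255 | A | 4 | Sandstone | 400 | 0.57 |
| | | | B | 20 | | 400 | 1.36 |
| | | | BC1 | 50 | | 400 | 1.19 |
| 13 | 26.820 | 128.274 | A | 2 | Mudstone | 400 | 0.52 |
| | | | B1 | 7 | | 400 | 0.66 |
| | | | B2 | 18 | | 400 | 0.88 |
| | | | C | 30 | | 400 | 0.92 |
| 14 | 35.738 | 137.014 | A | 5 | Sandstone and mudstone | 400 | 0.46 |
| | | | BA | 15 | | 400 | 0.32 |
| | | | B | 30 | | 400 | 0.52 |
| | | | BC1 | 54 | | 400 | 0.74 |
| 15 | 31.5200 | 130.795 | A1 | 4 | Volcanic ash | 400 | 0.88 |
| | | | A2 | 14 | | 400 | 0.53 |
| | | | BC | 30 | | 400 | 0.72 |
| | | | 2AB | 50 | | 400 | 0.46 |
| | | | 2B1 | 70 | | 400 | 0.44 |
| | | | 2B2 | 90 | | 400 | 0.58 |
| 16 | 26.8100 | 128.274 | * | 5 | Mudstone | 400 | 0.66 |
| | | | * | 20 | | 400 | 1.21 |
| | | | * | 50 | | 400 | 1.24 |
| 17 | 26.7200 | 128.270 | A-AB | 3 | Mudstone | 400 | 0.63 |
| | | | B1 | 33 | | 400 | 1.09 |
| | | | B2 | 58 | | 400 | 1.11 |
| | | | B3 | 76 | | 400 | 1.11 |
| | | | CB | 92 | | 400 | 1.20 |
| 18 | 26.843 | 128.262 | A-AB(1) | 3 | Sandstone | 400 | 1.01 |
| | | | A-AB(2) | 5 | | 400 | 0.66 |
| | | | B1 | 25 | | 400 | 1.30 |
| | | | B2 | 50 | | 400 | 1.36 |
| | | | BC | 70 | | 400 | 1.39 |
| 19 | 36.183 | 140.217 | A | 5 | Volcanic ash | 400 | 0.46 |
| | | | A | 5 | | 400 | 0.51 |
| 20 | 36.183 | 140.217 | A | 5 | Volcanic ash | 400 | 0.47 |
| | | | A | 5 | | 400 | 0.57 |
| 21 | 36.429 | 136.645 | A | 5 | Dacite | 400 | 0.67 |
| | | | B1 | 23 | | 400 | 0.73 |
| | | | B2 | 47 | | 400 | 0.87 |
| | | | BC1 | 72 | | 400 | 0.93 |
| | | | BC2 | 98 | | 400 | 0.75 |
| 22 | 38.941 | 140.258 | A1 | 3 | Tuff | 400 | 0.37 |
| | | | A2 | 15 | | 400 | 0.57 |
| | | | AB | 33 | | 400 | 0.59 |
| | | | B1 | 55 | | 400 | 0.79 |
| | | | B2 | 85 | | 400 | 0.91 |
| 23 | 40.678 | 140.211 | A | 5 | Volcanic ash | 400 | 0.20 |
| | | | BA | 12 | | 400 | 0.65 |
| | | | B1 | 24 | | 400 | 0.72 |
| | | | B2 | 43 | | 400 | 0.83 |
| | | | B3 | 66 | | 400 | 0.83 |
| | | | CB | 85 | | 400 | 0.80 |
| 24 | 33.297 | 130.836 | * | 5 | Andesite | 100 | 0.53 |
| | | | * | 15 | | 100 | 0.49 |
| | | | * | 30 | | 100 | 0.89 |
| | | | * | 50 | | 100 | 0.97 |
| | | | * | 70 | | 100 | 0.80 |
| | | | * | 90 | | 100 | 0.90 |
| 25 | 33.297 | 130.836 | * | 5 | Andesite | 100 | 0.58 |
| | | | * | 15 | | 100 | 0.68 |
| | | | * | 30 | | 100 | 0.79 |
| | | | * | 50 | | 100 | 0.87 |
| | | | * | 70 | | 100 | 0.96 |
| | | | * | 90 | | 100 | 0.90 |
| 26 | 33.484 | 130.675 | * | 5 | Granodiorite | 100 | 0.58 |
| | | | * | 15 | | 100 | 0.54 |
| | | | * | 30 | | 100 | 0.64 |
| | | | * | 50 | | 100 | 0.77 |
| | | | * | 70 | | 100 | 0.82 |
| | | | * | 90 | | 100 | 0.82 |
| 27 | 36.174 | 140.177 | * | 5 | Volcanic ash | 100 | 0.28 |
| | | | * | 15 | | 100 | 0.36 |
| | | | * | 30 | | 100 | 0.58 |
| | | | * | 50 | | 100 | 0.82 |
| | | | * | 70 | | 100 | 0.80 |
| 28 | 33.647 | 133.718 | A | 2 | Conglomerate | 400 | 0.53 |
| | | | B1 | 13 | | 400 | 0.96 |
| | | | B2 | 27 | | 400 | 1.12 |
| | | | B3 | 39 | | 400 | 0.93 |
| 29 | 36.431 | 136.643 | A | 3 | Dacite | 400 | 0.40 |
| | | | B1 | 17 | | 400 | 0.81 |
| | | | B2 | 45 | | 400 | 0.70 |
| | | | B4 | 85 | | 400 | 0.71 |
| 30 | 33.484 | 130.675 | * | 5 | Conglomerate | 100 | 0.67 |
| | | | * | 15 | | 100 | 0.67 |
| | | | * | 30 | | 100 | 0.94 |
| | | | * | 50 | | 100 | 1.05 |
| 31 | 33.479 | 130.710 | * | 5 | Schist | 100 | 0.95 |
| | | | * | 15 | | 100 | 0.93 |
| | | | * | 30 | | 100 | 0.90 |
| | | | * | 50 | | 100 | 1.00 |
| | | | * | 70 | | 100 | 1.09 |
| | | | * | 90 | | 100 | 1.07 |
| 32 | 33.479 | 130.710 | * | 5 | Schist | 100 | 0.64 |
| | | | * | 15 | | 100 | 1.00 |
| | | | * | 30 | | 100 | 1.17 |
| | | | * | 50 | | 100 | 1.05 |
| | | | * | 70 | | 100 | 1.22 |
| | | | * | 90 | | 100 | 1.11 |
| 33 | 36.515 | 140.307 | A | 10 | Volcanic ash | 100 | 0.56 |
| | | | B2 | 30 | | 100 | 0.63 |
| | | | B3 | 60 | | 100 | 0.48 |
| | | | B4 | 100 | | 100 | 0.76 |
| 34 | 36.515 | 140.308 | A1 | 10 | Volcanic ash | 100 | 0.45 |
| | | | A2 | 30 | | 100 | 0.60 |
| | | | B1 | 60 | | 100 | 0.79 |
| | | | B3 | 100 | | 100 | 0.30 |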
Notes: The sample volume represents the type of soil sampler (i.e., 400 and 100 cm3 core samplers). Soil samples were collected so that the center of the sampler corresponded to the indicated “sampling depth”. Asterisks (*) in the table indicate samples collected based on depth; soil horizon information is not available.
Table 2. Ten most important wavelengths and their corresponding importance values for each volumetric water content (θ) at a given matric suction (pF i). Each cell lists the wavelength [nm] followed by its importance in parentheses.

| Rank | θs | θ1.0 | θ1.4 | θ1.7 | θ2.0 | θ2.4 | θ2.7 | θ3.0 | θ3.2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2360 (0.1367) | 1690 (0.2088) | 1690 (0.2087) | 1690 (0.2254) | 1690 (0.2307) | 1690 (0.1517) | 1690 (0.1423) | 1768 (0.1387) | 1768 (0.1366) |
| 2 | 1684 (0.1332) | 1592 (0.1735) | 1592 (0.1936) | 1592 (0.1845) | 1692 (0.1845) | 1768 (0.1415) | 1768 (0.1396) | 1690 (0.1299) | 1690 (0.1290) |
| 3 | 2396 (0.1294) | 1186 (0.1610) | 1768 (0.1584) | 1692 (0.1768) | 1592 (0.1818) | 1692 (0.1284) | 1824 (0.1195) | 1824 (0.1223) | 1824 (0.1224) |
| 4 | 1630 (0.1289) | 1630 (0.1602) | 1692 (0.1531) | 1768 (0.1538) | 1768 (0.1598) | 1592 (0.1213) | 1692 (0.1181) | 1692 (0.1132) | 1692 (0.1148) |
| 5 | 1592 (0.1268) | 1640 (0.1519) | 1186 (0.1464) | 930 (0.1390) | 930 (0.1332) | 1824 (0.1109) | 1592 (0.1117) | 2104 (0.1074) | 2104 (0.1044) |
| 6 | 1690 (0.1222) | 1692 (0.1510) | 1608 (0.1394) | 1712 (0.1215) | 1712 (0.1244) | 2104 (0.1108) | 2104 (0.1071) | 1694 (0.1046) | 1694 (0.1036) |
| 7 | 1186 (0.1175) | 930 (0.1377) | 1640 (0.1379) | 1684 (0.1209) | 2362 (0.1235) | 930 (0.1064) | 1694 (0.1043) | 1592 (0.1032) | 1592 (0.1012) |
| 8 | 2362 (0.1170) | 1768 (0.1372) | 930 (0.1338) | 1186 (0.1204) | 1714 (0.1234) | 1694 (0.1054) | 744 (0.1034) | 744 (0.0989) | 2362 (0.0977) |
| 9 | 930 (0.1148) | 1314 (0.1347) | 1630 (0.1248) | 2362 (0.1194) | 1824 (0.1212) | 2344 (0.1048) | 930 (0.1008) | 2362 (0.0987) | 1714 (0.0976) |
| 10 | 2358 (0.1129) | 1608 (0.1347) | 1314 (0.1187) | 1694 (0.1163) | 1590 (0.1212) | 744 (0.1048) | 2344 (0.1008) | 2344 (0.0980) | 930 (0.0941) |
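A ranking like the one in Table 2 can be obtained by sorting per-wavelength importance scores and keeping the largest ten. The sketch below assumes the scores are already available as a plain list, one value per wavelength (for example, exported from a fitted model); the helper name `top_wavelengths` is hypothetical and the four example values are taken from the θs column of Table 2.

```python
def top_wavelengths(wavelengths, importances, k=10):
    """Return the k (wavelength, importance) pairs with the highest importance."""
    pairs = zip(wavelengths, importances)
    # Sort by importance, largest first, then truncate to the top k.
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Four wavelengths [nm] and their importance for the saturated water content.
example = top_wavelengths(
    [1592, 2360, 1684, 2396],
    [0.1268, 0.1367, 0.1332, 0.1294],
    k=3,
)
```

Here `example` holds the three highest-scoring wavelengths in descending order of importance.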
Sekiguchi, R.; Tsurita, T.; Kobayashi, M.; Imaya, A. Applicability of Visible–Near-Infrared Spectroscopy to Predicting Water Retention in Japanese Forest Soils. Forests 2025, 16, 1182. https://doi.org/10.3390/f16071182
