Applicability of Visible–Near-Infrared Spectroscopy to Predicting Water Retention in Japanese Forest Soils

Sekiguchi, Rando; Tsurita, Tatsuya; Kobayashi, Masahiro; Imaya, Akihiro

doi:10.3390/f16071182

by Rando Sekiguchi ¹,*, Tatsuya Tsurita ¹, Masahiro Kobayashi ² and Akihiro Imaya ¹

¹ Forestry and Forest Products Research Institute, Tsukuba 305-8687, Japan

² Kansai Research Center, Forestry and Forest Products Research Institute, Kyoto 612-0855, Japan

* Author to whom correspondence should be addressed.

Forests 2025, 16(7), 1182; https://doi.org/10.3390/f16071182

Submission received: 7 June 2025 / Revised: 7 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025

(This article belongs to the Section Forest Soil)

Abstract

This study assessed the applicability of visible–near-infrared (vis-NIR) spectroscopy to predicting the water retention characteristics of forest soils in Japan, which vary widely owing to the presence of volcanic ash. Soil samples were collected from 34 sites, and the volumetric water content was measured at eight levels of matric suction. Spectral data were processed by using the second derivative of the absorbance, and regression models were developed by using explainable boosting machine (EBM), which is an interpretable machine learning method. Although the prediction accuracy was limited owing to the small sample size and soil heterogeneity, EBM performed better under saturated conditions (R² = 0.30), which suggests that vis-NIR spectroscopy can capture water-related features, especially under wet conditions. Importance analysis consistently selected wavelengths that were associated with organic matter and hydrated clay minerals. The important wavelengths clearly shifted from free-water bands in wet soils to mineral-related absorption bands in dry soils. These findings highlight the potential of coupling vis-NIR spectroscopy with interpretable models like EBM for estimating the hydraulic properties of forest soils. Improved accuracy is expected with larger datasets and stratified models by soil type, which can facilitate more efficient soil monitoring in forests.

Keywords: Japanese forest soils; vis-NIR; matric suction; explainable boosting machine; volcanic ash; volumetric water content

1. Introduction

Climate change is expected to increase the frequency of extreme weather events such as heavy rainfall and drought [1]. Predicting the effects of such extreme events on forest environments and ecosystems requires understanding the soil water dynamics, which include hydraulic properties such as the hydraulic conductivity and water retention. However, the traditional methods for measuring hydraulic properties, particularly water retention, are costly and time-consuming. Visible–near-infrared (vis-NIR) spectroscopy has been gaining wider application for measuring various physical and chemical properties of soils and their principal components owing to its simplicity and rapid processing with various regression methods [2,3,4]. Janik et al. analyzed the infrared bands to accurately predict chemical properties of soils such as the carbon content and cation exchange capacity [5]. Shepherd and Walsh used vis-NIR spectroscopy to show that the physical properties of soils are correlated with their clay content and chemical composition [2]. Minasny et al. used vis-NIR spectroscopy to demonstrate that physical properties of soil, such as the texture, clay content, and air-dry moisture content, are correlated with the specific surface area and solid composition [6].

Various studies have shown that vis–NIR spectra can be used to predict soil water retention at different matric suctions [7,8] as well as model parameters in the van Genuchten equation [9]. Moreira de Melo and Pedrollo combined vis-NIR spectroscopy with deep learning methods to predict the field capacity and wilting point of soils or even the full retention curve with high accuracy (R² > 0.8) [10]. However, most of these studies relied on black-box algorithms such as partial least-squares regression (PLS), random forest, or neural networks, which makes it difficult to interpret how individual wavelengths contribute to the predicted water retention [3,11,12]. In particular, PLS compresses the original spectral variables into latent components, which makes it difficult to determine the predictive importance of individual wavelengths [11]. Neural networks can capture complex nonlinear relationships but often function as black-box models with limited transparency. While some studies have highlighted the importance of spectral regions around 1400, 1920, and 2200 nm, few have quantitatively evaluated the contribution of each wavelength [7,9]. To address this issue, post-hoc interpretability tools such as SHapley Additive exPlanations (SHAP) [13] and Local Interpretable Model-agnostic Explanations (LIME) [14] have been applied to attributing features after the model has been trained. However, these tools provide retrospective interpretations and remain disconnected from the model structure, which limits their scientific applicability in understanding the underlying mechanisms between soil properties and vis–NIR spectra [15,16].

In contrast, the explainable boosting machine (EBM) [17] is an inherently interpretable machine learning algorithm that combines the performance of gradient boosting with the transparency of an additive model structure. Thus, EBM can model the nonlinear effects of individual variables through smooth and additive functions while selectively incorporating only the most relevant pairwise interactions [18]. Furthermore, EBM retains the original spectral variables, allowing for direct visualization of the influence of each wavelength on the response variable. Despite its high interpretability, EBM has demonstrated a predictive performance comparable to that of modern machine learning models using black-box algorithms, including deep learning [19]. Thus, EBM can be used to visualize the effects of individual features and their pairwise interactions, which allows identification of the wavelengths that contribute most to the predicted water retention at various matric suctions and addresses a major limitation of previous approaches.

Despite its wide use in agricultural contexts, vis–NIR spectroscopy has rarely been applied to predicting the water retention of forest soils. Studies on upland field soils have suggested that the prediction accuracy is higher for dry regions than for wetter regions, which is likely due to the contribution of clay minerals to residual water retention [20,21]. Conforti et al. [22] used vis-NIR spectroscopy to predict the clay content and associated properties of Italian forest soils with high accuracy (R² = 0.80). However, forest topsoils are generally characterized by having a higher soil organic carbon (SOC) content than upland field soils, which reduces the bulk density [23]. Especially in Japan, the SOC content and bulk density of forest soils vary widely because the parent materials are often derived from volcanic ash, which strongly affects the saturated water content. A recent meta-analysis by Chinilina et al. [24] demonstrated that SOC can be predicted from vis–NIR spectra with reasonably good accuracy across a wide range of soil types and geographic regions, reporting a median R² of 0.67 and a median RPD of 1.99. When focusing specifically on forest soils, Liu et al. [25] reported a much higher prediction accuracy (R² = 0.96), indicating the strong potential of vis–NIR spectroscopy for SOC estimation in such environments.

Previous studies have shown that vis-NIR spectroscopy can be used to accurately determine the SOC content and that the SOC content has a strong correlation with the bulk density. Thus, this study investigated the applicability of vis-NIR spectroscopy to predicting the water retention of Japanese forest soils based on the hypothesis that the saturated water content of forest soils can be predicted from the SOC content. Vis-NIR spectroscopy was applied to 151 forest soil samples collected from 34 soil profiles in Japan, and EBM was applied to clarify which spectral regions are most relevant to the volumetric water content at a given matric suction.

2. Materials and Methods

2.1. Study Area

Figure 1 shows the locations of the soil sampling sites. Japan is one of the most volcanically active countries in the world. While the geological basement consists mainly of tuff and sedimentary rocks, the surface soils are widely influenced by the addition and deposition of volcanic ash. According to the Fundamental Soil Classification Chart of Japan (Japanese Society of Pedology [26]), Brown Forest Soils (Cambisols in the World Reference Base for Soil Resources (WRB); IUSS Working Group WRB [27]) account for 33.2% of the land area, followed by Andosols (Andosols in WRB [27]) (30.3%), Fluvic Soils (Fluvisols in WRB [27]) (13.7%), Red-Yellow Soils (Acrisols, Lixisols, Alisols, or Luvisols in WRB [27]) (7.6%), and Regosols (Regosols and Lithosols in WRB [27]) (6.9%). The 34 sampling sites used in this study are distributed across the Japanese archipelago, covering a wide range of climatic and geological conditions. In central Japan (e.g., sites 1–5, 14, 19–21, 33, 34), volcanic ash–derived Andosols (Andosols in WRB [27]) are dominant [28], while western Japan (sites 7, 24–32) features forest soils developed from a variety of parent materials [28]. Sites in southern Kyushu (sites 8, 15) are strongly affected by volcanic activity [28], and the subtropical Ryukyu Islands (sites 9–13, 16–18) are characterized by highly weathered Red-Yellow Soils (Acrisols, Lixisols, Alisols, or Luvisols in WRB [27]) [28].

2.2. Soil Samples

Undisturbed soil samples were collected from 34 soil profiles in Japan (n = 151). Table 1 presents the details of the soil samples. The sampling depth depended on the soil profile and thus differed between sampling sites. Soil samples were collected from the middle part of each horizon by using 100 cm³ (φ50 × H51 mm) or 400 cm³ (φ113 × H40 mm) stainless-steel core samplers. In the laboratory, the collected soil samples were saturated by capillary rise with the water level maintained at approximately 1 cm from the bottom of the cylinder for several days. The mass of each soil sample was then measured under saturated conditions and used to determine the volumetric water content at saturation (θ_s). The volumetric water content was then measured by the pressure plate method at eight matric suctions: pF 1.0 (=0.98 kPa), pF 1.4 (=2.5 kPa), pF 1.7 (=4.9 kPa), pF 2.0 (=9.8 kPa), pF 2.4 (=25 kPa), pF 2.7 (=49 kPa), pF 3.0 (=98 kPa), and pF 3.2 (=147 kPa). Hereafter, the volumetric water content (θ) at a given matric suction (pF i) is referred to as θ_i (e.g., θ_1.0 corresponds to the volumetric water content at pF 1.0). The soil samples were then air-dried at room temperature for 2 weeks, after which they were passed through a sieve with a 2-mm mesh size to remove tree roots and gravel. The soil samples were then ground for vis-NIR spectroscopy.
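The relationship between pF and suction in kPa can be sketched as follows. This is a minimal illustration that assumes the usual definition of pF as the base-10 logarithm of the suction head in centimetres of water; the function names and the conversion constant are introduced here for illustration and are not taken from the study.

```python
import math

# Pressure of a 1 cm water column in kPa (standard gravity); an assumed constant.
KPA_PER_CM_H2O = 0.0980665


def pf_to_kpa(pf: float) -> float:
    """Convert pF (log10 of the suction head in cm of water) to suction in kPa."""
    head_cm = 10.0 ** pf
    return head_cm * KPA_PER_CM_H2O


def kpa_to_pf(kpa: float) -> float:
    """Inverse conversion: suction in kPa back to pF."""
    return math.log10(kpa / KPA_PER_CM_H2O)


# The eight suction levels listed in the text.
PF_LEVELS = [1.0, 1.4, 1.7, 2.0, 2.4, 2.7, 3.0, 3.2]

for pf in PF_LEVELS:
    print(f"pF {pf:.1f} -> {pf_to_kpa(pf):7.2f} kPa")
```

The kPa values quoted in the text appear to be rounded nominal values; for example, 147 kPa corresponds to a suction head of about 1500 cm, so an exact conversion of the nominal pF labels can differ slightly from the listed figures.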

2.3. Spectroscopic Analysis

The absorbance spectra of the soil samples were measured in the visible and near-infrared regions (400–2500 nm) at a spectral resolution of 2 nm (XDS NIR Analyzer, FOSS NIR Systems, Inc., Laurel, MD, USA). Measurements were repeated three times, and the mean absorbance was taken to predict the water retention. The Savitzky–Golay method was employed, where the absorbance spectra were smoothed by using a quadratic polynomial and a window size of five points, after which the second derivative was taken [29]. All regression models were developed by using absorbance spectra acquired from the air-dried state of each soil sample following pF measurements. The volumetric water contents at a given matric suction (θ_i) were grouped based on the soil profiles to avoid splitting data collected from different depths of the same soil profile into training and test datasets. The θ_i data were then split 8:2 into training and test datasets according to the groups. A regression model was developed from the training dataset and was then optimized using the leave-one-group-out cross-validation method.
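The preprocessing and grouped splitting described above can be sketched as follows. This is an illustrative example on synthetic data, not the authors' script: the array names, the random values, the `delta=2.0` spacing, and the random seeds are all assumptions, and `GroupShuffleSplit` is one possible way to obtain an approximately 8:2 split by profile.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.model_selection import GroupShuffleSplit, LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-ins for the measured data (all values hypothetical).
wavelengths = np.arange(400, 2500, 2)            # 2 nm grid, 1050 bands
absorbance = 0.01 * rng.normal(size=(151, wavelengths.size)).cumsum(axis=1)
theta = rng.uniform(0.2, 0.8, size=151)          # water content at one suction
profile_id = rng.integers(0, 34, size=151)       # soil profile of each sample

# Savitzky-Golay: quadratic fit in a 5-point window, then second derivative.
X_d2 = savgol_filter(absorbance, window_length=5, polyorder=2,
                     deriv=2, delta=2.0, axis=1)

# Hold out about 20% of the profiles (not individual samples) as the test set,
# so that horizons from one profile never appear on both sides of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X_d2, theta, groups=profile_id))

# Leave-one-group-out folds within the training set, for model tuning.
logo = LeaveOneGroupOut()
n_folds = logo.get_n_splits(X_d2[train_idx], theta[train_idx],
                            groups=profile_id[train_idx])

print(X_d2.shape, len(train_idx), len(test_idx), n_folds)
```

Splitting by profile rather than by sample is what keeps the test error from being flattered by near-duplicate spectra from adjacent horizons of the same pit.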

2.4. EBM

In this study, EBM was used to predict the water retention of soil samples from their vis-NIR spectra owing to its balance of accuracy with feature-level transparency, which facilitates both robust prediction and scientific insights into the relationships between spectral data and hydraulic properties. In EBM, the conditional expectation E[Y] is defined as an additive function of the predictors X:

E[Y] = β_{0} + \sum_{j = 1}^{p} f_{j}(X_{j}) + \sum_{(i, j) \in I} f_{i, j}(X_{i}, X_{j})    (1)

where β_{0}, f_{j}(X_{j}), and f_{i, j}(X_{i}, X_{j}) denote the intercept, a univariate shape function, and an optional pairwise interaction term, respectively, for the selected variable pair (i, j) \in I. Unlike the standard multiple linear regression, where the response is modeled as a linear combination of the predictors, each component function f_{j}(X_{j}) is a nonparametric and potentially nonlinear function learned from the data. This allows EBM to flexibly capture complex and smooth relationships between individual predictors and the outcome without assuming linearity or monotonicity. For example, in spectroscopic analysis, the effect of a specific wavelength may exhibit threshold-like, saturating, or oscillatory patterns that cannot be adequately represented by a linear coefficient. These learned shape functions can be directly visualized and interpreted to provide insights into the role of each variable in the prediction. These functions are learned via cyclic gradient boosting on residuals using shallow bagged trees, typically with a maximum depth of two, to preserve interpretability and reduce variance [17,18]. The model is trained in a stage-wise fashion by minimizing a loss function (e.g., squared error or logistic loss) using additive updates. At each boosting iteration, the residuals are computed from the current ensemble prediction, and the next shape function is fitted to those residuals. By constraining the complexity of each base learner (e.g., via max depth and learning rate) and applying early stopping or feature dropout, EBM balances predictive performance and generalization. The interaction terms are only introduced when they contribute substantially to the reduction of loss based on the validation performance [19].
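The cyclic boosting idea behind Equation (1) can be illustrated with a deliberately simplified NumPy sketch. It is not the interpret package's implementation: it uses one-split stumps on quantile-binned features, squared-error loss, no bagging, no early stopping, and no interaction terms, and every function name and default value below is hypothetical.

```python
import numpy as np


def fit_cyclic_additive(X, y, n_rounds=300, learning_rate=0.1, n_bins=12):
    """Fit y ~ b0 + sum_j f_j(x_j) by round-robin boosting of one-split stumps."""
    n, p = X.shape
    inner_q = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # Each shape function f_j is a lookup table over quantile bins of feature j.
    edges = [np.quantile(X[:, j], inner_q) for j in range(p)]
    bins = np.column_stack([np.searchsorted(edges[j], X[:, j]) for j in range(p)])
    intercept = float(y.mean())
    shapes = np.zeros((p, n_bins))
    pred = np.full(n, intercept)
    bin_ids = np.arange(n_bins)
    for _ in range(n_rounds):
        for j in range(p):                       # visit features one at a time
            resid = y - pred                     # residuals of the current model
            best = None
            for cut in range(1, n_bins):         # candidate split between bins
                left = bins[:, j] < cut
                if left.all() or not left.any():
                    continue
                lm, rm = resid[left].mean(), resid[~left].mean()
                sse = ((resid[left] - lm) ** 2).sum() + ((resid[~left] - rm) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, cut, lm, rm)
            if best is None:
                continue
            _, cut, lm, rm = best
            # Shrunken stump update, added only to the shape function of feature j.
            update = learning_rate * np.where(bin_ids < cut, lm, rm)
            shapes[j] += update
            pred += update[bins[:, j]]
    return intercept, edges, shapes


def predict_additive(model, X):
    """Evaluate b0 + sum_j f_j(x_j) for new rows."""
    intercept, edges, shapes = model
    pred = np.full(X.shape[0], intercept)
    for j in range(X.shape[1]):
        pred += shapes[j, np.searchsorted(edges[j], X[:, j])]
    return pred


def term_importance(model, X):
    """Mean absolute contribution of each shape function over the rows of X."""
    _, edges, shapes = model
    return np.array([
        np.abs(shapes[j, np.searchsorted(edges[j], X[:, j])]).mean()
        for j in range(X.shape[1])
    ])


# Synthetic check: two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=300)

model = fit_cyclic_additive(X, y)
pred = predict_additive(model, X)
r2_train = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
importance = term_importance(model, X)
print(round(r2_train, 3), np.round(importance, 3))
```

Because every update touches a single feature, the fitted model stays a sum of one-dimensional lookup tables, which is what allows each wavelength's contribution to be plotted and ranked directly.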

All models were trained on a desktop computer (Lenovo, Morrisville, NC, USA) with an Intel® Core™ i7-14700KF CPU (3.40 GHz, 20 cores/28 threads) and 32 GB RAM. Parallel computation was enabled using the "n_jobs = -1" setting. Due to the sequential feature-wise training structure of EBM, the total training time for all models (corresponding to each θ_i) was approximately two days. Despite the relatively small dataset (n = 151), the training process was computationally intensive. Nevertheless, the high level of interpretability offered by EBM justified the additional computational cost in this study.

2.5. Performance Evaluation

The model was applied to the test dataset, and the performance was evaluated in terms of the coefficient of determination (R²), root mean squared error (RMSE), and ratio of performance to deviation (RPD):

R^{2} = 1 - \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \bar{y})^{2}}    (2)

RMSE = \sqrt{\frac{1}{n} \sum (y_{i} - \hat{y}_{i})^{2}}    (3)

RPD = \frac{SD}{RMSE}    (4)

where y_{i}, \hat{y}_{i}, \bar{y}, and n are the observed θ_i, predicted θ_i, mean of the observed θ_i, and number of samples, respectively. SD is the standard deviation of the observed θ_i in the test dataset. The model with the lowest RMSE and the highest RPD and R² was considered to perform the best. According to Riley et al. [30], the minimum required sample size for developing a multivariable prediction model depends on factors such as the number of predictors, the expected R², and the outcome variance. Based on their framework and given the 1050 explanatory variables used in this study, approximately 3360 samples would be needed to achieve a target R² greater than 0.8 [30]. As only 151 samples were used in this analysis, the theoretically expected R² was no more than 0.157, according to Riley et al. [30]. All model development and performance evaluation processes were performed in Python (version 3.11.7, Python Software Foundation, Wilmington, DE, USA) [31] by using the interpret package (version 0.6.10, InterpretML, Redmond, WA, USA) [19] and the scikit-learn package (version 1.2.2, INRIA, Paris, France) [32]. Statistical analysis was performed in R (version 4.4.0, R Foundation for Statistical Computing, Vienna, Austria) [33] by using the multcomp package (version 1.4.26, Torsten Hothorn et al., Zurich, Switzerland) [34].
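The three metrics can be computed as in the sketch below. It assumes the conventional definition of RPD as the sample standard deviation of the observed values divided by the RMSE; the function name and the four example values are hypothetical.

```python
import numpy as np


def regression_metrics(y_obs, y_pred):
    """Return R2, RMSE, and RPD for observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)     # total sum of squares
    rmse = np.sqrt(np.mean(resid ** 2))
    sd = np.std(y_obs, ddof=1)                       # sample standard deviation (assumed)
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "rpd": sd / rmse}


# Hypothetical observed and predicted volumetric water contents.
y_obs = [0.40, 0.50, 0.60, 0.70]
y_pred = [0.45, 0.50, 0.55, 0.70]
metrics = regression_metrics(y_obs, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Note that R² computed this way becomes negative whenever the model predicts worse than the mean of the observed values, which is how negative test-set R² values can arise.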

3. Results

3.1. Measurements

Figure 2a shows that the volumetric water content gradually decreased as the matric suction increased (n = 151). The largest number of outliers was observed at pF 1.7 (i.e., θ_1.7). No significant differences (p ≥ 0.05) were observed between θ_1.4 and θ_1.7, θ_1.7 and θ_2.0, θ_2.0 and θ_2.4, θ_2.4 and θ_2.7, θ_2.4 and θ_3.0, θ_2.7 and θ_3.0, θ_2.7 and θ_3.2, and θ_3.0 and θ_3.2. Figure 2b clearly visualizes the differences in volumetric water content at each matric suction as influenced by parent material. The degree of variability in water content differed notably among the parent materials. For instance, soils derived from dacite exhibited the highest variability across all matric suctions (e.g., S.D. ≈ 0.15 at θ_2.0), likely reflecting heterogeneity in texture or organic matter content. In contrast, granite-derived soils showed consistently low variability (e.g., S.D. ≈ 0.02 at θ_2.0), suggesting more uniform physical properties. Soils developed from andesite and conglomerate exhibited intermediate levels of variability, whereas those derived from granodiorite showed relatively stable water retention characteristics even under dry conditions.

Figure 3 shows the absorbance spectra of the soil samples and their second derivative. As the soil moisture content increased, the peak positions remained constant, although the reflectance decreased [35]. The absorbance spectra (Figure 3a) had several characteristic shoulders and peaks. The shoulder near 1100 nm is commonly associated with the first and second overtones of O–H stretching and water vapor absorption [36]. Strong absorption peaks below 600 nm may be attributable to organic matter, as the absorbance near 570–700 nm has previously been linked to chromophoric organic compounds [37]. Broad and overlapping peaks at 600–750 nm suggest potential contributions from both organic matter and other chromophores. This feature is more clearly observed in the second derivative (Figure 3b), where small but consistent variations suggest superimposed peaks in this range. This interpretation aligns with previous studies reporting that multiple weak absorbance features related to organic components and iron oxides can overlap in this spectral window. In addition, relatively strong peaks were observed near 1900 nm, and weak peaks were observed around 1400 and 2200 nm. The increase in absorption at 2250–2500 nm suggests contributions from various clay minerals, particularly those associated with O–H and H₂O overtones [3]. Specifically, the peaks near 1400 and 1900 nm are typically related to bound water vibrations, while those near 2200–2500 nm correspond to metal–OH bonds involving elements such as Al, Fe, and Mg [3].

3.2. Prediction Performance

Figure 4 presents the volumetric water content at a given matric suction i (θ_i) predicted by EBM. The expected R² was 0.157 based on the sample size in this study [30]. The model outperformed the expected R² at θ_s (i.e., saturated conditions), performed comparably at θ_1.0 and θ_3.2, and underperformed at the other θ_i. The best model performance was obtained at θ_s (R² = 0.30, RPD = 1.22). This indicates that the hypothesis (i.e., the saturated water content of forest soils can be predicted from vis-NIR spectra) is likely valid under the conditions tested. In other words, despite the very limited sample size, R² exceeded the theoretically expected maximum of 0.157 by nearly a factor of two, which provides indicative support for the hypothesis. The worst model performance was observed at θ_2.0 and θ_2.4, where the R² values were negative. The model performances at θ_1.4 and θ_3.0 were inadequate because the R² values were lower than the expected R². The RPD varied little among matric suctions (mean 1.07 ± 0.07) even though R² ranged from −0.03 to 0.30. The RMSE between θ_1.4 and θ_3.0 was approximately 0.07, which is within approximately 15% of the predicted values, despite the inadequate model performance. The model performance tended to decrease toward θ_3.0 but improved again at θ_3.2, which suggests that the prediction accuracy may improve at matric suctions higher than pF 3.2 (e.g., pF 4.2). Previous studies [20,21] have found that the prediction accuracy of their models improved when applied to predicting the volumetric water content in dry regions rather than in wet regions. Blaschek et al. were able to predict the volumetric water content at the permanent wilting point (pF 4.2 = −1500 kPa) more accurately than at the field capacity (pF 1.0 = −0.98 kPa) [20].
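The narrow range of RPD relative to R² follows from how the two metrics are linked. As a short worked relation, assuming that RPD is computed with the standard deviation σ of the observed test values and that σ² uses the same denominator n as the RMSE (an assumption; a sample standard deviation with n − 1 gives slightly larger RPD values):

```latex
R^{2} = 1 - \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \bar{y})^{2}}
      = 1 - \frac{RMSE^{2}}{\sigma^{2}},
\qquad
RPD = \frac{\sigma}{RMSE}
\quad\Longrightarrow\quad
RPD = \frac{1}{\sqrt{1 - R^{2}}}

R^{2} = 0.30 \;\Rightarrow\; RPD \approx \frac{1}{\sqrt{0.70}} \approx 1.20,
\qquad
R^{2} = -0.03 \;\Rightarrow\; RPD \approx \frac{1}{\sqrt{1.03}} \approx 0.99
```

Under this relation, the whole range of R² reported here maps onto RPD values between roughly 1.0 and 1.2, so RPD discriminates poorly between models when R² is low.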

Figure 5 presents the importance of each wavelength for predicting the volumetric water content at a given matric suction. Table 2 lists the ten most important wavelengths and their values for a given matric suction. Clear peaks in importance were observed in specific spectral regions. Notably, the wavelengths of 1690 and 1592 nm were consistently selected under all nine conditions (θ_s and the eight matric suctions), indicating their importance to the model. These wavelengths correspond to the absorption features of C–H and O–H bonds associated with organic matter and clay minerals, which are both known to influence the water retention capacity of a soil [38,39]. Additionally, the wavelength of 930 nm, which is close to the third overtone of water, and the band at 1768–1824 nm, which is associated with the first overtone of free water, were frequently selected. This suggests that both free and bound water features were captured by the model [3,40]. A key observation is the systematic shift in selected wavelengths across the pF spectrum. Under wetter conditions (i.e., θ_s to θ_2.0), the model emphasized shorter wavelengths such as 930 and 1768 nm, which are directly related to free water absorption. Under drier conditions (i.e., θ_2.4 to θ_3.2), the model emphasized longer wavelengths such as 2104, 2344, and 2362 nm, which are associated with metal–OH bonds in hydrated minerals [39]. This transition illustrates the model’s ability to dynamically adjust its spectral focus based on the prevailing moisture state by utilizing both direct water-related absorption bands and indirect structural indicators. In addition to the frequency of their selection, the magnitude of their importance underscored the relevance of specific wavelengths. For instance, the wavelengths of 1690 and 1592 nm were not only selected consistently across matric suctions but also exhibited high absolute importance values, which demonstrates their strong contribution to the model. These wavelengths likely serve as robust indicators of the intrinsic soil properties governing water retention. In contrast, other wavelengths were selected less frequently or had a lower magnitude of importance, which suggests that they may function as auxiliary predictors under specific conditions. Overall, the combination of consistent selection and high importance values for key wavelengths supports the conclusion that EBM successfully identified spectrally and physically meaningful features and offers both interpretability and reliability despite the limited sample size.
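The two summaries used above, the top ten wavelengths per condition and how often each wavelength is selected across conditions, can be sketched as follows. The importance scores here are synthetic, with 1592 and 1690 nm forced high purely to mimic the reported pattern; the function names and condition labels are hypothetical.

```python
from collections import Counter

import numpy as np


def top_k_wavelengths(wavelengths, importances, k=10):
    """Return the k (wavelength, importance) pairs with the largest importance."""
    order = np.argsort(importances)[::-1][:k]
    return [(int(wavelengths[i]), float(importances[i])) for i in order]


def selection_counts(top_by_condition):
    """Count in how many conditions each wavelength appears in the top list."""
    return Counter(wl for top in top_by_condition.values() for wl, _ in top)


wavelengths = np.arange(400, 2500, 2)                      # 2 nm grid
conditions = ["s", "1.0", "1.4", "1.7", "2.0", "2.4", "2.7", "3.0", "3.2"]

rng = np.random.default_rng(1)
always_on = np.flatnonzero(np.isin(wavelengths, [1592, 1690]))

top_by_condition = {}
for cond in conditions:
    scores = rng.uniform(0.0, 1.0, size=wavelengths.size)  # synthetic importances
    scores[always_on] = 2.0                                # forced high, for illustration
    top_by_condition[cond] = top_k_wavelengths(wavelengths, scores)

counts = selection_counts(top_by_condition)
print(counts.most_common(3))
```

Reporting both the selection count and the importance magnitude, as Table 2 does, separates wavelengths that matter everywhere from those that matter only under particular moisture states.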

4. Discussion

The prediction performance of the model exceeded the theoretical expectation under saturated conditions despite the small sample size of the dataset. Notably, the model consistently identified wavelengths associated with properties known to govern soil water retention, such as organic matter and clay content. These findings suggest that the regression framework and the model’s ability to extract meaningful spectral information were appropriate and that the primary limitations can be attributed to insufficient statistical power resulting from the small sample size. Importantly, although approximately 3360 samples would be theoretically required to achieve an R² greater than 0.8, as mentioned in Section 2.5 [30], other studies have reported considerably higher prediction accuracy with fewer samples. For instance, Roth et al. [41] achieved R² ≈ 0.6 using 230 samples, and Lalitha et al. [42] reported R² ≈ 0.7 with 558 samples. Although these studies targeted different soil properties and environmental conditions, they nonetheless demonstrate that moderate increases in sample size can substantially improve model performance. Thus, even without reaching the theoretical threshold, increasing the sample size in future studies is expected to make a major contribution to improving prediction accuracy.

It is also important to consider the influence of soil heterogeneity. The dataset consisted of 151 samples collected from 34 profiles, encompassing a wide range of parent materials (Table 1), including volcanic ash, granite, schist, and sedimentary rocks. This heterogeneity introduces considerable spectral variability, which can reduce model performance, as previously noted by Stenberg et al. [3] and Pittaki-Chrysodonta et al. [37]. While large sample sizes may help mitigate this effect, under data-scarce conditions, such variability can act as a confounding factor. Stratifying models by soil type or incorporating ancillary information (e.g., texture, mineralogy) may enhance prediction accuracy, as demonstrated by Baumann et al. [21]. Similarly, Oberholzer et al. [43] reported improved Vis–NIR model performance (RPD = 1.14–5.27) when soils were stratified by carbonate content, supporting the utility of targeted modeling strategies in heterogeneous environments.

Despite these limitations, the EBM model successfully captured spectrally and physically meaningful relationships related to soil water retention. As shown in Figure 5 and Table 2, the model consistently selected wavelengths associated with organic matter (e.g., 1592 nm, 1690 nm) and hydrated clay minerals (e.g., 2104 nm, 2362 nm). The shift in important wavelengths from shorter (free water) to longer (bound water and mineral-associated) bands with increasing matric suction demonstrates the model’s sensitivity to moisture conditions. The interpretability of EBM is especially valuable in this context, as it allows for transparent evaluation of feature relevance, unlike traditional black-box models such as neural networks or PLS. Recent developments in interpretable machine learning, such as explainable gradient boosting frameworks (e.g., Jeong et al. [44]), have emphasized the importance of transparent modeling in environmental sciences. These models maintain high predictive accuracy while providing direct insight into the roles of individual predictors, similar to the advantages demonstrated by EBM in this study. These findings support the utility of combining vis–NIR spectroscopy with explainable machine learning techniques in future soil hydraulic studies, particularly if applied to larger, stratified datasets for enhanced generalizability.

5. Conclusions

This study investigated the applicability of vis-NIR spectroscopy to predicting the water retention characteristics of Japanese forest soils, which often contain volcanic ash. While the overall prediction accuracy remained low because of the limited sample size and heterogeneous soil types, EBM effectively identified key spectral regions associated with water retention. The prediction performance was particularly good under saturated conditions, achieving R² = 0.30, RMSE = 0.067, and RPD = 1.22, which exceeded the theoretical expectation of R² = 0.157 given the small sample size. Although the regression accuracy under other matric suctions was generally lower, the RMSE values remained within approximately 15% of the predicted values. Important wavelengths varied with the matric suction as clay- and moisture-related bands became more prominent under drier conditions. Although the regression accuracy did not reach practical levels, the interpretability of EBM facilitated meaningful insights into the relationships between soil properties and spectral characteristics. These findings highlight both the current limitations and future potential of vis-NIR spectroscopy for predicting the hydraulic properties of forest soils. Improved performance is expected by using larger datasets and stratifying the model by soil type.

Author Contributions

Conceptualization, R.S. and M.K.; methodology, R.S.; investigation, R.S., T.T. and M.K.; writing—original draft preparation, R.S.; writing—review and editing, T.T., M.K. and A.I.; visualization, R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

EBM	Explainable boosting machine
LIME	Local interpretable model-agnostic explanations
PLS	Partial least-squares
RMSE	Root mean squared error
RPD	Ratio of performance to deviation
SHAP	Shapley additive explanations
SOC	Soil organic carbon
Vis-NIR	Visible-Near-Infrared spectroscopy
WRB	World Reference Base for Soil Resources

Figure 1. Locations of soil sampling sites.

Figure 2. Boxplots of the volumetric water content: (a) volumetric water content at each matric suction (n = 151), and (b) volumetric water content by parent material at each matric suction (n = 151). In (a), the orange lines and white circles correspond to the mean volumetric water content at each matric suction and to outliers, respectively. Box plots sharing the same letter do not differ significantly (p < 0.05).

Figure 3. Absorbance spectra of the soil samples (n = 151): (a) spectra smoothed with a quadratic polynomial and (b) second-derivative spectra (note: the sign is reversed in (b)).
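
The preprocessing shown in Figure 3 can be sketched as follows, assuming a Savitzky-Golay filter (a common choice for quadratic-polynomial smoothing and differentiation of spectra). The wavelength grid, the 2 nm step, the window length, and the synthetic absorbance band are all illustrative assumptions, not values taken from this study.

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical wavelength grid (nm); the 2 nm step is an assumption.
STEP_NM = 2.0
wavelengths = np.arange(400, 2501, 2)

# Synthetic placeholder spectrum: a sloping baseline plus one Gaussian band at 1900 nm.
absorbance = (
    0.3
    + 1e-4 * (wavelengths - 400)
    + 0.2 * np.exp(-((wavelengths - 1900) / 30.0) ** 2)
)

WINDOW = 11    # assumed odd window length, in points
POLYORDER = 2  # quadratic polynomial, as in panel (a)

# Panel (a): smoothed absorbance.
smoothed = savgol_filter(absorbance, window_length=WINDOW, polyorder=POLYORDER)

# Panel (b): second derivative with respect to wavelength.
second_deriv = savgol_filter(
    absorbance, window_length=WINDOW, polyorder=POLYORDER, deriv=2, delta=STEP_NM
)

# Reverse the sign so that absorption bands appear as positive peaks.
second_deriv_flipped = -second_deriv

peak_nm = int(wavelengths[np.argmax(second_deriv_flipped)])
print(peak_nm)
```

The second derivative removes the linear baseline, so the sign-reversed curve peaks at the centre of the synthetic band.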

Figure 4. Scatter plots of the volumetric water content (θ_i) at a given matric suction (pF i) predicted by EBM. The vertical and horizontal axes represent the predicted and observed volumetric water contents, respectively. The red dots and black crosses correspond to predictions using the test and training datasets, respectively. The gray dotted lines show the 1:1 line. The model performance in terms of R², RMSE, and RPD for the test dataset is also shown.
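
The three statistics reported in Figure 4 can be computed as in the minimal sketch below. It assumes that R² is the coefficient of determination (1 − SS_res/SS_tot) and that RPD is the sample standard deviation of the observed values divided by the RMSE; the exact definitions used for the figure are not restated here, and the numbers in the example are placeholders rather than measured data.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Return R2, RMSE, and RPD for paired observed/predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted

    # Root mean squared error.
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination (assumed definition of R2).
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Ratio of performance to deviation (assumed: sample SD of observed / RMSE).
    rpd = float(np.std(observed, ddof=1)) / rmse

    return {"R2": r2, "RMSE": rmse, "RPD": rpd}

# Placeholder volumetric water contents (cm3 cm-3), for illustration only.
observed = [0.40, 0.50, 0.60, 0.70]
predicted = [0.45, 0.45, 0.65, 0.65]
metrics = regression_metrics(observed, predicted)
print(metrics)
```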

Figure 5. Importance of each wavelength for predicting the volumetric water content (θ) at a given matric suction (pF i). Red dots represent the ten most important wavelengths for each θ_i.

Table 1. Details of soil samples.

Sampling Number	Latitude (°N), Longitude (°E)	Horizon	Sampling Depth [cm]	Parent Material	Sample Volume [cm³]	Bulk Density [g cm⁻³]
1	36.1845 140.218	A1	10	Volcanic ash	400	0.89
		A2	30		400	0.83
		B1	54		400	0.70
		B2	85		400	0.66
2	36.183 140.217	*	10	Volcanic ash	400	0.61
		*	40		400	0.70
		*	10		400	0.67
		*	80		400	0.55
		*	40		400	0.64
		*	80		400	0.68
3	36.183 140.217	*	10	Volcanic ash	400	0.52
		*	40		400	0.65
		*	80		400	0.67
4	34.121 134.044	A	5	Sandstone and mudstone	400	0.48
		BA	12		400	0.70
		B1	24		400	1.01
5	34.792 135.841	AC	5	Granite	400	0.84
		C1	20		400	0.87
		C2	35		400	0.83
		C3	52		400	0.87
		R	72		400	0.94
6	37.938 139.359	A	2	Dacite	400	0.79
		B	13		400	1.10
		C1	42		400	1.22
		C2	78		400	1.31
		C3	107		400	1.25
7	33.139 130.709	A	3	Schist	400	0.59
		B1	13		400	0.72
		B2	35		400	0.83
		2A	55		400	0.88
		2C	73		400	1.16
8	33.139 130.709	HA-A	10	Schist	400	0.57
		B1	30		400	0.99
		B2	48		400	1.01
		B3	72		400	1.25
		BL	90		400	0.93
9	26.819 128.299	HA-AB	3	Mudstone	400	0.90
		B1	15		400	1.39
		B2	35		400	1.28
		BC	58		400	1.12
10	26.809 128.294	A	5	Mudstone	400	0.76
		B1	20		400	0.94
		B2	45		400	1.00
11	26.8100 128.274	A	4	Mudstone	400	0.27
		B1	23		400	0.91
		B2	50		400	0.98
12	26.826 128.255	A	4	Sandstone	400	0.57
		B	20		400	1.36
		BC1	50		400	1.19
13	26.820 128.274	A	2	Mudstone	400	0.52
		B1	7		400	0.66
		B2	18		400	0.88
		C	30		400	0.92
14	35.738 137.014	A	5	Sandstone and mudstone	400	0.46
		BA	15		400	0.32
		B	30		400	0.52
		BC1	54		400	0.74
15	31.5200 130.795	A1	4	Volcanic ash	400	0.88
		A2	14		400	0.53
		BC	30		400	0.72
		2AB	50		400	0.46
		2B1	70		400	0.44
		2B2	90		400	0.58
16	26.8100 128.274	*	5	Mudstone	400	0.66
		*	20		400	1.21
		*	50		400	1.24
17	26.7200 128.270	A-AB	3	Mudstone	400	0.63
		B1	33		400	1.09
		B2	58		400	1.11
		B3	76		400	1.11
		CB	92		400	1.20
18	26.843 128.262	A-AB(1)	3	Sandstone	400	1.01
		A-AB(2)	5		400	0.66
		B1	25		400	1.30
		B2	50		400	1.36
		BC	70		400	1.39
19	36.183 140.217	A	5	Volcanic ash	400	0.46
19	36.183 140.217	A	5	Volcanic ash	400	0.51
20	36.183 140.217	A	5	Volcanic ash	400	0.47
20	36.183 140.217	A	5	Volcanic ash	400	0.57
21	36.429 136.645	A	5	Dacite	400	0.67
		B1	23		400	0.73
		B2	47		400	0.87
		BC1	72		400	0.93
		BC2	98		400	0.75
22	38.941 140.258	A1	3	Tuff	400	0.37
		A2	15		400	0.57
		AB	33		400	0.59
		B1	55		400	0.79
		B2	85		400	0.91
23	40.678 140.211	A	5	Volcanic ash	400	0.20
		BA	12		400	0.65
		B1	24		400	0.72
		B2	43		400	0.83
		B3	66		400	0.83
		CB	85		400	0.80
24	33.297 130.836	*	5	Andesite	100	0.53
		*	15		100	0.49
		*	30		100	0.89
		*	50		100	0.97
		*	70		100	0.80
		*	90		100	0.90
25	33.297 130.836	*	5	Andesite	100	0.58
		*	15		100	0.68
		*	30		100	0.79
		*	50		100	0.87
		*	70		100	0.96
		*	90		100	0.90
26	33.484 130.675	*	5	Granodiorite	100	0.58
		*	15		100	0.54
		*	30		100	0.64
		*	50		100	0.77
		*	70		100	0.82
		*	90		100	0.82
27	36.174 140.177	*	5	Volcanic ash	100	0.28
		*	15		100	0.36
		*	30		100	0.58
		*	50		100	0.82
		*	70		100	0.80
28	33.647 133.718	A	2	Conglomerate	400	0.53
		B1	13		400	0.96
		B2	27		400	1.12
		B3	39		400	0.93
29	36.431 136.643	A	3	Dacite	400	0.40
		B1	17		400	0.81
		B2	45		400	0.70
		B4	85		400	0.71
30	33.484 130.675	*	5	Conglomerate	100	0.67
		*	15		100	0.67
		*	30		100	0.94
		*	50		100	1.05
31	33.479 130.710	*	5	Schist	100	0.95
		*	15		100	0.93
		*	30		100	0.90
		*	50		100	1.00
		*	70		100	1.09
		*	90		100	1.07
32	33.479 130.710	*	5	Schist	100	0.64
		*	15		100	1.00
		*	30		100	1.17
		*	50		100	1.05
		*	70		100	1.22
		*	90		100	1.11
33	36.515 140.307	A	10	Volcanic ash	100	0.56
		B2	30		100	0.63
		B3	60		100	0.48
		B4	100		100	0.76
34	36.515 140.308	A1	10	Volcanic ash	100	0.45
		A2	30		100	0.60
		B1	60		100	0.79
		B3	100		100	0.30

Notes: The sample volume represents the type of soil sampler (i.e., 400 and 100 cm³ core samplers). Soil samples were collected so that the center of the sampler corresponded to the indicated “sampling depth”. Asterisks (*) in the table indicate samples collected based on depth; soil horizon information is not available.

Table 2. Ten most important wavelengths and their corresponding importance values for each volumetric water content (θ) at a given matric suction (pF i).

θ_s		θ_1.0		θ_1.4		θ_1.7		θ_2.0		θ_2.4		θ_2.7		θ_3.0		θ_3.2
Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance	Wavelength [nm]	Importance
2360	0.1367	1690	0.2088	1690	0.2087	1690	0.2254	1690	0.2307	1690	0.1517	1690	0.1423	1768	0.1387	1768	0.1366
1684	0.1332	1592	0.1735	1592	0.1936	1592	0.1845	1692	0.1845	1768	0.1415	1768	0.1396	1690	0.1299	1690	0.1290
2396	0.1294	1186	0.1610	1768	0.1584	1692	0.1768	1592	0.1818	1692	0.1284	1824	0.1195	1824	0.1223	1824	0.1224
1630	0.1289	1630	0.1602	1692	0.1531	1768	0.1538	1768	0.1598	1592	0.1213	1692	0.1181	1692	0.1132	1692	0.1148
1592	0.1268	1640	0.1519	1186	0.1464	930	0.1390	930	0.1332	1824	0.1109	1592	0.1117	2104	0.1074	2104	0.1044
1690	0.1222	1692	0.1510	1608	0.1394	1712	0.1215	1712	0.1244	2104	0.1108	2104	0.1071	1694	0.1046	1694	0.1036
1186	0.1175	930	0.1377	1640	0.1379	1684	0.1209	2362	0.1235	930	0.1064	1694	0.1043	1592	0.1032	1592	0.1012
2362	0.1170	1768	0.1372	930	0.1338	1186	0.1204	1714	0.1234	1694	0.1054	744	0.1034	744	0.0989	2362	0.0977
930	0.1148	1314	0.1347	1630	0.1248	2362	0.1194	1824	0.1212	2344	0.1048	930	0.1008	2362	0.0987	1714	0.0976
2358	0.1129	1608	0.1347	1314	0.1187	1694	0.1163	1590	0.1212	744	0.1048	2344	0.1008	2344	0.0980	930	0.0941
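
One way to read Table 2 is to compare the top-ten wavelength sets of the wettest and driest conditions. The sketch below does this for the θ_s and θ_3.2 columns, with the wavelengths copied from the table; the variable names are illustrative only.

```python
# Top-ten wavelengths (nm) copied from the θ_s and θ_3.2 columns of Table 2.
top10_saturated = [2360, 1684, 2396, 1630, 1592, 1690, 1186, 2362, 930, 2358]
top10_pf32 = [1768, 1690, 1824, 1692, 2104, 1694, 1592, 2362, 1714, 930]

wet = set(top10_saturated)
dry = set(top10_pf32)

shared = sorted(wet & dry)    # selected under both conditions
wet_only = sorted(wet - dry)  # selected only at saturation
dry_only = sorted(dry - wet)  # selected only at pF 3.2

print("shared:", shared)
print("saturated only:", wet_only)
print("pF 3.2 only:", dry_only)
```

Four of the ten wavelengths are shared between the two conditions, while six are specific to each, which is one simple way to quantify the shift in important wavelengths between wet and dry soils.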

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).