Artiﬁcial Intelligence-Based Prediction of Key Textural Properties from LUCAS and ICRAF Spectral Libraries

: Soil texture is a key soil property inﬂuencing many agronomic practices including fertilization and liming. Therefore, an accurate estimation of soil texture is essential for adopting sustainable soil management practices. In this study, we used different machine learning algorithms trained on vis–NIR spectra from existing soil spectral libraries (ICRAF and LUCAS) to predict soil textural fractions (sand–silt–clay %). In addition, we predicted the soil textural groups (G1: Fine, G2: Medium, and G3: Coarse) using routine chemical characteristics as auxiliary. With the ICRAF dataset, multilayer perceptron resulted in good predictions for sand and clay (R 2 = 0.78 and 0.85, respectively) and categorical boosting outperformed the other algorithms (random forest, extreme gradient boosting, linear regression) for silt prediction (R 2 = 0.81). For the LUCAS dataset, categorical boosting consistently showed a high performance for sand, silt, and clay predictions (R 2 = 0.79, 0.76, and 0.85, respectively). Furthermore, the soil texture groups (G1, G2, and G3) were classiﬁed using the light gradient boosted machine algorithm with a high accuracy (83% and 84% for ICRAF and LUCAS, respectively). These results, using spectral data, are very promising for rapid diagnosis of soil texture and group in order to adjust agricultural practices.


Introduction
Soils provide several ecosystem services to support human needs; however, due to the rapid growth of the human population, soils are facing unprecedented pressure through the intensification of agricultural production [1]. Therefore, monitoring the soil status is crucial to support sustainable and high agricultural productivity. Soil analysis is an important key factor for sustainable soil management, as it provides a valuable information about the soil condition prior to making recommendations regarding fertilization, and irrigation, and subsequently improving soil productivity. Gathering precise and detailed information on the soil is essential for crop yield increase [2]. There are many chemical, physical, and biological properties that can affect the soil quality and its function within an agricultural production system [3]. Soil texture, which represents the relative composition of particle sizes (sand: 0.05 to 2 mm, silt: 0.002 to 0.05 mm, and clay: ≤0.002 mm [4]), is one the fundamental physical properties. Soil texture usually shows significant spatial variation within a land area. There are many processes affected by soil texture, including plant growth and yield, water retention and infiltration, availability and absorption of plant nutrients, soil organisms and plant root growth, soil quality and productivity, soil temperature, soil compaction, tillage, and the efficiency of fertilizer use and irrigation water [5]. Considering variations in soil texture within a field is now possible with global positioning Both datasets (LUCAS and ICRAF) contain vis-NIR spectral and chemical data from different locations in many countries. LUCAS is a large-scale vis-NIR soil dataset with 19,967 soil samples collected from 23 member states of the European Union under the Land Use/Land Cover Area Frame Survey (LUCAS) that started in 2009 [19]. The standard sampling procedure used around 0.5 kg of topsoil (0-20 cm), which is a composite sample of five subsamples. The first subsample represents the central georeferenced sampling location characterized by the LUCAS point, and the other four subsamples were collected in a crossshape at 2 m from the central point. These soil samples were dried, sieved, and subjected to spectral measurements using a FOSS XDS Rapid Content Analyzer (NIRSystems, Inc., Laurel, MD, USA), operating with wavelengths ranging from 400 to 2500 nm and a spectral resolution of 0.5 nm. The LUCAS dataset includes various soil properties, such as: coarse fragment percentage, particle size distribution (% of sand, silt, and clay), organic carbon, pH, carbonate content, cation exchange capacity, N, P, extractable K, and metals. The standard analytical methods of soil properties used the ISO methods [21]. The vis-NIR analyses followed the procedure described by the FOSS spectroscope [22].
The ICRAF dataset is composed of 4438 soil samples selected from the soil information system (ISIS) of the International Soil Reference and Information Centre [20]. This library represents soil samples from 58 countries around the world (Africa, Asia, South America, North America, and Europe) [20]. These soil samples were air-dried, clod crushed, and sieved at 2 mm. Soil properties of pH, OC, CEC, exchangeable (K, Ca, Mg), sand, silt, and clay content were determined according to the IRSIC procedures for soil analysis [21]. For the spectral analysis, the soil samples were scanned using a FieldSpec FR spectroradiometer Agronomy 2021, 11,1550 3 of 17 (Boulder, CO) at spectral wavelengths ranging from 350 to 2500 nm, with a spectral resolution of 1 nm. Figure 1 shows the raw spectra for soil samples of the LUCAS and ICRAF datasets. crushed, and sieved at 2 mm. Soil properties of pH, OC, CEC, exchangeable (K, Ca, Mg), sand, silt, and clay content were determined according to the IRSIC procedures for soil analysis [21]. For the spectral analysis, the soil samples were scanned using a FieldSpec FR spectroradiometer (Boulder, CO) at spectral wavelengths ranging from 350 to 2500 nm, with a spectral resolution of 1 nm. Figure 1 shows the raw spectra for soil samples of the LUCAS and ICRAF datasets.

Pre-Processing of Raw Spectra
Pre-processing of spectral data is usually a required step to improve data quality before modeling. This pre-processing step aims at removing undesirable physical phenomena and increasing the signal resolution, as well as maximizing the quality of subsequent data processing. In this study, five spectral pre-processing methods (first derivative, second derivative, continuum removal, detrend, and standard normal variate) were applied on both datasets as shown in Figure 2. The spectral data pre-processing methods were implemented in R using the "prospectr" package [23]. The estimation of the three textural fractions of soil were performed with the h2o AutoML algorithm [24] using the R software (Version 3.0.2) [25]. The best performing pre-processing technique was used for the following steps on the study. The first (1st Der) and second (2nd Der) derivatives can be used to remove the vertical offsets and linearly sloping baselines to produce low error calibration. Converting the raw spectra to 1st Der or 2nd Der aims at accentuating the reflectance features. The 1st Der is effective for removing baseline offset; the 2nd Der is effective for both the baseline offset and linear trend from a spectrum [26,27]. Derivatives also remove both additive and multiplicative effects on the spectra. Continuum removal (CR) was used as a normalization procedure prior to features quantification [28]. Detrend correction (DT) was applied to remove the baseline shifts and curvilinearity. Standard normal variate (SNV) was used to remove the scatter interferences and differences and correct the global intensity effect [29]. A detailed description of these pre-processing methods can be found elsewhere [26].

Pre-Processing of Raw Spectra
Pre-processing of spectral data is usually a required step to improve data quality before modeling. This pre-processing step aims at removing undesirable physical phenomena and increasing the signal resolution, as well as maximizing the quality of subsequent data processing. In this study, five spectral pre-processing methods (first derivative, second derivative, continuum removal, detrend, and standard normal variate) were applied on both datasets as shown in Figure 2. The spectral data pre-processing methods were implemented in R using the "prospectr" package [23]. The estimation of the three textural fractions of soil were performed with the h2o AutoML algorithm [24] using the R software (Version 3.0.2) [25]. The best performing pre-processing technique was used for the following steps on the study. The first (1st Der) and second (2nd Der) derivatives can be used to remove the vertical offsets and linearly sloping baselines to produce low error calibration. Converting the raw spectra to 1st Der or 2nd Der aims at accentuating the reflectance features. The 1st Der is effective for removing baseline offset; the 2nd Der is effective for both the baseline offset and linear trend from a spectrum [26,27]. Derivatives also remove both additive and multiplicative effects on the spectra. Continuum removal (CR) was used as a normalization procedure prior to features quantification [28]. Detrend correction (DT) was applied to remove the baseline shifts and curvilinearity. Standard normal variate (SNV) was used to remove the scatter interferences and differences and correct the global intensity effect [29]. A detailed description of these pre-processing methods can be found elsewhere [26]. models calibration including data splitting into three sets: calibration (70% of data), validation (15% of data) and testing (15% of data) and linear correction of biased residuals to predict the three soil texture fractions (sand, silt, and clay %).

Machine Learning Algorithms for the Prediction of Clay, Silt and Sand Fractions
Using spectral libraries with machine learning is important because it can analyze more data while delivering faster and more accurate results and help agricultural, environmental and ecosystemic services to strengthen their capacity to estimate soil properties and intensify soil characterization.
This modeling process is aimed at obtaining the particle size fractions by analyzing the spectral features contained in the spectrum acquired using the spectroradiometer scanning of a sample. As a soil spectral signature produces a very large number of spectral data (1250 pairs of wavelength and corresponding reflectance) and reflectances are often autocorrelated, machine learning procedures are the most appropriate option to correlate these 1250 input parameters with a single output factor: a particle size fraction (clay, silt, or sand).
Three steps are required to build a machine learning model: (i) selection of a machine learning algorithm, (ii) training the algorithm on large datasets to create a model that converges to an optimal accuracy, and (iii) validating the performance of the resulting model on a testing dataset.
These machine learning models must be constantly calibrated as the size of the database increases to maintain the most powerful and accurate estimates.
In this study, five machine learning algorithms were used, i.e., random forest (RF), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), linear regression (LR) and multilayer perceptron (MLP). A brief description of these machine learning models follows.
Random forest (RF) is an ensemble learning technique that uses a combination of classification tree predictors to give predictions for the response variable [30]. Each tree is constructed from a bootstrap sample of the dataset and uses a randomly selected subset of the variables from the candidate set of input variables that is generated at each split of data. Therefore, for tree building, random forest uses both bagging (bootstrap aggregation) [31], a method for combining unstable learners, and random variables selection. RF algorithm can handle both categorical and continuous variables. RF has a good capability for high-dimensional data processing while avoiding overfitting, and it can be used for both classification and regression tasks.
Categorical boosting (CatBoost) is a gradient boosting decision tree (GBDT), which uses binary decision trees as base predictors [32]. CatBoost uses a symmetrical adopted tree model to reduce over-fitting and enhance the model's generalizability. Moreover, it avoids the problems of gradient bias and prediction shift.
Extreme gradient boosting (XGBoost) [33] is a scalable tree boosting system that implements a generalized gradient boosting method including an additional regularization term. The regularization term aims to smooth the final learned weights to avoid over-fitting, and to yield accurate models. In addition, parallel calculations are automatically executed for the functions during the training step [34]. This algorithm comes with several improvements in multithreaded processing and optimization function to improve calculation speed and reduce over-fitting events as well.
Linear regression (LR) is a machine learning algorithm used to represent the relationship between independent variables and one dependent variables (simple linear regression) or more than independent variables (multiple linear regression). This linear regression establishes a mathematical model to study and describe a given real-world phenomenon [35].
Multilayer perceptron neural network (MLP) is a class of feedforward artificial neural network (ANN) with a backpropagation algorithm for training. MLP consists of one or more hidden layers between the input and output layers. The hidden layers contain multiple neurons mimicking the biological nervous system [36]. The input signal starts with input nodes and propagates to the output node in the forward pass. The output Agronomy 2021, 11, 1550 6 of 17 calculation in each node is performed layer-by-layer by multiplying it by a weight factor called a synaptic weight. After performing the forward pass calculations, an error is calculated in the backward pass using a loss function [37]. The advantages of MLP is that it has a strong nonlinear fitting ability and can map arbitrarily complex nonlinear relationships [38]. Figure 2 presents the workflow of the applied methodology. All data were randomly split into three parts: 70% for calibration or training, 15% for validation, and 15% for testing using GroupShuffleSplit from scikit-learn [39]. The calibration set is used to learn the parameters of the model: it consists in estimating one granulometric fraction (clay, silt, or sand) according to the reflectances of the vis-NIR spectrum. This step is used to train the chosen algorithm. The validation set is used to avoid introducing any artifact in the prediction results when using the training dataset. This validation adjusts the model hyperparameters to avoid overfitting, a very common phenomenon in prediction model training. The test dataset acts as independent data and is used to verify the performance of the prediction results. In order to avoid the effect of a random seed, the random split was performed 30 times with different seeds. Therefore, for illustrative purposes, the seed chosen in this study is the one giving results closer to the average values of R 2 (Equation (1)) and RMSE (Equation (2)) obtained by the 30 prediction models.

Development of Predictive Modeling Process
where R 2 is the coefficient of determination, RMSE is the root mean squared error,ŷ i is the predicted value of the ith observation, y is the mean measured value, y i is the measured value of the ith observation, and n is the number of samples.
To correct the residual error (bias adjustment) and ensure unbiased predictions, the observed residuals from the validation model were used to linearly correct the bias [40]. This correction step was only used for prediction of the soil textural fractions.

Machine Learning Models for Predicting Fine (G1), Medium (G2), and Coarse (G3) Textural Groups
Since both soil libraries ICRAF and LUCAS contain the spectral and chemical measurements of soils, we used the seven available chemical measurements (N Total , P available , K exchangeable , Ca exchangeable , Mg exchangeable , CaCO 3 , pH water , OC) for LUCAS and the five available measurements (K exchangeable , Ca exchangeable , Mg exchangeable , pH water , OC) for ICRAF. The inclusion of these chemical measurements in the spectral measurements in the calibration model is intended to enhance the performance of the textural group prediction. In contrast to the predictions of the size fractions (clay, silt, and sand), which are quantitative features and have shown a better affinity with regression algorithms such as MLP and CatBoost, the textural groups (G1, G2, and G3) are categorical variables, which require classification algorithms, such as the light gradient boosting machine (LightGBM). The LightGBM is a model applying gradient-based one-side sampling (GOSS) and exclusive feature bundling technologies [4]. Unlike the conventional GBM tree splitting method, LightGBM uses the GOSS method to identify the observations that can be used for computing the split. LightGBM has the advantages of reducing memory usage while increasing the training speed. In addition, it produces much more complex trees by following a leaf-wise split approach rather than a level-wise approach, which is the main factor in achieving higher accuracy, and it is capable of handling large-scale data. LightGBM contains two novel techniques, namely gradient-based one-side sampling and exclusive feature bundling, to deal with large numbers of data instances and large numbers of features, respectively [41]. We used this algorithm to train predictive models for classification of the three textural groups of soil. Many research studies have used auxiliary information derived from different data sources to improve the prediction of soil properties [42,43]. Textural groups are based on the Canadian system of soil classification (Figure 3). techniques, namely gradient-based one-side sampling and exclusive feature bundling, to deal with large numbers of data instances and large numbers of features, respectively [41]. We used this algorithm to train predictive models for classification of the three textural groups of soil. Many research studies have used auxiliary information derived from different data sources to improve the prediction of soil properties [42,43]. Textural groups are based on the Canadian system of soil classification ( Figure 3).

Model Evaluation
To evaluate the performance and accuracy of prediction models for the three soil textural fractions, four parameters were used; the coefficient of determination (R 2 ), the root mean squared error (RMSE), and the slope and intercept of predicted versus observed values [46]. Li et al. [47] proposed a classification criterion for R 2 values: R 2 < 0.50 (unacceptable prediction), 0.50 ≤ R 2 < 0.75 (acceptable prediction) and R 2 ≥ 0.75 (good prediction). The same criterion was applied in the current study.
For the three textural groups (G1, G2, G3) we used a normalized confusion matrix and visualized it using the "seaborn" package [48] in python 3.9. The performance index in classifying the soil textural groups was evaluated by calculating the accuracy, ranging from 0 to 100% (Equation (3)), which represent the ratio between the number of correctly classified cases over the total number of properly and wrongly classified cases.

Model Evaluation
To evaluate the performance and accuracy of prediction models for the three soil textural fractions, four parameters were used; the coefficient of determination (R 2 ), the root mean squared error (RMSE), and the slope and intercept of predicted versus observed values [46]. Li et al. [47] proposed a classification criterion for R 2 values: R 2 < 0.50 (unacceptable prediction), 0.50 ≤ R 2 < 0.75 (acceptable prediction) and R 2 ≥ 0.75 (good prediction). The same criterion was applied in the current study.
For the three textural groups (G1, G2, G3) we used a normalized confusion matrix and visualized it using the "seaborn" package [48] in python 3.9. The performance index in classifying the soil textural groups was evaluated by calculating the accuracy, ranging from 0 to 100% (Equation (3)), which represent the ratio between the number of correctly classified cases over the total number of properly and wrongly classified cases. Accuracy = correctly classified Total : correctly and incorrectly classified Agronomy 2021, 11, 1550 8 of 17 Table 1 shows the five pre-processing techniques applied to the raw vis-NIR spectra. The results show that the first derivative is the best for ICRAF and LUCAS. Therefore, the first derivative spectrum was used for the prediction models of sand, silt, and clay. This choice was based on the four evaluation criteria that reflect the consistency between predicted and measured values when using the testing data. The model performed better when the slope (m) and the coefficient of determination (R 2 ) were closer to 1.0, and the intercept (a) and the RMSE were closer to 0.

Performance of Pre-Processing Techniques
Ben-Dor et al. [49] reported that the first derivative of the spectra improved spectral information and showed spectral changes better. Moreover, our results agree with other research studies. Gerighausen et al. [50] pointed out that the first derivative of the spectra gave the best prediction results for the soil clay and organic carbon content.
In general, there are several baseline removal methods available, such as spectral derivative transformations (first and second derivative), which is one of the best methods for removing baseline effects [27]. Rinnan et al. [26] reviewed the most applied spectral pre-processing techniques for near-infrared spectrometry. They concluded that the first derivative is very effective for removing the baseline offset. Therefore, the first derivative technique was selected in this study as the pre-processing method for the following steps. Compared to the raw spectra that give unacceptable prediction results according to the classification of Li et al. [47] with R 2 mostly around 0.47 for sand and silt and 0.65 for clay, the first derivative transformation of these spectra contributed to increase the robustness of the prediction models by over 20% (Table 1). Even though the two databases are very different in size and origin, the prediction results were comparable for both the raw and first derivative transformed spectra ( Table 2). The prediction from the first derivative spectra was significantly higher for clay than for the other two particle size fractions with the highest R 2 and slopes and the lowest intercepts and RMSE. Therefore, it may be concluded that spectrometry is more adapted to clay fraction than to silt and sand fractions.   Table 2 shows the results that yielded the best validity (measured vs. predicted) among the five models used with both datasets, ICRAF and LUCAS, compared to the models without bias corrections (Supplementary Tables S1 and S2). This validity assesses the degree of association between the three particle size fractions (clay, silt, and sand) measured by standard techniques (hydrometer or Robinson pipette) and those derived by examining the signal spectrum obtained by spectral scanning of the soil samples by artificial intelligence. In a two-dimensional graph representing the real values on the abscissa (e.g., measured percentage of clay) and the predicted values on the ordinate (predicted percentage of clay), we fitted a linear regression to the testing dataset. This linear regression model shows the four performance criteria ( Table 2): robustness (R 2 ), accuracy (slope, m), sensitivity (intercept, a) and precision (RMSE). Using these four performance criteria, CatBoost was found to be the best in predicting the three particle size components for the LUCAS dataset. Its R 2 values ranged from 0.76 to 0.85, its m-slopes were close to 1 (high accuracy), its a-intercepts were low at 0-0.57% (low sensitivities), and it had low RMSEs of 4.99-11.77. These results show that MLP appears to have performed better than CatBoost for the prediction of sand and clay with the ICRAF testing dataset. MLP showed a R 2 robustness of 0.78-0.85, m accuracies of 0.99; for the intercept, a, it showed a low sensitivity of 0.29-0.62%, and it had a RMSE of 8.77-13.55. It appears that CatBoost is best suited for LUCAS sand. MLP works best with ICRAF databases. According to the R 2 classification of Li et al. [48], it can be stated that all predictions of the three particle size components (clay, silt, and sand) in the two databases LUCAS and ICRAF are considered good, and thus satisfactory to adopt as alternative methods of soil scanning to deduce their particle size composition. Figure 4 shows all clay prediction results for the ICRAF dataset. These are the four criteria of prediction performance through the linear regression between the predicted and measured values in the training (Figure 4a), validation (Figure 4b), and testing (Figure 4d) datasets. In the same figure, the linear bias on the residuals of the validation dataset (Figure 4c) was used to correct the results in the testing dataset (Figure 4e). Accounting for the linear bias of the prediction residuals (predicted vs. measured residuals) as a function of the percentage of clay according to the equation Y = 0.235X-4.54 (Figure 4c), the predictions were improved significantly. Compared to the validation results (Figure 4b), the linear bias correction helped to increase the R 2 by 0.07, the slope m by 0.24, and sensitivity or intercept a by 4%. In Figure 4e, prediction in the corrected testing can be seen to reach a robustness R 2 of 0.85, an m value close to 1, and sensitivity close to zero. It is therefore clear that the linear bias correction proposed by Song [40] improves the performance criteria for clay prediction from vis-NIR spectra. Curiously, the prediction results were almost the same for the LUCAS database ( Figure 5) and the ICRAF database (Figure 4). Similarly, the prediction results for LUCAS in the corrected testing reached the same levels of R 2 robustness of 0.85, m of 0.99, and sensitivity, a, of 0.62% (Figure 5e).

Soil Fractions Prediction and Bias Correction
Generally, the bias correction step significantly improved the model accuracy for the slope, as shown in Figures 4 and 5, and furthermore, this step (bias correction) was carried out for the three soil textural fractions in both datasets (ICRAF and LUCAS), as shown in Table 2. For both datasets, the clay content prediction result was the highest between the soil textural fractions for LUCAS and ICRAF (R 2 = 0.85 for both datasets), followed by the silt (R 2 = 0.81 and 0.76 for ICRAF and LUCAS, respectively), and sand (R 2 = 0.78 for both datasets) (Supplementary Figures S1-S4).
Generally, the bias correction step significantly improved the model accuracy for the slope, as shown in Figures 4 and 5, and furthermore, this step (bias correction) was carried out for the three soil textural fractions in both datasets (ICRAF and LUCAS), as shown in Table 2. For both datasets, the clay content prediction result was the highest between the soil textural fractions for LUCAS and ICRAF (R 2 = 0.85 for both datasets), followed by the silt (R 2 = 0.81 and 0.76 for ICRAF and LUCAS, respectively), and sand (R 2 = 0.78 for both datasets) (Supplementary Figures S1-S4).  Cozzolino and Moron [51] achieved a good R 2 (0.67, 0.80, and 0.90 respectively) for sand, silt, and clay content from vis-NIR spectra using principal components regression (PCR) models (n = 332 soil samples). Chang et al. [52] obtained an R 2 of 0.67 for clay content prediction, beside other soil properties based on PCR using the first derivatives pre-processing (n = 743 soil samples). Ahmadi et al. [53] concluded the arithmetic mean of R 2 for sand, silt, and clay content prediction (0.76, 0.68, and 0.70, respectively) using vis-NIR data from several studies. These studies used different prediction algorithms, soils, and methodologies that can explain the variation of the prediction results obtained from several studies. Cozzolino and Moron [51] achieved a good R 2 (0.67, 0.80, and 0.90 respectively) fo sand, silt, and clay content from vis-NIR spectra using principal components regression (PCR) models (n = 332 soil samples). Chang et al. [52] obtained an R 2 of 0.67 for clay conten prediction, beside other soil properties based on PCR using the first derivatives pre-pro cessing (n = 743 soil samples). Ahmadi et al. [53] concluded the arithmetic mean of R 2 for sand, silt, and clay content prediction (0.76, 0.68, and 0.70, respectively) using vis-NIR data from several studies. These studies used different prediction algorithms, soils, and methodologies that can explain the variation of the prediction results obtained from sev eral studies.
The conventional methods used in the laboratory (e.g., pipette and hydrometer meth ods) for determining soil texture are laborious, costly, and not suitable for large numbers of soil samples. Moreover, the texture hand test method is very subjective and often gives The conventional methods used in the laboratory (e.g., pipette and hydrometer methods) for determining soil texture are laborious, costly, and not suitable for large numbers of soil samples. Moreover, the texture hand test method is very subjective and often gives unreliable results. Therefore, spectral soil analysis can be a valuable alternative to the traditional wet chemistry soil analyses once robust predictive models are built. This spectral method is attractive because: (a) the intercepts (sensitivities) do not exceed the value of 0.62% (Table 2); (b) the slope m (accuracy) is 0.99-1; (c) the R 2 (robustness) of the prediction models is 0.76-0.85 on the testing dataset (Table 2); (d) the range of the two datasets cover a wide variety of soils from several countries; (e) the reproducibility of the spectral calibration performance criteria was satisfactory for both independent databases of ICRAF and LUCAS; and (f) the suitability to precision agriculture, which generates a large number of soil tests. With these high prediction performances, we can expect the spectral method to be a candidate replacement for conventional methods of soil texture diagnosis. If a sufficient number of samples is available on a regional scale, the prediction performance could be further refined, since the training is done only on soils in the region where the analysis results will be used. A laser diffraction granulometry method is proposed as another alternative [54] to the standard techniques (hydrometer, Robinson pipette); however, it remains to be seen if it is competitive in terms of processing time and quality of the results [55].

Prediction of Textural Groups
Textural soil group information is very valuable for planning agricultural management practices [10]. Figure 6 shows the predicted textural groups for the LUCAS testing dataset using LightGBM. The overall accuracy (Equation (3)) of predicting the textural groups for the LUCAS testing dataset was 75% when considering only the spectra, 58% when considering only the seven available chemical measurements (N Total , P available , K exchangeable , Ca exchangeable , Mg exchangeable , CaCO 3 , pH water , OC), and 84% when combining the spectral and chemical predictors. The overall accuracy of the textural group prediction for the LUCAS training dataset was 100% regardless of whether one considers only spectra as predictors, or chemical measurements or combined measurements (chemical and spectral). This 100% accuracy shows that the training used to estimate the G1, G2, and G3 textural groups resulted in a perfect prediction with a 0% prediction error. However, this same "perfect" training model resulted in a 16-38% decrease in accuracy when applied to independent testing data. This shows that the LightGBM model tends to overtrain, causing a drop in prediction performance. Despite this decrease, the probability of correctly classifying the textural group from spectral analysis and routine chemical testing of the soil is still quite high, at almost 84%. Moreover, a normalized confusion matrix (observation ratios in percentage terms) was used to visualize the performance of the LightGBM model by textural groups G1, G2, and G3, as defined in Figure 3. Their accuracy was 100% for training and 80.8, 80.6, and 89.4% for G1, G2, and G3 respectively. The overall accuracy of predicting the textural groups for the ICRAF testing dataset was 75% when considering only the spectra, 62% when considering only the five available chemical measurements (K exchangeable , Ca exchangeable , Mg exchangeable , pH water , OC), and 83% when combining the spectral and chemical predictors. Figure 7 present the results of testing data from ICRAF dataset with a total or overall prediction accuracy of 100% for training and 83% for testing. Distributed over the three textural groups, this prediction accuracy for testing was 88.12%, 56.52%, and 80.26%, respectively for groups G1, G2, and G3 and 100% for training.
The accuracy of both models shows a good ability to classify soil textural groups. As expected, most of the misclassified data points were located on the borders between neighboring textural groups. Even the conventional methods (e.g., pipette and hydrometer methods) can generate such measurement errors when it comes to neighboring of textural groups [56]. Beretta et al. [57] confirmed that even conventional wet methods can show differences between them in textural class assignments. Therefore, this misclassification between different neighboring soil classes determined by the conventional methods may explain the misclassified data points generated by the LightGBM model. For a tolerance of about 10% on the sand and clay fractions, we can visualize a border effect of 20 ± 2% when considering the horizontal boundary of 20% clay between G1 and G3, of 30 ± 3% when considering the horizontal boundary of 30% clay between G1 and G2, and of 45 ± 4.5% to 55 ± 5.5% when considering the vertical boundaries of 45-55% sand between G2 and G3 (Figures 6 and 7). The remaining points that are misclassified, and are far from these border effects, are attributed to true prediction errors by LightGBM. At first glance, misclassified points due to border effects are more frequent than those due to prediction errors (Figures 6 and 7).  As shown in Figures 6 and 7, the total prediction accuracy for soil textural groups was similar for ICRAF and LUCAS, respectively at 83% and 84% for testing and 100% for training. These high classification accuracies were achieved when spectral information and auxiliary chemical variables are combined in the prediction model. Previous studies showed that the inclusion of auxiliary predictor variables improves the prediction accuracy of a model [8,18,[58][59][60]. Moreover, Cozzolino and Moron [51] obtained a high positive correlation between Ca and clay content (0.80), Cu and Mg (0.71), K and clay (0.60), and Mg and clay (0.51), which may consequently affect the soil texture prediction. They concluded that these high correlations between physical properties and chemical parameters could explain some of the high accuracy obtained. Our results suggest that the LightGBM model can be successfully used to predict the soil textural groups. Figure 6. Prediction texture triangles (following the Canadian system of soil classification) obtained from LightGBM algorithm for the LUCAS dataset: (a) calibration data (n = 12,087); (b) testing data (n = 2367); (c) normalized confusion matrix of the training dataset; (d) normalized confusion matrix of testing dataset obtained for the 3 textural groups.

Conclusions
In this paper, the machine learning algorithms MLP, CatBoost, and LightGBM were used to predict soil textural fractions (clay, silt, and sand) and groups (G1: Fine, G2: Medium, and G3: Coarse) for two large-scale vis-NIR soil spectral libraries (ICRAF and LUCAS). The models' performance on the testing datasets reached R 2 values up to 0.85. The results showed a good performance for the LUCAS testing dataset with R 2 values of 0.78 (sand), 0.81 (silt). and 0.85 (clay). For the ICRAF testing dataset, the R 2 values for sand, silt, and clay were 0.78, 0.76, and 0.85, respectively. This predictive capacity of soil texture properties using soil spectral information is a very promising alternative to the traditional soil laboratory analysis due to its sensitivity, accuracy, reliability, versatility, reproducibility, and adaptability to precision agriculture.
Furthermore, the soil textural groups (G1, G2, and G3) were classified with LightGBM using spectra and chemical auxiliary variables with a high overall accuracy of 100% for training and close to 84% for testing. These findings support the hypothesis that chemical tests are powerful auxiliary variables for improving the prediction of these three textural groups. The algorithms CatBoost, MLP, and LightGBM are promising for soil texture prediction, and they can be used when a soil spectral library is available with enough samples. Moreover, in light of these good prediction accuracies produced for soil texture prediction with the ICRAF and LUCAS spectral libraries and the CatBoost, MLP, and LightGBM algorithms, we can consider further refining this method by using regional soil sample databases with spectral data obtained using in situ spectral methods.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/agronomy11081550/s1, Table S1: Five machine and deep learning algorithms used for sand, silt, and clay prediction on LUCAS dataset before bias correction, Table S2: Five machine and deep learning algorithms used for sand, silt, and clay prediction on ICRAF dataset before bias correction, Figure S1: Prediction accuracy of Sand content from ICRAF dataset, Figure S2: Prediction accuracy of Silt content from ICRAF dataset, Figure S3: Prediction accuracy of Sand content from LUCAS dataset, Figure S4: Prediction accuracy of Silt content from ESDAC dataset.