Article

Rapid Determination of Soil Horizons and Suborders Based on VIS-NIR-SWIR Spectroscopy and Machine Learning Models

by Karym Mayara de Oliveira 1,*, Renan Falcioni 1, João Vitor Ferreira Gonçalves 1, Caio Almeida de Oliveira 1, Weslei Augusto Mendonça 1, Luís Guilherme Teixeira Crusiol 2, Roney Berti de Oliveira 1, Renato Herrig Furlanetto 3, Amanda Silveira Reis 1 and Marcos Rafael Nanni 1

1 Graduate Program in Agronomy, Department of Agronomy, State University of Maringá, Av. Colombo, 5790, Maringa 87020-900, Brazil
2 Embrapa Soja (Empresa Brasileira de Pesquisa Agropecuária), Londrina 86001-970, Brazil
3 Gulf Coast Research and Education Center, University of Florida, Wimauma, FL 33598, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(19), 4859; https://doi.org/10.3390/rs15194859
Submission received: 31 August 2023 / Revised: 28 September 2023 / Accepted: 2 October 2023 / Published: 7 October 2023
(This article belongs to the Section Environmental Remote Sensing)

Abstract

In an effort to improve the efficiency of soil classification, traditional methods are being combined with analytical and computational techniques. This integration has strengthened the connection between conventional classification and the application of machine-learning (ML) models to interpret soil spectral reflectance data. Due to the time and computational cost, several studies are limited to testing the classification performance of a few algorithms and do not always explore the best parameters for model optimization. The study aims to assess the efficiency of combining soil spectral reflectance with prevalent ML models for classifying pedogenetic horizons and soil suborders, enhancing traditional classification methods. We collected seven soil monoliths, previously classified according to the Brazilian Soil Classification System (SiBCS) and soil taxonomy. Using the ASD Fieldspec spectroradiometer, we obtained spectral reflectance samples along each monolith (n = 800 per monolith) to classify horizons and n = 5600 for suborder classification. Spectral fingerprints were obtained and explored by Principal Component Analysis (PCA). The spectral data were subdivided into training (70%) and test (30%) sets and submitted to the Logistic Regression (LR), Artificial Neural Network (NN), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB) models for the classification of horizons and suborders, considering the model’s adjustment parameters. Accuracy and F-Score were used to verify the performance of the models. There was a significant influence of particle size and soil organic carbon on the spectral fingerprint of the soils. The PCA indicated that topsoil horizons clustered in most of the monoliths analyzed, while most of the subsoil horizons showed data overlap. The NN model showed the highest accuracy in the classification of horizons (97%), while the SVM showed the lowest performance (52% accuracy). The classification of soil suborders presented accuracies between 95% and 98%. Therefore, our study concludes that spectral data combined with ML models can enhance the discrimination and classification of soil horizons and suborders, improving upon traditional methods.

1. Introduction

The importance of soil as a substrate for life is widely recognized, and this natural resource is fundamental for food production and the development of human activities. The continuous population increase and, consequently, the growing demand for food and energy sources have caused the intensive use of the soil. This can lead to impacts such as erosion or use beyond its potential [1,2]. Classifying soils allows for a better understanding of their formation processes and the analysis of their physical and chemical properties. This knowledge is important for optimizing land use and conserving this natural resource.
The characterization of attributes and the classification of soils by the traditional method is laborious and time-consuming, and the reagents for soil analysis, which are later discarded, might become potential polluting materials for the environment. In this context, several authors reported the applications of the spectral reflectance of soils in the determination of their physical [3,4,5,6] and chemical attributes, such as soil organic carbon [4,7], color, mineral composition and clay content [8], soil water content [9,10], and iron oxides [11], in addition to evaluating the effectiveness of spectroscopy as a potential tool for soil classification [12,13,14,15].
Spectroscopy is based on the principle that electromagnetic radiation (EMR) is subdivided into different wavelengths and, for each of these wavelengths, there is a spectral behavior of absorption, transmission, and reflection from the different targets that interact with the EMR [16,17,18]. As for the determination of soil characteristics, the spectra encode information about its inherent composition, including minerals, organic compounds, and water. These elements acquire specific characteristics during soil formation, in response to environmental conditions. These encodings are represented in the spectrum as absorptions at specific EMR wavelengths, enabling identification of components, and, consequently, soil characterization [19].
In this context, the search for tools that combine efficiency and practicality in soil classification, pairing traditional methods with analytical and computational techniques, has strengthened the link between traditional classification and the modeling of spectral reflectance data with machine-learning (ML) models. Although the approach is recent, the application of ML models to soil classification already shows promising performance. For example, authors using Random Forest to classify the order, suborder, group, and subgroup of Chinese soils achieved accuracies of 63%, 62%, 40%, and 22%, respectively [19]. Using Random Forest, other researchers achieved accuracies ranging from 44% to 100% in the classification of horizons and 72% in the classification of soil orders [20]. Other studies used Multinomial Logistic Regression to classify soil suborders and obtained accuracies between 70% and 76%, depending on the combination of horizons used [21]. Using Support Vector Machine for soil classification, authors obtained an accuracy of 61% in data validation using the Vis–NIR–MIR spectrum [22]. However, most studies test specific ML models without fine-tuning their parameters because modeling several algorithms simultaneously is time-consuming and computationally costly [23].
Soil is a complex system with varied physical and chemical compositions. It forms from weathered rocks and minerals on the Earth’s crust. Its genesis strongly depends on the environmental conditions of the atmosphere and lithosphere [23]. In this context, the performance analysis of several ML models is necessary, as well as the verification of optimization parameters that lead to the best performance of the model (avoiding overfitting).
Therefore, the hypothesis of our study is that the effective application of computational classification tools and spectral data can provide robust support to traditional classification methods. To validate this, it is crucial to analyze the performance of different ML models, verify optimization parameters that ensure the best model performance (avoiding overfitting), and utilize a diverse soil database.
The objective of this work is to evaluate the potential of the spectral reflectance of soil profiles, combined with the most widely used ML models to date, for classifying pedogenetic horizons and soil suborders as a support tool for traditional soil classification methods. Specifically, the study leverages the VIS–NIR–SWIR spectrum for soil horizon characterization and assesses the accuracy and performance of various ML models in classifying pedogenetic horizons and soil suborders, considering model optimization parameters.

2. Material and Methods

2.1. Study Area and Soil Sampling Sites

We collected seven soil monoliths measuring 0.12 m × 1.60 m between March 2022 and January 2023 from soil profiles located in the north–central region of Paraná, Brazil (Figure 1). Each monolith was collected at different times due to the extensive work in selecting location and digging trenches. The selection of sampling points was based on capturing the main soil classes present in the region. The profiles were classified according to the standards of the SiBCS (Brazilian System of Soil Classification) and in accordance with the USDA Keys to Soil Taxonomy [24] (Table 1).
In all profiles, soil samples were collected for chemical analysis (sampling every 10 cm) and for physical analysis (sampling every 5 cm). In addition to the soil attributes used to determine the orders and suborders, the results of soil organic carbon (SOC), measured by the Walkley–Black method, and particle size, analyzed by the pipette method and wet sieving using 0.1 mol L⁻¹ NaOH as a dispersing agent, were obtained specifically to aid in the analysis and interpretation of spectral behavior. For the same purpose, mineralogy results of pedogenetic horizons were also obtained.
The mineralogical composition of the soil horizons was analyzed by X-ray diffraction in a fraction smaller than 0.053 mm, obtained from soil samples previously ground in a porcelain mortar. The samples were placed in aluminum containers and subjected to X-ray diffraction (XRD–6000 equipment®, Shimadzu from Brazil) using CuKα radiation in staggered scans at the theta intervals from 3° to 70° with angular steps of 0.02° 2θ/0.6 s in the Bragg configuration [25]. The data were interpreted by comparison with reference standards contained in the International Center for Diffraction Data [26].

2.2. Spectroscopic Measurement and Preprocessing

After air-drying the monoliths, spectral reflectance samples were collected throughout the monoliths, totaling 800 samples per monolith (Scheme 1), and subdivided between horizons and transitions of pedogenetic horizons. These data were obtained with the ASD Fieldspec 3 Jr Spectroradiometer using the ASD Contact Probe accessory connected to the Fieldspec by a fiber optic cable. The probe has its own light source, which standardizes the incident radiation and eliminates interference from external light. The reflectance samples were collected side by side on the monolith (considering a spot size of 10 mm). During sampling, the contact probe was slightly inclined to allow complete sampling of the natural elevations and concavities of the aggregates. At these points, more than one sample was collected at different angles to obtain a detailed reflectance of the soil. During the sampling process, the samples belonging to each horizon (previously subdivided) were noted. The Spectralon plate (Labsphere®, North Sutton, NH, USA) was used as a reflectance standard [27,28].
Spectral data were processed in ViewSpec PRO® v6.2 software (Analytical Spectral Devices, Inc., Boulder, CO, USA) and exported in radiance format for later conversion into reflectance (Equation (1)).
$$\text{Reflectance } (\rho) = \frac{\text{Soil radiance}}{\text{Lambertian radiance}} \tag{1}$$
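As a minimal sketch of the conversion in Equation (1), the ratio can be applied band by band. The function name and the three-band radiance values below are hypothetical and purely illustrative; they are not taken from the study, which performed this step on ViewSpec PRO exports.

```python
def radiance_to_reflectance(soil_radiance, reference_radiance):
    """Apply Equation (1) band by band: rho = soil radiance / Lambertian radiance."""
    if len(soil_radiance) != len(reference_radiance):
        raise ValueError("spectra must have the same number of bands")
    reflectance = []
    for soil, ref in zip(soil_radiance, reference_radiance):
        if ref <= 0:
            raise ValueError("reference radiance must be positive in every band")
        reflectance.append(soil / ref)
    return reflectance


# Hypothetical radiance values for three bands (illustrative only).
soil = [0.12, 0.30, 0.45]
panel = [0.60, 0.75, 0.90]  # radiance of the Lambertian reference panel
rho = radiance_to_reflectance(soil, panel)
```

The check on the reference radiance guards against division by zero in bands where the panel signal is missing.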

2.3. Exploratory Data Analysis

Principal component analysis (PCA) was used to investigate the clustering of the soil horizons and soil suborders through spectral reflectance. This is a projection method to visualize all the information contained in a dataset and identify patterns of similarity in a multidimensional environment, avoiding noise and redundancies. PCA was applied to the datasets using The Unscrambler® X version 10.4 software (CAMO Software, Oslo, Norway). The clustering of horizons and soil suborders was assessed by score graphs.

2.4. Classification of Soil Horizons and Suborders Using Machine Learning Models

After verifying the homogeneity of the data using the Hotelling’s T² (p-value ≤ 0.05) and Leverage tests, performed in The Unscrambler® X version 10.4 software (CAMO Software, Oslo, Norway), the datasets for each monolith (n = 800) were randomly divided into training sets (70% of the data) and test sets (30% of the data). They were then subjected to preprocessing based on the requirements of each learner (a process performed automatically by the software) and to machine-learning models for the classification of pedogenetic horizons. Soil suborder classification was also conducted by merging the datasets of the monoliths (n = 5600). These data were also divided into training (70%) and testing (30%) sets, and the same preprocessing and machine-learning models were applied. For each model used, variations in the optimization parameters were tested with the goal of achieving the highest possible accuracy while avoiding overfitting.
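The random 70/30 division described above can be sketched as follows. This is only an illustration using the Python standard library: the function name, the fixed seed, and the use of integer sample identifiers are assumptions, and the text does not state whether the split performed in the software was stratified by horizon.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# One monolith: 800 spectra, represented here by hypothetical integer ids.
monolith_ids = list(range(800))
train, test = split_train_test(monolith_ids)
```

With n = 800 this yields 560 training and 240 test spectra; the same function applied to the merged dataset (n = 5600) would yield 3920 and 1680.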
To evaluate the performance of the models, two metrics were used [29]: Overall Accuracy (Equation (2)), which is the proportion of correctly classified data, and the F-score (Equation (3)), which is the harmonic mean of Recall (the method’s ability to detect the samples that truly belong to a class) and Precision (the proportion of samples assigned to a class that truly belong to it). The confusion matrices (Equation (4)) of the horizons (only the matrices of the best-performing model) and of the suborders (the confusion matrices of all models) were also analyzed.
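$$\text{Overall accuracy} = \frac{\text{Correctly classified data}}{\text{All data}} \tag{2}$$

$$\text{F-Score} = \frac{\text{True Positive}}{\text{True Positive} + \dfrac{\text{False Negative} + \text{False Positive}}{2}} \tag{3}$$

$$\text{Confusion matrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{bmatrix} \tag{4}$$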
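The three quantities in Equations (2)–(4) can be computed from lists of actual and predicted labels, as in the sketch below. The horizon labels and predictions are hypothetical, and the per-class F-score shown here follows Equation (3) directly; how the per-class values were averaged into a single score in the study is not stated and is not assumed here.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes (Equation (4))."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Correctly classified data divided by all data (Equation (2))."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f_score(matrix, k):
    """F-score of class k: TP / (TP + (FN + FP) / 2) (Equation (3))."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class k samples assigned elsewhere
    fp = sum(row[k] for row in matrix) - tp  # other samples assigned to class k
    denominator = tp + (fn + fp) / 2
    return tp / denominator if denominator else 0.0


# Hypothetical labels for ten test spectra (illustrative only).
labels = ["A", "Bw1", "Bw2"]
actual = ["A", "A", "A", "Bw1", "Bw1", "Bw1", "Bw2", "Bw2", "Bw2", "Bw2"]
predicted = ["A", "A", "Bw1", "Bw1", "Bw1", "Bw2", "Bw2", "Bw2", "Bw2", "Bw1"]

matrix = confusion_matrix(actual, predicted, labels)
accuracy = overall_accuracy(matrix)
f_a = f_score(matrix, 0)
```

In this toy case seven of ten spectra fall on the diagonal, so the overall accuracy is 0.7, while the F-score of the A horizon is 0.8.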
The preprocessing and classification were conducted in Orange Data Mining software version 3.32.0 [30], and the graphs were generated with the assistance of data editing software. The machine-learning models used were as follows:
Logistic Regression (LR): A linear machine-learning algorithm commonly used to estimate the probability that an instance belongs to a particular class. It calculates a weighted sum of the input features (plus a bias term) and outputs the logistic function (a sigmoid function that outputs a number between 0 and 1) of this result [29].
Artificial Neural Network (NN): It is composed of units that combine multiple binary inputs to produce a single output [29,31].
Support Vector Machine (SVM): A binary classifier that looks for a line that best separates two classes. It has also been extended to cover multiple classes. The data instances that are closest to the line that best separates the classes are called support vectors and influence where the line is placed [32].
Random Forest (RF): A nonlinear machine-learning algorithm that builds an ensemble of binary decision trees from the training data and combines their predictions. To grow each tree, the algorithm uses the training data to select the best points to split the data to minimize a cost metric. First, it divides the training set into two subsets using a single feature k and a threshold tk. After this, it recursively divides the subsets using the same logic, and so on. It stops dividing once it reaches the selected maximum depth [29,32].
Gradient Boosting (GB): Considered one of the best-performing techniques for the classification of sets, it works by adding predictors sequentially to a set, each one correcting its predecessor. Furthermore, it adjusts the new predictor to the residual errors made by the previous predictor [29].
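The LR description above (a weighted sum of the input features plus a bias term, passed through the logistic function) can be sketched in a few lines. The weights, bias, and two-band feature vector are hypothetical values chosen for illustration; they are not fitted parameters from the study.

```python
import math


def logistic(z):
    """Sigmoid function mapping any real number to the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Weighted sum of the features plus a bias term, passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Hypothetical reflectance values at two bands and hypothetical weights.
features = [0.5, 0.25]
weights = [1.0, -2.0]
probability = predict_probability(features, weights, bias=0.0)
```

Here the weighted sum is exactly zero, so the estimated class probability is 0.5, the decision boundary of a binary logistic model.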

3. Results

3.1. Soil Properties

Table 2 presents the analyzed soil suborders with their respective horizons, SOC, particle size, and color observed in each horizon. The Oxisols (Typic Eutrudox, Kandiudalfic Eutrudox and both Typic Hapludox) and the Aquic Udorthents, in general, showed a high amount of SOC in the A horizon. This is directly related to soil management, environmental conditions, and soil texture. These soils exhibited higher clay content, and this fine particle size composition is considered a primary determinant of soil organic matter (SOM) content because it increases the surface area available for the formation of organo–mineral complexes and promotes the formation of micropores, which protect the SOM from decomposition. In contrast to the clayey soils, the soils with larger particle sizes (Arenic Kandiustults and Typic Kandiaqualfs) showed lower levels of SOC.
The mineralogical composition of the analyzed profiles is presented in Figure 2. The suborders Arenic Kandiustults, Typic Kandiaqualfs, and Typic Hapludox Loamy (Figure 2a,b,d) are soils found in a border region between the Paranapanema formation, characterized by thick and inflated pahoehoe basalt flows, and the Goio Erê, characterized by fine to very fine subarkosic sandstones [33]. The X-ray diffractograms of the horizons of these profiles show kaolinite, some iron oxides, and especially intense reflections of quartz, which are characteristic of sandy soils [34].
The other Oxisols (Figure 2c,e,f) are found in the Paranapanema formation and are characterized by a large amount of iron oxides, especially hematite, characteristic of this region. The Aquic Udorthents (Figure 2g), located in a low position in the landscape and constantly saturated with water, present the same minerals characteristic of the other soils derived from the Paranapanema Formation, but they also contain 2:1 minerals. Saturated soils often undergo a slower weathering process, in addition to the synthesis of new products from solubilized substances, such as the formation of 2:1 minerals when there are large amounts of soluble cations, or the formation of ferrous compounds, from the oxidation of ferrous ions into ferric ions [35].

3.2. Soil Horizons and Spectral Behavior

Figure 3 presents the spectral fingerprints of the horizons of each soil suborder and the general reflectance of each soil. High spectral reflectance values were observed in horizons with higher sand content. Furthermore, the general spectral reflectance of suborders with higher sand contents (Arenic Kandiustults, Typic Kandiaqualfs, and Typic Hapludox Loamy) was also higher than the reflectance of soils with higher clay content.
In addition to particle size, soil organic matter (SOM) content, and, consequently, its SOC indicator, are also predominant factors in regard to the amplitude of soil spectral reflectance. In Figure 3a,b, the A horizon presents higher spectral reflectance than the other pedogenetic horizons, while for the other soils, this relationship is inverted, with the A horizon presenting the lowest reflectance values in the entire spectral curve. The low reflectance observed in these surface horizons is due to the increase in SOM, as seen by the SOC content previously presented in Table 2 (see in Section 3.1).
The presence of certain minerals can also produce absorption features at specific wavelengths. Figure 3c shows absorptions at ≈2205 nm due to the presence of kaolinite and at ≈2360 nm due to the presence of gibbsite. In Figure 3d, a low peak caused by the presence of Fe oxides can be seen, followed by a shoulder-shaped absorption between 750 and 900 nm, which reflects Fe crystallinity: the more concave the feature, the more crystalline the Fe oxides, as seen in Figure 3e. Water adsorbed on clay minerals is represented by absorption at ≈1920 nm. When clay minerals are 1:1, this absorption is less intense, while when clay minerals are 2:1, the absorption observed in this band is more intense, as seen in Figure 3f.
In the spectral fingerprints of the suborders (Figure 3h) the distinction of mineralogical composition between the different soils analyzed is clear. It is noted that Arenic Kandiustults, Typic Kandiaqualfs, and Aquic Udorthents have a low composition of Fe oxides, as evidenced by X-Ray diffraction (Figure 2a,b,g), while the others have the characteristic absorptions of Fe oxides, as evidenced by the red color (reported in Table 2, Section 3.1).

3.3. Principal Component Analysis (PCA)

Figure 4 displays the results of the PCA in clustering soil horizons and suborders. The first two Principal Components (PC) explained over 96% of the variance in the spectral data for both the soil horizon datasets and the soil suborder datasets. When the majority of the variance can be explained in the first and second principal components, the remaining components can be omitted without any loss of essential information from the dataset [36].
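For reference, the share of variance attributed to each principal component is conventionally computed from the eigenvalues of the covariance matrix of the spectra. The notation below (eigenvalues λ, number of spectral variables p, and the abbreviation EVR for explained variance ratio) is introduced here for illustration and does not appear in the original text.

```latex
\mathrm{EVR}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
\qquad
\mathrm{EVR}_1 + \mathrm{EVR}_2 > 0.96
```

The inequality restates the result reported above: the first two components together account for more than 96% of the variance.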
The clustering for Arenic Kandiustults (Figure 4a) clearly separated the A, Ab, and transitional BA horizons, distinguishing the surface horizons from the other horizons and placing them in the lower quadrants. For the Bt horizons, there is a subdivision gradient, but the prevailing aspect is the overlap of data distributed in the upper quadrants. For Typic Kandiaqualfs (Figure 4b), there was an overlap between the A, EA, E, and BE horizons, while Btg and Cg showed clear clustering separate from the others. This is likely due to both the change in coloration induced by the gleying process and the increase in clay observed in the Btg and Cg horizons. These horizons with the gleying process also displayed clear clustering among themselves, probably because of an average increase of 45% in clay content observed in the Cg horizon compared to Btg.
The Oxisol Typic Eutrudox (Figure 4c) showed overlap between the Ap and Bw1 horizons, while Kandiudalfic Eutrudox (Figure 4f) showed clear clustering of the surface horizons due to the increase in SOC, as organic matter absorbs EMR at various wavelengths across the entire VIS–NIR spectrum. In contrast, the other horizons showed overlapping data, which is expected since the subdivision of the Bw and Bt horizons of these soils was, in these cases, based on differences in aggregate structure size. Structural features have yet to be identified as differentiating characteristics of spectral reflectance. Meanwhile, both Typic Hapludox Oxisols (Figure 4d,e) showed significant overlap of spectral data.
For the Aquic Udorthents (Figure 4g), the A and Cgv horizons overlapped, despite the increase in SOC observed in the surface horizon. The Cg and Cgss horizons displayed a clear subdivision of clusters, primarily due to the significant color change, as seen in Table 2, agreeing with the characteristics of their spectral fingerprint in the visible region (Figure 3g), as well as the increase in clay content observed in the Cgss horizon.
The soil suborders exhibited clear clustering (Figure 4h) separating soils with higher sand content (allocated in the right quadrants) from those with higher clay content (allocated in the left quadrants). The suborders AK, TK, and THL, despite having the same source material, showed clear clustering, while the suborder TE presented data overlap, especially with the THV suborder. For KE, AU, and THV suborders, clustering can also be observed with very close centroids and an overlapping of some samples.

3.4. Classification of Soil Horizons Using Machine Learning Models

Table 3 presents the performance of the ML models in classifying soil horizons. The models developed using the GB method showed an accuracy ranging between 0.72 and 0.92 and an F-score between 0.70 and 0.91 for horizon classification. The classification of THV horizons demonstrated the lowest accuracy, while AU horizons presented the highest accuracy. These results are satisfactory, especially for the THV horizons, given the intense data overlap observed in the PCA (Figure 4e) and in the spectral fingerprints (Figure 3e). Meanwhile, LR exhibited accuracy between 0.65 and 0.93 and an F-Score between 0.63 and 0.91, again for THV and AU, respectively.
The observed accuracy for SVM was 0.52 for KE, 0.55 for THV, 0.89 for AU, and intermediate values between these for the other suborders. This model showed the worst performance in horizon classification among the models analyzed. Meanwhile, RF exhibited accuracy values ranging between 0.65 and 0.90, and F-Score between 0.65 and 0.89 for the analyzed suborders.
In this study, NN (with the optimal parameters Activation: logistic and Solver: Adam) was the model that showed the best performance in the classification of pedogenetic horizons, resulting in an accuracy between 0.82 and 0.97 as well as an F-Score between 0.81 and 0.97 for the soils analyzed. In this context, the confusion matrix for the horizons classified by this model is presented.
The confusion matrices for the NN classification model (Figure 5) display the percentages of correctly classified data for each horizon during the testing phase. It can be observed that the highest percentages of confusion in horizon classification occur between horizons that are immediately below or above the analyzed horizon. Nonetheless, the lowest accuracy observed was 81%, and the highest reached 99%.
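The two readings made from the confusion matrices above (the percentage of correctly classified samples per horizon, and the concentration of errors in neighboring horizons) can be sketched as follows. The horizon names and counts are hypothetical and are not the values in Figure 5; the sketch assumes the rows and columns are ordered from the top of the profile to the bottom.

```python
def row_percentages(matrix):
    """Convert each row of counts into percentages of that actual class."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * value / total if total else 0.0 for value in row])
    return result


def adjacent_error_share(matrix):
    """Fraction of misclassified samples assigned to the horizon directly above or below."""
    errors = 0
    adjacent = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0


# Hypothetical counts; rows are actual horizons, columns are predicted horizons.
horizons = ["A", "BA", "Bt1", "Bt2"]
counts = [
    [48, 2, 0, 0],
    [3, 45, 2, 0],
    [0, 4, 44, 2],
    [1, 0, 5, 44],
]

percentages = row_percentages(counts)
share = adjacent_error_share(counts)
```

In this toy matrix, 18 of the 19 misclassified samples land in a horizon immediately above or below the true one, which is the pattern described for the NN model.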

3.5. Classification of Soil Suborders Using Machine Learning Models

Table 4 shows the performance of the ML models in the classification of soil suborders and the parameters used. The learners SVM and RF had the lowest accuracies (both 0.95) and F-Scores (0.94 and 0.95, respectively). However, these values are satisfactory for the classification of suborders, especially considering that some of them belong to the same soil order and might therefore present very similar characteristics. Even so, spectral reflectance was able to discriminate the datasets (Table 2 and Figure 3c–f).
GB displayed an accuracy of 0.97 and F-Score of 0.96 in classifying the suborders, followed by LR with 0.98 and 0.97 for accuracy and F-Score, respectively. Meanwhile, NN showed both accuracy and F-Score at 0.98. LR and NN presented the best performance in classifying the suborders used in this study.
Figure 6 displays the confusion matrices for each learner. Generally, the soils with the highest sand content, AK and TK, originating from the sandstone/basalt transition formation, had the highest proportion of confusion between them. Nonetheless, this was a minor percentage when compared to the accuracy percentage of data allocation within the same suborder, which was between 85.9% and 97.9% for AK and between 89.6% and 96.5% for TK. Regarding the Oxisols, they generally showed the highest proportion of confusion within this order, especially TE and THV. The confusion in sample allocation between them ranged from a minimum of 0.2% to a maximum of 6.9%. Considering this, they achieved an accuracy rate in classification above 93.1%. For the Aquic Udorthents, the models displayed an accuracy range from 99.6% to 100% (Figure 6).

4. Discussion

4.1. Soil Horizons and Spectral Behavior

Studies verified that the greater amplitude in spectral reflectance observed in sandy soils is due to the strong reflectance promoted by quartz, especially in the SWIR region, which promotes an increase in the amplitude of the entire spectral curve [37], as observed in Figure 3a,b. These authors also comment that sandy soils show reflectance spectra with an upward shape up to approximately 1700 nm and, later, a downward shape within the rest of the infrared spectrum. In addition, the SOM presents characteristics of absorption of EMR distributed throughout the VIS–NIR–SWIR reflectance spectrum. Thus, the greater its content, the lower the reflectance distributed throughout the spectral curve. In addition, when in large amounts, it can often even hide the soil mineralogy characteristics [4,20,38].
Absorption characteristics observed in the spectral fingerprint, due to the interaction of EMR with soil minerals, arise from electronic processes and vibrations. When EMR interacts with matter, the electrons will selectively absorb the radiation. This is because electrons require different amounts of energy to remain bonded to the nucleus, depending on their specific energy level and sub-level. The absorbed radiation causes changes in the atomic state, either promoting an electron to jump to a higher energy level or merely stimulating a vibrational movement. The outcome depends on the type of radiation (wavelength), resulting in a specific spectral absorption characteristic [16,39].
Soils with a strong red color, characteristic of hematite abundance (Fe2O3), have a strong concave shape around 450 nm and shoulder-shaped absorption between 610 and 811 nm, as observed in Figure 3c–f. However, goethite (FeO(OH); yellow color) has a narrow concave shape and higher reflectance intensity at the same wavelengths [13,40]. Authors working with different soils in a large region of the State of Rio Grande do Sul, Brazil, observed different minerals in the VIS–NIR–SWIR spectral region, such as hematite (at 523, 750 and 880 nm) and kaolinite (2200 nm), in addition to potassium concentrations with the presence of Fe–OH (2480 nm) and H–O–H bending in structural/adsorbed H2O (1910 nm) [16]. In addition, valley-shaped absorption, around 2204 nm, is due to Al–OH absorption and small absorption around 2280 nm, which may be related to Fe–OH from isomorphic substitution in the octahedral sheet, for example, in montmorillonite [40].

4.2. Principal Component Analysis (PCA)

Authors analyzing diagnostic horizon characteristics of seven Spodosols, two Ultisols, and one Inceptisol observed that the first two principal components explained 75% of the data variation [41]. These authors also mentioned that the PC1 loadings from 500–1350 nm corresponded to the SOC absorption, and that goethite promotes absorption in the 780–1150 nm range. Added to that, authors attributed to PC1 the variation resulting from illuviation, representing SOC and the accumulation of Fe oxides and clay. Meanwhile, PC2 aligned with the abundance of SOC. In this work, we also observed an important influence of SOC on horizon clustering, as observed in the clustering of topsoil horizons in Figure 4f,g.
The different reflectance patterns leading to the clustering of horizons and suborders are due to the physical, chemical, and mineralogical properties of the soil. These variations change the absorption and reflection features across the VIS–NIR–SWIR spectral bands [16]. A study with British soils showed significant wavelengths for absorption bands associated with clay minerals from 2204 to 2211 nm (illite, kaolinite), 2340 nm (illite), and 2207 nm (smectite) [6]. Absorption bands commonly linked to hematite and goethite were found at 920 nm and 880 nm, respectively. According to Nanni and Demattê (2006) [42], inflections at 1417 and 1927 nm indicate the presence of water and -OH groups, while the inflection around 2265 nm indicates the presence of gibbsite. Clay minerals also intersect with the SWIR region (1000–2500 nm), especially around 2200 nm, and 2315 nm bands and 2385 nm features of illite at 2352 nm and 2445 nm [18,43]. The most crucial spectral region for predicting clay, sand, and silt is located between 1900 and 2400 nm [4].
Certain components may display overlapping absorption bands; for example, iron oxides in the VIS–NIR range and clay minerals in the SWIR range. In general, a previously defined mineral composition might contribute to differentiation. Regarding iron oxides, the most distinctive feature is their absorption at approximately 900 nm. However, this absorption band is shared by goethite, lepidocrocite, maghemite, and ferrihydrite, among others. Thus, it is necessary to consider secondary absorption peaks at approximately 500 nm and 700 nm for their characterization [43].
These peaks and absorptions that characterize the spectral fingerprints of each horizon and each suborder allow for the clustering of spectral samples, as well as the training of statistical–mathematical models for classifying spectral samples with similar characteristics.

4.3. Classification of Soil Horizons Using Machine-Learning Models

The classification of pedogenetic horizons using machine-learning models has only recently been addressed by researchers, so few studies are available for comparison. Consistent with the results of this research, authors using VIS–NIR spectra and fuzzy clustering to recognize morphological characteristics in soil horizons concluded that this approach offers more information than the traditional description and avoids observer bias when describing a soil core [14]. Other authors applied Random Forest to classify five master horizons and five B horizons using MIR spectra and achieved overall accuracies ranging between 44% and 100% [20].
In this research, the confusion matrix (Figure 5, Section 3.4) shows that errors in sample allocation occur mainly between adjacent horizons. This is expected since adjacent horizons are genetically related in terms of their chemical, physical, and biological attributes [35]. Nonetheless, the lowest accuracy observed was 81%, and the highest reached 99%. Other authors demonstrated the discrimination of horizons of Australian soils using VIS–NIR spectra and canonical variate analysis [44]. Through the confusion matrix, they observed that more than 65% of each of the surface and transitional horizons had their samples correctly classified or reallocated to another topsoil horizon. The B, C, and transitional BC horizons were correctly assigned in more than 50% of the cases or reallocated to another subsoil horizon with similar spectral characteristics. When studying the classification of different parent materials of soils using the machine-learning models Random Forest, Support Vector Machine, and Linear Discriminant Analysis, the authors noted that despite differences in the patterns of variable weights, all the models identified a set of variables that strongly correlate with different clusters of parent materials [45]. In other words, the models find distinct ways to discern each class of parent material.
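The observation that misallocations fall mainly between adjacent horizons can be quantified directly from a confusion matrix whose classes are ordered by depth. The following sketch assumes such an ordering; the function name `adjacent_error_share`, the horizon labels, and the counts are hypothetical and are not taken from Figure 5.

```python
def adjacent_error_share(cm):
    """Share of misclassified samples assigned to a horizon directly above or below.

    `cm` is a square confusion matrix (rows: actual, columns: predicted)
    with classes ordered by depth.
    """
    errors = 0
    adjacent = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0

# Hypothetical counts for four horizons ordered from the surface downward.
horizons = ["A", "BA", "Bw1", "Bw2"]
cm = [
    [58, 2, 0, 0],
    [3, 52, 5, 0],
    [0, 4, 50, 6],
    [0, 1, 7, 52],
]
share = adjacent_error_share(cm)
print(f"{share:.1%} of the misclassified samples fall in an adjacent horizon")
```

In this made-up matrix, 27 of the 28 misclassified samples land in a neighbouring horizon, the pattern one would expect when adjacent horizons are genetically related.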

4.4. Classification of Soil Suborders Using Machine Learning Models

Authors classified five soil orders from China using Support Vector Machine with VIS–NIR–MIR spectra [22]. Corroborating this work, they achieved an accuracy of 64% when using only VIS–NIR spectra and an accuracy of 61% in data validation when combining VIS–NIR–MIR spectra. Another study characterized and classified soil orders using MIR spectra and Random Forest as the classification model [20]. These authors noted that many absorption features in the MIR spectra were caused by organic functional groups, clay minerals, quartz, and carbonates in soils. They achieved 72% accuracy in classifying eight soil orders.
Analyzing the classification of five orders (Aquerts, Aquepts, Udepts, Udalfs, and Orthents, according to Soil Taxonomy), six suborders, and 21 groups of soils using VIS–NIR spectra and Multinomial LR, authors obtained accuracies of 76.3%, 71.3%, and 70.3% using spectra of topsoil, subsoil, and combined horizons, respectively [21]. Meanwhile, the results for the level using the topsoil horizon were 40.5%. These authors used only two horizons as the dataset for each soil profile and mentioned that a more comprehensive spectral interpretation of a profile is required. They suggested additional support for inferences, such as comparisons between horizons within a profile, similar to what we conducted in this study (Table 2 and Figure 5).
Working with portable X-ray fluorescence (pXRF) data to characterize seven different soil orders in Brazilian tropical soils, authors explored the efficacy of the Support Vector Machine, Artificial Neural Network and Random Forest, both with and without the Principal Component Analysis (PCA) pretreatment for soil prediction [45]. They achieved the best accuracies for order (81.2%) and suborder (74.3%) using the Random Forest model without PCA pretreatment. Meanwhile, authors applied Principal Component Analysis to extract predictor variables for soil classification in VIS–NIR spectra [19]. They then used the Random Forest model to classify and identify soil profile classes at the order, suborder, group, and subgroup levels for Chinese soils. Their accuracy rates were 63%, 62%, 40%, and 22% for orders, suborders, groups, and subgroups, respectively.
In this work, 100% of the Aquic Udorthents samples were allocated correctly. This suborder originates from the same source material (Paranapanema formation) as most of the Oxisols analyzed here (Table 2 and Figure 3c–f). However, its lower position in the landscape, combined with water saturation, resulted in a change in its color and an increase in its mineralogy. Consequently, this led to a specific absorption rate of radiation at molecular vibration frequencies for this soil [22]. Suborders whose pedogenetic processes are closely associated with spectrally active soil properties, or that have characteristic profile patterns, can be identified with higher precision, even when they represent a smaller volume of the dataset [46].
Among the five learners evaluated, four of them (LR, GB, RF, and NN) correctly allocated more than 90% of the samples. Both the NN and LR learners achieved a 100% correct classification rate for the suborders THL and AU, respectively. This highlights that using spectral reflectance combined with ML models is a promising tool to support traditional soil classification by introducing an analytical and automated approach.
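For readers who want to reproduce the two performance measures used here, the sketch below derives accuracy and per-class F-scores from a confusion matrix. It is a minimal illustration under stated assumptions: the function names, the three-class example, and its counts are hypothetical, and the macro average shown is only one possible way of combining per-class scores (the averaging applied in this study is not restated here).

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def f_scores(cm):
    """Per-class F1, computed as 2*TP / (actual count + predicted count)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        actual = sum(cm[k])                          # TP + FN (row total)
        predicted = sum(cm[i][k] for i in range(n))  # TP + FP (column total)
        denom = actual + predicted
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Hypothetical counts (rows: actual class, columns: predicted class).
labels = ["class 1", "class 2", "class 3"]
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
per_class = f_scores(cm)
macro_f = sum(per_class) / len(per_class)
print("accuracy:", round(accuracy(cm), 3))
print("per-class F-score:", [round(f, 3) for f in per_class])
print("macro F-score:", round(macro_f, 3))
```

Reporting both measures is useful because accuracy alone can hide a class that is rarely predicted correctly, whereas the per-class F-score exposes it.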

5. Conclusions

Soil characteristics and properties influence the VIS–NIR–SWIR spectral reflectance of pedogenetic horizons and suborders, allowing discrimination between them. Despite the overlapping of spectral information for some horizons observed in the analysis of spectral fingerprints and in the clustering of scores by PCA, the currently most widespread machine-learning models classified pedogenetic horizons with accuracies ranging from 52% (by the SVM model) to 97% (by the NN model).
The classification of soil suborders by ML models is promising, with accuracies of 95% for RF and SVM, 97% for GB, and 98% for LR and NN. Even for suborders belonging to the same soil order, models correctly allocated 85.9% to 100% of the spectral samples.
These results highlight the potential of using spectral reflectance with the currently most widespread machine-learning models to discriminate and classify the variability of soil profiles and suborders. Using a computational method alongside traditional classification methods can streamline the process and add an analytical perspective that contributes to classification standards.

Author Contributions

Conceptualization, K.M.d.O.; Data curation, K.M.d.O.; Formal analysis, K.M.d.O. and J.V.F.G.; Funding acquisition, K.M.d.O. and R.F.; Investigation, K.M.d.O. and L.G.T.C.; Methodology, K.M.d.O. and J.V.F.G.; Project administration, K.M.d.O.; Resources, K.M.d.O.; Software, K.M.d.O. and J.V.F.G.; Supervision, K.M.d.O.; Validation, K.M.d.O. and R.F.; Visualization, K.M.d.O., R.B.d.O., A.S.R., M.R.N., R.H.F., W.A.M., and C.A.d.O.; Writing—original draft, K.M.d.O.; Writing—review and editing, K.M.d.O., R.F., J.V.F.G., and L.G.T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior: 001, Brasil (CAPES), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and the CEAGRE − Centro de Excelência em Agricultura Exponencial.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks are due to Programa de Pós–Graduação em Agronomia (PGA–UEM) at the State University of Maringá for encouragement and supporting communication.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Demattê, J.L.I.; Nanni, M.R. Management Options in Sandy Soils. Bol. Info. (SBCS) 2018, 44, 16–21.
  2. ONU, United Nations. World Population Prospects 2019: Highlights. Depart. of Economic and Social Affairs, Population Division, 2019. Available online: https://www.un.org/development/desa/pd/news/world-population-prospects-2019-0 (accessed on 26 August 2023).
  3. Vasques, G.M.; Dematte, J.A.M.; Viscarra Rossel, R.A.; Ramírez-Lopez, L.; Terra, F.S. Soil Classification Using Visible/near-Infrared Diffuse Reflectance Spectra from Multiple Depths. Geoderma 2014, 223, 73–78.
  4. Castaldi, F.; Palombo, A.; Santini, F.; Pascucci, S.; Pignatti, S.; Casa, R. Evaluation of the Potential of the Current and Forthcoming Multispectral and Hyperspectral Imagers to Estimate Soil Texture and Organic Carbon. Remote Sens. Environ. 2016, 179, 54–65.
  5. Chagas, C.S.; Carvalho, W., Jr.; Bhering, S.B.; Calderano Filho, B. Spatial Prediction of Soil Surface Texture in a Semiarid Region Using Random Forest and Multiple Linear Regressions. Catena 2016, 139, 232–240.
  6. Rawlins, B.G.; Kemp, S.J.; Milodowski, A.E. Relationships between particle size distribution and VNIR reflectance spectra are weaker for soils formed from bedrock compared to transported parent materials. Geoderma 2011, 166, 84–91.
  7. Li, S.; Shi, Z.; Chen, S.; Ji, W.; Zhou, L.; Yu, W.; Webster, R. In Situ Measurements of Organic Carbon in Soil Profiles Using Vis-NIR Spectroscopy on the Qinghai-Tibet Plateau. Environ. Sci. Technol. 2015, 49, 4980–4987.
  8. Viscarra Rossel, R.A.; Cattle, S.R.; Ortega, A.; Fouad, Y. In Situ Measurements of Soil Colour, Mineral Composition and Clay Content by Vis–NIR Spectroscopy. Geoderma 2009, 150, 253–266.
  9. Richter, K.; Palladino, M.; Vuolo, F.; Dini, L.; D’Urso, G. Spatial distribution of soil water content from airborne thermal and optical remote sensing data. Remote Sens. Agric. Ecosyst. Hydrol. 2009, 7472, 74720W.
  10. Sobrino, J.A.; Franch, B.; Mattar, C.; Jiménez-Muñoz, J.C.; Corbari, C. A Method to Estimate Soil Moisture from Airborne Hyperspectral Scanner (AHS) and ASTER Data: Application to SEN2FLEX and SEN3EXP Campaigns. Remote Sens. Environ. 2012, 117, 415–428.
  11. Sellitto, V.M.; Fernandes, R.B.A.; Barrón, V.; Colombo, C. Comparing two different spectroscopic techniques for the characterization of soil iron oxides: Diffuse versus bi-directional reflectance. Geoderma 2009, 149, 2–9.
  12. Chen, S.; Li, S.; Ma, W.; Ji, W.; Xu, D.; Shi, Z.; Zhang, G. Rapid Determination of Soil Classes in Soil Profiles Using Vis–NIR Spectroscopy and Multiple Objectives Mixed Support Vector Classification: Soil Classification Using Vis-NIR Spectroscopy. Eur. J. Soil Sci. 2019, 70, 42–53.
  13. Demattê, J.A.M.; Bellinaso, H.; Romero, D.J.; Fongaro, C.T. Morphological Interpretation of Reflectance Spectrum (MIRS) Using Libraries Looking towards Soil Classification. Sci. Agric. 2014, 71, 509–520.
  14. Fajardo, M.; McBratney, A.; Whelan, B. Fuzzy Clustering of Vis–NIR Spectra for the Objective Recognition of Soil Morphological Horizons in Soil Profiles. Geoderma 2016, 263, 244–253.
  15. Jeune, W.; Francelino, M.R.; Souza, E.D.; Fernandes Filho, E.I.; Rocha, G.C. Multinomial Logistic Regression and Random Forest Classifiers in Digital Mapping of Soil Classes in Western Haiti. Rev. Bras. De Ciência Do Solo 2018, 42.
  16. Coblinski, J.A.; Giasson, E.; Demattê, J.A.M.; Dotto, A.C.; Costa, J.J.F.; Vasát, R. Prediction of Soil Texture Classes through Different Wavelength Regions of Reflectance Spectroscopy at Various Soil Depths. Catena 2020, 189, 104485.
  17. Fonseca, A.D.; Fernandes, J.C. Remote Detection; Lidel: Lisboa, Portugal, 2004; p. 224.
  18. Viscarra Rossel, R.A.; Behrens, T.; Ben-Dor, E.; Brown, D.J.; Demattê, J.A.M.; Shepherd, K.D.; Shi, Z.; Stenberg, B.; Stevens, A.; Adamchuk, V.; et al. A global spectral library to characterize the world’s soil. Earth Sci. Rev. 2016, 155, 198–230.
  19. Xie, X.L.; Li, A.B. Identification of Soil Profile Classes Using Depth-Weighted Visible–near-Infrared Spectral Reflectance. Geoderma 2018, 325, 90–101.
  20. Zhang, Y.; Hartemink, A.E.; Huang, J. Spectral signatures of soil horizons and soil orders–An exploratory study of 270 soil profiles. Geoderma 2021, 389, 114961.
  21. Zeng, R.; Zhang, G.L.; Li, D.C.; Rossiter, D.G.; Zhao, Y.G. How Well Can VNIR Spectroscopy Distinguish Soil Classes? Bios. Eng. 2016, 152, 117–125.
  22. Xu, H.; Xu, D.; Chen, S.; Ma, W.; Shi, Z. Rapid Determination of Soil Class Based on Visible-near Infrared, Mid-Infrared Spectroscopy and Data Fusion. Remote Sens. 2020, 12, 1512.
  23. Ben-Dor, E.; Chabrillat, S.; Demattê, J.A.M.; Taylor, G.R.; Hill, J.; Whiting, M.L.; Sommer, S. Using Imaging Spectroscopy to Study Soil Properties. Remote Sens. Environ. 2009, 113, S38–S55.
  24. Soil Survey Staff, S.S. Keys to Soil Taxonomy; United States Department of Agriculture: Washington, DC, USA, 2014.
  25. Medeiros, S.G.; Dutra, R.P.S.; Grilo, J.P.F.; Martinelli, A.E.; Paskocimas, C.A.; Macedo, D.A. Preparation of low-cost alumina-mullite composites via reactive sintering between a kaolinite clay from Paraíba and aluminum hydroxide. Cerâmica 2016, 62, 266–271.
  26. Smith, D.K.; Jenkins, R. The Powder Diffraction File: Past, Present, and Future. J. Res. Natl. Inst. Stand. Technol. 1996, 101, 259–271.
  27. Nanni, M.R.; Demattê, J.A.M.; Rodrigues, M.; Santos, G.L.A.A.D.; Reis, A.S.; Oliveira, K.M.D.O.; Cezar, E.; Furlanetto, R.H.; Crusiol, L.G.T.; Sun, L. Mapping Particle Size and Soil Organic Matter in Tropical Soil Based on Hyperspectral Imaging and Non-Imaging Sensors. Remote Sens. 2021, 13, 1782.
  28. LRCL, Labsphere Reflectance Calibration Laboratory. Spectral Reflectance Target Calibrated From 0.25–2.5 nm Reported in 0.050 nm Intervals, 1st ed.; LRCL: London, UK, 2009.
  29. Geron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools and Techniques to Build Intelligent Systems; O’Reilly Media: Newtown, MA, USA, 2017.
  30. Demsar, J.; Curk, T.; Erjavec, A.; Gorup, C.; Hocevar, T.; Milutinovic, M.; Mozina, M.; Polajnar, M.; Toplak, M.; Staric, A.; et al. Orange: Data Mining Toolbox in Python. J. Mach. Learn. Res. 2013, 14, 2349–2353.
  31. Kriegeskorte, N.; Golan, T. Neural network models and deep learning. Curr. Biol. 2019, 29, R231–R236.
  32. Brownlee, J. Machine Learning Mastery with Python: Understand Your Data, Create Accurate Models and Work Projects End-To-End. Machine Learning Mastery: 2016. Available online: https://machinelearningmastery.com/machine-learning-with-python/ (accessed on 3 August 2023).
  33. Besser, M.L.; Brumatti, M.; Spisila, A.L. Mapa Geológico e de Recursos Minerais do Estado do Paraná. Programa Geologia, Mineração e Transformação Mineral. Curitiba: SGB-CPRM, 2021, Escala 1:600.000. Available online: https://rigeo.cprm.gov.br/jspui/handle/doc/22492 (accessed on 20 August 2023).
  34. Melo, V.F.; Alleoni, L.R.F. Química e Mineralogia do Solo: Parte II–Aplicações; Soc. Bras. de Ciência do Solo: Viçosa, MG, Brasil, 2009.
  35. Lepsch, I.F. 19 Lições de Pedologia; Oficina de Textos: São Paulo, Brasil, 2011; p. 456.
  36. Martínez-Martínez, V.; Gomez-Gil, J.; Machado, M.L.; Pinto, F.A.C. Leaf and Canopy Reflectance Spectrometry Applied to the Estimation of Angular Leaf Spot Disease Severity of Common Bean Crops. PLoS ONE 2018, 13, e0196072.
  37. Demattê, J.A.M.; Terra, F.D.S.; Quartaroli, C.F. Spectral Behavior of Some Modal Soil Profiles from São Paulo State, Brazil. Bragantia Bol. Tec. Do Inst. Agro. Do Estado De São Paulo 2012, 71, 413–423.
  38. Demattê, J.A.; Araújo, S.R.; Fiorio, P.R.; Fongaro, C.T.; Nanni, M.R. Espectroscopia VIS-NIR-SWIR na avaliação de solos ao longo de uma topossequência em Piracicaba (SP). Rev. Ciência Agron. 2015, 46, 679–688.
  39. Moreira, M.A. Fundamentos do Sensoriamento Remoto e Tecnologias de Aplicação, 3rd ed.; Viçosa; UFV: Florestal, MG, USA, 2007; p. 314.
  40. Viscarra Rossel, R.A.; Behrens, T. Using Data Mining to Model and Interpret Soil Diffuse Reflectance Spectra. Geoderma 2010, 158, 46–54.
  41. Huang, Y.C.; Huang, C.Y.; Minasny, B.; Chen, Z.S.; Hseu, Z.Y. Using PXRF and Vis-NIR for Characterizing Diagnostic Horizons of Fine-Textured Podzolic Soils in Subtropical Forests. Geoderma 2023, 437, 116582.
  42. Nanni, M.R.; Demattê, J.A.M. Spectral Reflectance Methodology in Comparison to Traditional Soil Analysis. Soil Sci. Soc. Am. J. 2006, 70, 393–407.
  43. Zhao, L.; Hong, H.; Liu, J.; Fang, Q.; Yao, Y.; Tan, W.; Yin, K.; Wang, C.; Chen, M.; Algeo, T.J. Assessing the Utility of Visible-to-Shortwave Infrared Reflectance Spectroscopy for Analysis of Soil Weathering Intensity and Paleoclimate Reconstruction. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2018, 512, 80–94.
  44. Viscarra Rossel, R.A.; Webster, R. Discrimination of Australian Soil Horizons and Classes from Their Visible-near Infrared Spectra. Eur. J. Soil Sci. 2011, 62, 637–647.
  45. Mancini, M.; Weindorf, D.C.; Silva, S.H.G.; Chakraborty, S.; dos Santos Teixeira, A.F.; Guilherme, L.R.G.; Curi, N. Parent Material Distribution Mapping from Tropical Soils Data via Machine Learning and Portable X-Ray Fluorescence (PXRF) Spectrometry in Brazil. Geoderma 2019, 354, 113885.
  46. Andrade, R.; Silva, S.H.G.; Weindorf, D.C.; Chakraborty, S.; Faria, W.M.; Guilherme, L.R.G.; Curi, N. Tropical Soil Order and Suborder Prediction Combining Optical and X-Ray Approaches. Geoderma 2020, 23, e00331.
Figure 1. Location map of soil profile sampling sites in the state of Paraná, Brazil. Different soil classes were sampled.
Scheme 1. General scheme for data acquisition and processing. (a–c) Shaping, structuring, and collecting the monolith, (d) collecting spectral data with the ASD spectroradiometer, (e,f) database for horizon classification and merged database for soil suborder classification, (g) spectral fingerprint analysis, (h) machine-learning model analysis.
Figure 2. X-ray diffractogram of soil classes. (a) Arenic Kandiustults, (b) Typic Kandiaqualfs, (c) Typic Eutrudox, (d) Typic Hapludox Loamy, (e) Typic Hapludox Very-fine, (f) Kandiudalfic Eutrudox, (g) Aquic Udorthents. Kt: kaolinite, Qz: Quartz, Hm: Hematite, Gb: Gibbsite, Mg: Maghemite, Gt: Goethite.
Figure 3. Mean reflectance of soil horizons and suborders. (a) Arenic Kandiustults, (b) Typic Kandiaqualfs, (c) Typic Eutrudox, (d) Typic Hapludox Loamy, (e) Typic Hapludox very fine, (f) Kandiudalfic Eutrudox, (g) Aquic Udorthents, (h) All suborders. Dashed circles indicate high concavities. Dashed arrows indicate small-sized particles.
Figure 4. Principal Component Analysis (PCA) of the soil horizons and suborders. (a) Arenic Kandiustults, (b) Typic Kandiaqualfs, (c) Typic Eutrudox, (d) Typic Hapludox Loamy, (e) Typic Hapludox Very fine, (f) Kandiudalfic Eutrudox, (g) Aquic Udorthents, (h) All suborders.
Figure 5. Confusion matrix (%) for the classification of soil horizons using the Neural Network model. (a) Arenic Kandiustults, (b) Typic Kandiaqualfs, (c) Typic Eutrudox, (d) Typic Hapludox Loamy, (e) Typic Hapludox Very fine, (f) Kandiudalfic Eutrudox, (g) Aquic Udorthents. A transition from light blue to dark blue indicates an increase in the percentage accuracy of the models.
Figure 6. Confusion matrix (%) of soil suborders classification by machine-learning models. (a) Logistic Regression, (b) Gradient Boosting, (c) Support Vector Machine, (d) Random Forest, (e) Neural Network. AK: Arenic Kandiustults, TK: Typic Kandiaqualfs, TE: Typic Eutrudox, THL: Typic Hapludox Loamy, THV: Typic Hapludox Very-fine, KE: Kandiudalfic Eutrudox, AU: Aquic Udorthents. A transition from light blue to dark blue indicates an increase in the percentage accuracy of the models.
Table 1. Soil classification of the profiles.
| Taxonomy unit (SiBCS 1) | Taxonomy unit (Soil Taxonomy) | ID 2 | n 3 |
|---|---|---|---|
| Argissolo Vermelho Ta Distrófico | Arenic Kandiustults | AK | 800 |
| Gleissolo Háplico Ta Distrófico | Typic Kandiaqualfs | TK | 800 |
| Latossolo Vermelho Eutrófico | Typic Eutrudox | TE | 800 |
| Latossolo Vermelho Distrófico | Typic Hapludox Loamy | THL | 800 |
| Latossolo Vermelho Distrófico | Typic Hapludox Very-fine | THV | 800 |
| Nitossolo Vermelho Eutrófico | Kandiudalfic Eutrudox | KE | 800 |
| Gleissolo Háplico Ta Eutrófico | Aquic Udorthents | AU | 800 |
| Total | | | 5600 |
1 Brazilian Soil Classification System. 2 Abbreviation for Identification. 3 Number of samples.
Table 2. Results of soil properties: particle size and soil organic carbon (SOC) analysis.
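| Soil | Horizon 1 | Depth (m) | SOC 2 (g dm−3) | Sand (%) | Silt (%) | Clay (%) | Color 3 (moist): Hue, Value/Chroma |
|---|---|---|---|---|---|---|---|
| Arenic Kandiustults | A | 0–0.12 | 3.66 | 90 | 1 | 9 | 5YR, 5/3 |
| | Ab | 0.12–0.30 | 5.17 | 91 | 1 | 8 | 5YR, 3/2 |
| | BA | 0.30–0.40 | 2.10 | 91 | 1 | 9 | 5YR, 3/2 |
| | Bt1 | 0.40–0.60 | 1.07 | 81 | 1 | 18 | 5YR, 3/2 |
| | Bt2 | 0.60–0.95 | 0.71 | 76 | 1 | 23 | 5YR, 3/2 |
| | Bt3 | 0.95+ | 0.09 | 83 | 2 | 16 | 5YR, 3/4 |
| Typic Kandiaqualfs | A | 0–0.09 | 6.67 | 90 | 1 | 9 | 5YR, 3/4 |
| | EA | 0.09–0.23 | 4.87 | 91 | 1 | 8 | 5YR, 3/4 |
| | E | 0.23–0.43 | 1.86 | 92 | 1 | 7 | 10R, 3/4 |
| | BE | 0.43–0.70 | 3.34 | 86 | 2 | 12 | 10R, 3/4 |
| | Btg | 0.70–1.10 | 2.46 | 71 | 3 | 27 | 5YR, 5/1 |
| | Cg | 1.10+ | 3.11 | 49 | 2 | 49 | 5YR, 5/1 |
| Typic Eutrudox | Ap | 0–0.16 | 15.89 | 14 | 7 | 78 | 10YR, 3/6 |
| | Bw1 | 0.16–0.90 | 3.44 | 7 | 4 | 89 | 10YR, 3/6 |
| | Bw2 | 0.90–1.35 | 1.98 | 8 | 2 | 90 | 10YR, 3/6 |
| | Bw3 | 1.35+ | 1.86 | 6 | 2 | 92 | 10YR, 3/6 |
| Typic Hapludox Loamy | A | 0–0.16 | 14.87 | 69 | 3 | 29 | 2.5YR, 3/4 |
| | BA | 0.16–0.35 | 6.80 | 66 | 3 | 31 | 2.5YR, 3/4 |
| | Bw1 | 0.35–1.0 | 4.55 | 65 | 3 | 32 | 2.5YR, 3/4 |
| | Bw2 | 1.0+ | 4.33 | 67 | 3 | 31 | 2.5YR, 2.5/4 |
| Typic Hapludox Very-fine | Ap | 0–0.12 | 22.94 | 27 | 11 | 62 | 2.5YR, 3/6 |
| | Bw1 | 0.12–0.44 | 9.73 | 29 | 7 | 64 | 2.5YR, 3/6 |
| | Bw2 | 0.44–1.14 | 4.32 | 14 | 5 | 81 | 2.5YR, 3/6 |
| | Bw3 | 1.14+ | 1.98 | 21 | 2 | 77 | 2.5YR, 3/6 |
| Kandiudalfic Eutrudox | Ap | 0–0.24 | 10.11 | 17 | 7 | 76 | 10R, 3/6 |
| | AB | 0.24–0.40 | 4.51 | 16 | 6 | 78 | 10R, 3/6 |
| | Bt1 | 0.40–0.10 | 2.94 | 17 | 5 | 77 | 10R, 3/6 |
| | Bt2 | 0.10+ | 1.73 | 21 | 4 | 74 | 10R, 3/6 |
| Aquic Udorthents | A | 0–0.28 | 28.72 | 21 | 6 | 73 | 10YR, 3/1 |
| | Cgv | 0.28–0.38 | 7.28 | 27 | 3 | 70 | 10YR, 3/1 |
| | Cg | 0.38–0.10 | 2.98 | 20 | 6 | 74 | 10YR, 4/1 |
| | Cgss | 0.10+ | 4.69 | 10 | 4 | 87 | 2.5YR, 3/0 |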
1 Horizons classified based on Soil Taxonomy. 2 Soil organic carbon. 3 Color measured with the Munsell color chart.
Table 3. Classification accuracy of soil horizons and ML model parameters.
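| Learner | Model performance | AK | TK | TE | THL | THV | KE | AU | Optimal general parameters |
|---|---|---|---|---|---|---|---|---|---|
| Gradient Boosting | Accuracy | 0.79 | 0.83 | 0.83 | 0.83 | 0.72 | 0.78 | 0.92 | Scikit-learn, learning rate: 0.1 or 0.2 |
| | F-Score | 0.78 | 0.82 | 0.82 | 0.82 | 0.71 | 0.77 | 0.91 | |
| Neural Network | Accuracy | 0.90 | 0.89 | 0.92 | 0.92 | 0.82 | 0.85 | 0.97 | Act.: logistic or ReLU, solver: Adam |
| | F-Score | 0.89 | 0.89 | 0.92 | 0.92 | 0.81 | 0.85 | 0.97 | |
| SVM | Accuracy | 0.62 | 0.80 | 0.62 | 0.73 | 0.54 | 0.52 | 0.89 | Cost = 1, linear or polynomial |
| | F-Score | 0.63 | 0.80 | 0.61 | 0.76 | 0.54 | 0.54 | 0.90 | |
| Random Forest | Accuracy | 0.78 | 0.82 | 0.77 | 0.78 | 0.65 | 0.70 | 0.90 | 8 or 10 trees, minimum split: 5 |
| | F-Score | 0.78 | 0.81 | 0.76 | 0.77 | 0.65 | 0.69 | 0.89 | |
| Logistic Regression | Accuracy | 0.86 | 0.87 | 0.80 | 0.75 | 0.65 | 0.74 | 0.93 | Ridge, C = 10 |
| | F-Score | 0.86 | 0.87 | 0.80 | 0.71 | 0.63 | 0.72 | 0.91 | |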
SVM: Support Vector Machine. AK: Arenic Kandiustults, TK: Typic Kandiaqualfs, TE: Typic Eutrudox, THL: Typic Hapludox Loamy, THV: Typic Hapludox Very-fine, KE: Kandiudalfic Eutrudox, AU: Aquic Udorthents.
Table 4. Classification accuracy of soil suborders and model parameters.
| Learner | Accuracy | F-Score | Optimal parameters |
|---|---|---|---|
| Gradient Boosting | 0.97 | 0.96 | Method: scikit-learn, learning rate: 0.1 |
| Neural Network | 0.98 | 0.98 | Act.: logistic, solver: Adam |
| Support Vector Machine | 0.95 | 0.94 | Cost = 1, linear |
| Random Forest | 0.95 | 0.95 | N° trees: 10, minimum split: 5 |
| Logistic Regression | 0.98 | 0.97 | Ridge, C = 10 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

de Oliveira, K.M.; Falcioni, R.; Gonçalves, J.V.F.; de Oliveira, C.A.; Mendonça, W.A.; Crusiol, L.G.T.; de Oliveira, R.B.; Furlanetto, R.H.; Reis, A.S.; Nanni, M.R. Rapid Determination of Soil Horizons and Suborders Based on VIS-NIR-SWIR Spectroscopy and Machine Learning Models. Remote Sens. 2023, 15, 4859. https://doi.org/10.3390/rs15194859
