Soil Order-Land Use Index Using Field-Satellite Spectroradiometry in the Ecuadorian Andean Territory for Modeling Soil Quality

: Land use conversion is the main cause for soil degradation, inﬂuencing the sustainability of agricultural activities in the Ecuadorian Andean region. The possibility to identify the quality based on the spectral properties allows remote sensing methods to offer an alternative form of monitoring the environment. This study used laboratory spectroscopy and multi-spectral images (Sentinel 2) with environmental covariates (physicochemical parameters) to ﬁnd an affordable method that can be used to present spatial prediction models as a tool for the evaluation of the quality of Andean soils. The models were developed using statistical techniques of logistic regression and linear discriminant analysis to generate an index based on soil order and three indexes based on the combination of soil order and land use. This combined approach offers an effective method, relative to traditional laboratory methods, to derive estimates of the content and composition of soil constituents, such as electrical conductivity (CE), organic matter (OM), pH, and soil moisture (HU). For Mollisol index.3 with P á ramo land use, a value of organic matter (OM) ≥ 8.6% was obtained, whereas for Mollisol index.4 with Shrub land use, OM was ≥ 6.1%. These results reveal good predictive (estimation) capabilities for these soil order–land use groups. This provides a new way to monitor soil quality using remote sensing techniques, opening promising prospects for operational applications in land use planning.


Introduction
Soil is the main natural resource for food and energy production [1]. It controls the movement of water in the landscape and functions as a biological filter for the possible leaching of pollutants into environmental spheres [2]. However, soil can be degraded by chemical and physical processes, which reduces its ability to function as a base for the development of a healthy layer for vegetation. Therefore, acknowledging soil conditions by the effects on vegetation can represent site conditions [3,4]. Gholizadeh and Kopačková (2019) [1] considered that conventional methods of soil health evaluation in large areas involve several expensive and time-consuming variables such as collection of field data, chemical analysis in a laboratory, and geostatistical interpolation. Alternately, several studies have shown the possibility of characterizing soils and identifying their quality by correlating both physicochemical and spectral parameters [5]. Therefore, the use of remote sensing spectrometry products in environmental evaluation studies offers a complementary alternative to in situ monitoring procedures to aid in research, control, and monitoring of the soil component. The application field for these tools in the soil component is extensive [6] through the study of soil characteristics such as reflectance, degradation, and possible polluting agents with the processing of satellite images permitting the inspection and monitoring of large areas in a fixed time and place [7].
In Ecuador, the properties and pedogenetic processes of soil have been studied in terms of rock type, geomorphology, taxonomic classification, and soil order. An increasing breach between the available information on the main soils and their quality [8] leads to understanding the condition of the soil to allow for the planification of healthy and sustainable territories, as determined by goals 12 and 15 of the Sustainable Development Goals (SDG) [9]. These goals correspond to responsible consumption and production, and life on land. Therefore, it is necessary to determine the quality of soil to develop fast, feasible, and affordable estimation methods for monitoring and assessing areas.
In the study area, the predominate soils are Andisol and Mollisol, which originate from weathering of volcanic material (ash) [10]. These relatively young soils can convey high agricultural potential [11]. Andisols, also known as páramos, are clay loam soils capable of retaining enormous amounts of water; on the contrary, Mollisols are fertile soils with a high organic matter content that cover approximately 70% of the Cayambe canton, a political-administrative unit of Ecuador, where the research's basin is located. However, the lack of land use and occupation policies has caused the expansion of agricultural activity boundaries [12], causing the loss of the páramo. The main goal is to understand whether there is any relationship between the spectra measured in the samples collected in field with the corresponding bands measured by satellite, combining physicochemical analysis to quantify and model the quality of Andean soils caused by agricultural activity. This will be achieved by (i) compiling and analyzing the physicochemical parameters of soil based on quality standards for agricultural activities, obtaining indices that classify the soils based on their order (Andisol and Mollisol) and use, and (ii) determining whether the soil is associated with some of the physicochemical qualities considered, validating land use and order models based on field reflectance data, satellite reflectance, and physicochemical qualities.
There are several studies based on national models capable of predicting spectra limited in an infrared laboratory with statistical algorithm analysis [13,14]. In this research, with the use of these spectroscopic methods for the evaluation of soil quality, models were developed for estimating indicators based on the combination of soil order, land use, and physicochemical characteristics, using logistic regression analysis, linear discriminant analysis, and regression trees. The approach offers a method that derives the estimates using the ratio of laboratory/satellite spectra when the soil is well represented by the calibration samples used to build the predictive models [13]. Therefore, the performance of these local models can be used in other geographic spaces by incorporating the spectra into a dataset for that area [15].
Consequently, the combination of laboratory spectroscopy and multispectral images with environmental covariates is an adequate methodological alternative to obtain models that are adjusted for the prediction of the quality of Andean soils, independently of other methodological approaches that have been used [16][17][18][19].
The existing black and brown soils in the basin can retain high amounts of water and organic matter content, known as Mollisol and Andisol [10]. The soils have been influenced by the volcanic activity of the Cayambe during its genesis, causing slopes ranging from gentle to steep [7]. Vegetation coverage is characterized by the presence of cultivated grass (27%), herbaceous and shrubby moorland (45.6%), flowers and short-cycle crops (16.5%), and eucalyptus trees (8.1%) [10].

Sentinel-2 Satellite Images
The satellite imagery used for this research was acquired by the Sentinel-2 (S2) satellite constellation (2A and 2B Earth Observing Missions) launched in 2015 and 2016, which has been used extensively for monitoring land cover and vegetation [23,24]. The S2 satellites are identical and operate in a sun-synchronous orbit at a mean altitude of 786 km. The main S2 payload is a multi-spectral instrument (MSI), which is a push-broom sensor that registers the radiation reflected from the Earth passing through the atmosphere in 13 spectral bands distributed in four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. Figure 2 depicts the range and spectral response functions of the S2A/S2B MSI instruments for these bands [25]. The S2 satellites have a swath width of 290 km. The visible bands (VIS) B1, B2, B3, B4 at 10 m resolution; near-infrared bands (NIR) B5, B6, B7, B8A at 20 m resolution and B8 at 10 m; and shortwave infrared (SWIR) B11 and B12 are most useful for retrieving geophysical surface parameters [26].
Meanwhile, the 60 m resolution bands are used for atmospheric corrections, which are of crucial importance for most EOS applications and permit the development and evaluation of robust atmospheric correction algorithms such as Sen2Cor. In the study area, located inside a swath overlap, the revisit frequency of each satellite is four to five days in an 11° forward-looking view angle; the presence of two identical satellites allows a The existing black and brown soils in the basin can retain high amounts of water and organic matter content, known as Mollisol and Andisol [10]. The soils have been influenced by the volcanic activity of the Cayambe during its genesis, causing slopes ranging from gentle to steep [7]. Vegetation coverage is characterized by the presence of cultivated grass (27%), herbaceous and shrubby moorland (45.6%), flowers and short-cycle crops (16.5%), and eucalyptus trees (8.1%) [10].

Sentinel-2 Satellite Images
The satellite imagery used for this research was acquired by the Sentinel-2 (S2) satellite constellation (2A and 2B Earth Observing Missions) launched in 2015 and 2016, which has been used extensively for monitoring land cover and vegetation [23,24]. The S2 satellites are identical and operate in a sun-synchronous orbit at a mean altitude of 786 km. The main S2 payload is a multi-spectral instrument (MSI), which is a push-broom sensor that registers the radiation reflected from the Earth passing through the atmosphere in 13 spectral bands distributed in four bands at 10 m, six bands at 20 m, and three bands at 60 m spatial resolution. Figure 2 depicts the range and spectral response functions of the S2A/S2B MSI instruments for these bands [25]. The S2 satellites have a swath width of 290 km. The visible bands (VIS) B1, B2, B3, B4 at 10 m resolution; near-infrared bands (NIR) B5, B6, B7, B8A at 20 m resolution and B8 at 10 m; and shortwave infrared (SWIR) B11 and B12 are most useful for retrieving geophysical surface parameters [26].
Meanwhile, the 60 m resolution bands are used for atmospheric corrections, which are of crucial importance for most EOS applications and permit the development and evaluation of robust atmospheric correction algorithms such as Sen2Cor. In the study area, located inside a swath overlap, the revisit frequency of each satellite is four to five days in an 11 • forward-looking view angle; the presence of two identical satellites allows a geometric revisit time between two and three days, supporting near-continuous monitoring of vegetation and land surface processes [26].
The imagery used in this research was acquired on 16 July 2018 by the Sentinel 2B platform (see Supplementary Materials). The Level-2A product was used after the geometric revisit time between two and three days, supporting near-continuous monitoring of vegetation and land surface processes [26].
The imagery used in this research was acquired on 16 July 2018 by the Sentinel 2B platform (see Supplementary Materials). The Level-2A product was used after the processing carried out in the Level-1C product available in the Open Access Hub of the European Space Agency (ESA) DataHUB server [27]. The identifier of the product is referenced as S2B_MSIL1C_20180716T153619_N0206_R068_T17NRA_20180716T202613. Furthermore, in the text, it will be referred to as S2BL2A.

Soil Sampling
Soil samples were collected during four field trips during the months of June, July, and August 2018. The samples were placed in hermetically sealed sleeves for physicochemical analysis (Figure 3a,b), and in two bulk density cylinders ( Figure 3c) for spectroradiometry analysis. The cylinder lids were covered with geomembranes to keep the samples unaltered, as explained by Yánez and Arciniegas (2019) [7], to comply with the provisions of Section 2.5.

Soil Sampling
Soil samples were collected during four field trips during the months of June, July, and August 2018. The samples were placed in hermetically sealed sleeves for physicochemical analysis (Figure 3a,b), and in two bulk density cylinders ( Figure 3c) for spectroradiometry analysis. The cylinder lids were covered with geomembranes to keep the samples unaltered, as explained by Yánez and Arciniegas (2019) [7], to comply with the provisions of Section 2.5. geometric revisit time between two and three days, supporting near-continuous monitoring of vegetation and land surface processes [26]. The imagery used in this research was acquired on 16 July 2018 by the Sentinel 2B platform (see Supplementary Materials). The Level-2A product was used after the processing carried out in the Level-1C product available in the Open Access Hub of the European Space Agency (ESA) DataHUB server [27]. The identifier of the product is referenced as S2B_MSIL1C_20180716T153619_N0206_R068_T17NRA_20180716T202613. Furthermore, in the text, it will be referred to as S2BL2A.

Soil Sampling
Soil samples were collected during four field trips during the months of June, July, and August 2018. The samples were placed in hermetically sealed sleeves for physicochemical analysis (Figure 3a,b), and in two bulk density cylinders ( Figure 3c) for spectroradiometry analysis. The cylinder lids were covered with geomembranes to keep the samples unaltered, as explained by Yánez and Arciniegas (2019) [7], to comply with the provisions of Section 2.5.   A total of 36 surface soil samples (0-10 cm) were collected from the study site using core drilling, cylinders, a stainless-steel shovel, and a GARMIN GPSMAP 62sc handheld navigator (accurate to within ±4 m). After removing vegetation from the soil surface in a quadrant of approximately 30 × 30 cm, 1 kg of soil sample was collected from each  [29], establishing two samples per homogeneous zone, and one sample for zones ID02 and ID08, which presented collection problems.

Physicochemical Parameter Measurements
To determine the quality of soil, the physical, chemical, and biological components of the soil and their interactions must be considered; despite the different measurements possible, not all parameters are relevant for the soil in a particular scenario [30]. In this study, the physicochemical components were selected considering two criteria: The first was based on the reflectivity of the soils that are conditioned to organic matter, that interfere with the spectral curves [31], and the second according to Friedman et al. 2001 [32], who set a minimum number of indicators for agricultural activities, due to human use and management. The parameters measured were soil moisture, pH, electrical conductivity (CE), and organic matter (OM). Other parameters, such as heavy metals, could not be measured for economic reasons.
For the 36 soil samples, the analysis was carried out as follows: For soil moisture, pH, and conductivity, the methods established in NOM-021-RECNAT-2000 (Mexican official standard that establishes the specifications of fertility, salinity, and soil classification) were used [33]. The AS-05 gravimetric method was applied to soil moisture, the AS-02 electrometric method was applied to pH, and the AS-20 method was applied to electrical measure conductivity.
For the determination of OM, the soil samples were sent to the certified soil, foliar, and water laboratory of the Agency for the Regulation and Control of Phytosanitary and Zoosanitary (Agrocalidad) of Ecuador, where the volumetric PEE/SFA 09 method (Walkley Black's analytical method consisting of wet oxidation of the soil sample) was applied [34].

Spectral Measurements in Laboratory and Satellite Image Sentinel-2
Spectral analysis of the soil samples was carried out in the Ecuadorian Space Institute (IEE) laboratory, which facilitated the use of the ASD FieldSpec4 spectroradiometer (Analytical Spectral Devices). The equipment has a spectral resolution of 3 nm in the range of 350-1000 nm and 10 nm in the range of 1001-2500 nm. In this regard, the spectral bands of the MSI sensor onboard satellite S2 are within the spectral range of the ASD instrument.
In the laboratory, the soil spectra were collected using a small spectralon placed in the HiBrite MudLight device with an optical fiber connected to generate an artificial light source ( Figure 4). The ASD FieldSpec4 spectroradiometer was used to obtain soil spectral data in the laboratory. The measurement protocol followed the methodology described in Figure 5 [35].  After the common configuration and control settings, i.e., output folders, connecting optical fiber, etc., the appropriate integration time was set given the lighting conditions to optimize the measurements. Subsequently, the dark current was also recorded. Then, a reference target or white reference (spectralon) was measured until a horizontal line with a reflectance value of 1 was presented. Laboratory soil spectra measurements followed these pre-operation phases, and the resulting measured spectra were processed with the corresponding software (ViewSpec Pro). An important issue that also needed to be addressed was that of the signal-to-noise ratio, or SNR, during measurements, which is related to the signal component. The spectra measurement procedure was performed using the principle of a continuous fiber optic cable, as specified in [36]. This technique has the advantage of significantly degrading the SNR, as it avoids interactions with other media between the recording device and the sample. Soil measurements were carried out by controlling the direction of the optical fiber to always point towards the target in order to avoid anisotropy effects. Subsequently, an average of all measurements was made to obtain the spectrum. Finally, the displays of the spectra that were selected were shown.
The S2BL2A product specified in 2.2 was used for the satellite image, with bottomof-atmosphere (BOA) reflectance, in which only the bands matching the S2 product were considered. The centroid of each pixel was determined to obtain the spectral values per band on a 20 × 20 m grid. After the common configuration and control settings, i.e., output folders, connecting optical fiber, etc., the appropriate integration time was set given the lighting conditions to optimize the measurements. Subsequently, the dark current was also recorded. Then, a reference target or white reference (spectralon) was measured until a horizontal line with a reflectance value of 1 was presented. Laboratory soil spectra measurements followed these pre-operation phases, and the resulting measured spectra were processed with the corresponding software (ViewSpec Pro). An important issue that also needed to be addressed was that of the signal-to-noise ratio, or SNR, during measurements, which is related to the signal component. The spectra measurement procedure was performed using the principle of a continuous fiber optic cable, as specified in [36]. This technique has the advantage of significantly degrading the SNR, as it avoids interactions with other media between the recording device and the sample. Soil measurements were carried out by controlling the direction of the optical fiber to always point towards the target in order to avoid anisotropy effects. Subsequently, an average of all measurements was made to obtain the spectrum. Finally, the displays of the spectra that were selected were shown.
The S2BL2A product specified in 2.2 was used for the satellite image, with bottomof-atmosphere (BOA) reflectance, in which only the bands matching the S2 product were considered. The centroid of each pixel was determined to obtain the spectral values per band on a 20 × 20 m grid.
These laboratory spectroradiometry measurements and satellite images were used to establish indices (the table in Section 2.8.1) that compare the physicochemical parameters of the soil with the spectral bands of the S2-MSI sensor. Since it was agreed to only use the equipment supplied in the Laboratory of the Ecuadorian Space Institute, in-field spectroradiometry measurements were not possible. The measurements of the spectra were carried out only in the laboratory, as in [37][38][39], generating a different model to those already known to evaluate soil quality [40,41].

Land Use/Soil Order Dataset
Before the processing specified in Section 2.8, a dataset was built based on the in-field soil samples (spectra and physicochemical parameters) and reflectance per homogeneous area in the S2BL2A satellite image. These data consisted 345,408 observations as a product of the combination of two datasets and constituted the geographic population of the area under study. The data were then treated based on the combination of the variables "land order" and "land use", whose total set of observations was as follows ( Table 1). The Model 1 was established from the dataset by applying a simple random method of 5% of the population to compare what was provided by each sample until one dataset was left as a result of the information provided between one model and another being the same. The data were divided into training and testing groups. The training dataset consisted of 70% of the total number of observations in the sample, and the test dataset consisted of 30% of the total number of observations in the sample. With the dataset the model was elaborated and later executed.
Model 2 was developed from the dataset by selecting those corresponding to the Andisol soil order (Table 1), with 199,660 observations. Three categories were selected: Shrub, Páramo, and Pasture, leaving Forest out of the analysis due to its low frequency. Composite samples of 70% of each land use category were randomly selected, thus leaving a composite sample of 139,734 random observations, which were part of the training dataset. Therefore, the test dataset contained the remaining 30% of the 59,886 observations.
For the Mollisol order, considering the data's trend, based on the geographical distribution of the different land uses ( Figure 6), it was observed that some of the soils had a particular use owing to their nature and population growth, among other characteristics of the area. The dataset corresponding to the Mollisol soil order (Table 1), with 145,748 observations, was considered to form Model 3 from a stratified random sample of 70% of the total. This allowed us to consider two subsets based on land use, resulting in two more models. The first subset (Model 3), called "Mollisol 1", was made up of Forest, Páramo, and Pasture; the second subset (Model 4), "Mollisol 2", was composed of Agriculture and Shrub. Each training dataset represented a 70% stratified random sample based on the categories of Mollisol 1 ( Table 2) and Mollisol 2 (Table 3). Consequently, the test dataset included the remaining 30% of the data. For Mollisol 1 it was 38,391, and for Mollisol 2, 5335. Table 2. Training sample to obtain the discriminant function in the classification of land use of the The dataset corresponding to the Mollisol soil order (Table 1), with 145,748 observations, was considered to form Model 3 from a stratified random sample of 70% of the total. This allowed us to consider two subsets based on land use, resulting in two more models. The first subset (Model 3), called "Mollisol 1", was made up of Forest, Páramo, and Pasture; the second subset (Model 4), "Mollisol 2", was composed of Agriculture and Shrub. Each training dataset represented a 70% stratified random sample based on the categories of Mollisol 1 (Table 2) and Mollisol 2 (Table 3). Consequently, the test dataset included the remaining 30% of the data. For Mollisol 1 it was 38,391, and for Mollisol 2, 5335.

Homogeneous Zones
The area of the Rio Blanco basin was extracted from the hydrological database of the Cayambe canton. Land use was extracted from the national productive systems database [10], and slope data and soil order data were obtained from a geopedology base map [10]. The multi-criteria technique was applied to create homogeneous zones [7] considering the following criteria: soil order and land use. This made it possible to establish a spatial analysis unit for correlation evaluation tests, determining the number of samples to be taken in the field. In total, within the Rio Blanco basin, 31 zones with similar characteristics were identified in previous work [7], of which 19 were analyzed for easy access in the study area ( Figure 7). Using the physicochemical laboratory analysis results, the Thiessen polygons were calculated in each homogeneous zone from the soil-sampling points. Likewise, the topology of spatial relations (intersection and inside) was applied [42] with the spectral values per satellite band to identify the spatial relationship of the satellite spectral values concerning the physicochemical characteristics of each homogeneous zone [43].
The Mexican (NOM-021-RECNAT-2000) [33] and Ecuadorian (Book VI, Annex 2) environmental standards [29] were also considered to establish homogeneous areas as a unit of spatial analysis. This allowed sampling points to be defined based on the criteria of slope, soil order, and land use, resulting in 19 homogeneous areas ( Figure 7). Using the physicochemical laboratory analysis results, the Thiessen polygons were calculated in each homogeneous zone from the soil-sampling points. Likewise, the topology of spatial relations (intersection and inside) was applied [42] with the spectral values per satellite band to identify the spatial relationship of the satellite spectral values concerning the physicochemical characteristics of each homogeneous zone [43].

Regression and Spatial Analysis
The Mexican (NOM-021-RECNAT-2000) [33] and Ecuadorian (Book VI, Annex 2) environmental standards [29] were also considered to establish homogeneous areas as a unit of spatial analysis. This allowed sampling points to be defined based on the criteria of slope, soil order, and land use, resulting in 19 homogeneous areas ( Figure 7).

Regression and Spatial Analysis
To identify the relationship between the spectral behavior of soil in the laboratory and satellite images associated with the physicochemical qualities of soils, we proceeded to apply regression techniques based on the theory of machine learning, which makes them useful for predicting outcomes, identifying patterns, and making decisions with minimal human intervention [44]. Subsequently, discriminant analysis techniques, geostatistics, and non-parametric models (regression trees) were applied.
This study started by creating a logistic regression model ( Figure 8) with the dependent variable "soil order" (Andisol, Mollisol), and as explanatory variables, the reflectance levels of the spectral behavior of soil in laboratory and satellite images, called Model 1, to later select those variables that were statistically significant, from which this model generated the soil index required ( Table 4). The basic form of the logistic regression model is as follows (Equation (1)) [45]:     Several logistic regression trials with the dependent variable of soil order were tested. The statistically significant variables (p ≤ 0.05) were verified until the most highly significant set of variables was obtained. The statistically significant model consisted of a combination of independent variables related to the reflectance levels of the soil spectra in the laboratory and satellite image.
The land use variable was related to the taxonomic classification of soil, so when considering only this variable, there was no way to separate the different land uses and generate an index that allowed us to estimate the physicochemical characteristics based on the land use variable of soil. To resolve this difficulty, the soil order variable was considered, and within each of these categories-Andisol and Mollisol-the land use variable that yielded the best classification levels by soil order was analyzed, as discussed in Section 2.6. Considering land use as the dependent variable, a statistical methodology known as linear discriminant analysis was applied ( Figure 8) [46], which allowed a linear combination of variables to be identified that could be used to determine the group to which each individual belonged. In this case, the individuals were identified as homogeneous areas in which different soil samples were collected to develop Model 2, Model 3, and Model 4 ( Table 4).

Obtaining Index
The standardization of the coefficients from the logistic regression model (Model 1) followed a pattern where each coefficient was divided by the sum of its coefficients, in such a way that the sum of the coefficients of the index was equal to 1 (Equation (2) From the linear discriminant function, the coefficients of the models were standardized by land use by soil order of the models: Model 2, Model 3, and Model 4, where each coefficient of the model was divided by the sum of its coefficients in such a way that the sum of the coefficients of the indices (Index 2, Index 3, Index 4) was equal to 1 Equation (3).
where cso[i] = standardized coefficients of land use by soil order; In this case, the data were treated based on the combination of soil order and land use, and the entire set of observations is shown in Table 1.
An association analysis was applied between each physicochemical parameter and the indices, which determined no linear trend between the different pairs of variables that were compared. The geostatistical surface was created using the inverse distance weighted (IDW) method, which assumes that closer objects are similar to those far apart. Therefore, any unknown location will probably have an equal value to the nearest known locations [47]. It was possible to determine how the order of soil and land use were spatially distributed in terms of probability and to establish predictions of each physicochemical parameter through a set of non-parametric models known as decision trees [48], where the dependent variable can be categorical or numeric. The regression tree model was applied since the dependent variable was each of the calculated indices and was numerical, and the explanatory variables (covariate) were the order of the soil and the order-land use, together with each physicochemical parameter, considered as independent variables, as shown examples in Table 5, which are explained in Section 3.5. The space defined by the regression tree models, as part of a non-parametric analysis, consisted of dividing the predictor space into boxes (regions) [49]. For example, the areas were a function of Index 1 and the order of soil to estimate the physicochemical organic matter (OM) parameter to make the prediction. Consequently, a regression tree model was generated to describe the association between each index and each physicochemical parameter, as shown in Figure 8. For a better understanding, the general methodological framework was divided into three parts, as shown in Figure 8. All statistical and graphical calculations were performed using RStudio software [50].

Validation
The validity of Model 1 was tested by calculating the confusion matrix [49] to determine the classification error of the samples and the accuracy, together with the Kappa statistic to indicate the degree of agreement between the measured data and the predicted value by the model, concerning their order in Andisol or Mollisol, as well as the sensitivity and specificity, which indicate the probability of correctly classifying the soil samples from the model. The Kappa coefficient must be equal to zero. When the Kappa coefficient differs from 0, it means that the data obtained from the validation model, as predicted data, agree with the measured data used to generate the model. Sensitivity refers to the ability of a model to identify the order of soils. In contrast, specificity indicates the ability of the model to identify soil samples that do not correspond to the order of the soil to be classified.
In Model 2, among the different land uses of the order Andisol with each of the two sets of test data selected randomly, the classification errors were evaluated using the confusion matrix and the accuracy metric.
For Model 3, among land uses such as Forest, Páramo, and Pasture of soil order Mollisol, those that did not participate in the elaboration of the model needed to be classified based on Model 3 and have their corresponding confusion matrices calculated in order to determine the classification error and accuracy metric.
The same criteria applied for Model 4, between the agricultural and shrubland uses of the soil order Mollisol.
Once each model was validated, the effects of soil order and land use on each physicochemical parameter were determined; one-way analysis of variance (ANOVA) was used [49], with a dependent variable for each physicochemical parameter and an independent variable for the order and land use of the soil. Several ANOVA models were tested for soil order and other land use models within each order. For each case, the null hypothesis was that the mean of each physicochemical parameter is the same in each order of soil, or the mean of each physicochemical parameter is the same for each land use within each order, versus the alternate hypotheses indicating that at least one pair of mean values is different. The statistical decision criteria are based on a significance level of 5% (α = 0.05); thus, if the p-value is less than or equal to 0.05 the null hypothesis is rejected, and it can be concluded that the effect of the independent variable is significant in relation to the mean of the dependent variable. Otherwise, if the p-value is greater than 0.05, then the available data do not yield enough information to conclude that the independent variable is significant in relation to some variation of the dependent variable.

Physicochemical Analysis
The different soil samples were analyzed using standardized physicochemical methods. The soil moisture results were in a range from 12.23% to 74.99%. The areas that predominated with the highest percentage of soil moisture were moors, and those with the lowest soil moisture were forest and shrub areas. The lowest water content was observed in Mollisol.
Regarding pH, the most acidic soils corresponded to undisturbed moors, whereas the least acidic soils corresponded to cultivated pastures. The range was 4.55 to 5.76, which, according to Mexican regulations, ranges from moderately acidic to strongly acidic, and according to Ecuadorian regulations, it would be out of range.
The electrical conductivity of the soils was within the limits established by both the Ecuadorian and Mexican regulations, <200 µS/cm and <1 dS/m, respectively. The areas with the lowest electrical conductivity were moors, whereas the highest electrical conductivity was observed in grasses.
For OM, the values ranged from 2.78% to 16.06%. According to Mexican standards, the zones range from very low to very high levels for soils of volcanic origin. The zone with the lowest OM percentage was forest, followed by passage areas, and the zone with the highest OM content was páramo areas.

Analysis of Spectral Signatures
In this section the behavior of each band was determined with respect to the intensity of reflectance of the soil samples. For each cylinder, two spectral measurements were made in opposite sections of the same tube, and for each soil sample 10 spectral measurements per section were averaged for a single representative spectrum per homogeneous area, which resulted in graphs (Figure 9a,b) as a function of reflectance and wavelength per sample in each homogeneous area.

Analysis of Spectral Signatures
In this section the behavior of each band was determined with respect to the intensity of reflectance of the soil samples. For each cylinder, two spectral measurements were made in opposite sections of the same tube, and for each soil sample 10 spectral measurements per section were averaged for a single representative spectrum per homogeneous area, which resulted in graphs (Figure 9a,b) as a function of reflectance and wavelength per sample in each homogeneous area. The reflectance of the spectra was graphically analyzed in the laboratory to determine the behavior of the soils related to their spectral signatures of the Andisol and Mollisol orders. The spectral signatures obtained in the laboratory presented a pattern related to the typical spectral signature of soils, ranging from the visible range (VNIR) to near-infrared (NIR) to short-wave infrared (SWIR).
The graph in Figure 9a shows the pasture curves, where sample ID11 of the Andisol soil order presented the same intensity of reflectance as sample ID14 of the Mollisol order, The reflectance of the spectra was graphically analyzed in the laboratory to determine the behavior of the soils related to their spectral signatures of the Andisol and Mollisol orders. The spectral signatures obtained in the laboratory presented a pattern related to the typical spectral signature of soils, ranging from the visible range (VNIR) to near-infrared (NIR) to short-wave infrared (SWIR).
The graph in Figure 9a shows the pasture curves, where sample ID11 of the Andisol soil order presented the same intensity of reflectance as sample ID14 of the Mollisol order, which were the highest compared to the other samples. Sample ID10 of the Mollisol order had a medium intensity of reflectance, unlike sample ID15 of the Mollisol order and sample ID13 of the Mollisol order with lower values of reflectance intensity. This variation in the curves is related to the properties and state of these soils [51], considering the variation of each of the land uses. Thus, in the graph in Figure 9b, sample ID16 of the Mollisol order, with agricultural land use, may indicate changes in the characteristics and status of agricultural use in the months of June, July, and August.
It can be said that the graphs made a difference in the behavior of the soil order based on the associated land use.
This could be related to the reflectance records of the Sentinel-2 satellite images (Figure 10a,b) to improve spectral differences by calculating soil order indices based on land use and physicochemical parameters, as explained in the next section. It can be said that the graphs made a difference in the behavior of the soil order based on the associated land use.
This could be related to the reflectance records of the Sentinel-2 satellite images (Figure 10a,b) to improve spectral differences by calculating soil order indices based on land use and physicochemical parameters, as explained in the next section. From the logistic regression calculation, Model 1 was obtained, whose structure is

Model 1, by the Orders of Andisol and Mollisol Soils
From the logistic regression calculation, Model 1 was obtained, whose structure is shown in Table 6. The coefficients of Model 1 were both positive and negative. This model was composed of explanatory variables, consisting of a combination of the spectral behavior of the soil in the laboratory and satellite image, with the particularity that there are reflectance levels related to the characteristics of red, red border, and near-infrared. Classical vegetation indices were composed, but in this case, the objective was to classify the order of the soil in Andisol and Mollisol. Furthermore, one of the independent variables was not significant (B08c), which did not influence the global significance of this logistic regression model (p < 0.0001).
Based on the training dataset, we obtained a confusion matrix (Table 7). The confusion matrix indicated a training error of 3.5%, which means that the model was good for classifying soils in relation to their order in Andisol or Mollisol based on the spectral behavior of the soil in laboratory and satellite image, and satellite related to the red reflectance of any modality. The false-positive and false-negative coefficients were relatively low (Table 7), at 1.9% (2615/139,762) and 5.89% (6006/102,024), respectively. In other words, 2615 soil samples of the Andisol order were classified as Mollisol, and 6006 Mollisol soil samples were classified as Andisol. We then evaluated the model using a test dataset to describe the validation process.

Model 1 Validation
The diagnostic evaluation of Model 1, from the diagnostic statistics using the test dataset, was generally good because the accuracy, sensitivity, and specificity were above 95%. On the other hand, the p-value of the Kappa statistic (Table 8) was more significant than 0.05 (p > 0.05), indicating that the null hypothesis that the measurements obtained through Model 1 were equivalent to the real data is not rejected. As explained in Section 2.8.1, to classify land use according to the Andisol soil order, linear discriminant analysis was applied (Figure 8). The following results were obtained (Table 9). The spectral values of the soil in the laboratory had greater weight in the classification of the different land uses than in the satellite image as a function of spectrometry in the laboratory and satellite image. Regardless of the sign, the coefficients of the soil spectral values in the laboratory were greater than the coefficients of the values in the satellite image. Consequently, the first component of this linear discriminant function explains that 96.5% of the total variability of the three different land uses had lower coefficients; even though the reflectance values in the satellite image were lower, these variables were important for the classification of land use as a function of the Andisol soil order. The first and second discriminant components were the linear combinations of the variables that best discriminate between the three land uses of the Andisol order, which in this case corresponded to the entire spectrum of soil in the laboratory and satellite image, respectively. Figure 11 shows the results of the soil classification based on the linear discriminant function model (Model 2).
The numbers 1, 2, and 3 represent the mean of each dataset. The means were quite separate, which implies a good classification of the land use of the Andisol order. In addition, based on the first linear discriminator, better discrimination was observed between the soils of Pasture and Páramo or between soils of Shrub and Páramo use than between the soils of Shrub and Pasture use. This situation could be because these land uses, in some cases, have relatively small neighboring units. Based on the training dataset, a confusion matrix was obtained (Table 10). In Figure 11 and Table 10, a good classification of the land uses of the Andisol order was observed, with a classification error of 0.51%. and second discriminant components were the linear combinations of the variables th best discriminate between the three land uses of the Andisol order, which in this ca corresponded to the entire spectrum of soil in the laboratory and satellite image, resp tively. Figure 11 shows the results of the soil classification based on the linear discrimina function model (Model 2). The numbers 1, 2, and 3 represent the mean of each dataset. The means were qu separate, which implies a good classification of the land use of the Andisol order. In a dition, based on the first linear discriminator, better discrimination was observed betwe the soils of Pasture and Páramo or between soils of Shrub and Páramo use than betwe the soils of Shrub and Pasture use. This situation could be because these land uses, in som cases, have relatively small neighboring units. Based on the training dataset, a confusi matrix was obtained (Table 10). In Figure 11 and Table 10, a good classification of the la uses of the Andisol order was observed, with a classification error of 0.51%.   Model 2 Validation From the first group data for Shrub, Páramo, and Pasture land uses of the Andisol order, corresponding to the 30% that were not part of the model calculation, a confusion matrix was obtained (Table 11) from which a good classification of the land uses of the Andisol order was obtained, whose classification error was only 0.50% and accuracy 99.5%. Similar results were obtained for the second set of randomly selected data.  The results were obtained from the application of LDA (Table 12). For the Mollisol 1 order, observing the coefficients of the linear discriminant function (Table 12), it resulted that the spectral behavior of soils measured in the laboratory exhibited a higher contribution to discriminate land uses Forest, Páramo, and Pasture for the Mollisol 1 order, compared to the coefficients derived from the satellite image. Consequently, the first component of this linear discriminant function (LD1) explained 70.9% of the total variability of the three different land uses, implying that although the reflectance values in the satellite image had lower coefficients, these variables were important for the classification of land use from the Mollisol 1 soil order. The first and second discriminant components were the linear combinations of the variables that best discriminated between the three Mollisol 1 land use types. Figure 12 shows a representation of the linear discriminant function for this particular case of Mollisol 1 land use, with a minimum overlap between Páramo and Pasture, with a classification error of only 0.47% and an accuracy of 99.52%.
14, x FOR PEER REVIEW 18 of 29 The following results were obtained from the application of LDA (Table 13): Table 13. Coefficients of the linear discriminant function with dependent variable of the use of soil of the Mollisol 2 order and independent variables of the reflectance levels of the soil spectra in la- The following results were obtained from the application of LDA (Table 13): In relation to the coefficients of the linear discriminant function (Table 13), it was found that the soil spectral values measured in the laboratory had a higher contribution for the classification of the considered land uses compared to those of the satellite image. Regardless of the sign, the coefficients of the soil spectrum in the laboratory were higher than the coefficients of the spectrum in the satellite image. Consequently, the only component of this linear discriminant function explained 100% of the total variability of the two different land uses, which implies that even though the spectral values of the satellite image had lower coefficients, these variables were important for the classification of these land uses as a function of the Mollisol 2 soil order. Figure 13 represents a good classification of the uses of soils order Mollisol 2.

Model 4 Validation
For the validation of Model 4, we tested 30% of the remaining data, called the test dataset, to classify the soils based on Model 4, and obtained the following confusion matrix, shown in Table 14.

Model 4 Validation
For the validation of Model 4, we tested 30% of the remaining data, called the test dataset, to classify the soils based on Model 4, and obtained the following confusion matrix, shown in Table 14. In Table 14, for the remaining 30%, a good classification of the land use was observed in Agriculture and Shrub for the Mollisol 2 order, considering the same behavior indicated in the training data.  Table 15. For the soils of the Andisol order, the mean level of the index was 0.2377 with a relatively low level of variability, equal to 0.1792. For soils of the Mollisol order, the mean level of the index was lower, −1.4961, presenting a higher level of variability equal to 0.5688, which can also be classified as a high level of variability. In this way, index.ma.1 classifies soils according to their order. The index obtained from the discriminant function of Model 2 was expressed as follows (Equation (5) The descriptive statistics of Index 2 are displayed in Table 16.  The index obtained from the discriminant function of Model 3 is expressed as follows (Equation (6) For land uses of the Mollisol 1 order, the mean level of Index 3 was higher in forest land use, at 0.79, with the highest level of variability, equal to 0.22 (the table in Section 3.5.3). For the use of páramo land of the Mollisol 1 order, the mean level of Index 5 was −0.06, with the lowest level of variability, equal to 0.12 (Table 17). The level of variability of the groups defined as a function of the land use of the Mollisol order was different, representing the natural behavior of these variables. The fourth index obtained from the discriminant function (Model 4) was obtained by standardizing the coefficients of this model. Each coefficient of Model 4 was divided by the sum of its coefficients in such a way that the sum of the coefficients of Index 4 was equal to 1, obtaining (Equation (7)): − 0.34B11c + 0.35B12c + 0.02B02s + 0.02B03s (7) − 0.05B04s − 0.03B05s − 0.04B06s + 0.02B07s − 0.002B08s + 0.02B08As − 0.01B11s + 0.08B12s

Indices Depending on the Variable of Land Use of the Andisol and Mollisol Orders
For land uses of the Mollisol 2 order, the mean Index 4 was higher in agricultural land use, at 0.12, with the highest level of variability, equal to 0.02 (Table 18). For the use of shrub soil of the Mollisol 2 order, the mean level of Index 6 was −0.07, with the lowest level of variability, equal to 0.01 (Table 18). The level of variability of the groups defined as a function of the land use of the Mollisol order was different, representing the natural behavior of these variables. The first model predicted the values of index.ma.1. as a function of the covariates, soil order, and physicochemical parameters. An example with soil moisture is presented, where for soils of the Andisol order, the mean of the index.ma.1 ( Figure 14) is equal to −0.134. If the soil moisture (HU) is greater than or equal to 36.7 for soils of the Andisol order, the predicted value of the index.ma.1 is on average equal to 0.109 (i = 0.109) for n = 74,000 soil samples. However, if the soil moisture is less than 36.7, the predicted value of index.ma.1 is on average equal to 0.382 (i = 0.382), for n = 65,700 soil samples. The same interpretation for soils of Mollisol order.
Likewise, if the value of the index is positive, it corresponds to an Andisol soil order and negative to a Mollisol soil order. The predicted moisture value is at least 36.7.

Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Andisol Order through Model 2 (index.2)
As shown in Table 19, it was possible to obtain the effect of land use of the Andisol order on the physicochemical parameters. An example of the non-parametric regression tree model is presented below (Figure 15), with the dependent variable as index.2 and the independent variables as soil use and organic matter (OM) for soils of the Andisol order (  Likewise, if the value of the index is positive, it corresponds to an Andisol soil order and negative to a Mollisol soil order. The predicted moisture value is at least 36.7.

Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Andisol Order through Model 2 (index.2)
As shown in Table 19, it was possible to obtain the effect of land use of the Andisol order on the physicochemical parameters. An example of the non-parametric regression tree model is presented below ( Figure  15), with the dependent variable as index.2 and the independent variables as soil use and organic matter (OM) for soils of the Andisol order (Table 20)     In Figure 16, the regression tree model of index.3 can be observed as an example in the function of land use of soil order Mollisol 1 and organic matter, which are related  In Figure 16, the regression tree model of index.3 can be observed as an example in the function of land use of soil order Mollisol 1 and organic matter, which are related based on In Figure 16, the regression tree model of index.3 can be observed as an example i the function of land use of soil order Mollisol 1 and organic matter, which are relate based on Table 21 (ARUSM3MO).   OM was greater in Páramo (≥8.6%) than Forest and Pasture, with a misclassification of 22% for Forest and 17% for Pasture (Table 21).

Regression Tree Models of Physicochemical Parameters as a Function of Land Use of the Mollisol Order through Model 4 (index.4)
The land use regression tree model for soil order Mollisol 2 and OM, which are related based on Table 22 (ARUSM4MO), are shown in Figure 17. OM was greater in Páramo (≥8.6%) than Forest and Pasture, with a misclassification of 22% for Forest and 17% for Pasture (Table 21) The land use regression tree model for soil order Mollisol 2 and OM, which are related based on Table 22 (ARUSM4MO), are shown in Figure 17.  OM was greater in Shrub (≥6.1%) than Agriculture (<6.1%), with a misclassification of 1% for Shrub (Table 22).

Discussion
The results of this study allow for a description of the correlation between the physicochemical parameters with index.2 (Andisol), index.3 (Mollisol 1), index.4 (Mollisol 2), according to the soil order-land use homogeneous zones defined in Section 2.7 and based on the criteria of Zebrowski 1997 [52]. In the case of the Mollisol order soil, the prediction values for Páramo were ≥8.6% (Table 21), which maintained the characteristic behavior of the Ecuadorian Andean zone, as cited by Podwojewski (1999) [53]. On the contrary, for OM was greater in Shrub (≥6.1%) than Agriculture (<6.1%), with a misclassification of 1% for Shrub (Table 22).

Discussion
The results of this study allow for a description of the correlation between the physicochemical parameters with index.2 (Andisol), index.3 (Mollisol 1), index.4 (Mollisol 2), according to the soil order-land use homogeneous zones defined in Section 2.7 and based on the criteria of Zebrowski 1997 [52]. In the case of the Mollisol order soil, the prediction values for Páramo were ≥8.6% (Table 21), which maintained the characteristic behavior of the Ecuadorian Andean zone, as cited by Podwojewski (1999) [53]. On the contrary, for Forest and Pasture, the prediction models presented a behavior with a lower value in organic matter (<8.6%) (Table 21), a less acidic pH and lower soil moisture percentage, and a higher electrical conductivity [10]. This behavior shows the effects of the impact of human activity, with a lesser value of OM in the Agriculture land use (<6.1%) ( Table 22).
The results presented in this study differ from other studies that compared different classification techniques using Sentinel-2 images [18,54], or considered the capacity of satellite observations to monitor and determine the state of the vegetation due to environmental stress factors by evaluating vegetation and chlorophyll indices [1].
Unlike other methodological approaches [17,[55][56][57], this study demonstrates that the combination of laboratory spectroscopy and multispectral images with environmental covariates is an adequate alternative to establish spatial analysis models to predict the quality of Andean soils in terms of physicochemical variables such as CE, OM, pH, and HU. For this purpose, performing soil order-land use associations was revealed to be an important possible tool for assessing the accomplished predictive models.
(1) Performance of the Models Equation (4) shows the distributions of the logistic regression coefficients in the R-NIR spectral range for soil order, with low false-positive (1.9%) and false-negative (5.89%) coefficients. For the Andisol soil order, the mean level of the index (index.ma.1) was 0.2377, with a level of variability equal to 0.1792 (Table 15). For the Mollisol soil order, the mean level of the index (index.ma.1) was −1.4961, with a level of variability equal to 0.5688 (Table 15). The index values ranged from approximately −6 to 2, with some outliers below −4 and a very low frequency of occurrence. This corroborates the potential of promoting soil studies based on laboratory spectral data and remote sensors, such as Ali Aldabaa et al. (2014) [19], who evaluated the feasibility of the methods for the prediction of soil surface salinity by visible near-infrared diffuse reflectance spectroscopy (VisNIR) and remote sensing (RS). Equations (5)- (7) show the distributions of the coefficients of the linear discriminant function in the VIS-NIR-SWIR spectral range for the order-land use both in laboratory and S2 configurations. For land uses of the Andisol order, the level of variability of the defined groups was very different, which represents the natural behavior of these variables (Shrub, Páramo, Pasture), which, unlike previous research on the VIS-NIR, presented greater sturdiness considering SWIR.
(2) Predictions of Physicochemical Parameters The prediction performance of the R-NIR model, based on the Student t-test with p < 0.0001 for OM, CE, pH, and soil moisture, shows that the mean of each parameter in the Andisol and Mollisol soil order were different, concluding that the mean of each one was lower for the Mollisol soil order, unlike CE, where its mean was higher in Mollisol.
Regarding the results obtained from the VIS-NIR-SWIR or full spectrum model, using non-parametric regression tree models, excellent results were obtained for OM, pH, CE, and soil moisture as explanatory variables of order-land use [57]. For Mollisol 1, the 95% confidence intervals for the difference in means for the set of physicochemical parameters (CE, OM, pH, HU) were negative for Pasture and Páramo, and for Forest. This means that on average the given set of parameters had higher values in Forest. For the Mollisol 2 soil type, the 95% confidence intervals for the difference in means for the considered set of physicochemical parameters were negative for Shrub and positive for Agriculture. For Andisol-type soils, the 95% confidence intervals for the difference in means for OM in Páramo are higher than the average OM in Shrub. Similarly, but in an opposite direction, when comparing the mean OM in Pasture and Páramo land uses (p = 0.0000 < 0.05), the 95% confidence interval for the mean difference was negative, which implies that the average OM in soils used for pasture was less than the average OM in soils used for Páramo. Very similar results were obtained in relation to the pH physicochemical parameter, and in relation to CE and HU in all pairs of established comparisons there were statistically significant differences.
In the methodological process, the nonparametric regression tree method was successfully applied to predict the values of the model covariates by soil order or land use order ( Figure 8). This statistical analysis methodology differed from those applied to date, like Adeline (2017) [41], Bao (2017) [40], Soriano-Disla (2014) [17], and Ali Aldabaa (2014) [19], where it was established that soil properties were derived from reflectance spectra that can be applied from various sources of spectral measurements, such as measurements in the laboratory, in the field, or from remote sensing systems.
These regression tree models were more flexible than those presented by Hill (2011) [55], because they did not consider non-compliance with statistical assumptions such as normality or collinearity problems between predictor variables. The regression tree models allowed for approximate estimates supported by 95% confidence intervals as a measure of the variation range of each physicochemical parameter, allowing for a reading of this from the top to the final nodes and vice versa, which was not possible in other applied models (Figures 14-17; Tables 21 and 22) [19,55,56].
Soil quality and soil degradation are crucial to develop sustainable agricultural activities [58]. The usual methods for environmental soil monitoring are very labor intensive and costly to cover large areas of land [1,13]. Satellite data in this field open up new research opportunities with great applications, as large areas of land can be analyzed and soil quality can be assessed in areas that are difficult to access [6,51].
Finally, this study is very valuable for the Ecuadorian Andean region for soil sustainability. Additionally, the results obtained in this study could be adapted in future research to other geographical regions after reviewing the soil order and land use that allow the relationships observed in the proposed model indices to be confirmed.

Conclusions
Soil quality is an important factor in sustainable land management. Its evaluation allows for the development and implementation of sustainable agriculture management techniques. Thus, in this study, an alternative method for the prediction of the parameters OM, CE, pH, and soil moisture based on the R-NIR and VIS-NIR-SWIR models is presented to demonstrate its applicability in the Ecuadorian Andean region. For this purpose, logistic regression analysis and linear discriminant function analysis were used. This required the establishment of homogeneous zones defined by soil order and land use combinations to design and implement soil-sampling strategies and field-satellite spectral measurements. The findings of this study suggest that soil + RS spectroscopy is a useful technique to predict soil properties, presenting good potential as an impetus towards future soil studies.
According to the results of this study: (1) The logistic regression function made it possible to predict the values as a soil order function and each of the physicochemical parameters described above. Therefore, because of the achieved results, the proposed methodology might be applied to other regions and adapted to predict soil properties as a function of the site-specific soil order and land use properties. Future research should explore the variability of soil quality parameters geographically with the aim of building regional models.