Remote Sensing and Kriging with External Drift to Improve Sparse Proximal Soil Sensing Data and Deﬁne Management Zones in Precision Agriculture

: The precision agriculture scientiﬁc ﬁeld employs increasingly innovative techniques to optimize inputs, maximize proﬁtability, and reduce environmental impacts. Therefore, obtaining a high number of soil samples to make precision agriculture feasible is challenging. This data bottleneck has been overcome by identifying sub-regions based on data obtained through proximal soil sensing equipment. These data can be combined with freely available remote sensing data to create more accurate maps of soil properties. Furthermore, these maps can be optimally aggregated and interpreted for soil heterogeneity through management zones. Thus, this work aimed to create and combine soil management zones from proximal soil sensing and remote sensing data. To this end, data on electrical conductivity and magnetic susceptibility, both apparent, were measured using the EM38-MK2 proximal soil sensor and the contents of the thorium and uranium elements, both equivalent, via the Medusa MS1200 proximal soil sensor for a 72-ha grain-producing area in S ã o Paulo, Brazil. The proximal soil sensing attributes were mapped using ordinary kriging (OK). Maps


Introduction
The diverse economic and environmental pressures on crop-producing farms are growing in tandem with the demands for food, given a growing world population [1].The increase in agricultural productivity is much needed due to the challenges of distributing food and commodities.However, it is challenged by rising debates on environmental protection and conservation, soil, fertilizer, and water use regulation, and social and economic equity [2].A promising alternative to optimize the use of fertilizers and irrigation water, avoiding their overuse and waste while increasing crop production on farms, is applying them at different rates across a field, which has been proposed in the precision agriculture (PA) scientific field [3][4][5][6][7][8].
Many farms manage soil using fixed-rate applications, through which fixed amounts of fertilizers, amendments, and water are applied.This approach considers an area homogeneous, disregarding soil, relief, and plant variations.From the farmers' point of view, it is a practical and convenient method to manage the farm, as it simplifies the management process.On the other hand, this approach may overestimate the inputs in some areas, leading to high costs for farmers and more significant environmental impacts, and underestimate them in other areas, leading to suboptimal production and potential soil degradation [3, [9][10][11].
Nevertheless, the delimitation of subregions with distinct soil properties in homogeneous areas has been practiced for a long time in history [12].The interdisciplinary field of PA recognizes this approach as management zones (MZs).Doerge (1999) defines an MZ as a sub-region of a field that expresses a homogeneous combination of yield-limiting factors for which a single rate of a specific crop input is appropriate [13].
However, one of the most time-consuming and expensive activities for identifying and delineating MZs within an agricultural area for variable-rate applications in PA is obtaining many samples to characterize the variations in soil, relief, and plant properties.Traditionally, as Nawar et al. (2017) [3] describe, the agricultural conditions of the soil are assessed using a limited number of deemed representative samples (usually a few composite samples in an area), which are analyzed in the laboratory via wet chemistry [14].However, sampling and analyzing soils via wet chemistry is costly, limiting the number of samples.As such, the explicit recognition of variations and the delineation of homogeneous areas inside a plot to support variable-rate input applications needs a better solution to overcome the high costs and time invested in collecting and analyzing many soil samples.
With new equipment sensitive to soil properties, it is possible to distinguish them in productive sub-regions for a given crop with greater efficiency than in the past [14].Proximal soil sensors (PSSs) have been used to overcome this issue, as they obtain more data, covering a field more efficiently.PSSs constitute geophysical equipment that measures up to 2 m from the target [14].They can be mounted to vehicles or transported manually, allowing measurements of soil properties correlated with other properties conventionally measured in the laboratory, including clay, carbon, and moisture content [15][16][17].Among the properties measured via PSSs, the apparent electrical conductivity (aEC) and magnetic susceptibility (aMS), as well as the equivalent contents of thorium (eTh) and uranium (eU), have been widely studied in soil science [3, [18][19][20][21][22].
The aEC data measured using the Geonics EM38-MK2 sensor has been used to identify and map the spatial variation of soils affected by salts due to irrigation [23][24][25].This property has also been measured to predict and map soil properties, such as soil texture and organic matter [16,26,27], and to assess soil electrical conductivity variation in a grainproduction farm [22].In addition, gamma-ray proximal sensors have been used to predict and map soil properties at various spatial scales [28][29][30][31].In agricultural areas, proximal gamma-ray sensors such as the Medusa MS1200 have shown potential for predicting the soil water retention curve of the gravimetric moisture, nitrogen, phosphorus, and potassium content [16,32,33].
The procedure of datafusion expects to use more than one data source with different resolutions to perform a more accurate mapping procedure.This practice is widely studied in soil science and is known as data fusion [15,16,[34][35][36].Remote sensors mounted on satellites hundreds of kilometers away or on aerial vehicles hundreds of meters from the ground have also been used to model and map soil properties in the field [7,[37][38][39].Using statistical modeling, they can be combined with proximal sensors to increase the multivariate correlation with soil properties.Fusing data from remote and proximal sensors with different spatial and temporal resolutions may lead to better soil property maps and more robust management zones for PA [16,35,40,41].
De Benedetto et al. ( 2013) fused data from a hyperspectral remote sensor with a ground penetration radar (GPR) data amplitude, aEC data from a PSS, and vegetation indices from a remote sensor to define management zones in a 300 m × 30 m production field in southeastern Italy [20].First, they performed individual clustering algorithms for each sensor group and merged the data.Finally, they managed to identify three major management zones.From principal component analysis, they identified that the GPR amplitude data were sufficient to identify one of the management zones.In contrast, two other clusters were identified using the aEC data and the vegetation indexes.Horney et al. (2005) performed a cookbook-style study, presenting practical MZ implementation methods [23].They used the aEC dataset obtained via the EM38-MK2 as an example.They studied two areas in the San Joaquin Valley, CA, USA.They defined two objectives to describe a generalizable methodology for other cultivated varieties.Moreover, a second objective was to test the first methodology in experiments with a low germination rate.Fraisse et al. (2001) used aEC data from the EM38-MK2 and fused it with data derived from a digital elevation model (DEM) obtained via GPS in two study areas of 36 and 28 ha near Centralia, Missouri, USA [18].Unsupervised classification and principal components analysis of the aEC data and topographic attributes derived from the DEM were used to define clayey soil management zones.They concluded that the ideal number of management zones could vary yearly, depending mainly on the climate and the crop.Li et al. (2008) used bulk electrical conductivity (EC b ) measurements for the soil profile (0-20 cm) using a portable WET (W stands for water, E electrical conductivity, and T temperature) sensor, the cotton yield, and normalized difference vegetation index (NDVI) data measured in a 396 grid-sampling scheme to define soil management zones in a 15-ha field on saline coastal land in Zhejiang Province, China [42].Maps for EC b and yield produced via geostatistics and the NDVI map were merged using fuzzy c-means clustering to define the MZ modeled to the soil and yield 396 data points via analysis of variance to assess how well the designated management zones reflected the soil properties and yield level.The analysis of variance indicated significant differences between the soil's chemical properties and the cotton yield among the three delineated management zones.
Hence, it was hypothesized that PSSs combined with remote sensing (RS) data can better identify the soil heterogeneity during the MZ mapping process and improve operations in PA procedures.During fieldwork, the operator follows transects and generates considerable data points.There is a frequent question about the optimal density and spacing of the transects, which involves a trade-off between the intensity and costs of the fieldwork and the accuracy of the resulting MZ maps.The objective of this study was to compare MZ maps produced via ordinary kriging (OK) using aEC, aMS, eTh, and eU maps and kriging with external drift (KED), considering them together with RS satellite images and digital-elevation-model-derived covariates to evaluate the potential of using these covariates together with a different mapping procedure.Finally, both MZ maps' accuracy was assessed to determine whether fusing proximal and remote sensor data fusion increases the accuracy in MZ delineation by relatively improving the index.

Study Area
The study area was a crop field under center pivot irrigation of 72 ha.It was in the municipality of Itaí, in the state of São Paulo, southeastern Brazil (Figure 1A-C), with central coordinates 23.5854 • south and 48.9395 • west and elevation of approximately 685 m (Figure 1D).The field was cultivated with a no-till crop rotation system of wheat, bean, and oat.In the study pivot area, the soils were described explicitly as a Latossolo Vermelho Distrófico típico ("Ferralsols" in WRB (FAO, 2015)), with a clayey texture in the surface layer (515 g kg −1 ) and a very clayey texture in the subsurface (600 g kg −1 ) [43].
The study area was a crop field under center pivot irrigation of 72 ha.It was in the municipality of Itaí, in the state of São Paulo, southeastern Brazil (Figure 1A-C), with central coordinates 23.5854° south and 48.9395° west and elevation of approximately 685 m (Figure 1D).The field was cultivated with a no-till crop rotation system of wheat, bean and oat.In the study pivot area, the soils were described explicitly as a Latossolo Vermelho Distrófico típico ("Ferralsols" in WRB (FAO, 2015)), with a clayey texture in the surface layer (515 g kg −1 ) and a very clayey texture in the subsurface (600 g kg −1 ) [43].
To assess the region's climatology, two weather stations close enough to contribute to the temperature estimate at Itaí: Bauru Airport (62%, 121 km north) and Viracopos International Airport (38%, 168 km east).The historical models were built with data from 1 January 1980 to 31 December 2016.The season with the most significant precipitation occurs from October to March.However, it rains throughout the year in Itaí at a monthly average of 190 mm.Throughout the year, the temperature generally ranges from 14 °C to 30 °C and is rarely below 10 °C or above 34 °C.

Proximal Soil Sensor Data Preparation
Apparent electrical conductivity (aEC) and magnetic susceptibility (aMS) were measured using an EM38-MK2 electromagnetic induction conductivity meter (Geonics Ltd.Mississauga, ON, Canada) in Vertical Mode with 1-m coil spacing.This sensor was placed on a foam cushion inside a wooden box free of metallic parts mounted on a 1-cm-thick rubber mat.The rubber mat was tied to the rear of a 4 × 4 pickup truck using a 3-m-long To assess the region's climatology, two weather stations close enough to contribute to the temperature estimate at Itaí: Bauru Airport (62%, 121 km north) and Viracopos International Airport (38%, 168 km east).The historical models were built with data from 1 January 1980 to 31 December 2016.The season with the most significant precipitation occurs from October to March.However, it rains throughout the year in Itaí at a monthly average of 190 mm.Throughout the year, the temperature generally ranges from 14 • C to 30 • C and is rarely below 10 • C or above 34 • C.

Proximal Soil Sensor Data Preparation
Apparent electrical conductivity (aEC) and magnetic susceptibility (aMS) were measured using an EM38-MK2 electromagnetic induction conductivity meter (Geonics Ltd., Mississauga, ON, Canada) in Vertical Mode with 1-m coil spacing.This sensor was placed on a foam cushion inside a wooden box free of metallic parts mounted on a 1-cm-thick rubber mat.The rubber mat was tied to the rear of a 4 × 4 pickup truck using a 3-mlong nylon ribbon, avoiding metallic interference from the truck (Figure 2A).In addition, equivalent thorium (eTh) and uranium (eU) contents were measured using the gamma-ray spectrometer Medusa MS1200 (Medusa, Groningen, the Netherlands) mounted on the brush grille guard of the vehicle and tied with a nylon ribbon, staying approximately 50 cm from the ground (Figure 2B).
nylon ribbon, avoiding metallic interference from the truck (Figure 2A).In addi equivalent thorium (eTh) and uranium (eU) contents were measured using the gam ray spectrometer Medusa MS1200 (Medusa, Groningen, the Netherlands) mounted on brush grille guard of the vehicle and tied with a nylon ribbon, staying approximatel cm from the ground (Figure 2B).
The field campaign to collect data from proximal sensors occurred right after har with the soil covered by leftover straw.The fully equipped vehicle (Figure 2C) was dr along parallel survey lines about 40 m apart, totaling 26 lines, at a constant speed o proximately 15 km/h, totaling 90 min for the operation.Measurements were taken f both sensors every second.The EM38-MK2 sensor measured aEC and aMS at 5790 points, while eTh and eU the MS1200 sensor were measured at 6563 points.The difference between the total va measured using the EM38-MK2 and the Medusa MS1200 was due to the moment quired to start the measurement process with each sensor.During the original EM38-M data exploratory analysis, highly conductive aEC outliers and corresponding aMS s ples close to the irrigation equipment were identified and removed.Also, some locat were oversampled due to vehicle stops and other maneuvers during data acquisition.zerodist function from the sp package [44] in the software R (version 3.4.2) [45] was app to filter, in both EM38-MK2 and MS1200 datasets, data recursively measured at the e location.
After data filtering, the remaining PSS dataset had 4306 and 4896 samples from EM38-MK2 and MS1200 sensors, respectively.The EM38-MK2 sensor data showed m values to be removed due to the more significant influence of the metal parts of the ir tion pivot close to the vehicle path.The Medusa MS1200 data set only had the point cursively measured at the exact location removed, allowing for a more significant num of points to be maintained.
Before the mapping procedure, the last preprocessing step was to reserve a set of points from the PSSs to validate the maps of the attributes measured using the PSS duced by OK and the KED.The sample function of the dplyr package [46] was used in software R [45] to select the external validation dataset of the produced maps.Four h dred points were reserved for all datasets of the two PSSs, and the accuracy of The field campaign to collect data from proximal sensors occurred right after harvest, with the soil covered by leftover straw.The fully equipped vehicle (Figure 2C) was driven along parallel survey lines about 40 m apart, totaling 26 lines, at a constant speed of approximately 15 km/h, totaling 90 min for the operation.Measurements were taken from both sensors every second.
The EM38-MK2 sensor measured aEC and aMS at 5790 points, while eTh and eU via the MS1200 sensor were measured at 6563 points.The difference between the total values measured using the EM38-MK2 and the Medusa MS1200 was due to the moments required to start the measurement process with each sensor.During the original EM38-MK2 data exploratory analysis, highly conductive aEC outliers and corresponding aMS samples close to the irrigation equipment were identified and removed.Also, some locations were oversampled due to vehicle stops and other maneuvers during data acquisition.The zerodist function from the sp package [44] in the software R (version 3.4.2) [45] was applied to filter, in both EM38-MK2 and MS1200 datasets, data recursively measured at the exact location.
After data filtering, the remaining PSS dataset had 4306 and 4896 samples from the EM38-MK2 and MS1200 sensors, respectively.The EM38-MK2 sensor data showed more values to be removed due to the more significant influence of the metal parts of the irrigation pivot close to the vehicle path.The Medusa MS1200 data set only had the points recursively measured at the exact location removed, allowing for a more significant number of points to be maintained.
Before the mapping procedure, the last preprocessing step was to reserve a set of data points from the PSSs to validate the maps of the attributes measured using the PSS produced by OK and the KED.The sample function of the dplyr package [46] was used in the software R [45] to select the external validation dataset of the produced maps.Four hundred points were reserved for all datasets of the two PSSs, and the accuracy of the produced maps was evaluated using the root of the mean square error (RMSE, Equation ( 1)) index and the external validation dataset.
where N is the number of observations, y is the original value, and ŷ is the predicted value.

Remote Sensor Data Preparation
Since the data from the PSSs were obtained in the first field campaign (September 2018) and the data analyzed in the laboratory were from the second campaign (October 2019), it was decided to select RS satellite data for the two dates of the campaigns to contemplate the two scenarios in which the soil of the area had been used.More about the soil sampling data for laboratory analysis is better described below.This approach consisted of maintaining a pool of orbital image data to allow the selection of possible correlations and influences between the spectrum values of the bands with the proximal sensor variables so as not to restrict the soil scenario to only that without till or cover [47].
The DEM obtained from the Alos Palsar satellite was used to complement the information about the RS covariables.Ten relief covariables were derived using the R software's RSAGA package [48].The description of the satellite images and the relief variables derived from the DEM and their abbreviations can be seen in Table 1.Also, the satellite images described in Table 1 are summarized in abbreviation given that two scenarios were contemplated (2018 and 2019) during the analysis and modeling.
All the different bands' satellites and all the rasters of the covariates derived from the DEM were resampled to the final resolution of 10 m so that it was possible to stack them and extract the information from each raster to the respective coordinates of the data of PSS.For this, the resample function of the raster package [49] using the bilinear interpolation method in the R software and the extraction process were performed using the extract function in the raster package.

Selection of Remote Sensing Covariates
The many remote sensor covariates required filtering to identify those most significant to integrate the PSS data and optimize the model.These models would be used to build maps using RS and the PSSs via KED and finally define the MZ maps.The following sections better describe the fusing process to determine the MZ.
The regression method via linear models was used to model data from the PSSs as a function of RS data, and multicollinearity was minimized.The number of covariates was reduced using Pearson's correlation r values between all predictive covariates (the remote sensing satellite and DEM derivates).For this, all variables that showed Pearson's r values greater than 0.9 absolute were identified, and then these covariates were correlated again with the PSSs dataset.Those with the highest Pearson correlation absolute r values were maintained.
Finally, the remaining group of RS was applied to the regsubsets function present in the leaps package [50] in the R software [45].This function allows for testing all possible linear combinations between the remaining satellite covariates with the proximal sensors' data.Then, the combination that presented the lowest value for Schwarz's information criterion (BIC) [51] was selected to integrate the prediction model of the PSS attributes as a function of the RS data.

Mapping via Ordinary Kriging and Kriging with External Drift
It's observed that different interpolation methods can generate more reliable thematic maps to represent specific areas, especially in locations with limited sampling units [52].
Thus, two sections describe the mapping of the original data from the PSSs and the predicted data from the PSSs according to the data from RS selected from the covariate selection procedure described above.First, the original PSS data were mapped using the ordinary kriging method (OK).Then, the data from the PSSs predicted via the selected RS covariables were mapped using kriging with external drift (KED).

Proximal Soil Sensing Mapped via Ordinary Kriging
All data from the PSSs were assessed for spatial distribution patterns to evaluate the normality distribution to interpolation using OK.Using isotropic adjustment variograms, the least-squares reduction adjustment procedure was performed via variogram analysis tools in the gstat package [53] in the R software.The semivariogram works to estimate a value for a region in which the semivariogram distance is known using data in the neighborhood of the estimation location [54].
The semivariograms of the aEC, aMS, and eTh data were adjusted using the spherical model (Equation (2), while the eU data were adjusted for the Gaussian model (Equation (3); then, these data were interpolated using OK (Equation (4) [55].Finally, all data from the proximal sensors (aEC, aMS, eTh, and eU) were interpolated using a spatial resolution of 10 m.
where c is the sill variance, and a is the range.h is the separator, being the vector of the separation of the pairs of points in which h = x i − x j and where C x i , x j = ∑ {Z(x i − µ)} Z x j − µ , which is a constant for any h.This constancy of the mean, variance, and covariance depends only on the separation and not the absolute position [55].
where c is the sill variance, r is the distance parameter, and h is the vector of the separation of pairs of points [56].
where Z • KO (x 0 ) is the value of the random variable to be estimated via ordinary kriging (OK), λ i is the optimal weights calculated under two constraint conditions ((a) the estimator is not skewed and (b) the variance of the estimate is minimal), and Z(x i ) is the values of the random variable at the n sample points [56].

Predicting and Mapping Proximal Soil Sensing Data via Kriging with External Drift
The kriging method with external drift (KED) consists of identifying the presence of a trend.Therefore, it is necessary to show how to separate the deterministic trend from the random component and estimate the components' contribution via restricted maximum likelihood [56].The external drift method integrates into the kriging system supplementary universality conditions about one or several external drift variables measured exhaustively in the spatial domain [57].
If a target variable's tendency is the spatial coordinates, it indicates that this may be one of several cases of mixed linear models to be considered, for example.The deterministic effects may be modeled by others like y 1 , y 2 , . ... In the present study, these variables were the RS covariables, which can be linearly related to Z.In this case, Z will be the PSS attribute.Hence, measuring and calculating the target locations and the Z points is possible.Under these circumstances, it is possible to observe Equation (5).(5) What was done was to replace the coordinates in the function with the values of one or more variables.The y_1 (x) , y_2 (x) , . .., y_k (x) were known, and β_k was the unknown coefficients to be determined [54].y_k, being k = 1, 2, . . .was the "external" variables distinct from the internal Z_x and kriged with them; therefore, it was known as "kriging with external drift".The KED incorporates the local trend within a neighborhood search window as a linear function of a secondary variable that varies smoothly, and the trend of the primary variable must be linearly related to the secondary variable [57].
Since the primary variable trend must be linearly related to the secondary variable, only the proximal sensor aEC and aMS data presented satisfactory R 2 and adjustable R 2 values of linear model adjustments (>0.7) using RS data as covariates.The eTh and eU data did not show a significant linear relationship, with no possible combination of RS data used in the present study.Although KED was not performed for eU and eTh, the best models were adjusted and would be shown, and these attributes would be mapped using OK only.Thus, only the aEC and aMS data were interpolated with KED using the krige function of the gstat package [53] associated with the pre-selected satellite covariate models.All maps were interpolated using 10 m as the spatial output resolution.
The regression modeling, followed by ordinary kriging of the residues for the aEC and aMS maps, was tested since these two attributes presented high adjusted R 2 values.
For this, the simple linear model (lm function) and the predicted functions present in the R software [45] were used to perform the regression after applying the selected regression model to the entire spatial pixel area of the RS covariates.Then, these maps were also evaluated for accuracy through the external validation RMSE index using the 400 previously separated samples.

Management Zones
Three management zone maps (MZs) were defined based on three combinations.The first MZ map was built using only the PSS maps via OK.Then, the second MZ map was constructed from the PSS associated with the best RS covariates through KED mapping.In this second MZ map, the eTh and eU data were used, but the OK map version was considered since KED did not show high linear adjustment.Finally, the third MZ map was built using all the RS covariates described in Table 1.
The kmeans function in the stats package presented natively in the R software [45] was used.The algorithm chosen within this function used the calculation of Hartigan and Wong (1979) [58].The three datasets (PSS; PSS + RS; RS) were organized in a data matrix format and grouped using the k-means method.This method partitioned the dataset into k groups.The sum of the distance for the centers of the designated clusters was minimized.In the present study, three zones were chosen for k since many zones would not be operationally practical for the farmers.The k-means clustering algorithm aims to partition n observations into k clusters, where each observation will belong to a cluster with the average value closest to each other.The centers are in the mean of their sets of Voronoi polygons, which are the sets of points closest to each cluster's center.The value "1" was set to reproduce the same arrangement using the function set.seedpresent in the stats package in the R software [45].
Considering that the PSS and RS data were obtained from different sources all data used in the kmeans function were parameterized for the zero values of mean and variance one using the scale function present in the stats package of the R software.

Soil Laboratory Data Sampling as Field Truth
Soil data collected traditionally were used to validate the design of the three MZ maps.Seventy-two disturbed samples were collected at the soil surface (0-10 cm depth) and arranged in a regular grid (one sample per hectare; Figure 1C) using a Dutch auger (Figure 3A) in a second field campaign.

Management Zone Validation
The 72 points analyzed in the laboratory validated the three MZ maps produced from PSSs mapped via OK, using PSS data combined with the best RS covariate models mapped This campaign was carried out on 6 October 2019, and beans covered the study area in an early stage of growth (Figure 3B,C).The stratify function in the spcosa package [59] was used to define and locate the points in the regular grid.This function allows to separate an area of interest in sub-areas with exact sizes, and then the spsample function present in the same package was used to locate the center of each section of the area.
The 72 disturbed samples were analyzed for soil texture and chemical contents.For texture analysis, the clay contents were measured using the sieve-and-pipette method Teixeira et al., 2017 [60].

Management Zone Validation
The 72 points analyzed in the laboratory validated the three MZ maps produced from PSSs mapped via OK, using PSS data combined with the best RS covariate models mapped via KED, and using only RS data.The zone classes were extracted from the three MZ maps for the 72 coordinates associated with the laboratory data using the extract function present in the raster package [49] in the R software [45].
Finally, the variance analysis method (ANOVA) was used to identify whether the management zones could distinguish the variance of the values of the laboratory attributes regarding their classes of management zones.The aov function of the stats package in the R software [45] was used for that.

Results and Discussion
The descriptive statistics of the PSS data for the training and validation dataset and the laboratory data are shown in Table 2.For all PSS datasets, the mean and median values were close.Also, the standard deviation values did not contrast much.Therefore, the asymmetry values for all PSS data can be considered moderately positive except for the aMS data.The kurtosis coefficients for the eTh and eU data, including validation data, can be regarded as leptokurtic.The coefficient corresponds to a less flat distribution than the same area's standard curve, as they are below the absolute value of 0.263 [61].aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million); clay in g kg −1 ; Ca: calcium in g kg −1 ; C: organic carbon in g kg −1 ; Mg: magnesium in g kg −1 .
The distribution pattern of the aEC and aMS attributes and laboratory data could be classified as platykurtic, corresponding to a flatter distribution than the normal curve of the same area, except for aEC for the leptokurtic training dataset.Despite the asymmetry and kurtosis values of the aEC data, it was impossible to consider a normal distribution, and therefore, for mapping via OK, it was necessary to transform the data to fit the normal distribution using natural logarithmic distribution.The other attributes of the PSS (aMS, eTh, and eU) and the separate data for external validation showed a normal distribution.The distribution pattern of the values for all data sensors (the training and validation dataset) and laboratory datasets is shown in Figure 4.

23, 5, FOR PEER REVIEW 11
the highest values of electrical conductivity for a 200-ha experimental study area field situated 350 km north of Perth at Buntine, Western Australia.The electrical conductivity and magnetic susceptibility values were similar to those presented by Castrignanò et al., 2012 [62] on an 80-ha cropping field in Corrigin, Western Australia.The thorium and uranium values were similar to those found in the study by Wong et al., 2009 [63], in which the lowest values of eTh and eU occurred as opposed to the highest values of electrical conductivity for a 200-ha experimental study area field situated 350 km north of Perth at Buntine, Western Australia.
The prediction models adjusted for the sensor attributes are shown in Table 3.The RS covariates selected from the selection procedure described in Section 2.4 via the regsubset function and lower BIC values are shown in the "Coefficient" column of Table 3 (first column).The intercept values represent the estimated response variable values when all predictors were zero.Notably, the aspect variable showed a positive association with log(aEC), indicating that an increase in aspect was linked to a slight rise in electrical conductivity (log(aEC)) by 0.01 mS/m.Conversely, the variable "chnb" exhibited negative associations with log(aEC) (−0.02 mS/m), aMS (−0.02 ppt), and eTh (−0.24 ppm), suggesting that higher values of "chnb" were associated with decreased values of these response variables.The DEM (Digital Elevation Model) variable demonstrated positive relationships with aMS (0.09 ppt), eTh (0.33 ppm), and eU (0.02 ppm), implying that an increase in DEM led to higher values of these response variables.Plan curvature ("plan_curv") negatively influenced aMS, indicating that higher plan curvature significantly decreased aMS by 15.63 ppt.Additionally, the variable "twi" showed a positive association with log(aEC) (0.01 mS/m), suggesting that increased topographic wetness index (twi) values were linked to higher electrical conductivity.0.01-0.020.01 aspect 0.01-0.000 ast_B1 0.00-0.000 land_2018_B3 0.00-0.000 0.00-0.000 land_2018_B5 0.00-0.000 land_2018_B10 0.00-0.000 land_2019_B3 0.00-0.000 sent_2018_B8A −0.00-−0.000 −0.01-−0.01−0.01 −0.00-−0.000 0.00-0.000 sent_2018_B12 −0.00-−0.000 0.00-0.000 sent_2019_B2 −0.00-−0.000 0.00-0.000 sent_2019_B3 0.00-0.000 sent_2019_B8A −0.The variable "rsp" exhibited negative associations with log(aEC) (−1.18 mS/m), aMS (−6.42 ppt), and eTh (−0.65 ppm), indicating that higher "rsp" values led to decreased values in these response variables.Table 3 also highlights the importance of context-specific interpretation, as certain variables, such as "ast_B1", "land_2018_B3", "land_2018_B5", and others, did not show significant associations with any of the response variables within the specified confidence intervals.Considering the observed R-squared values ranging from 0.20 to 0.81, the regression models explained a substantial proportion of the variance in the respective response variables.Adjusted R-squared values provide a more accurate measure of goodness of fit by considering the number of predictors, ensuring a balanced evaluation of the model's explanatory power.The R 2 and adjusted R 2 values for the aEC and aMS attributes were close to 0.8, indicating a high potential correlation with the identified satellite attributes.Finally, the graphs containing the predicted and original values for the PSS attributes according to the best models of RS covariates are shown in Figure 5.The nugget/sill ratio is shown in Table 4, column 6, to present the nugget parameter's impact on the spatial structure of variation in the final representation of the semivariance.Those ratio values were multiplied by 100 to express percentages, allowing us to better understand and discuss the errors of semivariance.The KED method for the attributes of the aEC and aMS sensors presented 34% and 17%, respectively.These values were higher Since the aEC and aMS data were well adjusted for the RS covariates, pixel-by-pixel regression from the created estimation model was also used.However, this method did not show significant results when using the external validation dataset through the RMSE index to evaluate the aEC and aMS maps.There was also no spatial dependence on the predicted map residues, so the regression procedure and subsequent ordinary kriging of the residues could not be performed.Then, it was considered to remain with the KED method to compose the maps from which the MZ would be delineated.
Semivariograms were adjusted by removing the RS covariates' trend, as shown in Figure 6.The parameters used in the PSS semivariograms to map using OK and KED are shown in Table 4.All attributes of the aEC, aMS, and eTh data were adjusted for the spherical model (Figure 6), while the sensor eU was better adapted for the Gaussian model (Figure 6).The low values of the nugget parameter may have indicated a reduction in the values of variance and standard deviation of the maps.The interpolated maps showed similar distribution patterns for the OK method (Figure 7A) and KED (Figure 7B).The high concentrations of aEC were in the southwestern portion of the map (Figure 7A), where an ephemeral drainage channel could be observed.Although the survey with the EM38-MK2 was carried out in a dry season (July), with the soil not cultivated (therefore, without a pivot irrigation schedule) and only protected with previous harvest The nugget/sill ratio is shown in Table 4, column 6, to present the nugget parameter's impact on the spatial structure of variation in the final representation of the semivariance.
Those ratio values were multiplied by 100 to express percentages, allowing us to better understand and discuss the errors of semivariance.The KED method for the attributes of the aEC and aMS sensors presented 34% and 17%, respectively.These values were higher than the ratio when using the OK method.The analysis of this ratio suggested that the distance of 150 m, represented by the aEC data set, for example, was the maximum adequate distance for data collection with sensors since, from the removal of the trend identified using the RS covariates, from this distance, there may have been a significant influence of randomness on the semivariance.
The interpolated maps showed similar distribution patterns for the OK method (Figure 7A) and KED (Figure 7B).The high concentrations of aEC were in the southwestern portion of the map (Figure 7A), where an ephemeral drainage channel could be observed.Although the survey with the EM38-MK2 was carried out in a dry season (July), with the soil not cultivated (therefore, without a pivot irrigation schedule) and only protected with previous harvest wheat straw, the high concentration values of aEC may have been associated with higher soil moisture related with the clay presence concentrations, as found in similar studies by Fulton et al. (2011) [64] and Islam et al. (2011) [65].However, in the present study, the values of this sensor attribute did not have significant Pearson correlation values (−0.17) with clay (Table 5).In contrast, the lower aEC values in the northern part of the study area suggest a well-drained region with higher elevation and more sandy soils.The aEC data showed high correlation values with the chemical attributes of C and Mg.In contrast, the attributes measured via gamma radiometric sensors showed high correlation values for all measured attributes, among which C and clay presented the highest (0.57 and 0.46, respectively).The clay content showed a weak positive correlation with the calcium content (r = 0.19) and moderate positive correlations with carbon content (r = 0.28 *), magnesium content (r = 0.25 *), apparent magnetic susceptibility (r = 0.28 *), equivalent thorium concentration (r = 0.50 **), and equivalent uranium concentration (r = 0.46 **).Calcium content exhibited strong positive correlations with carbon content (r = 0.75 **), moderate positive correlations with magnesium content (r = 0.60 **), apparent magnetic susceptibility (r = 0.28 *), and equivalent thorium concentration (r = 0.42 **), and a slightly weaker positive correlation with equivalent uranium concentration (r = 0.43 **).Carbon content showed moderate positive correlations with magnesium content (r = 0.58 **), moderate negative correlations with apparent electrical conductivity (r = −0.42**), and strong positive correlations with apparent magnetic susceptibility (r = 0.48 **), equivalent thorium concentration (r = 0.65 **), and equivalent uranium concentration (r = 0.57 **).Magnesium content had a weak negative correlation with apparent electrical conductivity (r = −0.24*), moderate positive correlations with apparent magnetic susceptibility (r = 0.32 **) and equivalent thorium concentration (r = 0.44 **), and equivalent uranium concentration (r = 0.39 **).Apparent electrical conductivity exhibited strong negative correlations with apparent magnetic susceptibility (r = −0.93**), moderate negative correlations with equivalent thorium concentration (r = −0.68**), and weak negative correlations with equivalent uranium concentration (r = −0.28*).Apparent magnetic susceptibility showed strong positive correlations with equivalent uranium concentration (r = 0.79 **) and moderate positive correlations with equivalent thorium concentration (r = 0.76 **).with apparent magnetic susceptibility (r = −0.93**), moderate negative correlations with equivalent thorium concentration (r = −0.68**), and weak negative correlations with equivalent uranium concentration (r = −0.28*).Apparent magnetic susceptibility showed strong positive correlations with equivalent uranium concentration (r = 0.79 **) and moderate positive correlations with equivalent thorium concentration (r = 0.76 **).The aMS maps shown in Figure 8 presented a distribution inversely proportional to the distribution of the concentration values of the aEC maps (Figure 7).In addition, the OK mapping methods of the aMS (Figure 8A) and KED (Figure 8B) data showed similar distribution patterns for the entire study area.The aMS maps shown in Figure 8 presented a distribution inversely proportional to the distribution of the concentration values of the aEC maps (Figure 7).In addition, the OK mapping methods of the aMS (Figure 8A) and KED (Figure 8B) data showed similar distribution patterns for the entire study area.The attributes of the Medusa MS1200 sensor showed concordant distributions between eTh and eU (Figure 9A,B, respectively).According to the literature, the high values of the ratio of eU over eTh may indicate a tendency to erode since the eU attribute is more mobile in the environment [66].However, in Figure 9, it is possible to observe that the two attributes present patterns of distribution of values in similar regions, indicating that agricultural practice, even if implementing no-till, can minimally revolve the soil and homogenize the soil's attributes that control the preferential distribution of eU compared to eTh.Furthermore, from the spatial structure represented in the eU semivariograms and the map produced via OK, it was possible to notice monotonous variability in the study area, indicating growth in values towards the southeast.The values of the uncertainties of the sensor maps are presented in Table 6.Although the hypothesis of the work was to use RS covariables to optimize the mapping of the data from proximal sensors, the values of the aEC and aMS attributes mapped using both OK The attributes of the Medusa MS1200 sensor showed concordant distributions between eTh and eU (Figure 9A,B, respectively).According to the literature, the high values of the ratio of eU over eTh may indicate a tendency to erode since the eU attribute is more mobile in the environment [66].However, in Figure 9, it is possible to observe the two attributes present patterns of distribution of values in similar regions, indicating that agricultural practice, even if implementing no-till, can minimally revolve the soil and homogenize the soil's attributes that control the preferential distribution of eU compared to eTh.Furthermore, from the spatial structure represented in the eU semivariograms and the map produced via OK, it was possible to notice monotonous variability in the study area, indicating growth in values towards the southeast.The attributes of the Medusa MS1200 sensor showed concordant distributions between eTh and eU (Figure 9A,B, respectively).According to the literature, the high values of the ratio of eU over eTh may indicate a tendency to erode since the eU attribute is more mobile in the environment [66].However, in Figure 9, it is possible to observe that the two attributes present patterns of distribution of values in similar regions, indicating that agricultural practice, even if implementing no-till, can minimally revolve the soil and homogenize the soil's attributes that control the preferential distribution of eU compared to eTh.Furthermore, from the spatial structure represented in the eU semivariograms and the map produced via OK, it was possible to notice monotonous variability in the study area, indicating growth in values towards the southeast.The values of the uncertainties of the sensor maps are presented in Table 6.Although the hypothesis of the work was to use RS covariables to optimize the mapping of the data from proximal sensors, the values of the aEC and aMS attributes mapped using both OK and KED presented values close to each other, indicating that there were no benefits of using satellite variables to optimize data from proximal sensors.The aEC and aMS maps produced with OK did not differ when mapped with the RS variables via KED.This was The values of the uncertainties of the sensor maps are presented in Table 6.Although the hypothesis of the work was to use RS covariables to optimize the mapping of the data from proximal sensors, the values of the aEC and aMS attributes mapped using both OK and KED presented values close to each other, indicating that there were no benefits of using satellite variables to optimize data from proximal sensors.The aEC and aMS maps produced with OK did not differ when mapped with the RS variables via KED.This was expected since the RMSE values were also close to each other.The maps of the management zones based on PSS (OK) and PSS plus RS (KED) classified via the k-means cluster algorithm are presented in Figure 10A and B, respectively.It is possible to observe that the zones outlined using the PSS maps via OK and KED were remarkably similar.This pattern may have resulted from implementing eTh and eU data in an OK format to compose the management zone map representative of the KED method.
The remote sensing covariables that collected information from the irrigation pivot area when the soil was planted (2019) did not optimize or significantly improve the distribution of the observed patterns of the PSS attributes spatially.When beans were grown, the satellite images representative of 2019 had a significative edge effect, recognizing the transition between exposed soil and planting on the pivot area's borders that can be observed in the management zone map using only RS data (Figure 11).On the other DEM-derived data did not present edge problems.Therefore, the DEM-derived attributes selected to compose the KED models described in Section 2.4 and shown in Table 3 could represent the variables controlling the study area's water behavior.The MZ created from RS data alone (Figure 11) could only separate the zone to the north of the study area, similar to the MZ maps via OK and KED (Figure 10A,B).It is possible to observe that the orange management zone in Figure 11 was like the zones in Figure 10 for both methods used (OK and KED).
The management zone in the blue area of Figure 11 needed to be better classified due to the edge effect of some satellite images in 2019 that identified the planting limits between beans and exposed soil.Although it was impossible to identify improvements for optimizing the PSS data via KED, generating an MZ map to separate soil management zones was possible using only RS data via satellite images and DEM covariates.
Table 7 shows that the classes created with the PSS data groupings via OK, KED, and RS only could distinguish the distribution regions for all the soil data analyzed in the laboratory.However, the MZ map using only remote sensor data could not be determined, with the same level of significance as OK and KED, the Ca and Mg attributes.The potential to treat different growth areas using proximal soil sensing was also presented by Van Meirvenne et al. (2013) [67] and De Benedetto et al. (2013) [68], who also tested the inclusion of remote sensor variables associated with the cluster clustering method.

Figure 1 .
Figure 1.(A) Map of Brazil in Latin America; (B) contour of the state of São Paulo; (C) contour of the municipality of Itaí; (D) map of the study area with a digital elevation model in the background, the vehicle route equipped with proximal soil sensors (PSSs), and points collected for analysis in a la boratory.

Figure 1 .
Figure 1.(A) Map of Brazil in Latin America; (B) contour of the state of São Paulo; (C) contour of the municipality of Itaí; (D) map of the study area with a digital elevation model in the background, the vehicle route equipped with proximal soil sensors (PSSs), and points collected for analysis in a laboratory.

Figure 2 .
Figure 2. (A) MS1200 gamma-ray spectrometer mounted to the vehicle's brush grille guard EM38-MK2 sensor cushioned inside a wooden box mounted on a rubber mat and tied to the cle's rear; (C) fully equipped vehicle showing the irrigation line in the background.

Figure 2 .
Figure 2. (A) MS1200 gamma-ray spectrometer mounted to the vehicle's brush grille guard; (B) EM38-MK2 sensor cushioned inside a wooden box mounted on a rubber mat and tied to the vehicle's rear; (C) fully equipped vehicle showing the irrigation line in the background.

AgriEngineering 2023, 5 ,Figure 3 .
Figure 3. (A) Soil sampling at 0-10 cm using a Dutch auger; (B) study area in September 2019 with irrigated beans; (C) view of the study area showing the bean plantation.

Figure 3 .
Figure 3. (A) Soil sampling at 0-10 cm using a Dutch auger; (B) study area in September 2019 with irrigated beans; (C) view of the study area showing the bean plantation.

Figure 4 .
Figure 4. aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million); clay in g kg −1 ; Ca: calcium in g kg −1 ; C: organic carbon in g kg −1 ; Mg: magnesium in g kg −1 .First line: histograms of proximal sensor data used for mapping via OK and KED; second line: histograms of proximal sensor data used to validate the OK and KED maps; third line: histograms of soil data collected with a Dutch auger.

Figure 4 .
Figure 4. aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million); clay in g kg −1 ; Ca: calcium in g kg −1 ; C: organic carbon in g kg −1 ; Mg: magnesium in g kg −1 .First line: histograms of proximal sensor data used for mapping via OK and KED; second line: histograms of proximal sensor data used to validate the OK and KED maps; third line: histograms of soil data collected with a Dutch auger.

Figure 5 .
Figure 5. Predicted versus experimental plots of proximal sensor data as a function of remote sensor data using the training dataset.The continuous black lines adjust the intercept and slope for the models, while the dashed lines are the intercept and model idealized as 1 and 0, respectively.R 2 adj.: R 2 adjusted value; aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).

Figure 5 .
Figure 5. Predicted versus experimental plots of proximal sensor data as a function of remote sensor data using the training dataset.The continuous black lines adjust the intercept and slope for the models, while the dashed lines are the intercept and model idealized as 1 and 0, respectively.R 2 adj.: R 2 adjusted value; aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).

Figure 6 .
Figure 6.Empirical (circles) and adjusted (lines) semivariograms of proximal sensor variables aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).First line: semivariogram of aEC using OK and KED; second line: semivariogram of aMS using OK and KED; third line: semivariogram of eTh and eU using OK.There was no adjustment of the semivariogram via KED for the last two attributes of the gamma radiometer.It did not reasonably adjust regression models in the previous steps.

Figure 6 .
Figure 6.Empirical (circles) and adjusted (lines) semivariograms of proximal sensor variables aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).First line: semivariogram of aEC using OK and KED; second line: semivariogram of aMS using OK and KED; third line: semivariogram of eTh and eU using OK.There was no adjustment of the semivariogram via KED for the last two attributes of the gamma radiometer.It did not reasonably adjust regression models in the previous steps.
aEC: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million); clay in g kg −1 ; Ca: calcium in g kg −1 ; C: organic carbon in g kg −1 ; Mg: magnesium in g kg −1 ; levels of statistical significance: * p < 0.05 and ** p < 0.01.

Figure 9 .
Figure 9. (A) Equivalent thorium and (B) equivalent uranium maps produced via ordinary kriging.eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).

Figure 9 .
Figure 9. (A) Equivalent thorium and (B) equivalent uranium maps produced via ordinary kriging.eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).

Figure 9 .
Figure 9. (A) Equivalent thorium and (B) equivalent uranium maps produced via ordinary kriging.eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million).

Figure 10 .
Figure 10.(A) Management zones made via k-means using proximal sensor maps with ordinary kriging (OK) and (B) management zones made via k-means using proximal sensor maps with external drift kriging (KED).

Figure 10 .
Figure 10.(A) Management zones made via k-means using proximal sensor maps with ordinary kriging (OK) and (B) management zones made via k-means using proximal sensor maps with external drift kriging (KED).

Figure 10 .
Figure 10.(A) Management zones made via k-means using proximal sensor maps with ordinar kriging (OK) and (B) management zones made via k-means using proximal sensor maps with ex ternal drift kriging (KED).

Figure 11 .
Figure 11.Management zones produced from all remote sensing data rasters using a k-means clus tering algorithm.

Figure 11 .
Figure 11.Management zones produced from all remote sensing data rasters using a k-means clustering algorithm.

Table 1 .
Resolution and description of the bands of the images used and covariates derived from the DEM.
SWIR: shortwave infrared; VRE: vegetation red edge; NIR: near-infrared; C/A: coastal/aerosol; TIRS: thermal infrared; ESA: the European Space Agency; NASA: the National Aeronautics and Space Administration; the _year_Bname refers to 2018 and 2019 and the band spectrum length name.

Table 2 .
Descriptive statistic of proximal soil sensing and laboratory data.

Table 2 .
Descriptive statistic of proximal soil sensing and laboratory data.

Table 3 .
Linear models' adjustment parameters of the proximal sensor attribute as a function of the remote sensing data using the best combination from Pearson's selection in addition to the BIC criterion and, finally, using a regsubset function.

Table 4 .
Semivariogram parameters for proximal sensor variables for ordinary kriging (OK) and kriging with an external drift (KED).

Table 5 .
r of Pearson correlation between the proximal and laboratory data variables.

Table 5 .
r of Pearson correlation between the proximal and laboratory data variables.

Table 6 .
Root mean square error (RMSE) of the external validation of the proximal sensor data variables mapped via ordinary kriging (OK) and kriging with an external drift (KED).: apparent electrical conductivity in mS/m (millisiemens per meter); aMS: apparent magnetic susceptibility in ppt (parts per thousand); eTh: equivalent thorium in ppm (parts per million); eU: equivalent uranium in ppm (parts per million). aEC

Table 7 .
Analysis of variance (ANOVA) of the classes of the management zone map produced from the data of proximal sensors via OK (PSS); proximal sensors combined as the best satellite covariate via KED (PSS and RS); and remote sensing (RS) data raster.

Table 7 .
Analysis of variance (ANOVA) of the classes of the management zone map produced from the data of proximal sensors via OK (PSS); proximal sensors combined as the best satellite covariates via KED (PSS and RS); and remote sensing (RS) data raster.