Finding Suitable Transect Spacing and Sampling Designs for Accurate Soil ECa Mapping from EM38-MK2

Abstract: Finding an ideal sampling design is a crucial stage in detailed soil mapping to assure reasonable accuracy of the resulting soil property maps. This study aimed to evaluate the influence of sampling designs and sample sizes on the quality of soil apparent electrical conductivity (ECa) maps from an electromagnetic sensor survey. Twenty-six (26) parallel transects were gathered in a 72-ha plot in Southeastern Brazil. Soil ECa measurements using an on-the-go electromagnetic induction sensor were taken every second using the sensor in vertical orientation. Two approaches were used to reduce the sample size and simulate kriging interpolations of soil ECa. Firstly, the number of transect lines was reduced by increasing the distance between them: 26 transects with 40 m spacing; 13 with 80 m; 7 with 150 m; and 4 with 300 m. Secondly, random point selection and Douglas-Peucker algorithms were used to derive four reduced datasets by removing 25, 50, 75, and 95% of the points from the ECa survey dataset. Soil ECa was interpolated at 5 m output spatial resolution using ordinary kriging and the four datasets from each simulation (a total of twelve datasets). Map uncertainty was assessed by root mean square error and mean error metrics from 400 random samples previously selected for external map validation. Maps were evaluated on their uncertainty and spatial structure of variation. The transect elimination approach showed that maps produced with transect spacing up to 150 m could preserve the spatial structure of ECa variations. Douglas-Peucker results showed lower nugget values than random point simulations for all selected sample densities, except for a 95% point reduction. The soil ECa maps derived from the 75% reduced dataset (by random sampling or Douglas-Peucker) or from 13 transect lines (80 m spacing) showed reasonable accuracy (RMSE of validation circa 0.7) relative to the map interpolated from all survey points (RMSE of 0.5), suggesting that transect spacing of 80 m and reading intervals greater than one second can be used for improving the efficiency of on-the-go soil ECa surveys.


Introduction
Sampling design is fundamental in research and monitoring of natural resources. Proximal soil sensing (PSS) technology is currently available to produce soil attribute maps at high spatial resolution, aiming to support sustainable variable rate input management in precision agriculture [1,2]. However, optimal sampling designs for continuous PSS surveys still lack well-defined, tailored data acquisition structures [33]. As a result, large georeferenced datasets are gathered by continuous recording during less invasive, high-speed operations.
High-density data sampling can provide an efficient characterization of soil property variations [32,33]. Sudduth et al. [34] recorded more than 5000 observations, corresponding to a 4-6 m data spacing, and Islam et al. [35] recorded more than 14,000 observations in a 1.4 ha ECa dataset. They operated the EM38-MK2 at a 1 s interval, stating that this number of points could enable automatic variogram fitting and provide proper kriging interpolations, optimal designs, and fast navigation paths, as required for cost-effective survey interventions. Whereas high-density datasets may reduce bias in sampling designs [25], sample density directly affects output map uncertainties. Therefore, studies to tailor transect spacings and sample densities are needed to reduce soil ECa map uncertainties when integrating mobile visualization (on-the-fly) and on-the-go monitoring for variable rate application decision support. In this context, this work aims to contribute to these questions by establishing efficiency thresholds that maintain output map accuracy. Different sample designs are investigated by reducing transect lines and sampling observations, which relate to minimum track distance and maximum operational speed, respectively. The overall objective is to evaluate interpolated on-the-go ECa maps from different sample designs using common validation indices. A proper combination of sampling designs is expected to improve operational efficiency while preserving high quality and low uncertainty in map generation. The paper introduces a preliminary analysis of continuous EM38 operational frameworks in Brazil, and it could provide basic information on optimal PSS sampling designs in tropical soils that are relevant to central pivot no-till grain production.

Materials and Methods
This section details materials and methods related to soil ECa survey dataset investigations on sample distribution and density influencing output map accuracy. Specific objectives are addressed by considering: four transect spacing subsets; four sampling density subsets using the random point and four using the Douglas-Peucker selection algorithms; and kriging interpolations evaluated by a standard external validation subset for mean error (ME) and root mean square error (RMSE) indexes.

Study Area
The study was carried out in a grain crop rotation production system (i.e., beans, soybeans, wheat, and oats) under central pivot irrigation and no-till soil management. The farm is located in the Itaí district, São Paulo State, Brazil (Figure 1a). It has central coordinates of 23.58544° South latitude and 48.9395° West longitude, in a subtropical climate, with annual average maximum and minimum temperatures of 26 °C and 16 °C, respectively, and an average rainfall of 119 mm [36]. The paddock area is 72 ha at a maximum elevation of approximately 712 m above sea level (Figure 1b). The regional soil characterization is an association of LATOSSOLOS VERMELHOS Distróficos ("Ferralsols") and ARGISSOLOS VERMELHOS-AMARELOS Distróficos ("Acrisols") (Brazilian Institute of Geography and Statistics, 1:5,000,000). The pivot area was in the off-season, with wheat straw cover, during the survey.

EM38-MK2: Mobile Data Acquisition Structure and Survey Operation
This ECa survey used the EM38-MK2 EMI meter which includes two receiver coils, separated by 1 m and 0.5 m from a single transmitter coil. This device provides two simultaneous ECa datasets with readings in milliSiemens per meter (mS/m), either in vertical or horizontal dipole orientations. The effective depth ranges of ECa readings are 1.5 m and 0.75 m in the vertical position (ECaV), or 0.75 m and 0.375 m in the horizontal position (ECaH).
Field operations started with sensor calibrations at a height of 1.5 m above the ground in both horizontal and vertical dipole orientations. After calibration, the sensor was placed perpendicular to the earth's surface in the mobile data acquisition system detailed below, providing vertical dipole readings (ECaV). As no measurements were taken in the horizontal dipole orientation, the resulting dataset is hereafter referred to as ECa.
Data storage was done in a single continuous run, using Bluetooth connections between an Archer Rugged Handheld (Juniper System Inc., Bromsgrove, UK) PDA and two professional georeferencing devices, an XGPS-100A (Dual Electronics Corporation, Lake Mary, FL, USA) rooftop GPS and a GeoExplorer 3000 (Trimble Inc., Sunnyvale, CA, USA). A mobile data acquisition system was structured in a wooden box with no metal parts. The box was assembled using wood glue and Velcro tapes wrapping it over a high-resistance rubber mat (1 cm thick). The rubber mat was attached to long nylon straps connected to the back of a 4 × 4 pickup, dragging the structure 3 m behind the vehicle to avoid magnetic interference from its metallic body (Figure 2).
Navigation speed through the entire study area was kept constant at 15 km/h, taking 90 min for the total operation and collecting parallel transect lines in a back-and-forth path. The average distance between transect lines was approximately 40 m (Figure 1b). The raw soil ECa survey dataset contained 5788 observation points in total. The navigation path included 26 transect lines with the EM38-MK2 sensor set for a 1 s reading interval.

EM38-MK2 Data Filtering and External Validation
Exploratory data analysis was applied to the EM38-MK2 raw dataset to check for outliers caused by potential electromagnetic interference from metallic parts of the irrigation pivot framework, which create high conductivity at specific locations. Spatial query filters were used to remove ECa observations repeatedly measured at the same location when brief stops for operational maintenance were necessary. In addition, sample points that drifted off transect lines were removed to improve the parallel sampling design path. The remaining clean dataset had 4306 points in total. The final preprocessing step used an automatic random subset sampling algorithm, in the R statistical packages [37], to reserve 400 points for external map validation of the simulations from the different sampling designs (Figure 3).
Table 1 details all four resulting simulation datasets with their respective numbers of transect lines and the remaining dataset size. From the standard survey dataset (3906 points), another four simulation subsets were extracted for sampling designs using different sample densities. The first algorithm used was the automatic random subset sampling algorithm from the R statistical packages [37], which eliminated 25%, 50%, 75%, and 95% of sample points from the original dataset (Table 1 and Figure 3). The same removal percentages were applied with the DouglasPeuckerNbPoints function implemented in the kmlShape package [38]. This algorithm is based on a proximity rule, whereby all original data points must lie within a certain distance of the simplified line. A polyline is created using the input dataset coordinates as vertices, for which a tolerance distance or a target number of points can be predefined. The algorithm recursively creates new segments approximating the original polyline until all vertices of the polyline satisfy the predefined tolerance condition [39].
Both sampling design approaches were further evaluated to assess kriging interpolation accuracy metrics using an external validation subset, as detailed in the next section.
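The two point-reduction strategies described above can be illustrated with a short sketch. It is written in Python rather than R and does not reproduce the kmlShape `DouglasPeuckerNbPoints` function: it implements the classic tolerance-based Douglas-Peucker recursion, whereas the study fixed a target number of points. The function names, the `seed` argument, and the example track are assumptions made for illustration only.

```python
import math
import random


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment joining a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Classic tolerance-based Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    dists = [point_segment_distance(p, first, last) for p in points[1:-1]]
    max_dist = max(dists)
    if max_dist <= tolerance:
        # Every intermediate vertex is close enough: keep only the endpoints.
        return [first, last]
    split = dists.index(max_dist) + 1
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


def random_subset(points, fraction_removed, seed=0):
    """Remove a fixed fraction of points at random, preserving their order."""
    rng = random.Random(seed)
    n_keep = round(len(points) * (1.0 - fraction_removed))
    keep = sorted(rng.sample(range(len(points)), n_keep))
    return [points[i] for i in keep]


# Hypothetical track (not survey data), used only to exercise the functions.
track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(track, tolerance=1.0)
thinned = random_subset(track, fraction_removed=0.75)
```

The contrast between the two functions is the point of the sketch: random removal ignores geometry, while Douglas-Peucker keeps the vertices that deviate most from the simplified line.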

Statistics, Interpolation, and Mapping Uncertainties
Kriging has been used for many decades for spatial interpolation [40] and is one of the most widely used and well-documented geostatistical tools [41]. Ordinary kriging uses only one variable and is one of the most robust and widely used types of kriging [42]. The main objective of kriging is to estimate the value of a random variable, Z, at locations where it was not measured.
The study considers two sampling configurations for geostatistics: (1) sampling by transects to represent the variation in two dimensions; and (2) irregular sampling in two dimensions [42]. Only the necessary assumptions and equations are briefly summarized here. The spatial variability of ECa for each group, selected by approach 1 or 2, was analyzed using variograms. In this analysis, the spatial dependence of an observation z(x) at a given point is determined by comparison with its neighboring observations z(x + h), where h denotes the lag distance and N(h) is the number of data pairs separated by a particular lag vector h. The average semivariance calculated for each lag of the variogram is given by γ(h) [41], Equation (1).
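The expression for Equation (1) does not appear in this text. Assuming the authors used the conventional (Matheron) estimator of the experimental semivariance, which matches the symbols defined in the paragraph above, it can be written as:

```latex
\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2 \tag{1}
```

Here the sum runs over the N(h) pairs of observations separated by the lag h.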
The experimental variogram is the mean semivariance of the pairs of points Z(x_i) and Z(x_i + h), sampled over a lag distance h. A variogram model can be fitted to the experimental variogram. Based on the variogram model, values can be estimated by kriging at locations that were not sampled. Moreover, the variogram model provides a value for the nugget variance C0, which is the theoretical semivariance at the sampling location. It is extrapolated from the shape of the variogram model at short lag distances to h equal to zero. The nugget variance includes the variance associated with small-scale variability that cannot be resolved by the sampling procedure, and it also includes the variability caused by analytical and sampling error [43].
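As an illustration of how an experimental variogram is assembled from point pairs, the following minimal sketch bins all pairs by separation distance and averages half the squared differences in each bin. It is an omnidirectional NumPy sketch, not the gstat routine used in the study; the function name, the binning rule, and the four-point example are assumptions.

```python
import numpy as np


def empirical_semivariogram(xy, z, lag_width, n_lags):
    """Omnidirectional experimental semivariogram with equal-width lag bins."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # every pair counted once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)  # pair separation distances
    sq_diff = (z[i] - z[j]) ** 2
    bins = np.floor(dist / lag_width).astype(int)
    lags, gammas, counts = [], [], []
    for b in range(n_lags):
        mask = bins == b
        n_pairs = int(mask.sum())
        if n_pairs == 0:
            continue  # skip empty lag classes
        lags.append(dist[mask].mean())
        gammas.append(0.5 * sq_diff[mask].mean())  # semivariance of this lag
        counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)


# Hypothetical values on a short line (not survey data).
xy_demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
z_demo = [0.0, 1.0, 2.0, 3.0]
lags, gammas, counts = empirical_semivariogram(xy_demo, z_demo, 1.0, 4)
```

With thousands of survey points the pair list grows quadratically, so a practical implementation would normally cap the maximum lag distance.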
When the nugget variance is subtracted from the sill, the structured variance C is obtained, i.e., the variance being explainable from neighboring observations. The range (a) is the distance at which neighboring observations become spatially independent.
Therefore, to capture the spatial continuity of observations, the sampling distance must be shorter than the range, because observations become progressively less related as their separation approaches the range. The maximum sampling distance (upper limit of cell size) can be determined with the "mean correlation distance" (MCD) [44], and can be calculated for spherical variogram models with Equation (2).
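Equation (2) does not appear in this text. The sketch below assumes the form commonly cited for spherical models, MCD = (3/8) · [C / (C0 + C)] · a, together with the standard spherical semivariance function and the nugget/sill ratio expressed as a percentage. The function names and the parameter values in the example are hypothetical and are not taken from the paper's tables.

```python
def spherical_semivariance(h, nugget, partial_sill, a):
    """Spherical model: rises from the nugget C0 to the sill C0 + C at range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def nugget_sill_ratio(nugget, partial_sill):
    """Nugget as a percentage of the total sill (C0 + C)."""
    return 100.0 * nugget / (nugget + partial_sill)


def mean_correlation_distance(nugget, partial_sill, a):
    """Assumed form of Equation (2): MCD = 3/8 * C / (C0 + C) * a."""
    return 0.375 * partial_sill / (nugget + partial_sill) * a


# Hypothetical parameters: C0 = 0.02, C = 0.98, a = 500 m.
mcd = mean_correlation_distance(0.02, 0.98, 500.0)   # about 183.75 m
ratio = nugget_sill_ratio(0.02, 0.98)                # about 2 %
```

Under this form, a range of about 500 m with a small nugget gives an MCD a little under 190 m.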
A by-product of ordinary kriging is the kriging variance, whose square root gives the standard error. This by-product is a function of the spatial variation of the data (i.e., as modeled by the variogram) and of the spatial configuration of the data relative to each estimated value. The variance of the estimate is the expected squared difference between the estimate Ẑ(x0) and the true value Z(x0).
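To make the link between the variogram model, the estimate, and the kriging variance concrete, the following sketch solves the ordinary kriging system for a single target location with NumPy. It is not the gstat `krigeTg` workflow used in the study: it uses all points as neighbors, works on untransformed values, and assumes a spherical model. All names and numbers are illustrative assumptions.

```python
import numpy as np


def spherical(h, nugget, partial_sill, a):
    """Vectorized spherical semivariance, with gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    g = nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, g, 0.0)


def ordinary_kriging(xy, z, x0, nugget, partial_sill, a):
    """Return the ordinary kriging estimate and variance at location x0."""
    n = len(z)
    # Semivariances between all pairs of data points.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, nugget, partial_sill, a)
    A[n, n] = 0.0  # Lagrange multiplier row/column enforces sum(w) = 1
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(xy - x0, axis=1), nugget, partial_sill, a)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = float(w @ z)
    variance = float(w @ b[:n] + mu)  # kriging variance in semivariogram form
    return estimate, variance


# Hypothetical data: four points on a 100 m square (not survey data).
xy = np.array([[0.0, 0.0], [0.0, 100.0], [100.0, 0.0], [100.0, 100.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
est_center, var_center = ordinary_kriging(xy, z, np.array([50.0, 50.0]), 0.1, 1.0, 500.0)
est_at_data, var_at_data = ordinary_kriging(xy, z, xy[2], 0.1, 1.0, 500.0)
```

At the center of the square the weights are equal by symmetry, so the estimate is the mean of the four values. At a data location the estimate reproduces the observation and the variance drops to zero, which is why the variance map reflects the sampling configuration.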
Thus, the kriging variance of each map produced in approach 2 was calculated for the two selection algorithms. Then, to assess the spatial difference between the two methods, the variance maps from the two algorithms were subtracted from each other at each removal level to identify the spatial variation of the estimated variances.
Statistical analysis of ECa sample subsets for the two approaches was evaluated for normal distribution patterns according to kriging interpolation assumptions. If the data were not normally distributed, they were transformed by the natural logarithm. A manual variogram fitting procedure for all simulations in both approaches used variogram analysis tools from the gstat package [45] in R software [37] for isotropic variogram fitting. Ordinary kriging interpolations used the krigeTg function, applied either to normal distribution or natural logarithm transformed subsets.
ECa map accuracy of all combinations was evaluated with an external validation subset using the mean error index (ME) in Equation (3), and the root mean square error (RMSE) in Equation (4).
Soil Syst. 2020, 4, 56
where N is the number of observations, y is the observed value, and ŷ is the predicted value.
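Equations (3) and (4) do not appear in this text. A minimal sketch of the two indices in their conventional forms is given below: ME as the average of the differences between observed and predicted values, and RMSE as the square root of the average squared difference. The sign convention for ME (observed minus predicted) is an assumption, as are the function names and the example values.

```python
import math


def mean_error(observed, predicted):
    """ME: average of (y - y_hat); values near zero suggest unbiased predictions."""
    n = len(observed)
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / n


def root_mean_square_error(observed, predicted):
    """RMSE: square root of the average squared difference (y - y_hat)^2."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


# Hypothetical validation values in mS/m (not from the survey).
obs = [10.0, 12.0, 8.0, 11.0]
pred = [9.5, 12.5, 8.5, 10.5]
me = mean_error(obs, pred)
rmse = root_mean_square_error(obs, pred)
```

In this example the errors cancel in ME but not in RMSE, which shows why the two indices are reported together.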

Exploratory Data Analysis
Descriptive statistics summarizing soil ECa datasets for different distances between transect lines are presented in Table 2, along with the ECa external validation subset. ECa derived datasets displayed equal minimum and maximum values for all line spacing subsets, except for the 26-line dataset, which showed a different minimum value (2.62 mS/m). All datasets exhibited mean and median values close to 10 mS/m, indicating that mean values remained similar as transect spacing increased. The standard deviation for the 26-line dataset was 3.39, similar to the other datasets. Asymmetry values shown in Table 2 can be classified as moderately positive for all datasets. A concentration of values to the left of the arithmetic mean can be observed in Figure 4a, with a longer histogram "tail" on the right. Kurtosis coefficients for all spacing datasets, including validation data, can be classified as leptokurtic, as they showed values higher than 0.3, implying distributions that are more peaked and heavier-tailed than the normal distribution. These asymmetry and kurtosis values suggest that the data in all subsets cannot be considered normally distributed. Therefore, a natural logarithm transformation was applied to the datasets to normalize distribution patterns (Figure 4b).
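The effect of the logarithmic transformation on asymmetry can be illustrated with a short sketch. It uses moment-based skewness and excess kurtosis, which may differ from the coefficients behind Table 2, and a synthetic right-skewed sample rather than survey data. The function names are assumptions.

```python
import numpy as np


def skewness(x):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


def excess_kurtosis(x):
    """Moment-based excess kurtosis: zero for a normal distribution."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0


# Synthetic right-skewed sample: exponentials of evenly spaced values.
values = np.exp(np.linspace(-1.0, 1.0, 201))
skew_raw = skewness(values)          # positive: long tail on the right
skew_log = skewness(np.log(values))  # close to zero after the transform
```

A sample with a long right tail has positive skewness, and taking the natural logarithm pulls that tail in. This is the motivation for transforming ECa before variogram fitting.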

Fitting Semivariogram Models
Spherical semivariogram model fittings were tested for anisotropy, with no significant RMSE differences for approach 1 cross-validations in the 30° and 120°; 40° and 130°; and 50° and 140° directions. Exponential and Gaussian model fittings were simulated, but did not improve RMSE in cross-validation. As a result, spherical models were used in all cases, as slightly larger ranges were suitable while searching for the maximum distance between transect lines.
Best fit variogram parameters for the natural logarithm of soil ECa datasets for the different numbers of transect lines are shown in Table 3. These parameters summarize best fit spherical models for all semivariograms for transect spacing simulations (Figure 5). Low nugget values were seen for ECa simulations with 26, 13, 7, and 4 transect lines, with respective line spacings of 40 m, 80 m, 150 m, and 300 m. This indicates a smaller variance in kriged maps and low standard deviation values. Despite increasing distances between transect lines, nugget values were similar for seven or more transect lines, indicating that the distance between lines and, consequently, the reduction in the number of points in the ECa dataset affected the spatial dependence only for distances greater than 150 m between transects. As a metric of the impact of the nugget on the final representation of the semivariance, the nugget/sill ratio is shown in Table 3, column 5. Expressing this ratio as a percentage allows a better understanding and discussion of the random errors of semivariance. The ratio analysis suggests 150 m as the maximum distance between lines, since there is a clear influence of randomness in the semivariance in the four-transect simulation. MCD values in this approach were between 184 and 192 m. Ranges varied from 498 to 530 m, representing the maximum distance beyond which there is significant loss of accuracy in the spatial pattern of ECa. All MCD values are smaller than 300 m and, according to [46], this measure can be considered a threshold distance that is proportional to the semivariogram range and partial sill. Therefore, MCD may represent a maximum distance between transect lines. In this case, range parameter interpretations suggest that transect intervals of around 150 to 190 m are within a safe threshold range.
From the combination of nugget and range parameter indications, we would suggest no line intervals greater than 150 m.

Mapping Soil ECa Spatial Variations
The soil ECa output map from all 26 survey lines is considered the baseline for comparison against the other transect spacing interpolations. High ECa values are located in the southwest portion of the map (Figure 6a), where a drainage channel can be observed. Although the EM38 survey was done in the dry season (July), on no-till straw soil cover and with no irrigation regime, high ECa values may be associated with higher soil moisture and clay concentrations in this area, as reported by [3,10,34]. In contrast, lower ECa values in the northern part of the study area suggest a well-drained region with higher elevation and flatter sandy soils. These results match previous work from [35,47], where strong positive correlations were found between ECa values and clay and moisture concentrations.
For transect spacings equal to or greater than 80 m (13- and 7-line datasets), smoother ECa distribution patterns started to be observed, indicative of a threshold for sample reduction. Although this smoothing increases gradually from the 13- to the 7-line dataset, it is still possible to observe the distribution of small-scale ECa features in the 13- and 7-line maps when compared to the map from the 26-line dataset. For transect spacings larger than 150 m (4-line dataset) in particular, patterns of spatial structure have been clearly smoothed. The ECa maps interpolated from 13 and 7 lines (Figure 6b,c) display similar distribution patterns to the baseline map. The northern part of the 7-line output map shows a slight separation in a homogeneous area classified as low ECa values in the baseline map. Higher values are in the southeastern part, being smoothed at areas of low ECa values with more generalized patterns. However, the southwestern drainage can still be seen with a 150 m distance between lines.
Even though the 4-line output map still showed lower and higher ECa values corresponding with regions mapped with 26-, 13-, and 7-line interpolations, transect spacings of 300 m are higher than the MCD values observed in this approach (184 to 191 m). Despite pattern generalization and information loss in this interpolation, it is still possible to identify the drainage channel, although with a shorter length than in the baseline map.

Map Uncertainty Assessment
Soil ECa map uncertainties based on external validation have RMSE and ME index values of 0.54 and 0.00, respectively, for the 26-line interpolation (Table 4). As expected, RMSE and ME values increase as the number of lines decreases, due to increasing uncertainty with smaller and more scattered datasets. Both ME and RMSE indexes are responsive, showing differences in their values between the 26- and 13-line interpolations. These two datasets showed ME values close to zero, where a zero ME value indicates unbiased simulations. ME values from output maps using 13 and 7 lines showed no significant difference. In general, indexes show smaller values at finer resolutions, or with more transect lines.

Exploratory Data Analysis
Unlike the descriptive statistics results in approach 1, minimum and maximum values were not alike for all datasets. Mean and median values from subset selection algorithms were stable across sample densities (Table 5). This indicates that, even with a significant reduction in the number of points, datasets selected by the random and the Douglas-Peucker algorithms still preserved similarities regarding average values. Standard deviation values were similar in all datasets. As the standard validation dataset is the same, the results presented in Table 2 are not repeated here. Asymmetry values are moderately positive for all densities. These values indicate that most of the ECa measurements are small, located to the left of the arithmetic mean, producing a longer histogram "tail" on the right (Figure 7a,c). The kurtosis coefficients for all density subsets were higher than 0, so the distributions are classified as leptokurtic, i.e., more peaked and heavier-tailed than the normal distribution. Thus, as the asymmetry and kurtosis values strongly suggest that the data for all sample densities cannot be considered normally distributed, a natural logarithm transformation was applied before variogram fitting and kriging (Figure 7b,d).

Fitting Semivariogram Models
Similar to approach 1, variogram simulations for anisotropy and exponential model fitting showed no significant differences in RMSE in the approach 2 cross-validations, suggesting that the larger range values from spherical isotropic fittings should be considered for the maximum distance between observations. Variograms for all subsets from both selection algorithms had their best fit with spherical models (Figures 8 and 9). Nugget effects were small and of similar values, except for the 95% reduction subset from the Douglas-Peucker (D-P) algorithm (Table 6). The D-P algorithm showed lower nugget values compared to random sampling, except for the 95% removal (Table 6). In the same way, range and sill values were similar for the two algorithms, except for the 95% point removal using D-P, with a slightly higher range value of 600 m (Table 6).

Mapping Soil ECa Spatial Variations
Soil ECa output maps from the four dataset densities using the two algorithms captured the main spatial patterns of variation in the study area (Figure 10). High ECa values were in the centersouthwest area, and low values located in the southeast and north parts of the area. These spatial representations are similar to the interpolated maps previously shown in approach 1. Even with a Figure 9. Empirical (circles) and adjusted (lines) semivariogram models for natural logarithm transformations of different soil ECa sample density reductions (i.e., 25%, 50%, 75%, and 95%), using the Douglas-Peucker algorithm. Similar behavior could be observed for the nugget/sill ratio, also indicating a stronger spatial structure of variation for D-P subsets as compared to random selections. Hence, even reducing the ECa samples from 2930 (25% removal) to 196 (95%), the nugget/sill ratio did not increase significantly ( Table 6). When using the D-P algorithm, there was no increase in the uncertainty represented by the nugget/sill ratio up to the 95% removal level, and the ratio was smaller than for the random selection subsets, again except for the 95% removal level ( Table 6).
MCD values were around 185 m, except for the 95% level of reduction by D-P, where there was a slight increase to 217 m ( Table 6). These results are close to MCD values from the first approach, which ranged from around 184 to 191 m, indicating that a reasonable threshold for observation separation would be 150 m. The reduction in sample densities did not increase randomness in spatial dependencies, as no significant increase in nugget values was observed. This suggests that sample density reductions in this case could preserve the spatial structure of ECa variations.
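The two spatial-structure indicators discussed above can be illustrated with a minimal sketch. It assumes that the sill reported in Table 6 can be split into a nugget (C0) and a partial sill (C1), and that MCD follows the usual spherical-model expression, MCD = 3/8 × C1/(C0 + C1) × range; the function names and the numbers in the example are hypothetical and are not taken from Table 6.

```python
def nugget_sill_ratio(nugget: float, partial_sill: float) -> float:
    """Nugget-to-total-sill ratio, C0 / (C0 + C1).

    Lower values indicate a stronger spatial structure of variation.
    """
    return nugget / (nugget + partial_sill)


def mean_correlation_distance(nugget: float, partial_sill: float, range_m: float) -> float:
    """Mean correlation distance (m) for a spherical variogram model.

    Assumed form: 3/8 * C1 / (C0 + C1) * range.
    """
    structured_fraction = partial_sill / (nugget + partial_sill)
    return 0.375 * structured_fraction * range_m


# Hypothetical variogram parameters (illustrative only, not from Table 6)
c0, c1, a = 0.01, 0.19, 500.0
ratio = nugget_sill_ratio(c0, c1)
mcd = mean_correlation_distance(c0, c1, a)
print(f"nugget/sill = {ratio:.3f}, MCD = {mcd:.1f} m")
```

Under this form, a small nugget/sill ratio keeps the MCD close to 3/8 of the range, so a longer fitted range translates almost directly into a larger MCD.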

Mapping Soil ECa Spatial Variations
Soil ECa output maps from the four dataset densities using the two algorithms captured the main spatial patterns of variation in the study area (Figure 10). High ECa values were in the center-southwest area, and low values were located in the southeast and north parts of the area. These spatial representations are similar to the interpolated maps previously shown in approach 1. Even with a significant reduction in the number of sample points, the second approach illustrates high ECa levels in the central regions of the study area. The ECa maps from 95% point removal (196 points) still gave visual results similar to the ECa maps using 100% of the points (3906 points; Figure 6a), using either random selection (Figure 10a) or D-P (Figure 10b).

Map Uncertainty Assessment
Uncertainty index results from the line spacing approach are revisited here as a standard reference of 100% sample density (all points in all transects) against which to compare the accuracies of the ECa maps from approach 2. For the soil ECa map interpolated from 26 lines (Table 4, line 1, columns 2 and 3), these results were 0.54 for RMSE and 0.00 for ME. In the sample density approach, RMSE and ME values from the map interpolations of all reduced datasets showed an increasing trend as the total number of samples was reduced. Although these indexes are not exceptionally different between reduction algorithms, the D-P algorithm produced higher accuracy maps for all density subsets (Table 7), and its lower RMSE values suggest a better spatial point distribution for ECa mapping. Uncertainty indexes were low and similar for all subsets in the two selection algorithms, and were very close to those of the standard reference dataset up to a 50% density reduction, in particular for the D-P algorithm. This suggests that accurate maps would still be produced if the number of samples were halved. RMSE and ME values presented similar ranges, with small variations for sample densities reduced by 25%, 50%, and 75% (around 20% maximum uncertainty increase between all densities and resolutions), whereas the 95% sample density reduction simulations showed a maximum uncertainty variation 35% above the other results. All ECa sample density datasets (25%, 50%, 75%, and 95%) could represent the major spatial patterns of soil ECa variation in the study area when compared to the map produced from 100% sample density (Figure 6a).
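As a minimal sketch of the two validation indexes used above, the following computes RMSE and ME from paired observed and predicted values at external validation points. The sign convention for ME (predicted minus observed) and the function name are assumptions, and the values in the example are made up for illustration.

```python
import numpy as np


def validation_metrics(observed, predicted):
    """Return (RMSE, ME) for paired validation values.

    ME is computed as mean(predicted - observed); this sign convention
    is an assumption and may differ from the one used for Tables 4 and 7.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - observed
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    me = float(np.mean(errors))
    return rmse, me


# Made-up values standing in for ECa at a few validation points
obs = [1.0, 2.0, 3.0]
pred = [2.0, 2.0, 2.0]
rmse, me = validation_metrics(obs, pred)
print(f"RMSE = {rmse:.3f}, ME = {me:.3f}")
```

An ME near zero with a nonzero RMSE, as in this toy case, indicates errors that cancel on average (no systematic bias) while individual predictions still deviate from the observations.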

Additionally, spatial differences in the standard deviation between ECa maps from the random selection and D-P algorithms are shown in Figure 11, as another way to visualize spatial uncertainty. A map algebra procedure that subtracted the kriging results of the two algorithms was used to map the standard deviation differences for each sample density. Datasets with a 25% and 50% reduction (Figure 11a,b, respectively) show minor differences in variances, while for 75% and 95% (Figure 11c,d, respectively) the differences in variances increased slightly, with the highest variance difference in the maps with 95% of the points removed (Table 7). The Douglas-Peucker algorithm performed better at spatial point selection than the random selection algorithm, as it better preserved map accuracy even with a 95% reduction in density.
Figure 11. Differences between standard deviations of ECa from interpolations using random selection and Douglas-Peucker algorithms on different sample density reductions: 25% (a); 50% (b); 75% (c); and 95% (d).

General Discussion
Simulation datasets in the first approach, including 26, 13, and 7 transect lines, gave ordinary kriging interpolations that could represent the main patterns of ECa variation in the study area. This supports the indication that future sampling designs should consider transect intervals from 100 to 150 m for ECa data collection on parallel back-and-forth paths. RMSE and ME values were higher when fewer than seven transects were included, suggesting that EM38 on-the-go survey paths should not exceed 150 m between transect lines. Line intervals of 300 m (four lines) attenuated the representation of soil spatial variations in the ECa maps, as the RMSE and ME values from external validation were higher for the four-line interpolations.
Comparing the RMSE values of the two algorithms used to vary point density in the second approach, uncertainty was lower across the combinations of sampling densities and spatial resolutions when the Douglas-Peucker algorithm was used instead of random selection. The Douglas-Peucker algorithm provided values lower than those from the random selection algorithm for all combinations of point density, except for the dataset with 95% point removal, corroborating the optimization by the D-P selection algorithm. Furthermore, the RMSE values for this algorithm suggest a better spatial point distribution for ECa mapping procedures, as its uncertainty range (from 0.56 to 1.04) suggests more accurate maps than the random reduction (from 0.60 to 1.11) and transect spacing (from 0.54 to 1.73) simulations.
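For readers unfamiliar with the point selection being compared, the following is a minimal sketch of the classic Douglas-Peucker simplification applied to an ordered sequence of 2D points. It is not the implementation used in this study: the choice of coordinates (for example, distance along a transect versus ECa), the `tolerance` parameter, and the function names are assumptions, and reaching a target removal percentage such as 75% would require tuning the tolerance.

```python
import numpy as np


def perpendicular_distances(points, start, end):
    """Distance from each point to the straight line through start and end."""
    line = end - start
    length = np.hypot(line[0], line[1])
    if length == 0.0:
        # Degenerate case: start and end coincide, use distance to that point
        return np.hypot(points[:, 0] - start[0], points[:, 1] - start[1])
    cross = line[0] * (points[:, 1] - start[1]) - line[1] * (points[:, 0] - start[0])
    return np.abs(cross) / length


def douglas_peucker(points, tolerance):
    """Return a boolean mask marking the points retained by the simplification.

    Endpoints are always kept. An interior point is kept when it is the
    farthest point from the current chord and lies more than `tolerance`
    away from it; the two resulting sub-segments are then processed in turn.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    keep = np.zeros(n, dtype=bool)
    if n == 0:
        return keep
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]  # explicit stack avoids deep recursion on long transects
    while stack:
        first, last = stack.pop()
        if last - first < 2:
            continue  # no interior points left in this segment
        d = perpendicular_distances(points[first + 1:last], points[first], points[last])
        i = int(np.argmax(d))
        if d[i] > tolerance:
            split = first + 1 + i
            keep[split] = True
            stack.append((first, split))
            stack.append((split, last))
    return keep


# Hypothetical profile: a flat stretch with one sharp peak
profile = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
mask = douglas_peucker(profile, tolerance=1.0)
print(mask)
```

Unlike random selection, this procedure preferentially retains points where the profile changes most, which is consistent with the lower nugget and RMSE values reported for the D-P subsets.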

Although soil ECa maps from EM38-MK2 devices have proven suitable as proxies for several correlated soil attributes, it is not possible to determine whether any particular soil process is actually occurring in the study area without pedotransfer functions that relate ECa data to other soil attributes obtained via laboratory analysis.

Conclusions
The results of this study provide a basic step toward detailed investigations of variable ECa sampling densities. The use of EMI sensors for precision agriculture applications in Brazil is relatively recent. The dataset used in this work is among the first data collection efforts throughout different regions and soil types. The following conclusions are drawn from this investigation:

• Sampling designs for continuous PSS surveys still lack optimal operational standards, potentially compromising map uncertainty evaluations;
• Datasets from different transect spacings and sampling densities could preserve similar ranges in the magnitude of soil ECa mapping uncertainty variations;
• Accurate soil ECa maps were obtained from simulations that increased transect spacing up to 150 m, or that reduced sample density by up to 75% while limiting the distance between observations to 180 m.