Validation of Rapid and Low-Cost Approach for the Delineation of Zone Management Based on Machine Learning Algorithms

: Proximal soil sensors are receiving strong attention from several disciplinary ﬁelds, and this has led to a rise in their availability in the market in the last two decades. The aim of this work was to validate agronomically a zone management delineation procedure from electromagnetic induction (EMI) maps applied to two different rainfed durum wheat ﬁelds. The k-means algorithm was applied based on the gap statistic index for the identiﬁcation of the optimal number of management zones and their positions. Traditional statistical analysis was performed to detect signiﬁcant differences in soil characteristics and crop response of each management zones. The procedure showed the presence of two management zones at both two sites under analysis, and it was agronomically validated by the signiﬁcant difference in soil texture (+24.17%), bulk density (+6.46%), organic matter (+39.29%), organic carbon (+39.4%), total carbonates (+25.34%), total nitrogen (+30.14%), protein (+1.50%) and yield data (+1.07 t ha − 1 ). Moreover, six unmanned aerial vehicle (UAV) ﬂight missions were performed to investigate the relationship between ﬁve vegetation indexes and the EMI maps. The results suggest performing the multispectral images acquisition during the ﬂowering phenological stages to attribute the crop spatial variability to different soil proprieties.


Introduction
The current social context requires an increase in food production, improvement of its quality characteristics and greater environmental sustainability in the management of agricultural systems. Technological innovation plays a great role in making agriculture more efficient and sustainable.
On the 1 June 2018, the European Commission set goals for the new Common Agricultural Policy (CAP) for beyond 2020, focusing on the contribution of innovation and sustainability of crop production in Italy (through Regional Agricultural Policies), as for the rest of Europe (EIP-AGRI partnership). One of the key points reported is the necessity of effective nutrient management, more specifically, avoiding environmental losses and preserving yields [1].
Uniform management of fields does not consider spatial variability, and it is not the most effective management strategy. Precision agriculture is considered the most viable approach for achieving sustainable agriculture [2]. Soil is the temporal result of several factors such as the atmosphere, biosphere, lithosphere and hydrosphere [3]. Such variability may act over different spatial and temporal scales and affects crop yield both quantitatively and qualitatively [4].
The aim of our contribution is to validate the k-means algorithm to delineate homogeneous management zones. The k-means algorithm uses low-cost resistivity maps created by an electromagnetic induction method as a source of data. The proposed approach could be used to easily reconduct the spatial variability of the soil in homogeneous management zones statistically different from each other for high-quality prescriptions maps for the precision farming application.

Experimental Sites Description
The method was tested on two experimental sites ( Figure 1). In both sites, three different homogeneous zones (ZH) were identified by resistivity maps created by an electromagnetic induction (EMI), which were subsequently identified with the letters a, b, c. In these three areas, two different fertilization applications were tested: variable rate (VRT) and uniform (UA).
In order to produce the homogeneous zone map, we need to monitor the field over time by using several sensors. Depending on the type of sensor used and analysis performed, several authors provided different approaches to define the homogeneous zones.
The authors of [26] proposed a multi-source geostatistical approach, [27] evaluated 20 different unsupervised machine learning algorithms, while [28] used the Self-Organizing Maps.
The aim of our contribution is to validate the k-means algorithm to delineate homogeneous management zones. The k-means algorithm uses low-cost resistivity maps created by an electromagnetic induction method as a source of data. The proposed approach could be used to easily reconduct the spatial variability of the soil in homogeneous management zones statistically different from each other for high-quality prescriptions maps for the precision farming application.

Experimental Sites Description
The method was tested on two experimental sites ( Figure 1). In both sites, three different homogeneous zones (ZH) were identified by resistivity maps created by an electromagnetic induction (EMI), which were subsequently identified with the letters a, b, c. In these three areas, two different fertilization applications were tested: variable rate (VRT) and uniform (UA). In 2019-2020 at site 1 Az. Agricola F.lli Lillo (Matera) latitude: 40.712640° longitude 16.656343° ( Figure 2) on a study area of 6.65 ha, the experiment was conducted with durum wheat (Triticum durum L., var PR22D89) with sod seeding (7 January 2020).
In 2018-2019 at site 2 Genzano di Lucania (PZ) latitude: 40.82 • N, longitude: 16.08 • N (Figure 2), the study area (4.93 ha) was located on the clayey hills of the Bradanica grave and the basin of Sant'Arcangelo. The experiment was conducted with durum wheat (Triticum durum L., var Tirex). The inter-row spacing of 0.13 m and 250 kg ha −1 of seeds was used. Soil tillage consisted of a 40 cm deep plowing (28 August 2018) and two harrowing (11 November 2018 and 5 December 2018) ( Figure 2).
The crop potential N uptake was estimated by the nitrogen content in the yield in each homogeneous area and was corrected considering the nitrogen provided by mineralization of the organic matter by using the agri-environmental measures adopted within the Rural Development Plans at a local scale (https://www.regione.marche.it/Regione-Utile/ Agricoltura-Sviluppo-Rurale-e-Pesca/Produzione-Integrata#Tecniche-Agronomiche, 15 December 2021). N mineralization was calculated considering the content of organic matter in the soil profile explored by the roots, its content in organic N and by the mineralization efficiency which in turn depends on the carbon/nitrogen ratio of the soil (1 for C/N < 9; 0.5 for C/N > 9, 0 for C/N < 12) (https://www.regione.marche.it/Regione-Utile/Agricoltura-Sviluppo-Rurale-e-Pesca/Produzione-Integrata#Tecniche-Agronomiche, 15 December 2021).
For the VRT treatment, the N doses applied in each area through a variable rate spreader are reported in Table 1 for site 1 and Table 2 for site 2. For each treatment, plots of 2 m × 2 m replicated three times inside each of the homogeneous areas identified in the field were established. In all such plots, a dose of N uniform was applied, which corresponds to the amount generally applied by the farmer and slightly over the average of the dose of N applied in the three zones. The fertilizer was manually spread in UA. The crop potential N uptake was estimated by the nitrogen content in the yield in each homogeneous area and was corrected considering the nitrogen provided by mineralization of the organic matter by using the agri-environmental measures adopted within the Rural Development Plans at a local scale (https://www.regione.marche.it/Regione-Utile/Agricoltura-Sviluppo-Rurale-e-Pesca/Produzione-Integrata#Tecniche-Agronomiche, 15/12/2021). N mineralization was calculated considering the content of organic matter in the soil profile explored by the roots, its content in organic N and by the mineralization efficiency which in turn depends on the carbon/nitrogen ratio of the soil (1 for C/N < 9; 0.5 for C/N > 9, 0 for C/N < 12) (https://www.regione.marche.it/Regione-Utile/Agricoltura-Sviluppo-Rurale-e-Pesca/Produzione-Integrata#Tecniche-Agronomiche, 15 December 2021).
For the VRT treatment, the N doses applied in each area through a variable rate spreader are reported in Table 1 for site 1 and Table 2 for site 2. For each treatment, plots of 2 m × 2 m replicated three times inside each of the homogeneous areas identified in the field were established. In all such plots, a dose of N uniform was applied, which corresponds to the amount generally applied by the farmer and slightly over the average of the dose of N applied in the three zones. The fertilizer was manually spread in UA.

Soil and Crop Samples Position
The soil spatial variability was detected by means of low induction electromagnetic technique of CMD miniexplorer (GF Instruments, s.r.o., Brno, Czech Republic) with 6 m between transects and an average measurement distance of 0.8 m along transects.
The CMD miniexplorer returns data must be interpolated; in this case, the inverse distance squared method was performed by using Qgis [29][30][31].
After obtaining the electrical resistivity map, the cluster analysis was performed to identify the zones, and then for each zone, soil samples at the depths of 0-40 cm were collected and characterized by conventional analytical methods according to [32]. All samples were air-dried, and 2-mm sieved before laboratory analyses.
The organic carbon (OC) content was measured by the Walkley-Black method, and the total Kjeldahl nitrogen was determined by the Kjeldahl method. The available phosphorus (Pava) was determined by ultraviolet and visible (UV-vis) spectrophotometry according to the Olsen method. The total content of CaCO 3 was determined by the gas-volumetric methods (Freuling calcimeter method), whereas the active lime was extracted with 0.1 M ammonium oxalate and determined by titration with 0.1 M KMnO 4 .
At crop maturity, the grain yield (t ha −1 ) and protein content (%) was measured on a sample area of 4 m 2 replicated three times. The protein content (%) was measured by the FOSS Infratec 1241.

Management Zone Delineation Approach
The management zones map creation workflow was entirely performed with R statistical software [33]. The workflow to generate the management zone map is composed of several steps, which could be summarized as (1) import resistivity map, (2) raster to dataframe conversion, (3) cluster analysis, (4) management zone map creation and (5) export.
The resistivity maps were imported to R by using the "raster" function of the raster R package [34]. After checking the geographical reference system and the spatial resolution, the resistivity maps were converted to "dataframe" R object by using the "as.data.frame" function of the raster R package.
The cluster analysis was performed by using the "kmeans" function of the stats R package [33]. The "kmeans" function requires the number of "centers" as a mandatory parameter, which defines the number of clusters that the algorithm must perform.
The optimal number of "centers" was defined by performing the gap statistic index, which calculates the goodness of clustering by comparing the total intra-cluster variation for different values of k with their expected values under the null reference distribution of the data. The gap statistic index was performed by using the "clusGap" function of the cluster R package [35].
Based on the gap statistic index, the k-means cluster analysis was performed, and the zone management map was created and converted to the spatialpolygonsdataframe R object by using the "df_to_SpatialPolygons" function of the FRK package [36]. The spatialpolygonsdataframe was exported by using the "writeOGR" function of the rgdal R package [37] in an ESRI Shapefile file format.

UAV Images Acquisition
The UAV images acquisition was conducted in 2019-2020 at site 1. The images were acquired using a Parrot Bluegrass drone with a Parrot Sequoia multispectral sensor, and the flight plan was set using Pix4Dcapute. Six flight missions were carried out throughout the durum wheat crop cycle (Table 3). For the agriculture domain sector, each image acquired by UAV flight required an image processing workflow to compute the vegetation index (VI). The image processing is Starting from the raw tiff files acquired by the UAV, the orthomosaic reflectance map was generated by using structure from motion (SfM) software [38], which in this case was PIX4D.
In order to complete the second main step, the orthomosaic reflectance map was imported in R statistical software [33], and the VI shown in Table 4 was calculated [39]. In order to define the relationship between the previous VI and the resistivity map, the coefficient of determination (R 2 ) was computed. (1) The VI maps were scaled at the same resolution as the resistivity map by using the "resample" function of the raster R Package [34]. (2) After obtaining raster files with the same resolution, they were converted to a data frame by using the "as.data.frame" function of the raster R Package [34]. (3) Then, a linear model was fitted by using the "lm" function of the stats R Package [33] in order to compute the R 2 .

Statistical Analysis
All the statistical analyses were performed with R statistical software [33]. Before performing any analysis, a descriptive statistics analysis was performed on the resistivity maps; the range and the coefficient of variation (CoV) to describe the spatial variability of both sites were calculated.
In order to validate the zone management map creation workflow, the statistical analysis was performed on the soil samples, which were assigned an experimental factor in relation to the zone management area previously defined.
In order to perform the statistical analysis, a one-factor linear model was built by using the "lm" function of the stats R package [33], on which the cluster was considered the main factor.
Before performing the Analysis of Variance (ANOVA), whether the model met the three assumptions of the ANOVA was verified [45]. The Normality distribution of the model residual was checked both graphically (QQ-plot) and by performing the Shapiro-Wilk normality test. Moreover, the homoscedasticity was checked using the Levene test. The last ANOVA assumption was satisfied by the experimental design and the random sampling.
When all the three ANOVA assumptions were met, the ANOVA was applied to the model. Only when the ANOVA showed a significant difference (p-value < 0.05), the estimated marginal means post hoc analysis was performed by using the "emmeans" function with the Bonferroni adjustment of the emmeans R package [46].
For the yield dataset, the same procedure of the soil samples dataset was performed, except that the statistical analysis was performed on a full factorial model where the site and zone management were set as experimental factors.

Resistivity Maps
The resistivity maps of both sites are shown in Figure 3. The values were scaled based on quartiles to show the in-field spatial variability better. Based on the previous scale classification, both sites showed high spatial variability. As evidence of the different spatial variability, both range and coefficient of variation (CoV) were calculated ( Table 5). The first site obtained a higher value of +15.11 CoV and +18.56 of range than the second site. While considering the EC value, the first site obtained a higher value of +3.90 mS m −1 than the second site (Table 5).
model. Only when the ANOVA showed a significant difference (p-value < 0.05), the estimated marginal means post hoc analysis was performed by using the "emmeans" function with the Bonferroni adjustment of the emmeans R package [46].
For the yield dataset, the same procedure of the soil samples dataset was performed, except that the statistical analysis was performed on a full factorial model where the site and zone management were set as experimental factors.

Resistivity Maps
The resistivity maps of both sites are shown in Figure 3. The values were scaled based on quartiles to show the in-field spatial variability better. Based on the previous scale classification, both sites showed high spatial variability. As evidence of the different spatial variability, both range and coefficient of variation (CoV) were calculated ( Table 5). The first site obtained a higher value of +15.11 CoV and +18.56 of range than the second site. While considering the EC value, the first site obtained a higher value of +3.90 mS m −1 than the second site (Table 5).

Zone Management Delineation and Statistical Analysis Results
The number of the optimal "centers" to associate as a parameter for the k-means computation was defined based on the gap statistic index. For both sites, the gap statistic index defined that the optimal number of clusters was two. The zone management map visualization is reported in Figure 4.
In order to agronomically validate the two zones identified by the k-means classification for both sites, we set an experimental factor of the soil samples based on the affiliation of zone management. Then the zone management experimental factor was analyzed by the ANOVA applied to the soil sample.
The ANOVA showed that the zone management defined by the k-means was statistically significant for clay, silt, bulk density, EC, organic matter, organic carbon, total carbonate, nitrogen and ratio C/N for both sites (Tables 6-8).

Zone Management Delineation and Statistical Analysis Results
The number of the optimal "centers" to associate as a parameter for the k-means computation was defined based on the gap statistic index. For both sites, the gap statistic index defined that the optimal number of clusters was two. The zone management map visualization is reported in Figure 4. In order to agronomically validate the two zones identified by the k-means classification for both sites, we set an experimental factor of the soil samples based on the affiliation of zone management. Then the zone management experimental factor was analyzed by the ANOVA applied to the soil sample.
The ANOVA showed that the zone management defined by the k-means was statistically significant for clay, silt, bulk density, EC, organic matter, organic carbon, total carbonate, nitrogen and ratio C/N for both sites (Tables 6-8).
In addition to the variables previously cited, for the second site, the ANOVA showed a statistical impact of zone management for sand and magnesium (Tables 6-8). However, the ANOVA did not show a statistical impact of the zone management for the ratio Mg/K, sodium, phosphorus, AWC and pH.     In addition to the variables previously cited, for the second site, the ANOVA showed a statistical impact of zone management for sand and magnesium (Tables 6-8). However, the ANOVA did not show a statistical impact of the zone management for the ratio Mg/K, sodium, phosphorus, AWC and pH.
The emmeans with the Bonferroni adjustment analysis showed that, for both sites, zone number 2 obtained a statistically higher value than zone number 1 for clay, EC, organic matter, organic carbon, nitrogen and ratio C/N. However, no statistical superiority was highlighted between the two zones of both sites for Mg/K, phosphorus, AWC and pH (Tables 9 and 10). Furthermore, for site 1, zone 2 obtained a higher percentage value of +18.95% of clay, +35.29 % of EC, +32.22% of organic matter, +32.37% of organic carbon, +16.20% of nitrogen and +19.35% ratio C/N than the zone 1. While zone 1 obtained a higher percentage value of +17.65 of silt, +5.88 % bulk density and +24.28% of total carbonates than zone 2 (Tables 9 and 10).
Based on the results obtained above, it can be stated that zone 2 generated by the cluster analysis for both sites has soil physical and chemical characteristics superior to zone 1 for durum wheat cultivation, such as higher clay, silt, EC, organic matter, organic carbon, nitrogen and the ratio of carbon and nitrogen.

Grain Yield and Statistical Analysis
The ANOVA applied to the full factorial model showed that the single effect of site and zone management statistically impacts the grain yield (t/ha) and grain protein content (%). Moreover, the combined effect of the interaction between site and zone management statistically impacted the grain yield (t/ha), while for the protein content, no statistical impact was raised (Table 11). Since we observed a significant difference in the combined effect, we applied the post hoc analysis on the site and zone management interaction. Table 11. Results of the ANOVA applied to the grain yield and protein content on the zone management experimental factor for both sites.

Experimental Factor
Df 1
With reference to the production (grain yield t/ha), the ML approach shows for both sites 1 and 2, and for both fertilization application N (UA) and N (VRT) a difference between management zones 1 and 2 (Table 12). Specifically, in site 1 in N (UA), zone 2 shows the production of +0.95 t/ha with respect to zone 1; in zone 2 in N (VRT), it produced +0.7 t/ha with respect to zone 1.
Furthermore, in site 2 in N (UA), zone 2 produced + 1.4 t/ha with respect to zone 1; in N (VRT), zone 2 produced + 1.25 t/ha with respect to zone 1 ( Table 12). The same is true for the % of grain proteins content where the difference is relevant for both site 1 and site 2 between management zones 1 and 2 defined with ML.
In line with the results described, it is possible to underline that in the ZH approach for both sites with reference to production (grain yield t/ha), even considering the N (UA) and N (VRT) treatments, a significant difference is highlighted between zone c with respect to zones a b. This reinforces the results obtained from the test of the approach (ML) as zones a b flow into zone 2, and zone c flows into zone 1 (Table 12). Means within column that are followed by the same letter are not significantly different at p < 0.05%.

Relationship between Vegetation Index and Resistivy Map
Six UAV flight missions were performed during all growing seasons in site 1 to obtain multispectral images to compute five vegetation indexes and evaluate the relationship with the resistivity map. As reported in Table 13, the coefficient of determination of the relationship between the vegetation indexes and the resistivity map is not significant until flowering. During flowering, the maximum value of correlation is reached with an average coefficient of determination of 0.53. After flowering, the correlation value decreases for each vegetation indexes until the maturity of the durum wheat, where a non-significant correlation is shown (Table 13). The vegetation index that showed a higher relationship with the resistivity map is the NDVI [40], which reached an R 2 of 0.56 during flowering, while the vegetation index that reported the lowest R 2 was the WV.VI.
The NDVI maps are reported in Figure 5, which were scaled by using the quartile and where it is possible to appreciate the evolution of the NDVI throughout the year. All the NDVI maps were scaled by using the quartile. While in Figure 6, it is possible to appreciate the overlapping of the resistivity map, NDVI and the zone management defined by the cluster analysis. Agronomy 2021, 11, x FOR PEER REVIEW 12 of 17

Soil Sensor and Management Zone Creation
Proximal soil sensors are receiving strong attention from several disciplinary fields, and this has led to a rise in the number of proximal soil sensors available in the market in the last two decades [47]. These sensors contribute to measuring the spatial-temporal variability of soil proprieties, such as moisture content and soil texture [48].
These sensors could be used to measure the soil organic matter, nitrogen availability and the ratio of carbon-nitrogen indirectly, which are soil variables that are mostly considered to calculate the nitrogen balance in several integrated production standards [49]. Moreover, after careful evaluation and calibration, these sensors can avoid time-consuming and expensive soil sampling and analysis, which cannot be scaled at the farm level [50].

Soil Sensor and Management Zone Creation
Proximal soil sensors are receiving strong attention from several disciplinary fields, and this has led to a rise in the number of proximal soil sensors available in the market in the last two decades [47]. These sensors contribute to measuring the spatial-temporal variability of soil proprieties, such as moisture content and soil texture [48].
These sensors could be used to measure the soil organic matter, nitrogen availability and the ratio of carbon-nitrogen indirectly, which are soil variables that are mostly considered to calculate the nitrogen balance in several integrated production standards [49]. Moreover, after careful evaluation and calibration, these sensors can avoid time-consuming and expensive soil sampling and analysis, which cannot be scaled at the farm level [50].

Soil Sensor and Management Zone Creation
Proximal soil sensors are receiving strong attention from several disciplinary fields, and this has led to a rise in the number of proximal soil sensors available in the market in the last two decades [47]. These sensors contribute to measuring the spatial-temporal variability of soil proprieties, such as moisture content and soil texture [48].
These sensors could be used to measure the soil organic matter, nitrogen availability and the ratio of carbon-nitrogen indirectly, which are soil variables that are mostly considered to calculate the nitrogen balance in several integrated production standards [49]. Moreover, after careful evaluation and calibration, these sensors can avoid time-consuming and expensive soil sampling and analysis, which cannot be scaled at the farm level [50].
Based on previous assumptions, it is essential to use the proximal soil sensor in order to characterize the soil proprieties and to perform the SSNM (Site-Specific Nitrogen Management) [19].
By using the EMI sensor, we could generate the resistivity map, which can be used as the information layer to define the zone management. Different approaches such as the multivariate geostatistical approach [51] or the machine learning approach were found in the literature, which deal well with non-linear patterns [27,52].
Our contribution is to validate agronomically the k-means algorithm to delineate zone management which uses the resistivity map by using a statistical approach. Both sites under study showed a different spatial variability which allowed us to validate the approach in two different field conditions. The unsupervised machine learning algorithm, which was the k-means [53] based on the gap statistic index [35], reported the existence of two zone managements at both sites.
At both sites, multiple soil and crop samples were performed in those different zones in order to perform a statistical analysis to agronomically validate the presence of the two zones [54]. At both sites, between the two zones, there was a significant difference in clay (%), silt (%), bulk density (g m −3 ), EC (mS m −1 ) organic matter (%), organic carbon (%), total carbonates (%), nitrogen (g kg −1 ) and C/N.
The second zone for both sites showed a higher value of organic matter, organic carbon, nitrogen and ratio of nitrogen and carbon, which led to higher grain yield (t ha −1 ) than the first zone. This result is in accordance with [55], where a higher content of soil organic matter and nitrogen led to a higher value of several vegetation indexes and grain yield (t ha −1 ). Moreover, it was observed that the differences in yield are significant between zones and not within zones. While for the grain protein content (%), the difference was found only in the two sites where the resistivity map obtained a higher spatial variability.

Relationship between Vegetation Indexes and Resistivity Map
Efficient and reliable methods for measuring spatial variability in soil properties are fundamental in precision agriculture [56].
It is by using these instruments precisely that spatial variability can be estimated without soil sampling, which is time-money consuming [50].
Beyond the use of proximal soil sensors, several authors tried to use remote sensing data, such as Sentinel-2 multispectral images, in order to predict the spatial soil proprieties, such as organic carbon [57] and electrical conductivity [58]. Other authors tried the Pedotransfer Functions [59] or neural networks [60] in order to improve the accuracy of the models.
We showed that the correlation between the VI and the resistivity map depends strongly on the phenological and developmental stage of the durum wheat. During the whole development of the crop, there is no significant correlation, except during flowering when the linear correlation reaches 0.53 of the R 2 . This is because flowering is the most important and susceptible phase of crop phenological development. During flowering, the crop reaches its maximum development, generating maximum leaf development.
It is at that point that differences in nitrogen uptake due to soil differences are shown in the crop [5].
Moreover, NDVI was the best vegetation index to be related to the resistivity map, while the worst VI was the WV.VI.

Conclusions
Two sites were mapped through an electromagnetic induction sensor to measure the electric conductivity map. An unsupervised machine learning approach was applied to the resistivity maps to detect the presence of different zones. Based on the results of the classification algorithm, multiple soil and crop samples were taken to validate the difference of the zones agronomically.
The algorithm used was able to detect the presence of the two zones for both sites. The soil samples acquired showed a significant difference between zones and not within zones for organic matter, nitrogen and the ratio of carbon-nitrogen. The differences reported on the soil proprieties led to a statistical difference in the grain yield obtained between the zones detected by the k-means algorithm.
This approach could be used to provide a high-quality prescription map to apply the precision agriculture applications. This approach could be scaled at the farm level; one resistivity survey and a few soil samples could generate a high-quality prescription map, containing costs and falling within the farm-year budget. Future work will focus on creating an automated nitrogen fertilization determination method starting from the acquired soil data.
Moreover, the correlation between the VI and resistivity map depends strongly on the phenological and developmental stage of the durum wheat. Therefore, we suggest performing the UAV multispectral images acquisition during the flowering phenological stages to attribute the crop spatial variability to different soil conditions.