1. Introduction
In modern agriculture, crop and soil properties are collected, processed, and analyzed both temporally and spatially, integrating additional information to support decision-making. By addressing variability within agricultural systems, it is possible to improve resource efficiency, productivity, profitability, and the sustainability of agricultural production. Data retrieved by soil sampling, drone and satellite imagery, remote sensors, raster data, and other information technologies are used to gather information about issues such as the heterogeneity of chemical and physical soil properties, crop development, and climate variability. This information can be used as input to decision support systems that assist farmers in decisions such as targeting site-specific water and nutrient needs, yield prediction, and crop harvest management.
Site-Specific Management Zones (SSMZs) are defined as areas within a field that share similar characteristics, identified using data from sources such as soil sampling, remote sensing, or yield monitoring. By dividing fields into these zones, farmers can implement customized management strategies, optimizing the application of resources like fertilizers, water, and seeds. This targeted approach reduces waste, enhances productivity, and supports sustainable farming practices by addressing the unique needs of each zone. Several methods have been proposed to address the management zone delineation problem. Clustering techniques based on information such as soil properties, yield maps, historical seasonal data, and combined factors have been widely studied [1,2,3,4,5,6,7,8,9,10,11,12]. These approaches often result in irregularly shaped zones, which can hinder their practical implementation by farmers.
To overcome this limitation, several models for delineating rectangular and orthogonal management zones have been proposed using mathematical programming, heuristic, and metaheuristic techniques. Rectangular and orthogonal partitions are generally more compatible with agricultural machinery, improving their functionality in practice. Such models typically define field partitions based on agricultural field data, an objective function, e.g., minimizing the number of zones or minimizing the sum of variances within the zones, subject to a required homogeneity index.
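As a rough illustration of the homogeneity index mentioned above, the following sketch computes a relative variance for a given partition. The formula used (one minus the pooled within-zone variance divided by the total variance, with N − k degrees of freedom for N samples and k zones) is an assumed form that is common in the management-zone literature; the definition used by the models discussed here may differ, and the function name and data layout are hypothetical.

```python
from statistics import variance


def relative_variance(zones):
    """Relative variance of a partition (assumed form, see lead-in).

    `zones` is a list of lists; each inner list holds the property
    values of the samples grouped in one zone.
    """
    samples = [value for zone in zones for value in zone]
    n, k = len(samples), len(zones)
    if n < 2:
        raise ValueError("at least two samples are needed")
    total_var = variance(samples)  # sample variance of the whole field
    # One sample per zone, or a perfectly uniform field: treat the
    # partition as fully homogeneous (assumption to avoid dividing by zero).
    if k == n or total_var == 0:
        return 1.0
    # Pooled within-zone variance; single-sample zones contribute nothing.
    within = sum((len(z) - 1) * variance(z) for z in zones if len(z) > 1)
    return 1.0 - within / (total_var * (n - k))


# Hypothetical values: two internally uniform zones versus a single zone.
two_zones = relative_variance([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]])
one_zone = relative_variance([[1.0, 2.0, 3.0, 4.0]])
```

Under this assumed definition, a partition whose zones are internally uniform scores 1, while leaving the whole field as a single zone scores 0, so a required homogeneity level acts as a lower bound on this index.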
For instance, in [13], a binary integer linear programming (BILP) model was proposed for delineating rectangular and homogeneous management zones, minimizing the sum of variances for a soil property. Albornoz et al. [14] introduced a bi-objective mixed-integer linear programming model to simultaneously minimize the number of management zones and maximize the relative homogeneity of the partition. Later, Albornoz et al. [15] presented a robust mixed-integer optimization model that considers the spatial and temporal variability of vegetation or soil indices, combining it with a column generation algorithm for solving the model. More recently, Albornoz et al. [16] developed a linear binary integer program for integrated zoning and crop planning with adjacency constraints, solved using a decomposition-based heuristic. Integrated approaches for the delineation of rectangular management zones in crop planning problems under deterministic and uncertain conditions were proposed in [17] and [18], respectively. Finally, orthogonal management zone delineation has been approached using a greedy heuristic algorithm in [19] and estimation of distribution algorithms in [20,21].
The aforementioned models require as input a dataset that describes the variability of a property across a rectangular field, i.e., soil or crop property values obtained from equidistant samples. Each of these equidistant samples characterizes a small square area unit, and together they define a perfect grid within the rectangular field. When fields are not rectangular (i.e., when samples and their respective area units do not form a perfect grid), equidistant dummy samples are introduced to complete the rectangular shape of the field. The handling of property values for these dummy samples varies among authors. For instance, the authors of [13,14] assigned dummy samples high property values relative to real samples. This approach ensures that dummy samples are grouped separately from real samples, allowing them to be excluded from the final partition afterwards. Conversely, Velasco et al. [20] assigned dummy samples the value of their nearest neighboring sample. This method prevents the formation of management zones composed solely of dummy samples, which are also removed from the final field partition. The risk with these approaches for handling irregularly shaped fields is that assigning arbitrary values to nonexistent samples alters the agricultural field description and affects global and local descriptors involved in the optimization process, such as the total variance of the field and the internal variance of zones containing dummy samples. The distortion in the field description grows as the number of dummy samples increases. The objective of this paper is to present a new methodology based on genetic algorithms to approach the rectangular management zone delineation problem, in both rectangular and irregularly shaped agricultural fields, using only real sample data.
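The effect of dummy samples on the field description can be illustrated with a small numerical sketch. All values below are hypothetical, as are the helper names and the scaling factor used for "high" dummy values; the sketch only shows that both padding strategies change the total variance of the data passed to the optimizer.

```python
from statistics import variance

# Hypothetical property values measured at five real samples.
real = [2.1, 2.4, 2.2, 2.6, 2.3]


def pad_with_high_values(values, n_dummy, factor=10.0):
    """Append dummy samples far above the real range (factor is an assumption)."""
    return values + [max(values) * factor] * n_dummy


def pad_with_neighbor_values(values, neighbor_indices):
    """Append dummy samples that copy the value of a chosen real neighbor."""
    return values + [values[i] for i in neighbor_indices]


var_real = variance(real)
var_high = variance(pad_with_high_values(real, n_dummy=2))
var_neighbor = variance(pad_with_neighbor_values(real, neighbor_indices=[3, 3]))

print(f"real only:        {var_real:.4f}")
print(f"high-value dummy: {var_high:.4f}")
print(f"neighbor dummy:   {var_neighbor:.4f}")
```

In this toy case the high-value strategy inflates the total variance by orders of magnitude, while copying neighbors changes it more subtly; in both cases the optimizer sees a field that differs from the sampled one.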
The remainder of this article is organized as follows: Section 2 presents Materials and Methods, including a general description of the genetic algorithm, and the benchmark algorithm and instances used to evaluate the performance. Section 3 describes and discusses the experimental results. Conclusions and recommendations are presented in Section 4.
3. Experimental Results and Discussion
3.1. Algorithm Parameter Tuning
Several tests were conducted to assess the algorithm’s sensitivity to input parameters, such as initial population size, the share of genetic operations, and the number of evolutionary iterations. Based on these tests, reference values were established and later refined for each optimization case through additional experiments to enhance the algorithm’s tuning. The minimum and maximum number of zones allowed in a field partition (see Figure 2) were set to one and the number of sampled AUs, respectively, for each optimization case.
3.1.1. Initial Population Size
The evolutionary process can only occur if the initial population includes solutions that meet the homogeneity requirements. Therefore, the population size must be large enough to ensure sufficient diversity, including individuals with a relative variance equal to or greater than the required threshold. The appropriate size of the initial population is influenced by instance characteristics such as heterogeneity and size. To evaluate the impact of these factors, tests were conducted for the different study cases. These tests involved varying the initial population size, assessing the percentage of successful iteration starts over a fixed number of trials, and analyzing the effects of changing homogeneity requirements and instance sizes.
Table 1 shows how the initial population size required for a successful start of the evolutionary process depends on the relative variance (RV) requirement for the organic matter and pH properties in the “Quilaco” instance. For organic matter, the required population size increases as the RV requirement grows, with a more pronounced rise when the RV requirement reaches 0.9. In contrast, for pH, a small population size is sufficient to initiate the optimization process. This difference is related to the heterogeneity of the properties within the instance: the total variance of organic matter (4.45) is significantly higher than that of pH (0.012).
Table 2 shows how the instance size affects the required initial population size for Instances 1 to 12 under a relative variance requirement of 0.9. The table shows a non-linear relationship between both parameters, consisting of a first phase (Instances 1 to 5), where the size of the initial population increases as the number of samples increases, and a second phase (Instances 6 to 12), where the population size decreases and stabilizes as the number of samples grows. In the first phase (smaller instances), the size of the population increases with the instance size. This is because in smaller instances the random solutions tend to create many small zones (zones that group few samples), resulting in greater heterogeneity between zones and lower relative variance. Because of this increased variability, more random solutions are necessary to achieve a high relative variance. In the second phase (larger instances), the required number of solutions decreases as the instance size grows. This is because in larger instances the random solutions tend to create larger zones that group more samples, leading to greater homogeneity and a higher relative variance. As the number of zones decreases, fewer solutions are needed to achieve a high relative variance. The transition between these two phases suggests a critical point where spatial averaging starts to dominate, making homogeneous solutions easier to obtain.
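One simple way to reason about the link between population size and a successful start is sketched below. It assumes that each random individual independently satisfies the homogeneity requirement with some probability p; this independence model, the function names, and the example numbers are assumptions made for illustration and are not taken from the experiments reported here.

```python
import math


def start_success_probability(p_feasible, pop_size):
    """Chance that at least one of `pop_size` random individuals is feasible."""
    return 1.0 - (1.0 - p_feasible) ** pop_size


def min_population_size(p_feasible, target):
    """Smallest population size reaching the target success probability."""
    if not (0.0 < p_feasible <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p_feasible must be in (0, 1] and target in (0, 1)")
    if p_feasible == 1.0:
        return 1
    # Solve 1 - (1 - p)^n >= target for the integer n.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_feasible))


# Hypothetical case: 1% of random partitions meet the requirement and
# a 95% chance of a successful start is wanted.
n_needed = min_population_size(p_feasible=0.01, target=0.95)
```

Under this model, the required population size grows quickly as feasible random solutions become rarer, which is consistent with the larger populations needed for stricter relative variance requirements.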
3.1.2. Mutation and Crossover Share
The influence of genetic operations was investigated by applying different proportions of mutations and crossovers to various optimization cases. Over a fixed number of genetic operations—defined as the size of the new generations in the evolutionary process—seven mutation–crossover shares (0–100%, 20–80%, 40–60%, 50–50%, 60–40%, 80–20%, and 100–0%) were tested. Each optimization test was repeated a fixed number of times, allowing for the collection of the minimal, maximal, and average best solutions from the set of optimizations. Additionally, optimal solutions were calculated using the BILP model for each case.
Figure 5 presents the GAZD results for Instance 1 and Figure 6 presents results for Instance 4, both under a relative variance requirement of 0.9. For each mutation–crossover share, the figures display the minimal, average, and maximal values from the set of executions. The BILP optimal solution is 13 zones for Instance 1 and 31 zones for Instance 4.
As shown, the absence of mutations leads to convergence toward local optima, resulting in significant deviations of the GAZD results from the global optimum and a wider range of variation around it. A mutation share of 40% or higher leads to solutions that approach the global optimum and reduces variability around it.
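The seven mutation–crossover shares described above can be turned into operation counts as in the following sketch. The function name, the rounding rule, and the generation size of 100 are hypothetical choices made for illustration; the genetic operators themselves are not shown.

```python
# Mutation shares tested (the crossover share is the complement).
MUTATION_SHARES = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]


def split_operations(n_operations, mutation_share):
    """Split a fixed number of genetic operations into (mutations, crossovers)."""
    if not 0.0 <= mutation_share <= 1.0:
        raise ValueError("mutation_share must lie in [0, 1]")
    n_mutations = round(n_operations * mutation_share)
    return n_mutations, n_operations - n_mutations


# Hypothetical generation size of 100 operations.
plan = {share: split_operations(100, share) for share in MUTATION_SHARES}
for share, (n_mut, n_cross) in plan.items():
    print(f"mutation {share:.0%}: {n_mut} mutations, {n_cross} crossovers")
```

Keeping the total number of operations fixed makes the runs comparable across shares, so that differences in the best solutions can be attributed to the operator mix rather than to the amount of search performed.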
3.1.3. Finalization Criterion
The end of the optimization process is determined by the number of iterations of the evolutionary process. This directly affects the likelihood of converging to a local or global optimum. As with the initial population size, the appropriate setting of this parameter may depend on characteristics of the study case, such as instance size and heterogeneity. To determine the appropriate number of generations for each optimization case, different values of the finalization criterion were tested.
Figure 7 and Figure 8 show the evolution profiles for Instances 3 and 6 under a relative variance requirement of 0.7. The figures display the best solutions, measured by the number of zones, across generations. Reaching solutions with fewer zones requires more iterations for Instance 6 (140 samples) than for Instance 3 (89 samples). As instance size increases, so does the number of generations needed to achieve solutions with fewer zones.
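A generation-count stopping rule and the recording of an evolution profile can be sketched generically as follows. This is not the GAZD implementation: the elitist survivor selection, the function names, and the toy individuals (plain integers standing in for the number of zones, with no homogeneity check) are all assumptions made to keep the example self-contained.

```python
import random


def evolve(population, cost, make_child, n_generations, rng):
    """Run a fixed number of generations and record the best cost per generation."""
    size = len(population)
    profile = []
    for _ in range(n_generations):
        children = [make_child(population, rng) for _ in range(size)]
        # Elitist survivor selection (an assumption): keep the cheapest individuals.
        population = sorted(population + children, key=cost)[:size]
        profile.append(cost(population[0]))
    return population[0], profile


def toy_child(population, rng):
    """Toy operator: perturb a random parent's zone count, never below one."""
    parent = rng.choice(population)
    return max(1, parent + rng.choice([-1, 0, 1]))


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [rng.randint(20, 40) for _ in range(10)]
best, profile = evolve(start, cost=lambda zones: zones,
                       make_child=toy_child, n_generations=30, rng=rng)
```

Plotting `profile` against the generation index gives a curve of the same kind as the evolution profiles discussed above; with elitist selection the curve never increases, and the finalization criterion simply fixes where it is cut off.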
3.2. “Quilaco” Instance
The genetic algorithm was evaluated using the real-field instance “Quilaco” described in Section 2.4.
Table 3 presents the experimental results comparing the GAZD approach with the BILP model. For each property, relative variance values ranging from 0.5 to 1 were used to test the algorithms, resulting in a total of 24 optimization cases. The first column lists the chemical soil properties: organic matter (OM), pH, phosphorus (P), and sum of bases (SB). The second column specifies the homogeneity level of the management zones, represented by the required relative variance. Column 3 shows the optimal solution, in terms of the number of zones, obtained by the BILP model. Columns 4–6 present the results for the genetic algorithm, reporting the minimum, average, and maximum number of zones among the best solutions found across 50 independent runs of the algorithm. The color-highlighted results indicate cases where the GAZD achieved outcomes equal to or better than those of the BILP. As the two algorithms were executed on different computers for this test, execution times are not included in the table, nor is a direct comparison provided. However, time performance is briefly discussed at the end of the section to provide an overview of the differences.
Within each property optimization set, the GAZD exhibited a consistent relationship between the homogeneity constraint and the number of zones: as the relative variance requirement increases, the number of zones in the partitions also tends to increase. The range, i.e., the difference between the maximum and minimum number of zones in the best solutions, varies between zero and five.
Nevertheless, the average range across all optimization cases is 2.71 zones, demonstrating the robustness of the GAZD, as the best solutions do not vary significantly between executions. In more than 50% of the optimizations (15 out of 24), the GAZD found solutions equal to or better than those of the BILP, particularly for cases involving organic matter and sum of bases properties. For the pH and phosphorus optimization sets, the genetic algorithm generally produced solutions that were slightly inferior but still close to those of the BILP. The fact that in several cases the GAZD “outperformed” the optimal solutions produced by the BILP can be attributed to differences in how the models handle the field description. While the BILP requires the addition of two dummy samples to complete a rectangular field shape—thereby assigning non-real property values to the field description—the genetic algorithm excludes these areas from the optimization process. As a result, the two models are effectively dealing with different field representations.
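The per-case statistics used in this comparison (minimum, average, maximum, and range of the best solutions over repeated runs) can be computed as in the sketch below. The run values and the function name are hypothetical and do not correspond to any row of Table 3.

```python
from statistics import mean


def summarize_runs(best_zone_counts):
    """Summarize the best number of zones found in each independent run."""
    lowest, highest = min(best_zone_counts), max(best_zone_counts)
    return {
        "min": lowest,
        "avg": mean(best_zone_counts),
        "max": highest,
        "range": highest - lowest,  # spread between the worst and best run
    }


# Hypothetical best solutions from five runs of one optimization case.
runs = [14, 13, 15, 14, 13]
summary = summarize_runs(runs)
print(summary)
```

A small range across runs indicates that the stochastic search lands on similar partitions each time, which is the sense in which robustness is discussed above.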
BILP execution times (see [20] for details on software and hardware) range from 0.01 to 0.2 s, while GAZD execution times range from 5.57 to 21.25 s. Higher execution times in the GAZD are associated with cases requiring high homogeneity levels, where larger initial population sizes are necessary to ensure sufficient diversity and the presence of well-suited solutions to support the following stages of the evolutionary process.
3.3. Instances 1–6
Table 4 presents the experimental results comparing the BILP and GAZD approaches for the set of Instances 1–6. For each instance, three different relative variance values (0.5, 0.7, 0.9) were used to test the algorithms, resulting in a total of 18 optimization cases. The first column lists the instance index, while the second column shows the instance size in terms of the number of samples. The third column specifies the required homogeneity level of the partition. Columns 4–5 present the results and execution times for the BILP, while Columns 6–9 display the results for the GAZD across 50 independent runs, along with their average execution time. Cases where the GAZD achieved results equal to or better than those of the BILP are highlighted in color.
Since no dummy samples were required, the field descriptions used by both models were identical, allowing for a direct comparison of their results. In this set of tests, GAZD results showed a slight increase in the difference between the best minimal and maximal solutions per case, ranging from zero to eight zones. Nevertheless, GAZD found the global optimum in about 61% of the optimizations (11 out of 18), and for the remaining cases, the deviation of GAZD’s best minimal solution from the BILP results ranged from one to three zones, equivalent to a relative percentage error range of 4.3% to 15.4%. This demonstrates that GAZD is capable of finding global or near-optimal solutions, effectively avoiding premature convergence to local optima.
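The relative percentage error and the share of cases solved to optimality follow from simple arithmetic, sketched below. The zone counts passed to the function are hypothetical pairs chosen only to show the order of magnitude of such deviations; the actual per-case values are those of Table 4. The function name is also an assumption.

```python
def relative_error_pct(gazd_zones, bilp_zones):
    """Deviation of the GAZD solution from the BILP optimum, in percent."""
    return 100.0 * (gazd_zones - bilp_zones) / bilp_zones


# Hypothetical pairs: one extra zone on a 23-zone optimum,
# and two extra zones on a 13-zone optimum.
small_gap = relative_error_pct(24, 23)
large_gap = relative_error_pct(15, 13)

# Share of cases in which the optimum was reached (11 out of 18).
optimal_share_pct = 100.0 * 11 / 18

print(f"{small_gap:.1f}%  {large_gap:.1f}%  {optimal_share_pct:.1f}%")
```

Because the error is relative to the optimum, the same absolute gap of one or two zones weighs more heavily in cases whose optimal partition has few zones.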
However, the BILP approach significantly outperformed GAZD in terms of execution time: BILP execution times ranged from 0.078 to 1.75 s, while GAZD’s average times varied between 8.94 and 375.5 s. The execution times of the genetic algorithm increased with the combinatorial size of the problem (i.e., the number of samples in the instance) and with stricter homogeneity requirements. The execution time ratios (GAZD/BILP) varied widely, from 22.5 to 451 across all optimizations. While GAZD is slower, its execution times remain within acceptable limits for most agricultural decision-making processes, where real-time performance is not critical. Furthermore, strategies such as implementing the model in C and parallelizing processes during execution could significantly improve performance.
3.4. Scalability Analysis
Complementary experiments using a relative variance requirement of 0.9 were conducted with Instances 7–12 to evaluate the GAZD’s performance on larger-scale optimization problems. These results, combined with those from Instances 1–6, provide a comprehensive overview of GAZD’s performance.
Figure 9 and Figure 10 summarize the results from 50 independent runs for each optimization case. The first figure presents the average execution time, while the second illustrates the trends in the minimum, average, and maximum of the best solutions across all instances. The execution time behavior exhibits two distinct phases. The first phase, covering Instances 1 to 6, shows an increasing trend as the instance size grows. This can be explained by the simultaneous increase in both the initial population size required to start the evolutionary process and the number of iterations needed to achieve solutions with fewer zones. The second phase, from Instances 7 to 12, marks a halt in this growth trend, with execution times stabilizing at values slightly lower than the maximum observed in the first phase. This can be attributed to two compensating behaviors: while the required initial population size decreases for this set of instances, the number of iterations increases. Overall, it can be inferred that beyond a certain instance size, execution time ceases to be a critical factor. In such cases, the improvement strategies proposed in previous sections could be sufficient to enhance GAZD’s performance.
Regarding the number of zones in the solutions, the second figure shows that the gap between the minimum and maximum number of zones in the best solutions widens as instance size increases. This indicates that for larger instances, GAZD’s results exhibit increased variability. As a result, the algorithm’s robustness and precision decrease in larger-scale problems, potentially leading to deviations from the optimal solution. This problem should be addressed in future research. Potential solutions could involve exploring how the maximum allowable partition sizes (i.e., the maximum number of zones) can be regulated, either statically or dynamically, within the evolutionary process. In the present study, these values were kept static, fixed to the maximum possible partitioning, which is equivalent to the number of sampled AUs. By varying these parameters, it may be possible to enhance the robustness and precision of the algorithm for larger-scale problems.
3.5. Irregularly Shaped Instances
The results of GAZD for the three irregularly shaped instances are presented in Table 5. Three relative variance values (0.5, 0.7, and 0.9) were used to evaluate GAZD, resulting in a total of nine optimization cases. The first column lists the modified instance, while the second column indicates the number of sampled AUs. The third column specifies the required homogeneity level of the partition. Columns 4–7 present the results obtained across 50 independent runs of GAZD, along with the average execution time.
For this test, a performance comparison between the BILP and the GAZD was not conducted due to the number of dummy samples required by the BILP to handle the non-rectangular shapes. For the modified Instances 4 and 5, the BILP requires 33 dummy samples, and for modified Instance 6, 37 dummy samples are required. This extra information, added to the “real” field description, distorts the values of parameters such as the total variance and the relative variance of the partitions, which are used in the optimization process. In contrast, the GAZD uses only the information from the sampled AUs, preserving the “real” agricultural field description. Compared to the real-instance experiment, where only two dummy samples were involved, the differences between the instance descriptions would be more pronounced in this case, so the two algorithms would be optimizing different problems. In these circumstances, a comparison between the models was deemed not pertinent.
Table 5 shows that the relationship between the homogeneity requirement and the number of zones in the partition remains consistent: as the relative variance requirement increases, the number of zones also increases.
Figure 11 presents examples of solutions for all modified instances under various relative variance requirements. The figure illustrates that, in all cases, the GAZD is able to find solutions that respect the rectangular zone shape constraint.
4. Conclusions
This paper presents a methodology based on genetic algorithms to address the site-specific management zone (SSMZ) delineation problem. The GAZD generates optimized field partitions, using rectangular zones, for both regular and irregularly shaped agricultural fields. While other methodologies add dummy samples to handle the SSMZ problem in non-rectangular fields, the GAZD uses only real sample data, thereby not altering the field description during the optimization process. The possibility to deal with irregularly shaped instances without adding non-real data is the major contribution of this methodology.
To evaluate the algorithm, its performance was compared with that of an exact approach based on integer linear programming (BILP). Experimental tests were conducted using real-field instances and a set of generated irregularly shaped instances of various sizes. Although the GAZD requires longer execution times compared to the exact BILP approach, the algorithm demonstrates functionality, flexibility, and robustness in addressing the SSMZ problem for both rectangular and irregularly shaped agricultural instances, especially for smaller problem sizes. However, for larger instance sizes, a loss of precision and robustness is observed, with the GAZD exhibiting increased variability in the solutions. This issue will be addressed in future work, where potential solutions may involve dynamically adjusting the maximum allowable partition sizes (i.e., the maximum number of zones) during the evolutionary process to enhance the algorithm’s robustness and precision for larger-scale problems.
Moreover, GAZD’s time performance is not critical for the planning decisions related to the SSMZ problem, and there is significant potential for improving its performance. Strategies such as implementing the model in a compiled language and parallel processing could substantially reduce execution times. Additionally, the GAZD provides a set of “good enough” solutions that users can evaluate in terms of feasibility and practical convenience, offering a potential advantage in supporting decision-making processes.