1. Introduction
Crops such as corn and cotton exhibit considerable yield variability due to differences in soil fertility, moisture capacity, and other local conditions, making variable rate seeding (VRS) a promising approach to boost both productivity and sustainability. VRS is an advanced precision agriculture strategy designed to improve crop performance by adapting seeding rates to the variability in soil characteristics, seed cultivar suitability, topography, and environmental factors in a field. This contrasts with the conventional uniform seeding method, which uses a constant seed rate throughout the field, often resulting in inefficient resource use and spatial yield variability, especially in fields exhibiting large spatial or anthropic spatial variability. Overall, studies agree that adapting seeding rates to field variability enhances resource efficiency and yield consistency. The fundamental concept of VRS is to align the seed rates with the specific needs of different management zones (MZ) within a field. For example, areas with fertile soil may benefit from denser plant populations to maximize yields, while less fertile zones may require lower seed rates to prevent excessive competition for scarce resources. By customizing seeding rates, farmers can optimize the use of inputs such as seeds, fertilizers, and water, which could lead to better economic returns and lower environmental impacts. This site-specific adjustment is widely linked to higher input efficiency and sustainability.
Anselmi et al. [1] demonstrated that MZ-specific seeding is a cost-effective strategy that boosts yields while lowering seed costs. For large-scale implementation, the feasibility of VRS is further supported by Speranza et al. [2], who showed that vegetation indices (VIs) alone can be sufficient to delineate these MZs, simplifying the process and making precision seeding more accessible for broader adoption. These findings point toward more practical and scalable VRS applications through remote sensing.
Šarauskis et al. [3] found that using VRS equipment on 400 ha of arable land leads to a one-year payback period, with an average benefit of around 100 EUR per hectare. Their research also concluded that VRS is effective for farms averaging at least 150 ha. For cotton, where plant density plays a critical role in boll formation and overall yield, fine-tuning seeding rates based on soil properties can reduce costs and improve fiber quality [4]. Thus, economic assessments generally support the profitability of VRS in large-scale operations.
Research on the effect of plant density on cotton lint yield presents contrasting findings, often influenced by water availability (Table 1). For instance, Kimura et al. [5] investigated yields in a water-scarce environment, finding no statistical differences across various population densities. However, their study suggested that lower final plant densities optimized net return when compared to higher densities. This implies that producers in water-scarce regions might achieve acceptable economic benefits by reducing seeding rates, even with statistically similar yields. In contrast, studies by Khan et al. [4,6] consistently reported that a specific higher plant density resulted in the highest seed cotton and lint yields (see Table 1 for specific values). Both studies noted that increasing plant density beyond this optimum led to a decrease in yield, attributing the enhanced yield at the optimal density to greater biomass accumulation in reproductive organs. Overall, water availability emerges as a key factor explaining the contrasting yield-density relationships.
Boyer et al. [7] also observed a curvilinear relationship between the seeding rate and the lint yield, where the yield decreased beyond an optimum. Their research identified a yield-maximizing seeding rate, which is listed in Table 1. However, they distinguished this from profit-maximizing objectives, which often lead to different recommendations [7]. These authors concluded that profit-maximizing rates depend on both cotton lint and seed prices; as seed prices increase, the optimal seeding rate for profit decreases. In contrast, an increase in cotton prices leads to an increase in the optimal seeding rate. For example, with a seed price of $5 per 1000 seeds, optimal profit-maximizing seeding rates (shown in Table 1) were identified when the assumed prices for cotton varied from $1.1 to $1.7 kg−1. These profit-maximizing rates were consistently lower than the rate identified for maximizing yield. Hence, most economic analyses show that profit-maximizing seeding rates are lower than yield-maximizing ones.
Further supporting this distinction, Kimura et al. [5] observed that under dryland conditions, lower population densities significantly increased net returns due to reduced seed costs, even when lint yields were statistically similar across different densities. In contrast, irrigated trials showed consistent net returns across all tested population densities. While Khan et al. [6] reported that the optimal plant density for lint yield (see Table 1) also produced the highest net returns, they acknowledged that seed and labor costs substantially impacted profitability. They noted that lower plant densities might sometimes necessitate increased labor for specific field management practices, such as vegetative branch removal. Together, these findings indicate that profitability depends on both input costs and production environment.
The intricate relationship between plant density and cotton yield is modulated by a confluence of environmental and management factors. These include, but are not limited to, geographical location, precipitation levels, heat unit accumulation, soil type, planting dates, tillage practices, the presence of previous crops or cover crops, and fertility and weed control programs [4,5]. Row spacing also plays a critical role. In environments with limited water availability, lower plant densities are frequently identified as optimal [5]. Additionally, delayed planting dates typically reduce yield potential, and in such scenarios, reducing the seeding rate can become a profit-maximizing decision. Collectively, these studies emphasize that seeding optimization requires integrating agronomic, climatic, and management variables.
While the aforementioned research clearly demonstrates the significant potential of VRS to enhance yields, reduce costs, and improve sustainability in various crops (including corn and cotton), widespread adoption of this promising technology remains notably limited. Despite the demonstrated economic benefits, such as a rapid payback period and substantial returns per hectare, farmers continue to face considerable hurdles, including the initial investment in specialized equipment, the complexity of sophisticated data collection and analysis, and the need for specialized expertise to develop precise, site-specific seeding prescriptions. Most authors agree that adoption barriers are primarily technical and financial, not agronomic. Ongoing advancements in precision agriculture, particularly the improvement of spatial data precision and the increase in technology accessibility, are crucial to overcoming these barriers and providing the robust economic evidence necessary to accelerate the broader integration of VRS into diverse agricultural systems.
Although the referenced literature presents promising results, a critical limitation is evident: most of the studies were carried out under strictly controlled conditions on small experimental plots within research institutions. Only Anselmi et al. [1] incorporated on-farm experimentation, which was exclusively focused on maize seeding rates, on a commercial farm. Generalizing recommendations for optimal seed populations from such constrained environments remains challenging due to the vast diversity in soil properties, climatic conditions, and existing cultural practices encountered across commercial agricultural landscapes. This limitation underscores the need for more on-farm studies to validate VRS performance at operational scales. Furthermore, it is crucial to acknowledge the scale of real-world operations; the fields studied in our work range from 78 to 332 ha, commonly managed with fleets of up to five planters and as many as fifteen combines during sowing and harvesting, respectively.
Recognizing these fundamental differences, the present research focused on on-farm investigation within a commercial operation. The primary objective of this study was to establish a practical, data-driven methodology that enabled farm managers to determine the seed population that maximized return on investment under their specific field conditions. This approach was developed under the hypothesis that the essential spatial and operational data required for this analysis were already available on most farms. In doing so, the work aimed to empower farm managers to conduct their own site-specific experimentation with minimal disruption to routine operations.
2. Materials and Methods
2.1. Experimental Design
The cotton experiment was conducted in five rainfed fields on the commercial farm Tucunaré, which is part of the Amaggi Group and located in Sapezal, Mato Grosso, Brazil (12°59′22″ S, 58°45′51″ W) (Figure 1). This site is within the Cerrado biome, the largest tropical savanna in the world, spanning 198.455 Mha and covering 23.3% of the Brazilian territory [8]. These relatively flat fields typically operate under a no-tillage double-cropping system, with soybeans cultivated in the spring-summer period and cotton in the summer-fall-winter period.
Table 2 details the characteristics of each experimental field and its corresponding cotton season. The predominant soil type in the five fields is dystrophic red latosol (Oxisol). Notably, while fields B through E provided data only for the 2023/24 harvest season, Field A uniquely offered data from two consecutive cotton seasons: 2022/23 and 2023/24. For the second season, the farm manager decided to reduce the planted area of Field A by excluding a sandy region that had previously produced very low yields.
The fertilization management was applied as follows:
N: A total of 170 kg ha−1 was applied in four top-dressing splits: one-fifth at 15, 30, and 70 days after sowing (DAS), and the remaining two-fifths at 45 DAS.
P: 40 kg ha−1 was applied as a single dose at sowing.
K: A total of 160 kg ha−1 was applied in three splits: one-fourth at sowing, one-fourth at 30 DAS, and half at 70 DAS.
S: A total of 70 kg ha−1 was applied in two equal splits at 30 and 70 DAS.
B: A total of 3 kg ha−1 was applied, with one-third at sowing and the remaining two-thirds as a top-dressing at 30 DAS.
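The split schedules above can be turned into per-application doses with a few lines of arithmetic. The sketch below is only an illustration: the dictionary name `PROGRAM`, the function name, and the use of 0 DAS for "at sowing" are assumptions made here, while the totals and fractions are copied from the list.

```python
from fractions import Fraction

# Totals (kg/ha) and split fractions copied from the fertilization list.
# Schedule keys are days after sowing (DAS); 0 stands for "at sowing" (assumed encoding).
PROGRAM = {
    "N": (170, {15: Fraction(1, 5), 30: Fraction(1, 5), 45: Fraction(2, 5), 70: Fraction(1, 5)}),
    "P": (40, {0: Fraction(1)}),
    "K": (160, {0: Fraction(1, 4), 30: Fraction(1, 4), 70: Fraction(1, 2)}),
    "S": (70, {30: Fraction(1, 2), 70: Fraction(1, 2)}),
    "B": (3, {0: Fraction(1, 3), 30: Fraction(2, 3)}),
}


def doses_by_das(program):
    """Return {nutrient: {DAS: kg/ha}} after checking that each schedule sums to 1."""
    out = {}
    for nutrient, (total, splits) in program.items():
        if sum(splits.values()) != 1:
            raise ValueError(f"{nutrient} splits do not sum to 1")
        out[nutrient] = {das: float(total * frac) for das, frac in splits.items()}
    return out


schedule = doses_by_das(PROGRAM)
for nutrient, doses in schedule.items():
    print(nutrient, doses)
```

Exact fractions are used so that the consistency check on each schedule is not affected by floating-point rounding.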
Elevation data for all fields were precisely collected using the Real Time Kinematic GPS (RTK-GPS) monitor of a combine harvester, while a clay content map was generated through kriging interpolation from soil samples taken at a density of one sample per 5 ha.
The trial methodology evaluated four seeding rates: 90%, 100%, 110%, and 120% of the seed companies’ recommended rates (Table 3), where 100% represents the recommended dose. The application range of 90% to 120% was determined by the farm manager based on operational discretion and risk tolerance. These populations were applied within defined grid cells, also known as virtual plots (Figure 2), established within the fields. For all fields, rows were spaced at 3 feet (0.9144 m), with each grid cell’s width determined by two passes of the planter. Table 3 also details the dimensions and number of cells according to the planter, determined by farm logistics, and its respective number of rows.
The farm manager made the yield maps available, which were normalized by the weight of the cotton bales harvested by each John Deere CP690. This normalization is necessary to minimize differences in calibration between multiple machines. The yield monitor on the harvester measures the mass flow rate of the cotton being conveyed to the harvester’s basket. The John Deere CP690 specifically uses a microwave sensor that measures the mass flow rate and moisture content as cotton travels through the air ducts. This data, combined with an RTK-GPS signal for location and other machine data, such as speed, is used to calculate and create a detailed yield map for the field. While microwave sensors are susceptible to environmental factors like humidity, proper calibration and normalization procedures help ensure the accuracy of the final yield map. Due to the humidity influence on cotton mass, it is important to harvest the entire field in the same period of the day.
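The text does not give the normalization formula, so the following is only one plausible reading: each harvester's yield-monitor mass is rescaled so that its total matches the weighed bale mass for that machine. The record keys `harvester` and `mass_kg`, the function name, and the sample numbers are hypothetical.

```python
def normalize_by_bale_weight(points, bale_mass_kg):
    """Rescale monitor mass so each harvester's total equals its weighed bale mass.

    points: list of dicts with hypothetical keys "harvester" and "mass_kg".
    bale_mass_kg: {harvester: total weighed bale mass in kg}.
    """
    monitor_total = {}
    for p in points:
        monitor_total[p["harvester"]] = monitor_total.get(p["harvester"], 0.0) + p["mass_kg"]
    # One correction factor per machine (assumed form of the normalization).
    factor = {h: bale_mass_kg[h] / total for h, total in monitor_total.items()}
    return [dict(p, mass_kg=p["mass_kg"] * factor[p["harvester"]]) for p in points]


# Made-up readings for two harvesters, H1 reading low and H2 reading high.
points = [
    {"harvester": "H1", "mass_kg": 10.0},
    {"harvester": "H1", "mass_kg": 30.0},
    {"harvester": "H2", "mass_kg": 20.0},
    {"harvester": "H2", "mass_kg": 20.0},
]
bales = {"H1": 44.0, "H2": 36.0}
normalized = normalize_by_bale_weight(points, bales)
```

A single multiplicative factor per machine preserves the spatial pattern recorded by each harvester while removing between-machine calibration offsets.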
The four populations were distributed in grid cells using a 4 × 4 Latin square design, as shown in Figure 2.
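As an illustration of what a 4 × 4 Latin square of the four seeding rates looks like, the sketch below builds one by cyclic shifting and checks the defining property (each rate appears once per row and once per column). The cyclic construction is an assumption made for the example; the arrangement actually used in the fields is the one shown in Figure 2 and may differ.

```python
RATES = [90, 100, 110, 120]  # percent of the recommended seeding rate


def cyclic_latin_square(treatments):
    """Build an n x n Latin square by shifting the treatment list one step per row."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


layout = cyclic_latin_square(RATES)
for row in layout:
    print(row)
```

In practice, rows, columns, or treatment labels of such a square are often permuted at random before the prescription map is generated.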
The climate of the region is characterized as tropical monsoon (Am) according to the Köppen climate classification [9], and the shapefiles of the climate zones of Brazil were obtained from [10]. The rainfall distribution and temperatures during each crop season are presented in Table 4 in consecutive intervals of 30 DAS. Daily weather in each field during the crop season is represented in Figure 3. Precipitation data were obtained from the Climate Hazards Center InfraRed Precipitation with Station data (CHIRPS), a quasi-global high-resolution (0.05°) rainfall dataset that spans over 40 years, from 1981 to the near-present [11]. CHIRPS combines satellite imagery from the infrared spectrum with data from on-the-ground weather stations to produce accurate gridded rainfall time series. The vertical resolution (north-south) is constant, approximately 5.55 km, while the horizontal resolution (east-west) is about 5.41 km at a latitude of −12.99°. Temperature data were obtained from the fifth generation of European ReAnalysis (ERA5) produced by the Copernicus Climate Change Service (C3S) at the European Centre for Medium-Range Weather Forecasts (ECMWF) [12]. ERA5-Land provides hourly estimates for numerous atmospheric, land, and oceanic climate variables, including 2-m air temperature. This reanalysis product is created by combining a large number of observations from a variety of sources with an advanced numerical weather prediction model, resulting in a complete and consistent global data set with 31 km of horizontal resolution.
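The CHIRPS cell dimensions quoted above follow from simple spherical geometry: the north-south extent of a 0.05° cell is fixed, while the east-west extent shrinks with the cosine of the latitude. The sketch below reproduces the two figures, assuming a round value of 111 km per degree of latitude (the constant and the function name are choices made for this example).

```python
import math

KM_PER_DEGREE = 111.0  # assumed approximate length of one degree of latitude


def cell_size_km(resolution_deg, latitude_deg):
    """Return (north-south, east-west) cell size in km for a regular lat/lon grid."""
    ns = resolution_deg * KM_PER_DEGREE
    ew = ns * math.cos(math.radians(latitude_deg))
    return ns, ew


ns_km, ew_km = cell_size_km(0.05, -12.99)
print(f"north-south: {ns_km:.2f} km, east-west: {ew_km:.2f} km")
```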
Significant differences in accumulated rainfall profiles are observed in Figure 3, likely because the fields are geographically separated by distances ranging from 1.6 km to 19.2 km.
2.2. Management Zones Delineation
On-farm experimentation is fundamentally restricted by the availability of existing datasets at the farm level and by the financial or logistical capacity of the farm manager to obtain additional information. For this investigation, we received the soil clay content analysis data and the yield maps of the plots, normalized with the mass of weighed cotton bales. The sowings were carried out according to the seed population application maps (Figure 2), which were previously generated and downloaded to the planter’s monitors.
Recognizing that soil clay content and elevation alone were insufficient for a comprehensive management zone delineation, we augmented our data set with remote sensing data. We specifically acquired the mean values of twelve distinct VIs (Table 5), derived from Sentinel-2, spanning from September 2018 to May 2025. This collection of VIs reflects crop variations, including factors such as water availability, senescence, chlorophyll content, and biomass [2]. These additional data were collected using a JavaScript script developed within the cloud-based platform Google Earth Engine (GEE). To ensure high-quality, cloud-free optical satellite surface reflectance imagery for analysis, Sentinel-2 data were processed by loading a collection of Sentinel-2 images that have been processed with the Cloud Score+ algorithm (version 1) [13] to provide quality assessment information (such as cloud and shadow masks). The parameter CLEAR_THRESHOLD was set to 0.7, which means that only pixels where the cloud score (cs or cs_cdf) is greater than or equal to 0.7 were retained; pixels with scores below this threshold were masked out because they are considered too cloudy or obscured. Furthermore, only images with an average enhanced vegetation index (EVI) between 0.5 and 0.9 were retained. This specific EVI range ensured that the acquired data solely represented active crop growth periods, optimizing the relevance of the clustering method to determine the MZs. Research by Siqueira et al. (2024) [14] indicates that EVI and Triangular Vegetation Index (TVI) demonstrate the strongest correlation with cotton yield in the Brazilian Cerrado biome. This correlation is particularly evident between 90 and 150 DAS, a period corresponding to crucial phenological phases such as development of the boll, open boll, and fiber maturation. This finding underscores the value of these VIs for in-season cotton yield prediction in the region.
In summary, the JavaScript script executes the following steps:
Clip satellite image tile with field boundaries: The code first clipped the Sentinel-2 image tiles using a shapefile mask representing the field boundaries.
Filter images: It applied the aforementioned constraints (exclusion of cloudy and shadowed images, EVI range), ensuring only relevant data were included.
Calculate average VIs values: Using the 10 m × 10 m grid of Sentinel-2, the script computed the average value of each of the twelve VIs and incorporated these values into the clipped raster file.
Save results: The processed data were then saved in raster format for further analysis.
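The filtering and averaging steps above can be sketched outside Earth Engine. The following is not the authors' JavaScript script; it is a small pure-Python illustration on made-up reflectances, assuming that the field-average EVI of an image is computed over its clear pixels only and using the standard EVI formula. All names and sample values are hypothetical.

```python
CLEAR_THRESHOLD = 0.7   # minimum Cloud Score+ value for a pixel to count as clear
EVI_RANGE = (0.5, 0.9)  # accepted field-average EVI (active crop growth)


def evi(nir, red, blue):
    """Enhanced vegetation index from surface reflectances on a 0-1 scale."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def masked_evi(image):
    """EVI per pixel, with None where the cloud score is below the threshold."""
    return [evi(p["nir"], p["red"], p["blue"]) if p["cs"] >= CLEAR_THRESHOLD else None
            for p in image]


def mean_valid(values):
    """Mean of the non-masked values, or None if every value is masked."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def temporal_mean_evi(images):
    """Per-pixel mean EVI over images whose field-average EVI lies within EVI_RANGE."""
    kept = []
    for image in images:
        values = masked_evi(image)
        field_mean = mean_valid(values)
        if field_mean is not None and EVI_RANGE[0] <= field_mean <= EVI_RANGE[1]:
            kept.append(values)
    n_pixels = len(images[0]) if images else 0
    return [mean_valid([img[i] for img in kept]) for i in range(n_pixels)], len(kept)


crop = {"nir": 0.50, "red": 0.05, "blue": 0.03}
soil = {"nir": 0.25, "red": 0.15, "blue": 0.08}
images = [
    [dict(crop, cs=0.9), {"nir": 0.45, "red": 0.06, "blue": 0.03, "cs": 0.8}],  # clear crop
    [dict(soil, cs=0.9), dict(soil, cs=0.9)],  # bare soil: mean EVI too low, dropped
    [dict(crop, cs=0.3), dict(crop, cs=0.9)],  # first pixel cloudy, masked
]
mean_evi, n_kept = temporal_mean_evi(images)
```

The same pattern applies to each of the twelve VIs: only the index formula changes, while the masking and image selection stay the same.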
For the defined acquisition period, 61, 56, 66, 26, and 60 images were successfully obtained that met all filtering constraints for plots A, B, C, D, and E, respectively. The raster files were converted to vector shapefiles with a Python (version 3.10.12) script using the GeoPandas library.
The grid cells defined for the seeding operation (Figure 2) were subsequently processed as a base grid in QGIS (2025) [15] using the “Join attributes by location (summary)” tool. This process was used to average data points in the shapefiles containing the twelve VIs, soil clay content, elevation, and the previous crop yield map. The resulting shapefiles for each plot were then processed using the Python GeoPandas library to open the geospatial shapefiles as DataFrames and conveniently apply the K-Means clustering algorithm, available in the machine learning library scikit-learn [16].
K-means clustering relies on similarity measured by distance metrics, such as the Euclidean distance. Consequently, the algorithm is sensitive to the scale of the input variables, which requires their normalization. While VIs typically range from −1 to a few units, soil clay content can vary by tens or hundreds (depending on the measurement unit), and elevation may span hundreds or thousands of units. Schenatto et al. [17] evaluated standard score, range, and average methods for data normalization in delineating MZs and concluded that the range (Equation (1)) has the best performance. The corresponding method in Python is the class “MinMaxScaler” of the module “sklearn.preprocessing” within scikit-learn.

$$Z = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

where $Z$ is the scaled value in the grid cell centroid, $X$ is the original value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the entire field, respectively.
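A minimal sketch of the range normalization in Equation (1), written in plain Python so that it runs without scikit-learn. The sample clay and elevation values are made up, and returning zeros for a constant variable is a choice made here for the example.

```python
def min_max_scale(values):
    """Scale values to [0, 1] using the field minimum and maximum (range method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information for clustering.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical per-cell values on very different scales.
clay = [380.0, 450.0, 520.0]
elevation = [540.0, 552.0, 570.0]

clay_scaled = min_max_scale(clay)
elevation_scaled = min_max_scale(elevation)
print(clay_scaled, elevation_scaled)
```

After scaling, both variables span the same unit interval, so neither dominates the Euclidean distances used by K-means.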
To determine the optimal number of clusters (k), we employed the KElbowVisualizer diagnostic visualization tool from the Yellowbrick library. This tool extends scikit-learn, offering robust visual analytics for machine learning tasks.
The KElbowVisualizer automates the process of running K-Means across a range of potential k values, typically from 1 to 10. For each k, it trains a K-Means model and calculates the distortion score, also known as the within-cluster sum of squares (WCSS) or inertia. This score quantifies the compactness of the clusters; a lower score indicates more tightly grouped clusters.
The KElbowVisualizer then plots the distortion on the y-axis against the number of clusters (k) on the x-axis. The resulting graph often resembles an arm, and the “elbow” is the critical point. This “elbow” indicates where the rate of decrease in distortion sharply slows down. This point is considered a strong candidate for the optimal k because adding more clusters beyond this point yields only marginal reductions in distortion; essentially, simply subdividing already well-formed clusters rather than revealing new meaningful groupings.
With k established, the aim was to identify the optimal combination of 14 parameters (12 VIs, soil clay content, and elevation). Each of the possible parameter combinations was evaluated within a clustering routine, using the Python function “combinations()” of the “itertools” module, to maximize yield variance reduction (VR) [17]. This is a key indicator of the clusters’ ability to explain yield variability, and was quantified using Equation (2):

$$VR = \left(1 - \frac{\sum_{i=1}^{c} W_i \, V_{mz_i}}{V_{field}}\right) \times 100 \qquad (2)$$

where $c$ represents the number of MZs, $W_i$ represents the fraction of the total area occupied by the $i$-th MZ, $V_{mz_i}$ is the variance of yield in the $i$-th MZ, and $V_{field}$ is the yield variance within the total area of the plot.
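The variance reduction criterion and the enumeration of candidate variable subsets can be sketched as follows. This is an illustration under stated assumptions, not the authors' routine: grid cells are treated as equal in area (so each zone's area fraction is its share of cells), population variances are used, VR is expressed in percent, and the function names and sample data are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def variance_reduction(yields, zones):
    """Percent reduction of yield variance explained by the zoning (equal-area cells)."""
    v_field = pvariance(yields)
    n = len(yields)
    weighted = 0.0
    for zone in set(zones):
        zone_yields = [y for y, label in zip(yields, zones) if label == zone]
        weighted += (len(zone_yields) / n) * pvariance(zone_yields)
    return (1.0 - weighted / v_field) * 100.0


def all_feature_subsets(names):
    """Yield every non-empty subset of the candidate clustering variables."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


# Made-up yields for four cells split into two zones.
yields = [10.0, 12.0, 20.0, 22.0]
zones = [1, 1, 2, 2]
vr = variance_reduction(yields, zones)
n_subsets_of_three = len(list(all_feature_subsets(["EVI", "clay", "elevation"])))
```

If every subset size is allowed, 14 candidate variables give 2^14 − 1 = 16,383 non-empty subsets; each would be clustered and scored with `variance_reduction`, keeping the subset with the highest value.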
To validate intra-cluster consistency, we used the silhouette width (SW) [18], or silhouette coefficient, which quantifies how similar a feature is to its own cluster compared with other clusters. This metric ranges from −1 to +1, where values exceeding 0.7 indicate “strong” consistency, values above 0.5 suggest “reasonable” consistency, and values greater than 0.25 signify “weak” consistency.
A step-by-step flowchart (Figure 4) was created to illustrate the entire methodology for delineating MZs.
2.3. Statistical Analysis of Population Effect on Cotton Yield
The primary research question addressed was: Does variation in plant population density within a specified range affect cotton yield? To investigate this question while accounting for spatial variability in heterogeneous production fields, a single-plot approach using a Latin square (LS) design was employed. Five production fields were sown with four target population densities, centered around the regional standard density, informed by farmer expertise and seed company recommendations. Each field was systematically arranged to incorporate plant density variation, with the analytical focus on estimating the effect of population density in typically heterogeneous fields. Given the quantitative nature of the controlled variable (population density) and the limited number of target densities, linear regression models were fitted in SAS (version 9.4) to evaluate the relationship with cotton yield. For each target plot, linear regression models were fitted to data from LS designs, selected to include the target plot and allowing up to four missing plots to accommodate irregular field shapes. The slopes from these models were combined in Python to classify the density effect as negative (−1), positive (+1), or non-significant (0), based on their sign and significance (F-test). The relationship between these classified effects and the predefined management zones was evaluated using statistical routines available in R software (version 4.4.3) [19] by analyzing their joint and marginal distributions. All models adjusted for spatial variation, with significance assessed at a prespecified level. Details of each step are provided in the following subsections.
2.3.1. Latin Square Selection
To evaluate the effect of plant population density on cotton yield, each field was divided into a rectangular grid of rows and columns, with four target population densities (TarPop, plants m−1) assigned in a 4 × 4 LS design. For each grid point (a row and column position), a 4 × 4 subgrid was extracted and assigned a unique identifier formatted as R_C, stored in the dataset column LS. Subgrids with at least 12 plots (nplot ≥ 12), allowing for the absence of up to one row or column to accommodate geometrically irregular fields, were retained as LSs. Each plot within a subgrid was assigned a unique identifier formatted as R_C, stored in the dataset column IDplot. The algorithm produces a dataset LSs containing the LS subgrids (with columns row, column, TarPop, PopApl, Yield, IDplot, and nplot, referring to the grid position, target and applied plant density, yield, plot ID, and number of plots in the subgrid) and a dataset LSPlotList storing subgrid identifiers in column LS and their comma-separated plot lists in column Plots. The selection process is detailed in Algorithm A1, Appendix A.
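The selection step can be pictured with a short sketch. It is not Algorithm A1 itself: it assumes that each subgrid is anchored at its top-left grid point, that plots are keyed by (row, column) tuples, and that identifiers are built as "row_column"; the function name and the test grid are hypothetical.

```python
def extract_ls_subgrids(plots, size=4, min_plots=12):
    """Return {"R_C": [plot ids]} for every size x size window with enough plots.

    plots: dict keyed by (row, column); missing keys are plots outside the field.
    """
    rows = [r for r, _ in plots]
    cols = [c for _, c in plots]
    kept = {}
    for r in range(min(rows), max(rows) + 1):
        for c in range(min(cols), max(cols) + 1):
            members = [f"{i}_{j}"
                       for i in range(r, r + size)
                       for j in range(c, c + size)
                       if (i, j) in plots]
            # Keep windows missing at most one full row or column (16 - 4 = 12 plots).
            if len(members) >= min_plots:
                kept[f"{r}_{c}"] = members
    return kept


# Hypothetical complete 5 x 5 field; values would normally hold TarPop, PopApl, Yield.
field = {(r, c): {} for r in range(1, 6) for c in range(1, 6)}
subgrids = extract_ls_subgrids(field)
plot_list = {ls_id: ",".join(ids) for ls_id, ids in subgrids.items()}
```

The `plot_list` dictionary mirrors the idea of the LSPlotList dataset, with one comma-separated plot list per retained subgrid.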
2.3.2. Regression Analysis
For each LS, cotton yield (Y, kg ha−1) was modeled as a function of the applied plant density (PopApl, plants m−1), centered on the overall mean of the field, using a generalized linear mixed model (GLMM) implemented in the GLIMMIX procedure of SAS [20]. The model was fitted using restricted pseudo-likelihood (method = RSPL) with the linear predictor:

$$\eta = X\beta$$

where $X$ is the design matrix for plant density, and $\beta$ is the regression slope, specified without an intercept (noint). Spatial heterogeneity was accounted for by including random effects for row and column, modeled as described by [21]:

$$Y = X\beta + Zu + \varepsilon$$

where $Y$ is the yield vector, $Z$ is the design matrix for random effects, $u$ represents row and column effects, and $\varepsilon$ is the residual error. The covariance matrix $G$ was specified as diagonal to account for spatial heterogeneity by allowing independent random effects for rows and columns, improving the precision of the estimate of $\beta$, while $R$ was diagonal with homoscedastic residual errors to maintain a simple residual structure. Parameter estimates were extracted to ParEstLS using the SAS Output Delivery System (ODS output). A data step added a column Recp to ParEstLS, assigning −1 for a significant (F-test) negative slope, 1 for a significant positive slope, and 0 otherwise. The Recp value represents the predicted effect of increasing plant density by one plant per meter above the overall mean of the field.
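The rule that turns a fitted slope into a Recp value is simple enough to state as code. The sketch below is a Python rendering of that rule rather than the SAS data step; the function name is hypothetical, and the default `alpha=0.05` is only a placeholder for the significance level used in the study.

```python
def classify_slope(slope, p_value, alpha=0.05):
    """Return -1, 1, or 0 for a significant negative, significant positive, or other slope.

    alpha is a placeholder significance level, not a value taken from the paper.
    """
    if p_value < alpha and slope < 0:
        return -1
    if p_value < alpha and slope > 0:
        return 1
    return 0


# Made-up (slope, p-value) pairs from three LS models.
examples = [(-35.2, 0.01), (12.4, 0.03), (-8.0, 0.40)]
recp = [classify_slope(slope, p) for slope, p in examples]
print(recp)
```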
2.3.3. Combining Multiple Models for Density Effect Classification
Each plot within a LS is included in multiple regression models, each yielding a potentially different predicted density effect class (Recp: −1, +1, or 0). To assign a single density effect class to each plot, two ensemble approaches were employed. The first approach used majority voting (Algorithm A2) over the Recp values from all models containing the plot. The second approach extended the first by including Recp values from models of its immediate neighbors, as illustrated in Figure 5. Ties were resolved by the sign of the sum of all Recp values. The classification process is detailed in the three-stage algorithmic pipeline (Algorithm A3). Alternative variations of these criteria were explored, including one weighting votes by the proportion of plots in the LS relative to 16, and another using non-centered plant population density with standard linear regression models. These variations were disregarded because most plots had similar density effect classes, despite differing class proportions, and exhibited weaker association with management zones, as described in the subsequent Section 2.3.4.
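The two voting schemes described above can be sketched in a few lines. This is an illustration of the stated rules (most frequent class wins; ties go to the sign of the sum of all votes), not a transcription of Algorithms A2 and A3; the function names and the sample votes are hypothetical.

```python
from collections import Counter


def majority_vote(recp_values):
    """Most frequent class among -1, 0, 1; ties are resolved by the sign of the sum."""
    if not recp_values:
        raise ValueError("at least one Recp value is required")
    counts = Counter(recp_values)
    best = max(counts.values())
    winners = [cls for cls, n in counts.items() if n == best]
    if len(winners) == 1:
        return winners[0]
    total = sum(recp_values)
    return (total > 0) - (total < 0)  # sign of the sum: -1, 0, or 1


def neighbor_extended_vote(own_values, neighbor_values):
    """Vote over the plot's own models plus those of its immediate neighbors."""
    return majority_vote(list(own_values) + list(neighbor_values))


own = [-1, -1, 0, 0]            # tie between -1 and 0 among the plot's own models
neighbors = [0, 0, 0, 1]        # votes borrowed from adjacent plots
mv_class = majority_vote(own)
nemv_class = neighbor_extended_vote(own, neighbors)
```

In this made-up case the plot's own models lean negative, while pooling the neighbors' models moves the class to non-significant, which shows how the second scheme smooths isolated classifications.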
2.3.4. Association Analysis Between Predicted Classes and Management Zones
To evaluate the association between predicted plant density class effects and management zones, chi-square tests of independence were applied to contingency tables (MZ × Cl), and Cramér’s V was computed to assess the strength of association. The chi-square test of independence, implemented via the chisq.test function in the stats package [19], was used as the default method, with the test statistic calculated as:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ denotes observed frequencies and $E_{ij}$ denotes expected frequencies, computed from the marginal totals of the contingency table. The associated p-value is derived from the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the number of rows and columns in the contingency table, respectively.
To ensure the validity of the chi-square test, expected frequencies ($E_{ij}$) were evaluated. The chi-square test was deemed appropriate when no expected frequency was less than 1 and at most 20% of cells had expected frequencies below 5 (i.e., at most one cell or none, depending on the table dimensions). If these conditions were not met, Fisher’s exact test, implemented via the fisher.test function in the stats package, was applied. Fisher’s exact test computes an exact p-value based on the hypergeometric distribution, suitable for tables with small frequencies or sample sizes. For cases where Class contained only one category, no independence test was performed.
For exploratory purposes, when Fisher’s exact test was used due to inadequate expected frequencies, the chi-square test statistic and p-value were computed and reported as exploratory results, with a caveat that they were not reliable for inference due to assumption violations.
Expected frequencies for all contingency tables were calculated and recorded to justify the choice between chi-square and Fisher’s exact tests, using:

$$E_{ij} = \frac{R_i \, C_j}{n}$$

where $R_i$ is the row total for row $i$, $C_j$ is the column total for column $j$, and $n$ is the total sample size.
The strength of association between MZ and Cl was quantified using Cramér’s V, calculated via the assocstats function in the vcd package [22], as:

$$V = \sqrt{\frac{\chi^2}{n \, \min(r - 1,\; c - 1)}}$$
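The quantities used in this subsection (expected frequencies, the Pearson chi-square statistic, the adequacy rule for choosing between the chi-square and Fisher tests, and Cramér's V) can be checked by hand with a short sketch. It is written in plain Python rather than R, applies no continuity correction, assumes no row or column total is zero, and uses a made-up 2 × 3 table; the function names are hypothetical.

```python
import math


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]


def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for row_o, row_e in zip(table, expected)
               for o, e in zip(row_o, row_e))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_is_adequate(table):
    """No expected count below 1 and at most 20% of cells with expected count below 5."""
    cells = [e for row in expected_frequencies(table) for e in row]
    return min(cells) >= 1 and sum(e < 5 for e in cells) <= 0.2 * len(cells)


def cramers_v(table):
    """Cramér's V from the Pearson chi-square statistic."""
    stat, _ = chi_square(table)
    n = sum(sum(row) for row in table)
    return math.sqrt(stat / (n * min(len(table) - 1, len(table[0]) - 1)))


# Made-up MZ x Cl table: two management zones by three density effect classes.
table = [[20, 10, 10],
         [10, 10, 20]]
stat, df = chi_square(table)
v = cramers_v(table)
use_chi_square = chi_square_is_adequate(table)
```

When `chi_square_is_adequate` returns False, the text above calls for Fisher's exact test instead, which this sketch does not implement.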
The effect of plant population density (PopApl) on cotton yield was studied for plots categorized by management zones (MZ) and density effect classes (Cl) using linear regression models, requiring a minimum of eight plots per model, based on the framework in Section 2.3.2, except with non-centered PopApl and an intercept.
4. Discussion
The findings of this research on optimizing cotton cultivation through VRS in commercial fields provide some insights, particularly when interpreted in the context of previous studies and the inherent complexities of real-world agricultural systems. The primary hypothesis that VRS can enhance crop performance by adapting seeding rates to field variability is strongly supported by the demonstrated ability of management zones (MZs) to identify areas that respond differently to plant density variations.
Previous research on cotton plant density presents contrasting findings, often influenced by environmental factors such as water availability. For instance, Kimura et al. [5] found no statistical differences in lint yield across various population densities in water-scarce environments, suggesting that lower final plant densities could improve net return through reduced seed costs without a yield penalty. In contrast, Khan et al. [6] consistently reported that higher plant densities (e.g., 87,000 plants ha−1) resulted in the highest seed cotton and lint yields, attributing this to greater biomass accumulation. Boyer et al. [7] identified a yield-maximizing seeding rate around 118,000 plants ha−1, but crucially distinguished this from profit-maximizing rates, which were consistently lower (78,000 to 93,000 plants ha−1, dependent on cotton and seed prices).
Our study’s results for the 2023/2024 crop season—showing no positive response to increasing plant density above the field averages and negative responses in a significant percentage of plots (e.g., 84% in Field A, 2023/2024)—align more closely with the perspective that lower densities can be optimal, especially for profit maximization or under specific environmental conditions. The comparison of Field A across two consecutive seasons (2022/2023 and 2023/2024) further highlights the influence of unpredictable weather conditions. A shift from a 69% positive response in 2022/2023 to predominantly negative responses in 2023/2024, despite consistent sowing densities, underscores the dynamic interplay between plant density, climate, and yield. This dynamic nature emphasizes the importance of site-specific adjustments that account for inter-seasonal variability.
The statistical correlation indicating that lower clay content is associated with yield reductions under higher plant densities should be interpreted within the framework of resource competition dynamics. Clay content serves as a fundamental determinant of soil physical and chemical properties, governing critical processes such as water retention, nutrient availability, and root development. In MZs characterized by lower clay content—such as MZ 1 (38% clay)—resource availability is inherently constrained; consequently, increased plant density exacerbates inter-plant competition, culminating in the observed negative yield responses. Quantitative evidence supports this interpretation: regression analyses revealed substantially greater yield reductions in low-clay zones (e.g., a 23 kg ha−1 difference) compared with zones possessing higher clay content (e.g., MZ 2 at 52% clay), underscoring that resource scarcity amplifies the adverse effects of elevated plant population density.
The delineation of MZs proved to be an effective strategy, with significant associations found between MZs and plant density effect classes across most fields. Our findings consistently showed that lower clay content was correlated with yield losses at higher plant densities, particularly in fields A, B, and C. This provides strong evidence that soil properties, in conjunction with other factors, critically modulate the optimal seeding rate and yield response within different management zones.
Limitations emerged when transitioning from small, controlled research plots to large-scale, operational on-farm experimentation. Regarding spatial resolution, the methodology depended on input data of varying granularity: soil clay content maps were generated via kriging interpolation from sparsely distributed samples (one per 5 ha), while climate datasets—such as CHIRPS precipitation and ERA5-Land temperature—had relatively coarse spatial resolutions, ranging from approximately 5 km to 31 km. This disparity in spatial scale contrasts with the actual heterogeneity observed across the commercial fields, which are separated by distances of up to 19.2 km and exhibit substantial differences in accumulated rainfall.
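The scale mismatch described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes square climate grid cells and a square soil-sampling grid, takes the 5 km and 31 km resolutions and the one-sample-per-5-ha density from the text, and uses the 78 ha and 332 ha field sizes reported for the study fields. The function names are hypothetical.

```python
import math


def cell_area_ha(side_km):
    """Area of a square grid cell in hectares (1 km^2 = 100 ha)."""
    return side_km ** 2 * 100


def sample_spacing_m(ha_per_sample):
    """Side of the square that one soil sample represents, in metres."""
    # 1 ha = 10,000 m^2; assumes samples lie on a regular square grid.
    return math.sqrt(ha_per_sample * 10_000)


# Soil sampling density taken from the text: one sample per 5 ha.
print(f"soil sample spacing: {sample_spacing_m(5):.0f} m")

# Climate grid resolutions (5 km and 31 km) versus the smallest and
# largest field sizes reported for the study (78 ha and 332 ha).
for side_km in (5, 31):
    area = cell_area_ha(side_km)
    for field_ha in (78, 332):
        share = field_ha / area
        print(f"{side_km} km cell = {area} ha; "
              f"a {field_ha} ha field covers {share:.1%} of it")
```

Under these assumptions, even the largest field occupies only a fraction of a single climate grid cell, so within-field climatic differences cannot be resolved from such data.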
With respect to climatic variability, the inter-seasonal shift in field A’s yield response (from 69% positive in 2022/2023 to predominantly negative in 2023/2024), despite constant sowing density, highlights that optimal seeding rates are highly sensitive to unpredictable weather conditions. Finally, the experimental design was necessarily constrained by the overarching objective of minimizing operational interference and relying exclusively on existing farm data. This constraint required a complex statistical framework, in which a single density-effect class for each plot was derived by combining outcomes from up to 16 regression models through ensemble methods (MV and NEMV), reflecting the inherent difficulty of isolating variables in heterogeneous commercial environments.
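As a rough illustration of how several model outcomes can be collapsed into one density-effect class per plot, the sketch below assumes that MV denotes a simple majority vote over the class labels −1, 0, and +1. The function name, the tie rule, and the example votes are hypothetical and are not taken from the study; NEMV is not sketched here.

```python
from collections import Counter


def majority_vote(model_classes, tie_class=0):
    """Combine per-model effect classes (-1, 0, +1) for one plot.

    Assumption: when two or more classes share the top count, the plot
    falls back to `tie_class`. The study's own tie rule may differ.
    """
    if not model_classes:
        raise ValueError("at least one model outcome is required")
    ranked = Counter(model_classes).most_common()
    top_count = ranked[0][1]
    winners = [cls for cls, n in ranked if n == top_count]
    return winners[0] if len(winners) == 1 else tie_class


# Hypothetical outcomes from 16 regression models for a single plot.
plot_outcomes = [-1] * 9 + [0] * 5 + [1] * 2
print(majority_vote(plot_outcomes))
```

A vote of this kind trades detail for robustness: individual model disagreements are discarded, but no single regression specification determines the class assigned to a plot.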
In context, the most substantial potential for profit improvement indicated by our results arises from the high proportion of plots exhibiting either no positive yield response (Class 0) or significant yield loss (Class −1) when plant density exceeded field averages during the 2023/2024 season. In fields B, C, and D, Class 0 responses were observed in 83%, 69%, and 95% of plots, respectively. Under these conditions, VRS prescriptions would recommend reducing seeding rates toward the lower end of the tested range (e.g., 8.0 plants m−1 in field D and 5.3 plants m−1 in field B). In these non-responsive plots, lower seeding rates would be expected to reduce seed use without a yield penalty, consistent with prior evidence that VRS improves resource-use efficiency and promotes sustainable production practices. This site-specific adjustment in seeding rates, which prevents excessive plant competition, is widely linked to higher input efficiency and overall sustainability.
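The prescription logic described in this paragraph can be sketched as a simple mapping from effect class to seeding rate. This is a minimal sketch under stated assumptions: the rule for Class +1 plots (moving to the upper end of the tested range) is not given in the text and is assumed here, and the upper bound and field mean in the example are placeholder values, not study results. Only the 8.0 plants m−1 lower bound for field D comes from the text.

```python
def prescribe_rate(effect_class, tested_min, tested_max):
    """Pick a seeding rate (plants per metre of row) from a plot's class.

    Class 0 (no positive response) and Class -1 (yield loss) move to the
    lower end of the tested range, as described in the text.
    Assumption: Class +1 moves to the upper end of the tested range.
    """
    if effect_class in (0, -1):
        return tested_min
    if effect_class == 1:
        return tested_max
    raise ValueError(f"unknown effect class: {effect_class}")


def seed_saving_fraction(prescribed, field_mean):
    """Fraction of seed saved relative to uniform seeding at the field mean."""
    return (field_mean - prescribed) / field_mean


# Field D lower bound from the text; the upper bound and field mean
# below are hypothetical placeholders for illustration only.
rate = prescribe_rate(0, tested_min=8.0, tested_max=10.0)
saving = seed_saving_fraction(rate, field_mean=9.0)
print(f"prescribed rate: {rate} plants per m, seed saving: {saving:.1%}")
```

Whether such a saving translates into profit depends on seed and cotton prices, which this sketch deliberately leaves out.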
Despite the demonstrated economic benefits of VRS, such as rapid payback periods, widespread adoption remains limited due to hurdles like initial investment. This research directly addresses this barrier by proposing a methodology for on-farm investigation that leverages already available data sources and minimizes interference with routine farm operations. By empowering farm managers to conduct their own site-specific experimentation, this approach facilitates the determination of the most profitable or highest-yielding seed populations tailored to unique field conditions, thereby promoting the broader integration of VRS into agricultural systems. Future research should continue to focus on improving spatial data precision and making these technologies even more accessible to further accelerate VRS adoption across diverse agricultural landscapes.
5. Conclusions
This study successfully established an enabling methodology for optimizing cotton cultivation through VRS in commercial on-farm settings. The approach prioritizes leveraging readily available farm data while minimizing operational interference, thereby addressing the critical gap between small, controlled experimental research plots and large-scale operational validation (fields ranging from 78 to 332 ha). By providing farm managers with a validated framework to conduct their own site-specific experimentation, this work directly contributes to advancing precision agriculture and empowering producers to determine the seed population that maximizes returns under their specific field conditions.
A key implication of the research is the validation of MZs as robust tools for guiding site-specific seeding decisions. Statistically significant associations were observed between MZs and plant population density effect classes across most experimental fields, demonstrating that MZs can effectively delineate areas that exhibit differentiated yield responses to seeding variation. Furthermore, the findings consistently showed that increasing plant density above field averages during the 2023/2024 season generally led to non-positive or negative yield responses, particularly in areas characterized by lower clay content. This evidence suggests that optimizing input use often requires reducing seeding rates in response to localized constraints. This approach is widely linked to higher input efficiency and can enhance the economic viability and sustainability of cotton production by preventing yield losses caused by excessive inter-plant competition. The dynamic nature of yield responses, demonstrated by inter-seasonal variability, underscores the critical need for adaptive management strategies informed by historical field data.
Despite the methodology’s success in commercial implementation, several limitations must be acknowledged. The study relied on input data of varying granularity, specifically noting the disparity between the coarse spatial resolution of climate datasets (CHIRPS precipitation and ERA5-Land temperature, ranging from 5 km to 31 km) and the high heterogeneity observed across fields separated by up to 19.2 km. Additionally, the reliance on existing farm data constrained the experimental design, requiring a complex statistical framework involving ensemble regression methods to isolate variables in heterogeneous commercial environments.
Future research should focus on two main directions to accelerate the widespread adoption of VRS:
Improving Spatial Data Precision: Continued effort is needed in developing and integrating high-resolution spatial data for key variables like soil properties and climate, which will enhance the accuracy and predictability of MZ delineation.
Increasing Technology Accessibility: Further efforts must be made to simplify technology and data analysis requirements, making the methodology and subsequent VRS implementation more accessible to farm managers across diverse agricultural landscapes.