1. Introduction
Potato (
Solanum tuberosum) plant density is a basic measurement of plant population whose spatial variation has been linked to significant variations in yield and tuber size distribution [
1,
2,
3,
4]. Increasing plant spacing significantly reduces the plant population and subsequently decreases yield [
5]. Potatoes are known to form multiple main stems per planted tuber and the stems are considered to be the ideal unit of plant population due to previous studies that have linked stem density to tuber number at harvest [
1]. However, potatoes are also known to form multiple secondary stems originating below the ground, which makes it difficult to determine an accurate stem number from above the ground without digging up the plant [
1]. Currently, the opportunity to accurately assess stem population is taken during yield sampling, when plants are uprooted for the purpose of estimating tuber numbers. Stem number assessments at this stage are used to parameterize yield prediction models. For early-season assessments and calculations of plant population, the practical approach for estimating plant population involves the use of the weight of the tubers planted per unit area as a standard seed rate that directly relates to plant density rather than stem density [
1]. This enables farmers to determine the effective plant density in the field in order to evaluate the efficiency of planting operations, seed germination rates, and to accordingly adjust yield expectations. Plant density also affects the number of tubers produced by a plant and their size distribution [
1]. This means accurate prediction of spatial variation in plant density can enable the adoption of spatially variable management to optimized tuber size distribution at harvest. Currently, farmers predominantly evaluate plant density using visual counts of emerged plants from a sample of locations during the establishment phase of crop development [
6]. The reliability of this method ultimately depends on the sample size used to make the assessment and the assumption that the sample is representative of the actual spatial variation in the field. Apart from the inherent irreproducibility of sampling-based designs, the sample sizes needed to reliably meet these assumptions in large fields makes this method impractical. Reliable, efficient, and reproducible methods of plant density determination are therefore an important need in precision agriculture.
A potential solution for estimation of potato plant density is by remotely sensing potato plants, leveraging on the advances in computer vision and aerial image photogrammetry. Unmanned aerial vehicles (UAV) fitted with imaging sensors provide a platform for remote sensing of canopy development with the potential to determine the variation in plant density for precision agriculture applications. A common application of remote sensing in agriculture has been the generation of spatial variation maps using vegetation indices derived from multispectral camera data. These maps have been extensively used to infer spatial variation in vegetation health, utilizing the known spectral responses of vegetation to plant health and molecular constitution [
7]. Classification algorithms applied at spatio-temporal scales are useful for the evaluation of fractional vegetation cover characteristics [
8]. Fractional vegetation cover of land surfaces has been measured temporally and used as a proxy for crop growth rate [
8], providing the potential to simulate crop growth characteristic curves that are required in time-step yield forecasting crop models. Therefore, remote sensing of plant canopy development, using sensors mounted on UAVs, irrigation equipment and tractors provides the potential to operationalize crop modelling at an unprecedented spatial scale, as well as deliver the data needed for high throughput assessment of plant establishment in precision agriculture, quick evaluation of viability of populations in breeding lines and other applications. Apart from pixel-level image analysis applications, there is considerable interest in leveraging advances in artificial intelligence technologies in remote sensing for non-destructive and non-invasive evaluation of properties of interest within crop canopies such as plant heights with light detection and ranging, instances of diseases or pests and stem population counts [
9,
10].
Several computer vision algorithms have been proposed for determining plant counts from aerial images of different crops using traditional image analysis and machine learning approaches. The most prevalent approach involves feature extraction using traditional image analysis followed by any machine learning regression approach for predicting numerical image class labels from extracted features. Several variations of this two-step approach have been used to produce plant-counting models in wheat [
11], rapeseed [
12], potatoes [
13], and other crops. In weed-free fields, images of emerged plants before canopy consolidation consist mostly of green pixels of the objects of interest against a background dominated by soil [
14]. This dichotomy is utilized in the feature extraction step to classify and consolidate connected foreground pixels as objects of interest using reflectance values of the pixels. Colour indices are often used to generate two-dimensional greyscale images from truecolor (RGB) or multispectral images. Once the colour indices are applied, the generation of a binary mask representing the dichotomy between the foreground and background often involves the selection of a threshold that is either learned from the image using the Otsu algorithm [
13] or subjectively selected [
6]. The binary masks are used to generate the feature sets for machine learning like the number of blobs, and their total pixel area. While these approaches return satisfactory binary masks in some situations, subjective selection of a threshold is clearly not expected to be robust in all environments. The most common algorithm-based approach for automatic threshold selection to overcome the subjective threshold selection problem is the Otsu algorithm, which unfortunately becomes sub-optimal when vegetation indices produce multi-modal frequency histograms [
15]. The reliability of the subsequent feature extraction and regression or classification modelling is therefore contingent upon further data-cleaning methods deployed to clean the binary mask, which reduced the reproducibility of downstream models.
Traditional image analysis, fused with machine learning was used by [
13] to predict plant numbers in potatoes, with accurate predictions in well separated plants but mixed results in instances containing overlapped plants. Following up on this work, [
14] trained a deep learning-based Mask R-CNN model, predicting potato and lettuce instances using bounding boxes and sizing the extent of each instance with a mask. This overcame the need for manual feature extraction and demonstrated the efficacy of the deep learning approach. A framework based on a fully convolutional neural network to count potato plants in merged canopies was also developed by [
16]. However, both the Mask R-CNN [
14] and the two-step computer vision [
13] approaches suffer from a loss of accuracy in distinguishing overlapping plants. Homogeneity in potato plants, and the difficulty of separating individual plant units in merged canopies during data annotation makes it difficult to train models that can accurately distinguish individual plants after potato canopy consolidation. A potential solution to this problem would be to conduct imaging before canopy consolidation. However, potato emergence rates vary spatially with soil temperature, disease incidence, and treatments that alter apical-dominance at the seed-lot level [
6]. Additionally, calcium deficiency at the terminal bud can induce the loss of apical dominance, which necessitates branching and determines the number of sprouts per tuber, causing a spatial variation in canopy growth rates across the field [
17]. Variations in planting depth also cause variations in the number of days to emergence, with deeper planted tubers taking up to a week longer to emerge than shallower planted tubers within a field [
18]. These factors make it necessary to delay image acquisition, so as to minimize the chance of underestimating emergence.
Accurate mapping of centroids of plants in merged canopy was reported by [
16] using a custom CentroidNet architecture supported by a fully convolutional network learning the centroid origin of leaves in overlapping potato canopies. The premise of this method is that potato leaves grow outwardly from the location of the planted tuber, and the model can therefore detect the location of a plant by learning the vectors pointing to the centroid of a plant object. In merged canopies, this assumption may be violated by the fact that potato leaves grow outward from their subtending stem and each stem eventually grows independently from its mother tuber [
1]. Therefore, overestimation can be expected in multi-stem plants using the CentroidNet. Indeed, [
16] report observing false positives due to oddly shaped plants, which may be primary or secondary stems. Evidence suggests that accurate potato plant counting remains an object detection problem that is best modelled from UAV data before canopy overlap when individual plants are discernible to data annotators. The practical need for plant counting algorithms to farmers is the generation of a plant density map across the field from the detected objects, making this primarily an object detection problem, and the accurate pixel-wise segmentation problem is secondary. Faster R-CNN (FRCNN), the detection framework under the Mask R-CNN framework therefore provides an adequate and simplified training protocol. Training FRCNN models also requires less hyperparameter tuning requirements than MASK-RCNN [
14].
The ultimate goal of plant counting algorithms is to replace the need for manual estimations of plant density across a field, which is done by interpolation of manually collected count data. The accurate production of 2D density maps that represent the spatial variation in plant density is therefore more pertinent than fine-grain plant-by-plant accuracy because the economic and practical feasibility of site-specific management is contingent upon the establishment of a practically manageable range of spatial autocorrelation [
19]. Management decisions on variables that respond to plant density (e.g., yield) are also likely to only be economically interesting if the plant density exhibits relatively long-range spatial autocorrelation. Large differences in the range of autocorrelation in potato yields from 12 m to 425 m have been reported [
19]. In commercial production, farmers aim to produce uniform plant density across the field and any variation is likely to come from factors such as a systematic fault in the planting operation, low viability of a batch of seed, or soil-borne factors that affect seed germination [
5]. Such systematic sources of variation are likely to exhibit spatial autocorrelation relative to the size of the field. Therefore, plant density maps need to capture this spatial structure rather than merely capturing local variation. This provides a geostatistical solution to the problem of plant-density determination using computer vision and deep learning algorithms in fields with slightly merged canopies. In a large orthomosaic of UAV imagery, plant detection can be conducted on a sliding window as suggested by [
14] then a filtering step can be added post-detection, discarding all images containing overlapped plants. This would create a sparse matrix of detections across the field from which geostatistical interpolation can be used to generate a continuous representative 2D density map. The 2D density map can then be related to variation in yield parameters or used to test the utility of other sources of variation like satellite-derived early-season NDVI. With the main potato-related papers in this field focused solely on detection [
13,
14,
16], this approach has not been reported in literature.
In this paper, we demonstrate plant counting in potatoes as a detection problem solvable by a FRCNN model without the need for instance segmentation. For the first time, we produce geostatistically interpolated 2D plant density plots, compare them to satellite-derived early-season NDVI density plots and evaluate the relationships between the density plots and potato yield attributes.
3. Results
Using the test dataset of 248 images from Horse Foxhole, the FRCNN model achieved an aP score of 0.78.
Figure 3 shows the actual vs. predicted number of plant objects in the test dataset drawn from the same pool as the training data. Overall, the model predicted the actual number of plants in an image to within 2 stems error as illustrated in
Figure 3 (RMSE = 1.59). The model had a nRMSE of 0.19 and R
2 of 0.80, though the scatter plot revealed relatively larger variation between observations and model predictions at higher plant densities, elucidating the decrease in model accuracy in closely spaced and potentially merged plants. Manual assessments showed that the model under-predicted the ground-truth bounding boxes in images with merged plants.
Some images contained potato plants that were emerging from the ground but had not formed enough above-ground foliage to be confidently annotated. The model learned and labelled these as potato plants, leading to over-prediction in a small number of instances. An illustration of this is in
Figure 4 where the model predicted one more plant than the originally annotated ground truth on account of a single planting station that had delayed emergence and was not annotated in the ground truth but predicted by the model. In practical terms, the model predictions were closer to the actual plant density than the ground truth labels in this instance. Due to the clear linear relationship between the predicted and actual plant counts, reliable plant density maps of the whole field could be constructed from the model and compared to observed densities.
The plant density map of Horse Foxhole (
Figure 5) revealed considerable systematic spatial variation in the plant stand, showing higher plant densities in the edges of the field compared to the middle. This was determined to result from an inconsistent seed metering mechanism during planting at the edges, resulting in very high plant densities. This was rectified when the middle section of the field was planted, culminating in the observed differences. The variation in predicted plant density coincided with the variation observed in NDVI (
Figure 6), showing that early-season NDVI sensed by the sentinel-2 is partially influenced by the plant density on the ground. NDVI was not measured at Buttery Hill due to the lack of cloud-free images before significant canopy consolidation in the Sentinel-2 repository, therefore the NDVI map and plant density map are not shown.
Comparison of the actual plant populations at Horse Foxhole and Buttery Hill against the predictions from the FRCNN model (
Figure 7) by RMSE showed that the model performed comparably with the test dataset.
An RMSE of ~1 plant, and nRMSE of 0.24 were observed, showing good accuracy at low plant densities but largely under-predicted at the high plant densities at Horse Foxhole. Overall, there was less variation in planting density at Buttery Hill compared to Horse Foxhole where high actual plant densities of up to eight plants per square metre were observed, corresponding to approximately 12 cm spacing between plants. There was a large degree of overlap between plants at these plant densities, causing the model to miss some detections and subsequently return lower plant densities than expected. However, the model still highly captured the variation in plant densities (R2 = 0.80) and therefore the density map produced was a reflectance of observed variation.
Table 2 shows the mean and standard deviation of the potato yield components at Horse Foxhole and Buttery Hill. Overall, inconsistent plant spacing at Horse Foxhole had a large effect on the plant population density, with an average five plants being planted per square metre while Buttery Hill had three plants per square metre. There was also larger variation at Horse Foxhole with a standard deviation of two plants (STD = 1.58) compared to Buttery Hill (STD = 0.77). Buttery Hill produced more tubers per plant than Horse Foxhole, but the mean tuber weight was smaller (74 g and 103 g, respectively). Correlation analysis showed a highly significant correlation between the FRCNN plant density predictions and the actual plant densities as shown in
Table 3. At both sites, there was a correlation of exactly 0.87 between predicted and measured plant density. Furthermore, there was a strong correlation (
r = 0.61) between plant density predicted by the FRCNN model and the NDVI derived from the Sentinel-2 satellite, quantitatively buttressing the visual similarity between the FRCNN plant density map (
Figure 5) and the NDVI map (
Figure 6). The FRCNN-predicted plant density, actual plant density and NDVI all showed a similar pattern in their correlation coefficients with potato yield components. Increasing plant density was associated with decreasing tuber number per plant (r = −0.61 at Buttery Hill and r = −0.78 at Horse Foxhole). Similarly, a negative significant correlation was observed between NDVI and the number of tubers per plant at Horse Foxhole.
The weight of tubers per plant was also negatively associated with plant density at both sites and a similar negative relationship was observed with NDVI at Horse Foxhole. Despite the significant relationships between plant population and the weight per plant, no strong correlation with yield per square metre was observed at both sites. Similarly, no correlation between marketable yield and NDVI was observed at Horse Foxhole. The negative correlation between predicted plant density and tuber weight per plant at Horse Foxhole (r = −0.82,
p < 0.001) coincided with a positive correlation between predicted plant density and total tuber number (r = 0.41,
p = 0.01). However, while the tuber number per unit area was positively related to the predicted plant density (r = 0.41 at Horse Foxhole and r = 0.54 at Buttery Hill), the overall yield per unit area was not significantly different. As observed in
Table 3, high PPN at Horse Foxhole were negatively correlated with tuber weight (r = −0.62). The tuber weight per square metre at Buttery Hill was only weakly correlated to predicted and actual plant density although the tuber weight per plant was negatively correlated (r = −0.57) and the total tuber number was positively correlated (r = 0.54).
4. Discussion
The predictive precision obtained by our FRCNN model (0.78) is much higher than previously reported 0.41 mean average precision by [
14], who used a Mask-RCNN architecture on overlapped plants. Though utilizing Mask-RCNN offers the prospective advantage of filtering individual plant masks from overlapping potato plants over ordinary FRCNN, [
14] concede that accurate annotation of individual plants from canopy images of overlapped plants is difficult for domain experts and subsequently the trained models struggle to make accurate detections. Additionally, Mask-RCNN’s detection framework is based on FRCNN, however, [
14] conducted their study using images taken later than four weeks after emergence, while our image acquisition was restricted to fewer than three weeks after emergence. Therefore, large improvements in precision observed in this study are partially due to less overlapping plants in the images, on account of earlier image acquisition. While [
16] report accurate mapping of individual plant centroids in merged canopies, they did not report the average precision of their models. However, the inherent assumptions of the CentroidNet model in relation to potato plants reveal an underlying limitation that at least needs to be acknowledged; the radial expansion of a cluster of potato leaves from a centroid location is only likely to be valid in non-overlapping single-stem plants. Each planted potato tuber produces multiple competing stems that eventually function as independent plants, complicating the unit of plant density [
1]. The direction of vectors learned from canopy leaf images are therefore expected to point towards the centroid of individual stems and not the overall plant object. Similarly, it is not practical for data annotation to be conducted in potatoes after a large degree of canopy consolidation because of the difficulty in assigning perceived individual main stems and leaves to individual plant units accurately. [
14] also acknowledge this limitation in their data annotation and [
13] acknowledge it as a source of error in their random-forest-based classification of potato plant numbers in an image from extracted plant object features. Because of these limitations, it can be considered impractical to conduct plant detection on images containing overlapped plants. While earlier image acquisition can solve the limitations, asynchrony in potato emergence days makes it difficult to determine an optimum day for obtaining a representative plant population. Beyond the scope of this study, striking a balance between premature and late imaging provides one potential solution to these challenges, which can be studied for each variety by evaluating model accuracy as a function of the number of days after planting. However, this would entail variety-specific recommendations for imaging time, which also need further calibration against factors that affect emergence rate like planting depth and average soil temperature. In this study, a geostatistical approach was chosen to create a sparse matrix of accurate plant density predictions at locations where no overlapping plants were observed.
In the two-stage (FRCNN detection then geostatistical interpolation) approach used in this study, we demonstrate that an accurate 2D surface of the variation in plant population density can be created from partially merged canopies, by conducting detections on non-overlapping plants and interpolating across the whole surface. Except for soil temperature and disease incidence, the factors that have been reported to contribute to asynchrony in potato emergence like variety, seed-tuber physiological age and apical-dominance-altering seed treatments vary at the seed-lot level [
6]. It is therefore reasonable to assume that spatial variation in plant emergence early in the season is attributable to systematic inefficiencies in planting operations. With an nRMSE of 0.19 for the determination of actual plant numbers per image as well as a high R
2 value (0.83), the current FRCNN model exhibits the robustness required to produce reliable field maps of spatial variation in plant density to inform precision agriculture decisions. Comparably, the image-analysis-based algorithm by [
13] reported a high R
2 value of 0.96. However, as noted by both [
13] and [
14], the algorithm is heavily dependent on the accurate production of a noise-free binary mask based on the Excess Green vegetation index, which is not always guaranteed and cannot distinguish between weeds and potato plants. The current FRCNN model therefore provides a robust modelling pipeline free of the complexity of generating potato image masks from vegetation indices deployed in the Mask-RCNN and random-forest-based image analysis algorithm.
The high correlation between satellite-imagery-derived NDVI and the FRCNN model results we report represents novel evidence of a link between early-season NDVI and potato plant density. Before canopy consolidation, NDVI is influenced by soil brightness in coarse-resolution imagery, which is normally corrected using the soil-adjusted vegetation index [
26] when evaluating vegetation health. In the present study, non-adjusted NDVI was used with the premise that the level of interference from soil is dependent on the number of plants in the pixel. The high correlation value (
r = 0.61) between the FRCNN modelled plant densities and the NDVI values substantiates this premise. A limitation to the use of satellite data for this purpose is the uncertain availability of cloud-free images after emergence before canopy consolidation (highlighted in this study by the lack of available cloud-free images within this window at Buttery Hill). Nevertheless, early-season NDVI can be calculated from UAVs hosting red and NIR sensors, down-sampled to reproduce the spectral mixing observed in satellite imagery.
Our findings agree with [
3] that there is a significant negative effect of plant density on the number of tubers per plant. Additionally, [
3] found a negative correlation between tuber size (measured as average tuber fresh weight in g) and plant density, in line with our results. In controlled-treatment studies, the effects of plant density on yield are largely inevitable due to large fixed differences in plant density between treatments that are not typically observed in commercial production field. Previous studies [
2,
3,
4,
5] report significant positive association between plant density and yield, in line with our observations at Buttery Hill. However, the relationship was not observed at Horse Foxhole, with yield more related to stem number than plant number. Literature [
27] also points to a more significant relationship between yield and stem number than yield and plant number, with the latter explaining a negligible portion of the variation (R
2 = 0.06).
Ultimately, the association between plant density and yield appears to be tenuous. While our findings suggest that areas of high plant density within a field produced more tubers per unit area, the number of marketable tubers at harvest was smaller. Assuming exponential growth in dry matter production over time [
28,
29], marketable tuber numbers can potentially be increased by delaying harvest timing in high density areas to allow for more tuber bulking. A utility of our main findings is that plant density maps produced from UAVs can be used as a basis for management and harvest decisions, such as variable in-season nitrogen management to delay senescence for tuber bulking purposes and subsequently incorporate variable vine desiccation and harvest timing.