Detection of Banana Plants Using Multi-Temporal Multispectral UAV Imagery

Abstract: Unoccupied aerial vehicles (UAVs) have become increasingly commonplace in aiding planning and management decisions in agricultural and horticultural crop production. The ability of UAV-based sensing technologies to provide high spatial (<1 m) and temporal (on-demand) resolution data facilitates monitoring of individual plants over time and can provide essential information about health, yield, and growth in a timely and quantifiable manner. Such applications would be beneficial for cropped banana plants due to their distinctive growth characteristics. Limited studies have employed UAV data for mapping banana crops and to our knowledge only one other investigation features multi-temporal detection of banana crowns. The purpose of this study was to determine the suitability of multiple-date UAV-captured multi-spectral data for the automated detection of individual plants using convolutional neural network (CNN), template matching (TM), and local maximum filter (LMF) methods in a geographic object-based image analysis (GEOBIA) software framework coupled with basic classification refinement. The results indicate that CNN returns the highest plant detection accuracies, with the developed rule set and model providing greater transferability between dates (F-score ranging between 0.93 and 0.85) than TM (0.86–0.74) and LMF (0.86–0.73) approaches. The findings provide a foundation for UAV-based individual banana plant counting and crop monitoring, which may be used for precision agricultural applications to monitor health, estimate yield, and to inform on fertilizer, pesticide, and other input requirements for optimized farm management.


Introduction
Worldwide, banana (Musa spp.) cultivation is vital to economies, having an estimated value of USD 31 billion and export value of USD 8 billion, with the majority of production based in Asia, Latin America, and Africa [1]. The largest producers are India and China, with an annual production of 29 and 11 million tonnes, respectively [2]. Aside from commercial importance, bananas are also considered the third most important starchy food source, with approximately 85% sourced from subsistence agriculture and providing 25% of calorific intake in rural production areas [3,4]. Although minor in comparison, commercial banana cropping is an important industry in Australia, with bananas being the most popularly purchased fruit and with a wholesale value (fresh supply) of AUD 723 million annually [5]. temporal resolution imagery improved classification results and enabled the delineation of banana plants.
In remote sensing there are several approaches that can be used for the detection and delineation of individual plant crowns across all species, with some of the most common and well established being local maximum filter (LMF) [35,36], template matching (TM) [37,38], watershed segmentation [39], and GEOBIA [40]. The application of machine learning, specifically deep learning methods, based on neural networks has been increasingly applied for the detection of individual plant crowns [41,42]. Several neural network models exist; however, the most popular and widely used of these is the CNN approach [43]. The application of CNN for individual crown detection has advantages due to its ability to utilize large datasets to provide flexible detection under varying environmental conditions such as illumination and changes to morphology, reporting high success and outperforming many other methods [44,45]. Drawbacks to CNN approaches compared to well-established non-machine learning approaches are the required computing power and initial supervised training, which can be time-consuming, as it is often an iterative process [34,44]. Although other deep learning options exist and are often utilised and suited for sophisticated tasks with relevant additional complexity [46], for the purpose of crown object detection, CNN provides an appropriate model that, due to its mechanism of detection, is well suited for the detection of spectral and spatial features that indicate plant crowns.
Unoccupied aerial vehicles (UAVs), also referred to as unmanned aerial vehicles or remotely piloted aerial systems (RPAS), are particularly appropriate as a sensor platform for banana crop monitoring, as they permit low flight altitudes that yield high-resolution imagery in a relatively cost-effective manner [18,21]. UAVs also enable flight operations on a responsive or ad hoc basis, providing greater temporal resolution with the potential for near real-time or farm-based data processing and analysis (e.g., Trimble Geospatial oil palm solution https://geospatial.trimble.com/products-and-solutions/ecognition-oilpalm-solution, accessed on 25 March 2021). Recent studies utilizing UAVs for the purpose of delineating crops or stands of banana from surrounding land-cover classes include Harto et al. [47] and Handique et al. [48], both of whom used GEOBIA to detect plants based on spectral, textural, and shape attributes and reported user's accuracies of 80% and 87%, respectively.
More closely related to this study, investigations of individual banana plant crown detection, specifically targeted to planned commercial cropping, include Kestur et al. [49], who reported success in the detection and delineation of banana crowns from red, green, blue (RGB) imagery using a combination of spectral and spatial features. A K-means spectral classifier was compared to machine learning in the form of extreme learning machine (ELM) neural networking for initial individual plant-crown detection, with the machine learning approach outperforming K-means. Following this, crown delineation and separation was accomplished using a watershed and region-growing approach [49]. Similarly, Neupane et al. used a neural network approach to detect objects representing whole banana crowns of young banana plants using a bounding box with high success (>95% overall accuracy) by merging the results of UAV RGB imagery captured at three different altitudes (40, 50, and 60 m) [50]. Their mapping results were facilitated by the fact that young banana plants present greater homogeneity in crown morphology and their crowns exhibit greater separation than mature plants that often have overlapping crowns. Both studies utilized single-date imagery with training and analysis based on subsets of the same imagery, presenting reduced variance in lighting conditions, seasonal difference, and plant morphology. They also highlighted that the lack of such variation may be a constraint when applying these methods to different imagery, with further studies required to assess transferability over time and for different areas (spatial and temporal variability), stating an intention to improve results by additional training.
A recent study that did include spatial and temporal variability is that of Gomez Selvaraj et al. [51], part of a broader investigation based in West and Central Africa (Democratic Republic of the Congo and Republic of Benin) covering the detection of stands and crown objects and the identification of disease using pixel classification and object recognition on satellite and UAV imagery. Most relevant was their use of UAV RGB captures to isolate mixed-age individual banana crown objects from surrounding land cover at differing locations and on differing capture dates, with an approximate overall accuracy of 70% for whole-crown object detection using a bounding box on the test dataset based on a CNN architecture approach. Overall, these studies demonstrate the suitability of UAVs for data capture of banana crops, with all studies [49–51] aiming for individual crown detection utilizing some form of deep learning. As highlighted by the authors, individual banana crown detection research is ongoing, with further work required to improve detection success and robustness of models.
UAV multispectral sensors, often specifically designed for plant and agricultural application, have become increasingly available, with several off-the-shelf (reasonably) affordable options marketed toward crop monitoring, such as Parrot Sequoia and MicaSense RedEdge or turnkey-integrated UAV and sensor solutions from manufacturers such as Parrot (Bluegrass and Disco-Pro AG) and DJI (P4 Multispectral or enterprise range). The benefits of multispectral sensors that include red edge (RE) and/or near infrared (NIR) portions of the spectrum are well established in vegetation and crop monitoring [31,52,53]. The use of multispectral sensors as a basis for the detection of banana plants provides potential for enhanced monitoring and gaining important information on plants and crops not offered by traditional RGB capture [18]. Basing detection on a multispectral sensor increases the utility of gathered data and minimizes the need for additional captures and processing (as opposed to the need for discrete RGB and multispectral captures), extending the capability of UAV as a platform for monitoring crops.
Based on existing literature, there are knowledge gaps for selecting appropriate detection and mapping approaches for banana plants. The few studies that exist focus predominantly on machine learning methods of differing architectures (Table 1). No studies assess long-established crown-detection methods such as TM and LMF for the detection of banana plant crowns. Although banana crowns are inherently complex due to their crown structure, an investigation of TM and LMF methods is considered valuable, as they have potential to provide suitable crown-detection results, although such results generally depend on the application (location and resource context). These methods also offer reduced computational requirements and the potential for faster and less complex application. As discussed in cited studies on the detection of banana crowns [49–51], further development and research is required to test the capability and improve the robustness of plant detection, including testing and training at different locations and bioregions and over larger areas. Including image data at all of these scales introduces additional variability in spatial and spectral reflectance/absorption properties of banana plant crowns, with the inclusion of the temporal domain adding further complexity to detection. Further work toward the evaluation of detection methods and particularly their ability to perform satisfactorily on multi-temporal imagery is important for the monitoring of banana plants, with only one identified study (Gomez Selvaraj et al. [51]) having assessed the suitability of their methods on imagery captured on different dates. As such, no study exists on a structured commercial crop-monitoring scenario (repeat monitoring of the same crop over time) whereby multi-temporal detection of individual plants of mixed age and asynchronous growth is made using multispectral UAV data.
This study provides an innovative investigation of a range of methods for detecting individual multi-age banana plants, grown at a commercial banana-crop farm located in South East Queensland Australia, using UAV multispectral imagery with particular focus on the targeted detection of the plants' inner crown in order to help facilitate automatic counting and further assessment of banana plants. Approaches investigated herein include CNN, TM, and LMF with classification refinement using a GEOBIA method. Each method was developed using a discrete (not subset) dataset and applied to multi-temporal imagery to determine transferability and robustness of the detection methods. An investigation into the detection accuracy in relation to UAV-derived morphological measurements (height and crown spread) is also presented. Successful data collection, processing, and mapping approaches provided by this investigation are part of a larger proof-of-concept study to determine the capability of high spatial resolution remotely sensed data to map in-field plant condition and age variability and discriminate multi-temporal changes in banana plants and ratoon crops under commercial cropping conditions over a bunch growth cycle.

Study Location
The study site for this investigation was a commercial banana farm located in Wamuran, Queensland, Australia (Figure 1), approximately 11 km west of Caboolture in South East Queensland. UAV data were acquired over an area of 0.5 ha, located on a northerly aspect at 170 m above sea level and falling to 113 m above sea level, with an average slope of approximately 21 degrees. Wamuran, situated in the Moreton Bay region, has a humid subtropical climate with moderate to hot summer months (December to February) and cool to mild winters (June to August). Recordings from the Beerburrum weather station (#040284), located approximately 11 km west of Wamuran, show that summer temperatures have a maximum average of 30.2 °C and winter temperatures an average low of 9.3 °C [54]. Rainfall primarily occurs in summer, with a maximum monthly summer average of 203.2 mm and a low of 45.9 mm in the winter months. The surrounding region hosts forestry, farming (primarily strawberries and pineapple), and residential land uses. This irrigated site cultivates approximately 600 Cavendish (Williams) banana plants spaced 2.5–3.0 m apart, with the general age of plants over 6 years and new plantings established on an as-needed basis. Older crops generally display higher levels of asynchronous growth, likely due to management practices (harvest regime and plant propagation methods) and biotic and abiotic causes [15]. Due to the age and planting strategy, this crop comprises asynchronous mixed growth.

UAV Data Collection and Ground Validation
Multispectral imagery was captured using a Parrot Sequoia camera (Parrot Drone SAS, Paris, France) mounted to a 3DR Solo quadcopter (3D Robotics, Berkeley, CA, USA). The Sequoia camera utilizes a 1280 × 960 pixel CMOS sensor that captures information in the green (550 nm, 40 nm bandwidth), red (660 nm, 40 nm bandwidth), red-edge (RE) (735 nm, 10 nm bandwidth), and near-infrared (NIR) (790 nm, 40 nm bandwidth) parts of the spectrum with an upward-facing irradiance sensor for radiometric normalization purposes. UAV flight plans were programmed as a grid-pattern flight line following the direction of row plantings at a height of 50 m above ground level (AGL), with 80% sidelap, 92% forward overlap (1 second capture interval), and 5 m/s flight speed using Mission Planner and loaded into the 3DR Tower App for flight control. In an effort to maintain consistent altitude, flights were programmed perpendicular to the slope direction with waypoints set at 50 m AGL based on a 1 m digital terrain model (DTM) obtained from the Queensland Spatial Catalogue [55]. Flights were conducted on three dates to capture a range of seasonal differences, plant morphology, growth (various mixed phenological stages), and condition: 28 August 2017 at 11:13–11:26 am, 20 September 2017 at 11:50 am–12:04 pm, and 19 March 2018 at 11:19–11:30 am, with approximate sun elevations of 53°, 62°, and 63°, respectively. The first two flights were undertaken under clear cloud-free conditions, whereas the final flight occurred with ~20% cloud cover. However, the images of the final flight were collected during a period with no clouds obscuring the sunlight or casting shadows on the study area.
Ten Propeller AeroPoints (Propeller Aerobotics Pty Ltd., Surry Hills, Australia) Ground Control Points (GCPs) were deployed with locations (Figure 1) recorded for a minimum of 4.5 h and geometric correction processing made using a Propeller network base station located 11 km from the study site. Eight gradient greyscale targets constructed of masonite with three coatings of matte Dulux Wash and Wear paint were deployed in the field within the study area for radiometric correction purposes similar to the method described by Johansen et al. [56] and Wang and Myint [57], with reflectance characteristics of each measured using an ASD FieldSpec 3 spectrometer (Malvern Panalytical Ltd., Malvern, UK) and found to be near Lambertian. Field data included an overall count of banana plants to determine a baseline number of plants. Plants identified as suckers were omitted from counts. Attributes such as plant spacing and number of rows were also observed. To support counts, manual interpretation of high spatial resolution UAV RGB captures and same-day orthomosaics based on similar flight planning were also used.
Field measurements of 29 banana plants spanning the capture dates were made for height and crown spread (horizontal width of the crown) using a survey staff or a laser rangefinder as per the manufacturer's recommendations (Laser Tech Inc., Centennial, CO, USA). Height was measured from the ground to the crown apex, and average crown spread was based on the distance measured horizontally from the outermost leaf edge of the crown (dripline) to the pseudostem in 6 directions, calculated as:
$$\text{Average crown spread} = 2 \times \frac{\mathrm{SUM}(r)}{n} \qquad (1)$$
where SUM is the aggregate, r is the radius measurement of the crown (pseudostem to edge of crown measurement), and n represents the number of measurements [58].
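As an illustration of the crown spread calculation, the following minimal sketch assumes that average crown spread is taken as twice the mean of the radius (spoke) measurements, which is consistent with the definitions of SUM, r, and n given above. The function name and the example radii are hypothetical and are not taken from the field dataset.

```python
def average_crown_spread(radii_m):
    """Return average crown spread (m) as twice the mean radius measurement.

    Assumption: spread = 2 * SUM(r) / n, where each r is the horizontal
    distance from the pseudostem to the crown edge in one direction.
    """
    if not radii_m:
        raise ValueError("at least one radius measurement is required")
    return 2.0 * sum(radii_m) / len(radii_m)


# Hypothetical radii (m) for one plant, measured in 6 directions.
example_radii = [1.4, 1.6, 1.5, 1.3, 1.7, 1.5]
example_spread = average_crown_spread(example_radii)
```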

UAV Data Pre-Processing
Agisoft PhotoScan Pro (Agisoft LLC, St. Petersburg, Russia) was used to create orthomosaics and digital surface models (DSM) from the multispectral data (processing information provided in Table 2). Prior to image processing, photos were visually inspected and removed if they were captured during turns and height adjustment at the end of a flight line. For the photo alignment, the key and tie point limits were set to 40,000 and 10,000, respectively. GCPs were visually located in the images for geo-referencing and a dense point cloud was built using the high-quality setting and mild depth filtering in order to retain as much banana canopy detail as possible. The point cloud was then used to produce a DSM and, by classifying ground objects in the point cloud, a digital terrain model (DTM). Prior to orthomosaic generation, the colour-correction setting was enabled to account for Sequoia automatic capture settings (shutter speed and ISO values) and image vignetting. The orthomosaic was then generated, at its ground-sampling distance (GSD), using the default mosaic blending mode and the DSM as the surface. A canopy height model (CHM) was created by subtracting the DTM from the DSM [57,59]. Following orthomosaic generation, it was observed that the central parts of several banana plant crowns had a halo effect, caused by the inability of the 3D reconstruction of the dense point cloud to identify the thin tips of the leaves, which in turn affected the DSM used as the surface for the orthomosaic generation (Figure 2) [20,21]. The use of the NIR band to prevent the halo appearance was trialled with minimal improvement. In order to preserve the spectral information of the orthomosaic, imagery was re-processed using the DTM.
Although the use of the DTM for the orthomosaic generation solved the issue with the halo effect, it also meant that the banana plant crowns were not correctly orthorectified, causing slight geometric offsets, specifically in the taller parts of banana plants. However, in this study, preservation of the spectral information was considered more important than geometric accuracy. The orthomosaics based on the DTMs were converted to at-surface reflectance using a simplified empirical correction based on the greyscale radiometric calibration panels with the aid of a Python script [59]. Dark-object subtraction based on minimum values of each band within the study area was carried out on the orthomosaics, as negative reflectance values were encountered in some captures. The negative values occurred in portions of shaded area cast by crowns [59].
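The CHM step (DSM minus DTM) can be sketched with NumPy as follows. This is an illustrative example only: it assumes both rasters are already co-registered arrays of equal shape, and the optional no-data handling and the clamping of small negative heights to zero are assumptions added here rather than steps reported in the study.

```python
import numpy as np


def canopy_height_model(dsm, dtm, nodata=None):
    """Return a canopy height model (CHM) as DSM minus DTM, cell by cell."""
    dsm = np.asarray(dsm, dtype=float)
    dtm = np.asarray(dtm, dtype=float)
    if dsm.shape != dtm.shape:
        raise ValueError("DSM and DTM must share the same grid")
    chm = dsm - dtm
    if nodata is not None:
        # Assumption: cells flagged as no-data in either raster become NaN.
        chm[(dsm == nodata) | (dtm == nodata)] = np.nan
    # Assumption: small negative heights are reconstruction noise, so clamp to 0.
    return np.where(chm < 0, 0.0, chm)


# Hypothetical 2 x 2 elevation grids (m above sea level).
example_dsm = np.array([[115.0, 113.5], [118.2, 114.0]])
example_dtm = np.array([[113.0, 113.6], [114.0, 114.0]])
example_chm = canopy_height_model(example_dsm, example_dtm)
```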
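The radiometric steps described above can be sketched as a per-band linear fit between panel pixel values and panel reflectance, followed by subtraction of the band minimum. This is a minimal illustration and not the script cited as [59]; the function names, panel values, and the use of a least-squares line are assumptions.

```python
import numpy as np


def empirical_line(panel_values, panel_reflectance):
    """Fit reflectance = gain * value + offset over the calibration panels."""
    gain, offset = np.polyfit(panel_values, panel_reflectance, deg=1)
    return gain, offset


def to_reflectance(band, gain, offset):
    """Apply the fitted line to every pixel of one band."""
    return gain * np.asarray(band, dtype=float) + offset


def dark_object_subtraction(band_reflectance):
    """Shift a band so that its minimum value within the study area is zero."""
    refl = np.asarray(band_reflectance, dtype=float)
    return refl - np.nanmin(refl)


# Hypothetical panel pixel values and spectrometer-measured reflectances.
gain, offset = empirical_line([1000.0, 2000.0, 3000.0], [0.1, 0.2, 0.3])
band = np.array([[-200.0, 1000.0], [2000.0, 3000.0]])  # shaded pixel goes negative
corrected = dark_object_subtraction(to_reflectance(band, gain, offset))
```

Because the subtraction uses the band minimum, shaded pixels that were negative after the linear correction end at zero and all other values shift by the same amount.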

Image Classification
For comparative purposes, classification of the orthomosaics across multiple dates was trialled using three classification methods for the detection of tree crowns: convolutional neural network (CNN), template matching (TM), and local maximum filter (LMF). Classification testing and development was carried out using eCognition software, which allowed for flexible development and testing of GEOBIA for CNN, TM, and LMF workflows (Figure 3). It also streamlined the classification process by including a sampling environment for CNN and TM and algorithms for pre-processing data such as layer creation, augmentation, and filtering as well as algorithms used for post-processing such as classification refinement and exportation of results. Use of the eCognition GEOBIA environment allowed the developed classification approaches to be saved to a single ruleset that could be transferred between dates, and provided the flexibility to develop the ruleset further in future planned analysis.

CNN
The CNN analysis was based on the Google TensorFlow API (https://www.tensorflow.org/api_docs/, accessed on 25 March 2021) integrated into Trimble's eCognition Developer 9.5 (Trimble Geospatial, Munich, Germany). Training of the CNN was based on samples created from the orthomosaic from the first capture date (28 August), with image segmentation created for each class, including the inner crown (50 cm buffer surrounding the centre point of each banana plant), vegetation consisting of all non-banana vegetation (e.g., grass, weeds, tree crowns), and soil. The crown class was created using a vector-buffering algorithm based on manually created centre points, followed by a vector-based segmentation algorithm and assign-class algorithm. Soil and vegetation classes were created using a threshold segmentation algorithm based on thresholds set by an automatic threshold algorithm using the Enhanced Vegetation Index 2 (EVI2) layer. Minimal manual editing was required to separate areas of vegetation and banana crown overlap. In order to increase the sample size and improve results [41], the CNN sampling algorithm creates samples at random locations within each of the defined classification objects (inner crown, vegetation, and soil). At each of these random locations, samples are made using a user-defined window size (Figure 4). Trials of differing sample window sizes determined that 36 × 36 pixels provided more consistent detection of crowns than larger windows, whereas a smaller window led to greater false positives. Based on this, 8000 samples of crown, vegetation, and soil class were created using a 36 × 36 window. Four bands of the Sequoia sensor were included in the model with three additional vegetation indices (VI), and three edge-detection filters (Lee Sigma, Canny, and 2D morphology) created in the eCognition Developer software.
To aid selection of the most effective VIs to include in the model, the feature space-optimization tool was utilized to select the Green Red Vegetation Index (GRVI) [60], Enhanced Vegetation Index 2 (EVI2) [52,61], and Normalised Difference Vegetation Index (NDVI) [62], given by their standard definitions:
$$\mathrm{GRVI} = \frac{\mathrm{Green} - \mathrm{Red}}{\mathrm{Green} + \mathrm{Red}} \qquad (2)$$
$$\mathrm{EVI2} = 2.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 2.4 \times \mathrm{Red} + 1} \qquad (3)$$
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \qquad (4)$$
where Green, Red, and NIR are the reflectance values of the corresponding Sequoia bands.
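For reference, the three selected indices can be computed from reflectance arrays as in the sketch below, which uses their standard published definitions: GRVI = (Green - Red)/(Green + Red), EVI2 = 2.5 (NIR - Red)/(NIR + 2.4 Red + 1), and NDVI = (NIR - Red)/(NIR + Red). The helper name `_ratio` and the choice to return NaN where a denominator is zero are assumptions made for this illustration; in the study the layers were created within eCognition.

```python
import numpy as np


def _ratio(numerator, denominator):
    """Element-wise division that yields NaN where the denominator is zero."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(numerator, denominator).shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


def grvi(green, red):
    """Green Red Vegetation Index."""
    green, red = np.asarray(green, dtype=float), np.asarray(red, dtype=float)
    return _ratio(green - red, green + red)


def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return _ratio(nir - red, nir + red)


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return 2.5 * _ratio(nir - red, nir + 2.4 * red + 1.0)


# Hypothetical reflectance values: one vegetated pixel and one empty pixel.
nir_band = np.array([0.5, 0.0])
red_band = np.array([0.1, 0.0])
green_band = np.array([0.2, 0.0])
ndvi_layer = ndvi(nir_band, red_band)
```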
Optimising a CNN model is an iterative process that requires selection of hyperparameters and user inputs defining the architecture of the network and manner in which it will be trained. Selection of these hyper-parameters influences the detection success and required processing [63]. Although some generally accepted approaches apply to this selection, many journal articles exist and studies are ongoing on model designs and optimization. Even so, inputs are often required to be tailored toward the study subject through experimentation for best results.
Hyper-parameters required to define a CNN architecture include the number of hidden layers and the number of feature maps (filtered outputs) generated for each hidden layer. A user-defined matrix filter (or kernel) carries out a local convolution, creating a hidden layer. The kernel size, the multiplicative weights selected, and the number of resultant feature maps influence the extraction of features used to identify the object of interest (in this case, banana crowns) [44,63]. Such features include distinct structures or uniform arrangements of pixels, such as edges, lines, focal points, or other patterns that allow the object of interest to be detected.
After several iterations trialling different hyper-parameter settings, a CNN model (Figure 5) was selected consisting of two hidden layers: a 5 × 5 kernel size with 32 feature maps and a 3 × 3 kernel with 64 feature maps, to which max pooling was applied, aiming to preserve the most prominent features by down-sampling data in order to reduce its complexity. Max pooling used a 2 × 2 filter with a stride of 2 in the horizontal and vertical image direction. The inclusion of two hidden layers provided a slightly more robust result across dates compared to a single hidden layer with minimal change in processing time. During the CNN training step, random batches of sample imagery were input into the model for the classifier to form a representation of a banana crown and characteristics able to be used for detection. Weights and biases used to compute this representation were updated using a backpropagation algorithm. The user-defined hyper-parameter that influenced training success most was the learning rate, which controls how the weights are adjusted during each iteration of the stochastic gradient descent optimization [63,64]. A learning rate of 0.0015 was set with 50 training samples and 5000 training steps initially used based on recommended eCognition default settings and guidance from agricultural tree-crown investigations by Csillik et al. [41]. In order to optimize training, a trial of different learning rates was conducted and it was found that increasing the learning rate either reduced the effectiveness of crown detection whilst providing minimal improvement to processing time or failed to provide a result (did not converge). The selected CNN model was saved and applied to subsequent capture dates, producing a probability heat map of plant locations.
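To illustrate how the stated kernel and pooling sizes reduce a 36 × 36 sample window, the sketch below implements 2 × 2 max pooling with a stride of 2 and traces the spatial size through the two hidden layers. It assumes unpadded convolutions and pooling after each hidden layer; neither detail is confirmed by the text, and the eCognition implementation may differ.

```python
import numpy as np


def output_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling step."""
    return (size - kernel) // stride + 1


def max_pool(feature_map, window=2, stride=2):
    """Keep the maximum of each window, down-sampling the feature map."""
    fm = np.asarray(feature_map, dtype=float)
    rows = output_size(fm.shape[0], window, stride)
    cols = output_size(fm.shape[1], window, stride)
    pooled = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r, c = i * stride, j * stride
            pooled[i, j] = fm[r:r + window, c:c + window].max()
    return pooled


def trace_sizes(sample=36):
    """Sizes after conv 5x5, pool, conv 3x3, pool (assumed layer order)."""
    sizes = [sample]
    for kernel, stride in [(5, 1), (2, 2), (3, 1), (2, 2)]:
        sizes.append(output_size(sizes[-1], kernel, stride))
    return sizes


layer_sizes = trace_sizes()
pooled_example = max_pool(np.arange(16).reshape(4, 4))
```

Under these assumptions the 36 × 36 window shrinks to 32, 16, 14, and finally 7 pixels per side, so the second layer's 64 feature maps each summarise the sample at a coarse 7 × 7 grid.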
To reduce noise in the resultant probability heat map and to aid in isolating potential crown centres, a 15 × 15 Gaussian filter (2D morphology pixel filter algorithm) was applied followed by a layer arithmetic filter to highlight the local maxima if they exceeded a set probability heat-map threshold. Threshold values ranged from 0 to 1, with those closer to 1 indicating a greater likelihood of inner crown detection [64]. Different probability heat-map thresholds were trialled to optimize classification results, with a threshold of 0.65 for primary detections, which was later lowered to 0.5 to improve recall (defined below).

Template Matching
TM was carried out using eCognition's template-matching algorithm based on Pearson's correlation coefficient (Figure 6). Template matching requires the manual collection of samples from imagery to create a template patch of user-defined size, representative of the object of interest (in this case, banana crowns). Templates are then compared to the imagery requiring analysis and the results of a cross-correlation are output as a layer displaying the level of similarity across the image, with larger values representing areas of greater similarity and likely to be banana crowns. To increase the robustness of the detection method to changes in illumination between the template and analysis datasets, they were normalised prior to correlation calculations [65]. For banana crown detection, template-sample generation was based on the dataset collected on 28 August using banana crown centre-point locations identical to those utilized for the CNN model to provide a comparative representation of samples using a template size of 36 × 36 pixels. Template matching in eCognition is based on a single input layer, and following trials of the various layers identified during feature optimization, EVI2 was selected as the most effective. eCognition's template editor was used to create grouped-type templates whereby the algorithm creates a user-defined number of templates from subgroups of similar objects identified within the samples. Implementation of the template-matching algorithm provides options for sample augmentation based on rotation (angle input) and requires a correlation threshold to be set to identify valid targets in the correlation layer. Adjustments were made to rotation and correlation thresholds and trials of the different layers determined that a suitable combination for template generation was 50 sample subgroups with 10° sample rotation applied based on the EVI2 layer using a correlation threshold of 0.5.
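The correlation layer at the core of template matching can be illustrated with a brute-force NumPy sketch that computes Pearson's correlation coefficient between a single template and every window of a single-band image. This is not eCognition's implementation: template grouping and rotation are omitted, and the function names and toy data are hypothetical. Subtracting the means and dividing by the norms is what makes the score insensitive to linear changes in brightness.

```python
import numpy as np


def pearson_correlation(template, window):
    """Pearson's r between two equally sized patches (0.0 if either is flat)."""
    t = np.asarray(template, dtype=float).ravel()
    w = np.asarray(window, dtype=float).ravel()
    t = t - t.mean()
    w = w - w.mean()
    denom = np.sqrt((t ** 2).sum() * (w ** 2).sum())
    return 0.0 if denom == 0 else float((t * w).sum() / denom)


def correlation_layer(image, template):
    """Slide the template over the image and record r at each position."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    rows, cols = image.shape[0] - th + 1, image.shape[1] - tw + 1
    layer = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            layer[r, c] = pearson_correlation(template, image[r:r + th, c:c + tw])
    return layer


# Toy example: a 3 x 3 "crown" placed at row 2, column 3 of an empty image,
# brightened linearly to mimic an illumination change between dates.
crown = np.array([[0.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 0.0]])
scene = np.zeros((8, 8))
scene[2:5, 3:6] = 2.0 * crown + 5.0
scores = correlation_layer(scene, crown)
best = np.unravel_index(np.argmax(scores), scores.shape)
```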

Local Maxima Filter
The underlying assumption for successful detection of individual banana plants using LMF is that plant crowns represent greater pixel brightness/values than surrounding areas in imagery [66]. To isolate crown locations, a fixed window approach was utilized, that is, a window of a fixed user-defined size was applied to the September and March orthomosaics to filter locations with a maximum response as potential plant crown candidates [66]. The LMF approach was implemented in the eCognition software by using a combination of algorithms (Figure 7). First, the selected image layer was smoothed using a 2D Gaussian filter of 35 × 35 pixels (2D morphology pixel filter algorithm), which was found to be an appropriate size due to the majority of crowns having similar dimensions. Smoothing can help identify crowns, as pixels that make up individual crowns may not appear homogeneous when using high-spatial-resolution imagery, leading to multiple false positive detections. A dilation filter of 35 × 35 pixels was then applied to further detect locations of maxima representing possible crown centres. Finally, a user-defined LMF threshold was applied to identify crown centres. This method was tested on layers identified during feature optimization, with the EVI2 layer providing the greatest contrast and consequently the best LMF crown-detection result.
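The smooth, dilate, and threshold sequence can be sketched in NumPy as below. A pixel is kept as a candidate crown centre when its smoothed value equals the maximum of its surrounding window (the dilated layer) and exceeds the threshold. The Gaussian sigma (window size divided by 6), the edge handling, and the function names are assumptions made for this illustration and are not parameters reported for the eCognition workflow; the window size must be odd.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_smooth(layer, size, sigma):
    """Separable Gaussian smoothing with an odd window size and edge padding."""
    x = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(layer, dtype=float), size // 2, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def dilate(layer, size):
    """Grey-scale dilation: each pixel becomes the maximum of its window."""
    padded = np.pad(layer, size // 2, mode="constant", constant_values=-np.inf)
    return sliding_window_view(padded, (size, size)).max(axis=(2, 3))


def local_maxima(layer, size, threshold, sigma=None):
    """Return (row, col) of window maxima that exceed the threshold."""
    sigma = size / 6.0 if sigma is None else sigma  # assumed default
    smooth = gaussian_smooth(layer, size, sigma)
    peaks = (smooth == dilate(smooth, size)) & (smooth > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(peaks)]


# Toy layer with two bright spots standing in for crown centres.
toy = np.zeros((21, 21))
toy[5, 5] = 1.0
toy[15, 14] = 0.8
centres = local_maxima(toy, size=5, threshold=0.01)
```

For full orthomosaics with 35 × 35 windows, a dedicated image-processing library would be considerably faster than this explicit sliding-window approach.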

Classification Refinement
A universal method was applied to improve upon the initial classification of the three detection methods by reducing the detection of false positives, effectively improving precision. Initially, false positives of non-banana vegetation were removed based on the CHM, using a threshold-segmentation algorithm to retain plants identified to be >1.5 m and <8.5 m in height. Due to the slight misalignment of the CHM and imagery layers (from the orthomosaics generated from the DTM surface), a buffer was created based on the distance map algorithm, with the buffer size adjusted to cover the inner crown extent. On application of the CHM, a small number of correctly identified banana crowns (true positives) was removed. The removal of the smaller banana plants (<1.5 m) did not decrease the effectiveness of classification as these plants were not in production and were likely to be removed during crop-management practices such as de-suckering and plant thinning, whereby unwanted side shoots and plants are removed.
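The height-based refinement can be illustrated as follows: each detected point is kept only if the CHM height around it lies between 1.5 m and 8.5 m. Taking the maximum CHM value within a square buffer is an assumption used here to stand in for the distance-map buffer that compensates for the CHM and imagery misalignment; the function name and toy raster are hypothetical.

```python
import numpy as np


def filter_by_height(points_rc, chm, min_h=1.5, max_h=8.5, buffer_px=0):
    """Keep detections whose buffered CHM height lies in (min_h, max_h)."""
    chm = np.asarray(chm, dtype=float)
    kept = []
    for r, c in points_rc:
        r0, r1 = max(r - buffer_px, 0), min(r + buffer_px + 1, chm.shape[0])
        c0, c1 = max(c - buffer_px, 0), min(c + buffer_px + 1, chm.shape[1])
        # Assumption: use the tallest CHM cell inside the buffer window.
        height = np.nanmax(chm[r0:r1, c0:c1])
        if min_h < height < max_h:
            kept.append((r, c))
    return kept


# Toy CHM (m): a 3 m plant, low ground cover, and a 9 m tree.
toy_chm = np.zeros((5, 5))
toy_chm[1, 1] = 3.0
toy_chm[3, 3] = 0.4
toy_chm[4, 0] = 9.0
detections = [(1, 2), (3, 3), (4, 0)]
retained = filter_by_height(detections, toy_chm, buffer_px=1)
```

In this toy case the detection at (1, 2) is retained even though it sits one cell away from the plant apex, which mirrors the role of the buffer in the workflow described above.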
On some of the classified outputs, particularly CNN, some single crowns had multiple detections, which tended to occur on larger crowns or when a lower threshold was set for heat-map detection. In order to preclude crowns having multiple centre points, points within 1 m of one another were merged to form a single location. This distance was chosen as few distinct crowns were observed to be within 1 m of each other. Within the eCognition software, the omission of points within 1 m of each other was achieved by growing crown centre points to 50 cm (pixel-based object resizing algorithm), merging adjoining crowns after the object growing (merge region algorithm) and exporting the results to a vector layer based on a single point representing the centre of gravity (export vector layer algorithm). To facilitate further testing, this operation was also carried out using GIS software (buffer-dissolve-centroids), producing identical results.
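The merging of nearby detections can be approximated in plain Python by grouping points that are connected through distances of 1 m or less (equivalent to 50 cm buffers that touch) and replacing each group with the mean of its member coordinates. Using the mean of the points rather than the centre of gravity of the dissolved buffer polygon is a simplifying assumption, so results may differ slightly from the eCognition and GIS outputs.

```python
from math import hypot


def merge_close_points(points, max_dist=1.0):
    """Merge points linked by distances <= max_dist into one mean location."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return [
        (sum(p[0] for p in members) / len(members),
         sum(p[1] for p in members) / len(members))
        for members in groups.values()
    ]


# Hypothetical detections (m): two hits on one crown and one separate crown.
merged = sorted(merge_close_points([(0.0, 0.0), (0.6, 0.0), (5.0, 5.0)]))
```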

Accuracy Assessment
To determine the performance of each of the three classification approaches, an accuracy assessment was carried out for the September and March orthomosaics against identified crown locations. A successful inner crown detection was considered to be within 1 m of the crown centre point; for the majority of banana plants this area encompasses the greatest density of aboveground biomass. From these inner crown locations, true positives (TP) (correct identification), false positives (FP) (incorrect identification), and false negatives (FN) (a crown is not identified) were identified and used to calculate three measures [50]: precision, the proportion of banana plant predictions that belonged to the banana class, with greater precision indicating that fewer features were incorrectly classified as banana plants; recall, the proportion of banana plants that were detected during classification, with greater recall indicating that fewer banana plants were omitted from the classification; and F-score, a measure of overall accuracy that weights precision and recall equally.
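A minimal sketch of this assessment is given below. It assumes a greedy one-to-one matching of detections to reference crown centres within the 1 m tolerance and takes the F-score as the harmonic mean of precision and recall. The matching strategy, the function names, and the example coordinates are illustrative assumptions, not the procedure used in the study.

```python
import math


def match_detections(detections, references, tolerance=1.0):
    """Count TP, FP, and FN for (x, y) points given in metres.

    Each detection claims the nearest unclaimed reference within
    `tolerance`, so a second detection on the same crown becomes an FP.
    """
    unmatched = list(references)
    tp = 0
    for det in detections:
        nearby = [ref for ref in unmatched if math.dist(det, ref) <= tolerance]
        if nearby:
            unmatched.remove(min(nearby, key=lambda ref: math.dist(det, ref)))
            tp += 1
    fp = len(detections) - tp
    fn = len(unmatched)
    return tp, fp, fn


def scores(tp, fp, fn):
    """Return (precision, recall, F-score), guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical data: a double detection, one stray point, one missed crown.
references = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
detections = [(0.2, 0.1), (0.4, 0.0), (3.5, 0.0), (10.0, 10.0)]
tp, fp, fn = match_detections(detections, references)
precision, recall, f_score = scores(tp, fp, fn)
```

In this example the second detection on the first crown counts against precision, not recall, which matches the way multiple detections on a single crown are treated as false positives.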

Relation of Plant Morphology and Detection Rate
To determine whether the UAV orthomosaics provided a representative estimation of banana plant morphology across capture dates, the individual crowns of 29 plants that had been measured in the field were manually delineated from the orthomosaic using GIS software. Following delineation, zonal statistics were used to extract the maximum value from the CHM to estimate height. A minimum oriented bounding box was then applied to each delineation, from which minimum and maximum lengths were derived and average crown spread was calculated [58]. Estimated maximum height and crown spread based on the CHM and orthomosaic data were compared to field measurements, from which a linear regression was calculated along with goodness of fit (R²) and root mean square error (RMSE) to assess the relationship between field- and UAV-derived measurements. This comparison indicated whether height and crown spread could be derived directly from the UAV data, so that the assessment of height and crown spread could be extended to banana plants that were omitted (false negatives) during the detection process by the three different approaches. Further investigation into false negative detections for each of the detection methods was carried out to determine whether a visible trend was apparent in the orthomosaics or whether specific morphological characteristics affected the detection rates. From the orthomosaics of each detection date, omitted crowns (false negatives) were manually delineated and a minimum oriented bounding box was applied to each. Based on the bounding box, perpendicular minimum and maximum crown diameter lengths were derived, from which average crown spread was calculated (Equation (1)) [58]. The derived measurements of crown spread for the false negatives were subsequently compared for each of the three detection approaches to assess how crown morphology impacted the detection rate.
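The calculations in this comparison can be sketched as follows. Equation (1) is not reproduced in this section, so average crown spread is assumed here to be the mean of the two perpendicular bounding-box lengths. RMSE is assumed to be computed directly between paired field and UAV values. All function names and example numbers are hypothetical and are not data from the study.

```python
import math


def average_crown_spread(min_length, max_length):
    """Mean of the two perpendicular bounding-box lengths (assumed form)."""
    return (min_length + max_length) / 2.0


def linear_regression(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean square error between paired measurements."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: field heights and CHM estimates in metres.
field_height = [2.0, 3.0, 4.0, 5.0]
chm_height = [2.5, 3.5, 4.5, 5.5]
slope, intercept, r2 = linear_regression(field_height, chm_height)
height_rmse = rmse(field_height, chm_height)
spread = average_crown_spread(2.0, 3.0)
```

The example shows why both statistics are reported: a constant 0.5 m overestimate gives a perfect R² while the RMSE still reflects the bias.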
The results of the banana plant detection (Figure 8) showed that reductions in precision were mostly attributed to false positives due to multiple detections of the same crown. The crowns with multiple detections were generally too large for the specified crown-centre buffer (50 cm) to effectively prevent multiple detections. This issue was more prevalent on 19 March, when banana plants had larger and denser crowns with greater levels of crown overlap. The greater crown overlap on 19 March is likely to have reduced the effectiveness of the classifiers in detecting crowns. As opposed to the CNN approach, neither the TM nor the LMF classifiers returned false positives caused by multiple crown detections on a single crown, which is likely a function of the window size defined for detection.

Detection Rate of Banana Plants
When comparing F-scores, CNN provided a superior result to both the TM and LMF approaches. TM and LMF provided better precision than CNN, but they detected fewer crowns (lower recall), with a further reduction in performance on the second capture date, whereas CNN recall was higher and consistent on both dates despite visible changes to the banana crowns and increases in other vegetation (ground-cover weeds and grass). Classification refinement improved results for each detection method in a slightly different manner. CNN classification improvements were mostly concerned with reductions in the occurrence of multiple detections on single crowns, whereas TM and LMF both displayed a similar detection pattern of having false positives on other vegetation (not banana crowns).
To improve recall prior to the application of classification refinement, detection thresholds were decreased for the probability heat map (CNN), correlation coefficient (TM), and the LMF. Improvements to recall were evident for CNN, whereas lowering detection thresholds for TM and LMF introduced additional false positive detections that were completely unrelated to the banana crowns, with no noticeable improvement to recall. A lowered threshold of 0.5 was applied to both the 20 September and 19 March CNN probability heat maps, with subsequent improvements to recall. As a result, the number of false positives increased, returning a precision of 0.52 and 0.50, respectively. As the majority of these false positives were caused by multiple detections on single crowns, the application of classification refinement, specifically the merging of multiple points within a single crown, was an effective method of improving results. An indicative result of CNN without the above classification refinement, but with the initial elevated detection threshold of 0.65 applied, provided a precision of 0.90, a recall of 0.86, and an F-score of 0.88 for the orthomosaic based on UAV image data acquired on 20 September, whereas a precision of 0.81, a recall of 0.79, and an F-score of 0.80 were achieved based on the UAV image data from 19 March.
A review of the classified orthomosaics revealed an increased amount and vigour of vegetation (banana and non-banana) on 19 March compared to 20 September, represented by areas of brighter pixel response on the background EVI2 layer. Increased growth of grass and weeds was observed on inter-row tracks and was particularly evident in the north eastern portion of the site. From field observations, banana plants growing in the north eastern plot were in visibly worse condition than those in the rest of the site. The results from this portion of the site revealed lower detection rates than those for the rest of the site, with a reduction in recall averaging close to 20% for both dates. Despite this reduction, including the north eastern plot in the overall calculations had little effect on site-wide recall, as the plot represented only a small proportion of all the plants.

Individual Plant Morphology Estimation
Field-based plant height and crown spread were related to UAV-derived estimates to assess whether UAV-measured plant height and crown spread could reliably be extracted for plants not measured in the field, specifically those plants representing false negatives. The linear regression between the CHM-estimated plant heights and field measurements provided a positive correlation, with an R² value of 0.84 (Figure 9a). The field-derived average height of measured plants was 3.24 m, whereas the CHM-estimated average was 3.83 m (n = 29), with a calculated RMSE of 0.78 m. Inconsistency between the CHM-derived plant heights and the field measurements could be related to a combination of factors, with the most likely being 3D reconstruction inaccuracies of the point cloud, overlapping crowns causing intermingled CHM measurements, or in-field crown height measurement variability due to the crowns' non-rigid structure.
A positive correlation was found between the average crown spread derived from the orthomosaic and the field measurements, with the linear regression producing an R² value of 0.85 (Figure 9b). Average crown spread for the field measurements and orthomosaic was 2.58 m and 2.46 m, respectively (n = 29), with an RMSE of 0.45 m. As both plant height and crown spread could be extracted from the UAV data, further assessment was undertaken to evaluate the impact of crown morphology on the detection accuracies of the three approaches. Both plant height and crown spread of banana plants that were omitted (false negatives) by the three different detection approaches were evaluated. As height characteristics were used during classification refinement, the accuracy of the heights extracted from the CHM is a consideration when determining the height thresholds to be used. However, as the CHM was often observed to be influenced by overlapping crowns of neighbouring plants, crown spread was considered more relevant for assessing the effect of crown morphology on detection accuracies. Crown spread also provided a more accurate representation of physical measurements and could have more relevance for comparison due to the detection mechanism of each of the methods. Only the results of crown spread are provided below.

Effects of Crown Morphologies on Banana Plant Detection Rates
The spread of undetected crowns for the CNN approach on both 20 September (n = 18) and 19 March (n = 28) showed slightly less variation compared to the TM (20 September n = 122, 19 March n = 243) and LMF (20 September n = 121, 19 March n = 245) approaches (Figure 10). The median crown spread of the false negatives was slightly lower for the CNN approach on both dates than those for TM and LMF (differences of −0.22 m in September and −0.34 to −0.39 m in March). The CNN interquartile range was larger than those for TM and LMF, indicating that the CNN false negatives had greater variability in crown spread around the median than those of TM and LMF. Observations from the orthomosaics indicate that CNN detection performance was affected by large crown overlap from multiple directions, or in situations where crowns were in close proximity to one another, whereby crowns were either obscured or their shape was changed by neighbouring crowns. Although the median crown spread of false negatives was smaller for CNN than for the other methods, it was observed that false negative detections generally occurred in areas of higher plant density (reduced spacing). Hence, the crown spread of individual plants is not specifically an indication of the amount of crown overlap, e.g., neighbouring plants may have had large crowns that encroached on the omitted plant, or there may have been double crown plants growing from the same corm (e.g., large follower plants). A second observation is that false negatives occurred in plants that appeared to have oddly shaped crowns or had leaves missing, which could relate to poor health or an orthomosaic abnormality. Plants located in the north eastern portion of the site that were observed to be in poor health provide an example of poor detection rates, with a cluster of undetected plants present on 19 March. It appears that false negative detections were more related to visibly incomplete crown structure than to a particular crown spread.
No obvious trend could be observed for TM and LMF in regard to average crown spread, with false negatives represented across a wide range of crown spreads on both dates and with similar performance on each date. A general observation is that both detection methods appeared to be affected by crown overlap, which was particularly the case for the 19 March capture. Based on these results, i.e., the large variation in crown spread of false negatives, it appears that the morphology of the banana plants had no obvious impact on the detection rate of TM and LMF and that, in comparison to CNN, both methods performed equally poorly across all crown sizes of undetected plants. However, the condition and crown structure could be linked to the performance of the CNN approach. Hence, CNN results may be affected by banana plants with poor leaf structure, which can occur due to plant phenological stage, poor health, wind damage, or during fruit development when leaf growth ceases.

Discussion
Similar to most tree crops, detection (and further delineation) of individual plants is important in order to enable plant-specific monitoring and targeted management [21,56]. This is particularly relevant to banana plants, considering phenology, morphology, and growth characteristics such as asynchronous growth and mobile crowns. Different seasons also affect banana plant growth, with plants growing faster in warmer periods due to the associated longer photoperiod and summer rainfall [67,68]. Previous studies have demonstrated that UAV-based remote sensing is able to produce orthomosaics and height data (DSM/DEM/CHM) with suitable spectral and spatial resolutions to detect individual crowns at high accuracy through use of neural network approaches [49,50]. Our study also ascertained the feasibility of UAV data for banana plant detection, identified potential alternative methods that are more computationally efficient, and verified the transferability of methods between dates. Contrasting with other cited studies, importance was placed on the detection of the plants' inner crown for this study to assist automatic plant counting and assess plant density. Plant detection may also facilitate leaf and crown delineation.

Evaluation of the Different Classification Methods
Given the effort and time required to manually demarcate and generate image templates suited for the TM approach, and given that TM returned similar accuracy to that of LMF, there seems to be no reason to choose TM. LMF provides a faster process, requiring no sampling or template generation. TM's limited ability to detect individual banana crowns could relate to the irregular crown shape and unique crown structure of the banana plants [50]. Attributes such as leaf folding and shredding, which cause uneven illumination and varying spectral reflectance properties [32,38], have been identified as a problem for template matching in other studies [69]. The TM method could be improved by applying several templates based on various spectral layers and vegetation indices and by using templates of different dimensions to improve the detection of plants of different crown sizes. Although TM has been effective for the detection of other species of plants such as coniferous trees and oil palms, which have greater homogeneity in crown structure [66,70], the application seemed poorly suited to banana plants, as demonstrated by a reduction in detections between capture dates, likely due to pronounced changes in plant morphology in response to plant growth by the later capture date (19 March).
The LMF approach provided a reasonable result similar to TM, but with minimal processing and analysis required, and was more time-efficient than TM. The LMF approach provided a distinct contrast between banana plant crowns and background non-target classes (soil and weeds) due to limited ground cover. During different seasons or for applications at other sites, LMF may not be suitable due to its reliance on contrast between ground objects and crowns. For this study, LMF's reduced banana crown detection over the capture dates is likely to be related to increases both in ground cover (non-banana vegetation) and in banana crowns with a higher amount of leaf overlap, causing a decrease in single crown definition and pixel contrast. It has been recognized that the application of LMF for crown detection in other species is prone to providing multiple false positives of the same crown, likely caused by crown structure [66]. For banana plants at our study site, multiple detections of the same banana plant were not apparent when using the LMF approach. However, a suitable filter size is important, considering that the structure of a banana crown, with all leaves emerging and often overlapping from a single central pseudostem, provides a focal density of biomass near the centre of the crown, distinguished by brighter pixels and leading to greater detection success. Further exploration of the use of multiple LMF window sizes may improve (omission) results, with the possibility of linking crown size to development (with larger crowns generally belonging to mature plants).
The CNN approach provided the best overall result compared to TM and LMF and could be considered more temporally robust over the capture dates, with detection levels remaining similar, which is consistent with the fundamental design of CNN architecture. Importantly, most detections of false positives were either caused by multiple detections of a single banana crown or situated on the outer crown and not able to be resolved during classification refinement, in contrast to TM and LMF, which had all false positive detections situated on other vegetation or ground objects. In general terms, considering the location of false positives, CNN was more effective in that the majority of false positives were situated on banana crowns as opposed to the other methods that detected unrelated objects.

Evaluation of Classification Refinement
By optimizing the initial banana plant detection and applying classification refinement to reduce multiple detections of the same crown, the CNN F-score improved by almost 5% for both dates in relation to the CNN F-score prior to applying the refinement steps. After classification refinement of the TM and LMF results, the F-score of the banana plant detection improved by 9–10%. The refinement steps were simple but effective, and may be further improved by adding criteria or by including additional GEOBIA refinement [41]. Given that the classification refinement focused on removing false detections to reduce overestimation, only precision, and not recall, was improved. As part of the refinement method was reliant on CHM thresholds, the thresholds may need to be tailored to individual farming practices and capture locations, and the method may not be applicable when banana plants are of mixed varieties with different growth heights (e.g., Dwarf varieties), are younger or include new plantings, or occur among mixed vegetation, with the latter an unlikely scenario under commercial settings.
Many studies utilise CHMs for individual tree-crown detection and delineation [71]. Most commonly, a CHM aids in the detection of individual crowns and crown centres by determining the location of apexes based on height. For banana plants and palm crops (e.g., date, coconut, and oil palms), the apex might not correspond to the crown centre, causing the highest point to be offset in relation to the centre. Such an approach would have been interesting for this study, but further integration of a CHM was hampered by the requirement to use a DTM for orthomosaic creation, causing an offset between the orthomosaic and CHM datasets. As a result, use of the CHM was limited to the creation of a distance map representation of the CHM for classification refinement, causing a reduction in spatial accuracy and detail. Conversely, the use of a CHM may not be beneficial for locating individual banana plants or crown centres when plants have high levels of leaf overlap, due to banana plants lacking a distinct crown apex [33,50].

Application Effectiveness for Multi-Temporal Detections
For this study, the banana plant detection results of the earlier date (20 September) had greater accuracy, which was attributed to the training dataset (28 August) being much closer in time to the 20 September capture than to the UAV image data collected on 19 March. Therefore, the growth stage/morphology of the banana plants was similar for the training dataset (28 August) and the UAV image data collected on 20 September. In the later March capture, the plant crowns were noticeably larger and the leaves had greater surface area with greater levels of crown overlap, making visual discrimination of individual crowns difficult. Morphology changes such as leaf folding or leaf shredding could have caused changes to crown appearance, and so could accelerated growth and vigour in response to increased rainfall, higher temperatures [67], and an increased photoperiod [68]. This situation may have affected this study, as the initial captures took place in spring (late August and September), whereas the second capture was in early autumn, just following the end of the summer growth period. It is also noted that the later capture in March was partially cloudy, which could have led to reduced leaf folding [68]. Further training and testing, additional sampling, sample augmentation, and time spent on optimization and validation may improve results further. For CNN model creation, future work should focus on automated sensitivity testing to optimize the CNN and on evaluation of differing architectures, which could make the process more efficient and improve accuracy [43–45].
By testing transferability of the different detection methods on different capture dates, this study introduced variation to capture conditions in relation to weather and season, plant phenology, and morphological changes (size of crown, structure, etc.). However, as this study was based in a specific regional setting, identical methods may not be applicable for detection of banana plants in different geographic regions or even on different farms. Variation in banana plant variety, growth, surrounding vegetation, soil, farming practices, and many other environmental and management/cultural factors can make transferability challenging [72]. Aside from training CNN models, gathering representative and high-quality samples is time-consuming. The creation of a larger, high-quality, and more diverse sample library or database would provide a great tool for broader adoption of CNN and improve transferability for UAV-based detection of individual plants in banana cropping systems.

Evaluation of Crown Morphology on Detection Success
Investigation into the occurrence of false negatives aids future development by determining the weaknesses of each of the detection methods and potential required refinement. It is vital for detection methods to have the ability to classify plants of varying morphologies, particularly considering banana plants' asynchronous growth and potential for plants to have highly variable morphologies related to their growth stage within the same field.
Commonly derived morphological attributes from the data presented include the physical size of crowns and the height of the plant. In other studies of UAV-derived tree height, underestimation is generally most common, and occurred in studies on avocado [20,73], lychee [56], and olive trees [21]. In contrast, height was found to be overestimated for 28 of the 29 banana plants investigated, which may have been caused by several factors. Flight-planning choices such as flight altitude, speed, flight pattern, and image capture rate influence GSD and the amount of image overlap, which can contribute to errors in CHM-derived height. Likewise, image-processing inaccuracies can be introduced during 3D reconstruction and through incorrect DTM estimation [20,73], and finally, error can be introduced from physical measurements in the field.
As opposed to height derived from the CHM, which may have been influenced by neighbouring overlapping plants, the crown spread was found to be a better option for the assessment of false negatives. The use of UAV-derived average crown spread calculations based on the two-axis method [58] was found to be more appropriate than height, as often only the innermost part of the crown was correctly reconstructed in the DSM-based orthomosaics (Figure 2). The UAV-based derivation of crown spread also corresponded to the way field-based measurements of average crown spread were measured and calculated.
From our investigation on the occurrence of false negative detections, in addition to the above recommendations for improvement of detection (Section 4.1), there is evidence that false negative crown detections could be reduced through the use of different-sized template samples (TM) and filters (LMF) to address the variance in crown spread. However, success of these recommendations in the detection of inner crowns requires further testing. The number of false negative detections of the CNN approach could be further reduced through additional training. However, due to the small number of false negative detections in this study, further development was not considered worthwhile, as remaining undetected crowns could quickly be detected manually. For time-series monitoring, historical plant crown detection locations can be carried forward over the season, with crown locations established and confirmed over time (e.g., pre- and post-capture dates). Therefore, further development of detection methods would only be required if problems are discovered in future deployments or when testing on imagery at different locations, with any further improvement relating to quality checking the orthomosaics prior to detection.

Conclusions
Considering the unique manner in which banana plants grow and their specific morphological and growth characteristics, mapping individual banana plants is an important step toward gaining accurate measurements and valuable information about plant growth, status, and condition through the use of spectral and morphological information (height, crown size, etc.) derived from UAV image data. The addition of multi-temporal image captures provides important insight into the dynamics of the crop, such as phenology, yield prediction, and timing for maintenance activities. In this study, multispectral UAV imagery was captured over different dates to determine the ability of three different methods to detect banana crowns within a GEOBIA environment. Detection results demonstrated that CNN was better suited to the detection of banana plants than TM and LMF. The CNN models were transferable between differing dates with acceptable results, with further improvement based on classification refinement utilizing contextual (distance between crowns) and crown elevation information (CHM). An exploration of the relationship between field- and UAV-derived measurements of plant morphology was found to provide insight into the characteristics of those plants that were omitted by the three classification approaches. Given these results, the application of the CNN method described is suitable for the purpose within the bounds of this study and adds important information for the development of suitable methods and further application possibilities in future research, as well as working toward applications in crop management.
Successful detection results from this study support the use of UAVs as an appropriate platform for further development of banana crop monitoring approaches. Similar to their application for other crops, once scientifically valid results and methods have been established through further research, UAV sensor platforms have the potential to provide a cost-effective solution for many facets of banana crop monitoring. It is the aim of future work to further develop the use of UAV-captured imagery for extracting useful information relevant to real crop management. Specifically, development is underway on crown delineation methods from UAV imagery and an assessment of its ability to measure banana plant morphological attributes (e.g., crown spread, height) in a more automated manner to ease crop monitoring. Following this, multi-temporal captures over the life cycle of banana plants will be used to monitor changes to growth and morphology to discover any noticeable trends and determine suitability for crop monitoring.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author and will only be supplied following permission from Earle Lawrence (farm holder) due to privacy considerations.