Deep Learning for Mango (Mangifera indica) Panicle Stage Classification

A pixel-based segmentation method was demonstrated to be confounded by developmental stage in estimation of flowering of mango. Categorization of panicles into three developmental stages was undertaken with single-stage and two-stage deep learning frameworks (YOLO and R2CNN), using either upright or rotated bounding boxes. For a validation image set and for total panicle count, the models MangoYOLO(-upright), MangoYOLO-rotated, YOLOv3-rotated, R2CNN(-rotated) and R2CNN-upright achieved: (i) RMSEs of 25.6, 16.0, 15.4, 25.8 and 32.3 panicles per tree image, (ii) mean average precision (mAP) scores of 72.2, 69.1, 65.0, 62.5 and 70.9%, and (iii) weighted F1-scores of 76.5, 76.1, 74.9, 74.0 and 82.0, respectively. For a test set of images involving a different orchard and cultivar and use of a different camera, the R² between machine vision and human counts of panicles per tree was 0.86, 0.80, 0.83, 0.81 and 0.76 for the same models, respectively. Thus, models generalised well, but with no consistent benefit from use of rotated over upright bounding boxes. While the YOLOv3-rotated model was superior in terms of total panicle count, the R2CNN-upright model was more accurate for panicle stage classification. To demonstrate practical application, panicle counts were made weekly for an orchard of 994 trees, with a peak detection routine applied to document multiple flowering events.


Introduction
Mango (Mangifera indica) trees produce panicles bearing hundreds of inconspicuous flowers, of which at most three or four flowers will develop fruit, although frequently only one or none will so develop. The assessment of the number of panicles on a tree thus sets a maximum potential for the crop yield of that season, while assessment of the stage of panicle development is useful for assessment of the time spread of flowering, and thus the likely time spread of the harvest period. Mapping areas of early flowering can also guide selective early harvesting. Panicle detection may also inform selective spraying operations. However, the manual assessment and recording of panicle number and stage is a tedious task.
Machine vision has been applied for assessment of the level of flowering for several tree crops where the flowers are easily distinguishable from the background based on colour thresholding. For example, [1] reported a prediction accuracy of 82% on apple flower count, relative to a manual count, [2] achieved an R² of 0.94 between machine vision and manual count of tangerine flowers, [3] claimed an average detection rate of 84.3% on peach flowers, [4] reported an R² of 0.59 between machine vision and manual counts of apple flower clusters, and [5] obtained an F1-score of 73% for tomato flower detection. [6] avoided a direct count of flowers, but rather characterised almond 'flowering intensity' in terms of a ratio of flower and canopy pixels, reporting a poor relationship (R² = 0.23) for this index for a given tree between two seasons. All these reports were based on segmentation routines, generally involving colour, given the obvious colour difference between flowers of these species and background.
There are some reports on use of machine vision for assessment of the number of flowering panicles (i.e., much-branched inflorescences with many flowers). For example, [7] used intensity level in LAB colour space as an index of the number of grape flowers in inflorescences imaged against a black background. An R² of 0.84 against human count was achieved. [8] used Scale-Invariant Feature Transform (SIFT) descriptors/features along with a Support Vector Machine (SVM) to detect rice flower panicles in images.
Recent review papers have emphasised the use of neural networks and deep learning in agricultural machine vision in general [9] and for fruit detection and yield estimation [10]. For example, [11] noted that better performance in fruit detection and localization was achieved with use of neural networks compared to traditional models based on colour thresholding and hand-engineered feature extraction methods. [12] introduced the use of lighter weight, single shot detectors such as YOLO [13], which allow faster computation times, for the application of fruit detection and count. [14] used the Clarifai [15] Convolutional Neural Network (CNN) architecture to extract features from possible flower regions obtained from super-pixel segmentation, followed by an SVM [16] for flower detection. For apple, peach and pear datasets, this method outperformed other methods of that time that were based on SVMs and Hue-Saturation-Value (HSV) colour thresholding methods. Extending their earlier work, [17] used a Fully Convolutional Neural Network (FCN) from [18] for flower detection on tree images of apple, peach and pear. A region growing refinement (RGR) algorithm was implemented to refine the segmentation output from the FCN. This method achieved F1-scores of 83.3, 77.3, 74.2 and 86% on two apple, peach and pear flower datasets respectively, outperforming their previous Clarifai CNN method [14] and the HSV colour model.
Mango panicle size changes with developmental stage, increasing through the stages of bud break, 'asparagus', elongation, anthesis (flower opening) to the full bloom 'Christmas tree' stage, then decreasing with flower drop. Panicle structure is thus more complex than that of a single flower. Therefore, machine vision detection of mango panicles is more challenging than the detection of the single flowers of apple, citrus and almond trees. Deep learning methods of object detection may suit the task of detection and counting of panicles by developmental stage, through automatic learning of useful features for classification [10].
There are two reports on the use of machine vision to assess mango flowering. [19] and [20] used the traditional method of pixelwise segmentation to segment panicle pixels from canopy pixels, with results expressed either as panicle pixel count per tree or as the ratio of panicle to canopy pixel count (termed 'flowering intensity'). This procedure was implemented on images obtained at night using artificial lighting, processed with a colour threshold followed by SVM classification to refine the segmentation results. The use of a Faster R-CNN [21] deep learning technique to count panicles was reported by [20]. This work was limited to estimation of the extent of flowering and the time of the peak flowering event. An R² of 0.69, 0.78 and 0.84 between machine-vision flowering intensity and in-field human count of panicles per tree for 24 trees was reported for the segmentation method and for a deep learning Faster R-CNN framework using dual- and multi-view imaging approaches, respectively.
These earlier reports employed upright bounding boxes. Use of an annotation bounding box as tight as possible around the object has been recommended, to avoid background noise in the training image sets [10]. However, panicles are oriented across a range of angles. Upright bounding boxes will therefore not fit tightly around the object perimeter (Figure 1), and a larger amount of background signal will be included in the object class for training. This could adversely affect the classification accuracy of the model. There are several methods involving use of rotated bounding boxes. The two-stage detector R2CNN (Rotational Region CNN for Orientation Robust Scene Text Detection) [22] is a modification of the Faster R-CNN object detection framework that incorporates training on rotated bounding box annotations for detecting arbitrarily-oriented objects in images. This modified framework seems suited to the task of panicle detection. The single-stage detector YOLO, meanwhile, allows a number of data augmentation measures, including random rotation of the labelled bounding boxes during training by a range (in degrees) specified by the user in the configuration file. In the current paper, the task of panicle detection is extended to another level through classification to developmental stage, with comparison of a large (R2CNN framework with ResNet101 CNN) and a small (YOLO framework with MangoYOLO CNN) object detection architecture, and consideration of the use of rotated, compared to upright, bounding boxes. To the authors' knowledge, the current study is the first to classify the stage of flowering for an on-tree fruit crop and is the first report on use of the rotated bounding box method of R2CNN for flower panicle detection. The current study utilized the imaging hardware used by [20], allowing a direct comparison with results of the traditional machine learning approach of [20]. Field relevance was demonstrated by assessment of orchard flowering at regular intervals (e.g., weekly) to provide information on the timing of flowering peaks, for use in estimation of harvest timing.
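To illustrate the scale of this background inclusion, the following sketch computes the fraction of an upright bounding box actually occupied by a rotated, approximately rectangular object; the panicle dimensions and the 40 degree tilt are illustrative assumptions only.

import math

def upright_box_fill(w, h, theta_deg):
    # Fraction of the axis-aligned (upright) bounding box filled by a
    # w x h rectangle rotated by theta degrees.
    t = math.radians(theta_deg)
    bb_w = w * abs(math.cos(t)) + h * abs(math.sin(t))
    bb_h = w * abs(math.sin(t)) + h * abs(math.cos(t))
    return (w * h) / (bb_w * bb_h)

# A hypothetical 60 x 160 pixel panicle tilted 40 degrees from vertical:
print(f"{upright_box_fill(60, 160, 40):.0%} of the upright box is object")
# prints ~40%, i.e., roughly 60% of the upright box is background signal,
# whereas a rotated box would remain tight at any orientation.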

Image acquisition
Tree images of orchard A (Table 1) were acquired at night every week from August 16 to October 18, 2018, using a 5 MP Basler acA2440-75µm RGB camera and a lighting rig (700 W LED floodlight) mounted on a farm vehicle driven at a speed of about 5 km/h, as described by [20]. The orchard contained 994 trees, and thus each weekly imaging event captured 1,988 images.
The image set of 24 trees from orchard B (Table 1; from [20]) was used as a test set. In that study, images were acquired using a 24 MP Canon DSLR 750D camera. The number of open, flowering panicles on these trees was manually counted. For both orchards, trees were imaged from each side, with a view from each inter-row ('dual-view' imaging).

Image annotation and labelling
Mango panicles in the training image sets were categorized into three stages: (i) stage X - panicles with flowers (whitish in colour) that were not fully opened (Figure 2, top row); (ii) stage Y - panicles with open flowers (Figure 2, middle row); and (iii) stage Z - panicles displaying flower drop and fruit set (Figure 2, bottom row). A four-category system was initially trialled, but human differentiation of the two early stages was problematic, and the resulting model performance was poorer than for the three-category system (data not shown).

Training, validation and test sets
Training was based on 54 images of trees of orchard A (Table 2), with images drawn from different weeks. A validation set was assembled from images of one side of a single orchard A tree, acquired over a six-week period (Table 2). These images were not included in the training set. The training dataset also included annotations for background (369 snips), as required for R2CNN training. R2CNN uses the background class for hard negative mining; it was not treated as a detection class during inference. The YOLO object detection framework does not require a background class, as all parts of the image other than the training bounding boxes are automatically treated as background. Therefore, the background class was not used in YOLO model performance evaluation.
The test set (Table 2) was an independent set consisting of images of orchard B, from the study of [20]. These images were of trees of a different cultivar and from a different orchard, and acquired with a different camera, to those of the training and validation sets.

Computing
Model training and testing were implemented on a graphics node of the CQUniversity High Performance Computing (HPC) facility, with the following specifications: Intel® Xeon® Gold 6126 (12

Detection and classification models
The pixel segmentation method of [20] was employed using the settings of [19]. Mango panicle classification to three developmental stages was attempted using the following architectures:
YOLOv3-rotated: The single-shot object detection framework YOLOv3 [23] is deeper and more accurate than its previous versions [13] and [24]. The data augmentation feature of the configuration file that allows rotation of images during training was enabled for this training exercise. Images were rotated randomly within a range of 40 degrees to mimic the natural range of panicle orientations in tree images. Network input resolution was set to 1024×1024 pixels, following the YOLO requirement that input images are square with a resolution that is a multiple of 32. This resolution can be set to higher values, but at the cost of higher computation memory and slower train/test speed. Four classes (three stages and one background class) were used in training; however, detection of the background class was ignored during model performance evaluation. The model was trained for 35.7 k iterations with a batch size of 32 and data augmentation techniques defaulted to the YOLOv3 settings (saturation = 1.5, exposure = 1.5, hue = 0.1). The learning rate and momentum were set to the default values of 0.001 and 0.9, respectively. No transfer learning was employed, with the weights of the convolutional neural network initialized at random values. Finally, saved model weights from the 33 k iteration were used for testing and validation.
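These settings correspond to fields of the darknet-style YOLO configuration file. A minimal sketch of the relevant [net] section is given below; the key names follow standard darknet configuration files, and the grouping shown is an assumption rather than a copy of the file used in the study.

[net]
batch=32            # batch size used in training
width=1024          # network input resolution; must be a multiple of 32
height=1024
channels=3
momentum=0.9        # default momentum
learning_rate=0.001 # default learning rate
angle=40            # random image rotation range (degrees); enabled for the -rotated models
saturation=1.5      # default colour-space augmentation settings
exposure=1.5
hue=0.1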
MangoYOLO-rotated: MangoYOLO was originally conceived for on-tree mango fruit detection, and has an architecture based on the YOLOv3 [23] object detection framework, optimized for better speed and accuracy. In this architecture, a YOLO deep learning object detection framework is used with a MangoYOLO CNN classifier [12]. The training parameters of the YOLOv3-rotated model were retained for MangoYOLO-rotated model training. Images were rotated randomly within a range of 40 degrees. The model was trained for 77.5 k iterations with a batch size of 32. Finally, saved model weights from the 77.4 k iteration were used for testing and validation.
MangoYOLO(-upright): MangoYOLO(-upright) is the same as MangoYOLO-rotated, except that the random image rotation feature of the configuration file was turned off, as was the case in the fruit detection work of [12]. The model was trained for 89.5 k iterations with a batch size of 32. Finally, saved model weights from the 66 k iteration were used for testing and validation.

Estimation of peak of flowering
Repeated (weekly) orchard imaging provided a time course of panicle number by week. The signal.find_peaks function from the SciPy (www.scipy.org) package was used to find peaks in the panicle numbers per tree side. Peak properties were specified as height = 10 and distance = 2. Height sets the minimum height of a peak, i.e., the minimum number of panicles required for a local maximum to be considered a peak; this parameter filters noise (insignificant small peaks) from the signal. Distance sets the minimum horizontal distance, in samples, between neighbouring peaks.
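A minimal sketch of this peak detection step, using illustrative weekly counts for one tree side (the actual per-tree data are not reproduced here):

import numpy as np
from scipy.signal import find_peaks

weekly_counts = np.array([5, 42, 78, 60, 95, 40, 12, 8, 3])  # 9 weeks, illustrative

# height=10: a local maximum must reach at least 10 panicles to count as a
# flowering peak (filters noise); distance=2: neighbouring peaks must be at
# least 2 samples (weeks) apart.
peaks, properties = find_peaks(weekly_counts, height=10, distance=2)

print("Peak weeks (0-indexed):", peaks)                  # [2 4]
print("Counts at peaks:", properties["peak_heights"])    # [78. 95.]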

Segmentation method
The pixel-segmentation method of [20] differentiates pixels associated with panicles from background based on fixed colour threshold values. In poorly illuminated areas of images, panicles were not detected (false negatives), while in some images, parts of the tree such as branches or brownish/yellowish leaves were incorrectly classified as flower pixels. Two example images are presented, processed using the segmentation [20] and deep learning R2CNN methods (Fig. 4). Flowering intensity (ratio of panicle pixels to canopy pixels) was assessed following the method of [20] and correlated to the panicle counts per tree made using the R2CNN(-rotated) method, for all 994 trees (1988 images) of an orchard for each of seven consecutive weeks (Table 3). In the last two weeks, a better correlation was obtained with stage Y panicle count than with total panicle count, as the proportion of stage Y panicles changed (Table 3).
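For orientation, the sketch below shows a fixed-threshold flowering-intensity calculation in the spirit of the method of [20]; the HSV threshold ranges and file name are illustrative assumptions, not the values of the cited study.

import cv2
import numpy as np

img = cv2.imread("tree_night_image.png")          # hypothetical image file
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Panicles appear as bright, whitish (low saturation) regions under the rig
# lighting; canopy appears as darker green. Both ranges are assumptions.
panicle_mask = cv2.inRange(hsv, (0, 0, 180), (180, 60, 255))
canopy_mask = cv2.inRange(hsv, (35, 40, 20), (90, 255, 255))

# Flowering intensity = ratio of panicle pixels to canopy pixels.
intensity = np.count_nonzero(panicle_mask) / max(np.count_nonzero(canopy_mask), 1)
print(f"Flowering intensity: {intensity:.3f}")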

Deep learning methods
An example of one image processed with the three methods of MangoYOLO-rotated, R2CNN(-rotated) and R2CNN-upright is given as Figure 5. RMSE for estimates of total panicle count per tree of the validation set was lowest with the YOLOv3-rotated method, while the lowest RMSE for counts of stages X, Y and Z was achieved with the MangoYOLO-rotated and R2CNN-rotated methods (Table 4). Low RMSE values were associated with low bias values. RMSE and bias were generally lower with use of rotated compared to upright bounding boxes, for both YOLO and R2CNN methods (Table 4). The highest mean average precision and weighted F1-score were obtained with the MangoYOLO and R2CNN-upright models, respectively (Table 4).
Table 4. Panicle stage detection results on the validation set. RMSE refers to a comparison with ground truth assessments of panicles per image. All values refer to panicles per tree image. Best results for a given metric and panicle stage are shown in bold. (1) mAP is presented in the 'Total' column. (2) Weighted F1 is presented in the 'Total' column.
The YOLO and R2CNN methods were also used in prediction on the test set images (two images per tree), collected from a different orchard, cultivar and camera to the calibration set (Table 5, Figure 6). Predicted counts were compared to human counts of panicle stages per tree. The MangoYOLO-rotated method achieved the lowest RMSE and bias of the five methods, while the MangoYOLO(-upright) method returned the highest R² but suffered a high bias (Table 5). The R2CNN-upright model returned a lower R² than the base R2CNN(-rotated) model. The R2CNN-upright result was similar to that reported by [20] for a Faster R-CNN (VGG-16) method (which uses upright boxes) on the same trees of the test set (Table 2).
Table 5. Comparison of panicle assessment methods on the test image set (from [20]) in terms of the R² and RMSE between machine vision panicle counts on images (sum of two sides of a tree; two images per tree) and in-field human counts of panicles per tree. For reference, the Faster R-CNN method of [20] achieved R² = 0.78 (RMSE and bias not reported).

Pixel segmentation:
The pixel-segmentation method of [20] outputs the total flower and canopy-like pixels and does not provide an estimate of the number of panicles. A correlation was obtained between flowering intensity values and panicle counts, consistent with the report of [20]. However, pixel number per panicle varies with stage of panicle development, and so confounds panicle number with developmental stage. A stronger correlation between flowering intensity and panicle count is expected when all panicles are at the same stage of development.
The pixel segmentation method also uses a fixed colour threshold range, but the colour of panicles and canopy may vary between cultivars and with growing conditions, resulting in false positives and negatives. Use of the segmentation method was therefore discontinued in favour of a deep learning method for object detection, echoing the advice of [12].

Defining flowering stages.
Mango panicle development is typically described in terms of eight stages: quiescent bud, bud swelling, inflorescence axis elongation, inflorescence sprout, flower opening, inflorescence branching, well-developed inflorescence, and inflorescence starting to set fruit [25]. The early stages are difficult to differentiate from vegetative structures at a distance, and classification of these on tree images was not attempted. Initially a four-stage classification was attempted, adding an early stage of development, but the level of uncertainty in human labelling of the images was high (data not shown). Even with the three-class model, there was uncertainty in annotation, between late X-stage and early Y-stage panicles and between late Y-stage and early Z-stage panicles. This uncertainty will adversely affect the classification accuracy of the trained models on developmental stage, but it will not affect the model estimate of total panicle number.
Deep learning methods: use of rotated bounding boxes
R2CNN(-rotated) is a Faster R-CNN framework with the added capability of training on rotated objects (bounding boxes). It was hypothesised that use of rotated bounding boxes in training and prediction would improve results for panicle detection.
Models trained using image rotation and rotated bounding boxes produced numbers closer to the total panicle count per image, i.e., lower bias (Table 4). However, lower mAP and weighted F1-scores for the rotated models suggest that there was more error in classification of detected panicles compared to the models trained with upright bounding boxes (Table 4). This result indicates that rotated models were better at detecting panicles of different orientations and are suited to single-class applications, for example the task of detecting and counting panicles. However, models classifying panicles into different classes did not benefit from this approach. Thus upright models are recommended for applications involving multi-class classification.
We conclude that there was no benefit from use of random panicle rotation as a data augmentation technique for panicle stage classification.

Deep learning methods: a comparison
The five methods (MangoYOLO(-upright), MangoYOLO-rotated, YOLOv3-rotated, R2CNN(-rotated) and R2CNN-upright) ranked differently in terms of RMSE, bias, precision and F1-score. Such differences are expected, given the differences in calculation of these metrics. RMSE is a 'gross' metric compared to precision and F1-score, as false negatives can be offset by false positive detections to yield a low RMSE. Therefore, model performance was primarily assessed on the mAP and F1 evaluation metrics (Table 5). As a rule of thumb, a deeper CNN can yield superior prediction accuracy; however, performance is also related to architecture (connecting layers, etc.) as well as depth [26]. A comparison of the R2CNN-upright method, with its deeper CNN classifier (ResNet101), to the single-shot MangoYOLO(-upright) method is useful in this respect. In the validation exercise, the R2CNN-upright method produced a higher weighted F1-score, but a lower mAP score, compared to the MangoYOLO(-upright) method (Table 4). This outcome suggests that the YOLO method suffered from lower recall rates.
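A small numerical illustration of this point, using made-up per-image counts:

import numpy as np

human = np.array([50, 60, 70])      # hypothetical ground-truth counts per image
machine = np.array([50, 58, 73])    # counts in which FPs and FNs may cancel

rmse = np.sqrt(np.mean((machine - human) ** 2))
bias = np.mean(machine - human)     # signed mean error

print(f"RMSE = {rmse:.2f}, bias = {bias:+.2f} panicles/image")
# A model can achieve this low RMSE while still mislocating or mislabelling
# many panicles, hence the reliance on mAP and F1 as the primary metrics.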
The performance of the deeper single-shot detector, YOLOv3, was also compared to that of the MangoYOLO method. The better results (mAP and F1-score) of the MangoYOLO method were consistent with previous results in fruit detection [12], and suggest that the MangoYOLO architecture is better suited to this application than that of YOLOv3.
The YOLOv3-rotated model returned the lowest bias and RMSE for count of panicles per image, but R2CNN-upright returned the highest F1-score (Table 4 and Figure 5) and the MangoYOLO(-upright) model achieved the highest mAP score (Table 4). For the test exercise, the standout result was the low bias and associated low RMSE of the MangoYOLO-rotated model (Table 5). Panicle image size (in pixels) can vary with camera-to-tree distance or use of a different camera lens. YOLO models can be robust to such conditions, as the YOLO object detection framework provides multiscale image training as part of data augmentation. Other data augmentation techniques, such as random rotation and random changes in hue, saturation and exposure of training samples, provide robustness of YOLO models to variation in lighting conditions and cultivars. In comparison, R2CNN, like Faster R-CNN, does not support multiscale training, and the only data augmentation provided during training is image flipping (horizontal and vertical flips). Therefore, the YOLO method of object detection is recommended for applications similar to that considered in the current study.
The speed of classification by the YOLO and R2CNN methods could not be compared directly, because R2CNN supports standard CNN classifier architectures such as ResNet and MobileNet while YOLO uses a custom CNN classifier architecture, with no support for standard CNN classifiers. It is expected, however, that YOLO will process images faster than R2CNN, as the object detection framework of YOLO uses a single-stage detection technique, in comparison to the two-stage detection technique of R2CNN. [12] documented speed and memory requirement comparisons of several deep learning object detection frameworks. For example, mango fruit detection on tree images (512×512 pixels) took 10, 25 and 67 ms and required 873, 2097 and 1759 MB of GPU memory for the MangoYOLO, YOLOv3 and Faster R-CNN (VGG16) models, respectively.
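As a rough indication of what those per-image times imply at orchard scale, the sketch below extrapolates the detection times reported by [12] to one weekly imaging run of orchard A (1988 images); times for the larger panicle images of the current study would differ.

times_ms = {"MangoYOLO": 10, "YOLOv3": 25, "Faster R-CNN (VGG16)": 67}
n_images = 1988  # one weekly imaging run of orchard A

for model, t in times_ms.items():
    print(f"{model}: {1000 / t:.0f} images/s, "
          f"~{n_images * t / 1000:.0f} s of pure detection time per run")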

Applications
One of the challenges in precision agriculture is the display and interpretation of large data sets in a form useful to the farm manager, i.e., an appropriate decision support tool for farm management is required. Thus, the presentation of data on panicle count by stage of development for every tree in an orchard requires consideration. A key assessment is the timing of flowering, which can be used in conjunction with calendar days or thermal time to estimate harvest date. This is useful for planning harvest resourcing. Inaccurate harvest timing estimation will result either in harvest of less mature fruit, resulting in lowered eating quality, or of over-mature fruit, with reduced postharvest life.
For example, panicle stage data can be presented on an individual tree basis by week as a line graph (Fig. 7). This figure enables identification of early flowering areas in one row of an orchard, but it does not scale well to consideration of an entire orchard block. Data can also be presented on a farm map, using a colour scale to display values, per week of imaging (Fig. 8), using the web-app described by [27], which allows display of the individual images associated with each tree. Alternatively, data can be summarised for an orchard block, with a tally of panicles by development stage by week (Figure 9). In the example data, there was a shift in developmental stage from stage X to Y to Z over the monitored period, as expected given panicle development. A peak in total count occurred at week 5. This profile can be interpreted as a single sustained flowering event maintained over four weeks (weeks 1 to 5). In another approach, the timing of the peak in panicle number per tree can be assessed for individual trees, given a time course of images and use of a peak detection routine (Fig. 10). The number of trees with a peak in stage X panicle count each week can then be displayed to give a sense of when the orchard in general had a peak flowering event (see the sketch below). For example, of 1986 images of 993 trees in orchard A, 168 tree-sides (8%) displayed two flowering events (peaks in stage X) (Fig. 11). The first flowering event occurred in the first two weeks of imaging, while another peak flowering event occurred in the fourth week. This information on major flowering events can be coupled to temperature records to provide an estimated harvest maturity time based on thermal time [27]. If the events are sufficiently large and temporally separated, the grower can consider these as separate harvest populations. Alternatively, data can be presented on a farm map for selective display of the spread of panicles per week of imaging (Fig. 12). In this software [20], an individual tree can be selected to display the flowering level, tree image, image capture date and tree identity (not shown).
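A sketch of how the per-tree peak detection scales to this orchard-level summary; counts_by_side stands in for the real stage-X count time series (here random placeholder data).

import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
counts_by_side = rng.poisson(30, size=(1986, 9))  # placeholder stage-X counts

peak_week_tally = np.zeros(9, dtype=int)  # tree-sides peaking in each week
double_peak_sides = 0
for series in counts_by_side:
    peaks, _ = find_peaks(series, height=10, distance=2)
    peak_week_tally[peaks] += 1
    if len(peaks) == 2:
        double_peak_sides += 1

print("Tree-sides with a stage-X peak, by week:", peak_week_tally)
print("Tree-sides with two flowering events:", double_peak_sides)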

Conclusions
Deep learning methods can automatically learn the useful features required in an application. For panicle detection this is likely to involve colour and shape patterns. Similar to that demonstrated by [12] for the fruit detection application, a deep learning method was demonstrated to generalize in terms of application to an orchard of different cultivar, growing conditions and canopy architecture, also imaged with a different camera to that used for the training images. Similarly, [17] reported that a deep learning FCN trained on apple flower images was able to generalize well for peach and pear flower detection, and to operate across camera hardware. The advantage of deep learning models over traditional segmentation techniques for the application of panicle developmental stage detection is thus confirmed.
In addition to total panicle counts, classification of panicles into several developmental stages was achieved. This is an important step in allowing extraction of information on the time-spread of flowering, and thus of the future harvest. Counts of panicles in stage X over time were used in estimation of the timing of peak flowering events. Counts of panicles in stage Y demonstrated the highest correlation with the flowering intensity level per tree.
There has been continuous evolution of CNN architectures in terms of increased depth (number of layers) for better performance, but recent improvement in the representational capacity of deep CNNs has been attributed to the restructuring of processing units (e.g., having multiple paths and connections) rather than just increased depth [26]. The better performance of MangoYOLO compared to the deeper CNN architecture of YOLOv3 for panicle detection is consistent with the report of [12] for fruit detection, and can be ascribed to CNN design considerations.
The YOLO method of object detection is recommended for total panicle classification and count, and similar applications. The framework offers a large set of data augmentation features (e.g., image rotation, HSV channel values, jitter, image scale, etc.) that can be specified by the user to create robust models. Moreover, YOLO is a single-shot detection technique and provides a faster detection rate than two-stage detection methods (e.g., Faster R-CNN), which allows YOLO models to be operated in real time.
The use of rotated training images improved the detection of panicles, and this approach is thus recommended for applications that require counting instances of objects that occur at a range of orientations in natural scenes. However, for applications that place more emphasis on multi-class classification of the objects, use of upright bounding boxes is recommended.
There have been several publications on machine-vision based detection and crop load estimation for tree fruit crops, but relatively few reports of in-field tree crop flower assessment, especially in terms of developmental stage. There has been even less attention given to the presentation and management of such data. The flower peak detection and display options presented in the current study should prompt further work in this field.
Funding: This work received funding support from the Australian Federal Department of Agriculture and Water through Horticulture Innovation (project ST19009, Multiscale monitoring tools for managing Australian tree crops).

Figure 1. Left to right: original image, upright bounding box and rotated bounding box.

Figure 2. Examples of panicles at stages X to Z, by rows.

R2CNN(-rotated): R2CNN(-rotated) is basically a Faster R-CNN object detection framework with a modification to support training on rotated objects. The R2CNN(-rotated) implementation supported only three CNN architectures: mobilenet_v2, resnet50_v1 and resnet101_v1. Given that deeper CNN models generally produce better results in terms of object detection and classification (Koirala et al., 2019), the ResNet101_v1 CNN architecture was used with the R2CNN(-rotated) framework for model training. The TensorFlow re-implementation of R2CNN (https://github.com/DetectionTeamUCAS/R2CNN_Faster-RCNN_Tensorflow) was used for training and testing in this study. All model training parameters were set to the default values. With R2CNN(-rotated), as for Faster R-CNN, the input resolution can be set to be square, or the original aspect ratio can be preserved (shorter side scaled to 800 pixels and longer side scaled accordingly). The original Basler images (2464×2048 pixels) were automatically resized to 962×800 pixels during training and testing. The RGB channel pixel mean values were initialized with the values (R = 41.647, G = 41.675, B = 43.048) calculated from the training dataset. The model was trained for 146 k iterations with a batch size of 1 (as no support existed for batches of more than one image), a learning rate of 0.0003 and momentum of 0.9. ImageNet weights (http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz) were used as transfer learning to initialize the R2CNN(-rotated) model. Finally, saved model weights from the 146 k iteration were used for testing and validation.
R2CNN-upright: To allow a comparison between rotated and upright box annotation in model training, a R2CNN-upright model was established. The R2CNN-upright model was trained using a training set of upright annotation boxes. This was achieved by setting the orientation of all boxes to 0 degrees, as used for training of the YOLO method. The training parameters used for the R2CNN(-rotated) model were retained for training of the R2CNN-upright model. The model was trained for 146 k iterations and the final weights were used for testing and validation.
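A minimal sketch of how per-channel pixel means such as those above (R = 41.647, G = 41.675, B = 43.048) can be computed over a training set; the directory path is a hypothetical placeholder.

import glob
import numpy as np
import cv2

channel_sums = np.zeros(3, dtype=np.float64)
n_pixels = 0
for path in glob.glob("training_images/*.png"):
    img = cv2.imread(path)                        # OpenCV loads as BGR
    channel_sums += img.reshape(-1, 3).sum(axis=0)
    n_pixels += img.shape[0] * img.shape[1]

b_mean, g_mean, r_mean = channel_sums / n_pixels
print(f"R = {r_mean:.3f}, G = {g_mean:.3f}, B = {b_mean:.3f}")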

Figure 4. Pixel segmentation (left panels) and deep learning R2CNN(-rotated) (right panels) results for the same images. Flowers against the dark background did not segment properly (top left). Branches and leaves were erroneously segmented as flower pixels (bottom left).

Figure 5. Example image processed with three methods for panicle stage detection. Top panel: MangoYOLO-rotated method; orange, green and blue boxes represent panicle stages X, Y and Z, respectively. Middle and bottom panels: R2CNN(-rotated) and R2CNN-upright methods, respectively; green, pink and red boxes represent panicle stages X, Y and Z, respectively.

Figure 6. Example images of panicle stage detection on Canon images of [20]. Top panel: MangoYOLO(-upright) method; orange, green and blue boxes represent panicle stages X, Y and Z, respectively. Middle and bottom panels: R2CNN(-rotated) and R2CNN-upright methods, respectively; green, pink and red boxes represent panicle stages X, Y and Z, respectively.

Figure 7. Time course (weeks 1, 3, 5 and 7) of panicle number by developmental stage per tree for a row of trees.

Figure 8. Flowering intensity level (top panel) and panicle count (using R2CNN(-rotated)) (bottom panel) of an orchard with tree rows (994 trees, 1988 images). Green, orange and red correspond to low, medium and high flowering levels respectively, with each coloured dot representing the image of one side of a tree. In the top panel the categories are in terms of flower pixels as a percentage of canopy pixels (<10%, 10 to 25% and >25%). In the bottom panel the categories are in terms of total panicle number (<30, 30 to 70 and >70 panicles per tree image).

Figure 9. Trend analysis of flowering stages across weeks for an orchard.

Figure 10. Peak flowering event detection on stage-X panicle counts for two different trees across 9 weeks of imaging. Single peak (left) and double peaks (right) marked with coloured dots.

Figure 11. Week in which a peak flowering event was noted (top panel) and week of the largest flowering event (bottom panel) for the 168 tree-sides for which two flowering peaks were recorded.

Figure 12. Data from the week 1 imaging run of orchard A, in terms of number of panicles at stages X, Y and Z (top to bottom panels). Dark green, light green, orange and red correspond to <30, 30 to 70, 70 to 100 and >100 panicles per tree image, respectively.

Table 1. Orchard and imaging description.

Table 2. Number of panicles in the training, validation and test image sets. The test set consisted of manual panicle counts per tree from [20] and Canon camera images of those trees (no panicle stage categorization was available).

Table 3. Correlation (R²) between flowering intensity level per tree from the pixel segmentation method and stage Y or all-stages panicle counts, respectively, from the R2CNN(-rotated) method, and the average ratio of stage Y to total panicle count per image, for each week (n = 1988).