Quantitative Phenotyping of Northern Leaf Blight in UAV Images Using Deep Learning

Abstract: Plant disease poses a serious threat to global food security. Accurate, high-throughput methods of quantifying disease are needed by breeders to better develop resistant plant varieties and by researchers to better understand the mechanisms of plant resistance and pathogen virulence. Northern leaf blight (NLB) is a serious disease affecting maize and is responsible for significant yield losses. A Mask R-CNN model was trained to segment NLB disease lesions in unmanned aerial vehicle (UAV) images. The trained model was able to accurately detect and segment individual lesions in a hold-out test set. The mean intersection over union (IOU) between the ground truth and predicted lesions was 0.73, with an average precision of 0.96 at an IOU threshold of 0.50. Over a range of IOU thresholds (0.50 to 0.95), the average precision was 0.61. This work demonstrates the potential for combining UAV technology with a deep learning-based approach for instance segmentation to provide accurate, high-throughput quantitative measures of plant disease.


Introduction
Global food production is severely threatened by plant disease [1,2]. In order to breed for increased plant resistance and better understand the mechanisms of disease, breeders and researchers alike need accurate and repeatable methods to measure subtle differences in disease symptoms [3,4].
Although examples exist of qualitative disease resistance where plants exhibit presence/absence of disease symptoms, the majority of observed resistance is quantitative [5]. Quantitative traits exhibit a continuous distribution of phenotypic values and are controlled by multiple underlying genes that interact with the environment. Additionally, pathogen virulence is quantitative [6], with the combination of these two systems resulting in complex plant-pathogen interactions [7]. The ability to accurately measure subtle phenotypic differences in this complex multidimensional system is necessary to better understand host-pathogen interactions and develop disease resistant plant varieties.
Northern leaf blight (NLB) is a foliar fungal disease affecting maize (Zea mays L.) caused by Setosphaeria turcica (anamorph Exserohilum turcicum). It occurs worldwide and is a particular problem in humid climates [8,9]. In the US and Ontario, estimated yield losses due to NLB have increased in recent years, reaching an estimated 14 million tons in 2015 [10]. NLB symptoms begin as gray-green, water-soaked lesions that progress to form distinct cigar-shaped necrotic lesions. Visual estimates of NLB severity are typically carried out in the field using ordinal [11] or percentage scales [12]. However, visual estimates have been shown to be subject to human error and variation both between and among scorers in NLB [13], as well as in various other plant diseases (e.g., [14,15]).
Advances have been made in recent years using image-based approaches to detect and quantify disease symptoms that increase throughput and eliminate human error [16]. More recently, deep learning has shown tremendous potential in the field of plant phenotyping [17] and has previously been applied to plant disease identification and classification [18].
Image-based deep learning can be broadly grouped into three areas: image classification, object detection, and object segmentation. Image classification provides qualitative (presence/absence) measures of objects in an image, as was previously used for identification of NLB symptoms in images [19]. Instance segmentation, however, advances image classification to identify individual objects in an image, thus providing quantitative measures such as number, size, and location of objects within an image. The area of instance segmentation is rapidly advancing [20], with a recent breakthrough being Mask R-CNN [21]. Mask R-CNN performs instance classification and pixel segmentation of objects and has been successfully used in a diverse range of tasks, including street scenes [21], ice wedge polygons [22], cell nuclei [23], and plant phenotyping [24].
The rapid advancement of unmanned aerial vehicle (UAV) technology has made these platforms capable of capturing images suitable for plant phenotyping [25,26]. The combination of UAV-based image capture and a deep learning-based approach for instance segmentation has the potential to provide an accurate, high-throughput method of plant disease phenotyping under real world conditions. The aim of this study was to develop a high-throughput method of quantifying NLB under field conditions. A trained Mask R-CNN model was used to accurately count and measure individual NLB lesions in UAV-collected images. To our knowledge, this is the most extensive application of Mask R-CNN to quantitatively phenotype a foliar disease affecting maize using UAV imagery.

Image Annotation
Aerial images of maize artificially inoculated with S. turcica, collected by a camera mounted on a UAV flown at an altitude of 6 m [27], were used as the starting point for our image dataset. The location of lesions was previously annotated by trained plant pathologists in 7669 UAV-based images using a simple line annotation tool [27]. Using the line annotations, individual lesions were cropped out of the full-size images (Figure A1). From the center point of each line, images were cropped in all directions by a distance determined by bbox max and b, where bbox is the smallest bounding box that can be drawn around the line annotation, bbox max is the maximum dimension of the bounding box, and b is a 300 pixel buffer (Figure A2). Cropping resulted in square images to preserve lesion aspect ratios. As a result of the proximity of lesions, cropped images may contain more than one lesion. Images were further resized to 512 × 512 pixels using the antialias filter in the Python Imaging Library [28]. Individual lesions in 3000 resized images were further annotated with polygons using a custom ImageJ [29] annotation macro (File S1). The perimeter of each lesion was delineated using the freehand line tool. For each image, the annotation macro produced a 512 × 512 × n binary .tiff image stack, where n represents the number of lesion instances.
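The square cropping step can be illustrated with a short Python sketch. The exact relation between the crop distance, the bounding box, and the buffer is not stated in the text here, so the half-width used below (half of the largest bounding box dimension plus the 300 pixel buffer) is an assumption, and the function name `square_crop_box` is hypothetical. Clamping to the image borders and the subsequent resize to 512 × 512 are deliberately left out.

```python
def square_crop_box(x0, y0, x1, y1, buffer_px=300):
    """Return (left, top, right, bottom) of a square crop centred on a line annotation.

    ASSUMPTION: the crop extends bbox_max / 2 + buffer_px from the line's
    center point in every direction. This is one plausible reading of the
    description, not a confirmed formula.
    """
    # Center point of the annotated line.
    cx = (x0 + x1) / 2.0
    cy = (y0 + y1) / 2.0
    # Smallest bounding box around the line, and its largest dimension.
    bbox_w = abs(x1 - x0)
    bbox_h = abs(y1 - y0)
    bbox_max = max(bbox_w, bbox_h)
    # Same distance in all directions, so the crop is always square.
    half = bbox_max / 2.0 + buffer_px
    return (round(cx - half), round(cy - half),
            round(cx + half), round(cy + half))


# Hypothetical line annotation running from (1000, 2000) to (1400, 2100).
box = square_crop_box(1000, 2000, 1400, 2100)
print(box)  # (700, 1550, 1700, 2550)
```

Because the same distance is applied horizontally and vertically, the crop stays square whatever the orientation of the lesion, which is what allows a later resize to 512 × 512 without distorting lesion aspect ratios.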

Model and Training
A Mask R-CNN model [21] was trained using code modified from [31] (File S3). The model was built with a ResNet-101 backbone initiated with weights from a model pretrained on the MS-COCO dataset [32]. The training dataset was augmented by rotating images 90°, 180°, and 270°, as well as rotating 0°, 90°, 180°, and 270° followed by mirroring, resulting in seven additional augmented images per image. The model was trained for a total of 10 epochs. The pretrained weights of the ResNet backbone were frozen for the first four epochs, and the head branches were trained using a learning rate of 1 × 10⁻³. All layers were then trained for an additional epoch using a learning rate of 1 × 10⁻³. The model was fine-tuned by reducing the learning rate to 1 × 10⁻⁴ for the remaining five epochs. The model began to overfit after the sixth training epoch; therefore, weights from the sixth epoch were used for subsequent model validation. Training was performed on a Linux PC running Ubuntu 16.04 LTS fitted with an Intel Xeon E5-2630 CPU, 30 GB RAM, and two NVIDIA GeForce 1080Ti GPUs. Training was performed with four images per GPU, giving an effective batch size of eight. Inference was conducted on single images on a single GPU.
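The augmentation scheme (four rotations, each with and without mirroring) can be sketched as follows. This is a minimal standard-library illustration on a small grid of values rather than the training code itself; the helper names `rot90`, `mirror`, and `augment` are hypothetical. In practice, the same transform would also have to be applied to every instance mask of an image.

```python
def rot90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def mirror(grid):
    """Flip a 2D grid left to right."""
    return [row[::-1] for row in grid]


def augment(grid):
    """Return 8 variants: rotations of 0, 90, 180, 270 degrees,
    each followed by its mirrored copy."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rot90(current)
    return variants


# A tiny stand-in for an image; a real input would be a 512 x 512 array.
image = [[1, 2],
         [3, 4]]
variants = augment(image)
additional = variants[1:]  # the first variant is the unmodified image
print(len(additional))     # 7
```

The first variant is the original image, so the remaining seven correspond to the seven additional augmented images per image described above.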

Validation
After training, the model was used to classify the test image set and evaluated with metrics used by the COCO detection challenge [33]. Average precision was calculated with an intersection over union (IOU) threshold of 0.50 (AP50) and over a range of thresholds from 0.50 to 0.95 in 0.05 increments (AP). Predictions were considered to be true positives if the predicted mask had an IOU of >0.50 with one ground truth instance. Conversely, false positives were declared if predictions had <0.50 IOU with the ground truth. Instances present in the ground truth with an IOU of <0.50 with the predictions were assigned as false negatives. The mask IOU was calculated between the predicted and ground truth masks for each image. In cases where more than one instance was present in either mask, the masks were flattened to produce 512 × 512 × 1 dimension binary masks.
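The definitions above can be made concrete with a small Python sketch. Masks are represented as sets of (row, column) pixel coordinates to keep the example dependency-free. The greedy one-to-one matching and the assumption that predictions arrive sorted by confidence are illustrative choices, not a description of the evaluation code used in this study; all function names are hypothetical.

```python
def mask_iou(a, b):
    """IOU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def flatten(instances):
    """Merge all instance masks of an image into one binary mask."""
    return set().union(*instances)


def match_instances(predicted, truth, threshold=0.5):
    """Count (TP, FP, FN) at a single IOU threshold.

    ASSUMPTIONS: predictions are sorted by descending confidence, and each
    ground truth instance can be matched by at most one prediction.
    """
    unmatched = list(range(len(truth)))
    tp = fp = 0
    for pred in predicted:
        best_iou, best_idx = 0.0, None
        for idx in unmatched:
            iou = mask_iou(pred, truth[idx])
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou > threshold:
            tp += 1
            unmatched.remove(best_idx)
        else:
            fp += 1
    return tp, fp, len(unmatched)  # unmatched ground truths are FNs


def rect(r0, c0, r1, c1):
    """Helper: rectangular mask covering rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


truth = [rect(0, 0, 4, 4), rect(10, 10, 14, 14)]
predicted = [rect(0, 0, 4, 3), rect(20, 20, 22, 22)]

counts = match_instances(predicted, truth)
image_iou = mask_iou(flatten(predicted), flatten(truth))
print(counts)  # (1, 1, 1)
```

In this toy case, the first prediction overlaps the first ground truth lesion with an IOU of 0.75 and counts as a true positive, the second prediction overlaps nothing and is a false positive, and the second ground truth lesion is a false negative. The flattened, per-image IOU is lower (one third) because it also penalizes the missed and spurious regions.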

Results
A Mask R-CNN model was trained to segment NLB lesions in images acquired by a UAV. Using the weights from the sixth epoch, the model was used to classify a hold-out test set. Training time for the six epochs was 1 h and 5 min. Mean inference time on the test set was 0.2 s per 512 × 512 pixel image.
Over the 450 image test set, the mean IOU between the ground truth and predicted masks was 0.73 (Figure 1a). The IOU gives a measure of the overlap between the ground truth and the prediction, with values >0.50 considered to be correctly predicted. Notably, 93% of the predicted test set masks had >0.50 IOU with the ground truths.
Average precision (AP), which is the proportion of the predictions that match the corresponding ground truth at different IOU thresholds, was found to be 0.61. The AP50 of the trained model was 0.96, revealing that the model performed well at the lenient IOU threshold of 0.50. In contrast, an AP of 0.61 showed that the performance of the model decreased as more stringent thresholds were applied (Figure 1b). Moreover, the trained model was robust to variation in image scale. The size of the predicted lesions in the test set ranged from 162 to 45,457 pixels, with a mean size of 7127 pixels (Figure 1c). Of the false positive instances, the most common causes were partially occluded patches of senesced tissue (Figure 5a), patches of soil with a similar shape to lesions (Figure 5b), and senesced male flowers (Figure 5c). Coalesced lesions were frequently annotated as a single lesion in the ground truth but predicted as two separate lesions, or vice versa; such instances had an IOU of <0.50 and were considered ambiguous. Similarly, 11 of the 46 false negative instances were due to differences between the prediction and ground truth of coalesced lesions, resulting in an IOU between predicted and ground truth instances of <0.50.
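To illustrate why a score averaged over thresholds from 0.50 to 0.95 falls below the score at 0.50 alone, the sketch below sweeps the ten thresholds over a handful of made-up IOU values. This is a simplified proxy (the fraction of predictions exceeding each threshold, then averaged), not the interpolated precision-recall computation used by the COCO evaluation, and the IOU values are invented purely for illustration.

```python
def precision_at(ious, threshold):
    """Fraction of predictions whose best IOU exceeds the threshold."""
    if not ious:
        return 0.0
    return sum(1 for iou in ious if iou > threshold) / len(ious)


def mean_precision_over_thresholds(ious):
    """Average precision_at over thresholds 0.50, 0.55, ..., 0.95.

    NOTE: a simplified stand-in for COCO-style AP, which additionally
    ranks predictions by confidence and integrates a precision-recall curve.
    """
    # Build thresholds from integers to avoid floating point drift.
    thresholds = [t / 100 for t in range(50, 100, 5)]
    return sum(precision_at(ious, t) for t in thresholds) / len(thresholds)


# Hypothetical best-match IOUs for four predicted lesions.
example_ious = [0.92, 0.81, 0.66, 0.40]

p50 = precision_at(example_ious, 0.50)
p_all = mean_precision_over_thresholds(example_ious)
print(p50, p_all)  # 0.75 0.5
```

Three of the four hypothetical predictions pass the lenient 0.50 threshold, but fewer survive as the threshold tightens, so the averaged score is lower. The same pattern underlies the gap between the two reported precision values.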

Discussion
The work presented demonstrates the potential of instance segmentation with a deep learning-based approach for field-based quantitative disease phenotyping. Despite the challenges of field imagery, the trained Mask R-CNN network performed well at detecting and segmenting lesions, with results comparable to similar work in other systems [23]. It has previously not been possible to phenotype plant disease from the air at such a high resolution. The ability to measure both the number and size of lesions provides the opportunity to better study the interaction between S. turcica and maize under realistic field conditions.
Both pathogen virulence and plant resistance are predominantly quantitative traits [5,6]. The ability to accurately phenotype these traits is key to successful application of genetic dissection approaches such as linkage analysis and genome-wide association studies, as well as successfully selecting resistant lines in a breeding program. There is evidence in other pathosystems that different disease symptoms represent differences in the underlying genetics of resistance in the plant [34] or virulence in the pathogen [35]. Considering individual components of disease symptoms rather than simply assigning a single overall value, on an ordinal or percentage scale for example, has led to the discovery of new genetic loci responsible for pathogen virulence [35] and plant resistance [36].
Previous studies have demonstrated the ability of deep learning to classify the presence/absence of single diseases [19], as well as classify multiple diseases on a range of plants [37–40] in images acquired under field and controlled conditions. The ability to detect and measure individual lesions in UAV images builds on previous work to classify the presence/absence of NLB in ground- [19] and aerial-based images [41]. Whilst image classification may be useful for tasks such as disease identification, it does not give precise measures of symptoms required as part of a breeding or research program. In contrast, instance segmentation has the ability to provide accurate quantitative measures of disease.
As a result of the nature of field imagery, collected images can frequently contain features that are similar in size and color to lesions. Such non-lesion features resulted in our network returning a greater number of false positives than false negatives. As with other studies, we found that patches of soil between leaves, senesced male flowers, and areas of leaf necrosis not caused by disease were commonly misclassified as lesions [19,41]. Interestingly, we found that a large number of false positives were actually lesions that were missed by human 'experts', an observation also previously made in this pathosystem [19]. A similar number of false positives were ambiguous instances which, upon manual inspection, could not be determined either way.
Our instance masks used pixels as a proxy to measure lesion size. As a result of the differences in leaf height on tall stature crops such as maize and the constant elevation of the UAV, the number of pixels within a given lesion varied depending on its position on the plant and subsequent distance from the camera. With current technology, this is not an obstacle that can be easily overcome. However, if distances were available, the narrow focal plane of the images would allow for the estimation of error within a known range.
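The dependence of pixel counts on leaf height can be approximated with a simple geometric sketch. It assumes an ideal pinhole camera pointing straight down, with the flight altitude measured from the ground; under these assumptions the pixel area of a lesion of fixed physical size scales with the inverse square of its distance from the camera. The function name and the example leaf heights are hypothetical.

```python
def area_scale_factor(camera_height_m, leaf_height_m):
    """Ratio of pixel area at leaf height to pixel area at ground level.

    ASSUMPTIONS: nadir-pointing pinhole camera, altitude measured from the
    ground, lesion lying flat and facing the camera.
    """
    distance = camera_height_m - leaf_height_m
    if distance <= 0:
        raise ValueError("leaf must be below the camera")
    # Linear size in pixels scales with 1 / distance, so area scales
    # with 1 / distance squared.
    return (camera_height_m / distance) ** 2


# UAV at 6 m (as in this study); leaf heights are illustrative only.
upper_leaf = area_scale_factor(6.0, 2.0)
lower_leaf = area_scale_factor(6.0, 1.0)
print(upper_leaf, lower_leaf)  # 2.25 1.44
```

Under these assumptions, the same lesion would cover roughly 1.6 times more pixels on a leaf at 2 m than on a leaf at 1 m (2.25 / 1.44), which gives a sense of the size uncertainty introduced when leaf height is unknown.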
The presented work focuses on a single disease in an artificially inoculated maize field. The Mask R-CNN framework has the ability to classify and segment multiple instance classes [21]. The potential exists for expansion of the current work to multiple diseases affecting maize, provided that sufficient image resolution and training data can be acquired.
The availability of public deep learning models, as well as annotated image datasets, allows state-of-the-art techniques to be brought to bear on long-standing questions. To illustrate this point, our minor modifications to the Mask R-CNN model initiated with pre-trained weights from everyday objects allowed for rapid application to the niche problem of NLB detection. In parallel, the use of UAVs allows large areas of land to be surveyed for plant disease more quickly and in more detail than is possible with human observers [26]. The combination of UAV technology and a deep learning approach for instance segmentation shows tremendous promise to beneficially alter the field of agricultural research [42]. Integrating these technologies into plant disease research and breeding programs has the potential to expedite the development of resistant varieties and further our understanding of pathogen virulence, plant resistance, and the interaction between host and pathogen.