Watson on the Farm: Using Cloud-Based Artificial Intelligence to Identify Early Indicators of Water Stress

As demand for freshwater increases while supply remains stagnant, the critical need for sustainable water use in agriculture has led the EPA Strategic Plan to call for new technologies that can optimize water allocation in real time. This work assesses the use of cloud-based artificial intelligence to detect early indicators of water stress across six container-grown ornamental shrub species. Near-infrared images were previously collected with modified Canon and MAPIR Survey II cameras deployed via a small unmanned aircraft system (sUAS) at an altitude of 30 meters. Cropped images of plants in no-, low-, and high-water-stress conditions were split into four-fold cross-validation sets and used to train models through IBM Watson's Visual Recognition service. Despite constraints such as small sample size (36 plants, 150 images) and low image resolution (150 by 150 pixels per plant), Watson-generated models were able to detect indicators of stress after 48 hours of water deprivation with a significant to marginally significant degree of separation in four out of five species tested (p < 0.10). Two models were also able to detect indicators of water stress after only 24 hours, with models trained on images of as few as eight water-stressed Buddleia plants achieving an average area under the curve (AUC) of 0.9884 across four folds. The ease of pre-processing, the minimal amount of training data required, and the outsourcing of computation make cloud-based artificial intelligence services such as IBM Watson Visual Recognition an attractive tool for agricultural analytics. Cloud-based artificial intelligence can be combined with technologies such as sUAS and spectral imaging to help crop producers identify deficient irrigation strategies and intervene before crop value is diminished. When brought to scale, frameworks such as these can drive responsive irrigation systems that monitor crop status in real time and maximize sustainable water use.


Introduction
Freshwater is a finite resource required for the daily production of container crops used for food, ecosystem services, urban development, and other purposes. The United Nations Educational, Scientific and Cultural Organization (UNESCO) has indicated that the combined expansion of manufacturing, agriculture, and urban populations has created excessive strain on the existing freshwater supply and has called for more sustainable water management [1]. One opportunity to reduce water consumption lies in the development of intelligent irrigation systems that can optimize water use in real time [2]. Crop producers routinely provide an excess of water to container-grown plants to mitigate plant stress and subsequent economic loss, resulting in inefficient use of agrichemicals, energy, and freshwater. Site-specific irrigation systems minimize these losses by using sensors to allocate water to plants as needed, improving crop production while minimizing operating costs [3]. Sensor-based irrigation is not a new concept [1,4-6]. Kim et al. [5] developed software for an in-field wireless sensor network (WSN) to implement site-specific irrigation management in greenhouse containers. Coates et al. [7] developed site-specific applications using soil water status data to control irrigation valves.
In 2017, the U.S. nursery industry had sales of $5.9 billion and ornamental production accounted for 2.2 percent of all U.S. farms [8]. Plants grown in containers are the primary (73%) production method [9] and the majority (81%) of nursery production acreage is irrigated [10]. The largest production cost for nurseries is labor, which amounts to 39% of total costs [11], and labor shortages are linked to reduced production [12]. Adoption of appropriate technologies may offset increasing labor costs and labor shortages. Small unmanned aircraft systems (sUAS) have been suggested as an important tool in nursery production to help automate certain processes such as water resource management [13].
sUASs allow farmers to quickly survey large plots of land using aerial imagery. sUAS imagery has been used to detect diseases and weeds [14,15], predict cotton yield [16], measure the degree of stink bug aggregation [17], and identify water stress in ornamental plants [18]. Several thermal and spectral indices have been correlated to biophysical plant parameters based on sUAS imagery [19,20]. Analyses of sUAS imagery have been shown to be sensitive to time of day, cloud cover, light intensity, image pixel size, soil water buffering capacity, and atmospheric conditions at the canopy level [21,22]. Still, multispectral data collected with sUAS were shown to be more accurate than data collected using manned aircraft [23]. A variety of methodologies, including thermal and spectral imagery, have been used to assess water stress in conventional agriculture using sUAS [3]. Stagakis et al. [24] indicated that the high spatial and spectral resolution provided by sUAS-based imagery could be used to detect deficient irrigation strategies. Zovko et al. [25] reported difficulty measuring three levels of water stress in grapevines grown in soil; however, they were able to discern irrigated vs. non-irrigated plots via hyperspectral image analysis (409-988 nm and 950-2509 nm) when employing a support vector machine (SVM). de Castro et al. [18] successfully identified water-stressed and non-stressed containerized ornamental plants using two multispectral cameras aboard an sUAS, although the spectral separation was higher when information from the sensors was combined. Data produced by de Castro et al. and Zovko et al. could serve as a roadmap for real-time, sustainable water management of specialty or container-grown crops using sUAS. Fulcher et al. [26] indicated that the adoption of sUAS to monitor crop water status will be useful in addressing the challenge of sustainable water use in container nurseries. Unlike conventional crops produced in soil systems, containerized soilless systems have low water buffering capacity, resulting in rapid physiological changes that may not be observed visually at the ground level but can be monitored through reflected wavelengths captured by sUAS. To reduce size and cost, sUAS can collect and wirelessly transmit high-resolution image data to cloud providers that perform analyses on offsite servers. Thus, the convergence of technologies such as sUAS, the Internet of Things (IoT), spectral imagery, and cloud-based computing can be used to build intelligent irrigation systems that monitor crop status and optimize water allocation in real time.
In this study, images were analyzed with IBM Watson Visual Recognition, a cloud-hosted artificial intelligence service that allows users to train custom image classifiers using deep convolutional neural networks (CNNs). Unlike linear algorithms, CNNs model complex non-linear relationships between the independent variables (pixels comprising the image) and the dependent variable (plant health) by transforming data through layers of increasingly abstract representation (Figure 1). The first layer is an array of pixel values from the original image; nodes in subsequent layers represent local features such as color, texture, and shape; deeper layers encode semantic information such as leaf or branch morphology. Individual nodes become optimized to represent different features of the image through an iterative learning process that rewards nodes that amplify aspects of the image that are useful for classification and suppresses those that do not [27]. The convolutional relationship from one layer to the next allows CNNs to model complex relationships between input variables, making them particularly useful for analyzing image data that cannot be understood by examining pixels in isolation. Given a set of images of stressed and non-stressed plants, for example, individual nodes in the network may become optimized to represent spectral indices that are sensitive to water stress. Those nodes can affect the outcome directly, or they can feed forward into higher-order features such as the specific location and pattern of discoloration within the plant. Spectral indices may combine with other plant features such as the unique structure of sagging branches or the distinct texture created by the shadows of drooping leaves. All of these features culminate in a single output node that returns a value from zero to one representing the confidence that a given image belongs to the desired class (i.e., water stress).
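To make this layered structure concrete, the following minimal sketch shows a small binary CNN in Keras with an input size matching the roughly 150 by 150 pixel crops used in this study. It is purely illustrative: the architecture behind Watson Visual Recognition is proprietary, and every layer choice here is an assumption.

```python
# Minimal illustrative CNN: stacked convolutional layers learn increasingly
# abstract features, ending in a single sigmoid node that outputs a 0-1
# confidence of water stress. NOT the architecture used by IBM Watson.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(150, 150, 3)),        # R, G, NIR pixel values
    layers.Conv2D(16, 3, activation="relu"),  # low-level features: edges, color
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # mid-level: texture, local shape
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),  # higher-level: leaf/branch patterns
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),    # confidence that image shows stress
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```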
While CNNs' layers allow networks to model complex nonlinear relationships that simpler algorithms might miss, they are also prone to overfitting. This occurs when CNNs learn patterns that are specific to the training set and do not generalize to the overall population. In one case study, for example, a model trained to predict a patient's age based on MRI images was found to have learned the shape of the head rather than the content of the scan itself [28]. The challenge of overfitting is compounded by CNNs' inherent 'black box' quality. Since information is passed through so many transformations, it is difficult to identify which input variables have the largest influence on the final outcome. While CNNs often must be trained with large datasets to overcome their tendency to overfit, transfer learning techniques allow fully trained networks to be repurposed for new classification tasks with much smaller datasets. A growing set of tools is also making it possible to introspect models to determine feature importance directly. Saliency heat maps, for example, can highlight regions of the image that are used for classification [29,30]. Overfitting can be tested with a cross-validation scheme in which models are trained with one set of images and then used to classify a new, previously unseen set of images. Performance metrics are based on how well the model's classification of unseen data matches a ground truth standard. A final limitation of CNNs is the significant amount of time and resources required to train them. To circumvent this, the computation may be outsourced to cloud computing providers that train models on large servers and offer a suite of tools for hyperparameter tuning, transfer learning, and cross-validation [31].
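As a concrete illustration of transfer learning, the sketch below repurposes a network pre-trained on ImageNet by freezing its convolutional base and training only a small new classification head. The choice of MobileNetV2 and all hyperparameters are assumptions made for illustration; Watson's internal transfer learning procedure is not published.

```python
# Transfer learning sketch: reuse a network pre-trained on a large generic
# dataset (ImageNet) and retrain only a small classification head, so far
# fewer task-specific images are needed. Illustrative only.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(150, 150, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the generic feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),             # mild regularization against overfitting
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```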
Despite their inherent limitations, CNNs have become popular for image recognition tasks ranging from Facebook photo-tagging to self-driving cars [32,33]. In agriculture, CNNs have been used to predict wheat yield based on soil parameters, diagnose diseases with simple images of leaves, and detect nitrogen stress using hyperspectral imagery [34,35]. CNNs' ability to learn complex nonlinear features makes them particularly useful for analyzing image data in which individual pixels form larger features such as shape or texture. Extensive research has demonstrated that CNNs perform image classification tasks with higher accuracy than traditional machine vision algorithms [36].
In our study, a small set of aerial images was used to train custom image classification models to detect water stress in ornamental shrubs. The objective was to evaluate the ability of IBM Watson's Visual Recognition service to detect early indicators of plant stress. These experiments provide a strong rationale for the deployment of cloud-based artificial intelligence frameworks that use larger datasets to monitor crop status and maximize sustainable water use.

Materials and Methods
This research was conducted at the Hampton Roads Agricultural Research and Extension Center (Hampton Roads AREC, Virginia Tech), located in Virginia Beach, VA, USA (36.8919°N, 76.1787°W). Six plots with container-grown ornamental plants across two experimental areas were studied. Containers were established outdoors on gravel. The species and number of plants in each experimental plot are shown in Table 1. A subset of plants from each species was removed from the open-air nursery and transferred to a greenhouse, where the plants experienced water stress due to the absence of overhead irrigation. High water stress (HWS) plants were transferred to the greenhouse on 8 August 2017, and low water stress (LWS) plants were transferred on 9 August 2017. The stressed plants were returned to the open-air nursery on 10 August 2017, after the non-stressed (NS) plants had received their daily overhead irrigation. This process produced three levels of water stress for this experiment: high, low, and non-stressed (Table 2). At the time of flight, the soilless substrate of HWS plants contained 19% less water (mL) than that of NS plants, and the soilless substrate of LWS plants contained ~13% less water (mL) than that of NS plants. There were no easily detectable visual symptoms of water stress in any of the treatment plants. After data collection, all water-stressed plants were returned to normal irrigation on 10 August 2017, where they fully recovered and continued to grow. This strategy was part of a broader research program studying the adaptation of ornamental species to stress conditions. During each flight, the quadcopter (Figure 2b) took images using each camera at a height of 30 meters and a forward and side lap of 90% and 60%, respectively. The technical specifications of the two sensors are shown in Table 2; Figure 3 shows the data collected from both sensors.

Image Pre-Processing
Images were cropped using a custom, browser-based interface in LabelBox (Figure 4). Data were annotated by dragging a bounding box across each plant and labeling it as 'high water stress', 'low water stress', or 'no stress' according to the key provided in Figure 5. The GraphQL application programming interface (API) was used to pull the pixel coordinates of each bounding box onto a local computer so that individual plants could be cropped from the original aerial images, as sketched below. The resolution of the cropped images was approximately 150 by 150 pixels. The number of cropped images for each condition is shown in Table 3.
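The snippet below is a minimal sketch of this cropping step. It assumes the bounding boxes pulled via the GraphQL API have already been saved to a local JSON file; the file name (boxes.json) and field names (image, label, bbox) are hypothetical simplifications of a LabelBox export.

```python
# Crop individual plants from aerial images using annotated bounding boxes.
# The JSON schema (image / label / bbox as left, top, width, height in
# pixels) is a hypothetical simplification of a LabelBox export.
import json
from pathlib import Path
from PIL import Image

annotations = json.loads(Path("boxes.json").read_text())
out_dir = Path("crops")
out_dir.mkdir(exist_ok=True)

for i, ann in enumerate(annotations):
    img = Image.open(ann["image"])
    left, top, w, h = ann["bbox"]
    crop = img.crop((left, top, left + w, top + h))      # ~150 x 150 px per plant
    crop.save(out_dir / f'{ann["label"]}_{i:04d}.png')   # e.g. high_water_stress_0001.png
```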
Since multiple photographs were taken of the same plots from different angles, cropped images of the same plants were grouped together so that they could be segregated into the training set or validation set as complete units. This procedure protected against overly optimistic performance estimates that would occur if photographs of the same plant appeared in both the training and validation datasets. For each species and treatment, the centers of each bounding box were calculated and normalized to a range of zero to one. Spatstat (http://spatstat.org), an open-source R package for analyzing point patterns, was then used to match plants from different aerial images based on the similarity of their pixel coordinates. For example, if there were eight plants in the HWS treatment of a certain species, all images of plants one through six would be used to train the model and all images of plants seven and eight would be used for validation. This allowed us to make full use of the data during the training phase without artificially inflating performance metrics by validating models with images of the same plants they were trained with. The successful grouping was confirmed by visual inspection (Figure 6).
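The following sketch illustrates the same matching idea in Python with hypothetical coordinates: plants in two aerial images of the same plot are paired by minimizing the total distance between their normalized bounding-box centers. The actual analysis used the R package spatstat, so this is an analogue rather than a reproduction.

```python
# Illustrative Python analogue of the spatstat matching step: pair plants
# across two aerial images of the same plot by minimizing the total distance
# between their normalized bounding-box centers.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def normalize(centers):
    """Scale pixel-coordinate centers to the 0-1 range per axis."""
    c = np.asarray(centers, dtype=float)
    return (c - c.min(axis=0)) / (c.max(axis=0) - c.min(axis=0))

# Hypothetical bounding-box centers (pixels) for one species/treatment
image_a = [(120, 80), (300, 85), (480, 90)]
image_b = [(133, 95), (310, 92), (470, 101)]  # same plants, different flight pass

cost = cdist(normalize(image_a), normalize(image_b))  # pairwise distances
rows, cols = linear_sum_assignment(cost)              # optimal one-to-one match
for a, b in zip(rows, cols):
    print(f"plant {a} in image A matches plant {b} in image B")
```

Once matched, all crops of a given plant can be assigned to the same cross-validation fold, as described above.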

Model Training and Testing
Cropped images were used to train models with the Watson Visual Recognition API, a cloud-hosted artificial intelligence service provided by IBM that uses CNNs to build custom image classifiers. Here, models were trained to predict water stress status using the red, green, and near-infrared pixel values of the cropped images. A Python script was used to access the service and transfer images from a local computer to a cloud server for model training and testing. For each species and camera, three-quarters of the NS and HWS images were used to train a model that was then used to classify the remaining quarter. The API returned a prediction between zero and one for each validation image, with zero indicating no stress and one indicating water stress (Table 4). This process was repeated four times so that a prediction could be made for each image in the dataset and compared to the ground truth.
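A minimal sketch of these API calls using plain HTTP requests is shown below. The Visual Recognition service has since been retired; the endpoint URL, version string, and multipart field names follow IBM's v3 REST documentation from that era and should be treated as assumptions, as should the placeholder credential, file names, and classifier name.

```python
# Sketch of training and classification calls against the Watson Visual
# Recognition v3 REST API (service now retired; endpoint, version string,
# and field names follow the v3 documentation of that era).
import requests

API_KEY = "your-api-key"  # placeholder credential
URL = "https://gateway.watsonplatform.net/visual-recognition/api"
VERSION = {"version": "2018-03-19"}

# Train: one zip of positive (stressed) crops, one zip of negative (NS) crops
with open("stressed.zip", "rb") as pos, open("not_stressed.zip", "rb") as neg:
    r = requests.post(
        f"{URL}/v3/classifiers", params=VERSION, auth=("apikey", API_KEY),
        files={"stressed_positive_examples": pos, "negative_examples": neg},
        data={"name": "buddleia_hws"})
classifier_id = r.json()["classifier_id"]

# Classify a held-out crop; the response includes a 0-1 confidence score.
# (Training is asynchronous: the classifier must reach the 'ready' state
# before classification requests will succeed.)
with open("validation_crop.png", "rb") as img:
    r = requests.post(
        f"{URL}/v3/classify", params={**VERSION, "classifier_ids": classifier_id},
        auth=("apikey", API_KEY), files={"images_file": img})
print(r.json())
```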

Statistical Analysis
The area under the receiver operating characteristic curve (AUC) was used to quantify the degree of separation between treatments for each species and camera. A one-sample t-test was used to compare the AUC scores from the four validation folds to a hypothesized mean of 0.5, corresponding to random classification.
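The sketch below illustrates this analysis with hypothetical fold data: an AUC score is computed for each validation fold, and the four scores are tested against the 0.5 mean expected under random classification.

```python
# Per-fold AUC scores and a one-sample t-test against the 0.5 value expected
# under random classification. The fold data here are hypothetical.
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.metrics import roc_auc_score

# Four validation folds for one species/camera: ground-truth labels
# (1 = HWS, 0 = NS) and the confidence scores returned by the API.
folds = [
    ([1, 1, 0, 0], [0.91, 0.88, 0.12, 0.30]),
    ([1, 1, 0, 0], [0.95, 0.70, 0.25, 0.10]),
    ([1, 0, 0, 1], [0.85, 0.20, 0.15, 0.92]),
    ([1, 1, 0, 0], [0.60, 0.97, 0.40, 0.05]),
]
aucs = [roc_auc_score(y, p) for y, p in folds]

t, p = ttest_1samp(aucs, 0.5)  # H0: mean AUC = 0.5 (random classification)
print(f"mean AUC = {np.mean(aucs):.4f}, t = {t:.2f}, p = {p:.4f}")
```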

Results
Of the 11 combinations of species and camera used in this study, four produced models that were able to discriminate images of NS and HWS plants with a statistically significant degree of separation (p < 0.05): Canon and MAPIR images of Buddleia, Canon images of Physocarpus opulifolius, and MAPIR images of Hydrangea paniculata (Table 5). Of these four, models trained with MAPIR or Canon images of NS and HWS Buddleia were also able to discriminate NS and LWS plants with high separation (Table 6). Four datasets produced models with a marginally significant degree of separation (0.05 < p < 0.10): Canon and MAPIR images of Hydrangea quercifolia, Canon images of Hydrangea paniculata, and MAPIR images of Physocarpus opulifolius (Table 5). Images of Spiraea japonica were not tested because the HWS class in the training set did not meet the minimum of 10 images required by the Visual Recognition API. Overall, models trained with four of the five species tested achieved marginal significance or better (p < 0.10) with one or both cameras (Figures 7 and 8).
Results were compared to a previous study by de Castro et al. [18] that described the same dataset by masking the background and comparing mean pixel values of stressed and non-stressed plants. The three wavelengths detected by each camera were delineated, and differences between treatments were evaluated by analysis of variance (ANOVA), with significance assessed by a Tukey honestly significant difference (HSD) range test. Experiments that demonstrated a significant difference in mean pixel value between water stress treatments in one or more wavelengths (p < 0.05) are highlighted green in Table 5. Marginal significance is not shown because de Castro et al. [18] did not report specific p-values.

Table 5. Performance of models trained to classify HWS and NS images. Models achieving a statistically significant degree of separation (p < 0.05) are highlighted green, and models achieving a marginal degree of separation (0.05 < p < 0.10) are highlighted yellow.

Figure 8. Models that achieved a statistically significant degree of separation on HWS images were also used to classify LWS images.

Discussion
Unlike traditional machine vision models that require users to manually select features, CNNs have layers of neurons that allow them to automatically learn relevant features from data. CNNs improve with each training example by iteratively rewarding neurons that amplify aspects of the image that are important for discrimination and suppressing those that do not. For example, in traditional techniques, the background must be manually segmented prior to analysis. By contrast, CNNs can automatically 'learn' to ignore the background because it is not relevant to the classification task. Similarly, rather than manually delineating spectral indices thought to be correlated with plant health, networks can infer relevant transformation of the input color channels from data. Low level features inferred by the network feed into higher-order features such as the specific location or pattern of discoloration within the plant. Information from spectral indices may combine with other features such as the unique structure of sagging branches or the distinct texture created by the shadows from wilted leaves. Thus, CNNs can learn multiple features of the training images and are not limited by a priori hypotheses.
Models tested in this study demonstrated significant variation in their ability to identify water stress in different species. Models trained on Buddleia achieved near-perfect separation, while those trained on Cornus approximated random classification. Such variation is consistent with previous literature showing differences in morphological and physiological responses to water stress across genera, species, and even cultivars. In Michigan, Warsaw et al. [37] tracked the daily water use and water use efficiency of 24 temperate ornamental taxa from 2006 to 2008. Daily water use varied from 12 to 24 mm per container, and daily water use efficiency (increase in growth index per total liters applied) varied from 0.16 to 0.31. Of the taxa similar to those in our study, Buddleia davidii 'Guinevere' (24 mm per container) had the greatest water use, followed by Spiraea japonica 'Flaming Mound' (18 mm per container), Hydrangea paniculata 'Unique' (14 mm per container), and Cornus sericea 'Farrow' (12 mm per container), with estimated crop coefficients (Kc) of 6.8, 5.0, 3.6, and 3.4, respectively. Taxa tolerant of low water availability, such as Cornus, may simply not have been exhibiting symptoms of water stress when they were photographed. Models that achieved moderate performance were likely provided with too few examples to distinguish patterns relevant to the classification task from those specific to the training data, causing them to generalize poorly to new data during the testing phase. Such overfitting bias can be overcome by training models with a larger and more diverse set of training images. Varying the location, weather, and growing period in which images are taken, for example, can force models to learn features that generalize across conditions. Future studies can also use images of plants with multiple degrees of water stress to train regression models that return a value along a numeric scale rather than a stressed/not-stressed binary.
While CNNs' complexity prevents us from knowing exactly which features drive a model, insight can be gained from the conditions under which classifiers succeed or fail. For example, classifiers trained by pooling images of all species had significantly lower performance than classifiers trained with images of just one species, despite having a considerably larger training set. This suggests that symptoms of water stress differ from one species to the next. Subsequent studies can identify which features drive the model by iteratively removing them from the image. For example, one experiment could train models with individual red, green, or near-infrared channels to determine whether certain spectral bands are more sensitive to water stress than others; a sketch of this ablation follows. Another experiment could crop images to a rectangle circumscribing the plant to see whether plant shape or other peripheral features aid the classifier. Features that significantly reduce performance when removed may represent biologically relevant phenotypes that are worthy of further study.
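The channel-ablation experiment proposed above could be implemented with a few lines of array manipulation. The sketch below is a minimal example: it assumes the crops are stored as three-channel images with near-infrared in the third plane (an assumption that depends on the camera conversion) and zeroes out the other channels before retraining.

```python
# Ablation sketch: keep a single channel (e.g., near-infrared) and zero the
# others, so a model retrained on the ablated crops reveals how much that
# channel contributes to classification.
import numpy as np
from PIL import Image

def keep_channel(path, channel):
    """Return a copy of the image with all but one channel zeroed.
    channel: 0 = red, 1 = green, 2 = NIR (assumed stored in the third plane)."""
    arr = np.array(Image.open(path).convert("RGB"))
    mask = np.zeros_like(arr)
    mask[..., channel] = arr[..., channel]
    return Image.fromarray(mask)

# Hypothetical file name from the cropping step earlier
keep_channel("crops/high_water_stress_0001.png", 2).save("nir_only.png")
```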

Conclusions
Our findings confirm that the IBM Watson Visual Recognition service can be used to identify early indicators of water stress in ornamental shrubs despite constraints such as small sample size, low image resolution, and a lack of clear visual differences. Watson-generated models were able to detect indicators of stress after 48 hours of water deprivation with a significant to marginally significant degree of separation in four out of five species tested (p < 0.10). Models trained on images of Buddleia achieved near-perfect separation after only 24 hours, with a maximum AUC of 0.9884. Furthermore, unlike traditional algorithms that require users to manually select plant parameters believed to correlate with health status, CNNs were able to automatically infer relevant features from the training data and combine multiple types of visual information. Despite this, not all models were successful. The failure of models trained on images of Cornus was consistent with previous literature suggesting higher water stress tolerance in Cornus compared to the other species tested. Because all plants were grown in the same experimental area, the authors cannot be certain that these models will generalize well to new situations.
Future studies can focus on improving model accuracy and generalizability by increasing the number of training examples and varying the conditions in which images are taken. Fully trained networks can also be introspected to give biological backing to the most predictive features. Other studies can expand the application of this workflow by testing data collected with different sensors and on different species. These experiments provide a valuable case study for the use of CNNs to monitor plant health. Brought to scale, artificial intelligence frameworks such as these can drive responsive irrigation systems that monitor plant status in real time and maximize sustainable water use.