A Method for Detecting Coffee Leaf Rust through Wireless Sensor Networks, Remote Sensing, and Deep Learning: Case Study of the Caturra Variety in Colombia

Agricultural activity has always been threatened by pests and diseases that prevent the proper development of crops and harm the economy of farmers. One of these is Coffee Leaf Rust (CLR), an epidemic fungal disease that affects coffee trees and causes massive defoliation. As an example, this disease has been affecting coffee trees in Colombia (the third largest producer of coffee worldwide) since the 1980s, leading to devastating losses of between 70% and 80% of the harvest. Failure to detect pathogens at an early stage can result in infestations that cause massive destruction of plantations and significantly damage the commercial value of the products. The most common way to detect this disease is by walking through the crop and performing a human visual inspection. In response to this problem, different research studies have shown that technological methods can help to identify these pathogens. Our contribution is an experiment that includes a diagnostic model of the CLR development stage in a scale crop of Coffea arabica, Caturra variety, through the technological integration of remote sensing (drone-capable multispectral cameras), wireless sensor networks (multisensor approach), and Deep Learning (DL) techniques. Our diagnostic model achieved an F1-score of 0.775. The analysis of the results revealed a p-value of 0.231, which indicated that the difference between the disease diagnosis made by visual inspection and the one made through the proposed technological integration was not statistically significant. This shows that both methods were statistically similar in diagnosing the disease.


Introduction
The food and beverage industry is characterized by a relatively small number of multinational companies that link small producers around the world with consumers. A development analysis conducted by the World Economic Forum and Accenture, in 2018 [1], focused, predominantly, on upstream value chain segments due to the low tech nature of food and beverage processing and production and the substantial potential for improving efficiency in agrifood activities.
Figure 1. Coffee Leaf Rust (CLR) effects: (a) on the Caturra variety crops; (b) on a leaf at the disease's highest development stage [11].
CLR progresses gradually over time and goes through three distinct phases. The first one, called the "slow phase" (severity ≤ 5%), is where the first structures responsible for the production of spores emerge and low levels of infection are evident. The second one, named the "fast or explosive phase" (5% < severity ≤ 30%), starts with the sporulation of the fungus and is characterized by more plants getting sick in a short period. The final phase, called the "maximum or terminal phase" (severity > 30%), occurs when most of the leaves are severely attacked and only a small number of healthy leaves remain. At that moment, the epidemic stops in the host due to the lack of biological matter to continue the infection. When CLR is not controlled and the climatic conditions are favorable, the disease can develop at a daily rate of 0.19-0.38%, reduce the impact of the chemical controls, and cause significant economic damage [10].
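The severity thresholds above can be expressed as a small lookup. The following is a minimal sketch, not part of the original study: the function names are hypothetical, and the projection helper assumes that the reported daily rate of 0.19-0.38% means a linear increase in percentage points of severity per day, which the text does not state explicitly.

```python
import math


def clr_phase(severity_pct: float) -> str:
    """Map a severity percentage to the epidemic phase described in the text."""
    if not 0 <= severity_pct <= 100:
        raise ValueError("severity must be a percentage between 0 and 100")
    if severity_pct <= 5:
        return "slow"
    if severity_pct <= 30:
        return "fast"
    return "maximum"


def days_to_reach(current_pct: float, target_pct: float, daily_rate_pct: float) -> int:
    """Days needed to go from current to target severity.

    ASSUMPTION: severity grows linearly by daily_rate_pct percentage
    points per day (one possible reading of the 0.19-0.38% daily rate).
    """
    if daily_rate_pct <= 0:
        raise ValueError("daily rate must be positive")
    return max(0, math.ceil((target_pct - current_pct) / daily_rate_pct))


if __name__ == "__main__":
    for severity in (2.0, 12.5, 45.0):
        print(severity, clr_phase(severity))
    # Time to cross the whole "fast" phase at the two reported rates.
    print(days_to_reach(5, 30, 0.38), days_to_reach(5, 30, 0.19))
```

Under that linear assumption, an uncontrolled lot would cross the whole fast phase in roughly two to four months, which illustrates why infrequent inspections tend to detect the disease late.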

Context
In the Colombian context, coffee is the most exported agricultural product, followed by cut flowers, bananas, cocoa, and sugarcane [12]. In the country, there are more than 903,000 hectares dedicated to it, and approximately 563,000 families depend directly on this economic activity. Colombian coffee has been considered one of the best soft coffees in the world, and this product has traditionally been of great importance for Colombian exports. Currently, 14,000,000 bags of 60 kg are exported every year to the USA, Japan, and Germany, among other countries [13].
In terms of employment generation and income distribution, coffee growing is a sector of great relevance for local economies and for the maintenance of the social fabric in many regions of the country. For this reason, it is justified to contribute solutions that strengthen the profitability of families engaged in this activity and improve their quality of life, either by increasing the selling price of the product, reducing production costs, or increasing the number of units produced per unit of cultivated area.
Among the main threats for strengthening the coffee growing families' profitability, nutritional deficiencies and phytosanitary problems stand out. Phytosanitary problems are caused by pests such as the coffee borer beetle and diseases such as CLR, whose proliferation increases due to the drastic climate changes (from long drought periods to extended rainy seasons) that occur in Colombia. In the case of CLR, when the climatic conditions are unfavorable and the agronomic management deficient, at least 20% of the total expected harvest is not able to be collected. Additionally, the quality of coffee deteriorates dramatically, reducing the marketing price and increasing the costs associated with its control [10]. In extreme cases of CLR, the disease has caused devastating losses that have represented between 70% and 80% of the total harvest.
Although it is a disease with vertiginous spread and highly negative repercussions for the coffee farmers' economy, its detection and diagnosis are carried out using visual inspection while walking through the crops. This method refers to the recognition of plant diseases using visual inspections, development scales, and standard severity diagrams for their measurement [14]. People in charge of the crops walk through them, watching and touching the plants to identify symptoms associated with the particular disease that produces them and calculate infection levels [15].
Unfortunately, because the process consists of a visual inspection, which is not done with enough regularity, most of the time, the detection of the development stage of the disease is late, its control becomes more difficult, and considerable economic losses are inevitable.

State-of-the-Art
Plenty of research has been done on applying technological methods and strategies to diagnose diseases [16], to detect pests [17], and to obtain nutritional information [18], among other objectives, for different types of crops. The phytosanitary status of the plantations is closely related to different crucial factors in their ecosystem, such as weather, altitude, and type of soil, among others. Therefore, several biological and engineering studies aim to implement practical solutions based on these factors to improve farming techniques to preserve healthy crops.
The most commonly used methods for monitoring the phytosanitary status efficiently, including those that make use of technology, are: (i) Remote Sensing (RS), (ii) visual detection, (iii) biological intervention, (iv) Wireless Sensor Networks (WSN), and (v) Machine Learning (ML) supported on a source of data. Thus, this work is intended to present recent relevant studies based on the mentioned methods for detecting anomalies on the plantations.

(i) Remote Sensing (RS)
RS is based on the interaction of electromagnetic radiation with any material. In the case of agriculture, it involves the non-contact measurement of the radiation reflected from soil or plants to assess different attributes such as the Leaf Area Index (LAI), chlorophyll content, water stress, weed density, and crop nutrients, among others. Those measurements can be made using satellites, aircraft, drones, tractors, and hand-held sensors [19]. In addition to measuring reflected radiation, there are two other RS techniques that analyze the fluorescent and thermal energy emitted by the leaves. However, the most common technique is reflectance, because the amount of radiation reflected from the plants is inversely related to the radiation absorbed by their pigments, and this can serve as an indicator of their health status [19]. RS helps the indirect detection of problems in agricultural fields since this method captures unusual behaviors in crops' reflectance, which can be caused by factors like nutritional deficiencies, pests and diseases, and water stress.
In 2017, Calvario et al. [20] monitored agave crops using Unmanned Aerial Vehicles (UAVs), integrating RS with unsupervised machine learning (k-means) to classify agave plants and weeds. In 2003, Goel et al. [21] studied the detection of changes in the spectral response of corn (Zea mays) due to nitrogen application rates and weed control. For that purpose, the researchers employed a hyperspectral sensor called the Compact Airborne Spectrographic Imager (CASI) and analyzed the reflectance values of 72 bands with wavelengths between 409 and 947 nm, which cover part of the visible and Near-Infrared (NIR) regions of the electromagnetic spectrum. The results demonstrated the potential of detecting weed infestations and nitrogen stress using the hyperspectral sensor CASI. Specifically, the researchers found that the best fitting bands for the detection were the wavelength regions near 498 nm and 671 nm, respectively, as seen in Figure 2.
It has been shown that satellite multispectral images make it possible to detect the location of crops [22], but the resolution of satellite images does not allow early detection of the phytosanitary status of individual lots of plants. Regarding the phytosanitary status of the plants, the water and the type of soil are two components that play an essential role in their health. In 2017, Bolaños et al. [23] proposed a characterization method using the visible and infrared spectrum to identify these components, through low-cost cameras with two different filters, Roscolux #19 and Roscolux #2007, and a multi-rotor air vehicle. Through this method and using portable and highly qualified devices, hard-to-reach places were monitored and analyzed to detect anomalies that may cause diseases in the crop. This monitoring phase provided a characterization of the Normalized Difference Vegetation Index (NDVI), as seen in the example of Figure 3, which was used to categorize essential characteristics of the crop, such as crop health, diseased plants or soil, and water or others.
In 2017, Chemura et al. proposed a method for the early prediction of diseases and pests among coffee trees based on unnoticeable water stress. For that purpose, multispectral scanners with filters for wavebands in the visible spectrum and the near-infrared region were mounted on a UAV [25]. The waveband scanner results showed inflection points in the regions around 430 nm and 705-735 nm due to the water content in coffee trees. These results underlined the importance of a suitable irrigation plan according to the water requirements of the trees, which would improve productivity. Although the latter region indicated relevant values, the waveband of 430 nm was the most relevant remote sensing band for predicting the plant water content, which is directly related to its stress.
However, in [25], the authors remarked that although the results were promising, there were some missing valid components that could allow the model to be suitable and testable in real conditions. For that purpose, they recommended using hyperspectral cameras, which provide more precise measured waveband results.
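The NDVI mentioned above is conventionally computed from the red and near-infrared reflectance of each pixel as (NIR - Red) / (NIR + Red). The following is a minimal sketch of that standard formula; the function names and the sample reflectance values are hypothetical and are not taken from the cited studies.

```python
from typing import Iterable, List, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    Returns 0.0 when both bands are zero to avoid division by zero
    (a convention chosen for this sketch).
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


def ndvi_map(pixels: Iterable[Tuple[float, float]]) -> List[float]:
    """Apply ndvi() to an iterable of (nir, red) reflectance pairs."""
    return [ndvi(nir, red) for nir, red in pixels]


if __name__ == "__main__":
    # Hypothetical reflectance pairs: dense canopy, stressed canopy, bare soil.
    samples = [(0.50, 0.10), (0.30, 0.20), (0.25, 0.22)]
    print([round(value, 3) for value in ndvi_map(samples)])
```

Values close to 1 are usually associated with dense, healthy vegetation, while values near 0 point to soil or stressed plants, which is why the index is useful for the kind of categorization described for Figure 3.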

(ii) Visual Detection
The detection of visual symptoms uses the changes in the plant's appearance (colors, forms, lesions, spots) as an indicator of it being attacked by a disease or pest [15]. In the survey of Hamuda et al. [26], image-based plant segmentation, which is the process of classifying an image into plant and non-plant, was used for detecting diseases in plants [27]. For instance, to evaluate CLR's infection percentage in a specific lot, the number of diseased leaves in 60 random trees has to be divided by the total number of leaves in those trees and multiplied by 100 (see Equation (1)). A leaf is considered diseased with CLR when chlorotic spots or orange dust are observed on it. The severity of the disease can be divided into five categories depending on the number and diameter of orange rust spots, as seen in Figure 4.
Figure 4. CLR development stages [28].
A visual inspection can be carried out to detect the presence of chlorotic spots on the leaves, which are then used for measuring the incidence and severity of the disease [10].
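The infection percentage described for a lot (diseased leaves in the sampled trees divided by the total leaves in those trees, times 100) can be sketched as follows. The function names and the leaf counts are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Iterable, Tuple


def infection_percentage(diseased_leaves: int, total_leaves: int) -> float:
    """Incidence of CLR in a lot: 100 * diseased leaves / total leaves."""
    if total_leaves <= 0:
        raise ValueError("total_leaves must be positive")
    if not 0 <= diseased_leaves <= total_leaves:
        raise ValueError("diseased_leaves must be between 0 and total_leaves")
    return 100.0 * diseased_leaves / total_leaves


def lot_infection_percentage(trees: Iterable[Tuple[int, int]]) -> float:
    """Aggregate per-tree (diseased, total) leaf counts over the sampled trees."""
    trees = list(trees)
    diseased = sum(d for d, _ in trees)
    total = sum(t for _, t in trees)
    return infection_percentage(diseased, total)


if __name__ == "__main__":
    # Hypothetical sample: 60 trees with 50 leaves each, 150 diseased leaves overall.
    sample = [(3, 50)] * 50 + [(0, 50)] * 10
    print(lot_infection_percentage(sample))
```

Note that the counts are pooled over all sampled trees before dividing, as the text describes, rather than averaging per-tree percentages.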
To understand the conditions conducive to the development of CLR and, subsequently, refine the disease control, Avelino et al. [29] monitored such development on 73 coffee crops in Honduras for 1-3 years. Through the analysis of production situation variables such as climate, soil components, coffee tree productive characteristics, and crop management patterns, the researchers aimed to establish a relationship with the presence of rust. The result of this research indicated that CLR epidemics depend on the diverse production situations listed in Table 1, which are also linked to the local conditions of the plantation. These results reflect the need to consider a certified growing system that aims for sustainability, taking production situations into account and, thus, preventing the development of pests and diseases.
Table 1. Production situation variables and their relevance [29].

Kind of Variable | Relevance
Climate variation (altitude and rainfall) | High
Soil components | Medium-low
Cropping practices | Medium
Coffee tree productive characteristics | High

(iii) Biological Intervention
Several authors have stated the importance of the relationships between living beings sharing the same environment. One of them was Haddad et al. [30], who in 2009 proposed a study to determine whether seven bacterial isolates selected under greenhouse conditions would efficiently detect and control CLR. For this research, they inoculated the following bacteria: six Bacillus sp. (B10, B25, B157, B175, B205, and B281) and one Pseudomonas sp. (P286), which help to detect and control CLR in the early development stages, according to a preliminary result presented by Haddad et al. (2007). For the experiment, two important coffee varieties, Mundo Novo and Catuai, were selected due to their high susceptibility to CLR. For three years, the diseased varieties were exposed to different treatments (bacteria) to analyze how their interaction evolved. Based on the results of the treatments, the isolates P286 and B157 were as efficient as the copper fungicide in controlling the rust. Hence, considering the harmful effects of the copper fungicide, the application of biological control with the B157 isolate of Bacillus sp. may be a reliable alternative solution for CLR management. This research therefore showed an opportunity for specialty coffee growers to biocontrol CLR successfully.
Jackson et al. [31], in 2012, also proposed a biological detection and control approach based on a fungus, Lecanicillium lecanii. Their primary interest was the coexistence of organisms in a specific environment whose defined conditions lead to a balance. In this setting, the biological control system involved A. instabilis ants, which were mutualistically associated with the white halos of the fungus Lecanicillium lecanii, and its effect on CLR.
However, the hypothesis raised the possibility that spores from Lecanicillium lecanii help to attack Hemileia vastatrix before the rainy season. Because of the time delay in the effect of Lecanicillium lecanii on Hemileia vastatrix, a relationship between the two fungi and the ants could not be demonstrated, even though the control experiment resembled real-world conditions. In conclusion, the restriction of biotic factors directly affects the development of CLR; therefore, for future work, it is important to consider the climate variation of an ecosystem to be able to predict such development [31].

(iv) Wireless Sensor Networks (WSN)
Wireless Sensor Networks (WSN) are a technology that is being used in many countries worldwide to monitor different agricultural characteristics remotely and in real time. A WSN consists of multiple non-assisted embedded devices, called sensor nodes, that collect data in the field and communicate them wirelessly to a centralized processing station, which is known as the Base Station (BS). The BS has data storage, data processing, and data fusion capabilities, and it is in charge of transmitting the received data to the Internet to present them to an end-user [32]. Once the collected data are stored on a central server on the Internet, further analysis, processing, and visualization techniques are applied to extract valuable information and hidden correlations, which can help to detect changes in crop characteristics. These changes could be used as indicators of phytosanitary problems such as nutritional deficiencies, pests, diseases, and water stress. WSN is a powerful technology since the information of remote and inaccessible physical environments can be easily accessed through the Internet, with the help of the cooperative and constant monitoring of multiple sensors [33].
The sensor nodes in a WSN setup can vary in terms of their functions. Some of them can serve as simple data collectors that monitor a single physical phenomenon, while more powerful nodes may also perform more complex processing and aggregation operations. Some sensors can even have GPS modules that help them determine their particular location with high accuracy [33]. The most common sensors used in WSN for agriculture are the ones that collect climate data, images, and frequencies. In 2011, Chaudhary et al. [34] emphasized the importance of WSN in the field of Precision Agriculture (PA) by monitoring and controlling different critical parameters in a greenhouse through a microcontroller technology called Programmable System on a Chip (PSoC).
As a consequence of the disproportionate rainfall dynamics, the need arises to control water distribution so that it meets those parameters inside the greenhouse. Thereby, the study tested the integration of wireless sensor node structures with high-bandwidth spectrum telecommunication technology. Mainly, it was shown that the integration was useful to determine an ideal irrigation plan that met the specific needs of a crop based on the interaction of the nodes within the greenhouse. Furthermore, the researchers recommended using reliable hardware with low current consumption to develop WSN projects, because it gives farmers more confidence in incorporating it into their crops and provides a longer battery life.
In addition, Piamonte et al. [35] proposed in 2017 a WSN prototype for monitoring the bud rot of the African oil palm. With the use of pH, humidity, temperature, and luminosity sensors, they aimed to measure climate variations and edaphic (soil-related) factors to indirectly detect the presence of the fungus that causes the disease.

(v) Machine Learning
The domain concerned with building intelligent machines that can perform specific tasks just like a human is called Artificial Intelligence (AI) [36]. One of the main subareas of AI is Machine Learning (ML), which aims to automatically extract complex patterns from large amounts of raw data to predict future behaviors. When the process of extracting those patterns is taken to a more detailed level, where computers learn complicated real-world concepts by building them out of simpler ones in a hierarchical way, ML enters one of its most relevant subsets: Deep Learning (DL) [37]. DL attempts to mimic the activity of layers of neurons in the human brain. The central structure that DL uses is called an Artificial Neural Network (ANN), which is composed of multiple layers of neurons and weighted connections between them. The neurons are excitable units that transform information, whereas the connections are in charge of rescaling the output of one layer of neurons and transmitting it to the next one to serve as its input [38]. By passing data such as images, videos, sound, and text through the ANN, DL builds hierarchical structures and levels of representation and abstraction that enable the identification of underlying patterns [36].
One application of finding patterns through DL is the estimation of plant characteristics with non-invasive methodologies based on digital images and machine learning. Sulistyo et al. [39] presented a computational intelligence vision sensing approach that estimated nutrient content in wheat leaves. This approach analyzed color features of the leaves' images captured in the field under different lighting conditions to estimate nitrogen content in wheat leaves. Another work of Sulistyo et al. [40] proposed a method to detect nitrogen content in wheat leaves by using color constancy with neural networks' fusion and a genetic algorithm that normalized plant images affected by different sunlight intensities. Sulistyo et al. [41] also developed a method for extracting statistical features from wheat plant images, more specifically to estimate the nitrogen content in real-context environments that can have variations in light intensity. This work provided a robust method for image segmentation using a deep layer multilayer perceptron to remove complex backgrounds and used genetic algorithms to fine-tune the color normalization. The output of the system after image segmentation and color normalization was then used as an input to several standard multi-layer perceptrons with different hidden layer nodes, which then combined their outputs using a simple and weighted averaging method.
Fuentes et al. [42] presented a robust deep learning based detector to classify different types of diseases and pests in tomatoes in real time. For such a task, the detector used images from RGB cameras (multiple resolutions and different devices such as mobile phones or digital cameras). This method detected whether the crop had a disease or pest and which type it was. Similarly, Picon et al. [43] developed an automatic deep residual neural network algorithm to detect multiple plant diseases in real time, using mobile devices' cameras as the input source. The algorithm was capable of detecting three types of diseases on wheat crops: (i) Septoria (Septoria tritici), (ii) tan spot (Drechslera tritici-repentis), and (iii) rust (Puccinia striiformis and Puccinia recondita).
Related to CLR, research has been done, such as that by Chemura et al. [44], who evaluated the potential of Sentinel-2 bands to detect the CLR infection levels early due to its devastating rates.
Through the employment of the Random Forest (RF) and Partial Least Squares Discriminant Analysis (PLS-DA) algorithms, such levels could be identified for early CLR management. The researchers employed the variety of Yellow Catuai, which was chosen due to its CLR susceptibility. In this matter, Chemura et al. considered only seven Sentinel-2 Multispectral Instrument (MSI) bands due to the high resolution stated by previous works in biological studies. Based on the selected bands, the research results determined that the CLR reflectance was higher in NIR regions of the spectrum, as could be seen in leaves from the bands B4 (665 nm), B5 (705 nm), and B6 (740 nm). These bands achieved a high overall CLR discrimination of 28.5% and 71.4% using the RF and PLS-DA algorithms respectively. Thus, the band and vegetation indices derived from the MSI of Sentinel-2 achieved the detection of the disease and an evaluation of CLR in the early stages, avoiding unnecessary chemical protection in healthy trees.
In 2017, Chemura et al. [45] studied the detection of CLR through the reflectance of the leaves at specific electromagnetic wavelengths. The objective of their investigation was to assess the utility of the wavebands used by the Sentinel-2 Multispectral Imager in detection models. The models were created using Partial Least Squares Regression (PLSR) and the non-linear Radial Basis Function Partial Least Squares Regression (RBF-PLS) machine learning algorithm. Then, both models were compared, resulting in a low-accuracy prediction of the state of the disease for the PLSR model, due to its over-fitting, and a high-accuracy prediction for the RBF-PLS model. Additionally, by weighting the importance of the variables, Chemura et al. found that the blue, red, and RE1 bands had a high model correlation, but an implementation that excluded the remaining four bands led to lower accuracy in both models. On the other hand, if more than one NIR or red edge (RE) band were used, the resulting RBF-PLS model would over-fit, yielding a non-transferable model. However, Chemura et al. emphasized the utilization of the RBF-PLS model due to its machine learning advantage and its excellent adaptation to possible model over-fitting.

Conclusions of the Literature Review
The presented state-of-the-art showed that several researchers sought the detection of any vital element like water stress, nitrogen levels, and vegetation indexes that could lead to an improvement of production and quality in crops, which translated to an increase in profitability. However, most of the research did not integrate different means of detecting CLR to have more insights and better accuracy in predicting this disease. Furthermore, the determination of the infection percentage of the crop through visual inspection is a tedious task, which is also laborious, time consuming, and subject to human error and inconsistency [46]. For this reason, the objective of this research is to evaluate to what extent it is possible to diagnose the CLR development stage in the Colombian Caturra variety (the most susceptible to the disease) through a technological integration system that involves Remote Sensing (RS), Wireless Sensor Networks (WSN), and Deep Learning (DL). Adequate management of CLR could preserve the quality and selling price of the final product, reduce production costs by rationing control costs, and protect productivity. The present research aims to facilitate the management of the most dangerous disease in the Colombian Caturra variety's coffee production to strengthen the profitability of the rural inhabitants.
The present work provides empirical evidence of a novel diagnostic method for the classification of the development stage of CLR in coffee crops, by means of a technological integration of image data (RS), WSN, and DL. This contribution allows coffee growers to detect CLR disease automatically, thus optimizing the production and maintenance of their crops and replacing the task of manual inspection. Through this method, the performance evaluation is done, and the results are presented to conclude to what extent it is possible to diagnose CLR disease. Thus, this information can be useful for coffee growers to determine if the integration of RS, WSN, and DL in our method could positively impact their profitability.

Proposed Method
The experimental design implemented in this research was a Completely Randomized Design (CRD). A CRD is used to compare two or more treatments considering only two sources of variability: treatments and random error. The objective of using this design in this project was to analyze whether the diagnosis of the CLR development stage through the integration of RS, WSN, and DL was similar to the one made with a traditional visual inspection. A summarized diagram of this process is shown in Figure 5. In that sense, the study factor was the type of inspection, which had two levels ("visual inspection" and "technological integration"), and the response variable was the development stage of the disease, which was a whole number between 0 and 4. Thereby, the fundamental hypothesis to prove, presented in Equation (2), helped to decide whether Treatments 1 ("visual inspection") and 2 ("technological integration") were statistically equivalent with respect to their means [47].
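Equation (2) is not reproduced in this text. As a reconstruction under the usual CRD formulation for two treatments, and assuming that \mu_1 and \mu_2 denote the mean development stage obtained with the visual inspection and with the technological integration, respectively (symbols chosen here for illustration), the hypothesis would read:

```latex
\[
H_0 : \mu_1 = \mu_2
\qquad \text{versus} \qquad
H_A : \mu_1 \neq \mu_2
\]
```

Failing to reject H_0 at the chosen significance level is what supports the statement that both types of inspection are statistically equivalent with respect to their means.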
The procedure for proving the mentioned hypothesis is called Analysis Of Variance (ANOVA) and required a data table containing a row for each observation and a column for each treatment indicating the measurements of the response variable. This procedure separated the variability due to the treatments from the one attributed to the random error and compared them. If the former was higher than the latter, the means of the treatments were different, and thus, the type of diagnosis influenced the determined CLR development stage. Otherwise, the means were statistically equivalent, and it was possible to conclude that the visual inspection and the technological integration were similar for diagnosing the disease. Lastly, it is essential to mention that the significance level that was used for proving the hypothesis was 10% (α = 0.1), since the problem at hand was related to agriculture, where many noise factors associated with the variation of environmental conditions were involved [47].
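The variance decomposition described above can be sketched with the standard library alone. This is an illustrative sketch, not the authors' analysis script: the function name and the sample stages are hypothetical, and the p-value is deliberately left out because it requires the F distribution, which a statistics package would normally provide.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F statistic, df between treatments, df within treatments).

    F is the ratio of the variability due to the treatments (mean square
    between groups) to the variability due to random error (mean square
    within groups). Assumes at least two groups and non-zero within-group
    variability.
    """
    values = [v for group in groups for v in group]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


if __name__ == "__main__":
    # Hypothetical development stages (0-4) for the same observations.
    visual_inspection = [0, 1, 2, 3, 4]
    technological_integration = [0, 1, 2, 3, 3]
    f_stat, df_between, df_within = one_way_anova_f(
        visual_inspection, technological_integration
    )
    print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F statistic would then be converted into a p-value using the F distribution with those degrees of freedom and compared with α = 0.1: a p-value above α means the hypothesis of equal means is not rejected.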
For the data collection experiment, 16 healthy, six-month-old coffee plants from Jardín, Antioquia, were used. Those plants were kept in a greenhouse at Universidad EAFIT. A biology team was in charge of their transplantation, agronomic management (elimination of weeds, fertilization, and fumigation), inoculation, and supervision. For the inoculation, the biology team followed the process described in Chemura et al. [44]. It is relevant to clarify that a new group of diseased plants was held as a reserve in case the inoculation of the healthy plants did not take effect over time.
Furthermore, an engineering team was dedicated to the design and assembly of a system, in the same greenhouse, that integrated RS and WSN. It allowed building a scale crop, recording different characteristics of it regularly, and storing them on a remote server to analyze its phytosanitary status later using DL. In that way, once the plants were inoculated and the system was verified, they were transplanted to it so that the data collection could begin. For that purpose, the scale crop was divided into four lots with certain differences in their agronomic management, which sought to recreate various circumstances of a real coffee crop. Thereby, a greater number of scenarios were covered, and the false positive rate of the diagnosis was reduced. LOT 1 contained four non-inoculated plants, which were neither fertilized nor fumigated; LOT 2 had four non-inoculated plants, which were fertilized but not fumigated; LOT 3 had four inoculated plants, which were also fertilized but not fumigated; and LOT 4 had four inoculated plants, which were neither fertilized nor fumigated. This distribution can be seen in Figure 6.
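The lot layout described above can be captured as a small configuration structure. This is a minimal sketch: the `Lot` class and its field names are hypothetical, while the values come directly from the description of the four lots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lot:
    """Agronomic configuration of one lot of the scale crop."""
    name: str
    plants: int
    inoculated: bool
    fertilized: bool
    fumigated: bool


# Values taken from the description of the four lots.
LOTS = (
    Lot("LOT 1", plants=4, inoculated=False, fertilized=False, fumigated=False),
    Lot("LOT 2", plants=4, inoculated=False, fertilized=True, fumigated=False),
    Lot("LOT 3", plants=4, inoculated=True, fertilized=True, fumigated=False),
    Lot("LOT 4", plants=4, inoculated=True, fertilized=False, fumigated=False),
)

if __name__ == "__main__":
    total_plants = sum(lot.plants for lot in LOTS)
    inoculated_plants = sum(lot.plants for lot in LOTS if lot.inoculated)
    print(f"{inoculated_plants} of {total_plants} plants were inoculated")
```

Written this way, the design is easy to read as a 2 x 2 arrangement of inoculation and fertilization, with fumigation absent from every lot.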
Finally, the visual inspections for diagnosing the CLR development stage were carried out by the biology team for three months. Once per day, one team member examined the severity of the disease in each lot and indicated the value of the response variable for each observation; this measure corresponded to the ground truth. Similarly, the technological system automatically recorded the scale crop's characteristics from each lot seven times per day at different moments (with and without sunlight, because the field sensors and cameras had different illuminance requirements), assigning to each of these samples the above-mentioned daily ground truth. After the data collection phase finished, the diagnostic model using DL was generated, and a comparative data table for the statistical analysis was produced, based on its predictions and the results of the visual inspections. As a considerable number of observations was expected, only 25% of all collected data were used for the statistical study. It should also be noted that, as recommended, the order of the table's entries was randomized before executing the analysis in order to minimize bias.
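The labeling and sampling steps described above can be sketched as follows. The record layout, function names, and seed are hypothetical; only the seven daily samples per lot, the daily ground truth per lot, the 25% subset, and the randomized order come from the text.

```python
import random
from typing import Dict, List, Tuple


def label_samples(samples: List[dict], daily_truth: Dict[Tuple[int, str], int]) -> List[dict]:
    """Attach the daily visual-inspection stage (0-4) to every automatic sample."""
    return [dict(s, stage=daily_truth[(s["day"], s["lot"])]) for s in samples]


def draw_study_subset(rows: List[dict], fraction: float = 0.25, seed: int = 0) -> List[dict]:
    """Draw a random subset of the rows, returned in random order.

    random.Random.sample picks without replacement and returns the
    selection in random order, so no extra shuffle is needed.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(rows, int(len(rows) * fraction))


if __name__ == "__main__":
    lots = ["LOT 1", "LOT 2", "LOT 3", "LOT 4"]
    # Two hypothetical days, seven automatic samples per lot and day.
    samples = [
        {"day": day, "lot": lot, "sample": i}
        for day in (1, 2) for lot in lots for i in range(7)
    ]
    truth = {(day, lot): (1 if lot in ("LOT 3", "LOT 4") else 0)
             for day in (1, 2) for lot in lots}
    labeled = label_samples(samples, truth)
    subset = draw_study_subset(labeled)
    print(len(labeled), len(subset))
```

Fixing the seed is a choice made for this sketch so that the subset can be reproduced; the text does not say how the randomization was performed.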

Experimental Testbed
To evaluate to what extent it was possible to diagnose the CLR development stage in the Colombian Caturra variety through the integration of RS, WSN, and DL, it was necessary to obtain empirical evidence by means of an experiment. Therefore, an experimental testbed prototype was built, which included a scale coffee crop. This testbed was capable of simulating different agronomic conditions and allowed capturing data for diagnosing the disease. The experimental testbed consisted of a data collection system prototype that integrated remote sensing and wireless sensor networks. In this testbed, the coffee plants were grouped, combined with the soil, and then divided into four lots. Furthermore, they were separated to inoculate CLR in half of them, and after that, the four lots were assembled again. For their agronomic management, fertilizer and fungicide were distributed and incorporated. Then, each lot was isolated from the others to make the four lots independent, and the whole scale crop was combined with a rain emulation system and a wind system. Both rainfall and wind speed were measured for the whole crop. Furthermore, using sensors in each lot, pH, illuminance, temperature, humidity, and electrical conductivity were measured (hereafter referred to as "sensor data"), and RGB and multispectral images were captured. RGB pictures were acquired through regular RGB cameras with a resolution of 720p. These cameras were positioned below the plants since CLR is commonly visible on the underside of the leaf [10]. Regarding the multispectral cameras, which allowed capturing the reflected radiation of wavelengths that are not perceptible to the human eye, two MAPIR® Survey3 cameras were used. Based on the information cited in the state-of-the-art [21,23,25], the Red + Green + NIR (RGN) and Red Edge (RE) camera filters were chosen as being suitable to identify crop diseases, including CLR.
Thus, one camera centered on the wavelengths 660 nm, 550 nm, and 850 nm and another one centered on the 735 nm wavelength were selected to capture images from the top of the plants. The Survey3 incorporated a Sony® Exmor R IMX117 12MP sensor and a sharp non-fisheye lens for perceiving light in specific wavelengths. The resulting experimental testbed is shown in Figure 7. Afterwards, the state of each lot was integrated with the expert's visual inspection information to diagnose the CLR development stage, and then, this information was grouped with the collected data. To finish the data collection process, data were stored locally and sent to a remote server over the Internet.
On the other hand, the data that were received on the remote server were preprocessed for cleaning purposes and stored in a remote database. An example of the LOT 3 directory's content on the remote server after one data collection routine was concluded is presented in Figure 8. To clarify how a data collection routine worked, Figure 9 details the whole pipeline from the sensor readings and image captures to the remote storage. The data from sensors were gathered and smoothed by a microcontroller. RGB and multispectral images were captured by the cameras. All of the data were collected by a Single Board Computer (SBC), which continually reported its progress to the Internet of Things (IoT) platform (see Figure A3 inside Appendix C for the IoT platform dashboard) while it created a single data package. The package, containing the documents with the lots and general data, as well as the images, was stored locally. Furthermore, the documents were inserted into the remote MongoDB® instance, which resided in the data center, and the entire data package was uploaded via Secure File Transfer Protocol (SFTP) to the data center's file system. At that point, the data collection routine finished.
Finally, it is also relevant to mention how the collected data can be reviewed so that the process can be verified. Using a personal computer, the IoT platform, the single board computer, and the data center could be accessed over the Internet. The access to the IoT platform required a web browser, while the single board computer and the data center could be remotely inspected through the graphical desktop sharing system Virtual Network Computing (VNC) or the cryptographic network protocol Secure Shell (SSH).

Machine Learning Pipeline
To create an adequate model for diagnosing the CLR development stage, the stored data were first divided into two sets, namely training (with cross-validation) and test. The training set was processed to build the diagnostic model with cross-validation, which served to assess its intermediate performance and tune it. Once the diagnostic model was generated, the test set was used for evaluating its final performance. All the developed models and cloud storage were implemented using an academic data center.
Within the framework of this project, the data center was used to store the data collected remotely on the physical part of the prototype. Both the MongoDB® instance in it and its file system made the replication of the single-board computer's local storage possible and facilitated the ubiquitous access to that information. Furthermore, the data center was the place where the data preprocessing, model generation, and CLR development stage diagnosis occurred. It is also relevant to mention that the software libraries used for the implementation were Python 3.6.0, NumPy 1.16 [52].
Figure 10 shows the machine learning pipeline, i.e., how the collected data were processed to obtain the model that was used to diagnose the development stage of the disease in question.
This pipeline initially consisted of four sub-directories, ranging from LOT 1 to LOT 4, where each lot's data would be labeled later on. For that purpose, the biology team determined the labels by carrying out visual inspections in the field on all plants once a day during the whole data collection phase. The team assigned a whole number between 0 and 4 to each plant in each lot, according to the severity level observed on the plant's leaves, and calculated the lot's label as the rounded average of its four plants' disease development stages. All data directories of the current day and corresponding lot were labeled with the value determined in the most recent visual inspection.
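The labeling rule above can be illustrated with a minimal sketch. The function name `lot_label` and the half-up tie-breaking rule are assumptions made for illustration; the text only states that the lot label was the rounded average of the four plants' stages.

```python
def lot_label(plant_stages):
    """Return a lot's label as the rounded average of its plants' stages (0 to 4)."""
    if not plant_stages:
        raise ValueError("at least one plant stage is required")
    if any(not 0 <= stage <= 4 for stage in plant_stages):
        raise ValueError("stages must be whole numbers between 0 and 4")
    average = sum(plant_stages) / len(plant_stages)
    # Assumption: ties are rounded half up. Python's built-in round() rounds
    # halves to the nearest even integer, so the rule is made explicit here.
    return int(average + 0.5)


# Hypothetical inspection of one lot with four plants.
print(lot_label([1, 2, 2, 2]))
```

With this rule, a lot whose plants were rated 1, 2, 2, and 2 (average 1.75) would be labeled as Stage 2.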
Subsequently, a new rgb_images directory containing five sub-directories (ranging from 0 to 4) representing the disease's five stages was created. In these five sub-directories, RGB images coming from all lots (LOT 1 to LOT 4) were stored according to their label. Similarly, the sensor data, which were stored as JavaScript Object Notation (JSON) files, and the multispectral images received the same label as the RGB images belonging to the same lot. Furthermore, all images were visually checked one-by-one to keep only the ones with valuable content (focus, brightness level) and remove the others. In addition to this, a script was executed to eliminate the irrelevant JSON files (those with missing values and outliers), as well as the sub-directories that ended up with no content. The last two actions were part of the data cleaning (depuration) stage. In the end, five sub-directories existed containing the data from all lots (LOT 1 to LOT 4) adequately labeled. Those sub-directories were the ones that were used for the generation and final evaluation of the diagnostic model, taking into account that the diagnosis occurred at the lot level. Once all the data were correctly distributed, the content of each of the five sub-directories was virtually shuffled, and the elements per file type were counted for every sub-directory. Then, per label sub-directory, the minimum of those counts was found, the file type associated with that minimum was determined, and twenty-five percent of that minimum was calculated. The resulting numbers indicated the number of files of the determined file type, per label, that could be used, at most, for testing the diagnostic model. Taking those threshold numbers into account, the shuffled lot data directories within each label sub-directory were individually analyzed to split them into two groups, namely training and test sets.
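The threshold calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `compute_test_thresholds`, the dictionary layout, the use of floor rounding, and the file counts are hypothetical and do not come from the experiment.

```python
import math


def compute_test_thresholds(counts_per_label, test_fraction=0.25):
    """Map each label to (scarcest file type, maximum number of test files).

    counts_per_label has the form {label: {file_type: number_of_files}}.
    """
    thresholds = {}
    for label, counts in counts_per_label.items():
        # The file type with the fewest files limits how many complete
        # lot data directories can be reserved for testing.
        file_type, minimum = min(counts.items(), key=lambda item: item[1])
        # Assumption: the 25% share is rounded down to a whole number of files.
        thresholds[label] = (file_type, math.floor(minimum * test_fraction))
    return thresholds


# Hypothetical file counts for two label sub-directories.
counts = {
    0: {"json": 40, "rgb": 120, "rgn": 44, "re": 48},
    2: {"json": 30, "rgb": 60, "rgn": 22, "re": 25},
}
print(compute_test_thresholds(counts))
```

For these made-up counts, at most 10 JSON files with label 0 and 5 RGN files with label 2 could be assigned to the test set.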
If a particular lot's data directory was considered complete (i.e., it had a JSON file, the two multispectral images, and at least one RGB image) and using its files for testing did not exceed the corresponding threshold, then it was copied under the same structure to another location in order to feed the test set. Otherwise, the lot data directory was also copied, but to grow the training set. Thereby, the training set (∼75% of all data) was used to train and tune the model, while the test set (∼25% of all data), which did not overlap with the training set, was only incorporated at the time of the diagnosis evaluation. The data distribution after the above-mentioned process was heavily imbalanced, as seen in Table 2. It can be noted that Stage 1 was not included in the table. This was because only one sample was identified in that stage. Consequently, it could not be used for the model construction and was therefore considered not relevant.

Table 2. Data distribution per CLR development stage.
# of Samples    Stage 0    Stage 2    Stage 3    Stage 4    Total
Training        711        55         90         112        968
Test            149        12         18         23         202

After the two sets were correctly obtained, one submodel was generated for each file type, i.e., sensor data (JSON), RGB, RGN, and RE. For the JSON files, a Multi-Layer Perceptron (MLP) was used, whereas Convolutional Neural Networks (CNNs) were implemented to classify the RGB, RGN, and RE images. For that purpose, the data in the training set were first divided into four sub-directories according to the file type, while preserving the same structure. Then, each of them was preprocessed so that the noise was removed from the images, and the irrelevant keys in the documents were also identified and eliminated. Figure 11 illustrates an example of preprocessed image files. After that, the corresponding data were loaded within each submodel's generation, divided into feature data (the files themselves) and label data (the names of the label sub-directories that contained the files), and permuted.
Thereby, the data were randomly mixed while it was still possible to know each feature's respective label unequivocally. Then, if applicable, the data were normalized and structured to scale the input and format, as is recommended when using deep ANNs. The normalization used for this experiment was the z-score (subtracting the mean of the feature and dividing by its standard deviation), which scaled the data to have zero mean and unit standard deviation [53]. Once the data were prepared, different architectures and hyperparameter values were tried to tune the submodel and reach higher performance values on the predictions.
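As an illustration of the z-score normalization mentioned above, the following minimal sketch standardizes one sensor feature. The function names, the use of the population standard deviation, and the temperature readings are assumptions; the sketch also assumes that the statistics are estimated on the training data and reused for unseen data, which is consistent with applying the same preparation procedure at evaluation time.

```python
from statistics import mean, pstdev


def fit_z_score(train_values):
    """Return the mean and (population) standard deviation of one feature."""
    return mean(train_values), pstdev(train_values)


def apply_z_score(values, mu, sigma):
    """Subtract the mean and divide by the standard deviation."""
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Hypothetical temperature readings (degrees Celsius).
train_temperature = [20.0, 22.0, 24.0, 26.0]
mu, sigma = fit_z_score(train_temperature)
train_scaled = apply_z_score(train_temperature, mu, sigma)
test_scaled = apply_z_score([23.0], mu, sigma)
print(train_scaled, test_scaled)
```

After this transformation, the training values have zero mean and unit standard deviation, and a test reading equal to the training mean maps to 0.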
The technique used for tuning the submodel is called grid search with cross-validation. It consisted of executing an exhaustive search over specified hyperparameter values for an estimator to find out which combination achieved the best performance, which by default meant the highest accuracy, although different metrics could be chosen. One candidate estimator for each combination of hyperparameters was built and evaluated, so that the best estimator, its attributes, and its average performance could be extracted once the search was complete [54]. Furthermore, the procedure for measuring the average performance of each candidate estimator during the generation of the submodel is called k-Fold Cross-Validation, where k separate learning experiments are run on the same estimator to calculate k performance values and average them. To achieve this, the feature and label data were split at the beginning into k non-overlapping subsets (also known as "folds"), so that for every experiment, one different fold was kept for measuring the performance, whereas the remaining k − 1 were put together to form the training set to fit the estimator [55]. Finally, when the grid search processes concluded, the four submodels were extracted and saved for the definitive diagnosis about the CLR development stage.
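The procedure above can be sketched in plain Python. This is not the implementation used in the experiment; it is a simplified illustration in which `k_fold_indices`, `grid_search_cv`, the toy threshold "estimator", and the data are all hypothetical. The scoring function is pluggable, so an F1-based metric could replace the accuracy used here.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k non-overlapping folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(param_grid, fit, score, features, labels, k=5):
    """Return (best_params, best_mean_score) over every hyperparameter combination."""
    folds = k_fold_indices(len(features), k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(features)) if i not in held]
            # Fit on k - 1 folds, then measure performance on the held-out fold.
            model = fit([features[i] for i in train_idx],
                        [labels[i] for i in train_idx], **params)
            fold_scores.append(score(model,
                                     [features[i] for i in held_out],
                                     [labels[i] for i in held_out]))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_threshold(x_train, y_train, threshold):
    """Toy estimator: nothing is learned, the hyperparameter is the model."""
    return threshold


def accuracy(model, x_val, y_val):
    hits = sum((x > model) == bool(y) for x, y in zip(x_val, y_val))
    return hits / len(x_val)


# Hypothetical, already shuffled one-dimensional data set.
features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.15, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
best_params, best_score = grid_search_cv(
    {"threshold": [0.25, 0.5, 0.75]}, fit_threshold, accuracy, features, labels, k=5)
print(best_params, best_score)
```

The sketch does not shuffle the data itself, because the text states that the feature and label data were permuted before this step.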
To select the best estimator during the grid search with cross-validation, the chosen metric was the F1-score, which, in the multi-class case, was the weighted average of the labels' F1-scores. This metric was used due to the heavily imbalanced dataset (skewed classes) between the development stages of the CLR, as seen in Table 2. The F1-score of the label L is a value in the [0, 1] range, and it was calculated as the harmonic mean of the estimator's precision and recall with respect to L (see Equation (3)). The precision with respect to L is the ratio of the number of times that L was correctly predicted to the overall number of times that L was predicted. Furthermore, the recall with respect to L is the ratio of the number of times that L was correctly predicted to the overall number of times that L should have been predicted. Thereby, the general F1-score reaches its best value at 1, indicating that the estimator perfectly matched reality, and its worst at 0, showing that the estimator never coincided with reality [53].
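To make these definitions concrete, the sketch below computes the per-label F1-score as 2 · precision · recall / (precision + recall) and then averages the labels' scores. It assumes that the weights of the average are the number of true samples per label (the support), which the text does not state explicitly; the function names and the example labels are hypothetical.

```python
from collections import Counter


def f1_per_label(y_true, y_pred):
    """Return {label: F1-score} using the precision and recall of each label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        correct = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        denominator = precision + recall
        # Harmonic mean of precision and recall; defined as 0 when both are 0.
        scores[label] = 2 * precision * recall / denominator if denominator else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-label F1-scores, weighted by each label's support."""
    support = Counter(y_true)
    scores = f1_per_label(y_true, y_pred)
    return sum(scores[label] * count for label, count in support.items()) / len(y_true)


# Hypothetical ground truth and predictions for eight lot samples.
y_true = [0, 0, 0, 0, 2, 2, 3, 4]
y_pred = [0, 0, 0, 2, 2, 2, 3, 3]
print(f1_per_label(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 3))
```

Weighting by support means that the dominant Stage 0 class contributes most to the overall score, while rarely seen stages still lower it when they are missed.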
At this point, the data that had been set aside for the diagnosis evaluation were brought in. First, the submodels were loaded. Then, each lot's data directory contained in the test set was submitted to the following process. At the beginning, its data were divided according to the file type. After that, each type was sent to its corresponding submodel, where it was first cleaned, normalized, and structured, applying the same procedures that were used to prepare the data for the submodel generation. Subsequently, the submodel made its prediction based on the trained model output. It is also relevant to mention that, considering that the diagnosis was made at the lot level, the RGB submodel could be used up to four times per lot data directory before retrieving its result (which was the rounded average of its predictions). The final step consisted of combining the outcomes of the four submodels and calculating their rounded weighted average, the weights being the respective F1-scores. Thereby, the definitive lot's CLR diagnosis was obtained, and it was recorded along with the processed lot's data directory label. Once the whole test set was covered, a table showing comparative results was generated for the statistical analysis, and the performance reached by the composite model was assessed with the calculation of the F1-score. Figure A2 from Appendix B illustrates the above machine learning pipeline in a detailed manner, and Table 3 shows the selected hyperparameters and obtained F1-score for each of the submodels. Tables A1, A2, and A3 from Appendix D detail the architectures of the submodels. The last step of the proposed ML pipeline consisted of integrating the four presented submodels and evaluating the composite model, i.e., diagnosing the CLR development stage through it, creating a comparative table with the results achieved, and calculating the model's performance. For that purpose, a model evaluator script was implemented.
This script was in charge of loading the submodels into memory, iterating over the whole test set, taking each lot data directory within it, dividing the contained files according to their type and preprocessing them, resizing them to reduce the spatial complexity (in the case of images), normalizing and structuring each file according to the submodels' expected input, and sending them to their corresponding submodel to get a prediction. In addition, the script gathered the four predicted labels and calculated their rounded weighted average, since the generated submodels presented different performances for diagnosing the CLR development stage. Table 4 shows the weights for the predictions of each submodel, which were determined as the ratio of each F1-score in Table 3 with respect to the sum of all F1-scores. To further explain the weighted average, let us assume that a sample folder with all the collected data (sensor data, RGB, RGN, and RE images) was labeled as CLR Development Stage 2 (Label = 2). Then, the data inside this folder were fed into the developed submodels (sensor data, RGB, RGN, and RE submodels), which produced an output based on their trained model. Let us assume that the sensor data submodel classified this as 0, the RGB submodel as 3, the RGN submodel as 2, and the RE submodel as 2. Then, considering the weights from Table 4, the averaged development stage would be approximately 1.90. Then, rounding this value to the nearest integer, the final output of the ML pipeline would be DevelopmentStage = 2. This example is shown in Figure 12.
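The combination step can be sketched as follows. The F1-scores used here are hypothetical placeholders, not the values of Table 3, so the weighted average differs slightly from the 1.90 of the example above; the function names and the half-up rounding rule are also assumptions.

```python
import math


def round_half_up(value):
    """Round a non-negative value to the nearest integer, halves going up."""
    return math.floor(value + 0.5)


def rgb_lot_prediction(image_predictions):
    """Merge the RGB submodel's per-image predictions (up to four per lot)."""
    return round_half_up(sum(image_predictions) / len(image_predictions))


def combine_predictions(predictions, f1_scores):
    """Return (weighted average, rounded stage); weights are F1 / sum of F1."""
    total = sum(f1_scores[name] for name in predictions)
    weighted = sum(predictions[name] * f1_scores[name] / total
                   for name in predictions)
    return weighted, round_half_up(weighted)


# Hypothetical F1-scores (NOT the values reported in Table 3).
f1_scores = {"sensor": 0.60, "rgb": 0.90, "rgn": 0.95, "re": 0.90}
# Submodel outputs from the worked example: sensor 0, RGB 3, RGN 2, RE 2.
predictions = {
    "sensor": 0,
    "rgb": rgb_lot_prediction([3, 3, 2, 3]),  # hypothetical per-image outputs
    "rgn": 2,
    "re": 2,
}
weighted, stage = combine_predictions(predictions, f1_scores)
print(round(weighted, 2), stage)
```

Because the weights are proportional to each submodel's F1-score, a weak submodel (here the sensor data one) pulls the average less than the image-based submodels, and the rounded result stays at Stage 2.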

Results
The result of this experiment was a composite trained model with an F1-score of 0.775. This model was tested using ANOVA to assess the validity of the hypothesis presented in Section 2, with respect to the visual inspection and our proposal using the technological integration methods. The p-value obtained was 0.231, which was greater than the significance level α = 0.1. This result indicated that the proposed method for automatically detecting the CLR disease presented an equivalent performance compared to the manual/visual inspection method (the ANOVA test will be further discussed in Section 3.1). All the inputs for the obtained results are detailed below.
On the one hand, it must be mentioned that, during the data collection phase, the biology team had to replace 12 coffee plants of the scale crop with external diseased ones because the inoculation did not take effect after two months (all plants stayed in Development Stage 0).
On the other hand, the training set used for fitting the submodels was composed of 968 directories. In total, they contained 672 sensor data (JSON) files. Figure A1 from Appendix A shows the comparative table for the statistical analysis. Table 5 presents the definitive F1-score reached by each submodel and the composite model.

Analysis of the Results
Statistical analysis of the results regarding the performance evaluation of the diagnostic model was carried out using the comparative table found in Figure A1 from Appendix A. The purpose of the analysis was to determine whether there was a significant difference in the mean CLR development stage diagnosed with a visual inspection and using the proposed technological integration. The outcome was relevant to get the necessary statistical support for answering the research question.
The comparative table contained 202 observations with the corresponding diagnosed development stage for both treatments. Figure 13 shows the box plot chart describing the measurements. The x-axis contains the two treatments ("visual inspection" and "technological integration"), whereas the CLR development stage is presented on the y-axis. The graphical similarity of the data distribution of each treatment suggested a possible similarity between the means of the response variable. To assess this condition and make a decision based on the hypothesis, an ANOVA was executed. The results of the ANOVA can be seen in Table 6. The obtained p-value for the treatments factor was 0.231. This value was greater than the set significance level (α = 0.1), which meant that there was not sufficient evidence for rejecting the null hypothesis. Thus, it was concluded, with 90% confidence, that there was no statistically significant difference between the diagnosis of the CLR development stage made by using visual inspection and the technological integration. This result indicated that the two methods did not differ significantly when diagnosing the disease.
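For readers unfamiliar with the test, the F statistic behind such a comparison can be sketched as below. The sketch assumes a simple one-way layout with the treatment as the only factor, which may differ from the design summarized in Table 6; the function name and the stage values are hypothetical and are not taken from the comparative table.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means) for x in group)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical diagnosed stages for six lot observations per treatment.
visual_inspection = [0, 0, 2, 3, 4, 0]
technological_integration = [0, 0, 2, 3, 3, 0]
f_stat, df_between, df_within = one_way_anova_f(
    visual_inspection, technological_integration)
print(round(f_stat, 4), df_between, df_within)
```

The p-value is the upper-tail probability of the F distribution with these degrees of freedom evaluated at the F statistic; a statistics library (for example, `scipy.stats.f.sf`) can provide it. A p-value above the chosen α means the null hypothesis of equal means is not rejected.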
This research demonstrated the feasibility of diagnosing the CLR development stage in the Colombian Caturra variety, with significant performance, through the integration of RS, WSN, and DL. The analysis of the results allowed obtaining statistical evidence for supporting the research hypothesis. In that sense, the outcome suggested that a technological integration could contribute to the protection of the phytosanitary status of coffee crops since it showed potential for complementing the traditional visual inspections towards the diagnosis of the most economically limiting disease for Colombian coffee production.

Conclusions
The integration of RS, WSN, and DL within the framework of this study successfully allowed evaluating to what extent it was possible to diagnose the CLR development stage in the Colombian Caturra variety. To this end, the most relevant information obtained was consolidated, the knowledge about the study context and CLR was detailed, and the repercussions of the disease in the Colombian coffee growing industry were identified. Furthermore, the state-of-the-art methods were reviewed and used for the current research. Creative design sessions were carried out to define the most useful technological integration of RS and WSN. Afterward, a functional prototype that automatically collected data in the field and transferred them to a remote server over the Internet was built. Besides, a diagnostic model using DL was implemented based on the stored data, and it successfully allowed evaluating the CLR development stage with unknown field data.
The motivation of this research project was to contribute to rural development through technological innovation to strengthen the profitability of Colombian coffee growers. Considering that the country has the potential, in terms of environmental conditions and diverse ecosystems, to generate a giant portfolio of exotic products that would be better valued in the specialty coffee market, this research evaluated, with empirical evidence, a technological approach that attempted to facilitate the diagnosis and mitigate the risks of one of the most economically limiting diseases for coffee production. In that sense, the proposed technological integration could positively impact the rural sector since those innovations promote investments in infrastructure, which are crucial to empower the rural community and improve the living standards and activities concerning progress, productivity, and income generation.
The obtained p-value in the analysis of the results was 0.231, which helped to determine, with 90% confidence, that the visual inspection and the technological integration did not present a statistically significant difference regarding the diagnosis of the CLR development stage. Thus, it could be said that the assessment of the disease led to a similar outcome using either method, which suggested that the obtained results supported the research hypothesis. Finally, it could be asserted that through the integration of RS, WSN, and DL, it was possible to diagnose the CLR development stage in the Colombian Caturra variety with an F1-score of 0.775. This value indicated that, on average, the diagnostic model was excellent in terms of the certainty and usefulness of its diagnosis.
Regarding the data processing phase, a further extension of this research could include the implementation of a simple user interface for visualizing the diagnosis of the CLR development stage through the generated DL model to better illustrate the results to a coffee grower. Additionally, the proposed technological integration could be scaled to a real context by using drones with one or both of the two multispectral cameras used in this experiment (depending on the project budget), knowing that the identification of the CLR could be done with just one camera, e.g., RGN (F1-score of 0.946), due to its high score. Another real context approach could be further explored using a mobile autonomous robot with a single RGB camera. Finally, the F1-score values achieved on the test set showed that the image-based submodels performed better than the JSON submodel (sensor data model). This suggested reconsidering the composite model for future work and focusing efforts on improving the collection and processing of just RGB and multispectral data, or using more robust sensors when the technology allows it. Using just the three image submodels (RGB, RGN, and RE), we computed an average F1-score of 0.93, which suggested that an improved composite F1-score could be achieved; however, a real context commercial application may implement only one of these three submodels due to both implementation and maintenance costs.
Author Contributions: A.S. and S.S. designed and implemented the experimental testbed and algorithms for integrating WSN, RS, and DL for CLR detection. D.V. and M.T. supervised the experimental design and managed the project. M.M. and B.S. reviewed the machine learning and deep learning parts of this research. All authors contributed to the writing and reviewing of the present manuscript. All authors read and agreed to the published version of the manuscript.
Funding: This research was funded by Universidad EAFIT internal research projects and Colciencias "Jovenes Investigadores e Innovadores por la paz 2017".

Acknowledgments:
We would like to thank Engineer Carlos Mario Ospina for his support during the visit to CENICAFÉ's Experimental Station "El Rosario", as well as coffee grower Nabor Giraldo, Diego Miguel Sierra, and Dentist Samuel Roldán for their support in the procurement of the coffee plants and the biological matter containing CLR. Additionally, we would like to thank Alejandro Marulanda, Edwin Nelson Montoya, Engineer Hugo Murillo, Universidad EAFIT, and Colciencias for the constant academic, infrastructural, and financial support. It is also important to highlight the enormous work and dedication of Project Supervisors Engineer Camilo Velásquez, Engineer Felipe Gutiérrez, Engineer Sebastián Osorio, Engineer Juan Diego Zuluaga, and physics engineering undergraduate student Sergio Jurado. In addition, we want to acknowledge the biology team, formed of undergraduate students Alisson Martínez and Laura Cristina Moreno and led by Luisa Fernanda Posada. We would also like to mention computing-systems engineering undergraduate students Alejandro Cano, Luis Javier Palacio, and Sebastián Giraldo, and we also thank the Engineers Ricardo Urrego and Luis Felipe Machado and Sebastián Rodríguez.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: