Computer Vision, IoT and Data Fusion for Crop Disease Detection Using Machine Learning: A Survey and Ongoing Research

Abstract: Crop diseases constitute a serious issue in agriculture, affecting both the quality and quantity of agricultural production. Disease control has been a research subject in many scientific and technological domains. Technological advances in sensors, data storage, computing resources and artificial intelligence have shown enormous potential to control diseases effectively. A growing body of literature recognizes the importance of using data from different types of sensors and machine learning approaches to build models for detection, prediction, analysis and assessment. However, the increasing number and diversity of research studies call for a literature review to guide further developments and contributions in this area. This paper reviews state-of-the-art machine learning methods that use different data sources, applied to plant disease detection. It lists traditional and deep learning methods associated with the main data acquisition modalities, namely IoT, ground imaging, unmanned aerial vehicle imaging and satellite imaging. In addition, this study examines the role of data fusion for ongoing research in the context of disease detection. It highlights the advantage of intelligent data fusion techniques, from heterogeneous data sources, to improve plant health status prediction and presents the main challenges facing this field. The study concludes with a discussion of several current issues and research trends.


Introduction
According to the FAO [1], pest attacks and plant diseases are considered two of the main causes of decreasing food availability and food hygiene. Plant diseases vary seasonally depending on the presence of the pathogen, environmental conditions and the crop type. Crops can experience environmental stress due to the following factors: abiotic (drought, waterlogging, salinity, etc.), biotic (insects, pests, weeds, viruses, etc.) or climate change [2]. A pathogen is the organism that causes disease, whether it is a virus, a bacterium or a fungus. Depending on the disease and its development stage, the damage to crops ranges from simple physiological defects to plant death. In addition to biological agents, physical agents such as abrupt climate change [3] can cause diseases and harm the plant. Reliance on pesticides is the common practice to limit the damage caused by these microorganisms [4]. In addition to their negative impacts on nature, the excessive use of pesticides can lead to the death of auxiliary insects used in biological control and/or the development of genetic resistance [5]. Localizing infected areas in plantations can reduce chemical use. Conventional methods for detecting and locating plant diseases include direct visual identification of disease symptoms appearing on plant leaves and chemical techniques that involve molecular tests on plant leaves [6]. These methods are time-consuming and labor-intensive.
Promising approaches for detecting and locating diseases have been proposed in recent years using automatic monitoring and recognition systems. Advances in sensor technologies and data processing have opened new perspectives for the detection and diagnosis of crop anomalies. Disease surveillance can be performed by capturing data from the soil and plant cover using sensors, such as remote sensing (RS) or ground equipment, and by developing and testing machine learning algorithms on these data [7]. Implementing management practices with smart algorithms optimizes profitability, sustainability and protection of land resources. Overall, it allows effective treatment to be delivered in the right place, at the right time and at the right rate [8,9].
Furthermore, an agricultural field can be equipped with multiple sensors measuring environmental characteristics and plant canopy properties, complemented by leaf indices extracted from remote sensing imagery and by IoT sensors. Given the variety of data extracted, data fusion techniques are required to combine these data types in order to better understand crop growing conditions and the development of disease symptoms. In addition, machine learning-based data fusion has undergone important development and, when applied to agricultural data, could have a great impact on plant protection, in particular on disease detection and early disease detection. Several multi-sensor and remote sensing-based fusion techniques have therefore been used in agriculture for this purpose [10,11].
The use of agriculture related data from different acquisition tools along with machine learning and fusion algorithms has led to extensive research in the field of digital agriculture, especially for plant monitoring, control and protection, generating a considerable scientific literature. Over the last decade, several survey papers have been proposed on the use of machine learning approaches for agriculture mainly based on one of the different extracted data, such as IoT data, ground imagery or remote sensing imagery. Table 1 summarizes the review papers that have addressed different crop monitoring problems, covering acquisition techniques using wireless sensors, multispectral cameras, thermal cameras, hyperspectral cameras and satellite sensors. Nonetheless, as far as we know, there is no specific survey study covering IoT techniques, ground imagery, unmanned aerial vehicle (UAV) imagery and satellite imagery for disease detection using machine learning.
The literature still lacks a comprehensive view of this field of study that covers the different data sources in agriculture and the fusion of these data for disease detection. In this paper, we review crop disease detection methods based on unimodal data sources (wireless sensor networks, ground imagery, UAV imagery and satellite imagery) using machine learning algorithms. For each data source, these algorithms are broadly divided into two main categories: traditional machine learning and deep learning. We highlight the combination of multi-source data for agricultural applications and discuss data fusion approaches. Recent data fusion advances are presented. Existing issues are also analyzed, particularly for disease detection.
Table 1. Recent review papers in the agricultural domain using machine learning.

Category | Topics Covered | Year | Review
IoT | IoT applications in agro-industrial and environmental field. | 2017 | [12]
 | Precision farming techniques in semi-arid West Africa for labor productivity. | 2017 | [13]
 | IoT technologies in several smart farming scenarios: recognition, transport, communication and treatment. | 2018 | [14]
 | Crucial technologies of the internet of things in protected agriculture for plant management, animal farming and food/agricultural product supply traceability. | 2019 | [15]
 | The role of wireless sensor networks for greenhouses and the models and techniques adopted for efficient integration and management of WSN. | 2019 | [16]
 | Crop yield prediction using machine learning. | 2020 | [17]
Imaging | Hyperspectral image analysis techniques for the detection and classification of the early onset of plant disease and stress. | 2017 | [18]
 | Data collection and handling of plants close range hyperspectral imaging and presentation of recent applications of plant assessment using those images. | 2017 | [19]
 | UAV-based sensors, data processing and applications for agriculture and forestry. | 2017 | [20]
 | Applied sensing systems and data analytics in agriculture. | 2018 | [22]
 | | 2019 | [37]
 | Multilevel data fusion for the internet of things in smart agriculture. | 2020 | [38]
 | Utilization of multi-sensors and data fusion in precision agriculture. | 2020 | [39]

The rest of the paper is organized as follows. Section 2 provides an overview of crop disease detection techniques for precision agriculture, the use of monitoring systems in agriculture and the current state of the art for crop disease detection techniques, from ground to aerial imagery and satellite imaging techniques. Section 3 analyzes fusion approach opportunities for agriculture. The discussion and conclusion are presented in Section 4.

Crop Disease Detection
Natural plant growth depends on multiple interactions with many environmental characteristics: soil properties, cultivation techniques, weed growth and microclimatic conditions. Unsuitable conditions such as sudden changes in temperature and humidity can eventually cause plant diseases. Since early treatments are the most effective in protecting agricultural production, knowledge of the elements of infection is essential to ultimately develop targeted control methods against pathogens. Therefore, precision agriculture (PA) attempts to consider all these characteristics to equip decision systems and agricultural machines with information provided by sensors, since many recent sensors enable monitoring and mapping of various field parameters [40]. In the literature, several acquisition protocols have been proposed to acquire crop spectral images. One of the most widely used is the protocol for imaging leaves of isolated plants. The choice of acquisition tools depends on the purpose of the study; it can range from a simple smartphone camera to a highly sophisticated hyperspectral camera mounted on an aerial vehicle. Image quality and processing tools vary depending on the acquisition system and acquisition conditions.
In the next subsections we present different types of sensors deployed for disease detection using machine learning, i.e., ground cameras, UAV for aerial imaging, satellites and pedoclimatic sensors. Note that for the rest of the paper, IoT refers to pedoclimatic sensors.

Ground Imaging
Crop ground imaging is the technique of acquiring images of crop fruits and leaves at ground level using smartphones or digital cameras. Since visual symptoms on crops and plant leaves are important for disease detection, researchers have tried to capture plant leaves in field conditions [41,42], raising the challenge of dealing with complex backgrounds, shadows and unstable luminosity. Other studies have examined the sensitivity of spectrometry to the chemical and organic characteristics of plants. When plants suffer from stress, normal chlorophyll production decreases; absorption therefore decreases, which causes a significant increase in reflectance [43,44]. As a result, the spectral characteristics of plants are affected by diseases, leading researchers to invest in distinguishing infected from uninfected leaves and in classifying different degrees of disease severity, both when visual symptoms are present and before they appear [45]. A modern approach for disease detection relies on machine learning algorithms to explore data from different acquisition systems.
Traditional machine learning algorithms were used for the purpose of disease detection. Support vector machine (SVM) models are commonly used for plant disease detection due to their prediction efficiency. In [45], SVM was used for early detection of drought stress in barley with close range hyperspectral imaging. The model was trained on the extracted information from labels and selected vegetation indices (Red Edge Normalized Difference Vegetation Index (RENDVI) and Plant Senescence Reflectance Index (PSRI)). Similarly, manual extraction of lesion characteristics and the combination of multiple SVM classifiers (color, texture and shape characteristics) have been proposed for disease recognition on plant leaves in order to reduce misclassification [46][47][48][49].
Statistical analysis of some indices using the principal component analysis (PCA) model successfully differentiated between healthy plants and infected golden potato disease progression [50]. The authors first extracted the vegetation indices, simple ratio (SR) and Normalized Difference Vegetation Index (NDVI) from acquired hyperspectral images of infected potatoes at different disease development stages. The analysis results demonstrated the ability of spectral data to distinguish healthy from diseased plants. In the same way, the authors in [51] used K-nearest neighbor (KNN) and the decision tree-based classifier C5.0 to classify grey mold infection severities on tomato leaves using hyperspectral images. Results indicate that the full range model can differentiate between healthy and infected leaves with an accuracy of 92.86% and 85.71% for KNN and C5.0, respectively. In [52], the authors aimed to estimate the severity of three diseases on wheat using an artificial neural network composed of one hidden layer; the classification accuracy reached 81%.
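The two vegetation indices mentioned above follow standard definitions and can be sketched in a few lines. The example below is only an illustration: the reflectance values are hypothetical, not data from [50], and the function names are chosen for this sketch.

```python
def simple_ratio(nir, red):
    """Simple ratio (SR): near-infrared reflectance divided by red reflectance."""
    if red == 0:
        raise ValueError("red reflectance must be non-zero")
    return nir / red


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical reflectance values (fractions between 0 and 1), not measured data.
healthy = {"nir": 0.50, "red": 0.05}
stressed = {"nir": 0.30, "red": 0.10}

for name, pixel in (("healthy", healthy), ("stressed", stressed)):
    print(name, round(ndvi(**pixel), 3), round(simple_ratio(**pixel), 2))
```

With these illustrative values, the stressed pixel reflects less in the near-infrared and more in the red, so both indices drop, which is the kind of contrast the index-based studies above rely on.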
Deep learning models were then used to improve the prediction quality and address larger types of diseases and crops. This subsection presents deep learning models deployed on RGB images, multispectral images and hyperspectral images for early disease detection.
In [53], the authors tested several deep learning models trained from scratch and with transfer learning. The tests were carried out on the PlantVillage dataset of plant leaf images [54]. The ResNet34 model outperformed all other models, achieving an accuracy of 99.67%. In [42], the authors proposed a conditional convolutional neural network derived from the ResNet50 topology. To test their approach, they generated a dataset of images of five crops acquired with a smartphone under real conditions. The approach outperformed the conventional method using only visual information, with an accuracy of 98%. Some authors have tested EfficientNet on the PlantVillage dataset [55]; the model outperformed state-of-the-art deep learning models, achieving 99.91% and 99.97% on the original and augmented datasets, respectively. Likewise, in [56], a smaller dataset of infected tomato plant leaf images was divided into different pest attacks and plant diseases. The detection method achieved an accuracy of 95.65% using DenseNet161 with transfer learning.
For better practicability for farmers, researchers have developed mobile applications for disease detection using deep learning models adapted to the computing capacity and energy budget of mobile devices. In [57], the authors proposed a model inspired by Google's MobileNet models for tomato disease detection. The model was able to distinguish tomato leaf diseases through image recognition with an accuracy reaching 89.2%. Similarly, MobileNet was deployed to detect diseases from apple leaves [58]. Compared to the state-of-the-art ResNet model, MobileNet was the most efficient, with three times less computing time. In [59], MobileNet was tested for citrus disease detection and compared to another CNN model, the Self-Structured CNN (SSCNN) classifier. The results showed that SSCNN was more accurate for citrus leaf disease classification on mobile phone images.
In a pre-symptom disease detection task exploiting hyperspectral images, the authors in [60] used the extreme learning machine (ELM) classifier on the full wavelengths of hyperspectral tomato leaf images. The identification results were very satisfying, with an overall classification accuracy of 100%; however, the classification was time-consuming. To solve this issue, they rebuilt the ELM model based only on the effective wavelengths selected by the successive projection algorithm (SPA). This model performed the disease classification task with an accuracy of 77.7%. In a similar way, the authors in [61] attempted to detect the tobacco mosaic virus (TMV) on tobacco leaves using the ELM classifier. They selected the effective wavelengths that contain the most disease information in order to avoid convergence instability in predictive models (caused by high correlation between bands). The overall classification accuracy (healthy, 2 DPI (days post-infection), 4 DPI or 6 DPI) reached 98% using input spectral data. In the same context, [41] developed a method to detect fusarium head blight disease in wheat using hyperspectral images and a specific acquisition protocol considering the field conditions. The authors were able to classify infected and healthy wheat heads using hyperspectral images. The accuracy reached 84.6% using a two-dimensional convolutional bidirectional gated recurrent unit neural network (2D-CNN-BidGRU) hybrid model.
Ground imagery is an interesting technology in smart farming. Deep models using this type of acquisition achieve high detection accuracy thanks to close-range, high-resolution leaf imaging. Table 2 summarizes the effective wavelengths used for disease detection with the close range imaging presented in this section. However, this strategy fails to monitor and diagnose plant diseases at a large scale. Moreover, these techniques are time-consuming over a wide study area.

UAV Imaging
UAVs are also exploited as a precision agriculture (PA) solution for monitoring and controlling crop growth [25,63], possible disease development and weed detection [64,65], thanks to their ability to collect higher resolution images at lower costs. They play an effective role in agriculture by reducing costs, since a field expert no longer needs to walk through the whole crop several times for monitoring. UAVs equipped with embedded cameras and sensors perform efficient field data acquisition for field scale visualization and analysis. Additional elements can help enhance the performance of crop monitoring techniques, such as the choice of appropriate sensors and intelligent recognition models. As spectrometry is sensitive to diseases, multispectral cameras are more often used for disease detection studies. The combination of cameras mounted on a low-altitude remote sensing platform allows real-time image acquisition at precise locations and at different wavelengths.
For systems using these types of sensors, a large amount of data is first stored in large-scale databases provided by information systems such as the geographical information system (GIS). The information system enables the visualization and analysis of these data. The collected data provide information on soil and vegetation cover characteristics, such as soil organic content and soil moisture, biomass quantity, weed presence and early detection of crop stress with possible disease stage evaluation.
Traditional machine learning algorithms are used for plant disease detection using UAV images. One of the first models attempting to predict infection severity on plants from images is the backpropagation neural network (BPNN) [62], in which the authors extracted spectral data from remote sensing hyperspectral images of tomato plants. The authors then rated the images according to late blight severity on five stages and tested the BPNN on the extracted data. The results showed that an artificial neural network (ANN) with backpropagation could be used in spectral prediction for disease detection. In the same context, the authors in [66] attempted to detect leafroll disease using a Classification and Regression Tree. Their approach was based on spectral and spatial signatures extracted from UAV hyperspectral images of grapevine. Correspondingly, the authors in [67] extracted spectral bands, vegetation indices and biophysical parameters of diseased and healthy plants from UAV multispectral images. ROC analysis was then used to estimate the capacity of the selected variables for disease detection. In [68], the authors adopted a segmentation approach based on Simple Linear Iterative Clustering (SLIC) for soybean foliar disease detection. This method employs the k-means algorithm to generate superpixels. After segmentation, the images were classified using SVM, achieving a precision of 98.34%. In another study [69], the authors addressed wheat yellow rust infection using UAV multispectral images. The approach, based on a random forest classifier, was able to discriminate the disease at different development stages with an accuracy reaching 89.3%. In [70], UAV images were used for the detection of citrus canker at several disease development stages. The authors used a radial basis function (RBF) network for classification; an RBF network is an artificial neural network trained in a supervised manner. The classification achieved a disease detection accuracy of 92%.
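To illustrate how ROC analysis can rank candidate variables such as those in [67], the sketch below computes the area under the ROC curve (AUC) for a single variable using its rank-based definition. It is a minimal example under stated assumptions: the index values are hypothetical, higher values are assumed to indicate disease, and the function name is illustrative rather than taken from the cited study.

```python
def roc_auc(scores_diseased, scores_healthy):
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly.

    A pair counts as correct when the diseased sample has the higher score;
    ties count as half. Assumes higher values indicate disease.
    """
    if not scores_diseased or not scores_healthy:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Hypothetical values of one candidate variable for each group of plants.
diseased = [0.62, 0.70, 0.55, 0.81]
healthy = [0.40, 0.58, 0.35, 0.50]

print(roc_auc(diseased, healthy))
```

An AUC close to 1 suggests the variable separates the two groups well, while a value near 0.5 suggests it carries little discriminative information. For a variable that decreases with disease, the scores can be negated before the call.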
In another study [71], the authors extracted vegetation indices (VIs) from multispectral images to enhance information on plant characteristics. The results showed that VI features compressed with PCA and combined with the original data values yielded an accuracy of 100% using the AdaBoost algorithm. Multilayer perceptrons (MLP) were also applied for classification tasks using hyperspectral data collected on healthy and diseased avocado trees [72]. Similarly, the SVM classifier was used to detect a fungus attacking olive trees based on hyperspectral and thermal images captured from a UAV [73]. The model achieved an accuracy of 80% using an optimal set of spectral bands.
To conclude, the performance of traditional machine learning approaches is limited and can easily vary across growing periods and acquisition equipment. In addition, the low performance can also be due to the feature engineering process, which can cause significant information loss.
Deep learning models have also been developed and used to tackle the limitations of traditional machine learning for plant disease detection using UAV images. In [74], a sliding window was used on each plot image, moving in small steps along the image. The classification task was performed using a convolutional neural network (CNN) architecture; the results obtained had a mean absolute error of 11.72% and a relatively low variance. In [75], the classification between healthy and diseased maize leaves was performed using a ResNet model, achieving a test accuracy of 97.85%. Similarly, with the aim of detecting disease symptoms in grape leaves [76], the authors used a CNN approach with a relevant combination of image features and color spaces. Images were converted into different color spaces to separate the intensity information from chrominance. The CNN model Net-5 was tested on multiple combinations of input data and three patch sizes. The best combination obtained an accuracy of 95.86%. In [77], the authors proposed a novel deep learning architecture for the detection of yellow rust in winter wheat at different observation times across the wheat growing season. The architecture consists of multiple Inception-ResNet blocks combining the Inception and ResNet models for deep feature extraction. The model reached an overall accuracy of 85%. In another study [78], the authors attempted to detect disease in pine trees using UAV images. The classification model was designed based on a combination of deep convolutional generative adversarial networks (DCGANs) and an AdaBoost classifier. The motivation behind using the AdaBoost classifier is to build a boosted ensemble that combines the other classification models for better precision. The proposed approach achieved a recall of 95.7%.
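The sliding-window strategy described for [74] can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window size, step and the toy image are assumptions, and each extracted patch would then be passed to a classifier such as a CNN.

```python
def window_corners(height, width, window, step):
    """Yield (row, col) top-left corners of square windows that fit in the image."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for row in range(0, height - window + 1, step):
        for col in range(0, width - window + 1, step):
            yield (row, col)


def extract_patches(image, window, step):
    """Return ((row, col), patch) pairs from an image given as a list of rows."""
    height = len(image)
    width = len(image[0]) if image else 0
    patches = []
    for row, col in window_corners(height, width, window, step):
        patch = [line[col:col + window] for line in image[row:row + window]]
        patches.append(((row, col), patch))
    return patches


# Toy 4 x 6 single-band image; pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
patches = extract_patches(image, window=2, step=2)
print(len(patches))
```

A step smaller than the window size produces overlapping patches, which corresponds to moving the window in small steps along the image.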
In [79], the authors developed a deep learning approach to combine visible and near-infrared images obtained from two different sensors in order to detect grapevine mildew symptoms. The first step consisted of overlaying the two types of images using an optimized image registration; the resulting images were then fed to a semantic segmentation approach (SegNet architecture) to delineate and detect the vine symptoms. Their approach achieved an accuracy of 92% for detection at the grapevine level. The same authors recently proposed a deep learning architecture that combines multispectral and depth information for vine symptom detection [80].

Satellite Imaging
Even when UAVs are available, the temporal aspect of historical monitoring may be missing. Conversely, satellites covering wider land areas offer historical images of the study area, depending on the satellite acquisition frequency. Plant health status can also be monitored using satellite imaging [81]. Indeed, satellites can provide multispectral images with spatial resolutions that range from 0.5 m to more than 30 m. For example, Landsat and Sentinel-2 satellite sensors provide the most widely accessible medium-to-high spatial resolution multispectral data that can be used for vegetation phenology. Table 3 details commercial satellite sensors collecting multispectral images with a spatial resolution from 0.5 m to 30 m (Satellite Imaging Corporation (SIC)). Table 3 shows that satellites with high spatial resolution have quite long revisit periods. Conversely, high temporal resolution satellites have very low spatial resolution [82]. For instance, the MODIS sensor on the Terra/Aqua satellites collects daily images. However, its images have a spatial resolution of 250 m (bands 1-2), 500 m (bands 3-7) and 1000 m for the remaining bands. A spatio-temporal fusion can be useful to carry out a vegetation monitoring study using these data [83].
Several machine learning methods have been used to perform land monitoring from satellite images, for instance: mapping of urban fabric [84][85][86], crop classification and field boundaries [87,88] and pest detection [89].
Traditional machine learning was used to test the usage of satellite images for disease detection. In [90], the authors developed a detection application using SPOT-6 images with a supervised classification algorithm called spectral angle mapper (SAM) to map powdery mildew of winter wheat. The classification was based on selected bands (green and red) and disease-sensitive vegetation indices. The approach achieved an overall mapping accuracy of 78%. Similarly, the authors in [91] collected images from Sentinel-2 for stress detection in rice. The types of stress detected in this study were pest and disease stress, heavy metal stress, or double stress combining the first two types. The study demonstrated the usefulness of satellite imagery to distinguish the causes of stress in different areas using the coefficients of spatio-temporal variation (CSTV) derived from the stress-sensitive VIs related to red edge bands. In [92], the authors were able to discriminate severity levels of yellow rust infection (i.e., healthy, slight and severe) in winter wheat using multispectral bands of the Sentinel-2 sensor and hyperspectral data acquired at canopy level. To achieve the classification task, a new multispectral index, the Red Edge Disease Stress Index (REDSI), was proposed. This index was based on the sensitive bands B4 (Red), B5 (Re1) and B7 (Re3) and validated with an overall identification accuracy of 85.2% using the optimal threshold. SVM was deployed in [93] for disease detection in winter wheat. The proposed approach was based on growth indices and environmental factors calculated from Landsat-8 images. The model achieved an overall accuracy of 80%. In [94], the naive Bayes algorithm was tested on spectral signatures of coffee berry necrosis derived from Landsat 8 OLI satellite images for disease detection; the classification reached an accuracy of 50%.
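The spectral angle mapper used in [90] assigns each pixel to the reference spectrum with which it forms the smallest angle. The sketch below shows the general technique only; the two-band reference spectra and class names are hypothetical and do not reproduce the bands, indices or thresholds of the cited study.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    if len(pixel) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def classify_sam(pixel, references):
    """Return the label of the reference spectrum with the smallest angle."""
    return min(references, key=lambda label: spectral_angle(pixel, references[label]))


# Hypothetical (green, red) reference reflectances for two classes.
references = {"healthy": (0.10, 0.05), "mildew": (0.12, 0.12)}

print(classify_sam((0.20, 0.10), references))
print(classify_sam((0.09, 0.10), references))
```

Because the angle depends only on the direction of the spectrum, a pixel that is uniformly brighter or darker than a reference still matches it, which makes the method relatively insensitive to illumination differences.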
Nevertheless, accurate disease detection requires appropriate data processing methods. Deep learning has also proven its high performance for disease detection using satellite images. In [95], the authors proposed a gated recurrent unit (GRU)-based model to predict the development of sudden death syndrome (SDS) in soybean quadrats. Twelve PlanetScope satellite images were used in this study. The method incorporated time-series information for the classification task in different scenarios, each scenario having a different sequence size. Interestingly, the highest accuracy was reached in the fourth scenario, which had the largest sequence size, suggesting that precision improves when enough historical images are available. However, the study suffers from a limited dataset in which samples are unbalanced across SDS development. The issue was addressed by assigning weights to diseased samples. High spatial resolution is an important criterion for plant disease detection. In fact, resolutions of 10 m and coarser are barely sufficient for crop classification, which makes disease detection challenging [96]. To bridge the data gap and improve prediction, several analysts have recommended combining satellite images with aerial images and other data sources, such as wireless sensor networks that capture environmental parameters, for disease detection [97].
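Assigning weights to an under-represented class, as done for diseased samples in [95], can be sketched with a common inverse-frequency heuristic. The exact weighting scheme of the cited study is not given here, so the formula below (number of samples divided by the number of classes times the class count) and the label counts are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced label set: few diseased quadrats, many healthy ones.
labels = ["healthy"] * 90 + ["diseased"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With such weights, each misclassified diseased sample contributes more to a weighted training loss than a misclassified healthy sample, which compensates for the scarcity of diseased examples.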

Internet of Things Sensors
IoT sensors are among the most widely used technologies in PA, due to their efficiency, ease of installation and low cost. A typical wireless monitoring system contains multiple sensors connected in each zone to an installed node, with sensors and nodes communicating via radio frequency. In addition, a gateway is needed to connect the sensors to the user [98][99][100]. Once connected to the Internet server, the user can access the collected data. When a WSN is unavailable, one alternative is the weather station [101], which provides different local measurements in real time for various agricultural applications. Several studies have collected wireless sensor network data for disease detection. In [102], the authors developed an IoT monitoring system that collects environmental and soil information using a wireless sensor network. The collected data were used for early detection of tomato and potato disease. In [103], the authors developed a monitoring and prediction system for mildew in a vineyard. The approach was based on temperature, humidity and rainfall observations and on the Goidanich model for prediction and alarming. Analyzing the resulting data is essential to ensure phytosanitary protection. Nevertheless, the classic methods used for disease detection are limited, and it is more promising to take advantage of machine learning algorithms to generate efficient prediction models.
Traditional machine learning: Many research studies have been carried out to control and monitor plants, as well as predict their health status based on specific physical sensors, since abiotic factors help to determine the health status of crops. In [104], the authors developed a surveillance system to identify the risk of grape disease in its early stages using the Hidden Markov model. Sensors for temperature, relative humidity and leaf humidity were placed in the vineyard to collect the necessary data. These data were transferred to a server via ZigBee communication (a standard designed for low-power wireless data transmission). For the classification task, the conditions favorable to the agents responsible for the spread of diseases in grapes were provided by the National Center for Research on Grapevines (CNRG), as shown in Table 4. A naive Bayes kernel model was used in [105] for disease prediction based on environmental and soil information extracted using an IoT monitoring system. A KNN model was deployed for the early detection of agricultural diseases [106]. The prediction was based on multiple parameters extracted from the field, namely atmospheric temperature, atmospheric humidity, CO2 concentration, illumination intensity, soil moisture, soil temperature and leaf wetness. The model achieved promising results, supporting the validity of environmental data for early disease detection. Similarly, in [107], the authors proposed a system to predict the health status of tomato plants. Since abiotic factors such as temperature, soil moisture and humidity help to determine whether the plant is growing in healthy conditions or not, the system used two sensors: a soil moisture sensor and a temperature-humidity sensor. Two supervised learning algorithms (SVM and Random Forest) and an unsupervised learning technique (K-means clustering) were tested. The algorithms achieved test accuracies of 99.3% for SVM, 99.6% for Random Forest and 99.5% for K-means.
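A KNN prediction from environmental readings, in the spirit of [106], can be sketched as below. The readings, labels and the choice of three parameters are hypothetical and serve only to illustrate the technique; min-max scaling is added as an assumption because the parameters are measured in different units.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return a function that rescales each feature to [0, 1] using the training range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def scale(row):
        return tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))

    return scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query (Euclidean)."""
    ranked = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical readings: (air temperature in C, relative humidity in %, leaf wetness in h).
readings = [(18, 55, 1), (20, 60, 2), (19, 58, 0), (24, 90, 9), (23, 88, 8), (25, 92, 10)]
labels = ["low risk"] * 3 + ["high risk"] * 3

scale = fit_min_max(readings)
train = [scale(r) for r in readings]
print(knn_predict(train, labels, scale((24, 85, 7))))
```

Without rescaling, the parameter with the largest numeric range (here relative humidity) would dominate the distance, so the scaling step matters as much as the choice of k.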
Deep learning: In [108], the authors developed an approach for predicting the occurrence of cotton diseases and pests. The approach was based on weather and atmospheric circulation time series collected from six different zones in India. A bidirectional LSTM (Bi-LSTM) was used for prediction; it achieved an accuracy of 87.84% and an overall area under the curve (AUC) score of 0.95. Nevertheless, we noticed that the number of IoT papers on disease detection using machine learning remains small, which may be because these data alone are not sufficient for predicting crop health status. Coupled with other types of data, however, these inputs can provide valuable results when appropriate fusion techniques and AI models suited to such complex multivariate data are used.

Summary
Some of the most innovative technologies in plant protection are connected sensor networks, since there is a correlation between variations in microclimatic conditions and plant stress. Numerous research studies were carried out to control and monitor crops, and also predict plant health based on meteorological characteristics [100,104,107]. In addition, images can be a better representation of crop health state. This is due to the spectral signature of symptoms on the crops and plant leaves. Ground images, UAV images [74,76,109] and satellite images [91,92] have proven effective in detecting plant diseases. Table 5 is a summary of the research studies presented previously.
We noticed that the use of deep learning is spreading widely in disease detection applications. This may be due to the high performance of deep learning (DL) models compared to conventional machine learning models [60]. DL eliminates the manual feature extraction phase, which can sometimes result in low prediction performance, and requires less feature engineering effort [23]. In addition, DL models have been used to efficiently classify diseases in challenging environments with complex backgrounds and overlapping plant leaves. Conversely, traditional machine learning cannot effectively distinguish symptoms of diseases with similar characteristics, nor can it take advantage of larger amounts of training data [42].
Nevertheless, DL still has considerable drawbacks. Training can take weeks, depending on the processing capacity of the computer used, the dataset size and the model complexity. In the context of plant disease detection, datasets are scarce or inadequate, especially for early disease detection. Prior knowledge of the crop and of the history of diseases and pests in the parcels is a prerequisite. To tackle this issue, researchers create their own datasets, either by monitoring and capturing the natural development of an infestation when it occurs [54] or by inoculating the fungus causing the disease in an experimental greenhouse [50,51]. Regarding acquisition, hyperspectral imaging, for example, requires relatively expensive instruments and experts for the data collection protocol [61]. In addition, annotation is a mandatory step in creating a new dataset. The task is time-consuming and requires agricultural experts to annotate the different diseases, since it is beyond the reach of ordinary volunteers. Although researchers tend to use data augmentation for small datasets, these methods are not always effective and can only be applied up to a certain threshold if overfitting is to be avoided. Once the dataset is available, it can suffer from imbalance, where samples of healthy plants far outnumber samples of diseased plants, as well as from seasonal and regional difficulties with various categories of crop diseases [95,111].
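One common mitigation for the class imbalance mentioned above is to weight each class inversely to its frequency in the training loss. The sketch below is illustrative only and is not drawn from the cited studies; the label counts are hypothetical. It uses the heuristic n_samples / (n_classes * class_count), which is the same rule scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 90 healthy samples and 10 diseased samples
labels = ["healthy"] * 90 + ["diseased"] * 10
weights = inverse_frequency_weights(labels)
# healthy  -> 100 / (2 * 90), roughly 0.56
# diseased -> 100 / (2 * 10) = 5.0
```

With these weights, each misclassified diseased sample contributes nine times as much to the loss as a misclassified healthy sample, which offsets the 9:1 ratio in the data.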

Data Fusion Potential for Disease Detection
As the data come from multiple sources, it seems judicious to combine them to achieve better disease detection performance. Multimodal fusion for disease detection is still an ongoing area of study. Researchers have started to recognize the importance of merging heterogeneous types of data from different sensors [39,97]. Nevertheless, considerable effort is still needed to apply sophisticated fusion techniques to multimodal data. Fusing data from multiple sensors can yield a deeper understanding of plant features and crop behavior, and thus more accurate and efficient predictions.
Data fusion is the combination and simultaneous use of data and information from multiple sources to achieve better performance than using the data sources separately. It is often related to the need to perceive different environmental variables from sensors [112]. Multimodal data fusion is a challenging task because it deals with a combination of heterogeneous data from different modalities (images, signals, time series, etc.) [113]. Compared to classical probabilistic fusion methods, machine learning techniques have proven their capacity to provide more accurate predictions for fusion [114]. In this section, we review data fusion techniques, namely measurement fusion, feature fusion, decision fusion, hybrid fusion and tensor fusion, and explore data fusion applications using machine learning for agriculture. Finally, we discuss the major challenges in applying data fusion in agriculture.

Data Sources
Data sources can provide useful information about the studied phenomena; for a unimodal data source, simple data concatenation can be enough for prediction purposes [104]. When several types of sensors are involved, however, advanced data fusion is necessary [115]. Data from several sensors first require analysis to characterize, order or correlate the available data sources, and then to decide on the strategy or algorithm to be used to merge them. The relationships that can exist between sources include distribution, complementarity, heterogeneity, redundancy, contradiction, concordance, discordance, synchronization and difference in granularity [112].

Data Fusion Categories
In the literature, data fusion methods are divided into three main categories: probability-based methods, evidence-based methods and knowledge-based methods. Probability-based methods [37] such as the Kalman filter [116], Bayesian fusion [117] and the Hidden Markov model [118] are limited to low-dimensional or homogeneous data and suffer from high computational complexity. Therefore, they are not adequate for complex problems. Evidence-based methods [119], such as the Dempster-Shafer theory [120,121], are used to deal with missing information and additional assumptions and to solve the problem of uncertainty. Nevertheless, they present estimation limitations that restrict their applications.
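As a concrete instance of an evidence-based method, the sketch below implements Dempster's rule of combination for two mass functions over a two-element frame {healthy, diseased}. The two sources and their mass values are hypothetical and serve only to show the mechanics: masses of intersecting hypotheses are multiplied and accumulated, the mass assigned to contradictory pairs is treated as conflict, and the result is renormalized by one minus the conflict.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict mapping frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass given to incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}

HEALTHY = frozenset({"healthy"})
DISEASED = frozenset({"diseased"})
UNKNOWN = HEALTHY | DISEASED  # mass left on the whole frame expresses ignorance

# Hypothetical evidence from an image classifier and from a weather-based model
m_image = {DISEASED: 0.6, UNKNOWN: 0.4}
m_weather = {DISEASED: 0.5, HEALTHY: 0.2, UNKNOWN: 0.3}

fused = dempster_combine(m_image, m_weather)
# Unnormalized: diseased 0.68, healthy 0.08, unknown 0.12, conflict 0.12
```

The mass left on the whole frame is what lets a source express missing information explicitly, which is the property the text above attributes to evidence-based methods.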
Conversely, knowledge-based methods have proven to be effective in feature extraction, data reduction, classification and decision-making [122,123]. This type of method allows the fusion center to extract useful information from large imprecise datasets.

Intelligent Multimodal Fusion
Multimodal fusion based on machine learning [124] is capable of learning representations of different modalities at various levels of abstraction [125], with significantly improved performance [126]. Multimodal fusion can be split into two main categories [123]: model-based approaches, which explicitly address fusion in their construction, and model-agnostic approaches, which are general and flexible and do not directly depend on a specific machine learning method. Depending on the data abstraction level, different fusion architectures are possible for agnostic fusion [37,123,127].
Measurement fusion (or early fusion), also known as first-level data fusion, allows the immediate integration and presentation of sensor data using feature vectors. Data are generally concatenated [104], which limits this kind of fusion when dealing with heterogeneous data. This architecture is the most widely used because of its simplicity: the data are easy to align. In [128], the authors predicted the rate of photosynthesis and calculated the optimal CO2 concentration from real-time environmental information collected by a WSN system in greenhouses during the tomato seedling stage. The BPNN prediction model takes the environmental variables (temperature, CO2 concentration, humidity and luminosity) as input parameters and the photosynthesis rate of the individual leaf array as the output parameter.
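A minimal sketch of early fusion by concatenation is given below, assuming NumPy is available and that all sensor blocks are already aligned sample by sample (same number of rows). The sensor groupings and values are hypothetical. Each block is standardized column-wise before concatenation so that variables measured in different units contribute on a comparable scale.

```python
import numpy as np

def early_fusion(*blocks):
    """Z-score each sensor block column-wise, then concatenate along the feature axis."""
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns unchanged instead of dividing by zero
        scaled.append((block - block.mean(axis=0)) / std)
    return np.hstack(scaled)  # raises if the blocks have different numbers of rows

# Hypothetical greenhouse readings, one row per time step
climate = [[24.0, 60.0], [26.0, 55.0], [22.0, 70.0]]  # temperature, relative humidity
gas = [[400.0], [420.0], [410.0]]                     # CO2 concentration
light = [[300.0], [500.0], [400.0]]                   # luminosity

fused = early_fusion(climate, gas, light)  # one 4-dimensional feature vector per time step
```

The resulting matrix can be passed to any single downstream model. The row-alignment requirement is also why this scheme becomes awkward for heterogeneous sources such as images and time series.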
Feature fusion combines the results of early fusion and individual unimodal predictors by merging feature vectors, allowing heterogeneous data from different data sources to be combined. In [129], deep fusion architectures were implemented to detect defects in a planetary gearbox using four types of signals as inputs. Deep Convolutional Neural Networks (DCNNs) were used at multiple levels of multimodal data fusion. Feature-level fusion with feature learning from raw data was performed after the raw data extraction phase. DCNNs were applied to each type of data to learn features, and their outputs were extracted as the learned features. The learned features were finally combined and fed to another DCNN for feature-level fusion classification.
Decision fusion (or late fusion) involves processing data from each sensor separately to obtain high-level inference decisions, which are then combined in a second stage [130]. The decision-level fusion method combines information from different sensors after each sensor has made a preliminary decision. The combination can be simple or weighted. In [131], a use case of weighted decision fusion architecture on multiple sensors is presented. In addition to the sensor data, two classical characteristics (power and median frequency) were extracted from each signal corresponding to each sensor and entered the individual channel classifier. Then, the method of weighted majority voting (WMV) was used to merge the resulting vectors, with each sensor data being weighted by a confidence measure (or weight).
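The weighted decision fusion described above can be sketched as a weighted majority vote: each sensor-level classifier casts a vote for its predicted label, and votes are summed with per-sensor confidence weights. The decisions and weights below are hypothetical, and the function name is ours rather than that of [131].

```python
from collections import defaultdict

def weighted_majority_vote(decisions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    if len(decisions) != len(weights):
        raise ValueError("one weight is required per decision")
    scores = defaultdict(float)
    for label, weight in zip(decisions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Hypothetical per-sensor decisions and confidence weights
decisions = ["healthy", "healthy", "diseased"]
confidences = [0.3, 0.3, 0.9]

weighted = weighted_majority_vote(decisions, confidences)    # "diseased" (0.9 vs 0.6)
unweighted = weighted_majority_vote(decisions, [1.0] * 3)    # "healthy" (2 votes vs 1)
```

The example shows how the two combination schemes can disagree: a single highly trusted sensor overrides two weakly trusted ones under weighting, whereas simple voting follows the majority.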
Hybrid fusion merges information at two or more levels. In the hybrid approach proposed in [115], the authors developed the merging technique of different CNN classifiers for object detection in changing environments. Three types of input modalities were used: RGB, depth and optical flow. The CifarNet architecture was designed as the single expert model, and then the outputs of each expert network model were fused with weights determined by an additional network called gating network. This approach was called Mixture of Deep Experts (MoDE).
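The gating idea can be illustrated with a short NumPy sketch in which each expert outputs class probabilities and a gate assigns one weight per expert through a softmax. In a mixture of experts the gate logits would be produced by a trained gating network from the input; here they are fixed hypothetical numbers, as are the expert outputs, so this is a sketch of the combination step only and not the implementation of [115].

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def mixture_of_experts(expert_probs, gate_logits):
    """Convex combination of expert outputs, weighted by the gate."""
    gate = softmax(gate_logits)             # one non-negative weight per expert, summing to 1
    return gate @ np.asarray(expert_probs)  # shape: (n_classes,)

# Hypothetical class probabilities from three experts (RGB, depth, optical flow)
expert_probs = [[0.9, 0.1],
                [0.6, 0.4],
                [0.2, 0.8]]
gate_logits = [2.0, 0.0, -1.0]  # hypothetical: the gate trusts the RGB expert most here

fused = mixture_of_experts(expert_probs, gate_logits)
```

Because the gate weights are non-negative and sum to one, the fused output remains a valid probability distribution whenever each expert output is one.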
Tensor Fusion (TFM) consists mainly of a tensor fusion layer that models unimodal, bimodal and trimodal interactions using a three-fold Cartesian product of the modality embeddings [132]. The architecture has been improved to lower the computational complexity; the resulting architecture is called the low-rank multimodal fusion (LMF) model [133]. LMF was proposed to identify the emotions of speakers from their verbal and non-verbal behaviors, based on visual, audio and language data. Three YouTube video databases were used, annotated with feelings, speaker traits and emotions. The acoustic and visual modalities were each represented by a two-layer neural network, and a Long Short-Term Memory (LSTM) network was used to extract the representations of the linguistic modality. Compared to the tensor fusion model, the LMF model performed significantly better on all datasets and metrics. Moreover, LMF significantly reduced the computational complexity from exponential to linear.
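The relationship between the two models can be made concrete with a small NumPy sketch. Appending a constant 1 to each modality embedding before taking the outer product is what makes the fused tensor contain unimodal and bimodal terms alongside the trimodal ones. If the weight tensor applied to that product is a sum of rank-one terms, the same output can be computed from per-modality projections without ever building the full tensor. The dimensions, rank and random factors below are hypothetical, and the code is a sketch of the general idea rather than the exact formulation of [132] or [133].

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_v, d_l = 3, 4, 5   # hypothetical audio, visual and language embedding sizes
d_h, rank = 2, 2          # output size and rank of the weight decomposition

def with_bias(x):
    """Append a constant 1 so lower-order interactions survive the outer product."""
    return np.concatenate([x, [1.0]])

z_a, z_v, z_l = (with_bias(rng.normal(size=d)) for d in (d_a, d_v, d_l))

# Tensor fusion: explicit three-way outer product of the extended embeddings
Z = np.einsum("i,j,k->ijk", z_a, z_v, z_l)            # shape (d_a+1, d_v+1, d_l+1)

# Modality-specific factors, one set per rank component
F_a = rng.normal(size=(rank, d_h, d_a + 1))
F_v = rng.normal(size=(rank, d_h, d_v + 1))
F_l = rng.normal(size=(rank, d_h, d_l + 1))

# Full weight tensor implied by the factors, and the output it produces
W = np.einsum("rhi,rhj,rhk->hijk", F_a, F_v, F_l)
h_full = np.einsum("hijk,ijk->h", W, Z)

# Low-rank route: project each modality separately, multiply element-wise, sum over rank
h_low = ((F_a @ z_a) * (F_v @ z_v) * (F_l @ z_l)).sum(axis=0)
```

The full route stores a weight tensor whose size is the product of the modality dimensions, while the low-rank route stores factors whose total size grows with their sum, which is the source of the complexity reduction reported for LMF.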
Multimodal Fusion Architecture Search (MFAS) is a generic approach that generates a large number of possible fusion architectures, searches the space of neural architectures and chooses the best performing ones [126]. MFAS is inspired by progressive neural architecture search (PAS) [134], where the search is efficiently guided for architecture sampling using temperature-based sampling [135]. Tested on three datasets, MFAS proved its efficacy against the state-of-the-art results on those datasets.
In [136], the authors compared four types of fusion (Late, MoE, LMF and Mid) on image and signal modalities for automatic texture detection of objects. The fusion methods provided latent vectors that were fed into the corresponding artificial neural networks (ANNs). The most efficient fusion method in the texture classification task was LMF, which achieved an average test accuracy of 88.7%. Tested on degradation scenarios, the Late, MoE and Mid fusion methods behaved similarly. The fusion architecture allowed the ANNs to achieve good results in the texture detection task. Nevertheless, performance decreased significantly without the main modality (images).
To conclude, machine learning-based multimodal fusion approaches have an important potential to solve open issues in agriculture by merging different types of data. We believe that exploiting these advanced techniques for disease detection issues can provide a better understanding of the plant environment and thus improve prediction performance.

Data Fusion Applications in Agriculture
Although advanced fusion techniques are a rapidly growing area in agriculture, the literature still lacks studies on disease detection in this domain. Different applications of data fusion in agriculture are presented in the literature, specifically data fusion for yield prediction [63,137,138], crop identification [96,139], land monitoring [140,141] and disease detection [111,142,143].

Data Fusion for Yield Prediction
In [63], the authors investigated the relationship between canopy thermal information and grain yield by fusing data from different sensors. They extracted spectral (VIs), structure (vegetation fraction (VF), canopy height (CH)), thermal (normalized relative canopy temperature (NRCT)) and texture information from the canopy using multiple sensors installed on a UAV. Two fusion models were used in this study: input-level feature fusion DNN (DNN-F1) and intermediate-level feature fusion DNN (DNN-F2). DNN-F2 outperformed DNN-F1 in terms of prediction accuracy, spatial adaptability and robustness across different types of models.
In [138], datasets of summer and winter rice yield, meteorology data and area data from 81 counties in a Chinese region were used. To predict rice yield, the authors proposed a deep learning fusion model named BBI model combining backpropagation neural networks (BPNNs) with an independent recurrent neural network (IndRNN). The model first captured deep spatial and temporal features from the input data, then combined these deep features by fusing the outputs and then learned the relationship between these features and yield values. The proposed model accurately predicted both summer and winter rice yield.
In another study [137], an extreme learning regression machine (ELR) was used for phenotype estimation from several sensors. First, the authors simultaneously collected RGB, multispectral and thermal images from sensors installed on a drone. Then, vegetation traits (color, physical structure, spectral and thermal features) were extracted from the images. The features were combined and fed into the ELR prediction model. Compared to other models, and using multi-sensor combinations, ELR provided relatively accurate estimates of plant features.

Data Fusion for Crop Identification
In [139], the authors exploited spatio-temporal data to segment satellite images of vegetation fields. The data used are images captured by the Gaofen 1 and 2 satellites. The authors developed an active 3D CNN architecture to extract information from the multi-temporal images. The 3D tensor is composed of the spectral, spatial and temporal characteristics of each element in each band. The convolution is performed at the spatial-spectral or spatial-temporal scale. In the same context, other researchers tested the feasibility of temporal CNNs (TempCNNs) for satellite image classification [96]. To do so, they collected 46 images over one year from the Formosat-2 satellite. Three bands (NIR, red (R) and green (G)) and three VIs (NDVI, Normalized Difference Water Index (NDWI) and Brilliance Index (IB)) were used. The proposed algorithm consisted of three convolutional filters applied consecutively. The results showed that the overall accuracy of the CNN models increased when more features were added, regardless of their type. The proposed model using the spectral bands in the feature vector outperformed all other combinations by 1% to 3%, achieving an accuracy of 93.4%.
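For readers unfamiliar with the vegetation indices mentioned above, the sketch below computes NDVI and NDWI from band reflectances with NumPy and stacks them with the raw bands into a per-pixel feature vector. It assumes the green/NIR form of NDWI, which is the variant computable from the three bands listed; the exact definitions used in [96] may differ in detail, and the reflectance values are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b) element-wise, returning 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR form: (G - NIR) / (G + NIR)."""
    return normalized_difference(green, nir)

# Hypothetical reflectances for two pixels
nir = np.array([0.5, 0.4])
red = np.array([0.1, 0.4])
green = np.array([0.2, 0.1])

# One feature vector per pixel: three bands followed by two indices
features = np.stack([nir, red, green, ndvi(nir, red), ndwi(green, nir)], axis=-1)
```

Repeating this for each acquisition date gives the kind of multi-temporal feature series that a temporal CNN takes as input.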

Data Fusion for Land Monitoring
Nearly all studies on satellite images agree that very high resolution is the key to achieving interesting results. However, the required resolution is not frequently available. Therefore, some researchers have attempted to develop resolution improvement techniques based on data fusion. In this context, the authors in [83] developed an extended super-resolution convolutional neural network (ESRCNN) as a data fusion framework, specifically to blend Landsat-8 and Sentinel-2 images of 20 m and 10 m spatial resolution, respectively. The study produced temporally dense observations of land surfaces at short time intervals using Landsat-8 and Sentinel-2 data. In the same context, the authors in [140] proposed a method for exploiting UAV remote sensing data by fusing high- and low-resolution images. The resulting data were then passed to a deep semantic segmentation module to provide a useful reference for sunflower lodging assessment and mapping. The fusion approach outperformed the no-fusion approach using the same models, and the best accuracies were achieved with the SegNet method, reaching 84.4% and 89.8% on the test set without and with image fusion, respectively. In another study [141], the authors suggested a satellite/UAV fusion technique for monitoring soybean fields using machine learning. They combined spectral canopy information (vegetation indices) extracted from Worldview-2/3 data with canopy structure features (canopy cover and height) calculated from UAV RGB images. These features were combined and fed into their dual-activation ELR prediction model to predict plant characteristics. Their results showed that predictions based on the combination of multi-sensor (satellite/UAV) data outperformed those using single-sensor features.

Data Fusion for Disease Detection
Plant monitoring tools for disease identification and classification produce a huge amount of data. One way of dealing with these data is either to analyze each type of modality separately to compare results and evaluate the method validity [144,145], or to fuse and combine data for a better understanding of disease conditions. One of the first attempts to integrate multisource data for disease detection was developed using both meteorological data and satellite scenes [142]. The classification task was based on logistic regression and the effective characteristics extracted from both modalities. Results showed great potential for the multimodal data integration for disease detection. In [111], the authors proposed a multi-context fusion network for crop disease detection. The approach was based on images collected in the field, in addition to contextual information (season, geographical location, temperature and humidity). The proposed model was composed of three major parts: CNN backbone for visual features extraction, ContextNet for fusion of the contextual features, and a fully connected network for fusion of all features and final predictions. The proposed approach achieved an identification accuracy of 97.5%. Nonetheless, their method suffers from imbalanced data due to seasonal and regional difficulty with various categories of crop diseases. In another study [143], aiming to detect diseases in mixed and complex African landscapes, the researchers split the study into three main parts: pixel-based banana classification, object-based banana localization and disease detection. The classification was established using SVM in which the inputs were a combination of multispectral bands with vegetation indices extracted from the multi-level satellite images (Sentinel 2, PlanetScope and WorldView-2) and UAV (MicaSense RedEdge) images. 
For banana plant detection in the field, they trained the object detection model RetinaNet on UAV RGB images and developed a custom classifier for simultaneous banana tree localization and disease classification. Compared to the VGG model, the custom classifier provided the best results, with an accuracy reaching 92%. Although the approach gives good classification results, it suffers from a considerably long training time.

Summary
The applications of data fusion in agriculture presented in this section can be divided into three types. Spatio-spectral fusion is a multi-band fusion that combines fine spatial and fine spectral resolution. Spatio-temporal fusion blends data that have fine spatial resolution but coarse temporal resolution (temporal revisit frequency) with data that have fine temporal resolution but coarse spatial resolution, with the objective of creating a fine spatio-temporal resolution. Finally, multimodal fusion corresponds to heterogeneous multi-sensor fusion. Table 6 summarizes the data fusion applications in agriculture presented in this subsection.

Data Fusion Challenges for Agriculture
Data fusion is only worthwhile if it increases the quality of predictions and the relevance of decisions based on the data combination. The data involved are likely to be insignificant, noisy or flawed [113]. If the algorithm decisions are of poor quality, they may have a negative impact on the expected results. Therefore, it is imperative to reduce this noise and eliminate these errors to improve accuracy [10]. In addition to noise, observed data may be characterized by non-commensurability, different resolutions and incompatible sizes or alignment, and a pre-processing model should be considered to solve this problem [79]. Furthermore, different data sources may provide contradictory data or missing values. A data analysis step is therefore required. Once data are ready for the learning process, unbalanced data, that is, unequal class representation, can also affect the prediction rate. The biggest constraint of data fusion is therefore multimodality: with data from distinct types of sensors, different fusion architectures can be adopted [133,136]. However, the exploitation of these advanced models in agriculture, and precisely in disease detection, is still in its infancy.

Discussion and Conclusions
This review mapped the various research works on disease detection using machine learning with different data modalities. Spectral imaging can be an essential tool to assess crop health status, owing to the different reflectance of healthy and diseased crops. Researchers have therefore exploited plant leaf images using machine learning and deep learning techniques for automatic disease detection, which has led to interesting accuracies. Multispectral and hyperspectral images were very useful and provided higher precision for disease detection. This is tied to the sensitivity of spectral measurements to stress and change at different stages of crop growth and disease severity. Nevertheless, the hyperspectral data acquisition protocol is difficult to apply in field conditions. Moreover, the spectral reflectance can also be influenced by several factors, such as technical properties (resolution, brightness, etc.), sample preparation conditions (laboratory or field) and sample properties (size, texture, humidity, etc.). Further analysis of reflectance based on crop vegetation indices, during crop growth at all stages of infection, is required. In addition to RGB and hyperspectral images, thermal images have proven to be very useful in the detection of plant diseases. The main motivation is that plant leaf temperature can help predict plant health status. Several researchers have explored this type of image for disease detection at the leaf level, and others have combined these images with multispectral data for effective early detection at the ground vehicle and aerial vehicle level, since acquiring leaf images requires people to walk through the whole field, which is an energy- and time-consuming strategy. Indeed, plant disease detection has strongly benefited from aerial vehicles for crop monitoring at plot scale. Several types of cameras have been mounted on drones for that purpose.
Based on the acquired images, combined with machine learning models, researchers were able to efficiently determine healthy and infected crops.
Spectral imaging using UAVs provides important information on the soil and the upper part of plants over a wide spectrum; UAVs are therefore used increasingly often. Nevertheless, it remains difficult to evaluate the state of fruits and of the lower parts of plants and leaves. Merging the two technologies can also broaden the range of plants that can be processed while ensuring early detection accuracy. However, UAVs suffer from environmental and logistic constraints such as high-speed wind and rain, battery capacity and a fundamental need for a trained person to start and manage flights. Satellites can be an excellent alternative to UAVs for monitoring healthy plant growth, depending on the spatial and spectral resolution. A promising new field in the detection of plant disease on a larger scale is UAV and satellite imaging, which has proved its usefulness in many agricultural applications.
Despite the usefulness of satellite images, this area of research faces several challenges. Clouds and their shadows represent a major obstacle when it comes to processing and extracting disease signature from high-resolution satellite images; when clouds cover vegetation, the acquired images become unexploitable. The main obstacles to the crop monitoring application and disease detection using satellites are rapid changes in agricultural land cover in relatively short time intervals, differences in seeding dates, atmospheric conditions and fertilization strategies; since it is difficult to predict whether the reflectance changes are due to disease or to those factors, an in-situ study is required to validate predictions. High-resolution satellite images can be a key approach for very large-scale disease detection.
The current technologies of imaging sensors have many limitations for earlier disease detection. The association of multiple sensors data can provide a better understanding of the growth and health status of the crop and thus better prediction rates. This explains a growing interest in the scientific community for the multimodal data fusion in the field of crop disease detection. The most known meteorological sensors used for disease detection are temperature [43], humidity [146], soil moisture and light intensity sensors [147]. One can benefit from the power of AI algorithms to process the multimodal data sources and predict crop diseases in earlier stages.
Indeed, neural networks and deep neural network models have demonstrated a significant capacity in agriculture to monitor healthy crop growth and capture anomalies, outperforming traditional machine learning algorithms. An example of the high performance of DL compared to conventional methods can be seen in [140]. The authors compared the classification results of SVM with those of FCN (Fully Convolutional Network) and SegNet on multispectral images. SegNet and FCN outperformed the SVM model in both experimental fields and with different combinations of image bands, as shown in Table 7, where RGBMS and NIRMS are the visible and near-infrared (NIR) bands of the multispectral image, respectively, and FRGBMS and FNIRMS are the high-resolution fusion results of RGBMS and NIRMS, respectively. In addition, the table clearly shows the impact of image fusion on the recognition results: accuracies improved for all the models, including the traditional machine learning model. Thus, a correct diagnosis depends on the choice of DL architecture and on the type, quantity and quality of the data. Hence, the fusion of all the important components can lead to an efficient disease detection system. The application of multimodal deep learning involves the selection of a learning architecture and algorithm. Lately, multimodal fusion has shown considerable potential and is increasingly used in several domains such as healthcare, sentiment analysis, human-robot interaction, human activity recognition and object detection. In agriculture, several deep learning fusion approaches have been proposed, with applications in yield prediction, land monitoring, crop identification and disease detection.
The most widely used types of fusion in agriculture are the fusion of multi-sensor data from aerial vehicles, the fusion of multi-resolution satellite data and the fusion of satellite and UAV images. These are employed to improve the detection process for tasks such as crop monitoring or plant classification. Thus, for our specific task, the use of other data sources can enhance early disease detection performance. However, few multimodal fusion studies have been conducted, particularly for disease detection. The promising results of multimodal fusion presented in this paper demonstrate the high potential of deep learning fusion models for prediction with multimodal data, which creates an opportunity for further research.

Conflicts of Interest:
The authors declare no conflict of interest.