Graph-Based Data Fusion Applied to: Change Detection and Biomass Estimation in Rice Crops

Abstract: The complementary nature of different modalities and of the multiple bands used in remote sensing data is helpful for tasks such as change detection and the prediction of agricultural variables. Nonetheless, correctly processing a multi-modal dataset is not a simple task, owing to the presence of different data resolutions and formats. In the past few years, graph-based methods have proven to be a useful tool for capturing inherent data similarity, in spite of different data formats, and for preserving relevant topological and geometric information. In this paper, we propose a graph-based data fusion algorithm for remotely sensed images, applied to (i) data-driven semi-unsupervised change detection and (ii) biomass estimation in rice crops. For the change detection task, we evaluated the performance of four competing algorithms on fourteen datasets. For biomass estimation in rice crops, we compared our proposal, in terms of root mean squared error (RMSE), with a recent approach based on vegetation indices as features. The results confirm that the proposed graph-based data fusion algorithm outperforms state-of-the-art methods for change detection and biomass estimation in rice crops.


Introduction
Recent advances in sensor technology have led to the increased availability of hyper-spectral, multi-spectral (MS), and synthetic aperture radar (SAR) images (at very high spatial and spectral resolutions), which describe an object or phenomenon. Each sensor captures different information that explains physical features. For example, a SAR sensor captures information about the physical characteristics of a surface (such as roughness, geometric structure, and orientation), and an MS sensor captures reflectances at different wavelengths from objects. Therefore, it is generally desirable to use more sensors rather than fewer [1]. For hyper-spectral and multi-spectral images, the fusion approaches can be categorized into component substitution, multi-resolution analysis, unmixing, and Bayesian-based methods. We encourage the reader to refer to [2] for a comprehensive review. Even though data fusion contributes to better performance in classification and detection in remote sensing, it is a complex task. For example, the different resolutions, units, dimensions, and formats are challenges imposed by raw data [3]. Thus, the extraction of features helps to cope with those challenges. Additionally, in recent years, graph-based fusion algorithms have been able to integrate such heterogeneous sources. The comparison of data with heterogeneous physical units [30,31] has been addressed by processing techniques (such as domain adaptation, data transformation, transfer learning, and image-to-image translation [2,26,31–33]), in such a way that datasets lying in different domains are brought into one single domain for comparison.
In order to reduce the effects of small inter-class variability and artifacts presented in MS images, we propose a graph-based data fusion methodology that works with both heterogeneous and homogeneous data. We validated our approach using fourteen real cases.

Biomass Estimation
The measurement of biomass in rice crops relies on destructive sampling or satellite image analysis. Destructive sampling involves much manual work to gather plant samples. Subsequently, it is necessary to measure the accumulated dry weight determined by leaves, stems, panicles, and all the aboveground components of rice canopies, per unit of a given area in the field [34]. The remote sensing approach, on the other hand, uses information sourced from satellites, which provide crop-scale images with limited resolution, to perform non-invasive image-based crop data estimation. In addition, unmanned aerial vehicles (UAVs) offer a number of benefits: firstly, high-resolution information; secondly, relevant relationships between vegetation indices, photosynthetic activity, and canopy structural properties; and thirdly, reliable aboveground biomass estimation (AGBE) [35–38].
In the last few years, the low cost and flexibility of UAVs have created new opportunities in precision agriculture and phenotyping. They have made it possible to calculate vegetation indices (VI) from multi-spectral and thermal imagery captured from the crop. For instance, the normalized difference vegetation index (NDVI) fuses reflectances from the red band (R) and near-infrared (NIR) and is one of the most popular VIs used by farmers to quantify crop vegetation density. Although NDVI is accurate in the early stages of a crop [35], it saturates as the biomass grows. This issue produces inaccurate measurements during late growth crop stages [39]. Nonetheless, a combination of several VIs can improve the assessment of the impact that each stage of plant growth has on crop yield [40,41].
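To make the band fusion behind a VI concrete, the following NumPy sketch computes the NDVI per pixel from red and NIR reflectances. The reflectance values are hypothetical, chosen only to illustrate the saturation effect described above:

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Hypothetical canopy reflectances: as biomass grows, NIR rises and red falls,
# but the normalized ratio changes less and less -- the saturation issue.
dense = ndvi(nir=0.50, red=0.05)
denser = ndvi(nir=0.60, red=0.04)
```

Here `dense` ≈ 0.82 and `denser` ≈ 0.88: a substantial increase in canopy density moves the index only slightly, which is why NDVI alone becomes unreliable during late growth stages.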
Given the advantages of UAVs with respect to alternative methods for gathering data in agriculture applications (such as manual collection or satellite image analysis [37,38]), they have become an excellent alternative for crop monitoring. Several methods have been proposed for AGBE [38,42–44], which have at their core the training of machine learning methods based on features extracted from vegetation indices (VIs). A recent approach presented in [38] pre-processes MS images to extract the pixels corresponding to the rice crop. It then uses VIs to train three separate linear regression models, one for each growth stage (vegetative, reproductive, and ripening). To build a unique model that captures the variability of biomass across all the growth stages, we propose the use of eigenvectors extracted from a fused graph as features.

This paper is structured as follows. The next section details the proposed graph-based fusion method and each step involved in the applications: change detection and estimation of biomass in rice crops. Section 3 presents the experimental results that verify the effectiveness of the proposed approach on fourteen different real remote sensing datasets for detecting changes and one real dataset for estimating biomass in rice crops. In Section 4, we set out the conclusions.

Materials and Methods
Since graphs explain the structural relationships between nodes (such as image pixels) and also capture local information related to the data (such as radiometric similarities), the proposed graph-based data fusion approach aims to:

• Construct an approximate local graph related to remotely sensed images (such as an MS image captured by Landsat/UAV) by using the Nyström extension.
• Perform data fusion over the local graphs by minimizing the similarity between the connections of the graphs, to capture relevant information embedded in the case studies.
• Extract the different relationships present in the fused data by computing the eigenvectors/eigenvalues of the fused graphs.
• Evaluate the performance of the proposed graph-based data fusion in the applications of change detection and biomass estimation.

Graph-Based Data Fusion
MS images contain pixels that reside on a regularly sampled 2D grid. Thus, it is possible to interpret them as a signal on a graph with edges that connect each pixel in each band to its neighborhood of pixels. A graph is a nonlinear structural representation of data, defined by G = (V, E), where V is a set of nodes and E refers to the arcs or edges that explain the directed or undirected relationship between nodes. Each edge has an associated weight w_{i,j}, which quantifies how strong the relationship between the nodes is. The common measure used for each weight is a Gaussian kernel [45]:

w_{i,j} = exp(−d(V_i, V_j)² / (2σ²)),    (1)

where d(V_i, V_j) is the distance between two nodes and σ is the standard deviation of all d(V_i, V_j). A common application for graphs is the embedding of G, based on the Laplacian matrix L, into a space R^m that keeps the graph nodes as close as they were in the input space. In short, the embedding of a graph is given by the eigenproblem Ly = λDy [46], where L = D − W, W is the adjacency matrix of graph weights (each component given by Equation (1)), and D is a diagonal matrix whose components are the degrees of the nodes (d_i = Σ_j w_{i,j}).
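The weight and embedding definitions above can be sketched in a few lines of NumPy. This is an illustration on synthetic nodes (the variable names and sample data are ours, not the paper's):

```python
import numpy as np

def gaussian_weights(X, sigma=None):
    """Adjacency matrix W with Gaussian-kernel weights w_ij.
    X holds one feature vector (e.g. pixel intensities) per node/row."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    if sigma is None:
        sigma = d.std()  # sigma as the std of all distances d(V_i, V_j)
    W = np.exp(-d**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplacian_embedding(W, m=2):
    """Solve the generalized eigenproblem L y = lambda D y, with L = D - W,
    and return the m smallest non-trivial generalized eigenvectors."""
    deg = W.sum(axis=1)                    # node degrees d_i = sum_j w_ij
    Dm = np.diag(1.0 / np.sqrt(deg))
    # Equivalent symmetric problem: D^{-1/2} L D^{-1/2} v = lambda v, y = D^{-1/2} v
    lam, V = np.linalg.eigh(Dm @ (np.diag(deg) - W) @ Dm)
    return (Dm @ V)[:, 1:m + 1]            # skip the constant eigenvector

X = np.random.default_rng(0).normal(size=(50, 3))  # 50 synthetic "pixels"
Y = laplacian_embedding(gaussian_weights(X), m=2)  # embedded in R^2
```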
As there is such a high number of pixels in an MS image, the computational cost of calculating the full matrix W ∈ R^{N×N} is extremely high (an image with a resolution of 1280 × 960 yields N = 1,228,800). To solve this problem, we apply the Nyström extension [47] to find an approximation of W in significantly reduced time. To select points uniformly distributed across the image, n_s samples are taken by spatial grid sampling, and the matrix W is re-indexed as:

W = [ κ_G(d_AA)    κ_G(d_AB) ;  κ_G(d_AB)^T    κ_G(C) ],

where κ_G is a Gaussian kernel, d_AA ∈ R^{n_s×n_s} represents the graph distances within the n_s sample nodes, d_AB ∈ R^{n_s×(N−n_s)} contains the distances between the n_s sample nodes and the remaining N − n_s nodes, and C ∈ R^{(N−n_s)×(N−n_s)} contains the distances within the unsampled nodes. This method approximates κ_G(C) from the n_s samples uniformly distributed across the image of size N (n_s ≪ N), hence:

κ_G(C) ≈ κ_G(d_AB)^T κ_G(d_AA)^{−1} κ_G(d_AB).

Thus, the eigenvectors of the matrix W can be spanned by the eigenvalues and eigenvectors of κ_G(d_AA). Solving the diagonalization of κ_G(d_AA) (eigenvalues Λ and eigenvectors U: κ_G(d_AA) = UΛU^T), the eigenvectors of W can be spanned by:

Û = [ U ;  κ_G(d_AB)^T U Λ^{−1} ].

Since the approximated eigenvectors Û are not orthogonal, as explained in [47], we obtain orthogonal eigenvectors by defining

S = κ_G(d_AA) + κ_G(d_AA)^{−1/2} κ_G(d_AB) κ_G(d_AB)^T κ_G(d_AA)^{−1/2}.

Then, by diagonalization of S (S = U_s Λ_s U_s^T), the final approximated eigenvectors of W are given by:

V = [ κ_G(d_AA) ;  κ_G(d_AB)^T ] κ_G(d_AA)^{−1/2} U_s Λ_s^{−1/2}.
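The orthogonalized Nyström approximation described above can be sketched compactly, writing A = κ_G(d_AA) and B = κ_G(d_AB) and following the construction in [47]. The synthetic "image" below is an assumption for illustration only:

```python
import numpy as np

def nystrom_eigs(A, B):
    """Orthogonalized Nystrom approximation of the eigenvectors of
    W = [[A, B], [B^T, B^T A^{-1} B]].
    A: kernel among the n_s samples; B: kernel samples vs. remaining pixels."""
    # A^{-1/2} via the eigendecomposition of the symmetric PSD matrix A
    lam, U = np.linalg.eigh(A)
    Am12 = U @ np.diag(1.0 / np.sqrt(np.maximum(lam, 1e-12))) @ U.T
    S = A + Am12 @ B @ B.T @ Am12           # S = A + A^{-1/2} B B^T A^{-1/2}
    lam_s, Us = np.linalg.eigh(S)
    V = np.vstack([A, B.T]) @ Am12 @ Us @ np.diag(1.0 / np.sqrt(np.maximum(lam_s, 1e-12)))
    return lam_s[::-1], V[:, ::-1]          # eigenvalues/eigenvectors, largest first

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 4))               # 200 synthetic "pixels", 4 bands
ns = 20                                     # number of grid samples
d = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
K = np.exp(-d**2 / (2 * d.std()**2))        # full Gaussian kernel (small toy case)
lam, V = nystrom_eigs(K[:ns, :ns], K[:ns, ns:])
```

By construction, the columns of `V` are orthonormal approximate eigenvectors of the full kernel, obtained without ever diagonalizing the N × N matrix.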

Fusion Stage
In this section, we present the fusion stage to integrate the multi-temporal data. We model each pixel as a node in the graph and assume that pre-event and post-event images contain the same number of elements and that they are co-registered. Figure 1 presents a diagram of the method explained in Algorithm 1, which processes each instance of band b and time k (X b,k ) and the number of samples n s as inputs.
Algorithm 1: Graph construction via the Nyström extension.
Input: Temporal images from band b or set of bands X^{b,k} ∈ R^{m×n}, number of samples n_s.
Output: Fused graph W_F ∈ R^{(n_s+c)×n_s}.
Initialize: (1) for each instance of time k and band b, do:
(2) Take n_s samples, uniformly distributed across X^{b,k}, by spatial grid sampling.
(3) From the samples X^{b,k}_AA and the image X^{b,k}, compute the pairwise distances between samples and samples (d_AA) and between samples and the remaining pixels (d_AB).
(4)–(5) Apply the normalized graph Laplacian (D^{−1/2} W D^{−1/2}) by using the code in [47].
(6) Apply a Gaussian kernel (κ_G(·)) with σ = mean(d_AB) to the normalized distances, and build the approximated normalized Laplacian matrix based on the Nyström approximation.
The output of Algorithm 1 for one instance of time of a selected band or bands X^{b,k} corresponds to the approximate normalized adjacency matrix W̃^{b,k}_N [47]. Consequently, the fusion step consists of capturing the unique information given by each graph W̃^{b,k}_N in one fused graph W_F. In order to achieve this fusion, we maximize the distance (or minimize the similarity) among the pixels across the instances of time, where w_{i,j} represents the weight of the node for each instance of time (i = 1, 2, . . . , c; j = 1, 2, . . . , n_s). This learning approach is data driven (it uses a few uniformly distributed n_s samples across the image) and is restarted for each dataset. The graph W_F represents the relationship, in terms of dissimilarity, between the pre-event and post-event images. Here, X^{b,k} is an image that represents an event, X^{b,k}_AA represents the samples from X^{b,k}, D̂ is the approximated degree matrix, and W̃^{b,k}_N is the normalized Laplacian calculated by using the Nyström approximation.
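The fusion step can be read as an edge-wise operation over the per-instance graphs. The sketch below keeps, for each edge, the minimum similarity across instances of time, which is one simple way of "minimizing the similarity"; the paper's exact fusion objective may differ, so treat this as an illustrative assumption rather than the authors' rule:

```python
import numpy as np

def fuse_graphs(W_list):
    """Combine per-time normalized adjacency matrices into one fused graph that
    emphasizes dissimilarity between events. Edge-wise minimum similarity is an
    assumed stand-in for the paper's fusion objective."""
    return np.minimum.reduce([np.asarray(w, dtype=float) for w in W_list])

W_pre = np.array([[1.0, 0.9], [0.9, 1.0]])    # strong link pre-event
W_post = np.array([[1.0, 0.1], [0.1, 1.0]])   # link broken post-event
W_F = fuse_graphs([W_pre, W_post])            # the weakened edge survives
```

Under this rule, an edge that is strong at every instant stays strong in W_F, while an edge weakened by a change event keeps its low (dissimilar) weight, which is exactly the information a change detector wants to preserve.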

Change Detection Scheme Based on the Multi-Temporal Graph (GFB-CD)
The change detection scheme presented in Figure 2 uses the approximated eigenvectors and eigenvalues found by Nyström's extension from W_F as features to represent the change between the pre-event and post-event images. The number of eigenvectors is equal to the number of samples (n_s) taken from an instance of time k. To build the change map, we select the scaled eigenvector (I_{u_i}) that maximizes the mutual information (MI) [48] between this eigenvector and a binarized prior signal. The prior signal comes from the normalized differences between the pre-event and post-event images.

Figure 2. Change detection scheme, where W_F is the fused graph, Û contains the approximated eigenvectors, D̂ the eigenvalues, and T in the prior stands for a binarization operator.
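The eigenvector-selection step can be sketched as: binarize the prior, estimate the MI between the prior and each candidate eigen-image via a joint histogram, and keep the arg-max. The histogram-based MI estimator, the 0.5 threshold, and the synthetic data are our assumptions:

```python
import numpy as np

def mutual_information(a, b, bins=2):
    """Histogram estimate of the mutual information (in nats) of two signals."""
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()                             # joint probabilities
    px = p.sum(axis=1, keepdims=True)           # marginal of a
    py = p.sum(axis=0, keepdims=True)           # marginal of b
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def select_eigenvector(U, prior, thr=0.5):
    """Return the index of the column of U that shares the most information
    with the binarized prior (normalized pre/post difference image)."""
    prior_bin = (prior > thr).astype(float)
    scores = [mutual_information(U[:, i], prior_bin) for i in range(U.shape[1])]
    return int(np.argmax(scores))

# Synthetic check: one eigen-image is (nearly) the prior, the rest are noise.
rng = np.random.default_rng(2)
prior = rng.random(500)                      # stand-in for the difference image
U = rng.normal(size=(500, 5))                # stand-in for the n_s eigenvectors
U[:, 3] = prior + 0.05 * rng.normal(size=500)
best = select_eigenvector(U, prior)          # picks the informative column
```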

Graph-Based Fusion Regression for Estimating Biomass in Rice Crops
In terms of image processing, the analysis of images related to crops implies important challenges. Weather conditions (sunny or cloudy) can affect the quality of the data. Another important factor is the appearance (architectural morphology) of the plant as it grows. The development of tillers occurs in the vegetative stage; the number of leaves increases, as well as the height of the plant. In this stage, the color green is predominant. In the reproductive stage, panicle formation starts, and thus, yellow features appear in the images. In the final ripening stage, the flowering, the grain filling, and the maturation of the plant occur, while the leaves begin to senesce. In this stage, the color yellow is predominant, and the plot can barely be distinguished, as panicles, grains, and senescent leaves predominate. In conclusion, it is possible to observe (see Figure 5) that during a plant's growth, it becomes more difficult to separate plots and distinguish between plants and background using RGB images. Therefore, general assumptions about the color, the size of the plant, and the color of the soil will not always be correct [38]. Considering these limitations, we believe that the graph-based fusion of MS bands provides a flexible way of representing useful combinations of surface reflectance at two or more wavelengths, to produce features that predict biomass in rice crops at different stages of growth. We developed our method inspired by the work in [38], in which the authors estimated rice biomass as a function of one of the growth stages (vegetative, reproductive, and ripening). They proposed three linear regression models, one for each stage of the crop, whose inputs are features extracted from VIs. A comprehensive survey of the specialized literature was carried out, in order to identify which vegetation indices are suitable for estimating rice biomass as a function of the growth stage of the crop [40–42,49].
The results of this survey are summarized in Table 1.

Table 1. VIs for biomass estimation.

Name | Equation
Ratio Vegetation Index (RVI) [40] | NIR / RED

The following is a brief explanation of the procedure used by the authors in [38]: (i) segment the area covered by the crop from the soil by using k-means clustering (K = 2); (ii) extract VIs (features) from the crop pixels; (iii) train a linear regression model for each stage of the crop.
Firstly, the bands (X^{b,k}) to be fused are red (R), green (G), and near-infrared (NIR) (b = [1, 2, 3]). Secondly, there are n_s eigenvectors for each fused graph (W_F); for each graph, we took the eigenvector with the highest associated eigenvalue, as it provides the strongest contribution to the Laplacian reconstruction. Thirdly, we stacked these features as row vectors from each image into a matrix F ∈ R^{q×(n_s+c)}. Fourthly, since there are q = 489 images with a size of 1280 × 960, the dimensionality of the features is high (≈1.2 million dimensions); consequently, we reduced the number of features to z dimensions by applying two well-known techniques: principal component analysis (PCA) [51] and t-distributed stochastic neighbor embedding (t-SNE) [52]. Lastly, we trained a support vector machine (SVM) regressor [53] with a Gaussian kernel to predict the biomass over all growth stages. Figure 3 illustrates our proposed method for estimating biomass in rice crops, based on Algorithm 1 (setting k = 1 and b = [1, 2, 3]) and the graph-based fusion methodology shown in Figure 1.
Figure 3. The proposed methodology based on graph fusion for estimating biomass in rice crops, from q images: apply the Nyström extension, take the eigenvector with the highest eigenvalue for each fused graph, then perform a z-score and dimensionality reduction (PCA, t-SNE).
The procedure used to train the models is given in Algorithm 2 below.
Algorithm 2: SVM-regression models, trained on features extracted from the fused graph.
Input: Images of rice crops to train I_i ∈ R^{m×n}, number of images q, biomass related to each image Y ∈ R^q.
Output: Regression models M_TSNE and M_PCA.
Initialize: (1) for i = 1, . . . , q, do:
(2) Take the eigenvector v with the highest associated eigenvalue of the fused graph.
(3) Stack the vector as features for the regression: F(i, :) = v ∈ R^{1×N}; i = i + 1; end.
(4) Apply the z-score to the features.
(5) Perform dimensionality reduction of the features (F ∈ R^{q×N}): F_TSNE = tsne(F, z) ∈ R^{q×z}, F_PCA = pca(F, z) ∈ R^{q×z}.
(6) Train the models with SVM regression.

The outputs of Algorithm 2 are two regression models that predict the biomass related to an image of rice crops. These models take as inputs the reduced-dimension representations (PCA or t-SNE) of the eigenvectors from the fused graph computed over the red, green, and near-infrared bands. The reason for applying the z-score is to avoid the high variability of the features across the entire growth cycle of the rice crops and to decrease unstable biomass estimations.
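The pipeline of Algorithm 2 can be sketched with NumPy alone. To keep the example self-contained, a Gaussian-kernel ridge regressor stands in for the SVM regression (in practice one would use, e.g., scikit-learn's SVR); the toy feature matrix, σ, and regularization value are assumptions:

```python
import numpy as np

def zscore(F):
    """Standardize each feature column (step (4) of Algorithm 2)."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)

def pca(F, z):
    """Project onto the top-z principal components via SVD (step (5))."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:z].T

class GaussianKernelRidge:
    """Gaussian-kernel regressor used as a stand-in for SVM regression."""
    def __init__(self, sigma=1.0, alpha=1e-3):
        self.sigma, self.alpha = sigma, alpha
    def _k(self, A, B):
        d2 = ((A[:, None] - B[None, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * self.sigma**2))
    def fit(self, X, y):
        self.X = X
        K = self._k(X, X)
        self.w = np.linalg.solve(K + self.alpha * np.eye(len(X)), y)
        return self
    def predict(self, X):
        return self._k(X, self.X) @ self.w

# Toy stand-in for the q x N eigenvector feature matrix and biomass labels
rng = np.random.default_rng(3)
F = rng.normal(size=(120, 40))
y = F[:, :3].sum(axis=1) + 0.1 * rng.normal(size=120)     # "biomass"
Fz = pca(zscore(F), z=16)                                  # z = 16 as in the paper
model = GaussianKernelRidge(sigma=5.0).fit(Fz[:84], y[:84])  # 70% train split
err = np.sqrt(np.mean((model.predict(Fz[84:]) - y[84:]) ** 2))  # test RMSE
```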

Datasets' Description
Here, we describe the datasets used to measure the performance of the proposed graph-based data fusion method. For the change detection application, we considered fourteen real scenes captured by MS and SAR sensors (as shown in Table 2 and Figure 4), which include events such as earthquakes, floods, wildfires, melted ice, farming, and building. In addition, for the biomass estimation task, we used 560 UAV images with their corresponding values of biomass measured by the destructive method. The authors of [38] provided a dataset that contains 321, 96, and 72 images, as well as biomass measurements, for the vegetative, reproductive, and ripening stages, respectively (see Figure 5). The biomass (ground truth) associated with each image is defined as follows: for each plot of the crop, one linear meter of the plant was cut from the ground. Plants were sampled and weighed (fresh weight), then put in an oven at 65 °C for four days, or until a constant weight was reached. Later, the vegetative biomass (leaves and stems) was separated from the reproductive biomass (panicles and grains). Both vegetative and reproductive biomass were then weighed (dry weight). The images were taken by a UAV equipped with the Tetracam ADC-lite multi-spectral camera, capable of capturing images at up to 72.26 mm/pixel of resolution when flying at an altitude of 122 m. In our study, the UAV took images of the rice crops, flying over them at a steady altitude of 12 m above ground level (5.93 mm/pixel of resolution). The images (resolution of 1280 × 960) were co-registered, and the bands used to extract the information from the crops were NIR, red, and green.

Experimental Setup
We ran all the codes (to ensure the reproducibility of the proposed method, the code and all datasets are publicly available at: https://github.com/DavidJimenezS/GBF-CD) using a server with two Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz processors, with a total of 24 physical cores, 48 process threads, and 252 GB of RAM @ 2400 MHz.

Change Detection
We compared the proposed graph-based fusion (GBF)-CD with the classical Kittler-Illingworth (KI) [16] and state-of-the-art techniques: Rayleigh-Rice (rR) [17], Rayleigh-Rayleigh-Rice (rrR) [18], and unsupervised change detection using the regression homogeneous pixel transformation (U-CD-HPT) (code available at https://github.com/llu025/Heterogeneous_CD) [31]. We evaluated each change map generated by all the methods with respect to the ground truth by using relevant metrics in change detection such as: false negatives (FNs), false positives (FPs), precision (P, Equation (8)), recall (R, Equation (9)), Cohen's kappa (κ, Equation (6)), and overall error (OE), where the metrics FN, FP, and OE are measured in percentage with respect to the number of real change pixels, real non-change pixels, and all the pixels in the image, respectively. The method U-CD-HPT is the only one that requires a post-processing stage by filtering and thresholding to build the change map. We selected the parameters for this post-processing stage according to the values presented by the authors in [31].
The metrics are expressed as follows:

κ = (p_o − p_e) / (1 − p_e),    (6)

where p_o is the observed agreement between predictions and labels (the overall accuracy), while p_e is the probability of random agreement, which is estimated from the observed true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) as:

p_e = [(TP + FP)(TP + FN) + (TN + FN)(TN + FP)] / (TP + TN + FP + FN)².    (7)

Precision and recall measure the agreement between the predicted and the real changed pixels as:

P = TP / (TP + FP),    (8)

R = TP / (TP + FN).    (9)

The number of samples (n_s) taken by spatial grid sampling and the standard deviations (σ) for the kernels were set through an exhaustive grid search using MATLAB R2017a. Table 3 shows the parameter values of the proposed method for each database.
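These metric definitions translate directly into code; the small helper below computes precision, recall, and Cohen's kappa from the entries of a binary confusion matrix (the example counts are arbitrary):

```python
def change_detection_metrics(tp, tn, fp, fn):
    """Precision, recall, and Cohen's kappa from confusion-matrix counts."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    p_o = (tp + tn) / n                                           # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return precision, recall, kappa

# Arbitrary example: a change map over 1000 pixels
p, r, k = change_detection_metrics(tp=80, tn=800, fp=20, fn=100)
```

Note that kappa discounts the agreement expected by chance, which is why it is a more demanding score than overall accuracy on imbalanced change maps (here accuracy is 0.88 while kappa is only ≈0.51).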

Estimating Biomass in Rice Crops
The number of samples (n_s) was set to 100, and they were selected using a grid mesh on the image. We used the cosine distance for t-SNE. For both t-SNE and PCA, the dimension z and the standard deviations (σ) for the kernels were set through an exhaustive grid search using MATLAB R2017a, which gave z = 16 dimensions. Table 4 shows the mean σ parameters for each stage of the crop. In order to evaluate the performance of the proposed features for biomass estimation, we used cross-validation, splitting the data into training (70%) and testing (30%) datasets. The model considers the whole growth cycle of the rice crops (vegetative, reproductive, and ripening). To measure the accuracy of the proposed features and of the commonly used vegetation indices for biomass estimation, we calculated the root mean squared error:

RMSE = sqrt( (1/q) Σ_{i=1}^{q} (y_i − ŷ_i)² ),    (10)

where y_i are the real values of the biomass and ŷ_i are the estimations of the model.
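Equation (10) in code form (the sample biomass values are invented for illustration):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error between measured and estimated biomass."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical biomass values (e.g., g/m^2): each estimate is off by 10 units
err = rmse([100.0, 250.0, 400.0], [110.0, 240.0, 390.0])  # err == 10.0
```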

Change Detection
The visual comparison of the estimated change maps with the corresponding ground truths provides a qualitative assessment of the performance of each method. Figure 6 illustrates the resulting change maps for the same geographical area, in which each row represents a dataset and each column one of the methods: KI [16], rR-EM [17], rrR-EM [18], U-CD-HPT [31], and the proposed GBF-CD, respectively. The change maps obtained for all the methods show that the most challenging datasets were Katios National Park and Atlantico (see the fifth and sixth rows in Figure 6). The images corresponding to the pre- and post-events have similar variability in their pixel intensities. Therefore, the assumption of the probabilistic approaches [17,18] (that the data follow a certain distribution for non-change and change pixels) does not hold. For both the thresholding algorithm (KI) [16] and the unsupervised method based on image-to-image translation (U-CD-HPT) [31], the estimated threshold and the Frobenius distance between affinity matrices were unable to detect real change, because of the similarity between the distributions of change and no-change pixel intensities. In contrast, the results of the proposed GBF-CD method came from building a fused graph (that minimized the similarities between the pixel intensities in the pre-event and post-event images) and from selecting an approximated eigenvector. This methodology maximizes the mutual information with a prior change map and yields change maps with lower false negative rates.

In terms of false negatives (FNs) and false positives (FPs), the probabilistic method (rR) [17] provided the worst performance, because the assumption of a large difference between the pre-event and post-event images was not true in some of the test scenarios.
The KI [16] and rrR-EM [18] algorithms classified all pixels in the San Francisco and California scenarios as belonging to the change category, producing zero FN and very high FP rates.
In summary, the U-CD-HPT [31] and GBF-CD methods provide a reasonable compromise between the correctly detected change pixels, FN, and FP rates (see Tables 5–18, where the best performance with respect to each metric is written in bold type).

Table 7. Performance of the models for the Alaska dataset.

Table 9. Performance of the models for the Katios dataset.

The Toulouse, California, Bastrop, and Gloucester-2 test scenarios are represented by NIR and SAR images. With regard to the Toulouse and Gloucester-2 datasets, the rrR-EM and GBF-CD methods yielded change maps with high TPs, low FNs, and high FP rates. In contrast, KI, rR-EM, and U-CD-HPT provided low TPs and high FN rates. In the case of the California dataset, the KI, rR-EM, and rrR-EM algorithms provided inaccurate change maps, due to the fact that these methods were devised for processing homogeneous (single-modality) input data. Despite the data heterogeneity, the U-CD-HPT [31] and GBF-CD algorithms provided better performance in terms of FNs, FPs, and κ. Unlike the KI, rR-EM, and rrR-EM methods, the U-CD-HPT and GBF-CD algorithms yielded an accurate change map for the Bastrop dataset.
To illustrate the relative performance of each CD method in all the challenging test scenarios, we counted the number of times a given CD method outperformed the competing algorithms in a specific performance metric (see Figure 7). We observed that the proposed GBF-CD method outperformed (in terms of κ) the competing algorithms in eight (Mulargia, Alaska, Madeirinha, Katios, Atlantico, San Francisco, WenChuan, and Toulouse) of the fourteen datasets. Moreover, the GBF-CD algorithm achieved the best performance metrics (FN, recall, precision, and OE) in four (Katios, Atlantico, WenChuan, and Gloucester-2) of the test scenarios. It also showed the lowest FP rate in one scenario (Mulargia). Overall, the proposed GBF-CD algorithm outperformed the comparison methods in at least one performance metric, whereas the KI, rR-EM, rrR-EM, and U-CD-HPT algorithms did not always surpass the other competing methodologies in any metric.

Figure 8 illustrates the comparison of the biomass prediction results, obtained by applying the dimensionality reduction techniques t-SNE and PCA to the features extracted from the proposed graph-based fusion approach, in addition to the biomass estimation yielded by using the traditional VIs. These results show that the VIs do not capture the biomass features during the growth of rice crops. In contrast, the regressor trained with the features obtained after applying the dimensionality reduction techniques provided better prediction results and lower estimation errors (as shown in Table 19).

Biomass Estimation
Even though the proposed graph-based fusion features outperformed the traditional VIs for biomass estimation, there is still a need for further work; for instance, to decrease the computation time, as it currently takes approximately three hours to extract the features and train the models. It would also be advantageous to reduce the dependency of the performance on the number of selected samples n s and the standard deviation σ and explore parameter tuning methods beyond exhaustive grid search. However, one regression model based on the proposed features predicted the biomass well, despite its variability during different growth stages of rice crops. This is a result of the fact that the graph-based features capture both radiometric and structurally useful information from the MS bands. In contrast, the VIs are not able to capture the biomass variability for rice crop growth, requiring three separate regression models [38].

Conclusions
In this paper, we introduced a graph-based data fusion methodology for remote sensing images and tested it in two applications: change detection and biomass estimation in rice crops. The main contribution of this study was a "data-driven" framework used to capture unique information for multi-temporal, multi-spectral, and multi-modal/heterogeneous (Toulouse, California, Bastrop, and Gloucester-2 datasets) images in a fused graph. The fused graph stage captures information in one graph from a small set of samples (less than 10% of the total pixels) for each dataset (in different times or bands for homogeneous or heterogeneous data).
For the change detection application, we utilized a mutual information criterion between a prior and the approximated eigen-images to select the eigen-image used to build the final change map. In this case, our method is parametric, since it depends on the number of samples and on the prior information (difference images). From the results for all datasets, we observed that our model obtained coherent change maps and outperformed state-of-the-art methods [16–18]. The method proposed in this study performed well with respect to the TP and FN metrics in multi-sensor datasets such as Toulouse, California, Bastrop, and Gloucester-2.
In addition, the model developed in this paper does not require a post-processing stage, such as that needed by the U-CD-HPT method.
In biomass estimation, the model showed that the features extracted from the fused graph with a dimensionality reduction technique (i.e., PCA or t-SNE) capture the variability of biomass in rice crops. This makes it possible to predict the biomass features throughout the growth stages in rice crops, by using one regression model. These outcomes are more comprehensive than those reported by the authors in [38], in which three separate regression models estimated the biomass at each stage of the rice crop, based on VI features.
Future studies are necessary to reduce the dependency of the proposed method on the manual selection of n s samples and prior information, currently defined in terms of the differences between the pre-event and post-event images.