Spectrum- and RGB-D-Based Image Fusion for the Prediction of Nitrogen Accumulation in Wheat

The accurate estimation of nitrogen accumulation is of great significance to nitrogen fertilizer management in wheat production. Spectral technology ignores the anisotropy of the canopy structure when predicting nitrogen accumulation in wheat, which leads to low accuracy and unstable prediction results. To overcome this shortcoming, we propose a method for predicting wheat nitrogen accumulation based on the fusion of spectral and canopy structure features. After depth images are repaired using a hole-filling algorithm, RGB images and depth images are fused through IHS transformation, and textural features of the fused images are then extracted to express the three-dimensional structural information of the canopy. Because the fused images contain depth information of the canopy, they break through the limitation of extracting canopy structure features from a two-dimensional image. By comparing the experimental results of multiple regression analyses and BP neural networks, we found that canopy structure characteristics effectively compensated for models that predict nitrogen accumulation from spectral characteristics alone. Our prediction model displayed better accuracy and stability, with BP-neural-network prediction accuracies (R²) for the leaf layer nitrogen accumulation (LNA) and shoot nitrogen accumulation (SNA) over the full growth period of 0.74 and 0.73, respectively, and corresponding relative root mean square errors (RRMSEs) of 40.13% and 35.73%.


Introduction
Nitrogen is a key element required for the growth and development of wheat, and thus exerts an important influence on its yield and quality. In current wheat production, the pursuit of high yields leads to problems such as excessive nitrogen fertilizer application and unscientific fertilization methods, which not only cause environmental pollution but also reduce profits [1][2][3]. Therefore, the precise management of wheat fertilization and the realization of scientific nitrogen application are significant both for production, by saving nitrogen and increasing efficiency, and for ecology, by reducing farmland pollution.
Experimental Design
The field experiment was conducted in Rugao City (32°16′N, 120°46′E), Nantong City, Jiangsu Province, China. The wheat variety selected for the experiment was Shengxuan 6, and the experiment comprised three nitrogen treatments, namely N0 (0 kg N/ha), N1 (180 kg N/ha), and N2 (360 kg N/ha), and three planting densities with row spacings of 20 cm, 35 cm, and 50 cm. Three replicates were set for each treatment combination, for a total of 27 plots, each plot with an area of 5 m × 6 m.
Figure 1. Wheat test scene, in which the red box is the wheat test plot.

Obtaining Wheat Agronomic Parameters
Based on the key fertilization periods of wheat, samples were taken at the jointing, booting, and heading stages. Because the jointing stage is relatively long, two samples were taken during this stage. From each plot, we randomly selected 15 wheat plants for sample analysis, and then separated them indoors by organ (stems, leaves, and ears). After the samples were fixed at 105 °C for 30 min, they were oven-dried to a constant weight at 80 °C and weighed, giving the leaf dry weight (LDW), stem dry weight (StemDW), and spike dry weight (SpikeDW). After each sample was ground, the Kjeldahl method was used to determine the leaf nitrogen content (LNC), stem nitrogen content (StemNC), and spike nitrogen content (SpikeNC). We then calculated the leaf layer nitrogen accumulation (LNA) and shoot nitrogen accumulation (SNA) according to the following equations.
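The referenced equations follow from the standard definition of nitrogen accumulation as organ dry matter multiplied by nitrogen concentration; a plausible reconstruction, using the organ symbols defined above:

```latex
\begin{align}
\mathrm{LNA} &= \mathrm{LDW} \times \mathrm{LNC} \\
\mathrm{SNA} &= \mathrm{LDW} \times \mathrm{LNC}
             + \mathrm{S_{tem}DW} \times \mathrm{S_{tem}NC}
             + \mathrm{S_{pike}DW} \times \mathrm{S_{pike}NC}
\end{align}
```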

Wheat Spectrum and Image Data Acquisition
The ASD FieldSpec HandHeld 2 was used to obtain the spectral data of the crop. The measurement position was 1 m from the top of the canopy. Each plot was measured three times at different locations, with the average value representing the spectral information of the plot. We used the Intel ® RealSense™ D415 Depth Camera to acquire wheat field images. As shown in Figure 2b, the camera has two infrared stereo cameras, one infrared emitter, and one color sensor. The infrared stereo camera was used to generate depth images, and the color sensor was used to generate RGB images, both with a resolution of 1280 × 720 pixels. The camera was positioned 70 cm from the crop canopy, and we selected three different locations in each plot for image acquisition. The actual image acquisition device in the field is shown in Figure 2.


Depth Image Restoration Method
Due to lighting conditions, the infrared reflection characteristics of the measured object's surface material, and occlusion, depth images suffer from missing data (appearing as holes), which reduces the accuracy of the target object information and affects subsequent feature extraction. As shown in Figure 3, we divide holes into two categories: local holes and large-area holes. Local holes appear within a leaf and are mainly caused by surrounding environmental conditions such as light and vibration; the area of this type of hole is usually small, and the missing depth is similar to the surrounding depth. Large-area holes appear at the junctions of different regions, mainly at transitions of depth on the edges of leaves, and their occurrence is related to the sensor's working range. Based on this analysis of the different hole types, we designed a hole-filling algorithm (HF) consisting of the following steps. First, traverse the depth image column by column, find the positions of all hole points (lines), and calculate the difference H between the non-zero depth values before and after each point (line). By comparing H with a manually set threshold T, the hole type is distinguished. When H < T, the hole is local, and the mean of the non-zero depth values before and after the hole point (line) is interpolated. When H ≥ T, the hole is a large-area hole, and the non-zero depth values before and after the hole point (line) are separately fitted and interpolated; since the images we examined had horizontal crop rows in their centers, a curve was more suitable for fitting, and we used a quadratic polynomial.
HF is expressed as Equation (3), where H_b and H_f represent the non-zero depth values before and after the hole point (line), respectively; x represents the column coordinate corresponding to the hole point; and d(x) is the depth value to be filled.
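The two branches of the HF algorithm can be sketched per image column as follows. This is an illustrative reconstruction, not the authors' implementation; the five-sample context window and all variable names are our own assumptions.

```python
import numpy as np

def fill_holes_column(col, T=40):
    """Fill zero-valued hole runs in one column of a depth image.

    Local holes (depth jump H < T across the run) are filled with the mean
    of the bounding depths; large-area holes (H >= T) are filled with a
    quadratic polynomial fitted to the valid depths on either side.
    """
    col = col.astype(float).copy()
    n = len(col)
    i = 0
    while i < n:
        if col[i] == 0:
            j = i
            while j < n and col[j] == 0:   # extent of the hole run
                j += 1
            # non-zero depths bounding the hole (context window of 5 samples)
            before = col[max(0, i - 5):i]
            after = col[j:j + 5]
            before, after = before[before > 0], after[after > 0]
            if len(before) == 0 or len(after) == 0:
                i = j                       # hole touches the border: skip
                continue
            H = abs(before[-1] - after[0])
            if H < T:                       # local hole: mean interpolation
                col[i:j] = (before[-1] + after[0]) / 2
            else:                           # large-area hole: quadratic fit
                xs = np.r_[np.arange(i - len(before), i),
                           np.arange(j, j + len(after))]
                ys = np.r_[before, after]
                coeff = np.polyfit(xs, ys, 2)
                col[i:j] = np.polyval(coeff, np.arange(i, j))
            i = j
        else:
            i += 1
    return col
```

Applying the function to every column of the repaired depth image reproduces the column-traversal step described above.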


Wheat Canopy Segmentation Method Based on RGB-D Images
There are non-crop features such as soil, straw, and weeds in canopy images of wheat during its early and mid-term growth. To eliminate the interference of non-canopy information during feature extraction, the canopy portion of the images must be segmented and extracted. Common vegetation indices such as ExG and ExGR are strongly affected by ambient light during canopy segmentation and cannot remove green information such as weeds. When the light is strong, the color of a reflective leaf is closer to the color of the soil background, and is thus easily removed by mistake, resulting in poor segmentation. To avoid the influence of ambient light on the segmentation results, we propose segmenting out the soil background in hue, saturation, intensity (HSI) space. The value range of the H channel in the HSI color model is [0, 2π], and the H values corresponding to yellow, green, and cyan are π/3, 2π/3, and π, respectively. Because leaf color varied under the different nitrogen treatments of wheat, H values in the interval (0, 4π/3) were retained as crop row information. In addition, field weeds can interfere with the extraction of crop row characteristics; since wheat closes its rows after the jointing stage, weeds no longer affect the subsequent extraction of wheat canopy characteristics, and were therefore removed only at the jointing stage, using height information obtained from the depth images.
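A minimal sketch of this two-stage segmentation, assuming an RGB image and a per-pixel height map already derived from the repaired depth image. The hue formula is the standard RGB-to-HSI conversion; the 10%-of-maximum-height weed threshold follows the rule given in the segmentation results, and the function name and parameters are our own.

```python
import numpy as np

def segment_canopy(rgb, height, keep_hue=(0.0, 4 * np.pi / 3), weed_frac=0.10):
    """Mask canopy pixels: hue in (0, 4*pi/3) in HSI space AND height above
    10% of the plot maximum (weed removal; the paper applies the height
    filter only at the jointing stage, before row closure).

    rgb: HxWx3 uint8 image; height: HxW heights above ground.
    """
    r, g, b = [rgb[..., k].astype(float) / 255.0 for k in range(3)]
    # standard HSI hue: theta from the arccos form, reflected when b > g
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-9
    theta = np.arccos(np.clip(num / den, -1, 1))
    hue = np.where(b <= g, theta, 2 * np.pi - theta)
    hue_mask = (hue > keep_hue[0]) & (hue < keep_hue[1])
    height_mask = height > weed_frac * height.max()
    return hue_mask & height_mask
```

A green pixel (hue 2π/3) passes the hue gate, while blue-to-red hues in [4π/3, 2π] are rejected as background.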

RGB-D Image Fusion Method for the Wheat Canopy
Most studies that use RGB images to obtain canopy structure information extract coverage and similar metrics through color differences, which cannot express the characteristics of the wheat canopy structure in three-dimensional space [26]. When using depth information for crop growth parameter inversion, researchers generally extract canopy height to predict above-ground biomass and other agronomic parameters, and no further methods for extracting canopy structure features in three-dimensional space have been proposed [27]. In order to fuse RGB information and depth information simultaneously to express the characteristics of the canopy structure, we propose a pixel-level RGB-D image fusion method, which uses the correspondence between pixels of the RGB and depth images so that the two kinds of information complement each other in the modeling process.
Since the imaging origins of the RGB and depth images differ, depth information and RGB information do not correspond one-to-one. Before feature extraction, the two images must therefore be aligned. The basic idea of image alignment is conversion between coordinate systems: by converting the coordinate system of the depth image to the world coordinate system and then to the coordinate system of the RGB image, the alignment of the two images is realized.
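The coordinate-system chain described above can be sketched for a single pixel as follows. The intrinsic and extrinsic parameter names are our own assumptions; in practice the RealSense SDK provides equivalent depth-to-color alignment utilities.

```python
import numpy as np

def depth_pixel_to_rgb_pixel(u, v, z, K_d, K_c, R, t):
    """Map a depth-image pixel (u, v) with depth z to RGB-image pixel
    coordinates via 3-D camera coordinates.

    K_d, K_c: 3x3 intrinsic matrices of the depth and color cameras.
    R (3x3), t (3,): extrinsic rotation/translation between the cameras.
    """
    # back-project the depth pixel to a 3-D point in the depth camera frame
    p_d = z * np.linalg.inv(K_d) @ np.array([u, v, 1.0])
    # transform the point into the color camera frame
    p_c = R @ p_d + t
    # project onto the color image plane and dehomogenize
    uvw = K_c @ p_c
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

With identical intrinsics and identity extrinsics, a pixel maps back to itself, which is a convenient sanity check for the calibration parameters.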
The basic idea of the RGB-D image fusion method is to use the physical relationship between intensity and phase. In the natural environment, the color information in an RGB image indicates the reflection intensity of the object, and this intensity is affected by lighting conditions. Depth is acquired by the sensor collecting different phase information for regions of the subject at different depths and converting these structural variations into depth values in the arithmetic unit. The relationship between intensity and phase can therefore be described by the inverse square law [28],

I_R = k / d²,

where I_R represents the intensity of each point in the RGB image, corresponding to the grayscale value after the RGB image is converted to a grayscale image; d represents the depth corresponding to I_R; and k is a proportionality constant. From the inverse square law, the phase information φ of each pixel in the image can be calculated. For image fusion, we follow the idea of the IHS transform and replace the I component of the HSI model with the phase information (φ) or the depth information (d), obtaining fused images based on the HSP (P = phase) and HSD (D = depth) models. Since the value range of the I channel is [0, 1], the depth information is normalized to simplify the calculation:

d_norm = (d − d_min) / (d_max − d_min),

where d is the actual depth, d_max and d_min are the maximum and minimum depths, respectively, and d_norm is the normalized result. Previous studies have shown that the spectral reflectance of crops at 730 nm and 815 nm performs well in assessing leaf area index, pigment status, and nitrogen nutrition status [29,30]. We therefore selected these two bands to construct the normalized difference red edge (NDRE) index and the ratio vegetation index (RVI).
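As a concrete illustration of the HSD fusion, the sketch below replaces the intensity channel of each pixel with normalized depth. It is a hypothetical minimal implementation, not the authors' code: Python's `colorsys` HLS conversion stands in for a full HSI transform (hue is returned in [0, 1] rather than [0, 2π]), and the per-pixel loop favors clarity over speed.

```python
import colorsys
import numpy as np

def fuse_hsd(rgb, depth):
    """IHS-style fusion: H and S come from the RGB image, while the
    intensity channel is replaced by normalized depth (the HSD model).

    rgb: HxWx3 uint8 image; depth: HxW array, already aligned and repaired.
    """
    # normalize depth into [0, 1] to match the I channel's value range
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-9)
    h, w = depth.shape
    fused = np.zeros((h, w, 3))
    for i in range(h):
        for j in range(w):
            r, g, b = rgb[i, j] / 255.0
            hh, _, ss = colorsys.rgb_to_hls(r, g, b)  # hue, lightness, saturation
            fused[i, j] = (hh, ss, d[i, j])           # H, S, D channels
    return fused
```

An HSP fusion would be identical except that the third channel carries the phase computed from the inverse square law instead of the normalized depth.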
The calculation formulas of the vegetation indices are as follows, where R_730 and R_815 represent the reflectance of wheat in the 730 nm and 815 nm bands, respectively.
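Assuming the standard definitions of the two indices over these two bands (the paper's own equations are an assumption on our part), a minimal sketch:

```python
def ndre(r730, r815):
    """Normalized difference red edge index, standard form."""
    return (r815 - r730) / (r815 + r730)

def rvi(r730, r815):
    """Ratio vegetation index, standard form."""
    return r815 / r730
```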

Feature Extraction of Fused Images
To represent the information of the wheat canopy structure more accurately, a single crop row was cropped out for feature extraction; the cropped crop-row image block was 150 × 500 pixels in all periods. After image fusion, the color co-occurrence matrix was used to extract texture features. Because the color co-occurrence matrix usually contains a large amount of information, we selected five feature statistics to represent texture, reducing the dimensionality of the feature space and the amount of calculation while retaining the description of the image texture: entropy (ENT), inverse difference moment (IDM), contrast (CON), correlation (COR), and angular second moment (ASM) [31].
The texture features are calculated as follows. First, the RGB image is converted to the HSI color space, and its corresponding values in the HSD and HSP space models are calculated. The H, S, D, and P components were non-uniformly quantized into eight levels in order to reduce information redundancy and make the calculation of the co-occurrence matrix (CCM) easier. Denote the four components C_1, C_2, C_3, and C_4, and let m = C_k and n = C_k′ (k ≠ k′) be two components of the combined space; the color co-occurrence matrix CCM_mn then measures the interaction between components m and n. From this, CCM_HH, CCM_SS, CCM_PP, CCM_DD, CCM_HS, CCM_HP, CCM_SP, CCM_HD, and CCM_SD can be calculated. Since CCM_HH, CCM_SS, CCM_PP, and CCM_DD have elements only on the diagonal, they contribute little to distinguishing textures, and only the remaining matrices are used for calculation. For any pixel in the image, if the value of its kth component is i (m = i) and the value of its k′th component is j (n = j), then the element CCM_mn(i, j) counts the number of such pixels in the image. In the formulas for the texture statistics, the ith-row, jth-column element of the normalized color co-occurrence matrix is denoted p(i, j).
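A sketch of the inter-channel co-occurrence computation and the five statistics. For simplicity the quantization here is uniform (the paper uses non-uniform quantization), and channel values are assumed pre-normalized to [0, 1]; function names are our own.

```python
import numpy as np

def color_cooccurrence(ch_m, ch_n, levels=8):
    """Inter-channel CCM: entry (i, j) counts pixels whose quantized value
    is i in channel m and j in channel n; returned normalized to sum 1."""
    qm = np.clip((ch_m * levels).astype(int), 0, levels - 1)
    qn = np.clip((ch_n * levels).astype(int), 0, levels - 1)
    ccm = np.zeros((levels, levels))
    np.add.at(ccm, (qm.ravel(), qn.ravel()), 1)   # histogram the pairs
    return ccm / ccm.sum()

def texture_features(p):
    """The five statistics used in the paper, from a normalized CCM p."""
    i, j = np.indices(p.shape)
    eps = 1e-12
    ent = -np.sum(p * np.log(p + eps))             # entropy
    asm = np.sum(p ** 2)                           # angular second moment
    con = np.sum((i - j) ** 2 * p)                 # contrast
    idm = np.sum(p / (1 + (i - j) ** 2))           # inverse difference moment
    mi, mj = np.sum(i * p), np.sum(j * p)
    si = np.sqrt(np.sum((i - mi) ** 2 * p)) + eps
    sj = np.sqrt(np.sum((j - mj) ** 2 * p)) + eps
    cor = np.sum((i - mi) * (j - mj) * p) / (si * sj)  # correlation
    return dict(ENT=ent, ASM=asm, CON=con, IDM=idm, COR=cor)
```

For a perfectly uniform image the CCM collapses to a single cell, so ASM and IDM reach their maximum of 1 while entropy and contrast vanish, which matches the intuition behind these statistics.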

Evaluation Index of the Fused Image
Two objective indicators, entropy and cross-entropy, were used to evaluate the quality of the fused images. Before evaluation, each image in HSI space is converted back to an RGB image and then converted to grayscale. The entropy of an image (E) is an important index for measuring the richness of image information: the larger the entropy, the more information the image carries. Cross-entropy (C) measures the information difference between the grayscale distributions of two images and is a key index for evaluating the difference between them: the smaller the cross-entropy, the smaller the difference between the fused image and the source image, and the more information the fusion method retains from the source image. Entropy and cross-entropy are calculated, respectively, as

E = −Σ_{i=0}^{L−1} p_i log₂ p_i,  C = Σ_{i=0}^{L−1} p_i log₂(p_i / q_i),

where L represents the number of grayscale levels of the image, which is 256, and p_i and q_i represent the proportion of pixels with grayscale value i among the total pixels of the source image and the fused image, respectively.
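The two indicators can be computed directly from grayscale histograms; a minimal sketch, using base-2 logarithms (the paper does not state the logarithm base, so that choice is an assumption):

```python
import numpy as np

def gray_hist_probs(img, L=256):
    """Grayscale histogram as probabilities (img: uint8 array)."""
    h = np.bincount(img.ravel(), minlength=L).astype(float)
    return h / h.sum()

def image_entropy(p):
    """E = -sum p_i log2 p_i over the L gray levels."""
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def cross_entropy(p, q, eps=1e-12):
    """C = sum p_i log2(p_i / q_i): source distribution p vs fused q."""
    return np.sum(p * np.log2((p + eps) / (q + eps)))
```

Identical distributions give a cross-entropy of zero, consistent with the interpretation that smaller C means the fused image better preserves the source.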

Evaluation Method of the Hole Repair Algorithm
To evaluate the effectiveness and accuracy of the hole-repair algorithm, we artificially created holes in existing depth images, including local holes and large-area holes at the edges, and evaluated the algorithm by computing the RMSE between the true values and the fill values produced by the algorithm:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (M_i − T_i)² ),

where M_i represents the repaired value at each hole point, T_i represents the true value corresponding to the hole point, and n represents the number of hole points.

Establishment and Evaluation Method of the Wheat Nitrogen Accumulation Model
Due to the high feature dimensionality, we selected two methods, multiple regression analysis and the BP neural network, to construct the model separately. To reduce overfitting and improve the generalization ability of the model, the prediction result of the BP neural network model was taken as the mean of five trials; in each trial, 50% of the samples were randomly selected as the training set (14 for a single growth period, 54 for the full growth period), 20% as the validation set (5 for a single growth period, 22 for the full growth period), and 30% as the test set (8 for a single growth period, 32 for the full growth period). The multiple linear regression model is

y = Σ_{i=1}^{num} a_i x_i + b,

where num is the number of features, x_i represents the values of the different features, y represents the value of LNA or SNA, and a_i and b represent the parameters of the model. The structure of the BP neural network is presented in Figure 4; there are 10 neurons in the hidden layer, and the training function is Bayesian regularization. The model was evaluated by calculating R² and RRMSE. The RMSE and RRMSE are calculated as

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_{p,i} − y_{r,i})² ),  RRMSE = RMSE / ȳ_r × 100%,

where y_{p,i}, y_{r,i}, and ȳ_r represent the predicted value, measured value, and mean measured value of the wheat LNA and SNA, respectively, and N is the sample size.

Hole-Repair Results
The threshold T in the HF algorithm directly affects how holes are classified and, in turn, the accuracy of the subsequent hole repair. To determine the best difference threshold T, we experimented with different T values; the results are listed in Table 1. We chose T = 40 to repair the entire depth image. Figure 5b shows the repair result. Compared with the original image shown in Figure 5a, the repaired image clearly has no black holes, and the repair quality is good. Comparing Figure 5c,d, i.e., before and after the repair, it can clearly be seen that the edge information (the transition of depth values) and the local holes have been repaired, and the overall curve is smooth.


Results of Wheat Canopy Segmentation
The traditional color vegetation index segments the canopy by exploiting the color difference between the crop rows and the soil background, but the segmentation is affected by factors such as lighting conditions and weeds. We converted the RGB color model to the HSI color model and performed canopy segmentation using the definition of hue, reducing the impact of light changes on crop row extraction. In addition, depth information was used to remove weeds, reducing the interference of the green non-canopy background on the extraction of crop rows. By analyzing the heights of wheat and weeds at the jointing stage, the maximum wheat height and average weed height under the different nitrogen treatments were determined; the results are shown in Figure 6. Taking into account the height differences of leaves at different positions on the wheat plant, only pixels below 10% of the maximum height were regarded as weed information and removed. Figure 7 shows the crop row information extracted by the ExG and ExGR vegetation indices; the reflective leaves on the right side of the image are poorly segmented. Figure 7b shows the canopy segmentation obtained by simultaneously removing the soil background and weed information, which is better than segmentation by a vegetation index that depends only on color features: the outline of the crop rows is clear, and the reflective parts of the leaves do not affect the segmentation.
The basic idea of the traditional color vegetation index for canopy segmentation is to use the difference between color information of the crop row and soil background, although the segmentation effect will be affected by factors such as light conditions and weeds. We converted the RGB color model to an HSI color model and performed canopy segmentation using the definition of hue to reduce the impact of light changes on crop row extraction. In addition, the depth information was used to remove weeds, thus reducing the interference of the green background of the non-wheat canopy on the extraction of crop rows. By analyzing the heights of wheat and weeds at the jointing stage, the maximum height of wheat and the average height of weeds under different nitrogen treatments were counted. The results are shown in Figure 6. Taking into account the height difference of the leaves at different positions of the wheat, only the pixels below 10% of the maximum height were regarded as weed information and removed. Figure 7 shows the crop row information extracted by the ExG and ExGR vegetation indices. It can be seen that the reflection of the leaves on the right side of the image has a poor segmentation effect. Figure 7b shows the canopy segmentation resulting from simultaneously removing the soil background and weed information, which is better than the segmentation effect of the vegetation index that only depends on color features. As seen in the figure, the outline of the crop rows is clear, and the reflective part of the leaves does not affect the segmentation.  Figure 8 shows the image fusion results under low light, strong light, and normal light. Table 2 shows the quality evaluation results of fused images under different lighting conditions, in which fused image 1 represents the fused image based on phase information, and fused image 2 represents the fused image based on depth information. 
The calculation results reveal that the entropy of the fused image under different lighting conditions is greater than that of the original image, while the cross-entropy is smaller, indicating that the quality of the fused image has been improved and the information of the original image has been better retained. The basic idea of the traditional color vegetation index for canopy segmentation is to use the difference between color information of the crop row and soil background, although the segmentation effect will be affected by factors such as light conditions and weeds. We converted the RGB color model to an HSI color model and performed canopy segmentation using the definition of hue to reduce the impact of light changes on crop row extraction. In addition, the depth information was used to remove weeds, thus reducing the interference of the green background of the non-wheat canopy on the extraction of crop rows. By analyzing the heights of wheat and weeds at the jointing stage, the maximum height of wheat and the average height of weeds under different nitrogen treatments were counted. The results are shown in Figure 6. Taking into account the height difference of the leaves at different positions of the wheat, only the pixels below 10% of the maximum height were regarded as weed information and removed. Figure 7 shows the crop row information extracted by the ExG and ExGR vegetation indices. It can be seen that the reflection of the leaves on the right side of the image has a poor segmentation effect. Figure 7b shows the canopy segmentation resulting from simultaneously removing the soil background and weed information, which is better than the segmentation effect of the vegetation index that only depends on color features. As seen in the figure, the outline of the crop rows is clear, and the reflective part of the leaves does not affect the segmentation.  Figure 8 shows the image fusion results under low light, strong light, and normal light. 

Feature Screening of Fused Images of the Wheat Canopy
A total of 25 features were extracted from the color co-occurrence matrices of the two fused images. The feature data were linearly fitted with the wheat LNA and SNA, and feature screening was performed by comparing the R2 values; the results are shown in Figure 9. In order to select textural features with strong applicability across growth periods, four features, the angular second moment and entropy derived from the CCMHP and the CCMHD, were selected for the early jointing, late jointing, and booting stages. During the heading stage, however, the canopy structure changes significantly due to heading, so four different features were selected: the contrast derived from the CCMHS, the angular second moment and entropy derived from the CCMHP, and the entropy derived from the CCMHD.

Figure 9. Feature screening results, in which S1, S2, S3, and S4 represent the early jointing, late jointing, booting, and heading stages of wheat, respectively.
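The screening step, fitting each candidate texture feature against LNA or SNA and ranking by R2, can be sketched as follows (the function name and the choice of how many features to retain are illustrative, not from the paper):

```python
import numpy as np

def screen_features(features, target, top_k=4):
    """Rank texture features by the R2 of a univariate linear fit
    against the target (LNA or SNA); keep the top_k.

    features : (n_samples, n_features) array
    target   : (n_samples,) array
    Returns (indices of selected features, best first; R2 per feature).
    """
    r2 = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        # For a simple linear fit, R2 equals the squared Pearson correlation
        r = np.corrcoef(features[:, j], target)[0, 1]
        r2[j] = r ** 2
    order = np.argsort(r2)[::-1]
    return order[:top_k], r2
```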


Table 3 shows the prediction ability of wheat LNA and SNA with different features during the early jointing stage. A comparison of the prediction results of the different features shows that the prediction accuracy based on spectral features alone is significantly lower than that based on fused features. This is because in the early jointing stage, wheat plants are small and the soil background strongly affects the collection of canopy reflectance, resulting in low prediction accuracy. Since the features extracted from the fused images are not affected by the soil background, they predict better than the spectral features. As the wheat grows and joints, the rows gradually close, and the collection of canopy reflectance is no longer disturbed by the soil background.
Tables 4-6 present the prediction results of different characteristics in the late jointing, booting, and heading stages, respectively. The prediction accuracy of wheat LNA and SNA based on spectral characteristics during these stages is higher and tends to be stable. Meanwhile, the prediction accuracy based on fused image features is close to the prediction accuracy during the early stage of jointing, indicating that the canopy structure features reflected by fused images are not affected by the growth period and are relatively consistent in predicting wheat LNA and SNA at different growth stages.

Forecast Results of Nitrogen Accumulation in Wheat during Different Growth Periods
By comparing the prediction results of wheat LNA and SNA based on fused image features at the different growth stages listed in Tables 3-6, it was found that the prediction of SNA based on fused image features is better. This is because the depth information in the fused image not only contains the structure of the wheat leaf layer, but also reflects the height of the wheat stalk, so it can more accurately express the nitrogen accumulation of the aboveground wheat population. Tables 3-7 show the prediction accuracy of the different features in the different models. In all growth periods, the prediction results of the BP neural network were better than those of the multiple regression model. The correlation between the fused image features and LNA and SNA is not fully linear, and since the multiple regression model cannot describe nonlinear correlations, its results show some deviation. The prediction results across all growth periods reveal that the combination of RVI and fused image features performs best, with R2 > 0.7 in the BP neural network. In addition, in the multiple regression model, integrating the fused image features also significantly improved the prediction accuracy of the spectral features for LNA and SNA. This demonstrates that the fused image features expressing canopy structure parameters can compensate, to a certain extent, for LNA and SNA prediction models based on spectral features, effectively improving the prediction accuracy of the models.
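To make the comparison concrete, the multiple-regression side of this evaluation and the RRMSE metric can be sketched on synthetic data. Everything below is illustrative: the feature values, coefficients, and helper names are our own, and the BP neural network would be trained analogously on the same feature sets.

```python
import numpy as np

def fit_predict(X, y):
    """Multiple linear regression (OLS with intercept); in-sample predictions."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def r2(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def rrmse(y, pred):
    """Relative root mean square error (%): RMSE over the observed mean."""
    return 100.0 * np.sqrt(np.mean((y - pred) ** 2)) / y.mean()

# Synthetic stand-ins: one spectral feature (RVI) and two fused-image features
rng = np.random.default_rng(1)
rvi = rng.uniform(1.0, 5.0, 100)
tex = rng.uniform(size=(100, 2))
sna = 0.8 * rvi + 1.5 * tex[:, 0] + 0.7 * tex[:, 1] + 0.2 * rng.normal(size=100)

pred_spec = fit_predict(rvi[:, None], sna)                 # spectral only
pred_both = fit_predict(np.column_stack([rvi, tex]), sna)  # spectral + fused
print(f"spectral only: R2={r2(sna, pred_spec):.2f}, RRMSE={rrmse(sna, pred_spec):.1f}%")
print(f"with fused features: R2={r2(sna, pred_both):.2f}, RRMSE={rrmse(sna, pred_both):.1f}%")
```

When the target genuinely depends on canopy structure, as in this toy setup, adding the fused-image features raises R2 and lowers RRMSE, mirroring the pattern reported in Tables 3-7.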

Prediction Model Performance of Wheat Nitrogen Accumulation Based on Spectral Characteristics
In previous studies, multispectral remote sensing technology has been used to estimate nitrogen accumulation in wheat [32,33]. It was found that, in the range of 550-750 nm, the spectral reflectance of the winter wheat canopy changes with leaf nitrogen content [34]. The spectral information in the red portion of the spectrum is often used as an index for predicting nitrogen content [35], which demonstrates the potential of spectral features for predicting nitrogen accumulation in wheat. However, the spectral reflectance of the crop canopy exhibits obvious anisotropic characteristics due to differences in canopy structure. Nitrogen accumulation prediction models based on spectral features ignore the influence of canopy structure on spectral information, resulting in low prediction accuracy and insufficient stability [7,13,34]. In our study, depth images and RGB images were combined to extract wheat canopy texture features carrying depth information, expressing wheat canopy structure from a three-dimensional perspective. The experimental results revealed that, when predicting nitrogen accumulation in wheat, the three-dimensional structural characteristics of the canopy compensated for the shortcomings of the spectral prediction model, effectively improving its accuracy.

Expression Method of Wheat Canopy Structure Based on RGB-D Fused Images
Image fusion is generally divided into three levels: pixel-level, feature-level, and decision-level. At present, in most studies, feature-level fusion methods have often been used when predicting crop canopy structure parameters with RGB-D data [36,37], i.e., extracting features from RGB images and depth images separately, and predicting parameters by combining features. This method, however, often ignores the relationship between the corresponding pixels in the RGB images and depth images. The pixel-level fusion method can retain as much detailed information of the original image as possible. After fusing depth images with RGB images at the pixel level in our study, the extracted fused image features contained the depth information of crop rows, which could break through the limitations of two-dimensional images and express canopy structure information of wheat more accurately. The extracted fused image features cannot quantitatively describe canopy structure information, however. In subsequent research, we will continue to explore the extraction of more detailed canopy structure features from point cloud information, such as leaf inclination.
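A minimal pixel-level sketch of this kind of fusion uses the linear IHS transform and injects the normalized depth map into the intensity component before transforming back. The blending weight `alpha` and the function name are our own illustrative choices; the paper's exact IHS fusion rule may differ.

```python
import numpy as np

S6, S2 = np.sqrt(6.0), np.sqrt(2.0)

def ihs_fuse(rgb, depth, alpha=0.5):
    """Pixel-level IHS fusion of an RGB image (values in [0, 1]) with a
    depth map: blend normalized depth into the intensity channel, then
    invert the linear IHS transform."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0            # intensity
    v1 = (2.0 * b - r - g) / S6      # first chroma component
    v2 = (r - g) / S2                # second chroma component

    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-12)
    i_new = (1.0 - alpha) * i + alpha * d  # inject depth into intensity

    # Inverse linear IHS transform with the modified intensity
    fused = np.stack([i_new - v1 / S6 + v2 / S2,
                      i_new - v1 / S6 - v2 / S2,
                      i_new + 2.0 * v1 / S6], axis=-1)
    return np.clip(fused, 0.0, 1.0)
```

Because hue and saturation (the chroma components) are untouched, the fused image keeps the color structure of the crop rows while its brightness now encodes canopy height, which is what makes texture features extracted from it carry three-dimensional information.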

Generalization Ability of the Wheat Nitrogen Accumulation Prediction Model
The generalization ability of a model refers to its ability to adapt to new, unseen samples, and it directly determines the stability and versatility of the model. We utilized independent experimental data from one wheat variety under three nitrogen treatments and two density treatments to construct and verify the model, and used cross-validation to improve its generalization ability. When applying the model to other varieties, however, the model parameters still need to be corrected, because the plant height, leaf inclination angle, tiller number, and leaf development process differ considerably among varieties. In follow-up work, wheat varieties of different plant types will be tested in order to further improve the generalization ability of the model across varieties.

Conclusions
We proposed a pixel-level RGB-D image fusion method that expresses the three-dimensional structural information of the canopy by extracting textural features from the fused images, thereby compensating for models that predict nitrogen accumulation based only on spectral characteristics. We evaluated and analyzed the compensation effect using two models: multiple regression analysis and the BP neural network. Our work is summarized as follows: