Retrieval of Nitrogen Content in Apple Canopy Based on Unmanned Aerial Vehicle Hyperspectral Images Using a Modified Correlation Coefficient Method

The accurate retrieval of nitrogen content based on Unmanned Aerial Vehicle (UAV) hyperspectral images is limited due to uncertainties in determining the locations of nitrogen-sensitive wavelengths. This study developed a Modified Correlation Coefficient Method (MCCM) to select wavelengths sensitive to nitrogen content. The Normalized Difference Canopy Shadow Index (NDCSI) was applied to remove the shadows from UAV hyperspectral images, thus yielding the canopy spectral information. The MCCM was then used to screen the bands sensitive to nitrogen content and to construct spectral characteristic parameters. Finally, the optimal model for nitrogen content retrieval was established and selected. As a result, the screened sensitive wavelengths for nitrogen content selected were 470, 474, 490, 514, 582, 634, and 682 nm, respectively. Among the nitrogen content retrieval models, the best model was the Support Vector Machine (SVM) model. In the training set, this model outperformed the other models with an R2 of 0.733, RMSE of 6.00%, an nRMSE of 12.76%, and a MAE of 4.49%. Validated by the ground-measured nitrogen content, this model yielded good performance with an R2 of 0.671, an RMSE of 4.73%, an nRMSE of 14.83%, and a MAE of 3.98%. This study can provide a new method for vegetation nutrient content retrieval based on UAV hyperspectral data.


Introduction
Nitrogen is an element that is an essential nutrient in the growth and development of apple trees and is a key indicator in general fruit tree growth evaluations. The level of nitrogen content affects not only the growth and development of apple trees, but also the yield and fruit quality [1]. The traditional means of measuring nitrogen content is the Kjeldahl method, which requires drying, grinding, weighing, and desiccating the leaves. Its efficiency level is low, it is time-consuming and laborious, and, more importantly, it does not meet the needs of rapid, real-time monitoring. Obtaining the nitrogen content of a tree canopy efficiently and effectively has always been an urgent problem to be solved.
With the continuous innovation and improvement of remote sensing platforms, optical remote sensing technology has become an important means of monitoring vegetation nitrogen content [2,3]. Currently, the data sources used for quantitative analysis of vegetation nitrogen mainly include near-surface hyperspectral data, satellite remote sensing data, and low-altitude UAV data [4]. Ground hyperspectral data mainly explore the relationship between each wavelength of the canopy spectrum and the nitrogen content of leaves by obtaining continuous canopy spectral information. Based on the selected wavelength sensitive to nitrogen content, an inversion model was constructed for inversion studies of vegetation nutrients [5][6][7][8]. Although this method can obtain high-precision, continuous, and multi-wavelength canopy spectral data, it cannot achieve high-quality visual monitoring in the study area. There are many studies on monitoring vegetation nitrogen content based on satellite remote sensing images. These methods usually extract spectral data based on satellite remote sensing images and construct a vegetation index to explore the relationship between that index and the nitrogen content [9][10][11]. The spectral data of the study area can be quickly obtained at a fairly low cost. However, satellite remote sensing images are often affected by sensor error, satellite orbit position, revisit cycle, etc., which cannot meet the necessary level of monitoring precision for crops in small areas. In recent years, low-altitude UAV remote sensing has been under tremendous development. Compared with the first two remote sensing platforms, low-altitude UAV remote sensing platforms are flexible and can quickly capture images with high spatial and temporal resolution in real-time. UAV image processing is relatively simple and has a low cost, making it more suitable for the quantitative retrieval of field vegetation nutrient data [12][13][14]. At present, UAV-based studies on vegetation nutrient monitoring mainly use visible-light, multispectral, and hyperspectral sensors [15,16]. At present, visible-light and multispectral sensors are the most widely used, but these two sensors contain less band information [17][18][19]. The spectral information of a hyperspectral UAV sensor is richer than that of visible-light and multispectral sensors. In addition, the sensitive wavelengths that are selected based on UAVs with hyperspectral sensors are more accurate, and the retrieval accuracy is higher [20][21][22].
However, hyperspectral images have more spectral information, which makes the screening of sensitive wavelengths more complicated. At present, the method of screening the sensitive wavelengths from hyperspectral data first involves performing a mathematical transformation (differentiation, derivation, wavelet transform, etc.) on the original spectrum. Continuum-Removal (C-R), Principal Component Analysis (PCA), the Successive Projection Algorithm (SPA), and the Genetic Algorithm (GA) determine the locations of sensitive wavelengths [23,24]. Although these methods can accurately screen out wavelengths sensitive to vegetation nutrients, the specific parameters of the algorithm used need to be determined, and the operation processes are complicated. The correlation coefficient method is simple to operate and is the most widely used sensitive wavelength screening method, but the wavelength screening of this method is relatively concentrated and the accuracy level is low [25]. The band intervals of UAV hyperspectral data are small, and the correlation levels between bands are high. The most commonly applied correlation coefficient method cannot accurately determine the positions of sensitive wavelengths. The simple and quick extraction of wavelengths sensitive to nitrogen content from UAV hyperspectral data is the problem to be solved in this study.
We used an apple orchard in Guanli Town, Qixia City, Shandong Province as the research area and the fruit tree canopy as the research object. The Normalized Difference Canopy Shadow Index (NDCSI) was adopted to remove the shadows in the UAV hyperspectral image data, the wavelengths sensitive to nitrogen content were screened using the Modified Correlation Coefficient Method (MCCM), and spectral parameters were constructed. Then, Multiple Linear Regression (MLR), Partial Least Squares (PLS), Support Vector Machine (SVM), and Random Forest (RF) were utilized. The optimal model for the retrieval of nitrogen content could then be determined. This study can provide a theoretical basis and methodological reference for vegetation nutrient content inversion based on UAV hyperspectral images in the future.

Study Area
The research area is located at the Baishi Apple Park, Guanli Town, Qixia City, Shandong Province, China (37 • 12 27 N, 120 • 44 33 E) ( Figure 1). The park has a warm, temperate monsoon climate, with four distinct seasons. The annual mean temperature is 11.6 • C. The temperature difference between day and night in autumn is large, which is essential to sugar accumulation in fruit trees. The soil type is brown soil, and the hilly terrain is suitable for fruit tree growth. With its unique geographical location, fair climate and soil conditions, the town has become one of the most dominant apple production areas in China. Four orchards, P1, P2, P3, and P4, were randomly selected for the experiment. P1 and P3 were arbor orchards, and P2 and P4 were dwarf orchards ( Figure 1).

Hyperspectral Image Acquisition and Preprocessing
Hyperspectral images of the apple canopies in the study area were collected by UAV during 10:00-14:00 on 30 June 2019, under clear, cloudless, and windy weather. The hyperspectral images were obtained by an eight-rotor UAV M600 Pro with a UHD185 imager (produced by Cubert, Ulm, Germany). The imager has a spectral range of 450 nm-950 nm, with a total of 125 bands spaced at 4 nm, and it can provide hyperspectral images of 50 × 50 pixels and gray images of 1000 × 1000 pixels. During this test flight, the flight altitude was set at 50 m, the spectrometer lens was focused at 23 nm, the field of view angle was 15 • , the image overlap rate was about 80%, and the lateral overlap rate was about 65%.
Agisoft PhotoScan software was adopted for image stitching, and the standard feature radiation correction method was used to calibrate the radiation of the hyperspectral images. Based on the layout of the ground control points for joining together of hyperspectral images geometric correction, the calibration error was set to be less than 0.5 pixel, to obtain the UAV hyperspectral images in the study area.

Apple Canopy Leaf Collection and Nitrogen Content Determination
The apple canopy leaves were collected simultaneously with the drone flight. Under the principle of uniform points, 25 apple trees were randomly selected in each orchard. Three similar-sized pieces of leaves with healthy blade conditionss located on the east, west, south and north sides, respectively, were collected as samples. At the same time, a handheld GPS was used to record the coordinates of the canopy samples. Each sample contained 12 blades, yielding a total of 100 samples. The collected samples were then brought back to the lab using an incubator.
In the laboratory, the leaf samples were put into a 105 • C air-blowing drying oven for 0.5 h and then dried to constant weight at 80 • C. The dried samples were ground into a powder with a mortar, and then screened. The nitrogen content in the leaves was determined via the Kjeldahl method. Finally, the nitrogen content of tree canopy leaves was obtained. The samples were also sorted according to the nitrogen content from low to high. Isometric sampling was then conducted according to the modeling set:verification set ratio = 2:1. Finally, 67 samples constituting the modeling set and 33 samples constituting the verification set (Table 1) were prepared. The mean value and standard deviation are shown in Equations (1) and (2).
where y i is the measured reflectivity value, n is the number of samples, µ is the average value, and δ is the standard deviation.

Methodology Description
In this study, the shades of tree canopy in the UAV hyperspectral images were removed based on NDCSI, and the spectral information of the canopy was extracted. Then, the sensitive wavelengths of canopy nitrogen content were screened based on MCCM and the spectral characteristic parameters were constructed. Finally, the nitrogen content retrieval models were established based on spectral characteristic parameters, and the accuracy of the model was tested. The technical route is shown in Figure 2.

Extraction of Canopy Hyperspectral Images Based on NDCSI
Based on NDCSI, non-related canopy information such as soil and shadow were removed by the threshold method.
To extract the canopy information of the apple trees, the NDCSI was calculated using the band operation tool in ENVI (as shown in Equations (3) and (4)) [26]. According to the threshold range of the vegetation index in Arcgis, images of the apple tree canopy were successively extracted with a step size of 0.005, and the recognition of apple tree canopy under different thresholds was compared. The maximum value was set as the threshold, pixels larger than the threshold were classified as canopy, and pixels smaller than the threshold were classified as shadows. Based on the optimal threshold, a mask was established to extract the spectral image of tree canopy.
In Equation (3), R NIR is the gray value of the near-infrared band, and R RED is the gray value of R red-edge band. In Equation (4), the R red-edge band is the position of the firstorder differential maximum value of spectral data. In this formula, the R red-edge position of the hyperspectral images is the wavelength position where the first-order differential maximum value occurred most frequently. (R red-edge ) MIN is the minimum value of R red-edge band of the image, and (R red-edge ) MAX is the maximum value of R red-edge band of the image.

Screening of Sensitive Wavelengths Using MCCM Method
In previous studies, sensitive wavelengths were mostly screened according to the level of the correlation coefficient. However, the wavelength interval of hyperspectral images is small, which leads to a high correlation between adjacent wavelengths. Therefore, the correlation coefficient method cannot determine the sensitive wavelengths comprehensively and accurately.
Previous studies have found that wavelengths with correlation coefficients higher than 0.5 can be applied to model construction [27]. Based on this knowledge, this study proposed a Modified Correlation Coefficient Method (MCCM) to screen the sensitive wavelengths of nitrogen content in the hyperspectral images. The procedure diagram for selecting the sensitive band is shown in Figure 3. The specific operation process was as follows: (1) Correlation coefficient calculation: a correlation analysis was conducted among the spectral data from the UAV hyperspectral image and nitrogen content. (2) Location of first sensitive wavelength determined: the wavelengths that are higher than 0.5 were correlated with the nitrogen content, screened out and arranged according to the wavelength order. The first wavelength with correlation higher than 0.5 was identified as the first sensitive wavelength. (3) Nitrogen content sensitive wavelength screening: the wavelength autocorrelation threshold was set, the correlation between the first sensitive wavelength and the remaining wavelengths was analyzed, and the position of the second sensitive wavelength was determined according to the threshold. A correlation analysis was conducted among the spectral data of the determined second sensitive wavelength and those of other wavelengths, and the position of the third sensitive wavelength was determined according to the autocorrelation threshold between the wavelengths. This step was repeated until all bands were covered, and finally all sensitive wavelengths of nitrogen content within the band range of the hyperspectral data were determined. The determination of the threshold of autocorrelation between wavelengths was the key parameter of the sensitive wavelength screening in this study. In this study, 0.05 was used as the threshold boundary to screen the sensitive wavelengths of nitrogen content in tree canopy under different thresholds. Based on the screened sensitive wavelengths, a Multiple Linear Regressions (MLR) was constructed to determine the optimal threshold values based on the accuracy of the model.

Construction of Spectral Characteristic Parameters
The spectral parameters were constructed by mathematical transformation of the selected sensitive wavelengths. In this study, two random sensitive wavelengths were selected and combined. Then Difference Spectral Index (DSI), Ratio Spectral Index (RSI), and Normalized Difference Spectral Index (NDSI) were constructed. The Double Difference Index (DDI) was constructed by a combination of any random three wavelengths. The calculation methods are shown in Equations (5)-(8) [28].
RSI (x,y) = R x R y DDI (x,y,z) = R x − R y − R z In Equations (5)-(8), x, y and z represent the corresponding spectral reflectance (R x , R y , R z ) of the x, y and z bands within the band range of 450~950 nm, respectively.

Model Establishment and Testing
In this study, Multiple Linear Regression (MLR), Partial Least Squares (PLS), Support Vector Machine (SVM), and BP Neural Network (BPNN) were used to construct the retrieval model of the canopy nitrogen content. SVM and BPNN were constructed using DPS. The specific parameters of the two models are shown in Tables 2 and 3 [29][30][31]. The determination coefficient (R 2 ), Root Mean Square Error (RMSE), normalized Root Mean Square Error (nRMSE), and mean absolute error (MAE) were used to verify the accuracy of the model. The calculation methods are shown in Equations (9)- (12). The accuracy of nitrogen content retrieval models was compared, and the optimal model was then selected [32,33].
In Equations (9)-(12), y i is the measured reflectivity value,ŷ i is the anticipated value, y is the average value of the measured values, y i is the average value of the predicted values, and n is the number of samples. In Equation (11), y is the average value of measured values, and RMSE is the root mean square error.

Determination of NDCSI Threshold and the Extraction of Apple Canopy Hyperspectral Images
The vegetation index, NDVI, NDCSI, and calculation results of the hyperspectral images of the apple canopy are shown in Figure 4. NDCSI had a stronger canopy shadow recognition ability compared with NDVI. Therefore, it could effectively identify the shadows inside the canopy. The gray values of the apple tree canopy, soil background, and shadow were compared, and the results are shown as follows: apple tree canopy > shadow > soil background. Based on NDCSI, a reasonable threshold value could be set to effectively remove soil and shadow from UAV hyperspectral images.  Figure 5 shows the canopy recognition of hyperspectral images of arbor and dwarf trees under different thresholds of NDCSI. When the NDCSI threshold of spectral images of the canopy of arbor trees was set at 0.03, some shadows inside the canopy were still unrecognized. When the threshold was set at 0.04, the canopy was over-fragmented. When the threshold value was set at 0.035, most of the shadows inside the canopy could be recognized, and the overall shape of the canopy was relatively intact. Therefore, 0.035 was set as the threshold value of NDCSI of the hyperspectral images of the arbor trees. When the NDCSI threshold of hyperspectral images of dwarf fruit trees was set at 0.035, some shadows in the canopy were still unrecognized. When the threshold was set at 0.045, the canopy was over-fragmented. When the threshold value was set at 0.04, most of the shadows inside the canopy could be recognized, and the overall shape of the canopy was relatively intact. Therefore, 0.04 was set as the threshold value of NDCSI in the hyperspectral images of the dwarf fruit trees.
The threshold of the NDCSI of the hyperspectral images of arbor trees was determined to be 0.035, and 0.04 was determined to be the threshold of the NDCSI of hyperspectral images of the dwarf trees. A mask was established to extract the canopy hyperspectral images, and the canopy hyperspectral spectral images resulting from shadow removal are shown in Figure 6.

Spectral Characteristics Analysis of Tree Canopy
Canopy spectral information was extracted based on the UAV hyperspectral image after soil removal and shadow removal, and the spectral curve of the apple tree canopy is shown in Figure 7. In the band between 450 nm~500 nm and 650 nm~700 nm, the visible light radiation is absorbed by the chlorophyll in the canopy leaves and is applied to photosynthesis, so two absorption valleys are formed in this band. At 550 nm, the reflection effect is strong, and then a small reflection peak is formed. Spectral reflectance increases sharply at 700 nm-750 nm, and gradually flattens after entering the near-infrared region.
As shown in Figure 8, by comparing the canopy spectral information of arbor and dwarf trees, it was found that the canopy spectral reflectance of dwarf trees was higher than that of the arbor trees, especially in the near-infrared band. This was due to the different canopy structures of apple trees. Compared with dwarf trees, the canopy of arbor tree trees is denser, with more irregular surfaces and larger reflection dispersion. Therefore, the spectral reflectance of the arbor tree canopy was lower than that of dwarf trees. The differences in the spectral characteristics of vegetation caused by tree structure were mainly concentrated in the near-infrared band, with little difference in the overall hyper-spectrum band range. Therefore, the sensitive wavelength screening and nitrogen retrieval model construction process do not distinguish tree structure.

Correlation Analysis between Canopy Spectral Reflectance and Nitrogen Content
The correlation curve between hyperspectral reflectance of each wavelength and the nitrogen content of the apple tree canopy is shown in Figure 9. There is a negative correlation between the hyperspectral wavelengths and the nitrogen content in the tree canopy, and the correlation between the hyperspectral wavelengths and the nitrogen content was higher between 660 nm and 690 nm.

Threshold Determination and Sensitive Wavelength Screening Based on MCCM
MCCM was used to screen the sensitive wavelengths of nitrogen content in the tree canopy, and the results were shown in Table 4. To ensure a high level of accuracy of the threshold, MLR was applied to build a quantitative model and to evaluate the accuracy level. The accuracy level of the MLR based on the selected sensitive wavelengths under different thresholds was compared to determine the optimal threshold of the autocorrelation between wavelengths. In this study, a reasonable threshold value of autocorrelation between wavelengths was explored, with a step size of 0.05. The results showed that when the threshold value was greater than 0.8, the sensitive wavelength of nitrogen content in the tree canopy could be obtained within the hyperspectral band range. By comparing the accuracy level of the MLR based on screening the sensitive wavelengths at different thresholds, it was found that the lower the threshold is, the fewer sensitive wavelengths were screened, and the accuracy level of the nutrient retrieval model was lowered. When the inter-band correlation threshold was set to 0.95, the sensitive wavelengths of canopy height spectral image screening were 470, 474, 490, 514, 582, 634 nm. At this time, the R 2 of the training set and verification set of MLR constructed was the highest, and the RMSE was at its lowest. Therefore, 0.95 was taken as the threshold of inter-wavelength correlation.

Construction of Spectral Parameters
Four spectral parameters, DSI (x, y) RSI (x, y) NDSI (x, y) and DDI (x, y, z) , were constructed by randomly combining two or three selected nitrogen content-sensitive wavelengths. A correlation analysis between spectral parameters and canopy nitrogen content was conducted, and the results were shown in Figure 10 and Table 5. The correlation between RSI (x, y) , NDSI (x, y) and nitrogen content was less than 0.5 in the spectral parameters constructed by the combination of two wavelengths. DSI (x, y) had a higher correlation with nitrogen content, and the correlation between DSI (470,582) , DSI (470,634) , DSI (474,582) , DSI (474,634) , DSI (490,582) , DSI (490,634) and nitrogen content was higher than 0.5. The correlation between DSI (470,582) , DSI (474,582) , DSI (474,634) , DSI (490,582) and nitrogen content were higher than 0.6. Table 5.
Correlation coefficients between DDI (x, y, z) and nitrogen content based on the MCCM method. The correlation coefficients between the spectral parameters and the nitrogen content based on three wavelengths were all higher than 0.5, and the highest correlation was DDI (474 nm; 582 nm; 634 nm) . The correlation coefficient was 0.735, indicating that the correlation between the nitrogen content and the spectral parameters constructed based on the three sensitive wavelengths was higher than that constructed based on the two sensitive wavelengths.

Spectral Characteristic Parameter
Finally, six sensitive spectral parameters of nitrogen content were screened, namely DDI ( Table 6.
The results showed that the accuracy of the nonlinear model was higher than that of the linear model. The SVM was 0.09 higher than the MLR R 2 , 1.11% lower than RMSE, 2.53% lower than nRMSE, and 1.17% lower than MAE. The validation set R 2 was 0.095 higher, RMSE was 1.69% lower, nRMSE was 11.4% lower, and MAE was 1.83% lower. Compared with PLS, the SVM R 2 was 0.152 higher, RMSE was 1.31% lower, nRMSE was 5.85% lower, and MAE was 1.52% lower. The validation set R 2 was 0.124 higher, RMSE was 0.8% lower, nRMSE was 6.1% lower, and MAE was 1.83% lower. Compared with MLR, the BPNN R 2 was 0.038 higher, RMSE was 0.68% lower, nRMSE was 1.8% lower, and MAE was 0.45 lower. The validation set R 2 was 0.094 higher, RMSE was 1.78% lower, nRMSE was 11.04% lower, and MAE was 2.21% lower. The BPNN was 0.1 higher than the PLS R 2 , 0.88% lower than RMSE, 1.52% lower than nRMSE, and 0.8% lower than MAE. The validation set R 2 was 0.123 higher, RMSE was 0.89% lower, nRMSE was 5.74% lower, and MAE was 0.6% lower. The results show that the precision of the model based on machine learning was higher than that of the linear model. The results showed that the R 2 of SVM was 0.052 higher than that of BPNN, the RMSE was 0.43% lower, the nRMSE was 4.33% lower, and the MAE was 0.72% lower than that of BPNN. The validation set R 2 was 0.001 higher, RMSE was 0.09% higher, nRMSE was 0.36% lower, and MAE was 0.38% higher. Therefore, SVM is the best retrieval model for nitrogen content.

Discussion
MCCM was proposed to solve the traditional problem of intensive distribution of sensitive wavelengths. It can filter out the most relative wavelengths within the entire band. The nitrogen retrieval model with sensitive wavelengths based on MCCM had a high accuracy level, it was feasible, and it can be used in the retrieval of nitrogen content in vegetation studies. However, the determination of the optimal threshold between wavelengths in this method needs to be further explored. In addition, the results show that spectral parameters constructed by mathematical transformation such as addition, subtraction, and division can enhance the correlation between spectral data and nitrogen content, and the correlation between spectral parameters constructed based on three wavelengths was higher than that based on two wavelengths, which was consistent with the results from previous studies [34][35][36].
The background information in the image will weaken the canopy spectral information and will reduce the model retrieval accuracy [37][38][39]. In previous studies, NDCSI was used to extract multi-spectral UAV dwarf tree canopy images, and the results showed that the nitrogen retrieval model constructed based on spectral data extracted from the NDCSI image had a higher accuracy level [40]. In this study, the extraction effect of NDCSI in UAV hyperspectral canopy imagery of apple trees with different planting methods was explored, which provides a reference for future vegetation nutrient retrieval based on UAV hyperspectral imagery. However, the threshold screening process is complicated. In the future, combined with the mixed pixel decomposition method, an effective method of canopy image extraction can be further explored.

Conclusions
In this study, a UAV hyperspectral image was used as the data source, NDCSI was used to extract the canopy hyperspectral image of apple trees, the MCCM was used to select the sensitive wavelength of nitrogen content in the canopy of apple trees, and the retrieval model of nitrogen content in UAV hyperspectral image was established. The sensitive wavelengths of nitrogen content screened based on MCCM were 470 nm, 474 nm, 490 nm, 514 nm, 582 nm, 634 nm, and 682 nm. The larger the autocorrelation threshold, the more sensitive wavelengths were screened and the higher the accuracy of the model constructed. A threshold of 0.95 was determined to be the optimal threshold of the correlation coefficient between wavelengths. Among the constructed MLR, PLS, SVM, and BPNN nitrogen content inversion models, the best model for nitrogen content retrieval was SVM, in which the R 2 , RMSE, nRMSE, and MAE were 0.733%, 6.00%, 12.76% and 4.49%, respectively. The validation set R 2 was 0.671, RMSE was 4.73%, nRMSE was 14.83%, and MAE was 3.98%. This study can provide technical support for the quantitative retrieval of vegetation nutrients based on hyperspectral data. Further study will be conducted to explore the application of dimension reduction methods for hyperspectral images to improve the utilization efficiency of hyperspectral data.