Remote Sensing

Abstract: The red turpentine beetle (Dendroctonus valens LeConte) has caused severe ecological and economic losses since its invasion into China. It is gradually spreading northeast, killing many Chinese pines (Pinus tabuliformis Carr.). Early detection of D. valens infestation (i.e., at the green attack stage) is the basis of control measures to prevent its outbreak and spread. This study examined the changes in spectral reflectance after the initial D. valens attack. We also explored the possibility of detecting early D. valens infestation based on spectral vegetation indices and machine learning algorithms. The spectral reflectance of infested trees was significantly different from that of healthy trees (p < 0.05), with an obvious decrease in the near-infrared region (760–1386 nm; p < 0.01). Spectral vegetation indices were input into three machine learning classifiers; the classification accuracy was 72.5–80%, and the sensitivity was 65–85%. Several spectral vegetation indices (DID, CUR, TBSI, DDn2, D735, SR1, NSMI, RNIR·CRI550 and RVSI) were sensitive indicators for the early detection of D. valens damage. Our results demonstrate that remote sensing technology could be successfully applied to the early detection of D. valens infestation, and they clarify the sensitive spectral regions and vegetation indices, which has important implications for early detection based on unmanned aerial vehicle (UAV) and satellite data.


Introduction
The red turpentine beetle (RTB; Dendroctonus valens LeConte) is an important invasive pest in China. Since it was discovered in Shanxi Province in 1998, it has spread rapidly and caused severe pine mortality, resulting in serious ecological and economic losses [1]. By the end of 2004, RTB had infested more than 500,000 hectares and killed more than 6 million pine trees, causing direct economic losses of CNY 684 million and ecological losses of CNY 8.1 billion [2]. In recent years, the infestation has spread northeast, reaching Inner Mongolia and Liaoning in 2017. A simulation study showed that under future climate conditions, the potential distribution of RTB will expand to higher latitudes and habitat suitability will increase in most areas [3], which means that the range and intensity of RTB outbreaks could increase as regional and global temperatures rise. Therefore, it is crucial to implement control measures to prevent RTB epidemics.
In its native range (North and Central America), RTB is a secondary pest that attacks weakened and dying pines. After being introduced to China, RTB preferentially attacks Chinese pines (Pinus tabuliformis Carr.) with a large diameter at breast height and can kill healthy trees [4]. RTB usually has one generation per year in most regions of China, whereas in some regions it has either two generations a year or two generations in 3 years, depending on the local climate.
The traditional method of identifying RTB-infested trees is to look for pitch tubes and boring dust during ground surveys. Although ground surveys provide the most precise information, they are laborious, time-consuming and difficult to extend over large areas, which may result in missing the best time to prevent an outbreak. Developments in remote sensing and data processing technology provide opportunities to monitor and assess forest health over large areas. Several reviews have summarized the applications and potential of remote sensing for forest health monitoring from different perspectives [10][11][12][13][14]. To date, many studies have used multispectral satellite data to detect and monitor bark beetle damage. For example, single-date or multi-date medium-resolution satellite data (e.g., Landsat and Sentinel-2 data) were used to detect infested plots and monitor the dynamics of bark beetle outbreaks in Europe and North America [15][16][17][18][19]. A few studies assessed the ability of Sentinel-2 or Landsat data for early detection [20][21][22][23]. However, their low spatial resolution only enables the detection of infested stands, whereas different attack stages are present simultaneously in infested forests. High-spatial-resolution satellite images (e.g., IKONOS, GeoEye-1, QuickBird and WorldView-2) were used to detect tree mortality caused by bark beetles (i.e., the red and gray attack stages) with high accuracy [24][25][26]. Zhan et al. [9] evaluated the ability of Gaofen-2 to detect tree mortality caused by RTB, achieving an overall accuracy of 77.7%. However, the trees they identified were in the last two attack stages, which is too late for salvage logging to prevent the spread of bark beetles. Immitzer et al. [27] and Mullen [28] used WorldView-2 imagery to detect the early stage of European spruce bark beetle (SBB) infestation in Norway spruce and of mountain pine beetle (MPB) infestation in ponderosa pine forest, respectively. In their results, the classification accuracy for the green attack (GA) stage was about 70%, and high within-class variance masked the spectral differences between classes. Therefore, detecting the GA stage of bark beetle infestation with multispectral satellite data remains problematic because of its limited spatial and spectral resolution.
In recent years, the potential of hyperspectral data for the early detection of stress has attracted attention. Some studies attempted to detect early bark beetle infestation using airborne hyperspectral data but were unable to distinguish effectively between GA and healthy trees [29][30][31][32][33][34]. Compared with canopy-level spectra, leaf-level spectra are less disturbed by background noise and have a higher spectral resolution, so they can capture finer spectral responses and provide a better understanding of spectral changes under stress. Several studies used ground-based hyperspectral sensors in the field or laboratory to detect changes in spectral properties at the leaf level. Ahern [35] found three important spectral regions and a red shift of the red edge at the early stage of MPB infestation in lodgepole pine. He noted that the shoulder of the NIR plateau was the most promising region for early detection and that there was an interaction between needle age and attack status. However, Reichmuth et al. [36] reached different conclusions from a ringbarking experiment on Norway spruce with multi-temporal laboratory needle hyperspectral data: the visible region was more important than the NIR region for detecting stress, and the spectra of different needle age classes were highly similar. Cheng et al. [37] used continuous wavelet analysis to detect the spectral responses of MPB-infested and girdled trees. Their results showed that the needle water content of infested trees decreased and that spectral features located between 950 and 1390 nm were sensitive to the GA stage, whereas both the needle water and chlorophyll contents of girdled trees differed significantly from those of healthy trees, and the spectral features that distinguished girdled from healthy trees were located between 1550 and 2370 nm. Foster et al. [8] confirmed that the SWIR region is important for detecting the early stage of Dendroctonus rufipennis infestation. Abdullah et al. [38] compared the mean spectral reflectance and foliar biochemical properties (i.e., chlorophyll and nitrogen content) of healthy and SBB-infested Norway spruce and observed significant differences, with the reflectance differences most distinct from 730 to 1370 nm. Several studies also used hyperspectral data for other pests and diseases (e.g., Sirex noctilio [39], pine shoot beetles [40] and pine wilt disease [41]) and identified different spectral features that could be used for early detection. Previous studies thus suggest that spectral responses and important features for early detection may vary with host, stress factor and observation scale. To date, the spectral changes in the early stage of RTB infestation have not been studied.
Here, we evaluated the potential of hyperspectral data for the early detection of RTB infestation. The specific objectives of the study were (1) to compare the spectral reflectance of GA-stage and healthy trees, (2) to establish classifiers combining machine learning (ML) algorithms with spectral vegetation indices (SVIs) and evaluate their performance and (3) to identify important SVIs for the early detection of RTB infestation. Our results could provide a reference for the early detection of RTB infestation using UAV- and satellite-based data.

Study Area
The study area was the Heilihe National Natural Reserve, located at the junction of Hebei, Liaoning and Inner Mongolia (Figure 2). The altitude is 770–1836 m, and the mean annual temperature is 4.8 °C. The predominant vegetation in the reserve is Chinese pine (P. tabuliformis) forest, covering an area of 4667 hectares. Chinese pines infested by D. valens were found in the reserve in 2017. We conducted an intensive field investigation in the autumn of 2018 to identify the infested stands.

Spectral Measurements and Pre-Processing
In early May 2019 (before D. valens adults emerged and dispersed), P. tabuliformis trees were selected as GA trees if their canopies were green but their trunks bore multiple pitch tubes (Figure 3). Healthy trees of similar diameter at breast height (DBH) in the same stands were selected as controls. Five sunlit branches were taken from each tree using an extendable tree pruner with a maximum length of 10 m. Twigs from each branch were collected separately and placed in labeled resealable plastic bags, which were immediately packed into a cooling box filled with ice packs and taken to the laboratory [28,38]. In total, 100 branch samples were obtained for each class (GA and healthy). Needle spectra were measured within three hours of branch collection. To avoid the influence of atmospheric conditions and variable illumination, spectral measurements were conducted in a dark room. An ASD FieldSpec-4 spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) equipped with a 25° field-of-view probe was used to measure spectra over the 350–2500 nm range. Needle samples from each branch were laid on black velvet cloth to minimize background reflection, and a tungsten quartz halogen lamp was used to illuminate them. Before measurements began and every 20 min thereafter, the instrument was optimized to obtain the best signal-to-noise ratio. Reflectance was standardized using a PTFE white panel before each sample measurement. Spectra were measured with the probe 10 cm from the needles (Figure 4). Each sample was scanned 20 times with 90° rotation to account for its irregular surface, and the mean was taken as the sample reflectance. Spectral data were processed using RS3 v6.4 and ViewSpec Pro v6.2.0 software (Analytical Spectral Devices, Inc., Boulder, CO, USA) [36,39].
Raw spectral digital numbers (DN) were converted to reflectance at 1 nm intervals. Because of low signal-to-noise ratios, the bands at 350–399 nm and 2401–2500 nm were excluded from the analysis. A Savitzky-Golay smoothing filter with a 7-point window and a quadratic polynomial was applied in OriginPro 2019b (OriginLab Corporation, Northampton, MA, USA) to dampen noise [28,38,40].
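The band trimming and smoothing described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' OriginPro workflow: the function names are hypothetical, and the edge handling (fitting a quadratic to the first and last seven points) is an assumption, since the text does not say how the spectrum ends were treated.

```python
import numpy as np

def trim_bands(wavelengths, reflectance, lo=400, hi=2400):
    """Keep only bands with lo <= wavelength <= hi (nm), dropping the noisy spectrum ends."""
    wl = np.asarray(wavelengths)
    keep = (wl >= lo) & (wl <= hi)
    return wl[keep], np.asarray(reflectance, dtype=float)[..., keep]

def savgol_coefficients(window=7, polyorder=2):
    """Least-squares weights that return the fitted polynomial's value at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps the window values to the constant term a0,
    # which is the fitted value at offset 0.
    return np.linalg.pinv(vander)[0]

def savgol_smooth(spectrum, window=7, polyorder=2):
    """Savitzky-Golay smoothing of one 1-D spectrum.

    Assumption: each edge is handled by fitting one polynomial to the first/last `window` points.
    """
    y = np.asarray(spectrum, dtype=float)
    if window % 2 == 0 or window < 3 or polyorder >= window or y.size < window:
        raise ValueError("need an odd window >= 3, polyorder < window and len(spectrum) >= window")
    half = window // 2
    out = np.empty_like(y)
    # Interior points: sliding weighted sum with the least-squares weights
    out[half:-half] = np.correlate(y, savgol_coefficients(window, polyorder), mode="valid")
    idx = np.arange(window)
    out[:half] = np.polyval(np.polyfit(idx, y[:window], polyorder), idx[:half])
    out[-half:] = np.polyval(np.polyfit(idx, y[-window:], polyorder), idx[-half:])
    return out
```

For a 7-point quadratic filter the weights are the classic (-2, 3, 6, 7, 6, 3, -2)/21, so any locally quadratic part of a spectrum passes through unchanged while band-to-band noise is damped. In Python, scipy.signal.savgol_filter(spectrum, 7, 2) with its default edge mode should give the same result up to round-off.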

Hyperspectral Statistical Analyses
To analyze the separation between GA and healthy trees, the difference in spectral reflectance at each band was tested for significance using a Mann-Whitney U-test in R v4.0.4 with the "MASS" package [36].
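A band-wise test of this kind can be sketched as follows. The sketch assumes that rows are branch samples and columns are wavebands, and it uses the large-sample normal approximation with tie and continuity corrections; it is not the R implementation used in the study, and exact small-sample p-values may differ slightly. The function names are hypothetical.

```python
import math
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    i = 0
    while i < values.size:
        j = i
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U test p-value (normal approximation, tie + continuity corrections)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled = np.concatenate([a, b])
    u1 = average_ranks(pooled)[:n1].sum() - n1 * (n1 + 1) / 2
    n = n1 + n2
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts ** 3 - counts).sum() / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:  # all values identical: no evidence of a difference
        return 1.0
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / sigma
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

def bandwise_p(healthy, infested):
    """One p-value per waveband; inputs are (samples x bands) reflectance arrays."""
    healthy = np.asarray(healthy, dtype=float)
    infested = np.asarray(infested, dtype=float)
    return np.array([mann_whitney_p(healthy[:, j], infested[:, j])
                     for j in range(healthy.shape[1])])
```

Counting the bands with p < 0.05 and p < 0.01 in the returned array gives band tallies of the kind reported in the results. Library routines such as scipy.stats.mannwhitneyu implement the same test.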

Spectral Vegetation Indices (SVIs)
Removing irrelevant and redundant information often improves the performance of ML algorithms, so feature selection is an important data-processing step usually performed before running an ML algorithm [42]. In this study, 137 SVIs used in previous studies were calculated to reduce the data dimension, and recursive feature elimination (RFE) with cross-validation, a commonly used wrapper method, was applied for feature selection [43,44]. Finally, 40 SVIs were retained to form a new dataset for classification (Table 1).
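Table 1 is not reproduced here, so the sketch below only shows generic forms that many of the screened indices follow: a simple ratio, a normalized difference and a first-derivative value at one wavelength. The helper names and any band choices used with them are hypothetical, not the definitions of indices in Table 1.

```python
import numpy as np

def reflectance_at(wl, refl, nm):
    """Reflectance at the band closest to `nm` (one spectrum or a samples x bands array)."""
    i = int(np.argmin(np.abs(np.asarray(wl) - nm)))
    return np.asarray(refl, dtype=float)[..., i]

def simple_ratio(wl, refl, b1, b2):
    """Generic SR-type index: R(b1) / R(b2)."""
    return reflectance_at(wl, refl, b1) / reflectance_at(wl, refl, b2)

def normalized_difference(wl, refl, b1, b2):
    """Generic ND-type index: (R(b1) - R(b2)) / (R(b1) + R(b2))."""
    r1, r2 = reflectance_at(wl, refl, b1), reflectance_at(wl, refl, b2)
    return (r1 - r2) / (r1 + r2)

def derivative_at(wl, refl, nm):
    """First-derivative reflectance (per nm) at `nm`, using central differences."""
    d = np.gradient(np.asarray(refl, dtype=float), np.asarray(wl, dtype=float), axis=-1)
    return reflectance_at(wl, d, nm)
```

For a samples-by-bands matrix, each function returns one value per sample, so stacking the outputs of many such calls column-wise yields an SVI table on which a wrapper method such as RFE can operate.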

Classification Procedures
The SVI dataset, composed of 200 samples and 40 features (i.e., SVIs), was divided into training and test sets at a ratio of 8:2 [70]. The training set (160 samples) was used to train the models, whose parameters were tuned according to the results of ten-fold cross-validation. The test set (40 samples) was used only to evaluate the models. Three ML algorithms were used to establish the classification models. The classification procedure for each ML algorithm was executed ten times, and the classifiers' performance was compared using one-way ANOVA [44,70].
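The data partitioning and the comparison of repeated runs can be sketched in Python with NumPy. The sketch assumes a split stratified by class (20 GA and 20 healthy test samples), which the text does not state, and the function names are hypothetical. Only the F statistic is computed; a p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, for example via scipy.stats.f.sf.

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=0):
    """Hypothetical helper: hold out test_frac of each class; returns (train_idx, test_idx)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    test = []
    for cls in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == cls))
        test.extend(members[: int(round(test_frac * members.size))].tolist())
    test = np.sort(np.array(test, dtype=int))
    train = np.setdiff1d(np.arange(labels.size), test)
    return train, test

def kfold_indices(n, k=10, seed=0):
    """Shuffled k-fold partition of range(n): list of (fit_idx, validation_idx) pairs."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    return [(np.setdiff1d(np.arange(n), fold), np.sort(fold)) for fold in folds]

def one_way_anova_f(*groups):
    """F statistic comparing group means, e.g. the test metrics of ten runs per classifier."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    k, n = len(groups), pooled.size
    grand_mean = pooled.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Running the split with ten different seeds, refitting each classifier, and passing the ten test-set values per classifier to one_way_anova_f mirrors the comparison described above.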

Random Forest (RF)
Random forest is a non-parametric algorithm widely used in classification and regression. It is well suited to problems with a small number of samples and a large number of variables and is relatively robust to multicollinearity and overfitting. It is an ensemble method based on decision trees, and RF outputs the classification result by counting the votes of the individual trees. Two important parameters need to be defined: the number of decision trees (ntree) and the number of variables considered at each split node (mtry). Through testing, Stohe et al. [71] found that 500 trees both ensured classification accuracy and saved processing time. Therefore, we set "ntree" to 500 and optimized "mtry" when training the RF models. Each tree was constructed using a subset randomly drawn from the training set by sampling with replacement. At each split node, a subset of mtry variables was drawn without replacement from all variables (i.e., the 40 SVIs). The RF classification models were established using R statistical software version 4.0.4 with the "randomForest" package [28,72].
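The two random elements of RF described above (bootstrap sampling of rows for each tree and sampling of mtry candidate variables at each split), together with the final vote, can be sketched as follows. This is an illustration only, not the randomForest package's internals: the tree-growing step is omitted, the function names are hypothetical, and ties in the vote go to class 0 here, whereas randomForest breaks ties at random. For reference, randomForest's default mtry for classification is floor(sqrt(p)), i.e., 6 for 40 SVIs; the tuned value used in the study is not given in this section.

```python
import numpy as np

def bootstrap_rows(n_train, rng):
    """Rows used to grow one tree: n_train draws with replacement.

    Some rows repeat; for large n about 37% of rows are left out ("out-of-bag").
    """
    return rng.integers(0, n_train, size=n_train)

def split_candidates(n_features, mtry, rng):
    """Variables considered at one split node: mtry distinct indices drawn without replacement."""
    return rng.choice(n_features, size=mtry, replace=False)

def forest_vote(tree_predictions):
    """tree_predictions: (ntree, n_samples) array of 0/1 labels -> majority label per sample.

    Ties go to class 0 in this sketch.
    """
    votes = np.asarray(tree_predictions)
    return (votes.mean(axis=0) > 0.5).astype(int)
```

With ntree = 500, each of the 500 trees would draw its own bootstrap_rows sample from the 160 training samples, and every split inside a tree would call split_candidates afresh, which is what decorrelates the trees before their votes are pooled.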

Support Vector Machine (SVM)
SVM is a supervised ML algorithm applied for classification analysis. It has strong generalization ability and simultaneously minimizes the empirical error and maximizes the geometric margin. The core idea of SVM is to map data that are not linearly separable in the original space into a high-dimensional feature space through a kernel function so that linear analysis can be carried out in that space. Based on structural risk minimization theory, it constructs an optimal hyperplane in the high-dimensional space. The hyperplane makes the distance (margin) between the closest points of the two classes as large as possible, and the points lying on the margin boundaries, called support vectors, determine the margin. In this study, a radial basis function kernel, which is commonly used for classifying remotely sensed data, was used to map the training samples into a high-dimensional space according to their SVI values and find the hyperplane. Two parameters were optimized (Section 2.5.4): the cost (C) and gamma (γ). The cost penalizes misclassification of training samples, and gamma is the kernel width parameter that controls the shape of the decision boundary. SVM classification models were established using R statistical software version 4.0.4 with the "kernlab" package [71][72][73][74].
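The RBF kernel and the resulting decision function can be sketched as follows. In kernlab, the corresponding kernel parameter is called sigma in rbfdot, exp(-sigma * ||x - x'||^2), so it plays the role of γ here. C does not appear in the decision function; during training it bounds the dual coefficients (0 <= alpha_i <= C). The function names are hypothetical, and the support vectors and coefficients in the usage below are made-up values, not a fitted model.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2) for row-wise samples X (n x p) and Z (m x p)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off

def svm_decision(X_new, support_vectors, dual_coef, intercept, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; f > 0 -> positive class (GA).

    dual_coef holds the products alpha_i * y_i for each support vector s_i.
    """
    K = rbf_kernel(X_new, support_vectors, gamma)
    return K @ np.asarray(dual_coef, dtype=float) + intercept
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood and the decision boundary can bend more tightly around the training samples; this is why C and γ are tuned jointly.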

Artificial Neural Network (ANN)
ANN is inspired by the biological nervous system and consists of a large number of interconnected processing units that resemble biological neurons. The backpropagation (BP) neural network is one of the most widely used neural network models; its weights are adjusted by the error backpropagation algorithm. The learning process consists of two phases: forward propagation of the signal and backpropagation of the error. In this study, the "nnet" package was used to fit single-hidden-layer BP neural networks. The networks passed the training samples forward through the neurons of the hidden layer, where each neuron's activation function, applied to the weighted SVI inputs, determined the signal sent to the output layer. The output layer generated the classification results, and the output errors were calculated and propagated back to the hidden layer. The connection weights were then adjusted according to these errors. The learning process stopped after 100 iterations. We optimized the size (the number of units in the hidden layer) and the weight decay parameter [44,72,73].
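The forward and backward passes of a single-hidden-layer network with weight decay can be sketched in NumPy. This is a simplified stand-in for nnet, not its implementation: nnet minimizes its objective with the BFGS method, whereas this sketch takes plain full-batch gradient-descent steps, and the learning rate, initial-weight range, cross-entropy loss, function names and the choice not to penalize biases are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_hidden_layer(X, y, size=5, decay=0.01, lr=0.5, maxit=100, seed=0):
    """Fit a 1-hidden-layer logistic network by full-batch gradient descent.

    Loss = mean cross-entropy + decay * (sum of squared weights); biases are not penalized here.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, p = X.shape
    W1, b1 = rng.uniform(-0.5, 0.5, (p, size)), np.zeros(size)
    W2, b2 = rng.uniform(-0.5, 0.5, (size, 1)), np.zeros(1)
    losses = []
    for _ in range(maxit):  # stop after a fixed number of iterations
        # Forward pass: inputs -> hidden activations -> probability of the positive class
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        # 1e-12 only guards against log(0) when reporting the loss
        ce = -np.mean(y * np.log(out + 1e-12) + (1 - y) * np.log(1 - out + 1e-12))
        losses.append(ce + decay * ((W1 ** 2).sum() + (W2 ** 2).sum()))
        # Backward pass: propagate the output error to the hidden layer
        d_out = (out - y) / n                    # gradient w.r.t. output pre-activation
        d_hid = (d_out @ W2.T) * H * (1 - H)     # gradient w.r.t. hidden pre-activations
        gW2, gb2 = H.T @ d_out + 2 * decay * W2, d_out.sum(axis=0)
        gW1, gb1 = X.T @ d_hid + 2 * decay * W1, d_hid.sum(axis=0)
        # Update all weights using gradients computed from the pre-update weights
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict_class(params, X):
    """Class 1 if the network's output probability exceeds 0.5, else class 0."""
    W1, b1, W2, b2 = params
    prob = sigmoid(sigmoid(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2).ravel()
    return (prob > 0.5).astype(int)
```

Here size and decay correspond to the two tuned parameters; increasing decay shrinks the weights toward zero, which smooths the fitted decision boundary and limits overfitting on a 160-sample training set.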

Optimizing Hyperparameters of the Three Classifiers
To obtain the best classification models, an optimization process was implemented to determine the best parameters of each classifier. The best parameter combination for each classifier was determined by random search with ten repetitions of ten-fold cross-validation. We randomly searched 20 parameter combinations and used the area under the receiver operating characteristic (ROC) curve (AUC) as the evaluation index for the classification models. The train function in the "caret" package was used to fit predictive models over the different tuning parameters, and it automatically selected the parameter combination with the largest AUC [44].
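Random search with AUC scoring can be sketched generically as below. In caret, the scoring step is the repeated cross-validation performed by train; here it is represented by a user-supplied function, and the function names and the log-uniform ranges for C and gamma are hypothetical, not the ranges caret samples.

```python
import numpy as np

def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def random_search(cv_score, sample_params, n_iter=20, seed=0):
    """Draw n_iter parameter sets and keep the one with the highest cross-validated score."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_iter):
        params = sample_params(rng)
        score = cv_score(params)  # e.g. mean AUC over 10 x 10-fold cross-validation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def sample_svm_params(rng):
    """Hypothetical log-uniform search ranges for an RBF SVM."""
    return {"C": 2.0 ** rng.uniform(-5, 10), "gamma": 2.0 ** rng.uniform(-10, 3)}
```

Because AUC is computed from the ranking of predicted scores rather than from a fixed 0.5 cut-off, it rewards models that order GA trees above healthy ones even when their probability outputs are not well calibrated.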

Evaluation of Model Performance
Because the test set was not used at any point during model training or tuning, it could evaluate the models without bias [71]. The test samples were input into the classifiers, and the predicted and true classes formed confusion matrices. Four metrics derived from these matrices were used to evaluate the classification models: sensitivity, specificity, precision and accuracy (Table 2) [44,72]. In this study, GA trees were defined as the positive class and healthy trees as the negative class.
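With GA as the positive class, the four metrics follow directly from the confusion-matrix counts. The sketch below assumes the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP), accuracy = (TP + TN)/N); Table 2 is not reproduced here, so it should be checked for the exact formulas used. The function name is hypothetical.

```python
import numpy as np

def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision and accuracy with `positive` (GA) as the positive class."""
    actual = np.asarray(y_true) == positive
    predicted = np.asarray(y_pred) == positive
    tp = int(np.sum(actual & predicted))    # GA trees classified as GA
    fn = int(np.sum(actual & ~predicted))   # GA trees missed (classified as healthy)
    tn = int(np.sum(~actual & ~predicted))  # healthy trees classified as healthy
    fp = int(np.sum(~actual & predicted))   # healthy trees flagged as GA
    ratio = lambda a, b: a / b if b else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

For example, on a hypothetical 40-sample test set with 20 GA and 20 healthy trees in which 17 GA trees and 14 healthy trees are classified correctly, this gives a sensitivity of 0.85, a specificity of 0.70, a precision of 17/23 and an accuracy of 31/40.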

Variable Importance
Variable importance was calculated using the varImp function in the "caret" package. This generic function evaluates variable importance in two ways: from model-specific information for RF and ANN, and with a "filter" approach for SVM [75]. The importance scores were scaled to between 0 and 100 and then ranked.
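Scaling and ranking can be sketched as below, assuming a min-max rescaling to 0-100 (which is how caret's scale = TRUE option is generally described); the raw scores would come from each model's own importance method. The function name is hypothetical, and any index names and scores passed to it in an example are placeholders, not results from this study.

```python
import numpy as np

def scale_and_rank(importance, top=10):
    """Min-max scale raw importance scores to 0-100 and return the `top` entries, highest first."""
    names = list(importance)
    raw = np.array([importance[name] for name in names], dtype=float)
    span = raw.max() - raw.min()
    scaled = (raw - raw.min()) / span * 100.0 if span > 0 else np.full(raw.size, 100.0)
    order = np.argsort(-scaled, kind="stable")  # stable sort: ties keep their input order
    return [(names[i], float(scaled[i])) for i in order[:top]]
```

Because each model is rescaled separately, a score of 100 only means "most important within that model", which is why rankings rather than raw scores are compared across RF, SVM and ANN.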

Spectral Reflectance Analysis
The Mann-Whitney U-test showed that the spectral reflectance of healthy and GA trees differed significantly (p < 0.05) at 954 wavebands, of which 740 were highly significant (p < 0.01) (Figure 5). The spectral reflectance of GA trees decreased significantly in the NIR plateau, while the differences in the visible and SWIR regions were smaller. Compared with healthy trees, the red-edge position of GA trees shifted toward shorter wavelengths ("blue shift") and the red-edge slope decreased (Figure 6). In the SWIR region, the reflectance in both water absorption features (near 1450 and 1950 nm) increased slightly, but only the change in the second feature (near 1950 nm) was significant. The mean reflectance of GA trees decreased slightly in both 1625–1743 nm and 2238–2260 nm.
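The red-edge position (REP) and slope can be extracted from a smoothed spectrum as the wavelength and value of the maximum first derivative in the red-edge region. The sketch assumes this maximum-derivative definition and a 680-760 nm window (the red-edge range cited in the Discussion); other REP definitions, such as linear interpolation or inverted-Gaussian fitting, give slightly different values. The function name is hypothetical.

```python
import numpy as np

def red_edge(wavelengths, reflectance, lo=680.0, hi=760.0):
    """Return (REP in nm, slope per nm): location and value of the maximum first derivative in [lo, hi]."""
    wl = np.asarray(wavelengths, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    deriv = np.gradient(refl, wl)  # central differences in the interior
    in_window = (wl >= lo) & (wl <= hi)
    i = int(np.argmax(np.where(in_window, deriv, -np.inf)))
    return float(wl[i]), float(deriv[i])
```

A "blue shift" then shows up as a smaller REP for GA spectra than for healthy spectra, and a decreased red-edge slope as a smaller maximum derivative.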

Classification Models
Three ML algorithms were used to separate early-infested from healthy trees, and all classification models achieved acceptable accuracy (Figure 7). RF performed best, with a mean accuracy of 78.50%, followed by ANN (75.75%) and SVM (73.25%). The sensitivity of ANN was significantly higher than that of RF and SVM (p < 0.01), while RF had the most stable sensitivity (75%). RF had the highest specificity, and there was no significant difference between ANN and SVM. For precision, RF performed better than ANN and SVM, with a mean precision of 80.75%. The best classification models for the three ML algorithms are shown in Table 3. The RF and ANN models had the same accuracy (80%), but the sensitivity of the ANN model was higher.

Important Variables
The variable importance scores for the three best classifiers (Table 3) were ranked, and the top 10 were plotted (Figure 8). The three models evaluated variable importance with different criteria and therefore produced different rankings. However, some indices were important in several models. DID, CUR and TBSI were ranked in the top 10 by all three models. DDn2, SR1 and D735 were important for the RF and SVM models, while NSMI was important for the RF and ANN models. In addition, RNIR·CRI550 and RVSI were important for the ANN model, ranking third and fifth, respectively. The spectral regions covered by these indices include green, red, red-edge, NIR and SWIR.

Discussion
Early detection of RTB infestation allows timely action to halt its spread and prevent a pest epidemic. In this study, ground-based hyperspectral data combined with ML algorithms were used to detect pines infested by RTB before obvious discoloration. We determined the spectral regions and important SVIs to differentiate GA pines from the healthy pines.

Spectral Changes at the GA Stage
Differences in spectral reflectance between the GA and healthy classes are shown in Figure 5. Compared with healthy trees, the changes in the visible region (400–760 nm) of infested trees were subtle, which is inconsistent with previous studies [28,35,36,38] that found significant increases in reflectance and identified the green peak (near 550 nm) and red edge (680–760 nm) as important features for detecting stress. Those spectral changes reflect changes in needle pigment content, particularly chlorophyll degradation [76]. In this study, however, changes were observed only in red-edge features (position and slope), while the green peak changed little. This may be because the pigment content of the RTB-infested pines changed only slightly. Cheng et al. [37] also found that chlorophyll content did not decrease significantly in MPB-infested trees at the GA stage but did decrease in girdled trees. Moreover, the blue and red regions are strong pigment absorption bands with high absorption coefficients, so even low pigment concentrations can saturate absorption there, making these regions insensitive to slight pigment changes [28]. Ahern [35] also found that the absorption at 676 nm was still present in yellow needles. Thus, mild pigment changes may not be reflected in the spectrum.
There were evident differences in the NIR and lower SWIR regions (760–1394 nm) between healthy and infested trees in this study. The decrease in the reflectance of infested trees in this region is in line with previous studies on other bark beetles [31,38,40]. The high reflectance in this region results from scattering within the leaf, and the region contains several absorption features associated with water and other chemicals (e.g., starch, protein, lignin and cellulose) [76]. The decrease in reflectance may be due to changes in the internal structure of the needles, such as cell arrangement or the cell-wall/air-space interface [77]. In addition, stomatal closure to conserve water may also reduce NIR reflectance [78]. Although the mechanism is unknown, the obvious differences suggest that this region has potential for the early detection of RTB infestation, as previous studies have also noted for the early detection of stress [8,35,37].
In the SWIR region, reflectance is mainly related to foliar moisture content, with water absorption features centered at approximately 1450 and 1950 nm [79]. The increase in reflectance from 1906 to 1990 nm may result from water stress after the RTB attack. Besides water, several organic compounds, such as lignin, cellulose, starch, sugar, protein, oil and nitrogen, contribute to the spectral features of the SWIR region: vibrations of the O-H, C-H and N-H bonds in these substances produce minor absorption features in the NIR and SWIR regions [76,77]. The differences in spectral reflectance at 1625–1743 nm and 2238–2260 nm could therefore be attributed to changes in the concentrations of these substances. Our findings are partially consistent with those of Abdullah et al. [38], who found significant differences in two SWIR regions (1430–1500 nm and 1897–2000 nm) between healthy trees and trees in the early stage of European spruce bark beetle infestation and attributed this variation to lower water and nitrogen content in the green-attack samples.
In general, we found that the reflectance in the NIR region decreased significantly after the RTB attack, and the changes were earlier than those in the visible region, implying the potential of this region for early detection.

Classification Models and Variable Importance Evaluation
Three ML algorithms, which were widely used in processing remote sensing data, were used to build classification models of early-infested and healthy pines in this study.
During model tuning, a random search was used to find the best parameters; unlike a grid search, it does not restrict candidates to a fixed set of values and can sample anywhere in the parameter space [44]. To better evaluate the performance of the models, we established ten models with each ML algorithm and compared them using one-way ANOVA. Although RF had the highest average accuracy, sensitivity deserves more attention from forest managers because it reflects how many infested pines are missed (1 - sensitivity is the proportion of infested trees classified as healthy). By this criterion, ANN performed best, with an average sensitivity of 79.5%. However, the same algorithm may perform differently in different studies [9,33,34], and the choice of algorithm should be made according to the particular situation. In addition, deep learning has been shown to perform well in remote sensing data processing [71], and further research could combine deep learning with hyperspectral data to explore the early detection of RTB infestation.
Numerous studies have established correlations between SVIs and the physiological or biochemical properties of plants and indicated that SVIs can be effective indicators of stress [40,80]. SVIs are also used to reduce the influence of viewing geometry, background and illumination conditions, which is important for detection at larger scales (e.g., canopy or individual tree) [77]. In this study, SVI computation and feature selection were performed to transform the high-dimensional spectral reflectance before modeling, which removed redundant information and simplified the models. The classification results show that SVIs can distinguish early-infested trees from healthy trees.

Implications for Remote Sensing
Our study identified the spectral changes of P. tabuliformis in the early stage of RTB attack and determined sensitive spectral regions for early detection. These findings can guide the selection of sensors for large-scale detection. For example, Sentinel-2 has six bands with center wavelengths between 729 and 1386 nm, the range in which the reflectance of infested trees decreased significantly, suggesting its potential for the early detection of RTB damage at the stand or landscape scale. Moreover, we screened out several important SVIs for the early detection of RTB infestation, and similar indices might be computed from Sentinel-2 bands, such as TBSI, which combines green, red and red-edge bands. Continuous SVI images derived from time series of Earth observation (EO) data may enable monitoring of the development and spread direction of RTB attacks over time [17,81]. The same reasoning applies to UAV-based remote sensing data. However, several points need to be kept in mind when extending these findings to the canopy level. First, the effects of canopy structure on the spectrum may be confounded with those of needle internal structure, which could lead to uncertain results [25,31]. Second, environmental factors such as atmospheric moisture, illumination and understory background, which may reduce the signal-to-noise ratio, need to be considered when using UAVs or satellites to identify infested individual trees [27,28]. Finally, the timing of data acquisition is critical for the early detection of stress using remote sensing [28,78]. Balancing the beetle's biology against detection accuracy, data should be acquired as early as possible to leave time for subsequent control measures.
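To check whether a broadband sensor could capture these changes, 1-nm laboratory spectra can be resampled to Sentinel-2-like bands. The sketch below averages reflectance over a rectangular (boxcar) window, which is a simplification: real Sentinel-2 spectral response functions are not rectangular, and the rounded centre wavelengths and bandwidths listed are approximate values that should be checked against ESA's published spectral response functions. The function and dictionary names are hypothetical.

```python
import numpy as np

# Approximate Sentinel-2 (MSI) band centres and bandwidths in nm; rounded values, an assumption
# to be checked against ESA's spectral response functions before real use.
S2_APPROX_BANDS = {"B5": (705, 15), "B6": (740, 15), "B7": (783, 20), "B8A": (865, 20)}

def boxcar_band(wavelengths, reflectance, center, width):
    """Mean reflectance within center +/- width/2 (a rectangular stand-in for the real response)."""
    wl = np.asarray(wavelengths, dtype=float)
    mask = np.abs(wl - center) <= width / 2
    return np.asarray(reflectance, dtype=float)[..., mask].mean(axis=-1)

def simulate_s2(wavelengths, reflectance, bands=S2_APPROX_BANDS):
    """Resample 1-nm spectra (one spectrum or samples x bands) to the broad bands in `bands`."""
    return {name: boxcar_band(wavelengths, reflectance, c, w) for name, (c, w) in bands.items()}
```

Comparing simulated band values of GA and healthy spectra band by band gives a first indication of whether the NIR decrease survives the coarser spectral sampling, before canopy structure and atmospheric effects are considered.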

Conclusions
This is the first study of the spectral changes of P. tabuliformis at the GA stage after RTB attack. It explored the possibility of the early detection of RTB infestation using hyperspectral data. The main conclusions are as follows: (1) The spectral reflectance of early-infested Chinese pines differed significantly from that of healthy trees at 954 wavebands between 400 and 2400 nm, and the NIR region was more sensitive to RTB attack than the visible and SWIR regions.
(2) Early-infested trees could be distinguished using SVIs and ML algorithms. The artificial neural network (ANN) classifier performed best in terms of sensitivity, with an average sensitivity of 79.5%.
In conclusion, this study demonstrated the potential of remote sensing technology for the early detection of RTB infestation. The results provide a reference for larger-scale early detection based on UAV and satellite platforms. Further research is required to examine changes in the physiological and biochemical properties of needles, which could explain the spectral changes. The spectral vegetation indices used in this study were proposed in previous studies; future research could develop more sensitive indices based on spectral and biochemical properties to make remote sensing more operational for the early detection of RTB infestation.