1. Introduction
Pine wilt disease (PWD), known as the pine killer, poses a significant threat to pine forests globally [1]. This disease is caused by the pine wood nematode, which infiltrates and reproduces within the pine tree, ultimately resulting in the tree’s demise [2].
At present, effective prevention and control consists of manually felling the infected pine trees and burning them centrally. Additionally, a special agent is sprayed on the stumps of the diseased trees, which are then sealed to prevent secondary transmission. An important prerequisite for these control measures is the identification and localization of infected pine trees, which is achieved through the detection of diseased trees. Traditional monitoring of pine wilt disease mainly relies on manual detection. Staff observe the appearance and surface morphology of the trees and judge them by the color changes characteristic of infected pine trees, such as yellowish-brown and reddish-brown [3]. This method suffers from poor timeliness and large recognition errors, which makes effective epidemic monitoring difficult.
Compared with manual detection, aerial remote sensing image monitoring has the advantages of wide coverage, low labor intensity, and high efficiency. However, satellite remote sensing image monitoring is costly, offers low resolution and poor timeliness, and is easily disturbed by natural environmental factors.
With the development of unmanned aerial vehicle (UAV) technology, high-resolution UAV imagery has brought great convenience to monitoring tasks in various industries [4,5,6]. Aerial images obtained by UAVs offer a feasible means of detecting infected trees early. In recent years, deep learning has developed rapidly and has been applied in many research fields, such as image defogging [7], face recognition [8,9], and video object segmentation [10]. Owing to its powerful feature extraction capabilities, researchers have begun to combine UAV remote sensing with deep learning in various fields [11].
Xu Xinluo et al. [12] used the Faster R-CNN algorithm to automatically identify and locate trees infected with pine wilt disease, achieving a recognition accuracy of 82.42%. However, this method uses a two-stage detection network, so inference is slow. Additionally, the amount of data in the experiment is small, which is insufficient for the practical monitoring of pine wilt diseased trees. Li Fengdi et al. [13] used an improved YOLOv3-CIoU algorithm to detect pine wilt diseased trees and improved detection accuracy by adopting the more accurate CIoU regression loss function. However, the research area was only 0.275 square kilometers, so the results are not representative. Bingxi Qin et al. [14] used an improved YOLOv5 algorithm to detect pine wood nematode disease in multispectral UAV data and achieved relatively high identification accuracy. However, multispectral data are inefficient to acquire, place high technical demands on UAV flights, and make diseased tree images costly to obtain. As a result, this approach is not suitable for large-scale identification of pine wilt diseased trees.
Although the above studies have achieved certain results in the detection of pine wilt diseased trees (PWDT), they share the following problems: a single image in the training dataset contains too few small diseased tree samples, and complex backgrounds easily interfere with the extraction of small diseased tree features from the shallow feature maps. As a result, a large number of small-sized diseased trees are missed during detection.
To address these issues, this paper proposes a Shallow Pooled Weighted Feature Enhancement Network (SPW-FEN), based on Small Target Extension (STE), for detecting small PWDT in UAV images. The proposed network takes advantage of both shallow and deep features and applies pooling and weighting schemes to enhance the discriminative power of the features. First, two shallow feature maps are used to split the prediction output for small-sized diseased trees. At the same time, a Pooled Weighted Channel Attention (PWCA) module is introduced into the shallow layers of the FPN structure to enhance the feature response of small diseased trees in the shallow feature maps, which strengthens the algorithm’s ability to extract the features of small-sized diseased trees.
In addition, in small-target detection, data augmentation can increase the number of samples in the training set through rotation, translation, scaling, and similar operations, thereby improving the detection ability of the model. Recent YOLO series algorithms, such as YOLOv4 and YOLOv5, use data enhancement to increase the number of small targets and thus improve small-target detection. However, the samples expanded by these methods suffer from deformation and color gamut transformation, and the methods expand the sample set as a whole without noticeably expanding the small target samples, so they are not applicable to the small-scale pine wilt diseased trees studied in this paper. We therefore propose the STE data enhancement method, which increases the number of small and medium diseased tree samples in a single image while improving the robustness of the algorithm.
The proposed network is evaluated on the pine wilt diseased trees dataset containing UAV images of PWDT. Experimental results show that SPW-FEN outperforms several state-of-the-art methods in terms of detection accuracy, especially for small PWDT. In addition, a comprehensive analysis is performed to study the effectiveness of the proposed pooling and weighting schemes, as well as the contribution of shallow and deep features.
The remainder of this paper is organized as follows: Section 2 describes in detail the pine wilt diseased tree dataset produced for this paper, the experimental environment, the design of the experimental parameters, and the proposed SPW-FEN method. In Section 3, the results of our comparative and ablation experiments are summarized and analyzed. Section 4 discusses the findings and future directions, and Section 5 concludes the paper.
3. Results
In this section, we first introduce our experimental environment and the evaluation metrics used for our experimental results. We then compare our algorithm with several current mainstream object detection algorithms on our dataset. Finally, ablation experiments are conducted for the proposed modules.
3.1. Experimental Environment and Parameter Setting
The detection algorithm in this paper is implemented in the PyTorch framework and runs on an NVIDIA GeForce RTX 3090 GPU. The network model was trained on our self-built PWDT dataset for a total of 120 epochs, with the learning rate adjusted at the 80th and 110th epochs. The initial learning rate was set to 0.0001, and the batch size was set to four. The experimental environment and experimental parameter settings are shown in Table 3.
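As a rough illustration of the training schedule described above, the following sketch computes the learning rate for each of the 120 epochs. The text only states that the rate is adjusted at the 80th and 110th epochs; the decay factor `gamma = 0.1` and the zero-based epoch indexing are assumptions made here for illustration, and the function name is hypothetical.

```python
# Step learning-rate schedule sketch.
# From the text: 120 epochs, initial rate 0.0001, adjustments at epochs 80 and 110, batch size 4.
# Assumed (not stated in the text): decay factor gamma = 0.1, zero-based epoch indexing.
EPOCHS = 120
BATCH_SIZE = 4
BASE_LR = 1e-4
MILESTONES = (80, 110)
GAMMA = 0.1  # assumption


def learning_rate(epoch, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Multiply base_lr by gamma once for every milestone already reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


schedule = [learning_rate(epoch) for epoch in range(EPOCHS)]
print(schedule[0], schedule[80], schedule[110])
```

In a PyTorch training loop, the same behavior is usually obtained with a multi-step scheduler rather than a hand-written function.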
3.2. Evaluation Metric
Target detection algorithm evaluation indicators are mainly divided into two categories: classification indicators and localization indicators.
Classification indicators: These mainly measure the algorithm’s ability to classify the target category. Commonly used indicators are Accuracy, Precision, Recall, and F1-score. Accuracy measures the overall classification performance of the algorithm, while precision and recall focus on the classification of a single target category. The F1-score combines precision and recall and can therefore evaluate the classification ability of the algorithm more comprehensively. It is defined as the harmonic mean of precision and recall:
$$F1 = \frac{2 \times P \times R}{P + R}$$
Localization indicators: These mainly measure the algorithm’s ability to localize targets. Commonly used indicators are the Intersection over Union (IoU), average precision (AP), and mean average precision (mAP). The IoU measures how accurately the algorithm localizes a target. AP originates from the evaluation of image retrieval results: the predictions are compared with the ground truth labels and ranked by score from high to low, and the precision is calculated at each recall level. AP is then the area under the resulting precision-recall curve, that is, the average of the precision over all recall levels. mAP is the mean of AP over all target categories and thus reflects both the classification and the localization capabilities of the algorithm. Precision, recall, and AP are calculated as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_{0}^{1} P(R)\,dR$$
where TP represents the number of samples with actual positive labels that are correctly classified as positive, FP indicates the number of samples with actual negative labels that are incorrectly classified as positive, FN denotes the number of samples with actual positive labels that are incorrectly classified as negative, P represents precision, and R represents recall.
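The indicators described above can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the paper: the function names are hypothetical, boxes are assumed to be in `(x1, y1, x2, y2)` format, and AP is computed here with all-point interpolation of the precision-recall curve, which is one common convention that the text does not specify.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve (all-point interpolation, assumed).

    detections: list of (confidence, is_true_positive) pairs.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    curve = []  # (recall, precision) after each ranked detection
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((tp / num_ground_truth, tp / (tp + fp)))
    ap, previous_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(curve):
        # Use the best precision reachable at this recall or any higher recall.
        best_precision = max(p for _, p in curve[i:])
        ap += (recall - previous_recall) * best_precision
        previous_recall = recall
    return ap


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(precision_recall_f1(tp=8, fp=2, fn=2))
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2))
```

A detection is normally counted as a true positive when its IoU with an unmatched ground truth box exceeds a chosen threshold; that matching step is omitted here for brevity.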
In practical scenarios, object detection algorithms are evaluated based on both classification and localization indicators to comprehensively assess their performance. However, for specific applications, different indicators may need to be selected based on the specific conditions and requirements.
In the case of detecting pine wilt diseased trees, the priority is to minimize missed detections to prevent the spread of the disease. Hence, this study uses recall rate and average precision as performance indicators, where the recall rate measures the proportion of all annotated positives that are correctly detected. The model’s recall rate should be as high as possible while a high AP is maintained.
3.3. Comparative Experimental Results
To verify the performance of our proposed network model, we compared seven current mainstream target detection algorithms with our proposed detection algorithm on the PWDT dataset; the experimental results are shown in Table 4.
The experimental results show that, compared with the classic network Faster-RCNN [22] and the mainstream networks SSD [23], YOLOv3 [24], ATSS [25], YOLOF [26], FoveaBox [27], and YOLOv6 [28], the proposed detection algorithm achieves the best detection results, with a recall and AP of 86.9 and 79.1, respectively. The visual comparison of the detection results of each network on the test set is shown in Figure 9.
The comparison on the two test samples in Figure 9 shows that the SPW-FEN algorithm proposed in this paper recognizes small-sized pine wilt diseased trees best. YOLOv3, Faster-RCNN, and ATSS all have obvious missed detections, whereas the proposed method greatly alleviates the missed detection of small-sized diseased trees.
3.4. Ablation Study
To further analyze the impact of the proposed channel attention module and data enhancement module on network performance, we used RetinaNet as the base network and evaluated the effectiveness of the designed method in three respects: the shunt prediction output for small-sized diseased trees, anchor box recalibration, and the PWCA module. The specific experimental analysis data are shown below.
3.4.1. Small-Scale Diseased Tree Shunt Prediction Output
To verify the effectiveness of the proposed shunt prediction output for small-sized diseased trees, a comparative experiment was designed in which small-scale diseased trees are predicted either from the P3 layer alone or from both the P2 and P3 layers. The detection results and the specific experimental data are shown in Table 5.
It can be seen from Table 5 that when only the P3 prediction feature map is used to predict small-scale diseased trees, the recall rate is 82.1 and the precision is only 77.1. When the P2 and P3 prediction feature maps are used together, the recall rate increases by 1.2 percentage points and the precision by 0.9 percentage points, reaching 83.2 and 78.0, respectively. This shows that splitting the prediction output for small-scale diseased trees is necessary.
3.4.2. Recalibration of Anchor Boxes
According to the distribution of target scales in the dataset, the anchor sizes on the prediction feature maps P2 to P5 are set to 16 × 16, 36 × 36, 78 × 78, and 140 × 140, respectively, and each anchor size is combined with three aspect ratios and three area ratios. Based on the size, aspect ratio, and area of the anchor box, nine kinds of anchors are redesigned at each pixel of each prediction feature layer. The comparison between the anchor sizes in the original algorithm and the recalibrated anchor sizes is shown in Table 6.
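To make the nine-anchor construction concrete, the following sketch enumerates the anchor shapes for each pyramid level. The base sizes come from the text; the three aspect ratios and the three scale multipliers used here are the defaults of the original RetinaNet and are an assumption, since the values used in this paper are not reproduced above. The function name is hypothetical.

```python
import math

# Base anchor sizes per pyramid level, as stated in the text.
ANCHOR_SIZES = {"P2": 16, "P3": 36, "P4": 78, "P5": 140}

# Assumed values (RetinaNet defaults), not taken from this paper.
ASPECT_RATIOS = (0.5, 1.0, 2.0)                      # height / width
SCALE_MULTIPLIERS = (1.0, 2 ** (1 / 3), 2 ** (2 / 3))  # applied to the side length


def anchors_for_level(base_size, ratios=ASPECT_RATIOS, scales=SCALE_MULTIPLIERS):
    """Return the (width, height) of every scale/ratio combination at one level."""
    shapes = []
    for scale in scales:
        side = base_size * scale
        area = side * side
        for ratio in ratios:
            # Keep the area fixed while changing the height/width ratio.
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


for level, size in ANCHOR_SIZES.items():
    shapes = anchors_for_level(size)
    print(level, len(shapes), [(round(w, 1), round(h, 1)) for w, h in shapes[:3]])
```

Each level yields 3 × 3 = 9 shapes, which are then placed at every pixel of the corresponding prediction feature map.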
From the data in Table 6, it can be seen that adjusting the anchor size effectively changes the detection results for diseased trees. The original RetinaNet [17] network has no P2 layer, and its detection precision and recall rate for diseased trees are low. When the P2 layer is added and its anchor size is adjusted, the precision and recall rate improve significantly. When the anchors of the P2, P3, P4, and P5 layers are set to 16 × 16, 36 × 36, 78 × 78, and 140 × 140, respectively, the recall rate and precision reach 85.4 and 78.4, which are 3 and 1.3 percentage points higher, respectively, than without any adjustment of the anchor size.
3.4.3. Pooled Weighted Channel Attention (PWCA) Module
To validate the effectiveness of the proposed Pooled Weighted Channel Attention (PWCA) module, this section investigates the influence of global average pooling and global maximum pooling on the detection of pine wilt diseased trees by adjusting the weighting parameters (λ, β). Additionally, the impact of dimensionality reduction (the MLP network) on the performance of the attention mechanism is analyzed through experiments. The specific experimental data are presented in Table 7.
From the experimental results presented in Table 7, it can be observed that, at the corresponding setting of λ and β in Table 7, the attention mechanism is referred to as ECA [29]; utilizing one-dimensional convolution instead of the dimensionality compression operation of the MLP network improves the accuracy by 0.7% compared with the baseline. When the dimensionality compression operation of the MLP network is employed, the attention mechanism becomes CBAM [30]. Substituting the MLP network in the CBAM attention mechanism with one-dimensional convolution leads to a 0.5% increase in accuracy compared with the baseline. By adjusting λ and β and analyzing the weighting experiments, it is found that one setting of λ and β in Table 7 gives the attention mechanism the highest recognition accuracy for diseased trees. The pooled weighted channel attention (PWCA) thus achieves the highest experimental accuracy and yields the best detection results for diseased trees. The experimental results on the pine wilt diseased tree dataset indicate that the MLP network has a detrimental effect on the channel attention mechanism: it proves to be inefficient and unnecessary for capturing dependencies among all channels. Moreover, because a single image contains few pine wilt diseased tree targets, the PWCA attention mechanism recognizes diseased trees better when a larger weight is placed on global maximum pooling.
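The idea of weighting the two pooling branches and replacing the MLP with a one-dimensional convolution can be sketched as follows. This is an illustrative reading of the description above, not the authors’ implementation: it assumes that λ weights global average pooling and β weights global maximum pooling, that the fused descriptor is passed through a zero-padded one-dimensional convolution over the channel axis followed by a sigmoid, and that the kernel weights (learned in practice) are supplied by the caller. The function name is hypothetical.

```python
import math


def pooled_weighted_channel_attention(feature_map, lam, beta, kernel):
    """Reweight channels using a weighted mix of average and max pooling.

    feature_map: list of C channels, each a list of rows of floats.
    lam, beta:   assumed weights of global average / global max pooling.
    kernel:      odd-length 1D convolution kernel over the channel axis.
    Returns (reweighted feature map, per-channel attention weights).
    """
    # 1. One descriptor per channel: lam * average + beta * maximum.
    descriptors = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        descriptors.append(lam * sum(values) / len(values) + beta * max(values))

    # 2. 1D convolution across neighboring channels (zero padding, same length).
    pad = len(kernel) // 2
    padded = [0.0] * pad + descriptors + [0.0] * pad
    weights = []
    for c in range(len(descriptors)):
        response = sum(k * padded[c + i] for i, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-response)))  # sigmoid gate

    # 3. Scale every channel by its attention weight.
    reweighted = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return reweighted, weights


# Two 2x2 channels: an empty background channel and one with a small bright target.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 3.0], [0.0, 0.0]],
]
_, attention = pooled_weighted_channel_attention(
    features, lam=0.5, beta=0.5, kernel=[0.0, 1.0, 0.0]
)
print(attention)
```

Because a small target occupies few pixels, its channel average stays low while its maximum stays high, which is one intuition for why a larger weight on the maximum branch could help in this setting.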
3.4.4. Comprehensive Experimental Analysis
To further analyze the impact of the proposed channel attention module and data enhancement module on network performance, we designed ablation experiments in which each module is added to the RetinaNet algorithm. The results of the ablation experiments are shown in Table 8.
It can be seen from Table 8 that, after anchor recalibration and the shunt prediction output are applied to the RetinaNet network, the recall increases by 3.0, from 82.4 to 85.4, and the AP increases by 1.3, from 77.1 to 78.4. Adding PWCA to the shallow feature maps of the RetinaNet algorithm increases the recall by 1.9 and the AP by 1.1. Adopting the STE data enhancement method in the RetinaNet algorithm improves the recall and AP by 3.1 and 0.6, respectively. When the PWCA module and STE data enhancement are used together in the RetinaNet network, the recall improves by 4.5 and the AP by 2.0.
As shown in Figure 10a, the picture is overexposed and appears brighter than under natural light. As shown in Figure 10b–d, the pictures produced by the original mosaic enhancement have lost the red and yellowish-brown color characteristics of PWDT. Experiments show that these low-quality samples are mainly generated by the HSV transformation applied to the image during sample enhancement [31]. To solve this problem, the STE data enhancement proposed in this paper removes the HSV transformation, which significantly improves the quality of the enhanced samples. The red circles in Figure 10 mark the small-sized diseased trees obtained with the STE data enhancement method. While the same quality as the original sample is maintained, the fixed scaling factor significantly increases the number of small target samples in the enhanced sample. The original images contain only one small target or none at all, whereas the number of small target samples in the transformed samples increases by four to eleven, which effectively alleviates the problem of too few positive samples during training.
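The effect of a fixed scaling factor on the number of small targets can be illustrated with bounding boxes alone. The sketch below is not the STE implementation: it assumes a mosaic-style layout in which four images are each shrunk by a fixed factor of 0.5 and tiled in a 2 × 2 grid with no color transformation, and it assumes that a "small" target is one whose box area is below 32 × 32 pixels. Both the layout and the threshold, as well as the function names, are assumptions made for illustration.

```python
SMALL_AREA = 32 * 32  # assumed small-target threshold, not taken from the paper


def count_small(boxes, max_area=SMALL_AREA):
    """Count boxes (x1, y1, x2, y2) whose area is below the small-target threshold."""
    return sum(1 for x1, y1, x2, y2 in boxes if (x2 - x1) * (y2 - y1) < max_area)


def tile_with_fixed_scale(per_image_boxes, image_size, scale=0.5):
    """Shrink four images by a fixed scale and tile them 2x2; return the moved boxes."""
    tile = image_size * scale
    offsets = [(0.0, 0.0), (tile, 0.0), (0.0, tile), (tile, tile)]
    merged = []
    for boxes, (dx, dy) in zip(per_image_boxes, offsets):
        for x1, y1, x2, y2 in boxes:
            merged.append((x1 * scale + dx, y1 * scale + dy,
                           x2 * scale + dx, y2 * scale + dy))
    return merged


# Four hypothetical 1024 x 1024 images, each with one 50 x 50 diseased tree box.
originals = [
    [(100, 100, 150, 150)],
    [(300, 200, 350, 250)],
    [(10, 10, 60, 60)],
    [(500, 500, 550, 550)],
]
merged = tile_with_fixed_scale(originals, image_size=1024)
print(sum(count_small(b) for b in originals), count_small(merged))
```

In this toy case none of the original boxes counts as small, while all four boxes in the tiled sample do, so a single training image now carries several small targets without any change to their color.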
4. Discussion
Through the analysis and statistics of the scale size of the diseased trees in the pine wilt diseased tree dataset, we found that the number of small-scale diseased trees is small, which is not enough for the network model to learn the characteristics of small-scale diseased trees. At the same time, we found that in drone footage, the small-scale diseased tree only occupies a small part of the pixel area in the image, and most of the pixel areas are background pixels. This background information seriously interferes with the feature extraction of the small-scale diseased tree.
To address background interference, more and more researchers have begun to use attention mechanisms [32,33]. Therefore, in this paper, we propose a Pooled Weighted Channel Attention module to alleviate background interference in small-scale diseased tree feature extraction. We studied the relative importance of global maximum pooling and global average pooling for feature learning, and a large number of experiments showed that, for the detection of small-scale diseased trees, global maximum pooling contributes more than global average pooling. Through the weighted fusion of global maximum pooling and global average pooling, we explored the weight parameters most suitable for small-scale diseased tree detection.
On the other hand, after analyzing the advantages and disadvantages of existing data enhancement methods for small-scale diseased trees, we propose a data enhancement method based on small target sample expansion that expands the number of small-scale diseased trees without affecting the color and shape characteristics of the diseased tree targets. The experimental results show that the proposed data enhancement method significantly increases the number of small-scale diseased trees and improves the robustness of diseased tree detection.
The current method achieves good results in the detection of small-scale diseased trees, but its detection of late-stage diseased trees is not good, and further research and analysis are needed. In addition, because acquiring diseased tree datasets is costly and requires huge manpower and material resources, existing pine wilt diseased tree datasets are relatively few. How to learn the characteristics of the target from a small number of labels is the focus of future research. At present, active learning is developing rapidly in various fields [34,35]; it mainly focuses on how to build efficient classifiers with little labeled data and thus provides a theoretical basis for future research on the identification of pine wilt diseased trees. Next, we will study tasks such as the classification of diseased trees using active learning.
5. Conclusions
In this paper, to address the poor performance of existing target detection algorithms on small-sized PWDT, we propose a new target detection network, SPW-FEN. First, because the shallow feature layers in existing detection algorithms have insufficient ability to extract the features of small-sized diseased trees, we propose a PWCA attention module and add it to the shallow feature maps, effectively improving the algorithm’s ability to extract the features of small-scale diseased trees. Moreover, because a single image contains too few small-sized diseased trees, we propose an STE data enhancement method that effectively increases the number of small-sized diseased trees in a single image. The proposed method effectively enhances the feature extraction ability of the network for small-sized diseased trees, reduces their missed detection rate, and achieves efficient detection of small-sized diseased trees in UAV images under complex backgrounds. The experimental results show that the proposed method reaches an average precision of 79.1% and a recall of 86.9% for pine wilt diseased trees. The recall and average precision are 3.6 and 3.8 percentage points higher, respectively, than those of the current state-of-the-art method, Faster-RCNN [22]. They are also 6.4 and 5.5 percentage points higher than those of YOLOv6 [28], a recent algorithm in the YOLO series.
In the future, we will focus on improving the detection performance for late-stage diseased trees and on using semi-supervised feature learning and detection methods with a small number of data samples to construct low-cost, high-precision diseased tree detection models. Additionally, we will further study the effect of mixed forests on the identification of diseased trees and verify how the presence of other tree species may introduce errors into the method.