Article

Quantitative Remote Sensing Supporting Deep Learning Target Identification: A Case Study of Wind Turbines

1 The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 Advanced Copper Industry College, Jiangxi University of Science and Technology, Yingtan 335000, China
3 Space Engineering University, Beijing 101416, China
4 Institute of Remote Sensing and GIS, School of Earth and Space Sciences, Peking University, Beijing 100871, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(5), 733; https://doi.org/10.3390/rs17050733
Submission received: 26 November 2024 / Revised: 27 January 2025 / Accepted: 17 February 2025 / Published: 20 February 2025

Abstract

Small Target Detection and Identification (TDI) methods for Remote Sensing (RS) images are mostly inherited from the deep learning models of the Computer Vision (CV) field. Compared with natural images, RS images not only have common features such as shape and texture but also contain unique quantitative information such as spectral features. Therefore, RS TDI methods inherited from the CV field, which do not use Quantitative Remote Sensing (QRS) information, leave considerable room for exploration. With the rapid development of high-resolution RS satellites, wind turbine detection from RS images has become a key research topic for intelligent power inspection. To test the effectiveness of integrating QRS information with deep learning models, the case of wind turbine TDI from high-resolution satellite images was studied. The YOLOv5 model was selected because of its stability and high real-time performance. The following methods for integrating QRS and CV for TDI were proposed: (1) Surface reflectance (SR) images obtained using quantitative Atmospheric Correction (AC) were used to make wind turbine samples, and the SR data were input into the YOLOv5 model (YOLOv5_AC). (2) A Convolutional Block Attention Module (CBAM) was added to the YOLOv5 network to focus on wind turbine features (YOLOv5_AC_CBAM). (3) Based on the identification results of YOLOv5_AC_CBAM, spectral, geometric, and textural features selected using expert knowledge were extracted to conduct threshold re-identification (YOLOv5_AC_CBAM_Exp). Accuracy increased from 90.5% to 92.7%, then to 93.2%, and finally to 97.4%. The integration of QRS and CV for TDI showed tremendous potential to achieve high accuracy, and QRS information should not be neglected in RS TDI.


1. Introduction

Remote Sensing (RS) Target Detection and Identification (TDI) has become an important earth observation application [1,2,3]. With the rapid development of space RS technology, the number of RS satellites launched has exploded in recent decades, as shown in Figure 1. The number of high-spatial-resolution (HSR) satellites is growing particularly rapidly, and they are widely used in many fields [4,5]. HSR images have reached sub-meter and even finer resolution, improving the ability to identify small targets on the ground. The target scales of RS image identification range from large structures like airports and ports [6,7] to smaller entities such as airplanes, ships, and cars [8,9]. Identification methods have evolved from the use of expert-defined features and classifiers to data-driven machine learning models.
With the rapid development of artificial intelligence technology in the past ten years, the field of RS image TDI has directly inherited or adapted Computer Vision (CV) models designed for natural image processing. Driven by deep learning, RS image TDI research has shifted from traditional methods, such as template matching algorithms [10] and artificial feature modeling algorithms [11], to identification methods based on Convolutional Neural Networks (CNNs) [12].
Many optical RS image TDI methods based on CNN models with excellent performance have been proposed (such as Wu et al. [13], RingMo [14], LuoJiaNET [15], etc.). The existing HSR RS image identification methods directly inherit natural image identification methods based on CNNs. However, remote sensors are not ordinary cameras but accurate multispectral radiation cameras. RS data generally carry information in the spectral, radiometric, spatial, temporal, and other dimensions [16,17]. Compared with natural images, RS images therefore have more abundant and usable feature information. For example, optical images that have undergone Atmospheric Correction (AC) have standardized surface reflectance (SR) spectra and radiation information. However, unlike medium- and low-resolution RS data, HSR data are not routinely processed into quantitative products because of the strong heterogeneity of the surface. Driven by the needs of quantitative applications such as agriculture and the environment [18,19,20], HSR RS has also gradually developed quantitative processing technology in order to obtain richer earth surface information from orbit.
Unfortunately, in previous studies on the above topics [13,14,15,21], the high-resolution optical images used by existing data-driven intelligent models are RGB image datasets in JPEG format [22,23], and the quantitative characteristics of RS data are not taken into account. Figure 2 shows the format currently used in RS image TDI, taking the DOTA dataset of LuoJiaSET as an example. The format of these images is 3-band JPEG, which can hardly retain Quantitative Remote Sensing (QRS) information. The accuracy of image TDI is constrained by two key factors: the richness of feature information within the input data and the capability of the identification model. Moreover, the development and design of the identification model are highly contingent upon the format and characteristics of the input data. Therefore, in existing work on optical RS small-target TDI, the datasets used by most researchers contain no QRS features, such as spectra, and it is difficult to introduce QRS features into deep learning models.
In order to explore the method and effects of combining QRS information and deep learning models, a set of comprehensive experiments was designed in this paper. Taking the high-resolution image identification of wind turbines based on the YOLO model as an example, quantitative AC was carried out on satellite images to obtain four-band fused SR images, and the contribution of adding QRS information such as spectral and radiation features to TDI was explored in order to provide a reference for the TDI community in future research directions.
Wind turbines were chosen as an example in this paper because the body and shadow of a wind turbine constitute important feature information that affects identification accuracy. When the body and shadow of wind turbines were affected by the background environment, the identification accuracy was easily reduced. In terms of the radiation brightness range and spectral differences, wind turbines provided a relatively broad and sensitive representation of ground object types. Therefore, a manually labeled wind turbine RS dataset containing quantitative information was constructed as the example in this paper.
The main innovations and contributions of this paper are given as follows.
(1) Quantitative AC processing was incorporated into the preprocessing stage of HSR images to restore the true spectral characteristics of ground objects. Two distinct types of HSR wind turbine sample databases were established. One was the DN (Digital Number) value sample database, which lacks quantitative RS features due to the absence of quantitative AC in the image preprocessing stage. The other was the SR sample database, which preserves spectral reflectance and other RS features as a result of the inclusion of quantitative AC during preprocessing. Compared with the DN value data, the performance of the SR data on the YOLOv5 model was significantly enhanced for TDI.
(2) The Convolutional Block Attention Module (CBAM) attention mechanism was introduced into the neck part of the YOLOv5 model to enhance the effective feature information of the wind turbine target, and the model identification effect was improved to some extent.
(3) Based on the identification results of the model, the unique quantitative spectral reflectance, geometric, and texture features of the wind turbine target were selected using RS expert knowledge as dynamic threshold discrimination conditions, and re-identification of the wind turbines was carried out. The integration of quantitative information effectively eliminated many false detection objects, and the performance was excellent.

2. Related Work

2.1. Optical RS Image TDI

Optical RS image TDI based on deep learning has been extensively studied because of its exceptional performance. Compared with traditional algorithms, intelligent identification models can efficiently and accurately detect small optical RS targets and offer improvements in many indicators. Chen et al. [24] proposed an end-to-end deep network for target detection using HSR images from Google Earth. Wu et al. [13] proposed a novel approach called C3TB-YOLOv5, combining the traditional YOLOv5 with the Transformer model to detect targets in high-resolution RS images. Chen et al. [21] proposed a hybrid and practical framework based on saliency detection for wind turbine extraction using Google Earth images. For the TDI of various types of RS data, many multi-modal models have been developed, such as RingMo [14] and LuoJiaNET [15]. Deep learning has dramatically improved the accuracy of RS image identification.

2.2. Hyperspectral Image Classification and Identification

The existing research shows that CNNs have excellent performance in the field of optical RS TDI. Unfortunately, CNN models do not make full use of the characteristic information of RS images, such as spectra. In the field of hyperspectral target classification and identification, QRS information such as spectra has been well mined and applied, and good performance has been obtained. Yu et al. [25] mined the spectral information of targets in multiband imagery developed for detecting and identifying low-contrast targets. Xie et al. [26] proposed a spectral–spatial target detection (SSTD) framework in deep latent space by analyzing the mapping relationship between the latent spectral feature space and the original spectral band space. Hang et al. [27] proposed a cascaded RNN model to learn and use spectral feature information, which achieved better results for hyperspectral image classification. Zhang et al. [28] proposed a pixel shape index coupled with spectral information, which could improve the classification accuracy of HSR RS images. In the semantic segmentation of RS images, RS information such as spectra has also been combined with artificial intelligence models [29]. Hong et al. [30] proposed SpectralGPT, the first purpose-built foundation model designed explicitly for spectral RS data, which considers the unique characteristics of spectral data. In the field of high-resolution optical RS TDI, however, it is regrettable that the rich information of RS images is not yet fully utilized in artificial intelligence models.

3. Materials and Methods

3.1. GF-2 Satellite Images and Wind Turbine Sample Databases

3.1.1. GF-2 Satellite Images

The GF-2 (Gaofen-2) satellite is the first civilian optical RS satellite developed by China with a spatial resolution finer than 1 m. GF-2 was successfully launched on 19 August 2014, at the Taiyuan Satellite Launch Center, and it can achieve a spatial resolution of up to 0.8 m at the nadir point [31]. GF-2 is equipped with two high-resolution cameras: a 1 m panchromatic camera and a 4 m multispectral camera [32]. The multispectral camera covers four bands: blue, green, red, and near-infrared (NIR) [33]. Please refer to Table 1 for the specific spectral bands. These four spectral bands were used in this paper to explore their effect on high-resolution RS image target detection.
Due to the advantage of the high resolution of GF-2 satellite images, these images were used as the data source for TDI. In total, 113 GF-2 satellite images containing wind turbines were collected from multiple regions across China. The images span 2017 to 2022 and cover various seasons and months. The covered regions include plains, mountains, coastal areas, and other terrain. Among them, 102 images were used to construct the sample databases, while the remaining 11 images were reserved for testing the identification performance of the models on whole RS images.

3.1.2. Data Preprocessing

The downloaded GF-2 satellite L1-level data need to undergo a series of preprocessing steps, as shown in Figure 3. Based on the experimental requirements, two sets of GF-2 wind turbine images were obtained: the DN value images and the SR images. The DN value is the brightness value of an RS image pixel, i.e., the recorded gray value of the ground objects. The SR images were obtained through sequential orthorectification, quantitative AC, and image fusion, whereas the DN value images were obtained from the GF-2 L1-level images through orthorectification and image fusion only.
In RS imaging, various geometric distortions may occur in images, such as squeezing, stretching, distortion, and offset. Using the information contained in the "RPB" metadata file of GF-2 images, which stores the Rational Polynomial Coefficient (RPC) parameters, multispectral and panchromatic images can be orthorectified using RPC orthorectification [34]. Orthorectification was an effective method for preserving the inherent geometric quantification features of ground objects. Based on the previously studied QUAAC system [35], quantitative AC was performed on the orthorectified images. A comparison of the results before and after quantitative AC is shown in Figure 4. The quality of the corrected images was significantly improved, the visual clarity of the images was improved, and the texture features of ground objects were enhanced. The PanSharp algorithm was used to fuse the multispectral and panchromatic images, which improved the resolution of the images [36]. The fused images not only had the high-resolution characteristics of the panchromatic images but also retained the spectral and color features of the multispectral images, and the features of the wind turbine targets were enhanced.
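The orthorectification step can be reproduced with standard tooling. The following is a minimal GDAL sketch under assumed file names; the QUAAC atmospheric correction and PanSharp fusion steps are separate toolchain stages and are not reproduced here.

```python
from osgeo import gdal

# A sketch of RPC orthorectification for a GF-2 L1 multispectral image.
# File names and the DEM are placeholders, not from the paper.
gdal.Warp(
    "GF2_MSS_ortho.tif",                      # output, orthorectified
    "GF2_MSS_L1A.tiff",                       # input L1 image with its .rpb metadata file
    rpc=True,                                 # use the RPC model from the metadata
    transformerOptions=["RPC_DEM=dem.tif"],   # terrain correction against a DEM
    resampleAlg="bilinear",
)
```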

3.1.3. Sample Labeling

The difference between the two sets of images was based on whether quantitative AC was conducted. SR images with the intrinsic information of ground objects were obtained from the DN value images through the process of quantitative AC. The preprocessed images were cropped and marked, and two groups of wind turbine HSR RS sample databases were constructed: the DN value sample database and the SR sample database.
During model training and testing, the input consisted of RGB three-band images. The data used in this paper were in TIF format. Data in JPEG format have only three bands, and it is difficult to retain the information of the four GF-2 bands in them. Compared with JPEG, the TIF format can preserve more feature information with high fidelity, such as texture.
Due to the influence of physical characteristics such as the satellite viewing angle and solar altitude on ground targets in RS images, the areas of the wind turbine body and shadow differed in each image. The body of the wind turbine in some images may be relatively small with a long shadow; in other images, the body may be relatively large and the shadow relatively small. The high-brightness areas of most wind turbine bodies occupied a small proportion of the images. If only the RS features of the wind turbine body or shadow were used as training data, the sample specificity requirements could not be met. This may make it difficult for the deep learning model to converge and may result in a large number of false detections during the TDI process. In order to allow the model to better learn the feature information of the wind turbine body and shadow, the wind turbine body and the shadow projected on the ground were together taken as the wind turbine features. The wind turbine body and shadow were jointly annotated with the minimum bounding rectangle, as shown in Figure 5.
Due to the large swath of RS images, the annotated images were cropped to a size of 416 × 416, resulting in 7333 images. These images were split into training, validation, and testing sets in an 8:1:1 ratio. The 11 retained images were used as the DN value test images and SR test images.
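A minimal sketch of the cropping and splitting steps follows. Non-overlapping tiling and a random split are assumptions; the paper only states the 416 × 416 crop size and the 8:1:1 ratio.

```python
import numpy as np

def tile_image(img: np.ndarray, size: int = 416):
    """Cut a large annotated scene into non-overlapping size x size tiles.

    Edge handling (dropping partial tiles) is an assumption.
    """
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield img[y:y + size, x:x + size]

# 8:1:1 split of the 7333 resulting crops (random permutation is an assumption).
rng = np.random.default_rng(seed=0)
indices = rng.permutation(7333)
train_idx, val_idx, test_idx = np.split(indices, [int(0.8 * 7333), int(0.9 * 7333)])
```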

3.2. Experimental Strategy and Methods

In this paper, a two-stage serial pattern was proposed. It began with the initial identification of RGB images using an intelligent model. Subsequently, a re-identification process was carried out by leveraging expert knowledge derived from the four-band (B, G, R, NIR) data of the GF-2 SR images. This comprehensive experiment was divided into four steps to explore the potential role of the spectral and radiometric information obtained by quantitative processing in TDI. The specific research route is shown in Figure 6. The two constructed sample databases were applied in turn to YOLOv5; YOLOv5_AC (the YOLOv5 model trained with the SR sample database); YOLOv5_AC_CBAM (the YOLOv5 model with the CBAM attention mechanism added, trained and tested using the SR sample database); and YOLOv5_AC_CBAM_Exp (QRS features selected using expert knowledge constrain the identification results of YOLOv5_AC_CBAM). The identification effects of the different models were compared and analyzed, and the accuracy of TDI was expected to improve step by step. When these models were compared, the parameters of the model training and testing steps were set identically.

3.2.1. YOLOv5

YOLO models are among the mainstream algorithms for TDI tasks, offering high speed and relatively high precision. Zi et al. [37] developed the EOLO framework on the basis of the YOLOv5 model, which realized automatic SAR eddy detection with high accuracy and good generalization. Zhang et al. [38] proposed an efficient detector called feature enhancement-, fusion-, and context-aware YOLO (FFCA-YOLO), which achieved high precision on two public RS datasets (VEDAI and AI-TOD) for small object detection. Zhang et al. [39] proposed the LS-YOLO model based on YOLOv5, which performed well for multiscale landslide detection with RS images.
Therefore, YOLO, which is widely used and highly representative, was chosen as the basic framework; it shows a good trade-off between speed and accuracy. In this paper, taking the YOLOv5 model as an example, research was carried out on a deep coupling method in which an RS mechanism constrains a deep learning model for wind turbine TDI. The classic YOLOv5 network model was adopted mainly to compare the effects before and after adding quantitative information to the model. There are four YOLOv5 network models of different sizes: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x; the depth and width of the four networks are controlled by adjusting the relevant parameters. The YOLOv5 network mainly includes four parts: input, backbone, neck, and prediction [40]. The wind turbine image was taken as the input to the backbone to obtain the image features. The neck part was used to extract feature information and generate feature maps. And the prediction part was used to generate the bounding boxes and predict the category for the generated feature maps.
The RGB three-band images of the DN value sample database were selected and input into the YOLOv5m model for training and testing. The experimental environment and parameter settings were kept the same in all experiments: an input image size of 416 × 416 × 3, 100 training iterations, and a batch size of 8. Other parameters were kept at their default settings. The pixel values of RS images do not have a uniform range, and there may be large-scale differences. For the common RGB image format, pixel values range from 0 to 255. In order to obtain good results and reduce the computation time of YOLOv5, the input data were normalized to (0, 255) before being input into YOLOv5. The pixel values of the DN value images were normalized to the range of 0 to 255, calculated as follows:
$$DN_x' = \frac{DN_x - DN_{min}}{DN_{max} - DN_{min}} \times 255 \tag{1}$$
where $DN_x$ represents the DN value of a pixel in the image, and $DN_{min}$ and $DN_{max}$ represent the minimum and maximum DN values in the image.
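A minimal NumPy sketch of Equation (1); computing the statistics per image rather than per band is an assumption:

```python
import numpy as np

def normalize_dn(dn: np.ndarray) -> np.ndarray:
    """Min-max stretch of raw DN values to 0-255 (Equation (1))."""
    dn = dn.astype(np.float32)
    dn_min, dn_max = float(dn.min()), float(dn.max())
    stretched = (dn - dn_min) / (dn_max - dn_min) * 255.0
    return stretched.astype(np.uint8)
```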
In the prediction stage, the parameters were set consistently during identification: the confidence threshold was uniformly set to 0.5, and the $IoU$ threshold was 0.5.
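For reference, the stated detection settings map directly onto YOLOv5's inference interface. A sketch via torch.hub, with the input file name assumed:

```python
import torch

# Load a YOLOv5m model through torch.hub; a locally trained weight file would
# instead be passed via the repo's "custom" entry point in practice.
model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)
model.conf = 0.5   # confidence threshold used in the paper
model.iou = 0.5    # IoU threshold for non-maximum suppression

results = model("sr_tile.tif", size=416)  # 416 x 416 input, as in training
results.print()
```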

3.2.2. YOLOv5_AC

In order to explore the role of quantitative AC, the RGB three-band images of the SR sample database were also input into the YOLOv5m model for training and testing. The pixel value of an SR image obtained after AC represents the spectral reflectance of the ground objects. The SR is not influenced by illumination brightness; rather, it is an intrinsic attribute of the surface object itself. The SR value exhibits significant variability, spanning a range from 0 to 1. The SR of water and solar flares may be greater than 1, but the influence on wind turbine identification can be ignored. Prior to inputting the data into the network model, the normalization method applied to the SR sample database differed from that of the DN value sample database. Specifically, the SR sample database utilized the normalization method described in Equation (2), which was designed to preserve the intensity information of the quantitative spectral reflectance. According to manual statistics, the threshold $\rho_{max}$ was set to the vector (0.65, 0.85, 0.85) for the blue, green, and red wavebands of the SR images. SR values larger than $\rho_{max}$ were set to 255. At the data level, spectral reflectance information was thus added. Finally, in order to speed up data loading and the convergence of the training network, the image values were all normalized to the range of 0 to 255. The loss function curve of the model became stable after 100 iterations and finally converged.
$$\rho' = \frac{\rho}{\rho_{max}} \times 255 \tag{2}$$
where $\rho$ represents the SR value of the image, and $\rho_{max}$ is set to the maximum spectral reflectance of each band.
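A sketch of Equation (2) with the paper's per-band saturation vector; the band ordering of the array is an assumption:

```python
import numpy as np

# Saturation thresholds from the paper's manual statistics for the
# blue, green, and red bands (array band order is an assumption).
RHO_MAX = np.array([0.65, 0.85, 0.85], dtype=np.float32)

def normalize_sr(sr: np.ndarray) -> np.ndarray:
    """Scale surface reflectance to 0-255 (Equation (2)).

    `sr` is an (H, W, 3) reflectance array, mostly in [0, 1]; values
    above rho_max saturate to 255, as described in the text.
    """
    stretched = sr / RHO_MAX * 255.0
    return np.clip(stretched, 0.0, 255.0).astype(np.uint8)
```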

3.2.3. YOLOv5_AC_CBAM

In order to further enhance the identification accuracy of small targets in HSR images, the CBAM attention mechanism, which considers both the channel and spatial dimensions, was incorporated into three locations within the neck section of the YOLOv5 network, as shown in Figure 7. In the TDI task involving wind turbines across various regions and diverse scenarios, small targets often occupy a limited number of pixels. Their feature information is prone to being lost within the deep network, leading to issues such as missed detections and false detections.
The CBAM attention mechanism consists of a channel attention module and a spatial attention module [41], as shown in Figure 8. The CBAM can effectively increase the weight of small targets in the entire feature map through the channel and spatial attention modules so that the target information is easier for the network to learn. The input image features are denoted as $F$. The channel attention map was generated using the channel relationships between the features and then multiplied by $F$ to form a new feature $F_1$, enhancing the features related to the wind turbine target area. Then, the internal spatial relationships between the features were used to generate the spatial attention map, which was multiplied with $F_1$ to obtain the output feature map $F_2$. The weight of the features in the target region of the wind turbine was thus enhanced from the spatial relationships between the channels and features. The channel and spatial modules worked together to learn crucial local details in the images, improving the network's attention to the wind turbines and highlighting the important features of the targets [41,42,43]. The feature learning and representation capabilities of the network were thereby improved.
Firstly, max pooling and average pooling were used to extract the channel attention information, and then filtering, activation, and normalization were carried out to improve the channel information extraction ability, as shown in Equation (3). In the RS detection of wind turbine targets, some channels may excel at capturing the fine textures and edge details of wind turbines, which is essential for differentiating them from the background. By employing the channel attention mechanism, the importance of these key channels is amplified, enabling the network to concentrate more on these features during further processing.
$$M_C(F) = \sigma\big(\omega_1(\omega_2(Avgpool(F_C))) + \omega_1(\omega_2(Maxpool(F_C)))\big) \tag{3}$$
where $\sigma$ is the sigmoid function; $\omega_1$ and $\omega_2$ represent the weights of the two layers in the multi-layer perceptron; and $Avgpool(F_C)$ and $Maxpool(F_C)$ represent the feature maps after average and max pooling. The obtained channel attention map $M_C$ is then multiplied with the original feature map $F$, resulting in the channel attention feature map $F_1$.
Then, the spatial attention mechanism focuses on the local information of small targets. The information is filtered through max pooling and average pooling, and important target information is extracted from the filtered data using convolution, as shown in Equation (4). In the RS detection of wind turbine targets, wind turbines often occupy a limited area in the image and can be surrounded by complex background distractions. Through the spatial attention mechanism, the network is able to pinpoint the spatial locations of wind turbines, amplifying the feature responses in these areas while dampening the features in the background regions.
$$M_S(F_1) = \sigma\big(f^{7\times7}([Avgpool(F_S);\ Maxpool(F_S)])\big) \tag{4}$$
where $\sigma$ is the sigmoid function; $f^{7\times7}$ represents a convolution operation with a filter size of $7\times7$; and $Avgpool(F_S)$ and $Maxpool(F_S)$ represent the feature maps after average and max pooling. Finally, the attention feature map $F_2$ is obtained by multiplying $M_S$ and $F_1$.
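A minimal PyTorch sketch of the CBAM block described by Equations (3) and (4); the reduction ratio of 16 is an assumption taken from the original CBAM design and is not stated here:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention (Equation (3)): shared MLP over avg- and max-pooled features."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx) * x                # F1 = Mc(F) * F

class SpatialAttention(nn.Module):
    """Spatial attention (Equation (4)): 7x7 conv over stacked channel-wise avg/max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx = torch.amax(x, dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return attn * x                                   # F2 = Ms(F1) * F1

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as inserted into the YOLOv5 neck."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```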

3.2.4. YOLOv5_AC_CBAM_Exp

Based on the identification results of the YOLOv5_AC_CBAM model, a threshold re-identification method coupling QRS information with expert knowledge was proposed, which can further eliminate the false detection objects. The unique QRS features of wind turbine targets and of common false detection objects were analyzed and selected to constrain the identification results of YOLOv5_AC_CBAM. Through experimental comparison, this method of constraining deep learning models using a QRS mechanism showed positive effects.
To identify the features that exhibit significant differences between wind turbine targets and false detection objects, the YOLOv5_AC_CBAM model was employed to analyze the SR test images. Specifically, statistical analysis was conducted on six types of objects that had a high probability of being falsely detected: striated ground, green land, roads, farmland, buildings, and power towers. As can be observed in Figure 9, these false detection types were characterized by a combination of bright pixels and dark shadow pixels, but they exhibited significant differences in terms of geometry, texture, and other features.
(1) Quantitative Spectral Reflectance Feature Selection
In an SR image, the spectral reflectance of objects of the same type is highly similar. Therefore, effective information on wind turbine targets can be separated using quantitative spectral reflectance features. In order to objectively compare the differences in spectral features between falsely detected ground objects and wind turbines, the average spectral curves of the six falsely detected ground object types and of wind turbines were computed, as shown in Figure 10. Five different wind turbine examples were used to analyze the spectral characteristics. It can be observed that the shadows of wind turbines consisted of dark pixels with reflectance values ranging from 0 to 0.1. The predicted bounding boxes of the falsely detected ground objects also contained dark pixels, making it difficult to extract unique information from them. The reflectance values of the wind turbine body were generally higher than those of the other objects in all four bands, except for the bottom of the power tower. The wind turbine body consists of high-brightness pixels, and its reflectance in each band was relatively high, above 0.2. In the blue band, except for the bottom of the power tower, the reflectance of the other ground objects was generally lower, below 0.25. It was therefore easiest to separate the unique spectral feature information of the wind turbine target in the blue band. In the other three bands, it was difficult to separate unique, effective spectral information of the wind turbine from the reflectance.
In the blue band, the reflectance histograms of the prediction bounding boxes of the wind turbine and several common false detections were further computed, as shown in Figure 11. It can be observed that, with 0.2 as the dividing reflectance value, pixels with reflectance greater than 0.2 were high-brightness pixels, including the wind turbine body. For the false detection objects, there were relatively few or no high-brightness pixels with reflectance values above 0.2. In order to ensure that real wind turbines were not mistakenly eliminated, the reflectance threshold was set to 0.18 in this paper. The maximum and minimum numbers of pixels with blue-band reflectance greater than 0.18 within the target prediction bounding boxes were counted and used as the threshold constraints for the quantitative spectral reflectance features, calculated as follows:
$$Min(B_{SR}) \le Obj(B_{SR}) \le Max(B_{SR}) \tag{5}$$
where $Min(B_{SR})$ and $Max(B_{SR})$ refer to the minimum and maximum numbers of pixels with reflectance greater than 0.18 within the prediction bounding boxes of wind turbine targets in the blue band, and $Obj(B_{SR})$ represents the total number of pixels with reflectance greater than 0.18 within the prediction bounding box of an object waiting to be identified.
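A sketch of the Equation (5) check; the function and variable names are my own, not the paper's:

```python
import numpy as np

BLUE_REFLECTANCE_CUTOFF = 0.18  # bright-pixel cutoff chosen in the paper

def bright_pixel_count(blue_box: np.ndarray) -> int:
    """Count blue-band pixels above the reflectance cutoff inside a box crop."""
    return int(np.count_nonzero(blue_box > BLUE_REFLECTANCE_CUTOFF))

def passes_spectral_constraint(blue_box: np.ndarray,
                               min_count: int, max_count: int) -> bool:
    """Equation (5): the candidate's bright-pixel count must fall within the
    range observed for high-confidence wind turbine detections."""
    return min_count <= bright_pixel_count(blue_box) <= max_count
```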
However, relying only on spectral reflectance may lead to the phenomenon of "different objects with the same spectrum", and it was difficult to eliminate power towers, whose features are highly similar to those of wind turbines. Texture, geometric, and other feature information can be used to achieve complementary advantages.
(2) Quantitative Geometric Feature Selection
The changes in imaging geometric information were mainly affected by the observation geometry and solar angles. Compared with the geometric features of the bounding boxes accurately predicted for wind turbine targets by the model, the overall area of the bounding boxes of some falsely detected objects was significantly larger or smaller, and in some cases the bounding boxes were excessively elongated or wide. Panels (a)–(f) of Figure 12 show the results identified by YOLOv5_AC_CBAM in two images: (e) and (f) are correctly identified wind turbine targets, while (a), (b), (c), and (d) are falsely detected objects. Comparing the prediction bounding boxes of (e) and (f) with those of (a) and (b), the geometric area of the false detection bounding boxes was significantly smaller. Similarly, comparing (c) and (d) with (e) and (f), the aspect ratio of the false detection bounding boxes was significantly higher.
The maximum and minimum total area and aspect ratio of the correctly predicted bounding boxes of wind turbine targets were used as the threshold constraint conditions for the geometric features. When both threshold conditions were met, that is, when Equations (6) and (7) held at the same time, a wind turbine target was identified.
$$Min(S_{Pixel}) \le Obj(S) \le Max(S_{Pixel}) \tag{6}$$
$$Min\!\left(\frac{h}{w}\right) \le Obj\!\left(\frac{h}{w}\right) \le Max\!\left(\frac{h}{w}\right) \tag{7}$$
where $Min(S_{Pixel})$, $Max(S_{Pixel})$, $Min(h/w)$, and $Max(h/w)$ represent the minimum and maximum total pixel area and aspect ratio of the correctly predicted bounding boxes of wind turbine targets, respectively; $Obj(S)$ and $Obj(h/w)$ refer to the total pixel area and aspect ratio of the prediction bounding box of an object waiting to be identified.
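A sketch of the Equations (6) and (7) checks; the (x1, y1, x2, y2) box convention is an assumption:

```python
def passes_geometric_constraints(box, s_min: float, s_max: float,
                                 r_min: float, r_max: float) -> bool:
    """Equations (6) and (7): the pixel area and aspect ratio of the candidate
    box must match the ranges of high-confidence wind turbine boxes."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    area = w * h          # Obj(S): total pixel area
    aspect = h / w        # Obj(h/w): height-to-width ratio
    return (s_min <= area <= s_max) and (r_min <= aspect <= r_max)
```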
(3) Image Texture Feature Selection
The Gray Level Co-occurrence Matrix (GLCM) is a matrix used to extract texture features and improve the classification accuracy of RS images [44]. To describe texture features more intuitively with the GLCM, parameters that reflect the texture characteristics of the matrix were extracted from it. Six commonly used GLCM feature parameters were selected: homogeneity (HOM), contrast (CON), dissimilarity (DIS), entropy (ENT), Angular Second Moment (ASM), and correlation (COR). HOM measures the local variation in the texture of the image. CON reflects the clarity of the image and the depth of texture grooves. DIS is similar to contrast, and if the local contrast of the image is high, the dissimilarity value will be greater. ENT and ASM describe the uniformity of the image grayscale distribution and the coarseness of the texture. COR describes the similarity of local grayscale values in the image.
The six texture feature parameters of the wind turbine targets and the false detection objects were calculated and analyzed, as shown in Figure 13. From the five parameters HOM, DIS, ENT, ASM, and COR, it was difficult to separate unique characteristics of the wind turbine target. The CON value of the wind turbine targets was generally higher than that of the false detection objects at the $\pi/4$ and $3\pi/4$ angles. CON at the $\pi/4$ and $3\pi/4$ angles showed significant differences compared to most false detection objects, providing distinguishable information. Therefore, by summing CON at the $\pi/4$ and $3\pi/4$ angles and using the maximum and minimum values of the sum as the threshold discrimination criteria, some false detection objects with significant differences in texture features can be eliminated, calculated as follows:
$$\theta = CON\!\left(\frac{\pi}{4}\right) + CON\!\left(\frac{3\pi}{4}\right) \tag{8}$$
$$Min(\theta) \le Obj(\theta) \le Max(\theta) \tag{9}$$
where $\theta$ is the sum of the CON values at the $\pi/4$ and $3\pi/4$ angles; $Min(\theta)$ and $Max(\theta)$ represent the minimum and maximum values of the CON summation at the $\pi/4$ and $3\pi/4$ angles within the prediction bounding boxes of wind turbine targets; and $Obj(\theta)$ is the sum of CON at the $\pi/4$ and $3\pi/4$ angles within the predicted bounding box of an object waiting to be identified.
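A sketch of Equation (8) using scikit-image's GLCM utilities; the pixel distance of 1 and the 256 gray levels are assumptions, since the paper does not state them:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def contrast_sum(gray_box: np.ndarray) -> float:
    """Equation (8): sum of GLCM contrast (CON) at the pi/4 and 3*pi/4 angles.

    `gray_box` is assumed to be an 8-bit grayscale crop of a predicted box.
    """
    glcm = graycomatrix(gray_box, distances=[1],
                        angles=[np.pi / 4, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    con = graycoprops(glcm, "contrast")   # shape (1, 2): one distance, two angles
    return float(con.sum())
```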
(4) Dynamic Threshold Re-identification Method with Expert Knowledge
When the deep learning model performs target identification, it outputs a confidence value for each predicted target. The confidence value represents the probability that the model considers the target to be a real wind turbine, ranging from 0 to 1. A higher confidence value indicates a higher likelihood that the target is a real wind turbine. When the model predicted a target with low confidence, it may have been a non-target object with features similar to wind turbines, such as a power tower, road, or building. Therefore, further discrimination was needed, using the selected unique features of real wind turbines to achieve more accurate identification. The wind turbine targets identified by the model with high confidence were used to extract the quantitative spectral reflectance, geometric, and image texture features. These features extracted from high-confidence targets were then established as the criteria that low-confidence targets must meet. To select an appropriate confidence threshold, confidence thresholds of 0.75, 0.8, 0.85, and 0.9 were tested for image identification. The calculation formula is as follows:
$$P_w = \frac{T_w}{T_i} \tag{10}$$
where $P_w$ is the probability that the identified targets are real wind turbines, $T_w$ is the number of correctly identified wind turbines, and $T_i$ represents the number of targets identified by YOLOv5_AC_CBAM.
The results are shown in Table 2. When the confidence threshold was 0.8, $P_w$ was already 100%, which indicated that the targets with a confidence higher than 0.8 in the YOLOv5_AC_CBAM identification results were all correct wind turbine targets. Therefore, the confidence threshold was set to 0.8. High confidence was defined as a value greater than or equal to 0.8, and low confidence as less than 0.8.
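A sketch of the threshold selection via $P_w$ (Equation (10)); the detection object layout and the ground-truth matching function are hypothetical names:

```python
def wind_turbine_probability(detections, is_real_turbine, conf_threshold: float) -> float:
    """P_w (Equation (10)): fraction of detections at or above the confidence
    threshold that are real wind turbines; evaluated at 0.75/0.8/0.85/0.9
    to pick the cutoff (0.8 gave P_w = 100% in Table 2)."""
    kept = [d for d in detections if d.confidence >= conf_threshold]
    if not kept:
        return 0.0
    return sum(is_real_turbine(d) for d in kept) / len(kept)
```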
Owing to the varying sizes of wind turbines with different power outputs, the fluctuating lengths of wind turbine shadows throughout the day, and the multi-angle imaging characteristics of RS satellites, the appearance of wind turbines can vary substantially even within the same set of source images. The same wind turbine had a large difference in the scale of its body and shadow in different GF-2 images. If a fixed threshold were used to re-identify all low-confidence wind turbine targets in all images, the system would no longer accurately distinguish between real wind turbines and false detection objects, resulting in a failure to remove falsely detected objects and even the erroneous removal of real wind turbine targets. Applying the threshold range from one image directly to other images was not suitable, since the wind turbine target characteristics may have changed, and the corresponding threshold conditions would no longer be valid for other images. Therefore, the threshold range must be selected dynamically according to the image, so as to overcome the problem of poor regional adaptability.
Although there are significant differences in the scale of the wind turbine body and shadow across different images, the field of view of a high-resolution satellite is relatively small, and the solar elevation angle changes little during the acquisition of one scene. Therefore, the body and shadow sizes of the wind turbines in the same image are highly similar, and the similarity of the wind turbine features within each scene can be used as the basis for target re-identification. In view of the different features of wind turbine targets in different images, a dynamic threshold re-identification method using expert knowledge was proposed, as shown in Figure 14. Firstly, based on the identification results of YOLOv5_AC_CBAM, the effective geometric, texture, and spectral reflectance feature information of the high-confidence wind turbine targets identified by the model was collected and counted as the threshold constraints for the re-identification of low-confidence objects. Secondly, the threshold constraints were set according to Equations (5)–(9) to identify the low-confidence objects. Objects that did not meet the threshold constraints were directly discarded, while those that satisfied the conditions were considered wind turbine targets and retained. Finally, all the targets that met the conditions were kept and saved. Before the re-identification of each image, the system recalculates the high-confidence feature information of the wind turbine targets and dynamically adjusts the threshold ranges. This approach effectively addresses the issue of the model's limited adaptability across different regions.
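Putting the pieces together, a sketch of the per-image dynamic re-identification loop of Figure 14, reusing the helper functions sketched above; crop helpers such as crop_blue/crop_gray and the box_area/box_aspect functions are hypothetical names:

```python
def reidentify(detections, image):
    """Dynamic threshold re-identification (Figure 14, Equations (5)-(9)).

    Thresholds are re-derived from the current image's high-confidence
    detections, so they adapt to each scene.
    """
    high = [d for d in detections if d.confidence >= 0.8]
    low = [d for d in detections if d.confidence < 0.8]

    # Per-image feature ranges taken from the high-confidence wind turbines.
    counts = [bright_pixel_count(crop_blue(image, d.box)) for d in high]
    areas = [box_area(d.box) for d in high]
    aspects = [box_aspect(d.box) for d in high]
    thetas = [contrast_sum(crop_gray(image, d.box)) for d in high]

    kept = list(high)  # high-confidence detections are accepted directly
    for d in low:
        if (min(counts) <= bright_pixel_count(crop_blue(image, d.box)) <= max(counts)
                and min(areas) <= box_area(d.box) <= max(areas)
                and min(aspects) <= box_aspect(d.box) <= max(aspects)
                and min(thetas) <= contrast_sum(crop_gray(image, d.box)) <= max(thetas)):
            kept.append(d)
    return kept
```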

3.3. Evaluation Indexes for TDI

The model validation indexes used for TDI based on the sample database include precision ($P$), recall ($R$), and average precision ($AP$) [45,46]. $P$ measures the classifier's accuracy in predicting targets, i.e., the ratio of correctly predicted positive samples among all predicted positives. $R$ reflects whether the classifier can identify all positive samples, representing the ratio of correctly predicted positive samples among all positive samples. The formulas for calculating $P$ and $R$ are as follows:
$$P = \frac{TP}{TP + FP} \tag{11}$$
$$R = \frac{TP}{TP + FN} \tag{12}$$
where $TP$ represents the number of true positive samples correctly classified as positive, $FP$ represents the number of false positive samples incorrectly classified as positive, and $FN$ represents the number of false negative samples incorrectly classified as negative.
$AP$ is a metric used to measure the identification accuracy of a model in target detection, calculated as the average precision of a class by integrating the $PR$ (precision–recall) curve. The formula for calculating $AP$ is as follows:
$$AP = \sum_{i=1}^{m} P(i)\,\Delta R(i) = \int_0^1 P(R)\,dR \tag{13}$$
where $AP@0.5$ represents the value of $AP$ when the $IoU$ (Intersection over Union) threshold is 0.5, and $AP@0.5{:}0.95$ means that $IoU$ thresholds are selected in the range of 0.5 to 0.95 with a step of 0.05 and the $AP$ values at the different $IoU$ thresholds are averaged. $IoU$ is the ratio of the intersection to the union between the bounding box predicted by the model and the ground truth bounding box.
In this paper, more attention was paid to the application indexes on large RS images. Therefore, for the TDI results on large-scale RS images, the TDI performance was evaluated using accuracy ($Acc$), the false alarm rate ($F_r$), and the missed detection rate ($M_r$). $Acc$ refers to the proportion of correctly predicted targets, $F_r$ represents the proportion of instances incorrectly identified as targets, and $M_r$ represents the proportion of real targets missed in the test images. The formulas for calculating these metrics are as follows:
$$Acc = \frac{TP}{TP + FP + FN} \tag{14}$$
$$F_r = \frac{FP}{TP + FP} \tag{15}$$
$$M_r = \frac{FN}{TP + FN} \tag{16}$$
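A direct transcription of the count-based indexes (Equations (11), (12), (14)–(16)); $AP$ is omitted because it requires the full precision–recall curve rather than single counts:

```python
def evaluation_indexes(tp: int, fp: int, fn: int) -> dict:
    """Sample- and image-level indexes from the counts of true positives,
    false positives, and false negatives."""
    return {
        "precision": tp / (tp + fp),            # P,   Equation (11)
        "recall": tp / (tp + fn),               # R,   Equation (12)
        "accuracy": tp / (tp + fp + fn),        # Acc, Equation (14)
        "false_alarm_rate": fp / (tp + fp),     # Fr,  Equation (15)
        "missed_rate": fn / (tp + fn),          # Mr,  Equation (16)
    }
```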

4. Results

In this paper, an experimental strategy based on the YOLOv5 model was proposed. Following this strategy, three methods, YOLOv5_AC, YOLOv5_AC_CBAM, and YOLOv5_AC_CBAM_Exp, were developed successively, each building upon the previous one. The goal was to integrate QRS information more effectively into deep learning TDI. The identification performance of the different methods was then compared and analyzed to evaluate their respective advantages.

4.1. YOLOv5

The test set of the DN value sample database and the RS test images were identified, and the indexes above were calculated, as shown in Table 3. The $P$, $R$, $AP@0.5$, and $AP@0.5{:}0.95$ indexes were used to evaluate the identification results on the test set, and the application indexes $Acc$, $F_r$, and $M_r$ were applied to evaluate the identification effects on large-scale RS images. The remaining 11 GF-2 satellite images were selected as test images. The selected test images included diverse geographical regions and seasons, such as plains, mountains, and the seaside. It can be seen that the YOLOv5 model performed well in the task of wind turbine TDI.

4.2. YOLOv5_AC

To quantitatively compare the performance of YOLOv5 and YOLOv5_AC on the test set, the training and test parameters were kept consistent. The trained network model weights were used to perform target identification on the corresponding test set. The test set of the SR sample database was identified, and the results were compared with those of YOLOv5. The $P$, $R$, $AP@0.5$, and $AP@0.5{:}0.95$ indicators were calculated, as shown in Table 3. It was evident that after quantitative AC, the identification performance of YOLOv5_AC was significantly improved, and all four indexes of YOLOv5_AC were better than those of YOLOv5.
The $Acc$, $F_r$, and $M_r$ indexes of YOLOv5 and YOLOv5_AC were calculated and compared, as shown in Table 4. Since a whole RS image is large, generally above 20,000 × 20,000 pixels, it contains more non-target objects with features similar to wind turbines. Therefore, compared with the cropped images in the sample databases, the identification difficulty of whole RS images was relatively high. It can be seen that the YOLOv5_AC model trained with the SR database achieved relatively excellent performance in terms of image identification accuracy, false alarm rate, and miss rate. Compared with the identification results of YOLOv5 on the DN value test images, the identification accuracy of YOLOv5_AC on the SR test images increased from 90.5% to 92.7%, an increase of 2.2%. Additionally, in the prediction results of YOLOv5_AC, the numbers of false detections and missed detections were significantly reduced, and the false alarm rate and miss rate were each reduced by 1.2%. This showed that adding quantitative AC during data preprocessing can significantly improve the effects of TDI.

4.3. YOLOv5_AC_CBAM

In order to verify the influence of the CBAM attention mechanism on the identification effect, YOLOv5_AC_CBAM and YOLOv5_AC were trained and used to identify the SR test images separately, and the identification results are shown in Table 4. The parameter settings were consistent in the training and testing stages. After adding the CBAM attention mechanism, the overall identification performance improved: the accuracy increased by 0.5%, and the false alarm rate decreased to 6.1%. However, the improvement in the miss rate was not as pronounced. Although the CBAM attention mechanism emphasized wind turbine features, due to the feature similarity between wind turbines and other ground objects, YOLOv5_AC_CBAM also partly increased the learning weights of these other ground objects. YOLOv5_AC_CBAM demonstrated an overall improvement in performance, and it was used for the subsequent experiments.
In order to further verify the effectiveness of YOLOv5_AC_CBAM, it was compared with the classical two-stage FasterRCNN network and the YOLOv7 network [47]. ResNet was used as the backbone network of FasterRCNN, and YOLOv7 used the pre-trained YOLOv7x.pt weights. The input data were from the SR sample database, and the other parameters were the same. In order to eliminate the randomness of the model identification results, the test set was tested several times using the trained weights, and the average identification results are shown in Table 5.
It can be seen that, compared with the FasterRCNN and YOLOv7 networks, the identification effect of the YOLOv5 network was better, indicating that the YOLOv5 network was relatively more suitable for the TDI of HSR wind turbines; hence, YOLOv5 was selected as the research model in this paper. For the identification of small RS targets, the sample size of small targets is small, and the features are limited. Compared with YOLOv5, YOLOv7 adopts a deeper network structure, and it is difficult for the deeper feature maps to fully learn the feature information of small RS targets. YOLOv5 was adaptable because of its use of the Darknet-53 architecture comprising residual networks and a DarknetConv2D structure [22]. Therefore, YOLOv7 is not as suitable as YOLOv5 in the current research scenario. The indicators of YOLOv5_AC_CBAM were improved to some extent. This suggests that the integration of the CBAM attention mechanism into the neck component of the YOLOv5 network bolstered the extraction and learning of wind turbine features, thereby facilitating more accurate detection and identification of wind turbines.
Since the enhancement obtained by adding the CBAM attention mechanism was small, deep learning networks that are compatible with spectral, radiation, and other RS features will be studied in future improved versions. Furthermore, a multi-feature threshold constraint re-identification method with expert knowledge was used here to improve the identification effects.

4.4. YOLOv5_AC_CBAM_Exp

The identification results of YOLOv5_AC_CBAM and YOLOv5_AC_CBAM_Exp on the SR test images were compared, as shown in Table 4. It can be observed that by simultaneously incorporating the three threshold conditions, $F_r$ significantly decreased from 6.1% to 1.6%, a reduction of 4.5%. However, the cost of this improvement was a 0.4% increase in $M_r$. In terms of overall accuracy, there was a remarkable increase of 4.2%, reaching as high as 97.4%. At the cost of a slight increase in missed detections, $F_r$ can be greatly reduced by the method of threshold re-identification with QRS features, thereby improving the overall identification effect.

5. Discussion

5.1. Effect Analysis of Quantitative Data Processing

In a snow-covered environment, the highlighted area of a wind turbine is easily obscured by snow or blended with it, resulting in target features that are difficult to distinguish. Additionally, influenced by surface type and sunlight, the body of the wind turbine occupied only a small number of pixels, and its elongated shadow comprised dark pixels. This made it highly susceptible to blending in with dark backgrounds, thereby significantly increasing the difficulty of TDI. Figure 15 shows that where YOLOv5 exhibited false and missed detections, YOLOv5_AC could accurately identify the targets. In Figure 15a, the false detection objects were predominantly objects with strong reflections and shadows with low reflectivity, both of which displayed striped textures bearing a striking resemblance to the characteristics of the wind turbine body and its shadow. This observation indicates that even in such a complex background, the YOLOv5_AC model is capable of accurately identifying targets, underscoring the model's robust adaptability.
It can be seen from Table 4 that the identification effect of YOLOv5_AC improved compared with YOLOv5. The only difference between the two methods lies in the input data, while the network models were essentially the same. Therefore, the performance improvements of YOLOv5_AC can be attributed to quantitative AC. Compared with YOLOv5_AC, the identification accuracy of YOLOv5_AC_CBAM was improved by 0.5%. Since YOLOv5_AC_CBAM used the same data source as YOLOv5_AC and only the network structure changed, the model structure improvements were responsible for this gain. Overall, YOLOv5_AC_CBAM_Exp showed the greatest improvement: QRS feature information was added into YOLOv5_AC_CBAM_Exp, and the improvement in the identification effect was very obvious. Quantitative AC contributed the most, and the model improvements also played a role.
The accuracy of image identification was significantly improved after quantitative AC. This proved that quantified RS information has a positive effect on the identification of RS targets by deep learning models and is of great significance for improving the accuracy of RS TDI. RS data that undergo quantitative AC are more conducive to the development of high-resolution small-target TDI.
In addition to the common features of natural images, RS images after quantitative AC also have spectral reflectance features, which are not available in natural images. Under quantitative data processing, on the one hand, the clarity and quality of the image data were enhanced, and the physical features of the images, such as geometry and texture, were enhanced at the input data level of the model. On the other hand, after image correction, QRS information such as spectral reflectance and radiation was added to the model input. At the input level, the target feature quantity was increased, and a data processing foundation was laid for other fusion identification methods. AC in effect enhances and imparts quantitative radiometric information to RS images, so the improvement in TDI accuracy brought by AC indicates that radiometric quantification information has a positive effect on CV TDI.

5.2. The Overall Effectiveness of Using QRS Information

In fact, when observing the deep learning model's identification results, it can be noticed that most of the "false targets", which were mistakenly identified as real targets by the model, can easily be distinguished as "true" or "false" through visual interpretation. This indirectly indicates that deep learning models cannot be solely relied on to fully learn the seemingly obvious high-resolution target feature information. Therefore, it was essential to fully explore the quantitative information differences between "true" and "false" targets, allowing the computer to intelligently distinguish the "false targets" mixed into the target group, similar to human visual perception. In this paper, by comparing and analyzing wind turbines with other ground objects, three kinds of feature information of wind turbine targets were selected: quantitative spectral, geometric, and image texture features. The obvious feature differences of high-resolution RS targets were used to constrain the identification results of YOLOv5_AC_CBAM using expert knowledge, and the "false targets" were further filtered out of the target group to achieve accurate identification.
Considering the multi-angle imaging characteristics of RS satellites, the variations in a wind turbine are significant across different GF-2 images, but within the same GF-2 image, the features of the wind turbines, such as size, are highly similar. According to the experimental results, the targets predicted by the model with a confidence of more than 0.8 were considered "true targets". The threshold ranges of the three quantitative features were determined according to the characteristics of these "true targets". The feature thresholds changed for each image; each set of feature thresholds was only suitable for the single GF-2 image currently being detected and not for all images. However, the method of selecting the thresholds was the same for all images, so the threshold determination method proposed in this paper is universal for other RS images.
To verify and analyze the effectiveness of each introduced feature threshold constraint condition, an ablation experiment was conducted. Based on the three threshold conditions mentioned above, (Fa) quantitative geometric features, (Fb) image texture features, and (Fc) quantitative spectral reflectance features, the ablation experiment was performed with single or combined conditions, and the experimental results are shown in Table 6.
From the results of Fa, Fb, and Fc, the improvement effect of Fa was the most obvious: $F_r$ was greatly reduced, but the corresponding $M_r$ increased the most. The effect of Fa was better than the effects of Fb, Fa + Fb, and Fa + Fc, which showed that the information based on quantitative geometric features played the most important role in threshold re-identification, followed by quantitative spectral features. Information based on image texture features played a relatively small role. The surface texture features of the ground objects differed, and a fraction of the false detections with large differences in texture information could be removed. For wind turbine targets, some false detection objects exhibited significant geometric feature differences from real wind turbines, such as green land and farmland. Geometric features are unique quantitative features in RS images with fixed spatial resolution, which differ from natural images. An identification box that is too long or too wide for a wind turbine target in an RS image of fixed spatial resolution is an anomaly, and it was easy and effective to use geometric information to eliminate these "false targets". The quantitative spectral reflectance feature had a good effect in removing false detection objects with a large proportion of dark pixels, such as green land. If the proportion of highlighted elements such as buildings and roads was relatively large, the removal effect of the quantitative spectral reflectance features alone was weak. On the whole, all three kinds of feature information had a positive effect on TDI.
Coupled with QRS information, YOLOv5_AC_CBAM_Exp showed the most obvious positive effect. Geometric and textural features are common to both natural and RS wind turbine images, so in theory they can be absorbed by deep learning models. The results in Table 6 indicate that the deep learning model already mined image texture features sufficiently, so expert-knowledge texture re-identification contributed little. However, in RS images with a fixed spatial resolution, geometric features are inherently quantitative; the deep learning model could not fully mine and absorb them, whereas expert-knowledge geometric re-identification yielded a large accuracy improvement. The YOLOv5_AC_CBAM_Exp method therefore exploits the quantitative geometric features of RS images to maximum effect. In addition, quantitative AC restored the intrinsic information of ground objects, so each object class exhibits a consistent spectral trend: the spectral signature of the real target was mined from its per-band reflectance, and “false targets” deviating excessively from this signature were eliminated by fully exploiting the quantitative spectral information.
Quantitative spectral reflectance, geometric, and texture feature information complement each other, with different features contributing to identification accuracy in complementary ways. For other high-resolution small-target TDI tasks, effective quantitative features can likewise be selected according to the specific characteristics of the targets and applied to achieve precise TDI.

5.3. Comparison and Analysis of Different Methods

In this paper, the use of QRS information to support deep learning identification was explored step by step, with each method building on the last. Compared with existing research on deep learning for RS small-target TDI [48,49,50], the proposed methods not only improve the network model but also pay closer attention to the quantitative processing of the data and to the effective use of QRS information within deep learning models. The proposed methods demonstrated varying degrees of positive effect on the TDI of high-resolution wind turbines.
Unlike most deep learning approaches that use high-resolution RS target datasets directly [51], in this paper quantitative AC was performed in the data preprocessing stage, which removed the strong atmospheric noise affecting RS TDI, restored the intrinsic information of the surface, and made the quantitative information of RS targets available for TDI. The accuracy improvement of each method is shown in Figure 16. Applying the wind turbine sample database constructed after quantitative AC to the YOLOv5 model raised image identification accuracy from 90.5% to 92.7%, an improvement of 2.2%. Building on YOLOv5_AC, a further improvement was then sought at the level of the deep learning model itself: adding the CBAM attention mechanism helped, but accuracy increased by only 0.5%. Finally, the available QRS information was further explored and integrated into the deep learning TDI pipeline on the basis of the YOLOv5_AC_CBAM identification results: coupling the quantitative spectral, geometric, and texture features of RS targets into wind turbine re-identification led to a direct and effective improvement. This last method, built upon the first two, achieved the most substantial gain, raising the accuracy from 93.2% to 97.4%. Overall, each additional dimension of QRS information further enhanced TDI accuracy.

6. Conclusions

To facilitate the integration of QRS information with deep learning models, this paper proposed three progressively enhanced methods for identifying targets in high-resolution optical images: YOLOv5_AC, YOLOv5_AC_CBAM, and YOLOv5_AC_CBAM_Exp. Firstly, quantitative AC was applied to the high-resolution RS data to restore the intrinsic information of ground objects and obtain SR data, improving image quality and allowing quantitative information about RS targets to be extracted; this increased the amount of QRS information fed into the YOLOv5 model, and identification accuracy rose from 90.5% to 92.7% with YOLOv5_AC. Then, the CBAM attention mechanism was incorporated into YOLOv5_AC, and accuracy rose from 92.7% to 93.2% with YOLOv5_AC_CBAM. Finally, QRS target features selected using expert knowledge were used to further constrain the YOLOv5_AC_CBAM identification results, and accuracy rose from 93.2% to 97.4% with YOLOv5_AC_CBAM_Exp. Together, these progressive methods achieved significant results in identifying high-resolution RS wind turbine targets.
The QRS information obtained by quantitative processing, such as spectral and radiometric information, was crucial for improving identification performance and achieving high precision, validating the positive effect of QRS information on RS TDI. The methods above showed different degrees of enhancement and indicate great potential for the TDI of high-resolution RS images: for small-target TDI in high-resolution RS images, QRS information plays a significant and indispensable supporting role and offers extensive research prospects. The effectiveness of QRS information for TDI with the YOLO network was successfully demonstrated in this paper, offering a fresh perspective and an innovative approach for research in related fields.
In future studies, as YOLO and other object detection algorithms continue to develop rapidly, additional comparative experiments will be conducted on the latest models to verify the robustness of the proposed methods. YOLOv5_AC_CBAM_Exp will be improved and integrated into an intelligent model that can automatically select appropriate QRS features for TDI, attending automatically to spectral, radiometric, and other differences between a target and its background environment. The proposed methods will also be improved and applied to other RS images. Building a deep learning model that automatically fuses rich QRS information with two-dimensional image information, and making such a model broadly applicable, will require further in-depth cross-disciplinary research between the RS and CV communities.

Author Contributions

Conceptualization, X.C. and J.L.; methodology, X.C., Y.Z. and S.L.; software, X.C., Y.Z. and W.X.; validation, X.C. and Y.Z.; formal analysis, X.C., Y.Z. and L.M.; investigation, X.M. and W.W.; resources, J.Y., Q.M. and J.L.; data curation, J.L. and W.X.; writing—original draft preparation, X.C. and Y.Z.; writing—review and editing, X.C. and Y.Z.; visualization, X.C. and Y.Z.; supervision, Q.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Civil Aerospace Technology Pre-research Project of China’s 14th Five-Year Plan (Grant Nos. D040202, D040201, D040401, and D040404), the Shandong Provincial Key R&D Program of China (Grant No. 2024TSGC0428), and the National Natural Science Foundation of China (Grant No. 42171342).

Data Availability Statement

The data used in the reported study were obtained from the websites indicated in this text.

Acknowledgments

The authors acknowledge the China Center for Resources Satellite Data and Application for providing satellite data.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Liu, C.; Chen, Y.; Chen, F.; Zhu, P.; Chen, L. Sliding window change point detection based dynamic network model inference framework for airport ground service process. Knowl.-Based Syst. 2022, 238, 107701. [Google Scholar] [CrossRef]
  2. Chen, X.; Jiang, W.; Qi, H.; Liu, M.; Ma, H.; Yu, P.L.; Wen, Y.; Han, Z.; Zhang, S.; Cao, G. Adaptive meta-knowledge transfer network for few-shot object detection in very high resolution remote sensing images. J. Appl. Earth Obs. Geoinf. 2024, 127, 103675. [Google Scholar] [CrossRef]
  3. Li, G.; Bai, Z.; Liu, Z.; Zhang, X.; Ling, H. Salient Object Detection in Optical Remote Sensing Images Driven by Transformer. IEEE Trans. Image Process. 2023, 32, 5257–5269. [Google Scholar] [CrossRef] [PubMed]
  4. Vu, B.N.; Bi, J.; Wang, W.; Huff, A.; Kondragunta, S.; Liu, Y. Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California. Remote Sens. Environ. 2022, 271, 112890. [Google Scholar] [CrossRef] [PubMed]
  5. Yurtseven, H.; Yener, H. Using of high-resolution satellite images in object-based image analysis. Eurasian J. For. Sci. 2019, 7, 187–204. [Google Scholar] [CrossRef]
  6. Başeski, E. Heliport Detection Using Artificial Neural Networks. Photogramm. Eng. Remote Sens. 2020, 86, 541–546. [Google Scholar] [CrossRef]
  7. Raghavi, K. Novel Method for Detection of Ship Docked in Harbor in High Resolution Remote Sensing Image. Indones. J. Electr. Eng. Comput. Sci. 2018, 9, 12–14. [Google Scholar] [CrossRef]
  8. Liu, Q.; Xiang, X.; Wang, Y.; Luo, Z.; Fang, F. Aircraft detection in remote sensing image based on corner clustering and deep learning. Eng. Appl. Artif. Intell. 2020, 87, 103333. [Google Scholar] [CrossRef]
  9. Li, B.; Xie, X.; Wei, X.; Tang, W. Ship detection and classification from optical remote sensing images: A survey. Chin. J. Aeronaut. 2020, 34, 145–163. [Google Scholar] [CrossRef]
  10. Wu, W. Quantized Gromov-Hausdorff distance. J. Funct. Anal. 2006, 238, 58–98. [Google Scholar] [CrossRef]
  11. Harel, J.; Koch, C.; Perona, P. Graph-based visual saliency. In Proceedings of the Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 4 December 2006. [Google Scholar]
  12. Zhang, R.; Xu, L.; Yu, Z.; Shi, Y.; Mu, C.; Xu, M. Deep-IRTarget: An Automatic Target Detector in Infrared Imagery Using Dual-Domain Feature Extraction and Allocation. IEEE Trans. Multimedia. 2022, 24, 1735–1749. [Google Scholar] [CrossRef]
  13. Wu, Q.; Li, Y.; Huang, W.; Chen, Q.; Wu, Y. C3TB-YOLOv5: Integrated YOLOv5 with transformer for object detection in high-resolution remote sensing images. Int. J. Remote Sens. 2024, 45, 2622–2650. [Google Scholar] [CrossRef]
  14. Sun, X.; Wang, P.; Lu, W.; Zhu, Z.; Lu, X.; He, Q.; Li, J.; Rong, X.; Yang, Z.; Chang, H.; et al. RingMo: A Remote Sensing Foundation Model with Masked Image Modeling. IEEE Trans. Geosci. Remote Sens. 2022, 61, 99. [Google Scholar] [CrossRef]
  15. Cao, Z.; Jiang, L.; Yue, P.; Gong, J.; Hu, X.; Liu, S.; Tan, H.; Liu, C.; Shangguan, B.; Yu, D. A large scale training sample database system for intelligent interpretation of remote sensing imagery. Geo-Spat. Inf. Sci. 2023, 27, 1489–1508. [Google Scholar] [CrossRef]
  16. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307. [Google Scholar] [CrossRef]
  17. Zhang, R.; Liu, G.; Zhang, Q.; Lu, X.; Dian, R.; Yang, Y.; Xu, L. Detail-Aware Network for Infrared Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2025, 63, 50003. [Google Scholar] [CrossRef]
  18. Lou, P.; Fu, B.; Lin, X.; Tang, T.; Bi, L. Quantitative Remote Sensing Analysis of Thermal Environment Changes in the Main Urban Area of Guilin Based on Gee. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 42, 881–888. [Google Scholar] [CrossRef]
  19. Khaleal, F.M.; El-Bialy, M.Z.; Saleh, G.M.; Lasheen, E.S.R.; Kamar, M.S.; Omar, M.M.; Abdelaal, A. Assessing environmental and radiological impacts and lithological mapping of beryl-bearing rocks in Egypt using high-resolution sentinel-2 remote sensing images. Sci. Rep. 2023, 13, 11497. [Google Scholar] [CrossRef]
  20. Zhang, R.; Tan, J.; Cao, Z.; Xu, L.; Liu, Y.; Si, L.; Sun, F. Part-Aware Correlation Networks for Few-Shot Learning. IEEE Trans. Multimedia. 2024, 26, 9527–9538. [Google Scholar] [CrossRef]
  21. Chen, J.; Yue, A.; Wang, C.; Huang, Q.; Chen, J.; Meng, Y.; He, D. Wind turbine extraction from high spatial resolution remote sensing images based on saliency detection. J. Appl. Remote Sens. 2018, 12, 016041. [Google Scholar] [CrossRef]
  22. Bian, L.; Li, B.; Wang, J.; Gao, Z. Multi-branch stacking remote sensing image target detection based on YOLOv5. Egypt. J. Remote Sens. 2023, 26, 999–1008. [Google Scholar] [CrossRef]
  23. Jha, S.S.; Kumar, M.; Nidamanuri, R.R. Multi-platform optical remote sensing dataset for target detection. Data Brief. 2020, 33, 106362. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, H.; Zhang, L.; Ma, J.; Zhang, J. Target Heat-map Network: An End-to-end Deep Network for Target Detection in Remote Sensing Images. Neurocomputing 2018, 331, 375–387. [Google Scholar] [CrossRef]
  25. Yu, X.; Hoff, L.E.; Reed, I.S.; Chen, A.M.; Stotts, L.B. Automatic target detection and recognition in multiband imagery: A unified ML detection and estimation approach. IEEE Trans. Image Process. 1997, 6, 143–156. [Google Scholar] [CrossRef]
  26. Xie, W.; Zhang, J.; Lei, J.; Li, Y.; Jia, X. Self-spectral learning with GAN based spectral-spatial target detection for hyperspectral image. Neural Netw. 2021, 142, 375–387. [Google Scholar] [CrossRef]
  27. Hang, R.; Liu, Q.; Hong, D.; Ghamisi, P. Cascaded recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5384–5394. [Google Scholar] [CrossRef]
  28. Zhang, L.; Huang, X.; Huang, B.; Li, P. A pixel shape index coupled with spectral information for classification of high spatial resolution remotely sensed imagery. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2950–2961. [Google Scholar] [CrossRef]
  29. Barbato, M.P.; Piccoli, F.; Napoletano, P. Ticino: A multi-modal remote sensing dataset for semantic segmentation. Exp. Sys. Appl. 2024, 249, 123600. [Google Scholar] [CrossRef]
  30. Hong, D.; Zhang, B.; Li, X.; Li, Y.; Li, C.; Yao, J.; Yokoya, N.; Li, H.; Ghamisi, P.; Jia, X.; et al. SpectralGPT: Spectral Remote Sensing Foundation Model. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5227–5244. [Google Scholar] [CrossRef] [PubMed]
  31. Yang, H.; Wang, Z.; Cao, J.; Wu, Q.; Zhang, B. Estimating soil salinity using Gaofen-2 imagery: A novel application of combined spectral and textural features. Environ. Res. 2022, 217, 114870. [Google Scholar] [CrossRef] [PubMed]
  32. Ren, B.; Ma, S.; Hou, B.; Hong, D.; Chanussot, J.; Wang, J.; Jiao, L. A dual-stream high resolution network: Deep fusion of GF-2 and GF-3 data for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102896. [Google Scholar] [CrossRef]
  33. Li, Y.; Wang, C.; Wright, A.; Liu, H.; Zhang, H.; Zong, Y. Combination of GF-2 high spatial resolution imagery and land surface factors for predicting soil salinity of muddy coasts. Catena 2021, 202, 105304. [Google Scholar] [CrossRef]
  34. Liu, C.C.; Chen, P.L. Automatic extraction of ground control regions and orthorectification of remote sensing imagery. Opt. Express. 2009, 17, 7970–7984. [Google Scholar] [CrossRef]
  35. Liu, S.; Zhang, Y.; Zhao, L.; Chen, X.; Zhou, R.; Zheng, F.; Li, Z.; Li, J.; Yang, H.; Li, H.; et al. QUantitative and Automatic Atmospheric Correction (QUAAC): Application and Validation. Sensors 2022, 22, 3280. [Google Scholar] [CrossRef]
  36. Liu, G.; Wang, Y.; Guo, L.; Ma, C. Research on fusion of GF-6 imagery and quality evaluation. E3S Web Conf. 2020, 165, 03016. [Google Scholar] [CrossRef]
  37. Zi, N.; Li, X.M.; Gade, M.; Fu, H.; Min, S. Ocean eddy detection based on YOLO deep learning algorithm by synthetic aperture radar data. Remote Sens. Environ. 2024, 307, 114139. [Google Scholar] [CrossRef]
  38. Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  39. Zhang, W.; Liu, Z.; Zhou, S.; Qi, W.; Wu, X.; Zhang, T.; Han, L. LS-YOLO: A Novel Model for Detecting Multi-Scale Landslides with Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 4952–4965. [Google Scholar] [CrossRef]
  40. Yu, L.; Qian, M.; Chen, Q.; Sun, F.; Pan, J. An improved YOLOv5 model: Application to mixed impurities detection for walnut kernels. Foods 2023, 12, 624. [Google Scholar] [CrossRef]
  41. Yin, M.; Chen, Z.; Zhang, C. A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 2406. [Google Scholar] [CrossRef]
  42. Guo, Y.; Aggrey, S.E.; Yang, X.; Oladeinde, A.; Qiao, Y.; Chai, L. Detecting broiler chickens on litter floor with the YOLOv5-CBAM deep learning model. Artif. Intell. Agric. 2023, 9, 36–45. [Google Scholar] [CrossRef]
  43. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8 September 2018. [Google Scholar]
  44. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man. Cybern. 1973, 6, 610–621. [Google Scholar] [CrossRef]
  45. Hu, J.; Wei, Y.; Chen, W.; Zhi, X.; Zhang, W. CM-YOLO: Typical Object Detection Method in Remote Sensing Cloud and Mist Scene Images. Remote Sens. 2025, 17, 125. [Google Scholar] [CrossRef]
  46. Wu, Z.; Wu, D.; Li, N.; Chen, W.; Yuan, J.; Yu, X.; Guo, Y. CBGS-YOLO: A Lightweight Network for Detecting Small Targets in Remote Sensing Images Based on a Double Attention Mechanism. Remote Sens. 2025, 17, 109. [Google Scholar] [CrossRef]
  47. Zhou, H.; Wu, S.; Xu, Z.; Sun, H. Automatic detection of standing dead trees based on improved YOLOv7 from airborne remote sensing imagery. Front. Plant Sci. 2024, 15, 1278161. [Google Scholar] [CrossRef]
  48. Hui, Y.; Wang, J.; Li, B. DSAA-YOLO: UAV remote sensing small target recognition algorithm for YOLOV7 based on dense residual super-resolution and anchor frame adaptive regression strategy. J. King Saud. Univ. Comput. Inf. Sci. 2024, 36, 101863. [Google Scholar] [CrossRef]
  49. Ma, D.; Liu, B.; Huang, Q.; Zhang, Q. MwdpNet: Towards improving the recognition accuracy of tiny targets in high-resolution remote sensing image. Sci. Rep. 2023, 13, 13890. [Google Scholar] [CrossRef]
  50. Lin, S. Automatic recognition and detection of building targets in urban remote sensing images using an improved regional convolutional neural network algorithm. Cogn. Comput. Syst. 2023, 5, 132–137. [Google Scholar] [CrossRef]
  51. Sun, X.; Wang, P.; Yan, Z.; Xu, F.; Wang, R.; Diao, W.; Chen, J.; Li, J.; Feng, Y.; Xu, T.; et al. FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 116–130. [Google Scholar] [CrossRef]
Figure 1. Statistics on the number of HSR RS satellites launched by China, the USA, and worldwide; “total” refers to the total number of RS satellites launched (https://space.oscar.wmo.int/, last accessed on 23 July 2024).
Figure 2. The basic information of LuoJiaSET’s DOTA dataset for TDI (https://captain-whu.github.io/DOTA/index.html, last accessed on 12 May 2024).
Figure 3. The GF-2 L1-level image preprocessing workflow. The pixels of GF-2 L1-level images represent DN values. GF-2 L1-level images may contain topographic distortion, which is eliminated by orthorectification. Quantitative AC comprises three steps: radiometric calibration, path radiation correction, and adjacency effect correction; after quantitative AC, each pixel represents the spectral reflectance of the ground object. Image fusion combines the multispectral and panchromatic GF-2 images to improve spatial resolution. GF-2 L1-level images containing wind turbines were preprocessed along two different routes to obtain two image sets: SR images and DN value images.
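As a rough illustration of the calibration step in this chain, a generic DN-to-TOA-reflectance conversion is sketched below. It is not the paper’s full AC implementation: the gain, offset, band irradiance, and solar geometry are placeholders to be read from the GF-2 scene metadata, and the path radiation and adjacency effect corrections are omitted.

```python
import numpy as np

def dn_to_toa_reflectance(dn, gain, offset, esun, sun_elev_deg, d_au=1.0):
    """Generic optical calibration: DN -> at-sensor radiance -> TOA reflectance.

    dn: raw digital-number array for one band
    gain, offset: absolute radiometric calibration coefficients of the band
    esun: mean solar exo-atmospheric irradiance of the band (W m-2 um-1)
    sun_elev_deg: solar elevation angle from the scene metadata
    d_au: Earth-Sun distance in astronomical units
    """
    radiance = gain * dn.astype(np.float64) + offset      # W m-2 sr-1 um-1
    sun_zenith = np.deg2rad(90.0 - sun_elev_deg)
    return np.pi * radiance * d_au**2 / (esun * np.cos(sun_zenith))
```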
Figure 4. A comparison of images before (left) and after (right) AC. After quantitative AC, the quality and clarity of the images were significantly improved.
Figure 5. Labeled examples of wind turbine samples. The samples were labeled manually with horizontal boxes; through visual annotation, both the wind turbine body and its shadow projected on the ground were included in the labeled box as the target.
Figure 6. An overview of the research route. The SR data represent the sample database with spectral information after AC. The DN value and SR sample databases were each split into training, validation, and testing sets in an 8:1:1 ratio. The three RGB bands of the DN value data were used for YOLOv5 training and testing, and the three RGB bands of the SR data were used for YOLOv5_AC and YOLOv5_AC_CBAM training and testing. Based on the identification results of YOLOv5_AC_CBAM, YOLOv5_AC_CBAM_Exp utilized the feature information of all four bands of the SR data.
Figure 7. The YOLOv5_AC_CBAM model structure. The red boxes are the CBAMs added in the neck part [41].
Figure 8. The structure of the CBAM attention mechanism (adapted with permission from Ref. [43]). MaxPool takes the maximum of the feature points in a neighborhood, while AvgPool takes their average; “shared MLP” means the two-layer neural network is shared between the two pooled branches.
Figure 9. Six common types of falsely detected ground objects: striated ground, green land, road, farmland, building, and power tower. These false detections resemble wind turbines in being composed of high-reflectance pixels alongside dark shadow-like pixels, but they differ significantly from wind turbines in geometric, textural, and other features.
Figure 10. Average spectral curves, with error bars, for multiple ground objects (a) and wind turbines (b). The sample size for each ground object was 20 pixels; the average spectral reflectance is the mean reflectance of 20 adjacent pixels within the object. For wind turbines, the reflectance of most turbine bodies was lowest in the blue band yet still greater than 0.2, while turbine shadows are dark pixels with low reflectance. The reflectance of most falsely detected ground objects was below 0.2 in the blue band.
Figure 11. Statistical histograms of blue-band reflectance for falsely detected ground objects (a) and wind turbines (b). “Frequency” is the number of pixels in the prediction box with a given reflectance value; “Reflectance” ranges from 0 to 1. Most false detections contain few pixels with blue-band reflectance exceeding 0.2, whereas wind turbine boxes contain relatively many such pixels. The spectral difference between false detections and wind turbines can therefore be separated in the blue band.
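In code, the blue-band criterion of Figures 10 and 11 reduces to the share of pixels in a predicted box whose surface reflectance exceeds 0.2. A minimal sketch, where only the 0.2 cutoff comes from the figures and everything else is illustrative:

```python
import numpy as np

def blue_band_fraction(blue_sr_patch, cutoff=0.2):
    """Fraction of pixels in a predicted box whose blue-band SR exceeds the cutoff."""
    return float((np.asarray(blue_sr_patch) > cutoff).mean())

# Per Figures 10 and 11, a wind turbine box should contain noticeably more
# bright (SR > 0.2) blue-band pixels than false detections such as green land.
```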
Figure 12. Comparison of predicted bounding box sizes of falsely detected ground objects (a–d) and wind turbine targets (e,f).
Figure 13. A comparison of characteristic GLCM parameters of falsely detected ground objects and wind turbine targets. “Angle” refers to the four angle values in the GLCM. Apart from CON, it is challenging to distinguish false detections from wind turbines using HOM, DIS, ENT, ASM, or COR. At the π/4 and 3π/4 angles, the CON of the wind turbine is higher than that of most false detections, so the textural difference between false detections and wind turbines can be effectively extracted from CON.
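The CON statistic compared in Figure 13 can be computed, for example, with scikit-image (assuming version 0.19 or later, where the functions are named graycomatrix and graycoprops); the quantization to 64 gray levels and the unit pixel offset are illustrative choices:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_contrast(gray_patch, levels=64):
    """GLCM contrast (CON) of one image patch at the four standard angles.

    gray_patch: 2-D uint8 array (e.g., a predicted-box crop in grayscale),
    quantized to `levels` gray levels before the GLCM is built. Returns
    the CON values for angles 0, pi/4, pi/2, and 3*pi/4; per Figure 13,
    the pi/4 and 3*pi/4 values separate wind turbines from most false
    detections.
    """
    q = (gray_patch.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    return graycoprops(glcm, "contrast")[0]  # one CON value per angle
```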
Figure 14. A flowchart of dynamic threshold re-identification aided by high-confidence target feature information. Target identification was performed on the GF-2 SR images using YOLOv5_AC_CBAM. Objects with a prediction confidence above 0.8 were classified as high-confidence targets; those with lower scores were treated as low-confidence objects and re-identified. The image texture, quantitative geometry, and spectral reflectance features extracted from the high-confidence targets were used to dynamically adjust the threshold conditions for re-identification, given by Equations (5)–(8). Objects satisfying Equations (5)–(8) were confirmed as targets; those that did not were classified as non-targets.
Figure 15. The missed detection (a) and false detection (b) objects of the YOLOv5 identification. The red boxes represent the missed detection objects, and the orange boxes show the false detection objects of the YOLOv5 model identification.
Figure 16. The accuracy improvement achieved by the three identification methods. The rightmost numbers give the identification accuracy on the wind turbine test images and its increase. Relative to YOLOv5, the accuracy of YOLOv5_AC increased by 2.2%; relative to YOLOv5_AC, the accuracy of YOLOv5_AC_CBAM increased by 0.5%; and on the basis of YOLOv5_AC_CBAM, the accuracy of YOLOv5_AC_CBAM_Exp improved by 4.2%. YOLOv5_AC_CBAM_Exp, which added QRS feature information for threshold re-identification, achieved the largest improvement.
Table 1. Spectral bands of the GF-2 satellite.

| Payload | Band Number | Spectral Band | Spatial Resolution |
|---|---|---|---|
| Multispectral and Panchromatic Cameras | 1 | 0.45–0.90 µm | 1 m |
| | 2 | 0.45–0.52 µm | 4 m |
| | 3 | 0.52–0.59 µm | 4 m |
| | 4 | 0.63–0.69 µm | 4 m |
| | 5 | 0.77–0.89 µm | 4 m |
Table 2. The probability of correct identification of wind turbines under different confidence thresholds. When the confidence threshold was 0.8 or higher, the identified targets were all wind turbines.

| Confidence Threshold | P_w (%) |
|---|---|
| 0.75 | 95.7 |
| 0.80 | 100 |
| 0.85 | 100 |
| 0.90 | 100 |
Table 3. The identification results of YOLOv5 and YOLOv5_AC on the test set. After quantitative AC, every index of YOLOv5_AC improved over YOLOv5.

| Model | Quantitative AC | P | R | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|---|---|
| YOLOv5 | No | 0.944 | 0.925 | 0.938 | 0.735 |
| YOLOv5_AC | Yes | 0.957 | 0.946 | 0.953 | 0.744 |
Table 4. Index statistics of the test image identification results for YOLOv5, YOLOv5_AC, YOLOv5_AC_CBAM, and YOLOv5_AC_CBAM_Exp. ACC improved step by step and was highest for YOLOv5_AC_CBAM_Exp, rising from 90.5% to 97.4%. F_r declined step by step and was lowest for YOLOv5_AC_CBAM_Exp, dropping from 7.8% to 1.6%. M_r was lowest for YOLOv5_AC_CBAM. Each method improved on YOLOv5, with YOLOv5_AC_CBAM_Exp improving the most.

| Model | Total True Targets | TP | FP | FN | ACC (%) | F_r (%) | M_r (%) |
|---|---|---|---|---|---|---|---|
| YOLOv5 | 1912 | 1875 | 158 | 37 | 90.5 | 7.8 | 2.0 |
| YOLOv5_AC | 1912 | 1897 | 133 | 15 | 92.7 | 6.6 | 0.78 |
| YOLOv5_AC_CBAM | 1912 | 1898 | 124 | 14 | 93.2 | 6.1 | 0.73 |
| YOLOv5_AC_CBAM_Exp | 1912 | 1891 | 30 | 21 | 97.4 | 1.6 | 1.1 |
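For reference, the indices in Tables 4 and 6 are mutually consistent with the following definitions, inferred here from the tabulated values rather than restated from the paper’s equations; e.g., for YOLOv5, ACC = 1875/(1875 + 158 + 37) ≈ 90.5% and F_r = 158/(1875 + 158) ≈ 7.8%.

```latex
\mathrm{ACC} = \frac{TP}{TP + FP + FN}, \qquad
F_r = \frac{FP}{TP + FP}, \qquad
M_r = \frac{FN}{TP + FN}
```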
Table 5. The identification results of different network models on the test sets. YOLOv5_AC and YOLOv5_AC_CBAM performed relatively better.

| Model | P | R | AP@0.5 | AP@0.5:0.95 |
|---|---|---|---|---|
| Faster R-CNN | 0.934 | 0.943 | 0.937 | 0.675 |
| YOLOv7 | 0.935 | 0.944 | 0.938 | 0.670 |
| YOLOv5_AC | 0.957 | 0.946 | 0.953 | 0.744 |
| YOLOv5_AC_CBAM | 0.960 | 0.949 | 0.957 | 0.746 |
Table 6. The results of the ablation experiment. Performance was best when the geometric, image texture, and spectral reflectance features were jointly employed.

| Features | Total True Targets | TP | FP | FN | ACC (%) | F_r (%) | M_r (%) |
|---|---|---|---|---|---|---|---|
| None | 1912 | 1898 | 124 | 14 | 93.2 | 6.1 | 0.7 |
| Fa | 1912 | 1891 | 49 | 21 | 96.4 | 2.5 | 1.1 |
| Fb | 1912 | 1896 | 93 | 16 | 94.6 | 4.7 | 0.8 |
| Fc | 1912 | 1897 | 72 | 15 | 95.6 | 3.7 | 0.8 |
| Fa + Fb | 1912 | 1891 | 35 | 21 | 97.1 | 1.8 | 1.1 |
| Fa + Fc | 1912 | 1891 | 37 | 21 | 97.0 | 1.9 | 1.1 |
| Fb + Fc | 1912 | 1894 | 59 | 18 | 96.1 | 3.0 | 0.9 |
| Fa + Fb + Fc | 1912 | 1891 | 30 | 21 | 97.4 | 1.6 | 1.1 |

Fa: quantitative geometric features; Fb: image texture features; Fc: quantitative spectral reflectance features.