Article

Dynamic Cascade Detector for Storage Tanks and Ships in Optical Remote Sensing Images

by Tong Wang 1,*, Bingxin Liu 2,† and Peng Chen 2,†
1 College of Computing and Data Science, Nanyang Technological University, Singapore 639978, Singapore
2 College of Navigation, Dalian Maritime University, Dalian 116026, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2025, 17(11), 1882; https://doi.org/10.3390/rs17111882
Submission received: 13 March 2025 / Revised: 9 May 2025 / Accepted: 26 May 2025 / Published: 28 May 2025

Abstract

Region-based Convolutional Neural Network (RCNN)-based detectors have played a crucial role in object detection in remote sensing images due to their exceptional detection capabilities. Some studies have shown that different stages should use different Intersection over Union (IoU) thresholds to distinguish positive and negative samples because each stage has a different IoU distribution. However, these studies have overlooked the fact that the IoU distribution at each stage changes continuously during training. Therefore, the IoU threshold at each stage should also be adjusted continuously to adapt to the changing IoU distribution. We observed that the IoU distribution at each stage closely resembles a Gaussian skewed distribution. In this paper, we introduce a novel dynamic IoU threshold method based on the Cascade RCNN architecture, called the Dynamic Cascade detector, built with reference to the Gaussian skewed distribution. We tested the effectiveness of this method by detecting horizontal storage tanks and rotated ships in optical remote sensing images. Our experiments demonstrated that this technique can significantly improve detection results, as evaluated with the COCO metrics. In addition, the threshold range of the last stage affects the other stages, so the threshold range of a given stage may change significantly when the number of stages changes. Furthermore, the threshold does not always increase during training and may decrease when the IoU distribution resembles a negatively skewed distribution.


1. Introduction

Optical remote sensing images are commonly used in the object-detection field because they cover large areas at high resolution, making it easier to detect and monitor specific objects. For example, remote sensing images can be used to detect ships in the deep sea or to identify storage tanks in a particular area to assess their capacity to store substances such as petroleum. Our upcoming tasks include the volume estimation of storage tanks and the fine-grained classification of ships. Considering that joint training with other categories might affect the detection performance for storage tanks and ships, this paper focuses solely on single-category detection of these two targets.
Object detection using convolutional neural networks (CNNs) is gaining momentum and has become a mainstream approach. Two-stage detectors, a key component of CNN-based detectors, are extensively used for detecting objects in optical remote sensing images. Cascade RCNN [1], a multi-stage detector, can achieve better results with extra stages, and many works have utilized the Cascade framework to improve their results [2,3,4]. Moreover, Cascade RCNN has shown that different stages should have different Intersection over Union (IoU) thresholds because they have distinct IoU distributions. We question whether the IoU distribution at each stage changes during training. As training progresses, the predictive power of each stage of a multi-stage model continues to increase, raising the IoU of the samples passed to the following RCNN stages and thus changing the IoU distribution. To determine whether the IoU distribution at each stage changes during training, we visualize the number of samples in different IoU intervals at different stages of storage tank detection, as depicted in Figure 1. Specifically, we only show samples with an IoU ranging from 0.5 to 1.0, since samples are usually selected within this range. We can observe that the changes in the IoU distribution are more drastic in the later stages. This highlights the necessity of a dynamic IoU threshold that adapts to the changing IoU distribution.
Some studies have focused on dynamic label assignment methods. Ref. [5] set the 75th largest IoU value of the training samples as the new threshold. Ref. [6] used the changing average IoU of the samples as the dynamic IoU threshold. However, neither method explored its effectiveness with extra RCNN stages. As Figure 1 shows, the IoU distributions at different stages are totally different. Through our experiments, we found that these two methods even had a negative effect within the cascade framework. Therefore, we set out to design a dynamic threshold method that can be used in all RCNN stages and provides a consistent improvement at every stage.
To better understand the IoU distribution, we create a histogram of the number of samples in different IoU intervals at a specific training iteration. Figure 2 shows the IoU distributions of different stages at iteration 77.3k of storage tank training on the DIOR dataset, which resemble the Gaussian skewed distributions in Figure 3. In the second stage, the samples are provided by the RPN, but the detection performance of the RPN is not ideal and only offers rough detection results, so low-IoU samples make up the majority in the second stage. As shown in Figure 1a, throughout the entire training process, samples with smaller IoUs are more numerous. In other words, the sample distribution in the second stage basically remains as shown in Figure 2a, following the positively skewed distribution illustrated in Figure 3a. In the third stage, after refinement by the first RCNN stage, the IoU of the samples passed into the third stage generally increases. Since the number of samples passed into the third stage is fixed, the number of low-IoU samples gradually decreases, and the smaller the IoU, the fewer such samples there are. As shown in Figure 1b, there is always one IoU interval with the largest number of samples, and for the IoU intervals adjacent to it, the number of samples gradually decreases as the IoU difference increases. Therefore, the sample distribution in the third stage follows the negatively skewed distribution shown in Figure 3b.
Skewness refers to the asymmetry of the probability distribution of a variable. If the independent variable ranges from 0 to 1, the mean value µ of a symmetric Gaussian distribution over this range is 0.5, which is equivalent to the threshold commonly used in detection frameworks. Based on this, we assumed that the mean and skewness may effectively divide positive and negative samples.
It is known that if a dataset follows a Gaussian distribution, samples can be selected within a certain range using the mean and standard deviation of the data. At first, we tried using the mean IoU of the samples as the dynamic threshold, but the results were poor. We then tried using the standard deviation to adjust the dynamic threshold, testing standard deviation coefficients of 0.5, 1.0, and 2.0. Although the results with coefficients of 0.5 and 1.0 were better than those without the standard deviation, they still did not demonstrate the advantage of a dynamic threshold, so we inferred that the coefficient might be inaccurate and should be less than 1.0. Through a literature review, we found that the absolute value of skewness is generally less than 1.0, so we used the skewness as the coefficient of the standard deviation to adjust the dynamic threshold.
Therefore, considering the changing IoU distribution and the Gaussian skewed distribution, we design a detector with the dynamic IoU threshold applied to the RCNN stages in the Cascade architecture, called the Dynamic Cascade detector.
We use models of our previous works [8,9] as baselines and implement the dynamic threshold method in these models. In addition, to evaluate the generalization and robustness of the dynamic threshold method, we conduct comprehensive experiments on multiple datasets. Furthermore, we generate visualizations of the threshold changes and summarize the change patterns.
The contributions of this article are summarized as follows:
  • We proposed a novel dynamic IoU threshold method, which is implemented in the RCNN stages, based on a Gaussian skewed distribution idea. This dynamic threshold method can effectively reflect changes in the sample distribution and obtain an accurate threshold.
  • Unlike other dynamic thresholding methods that are only suitable for two-stage frameworks, the dynamic threshold method that we propose is applicable to cascade structures. Even in the later stages of a cascade structure, where the stages experience more drastic changes in the sample distribution, effective dynamic thresholds can still be obtained.
  • We conducted extensive experiments on storage tank and ship detection in optical remote sensing images and demonstrated the effectiveness and applicability of the Dynamic Cascade detector to a certain extent, regardless of whether the objects are horizontal or rotated.
In Section 2, we review the related works on approaches for selecting training samples and for alleviating the feature conflict between classification and regression, which are the techniques used in our baseline models. The proposed Dynamic Cascade detector is introduced in Section 3. Section 4 presents the experiments and discussion. Conclusions are given in Section 5.

2. Related Works

2.1. Methods for Selecting Training Samples

Some methods have focused on generating diverse and realistic images to augment the training dataset. Ref. [10] proposed Diff-Mosaic, a diffusion-based model that increases the diversity and realism of augmented data via a diffusion prior, and this model significantly improved the performance of the detection network. Ref. [11] developed CamoDiffusion, which uses a conditional diffusion model to generate masks for the camouflaged object detection problem.
Multimodal data have complementary features, which can be used to enhance the detection ability of a model. Ref. [12] introduced the MMFDet model, which leverages the differences and complementary features of multimodal data to address several challenges in object detection from remote sensing imagery, such as a high proportion of small objects and poor detectability of objects in low-light environments. Ref. [13] provided an extensive overview of sensor fusion methods.
However, the above methods are mainly aimed at improving the accuracy and diversity of the source images. During the training process of the object detection model, it is necessary to select a certain number of samples based on the ground truth bounding boxes.
Many researchers have recognized the crucial role of selecting training samples during the training process. In order to improve the accuracy of initial proposals in the proposal generation stage, several methods [14,15,16,17] have been proposed. Moreover, due to the imbalance in the number of negative and positive samples, it is a challenge to effectively select negative samples. Refs. [18,19] utilized innovative approaches to filter out negative samples by shrinking the search space. Ref. [20] introduced a new loss function to solve the class imbalance problem by reducing the weight of easy-to-classify samples so that the model gives priority to difficult-to-classify samples during training. Additionally, Refs. [21,22,23,24] developed dynamic threshold techniques for sample selection. Ref. [25] designed a novel shape-dependent assignment (SDA) method that dynamically differentiates positive and negative samples based on the object shape. It is worth noting that all of these methods are tailored for the one-stage detection framework and cannot be directly applied to the RCNN stage.
Some studies have focused on the defects of existing RCNN-stage sample selection methods. In common cases, some samples are difficult to classify, for example, positive samples with relatively small IoUs or negative samples with relatively large IoUs. Ref. [26] developed an architecture to ensure that such easily misclassified “hard samples” receive additional training. However, such frameworks require more training time and increase the computational burden. Ref. [27] found that due to the large proportion of negative samples with an IoU of less than 0.05, random selection basically only captures hard negative samples with IoUs of less than 0.05. To obtain more hard negative samples from other IoU intervals, the authors proposed an IoU-balanced sampling technique to ensure that hard samples in each IoU interval have an equal probability of being selected. Ref. [28] ranked positive and negative samples separately and focused on prime samples. Refs. [29,30] focused on the imbalance problem of positive training samples and developed techniques to generate positive samples with an arbitrary IoU distribution. However, they did not consider the ratio of positive to negative samples, although there may be an imbalance problem between them. Ref. [1] developed the Cascade RCNN architecture to avoid overfitting during training and the quality mismatch at inference time. The authors pointed out that since the IoU distribution differs between stages, different stages should have different IoU thresholds. Many works have used the Cascade RCNN framework to improve their results, such as [31,32,33]. However, they failed to realize that the IoU distribution changes at each stage during training. Ref. [5] proposed a dynamic IoU threshold method for the RCNN stage, selecting the 75th largest IoU value of the training samples as the new threshold of the only RCNN stage. However, the 75th largest IoU value may not be the best choice for additional RCNN stages, because different stages have different IoU distributions and the 75th largest sample may not reflect them. Ref. [34] developed an ADAS-GPM module to address inter-sample and intra-sample imbalance problems through dynamic label assignment. ADAS-GPM contains a Gaussian probabilistic distribution-based fuzzy similarity metric (GPM), which resolves the poor matching between tiny bounding boxes and pre-defined anchors, and an adaptive dynamic anchor mining strategy (ADAS), which ensures that enough high-quality positive anchors are assigned to tiny objects. Ref. [6] used the changing average IoU of the samples as the dynamic IoU threshold. Ref. [35] calculated a dynamic IoU threshold for each ground truth (GT) box based on factors such as the GT size, shape, and distribution.
The methods mentioned above are created specifically for horizontal objects. Some of these methods may not be very effective for rotated objects due to the differences in the IoU distribution. The bounding box of a rotated object has a high aspect ratio and additional angle variation. Therefore, even a small difference in the angle between the sample and the ground truth will result in a low IoU [36].

2.2. Methods for Improving the Accuracy of Model Features

The baseline models in this paper are derived from our previous works, which are based on the idea of alleviating feature conflict between classification and regression. We will briefly introduce related works and our previous works to show their effectiveness and necessity.
Our previous works made some improvements on the task-aware spatial disentanglement (TSD) box head [37]. TSD utilizes separate deformable pooling operations [38] to obtain adaptive bin positions for classification and regression, respectively, which alleviates the spatial misalignment problem between these two sub-tasks in the sibling head. In addition, it adds an extra branch alongside the sibling head to further improve the results. However, ref. [39] pointed out that sharing the parameters of classification and regression in the sibling head may cause the two tasks to interfere with each other, and proposed a double-head architecture that separates the classification and regression parameters in the box head. More importantly, in [9], we found that as the IoU of the positive samples increases, the extra branch with a standard pooling operation becomes more important, while deformable pooling has a detrimental effect on negative samples, so we weighted the loss of the extra branch according to the IoU of the training samples.
The above methods were designed for horizontal object detection and cannot be directly used to detect rotated objects represented by rotated bounding boxes. We first designed a rotation-invariant TSD (RITSD) method in [40] to alleviate the spatial misalignment problem of rotated ships. Later, considering that a small angle error can lead to a large regression error, in [8], in addition to adding a weighted extra branch, we also designed an angle-deformable pooling and applied it to the regression subtask of the extra branch to increase the model's ability to learn angle offsets.
In addition to alleviating the feature conflict between classification and regression, feature fusion is also a commonly used approach. Ref. [41] proposed a multidomain feature fusion object detector (MFFOD), which enables the extraction and fusion of domain-specific information and global high-frequency and low-frequency information. Ref. [42] developed a deep feature learning and feature fusion network (DFLFFN) to combine the deep features and the shallow features with rich details. However, many feature fusion methods inevitably increase the number of model parameters and training time.

3. Method

As shown in Figure 4, in a general Cascade RCNN detection framework, the proposals produced by the RPN are divided into positive and negative samples based on the threshold obtained with our dynamic threshold (DT) method and are then fed into the first RCNN stage. The samples generated by each stage are re-divided into positive and negative samples using the threshold produced by the dynamic threshold method before being passed into the next RCNN stage.
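As a minimal illustration (the helper name and array-based interface below are our own, not part of any released code), the re-division at each stage reduces to comparing sample IoUs against the stage's current dynamic threshold:

import numpy as np

def split_by_threshold(ious, t_now):
    """Re-divide stage inputs into positive/negative samples using the
    stage's current dynamic threshold t_now (hypothetical helper)."""
    ious = np.asarray(ious)
    positive = ious >= t_now  # samples at or above the threshold are positives
    return positive, ~positive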
Since the detection models for horizontal storage tanks and rotated ships are different, we introduce the two models separately, along with how the dynamic threshold method is applied in each.

3.1. Horizontal Object Detection Framework

As shown in Figure 5a, the architecture is a Cascade structure. In each RCNN stage, the sibling head has been replaced with the enhanced TSD (ETSD) proposed in our previous work [9]. The input image is denoted as I, P0 represents the samples generated by the Region Proposal Network (RPN), and P1 represents the samples regressed in the previous stage. DT stands for the dynamic threshold method and DH stands for the Double-Head box head. Additionally, the architecture of the ETSD is depicted in Figure 5b. Equations (1) and (2) are the loss weight formulas for the classification and regression of samples with different IoUs, respectively.
$$w_{cls} = \begin{cases} 2.0, & IoU < threshold \\ 1.0 + \left( \left\lfloor \frac{IoU}{0.1} \right\rfloor - 5 \right) \times 0.2, & IoU \geq threshold \end{cases} \quad (1)$$
$$w_{reg} = 1.0 + \left( \left\lfloor \frac{IoU}{0.1} \right\rfloor - \left\lfloor \frac{threshold}{0.1} \right\rfloor \right) \times 0.2, \quad (2)$$
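As a sketch of how these weights could be computed (this follows our reconstructed interval-index reading of Equations (1) and (2), where an IoU is mapped to its 0.1-interval index; the function name and exact reading are assumptions, not the authors' released code):

import math

def extra_branch_loss_weights(iou, threshold):
    """Loss weights for the extra branch under the interval-index reading
    of Equations (1) and (2)."""
    def idx(v):
        return math.floor(v / 0.1)  # map an IoU to its 0.1-interval index
    # Equation (1): negatives get a fixed classification weight of 2.0;
    # positives are weighted up as their IoU interval rises above 0.5 (index 5).
    w_cls = 2.0 if iou < threshold else 1.0 + (idx(iou) - 5) * 0.2
    # Equation (2): the regression weight grows with the gap between the
    # sample's interval index and the threshold's interval index.
    w_reg = 1.0 + (idx(iou) - idx(threshold)) * 0.2
    return w_cls, w_reg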

3.2. Rotating Object Detection Framework

Figure 6 shows the rotated object detection framework, which is also a Cascade architecture. FPN stands for feature pyramid network. The anchors used in the RPN are horizontal and are converted into rotated samples in the first RCNN stage; the ground truth bounding boxes in these two stages are the minimum bounding rectangles of the ground truth rotated bounding boxes. The second RCNN stage then further processes the rotated samples. RITSD stands for the rotation-invariant task-aware spatial disentanglement proposed in our previous work [8]. The schematic diagram of RITSD is shown in Figure 7a. Compared to the deformable pooling operation in horizontal object detection, the deltas of each bin of the rotation-invariant deformable pooling operation have an additional angle delta. RITSD is rotation invariant because the learned deltas are based on coordinates along the height and width of the bounding box. As in TSD, the delta of each bin differs for the classification subtask but is the same across bins for the regression subtask. The schematic diagram of the Double-Head with angle-deformable pooling on the regression subtask (DHADR) is shown in Figure 7b. The classification subtask is a fully connected layer, and the regression subtask has an additional angle-deformable pooling module, which only has an angle delta. The weights of the extra branch are the same as those of the ETSD.

3.3. Dynamic IoU Threshold Method

This dynamic IoU threshold method is used in the RCNN stage and is determined based on the IoU value of the selected training samples. The calculation process is detailed in Algorithm 1. The initial value of Tnow is set to 0.5. The conversion formula of step 5 is presented in Equation (3). In Formulas (4)–(7), n represents the number of selected samples, Xi is the IoU value of a single sample, and X represents the set of IoU values of all selected samples.
Algorithm 1 Dynamic threshold method based on the Gaussian skewed distribution
Require: Selected proposal set P, ground truth set G, period of iterations C to update the IoU threshold
Ensure: Trained object detector D
1: Initialize the IoU threshold Tnow as 0.5
2: Build an empty set ST to record the calculated IoU threshold Tk of the kth iteration
3: for i = 0 to max_iter do
4:  Calculate the IoU of every training sample
5:  Divide the IoUs into intervals of size 0.1 and convert the IoU value of each interval into the starting value of that interval, using the conversion formula in Equation (3)
6:  Select a certain range of IoUs and calculate the mean, standard deviation, and skewness within that range according to Formulas (4)–(6); if the object is rotated, use Formula (7) instead of (6). Finally, calculate the IoU threshold Tk of the kth iteration according to Formula (8)
7:  if Tk < 0.5 then
8:   Tk = 0.5
9:  end if
10:  Add Tk to ST
11:  if (i + 1) % C == 0 then
12:   Update Tnow: Tnow = Mean(ST)
13:   ST = ∅
14:  end if
15:  Train the model with the latest Tnow
16: end for
17: return the improved detection model
We assume that the IoU of the training samples follows a Gaussian distribution; the Gaussian distribution diagram is depicted in Figure 8. It is common practice to select samples within a certain range using the mean µ and standard deviation σ. We assume that µ can represent the IoU threshold because it reflects the overall data distribution to a certain extent. In addition, the IoU threshold is usually set to 0.5, which can be regarded as the mean value µ of a symmetric Gaussian distribution whose independent variable ranges from 0 to 1. Skewness is a measure of the direction and degree of asymmetry of the distribution of statistical data. It can be treated as a coefficient of the standard deviation and used to adjust the IoU threshold to ensure a balance between positive and negative samples. For instance, as shown in Figure 3a, when the skewness value is greater than 0, samples with smaller IoU values are relatively numerous, resulting in a smaller µ; combining µ with the skewness and standard deviation raises the threshold and better distinguishes between positive and negative samples.
$$IoU_{new} = \frac{\left\lfloor IoU / 0.1 \right\rfloor}{10}, \quad (3)$$
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad (4)$$
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu \right)^2}, \quad (5)$$
$$Skew(X)_{hori} = \frac{E\left[ (X - \mu)^3 \right]}{\sigma^3} = \frac{\sum_{i=1}^{n} (x_i - \mu)^3}{n \times \sigma^3}, \quad (6)$$
$$Skew(X)_{rota} = \sqrt[3]{\frac{\sum_{i=1}^{n} (x_i - \mu)^3}{n \times \sigma^3}}, \quad (7)$$
$$T_k = \mu + Skew \times \sigma, \quad (8)$$
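To make the computation concrete, the following is a minimal NumPy sketch of Equations (3)–(8) and the periodic update of Algorithm 1. The function name, the clamping details, and the placeholder training loop are our reading of the algorithm, not the authors' released code:

import numpy as np

def dynamic_iou_threshold(ious, low=0.4, rotated=False):
    """Compute T_k = mu + Skew * sigma (Equation (8)) from sample IoUs."""
    ious = np.asarray(ious, dtype=np.float64)
    # Equation (3): snap each IoU to the start of its 0.1-wide interval.
    binned = np.floor(ious / 0.1) / 10.0
    # Keep only samples inside the selected range, e.g. [0.4, 1).
    x = binned[(binned >= low) & (binned < 1.0)]
    if x.size == 0:
        return 0.5
    mu = x.mean()                               # Equation (4)
    sigma = x.std()                             # Equation (5), population std
    if sigma == 0.0:
        return max(mu, 0.5)
    skew = np.mean((x - mu) ** 3) / sigma ** 3  # Equation (6)
    if rotated:
        skew = np.cbrt(skew)                    # Equation (7): cube root for rotated objects
    t_k = mu + skew * sigma                     # Equation (8)
    return max(t_k, 0.5)                        # Algorithm 1: clamp at 0.5

# Per Algorithm 1, T_k values are pooled over C iterations and the working
# threshold T_now is refreshed with their mean every C-th iteration.
rng = np.random.default_rng(0)
t_now, s_t, C = 0.5, [], 500
for i in range(2 * C):                          # stand-in for the training loop
    batch_ious = rng.uniform(0.0, 1.0, 256)     # placeholder IoUs; real values come from the detector
    s_t.append(dynamic_iou_threshold(batch_ious, low=0.4))
    if (i + 1) % C == 0:
        t_now, s_t = float(np.mean(s_t)), []

Note that for a negatively skewed distribution the skewness term is negative, so the computed threshold falls below µ; this is what allows the threshold to decrease in the later stages.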

4. Experiments and Evaluations

This section evaluates the effectiveness and competitiveness of the proposed dynamic threshold method for horizontal and rotated object detection in optical remote sensing images. The dynamic threshold method is first compared with baseline models and some advanced techniques. Next, several ablation experiments are performed to verify the choice of hyperparameters. Then, heatmaps and IoU thresholds are visualized to explore the effective feature regions and the changing pattern of the thresholds.

4.1. Dataset

We use storage tanks as representative horizontal objects to test our proposed method, utilizing two datasets: DIOR [7] and DOTA [43]. We extracted the storage tank images from these two datasets; as Table 1 shows, they contain the most storage tank images. For clarity, the storage tank images from DIOR and DOTA are labeled DIOR-tank and DOTA-tank. After extracting the tank images, we found that the ratio of training to testing images is unbalanced, especially in DIOR-tank, where the amount of test data is much larger than the training data. To solve this problem, we split the images in DIOR-tank and DOTA-tank into training and test sets at a ratio of 4:1, respectively. In addition, the image resolution of DIOR-tank is 800 × 800, while the images of DOTA-tank are larger than 800 × 800 and vary in size. We therefore subdivided the images in DOTA-tank into 800 × 800 sub-images with 200 pixels of overlap; if one side of a sub-image does not reach 800 pixels, it is padded with 0 pixels.
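As an illustration of this preprocessing step (a sketch under the stated tile size and overlap; the function name and return format are ours):

import numpy as np

def tile_image(img, tile=800, overlap=200):
    """Split a large image into tile x tile sub-images with the stated
    overlap, zero-padding edge tiles that fall short of the full size."""
    h, w = img.shape[:2]
    stride = tile - overlap                       # 600 px between tile origins
    tiles = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            patch = img[y:y + tile, x:x + tile]
            ph, pw = patch.shape[:2]
            if ph < tile or pw < tile:            # pad short edges with 0 pixels
                padded = np.zeros((tile, tile) + patch.shape[2:], dtype=img.dtype)
                padded[:ph, :pw] = patch
                patch = padded
            tiles.append(((x, y), patch))         # keep the origin to map boxes back
    return tiles

tiles = tile_image(np.zeros((1024, 1500, 3), dtype=np.uint8))  # usage example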
We used three ship image datasets to test the dynamic threshold method for rotated objects; the related information for these datasets is shown in Table 2. The first dataset, called DOTA-Ship-Plus, includes ship images from the DOTA dataset and Google images captured with the LocaSpaceViewer4 software. The captured images are all 800 × 800, and we resize the DOTA ship images to 800 using a similar approach to the tank images. The DOTA-Ship-Plus dataset contains 4753 images of 29,847 ships. The second ship dataset, FAIR1M-Ship, contains ship images from the FAIR1M [44] dataset. FAIR1M is a fine-grained object recognition dataset, and we change the fine-grained labels to “ship”. After resizing the images to 800, we end up with 11,083 images containing 47,612 ships. The third dataset, HRSC2016, has been widely used in numerous papers to verify the effectiveness of proposed methods. The training, validation, and test sets of this dataset contain 436, 181, and 444 images, respectively. In this paper, the training and validation sets are combined for training. In addition, the images have been resized to 512 × 800.

4.2. Dataset Implementation Detail

All experiments in this paper are performed using the detectron2 framework [45]. Our model backbone is ResNet50 [46], and the initial parameters are pre-trained on ImageNet [47] by detectron2. In the RPN stage, the anchor scales are set to {16, 32, 64, 128, 256}, and the aspect ratios are set to {0.5, 1, 2}. We select 128 samples for training, with a positive-to-negative ratio of 1:1. In the RCNN stage, 256 samples are selected, and the ratio of positive to negative samples is 1:3. Moreover, we adopted the IoU-balanced sampling method [27] to address the imbalance in the number of negative samples across different IoU intervals. We only keep the detection bounding boxes with scores higher than 0.5 and perform Non-Maximum Suppression (NMS) with an IoU threshold of 0.5.
During training, we completed a total of 80,000 iterations, using the Warmup-MultiStep-LR method to adjust the learning rate dynamically: 0.001 for the first 40,000 iterations, 0.0001 for the next 20,000 iterations, and 0.00001 for the last 20,000 iterations. The weight decay and momentum values were 0.001 and 0.9, respectively, and the optimizer was SGD.
The model was trained end-to-end on a laptop with an RTX 2070 Max-Q GPU and 8 GB of video memory. The input batch size was 1.
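A sketch of this setup in detectron2's config language is given below; the key names follow standard detectron2 conventions and are our assumption, not a dump of the authors' actual config file:

from detectron2.config import get_cfg

cfg = get_cfg()
cfg.MODEL.WEIGHTS = "detectron2://ImageNetPretrained/MSRA/R-50.pkl"  # ResNet50, ImageNet pre-trained
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[16], [32], [64], [128], [256]]  # one scale per FPN level (assumed)
cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 128         # 128 RPN training samples
cfg.MODEL.RPN.POSITIVE_FRACTION = 0.5            # 1:1 positive:negative
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256   # 256 RCNN samples
cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION = 0.25     # 1:3 positive:negative
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5      # keep detections scoring above 0.5
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.5        # NMS IoU threshold
cfg.SOLVER.IMS_PER_BATCH = 1                     # batch size 1
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.STEPS = (40000, 60000)                # LR x0.1 at 40k and 60k iterations
cfg.SOLVER.GAMMA = 0.1
cfg.SOLVER.MAX_ITER = 80000
cfg.SOLVER.MOMENTUM = 0.9
cfg.SOLVER.WEIGHT_DECAY = 0.001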
In this paper, we utilize a subset of the COCO evaluation metrics to evaluate our approach. In particular, we focus on AP (mean average precision averaged over IoU thresholds), AP0.5 (AP at an IoU threshold of 0.5), and AP0.75 (AP at an IoU threshold of 0.75); higher values indicate better performance. However, when evaluating the model on the HRSC2016 dataset, we adopt the PASCAL VOC 2007 and VOC 2012 evaluation methods, as these two methods are widely used to verify the effectiveness of models on this dataset.

4.3. Experiments on DIOR Dataset

4.3.1. Comparative Study with Advanced Methods

We compare the proposed Dynamic Cascade model with several other advanced object detection methods, including Cascade RCNN [1], Double-Head [39], TSD [37], DecoupleNet [48], LEGNet [49], Dynamic RCNN [5], and Dynamic-TLD [6]. Most of these algorithms achieve excellent performance by mitigating feature conflicts or adjusting thresholds. The quantitative comparison results are shown in Table 3, indicating that Dynamic-Cascade-ETSD produced the best results.
Although Cascade RCNN employs an additional RCNN stage, the feature conflict caused by the sibling head limits the improvement brought by the additional stage. Double-Head uses two independent heads to separate the parameters of classification and regression in the box head, effectively reducing the feature conflict between them. Table 3 shows that Double-Head significantly outperforms Cascade RCNN, even though Double-Head does not have a Cascade architecture. This emphasizes the importance and necessity of resolving the feature conflict between classification and regression. Building on the Double-Head architecture, TSD further enhances the detection results by using different deformable pooling modules for classification and regression to alleviate the spatial conflict between them. DecoupleNet and LEGNet achieved some improvement owing to their enhanced backbones. In addition, Dynamic RCNN introduces a dynamic threshold method different from ours: it sorts the samples by IoU value and selects the 75th IoU as the new threshold. Dynamic-TLD also first sorts the samples in descending order of IoU and then uses the average IoU value of the first 220 samples as the new threshold. ETSD is a two-stage framework and serves as the box-head structure of each RCNN stage in our model; its AP value was 53.43%. When we used the dynamic threshold method of Dynamic-TLD in the cascade framework, the AP value was 54.20%, which was even worse than ETSD. We must highlight that the Dynamic RCNN entry in Table 3 is the two-stage version, since Dynamic RCNN in the cascade framework achieved a worse AP of 54.43%. This indicates that the existing dynamic threshold methods are not applicable to additional RCNN stages or different datasets. One important reason is the significant difference in the sample distribution of each RCNN stage across stages and datasets. The fact that our dynamic threshold method achieves the best results proves that it can effectively track the change in the sample distribution and determine the corresponding threshold. Figure 9 displays the dynamic IoU thresholds at different RCNN stages for these three methods. For the two existing methods, apart from the specific threshold values, the pattern of threshold variation also differs significantly from our method.
Figure 10 shows examples of storage tank detection using the different methods. The red boxes indicate predicted bounding boxes, while the green boxes represent ground truth bounding boxes. Both localization accuracy and classification improve progressively from method to method.

4.3.2. Stability Analysis of the Dynamic Threshold Method for Storage Tank Detection

The dynamic thresholding method does not add any modules to the model, so it does not increase the number of model parameters. Moreover, the complexity of the dynamic threshold method is low: it does not increase the training time and even decreases the testing time. Table 4 shows the changes in the model's parameters, training time, and testing time before and after applying the dynamic thresholding method.
After applying the dynamic thresholding method, the model converges more easily. Figure 11 shows the changes in the training loss with and without the dynamic thresholding method for storage tank detection. After applying the dynamic threshold method, the model converges faster and achieves a lower loss value.

4.3.3. Experiments to Determine the Sample Range for Calculating the Dynamic Threshold for Storage Tank Detection

The baseline model that we use is a combination of ETSD and Cascade RCNN, which we refer to as Cascade-ETSD in this paper. When we replace the sibling head with ETSD, the IoU distribution changes at the third stage. To obtain a more accurate model, we need to adjust the static IoU threshold to align with the new distribution. The experimental results can be found in Table 5, where Cascade-ETSD achieves the best AP value when the IoU threshold of the third stage is 0.70.
Dynamic-Cascade-ETSD needs to select samples within a certain IoU range to calculate the mean and standard deviation of the IoU values. Because many samples have very small IoUs, using all samples to calculate the dynamic threshold would make it very small, whereas the expected IoU threshold is usually greater than 0.5. The results for samples selected in different ranges are shown in Table 6. In the second stage, the static IoU threshold is typically set to 0.5 [1], and the IoU distribution resembles a positively skewed distribution. If we used the samples with an IoU greater than 0.5 to calculate the dynamic IoU threshold according to Formula (8), its value would be much greater than 0.5. To appropriately lower the dynamic IoU threshold, we select samples with an IoU greater than 0.4. In addition, in the Cascade RCNN architecture, the static IoU threshold is increased by 0.1 from one RCNN stage to the next; therefore, we select samples with an IoU greater than 0.5 at the third stage. From Table 6, we can observe that the 2st ≥ 0.4–3st ≥ 0.5 model produces better results than the Cascade-ETSD model.
We then tested different IoU ranges to determine the best one. Initially, we tried to simultaneously decrease or increase the IoU range at each stage. The results in Table 6 show that both adjustments lead to worse results. To further understand the changes in the IoU distribution and IoU threshold at different stages, we create a visual representation of the skewness value and IoU threshold, as shown in Figure 12. The skewness value is greater than 0 in the second stage and less than 0 in the third stage, which supports our hypothesis that the samples follow a positively skewed distribution in the second stage and a negatively skewed distribution in the third stage.
We notice that the IoU threshold in the second stage keeps increasing and is larger than 0.6 most of the time, which is higher than the static IoU threshold of the third stage in the traditional Cascade RCNN architecture. We suspected that the dynamic IoU method may lead to an overly large threshold with a negative impact. To test this suspicion, we changed the IoU threshold in the second stage back to the static threshold of 0.5; however, the AP value decreased compared to the baseline model. This suggests that the second stage may act as a Region Proposal Network (RPN) for the third stage, and the RPN usually has a relatively large IoU threshold. Since the second stage should have a relatively large threshold, we narrowed the sample range to [0.5, 1) to increase the dynamic threshold of the second stage, but the AP value decreased. We then observed that the dynamic threshold in the third stage first increased and then decreased. To verify that this threshold should be smaller, we expanded the sample range of the third stage to [0.4, 1), but the AP was still worse than the best model. Finally, we found that the AP value obtained by the 2st ≥ 0.3–3st ≥ 0.4 model is very close to our best model: its AP0.5 is much better than the best model, but its AP0.75 is worse. We attempted to narrow the range to [0.5, 1.0) at the third stage to increase the dynamic threshold value and improve AP0.75, but the results were the worst, which indicates that the threshold gap between two adjacent stages should not be too large.
We were initially concerned about allowing the threshold to decrease at the third stage, as experience suggests that the threshold should increase as training progresses, since the number of high-IoU samples increases. Moreover, in Figure 13, the average IoU of the selected samples at the third stage continues to increase over time, which motivated us to test a small modification of the dynamic IoU method so that the threshold never decreases: if the new IoU threshold is smaller than the previous one, the dynamic threshold is not updated. The quantitative results are presented in Table 7, which shows that this monotonically increasing threshold does not produce better results. In Figure 12c, we can observe that the skewness value keeps decreasing, indicating that the proportion of high-IoU samples is increasing, so that the average IoU of all samples alone would yield an overly large threshold; the skewness value is needed to lower the threshold appropriately. The specific number of samples in different intervals at the third stage is shown in Figure 14b.
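In terms of the dynamic_iou_threshold sketch from Section 3.3, the best configuration found here corresponds to per-stage calls such as the following (the variable contents are placeholders; real IoUs come from the detector):

import numpy as np

rng = np.random.default_rng(0)
stage2_sample_ious = rng.uniform(0.0, 1.0, 256)  # placeholder IoUs
stage3_sample_ious = rng.uniform(0.3, 1.0, 256)  # placeholder IoUs
# Stage 2 uses samples with binned IoU in [0.4, 1); stage 3 uses [0.5, 1).
t2 = dynamic_iou_threshold(stage2_sample_ious, low=0.4)
t3 = dynamic_iou_threshold(stage3_sample_ious, low=0.5)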

4.3.4. Visualization of the Effective Feature Area of Dynamic-Cascade-ETSD

In Figure 15, we create heatmaps to visualize the differences in the useful feature areas between the Cascade-ETSD and Dynamic-Cascade-ETSD models. The visualization reveals that the Dynamic-Cascade-ETSD model utilizes more feature areas around the central region for classification and more accurate features around the edges for regression.

4.3.5. Ablation Study on Iteration Number C

The dynamic threshold method adjusts the threshold every C iterations. If C is too small, overly frequent threshold updates will hinder the model's convergence; if C is too large, the method will fail to obtain thresholds that match the new sample distribution. We tested different C values to find the most effective one. The comparison results can be found in Table 8, where the model performed best with a C value of 500. The initial setting of C was based on the work of [5], which has a framework similar to ours; they tested C values of 20, 100, and 500 and obtained the best results with a C of 100. The difference in the best value of C may be due to the different design logic of our method, which requires a longer update cycle.

4.3.6. Ablation Study on the IoU Value of Calculating the Dynamic Threshold

The IoU used to calculate the dynamic threshold is not the actual IoU of each sample. We divide the IoUs of the samples into intervals of 0.1 and transform the IoU in each interval to the initial value of the interval; this transformation is described in [30]. To confirm whether this transformation produces better results than using the actual IoU values, we compare the two settings. The comparison results are shown in Table 9.
We believe that using the minimum value of each interval yields better results because it reduces the volatility of the dynamic threshold. From the dynamic thresholds of the different stages in Figure 12, we can see that the range of threshold variation is not particularly large. Using the actual values would lead to higher threshold values and a wider range of variation, which, according to the results in Table 9, is not beneficial for obtaining accurate dynamic thresholds.

4.4. Experiments on DOTA-Tank Dataset

To evaluate whether the dynamic threshold method is applicable to different datasets, we tested it on the DOTA-Tank dataset. We first determined the optimal static threshold for the third stage and then compared it with the dynamic threshold method. The comparison results are shown in Table 10: the model achieves significantly better results with the dynamic IoU threshold. We also examined the skewness and threshold at different stages to determine whether the change pattern resembles that of the DIOR-Tank dataset. The visualizations in Figure 16 show that both the variation pattern and the value range of the skewness and IoU threshold are similar to those of the DIOR-Tank dataset.

4.5. Experiments on DOTA-Ship-Plus Dataset

4.5.1. Comparative Study with Advanced Methods Based on DOTA-Ship-Plus Dataset

In this section, we compare the dynamic threshold method based on the ERITSD model with some state-of-the-art ship detection methods. Apart from the ERITSD model, we also use RRPN [50], RoI-Transformer [51], DecoupleNet [48], LEGNet [49], R3Det [52], and TSO-3st-DH [39] (a three-stage framework with oriented bounding boxes whose third stage is a Double-Head). RRPN uses rotated anchors. RoI-Transformer has a transformer that can convert a horizontal proposal into a rotated proposal. DecoupleNet and LEGNet are two newly proposed backbones, used here with the O-RCNN detector [53]. R3Det is a one-stage method that uses rotated anchors and has a feature alignment module; it greatly improves detection accuracy while maintaining the efficiency of the one-stage approach. The framework of TSO-3st-DH is similar to Figure 5, but the box head of the third stage is a Double-Head. The comparison results are shown in Table 11.
RRPN uses rotated anchors to generate rotated proposals, and its training takes longer due to the increased number of anchors. RoI-Transformer can be considered a three-stage framework because it incorporates an additional transformer to convert horizontal proposals into rotated proposals; it produces better results than RRPN and requires less training time. Despite the presence of a feature alignment module in R3Det, the one-stage method still does not produce good results. Although DecoupleNet and LEGNet both achieved better results than RoI-Transformer, they still fall short of models that reduce the feature conflict between classification and regression in the box head, indicating that optimizing the box head is necessary. Dynamic-ERITSD performs the best among these methods. Compared with our previous work ERITSD, the dynamic IoU threshold method further improves the AP value by 0.26%.
Next, we show some detection examples using the mentioned methods (except RRPN) in Figure 17. Among the models utilizing rotated proposals, Dynamic-ERITSD achieves the most accurate localization.

4.5.2. Stability Analysis of the Dynamic Threshold Method for Rotated Ship Detection

Table 12 shows the changes in the model parameters, training speed, and test speed of the rotated ship detection model after applying the dynamic threshold method. As with the storage tank detection model, the number of model parameters and the training time are unchanged, but testing is faster.
Figure 18 shows the changes in training loss with and without the use of the dynamic thresholding method for rotated ship detection. With the dynamic threshold method, the model also converges faster and achieves a lower loss value.

4.5.3. Experiments to Determine the Sample Range for Calculating the Dynamic Threshold for Ship Detection Based on DOTA-Ship-Plus Dataset

The process of determining the sample range at each stage is shown in Table 13. Initially, we used the same range as Dynamic-Cascade-ETSD, but the results were not as good as the ERITSD model. Considering that the box head at the second stage of Dynamic-ERITSD is the sibling head with relatively poor regression ability, which leads to smaller IoUs at the third stage, we adjusted the range of the third stage to [0.4, 1.0) to reduce the dynamic threshold, but the improvement was not significant. Subsequently, we gradually expanded the ranges at different stages to reduce the dynamic threshold but still could not achieve better results than ERITSD. We suspected that even within the appropriate sample range, the dynamic threshold was still too large. To appropriately lower it, we replaced the skewness value with its cube root, a choice inspired by the fact that the cube root cancels the third power of σ in the denominator. Ultimately, the best results were achieved when the sample range of each stage was set to [0.3, 1), surpassing the ERITSD model.
The skewness values and thresholds at different stages of ship detection are shown in Figure 19. Compared with storage tank detection, the thresholds at the second stage are smaller, while the threshold at the third stage continues to increase. This may be because the sibling head at the second stage provides too few high-IoU samples to the third stage. Figure 20 depicts the number of samples in different intervals at different stages. From Figure 20b, we can observe that the number of samples with an IoU greater than 0.7 continues to increase, indicating that the high-IoU samples have not yet reached saturation.

4.5.4. Visualization of the Effective Feature Area of Dynamic-ERITSD

In Figure 21, we create heatmaps to visualize the changes in the effective feature areas between the ERITSD and Dynamic-ERITSD models. For classification, the effective feature areas emphasize the bow and stern; for regression, the Dynamic-ERITSD model exploits more comprehensive boundary features.

4.6. Experiments on FAIR1M-Ship Dataset

4.6.1. Experiments on the Three-Stage Framework

This section tests the Dynamic-ERITSD model on the FAIR1M-Ship dataset to understand its performance on different datasets. The experimental results are shown in Table 14. The Dynamic-ERITSD model outperforms ERITSD, with the best sample range being [0.2, 1). Figure 22 shows the number of samples in different IoU intervals at different stages. There are some differences at the third stage compared to the DOTA-Ship-Plus dataset, which suggests that different datasets may have different IoU distributions, leading to different suitable sample ranges. The skewness values and thresholds of the FAIR1M-Ship dataset at different stages can be seen in Figure 23. Compared to the DOTA-Ship-Plus dataset, the dynamic threshold of the second stage always stays at 0.5, while the threshold of the third stage is 0.5 most of the time. This is because our method resets the dynamic threshold to 0.5 whenever it falls below 0.5.

4.6.2. Experiments on the Four-Stage Framework

The FAIR1M-Ship dataset is designed for fine-grained detection. We try adding an additional stage for fine-grained detection and show the results in Table 15. Our focus is on exploring dynamic IoU patterns. To facilitate observing the changes in results, we use the average of all classes as the result. We adopted a three-stage framework with the dynamic threshold to perform fine-grained detection as the baseline and then experimented with a four-stage framework with the dynamic threshold method using different IoU ranges. Ultimately, the model produced the best results when the IoU range of each stage was set to [0.4, 1.0).
Figure 24 shows the dynamic threshold of each RCNN stage in the four-stage framework. Compared to the three-stage framework, the threshold ranges of the second and third stages have changed. In Figure 25, we can observe the number of samples in different IoU intervals at different stages; the second and third stages do not differ much from the three-stage framework. This suggests that the dynamic threshold of the last stage determines the thresholds of the previous stages to a certain extent, and the threshold range of the other stages tends to follow that of the last stage.

4.7. Experiments on HRSC2016 Dataset

In this section, we test the Dynamic-ERITSD model on the HRSC2016 dataset. We compare it with some advanced methods, and the results are shown in Table 16. However, not all methods use the same evaluation metric: some use the PASCAL VOC 2007 evaluation method and some the VOC 2012 method. To make the comparison more accurate, we report both in this section. Among the compared methods, our previous ERITSD produced the best result apart from the dynamic threshold method proposed in this paper. Since the optimal sample ranges for the first two ship datasets differ, we conducted experiments with the two best sample ranges on the HRSC2016 dataset. The results show that the optimal range for HRSC2016 is [0.2, 1). Compared with ERITSD, the dynamic threshold further improves the AP value by 0.1% regardless of the evaluation system used.

4.8. Case Studies of Our Model for Storage Tank and Ship Detection

Figure 26 displays some examples of storage tank detection results in different scenarios. Images (a), (b), and (c) show that our method achieves good results for storage tanks of various sizes. Compared to these three images, image (d) has a more complex background that can easily be confused with storage tanks; however, our method can still distinguish the storage tanks. In image (e), the storage tank area is near the sea, which is also a common scenario, as many storage tanks are built in ports. The detection results in image (e) are also correct, indicating that the nearby sea surface does not substantially affect our model. Finally, image (f) displays some missed objects, which are small and closely aligned. Considering that isolated small objects are detected correctly in other images, we infer that the cause of the missed detections may lie in the sample filtering process.
Figure 27 displays some examples of ship detection results in different scenarios. Images (a), (b), and (c) show good detection results for ships of different sizes and aspect ratios. Almost all ships are on water, but in some special cases a ship can be on land, as in images (d) and (e); our model detects these ships correctly even though they are on land or very small. In image (f), there are some cars on the roads near the ships. Ships and cars are small and look similar in optical remote sensing images, which may cause false detections; the results show that our model can effectively distinguish ships from cars. However, our model could not detect incomplete ships, even when a ship is clearly present.

5. Conclusions

For RCNN-based detection frameworks, we proposed a new dynamic IoU threshold technique based on the idea of the Gaussian skewed distribution. This method is effective for each stage of the Cascade RCNN framework and can be used for both horizontal and rotated object detection. Visualizing the dynamic threshold at different stages shows that the threshold is not constantly increasing: if the IoU distribution resembles a negatively skewed distribution and the skewness value continues to decrease, the threshold may also decrease. Furthermore, the change in the dynamic threshold at each stage is very small and does not exceed 0.1. In addition, the threshold range of the last stage affects the other stages; if the number of stages changes, the threshold range and change pattern of each stage may also change. However, the sample range used to calculate the dynamic threshold may vary across datasets, and our future work will focus on developing consistent methods for selecting samples from different datasets.

Author Contributions

Conceptualization, T.W.; methodology, T.W.; investigation, T.W.; validation, T.W.; resources, T.W.; data curation, T.W.; writing—review and editing, B.L. and P.C.; visualization, T.W.; supervision, B.L. and P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  2. Han, X. Modified Cascade RCNN Based on Contextual Information for Vehicle Detection. Sens. Imaging 2021, 22, 19. [Google Scholar] [CrossRef]
  3. Ahmad, M.; Ahmed, I.; Jeon, G. An IoT-enabled real-time overhead view person detection system based on Cascade-RCNN and transfer learning. J. Real-Time Image Process. 2021, 18, 1129–1139. [Google Scholar] [CrossRef]
  4. Cao, R.; Mo, W.; Zhang, W. MFMDet: Multi-scale face mask detection using improved Cascade rcnn. J. Supercomput. 2024, 80, 4914–4942. [Google Scholar] [CrossRef]
  5. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020. [Google Scholar]
  6. Li, J.; Cheang, C.F.; Liu, S.; Tang, S.; Li, T.; Cheng, Q. Dynamic-TLD: A Traffic Light Detector Based on Dynamic Strategies. IEEE Sens. J. 2024, 24, 6677–6686. [Google Scholar] [CrossRef]
  7. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. 2020, 159, 296–307. [Google Scholar] [CrossRef]
  8. Wang, T.; Li, Y. A double branches network with IoU-based loss weight to get more accurate deformable pooling feature for angle sensitive ships. Geocarto Int. 2022, 37, 14528–14546. [Google Scholar] [CrossRef]
  9. Wang, T.; Li, Y. Enhanced Task-Aware Spatial Disentanglement Head for Oil Tanks Detection in High-Resolution Optical Imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6512505. [Google Scholar] [CrossRef]
  10. Shi, Y.; Lin, Y.; Wei, P.; Xian, X.; Chen, T.; Lin, L. Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5004311. [Google Scholar] [CrossRef]
  11. Sun, K.; Chen, Z.; Lin, X.; Sun, X.; Liu, H.; Ji, R. Conditional Diffusion Models for Camouflaged and Salient Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2833–2848. [Google Scholar] [CrossRef]
  12. Zhao, W.; Zhao, Z.; Xu, M.; Ding, Y.; Gong, J. Differential multimodal fusion algorithm for remote sensing object detection through multi-branch feature extraction. Expert Syst. Appl. 2025, 265, 125826. [Google Scholar] [CrossRef]
  13. Abdulmaksoud, A.; Ahmed, R. Transformer-Based Sensor Fusion for Autonomous Vehicles: A Comprehensive Review. IEEE Access 2025, 13, 41822–41838. [Google Scholar] [CrossRef]
  14. Ghodrati, A.; Diba, A.; Pedersoli, M.; Tuytelaars, T.; Van Gool, L. DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2578–2586. [Google Scholar]
  15. Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 845–853. [Google Scholar]
  16. Li, H.; Liu, Y.; Ouyang, W.; Wang, X. Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection. Int. J. Comput. Vision 2017, 127, 225–238. [Google Scholar] [CrossRef]
  17. Shang, M.; Gao, J.; Sun, J. Character Region Awareness Network for Scene Text Recognition. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6. [Google Scholar]
  18. Kong, T.; Sun, F.; Yao, A.; Liu, H.; Lu, M.; Chen, Y. RON: Reverse Connection with Objectness Prior Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5244–5252. [Google Scholar]
  19. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212. [Google Scholar]
  20. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
21. Kim, K.; Lee, H.S. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 355–371. [Google Scholar]
  22. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9756–9765. [Google Scholar]
23. Gu, X.; Yang, M.; Liu, K.; Zhang, Y. Classification-IoU Joint Label Assignment for End-to-End Object Detection. In Proceedings of the Pattern Recognition and Computer Vision (PRCV), Beijing, China, 29 October–1 November 2021; pp. 404–415. [Google Scholar]
  24. Zhang, T.; Luo, B.; Sharda, A.; Wang, G. Dynamic Label Assignment for Object Detection by Combining Predicted IoUs and Anchor IoUs. J. Imaging 2022, 8, 193. [Google Scholar] [CrossRef]
  25. Zhang, X.; Wu, Y.; Zhang, G.; Yuan, Y.; Cheng, G.; Wu, Y. Shape-Dependent Dynamic Label Assignment for Oriented Remote Sensing Object Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 132–146. [Google Scholar] [CrossRef]
  26. Shrivastava, A.; Gupta, A.; Girshick, R. Training Region-Based Object Detectors with Online Hard Example Mining. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769. [Google Scholar]
  27. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
  28. Cao, Y.; Chen, K.; Loy, C.C.; Lin, D. Prime Sample Attention in Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11580–11588. [Google Scholar]
  29. Oksuz, K.; Cam, B.C.; Akbas, E.; Kalkan, S. Generating Positive Bounding Boxes for Balanced Training of Object Detectors. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 883–892. [Google Scholar]
  30. Zhu, L.; Xie, Z.; Liu, L.; Tao, B.; Tao, W. IoU-uniform R-CNN: Breaking through the limitations of RPN. Pattern Recogn. 2021, 112, 107816. [Google Scholar] [CrossRef]
31. Nie, X.; Chai, B.; Zhang, K.; Liu, C.; Li, Z.; Huang, R.; Wei, Q.; Huang, M.; Huang, W. Improved Cascade-RCNN for automatic detection of coronary artery plaque in multi-angle fusion CPR images. Biomed. Signal Process. Control 2025, 99, 106880. [Google Scholar] [CrossRef]
  32. Xu, R.; Yu, J.; Ai, L.; Yu, H.; Wei, Z. Farmland pest recognition based on Cascade RCNN Combined with Swin-Transformer. PLoS ONE 2024, 19, e0304284. [Google Scholar] [CrossRef]
  33. Yang, Z.; Liu, Y.; Wen, G.; Xia, X.; Zhang, W.E.; Chen, T. Object Detection in Remote Sensing Images With Parallel Feature Fusion and Cascade Global Attention Head. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6007205. [Google Scholar] [CrossRef]
34. Fu, R.; Chen, C.; Yan, S.; Heidari, A.A.; Wang, X.; Escorcia-Gutierrez, J.; Mansour, R.F.; Chen, H. Gaussian similarity-based adaptive dynamic label assignment for tiny object detection. Neurocomputing 2023, 543, 126285. [Google Scholar] [CrossRef]
  35. Ge, L.; Wang, G.; Zhang, T.; Zhuang, Y.; Chen, H.; Dong, H.; Chen, L. Adaptive Dynamic Label Assignment for Tiny Object Detection in Aerial Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6201–6214. [Google Scholar] [CrossRef]
  36. Xiao, Z.; Wang, K.; Wan, Q.; Tan, X.; Xu, C.; Xia, F. A2S-Det: Efficiency Anchor Matching in Aerial Image Oriented Object Detection. Remote Sens. 2021, 13, 73. [Google Scholar] [CrossRef]
  37. Song, G.; Liu, Y.; Wang, X. Revisiting the Sibling Head in Object Detector. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11560–11569. [Google Scholar]
  38. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  39. Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking Classification and Localization for Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10183–10192. [Google Scholar]
  40. Wang, T.; Li, Y. Rotation-Invariant Task-Aware Spatial Disentanglement in Rotated Ship Detection Based on the Three-Stage Method. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5609112. [Google Scholar] [CrossRef]
  41. Xu, W.; Yin, Q.; Xu, C.; Zhao, Z.; Li, Y.; Huang, D. MFFOD: Multidomain feature fusion object detector for infrared images. Meas. Sci. Technol. 2025, 36, 035401. [Google Scholar] [CrossRef]
  42. Tong, K.; Wu, Y. Small object detection using deep feature learning and feature fusion network. Eng. Appl. Artif. Intel. 2024, 132, 107931. [Google Scholar] [CrossRef]
  43. Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
44. Sun, X.; Wang, P.; Yan, Z.; Xu, F.; Wang, R.; Diao, W.; Chen, J.; Li, J.; Feng, Y.; Xu, T.; et al. FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 116–130. [Google Scholar] [CrossRef]
  45. Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 1 August 2021).
  46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  47. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  48. Lu, W.; Chen, S.; Shu, Q.; Tang, J.; Luo, B. DecoupleNet: A Lightweight Backbone Network With Efficient Feature Decoupling for Remote Sensing Visual Tasks. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4414613. [Google Scholar] [CrossRef]
  49. Lu, W.; Chen, S.; Li, H.; Shu, Q.; Ding, C.; Tang, J.; Luo, B. LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection. arXiv 2025, arXiv:2503.14012. [Google Scholar]
  50. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
  51. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2844–2853. [Google Scholar]
  52. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv 2019, arXiv:1908.05612. [Google Scholar] [CrossRef]
  53. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3500–3509. [Google Scholar]
54. Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3610–3615. [Google Scholar]
  55. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8231–8240. [Google Scholar]
56. Liao, M.; Zhu, Z.; Shi, B.; Xia, G.S.; Bai, X. Rotation-Sensitive Regression for Oriented Scene Text Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5909–5918. [Google Scholar]
  57. Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images. Remote Sens. 2020, 12, 389. [Google Scholar] [CrossRef]
  58. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
  59. Song, Q.; Yang, F.; Yang, L.; Liu, C.; Hu, M.; Xia, L. Learning Point-Guided Localization for Detection in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1084–1094. [Google Scholar] [CrossRef]
  60. Hua, Z.; Pan, G.; Gao, K.; Li, H.; Chen, S. AF-OSD: An Anchor-Free Oriented Ship Detector Based on Multi-Scale Dense-Point Rotation Gaussian Heatmap. Remote Sens. 2023, 15, 1120. [Google Scholar] [CrossRef]
  61. Pan, C.; Li, R.; Liu, W.; Lu, W.; Niu, C.; Bao, Q. Remote Sensing Image Ship Detection Based on Dynamic Adjusting Labels Strategy. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4702621. [Google Scholar] [CrossRef]
62. Chen, Y.; Wang, J.; Zhang, Y.; Liu, Y. Arbitrary-oriented ship detection based on Kullback-Leibler divergence regression in remote sensing images. Earth Sci. Inform. 2023, 16, 3243–3255. [Google Scholar] [CrossRef]
  63. Pan, X.; Ren, Y.; Sheng, K.; Dong, W.; Yuan, H.; Guo, X.; Ma, C.; Xu, C. Dynamic Refinement Network for Oriented and Densely Packed Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11204–11213. [Google Scholar]
  64. Wang, J.; Yang, W.; Li, H.C.; Zhang, H.; Xia, G.S. Learning Center Probability Map for Detecting Objects in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4307–4323. [Google Scholar] [CrossRef]
Figure 1. The IoU distribution change during the training process for storage tank detection based on the DIOR dataset. (a) The second stage. (b) The third stage.
Figure 2. The IoU distribution of different stages at the 77.3 k iteration for storage tank training based on the DIOR dataset [7]. (a) The second stage. (b) The third stage.
Figure 3. Schematic diagram of the Gaussian skewed distribution. (a) Positively skewed distribution. (b) Negatively skewed distribution.
Figure 4. Overall framework of the detection method with the dynamic threshold strategy.
Figure 5. (a) The horizontal object detection framework. (b) The architecture of the ETSD.
Figure 6. The architecture of the rotating object detection framework.
Figure 7. (a) The schematic diagram of RITSD. (b) The schematic diagram of the extra branch.
Figure 8. Schematic diagram of sample selection based on the Gaussian skewed distribution.
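To make the mechanism sketched in Figure 8 concrete, the following is a minimal illustrative sketch rather than the authors' implementation: it assumes the stage threshold is adjusted according to the sample skewness of the current IoU distribution. The decrease under negative skew follows the paper's observation that the threshold may fall when the distribution is negatively skewed; the step size, clipping range, and all function names are assumptions of this sketch.

```python
import numpy as np

def sample_skewness(ious: np.ndarray) -> float:
    """Fisher-Pearson sample skewness of the proposal IoUs."""
    mu, sigma = ious.mean(), ious.std()
    if sigma == 0.0:
        return 0.0
    return float(((ious - mu) ** 3).mean() / sigma ** 3)

def update_threshold(threshold: float, ious: np.ndarray,
                     step: float = 0.01,
                     low: float = 0.4, high: float = 0.9) -> float:
    """Adjust a stage's IoU threshold from the skewness of its IoU samples.

    Following the paper's observation, the threshold decreases when the
    distribution is negatively skewed; the symmetric increase for positive
    skew is an assumption of this sketch.
    """
    skew = sample_skewness(ious)
    if skew > 0.0:
        threshold += step
    elif skew < 0.0:
        threshold -= step
    return float(np.clip(threshold, low, high))
```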
Figure 9. Dynamic threshold of Dynamic-RCNN-Cascade, Dynamic-TLD-Cascade, and Dynamic-Cascade-ETSD. The first row is the dynamic threshold of the second stage. The second row is the dynamic threshold of the third stage. (a) Dynamic-RCNN-Cascade. (b) Dynamic-TLD-Cascade. (c) Dynamic-Cascade-ETSD.
Figure 10. Storage tank detection examples of different methods. The green boxes are the ground-truth boxes, and the red boxes are the predicted boxes. (a) Cascade RCNN. (b) Double-Head. (c) TSD. (d) DecoupleNet. (e) LEGNet. (f) Dynamic R-CNN. (g) Dynamic-Cascade-TLD. (h) ETSD. (i) Dynamic-Cascade-ETSD.
Figure 11. The training loss value of Cascade-ETSD and Dynamic-Cascade-ETSD. (a) Cascade-ETSD. (b) Dynamic-Cascade-ETSD.
Figure 12. Skewness and threshold at different stages based on the DIOR-Tank dataset. (a) Skewness at the second stage. (b) Threshold at the second stage. (c) Skewness at the third stage. (d) Threshold at the third stage.
Figure 13. The mean IoU value of the selected samples at the third stage.
Figure 14. Number of samples in different IoU intervals of different stages based on the DIOR-Tank dataset. (a) The second stage. (b) The third stage.
Figure 15. Heatmaps of Cascade-ETSD and Dynamic-Cascade-ETSD. (a) Classification of Cascade-ETSD. (b) Regression of Cascade-ETSD. (c) Classification of Dynamic-Cascade-ETSD. (d) Regression of Dynamic-Cascade-ETSD.
Figure 16. Skewness and threshold at different stages based on the DOTA-Tank dataset. (a) Skewness at the second stage. (b) Threshold at the second stage. (c) Skewness at the third stage. (d) Threshold at the third stage.
Figure 17. Ship detection examples of different methods based on the DOTA-Ship-Plus dataset. (a) RoI-Transformer. (b) DecoupleNet. (c) LEGNet. (d) R3Det. (e) TSO-3st-DH. (f) ERITSD. (g) ERITSD-Dynamic.
Figure 18. The training loss value of ERITSD and Dynamic-ERITSD. (a) ERITSD. (b) Dynamic-ERITSD.
Figure 19. Visualization of the skewness value and IoU threshold at different stages of ship detection based on the DOTA-Ship-Plus dataset. (a) Skewness at the second stage. (b) Threshold at the second stage. (c) Skewness at the third stage. (d) Threshold at the third stage.
Figure 20. The number of samples of different stages in different IoU intervals based on the DOTA-Ship-Plus dataset. (a) The second stage. (b) The third stage.
Figure 21. Heatmaps of ERITSD and Dynamic-ERITSD. (a) Classification heatmap of ERITSD. (b) Regression heatmap of ERITSD. (c) Classification heatmap of Dynamic-ERITSD. (d) Regression heatmap of Dynamic-ERITSD.
Figure 22. The number of samples in different IoU intervals of different stages based on the FAIR1M-Ship dataset. (a) The second stage. (b) The third stage.
Figure 23. Skewness and threshold at different stages of ship detection based on the FAIR1M-Ship dataset. (a) Skewness at the second stage. (b) Threshold at the second stage. (c) Skewness at the third stage. (d) Threshold at the third stage.
Figure 24. Dynamic threshold of different stages of the four-stage framework. (a) Threshold at the second stage. (b) Threshold at the third stage. (c) Threshold at the fourth stage.
Figure 25. The number of samples in different IoU intervals of different stages of the four-stage framework. (a) The second stage. (b) The third stage. (c) The fourth stage.
Figure 26. Storage tank detection examples in different scenarios. (a) Large-scale storage tanks. (b) Storage tanks imaged at an oblique angle. (c) Small-scale storage tanks. (d) Storage tanks in a complex background. (e) Storage tanks located next to the ocean. (f) False detections for objects that are small and closely aligned.
Figure 27. Ship detection examples in different scenarios. (a) Large-scale ships. (b) Small-scale ships. (c) Ships with high aspect ratios. (d) Ships on land. (e) Very small ships in complex environments. (f) Ships and other easily confusable objects.
Table 1. A list of datasets containing storage tank images.
Name | Number of Tanks | Image Size | Year
DOTA | ≥10,000 | 800–4000 | 2017
DIOR | ≥10,000 | 800 | 2018
Table 2. A list of datasets containing ship images.
Name | Number of Ships | Image Size | Year
DOTA-Ship-Plus | 29,847 | 800 | 2018
FAIR1M-Ship | 47,612 | 800 | 2021
HRSC2016 | 2976 | ~1000 | 2016
Table 3. Quantitative comparison results with some advanced methods for storage tank detection.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
Cascade RCNN | 52.33 | 72.17 | 62.23
Double-Head | 53.65 | 72.94 | 63.25
TSD | 53.94 | 72.92 | 63.15
DecoupleNet | 54.22 | 72.87 | 63.17
LEGNet | 54.38 | 73.01 | 63.34
Dynamic-RCNN | 54.36 | 72.99 | 63.31
Dynamic-Cascade-TLD | 54.20 | 70.22 | 62.66
ETSD | 53.43 | 73.05 | 63.16
Dynamic-Cascade-ETSD | 54.99 | 72.06 | 64.37
Table 4. Stability analysis of the dynamic threshold method for storage tank detection.
Model | Params. (M) | Train Time (s/it) | Test Time (s/img)
Cascade-ETSD | 145.3 | 0.29 | 0.1496
Dynamic-Cascade-ETSD | 145.3 | 0.29 | 0.1490
Table 5. Quantitative comparison results of different IoU thresholds of the third stage of Cascade-ETSD.
IoU Threshold | AP (%) | AP0.5 (%) | AP0.75 (%)
0.65 | 54.10 | 72.17 | 62.20
0.70 | 54.66 | 71.20 | 63.58
0.75 | 53.79 | 70.81 | 62.15
Table 6. Results of different ranges of samples based on the Dynamic-Cascade-ETSD model.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
2st = 0.5 - 3st = 0.7 | 54.66 | 71.20 | 63.58
2st 0.4 - 3st 0.5 | 54.99 | 72.06 | 64.37
2st 0.3 - 3st 0.4 | 54.86 | 73.28 | 63.51
2st 0.5 - 3st 0.6 | 54.38 | 70.75 | 63.29
2st = 0.4 - 3st 0.5 | 54.56 | 72.33 | 63.58
2st 0.5 - 3st 0.5 | 54.74 | 72.01 | 63.46
2st 0.4 - 3st 0.4 | 54.73 | 72.26 | 63.40
2st 0.3 - 3st 0.5 | 54.53 | 70.93 | 63.47
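For readers parsing the "2st/3st" row labels above, the following hedged sketch shows one way a per-stage lower IoU bound can restrict which proposals participate in sample selection before the dynamic threshold splits them into positives and negatives; this reading of the notation, and every name in the code, is an assumption rather than the authors' definition.

```python
import numpy as np

def split_by_dynamic_threshold(ious: np.ndarray, threshold: float,
                               lower_bound: float = 0.4):
    """Split one stage's proposals into positives and negatives.

    Only proposals whose IoU reaches the stage's lower bound (the sample
    range varied in Table 6) participate; among them, those at or above
    the current dynamic threshold are treated as positives in this sketch.
    """
    in_range = ious >= lower_bound
    positives = np.where(in_range & (ious >= threshold))[0]
    negatives = np.where(in_range & (ious < threshold))[0]
    return positives, negatives
```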
Table 7. Results of different change patterns in the third stage of Dynamic-Cascade-ETSD.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
Dynamic | 54.99 | 72.06 | 64.37
Dynamic-no-decrease | 54.78 | 71.94 | 63.37
Table 8. Results for different C values of Dynamic-Cascade-ETSD.
C | AP (%) | AP0.5 (%) | AP0.75 (%)
100 | 54.43 | 72.36 | 63.71
300 | 54.75 | 71.97 | 63.44
400 | 54.80 | 72.07 | 63.62
500 | 54.99 | 72.06 | 64.37
600 | 54.78 | 72.04 | 63.46
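The sweep above is consistent with C acting as an update-interval hyperparameter. As a hedged reading, in the spirit of the interval-based threshold update of Dynamic R-CNN [5], the fragment below buffers proposal IoUs and recomputes the threshold once every C iterations, so the best value C = 500 would trade responsiveness against the stability of the skewness estimate; it reuses the hypothetical update_threshold from the sketch after Figure 8, and the paper's exact definition of C may differ.

```python
import numpy as np

iou_buffer = []  # proposal IoUs recorded since the last threshold update

def maybe_update(threshold: float, step_ious: np.ndarray,
                 iteration: int, C: int = 500) -> float:
    """Recompute the dynamic threshold once every C training iterations."""
    iou_buffer.extend(step_ious.tolist())
    if iteration % C == 0 and iou_buffer:
        threshold = update_threshold(threshold, np.asarray(iou_buffer))
        iou_buffer.clear()
    return threshold
```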
Table 9. Results of different IoU values for calculating the dynamic threshold.
IoU Value | AP (%) | AP0.5 (%) | AP0.75 (%)
Minimum value of interval | 54.99 | 72.06 | 64.37
Actual value | 54.34 | 70.86 | 63.31
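One plausible reading of Table 9, offered as an assumption rather than the authors' definition: proposal IoUs are grouped into fixed intervals, and the statistics behind the dynamic threshold are computed either from each sample's raw IoU ("actual value") or from the lower edge of its interval ("minimum value of interval"). A short sketch of the latter, with an assumed bin width of 0.05:

```python
import numpy as np

def snap_to_interval_minimum(ious: np.ndarray, bin_width: float = 0.05) -> np.ndarray:
    """Replace each IoU with the lower edge of its histogram interval."""
    return np.floor(ious / bin_width) * bin_width
```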
Table 10. Results of Cascade-ETSD with different static thresholds and dynamic thresholds.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
Cascade-ETSD-3st = 0.65 | 74.38 | 90.08 | 87.19
Cascade-ETSD-3st = 0.70 | 74.42 | 90.67 | 87.00
Cascade-ETSD-3st = 0.75 | 74.12 | 89.74 | 86.60
Cascade-ETSD-Dynamic | 75.00 | 90.84 | 87.20
Table 11. Quantitative comparison results with some advanced methods for rotated ship detection based on the DOTA-Ship-Plus dataset.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
RRPN | 53.69 | 85.69 | 62.03
RoI-Transformer | 55.12 | 84.80 | 65.53
DecoupleNet | 56.92 | 86.45 | 67.76
LEGNet | 57.27 | 86.36 | 68.23
R3Det | 45.08 | 73.10 | 57.09
TSO-3st-DH | 57.50 | 86.40 | 68.66
ERITSD | 58.13 | 87.28 | 69.90
Dynamic-ERITSD | 58.39 | 87.35 | 69.74
Table 12. Stability analysis of the dynamic threshold method for rotated ship detection.
Model | Params. (M) | Training Time (s/it) | Test Time (s/img)
ERITSD | 103.6 | 0.33 | 0.1429
Dynamic-ERITSD | 103.6 | 0.33 | 0.1413
Table 13. Results of different sample ranges of Dynamic-ERITSD based on the DOTA-Ship-Plus dataset.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
ERITSD | 58.13 | 87.28 | 69.90
2st 0.4 - 3st 0.5 | 57.29 | 85.40 | 69.72
2st 0.4 - 3st 0.4 | 57.50 | 84.64 | 68.50
2st ≥ 0.3 - 3st ≥ 0.4 | 57.37 | 84.42 | 68.80
2st ≥ 0.3 - 3st ≥ 0.3 | 58.09 | 87.39 | 68.86
2st ≥ 0.2 - 3st ≥ 0.3 | 58.05 | 86.38 | 69.30
2st ≥ 0.2 - 3st ≥ 0.2 | 57.87 | 87.31 | 68.81
2st ≥ 0.4 - 3st ≥ 0.4 - cr | 58.20 | 86.41 | 70.29
2st ≥ 0.3 - 3st ≥ 0.4 - cr | 57.96 | 86.38 | 69.39
2st ≥ 0.3 - 3st ≥ 0.3 - cr | 58.39 | 87.35 | 69.74
2st ≥ 0.2 - 3st ≥ 0.3 - cr | 58.28 | 88.24 | 69.53
2st ≥ 0.2 - 3st ≥ 0.2 - cr | 57.98 | 87.46 | 69.52
Table 14. Results of different sample ranges of ERITSD-Dynamic based on the FAIR1M-Ship dataset.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
ERITSD | 53.71 | 83.90 | 61.47
2st ≥ 0.3 - 3st ≥ 0.3 - cr | 53.40 | 82.04 | 61.25
2st ≥ 0.2 - 3st ≥ 0.2 - cr | 53.84 | 83.96 | 61.85
2st ≥ 0.1 - 3st ≥ 0.1 - cr | 53.80 | 83.85 | 61.76
Table 15. Results of different sample ranges of ERITSD-Dynamic with a four-stage framework based on the FAIR1M-Ship dataset.
Model | AP (%) | AP0.5 (%) | AP0.75 (%)
≥0.2 ≥0.2 - cr | 37.87 | 57.47 | 44.50
≥0.2 ≥0.2 ≥0.3 - cr | 37.46 | 57.05 | 44.08
≥0.3 ≥0.3 ≥0.4 - cr | 38.78 | 57.93 | 45.86
≥0.4 ≥0.4 ≥0.5 - cr | 38.11 | 55.65 | 44.95
≥0.4 ≥0.4 ≥0.4 - cr | 38.87 | 58.07 | 45.69
≥0.3 ≥0.3 ≥0.3 - cr | 38.15 | 57.96 | 44.95
Table 16. Comparison results with some advanced methods based on the HRSC2016 dataset.
Model | AP (%)
R2CNN [54] | 73.1
RRPN [50] | 79.6
SCRDet [55] | 83.4
RRD [56] | 84.3
RADet [57] | 84.3
RoI-Transformer [51] | 86.2
Gliding Vertex [58] | 88.2
OPLD [59] | 88.4
MDP-RGH [60] | 89.69
DAL [61] | 89.70
KLD [62] | 89.87
DRN [63] | 92.7 *
CenterMap-Net [64] | 92.8 *
ERITSD [8] | 89.8/93.0 *
ERITSD-Dynamic-0.3 + cr | 88.44/92.14 *
ERITSD-Dynamic-0.2 + cr | 89.9/93.1 *
* Result under the PASCAL VOC 2012 metric.