Article

A Ship Incremental Recognition Framework via Unknown Extraction and Joint Optimization Learning

1 Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China
2 Beijing Institute of Space Mechanics and Electricity, Beijing 100076, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2026, 18(1), 149; https://doi.org/10.3390/rs18010149
Submission received: 28 November 2025 / Revised: 20 December 2025 / Accepted: 21 December 2025 / Published: 2 January 2026

Highlights

What are the main findings?
  • An open-world detection framework oriented towards remote sensing ship targets is proposed.
  • An unknown target extraction module based on tail distribution modeling is proposed, which can accurately distinguish unknown ships.
  • A joint optimization-based learning module is proposed to achieve incremental recognition of new-class samples, while significantly alleviating the catastrophic forgetting problem of known classes.
What is the implication of the main finding?
  • The proposed method can provide ideas for discovering new types of maritime targets in complex scenarios, offering support for maritime safety and traffic management applications.

Abstract

With the rapid growth of the marine economy and the increasing demand for maritime security, ship target detection has become critically important in both military and civilian applications. However, in complex remote sensing scenarios, challenges such as visual similarity among ships, subtle inter-class differences, and the continual emergence of new categories make traditional closed-world detection methods inadequate. To address these issues, this paper proposes an open-world detection framework for remote sensing ships. The framework integrates two key modules: (1) a Fine-Grained Feature and Extreme Value-based Unknown Recognition (FEUR) module, which leverages tail distribution modeling and adaptive thresholding to achieve precise detection and effective differentiation of unknown ship targets; and (2) a Joint Optimization-based Incremental Learning (JOIL) module, which employs hierarchical elastic weight constraints to differentially update the backbone and detection head, thereby alleviating catastrophic forgetting while incorporating new categories with only a few labeled samples. Extensive experiments on the FGSRCS dataset demonstrate that the proposed method not only maintains high accuracy on known categories but also significantly outperforms mainstream open-world detection approaches in unknown recognition and incremental learning. This work provides both theoretical value and practical potential for continuous ship detection and recognition in complex open environments.

1. Introduction

With the vigorous development of the marine economy and the increasing demand for maritime security defense, precise and efficient detection of maritime vessel targets has become a focal point of interest in various fields [1]. In the maritime transportation sector, real-time monitoring of surface vessel dynamics can optimize route planning, avoid collision accidents, and ensure the smooth and safe transportation of goods; in the military domain of coastal defense, quickly and accurately identifying friendly and enemy vessels plays a crucial role in safeguarding national maritime sovereignty and detecting potential threats in a timely manner [2,3].
Currently, optical remote sensing images have become the main input data for detection models due to their wide detection range, diverse spectral characteristics, and high image resolution. Remote sensing ship images are obtained through remote sensing technology from satellites or aerial platforms, widely used for maritime security, marine rescue, and military monitoring [4]. Ship targets in remote sensing images are often interfered with by complex backgrounds such as waves, clouds, and ports [5], and exhibit a high similarity to the environment, which increases detection difficulty [6]. Moreover, different categories of ships have similar appearances, making it difficult to distinguish them based solely on morphological features. The same category of ships may display significant differences in appearance due to variations in shooting angles, dynamic postures, and lighting changes, leading to substantial challenges in ship target detection tasks [7].
In recent years, with the rapid development of computer vision, object detection technologies based on deep neural networks [8,9], such as Faster R-CNN [10], YOLO [11], and Vision Transformer [12], have been widely applied in various fields. These methods can adaptively extract deep features from images, achieving high-precision detection in complex scenarios. However, although these methods have shown good performance on standard datasets in closed-world settings, they generally assume that the set of categories is fixed and complete, lacking the ability to handle the constantly changing and newly added object categories in real-world environments. To meet the detection demands of ever-expanding object categories in the real world, Open-World Object Detection (OWOD) [13] has emerged. Compared to traditional object detection tasks, OWOD requires the detector not only to identify and locate known-class objects during the training phase but also to possess the ability to distinguish 'unknown classes' during the testing phase, meaning it should recognize categories that did not appear in the training set as unknown rather than forcing them to match existing category labels. Furthermore, when these unknown classes are manually labeled and incorporated into the training set, the detector should be capable of incremental learning [14], integrating the new categories into its model while maintaining stable recognition of old categories without catastrophic forgetting [15]. These characteristics make OWOD an important research direction that connects object detection with open-world cognitive abilities [16].
However, open-world detection methods also have certain limitations, specifically reflected in the following aspects: (1) It is difficult to correctly detect unknown categories in complex ship target detection scenarios. Currently, most open-world object detection (OWOD) methods are trained and validated on general natural image datasets (such as COCO [17], VOC [18]), where there are significant category differences, enabling the model to perform well in category distinction. However, when applied to complex remote sensing ship target detection scenarios, their performance is significantly constrained. Specifically, different categories of ships appear highly similar, especially military vessels like destroyers and frigates; their structural and visual differences are often extremely subtle. This inadequate inter-class difference results in existing OWOD methods generally lacking effective discrimination capabilities between known and unknown classes in remote sensing scenarios, leading to unknown targets being easily misidentified as known categories, thus causing many false positives. (2) In the incremental recognition scenario of ship targets, introducing new categories often leads to severe forgetting of old categories. There is a vast number of ship categories in remote sensing images, but the morphological differences between different categories are very subtle, and the sample distribution is highly imbalanced, with some category data being extremely scarce, while the sample scale difference between old and new categories is very pronounced. This imbalance makes the model more prone to favoring new categories when learning new categories, which can lead to catastrophic forgetting, resulting in a significant decline in detection performance for the already learned ship categories.
In order to solve the above-mentioned problems and to respond to the challenges arising from the continuous evolution of ship categories and new challenges in real environments, this paper proposes an open-world detection framework for remote sensing ship targets. First, to address the issue of distinguishing unknown ships in complex remote sensing scenarios, this paper proposes an adaptive unknown area proposal method based on extreme value theory and introduces a fine-grained feature tail distribution modeling mechanism, enabling precise characterization of unknown and known targets at the feature level, significantly enhancing the model’s recognition and separation capabilities for unknown category ships. Secondly, to tackle the catastrophic forgetting problem caused by the introduction of new classes, we have designed an incremental learning framework that jointly optimizes new and old classes, achieving differentiated regulation of parameter updates for the backbone network and detection heads through a hierarchical elastic weight constraint mechanism, thus achieving a dynamic balance in recognition capability between new and old classes. Finally, experimental results on the FGSRCS dataset indicate that this framework not only outperforms existing classical open-world detection methods but also provides a general solution for remote sensing ship target detection that can address both unknown target discovery and continuous incremental recognition, which holds significant theoretical significance and practical value.
Our main contributions are as follows:
  • An open-world detection framework oriented towards remote sensing ship targets is proposed. Supported by an adaptive unknown rejection threshold and an incremental learning mechanism, it provides new solutions to address the continual evolution of ship categories and unpredictability in complex real-world scenarios;
  • A Fine-Grained Feature and Extreme Value-Based Unknown Recognition Module (FEUR) is designed, which achieves precise detection and effective differentiation of unknown ship targets by capturing subtle differences between ship categories and combining it with tail distribution modeling;
  • A Joint Optimization-Based Incremental Learning (JOIL) module is proposed, which utilizes hierarchical parameter constraints to achieve differentiated adjustment of the backbone network and detection head. This allows for incremental recognition with only a small number of new class labeled samples, while significantly alleviating the catastrophic forgetting problem of known categories.

2. Related Works

In this section, we first introduce the development history and challenges of ship target detection, then review the research progress of open-world target detection and the limitations of existing methods, and explain the solutions to overcome the shortcomings of current methods.

2.1. Ship Object Detection

Ship target recognition is one of the core tasks in the field of ocean remote sensing, with significant military and civilian value [19]. With the development of high-resolution optical remote sensing technology, the ability to capture details in sea surface images has significantly improved, making precise ship detection and classification possible [20]. Optical remote sensing-based ship target recognition is not only an important supporting technology in civilian areas such as marine resource development and maritime traffic management [21] but also a key means of achieving battlefield situation assessment and threat early warning [22].
Traditional ship target recognition methods include techniques based on gray-level features, template matching, visual saliency, texture features, and machine learning, which are used for locating and classifying ship instances [23]. However, the aforementioned methods often rely on prior knowledge and manually designed features, making it difficult to maintain stable performance in complex scenarios such as changes in lighting, occlusion, and interference from waves [24].
With the rapid development of deep learning in the field of computer vision [25], Convolutional Neural Networks (CNNs) have been widely introduced into ship detection tasks. Deep learning-based methods can be divided into two major technical paradigms: two-stage and single-stage methods. The two-stage methods are represented by the R-CNN [26] series of algorithms, which involve three key steps: first, generating candidate regions through selective search or a Region Proposal Network (RPN); then, using a CNN [27] to extract regional features; and finally, completing target class determination and precise bounding box regression through a classifier [10]. Single-stage methods, such as RetinaNet [28], YOLO [11], and FCOS [29], eliminate the region proposal generation step, directly predicting target classes and bounding box coordinates through an end-to-end architecture. DETR goes further by introducing the self-attention mechanism of Transformers, constructing an anchor-free detection paradigm that also dispenses with Non-Maximum Suppression (NMS) through image serialization and global context modeling, significantly simplifying the traditional detection pipeline [30]. Although two-stage methods have advantages in detection accuracy, single-stage methods are more suitable for practical application scenarios due to their efficient inference speed.
Meanwhile, with the rapid development of ship remote sensing datasets and the continuous improvement of public benchmarks, the technology system for ship recognition based on deep learning is constantly enriching. In 2020, CR2A-Net [31] proposed a cascaded ship target detection structure, which first slices the input image and uses a binary classifier to determine whether there are ships in these slices; slices containing ships are then further processed, where the Rotated Align Convolution (RAC) module refines image features to enhance the accuracy of ship detection. In 2022, DSLA [32] designed a novel anchor box quality scoring mechanism, which combines prior data and predicted data of anchor boxes to actively engage the model in the label assignment process; DSLA also designed a soft label assignment strategy that allocates weights to training samples in the loss function, effectively reducing the conflict between regression and classification objectives. The SASOD [33] proposed in 2024 integrates the Saliency-Guided Feature Fusion Network (SGFFN) and Dynamic IoU-Adaptive Strategy (DIAS) to address common issues such as inaccurate ship localization in cluttered backgrounds and detection of small vessels. SGFFN includes a Resolution-Matching Saliency Supervision (RMS) module and a Cross-Stage Saliency Integration Network (CSIN), which refines and merges saliency-aware features to improve ship visibility; at the same time, DIAS dynamically adjusts the IoU threshold to optimize the recognition performance of small ship targets.
However, existing research mostly focuses on the “closed world” hypothesis, which assumes that the set of categories during the training and testing phases is consistent. While such methods achieve excellent performance on known categories, they struggle to cope with the emergence of new ship targets in real-world applications within an open environment. Unlike the aforementioned methods, this paper proposes a ship target detection framework designed for open environments, which better addresses the continuously changing and newly added ship categories in real-world scenarios.

2.2. Unknown Identification and Open World Detection

To address the issue of unknown classes that are not covered by the training set in real-world scenarios, researchers have proposed the research direction of Open World Object Detection (OWOD) [34].
In 2021, Joseph et al. first introduced the concept of Open-World Object Detection [13]. OWOD requires the model to label targets not present in the training data as unknown classes in an unsupervised manner. Once annotation information for these unknown classes is obtained, the model can learn the new categories while maintaining its recognition capability for further unknown classes. The authors also proposed the detection framework ORE, which addresses the challenges of open-world detection through contrastive clustering and energy-based unknown class recognition. However, ORE used statistical information from the validation set during training, leading to a data leakage issue.
In 2022, OW-DETR [35] proposed a method for open-world object detection based on the Transformer architecture. OW-DETR selects queries for unknown classes by using high-attention-response bounding boxes that do not belong to any known category, generating candidates for unknown objects more effectively through attention-based pseudo-label selection. Additionally, OW-DETR avoids the validation set dependency issue present in ORE, achieving better accuracy than ORE on various datasets such as MS-COCO [17] and PASCAL VOC [18]. In the same year, UC-OWOD [36] designed a two-stage object detector that introduces an unknown class classification module (Similarity-Based Unknown Classification, SUC), combining fully supervised and self-supervised learning to compute similarity matrices for different classes; it also includes an unknown class clustering refinement module (Unknown Clustering Refinement, UCR), which calculates the probabilities of samples being assigned to different classes based on the Student’s t-distribution and fine-tunes the results by minimizing KL divergence, allowing for further classification of newly detected classes.
In 2023, CAT [37] achieved the decoupling of localization tasks and recognition tasks through shared decoders. Based on Deformable DETR, the first stage decoder focuses on foreground object localization, while the second stage decoder is used to execute known/unknown category determination. This decoupling method reduces the influence of category information on the localization process during recognition, allowing the model to locate more foreground objects and thereby improve its detection capability for unknown targets. In the same year, PROB [38] parameterized recognized objects as multivariate Gaussian distributions in a query embedding space and trained the model through alternating optimization steps. During the incremental learning phase, PROB selected 25 samples each with the highest and lowest target scores, where the high-scoring samples represent the comprehensive features of that category to prevent catastrophic forgetting; the low-scoring samples belong to difficult-to-recognize targets, aiding the network in learning features of new class targets.
Recently, the VOS [39] method has utilized visual spatial distribution modeling to achieve the separation of unknown targets, and the UnSniffer [40] framework further proposes to improve detection stability in open environments through feature generation and discrimination mechanisms.
However, existing open-world detection methods are largely focused on natural scenes, and research on remote sensing ship targets is still extremely limited. In ship remote sensing images, unknown category targets often have characteristics such as small scale, complex backgrounds, and weak inter-class differences, which results in poor performance of existing methods when directly transferred. Therefore, how to design an effective unknown identification mechanism for ship detection tasks remains an urgent issue to be addressed. Unlike the above methods, this paper proposes an unknown identification module that focuses on the fine-grained feature differences of ships, accurately detecting and identifying unknown ship targets by modeling the feature distribution of ship targets.

2.3. Incremental Learning

Incremental Learning aims to enable models to continuously learn while being exposed to new categories, while maintaining the ability to recognize old categories and avoiding Catastrophic Forgetting. Currently, several classic incremental learning methods have been proposed, laying an important foundation for further research on open-world object detection.
In 2016, Li et al. proposed Learning without Forgetting (LwF) [41], which preserves the network's response to old tasks by using a distillation loss, where the response target is calculated using data from the current task. Consequently, LwF does not require storing old training data; however, if the new task's data follows a different distribution from previous tasks, this strategy may pose problems: as more dissimilar tasks are added to the network, performance on prior tasks declines rapidly. In 2017, knowledge distillation methods were explored in depth in iCaRL [42], proposed by Rebuffi et al., which uses a teacher-student network to transfer knowledge and retain the discriminative ability of old classes while learning new ones. Additionally, strategies such as sample replay and prototype maintenance are widely applied to maintain the stability of the model by reusing old-class samples or category prototype vectors. Based on these ideas, incremental learning methods suitable for open-world scenarios have gradually developed in the field of object detection. For example, Shmelkov et al. integrated a distillation loss into the detection framework, enabling the model to maintain performance on old classes as it expands to new categories [15]. In 2020, Peng et al. proposed Faster ILOD [43], which utilized knowledge distillation to design an efficient end-to-end incremental object detector, incorporating multi-network adaptive distillation to appropriately retain knowledge of old categories.
However, most existing incremental detection studies are based on the closed-world assumption, focusing only on the gradual expansion of the category set, without considering the unknown category targets that may appear in real-world scenarios. This is particularly true in the context of ship detection, where the visual differences between ship categories are subtle and new types of ships are continually emerging. Traditional incremental learning methods are prone to significant forgetting and confusion. Additionally, existing incremental learning methods generally overlook the specificity of rotated box detection and lack effective incremental learning mechanisms specifically for ship remote sensing images.
Unlike the above methods, this paper proposes an incremental learning strategy that jointly optimizes new and old classes, differentially adjusting the parameter updates of the backbone network and detection head to balance the recognition capability for new- and old-class targets. It also incorporates a lightweight sample storage strategy to further improve recognition accuracy. Compared with these previous works, adding new tasks with our method causes minimal degradation of performance on old tasks.

3. Methods

3.1. Overall Architecture of the Proposed Method

In this section, we elaborate on our method for ship target detection and incremental recognition in an open-world setting through the fusion of unknown identification and incremental learning, as depicted in Figure 1. In the initial phase, we first obtain the activation vectors of the targets through a feature extraction network to capture the fine-grained features of the ships. Subsequently, based on extreme value theory, we model the features of the tail samples from all known categories, providing a statistical basis for subsequent unknown target discrimination. On this basis, we further design an adaptive unknown area proposal mechanism to accurately distinguish between known category and unknown category targets. In the second phase, we construct a mixed dataset using a small number of old class samples and new class samples, while also introducing a hierarchical elastic weight constraint mechanism to differentially adjust the parameter updates of the backbone network and the detection head, achieving joint optimization learning for old and new categories. Through this design, the model can continuously expand new classes while maintaining the recognition capability for old classes, thereby effectively balancing the detection performance of old and new classes in an open environment.

3.2. Fine-Grained Feature and Extreme Value–Based Unknown Recognition Module

In practical sea surface detection applications, ship detection models are often required to be deployed in open environments, where they frequently encounter unknown category targets not included during the training phase. However, traditional detectors generally rely on the closed-set assumption, i.e., that all detected targets belong to known categories. This assumption can easily lead to misclassification when unknown or new ships appear, severely affecting the reliability of detection. Therefore, detection frameworks urgently need the ability to effectively distinguish between unknown targets and known categories to enhance adaptability and robustness in complex maritime scenarios. To address this challenge, we propose the Fine-Grained Feature and Extreme Value-based Unknown Recognition (FEUR) module, which combines fine-grained feature differences with extreme value and tail distribution modeling to finely characterize the subtle feature differences of ship targets, achieving precise detection and identification of unknown ships.
In order to better identify unknown categories, it is necessary to observe the spatial distribution of ship target features. Considering the complex nonlinear relationships between the features of the ship targets, the t-Distributed Stochastic Neighbor Embedding (t-SNE) method is used for dimensionality reduction, and a visual analysis of the features is conducted to observe the local clustering characteristics at the sample level. The visualization results of different category features in the FGSRCS dataset are shown in Figure 2.
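As an illustrative sketch (not the paper's own code), this visualization step can be reproduced with scikit-learn's t-SNE implementation; the Gaussian vectors below are placeholders for real per-target backbone activations:

```python
# Hedged sketch of the feature-visualization step: reduce per-target
# activation vectors to 2-D with t-SNE so class clusters can be inspected.
# The random features below stand in for real backbone activations.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_classes, per_class, dim = 5, 40, 256
features = rng.normal(size=(n_classes * per_class, dim))
labels = np.repeat(np.arange(n_classes), per_class)  # for coloring the scatter plot

# perplexity must be smaller than the sample count; 30 is a common default
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)
print(embedding.shape)  # one 2-D point per target, ready for plotting
```

In practice, the `features` array would hold the activation vectors described above, and `labels` the ship categories used to color the clusters in Figure 2.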
It can be seen from the figure that the boundaries between different categories of clusters are vague, indicating the presence of extreme sample points with long-tail characteristics in the high-dimensional feature space. Traditional fixed threshold strategies cannot effectively model such dynamic boundaries, leading to unknown categories in open scenarios being misidentified as known categories. Therefore, there is a need to design an adaptive threshold mechanism that estimates the probability density of tail samples to achieve open-set detection of ship targets. To address this issue, this paper proposes an adaptive unknown region proposal mechanism that enhances category separability by utilizing fine-grained feature differences, and combines extreme value modeling to capture the tail behavior of feature distribution, thereby effectively identifying and rejecting unknown categories, significantly improving the robustness of ship detection in open environments. The specific content of the FEUR module is as follows:
First, according to the Extreme Value Theory (EVT), the extreme value distribution of bounded data in a visual task with a clear upper limit of feature scores tends to follow a Weibull distribution, with the probability density function being
$$f(x;\sigma,\xi)=\frac{\xi}{\sigma}\left(\frac{x}{\sigma}\right)^{\xi-1}\exp\left[-\left(\frac{x}{\sigma}\right)^{\xi}\right]$$
By fitting the Weibull distribution to the non-match scores of known categories, a dynamic rejection threshold $t$ can be calculated:
$$t=\sigma\left[-\ln\left(1-\beta\right)\right]^{1/\xi}$$
In the equation, $\beta$ represents the significance level. If the highest non-match score of an input sample exceeds the threshold $t$, the sample is classified as an unknown category. To enable deep networks to adapt to unknown identification, the process of unknown identification is divided into training and inference phases. During the training phase, the output of the penultimate layer of the classification head, that is, the input to SoftMax, is first extracted as the activation vector $v(x)$ for each known category. Then, by calculating the mean of each known category's activation vectors $v(x)$, the mean activation vector (mAV) for that category is obtained, expressed mathematically as
$$\mu_c=\frac{1}{N_c}\sum_{i=1}^{N_c}v_i^c(x)$$
where $\mu_c$ represents the average activation vector of class $c$, $N_c$ is the number of samples belonging to class $c$, and $v_i^c(x)$ denotes the activation vector of the $i$-th sample in class $c$.
Secondly, the Euclidean distance from each correctly classified sample in a category to that category's mAV $\mu_c$ is calculated:
$$d_i^c=\left\|v_i^c(x)-\mu_c\right\|_2$$
The Euclidean distances for the samples within each category are sorted in descending order, and the largest $\mathit{Tailsize}$ distances are selected as the extreme tail values. Based on EVT, a Weibull distribution is fitted to these tail values using the FitHigh function from the libMR library, yielding the scale parameter $\sigma_j$ and shape parameter $\xi_j$ for each class $j$.
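A minimal sketch of this per-class tail-fitting step follows, assuming the activation vectors of correctly classified samples are available; scipy's `weibull_min` stands in for libMR's `FitHigh`, and the synthetic activations and Tailsize value are illustrative:

```python
# Hedged sketch: fit a Weibull model to the largest distances-to-mAV of one
# class, then derive the dynamic rejection threshold t = sigma*(-ln(1-beta))^(1/xi).
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
tail_size = 20                                  # illustrative Tailsize
acts = rng.normal(size=(200, 128))              # stand-in activation vectors of one class
mav = acts.mean(axis=0)                         # mean activation vector (mAV)

# Euclidean distance of every sample to the class mAV; keep the Tailsize largest
dists = np.linalg.norm(acts - mav, axis=1)
tail = np.sort(dists)[-tail_size:]

# Weibull fit to the tail (location fixed at 0, as in EVT tail models)
xi_shape, _, sigma_scale = weibull_min.fit(tail, floc=0)

beta = 0.95                                     # significance level
t = sigma_scale * (-np.log(1.0 - beta)) ** (1.0 / xi_shape)
print(xi_shape > 0, sigma_scale > 0, t > 0)
```

In the full module this fit is repeated once per known class, producing the per-class $(\sigma_j, \xi_j)$ pairs used below.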
According to EVT, distance deviations associated with highly activated categories are more likely to indicate anomalies. Therefore, the activation vector $v(x)$ is sorted in descending order, and the top $\alpha$ categories with the highest activation values, $s_1, s_2, \ldots, s_\alpha$, are selected.
Compute the scaling factor $\omega_{s_i}$ for the activation values of the top $\alpha$ categories:
$$\omega_{s_i}=1-\frac{\alpha-i}{\alpha}\left[1-\exp\left(-\left(\frac{\left\|v(x)-\mu_{s_i}\right\|}{\sigma_{s_i}}\right)^{\xi_{s_i}}\right)\right]$$
where $v(x)$ denotes the activation vector of the input sample, $\mu_{s_i}$ represents the average activation vector of class $s_i$, and $\sigma_{s_i}$ and $\xi_{s_i}$ are the scale and shape parameters of the Weibull distribution for class $s_i$ obtained during the training phase.
Then, the activation values of the top α categories are scaled:
$$\hat{v}_{s_i}(x)=v_{s_i}(x)\cdot\omega_{s_i}$$
The activation values of the remaining categories $s_{\alpha+1}, s_{\alpha+2}, \ldots, s_N$ remain unchanged (equivalently, $\omega_j=1$ for these classes).
Compute the pseudo-activation value for the unknown class:
$$\hat{v}_0(x)=\sum_{j=1}^{N}v_j(x)\cdot\left(1-\omega_j\right)$$
Thus, compared with SoftMax, a probability for the unknown category ($j=0$) is introduced, and the probability calculation formula is
$$P(y=j\mid x)=\frac{e^{\hat{v}_j(x)}}{\sum_{i=0}^{N}e^{\hat{v}_i(x)}}$$
This formula replaces the input of the original SoftMax layer, $v(x)$, with the corrected activation vector $\hat{v}(x)$, making the network suitable for object recognition in open scenarios.
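Putting the recalibration steps together, a compact sketch of the open-set scoring pipeline is shown below; the activation values, class distances, and Weibull parameters are illustrative stand-ins for the quantities fitted during training:

```python
# Hedged sketch of the activation recalibration: Weibull-based weights scale
# the top-alpha activations, the removed mass becomes a pseudo-activation for
# the unknown class (index 0), and a SoftMax is taken over N+1 classes.
import numpy as np

def recalibrate(v, dists, sigma, xi, alpha=3):
    """v: (N,) activations; dists: (N,) distances to each class mAV;
    sigma, xi: (N,) per-class Weibull scale/shape parameters."""
    n = v.shape[0]
    order = np.argsort(v)[::-1]                  # classes by descending activation
    omega = np.ones(n)
    for rank, j in enumerate(order[:alpha], start=1):
        cdf = 1.0 - np.exp(-(dists[j] / sigma[j]) ** xi[j])  # Weibull CDF (outlier score)
        omega[j] = 1.0 - ((alpha - rank) / alpha) * cdf
    v_hat = v * omega                            # scaled known-class activations
    v0 = float(np.sum(v * (1.0 - omega)))        # pseudo-activation for "unknown"
    z = np.concatenate(([v0], v_hat))
    p = np.exp(z - z.max())
    return p / p.sum()                           # index 0 = P(unknown | x)

v = np.array([6.0, 2.0, 1.0, 0.5])
sigma, xi = np.full(4, 5.0), np.full(4, 2.0)
p_far = recalibrate(v, np.array([15.0, 3.0, 2.0, 1.0]), sigma, xi)   # far from all mAVs
p_near = recalibrate(v, np.array([1.0, 3.0, 2.0, 1.0]), sigma, xi)   # close to top mAV
print(p_far[0] > p_near[0])  # larger distances raise the unknown probability
```

The two calls illustrate the intended behavior: a sample lying far from every class center shifts probability mass toward the unknown class, while a sample close to its top-scoring class retains a confident known-class prediction.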
Through the above processing, the model relies on the discriminative ability of fine-grained feature differences and the statistical constraints of extreme value modeling, allowing it to break through the limitations of the traditional closed-set assumption. In the inference phase, it achieves open-set recognition, effectively distinguishing between known and unknown category targets, laying a solid foundation for subsequent incremental learning.

3.3. Joint Optimization–Based Incremental Learning Module

After achieving effective identification and rejection of unknown categories, the model must still possess the capability for continuous learning to adapt to the expanding characteristics of ship categories in practical applications. Traditional detectors often face the problem of 'catastrophic forgetting' during incremental learning, where the introduction of new categories significantly weakens the discriminative ability for old categories. This phenomenon is particularly severe in tasks like ship object detection, where the differences between categories are minimal and their forms are highly similar, often leading to a sharp decline in detection performance. Therefore, how to efficiently learn new categories while maintaining the detection performance of old ones has become another core challenge that must be addressed in ship object detection in open environments.
To tackle this challenge, this paper designs an incremental recognition module for joint optimization learning of new and old categories, which differentially adjusts the parameter updates of the backbone network and detection head to balance the recognition abilities of new and old category targets. At the same time, a joint dataset is constructed using a small number of old category samples and new categories, introducing a joint optimization strategy during the training process, effectively alleviating forgetting and enhancing the overall stability of detection. The specific content of the JOIL module is as follows:
First, since deep networks can achieve the same performance under different parameter configurations, the optimal solution $\theta_{B}$ for task B may lie close to the optimal solution $\theta_{A}$ for task A. We therefore constrain the network parameters to remain in the low-error region of task A (centered at $\theta_{A}$) to protect the performance of task A. From a Bayesian perspective, the optimization objective of the network can be expressed as
$$\theta^{*}=\arg\min_{\theta}\Bigl[L_{B}(\theta)-\log p(\theta\mid D_{A})\Bigr]$$
The first term minimizes the loss on $D_{B}$, while the second term maximizes the posterior probability of the parameters given $D_{A}$, thereby preventing catastrophic forgetting. Applying a Laplace approximation to $p(\theta\mid D_{A})$ and assuming it follows a Gaussian distribution, we have
$$\log p(\theta\mid D_{A})=\log\frac{1}{\sqrt{2\pi}\,\sigma}-\frac{(\theta-\mu)^{2}}{2\sigma^{2}}$$
Let $f(\theta)=\log p(\theta\mid D_{A})$ and perform a second-order Taylor expansion of $f(\theta)$:
$$f(\theta)=f(\theta_{A})+\left.\frac{\partial f(\theta)}{\partial\theta}\right|_{\theta=\theta_{A}}(\theta-\theta_{A})+\frac{1}{2}(\theta-\theta_{A})^{T}\left.\frac{\partial^{2}f(\theta)}{\partial\theta^{2}}\right|_{\theta=\theta_{A}}(\theta-\theta_{A})+o\bigl(\|\theta-\theta_{A}\|^{2}\bigr)$$
Since $\theta_{A}$ maximizes $f(\theta)$, the first derivative of $f(\theta)$ at $\theta=\theta_{A}$ is zero and the second derivative is negative. Ignoring terms of third order and above, $f(\theta)$ can be approximated as
$$f(\theta)=\log\frac{1}{\sqrt{2\pi}\,\sigma}-\frac{(\theta-\mu)^{2}}{2\sigma^{2}}\approx f(\theta_{A})+\frac{1}{2}(\theta-\theta_{A})^{2}f''(\theta_{A})$$
Since $f(\theta_{A})$ is a constant, Equation (9) can be written as
$$\theta^{*}=\arg\min_{\theta}\Bigl[L_{B}(\theta)-\frac{1}{2}(\theta-\theta_{A})^{2}f''(\theta_{A})\Bigr]$$
Since $\theta$ is an $n$-dimensional vector, the Hessian $f''(\theta_{A})$ has size $n\times n$, making its computation expensive. It is therefore replaced by the Fisher information matrix $F_{ij}$ (the negative expectation of the Hessian):
$$F_{ij}=-\mathbb{E}_{p(\theta\mid D_{A})}\left[\left.\frac{\partial^{2}\log p(\theta\mid D_{A})}{\partial\theta_{i}\,\partial\theta_{j}}\right|_{\theta=\theta_{A}}\right]$$
To further reduce computational cost, we assume parameter independence and retain only the diagonal elements of the Fisher matrix. Simultaneously, based on the definition of the Fisher matrix, the second-order derivatives are transformed into squared first-order derivatives:
$$F_{ii}=\mathbb{E}_{p(\theta\mid D_{A})}\left[\left(\left.\frac{\partial\log p(\theta\mid D_{A})}{\partial\theta_{i}}\right|_{\theta=\theta_{A}}\right)^{2}\right]$$
This expectation can be approximated via Monte Carlo sampling:
$$F_{ii}\approx\mathbb{E}_{x\sim D_{A},\,y\sim p_{\theta}(y\mid x)}\left[\left(\left.\frac{\partial\log p_{\theta}(y\mid x)}{\partial\theta_{i}}\right|_{\theta=\theta_{A}}\right)^{2}\right]$$
where $x$ is a sample drawn from task A, and $y$ is the model's output for input $x$. In the experiments of this paper, the training set is traversed, the gradient of the loss function is recorded for each sample, and the average is taken:
$$F_{i}=\frac{1}{|D_{A}|}\sum_{x\in D_{A}}\left(\left.\frac{\partial\log p_{\theta}(Y=y\mid x)}{\partial\theta_{i}}\right|_{\theta=\theta_{A}}\right)^{2}$$
The loss function for task B can be expressed as
$$L(\theta)=L_{B}(\theta)+\sum_{i}\frac{\lambda}{2}F_{i}\bigl(\theta_{i}-\theta_{A,i}\bigr)^{2}$$
In the equation, $L_{B}(\theta)$ is the loss function for task B and $\lambda$ is the regularization coefficient. Elastic Weight Consolidation (EWC) thus alleviates catastrophic forgetting without storing historical training data, since only the Fisher information of the old-task parameters needs to be saved. Traditional EWC, however, applies a globally uniform regularization coefficient to all network parameters, neglecting the functional differences between layers when learning new tasks. This paper takes the model $D_{old}$ trained in Section 3.2 and applies EWC for incremental fine-tuning on a dataset containing only CRS-class data, yielding a model $D_{new}$. By comparing the changes in the parameters of different network layers (i.e., their Fisher values) before and after fine-tuning, we visualize the 10 layers with the largest changes, as shown in Figure 3.
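The diagonal Fisher estimate and the EWC-regularized loss above can be sketched in NumPy; per-sample gradients are assumed to be precomputed, and the function names are illustrative rather than the paper's code:

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: the mean of squared per-sample gradients,
    F_i = (1/|D_A|) * sum_x (d log p / d theta_i)^2."""
    g = np.asarray(per_sample_grads, dtype=float)  # (num_samples, num_params)
    return np.mean(g ** 2, axis=0)

def ewc_loss(loss_B, theta, theta_A, fisher, lam=2000.0):
    """Task-B loss plus the quadratic EWC penalty anchored at theta_A:
    L(theta) = L_B(theta) + (lambda/2) * sum_i F_i (theta_i - theta_A_i)^2."""
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_A) ** 2)
    return loss_B + penalty
```

In practice the per-sample gradients would come from backpropagating the detector's loss on each old-task sample, as described in the text.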
The results show that the parameters of the ROI Head layer in the network change significantly, which is because the ROI Head is directly responsible for the classification and regression tasks of region proposals, making its parameters more sensitive compared to other modules. For the neck and backbone, since these modules mainly focus on the extraction and fusion of general features, the parameters of these modules have stabilized by the later stages of training. Therefore, this paper emphasizes constraining these parameters.
At the same time, we use a small number of old-class samples for lightweight sample storage. Specifically, for each old category we select as representative samples the five images whose activation vectors have the smallest Euclidean distance to that class's mean activation vector. These images capture the core semantic features of the old classes and are combined with the new-class training data to form the fine-tuning training set. This enables joint optimization of old and new classes and effectively counteracts catastrophic forgetting.
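The exemplar selection described above (the samples closest to the class mean activation vector) can be sketched as follows; `select_exemplars` is a hypothetical helper operating on precomputed activation vectors:

```python
import numpy as np

def select_exemplars(features, k=5):
    """Return the indices of the k samples whose activation vectors are
    closest (in Euclidean distance) to the class mean activation vector."""
    feats = np.asarray(features, dtype=float)   # (num_samples, feat_dim)
    mav = feats.mean(axis=0)                    # mean activation vector
    dists = np.linalg.norm(feats - mav, axis=1)
    return np.argsort(dists)[:k]
```

The selected indices identify the stored exemplar images that are later mixed with new-class data for fine-tuning.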
Through the above design, the new and old classes can achieve collaborative modeling and knowledge sharing in the feature space, allowing the model to effectively retain its discriminative ability for old classes while introducing new classes, thereby avoiding the common problem of catastrophic forgetting found in traditional incremental learning. The design of this module provides a solid guarantee for the continuous expansion of ship detection models in open environments.

4. Experiments

4.1. Datasets and Evaluation Metrics

To verify the performance of the proposed open-set ship detection model, experiments were conducted on the FGSRCS dataset. FGSRCS is a fine-grained ship recognition dataset for complex scenes whose collection covers remote sensing images from over 280 military and civilian ports, military bases, and shipyards across 58 countries and regions worldwide. It integrates multi-source remote sensing data (such as historical imagery from Google Earth, WorldView, and Jilin-1); from a high-resolution pool of 4500 images, 1420 images were collected under the interference of eight factors: thick clouds, thin fog, smoke, shadows, overexposure, sea clutter, ground facilities, and sea ice.
For known categories, the mean average precision (mAP) is still used to verify whether the network maintains its ability to recognize the original categories.
For unknown target detection, we use the Absolute Open-Set Error (A-OSE), the number of unknown-class instances that are correctly localized (detection box IoU with the ground truth above a threshold) but incorrectly classified as a known class. Ideally, A-OSE = 0.
Since A-OSE depends on the dataset size, results from different datasets introduce biases. To address this, we adopt the Wilderness Impact (WI) metric, defined as the proportion of unknown-class targets misclassified as known classes among all known-class detection results:
$$\mathrm{WI}=\frac{FP_{o}}{TP_{k}+FP_{k}}=\frac{\text{A-OSE}}{TP_{k}+FP_{k}}$$
In the formula, $TP_{k}$ denotes the true positives of the known categories and $FP_{k}$ the false positives of the known categories.
The Unknown Detection Recall (UDR) is the proportion of localized unknown-class targets (regardless of whether they are classified correctly) among all unknown-class targets, and measures the localization accuracy on unknown classes:
$$\mathrm{UDR}=\frac{TP_{u}+FN_{u}^{*}}{TP_{u}+FN_{u}}$$
In the formula, $TP_{u}$ is the number of true positives of the unknown category, $FN_{u}$ is the number of false negatives of the unknown category, and $FN_{u}^{*}\subseteq FN_{u}$ is the number of unknown-class targets that are localized but recognized as known-class targets.
The Unknown Detection Precision (UDP) is the proportion of unknown-class targets correctly classified as unknown among all localized unknown-class targets, and validates the classification accuracy on unknown categories:
$$\mathrm{UDP}=\frac{TP_{u}}{TP_{u}+FN_{u}^{*}}$$
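Given raw detection counts, the three open-set metrics can be computed directly; a minimal sketch in which the function name and count variable names are assumptions:

```python
def open_set_metrics(tp_k, fp_k, a_ose, tp_u, fn_u, fn_u_star):
    """Compute WI, UDR, and UDP from detection counts.

    tp_k / fp_k : true / false positives on known classes
    a_ose       : unknowns localized but classified as a known class (A-OSE)
    tp_u        : unknowns correctly detected as unknown
    fn_u        : unknown targets not detected as unknown (includes fn_u_star)
    fn_u_star   : the subset of fn_u that is localized but labeled as known
    """
    wi = a_ose / (tp_k + fp_k)
    udr = (tp_u + fn_u_star) / (tp_u + fn_u)
    udp = tp_u / (tp_u + fn_u_star)
    return wi, udr, udp
```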

4.2. Implementation Details

The network is based on Oriented R-CNN, trained on a training set that excludes all images containing the CRS class and validated on the original validation set. For training the Weibull model, the Tailsize is set to 10, meaning that for each category the 10 samples farthest from the mean activation vector (mAV) are selected for distribution modeling. During the unknown-recognition training phase, the features output by the ROI Extractor of Oriented R-CNN are used as the activation vector $v(x)$ of an individual target. To ensure the quality of the activation vectors, the ROIs input to the ROI Extractor are replaced with the ground-truth annotations of the targets, yielding the activation vectors $v(x)$ of all targets in the training set, which are then used to fit a Weibull model for each category. In the inference phase, the 2000 region proposals generated by the RPN are fed to the ROI Extractor, and the unknown probability of each region proposal is computed. This result (i.e., the confidence of the region proposals) is combined with the proposals themselves as input to the Non-Maximum Suppression (NMS) operation. In this paper, the NMS confidence threshold is set to 0.05, and the IoU threshold between rotated boxes is set to 0.1. To ensure strict isolation between old and new categories during incremental learning and avoid data contamination, only CRS-class images that contain no targets of other classes are selected as training data. The fine-tuning process trains for 18 epochs with an initial learning rate of 0.0025, which is reduced to 1/10 of its previous value at the 12th and 16th epochs.
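The per-class Weibull tail modeling described above (fitting the Tailsize largest distances from the mean activation vector) might look as follows with SciPy; the helper names are assumptions, and a real pipeline would fit one model per category:

```python
import numpy as np
from scipy.stats import weibull_min

def fit_tail_weibull(distances, tailsize=10):
    """Fit a Weibull distribution to the `tailsize` largest distances from
    the class mean activation vector (location fixed at 0)."""
    tail = np.sort(np.asarray(distances, dtype=float))[-tailsize:]
    shape, loc, scale = weibull_min.fit(tail, floc=0.0)
    return shape, loc, scale

def unknown_weight(distance, shape, scale):
    """Weibull CDF of the distance: values near 1 mean the sample lies far
    in the tail, i.e. it is likely an outlier for this class."""
    return float(weibull_min.cdf(distance, shape, scale=scale))
```

The CDF value plays the role of the per-class weight $\omega$ used when recalibrating activations for open-set recognition.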

4.3. Ablation Analysis

(1) The necessity of the FEUR module: In order to verify the role and necessity of the proposed FEUR module in ship target detection in open scenarios, we conducted targeted ablation experiments on the FGSRCS dataset. Specifically, we built a baseline model that does not include the FEUR module and compared it with the complete model that incorporates this module. By comparing the differences in unknown category detection performance and overall detection accuracy between the two, we can effectively assess the contribution of the FEUR module in enhancing the model’s recognition capability for unknown categories and improving the robustness of the detection framework.
The experimental results are shown in Figure 4. The baseline model without the FEUR module accurately identifies the known category PV with high confidence (Figure 4a), but it misidentifies the unlearned CRS target as the SC and AC classes, and its bounding box localization is inaccurate, covering only part of the ship. In contrast, the model equipped with the FEUR module successfully detects and identifies the unknown category CRS (Figure 4b). This fully reveals the limitation of existing methods in real-world recognition scenarios, which often produce overconfident predictions for unknown targets. The proposed FEUR module effectively mitigates this issue, validating its necessity and practical value for ship detection in open environments.
(2) Effectiveness of the FEUR module: To validate the effectiveness of the proposed FEUR module in open-scenario ship detection, we conducted targeted ablation experiments on the FGSRCS dataset. Specifically, we set the Tailsize to fixed values of 5, 10, and 15 to analyze its impact on unknown detection performance, and additionally tested a dynamic setting in which each class's Tailsize equals 5% of that class's instance count. The unknown detection metrics under these settings are shown in Table 1.
The following conclusions can be drawn from the table:
  • When the Tailsize is 10, the A-OSE and WI metrics are at their lowest, indicating that the risk of the model misclassifying unknown categories as known categories is the lowest;
  • As Tailsize increases, UDR gradually improves, which is because a larger Tailsize covers more tail samples, thereby enhancing its sensitivity to outliers (unknown categories). In extreme cases, when Tailsize equals the total number of samples, the model treats all targets as outliers. Although UDR = 1, the mAP for known categories becomes 0, losing its practical significance;
  • When Tailsize is 10, the model achieves the highest UDP, indicating that the model’s confidence calibration performance for detecting unknown categories is optimal at this parameter;
  • The dynamic selection strategy for Tailsize does not show a significant performance advantage.
The comparison of mAP metrics for different Tailsizes is shown in Table 2. It can be seen from the table that as Tailsize increases, the mAP for known categories gradually decreases. This is because the model needs to learn and adjust among more tail samples at this point, leading to confusion in classification decisions for known categories.
Based on the above analysis, applications oriented toward detecting unknown-class targets should use a larger Tailsize, whereas applications prioritizing accuracy on known categories should use a smaller one. This paper chooses a Tailsize of 10 to balance unknown-target detection and known-target recognition.
In addition, this paper conducted experiments on different types of ships and different scales of increments, setting ATD and LCS as unknown categories, while the remaining settings were consistent with the previous text. The UDR for the unknown category was 0.8379 and the UDP was 0.5103, which are not significantly different from the experimental results of a single unknown category. This indicates that the unknown detection method proposed in this paper has certain generalization ability for different types of targets, with the visual results shown in Figure 5.
(3) Effectiveness of the JOIL module: In order to evaluate the impact of different regularization coefficients λ on recognition accuracy in the proposed JOIL module, we designed and conducted comparative experiments on the FGSRCS dataset. Specifically, we set λ to 200, 500, 800, 1000, 2000, and 3000, and the experimental results are shown in Table 3.
From the table, it can be seen that the regularization coefficient $\lambda$ strongly regulates the balance between new and old tasks. When $\lambda<2000$, increasing $\lambda$ lowers the mAP of the new classes while gradually improving that of the old classes, because the posterior distribution $p(\theta\mid D_{A})$ of the model parameters increasingly dominates the optimization and suppresses the learning of the new task. When $\lambda$ reaches 3000, the network completely loses the ability to learn the new task, indicating that a single global $\lambda$ can hardly satisfy both tasks. In the hierarchical elastic constraint strategy proposed in this paper, although the equivalent $\lambda$ of the backbone layers is 4000, the mAP of both new and old classes remains high, verifying that the method effectively balances feature stability and task adaptability.

4.4. Algorithm Performance Comparison

4.4.1. Unknown Detection Experiment Results

To validate the performance of the proposed unknown detection method for ship targets, we conducted comparative experiments on the FGSRCS dataset against advanced open-world detection models. We label the high-score region proposals generated by the RPN that do not overlap with the ground truth as the unknown class; after removing all out-of-bounds region proposals and those that overlap with the network's detections, we retain the five highest-scoring region proposals as unknown-class candidates. We compare the proposed framework against several state-of-the-art methods (ORE [13], OW-DETR [35], UC-OWOD [36], PROB [38], VOS [39], and UnSniffer [40]) on the FGSRCS dataset. The experimental results are shown in Table 4.
As shown in Table 4, our method achieves superior performance across all metrics, with A-OSE, WI, UDR, and UDP values of 20, 0.0086, 0.8235, and 0.5238, respectively. While generative methods such as VOS and UnSniffer improve on early approaches, they often struggle with the subtle inter-class differences of remote sensing ships or rely on Gaussian assumptions that fit the long-tail distribution of maritime data poorly. In contrast, by leveraging fine-grained feature modeling and Extreme Value Theory (EVT), our method effectively captures these tail distributions, achieving the lowest A-OSE (20) and the highest UDP (0.5238). Compared with the baseline, which performs poorly in both localization (UDR 0.5410) and precision due to background interference, our method also achieves the highest UDR (0.8235). The UDP of the baseline is 0, indicating that it misclassifies a large number of background areas (such as waves and clouds) as unknown categories; moreover, owing to the high similarity among ship targets, the baseline is also prone to assigning unknown-category targets to other known categories. By fully exploiting the fine-grained feature differences between ship categories, the proposed method effectively distinguishes known from unknown targets and precisely localizes the latter, demonstrating clear advantages in open scenarios. Visual results of unknown detection are shown in Figure 6: in Figure 6a–c, several accurately localized results are presented, where the CRS class is successfully detected with bounding boxes tightly surrounding the targets, indicating the method's ability to detect unknown-class ship targets.
In Figure 6d, although the CRS class was also successfully detected, the localization error was larger due to interference from the superstructure of the CRS class targets. Figure 6e,f illustrate false alarms generated during the unknown detection process, where the former is caused by cloud interference and the latter by dock interference.

4.4.2. Incremental Learning Experimental Results

To verify the performance of the proposed incremental recognition method, we compare on the FGSRCS dataset the results of full fine-tuning, Elastic Weight Consolidation, the hierarchical elastic constraint method of this paper, the lightweight sample-storage method, and the hierarchical elastic constraint combined with lightweight sample storage, as shown in Table 5. Full fine-tuning fixes no parameters. Elastic Weight Consolidation introduces the regularization term with $\lambda=2000$. In our method, $\lambda=200$ and the Fisher matrix is reweighted: the Fisher values of the backbone layers are scaled by a factor of 20, those of the neck and rpn_head layers by a factor of 10, and the Fisher values of the classification head for the CRS class are set to 0 to lift the old classes' constraints on the new-task classification layer. The lightweight sample-storage method fixes all parameters outside the network's classification head.
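The layer-wise Fisher reweighting described above can be sketched as a simple transformation of a per-parameter Fisher dictionary; the layer-name prefixes follow common detector naming conventions (e.g., `backbone`, `neck`, `rpn_head`, `roi_head`) and are assumptions, not the paper's exact parameter names:

```python
def hierarchical_fisher(fisher, new_class_head_keys,
                        backbone_scale=20.0, neck_rpn_scale=10.0):
    """Reweight per-parameter Fisher values: constrain the backbone strongly,
    the neck/rpn_head moderately, and free the new-class classification head."""
    weighted = {}
    for name, f in fisher.items():
        if name in new_class_head_keys:
            weighted[name] = 0.0            # no constraint on the new-class head
        elif name.startswith("backbone"):
            weighted[name] = backbone_scale * f
        elif name.startswith(("neck", "rpn_head")):
            weighted[name] = neck_rpn_scale * f
        else:
            weighted[name] = f              # e.g. roi_head keeps its baseline weight
    return weighted
```

The reweighted dictionary then replaces the plain Fisher values in the EWC penalty, so different modules receive different effective regularization strengths.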
From the table, it can be seen that due to full fine-tuning directly adjusting all parameters without protecting the existing features, the mAP declines the most, with the mAP of class SC dropping by 71.4%. The traditional EWC method maintains better performance on old tasks, with the most severe forgetting occurring in class AC, which only dropped by 6.3%, but the mAP for new classes is relatively low. The hierarchical elastic constraint strategy proposed in this paper limits changes to the backbone network parameters through high-intensity constraints, protecting the basic features of the ship, while appropriately relaxing the constraints on the detection head, allowing the network to learn new class knowledge. As a result, the mAP for new class targets increased by 16.7% compared to EWC, leading to a slight overall improvement in mAP. The sample storage-based method directly retains the original feature distribution through old samples, maximizing the suppression of forgetting, achieving the highest mAP (83.4), but it requires access to the original training data.
In addition, this paper combines two incremental learning methods with sample storage-based methods. The traditional EWC combined with sample storage achieved a mean Average Precision (mAP) of 81.6, but the mAP for new classes was only 36.4, which is significantly lower than the methods using a single strategy. This is because the regularization gradient of EWC contradicts the optimization direction of stored samples, leading to inhibition of learning new tasks. In contrast, the method in this paper, when combined with the sample storage strategy, achieved an mAP of 91.3 for new classes, indicating that the hierarchical elastic constraint strategy effectively balances the accuracy of new and old classes.
This paper also compares the training costs of different methods, with results shown in Table 6. Due to catastrophic forgetting, full fine-tuning is only suitable for scenarios that disregard historical task performance. EWC-based methods store only the Fisher information matrix and need very little training time to achieve good incremental results, making them suitable when historical data is inaccessible (e.g., privacy constraints or limited storage). Sample-storage-based methods require some historical data, nearly doubling the training time relative to EWC-based methods. Our method combined with sample storage requires the most storage space and training data but achieves the highest new-class recognition accuracy, making it suitable for scenarios where some historical data is accessible and high recognition accuracy is required.

5. Conclusions

This paper addresses the practical need for the continuous variation of ship categories and the frequent emergence of unknown targets in complex marine remote sensing scenarios, proposing an open-world detection framework focused on ship targets. This method introduces two key modules while maintaining high accuracy in detecting known categories: first, an unknown identification module based on fine-grained feature differences and extreme value modeling, aimed at enhancing the model’s ability to recognize and reject unknown categories; second, an incremental learning module that optimizes the learning of new and old categories together, effectively alleviating the catastrophic forgetting problem in traditional incremental learning and achieving efficient adaptation to new categories. Through the organic combination of these two components, the proposed method strikes a good balance between target localization accuracy, unknown category discrimination capability, and new category learning ability. Experimental results based on the FGSRCS dataset demonstrate that the proposed method significantly outperforms current mainstream open-world detection algorithms across multiple core evaluation metrics, not only validating the framework’s effectiveness and adaptability but also highlighting its potential for practical application in complex open environments.

Author Contributions

Methodology, writing the original draft, investigation, Y.L.; validation, revising, and editing, G.B.; Writing—review & editing, funding acquisition and project administration, J.H.; supervision, X.Z.; conceptualization, revising, and editing, T.H., J.W. and W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Beijing Key Laboratory of Advanced Optical Remote Sensing Technology Fund.

Data Availability Statement

The used dataset is available on https://github.com/dwddw/FGSRCS (accessed on 20 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Prasad, D.K.; Rajan, D.; Rachmawati, L.; Rajabally, E.; Quek, C. Video processing from electro-optical sensors for object detection and tracking in a maritime environment: A survey. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1993–2016. [Google Scholar] [CrossRef]
  2. Tang, B.; Lu, R.; Yang, X.; Li, Y.; Li, Y.; Zhang, D.; Chen, S. R2PLoc: A Region-to-Point UAV Visual Geo-Localization Framework Leveraging Hierarchical Semantic Representation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5643818. [Google Scholar] [CrossRef]
  3. Hu, J.; Wei, Y.; Chen, W.; Zhi, X.; Zhang, W. CM-YOLO: Typical object detection method in remote sensing cloud and mist scene images. Remote Sens. 2025, 17, 125. [Google Scholar] [CrossRef]
  4. Yao, Y.; Jiang, Z.; Zhang, H.; Zhao, D.; Cai, B. Ship detection in optical remote sensing images based on deep convolutional neural networks. J. Appl. Remote Sens. 2017, 11, 042611. [Google Scholar] [CrossRef]
  5. Zhang, R.; Yao, J.; Zhang, K.; Feng, C.; Zhang, J. S-CNN-based ship detection from high-resolution remote sensing images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 423–430. [Google Scholar] [CrossRef]
  6. Zou, Z.; Shi, Z. Ship detection in spaceborne optical image with SVD networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 5832–5845. [Google Scholar] [CrossRef]
  7. Zhou, K.; Zhang, M.; Wang, H.; Tan, J. Ship detection in SAR images based on multi-scale feature extraction and adaptive feature fusion. Remote Sens. 2022, 14, 755. [Google Scholar] [CrossRef]
  8. Zhuo, Z.; Lu, R.; Yao, Y.; Wang, S.; Zheng, Z.; Zhang, J.; Yang, X. TAF-YOLO: A Small-Object Detection Network for UAV Aerial Imagery via Visible and Infrared Adaptive Fusion. Remote Sens. 2025, 17, 3936. [Google Scholar] [CrossRef]
  9. Zhou, Y.; Zhu, Y.; Ren, H.; Kang, J.; Zou, L.; Wang, X. Refined Multi-modal Feature Learning Framework for Marine Target Detection Using Radar Sensor. Digit. Signal Process. 2025, 170, 105816. [Google Scholar] [CrossRef]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  12. Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  13. Joseph, K.; Khan, S.; Khan, F.S.; Balasubramanian, V.N. Towards open world object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5830–5840. [Google Scholar]
  14. Peng, C.; Zhao, K.; Wang, T.; Li, M.; Lovell, B.C. Few-shot class-incremental learning from an open-set perspective. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 382–397. [Google Scholar]
  15. Shmelkov, K.; Schmid, C.; Alahari, K. Incremental learning of object detectors without catastrophic forgetting. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3400–3409. [Google Scholar]
  16. Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526. [Google Scholar] [CrossRef]
  17. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  18. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  19. Er, M.J.; Zhang, Y.; Chen, J.; Gao, W. Ship detection with deep learning: A survey. Artif. Intell. Rev. 2023, 56, 11825–11865. [Google Scholar] [CrossRef]
  20. Zhang, Z.; Zhang, L.; Wang, Y.; Feng, P.; He, R. ShipRSImageNet: A large-scale fine-grained dataset for ship detection in high-resolution optical remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8458–8472. [Google Scholar] [CrossRef]
  21. Bentes, C.; Velotto, D.; Tings, B. Ship classification in TerraSAR-X images with convolutional neural networks. IEEE J. Ocean. Eng. 2017, 43, 258–266. [Google Scholar] [CrossRef]
  22. Liu, G.; Zhang, Y.; Zheng, X.; Sun, X.; Fu, K.; Wang, H. A new method on inshore ship detection in high-resolution satellite images using shape and context information. IEEE Geosci. Remote Sens. Lett. 2013, 11, 617–621. [Google Scholar] [CrossRef]
  23. Zhang, S.; Wu, R.; Xu, K.; Wang, J.; Sun, W. R-CNN-based ship detection from high resolution remote sensing imagery. Remote Sens. 2019, 11, 631. [Google Scholar] [CrossRef]
  24. Kanjir, U.; Greidanus, H.; Oštir, K. Vessel detection and classification from spaceborne optical images: A literature survey. Remote Sens. Environ. 2018, 207, 1–26. [Google Scholar] [CrossRef] [PubMed]
  25. Hu, J.; Li, Y.; Zhi, X.; Shi, T.; Zhang, W. Complementarity-aware Feature Fusion for Aircraft Detection via Unpaired Opt2SAR Image Translation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5628019. [Google Scholar] [CrossRef]
  26. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  27. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25. [Google Scholar]
  28. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  29. Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
  30. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  31. Yu, Y.; Yang, X.; Li, J.; Gao, X. A cascade rotated anchor-aided detector for ship detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 60, 5600514. [Google Scholar] [CrossRef]
  32. Su, H.; He, Y.; Jiang, R.; Zhang, J.; Zou, W.; Fan, B. DSLA: Dynamic smooth label assignment for efficient anchor-free object detection. Pattern Recognit. 2022, 131, 108868. [Google Scholar] [CrossRef]
  33. Ren, Z.; Tang, Y.; Yang, Y.; Zhang, W. SASOD: Saliency-aware ship object detection in high-resolution optical images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611115. [Google Scholar] [CrossRef]
  34. Bendale, A.; Boult, T.E. Towards open set deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1563–1572. [Google Scholar]
  35. Gupta, A.; Narayan, S.; Joseph, K.; Khan, S.; Khan, F.S.; Shah, M. OW-DETR: Open-world detection transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9235–9244. [Google Scholar]
  36. Wu, Z.; Lu, Y.; Chen, X.; Wu, Z.; Kang, L.; Yu, J. UC-OWOD: Unknown-classified open world object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 193–210. [Google Scholar]
  37. Ma, S.; Wang, Y.; Wei, Y.; Fan, J.; Li, T.H.; Liu, H.; Lv, F. CAT: Localization and identification cascade detection transformer for open-world object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19681–19690. [Google Scholar]
  38. Zohar, O.; Wang, K.C.; Yeung, S. PROB: Probabilistic objectness for open world object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11444–11453. [Google Scholar]
  39. Du, X.; Wang, Z.; Cai, M.; Li, Y. VOS: Learning what you don’t know by virtual outlier synthesis. arXiv 2022, arXiv:2202.01197. [Google Scholar]
  40. Liang, W.; Xue, F.; Liu, Y.; Zhong, G.; Ming, A. Unknown sniffer for object detection: Don’t turn a blind eye to unknown objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3230–3239. [Google Scholar]
  41. Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947. [Google Scholar] [CrossRef] [PubMed]
  42. Rebuffi, S.A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2001–2010. [Google Scholar]
  43. Peng, C.; Zhao, K.; Lovell, B.C. Faster ILOD: Incremental learning for object detectors based on Faster R-CNN. Pattern Recognit. Lett. 2020, 140, 109–115. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed method.
Figure 2. Visualization results of different category target features in the FGSRCS dataset.
Figure 3. Parameter variation across different network layers.
Figure 4. The model’s recognition results for images containing the CRS class. (a) Recognition results of the baseline model without the FEUR module. (b) Recognition results of the baseline model with the FEUR module.
Figure 5. Visualization results of open set detection for multiple unknown categories. (a) Example result 1, (b) Example result 2.
Figure 6. Partial unknown detection visualization results. Red bounding boxes denote predictions for the unknown class, while cyan bounding boxes denote predictions for known classes: (a) Correct detection sample 1. (b) Correct detection sample 2. (c) Correct detection sample 3. (d) Example of poor localization. (e) Example of a false alarm caused by cloud layers. (f) Example of a false alarm caused by a dock.
Table 1. Comparison of open set detection metrics with different tailsizes.
| Tailsize | A-OSE | WI | UDR | UDP |
|---|---|---|---|---|
| 5 | 21 | 0.0087 | 0.7059 | 0.4167 |
| 10 | 20 | 0.0086 | 0.8235 | 0.5238 |
| 15 | 21 | 0.0090 | 0.8431 | 0.5116 |
| 5% | 22 | 0.0093 | 0.7647 | 0.4359 |
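To make the role of the tailsize hyperparameter concrete, the sketch below shows one way a fixed tailsize could be turned into an adaptive unknown-rejection threshold over a score distribution. This is a hedged illustration only: the function name `tail_threshold`, the centroid-distance scores, and the quantile rule are our illustrative assumptions, not the paper’s actual FEUR implementation.

```python
def tail_threshold(distances, tailsize, quantile=0.95):
    """Place an unknown-rejection threshold on the tail of a score distribution.

    Keeps only the `tailsize` largest scores (the distribution tail) and
    returns an empirical quantile of that tail. A detection whose distance
    to every known-class centroid exceeds this threshold would be flagged
    as Unknown.
    """
    tail = sorted(distances)[-tailsize:]           # the `tailsize` largest values
    idx = min(int(quantile * len(tail)), len(tail) - 1)
    return tail[idx]

# Illustrative distances; with tailsize = 5 only the five largest values
# shape the threshold, so outliers in the bulk of the distribution are ignored.
scores = [0.2, 0.3, 0.5, 1.1, 1.4, 1.8, 2.3, 2.9, 3.5, 4.2]
thr = tail_threshold(scores, tailsize=5)  # -> 4.2
```

A small tailsize (5) makes the threshold track only the most extreme scores, while a large or percentage-based tailsize smooths it toward the bulk of the distribution, which is consistent with the sensitivity seen in Table 1.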
Table 2. Comparison of mAP metrics with different Tailsizes.
| Class | Tailsize = 5 | Tailsize = 10 | Tailsize = 15 | Tailsize = 5% |
|---|---|---|---|---|
| Unknown (CRS) | 1.1 | 1.7 | 1.8 | 0.9 |
| AC | 87.2 | 85.3 | 85.3 | 85.3 |
| AAS | 82.7 | 79.8 | 78.2 | 78.2 |
| ATD | 81.6 | 79.7 | 79.7 | 79.7 |
| CV | 62.3 | 47.1 | 44.1 | 40.9 |
| CR | 65.7 | 62.1 | 62.1 | 62.1 |
| DD | 74.4 | 63.7 | 62.9 | 62.3 |
| FF | 87.5 | 76.4 | 75.0 | 73.9 |
| LS | 65.3 | 58.8 | 57.3 | 56.5 |
| LCS | 77.2 | 77.2 | 77.2 | 77.2 |
| MWV | 44.8 | 27.8 | 22.4 | 22.1 |
| PV | 67.5 | 62.1 | 61.8 | 59.4 |
| RO | 77.8 | 68.6 | 67.1 | 66.1 |
| SUB | 76.7 | 75.5 | 75.5 | 73.2 |
| CTS | 87.4 | 87.4 | 87.4 | 87.4 |
| OT | 77.0 | 75.0 | 72.6 | 73.9 |
| SC | 77.4 | 79.2 | 77.4 | 77.4 |
| mAP (w/o CRS) | 74.5 | 69.1 | 67.9 | 67.2 |
Table 3. Comparison of EWC method accuracy with different regularization coefficients λ .
| Class | λ = 200 | λ = 500 | λ = 800 | λ = 1000 | λ = 2000 | λ = 3000 |
|---|---|---|---|---|---|---|
| CRS | 64.5 | 63.0 | 63.6 | 64.9 | 54.5 | 0.0 |
| AC | 86.2 | 86.9 | 87.3 | 89.0 | 89.6 | 89.3 |
| AAS | 85.0 | 88.7 | 92.2 | 91.9 | 92.7 | 93.1 |
| ATD | 56.4 | 69.6 | 73.7 | 79.3 | 85.4 | 85.3 |
| CV | 72.3 | 72.7 | 72.7 | 72.6 | 73.9 | 73.8 |
| CR | 71.6 | 72.9 | 74.9 | 76.7 | 81.8 | 81.8 |
| DD | 79.1 | 79.5 | 80.0 | 80.6 | 82.8 | 83.5 |
| FF | 86.9 | 87.1 | 87.0 | 87.1 | 87.2 | 87.3 |
| LS | 70.0 | 71.1 | 71.4 | 72.7 | 78.1 | 75.5 |
| LCS | 89.0 | 88.6 | 86.7 | 87.0 | 87.3 | 86.1 |
| MWV | 47.4 | 51.0 | 51.3 | 52.7 | 54.0 | 53.2 |
| PV | 76.2 | 78.0 | 78.7 | 78.6 | 78.6 | 79.1 |
| RO | 55.6 | 65.3 | 75.0 | 83.2 | 86.7 | 86.7 |
| SUB | 81.2 | 87.3 | 81.3 | 81.2 | 88.6 | 88.4 |
| CTS | 82.8 | 85.4 | 89.3 | 90.6 | 94.1 | 94.6 |
| OT | 74.1 | 79.5 | 80.3 | 87.0 | 85.6 | 81.8 |
| SC | 71.9 | 82.7 | 85.3 | 85.8 | 87.7 | 89.7 |
| mAP | 73.5 | 77.0 | 78.3 | 80.1 | 81.7 | 78.2 |
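The trade-off governed by λ in Table 3 comes from the standard EWC regularizer, which penalizes parameter drift in proportion to the diagonal Fisher information estimated on the old tasks. A minimal sketch with illustrative values (the parameter vectors and Fisher entries below are made up for demonstration):

```python
def ewc_penalty(theta, theta_star, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    `fisher` holds the diagonal Fisher information from the old tasks, so
    parameters important to the known classes (large F_i) are pinned to
    their previous values `theta_star`, while unimportant parameters stay
    free to adapt to the new class.
    """
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )

# The high-Fisher parameter (F = 0.9) dominates the penalty.
p = ewc_penalty(theta=[1.2, -0.4], theta_star=[1.0, -0.5],
                fisher=[0.9, 0.1], lam=1000)  # ~ 18.5
```

This makes the failure mode at λ = 3000 in Table 3 intuitive: a very large λ inflates the penalty until the network is effectively frozen, preserving known-class accuracy but preventing the new CRS class from being learned at all (0.0 AP).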
Table 4. Comparison of open set detection results.
| Methods | A-OSE | WI | UDR | UDP |
|---|---|---|---|---|
| Baseline method | 48 | 0.0151 | 0.5410 | 0.0002 |
| ORE | 45 | 0.0142 | 0.5820 | 0.2215 |
| OW-DETR | 42 | 0.0135 | 0.6842 | 0.2850 |
| UC-OWOD | 38 | 0.0126 | 0.7105 | 0.3120 |
| PROB | 32 | 0.0110 | 0.7350 | 0.3980 |
| VOS | 35 | 0.0118 | 0.7420 | 0.3640 |
| UnSniffer | 29 | 0.0095 | 0.7850 | 0.4510 |
| Ours | 20 | 0.0086 | 0.8235 | 0.5238 |
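For readers unfamiliar with the open-set metrics in Table 4, the sketch below follows their common definitions in the open-world detection literature: A-OSE counts unknown objects misrouted to a known class, and WI measures the relative precision drop once unknowns enter the test set. This is our hedged reading of the standard metrics, not the paper’s exact evaluation code, and the labels used in the example are illustrative.

```python
def absolute_ose(preds, truths):
    """A-OSE: number of unknown-class objects wrongly assigned a known label."""
    return sum(1 for p, t in zip(preds, truths)
               if t == "unknown" and p != "unknown")

def wilderness_impact(precision_closed, precision_open):
    """WI = P_closed / P_open - 1: how much known-class precision degrades
    when unknown objects are mixed into the evaluation set (0 is ideal)."""
    return precision_closed / precision_open - 1.0

# One unknown ship mislabeled as the known class "AC" contributes 1 to A-OSE.
aose = absolute_ose(["AC", "unknown", "CV"], ["unknown", "unknown", "CV"])  # -> 1
wi = wilderness_impact(0.90, 0.889)  # positive: unknowns hurt precision
```

Lower is better for both metrics, which is why the proposed method’s A-OSE of 20 and WI of 0.0086 indicate the fewest unknown-to-known confusions among the compared approaches.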
Table 5. Comparison of mAP50 metrics for different incremental learning methods.
| Class | Full Fine-Tuning | EWC | Our Method | Sample Storage | EWC + Sample Storage | Our Method + Sample Storage | Original Network |
|---|---|---|---|---|---|---|---|
| CRS | 58.0 | 54.5 | 63.6 | 63.6 | 36.4 | 91.3 | 0.0 |
| AC | 67.3 | 89.6 | 89.6 | 89.5 | 89.3 | 89.0 | 95.6 |
| AAS | 65.3 | 92.7 | 93.4 | 94.0 | 93.7 | 89.8 | 95.5 |
| ATD | 47.0 | 85.4 | 85.7 | 86.7 | 85.8 | 87.3 | 88.0 |
| CV | 68.1 | 73.9 | 73.4 | 74.8 | 74.3 | 73.6 | 74.6 |
| CR | 68.3 | 81.8 | 81.4 | 82.1 | 82.3 | 82.0 | 83.7 |
| DD | 76.3 | 82.8 | 83.0 | 84.1 | 83.8 | 83.4 | 84.9 |
| FF | 86.2 | 87.2 | 87.3 | 87.3 | 87.3 | 86.6 | 87.5 |
| LS | 62.0 | 78.1 | 75.4 | 79.7 | 79.8 | 75.4 | 81.6 |
| LCS | 85.0 | 87.3 | 87.6 | 88.6 | 87.6 | 88.0 | 89.8 |
| MWV | 41.9 | 54.0 | 54.6 | 61.6 | 60.6 | 51.6 | 59.2 |
| PV | 73.2 | 78.6 | 78.7 | 78.9 | 79.0 | 72.5 | 78.3 |
| RO | 44.7 | 86.7 | 86.4 | 86.8 | 86.9 | 87.2 | 88.6 |
| SUB | 80.6 | 88.6 | 88.5 | 89.2 | 88.7 | 81.2 | 89.3 |
| CTS | 38.9 | 94.1 | 94.6 | 95.4 | 95.5 | 90.4 | 90.2 |
| OT | 62.7 | 85.6 | 85.0 | 86.4 | 86.3 | 81.6 | 90.6 |
| SC | 26.0 | 87.7 | 87.9 | 89.9 | 89.7 | 89.7 | 90.9 |
| mAP | 61.9 | 81.7 | 82.1 | 83.4 | 81.6 | 82.4 | 80.5 |
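The mAP rows in Table 5 are the unweighted means of the per-class AP50 values. A quick sanity check on the “Our Method” column reproduces the reported figure:

```python
# Per-class AP50 values from the "Our Method" column of Table 5
# (CRS, AC, AAS, ATD, CV, CR, DD, FF, LS, LCS, MWV, PV, RO, SUB, CTS, OT, SC).
ap_ours = [63.6, 89.6, 93.4, 85.7, 73.4, 81.4, 83.0, 87.3, 75.4,
           87.6, 54.6, 78.7, 86.4, 88.5, 94.6, 85.0, 87.9]

# mAP is the plain average over the 17 classes, including the new CRS class.
map50 = sum(ap_ours) / len(ap_ours)
# round(map50, 1) -> 82.1, matching the mAP row of the table
```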
Table 6. Comparison of training costs for different incremental learning methods.
| Method | Storage Content | Storage Space | Training Time |
|---|---|---|---|
| Full Fine-tuning | Training data for the new classes | 216 MB | 4.2 min |
| EWC/Our Method | Data for new classes + Fisher Information Matrix | 356 MB | 4.3 min |
| Sample Storage | Data for new classes + Representative samples from old classes | 332 MB | 12.7 min |
| Our Method + Sample Storage | Data for new classes + Representative samples from old classes + Fisher Information Matrix | 472 MB | 13.1 min |
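The extra storage attributed to EWC-style methods in Table 6 comes from keeping a diagonal Fisher Information Matrix: one scalar per network parameter. A hedged sketch of how such a diagonal is typically estimated, from per-sample gradients of the log-likelihood (the toy two-parameter gradients below are illustrative):

```python
def fisher_diagonal(grad_samples):
    """Diagonal Fisher estimate: F_i ~ mean over samples of g_i^2,
    where g is the gradient of the log-likelihood w.r.t. the parameters.

    One float is kept per parameter, which is what adds the extra
    storage on top of the new-class training data.
    """
    n = len(grad_samples)
    dim = len(grad_samples[0])
    return [sum(g[i] ** 2 for g in grad_samples) / n for i in range(dim)]

# Two gradient samples over a two-parameter model.
fisher = fisher_diagonal([[1.0, 2.0], [3.0, 4.0]])  # -> [5.0, 10.0]
```

Because the Fisher diagonal is computed once after the old-task training and simply loaded during incremental updates, it adds storage but almost no training time, consistent with the 4.3 min vs. 4.2 min figures above.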
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Y.; Bao, G.; Hu, J.; Zhi, X.; Hu, T.; Wang, J.; Wu, W. A Ship Incremental Recognition Framework via Unknown Extraction and Joint Optimization Learning. Remote Sens. 2026, 18, 149. https://doi.org/10.3390/rs18010149

