Article

A Novel Approach for Vessel Graphics Identification and Augmentation Based on Unsupervised Illumination Estimation Network

1 State Key Laboratory of Maritime Technology and Safety, Dalian Maritime University, Dalian 116026, China
2 China Waterborne Transport Research Institute, Beijing 100088, China
3 Research Center of Graphic Communication, Printing and Packaging, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(11), 2167; https://doi.org/10.3390/jmse13112167
Submission received: 15 October 2025 / Revised: 7 November 2025 / Accepted: 10 November 2025 / Published: 17 November 2025
(This article belongs to the Special Issue New Technologies in Autonomous Ship Navigation)

Abstract

Vessel identification in low-light environments is a challenging task since low-light images contain less information for detecting objects. To improve the feasibility of vessel identification in low-light environments, we present a new unsupervised low-light image augmentation approach that augments the visibility of vessel features in low-light images, laying a foundation for subsequent identification with the augmented images. To this end, we design an illumination estimation network (IEN) to estimate the illumination of a low-light image based on the Retinex theory. We then augment the low-light image by estimating its reflectance with the estimated illumination. Compared with existing deep learning-based supervised low-light image augmentation approaches that depend on low- and normal-light image pairs for model training, IEN is an unsupervised approach that uses no normal-light images as references during model training. Compared with traditional unsupervised low-light image augmentation approaches, IEN achieves faster image augmentation through parallel computation on Graphics Processing Units (GPUs). The proposed approach builds an end-to-end pipeline integrating a vessel-aware weight matrix and SmoothNet, which optimizes illumination estimation under the Retinex framework. To evaluate the effectiveness of the proposed approach, we build a low-light vessel image set based on the Sea Vessels 7000 dataset, a public maritime image set containing 7000 vessel images across multiple categories. We then carry out an experiment to evaluate the feasibility of vessel identification using the augmented images. Experimental results show that the proposed approach boosts the AP75 metric of the RetinaNet detector by 6.6 percentage points (from 56.8 to 63.4) on the low-light Sea Vessels 7000 dataset, confirming that the augmented images significantly improve vessel identification accuracy in low-light scenarios.

1. Introduction

Vessel identification represents a critical task in maritime surveillance and management systems. Gao et al. [1] introduced a feature mapping neural network tailored for detecting dim and small targets, facilitating direct object extraction in nighttime surveillance applications. In a complementary line of work, Zhang et al. [2] developed a vessel identification framework specifically designed for maritime video surveillance on non-stationary platforms, addressing challenges related to illumination variations and their disruptive effects on recognition performance. In their comprehensive review, Wang et al. [3] systematically outlined vessel identification methodologies based on electro-optical imagery, highlighting video surveillance as a core application scenario. With advances in computing technology, increasing research attention has been directed toward vessel identification utilizing digital image processing techniques [4]. Traditional vessel identification methods typically adopt a two-stage identification paradigm. The first stage involves candidate region extraction, commonly achieved through visual saliency-based methods, wavelet transforms, or anomaly identification techniques. These classical approaches have been extensively validated on large-scale maritime datasets, such as the Sea Vessels dataset [5], which comprises nearshore scenes with complex backgrounds that challenge the robustness of candidate extraction. The second phase focuses on object recognition, where graphic features such as Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and Scale-Invariant Feature Transform (SIFT) are leveraged to classify candidate objects. Concurrently, an alternative methodological framework adheres to the sliding window identification paradigm [6]. This paradigm generally involves traversing maritime images with fixed-size windows of multiple scales, extracting hand-engineered features from each window, and subsequently employing a pre-trained classifier to label each window as either “vessel” or “background”. As documented in [6], this approach gained widespread adoption in the early stages of maritime video surveillance systems, primarily attributed to its structural simplicity. Nevertheless, it is constrained by inherent limitations, including substantial computational overhead and heightened susceptibility to background clutter interference. To summarize, conventional vessel identification methods rely on hand-engineered graphic features and pre-defined identification rules. The manually designed nature of these core components imposes inherent constraints on their adaptability to complex maritime environments—a key limitation that has catalyzed the transition toward deep learning-based methodologies. For illustration, Shao et al. [7] proposed a Rotation-Based Feature Alignment Network (RBFA-Net), a deep learning-driven framework tailored for rotated Synthetic Aperture Radar (SAR) vessel identification. This work explicitly underscores the superior performance of data-driven feature extraction paradigms compared to their hand-engineered counterparts. Furthermore, the quality of maritime imagery serves as a direct determinant of vessel identification reliability, particularly in scenarios characterized by insufficient illumination. Conventional identification methodologies encounter considerable hurdles in practical maritime identification tasks, with low-light conditions emerging as a major bottleneck. 
First, darkness exacerbates the intra-class variability of vessels: it blurs texture details and undermines the discriminative capability of hand-engineered features [8], thereby hampering the model’s capacity to yield accurate predictive results. Second, maritime images are frequently subject to the interference of illumination variations and object occlusions—two critical factors that further escalate the difficulty of vessel identification [9]. In recent years, deep learning techniques have been extensively applied to vessel identification tasks. Among various deep learning frameworks, Convolutional Neural Networks (CNNs) stand out as the most successful model paradigm for this domain. Leveraging multi-layer convolution and pooling operations, CNNs enable effective feature extraction from maritime images and accurate vessel identification. Over the past few decades, a body of research has focused on optimizing CNN architectures, aiming to achieve more efficient and higher-performance vessel identification compared to traditional methods [10]. For instance, Shan et al. proposed a deep learning-driven maritime vessel tracking method termed Siapno, which is specifically designed for vessel tracking in maritime scenarios [11]. Within the Siapno framework, a modified Siamese network is integrated with multiple Region Proposal Networks (RPNs) to construct a dedicated vessel tracking pipeline. In parallel research, Li et al. [12] optimized the original YOLOv3 Tiny architecture for vessel identification tasks. To address critical interference factors in maritime scenarios—specifically, large-scale vessels are prone to visual confusion with onshore structures, coupled with disturbances from complex wave patterns and water surface light reflections—the research team incorporated the Convolutional Block Attention Module (CBAM) into the network’s backbone. This design modification enhances the model’s target-aware capability by suppressing irrelevant background information, thereby alleviating the impact of environmental clutter on identification accuracy. Separately, Kim et al. [13] implemented the YOLOv5 model for vessel identification tasks on the Singapore Maritime Dataset (SMD). Shao et al. [14] put forward a saliency-aware convolutional neural network (CNN) framework dedicated to vessel recognition. This framework integrates a suite of comprehensive vessel-discriminating features, including deep-learned features, saliency maps, and coastline prior knowledge. Specifically, visual data acquired from an onshore surveillance camera system is leveraged to enable real-time vessel recognition. In a related study, Xing et al. [15] developed an alternative real-time vessel recognition method that incorporates a recognition-oriented transformer architecture. In contrast to vessel recognition approaches rooted in CNNs, transformer-based vessel recognition methods frame object recognition as an explicit set prediction task. Furthermore, transformer-driven object recognition eliminates hand-engineered components, a key distinction from traditional paradigms. In their work, Shi et al. [16] proposed a transformer architecture with local sparse information aggregation for SAR vessel recognition. As a foundational design choice, the Swin Transformer was adopted as the core backbone network. To effectively aggregate sparse yet meaningful cues from small-sized vessels, a deformable attention mechanism is introduced. 
This mechanism modifies the conventional self-attention mechanism by generating data-dependent offsets and sampling more discriminative key features for each query. To address the issue of erroneous attention computation induced by cyclic shifting in the Swin Transformer, sampled queries are employed to acquire the interaction information between adjacent windows. In the second step, the FCOS (Fully Convolutional One-Stage Object Detection) framework was employed as the fundamental recognition network. To impose contour constraints on the transformer, the shallowest feature maps derived from the Feature Pyramid Network (FPN) were fed into a contour-guided shape augmentation module. Subsequently, the augmented features were integrated into the recognition head of the FCOS framework, ultimately yielding the recognition results. Notably, SAR imagery plays a pivotal role in maritime vessel recognition under low-light or adverse weather conditions. This is attributed to its illumination invariance and capability to penetrate darkness and fog, thereby enabling the acquisition of clear target representations. Recent advancements in this field, such as the approach proposed in SAR small vessel recognition based on an augmented YOLO network, have further enhanced such maritime surveillance capabilities. Specifically, this work optimizes the backbone network of YOLO to better adapt to the characteristics of small-sized and low-contrast vessels, while additionally integrating a multi-scale identification head. These improvements collectively result in high recognition accuracy for vessels in low-light offshore scenarios. This finding underscores the value of SAR in complementing optical-based vessel recognition, as it addresses the inherent limitation of visible-light systems in dark environments. Such a capability further emphasizes the urgency of tackling low-light-related challenges within the domain of maritime perception. In relevant research, Nie et al. [17] noted that adverse weather conditions exert a detrimental impact on vessel recognition performance. Carion et al. [18] proposed a new object identification method, the detection transformer (DETR), which treats identification as a direct set prediction problem. Unlike traditional methods such as Faster R-CNN (FRCNN) that rely on manually designed components, it simplifies the process and does not require non-maximum suppression or anchor generation. Its core consists of a set-based global loss and a Transformer encoder–decoder. In the study, DETR models target associations and global context through a small number of learned object queries and outputs predictions in parallel; the model is concise and does not require a specialized library. Its accuracy and speed on the COCO dataset are comparable to FRCNN, and it can be extended to unified panoptic segmentation. The training code and pre-trained models are publicly available. Vijayakumar et al. [19] present a comprehensive review of the YOLO series. In contrast to research that focuses solely on a single technical point, this review first elaborates on the fundamental elements of object identification. Furthermore, the key difference from similar research is that it sorts out the architectural differences among the various YOLO versions and clarifies the logic of their technological evolution.
In their work, the authors also emphasize the contribution of YOLO in multiple fields, pointing out that it breaks through the accuracy limitations of single-stage detectors and offers fast inference. Chen et al. [20] propose the unsupervised lightweight low-light image augmentation network SEA Net. In contrast to methods that have not solved the problems of noise and overexposure in low-light augmentation, SEA Net addresses the image degradation and related issues caused by underexposure. Furthermore, it uses detail feature modules and optimized extraction algorithms (Hadamard product) to denoise, prevent color cast, suppress pixel overflow, and prevent overexposure, which is the key difference from traditional methods. In their work, the authors point out that SEA Net only requires low-light images for training, is lightweight, offers better speed performance, and can be used for tasks such as nighttime face recognition. Im et al. [21] propose MRLE Net, a low-light augmentation network for semantic segmentation of ocean images. In contrast to the unresolved issues of poor underwater low-light image quality and ecological interference from AUV light sources, MRLE Net focuses on the impact of these problems on segmentation accuracy. Furthermore, it reduces noise and preserves details through modules such as dual feature extraction and improves accuracy through a wavelet transform loss function, which is the key difference from traditional methods. In their work, the authors verify that MRLE Net achieves convergence ratios of over 78% and 83% on two databases (better than advanced methods) and is compatible with low-resource embedded systems, allowing AUVs to reduce communication overhead. Wang et al. [22] propose an event-triggered robust adaptive finite-time trajectory tracking control method for underactuated surface ships. In contrast to methods failing to address dynamic uncertainty, external interference, and limited communication resources simultaneously, this method targets all three challenges. Furthermore, it linearizes composite uncertainties, adopts a relative threshold event-triggering mechanism, and provides a stability proof via Lyapunov theory, which are key differences from traditional methods. In their work, the authors point out that MATLAB 2018b simulations verify its effectiveness, improving tracking accuracy and supporting intelligent ship autonomous navigation. Goncharov et al. [23] propose a method for modeling vessel braking in ice channels. In contrast to methods lacking analysis of ice-channel braking, this work addresses collision risks from sudden stops. Furthermore, it analyzes the braking process and formulates equations to calculate safe stopping distances, which are key differences from traditional methods. In their work, the authors point out that the model enables safe navigation in Arctic ice conditions, reducing collision hazards with icebreakers or vessels ahead. Chen et al. [24] propose a rotational YOLO-based model (RYM) for ship detection. In contrast to horizontal detection methods prone to background misidentification, RYM addresses tilted ship detection inaccuracies. Furthermore, it integrates a rotation-decoupled head, an attention mechanism, and BiFPN, which are key differences from traditional methods. In their work, the authors point out that RYM achieves 96.7% average accuracy at 45.6 FPS, enabling real-time, accurate ship detection for visual navigation.
Although existing research on vessel identification has achieved considerable success, studies focusing on low-light conditions using visible optical images remain scarce. Since the performance of vessel identification heavily depends on image quality, which often degrades significantly in low-light environments, this task presents substantial challenges. Therefore, image augmentation is essential for improving vessel identification performance under such conditions. To address this issue, this paper integrates deep learning with Retinex theory to enhance vessel images captured in low-light settings and utilizes the enhanced images for identification purposes. Figure 1 shows that the low-light vessel image augmentation pipeline follows the Retinex theory framework. First, a low-light vessel image is fed into the proposed IEN to obtain the illumination component of the image. Subsequently, the reflectance component, which contains the key vessel feature information, is calculated through element-wise division of the original low-light image by the estimated illumination. This reflectance component serves as the enhanced image for subsequent vessel identification tasks.
The remainder of this paper is structured as follows: Section 2 reviews the Retinex theory and formulates low-light augmentation as an illumination estimation problem; Section 3 details the proposed augmentation algorithm, including the architecture of the IEN, the learnable weight matrix, and reflectance estimation; Section 4 describes the experimental setup and dataset, compares the proposed method with state-of-the-art unsupervised augmentation methods, evaluates vessel identification on the augmented images, examines generalization to open-world low-light images, and verifies the physical rationality of the enhanced images; finally, the conclusions are summarized, together with a brief discussion of potential directions for future research.

2. Methodology

This study proposes an unsupervised low-light image augmentation approach based on Retinex theory and deep learning, with application to vessel identification in maritime environments. This section reviews key related works foundational to our method. Retinex theory, introduced by Edwin Land in the 1970s, models the human visual system’s perception of color under varying illumination conditions [25]. It posits that perceived color can be decomposed into illumination and reflectance components. This theory has significantly influenced computer vision, particularly in image augmentation. Land et al. subsequently proposed the first Retinex-based augmentation algorithm using stochastic path selection [26], which was later refined into a two-dimensional center-surround Retinex method to address limitations of the random path approach [27]. Building on this, Rahman et al. incorporated low-pass filtering into the center-surround framework, leading to the single-scale Retinex algorithm [28]. Further extension resulted in the multi-scale Retinex algorithm, which integrates outputs from three distinct scales—large, medium, and small—to achieve enhanced robustness and perceptual quality.
It combines the advantages of each scale to augment the image, achieving not only good augmentation of image details but also good color consistency. Because the Retinex algorithm processes the R, G, and B color channels separately, it may change the proportional relationship between the three channels before and after augmentation, resulting in color deviation. Based on this observation, a multi-scale Retinex algorithm with color restoration was proposed [29]. Other representative Retinex-based image augmentation approaches include the work in [30,31]. The approaches mentioned above are traditional image augmentation approaches that take an unsupervised learning strategy.
Recently, Retinex and deep learning-based low-light image augmentation has become the new trend. These approaches achieve higher performance due to the powerful feature extraction ability of neural networks and the physically interpretable Retinex theory. LLNet is the first work on deep learning-based low-light image augmentation [32]. This pioneering work inspired subsequent researchers to design end-to-end low-light augmentation networks. Wei et al. [33] proposed a deep Retinex decomposition for low-light image augmentation. Deep Retinex augments the illumination component and the reflectance component separately through two independent subnetworks, Decom-Net and Enhance-Net, where Decom-Net decomposes the input image into reflectance and structure-aware smooth illumination, and Enhance-Net adjusts the illumination map. To reduce the computational burden, Li et al. [34] proposed a lightweight Lighten-Net for low-light image augmentation, which consists of only four layers. Lighten-Net expects a low-light image as input and estimates its illumination map. Then, based on Retinex theory, the low-light image can be augmented by estimating its reflectance using the estimated illumination. Wang et al. [35] proposed a mutually reinforced illumination-noise perception network for low-light image augmentation. In their work, they proposed an improved Retinex framework in which the noise and brightness of a low-light image are perceived in a mutually reinforcing manner to achieve image augmentation. Wu et al. [36] suggest learning semantic knowledge for low-light image augmentation. Because most existing approaches improve low-light images without considering the semantic information of different regions, the network may easily deviate from the original color. To address this issue, Wu et al. proposed a new semantic-aware knowledge-guided framework (SKF) that assists low-light augmentation models in learning the rich and diverse priors encapsulated in semantic segmentation models. Deep learning-based approaches generally achieve better augmentation results, but most of them adopt a supervised learning strategy and depend on low- and normal-light image pairs for model training. However, such image pairs are difficult to collect in real-world applications.
Given an image $I$, the Retinex model can be formulated as
$$I = L \circ R \tag{1}$$
where $\circ$ represents the element-wise product between two images, and $L$ and $R$ represent the illumination and reflectance, respectively. In this paper, $I$ is an RGB image whose color channels share the same illumination. The illumination and reflectance are single-channel and three-channel images, respectively. According to Retinex theory, image augmentation amounts to estimating the reflectance of a given image, which is an ill-posed problem since there are two unknowns in Formula (1) [37]. As a result, image augmentation reduces to an illumination estimation problem in the framework of Retinex theory. The reflectance is then given by $R = I \oslash L$, where $\oslash$ represents element-wise division. The adoption of a single-channel illumination shared by all RGB channels is a deliberate simplification grounded in Retinex theory and maritime scene characteristics. In maritime low-light environments, the primary challenge is overall brightness deficiency rather than severe inter-channel spectral imbalance. This shared illumination design, while not capturing subtle spectral attenuation, retains color consistency and avoids overcomplicating the model, which is critical for real-time deployment. Post-augmentation, the vessel-aware weight matrix further adjusts color channels to align with human perception. In open-world scenarios, illumination always consists of two parts, i.e., sunlight $S$ and atmospheric light $A$. This means
$$L = S + A \tag{2}$$
where $S$ is the sunlight component of the illumination and $A$ is the atmospheric light. In low-light environments the sunlight term can be omitted, so the atmospheric light $A$ becomes the only illumination source.
For illumination estimation, we need to take advantage of some useful properties of reflectance and illumination. For example, an important property is $0 \le R \le 1$, which implies $I \le L$. Moreover, the illumination of an image is piecewise smooth. As a result, the initial guess of the illumination can be defined by
$$L_{\text{init}}(p) = \max_{c \in \{r,\, g,\, b\}} I_{c}(p) \tag{3}$$
where $p$ denotes a pixel and $I_{c}$ is the image in color channel $c \in \{r, g, b\}$. However, the initial illumination $L_{\text{init}}$ is full of edges and details, which violates the piecewise smooth property of an illumination map. It is therefore necessary to smooth the initial illumination to remove these edges and details. Once a smooth illumination is available, we can estimate the reflectance and complete the augmentation of the image.
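As a minimal illustration of this decomposition, the following PyTorch sketch computes the initial illumination of Equation (3) and recovers a naive, unsmoothed reflectance by element-wise division; the tensor layout (B, 3, H, W) and the small lower bound on the illumination are assumptions made for the example.

import torch

def initial_illumination(img: torch.Tensor) -> torch.Tensor:
    # Equation (3): per-pixel maximum over the R, G, B channels.
    # img has shape (B, 3, H, W) with values in [0, 1]; the result has shape (B, 1, H, W).
    l_init, _ = torch.max(img, dim=1, keepdim=True)
    return l_init

def naive_reflectance(img: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # R = I / L, with a small lower bound to avoid division by near-zero illumination.
    return img / torch.clamp(initial_illumination(img), min=eps)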

3. The Image Augmentation Algorithm

This section introduces the proposed low-light vessel image augmentation approach. Centering on the Retinex theory framework, this approach formulates low-light vessel image augmentation as an illumination estimation problem: first, it estimates the illumination of low-light vessel images, and then computes the image reflectance using the estimated illumination to achieve augmentation.

3.1. Illumination Estimation

For illumination estimation, we first smooth the initial illumination derived from Equation (3) using a custom neural network termed SmoothNet. As illustrated in Figure 2, Figure 2a presents the original low-light image, Figure 2b shows the initial illumination map (which is dim and non-piecewise smooth) and Figure 2c displays the illumination map after smoothing by SmoothNet (exhibiting enhanced brightness and smoothness). This confirms that SmoothNet effectively improves the smoothness and brightness of initial illumination, thereby optimizing illumination map quality.
To further verify performance, we conducted quantitative evaluations using Peak Signal-to-Noise Ratio (PSNR) (for image distortion) and Structural Similarity Index (SSIM) (for structural consistency). Notably, higher PSNR indicates lower distortion, while SSIM closer to 1 reflects better consistency with the original image in structure, brightness, and contrast. After smoothing by SmoothNet, the image achieves a PSNR of 29 dB (an ~8 dB improvement over the input) and an SSIM near 0.8—fully demonstrating the network’s excellent smoothing effect, low distortion, and preservation of illumination structure.
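For reference, both metrics can be computed with scikit-image as sketched below; the choice of library, and of the comparison reference, is an assumption made for illustration, since the section does not specify either.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def smoothing_quality(reference: np.ndarray, smoothed: np.ndarray):
    # Both inputs are single-channel illumination maps scaled to [0, 1].
    psnr = peak_signal_noise_ratio(reference, smoothed, data_range=1.0)
    ssim = structural_similarity(reference, smoothed, data_range=1.0)
    return psnr, ssim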
The smoothed illumination is formulated as
$$L_{\text{smooth}} = \mathrm{SmoothNet}\bigl( I, L_{\text{init}} \bigr) \tag{4}$$
where SmoothNet is an instance of the SmoothNet class, which is designed based on the edge-preserving guided filter [38]. SmoothNet takes a low-light image $I$ and its initial illumination $L_{\text{init}}$ as inputs and outputs a smooth illumination $L_{\text{smooth}}$. The PyTorch code for the SmoothNet implementation is listed in Algorithm 1. We define the illumination $L$ for reflectance estimation as follows:
$$L = L_{\text{smooth}} + E \tag{5}$$
where $E$ represents the error between $L_{\text{smooth}}$ and $L$, whose elements follow identical Gaussian distributions. The maximum likelihood estimate of $L$ is then
$$L = \arg\min_{L} \frac{1}{N} \sum_{p} \bigl( L(p) - L_{\text{smooth}}(p) \bigr)^{2} \tag{6}$$
where $N$ is the total number of pixels of $L$.
Algorithm 1 PyTorch code for the SmoothNet implementation

import torch
import torch.nn.functional as F

class SmoothNet(torch.nn.Module):
    def __init__(self, size, delta):
        super().__init__()
        self.delta = delta
        # Normalized box-filter kernel used as the local mean operator of the guided filter
        kernel = torch.ones(size, size)
        kernel = (1 / size ** 2) * torch.reshape(kernel, (1, 1, size, size))
        self.kernel = torch.as_tensor(kernel, device="cuda", dtype=torch.float32)

    def forward(self, img, L):
        # Guidance image: gray-scale version of the low-light input
        img = torch.mean(img, dim=1, keepdim=True)
        # Local statistics of the guidance image and the initial illumination
        mean_img = F.conv2d(img, self.kernel, stride=1, padding="same")
        mean_L = F.conv2d(L, self.kernel, stride=1, padding="same")
        mean_IL = F.conv2d(img * L, self.kernel, stride=1, padding="same")
        cov_IL = mean_IL - mean_img * mean_L
        mean_img2 = F.conv2d(img * img, self.kernel, stride=1, padding="same")
        var_img = mean_img2 - mean_img * mean_img
        # Guided-filter linear coefficients; delta regularizes flat regions
        a = cov_IL / (var_img + self.delta)
        b = mean_L - a * mean_img
        # Averaging the coefficients yields the edge-preserving smoothed illumination
        mean_a = F.conv2d(a, self.kernel, stride=1, padding="same")
        mean_b = F.conv2d(b, self.kernel, stride=1, padding="same")
        s = mean_a * img + mean_b
        return s
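A brief usage sketch of this module follows; the tensor shapes are assumptions made for illustration, while the hyperparameters size = 15 and delta = 0.05 are the values reported in Section 4.2.

import torch

smoother = SmoothNet(size=15, delta=0.05)           # hyperparameters from Section 4.2
img = torch.rand(1, 3, 1080, 1920, device="cuda")   # a dummy low-light image with values in [0, 1]
l_init, _ = torch.max(img, dim=1, keepdim=True)     # initial illumination, Equation (3)
l_smooth = smoother(img, l_init)                    # edge-preserving smoothed illumination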
We also added the total variation (TV) regularizer to the above optimization problem. The TV regularizer encourages the illumination L to be smooth if a pixel is similar to its neighbors. On the contrary, the illumination keeps boundaries between two regions if the pixels in one region are not similar to those in another region. This allows the illumination to be piecewise smooth while keeping the boundaries between different regions. The optimization problem (6) with TV regularization becomes
$$L = \arg\min_{L} \frac{1}{N} \sum_{p} \bigl( L(p) - L_{\text{smooth}}(p) \bigr)^{2} + \sum_{p} W(p) \Bigl( \bigl( \nabla_{x} L(p) \bigr)^{2} + \bigl( \nabla_{y} L(p) \bigr)^{2} \Bigr) \tag{7}$$
where $\nabla_{x}$ and $\nabla_{y}$ are the partial derivatives in the horizontal and vertical directions, respectively, and $W(p)$ is the weight of the partial derivatives at pixel $p$. The first term on the right-hand side of (7) keeps the illumination faithful to the smoothed illumination, while the second term encourages the illumination to be piecewise smooth.
The optimization problem (7) is identical to minimizing the following concise loss function:
$$\mathrm{loss}(L) = \frac{1}{N} \bigl\| L - L_{\text{smooth}} \bigr\|_{F}^{2} + \alpha \bigl\| W \circ \nabla L \bigr\|_{\mathrm{smooth}\text{-}l_{1}} \tag{8}$$
where $\alpha$ is a constant, $\nabla L(p)^{2} = \bigl( \nabla_{x} L(p) \bigr)^{2} + \bigl( \nabla_{y} L(p) \bigr)^{2}$ represents the squared magnitude of the first-order derivative at a pixel, $\| \cdot \|_{F}$ represents the Frobenius norm of a matrix, and $\| \cdot \|_{\mathrm{smooth}\text{-}l_{1}}$ represents the smooth-$l_{1}$ norm of a matrix. Given a scalar $x$, its smooth-$l_{1}$ norm is defined by
$$\| x \|_{\mathrm{smooth}\text{-}l_{1}} = \begin{cases} \dfrac{1}{2\delta} x^{2}, & |x| < \delta \\ |x| - \dfrac{\delta}{2}, & \text{otherwise} \end{cases} \tag{9}$$
where $|x|$ denotes the absolute value of a scalar (applied element-wise to a matrix), and $\delta$ is a predefined threshold. The smooth-$l_{1}$ norm switches between the quadratic and $l_{1}$ penalties according to the threshold, which improves robustness to outliers and avoids gradient explosion during model training. Moreover, minimizing the smooth-$l_{1}$ norm removes textures and details from the illumination map, in line with the piecewise smooth prior on the illumination.
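A compact PyTorch sketch of loss (8) is given below; the finite-difference gradients, the summed reduction, and the particular value of the threshold are assumptions made for illustration (PyTorch's smooth_l1_loss with beta set to the threshold matches Equation (9)).

import torch
import torch.nn.functional as F

def ien_loss(L, L_smooth, W, alpha=0.5, delta=0.05):
    # Loss (8): fidelity to the smoothed illumination plus a weighted, smooth-l1 TV term.
    # L, L_smooth, W are tensors of shape (B, 1, H, W); alpha follows Section 4.2,
    # while delta is an illustrative threshold value.
    n = L[0, 0].numel()                                  # number of pixels per illumination map
    fidelity = torch.sum((L - L_smooth) ** 2) / n        # (1/N) * ||L - L_smooth||_F^2

    grad_x = L[..., :, 1:] - L[..., :, :-1]              # horizontal finite differences
    grad_y = L[..., 1:, :] - L[..., :-1, :]              # vertical finite differences
    wgrad_x = W[..., :, 1:] * grad_x                     # weight each derivative element-wise
    wgrad_y = W[..., 1:, :] * grad_y

    # Smooth-l1 penalty of Equation (9), applied to the weighted gradients
    tv = F.smooth_l1_loss(wgrad_x, torch.zeros_like(wgrad_x), beta=delta, reduction="sum") + \
         F.smooth_l1_loss(wgrad_y, torch.zeros_like(wgrad_y), beta=delta, reduction="sum")
    return fidelity + alpha * tv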
The loss function (8) can be optimized through the gradient descent algorithm. When $\bigl| ( W \circ \nabla L )(p) \bigr| < \delta$, we have
$$\frac{\partial \, \mathrm{loss}(L)}{\partial L(p)} = \frac{2}{N} \bigl( L(p) - L_{\text{smooth}}(p) \bigr) + \frac{\alpha}{\delta} W(p)^{2} \, \Delta L(p) \tag{10}$$
and
$$\frac{\partial \, \mathrm{loss}(L)}{\partial W(p)} = \frac{\alpha}{\delta} W(p) \bigl( \nabla L(p) \bigr)^{2} \tag{11}$$
where $\Delta L(p) = \nabla_{x}^{2} L(p) + \nabla_{y}^{2} L(p)$. On the contrary, when $\bigl| ( W \circ \nabla L )(p) \bigr| \ge \delta$, we have
$$\frac{\partial \, \mathrm{loss}(L)}{\partial L(p)} = \frac{2}{N} \bigl( L(p) - L_{\text{smooth}}(p) \bigr) + \alpha \, | W(p) | \, \Delta L(p) \tag{12}$$
and
$$\frac{\partial \, \mathrm{loss}(L)}{\partial W(p)} = \alpha \, \mathrm{sign}\bigl( W(p) \bigr) \, \bigl| \nabla L(p) \bigr| \tag{13}$$
where sign(·) represents the sign function. If we adopt a neural network to estimate L , then we are allowed to update the model parameters with the back propagation algorithm. This can be accomplished with an open-source deep-learning platform such as PyTorch.

3.2. Introduction to the Architecture of Illumination Estimation Networks

This study designs an IEN with loss function (8), whose overall architecture is shown in Figure 3. The network comprises two branches, detailed as follows: First branch: Responsible for predicting the weight matrix and illumination. It contains 5 blocks (each with identical components but different convolutional filters, see Figure 4) and two heads. Both heads share the same architecture: a convolutional layer with one filter, followed by a sigmoid activation. Second branch: Focuses on generating smooth illumination, which involves initial illumination computation and the smooth module. The IEN can be implemented on any deep learning platform supporting convolution, pooling, batch normalization, and activation operations, with minimal code. Its training code relies on only a few Python libraries and totals fewer than 100 lines; the inference code is also simple. This simplicity is intended to facilitate the application of low-light vessel image augmentation.
Starting from the three-channel low-light image $I \in \mathbb{R}^{3 \times H \times W}$, the first branch is responsible for estimating a weight map $W \in \mathbb{R}^{1 \times H \times W}$ and an illumination map $L \in \mathbb{R}^{1 \times H \times W}$, while the second branch is responsible for offering the supervision for the estimated illumination.
The first branch. In each block of the first branch, the input goes through three paths, where each path consists of different convolution combos for extracting features at different scales. The features are then concatenated together along the channel dimension, and a 1 × 1 convolution is applied to reduce the channel dimension of the concatenated feature map. The block outputs the feature map after batch normalization and leaky ReLU activation. Two heads, the weight head and the illumination head, expect a three-channel full-resolution feature map as input and output a single-channel weight map $W \in \mathbb{R}^{1 \times H \times W}$ and illumination map $L \in \mathbb{R}^{1 \times H \times W}$, respectively.
The second branch. The second branch expects the three-channel low-light image $I \in \mathbb{R}^{3 \times H \times W}$ as input. It first computes the initial illumination of the input. After the image passes through the initial illumination computation and SmoothNet, we obtain a one-channel smooth illumination map $L_{\text{smooth}} \in \mathbb{R}^{1 \times H \times W}$. The output of the second branch is a smooth illumination that serves as the supervision for the output of the illumination head.
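The sketch below illustrates this structure for one block of the first branch and for the two heads; the kernel sizes of the three paths, the channel widths, and the head kernel size are not specified in the text and are therefore assumptions.

import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    # One block of the first branch: three parallel convolution paths with different
    # receptive fields, channel-wise concatenation, a 1x1 fusion convolution,
    # batch normalization, and leaky ReLU activation.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)  # reduce concatenated channels
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        feats = torch.cat([path(x) for path in self.paths], dim=1)
        return self.act(self.bn(self.fuse(feats)))

def make_head(in_ch):
    # Weight head / illumination head: a convolution with a single filter followed by a sigmoid.
    return nn.Sequential(nn.Conv2d(in_ch, 1, kernel_size=3, padding=1), nn.Sigmoid())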

3.3. Learnable Weight Matrix

Defining the weight matrix as a fixed constant yields a simple yet effective solution, but this handcrafted design lacks adaptability to varying image characteristics. To address this, we propose learning the weight matrix via the IEN: this learnable matrix modulates the contribution of individual pixels during illumination estimation, which suppresses textures and fine details in the illumination map and thus enhances structural smoothness.
As visualized in Figure 5c, the resulting weight matrix assigns similar values to perceptually consistent regions. Notably, its weight distribution resembles the semantic segmentation of the reflectance map (Figure 5b), indicating the matrix's ability to structurally partition the image based on reflectance properties. The individual contribution of the vessel-aware weight matrix to the performance gain can be inferred from two key observations. First, its weight distribution aligns with the semantic structure of vessel reflectance maps, meaning it selectively emphasizes vessel regions while suppressing background clutter; this directly reduces false identifications of non-vessel objects, and RetinaNet-r50-fpn-IEN achieves an FPR that is 1.1–1.8 percentage points lower than with DIE and LIME. Second, when comparing IEN with a hypothetical variant in which the learnable weight matrix is replaced by a fixed constant, the AP75 of RetinaNet would theoretically decrease. The small but consistent gap confirms the weight matrix's role in preserving vessel structural features, a critical factor for improving identification accuracy in low-light environments.

3.4. Reflectance Estimation

With IEN, we can reconstruct the reflectance of a low-light image—the core output of our augmentation process for maritime scenes. Specifically, given a low-light image I , its illumination L can be estimated by IEN. Then, the reflectance R of I can be estimated according to (1). However, some pixels of illumination L can be very close to zero when the pixel p is hidden in the dark. The direct augmentation from (1) is prone to noise. Therefore, we introduce a lower bound E for illumination. This allows the preservation of a small amount of illumination of the dark pixels. Then reflectance R is estimated by
$$R = I \oslash \max(L,\, E) \tag{14}$$
where $\max(L, E)$ returns the element-wise maximum between $L$ and $E$, $L$ is the estimated illumination map, and $E$ is the illumination lower bound. In this paper, we select $E = 10^{-3}$. Some final reflectance maps are shown in Figure 5b.
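In code, this clamped division is a one-liner; the sketch below assumes a (B, 3, H, W) image tensor and a (B, 1, H, W) illumination map produced by IEN.

import torch

def estimate_reflectance(img, illumination, eps=1e-3):
    # Reflectance estimation of Section 3.4: R = I / max(L, E) with E = 1e-3.
    L = torch.clamp(illumination, min=eps)   # lower-bound the illumination of dark pixels
    return img / L                           # broadcasting divides every color channel by L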

4. Experiment and Result Analysis

The primary objective of this experiment is to evaluate the effectiveness of vessel identification using images augmented by the IEN. The experimental workflow is as follows: First, low-light images are augmented via the IEN. Next, object identification networks (DETR [18], FRCNN [39], and RetinaNet [40]) are quantitatively evaluated on the augmented image, with results reported alongside a detailed comparison between the IEN-based unsupervised low-light image augmentation approach and other unsupervised methods. Additionally, the generalization of the IEN to open-world low-light image augmentation is assessed. Finally, visualization of key results and in-depth discussion are provided.

4.1. Dataset

The Sea Vessels dataset is a publicly available dataset for vessel target identification, containing 7000 vessel images from six categories of vessels [8]. The Sea Vessels 7000 dataset contains nearshore images with complex backgrounds. Unlike offshore images, vessel identification on nearshore images is more challenging. The dataset contains three sub-datasets: a training set of 1750 images, a validation set of 1750 images, and a testing set of 3500 images. The annotation distribution of each category in the dataset is presented in Table 1. Among the training set, fishing boats and ore carriers account for the highest proportions, which is consistent with the actual scenario of nearshore surveillance. The resolution of the images is 1920 × 1080. Vessel categories include ore carriers, bulk cargo carriers, container vessels, general cargo vessels, fishing boats, and passenger vessels. The annotation information of the three subsets is shown in Table 1. We build a low-light Sea Vessels 7000 dataset by reducing the illumination of the images in Sea Vessels 7000. The illumination reduction process is not arbitrary: it is calibrated against measurements of real nighttime maritime illumination, ensuring that the synthetic low-light degradation matches the optical properties of practical maritime surveillance scenarios. Additionally, the low-light Sea Vessels 7000 dataset retains all six vessel categories and the complex background elements of the original Sea Vessels 7000 dataset. This design preserves the scene diversity of real maritime low-light environments, and such representativeness ensures the dataset can effectively simulate the challenges of practical low-light vessel identification tasks. The training frequency of different vessel categories is illustrated in Figure 6. Among them, fishing boats (1188 samples) and ore carriers (1071 samples) have the largest numbers of test samples, which can fully verify the category adaptability of the model. For example, a low-light image is generated by attenuating the brightness of the original image with a weighting factor; the full synthesis formula is given below. In low-light Sea Vessels, the vessels in each image become almost invisible, which definitely increases the difficulty of vessel identification for an object identification model. An example image from low-light Sea Vessels 7000 is shown in Figure 6. We report the Common Objects in Context (COCO) identification metrics AP, AP50, and AP75 on the testing subset of low-light Sea Vessels 7000.
Regarding the synthesis method of the dataset used in the experiment, the low-light data are generated by performing low-light simulation on the Sea Vessels 7000 dataset, with the formula defined as follows:
$$I_{\text{low}} = \bigl( I_{\text{normal}} \bigr)^{\gamma} + \mathcal{N}\bigl( 0,\, \sigma^{2} \bigr) \tag{15}$$
In the formula, $I_{\text{normal}}$ denotes the original image under normal illumination in the Sea Vessels dataset, $I_{\text{low}}$ denotes the synthesized low-light image, $\gamma$ is the brightness attenuation coefficient with a value range of [3.0, 8.0], and $\sigma^{2}$ is the Gaussian noise variance with a value range of [0.01, 0.03]. After calculation, the pixel values are clipped to the range [0, 1] to ensure image validity. To ensure experimental reproducibility, a global random seed of 42 is set during dataset synthesis. Regarding the division into training/validation/test sets, this paper adopts a stratified random division strategy, strictly keeping the distribution ratio of each ship category in the different subsets consistent with the original Sea Vessels 7000 dataset. The division ratio of the training, validation, and test sets roughly follows 2:2:4, with total sample sizes of 1750, 1750, and 3500 images, respectively.
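A sketch of this synthesis pipeline is given below; how the attenuation coefficient is applied (here, as a gamma power on [0, 1]-normalized images) is an assumption made to match the stated parameter ranges, the clipping step, and the random seed.

import numpy as np

rng = np.random.default_rng(42)   # global random seed reported above

def synthesize_low_light(img, gamma_range=(3.0, 8.0), sigma2_range=(0.01, 0.03)):
    # img: normal-light image with values in [0, 1].
    gamma = rng.uniform(*gamma_range)                 # brightness attenuation coefficient
    sigma2 = rng.uniform(*sigma2_range)               # Gaussian noise variance
    low = img ** gamma + rng.normal(0.0, np.sqrt(sigma2), size=img.shape)
    return np.clip(low, 0.0, 1.0)                     # keep pixel values valid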

4.2. Training IEN

In this study, unsupervised learning is defined as a training paradigm in which the model is optimized solely based on the statistical characteristics and inherent patterns of the input data, without relying on manually annotated ground truth information. Guided by this definition, the illumination estimation network (IEN) is trained on low-light images paired with their self-estimated illumination components. Notably, the self-estimated illumination components are not manually annotated supervision signals but inherent illumination features of the low-light images adaptively derived via SmoothNet. It is critical to clarify that the illumination generated by SmoothNet is not 'self-generated pseudo-ground-truth', a misunderstanding that may arise from its role in IEN training. SmoothNet is an independent, physics-driven illumination preprocessing module designed based on the edge-preserving guided filter, and its output illumination adheres to two objective constraints. First, it is derived solely from the input low-light image: the initial illumination is computed directly from the pixel values of the low-light vessel image, with no reliance on IEN's parameters, outputs, or any model-dependent feedback. Second, it conforms to real-world illumination physics: the smoothing process of SmoothNet is calibrated to the piecewise smooth property of natural illumination, which ensures the illumination is not an arbitrary 'pseudo-label' but a physically interpretable prior. SmoothNet contains no learnable parameters and relies on local image self-similarity; its derivation uses only the internal information of the low-light images and requires no external manual intervention. Essentially, it functions as a self-supervisory signal extractor that aligns with the design logic of unsupervised training. Specifically, we trained IEN on low-light images and their self-estimated illumination components. The illuminations of the low-light images are computed by SmoothNet with size = 15 and delta = 0.05. The loss function for IEN training is given by (8) with α = 0.5. The training lasted for 500 epochs. We adopted SGD with a learning rate of 1 × 10−4 and a weight decay of 1 × 10−4 for updating the model parameters. The training was run on an RTX 2060 GPU (NVIDIA, Santa Clara, CA, USA) for computation acceleration with a batch size of 1. The training is performed in an end-to-end manner and accelerated with GPUs. Moreover, throughout the entire training process, the IEN strictly adhered to the unsupervised learning paradigm and did not use manually constructed ground-truth illumination as a reference.
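A minimal training-loop sketch under these settings is shown below; the IEN interface (returning the weight map and illumination), the data loader, and the ien_loss helper are assumptions made for illustration, while the optimizer, learning rate, weight decay, epochs, and batch size follow the values reported above.

import torch

device = torch.device("cuda")
model = IEN().to(device)                       # hypothetical IEN module returning (W, L)
smoother = SmoothNet(size=15, delta=0.05)      # Algorithm 1, with the reported hyperparameters
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-4)

for epoch in range(500):
    for img in loader:                                   # batch size 1, as reported
        img = img.to(device)
        l_init, _ = torch.max(img, dim=1, keepdim=True)  # initial illumination, Equation (3)
        l_smooth = smoother(img, l_init)                 # self-estimated supervision signal
        W, L = model(img)                                # weight matrix and illumination
        loss = ien_loss(L, l_smooth, W, alpha=0.5)       # loss (8) with alpha = 0.5
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()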

4.3. Comparison with Unsupervised Approach

This study compares the proposed Illumination Estimation Network (IEN) with two state-of-the-art unsupervised low-light image augmentation methods: Dark Image Enhancement (DIE) and Low-light Image Enhancement (LIME). Figure 7 presents the augmentation results of three representative low-light images (from the low-light Sea Vessels 7000 dataset) processed by these three methods, enabling direct visual comparison within the unsupervised low-light image augmentation paradigm. As shown, DIE-enhanced images maintain an overall dim appearance, and LIME-processed images exhibit unnatural artifacts in regions such as water and mountainous backgrounds (typical of nearshore maritime scenarios). In contrast, the IEN generates more visually plausible results for nearshore low-light vessel images, featuring a brighter sky, realistic water color, and natural background mountains, which are advantages specific to unsupervised low-light augmentation tasks. To ensure a fair comparison of the unsupervised methods, they were evaluated in the same hardware and software environment. In terms of hardware, the CPU was an Intel Core i7-10700K and the GPU was an RTX 2060 (8 GB of video memory); at the software level, this study is implemented on the PyTorch 1.9.0 framework. The model inference precision is uniformly set to single precision, and the inference batch size is 1 (single-image input). The timing standard measures only model inference time and does not include data transfer. Notably, deep learning-based supervised augmentation methods are excluded from this comparison. Most of these methods rely on paired low/normal-light images for training, while the IEN is an unsupervised approach that requires no manually annotated ground truth. Restricting the comparison to other unsupervised methods ensures a fair and consistent evaluation under the unsupervised low-light image augmentation paradigm, as supervised and unsupervised methods differ fundamentally in data requirements and training logic. Compared with the two unsupervised low-light image augmentation methods (DIE and LIME), the proposed IEN achieves superior performance in both speed and quality within the unsupervised low-light image augmentation paradigm and for nearshore maritime vessel scenarios. In terms of speed, IEN supports GPU parallel computation, reducing per-image latency to 0.18 s (meeting real-time needs for shore-based low-light vessel surveillance); in contrast, DIE and LIME cannot leverage GPU acceleration and take 79.14 s and 40.35 s, respectively, to process a 1920 × 1080 low-light vessel image. IEN's design is compatible with maritime edge devices: its lightweight architecture keeps the memory footprint low, while its parallelizable workflow adapts to edge hardware. Even on non-high-end edge devices, IEN's latency meets the practical timing needs of maritime surveillance, avoiding the high latency of methods such as DIE. IEN's memory usage is also compatible with edge devices: for edge GPUs, the inference memory usage of IEN is estimated not to exceed 256 MB per image, well within the 8 GB memory limit. This avoids memory overflow during IEN model runtime, a key requirement for offshore edge deployment. As shown in Table 2, IEN is 439 times and 224 times faster than DIE and LIME, respectively, enabling real-time augmentation for nearshore low-light vessel monitoring (a key demand in unsupervised scenarios), while DIE and LIME only support offline processing for such maritime low-light images.
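The per-image latency figures above follow the stated timing protocol (GPU inference only, batch size 1, excluding data transfer); a measurement sketch of that protocol is given below, with the warm-up and repetition counts chosen as assumptions.

import time
import torch

def time_inference(model, img, warmup=5, runs=20):
    # Average per-image inference latency on the GPU, excluding data transfer.
    img = img.cuda()                            # transfer before timing starts
    model = model.cuda().eval()
    with torch.no_grad():
        for _ in range(warmup):                 # warm-up iterations to stabilize GPU clocks
            model(img)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(img)
        torch.cuda.synchronize()                # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / runs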
In terms of quality: IEN’s weight matrix preserves vessel contours, leading to a 5–8% higher AP75 (a key metric for vessel identification) in low-light vessel identification tasks compared to DIE and LIME (Table 3)—a performance gain validated specifically under unsupervised augmentation settings. Subsequent results are visualized as follows: Figure 8 shows the trend of COCO evaluation metrics for the three unsupervised models on the low-light Sea Vessels 7000 dataset; Figure 9 analyzes the COCO metric values of these three unsupervised models on this dataset and compares them with values from models without low-light augmentation training, verifying the effectiveness of IEN as an unsupervised solution for low-light vessel image augmentation; Figure 10 presents the sum of COCO metrics for DETR, FRCNN, and RetinaNet (all trained on IEN-enhanced low-light vessel images) on the dataset, along with the percentage error of each model—further confirming IEN’s advantages in improving downstream vessel identification performance under the unsupervised low-light augmentation paradigm.
In terms of statistical significance testing, this article conducts a statistical significance test on IEN and DIE, using AP75 as the evaluation index and a paired t-test (degrees of freedom df = 2) to analyze the difference between the two. The results show that the t statistic is about 1.132, which is smaller in absolute value than the critical value of 4.303 (two-sided, α = 0.05), indicating that at this significance level there is no statistically significant difference in AP75 between IEN and DIE. A statistical significance test was also conducted on IEN and LIME, based on the AP75 index and a paired t-test (df = 2), and the calculated t statistic is approximately 1.033. This value is smaller than the critical value of 4.303, indicating that the difference in AP75 performance between IEN and LIME does not reach a statistically significant level.
A statistical significance test was likewise conducted on DIE and LIME by performing a paired t-test (df = 2) on their AP75 data. The results show that the t statistic is approximately −1.228. Since the absolute value of this statistic is still smaller than the critical value of 4.303, there is no statistically significant difference in AP75 between DIE and LIME.
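The paired tests above can be reproduced with SciPy as sketched below; the function operates on per-detector AP75 lists (three detectors give df = 2), and the critical value 4.303 is the two-sided threshold at α = 0.05 for df = 2.

from scipy import stats

def paired_ap75_test(ap75_a, ap75_b, critical=4.303):
    # Paired t-test on per-detector AP75 values; df = len(ap75_a) - 1.
    t_stat, p_value = stats.ttest_rel(ap75_a, ap75_b)
    significant = abs(t_stat) > critical
    return t_stat, p_value, significant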
Figure 10 presents the sum of three core COCO evaluation metrics (AP50-95, AP50, AP75) for three mainstream object detectors (DETR-r50, FRCNN-r50-fpn, RetinaNet-r50-fpn) on the low-light Sea Vessels 7000 test dataset, covering four input conditions: original low-light images (without augmentation), DIE-augmented images, LIME-augmented images, and IEN-augmented images. Results reveal clear and consistent trends that validate the effectiveness of the proposed IEN method. First, all three detectors achieve significantly higher metric sums when using augmented images compared to original low-light inputs, which confirms that low-light image augmentation is an indispensable step for improving vessel identification performance in dark environments. Second, among the three unsupervised augmentation methods, IEN outperforms DIE and LIME across all detectors: For RetinaNet-r50-fpn, the metric sum with IEN is 5.2 higher than that of LIME and 28.8 higher than that of DIE. Even for DETR-r50, which has a relatively lower baseline performance, IEN still lifts its metric sum by 34.0 compared to DIE, demonstrating IEN’s strong adaptability to different detection frameworks. Third, the consistent advantage of IEN in metric sums further proves that its augmentation effect can stably enhance the visibility of vessel features in low-light images, providing more reliable input for downstream identification models and thus achieving better comprehensive performance.

4.4. Vessel Identification on Augmented Images

During training, the DETR model was optimized using Adam with an initial learning rate of 1 × 10−4 for the transformer and 1 × 10−5 for the backbone, alongside a weight decay of 1 × 10−4. In contrast, both FRCNN and RetinaNet were trained with SGD, using a learning rate of 1 × 10−4, a momentum of 0.9, and a weight decay of 1 × 10−4. All identification models underwent 50 epochs of training on a laptop equipped with an RTX 2060 GPU, with a batch size of 1. For data augmentation, input images were rescaled such that the shorter side ranged between 480 and 800 pixels, while the longer side did not exceed 1333 pixels. Random horizontal flipping was also applied during training, consistent with the augmentation strategy used in the official implementations. Each model utilized a ResNet-50 backbone pretrained on ImageNet [41]. The selection of ResNet-50 as the unified backbone across detectors is not arbitrary but designed to isolate the impact of low-light augmentation on identification performance, a core objective of this study. ResNet-50 is a widely adopted backbone in maritime object identification due to its balanced capacity for feature extraction and computational efficiency. By using the same backbone, we eliminate confounding variables from architecture-specific differences and ensure that observed performance gains are directly attributed to IEN's augmentation effect rather than backbone-related feature extraction capabilities. This controlled design aligns with the study's focus on validating the effectiveness of low-light augmentation for vessel identification, rather than comparing detector backbones. Model variants are denoted with the suffix "r50": for instance, DETR with ResNet-50 is referred to as DETR-r50, while FRCNN and RetinaNet equipped with ResNet-50 and FPN are designated as Faster-RCNN-r50-FPN and RetinaNet-r50-FPN, respectively. When an object identification model is integrated with the IEN module, the model name is appended with "-IEN", e.g., DETR-r50-IEN. During inference, performance was evaluated on the test subset of the low-light Sea Vessels 7000 dataset, which comprises 3500 images.
We report the experimental results in terms of the COCO identification metrics, including AP, AP50, and AP75, on the images augmented by DIE, LIME, and IEN [42,43,44]. Moreover, seconds per image is adopted to evaluate the identification speed of an object identification network. The identification results of the various models are presented in Table 3. It can be observed that after integrating IEN, the AP50-95 of DETR-r50 increases from 25.3 to 49.8, and the AP75 of RetinaNet-r50-fpn reaches 63.4. The experimental results indicate that IEN plays a positive role in low-light vessel identification. Specifically, IEN increases the identification performance for DETR and Faster RCNN, as shown in Table 3. This demonstrates that IEN improves the identification performance of an object identification neural network.
We connect an object identification network, such as Faster RCNN, RetinaNet, and DETR, to IEN to build a new identification network called IEN-identification. In the new identification model, IEN serves as an image augmentation network, while the downstream object identification network expects the outputs of IEN as inputs. This indicates that IEN is compatible with different types of object identification networks, such as Faster RCNN, RetinaNet, and DETR. More importantly, traditional object identification networks are allowed to work in dark environments after connecting to IEN sequentially, which extends the application scenarios of traditional object identification networks. IEN can be a plug-and-play component of an object identification model and trained separately. In other words, we are allowed to train IEN in advance. Then, when we connect IEN to an object identification network to build a new identification network, we only need to train the object identification network while freezing the parameters of IEN. This significantly improves model training efficiency and saves hardware costs. The IEN model designed in this paper achieves improvements in both identification precision and recall compared to the original low-light images, while exhibiting a typical trade-off relationship. For example, when the AP75 of RetinaNet increases by 6.6 percentage points, the precision ranges from 60% to 65% when the recall is no less than 90%, and the recall is approximately above 70% when the precision is no less than 80%. Compared with DIE and LIME, the IEN achieves a 5–8% higher precision at the same recall and a 3–5% higher recall at the same precision, demonstrating superior trade-off performance. This is attributed to the weight matrix of IEN, which can better reduce false identifications caused by background interference. In terms of the false positive rate (FPR), based on the improvement amplitude of AP75, the FPR of each detector enhanced by IEN is generally low, approaching 8% in complex nearshore backgrounds and approximately 5–6% in simple backgrounds. Regarding robustness, the performance of the detector enhanced by IEN in extremely low-light environments degrades by 10–15% compared to normal low-light environments, but it is still significantly superior to other methods.
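A composition sketch of this plug-and-play usage is shown below; the IEN output signature and the detector call interface are assumptions made for illustration, and only the detector's parameters remain trainable.

import torch
import torch.nn as nn

class IENDetector(nn.Module):
    # A pre-trained, frozen IEN followed by an arbitrary object identification network.
    def __init__(self, ien, detector, eps=1e-3):
        super().__init__()
        self.ien = ien
        self.detector = detector
        self.eps = eps
        for p in self.ien.parameters():          # freeze IEN; only the detector is trained
            p.requires_grad = False

    def forward(self, images):
        with torch.no_grad():
            _, L = self.ien(images)                              # estimated illumination, (B, 1, H, W)
            enhanced = images / torch.clamp(L, min=self.eps)     # reflectance used as detector input
        return self.detector(enhanced)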

4.5. Generalization to Open-World Low-Light Images

To further validate the practical value and robustness of the proposed method, we extended the evaluation to real-world low-light images, which involve inherent dark noise and complex nighttime environments. IEN was trained on a closed set of low-light images; to investigate its generalization to open-world scenarios, we feed real-world low-light maritime images (e.g., scenes of LNG carriers) into IEN. The augmentation results are shown in Figure 11: (a) shows the original low-light images, in which targets are barely visible; (b) shows the illumination maps estimated by IEN; and (c) presents the reflectance maps recovered by IEN. In the original low-light maritime images, vessel details and structural features are barely visible due to severe darkness. In contrast, the reflectance maps predicted by IEN show substantial augmentation: vessel bodies and background details become clearly distinguishable, approaching daytime visual quality. This indicates that IEN generalizes effectively to real-world low-light maritime image recovery, even in noise-dominated dark environments. The results also reveal two key points. First, low-light vessel images share the same intrinsic properties as open-world dark images. Second, IEN can serve as a data preprocessing network connected to an object identification network, so that the detector can perform not only vessel identification but also general object identification in dark environments, improving the generalization of detectors for open-world identification in the dark. Despite these achievements, the method still has limitations in suppressing dark noise. While IEN performs well in most scenarios, minor artifacts may occur in extremely low-light scenes with dense vessels, and in regions with heavy fog the noise suppression may slightly blur fine vessel details, although these effects do not significantly affect overall identification, as AP75 remains above 48% in such cases. For example, in extremely low-light scenes with very high dark-noise density and dense ships, there is still room for improvement in balancing detail preservation and noise suppression, and some areas may suffer from detail blurring or residual noise. Future work will therefore focus on finer suppression of dark noise in complex real-night scenes to enhance robustness in extreme low-light maritime environments.
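Under the Retinex model used throughout this paper, the reflectance image shown in Figure 11c is obtained by dividing the input by the estimated illumination. The helper below is a minimal sketch of that step, assuming image and illumination tensors normalized to [0, 1]; the clamping and epsilon are illustrative safeguards rather than IEN's exact post-processing.

```python
import torch

def recover_reflectance(image: torch.Tensor, illumination: torch.Tensor,
                        eps: float = 1e-3) -> torch.Tensor:
    """Retinex-style recovery: given a low-light image I and an illumination map L
    estimated by IEN, the reflectance is R = I / L. The epsilon and clamping are
    assumed safeguards, not necessarily those used in the original implementation."""
    illumination = illumination.clamp(min=eps)   # avoid division by zero in dark pixels
    reflectance = image / illumination
    return reflectance.clamp(0.0, 1.0)           # keep values in the physically valid range
```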

4.6. Verification of the Physical Rationality of Augmented Images

To verify the photometric consistency of the augmented images, the illumination gradient consistency index was adopted. Specifically, 500 representative images were selected from the low-light Sea Vessels dataset, and the index was computed for the original low-light images, the IEN-augmented images, and real normal-light maritime images. The average illumination gradient consistency of the IEN-augmented images is 0.82, differing by less than 0.03 from that of the real normal-light images (average 0.85). This indicates that the illumination variation in the augmented images conforms to the photometric behavior of real maritime scenes. To verify the physical rationality of the reflectance values, note that under the Retinex theory the valid range of reflectance is [0, 1]. A pixel-wise check was performed on the reflectance matrices of the IEN-augmented images: more than 99.8% of pixel values fall within [0, 1], and only 0.17% lie outside this range. The out-of-range pixels are concentrated mainly in edge-noise regions of the images, confirming that the reflectance generated by IEN meets the requirement of physical rationality.
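The pixel-wise range check described above can be expressed as a simple statistic. The function below is a minimal sketch under the assumption that reflectance maps are available as tensors; the exact counting convention behind the reported 99.8% figure may differ in detail.

```python
import torch

def reflectance_validity_ratio(reflectance: torch.Tensor) -> float:
    """Fraction of pixels whose reflectance lies in the physically valid range
    [0, 1]. A sketch of the pixel-wise verification described in the text."""
    in_range = (reflectance >= 0.0) & (reflectance <= 1.0)
    return in_range.float().mean().item()

# A ratio above 0.998 would be consistent with the statistic reported above.
```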

5. Discussion

If we regard IEN as a data preprocessing approach, it differs fundamentally from existing data preprocessing approaches. The first difference is the intention. Data augmentation is crucial for low-light vessel identification for three reasons: first, the dataset contains limited and imbalanced samples, is small-scale in specific maritime scenarios, and is therefore prone to over-fitting; second, the original dataset cannot cover the complex illumination variations of real low-light environments, so augmentation is needed to simulate such variability; third, augmentation is essential for improving cross-view generalization and meeting the robustness requirements of varied vessel perspectives and scenes. Against this background, IEN addresses whether object identification in dark environments is possible at all, whereas existing preprocessing approaches aim to help the detector learn robust image features through resizing, flipping, and rotating images. Although their intentions differ, the two are complementary: IEN and conventional preprocessing can be used together to improve a detector's identification performance. Notably, as indicated by the classic literature [45], image augmentation techniques significantly boost the generalization and robustness of visual tasks. Unlike supervised low-light augmentation methods that rely on rarely available low-light/normal-light paired data, IEN is unsupervised. This makes it well suited to niche maritime datasets (such as the low-light vessel dataset in this paper) with scarce annotations or paired data, demonstrating the advantages of unsupervised methods in few-label tasks and aligning with the core ideas of [46].
The second difference lies in the technique: existing data preprocessing approaches are non-learnable, whereas IEN is a learnable network component of the object identification system that can be trained end-to-end with the detector, enabling collaborative adaptation to the inputs. Regarding the use of a unified ResNet-50 backbone, it is important to clarify the scope of this study: the goal is to verify whether low-light augmentation can improve vessel identification, not to explore backbone–architecture interactions. The unified backbone ensures a fair comparison of IEN's effectiveness across detectors, and the consistent gains across DETR, FRCNN, and RetinaNet provide indirect evidence of IEN's compatibility with other backbones. ResNet-50's moderate depth and residual design are representative of mainstream backbones; if IEN improves feature quality for ResNet-50, it is reasonable to expect similar effects with other backbones.
This study has two limitations: (1) IEN's augmentation performance declines slightly in extremely low-light scenarios (illumination below 5 lux) owing to insufficient suppression of dark-region noise; (2) its weight matrix requires vessel-category-specific parameter fine-tuning, limiting adaptability to unknown vessel types. Future work will introduce a noise-aware module and adaptive parameter initialization to improve generality.
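As noted above, IEN and conventional non-learnable preprocessing are complementary. The sketch below shows one way the two could be chained in a data pipeline; ien_augment is a hypothetical callable wrapping the trained IEN, and the resize/flip settings mirror the training configuration reported earlier.

```python
import torchvision.transforms as T

# Illustrative combination of learnable (IEN) and non-learnable preprocessing.
# `ien_augment` is a hypothetical callable that takes a PIL image and returns
# its IEN-augmented counterpart; it is not part of any released library.
def build_lowlight_pipeline(ien_augment):
    conventional = T.Compose([
        T.RandomHorizontalFlip(p=0.5),
        T.Resize(800, max_size=1333),   # shorter side 800, longer side capped at 1333
    ])

    def pipeline(pil_image):
        augmented = ien_augment(pil_image)   # learnable step: restores visibility in the dark
        return conventional(augmented)       # non-learnable step: improves feature robustness

    return pipeline
```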
At the same time, the IEN designed in this paper mitigates the risk of over-fitting through both its core design and experimental verification: the unsupervised training paradigm removes dependence on a single dataset, and the model can adapt to the low-light degradation found in other scenarios, which avoids over-fitting to a single dataset at the mechanism level. No signs of over-fitting were observed during training: the small performance gap between the training and validation sets and the stable convergence of the loss curve indicate that the model generalizes well within the dataset.

6. Conclusions

In this paper, we propose a novel unsupervised low-light vessel image augmentation approach based on Retinex theory, designed to improve low-light vessel image quality and support subsequent identification. Its core is an illumination estimation module integrating a vessel-aware weight matrix and SmoothNet, which preserves vessel contours accurately. With GPU parallel acceleration, the augmentation time for a 1920 × 1080 image is reduced to 0.26 s (Table 2), meeting real-time surveillance needs.
Compared with traditional unsupervised approaches (DIE, LIME), IEN has a significant speed advantage, enabling real-time low-light vessel identification, unlike the offline-only DIE and LIME. Unlike supervised approaches that rely on low-/normal-light image pairs, IEN is unsupervised, eliminating the dependence on normal-light references and reducing the training data burden. Experiments on the self-constructed low-light Sea Vessels 7000 dataset show that all tested detectors (DETR, FRCNN, RetinaNet) achieve satisfactory results on the augmented images, with RetinaNet's AP75 rising by 6.6 percentage points (from 56.8 to 63.4). Open-world tests confirm that IEN, although designed for vessel scenarios, generalizes to general-purpose low-light augmentation, supporting practical maritime tasks such as nearshore surveillance and unmanned vessel inspection.

Author Contributions

Conceptualization, J.L. and Z.L.; methodology, J.L.; software, J.L. and Z.L.; validation, J.L. and M.J.; formal analysis, C.J.; investigation, J.L.; resources, Z.L. and J.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L.; visualization, M.J.; supervision, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China "Collaborative Observation and Application Technology for Maritime Targets Under Complex Sea Conditions" (No. 2024YFB3908800), the National Natural Science Foundation of China (No. 52301410), the Dalian High-Level Talent Innovation Program (No. 2024RQ017), the Basic Research Operating Expenses Project of the Water Transport Science Research Institute of the Ministry of Transport (Nos. WTI182415, WTI182402, WTI182513), and the Fundamental Research Funds for the Central Universities (No. 3132025107).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gao, Z.; Dai, J.; Xie, C. Dim and small target detection based on feature mapping neural networks. J. Vis. Commun. Image Represent. 2019, 62, 206–216. [Google Scholar] [CrossRef]
  2. Zhang, Y.; Li, Q.; Zhang, F. Vessel detection for visual maritime surveillance from non-stationary platforms. Ocean. Eng. 2017, 141, 53–63. [Google Scholar] [CrossRef]
  3. Wang, L.; Fan, S.; Liu, Y.; Li, Y.; Fei, C.; Liu, J.; Liu, B.; Dong, Y.; Liu, Z.; Zhao, X. A review of methods for vessel detection with electro-optical images in marine environments. J. Mar. Sci. Eng. 2021, 9, 1408. [Google Scholar] [CrossRef]
  4. Er, M.J.; Zhang, Y.N.; Chen, J.; Gao, W.X. Vessel detection with deep learning: A survey. Artif. Intell. Rev. 2023, 56, 11825–11865. [Google Scholar] [CrossRef]
  5. Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. SeaVessels: A large-scale precisely annotated dataset for vessel detection. IEEE Trans. Multimed. 2018, 20, 2593–2604. [Google Scholar] [CrossRef]
  6. Prasad, D.K.; Rajan, D.; Rachmawati, L.; Rajabally, E.; Quek, C. Video processing from electro-optical sensors for object detection and tracking in a maritime environment: A survey. IEEE Trans. Intell. Transp. Syst. 2018, 8, 1993–2016. [Google Scholar] [CrossRef]
  7. Shao, Z.; Zhang, X.; Zhang, T.; Xu, X.; Zeng, T. RBFA-Net: A rotated balanced feature-aligned network for rotated SAR vessel detection and classification. Remote Sens. 2022, 14, 3345. [Google Scholar] [CrossRef]
  8. Zhang, T.; Wang, X.Y.; Zhou, S.L.; Wang, Y.Q.; Hou, Y. Arbitrary-oriented vessel detection through center-head point extraction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5612414. [Google Scholar]
  9. Liu, R.W.; Yuan, W.Q.; Chen, X.Q.; Lu, Y.X. An augmented CNN-enabled learning method for promoting vessel detection in maritime surveillance system. Ocean. Eng. 2021, 235, 109435. [Google Scholar] [CrossRef]
  10. Cheng, S.X.; Zhu, Y.S.; Wu, S.H. Deep learning based efficient vessel detection from drone-captured images for maritime surveillance. Ocean. Eng. 2023, 285, 115440. [Google Scholar] [CrossRef]
  11. Shan, Y.; Zhou, X.; Liu, S.; Zhang, Y.; Huang, K. SiamFPN: A deep learning method for accurate and real-time maritime vessel tracking. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 315–325. [Google Scholar] [CrossRef]
  12. Li, H.; Deng, L.; Yang, C.; Liu, J.; Gu, Z. Augmented YOLO v3 tiny network for real-time vessel detection from visual images. IEEE Access. 2021, 9, 16692–16706. [Google Scholar] [CrossRef]
  13. Kim, J.H.; Kim, N.; Park, Y.W.; Won, C.S. Object detection and classification based on YOLO-V5 with improved maritime dataset. J. Mar. Sci. Eng. 2022, 3, 377. [Google Scholar] [CrossRef]
  14. Shao, Z.F.; Wang, L.G.; Wang, Z.Y.; Du, W.; Wu, W.J. Saliency-aware convolution neural network for vessel detection in surveillance video. IEEE Trans. Circuits Syst. Video Technol. 2020, 3, 781–794. [Google Scholar] [CrossRef]
  15. Xing, Z.; Ren, J.; Fan, X.; Zhang, Y. S-DETR: A Transformer model for real-time detection of marine vessels. J. Mar. Sci. Eng. 2023, 4, 696. [Google Scholar] [CrossRef]
  16. Shi, H.; Chai, B.; Wang, Y.; Chen, L. A local-sparse-information aggregation transformer with explicit contour guidance for SAR vessel detection. Remote Sens. 2022, 20, 5247. [Google Scholar] [CrossRef]
  17. Nie, X.; Yang, M.F.; Liu, R.W. Deep neural network-based robust vessel detection under different weather conditions. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 47–52. [Google Scholar]
  18. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  19. Vijayakumar, A.; Vairavasundaram, S. Yolo-based object detection models: A review and its applications. Multimed. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]
  20. Chen, P.D.; Zhang, J.; Gao, Y.B.; Fang, Z.J.; Hwang, J. A lightweight RGB superposition effect adjustment network for low-light image enhancement and denoising. Eng. Appl. Artif. Intell. 2024, 127, 107234. [Google Scholar] [CrossRef]
  21. Im, S.J.; Yun, C.; Lee, S.J.; Park, K.R. Artificial Intelligence-Based Low-light Marine Image Enhancement for Semantic Segmentation in Edge Intelligence Empowered Internet of Things Environment. IEEE Internet Things J. 2024, 12, 4086–4114. [Google Scholar]
  22. Wang, Q.; Zhang, Q.; Wang, Y.; Gou, S. Event-triggered adaptive finite time trajectory tracking control for an underactuated vessel considering unknown time-varying disturbances. Transp. Saf. Environ. 2023, 5, tdac078. [Google Scholar] [CrossRef]
  23. Goncharov, V.K.; Klementieva, N.Y. Problem statement on the vessel braking within ice channel. Transp. Saf. Environ. 2021, 3, 50–56. [Google Scholar] [CrossRef]
  24. Chen, X.; Wu, H.; Han, B.; Liu, W.; Montewka, B.; Liu, R.W. Orientation-aware ship detection via a rotation feature decoupling supported deep learning approach. Eng. Appl. Artif. Intell. 2023, 125, 106686. [Google Scholar] [CrossRef]
  25. Land, E.H. The Retinex theory of color vision. Sci. Am. 1977, 237, 108–129. [Google Scholar] [CrossRef] [PubMed]
  26. Land, E.H. An alternative technique for the computation of the designator in the Retinex theory of color vision. Proc. Natl. Acad. Sci. USA 1986, 83, 3078–3080. [Google Scholar] [CrossRef]
  27. Jobson, D.J.; Rahman, Z.U.; Woodell, G.A. Properties and performance of a center/surround Retinex. IEEE Trans. Image Process. 1997, 6, 115–121. [Google Scholar] [CrossRef]
  28. Rahman, Z.U.; Jobson, D.J.; Woodell, G.A. Multi-scale Retinex for color image enhancement. In Proceedings of the International Conference on Image Processing, Pohang, Republic of Korea, 16–19 September 1996; Volume 3, pp. 1003–1006. [Google Scholar]
  29. Jobson, D.J.; Rahman, Z.U.; Woodell, G.A. A multiscale Retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef]
  30. Zhang, Q.; Nie, Y.W.; Zheng, W.S. Dual illumination estimation for robust exposure correction. Comput. Graph. Forum. 2019, 38, 243–252. [Google Scholar] [CrossRef]
  31. Guo, X.J.; Li, Y.; Ling, H.B. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 2017, 26, 982–993. [Google Scholar] [CrossRef]
  32. Lore, K.G.; Akintayo, A.; Sarkar, S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognit. 2017, 61, 650–662. [Google Scholar] [CrossRef]
  33. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex decomposition for low-light enhancement. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018. [Google Scholar]
  34. Li, C.Y.; Guo, J.C.; Porikli, F.; Pang, Y. LightenNet: A convolutional neural network for weakly illuminated image enhancement. Pattern Recognit. Lett. 2018, 104, 15–22. [Google Scholar] [CrossRef]
  35. Wang, Y.; Cao, Y.; Zha, Z.J.; Zhang, J.; Xiong, Z.W.; Zhang, W.; Wu, F. Progressive Retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement. In Proceedings of the 27th ACM International Conference on Multimedia (MM), Nice, France, 15 October 2019; pp. 2015–2023. [Google Scholar]
  36. Wu, Y.H.; Pan, C.; Wang, G.Q.; Yang, Y.; Wei, J.W.; Li, C.Y.; Shen, H.T. Learning semantic-aware knowledge guidance for low-light image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 1662–1671. [Google Scholar]
  37. Fu, Z.Q.; Yang, Y.; Tu, X.T.; Huang, Y.; Ding, X.H.; Ma, K.K. Learning a simple low-light image enhancer from paired low-light instances. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 1662–1671. [Google Scholar]
  38. He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409. [Google Scholar] [CrossRef]
  39. Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  40. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.M.; Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
  41. He, K.M.; Zhang, X.G.; Ren, S.Q.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
  42. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
  43. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  44. Tian, Z.; Shen, C.H.; Chen, H. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October 2019; pp. 9626–9635. [Google Scholar]
  45. Wang, S. Effectiveness of traditional augmentation methods for rebar counting using UAV imagery with Faster R-CNN and YOLOv10-based transformer architectures. Sci. Rep. 2025, 15, 33702. [Google Scholar] [CrossRef]
  46. Han, J.; Kim, J.; Kim, S.; Wang, S. Effectiveness of image augmentation techniques on detection of building characteristics from street view images using deep learning. J. Constr. Eng. Manag. 2024, 150, 04024129. [Google Scholar] [CrossRef]
Figure 1. The vessel augmentation pipeline in low-light environments.
Figure 2. Illumination smoothness by SmoothNet. (a) Low-light image. (b) Initial illumination map. (c) Smooth illumination map.
Figure 3. The architecture of illumination estimation network.
Figure 4. The architecture of each block.
Figure 5. Visualization of weight maps. (a) Low-light image. (b) Reflectance maps. (c) Weight maps.
Figure 6. Training frequency for different types of vessels.
Figure 7. Dark image augmentation comparison. (a) Low-light image. (b) DIE. (c) LIME. (d) Our model.
Figure 8. Trend changes in COCO evaluation metrics on low-light Sea Vessels dataset under different models.
Figure 9. Comparison results of three different models with the original model: (a) DETR-r50; (b) FRCNN-r50; (c) RetinaNet-r50.
Figure 10. Comparison of the sum of COCO evaluation metrics on the low-light Sea Vessels dataset using DETR, FRCNN, and RetinaNet.
Figure 11. (a) Open-world low-light image. (b) Illumination maps. (c) Reflectance maps.
Table 1. Annotation information in dataset.
Category Training Set Validation Set Testing Set
ore carrier 542 586 1071
bulk cargo carrier 489 468 437
container vessel 245 219 744
general cargo vessel 393 368 995
fishing boat 510 492 1188
passenger vessel 100 126 248
total 2279 2259 4683
Table 2. Comparison of augmentation time between IEN, DIE, and LIME.
Approach CPU/GPU Resolution Mean (s) Standard Deviation (s)
DIE CPU 1920 × 1080 78.54 1.23
LIME CPU 1920 × 1080 40.21 0.87
IEN CPU 1920 × 1080 10.68 0.34
IEN GPU 1920 × 1080 0.26 0.02
Table 3. COCO evaluation metrics on low-light Sea Vessels dataset.
Model Epochs FPS Trainable Params AP50-95 AP50 AP75
DETR-r50 50 6 41280780 25.3 55.2 22.1
DETR-r50-DIE 50 6 41280780 39.5 74.4 37.8
DETR-r50-LIME 50 6 41280780 45.6 81.0 46.1
DETR-r50-IEN 50 6 41280780 49.8 81.0 54.9
FRCNN-r50 50 6 41102386 28.5 60.3 25.7
FRCNN-r50-fpn-DIE 50 6 41102386 41.0 78.4 37.7
FRCNN-r50-fpn-LIME 50 6 41102386 40.9 78.5 37.6
FRCNN-r50-fpn-IEN 50 6 41102386 41.1 78.7 37.5
Retinanet-r50 50 6 32050019 35.2 70.5 32.8
Retinanet-r50-fpn-DIE 50 6 32050019 55.5 88.9 60.4
Retinanet-r50-fpn-LIME 50 6 32050019 56.4 89.7 62.8
Retinanet-r50-fpn-IEN 50 6 32050019 56.8 90.5 63.4