Multi-Feature Fusion-Guided Low-Visibility Image Enhancement for Maritime Surveillance

Abstract: Low-visibility maritime image enhancement is essential for maritime surveillance in extreme weather. However, traditional methods merely optimize contrast while ignoring image features and color recovery, which leads to subpar enhancement outcomes. Most learning-based methods attempt to improve low-visibility images using only local features extracted from convolutional layers, which significantly improves performance but still falls short of fully resolving these issues. Furthermore, CNN-based methods tend to sacrifice computational efficiency for larger receptive fields and better enhancement. In this paper, we propose a multiple-feature fusion-guided low-visibility enhancement network (MFF-Net) for real-time maritime surveillance, which extracts global and local features simultaneously to guide the reconstruction of the low-visibility image. Quantitative and visual experiments on both standard and maritime-related datasets demonstrate that MFF-Net provides superior enhancement with noise reduction and color restoration at a fast computational speed. Furthermore, an object detection experiment indicates practical benefits for maritime surveillance.


Introduction
With the growth of the Internet of Things and artificial intelligence, maritime sensors have been employed for different tasks in ocean engineering, e.g., vessel trajectory prediction [1,2] and maritime surveillance [3]. In particular, visual sensors are widely used because of their intuitiveness and timeliness [4]. However, imaging devices working in extremely low-visibility conditions, typically low light and haze, generate severely distorted images [5,6] that suffer from low contrast, non-uniform noise, and detail loss, as shown in Figure 1. This degradation makes it difficult to analyze critical information in the image and hampers subsequent tasks [7]. For instance, low visibility has been shown to reduce the precision of object detection [8–10] and image semantic segmentation [11,12]. Therefore, an effective, real-time method for low-visibility image enhancement is required in domains such as visual navigation [13] and maritime management [14].
Over the past several decades, many researchers have attempted to improve extremely low-visibility images with both hardware- and software-enabled methods. The former increase the robustness of visual sensors by applying extra artificial light sources, such as infrared and ultraviolet flashes [15], while the latter are more popular [16] due to their relatively low cost. Specifically, some traditional software-enabled methods employ physical models and prior knowledge [15,17]; they enhance visibility but cause severe detail loss and fail to suppress noise effectively. The convolutional neural network (CNN) has become increasingly popular in recent years for enhancement tasks [18]. The learnable convolutional kernel parameters enable CNNs to eliminate noise interference while enhancing the image [19]. However, the features extracted by convolutional layers are local, which works poorly for patches with non-uniform illumination, and the translation invariance of the CNN is incompatible with the non-linear relationship between the object and the background, which causes vague edges in enhanced images. Furthermore, enlarging the receptive field of the convolutional kernels for better feature extraction requires deeper networks, so the computational complexity gradually increases [20].

Motivation
For convolutional layers, the critical mechanism is to learn a convolution kernel with fixed parameters and apply the same transformation across the entire feature map. The size and stride of the convolution kernels only change the scope of action. Translation invariance is an important property of the convolutional layer, but it also makes it difficult for the CNN to extract non-local features [21].
Meanwhile, the spatial attention mechanism is widely employed in computer vision tasks [22]. Unlike convolution, the receptive field of the spatial attention mechanism is larger and more diverse, so it can extract global features from the feature map and overcome the limitation of local features. However, an image contains far more pixels than a passage of text contains words, which requires more parameters to learn. In 2019, Huang et al. [23] proposed the criss-cross attention mechanism, which extracts contextual information from full-image dependencies with competitive computational efficiency.
To let comprehensive information guide the enhancement process, we propose the multi-feature fusion-guided network. Specifically, inspired by [23], we employ densely connected convolution layers and the cross attention module for local and global feature extraction, respectively, and fuse their outputs to form the general feature map, which helps the network enhance the low-visibility image with more detail preservation and better color recovery.

Contribution
In this paper, we present a multi-feature fusion-guided low-visibility image enhancement network (MFF-Net) for maritime surveillance advancement. It achieves a superior balance between the enhancement effect and computational time. The main contributions of our method are summarized as follows:

• We propose a multiple feature fusion-guided low-visibility image enhancement method for maritime surveillance advancement. It extracts the features of the image and reconstructs it under the supervision of a joint loss function that measures both the Euclidean distance and the angle difference between the output and the ground truth. The proposed network tackles two typical low-visibility problems, i.e., low light and haze, with the same framework.

• To overcome the limitation caused by the translation invariance of the CNN, we design a two-branched global and local feature extraction block (GL-Block) comprising cross attention modules and densely connected residual convolutional layers. The output feature maps are then fused to guide the enhancement process with more comprehensive information.

• Extensive experimental results show that our MFF-Net enhances both low-light and hazy maritime images with significant noise reduction and detail preservation, outperforming other competitive methods. Furthermore, we evaluate the computational complexity of the MFF-Net. The results indicate a favorable balance between enhancement quality and speed.

Organization
The rest of this paper is organized as follows: Section 2 reviews previous research on low-visibility image enhancement tasks. Section 3 introduces the proposed method and the detailed design of each component. Section 4 presents the experimental results compared with state-of-the-art methods in terms of both enhancement performance and running time cost. In addition, the ablation study investigates the necessity of the multi-feature fusion guidance for low-visibility image enhancement and the rationality of the weight settings of the joint loss function. The experiment on vessel detection demonstrates the practical benefits of our method. Section 5 summarizes the content of the paper and discusses future work.

Related Work
Low light and haze are the most common low-visibility conditions in maritime surveillance. Many studies have been proposed to overcome these problems [24]. In this section, we briefly review related work on low-light image enhancement and image dehazing, which can be generally classified into traditional and deep learning-based methods.

Low-Light Image Enhancement Methods
Low-light image enhancement methods can generally be divided into mathematical model- and deep learning-based methods. Mathematical model-based methods include well-known techniques such as histogram equalization (HE) [17], gamma correction (GC) [25], and Retinex theory [26]. HE enhances contrast by spreading out the most frequent intensity values. However, in practical applications, HE and its variants [27,28] are severely hampered by non-uniform noise. GC increases the intensity of each pixel with a power function, which is also effective for contrast enhancement. However, it ignores the correlation between adjacent pixels, resulting in artifacts and amplified noise. Retinex theory is based on the retinal-imaging concept that decomposes images into illumination and reflection maps. It was first utilized in 1997 to lighten low-light images [26,29]. Many Retinex-based methods were proposed in subsequent years. For instance, Wang et al. [30] proposed an enhancement method specially designed for non-uniform illumination, and Guo et al. [31] proposed low-light image enhancement via illumination map estimation (LIME), which achieved competitive performance in low-light image enhancement. However, mathematical model-based methods generally use specific functions to estimate the noise and illumination, which are non-uniform and difficult to express as a specific equation. Therefore, the results often suffer from severe color distortion. Noise interference is also a thorny problem for traditional mathematical model-based methods.
With the rapid advancement of computing devices, deep learning-based methods have produced outstanding results in low-light image enhancement. In 2018, Chen et al. proposed an end-to-end network trained using extremely low-light raw sensor data [32], which demonstrated the superior performance of neural networks in low-light image enhancement tasks. In the following years, a number of works were published on low-light enhancement [33,34]. For instance, KinD [35] proposed a CNN based on Retinex theory, which successfully combined the mathematical model and the neural network. Zero [36] formulated light enhancement as a task of image-specific curve estimation, which enhanced low-light images with a lightweight neural network. Hap et al. [37] proposed a low-light image enhancement method that decouples the model into two sequential stages, which improve the scene visibility and suppress the remaining degradation factors separately. Guo et al. [38] designed a multi-scale deep stacking fusion enhancer to lighten the darkness in an intelligent transportation system. LLFlow [39] proposed a normalizing flow model to establish the relationship between a single low-light image and different normally exposed images. However, most deep learning methods suffer from several thorny problems, such as color distortion and detail loss, which are difficult to solve simultaneously with a lightweight CNN.
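As a minimal sketch of gamma correction as described above (a power-law curve applied independently to each pixel), assuming intensities normalized to [0, 1]; the function name `gamma_correct` and the value gamma = 0.5 are illustrative choices, not taken from [25]:

```python
import numpy as np

def gamma_correct(image, gamma=0.5):
    """Apply a power-law curve to every pixel independently.

    image: float array with values in [0, 1].
    gamma < 1 brightens dark regions; gamma > 1 darkens them.
    """
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    return image ** gamma

# Toy 2 x 2 "image"; gamma = 0.5 takes the square root of each pixel.
dark = np.array([[0.04, 0.16], [0.36, 1.0]])
bright = gamma_correct(dark, gamma=0.5)
```

Because each pixel is mapped without reference to its neighbors, any noise in dark regions is stretched along with the signal, which is the limitation noted in the text.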

Image Dehazing Methods
Image dehazing methods can generally be divided into prior- and deep learning-based methods. Prior-based dehazing methods exploit the statistical properties of clean images to estimate transmission maps and then predict the haze-free image using the scattering model, which can be expressed as

$$H(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad (1)$$

where H(x) is the hazy image, J(x) is the haze-free image, t(x) is the transmittance, and A is the atmospheric light intensity.
To acquire prior knowledge, early works concentrated on statistical analysis or observation of haze-free images. Among them, He et al. [40] proposed the dark channel prior (DCP), which detects the haze distribution of hazy images by assuming that the lowest local intensity across the RGB channels is close to zero in a clear image. Zhu et al. [41] introduced the color attenuation prior, which supposes that, in a linear model, the difference between the saturation and the pixel values is positively correlated with the depth of the scene. Although these methods have achieved certain dehazing effects, they are based on artificially constructed prior models, which cannot fully describe real hazy images. Therefore, these methods are highly restricted by the scene and have insufficient generalization ability.
Deep learning-based methods are also widely applied to dehazing. Cai et al. [42] proposed the end-to-end DehazeNet, which estimates the transmission map from a hazy image. Tang et al. [43] proposed a multi-scale network to exploit multi-scale information, which predicts the transmission with a coarse-scale net and a fine-scale one. Chen et al. [44] proposed a gated context aggregation network (GCANet), which employs a smooth dilated convolution to reduce the gridding artifacts caused by dilation. However, images enhanced by GCANet still contain unevenly distributed haze, and these methods cannot recover the details of the image. Therefore, Qin et al. [45] applied the attention mechanism to dehazing with a feature attention module that fuses features using pixel and channel attention. Guo et al. [46] proposed a self-paced semi-curricular attention network to overcome the non-uniform distribution features of hazy images.
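A minimal NumPy sketch of the scattering model in its common form, H = J·t + A·(1 − t), together with its inversion for the haze-free image. The function names and the clamp `t_min` are assumptions introduced for illustration; the transmission and atmospheric light are taken as known scalars here, whereas prior-based methods must estimate them.

```python
import numpy as np

def add_haze(clear, t, A):
    """Forward scattering model: H = J * t + A * (1 - t)."""
    return clear * t + A * (1.0 - t)

def remove_haze(hazy, t, A, t_min=0.1):
    """Invert the model for J; t is clamped (assumed t_min) to avoid
    dividing by a transmission close to zero."""
    t = np.maximum(t, t_min)
    return (hazy - A) / t + A

J = np.array([[0.2, 0.5], [0.7, 0.9]])   # toy clear image
H = add_haze(J, t=0.6, A=0.8)            # synthetic hazy image
J_rec = remove_haze(H, t=0.6, A=0.8)     # recovers J when t and A are exact
```

In practice the quality of the recovered image depends almost entirely on how well t(x) and A are estimated, which is what the priors discussed in this section are for.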
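The dark channel described above can be sketched as a per-pixel minimum over the RGB channels followed by a local minimum filter. This is a minimal illustration, not the implementation of [40]; the 3 × 3 patch size and the edge padding are assumptions chosen to keep the example small.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Minimum over RGB, then minimum over a patch x patch neighborhood.

    image: float array of shape (h, w, 3).
    """
    h, w, _ = image.shape
    per_pixel_min = image.min(axis=2)
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode="edge")
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# A saturated red scene: two channels are zero, so the dark channel is zero.
clear = np.zeros((4, 4, 3))
clear[..., 0] = 0.9
# Adding haze with t = 0.5 and A = 0.8 lifts every channel by A * (1 - t).
hazy = clear * 0.5 + 0.8 * 0.5

dc_clear = dark_channel(clear)
dc_hazy = dark_channel(hazy)
```

The dark channel of the hazy image rises from 0 to A·(1 − t) = 0.4, which is why its value can serve as a cue for haze density.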

Proposed Method
In practical applications, low-visibility weather always brings challenges to traffic observation and navigational environment perception. An effective and efficient low-visibility enhancement method is beneficial for maritime surveillance. In this section, we introduce our method in detail. For a better understanding, Table 1 lists the main symbols adopted in this work.

Notation      Description
M             The feature map generated by neural networks
M_{i,_}       The i-th row vectors of the feature map
M_{_,j}       The j-th column vectors of the feature map
M_{i,j}       The vector at position (i, j) of the feature map
λ             The weight of the loss function
L             The loss function
P             The pixel of the output image
\hat{P}       The pixel of the ground truth image
h             The height of the image
w             The width of the image
c             The channel number of the image

Architecture
The overview of the network is presented in Figure 2. To reduce the computational complexity, we use 1 × 1 convolutional and max-pooling layers to downsample the low-visibility image. For feature extraction, we propose the GL-Block consisting of convolutional layers and cross attention modules. In the end, 1 × 1 convolutional and bilinear upsampling layers transform the output image to the corresponding fine scale.
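To illustrate why the downsampling stage reduces cost, the following sketch applies a 1 × 1 convolution (the same linear map at every pixel) followed by max pooling. The pooling factor of 2, the 16 output channels, and the random weights are assumptions for illustration only; the text does not specify them.

```python
import numpy as np

def conv1x1(x, weight):
    """1 x 1 convolution: the same linear map applied to every pixel.

    x: (h, w, c_in), weight: (c_in, c_out).
    """
    return x @ weight

def max_pool2x2(x):
    """2 x 2 max pooling with stride 2 (h and w must be even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((400, 600, 3))
weight = rng.standard_normal((3, 16))      # hypothetical channel count
features = max_pool2x2(conv1x1(image, weight))
```

With these assumed settings the feature map has one quarter as many spatial positions as the input, so every subsequent layer processes four times fewer pixels.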

GL-Block
We design a two-branched block to extract multiple features simultaneously. First, we employ cross attention modules [23] to extract global features; these modules collect global information in the horizontal and vertical directions to enhance the representative capability of each pixel, as shown in Figure 3. Specifically, 1 × 1 convolutional layers are used to obtain the query (Q), key (K), and value (V) matrices and generate the attention map M with an affinity operation. Unlike the common attention method, the GL-Block achieves global spatial information interaction with two cross attention modules, which substantially reduces the computational complexity. The contextual information collected by the cross attention module can be expressed as

$$M'_{i,j} = f\left(M_{i,j},\, M_{i,\_},\, M_{\_,j}\right),$$

where M_{i,j} represents each vector in the input feature maps, M_{i,_} and M_{_,j} represent the horizontal and vertical vectors, respectively, and f denotes the process of establishing the connection between each pixel.
However, the cross attention mechanism causes a black-line problem due to extremely dark or bright pixels, as discussed in Section 4.6. Therefore, to balance the extreme non-uniformity, we optimize the cross attention module with two subsequent dilated convolutional layers. The kernel size is set to three, and the dilation rates are set to four and six, respectively.
The other branch consists of several residual convolutional layers designed to extract local features separately. In particular, inspired by [47], the convolutional layers are densely connected for better detail preservation. The kernel size and stride of the convolutional layers are set to three and one, respectively. In the end, we merge the global and local feature maps and feed them into a 1 × 1 convolutional layer for feature fusion.
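The row-and-column aggregation performed by one cross attention module can be sketched in NumPy as follows. This is a simplified illustration in the spirit of [23], not the GL-Block itself: Q, K, and V are assumed to be already produced by the 1 × 1 convolutions, the dilated convolutions are omitted, and the explicit loops favor readability over speed.

```python
import numpy as np

def softmax(x):
    x = x - x.max()          # subtract the maximum for numerical stability
    e = np.exp(x)
    return e / e.sum()

def criss_cross_attention(Q, K, V):
    """For each position, attend to all positions in its row and column.

    Q, K: (h, w, d) query/key maps; V: (h, w, c) value map.
    Returns an (h, w, c) map of aggregated context.
    """
    h, w, _ = Q.shape
    out = np.zeros(V.shape, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            # Row i in full, plus column j without the duplicate (i, j).
            other_rows = np.arange(h) != i
            keys = np.concatenate([K[i, :, :], K[other_rows, j, :]], axis=0)
            vals = np.concatenate([V[i, :, :], V[other_rows, j, :]], axis=0)
            attn = softmax(keys @ Q[i, j])   # h + w - 1 affinities
            out[i, j] = attn @ vals
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 5, 8))
K = rng.standard_normal((4, 5, 8))
V = rng.standard_normal((4, 5, 3))
context = criss_cross_attention(Q, K, V)
```

Each pixel computes h + w − 1 affinities instead of the h × w needed by full self-attention. A single pass only reaches pixels in the same row or column, but applying the module twice lets information from any pixel reach any other, which is why two modules suffice for global interaction.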

Loss Function
For the back-propagation process, we propose a joint loss function consisting of the L1 loss L_1, the L2 loss L_2, and the color similarity loss L_color to supervise our network in terms of both the Euclidean distance and the angle difference. It can be defined as

$$L = \lambda_1 L_1 + \lambda_2 L_2 + \lambda_3 L_{color},$$

where λ_1, λ_2, and λ_3 are the weights of each loss function.
L1 Loss. To ensure the quality of the generated images, we employ the widely used L1 loss function, which penalizes the absolute difference between corresponding pixels. It can be expressed as

$$L_1 = \frac{1}{hwc} \sum_{k=1}^{c} \sum_{i=1}^{hw} \left| P_i^k - \hat{P}_i^k \right|,$$

where P_i^k and \hat{P}_i^k are the pixels of the output image and the ground truth, respectively; i and k represent the positions and channels, respectively; and h, w, and c denote the height, width, and number of channels, respectively.
L2 Loss. Besides L1, the L2 loss function is also widely used in low-level computer vision tasks because it effectively constrains the output image. It can be expressed as

$$L_2 = \frac{1}{hwc} \sum_{k=1}^{c} \sum_{i=1}^{hw} \left( P_i^k - \hat{P}_i^k \right)^2.$$

Color Similarity Loss. In RGB images, the Euclidean distance is a typical metric for the similarity between two pixels. However, it ignores the angle difference between two RGB vectors, which can also cause severe color differences between two pixels. To measure the deviation more comprehensively, we employ the cosine similarity between each pair of vectors as the color similarity loss to take the angle difference into consideration. The color loss function can be expressed as

$$L_{color} = \frac{1}{hw} \sum_{i=1}^{hw} \left( 1 - \frac{P_i \cdot \hat{P}_i}{\lVert P_i \rVert \, \lVert \hat{P}_i \rVert + \epsilon} \right),$$

where the cosine of the angle between the RGB vectors P_i and \hat{P}_i is calculated, which represents the angle difference of the pixel at position i. ε = 0.001 is a hyper-parameter used to prevent the denominator from becoming zero.
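One plausible reading of the joint loss can be sketched in NumPy as below. The default weights (0.25, 0.25, 0.5) and ε = 0.001 come from the text; the mean normalization, the "1 − cosine" form of the color term, and the placement of ε in the denominator are assumptions made for this sketch, and `joint_loss` is a hypothetical name.

```python
import numpy as np

def joint_loss(pred, target, weights=(0.25, 0.25, 0.5), eps=1e-3):
    """pred, target: float arrays of shape (h, w, 3)."""
    l1 = np.abs(pred - target).mean()
    l2 = ((pred - target) ** 2).mean()
    # Cosine of the angle between the RGB vectors at each position.
    dot = (pred * target).sum(axis=2)
    norms = np.linalg.norm(pred, axis=2) * np.linalg.norm(target, axis=2)
    cos = dot / (norms + eps)
    l_color = (1.0 - cos).mean()       # assumed form: 0 when colors align
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l_color, (l1, l2, l_color)

gray = np.full((2, 2, 3), 0.5)
total_same, parts_same = joint_loss(gray, gray)

red = np.zeros((2, 2, 3))
red[..., 0] = 0.5
green = np.zeros((2, 2, 3))
green[..., 1] = 0.5
total_hue, parts_hue = joint_loss(red, green)
```

The red-versus-green pair shows what the color term adds: the two images have equal brightness, so the L1 and L2 terms are moderate, but their RGB vectors are orthogonal, so the color term reaches its maximum under this formulation.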

Experiments
In this section, we first describe the datasets and the implementation details used in the experiments. To comprehensively evaluate the performance of the MFF-Net, we compare it with state-of-the-art methods in terms of similarity to the ground truth, noise reduction, color naturalness, and computational complexity. Ablation studies on the necessity of multiple feature fusion and the weight settings of the joint loss function are presented to demonstrate the rationality of the proposed method. Finally, to verify the practical benefits of the proposed method, we conduct vessel detection experiments on the enhanced images.

Dataset and Evaluation Indicators
Supervised learning requires a perfectly paired dataset to calculate the pixel difference between the output and the ground truth. However, the current publicly available paired datasets (LOL [48], EnlightenGAN [49], I-HAZE [50], SMOKE [51], etc.) are not suitable for maritime low-visibility image enhancement, and a paired maritime low-visibility image dataset is difficult to obtain. We thus synthesize a large number of maritime low-visibility images based on the Seaships dataset [55]. Specifically, we select 1500 high-quality images from the Seaships dataset for training and 30 images for testing, as shown in Figure 4. Note that the characters in the images are the camera timestamps and locations contained in the original dataset. For low-light image enhancement, we adopt the traditional approach to synthesize low-light maritime images, which can be expressed as

$$L_{maritime}(x) = J(x)\,g(x),$$

where L_maritime(x) is the low-light maritime image, J(x) is the clear image, and g(x) is the coefficient, which is a random number between 0.1 and 0.8. Meanwhile, we exploit Equation (1) to obtain synthetic hazy training data. We restrict t(x) to the range from 0.1 to 0.7 and set A in the range from 0.2 to 0.8. For the test data, we synthesize three types of low-light images with different light levels, i.e., g_1(x) = 0.2, g_2(x) = 0.4, and g_3(x) = 0.6 (termed Test-L).
For a supervised neural network, results closer to the ground truth represent better performance. Therefore, for quantitative image quality assessment, we choose four full-reference evaluation indicators, i.e., peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [52], feature similarity index measure (FSIM) [53], and visual saliency-induced index (VSI) [54], to evaluate the enhancement performance. Higher PSNR, SSIM, FSIM, and VSI values indicate better image quality and closer proximity to the ground truth.
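The synthesis procedure can be sketched as follows, using the ranges stated in the text (g between 0.1 and 0.8, t between 0.1 and 0.7, A between 0.2 and 0.8). Treating g, t, and A as one scalar per image drawn uniformly at random is an assumption of this sketch, as are the function names and the random stand-in for a Seaships image.

```python
import numpy as np

def synth_low_light(clear, rng, g_range=(0.1, 0.8)):
    """Darken a clear image by a random coefficient g."""
    g = rng.uniform(*g_range)
    return clear * g, g

def synth_haze(clear, rng, t_range=(0.1, 0.7), a_range=(0.2, 0.8)):
    """Add haze with the scattering model H = J * t + A * (1 - t)."""
    t = rng.uniform(*t_range)
    A = rng.uniform(*a_range)
    return clear * t + A * (1.0 - t), t, A

rng = np.random.default_rng(0)
clear = rng.random((400, 600, 3))          # stand-in for a clear image in [0, 1)
low, g = synth_low_light(clear, rng)
hazy, t, A = synth_haze(clear, rng)

# Fixed light levels used for the low-light test set (Test-L).
test_l = [clear * level for level in (0.2, 0.4, 0.6)]
```

With 30 test images and three fixed light levels, this yields the 90 low-light test images reported in the quantitative comparison.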
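Of the indicators above, PSNR is the simplest to state: it is ten times the base-10 logarithm of the squared peak value divided by the mean squared error. A minimal sketch, assuming images scaled to [0, 1] (the `max_value` default is that assumption):

```python
import numpy as np

def psnr(output, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = np.asarray(output, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

reference = np.full((4, 4, 3), 0.5)
output = reference + 0.1           # uniform error of 0.1, so MSE = 0.01
value = psnr(output, reference)    # approximately 20 dB
```

SSIM, FSIM, and VSI additionally model structure, phase congruency, and saliency, respectively, and are best taken from an established implementation rather than rewritten.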

Implementation Details
We use PyTorch to build and train the MFF-Net. The network is trained for 300 epochs with the Adam optimizer. The initial learning rate is set to 1 × 10^−3 and is multiplied by 0.1 after every 100 epochs. In the loss function, to weight the Euclidean distance and the angle difference equally, the weights of L_1, L_2, and L_color are set to 0.25, 0.25, and 0.5, respectively. For data augmentation, we randomly crop the 600 × 400 images to patches of size 128 × 128 for training and use the original size for testing. The running time costs are measured on a laptop with an AMD Ryzen 7 5800H CPU and an NVIDIA RTX 3060 GPU. For a fair comparison, the parameters of all competing methods are taken from the open-access checkpoints released by their authors.
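The learning-rate schedule and the random cropping can be written out in plain Python as a sketch. It reads "multiplied by 0.1 after every 100 epochs" as a step decay, and the function names are hypothetical.

```python
import random

def learning_rate(epoch, base_lr=1e-3, decay=0.1, step=100):
    """Step decay: 1e-3 for epochs 0-99, 1e-4 for 100-199, 1e-5 for 200-299."""
    return base_lr * decay ** (epoch // step)

def random_crop_box(width=600, height=400, size=128, rng=random):
    """Pick a random size x size window inside a width x height image.

    Returns (left, top, right, bottom) in pixel coordinates.
    """
    left = rng.randint(0, width - size)     # randint is inclusive at both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size

schedule = [learning_rate(e) for e in (0, 99, 100, 200, 299)]
box = random_crop_box()
```

In PyTorch the same schedule is usually obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)`; whether the authors used that scheduler is not stated in the text.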

Experiments on Maritime Low-Light Images
To verify the performance of our method, we compare it with competitive classical algorithms and state-of-the-art methods: (a) traditional mathematical model methods, including HE [17], NPE [30], BCP [56], SRIE [57], and LIME [31]; and (b) deep learning-based methods, including RetinexNet [48], LightenNet [58], MBLLEN [59], KinD [35], Zero [36], and StableLLVE [18]. The visual comparisons on Test-L are shown in Figure 5. Among the mathematical model-based methods, the results of HE and BCP have obvious color distortion, non-uniform artifacts exist in the results enhanced by NPE, and LIME fails to lighten the low-light images effectively. Among the deep learning-based methods, RetinexNet suffers from severe color distortion; KinD only enhances the image with local features, which is incompatible with the illumination diversity between non-adjacent patches; Zero sacrifices the enhancement effect for speed, making the results look slightly dark; and StableLLVE fails to lighten the extremely dark regions. Compared with these methods, our results look more natural with better recovery. As shown in Table 2, the quantitative results indicate that our method achieves competitive performance overall. Although it is not the best on every metric, our proposed method has substantial advantages in running speed, as discussed in Section 4.5.
Figure 5. Visual comparisons on Test-L: (a) low-light input; (b) HE [17]; (c) NPE [30]; (d) BCP [56]; (e) SRIE [57]; (f) LIME [31]; (g) RetinexNet [48]; (h) LightenNet [58]; (i) MBLLEN [59]; (j) KinD [35]; (k) Zero [36]; (l) StableLLVE [18]; (m) MFF-Net; (n) ground truth.
Table 2. Quantitative comparison between our method and the state-of-the-art methods on the 90 maritime low-light images. The top three results are marked in red, blue, and green, respectively. ↑ indicates that a higher value means a better result.

Experiments on Maritime Hazy Images
To demonstrate the dehazing ability of MFF-Net, we also select some classical traditional enhancement methods, including DCP [40], CAP [41], HL [60], F-LDCP [61], and GRM [62], and some state-of-the-art learning-based methods, including DehazeNet [42], MSCNN [63], AOD-Net [64], GCANet [44], HTDNet [65], and FFA-Net [45], for testing. As shown in Figure 6, the image dehazed by DCP suffers from serious noise interference, especially on the water surface. Meanwhile, color distortion occurs in some areas. CAP and F-LDCP fail to dehaze the images thoroughly, leaving the overall image covered by a layer of haze. In contrast, HL dehazes the images better, but the enhanced image is too bright, with serious noise interference. Furthermore, on the edge of the vessel and parts of the water surface, reflections seriously degrade the visual quality, which hinders the lookout. On the whole, the learning-based methods achieve better performance than the traditional methods. However, DehazeNet, MSCNN, and AOD-Net still cannot dehaze the images thoroughly. Furthermore, although GCANet and FFA-Net can remove most of the haze, the images are still riddled with artifacts. The quantitative results are shown in Table 3. Compared with other methods, MFF-Net can successfully dehaze the images with a good balance between color restoration and detail preservation, benefiting from the strong learning ability of the CNN and the multi-feature fusion strategy. The experiments on real captured images are shown in Figure 7. It can be seen that our method is effective for both real captured low-light and hazy images.

Computational Complexity Analysis
In practical maritime surveillance, visual enhancement methods must take the running time into account. To evaluate the computational complexity, we report the running time cost for both low-light image enhancement and dehazing. For low-light enhancement, as shown in Table 4, MFF-Net is able to enhance 400 × 600 images at over 20 FPS with the acceleration of an NVIDIA RTX 3060 GPU, which is faster than most other methods. LightenNet [58] and Zero [36] are more lightweight, but their enhancement effect is much worse than ours. For dehazing, as shown in Table 5, our MFF-Net also outperforms most of the previous methods. AOD-Net is faster, but its dehazing is less effective than ours. In general, MFF-Net achieves a superior balance between the enhancement effect and the running time cost compared with the other methods, as depicted in Figure 8.
Table 3. Quantitative comparison between our method and the state-of-the-art methods on the 90 maritime hazy images. The top three results are marked in red, blue, and green, respectively. ↑ indicates that a higher value means a better result.
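The relation between the average running time per image and the FPS figure can be made explicit with a small timing harness. This is a generic sketch rather than the authors' benchmarking code; `average_runtime`, `fps`, and the single warm-up pass are assumptions.

```python
import time

def average_runtime(fn, inputs, warmup=1):
    """Average wall-clock seconds per call of fn over the given inputs."""
    for x in inputs[:warmup]:
        fn(x)                      # warm-up calls are not timed
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    elapsed = time.perf_counter() - start
    return elapsed / len(inputs)

def fps(seconds_per_image):
    """Frames per second implied by an average per-image running time."""
    return 1.0 / seconds_per_image

seconds = average_runtime(lambda x: x * 2, [1, 2, 3])
rate_at_50_ms = fps(0.05)          # about 20 frames per second
```

Running at over 20 FPS therefore corresponds to an average cost below 0.05 s per image. When timing GPU inference in PyTorch, kernels are launched asynchronously, so a call such as `torch.cuda.synchronize()` is typically needed before reading the clock.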

Ablation Study
To validate the necessity of multiple feature fusion guidance, we first conduct an ablation experiment on the architecture with two incomplete versions: (a) the network with only local feature guidance (OLF-Net), which only employs the densely connected convolutional layers to extract local features; and (b) the network with only global feature guidance (OGF-Net), which only uses the optimized cross attention module to extract global features. The visual results are shown in Figure 9. The image enhanced by OLF-Net suffers from obvious dark artifacts, which shows that shallow convolutional layers cannot meet the learning capabilities required for extremely low-visibility image enhancement tasks. In addition, a noticeable black-line issue exists in the OGF-Net results, due to the effect of exceptionally dark or bright pixels on correlated pixels in the cross attention module. The ablation study on the architecture indicates that multiple feature fusion guidance improves the effectiveness of feature representation and alleviates the excessive influence of extremely bright or dark pixels in the cross attention mechanism, which is necessary in low-visibility enhancement tasks.
We also investigate the effect of the weight settings of the different loss functions, including the Euclidean distance (L_1 + L_2) and the angle difference (L_color). For a fair experiment, the architecture of the network is fixed to MFF-Net. According to the comparison of the quantitative indicators shown in Table 6, the network shows the best performance with the proposed weight setting, which guarantees the enhancement quality comprehensively with a rational balance between the Euclidean distance and the angle difference.
Table 6. Quantitative quality assessment comparison between different weight distributions of the loss functions on the testing data consisting of paired images extracted from the LOL [48] and EnlightenGAN [49] datasets. ↑ and ↓ indicate that higher or lower values mean better results, respectively.

Improvement in Maritime Vessel Detection
To further demonstrate the practical benefits of our MFF-Net for maritime surveillance under low-visibility weather, we apply YOLOv5 and YOLOX [68] to conduct maritime vessel detection experiments. The test images are randomly selected from Test-L and Test-H. First, we select 1500 maritime-related images from the COCO dataset to train our detection networks. The evaluation tests are then conducted on the selected images. In low-visibility scenes, the vessel detection accuracy decreases heavily due to the low contrast and vague edge features, which can cause difficulties in maritime surveillance. In other words, the captain cannot make full use of computer vision to assist the human lookout. After enhancement, the visual data can deliver clearer traffic scenes to the managers, and the detection accuracy is also significantly increased. The experimental results are illustrated in Figures 10 and 11. Compared with the state-of-the-art methods, the enhanced results of the MFF-Net perform better due to the application of multi-feature fusion. The quantitative comparison is shown in Table 7. Note that the input image is first resized to 640 × 640 in YOLOX. However, most of the traditional methods cannot enhance an image within one second; thus, they cannot be applied in practical engineering. Therefore, we compare our method with the fastest representative traditional and deep learning-based methods in the quantitative experiment. The experimental results demonstrate that the MFF-Net has practical benefits in maritime surveillance: it benefits higher-level visual tasks under low-visibility weather when assisting human observation, thereby improving maritime management.

Conclusions
In this paper, we proposed an end-to-end multi-feature fusion-guided low-visibility enhancement method for maritime surveillance. First, the maritime low-visibility images, i.e., low-light and hazy images, are downsampled and then fed into the GL-Block comprising cross attention modules and dense residual convolutional layers. The GL-Block is designed to extract the global and local features simultaneously to guide the enhancement process. After enhancement, the image is upsampled to a finer scale. To better constrain the enhanced output, we introduced a joint loss function comprising L1 loss, L2 loss, and color similarity loss. In the experiments, we made extensive comparisons of the visual performance, including quantitative image quality assessment, noise reduction, and color naturalness, on both low-light enhancement and dehazing. Compared with other methods, the MFF-Net achieved competitive quantitative and visual performance with effective noise reduction and superior color naturalness. Moreover, we evaluated the running time cost and model size of the state-of-the-art methods, which indicates that MFF-Net can efficiently enhance extremely low-visibility images with lower computational complexity. In the ablation study, we conducted a series of experiments to investigate the necessity of multiple feature guidance and the rationality of the weight settings of the proposed loss function. Finally, the vessel detection experiment indicates that our method is beneficial for practical maritime surveillance under low-visibility weather.
In the future, we will test more methods for global feature extraction to demonstrate the advantages of multiple features for low-visibility image enhancement. Furthermore, high-definition videos cannot currently be enhanced in real time. We will thus optimize the architecture of the MFF-Net to achieve a better performance with lower computational complexity, which will enable the network to work on a diverse range of maritime edge devices.

Figure 1. Examples of the comparison between maritime low-visibility images and clear images.

Figure 2. Flowchart of the proposed MFF-Net. First, the low-visibility image is downsampled with a max-pooling layer. The multiple features are then extracted with three GL-Blocks to guide the enhancement process. The enhanced image is finally upsampled to the original scale.

Figure 3. The detailed implementation of the GL-Block and cross attention module.

Figure 4. Thirty selected maritime images from the Seaships [55] dataset, which contains raw maritime surveillance data captured in different scenes.

Figure 7. Visual performance on the physically captured low-visibility images. The first row contains the low-light images extracted from the TMDIED [66] dataset, and the third row contains the hazy images extracted from SMD [67] and online websites. The corresponding enhanced results of our MFF-Net are shown in the second and fourth rows.

Figure 8. The trade-off between the visibility enhancement performance and the computational efficiency of several state-of-the-art low-light enhancement and dehazing methods. Note that the frames per second (FPS) metric is tested on a 600 × 400 resolution image.

Figure 9. Visual comparison between the enhanced results of the MFF-Net and the incomplete versions on the standard low-light image enhancement dataset. Note that the results in the ablation study are output from networks trained and tested with the same implementation details as the MFF-Net.

Table 1. Summary of key notations.

Table 4. Average running time cost (unit: seconds) and parameter comparison of the different methods on low-light images with different resolutions (400 × 600, 480 × 640, and 768 × 1024).

Table 5. Average running time cost (unit: seconds) and parameter comparison of the different methods on hazy images with different resolutions (400 × 600, 480 × 640, and 768 × 1024).

Table 7. Quantitative experiments on the vessel detection accuracy improvement on YOLOX, tested on the Seaships [55] dataset. Note that mAP (clear), mAP (low-visibility), and mAP (enhancement) represent the mean average precision on clear, low-visibility, and enhanced images, respectively.