1. Introduction
In recent years, with the development of ship intelligence, higher demands have been placed on acquiring information about approaching vessels. The type of vessel constitutes a crucial part of this information, and characteristics such as ship size, propulsion mode, movement, and operational nature can be regarded as attributes of the vessel type. Determining the types of approaching vessels is therefore essential for obtaining this information. Ship lights not only signal the presence of a vessel but also, to a certain extent, reflect its type, propulsion mode, size, movement, and operational nature. Traditional ship light recognition relies on visual lookout by maritime personnel and is influenced by their competence, experience, and subjective factors. This can lead to misrecognition and time-consuming judgments, which at best reduce navigation efficiency and at worst result in navigation accidents. Therefore, using modern technological means to quickly and accurately determine the types of approaching vessels from their lights is of great significance for the development of intelligent shipping and the reduction in maritime accidents.
In the field of the determination of the types of approaching vessels, numerous studies have been conducted by international scholars and can primarily be categorized into three types: vessel type determination based on optical images [
1], Synthetic Aperture Radar (SAR) images [
2], and ship trajectories. Based on optical images, Song et al. proposed a semi-supervised multi-modal vessel-type classification method composed of a multi-level, cross-modal interaction network and a semi-supervised contrastive learning strategy, which performed well on public datasets [
3]. To address the issue of the manual extraction of ship image features, Yang et al. introduced a Convolutional Neural Network (CNN)-based vessel type classification method that can autonomously extract image features [
4]. For ship SAR images, Toumi et al. proposed a specifically designed shallow CNN architecture aimed at optimizing the performance of SAR datasets and combined this CNN with a Long Short-Term Memory (LSTM) network. This model achieved high accuracy in SAR image classification [
5]. Raj et al. constructed a small-sample deep learning model with a preprocessing stage and feature fusion technique, which demonstrated good results on imbalanced ship SAR image datasets [
Yan et al. proposed a SAR image vessel type classification method combining Multiple Classifiers Ensemble Learning (MCEL) with transfer learning from Automatic Identification System (AIS) data. This method transfers MCEL models trained on AIS data to SAR image vessel type classification, effectively addressing the issue of labeled SAR sample scarcity [
7]. Based on ship trajectories, Sun et al. introduced an enhanced hierarchical spatiotemporal embedding approach that enhances node semantic information during data preprocessing and employs the Hierarchical Spatial-Temporal Embedding Method (Hi-STEM) for vessel type classification [
8]. Luo et al. proposed a novel ensemble classifier based on traditional machine learning algorithms, overcoming the limitations of single classifier performance [
9]. However, vessel type determination based on optical or SAR images requires high-quality imagery and is challenging at night, when the main body of the ship cannot be observed. Trajectory-based determination is not affected by nighttime conditions, but it requires the long-term analysis of ship trajectories and therefore does not meet real-time requirements. Given that ship lights inherently reflect vessel type characteristics and remain observable even when the hull is not, this paper selects ship light features as the foundation for vessel type determination.
Currently, international research on the determination of types of vessels based on ship lights remains limited, with existing studies primarily focusing on light recognition and ship positioning. Hu et al. proposed an image-processing-based ship light recognition algorithm for maritime ships, which utilizes geometric attributes and regional pixel information in the Hue, Saturation, Value (HSV) color space to recognize lights. They further mapped the recognition results to three mutual-visibility encounter scenarios, developing a light-based auxiliary method for ship encounter situation assessment to support collision risk calculation and autonomous avoidance decision-making [
10,
11]. Zhong et al. conducted comparative experiments using LJ1-01 data and National Polar-orbiting Partnership (NPP)/Visible Infrared Imaging Radiometer Suite (VIIRS) data, applying threshold segmentation and a two-parameter constant false alarm rate (CFAR) method to determine ship positions. Their study analyzed the characteristics, performance, and application potential of high-resolution nighttime light remote sensing imagery in marine ship detection [
12]. Bi et al. designed a vision-sensing- and machine-prediction-based method for ship-collision avoidance navigation signal recognition, constructing a micro-feature imaging model using optical information and employing an improved adaptive boosting (Adaboost) algorithm to identify vessel types and lights [
13]. However, the determination of types of vessels based on ship lights requires the establishment of quantifiable light feature metrics with uniqueness to distinguish different ship categories. While studies [
10,
11,
12] have focused on light recognition and positioning using color and geometric properties, they did not address ship classification. Study [
13] only covered limited vessel types (e.g., container ships, LNG carriers, bulk carriers, and passenger ships), providing insufficient data support for collision avoidance decisions. In recent years, deep learning has seen rapid development in recognition and classification tasks, with applications spanning medical imaging [
14], image processing [
15], text analysis [
16], and radio signal processing [
17]. Das et al. proposed a B-scan attentive convolutional neural network (BACNN), integrating spatial features with clinical diagnostic relevance through a CNN and attention module to enhance classification accuracy [
18]. Elsharkawy et al. proposed an improved YOLOv8 model that uses triplet attention (TA) and multi-head self-attention (MHSA) mechanisms to enhance YOLOv8’s performance, addressing the challenges of crack detection in complex scenarios [
19]. Building on these advancements, this study adopts the YOLO algorithm for ship light recognition and employs 1D-CNN-based deep learning techniques for the determination of vessel types.
To address the need to acquire information on approaching vessels, this paper proposes a method for determining the types of approaching vessels based on an improved YOLOv8 model and ship light features. Firstly, this paper proposes a ship light recognition model based on YOLOv8. By introducing the dilation-wise residual (DWR) module, the bidirectional feature pyramid network (BiFPN) module, and the hybrid attention transformer (HAT), the model’s capability to extract and detect small-target features is enhanced, enabling the recognition of ship lights under complex conditions such as dawn and dusk, fog, and nighttime. Secondly, in accordance with the provisions of Part C of the International Regulations for Preventing Collisions at Sea (COLREGs), 1972, the light signals displayed by 36 categories of ships were analyzed by comparing the color, number, combination, and spatial distribution of the lights shown by different types of vessels. Based on this analysis, characteristic indicators for the light signals were established, and feature extraction was performed on the ship light recognition results to create the relevant datasets. Finally, deep learning algorithms such as the ECA-1D-CNN were used to assess the viewing angle of the light signals and identify the types of approaching vessels, thereby obtaining ship information such as propulsion mode, size, movement, and operational nature.
2. Ship Light Recognition Method
Ship light recognition falls under the category of small-target detection, as ship lights are small relative to the ship body, making their features challenging to capture. Furthermore, ship lights are typically displayed under low-visibility conditions such as nighttime, twilight, or heavy fog, in which recognition must contend with complex backgrounds and interfering lights. While YOLOv8 has been widely applied to ship detection [
20], it comes with limitations in small-target recognition, including insufficient local detail extraction and information loss [
21]. To address these challenges, this paper introduces multiple enhancements to the YOLOv8 framework to improve its capability in light recognition:
Feature Extraction Phase: The DWR module is integrated into the C2f structure to enhance multi-scale feature extraction for small targets.
Feature Fusion Phase: The neck section is optimized using the BiFPN, which leverages multi-level feature pyramids and bidirectional information flow to improve the fusion of small-target features.
Feature Allocation and Detection Phase: The detection head is upgraded with the HAT mechanism. By combining channel attention and self-attention mechanisms, HAT enhances single-image super-resolution reconstruction, thereby refining feature allocation for small targets.
The ship light recognition model based on the improved YOLOv8 proposed in this paper consists of three parts: Backbone, Neck, and Head. The specific structural schematic diagram is shown in
Figure 1. The Backbone comprises five convolution (Conv) modules, one C2f module, three C2f-DWR modules, and one spatial pyramid pooling fast (SPPF) module, collectively forming a feature extraction network designed to capture feature maps at various scales. Among these, the Conv modules process the input through convolution layers to perform downsampling operations, reducing the size of the feature maps while increasing the number of channels. Batch normalization (BN) layers and the sigmoid linear unit (SiLU) activation function are employed to enhance the model’s nonlinear representation capabilities. The C2f module splits the feature maps after convolution into two parts: one part directly goes through a series of bottleneck layers for feature extraction, while the other part undergoes residual connections after each operation layer before being concatenated (Concat) again. Finally, the concatenated features are passed through a convolution layer for output. This design aims to effectively aggregate multi-scale information and maintain or even enhance the model’s expressiveness. The SPPF module enhances the model’s ability to detect objects of different sizes by fusing multiple max-pooled feature maps. The Neck section employs a BiFPN structure, consisting of four C2f modules, two upsample (Upsample) modules, two Conv modules, and four BiFPN-Add modules, forming a feature fusion network. This network is designed to merge feature maps of different scales and outputs three feature maps at different scales for connection with the detection heads. Among these, the Upsample modules upscale lower-resolution feature maps to match the dimensions of higher-resolution feature maps, facilitating their fusion. The BiFPN-Add modules are specialized operations within the BiFPN for merging feature maps. Specifically, BiFPN-Add2 denotes the fusion of two input feature maps, while BiFPN-Add3 indicates the fusion of three input feature maps. The Head section consists of three detection heads, corresponding to the three different-sized feature maps output by the Neck section. Through regression and classification operations performed on these feature maps, the Head section outputs the final results for object detection.
2.1. Enhancements to the YOLOv8 Backbone Based on the DWR Module
Ship light recognition is a task involving small-target detection and identification, in which the color of the lights can be significantly affected by environmental conditions. As a result, the display of ship lights shows notable differences under conditions such as fog, nighttime, and dawn or dusk. Additionally, in video images, the occlusion of ship lights varies with the viewing perspective, which significantly increases the difficulty of detection and identification. DWR is a network module designed for efficiently extracting multi-scale contextual information, originating from the ultra-real-time semantic segmentation model DWR-Seg, which has already achieved good results in real-time semantic segmentation tasks. DWR employs a two-step residual feature extraction method, namely region residualization and semantic residualization, effectively capturing feature information at different scales [
22]. The network structure of DWR is shown in
Figure 2.
Firstly, a 3 × 3 depthwise separable dilated convolution is used to process the input features, obtaining region-form feature maps $F_R$ at different scales. Then, a three-branch structure is employed to expand the receptive field, performing depthwise convolutions at dilation rates of 1, 3, and 5, respectively. This morphological filtering of the region-form feature maps at different scales effectively aggregates multi-scale information, resulting in the output $F_S$. Finally, a 1 × 1 pointwise convolution is applied to fuse and reduce the dimensions of the features, which are then added to the original input feature map to form a residual connection, yielding the final output $Y$. The specific operations are shown in Formulas (1)–(3).

$$F_R = \sigma\left(\mathrm{BN}\left(\mathrm{DConv}_{3\times 3}(x)\right)\right) \tag{1}$$

$$F_S = \mathrm{Concat}\left(\mathrm{DConv}_{1}(F_R),\ \mathrm{DConv}_{3}(F_R),\ \mathrm{DConv}_{5}(F_R)\right) \tag{2}$$

$$Y = \mathrm{PConv}_{1\times 1}(F_S) + x \tag{3}$$

In the formulas, $x$ represents the input features; $\sigma$ is the activation function; $\mathrm{BN}$ denotes the batch normalization layer; $\mathrm{DConv}_{3\times 3}$ refers to the 3 × 3 depthwise separable dilated convolution; $d$ is the dilation rate, also known as the atrous rate; $\mathrm{DConv}_{d}$ indicates the depthwise convolution with a dilation rate of $d$; $\mathrm{PConv}_{1\times 1}$ denotes the 1 × 1 pointwise convolution; and $\mathrm{Concat}$ represents the concatenation of the outputs of all convolutions with different dilation rates.
In this paper, the bottleneck module in the C2f structure is replaced with the DWR module to construct the C2f-DWR variant. By leveraging depthwise convolutions with varying dilation rates, this modification expands the receptive field and strengthens the model’s ability to extract multi-scale features, significantly enhancing its capability to capture the critical characteristics of ship lights. The detailed architecture of the C2f-DWR module is depicted in
Figure 2. Similar to the C2f module, the C2f-DWR module splits the feature maps resulting from convolution into two parts through a Split operation. One part directly undergoes a series of DWR layers to extract feature information, while the other part involves residual connections after each processing layer. Ultimately, the features from both parts are combined and passed through a convolutional layer for output.
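To make the two-step residual design concrete, the following PyTorch sketch implements Formulas (1)–(3). The class name, channel counts, and the exact layer ordering within the region-residualization step are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch of the DWR block (region residualization + semantic residualization).
import torch
import torch.nn as nn

class DWR(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Region residualization: 3x3 depthwise separable convolution -> region-form features F_R
        self.region = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        # Semantic residualization: three depthwise branches with dilation rates 1, 3, 5
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                      groups=channels, bias=False)
            for d in (1, 3, 5)
        ])
        # 1x1 pointwise convolution fuses the concatenated branches back to `channels`
        self.fuse = nn.Conv2d(3 * channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_r = self.region(x)                                  # Formula (1)
        f_s = torch.cat([b(f_r) for b in self.branches], 1)   # Formula (2)
        return x + self.fuse(f_s)                             # Formula (3): residual connection


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(DWR(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```

In the C2f-DWR variant, blocks of this kind simply replace the bottleneck blocks inside the C2f structure, so the split, residual, and concatenation logic of C2f is unchanged.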
2.2. Enhancements to the YOLOv8 Neck Based on the BiFPN Module
YOLOv8 employs a feature pyramid network (FPN) and a path aggregation network (PAN) for multi-scale feature fusion. FPN facilitates top–down feature information propagation, whereas PAN relies entirely on the output from FPN to conduct bottom–up feature information propagation. However, during small target detection tasks, there is an issue of losing original feature information. BiFPN is a weighted bidirectional feature pyramid network architecture designed for object detection tasks. Unlike FPN, BiFPN establishes bidirectional connections between top–down and bottom–up paths, enhancing the flow of multi-scale feature information [
23]. Additionally, BiFPN optimizes cross-scale connections by removing nodes with only a single input connection, adding extra connections between input and output nodes at the same level, and treating each bidirectional path as a feature layer that iterates multiple times. Furthermore, BiFPN assigns learnable weights to each input feature, assessing the importance of different features and thereby enhancing the model’s fusion capability. Specifically, BiFPN uses a fast normalized weighted fusion method to weight the input features, as calculated by Formula (4).

$$O = \sum_{i} \frac{w_i}{\epsilon + \sum_{j} w_j} \cdot I_i \tag{4}$$

In the formula, $w_i$ represents the learnable weight value, which is kept non-negative through the ReLU activation function; $I_i$ represents the input feature; and $\epsilon$ is a small constant that ensures numerical stability.
Although BiFPN enables bidirectional feature propagation, shallow and deep-level features are not directly fused. Instead, feature information undergoes multiple convolutional and upsampling operations during transmission, which may lead to a partial loss of local features. To address this limitation, this paper introduces direct cross-layer connections between the deepest and shallowest feature maps, ensuring a comprehensive fusion of semantic information from deep layers (capturing target semantics) and positional details from shallow layers (retaining spatial accuracy). The structure of the modified BiFPN module is illustrated in
Figure 3, where red arrows denote the direct cross-layer connections.
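A minimal sketch of the weighted fusion performed by the BiFPN-Add nodes, assuming the fast normalized fusion form of Formula (4); the module name and the value of epsilon are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFPNAdd(nn.Module):
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable weight per input feature map; ReLU keeps each weight non-negative
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)                  # w_i >= 0
        w = w / (w.sum() + self.eps)        # normalized importance of each input (Formula (4))
        return sum(wi * f for wi, f in zip(w, feats))

# BiFPN-Add2 fuses two maps, BiFPN-Add3 fuses three maps of identical shape
add2 = BiFPNAdd(2)
fused = add2([torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)])
print(fused.shape)  # torch.Size([1, 128, 40, 40])
```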
2.3. Enhancements to the YOLOv8 Detection Head Based on the HAT Module
HAT is a transformer model that combines channel attention and self-attention mechanisms to activate more pixels for high-resolution reconstruction [
24]. HAT primarily consists of three steps: shallow feature extraction, deep feature extraction, and image reconstruction. During the shallow feature extraction phase, the model employs convolution operations to extract initial low-level features, yielding shallow feature maps. Following this, deep feature extraction is performed on these shallow feature maps, incorporating a global residual connection that fuses the shallow feature maps with the deep feature maps. Finally, a high-resolution image is constructed through an image reconstruction module. The deep feature extraction is the core part of the model, composed of multiple residual hybrid attention group (RHAG) modules and a convolutional layer. Each RHAG module includes several hybrid attention blocks (HABs), an overlapping cross-attention block (OCAB) layer, and a convolutional layer. HAB is the core module that integrates channel attention and self-attention mechanisms. It first processes the input features through a normalization layer, then uses a dual-branch structure to apply channel attention and window-based multi-head self-attention to the input features separately, and finally obtains the output by weighted summation. OCAB is an overlapping cross-attention module that enhances the performance of super-resolution reconstruction by establishing cross-connections between different windows. The network architecture of HAT and its constituent modules is illustrated in
Figure 4.
Given that ship light recognition is categorized as small target recognition, this paper integrates the HAT mechanism at the feature input stage of the detection head to achieve a super-resolution reconstruction of images. This enhancement aims to improve the model’s capability to detect small targets. The resulting modified detection head is termed HAT-Head. By leveraging attention mechanisms, HAT-Head effectively consolidates the global pixel information of images, significantly boosting the model’s feature extraction capabilities. This integration ensures not only the precision but also the robustness of the recognition model, making it more reliable for detecting small targets like ship lights.
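The following simplified PyTorch sketch illustrates the HAB idea of combining channel attention and self-attention on the same normalized features through weighted summation. Window partitioning, relative position encoding, and the OCAB layer of the full HAT are omitted, so this is an illustration of the mechanism rather than a faithful reimplementation; the weighting factor and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.GELU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (B, N, C) token sequence
        scale = self.fc(x.mean(dim=1))     # global average over tokens -> per-channel weights
        return x * scale.unsqueeze(1)

class SimplifiedHAB(nn.Module):
    def __init__(self, dim: int, heads: int = 4, alpha: float = 0.01):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cab = ChannelAttention(dim)
        self.alpha = alpha                 # weighting factor for the channel-attention branch

    def forward(self, x):                  # x: (B, N, C)
        h = self.norm(x)
        attn, _ = self.msa(h, h, h)        # stand-in for window-based multi-head self-attention
        return x + attn + self.alpha * self.cab(h)

x = torch.randn(2, 64, 96)                 # 64 tokens of dimension 96
print(SimplifiedHAB(96)(x).shape)          # torch.Size([2, 64, 96])
```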
3. The Features of Ship Lights and the Creation of a Dataset
At night, the encounter situation of a vessel can be determined based on the types of approaching vessels (e.g., power-driven vessels, sailing vessels) and the color/types of ship lights (e.g., masthead lights, sidelights, stern lights). While the improved YOLOv8-based ship light recognition model can recognize the color of ship lights, it cannot infer critical attributes such as vessel type, size, operational status, or working nature, nor can it classify light types to assess encounter situations or collision avoidance responsibilities. Part C of the COLREGs clearly specifies the lights that various types of vessels should display under different circumstances. For example, an air-cushion vessel, when operating in the non-displacement mode, shall display one white masthead light, sidelights on both sides, one yellow all-round light, and one white stern light. When towing, a power-driven vessel with L > 50 m and s > 200 m shall display one white masthead light, three vertical white masthead lights, sidelights on both sides, one white stern light, and one yellow towing light. The configuration of ship lights reflects information about the vessel’s type, propulsion mode, size, status, and operational nature. Once this information is confirmed, the types of ship lights exhibited by the vessel can be determined, further allowing for the assessment of the encounter situation. Therefore, this paper identifies and extracts ship light features for common approaching vessels using the color, number, combination, and spatial distribution of the ship’s lights as characteristic indicators. In addition, ship information such as the ship’s size, status, and operational nature is used as supplementary criteria for categorizing approaching vessels, and accordingly, the common approaching vessels are categorized into 36 types. Additionally, this paper classifies the exhibition of ship lights under various nighttime encounter situations, assessing the vessel’s collision avoidance responsibilities according to the COLREGs.
3.1. Definitions Related to Ship Lights
Ship lights are lamps used to display characteristics such as the presence of a ship, its size, status, type, and operational nature. Ship lights mainly include masthead lights, sidelights, stern lights, towing lights, all-round lights, and flashing lights. Ship lights shall be exhibited from sunset to sunrise, from sunrise to sunset in restricted visibility, and in all other circumstances when it is deemed necessary to inform other ships about the displaying ship’s size, movement, propulsion mode, type, operational nature, and other information, thereby facilitating early, substantial, wide-ranging, and clear avoidance actions. “Masthead light” means a white light placed on the fore-and-aft centerline of the vessel, showing an unbroken light across an arc of the horizon of 225 degrees and so fixed as to show the light from right ahead to 22.5 degrees abaft the beam on either side of the vessel. “Sidelights” means a green light on the starboard side and a red light on the port side, each showing an unbroken light across an arc of the horizon of 112.5 degrees and so fixed as to show the light from right ahead to 22.5 degrees abaft the beam on its respective side. In a vessel of less than 20 m in length, the sidelights may be combined in one lantern carried on the fore-and-aft centerline of the vessel. “Sternlight” means a white light placed as close as practicable to the stern, showing an unbroken light across an arc of the horizon of 135 degrees and so fixed as to show the light 67.5 degrees from right aft on each side of the vessel. “Towing light” means a yellow light having the same characteristics as the “Sternlight”. “All-round light” means a light showing an unbroken light across an arc of the horizon of 360 degrees. “Flashing light” means a light flashing at regular intervals at a frequency of 120 flashes or more per minute. The horizontal arc range exhibited by the ship lights is shown in
Figure 5.
Ship lights are typically colored white, red, green, or yellow. Specifically, masthead lights and stern lights are white. The port sidelight is red, and the starboard sidelight is green. The towing light is yellow, and all-round lights may show any of the four colors (white, red, green, or yellow).
3.2. Category of Vessel Types
During navigation, the lights exhibited by a vessel vary depending on its type, size, propulsion mode, movement, and operational nature. For the same vessel, the appearance of its lights observed from different viewing angles will also differ. Depending on the viewing angle, the display of ship lights may be incomplete, or the number of visible lights may vary. This paper considers 36 types of vessels and four viewing angles, resulting in a total of 144 different light display scenarios. The viewing angles include front view, port-side view, starboard-side view, and stern view. The types of vessels are classified into seven categories: power-driven vessels underway; vessels engaged in towing and pushing; sailing vessels underway; fishing vessels; vessels not under command or restricted in their ability to maneuver; vessels constrained by their draught; and pilot vessels.
3.2.1. Power-Driven Vessels Underway
This category includes power-driven vessels with L > 50 m, power-driven vessels with L < 50 m, power-driven vessels with L < 20 m, and air-cushion ships in non-displacement mode.
3.2.2. Vessels Engaged in Towing and Pushing
This category includes power-driven vessels engaged in towing with L > 50 m and s > 200 m, power-driven vessels engaged in towing with L > 50 m and s < 200 m, power-driven vessels engaged in towing with L < 50 m and s > 200 m, power-driven vessels engaged in towing with L < 50 m and s < 200 m, power-driven vessels engaged in pushing with L > 50 m, power-driven vessels engaged in pushing ahead with L < 50 m, power-driven vessels engaged in towing alongside with L > 50 m, and power-driven vessels engaged in towing alongside with L < 50 m.
3.2.3. Sailing Vessels Underway
This category includes sailing vessels underway.
3.2.4. Fishing Vessels
This category includes fishing vessels engaged in trawling with L > 50 m and making way through the water, fishing vessels engaged in trawling with L > 50 m and not making way through the water, fishing vessels engaged in trawling with L < 50 m and making way through the water, fishing vessels engaged in trawling with L < 50 m and not making way through the water, fishing vessels engaged in fishing other than trawling and making way through the water, and fishing vessels engaged in fishing other than trawling and not making way through the water.
3.2.5. Vessels Not Under Command or Restricted in Their Ability to Maneuver
This category includes vessels not under command and making way through the water; vessels not under command and not making way through the water; power-driven vessels with L > 50 m, making way through the water, and restricted in their ability to maneuver (excluding those engaged in mine clearance, towing, or dredging or underwater operations); power-driven vessels with L < 50 m, making way through the water, and restricted in their ability to maneuver (excluding those engaged in mine clearance, towing, or dredging or underwater operations); vessels restricted in their ability to maneuver and not making way through the water (excluding those engaged in mine clearance, towing, or dredging or underwater operations); vessels restricted in their ability to maneuver due to being engaged in dredging or underwater operations and not making way through the water; power-driven vessels with L > 50 m, engaged in towing operations with s > 200 m, and restricted in their ability to maneuver; power-driven vessels with L > 50 m, engaged in towing operations with s < 200 m, and restricted in their ability to maneuver; power-driven vessels with L < 50 m, engaged in towing operations with s > 200 m, and restricted in their ability to maneuver; power-driven vessels with L < 50 m, engaged in towing operations with s < 200 m, and restricted in their ability to maneuver; vessels with L > 50 m, making way through the water, and restricted in their ability to maneuver due to being engaged in dredging or underwater operations; vessels with L < 50 m, making way through the water, and restricted in their ability to maneuver due to being engaged in dredging or underwater operations; vessels with L > 50 m engaged in mine clearance operations; and vessels with L < 50 m engaged in mine clearance operations.
3.2.6. Vessels Constrained by Their Draught
This category includes vessels constrained by their draught with L > 50 m and vessels constrained by their draught with L < 50 m.
3.2.7. Pilot Vessels
This category includes vessels engaged in pilot duty.
In the aforementioned classification of vessels, L represents the length of the vessel, and s represents the length of the tow. These are subsequently denoted as N1-N36 for the 36 types of vessels discussed.
Among the 36 types of vessels, specific configurations of lights are used to indicate the type or operational nature of the vessel. Two masthead lights indicate a vessel with a length greater than 50 m; a single masthead light signifies a vessel with a length less than 50 m. A red sidelight indicates the port side, while a green sidelight indicates the starboard side. A yellow all-round light denotes an air-cushion vessel in non-displacement mode. A combination of red and green light represents a power-driven vessel less than 12 m in length. A yellow stern light indicates towing. Three vertical white masthead lights signify a tow length exceeding 200 m, whereas two vertical white masthead lights indicate a tow length under 200 m or that the ship is engaged in pushing ahead or towing alongside. Two green and two red sidelights denote pushing ahead, while three green and three red sidelights represent towing alongside. Two vertical all-round lights with red over green indicate a sailing vessel. Two vertical all-round lights with green over white signify a fishing vessel engaged in trawling, whereas red-over-white all-round lights indicate a fishing vessel engaged in fishing other than trawling. Two vertical red all-round lights signify a vessel not under command, and three vertical lights with red, white, and red denote a vessel restricted in her ability to maneuver. For vessels restricted in their ability to maneuver due to underwater operations or dredging, besides exhibiting the red, white, and red lights, they must exhibit two vertical red all-round lights on the side where there is an obstruction and two vertical green all-round lights on the side where the passage is clear. Three vertical red all-round lights indicate a vessel constrained by her draught. For example, the lights exhibited by power-driven vessels underway are shown in
Figure 6.
3.3. The Features of Ship Lights
Based on Part C of the COLREGs, the lights exhibited by the 36 types of vessels can be analyzed, from which 23 feature indicators of ship lights can be extracted. The feature indicators of ship lights are as follows: (1) except for fixed combinations, only a single white light; (2) only two white lights, and they are not aligned vertically or horizontally; (3) a single green light; (4) a single red light; (5) white-over-red, white-over-green, or white-over-combined lantern; (6) the presence of a yellow light; (7) the number of horizontal red–green sidelight groups; (8) three vertical white masthead lights; (9) two vertical white masthead lights; (10) only one group of vertical yellow upper and white lower lights; (11) two green lights; (12) two red lights; (13) three green lights; (14) three red lights; (15) only one group of vertical red upper and green lower lights; (16) only one group of vertical green upper and white lower lights; (17) only one group of vertical red upper and white lower lights; (18) only one group of two vertical red lights; (19) only one group of three vertical lights with a red-white-red configuration; (20) only one group of two vertical green lights; (21) only one group of a triangular arrangement of three green lights; (22) only one group of three vertical red lights; and (23) lights exhibited by towed vessels. These features will be represented as
M1-
M23 in the Tables in
Appendix A, indicating the exhibitions of ship lights viewed from four angles: front view, port-side view, starboard-side view, and stern view. In these tables, “0” indicates that the feature is not present, and “1” indicates that the feature is present, while “2” and “3” indicate the quantity of the feature.
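As an illustration, a hypothetical encoder for a few of these indicators might look like the following; the input format (one color and normalized image position per recognized light) and the alignment threshold are assumptions for illustration only, not the paper's actual extraction code.

```python
from typing import List, Tuple

# One recognized light: (color, x, y) in normalized image coordinates (assumed format).
Light = Tuple[str, float, float]

def encode_features(lights: List[Light]) -> List[int]:
    """Fill a 23-element M1-M23 vector for a handful of example indicators."""
    m = [0] * 23
    colors = [c for c, _, _ in lights]
    whites = [(x, y) for c, x, y in lights if c == "white"]
    m[2] = 1 if colors.count("green") == 1 else 0   # M3: a single green light
    m[3] = 1 if colors.count("red") == 1 else 0     # M4: a single red light
    m[5] = 1 if "yellow" in colors else 0           # M6: presence of a yellow light
    if len(whites) == 2 and abs(whites[0][0] - whites[1][0]) < 0.02:
        m[8] = 1                                    # M9: two vertically aligned white masthead lights
    return m

print(encode_features([("white", 0.50, 0.30), ("white", 0.50, 0.45), ("red", 0.42, 0.60)]))
```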
3.4. Navigation Rules for Vessels in Different Encounter Situations at Night
This paper analyzes the navigation rules for various scenarios—including interactions with sailing vessels, overtaking, head-on, and crossing situations—based on the ship light configurations of 36 vessel types within the research scope. Notably, the collision avoidance responsibilities differ depending on the type of vessel.
3.4.1. Sailing Vessels
Two vertical all-round lights with red over green indicate a sailing vessel. When an approaching vessel is identified as a sailing vessel and its ship lights are observed, the two vessels are deemed to be in sight of one another. At this point, Section II of Part B of the COLREGs (conduct of vessels in sight of one another) comes into effect.
- (a)
If our vessel is also a sailing vessel, Rule 12 begins to apply. It states that when two sailing vessels are approaching one another so as to involve risk of collision, one of them shall keep out of the way of the other as follows:
(i) When each has the wind on a different side, the vessel that has the wind on the port side shall keep out of the way of the other.
(ii) When both have the wind on the same side, the vessel that is to windward shall keep out of the way of the vessel that is to leeward.
(iii) If the vessel with the wind on the port side sees a vessel to windward and cannot determine with certainty whether the other vessel has the wind on the port or on the starboard side, she shall keep out of the way of the other.
- (b)
If our vessel is a power-driven vessel, Rule 18 becomes applicable. According to Rule 18, a vessel underway and propelled by machinery shall keep out of the way of a sailing vessel. The method for a power-driven vessel to avoid a sailing vessel typically adheres to the following principles:
(i) If the sailing vessel is running before the wind, the power-driven vessel should pass astern of the sailing vessel.
(ii) If the sailing vessel is reaching across the wind, the power-driven vessel should pass to the windward side of the sailing vessel.
3.4.2. Overtaking
In this paper, all 36 vessel types under investigation are equipped with a single white stern light. At night, when only the stern light of another vessel directly ahead is observed, and no sidelights are visible, Rule 13 of the COLREGs applies. According to Rule 13, if only the stern light of the overtaken vessel is visible and neither of its sidelights can be seen, the situation shall be deemed an overtaking scenario. In such cases, the overtaking vessel is obligated to give way to the vessel being overtaken.
3.4.3. Head-On Situation
A power-driven vessel underway shall exhibit a white masthead light, a red sidelight, a green sidelight, and a white stern light. According to Rule 14 of the COLREGs, such a situation shall be deemed to exist when a vessel sees the other ahead or nearly ahead, and by night she would see the masthead lights of the other in a line or nearly in a line and/or both sidelights. In this case, two power-driven vessels should each alter course to starboard so as to pass well clear of each other’s port side.
3.4.4. Crossing Situation
According to Rule 15 of the Regulations, when two power-driven vessels are crossing so as to involve risk of collision, the vessel which has the other on her own starboard side shall keep out of the way and shall take action to avoid collision. That is to say, at night, when two vessels are in a crossing situation, the give-way vessel will only see the red sidelight of the stand-on vessel and not its green sidelight, while the stand-on vessel will only see the green sidelight of the give-way vessel and not its red sidelight.
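The rules above can be summarized in a simple decision sketch. The boolean inputs are assumed to come from the light recognition stage, and a real assessment would additionally consider relative bearing and whether risk of collision actually exists; this is only a rule-of-thumb illustration.

```python
def encounter_situation(stern_light_only: bool,
                        masthead_lights_in_line: bool,
                        red_sidelight: bool,
                        green_sidelight: bool) -> str:
    # Rule 13: only the stern light of the vessel ahead is visible, no sidelights
    if stern_light_only:
        return "overtaking (Rule 13): the overtaking vessel keeps out of the way"
    # Rule 14: masthead lights in a line and/or both sidelights visible
    if masthead_lights_in_line or (red_sidelight and green_sidelight):
        return "head-on (Rule 14): both vessels alter course to starboard"
    # Rule 15: only one sidelight of the other power-driven vessel is visible
    if red_sidelight:
        return "crossing (Rule 15): only her red sidelight is seen - own vessel gives way"
    if green_sidelight:
        return "crossing (Rule 15): only her green sidelight is seen - own vessel stands on"
    return "insufficient light information for a situation assessment"

print(encounter_situation(False, False, True, False))
```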
3.4.5. Responsibilities Between Vessels
Except where Rules 9, 10, and 13 otherwise require,
- (a)
A power-driven vessel underway shall keep out of the way of the following:
(i) a vessel not under command;
(ii) a vessel restricted in her ability to maneuver;
(iii) a vessel engaged in fishing;
(iv) a sailing vessel.
- (b)
A sailing vessel underway shall keep out of the way of:
(i) a vessel not under command;
(ii) a vessel restricted in her ability to maneuver;
(iii) a vessel engaged in fishing.
- (c)
A vessel engaged in fishing when underway shall, so far as possible, keep out of the way of the following:
(i) a vessel not under command;
(ii) a vessel restricted in her ability to maneuver.
- (d)
Any vessel other than a vessel not under command or a vessel restricted in her ability to maneuver shall, if the circumstances of the case admit, avoid impeding the safe passage of a vessel constrained by her draught in accordance with Rule 28.
3.5. The Creation of a Dataset for Ship Light Features
The ship lights feature dataset in this paper is constructed based on light recognition results. Specifically, guided by predefined light feature indicators, this paper builds a comprehensive dataset covering 36 vessel categories under four viewing angles: front, port side, starboard, and stern. This dataset encodes critical attributes such as light color, quantity, spatial arrangement, and combinatorial patterns. The detailed methodology for dataset creation, including data acquisition, annotation protocols, and experimental setup, is described in
Section 5.1 (Dataset and Experimental Environment).
4. A Determination Method for the Types of Approaching Vessels Based on Ship Light Features
In practical navigation, it is usually not feasible for a vessel to simultaneously observe the lights of an approaching vessel from four viewing angles: front, port side, starboard, and stern. Given the ship’s maneuverability and the requirement of safe distances, vessels often keep at considerable distances from one another. Under such circumstances, changes in viewing angles occur slowly, meaning that over extended periods, the light information obtained about an approaching vessel may only be from a single view. Since the exhibition of ship lights varies with different viewing angles, assessing which viewing angle the observed light features belong to is fundamental for determining the type of approaching vessel. To address this, distinct sub-modules are established for each of the four viewing angles to facilitate the determination of the types of approaching vessels. Therefore, the model comprises two main components: a viewing angle assessment module and a vessel type determination module for the approaching vessels. The viewing angle assessment module analyses the features of the ship lights to assess the viewing angle; once assessed, the system transitions to the corresponding sub-module for the assessed viewing angle, whereby the vessel type determination module conducts the final determination of the vessel type. A schematic diagram illustrating this model is shown in
Figure 7.
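A minimal sketch of this two-stage routing is shown below; the placeholder models and dictionary layout stand in for the trained viewing-angle assessment network and the four per-angle vessel type determination networks.

```python
import torch

ANGLES = ("front", "port", "starboard", "stern")

def determine_vessel_type(features: torch.Tensor,
                          angle_model: torch.nn.Module,
                          type_models: dict):
    """features: a (1, 1, 23) light feature vector built from the recognition results."""
    with torch.no_grad():
        angle_idx = int(angle_model(features).argmax(dim=1))         # stage 1: viewing angle assessment
        angle = ANGLES[angle_idx]
        type_idx = int(type_models[angle](features).argmax(dim=1))   # stage 2: per-angle vessel type model
    return angle, type_idx
```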
4.1. Viewing Angle Assessment Module
During the ship light feature extraction phase, this paper obtained the lighting configurations of 36 types of vessels under four different viewing angles and systematically organized these into feature indicators. An analysis of these feature indicators reveals that the exhibition patterns of ship lights vary across different viewing angles. Given that the ship light feature indicators form a one-dimensional vector, this paper opts for a 1D-CNN as the model for the viewing angle assessment module.
The concept of CNN was first introduced by Yann LeCun in 1998 as a nonlinear deep learning algorithm with powerful feature extraction capabilities [
25]. CNNs are highly efficient in processing grid-structured data and are commonly applied in fields such as computer vision, natural language processing, and remote sensing science [
26,
27,
28]. A typical CNN structure consists of an input layer, convolutional layers, activation functions, pooling layers, and fully connected layers. Convolutional layers extract local features from input data through sliding convolutions using multiple kernels to generate feature maps. Pooling layers downsample the convolved features to reduce the spatial dimensions of the data while preserving essential feature information. Fully connected layers integrate and classify the features extracted by the convolutional and pooling layers, ultimately outputting classification results or regression values. A 1D-CNN is a variant of CNNs, operating on the principles of convolution and pooling operations tailored for one-dimensional data. During the forward propagation phase, input data pass through convolutional layers to extract features, then through pooling layers to reduce dimensions, and finally through fully connected layers for classification or regression. In the backward propagation phase, gradient descent is used to minimize the error function, enabling the optimization and iterative updating of network parameters to refine weights. The convolution and pooling processes in a 1D-CNN are illustrated in
Figure 8.
To enhance the computational efficiency and classification accuracy of 1D-CNNs, this paper introduces a lightweight efficient channel attention (ECA) module [
29], which was proposed by Wang et al. in 2020. The core structure of the ECA module is illustrated in
Figure 9. By avoiding the dimensionality reduction operations present in traditional channel attention mechanisms, the ECA effectively mitigates the issue of feature information loss, thereby significantly boosting the model’s capability to extract critical features of ship lights. Moreover, the ECA employs an adaptive mechanism to determine the size of the one-dimensional convolution kernel $k$, the value of which is dynamically adjusted based on the number of input channels $C$, as calculated by Formula (5). This approach ensures optimal performance while minimizing the number of parameters, achieving a balance between efficiency and precision.

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \tag{5}$$

In the formula, $\gamma = 2$, $b = 1$, and $\left|\cdot\right|_{\mathrm{odd}}$ refers to taking the absolute value and then rounding to the nearest odd number.
In this paper, the ECA module is introduced between the first set of one-dimensional convolutional layers and the first max pooling layer to enhance the model’s capability in processing the features of ship lights.
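A PyTorch sketch of an ECA module applied to one-dimensional features is given below, with the kernel size computed by Formula (5); the class name and tensor shapes are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA1D(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Formula (5): adaptive kernel size from the number of channels
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 == 1 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (B, C, L) feature sequence
        y = x.mean(dim=2, keepdim=True)        # squeeze: global average pooling -> (B, C, 1)
        y = self.conv(y.transpose(1, 2))       # 1D conv across channels, no dimensionality reduction
        y = self.sigmoid(y.transpose(1, 2))    # channel weights in (0, 1)
        return x * y                           # re-weight the input channels

x = torch.randn(8, 32, 23)                     # batch of 23-dim light feature vectors, 32 channels
print(ECA1D(32)(x).shape)                      # torch.Size([8, 32, 23])
```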
To determine the network structure and parameters of the ECA-1D-CNN model, this paper selects micro-precision, macro-precision, macro-recall, micro-recall, and loss values from the test set as performance evaluation metrics for the model. The dataset used is the ship light feature dataset under mixed-viewing angles presented in
Section 5.1 (Dataset and Experimental Environment) of this paper.
4.1.1. Network Depth
The ECA-1D-CNN model discussed in this paper consists of an input layer, feature extraction layers, a flatten layer, and fully connected layers. As the core component of the model, the feature extraction layers each comprise a single convolutional layer, an ECA module, a pooling layer, and a batch normalization (BN) layer. In order to ascertain the optimal depth for these feature extraction layers, this paper conducted training on models with different depths. The outcomes of these experiments are illustrated in
Figure 10 and detailed in
Table 1.
As shown in
Figure 10, the model employing two layers of feature extraction exhibits a more stable loss function during the training process and ultimately converges to a lower loss value. Additionally, according to
Table 1, the model with two layers of feature extraction achieves higher precision on the test set and demonstrates a lower loss value. Therefore, this paper selects the two-layer feature extraction as part of the ECA-1D-CNN to leverage its advantages in stability and efficiency.
4.1.2. The Setting of BN Layer
BN is a technique utilized to enhance the training speed and stability of neural networks by normalizing the inputs of each mini-batch, thereby reducing internal covariate shift. In CNN, BN layers, especially when placed after fully connected layers or convolutional layers, can accelerate the training process and improve model performance. To determine whether to adopt BN layers, comparative experiments were conducted under consistent model parameters. The experimental results are illustrated in
Figure 11 and summarized in
Table 2.
As shown in
Figure 11, the model employing BN layers exhibits a more stable loss function during the training process and ultimately converges to a lower loss value. According to
Table 2, on the test set, the model with BN layers achieves higher precision and has a smaller loss value. Therefore, this paper opts to include BN layers as part of the model, placing them after the convolutional layers and before the pooling layers.
4.1.3. The Setting of Pooling Methods
The pooling layer can reduce feature dimensions and decrease computational load. In 1D-CNN, pooling methods are divided into max pooling and average pooling. Max pooling is capable of retaining the most significant features from the input feature map but may lose some detailed information. On the other hand, average pooling retains the global features of the input feature map but might blur the prominent features within the input data. To determine the appropriate pooling method, comparative experiments were conducted under consistent model parameters, with the experimental results shown in
Figure 12 and
Table 3.
As shown in
Figure 12 and
Table 3, the average pooling method shows a certain improvement over max pooling in terms of macro-precision, macro-recall, micro-precision, and micro-recall. Additionally, its loss value is lower. Max pooling excels at extracting local features, whereas average pooling demonstrates better performance in capturing global features. In the context of the ship light feature metrics established in this paper, each metric is considered to have equal importance, without any priority consideration. Moreover, since the indicator parameters used in this paper mostly consist of binary values (0 and 1) with minimal fluctuation, the average pooling method has been chosen in this paper.
4.1.4. The Setting of the Fully Connected Layer
The fully connected layer is placed after the flatten layer, which flattens the multi-dimensional feature maps into one-dimensional feature vectors. Subsequently, the fully connected layer performs a linear transformation on these vectors using a weight matrix and a bias vector, further combining and classifying these features. Given that the viewing angle assessment of ship lights is a four-class classification problem, the Softmax function is used to convert the outputs into a probability distribution, with each category’s probability indicating the likelihood of belonging to that particular viewing angle. To determine the optimal depth of the fully connected layer, comparative experiments were conducted under consistent model parameters, and the experimental results are shown in
Figure 13 and
Table 4.
As indicated by
Figure 13 and
Table 4, employing two fully connected layers in the model leads to a certain improvement in macro-precision, macro-recall, micro-precision, and micro-recall. Moreover, the loss value is smaller with this configuration. The experimental results demonstrate that using two layers for the fully connected section can enhance the model’s performance while ensuring its stability. Consequently, this paper opts for a two-layer fully connected layer as part of the model.
4.1.5. The Setting of Dropout
Dropout is a regularization technique widely employed in neural networks. It reduces the complex co-adaptation between neurons by randomly “dropping out” a subset of neurons with a certain probability, thereby preventing overfitting and enhancing the model’s generalization capability. In this paper’s model, Dropout values of 0.2 and 0.5 are considered, and the optimal value is selected through comparative experiments. The experimental results are shown in
Figure 14 and
Table 5.
As shown in
Figure 14 and
Table 5, when the Dropout is set to 0.2, there is a noticeable improvement in micro-precision, macro-precision, macro-recall, and micro-recall. Additionally, the loss value is smaller at this setting. The experimental results indicate that with a Dropout rate of 0.2, the model exhibits higher performance and better stability. Therefore, the Dropout value for the ECA-1D-CNN model is set to 0.2.
4.1.6. The Setting of ECA
The ECA module is a channel attention mechanism designed for deep convolutional neural networks, with its core idea being to avoid dimensionality reduction while maintaining appropriate cross-channel interaction, thereby significantly reducing model complexity while preserving performance. Experimental comparisons between 1D-CNN and ECA-1D-CNN were conducted, and the results are shown in
Table 6.
As shown in
Table 6, the introduction of the ECA module has increased the micro-precision by 3.1 percentage points, macro-precision by 3.4 percentage points, macro-recall by 3.3 percentage points, and micro-recall by 3.4 percentage points. Meanwhile, compared to the 1D-CNN, the ECA-1D-CNN has reduced the mean detection time by 4.9%. The adoption of the ECA avoids dimensionality reduction while employing one-dimensional convolution to achieve local cross-channel interaction, thereby lowering the model complexity. It maintains high model performance while reducing computational requirements.
In summary, the ECA-1D-CNN model presented in this paper consists of an input layer, two feature extraction layers, a flatten layer, and two fully connected layers. Each feature extraction layer comprises a convolutional layer, a BN layer, and an average pooling layer. Notably, an ECA module is integrated within the first feature extraction layer, with a Dropout rate set at 0.2. The architectural layout of this model is illustrated in
Figure 15.
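The following PyTorch sketch assembles this architecture; the channel counts, kernel sizes, and hidden width of the fully connected layers are assumptions, since the paper fixes only the overall structure (two Conv–BN–average-pooling feature extraction layers with ECA in the first, a flatten layer, two fully connected layers, and a Dropout rate of 0.2).

```python
import math
import torch
import torch.nn as nn

class ECA1D(nn.Module):
    """Compact ECA for (B, C, L) features, as sketched in Section 4.1."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 == 1 else t + 1
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        y = torch.sigmoid(self.conv(x.mean(2, keepdim=True).transpose(1, 2)).transpose(1, 2))
        return x * y

class ECA1DCNN(nn.Module):
    def __init__(self, num_features=23, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, 3, padding=1), ECA1D(32), nn.BatchNorm1d(32),
            nn.AvgPool1d(2),                               # first feature extraction layer (with ECA)
            nn.Conv1d(32, 64, 3, padding=1), nn.BatchNorm1d(64),
            nn.AvgPool1d(2),                               # second feature extraction layer
        )
        flat = 64 * (num_features // 4)                    # length after two average poolings
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.2),
            nn.Linear(flat, 64), nn.ReLU(),
            nn.Linear(64, num_classes),                    # Softmax is applied by the loss at training time
        )

    def forward(self, x):                                  # x: (B, 1, 23)
        return self.classifier(self.features(x))

model = ECA1DCNN()
print(model(torch.randn(16, 1, 23)).shape)                 # torch.Size([16, 4])
```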
4.2. Vessel Type Determination Module
Given that the viewing angle assessment module has categorized the ship’s light feature data into four classes based on viewing angles, this paper’s vessel type determination module for approaching vessels is composed of four sub-modules. As illustrated in
Figure 7, each viewing angle corresponds to an individual vessel type determination model. The 1D-CNN, the feed-forward neural network (FNN), and the deep neural network (DNN) are three foundational and commonly used deep learning algorithms, and they are therefore selected for vessel type determination in this paper. The optimal algorithm among the 1D-CNN, FNN, and DNN is then selected for each viewing angle through comparative experimental analysis. The experimental results are shown in
Table 7. The four datasets used are the fixed-viewing angle ship light feature dataset described in
Section 5.1 (Dataset and Experimental Environment) of this paper.
As shown in the first part of
Table 7, the performance of 1D-CNN is superior under the frontal viewing angle. The macro-precision and micro-precision of 1D-CNN are 1.2 to 1.8 percentage points higher than those of FNN and DNN. The macro-recall of 1D-CNN is 0.4 to 0.5 percentage points higher than that of FNN and DNN. The micro-recall of 1D-CNN is 2.2 to 2.5 percentage points higher than that of FNN and DNN. According to the second part of
Table 7, under the port-side viewing angle, DNN performs better. The macro-precision of DNN is 4.3 to 5.0 percentage points higher than that of 1D-CNN and FNN. The micro-precision of DNN is 1.1 to 1.5 percentage points higher than that of 1D-CNN and FNN. The macro-recall of DNN is 4.4 to 5.0 percentage points higher than that of 1D-CNN and FNN. The micro-recall of DNN is 1.7 to 2.1 percentage points higher than that of 1D-CNN and FNN. From the third part of
Table 7, it can be seen that in the starboard-side viewing angle, DNN also outperforms others. The macro-precision of DNN is 3.1 percentage points higher than that of 1D-CNN and 5.9 percentage points higher than that of FNN. The micro-precision of DNN is 1.5 percentage points higher than that of 1D-CNN and 3.0 percentage points higher than that of FNN. The macro-recall of DNN is almost identical to that of 1D-CNN with a difference within 0.02 percentage points, and it is 6.5 percentage points higher than that of FNN. The micro-recall of DNN is nearly the same as that of 1D-CNN, with a difference within 0.1 percentage points, and it is 1.2 percentage points higher than that of FNN. The results in the fourth part of
Table 7 indicate that under the stern viewing angle, the performance of 1D-CNN, FNN, and DNN is poor, leading to failure of the vessel type determination model. In this view, the values for macro-precision, macro-recall, micro-precision, and micro-recall are all less than 66%, which is significantly lower compared to the other three viewing angles.
In the frontal and side views, the display of ship lights varies among different vessels, with each ship’s light features being unique. However, in the stern view, different types of vessels may exhibit identical lights, making their light features non-unique. For example, a towing vessel, regardless of whether its length exceeds 50 m or its tow length exceeds 200 m, will always show only one white stern light and one yellow towing light from the stern view. Vessels with identical light features are therefore grouped into one class. When such a class label is identified, the specific type of vessel cannot be determined, but all potential vessel types that could display those light features can be. By combining the observed ship light displays and light feature metrics under the stern view, the 36 classes of ships labeled
N1–N36 have been reclassified into 19 distinct labels, as shown in
Table 8.
According to
Table 8, the stern view ship light feature dataset was relabeled, and experimental validation analysis was conducted. The experimental results are shown in
Table 9. As indicated by
Table 9, under the stern view, the performance of the 1D-CNN model is superior. The macro-precision and micro-precision of 1D-CNN are 0.7 percentage points higher than that of DNN and 1.0 to 2.1 percentage points higher than that of FNN. The macro-recall of 1D-CNN is similar to that of FNN and DNN, with differences kept within 0.3 percentage points. The micro-recall of 1D-CNN is 0.7 percentage points higher than that of FNN and 0.2 percentage points higher than that of DNN.
In summary, in accordance with the comparative experimental results, 1D-CNN is finally selected for the front viewing angle and stern viewing angle, and DNN is finally selected for the port-side viewing angle and starboard viewing angle.
5. Experimental Validation Analysis
5.1. Dataset and Experimental Environment
The experiments conducted in this paper verify the effectiveness of two models: the ship light recognition model based on the improved YOLOv8 and the vessel type determination model based on ship light features. Each part of the experiment requires its corresponding dataset.
The verification of the ship light recognition model based on the improved YOLOv8 requires a dataset of ship light images. The verification of the vessel type determination model, which comprises a viewing angle assessment module and a vessel type determination module, requires two types of ship light feature datasets. Specifically, the viewing angle assessment module requires a dataset with mixed-viewing-angle ship light features, while the vessel type determination module requires datasets from four fixed viewing angles: the front viewing angle, port-side viewing angle, starboard viewing angle, and stern viewing angle. However, existing datasets are mainly designed for conventional ship detection tasks and are not suitable for light recognition and vessel type determination. Therefore, new datasets need to be created.
The ship light image dataset comprises a total of 4500 JPG images, consisting of two parts: photographs taken of vessels at sea and images of ship lights sourced from the internet. The photographs taken of vessels at sea were captured by the M/V SITC ZHAOMING along the coast of Japan, totaling 3471 images, while the internet-sourced ship light images amount to 1029. The mixed-viewing angle ship light feature dataset is stored as a CSV file, with data format consistent with
Appendix A, containing 2260 samples. This dataset was established based on the ship light recognition results presented in this paper; the recognized images can be processed according to the ship light feature metrics outlined in
Section 3.3 (The Features of Ship Lights) of this paper to generate the dataset. The viewing angles are labeled as “0–3,” representing the front viewing angle, port-side viewing angle, starboard viewing angle, and stern viewing angle, respectively. The four single-viewing-angle ship light feature datasets are also stored as CSV files, adhering to the format specified in
Appendix A, with each dataset containing 500 samples to ensure experimental accuracy. These datasets were likewise built upon the ship light recognition results of this paper, with the same feature processing procedure as described above. Unlike the mixed-viewing-angle dataset, they are labeled by vessel type, where “0–35” denote the 36 categories of ships examined in this paper.
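To illustrate how a single-viewing-angle dataset of this form can be consumed, the sketch below loads one CSV file and produces a stratified train/test split. The file and column names (f1–f23, label) are assumptions for illustration and are not prescribed by Appendix A.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column layout: 23 light feature indicators plus one vessel type label column.
feature_cols = [f"f{i}" for i in range(1, 24)]

df = pd.read_csv("port_side_features.csv")   # one of the four single-view CSV files (hypothetical name)
X = df[feature_cols].to_numpy(dtype="float32")
y = df["label"].to_numpy()                   # vessel type labels 0-35

# A stratified split keeps the class distribution comparable between training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)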
The experimental hardware environment consists of an Intel Core i7-12700H CPU and an NVIDIA GeForce RTX 3080 Ti GPU. The software environment is Python 3.9.7 with CUDA 11.7, and the deep learning framework used is PyTorch 1.12.1. In this paper, the number of training epochs is set to 400 for all experiments, with a batch size of 16.
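As a point of reference, a comparable training run can be configured with the ultralytics API as sketched below. The custom model YAML that registers the DWR, BiFPN, and HAT modules, and the dataset YAML, are assumed placeholders and are not the paper’s actual files.

from ultralytics import YOLO

# Hypothetical custom model definition wiring DWR, BiFPN, and HAT into the YOLOv8 architecture.
model = YOLO("yolov8_dwr_bifpn_hat.yaml")

# Training settings matching the paper: 400 epochs, batch size 16, single RTX 3080 Ti GPU.
model.train(
    data="ship_lights.yaml",   # hypothetical dataset config for the 4500-image light dataset
    epochs=400,
    batch=16,
    device=0,
)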
5.2. Analysis of Ship Light Recognition
To verify the effectiveness of the ship light recognition model, this paper conducts ablation experiments and a comparative analysis of ship light recognition results. Precision, recall, and mean average precision at IoU threshold 0.5 (mAP@0.5) are selected as performance metrics to accurately compare the performance among different models. The results are presented in
Table 10 and
Figure 16.
From
Table 10, it can be seen that, compared with the baseline model, the introduction of DWR results in an improvement of 2.4% in precision, 2.8% in recall, and 6.0% in mAP@0.5. The further introduction of BiFPN enhances precision by an additional 2.6%, recall by 2.9%, and mAP@0.5 by 3.1%. With the incorporation of HAT, precision is further boosted by 3.1%, recall by 3.0%, and mAP@0.5 by 2.3%. According to
Table 10 and
Figure 16, after implementing all the improvements, the final improved model in this paper achieves significant enhancements compared to the baseline model, specifically an 8.1% increase in precision, 8.7% in recall, and 11.4% in mAP@0.5.
To verify the effectiveness of the improved YOLOv8-based ship light recognition model under various circumstances, ship light images from the dataset that contain interference lights, shadows, sea surface reflections, and other environmental factors are selected for experiments. In
Figure 17, the experimental image “ship2” contains a large number of interfering lights from the subject vessel itself, as well as extensive light and shadow effects, and the experimental image “ship_navigation_lights1216” contains noticeable sea surface reflections. The experimental images “ship2” and “ship_navigation_lights1216” were experimentally recognized by both the baseline YOLOv8 model and the improved YOLOv8 model of this paper, with the results shown in
Figure 17b and
Figure 17c, respectively. It can be observed that the recognition accuracy of the baseline YOLOv8 model is relatively low, while the recognition accuracy of the improved YOLOv8 model has been significantly enhanced.
To verify the effectiveness of the improved YOLOv8 compared with other algorithms, experiments for light recognition were carried out using YOLOv3, SSD, Faster-RCNN, YOLOv5, YOLOv6, YOLOv7, YOLOv8, and improved YOLOv8. The experimental results are shown in
Table 11. According to
Table 11, the YOLO series algorithms exhibit superior ship light recognition performance compared to SSD and Faster-RCNN. The improved YOLOv8 stands out within the YOLO series with the highest recognition accuracy, showing an 8–17 percentage point increase in precision and an 11–20 percentage point increase in mAP@0.5 over the baseline models within the YOLO series, demonstrating a significant advantage.
In summary, the improvements to YOLOv8 have significantly enhanced the model’s capability to detect small targets, achieving high recognition accuracy specifically in ship light recognition. In the improved model, DWR employs a two-step residual feature extraction method using depthwise dilated convolutions with varying dilation rates to expand the receptive field. This improves the model’s ability to extract features such as the color, geometric shape, and spatial distribution of ship lights, thereby raising the recognition accuracy for small targets like ship lights. The introduction of BiFPN, together with the direct cross-layer connections added in this work, not only improves the model’s capability to recognize distant ship lights but also reduces the impact of interfering lights from the host ship on the recognition results; the bidirectional paths and cross-layer connections facilitate better feature integration across scales, which is crucial for accurately recognizing small or distant objects. Additionally, HAT, which merges channel attention and self-attention mechanisms, enables super-resolution reconstruction of the input images. This not only boosts the model’s performance in recognizing small target ship lights but also diminishes the effects of light and shadow, as well as sea surface reflections, on the recognition outcomes. The attention mechanisms allow the model to focus on relevant features while suppressing noise and irrelevant information, leading to more accurate and reliable detection.
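To make the fusion idea concrete, the sketch below shows a BiFPN-style weighted fusion layer in PyTorch with learnable, non-negative fusion weights. It is a minimal illustration of the mechanism, not the exact layer configuration used in the improved model.

import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion of same-shape feature maps (BiFPN-style)."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # feats: list of tensors with identical shape [B, C, H, W]
        w = torch.relu(self.weights)            # keep the learnable fusion weights non-negative
        w = w / (w.sum() + self.eps)            # normalize so the weights sum to roughly one
        return sum(wi * f for wi, f in zip(w, feats))

# Example: fuse a top-down feature, a same-scale backbone feature, and a cross-layer skip.
fuse = WeightedFusion(num_inputs=3)
fused = fuse([torch.randn(1, 64, 80, 80) for _ in range(3)])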
5.3. Analysis of Determination of Ship Types
Traditional methods for discriminating the types of approaching vessels often rely on two-dimensional data such as video or images for ship recognition and classification. In contrast, the method proposed in this paper extracts one-dimensional feature data based on ship light recognition and employs a two-step approach of “viewing angle assessment-vessel type determination (VAA-VTD)” for classification. This method introduces two key innovations: (1) the use of one-dimensional light feature data, and (2) the incorporation of a “viewing angle assessment module”. To validate the feasibility of vessel classification using one-dimensional light feature data, Experiment 1 compares the proposed method with conventional image-based vessel recognition and classification approaches. Additionally, to assess the effectiveness of the “viewing angle assessment module”, Experiment 2 contrasts the proposed method with the vessel classification approaches that do not incorporate this module. The evaluation metrics employed are macro-precision, macro-recall, micro-precision, and micro-recall. The experimental results are presented in
Table 12 and
Table 13.
Experiment 1: In this experiment, YOLOv3, SSD, Faster-RCNN, YOLOv5, YOLOv6, YOLOv7, and YOLOv8 models were used for vessel recognition and classification on the ship light image dataset. These models represent common image-based object detection and recognition methods. Meanwhile, the model proposed in this paper was tested on the mixed-viewing-angle light feature dataset for approaching vessel type determination.
Experiment 2: This experiment compares the model proposed in this paper against OvR-SVM, FNN, DNN, and 1D-CNN for vessel type determination on the mixed-viewing-angle light feature dataset. These baselines are standard one-dimensional data classification approaches that do not include the viewing angle assessment module.
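For orientation, an OvR-SVM baseline of the kind used in Experiment 2 can be expressed with scikit-learn as below; the synthetic data merely stands in for the 2260-sample mixed-viewing-angle feature dataset and does not reproduce the reported results.

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_score

# Synthetic stand-in: 23 binary light feature indicators, 36 vessel type labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2260, 23)).astype("float32")
y = rng.integers(0, 36, size=2260)

# One-vs-rest SVM without any viewing angle assessment stage.
clf = make_pipeline(StandardScaler(), OneVsRestClassifier(SVC(kernel="rbf", C=1.0)))
clf.fit(X, y)
y_pred = clf.predict(X)
print("macro-precision:", precision_score(y, y_pred, average="macro", zero_division=0))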
According to
Table 12, in the “Comparison experiment on whether to use one-dimensional ship light features”, the model proposed in this paper demonstrated superior performance on both macro and micro metrics, with an improvement of 22–37 percentage points on macro metrics and 20–31 percentage points on micro metrics over the other models. Mainstream ship recognition and determination models currently rely heavily on images and require clear visibility of the ship’s main body. Ship lights, however, are exhibited under various circumstances such as dawn/dusk, fog, and night. At night, when other vessels are distant or their lights are dim, it becomes difficult to observe the vessel hull, leaving only the ship lights detectable. Image-based vessel detection and determination methods typically rely on the hull as the primary visual feature rather than the ship lights; while they perform well when the hull is observable, they struggle to produce reliable results when only ship lights are visible. Since the dataset of this paper contains a significant number of such challenging nighttime images, models like YOLOv3 exhibit lower recognition accuracy. In contrast, the model proposed in this paper performs type determination based on one-dimensional ship light features and is therefore unaffected by the visibility of the ship’s main body, resulting in higher accuracy. As shown in
Table 13, in the “Comparison experiment on whether to use viewing angle assessment”, the proposed model achieved higher scores on both macro and micro metrics, with a 16–43 percentage point improvement in macro metrics and a 12–43 percentage point improvement in micro metrics compared to the other models. This is because the same vessel exhibits distinct ship light patterns across the four viewing angles, and the feature data from each angle can represent the vessel, but these ship light feature data lack uniqueness. Performing vessel type determination directly under such conditions severely disrupts the training of the model weights and degrades determination performance. By introducing the viewing angle assessment module, which employs dedicated classification models for the different viewing angles, the non-uniqueness of the ship light features is resolved and classification accuracy is significantly enhanced. Since baseline methods such as OvR-SVM omit the viewing angle assessment module, their determination accuracy is markedly lower, whereas the model proposed in this paper integrates this module and achieves superior performance.
The vessel type determination method based on ship light features proposed in this paper comprises four stages: approaching vessel navigation light recognition, feature extraction of navigation lights, viewing angle determination, and vessel type classification. Taking
Figure 18 as an example, the process unfolds as follows.
For an approaching vessel, the recognition result identifies one white light and one red light.
Based on the ship light feature indicators in this paper, feature extraction is performed on the recognition results, extracting the features “a single white light exists independently of fixed combinations” (denoted as M1) and “a single red light” (denoted as M2). Here, M1 and M2 are set to 1, while the remaining 21 indicators are all set to 0.
The 1D feature vector is input into the viewing angle assessment module, which identifies the viewing angle as port-side.
Under the port-side viewing angle, vessel type determination is conducted, and the approaching vessel is determined to be a power-driven vessel underway with a length of less than 50 m. Having determined that the approaching vessel is a power-driven vessel, the observed lights can be interpreted as a white masthead light and a red port sidelight. At this point, our vessel and the approaching vessel are in a crossing situation; according to Rule 15 of the COLREGs, our vessel is the give-way vessel and must keep out of the way of the approaching vessel. In
Figure 18, the proposed approaching vessel type determination method is applied, and information regarding the type, size, dynamics, propulsion method, and other related attributes of the approaching vessel is obtained. This information further clarifies the current encounter situation, defining the collision avoidance responsibilities for the vessel.
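The four stages of this worked example can be condensed into the short Python sketch below. The DummyModel wrapper, the indicator positions of M1 and M2, and the numeric labels are placeholders for illustration only and do not correspond to the paper’s trained ECA-1D-CNN, 1D-CNN, and DNN models.

import numpy as np

NUM_FEATURES = 23
M1_IDX, M2_IDX = 0, 1   # hypothetical positions of indicators M1 and M2 in the feature vector

class DummyModel:
    """Placeholder standing in for a trained classifier (viewing angle or vessel type)."""
    def __init__(self, fixed_label: int):
        self.fixed_label = fixed_label
    def predict(self, x: np.ndarray) -> np.ndarray:
        return np.full(len(x), self.fixed_label)

view_model = DummyModel(1)                             # pretend output: 1 = port-side viewing angle
type_models = {0: DummyModel(0), 1: DummyModel(0),     # per-view type classifiers
               2: DummyModel(0), 3: DummyModel(0)}     # (1D-CNN for front/stern, DNN for the sides)

# Stages 1-2: recognition yields one white and one red light; build the 23-dimensional indicator vector.
features = np.zeros((1, NUM_FEATURES), dtype=np.float32)
features[0, [M1_IDX, M2_IDX]] = 1.0                    # M1 = M2 = 1, remaining 21 indicators = 0

# Stage 3: viewing angle assessment (0 front, 1 port side, 2 starboard, 3 stern).
view = int(view_model.predict(features)[0])

# Stage 4: vessel type determination with the model dedicated to that viewing angle.
vessel_type = int(type_models[view].predict(features)[0])
print("viewing angle:", view, "vessel type label:", vessel_type)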
6. Discussion
Ship light recognition serves as the foundation for determining approaching vessel types. Only by extracting features such as color, combination, spatial distribution, and positional information from the observed ship lights can the vessel type and related attributes be determined. During the recognition stage, however, interference from background lights (e.g., deck lights, ambient lighting) significantly impacts the detection and classification of ship lights. The COLREGs explicitly define the combination patterns, spatial arrangements, and installation positions of ship lights, endowing them with structured features. In contrast, interference lights lack regulatory constraints, exhibit no discernible patterns, and possess only basic color characteristics. By leveraging the combination, spatial distribution, and positional features of ship lights, the influence of interference lights can be mitigated. Traditional YOLOv8 models exhibit limited capability in extracting features from small targets like ship lights, often confusing them with interference lights, resulting in low recognition accuracy. The improved YOLOv8 model proposed in this paper addresses this by integrating DWR, BiFPN, and HAT modules, enhancing small object detection performance. This enables the model to extract sufficient color, combination, spatial, and positional features, achieving higher recognition accuracy. Despite these improvements, interference lights near ship lights (e.g., adjacent or partially overlapping) can invalidate spatial and positional features, leading to misclassification. This remains a critical challenge for achieving further accuracy gains and will be addressed in subsequent research.
During the process of capturing ship light image data aboard the SITC ZHAOMING vessel, we integrated data from marine radar and the AIS to perform precise annotations of vessel types, sizes, and other relevant information for the photographed ships. The experiments described in this paper were conducted onshore after data collection, and we did not use a maritime simulator for our experiments. While maritime simulators can replicate realistic navigational environments—even allowing users to observe the ship lights of approaching vessels—additional development efforts are required to integrate our improved YOLOv8-based vessel type classification method (leveraging navigation light features) into such simulators. This integration is necessary to ensure seamless interaction between the autonomous vessel classification system and the simulator’s operational framework. This will be addressed as part of our future research.
In practical maritime operations, the visibility and configurations of ship lights from other vessels can be affected by various factors such as environmental conditions and observation angles. For example, the motion amplitude of a ship can greatly increase due to waves and typhoons, hindering the accurate observation of the combination and positional information of ship lights from approaching vessels. Furthermore, to deploy the proposed model onboard vessels, model lightweighting and computational efficiency optimization are essential to meet real-time processing requirements in dynamic maritime environments. This will be addressed as part of our future research.
Whether it is marine radar, AIS, or the ship type determination model studied in this paper, the ultimate goal is to provide data support for a vessel’s situational awareness and collision avoidance decisions. In future research, we will integrate the data obtained from AIS, marine radar, and the model presented in this paper to achieve multimodal data integration. By doing so, we can fully leverage the advantages of each modality, overcome the limitations of a single data source, and provide more accurate information support for assessing the encounter situation of vessels and determining collision avoidance responsibilities.
7. Conclusions
To enrich the methods of acquiring approaching vessel information in complex circumstances, this paper proposes an autonomous determination method based on an improved YOLOv8 model and multi-view ship light features. The main contributions are as follows. (1) Ship Light Recognition Model Improvement: By introducing a DWR module, BiFPN, and HAT, the detection capability of YOLOv8 for small target ship lights was significantly enhanced. The DWR module enhances feature extraction through multi-scale dilated convolutions. BiFPN optimizes feature fusion efficiency with cross-layer connections. HAT improves small target detection ability and mitigates interference from complex backgrounds using a hybrid attention mechanism. Consequently, the mAP@0.5 of the ship light recognition model reached 94.7%, an improvement of 11.4% over the baseline model. (2) Dataset Construction: Based on the COLREGs, a systematic analysis of the light exhibition patterns for 36 types of vessels across four viewing angles was conducted. A multi-view dataset encompassing 144 light combinations was constructed, and 23 feature indicators were proposed to quantify the spatial distribution and semantic information of ship lights, providing a data foundation for vessel type determination models. (3) Determination Framework Innovation: An innovative VAA-VTD two-stage model was proposed. Firstly, ECA-1D-CNN achieved viewing angle determination with a precision of 95.79%. Subsequently, the model proposed by this paper dynamically selects the optimal determination algorithm based on the viewing angle assessment results of ship light features. 1D-CNN is applied for front and stern viewing angles, while DNN is utilized for port-side and starboard viewing angles. Ultimately, the approaching vessel type determination accuracy of the model proposed in this paper reaches 92.49%, representing an improvement of 22 to 37 percentage points over image-based vessel detection and classification methods.
The experimental results demonstrate that the approaching vessel type determination method based on ship light features, as proposed in this paper, maintains high robustness and achieves considerable accuracy under complex conditions such as nighttime and twilight. This paper provides theoretical foundations and technical references for the development of autonomous navigation and intelligent collision avoidance systems for vessels.