Article

Deep Learning for Visual Leading of Ships: AI for Human Factor Accident Prevention

by Manuel Vázquez Neira, Genaro Cao Feijóo, Blanca Sánchez Fernández and José A. Orosa *

Department of Navigation Sciences and Marine Engineering, University of A Coruña, Paseo de Ronda, 51, 15011 A Coruña, Spain

* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8261; https://doi.org/10.3390/app15158261
Submission received: 21 June 2025 / Revised: 11 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025

Abstract

Traditional navigation relies on visual alignment with leading lights, a task typically monitored by bridge officers over extended periods. This process can lead to fatigue-related human factor errors, increasing the risk of maritime accidents and environmental damage. To address this issue, this study explores the use of convolutional neural networks (CNNs), evaluating different training strategies and hyperparameter configurations to assist officers in identifying deviations from proper visual leading. Using video data captured from a navigation simulator, we trained a lightweight CNN capable of advising bridge personnel with an accuracy of 86% during night-time operations. Notably, the model demonstrated robustness against visual interference from other light sources, such as lighthouses or coastal lights. The primary source of classification error was linked to images with low bow deviation, largely influenced by human mislabeling during dataset preparation. Future work will focus on refining the classification scheme to enhance model performance. We (1) propose a lightweight CNN based on SqueezeNet for night-time ship navigation, (2) expand the traditional binary risk classification into six operational categories, and (3) demonstrate improved performance over human judgment in visually ambiguous conditions.

1. Introduction

In light of past maritime accidents, international conventions, such as SOLAS (Safety of Life at Sea [1]) and MARPOL (Maritime Pollution [2]), have been continuously updated to pursue two main goals: safeguarding the lives of seafarers and minimizing marine pollution. These conventions define minimum training and certification requirements for ship personnel, covering the full career progression from cadet to Master or Chief Engineer. These standards are codified in the STCW Convention (International Convention on Standards of Training, Certification and Watchkeeping for Seafarers) [3], which includes specific competencies for different operational areas, such as marine engineering, bridge operations, and electro-technical systems.
To meet these requirements, advanced maritime simulators have been developed by companies like TRANSAS [4], providing high-fidelity training environments that replicate real shipboard conditions. These simulators are particularly valuable for training officers in scenarios that are rare or difficult to replicate at sea.
Among the many navigation techniques practiced, leading lights—also known as range lights—remain essential. They offer reliable visual guidance, especially in narrow channels or riverine environments. Despite advances in satellite-based navigation, leading lights continue to be widely used due to their accuracy and resilience under various environmental conditions.
However, this method requires continuous visual monitoring by the bridge officer, particularly during night shifts. Over time, this sustained attention can lead to fatigue and degraded performance—factors that contribute significantly to maritime incidents [5,6,7,8,9].
In response to this challenge, the present study investigates the potential of artificial intelligence (AI) to support officers in maintaining visual alignment during night navigation. Specifically, we propose a deep learning model based on convolutional neural networks (CNNs) trained to recognize correct and incorrect alignment with leading lights, using video data collected from a navigation simulator.
This work makes three key contributions: (1) the development of a lightweight CNN adapted to low-light maritime conditions; (2) an expanded classification scheme that distinguishes between six navigation states, improving interpretability; and (3) experimental validation showing that the proposed model can outperform human observation under visually ambiguous conditions.

2. Materials and Methods

2.1. Transas NTPRO 5000 Navigation Simulator

The Higher Technical School of Nautical and Marine Engineering at the University of A Coruña (UDC) is equipped with the Navi-Trainer Professional 5000 [10,11], a state-of-the-art simulator that replicates a full navigation bridge. Developed by Transas, this system is primarily used to train deck students preparing to become future officers and captains.
Thanks to its advanced capabilities, it supports the training and certification of watch officers, chief mates, masters, and pilots operating merchant and fishing vessels over 500 gross tons, in accordance with the IMO STCW 78/95 Convention (International Maritime Organization). It is particularly effective for specialized training courses and is also a valuable tool for reconstructing and analyzing complex navigational scenarios, including real-life maritime emergencies.
As shown in Figure 1, the NTPRO 5000 combines hardware and software to create highly realistic navigation bridge environments. The system is managed from an instructor’s station and operates over a local network of interconnected standard personal computers.
The simulator software includes multiple modules: a network operation manager; systems for computing mathematical models of own ships, target vessels, drifting objects, tugboats, 3D wind-generated sea states, mooring lines, and fenders; the main instructor interface; conning displays; visual channels; and interfaces with actual ship equipment.
It also features preloaded and regularly updated databases, including a 3D visual scene library for various maritime regions, radar scene libraries, and a broad range of vessel-specific mathematical models.
This simulator holds a Statement of Compliance with the Class A Standard for the Certification of Maritime Simulators (Standard No. 2.14) [12], issued by DNV (Det Norske Veritas) in accordance with STCW Convention Regulation I/12.
One of its key advantages is the instructor’s ability to fully customize training scenarios. They can configure different ship types, navigation zones, weather and sea conditions, and time of day, enabling a highly realistic visual and acoustic simulation. Properly calibrated ship models ensure that vessel behavior during navigation, berthing, towing, and other port operations reflects real-world dynamics, even under adverse conditions.

2.2. Convolutional Neural Networks: Transfer Learning and Hyperparameters

This initial study demonstrates that the course-keeping of a merchant vessel during night-time navigation, which is still commonly guided by leading lights, can be improved. Because this task depends on sustained visual monitoring by the bridge officer, the human factor becomes critically important, with direct implications for fuel efficiency, operational delays, and navigational safety. To address this, we tested different neural network training strategies. The first approach was based on transfer learning, while the second involved adjusting hyperparameters such as the learning rate and the number of iterations needed to ensure convergence of the gradient descent algorithm used throughout the training process.
The transfer learning approach makes use of a pre-trained network that is adapted to a specific image recognition task. In this case, we selected SqueezeNet, a convolutional neural network with 18 layers, which is available in a pre-trained version trained on over one million images from the ImageNet database. These reference images include vehicles and everyday objects, offering a solid foundation for transfer learning in related visual tasks. Therefore, the purpose of this approach is to leverage the feature recognition capabilities already learned by SqueezeNet and fine-tune it with domain-specific images, reducing training time while maintaining accuracy.
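As an illustration of this approach, the following sketch shows how a pre-trained SqueezeNet can be adapted to the three navigation classes used here. It is written in PyTorch purely for illustration (the original experiments appear to rely on a different deep learning toolbox), and the weight identifiers and layer indices refer to the torchvision implementation rather than to the authors’ setup.

```python
# Illustrative transfer-learning setup: SqueezeNet pre-trained on ImageNet,
# convolutional base frozen, classifier replaced with a 3-class head
# (OK, NOT OK, DOUBT). PyTorch/torchvision shown only as an example.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # OK, NOT OK, DOUBT

# Load SqueezeNet v1.1 with ImageNet weights.
model = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor so its general features are retained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the 1000-class convolutional classifier with a new 3-class head.
model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)
model.num_classes = NUM_CLASSES
```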
For the training setup, the convolutional base of SqueezeNet (up to the final fire module) was frozen to retain general feature extraction, while the classifier layers were replaced and fine-tuned using our simulator-acquired dataset. The model was trained with a batch size of 32 using the Adam optimizer (learning rate = 0.0005, β1 = 0.9, β2 = 0.999), and a stepwise learning rate decay was applied every 60 epochs. Although weight decay was not included in the baseline, L2 regularization (λ = 0.01) was tested to reduce overfitting. Dropout (p = 0.5) was applied to the fully connected layers, and data augmentation techniques, such as horizontal flips, brightness shifts, and random cropping, were used to increase variability in night-time lighting conditions.
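A minimal sketch of this training configuration is given below, again in PyTorch for illustration only. The Adam settings and augmentation types follow the description above; the step-decay factor (gamma = 0.1) and the crop size are assumed values not stated in the text.

```python
# Sketch of the training configuration described above (illustrative values
# are marked as assumptions in the comments).
from torch import optim
from torchvision import transforms

# Adam with the stated learning rate and betas; weight_decay reproduces the
# optional L2 regularization (lambda = 0.01) tested against the baseline.
optimizer = optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=5e-4, betas=(0.9, 0.999), weight_decay=0.01,
)

# Stepwise learning rate decay every 60 epochs (decay factor 0.1 is assumed).
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

# Dropout (p = 0.5) is already part of the SqueezeNet classifier head.
# Augmentation for night-time frames: horizontal flips, brightness shifts and
# random crops, as described above (parameter values are illustrative).
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```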
SqueezeNet was chosen over other lightweight CNNs (e.g., MobileNet, EfficientNet) due to its minimal parameter count (~1.25 M), fast inference, and low memory footprint, which are critical for real-time performance and potential integration into onboard navigation systems with constrained computational resources.

Model Selection Justification

Convolutional neural networks (CNNs) were selected for this study due to their well-established capacity to extract spatial features from image-based data, particularly in low-light environments where leading lights may be partially occluded or distorted by ambient illumination. Although alternative architectures, such as Fully Connected Networks (FCNs), Recurrent Neural Networks (RNNs), and Transformer-based models, were considered, CNNs provided the best trade-off between accuracy, inference speed, and real-time feasibility for onboard assistance systems. The following table (Table 1) summarizes the comparative strengths and weaknesses of each architecture:
We selected SqueezeNet, a compact CNN with significantly fewer parameters than larger networks, such as ResNet or Inception, which allowed faster training and inference while preserving accuracy. Transfer learning with ImageNet-pre-trained weights also accelerated convergence and improved generalization.

2.3. Video Recording

The video recording process was carried out using a tripod-mounted camera placed at the pilot’s expected eye level, ensuring a realistic perspective from the navigation bridge. The camera captured footage at a resolution of 16 megapixels. To replicate realistic night-time conditions, the simulator bridge lighting was dimmed, as is standard practice on real merchant vessels during night operations.
A professional pilot with over 15 years of experience performed the navigation maneuvers. Initially, a scenario with correct leading alignment at a distance of 2 nautical miles was recorded. This was followed by controlled port and starboard deviations, generating diverse image samples for each condition. Subsequently, navigation sequences from greater distances (10 nautical miles or more) were recorded to capture both correct and incorrect leading perspectives, including cases where only a single light could be perceived due to the distance.
The bow’s position within the image frame is a critical reference for assessing visual alignment. Therefore, as shown in Figure 2 and Figure 3, the bow was consistently placed in the center of the main display screen during all recordings. As will be discussed in the following sections, the video footage was later segmented into still images with a resolution of 1280 × 720 pixels for processing.
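The following sketch illustrates how the recorded video can be segmented into 1280 × 720 still frames, as described above. OpenCV is used purely as an example; the file naming and sampling step are assumptions rather than the authors’ actual procedure.

```python
# Minimal sketch: segment a recorded video into 1280x720 still frames.
import cv2
import os

def extract_frames(video_path: str, out_dir: str, step: int = 1) -> int:
    """Save every `step`-th frame of the video as a 1280x720 PNG image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame = cv2.resize(frame, (1280, 720))
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```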

2.4. Image Classification

Once the video sequences were recorded under various navigation conditions—including both correct and incorrect alignments at short and long distances—they were converted into still images. Approximately 6000 frames were extracted per video, resulting in a dataset of around 20,000 images.
An initial classification was performed using three categories:
  • Correct navigation (Figure 4 and Figure 5),
  • Incorrect navigation (Figure 6), and
  • Uncertain cases, where the leading lights were difficult to identify due to interference from other light sources on the horizon (Figure 7).
Correct alignment was defined as the presence of two grey lights positioned vertically, one above the other, clearly indicating the correct course at closer distances. At greater distances (e.g., over 10 nautical miles), these lights may appear as a single point due to perspective. Additionally, the presence of nearby lighthouses could introduce a secondary bright light, typically located higher above the horizon, which was also taken into account during classification.
All these factors were carefully considered during the manual labeling process, which was initially performed on a representative subset of 177 images to support model training and validation.
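For illustration, the labeled subset can be organized into one folder per class and loaded as an image dataset; the directory names below are hypothetical and simply mirror the three categories listed above.

```python
# Hypothetical directory layout for the three manually labeled classes:
#   dataset/
#     OK/       correct navigation
#     NOT_OK/   incorrect navigation
#     DOUBT/    ambiguous cases (horizon light interference)
from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    "dataset",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
print(dataset.classes)  # ['DOUBT', 'NOT_OK', 'OK'] (alphabetical order)
```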

3. Results and Discussion

3.1. Network Training Accuracy and Hyperparameters: Transfer Learning

As previously explained, a convolutional neural network was initially trained using a transfer learning approach. For this task, we selected SqueezeNet, a pre-trained CNN, due to its exposure to a large-scale dataset of over one million images across 1000 categories, providing a rich feature base for visual recognition tasks. Other architectures, such as GoogLeNet, were not chosen because of their comparatively lower accuracy in similar applications [13].
During training, we configured the model with three output classes (OK, NOT OK, and DOUBT). The Adam optimizer was used with a constant weight learning rate factor of 10, and training was carried out over 40 epochs, based on the observed convergence of the algorithm after approximately ten iterations. The initial stopping criterion was the minimum loss, but after several tests an alternative criterion based on a plateau in the loss value over multiple epochs was adopted. The main results are presented in Figure 8 (confusion matrix) and Figure 9 (accuracy and loss curves). In the confusion matrix, the model output (predicted class) is compared against the initial manual classification made by human observers (true class).
To interpret these figures, it is important to note that a high loss and low accuracy indicate poor performance, reflecting widespread misclassification. Conversely, a low loss combined with high accuracy indicates strong predictive performance with only minor errors. A scenario with both low loss and low accuracy suggests a limited error magnitude across many data points but a lack of effective generalization.
Figure 8 illustrates a poor prediction outcome: the model classified every test image as OK (correct leading). All images fall in the OK column of predicted classes, even though some of them belong to the NOT OK or DOUBT rows of the true classes assigned by human observers before the test. Although the training and validation phases reported an accuracy of approximately 60%, this level was deemed insufficient.
To improve the model’s performance, several training experiments were conducted, including reducing the weight learning rate factor, testing both constant and variable learning rate schedules, and evaluating alternative optimization algorithms, such as replacing Adam with SGDM (Stochastic Gradient Descent with Momentum) [14,15]. Additionally, L2 regularization was introduced to minimize overfitting. As shown in Figure 10, some variation was observed in the accuracy curve, yet the overall classification performance and confusion matrix remained suboptimal.
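As a sketch of the alternative optimizer mentioned above, SGDM with L2 regularization can be configured as follows; the learning rate and momentum values are assumed, typical settings rather than the ones used in the experiments.

```python
# Illustrative SGDM configuration with L2 regularization (weight decay).
from torch import optim

optimizer_sgdm = optim.SGD(
    model.parameters(),    # model from the transfer-learning sketch above
    lr=1e-3,               # assumed constant learning rate
    momentum=0.9,          # assumed momentum coefficient
    weight_decay=0.01,     # L2 regularization to limit overfitting
)
```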
In response, a new convolutional neural network was developed, specifically designed for the visual conditions of night navigation. Because validation was performed only at fixed intervals, the validation curve appears as straight-line segments between validation points. Nevertheless, it is evident that the loss decreased compared with the previous model once the weight learning rate factor was reduced to 1. This suggests that, although the overall loss improved, significant errors persisted in specific subsets of the data, as reflected in the confusion matrix.

3.2. Network Training Accuracy and Hyperparameters: Specialized CNN

The Adam algorithm was selected for the training process due to the large size of the image dataset, and because it offers efficient convergence and high accuracy within a reduced training time. As an extension of the Stochastic Gradient Descent (SGD) algorithm, Adam dynamically updates the neural network weights during training by combining momentum and adaptive learning rates.
In this study, the dataset was divided as follows: 70% for training, 15% for validation, and 15% for testing.
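A minimal sketch of this 70/15/15 split is shown below, reusing the dataset object from the labeling sketch in Section 2.4; the random seed is an assumption added for reproducibility.

```python
# 70% training / 15% validation / 15% test split of the labeled image dataset.
import torch
from torch.utils.data import random_split

n = len(dataset)              # dataset from the labeling sketch in Section 2.4
n_train = int(0.70 * n)
n_val = int(0.15 * n)
n_test = n - n_train - n_val  # remainder goes to the test set

train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),  # assumed seed
)
```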
As is common in neural network training, accuracy and loss were recorded for each epoch. As shown in Figure 11, the training process achieved 100% accuracy, while the validation process reached 86%, which indicates effective generalization and strong predictive performance on new data. These results suggest that overfitting was successfully avoided, as the validation accuracy remained stable across epochs and closely tracked the training accuracy.
Finally, Figure 12 presents the confusion matrix, showing the performance of the network in classifying the three output classes: OK, NOT OK, and DOUBT. Out of the 177 test images, 111 were correctly classified as OK (correct leading), corresponding to 63% of the dataset. Additionally, 40 images were correctly identified as NOT OK.
Only 12 and 14 images were misclassified between the OK and NOT OK categories, which demonstrates the reliability of the model’s predictions.
It is worth noting that, although 13 images were labeled and included in training as DOUBT, the network did not classify any image into this category during testing. This suggests that the model consistently assigned a decision (either OK or NOT OK), even in cases with potentially ambiguous visual conditions, such as interference from lighthouses or lights on the horizon.
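For reference, a confusion matrix such as the one in Figure 12 can be computed from the test-set predictions as sketched below; scikit-learn is used only for illustration, and the variable names refer to the earlier sketches.

```python
# Build a confusion matrix (rows: true class, columns: predicted class)
# from the trained model's predictions on the test split.
import torch
from torch.utils.data import DataLoader
from sklearn.metrics import confusion_matrix

loader = DataLoader(test_set, batch_size=32, shuffle=False)
model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in loader:
        logits = model(images)
        y_true.extend(labels.tolist())
        y_pred.extend(logits.argmax(dim=1).tolist())

print(confusion_matrix(y_true, y_pred))
```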

3.3. CNN Test

As is standard practice in neural network workflows, a final test was conducted after the training and validation phases to further assess the model’s image recognition performance.
To this end, a new set of images was introduced, and the network was tasked with predicting three categories: correct leading (Figure 13), incorrect leading (Figure 14), and a doubtful case, which was expected to be challenging even for an artificial intelligence system due to visual ambiguity (Figure 15).
From this test, it was clearly confirmed—consistent with the training and validation results—that the network achieved full accuracy, correctly classifying all test images with 100% success across the evaluated categories. It is of interest to highlight that the image in Figure 15 shows ambiguous visual conditions caused by background light interference on the horizon, making it difficult to determine the correct alignment of the leading lights. This scenario challenges both human observers and AI classification systems.

3.4. CNN Optimization

To further improve the accuracy of the trained neural networks, two main strategies were explored: refining the initial image classification and tuning the network’s hyperparameters. In particular, we tested different combinations of the optimization algorithm (Adam or SGDM) and learning rate configurations (constant vs. variable), and applied both to the CNN derived from transfer learning and to the network trained specifically for night navigation conditions.

3.4.1. Increasing the Weight Learning Rate to 0.2, 3 Classes, 200 Epochs (Accuracy = 0.9435)

The results obtained by increasing the weight learning rate are shown in Figure 16 and Figure 17.

3.4.2. Increasing the Weight Learning Rate to 0.2, 2 Classes, 200 Epochs

It can be concluded that accuracy improves when the number of output classes increases, even though the network never selected the “DOUBT” class explicitly, instead assigning each sample to one of the two main categories (OK or NOT OK). This suggests that the initial human-based classification can be refined by the neural network’s own learning process (Figure 18 and Figure 19).
Furthermore, human intervention in the early stages of deep learning—particularly in labeling—can influence the final model performance. For instance, increasing the number of epochs from 40 to 200 resulted in a modest improvement in accuracy, from 0.9322 to 0.9435 when using three classes, compared to 0.8632 with fewer classes.
As a result, it is essential to enhance the quality of initial image classification to achieve a better fit during network training, as will be further illustrated in the next section.
At the same time, the stopping criterion based solely on loss minimization proved insufficient. Due to the inherent characteristics of the image dataset, a minimum residual error persisted even after full training across all epochs. Therefore, an alternative stopping criterion—such as observing stability in accuracy over a fixed number of epochs (e.g., 20)—is recommended when training this type of visual data.
In line with previous research, a threshold of approximately 40 epochs is typically sufficient to reach convergence or stability, as demonstrated in earlier figures.
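A simple version of the recommended stopping criterion, halting training once validation accuracy has not improved for a fixed number of epochs (for example, 20), is sketched below; the class and its parameter values are illustrative.

```python
# Illustrative early-stopping rule: stop when validation accuracy has not
# improved for `patience` consecutive epochs.
class PlateauStopper:
    def __init__(self, patience: int = 20, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stalled = 0

    def step(self, val_accuracy: float) -> bool:
        """Return True when training should stop."""
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.stalled = 0
        else:
            self.stalled += 1
        return self.stalled >= self.patience

# Usage inside the training loop (illustrative):
# stopper = PlateauStopper(patience=20)
# if stopper.step(val_acc): break
```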

3.5. Accuracy Analysis in Visual Detection

Based on the previous results, a new test was conducted using navigation scenarios at varying distances between the vessel and the two reference lights, in order to evaluate the accuracy of leading light detection by artificial intelligence. At greater distances, the two lights often appear as a single point to the human eye, due to their visual overlap. Moreover, even a slight misalignment of the reference lights can result in significant navigational errors, leading to economic costs and increased accident risk.
As a result, and following the earlier findings, a new classification scheme was introduced to better capture the degree of deviation from correct alignment. Specifically, six categories were defined based on the quality of visual leading:
  • Light and OK: At long distances, the two leading lights appear as a single point, which should ideally be positioned at the bow.
  • Lights and OK: The vessel is close to the leading lights, which are clearly visible and often overlapping. This represents proper navigational alignment.
  • NOT OK–Port Deviation: The ship is misaligned to port, deviating from the intended leading line.
  • NOT OK–Starboard Deviation: The ship is misaligned to starboard, also indicating incorrect leading.
  • NOT OK–Low Bow Deviation: Both lights appear near the bow but are not vertically aligned, indicating a subtle deviation.
  • DOUBT: Visual ambiguity due to interference from lights on the horizon, making it difficult to assess the alignment.
A new training process was carried out using this six-class classification scheme, producing the results shown in Figure 20 and Figure 21. Although the number of images in some categories was reduced, this approach provided an interesting case study on improving the network’s ability to predict incorrect leading with greater precision.
This last confusion matrix was an attempt to assess whether an autonomous pilot could be built on this system. For instance, when both lights appear vertically aligned, correct leading is assured. However, a small horizontal shift of the front light (e.g., to starboard of the rear light) implies a portside deviation, which can be operationally significant even if visually subtle. These classifications allow the AI system not only to support binary navigation assessments (OK/NOT OK) but also to advise on corrective steering actions, as illustrated in the sketch below. It is also worth noting that when the deviation is large, the lights change their color; even so, a simple black-and-white neural network works perfectly well for this application.
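As a sketch of how the six-class output could be translated into decision support, the mapping below associates each predicted class with an advisory message for the officer of the watch; the class names and advice strings are hypothetical and are not part of the trained model.

```python
# Hypothetical decision-support mapping from the six predicted classes to an
# advisory message; labels and wording are illustrative only.
ADVICE = {
    "LIGHT_OK":         "On the leading line (single light at long range): keep course.",
    "LIGHTS_OK":        "On the leading line (both lights in line): keep course.",
    "NOT_OK_PORT":      "Deviation to port detected: alter course to starboard.",
    "NOT_OK_STARBOARD": "Deviation to starboard detected: alter course to port.",
    "NOT_OK_LOW_BOW":   "Slight misalignment near the bow: apply a small correction.",
    "DOUBT":            "Ambiguous lights on the horizon: verify visually or by other means.",
}

def advise(predicted_class: str) -> str:
    """Return the advisory message for a predicted class label."""
    return ADVICE.get(predicted_class, "Unknown class: maintain standard watchkeeping.")
```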
In general terms, the average accuracy obtained during the validation process was 0.68, representing a decrease compared to the accuracy achieved using only three classes. This reduction is primarily due to lower precision when identifying NOT OK conditions, as accuracy dropped from 83% (for the two-light case) to 68%.
This effect requires more detailed analysis. As shown in the confusion matrix, the neural network correctly handles the images that appear ambiguous (“doubtful”) to human observers, typically because of background lighting such as coastal lights or other visual noise.
When examining class-specific performance, the lowest accuracy occurred in the NOT OK–Low Bow Deviation category, which was frequently misclassified as OK with one or two lights. This is understandable, as the lights are indeed positioned near the bow—suggesting correct alignment—but fail to meet the vertical alignment criteria of true leading lights.
Additionally, partial misclassification was observed in the NOT OK–Medium Port Deviation category, where the two lights should appear clearly shifted to one side of the bow. Despite this, only four misclassifications out of 150 images were recorded, resulting in a low error rate of just 2.6%.
Another source of misclassification involved confusion between NOT OK–Medium Bow and OK (one or two lights) categories. This likely stems from the inherent visual similarity between these cases, which also points to potential inconsistencies in the original human-based labeling. For instance, images labeled as containing two lights may, in fact, show only one, which may explain the moderate 70% accuracy (48 correct identifications out of 69) in this specific case.
From Table 2, it can be concluded that the model demonstrates a very high recognition rate for correct leading conditions, whether represented by one or two visible lights, as well as for doubtful situations. However, moderate precision was observed when identifying incorrect leading in the “Low Bow Deviation” category, likely due to the visual similarity to a correctly aligned configuration where both lights appear near the bow but are slightly misaligned vertically.
Importantly, the reduced accuracy in detecting incorrect leading is primarily due to confusion between different incorrect categories, rather than misclassification as correct leading, which the model consistently recognized with high accuracy. Therefore, this tool shows strong potential as a decision-support system to assist bridge officers, though it is not yet suitable for fully autonomous navigation.
Finally, the main source of error was linked to misclassification between “Low Bow Deviation” and correct leading, a limitation that appears to stem from inconsistencies in the initial human labeling process prior to training. Consequently, future research will focus on increasing classification granularity and improving labeling accuracy to further enhance model performance.
Nevertheless, this study presents certain limitations. First, the dataset was entirely generated under simulated conditions, which—although realistic—may not capture the full variability of real-world maritime environments. Additionally, the model showed reduced accuracy in detecting subtle misalignments, particularly in “Low Bow Deviation” scenarios, highlighting the challenge of fine-grained classification. Finally, a preliminary analysis of statistical robustness was performed based on classification accuracy, confusion matrices, and misclassification trends across different navigation scenarios. While the model shows high precision in identifying correct alignment, its performance in low-deviation conditions reveals areas for improvement. These observations underscore the need for more extensive statistical validation in future work, ideally supported by larger and more diverse datasets.
Future work will focus on enhancing the labeling process, integrating real-world data from onboard cameras, and testing the model in operational conditions. The use of ensemble models or attention-based architectures may also be explored to improve detection in visually ambiguous or dynamic lighting environments.
Finally, a direct baseline comparison—such as with human observer performance or conventional rule-based methods—has not been included in this study, as no formal benchmarking data under identical simulator conditions is currently available. Future work will address this limitation by collecting empirical baseline data to enable a more comprehensive performance evaluation of the proposed CNN-based model.

4. Conclusions

This study leads to the following main conclusions, based on experimental and analytical findings:
  • Artificial intelligence (AI) can reliably identify correct and incorrect navigation based on visual input, using image recognition techniques.
  • AI is capable of distinguishing between leading lights and background sources, such as lighthouses or horizon lighting, allowing it to consistently identify correct leading alignment.
  • This technology has the potential to reduce human factor-related accidents, improving operational safety during night-time navigation.
  • AI offers clear advantages over human observation in classifying navigation scenarios based on visual input. Furthermore, expanding the number of classification categories beyond a simple “OK/NOT OK” binary system leads to more informative and accurate results.
  • Custom training of the neural network based on the distance between the two leading lights can improve detection performance. Even using a medium-resolution camera, the system demonstrated greater reliability than human visual interpretation in identifying correct alignment at various distances.
  • An initial human-based classification is still necessary to enable effective supervised training. However, machine learning techniques, such as clustering algorithms, could assist in automating this step and improving labeling consistency, thereby enhancing final model accuracy.
  • Real-time testing confirmed the effectiveness of the visual AI system compared to human judgment, a result that should be seriously considered by shipyards and ship-owners. In the near future, this technology is expected to be implemented onboard as a standard aid to navigation.

Author Contributions

Conceptualization, M.V.N., G.C.F., B.S.F. and J.A.O.; Methodology, M.V.N., G.C.F., B.S.F. and J.A.O.; Software, M.V.N., G.C.F., B.S.F. and J.A.O.; Validation, M.V.N., G.C.F., B.S.F. and J.A.O.; Formal analysis, M.V.N., G.C.F., B.S.F. and J.A.O.; Investigation, M.V.N., G.C.F., B.S.F. and J.A.O.; Resources, M.V.N., G.C.F., B.S.F. and J.A.O.; Data curation, M.V.N., G.C.F., B.S.F. and J.A.O.; Writing—original draft, M.V.N., G.C.F., B.S.F. and J.A.O.; Writing—review & editing, M.V.N., G.C.F., B.S.F. and J.A.O.; Visualization, M.V.N., G.C.F., B.S.F. and J.A.O.; Supervision, M.V.N., G.C.F., B.S.F. and J.A.O.; Project administration, M.V.N., G.C.F., B.S.F. and J.A.O.; Funding acquisition, M.V.N., G.C.F., B.S.F. and J.A.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy.

Acknowledgments

The authors wish to express their gratitude to the University of A Coruña for its collaboration during the development of this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. International Maritime Organization. SOLAS: Consolidated Edition 2020: International Convention for the Safety of Life at Sea, 1974, and its Protocol of 1988: Articles, Annexes and Certificates; International Maritime Organization: London, UK, 2020.
  2. International Maritime Organization. MARPOL: Consolidated Edition 2022: Articles, Protocols, Annexes, Unified Interpretations of the International Convention for the Prevention of Pollution from Ships, 1973, as Modified by the Protocol of 1978 Relating Thereto; International Maritime Organization: London, UK, 2022.
  3. International Maritime Organization. STCW: International Convention on Standards of Training, Certification and Watchkeeping for Seafarers, 1978: Including 2010 Manila Amendments, 2023 Edition; International Maritime Organization: London, UK, 2023.
  4. Navi-Trainer Professional 5000: TRANSAS Full-Mission Maritime Simulator; Transas Marine/Wärtsilä: Helsinki, Finland, 2007.
  5. Veitch, E.; Alsos, O.A.; Cheng, T.; Senderud, K.; Utne, I.B. Human factor influences on supervisory control of remotely operated and autonomous vessels. Ocean Eng. 2024, 299, 117257.
  6. Grosser, L.; Wilkinson, C.; Oppert, M.; Banks, S.; Clement, B. Automation at Sea and Human Factors. IFAC-PapersOnLine 2024, 58, 301–306.
  7. Hetherington, C.; Flin, R.; Mearns, K.J. Safety at Sea: Human Factors in Shipping. J. Saf. Res. 2006, 37, 401–411.
  8. Ibrahim, F.; Razali, M.N.; Abidin, N.Z. Content Analysis of International Standards for Human Factors in Ship Design and Operation. Trans. Marit. Sci. 2021, 10, 448–465.
  9. Brcko, T.; Pavić, I.; Mišković, J.; Androjna, A. Investigating the Human Factor in Maritime Accidents: A Focus on Compass-Related Incidents. Trans. Marit. Sci. 2023, 12.
  10. Gralak, R.; Muczyński, B.; Przywarty, M. Improving Ship Maneuvering Safety with Augmented Virtuality Navigation Information Displays. Appl. Sci. 2021, 11, 7663.
  11. Sencila, V.; Zazeckis, R.; Jankauskas, A. The use of a full mission bridge simulator ensuring navigational safety during the Klaipeda seaport development. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2020, 14, 417–424.
  12. Standard No. 2.14: Class A Standard for the Certification of Maritime Simulators; DNV GL: Høvik, Norway, 2020.
  13. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. arXiv 2014, arXiv:1409.4842.
  14. Xiao, N.; Hu, X.; Liu, X.; Toh, K.-C. Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees. J. Mach. Learn. Res. 2024, 25, 1–53.
  15. Luo, Z.; Chen, S.; Chen, S.; Chen, S. Multi-stage stochastic gradient method with momentum acceleration. Signal Process. 2021, 188, 108201.
Figure 1. View of the navigation bridge within the Transas NTPRO 5000 simulator (Transas Marine Ltd., Saint Petersburg, Russia) used for data acquisition.
Figure 2. Sample daytime images captured from the simulator bridge.
Figure 3. Example of images taken from the bridge during night navigation.
Figure 4. (a) Correct leading near; (b) correct leading far away (only one point).
Figure 5. Visual example of correct leading at a distance with additional interference from a nearby lighthouse, demonstrating potential sources of ambiguity in classification.
Figure 6. Examples of incorrect leading situations at close distance. (a) and (b) both show the navigation lights misaligned horizontally, indicating deviation from the proper course.
Figure 7. Example of a visually ambiguous case where background lighting interferes with the correct identification of the leading lights, posing a challenge for both human and AI classification.
Figure 8. Confusion matrix for the CNN model trained with the Adam optimizer and a constant learning rate of 0.001, using three output classes.
Figure 9. Training accuracy and loss curves over 40 epochs for the initial CNN configuration.
Figure 10. Accuracy and loss evolution during training with L2 regularization applied, showing a similar performance to the baseline configuration.
Figure 11. Accuracy trends for training (blue) and validation (red) sets using a specialized CNN trained for night navigation images. Validation accuracy reached 86%.
Figure 12. Confusion matrix (Adam, three classes).
Figure 13. Test image classified as “Correct Leading”.
Figure 14. Example of an “Incorrect Leading” situation, where the navigation lights appear misaligned horizontally.
Figure 15. Example of a “DOUBT” classification case.
Figure 16. Confusion matrix (Adam, weight learning rate factor 0.2, three classes, 200 epochs).
Figure 17. Accuracy of the CNN (Adam, weight learning rate factor 0.2, three classes, 200 epochs; accuracy = 0.9435).
Figure 18. Confusion matrix (Adam, weight learning rate factor 0.2, two classes, 200 epochs).
Figure 19. Accuracy of the CNN (Adam, weight learning rate factor 0.2, two classes, 200 epochs; accuracy = 0.8362).
Figure 20. Confusion matrix (weight learning rate factor 0.2, six classes, 180 epochs).
Figure 21. Accuracy of the CNN (weight learning rate factor 0.2, six classes, 180 epochs; accuracy = 0.68).
Table 1. Comparative analysis of deep learning architectures for visual ship leading detection.
CNN (e.g., SqueezeNet)
  • Key advantages: excellent at spatial pattern recognition; lightweight and fast inference; robust to noise and local distortions.
  • Limitations: limited temporal modeling; requires labeled image data.
  • Suitability for this task: High. Ideal for static visual scenes such as night-time navigation.
Fully Connected Network (FCN)
  • Key advantages: simpler architecture; easy to implement.
  • Limitations: poor spatial structure learning; high parameter count.
  • Suitability for this task: Low. Not suitable for spatial pattern recognition in images.
RNN/LSTM
  • Key advantages: effective for time-sequential data; good for motion and event tracking.
  • Limitations: high training time; poor image feature representation.
  • Suitability for this task: Low–Medium. Not ideal for static image classification.
Transformer (ViT)
  • Key advantages: high performance on large image datasets; attention mechanism for feature relevance.
  • Limitations: requires large datasets; computationally expensive.
  • Suitability for this task: Medium. Promising, but not yet suited for real-time onboard use.
Table 2. Precision of the convolutional neural network for each different situation (accuracy per type of image).
  • Doubt: 100%
  • NOT OK–Low Bow Deviation: 19%
  • NOT OK–Medium Starboard Deviation: 0%
  • NOT OK–Medium Port Deviation: 41%
  • OK (1 light): 70%
  • OK (2 lights): 83%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
