Applied Sciences
  • Article
  • Open Access

14 September 2025

Research on Identification and Localization of Flanges on LNG Ships Based on Improved YOLOv8s Models

1 Institute of Marine Engineering Equipment, Zhejiang Ocean University, Zhoushan 316022, China
2 Jurong Energy (XinJiang) Co., Ltd., Urumqi 841603, China
* Authors to whom correspondence should be addressed.

Abstract

Recognition of flanges on LNG (liquefied natural gas) vessels and distance determination are key prerequisites for automated docking of an unloading arm, and in complex environments, flange detection faces challenges such as low accuracy and large distance measurement errors. Therefore, an improved lightweight, high-precision approach based on YOLOv8s that integrates the C2f_Ghost module, a CBAM, and Concat_BiFPN was proposed. In addition, a monocular ranging algorithm based on pixel width and coordinate transformation was introduced to estimate the 3D coordinates of a flange. Specifically, the original Bottleneck in the C2f module was replaced by the Ghost module, which combines dynamic convolution and dynamic depth-separable convolution to enhance feature representation while reducing model complexity. A CBAM was introduced in the middle layers of the backbone to improve the model’s focus on key features with a minimal parameter increase, and Concat_BiFPN was used in the neck to facilitate cross-scale feature fusion. To ensure the reproducibility of the experiments, this study primarily employed a fixed random seed (0), and experimental validation was conducted on a flange dataset. The results show that the improved model reached an mAP@0.5 of 97.5% and an mAP@0.5:0.95 of 82.3% with a parameter size of 9.34 M, representing accuracy improvements of 0.6% and 13.4%, respectively, and a 16.2% reduction in parameter size compared with the original YOLOv8s. The average ranging errors along the X-, Y-, and Z-axes were 2.43%, 2.77%, and 0.71%, respectively. Therefore, the combination of the two algorithms significantly improves the detection and ranging accuracy of flanges in complex environments.

1. Introduction

In recent years, global energy demand has continued to grow at a rate of nearly 1.2% per year, and LNG has been widely used as a new clean energy source [1,2,3]. As a country with large energy demands, China’s annual imports are growing, so ship-to-shore docking work involving LNG ships has become an inevitable trend. An LNG unloading arm is an important hub connecting a ship and a receiving station, and after many years of development, the structure of the current technology is relatively mature, but the manual docking method is still used [4]. Due to the complex environment of a ship’s flange, as shown in Figure 1, relying only on the human eye to judge the target position and drive the unloading arm during the docking process can lead to inaccurate flange position information. This inaccuracy may cause contact between the unloading arm’s three-dimensional joints and the deck, resulting in sparks and serious accidents.
Figure 1. LNG ship flange environment.
In addition, under the impetus of intelligent construction, it is very important to design a technology that can realize automatic docking under algorithmic control. Accurate flange identification and position estimation are important prerequisites for realizing this technology. Currently, identifying the flange and acquiring its 3D coordinates face the following challenges due to the flange’s working environment: First, because the ship is located at sea, detection of the flange on the ship is affected by light intensity, berth height, fog, and other factors, which may result in missed or false detections. Second, the intricacy of shipboard pipelines widens the range of flange detection, leading to larger errors when obtaining the coordinates of the particular flange that needs to be docked. Third, the increased complexity of the working environment raises computational resource demands and equipment requirements and complicates the use of algorithms. Because of these problems, current target detection and ranging algorithms are of limited applicability here. Therefore, in the related improvement work, we focus on the above problems to improve the accuracy and speed of target detection and distance estimation by constructing a realistic detection model environment.
Due to the rapid development of computer hardware and the breakthroughs of deep learning algorithms, deep learning is superior to traditional processing in the field of image processing, so it is widely used for target detection [5,6,7]. In the field of deep learning for target detection, there are two types of target detection algorithms: two-stage algorithms, represented by R-CNN, Fast R-CNN, Faster R-CNN, and SPP-Net, and single-stage algorithms, represented by YOLO, SSD, EfficientDet, RetinaNet, etc. Both types of algorithms have their advantages and disadvantages, and single-stage algorithms are more suitable for this study since automated docking requires real-time detection of the target. YOLO, as a currently popular algorithm, has attracted much attention. Sun et al. [8] proposed combining the two attention mechanisms of a CBAM and SE to achieve accurate detection of surface rust on power equipment by improving the YOLOv5s algorithm. Zhu et al. [9] proposed utilizing the parameterized backbone network of HGNetv2, as well as the introduction of a variable kernel convolution module (C2f-AKConv) in combination with a new detection head module, DCNV4-Dyhead, which further improves the accuracy of detection of people in underground mines to ensure their safety. Miao et al. [10] replaced C2f with the C2f_Ghost module and introduced the EMA attention mechanism in the feature extraction phase to achieve a substantial reduction in the number of model parameters; their method can be used to detect mine fires efficiently in real time. Li et al. [11] proposed an enhanced detection algorithm, MGS-YOLO, based on YOLOv5; it can reduce the number of network parameters without losing prediction accuracy and can realize efficient real-time detection of ships. Wang et al. [12] replaced the trunk of the YOLOv8 model with a lightweight detection head (AsDDet) and replaced C2f with C2f-SE to achieve real-time detection of the presence of pests and diseases in mango trees.
Wang et al. [13] designed a lightweight feature fusion framework based on YOLOv4. This framework effectively mitigates the semantic confusion problem in the process of feature fusion. It improves the feature extraction capability of the network. Luo et al. [14] proposed integration of the Ghost module and the efficient multiscale attention module within the YOLOv8 module, a modification that has been demonstrated to enhance the accuracy of traffic sign recognition. Han et al. [15] proposed a visual localization method for pedestrians in underground tunnels that combines deep learning. This method integrates a CNN detection algorithm with monocular ranging technology to achieve 3D spatial localization of pedestrians in tunnels.
Vision-based ranging algorithms can be categorized into monocular ranging and binocular (multi-camera) ranging according to the number of cameras. In recent years, research on monocular ranging algorithms in the visible-light domain has matured considerably [16,17], and these algorithms have found widespread application in intelligent driving and face recognition due to their large perceptual field of view and flexible perceptual characteristics [18,19,20]. Deep learning target detection has been integrated with monocular ranging algorithms [21,22], which offers significant advantages for identifying targets and judging their distances in complex environments. Yang et al. [23] improved a monocular ranging algorithm by using the target’s pixel width to obtain accurate distances to power equipment. Many recent studies in this area employ YOLOv8 as a baseline: its anchor-free architecture and dynamic label-assignment mechanism deliver strong performance and are further optimized in terms of accuracy, speed, and efficiency.
Deep learning and ranging algorithms have certain advantages for identifying and locating targets in complex environments. This study proposes an enhancement of the YOLOv8s model and uses the improved model in combination with an improved monocular ranging algorithm based on the pixel width, reflecting the feasibility of engineering applications. These model improvements address the problems of the traditional model, such as high computation requirements, poor recognition accuracy, and distance recognition difficulties caused by measuring the target from different shooting angles. The specific research activities undertaken are illustrated in Figure 2, and the principal contributions are as follows:
Figure 2. Overall research framework.
  • Capturing flange images, constructing a dataset, labeling the dataset with Roboflow software, training the model to learn, and turning on multiple data enhancements to expand the diversity of the dataset.
  • Introducing the Ghost module, which is a combination of dynamic convolution and dynamic depth-separable convolution. It replaces the Bottleneck in the C2f component of the backbone. This reduces the number of model parameters and improves the diversity and accuracy of feature extraction.
  • Placing the CBAM attention mechanism in the middle two layers of the backbone, which enhances the feature expression ability of the middle layers and improves model generalization and recognition accuracy, while avoiding the excessive parameter growth, information redundancy, and overfitting that would result from adding it throughout the network.
  • Introducing a weighted BiFPN to replace the PAN-FPN structure in the neck for bi-directional enhancement of multiple features, reducing redundant connections and maintaining high efficiency of target feature fusion while having low computational costs.
  • Adopting a monocular ranging algorithm based on the target pixel width to realize acquisition of the three-dimensional coordinates of the target. This approach mitigates the risk of difficulty in distance recognition due to the shooting angle and improves the accuracy of ranging.
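The weighted fusion mentioned in the BiFPN bullet above can be illustrated with its fast normalized fusion rule, O = Σᵢ wᵢIᵢ / (Σⱼ wⱼ + ε), in which non-negative learnable weights are normalized before summation. The following is a minimal sketch operating on flattened lists rather than feature tensors; the function name and data layout are illustrative only:

```python
def fused(inputs, weights, eps=1e-4):
    """Fast normalized fusion: a normalized, non-negative weighted sum.

    inputs  -- list of feature maps, each flattened to a list of floats
    weights -- one scalar weight per input (learnable in a real network)
    """
    w = [max(0.0, wi) for wi in weights]      # ReLU keeps weights non-negative
    total = sum(w) + eps                      # eps avoids division by zero
    return [sum(wi * x for wi, x in zip(w, xs)) / total
            for xs in zip(*inputs)]           # elementwise over the maps
```

In a real BiFPN the weights are trained jointly with the network and the fusion runs per feature level on tensors; this sketch only shows the normalization that keeps the fusion cheaper than softmax-based weighting.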

3. Experiments

3.1. Experimental Parameter Settings

The training platform used the Windows 11 operating system. The equipment was manufactured by Lenovo in Beijing, China. The CPU was an Intel(R) Core(TM) i7-14650HX, the GPU was an NVIDIA GeForce RTX 4060 Laptop GPU, and the development environment was Python 3.12. The resolution of the input images was 640 × 640, the initial learning rate was 0.001, the final learning rate coefficient was 0.01, an L2 regularization term was added to prevent overfitting, the weight decay was 0.0005, and the momentum was set to 0.937. Mosaic augmentation was always on, and the Mixup augmentation probability was 0.2 for robustness. The images were randomly rotated by ±15° and translated by ±10% in the X and Y directions; the scaling factor was 0.5, the shear angle was ±2°, and the horizontal flip probability was 0.5. The training batch size was set to 16, and the number of iterations was set to 150. Our optimization primarily involved module replacement and additions, without modifying the backbone network width or other fundamental structures. The activation function was SiLU (Swish), the default for the YOLOv8 series. To ensure reproducibility, most models were trained using a fixed random seed (0). The dataset was divided into training, validation, and test sets in a 75/15/10 ratio. Due to computational resource constraints, five separate experiments were conducted for both the original and improved models using different random seeds (0, 7, 42, 123, and 150) to obtain standard deviations, thereby validating model stability. During inference, the confidence threshold and NMS intersection-over-union threshold were set to 0.25 and 0.5, respectively.
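The reproducibility setup described above can be sketched as follows. The dictionary keys mirror common YOLO-style hyperparameter names and are assumptions, not the authors’ exact configuration; the augmentation draw stands in for the real training pipeline:

```python
import random

# Hyperparameters as stated in Section 3.1 (key names follow common
# YOLO-style configs and are assumptions, not the authors' exact script).
HYP = {
    "imgsz": 640, "lr0": 0.001, "lrf": 0.01, "weight_decay": 0.0005,
    "momentum": 0.937, "mosaic": 1.0, "mixup": 0.2, "degrees": 15,
    "translate": 0.1, "scale": 0.5, "shear": 2, "fliplr": 0.5,
    "batch": 16, "epochs": 150,
}

def seeded_run(seed):
    """Fixing the seed makes stochastic augmentation draws repeatable."""
    rng = random.Random(seed)
    # e.g. three rotation angles drawn for the random +/-15 degree rotation
    return [rng.uniform(-HYP["degrees"], HYP["degrees"]) for _ in range(3)]
```

Two runs with seed 0 yield identical draws, while seeds 0, 7, 42, 123, and 150 give the independent runs used for the standard deviations in Table 2.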

3.2. Evaluation Metrics

To assess the effects of improving the model’s performance in terms of both its light weight and high precision, this study focused on the performance of the improved model in terms of the parameters, recall, mAP@0.5, mAP@0.5:0.95, and other metrics. In deep neural networks, a model’s parameter count is the total number of trainable parameters in the network and is one of the core metrics of the network’s complexity and computational overhead. The larger the number of parameters, the greater the model’s demand for storage and computing resources, which are limited on resource-constrained edge devices; appropriately reducing the number of parameters therefore accelerates inference and improves the unloading arm’s ability to recognize and grasp the flange on the ship.
Recall is a key metric in target detection evaluation, as it shows the proportion of all targets actually present that were successfully detected. Since most LNG ships carry many pipeline flanges, the higher the recall, the lower the chance of missing the target flange, which aids the later selection of flange targets that need to be docked; recall is thus a good measure of the model’s coverage ability and a reflection of its generalization. The calculation is shown in Equation (17), where TP is the number of positive samples that were correctly detected and FN is the number of positive samples that were missed.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$
mAP@0.5 is a core measure of a model’s accuracy in detecting targets, integrating the precision and completeness of the detected results. It is computed at an IoU threshold of 0.5 and is often used as the basic evaluation of target detection accuracy, since this looser localization criterion essentially asks whether the target was detected. mAP@0.5:0.95 sets IoU thresholds from 0.5 to 0.95 in increments of 0.05, calculates the AP at each threshold, and averages the results. This is a more comprehensive measure of the model’s performance under different localization accuracy requirements; it is the official evaluation standard for the COCO dataset and is mostly used for tasks requiring higher accuracy or finer-grained localization, as well as for evaluating detection accuracy in complex multi-target settings, since it measures the model’s ability to both detect a target and locate it accurately. To realize target detection, localization, and ranging, this study considered both performance indicators: together they capture detection ability and localization accuracy and allow a comprehensive judgment of model quality. The calculation methods for the two evaluation indexes are shown in Equations (18) and (19). In these equations, N represents the number of detection target categories; AP_i represents the average precision of category i; and P_i and R_i represent the precision and recall of category i, respectively.
$$mAP@0.5 = \frac{1}{N}\sum_{i=1}^{N} AP_i\Big|_{IoU=0.5} \tag{18}$$
$$AP_i = \int_{0}^{1} P_i(R_i)\,\mathrm{d}R_i$$
$$mAP@0.5{:}0.95 = \frac{1}{10N}\sum_{t=0.5}^{0.95}\sum_{i=1}^{N} AP_i\Big|_{IoU=t} \tag{19}$$
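The two metrics can be checked numerically once per-class AP values are available. The sketch below uses toy AP values and hypothetical detection counts, not results from this study:

```python
def recall(tp, fn):
    # Recall per Eq. (17): fraction of ground-truth targets detected.
    return tp / (tp + fn)

def map_range(ap_per_iou):
    """Average AP over IoU thresholds and classes, as in Eq. (19).

    ap_per_iou[t][i] is the AP of class i at the t-th IoU threshold
    (thresholds 0.50, 0.55, ..., 0.95 give 10 rows for mAP@0.5:0.95).
    """
    n_thresh, n_cls = len(ap_per_iou), len(ap_per_iou[0])
    return sum(sum(row) for row in ap_per_iou) / (n_thresh * n_cls)
```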

3.3. Ablation Experiments

To verify the effect of the added improvement modules on the overall model accuracy, ablation experiments with three improvement modules were performed on the original YOLOv8s model. The comparison curves obtained from training on the self-constructed flange dataset are shown in Figure 10a–d. The results are shown in Table 1. All results in Table 1 were obtained using a fixed random seed to ensure comparability between the ablation experiments, and “√” indicates that the module was selected. Compared to the baseline YOLOv8s model (method A), introducing the C2f_Ghost module resulted in an 11.8% increase in mAP@0.5:0.95 and a 15.4% decrease in the number of parameters. After introducing the CBAM, mAP@0.5:0.95 increased by 11.6%. After introducing Concat_BiFPN, mAP@0.5:0.95 increased by 2.1% and recall increased by 2%. This indicates that these methods contribute to different aspects of the model’s detection performance.
Figure 10. Comparisons of indicators before and after model improvement. (a) Comparison of model parameters, (b) recall vs. number of iterations, (c) mAP@0.5 vs. number of iterations, and (d) mAP@0.5:0.95 vs. number of iterations.
Table 1. Results of ablation experiments.
To further verify the facilitating effect of coupling individual modules, the training results of multiple-module combinations were compared. The results show that coupled modules performed better than any single added module. Considering both the model’s light weight and high precision, the model containing the C2f_Ghost module, the CBAM, and Concat_BiFPN (method H) had the best results, with a 0.6% increase in mAP@0.5, a 13.4% increase in mAP@0.5:0.95, a 16.2% decrease in the number of parameters, and a 1.2% increase in recall compared to the original YOLOv8s model. This shows that using the C2f_Ghost module to replace the original C2f greatly reduced the number of parameters while improving target recognition accuracy. However, introducing the CBAM attention mechanism increased the number of parameters; to limit this increase and prevent overfitting, the CBAM was used only in the middle layers, which still improved accuracy. Concat_BiFPN improved the generalization ability and avoided misdetection without increasing the parameters.
To ensure the comparability of the ablation experiments, the results in Table 1 were obtained using a fixed random seed. To account for variability, we retrained Method A (i.e., the YOLOv8s model) and Method H (i.e., the improved model) using five random seeds (0, 7, 42, 123, and 150), and the means ± standard deviations are reported in Table 2. The results show that while recall and mAP@0.5 remained broadly comparable within the range of variability, the improved model (H) consistently achieved higher and more stable performance for mAP@0.5:0.95. This demonstrates that the proposed improvements effectively enhance high-precision localization performance while maintaining the model’s lightweight advantages (the parameters were reduced from 11.14 M to approximately 9.5 M). The slight discrepancy between Table 1 (9.34 M) and Table 2 (9.51 M) stems from implementation differences: the values in Table 1 were manually calculated based on the model architecture, while those in Table 2 were automatically computed using the torchsummary library, which includes all trainable parameters (e.g., bias terms and normalization layer parameters). This minor discrepancy does not affect the core conclusion of this paper that the improved model significantly reduces the number of parameters. In addition, FPS values were measured on the RTX 4060 Laptop GPU and are reported in Table 2 as reference indicators. However, these results were affected by runtime fluctuations and should not be considered representative of edge device performance.
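The counting discrepancy described above is easy to reproduce for a single layer: a manual, weights-only count omits the per-channel bias and normalization parameters that torchsummary includes. The layer shape below is hypothetical and only illustrates the effect:

```python
def conv_params(c_in, c_out, k, bias=False, batchnorm=True):
    """Trainable parameters of one k x k convolution block."""
    n = c_out * c_in * k * k        # convolution weights only (manual count)
    if bias:
        n += c_out                  # one bias term per output channel
    if batchnorm:
        n += 2 * c_out              # BatchNorm gamma and beta
    return n
```

For a hypothetical 64-to-128-channel 3 × 3 block, the weights-only count is 73,728, while including BatchNorm parameters raises it to 73,984; summed over a full network, such terms account for gaps like 9.34 M versus 9.51 M.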
Table 2. Comprehensive performance evaluations of models based on different random seeds (means ± standard deviations).

3.4. Model Comparison Experiments

To verify the effectiveness of the proposed improved model, using a fixed seed, we compared the improved YOLOv8s model with other versions of the YOLO model in terms of the number of parameters, precision, and recall using the same dataset and the same training strategy. The results of comparisons of the various indicators are shown in Figure 11a–d, and the data are shown in Table 3.
Figure 11. Comparisons of training metrics for different models. (a) Comparison of model parameters, (b) recall vs. number of iterations, (c) mAP@0.5 vs. number of iterations, and (d) mAP@0.5:0.95 vs. iteration count.
Table 3. Comparative experimental results of different models.
As illustrated in Table 3, the YOLOv5s model improved only in terms of its light weight and was inferior to the YOLOv8s model in both precision and recall, whereas the YOLOv10s model improved on YOLOv8s, with mAP@0.5:0.95 increased by 9.1% and the number of parameters reduced by 27.5%, making it more lightweight while greatly improving target localization accuracy. The improved YOLOv8s model outperformed the YOLOv10s model in most respects; the exception was the parameter count, whose reduction relative to YOLOv8s was slightly smaller than that of YOLOv10s. The enhanced model thus fulfills the lightweight improvement strategy while also increasing accuracy, balancing target detection performance with practical deployment requirements. Consequently, this model is better suited for identifying and localizing flanges in complex onboard environments and exhibits considerable engineering application potential.

3.5. Visualizing the Results

To further examine the optimization of the YOLOv8s model architecture, Grad-CAM was introduced to visualize the model’s object detection representations across diverse scenarios. The relevant visualization results are shown in Figure 12. Grad-CAM gives an intuitive view of the image regions the model focuses on during object detection, helping to diagnose false positives and false negatives during localization. Heatmaps generated with the improved model demonstrate that, even after lightweight optimization, the model can still accurately pinpoint target areas. This shows that the model structure maintains its perceptual capability while reducing the parameter count, achieving dual gains in lightweight design and accuracy that meet practical application requirements.
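Grad-CAM weights each activation map of the last convolutional layer by its globally averaged gradient and applies a ReLU to the weighted sum. A minimal pure-Python sketch of this computation follows; real use would rely on framework hooks (e.g., in PyTorch) to capture activations and gradients:

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap: ReLU of gradient-weighted activation maps.

    activations[k][y][x] -- k-th feature map of the last conv layer
    gradients[k][y][x]   -- d(class score)/d(activation), same shape
    """
    n_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient of map k
    alphas = [sum(sum(row) for row in gradients[k]) / (h * w)
              for k in range(n_maps)]
    # ReLU keeps only features with a positive influence on the score
    return [[max(0.0, sum(alphas[k] * activations[k][y][x]
                          for k in range(n_maps)))
             for x in range(w)] for y in range(h)]
```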
Figure 12. Target detection + Grad-CAM.

3.6. Ranging and Localization Model Experiment

The present study sought to verify the accuracy and precision of the improved model and the original model for target recognition combined with the monocular ranging algorithm. To this end, monocular ranging experiments were carried out, as illustrated in Figure 13. A zoom camera with a focal length of 13 mm was employed. Calibration was performed using Zhang Zhengyou’s calibration method, incorporating a model that includes radial distortion (K1, K2, and K3) and tangential distortion (P1 and P2). The calibrated average reprojection error was 0.15 pixels, directly quantifying the calibration accuracy and model fit. The experimental circular flange had a diameter of 290 mm. To ensure the accuracy of the experimental data, the experiments were repeated many times at distances of 3 m to 4.8 m. A monocular camera was used for target identification and distance detection, and the two models were compared under the monocular ranging algorithm by measuring the error between the estimated 3D coordinates and the actual coordinates. The measurement results are shown in Table 4 and Table 5.
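The calibration model mentioned above, radial terms K1–K3 plus tangential terms P1–P2, is the standard Brown–Conrady model applied to normalized image coordinates. A sketch of the forward distortion follows; coefficient values would come from the calibration itself, not from this code:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1..k3) and tangential (p1, p2) lens distortion
    to a point (x, y) in normalized image coordinates."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd
```

The reported 0.15-pixel average reprojection error is the mean distance between observed calibration-pattern corners and points projected through this model.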
Figure 13. Experimental platform.
Table 4. Experimental results (YOLOv8s).
Table 5. Experimental results (Improved YOLOv8s).
By analyzing Table 4 and Table 5, it can be observed that within the ranging distance of 3 m to 4.8 m, the improved YOLOv8s model achieved average ranging errors of 2.43% ± 0.83% (X-axis), 2.77% ± 1.02% (Y-axis), and 0.71% ± 0.31% (Z-axis). To further evaluate statistical significance, Welch’s t-tests were conducted between the original YOLOv8s model and the improved model. It should be noted that statistical significance depends not only on the sample size but also on the magnitude of the improvement relative to the variability. With the same number of measurements, the X-axis error rate decreased from 3.57% to 2.43%. This reduction was large relative to the standard deviation and therefore reached statistical significance (t = 2.89, p = 0.0098 < 0.05). In contrast, the Y-axis error decreased from 3.50% to 2.77%, but due to larger variability, it did not reach statistical significance with the limited sample size (t = 1.43, p = 0.170). The Z-axis error decreased from 0.93% to 0.71%. As the error was already small and the improvement was only about 0.2 percentage points, it was more difficult to achieve statistical significance with the small sample size (t = 1.28, p = 0.218). Nevertheless, since LNG unloading arm docking operations typically require a positioning error of less than 3%, the mean errors of all three axes remained within the engineering tolerance. Therefore, even though some of the improvements did not reach strict statistical significance, the proposed model enhancements still provide practically meaningful and reliable ranging accuracy for engineering applications.
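The significance tests above can be reproduced with Welch’s t statistic, which does not assume equal variances. The sketch below computes t and the Welch–Satterthwaite degrees of freedom on illustrative samples; converting t to a p-value additionally requires the t-distribution CDF (e.g., scipy.stats.ttest_ind with equal_var=False performs both steps):

```python
import math

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for samples a and b."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased variance
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)               # std. error of the difference
    t = (ma - mb) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```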
According to the Z-axis data in the above tables, the error rate curve is shown in Figure 14. As the measurement distance increased, so did the error. The main reason is that the target must be detected in real time, and an excessively high camera resolution delays acquisition of the target information, so the resolution selected for the experiment may have been insufficient; as the measurement distance increased, the target occupied fewer pixels in the image. The limited resolution introduces an uncertainty of unknown magnitude in the measured pixel width, which leads to error. According to Equation (11), f and D are fixed values, and the distance is inversely proportional to the pixel width: when the distance is larger, the pixel width (D′) is smaller and the target detection error is larger. Consequently, when using monocular ranging, it is imperative to keep the ranging target within the designated distance interval to obtain high-precision target coordinate information.
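The inverse relationship between distance and pixel width behind Equation (11) can be sketched directly; the focal length in pixels below is illustrative, not the calibrated value:

```python
def depth_from_width(f_px, real_width_mm, pixel_width):
    # Pinhole relation: Z = f * D / D', with f in pixels, D the real
    # flange diameter (290 mm here), and D' the detected width in pixels.
    return f_px * real_width_mm / pixel_width

def error_per_pixel(f_px, real_width_mm, pixel_width):
    # Depth change caused by a one-pixel error in the measured width.
    return abs(depth_from_width(f_px, real_width_mm, pixel_width)
               - depth_from_width(f_px, real_width_mm, pixel_width + 1))
```

Because Z ∝ 1/D′, the same one-pixel measurement error costs more depth at long range, which matches the growth of the Z-axis error in Figure 14.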
Figure 14. Absolute error difference.

3.7. Model Generalizability Validation Analysis

To further validate the robustness and engineering adaptability of the proposed improved YOLOv8s model in real environments, field tests were conducted by simulating the working environment in which an LNG unloading arm is located. This environment contains some errors in target distance measurement due to seashore lighting variations (e.g., glare and backlighting), increased background complexity (e.g., dock structure interference), and camera angle deviations. By building a typical operation platform that would be used during the docking process of an LNG unloading arm, as shown in Figure 15, we analyzed the effects of lighting conditions on recognition detection to test the degree of model generalization.
Figure 15. Real docking platform.
Due to limited space, it was not easy to perform data measurement in the XY direction of the target, and there was some danger, so we only collected the ranging results of the two models in the Z-axis direction under different illumination conditions for comparative analysis. We only needed to extend the experimental platform outward in the Z-axis direction and did not need to shake the platform left or right, so the risk of the experiments was greatly reduced. Due to resource constraints, only a few sets of data were collected to verify and compare the generalization results of the two models in a real environment. The experimental results are shown in Table 6.
Table 6. Results in different lighting conditions.
When analyzing the experimental data in Table 6 and comparing the experimental curves in Figure 16, it can be seen that the error rate of the improved YOLOv8s model in the Z-axis direction was much lower than that of the traditional YOLOv8s model under different lighting conditions and with complex backgrounds. It is evident that the improved model is more applicable to real working scenarios and that its degree of generalization is much higher than that of the traditional model. Therefore, its application in the automatic docking of an LNG unloading arm can greatly improve its docking accuracy and working efficiency.
Figure 16. Z-axis error rate comparison.

4. Conclusions

Accurate identification and positioning of LNG vessel flanges are critical for achieving automated unloading arm docking and enhancing port operation safety and efficiency. To address challenges such as low detection accuracy and significant ranging errors in complex environments, this study proposes an enhanced YOLOv8s model. This model integrates the C2f_Ghost module, a CBAM, and Concat_BiFPN, combined with a monocular ranging algorithm based on pixel width, achieving high-precision detection and localization of flange targets.
The main contributions and methods of this study can be summarized as follows: (1) We constructed a maritime flange dataset covering diverse lighting conditions, angles, and complex scenarios, laying the foundation for model training and validation. (2) We implemented multiple lightweight and accuracy-enhancing improvements to the YOLOv8s model. We introduced the C2f_Ghost module in the backbone, integrating dynamic convolutions and depth-separable convolutions to reduce the parameters while enhancing feature expression. We added CBAM attention in the intermediate layers to boost focus on key features with a minimal parameter cost. In the neck section, a Concat_BiFPN structure with weighted feature fusion replaced the native PAN-FPN to strengthen cross-scale feature integration. (3) A monocular ranging algorithm based on the relationship between the target pixel width and coordinate transformation was proposed to estimate the three-dimensional spatial coordinates of the flange.
The experimental results demonstrate that the method proposed in this study effectively balances model accuracy and complexity: (1) Detection performance: The improved YOLOv8s model achieved mAP@0.5 and mAP@0.5:0.95 values of 97.5% and 82.3%, respectively, using a fixed random seed, which were improvements of 0.6% and 13.4% compared to the original model. Simultaneously, the model’s parameter count decreased by 16.2% (to 9.34 million), while its recall improved by 1.2%, demonstrating significant gains in detection accuracy and the model’s lightweight design. (2) Localization performance: Combined with a monocular ranging algorithm, the improved model achieved average ranging error rates of 2.43%, 2.77%, and 0.71% along the X, Y, and Z axes, respectively. These values represent significant reductions compared to the original model and satisfy the practical engineering requirement that measurement errors must not exceed 3%. (3) Generalization capability: To prevent overfitting due to the limited dataset, data augmentation was employed. Comparative experiments were conducted on the improved model and the original model using multiple random seeds, and the results are shown in Table 2. The improved model demonstrated better stability and robustness, with a significant reduction in the standard deviation of its mAP@0.5:0.95 metric, proving the potential applicability of this method in real-world scenarios.
However, this study has certain limitations. Because the dataset is relatively small and covers only a single object detection task, we did not employ k-fold cross-validation for comprehensive validation, which may have led to statistically overestimated model performance. Additionally, the model validation experiments were conducted on a laptop equipped with an RTX 4060 GPU; due to unstable frame rates, the frame rate (FPS) data provided are for reference only and should not be used as a metric for evaluating edge device deployment performance. The lightweight design was validated only in terms of GPU memory usage and parameter count, and deployment verification on embedded devices has not yet been conducted. It should also be noted that ranging accuracy may decrease at long distances due to pixel resolution limitations.
Future work will focus on the following directions: (1) further expanding the dataset’s size and diversity and adopting more rigorous evaluation methods, such as k-fold cross-validation, to provide comprehensive performance metrics (e.g., means and standard deviations); (2) exploring multi-camera information fusion techniques to enhance positioning accuracy and system robustness in complex scenarios; and (3) deploying the optimized model on embedded edge devices for real-world online testing and performance refinement, ultimately advancing the engineering application of fully automated LNG unloading arm docking technology.

Author Contributions

S.S. contributed to the algorithm improvement study, conducted the related experiments, developed the overall concept, and wrote the manuscript. W.F. and R.L. revised and polished the manuscript. W.W. and L.X. provided writing guidance. G.L. edited the figures. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset used in this study contains 1500 annotated images covering three scenes (a factory, a dock platform, and a ship). It was generated and fully annotated in the Roboflow platform. As it is dedicated to this study, this dataset has not been created or published elsewhere and is available upon reasonable request from the corresponding author.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We thank the journal's editor-in-chief and the reviewers for their helpful feedback, which improved this paper.

Conflicts of Interest

Author Rongsheng Lin was employed by the company Jurong Energy (XinJiang) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
