Triplet Spatial Reconstruction Attention-Based Lightweight Ship Component Detection for Intelligent Manufacturing

Feng, Bocheng; Yao, Zhenqiu; Feng, Chuanpu

doi:10.3390/app15158676


by Bocheng Feng ¹, Zhenqiu Yao ¹,²,* and Chuanpu Feng

¹ School of Naval Architecture and Ocean Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China

² Marine Equipment and Technology Institute, Jiangsu University of Science and Technology, Zhenjiang 212003, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8676; https://doi.org/10.3390/app15158676

Submission received: 5 July 2025 / Revised: 23 July 2025 / Accepted: 5 August 2025 / Published: 5 August 2025

(This article belongs to the Special Issue Artificial Intelligence on the Edge for Industry 4.0)


Abstract

Automatic component recognition plays a crucial role in intelligent ship manufacturing, but existing methods suffer from low recognition accuracy and high computational cost in industrial scenarios involving small samples, component stacking, and diverse categories. To address the requirements of shipbuilding industrial applications, a Triplet Spatial Reconstruction Attention (TSA) mechanism that combines threshold-based feature separation with triplet parallel processing is proposed, and a lightweight You Only Look Once Ship (YOLO-Ship) detection network is constructed. Unlike existing attention mechanisms that focus on either spatial reconstruction or channel attention independently, the proposed TSA integrates triplet parallel processing with spatial feature separation–reconstruction techniques to achieve enhanced target feature representation while significantly reducing parameter count and computational overhead. Experimental validation on a small-scale actual ship component dataset demonstrates that the improved network achieves 88.7% mean Average Precision (mAP), 84.2% precision, and 87.1% recall, representing improvements of 3.5%, 2.2%, and 3.8%, respectively, compared to the original YOLOv8n algorithm. The network requires only 2.6 M parameters and a computational cost of 7.5 Giga Floating-point Operations (GFLOPs), achieving a good balance between detection accuracy and lightweight model design. Future research directions include developing adaptive threshold learning mechanisms for varying industrial conditions and integrating surface defect detection capabilities to enhance comprehensive quality control in intelligent manufacturing systems.

Keywords: intelligent manufacturing; lightweight object detection; attention mechanism; small sample detection

1. Introduction

The global shipbuilding industry, as a crucial supporting sector for international trade, is undergoing transformation and upgrading toward intelligent manufacturing. According to statistics from the United Nations Conference on Trade and Development, over 80% of global trade goods are transported by maritime shipping, making the maritime industry a vital pillar of international trade. In this context, the level of intelligence in the shipbuilding industry [1] directly relates to the efficiency and cost control of global maritime logistics. However, due to the unique characteristics of order-based production mode in shipbuilding, the sorting work after ship component cutting has become a key bottleneck, constraining production efficiency improvement.

Ship component sorting, as the core process connecting cutting and assembly, directly affects the overall production cycle [2]. Current sorting modes rely on Quick Response (QR) code recognition technology to achieve semi-automated operation but still require manual classification and stacking according to patterns, with heavy dependence on manual processes severely constraining the transition toward fully intelligent shipbuilding. Unlike standardized manufacturing industries, ship component manufacturing features diverse patterns, complex and varied geometric shapes, different size specifications, and widespread component stacking occlusion phenomena, limiting the application effectiveness of traditional intelligent recognition technologies.

For target recognition and classification tasks in industrial scenarios, existing research is primarily based on deep learning methods. Single-stage detection networks represented by the You Only Look Once (YOLO) series have gained widespread attention in industrial applications due to their efficiency but typically require large amounts of training data to achieve ideal recognition performance. When facing complex scenarios in ship component sorting with limited dataset samples and target stacking occlusion, existing methods still encounter problems such as inaccurate recognition and missed detection.

To address the problems of small samples and target occlusion, researchers mainly adopt two improvement approaches. The first is network structure design: Residual Network (ResNet) and Densely Connected Convolutional Network (DenseNet) use residual connections and dense connectivity, respectively, to obtain lighter models through improved connection methods, but they have limited effectiveness in handling stacking occlusion [3]. The second is the introduction of attention mechanisms. Although these show potential for improving detection performance, existing attention mechanisms still face key challenges in the industrial scenarios of ship component recognition, including high computational complexity, insufficient small-sample adaptability, and limited capability for processing redundant spatial information [4].

Current attention mechanism research demonstrates promising results in general computer vision tasks, yet effective solutions remain needed for the specific requirements of ship component sorting, including small sample learning, target stacking recognition, and feature redundancy suppression. In ship manufacturing environments, existing attention mechanisms encounter difficulties distinguishing between diverse component patterns and complex geometric shapes, while manual markings and surface scratches create interference that conventional spatial attention cannot effectively filter, and component stacking occlusion produces spatial relationships that current channel attention mechanisms fail to properly address. Therefore, a Triplet Spatial Reconstruction Attention (TSA) mechanism is proposed, addressing these limitations through threshold-based feature separation combined with triplet parallel processing to simultaneously achieve high accuracy in small-sample scenarios while maintaining computational efficiency for industrial deployment.

The proposed methodology encompasses dataset construction for ship component recognition, systematic analysis of existing attention mechanisms, design of the TSA mechanism featuring dual-branch parallel processing architecture, and integration with YOLOv8n to develop the YOLO-Ship detection network. The TSA mechanism establishes feature extraction in different spatial directions within sub-networks, achieving cross-dimensional feature fusion through channel dimensions while performing separation and reconstruction of spatial semantic information through weight discrimination, suppressing redundant information in spatial dimensions and enhancing semantic feature expression capability. Experimental validation demonstrates substantial performance improvements with a mean Average Precision reaching 88.7%, precision achieving 84.2%, and recall attaining 87.1%, while maintaining computational efficiency with 2.6 M parameters and 7.5 GFLOPs. The network fully leverages the feature separation reconstruction and parallel processing advantages of the TSA mechanism, achieving an effective balance between accuracy and efficiency in small-sample ship component recognition tasks. The following sections provide a comprehensive analysis of related work, detailed methodology description, experimental validation, and discussion of the proposed approach’s performance and implications.

2. Related Work

2.1. Attention Mechanisms in Object Detection

Attention mechanisms such as Squeeze-and-Excitation Network (SENet) and Convolutional Block Attention Module (CBAM) adopt spatial and channel combination approaches, effectively improving network performance. Among them, CBAM shows significant effectiveness in cross-dimensional attention weight integration [5]. With the development of attention mechanisms, researchers have proposed improvement schemes from different perspectives. Efficient Multi-scale Attention (EMA), proposed by Daliang Ouyang, as a multi-scale attention mechanism, adopts a three-way parallel architecture combining spatial scale with channel information to achieve target feature information enhancement and cross-dimensional interactive fusion [6]. Kin Wai Lau proposed Large Separable Kernel Attention (LSKAttention), a separable kernel attention mechanism that decomposes 2D convolution kernels into 1D cascaded structures, reducing computational overhead [7]. Despite these advances, existing attention mechanisms exhibit fundamental limitations when applied to industrial scenarios: sequential processing architectures that create computational bottlenecks unsuitable for real-time deployment, uniform spatial treatment that fails to distinguish meaningful signals from industrial noise such as surface markings and lighting variations, and inadequate integration of spatial reconstruction with channel attention for robust feature representation under challenging industrial conditions.

2.2. Spatial Reconstruction and Feature Separation Methods

Spatial reconstruction approaches aim to address feature redundancy through separation and reorganization strategies. The Spatial and Channel Reconstruction Convolution (SCConv) demonstrates effectiveness in handling spatial and channel redundancy through separation–reconstruction strategies, utilizing dedicated units for spatial and channel processing. Coordinate Attention (CA) performs cross-dimensional fusion embedding of spatial channels, showing excellent performance in feature interaction across different dimensional spaces. Meanwhile, lightweight network architectures such as Residual Network (ResNet) and Densely Connected Convolutional Network (DenseNet) employ residual connections and dense connectivity, respectively, to achieve computational efficiency through improved connection methods, yet these approaches show limited effectiveness in handling complex spatial relationships and feature separation requirements characteristic of industrial detection scenarios.

2.3. Industrial Scene Small-Sample Detection Challenges

Industrial manufacturing environments present distinct detection challenges due to inherent complexities in object characteristics and data acquisition constraints. In aluminum casting inspection, deep object detection methods face significant obstacles when dealing with simulated defects and complex geometric variations under industrial conditions [8]. Recent research has examined difficulties in processing small-scale components, occlusion patterns, and constrained training datasets within manufacturing facilities [9]. Industrial stamping processes demonstrate similar constraints where deep metric learning approaches must operate effectively with limited sample sizes for tool condition diagnosis [10].

Ship component detection encounters specific small-sample challenges due to the diverse patterns, complex geometric shapes, and varying size specifications inherent in maritime manufacturing. The acquisition of defective assembly samples requires extended production periods, as assembly errors occur infrequently during normal operations. Traditional few-shot learning approaches face difficulties in ship component recognition due to high visual similarity between correctly and incorrectly assembled components, where subtle assembly differences must be detected reliably [11]. Advanced prototype-based methods show effectiveness in laboratory conditions yet struggle when applied to ship manufacturing environments where lighting variations, component wear, and assembly contexts affect detection accuracy [12]. These limitations highlight the necessity for robust detection mechanisms specifically designed for industrial small-sample scenarios.

3. Materials and Methods

The methodology encompasses four key components: dataset construction for ship component recognition, analysis of existing attention mechanisms, design of the proposed TSA triplet spatial reconstruction attention mechanism, and development of the YOLO-Ship detection network.

3.1. Dataset Construction

Due to the order-based customized production mode adopted in shipbuilding, with complex and diverse geometric shapes of components, there is currently a lack of open-source ship component datasets internationally. The dataset was constructed through a combination of web collection and on-site acquisition methods, expanding 160 original images to 792 images through rotation and flipping operations. The image size was uniformly set to 640 × 640. Based on the sample quantity, this dataset belongs to a small-sample dataset. To improve training effectiveness, homomorphic filtering [13] was applied to the dataset images using an improved Gaussian function to filter noise and the effects of light diffuse reflection from steel plate surfaces. The improved homomorphic filtering function is shown as follows:

$$H(u, v) = \left(1 - e^{-c D^{2}(u, v)}\right) \left[1 - \frac{k}{e^{D(u, v) - 1} + e^{1 + D(u, v)}}\right] + R_{l} \tag{1}$$

where $H(u, v)$ represents the improved homomorphic filtering transfer function, $c$ represents the steepness of the function, $D(u, v)$ represents the Euclidean distance from point $(u, v)$ to the center in the frequency domain, $k$ represents the adjustment parameter, $l$ represents half the horizontal length of the image after Fourier transform, and $R_{l}$ is the low-frequency gain coefficient.
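To make Equation (1) concrete, the following is a minimal NumPy sketch of how such a transfer function could be evaluated and applied. It assumes the usual homomorphic pipeline (logarithm, Fourier transform, multiplication by the transfer function, inverse transform, exponential), which the text does not spell out. The function names and the default values of `c`, `k`, and `r_l` are illustrative placeholders, not the settings used in the paper.

```python
import numpy as np

def improved_homomorphic_transfer(shape, c=1.0, k=1.0, r_l=0.5):
    """Evaluate H(u, v) from Equation (1) on a centred frequency grid.

    c, k and r_l are placeholder values, not the paper's settings.
    """
    rows, cols = shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    # D(u, v): Euclidean distance from each frequency sample to the centre.
    d = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    high_pass = 1.0 - np.exp(-c * d ** 2)
    correction = 1.0 - k / (np.exp(d - 1.0) + np.exp(1.0 + d))
    return high_pass * correction + r_l

def homomorphic_filter(image, **kwargs):
    """Assumed pipeline: log -> FFT -> multiply by H(u, v) -> inverse FFT -> exp."""
    log_img = np.log1p(image.astype(np.float64))
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))
    h = improved_homomorphic_transfer(image.shape, **kwargs)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * h)).real
    return np.expm1(filtered)

# Hypothetical usage on a small synthetic grayscale image.
demo = np.arange(64, dtype=np.float64).reshape(8, 8)
print(homomorphic_filter(demo).shape)  # (8, 8)
```

At the center of the spectrum the first factor vanishes, so the transfer function reduces to the low-frequency gain, while far from the center it approaches one plus that gain. That is the behaviour expected from a filter meant to damp slowly varying illumination.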

Regarding the determination of sample categories, layout planning of component positions is performed during steel plate cutting to maximize steel plate utilization, so component styles and their distribution are diverse and effectively random at recognition time. To address this issue, the complex target recognition and classification task was decomposed. After components are identified and assigned to their corresponding ship sections through QR code scanning, the component styles on the target steel plates are classified by geometric features into five categories: rectangle, circle, rectangle-circle (linear features dominate over curved features), circle-rectangle (curved features dominate over linear features), and others. The five component classification labels were set as rectangle, circle, rec-cir, cir-rec, and other, respectively, and label files were generated accordingly to complete dataset construction. This sorting and classification method is applicable to the sorting of ship steel plate cutting components. Partial component image data are shown in Figure 1.

To ensure reproducibility and facilitate future research, the complete dataset will be publicly released upon publication. The dataset includes all processed images in 640 × 640 resolution with corresponding annotation files in standard format for all five component categories. This release aims to establish a benchmark for ship component recognition research and enable direct comparison with future methods.

3.2. Attention Mechanism

The production environment for ship components is harsh, and component surfaces carry numerous sources of interference such as manual markings, scratches, and stacking, which leads to substantial redundant information in the spatial dimensions. A spatial separation-reconstruction approach was therefore adopted, using thresholds to divide feature map data into two categories: information-rich and information-sparse. The Spatial and Channel Reconstruction Convolution (SCConv) module proposed by Li et al. effectively handles spatial and channel redundancy through a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU) [14], providing effective ideas for spatial reconstruction, but the overall architecture has high computational complexity and a large parameter count. Coordinate Attention (CA) performs cross-dimensional fusion embedding of spatial channels, showing excellent performance in feature interaction, but its identification and filtering of redundant feature information is not ideal when facing multi-scale feature fusion [15].

Inspired by the design concepts of SCConv spatial reconstruction strategy and CA cross-dimensional fusion, addressing the problems of high computational complexity in the former and insufficient redundant information identification capability in the latter, TSA is constructed as a triplet spatial reconstruction attention mechanism. TSA achieves a balance between performance enhancement and computational efficiency in small-sample data scenarios through innovative triplet parallel processing and spatial separation-reconstruction strategies.

3.2.1. SRU

The SRU is the spatial reconstruction component of the Spatial and Channel Reconstruction Convolution (SCConv) module proposed in 2023. It adopts a separation–reconstruction approach to effectively reduce redundant features in spatial and channel dimensions, decrease model computational complexity, and enhance semantic feature information. Group normalization is applied to input feature maps, dividing channels equally into 16 groups, with the normalization principle shown in Equation (2).

$$X_{out} = GN(X) = \gamma \frac{X - \mu}{\sqrt{\sigma^{2} + \varepsilon}} + \beta \tag{2}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the input feature map $X$, $\varepsilon$ is the stability coefficient, and $\gamma$ and $\beta$ are trainable affine transformation values.
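As an illustration of Equation (2), the sketch below applies group normalization to a single feature map with NumPy. The function name, the default `eps`, and the identity initialisation of `gamma` and `beta` are assumptions made for the example. The 16-group default mirrors the grouping described above.

```python
import numpy as np

def group_norm(x, num_groups=16, gamma=None, beta=None, eps=1e-5):
    """Group normalization of one feature map x with shape (C, H, W)."""
    c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    # Consecutive channels form a group; statistics are shared within a group.
    grouped = x.reshape(num_groups, -1)
    mu = grouped.mean(axis=1, keepdims=True)
    var = grouped.var(axis=1, keepdims=True)
    normed = ((grouped - mu) / np.sqrt(var + eps)).reshape(c, h, w)
    # Per-channel trainable scale and shift (identity when not supplied).
    gamma = np.ones(c) if gamma is None else gamma
    beta = np.zeros(c) if beta is None else beta
    return gamma[:, None, None] * normed + beta[:, None, None]

# Hypothetical usage: 32 channels split into 16 groups of 2 channels each.
features = np.random.default_rng(0).normal(size=(32, 4, 4))
print(group_norm(features).shape)  # (32, 4, 4)
```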

Weight coefficients are obtained through normalization processing, as shown in Equation (3).

$$W_{\gamma} = \{w_{i}\} = \frac{r_{i}}{\sum_{j = 1}^{C} r_{j}}, \quad i, j = 1, 2, \dots, C \tag{3}$$

where $W_{\gamma}$ represents the normalized weight coefficient vector, $w_{i}$ represents the weight value of the $i$-th channel, $r_{i}$ represents the scaling parameter of the corresponding channel, and $C$ represents the total number of feature map channels.

Based on weight coefficients $W_{\gamma}$, feature separation is achieved using the Sigmoid activation function and a gating mechanism, as shown in Equation (4).

$$W = \mathrm{Gate}(\mathrm{Sigmoid}(W_{\gamma}(GN(X)))) \tag{4}$$

where the $\mathrm{Gate}$ function converts continuous weights to binary masks through preset thresholds, achieving the separation of information-rich features from redundant features. Finally, based on Equation (5), the separated features are reorganized through cross-reconstruction operations, suppressing spatial redundant information while enhancing effective feature expression.

$$
\begin{cases}
X_{1}^{\omega} = W_{1} \otimes X, \quad X_{2}^{\omega} = W_{2} \otimes X \\
X_{11}^{\omega} \oplus X_{22}^{\omega} = X^{\omega 1} \\
X_{21}^{\omega} \oplus X_{12}^{\omega} = X^{\omega 2} \\
X^{\omega 1} \cup X^{\omega 2} = X^{\omega}
\end{cases} \tag{5}
$$

where $X_{1}^{\omega}$ and $X_{2}^{\omega}$ represent the feature maps separated by weights $W_{1}$ and $W_{2}$, respectively, $W_{1}$ and $W_{2}$ are the binary masks obtained from Equation (4), $X_{11}^{\omega}$, $X_{22}^{\omega}$, $X_{21}^{\omega}$, and $X_{12}^{\omega}$ represent the subdivided feature groups for cross-reconstruction, $\otimes$ denotes element-wise multiplication, $\oplus$ denotes element-wise addition, and $\cup$ denotes concatenation.
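The separation and cross-reconstruction steps of Equations (3) to (5) can be sketched in NumPy as follows. This is a simplified reading under several assumptions: the gate threshold of 0.5 is a placeholder, the subdivision into feature groups is taken to be an even split along the channel axis, and the group-normalized map `gn_x` is passed in rather than recomputed. The function name is hypothetical.

```python
import numpy as np

def sru_separate_reconstruct(x, gn_x, gamma, threshold=0.5):
    """Threshold-based separation and cross-reconstruction.

    x, gn_x: (C, H, W) input and its group-normalized version.
    gamma:   (C,) per-channel scaling parameters of the normalization.
    threshold: assumed gate threshold (placeholder value).
    """
    c = x.shape[0]
    assert c % 2 == 0, "channel count must be even for the half split"
    # Equation (3): normalize the scaling parameters into channel weights.
    w_gamma = gamma / gamma.sum()
    # Equation (4): sigmoid of the weighted normalized features, then a gate.
    scores = 1.0 / (1.0 + np.exp(-(w_gamma[:, None, None] * gn_x)))
    w1 = (scores >= threshold).astype(x.dtype)  # information-rich mask
    w2 = 1.0 - w1                               # redundant mask
    # Equation (5): separate, split each part in half, cross-add, concatenate.
    x1, x2 = w1 * x, w2 * x
    half = c // 2
    x11, x12 = x1[:half], x1[half:]
    x21, x22 = x2[:half], x2[half:]
    return np.concatenate([x11 + x22, x21 + x12], axis=0)

# Hypothetical usage with a crude whole-map standardisation as gn_x.
feat = np.random.default_rng(0).normal(size=(4, 3, 3))
out = sru_separate_reconstruct(feat, (feat - feat.mean()) / feat.std(), np.ones(4))
print(out.shape)  # (4, 3, 3)
```

Because the two masks are complementary, every input value ends up in exactly one of the two output halves, so the reconstruction rearranges information across channel groups rather than discarding it.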

3.2.2. CA Mechanism

Coordinate Attention (CA) is a lightweight attention module characterized by considering channel dimensions while incorporating spatial position information, elevating ordinary cross-dimensional fusion to the connection between channels and spatial positions [16]. The structure is shown in Figure 2.

The CA mechanism performs global average pooling operations along vertical and horizontal directions on the input tensor, generating two one-dimensional feature vectors to capture spatial feature information from different directions. The pooled features from both directions are concatenated to construct a feature representation containing complete spatial positional information. The concatenated features undergo convolution, batch normalization, and Rectified Linear Unit (ReLU) activation processing to further enhance feature representation capability. The processed feature maps are split along horizontal and vertical directions, with each direction activated through sigmoid functions to generate direction-aware attention weights, which are then applied to the original input features through element-wise multiplication, resulting in enhanced feature outputs that fuse spatial features from different directions with channel information.
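The data flow described above can be sketched for a single feature map in NumPy. The three weight matrices stand in for 1 × 1 convolutions, and batch normalization is replaced by a per-channel standardisation because only one sample is processed. Both are simplifications assumed for illustration, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_h, w_w, eps=1e-5):
    """x: (C, H, W); w_reduce: (M, C); w_h, w_w: (C, M), acting as 1x1 convs."""
    c, h, w = x.shape
    pooled_h = x.mean(axis=2)   # (C, H): average over the width
    pooled_w = x.mean(axis=1)   # (C, W): average over the height
    y = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    y = w_reduce @ y            # shared channel-mixing step, (M, H + W)
    # Stand-in for batch normalization on a single sample.
    y = (y - y.mean(axis=1, keepdims=True)) / np.sqrt(y.var(axis=1, keepdims=True) + eps)
    y = np.maximum(y, 0.0)      # ReLU
    y_h, y_w = y[:, :h], y[:, h:]   # split back into the two directions
    a_h = sigmoid(w_h @ y_h)    # (C, H) attention along the height
    a_w = sigmoid(w_w @ y_w)    # (C, W) attention along the width
    return x * a_h[:, :, None] * a_w[None if False else slice(None), None, :]

# Hypothetical usage with random weights.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 5, 6))
out = coordinate_attention(feat, rng.normal(size=(4, 8)),
                           rng.normal(size=(8, 4)), rng.normal(size=(8, 4)))
print(out.shape)  # (8, 5, 6)
```

Both attention maps lie between 0 and 1, so the module can only attenuate responses. Which positions are kept is decided jointly by the row-wise and column-wise weights.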

3.3. TSA Triplet Spatial Reconstruction Attention Mechanism

The proposed TSA triplet spatial reconstruction attention mechanism adopts horizontal spatial pooling, vertical spatial pooling, and global feature enhancement as a triplet processing strategy, dividing input image features $X$ into two branches for parallel processing, achieving reduced computational overhead while enhancing recognition effectiveness. The horizontal and vertical pooling operations extract directional spatial dependencies while preserving channel information, and the global enhancement component applies semantic-driven threshold discrimination to distinguish meaningful features from redundant information, enabling adaptive feature reconstruction through cross-reorganization operations.

The first branch performs horizontal and vertical pooling on input feature values $X$, retaining channel information while combining the two directional dimensional values for batch normalization, obtaining target feature weight coefficients under the combination of spatial directional feature information and channel dimensions. After multiplying the weights with the feature map values $X$ themselves, sigmoid activation mapping is performed. Using these values for threshold judgment, features are divided into feature values $X_{1}^{\omega}$ with higher semantic information and relatively lower feature values $X_{2}^{\omega}$. The two sets of feature values are then divided separately, with four groups of data undergoing pairwise cross-reorganization and addition to suppress redundant spatial features, finally combining the two sets of features into original-size feature values.

The other branch performs batch normalization on feature values $X$, and after activation mapping, adds them to the first branch output feature values with weight coefficient readjustment to ensure output feature values do not overfit, yielding output $X^{\omega}$. The entire TSA triplet spatial reconstruction attention mechanism is shown in Figure 3.
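The sketch below gives one possible reading of the two-branch description above. It is not the authors' implementation: the text does not specify how the two pooled maps are combined, which activation is used, or how the branch outputs are reweighted. The broadcast sum, the sigmoid, the 0.5 threshold, and the blending factor `alpha` are therefore all assumptions, and per-channel standardisation stands in for batch normalization on a single sample.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _standardize(t, eps=1e-5):
    """Per-channel standardisation, a stand-in for batch normalization."""
    mu = t.mean(axis=(1, 2), keepdims=True)
    var = t.var(axis=(1, 2), keepdims=True)
    return (t - mu) / np.sqrt(var + eps)

def tsa_sketch(x, threshold=0.5, alpha=0.5):
    """Loose two-branch sketch for x of shape (C, H, W); all settings assumed."""
    c, h, w = x.shape
    assert c % 2 == 0, "channel count must be even for the half split"
    # Branch 1: directional pooling that keeps the channel axis.
    pooled_h = x.mean(axis=2, keepdims=True)      # (C, H, 1)
    pooled_w = x.mean(axis=1, keepdims=True)      # (C, 1, W)
    weights = _standardize(pooled_h + pooled_w)   # assumed combination, (C, H, W)
    scores = _sigmoid(weights * x)
    mask = (scores >= threshold).astype(x.dtype)  # threshold judgment
    x1, x2 = mask * x, (1.0 - mask) * x           # higher / lower semantic parts
    half = c // 2
    branch1 = np.concatenate([x1[:half] + x2[half:], x2[:half] + x1[half:]], axis=0)
    # Branch 2: normalization followed by an activation mapping.
    branch2 = _sigmoid(_standardize(x))
    # Assumed reweighting of the two branch outputs.
    return alpha * branch1 + (1.0 - alpha) * branch2

# Hypothetical usage.
feat = np.random.default_rng(0).normal(size=(4, 5, 6))
print(tsa_sketch(feat).shape)  # (4, 5, 6)
```

Every step in this sketch is an element-wise operation or a mean over one axis, which is consistent with the linear scaling in the number of feature-map elements discussed for TSA.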

The computational efficiency of TSA stems from its parallel processing architecture and linear complexity design. The horizontal and vertical pooling operations maintain O (C × H × W) complexity for spatial feature extraction, while the dual-branch structure avoids the quadratic channel dependencies inherent in traditional attention mechanisms. In contrast to SCConv’s O (C² × H × W) complexity arising from its sequential spatial-channel processing, TSA achieves feature separation and reconstruction through threshold-based operations that scale linearly with input dimensions. Similarly, while CA mechanisms typically require O (C × H × W + C²) operations for coordinate embedding and channel-wise attention computation, TSA’s triplet processing strategy eliminates the need for expensive channel-wise matrix operations through its parallel pooling and reconstruction approach. The threshold separation and cross-reorganization operations maintain constant computational overhead regardless of channel depth, contributing to the overall efficiency gain.
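To give a feel for the scaling terms quoted above, the following snippet evaluates the three leading-order expressions for one feature-map size. The chosen dimensions are hypothetical, and the functions count only the dominant term with constant factors ignored, so the numbers indicate relative growth rather than measured cost.

```python
def linear_cost(c, h, w):
    """Leading term C*H*W, as stated for TSA."""
    return c * h * w

def quadratic_channel_cost(c, h, w):
    """Leading term C^2*H*W, as stated for SCConv."""
    return c * c * h * w

def coordinate_cost(c, h, w):
    """Leading term C*H*W + C^2, as stated for CA."""
    return c * h * w + c * c

c, h, w = 128, 40, 40  # hypothetical feature-map size
print(linear_cost(c, h, w))             # 204800
print(quadratic_channel_cost(c, h, w))  # 26214400
print(coordinate_cost(c, h, w))         # 221184
```

For these dimensions the quadratic term is larger than the linear one by a factor equal to the channel count, so the gap widens in deeper layers where the number of channels grows.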

3.4. YOLO-Ship Network

Targeting the characteristics of small samples, diversity, and stacking occlusion in ship component recognition, YOLOv8n was selected as the base network. Compared to newer YOLO versions, YOLOv8 achieves better balance between accuracy and computational efficiency, with good maturity and stability in industrial applications [17], making it more suitable for the lightweight objectives of this research.

Based on the aforementioned TSA triplet spatial reconstruction attention mechanism, the TSA mechanism is combined with the C2f module for improvement, constructing the C2f-TSA module with the structure shown in Figure 4.

The improved C2f-TSA module replaces the original C2f modules in the Neck section of the YOLOv8n network, forming the YOLO-Ship detection network. Through the spatial reconstruction and cross-dimensional feature fusion capabilities of the TSA mechanism, it better handles small target detection and stacking occlusion problems in ship component images. Compared to the original network, the lightweight design of TSA ensures that the improved network reduces model parameters while improving convergence speed and detection accuracy, meeting the real-time requirements of industrial applications. The complete structure of the YOLO-Ship network is shown in Figure 5.

4. Results

Comprehensive experimental validation was conducted to evaluate the performance of the proposed YOLO-Ship model in ship component detection tasks.

4.1. Experimental Configuration

The experimental platform adopted Ubuntu 18.04 LTS operating system, PyTorch deep learning framework version 1.9.0 with Compute Unified Device Architecture (CUDA) 11.1, and RTX3090 24 GB graphics card. The dataset was divided into training (554 images), test (79 images), and validation (159 images) sets following a 7:1:2 ratio to ensure balanced category representation. The training set was used for model parameter optimization, the validation set for hyperparameter tuning and early stopping strategy, and the test set for final performance evaluation. Adaptive Moment Estimation with Weight Decay (AdamW) optimizer was selected with learning rate set to 0.001, batch size of 8, total training epochs of 300, network input image size of 640 × 640, and no pre-trained weights loaded. Model performance evaluation adopted three metrics: Precision (P), Recall (R), and mean Average Precision (mAP) [18]. These metrics comprehensively evaluate detection performance, with Precision addressing false positive control crucial for industrial applications, Recall ensuring complete detection coverage, and mAP providing integrated assessment across all confidence thresholds and categories. All comparative experiments were conducted under identical hardware, software, and parameter configurations to ensure fair comparison across different methods.
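As a quick check of the split arithmetic and of the two count-based metrics, a minimal sketch follows. The helper names are hypothetical, assigning the rounding remainder to the validation set is an assumption that happens to reproduce the reported counts, and the detection counts in the usage line are invented for illustration.

```python
def split_counts(total, ratios=(7, 1, 2)):
    """Integer train/test/validation counts; the last part takes the remainder."""
    denom = sum(ratios)
    train = total * ratios[0] // denom
    test = total * ratios[1] // denom
    val = total - train - test
    return train, test, val

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(split_counts(792))          # (554, 79, 159)
print(precision_recall(8, 2, 2))  # (0.8, 0.8), hypothetical counts
```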

4.2. Experimental Results and Analysis

The experimental results demonstrate the effectiveness of the TSA mechanism through comprehensive performance analysis and comparative studies with existing methods.

4.2.1. Model Performance Analysis

The YOLO-Ship network with integrated TSA mechanism optimizes model parameters to 2,625,035 (a reduction of 0.6 M compared to the original model) with a computational cost of 7.5 GFLOPs. The experimental results validate the theoretical advantages of TSA analyzed in Section 3.3, confirming that the mechanism achieves significant computational efficiency improvements while maintaining detection accuracy. The actual detection results are shown in Figure 6, and the model’s performance on the test set is shown in Figure 7 and Figure 8.

4.2.2. Comparison of Different Attention Mechanisms

To verify that the proposed TSA mechanism improves accuracy and precision in component recognition, ablation experiments were conducted with four control groups: no attention mechanism, SCConv attention, CA, and the proposed TSA mechanism under the same training and testing conditions [19]. The evaluation was based on Precision (P), mean Average Precision (mAP), and Recall (R) [20], with experimental results and conclusions shown in Table 1.

4.2.3. Comparison with Other Mainstream Models

To validate and evaluate YOLO-Ship model performance, multiple current mainstream target detection network models were trained on the ship component dataset under the same experimental environment and parameter configuration [21], providing comprehensive comparison across different architectural approaches. The baseline selection encompasses classical convolutional architectures and YOLO variants to represent both traditional feature extraction approaches and modern single-stage detection frameworks, ensuring fair comparison across different architectural paradigms. Performance comparison was conducted [22], as shown in Table 2.

The comparative analysis demonstrates that YOLO-Ship achieves superior performance across all evaluation metrics. Notably, YOLO-Ship outperforms recent YOLO variants, achieving 88.7% mAP, which represents significant improvements over the compared models.

5. Discussion

The proposed study introduces a Triplet Spatial Reconstruction Attention (TSA) mechanism integrated with YOLOv8n to address ship component detection challenges in small-sample industrial manufacturing scenarios. The approach achieves 88.7% mean Average Precision while reducing parameters to 2.6 M and maintaining linear computational complexity. This discussion examines the technical innovations compared to existing methods, practical implications for industrial deployment, potential application prospects, current limitations, and future research directions.

Traditional attention mechanisms exhibit fundamental limitations when applied to industrial small-sample detection scenarios. Approaches like SCConv and CA, while effective in standard computer vision tasks, demonstrate inherent design constraints in industrial environments. SCConv’s sequential processing of spatial and channel information creates computational bottlenecks that limit real-time deployment capabilities, while CA’s coordinate-based attention fails to adequately address feature redundancy prevalent in cluttered manufacturing scenes.

Recent advances in object detection increasingly explore transformer-based attention mechanisms and sophisticated multi-scale architectures. While transformer-based approaches demonstrate powerful global modeling capabilities through self-attention operations, they typically require substantial computational resources and extensive training data to establish effective attention patterns, presenting challenges for resource-constrained industrial applications. Furthermore, complex multi-scale feature fusion frameworks, despite achieving impressive accuracy in general scenarios, often introduce computational overhead that may not align optimally with the geometric patterns and spatial constraints characteristic of manufacturing environments. The threshold-based separation strategy employed in TSA addresses these limitations through targeted spatial-semantic processing that enables more granular feature discrimination while maintaining computational efficiency for industrial small-sample scenarios.

The TSA mechanism demonstrates the potential for extension to manufacturing contexts sharing similar characteristics with ship component detection, particularly scenarios involving geometric component classification, small-sample constraints, and computational efficiency requirements. The lightweight architecture and parallel processing design may prove applicable to manufacturing environments where edge computing deployment offers advantages. The performance improvements achieved by YOLO-Ship demonstrate practical significance in shipbuilding applications where component recognition accuracy impacts production efficiency. The 3.5% mAP improvement corresponds to meaningful cost reductions in large-scale manufacturing operations, where error minimization helps prevent material waste and rework cycles. The 12.8% parameter reduction while maintaining enhanced performance suggests that TSA provides an effective approach to the model complexity–accuracy balance, which is relevant for edge computing applications in industrial Internet of Things deployments.

The current implementation has several limitations. The dataset scale, while representative of typical small-sample industrial scenarios, limits comprehensive evaluation across diverse shipbuilding environments and component types. The evaluation focuses primarily on geometric component classification, and performance in surface defect detection remains unexplored. Practical deployment of TSA also involves several implementation considerations. Although the triplet parallel processing architecture reduces parameter count and theoretical computational complexity, its inference performance may vary across hardware platforms because of differences in memory bandwidth and parallel processing efficiency. The mechanism also incorporates architectural parameters, such as group normalization settings, that are optimized for the characteristics of the current dataset. While these settings perform robustly in the current evaluation, their optimal configuration may differ in other industrial contexts with different data distributions or challenging visual conditions, such as extreme metallic reflections under high-intensity lighting, where the separation mechanism’s feature discrimination capabilities could be compromised.
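Because parameter counts and GFLOPs do not determine latency on a given device, the hardware dependence noted above is best checked by direct measurement. The sketch below is a generic timing harness using only the Python standard library. The `benchmark` helper and the `dummy_forward` workload are hypothetical stand-ins, and a real measurement would replace `dummy_forward` with one forward pass of the detector on the target hardware.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Time repeated calls to fn and summarize the latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(samples_ms)
    return {
        "runs": runs,
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[int(0.95 * (runs - 1))],
    }


def dummy_forward():
    # Stand-in workload; replace with a single model inference call.
    return sum(i * i for i in range(10_000))


report = benchmark(dummy_forward)
print(report)
```

Reporting the median and a high percentile rather than the mean makes the comparison less sensitive to occasional scheduling spikes. On a CUDA device, kernel launches are asynchronous, so a synchronization step (in PyTorch, `torch.cuda.synchronize()`) would be needed before each clock reading for the timings to be meaningful.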

Future research directions should prioritize addressing current limitations and expanding the method’s robustness and applicability. Adaptive threshold learning mechanisms warrant investigation to enable automatic parameter optimization across different manufacturing environments. Robust preprocessing strategies for challenging illumination conditions require development to address metallic surface reflections and varying lighting scenarios common in industrial settings. Validation across additional small-sample industrial detection tasks would help establish the broader applicability of TSA beyond ship component recognition. Integration with emerging vision–language models presents opportunities for enhanced semantic understanding in manufacturing quality control applications, potentially enabling more sophisticated defect classification and process optimization capabilities.

6. Conclusions

A lightweight detection network, YOLO-Ship, has been presented that integrates the novel TSA mechanism to address ship component recognition challenges in intelligent manufacturing. The main contributions are the development of the Triplet Spatial Reconstruction Attention mechanism and its integration with YOLOv8n to create a detection network optimized for industrial scenarios.

Experimental validation shows consistent improvements over the YOLOv8n baseline: mAP reaches 88.7% (+3.5 percentage points), precision 84.2% (+2.2 percentage points), and recall 87.1% (+3.8 percentage points), with only 2.6 M parameters and 7.5 GFLOPs. These results support the practical value of TSA for small-sample learning and component occlusion in industrial environments.

Limitations include the small dataset scale and validation restricted to a single domain, both of which may affect broader applicability. Future research should prioritize cross-domain validation across diverse manufacturing sectors, including automotive and heavy machinery, to demonstrate TSA’s broader industrial applicability. It should also develop adaptive threshold mechanisms for TSA and investigate the method’s general effectiveness across different intelligent manufacturing systems.

Author Contributions

Conceptualization, B.F. and Z.Y.; methodology, B.F.; software, B.F.; validation, B.F., Z.Y. and C.F.; formal analysis, B.F.; investigation, B.F. and C.F.; resources, Z.Y.; data curation, B.F.; writing—original draft preparation, B.F.; writing—review and editing, B.F., Z.Y. and C.F.; visualization, B.F.; supervision, Z.Y.; project administration, Z.Y.; funding acquisition, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The ship component dataset constructed in this study, comprising all 792 images with their annotations, will be made publicly available through a GitHub repository upon publication. The dataset includes detailed geometric classification labels for all five component categories in a standard annotation format. The implementation was developed using Python 3.8.10, PyTorch 1.8.1+cu111, and CUDA 11.1 for GPU acceleration. The source code and detailed implementation of the proposed TSA mechanism and YOLO-Ship network are not publicly released because of ongoing related research; however, researchers may obtain the code and technical details by submitting a reasonable academic request to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

YOLO	You Only Look Once
QR	Quick Response
ResNet	Residual Network
DenseNet	Densely Connected Convolutional Network
SENet	Squeeze-and-Excitation Network
CBAM	Convolutional Block Attention Module
EMA	Efficient Multi-scale Attention
LSKAttention	Large Separable Kernel Attention
TSA	Triplet Spatial Reconstruction Attention
C2f	Cross Stage Partial with two convolutions and feature fusion
SCConv	Spatial and Channel reconstruction Convolution
SRU	Spatial Reconstruction Unit
CRU	Channel Reconstruction Unit
CA	Coordinate Attention
GFLOPs	Giga Floating-point Operations per Second
mAP	mean Average Precision
P	Precision
R	Recall
AdamW	Adaptive Moment Estimation with Weight Decay
ReLU	Rectified Linear Unit
CUDA	Compute Unified Device Architecture

Figure 1. Sample classification of ship components.

Figure 2. CA network structure diagram.

Figure 3. TSA Triplet spatial reconstruction attention structure diagram.

Figure 4. C2f-TSA structure diagram.

Figure 5. YOLO-Ship network structure diagram.

Figure 6. Ship component detection effect.

Figure 7. Precision-Confidence curve.

Figure 8. Precision-Recall curve.

Table 1. Comparison of the effects of different attention mechanisms.

Attention Mechanism	Precision/%	mAP/%	Recall/%	GFLOPs	Parameters
None (YOLOv8n baseline)	82.0	85.2	83.3	8.2	3,011,823
SCConv	81.4	85.6	84.6	8.0	2,879,887
CA	82.0	85.3	84.9	8.2	3,023,543
TSA	84.2	88.7	87.1	7.5	2,625,035

Table 2. Comparison of the effect of different models.

Model	Precision/%	mAP/%	Recall/%
ResNet50	81.4	84.5	82.2
ResNet101	79.6	85.6	85.9
YOLOv5n	45.1	52.1	55.8
YOLOv8n	82	85.2	83.3
YOLOv9t	80.7	86.1	82.0
YOLOv10n	77.4	82.9	77.7
YOLO-Ship	84.2	88.7	87.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
