1. Introduction
In aircraft manufacturing, operation, and maintenance environments, residual foreign object debris (FOD) poses significant threats to flight safety and product quality, potentially causing failures such as short circuits, jams, and blockages during aircraft operation. Traditional methods currently used in industrial settings primarily rely on manual visual inspection and photographic documentation. However, these methods are highly dependent on human vigilance and are prone to missed detections due to fatigue and oversight. Consequently, the development of image-based automated FOD detection methods is of considerable significance for enhancing detection reliability and ensuring aviation safety.
This task falls within the domain of Image Anomaly Detection (IAD). In recent years, deep learning-based object detection and anomaly recognition methods have achieved remarkable progress in the field of industrial visual inspection. Beyond general IAD frameworks, numerous studies have proposed various innovative approaches tailored to specific tasks such as aircraft FOD detection, industrial defect identification, and small object localization. Roth et al. introduced an attention-based multi-scale feature fusion architecture, which enables high-precision localization in semiconductor defect inspection, offering new insights for detecting subtle anomalies in complex backgrounds [1]. Chen et al. developed a Generative Adversarial Network (GAN)-based anomaly detection model that performs unsupervised anomaly segmentation through reconstruction errors of normal samples, demonstrating strong performance in detecting missing fasteners in aerospace applications [2]. Wang et al. proposed an Augmented Feature Alignment Network (AFAN) that integrates intermediate domain image generation and adversarial domain training. By leveraging multi-scale feature alignment and region-level domain-invariant feature learning, AFAN significantly outperforms existing methods in both similar and dissimilar unsupervised domain adaptation tasks, effectively mitigating issues of insufficient annotated samples and inadequate feature alignment [3]. Da et al. designed a Local and Global Feature Aggregation Network (LGFAN) incorporating a Visual Geometry Group backbone, attention mechanisms, and aggregation modules. By efficiently utilizing features to reduce FOD, LGFAN maintains competitive performance across five public datasets while improving computational efficiency, addressing the limitations of existing CNN-based salient object detection algorithms that overly emphasize multi-scale feature fusion at the expense of other critical characteristics [4].
At the same time, Wang et al. presented an enhanced YOLOv8-based model for detecting aircraft skin defects. By integrating Shuffle Attention++, SIOU, and Focal Loss, a bidirectional feature pyramid network, and depthwise separable convolutions, alongside data augmentation and class-balancing strategies, the model considerably improves detection accuracy, recall, and speed for small objects in complex environments, meeting the demands of high-precision real-time inspection [5]. Liao et al. developed a system utilizing portable devices (e.g., smartphones and drones) coupled with IoT technology for real-time aircraft skin defect inspection. Employing a YOLOv9 model, the system achieves high recognition accuracy, demonstrating the potential of automated image-based detection in aviation maintenance [6]. Qiao et al. proposed a Unified Enhanced Feature Pyramid Network (UEFPN), which constructs a unified multi-scale feature domain and incorporates channel attention fusion modules to alleviate feature aliasing while enhancing contextual information in shallow features. UEFPN can be rapidly adapted to various models, and experiments confirm its significant improvements in small object detection performance [7]. Hu et al. introduced a self-supervised learning framework based on Faster R-CNN that substantially reduces the need for manual annotation in steel surface defect detection by effectively leveraging unlabeled data, offering a scalable and general solution for industrial defect recognition under low annotation budgets [8]. Zhang et al. designed a lightweight FOD model named LF-YOLO for detecting FOD on runways. By integrating high-resolution feature maps and employing a lightweight backbone and detection head, LF-YOLO achieves superior detection accuracy on small-object FOD datasets with fewer parameters compared to state-of-the-art methods [9]. Ye et al. proposed an improved YOLOv3-based FOD detection algorithm for identifying FOD on airport runways. Through multi-scale detection and enhanced feature extraction, combined with image processing and deep learning, the method enables efficient autonomous recognition with a detection speed of less than 0.2 s, exhibiting promising environmental adaptability and engineering applicability [10]. Zhang et al. addressed the challenge of detecting small and inconspicuous FOD on runways—which often leads to false positives and missed detections—by comprehensively refining the YOLOv5 algorithm through structural optimization, novel modules and loss functions, and alternative upsampling methods. Their approach achieves a notable increase of 5.4% on the Fod_Tiny dataset and 1.9% on the Micro_COCO dataset over baseline performance, validating the effectiveness and generalization capability of the proposed improvements [11]. Shan et al. surveyed traditional, radar, and AI-based techniques for FOD detection in critical areas such as airport runways, outlining the strengths and weaknesses of each method and emphasizing the need for integrated radar and AI systems to enhance detection performance, thereby guiding future research toward safer and more efficient operations [12]. Kumari et al. introduced an enhanced YOLOv8-based deep learning approach for runway FOD detection, achieving higher precision (mAP50 of 99.022% and mAP50-95 of 88.354%) on the public FOD-A dataset and outperforming conventional methods in complex environments [13]. Mo et al. proposed an intelligent detection method based on an improved YOLOv5 architecture. Trained on a dual-spectrum dataset, the model achieves real-time detection at 36.3 frames per second with 91.1% accuracy (a 7.4% improvement over the baseline), demonstrating both effectiveness and practical utility [14]. Yu et al. developed a lightweight FOD detection model based on YOLOv5s, which significantly enhances the accuracy and speed of small FOD detection through optimized feature extraction, receptive field modules, and detection head design, providing an innovative solution for automated FOD inspection in civil aircraft assembly [15]. These studies offer diverse technical pathways for aircraft FOD detection, encompassing feature fusion, domain adaptation, small object optimization, and real-time processing. Nevertheless, they also underscore the persistent challenges in achieving efficient, accurate, and robust detection in real-world complex scenarios.
In contrast to the aforementioned studies, which are predominantly focused on structured environments such as runways and aircraft skins, this paper addresses the challenge of FOD detection within enclosed aircraft compartments. Such environments present multifaceted difficulties including confined spaces, highly variable viewpoints, uneven illumination, and complex backgrounds, resulting in significantly more complicated detection conditions and a scarcity of large-scale annotated data. To tackle these challenges, we propose a registration-based Siamese network framework for FOD detection, aiming to resolve the critical problem of cross-view anomaly detection in complex interior spaces.
The proposed model is built upon a Siamese architecture and trained using pairs of normal images. To preserve spatial information, the network retains the first three convolutional blocks of ResNet, while incorporating spatial transformation modules to handle geometric variations between image pairs. A feature registration module is further integrated to prevent model collapse. During inference, anomalies are identified by quantifying the deviation between the features of the test image and the established normal distribution. Experimental results demonstrate that our model achieves significant improvements in both image-level and pixel-level AUC (Area Under the Curve) compared to baseline methods.
The main contributions of this work can be summarized as follows: (1) the construction of a dedicated dataset specifically designed for aircraft FOD detection; (2) the development of a novel detection framework that integrates Siamese networks with spatial transformations; (3) comprehensive validation of the proposed model, accompanied by an in-depth analysis of key factors influencing detection performance; and (4) demonstration of the model’s robustness through evaluation on additional datasets.
The remainder of this paper is organized as follows: Section 2 describes the materials and methods, Section 3 presents experimental results and analysis, Section 4 provides discussion, and Section 5 concludes the study.
2. Materials and Methods
2.1. Siamese Representation Learning
Siamese networks [16,17,18], which employ weight-sharing branches to process paired inputs, have become a cornerstone of unsupervised visual representation learning. Early methods predominantly utilized contrastive learning frameworks, such as SimCLR (A Simple Framework for Contrastive Learning of Representations) [19] and MoCo (Momentum Contrast) [20], which maximize the similarity between augmented views of the same image while repelling negative pairs to prevent representation collapse. These approaches typically required large batch sizes or memory banks to maintain performance. Subsequently, alternative strategies were introduced: clustering-based methods like SwAV (Swapping Assignments between Views) [21] assigned representations to prototypes through online clustering, while BYOL (Bootstrap Your Own Latent) [22] employed a momentum encoder in one branch to provide stable targets without relying on negative examples. These techniques collectively reinforced the assumption that complex mechanisms—such as negative sampling, large batches, or momentum encoders—were indispensable for avoiding collapse.
A significant shift in this paradigm was introduced by SimSiam, which demonstrated that none of these components is fundamentally required. The architecture of SimSiam is notably simple: it maximizes the similarity between two augmented views using only a weight-shared encoder, a small predictor MLP (Multilayer Perceptron), and a stop-gradient operation. Critically, the removal of the stop-gradient operation leads to immediate model collapse, whereas its inclusion enables the model to achieve competitive accuracy on ImageNet. Although the predictor MLP is essential to the learning process, it does not need to converge fully. Similarly, while batch normalization aids in optimization, it is not the factor preventing collapse.
As shown in Figure 1, the encoder $f$ processes both input views under a weight-sharing mechanism, producing feature representations $z_1 = f(x_1)$ and $z_2 = f(x_2)$. A prediction MLP head, denoted as $h$, is applied to the output of one view and aligns it with the representation of the other view, yielding $p_1 = h(f(x_1))$. Defining the two output vectors as $p_1 \triangleq h(f(x_1))$ and $z_2 \triangleq f(x_2)$, we minimize the negative cosine similarity between them:

$$\mathcal{D}(p_1, z_2) = -\frac{p_1}{\left\| p_1 \right\|_2} \cdot \frac{z_2}{\left\| z_2 \right\|_2}$$

A symmetrized loss was defined as:

$$\mathcal{L} = \frac{1}{2}\mathcal{D}(p_1, \operatorname{stopgrad}(z_2)) + \frac{1}{2}\mathcal{D}(p_2, \operatorname{stopgrad}(z_1))$$

where $\operatorname{stopgrad}(\cdot)$ denotes the stop-gradient operation discussed above.
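To make the role of the stop-gradient explicit, the following is a minimal PyTorch sketch of this symmetrized loss; the encoder `f` and predictor `h` are placeholders for the weight-shared encoder and prediction MLP described above, not the exact networks used in this work.

```python
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # D(p, z) = -(p / ||p||_2) . (z / ||z||_2), averaged over the batch.
    # z.detach() implements the stop-gradient that prevents collapse.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_loss(f, h, x1, x2):
    """Symmetrized SimSiam loss for two augmented views x1 and x2.

    f: weight-shared encoder, h: prediction MLP head (both nn.Module).
    """
    z1, z2 = f(x1), f(x2)          # representations of the two views
    p1, p2 = h(z1), h(z2)          # predictions
    # L = 1/2 * D(p1, stopgrad(z2)) + 1/2 * D(p2, stopgrad(z1))
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```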
2.2. Spatial Transformation Module
The Spatial Transformation Module [23] addresses a fundamental limitation of Convolutional Neural Networks (CNNs): the lack of efficient and learnable spatial invariance to geometric input transformations such as translation, rotation, scaling, and warping. Although CNNs excel in tasks such as classification and detection, their ability to handle spatial variations depends primarily on fixed mechanisms like small-kernel max-pooling, which are often inadequate in the presence of significant or complex deformations. The Spatial Transformation Module provides a differentiable component that actively warps feature maps into a canonical form, thereby improving model robustness without requiring additional supervision.
The Spatial Transformation Module consists of three key components:
1. Localisation Net: A sub-network (e.g., a convolutional neural network or a fully connected network) that estimates the transformation parameters $\theta$ based on the input. The localisation function, denoted as $f_{\mathrm{loc}}(\cdot)$, must include a final regression layer to output the transformation parameters $\theta = f_{\mathrm{loc}}(U)$. The localisation network takes an input feature map $U \in \mathbb{R}^{H \times W \times C}$, where $W$ and $H$ denote the width and height, and $C$ denotes the number of channels.
2. Grid Generator: This module generates a sampling grid $T_{\theta}(G)$ by applying the transformation parameters $\theta$ to a predefined target grid $G$. It maps each coordinate in the output feature map back to the corresponding location in the input feature space. Assuming $T_{\theta}$ represents a 2D affine transformation $A_{\theta}$, the pointwise mapping can be expressed as:

$$\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix} = T_{\theta}(G_i) = A_{\theta} \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}$$

where $(x_i^{t}, y_i^{t})$ are the target coordinates of the regular grid in the output feature map, $(x_i^{s}, y_i^{s})$ are the source coordinates in the input feature map that define the sample points, and $A_{\theta}$ is the affine transformation matrix. The transformation matrix can be affine (6 parameters), projective (8 parameters), or thin-plate spline (16+ parameters). Crucially, the Spatial Transformation Module is differentiable end-to-end, enabling seamless integration into CNNs via standard backpropagation.
3. Differentiable Sampler: Warps the input feature map using bilinear interpolation (or other kernels) at the sampled coordinates, ensuring that gradients flow back to $\theta$. To perform a spatial transformation of the input feature map, the sampler takes the set of sampling points $T_{\theta}(G)$, along with the input feature map $U$, and produces the sampled output feature map $V$. For bilinear sampling, this takes the form

$$V_i^{c} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c} \, \max(0, 1 - |x_i^{s} - m|) \, \max(0, 1 - |y_i^{s} - n|)$$

where $U_{nm}^{c}$ is the value at location $(n, m)$ in channel $c$ of the input, and $V_i^{c}$ is the output value for pixel $i$ at location $(x_i^{t}, y_i^{t})$ in channel $c$.
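To illustrate how the three components interact, the snippet below sketches a minimal affine spatial transformation module in PyTorch; the localisation network's layer sizes and the restriction to the affine mode are illustrative assumptions rather than the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    """Minimal spatial transformer: localisation net -> grid generator -> sampler."""

    def __init__(self, channels):
        super().__init__()
        # 1. Localisation net: regresses the 6 affine parameters theta from the input.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(channels * 8 * 8, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 6),
        )
        # Initialise the regression layer to the identity transform so training starts unwarped.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, u):                       # u: (N, C, H, W) input feature map U
        theta = self.loc(u).view(-1, 2, 3)      # A_theta, one 2x3 matrix per sample
        # 2. Grid generator: maps output coordinates back to source coordinates T_theta(G).
        grid = F.affine_grid(theta, u.size(), align_corners=False)
        # 3. Differentiable sampler: bilinear sampling at the source coordinates.
        return F.grid_sample(u, grid, align_corners=False)
```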
2.3. Anomaly Scoring
During inference, we compare the registered features of the evidence retention image to its corresponding normal distribution to detect anomalies. Test samples that fall outside the normal distribution are considered anomalies. Given the estimated normal distribution $\mathcal{N}(\mu_{ij}, \Sigma_{ij})$, denote $f_{ij}$ as the registered feature of the test image at the patch position $(i, j)$, where $\mu_{ij}$ is the sample mean and $\Sigma_{ij}$ the sample covariance of the reference features at that position. The anomaly score of the patch at position $(i, j)$ is formulated as the Mahalanobis distance:

$$\mathcal{A}_{ij} = \sqrt{(f_{ij} - \mu_{ij})^{T} \, \Sigma_{ij}^{-1} \, (f_{ij} - \mu_{ij})}$$
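A minimal sketch of this scoring rule is given below, assuming the per-patch Gaussian parameters have already been estimated from reference features; the Mahalanobis form follows PaDiM [24] and the variable names are illustrative.

```python
import torch

def patch_anomaly_score(f_ij, mu_ij, cov_inv_ij):
    """Mahalanobis distance of a registered patch feature from its normal distribution.

    f_ij:       (D,) registered feature of the test image at patch position (i, j)
    mu_ij:      (D,) sample mean of the reference features at (i, j)
    cov_inv_ij: (D, D) inverse of the sample covariance at (i, j)
    """
    diff = f_ij - mu_ij
    return torch.sqrt(diff @ cov_inv_ij @ diff)
```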
2.4. Proposed Method
Conventional image anomaly detection (IAD) methods typically require training a dedicated model for each object category. However, such a paradigm becomes highly impractical in real-world aircraft inspection scenarios due to the large number of enclosed compartments, each of which must be extensively documented with evidence retention images. If a separate model were trained for each distinct category, the total number of required models would become prohibitively large, rendering this approach infeasible for practical deployment. Standard IAD frameworks, such as those evaluated on the MVTec dataset (15 categories, requiring 15 models) or the MPDD dataset (6 categories, requiring 6 models), operate under this category-specific training assumption. In the context of our study, each evidence retention image effectively constitutes a unique category. With more than 150 enclosed zones in a typical aircraft and nearly 100 evidence retention images per zone, the total number of categories would exceed 1500. Consequently, training over 1500 individual models for FOD detection is clearly computationally intractable and economically unviable.
Upon analyzing the process of visual FOD detection, it becomes evident that the underlying mechanism involves mentally comparing the current image against a reference image of the same location without any FOD, thereby identifying anomalies as discrepancies. Inspired by this cognitive process, we propose a computational approach that compares evidence retention images against reference images captured at identical locations but under anomaly-free conditions. However, due to variations in imaging viewpoint, field of view, color balance, and illumination between evidence and reference images, feature-level image registration is essential to align their representations. By comparing the registered features of each evidence retention image against a corresponding normal distribution estimated from reference images, FOD can be effectively identified as statistical anomalies. This transforms the task into training a model capable of performing image registration—a process analogous to human mental comparison and invariant to image content, thereby resulting in a category-agnostic detection framework.
As illustrated in Figure 1, the proposed model is constructed based on a Siamese network architecture [16]. It is trained using positive image pairs from the MVTec, MPDD, and a self-built dataset, all belonging to the same object category. First, following the design of PaDiM [24], we retain the first three convolutional residual blocks of ResNet (B1, B2, and B3) while omitting the final block to preserve sufficient spatial information in the output features. Second, to facilitate image registration, a spatial transformation module [25] is incorporated after each convolutional residual block. This module supports six transformation modes: affine, shear, rotation–scale, translation–scale, rotation–translation, and rotation–translation–scale. Finally, the outputs from both the convolutional blocks and the spatial transformation modules are fed into a feature registration module, which prevents model collapse during optimization even in the absence of negative samples.
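As a rough sketch of how these pieces could be wired together, the snippet below stacks the retained ResNet blocks with a spatial transformation module after each one; the choice of ResNet-18, the reuse of the AffineSTN sketch from Section 2.2, and the omission of the feature registration module are simplifying assumptions rather than the exact configuration of the proposed network.

```python
import torch.nn as nn
from torchvision.models import resnet18

class RegistrationBackbone(nn.Module):
    """First three ResNet blocks (B1-B3), each followed by a spatial transformation module."""

    def __init__(self):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.b1, self.b2, self.b3 = r.layer1, r.layer2, r.layer3   # B4 (layer4) is omitted
        # AffineSTN is the module sketched in Section 2.2; channel widths follow ResNet-18.
        self.stn1, self.stn2, self.stn3 = AffineSTN(64), AffineSTN(128), AffineSTN(256)

    def forward(self, x):
        x = self.stem(x)
        x = self.stn1(self.b1(x))
        x = self.stn2(self.b2(x))
        x = self.stn3(self.b3(x))
        return x   # registered feature map, fed to the feature registration module
```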
During inference, a normal distribution is first estimated for the target category using a small set of reference images from the same location that are free of FOD. Given the limited number of such reference images, data augmentation is applied to enhance statistical robustness. The registered features of each evidence retention image are then compared against this estimated distribution to compute an anomaly score, which quantifies the deviation from normality and serves as the detection criterion for FOD.
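A minimal sketch of this inference procedure is shown below, assuming the registered features of the augmented reference images have already been extracted (e.g., with a backbone such as the one sketched above) and stacked into a single tensor; the regularization term and tensor shapes are illustrative assumptions.

```python
import torch

def estimate_normal_distribution(ref_feats, eps=0.01):
    """Estimate a per-patch Gaussian N(mu_ij, Sigma_ij) from augmented reference features.

    ref_feats: (N, D, H, W) registered features of N augmented reference images.
    Returns the mean (H, W, D) and the inverse covariance (H, W, D, D).
    """
    n, d, h, w = ref_feats.shape
    feats = ref_feats.permute(2, 3, 0, 1)                  # (H, W, N, D)
    mu = feats.mean(dim=2)                                 # (H, W, D)
    centered = feats - mu.unsqueeze(2)
    cov = centered.transpose(2, 3) @ centered / (n - 1)    # (H, W, D, D)
    cov = cov + eps * torch.eye(d)                         # regularize for invertibility
    return mu, torch.linalg.inv(cov)

def anomaly_map(test_feat, mu, cov_inv):
    """Per-patch Mahalanobis distances for one registered test feature map (D, H, W)."""
    diff = test_feat.permute(1, 2, 0) - mu                 # (H, W, D)
    m = (diff.unsqueeze(2) @ cov_inv @ diff.unsqueeze(3)).squeeze(-1).squeeze(-1)
    return torch.sqrt(m)                                   # (H, W) anomaly score map
```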
5. Conclusions
Based on image registration, this study investigates an auxiliary detection method for aircraft FOD. Experimental results demonstrate the effectiveness of the proposed model, the influence of key parameters, and its robustness. The detection approach is inspired by the human cognitive process during visual inspection, which involves comparing the currently observed image with a correct mental reference. This work represents an attempt to mimic such human-like comparative reasoning in machine learning. With only a small number of reference images depicting normal conditions, the method can inherently determine whether a test image deviates from the norm. Its applicability extends beyond aircraft FOD detection. For example, it could be employed in tasks such as verifying the presence of fasteners in aircraft assembly, a high-volume task that is prone to human error. More importantly, the same model weights can be applied across different tasks. Specifically, the trained model $M_{\mathrm{FOD}}$ from this study could, in principle, be directly utilized to detect missing fasteners in aircraft assembly, thereby serving as a universal model for this category of inspection tasks.
However, the extent and limitations of this generalizability—such as the impact of scene complexity—require further in-depth investigation. The proposed method also has certain limitations. Although the self-constructed FOD and Car datasets in this study exhibit significantly higher scene complexity compared with the MVTec and MPDD datasets, further increases in complexity may challenge the method. Under such conditions, discrepancies between test images and reference images may become less detectable due to constraints in the model’s input dimensions. In real-world deployment, differences in shooting viewpoint, illumination, and capture devices between test and reference images may markedly affect model performance; a quantitative analysis of this influence requires further study. Moreover, future research should focus on improving robustness to small objects, transparent or thin structures, color variations, and shadows to facilitate broader practical application.