A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines

Chen, Mo; Cheng, Sheng; Liu, Yan; Yin, Qifan; Zuo, Hongfu

doi:10.3390/app15105342

Open AccessArticle

A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines

by

Mo Chen

^1,2,*,

Sheng Cheng

²,

Yan Liu

¹,

Qifan Yin

^2,3 and

Hongfu Zuo

^1,*

¹

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

²

COMAC Shanghai Aircraft Manufacturing Co., Ltd., Shanghai 201324, China

³

School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China

^*

Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5342; https://doi.org/10.3390/app15105342

Submission received: 19 March 2025 / Revised: 5 May 2025 / Accepted: 8 May 2025 / Published: 10 May 2025

(This article belongs to the Section Aerospace Science and Engineering)

Download

Browse Figures

Versions Notes

Abstract

Checking the distance between aircraft fire detection lines is a crucial task in the conformity inspection process of civil aircraft manufacturing. Currently, this task is mainly performed manually, which is inefficient and prone to errors and omissions. To address this issue, we propose a method for detecting the distance between aircraft fire detection lines based on the Segment Anything Model (SAM). In this method, we develop a general model for aircraft parts image segmentation and detection, named the Aircraft Segment Anything Model (ASAM). This model uses a low-rank fine-tuning strategy (LoRA) to fine-tune the encoder and mask decoder synchronously and incorporates a fusion loss function for adaptive training of the target task. The trained ASAM model is tested on a self-built fire detection line segmentation dataset, achieving Dice scores of 85.28, 85.99, and 86.26 based on sam_vit_b, sam_vit_l, and sam_vit_h weights, respectively. It scores 13.82 points higher than the classic segmentation model K-Net under the same training parameters. The proposed method provides a new approach for the widespread application of SAM in the field of aircraft parts image segmentation.

Keywords:

SAM; LoRA; aircraft parts; image segmentation; fusion loss function computer vision

1. Introduction

Fire detection lines are critical components of an aircraft’s fire protection system, and they are responsible for monitoring whether key compartments such as the engine bay, APU bay, EE bay, and cargo hold experience fire or overheating. According to aircraft manufacturing process specifications, the distance between any two fire detection lines must exceed a defined threshold. Therefore, verifying the spacing between fire detection lines is a crucial step in ensuring conformity with type design during aircraft assembly.

Due to the wide distribution of fire detection lines across various complex compartments and their large quantity, manual inspection for spacing compliance is time-consuming and prone to errors or omissions. The task of computing the distance between fire detection lines can be divided into two subtasks: (1) segmentation of the fire detection lines and (2) calculation of the minimum spacing between them. As the temperature of fire detection lines is typically the same as the surrounding environment, infrared imaging cannot be used to identify them. Therefore, visual algorithms are required to detect their contours.

To the best of our knowledge, there is currently no dedicated research focused on visual detection algorithms specifically for fire detection lines. Related work has mainly addressed similar elongated objects using either traditional edge detection methods or deep learning-based segmentation models. However, conventional edge detection algorithms generally lack adaptability to complex and cluttered scenarios such as those inside aircraft compartments, making them unsuitable for this task. Deep learning-based segmentation models have shown strong performance in various fields, including automatic parking [1], point cloud processing [2], remote sensing image processing [3,4], autonomous driving [5], security imaging [6], aerial imaging [7], and medical imaging [8]. Nevertheless, applications in the aerospace field remain rare, largely due to the high accuracy requirements, lack of high-quality public datasets, prevalence of fine elongated structures, difficulty in extracting texture-similar features, and the need for domain-specific knowledge [9,10].

However, high-quality image segmentation of aircraft components can compute the geometric relationships between parts by using segmented masks. These relationships can be compared with the engineering requirements of the aircraft to determine manufacturing compliance. This provides significant support for assessing the conformity of assembly relationships for a large number of components, such as cables, pipes, and fasteners in aircraft. It also helps to reduce the workload of operators and minimize the impact of human errors.

In recent years, the segmentation detection model based on deep learning has made great progress. A full convolutional network (FCN) [11] has broken through the limit of input image size, which can process images of any size and generate segmentation results by replacing the full connection layer with a convolution layer. It greatly enhances the flexibility and application scope of the model. U-Net [12] combines the encoder and decoder structure and retains important detailed features through cross-level jump connections. The encoder down-samples the input image to capture the features, and the decoder up-samples the feature map to predict the segmentation results, which realizes fine local reconstruction. Zhou et al. [13] further developed and expanded the U-Net architecture on this basis by introducing the nested jump connection strategy, enabling the model to more effectively capture multi-scale context information, promote the deep fusion of features at different levels, and improve the accuracy and robustness of segmentation. The Deep lab [14] series models introduce the concept of void convolution and a fully connected conditional random field, which not only expands the receptive field and enhances the model’s ability to understand a wide range of contexts but also optimizes the segmentation boundary through post-processing steps, making the model perform well in complex scenes and fine-grained segmentation tasks. However, due to the limited convolution characteristics, these methods have some difficulties in identifying small and slender objects [15]. At the same time, the small-scale self-built dataset is fine-tuned to transfer the pre-trained model to other objects.

In recent years, Transformer [16] has been gradually introduced to machine vision by NLP tasks. ViT [17], for the first time, applied the self-attention mechanism to the image classification task based on the Transformer, demonstrating the ability of global feature capture. TransUNet [18] creatively integrates CNN and ViT and uses the local details (CNN is good at) and global context information (ViT is good at) in the image segmentation task to realize the effective integration of regional and global features of the input image. Swin UNETR [19] focuses on using ViT as the main feature extractor, using the Swin Transformer structure to improve the segmentation efficiency and ability by gradually narrowing the range of attention.

Due to the highly parallelized self-attention mechanism, hierarchical representation learning method, and robustness, which is easy to scale and transfer, the Transformer has gradually become a core component of large foundational models. In the field of machine vision, the Segment Anything Model (SAM) based on a Transformer has become a powerful tool for image segmentation tasks, offering impressive zero sample performance and the ability to transfer to new image distributions and functions [20]. It has been applied in various practical scenarios, including agriculture, manufacturing, remote sensing, and health care [21]. Although SAM has been trained on 11 million images with 1 billion masks, the application of the SAM training dataset still faces challenges in the aircraft field due to the lack of aircraft parts image samples and the different characteristics of aircraft parts and ordinary objects.

Therefore, it is necessary to build a custom dataset, develop fine-tuning methods and loss functions, analyze the impact of involved parameters on model accuracy, and conduct fine-tuning of SAM to apply SAM in the field of aircraft component image segmentation. Currently, there are three technical approaches: (1) A large-scale domain-specific dataset is created, and comprehensive fine-tuning of all parameters in the SAM model is performed [22]; (2) A CNN network is added to SAM [23,24]. Then, most of the SAM parameters are frozen, and a small number of parameters are fine-tuned; (3) LoRA [25] or adapters [26,27] are added to SAM, which refers to the fine-tuning methods of large language models. The branch parameters are adjusted, trained, and combined with the SAM model parameters. This architecture is relatively simpler [28].

Considering the characteristics of target segmentation detection in the aviation field, the third technical route is selected. For the distance detection between aircraft fire detection lines, the main contributions are as follows:

The dataset of fire detection line segmentation is self-built. Fire detection lines are widely involved in the aircraft manufacturing compliance inspection process and have slender features, which are typically representative and difficult to divide.
The SAM fine-tuning model, ASAM, based on LoRA, is proposed, which is 13.82 times higher than the Dice of the classical segmentation model K-Net under the same training parameters. Only a small part of the model parameters needs to be updated to reduce the training cost in the actual use process.
The fusion loss function is proposed to conduct adaptive training to the target task, which improves the ability of the training model to match the target task. Since the minimum distance between the fire detection lines should be controlled to be greater than the value required by the process specification, we can take the target task of calculating the minimum distance between the fire detection lines through image segmentation as an example. The research method has a certain generality for the application in the aviation field.
The influence of different LoRA ranks and SAM weights on the model is studied through experiments, which provides a new reference for the wide application of SAM in the field of image segmentation and detection of aircraft parts.

2. Materials and Methods

2.1. SAM

The core technology of the Segment Anything Model (SAM) draws on Transformer, whose self-attention mechanism allows models to capture long-distance dependencies, which is particularly important for understanding complex visual scenes and fine-grained segmentation tasks. At the beginning of SAM design, SAM is able to learn rich visual patterns through large-scale and diversified datasets and conduct zero-sample migration on unseen data distributions and tasks, which makes SAM the basic model for other visual tasks.

The SAM core consists of three parts: image encoder, prompt encoder, and mask decoder. The image encoder uses a Vision Transformer (ViT) to transform the input image into a high-dimensional feature vector or image encoding. The prompt encoder processes different types of input prompts, enabling it to guide the model in producing a segmentation mask. The mask decoder efficiently generates the segmentation mask of the target object based on the input image encoding and prompt encoding.

However, for private datasets like aircraft part images, SAM requires either global or local fine-tuning to learn its unique textures and features. Although there is no case in the field of aircraft parts, MedSAM uses 1,570,263 image mask pairs to fine-tune SAM globally in the field of medical imaging. The model performs 150 training [22] tasks on 20 A100 (80 G) GPUs. Such training costs are not affordable for aircraft manufacturing and maintenance enterprises. Therefore, this paper studies the SAM local fine-tuning method with more industrial utility.

2.2. LoRA

Low-Rank Adaptation (LoRA) can efficiently customize pre-trained large models. It is originally applied to the diffusion model or language model. It has trained a few parameters, retains complete fine-tuning performance, accelerates the training speed, and greatly reduces the size of the weights. Recently, it has become one of the preferred methods for customized AI models. LoRA allows the indirect training of some dense layers in the neural network by optimizing the low-rank decomposition matrix of dense layers varying during debugging, which keeps the pre-trained weight frozen [25].

As shown in Figure 1, a matrix d × d can represent any linear transformation in a vector space of dimension d. First, the transformation is converted from R^d to R^r(where r ≪ d), and then it is converted back to R^d. Through this process, the number of parameters is reduced from d × d to 2 × d × r. Theoretically, LoRA can be applied to all models by using matrix multiplication. The effect of LoRA on the fine-tuning of the model can be improved by adjusting r and other methods. Based on the above principles, LoRA can be extended to the field of machine vision. By injecting a trainable low-rank decomposition matrix into each layer of the Transformer architecture, the weight of the pre-trained model is unchanged, and the number of parameters that need to be trained during fine-tuning is significantly reduced. The trainable matrix is merged with frozen weights during deployment, so it does not introduce additional inference delays like other add-on modules.

In view of the above characteristics, we can also train multiple LoRA models in parallel in subsequent studies, such as fire detection lines, pipelines, and rivets. Each LoRA performs its task and shares the same underlying model weights. Multiple LoRAs are loaded in RAM simultaneously. Different inputs are fed through different LoRA models to build agents adapted to multiple aircraft parts image segmentation tasks.

2.3. Model Structure

This paper proposes the ASAM model. Based on SAM, it applies mature image encoders, prompt encoders, and a LoRA fine-tuning scheme for the mask decoder and innovatively proposes a fusion loss function. For the image encoder, all parameters of the SAM are frozen, and the image features of the custom dataset are compressed and projected to the same dimension as the SAM output by introducing the frozen SAM parameters. The original SAM points, boxes, masks, and text prompts were discarded, and the encoder was prompted to use the automatic segmentation mode uniformly. All parameters in the SAM mask decoder are frozen by using the same mode. Fine-tuning is performed by using LoRA. For the segmentation head of SAM, since the target task only requires distinguishing between the background and the fire detection line, the segmentation head is simplified by removing the ambiguity setting that allows SAM to output multiple segmentation masks.

2.4. Image Encoder

Referring to the method in the LoRA paper [20], LoRA can adjust Wq, Wv, Wk, or Wo. According to the experimental results in the paper, the benefits obtained by adjusting Wq and Wv are comparable to those obtained by adjusting Wq, Wk, Wv, and Wo. They are greater than the benefits obtained by adjusting Wq. Therefore, to simplify the model, this paper chooses to change the Wq, Wv scheme. The structure of the Multi-Head Attention model after adding LoRA is detailed in Figure 2. In the figure, an attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and production are all vectors. The output is computed as a weighted sum of the values, where a compatibility function of the query with the corresponding key computes the weight assigned to each value. In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix Q. The keys and values are also packed together into matrices K and V.

In Figure 2, value (V), key (K), and query (Q) are projected to a low dimension h times. Then, the attention function of h times is performed. Finally, the output is combined and then projected back to obtain the final output. During the process, the LoRA bypass is added to V and Q.

M u l t i H e a d (Q, K, V) = C o n c a t ({h e a d}_{1}, \dots, {h e a d}_{h}) W^{0}

(1)

{h e a d}_{i} = A t t e n t i o n (Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})

(2)

where

W^{O} \in R^{{h d}_{v} \times d_{m o d e l}}

,

W_{i}^{Q} \in R^{d_{m o d e l} \times d_{k}}

,

W_{i}^{K} \in R^{d_{m o d e l} \times d_{k}}

,

W_{i}^{V} \in R^{d_{m o d e l} \times d_{v}}

.

I n t h e f o r m u l a, h

= 8 is chosen, because

d_{m o d e l}

= 512,

d_{k}

=

d_{v}

=

d_{m o d e l} / h

= 64.

For the given F, due to the addition of LoRA layers, V and Q in the above formula are combined with low-rank matrix parameters based on the original SAM parameters. Wq, Wk, and Wv are from the frozen projection layers of SAM, Aq, Bq, Av, and Bv, which are trainable LoRA parameters. The calculation method for the above formula is the same as in the paper [16].

Q = W_{q} F + B_{q} A_{q} F

(3)

K = W_{k} F

(4)

where

A \in R^{r \times C_{i n}}

,

B \in R^{C_{o u t} \times r}

,

W_{q} \in R^{C_{o u t} \times C_{i n}}

,

W_{v} \in R^{C_{o u t} \times C_{i n}}

.

2.5. Tip Encoder and Mask Decoder

For the prompt encoder, the auto-segmentation mode is always used, and other prompt modes are removed during the fine-tuning process. For the mask decoder, since the mask decoder of SAM includes a Transformer decoder block and a dynamic mask prediction head, the same LoRA fine-tuning method used for the image encoder is applied to the Transformer decoder block. In contrast, the dynamic mask prediction head is significantly simplified. Since only the mask for the fire detection line needs to be segmented, the category is fixed, and the ambiguity prediction feature of SAM is removed. Analysis shows that this simplification helps improve performance.

2.6. Loss Function

The constructed loss function consists of two parts. For the segmentation task, the traditional Dice loss function is used. On this basis, to optimize the fine-tuned model’s adaptability to the target task (i.e., fire detection line distance detection), a target task loss function is added. The sum of the weights of the two loss functions is used as the total loss function.

2.6.1. Dice Loss Function

The Dice loss function is a loss function based on the Dice coefficient (Sørensen–Dice coefficient), which is widely used in computer vision tasks such as image segmentation and object detection, especially performing well in dealing with datasets with category imbalance characteristics. The purpose of introducing the Dice loss function is to measure the degree of overlap between the prediction mask and the true mask, enabling the model to learn more accurate information.

The Dice coefficient is a measure of the similarity of two sets, which is defined as the size of the intersection of two sets divided by the size of the union. In the second classification problem, it can be expressed as follows:

D i c e C o e f f i c i e n t = \frac{2 \times T P}{2 \times T P + F P + F N}

(5)

where TP is the true positive, FP is the false positive, and FN is the false negative.

The Dice loss function minimizes the loss by maximizing the value of the Dice coefficient. Therefore, the Dice loss function is defined as one minus the Dice coefficient:

D i c e L o s s = 1 - D i c e C o e f f i c i e n t = 1 - \frac{2 \times T P}{2 \times T P + F P + F N}

(6)

To avoid division by zero and improve numerical stability, a smoothing term

ϵ = 10^{- 6}

is added to both the numerator and the denominator:

D i c e L o s s = 1 - \frac{2 \times T P + ϵ}{2 \times T P + F P + F N + ϵ}

(7)

2.6.2. Loss Function of the Target Task

Since the target task is to calculate the minimum distance between the fire detection lines through the segmentation masks, the ratio of the width of the segmentation masks to the actual width of the fire detection lines is used as a reference during the calculation process. In order to prevent the situation where the mask prediction is good but the mask width and relative distance prediction are poor, the minimum distance between the fire detection lines is recorded during the process of collecting the custom dataset. During training, the minimum distance between the fire detection lines is calculated by using the predefined masks, and the difference between this calculated distance and the recorded true minimum distance is used as part of the total loss function to optimize the model’s performance on the target task. Because the mask prediction in the early training stage is relatively rough, it is meaningless to introduce the target task loss function. Therefore, when the training is 50%, the target task loss function is introduced. The total loss function is

L o s s = \{\begin{matrix} D i c e L o s s e p o c h < 0.5 e p o c h \\ λ_{1} D i c e L o s s + λ_{2} D i s t a n c e L o s s e p o c h \geq 0.5 e p o c h \end{matrix}

(8)

In this study, hyper-parameters λ₁ and λ₂ are set to 0.5. Although this method is demonstrated with the task of measuring the minimum distance between fire detection lines, it is of universal applicability. Additionally, Distance Loss can be adjusted or replaced to suit different target tasks. The calculation method for Distance Loss used in this paper is as follows.

Curve width measurement is widely used in the field of traditional machine vision. Because this curve is usually an object with uneven width of road cracks, product scratches, or human tissue, it is generally necessary to identify the contour of the curve and then calculate it based on the contour. The width of the fire detection line is consistent. The algorithm can be appropriately simplified. Because the distance between the fire detection line is introduced into the training iteration process, the simplified algorithm also helps to reduce the computational overhead.

The overall workflow of the proposed method is illustrated in Figure 3, including image acquisition, annotation and supervision preparation, LoRA-based SAM fine-tuning with fusion loss, segmentation inference, post-processing for distance calculation, and final evaluation.

As shown in Figure 4, after denoising the mask image, uniform grid lines

n \times n

are drawn. The pixel lengths

l = \{l_{1}, l_{2}, \dots, l_{m}\}

of the line segments where the grid lines intersect with the mask, the midpoint coordinates

p = \{p_{1}, p_{2}, \dots, p_{m}\}

, and the pixel distances

r = \{r_{1}, r_{2}, \dots, r_{m}\}

from

p

to the nearest black pixel are calculated.The pixel width of the fire detection line is

w = 2 \bar{r}

. If

p_{i}

and

p_{j}

are on the same initially drawn grid line but in different connected components of the mask, then the pixel distance between the fire detection lines at

p_{i}

is

d_{i} = d_{p_{i} p_{j}} \times r_{i} / 0.5 l_{i}

, where

d_{p_{i} p_{j}}

is the distance from

p_{i}

to

p_{j}

. When

n

is sufficiently large, the minimum pixel distance between the fire detection lines is

d_{m i n} = m i n \{d_{1}, d_{2}, \dots, d_{t}\}

. Since some segments

p_{i}

and

p_{j}

are discarded because they are in the same connected component, the result is

t \leq m

. Since the true width

W

of the fire detection lines is known, the minimum true distance between the fire detection lines is

D = W \times d_{m i n} / w

, defined as follows:

D i s t a n c e L o s s = \frac{D - D_{r e c o r d}}{D_{r e c o r d}}

(9)

where D_record is the minimum distance between the fire detection lines recorded for each image in the collected dataset.

3. Results

3.1. Dataset Introduction

To investigate the fine-tuning capability of the model on small-scale datasets, we constructed a relatively small custom dataset. The training set consists of 400 images, including 100 images each from the engine bay, APU bay, EE bay, and cargo hold, with 25% of the images used for validation during training. The test set contains 100 images, with 25 images from each of the same four compartments, which collectively represent the typical environments where aircraft fire detection lines are located. All images were captured under natural lighting conditions, from a human inspector’s perspective, perpendicular to the plane formed by the two fire detection lines. Interactive annotations were performed using EISeg, and standard data augmentation techniques were applied.

Unlike conventional datasets, our dataset is task-specific. During the image acquisition process, we applied standard inspection procedures to measure the minimum distance between the fire detection lines within the camera’s field of view. These measurements were recorded alongside each image. In the subsequent training process, the recorded minimum distance was used as part of the input to the fusion loss function. During testing, in addition to computing the Dice coefficient for segmentation accuracy, the model’s accuracy in estimating the minimum distance between the fire detection lines was also evaluated. This design enhances the model’s applicability to the target task.

3.2. Experimental Results

Training of ASAM is conducted on an RTX 4090 and uses OpenMMLab’s MMSegmentation. It is compared with classic semantic segmentation algorithms, including UNet, SegFormer, PSPNet, K-Net, Fast-SCNN, and DeepLabV3. To ensure fairness, the same loss function is used, and the epoch is set to 50. Among them, UNet, SegFormer, PSPNet, Fast-SCNN, and DeepLabV3+ fail to segment the fire detection line mask on the custom dataset. We qualitatively analyze the reason, as the custom dataset images have relatively elongated features, and the dataset is small, which prevents these models from learning effective features. Among these models, only K-Net was able to generate valid segmentation masks on this task, making it the only operational baseline model in our experiments. We compare the Dice coefficients and the accuracy of the minimum distance between the fire detection lines for ASAM and K-Net, both when trained with only

D i c e L o s s

and when trained with the fusion loss function. ASAM uses the sam_vit_b weights and r = 512. K-Net uses the swin_large weights.

The accuracy of the minimum distance between the fire detection lines for the target task, denoted as A, is defined by Equation (10). The experimental results are detailed in Table 1.

A = 1 - \frac{D - D_{r e c o r d}}{D_{r e c o r d}}

(10)

As shown in Table 1, the use of a fused loss function that includes the target task loss function has a minimal impact on the model’s segmentation performance. Still, it significantly improves the accuracy of the minimum distance measurement between the fire detection lines. This provides insights for applying machine vision models to various domain tasks, emphasizing not only the model’s segmentation performance but also considering the application effects on the final target task during the training stage.

4. Discussion

4.1. Model Robustness

From the visualization results, we notice that the model can segment small sections of the fire detection lines that the annotators miss, such as the small segment in the lower right corner in Figure 5. This segment is missed due to its texture and color being very similar to the surrounding environment, leading to human error in manual annotation. Similarly, in Figure 6, the model identifies a small segment of the fire detection line in the lower left corner that the annotators miss due to its minuscule size, also leading to human error. This qualitatively demonstrates that the model is robust enough to handle similar textures and small objects.

4.2. Influence of Different SAM Weights and Ranks

In existing studies on SAM fine-tuning with LoRA, there is limited comprehensive research on the impact of different SAM weights and rank values. Since the selection of SAM weights and rank values is particularly important for the subsequent deployment and inference of the model, we conduct extensive experiments. We train the model by using the weights sam_vit_b, sam_vit_l, and sam_vit_h, with LoRA rank values set to 2, 4, 6, 8, 16, 32, 64, 128, 256, 512, 1024, and 2048. The Dice coefficients obtained on the test set are shown in Table 2. It is evident that the SAM models fine-tuned with different LoRA rank values show significant improvements compared to the SAM model without LoRA fine-tuning (the “no LoRA” row).

For the SAM models fine-tuned with LoRA, we find that the experimental results are relatively close. To provide a more intuitive analysis, the data are converted into a line chart, as shown in Figure 7.

4.3. Qualitative Analysis of the Results from Different Rank Experiments

To further investigate how changes in the rank value affect the model’s segmentation performance, we visualize the experimental results with the sam_vit_b weights. According to the visualization results, as the rank value decreases, the model begins to miss the identification of some fire detection lines, as shown in Figure 8. As the rank value increases, the model starts to incorrectly identify some noise as fire detection lines, as shown in Figure 9. This indicates that the rank value acts as a control for the sensitivity of segmentation. An appropriate rank value will ensure that the fine-tuned model is neither too insensitive nor too sensitive.

5. Conclusions

Based on SAM, a LoRA fine-tuning model, ASAM, is constructed for image segmentation of aircraft parts. An innovative fusion loss function has been proposed to enhance the model’s applicability to the target task. Taking the fire detection line as an example, a custom dataset is formed through collection and annotation. Experiments are conducted to compare the performance of the proposed model with K-Net quantitatively. The robustness of the model is also qualitatively analyzed. Meanwhile, the effects of different SAM weights and ranks on model performance are discussed from both quantitative and qualitative aspects to support subsequent deployment inference.

Given that fire detection lines are elongated and challenging to segment, the proposed ASAM model demonstrates a certain level of generalization in the field of aircraft component image segmentation. In future work, different LoRA fine-tuning models can be developed for various inspection tasks, enabling input routing within a single batch to construct a unified agent capable of handling multiple segmentation tasks for different aircraft parts. Furthermore, considering the multi-dimensional nature of aircraft inspection tasks, it is also promising to design task-specific fusion loss functions and model architectures to enhance adaptability across diverse application scenarios.

Despite its effectiveness, the proposed method still has several limitations. (1) To accurately calculate the distance between fire detection lines, the input image must be captured from a viewpoint perpendicular to the plane formed by the two lines during inference. (2) The current model does not yet support scenarios involving multiple pairs of fire detection lines in a single image. (3) The segmentation performance may be affected by occlusion from pipelines, shadows, and image overlap, especially in complex environments. Future work will focus on addressing these issues by exploring spatial distance estimation based on multi-view monocular images, as well as developing more advanced image enhancement and denoising techniques. These efforts are expected to improve the model’s robustness and applicability further.

Author Contributions

Conceptualization, M.C. and H.Z.; methodology, M.C. and Y.L.; software, S.C. and Q.Y.; validation, H.Z.; writing—original draft preparation, M.C.; writing—review and editing, S.C.; supervision, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Authors Mo Chen, Sheng Cheng and Qifan Yin were employed by the company COMAC Shanghai Aircraft Manufacturing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Kumar, V.R.; Eising, C.; Witt, C.; Yogamani, S.K. Surround-View Fisheye Camera Perception for Automated Driving: Overview, Survey & Challenges. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3638–3659. [Google Scholar] [CrossRef]
Luo, H.; Chen, C.; Fang, L.; Khoshelham, K.; Shen, G. MS-RRFSegNet: Multiscale Regional Relation Feature Segmentation Network for Semantic Segmentation of Urban Scene Point Clouds. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8301–8315. [Google Scholar] [CrossRef]
Zheng, X.; Chen, T. Segmentation of high spatial resolution remote sensing image based on U-Net convolutional networks. In Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, Virtual, 26 September–2 October 2020; pp. 2571–2574. [Google Scholar]
Wei, H.; Xu, X.; Ou, N.; Zhang, X.; Dai, Y. DEANet: Dual Encoder with Attention Network for Semantic Segmentation of Remote Sensing Imagery. Remote Sens. 2021, 13, 3900. [Google Scholar] [CrossRef]
Alberti, E.; Tavera, A.; Masone, C.; Caputo, B. IDDA: A Large-Scale Multi-Domain Dataset for Autonomous Driving. IEEE Robot. Autom. Lett. 2020, 5, 5526–5533. [Google Scholar] [CrossRef]
Akcay, S.; Breckon, T. Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging. Pattern Recognit. 2022, 122, 108245. [Google Scholar] [CrossRef]
Deng, G.; Wu, Z.; Wang, C.; Xu, M.; Zhong, Y. CCANet: Class-Constraint Coarse-to-Fine Attentional Deep Network for Subdecimeter Aerial Image Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4401120. [Google Scholar] [CrossRef]
Li, B.; Zhang, J.; Yang, R.; Li, H. FM-Net: Deep Learning Network for the Fundamental Matrix Estimation from Biplanar Radiographs. Comput. Methods Programs Biomed. 2022, 220, 106782. [Google Scholar] [CrossRef]
Yin, Q.; Cheng, S.; Chu, Y.; Jing, Z. Health status assessment of electrical measuring equipment for aircraft based on Analytical Hierarchy Process. Aerosp. Syst. 2025, 1–13. [Google Scholar] [CrossRef]
Zhang, G.; Shen, B.; Cheng, S.; Tang, Y.; Zhong, L.; Liu, Y.; Yue, H. Failure analysis of flight damage to a composite outer door of a civil aircraft landing gear. Eng. Fail. Anal. 2025, 169, 109222. [Google Scholar] [CrossRef]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. pp. 234–241. [Google Scholar]
Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Proceedings 4. pp. 3–11. [Google Scholar]
Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern. Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
Zhang, J.; Yang, X.; Wang, W.; Brilakis, I.; Davletshina, D.; Wang, H.; Cao, M. Segment-to-track for pavement crack with light-weight neural network on unmanned wheeled robot. Autom. Constr. 2024, 161, 105346. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef] [PubMed]
Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical Image Segmentation Review: The Success of U-Net. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10076–10095. [Google Scholar] [CrossRef]
Zhou, H.Y.; Guo, J.; Zhang, Y.; Han, X.; Yu, L.; Wang, L.; Yu, Y. nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer. IEEE Trans. Image Process. 2023, 32, 4036–4045. [Google Scholar] [CrossRef]
Gui, S.; Song, S.; Qin, R.; Tang, Y. Remote Sensing Object Detection in the Deep Learning Era—A Review. Remote Sens. 2024, 16, 327. [Google Scholar] [CrossRef]
Huang, Y.; Yang, X.; Liu, L.; Zhou, H.; Chang, A.; Zhou, X.; Chen, R.; Yu, J.; Chen, J.; Chen, C.; et al. Segment anything model for medical images? Med. Image Anal. 2024, 92, 103061. [Google Scholar] [CrossRef] [PubMed]
Awais, M.; Naseer, M.; Khan, S.; Anwer, R.M.; Cholakkal, H.; Shah, M.; Yang, M.H.; Khan, F.S. Foundation Models Defining a New Era in Vision: A Survey and Outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2245–2264. [Google Scholar] [CrossRef]
Chai, S.; Jain, R.K.; Teng, S.; Liu, J.; Li, Y.; Tateyama, T.; Chen, Y.-w. Ladder fine-tuning approach for sam integrating complementary network. Procedia Comput. Sci. 2024, 246, 4951–4958. [Google Scholar] [CrossRef]
Sultan, R.I.; Li, C.; Zhu, H.; Khanduri, P.; Brocanelli, M.; Zhu, D. GeoSAM: Fine-tuning SAM with sparse and dense visual prompting for automated segmentation of mobility infrastructure. arXiv 2023, arXiv:2311.11319. [Google Scholar]
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W.J.I. Lora: Low-rank adaptation of large language models. arXiv 2022, arXiv:2106.09685. [Google Scholar] [CrossRef]
Qiu, J.; Li, L.; Sun, J.; Peng, J.; Shi, P.; Zhang, R.; Dong, Y.; Lam, K.; Lo, F.P.; Xiao, B.; et al. Large AI Models in Health Informatics: Applications, Challenges, and the Future. IEEE J. Biomed. Health Inf. 2023, 27, 6074–6087. [Google Scholar] [CrossRef] [PubMed]
Pu, X.; Jia, H.; Zheng, L.; Wang, F.; Xu, F. ClassWise-SAM-Adapter: Parameter-Efficient Fine-Tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 4791–4804. [Google Scholar] [CrossRef]
Kaidong Zhang, D.L. Customized Segment Anything Model for Medical Image Segmentation. arXiv 2023, arXiv:2304.13785. [Google Scholar] [CrossRef]

Figure 1. The principle of LoRA parameter conversion.

Figure 2. Multi-head attention model structure with added LoRA.

Figure 3. The framework of the proposed ASAM-based segmentation and distance measurement method.

Figure 4. Fire detection line spacing calculation method.

Figure 5. The model segmented and annotated the fire detection line on the lower right side, which was missed by personnel.

Figure 6. The model segments and annotates the fire detection line in the lower-left side that was missed by the personnel.

Figure 7. The effect of different SAM checkpoints and ranks on the model.

Figure 8. As the rank decreases, the model experiences a loss in recognizing some of the fire detection lines.

Figure 9. As the rank increases, the model may incorrectly identify some interference as fire detection lines.

Table 1. Dice and target task accuracy of ASAM and K-Net when applying different loss functions.

Model	Loss Function	Dice	Target Task Accuracy
ASAM	Dice Loss	85.17	87.16
	Fusion loss function	85.28	95.45
K-Net	Dice Loss	72.07	79.63
	Fusion loss function	72.44	87.63

Table 2. The impact of different SAM checkpoints and ranks on the model.

LoRA	Sam_vit_b	Sam_vit_l	Sam_vit_h
None	44.32	42.49	41.14
2	83.55	84.2	84.78
4	83.34	84.47	84.78
6	82.23	84.81	84.92
8	81.88	84.53	85.12
16	82.82	84.34	85.01
32	82.62	84.39	84.75
64	83.47	83.98	85.11
128	84.44	85.11	85.47
256	84.98	85.1	86.17
512	85.28	85.53	85.78
1024	84.94	85.78	86.26
2048	85.28	85.99	86.25

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, M.; Cheng, S.; Liu, Y.; Yin, Q.; Zuo, H. A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines. Appl. Sci. 2025, 15, 5342. https://doi.org/10.3390/app15105342

AMA Style

Chen M, Cheng S, Liu Y, Yin Q, Zuo H. A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines. Applied Sciences. 2025; 15(10):5342. https://doi.org/10.3390/app15105342

Chicago/Turabian Style

Chen, Mo, Sheng Cheng, Yan Liu, Qifan Yin, and Hongfu Zuo. 2025. "A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines" Applied Sciences 15, no. 10: 5342. https://doi.org/10.3390/app15105342

APA Style

Chen, M., Cheng, S., Liu, Y., Yin, Q., & Zuo, H. (2025). A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines. Applied Sciences, 15(10), 5342. https://doi.org/10.3390/app15105342

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A SAM-Based Detection Method for the Distance Between Air-Craft Fire Detection Lines

Abstract

1. Introduction

2. Materials and Methods

2.1. SAM

2.2. LoRA

2.3. Model Structure

2.4. Image Encoder

2.5. Tip Encoder and Mask Decoder

2.6. Loss Function

2.6.1. Dice Loss Function

2.6.2. Loss Function of the Target Task

3. Results

3.1. Dataset Introduction

3.2. Experimental Results

4. Discussion

4.1. Model Robustness

4.2. Influence of Different SAM Weights and Ranks

4.3. Qualitative Analysis of the Results from Different Rank Experiments

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI